MVP Factory
ai startup development

eBPF observability that replaced our $4K/month APM

Krystian Wiewiór · 5 min read


Meta description: Learn how eBPF-based observability with BPF CO-RE delivers per-pod Kubernetes metrics without sidecars, slashing costs and CPU overhead vs. Istio and commercial APM tools.

Tags: kubernetes, devops, backend, cloud, architecture

TL;DR

We replaced a commercial APM tool and sidecar-based service mesh observability with an eBPF pipeline using BPF CO-RE portable probes, feeding per-pod HTTP latency histograms and TCP retransmit metrics into Prometheus/Grafana. The result: kernel-level visibility with no application code changes, a fraction of the memory footprint of Istio sidecars, and a monitoring bill that dropped from ~$4K/month to infrastructure we already owned. I want to show you what the architecture looks like and where the tradeoffs hide.

The problem with sidecars and commercial APM

Observability costs on Kubernetes hit a painful inflection point somewhere around 20-30 microservices. You’re either paying per-host APM licensing that scales linearly with your cluster, or you’re running sidecar proxies on every pod that silently eat 50-100MB of memory each.

Most teams treat the sidecar tax as unavoidable. It isn’t.

The resource tax

| Metric | Istio sidecar (Envoy) | Linkerd sidecar (linkerd2-proxy) | eBPF DaemonSet agent |
|---|---|---|---|
| Memory per pod | 50-100 MB | 20-30 MB | 0 (per-node: ~40 MB) |
| CPU overhead per pod | 1-3% added latency | <1% added latency | Negligible (kernel-space) |
| Deployment model | Per-pod sidecar | Per-pod sidecar | Per-node DaemonSet |
| Code changes required | None (mTLS config) | None (inject) | None |
| Cluster of 200 pods (memory) | ~10-20 GB total sidecar overhead | ~4-6 GB | ~600 MB (15-node cluster) |

Look at the last row. Sidecar models multiply overhead by pod count. eBPF multiplies by node count. At startup scale — dozens of nodes, hundreds of pods — that difference pays for an engineer.
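To make that last row concrete, here is the arithmetic as a tiny C sketch. The per-unit figures (75 MB per sidecar, 40 MB per agent) are illustrative midpoints taken from the table, not measurements from any particular cluster:

```c
/* Sidecar overhead scales with pod count; a per-node eBPF agent
 * scales with node count. */
unsigned int sidecar_overhead_mb(unsigned int pods, unsigned int mb_per_sidecar)
{
    return pods * mb_per_sidecar;
}

unsigned int daemonset_overhead_mb(unsigned int nodes, unsigned int mb_per_agent)
{
    return nodes * mb_per_agent;
}
```

With 200 pods at ~75 MB each you get ~15 GB of sidecar overhead; 15 nodes at ~40 MB each come to ~600 MB.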

Building the pipeline: BPF CO-RE and portable probes

The thing that makes this whole approach viable is BPF CO-RE (Compile Once, Run Everywhere). Before CO-RE, eBPF programs needed kernel headers matched to each node’s exact kernel version. In a managed Kubernetes environment where node pools auto-update, that was a non-starter.

CO-RE uses BTF (BPF Type Format) type information embedded in modern kernels (5.8+) to relocate struct field accesses at load time. Your probe binary compiled on a CI machine runs on any BTF-enabled node without recompilation.

A simplified CO-RE probe attaching to the tcp_retransmit_skb tracepoint for retransmit tracking:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_endian.h>

struct retransmit_event { u16 dport; u32 daddr; u64 timestamp; };

struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(u32));
    __uint(value_size, sizeof(u32));
} events SEC(".maps");

SEC("tracepoint/tcp/tcp_retransmit_skb")
int trace_tcp_retransmit(struct trace_event_raw_tcp_event_sk_skb *ctx)
{
    struct sock *sk = (struct sock *)ctx->skaddr;
    u16 dport = BPF_CORE_READ(sk, __sk_common.skc_dport);  /* CO-RE relocated reads */
    u32 daddr = BPF_CORE_READ(sk, __sk_common.skc_daddr);

    struct retransmit_event evt = {
        .dport = bpf_ntohs(dport),          /* network to host byte order */
        .daddr = daddr,
        .timestamp = bpf_ktime_get_ns(),
    };
    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &evt, sizeof(evt));
    return 0;
}

char LICENSE[] SEC("license") = "GPL";

This fires in kernel space on every TCP retransmit — zero userspace overhead until the event buffer is read. We correlate the destination address to pod IPs using the Kubernetes API to label metrics per service.
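The userspace half of that correlation can be sketched in plain C. The probe emits daddr as a raw network-order 32-bit IP; we format it and look it up in a pod-IP-to-pod-name table. The table here is a hypothetical stand-in for the cache a real agent would keep in sync by watching the Kubernetes API, and the pod names are made up:

```c
#include <stdio.h>
#include <string.h>

struct pod_entry { const char *ip; const char *pod; };

/* Hypothetical cache, normally populated from the Kubernetes API. */
static const struct pod_entry pod_table[] = {
    { "10.0.1.23", "api-server-7b4f" },
    { "10.0.2.41", "billing-5c9d" },
};

void format_ipv4(unsigned int daddr_be, char out[16])
{
    /* daddr is in network byte order, so its bytes already sit in
     * memory as the four octets, first octet first. */
    unsigned char b[4];
    memcpy(b, &daddr_be, 4);
    snprintf(out, 16, "%u.%u.%u.%u", b[0], b[1], b[2], b[3]);
}

const char *pod_for_ip(const char *ip)
{
    for (size_t i = 0; i < sizeof(pod_table) / sizeof(pod_table[0]); i++)
        if (strcmp(pod_table[i].ip, ip) == 0)
            return pod_table[i].pod;
    return "unknown";
}
```

Unmatched addresses are labeled "unknown" rather than dropped, so traffic to off-cluster endpoints still shows up in the metrics.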

Per-pod HTTP latency without a proxy

For HTTP latency histograms, we attach probes (kprobes and tracepoints) at the accept and read/write syscall boundaries, then parse enough of the request line in-kernel to extract the HTTP method and status code. Tools like Pixie (a CNCF sandbox project) and Cilium’s Hubble take this approach to varying degrees.
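The parse itself is simple enough to sketch in userspace C. This shows the logic only; the in-kernel version has to bound every buffer access for the BPF verifier, and the function names here are illustrative:

```c
#include <string.h>

/* Extract the method from a captured request buffer ("GET /path ..."). */
int parse_http_method(const char *buf, char method[8])
{
    static const char *known[] = { "GET", "POST", "PUT", "DELETE", "PATCH", "HEAD" };
    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
        size_t n = strlen(known[i]);
        if (strncmp(buf, known[i], n) == 0 && buf[n] == ' ') {
            memcpy(method, known[i], n + 1);  /* copy including NUL */
            return 0;
        }
    }
    return -1;  /* not an HTTP request line we recognize */
}

/* Extract the status code from a response buffer ("HTTP/1.1 200 OK"). */
int parse_http_status(const char *buf)
{
    if (strncmp(buf, "HTTP/", 5) != 0)
        return -1;
    const char *sp = strchr(buf, ' ');
    if (!sp)
        return -1;
    return (sp[1] - '0') * 100 + (sp[2] - '0') * 10 + (sp[3] - '0');
}
```

Pairing the request timestamp with the response timestamp on the same socket yields the latency observation for the histogram.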

The userspace agent running as a DaemonSet aggregates these into Prometheus histograms:

http_request_duration_seconds_bucket{pod="api-server-7b4f",method="GET",status="200",le="0.05"} 14210
http_request_duration_seconds_bucket{pod="api-server-7b4f",method="GET",status="200",le="0.1"} 15002

No instrumentation libraries. No language-specific agents. No application restarts. This works for Go, Rust, Python, Node — anything making syscalls, which is everything.
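One detail worth understanding about the exposition format: Prometheus histogram buckets are cumulative, so every bucket whose le bound is at or above the observation gets incremented, which is why the counts in the sample above never decrease as le grows. A minimal sketch of that fold, with illustrative bucket bounds:

```c
#include <stddef.h>

static const double bounds[] = { 0.005, 0.01, 0.05, 0.1, 0.5, 1.0 };
#define NBUCKETS (sizeof(bounds) / sizeof(bounds[0]))

/* counts has NBUCKETS + 1 slots; the last is the implicit le="+Inf"
 * bucket, which counts every observation. */
void observe(unsigned long long counts[NBUCKETS + 1], double seconds)
{
    for (size_t i = 0; i < NBUCKETS; i++)
        if (seconds <= bounds[i])
            counts[i]++;
    counts[NBUCKETS]++;
}
```

A 70 ms request increments the 0.1, 0.5, 1.0, and +Inf buckets but leaves 0.05 and below untouched.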

The blind spots you need to know about

eBPF is not a free lunch. These are the real gaps:

  • No distributed tracing out of the box. eBPF sees network calls, not trace context headers. You still need OpenTelemetry SDKs or header propagation for cross-service trace IDs.
  • Encrypted payloads are opaque. If services use mTLS (and they should), eBPF at the socket layer sees ciphertext. You need to attach uprobes at the TLS library level (e.g., OpenSSL’s SSL_read/SSL_write), which works but breaks across library versions. We’ve been bitten by this after routine base image updates.
  • Kernel version floor. BTF support requires kernel 5.8+. Most managed Kubernetes offerings (GKE, EKS with AL2023, AKS) meet this today, but verify before committing.

Cost comparison

| Solution | Monthly cost (50-node cluster) | What you get |
|---|---|---|
| Commercial APM (per-host) | $3,000-5,000+ | Full tracing, dashboards, alerting, support |
| Istio + Prometheus/Grafana | ~$0 (licensing) + sidecar CPU/mem | L7 metrics, mTLS, traffic management |
| eBPF + Prometheus/Grafana | ~$0 (licensing) + minimal overhead | L4/L7 metrics, retransmit tracking, no sidecars |

The commercial APM gives you nice UIs and support contracts. The eBPF stack gives you ownership and overhead measured in single-digit megabytes per node. For a startup watching burn rate, we picked eBPF without much debate.

What I’d do if I were starting today

Start with TCP retransmit tracking. Seriously, just this one probe. Retransmits directly correlate to user-perceived latency spikes between services, the tracepoint is stable across kernel versions, and you can deploy it in an afternoon. It was the single probe that convinced our team this approach was worth investing in.

Use BPF CO-RE from the beginning. Don’t build kernel-version-specific probes. Target BTF-enabled kernels and use libraries like libbpf or frameworks like bpf2go (Go) to compile once and distribute as a container image. You’ll thank yourself the first time a node pool upgrades underneath you.

Keep OpenTelemetry for tracing and use eBPF for metrics. They solve different problems. eBPF is great at aggregate network metrics with zero code changes; OTel is great at request-scoped distributed traces. We run both and pay for neither.

