eBPF observability that replaced our $4K/month APM
Meta description: Learn how eBPF-based observability with BPF CO-RE delivers per-pod Kubernetes metrics without sidecars, slashing costs and CPU overhead vs. Istio and commercial APM tools.
Tags: kubernetes, devops, backend, cloud, architecture
TL;DR
We replaced a commercial APM tool and sidecar-based service mesh observability with an eBPF pipeline using BPF CO-RE portable probes, feeding per-pod HTTP latency histograms and TCP retransmit metrics into Prometheus/Grafana. The result: kernel-level visibility with no application code changes, a fraction of the memory footprint of Istio sidecars, and a monitoring bill that dropped from ~$4K/month to infrastructure we already owned. I want to show you what the architecture looks like and where the tradeoffs hide.
The problem with sidecars and commercial APM
Observability costs on Kubernetes hit a painful inflection point somewhere around 20-30 microservices. You’re either paying per-host APM licensing that scales linearly with your cluster, or you’re running sidecar proxies on every pod that silently eat 50-100MB of memory each.
Most teams treat the sidecar tax as unavoidable. It isn’t.
The resource tax
| Metric | Istio sidecar (Envoy) | Linkerd sidecar (linkerd2-proxy) | eBPF DaemonSet agent |
|---|---|---|---|
| Memory per pod | 50-100 MB | 20-30 MB | 0 (per-node: ~40 MB) |
| Latency overhead per request | 1-3% | <1% | Negligible (kernel-space) |
| Deployment model | Per-pod sidecar | Per-pod sidecar | Per-node DaemonSet |
| Code changes required | None (mTLS config) | None (inject) | None |
| Cluster of 200 pods (memory) | ~10-20 GB total sidecar overhead | ~4-6 GB | ~600 MB (15-node cluster) |
Look at the last row. Sidecar models multiply overhead by pod count. eBPF multiplies by node count. At startup scale — dozens of nodes, hundreds of pods — that difference pays for an engineer.
Building the pipeline: BPF CO-RE and portable probes
The thing that makes this whole approach viable is BPF CO-RE (Compile Once, Run Everywhere). Before CO-RE, eBPF programs needed kernel headers matched to each node’s exact kernel version. In a managed Kubernetes environment where node pools auto-update, that was a non-starter.
CO-RE uses BTF (BPF Type Format) type information embedded in modern kernels (5.8+) to relocate struct field accesses at load time. Your probe binary compiled on a CI machine runs on any BTF-enabled node without recompilation.
A simplified probe attaching to the tcp/tcp_retransmit_skb tracepoint for retransmit tracking:
SEC("tracepoint/tcp/tcp_retransmit_skb")
int trace_tcp_retransmit(struct trace_event_raw_tcp_event_sk_skb *ctx)
{
struct sock *sk = (struct sock *)ctx->skaddr;
u16 dport = BPF_CORE_READ(sk, __sk_common.skc_dport);
u32 daddr = BPF_CORE_READ(sk, __sk_common.skc_daddr);
struct retransmit_event evt = {
.dport = bpf_ntohs(dport),
.daddr = daddr,
.timestamp = bpf_ktime_get_ns(),
};
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &evt, sizeof(evt));
return 0;
}
This fires in kernel space on every TCP retransmit — zero userspace overhead until the event buffer is read. We correlate the destination address to pod IPs using the Kubernetes API to label metrics per service.
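For the correlation step, here is a minimal sketch of what the userspace side can look like with client-go. The `podIndex` type, its methods, and the polling `refresh` are illustrative rather than our exact agent code; a production agent would use a shared informer instead of periodically listing pods.

```go
// podindex.go -- resolve event destination IPs to pod names (illustrative sketch).
package main

import (
	"context"
	"net"
	"sync"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// podIndex maps pod IPs to "namespace/name" labels for Prometheus.
type podIndex struct {
	mu   sync.RWMutex
	byIP map[string]string
}

// refresh lists all pods and rebuilds the IP index. IPv4-only for brevity;
// a real agent would watch pods via an informer instead of polling List.
func (x *podIndex) refresh(ctx context.Context, cs kubernetes.Interface) error {
	pods, err := cs.CoreV1().Pods("").List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	next := make(map[string]string, len(pods.Items))
	for _, p := range pods.Items {
		if p.Status.PodIP != "" {
			next[p.Status.PodIP] = p.Namespace + "/" + p.Name
		}
	}
	x.mu.Lock()
	x.byIP = next
	x.mu.Unlock()
	return nil
}

// lookup takes the raw 4 bytes of skc_daddr (network byte order) from a
// retransmit event and returns the pod label, or "" for off-cluster peers.
func (x *podIndex) lookup(daddr [4]byte) string {
	x.mu.RLock()
	defer x.mu.RUnlock()
	return x.byIP[net.IP(daddr[:]).String()]
}
```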
Per-pod HTTP latency without a proxy
For HTTP latency histograms, we attach kprobes/tracepoints at the accept and read/write syscall boundaries, then parse just enough of the request and response in-kernel to extract the HTTP method and status code. Tools like Pixie (now a CNCF sandbox project) and Cilium's Hubble take this approach to varying degrees.
The userspace agent running as a DaemonSet aggregates these into Prometheus histograms:
http_request_duration_seconds_bucket{pod="api-server-7b4f",method="GET",status="200",le="0.05"} 14210
http_request_duration_seconds_bucket{pod="api-server-7b4f",method="GET",status="200",le="0.1"} 15002
No instrumentation libraries. No language-specific agents. No application restarts. This works for Go, Rust, Python, Node — anything making syscalls, which is everything.
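On the aggregation side, the agent is ultimately just feeding a Prometheus HistogramVec and exposing `/metrics` for scraping. A rough sketch using the standard client_golang library; the bucket boundaries and the `observe` helper are illustrative:

```go
// metrics.go -- sketch of how a DaemonSet agent could expose per-pod latency histograms.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var httpDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "http_request_duration_seconds",
		Help:    "HTTP request latency observed from eBPF probes.",
		Buckets: []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5},
	},
	[]string{"pod", "method", "status"},
)

func init() {
	prometheus.MustRegister(httpDuration)
}

// observe is called for each request/response pair reconstructed from kernel
// events; seconds is the response timestamp minus the request timestamp.
func observe(pod, method, status string, seconds float64) {
	httpDuration.WithLabelValues(pod, method, status).Observe(seconds)
}

func main() {
	// Expose /metrics so Prometheus can scrape the DaemonSet pod.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```

One thing to watch: per-pod labels multiply series cardinality, so clusters with heavy pod churn can bloat Prometheus over time.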
The blind spots you need to know about
eBPF is not a free lunch. These are the real gaps:
- No distributed tracing out of the box. eBPF sees network calls, not trace context headers. You still need OpenTelemetry SDKs or header propagation for cross-service trace IDs.
- Encrypted payloads are opaque. If services use mTLS (and they should), eBPF at the socket layer sees ciphertext. You need to attach uprobes at the TLS library level (e.g., OpenSSL's SSL_read/SSL_write), which works but breaks across library versions; we've been bitten by this after routine base image updates. A sketch of the uprobe attachment follows this list.
- Kernel version floor. BTF support requires kernel 5.8+. Most managed Kubernetes offerings (GKE, EKS with AL2023, AKS) meet this today, but verify before committing.
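For the mTLS blind spot, the workaround looks roughly like this: hook SSL_write and SSL_read so the probe sees plaintext before encryption and after decryption. A sketch using cilium/ebpf's uprobe support; the libssl path and the assumption that the probe programs are already loaded elsewhere are illustrative:

```go
// tlsprobe.go -- attaching uprobes at the TLS library boundary (illustrative).
package main

import (
	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
)

// attachTLSProbes hooks SSL_write (plaintext out) and SSL_read (plaintext in)
// in the target's OpenSSL. Only the attachment is shown; the BPF programs are
// assumed to be loaded by the caller.
func attachTLSProbes(libsslPath string, sslWrite, sslRead *ebpf.Program) ([]link.Link, error) {
	// The path varies by distro and base image version; this is exactly
	// where routine image updates break things.
	ex, err := link.OpenExecutable(libsslPath)
	if err != nil {
		return nil, err
	}
	w, err := ex.Uprobe("SSL_write", sslWrite, nil)
	if err != nil {
		return nil, err
	}
	r, err := ex.Uprobe("SSL_read", sslRead, nil)
	if err != nil {
		w.Close()
		return nil, err
	}
	return []link.Link{w, r}, nil
}
```

In practice you also have to resolve the libssl path inside each container's mount namespace (e.g., via /proc/&lt;pid&gt;/root), which is part of why library and base image updates keep breaking this approach.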
Cost comparison
| Solution | Monthly cost (50-node cluster) | What you get |
|---|---|---|
| Commercial APM (per-host) | $3,000-5,000+ | Full tracing, dashboards, alerting, support |
| Istio + Prometheus/Grafana | ~$0 (licensing) + sidecar CPU/mem | L7 metrics, mTLS, traffic management |
| eBPF + Prometheus/Grafana | ~$0 (licensing) + minimal overhead | L4/L7 metrics, retransmit tracking, no sidecars |
The commercial APM gives you nice UIs and support contracts. The eBPF stack gives you ownership and overhead measured in single-digit megabytes per node. For a startup watching burn rate, we picked eBPF without much debate.
What I’d do if I were starting today
Start with TCP retransmit tracking. Seriously, just this one probe. Retransmits directly correlate to user-perceived latency spikes between services, the tracepoint is stable across kernel versions, and you can deploy it in an afternoon. It was the single probe that convinced our team this approach was worth investing in.
Use BPF CO-RE from the beginning. Don’t build kernel-version-specific probes. Target BTF-enabled kernels and use libraries like libbpf or frameworks like bpf2go (Go) to compile once and distribute as a container image. You’ll thank yourself the first time a node pool upgrades underneath you.
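To make that concrete, here is roughly what the compile-once workflow looks like with bpf2go, reusing the retransmit probe from earlier. The generated identifiers (retransmitObjects, loadRetransmitObjects, TraceTcpRetransmit) are derived from the name passed to bpf2go and will differ if you pick another one; treat this as a sketch, not our exact agent:

```go
// main.go -- load the CO-RE probe embedded at `go generate` time and attach it.
package main

//go:generate go run github.com/cilium/ebpf/cmd/bpf2go retransmit tcp_retransmit.bpf.c

import (
	"log"

	"github.com/cilium/ebpf/link"
	"github.com/cilium/ebpf/rlimit"
)

func main() {
	// Lift the locked-memory limit; still needed on older kernels.
	if err := rlimit.RemoveMemlock(); err != nil {
		log.Fatal(err)
	}

	// Load the embedded BPF object; CO-RE relocations against the node's
	// BTF happen here, so the same binary runs across kernel versions.
	var objs retransmitObjects
	if err := loadRetransmitObjects(&objs, nil); err != nil {
		log.Fatal(err)
	}
	defer objs.Close()

	// Attach to the stable tcp/tcp_retransmit_skb tracepoint.
	tp, err := link.Tracepoint("tcp", "tcp_retransmit_skb", objs.TraceTcpRetransmit, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer tp.Close()

	select {} // keep running; a real agent would read the perf buffer here
}
```

Ship that binary in a container image, run it as a DaemonSet, and node pool upgrades stop being a probe-maintenance event.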
Keep OpenTelemetry for tracing and use eBPF for metrics. They solve different problems. eBPF is great at aggregate network metrics with zero code changes; OTel is great at request-scoped distributed traces. We run both and pay for neither.