eBPF observability that replaced our $4K/month APM
Meta description: Learn how eBPF-based observability with BPF CO-RE delivers per-pod Kubernetes metrics without sidecars, slashing costs and CPU overhead vs. Istio and commercial APM tools.
Tags: kubernetes, devops, backend, cloud, architecture
TL;DR
We replaced a commercial APM tool and sidecar-based service mesh observability with an eBPF pipeline using BPF CO-RE portable probes, feeding per-pod HTTP latency histograms and TCP retransmit metrics into Prometheus/Grafana. The result: kernel-level visibility with no application code changes, a fraction of the memory footprint of Istio sidecars, and a monitoring bill that dropped from ~$4K/month to infrastructure we already owned. I want to show you what the architecture looks like and where the tradeoffs hide.
The problem with sidecars and commercial APM
Observability costs on Kubernetes hit a painful inflection point somewhere around 20-30 microservices. You’re either paying per-host APM licensing that scales linearly with your cluster, or you’re running sidecar proxies on every pod that silently eat 50-100MB of memory each.
Most teams treat the sidecar tax as unavoidable. It isn’t.
The resource tax
| Metric | Istio sidecar (Envoy) | Linkerd sidecar (linkerd2-proxy) | eBPF DaemonSet agent |
|---|---|---|---|
| Memory per pod | 50-100 MB | 20-30 MB | 0 (per-node: ~40 MB) |
| Latency overhead per request | 1-3% | <1% | Negligible (kernel-space) |
| Deployment model | Per-pod sidecar | Per-pod sidecar | Per-node DaemonSet |
| Code changes required | None (mTLS config) | None (inject) | None |
| Cluster of 200 pods (memory) | ~10-20 GB total sidecar overhead | ~4-6 GB | ~600 MB (15-node cluster) |
Look at the last row. Sidecar models multiply overhead by pod count. eBPF multiplies by node count. At startup scale — dozens of nodes, hundreds of pods — that difference pays for an engineer.
Building the pipeline: BPF CO-RE and portable probes
The thing that makes this whole approach viable is BPF CO-RE (Compile Once, Run Everywhere). Before CO-RE, eBPF programs needed kernel headers matched to each node’s exact kernel version. In a managed Kubernetes environment where node pools auto-update, that was a non-starter.
CO-RE uses BTF (BPF Type Format) type information embedded in modern kernels (5.8+) to relocate struct field accesses at load time. Your probe binary compiled on a CI machine runs on any BTF-enabled node without recompilation.
A simplified probe attaching to the tcp/tcp_retransmit_skb tracepoint for retransmit tracking:
SEC("tracepoint/tcp/tcp_retransmit_skb")
int trace_tcp_retransmit(struct trace_event_raw_tcp_event_sk_skb *ctx)
{
struct sock *sk = (struct sock *)ctx->skaddr;
u16 dport = BPF_CORE_READ(sk, __sk_common.skc_dport);
u32 daddr = BPF_CORE_READ(sk, __sk_common.skc_daddr);
struct retransmit_event evt = {
.dport = bpf_ntohs(dport),
.daddr = daddr,
.timestamp = bpf_ktime_get_ns(),
};
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &evt, sizeof(evt));
return 0;
}
This fires in kernel space on every TCP retransmit — zero userspace overhead until the event buffer is read. We correlate the destination address to pod IPs using the Kubernetes API to label metrics per service.
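For the correlation step, here is a minimal sketch of what the userspace side can look like with client-go. The `podIndex` type, its methods, and the polling `refresh` are illustrative rather than our exact agent code; a production agent would use a shared informer instead of periodically listing pods.

```go
// podindex.go -- resolve event destination IPs to pod names (illustrative sketch).
package main

import (
	"context"
	"net"
	"sync"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// podIndex maps pod IPs to "namespace/name" labels for Prometheus.
type podIndex struct {
	mu   sync.RWMutex
	byIP map[string]string
}

// refresh lists all pods and rebuilds the IP index. IPv4-only for brevity;
// a real agent would watch pods via an informer instead of polling List.
func (x *podIndex) refresh(ctx context.Context, cs kubernetes.Interface) error {
	pods, err := cs.CoreV1().Pods("").List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	next := make(map[string]string, len(pods.Items))
	for _, p := range pods.Items {
		if p.Status.PodIP != "" {
			next[p.Status.PodIP] = p.Namespace + "/" + p.Name
		}
	}
	x.mu.Lock()
	x.byIP = next
	x.mu.Unlock()
	return nil
}

// lookup takes the raw 4 bytes of skc_daddr (network byte order) from a
// retransmit event and returns the pod label, or "" for off-cluster peers.
func (x *podIndex) lookup(daddr [4]byte) string {
	x.mu.RLock()
	defer x.mu.RUnlock()
	return x.byIP[net.IP(daddr[:]).String()]
}
```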
Per-pod HTTP latency without a proxy
For HTTP latency histograms, we attach kprobes/tracepoints at the accept and read/write syscall boundaries, then parse just enough of the request and response in-kernel to extract the HTTP method and status code. Tools like Pixie (now a CNCF sandbox project) and Cilium's Hubble take this approach to varying degrees.
The userspace agent running as a DaemonSet aggregates these into Prometheus histograms:
http_request_duration_seconds_bucket{pod="api-server-7b4f",method="GET",status="200",le="0.05"} 14210
http_request_duration_seconds_bucket{pod="api-server-7b4f",method="GET",status="200",le="0.1"} 15002
No instrumentation libraries. No language-specific agents. No application restarts. This works for Go, Rust, Python, Node — anything making syscalls, which is everything.
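On the aggregation side, the agent is ultimately just feeding a Prometheus HistogramVec and exposing `/metrics` for scraping. A rough sketch using the standard client_golang library; the bucket boundaries and the `observe` helper are illustrative:

```go
// metrics.go -- sketch of how a DaemonSet agent could expose per-pod latency histograms.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var httpDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "http_request_duration_seconds",
		Help:    "HTTP request latency observed from eBPF probes.",
		Buckets: []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5},
	},
	[]string{"pod", "method", "status"},
)

func init() {
	prometheus.MustRegister(httpDuration)
}

// observe is called for each request/response pair reconstructed from kernel
// events; seconds is the response timestamp minus the request timestamp.
func observe(pod, method, status string, seconds float64) {
	httpDuration.WithLabelValues(pod, method, status).Observe(seconds)
}

func main() {
	// Expose /metrics so Prometheus can scrape the DaemonSet pod.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```

One thing to watch: per-pod labels multiply series cardinality, so clusters with heavy pod churn can bloat Prometheus over time.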
The blind spots you need to know about
eBPF is not a free lunch. These are the real gaps:
- No distributed tracing out of the box. eBPF sees network calls, not trace context headers. You still need OpenTelemetry SDKs or header propagation for cross-service trace IDs.
- Encrypted payloads are opaque. If services use mTLS (and they should), eBPF at the socket layer sees ciphertext. You need to attach uprobes at the TLS library level (e.g., OpenSSL's SSL_read/SSL_write), which works but breaks across library versions; we've been bitten by this after routine base image updates. A sketch of the uprobe attachment follows this list.
- Kernel version floor. BTF support requires kernel 5.8+. Most managed Kubernetes offerings (GKE, EKS with AL2023, AKS) meet this today, but verify before committing.
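For the mTLS blind spot, the workaround looks roughly like this: hook SSL_write and SSL_read so the probe sees plaintext before encryption and after decryption. A sketch using cilium/ebpf's uprobe support; the libssl path and the assumption that the probe programs are already loaded elsewhere are illustrative:

```go
// tlsprobe.go -- attaching uprobes at the TLS library boundary (illustrative).
package main

import (
	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
)

// attachTLSProbes hooks SSL_write (plaintext out) and SSL_read (plaintext in)
// in the target's OpenSSL. Only the attachment is shown; the BPF programs are
// assumed to be loaded by the caller.
func attachTLSProbes(libsslPath string, sslWrite, sslRead *ebpf.Program) ([]link.Link, error) {
	// The path varies by distro and base image version; this is exactly
	// where routine image updates break things.
	ex, err := link.OpenExecutable(libsslPath)
	if err != nil {
		return nil, err
	}
	w, err := ex.Uprobe("SSL_write", sslWrite, nil)
	if err != nil {
		return nil, err
	}
	r, err := ex.Uprobe("SSL_read", sslRead, nil)
	if err != nil {
		w.Close()
		return nil, err
	}
	return []link.Link{w, r}, nil
}
```

In practice you also have to resolve the libssl path inside each container's mount namespace (e.g., via /proc/&lt;pid&gt;/root), which is part of why library and base image updates keep breaking this approach.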
Cost comparison
| Solution | Monthly cost (50-node cluster) | What you get |
|---|---|---|
| Commercial APM (per-host) | $3,000-5,000+ | Full tracing, dashboards, alerting, support |
| Istio + Prometheus/Grafana | ~$0 (licensing) + sidecar CPU/mem | L7 metrics, mTLS, traffic management |
| eBPF + Prometheus/Grafana | ~$0 (licensing) + minimal overhead | L4/L7 metrics, retransmit tracking, no sidecars |
The commercial APM gives you nice UIs and support contracts. The eBPF stack gives you ownership and overhead measured in single-digit megabytes per node. For a startup watching burn rate, we picked eBPF without much debate.
What I’d do if I were starting today
Start with TCP retransmit tracking. Seriously, just this one probe. Retransmits directly correlate to user-perceived latency spikes between services, the tracepoint is stable across kernel versions, and you can deploy it in an afternoon. It was the single probe that convinced our team this approach was worth investing in.
Use BPF CO-RE from the beginning. Don’t build kernel-version-specific probes. Target BTF-enabled kernels and use libraries like libbpf or frameworks like bpf2go (Go) to compile once and distribute as a container image. You’ll thank yourself the first time a node pool upgrades underneath you.
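To make that concrete, here is roughly what the compile-once workflow looks like with bpf2go, reusing the retransmit probe from earlier. The generated identifiers (retransmitObjects, loadRetransmitObjects, TraceTcpRetransmit) are derived from the name passed to bpf2go and will differ if you pick another one; treat this as a sketch, not our exact agent:

```go
// main.go -- load the CO-RE probe embedded at `go generate` time and attach it.
package main

//go:generate go run github.com/cilium/ebpf/cmd/bpf2go retransmit tcp_retransmit.bpf.c

import (
	"log"

	"github.com/cilium/ebpf/link"
	"github.com/cilium/ebpf/rlimit"
)

func main() {
	// Lift the locked-memory limit; still needed on older kernels.
	if err := rlimit.RemoveMemlock(); err != nil {
		log.Fatal(err)
	}

	// Load the embedded BPF object; CO-RE relocations against the node's
	// BTF happen here, so the same binary runs across kernel versions.
	var objs retransmitObjects
	if err := loadRetransmitObjects(&objs, nil); err != nil {
		log.Fatal(err)
	}
	defer objs.Close()

	// Attach to the stable tcp/tcp_retransmit_skb tracepoint.
	tp, err := link.Tracepoint("tcp", "tcp_retransmit_skb", objs.TraceTcpRetransmit, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer tp.Close()

	select {} // keep running; a real agent would read the perf buffer here
}
```

Ship that binary in a container image, run it as a DaemonSet, and node pool upgrades stop being a probe-maintenance event.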
Keep OpenTelemetry for tracing and use eBPF for metrics. They solve different problems. eBPF is great at aggregate network metrics with zero code changes; OTel is great at request-scoped distributed traces. We run both and pay for neither.