eBPF for Service Mesh Observability Without Sidecars
The Observability Tax: Deconstructing Sidecar Performance Overhead
In modern microservice architectures, the service mesh has become a standard for managing inter-service communication, security, and observability. The dominant pattern, popularized by tools like Istio and Linkerd, is the sidecar proxy. This model injects a user-space proxy (typically Envoy) into each application pod. Network traffic is then transparently redirected to this proxy using iptables rules.
While powerful, this pattern imposes a non-trivial performance tax. For senior engineers operating at scale, understanding this tax is critical:
A request from Service A to Service B follows this path:
* Service A (user space) -> Kernel TCP/IP stack
* Kernel -> iptables OUTPUT chain (nat table) -> redirect to Envoy
* Envoy proxy (user space) receives the data
* Envoy processes L7 logic (metrics, tracing, routing)
* Envoy (user space) -> Kernel TCP/IP stack to send to Service B
* On the destination pod the process repeats in reverse: iptables PREROUTING redirects the inbound traffic to that pod's Envoy before it reaches Service B.
This overhead, often termed the "observability tax," becomes a limiting factor for latency-sensitive or high-throughput services. eBPF (extended Berkeley Packet Filter) presents a revolutionary alternative: moving this logic directly into the Linux kernel, eliminating the sidecar entirely for many observability use cases.
This article will guide you through building a production-viable, sidecar-less observability agent using eBPF and Go. We will not cover eBPF basics, but rather the specific implementation patterns for solving this problem.
Architectural Blueprint: eBPF Agent vs. Sidecar Proxy
Let's visualize the fundamental shift in data flow.
Sidecar Architecture:
graph TD
    subgraph Pod A
        ServiceA[Service A Process]
        EnvoyA[Envoy Proxy]
    end
    subgraph Pod B
        ServiceB[Service B Process]
        EnvoyB[Envoy Proxy]
    end
    ServiceA -- 1. localhost TCP --> EnvoyA
    EnvoyA -- 2. Kernel TCP/IP --> EnvoyB
    EnvoyB -- 3. localhost TCP --> ServiceB
Path: Service -> Kernel -> Userspace Proxy -> Kernel -> Userspace Proxy -> Kernel -> Service
eBPF Agent Architecture:
graph TD
    subgraph Node
        subgraph Pod A
            ServiceA[Service A Process]
        end
        subgraph Pod B
            ServiceB[Service B Process]
        end
        Agent[Userspace Agent]
        Kernel[Linux Kernel]
    end
    ServiceA -- 1. Kernel TCP/IP --> ServiceB
    Kernel -- eBPF Hooks --> Agent
Path: Service -> Kernel -> Service. Observability data is siphoned off in-kernel to a node-level agent.
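The eBPF agent runs as a DaemonSet, one per node, and uses kernel probes (kprobes, tracepoints) and traffic control (TC) hooks to observe network traffic for all pods on that node. This is more efficient because application traffic no longer detours through a user-space proxy in each pod; only compact event records cross into user space.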
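To make the difference between the two data paths concrete, here is a toy model in Go that counts the stages on each path. The `hop` type and the helper names are hypothetical and exist only for this illustration; the paths themselves are the ones described in the two diagrams.

```go
package main

import "fmt"

// hop is one stage a request passes through. Kind is "service",
// "kernel", or "proxy" (a user-space sidecar proxy).
type hop struct {
	Name string
	Kind string
}

// countKind returns how many hops of the given kind are on the path.
func countKind(path []hop, kind string) int {
	n := 0
	for _, h := range path {
		if h.Kind == kind {
			n++
		}
	}
	return n
}

// sidecarPath models Service -> Kernel -> Proxy -> Kernel -> Proxy -> Kernel -> Service.
func sidecarPath() []hop {
	return []hop{
		{"Service A", "service"}, {"Kernel", "kernel"}, {"Envoy A", "proxy"},
		{"Kernel", "kernel"}, {"Envoy B", "proxy"}, {"Kernel", "kernel"},
		{"Service B", "service"},
	}
}

// ebpfPath models Service -> Kernel -> Service.
func ebpfPath() []hop {
	return []hop{{"Service A", "service"}, {"Kernel", "kernel"}, {"Service B", "service"}}
}

func main() {
	fmt.Println("sidecar: proxies =", countKind(sidecarPath(), "proxy"),
		"kernel traversals =", countKind(sidecarPath(), "kernel"))
	fmt.Println("eBPF:    proxies =", countKind(ebpfPath(), "proxy"),
		"kernel traversals =", countKind(ebpfPath(), "kernel"))
}
```

The model says nothing about absolute latency; it only shows that the sidecar path adds two user-space proxy stages and two extra kernel traversals per request.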
Production Pattern: An L4 Observability Agent with Go and eBPF
We will now build a simplified agent that captures L4 TCP connection metadata (source/dest IP/port, bytes sent/received, duration) for all pods on a node and exposes it as Prometheus metrics.
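Components: an eBPF program written in C that runs in the kernel, and a Go agent in user space that loads it and exports the metrics.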
We'll use the cilium/ebpf library in Go, which provides excellent abstractions for working with eBPF.
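1. The eBPF Kernel-Space Program (`bpf.c`)
First, we write our C code. With cilium/ebpf, it's common to keep this in a C file next to the Go sources and compile it via go:generate with bpf2go, which embeds the compiled object in generated Go files such as `bpf_bpfel_x86.go` (the exact file names depend on the bpf2go version and target). The code attaches kprobes to tcp_connect and tcp_close to track the lifecycle of TCP connections; the tcp_sendmsg/tcp_recvmsg probes needed for byte counts are omitted here.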
Key Concepts Used:
*   kprobe & kretprobe: Attach to the entry and return of a kernel function, respectively.
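*   bpf_get_current_pid_tgid(): Returns a 64-bit value with the TGID (the process ID as seen from user space) in the upper 32 bits and the kernel thread ID in the lower 32 bits.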
* eBPF Maps: Kernel-space key-value stores. We use:
    *   conns_info (Hash Map): Stores active connection details, keyed by a connection tuple.
    *   events (Perf Event Array): A per-CPU ring buffer for sending events from kernel to user space with low overhead. Samples can be dropped if user space falls behind, so the reader should check for lost samples.
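As a small illustration of the value returned by bpf_get_current_pid_tgid(), the following Go sketch splits a 64-bit value the same way the eBPF program does with `pid_tgid >> 32`. The function name and the sample numbers are made up for the example.

```go
package main

import "fmt"

// splitPidTgid mirrors the layout of the u64 returned by
// bpf_get_current_pid_tgid(): the TGID (user-space process ID) sits in
// the upper 32 bits and the kernel thread ID in the lower 32 bits.
func splitPidTgid(v uint64) (tgid, tid uint32) {
	return uint32(v >> 32), uint32(v)
}

func main() {
	// Hypothetical value: process 4242, thread 4250.
	v := uint64(4242)<<32 | 4250
	tgid, tid := splitPidTgid(v)
	fmt.Println("tgid:", tgid, "tid:", tid)
}
```

For a single-threaded process both halves are equal; for a multi-threaded one, the upper half is what tools like ps report as the PID.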
//go:build ignore
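#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_endian.h> // provides bpf_ntohs
// vmlinux.h carries kernel types but no preprocessor macros, so define AF_INET here
#define AF_INET 2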
// Define a common structure for passing data between kernel and user space
struct conn_info_t {
    u64 ts_us;
    u32 pid;
    u8 comm[TASK_COMM_LEN]; // u8 rather than char so that bpf2go emits an unsigned byte array in Go
    u32 saddr;
    u32 daddr;
    u16 sport;
    u16 dport;
    u64 tx_bytes;
    u64 rx_bytes;
};
// Perf event map to send data to user space
struct { 
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(int));
    __uint(value_size, sizeof(int));
} events SEC(".maps");
// Map to store active connection info, keyed by the kernel socket pointer.
// The pointer is stable for the lifetime of the connection. A key that
// contains the timestamp would never match again when the connection closes.
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, u64);
    __type(value, struct conn_info_t);
} conns_info SEC(".maps");
// Helper to populate the connection info struct
static __always_inline int populate_conn_info(struct conn_info_t* info, struct sock* sk) {
    // BPF_CORE_READ is a helper for reading kernel structs safely (CO-RE)
    u16 family = BPF_CORE_READ(sk, __sk_common.skc_family);
    if (family != AF_INET) {
        return 0; // We only care about IPv4 for this example
    }
    info->ts_us = bpf_ktime_get_ns() / 1000;
    u64 pid_tgid = bpf_get_current_pid_tgid();
    info->pid = pid_tgid >> 32;
    bpf_get_current_comm(&info->comm, sizeof(info->comm));
    // Addresses stay in network byte order; skc_num is already in host order
    info->saddr = BPF_CORE_READ(sk, __sk_common.skc_rcv_saddr);
    info->daddr = BPF_CORE_READ(sk, __sk_common.skc_daddr);
    info->sport = BPF_CORE_READ(sk, __sk_common.skc_num);
    info->dport = bpf_ntohs(BPF_CORE_READ(sk, __sk_common.skc_dport));

    return 1;
}
// kprobe on tcp_connect to trace active connection attempts
SEC("kprobe/tcp_connect")
int BPF_KPROBE(tcp_connect, struct sock *sk)
{
    struct conn_info_t conn_info = {};
    if (!populate_conn_info(&conn_info, sk)) {
        return 0;
    }
    u64 key = (u64)sk;
    bpf_map_update_elem(&conns_info, &key, &conn_info, BPF_ANY);
    return 0;
}
// kretprobe on tcp_connect to handle failed connections
SEC("kretprobe/tcp_connect")
int BPF_KRETPROBE(tcp_connect_ret, int ret)
{
    // If connect fails, the entry added in the kprobe should be removed
    if (ret != 0) {
        // This is simplified. A kretprobe does not receive the original
        // arguments, so a real implementation has to pass the socket pointer
        // from the kprobe to the kretprobe (for example via a map keyed by
        // pid_tgid). We'll omit that complexity here for clarity.
    }
    return 0;
}
// kprobe on tcp_close to trace connection termination
SEC("kprobe/tcp_close")
int BPF_KPROBE(tcp_close, struct sock *sk)
{
    u64 key = (u64)sk;
    // Find the connection recorded at connect time
    struct conn_info_t* existing_conn = bpf_map_lookup_elem(&conns_info, &key);
    if (existing_conn) {
        // Send the final connection data to user space
        bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, existing_conn, sizeof(*existing_conn));
        // Clean up the map
        bpf_map_delete_elem(&conns_info, &key);
    }
    return 0;
}
char LICENSE[] SEC("license") = "GPL";
Note: Tracking RX/TX bytes is more complex as it requires probing tcp_sendmsg/tcp_recvmsg and updating the map. For brevity, that logic is omitted but follows a similar pattern of looking up the connection and incrementing counters.
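Because struct conn_info_t is copied to user space byte for byte, the Go side has to agree with its layout exactly. The sketch below mirrors the struct in a hand-written Go type and round-trips a synthetic sample through encoding/binary. The type and helper names are hypothetical (bpf2go generates its own type), and the sketch assumes a little-endian host.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"net"
)

// connInfo mirrors struct conn_info_t field for field (56 bytes, no padding).
// Hypothetical hand-written type; the bpf2go-generated one may differ in naming.
type connInfo struct {
	TsUs    uint64
	Pid     uint32
	Comm    [16]byte
	Saddr   uint32
	Daddr   uint32
	Sport   uint16
	Dport   uint16
	TxBytes uint64
	RxBytes uint64
}

// encodeConnInfo builds a raw sample; used here only to fake kernel output.
func encodeConnInfo(ci connInfo) []byte {
	var buf bytes.Buffer
	// binary.Write cannot fail here: connInfo has only fixed-size fields.
	_ = binary.Write(&buf, binary.LittleEndian, ci)
	return buf.Bytes()
}

// decodeConnInfo parses a raw sample produced on a little-endian host.
func decodeConnInfo(raw []byte) (connInfo, error) {
	var ci connInfo
	err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &ci)
	return ci, err
}

// commString cuts at the first NUL and tolerates a name that fills the array.
func commString(c [16]byte) string {
	if i := bytes.IndexByte(c[:], 0); i >= 0 {
		return string(c[:i])
	}
	return string(c[:])
}

// ipv4 converts an address that the kernel stored in network byte order
// and that was read as a little-endian uint32.
func ipv4(addr uint32) net.IP {
	ip := make(net.IP, 4)
	binary.LittleEndian.PutUint32(ip, addr)
	return ip
}

func main() {
	in := connInfo{Pid: 4242, Sport: 40000, Dport: 8080}
	copy(in.Comm[:], "curl")
	in.Saddr = binary.LittleEndian.Uint32(net.IPv4(10, 0, 0, 5).To4())
	in.Daddr = binary.LittleEndian.Uint32(net.IPv4(10, 0, 1, 9).To4())

	out, err := decodeConnInfo(encodeConnInfo(in))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s (pid %d): %s:%d -> %s:%d\n", commString(out.Comm), out.Pid,
		ipv4(out.Saddr), out.Sport, ipv4(out.Daddr), out.Dport)
}
```

If a field is added or reordered on the C side, the Go mirror has to change with it; relying on the type generated by bpf2go avoids keeping two definitions in sync by hand.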
2. The Go User-Space Agent
This Go program performs the following steps:
* Uses go generate with bpf2go to compile the C code and embed it as a Go package.
* Loads the compiled eBPF objects into the kernel.
* Attaches the kprobes.
* Reads connection events from the perf buffer.
* Exposes a /metrics endpoint for Prometheus.
# First, set up go generate
# You need clang, llvm, and libbpf-dev installed
go install github.com/cilium/ebpf/cmd/bpf2go@latest
# In your project directory:
go generate
Here is the main Go application (main.go):
package main
import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"log"
	"net"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"github.com/cilium/ebpf/link"
	"github.com/cilium/ebpf/perf"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)
//go:generate go run github.com/cilium/ebpf/cmd/bpf2go bpf bpf.c -- -I./headers
// Prometheus metrics
var (
	connectionsClosed = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "tcp_connections_closed_total",
			Help: "Total number of closed TCP connections",
		},
		[]string{"saddr", "daddr", "sport", "dport", "comm"},
	)
)
func main() {
	// Subscribe to signals for graceful shutdown
	stopper := make(chan os.Signal, 1)
	signal.Notify(stopper, os.Interrupt, syscall.SIGTERM)
	// Register Prometheus metrics
	prometheus.MustRegister(connectionsClosed)
	// Load pre-compiled BPF objects
	objs := bpfObjects{}
	if err := loadBpfObjects(&objs, nil); err != nil {
		log.Fatalf("loading objects: %v", err)
	}
	defer objs.Close()
	// Attach kprobes
	kpConnect, err := link.Kprobe("tcp_connect", objs.TcpConnect, nil)
	if err != nil {
		log.Fatalf("attaching tcp_connect kprobe: %s", err)
	}
	defer kpConnect.Close()
	kpClose, err := link.Kprobe("tcp_close", objs.TcpClose, nil)
	if err != nil {
		log.Fatalf("attaching tcp_close kprobe: %s", err)
	}
	defer kpClose.Close()
	// Open a perf reader from the BPF map
	rd, err := perf.NewReader(objs.Events, os.Getpagesize())
	if err != nil {
		log.Fatalf("creating perf event reader: %s", err)
	}
	defer rd.Close()
	log.Println("Agent started. Waiting for events...")
	// Start Prometheus HTTP server
	go func() {
		http.Handle("/metrics", promhttp.Handler())
		if err := http.ListenAndServe(":9091", nil); err != nil {
			log.Fatalf("failed to start metrics server: %v", err)
		}
	}()
	// Main event loop
	go func() {
		var event bpfConnInfoT
		for {
			record, err := rd.Read()
			if err != nil {
				if errors.Is(err, perf.ErrClosed) {
					return
				}
				log.Printf("reading from perf buffer: %s", err)
				continue
			}
			if record.LostSamples != 0 {
				log.Printf("perf buffer lost %d samples", record.LostSamples)
				continue
			}
			if err := binary.Read(bytes.NewBuffer(record.RawSample), binary.LittleEndian, &event); err != nil {
				log.Printf("parsing perf event: %s", err)
				continue
			}
			// Process the event
			processEvent(event)
		}
	}()
	// Wait for a signal to exit
	<-stopper
	log.Println("Received signal, shutting down...")
}
func processEvent(event bpfConnInfoT) {
	// In a real application, you would add a cache here to resolve IPs 
	// to Kubernetes Pod/Service names by querying the K8s API server.
	// This is a critical enrichment step.
	saddr := intToIP(event.Saddr).String()
	daddr := intToIP(event.Daddr).String()
	sport := event.Sport
	dport := event.Dport
	comm := string(bytes.SplitN(event.Comm[:], []byte{0}, 2)[0]) // cut at the first NUL; safe even if none is present
	log.Printf("TCP Close: %s:%d -> %s:%d, Comm: %s", saddr, sport, daddr, dport, comm)
	connectionsClosed.With(prometheus.Labels{
		"saddr": saddr,
		"daddr": daddr,
		"sport": fmt.Sprintf("%d", sport),
		"dport": fmt.Sprintf("%d", dport),
		"comm":  comm,
	}).Inc()
}
// Helper to convert uint32 IP to net.IP
func intToIP(ip uint32) net.IP {
	result := make(net.IP, 4)
	binary.LittleEndian.PutUint32(result, ip)
	return result
}
This agent, when deployed as a DaemonSet with the necessary capabilities (CAP_BPF and CAP_PERFMON on kernels 5.8 and later, or CAP_SYS_ADMIN on older kernels), provides node-level L4 observability with minimal overhead.
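The enrichment step mentioned in processEvent can be sketched as a small concurrency-safe cache from pod IP to pod identity. Everything here is an assumption made for illustration: the `podCache` and `podRef` names are invented, and in a real agent the Set and Delete calls would be driven by a watch on the Kubernetes API, which is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// podRef identifies a pod. Hypothetical type for this sketch.
type podRef struct {
	Namespace string
	Name      string
}

// podCache maps pod IPs to pod identities and is safe for concurrent use,
// so the event loop can read while a watcher goroutine writes.
type podCache struct {
	mu   sync.RWMutex
	byIP map[string]podRef
}

func newPodCache() *podCache {
	return &podCache{byIP: make(map[string]podRef)}
}

// Set records or updates the pod that owns ip.
func (c *podCache) Set(ip string, p podRef) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byIP[ip] = p
}

// Delete forgets ip, for example when its pod is removed.
func (c *podCache) Delete(ip string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.byIP, ip)
}

// Label returns "namespace/name" for a known IP and the raw IP otherwise,
// so traffic to unknown endpoints still gets a usable label.
func (c *podCache) Label(ip string) string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if p, ok := c.byIP[ip]; ok {
		return p.Namespace + "/" + p.Name
	}
	return ip
}

func main() {
	cache := newPodCache()
	cache.Set("10.0.0.5", podRef{Namespace: "shop", Name: "orders-7d9f"})
	fmt.Println(cache.Label("10.0.0.5"))   // known pod IP
	fmt.Println(cache.Label("192.0.2.10")) // unknown IP falls back to the address
}
```

Labelling by pod name rather than by IP and ephemeral port also keeps the number of distinct Prometheus label combinations much smaller.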
The Next Frontier: L7 Protocol Parsing with eBPF
Capturing L4 metadata is powerful, but true service mesh observability requires L7 context, such as HTTP endpoints, gRPC methods, and status codes. This is significantly more challenging in eBPF due to several factors:
* Bounded Complexity: The eBPF verifier imposes strict limits on program size, complexity, and stack usage. Full L7 parsers often exceed these limits.
* Packet State: TCP is a stream-based protocol. A single L7 message (e.g., an HTTP request) can be split across multiple TCP packets. Reassembling this stream in the kernel is a complex task, requiring state management for out-of-order packets and retransmissions.
* TLS Encryption: Most modern traffic is encrypted with TLS. eBPF programs attached to the kernel networking stack run below the TLS layer, so they see only ciphertext.
There are two primary advanced patterns for tackling L7 parsing:
Pattern 1: Socket Filter / TC Packet Reassembly
This is the approach used by projects like Cilium. An eBPF program is attached to the Traffic Control (TC) layer of a network interface. This program sees every packet.
The program uses bpf_skb_load_bytes to read packet data.
Edge Cases & Challenges:
* This is exceptionally complex to implement correctly.
* Handling large request/response bodies requires careful buffer management to avoid exceeding memory limits.
* It's highly inefficient for protocols that don't have clear message delimiters at the start of the stream.
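The state-keeping problem behind this pattern is easier to see outside the kernel. The following Go sketch buffers TCP segments until a complete HTTP/1.1 request head (terminated by a blank line) has arrived. It is a user-space illustration only: the `reassembler` type is hypothetical, it assumes segments arrive in order, and it ignores retransmissions, which an in-kernel implementation would also have to handle within verifier limits.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"strings"
)

var errTooLarge = errors.New("request head exceeds buffer limit")

// reassembler accumulates in-order segments of one direction of a connection.
type reassembler struct {
	buf    []byte
	maxBuf int // upper bound on buffered bytes, mirroring the need for bounded memory
}

// Feed appends a segment. Once the blank line ending the request head has
// been seen, it returns the head and keeps any following bytes buffered.
func (r *reassembler) Feed(seg []byte) (head string, ok bool, err error) {
	r.buf = append(r.buf, seg...)
	if i := bytes.Index(r.buf, []byte("\r\n\r\n")); i >= 0 {
		head = string(r.buf[:i])
		r.buf = append(r.buf[:0], r.buf[i+4:]...)
		return head, true, nil
	}
	if len(r.buf) > r.maxBuf {
		r.buf = r.buf[:0]
		return "", false, errTooLarge
	}
	return "", false, nil
}

// requestLine returns the first line of a request head.
func requestLine(head string) string {
	return strings.SplitN(head, "\r\n", 2)[0]
}

func main() {
	r := &reassembler{maxBuf: 4096}
	// One request split across two segments in the middle of "HTTP".
	segments := [][]byte{
		[]byte("GET /orders HT"),
		[]byte("TP/1.1\r\nHost: service-b\r\n\r\n"),
	}
	for _, seg := range segments {
		head, ok, err := r.Feed(seg)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		if ok {
			fmt.Println("request line:", requestLine(head))
		}
	}
}
```

Neither segment can be parsed on its own, which is why a per-packet TC program needs per-connection state to recover even the request line.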
Pattern 2: User-space Probes (uprobes) for TLS
To overcome TLS, we can move our probes from the kernel's networking stack to the user-space encryption libraries (like OpenSSL or Go's crypto/tls).
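* Attach uprobes to the library's read and write functions, such as SSL_read and SSL_write.
* Copy the plaintext buffers out of the process's memory with bpf_probe_read_user.
Example Scenario: Tracing Go gRPC calls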
A Go binary's gRPC calls can be traced by attaching uprobes to its TLS library functions:
* crypto/tls.(*Conn).Write
* crypto/tls.(*Conn).Read
The eBPF program can inspect the function arguments (passed via registers on amd64 since Go 1.17) to get a pointer to the plaintext buffer and its length. It can then parse the HTTP/2 frames to extract gRPC method and status information.
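The first step of that HTTP/2 parsing is reading the fixed 9-byte frame header defined by the HTTP/2 specification. The Go sketch below shows the layout on a hand-built header. The type and function names are hypothetical, and the sketch stops before HPACK decoding, which is what actually yields the gRPC method (the `:path` header) and the `grpc-status` trailer.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errShortHeader = errors.New("need at least 9 bytes for an HTTP/2 frame header")

// HTTP/2 frame type and flag values used in this sketch.
const (
	frameTypeHeaders = 0x1
	flagEndHeaders   = 0x4
)

// frameHeader is the fixed 9-byte header that precedes every HTTP/2 frame.
type frameHeader struct {
	Length   uint32 // 24-bit payload length
	Type     uint8
	Flags    uint8
	StreamID uint32 // 31 bits; the top bit is reserved
}

// parseFrameHeader decodes the header from the start of b.
func parseFrameHeader(b []byte) (frameHeader, error) {
	if len(b) < 9 {
		return frameHeader{}, errShortHeader
	}
	return frameHeader{
		Length:   uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2]),
		Type:     b[3],
		Flags:    b[4],
		StreamID: binary.BigEndian.Uint32(b[5:9]) & 0x7fffffff,
	}, nil
}

func main() {
	// Hand-built header: 16-byte HEADERS frame, END_HEADERS set, stream 3.
	raw := []byte{0x00, 0x00, 0x10, 0x01, 0x04, 0x00, 0x00, 0x00, 0x03}
	h, err := parseFrameHeader(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	isHeaders := h.Type == frameTypeHeaders
	fmt.Printf("len=%d headers=%v endHeaders=%v stream=%d\n",
		h.Length, isHeaders, h.Flags&flagEndHeaders != 0, h.StreamID)
}
```

A fixed-size header like this is cheap to read inside an eBPF program; the variable-length, stateful HPACK decoding is usually left to the user-space agent.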
Edge Cases & Challenges:
* Fragile: This technique is tightly coupled to the specific version and implementation of the user-space library. A library update can break the probes.
* Symbol Tables: It requires binaries not to be fully stripped of their symbol tables.
* Language Specifics: Extracting arguments requires deep knowledge of the language's calling convention and memory layout (e.g., Go passed arguments on the stack before version 1.17 and has used a register-based ABI on amd64 since then).
Performance and Security Considerations in Production
Deploying eBPF at scale requires careful consideration of its operational impact.
* The Verifier is Your Enemy (and Friend): The eBPF verifier ensures kernel stability by rejecting unsafe programs. You will spend significant time appeasing it. Key constraints include:
    *   No unbounded loops: The verifier must be able to prove that every loop terminates. Kernels before 5.3 required loops to be fully unrolled at compile time; newer kernels accept loops with a provable bound. This makes parsing dynamic data structures challenging.
    *   Stack Size Limit: The stack is limited to 512 bytes.
    *   Pointer Safety: All memory access is statically checked to prevent out-of-bounds reads/writes.
* Kernel Version Dependency & CO-RE: Historically, eBPF programs were brittle and broke with kernel updates. The modern solution is CO-RE (Compile Once - Run Everywhere), which uses BTF (BPF Type Format). BTF embeds debugging information about kernel types into the kernel itself, allowing the eBPF loader to perform runtime relocations. Your eBPF program can then adapt to small changes in struct layouts between kernel versions. Always use a toolchain that supports CO-RE for production systems.
*   CPU Overhead: While far less than a sidecar, eBPF is not free. A program attached to tcp_sendmsg runs for every single TCP send operation on the system. Inefficient eBPF code can introduce measurable CPU overhead. Use bpftool prog profile to measure the execution time of your programs and identify hotspots.
*   Security Context: Loading eBPF programs requires elevated privileges (CAP_BPF or CAP_SYS_ADMIN). This makes the eBPF agent a highly privileged component. Its security is paramount. Use read-only eBPF maps where possible, minimize the agent's attack surface, and consider using Linux Security Modules (LSMs) to restrict what eBPF programs can do.
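The loop constraint shapes how parsing code is written. Go itself is not subject to the verifier, so the sketch below only mirrors the shape such code takes in an eBPF program: a fixed iteration cap, an explicit bounds check on every access, and an early exit. The constant and function name are hypothetical.

```go
package main

import "fmt"

// maxScan is the fixed upper bound on iterations, analogous to the
// constant bound that lets the verifier prove a loop terminates.
const maxScan = 64

// findSpace returns the index of the first space within the first maxScan
// bytes of buf, or -1 if there is none in that window.
func findSpace(buf []byte) int {
	for i := 0; i < maxScan; i++ {
		// Explicit bounds check before every access, as the verifier demands.
		if i >= len(buf) {
			return -1
		}
		if buf[i] == ' ' {
			return i
		}
	}
	return -1
}

func main() {
	line := []byte("GET /orders HTTP/1.1")
	if i := findSpace(line); i >= 0 {
		fmt.Println("method:", string(line[:i]))
	}
}
```

The cost of this style is that anything beyond the cap is simply not seen, so the bound has to be chosen with the protocol's realistic field sizes in mind.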
Conclusion: A Paradigm Shift with a Higher Skill Floor
eBPF offers a path to highly efficient, transparent, and low-latency service mesh observability. By moving logic from sidecar proxies into the kernel, we can eliminate entire classes of performance bottlenecks and significantly reduce the resource footprint of our observability plane.
However, this power comes at the cost of complexity. Building robust eBPF-based agents requires a deep, systems-level understanding of the Linux kernel, networking, and the intricacies of the eBPF virtual machine and its verifier. It is not a replacement for sidecars in all scenarios—especially where complex L7 traffic management and policy are required—but for observability, it represents the future. For senior engineers tasked with optimizing performance at the limits, mastering eBPF is no longer optional; it's a critical tool for building the next generation of cloud-native infrastructure.