eBPF for Service Mesh Observability Without Sidecars

Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Observability Tax: Deconstructing Sidecar Performance Overhead

In modern microservice architectures, the service mesh has become a standard for managing inter-service communication, security, and observability. The dominant pattern, popularized by tools like Istio and Linkerd, is the sidecar proxy. This model injects a user-space proxy (Envoy in Istio, linkerd2-proxy in Linkerd) into each application pod. Network traffic is then transparently redirected to this proxy using iptables rules.

While powerful, this pattern imposes a non-trivial performance tax. For senior engineers operating at scale, understanding this tax is critical:

  • Network Path Elongation: Each network packet traverses the TCP/IP stack multiple times. A request from Service A to Service B follows this path:

    * Service A (user space) -> Kernel TCP/IP Stack

    * Kernel -> iptables OUTPUT chain (nat table) -> Redirect to Envoy

    * Envoy Proxy (user space) receives packet

    * Envoy processes L7 logic (metrics, tracing, routing)

    * Envoy (user space) -> Kernel TCP/IP Stack to send to Service B

    * This entire process repeats in reverse on the destination pod, where inbound traffic is redirected in the PREROUTING chain.

  • Resource Consumption: Every pod now runs an additional, resource-intensive process. For a cluster with thousands of pods, the aggregate CPU and memory consumption of the sidecar fleet can be substantial, often measured in whole nodes.
  • Context Switching & Memory Copies: The multiple journeys between kernel space and user space for each packet introduce significant latency due to context switching and data copying between kernel buffers and user-space memory.

    This overhead, often termed the "observability tax," becomes a limiting factor for latency-sensitive or high-throughput services. eBPF (extended Berkeley Packet Filter) presents an alternative: moving this logic directly into the Linux kernel, eliminating the sidecar entirely for many observability use cases.
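    To make the Resource Consumption point concrete, here is a back-of-the-envelope sketch in Go. Every number in it (pod count, per-sidecar CPU and memory, node size) is a hypothetical placeholder rather than a measurement; substitute figures observed in your own cluster.

```go
package main

import "fmt"

// sidecarFleetCost estimates the aggregate resources used by all sidecars.
// The inputs are hypothetical; use values measured in your own cluster.
func sidecarFleetCost(pods int, cpuPerSidecar, memGiBPerSidecar float64) (cpu, memGiB float64) {
	return float64(pods) * cpuPerSidecar, float64(pods) * memGiBPerSidecar
}

// nodesConsumed converts the fleet cost into "whole nodes", using whichever
// resource (CPU or memory) is the tighter constraint.
func nodesConsumed(cpu, memGiB, nodeCPU, nodeMemGiB float64) float64 {
	byCPU := cpu / nodeCPU
	byMem := memGiB / nodeMemGiB
	if byCPU > byMem {
		return byCPU
	}
	return byMem
}

func main() {
	// Assumed for illustration: 2,000 pods, 0.1 vCPU and 64 MiB per sidecar,
	// running on nodes with 16 vCPU and 64 GiB of memory.
	cpu, mem := sidecarFleetCost(2000, 0.1, 0.0625)
	fmt.Printf("sidecar fleet: %.0f vCPU, %.0f GiB\n", cpu, mem)
	fmt.Printf("equivalent nodes: %.1f\n", nodesConsumed(cpu, mem, 16, 64))
}
```

    Under these assumed inputs the CPU term dominates, which is why per-sidecar CPU usage under load is usually the first number worth measuring.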

    This article will guide you through building a production-viable, sidecar-less observability agent using eBPF and Go. We will not cover eBPF basics, but rather the specific implementation patterns for solving this problem.


    Architectural Blueprint: eBPF Agent vs. Sidecar Proxy

    Let's visualize the fundamental shift in data flow.

    Sidecar Architecture:

    mermaid
    graph TD
        subgraph Pod A
            ServiceA[Service A Process]
            EnvoyA[Envoy Proxy]
        end
    
        subgraph Pod B
            ServiceB[Service B Process]
            EnvoyB[Envoy Proxy]
        end
    
        ServiceA -- 1. localhost TCP --> EnvoyA
        EnvoyA -- 2. Kernel TCP/IP --> EnvoyB
        EnvoyB -- 3. localhost TCP --> ServiceB

    Path: Service -> Kernel -> Userspace Proxy -> Kernel -> Userspace Proxy -> Kernel -> Service

    eBPF Agent Architecture:

    mermaid
    graph TD
        subgraph Node
            subgraph Pod A
                ServiceA[Service A Process]
            end
            subgraph Pod B
                ServiceB[Service B Process]
            end
            Agent[Userspace Agent]
            Kernel[Linux Kernel]
        end
    
        ServiceA -- 1. Kernel TCP/IP --> ServiceB
        Kernel -- eBPF Hooks --> Agent

    Path: Service -> Kernel -> Service. Observability data is siphoned off in-kernel to a node-level agent.

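    The eBPF agent runs as a DaemonSet, one per node, and uses kernel probes (kprobes, tracepoints) and traffic control hooks (TC) to inspect network traffic for all pods on that node. Because packets are never diverted through a user-space proxy, the extra stack traversals, context switches, and memory copies of the sidecar path disappear from the data path.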


    Production Pattern: An L4 Observability Agent with Go and eBPF

    We will now build a simplified agent that captures L4 TCP connection metadata (source/dest IP/port, bytes sent/received, duration) for all pods on a node and exposes it as Prometheus metrics.

    Components:

  • eBPF Program (C): A set of C functions that will be compiled into BPF bytecode and loaded into the kernel. This program will attach to TCP-related kernel functions.
  • User-space Agent (Go): A Go application that loads and manages the eBPF program, reads data from kernel-space eBPF maps, enriches it with Kubernetes metadata, and exposes it.

    We'll use the cilium/ebpf library in Go, which provides excellent abstractions for working with eBPF.

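    1. The eBPF Kernel-Space Program (`bpf.c`)

    First, we write our C code. With cilium/ebpf, the C source lives in its own file (bpf.c here). The bpf2go tool, invoked via go:generate, compiles it and embeds the resulting bytecode in generated Go files (for example bpf_bpfel_x86.go; the exact name depends on the target and the bpf2go version). The //go:build ignore line at the top keeps the Go toolchain from treating the C file as part of the package. The code below attaches kprobes to tcp_connect and tcp_close to track the lifecycle of TCP connections; the tcp_sendmsg/tcp_recvmsg probes needed for byte counting are left out.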

    Key Concepts Used:

    * kprobe & kretprobe: Attach to the entry and return of a kernel function, respectively.

    * bpf_get_current_pid_tgid(): Returns a 64-bit value with the process ID (TGID) in the upper 32 bits and the thread ID in the lower 32 bits.

    * eBPF Maps: Kernel-space key-value stores. We use:

    * conns_info (Hash Map): Stores active connection details, keyed by a connection tuple.

    * events (Perf Buffer): A per-CPU ring buffer for sending events from kernel to user space with low overhead. Events can be dropped if user space reads too slowly, which is why the agent checks for lost samples.

    c
    //go:build ignore
    
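    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>
    #include <bpf/bpf_core_read.h>
    #include <bpf/bpf_endian.h> // provides bpf_ntohs
    
    // vmlinux.h carries kernel types but no preprocessor macros, so define
    // the constants this program relies on.
    #ifndef AF_INET
    #define AF_INET 2
    #endif
    #ifndef TASK_COMM_LEN
    #define TASK_COMM_LEN 16
    #endif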
    
    // Define a common structure for passing data between kernel and user space
    struct conn_info_t {
        u64 ts_us;
        u32 pid;
        u8 comm[TASK_COMM_LEN]; // u8 rather than char so the generated Go field is a byte array
        u32 saddr;
        u32 daddr;
        u16 sport;
        u16 dport;
        u64 tx_bytes;
        u64 rx_bytes;
    };
    
    // Perf event map to send data to user space
    struct { 
        __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
        __uint(key_size, sizeof(int));
        __uint(value_size, sizeof(int));
    } events SEC(".maps");
    
    // Key identifying a connection: the IPv4 4-tuple. The full conn_info_t
    // cannot serve as the key, because ts_us (and possibly pid/comm) differ
    // between connect and close, so the lookup in tcp_close would never match.
    struct conn_key_t {
        u32 saddr;
        u32 daddr;
        u16 sport;
        u16 dport;
    };
    
    // Map to store active connection info, keyed by the connection tuple
    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 10240);
        __type(key, struct conn_key_t);
        __type(value, struct conn_info_t);
    } conns_info SEC(".maps");
    
    // Helper to populate the connection info struct
    static __always_inline int populate_conn_info(struct conn_info_t* info, struct sock* sk) {
        // BPF_CORE_READ is a helper for reading kernel structs safely (CO-RE)
        u16 family = BPF_CORE_READ(sk, __sk_common.skc_family);
        if (family != AF_INET) {
            return 0; // We only care about IPv4 for this example
        }
    
        info->ts_us = bpf_ktime_get_ns() / 1000;
        u64 pid_tgid = bpf_get_current_pid_tgid();
        info->pid = pid_tgid >> 32; // upper 32 bits hold the process ID (TGID)
        bpf_get_current_comm(&info->comm, sizeof(info->comm));
    
        // Addresses stay in network byte order; ports are converted to host order.
        info->saddr = BPF_CORE_READ(sk, __sk_common.skc_rcv_saddr);
        info->daddr = BPF_CORE_READ(sk, __sk_common.skc_daddr);
        info->sport = BPF_CORE_READ(sk, __sk_common.skc_num); // already host order
        info->dport = bpf_ntohs(BPF_CORE_READ(sk, __sk_common.skc_dport));
        
        return 1;
    }
    
    // Helper to derive the map key from a populated conn_info_t
    static __always_inline void make_key(struct conn_key_t* key, const struct conn_info_t* info) {
        key->saddr = info->saddr;
        key->daddr = info->daddr;
        key->sport = info->sport;
        key->dport = info->dport;
    }
    
    // kprobe on tcp_connect to trace active connection attempts
    SEC("kprobe/tcp_connect")
    int BPF_KPROBE(tcp_connect, struct sock *sk)
    {
        struct conn_info_t conn_info = {};
        struct conn_key_t key = {};
        if (!populate_conn_info(&conn_info, sk)) {
            return 0;
        }
    
        make_key(&key, &conn_info);
        bpf_map_update_elem(&conns_info, &key, &conn_info, BPF_ANY);
        return 0;
    }
    
    // kretprobe on tcp_connect to handle failed connections
    SEC("kretprobe/tcp_connect")
    int BPF_KRETPROBE(tcp_connect_ret, int ret)
    {
        // If connect fails, remove the entry we just added
        if (ret != 0) {
            // This is simplified. In a real scenario, you need a more robust way
            // to find the key. This requires passing state from kprobe to kretprobe,
            // typically through a second map keyed by bpf_get_current_pid_tgid().
            // We'll omit that complexity here for clarity.
        }
        return 0;
    }
    
    // kprobe on tcp_close to trace connection termination
    SEC("kprobe/tcp_close")
    int BPF_KPROBE(tcp_close, struct sock *sk)
    {
        struct conn_info_t conn_info = {};
        struct conn_key_t key = {};
        if (!populate_conn_info(&conn_info, sk)) {
            return 0;
        }
        make_key(&key, &conn_info);
    
        // Find the connection in the map
        struct conn_info_t* existing_conn = bpf_map_lookup_elem(&conns_info, &key);
        if (existing_conn) {
            // Send the final connection data to user space
            bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, existing_conn, sizeof(*existing_conn));
            // Clean up the map
            bpf_map_delete_elem(&conns_info, &key);
        }
    
        return 0;
    }
    
    char LICENSE[] SEC("license") = "GPL";

    Note: Tracking RX/TX bytes is more complex as it requires probing tcp_sendmsg/tcp_recvmsg and updating the map. For brevity, that logic is omitted but follows a similar pattern of looking up the connection and incrementing counters.
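    The omitted byte-counting logic can be modeled in plain Go to show the lifecycle: connect creates an entry, send/receive increment its counters, and close emits the record and deletes it. This is a user-space stand-in for illustration only, not eBPF code; the type and method names are hypothetical. In a real eBPF program the increments would also need to be atomic (for example via __sync_fetch_and_add), since probes run concurrently on multiple CPUs.

```go
package main

import "fmt"

// connKey mirrors the IPv4 4-tuple that identifies a connection.
type connKey struct {
	Saddr, Daddr uint32
	Sport, Dport uint16
}

// connStats mirrors the per-connection byte counters.
type connStats struct {
	TxBytes, RxBytes uint64
}

// tracker is a plain-Go stand-in for the kernel hash map plus event buffer.
type tracker struct {
	conns  map[connKey]*connStats
	events []connStats // stands in for the perf buffer
}

func newTracker() *tracker {
	return &tracker{conns: make(map[connKey]*connStats)}
}

// onConnect models the tcp_connect probe: create the map entry.
func (t *tracker) onConnect(k connKey) { t.conns[k] = &connStats{} }

// onSend models a tcp_sendmsg probe: look up the entry and add to TxBytes.
func (t *tracker) onSend(k connKey, n uint64) {
	if s, ok := t.conns[k]; ok {
		s.TxBytes += n
	}
}

// onRecv models a tcp_recvmsg probe: look up the entry and add to RxBytes.
func (t *tracker) onRecv(k connKey, n uint64) {
	if s, ok := t.conns[k]; ok {
		s.RxBytes += n
	}
}

// onClose models the tcp_close probe: emit the final record, then delete it.
func (t *tracker) onClose(k connKey) {
	if s, ok := t.conns[k]; ok {
		t.events = append(t.events, *s)
		delete(t.conns, k)
	}
}

func main() {
	tr := newTracker()
	k := connKey{Saddr: 1, Daddr: 2, Sport: 40000, Dport: 443}
	tr.onConnect(k)
	tr.onSend(k, 100)
	tr.onRecv(k, 700)
	tr.onClose(k)
	fmt.Printf("emitted %d event(s): %+v\n", len(tr.events), tr.events[0])
}
```

    Events for connections that were never seen at connect time (for example, those accepted rather than initiated) are silently ignored in this model, which matches the lookup-then-update pattern described in the note.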

    2. The Go User-Space Agent

    This Go program performs the following steps:

  • Generate eBPF code: Uses go generate with bpf2go to compile the C code and embed it as a Go package.
  • Load eBPF objects: Loads the compiled eBPF program and maps into the kernel.
  • Attach kprobes: Attaches the loaded programs to the specified kernel functions.
  • Listen for events: Opens the perf buffer and reads connection data sent from the kernel.
  • Enrich data: Correlates IP addresses with Kubernetes Pod/Service metadata.
  • Expose metrics: Serves a /metrics endpoint for Prometheus.
    bash
    # First, set up go generate
    # You need clang, llvm, and libbpf-dev installed
    go install github.com/cilium/ebpf/cmd/bpf2go@latest
    
    # In your project directory:
    go generate

    Here is the main Go application (main.go):

    go
    package main
    
    import (
    	"bytes"
    	"encoding/binary"
    	"errors"
    	"fmt"
    	"log"
    	"net"
    	"net/http"
    	"os"
    	"os/signal"
    	"syscall"
    
    	"github.com/cilium/ebpf/link"
    	"github.com/cilium/ebpf/perf"
    	"github.com/prometheus/client_golang/prometheus"
    	"github.com/prometheus/client_golang/prometheus/promhttp"
    )
    
    //go:generate go run github.com/cilium/ebpf/cmd/bpf2go bpf bpf.c -- -I./headers
    
    // Prometheus metrics
    var (
    	connectionsClosed = prometheus.NewCounterVec(
    		prometheus.CounterOpts{
    			Name: "tcp_connections_closed_total",
    			Help: "Total number of closed TCP connections",
    		},
    		[]string{"saddr", "daddr", "sport", "dport", "comm"},
    	)
    )
    
    func main() {
    	// Subscribe to signals for graceful shutdown
    	stopper := make(chan os.Signal, 1)
    	signal.Notify(stopper, os.Interrupt, syscall.SIGTERM)
    
    	// Register Prometheus metrics
    	prometheus.MustRegister(connectionsClosed)
    
    	// Load pre-compiled BPF objects
    	objs := bpfObjects{}
    	if err := loadBpfObjects(&objs, nil); err != nil {
    		log.Fatalf("loading objects: %v", err)
    	}
    	defer objs.Close()
    
    	// Attach kprobes
    	kpConnect, err := link.Kprobe("tcp_connect", objs.TcpConnect, nil)
    	if err != nil {
    		log.Fatalf("attaching tcp_connect kprobe: %s", err)
    	}
    	defer kpConnect.Close()
    
    	kpClose, err := link.Kprobe("tcp_close", objs.TcpClose, nil)
    	if err != nil {
    		log.Fatalf("attaching tcp_close kprobe: %s", err)
    	}
    	defer kpClose.Close()
    
    	// Open a perf reader from the BPF map
    	rd, err := perf.NewReader(objs.Events, os.Getpagesize())
    	if err != nil {
    		log.Fatalf("creating perf event reader: %s", err)
    	}
    	defer rd.Close()
    
    	log.Println("Agent started. Waiting for events...")
    
    	// Start Prometheus HTTP server
    	go func() {
    		http.Handle("/metrics", promhttp.Handler())
    		if err := http.ListenAndServe(":9091", nil); err != nil {
    			log.Fatalf("failed to start metrics server: %v", err)
    		}
    	}()
    
    	// Main event loop
    	go func() {
    		var event bpfConnInfoT
    		for {
    			record, err := rd.Read()
    			if err != nil {
    				if errors.Is(err, perf.ErrClosed) {
    					return
    				}
    				log.Printf("reading from perf buffer: %s", err)
    				continue
    			}
    
    			if record.LostSamples != 0 {
    				log.Printf("perf buffer lost %d samples", record.LostSamples)
    				continue
    			}
    
    			if err := binary.Read(bytes.NewBuffer(record.RawSample), binary.LittleEndian, &event); err != nil {
    				log.Printf("parsing perf event: %s", err)
    				continue
    			}
    
    			// Process the event
    			processEvent(event)
    		}
    	}()
    
    	// Wait for a signal to exit
    	<-stopper
    	log.Println("Received signal, shutting down...")
    }
    
    func processEvent(event bpfConnInfoT) {
    	// In a real application, you would add a cache here to resolve IPs 
    	// to Kubernetes Pod/Service names by querying the K8s API server.
    	// This is a critical enrichment step.
    	saddr := intToIP(event.Saddr).String()
    	daddr := intToIP(event.Daddr).String()
    	sport := event.Sport
    	dport := event.Dport
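    	// Trim at the first NUL byte; fall back to the full buffer if none is found,
    	// since slicing with an index of -1 would panic.
    	comm := string(event.Comm[:])
    	if i := bytes.IndexByte(event.Comm[:], 0); i >= 0 {
    		comm = string(event.Comm[:i])
    	}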
    
    	log.Printf("TCP Close: %s:%d -> %s:%d, Comm: %s", saddr, sport, daddr, dport, comm)
    
    	connectionsClosed.With(prometheus.Labels{
    		"saddr": saddr,
    		"daddr": daddr,
    		"sport": fmt.Sprintf("%d", sport),
    		"dport": fmt.Sprintf("%d", dport),
    		"comm":  comm,
    	}).Inc()
    }
    
    // Helper to convert uint32 IP to net.IP
    func intToIP(ip uint32) net.IP {
    	result := make(net.IP, 4)
    	binary.LittleEndian.PutUint32(result, ip)
    	return result
    }

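    This agent, when deployed as a DaemonSet with the necessary capabilities (CAP_BPF and CAP_PERFMON on kernels 5.8 and later, or CAP_SYS_ADMIN on older kernels), provides node-level L4 observability with minimal overhead. Note that labeling metrics with raw IPs and ephemeral source ports produces unbounded label cardinality; a production agent should aggregate by workload instead.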
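    The enrichment step mentioned in processEvent can be sketched as a small concurrency-safe cache that maps pod IPs to workload names. This is a minimal sketch under stated assumptions: the podInfo and podCache types are hypothetical, and a real agent would populate the cache from Kubernetes API watch events rather than by hand.

```go
package main

import (
	"fmt"
	"sync"
)

// podInfo is a hypothetical record; a real agent would fill it from
// Kubernetes API watch events.
type podInfo struct {
	Namespace string
	Name      string
}

// podCache maps pod IP addresses to pod metadata and is safe for use from
// the event loop and a watcher goroutine at the same time.
type podCache struct {
	mu   sync.RWMutex
	byIP map[string]podInfo
}

func newPodCache() *podCache {
	return &podCache{byIP: make(map[string]podInfo)}
}

// Set records or updates the pod that currently owns an IP.
func (c *podCache) Set(ip string, p podInfo) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byIP[ip] = p
}

// Delete removes an IP, for example when its pod is deleted.
func (c *podCache) Delete(ip string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.byIP, ip)
}

// Label returns "namespace/name" for a known IP, or the raw IP otherwise.
func (c *podCache) Label(ip string) string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if p, ok := c.byIP[ip]; ok {
		return p.Namespace + "/" + p.Name
	}
	return ip
}

func main() {
	cache := newPodCache()
	cache.Set("10.0.1.15", podInfo{Namespace: "shop", Name: "cart-7d9f"})
	fmt.Println(cache.Label("10.0.1.15")) // known pod IP
	fmt.Println(cache.Label("10.0.9.99")) // unknown IP falls back to the address
}
```

    Using the resolved workload name as the metric label, and dropping the ephemeral source port, keeps the number of Prometheus time series bounded by the number of workloads rather than the number of connections.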


    The Next Frontier: L7 Protocol Parsing with eBPF

    Capturing L4 metadata is powerful, but true service mesh observability requires L7 context, such as HTTP endpoints, gRPC methods, and status codes. This is significantly more challenging in eBPF due to several factors:

    * Bounded Complexity: The eBPF verifier imposes strict limits on program size, complexity, and stack usage. Full L7 parsers often exceed these limits.

    * Packet State: TCP is a stream-based protocol. A single L7 message (e.g., an HTTP request) can be split across multiple TCP packets. Reassembling this stream in the kernel is a complex task, requiring state management for out-of-order packets and retransmissions.

    * TLS Encryption: Most modern traffic is encrypted with TLS. eBPF programs attached to kernel networking hooks run below the TLS layer, which typically lives in user-space libraries, so they only see ciphertext.

    There are two primary advanced patterns for tackling L7 parsing:

    Pattern 1: Socket Filter / TC Packet Reassembly

    This approach attaches an eBPF program to the Traffic Control (TC) layer of a network interface, the same hook that projects like Cilium use for their datapath. This program sees every packet.

  • Read: Use bpf_skb_load_bytes to read packet data.
  • Filter: Quickly filter for relevant traffic (e.g., port 80/443/50051).
  • Manage State: Use eBPF maps to store partial L7 data per-connection, reassembling the TCP stream as new packets arrive.
  • Parse: Once a full message is buffered (e.g., HTTP headers are complete), parse the relevant fields.

    Edge Cases & Challenges:

    * This is exceptionally complex to implement correctly.

    * Handling large request/response bodies requires careful buffer management to avoid exceeding memory limits.

    * It's highly inefficient for protocols that don't have clear message delimiters at the start of the stream.
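    The Manage State and Parse steps of this pattern can be illustrated in user-space Go. The sketch below buffers payloads per connection until the HTTP/1.x header terminator arrives, then extracts the method and path. It assumes segments arrive in order and ignores retransmissions, which is exactly the hard part an in-kernel version has to solve; the reassembler type and the 8 KiB cap are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// maxBuffered is a hypothetical per-connection cap that mirrors the bounded
// memory an in-kernel implementation would have to respect.
const maxBuffered = 8192

// requestLine holds the fields extracted from an HTTP/1.x request line.
type requestLine struct {
	Method string
	Path   string
}

// reassembler buffers partial request data per connection.
type reassembler struct {
	partial map[string][]byte
}

func newReassembler() *reassembler {
	return &reassembler{partial: make(map[string][]byte)}
}

// feed appends one in-order TCP payload for a connection. It reports true
// and the parsed request line once the header block ("\r\n\r\n") is complete.
func (r *reassembler) feed(conn string, payload []byte) (requestLine, bool) {
	data := append(r.partial[conn], payload...)
	if bytes.Index(data, []byte("\r\n\r\n")) < 0 {
		if len(data) > maxBuffered {
			delete(r.partial, conn) // give up rather than grow without bound
		} else {
			r.partial[conn] = data
		}
		return requestLine{}, false
	}
	delete(r.partial, conn)

	// The first CRLF is guaranteed to exist because the terminator was found.
	line := data[:bytes.Index(data, []byte("\r\n"))]
	fields := bytes.SplitN(line, []byte(" "), 3)
	if len(fields) != 3 {
		return requestLine{}, false
	}
	return requestLine{Method: string(fields[0]), Path: string(fields[1])}, true
}

func main() {
	r := newReassembler()
	conn := "10.0.0.1:40000->10.0.0.2:80"
	_, done := r.feed(conn, []byte("GET /api/v1/us"))
	fmt.Println("complete after first segment:", done)
	req, done := r.feed(conn, []byte("ers HTTP/1.1\r\nHost: example\r\n\r\n"))
	fmt.Println("complete after second segment:", done, req.Method, req.Path)
}
```

    Even this simplified version needs a map entry per in-flight request and a size cap, which hints at why the same logic is hard to fit within the verifier's limits.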

    Pattern 2: User-space Probes (uprobes) for TLS

    To overcome TLS, we can move our probes from the kernel's networking stack to the user-space encryption libraries (like OpenSSL or Go's crypto/tls).

  • Identify Functions: Find the functions that handle plaintext data before encryption and after decryption. For OpenSSL, these are often SSL_read and SSL_write.
  • Attach uprobes: Attach eBPF uprobes to these functions in running processes.
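  • Extract Data: The function arguments contain pointers to the plaintext buffers, which the eBPF program can read from user-space memory using bpf_probe_read_user. For read-style functions such as SSL_read, the buffer is only filled when the function returns, so the pointer is saved at entry and the data is read in a return probe.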

    Example Scenario: Tracing Go gRPC calls

    A Go binary's gRPC calls can be traced by attaching uprobes to its TLS library functions:

    crypto/tls.(*Conn).Write

    crypto/tls.(*Conn).Read

    The eBPF program can inspect the function arguments (passed via registers on amd64) to get a pointer to the plaintext buffer and its length. It can then parse the HTTP/2 frames to extract gRPC method and status information.
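    As a first step toward parsing HTTP/2 frames from a captured plaintext buffer, the fixed 9-byte frame header can be decoded as follows. This is a minimal user-space sketch; the frameHeader type and parseFrameHeader function are names chosen for illustration. It assumes the buffer starts on a frame boundary, which a captured Write buffer does not guarantee (a client connection, for instance, begins with the connection preface).

```go
package main

import (
	"errors"
	"fmt"
)

// HTTP/2 frame types relevant to gRPC tracing.
const (
	frameData    = 0x0
	frameHeaders = 0x1
)

// frameHeader is the fixed 9-byte header that precedes every HTTP/2 frame.
type frameHeader struct {
	Length   uint32 // 24-bit payload length
	Type     uint8
	Flags    uint8
	StreamID uint32 // 31 bits; the reserved high bit is masked off
}

// parseFrameHeader decodes the first 9 bytes of b as an HTTP/2 frame header.
func parseFrameHeader(b []byte) (frameHeader, error) {
	if len(b) < 9 {
		return frameHeader{}, errors.New("need at least 9 bytes for a frame header")
	}
	return frameHeader{
		Length:   uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2]),
		Type:     b[3],
		Flags:    b[4],
		StreamID: (uint32(b[5])<<24 | uint32(b[6])<<16 | uint32(b[7])<<8 | uint32(b[8])) & 0x7fffffff,
	}, nil
}

func main() {
	// A HEADERS frame header: 16-byte payload, END_HEADERS flag, stream 3.
	raw := []byte{0x00, 0x00, 0x10, 0x01, 0x04, 0x00, 0x00, 0x00, 0x03}
	h, err := parseFrameHeader(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("length=%d type=%d flags=%#x stream=%d isHeaders=%v\n",
		h.Length, h.Type, h.Flags, h.StreamID, h.Type == frameHeaders)
}
```

    The gRPC method travels in the :path pseudo-header of a HEADERS frame, which is HPACK-compressed, so recovering it also requires tracking HPACK decoder state per connection.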

    Edge Cases & Challenges:

    * Fragile: This technique is tightly coupled to the specific version and implementation of the user-space library. A library update can break the probes.

    * Symbol Tables: It requires binaries not to be fully stripped of their symbol tables.

    * Language Specifics: Extracting arguments requires deep knowledge of the language's calling convention and memory layout (e.g., Go passed arguments on the stack before Go 1.17 and has used a register-based ABI on amd64 since then).


    Performance and Security Considerations in Production

    Deploying eBPF at scale requires careful consideration of its operational impact.

    * The Verifier is Your Enemy (and Friend): The eBPF verifier ensures kernel stability by rejecting unsafe programs. You will spend significant time appeasing it. Key constraints include:

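    * No unbounded loops: The verifier must be able to prove that every loop terminates. Kernels before 5.3 reject loops outright, so they have to be fully unrolled at compile time; newer kernels accept loops with a provable upper bound. Either way, parsing dynamic data structures is challenging.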

    * Stack Size Limit: The stack is limited to 512 bytes.

    * Pointer Safety: All memory access is statically checked to prevent out-of-bounds reads/writes.

    * Kernel Version Dependency & CO-RE: Historically, eBPF programs were brittle and broke with kernel updates. The modern solution is CO-RE (Compile Once - Run Everywhere), which uses BTF (BPF Type Format). BTF embeds debugging information about kernel types into the kernel itself, allowing the eBPF loader to perform runtime relocations. Your eBPF program can then adapt to small changes in struct layouts between kernel versions. Always use a toolchain that supports CO-RE for production systems.

    * CPU Overhead: While far less than a sidecar, eBPF is not free. A program attached to tcp_sendmsg runs for every single TCP send operation on the system. Inefficient eBPF code can introduce measurable CPU overhead. Use bpftool prog profile to measure the execution time of your programs and identify hotspots.

    * Security Context: Loading eBPF programs requires elevated privileges (CAP_BPF, plus CAP_PERFMON for tracing programs, on kernels 5.8 and later; CAP_SYS_ADMIN before that). This makes the eBPF agent a highly privileged component. Its security is paramount. Use read-only eBPF maps where possible, minimize the agent's attack surface, and consider using Linux Security Modules (LSMs) to restrict what eBPF programs can do.
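    For the CPU overhead point above, a complementary measurement is available when the kernel.bpf_stats_enabled sysctl is set: the kernel then tracks a cumulative run time (run_time_ns) and run count (run_cnt) per program, which bpftool prog show reports. The sketch below turns two such samples into an average cost per invocation and a share of one CPU. The progStats type and function names are hypothetical, and the sample values are made up for illustration.

```go
package main

import "fmt"

// progStats holds the cumulative counters the kernel reports for one
// eBPF program when kernel.bpf_stats_enabled is set.
type progStats struct {
	RunTimeNs uint64
	RunCnt    uint64
}

// avgRunNs returns the average execution time per invocation between two
// samples, or 0 if the program did not run in between.
func avgRunNs(prev, cur progStats) float64 {
	runs := cur.RunCnt - prev.RunCnt
	if runs == 0 {
		return 0
	}
	return float64(cur.RunTimeNs-prev.RunTimeNs) / float64(runs)
}

// cpuFraction returns the share of one CPU the program consumed over the
// sampling interval (0.01 means 1% of a single core).
func cpuFraction(prev, cur progStats, intervalNs uint64) float64 {
	if intervalNs == 0 {
		return 0
	}
	return float64(cur.RunTimeNs-prev.RunTimeNs) / float64(intervalNs)
}

func main() {
	// Made-up samples taken one second apart.
	prev := progStats{RunTimeNs: 1_000_000, RunCnt: 1_000}
	cur := progStats{RunTimeNs: 3_000_000, RunCnt: 5_000}
	fmt.Printf("avg per run: %.0f ns\n", avgRunNs(prev, cur))
	fmt.Printf("cpu share:   %.2f%% of one core\n", 100*cpuFraction(prev, cur, 1_000_000_000))
}
```

    Collecting these statistics adds a small cost of its own, so it is usually enabled for targeted measurements rather than left on permanently.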

    Conclusion: A Paradigm Shift with a Higher Skill Floor

    eBPF offers a path to highly efficient, transparent, and low-latency service mesh observability. By moving logic from sidecar proxies into the kernel, we can eliminate entire classes of performance bottlenecks and significantly reduce the resource footprint of our observability plane.

    However, this power comes at the cost of complexity. Building robust eBPF-based agents requires a deep, systems-level understanding of the Linux kernel, networking, and the intricacies of the eBPF virtual machine and its verifier. It is not a replacement for sidecars in all scenarios—especially where complex L7 traffic management and policy are required—but for observability, it represents the future. For senior engineers tasked with optimizing performance at the limits, mastering eBPF is no longer optional; it's a critical tool for building the next generation of cloud-native infrastructure.
