eBPF Service Mesh: Ditching Sidecars for Kernel-Level Performance

Goh Ling Yong

The Inescapable Overhead of the Sidecar Pattern

As architects of distributed systems, we've embraced the service mesh to solve critical challenges in observability, security, and traffic management. The dominant pattern, popularized by Istio and Linkerd, has been the sidecar proxy. An Envoy or linkerd-proxy instance is injected into every application pod, intercepting all network traffic. While functionally powerful, this model imposes a non-trivial performance and resource penalty—the "sidecar tax."

Senior engineers operating at scale feel this tax most acutely. It's not just about the raw CPU and memory consumed by thousands of proxy instances. It's about the subtle, cumulative impact on latency and the operational complexity of managing a user-space networking layer masquerading as infrastructure.

Let's dissect the data path for a request between two pods in a typical Istio mesh:

  • Client Pod Egress: The application process sends a packet to service-b.default.svc.cluster.local.
  • Kernel Interception (iptables): The packet hits the pod's network namespace. iptables rules, configured by the istio-init container, redirect the packet from its intended destination to the local Envoy sidecar's outbound capture port (15001).
  • User-Space Hop 1 (Client-Side Proxy): The packet traverses the TCP/IP stack from the kernel up to the user-space Envoy process. Envoy inspects the packet, applies mTLS encryption, gathers metrics, and makes a routing decision.
  • User-Space to Kernel: Envoy writes the (now encrypted) packet back down the TCP/IP stack into the kernel.
  • Node to Node Network: The packet travels across the underlying CNI network to the destination node.
  • Server Pod Ingress: The packet arrives at the destination pod's network namespace.
  • Kernel Interception (iptables): iptables rules again intercept the packet, redirecting it to the server-side Envoy proxy's inbound capture port (15006).
  • User-Space Hop 2 (Server-Side Proxy): The packet travels up the stack to the server-side Envoy. This proxy terminates the mTLS connection, decrypts the packet, validates policies, and gathers metrics.
  • Final Delivery: The decrypted packet is written back down the stack to the kernel, which finally delivers it to the actual application process listening on its target port.

This journey involves multiple traversals between kernel space and user space, each incurring context-switching overhead and memory copy operations. For a single request, the packet is processed by two separate user-space proxies in addition to the application code. This architectural choice is the root cause of added latency, increased resource consumption that scales linearly with the number of pods, and operational headaches like iptables rule conflicts and complex pod startup logic. (The redirect rules themselves are shown below.)
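
For reference, the nat-table rules installed by istio-init (the kernel-interception steps above) look roughly like this abridged iptables-save output; the chain names are Istio's standard ones, though the exact rules vary by version:

bash
# Outbound: every TCP packet leaving the app is diverted to Envoy on 15001
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_OUTPUT -j ISTIO_REDIRECT
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001

# Inbound: every TCP packet arriving at the pod is diverted to Envoy on 15006
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006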

    A Paradigm Shift: Moving Logic into the Kernel with eBPF

    eBPF (extended Berkeley Packet Filter) offers a fundamentally different approach. It allows us to run sandboxed, event-driven programs inside the Linux kernel itself, without changing kernel source code or loading kernel modules. For networking, this is revolutionary. Instead of redirecting packets to a user-space proxy, we can attach eBPF programs to key hooks in the kernel's networking stack to implement service mesh logic directly.

    Cilium is a CNI and service mesh implementation that leverages eBPF to its full potential. Let's examine how it handles the same pod-to-pod request:

  • Client Pod Egress: The application process sends a packet. It leaves the pod's network namespace and hits the tc (Traffic Control) hook on the virtual ethernet device (veth).
  • In-Kernel Processing (eBPF): An eBPF program attached to this hook executes.

    * Identity-Based Security: It determines the Cilium security identity of the source and destination pods.

    * Policy Enforcement: It consults an eBPF map (an efficient in-kernel key-value store) to check if the policy allows this communication.

    * Service Translation: It performs service-to-backend-pod IP translation by looking up the service IP in another eBPF map. This replaces kube-proxy's functionality.

    * Packet Forwarding: The eBPF program directly forwards the packet to the destination pod's network device, bypassing the rest of the node's IP stack. If transparent encryption (WireGuard/IPsec) is enabled, the eBPF program can trigger encryption directly in the kernel.

  • Node to Node Network: The packet travels to the destination node.
  • Server Pod Ingress: The packet arrives and is processed by an eBPF program on the destination pod's veth, which performs final delivery.

    Notice what's missing: iptables redirects and user-space hops. For L3/L4-aware networking, policy, and observability, the entire data path remains within the kernel. This results in a near-native networking performance profile, and the in-kernel state is directly inspectable, as shown below.
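
    As a quick sanity check (assuming Cilium is installed in kube-system with its standard k8s-app=cilium label), the following dumps the eBPF load-balancing map that replaces kube-proxy:

    bash
    # Pick one Cilium agent pod
    AGENT=$(kubectl -n kube-system get pods -l k8s-app=cilium -o name | head -n 1)

    # Dump the service-to-backend translation table held in eBPF maps
    kubectl -n kube-system exec "$AGENT" -c cilium-agent -- cilium bpf lb list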

    Production Implementation: Cilium Service Mesh without Sidecars

    Let's move from theory to a concrete, production-grade implementation. We will configure a Cilium service mesh to perform an advanced L7 traffic split, a task that traditionally requires a sidecar.

    Prerequisites:

    * A Kubernetes cluster (v1.23+).

    * A Linux kernel version that supports eBPF (5.10+ recommended for best feature support).

    * Helm v3.

    Step 1: Install Cilium with Service Mesh and Hubble UI

    We'll use Helm to install Cilium. This configuration enables the necessary components for a sidecar-less service mesh.

    yaml
    # values-cilium.yaml
    # Replace kube-proxy entirely with eBPF service handling
    # (the older "strict" value is deprecated in 1.15 in favor of true)
    kubeProxyReplacement: true
    bpf:
      preallocateMaps: true
    securityContext:
      privileged: true # Required for the agent to load eBPF programs

    # Enable Hubble for deep observability
    hubble:
      enabled: true
      relay:
        enabled: true
      ui:
        enabled: true

    # Enable the L7 proxy for Ingress
    ingressController:
      enabled: true
      loadbalancerMode: dedicated

    # Enable the CiliumEnvoyConfig CRD, which powers the
    # sidecar-less L7 features used below
    envoyConfig:
      enabled: true

    Now, install Cilium:

    bash
    helm repo add cilium https://helm.cilium.io/
    helm install cilium cilium/cilium --version 1.15.5 --namespace kube-system -f values-cilium.yaml
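
    If you have the Cilium CLI installed, you can wait for the rollout to finish and confirm overall health:

    bash
    # Blocks until the agent DaemonSet, operator, and Hubble components are ready
    cilium status --wait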

    The Hybrid L7 Model: The Best of Both Worlds

    While L3/L4 logic is handled entirely in-kernel, complex L7 logic (e.g., parsing HTTP headers, gRPC method routing) still requires a proxy. However, instead of a sidecar per pod, Cilium uses a highly optimized, node-local Envoy proxy.

    The eBPF program is intelligent. It can parse enough of the protocol to know if a packet requires L7 inspection. If it's simple TCP traffic governed by an L3/L4 policy, it's handled in-kernel. If it's HTTP traffic targeted by an L7 policy, the eBPF program transparently redirects just that specific connection to the node-local Envoy instance. All other connections from the same pod continue to bypass the proxy.

    This is a critical distinction: we move from an "always proxy" model to a "proxy on-demand" model, significantly reducing overhead.
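
    You can watch the on-demand redirection happen. Once an L7 policy is in place (as in Step 3 below), Hubble emits HTTP-level flow events only for connections that were actually routed through the node-local proxy:

    bash
    # Stream only HTTP-parsed flows; plain L3/L4 traffic never produces
    # these events because it never touches Envoy
    hubble observe --protocol http --follow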

    Step 2: Deploy Sample Applications

    Let's deploy two versions of a demo application, httpbin, which we'll use for traffic splitting.

    yaml
    # httpbin-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: httpbin-v1
      labels:
        app: httpbin
        version: v1
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: httpbin
          version: v1
      template:
        metadata:
          labels:
            app: httpbin
            version: v1
        spec:
          containers:
          - name: httpbin
            image: kennethreitz/httpbin
            ports:
            - containerPort: 80
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: httpbin-v2
      labels:
        app: httpbin
        version: v2
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: httpbin
          version: v2
      template:
        metadata:
          labels:
            app: httpbin
            version: v2
        spec:
          containers:
          - name: httpbin
            image: kennethreitz/httpbin
            ports:
            - containerPort: 80
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: httpbin
    spec:
      type: ClusterIP
      ports:
      - port: 80
        targetPort: 80
        protocol: TCP
      selector:
        app: httpbin
    ---
    # Per-version Services. The CiliumEnvoyConfig below references Envoy
    # clusters named after these Services, so they must exist.
    apiVersion: v1
    kind: Service
    metadata:
      name: httpbin-v1
    spec:
      type: ClusterIP
      ports:
      - port: 80
        targetPort: 80
        protocol: TCP
      selector:
        app: httpbin
        version: v1
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: httpbin-v2
    spec:
      type: ClusterIP
      ports:
      - port: 80
        targetPort: 80
        protocol: TCP
      selector:
        app: httpbin
        version: v2

    Apply this manifest: kubectl apply -f httpbin-deployment.yaml

    Step 3: Implement L7 Traffic Splitting with CiliumEnvoyConfig

    Now, we'll define a canary rollout policy: 90% of traffic goes to v1 and 10% to v2. We do this with a CiliumEnvoyConfig CRD, which lets us inject raw Envoy configuration (a listener and a route), and a CiliumNetworkPolicy that selects which traffic is subject to the L7 rule.

    yaml
    # traffic-split.yaml
    apiVersion: cilium.io/v2
    kind: CiliumNetworkPolicy
    metadata:
      name: httpbin-l7-policy
    spec:
      endpointSelector:
        matchLabels:
          # Apply this policy to clients accessing httpbin
          # For this demo, let's select a specific client pod
          # In production, this would be your frontend or API gateway
          app: curl
      egress:
      - toEndpoints:
        - matchLabels:
            app: httpbin
        toPorts:
        - ports:
          - port: "80"
            protocol: TCP
          # An L7 rule (here an allow-all HTTP rule) is what tells Cilium
          # to redirect matching connections to the node-local Envoy
          rules:
            http:
            - {}
    ---
    apiVersion: cilium.io/v2
    kind: CiliumEnvoyConfig
    metadata:
      name: httpbin-traffic-split
    spec:
      # Traffic sent to this Service is intercepted by the node-local Envoy
      services:
        - name: httpbin
          namespace: default
      # Cilium syncs the endpoints of these Services into the Envoy
      # clusters referenced below as <namespace>/<name>
      backendServices:
        - name: httpbin-v1
          namespace: default
        - name: httpbin-v2
          namespace: default
      resources:
        - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
          name: httpbin-listener
          filter_chains:
            - filters:
                - name: envoy.filters.network.http_connection_manager
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                    stat_prefix: httpbin-listener
                    rds:
                      route_config_name: httpbin-listener-route
                    http_filters:
                      - name: envoy.filters.http.router
                        typed_config:
                          "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
        - "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
          name: httpbin-listener-route
          virtual_hosts:
            - name: httpbin-vh
              domains: ["httpbin", "httpbin.default", "httpbin.default.svc.cluster.local"]
              routes:
                - match:
                    prefix: "/"
                  route:
                    weighted_clusters:
                      clusters:
                        - name: default/httpbin-v1
                          weight: 90
                        - name: default/httpbin-v2
                          weight: 10

    Analysis of the CRDs:

    * CiliumNetworkPolicy: This selects egress traffic from pods with the app: curl label that is destined for the httpbin backends on port 80. The rules: http: [{}] section under toPorts tells Cilium that this is L7 traffic and must be redirected to the node-local Envoy proxy for inspection.

    * CiliumEnvoyConfig: This is the core of our L7 logic. It intercepts traffic to the httpbin service, hands it to a small Envoy listener, and routes it with a standard RouteConfiguration. The weighted_clusters stanza is stock Envoy API for traffic splitting. Cilium populates the default/httpbin-v1 and default/httpbin-v2 clusters with the endpoints of the Services listed under backendServices.

    Apply the policy: kubectl apply -f traffic-split.yaml.
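
    A quick check that both custom resources were accepted (the CRDs are installed by Cilium itself):

    bash
    # Both objects should be listed once applied
    kubectl get ciliumnetworkpolicies,ciliumenvoyconfigs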

    Step 4: Verify the Traffic Split

    Launch a client pod and test the routing.

    bash
    # --command overrides the image's curl entrypoint so the pod just sleeps
    kubectl run curl --image=curlimages/curl:latest -l app=curl --command -- sleep 3600

    # Drive 100 requests through the split
    # (the curl image ships sh, not bash)
    kubectl exec curl -- sh -c 'for i in $(seq 1 100); do curl -s -o /dev/null http://httpbin/get; sleep 0.1; done'

    Because both versions run the same image, the responses themselves are indistinguishable; Hubble, however, records which backend served each flow. You will observe that approximately 90% of the requests are routed to the httpbin-v1 pod and 10% to httpbin-v2, all without a single sidecar injected into our application pods.
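
    One way to tally the split (assuming the hubble CLI can reach Hubble Relay, e.g. via cilium hubble port-forward); the absolute counts may include request and response events, but the ratio is what matters:

    bash
    # Count observed HTTP flows per backend version
    hubble observe --namespace default --protocol http --last 1000 \
      --from-label app=curl --to-label version=v1 | wc -l
    hubble observe --namespace default --protocol http --last 1000 \
      --from-label app=curl --to-label version=v2 | wc -l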

    Performance Benchmarking: The Quantifiable Impact

    Claims of better performance require empirical evidence. We benchmarked a cluster running Istio (1.21.0, default profile) against one running Cilium (1.15.5, configured as above). Both ran on identical GKE clusters (e2-standard-4 nodes).

    Methodology:

    * Tool: fortio, a load testing library, deployed in a client pod.

    * Target: A simple nginx server pod.

    * Test: Measure request latency (p99) and CPU/memory usage under a sustained load of 1000 QPS; a representative fortio invocation follows this list.
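
    The exact harness isn't reproduced here, but a minimal run along these lines exercises the same path (this assumes fortio is deployed as a Deployment named fortio; the load subcommand and flags are standard fortio):

    bash
    # Sustain 1000 QPS for 60s against the in-cluster nginx Service,
    # reporting the p99 latency percentile
    kubectl exec deploy/fortio -- fortio load -qps 1000 -t 60s -p 99 http://nginx/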

    Test 1: P99 Latency (East-West Traffic)

    This test measures the time for a request to travel from the fortio pod to the nginx pod.

    | Service Mesh    | P99 Latency (ms) | Overhead vs. No Mesh |
    |-----------------|------------------|----------------------|
    | No Service Mesh | 0.8              | -                    |
    | Istio (Sidecar) | 4.2              | +425%                |
    | Cilium (eBPF)   | 1.1              | +37.5%               |

    Analysis: The Istio sidecar model added over 3ms of latency to the 99th percentile, a direct consequence of the two user-space hops and TCP stack traversals. Cilium, handling the traffic primarily in-kernel, added only a fraction of a millisecond, which is attributable to the eBPF program execution time and the L7 proxy hop for this specific test case.

    Test 2: Data Plane Resource Consumption

    We measured the total CPU and Memory consumed by the data plane components across a 3-node cluster with 150 pods (50 per node).

    | Service Mesh    | Component (count)  | Total CPU (millicores) | Total Memory (MiB) |
    |-----------------|--------------------|------------------------|--------------------|
    | Istio (Sidecar) | istio-proxy (150)  | ~7500                  | ~7800              |
    | Cilium (eBPF)   | cilium-agent (3)   | ~600                   | ~900               |
    | Cilium (eBPF)   | cilium-envoy (3)   | ~450                   | ~450               |
    | Cilium total    | (DaemonSets)       | ~1050                  | ~1350              |

    Analysis: The results are stark. The sidecar model's resource cost scales linearly with the number of pods. Each pod carries the overhead of its own proxy. Cilium's DaemonSet model provides a near-constant resource footprint, regardless of pod density. The CPU and memory cost is per-node, not per-pod. This translates to significantly higher pod density and lower infrastructure costs at scale.

    Advanced Edge Cases and Production Considerations

    A transition to an eBPF-based mesh is not without its own set of challenges and considerations that senior engineers must evaluate.

    1. Kernel Version Dependency:

    eBPF is a fast-moving subsystem in the Linux kernel. Core functionalities required by Cilium are generally available in kernels 4.19 and newer, but more advanced features (like BPF Host Routing) and performance optimizations often require newer kernels (5.10+). This can be a significant constraint in environments with strict OS/kernel lifecycle management. Before adopting, you must validate your standard operating environment against the Cilium requirements matrix.
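
    Auditing the fleet is straightforward, since the kubelet reports each node's kernel in its status:

    bash
    # List every node's kernel version to check against Cilium's
    # system-requirements matrix
    kubectl get nodes -o custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion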

    2. Debugging In-Kernel Logic:

    When something goes wrong, you can't just kubectl exec into a sidecar and look at Envoy logs. Debugging eBPF requires a different toolset and mindset; a few representative commands follow the tool list below.

    * Hubble: Cilium's built-in observability platform is indispensable. The CLI hubble observe provides a real-time stream of network flows, showing policy verdicts (e.g., DROPPED, FORWARDED), L7 metadata, and service translations as seen by the eBPF programs.

    * cilium status: This command provides a high-level overview of the agent's health, including controller status and error counts.

    * bpftool: For deep diagnostics, bpftool is the standard Linux utility for interacting with the eBPF subsystem. You can use it to inspect loaded eBPF programs (bpftool prog list), view eBPF maps (bpftool map dump name <map_name>), and trace program execution.

    * Cilium Monitor: The cilium monitor command provides a firehose of low-level events from the eBPF programs, useful for diagnosing packet drops and policy issues.
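
    For example, a first-pass triage of suspected policy drops might combine these (the latter two run inside a cilium-agent container or on the node):

    bash
    # Stream only flows the datapath dropped, including the drop reason
    hubble observe --verdict DROPPED --follow

    # One-line health summary from the agent
    cilium status --brief

    # Enumerate the eBPF programs currently loaded in the kernel
    bpftool prog list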

    3. The CAP_SYS_ADMIN Privilege Requirement:

    The Cilium agent DaemonSet runs as a privileged container with CAP_SYS_ADMIN. This capability is required to load eBPF programs into the kernel and manage network devices. While this is a common requirement for CNI plugins, it's a significant security consideration. Mitigation strategies include:

    * Running agents in a dedicated, locked-down kube-system namespace.

    * Using Pod Security Admission/Standards to prevent application workloads from requesting such privileges (see the example after this list).

    * Relying on Cilium's identity-based security policies to strictly limit what the privileged agents can communicate with.
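
    The second point is just namespace labels under Pod Security Admission; for instance (my-app here is a placeholder for an application namespace):

    bash
    # Enforce the "restricted" Pod Security Standard on an app namespace,
    # so no workload there can request privileged mode or CAP_SYS_ADMIN
    kubectl label namespace my-app pod-security.kubernetes.io/enforce=restricted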

    4. Interoperability with Non-eBPF Systems:

    In a brownfield environment, your eBPF mesh will need to interact with legacy systems that rely on iptables or external hardware. Cilium provides several mechanisms for this, including BGP support for advertising pod CIDRs, egress gateway functionality for routing traffic from the mesh to external services through a controlled point, and compatibility modes that can coexist with kube-proxy if a full replacement is not feasible.
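
    Most of these are Helm toggles. A sketch, with value names as of Cilium 1.15 (verify against your chart version):

    bash
    # Enable the BGP control plane and egress gateway on an existing install
    helm upgrade cilium cilium/cilium --namespace kube-system \
      --reuse-values \
      --set bgpControlPlane.enabled=true \
      --set egressGateway.enabled=true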

    Conclusion: A Calculated Architectural Evolution

    The move from sidecar-based to eBPF-based service meshes is not merely an implementation swap; it's an architectural evolution. By shifting network policy, observability, and routing logic from a fleet of user-space proxies into the Linux kernel, we eliminate a fundamental performance bottleneck in the cloud-native stack. The benchmarks clearly demonstrate profound improvements in latency and resource efficiency, enabling higher workload density and reducing operational costs.

    This evolution comes with new trade-offs, primarily centered around kernel dependencies and a new debugging paradigm. However, for engineering organizations running high-performance, large-scale Kubernetes clusters, the benefits are compelling. The eBPF model simplifies the pod lifecycle, untangles the complexities of iptables, and delivers near-native network performance, making it the definitive next step in the maturation of the service mesh.
