Advanced eBPF Routing for Sidecar-less Kubernetes Service Mesh

Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Inescapable Overhead of the Sidecar Pattern

As senior engineers building distributed systems on Kubernetes, we've largely accepted the sidecar proxy as a necessary cost for the benefits of a service mesh. Observability, mTLS, and advanced traffic routing provided by tools like Istio or Linkerd are indispensable. However, in high-performance or resource-constrained environments, the cost of injecting an Envoy or Linkerd2-proxy into every application pod becomes a significant bottleneck. This isn't a theoretical concern; it's a production reality.

The fundamental problem is the traversal of the network stack. A request from Service A to Service B doesn't travel directly from Pod A to Pod B; it follows this path:

  • Application A sends a request to service-b.namespace.svc.cluster.local.
  • The request is intercepted by iptables rules in Pod A's network namespace and redirected to the local Envoy sidecar (e.g., localhost:15001); a simplified sketch of these rules follows this list.
  • Envoy A processes the request (applies mTLS, routing rules, gathers metrics) and sends it out to the node's network.
  • The request reaches the node hosting Pod B.
  • The request is intercepted by the Envoy sidecar in Pod B.
  • Envoy B processes the request (terminates mTLS, checks policies) and forwards it to the application container on localhost.
  • Application B finally receives the request.
For a single service-to-service hop, we've added two full user-space proxy traversals and multiple extra trips through the TCP/IP stack inside the pods' network namespaces. In a chain of five sequential service calls, that amounts to ten additional proxy traversals. The cumulative impact on P99 latency and the aggregate CPU/memory consumption across a large cluster can be staggering: we're talking about reserving hundreds of millicores and tens to hundreds of megabytes of RAM per pod just for the mesh infrastructure.
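
As a rough illustration, this is the flavor of iptables NAT rule a sidecar injector programs inside each pod's network namespace. The chain name here is hypothetical, and real meshes add many exclusions (the proxy's own UID, health-check ports, and so on):

    bash
    # Simplified sketch of per-pod sidecar interception (not any real mesh's exact rules)
    iptables -t nat -N PROXY_REDIRECT
    # Redirect any TCP connection hitting this chain to the proxy's outbound
    # listener on localhost:15001
    iptables -t nat -A PROXY_REDIRECT -p tcp -j REDIRECT --to-ports 15001
    # Send all outbound TCP traffic from the pod through that chain
    iptables -t nat -A OUTPUT -p tcp -j PROXY_REDIRECT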

    This is where eBPF (extended Berkeley Packet Filter) presents a paradigm shift. By moving networking logic from a user-space sidecar directly into the Linux kernel, we can achieve the same service mesh goals with a fraction of the performance overhead.


    eBPF: Kernel-Level Programmability for Networking

    We will dispense with the "What is eBPF?" primer. We assume you understand it allows sandboxed programs to run in the kernel. The crucial part for our discussion is which kernel hooks enable a sidecar-less mesh and how they work.

    The two primary hooks leveraged by platforms like Cilium are:

  • Traffic Control (TC) Hooks (cls_bpf): These hooks attach eBPF programs to network interfaces (both physical and virtual, like veth pairs). When a packet enters or leaves an interface, the eBPF program can inspect, modify, redirect, or drop it before it proceeds further up the network stack. This is ideal for enforcing network policies and performing load balancing at L3/L4; a minimal example follows this list.
  • Socket-level Hooks (e.g., cgroup/connect4, sock_ops): These hooks are even more powerful. They attach to cgroups and can intercept socket operations like connect(), sendmsg(), and recvmsg(). This allows an eBPF program to redirect a connection from one destination IP/port to another before a single packet is sent. It also enables transparently intercepting application data for tasks like mTLS encryption/decryption.
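
    For illustration, here is a minimal, self-contained sketch of a TC-attached program that drops packets whose source IP appears in a hypothetical deny-list map. It is not Cilium's datapath code, just the general shape of a cls_bpf program:

    c
    // Minimal sketch of a TC (cls_bpf) program: drop packets whose source IP
    // appears in a (hypothetical) deny-list map.
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 1024);
        __type(key, __u32);   // IPv4 source address (network byte order)
        __type(value, __u8);  // 1 = deny
    } denied_src_ips SEC(".maps");

    SEC("tc")
    int tc_ingress_policy(struct __sk_buff *skb) {
        void *data     = (void *)(long)skb->data;
        void *data_end = (void *)(long)skb->data_end;

        struct ethhdr *eth = data;
        if ((void *)(eth + 1) > data_end)
            return TC_ACT_OK;                 // truncated frame: let it pass

        if (eth->h_proto != bpf_htons(ETH_P_IP))
            return TC_ACT_OK;                 // only IPv4 handled in this sketch

        struct iphdr *ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end)
            return TC_ACT_OK;

        __u8 *deny = bpf_map_lookup_elem(&denied_src_ips, &ip->saddr);
        if (deny && *deny)
            return TC_ACT_SHOT;               // policy verdict: drop the packet

        return TC_ACT_OK;                     // otherwise continue up the stack
    }

    char _license[] SEC("license") = "GPL";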

    The eBPF Redirection Mechanism

    Let's visualize the new request flow from Service A to Service B in an eBPF-powered mesh:

  • Application A calls connect() on a socket for Service B's ClusterIP.
  • The cgroup/connect4 eBPF hook triggers.
  • The eBPF program consults an eBPF map (a highly efficient kernel-space key/value store) that contains the mapping from ClusterIPs to real backend pod IPs.
  • The eBPF program directly rewrites the destination IP and port in the socket's metadata to point to the chosen Pod B IP.
  • The kernel proceeds with the TCP handshake directly to Pod B, completely bypassing any user-space proxies and iptables rules.
    This is not packet forwarding; it's connection-time destination rewriting. The application is entirely unaware it has happened. The performance gain is immense because we've eliminated the two user-space proxy hops and the associated context switches and memory copies.

    Here's a conceptual C-like representation of what such an eBPF program might do:

    c
    // Simplified pseudo-code for an eBPF sock_addr hook.
    // (Real code must copy the lookup key to the stack first, handle network
    // byte order, and define the maps and helper structs it references.)
    SEC("cgroup/connect4")
    int bpf_socket_redirect(struct bpf_sock_addr *ctx) {
        // Only act on TCP connections
        if (ctx->protocol != IPPROTO_TCP) {
            return 1; // 1 = let the connect() proceed unmodified
        }

        // Check if the destination IP is a ClusterIP we manage.
        // bpf_map_lookup_elem is a helper to read from an eBPF map.
        struct service_info *svc = bpf_map_lookup_elem(&service_map, &ctx->user_ip4);

        if (svc) {
            // This is a service we need to load balance.
            // The 'select_backend_pod' function would contain the LB logic
            // (e.g., round-robin, consistent hashing) reading from another map.
            struct pod_info *backend = select_backend_pod(svc);

            if (backend) {
                // Rewrite the destination IP and port in the connection context
                ctx->user_ip4 = backend->ip;
                ctx->user_port = backend->port;
            }
        }

        // cgroup/connect4 programs return 1 to allow the (possibly rewritten)
        // connection and 0 to reject it
        return 1;
    }

    This kernel-level agility is the foundation of the sidecar-less service mesh.
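
    Cilium compiles, loads, and attaches its programs automatically, but to get a feel for the mechanics, a standalone program like the conceptual one above (once fleshed out) could be loaded and attached roughly like this; the object and pin paths are illustrative:

    bash
    # Compile a standalone eBPF object with clang's BPF target
    clang -O2 -g -target bpf -c redirect.c -o redirect.o

    # Load the program and pin it in the BPF filesystem
    bpftool prog load redirect.o /sys/fs/bpf/connect_redirect

    # Attach the pinned program to the root cgroup's connect4 hook
    bpftool cgroup attach /sys/fs/cgroup connect4 pinned /sys/fs/bpf/connect_redirect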


    Production Implementation: L7 Traffic Splitting with Cilium

    Cilium is a production-grade CNI that leverages eBPF for networking, observability, and security. Its sidecar-less service mesh capability is built on the principles described above. Let's implement an advanced canary deployment scenario.

    Scenario: We have a product-api service. Requests that carry the HTTP header X-Canary-User: true should be routed to the new v2, while all other traffic continues to flow to the stable v1.

    First, deploy the two versions of our application:

    yaml
    # product-api-v1-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: product-api-v1
      labels:
        app: product-api
        version: v1
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: product-api
          version: v1
      template:
        metadata:
          labels:
            app: product-api
            version: v1
        spec:
          containers:
          - name: api
            image: my-repo/product-api:v1
            ports:
            - containerPort: 8080
    ---
    # product-api-v2-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: product-api-v2
      labels:
        app: product-api
        version: v2
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: product-api
          version: v2
      template:
        metadata:
          labels:
            app: product-api
            version: v2
        spec:
          containers:
          - name: api
            image: my-repo/product-api:v2
            ports:
            - containerPort: 8080

    Next, define the Kubernetes Service that acts as the stable endpoint:

    yaml
    # product-api-svc.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: product-api
    spec:
      type: ClusterIP
      ports:
      - port: 80
        targetPort: 8080
      selector:
        app: product-api # Selects both v1 and v2 initially

    Now for the core logic. We use the Gateway API's HTTPRoute resource, which Cilium understands and programs into its datapath. This is far more expressive than the legacy Ingress resource.

    yaml
    # product-api-httproute.yaml
    apiVersion: gateway.networking.k8s.io/v1beta1
    kind: HTTPRoute
    metadata:
      name: product-api-routing
    spec:
      parentRefs:
      - name: product-api # Attaches this route to the Service
        kind: Service
        group: ""
        port: 80
      rules:
      - matches:
        - headers:
          - type: Exact
            name: X-Canary-User
            value: "true"
        backendRefs:
        - name: product-api-v2 # Cilium needs Services for backends
          kind: Service
          port: 80
          weight: 100
      - backendRefs:
        - name: product-api-v1
          kind: Service
          port: 80
          weight: 100

    Note: To make this work, you also need to create Service objects for product-api-v1 and product-api-v2 that select only their respective pods. This HTTPRoute attaches to the main product-api service and defines the routing logic.
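
    For reference, a minimal sketch of those per-version Services might look like this (names chosen to match the backendRefs above; adjust to your own conventions):

    yaml
    # per-version services referenced by the HTTPRoute
    apiVersion: v1
    kind: Service
    metadata:
      name: product-api-v1
    spec:
      ports:
      - port: 80
        targetPort: 8080
      selector:
        app: product-api
        version: v1
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: product-api-v2
    spec:
      ports:
      - port: 80
        targetPort: 8080
      selector:
        app: product-api
        version: v2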

    When this HTTPRoute is applied, Cilium does the following:

  • It recognizes the L7 routing rules.
  • It determines that socket-level eBPF redirection alone is insufficient, because the rules require inspecting HTTP headers.
  • It enables a minimal, shared Envoy proxy on the node (not a per-pod sidecar) and uses eBPF to steer only the relevant traffic (product-api on port 80) to it.
  • The eBPF program on the client pod's network interface sees an outgoing packet to the product-api ClusterIP and redirects it to the local Envoy listener.
  • Envoy inspects the header, makes the routing decision (v1 or v2), and sends the request to the final destination pod IP. The return path is likewise handled by eBPF.

    This hybrid approach is key: eBPF does the heavy lifting of L3/L4 redirection and policy, while a shared, optimized proxy handles only the complex L7 logic, avoiding the per-pod resource cost of a traditional sidecar.

    You can verify this behavior using Cilium's observability tool, Hubble:

    bash
    # Install Hubble CLI
    
    # Open the Hubble UI (port-forwarded via the Cilium CLI)
    cilium hubble ui
    
    # From another terminal, send traffic
    # Normal user
    kubectl exec -it client-pod -- curl http://product-api
    
    # Canary user
    kubectl exec -it client-pod -- curl -H "X-Canary-User: true" http://product-api

    The Hubble UI will visually trace the requests, showing that normal traffic flows to v1 pods and canary traffic flows to v2 pods, with the policy decision being applied at the source.


    Advanced Use Case: Truly Transparent mTLS

    The most impressive feature of an eBPF-based mesh is transparent mTLS without a user-space proxy. Sidecars terminate TLS in user space, meaning traffic between the application and its sidecar is unencrypted. eBPF handles this at a lower level.

    Mechanism:

  • Identity: Cilium uses SPIFFE identities, embedding a cryptographic identity into each pod via a CiliumIdentity CRD.
  • Syscall Interception: eBPF programs are attached to the sendmsg() and recvmsg() socket calls.
  • Transparent Encryption: When an application in Pod A writes plaintext data to a socket destined for Pod B, the sendmsg eBPF program intercepts this data. It uses the Linux Kernel TLS (KTLS) module to encrypt the data using session keys derived from the SPIFFE identities. The now-encrypted data is then handed to the TCP/IP stack.
  • Transparent Decryption: On the receiving node, the process is reversed. The recvmsg eBPF program receives the encrypted data from the socket buffer, uses KTLS to decrypt it, and presents the plaintext data to the application in Pod B.
    The application has no idea TLS is involved; it simply reads and writes to a standard TCP socket. This eliminates the unencrypted traffic leg within the pod and avoids the overhead of user-space TLS termination. A minimal sketch of the kTLS key handoff follows.
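
    KTLS itself is a standard kernel facility rather than something Cilium invented. To give a feel for what "handing encryption to the kernel" means, this is roughly how a program installs TLS session keys on an established TCP socket; the function name and key-material parameters are placeholders, and this is not Cilium's code:

    c
    // Sketch: enabling kernel TLS (kTLS) on an established, connected TCP socket.
    // Key material (key, iv, salt, rec_seq) would come from a completed handshake.
    #include <linux/tls.h>
    #include <netinet/tcp.h>
    #include <string.h>
    #include <sys/socket.h>

    #ifndef SOL_TLS
    #define SOL_TLS 282
    #endif

    int enable_ktls_tx(int sock, const unsigned char *key, const unsigned char *iv,
                       const unsigned char *salt, const unsigned char *rec_seq) {
        // Switch the socket's upper-layer protocol to TLS
        if (setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
            return -1;

        // Describe the negotiated cipher and session keys to the kernel
        struct tls12_crypto_info_aes_gcm_128 ci;
        memset(&ci, 0, sizeof(ci));
        ci.info.version = TLS_1_2_VERSION;
        ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;
        memcpy(ci.key, key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
        memcpy(ci.iv, iv, TLS_CIPHER_AES_GCM_128_IV_SIZE);
        memcpy(ci.salt, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
        memcpy(ci.rec_seq, rec_seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);

        // From here on, plaintext written to the socket is encrypted in the kernel
        return setsockopt(sock, SOL_TLS, TLS_TX, &ci, sizeof(ci));
    }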

    Enabling this is remarkably simple:

    In the Cilium ConfigMap, you enable mTLS:

    yaml
    # In cilium-config ConfigMap
    ... 
      enable-mutual-authentication: "true"
      tls-secrets-namespace: "cilium-secrets"

    And define a CiliumNetworkPolicy to enforce it:

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "api-mtls-enforcement"
    spec:
      endpointSelector:
        matchLabels:
          app: product-api
      ingress:
      - fromEndpoints:
        - matchLabels:
            app: client-app
        authentication:
          mode: "required" # Enforce mTLS for this rule
        toPorts:
        - ports:
          - port: "8080"
            protocol: TCP

    To verify, use cilium monitor:

    bash
    # On a node, run:
    cilium monitor --type l7
    
    # Look for output showing TLS handshake and encrypted traffic
    -> Pod client-app/client-pod-xyz -> product-api/product-api-v1-abc identity 12345 -> 54321 
       TCP 8080 TLS SNI: product-api.default.svc.cluster.local

    Performance Analysis: Sidecar vs. Sidecar-less

    This is where the eBPF approach truly shines. Let's analyze the performance delta across key metrics.

    | Metric | Sidecar-based (Istio) | eBPF-based (Cilium) | Impact |
    | --- | --- | --- | --- |
    | P99 request latency | Adds ~5-15 ms per hop (two proxies) | Adds <1 ms per hop | In a 5-service chain, this can be the difference between ~75 ms of added overhead and ~5 ms. |
    | CPU usage (per pod) | ~50-200m per Envoy sidecar | 0 (logic runs in the node-level Cilium agent) | Across 1,000 pods, this saves 50-200 full CPU cores dedicated solely to the service mesh. |
    | Memory usage (per pod) | ~50-150 MB per Envoy sidecar | 0 | Saves 50-150 GB of RAM in a 1,000-pod cluster, allowing significantly higher pod density per node. |
    | Network path | App -> localhost -> Sidecar -> Node -> Sidecar -> App | App -> Node -> App | Drastically simpler, faster, and easier to debug. Fewer points of failure. |

    These are not just theoretical numbers. Benchmarks consistently show that by avoiding the user-space data path, eBPF-based meshes cut the latency the mesh itself adds by roughly an order of magnitude and dramatically reduce resource consumption.


    Edge Cases and Operational Gotchas

    A senior engineer knows there's no such thing as a free lunch. The power of eBPF comes with its own set of complexities.

  • Kernel Version Dependency: This is the most significant constraint. The features used by Cilium (like KTLS integration and advanced socket hooks) require modern Linux kernels (typically 5.2+ for full functionality). Running on older enterprise Linux distributions can be a non-starter or will force Cilium to fall back to less efficient mechanisms. Always check the Cilium documentation for required kernel versions before planning a migration.
  • Debugging eBPF: When something goes wrong, you can't just kubectl logs a sidecar. Debugging becomes a kernel-level activity, and you need to become proficient with tools like the following (a short bpftool example appears after this list):

    * bpftool: The Swiss Army knife for inspecting loaded eBPF programs and maps. You can dump the JIT-compiled assembly, view map contents, and see which programs are attached to which hooks.

    * cilium monitor: An essential tool for viewing real-time packet-level events, policy verdicts, and L7 traffic as processed by eBPF.

    * Hubble: A higher-level observability platform built on the same datapath events, used to construct service maps and visualize traffic flows.

  • Complex L7 Protocol Handling: eBPF is exceptionally good at parsing common protocols like HTTP/1.x, gRPC, and Kafka because dedicated parsers can be implemented. For obscure or custom TCP protocols, eBPF cannot perform L7 inspection. In these cases, Cilium's architecture gracefully falls back to instantiating an Envoy proxy to handle that specific traffic, giving you the best of both worlds without forcing a sidecar on every pod.
  • eBPF Map Limits: eBPF maps store state like service-to-endpoint mappings, connection tracking tables, and policy identities. These maps consume non-swappable kernel memory. In clusters with tens of thousands of services or millions of connections, you must monitor the size of these maps and configure their limits appropriately to avoid kernel memory exhaustion.
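
    As the starting point promised above, a few bpftool invocations you will reach for constantly when inspecting a node's eBPF state (the IDs are examples):

    bash
    # List all loaded eBPF programs with their types and attach information
    bpftool prog show

    # List all maps, including key/value sizes and max entries
    bpftool map show

    # Dump the contents of a specific map (ID taken from the listing above)
    bpftool map dump id 42

    # Disassemble a program's JIT-compiled output
    bpftool prog dump jited id 7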

    Conclusion: A New Architectural Default

    The sidecar-less service mesh powered by eBPF is not a niche optimization; it is the logical evolution of service mesh architecture for modern, performance-sensitive infrastructure. By moving policy enforcement, load balancing, and observability into the kernel, we eliminate the primary sources of overhead that have plagued sidecar-based implementations.

    While the operational model requires a deeper understanding of Linux kernel primitives and a new set of debugging tools, the benefits are undeniable: radically lower latency, significantly reduced resource consumption, and a simpler data path. For engineering teams pushing the boundaries of scale and performance on Kubernetes, the transition from sidecar to eBPF is no longer a question of if, but when.
