Kernel-Level Service Mesh Observability with eBPF and Cilium

Goh Ling Yong

The Sidecar Tax: Acknowledging the Performance Bottleneck

For any senior engineer who has operated a service mesh like Istio at scale, the term "sidecar tax" is painfully familiar. The architectural pattern of injecting a user-space proxy (typically Envoy) into every application pod provides powerful features but at a significant cost:

  • Latency Overhead: Every packet to or from an application pod traverses the kernel's TCP/IP stack twice: once for the application's connection to the local proxy and once for the proxy's onward connection. The extra hops, user-space processing, and context switches add measurable latency that shows up directly in p99 request times.
  • Resource Consumption: Each sidecar proxy consumes non-trivial amounts of CPU and memory, which, when multiplied across thousands of pods in a large cluster, represents a substantial resource and financial cost.
  • Operational Complexity: Managing sidecar injection, upgrades, and configuration drift adds another layer of complexity to CI/CD pipelines and cluster operations. Traffic interception via iptables can be brittle and difficult to debug.
While this model has been a necessary trade-off for gaining observability, security, and traffic management, a more efficient paradigm has emerged directly from the Linux kernel: eBPF.

    This article is not an introduction to eBPF. It assumes you understand its basic principles. Instead, we will perform a deep dive into a production-grade implementation of a sidecarless service mesh observability layer using Cilium, focusing on the advanced techniques and edge cases you will encounter in a real-world, multi-cluster environment.

    Architectural Shift: From User-Space Proxy to In-Kernel Data Path

    The fundamental difference lies in where policy enforcement and data collection occur.

    Traditional Sidecar Model (e.g., Istio):

    mermaid
    graph TD
        subgraph Pod A
            AppA[Application Container]
            ProxyA[Envoy Sidecar]
        end
    
        subgraph Pod B
            AppB[Application Container]
            ProxyB[Envoy Sidecar]
        end
    
        AppA -- localhost TCP --> ProxyA
        ProxyA -- Kernel TCP/IP Stack --> NodeNetwork
        NodeNetwork -- Kernel TCP/IP Stack --> ProxyB
        ProxyB -- localhost TCP --> AppB
    
        style AppA fill:#f9f,stroke:#333,stroke-width:2px
        style AppB fill:#f9f,stroke:#333,stroke-width:2px
        style ProxyA fill:#bbf,stroke:#333,stroke-width:2px
        style ProxyB fill:#bbf,stroke:#333,stroke-width:2px

    Traffic from AppA is redirected by iptables to ProxyA on the localhost interface. ProxyA then opens a new connection to ProxyB, traversing the full network stack. This round trip introduces at least two additional passes through the TCP/IP stack.
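
If you want to see this redirection for yourself, you can dump the NAT rules that istio-init programs inside a sidecar-injected pod's network namespace. This is illustrative only: the container ID is a placeholder, and the chain names and ports assume a default containerd/Istio setup.

bash
# On the node hosting the pod: resolve the app container's PID, enter its network
# namespace, and list the NAT rules the sidecar injection added
PID=$(crictl inspect --output go-template --template '{{.info.pid}}' <app-container-id>)
nsenter -t "$PID" -n iptables -t nat -S | grep ISTIO
# Expect chains such as ISTIO_INBOUND, ISTIO_OUTPUT, and ISTIO_REDIRECT, which
# redirect traffic to the Envoy listeners (15006 inbound, 15001 outbound)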

    eBPF-based Model (e.g., Cilium):

    mermaid
    graph TD
        subgraph Node Kernel
            eBPF[eBPF Programs at TC/Socket Hooks]
        end
    
        subgraph Pod A
            AppA[Application Container]
        end
    
        subgraph Pod B
            AppB[Application Container]
        end
    
        AppA -- Socket Call --> eBPF
        eBPF -- Direct Path --> AppB
    
        style AppA fill:#f9f,stroke:#333,stroke-width:2px
        style AppB fill:#f9f,stroke:#333,stroke-width:2px
        style eBPF fill:#9f9,stroke:#333,stroke-width:2px

    eBPF programs attached to kernel hooks (like the Traffic Control (TC) ingress/egress hooks or socket-level hooks) can inspect, modify, and even redirect packets before they traverse much of the kernel's networking stack. The eBPF program, understanding pod identity via Cilium's control plane, can enforce policy and record observability data with minimal overhead, then forward the packet directly to its destination.
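
You can verify these attachments directly on a node. The interface name below is a placeholder (Cilium names the host side of each pod's veth pair lxc<hash>), and bpftool being installed on the node is an assumption:

bash
# Find the pod-facing interfaces, then list the eBPF programs at the TC ingress hook
ip link | grep lxc
tc filter show dev lxc1234abcd ingress

# Or list all network-attached eBPF programs on the node
bpftool net show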

    Production Implementation: Cilium and Hubble

    We'll deploy a sample microservices application and configure Cilium to provide deep L3/L4 and L7 observability without any sidecars.

    Prerequisites: A multi-cluster Kubernetes environment with kubectl context configured for both clusters (cluster-1 and cluster-2).

    Step 1: Install Cilium with Cluster Mesh and Hubble

    First, we install the Cilium CLI and then install Cilium on both clusters, enabling the necessary components.

    bash
    # On cluster-1
    cilium install --version 1.15.1 \
      --set cluster.name=cluster-1 \
      --set cluster.id=1 \
      --set ipam.operator.clusterPoolIPv4PodCIDRList=10.0.0.0/16 \
      --set hubble.enabled=true \
      --set hubble.relay.enabled=true \
      --set hubble.ui.enabled=true
    
    # On cluster-2
    cilium install --version 1.15.1 \
      --set cluster.name=cluster-2 \
      --set cluster.id=2 \
      --set ipam.operator.clusterPoolIPv4PodCIDRList=10.1.0.0/16 \
      --set hubble.enabled=true \
      --set hubble.relay.enabled=true \
      --set hubble.ui.enabled=true

    After installation, enable Cluster Mesh on both clusters (this deploys the clustermesh-apiserver) and then connect them:

    bash
    # Enable Cluster Mesh on both clusters
    cilium clustermesh enable --context cluster-1
    cilium clustermesh enable --context cluster-2
    
    # Connect the clusters (the connection is bidirectional, so run it once)
    cilium clustermesh connect --context cluster-1 --destination-context cluster-2
    
    # Verify the connection
    cilium clustermesh status --context cluster-1 --wait
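
It is also worth confirming the overall health of each installation before deploying workloads; the cilium CLI accepts a kubeconfig context flag, so both checks can run from one machine:

bash
# Wait until all Cilium components report readiness on both clusters
cilium status --wait --context cluster-1
cilium status --wait --context cluster-2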

    Step 2: Deploy a Sample Application

    We'll use a simple frontend service that calls a backend service. We'll deploy the frontend to cluster-1 and the backend to cluster-2 to demonstrate cross-cluster observability.

    demo-app.yaml:

    yaml
    # --- Apply to cluster-1 ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: demo
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: frontend
      namespace: demo
      labels:
        app: frontend
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: frontend
      template:
        metadata:
          labels:
            app: frontend
        spec:
          containers:
          - name: frontend
            image: curlimages/curl
            command: ["sleep", "3600"]
    ---
    # Cluster Mesh global services must be defined in every cluster that consumes them,
    # so the backend Service (with no local endpoints) is also created in cluster-1.
    apiVersion: v1
    kind: Service
    metadata:
      name: backend
      namespace: demo
      annotations:
        service.cilium.io/global: "true"
    spec:
      selector:
        app: backend
      ports:
      - protocol: TCP
        port: 80
        targetPort: 8080
    ---
    # --- Apply to cluster-2 ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: demo
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: backend
      namespace: demo
      annotations:
        # This annotation makes the service discoverable in other clusters
        service.cilium.io/global: "true"
    spec:
      selector:
        app: backend
      ports:
      - protocol: TCP
        port: 80
        targetPort: 8080
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: backend
      namespace: demo
      labels:
        app: backend
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: backend
      template:
        metadata:
          labels:
            app: backend
        spec:
          containers:
          - name: backend
            image: gcr.io/google-samples/hello-app:1.0
            ports:
            - containerPort: 8080
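
The manifest mixes resources for both clusters, so apply each half to the right context. The file names below assume you split demo-app.yaml along the comment markers:

bash
# Apply each section to its cluster
kubectl apply -f demo-cluster-1.yaml --context cluster-1
kubectl apply -f demo-cluster-2.yaml --context cluster-2

# Wait for both workloads to become ready
kubectl wait --for=condition=Ready pod -l app=frontend -n demo --context cluster-1 --timeout=120s
kubectl wait --for=condition=Ready pod -l app=backend -n demo --context cluster-2 --timeout=120s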

    Step 3: L3/L4 Observability with Hubble

    Let's generate some traffic from the frontend in cluster-1 to the backend in cluster-2.

    bash
    # Get the frontend pod name in cluster-1
    FRONTEND_POD=$(kubectl get pods -n demo -l app=frontend -o jsonpath='{.items[0].metadata.name}' --context cluster-1)
    
    # Exec into the pod and curl the backend service. The backend Service is declared
    # global in both clusters, so Cilium Cluster Mesh load-balances this request to the
    # backend pods running in cluster-2.
    kubectl exec -it $FRONTEND_POD -n demo --context cluster-1 -- curl -v http://backend.demo.svc.cluster.local/

    Now, let's observe this flow using Hubble. We can do this from either cluster.

    bash
    # From cluster-1, forward the hubble-relay port
    kubectl port-forward -n kube-system svc/hubble-relay 4245:80 &
    
    # Observe the flow
    hubble observe --server localhost:4245 --namespace demo -f
    
    # Expected Output (simplified):
    # TIMESTAMP     SOURCE                  DESTINATION                  TYPE          VERDICT     SUMMARY
    # May 24 10:30  demo/frontend-xxxx      demo/backend-yyyy (cluster-2)  L3/L4         FORWARDED   TCP 10.0.1.23:54321 -> 10.1.2.45:80
    # May 24 10:30  demo/frontend-xxxx      kube-system/kube-dns         L3/L4         FORWARDED   UDP 10.0.1.23:12345 -> 10.0.0.10:53

    This basic L3/L4 visibility is captured by eBPF programs at the TC hooks on the virtual ethernet (veth) devices of each pod. It's incredibly efficient as it's just reading packet headers in the kernel.
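
For programmatic consumption (dashboards, log pipelines, ad-hoc jq queries), the same flows are available in structured form; the filter flags below are just examples:

bash
# Fetch the most recent flows in the demo namespace as JSON
hubble observe --server localhost:4245 --namespace demo --protocol TCP --last 20 -o json | jq .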

    The Advanced Challenge: L7 Observability on Encrypted Traffic

    This is where the true power and complexity of eBPF-based observability lie. How can Cilium provide L7 visibility (e.g., HTTP paths, gRPC methods) into a TLS-encrypted stream without deploying a proxy to terminate TLS?

    The answer is to move the observation point. Instead of observing on the wire (where data is encrypted), Cilium uses eBPF to observe data inside the application's memory right before it's passed to the TLS library for encryption, and right after it's returned from the TLS library after decryption.

    This is achieved using Kernel Probes (kprobes) and User-space Probes (uprobes).

    * kprobes: Attach eBPF programs to functions within the kernel itself.

    * uprobes: Attach eBPF programs to functions within a user-space library or executable loaded into memory.

    Cilium attaches uprobes to well-known TLS libraries like OpenSSL, GnuTLS, and NSS. Specifically, it targets functions like SSL_read and SSL_write. When the application calls SSL_write to send data, the eBPF uprobe triggers, reads the plaintext buffer from the function's arguments directly from memory, parses it for L7 information, and sends this metadata to the Cilium agent via a perf ring buffer. The data is then encrypted and sent on its way. The reverse happens for SSL_read.
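
To build intuition for the mechanism, you can reproduce a miniature version of it with bpftrace. This is an illustrative sketch, not Cilium's actual probe code, and the libssl path varies by distribution:

bash
# Attach a uprobe to OpenSSL's SSL_write and print the plaintext length per call.
# SSL_write(ssl, buf, num): arg2 is the number of plaintext bytes about to be encrypted.
bpftrace -e 'uprobe:/usr/lib/x86_64-linux-gnu/libssl.so.3:SSL_write {
  printf("pid %d (%s) writing %d plaintext bytes\n", pid, comm, arg2);
}'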

    Step 4: Implementing L7 HTTP Observability

    To enable L7 parsing, we need a CiliumNetworkPolicy that specifies the protocol.

    backend-l7-policy.yaml (Apply to cluster-2):

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "backend-l7-visibility"
      namespace: demo
    spec:
      endpointSelector:
        matchLabels:
          app: backend
      ingress:
      - fromEndpoints:
        - matchLabels:
            # Allow traffic from any 'frontend' pod in any cluster
            "k8s:app": frontend
        toPorts:
        - ports:
          - port: "8080"
            protocol: TCP
          rules:
            http: [{}]

    This policy does two things: it restricts ingress to the backend pod to only allow traffic from frontend pods, and the rules: http: [{}] block tells Cilium's eBPF programs to activate the L7 HTTP parser for traffic on this port.
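
Apply the policy to cluster-2 and confirm it has been accepted:

bash
kubectl apply -f backend-l7-policy.yaml --context cluster-2
kubectl get ciliumnetworkpolicies -n demo --context cluster-2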

    Now, regenerate the traffic:

    bash
    kubectl exec -it $FRONTEND_POD -n demo --context cluster-1 -- bash -c "curl -v http://backend.demo.svc.cluster.local/api/v1/data"

    Observe again with Hubble, but this time, specifically ask for L7 events:

    bash
    hubble observe --server localhost:4245 --namespace demo -f --type l7
    
    # Expected Output (simplified):
    # TIMESTAMP     SOURCE                  DESTINATION                  TYPE          VERDICT     SUMMARY
    # May 24 10:45  demo/frontend-xxxx      demo/backend-yyyy (cluster-2)  L7-request    FORWARDED   HTTP/1.1 GET /api/v1/data
    # May 24 10:45  demo/backend-yyyy (cluster-2) demo/frontend-xxxx      L7-response   FORWARDED   HTTP/1.1 200

    We now have full L7 visibility into the HTTP request path and response code, across clusters, without a single sidecar proxy. This data can be exported from Hubble to Prometheus, Grafana, or other observability platforms to build rich dashboards and alerts.
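
Exporting this data is largely a configuration exercise. As a sketch, Hubble's metrics can be enabled through the hubble.metrics.enabled Helm value; the metric list below is an example, and if your cilium CLI version lacks the upgrade command, the same value can be set with helm upgrade on the underlying release:

bash
# Enable Hubble metrics (DNS, drop, TCP, flow, HTTP) for Prometheus to scrape
cilium upgrade --version 1.15.1 --context cluster-1 \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,http}"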

    Edge Cases and Performance Deep Dive

    This approach is powerful but not without its complexities and edge cases that senior engineers must consider.

    Edge Case 1: Statically Linked and Non-Standard TLS Libraries

    The uprobe approach relies on attaching to dynamically linked, well-known shared libraries (.so files). What happens when your application is a Go binary whose TLS implementation (Go's own crypto/tls) is compiled statically into the executable rather than loaded as a shared library?

    * Problem: The standard uprobes for libssl.so won't find their target functions. Cilium's L7 TLS visibility for that specific pod will be blind. You will still get L3/L4 flow data, but no L7 details.

    * Solution/Mitigation:

    1. Awareness: The first step is to be aware of which services in your stack use non-standard or statically linked TLS. This requires deep application knowledge; a quick check is sketched after this list.

    2. Service-Specific Probes: For critical applications, it's theoretically possible to write custom eBPF uprobes targeting the specific function names and memory layouts of the statically linked library, but this is a highly advanced and brittle endeavor.

    3. Kernel TLS (kTLS): A more robust, forward-looking option is kTLS. Applications can offload TLS record encryption and decryption to the kernel after the handshake, which then allows eBPF programs to transparently access the plaintext. Support for this is still maturing in both applications and in Cilium.

    4. Fallback to Sidecars: For services where L7 visibility is non-negotiable and eBPF introspection isn't possible, you can selectively enable a traditional sidecar proxy just for those workloads, while the rest of the mesh remains sidecarless.
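
A quick, low-tech way to audit a workload (per mitigation 1) is to check whether its binary dynamically links a known TLS library at all. Pod and binary names are placeholders, and minimal images may not ship ldd:

bash
# If this prints no libssl/gnutls/nss entries (or "not a dynamic executable"),
# uprobe-based TLS visibility will not apply to this workload
kubectl exec -it <backend-pod> -n demo --context cluster-2 -- \
  sh -c 'ldd "$(command -v <binary-name>)" | grep -iE "ssl|gnutls|nss"'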

    Edge Case 2: Kernel Version Dependency

    eBPF is not a single feature; it's a rapidly evolving subsystem within the Linux kernel. Advanced features, like specific eBPF program types or helper functions required for efficient parsing, are only available in newer kernels.

    * Problem: Your staging environment runs on kernel 5.15 and L7 gRPC parsing works perfectly. Production is on an older LTS kernel, 4.19, and the parser fails to load or provides incomplete data.

    * Solution/Mitigation:

    1. Strict Homogeneity: Enforce a strict, homogeneous kernel version across all nodes in your production environment. This is a critical operational discipline for running eBPF at scale; a quick audit command is sketched after this list.

    2. Consult Documentation: Always consult the Cilium documentation for the minimum kernel versions required for the specific features you intend to use.

    3. Graceful Degradation: Understand that Cilium will attempt to gracefully degrade. If a feature is unavailable, it typically falls back to a less efficient implementation or simply disables that feature, logging a warning. Monitoring Cilium agent logs is crucial.
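
Auditing kernel homogeneity is a one-liner per cluster:

bash
# Survey kernel versions across all nodes; any drift is a red flag for eBPF feature parity
kubectl get nodes --context cluster-1 -o custom-columns=NODE:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion
kubectl get nodes --context cluster-2 -o custom-columns=NODE:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion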

    Performance Analysis: Quantifying the Gains

    Let's analyze a realistic performance comparison between a sidecar-based and eBPF-based mesh for a simple request/response workload.

    | Metric | Istio 1.21 (Envoy Sidecar) | Cilium 1.15 (eBPF) | % Improvement | Notes |
    | --- | --- | --- | --- | --- |
    | p99 latency added per hop | ~2.5 ms | ~0.3 ms | ~88% | Measured on top of baseline application latency. |
    | CPU usage per 1,000 RPS | ~0.5 vCPU | ~0.08 vCPU | ~84% | CPU consumed by the data plane components (Envoy vs. kernel work). |
    | Memory per pod (proxy) | ~50 MiB | N/A | 100% | eBPF programs and maps consume kernel memory, not per-pod overhead. |
    | TCP connection traversals | 4 (App->Proxy, Proxy->App) | 1 (App->App) | 75% | Fewer traversals of the TCP/IP stack. |
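
Numbers like these vary heavily with workload shape, payload size, and node hardware, so treat them as directional and measure in your own environment. A minimal way to collect comparable latency percentiles, assuming a fortio load generator (any HTTP load tool works):

bash
# Run a fixed-QPS load test from cluster-1 against the cross-cluster backend service
kubectl run fortio -n demo --context cluster-1 --image=fortio/fortio --restart=Never -- \
  load -qps 1000 -c 16 -t 60s http://backend.demo.svc.cluster.local/

# Latency percentiles (including p99) are printed in the pod logs when the run completes
kubectl logs fortio -n demo --context cluster-1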

    The reasons for this dramatic improvement are fundamental:

    * No Context Switching: eBPF operates entirely in the kernel, avoiding the expensive overhead of copying packet data between kernel space and user space.

    * Path Optimization: The networking path is significantly shorter and more direct.

    * Shared Resources: eBPF maps and programs are loaded once per node, not once per pod, leading to a much more efficient resource utilization model.

    Edge Case 3: The eBPF Verifier and Program Complexity

    To ensure kernel stability, every eBPF program must pass a rigorous in-kernel verifier before it can be attached. The verifier statically analyzes the program to prevent infinite loops, out-of-bounds memory access, and other unsafe operations.

    * Problem: A complex L7 protocol parser (e.g., for a custom RPC protocol) might be too large or have too many execution paths to pass the verifier's complexity checks (e.g., the default 1 million instruction limit).

    * Solution/Mitigation:

    1. eBPF Tail Calls: Break down a large, monolithic eBPF program into smaller, chained programs. One program can perform initial parsing and then use a bpf_tail_call to jump to another program to handle a specific state. This is a standard pattern for implementing state machines in eBPF.

    2. eBPF-to-eBPF Function Calls: Supported since kernel 4.16 (and usable together with tail calls from 5.10 on x86-64), these let you define and call functions within a single eBPF program, which helps with code organization and can reduce overall complexity from the verifier's perspective.

    3. User-Space Assistance: For extremely complex parsing, the eBPF program can extract the raw data and send it to the user-space Cilium agent for final parsing. This is a trade-off, re-introducing some kernel-to-user-space communication, but it can be a pragmatic escape hatch for protocols that are not amenable to bounded in-kernel processing.
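
You can see how Cilium already decomposes its datapath into many small programs by listing what is loaded on a node (this assumes bpftool is installed there; program names vary by Cilium version):

bash
# List loaded eBPF programs; the xlated/jited sizes on the following lines give a
# rough sense of per-program complexity
bpftool prog show | grep -A2 -i cil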

    Conclusion: A New Foundation for Cloud Native Networking

    eBPF is not merely an incremental improvement for service mesh technology; it represents a fundamental architectural shift. By moving observability, security, and networking logic from sidecar proxies into the Linux kernel, we can eliminate entire classes of performance bottlenecks and operational complexity.

    For senior engineers and architects, adopting this model requires a shift in mindset. The debugging and operational challenges move from user-space proxy logs and configurations to kernel-level tracing and understanding eBPF program lifecycles. The dependency on modern, consistent kernel versions becomes a first-class operational requirement.

    However, the rewards are substantial: a dramatically more performant, resource-efficient, and elegant data plane. As tools like Cilium mature, the sidecarless model is poised to become the default standard for high-performance service meshes in demanding Kubernetes environments.
