eBPF-Powered Observability: Low-Overhead Tracing in K8s with Cilium

14 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Performance Bottleneck of Sidecar Observability

In modern microservices architectures running on Kubernetes, observability is non-negotiable. However, the de facto standard—the sidecar proxy model popularized by service meshes like Istio—comes with a well-documented performance cost. Every packet originating from or destined for your application pod must traverse a user-space proxy. This round trip involves multiple context switches between user space and kernel space, memory copy operations, and the inherent processing latency of the proxy itself. For high-throughput, low-latency services, this can add milliseconds to your p99 latency and significantly increase CPU and memory footprints across the cluster.

The fundamental issue is the data path. A typical sidecar flow looks like this:

Application -> iptables redirect (pod netns) -> Sidecar Proxy (user space, inside the pod) -> Pod Network Namespace (veth) -> Host Network Namespace -> Destination

This model, while powerful for traffic management and policy enforcement, is suboptimal for pure observability. We are paying a continuous performance tax for data that is often just being observed, not mutated.

eBPF (extended Berkeley Packet Filter) offers a paradigm shift. By attaching small, sandboxed programs to various hook points within the Linux kernel, we can achieve similar visibility directly at the source, eliminating the user-space detour. Cilium leverages eBPF to create a networking, security, and observability data plane that operates almost entirely within the kernel.

This article will demonstrate how to harness this power, focusing on practical, production-level techniques for deep system tracing with minimal overhead.


Section 1: Architecting for Kernel-Level Visibility

Before diving into commands, it's crucial to understand the architectural differences. Cilium's observability tool, Hubble, uses eBPF programs attached to kernel hooks like Traffic Control (TC) and socket operations to capture network flow data.

  • L3/L4 Visibility: An eBPF program on the TC ingress/egress hooks of a network device (such as a pod's veth pair) can inspect every packet, capturing source/destination IP, port, and TCP flags. Because this happens before the packet is handed to the pod's network stack, it is incredibly efficient. (A quick way to inspect these attachments is sketched just after this list.)
  • L7 Visibility: This is where the magic lies. Instead of terminating TLS and parsing traffic in a user-space proxy, Cilium can attach eBPF programs (uprobes rather than kprobes, since the targets are user-space functions) to the read/write routines of common TLS libraries (e.g., OpenSSL, GnuTLS). This lets it inspect the plaintext just before it is encrypted or just after it is decrypted by the application's TLS library, providing L7 visibility (HTTP paths, gRPC methods, Kafka topics) without the certificate management and TLS-termination overhead of a sidecar.
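    For example, you can confirm which eBPF programs Cilium has attached to a pod's veth device directly from the node. This is a minimal sketch, assuming shell access to a node (or a host-network debug pod) where bpftool and tc are available; the interface name is a placeholder:

    bash
    # List eBPF programs attached to network devices on this node (requires bpftool)
    bpftool net show

    # Or inspect a single Cilium-managed veth interface with tc
    # (replace lxc1234abcd with a real interface name from `ip link`)
    tc filter show dev lxc1234abcd ingress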
    Production-Grade Cilium & Hubble Configuration

    We assume a running Kubernetes cluster. A default Cilium installation is insufficient; we need to enable Hubble with its UI and metrics endpoints. Below is a production-oriented Helm configuration snippet.

    yaml
    # values-production.yaml
    
    # Replace kube-proxy entirely for maximum performance.
    # Cilium then handles all Kubernetes Service routing via eBPF maps.
    kubeProxyReplacement: strict
    
    # Host-reachable services: eBPF-based handling of ClusterIP traffic
    # originating from the host network namespace
    hostServices:
      enabled: true
    
    # Enable BPF masquerading for traffic leaving the cluster
    bpf:
      masquerade: true
    
    # Hubble Configuration
    hubble:
      enabled: true
      # Deploy Hubble Relay for cluster-wide flow aggregation
      relay:
        enabled: true
        # Tune buffer sizes for high-traffic clusters
        # Default is 4095; increase if you see dropped flows
        bufferSize: 8191
      # Deploy the UI for visual inspection
      ui:
        enabled: true
      # Enable metrics for Prometheus integration
      metrics:
        enabled:
          - "dns"
          - "drop"
          - "tcp"
          - "flow"
          - "port-distribution"
          - "icmp"
          - "http"
    
    # Prometheus Integration
    prometheus:
      enabled: true
      serviceMonitor:
        enabled: true # For Prometheus Operator
    
    # Operator Configuration
    operator:
      prometheus:
        enabled: true
        serviceMonitor:
          enabled: true

    To apply this configuration:

    bash
    helm repo add cilium https://helm.cilium.io/
    
    helm install cilium cilium/cilium --version 1.12.5 \
      --namespace kube-system \
      -f values-production.yaml

    Key Production Considerations from this configuration:

  • kubeProxyReplacement: strict: This is a critical performance optimization. It removes iptables from the service routing path entirely. Cilium uses eBPF hash maps to perform NAT for Kubernetes Services, which is significantly faster and more scalable than sequential iptables rule processing.
  • hubble.relay.enabled: true: In a multi-node cluster, the Hubble daemon on each node only sees flows on that node. Hubble Relay aggregates these flows, providing a single API endpoint for cluster-wide observability. Without it, you'd have to query each node's agent individually. A quick way to verify both of these settings is sketched below.
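    With the chart installed and the agent pods running, you can check that these settings actually took effect. A minimal verification sketch (the label selectors are the chart's usual defaults and may differ slightly between versions):

    bash
    # Ask an agent for its datapath status (ds/cilium targets the DaemonSet's pods)
    kubectl -n kube-system exec ds/cilium -- cilium status | grep -E 'KubeProxyReplacement|Hubble'
    
    # Confirm Hubble Relay and the UI came up
    kubectl -n kube-system get pods -l k8s-app=hubble-relay
    kubectl -n kube-system get pods -l k8s-app=hubble-ui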

    Section 2: Advanced Flow Tracing with the Hubble CLI

    The Hubble CLI is your primary tool for real-time debugging. Let's move beyond hubble observe and into complex scenarios.
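    If you don't already have the hubble binary installed locally, it ships as a release artifact on GitHub. A minimal install sketch for a Linux amd64 workstation (the stable.txt version file and asset naming follow the Hubble release convention; adjust for your OS and architecture):

    bash
    # Fetch the latest stable Hubble CLI version and install it
    HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
    curl -L --fail --remote-name-all \
      "https://github.com/cilium/hubble/releases/download/${HUBBLE_VERSION}/hubble-linux-amd64.tar.gz"
    tar xzvf hubble-linux-amd64.tar.gz
    sudo mv hubble /usr/local/bin/
    hubble version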

    First, let's set up a sample application. We'll use a paymentservice and a currencyservice.

    yaml
    # sample-app.yaml
    apiVersion: v1
    kind: Namespace
    metadata:
      name: hipstershop
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: paymentservice
      namespace: hipstershop
      labels:
        app: paymentservice
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: paymentservice
      template:
        metadata:
          labels:
            app: paymentservice
        spec:
          containers:
          - name: server
            image: gcr.io/google-samples/microservices-demo/paymentservice:v0.3.8
            ports:
            - containerPort: 50051
    --- 
    apiVersion: v1
    kind: Service
    metadata:
      name: paymentservice
      namespace: hipstershop
    spec:
      type: ClusterIP
      selector:
        app: paymentservice
      ports:
      - port: 50051
        targetPort: 50051
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: currencyservice
      namespace: hipstershop
      labels:
        app: currencyservice
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: currencyservice
      template:
        metadata:
          labels:
            app: currencyservice
        spec:
          containers:
          - name: server
            image: gcr.io/google-samples/microservices-demo/currencyservice:v0.3.8
            ports:
            - containerPort: 7000
    --- 
    apiVersion: v1
    kind: Service
    metadata:
      name: currencyservice
      namespace: hipstershop
    spec:
      type: ClusterIP
      selector:
        app: currencyservice
      ports:
      - port: 7000
        targetPort: 7000

    Apply it: kubectl apply -f sample-app.yaml
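    Before tracing anything, make sure both workloads are actually ready (the deployment names and labels match the manifest above):

    bash
    # Wait for both services to come up before generating traffic
    kubectl -n hipstershop rollout status deploy/paymentservice --timeout=120s
    kubectl -n hipstershop rollout status deploy/currencyservice --timeout=120s
    kubectl -n hipstershop get pods -o wide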

    Scenario 1: Debugging Dropped Packets due to a Network Policy

    Let's create a restrictive CiliumNetworkPolicy that denies traffic.

    yaml
    # deny-policy.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "deny-all-ingress"
      namespace: hipstershop
    spec:
      endpointSelector:
        matchLabels:
          app: paymentservice
      ingress: [] # Empty ingress means deny all

    Apply it: kubectl apply -f deny-policy.yaml

    Now, if we try to connect from currencyservice to paymentservice, it will fail. From inside the pod the connection simply times out, which tells us nothing about why. How do we debug this at the network layer?

    bash
    # Exec into the currencyservice pod
    CURRENCY_POD=$(kubectl get pods -n hipstershop -l app=currencyservice -o jsonpath='{.items[0].metadata.name}')
    kubectl exec -it -n hipstershop $CURRENCY_POD -- /bin/sh
    
    # Inside the pod, try to connect (this will hang until the timeout)
    # If apk/curl is unavailable in the image, a `kubectl debug` ephemeral container works too
    apk add --no-cache curl
    curl -v --max-time 10 paymentservice:50051

    Now, from another terminal, use Hubble to see exactly why it's failing.

    bash
    # Port-forward the hubble-relay service
    kubectl port-forward -n kube-system svc/hubble-relay 4245:80 &
    
    # Use hubble observe to find the drop
    hubble observe --namespace hipstershop --verdict DROPPED --to-pod paymentservice -f

    Expected Output & Analysis:

    text
    TIME                 SOURCE -> DESTINATION                                   VERDICT     REASON
    Oct 26 12:35:10.123  hipstershop/currencyservice-5f... (10.0.1.45) -> hipstershop/paymentservice-6c... (10.0.1.99:50051)   DROPPED     Policy denied

    The output is unambiguous. We see:

  • VERDICT: DROPPED
  • REASON: Policy denied

    This confirms a network policy is the culprit. The key performance insight is that this filtering and logging happened entirely in the kernel: the packet was dropped by the eBPF program on the destination pod's network interface, so it never reached the pod's network stack, let alone an iptables chain. This makes policy enforcement extremely efficient, and you can corroborate it from the node itself, as sketched below.
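    For a node-local view of the same events, the Cilium agent's monitor can stream drop notifications straight from the eBPF datapath. A minimal sketch, assuming you pick the agent running on the same node as the paymentservice pod:

    bash
    # Find the Cilium agent on the node that hosts the paymentservice pod
    NODE=$(kubectl get pod -n hipstershop -l app=paymentservice -o jsonpath='{.items[0].spec.nodeName}')
    CILIUM_POD=$(kubectl get pods -n kube-system -l k8s-app=cilium \
      --field-selector spec.nodeName=$NODE -o jsonpath='{.items[0].metadata.name}')
    
    # Stream drop notifications directly from the datapath
    kubectl exec -it -n kube-system $CILIUM_POD -- cilium monitor --type drop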

    Scenario 2: Identity-Based vs. IP-Based Filtering

    Cilium assigns a security identity (a numeric ID) to each endpoint based on its labels. Policies are then enforced based on these identities, not ephemeral pod IPs. This is a more robust and scalable model.

    Let's find the identity of our pods:

    bash
    # Get the Cilium Endpoint for the currency service
    CILIUM_EP=$(kubectl get cep -n hipstershop -l app=currencyservice -o jsonpath='{.items[0].metadata.name}')
    
    # Describe the endpoint to get its identity
    kubectl describe cep -n hipstershop $CILIUM_EP
    
    # Look for a line like: Identity: ID=43128, Labels: [k8s:app=currencyservice, ...]

    Let's say the identity is 43128. We can now use this for highly specific tracing.

    bash
    # Trace all traffic originating from any pod with this identity
    hubble observe --from-identity 43128
    
    # This is far more powerful than IP-based filtering, as pods can be rescheduled and get new IPs,
    # but their label-based identity remains the same.
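    Identity filters compose with the other observe flags, which makes tightly scoped queries easy. A small sketch (43128 and 12945 are the example identities used in this article; substitute your own):

    bash
    # Only dropped flows from the currencyservice identity to the paymentservice identity
    hubble observe --from-identity 43128 --to-identity 12945 --verdict DROPPED --last 50
    
    # The same source identity, restricted to DNS traffic
    hubble observe --from-identity 43128 --port 53 --last 50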

    Section 3: L7 Observability without Sidecars (HTTP/gRPC)

    This is where Cilium + eBPF truly outshines traditional methods for pure observability.

    First, let's fix our network policy to allow traffic.

    yaml
    # allow-policy.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "allow-currency-to-payment"
      namespace: hipstershop
    spec:
      endpointSelector:
        matchLabels:
          app: paymentservice
      ingress:
      - fromEndpoints:
        - matchLabels:
            app: currencyservice
        # checkoutservice (deployed with the full demo below) also calls paymentservice
        - matchLabels:
            app: checkoutservice

    Delete the old policy and apply the new one:

    bash
    kubectl delete cnp -n hipstershop deny-all-ingress
    kubectl apply -f allow-policy.yaml
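    Before moving on, confirm that traffic is now forwarded; the earlier curl from the currencyservice pod should at least complete a TCP connection, and Hubble should report FORWARDED verdicts:

    bash
    # The allow policy should now be the only one in the namespace
    kubectl get cnp -n hipstershop
    
    # Flows to paymentservice should now be forwarded rather than dropped
    hubble observe --namespace hipstershop --to-pod paymentservice --verdict FORWARDED --last 20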

    Our sample app uses gRPC. Let's generate some traffic.

    bash
    # Deploy the rest of the microservices demo into the same namespace;
    # it includes a load generator and the other services (e.g., checkoutservice)
    kubectl apply -n hipstershop -f https://raw.githubusercontent.com/GoogleCloudPlatform/microservices-demo/main/release/kubernetes-manifests.yaml
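    One practical note: out of the box Hubble reports L3/L4 flows, and the agent only parses L7 payloads for endpoints whose traffic is steered through its proxy. Depending on your Cilium version and configuration, that means either an L7-aware CiliumNetworkPolicy or the proxy-visibility pod annotation. A hedged sketch of the annotation approach (the annotation format comes from Cilium's proxy-visibility feature; verify it against the docs for your version):

    bash
    # Ask Cilium to parse ingress traffic to paymentservice on 50051 as HTTP (gRPC rides on HTTP/2)
    kubectl annotate pod -n hipstershop -l app=paymentservice \
      policy.cilium.io/proxy-visibility="<Ingress/50051/TCP/HTTP>" --overwrite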

    Now, let's inspect the gRPC calls between services.

    bash
    # Observe gRPC traffic to the paymentservice
    hubble observe --namespace hipstershop --protocol grpc --to-pod paymentservice -f

    Expected Output & Analysis:

    text
    TIME                 SOURCE -> DESTINATION                                   TYPE        VERDICT
    Oct 26 12:45:20.555  hipstershop/checkoutservice-7d... -> hipstershop/paymentservice-6c... (50051)   gRPC        FORWARDED (gRPC) {call:"hipstershop.PaymentService/Charge", authority:":authority: paymentservice:50051"}

    Notice the rich L7 information: gRPC {call:"hipstershop.PaymentService/Charge"}. Cilium has parsed the gRPC request to extract the service and method name. This was achieved without a per-pod sidecar, without terminating TLS in a sidecar, and without any application code changes.

    Edge Case: Statically Compiled Go Binaries

    A common edge case for L7 TLS parsing is Go applications: Go uses its own crypto/tls package rather than the system's shared C libraries (like OpenSSL), and its binaries are typically statically linked. Probes that look up symbols in those shared libraries (uprobes, since the targets live in user space) therefore never fire, and automatic L7 parsing of encrypted traffic can fail.

    Solution: You need to provide user-space probing information so the agent knows where to find the TLS function symbols inside the statically linked binary (for Go, functions such as crypto/tls.(*Conn).Read and crypto/tls.(*Conn).Write). This is an advanced procedure and requires analyzing the binary with tools like nm to find the correct symbols, as sketched below.
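    A minimal sketch of that symbol analysis, assuming a locally built, non-stripped Go binary named my-static-go-app (both the binary name and the symbol list are illustrative):

    bash
    # List the TLS read/write symbols in a statically linked Go binary
    go tool nm ./my-static-go-app | grep -E 'crypto/tls\.\(\*Conn\)\.(Read|Write)'
    
    # Plain nm also works if the Go toolchain is not available on this machine
    nm ./my-static-go-app | grep 'crypto/tls'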

    Example annotation (conceptual):

    yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-static-go-app
    spec:
      template:
        metadata:
          annotations:
            # This is a conceptual example; the exact annotation may vary.
            # Note: it goes on the pod template so it lands on the pods themselves.
            "cilium.io/tls-probe.go-crypto/tls": "/path/to/binary:main.FunctionName"
      # ...

    Section 4: Integrating with Prometheus for Long-Term Analysis

    Real-time CLI observation is for debugging. For monitoring and alerting, we need to integrate with a TSDB like Prometheus.

    Our values-production.yaml already enabled the metrics endpoints and created a ServiceMonitor. If you have the Prometheus Operator installed, it will automatically scrape Cilium and Hubble.

    Let's explore some powerful PromQL queries you can build.

    1. HTTP Latency Golden Signals (without a sidecar)

    Hubble can export HTTP request/response metrics. You can calculate tail latency (here, p99) between two services.

    promql
    # p99 latency for HTTP GET requests from frontend to productcatalogservice
    
    histogram_quantile(0.99,
      sum(rate(hubble_http_response_latency_seconds_bucket{
        namespace="hipstershop",
        source_app="frontend",
        destination_app="productcatalogservice",
        method="GET"
      }[5m])) by (le, source_app, destination_app)
    )

    2. Network Policy Drop Rate by Reason

    This is invaluable for security monitoring. You can alert if a specific application suddenly starts seeing a high rate of policy denials.

    promql
    # Rate of dropped packets to the paymentservice, broken down by drop reason
    
    sum(rate(hubble_drop_total{
      namespace="hipstershop",
      destination_app="paymentservice"
    }[5m])) by (reason)

    3. DNS Resolution Failures per Source App

    Cilium can also parse DNS requests/responses, giving you insight into service discovery issues.

    promql
    # Rate of DNS queries with RCODE != NoError, indicating an error
    
    sum(rate(hubble_dns_responses_total{
      namespace="hipstershop",
      rcode!="NoError"
    }[5m])) by (source_app)

    These metrics provide a comprehensive view of your network's health and security posture, all sourced directly from the kernel with minimal overhead.
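    If these queries come back empty, it helps to confirm that the Hubble metrics endpoint is serving data before digging into the Prometheus scrape configuration. A quick sanity check (9965 is the chart's usual default for the Hubble metrics port; verify it against your values):

    bash
    # Pick one Cilium agent and forward its Hubble metrics port
    CILIUM_POD=$(kubectl get pods -n kube-system -l k8s-app=cilium -o jsonpath='{.items[0].metadata.name}')
    kubectl -n kube-system port-forward $CILIUM_POD 9965:9965 &
    
    # The hubble_* metric families enabled in values-production.yaml should appear here
    curl -s localhost:9965/metrics | grep -E '^hubble_(drop|dns|http)' | head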


    Section 5: Advanced Troubleshooting with `cilium policy trace`

    Sometimes, hubble observe tells you a packet was dropped, but your network policies are complex, and you don't know which rule is the cause. The cilium policy trace command is a simulator that lets you determine the policy verdict for a hypothetical packet.

    Let's go back to our deny-all-ingress scenario (re-apply deny-policy.yaml if you deleted it in Section 3). We know traffic from currencyservice to paymentservice is being dropped. Let's prove it with the tracer.

    First, we need the security identities of the source and destination workloads.

    bash
    # Get the source identity (currencyservice)
    SOURCE_IDENTITY=$(kubectl get cep -n hipstershop -l app=currencyservice -o jsonpath='{.items[0].status.identity.id}')
    
    # Get the destination pod name (used below to locate its node) and identity (paymentservice)
    DEST_POD_NAME=$(kubectl get pod -n hipstershop -l app=paymentservice -o jsonpath='{.items[0].metadata.name}')
    DEST_IDENTITY=$(kubectl get cep -n hipstershop -l app=paymentservice -o jsonpath='{.items[0].status.identity.id}')

    Now, run the trace from one of the Cilium agent pods. Find a cilium pod on the same node as the destination pod for the most accurate trace.

    bash
    # Find the cilium agent pod running on the same node as the destination pod
    DEST_NODE=$(kubectl get pod -n hipstershop $DEST_POD_NAME -o jsonpath='{.spec.nodeName}')
    CILIUM_POD=$(kubectl get pods -n kube-system -l k8s-app=cilium \
      --field-selector spec.nodeName=$DEST_NODE -o jsonpath='{.items[0].metadata.name}')
    
    # Execute the policy trace command inside the cilium agent
    kubectl exec -it -n kube-system $CILIUM_POD -- \
      cilium policy trace \
        --src-identity $SOURCE_IDENTITY \
        --dst-identity $DEST_IDENTITY \
        --dport 50051

    Expected Output & Analysis:

    text
    -> Verdict: Denied
      Source Identity: 43128 -> hipstershop/currencyservice
      Destination Identity: 12945 -> hipstershop/paymentservice
      Traffic: TCP port 50051
      Policy: 
        hipstershop/deny-all-ingress (Ingress)
          Enforced: Yes
          Rule:       (no rules matched)

    This output is the ultimate debugging tool. It tells you:

  • The final verdict is Denied.
  • It explicitly names the policy responsible: hipstershop/deny-all-ingress.
  • It shows that the denial occurred because no rule in that policy matched the traffic, so the default-deny of the empty ingress section applied.

    This level of introspection allows you to resolve complex, multi-policy interaction issues with confidence, without having to guess which policy is at fault. If you want to see exactly what the agent has loaded, you can dump its policy repository, as sketched below.
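    To cross-check the trace against the agent's actual state, the same agent pod can dump its compiled policy and per-endpoint enforcement status:

    bash
    # Dump the policy repository exactly as the agent has compiled it (JSON output)
    kubectl exec -n kube-system $CILIUM_POD -- cilium policy get
    
    # Show per-endpoint policy enforcement status on this node
    kubectl exec -n kube-system $CILIUM_POD -- cilium endpoint list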

    Conclusion: The Future is Kernel-Native

    For senior engineers optimizing for performance, reliability, and security in Kubernetes, moving observability out of user-space sidecars and into the kernel via eBPF is a logical and powerful evolution. Cilium provides a mature, production-ready implementation of this vision.

    By leveraging kernel-native data collection, we achieve:

  • Reduced Latency: Eliminating the user-space proxy detour for every network packet significantly lowers communication overhead.
  • Lower Resource Consumption: Fewer moving parts (no sidecar per pod) means less CPU and memory usage across the cluster.
  • Simplified Architecture: The operational burden of injecting, managing, and updating sidecars is removed.
  • Deep, Context-Aware Visibility: Combining L3/L4 flow data with L7 protocol parsing and identity-based context provides richer insights than IP-based tools.
    While the sidecar model still holds value for complex traffic-shifting and routing use cases, for pure observability the performance and efficiency of eBPF are undeniable. As eBPF's capabilities continue to expand with projects like Tetragon for runtime security, the kernel is solidifying its place as the next frontier for cloud-native observability and security.
