eBPF-based Observability in Kubernetes with Cilium and Hubble

Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Inherent Overhead of Sidecar-Based Observability

For years, the de facto standard for achieving deep observability in Kubernetes has been the service mesh, typically implemented via a per-pod sidecar proxy such as Envoy (Istio) or Linkerd's linkerd2-proxy. While this pattern democratized features like mTLS, traffic shifting, and L7 metric collection, it comes with a non-trivial performance and operational cost that senior engineers managing large-scale clusters are acutely aware of. We're not here to rehash the basics of service meshes; we're here to dissect their fundamental limitations in high-density, latency-sensitive environments.

  • Resource Taxation: Every pod injected with a sidecar pays a resource tax. A proxy like Envoy can easily consume 50-100m CPU and 50-100MB of RAM under moderate load. In a 1000-pod cluster, this equates to dedicating 50-100 full CPU cores and 50-100GB of RAM just to the mesh's data plane. This overhead scales linearly with the number of pods, creating significant resource pressure and increasing infrastructure costs.
  • Latency Injection: The sidecar model forces all network traffic in and out of a pod through an additional user-space proxy. This introduces two extra network hops for every single request. While each hop may only add a millisecond or two (P99 latencies are often higher), this overhead is cumulative across a deep call stack of microservices. A request traversing 5 services could see an additional 10-20ms of latency solely from proxy traversal, which can be unacceptable for real-time systems.
  • Kernel Blindness: A fundamental architectural limitation of sidecars is their position in the networking stack. They operate at L7 but live in user space. This means they are completely blind to kernel-level events. They cannot distinguish between network-level packet loss and application-level errors, nor can they efficiently measure time spent in the TCP/IP stack versus time spent in the application's processing logic. They only see what enters and exits their own proxy process, not what happens at the socket or syscall level.
  • Operational Complexity: Managing the lifecycle of sidecars, handling proxy configuration drift, and debugging traffic flow through multiple Envoy instances is a significant operational burden. Issues like ordering of iptables rules, handling init containers, and ensuring compatibility with various application network stacks add to the complexity.
This is the context that necessitates a more efficient, kernel-native approach. eBPF (extended Berkeley Packet Filter) offers a revolutionary alternative by moving observability logic from user-space sidecars directly into the Linux kernel.

    Cilium and eBPF: Observability at the Source

    eBPF allows us to run sandboxed, event-driven programs within the kernel without changing kernel source code or loading kernel modules. For observability, this is a paradigm shift. Instead of proxying traffic, we can attach eBPF programs to specific kernel hooks to observe data and events as they happen.

    Cilium, a CNI (Container Network Interface) for Kubernetes, leverages eBPF extensively for networking, security, and observability. Here’s how it works under the hood for observability:

    * Socket-level Hooks: Cilium attaches eBPF programs to kernel hooks around socket operations (for example, tcp_sendmsg and tcp_recvmsg) as well as to tc hooks on each network device.

    * Packet Processing: When an application in a pod makes a send() or recv() syscall, the attached eBPF program is triggered. This program has access to the raw buffer data.

    * L7 Protocol Parsing: Cilium's eBPF programs include parsers for common L7 protocols like HTTP/1.x, HTTP/2 (including gRPC), DNS, and Kafka. These parsers run inside the kernel and extract metadata (e.g., HTTP method, URL, status code, gRPC service/method) directly from the TCP stream.

    * Data Aggregation: The extracted metadata and flow information (source IP/pod, destination IP/pod, verdict) are efficiently written into shared eBPF maps (kernel-space key/value stores).

    * User-space Collection: The Cilium agent and Hubble (Cilium's observability layer) read this data from the eBPF maps and expose it to user-space tools like the Hubble CLI, UI, and Prometheus.

    This model eliminates sidecar proxies entirely for observability purposes. The result is frictionless, kernel-native visibility with significantly lower overhead.
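
    If you want to consume this flow data programmatically rather than through the CLI or UI, the Hubble Relay exposes a gRPC Observer API. The sketch below is a minimal example, assuming the relay is reachable on localhost:4245 (for instance via the port-forward shown later) and using the public API package published under github.com/cilium/cilium/api/v1/observer; treat the exact field names as version-dependent.

    go
    // hubble-tail.go: minimal sketch of streaming flows from Hubble Relay.
    package main
    
    import (
    	"context"
    	"fmt"
    	"log"
    
    	observerpb "github.com/cilium/cilium/api/v1/observer"
    	"google.golang.org/grpc"
    )
    
    func main() {
    	// Assumes something like `kubectl port-forward svc/hubble-relay 4245:80` is running.
    	conn, err := grpc.Dial("localhost:4245", grpc.WithInsecure())
    	if err != nil {
    		log.Fatalf("failed to dial hubble-relay: %v", err)
    	}
    	defer conn.Close()
    
    	client := observerpb.NewObserverClient(conn)
    
    	// Stream flows as they are observed -- the programmatic equivalent of `hubble observe --follow`.
    	stream, err := client.GetFlows(context.Background(), &observerpb.GetFlowsRequest{Follow: true})
    	if err != nil {
    		log.Fatalf("GetFlows failed: %v", err)
    	}
    
    	for {
    		resp, err := stream.Recv()
    		if err != nil {
    			log.Fatalf("stream closed: %v", err)
    		}
    		flow := resp.GetFlow()
    		if flow == nil {
    			continue // skip non-flow responses such as node status events
    		}
    		fmt.Printf("%s/%s -> %s/%s verdict=%v\n",
    			flow.GetSource().GetNamespace(), flow.GetSource().GetPodName(),
    			flow.GetDestination().GetNamespace(), flow.GetDestination().GetPodName(),
    			flow.GetVerdict())
    	}
    }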

    Production-Grade Deployment of Cilium with Hubble

    A default helm install cilium is insufficient for a production environment. You need to enable specific features for observability and tune parameters for performance. Below is a sample values.yaml for a production-grade deployment.

    yaml
    # cilium-values.yaml
    
    # Replace kube-proxy entirely with Cilium's eBPF implementation of
    # service load-balancing for better performance.
    kubeProxyReplacement: strict
    
    # Enable eBPF-based masquerading for traffic leaving the cluster.
    # More efficient than traditional iptables-based masquerading.
    bpf:
      masquerade: true
      # Pre-allocate eBPF maps to avoid runtime allocation overhead.
      # Critical for performance in busy clusters.
      preallocateMaps: true
    
    # Enable Hubble for observability
    hubble:
      enabled: true
      # Deploy the relay for cluster-wide aggregation
      relay:
        enabled: true
      # Deploy the UI for visualization
      ui:
        enabled: true
      # Enable metrics for Prometheus scraping
      metrics:
        enabled:
          - dns:query;ignoreAAAA
          - drop
          - tcp
          - flow
          - icmp
          - http
        # Enable L7 HTTP metrics
        http:
          # Split metrics by path, method, and status for detailed analysis
          # WARNING: This can increase cardinality. Use with caution.
          # For high-traffic APIs, consider removing 'path'.
          labels:
            - path
            - method
            - status
        # Create a ServiceMonitor so the Prometheus Operator scrapes Hubble metrics
        serviceMonitor:
          enabled: true
      # Enable Hubble's OpenTelemetry/Jaeger export for distributed tracing.
      # This is an alpha feature but powerful for correlating traces with flows.
      tracing:
        enabled: true
        # Example: point to a local Jaeger agent
        jaeger:
          address: "jaeger-agent.tracing.svc.cluster.local"
          port: 6831
    
    # Enforce network policy cluster-wide. Note that Cilium only parses L7
    # traffic selected by an L7-aware rule, so pair this with policies like
    # the example CiliumNetworkPolicy shown after the install step below.
    policyEnforcementMode: "always"
    
    # Fine-tuning for high-throughput environments
    # Expose agent metrics and create a ServiceMonitor for the Prometheus Operator.
    prometheus:
      enabled: true
      serviceMonitor:
        enabled: true
    ciliumMonitor:
      # Increase the number of perf ring buffer pages to avoid dropped monitor events.
      # Default is 64. Increase to 128 or 256 for very busy nodes.
      eventQueueSize: 128

    To deploy this configuration:

    bash
    helm repo add cilium https://helm.cilium.io/
    helm install cilium cilium/cilium --version 1.15.1 \
      --namespace kube-system \
      -f cilium-values.yaml
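
    Once the pods roll out, it's worth verifying the installation before moving on. This assumes the cilium and hubble CLIs are installed locally:

    bash
    # Wait for the agent, operator, Hubble Relay, and UI to report ready
    cilium status --wait
    
    # Confirm the Hubble API is reachable and flows are being collected
    cilium hubble port-forward &
    hubble status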

    This configuration not only enables Hubble but also tunes Cilium for performance by replacing kube-proxy with a more efficient eBPF implementation and using eBPF for NAT masquerading.
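    Note that policyEnforcementMode only controls when policies apply; Cilium parses L7 traffic only where an L7-aware rule selects it. Below is a minimal sketch of such a policy for the demo services used later in this post (the app: backend and app: frontend labels are assumed pod labels):

    yaml
    # l7-visibility-policy.yaml (illustrative)
    apiVersion: cilium.io/v2
    kind: CiliumNetworkPolicy
    metadata:
      name: backend-l7-visibility
      namespace: demo
    spec:
      endpointSelector:
        matchLabels:
          app: backend
      ingress:
        - fromEndpoints:
            - matchLabels:
                app: frontend
          toPorts:
            - ports:
                - port: "50051"
                  protocol: TCP
              rules:
                http:
                  # gRPC rides on HTTP/2; an empty rule allows every call while
                  # still routing the traffic through the L7 parser, which is
                  # what produces the per-request visibility shown below.
                  - {}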

    Deep Dive: Tracing L7 Traffic without Proxies

    Let's demonstrate this with a practical scenario. We'll deploy two microservices, frontend and backend, and observe their gRPC communication using Hubble.

    Sample Microservices

    Here's a simplified backend gRPC service in Go:

    go
    // backend/main.go
    package main
    
    import (
    	"context"
    	"fmt"
    	"log"
    	"net"
    
    	"google.golang.org/grpc"
    	pb "path/to/your/proto"
    )
    
    const port = ":50051"
    
    type server struct {
    	pb.UnimplementedBackendServiceServer
    }
    
    func (s *server) GetData(ctx context.Context, in *pb.Request) (*pb.Response, error) {
    	log.Printf("Received request for ID: %s", in.GetId())
    	return &pb.Response{Data: fmt.Sprintf("Data for ID %s", in.GetId())}, nil
    }
    
    func main() {
    	lis, err := net.Listen("tcp", port)
    	if err != nil {
    		log.Fatalf("failed to listen: %v", err)
    	}
    	s := grpc.NewServer()
    	pb.RegisterBackendServiceServer(s, &server{})
    	log.Printf("server listening at %v", lis.Addr())
    	if err := s.Serve(lis); err != nil {
    		log.Fatalf("failed to serve: %v", err)
    	}
    }

    And a frontend service that calls it:

    go
    // frontend/main.go
    package main
    
    import (
    	"context"
    	"log"
    	"net/http"
    	"time"
    
    	"google.golang.org/grpc"
    	pb "path/to/your/proto"
    )
    
    const backendAddr = "backend-service:50051"
    
    func handler(w http.ResponseWriter, r *http.Request) {
    	// Dialing per request keeps the demo simple; a production service
    	// would reuse a single client connection.
    	ctx, cancel := context.WithTimeout(r.Context(), time.Second)
    	defer cancel()
    
    	conn, err := grpc.DialContext(ctx, backendAddr, grpc.WithInsecure(), grpc.WithBlock())
    	if err != nil {
    		log.Printf("did not connect: %v", err)
    		http.Error(w, "backend unavailable", http.StatusInternalServerError)
    		return
    	}
    	defer conn.Close()
    	c := pb.NewBackendServiceClient(conn)
    
    	resp, err := c.GetData(ctx, &pb.Request{Id: "123"})
    	if err != nil {
    		log.Printf("could not get data: %v", err)
    		w.WriteHeader(http.StatusInternalServerError)
    		return
    	}
    	log.Printf("Backend Response: %s", resp.GetData())
    	w.Write([]byte(resp.GetData()))
    }
    
    func main() {
    	http.HandleFunc("/call", handler)
    	log.Fatal(http.ListenAndServe(":8080", nil))
    }
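
    The Kubernetes manifests are omitted for brevity, but note that the frontend's backendAddr assumes a ClusterIP Service named backend-service in the demo namespace, roughly like the sketch below (the app: backend selector is an assumed pod label):

    yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: backend-service
      namespace: demo
    spec:
      selector:
        app: backend   # must match the backend Deployment's pod labels
      ports:
        - name: grpc
          port: 50051
          targetPort: 50051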

    Observing with Hubble CLI

    After deploying these services to a namespace (e.g., demo), we can use the Hubble CLI to observe the traffic in real-time. First, port-forward to the Hubble relay:

    bash
    kubectl port-forward -n kube-system svc/hubble-relay 4245:80

    Now, watch for traffic to the backend service:

    bash
    hubble observe --to-service demo/backend-service --follow

    The output will be incredibly detailed, showing not just L3/L4 information but also the parsed L7 gRPC data:

    text
    TIMESTAMP           SOURCE                      DESTINATION                 TYPE     VERDICT   SUMMARY
    Oct 26 12:34:56.789   demo/frontend-pod-xyz:34567   demo/backend-pod-abc:50051    grpc     FORWARDED   gRPC call to demo.BackendService/GetData

    To get the full details of a specific flow, use the -o json flag. You'll see the parsed gRPC fields:

    json
    {
      "flow": {
        "source": {
          "pod_name": "frontend-pod-xyz",
          "namespace": "demo"
        },
        "destination": {
          "pod_name": "backend-pod-abc",
          "namespace": "demo"
        },
        "l7": {
          "type": "gRPC",
          "grpc": {
            "service": "demo.BackendService",
            "method": "GetData",
            "status_code": 0
          }
        }
      }
    }

    This rich data was extracted directly from the kernel's TCP stream by an eBPF program, with zero modification to the application code and no sidecar proxy.
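
    The same CLI is just as useful when things go wrong. A few filters worth knowing (flag names per recent Hubble CLI releases):

    bash
    # Show only flows that were dropped (e.g. by a network policy) in the demo namespace
    hubble observe --namespace demo --verdict DROPPED --follow
    
    # Show only parsed L7 events that returned HTTP 500s
    hubble observe --namespace demo --type l7 --http-status 500 --follow
    
    # DNS visibility: which workloads are resolving what?
    hubble observe --namespace demo --protocol dns --last 20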

    Advanced Use Case: Dynamic Service Maps

    One of the most powerful outcomes of this frictionless data collection is the ability to generate a live, accurate service dependency map. The Hubble UI consumes the aggregated flow data from the Hubble Relay to render this visualization automatically.

    Access the Hubble UI via port-forwarding:

    bash
    kubectl port-forward -n kube-system svc/hubble-ui 8081:80

    Navigating to http://localhost:8081, you'll see a graph of all communicating services in your cluster. You can filter by namespace and see:

    * Dependencies: Arrows connecting services, indicating traffic flow.

    * Protocol: Icons indicating HTTP, gRPC, DNS, Kafka, etc.

    * Health: The color and thickness of the connections can represent success rates (e.g., green for 2xx, red for 5xx) and traffic volume.

    This isn't a static, manually configured diagram. It is a live representation of what is actually happening in your cluster's network, derived from the ground truth of kernel-level observations. For incident response or understanding complex system interactions, this is invaluable.

    Performance Considerations and Critical Edge Cases

    eBPF is not a silver bullet. Senior engineers must understand its performance characteristics and limitations.

    1. CPU and Memory Overhead

    While significantly lower than sidecars, eBPF is not free.

    * CPU: The Cilium agent itself is lightweight, but the execution of eBPF programs on every network packet consumes kernel CPU cycles. This is highly efficient but can become noticeable on nodes processing hundreds of thousands of packets per second. The JIT (Just-In-Time) compilation of eBPF bytecode to native machine code on program load also consumes a small amount of CPU.

    * Memory: eBPF maps, which store state, flows, and metrics, consume kernel memory. The size of these maps is configurable. In our values.yaml, bpf.preallocateMaps=true allocates this memory upfront. A node in a large cluster might see the Cilium agent's kernel-space memory footprint grow to several hundred megabytes.

    Benchmark: In a typical scenario, a Cilium agent might consume ~50m CPU and ~200MB of memory, a fraction of a full service mesh's data plane. However, monitoring metrics such as cilium_bpf_map_pressure and cilium_bpf_map_ops_total in Prometheus is crucial.
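
    As a starting point, a query along these lines flags maps that are close to capacity (label names may differ slightly between Cilium versions):

    promql
    # Fraction of each eBPF map's capacity currently in use; alert as this approaches 1.0
    max by (map_name) (cilium_bpf_map_pressure)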

    2. Kernel Version Dependencies

    This is the most critical operational constraint. eBPF's capabilities are directly tied to the Linux kernel version. Recent Cilium releases require at least Linux 4.19, and many advanced features (such as some L7 parsing paths, BPF host routing, and other performance optimizations) require kernel 5.3+ or even 5.10+. Before a production rollout, you must verify that your node OS distribution ships a compatible and sufficiently recent kernel. Running cilium status against a node's agent will report which datapath features were probed successfully and flag anything missing.
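
    Two quick checks before a rollout, assuming kubectl access and that the agent binary is named cilium (as it is in 1.15):

    bash
    # Inspect the kernel version reported by every node
    kubectl get nodes -o custom-columns=NODE:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion
    
    # Ask a running agent which datapath features it probed successfully
    kubectl -n kube-system exec ds/cilium -- cilium status --verbose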

    3. The TLS Encryption Blind Spot

    By default, eBPF programs attached at the socket level see encrypted TLS payloads. This is a major challenge. How can you get L7 visibility into mTLS traffic?

    * Solution 1: Kernel TLS (kTLS): If you can offload TLS termination to the kernel, eBPF programs attached to socket hooks can see the decrypted data. This requires application support and kernel configuration (CONFIG_TLS). It's highly efficient but not universally applicable.

    * Solution 2: User-space Probes (uprobes): The more flexible solution is to use eBPF uprobes. Instead of attaching to kernel functions, you attach eBPF programs to functions within a user-space library, such as OpenSSL's SSL_read and SSL_write. This allows the eBPF program to intercept the data before it's encrypted by the application or after it's been decrypted. Cilium is developing this capability, and projects like Pixie use this technique extensively. It's a more complex but powerful pattern that bridges the kernel/user-space gap (a minimal bpftrace sketch follows this list).

    * Solution 3: Service Mesh Integration: For strict mTLS enforcement and observability, you can still run Cilium alongside a service mesh like Istio. In this model, Cilium provides the highly efficient CNI and L3/L4 policy, while the mesh handles mTLS and L7 traffic management. This hybrid approach lets you use the best tool for the job.
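
    To make the uprobe approach concrete, here is a minimal standalone sketch using bpftrace (not Cilium itself) that attaches to OpenSSL's SSL_write on a node. The libssl path is distro-specific and assumed here:

    bash
    # arg2 is the 'num' parameter of SSL_write(ssl, buf, num); at this point the
    # buffer is still plaintext, which is exactly why uprobes sidestep the TLS blind spot.
    sudo bpftrace -e '
    uprobe:/usr/lib/x86_64-linux-gnu/libssl.so.3:SSL_write
    {
      printf("%s (pid %d) wrote %d plaintext bytes\n", comm, pid, arg2);
    }'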

    4. Handling Large Payloads and Packet Reordering

    eBPF programs have a limited 512-byte stack and are designed for short, fast execution. Parsing a large HTTP request that spans multiple TCP packets is non-trivial. Cilium's engineers have solved this by using techniques like eBPF tail calls (chaining eBPF programs together) and per-CPU arrays to buffer and reassemble TCP streams within the kernel before passing them to the L7 parser. This complexity is abstracted from the user but is a testament to the advanced engineering required to make eBPF viable for L7 observability.

    Integrating with the Observability Stack

    Hubble's metrics are designed to be scraped by Prometheus. With the ServiceMonitor resources enabled in our values.yaml, the Prometheus Operator will automatically discover and scrape both the Cilium agent metrics and the Hubble metrics endpoints exposed on each node.
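
    Before building dashboards, it's worth spot-checking that metrics are actually being emitted. The agent pods serve Hubble metrics on port 9965 by default (adjust if you've overridden it):

    bash
    CILIUM_POD=$(kubectl -n kube-system get pods -l k8s-app=cilium -o jsonpath='{.items[0].metadata.name}')
    kubectl -n kube-system port-forward "$CILIUM_POD" 9965:9965 &
    curl -s localhost:9965/metrics | grep -E '^hubble_(http|drop|flow)' | head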

    Here are some powerful PromQL queries you can build using Hubble's metrics:

    HTTP Success Rate (treating only 5xx responses as failures) for a specific service:

    promql
    (sum(rate(hubble_http_responses_total{destination_service="demo/backend-service", status_code!~"5.."}[5m])) by (destination_service))
    /
    (sum(rate(hubble_http_responses_total{destination_service="demo/backend-service"}[5m])) by (destination_service))

    P95 Latency for gRPC calls between two apps:

    Note: Hubble emits flow events rather than request-latency histograms out of the box, so true latency metrics typically still require application-level instrumentation or a service mesh. The query below illustrates what a custom latency histogram could look like.

    promql
    # Hypothetical metric -- not available out-of-the-box, but achievable with custom eBPF instrumentation.
    histogram_quantile(0.95, sum(rate(hubble_flow_latency_seconds_bucket{source_app="frontend", destination_app="backend"}[1m])) by (le))

    Dropped Packets by Drop Reason:

    promql
    sum(rate(hubble_drop_total[5m])) by (reason)

    This is invaluable for debugging network policies. It tells you why a packet was dropped (e.g., "Policy denied", "Stale or unroutable IP").
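
    These queries translate directly into alerting rules. A hedged sketch using the Prometheus Operator's PrometheusRule CRD (the threshold, namespace, and reason label value are illustrative):

    yaml
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: hubble-network-alerts
      namespace: monitoring
    spec:
      groups:
        - name: hubble.rules
          rules:
            - alert: HubblePolicyDropsHigh
              # Sustained policy drops usually mean a missing or mis-scoped network policy.
              expr: sum(rate(hubble_drop_total{reason="POLICY_DENIED"}[5m])) > 1
              for: 10m
              labels:
                severity: warning
              annotations:
                summary: Hubble is reporting sustained policy-denied packet drops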

    Conclusion: The Kernel is the New Observability Plane

    Moving observability from user-space sidecars into the kernel via eBPF is not just an incremental improvement; it's a fundamental shift in how we build and operate cloud-native systems. By tapping into the kernel as the source of truth, we gain:

    * Performance: Drastically reduced resource overhead and near-zero latency injection compared to proxies.

    * Deep Visibility: The ability to see all L3-L7 traffic, policy drops, and kernel-level events without instrumenting applications.

    * Simplicity: A single agent (Cilium) provides CNI, network policy, and observability, reducing the number of moving parts in a cluster.

    While challenges like kernel dependencies and TLS encryption remain, the trajectory is clear. The engineering community is rapidly building the tools to overcome these hurdles. For senior engineers responsible for the performance, reliability, and security of Kubernetes clusters, mastering eBPF-based observability is no longer an option for the future—it's a critical skill for building the next generation of efficient, resilient systems.
