Advanced K8s Network Observability with eBPF and Cilium Hubble
Beyond `iptables`: The Imperative for eBPF-based Observability
For senior engineers operating Kubernetes at scale, the limitations of the default kube-proxy implementation, typically backed by iptables, are a familiar source of friction. While functional, iptables introduces significant performance bottlenecks due to its sequential rule processing in the kernel's Netfilter framework. More critically for observability, it abstracts away the true source of network traffic, often masquerading it behind SNAT (Source Network Address Translation), making it incredibly difficult to answer a simple question: "Which specific pod is talking to which specific service?"
This is where eBPF (extended Berkeley Packet Filter) represents a paradigm shift. By allowing sandboxed programs to run directly within the Linux kernel, projects like Cilium can implement a highly efficient and identity-aware networking datapath. Cilium attaches eBPF programs to network interfaces (via tc hooks) and socket calls (via sock_ops hooks), effectively bypassing iptables and kube-proxy for in-cluster traffic. This provides not only a massive performance improvement but also a revolutionary new plane for observability.
This article assumes you understand these fundamentals. We will not cover the basics of Cilium installation or the theory of eBPF. Instead, we will focus on advanced, practical applications of Hubble, Cilium's observability component, to diagnose and resolve complex, real-world networking scenarios that are nearly impossible to debug with traditional tooling.
The Cilium eBPF Datapath: A Quick Architectural Refresher
To effectively use Hubble, it's crucial to understand how Cilium's eBPF programs capture the data it visualizes. Unlike a sidecar proxy that intercepts traffic in userspace, Cilium operates at the kernel level.
* Identity-aware policy enforcement: Each endpoint is assigned a numeric security identity derived from its labels, and the eBPF program attached to the pod's virtual ethernet device (veth) looks up the source and destination identities. Policy decisions are made based on these identities, not ephemeral IP addresses.
* Service load balancing: For ClusterIP services, Cilium's eBPF programs attached at the tc (Traffic Control) hook on the network device perform direct DNAT (Destination NAT) to a selected backend pod's IP. This is done via a lookup in an eBPF map that stores the service-to-backend mappings. This avoids the entire iptables chain traversal, significantly reducing latency.
* Event reporting: Hubble taps directly into these eBPF maps and a perf ring buffer where Cilium's eBPF programs push event notifications (e.g., new flows, policy verdicts, packet drops). This is the source of its power: it's not sampling traffic or relying on userspace agents; it's reporting the ground truth directly from the kernel's decision-making process.
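These kernel structures are directly inspectable. For example, the service-to-backend eBPF map can be dumped with the agent's own cilium CLI from inside any cilium-agent pod (a minimal sketch; the exact output columns vary across Cilium versions):
# Dump the eBPF load-balancing (service) map from one cilium-agent pod
kubectl -n kube-system exec ds/cilium -- cilium bpf lb list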
Production-Ready Hubble Configuration
To unlock the full potential of Hubble, your initial Cilium deployment configuration is critical. A default install often has Hubble disabled or minimally configured. Here is a production-oriented Helm values.yaml snippet for a Cilium installation.
Code Example 1: values.yaml for a Production Cilium/Hubble Deployment
# values.yaml for Cilium Helm chart
kubeProxyReplacement: strict # Completely bypass kube-proxy for maximum performance
bpf:
# Pre-allocation of BPF maps prevents runtime performance hits
preallocateMaps: true
# Enable Hubble for deep observability
hubble:
enabled: true
listenAddress: ":4244"
# Deploy Hubble Relay for a cluster-wide view
relay:
enabled: true
# Tune buffer and timeout for high-volume clusters
# bufferSize is the number of flows per-node peer buffer.
bufferSize: 2048
# Timeout after which a peer is considered disconnected
peerServiceConnectTimeout: 20s
# Deploy the Hubble UI for visualization
ui:
enabled: true
# Expose metrics for Prometheus scraping
metrics:
enabled:
- "dns"
- "drop"
- "tcp"
- "flow"
- "port-distribution"
- "icmp"
- "http"
serviceMonitor:
enabled: true # If using Prometheus Operator
Key considerations in this configuration:
* kubeProxyReplacement: strict: This is the most performant mode, where Cilium's eBPF datapath handles all ClusterIP, NodePort, and LoadBalancer traffic. This ensures all relevant traffic is visible to Hubble.
* hubble.relay.enabled: true: In a multi-node cluster, each node's Cilium agent has its own Hubble instance with a view of local traffic. Hubble Relay aggregates these disparate streams, providing a single API endpoint for a complete, cluster-wide view. This is essential for observing inter-node communication.
* hubble.metrics.enabled: This exposes a rich set of Prometheus metrics from the flow data, allowing you to build dashboards and alerts on network behavior (e.g., unexpected packet drop rates, DNS resolution failures).
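Once deployed with these values, it is worth confirming that the datapath and Hubble components are actually active before relying on them for debugging. A quick sanity check (assuming the cilium and hubble CLIs are installed locally; hubble status needs connectivity to Hubble Relay, e.g. via the port-forward shown in Scenario 1):
# Verify that Cilium is handling service traffic instead of kube-proxy
kubectl -n kube-system exec ds/cilium -- cilium status | grep KubeProxyReplacement
# Verify overall Cilium and Hubble health
cilium status
hubble status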
Scenario 1: Debugging Silently Dropped Packets
One of the most frustrating production issues is when a service can't connect to another, but there are no application-level errors, no logs, and ping or netcat simply time out. This often points to a networking issue, potentially a misconfigured NetworkPolicy.
Let's simulate this. We have a three-tier application: frontend, api-gateway, and user-service.
Code Example 2: Application and a Flawed CiliumNetworkPolicy
# app-deployment.yaml
apiVersion: v1
kind: Namespace
metadata:
name: prod-app
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontend
namespace: prod-app
spec:
replicas: 1
selector:
matchLabels:
app: frontend
template:
metadata:
labels:
app: frontend
tier: presentation
spec:
containers:
- name: frontend-container
image: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-gateway
namespace: prod-app
spec:
replicas: 1
selector:
matchLabels:
app: api-gateway
template:
metadata:
labels:
app: api-gateway
tier: logic
spec:
containers:
- name: api-gateway-container
image: kennethreitz/httpbin
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: user-service
namespace: prod-app
spec:
replicas: 1
selector:
matchLabels:
app: user-service
template:
metadata:
labels:
app: user-service
tier: data
spec:
containers:
- name: user-service-container
image: kennethreitz/httpbin
---
# faulty-policy.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
name: "api-access-policy"
namespace: prod-app
spec:
endpointSelector:
matchLabels:
app: api-gateway # Policy applies to the api-gateway
ingress:
- fromEndpoints:
- matchLabels:
app: frontend
toPorts:
- ports:
- port: "80"
protocol: TCP
egress:
- toEndpoints:
- matchLabels:
# TYPO: Should be 'user-service'
app: user-servce
toPorts:
- ports:
- port: "80"
protocol: TCP
In the CiliumNetworkPolicy, there is a subtle typo in the egress rule: app: user-servce instead of app: user-service. This policy will be successfully applied by Kubernetes, but it will cause the api-gateway to be unable to reach the user-service. Application logs in api-gateway will only show connection timeouts.
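One detail the manifests above omit: the curl commands in Scenario 2 reach the backend through a ClusterIP Service named user-service-svc. A minimal sketch of that supporting Service (the name is assumed only to match the DNS name used later; httpbin listens on port 80):
# user-service-svc.yaml (supporting manifest, not part of the original example)
apiVersion: v1
kind: Service
metadata:
  name: user-service-svc
  namespace: prod-app
spec:
  selector:
    app: user-service
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP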
Diagnosis with Hubble CLI
This is where Hubble shines. We can use the CLI to observe flows and immediately see the policy enforcement decision.
First, port-forward to the Hubble Relay service:
kubectl port-forward -n kube-system svc/hubble-relay 4245:80
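With the port-forward in place, the hubble CLI targets localhost:4245 by default, so a quick health check confirms the Relay connection before we start filtering (output shown is illustrative):
hubble status
# Healthcheck (via localhost:4245): Ok
# Current/Max Flows: 8,190/8,190 (100.00%)
# Flows/s: 23.41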
Now, let's observe the flows from the api-gateway pod, looking specifically for dropped packets.
# Find the api-gateway pod name
API_POD=$(kubectl get pods -n prod-app -l app=api-gateway -o jsonpath='{.items[0].metadata.name}')
# Observe flows from this pod, filtering for DROPPED verdicts
hubble observe --from-pod prod-app/${API_POD} --verdict DROPPED -o json
You will get detailed output like this:
{
"flow": {
"time": "2023-10-27T18:35:12.123456789Z",
"verdict": "DROPPED",
"drop_reason_desc": "POLICY_DENIED",
"IP": {
"source": "10.0.1.150",
"destination": "10.0.1.200",
"ipVersion": "IPv4"
},
"l4": {
"TCP": {
"source_port": 54321,
"destination_port": 80,
"flags": {
"SYN": true
}
}
},
"source": {
"ID": 123,
"identity": 45678,
"namespace": "prod-app",
"labels": ["k8s:app=api-gateway", "k8s:tier=logic"],
"pod_name": "api-gateway-xyz-123"
},
"destination": {
"ID": 456,
"identity": 56789,
"namespace": "prod-app",
"labels": ["k8s:app=user-service", "k8s:tier=data"],
"pod_name": "user-service-abc-456"
},
"Type": "L3_L4",
"traffic_direction": "EGRESS",
"policy_match_type": 2
},
"node_name": "k8s-worker-1"
}
The crucial fields are "verdict": "DROPPED" and "drop_reason_desc": "POLICY_DENIED". This tells us instantly that a network policy is the culprit. Hubble provides the full source and destination pod labels, leaving no ambiguity. We can see the traffic was intended for a pod with k8s:app=user-service, and by inspecting our policy, we can quickly spot the typo.
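When many flows are scrolling past, a small jq filter over the same JSON output makes those decisive fields easy to scan (a sketch; jq must be installed, and the field names match the output above):
hubble observe --from-pod prod-app/${API_POD} --verdict DROPPED -o json \
  | jq -r '.flow | [.time, .verdict, .drop_reason_desc, .destination.pod_name] | @tsv'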
Correction
After fixing the typo in faulty-policy.yaml and reapplying it, running hubble observe again would show "verdict": "FORWARDED".
Code Example 3: The Corrected CiliumNetworkPolicy
# corrected-policy.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
name: "api-access-policy"
namespace: prod-app
spec:
endpointSelector:
matchLabels:
app: api-gateway
ingress:
# ... (ingress unchanged)
egress:
- toEndpoints:
- matchLabels:
# FIX: Corrected the label selector
app: user-service
toPorts:
- ports:
- port: "80"
protocol: TCP
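After reapplying the corrected policy, a quick check with the default compact output confirms the path is open (the flow lines shown are illustrative):
hubble observe --from-pod prod-app/${API_POD} --to-label app=user-service --last 5
# Oct 27 18:42:01.512: prod-app/api-gateway-xyz-123:54321 (ID:45678) -> prod-app/user-service-abc-456:80 (ID:56789) to-endpoint FORWARDED (TCP Flags: SYN)
# Oct 27 18:42:01.514: prod-app/api-gateway-xyz-123:54321 (ID:45678) -> prod-app/user-service-abc-456:80 (ID:56789) to-endpoint FORWARDED (TCP Flags: ACK)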
This level of immediate, actionable feedback directly from the kernel's packet processing path is something tcpdump or application logs could never provide so cleanly.
Scenario 2: Achieving L7 Visibility for API Call Tracing
A common challenge in microservices is understanding the application-level interactions. Which API endpoint is service A calling on service B? Is it using the correct HTTP method? This is traditionally the domain of service meshes like Istio or Linkerd, which inject a sidecar proxy to inspect L7 traffic. However, this comes with significant operational overhead and resource consumption.
Cilium's eBPF datapath can parse several L7 protocols (HTTP, gRPC, Kafka, DNS) without a sidecar. The same eBPF programs that handle L3/L4 routing can be extended to understand and enforce policies on L7 data for unencrypted traffic.
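Depending on your Cilium version, you can also obtain L7 flow visibility without enforcing any rule by annotating the target pods with the proxy-visibility annotation (deprecated in recent releases in favor of plain L7 policies); a sketch:
# Redirect ingress HTTP traffic on port 80 through Cilium's proxy purely for visibility
kubectl annotate pods -n prod-app -l app=user-service \
  policy.cilium.io/proxy-visibility="<Ingress/80/TCP/HTTP>"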
Let's extend our policy to be more granular. We want the api-gateway to be able to call GET /users on the user-service, but not POST /users.
Code Example 4: An Advanced CiliumNetworkPolicy with L7 HTTP rules
# l7-policy.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
name: "api-access-policy"
namespace: prod-app
spec:
endpointSelector:
matchLabels:
app: user-service # Policy now applies to the user-service
ingress:
- fromEndpoints:
- matchLabels:
app: api-gateway
toPorts:
- ports:
- port: "80"
protocol: TCP
# L7 Rules for HTTP traffic on port 80
rules:
http:
- method: "GET"
path: "/get" # httpbin uses /get for GET requests
After applying this policy, let's try to make both a GET and a POST request from the api-gateway pod to the user-service.
# Exec into the api-gateway pod
kubectl exec -it -n prod-app ${API_POD} -- /bin/bash
# This request will succeed (matches the policy)
curl -X GET http://user-service-svc.prod-app.svc.cluster.local/get
# This request will be denied by the L7 policy; Cilium's proxy answers with an HTTP 403 "Access denied"
curl -X POST http://user-service-svc.prod-app.svc.cluster.local/post -d '{"key":"value"}'
The application only sees a generic "Access denied" response, with no indication of which policy produced it. Hubble, however, can tell us exactly what happened at the L7 level.
# Observe HTTP traffic, filtering for the user-service destination
hubble observe -n prod-app --to-label app=user-service --protocol http -o json
The output will contain distinct flow records for each attempt:
Successful GET Request:
{
"flow": {
"verdict": "FORWARDED",
"l7": {
"type": "REQUEST",
"latency_ns": "5000000",
"http": {
"code": 200,
"method": "GET",
"url": "http://user-service-svc.prod-app.svc.cluster.local/get",
"protocol": "HTTP/1.1",
"headers": [ ... ]
}
},
// ... other fields ...
}
}
Denied POST Request:
{
"flow": {
"verdict": "DROPPED",
"drop_reason_desc": "POLICY_DENIED",
"l7": {
"type": "REQUEST",
"http": {
"code": 0, // No response
"method": "POST",
"url": "http://user-service-svc.prod-app.svc.cluster.local/post",
"protocol": "HTTP/1.1"
}
},
// ... other fields ...
"traffic_direction": "INGRESS",
"Summary": "TCP Flags: SYN, ACK -> 10.0.1.200:80 HTTP/1.1 POST http://user-service-svc.prod-app.svc.cluster.local/post"
}
}
This is incredibly powerful. We can see the exact HTTP method and url that was denied by the policy, directly from the kernel. This allows us to debug application integration issues, enforce API contracts at the network layer, and gain deep visibility without the complexity of a service mesh.
Advanced Patterns and Performance Considerations
As you scale your usage of Hubble, several advanced patterns and performance considerations come into play.
Performance Tuning Hubble
Hubble's observability is not free. The eBPF programs push flow events into a perf ring buffer, which the userspace Hubble agent consumes. In clusters with extremely high connection rates, this buffer can fill up, leading to lost observability events.
* Monitor for Drops: Watch the Hubble lost-event counter exposed by the Cilium agent (hubble_lost_events_total). If this counter is increasing, you are losing visibility.
* Increase Buffer Sizes: In the Cilium ConfigMap (kubectl edit cm -n kube-system cilium-config), you can tune hubble-event-buffer-capacity. The default is 4095. Increasing this can help absorb bursts but uses more memory.
* Flow Sampling: For massive clusters where capturing every single flow is not feasible, consider enabling sampling. This is not yet a mature feature but is under active development in the Cilium community.
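A quick way to inspect and raise that buffer is through the ConfigMap directly (a sketch; the capacity must be one less than a power of two, and the agents must be restarted to pick up the change):
# Show the current capacity (empty output means the 4095 default is in effect)
kubectl -n kube-system get cm cilium-config \
  -o jsonpath='{.data.hubble-event-buffer-capacity}'
# Raise it and restart the agents
kubectl -n kube-system patch cm cilium-config \
  --type merge -p '{"data":{"hubble-event-buffer-capacity":"16383"}}'
kubectl -n kube-system rollout restart ds/cilium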
Edge Case: Inter-node Encrypted Traffic (WireGuard)
Cilium can provide transparent encryption for all inter-node traffic using WireGuard. When enabled, Cilium encapsulates all traffic between nodes in an encrypted tunnel. This has a significant implication for Hubble's L7 visibility.
Since the L7 data (e.g., HTTP headers) is encrypted once it leaves the source node, the Cilium agent on the destination node cannot parse it. Therefore, L7 observability in Hubble only works for traffic between pods on the same node when transparent encryption is active. L3/L4 flow data (IPs, ports, verdicts) remains fully visible for all traffic, as this is determined before encryption.
This is a critical trade-off: you must choose between full cluster-wide L7 visibility and inter-node traffic encryption. For many security postures, encryption is non-negotiable, and you must accept the limitation on L7 observability or rely on a service mesh for that specific capability.
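For reference, transparent encryption is switched on through the same Helm values file used earlier (a minimal excerpt; key names follow the Cilium Helm chart):
# values.yaml excerpt: enable node-to-node WireGuard encryption
encryption:
  enabled: true
  type: wireguard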
Integrating with the Broader Observability Stack
Hubble's CLI and UI are excellent for interactive debugging, but its true power is realized when integrated with your existing monitoring and alerting systems.
Code Example 5: Prometheus Integration
Assuming you've enabled metrics and the ServiceMonitor in the Helm chart, Prometheus will automatically scrape Hubble. You can then build powerful alerts.
For example, to alert when a specific application (api-gateway) experiences a high rate of policy-denied packet drops. Note that the per-pod labels used here require the drop metric to be configured with context labels, e.g. "drop:labelsContext=source_namespace,source_pod,destination_pod" in hubble.metrics.enabled (check the metric context options for your Cilium version):
# PromQL expression for a high policy-denied drop rate
sum(rate(hubble_drop_total{reason="Policy denied", source_namespace="prod-app", source_pod=~"api-gateway.*"}[5m])) by (source_pod, destination_pod) > 10
This PromQL query calculates the per-second rate of drops due to policy denials from the api-gateway over a 5-minute window and will fire if it exceeds 10 drops/sec. This transforms Hubble from a reactive debugging tool into a proactive security and reliability monitor.
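To turn the expression into an actual alert, it can be wrapped in a PrometheusRule (a sketch assuming the Prometheus Operator is installed; rule and alert names are arbitrary):
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hubble-drop-alerts
  namespace: kube-system
spec:
  groups:
    - name: hubble.rules
      rules:
        - alert: ApiGatewayPolicyDrops
          expr: |
            sum(rate(hubble_drop_total{reason="Policy denied", source_namespace="prod-app", source_pod=~"api-gateway.*"}[5m]))
              by (source_pod, destination_pod) > 10
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High rate of policy-denied drops from api-gateway"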
Conclusion: The Kernel as the Source of Truth
eBPF-based observability with Cilium and Hubble is not just an incremental improvement over existing tools; it is a fundamental shift in how we approach cloud-native network diagnostics. By tapping directly into the kernel, we get an unfiltered, performant, and context-rich view of how our services are interacting. We can move beyond guessing based on application logs and tcpdump traces to making definitive diagnoses based on the kernel's own policy enforcement verdicts.
For senior engineers responsible for the stability and security of complex Kubernetes clusters, mastering these tools is no longer a luxury. It is a core competency for building resilient, observable, and secure systems in the cloud-native era.