eBPF for Granular Kubernetes Network Policy & L7 Observability
Beyond iptables: The Kernel-Level Revolution with eBPF in Kubernetes
For years, Kubernetes networking has been synonymous with iptables. While functional, this netfilter-based approach reveals significant performance and scalability cracks in large, dynamic clusters. Every new Service or NetworkPolicy adds rules to chains that must be traversed linearly for each packet. In a 10,000-service cluster, this can introduce non-trivial latency and CPU overhead on every node, a problem that only compounds with high packet rates and complex policy rule sets. This is not a theoretical concern; it is a production bottleneck that has driven the search for a more efficient paradigm.
Enter eBPF (extended Berkeley Packet Filter). eBPF is not merely an alternative; it's a fundamental shift in how we interact with the Linux kernel. It allows us to run sandboxed programs directly within the kernel in response to specific events, such as a packet arriving at a network interface. For Kubernetes networking, this means we can bypass the cumbersome iptables and ipvs machinery entirely.
Instead of traversing linear chains, an eBPF program attached to a network hook (like the Traffic Control ingress/egress hook) can perform a highly efficient O(1) hash map lookup to determine a packet's destination or policy verdict. This is the core of the performance gain. Cilium, a CNI built on eBPF, leverages this to manage service routing, load balancing, and network policy enforcement directly in the kernel, achieving near-native network performance.
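If you want to see these kernel-side structures directly, the cilium CLI inside the agent pod can dump them. A minimal sketch, assuming Cilium runs as the cilium DaemonSet in kube-system (adjust names for your install):
# Dump the eBPF load-balancing map that replaces kube-proxy's iptables chains
kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium bpf lb list
# Dump the IP-to-security-identity cache used for in-kernel policy lookups
kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium bpf ipcache list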
This article is not an introduction to eBPF. It assumes you understand its basic principles. We will dive directly into advanced, production-oriented patterns for leveraging eBPF's capabilities—specifically through Cilium and Tetragon—to solve complex network policy and security observability challenges that are intractable with traditional tools.
We will tackle two specific, advanced scenarios:
* Enforcing L7-aware access control for Kafka with CiliumNetworkPolicy, going far beyond what standard NetworkPolicy objects can express.
* Detecting anomalous process execution inside running containers with Tetragon's kernel-level tracing policies.
Section 1: Advanced L7 Policy Enforcement with CiliumNetworkPolicy
Standard Kubernetes NetworkPolicy is limited to L3/L4 constructs—IP addresses, ports, and pod labels. In a modern microservices architecture, this is often insufficient. We need to control access based on application-layer (L7) attributes, such as HTTP methods (GET, POST), gRPC service calls, or, in our scenario, Kafka API keys.
Scenario: Consider a multi-tenant analytics platform running on Kubernetes. We have a shared Kafka cluster. A service named ingestion-service in the tenant-alpha namespace needs to produce messages to the alpha-telemetry topic. A separate processing-service in the same namespace needs to consume from that topic. Critically, we must enforce the following:
* ingestion-service must only be able to use the Kafka Produce API key on the alpha-telemetry topic.
* ingestion-service must be blocked from consuming or accessing any other topic.
* processing-service must only be able to use the Kafka Fetch API key on the alpha-telemetry topic.
* No other pod in the namespace should be able to communicate with the Kafka brokers.
This requires deep packet inspection of the Kafka protocol, something iptables cannot do.
The Implementation: L7 Kafka Policy
First, let's assume our Kafka brokers are identified by the label app: kafka in the kafka namespace, and our application pods are labeled app: ingestion-service and app: processing-service respectively within the tenant-alpha namespace.
Here is the CiliumNetworkPolicy that enforces these granular rules. Save this as kafka-l7-policy.yaml.
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "kafka-l7-access-control"
  # The policy lives in the brokers' namespace, because endpointSelector
  # only matches endpoints in the policy's own namespace.
  namespace: "kafka"
spec:
  # This policy is applied to the Kafka brokers themselves.
  # It controls what can ingress TO them from selected pods.
  endpointSelector:
    matchLabels:
      app: kafka
  ingress:
    - fromEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": "tenant-alpha"
            "k8s:app": "ingestion-service"
      toPorts:
        - ports:
            - port: "9092"
              protocol: TCP
          rules:
            kafka:
              # "role: produce" covers the Produce API key plus the metadata
              # requests producers need. Role and apiKey are alternatives in
              # a Kafka rule, so only one of them is set here.
              - role: "produce"
                topic: "alpha-telemetry"
    - fromEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": "tenant-alpha"
            "k8s:app": "processing-service"
      toPorts:
        - ports:
            - port: "9092"
              protocol: TCP
          rules:
            kafka:
              # "role: consume" covers the Fetch API key plus the supporting
              # consumer requests.
              - role: "consume"
                topic: "alpha-telemetry"
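Applying it is plain kubectl; a quick sanity check that the CRD was accepted might look like this:
kubectl apply -f kafka-l7-policy.yaml
kubectl -n kafka get ciliumnetworkpolicies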
Deconstructing the eBPF Magic
Let's break down what happens when you kubectl apply -f kafka-l7-policy.yaml:
1. The Cilium agent on each node picks up the new CiliumNetworkPolicy CRD from the Kubernetes API.
2. It generates no iptables rules. Instead, it regenerates the eBPF datapath for the affected endpoints, compiling programs and policy-map entries tailored to this exact policy.
3. Those programs are attached to the endpoints' network hooks (for example, the TC hook on the veth pair that appears as eth0 inside the pod's netns).
4. When the ingestion-service pod connects to a Kafka broker on port 9092, the TCP handshake is allowed. As Kafka protocol data starts to flow, the eBPF datapath sees that an L7 rule applies and transparently redirects the connection to Cilium's node-local, Kafka-aware proxy.
5. The proxy parses the apiKey (e.g., Produce, which is key 0, or Fetch, which is key 1) and the topic name from each request and compares them against the rules defined in the policy.
* If the ingestion-service sends a Produce request for alpha-telemetry, the request is forwarded to the broker.
* If the ingestion-service attempts a Fetch request, or tries to produce to a different topic such as beta-logs, no rule matches and the request is denied; depending on the client, this surfaces as an authorization-style error, a timeout, or a connection reset.
The identity and L3/L4 policy decisions still happen entirely in kernel context, with minimal overhead: the identity of the source pod (ingestion-service) is encoded into a numeric security identity that Cilium manages and stores in an eBPF map, allowing fast, IP-agnostic lookups. Only connections that actually require L7 parsing take the detour through the proxy.
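You can inspect those identities directly from the agent; again a sketch, assuming the cilium DaemonSet in kube-system:
# List the security identities Cilium has allocated and the labels behind them
kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium identity list
# Show local endpoints, their identities, and whether policy enforcement is active
kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium endpoint list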
Verifying Enforcement with Hubble
How do we confirm this is working? This is where Cilium's observability tool, Hubble, becomes indispensable.
Let's try to produce to a forbidden topic from the ingestion-service pod:
# Exec into the ingestion-service pod
kubectl exec -it -n tenant-alpha <ingestion-service-pod> -- bash
# Assuming kafkacat is installed
# This should FAIL (timeout)
echo "test-message" | kafkacat -b kafka.kafka:9092 -t forbidden-topic -P
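For contrast, a produce to the permitted topic from the same shell should succeed, confirming the policy is selective rather than a blanket block:
# This should SUCCEED (allowed by the produce rule for alpha-telemetry)
echo "test-message" | kafkacat -b kafka.kafka:9092 -t alpha-telemetry -P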
Now, let's observe the dropped traffic using the Hubble CLI:
# Forward the hubble-relay port to your local machine
kubectl port-forward -n kube-system svc/hubble-relay 4245:80 &
# Observe dropped traffic from our specific pod
hubble observe --from-pod tenant-alpha/ingestion-service --verdict DROPPED --to-port 9092 -o json | jq
You will see output similar to this:
{
  "flow": {
    "source": {
      "identity": 258,
      "namespace": "tenant-alpha",
      "labels": ["k8s:app=ingestion-service", ...]
    },
    "destination": {
      "identity": 192,
      "namespace": "kafka",
      "labels": ["k8s:app=kafka", ...]
    },
    "verdict": "DROPPED",
    "drop_reason_desc": "POLICY_DENIED",
    "l4": {
      "TCP": {
        "destination_port": 9092
      }
    },
    "l7": {
      "type": "KAFKA",
      "kafka": {
        "api_key": "produce",
        "api_version": 9,
        "correlation_id": 1,
        "topic": "forbidden-topic"
      }
    },
    "policy_match_type": "L7",
    "traffic_direction": "INGRESS"
  }
}
This output is incredibly valuable for debugging. It explicitly shows a DROPPED verdict, a POLICY_DENIED reason, and most importantly, the parsed L7 Kafka data: api_key: "produce" and topic: "forbidden-topic". This confirms that our L7 policy is the precise reason for the drop.
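The allowed path is just as visible. Filtering on FORWARDED verdicts shows the permitted Produce requests with the same parsed L7 detail:
# Confirm the permitted traffic is flowing and matching the L7 rule
hubble observe --from-pod tenant-alpha/ingestion-service --to-port 9092 --verdict FORWARDED -o compact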
Section 2: Kernel-Level Runtime Security with Tetragon
Network policies are powerful, but they are blind to what happens inside a container. A compromised application could execute malicious binaries (curl, wget, ncat) to exfiltrate data over an already-allowed network connection. Traditional security tools might use auditd or ptrace, which can have significant performance overhead. eBPF offers a more performant solution by placing probes directly on kernel functions (kprobes) and system calls.
This is the domain of Tetragon, a Cilium sub-project focused on eBPF-based security observability and runtime enforcement.
Scenario: Let's extend our previous example. The ingestion-service container has a vulnerability and an attacker gains shell access. The container image is minimal and does not include tools like curl. The attacker uses a package manager (apk add curl) or uploads a static binary to install it. They then attempt to use curl to send data to an external metadata service, which is allowed by a broad egress network policy.
We want to detect the execution of /usr/bin/curl within the ingestion-service pod specifically, as this is anomalous behavior.
The Implementation: A Tetragon TracingPolicy
Tetragon uses a CRD called TracingPolicy to define which kernel events to monitor. We'll create a policy that attaches a kprobe to the sys_execve system call, which is invoked whenever a new program is executed.
Save this as detect-curl-exec.yaml.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "detect-anomalous-curl-execution"
spec:
  # Scope the policy to the workload we care about. For strict namespace
  # scoping, the namespaced TracingPolicyNamespaced kind can be used instead.
  podSelector:
    matchLabels:
      app: "ingestion-service"
  kprobes:
    - call: "sys_execve"
      syscall: true
      args:
        - index: 0
          type: "string"
      selectors:
        - matchArgs:
            - index: 0
              operator: "Equal"
              values:
                - "/usr/bin/curl"
          matchBinaries:
            - operator: "In"
              values:
                - "/bin/bash"
                - "/bin/sh"
Deconstructing the eBPF Probe
When you apply this TracingPolicy, the Tetragon agent on the node where the ingestion-service pod is running takes action:
1. It loads an eBPF program and attaches a kprobe to the sys_execve kernel function. This program will now execute every time any process on the host calls execve.
2. The real filtering power lives in the selectors, whose logic is compiled into the eBPF program itself. When the program runs, it checks:
* Is the process being executed named /usr/bin/curl? (matchArgs on index 0)
* Is the calling process /bin/bash or /bin/sh? (matchBinaries)
* Does the process belong to a container backing a pod labeled app: ingestion-service? (the spec-level podSelector). Tetragon maintains eBPF maps that correlate process IDs (PIDs) to Kubernetes metadata, so this check can be applied before any event reaches userspace.
This in-kernel pre-filtering is incredibly efficient. The userspace agent is not sifting through a firehose of all execve calls on the system; it only receives the highly-contextual, pre-filtered events that we care about. This minimizes CPU overhead significantly compared to userspace monitoring agents.
Verifying Detection
First, apply the policy:
kubectl apply -f detect-curl-exec.yaml
Next, tail the logs from the Tetragon agent on the relevant node. You can use the tetra CLI for this:
# Find the tetragon pod on the node where ingestion-service is running
kubectl get pods -n kube-system -o wide | grep tetragon
# Stream the logs in a structured format
kubectl logs -n kube-system -f <tetragon-pod-name> -c export-stdout | tetra getevents -o compact
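If you don't have the tetra binary locally, the copy shipped inside the Tetragon container works just as well (assuming the default kube-system install):
# Stream events using the tetra CLI bundled in the Tetragon DaemonSet
kubectl exec -ti -n kube-system ds/tetragon -c tetragon -- tetra getevents -o compact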
Now, simulate the attack:
# Exec into the pod
kubectl exec -it -n tenant-alpha <ingestion-service-pod> -- /bin/bash
# Install and run curl
apk add curl
curl http://example.com
Almost immediately, an event appears in the compact Tetragon stream:
🚀 process tenant-alpha/ingestion-service-5f... /bin/bash -> /usr/bin/curl http://example.com
This compact output gives us the smoking gun: the exact pod, the parent process (/bin/bash), and the full command that was executed (/usr/bin/curl http://example.com). This is kernel-level ground truth, providing high-fidelity security signals that are extremely difficult to evade.
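Detection is often only the first step. Tetragon selectors also support in-kernel enforcement via matchActions; the sketch below extends the same kprobe with a Sigkill action so the offending exec is terminated rather than merely reported. Treat it as a template to validate against your Tetragon version before enabling enforcement on a production workload.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "block-anomalous-curl-execution"
spec:
  podSelector:
    matchLabels:
      app: "ingestion-service"   # same scoping as the detection policy
  kprobes:
    - call: "sys_execve"
      syscall: true
      args:
        - index: 0
          type: "string"
      selectors:
        - matchArgs:
            - index: 0
              operator: "Equal"
              values:
                - "/usr/bin/curl"
          matchBinaries:
            - operator: "In"
              values:
                - "/bin/bash"
                - "/bin/sh"
          # The only change from the detection policy: kill the process
          # instead of just emitting an event.
          matchActions:
            - action: Sigkill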
Section 3: Advanced Edge Cases and Production Considerations
Implementing these patterns in production requires navigating several advanced topics.
1. Performance Tuning and Map Sizing
eBPF relies heavily on maps to store state (e.g., security identities, policy rules, connection tracking entries). The size of these maps is critical. If a map fills up, the eBPF datapath may fail to create new entries, leading to dropped connections or incorrect policy enforcement.
* Dynamic Sizing: Modern versions of Cilium size the larger BPF maps dynamically based on node memory (controlled by bpf-map-dynamic-size-ratio). However, for clusters with predictable, high-density workloads, you may want to tune values manually in the Cilium ConfigMap; bpf-ct-global-tcp-max and bpf-nat-global-max are common tuning points (see the sketch after this list).
* Monitoring: Monitor map pressure via the Cilium metrics. Look for cilium_bpf_map_pressure. If this metric is consistently high, it's a clear signal that your maps are undersized for your workload.
* CPU Overhead: While eBPF is efficient, it's not free. JIT (Just-In-Time) compilation of eBPF programs adds a small CPU cost. On very old kernels without JIT support, the eBPF interpreter is used, which is significantly slower. Monitor the ksoftirqd kernel threads' CPU usage on your nodes; high usage can sometimes be linked to intense eBPF program activity.
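As referenced above, manual overrides live in the cilium-config ConfigMap. A sketch of the relevant keys follows; the sizes are illustrative placeholders, not recommendations, and should be derived from your observed connection-tracking load.
# Merge these keys into the existing cilium-config ConfigMap (kube-system);
# the Cilium agents must be restarted for new map sizes to take effect.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  bpf-map-dynamic-size-ratio: "0.0025"  # fraction of node memory used to size maps
  bpf-ct-global-tcp-max: "524288"       # illustrative: max TCP conntrack entries
  bpf-nat-global-max: "524288"          # illustrative: max NAT entries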
2. The Kernel Version Gauntlet
eBPF is a rapidly evolving kernel feature. Advanced functionality is often tied to specific kernel versions.
* BTF (BPF Type Format): Features like Tetragon's more advanced kprobes and compile-once-run-everywhere (CO-RE) capabilities rely on BTF, which became standard around kernel 5.4. Running a heterogeneous cluster with some nodes on older kernels (e.g., 4.19) and some on newer ones can lead to inconsistent feature availability. Your deployment automation must account for this; a quick per-node check is sketched after this list.
* eBPF Verifier: The kernel's verifier is a static analysis engine that scrutinizes every eBPF program before it's loaded. It ensures the program is safe—that it will always terminate (no unbounded loops) and won't access memory out of bounds. Writing complex eBPF logic (or relying on a CNI that does) means you are at the mercy of the verifier. A kernel upgrade can sometimes introduce a stricter verifier that rejects a previously valid eBPF program. This is a critical risk to test for before rolling out kernel updates in a production eBPF environment.
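A simple way to audit nodes before rolling out kernel-dependent features is to check the kernel version and whether BTF is exposed. The check below can run from a node shell or a privileged debug pod:
# Kernel version and BTF availability; /sys/kernel/btf/vmlinux only exists
# when the kernel was built with CONFIG_DEBUG_INFO_BTF=y
uname -r
test -f /sys/kernel/btf/vmlinux && echo "BTF: available" || echo "BTF: missing"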
3. Handling `hostNetwork: true` Pods
Pods running with hostNetwork: true (common for node-level agents like monitoring exporters or CNI pods themselves) do not have their own network namespace. They share the host's. This complicates eBPF policy enforcement.
* Attachment Point: For a normal pod, Cilium attaches eBPF programs to the pod's veth pair. For a hostNetwork pod, Cilium must attach its eBPF programs to the physical network device on the host (e.g., eth0).
* Identity Crisis: Cilium's identity-based security model relies on associating traffic with a specific pod identity. For host network pods, traffic appears to originate from the node's IP, not a pod IP. Cilium uses clever eBPF socket-level hooks (cgroup/connect4, cgroup/sendmsg4) to intercept traffic at the connect() or send() syscall and associate it with the correct process and container, thereby deriving its Kubernetes identity. This is more complex and has different performance characteristics than the standard veth approach. Be aware that policies applied to hostNetwork pods are exercising a different, more intricate kernel path. An explicit host-level policy sketch follows below.
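When you do need to police hostNetwork traffic explicitly, Cilium's host firewall applies policy to the node itself via a CiliumClusterwideNetworkPolicy with a nodeSelector. A minimal sketch, assuming the host firewall feature is enabled (hostFirewall.enabled=true in Helm) and using a hypothetical node-role.kubernetes.io/ingest node label; be aware that selecting a node puts its host endpoint into default-deny for the selected direction.
apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "host-allow-cluster-to-kubelet"
spec:
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/ingest: ""   # hypothetical node label
  ingress:
    # Allow in-cluster endpoints to reach the kubelet port; once a node is
    # selected, all other host ingress in this direction is denied by default.
    - fromEntities:
        - cluster
      toPorts:
        - ports:
            - port: "10250"
              protocol: TCP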
Conclusion: The Programmable Kernel is the Future
eBPF is more than just a faster iptables. It represents a paradigm shift towards a programmable kernel, where networking, security, and observability logic can be dynamically and safely inserted at runtime. By leveraging tools like Cilium and Tetragon, senior engineers can solve complex, real-world problems that were previously intractable.
We have demonstrated how to enforce L7-aware Kafka policies and detect anomalous process executions—tasks that go far beyond the capabilities of standard Kubernetes primitives. The key takeaway is that by moving enforcement and observation from userspace agents into highly efficient, sandboxed kernel programs, we gain unprecedented performance, granularity, and security context. As you scale your Kubernetes deployments, mastering these eBPF-native patterns will become not just an advantage, but a necessity for building robust, secure, and high-performance cloud-native systems.