eBPF for Granular L7 Policy Enforcement in Cilium K8s Clusters
Beyond IP and Port: The Imperative for L7 Policy Enforcement
In any non-trivial Kubernetes environment, standard NetworkPolicy resources quickly reveal their limitations. While effective for coarse-grained L3/L4 segmentation (e.g., 'pods with label app=backend can talk to pods with app=database on port 5432'), they are blind to the application-layer context. Modern microservice architectures demand more. We need to enforce rules like: 'Service A can invoke the POST /v1/users endpoint on Service B, but not DELETE /v1/users,' or 'The metrics-scraper can consume from the telemetry Kafka topic, but cannot produce to it.'
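For contrast, this is roughly what that coarse-grained rule looks like as a standard NetworkPolicy (names and labels are illustrative); note that there is simply no field in which to express an HTTP method, path, or Kafka topic:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-to-database
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend
    ports:
    - protocol: TCP
      port: 5432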
Traditionally, achieving this level of control required injecting a sidecar proxy (like Envoy in an Istio service mesh) into every pod. While powerful, this approach introduces significant operational overhead: resource consumption, increased latency due to user-space hops, and complex configuration management.
This is where Cilium's eBPF-based datapath offers a paradigm shift. By attaching eBPF (extended Berkeley Packet Filter) programs to networking hooks directly within the Linux kernel, Cilium enforces L3/L4 policy entirely in kernel space and transparently redirects only the flows that need L7 inspection to a lightweight, node-local proxy it manages. The result is the granularity of a service mesh without injecting a sidecar proxy into every pod. This article dissects the advanced implementation of these policies, focusing on production patterns, edge cases, and performance tuning.
The eBPF Advantage: Why Kernel-Level Enforcement Matters
For senior engineers, the 'why' is as important as the 'how'. The decision to use eBPF over a proxy-based model hinges on three core principles:
* Performance: L3/L4 verdicts are made entirely in the kernel's datapath, and L7 traffic is handled by a shared node-local proxy rather than a sidecar injected into every pod, avoiding the per-pod resource cost and extra user-space hops described above.
* Operational simplicity: there is no sidecar to inject, version, or configure per workload; policy is declared once and enforced by the Cilium agent on each node.
* Observability: because enforcement happens in the datapath, every verdict is visible to tools like cilium monitor and hubble that observe the kernel's actions directly.

Pattern 1: Advanced HTTP-Aware Ingress Control
Let's move from theory to a concrete, production-level scenario. Imagine a multi-service application with a payment-api that exposes several endpoints. We need to enforce the following access rules:
* The frontend-app can initiate new payments via POST /v1/payments.
* The auditing-service can retrieve payment statuses via GET /v1/payments/{id}.
* The auditing-service can also list transactions via GET /v1/payments.
* Crucially, the frontend-app must be blocked from listing all payments, and no service except a future admin-tool should be able to delete payments.
Standard NetworkPolicy is useless here. An L4 policy would have to open port 8080 to both services, granting them full access. With Cilium, we can define these rules with surgical precision.
Implementation with `CiliumNetworkPolicy`
First, let's define the CiliumNetworkPolicy resource. Assume our pods are labeled appropriately (app: payment-api, app: frontend-app, app: auditing-service).
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "payment-api-l7-policy"
  namespace: "production"
spec:
  endpointSelector:
    matchLabels:
      app: payment-api
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend-app
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "POST"
          path: "/v1/payments"
  - fromEndpoints:
    - matchLabels:
        app: auditing-service
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/v1/payments/[0-9a-f-]+" # Regex for UUIDs
        - method: "GET"
          path: "/v1/payments"
Dissecting the Policy
* endpointSelector: This targets the policy to apply to all pods with the app: payment-api label. This is our destination.
* ingress: We define a list of ingress rules.
* fromEndpoints: This specifies the source pods. We have two distinct blocks for frontend-app and auditing-service.
* toPorts and rules.http: This is the core of L7 enforcement. Inside the toPorts block for port 8080, we define http rules.
* For the frontend-app, we explicitly allow only POST requests to the exact path /v1/payments.
* For the auditing-service, we allow GET requests. Note the use of a regular expression in the path: /v1/payments/[0-9a-f-]+. This allows fetching specific payments by their UUID while remaining secure. We also have a separate rule for the list endpoint.
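When the admin-tool mentioned earlier eventually exists, its delete access is just one more ingress block appended to the same policy. A sketch, with the admin-tool label being an assumption since the service does not exist yet:
  # Hypothetical future block: DELETE access for the admin-tool (label assumed)
  - fromEndpoints:
    - matchLabels:
        app: admin-tool
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "DELETE"
          path: "/v1/payments/[0-9a-f-]+"
Until such a block is added, no DELETE request matches any rule, so the policy's default-deny behaviour already satisfies the last requirement.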
Verification and Debugging
How do we confirm this works? We can use cilium monitor to observe the packet verdicts in real-time.
First, execute a permitted request from the frontend-app pod:
# From inside the frontend-app pod
curl -X POST http://payment-api:8080/v1/payments -d '{"amount": 100}'
Now, tail the cilium monitor logs on the node where the payment-api pod is running:
# On the K8s node
sudo cilium monitor -t l7 --related-to <payment-api-endpoint-id>
You will see the request logged with a Forwarded verdict. Now, try a forbidden request from the same frontend-app pod:
# From inside the frontend-app pod
curl -X GET http://payment-api:8080/v1/payments
This request is rejected at the application layer: curl receives an HTTP 403 (Access Denied) generated by the enforcement point, and the request never reaches the payment-api process. The monitor output records the denial (illustrative, abridged):
<- Request http to production/payment-api-7bdfc8c8c4-abcde, verdict Denied GET /v1/payments => 403
This immediate, node-local feedback is invaluable for debugging complex access control rules.
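Before chasing verdicts, it is also worth confirming that the policy is actually being enforced on the target endpoint; the endpoint listing is likewise where you find the numeric endpoint ID used with --related-to above:
# On the node (or inside the local Cilium agent pod)
cilium endpoint list
# The payment-api endpoint should show ingress policy enforcement "Enabled"
# once the CiliumNetworkPolicy has been applied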
Pattern 2: Securing Non-HTTP Traffic - Kafka Protocol Parsing
HTTP is common, but modern systems rely heavily on other L7 protocols like gRPC and Kafka. Cilium's programmable nature allows it to support parsers for these protocols as well. Let's tackle a Kafka scenario.
Scenario: A transactions-service produces events to a raw_transactions Kafka topic. A fraud-detection-service needs to consume from this topic. A separate archival-service also needs to consume. Crucially, the fraud-detection-service must never be allowed to produce to any topic, and no service should be able to access the __consumer_offsets topic directly.
Kafka-Aware `CiliumNetworkPolicy`
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "kafka-broker-policy"
  namespace: "streaming"
spec:
  endpointSelector:
    matchLabels:
      app: kafka-broker
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: transactions-service
    toPorts:
    - ports:
      - port: "9092"
        protocol: TCP
      rules:
        kafka:
        - role: "produce"
          topic: "raw_transactions"
  - fromEndpoints:
    # Select both consumer services with a single rule
    - matchLabels:
        'k8s:io.kubernetes.pod.namespace': streaming
      matchExpressions:
      - key: app
        operator: In
        values:
        - fraud-detection-service
        - archival-service
    toPorts:
    - ports:
      - port: "9092"
        protocol: TCP
      rules:
        kafka:
        - role: "consume"
          topic: "raw_transactions"
        # Consumers also need API version and metadata lookups,
        # plus Fetch scoped to the allowed topic
        - apiKey: "Metadata"
        - apiKey: "ApiVersions"
        - apiKey: "Fetch"
          topic: "raw_transactions"
Dissecting the Kafka Policy
* rules.kafka: This block activates Cilium's Kafka protocol parser.
* role: A high-level abstraction for produce or consume actions, which maps to multiple underlying Kafka API keys (e.g., produce implies ProduceRequest, MetadataRequest, etc.).
* topic: The specific Kafka topic the rule applies to. Cilium's Kafka parser reads the request header to extract the topic name and compares it against the policy.
* apiKey: For more granular control, you can specify the exact Kafka API key (e.g., Fetch, Produce, ListOffsets). In our consumer rule, we additionally allow Metadata and ApiVersions (which are not topic-scoped) so the client can bootstrap, and we scope the Fetch key to the raw_transactions topic so reads stay confined to the data the policy intends to expose.
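A quick negative test from one of the consumer pods should confirm the produce restriction. A sketch, assuming the standard Kafka console tooling is available in the pod and that the broker is reachable at the service name below:
# From inside a fraud-detection-service pod (broker address and tooling are assumptions)
kafka-console-producer.sh --broker-list kafka-broker.streaming.svc.cluster.local:9092 --topic raw_transactions
# Typing a message should fail: this service is only granted the consume role,
# so Cilium rejects the Produce request (typically surfacing as a topic
# authorization error on the client side)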
Under the Hood: How Kafka Enforcement Works
The performance of this feature is critical, so it is worth being precise about the mechanism. The eBPF datapath itself does not parse Kafka. When a connection to port 9092 is covered by a policy containing kafka rules, the eBPF program marks the flow for L7 inspection and transparently redirects it to a node-local proxy managed by the Cilium agent. That proxy reads only enough of the stream to parse the Kafka request header, which carries the API key and topic name, and issues a per-request verdict; denied requests are answered with a Kafka error instead of being forwarded to the broker. Because the proxy is shared per node rather than injected per pod, and only the request header needs to be parsed to reach a verdict, the overhead stays well below that of a sidecar-per-pod service mesh.
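To watch these per-request verdicts as they happen, follow the L7 monitor events on the node hosting the broker (the endpoint ID comes from cilium endpoint list):
# On the node running the kafka-broker pod
sudo cilium monitor -t l7 --related-to <kafka-broker-endpoint-id>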
Pattern 3: Dynamic Egress Control with DNS-Aware Policies
Controlling ingress is half the battle. Egress control, especially to external services, presents a different challenge. IP-based egress rules are brittle; the IP addresses of cloud provider APIs or SaaS platforms change constantly. The correct approach is to define policies based on Fully Qualified Domain Names (FQDNs).
Scenario: A notification-service needs to send emails via the SendGrid API (api.sendgrid.com) and push notifications via OneSignal (onesignal.com). It must be blocked from making any other external network calls to prevent data exfiltration.
FQDN-Based `CiliumNetworkPolicy`
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "notification-service-egress"
  namespace: "production"
spec:
  endpointSelector:
    matchLabels:
      app: notification-service
  egress:
  # Rule 1: Allow DNS lookups to kube-dns and let Cilium's DNS proxy observe them
  - toEndpoints:
    - matchLabels:
        'k8s:io.kubernetes.pod.namespace': kube-system
        'k8s:k8s-app': kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: UDP
      rules:
        dns:
        # Required for toFQDNs below: the DNS proxy learns FQDN-to-IP
        # mappings only for queries matched by a dns rule
        - matchPattern: "*"
  # Rule 2: Allow traffic to the resolved IPs of the FQDNs
  - toFQDNs:
    - matchName: "api.sendgrid.com"
    - matchName: "onesignal.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
The Magic of DNS Snooping
This policy seems simple, but its implementation is sophisticated. How does Cilium know which IPs correspond to api.sendgrid.com?
1. The first egress rule explicitly allows the notification-service to send DNS queries to kube-dns, and its dns rule tells the eBPF datapath to transparently redirect that traffic through Cilium's local DNS proxy.
2. When the notification-service pod performs a DNS lookup for api.sendgrid.com, the proxy forwards the query to kube-dns and parses the response, extracting the FQDN and its corresponding IP addresses.
3. The Cilium agent records that FQDN-to-IP mapping and plumbs the allowed IPs into the datapath's policy state (visible in the ipcache).
4. When the notification-service then tries to open a TCP connection to one of these resolved IPs on port 443, the eBPF program on the egress path checks the destination IP against that state. If the IP is associated with an allowed FQDN from the policy, the connection is permitted. If not, it's dropped.

Edge Cases and Performance Considerations
* DNS TTL: What happens when the DNS record's TTL expires and the IP changes? Cilium honors the TTL from the DNS response. When a mapping expires, it is eventually removed, and connections to the old IP will start being dropped until the application performs a fresh DNS query, which Cilium intercepts to repopulate the cache. You can also set a minimum TTL on the agent (the tofqdns-min-ttl option) to prevent cache thrashing from very low-TTL records.
* Cache Staleness: In rare cases, the cache can become stale. Debugging involves inspecting the cache directly with cilium bpf ipcache list to see if the IP your application is trying to reach is present and associated with the correct FQDN; see the commands after this list.
* Wildcards: Cilium supports wildcard patterns via matchPattern (e.g., matchPattern: "*.google.com"), which is powerful but requires careful consideration. A rule for *.google.com will match maps.google.com but also potentially less desirable endpoints under that domain. Use the most specific names possible.
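When staleness is suspected, compare the agent's view of learned DNS names with what has actually been plumbed into the datapath (run from the Cilium agent pod on the relevant node):
# Inside the Cilium agent pod on the node running notification-service
cilium fqdn cache list      # FQDN-to-IP mappings learned by the DNS proxy
cilium bpf ipcache list     # IP-to-identity entries the datapath enforces against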
Performance Profiling and Troubleshooting
Assuming everything works is a junior mindset. Senior engineers plan for failure and performance degradation. When you're operating at the kernel level, you need kernel-level tools.
Measuring Overhead with `bpftool`
How much CPU time are your eBPF programs actually consuming? The bpftool utility is your ground truth.
To identify all eBPF programs loaded by Cilium and profile them:
# List all loaded BPF programs
sudo bpftool prog list
# Find the ID of Cilium's container-ingress datapath program
# (program names vary by Cilium version; recent releases use cil_from_container)
PROG_ID=$(sudo bpftool prog show | grep cil_from_container | head -n1 | cut -d: -f1)
# Profile that program for 10 seconds, counting cycles and instructions
sudo bpftool prog profile id $PROG_ID duration 10 cycles instructions
This outputs how many times the program ran during the window and the CPU cycles and instructions it consumed, giving you a per-program cost figure for the eBPF datapath. Note that HTTP path regexes are not evaluated here: L7 rules are enforced in Cilium's node-local proxy, so a poorly written regex shows up as CPU in the Cilium agent and its proxy rather than in the eBPF programs. If you notice performance degradation after applying a complex L7 policy, check both sides.
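Because L7 parsing happens in the node-local proxy, a complementary check is the CPU usage of the Cilium agent pods themselves before and after a policy rollout. A rough sketch, assuming the default k8s-app=cilium DaemonSet label and a metrics-server installation:
# CPU/memory of the Cilium agents (which host the embedded L7 proxy in default installs)
kubectl -n kube-system top pod -l k8s-app=cilium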
Debugging Policy Drops with Hubble
While cilium monitor is great for real-time text logs, Hubble provides a powerful UI and CLI for observing and filtering network flows. When a request is failing and you don't know why, Hubble can pinpoint the exact policy rule responsible.
# Observe all dropped traffic destined for the payment-api pod
hubble observe --namespace production --to-pod payment-api --verdict DROPPED -f
The output will provide rich context (illustrative, abridged):
TIME SOURCE -> DESTINATION TYPE VERDICT
May 20 10:30:15.123 UTC frontend-app-xyz -> payment-api-abc http-request DROPPED (L7 policy denied)
# Detailed view
Summary: HTTP/1.1 GET /v1/payments
Policy Decision: Ingress DENIED from identity 12345 to 67890 by rule payment-api-l7-policy
This tells you not just that it was dropped, but specifically that it was an L7 policy denial due to the payment-api-l7-policy rule. This level of observability is non-negotiable in a production environment.
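Hubble can also filter on L7 flow records directly, which is often quicker than scanning all drops when you only care about HTTP behaviour; a sketch, assuming a reasonably recent Hubble CLI:
# Follow only L7 (HTTP/Kafka/DNS) flow events destined for payment-api
hubble observe --namespace production --to-pod payment-api --type l7 -f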
Final Considerations for Production Environments
* Kernel Version: eBPF's capabilities are tightly coupled to the Linux kernel version. Always run a recent, stable kernel (5.4+ is a good baseline) to ensure access to the necessary eBPF features and verifier improvements.
* Incremental Rollout: Never apply complex L7 policies cluster-wide at once. Roll them out namespace by namespace or even service by service. Use monitoring and alerting to watch for unexpected drops or latency spikes.
* Policy as Code: Store your CiliumNetworkPolicy manifests in Git and manage them via a GitOps workflow (e.g., with ArgoCD or Flux). This provides an audit trail and a single source of truth for your network security posture.
By moving beyond L3/L4 and embracing eBPF-powered L7 enforcement, you can build a far more secure, efficient, and observable microservices architecture in Kubernetes. This approach provides the fine-grained control needed for a true zero-trust network model without the performance and complexity trade-offs of traditional service mesh sidecars.