eBPF for Granular L7 Policy Enforcement in Cilium K8s Clusters

Goh Ling Yong

Beyond IP and Port: The Imperative for L7 Policy Enforcement

In any non-trivial Kubernetes environment, standard NetworkPolicy resources quickly reveal their limitations. While effective for coarse-grained L3/L4 segmentation (e.g., 'pods with label app=backend can talk to pods with app=database on port 5432'), they are blind to the application-layer context. Modern microservice architectures demand more. We need to enforce rules like: 'Service A can invoke the POST /v1/users endpoint on Service B, but not DELETE /v1/users,' or 'The metrics-scraper can consume from the telemetry Kafka topic, but cannot produce to it.'
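
For contrast, here is roughly what the quoted L3/L4 rule looks like as a vanilla NetworkPolicy (the namespace and labels are illustrative). Note that the API simply has no field in which to express an HTTP method, a URL path, or a Kafka topic:

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-database   # illustrative name
  namespace: production             # illustrative namespace
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend
    ports:
    - protocol: TCP
      port: 5432   # L4 is as deep as this API can see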

Traditionally, achieving this level of control required injecting a sidecar proxy (like Envoy in an Istio service mesh) into every pod. While powerful, this approach introduces significant operational overhead: resource consumption, increased latency due to user-space hops, and complex configuration management.

This is where Cilium's eBPF-based datapath offers a paradigm shift. By attaching eBPF (extended Berkeley Packet Filter) programs to networking hooks directly within the Linux kernel, Cilium can parse, filter, and make policy decisions on L7 traffic before it ever reaches the application. This kernel-level enforcement provides the granularity of a service mesh without the performance tax of a per-pod sidecar proxy. This article dissects the advanced implementation of these policies, focusing on production patterns, edge cases, and performance tuning.

The eBPF Advantage: Why Kernel-Level Enforcement Matters

For senior engineers, the 'why' is as important as the 'how'. The decision to use eBPF over a proxy-based model hinges on three core principles:

  • Performance: eBPF programs are JIT-compiled into native machine code and executed in kernel space. This eliminates the expensive context switches and memory copies required to shuttle packets between the kernel and a user-space proxy. For latency-sensitive applications, this can mean the difference between meeting and missing SLOs. The overhead of an L7 eBPF policy is measured in microseconds, whereas a sidecar proxy often adds milliseconds.
  • Security: eBPF programs undergo a rigorous verification process by the kernel's BPF verifier before being loaded. The verifier performs a static analysis to ensure the program is safe to run—it checks for unbounded loops, out-of-bounds memory access, and other potential instabilities. This provides a strong security guarantee that a user-space proxy, which can crash or be exploited like any other application, cannot offer at the same level.
  • Simplicity & Transparency: The application remains completely unaware of eBPF enforcement. There are no sidecars to inject, no process changes, and no modifications to Kubernetes deployments. The networking logic is transparently applied by the CNI, simplifying the operational model. Debugging is also more direct, using tools like cilium monitor and hubble that observe the kernel's actions directly.

    Pattern 1: Advanced HTTP-Aware Ingress Control

    Let's move from theory to a concrete, production-level scenario. Imagine a multi-service application with a payment-api that exposes several endpoints. We need to enforce the following access rules:

    * The frontend-app can initiate new payments via POST /v1/payments.

    * The auditing-service can retrieve payment statuses via GET /v1/payments/{id}.

    * The auditing-service can also list transactions via GET /v1/payments.

    * Crucially, the frontend-app must be blocked from listing all payments, and no service except a future admin-tool should be able to delete payments.

    Standard NetworkPolicy is useless here. An L4 policy would have to open port 8080 to both services, granting them full access. With Cilium, we can define these rules with surgical precision.

    Implementation with `CiliumNetworkPolicy`

    First, let's define the CiliumNetworkPolicy resource. Assume our pods are labeled appropriately (app: payment-api, app: frontend-app, app: auditing-service).

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "payment-api-l7-policy"
      namespace: "production"
    spec:
      endpointSelector:
        matchLabels:
          app: payment-api
      ingress:
      - fromEndpoints:
        - matchLabels:
            app: frontend-app
        toPorts:
        - ports:
          - port: "8080"
            protocol: TCP
          rules:
            http:
            - method: "POST"
              path: "/v1/payments"
      - fromEndpoints:
        - matchLabels:
            app: auditing-service
        toPorts:
        - ports:
          - port: "8080"
            protocol: TCP
          rules:
            http:
            - method: "GET"
              path: "/v1/payments/[0-9a-f-]+" # Regex for UUIDs
            - method: "GET"
              path: "/v1/payments"

    Dissecting the Policy

    * endpointSelector: This targets the policy to apply to all pods with the app: payment-api label. This is our destination.

    * ingress: We define a list of ingress rules.

    * fromEndpoints: This specifies the source pods. We have two distinct blocks for frontend-app and auditing-service.

    * toPorts and rules.http: This is the core of L7 enforcement. Inside the toPorts block for port 8080, we define http rules.

    * For the frontend-app, we explicitly allow only POST requests to the exact path /v1/payments.

    * For the auditing-service, we allow GET requests. Note the use of a regular expression in the path: /v1/payments/[0-9a-f-]+. This allows fetching specific payments by their UUID while remaining secure. We also have a separate rule for the list endpoint.
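
    The scenario above also reserves DELETE for a future admin-tool. Here is a hedged sketch of the extra ingress block that could be appended to the same policy once that service exists (the app: admin-tool label is an assumption, since no such deployment is defined here):

    yaml
    # Hypothetical future rule: only admin-tool may delete individual payments
    - fromEndpoints:
      - matchLabels:
          app: admin-tool
      toPorts:
      - ports:
        - port: "8080"
          protocol: TCP
        rules:
          http:
          - method: "DELETE"
            path: "/v1/payments/[0-9a-f-]+"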

    Verification and Debugging

    How do we confirm this works? We can use cilium monitor to observe the packet verdicts in real-time.

    First, execute a permitted request from the frontend-app pod:

    bash
    # From inside the frontend-app pod
    curl -X POST http://payment-api:8080/v1/payments -d '{"amount": 100}'

    Now, tail the cilium monitor logs on the node where the payment-api pod is running:

    bash
    # On the K8s node (find the endpoint ID with 'cilium endpoint list')
    sudo cilium monitor -t policy-verdict --related-to <payment-api-endpoint-id>

    You will see allow verdicts for the flow. Now, try a forbidden request from the same frontend-app pod:

    bash
    # From inside the frontend-app pod
    curl -X GET http://payment-api:8080/v1/payments

    This request will be rejected, and the cilium monitor output will show the reason:

    text
    -> production/payment-api-7bdfc8c8c4-abcde: L7 policy violation
    Policy verdict: DENIED reason: "L7 policy drop for HTTP GET /v1/payments"

    This immediate, kernel-level feedback is invaluable for debugging complex access control rules.


    Pattern 2: Securing Non-HTTP Traffic - Kafka Protocol Parsing

    HTTP is common, but modern systems rely heavily on other L7 protocols like gRPC and Kafka. Cilium's programmable nature allows it to support parsers for these protocols as well. Let's tackle a Kafka scenario.

    Scenario: A transactions-service produces events to a raw_transactions Kafka topic. A fraud-detection-service needs to consume from this topic. A separate archival-service also needs to consume. Crucially, the fraud-detection-service must never be allowed to produce to any topic, and no service should be able to access the __consumer_offsets topic directly.

    Kafka-Aware `CiliumNetworkPolicy`

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "kafka-broker-policy"
      namespace: "streaming"
    spec:
      endpointSelector:
        matchLabels:
          app: kafka-broker
      ingress:
      - fromEndpoints:
        - matchLabels:
            app: transactions-service
        toPorts:
        - ports:
          - port: "9092"
            protocol: TCP
          rules:
            kafka:
            - role: "produce"
              topic: "raw_transactions"
      - fromEndpoints:
        # Select both consumer services with a single ingress rule
        - matchLabels:
            app: fraud-detection-service
        - matchLabels:
            app: archival-service
        toPorts:
        - ports:
          - port: "9092"
            protocol: TCP
          rules:
            kafka:
            - role: "consume"
              topic: "raw_transactions"
            # Consumer clients also need metadata lookups and version negotiation;
            # deliberately no bare "Fetch" rule, which would allow fetching any topic
            - apiKey: "Metadata"
            - apiKey: "ApiVersions"

    Dissecting the Kafka Policy

    * rules.kafka: This block activates Cilium's Kafka protocol parser.

    * role: A high-level abstraction for produce or consume actions, which maps to multiple underlying Kafka API keys (e.g., produce implies ProduceRequest, MetadataRequest, etc.).

    * topic: The specific Kafka topic the rule applies to. Cilium's eBPF program parses the Kafka message header to extract the topic name and compares it against the policy.

    * apiKey: For more granular control, you can specify exact Kafka API keys (e.g., Fetch, Produce, ListOffsets). In our consumer rule, we explicitly allow Metadata and ApiVersions so the consumer clients can bootstrap, while the consume role keeps Fetch scoped to the raw_transactions topic. A bare apiKey: "Fetch" rule with no topic would have allowed fetching from any topic, including __consumer_offsets.
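
    If you prefer explicit API keys over the role abstraction, the same consumer intent can be sketched as follows (reusing the labels and topic from above). Keep in mind that real consumer groups also need the group-coordination API keys (JoinGroup, SyncGroup, OffsetCommit, OffsetFetch, and so on), which is exactly why the consume role is usually the safer choice:

    yaml
    rules:
      kafka:
      # Fetch stays scoped to the topic; a bare Fetch rule would allow any topic
      - apiKey: "Fetch"
        topic: "raw_transactions"
      - apiKey: "Metadata"
      - apiKey: "ApiVersions"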

    Under the Hood: Zero-Copy Parsing

    The performance of this feature is critical. When a TCP packet destined for port 9092 arrives at the network interface, the eBPF program attached to the socket buffer (sk_buff) is triggered. It reads just enough of the TCP payload to parse the Kafka request header, which contains the API key and topic name. It does not need to copy the entire message payload into a separate buffer or send it to user space. The decision is made on the fly, and if denied, the packet is dropped right there. This zero-copy, kernel-space approach is what enables high-throughput Kafka security without performance degradation.


    Pattern 3: Dynamic Egress Control with DNS-Aware Policies

    Controlling ingress is half the battle. Egress control, especially to external services, presents a different challenge. IP-based egress rules are brittle; the IP addresses of cloud provider APIs or SaaS platforms change constantly. The correct approach is to define policies based on Fully Qualified Domain Names (FQDNs).

    Scenario: A notification-service needs to send emails via the SendGrid API (api.sendgrid.com) and push notifications via OneSignal (onesignal.com). It must be blocked from making any other external network calls to prevent data exfiltration.

    FQDN-Based `CiliumNetworkPolicy`

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "notification-service-egress"
      namespace: "production"
    spec:
      endpointSelector:
        matchLabels:
          app: notification-service
      egress:
      # Rule 1: Allow DNS lookups to kube-dns
      - toEndpoints:
        - matchLabels:
            'k8s:io.kubernetes.pod.namespace': kube-system
            'k8s:k8s-app': kube-dns
        toPorts:
        - ports:
          - port: "53"
            protocol: ANY
          rules:
            dns:
            # L7 DNS rule: lets Cilium observe lookups so the FQDN rule below can map IPs
            - matchPattern: "*"
      # Rule 2: Allow traffic to the resolved IPs of the FQDNs
      - toFQDNs:
        - matchName: "api.sendgrid.com"
        - matchName: "onesignal.com"
        toPorts:
        - ports:
          - port: "443"
            protocol: TCP

    The Magic of DNS Snooping

    This policy seems simple, but its implementation is sophisticated. How does Cilium know which IPs correspond to api.sendgrid.com?

  • DNS Interception: The first egress rule explicitly allows the notification-service to send DNS queries to kube-dns, and its dns rule tells Cilium to observe every query and response on that path.
  • Dynamic IP Mapping: When the notification-service pod performs a DNS lookup for api.sendgrid.com, the eBPF program on its node intercepts the DNS response from kube-dns. It parses this response to extract the FQDN and its corresponding IP addresses.
  • IP Cache Population: This FQDN-to-IP mapping is stored in a local, highly-efficient eBPF map on the node (often called the ipcache).
  • Policy Enforcement: When the notification-service then tries to open a TCP connection to one of these resolved IPs on port 443, another eBPF program attached to the socket connect hook checks the destination IP. It performs a lookup in the ipcache. If the IP is associated with an allowed FQDN from the policy, the connection is permitted. If not, it's dropped.

    Edge Cases and Performance Considerations

    * DNS TTL: What happens when the DNS record's TTL expires and the IP changes? Cilium honors the TTL from the DNS response. When a mapping in its ipcache expires, it is removed. The next connection attempt to a stale or not-yet-mapped IP will be denied until the application performs a fresh DNS query, which Cilium will then snoop to repopulate the cache. You can also configure a minimum TTL in Cilium to prevent cache thrashing from low-TTL records.

    * Cache Staleness: In rare cases, the cache could become stale. Debugging involves inspecting the cache directly with cilium bpf ipcache list to see if the IP your application is trying to reach is present and associated with the correct FQDN.

    * Wildcards: Cilium supports wildcard patterns such as *.google.com (expressed with matchPattern rather than matchName), which is powerful but requires careful consideration. A rule for *.google.com will match maps.google.com but also any other subdomain, including less desirable endpoints. Use the most specific names possible.
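
    When a wildcard really is required, it is expressed with matchPattern. A minimal sketch of such an egress block, reusing the SendGrid example (whether you actually want every subdomain is an assumption you should verify):

    yaml
    - toFQDNs:
      - matchName: "api.sendgrid.com"    # exact FQDN
      - matchPattern: "*.sendgrid.com"   # any subdomain of sendgrid.com
      toPorts:
      - ports:
        - port: "443"
          protocol: TCP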


    Performance Profiling and Troubleshooting

    Assuming everything works is a junior mindset. Senior engineers plan for failure and performance degradation. When you're operating at the kernel level, you need kernel-level tools.

    Measuring Overhead with `bpftool`

    How much CPU time are your eBPF programs actually consuming? The bpftool utility is your ground truth.

    To identify all eBPF programs loaded by Cilium and profile them:

    bash
    # List all loaded BPF programs
    sudo bpftool prog list
    
    # Find the ID of one of Cilium's datapath programs; the program name varies by
    # Cilium version (e.g., cil_from_container in recent releases)
    PROG_ID=$(sudo bpftool prog show name cil_from_container | head -n1 | cut -d: -f1)

    # Profile that program for 10 seconds, counting cycles and instructions
    sudo bpftool prog profile id $PROG_ID duration 10 cycles instructions

    This will output how many times the program ran during the sampling window and how many CPU cycles and instructions it consumed. If you notice performance degradation after applying a complex L7 policy, this is the first place to look. You might find, for example, that a complex path regex in an HTTP rule is adding measurable per-request CPU cost.

    Debugging Policy Drops with Hubble

    While cilium monitor is great for real-time text logs, Hubble provides a powerful UI and CLI for observing and filtering network flows. When a request is failing and you don't know why, Hubble can pinpoint the exact policy rule responsible.

    bash
    # Observe all dropped traffic destined for the payment-api pod
    hubble observe --namespace production --to-pod payment-api --verdict DROPPED -f

    The output will provide rich context:

    text
    TIME                           SOURCE -> DESTINATION                     TYPE            VERDICT
    May 20 10:30:15.123 UTC        frontend-app-xyz -> payment-api-abc       http-request    DROPPED (L7 policy denied)
    
    # Detailed view
    Summary: HTTP/1.1 GET /v1/payments
    Policy Decision: Ingress DENIED from identity 12345 to 67890 by rule payment-api-l7-policy

    This tells you not just that it was dropped, but specifically that it was an L7 policy denial due to the payment-api-l7-policy rule. This level of observability is non-negotiable in a production environment.

    Final Considerations for Production Environments

    * Kernel Version: eBPF's capabilities are tightly coupled to the Linux kernel version. Always run a recent, stable kernel (5.4+ is a good baseline) to ensure access to the necessary eBPF features and verifier improvements.

    * Incremental Rollout: Never apply complex L7 policies cluster-wide at once. Roll them out namespace by namespace or even service by service. Use monitoring and alerting to watch for unexpected drops or latency spikes; Cilium's policy audit mode (sketched below) lets you observe would-be drops before enforcing.

    * Policy as Code: Store your CiliumNetworkPolicy manifests in Git and manage them via a GitOps workflow (e.g., with ArgoCD or Flux). This provides an audit trail and a single source of truth for your network security posture.
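
    On the incremental-rollout point above, Cilium's policy audit mode is worth knowing: affected endpoints log would-be denials instead of enforcing them, so you can watch Hubble for violations before flipping to enforcement. A minimal sketch of enabling it cluster-wide by merging one key into the existing cilium-config ConfigMap (key name as documented for recent Cilium releases; verify it against your version, and prefer the corresponding Helm value in a GitOps setup):

    yaml
    # Fragment to merge into the existing cilium-config ConfigMap, not a full replacement
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cilium-config
      namespace: kube-system
    data:
      policy-audit-mode: "true"   # violations are logged (visible in Hubble) but not dropped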

    By moving beyond L3/L4 and embracing eBPF-powered L7 enforcement, you can build a far more secure, efficient, and observable microservices architecture in Kubernetes. This approach provides the fine-grained control needed for a true zero-trust network model without the performance and complexity trade-offs of traditional service mesh sidecars.
