Advanced eBPF: Granular L7 Network Policies in Cilium

Goh Ling Yong

The L3/L4 Limitation: Why Senior Engineers Need More

Standard Kubernetes NetworkPolicy objects are a foundational element of cluster security, but they operate primarily at L3 (IP address) and L4 (port). For any seasoned engineer architecting a non-trivial microservices application, this is insufficient. A policy that allows traffic from pod-A to pod-B on port 8080 is a blunt instrument. It cannot distinguish between a legitimate GET /api/v1/users request and a malicious POST /api/v1/admin/delete_all request sent over the same connection. This gap forces security logic up into the application layer or necessitates a full-blown service mesh, often with significant operational and performance overhead.

This is where the paradigm shifts. By leveraging eBPF, Cilium pushes sophisticated L7 awareness directly into the Linux kernel. This isn't just about writing a different kind of YAML; it's a fundamental change in how and where network policy is enforced. This article is not an introduction to Cilium. It assumes you understand CNI basics and the concept of identity-based security. Our goal is to dissect the implementation details, performance characteristics, and advanced production patterns for L7 policies, enabling you to build truly zero-trust networks.


Under the Hood: The eBPF and Proxy Symbiosis

To effectively use and debug L7 policies, you must understand how Cilium processes a packet. It's a common misconception that all traffic is forced through a userspace proxy. The reality is an elegant, performance-oriented dance between eBPF programs in the kernel and a proxy (Envoy) in userspace.

  • Initial Packet Interception: When a packet leaves a pod, it traverses the virtual ethernet (veth) pair connecting the pod's network namespace to the host. Cilium attaches an eBPF program (often bpf_lxc.c) to the Traffic Control (TC) hook on the host side of this veth pair. This program executes for every single packet.
  • Identity-Based Verdict (L3/L4): The eBPF program's first job is to perform a highly efficient L3/L4 policy check. It extracts the source security identity (a numeric ID Cilium assigns based on pod labels) and the destination IP/port. It then performs a lookup in an eBPF map containing the allowed identity pairs for that destination. If the policy denies the connection at this level, the packet is dropped immediately in the kernel. This is incredibly fast and accounts for the bulk of policy enforcement.
  • The L7 Redirect Decision: If the L3/L4 policy allows the connection, the eBPF program checks whether a more specific L7 policy applies to this destination port. If and only if an L7 rule (e.g., an HTTP rule) exists, the traffic is redirected to a node-local Envoy proxy managed by the Cilium agent. If no L7 rule exists, the packet is forwarded directly to its destination, incurring no proxy overhead.

This "proxy-on-demand" architecture is Cilium's key performance advantage over a traditional service mesh sidecar model, where all traffic unconditionally passes through a proxy, regardless of whether L7 inspection is needed.

    Here is a conceptual C-like snippet illustrating the logic within the eBPF TC program:

    c
    // Conceptual sketch of the eBPF logic at the TC hook (illustrative only;
    // the real implementation lives in bpf_lxc.c and is considerably richer)
    int handle_packet(struct __sk_buff *skb) {
        // 1. Extract metadata: source security identity, destination IP/port
        __u32 src_identity = get_source_identity(skb);
        struct policy_key key = {
            .identity = src_identity,
            .dst      = get_destination_key(skb),
        };

        // 2. Perform the L3/L4 policy lookup in a BPF map keyed on (identity, destination)
        struct policy_verdict *verdict = bpf_map_lookup_elem(&policy_map, &key);

        // 3. Check for an L7 redirect requirement
        if (verdict && verdict->requires_l7_parse) {
            // Redirect to the userspace proxy (e.g., Envoy) for deep inspection.
            // This is a complex operation involving socket-level redirection.
            return redirect_to_proxy(skb, verdict->proxy_port);
        } else if (verdict && verdict->is_allowed) {
            // L3/L4 policy allows and no L7 rule applies: forward directly.
            return TC_ACT_OK;
        }

        // Policy denies at L3/L4: drop the packet in the kernel.
        return TC_ACT_SHOT;
    }

    This efficiency is paramount: a high-throughput database connection that only needs L4 policy never pays the latency or CPU cost of proxy serialization/deserialization.
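
    To see the contrast in policy terms, here is a minimal sketch (the service names, labels, and ports are illustrative, not from a real deployment): the first egress rule carries an HTTP rule and therefore sends matching flows through the proxy, while the second is pure L3/L4 and stays entirely in the eBPF fast path.

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "proxy-only-where-needed"
      namespace: production
    spec:
      endpointSelector:
        matchLabels:
          app: orders-service
      egress:
      # L7 rule present: flows to the inventory API are redirected to the
      # node-local Envoy proxy so the method and path can be inspected
      - toEndpoints:
        - matchLabels:
            app: inventory-service
        toPorts:
        - ports:
          - port: "8080"
            protocol: TCP
          rules:
            http:
            - method: "GET"
              path: "/v1/stock/.*"
      # No L7 rule: flows to the database get an in-kernel L3/L4 verdict
      # and never touch the proxy
      - toEndpoints:
        - matchLabels:
            app: orders-db
        toPorts:
        - ports:
          - port: "5432"
            protocol: TCP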


    Advanced L7 Policy Implementation Patterns

    Let's move from theory to production-grade implementation. Basic examples are insufficient for real-world scenarios involving API gateways, diverse L7 protocols, and complex access control.

    Scenario 1: Securing a Microservices API Gateway

    Consider a common pattern: an api-gateway pod routes requests to various backend services. We need to enforce that the gateway can only call specific, sanctioned API endpoints on each backend.

    Problem:

    * An api-gateway pod needs to communicate with billing-service and user-service.

    * billing-service should only accept POST /v1/charge and GET /v1/invoices/{uuid}.

    * user-service should only accept GET /v1/users/{id} and PUT /v1/users/{id}.

    * All other paths on these services should be blocked, even from the gateway.

    Solution: A single CiliumNetworkPolicy applied to the api-gateway can enforce this with precise egress rules.

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "api-gateway-backend-l7-policy"
      namespace: production
    spec:
      # Apply this policy to the api-gateway pods
      endpointSelector:
        matchLabels:
          app: api-gateway
    
      # Define egress (outbound) rules
      egress:
        # Rule 1: Allow traffic to billing-service
        - toEndpoints:
          - matchLabels:
              app: billing-service
          toPorts:
          - ports:
            - port: "8080"
              protocol: TCP
            rules:
              http:
              - method: "POST"
                path: "/v1/charge"
              - method: "GET"
                # Path matching supports regex for dynamic segments
                path: "/v1/invoices/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
    
        # Rule 2: Allow traffic to user-service
        - toEndpoints:
          - matchLabels:
              app: user-service
          toPorts:
          - ports:
            - port: "9090"
              protocol: TCP
            rules:
              http:
              - method: "GET"
                path: "/v1/users/[0-9]+"
              - method: "PUT"
                path: "/v1/users/[0-9]+"
    
        # Rule 3: Allow DNS lookups (CRITICAL for production)
        - toEndpoints:
          - matchLabels:
              "k8s:io.kubernetes.pod.namespace": kube-system
              "k8s:k8s-app": kube-dns
          toPorts:
          - ports:
            - port: "53"
              protocol: UDP
            rules:
              dns:
              - matchPattern: "*"

    Analysis of Advanced Features:

    * Endpoint Selectors: The policy is anchored to the source (api-gateway) via endpointSelector and targets destinations (billing-service, user-service) via toEndpoints. This is identity-based, not IP-based, so it's resilient to pod churn.

    * Regex in Paths: We use a precise regex for the UUID in the invoices path and a simpler one for the user ID. This demonstrates the flexibility needed for modern REST APIs. A bare wildcard such as ".*" would be far too permissive.

    * Explicit DNS Rule: Forgetting to allow DNS is a classic mistake that causes catastrophic, hard-to-debug failures. We explicitly allow UDP port 53 to kube-dns pods. This is a non-negotiable production requirement.
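
    In locked-down environments you can go further than matchPattern: "*" and restrict which names may even be resolved. Below is a hedged drop-in variant of Rule 3 above; the patterns assume in-cluster destinations under the default cluster.local suffix, so widen them if the gateway also resolves external names.

    yaml
    - toEndpoints:
      - matchLabels:
          "k8s:io.kubernetes.pod.namespace": kube-system
          "k8s:k8s-app": kube-dns
      toPorts:
      - ports:
        - port: "53"
          protocol: UDP
        rules:
          dns:
          # Only permit lookups for service names in the namespaces this
          # workload actually talks to
          - matchPattern: "*.production.svc.cluster.local"
          - matchPattern: "*.kube-system.svc.cluster.local"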

    Scenario 2: Beyond HTTP - Securing gRPC and Kafka

    Many systems rely on protocols other than HTTP/REST. Cilium's L7 capabilities extend to other common protocols through specific parsers.

    Problem A: Securing gRPC Communication

    A frontend service communicates with a user-auth-grpc service. We must ensure the frontend can only invoke the Login and CheckAuth RPCs on the auth.v1.AuthService service, and nothing else.

    Solution: gRPC runs over HTTP/2, and every call is a POST to /<package>.<Service>/<Method>, so Cilium's HTTP rules can pin down exactly which RPCs are allowed.

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "frontend-to-grpc-auth-policy"
      namespace: production
    spec:
      endpointSelector:
        matchLabels:
          app: frontend
      egress:
      - toEndpoints:
        - matchLabels:
            app: user-auth-grpc
        toPorts:
        - ports:
          - port: "50051"
            protocol: TCP
          rules:
            http:
            # Each gRPC call is an HTTP/2 POST to /<package>.<Service>/<Method>,
            # with the service and method names taken from the .proto definition
            - method: "POST"
              path: "/auth.v1.AuthService/Login"
            - method: "POST"
              path: "/auth.v1.AuthService/CheckAuth"

    Problem B: Granular Kafka Topic Permissions

    A payment-processor service needs to produce messages to the processed-payments Kafka topic, while a fraud-detector service needs to consume from that same topic. Neither should be able to perform the other's role or access other topics.

    Solution: Use the kafka rule type to define role- and topic-specific permissions.

    yaml
    # Policy for the Payment Processor (Producer)
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "payment-processor-kafka-producer-policy"
      namespace: production
    spec:
      endpointSelector:
        matchLabels:
          app: payment-processor
      egress:
      - toEndpoints:
        - matchLabels:
            app: kafka-broker
        toPorts:
        - ports:
          - port: "9092"
            protocol: TCP
          rules:
            kafka:
            - role: "produce"
              topic: "processed-payments"
    ---
    # Policy for the Fraud Detector (Consumer)
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "fraud-detector-kafka-consumer-policy"
      namespace: production
    spec:
      endpointSelector:
        matchLabels:
          app: fraud-detector
      egress:
      - toEndpoints:
        - matchLabels:
            app: kafka-broker
        toPorts:
        - ports:
          - port: "9092"
            protocol: TCP
          rules:
            kafka:
            - role: "consume"
              topic: "processed-payments"

    These examples showcase how Cilium's protocol-aware parsers allow you to enforce business logic-level security at the platform layer, significantly reducing the attack surface.


    Edge Cases and Production Hardening

    Writing the happy-path policy is only half the battle. Production environments are rife with complexity that can undermine your security posture.

    The Elephant in the Room: Encrypted TLS Traffic

    Problem: L7 policies are fundamentally incompatible with end-to-end TLS encryption. If the Cilium proxy can't see the HTTP headers or gRPC method names, it cannot enforce a policy on them. How do we reconcile zero-trust networking with encryption?

    * Solution 1: The Pragmatic Approach - Cilium + Service Mesh. For many teams, the most robust solution is to combine Cilium's strengths with a service mesh like Istio or Linkerd. In this model:

      * The service mesh handles mTLS, terminating TLS inside the pod's proxy sidecar.

      * Cilium enforces L3/L4/identity policies on the encrypted traffic between pods (a sketch of such a policy follows this list). This provides a critical layer of defense-in-depth: an attacker who compromises a pod cannot even attempt to connect to a database pod if no identity-based policy allows it, regardless of their ability to handle TLS.

      * Once traffic is decrypted by the service mesh sidecar, the mesh's own L7 policies can be applied.

    * Solution 2: The Integrated Approach - Cilium Service Mesh. Cilium now includes its own service mesh functionality, which uses eBPF to transparently manage TLS termination without requiring manual sidecar injection. This provides the best of both worlds: efficient eBPF-based networking and policy, combined with integrated visibility into encrypted traffic for L7 policy enforcement. This is rapidly becoming the preferred architecture for greenfield deployments.

    * Solution 3: The Future - Kernel TLS (kTLS) Inspection. An emerging, highly advanced technique involves using eBPF kprobes to hook into the kernel's own TLS functions. This would allow inspecting decrypted data directly in the kernel without a userspace proxy. This is still experimental and has significant limitations but represents a potential future direction for hyper-efficient L7 policy on encrypted traffic.
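
    To make the Cilium half of Solution 1 concrete, here is a minimal sketch of an identity-only policy (the labels and port are illustrative). Because it contains no L7 rules, it is enforced entirely in-kernel and applies unchanged to mTLS-encrypted traffic:

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "postgres-identity-only"
      namespace: production
    spec:
      # Applied to the database pods; only identity, port, and protocol are
      # checked, so the encrypted payload never needs to be inspected
      endpointSelector:
        matchLabels:
          app: postgres
      ingress:
      - fromEndpoints:
        - matchLabels:
            app: billing-service
        toPorts:
        - ports:
          - port: "5432"
            protocol: TCP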

    Policies for `hostNetwork: true` Pods

    Problem: Pods running with hostNetwork: true (common for node agents like node-exporter or certain CNI plugins) don't have a pod-specific veth pair. They share the host's network interface. How can Cilium apply policies to them?

    Solution: Cilium addresses this with its host firewall. eBPF programs are attached directly to the host's network devices (e.g., eth0) at the TC or XDP hook, and host-networked pods share the node's own security identity rather than receiving a per-pod one. Policy for them is expressed as a CiliumClusterwideNetworkPolicy that uses a nodeSelector instead of an endpointSelector, so enforcement remains identity-based even without a pod-specific veth pair. The key takeaway is that Cilium's policy model is not limited to pod networking constructs; it extends to the node itself, ensuring consistent enforcement.
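
    A minimal sketch of such a host policy, assuming the host firewall feature is enabled (for example via hostFirewall.enabled=true in the Helm chart); the node label, the scraper selector, and the port are illustrative:

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumClusterwideNetworkPolicy
    metadata:
      name: "host-fw-node-exporter"
    spec:
      description: "Only Prometheus may reach node-exporter on worker nodes"
      # nodeSelector (instead of endpointSelector) makes this a host policy
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker: ""
      ingress:
      - fromEndpoints:
        - matchLabels:
            app: prometheus
        toPorts:
        - ports:
          - port: "9100"
            protocol: TCP

    Be aware that selecting a node this way puts its host ingress into default-deny, so a real policy must also allow kubelet, SSH, and any other node-level services you depend on.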

    Safe Policy Rollouts with Audit Mode

    Problem: Applying a new, restrictive network policy in a live production environment is terrifying. A mistake could cause a widespread outage. How can we validate a policy's impact before enforcing it?

    Solution: Cilium supports a policy audit mode. Instead of dropping non-compliant packets, the datapath forwards them and records a policy violation (an AUDIT verdict). This provides a non-disruptive way to observe what would have been dropped.

    To enable it globally, set policy-audit-mode: "true" in the Cilium ConfigMap (or the equivalent Helm value).
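
    A sketch of the corresponding cilium-config fragment (the key name follows Cilium's policy audit mode documentation; agents pick the change up on restart):

    yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cilium-config
      namespace: kube-system
    data:
      # Record policy verdicts instead of enforcing drops
      policy-audit-mode: "true"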

    For a more targeted approach, you can toggle audit mode on a single endpoint from the Cilium agent on the relevant node:

    bash
    # Run inside the Cilium agent pod on the node hosting the workload
    cilium endpoint list                                      # find the endpoint ID backing the pod
    cilium endpoint config <endpoint-id> PolicyAuditMode=Enabled

    After enabling audit mode, you can use Hubble to observe the would-be-dropped traffic:

    bash
    # Show traffic that is 'allowed' but with an AUDIT verdict
    hubble observe --verdict AUDIT -n my-ns

    This command is your safety net. You can apply a new policy, watch the audit verdicts for a period, and if no legitimate traffic is being flagged, you can confidently switch to full enforcement mode.


    Advanced Debugging and Observability

    When things go wrong, kubectl describe is not enough. You need to peer into the kernel.

    Beyond Basic Hubble: Advanced Queries

    Hubble is the primary tool for observing policy decisions. Go beyond simple observation with powerful filters.

    Scenario: You've applied the API gateway policy from earlier, but the gateway is receiving 503 Service Unavailable errors when trying to contact the billing-service. You suspect a policy issue.

    bash
    # Trace traffic from the gateway to the billing service, looking only for drops.
    hubble observe --from-pod production/api-gateway-xyz --to-pod production/billing-service-abc --verdict DROP -o json

    The output is a JSON object for each dropped flow. When the drop is due to an L7 rule violation, the flow includes an l7.http record describing the offending request alongside the drop reason, so you can see instantly that the application issued, say, a DELETE when your policy only allows POST and GET.

    `cilium monitor`: The Unfiltered Kernel Stream

    For the deepest level of insight, cilium monitor provides a real-time stream of debug events directly from the eBPF programs in the kernel. It's verbose but invaluable.

    bash
    # Run on the node where the source or destination pod is running
    cilium monitor --type drop -v

    An example drop message might look like this:

    -> drop (Policy denied) flow 0x... to endpoint 1234, ifindex eth0, identity 54321->87654, CtxId 0, 10.0.1.25:43210 -> 10.0.2.33:8080 tcp FIN

    This tells you the exact source and destination security identities (e.g., 54321->87654), the IPs and ports, and the reason (Policy denied). This is your ground truth when debugging connectivity.

    `bpftool` and `cilium bpf`: Peering into eBPF Maps

    For the ultimate expert-level debugging, you can inspect the state of the eBPF maps themselves. This confirms what policies and identities have actually been programmed into the kernel.

  • List attached eBPF programs: Find the host-side interface of the pod's veth pair, then list the BPF programs attached to it.

    bash
    # kubectl can run anywhere; the cilium and bpftool commands run on the node
    # (or inside the Cilium agent pod) hosting the workload
    POD_IP=$(kubectl get pod -n my-ns my-app-pod -o jsonpath='{.status.podIP}')

    # Resolve the pod's Cilium endpoint and extract the host-side interface name
    IFACE=$(cilium endpoint list -o json | \
      jq -r --arg ip "$POD_IP" \
        '.[] | select(.status.networking.addressing[0].ipv4 == $ip) | .status.networking."interface-name"')

    # List the BPF programs attached to that interface
    bpftool net list dev "$IFACE"
  • Inspect the Policy Map: The per-endpoint policy map contains the allowed (identity, port, protocol) entries. You can dump its contents to see the raw rules the kernel is using.

    bash
    # Run from the Cilium agent on the node; the endpoint ID comes from `cilium endpoint list`
    cilium bpf policy get <endpoint-id>

    This level of inspection allows you to definitively answer the question: "Does the kernel's view of the policy match the YAML I applied?" In cases of controller bugs or synchronization issues, this is the only way to know for sure.


    Performance Considerations

    While eBPF is exceptionally fast, it is not zero-cost. Senior engineers must understand the performance trade-offs.

    * eBPF vs. iptables: At scale, eBPF consistently outperforms iptables for Kubernetes networking. iptables rules are evaluated sequentially (O(n) complexity), whereas eBPF uses hash map lookups (O(1) complexity). For clusters with thousands of services and policies, the difference is dramatic, leading to lower latency and CPU usage.

    * L7 Proxy Overhead: The latency cost of redirecting a flow to the userspace Envoy proxy is typically measured in tens of microseconds. While small, this can be significant for ultra-low-latency applications. The key is to apply L7 policies only where necessary. For high-volume, performance-sensitive traffic that doesn't need deep inspection (e.g., database connections, streaming data), stick to L3/L4 policies to avoid the proxy path entirely.

    * Tuning BPF Maps: In extremely large clusters (>1000 nodes), you may need to tune the size of the eBPF maps used to store connection tracking entries, policy rules, and identity information. This is done via the Cilium ConfigMap (e.g., bpf-ct-global-tcp-max, bpf-policy-map-max). Incorrect sizing can lead to dropped packets or policy failures when maps become full.
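
    A sketch of what that tuning looks like in the cilium-config ConfigMap (the sizes below are placeholders, not recommendations; derive real values from observed map utilization):

    yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cilium-config
      namespace: kube-system
    data:
      # Placeholder sizes - tune from observed connection-tracking and policy map usage
      bpf-ct-global-tcp-max: "1048576"
      bpf-policy-map-max: "65536"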

    Conclusion: The Kernel as the New Security Boundary

    Moving beyond L3/L4 network policies is no longer optional for securing modern, distributed applications. By leveraging eBPF, Cilium provides a mechanism to enforce granular, L7-aware policies with surgical precision and remarkable performance. This approach pushes the security boundary down into the most efficient and secure layer possible: the Linux kernel.

    For the senior engineer, mastering this technology means more than just writing YAML. It requires a deep understanding of the interplay between kernel-space eBPF and user-space proxies, a strategy for handling encryption, and the skills to debug policy enforcement at the lowest levels. By embracing this complexity, you can build systems that are not only more secure and resilient but also more performant and operationally simpler than those relying on traditional network security models.
