Advanced eBPF Network Policy Enforcement in Cilium for Zero-Trust

Goh Ling Yong

The Limitations of `iptables` and the Promise of eBPF

For any seasoned engineer operating Kubernetes at scale, the limitations of the default NetworkPolicy object, typically implemented via iptables, are painfully apparent. While functional for basic L3/L4 filtering, iptables-based solutions suffer from significant performance degradation as the number of rules and services grows. The linear traversal of rule chains and the overhead of connection tracking (conntrack) in the kernel's networking stack become major bottlenecks, introducing latency and consuming substantial CPU resources. Furthermore, in a dynamic microservices environment where pod IPs are ephemeral, relying on IP-based rules is both brittle and fundamentally insecure.

This is where eBPF (extended Berkeley Packet Filter) fundamentally changes the game. By allowing us to run sandboxed programs directly within the Linux kernel, eBPF enables a new paradigm for networking and security. Cilium leverages this capability to implement a highly efficient, identity-based networking and security model. Instead of routing packets through complex iptables chains, Cilium attaches eBPF programs to network hooks (such as the traffic control, or tc, ingress and egress hooks) on pod veth pairs. These programs make policy decisions directly in the kernel at the earliest possible point, bypassing the entire iptables stack.

At the core of Cilium's model is identity-based security. Cilium assigns a numeric security identity to each pod based on its labels. A policy allowing app=frontend to talk to app=backend is translated into a simple rule: allow identity X to talk to identity Y. This identity mapping is stored in an eBPF map, a highly efficient key-value store in the kernel. When a packet arrives, the eBPF program performs a near-instantaneous O(1) hash table lookup in this map to enforce the policy. This is orders of magnitude faster and more scalable than traversing a list of IP-based iptables rules.
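
You can see both halves of this model on a live cluster. A quick sketch, assuming Cilium is installed as the standard cilium DaemonSet in kube-system (the default); output layout varies by version, and in newer releases the in-pod CLI may be named cilium-dbg:

bash
    # The label-to-identity mapping maintained by the Cilium agent
    kubectl -n kube-system exec ds/cilium -- cilium identity list

    # The per-endpoint policy maps as they exist in the kernel (allowed identity/port entries)
    kubectl -n kube-system exec ds/cilium -- cilium bpf policy get --all

The first command shows which numeric identity each label set maps to; the second dumps the kernel-side policy maps that the eBPF programs consult for every packet.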

This article will not re-explain these fundamentals. We assume you understand them. Instead, we will dive deep into two advanced, production-critical patterns that are impossible with standard NetworkPolicy but are made elegant and performant by Cilium and eBPF: Layer 7 HTTP-aware policies and FQDN-aware egress controls.


Production Pattern 1: Granular L7 HTTP-Aware Policies

The Problem: Consider a typical microservices scenario. A billing-service exposes multiple API endpoints:

* POST /api/v1/charge: To create a new payment.

* GET /api/v1/invoices/{id}: To retrieve invoice details.

* GET /healthz: For liveness probes.

A checkout-service should only be able to create new charges, while an auditing-service should only be able to retrieve invoices. The cluster's prometheus service needs to access the health endpoint. A standard NetworkPolicy can only open port access (e.g., allow traffic to billing-service on TCP port 8080), but it cannot differentiate between POST and GET requests or inspect the URL path. This violates the principle of least privilege, a cornerstone of zero-trust security.

The Solution: We leverage the CiliumNetworkPolicy Custom Resource Definition (CRD) to define L7-aware rules. Cilium's eBPF programs can identify HTTP traffic and redirect it to a tightly integrated, lightweight Envoy proxy running in userspace for deep packet inspection and policy enforcement. This is done transparently without requiring a full service mesh sidecar for every pod, significantly reducing resource overhead.
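
Before relying on L7 rules, it is worth confirming that the embedded proxy is actually available on your nodes. A quick check, assuming the default installation (the status wording differs slightly between versions):

bash
    # The agent's status report includes a "Proxy Status" line once the embedded Envoy is ready
    kubectl -n kube-system exec ds/cilium -- cilium status | grep -i proxy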

Implementation Example

First, let's define our services. We'll use simple nginx deployments for demonstration, with labels that Cilium will use for identity.

1. Deploy the Services:

yaml
# services.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing-service
  labels:
    app: billing-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: billing-service
  template:
    metadata:
      labels:
        app: billing-service
        class: sensitive
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: billing-service
spec:
  selector:
    app: billing-service
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service
  labels:
    app: checkout-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: checkout-service
  template:
    metadata:
      labels:
        app: checkout-service
    spec:
      containers:
      - name: client
        image: curlimages/curl
        # Keep the pod running
        command: ["sleep", "3600"]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auditing-service
  labels:
    app: auditing-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: auditing-service
  template:
    metadata:
      labels:
        app: auditing-service
    spec:
      containers:
      - name: client
        image: curlimages/curl
        command: ["sleep", "3600"]

Apply this manifest: kubectl apply -f services.yaml
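
Once the pods are up, Cilium creates an endpoint for each one and derives its security identity from the pod labels. A quick sanity check, assuming the default cilium DaemonSet (each agent only lists the endpoints on its own node):

bash
    kubectl get pods -l 'app in (billing-service, checkout-service, auditing-service)'

    # Each endpoint should show a numeric IDENTITY and its policy enforcement state
    kubectl -n kube-system exec ds/cilium -- cilium endpoint list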

2. Define the Advanced CiliumNetworkPolicy:

Now, we create the policy that enforces our specific L7 rules. Note the toPorts section with the rules and http stanzas. Also note that the port is "80", the pod's targetPort: Cilium enforces ingress policy on the destination endpoint after service translation, so rules must reference the port the pod actually listens on, not the Service port (8080).

yaml
# billing-policy.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "billing-l7-policy"
spec:
  endpointSelector:
    matchLabels:
      app: billing-service
  ingress:
  # Rule 1: Allow checkout-service to POST to /api/v1/charge
  - fromEndpoints:
    - matchLabels:
        app: checkout-service
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "POST"
          path: "/api/v1/charge"

  # Rule 2: Allow auditing-service to GET invoices
  - fromEndpoints:
    - matchLabels:
        app: auditing-service
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/api/v1/invoices/.*" # Use regex for paths

  # Rule 3: Allow any pod carrying the label system=monitoring (e.g., Prometheus) to access /healthz
  - fromEndpoints:
    - matchLabels:
        'k8s:system': 'monitoring'
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/healthz"

Apply the policy: kubectl apply -f billing-policy.yaml
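
A quick way to confirm the policy was accepted and loaded into the agent's policy repository (cnp is the short name for the CiliumNetworkPolicy CRD; the second command assumes the default cilium DaemonSet):

bash
    kubectl get cnp billing-l7-policy

    # The agent's view of the imported policy rules
    kubectl -n kube-system exec ds/cilium -- cilium policy get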

Verification and Deep Dive

Let's verify the policy enforcement from our client pods.

* Get pod names:

bash
    CHECKOUT_POD=$(kubectl get pods -l app=checkout-service -o jsonpath='{.items[0].metadata.name}')
    AUDITING_POD=$(kubectl get pods -l app=auditing-service -o jsonpath='{.items[0].metadata.name}')

* Test from checkout-service:

bash
    # This should succeed (nginx will return 404 for the unknown path, but the request is allowed through)
    kubectl exec $CHECKOUT_POD -- curl -s -X POST http://billing-service:8080/api/v1/charge -o /dev/null -w "%{http_code}"
    # Expected output: 404 (the request reached nginx)

    # This should be denied at L7: the connection is allowed, but Cilium's proxy rejects the request
    kubectl exec $CHECKOUT_POD -- curl -s -X GET http://billing-service:8080/api/v1/invoices/123 -o /dev/null -w "%{http_code}"
    # Expected output: 403 (Access denied, returned by the policy proxy)

* Test from auditing-service:

bash
    # This should succeed
    kubectl exec $AUDITING_POD -- curl -s -X GET http://billing-service:8080/api/v1/invoices/123 -o /dev/null -w "%{http_code}"
    # Expected output: 404 (the request reached nginx)

    # This should be denied at L7
    kubectl exec $AUDITING_POD -- curl -s -X POST http://billing-service:8080/api/v1/charge -o /dev/null -w "%{http_code}"
    # Expected output: 403 (Access denied, returned by the policy proxy)

How it Works Under the Hood:

* When the policy is applied, the Cilium agent on the node hosting the billing-service pod recognizes that it contains L7 rules.
* It updates the eBPF program attached to the pod's network interface.
* That program now marks incoming TCP traffic on port 80 (the pod's port) as requiring HTTP inspection.
* Instead of simply allowing the packet, the eBPF program transparently redirects the connection to the Envoy proxy managed by Cilium.
* Envoy performs the full L7 inspection: it checks the source security identity (which Cilium passes to it), the HTTP method (e.g., POST), and the path (e.g., /api/v1/charge).
* If the request matches a rule in the CiliumNetworkPolicy, Envoy forwards it to the nginx container. If not, Envoy rejects the request with a 403 Access denied response.
* This architecture provides the best of both worlds: the performance of eBPF for all L3/L4 filtering and identity lookups, combined with an efficient, targeted hand-off to a userspace proxy only when deep L7 inspection is required.
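
To watch this hand-off in practice, you can stream the L7 verdicts as requests arrive. A minimal sketch, assuming the default cilium DaemonSet and, for the second command, that Hubble is enabled (flag names can differ between versions):

bash
    # L7 (HTTP) access-log events from the Cilium agent on the node running billing-service
    kubectl -n kube-system exec ds/cilium -- cilium monitor --type l7

    # Or, with Hubble, filter for HTTP flows to the billing-service pod
    hubble observe --protocol http --to-pod default/billing-service --last 20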


Production Pattern 2: Dynamic DNS-Aware Egress Policies

The Problem: Microservices frequently need to communicate with external, third-party APIs (e.g., Stripe, Twilio, S3). A common security requirement is to restrict egress traffic to only these specific services. The challenge is that the IP addresses for these services are often dynamic and can change without notice, served from a large pool behind a CDN. Creating egress rules based on static IP addresses or CIDR blocks is fragile and a maintenance nightmare. A rule allowing egress to 104.18.10.121 today might be incorrect tomorrow, either breaking the application or, worse, inadvertently allowing traffic to a completely different service that later acquires that IP.

The Solution: Cilium provides a powerful solution with its toFQDNs policy rule. This allows you to define egress policies based on fully qualified domain names (FQDNs). Cilium dynamically resolves these domain names to IPs and constantly updates the allowed IP list in an eBPF map, ensuring the policy remains accurate without manual intervention.

Implementation Example

Let's create a policy that allows a data-exporter pod to communicate only with api.github.com and nothing else on the public internet.

1. Deploy the data-exporter:

yaml
# exporter.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: data-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: data-exporter
  template:
    metadata:
      labels:
        app: data-exporter
    spec:
      containers:
      - name: exporter
        image: curlimages/curl
        command: ["sleep", "3600"]

Apply it: kubectl apply -f exporter.yaml

2. Define the DNS-Aware Egress Policy:

This policy selects the data-exporter pod and applies an egress rule. It allows DNS traffic (UDP/53) to kube-dns and then allows TCP/443 traffic specifically to destinations matching the FQDN api.github.com.

yaml
# egress-fqdn-policy.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "egress-to-github-api"
spec:
  endpointSelector:
    matchLabels:
      app: data-exporter
  egress:
  # Step 1: Allow DNS lookups. This is crucial.
  # The policy is enforced at the IP level, so the pod must be able to resolve the FQDN first.
  - toEndpoints:
    - matchLabels:
        'k8s:io.kubernetes.pod.namespace': kube-system
        'k8s:k8s-app': kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: UDP
      rules:
        dns:
        - matchPattern: "*"

  # Step 2: Allow HTTPS traffic to the resolved IPs of api.github.com
  - toFQDNs:
    - matchName: "api.github.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP

Apply the policy: kubectl apply -f egress-fqdn-policy.yaml

Verification and Advanced Edge Cases

* Get the pod name:

bash
    EXPORTER_POD=$(kubectl get pods -l app=data-exporter -o jsonpath='{.items[0].metadata.name}')

* Test connectivity:

bash
    # This should succeed (the GitHub API root returns 200)
    kubectl exec $EXPORTER_POD -- curl -s -I https://api.github.com --connect-timeout 5
    # Expected output: HTTP/2 200

    # This should be blocked: the destination IP was not learned from an allowed FQDN, so the connection attempt is dropped
    kubectl exec $EXPORTER_POD -- curl -sS -I https://www.google.com --connect-timeout 5
    # Expected output: curl: (28) Connection timed out after 5001 milliseconds
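
To confirm the second connection really is dropped in the kernel rather than failing elsewhere, you can watch the drop events while re-running the blocked curl. A sketch, assuming the default cilium DaemonSet and, for the Hubble variant, that Hubble is enabled:

bash
    # Kernel-level drop notifications from the agent on the node running data-exporter
    kubectl -n kube-system exec ds/cilium -- cilium monitor --type drop

    # Or, with Hubble, show dropped flows originating from the exporter pod
    hubble observe --from-pod default/data-exporter --verdict DROPPED --last 20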

Deep Dive into the eBPF Mechanism:

This feature is a masterclass in eBPF's power:

* DNS Interception: Because the egress rule toward kube-dns carries a dns rules stanza, the eBPF program at the pod's egress hook transparently redirects its DNS queries to a lightweight DNS proxy inside the Cilium agent. The proxy forwards each query to kube-dns (or your configured DNS server) and observes the response on the way back.
* eBPF Map Population: When a response for api.github.com is observed, the agent extracts the returned IP addresses, associates them with the toFQDNs selector and the security identity of the requesting pod, and pushes the resulting IP-to-identity mappings and policy entries into eBPF maps in the kernel (including the ipcache).
* Egress Enforcement: When the data-exporter pod then attempts an outbound TCP connection on port 443, the eBPF program attached to its tc egress hook resolves the destination IP to an identity and checks it against the pod's policy map. If the IP was learned from an allowed FQDN, the connection is allowed; if not, the packet is dropped in the kernel.
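
Each stage of this pipeline can be inspected from the Cilium agent on the node running data-exporter. A minimal sketch, assuming the default cilium DaemonSet (output formats vary by version):

bash
    # 1. The FQDN-to-IP cache learned by the DNS proxy (populated after the pod resolves api.github.com)
    kubectl -n kube-system exec ds/cilium -- cilium fqdn cache list

    # 2. The IP-to-identity mappings pushed into the kernel ipcache map
    kubectl -n kube-system exec ds/cilium -- cilium bpf ipcache list

    # 3. The selectors derived from the toFQDNs rule and the identities they currently match
    kubectl -n kube-system exec ds/cilium -- cilium policy selectors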
Edge Case: DNS TTL and IP Churn

What happens when the DNS record's Time-To-Live (TTL) expires and the IP for api.github.com changes? This is a critical production consideration.

* Cilium's Approach: Cilium honors the TTL from the DNS record (subject to configurable minimum and grace-period settings). When the TTL expires, the Cilium agent removes the corresponding IP from the eBPF maps. The next time the application performs a DNS lookup, the DNS proxy observes the new response and the agent populates the maps with the new IP address, healing the connection path automatically.

* Race Condition Risk: A subtle race condition exists. If a DNS record's TTL expires and the cloud provider immediately reassigns that IP to another customer's service before the Cilium agent's garbage collection removes it from the eBPF maps, there is a small window where egress traffic could be misdirected. Cilium's periodic cleanup of stale FQDN entries keeps this window small; for highly sensitive workloads, you can additionally configure lower DNS cache TTLs within your application or tighten Cilium's FQDN cache settings (see below), trading a slight increase in DNS lookups for tighter security.
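
If you need to tune this trade-off, the relevant knobs live in the Cilium agent configuration. The key names below (tofqdns-min-ttl, tofqdns-idle-connection-grace-period) are my assumption for recent releases; confirm against cilium-agent --help for your version:

bash
    # Dump any FQDN-proxy settings currently set in the agent ConfigMap
    # (an empty result simply means your installation is using the built-in defaults)
    kubectl -n kube-system get configmap cilium-config -o yaml | grep -i tofqdns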

Performance and Scalability Considerations

Adopting these advanced patterns has profound performance implications compared to traditional methods:

* L7 Policy Overhead: While there is overhead in redirecting traffic to the Envoy proxy, it's significantly less than a full sidecar mesh. The eBPF pre-filtering ensures that only the necessary traffic (e.g., HTTP traffic to the billing-service pod) is ever sent to the proxy. All other traffic is handled purely in-kernel by eBPF. This targeted approach is ideal for security enforcement without the complexity of managing a full service mesh.

* DNS Policy Performance: The FQDN enforcement mechanism is exceptionally fast. The DNS lookup happens once, and subsequent enforcement is a simple IP lookup in a kernel-level eBPF hash map. This is vastly superior to userspace solutions that might have to intercept every connect() syscall, which introduces significant overhead and context switching.

* Scalability: Both patterns scale horizontally. Since policies are enforced at the source and destination via eBPF on each node, there is no central bottleneck. Adding more nodes, pods, or policies has minimal impact on overall cluster network performance, unlike iptables, where every node must potentially manage massive, slow-to-update rule chains.

Conclusion

To build a true zero-trust network in Kubernetes, you must move beyond the coarse-grained controls of L3/L4 NetworkPolicy. By leveraging the power of eBPF and the advanced CRDs provided by Cilium, platform engineers can implement the granular, identity-aware, and dynamic policies that modern microservice architectures demand.

The L7-aware and FQDN-aware patterns detailed here are not just theoretical possibilities; they are production-ready solutions to common, complex security challenges. Understanding the underlying eBPF mechanisms—from identity-based lookups in kernel maps to DNS response interception—is key to deploying, troubleshooting, and tuning these policies effectively. By embracing this next-generation networking stack, you can build Kubernetes platforms that are not only more secure but also significantly more performant and scalable.
