eBPF Network Policies in Cilium for Zero-Trust K8s Security

Goh Ling Yong

The Performance Bottleneck of iptables in Large-Scale Kubernetes

For any senior engineer who has managed a Kubernetes cluster beyond a few dozen nodes, the limitations of the default kube-proxy iptables mode become painfully apparent. While functional, iptables was not designed for the dynamic, high-churn environment of container orchestration. Every Service creates a set of rules, and every Pod adds more. In a large cluster with thousands of services and pods, the iptables chains can grow to tens of thousands of rules.

Traffic routing and policy enforcement require traversing these chains. The complexity is not O(1); it's often closer to O(n), where 'n' is the number of services or rules. This linear scaling introduces tangible latency to every single packet and consumes significant CPU on worker nodes, not for business logic, but for networking overhead. Furthermore, standard Kubernetes NetworkPolicy objects are limited to L3/L4 (IP/Port) filtering, which is insufficient for modern zero-trust security postures that demand application-layer awareness.
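
You can watch this growth directly on a worker node; the quick, illustrative checks below assume a standard kube-proxy (iptables mode) install:

bash
# Total number of iptables rules kube-proxy and the CNI have programmed on this node
sudo iptables-save | wc -l

# Just the KUBE-* service and endpoint rules, which grow with every Service and backend Pod
sudo iptables-save | grep -c '^-A KUBE-'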

This is the core problem Cilium addresses by replacing iptables with eBPF (extended Berkeley Packet Filter).

Why eBPF Changes the Game

eBPF allows us to run sandboxed programs directly within the Linux kernel, triggered by specific events like network packet arrival. Cilium leverages this by attaching eBPF programs to network interfaces at hooks like Traffic Control (TC) and, for even higher performance, Express Data Path (XDP).

When a packet arrives at a pod's network interface, a Cilium-managed eBPF program executes. Instead of traversing a long chain of rules, this program performs a highly efficient lookup in an eBPF map. Cilium assigns a unique security identity (a numeric ID) to each endpoint (pod) based on its labels. The network policy is compiled into a compact representation and stored in eBPF maps, keyed by these identities.

The result is a near O(1) complexity for policy enforcement, regardless of the number of services or policies in the cluster. The kernel makes a direct, identity-based decision on whether to allow or drop the packet, bypassing the entire iptables and conntrack stack. This fundamental architectural shift is what enables both superior performance and more sophisticated security capabilities.
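
To make this concrete, you can inspect the identities and the compiled policy state from inside any Cilium agent pod. This is a quick sketch; it assumes a default install in kube-system, and in recent releases the in-agent binary is named cilium-dbg rather than cilium:

bash
# Open a shell in a Cilium agent pod
kubectl -n kube-system exec -it ds/cilium -- bash

# Numeric security identities allocated for each distinct label set
cilium identity list

# Local endpoints (pods) and the identity assigned to each
cilium endpoint list

# Dump the compiled policy map for one endpoint, using an ID from the previous command
cilium bpf policy get <endpoint-id>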


Advanced `CiliumNetworkPolicy` Implementation Patterns

Let's move beyond theory and implement production-grade policies for a realistic scenario. Imagine a multi-tenant environment with three namespaces:

* payments: Houses the critical payment processing service.

* invoicing: Contains the service responsible for generating invoices after a successful payment.

* shared-monitoring: Runs Prometheus for scraping metrics across the cluster.

Our security requirements for a zero-trust posture are:

  • Default Deny: No pod can communicate with any other pod unless explicitly allowed.
  • Principle of Least Privilege: The payments-api pod should only be able to initiate contact with the invoicing-generator pod on its specific gRPC port.
  • Directional Control: The invoicing-generator must not be able to initiate contact back to payments-api.
  • Controlled Observability: Prometheus in the shared-monitoring namespace can scrape metrics from pods in both payments and invoicing, but no other pod in shared-monitoring can.

    Step 1: Establishing a Default Deny Stance

    First, we apply a cluster-wide policy that denies all inter-pod communication unless another policy explicitly allows it. This is the foundation of zero-trust.

    yaml
    # policy-00-default-deny.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumClusterwideNetworkPolicy  # cluster-scoped, so no namespace field
    metadata:
      name: "default-deny-all"
    spec:
      endpointSelector: {}
      ingress: []
      egress: []

    By selecting every endpoint in the cluster (endpointSelector: {}) and providing empty ingress and egress lists, we effectively block all traffic. Note the use of CiliumClusterwideNetworkPolicy: a namespaced CiliumNetworkPolicy would only affect endpoints in its own namespace, so a cluster-wide default deny needs the cluster-scoped resource. Applying this will immediately break communication in your cluster, including DNS lookups, which is the desired starting point.
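
    Because even DNS is now blocked, most clusters need a narrow allowance for cluster DNS alongside the default deny. A minimal sketch, assuming CoreDNS runs in kube-system with the standard k8s-app: kube-dns label:

    yaml
    # policy-00b-allow-dns.yaml -- companion to the default deny
    apiVersion: "cilium.io/v2"
    kind: CiliumClusterwideNetworkPolicy
    metadata:
      name: "allow-dns-egress"
    spec:
      endpointSelector: {}
      egress:
        - toEndpoints:
            # kube-dns/CoreDNS pods in kube-system (labels assumed from a default install)
            - matchLabels:
                k8s:io.kubernetes.pod.namespace: kube-system
                k8s-app: kube-dns
          toPorts:
            - ports:
                - port: "53"
                  protocol: UDP
              rules:
                dns:
                  # Send lookups through Cilium's DNS proxy; the toFQDNs policies later rely on this
                  - matchPattern: "*"

    In a real cluster you would pair this with a corresponding ingress allowance for the DNS pods themselves, since the cluster-wide default deny covers them as well.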

    Step 2: Implementing L4 Service-to-Service Communication

    Now, let's specifically allow the payments-api to communicate with the invoicing-generator.

    Assume our pods have the following labels:

    * payments-api: app: payments-api, team: payments

    * invoicing-generator: app: invoicing-generator, team: invoicing

    * prometheus: app: prometheus, team: monitoring

    Here is the policy to allow traffic from payments to invoicing.

    yaml
    # policy-01-payments-to-invoicing.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "allow-payments-to-invoicing"
      namespace: "invoicing"
    spec:
      description: "Allow payments-api to call invoicing-generator on gRPC port 50051"
      # This policy applies to the invoicing-generator pod
      endpointSelector:
        matchLabels:
          app: invoicing-generator
    
      # Ingress rules for the selected endpoint
      ingress:
        - fromEndpoints:
            # Allow traffic FROM payments-api pods in the payments namespace.
            # The namespace label is required for cross-namespace selection; without it,
            # a namespaced policy only matches peers in its own namespace.
            - matchLabels:
                k8s:io.kubernetes.pod.namespace: payments
                app: payments-api
                team: payments
          # To a specific port
          toPorts:
            - ports:
                - port: "50051"
                  protocol: TCP

    Analysis:

    * namespace: "invoicing": The policy is scoped to the invoicing namespace.

    * endpointSelector: It applies only to pods with the label app: invoicing-generator.

    * fromEndpoints: This is the core of Cilium's identity-based security. It allows ingress from any pod whose identity carries the app: payments-api and team: payments labels, regardless of its IP address. Because the source pod lives in a different namespace, the selector must also name the namespace via the k8s:io.kubernetes.pod.namespace label; a namespaced CiliumNetworkPolicy otherwise only matches peers in its own namespace.

    * toPorts: We restrict this communication to TCP port 50051.
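
    Before tightening this further, it is worth confirming the policy is loaded and actually matching traffic. A quick check with kubectl and Hubble might look like this (names follow the example above):

    bash
    kubectl apply -f policy-01-payments-to-invoicing.yaml

    # Confirm the policy object exists in the invoicing namespace
    kubectl -n invoicing get ciliumnetworkpolicies

    # Watch traffic arriving on the gRPC port and check the verdicts
    hubble observe --namespace invoicing --port 50051 -f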

    This policy is powerful, but we can do better. It allows any traffic on port 50051, but what if that port also exposed a debug endpoint? We need L7 awareness.

    Step 3: Layer 7 Policy Enforcement for gRPC and HTTP

    To enforce policy at the application layer, Cilium transparently redirects the matching traffic through an embedded Envoy proxy whenever an L7 rule is present; no sidecar needs to be injected into the pod. Let's refine our policy to allow only a specific gRPC method.

    yaml
    # policy-02-l7-grpc-payments-to-invoicing.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "allow-payments-to-invoicing-l7"
      namespace: "invoicing"
    spec:
      description: "Allow payments-api to call ONLY the GenerateInvoice gRPC method"
      endpointSelector:
        matchLabels:
          app: invoicing-generator
    
      ingress:
        - fromEndpoints:
            - matchLabels:
                k8s:io.kubernetes.pod.namespace: payments
                app: payments-api
                team: payments
          toPorts:
            - ports:
                - port: "50051"
                  protocol: TCP
              # L7 protocol parsing rules
              rules:
                http:
                  - method: "POST"
                    # For gRPC, the path is /<package.Service>/<Method>
                    path: "/invoicing.v1.InvoicingService/GenerateInvoice"

    Deep Dive into L7 Rules:

    * rules.http: Cilium treats gRPC as HTTP/2 traffic, so we use an http rule. The gRPC method call is mapped to an HTTP POST request.

    * path: This is the critical part. The path must match the fully qualified gRPC service and method name: /<package.Service>/<Method>. Any other gRPC method call to this service, such as a debug or admin method, will be blocked by Envoy at the direction of Cilium, even though it's on the same port.

    This provides an exceptionally granular level of security. You have now guaranteed that the payments-api can only perform its designated function on the invoicing service.
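
    To see the L7 enforcement in action, you can watch HTTP-level verdicts with Hubble while exercising the service. The commands below are a sketch: grpcurl, the deployment names, and the blocked DebugDump method are illustrative assumptions rather than part of the example manifests.

    bash
    # Watch HTTP/gRPC-level verdicts in the invoicing namespace
    hubble observe --namespace invoicing --protocol http -f

    # From a payments-api pod, the allowed method should succeed...
    kubectl -n payments exec deploy/payments-api -- \
      grpcurl -plaintext invoicing-generator.invoicing:50051 \
      invoicing.v1.InvoicingService/GenerateInvoice

    # ...while any other method on the same port is rejected by the proxy
    kubectl -n payments exec deploy/payments-api -- \
      grpcurl -plaintext invoicing-generator.invoicing:50051 \
      invoicing.v1.InvoicingService/DebugDump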

    Step 4: Enabling Prometheus Scrapes

    Finally, let's create a policy to allow Prometheus to scrape our application pods. Both services expose a /metrics endpoint on port 9090.

    yaml
    # policy-03-allow-prometheus-scrape.yaml
    apiVersion: "cilium.ioio/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "allow-prometheus-scrapes"
      # This policy is applied in both namespaces
      namespace: "payments"
    spec:
      description: "Allow Prometheus to scrape metrics from all pods in this namespace"
      endpointSelector: {}
      ingress:
        - fromEndpoints:
            - matchLabels:
                # Select the Prometheus pod in the shared-monitoring namespace;
                # the namespace label is required for cross-namespace selection
                k8s:io.kubernetes.pod.namespace: shared-monitoring
                app: prometheus
          toPorts:
            - ports:
                - port: "9090"
                  protocol: TCP
              rules:
                http:
                  - method: "GET"
                    path: "/metrics"
    ---
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "allow-prometheus-scrapes"
      namespace: "invoicing"
    spec:
      description: "Allow Prometheus to scrape metrics from all pods in this namespace"
      # Select all endpoints in the invoicing namespace
      endpointSelector: {}
      ingress:
        - fromEndpoints:
            # Same source as above: the Prometheus pod in shared-monitoring.
            # CiliumNetworkPolicy expresses this with plain labels on one selector,
            # not the namespaceSelector/podSelector pair used by Kubernetes NetworkPolicy.
            - matchLabels:
                k8s:io.kubernetes.pod.namespace: shared-monitoring
                app: prometheus
          toPorts:
            - ports:
                - port: "9090"
                  protocol: TCP
              rules:
                http:
                  - method: "GET"
                    path: "/metrics"

    This demonstrates applying the same rule in both namespaces, locking down access to the specific Prometheus pod and only allowing it to GET /metrics on port 9090.
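
    A quick way to confirm the scrape path without touching the pods is to watch verdicts on the metrics port with Hubble:

    bash
    # Scrapes from Prometheus should show up as FORWARDED...
    hubble observe --namespace payments --port 9090 --verdict FORWARDED -f

    # ...while anything else hitting the metrics port is DROPPED
    hubble observe --namespace payments --port 9090 --verdict DROPPED -f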


    Handling Advanced Edge Cases in Production

    Real-world systems are never as clean as the examples above. Senior engineers must anticipate and handle complex edge cases.

    Edge Case 1: DNS-based Egress Policies and TTL Headaches

    A common requirement is for a pod to access an external service, like a third-party payment gateway API (e.g., api.paymentprovider.com). You cannot use an IP-based policy because the IP addresses can change frequently, especially if the service is behind a CDN.

    Cilium solves this with DNS-aware policies.

    yaml
    # policy-04-egress-to-fqdn.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "allow-egress-to-payment-provider"
      namespace: "payments"
    spec:
      endpointSelector:
        matchLabels:
          app: payments-api
      egress:
        # Allow egress to the invoicing service; the namespace label is required
        # because the destination lives in a different namespace
        - toEndpoints:
            - matchLabels:
                k8s:io.kubernetes.pod.namespace: invoicing
                app: invoicing-generator
        # Allow DNS lookups via Cilium's DNS proxy; toFQDNs policies depend on
        # Cilium observing the DNS responses
        - toEndpoints:
            - matchLabels:
                k8s:io.kubernetes.pod.namespace: kube-system
                k8s-app: kube-dns
          toPorts:
            - ports:
                - port: "53"
                  protocol: UDP
              rules:
                dns:
                  - matchPattern: "*"
        # Allow egress to the external FQDN over HTTPS
        - toFQDNs:
            - matchName: "api.paymentprovider.com"
          toPorts:
            - ports:
                - port: "443"
                  protocol: TCP

    How it Works Under the Hood:

    • The payments-api pod makes a DNS request for api.paymentprovider.com.
    • Because the policy carries a DNS rule, Cilium's datapath redirects the request to the agent's DNS proxy, which observes both the query and the response.
    • Cilium populates an eBPF map with the resolved IP addresses for that FQDN, tied to the security identity of the source pod.
    • When the pod then opens a TCP connection to one of those IPs, the eBPF program on the egress path checks the map and allows the connection.
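
    You can watch this happening with Hubble, which decodes the DNS traffic passing through the proxy:

    bash
    # Show DNS lookups made by pods in the payments namespace
    hubble observe --namespace payments --protocol dns -f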

    The Production Problem: Short TTLs and DNS Caching

    What happens if api.paymentprovider.com has a very short DNS TTL (e.g., 60 seconds)? Cilium's internal DNS cache must be managed carefully. If the pod uses a cached IP after Cilium's entry has expired, the connection will be dropped. This can cause intermittent, hard-to-debug failures.

    Solution and Tuning Parameters:

    * --tofqdns-min-ttl: A Cilium agent flag to set a minimum TTL. If the upstream DNS record has a TTL of 30s, Cilium can be configured to hold onto it for, say, 300s to reduce churn.

    * --tofqdns-idle-connection-grace-period: Controls how long connections to an FQDN's IP can exist after the DNS entry expires. This is crucial for long-lived connections.

    * Monitoring DNS Policy State: Use the Cilium CLI to inspect the state of FQDN policies:

    bash
        # Run inside a Cilium agent pod (e.g. kubectl -n kube-system exec -it ds/cilium -- bash)
        cilium fqdn cache list

    This command will show you the FQDNs, their corresponding IPs, and their expiration timestamps, which is invaluable for debugging.
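
    If you deploy Cilium with Helm, one way to apply these flags is through the agent's extraArgs value; this is a sketch, so confirm that your chart version exposes extraArgs before relying on it:

    yaml
    # fqdn-values.yaml -- merged into your existing Helm values
    extraArgs:
      - "--tofqdns-min-ttl=300"
      - "--tofqdns-idle-connection-grace-period=60s"

    Apply it with helm upgrade cilium cilium/cilium -n kube-system --reuse-values -f fqdn-values.yaml and the agents will roll out with the new flags.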

    Edge Case 2: Securing `hostNetwork: true` Pods

    Some pods, such as node exporters or certain CNI components, need to run with hostNetwork: true. These pods bypass the pod network namespace and bind directly to the node's network interfaces. A standard CiliumNetworkPolicy that uses endpointSelector will not apply to them, because they are not standard Cilium-managed endpoints; they share the host's identity instead.

    This is a major security gap. How do you prevent a compromised node-exporter from accessing sensitive services on the node or exfiltrating data?

    Solution: CiliumClusterwideNetworkPolicy and nodeSelector

    Cilium provides cluster-scoped policies and the ability to select nodes themselves as endpoints.
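
    Note that node-selecting policies rely on Cilium's host firewall feature, which is disabled by default. A minimal sketch of enabling it with Helm (the device name is an environment-specific assumption):

    bash
    helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
      --set hostFirewall.enabled=true \
      --set devices='{eth0}'

    With that in place, the policy itself looks like this: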

    yaml
    # policy-05-secure-host-network.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumClusterwideNetworkPolicy
    metadata:
      name: "restrict-node-exporter-access"
    spec:
      description: "Control traffic for host-networked pods on the node itself"
      # Select the nodes to apply this policy to
      nodeSelector:
        matchLabels:
          kubernetes.io/os: linux
    
      # Allow ingress to the node-exporter port (9100) from Prometheus only
      ingress:
        - fromEndpoints:
            # Only the Prometheus pod in shared-monitoring may reach the exporter
            - matchLabels:
                k8s:io.kubernetes.pod.namespace: shared-monitoring
                app: prometheus
          toPorts:
            - ports:
                - port: "9100"
                  protocol: TCP
    
      # Deny all other ingress to the node
      # (Be careful with this in production! Ensure you allow SSH, kubelet, etc.)

    This policy is applied directly to the host's network stack. It ensures that even though the node-exporter is running on the host network, only Prometheus can connect to it on port 9100. All other access attempts will be dropped by eBPF at the node's network interface level.
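
    You can confirm that the node itself is now a policed endpoint from inside any agent pod (in recent releases the in-agent binary is named cilium-dbg):

    bash
    # The host shows up as an endpoint carrying the reserved:host label
    kubectl -n kube-system exec ds/cilium -- cilium endpoint list | grep reserved:host

    # Inspect the policy state for the host endpoint, using the ID from the previous command
    kubectl -n kube-system exec ds/cilium -- cilium endpoint get <endpoint-id>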


    Policy Auditing and Observability with Hubble

    Enforcing policy is only half the battle. In a complex system, you need to understand traffic flows and debug why connections are being dropped. This is where Hubble, Cilium's observability component, is indispensable.

    Let's say a developer deploys a new version of the payments-api which now needs to call a new audit-log service, but they forgot to update the network policy. The connections will fail silently.

    To debug this, we use the hubble CLI:

    bash
    # See all dropped packets originating from the payments namespace
    hubble observe --namespace payments --verdict DROPPED -f
    
    # Example Output:
    # TIMESTAMP         SOURCE                          DESTINATION                            TYPE     VERDICT   SUMMARY
    # Dec 15 14:20:10   payments/payments-api-7c...  ->  invoicing/audit-log-5b... (8080)       L4-NEW   DROPPED   Policy denied (TCP_SYN)

    This immediately tells us:

    • The exact source and destination pods.
    • The destination port (8080).
    • The verdict: DROPPED.
    • The reason: Policy denied.

    Audit Mode for Safe Policy Rollout

    Making a mistake with a network policy in production can cause an outage. To reduce this risk, Cilium supports a policy audit mode, which logs policy violations (Hubble reports them with an AUDIT verdict) but does not actually drop the packets. Audit mode is not a field inside the CiliumNetworkPolicy itself; it is enabled as a setting on the Cilium agent (cluster-wide) or on individual endpoints, as sketched below.

    bash
    # Enable audit mode cluster-wide at install/upgrade time (Helm-managed install assumed)
    helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
      --set policyAuditMode=true

    # Or enable it only for the endpoint you are testing. Find the endpoint ID with
    # `cilium endpoint list` inside the agent pod on that node, then:
    kubectl -n kube-system exec ds/cilium -- cilium endpoint config <endpoint-id> PolicyAuditMode=Enabled

    # Once the new policy is applied, watch what WOULD have been dropped
    hubble observe --namespace invoicing --verdict AUDIT -f

    With audit mode enabled, a non-compliant API call would be logged by Hubble with a verdict of AUDIT, but the traffic would be allowed to proceed. This allows you to safely observe the effects of a new policy before switching the agent or endpoint back to enforcement mode.

    Conclusion: eBPF is the Foundation for Modern Cloud-Native Security

    By moving network policy enforcement from the cumbersome iptables framework into the efficient, programmable eBPF layer in the kernel, Cilium provides a step-function improvement in performance, scalability, and security granularity. Standard Kubernetes NetworkPolicies are a starting point, but they lack the application-layer awareness and identity-based controls required for a true zero-trust model.

    As a senior engineer, mastering CiliumNetworkPolicy and its underlying eBPF mechanisms is no longer a niche skill; it's becoming a core competency for building secure, scalable, and observable distributed systems. By understanding how to implement L7-aware rules, handle complex edge cases like DNS and host networking, and leverage observability tools like Hubble, you can build a security posture that is not a bolted-on afterthought but an intrinsic, high-performance feature of your Kubernetes platform.
