Advanced eBPF Network Policies in Cilium for Zero-Trust K8s

Goh Ling Yong

The Inadequacy of `iptables` for Modern Cloud-Native Security

For any engineer operating Kubernetes at scale, the limitations of the default NetworkPolicy resource, and more fundamentally, its common implementation via iptables, become painfully apparent. Standard policies operate at L3/L4, relying on IP addresses and ports. In a dynamic Kubernetes environment where pods are ephemeral and IPs change constantly, this model is brittle. Furthermore, iptables performance degrades linearly as the number of rules increases, leading to significant latency and CPU overhead in clusters with thousands of services and policies.

This isn't a critique of iptables itself; it's a testament to a tool being stretched beyond its intended design. The core problem is the sequential traversal of long rule chains in kernel space: for every packet, the kernel must walk these chains until it finds a match. In a microservices mesh with heavy east-west traffic, this per-packet overhead multiplies, creating a bottleneck that directly impacts application performance.

This is the context in which eBPF (extended Berkeley Packet Filter) emerges as a transformative technology for cloud-native networking. By allowing sandboxed programs to run directly within the Linux kernel, eBPF enables tools like Cilium to create a networking and security datapath that is more performant, programmable, and context-aware. Cilium bypasses iptables entirely for pod-to-pod traffic, attaching eBPF programs to network interfaces to make policy decisions at the earliest possible point in the packet processing pipeline. Policy lookups become O(1) operations using eBPF maps (highly efficient key-value stores in the kernel), irrespective of the number of policies.
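
If Cilium is already running in your cluster, you can confirm that the eBPF datapath (rather than iptables) is handling traffic by querying the agent. A minimal check, assuming a standard Helm install where the agent DaemonSet is named cilium in kube-system:

```bash
# Query the Cilium agent on any node; the DaemonSet name "cilium" in
# kube-system is the Helm default, adjust if your install differs.
kubectl -n kube-system exec ds/cilium -- cilium status | grep -E 'KubeProxyReplacement|Host Routing|Masquerading'
```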

This article assumes you understand the basics of Kubernetes networking and NetworkPolicy. We will dive directly into advanced security patterns that are only possible with Cilium's eBPF datapath and its custom resource definitions (CRDs), CiliumNetworkPolicy and CiliumClusterwideNetworkPolicy.

Core Mechanism: Identity-Based Security with eBPF

Before we dive into policy examples, it's crucial to understand Cilium's core concept: identity-based security. Instead of using a pod's transient IP address as its primary identifier, Cilium uses a stable identity derived from its Kubernetes labels.

Here's the workflow:

  • Identity Allocation: When a pod is created, the Cilium agent on the node inspects its labels (e.g., app=frontend, role=api).
  • Identity Registration: The agent resolves that unique set of labels to a numeric Security Identity, either through a central key-value store (like etcd) or, in the default CRD mode, through CiliumIdentity objects in the Kubernetes API. Every pod that carries the same set of labels shares the same identity cluster-wide.
  • eBPF Map Programming: The agent programs an eBPF map on the node, mapping the pod's local IP address to its newly assigned Security Identity.
  • Policy Enforcement: When a packet leaves a pod, an eBPF program attached to the pod's network interface (veth pair) looks up the destination IP in another eBPF map. This map contains the Security Identities of all other pods in the cluster. The eBPF program then checks a policy map to see if SourceIdentity is allowed to communicate with DestinationIdentity on the specified port and protocol. This entire check happens in the kernel with a few hash table lookups, resulting in near-native packet processing speed.
This abstraction away from IP addresses is what enables powerful, declarative policies that remain stable even as pods are rescheduled and reassigned IPs across the cluster.
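
You can inspect this machinery directly on a node. A quick sketch, again assuming the agent DaemonSet is named cilium in kube-system:

```bash
# List allocated Security Identities and the label sets they represent
kubectl -n kube-system exec ds/cilium -- cilium identity list

# Show the IP -> identity mappings programmed into the eBPF ipcache map
kubectl -n kube-system exec ds/cilium -- cilium bpf ipcache list

# Find a local endpoint's numeric ID, then dump its policy map
# (<endpoint-id> comes from the endpoint list output above)
kubectl -n kube-system exec ds/cilium -- cilium endpoint list
kubectl -n kube-system exec ds/cilium -- cilium bpf policy get <endpoint-id>
```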

    Production Example 1: Strict Ingress Control for a Backend API

    Let's model a common microservice scenario. We have a payment-api that should only accept ingress traffic from the checkout-api and a specific batch-processor job. Any other pod, even within the same namespace, should be denied access.

    First, let's define our pods with appropriate labels:

    ```yaml
    # checkout-api.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: checkout-api
      namespace: production
    spec:
      selector:
        matchLabels:
          app: checkout-api
      template:
        metadata:
          labels:
            app: checkout-api
            role: frontend-api
        spec:
          containers:
          - name: main
            image: my-repo/checkout-api:1.2.0
    ---
    # payment-api.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: payment-api
      namespace: production
    spec:
      selector:
        matchLabels:
          app: payment-api
      template:
        metadata:
          labels:
            app: payment-api
            role: backend-api
        spec:
          containers:
          - name: main
            image: my-repo/payment-api:2.5.1
            ports:
            - containerPort: 8080
    ---
    # batch-processor.yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: nightly-settlement
      namespace: ops
    spec:
      template:
        metadata:
          labels:
            app: batch-processor
            task: settlement
        spec:
          containers:
          - name: processor
            image: my-repo/batch-processor:3.0.0
          restartPolicy: Never
    ```

    Now, we'll use a CiliumNetworkPolicy to enforce our desired ingress rule on the payment-api.

    ```yaml
    # payment-api-policy.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: payment-api-ingress-policy
      namespace: production
    spec:
      endpointSelector:
        matchLabels:
          app: payment-api
      ingress:
      - fromEndpoints:
        - matchLabels:
            app: checkout-api
            role: frontend-api
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": ops
            app: batch-processor
        toPorts:
        - ports:
          - port: "8080"
            protocol: TCP
    ```

    Dissection of the Policy:

    * endpointSelector: This targets the pods to which the policy applies. Here, any pod with the label app: payment-api in the production namespace.

    * ingress: This block defines the allowed incoming traffic rules. By default, if an ingress block is present, all other ingress traffic is denied (zero-trust default).

    * fromEndpoints: This is the core of identity-based security. Instead of specifying CIDRs, we specify label selectors for allowed source pods.

    * The first selector allows any pod with both app: checkout-api and role: frontend-api.

    * The second selector demonstrates a cross-namespace rule. It allows pods from the ops namespace ("k8s:io.kubernetes.pod.namespace": ops) that also have the app: batch-processor label. The k8s: prefix denotes a reserved label that Cilium automatically applies to endpoints.

    * toPorts: This specifies that the allowed traffic must be on TCP port 8080. Traffic to other ports on the payment-api pod will be dropped.

    This policy is far more robust and readable than an IP-based equivalent. It describes intent rather than network topology.
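
    To sanity-check the policy before relying on it, probe the API from a pod that none of the allow rules select and watch Cilium's verdicts. The probe below is a sketch: the curlimages/curl image, the /healthz path, and a ClusterIP Service named payment-api are illustrative assumptions, not part of the manifests above.

    ```bash
    # From a throwaway pod that matches none of the allowed selectors
    kubectl -n production run policy-probe --rm -it --restart=Never \
      --image=curlimages/curl -- curl -sS --max-time 3 http://payment-api:8080/healthz

    # In a second terminal, watch drops destined for the payment-api pod
    hubble observe --namespace production --to-pod payment-api --verdict DROPPED -f
    ```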

    Advanced Scenario: L7-Aware Policies for API Security

    L3/L4 policies are often insufficient. Consider a shared internal API gateway that routes requests to different backend services based on the HTTP path. We might want to allow a user-service to read data (GET /api/v1/users/{id}) but prevent it from deleting data (DELETE /api/v1/users/{id}). An admin-service, however, should be allowed to perform both actions.

    This requires L7 visibility, which Cilium provides by transparently integrating with an Envoy proxy. When an L7 policy is applied, Cilium's eBPF datapath intercepts the relevant traffic and redirects it to an Envoy proxy running on the same node without any application configuration changes.

    Production Example 2: Granular HTTP Method and Path Control

    Let's secure an internal-gateway pod. We have two clients: user-profile-service and admin-dashboard.

    ```yaml
    # internal-gateway-l7-policy.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: internal-gateway-l7-policy
      namespace: core-infra
    spec:
      endpointSelector:
        matchLabels:
          app: internal-gateway
      ingress:
      - fromEndpoints:
        - matchLabels:
            app: user-profile-service
        toPorts:
        - ports:
          - port: "80"
            protocol: TCP
          rules:
            http:
            - method: "GET"
              path: "/api/v1/users/.*"
            - method: "PUT"
              path: "/api/v1/users/[0-9]+/profile"
      - fromEndpoints:
        - matchLabels:
            app: admin-dashboard
        toPorts:
        - ports:
          - port: "80"
            protocol: TCP
          rules:
            http:
            - method: "GET"
              path: "/api/v1/users/.*"
            - method: "DELETE"
              path: "/api/v1/users/[0-9]+"
            - method: "GET"
              path: "/api/v1/metrics"

    Dissection of the L7 Policy:

    * The policy is split into two ingress stanzas, one for each source identity.

    * For user-profile-service: We open TCP port 80, but add an L7 rules block. This block specifies that only GET requests to paths matching the regex /api/v1/users/.* and PUT requests to /api/v1/users/[0-9]+/profile are allowed. Any other request from this service (e.g., POST /api/v1/users or DELETE /api/v1/users/123) will receive an HTTP 403 Forbidden response from the Envoy proxy.

    * For admin-dashboard: This service has more privileges. It can GET and DELETE users, and also access the /api/v1/metrics endpoint. The path matching uses POSIX ERE (Extended Regular Expression) syntax, providing powerful matching capabilities.
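
    A quick way to see these L7 verdicts in practice is to issue a request the policy forbids and watch Hubble's HTTP flows. The commands below are a sketch: they assume the client Deployments run in core-infra, that curl is available in the client image, and that the gateway is reachable as the Service internal-gateway.

    ```bash
    # A DELETE from user-profile-service should be rejected with a 403
    kubectl -n core-infra exec deploy/user-profile-service -- \
      curl -s -o /dev/null -w "%{http_code}\n" -X DELETE http://internal-gateway/api/v1/users/123

    # Observe HTTP-level (L7) flows arriving at the gateway
    hubble observe --namespace core-infra --to-pod internal-gateway --type l7 -f
    ```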

    Performance Considerations for L7 Policies:

    Enabling L7 inspection is not free. It involves redirecting traffic from the pure eBPF datapath to the Envoy proxy in user space. This introduces latency compared to L3/L4-only policies. However, the overhead is localized to the specific pods and ports targeted by the L7 policy. Best practice is to apply L7 policies surgically only where deep packet inspection is required, while using performant L3/L4 identity-based policies for the majority of traffic.

    Advanced Scenario: DNS-Aware Egress Policies for External Services

    Controlling egress traffic is a critical component of a zero-trust posture. A common requirement is to allow a pod to connect to a specific external service (e.g., a third-party payment provider like api.stripe.com) but nothing else. The challenge is that the IP addresses for these FQDNs can change frequently and unpredictably.

    Basing egress policies on static IP addresses is a maintenance nightmare and prone to failure. Cilium solves this with DNS-aware policies.

    Production Example 3: Locking Down Egress to a Specific FQDN

    Imagine a reporting-service that needs to upload data to an S3 bucket (my-company-reports.s3.us-east-1.amazonaws.com) and send metrics to Datadog (api.datadoghq.com).

    ```yaml
    # reporting-service-egress-policy.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: reporting-service-egress
      namespace: analytics
    spec:
      endpointSelector:
        matchLabels:
          app: reporting-service
      egress:
      - toFQDNs:
        - matchName: "my-company-reports.s3.us-east-1.amazonaws.com"
        - matchName: "api.datadoghq.com"
        toPorts:
        - ports:
          - port: "443"
            protocol: TCP
      # Allow DNS traffic itself, otherwise FQDN lookups will fail!
      - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": kube-system
            "k8s:k8s-app": kube-dns
        toPorts:
        - ports:
          - port: "53"
            protocol: UDP
          rules:
            dns:
            - matchPattern: "*"
    ```

    Dissection of the FQDN Policy:

    * egress block: This defines allowed outbound traffic. If present, all other egress traffic is denied by default.

    * toFQDNs: This is the key element. We specify the fully qualified domain names the pod is allowed to connect to.

    * How it works: When the reporting-service pod attempts to resolve api.datadoghq.com, the Cilium agent intercepts the DNS request. It allows the request to proceed to kube-dns, but it also inspects the response. It then caches the mapping of api.datadoghq.com to its resolved IP addresses (e.g., 52.20.126.253). Cilium dynamically programs the eBPF maps on the node to allow egress traffic from the reporting-service pod to these specific destination IPs on TCP port 443.

    * DNS TTL: Cilium respects the TTL of the DNS records. When a cached record expires, the next DNS lookup for that name refreshes the cache, and the eBPF maps are updated with the new IPs. This keeps the policy effective even as DNS records change.

    * Allowing DNS: A critical and often overlooked part of FQDN policies is that you must explicitly allow the pod to make DNS requests. The second egress stanza in the example does exactly this. It allows UDP traffic on port 53 to the kube-dns service endpoints.
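
    When debugging FQDN policies, it helps to look at the DNS-to-IP cache the agent has built. A sketch, again assuming the agent DaemonSet is named cilium in kube-system:

    ```bash
    # Show the FQDN -> IP mappings learned from DNS responses, with TTLs
    kubectl -n kube-system exec ds/cilium -- cilium fqdn cache list

    # Watch DNS lookups from the reporting-service as they pass the DNS proxy
    hubble observe --namespace analytics --from-pod reporting-service --protocol DNS -f
    ```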

    Edge Cases and Production Hardening

    Implementing these policies in production requires consideration for several edge cases.

    Policy for `hostNetwork` Pods

    Pods running with hostNetwork: true (e.g., monitoring agents like Datadog or Prometheus node-exporter) bypass the pod network namespace and bind directly to the node's network interface. Standard NetworkPolicy cannot target them effectively. Cilium can enforce policies on these pods using CiliumClusterwideNetworkPolicy (CCNP) combined with a nodeSelector.

    ```yaml
    # Secure node-exporter access
    apiVersion: "cilium.io/v2"
    kind: CiliumClusterwideNetworkPolicy
    metadata:
      name: allow-prometheus-to-nodes
    spec:
      nodeSelector:
        matchLabels: {}
      ingress:
      - fromEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": monitoring
            app: prometheus
        toPorts:
        - ports:
          - port: "9100" # node-exporter port
            protocol: TCP
    ```

    This CCNP applies to the host networking stack on all nodes (the empty nodeSelector matches every node) and allows ingress on port 9100 only from pods labeled app: prometheus in the monitoring namespace. Note that host policies are only enforced when Cilium's host firewall feature is enabled.
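
    Enabling the host firewall is a Helm toggle. A minimal sketch for a Helm-managed install; the release name, namespace, and device list are assumptions to adapt to your environment:

    ```bash
    # Enable host policy enforcement (host firewall) via Helm
    helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
      --set hostFirewall.enabled=true \
      --set devices='{eth0}'
    ```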

    Policy Auditing and Dry-Run

    Applying a restrictive network policy in a production environment can be risky. Cilium provides tools to mitigate this.

    * Policy Audit Mode: You can run policy enforcement in a non-enforcing, audit-only mode (for example, by enabling the agent's policy-audit-mode option). In this mode, traffic that would have been dropped is instead allowed, but a policy verdict is logged. This lets you observe the impact of a policy before turning on enforcement.

    * Hubble CLI for Observability: Hubble provides deep observability into the network flows within your cluster. You can use it to verify policy behavior in real-time.

    ```bash
        # Watch for dropped packets to the payment-api pod
        $ hubble observe --namespace production --to-pod payment-api --verdict DROPPED -f
        
        # Trace whether a specific flow would be allowed or denied
        # (run inside the Cilium agent pod on the relevant node)
        $ cilium policy trace --src-k8s-pod default:my-app --dst-k8s-pod production:payment-api --dport 8080
        ...
        Final verdict: ALLOWED
    ```
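
    The audit mode mentioned above can be enabled cluster-wide or per agent. A sketch, assuming a Helm-managed install and the default cilium DaemonSet:

    ```bash
    # Cluster-wide, via Helm: policies are evaluated but never enforced
    helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
      --set policyAuditMode=true

    # Per agent, at runtime
    kubectl -n kube-system exec ds/cilium -- cilium config PolicyAuditMode=Enabled

    # Review policy verdict events (audited traffic is logged, not dropped)
    hubble observe --namespace production --type policy-verdict -f
    ```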

    Explicit Deny Policies

    While the default-deny model is powerful, sometimes you need to carve out exceptions in a generally permissive environment. CiliumNetworkPolicy supports explicit deny rules via ingressDeny and egressDeny sections. These rules take precedence over any allow rules.

    ```yaml
    # Deny access to the user database from the DMZ, even if broader allow rules exist
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: deny-user-database-from-dmz
      namespace: backend
    spec:
      endpointSelector:
        matchLabels:
          app: user-database
      ingressDeny:
      - fromEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": dmz
        toPorts:
        - ports:
          - port: "5432"
            protocol: TCP
    ```

    Note: Deny rules are evaluated at L3/L4 only; L7 rules (HTTP, DNS, and so on) are not supported inside ingressDeny or egressDeny blocks.

    Conclusion: A Paradigm Shift in Kubernetes Security

    Transitioning from iptables-based CNIs to an eBPF-powered solution like Cilium represents a fundamental shift in how we secure Kubernetes clusters. It's a move from a brittle, IP-centric model to a robust, identity-aware model that aligns with cloud-native principles.

    By leveraging CiliumNetworkPolicy, senior engineers can:

  • Implement True Zero-Trust: Build on a default-deny foundation where only explicitly allowed communication paths are permitted.
  • Decouple Security from Topology: Define policies based on service identity (labels), making them resilient to pod churn and IP changes.
  • Achieve Granular Control: Enforce L7 rules for APIs and DNS-aware rules for egress, providing a level of security unattainable with standard NetworkPolicy.
  • Enhance Performance and Scalability: Eliminate the iptables bottleneck, leading to lower latency and higher throughput, especially in large and high-traffic clusters.
The patterns discussed here (identity-based ingress, L7 API filtering, and FQDN-based egress) are not just features; they are the building blocks for a modern, secure, and performant microservices architecture. Mastering them is essential for any engineer responsible for the security and stability of production Kubernetes environments.
