Zero-Trust K8s Networking with Cilium's eBPF-Powered Policies

Goh Ling Yong

The Inadequacy of Native `NetworkPolicy` for Zero-Trust

As architects of distributed systems on Kubernetes, we're tasked with building environments that are secure by default. The principle of least privilege, a cornerstone of zero-trust security, dictates that a workload should only be able to communicate with the specific services it absolutely requires. While the native Kubernetes NetworkPolicy resource provides a starting point, its limitations become immediately apparent in any non-trivial production environment.

Native policies operate primarily at L3/L4 (IP address and port). This forces a reliance on unstable pod IPs or broad CIDR ranges, which are antithetical to the dynamic, ephemeral nature of cloud-native workloads. They lack awareness of application-level protocols (L7), meaning you can allow a connection to a database on port 5432 but cannot distinguish between a read-only query and a destructive DROP TABLE command. Furthermore, controlling egress to external services by their fully qualified domain name (FQDN) is impossible, forcing engineers to maintain brittle, manually updated IP whitelists for third-party APIs.
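To see the contrast, here is a minimal sketch of the native approach for the external-API case: a NetworkPolicy that pins egress to a hard-coded CIDR (the labels and CIDR are illustrative placeholders). There is no way to express the provider's domain name or an HTTP method, only IP blocks and ports:

yaml
# native-egress-whitelist.yaml (illustrative only)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-egress-whitelist
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: payment-processor
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 203.0.113.0/24   # published IP range, maintained by hand
    ports:
    - protocol: TCP
      port: 443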

These constraints make implementing a robust zero-trust model with native tools a losing battle. To achieve the required granularity and performance, we must descend the stack and leverage modern kernel capabilities. This is where Cilium and eBPF fundamentally change the game.

The Cilium eBPF Datapath: Bypassing `iptables` for Performance and Identity

Cilium's core innovation is its eBPF-based datapath. Instead of relying on chains of iptables rules, which suffer from performance degradation at scale and are difficult to debug, Cilium attaches lightweight, sandboxed eBPF programs directly to network hooks within the Linux kernel (e.g., the Traffic Control tc ingress/egress hooks on a network device).

When a packet arrives at a pod's network interface, the attached eBPF program runs immediately in kernel space, before the packet traverses the rest of the networking stack. The program has access to the packet data and, crucially, to shared eBPF maps. Cilium uses these maps to store the association between a workload's identity and the policy that applies to it.
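You can see these hooks and maps directly on a node. A minimal sketch, assuming bpftool is installed there (device and map names will differ in your environment):

bash
# eBPF programs attached to tc hooks; Cilium's programs appear on the
# host-side veth devices (lxc*) and on the physical interfaces.
bpftool net show

# eBPF maps maintained by the agent (ipcache, policy, conntrack, ...).
bpftool map show | grep cilium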

Identity-Based Security: The Core Primitive

Instead of using a pod's ephemeral IP address as its primary identifier, Cilium assigns a unique, cluster-wide Security Identity to each endpoint based on its Kubernetes labels. For example, a pod with labels app=api, role=payments might be assigned the identity 47812.

  • The Cilium agent on each node monitors the Kubernetes API for pods and their labels.
  • It assigns a numeric identity to unique sets of labels, synchronizing this mapping across the cluster via a key-value store (like etcd or the K8s CRD-backed store).
  • When a CiliumNetworkPolicy is applied, the agent resolves it into rules expressed in terms of these numeric identities rather than IP addresses.
  • The resolved rules are loaded into per-endpoint eBPF policy maps on each relevant node.

    When the eBPF program at the tc hook inspects a packet, it resolves the source IP to its security identity via the ipcache map and checks the destination endpoint's policy map for a rule matching that identity. This check is a highly efficient key lookup in a kernel-space hash map, orders of magnitude faster than traversing a linear iptables chain.
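    The identity and policy state is easy to inspect through the Cilium agent's debug CLI. A rough sketch, assuming you can exec into the agent pod (the binary is named cilium-dbg in recent releases, cilium in older ones, and subcommands vary slightly by version):

    bash
    # Pick the Cilium agent pod on the node you care about.
    CILIUM_POD=$(kubectl -n kube-system get pods -l k8s-app=cilium -o name | head -n 1)

    # Numeric security identities and the label sets they represent.
    kubectl -n kube-system exec "$CILIUM_POD" -- cilium identity list

    # IP -> identity associations used by the datapath.
    kubectl -n kube-system exec "$CILIUM_POD" -- cilium bpf ipcache list

    # Endpoints managed on this node; 'cilium bpf policy get <ENDPOINT_ID>'
    # then dumps the policy map for a specific endpoint.
    kubectl -n kube-system exec "$CILIUM_POD" -- cilium endpoint list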

    This identity-based model provides several critical advantages:

    * Scalability: Policy enforcement complexity is independent of the number of pods. It scales with the number of unique label sets (identities), which is typically far smaller.

    * Decoupling: Network policy is decoupled from network location (IP address). Pods can be rescheduled, scaled up, or moved across nodes without requiring any policy changes.

    * Performance: Bypassing iptables and conntrack significantly reduces per-packet overhead, leading to lower latency and higher throughput, especially in services with high connection churn.

    Advanced `CiliumNetworkPolicy` in Production Scenarios

    Let's move beyond theory and implement production-grade policies using the CiliumNetworkPolicy CRD. We'll assume a microservices application for an e-commerce platform.

    Scenario 1: Strict Identity-Based Ingress and Egress

    Problem: The order-processing service should only accept ingress traffic from the api-gateway and initiate egress traffic only to the postgres-db service on port 5432.

    Solution: We define two policies. One for ingress, one for egress.

    yaml
    # ingress-policy-order-processing.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "order-processing-ingress"
      namespace: "production"
    spec:
      endpointSelector:
        matchLabels:
          app: order-processing
      ingress:
      - fromEndpoints:
        - matchLabels:
            app: api-gateway
        toPorts:
        - ports:
          - port: "8080"
            protocol: TCP
    yaml
    # egress-policy-order-processing.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "order-processing-egress"
      namespace: "production"
    spec:
      endpointSelector:
        matchLabels:
          app: order-processing
      egress:
      - toEndpoints:
        - matchLabels:
            app: postgres-db
        toPorts:
        - ports:
          - port: "5432"
            protocol: TCP

    Analysis:

    * endpointSelector: This targets the pods to which the policy applies. Here, any pod with the label app: order-processing.

    * fromEndpoints: The ingress rule specifies that only endpoints with the label app: api-gateway are allowed.

    * toEndpoints: The egress rule limits outbound connections to endpoints labeled app: postgres-db.

    * Implicit Deny: Default deny is applied per direction. Once any policy with an ingress section selects a pod, all ingress not explicitly allowed is dropped, and the same holds for egress. This is the foundation of a zero-trust posture; a quick way to verify the behaviour is sketched below.
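    A minimal verification, assuming the Deployments from this scenario exist and their images ship curl (the Deployment names, the /healthz path, and the Service names are assumptions for illustration):

    bash
    # Allowed: the api-gateway can reach order-processing on 8080.
    kubectl -n production exec deploy/api-gateway -- \
      curl -s -o /dev/null -w '%{http_code}\n' --max-time 3 http://order-processing:8080/healthz

    # Denied: a workload not named in any policy times out.
    kubectl -n production exec deploy/frontend -- \
      curl -s -o /dev/null -w '%{http_code}\n' --max-time 3 http://order-processing:8080/healthz || echo blocked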

    Scenario 2: L7-Aware HTTP Policy for API Authorization

    Problem: The api-gateway is allowed to communicate with the user-profile service. However, we want to enforce that only internal workloads (here, the internal-batch-job service) can modify user data (POST, PUT), while general frontend traffic coming through the gateway can only read it (GET).

    Solution: Cilium can parse HTTP traffic (and other L7 protocols like gRPC, Kafka) to enforce rules based on paths, methods, or headers.

    yaml
    # l7-policy-user-profile.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "user-profile-l7-access"
      namespace: "production"
    spec:
      endpointSelector:
        matchLabels:
          app: user-profile
      ingress:
      # Rule 1: Read-only access for the gateway
      - fromEndpoints:
        - matchLabels:
            app: api-gateway
        toPorts:
        - ports:
          - port: "9000"
            protocol: TCP
          rules:
            http:
            - method: "GET"
              path: "/api/v1/users/.*"
      # Rule 2: Write access for internal batch jobs
      - fromEndpoints:
        - matchLabels:
            app: internal-batch-job
        toPorts:
        - ports:
          - port: "9000"
            protocol: TCP
          rules:
            http:
            - method: "POST"
              path: "/api/v1/users"
            - method: "PUT"
              path: "/api/v1/users/.*"

    Implementation Details:

    * When Cilium sees a toPorts rule with an L7 protocol (http), it dynamically enables L7 parsing for that traffic. This is handled by an Envoy proxy managed by Cilium (embedded in the agent, or deployed as a separate per-node proxy in newer releases); the eBPF datapath transparently redirects only the selected flows to it.

    * The eBPF program at the tc hook redirects traffic on port 9000 to this proxy, which parses the HTTP request line and headers.

    * If the request matches an allowed rule (e.g., a GET from api-gateway), it is forwarded to the user-profile pod. If not (e.g., a POST from api-gateway), the proxy rejects it with an HTTP 403 ("Access denied") response and it never reaches the service.

    Notice that each ingress rule pairs its own fromEndpoints block with its own set of HTTP rules. Scoping the L7 rules to a source identity in this way provides extremely granular control: the same port serves read-only traffic from one caller and write traffic from another.
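    A quick way to exercise the policy, assuming the workloads above exist, the user-profile Service exposes port 9000, and the images include curl (names and paths are illustrative):

    bash
    # Allowed: a read from the gateway returns the service's normal response.
    kubectl -n production exec deploy/api-gateway -- \
      curl -s -o /dev/null -w '%{http_code}\n' http://user-profile:9000/api/v1/users/42

    # Denied at L7: a write from the gateway is answered by the proxy with
    # HTTP 403 ("Access denied") and never reaches the user-profile pod.
    kubectl -n production exec deploy/api-gateway -- \
      curl -s -o /dev/null -w '%{http_code}\n' -X POST http://user-profile:9000/api/v1/users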

    Scenario 3: DNS-Aware Egress Control for External APIs

    Problem: A payment-processor service needs to communicate with Stripe's API (api.stripe.com) but should be blocked from making any other external network calls to prevent data exfiltration.

    Solution: We leverage Cilium's toFQDNs feature.

    yaml
    # fqdn-policy-payment-processor.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "payment-processor-external-egress"
      namespace: "production"
    spec:
      endpointSelector:
        matchLabels:
          app: payment-processor
      egress:
      # Rule 1: Allow DNS lookups to kube-dns
      - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": kube-system
            "k8s:k8s-app": kube-dns
        toPorts:
        - ports:
          - port: "53"
            protocol: UDP
          rules:
            dns:
            - matchPattern: "*"
    
      # Rule 2: Allow HTTPS traffic to api.stripe.com
      - toFQDNs:
        - matchName: "api.stripe.com"
        toPorts:
        - ports:
          - port: "443"
            protocol: TCP

    Under the Hood:

  • DNS Interception: Because the policy contains a dns rule, the eBPF datapath redirects the payment-processor pod's DNS requests on port 53 to the Cilium agent's local DNS proxy.
  • Policy Check: The DNS proxy checks whether the requested name, api.stripe.com, matches an allowed toFQDNs selector. In this case, it does.
  • Dynamic IP Mapping: The request is forwarded to kube-dns. When the response comes back with the IP addresses for api.stripe.com, the DNS proxy records them before returning the answer to the pod.
  • eBPF Map Update: The agent programs an eBPF map with the returned IP addresses, associating them with the security identity of the payment-processor pod and the allowed FQDN. A TTL is also set, based on the DNS record's TTL.
  • Traffic Enforcement: When the pod then tries to open a TCP connection to one of Stripe's IPs on port 443, the eBPF egress program on its network interface checks the packet's destination IP. It performs a lookup in the FQDN policy map and finds a match, allowing the packet to be sent. Any attempt to connect to an IP not in this map will be dropped.
    This mechanism is vastly superior to maintaining static IP whitelists for cloud services whose IPs change frequently.
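    You can watch these mappings being programmed from the agent's debug CLI (reusing the CILIUM_POD variable from the earlier sketch; subcommand names vary slightly across versions):

    bash
    # DNS names the agent has resolved for local endpoints, with the IPs
    # and expiry times it programmed into the datapath.
    kubectl -n kube-system exec "$CILIUM_POD" -- cilium fqdn cache list

    # The selector cache, including toFQDNs selectors and the identities
    # currently associated with them.
    kubectl -n kube-system exec "$CILIUM_POD" -- cilium policy selectors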

    Edge Cases and Production-Grade Patterns

    Pattern: Cluster-Wide Default Deny

    To enforce a true zero-trust posture, you should start with a cluster-wide policy that denies all communication by default, forcing teams to explicitly define CiliumNetworkPolicy for their applications.

    yaml
    # clusterwide-default-deny.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumClusterwideNetworkPolicy
    metadata:
      name: "default-deny-all"
    spec:
      endpointSelector: {}
      ingress: []
      egress: []

    * CiliumClusterwideNetworkPolicy: This is a non-namespaced resource that applies to all pods in the cluster.

    * endpointSelector: {}: An empty selector matches all endpoints.

    * ingress: [] and egress: []: Empty ingress and egress arrays mean no traffic is allowed. With this policy in place, a pod can only communicate if another, more specific policy grants it permission. Roll it out carefully: it applies to every namespace, including kube-system, and it blocks DNS resolution unless paired with an explicit allow such as the cluster-wide DNS rule sketched below.
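    A common companion policy is a cluster-wide egress allowance for DNS, mirroring Rule 1 from Scenario 3 but applied to all endpoints (a sketch; adjust the kube-dns labels to match your cluster's DNS deployment):

    yaml
    # clusterwide-allow-dns.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumClusterwideNetworkPolicy
    metadata:
      name: "allow-dns-cluster-wide"
    spec:
      endpointSelector: {}
      egress:
      - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": kube-system
            "k8s:k8s-app": kube-dns
        toPorts:
        - ports:
          - port: "53"
            protocol: UDP
          rules:
            dns:
            - matchPattern: "*"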

    Edge Case: Policies for `hostNetwork: true` Pods

    Pods running with hostNetwork: true (e.g., node-exporter or certain CNI components) do not get their own network namespace; they are exposed directly on the node's interfaces. Cilium can still apply policies to them by targeting the host itself via its host firewall feature (enabled with hostFirewall.enabled=true in the Helm chart). Host policies are expressed as cluster-scoped CiliumClusterwideNetworkPolicy resources that use a nodeSelector.

    Problem: Allow Prometheus to scrape metrics from node-exporter pods running on the host network, but prevent node-exporter from making any other connections.

    yaml
    # host-policy-node-exporter.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumClusterwideNetworkPolicy
    metadata:
      name: "node-exporter-host-policy"
      # Note: host policies are cluster-scoped, so there is no namespace field
    spec:
      # An empty nodeSelector applies the policy to the host endpoint on every node
      nodeSelector: {}
      ingress:
      - fromEndpoints:
        - matchLabels:
            app: prometheus
        toPorts:
        - ports:
          - port: "9100"
            protocol: TCP

    * nodeSelector: Instead of endpointSelector, we use nodeSelector to apply this policy to the host networking stack on nodes that match the labels (empty selector means all nodes).

    * This effectively treats the node as a Cilium-managed endpoint, allowing fine-grained control even for workloads that bypass the standard pod networking model.

    Performance and Observability with Hubble

    The most significant operational benefit of an eBPF-based datapath is not just policy enforcement but the rich, low-overhead observability it provides. The same eBPF programs that make policy decisions can also export metadata about every single flow to a user-space agent.

    Quantifying Performance

    Directly benchmarking the network performance difference between an iptables-based CNI (like kube-router or Calico in iptables mode) and Cilium's eBPF mode requires a controlled environment. However, typical results from tools like iperf3 for throughput and qperf for latency show:

    * Throughput: Cilium often approaches bare-metal line-rate performance, as the eBPF path is highly optimized.

    * Latency: For services with high connection rates, the elimination of conntrack table locks and iptables rule traversal can result in a 10-30% reduction in P99 latency for network requests within the cluster.

    This performance gain is particularly impactful for latency-sensitive applications like financial trading platforms, real-time bidding systems, or distributed databases.
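    If you want to produce your own numbers, a rough in-cluster throughput test can be run with two throwaway pods (the image name is an assumption; any image that contains iperf3 will do):

    bash
    # Start an iperf3 server pod and wait for it to become Ready.
    kubectl run iperf3-server --image=networkstatic/iperf3 -- -s
    kubectl wait --for=condition=Ready pod/iperf3-server --timeout=60s
    SERVER_IP=$(kubectl get pod iperf3-server -o jsonpath='{.status.podIP}')

    # Run a 30-second client against it from a second pod, then clean up.
    kubectl run iperf3-client --rm -it --restart=Never \
      --image=networkstatic/iperf3 -- -c "$SERVER_IP" -t 30
    kubectl delete pod iperf3-server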

    Debugging with Hubble

    When a policy isn't behaving as expected, Hubble provides indispensable introspection.

    Problem: A new v2 version of the user-profile service was deployed, but the api-gateway is receiving connection timeouts when trying to reach it. We suspect a network policy issue.

    Debugging Steps:

  • Target the source pod and look for dropped packets:
    bash
        # Using the Hubble CLI, observe traffic from the api-gateway
        # and filter for dropped verdicts.
        kubectl exec -it -n kube-system cilium-xxxx -- hubble observe --from-pod production/api-gateway-7f8c9d... --verdict DROPPED -o json
  • Analyze the Hubble output:
    json
        {
          "flow": {
            "verdict": "DROPPED",
            "drop_reason_desc": "POLICY_DENIED",
            "source": {
              "identity": 16452,
              "namespace": "production",
              "labels": ["k8s:app=api-gateway", ...]
            },
            "destination": {
              "identity": 31098,
              "namespace": "production",
              "labels": ["k8s:app=user-profile", "k8s:version=v2", ...]
            },
            "L4": {"TCP": {"destination_port": 9000}},
            ...
          }
        }
  • Identify the Root Cause: The output clearly shows a POLICY_DENIED drop at L4, before any L7 processing, which means no ingress allow rule is being applied to the destination at all. Inspecting the deployed copy of the user-profile-l7-access policy reveals that its endpointSelector had been pinned to the previous release's version label, so it no longer selects the pods labeled k8s:version=v2. Under the cluster-wide default deny, pods selected by no allow policy drop all ingress.
  • The Fix: Relax the selector so that it matches every version of the service (or add a dedicated policy for v2).
  • A better endpointSelector would be:

    yaml
        spec:
          endpointSelector:
            matchLabels:
              app: user-profile

    This level of immediate, actionable feedback, directly correlated with Kubernetes metadata (identities, labels, namespaces), is simply not possible to achieve with iptables logs. Hubble allows you to see not just that a packet was dropped, but which policy rule caused the drop.

    Conclusion: Kernel-Level Programmability as the New Standard

    Moving to a Cilium and eBPF-based networking model is more than a CNI swap; it's a paradigm shift in how we implement security and observability in Kubernetes. By leveraging kernel-level programmability, we transcend the limitations of traditional IP-based firewalls and build systems that are:

    * More Secure: Identity-based, L7-aware policies enable a true zero-trust posture that is both granular and easy to manage.

    * More Performant: The eBPF datapath offers lower latency and higher throughput by bypassing legacy kernel networking components.

    * More Observable: The ability to introspect every flow with minimal overhead provides unparalleled debugging and monitoring capabilities.

    For senior engineers and platform architects, mastering these advanced policy constructs is no longer a niche skill but a fundamental requirement for building scalable, secure, and high-performance cloud-native platforms.
