Advanced Cilium Network Policies with eBPF for Zero-Trust K8s

Goh Ling Yong

The Inadequacy of Native Kubernetes NetworkPolicy for Zero-Trust

As architects of distributed systems, we understand that a foundational principle of modern security is zero-trust, which dictates that no actor, system, or service operating within the security perimeter should be trusted by default. In the context of Kubernetes, the primary tool for network segmentation is the NetworkPolicy resource. While essential, its capabilities are fundamentally limited to Layer 3/4 (IP address and port) constructs.

This presents a significant challenge in microservices architectures. A policy allowing pod A to communicate with pod B on port 8080 is a blunt instrument. It permits A to access any endpoint on B's API, including sensitive administrative endpoints like /debug/pprof or /metrics. A compromised pod A has a wide attack surface on pod B. This is where the standard API fails to deliver on the promise of zero-trust. True zero-trust requires identity-aware, L7-aware controls.
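
To make the gap concrete, here is roughly everything the native API can express about that relationship; a minimal sketch with hypothetical labels and namespace, not tied to the examples later in this article:

yaml
# A native NetworkPolicy can only say "pods labelled app=pod-a may reach
# pods labelled app=pod-b on TCP/8080" -- it cannot distinguish /v1/payments
# from /debug/pprof on that port. (Labels and namespace are illustrative.)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-a-to-b
  namespace: backend
spec:
  podSelector:
    matchLabels:
      app: pod-b
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: pod-a
    ports:
    - protocol: TCP
      port: 8080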

This is the problem space where Cilium, leveraging the revolutionary capabilities of eBPF, provides a paradigm shift. By operating directly within the Linux kernel, eBPF allows Cilium to inspect and make policy decisions on network packets with extreme performance and context awareness. This article bypasses the basics of Cilium and dives directly into the advanced implementation patterns of CiliumNetworkPolicy (CNP) and CiliumClusterwideNetworkPolicy (CCNP) that are critical for enforcing a robust zero-trust model in a production environment.

We will explore:

  • L7 HTTP/gRPC Policy Enforcement: Moving beyond ports to restrict access to specific API paths and gRPC methods.
  • DNS-Aware Egress Policies: Creating dynamic policies based on FQDNs, essential for controlling access to external SaaS APIs.
  • Entity-Based Policies: Targeting non-pod entities like the host, remote CIDRs, or even the Kubernetes API server itself.
  • Performance and Debugging: Understanding the performance characteristics of eBPF-based policies and using tools like Hubble to diagnose policy violations in real-time.

Pattern 1: Granular L7 API Control with `CiliumNetworkPolicy`

    The most immediate and powerful upgrade from the standard NetworkPolicy is the ability to enforce rules at the application layer. Let's consider a common scenario: a billing-api service that exposes multiple endpoints, and a frontend-app that should only be able to invoke the public-facing endpoints.

    Scenario:

  • frontend-app (in namespace frontend): Needs to call POST /v1/payments on the billing-api.
  • billing-api (in namespace billing): Exposes POST /v1/payments and a sensitive GET /admin/metrics endpoint.
  • Goal: Explicitly allow the former while denying the latter, even though both are on the same port.

First, let's define our workloads. We'll use a curl pod as the client and an Nginx pod standing in for the API.

    yaml
    # workloads.yaml
    apiVersion: v1
    kind: Namespace
    metadata:
      name: frontend
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: billing
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: frontend-app
      namespace: frontend
      labels:
        app: frontend
        role: client
    spec:
      containers:
      - name: frontend-container
        image: curlimages/curl
        command: ["sleep", "3600"]
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: billing-api
      namespace: billing
      labels:
        app: billing
        role: server
    spec:
      containers:
      - name: billing-container
        image: nginx
        ports:
        - containerPort: 80
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: billing-api-svc
      namespace: billing
    spec:
      selector:
        app: billing
      ports:
      - protocol: TCP
        port: 80
        targetPort: 80

    Now, we'll apply the CiliumNetworkPolicy (CNP). Note the ingress section, which specifies toPorts with an L7 rules block.

    yaml
    # billing-l7-policy.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: billing-api-l7-policy
      namespace: billing
    spec:
      endpointSelector:
        matchLabels:
          app: billing
          role: server
      ingress:
      - fromEndpoints:
        - matchLabels:
            app: frontend
        toPorts:
        - ports:
          - port: "80"
            protocol: TCP
          rules:
            http:
            - method: "POST"
              path: "/v1/payments"

    Implementation and Verification

• Apply the workloads and the policy:

    bash
    kubectl apply -f workloads.yaml
    kubectl apply -f billing-l7-policy.yaml

• Exec into the frontend-app pod to test connectivity:

    bash
    kubectl exec -it -n frontend frontend-app -- /bin/sh

• Inside the pod, attempt the allowed and denied requests. We simulate the API calls with curl; billing-api-svc.billing.svc.cluster.local is the FQDN of the service.

    sh
    # This request matches the policy (POST to /v1/payments) and is allowed.
    # Default Nginx answers with a 404, but that is still a successful network
    # transaction -- the key point is that the request is not blocked.
    curl -X POST http://billing-api-svc.billing.svc.cluster.local/v1/payments -v
    # Expected output: connection established, HTTP response (e.g., 404 Not Found from default Nginx)

    # This request does NOT match the policy (GET to /admin/metrics). The L3/L4
    # connection is still permitted, so the request reaches Cilium's L7 proxy,
    # which rejects it.
    curl -X GET http://billing-api-svc.billing.svc.cluster.local/admin/metrics -v
    # Expected output: HTTP/1.1 403 Forbidden ("Access denied" returned by the proxy)

    How eBPF Makes This Possible

Cilium doesn't just use eBPF for L3/L4 packet filtering. For L7 protocols like HTTP, it transparently involves a proxy (built on Envoy) whenever an L7 rule is defined. The initial connection handling and the redirection into this proxy are performed by eBPF programs in the pod's datapath (at the socket layer and on the pod's network interface), so no sidecar or application change is required.

  • When frontend-app opens a TCP connection, the eBPF program handling its traffic intercepts it.
  • It identifies that a CiliumNetworkPolicy with an L7 rule applies to the destination (billing-api).
  • The eBPF program redirects the traffic to a listener on the node-local, Cilium-managed Envoy proxy.
  • The proxy parses the HTTP request, evaluates it against the path and method rules, and either forwards it to the billing-api pod or rejects it with an HTTP 403 Access Denied response.

This is significantly leaner than a sidecar-based service mesh, where every pod carries its own proxy. Cilium's per-node proxy model reduces resource overhead while providing the same L7 visibility for policy enforcement. And because gRPC is just HTTP/2 underneath, the same mechanism covers gRPC method-level rules, as sketched below.
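
A gRPC method call is an HTTP/2 POST to /<package.Service>/<Method>, so the familiar rules: http block can pin down individual RPCs. A minimal sketch, assuming a hypothetical billing.PaymentService with a Charge method served on port 50051 (service, method, and port are illustrative, not part of the scenario above):

yaml
# billing-grpc-policy.yaml -- illustrative gRPC variant of the L7 policy above
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: billing-grpc-l7-policy
  namespace: billing
spec:
  endpointSelector:
    matchLabels:
      app: billing
      role: server
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "50051"      # assumed gRPC port
        protocol: TCP
      rules:
        http:
        # gRPC calls are HTTP/2 POSTs to /<package.Service>/<Method>
        - method: "POST"
          path: "/billing.PaymentService/Charge"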


    Pattern 2: Securing Egress with DNS-Aware Policies

    Microservices often need to communicate with external APIs (e.g., Stripe, GitHub, AWS services). Hardcoding IP addresses in network policies is brittle and unmaintainable, as these IPs can change frequently. Cilium's DNS-aware policies solve this problem elegantly.

    Scenario:

  • A ci-runner pod in the ci namespace needs to clone code from github.com and push artifacts to an S3 bucket at s3.us-west-2.amazonaws.com.
  • Goal: Allow egress traffic only to these specific FQDNs on port 443, denying all other external communication.

Let's define the runner pod and the corresponding CNP.

    yaml
    # ci-runner.yaml
    apiVersion: v1
    kind: Namespace
    metadata:
      name: ci
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: ci-runner
      namespace: ci
      labels:
        app: ci-runner
    spec:
      containers:
      - name: runner
        image: curlimages/curl
        command: ["sleep", "3600"]

    Now for the sophisticated egress policy. The key is the toFQDNs block.

    yaml
    # ci-egress-policy.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: ci-runner-egress-policy
      namespace: ci
    spec:
      endpointSelector:
        matchLabels:
          app: ci-runner
      egress:
      - toFQDNs:
        - matchName: "github.com"
        - matchName: "s3.us-west-2.amazonaws.com"
        toPorts:
        - ports:
          - port: "443"
            protocol: TCP
      # This rule is crucial! It allows DNS queries to kube-dns and routes them
      # through Cilium's DNS proxy, which is what makes the toFQDNs rules work.
      # Without it, the pod can't resolve those FQDNs in the first place.
      - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": kube-system
            "k8s:k8s-app": kube-dns
        toPorts:
        - ports:
          - port: "53"
            protocol: UDP
          rules:
            dns:
            - matchPattern: "*"
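
Exact matchName entries work for stable hostnames, but toFQDNs also accepts matchPattern for wildcard matching. A hedged variant of the first egress rule above, assuming you want to allow any regional S3 endpoint rather than a single region:

yaml
# Drop-in replacement for the toFQDNs rule above (fragment, not a full policy).
egress:
- toFQDNs:
  - matchName: "github.com"
  # Matches s3.us-west-2.amazonaws.com, s3.eu-central-1.amazonaws.com, etc.
  - matchPattern: "s3.*.amazonaws.com"
  toPorts:
  - ports:
    - port: "443"
      protocol: TCP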

    Implementation Details and Kernel-Level Magic

• Apply the resources:

    bash
    kubectl apply -f ci-runner.yaml
    kubectl apply -f ci-egress-policy.yaml

• Test from within the ci-runner pod:

    bash
    kubectl exec -it -n ci ci-runner -- /bin/sh

    # This should succeed because github.com is in the allowlist.
    curl -v --head https://github.com --connect-timeout 5

    # This should also succeed.
    curl -v --head https://s3.us-west-2.amazonaws.com --connect-timeout 5

    # This should be blocked and time out, because api.stripe.com is not allowed.
    curl -v --head https://api.stripe.com --connect-timeout 5

Cilium's implementation of this is a clever split between user space and the kernel:

  • DNS Proxying: Because the policy contains a DNS rule, the eBPF datapath redirects the pod's DNS traffic to Cilium's transparent DNS proxy, which runs inside the Cilium agent on the node.
  • Dynamic IP Mapping: When the ci-runner resolves github.com, the proxy inspects the response from kube-dns and extracts the returned IP addresses (e.g., 140.82.121.4).
  • eBPF Map Update: Cilium programs this FQDN-to-IP mapping into kernel-level eBPF policy maps, tracking the DNS response's TTL (Time To Live).
  • Egress Policy Enforcement: When the pod then opens a TCP connection to 140.82.121.4 on port 443, the eBPF program in the pod's datapath looks the destination up in those maps, finds that it corresponds to github.com, which the policy allows, and permits the connection.
  • TTL Expiration: When the TTL expires (subject to Cilium's configurable minimum TTL), the entry is evicted, forcing a fresh DNS lookup before the next connection. This keeps the policy in step with changing DNS records.

Only the DNS resolution itself passes through the user-space proxy; per-packet enforcement of the actual connections happens entirely in the kernel, which keeps the hot path fast and avoids hairpinning application traffic through a proxy just to control egress to dynamic cloud services.
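
You can inspect the learned mappings directly by querying the agent's FQDN cache. A hedged example (the in-pod CLI is cilium or cilium-dbg depending on the Cilium version, and label selectors may differ in your install):

bash
# Pick the Cilium agent running on the same node as the ci-runner pod,
# then list the FQDN-to-IP cache it has learned via the DNS proxy.
NODE=$(kubectl get pod -n ci ci-runner -o jsonpath='{.spec.nodeName}')
AGENT=$(kubectl get pod -n kube-system -l k8s-app=cilium \
  --field-selector spec.nodeName=$NODE -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n kube-system $AGENT -- cilium fqdn cache list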


    Pattern 3: Cluster-wide Policies and Non-Pod Entities

    Some security rules are not namespace-specific; they are global concerns. For example, you might want to prevent all pods (except those in a monitoring namespace) from scraping the kubelet's /metrics endpoint on the host node, or you might want to establish a baseline deny-all policy for the entire cluster.

    This is where CiliumClusterwideNetworkPolicy (CCNP) and the toEntities selector become indispensable.

    Scenario:

  • Implement a default-deny ingress posture for the entire cluster.
  • Only allow pods with the label role: ingress-controller in the networking namespace to receive traffic from outside the cluster.
  • Allow all intra-cluster communication.

The following pair of policies creates that baseline. Note how Cilium's implicit default-deny works: as soon as any policy selects an endpoint for a given direction, traffic in that direction that is not explicitly allowed is dropped. Selecting every pod with endpointSelector: {} therefore flips the whole cluster to default-deny ingress, and the rules then add back only what we want.

yaml
# cluster-baseline-policy.yaml
apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "cluster-baseline-deny-ingress"
spec:
  # Apply this policy to all pods in the cluster. Because a policy now
  # selects every endpoint, any ingress not explicitly allowed is denied.
  endpointSelector: {}
  ingress:
  - fromEntities:
    # The 'cluster' entity represents all endpoints within the cluster,
    # so this rule preserves all intra-cluster communication.
    - cluster
---
# A second cluster-wide policy re-opens external traffic, but only for
# ingress controllers and only on port 443.
apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "allow-world-to-ingress-controllers"
spec:
  endpointSelector:
    matchLabels:
      role: ingress-controller
      "k8s:io.kubernetes.pod.namespace": networking
  ingress:
  - fromEntities:
    # The 'world' entity represents any IP address outside the cluster.
    - world
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP

    Dissecting the `fromEntities` Selector

    fromEntities is a powerful construct that extends policy beyond pod labels:

  • cluster: Represents all endpoints managed by Cilium inside the cluster (pods, the hosts, and remote nodes). The rule fromEntities: [cluster] is a simple way to allow all intra-cluster communication.
  • world: Represents all IP addresses external to the cluster. This is used for controlling traffic from the internet or other external networks.
  • host: Represents the local node itself. This is critical for securing communication between pods and host-level services (like the kubelet or node-exporter).
  • remote-node: Represents the other nodes in the cluster.
  • kube-apiserver: Available in recent Cilium releases, this entity represents the Kubernetes API server, letting you scope exactly which workloads may reach it.

Edge Case: Securing Kubelet Access

    A common production requirement is to lock down access to the kubelet's read-only port (10255) and its secure port (10250). A malicious pod could otherwise access sensitive metrics and logs.

    Here's a CCNP to restrict access to the kubelet to only Prometheus pods in the monitoring namespace.

    yaml
    # secure-kubelet-access.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumClusterwideNetworkPolicy
    metadata:
      name: "secure-kubelet-access"
    spec:
      # This policy applies to the host entity itself, not pods.
      nodeSelector: {}
      ingress:
      - fromEndpoints:
        - matchLabels:
            app.kubernetes.io/name: prometheus
            "k8s:io.kubernetes.pod.namespace": monitoring
        toPorts:
        - ports:
          # Kubelet secure port
          - port: "10250"
            protocol: TCP
          # Kubelet read-only port (if enabled)
          - port: "10255"
            protocol: TCP

This policy is unique because it uses nodeSelector: {} to apply to the host network namespace on every node, which requires Cilium's host firewall feature to be enabled. It then specifies that only pods matching the fromEndpoints selector may initiate connections to the listed kubelet ports; all other endpoints attempting to reach them are dropped by the eBPF programs attached to the host's network interfaces. Be careful when introducing host policies: once a policy selects the host, unlisted ingress to the node is denied, so you must also allow essential traffic (for example, kube-apiserver-to-kubelet connections and node health checks) before enforcing something like this cluster-wide.
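
For reference, host policies only take effect when the host firewall is enabled. A hedged example of doing so on a Helm-managed install (values assume the upstream cilium/cilium chart; verify against your chart version and install method):

bash
# Enable the host firewall on an existing Helm-managed Cilium install.
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set hostFirewall.enabled=true

# Restart the agents so the new datapath configuration is picked up.
kubectl -n kube-system rollout restart daemonset/cilium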


    Performance Considerations & Troubleshooting with Hubble

A primary reason for choosing an eBPF-based CNI is performance. Traditional iptables-based implementations (kube-proxy for Services, and iptables-backed CNIs for NetworkPolicy) degrade as rule counts grow: iptables evaluates rules as a sequential chain, so matching cost is O(n), and a large number of services and policies adds measurable per-packet latency.

In contrast, eBPF uses hash-table lookups in its maps, giving roughly constant-time lookups regardless of the number of policies. This results in:

  • Lower Latency: Packet processing time is constant and minimal.
  • Higher Throughput: The kernel handles traffic more efficiently without traversing long chains.
  • Reduced CPU Usage: Less CPU is spent on packet filtering, especially on nodes with many active connections.

However, when policies don't behave as expected, debugging can be challenging. A curl that times out, or comes back with an unexpected 403, doesn't tell you which policy is responsible. This is where Hubble, Cilium's observability platform, becomes essential.

    Debugging a Policy Violation with Hubble

Let's revisit our first L7 policy example. When we tried to access the forbidden /admin/metrics endpoint, the request was rejected. Let's see what Hubble shows us.

    First, enable the Hubble UI or use the CLI. For the CLI:

    bash
    # Port-forward to the hubble-relay service
    kubectl port-forward -n kube-system svc/hubble-relay 4245:80
    
# In another terminal, run the curl command that is expected to be denied by the L7 policy
kubectl exec -it -n frontend frontend-app -- curl -X GET http://billing-api-svc.billing.svc.cluster.local/admin/metrics
    
    # Now, use the Hubble CLI to observe the traffic flow for the frontend namespace
    hubble observe --namespace frontend -f

The output is a real-time stream of network flows. When the forbidden curl is executed, you will see an entry resembling the following (exact formatting varies by Hubble version):

text
May 20 14:30:15.123: frontend/frontend-app:43912 (ID:12345) -> billing/billing-api:80 (ID:23456) http-request DROPPED (HTTP/1.1 GET http://billing-api-svc.billing.svc.cluster.local/admin/metrics)

    This output is incredibly valuable for a senior engineer debugging a production issue:

  • The DROPPED verdict confirms the request was denied by policy, not lost elsewhere.
  • The http-request flow type shows that Cilium understood the traffic at L7 and that the decision was made by the L7 proxy, immediately ruling out L3/L4 connectivity issues.
  • The summary (HTTP/1.1 GET .../admin/metrics) pinpoints the exact request that was blocked, and the source and destination identities tell you which endpoints were involved.

This level of detail dramatically reduces Mean Time to Resolution (MTTR) for network-related issues in a zero-trust environment. You can see immediately which flow is being denied and why, without having to parse complex iptables rules or sift through ambiguous logs. The same workflow scales to day-to-day triage, as shown below.
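
In practice you rarely want the full firehose. A couple of hedged examples of narrowing the stream with Hubble CLI filters (flag names assume a recent Hubble CLI):

bash
# Show only dropped flows destined for the billing namespace
hubble observe --verdict DROPPED --to-namespace billing -f

# Show only L7 flows originating from the frontend namespace,
# useful for auditing which API paths are actually being hit
hubble observe --type l7 --from-namespace frontend -f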

    Conclusion

    Implementing a true zero-trust security posture in Kubernetes requires moving beyond the limitations of the standard NetworkPolicy API. Cilium, through its advanced CRDs and the underlying power of eBPF, provides the necessary tools for senior engineers to build a highly secure, performant, and observable network fabric.

    By mastering patterns like L7-aware rules, DNS-based egress controls, and cluster-wide policies targeting non-pod entities, you can enforce the principle of least privilege at a granular level that was previously only achievable with a heavyweight service mesh. The ability to do this directly in the CNI layer, with the performance benefits of in-kernel processing and the deep observability of tools like Hubble, makes eBPF a cornerstone technology for modern cloud-native security architecture.
