Advanced eBPF Networking Policies in Cilium for Microservices

Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

Beyond iptables: The Necessity for a Programmable Kernel Datapath

As architects of distributed systems, we've moved past the question of whether we need network segmentation and are now concerned with how to implement it efficiently, expressively, and securely in a dynamic, high-churn Kubernetes environment. The native NetworkPolicy resource, while a crucial first step, operates primarily at L3/L4 and relies on pod selectors (labels). This model quickly reveals its limitations in sophisticated microservice architectures where identity is more nuanced than a key-value pair and security decisions must be made based on application-layer context.

The traditional implementation of NetworkPolicy via iptables introduces significant performance bottlenecks. Each rule adds to a linear chain, and the conntrack table can become a point of contention in high-connection-rate scenarios. This isn't just a performance issue; it's a scalability ceiling. When a single node hosts hundreds of pods, each with its own policy, the iptables rule set can become unmanageably large and slow, impacting packet latency directly.

Enter eBPF (extended Berkeley Packet Filter). By attaching sandboxed, event-driven programs directly to kernel hooks—such as the Traffic Control (TC) ingress/egress hook—we can create a highly efficient, programmable datapath. Cilium leverages this to bypass iptables and kube-proxy entirely for in-cluster traffic, performing policy enforcement, service load balancing, and observability directly in the kernel. This post assumes you understand this fundamental premise. We will not re-explain eBPF basics. Instead, we will dissect advanced, production-ready policy patterns that are only possible with this kernel-level programmability.
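
If you want to confirm that your cluster is actually running on this eBPF datapath rather than falling back to iptables, a quick check against the Cilium agent is usually enough. The exact output varies by Cilium version, so treat this as a sketch:

bash
# Check whether kube-proxy replacement is active on a node's agent
kubectl -n kube-system exec ds/cilium -- cilium status | grep -i kubeproxyreplacement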


Pattern 1: From Weak Labels to Strong Identity with ServiceAccounts

Label-based selectors are mutable and lack a strong attestation mechanism. A developer could inadvertently (or maliciously) apply a label to a pod, granting it network access it shouldn't have. A more robust identity primitive within Kubernetes is the ServiceAccount, which is tied to cryptographic tokens and RBAC roles. Cilium can use this stronger identity for policy enforcement.

Let's consider a payments-api service that must only accept ingress traffic from pods running as the checkout-processor service account, regardless of their labels or namespace.

The Implementation

First, ensure you have the ServiceAccount and the target payments-api deployment:

yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: checkout-processor
  namespace: production
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-deployment
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
    spec:
      serviceAccountName: checkout-processor
      containers:
      - name: checkout-container
        image: your-checkout-image
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
      - name: payments-api-container
        image: your-payments-image
        ports:
        - containerPort: 8080

Now, we define a CiliumNetworkPolicy to enforce the identity-based rule. Note that the fromEndpoints selector matches on Cilium's derived service-account label rather than on pod labels.

yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: payments-api-sa-ingress-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: payments-api
  ingress:
  - fromEndpoints:
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": production
        "k8s:io.kubernetes.serviceaccount.name": checkout-processor
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
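
Assuming Hubble is enabled, you can watch the policy take effect by observing flows to the payments-api pods; connections from workloads that do not carry the checkout-processor identity should surface as drops. The pod name filters below are prefix matches, not exact names:

bash
# Follow all flows destined to the payments-api pods
hubble observe --to-pod production/payments-api --follow

# Show only policy drops, e.g. from a pod running under a different service account
hubble observe --to-pod production/payments-api --verdict DROPPED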

How It Works Under the Hood

Cilium doesn't just read labels. The Cilium agent on each node watches the Kubernetes API for pods, service accounts, and other resources, and assigns a unique, internal Cilium Security Identity (a compact numeric identifier) to each distinct set of labels and identity markers. When a pod is created with the checkout-processor service account, Cilium derives the label k8s:io.cilium.k8s.policy.serviceaccount=checkout-processor for it, and that label becomes part of the pod's security identity.

The policy above is translated into entries in a per-endpoint eBPF policy map consulted by the datapath programs attached to the network interfaces of the payments-api pods. Cilium carries the sender's security identity along with each packet (encoded in the VXLAN/Geneve tunnel header in overlay mode, or recovered from the ipcache eBPF map in native-routing mode). The eBPF program simply checks whether the sender's identity, destination port, and protocol are present in the allowed set derived from the policy. This is an O(1) hash table lookup in an eBPF map, dramatically faster than traversing a linear iptables chain.
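
You can see that allowed-identity set directly. The sketch below assumes you are inside the Cilium agent pod on the node hosting a payments-api pod, and the endpoint ID is illustrative:

bash
# Find the endpoint ID of the local payments-api pod
cilium endpoint list | grep payments-api

# Dump the eBPF policy map for that endpoint; allowed peers appear as numeric security identities
cilium bpf policy get 1234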

You can inspect these identities using the Cilium CLI:

bash
# Find the security identity for a checkout pod
$ cilium endpoint list | grep checkout-processor

# Inspect the identity and its associated labels/service account
$ cilium identity get <identity-id>

This pattern anchors network policy to a Kubernetes-managed identity that is governed by RBAC and token issuance rather than to freely mutable labels, elevating your security posture from simple metadata matching toward a verifiable workload identity model.


Pattern 2: API-Aware Filtering with L7 Policies

Microservice security requires context beyond IP addresses and ports. We often need to enforce rules like "Service A can read product data, but only Service B can write it." This requires inspecting application-layer data, such as HTTP methods and paths.

Cilium achieves this through a combination of eBPF for initial packet filtering and an embedded proxy (like Envoy) for deep L7 inspection when required. The key is that the proxy is only engaged for traffic explicitly targeted by an L7 rule, minimizing performance overhead.

The Scenario

Imagine a product-catalog service with the following endpoints:

  • GET /products/{id}: Publicly readable by a frontend service.
  • POST /products: Writable only by an inventory-manager service.
  • DELETE /products/{id}: Restricted to the inventory-manager service.

The Implementation

Here is the CiliumNetworkPolicy that enforces this logic:

yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: product-catalog-api-policy
  namespace: services
spec:
  endpointSelector:
    matchLabels:
      app: product-catalog
  ingress:
  # Rule for frontend service (read-only access)
  - fromEndpoints:
    - matchLabels:
        app: frontend
        "k8s:io.kubernetes.pod.namespace": ui
    toPorts:
    - ports:
      - port: "9000"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/products/.*"

  # Rule for inventory-manager service (write access)
  - fromEndpoints:
    - matchLabels:
        app: inventory-manager
        "k8s:io.kubernetes.pod.namespace": backend
    toPorts:
    - ports:
      - port: "9000"
        protocol: TCP
      rules:
        http:
        - method: "POST"
          path: "/products"
        - method: "DELETE"
          path: "/products/.*"

Performance and Implementation Details

When this policy is applied, the Cilium agent does the following:

  • eBPF Pre-filtering: The eBPF program at the TC hook still performs the initial L3/L4 and identity-based checks. If a packet arrives on port 9000 from a pod with the frontend identity, the eBPF program knows it's a candidate for L7 inspection.
  • Redirection to Proxy: Instead of delivering the packet to the product-catalog pod's socket, the eBPF program transparently redirects it to the node-local Envoy proxy managed by Cilium. Neither the sending nor the receiving application is aware of the redirection.
  • L7 Inspection: Envoy parses the HTTP request, checks the method (GET) and path (/products/123) against its configured rules (derived from the CNP), and then makes a decision.
  • Forward or Deny: If the request is compliant, Envoy forwards it to the application's socket. If not, it rejects the request and returns an HTTP 403 Forbidden response to the caller.

Critical Performance Consideration: L7 policy enforcement is inherently more CPU-intensive than L3/L4 checks. The cost is paid at the proxy level. However, Cilium's architecture mitigates this by only invoking the proxy for traffic that matches a toPorts rule with an L7 specification. All other traffic is handled purely in the eBPF fast path. This allows you to surgically apply expensive L7 rules only where absolutely necessary, without penalizing all traffic on the node.

You can monitor proxy-related metrics to understand the performance impact:

bash
cilium metrics list | grep proxy

This will show metrics for proxy latency, number of redirected connections, and policy enforcement decisions at L7.
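
Hubble can also surface the individual L7 verdicts, which is often more useful than aggregate metrics when you are chasing a specific denied request. The filters below are one reasonable way to scope the output, assuming Hubble is enabled:

bash
# Show HTTP-level flows handled by the proxy for the product-catalog pods
hubble observe -t l7 --protocol http --to-pod services/product-catalog

# Narrow the view to requests the proxy denied
hubble observe -t l7 --protocol http --to-pod services/product-catalog --verdict DROPPED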


Pattern 3: Taming External Egress with DNS-Aware Policies

One of the most challenging production scenarios is controlling egress traffic to external, non-Kubernetes services. Allowing wide-open egress is a security risk, but hardcoding IP addresses for external APIs (e.g., Stripe, Twilio, S3) is brittle, as these IPs can change frequently behind a DNS name.

Cilium solves this with DNS-aware policies. The Cilium agent runs a transparent DNS proxy: lookups covered by a dns rule are observed as they are answered, and the agent extracts the returned IP addresses and uses them to dynamically update the allowed egress IPs in the eBPF maps.

The Scenario

An auditing-service needs to upload logs to an S3 bucket, audit-logs.s3.us-east-1.amazonaws.com. It also needs to access the EC2 metadata service to retrieve IAM credentials. We want to lock down its egress to only these two destinations.

The Implementation

yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: auditing-service-s3-egress
  namespace: security
spec:
  endpointSelector:
    matchLabels:
      app: auditing-service
  egress:
  # Allow DNS lookups to kube-dns/coredns
  - toEndpoints:
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": kube-system
        "k8s:k8s-app": kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: UDP
      rules:
        dns:
        - matchPattern: "*"

  # Allow egress to the resolved IPs of the S3 bucket FQDN
  - toFQDNs:
    - matchName: "audit-logs.s3.us-east-1.amazonaws.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP

  # Allow egress to the EC2 metadata service IP
  - toCIDR:
    - 169.254.169.254/32
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP
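
A quick way to validate the egress rules is from inside an auditing-service pod. The pod name is a placeholder; the first request should succeed, while the second should stall or be refused because its FQDN is not covered by the policy:

bash
# Allowed: DNS resolution plus HTTPS to the whitelisted S3 endpoint
kubectl -n security exec -it auditing-service-pod -- \
  curl -sv --max-time 5 -o /dev/null https://audit-logs.s3.us-east-1.amazonaws.com

# Blocked: any other external destination
kubectl -n security exec -it auditing-service-pod -- \
  curl -sv --max-time 5 -o /dev/null https://example.com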

Edge Cases and Kernel Mechanics

This pattern is powerful but introduces complexities you must manage:

  • DNS Caching and TTL: When the auditing-service pod performs a DNS lookup for the S3 FQDN, the Cilium agent intercepts or observes the response. It extracts the IP addresses from the A/AAAA records and programs them into an eBPF map associated with the pod's endpoint, along with the DNS TTL.
  • Race Conditions: What if DNS later resolves to a new IP while the pod is still using an old one? Cilium honors a configurable minimum TTL for learned IPs and retains IPs that back active connections for a grace period, so established flows are not broken the moment a record changes.
  • Wildcards and matchPattern: For services that use many subdomains (e.g., *.google.com), matchPattern is more efficient than listing every FQDN. The agent will then allow egress to any IP it has seen in a DNS response for a matching domain.

Debugging this can be complex. The cilium fqdn cache list command is indispensable:

bash
# From within a cilium agent pod
$ cilium fqdn cache list

# Example output:
Endpoint   FQDNs                                   Source   IPs              TTL
2345       audit-logs.s3.us-east-1.amazonaws.com   DNS      52.216.140.78    5s

This command shows you exactly which IPs the agent has learned for a given FQDN and their remaining TTL. If you're experiencing egress connectivity issues, this cache is the first place to look. It will reveal if DNS resolution is failing or if the IPs your application is trying to reach are not the ones Cilium has cached.


Pattern 4: Global and Multi-Cluster Policies

In large-scale environments, especially those spanning multiple Kubernetes clusters, enforcing consistent security policies is a significant challenge. CiliumNetworkPolicy is namespace-scoped, but what if you need to enforce a rule across all namespaces or even all connected clusters?

Cilium provides two mechanisms for this: the CiliumClusterwideNetworkPolicy (CCNP) CRD and Cilium Cluster Mesh.

The Scenario

We want to enforce a global policy that no workload, except those in the monitoring namespace, can directly access the Kubernetes API server's IP address. Furthermore, in a multi-cluster setup, we want to allow a user-db service in cluster-1 to be accessed only by the api-server in cluster-2.

Cluster-Wide Implementation (Single Cluster)

This CCNP blocks egress to the API server's CIDR for every pod that is not in the monitoring namespace. Because Cilium deny rules always take precedence over allow rules, the exception is expressed in the policy's endpointSelector rather than as a competing allow rule.

yaml
apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: deny-direct-kube-api-access
spec:
  # Apply to every pod in the cluster except those in the monitoring namespace
  endpointSelector:
    matchExpressions:
    - key: "k8s:io.kubernetes.pod.namespace"
      operator: NotIn
      values:
      - monitoring
  egressDeny:
  # Deny traffic to the API server CIDR
  - toCIDRSet:
    - cidr: 10.0.0.1/32  # Replace with your actual Kube API server IP/CIDR

This demonstrates the power of cluster-wide deny rules with selector-scoped exceptions, a common pattern for establishing a baseline security posture.
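
To confirm the cluster-wide policy has been accepted and realized by the agents, you can list the CCNP objects and dump the policy repository from any Cilium agent pod; the deny rule should appear there:

bash
# List clusterwide policies known to the API server
kubectl get ciliumclusterwidenetworkpolicies

# Dump the policy repository from an agent to confirm the deny rule was imported
kubectl -n kube-system exec ds/cilium -- cilium policy get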

Multi-Cluster Implementation (with Cluster Mesh)

Cilium Cluster Mesh creates a flat network across multiple clusters, enabling direct pod-to-pod communication and, crucially, shared service discovery and policy enforcement. Services can be exposed as global services, and policies can reference endpoints in remote clusters.
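
As a sketch of the service-discovery side: a Service is shared across the mesh by annotating it as global in each cluster, after which Cilium load-balances across backends from all clusters. The annotation key shown here is the one used by recent Cilium releases; check the Cluster Mesh documentation for your version:

bash
# Hypothetical example: mark user-db as a global service in cluster-1
kubectl --context cluster-1 -n database annotate service user-db service.cilium.io/global="true"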

To allow api-server in cluster-2 to access user-db in cluster-1:

yaml
# In cluster-1, where user-db resides
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: allow-remote-api-server
  namespace: database
spec:
  endpointSelector:
    matchLabels:
      app: user-db
  ingress:
  - fromEndpoints:
    # Select endpoints from a different cluster
    - matchLabels:
        app: api-server
        "k8s:io.kubernetes.pod.namespace": backend
        # This special label selects the remote cluster
        "k8s:io.cilium.k8s.policy.cluster": cluster-2
    toPorts:
    - ports:
      - port: "5432"
        protocol: TCP

Under the hood, Cilium Cluster Mesh runs a clustermesh-apiserver (backed by etcd) in each cluster; the Cilium agents in the other clusters connect to it over mutually authenticated TLS to synchronize identities, endpoints, and global service information. When a policy references a remote cluster, the agent on the source node already knows how to reach the destination pod (via the overlay tunnel or native routing between the clusters' pod CIDRs), and the sender's security identity is carried across the cluster boundary so the receiving node can enforce policy on it.
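
The cilium CLI can verify that the mesh itself is healthy before you start debugging policy, showing whether the clusters are connected and whether identities and services are being synchronized. This assumes the CLI is configured with a kubecontext for each cluster:

bash
# Run once per cluster context
cilium clustermesh status --context cluster-1
cilium clustermesh status --context cluster-2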

Conclusion: Policy as Code in a Programmable Datapath

Moving from iptables-based NetworkPolicy to Cilium's eBPF-powered CRDs is not merely an implementation swap. It's a paradigm shift. It elevates network control from simple L3/L4 filtering to a rich, context-aware security layer that understands Kubernetes-native identities, application-layer protocols, and even external service DNS names.

As senior engineers, our role is to build systems that are not only functional but also scalable, performant, and secure by design. The patterns discussed here (strong identity via ServiceAccounts, surgical L7 filtering, dynamic DNS-aware egress, and cluster-wide enforcement) are the building blocks for achieving a true zero-trust networking model within Kubernetes. Mastering these advanced policy constructs allows us to replace brittle, IP-based rules with declarative, identity-driven policies that automatically adapt to the dynamic nature of a microservices platform. The future of cloud-native networking is being written in the kernel, and eBPF is the pen.
