Advanced eBPF Policies in Cilium for Zero-Trust Kubernetes
The `iptables` Bottleneck: Why Cloud-Native Networking Needed a Revolution
For years, Kubernetes networking has been synonymous with iptables. As the default implementation for kube-proxy and the enforcement mechanism for most Container Network Interfaces (CNIs), iptables has been the bedrock of in-cluster traffic routing and security. However, for senior engineers managing clusters at scale, its limitations are painfully apparent. iptables uses sequential chains of rules; as the number of services and pods grows, these chains can become thousands of rules long. Every packet traversing a node must be checked against this chain, leading to significant CPU overhead and increased tail latency. Furthermore, its IP-and-port-based model is fundamentally misaligned with the ephemeral, dynamic nature of cloud-native workloads, where pod IPs are transient and meaningless.
Enter eBPF (extended Berkeley Packet Filter). eBPF allows us to run sandboxed programs directly within the Linux kernel, triggered by various events, including network packet arrival. Cilium leverages eBPF to create a highly efficient, programmable datapath that bypasses iptables and kube-proxy entirely. By attaching eBPF programs to network interfaces, Cilium can make intelligent routing and security decisions at the earliest possible point in the packet processing pipeline, directly in the kernel. This results in near-native network performance, regardless of cluster size.
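If you want to confirm the eBPF datapath is actually in play on a running cluster, a couple of quick checks are sketched below; they assume the standard Cilium DaemonSet in `kube-system` and `bpftool` available on the node, and the exact output varies by version.

```bash
# Check whether Cilium is replacing kube-proxy (field name varies slightly by version)
kubectl -n kube-system exec ds/cilium -- cilium status | grep -i kubeproxyreplacement

# On a node: list the eBPF programs attached to network interfaces (tc/XDP hooks)
bpftool net show
```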
But the true power of Cilium and eBPF lies not just in performance, but in its security model. It shifts the paradigm from IP-based filtering to an identity-based one, a prerequisite for any serious zero-trust implementation.
Beyond `NetworkPolicy`: Graduating to `CiliumNetworkPolicy`
The standard Kubernetes `NetworkPolicy` resource is a good first step, but it's insufficient for complex microservices architectures. Its major drawbacks include:
- No DNS awareness: to allow egress to an external dependency such as api.thirdparty.com, you are forced to use brittle IP CIDR blocks.
- No L7 context: you can allow traffic to port 8080, but you cannot specify that only GET /api/v1/data is allowed while POST /api/v1/admin is denied.

Cilium addresses these gaps with two powerful Custom Resource Definitions (CRDs): CiliumNetworkPolicy (CNP) for namespaced policies and CiliumClusterwideNetworkPolicy (CCNP) for global rules.
Production Example 1: Foundational Identity-Based Policy
Let's model a common scenario: a frontend service in the web namespace needs to communicate with a backend-api service in the api namespace. In a zero-trust model, this communication must be explicitly allowed.
First, let's define our workloads. Note the critical role of labels; they are the foundation of Cilium's security identity.
```yaml
# workloads.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: web
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
      tier: web
  template:
    metadata:
      labels:
        app: frontend
        tier: web
        # Cilium identity is derived from these labels
    spec:
      containers:
      - name: frontend-container
        image: nginx
---
apiVersion: v1
kind: Namespace
metadata:
  name: api
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-api
  namespace: api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend-api
      tier: api
  template:
    metadata:
      labels:
        app: backend-api
        tier: api
    spec:
      containers:
      - name: backend-container
        image: paurosello/python-flask-rest-api:1.0
        ports:
        - containerPort: 5000
```
Now, we'll create a CiliumNetworkPolicy to allow ingress to backend-api only from frontend pods.
```yaml
# backend-ingress-policy.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "backend-api-ingress-policy"
  namespace: api
spec:
  # This policy applies to all pods with these labels
  endpointSelector:
    matchLabels:
      app: backend-api
      tier: api
  # Ingress rules define allowed incoming traffic
  ingress:
  - fromEndpoints:
    # Allow traffic FROM frontend pods in the web namespace. Without the
    # namespace label, fromEndpoints only matches pods in the policy's own
    # namespace (api), so the cross-namespace source must be named explicitly.
    - matchLabels:
        app: frontend
        tier: web
        "k8s:io.kubernetes.pod.namespace": web
    toPorts:
    - ports:
      - port: "5000"
        protocol: TCP
```
How this works under the hood: When a pod is created, the Cilium agent on the node assigns it a numeric security identity derived from its set of security-relevant labels (e.g., app=frontend,tier=web might map to identity 12345); every pod with the same label set shares that identity. The IP-to-identity mapping is stored in an efficient eBPF map, and the CNP is translated into eBPF policy state. When a packet from a frontend pod arrives at the backend-api pod's node, the eBPF program on the ingress path looks up the source pod's security identity. If identity 12345 is in the allowed list for the destination pod, the packet is forwarded; if not, it's dropped. This check is extremely fast and completely independent of pod IPs.
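To make the identity model concrete, here is a quick way to inspect what Cilium has assigned to the workloads above. This is a sketch assuming the standard Cilium installation in `kube-system`; the CiliumEndpoint objects and the agent's `cilium identity list` output show the numeric identities and their label sets.

```bash
# Numeric identity per pod (CiliumEndpoint objects are created automatically)
kubectl get ciliumendpoints -n web
kubectl get ciliumendpoints -n api

# From a Cilium agent pod: list identities and the label sets they map to
kubectl -n kube-system exec ds/cilium -- cilium identity list
```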
Implementing Granular L7 Policies for API-Aware Security
Identity-based L3/L4 policies are a huge step up, but true zero-trust requires understanding application-layer context. Cilium achieves this by integrating an eBPF-aware proxy (typically Envoy) that can inspect L7 traffic when required.
Production Example 2: HTTP-Aware Ingress for a Multi-Endpoint Service
Consider our backend-api service. It exposes two endpoints: POST /orders for creating orders and GET /metrics for Prometheus scraping. The payments service should only be able to create orders, while the prometheus service in the monitoring namespace should only be able to scrape metrics.
```yaml
# advanced-backend-policy.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "advanced-backend-api-policy"
  namespace: api
spec:
  endpointSelector:
    matchLabels:
      app: backend-api
      tier: api
  ingress:
  # Rule 1: Allow payments service to create orders
  - fromEndpoints:
    - matchLabels:
        app: payments
        tier: business-logic
    toPorts:
    - ports:
      - port: "5000"
        protocol: TCP
      # This is where the L7 magic happens
      rules:
        http:
        - method: "POST"
          path: "/orders"
  # Rule 2: Allow Prometheus in the monitoring namespace to scrape metrics
  - fromEndpoints:
    - matchLabels:
        app: prometheus
        k8s-app: prometheus  # often a standard label
        # Cilium selects namespaces via this label rather than a namespaceSelector
        "k8s:io.kubernetes.pod.namespace": monitoring
    toPorts:
    - ports:
      - port: "5000"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/metrics"
```
Deep Dive: When this policy is applied, Cilium's eBPF programs on the node hosting the backend-api pod are updated. For traffic destined to port 5000, instead of simply forwarding it, the eBPF program redirects the packet to the Envoy proxy running on the same node. Envoy performs the L7 inspection (checking the HTTP method and path). If the request matches the policy, Envoy forwards it to the application pod; if not, it returns an HTTP 403 Forbidden. This redirection is transparent to both the source and destination pods, and the performance cost is localized to the traffic that requires L7 inspection, while all other traffic continues to be processed purely in eBPF.
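A rough client-side check is sketched below. It assumes a payments Deployment exists as described in the scenario (in the api namespace, which is also what Rule 1 implies, since it sets no namespace label), that a `backend-api` Service exposes the pods on port 5000, and that the client image ships `curl`; none of these are defined in the manifests above.

```bash
# Allowed: POST /orders from a payments pod (status code comes from the app itself)
kubectl -n api exec deploy/payments -- \
  curl -s -o /dev/null -w "%{http_code}\n" -X POST http://backend-api.api.svc:5000/orders

# Denied at L7: the proxy answers 403 before the request ever reaches the app
kubectl -n api exec deploy/payments -- \
  curl -s -o /dev/null -w "%{http_code}\n" http://backend-api.api.svc:5000/metrics
```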
Production Example 3: Kafka-Aware Policies
This L7 awareness extends beyond HTTP. For event-driven architectures, securing Kafka topics is critical. You don't want a compromised logging service to be able to produce messages on the payments-processed topic.
Scenario: An order-processor service should only be able to produce to the new-orders topic, while an inventory-service should only be able to consume from it.
```yaml
# kafka-policy.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "kafka-broker-policy"
  namespace: kafka
spec:
  endpointSelector:
    matchLabels:
      app: kafka-broker
  ingress:
  # Rule 1: Allow order-processor to produce to the new-orders topic
  - fromEndpoints:
    - matchLabels:
        app: order-processor
    toPorts:
    - ports:
      - port: "9092"
        protocol: TCP
      rules:
        kafka:
        - role: produce
          topic: "new-orders"
  # Rule 2: Allow inventory-service to consume from the new-orders topic
  - fromEndpoints:
    - matchLabels:
        app: inventory-service
    toPorts:
    - ports:
      - port: "9092"
        protocol: TCP
      rules:
        kafka:
        - role: consume
          topic: "new-orders"
```
This policy provides incredibly granular control, preventing lateral movement and data exfiltration within your message bus, a common blind spot in many security architectures.
Mastering Egress Control with FQDN Policies
One of the most challenging aspects of zero-trust networking is managing egress to external services. Hardcoding IP ranges for services like Stripe, S3, or Twilio is a maintenance nightmare, as these IPs can change frequently. Cilium solves this with DNS-aware policies.
Production Example 4: Securely Connecting to a Third-Party API
Imagine a billing-service that needs to call the Stripe API at api.stripe.com but should be blocked from accessing anything else on the internet.
```yaml
# billing-egress-policy.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "billing-service-egress"
  namespace: billing
spec:
  endpointSelector:
    matchLabels:
      app: billing-service
  egress:
  # Rule 1: Allow DNS lookups. This is CRITICAL for FQDN policies to work.
  - toEndpoints:
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": kube-system
        "k8s:k8s-app": kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: UDP
      rules:
        dns:
        - matchPattern: "*"
  # Rule 2: Allow traffic to the resolved IPs of api.stripe.com
  - toFQDNs:
    - matchName: "api.stripe.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
```
How FQDN Policies Work:
1. The billing-service pod attempts to connect to api.stripe.com, which triggers a DNS query to kube-dns.
2. The first egress rule explicitly allows this DNS query to succeed.
3. Cilium's DNS proxy transparently observes the response from kube-dns and sees that api.stripe.com resolved to, for example, 54.186.126.123.
4. That IP is added to an eBPF map associated with billing-service's security identity, along with a TTL matching the DNS record's TTL.
5. When billing-service then initiates a TCP connection to 54.186.126.123 on port 443, the eBPF program on the egress path finds the IP in the allowed map and forwards the packet.
6. Any attempt to connect to another external IP will be dropped by the eBPF program because there is no matching rule.
This mechanism is both secure and dynamic, adapting automatically to changes in external service IPs.
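A sketch of how you might verify this behaviour (it assumes a billing-service Deployment whose image includes `curl`; the `cilium fqdn cache list` command runs inside a Cilium agent pod):

```bash
# Allowed: any HTTP status code proves the connection to Stripe was permitted
kubectl -n billing exec deploy/billing-service -- \
  curl -s -o /dev/null -w "%{http_code}\n" https://api.stripe.com

# Blocked: other destinations should hang and time out, since no rule matches
kubectl -n billing exec deploy/billing-service -- \
  curl -s --max-time 5 https://www.google.com || echo "blocked as expected"

# Inspect the DNS-to-IP mappings the agent has learned from the DNS proxy
kubectl -n kube-system exec ds/cilium -- cilium fqdn cache list
```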
Production Patterns and Operational Excellence
Implementing these policies requires a mature operational strategy. Applying a default-deny policy on a brownfield cluster without preparation will cause a catastrophic outage.
The Path to a Default-Deny Stance
The ultimate goal of zero-trust is a cluster-wide default-deny policy, where all traffic is blocked unless explicitly allowed. This should be rolled out carefully.
Step 1: Deploy a Cluster-wide Default Deny in Audit Mode
Cilium can be put into policy audit mode by setting `policy-audit-mode: "true"` in the Cilium ConfigMap (the equivalent Helm value is `policyAuditMode`). In this mode, policies are still evaluated, but violations are only logged, not enforced. This is a critical safety net.
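One way to flip that switch, assuming the default `cilium-config` ConfigMap in `kube-system` created by the Helm chart; the agents need a restart to pick up the change:

```bash
# Enable audit mode cluster-wide, then restart the agents
kubectl -n kube-system patch configmap cilium-config \
  --type merge -p '{"data":{"policy-audit-mode":"true"}}'
kubectl -n kube-system rollout restart daemonset/cilium
```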
First, we create a policy that denies all traffic by default.
```yaml
# default-deny.yaml
apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "cluster-wide-default-deny"
spec:
  # Select all pods in the cluster
  endpointSelector: {}
  # An empty ingress list means deny all ingress
  ingress: []
  # An empty egress list means deny all egress
  egress: []
```
Step 2: Observe and Build Allow-list Policies with Hubble
With the default-deny policy in audit mode, use Hubble, Cilium's observability platform, to understand the actual traffic flows in your cluster.
```bash
# Install the Cilium CLI (used below to enable Hubble and reach its UI)
export CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
tar xzvf cilium-linux-${CLI_ARCH}.tar.gz
sudo mv cilium /usr/local/bin/

# Enable Hubble in your cluster (--ui also deploys the Hubble UI)
cilium hubble enable --ui

# Port-forward to the Hubble UI
cilium hubble ui
```
The Hubble UI provides a powerful service map that visualizes dependencies. You can see which connections would be dropped by your new default-deny policy. Use this information to iteratively build the specific CiliumNetworkPolicy rules needed for your applications to function, like the ones we created in the examples above.
Use the Hubble CLI to inspect flagged flows (the `hubble` binary is a separate install from the Cilium CLI above; it is also available inside every Cilium agent pod via `kubectl exec`):
```bash
# Show all traffic that is currently being dropped
hubble observe --verdict DROPPED -f

# While policy-audit-mode is enabled, violations appear with the AUDIT verdict
# instead of being dropped
hubble observe --verdict AUDIT -f
```
Step 3: Switch to Enforcement Mode
Once you have a comprehensive set of allow-list policies and Hubble shows no legitimate traffic being flagged in audit mode, you are ready to switch to full enforcement: set `policy-audit-mode` back to `"false"` in the Cilium ConfigMap and confirm the policy enforcement mode (`enable-policy`) is `default` or `always`.
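This is the reverse of the earlier audit-mode change, again assuming the default `cilium-config` ConfigMap; the `cilium endpoint list` output from an agent pod shows per-endpoint enforcement status:

```bash
# Turn audit mode off and restart the agents
kubectl -n kube-system patch configmap cilium-config \
  --type merge -p '{"data":{"policy-audit-mode":"false"}}'
kubectl -n kube-system rollout restart daemonset/cilium

# The POLICY (ingress/egress) ENFORCEMENT columns should now read "Enabled"
# for endpoints selected by your policies
kubectl -n kube-system exec ds/cilium -- cilium endpoint list
```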
Edge Case: Host Networking and Privileged Pods
Pods running with hostNetwork: true or certain privileged capabilities can bypass some network policy enforcement points. For these critical workloads (like the cilium-agent itself or a node exporter), you must create explicit policies that allow necessary traffic. It is crucial to restrict the use of host networking as much as possible.
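For host-level traffic, Cilium offers host policies: a CiliumClusterwideNetworkPolicy that uses `nodeSelector` instead of `endpointSelector`. The sketch below is illustrative only: it assumes the host firewall feature is enabled (Helm value `hostFirewall.enabled=true`), and port 9100 for node-exporter is an assumption. Be aware that applying a host policy puts the selected nodes into default-deny for that direction, so a real policy must allow every flow the node itself needs (kubelet, etcd, health checks, and so on).

```yaml
# host-node-exporter-policy.yaml -- illustrative sketch, not a drop-in policy
apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "allow-node-exporter-scrape"
spec:
  # nodeSelector targets the host endpoints (the nodes themselves)
  nodeSelector:
    matchLabels: {}   # all nodes; narrow this in a real cluster
  ingress:
  # Allow anything inside the cluster to reach the assumed node-exporter port
  - fromEntities:
    - cluster
    toPorts:
    - ports:
      - port: "9100"
        protocol: TCP
```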
Performance Considerations
- L7 (HTTP and Kafka) rules send matching traffic through the node-local proxy, which adds latency relative to the pure eBPF path; scope them to the endpoints and ports that genuinely need application-layer context.
- FQDN policies route DNS queries through Cilium's DNS proxy. Monitor the cilium-agent logs and metrics for any signs of DNS proxy overload. For high-throughput egress, consider using dedicated egress gateways.

Conclusion: eBPF as the Foundation for Modern Security
By moving beyond the IP-based constraints of iptables and embracing an identity-aware, programmable datapath with eBPF, Cilium provides the tools necessary to build a true zero-trust security posture within Kubernetes. The journey requires a shift in mindset from perimeter security to intrinsic, workload-centric security. It demands a deep understanding of application communication patterns and an operational commitment to observability and iterative policy refinement.
Mastering CiliumNetworkPolicy and its advanced features like L7 and FQDN awareness is no longer a niche skill; it is becoming a core competency for senior engineers responsible for the security, reliability, and performance of large-scale, cloud-native systems. The patterns discussed here—identity-based L4 rules, API-aware L7 filtering, and dynamic FQDN egress control—are not just theoretical concepts; they are the practical building blocks for a more secure, observable, and efficient cloud-native future.