eBPF Network Policies in Cilium for Zero-Trust Microservices
Beyond IP Addresses: The Shift to Identity-Based Security with eBPF
For years, Kubernetes NetworkPolicy has been the standard for network segmentation. While effective for basic isolation, its reliance on IP/port-based filtering, typically implemented with iptables, falls short in dynamic, microservice-heavy environments. iptables rules can become unwieldy in large clusters, leading to performance degradation, and they lack the context to understand application-layer (L7) protocols. A policy allowing traffic to a pod on port 8080 permits access to all API endpoints on that port, a significant gap in a zero-trust security model.
This is where Cilium changes the game. By leveraging eBPF (extended Berkeley Packet Filter), Cilium operates directly within the Linux kernel to provide a highly efficient and programmable datapath. It fundamentally shifts the security paradigm from network-centric (IP addresses) to identity-centric (service labels). Each pod is assigned a numeric security identity derived from its Kubernetes labels. All network policy decisions are then made based on these identities, completely bypassing the complexities and performance bottlenecks of iptables.
This article is not an introduction to Cilium. It assumes you understand the basics of Kubernetes networking and are looking to implement sophisticated, production-ready security postures. We will dissect advanced CiliumNetworkPolicy use cases, explore performance trade-offs, and detail operational patterns for safely rolling out a zero-trust model in a live environment.
The Core Primitive: `CiliumNetworkPolicy` and Security Identity
A standard Kubernetes NetworkPolicy selects pods and defines ingress/egress rules based on pod selectors or IP blocks. A CiliumNetworkPolicy (a Custom Resource Definition) extends this concept dramatically.
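Before applying any of the policies below, it is worth confirming that the Cilium policy CRDs are actually installed in the cluster; a quick check might look like this:
# Confirm the Cilium policy CRDs are present (shortnames: cnp, ccnp)
kubectl get crd ciliumnetworkpolicies.cilium.io ciliumclusterwidenetworkpolicies.cilium.io
# List any existing CiliumNetworkPolicies across namespaces
kubectl get cnp --all-namespaces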
Let's consider two services, frontend and api-server.
# api-server-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  labels:
    app: api-server
    tier: backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
        tier: backend
        # Cilium identity labels
        app.kubernetes.io/name: api-server
        app.kubernetes.io/part-of: my-app
    spec:
      containers:
      - name: api-server
        image: pseudo/api-server:1.0
        ports:
        - containerPort: 8080
Cilium assigns a numeric identity to this pod based on its labels (app.kubernetes.io/name=api-server, etc.). The identity travels with the pod's traffic: in tunneling mode it is encoded in the encapsulation header, and in native-routing mode it is resolved from the source IP via Cilium's ipcache. When a packet arrives at its destination, an eBPF program attached to the network interface checks the source identity against the allowed identities stored in an eBPF map. This lookup is a hash-map operation with effectively constant-time cost, regardless of the number of policies or pods in the cluster.
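These identities can be inspected directly. A sketch of how to do so (pod and DaemonSet names assume a standard kube-system installation; depending on the Cilium version, the in-pod CLI is cilium or cilium-dbg):
# The CiliumEndpoint object for each pod shows its numeric security identity
kubectl get ciliumendpoints -n default
# Cluster-wide identities and the label sets they were derived from
kubectl get ciliumidentities
# Or ask a Cilium agent directly
kubectl -n kube-system exec ds/cilium -- cilium identity list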
Here's a basic policy allowing frontend to talk to api-server:
# allow-frontend-to-api.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "allow-frontend-to-api-server"
  namespace: "default"
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: api-server # Policy applies to the api-server
  ingress:
  - fromEndpoints:
    - matchLabels:
        app.kubernetes.io/name: frontend # Allow traffic FROM the frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
This looks similar to a standard NetworkPolicy, but the underlying mechanism is profoundly different. It's not an IP rule; it's an identity rule. If the frontend pod is rescheduled and gets a new IP, this policy remains effective without any updates. This is the foundation upon which we build our advanced L7 policies.
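Once applied, the policy can be inspected like any other namespaced resource (cnp is the CRD shortname); for example:
kubectl apply -f allow-frontend-to-api.yaml
# cnp is the shortname for ciliumnetworkpolicies.cilium.io
kubectl get cnp -n default
kubectl describe cnp allow-frontend-to-api-server -n default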
Advanced L7 Policy Enforcement: API-Aware Security
True zero-trust requires securing communication at the application layer. It's not enough to know who can talk to whom; we must control what they can say. Cilium integrates with Envoy proxy to provide transparent L7 policy enforcement without requiring application code changes or manual sidecar injection.
Scenario 1: Securing a RESTful Payments API
Imagine a payments-api service with two critical endpoints:
* POST /v1/charge: used by the frontend service to process payments.
* POST /v1/refund: a sensitive operation, only accessible by a separate finance-tool service.
A traditional network policy would expose both endpoints to any service that can reach port 8080.
Objective: Allow frontend to only access POST /v1/charge and deny all other requests, while allowing finance-tool to access POST /v1/refund.
Here is the CiliumNetworkPolicy to achieve this granular control:
# payments-api-l7-policy.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "secure-payments-api"
  namespace: "production"
spec:
  endpointSelector:
    matchLabels:
      app: payments-api
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "POST"
          path: "/v1/charge"
  - fromEndpoints:
    - matchLabels:
        app: finance-tool
    toPorts:
    - ports:
      - port: "9092"
        protocol: TCP
      rules:
        http:
        - method: "POST"
          path: "/v1/refund"
Implementation Details:
Implementation details:
* endpointSelector: the policy applies to all pods with the label app: payments-api.
* ingress rules: the policy defines two separate ingress rules, one for each allowed source.
* fromEndpoints: selects the source pods by their identity labels.
* toPorts and rules.http: this is the core of L7 enforcement. When traffic from frontend hits port 8080 on payments-api, Cilium's eBPF program redirects it to an embedded Envoy proxy. Envoy inspects the HTTP request and allows it only if it matches method: "POST" and path: "/v1/charge". All other requests from frontend are rejected with an HTTP 403 Forbidden.
Verification in Production:
To test this, we can exec into a running frontend pod:
# Exec into the frontend pod
kubectl exec -it <frontend-pod-name> -- /bin/bash
# This request will succeed (HTTP 200 OK)
curl -X POST http://payments-api:8080/v1/charge -d '{"amount": 100}'
# This request will be blocked by the policy (HTTP 403 Forbidden)
curl -X POST http://payments-api:8080/v1/refund -d '{"charge_id": "ch_123"}'
# This GET request will also be blocked
curl http://payments-api:8080/healthz
This demonstrates a powerful, least-privilege security model enforced at the network layer, invisible to the application itself.
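To confirm that enforcement is actually active on the payments-api endpoint, rather than merely declared, you can ask the Cilium agent on the relevant node. A sketch, assuming the agent runs as the cilium DaemonSet in kube-system (the in-pod CLI may be cilium or cilium-dbg depending on version):
# Show endpoints managed by this agent, including per-direction policy enforcement status
kubectl -n kube-system exec ds/cilium -- cilium endpoint list
# Dump the full policy as the agent sees it
kubectl -n kube-system exec ds/cilium -- cilium policy get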
Performance Deep Dive: eBPF vs. iptables in High-Scale Clusters
One of the most compelling reasons to adopt Cilium is its performance advantage, especially at scale. The iptables-based datapath used by many Kubernetes CNIs suffers from linear scaling problems. As the number of services and policies grows, the number of iptables rules explodes. For each packet, the kernel must traverse these long chains, inducing latency.
eBPF Datapath Advantages:
* Constant Time Lookups: Policy decisions are based on eBPF map lookups, which are O(1). The time to process a packet does not increase with the number of policies.
* Bypassing Network Stack Layers: eBPF allows Cilium to short-circuit parts of the kernel's networking stack (like netfilter hooks where iptables resides), reducing per-packet overhead.
* Efficient Load Balancing: Cilium's built-in load balancer (kube-proxy replacement) also uses eBPF maps and is significantly more efficient than iptables-based kube-proxy in iptables mode.
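These datapath properties can be checked on a running cluster. A sketch, assuming the standard kube-system deployment; cilium status on an agent reports, among other things, whether kube-proxy replacement and BPF masquerading are active:
# Check datapath features on one agent
kubectl -n kube-system exec ds/cilium -- cilium status | grep -E "KubeProxyReplacement|Masquerading"
# The cilium-cli equivalent summarizes the whole cluster
cilium status --wait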
Benchmark Considerations:
While exact numbers depend on hardware and workload, published benchmarks by the Cilium community and third parties consistently show:
* Throughput: eBPF-based datapath can achieve near line-rate throughput, often outperforming iptables-based CNIs by 10-30% in high-connection scenarios.
* Latency: Per-packet latency is lower and, more importantly, more consistent. The tail latency (p99) is significantly better because there are no long rule chains to traverse.
* CPU Usage: For policy enforcement, eBPF is more CPU-efficient. The initial JIT (Just-In-Time) compilation of eBPF programs has a small CPU cost, but the runtime execution is far cheaper than traversing iptables chains for every packet.
In a cluster with thousands of pods and hundreds of fine-grained network policies, the difference is not academic. It can be the difference between a stable, predictable network and one plagued by intermittent latency spikes and CPU pressure on worker nodes.
Edge Case Handling: DNS-Aware Egress Policies
Microservices often need to communicate with external, third-party APIs. A common security challenge is restricting egress traffic to specific external endpoints. Hardcoding IP addresses in network policies is brittle, as these IPs can change frequently.
Objective: Allow a reporting-service to connect only to api.thirdparty.com on port 443, without allowing access to any other external address.
Cilium's toFQDNs selector solves this elegantly.
# egress-to-thirdparty-api.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "allow-egress-to-thirdparty"
  namespace: "production"
spec:
  endpointSelector:
    matchLabels:
      app: reporting-service
  egress:
  - toFQDNs:
    - matchName: "api.thirdparty.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
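One practical note: for Cilium's DNS proxy to observe the lookups that feed toFQDNs, the policy usually also needs an egress rule that allows DNS itself and enables inspection of those queries. A fuller sketch of the same policy, assuming CoreDNS runs with the usual k8s-app: kube-dns labels in kube-system:
# egress-to-thirdparty-api.yaml (with DNS visibility rule)
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "allow-egress-to-thirdparty"
  namespace: "production"
spec:
  endpointSelector:
    matchLabels:
      app: reporting-service
  egress:
  # Allow DNS lookups via the cluster DNS service and let Cilium inspect them
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: "*"
  # Allow HTTPS only to the resolved IPs of api.thirdparty.com
  - toFQDNs:
    - matchName: "api.thirdparty.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP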
How It Works Under the Hood:
1. When the reporting-service pod makes a DNS query for api.thirdparty.com, Cilium's DNS proxy, wired in via an eBPF program attached at the socket level, intercepts the request.
2. The proxy forwards the query and inspects the response, recording the resolved IP addresses (e.g., 104.18.40.188, 104.18.41.188).
3. Cilium populates an eBPF map associating the identity of reporting-service and the destination FQDN with those IPs, along with the DNS response's TTL.
4. When the pod then opens a connection to 104.18.40.188, the egress eBPF program performs a lookup in this map. Since the destination IP is present, the connection is allowed. An attempt to connect to any other external IP (e.g., 8.8.8.8) will fail the map lookup and be dropped.
Edge Case: DNS TTL and Stale Entries
What happens when the DNS record's TTL expires? Cilium's agent actively manages the lifecycle of these DNS-to-IP mappings. It respects the TTL from the DNS response. Once the TTL expires, the corresponding IP is removed from the eBPF map. The next time the application tries to connect, it will need to perform a new DNS lookup, which will repopulate the map with fresh IPs. This ensures the policy adapts to changes in external service IPs without manual intervention.
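The agent's view of these FQDN-to-IP mappings can be inspected directly, which is handy when debugging unexpected egress drops. A sketch (the in-pod CLI may be cilium or cilium-dbg depending on version; the pod name is a placeholder):
# Show the DNS-to-IP mappings Cilium has learned, including per-entry TTLs
kubectl -n kube-system exec ds/cilium -- cilium fqdn cache list
# Watch DNS lookups and the resulting verdicts with Hubble
hubble observe --protocol dns --pod production/reporting-service-xxxxx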
Observability and Auditing with Hubble
A security policy is only as good as your ability to observe and debug it. This is where Hubble, Cilium's built-in observability platform, becomes indispensable.
When a request is unexpectedly blocked, developers need to know why, instantly. Hubble provides deep visibility into network flows and policy decisions.
Let's revisit our L7 payments API scenario. A developer on the frontend team tries to add a health check and finds it's being blocked.
Using the Hubble CLI, a platform engineer can diagnose this in seconds:
# Install the Hubble CLI
# ...
# Port-forward to the Hubble relay service
kubectl port-forward -n kube-system svc/hubble-relay 4245:80
# Observe dropped traffic originating from the frontend pod
hubble observe --from-pod default/frontend-xxxxxxxx-xxxxx --verdict DROPPED --protocol http -o json
The output would look something like this:
{
  "flow": {
    "source": {
      "identity": 258,
      "namespace": "default",
      "labels": ["k8s:app=frontend"],
      "pod_name": "frontend-xxxxxxxx-xxxxx"
    },
    "destination": {
      "identity": 312,
      "namespace": "production",
      "labels": ["k8s:app=payments-api"],
      "pod_name": "payments-api-yyyyyy-yyyyy"
    },
    "Type": "L7",
    "L7": {
      "http": {
        "code": 403,
        "method": "GET",
        "url": "http://payments-api:8080/healthz"
      }
    },
    "verdict": "DROPPED",
    "drop_reason_desc": "POLICY_DENIED"
  }
}
This JSON blob provides immediate, actionable information:
* Source/Destination: Clearly identifies the pods involved.
* L7 Data: Shows the exact HTTP request that was blocked (GET /healthz).
* Verdict & Reason: Explicitly states the packet was DROPPED due to POLICY_DENIED.
With this information, the engineer knows precisely which policy needs to be updated to allow the health check, removing guesswork and accelerating troubleshooting.
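With that evidence in hand, one possible fix (assuming frontend should indeed be allowed to probe /healthz) is to extend the frontend rule in secure-payments-api with an additional HTTP match. An excerpt of the updated ingress rule:
  # Updated frontend ingress rule in secure-payments-api (excerpt)
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "POST"
          path: "/v1/charge"
        - method: "GET"
          path: "/healthz"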
Production Pattern: Staged Policy Rollout with Audit Mode
Applying a strict, deny-by-default policy in a complex, running production environment is fraught with risk. A single misconfigured rule could cause a major outage. To mitigate this, Cilium provides an audit mode for its policies.
Objective: Gradually introduce a zero-trust, default-deny posture for an entire namespace without breaking existing applications.
The Strategy:
1. Apply a Broad Policy in Audit Mode: Run the Cilium agents with policy audit mode enabled (the policy-audit-mode agent option, policyAuditMode in the Helm chart) so that policy verdicts are logged rather than enforced, then apply a broad default-deny policy. You can use a CiliumClusterwideNetworkPolicy for this.
# clusterwide-audit-policy.yaml
apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "cluster-wide-audit-deny"
spec:
  endpointSelector: {}
  # Empty rules put every endpoint into a default-deny posture for both
  # directions; with the agent in audit mode, the deny action is logged
  # (verdict AUDIT) rather than enforced.
  ingress:
  - {}
  egress:
  - {}
With audit mode on, this policy effectively does nothing except surface every connection that is not matched by a specific allow policy.
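Audit mode itself is an agent-level setting rather than a per-policy field. One way to enable it on a Helm-managed installation is sketched below; policyAuditMode is the chart value, policy-audit-mode the resulting agent option:
# Enable policy audit mode cluster-wide (Helm-managed install assumed)
helm upgrade cilium cilium/cilium -n kube-system --reuse-values --set policyAuditMode=true
# Restart the agents so they pick up the new setting
kubectl -n kube-system rollout restart ds/cilium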
2. Observe Audited Flows: Use Hubble to find the connections that would have been dropped once enforcement is enabled.
# Look for flows that would have been dropped
hubble observe --verdict AUDIT
3. Create Targeted allow Policies: Based on the audit logs, create specific, least-privilege CiliumNetworkPolicy resources for each application. For example, if you see audit events for frontend talking to payments-api, you create the L7 policy we discussed earlier.
4. Tighten the Scope: As you roll out allow policies, you can tighten the audit policy's endpointSelector to apply only to namespaces or applications that have not yet been fully policy-protected.
5. Enforce: Once all workloads have explicit allow policies, you can deploy a final, namespace-wide or cluster-wide default-deny policy (and turn audit mode back off).
# namespace-default-deny.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
name: "default-deny-ingress"
namespace: "production"
spec:
endpointSelector: {}
# An empty ingress block denies all ingress traffic
# to all pods in the namespace unless allowed by another policy.
ingress: []
This staged, observe-then-enforce pattern is a professional approach that transforms a high-risk flag day into a controlled, evidence-based process.
Beyond HTTP: Securing Kafka with L7 Policies
Cilium's L7 awareness is not limited to HTTP. It includes parsers for other protocols, such as Kafka, which is critical for securing event-driven architectures.
Scenario: A user-service produces messages to a user-signups Kafka topic. A fraud-detection-service consumes from this topic. We want to enforce this flow at the protocol level.
Objective:
* Allow user-service to only produce to the user-signups topic.
* Allow fraud-detection-service to only consume from the user-signups topic.
* Deny all other Kafka operations for these services.
# kafka-l7-policy.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "secure-kafka-broker"
  namespace: "data-platform"
spec:
  endpointSelector:
    matchLabels:
      app: kafka
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: user-service
    toPorts:
    - ports:
      - port: "9092"
        protocol: TCP
      rules:
        kafka:
        - role: produce
          topic: "user-signups"
  - fromEndpoints:
    - matchLabels:
        app: fraud-detection-service
    toPorts:
    - ports:
      - port: "9092"
        protocol: TCP
      rules:
        kafka:
        - role: consume
          topic: "user-signups"
This policy provides incredibly granular control. The user-service cannot accidentally (or maliciously) produce to a different topic, nor can it consume. This prevents a whole class of potential bugs and security vulnerabilities in an event-driven system, enforced transparently by the network layer.
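As with HTTP, Kafka-level policy verdicts are visible in Hubble, which makes it straightforward to verify the rules. A sketch with placeholder pod names:
# Watch Kafka-level verdicts for traffic arriving at the broker
hubble observe --protocol kafka --to-pod data-platform/kafka-0
# A produce request from user-service to a topic other than user-signups
# should show up here as DROPPED
hubble observe --protocol kafka --from-pod data-platform/user-service-xxxxx --verdict DROPPED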
Conclusion
Adopting Cilium with eBPF is more than a CNI swap; it's a strategic shift towards a more secure, performant, and observable cloud-native network. By moving from IP-based to identity-based security, we gain the ability to create policies that mirror our application architecture, not the underlying network topology.
For senior engineers, the true power lies in the advanced features we've explored: API-aware L7 enforcement, performant and scalable datapath, robust handling of external services via DNS-aware policies, and safe operational patterns like audit mode. These capabilities are not just incremental improvements; they are foundational building blocks for implementing a genuine zero-trust security posture within Kubernetes, enabling you to build and operate complex systems with a higher degree of confidence and control.