Advanced eBPF Networking Policies with Cilium for Multi-Cluster K8s


The Inadequacy of IP-Based Security in Dynamic Environments

As senior engineers managing large-scale Kubernetes deployments, we've all felt the friction of traditional networking models. The iptables-based CNI plugins that dominate the ecosystem, while functional, were built on paradigms that predate the ephemeral, dynamic nature of containers. In a multi-cluster, multi-tenant environment, relying on IP address CIDR blocks for security policies is not just cumbersome; it's fundamentally broken. An IP address is a transient locator, not a durable workload identity. When a pod is rescheduled, its IP changes, and iptables-based rules, which can number in the tens of thousands, must be painstakingly updated across every node, leading to performance degradation and race conditions.

This is where the paradigm shift to eBPF (extended Berkeley Packet Filter) and Cilium becomes a strategic imperative, not just a technical curiosity. eBPF allows us to run sandboxed programs directly within the Linux kernel, enabling us to implement networking, security, and observability logic with the performance of compiled C code and the safety of a verification engine. Cilium harnesses this power to create a networking data plane that is not only faster but also fundamentally more intelligent and secure.

This article assumes you understand what Kubernetes, CNI, and eBPF are. We will not cover the basics. Instead, we will dive directly into a complex, production-relevant scenario: enforcing granular, API-aware security policies between microservices running in separate Kubernetes clusters, connected via Cilium's Cluster Mesh.

Core Architecture: Identity-Based Security with Cilium Cluster Mesh

Before we write a single line of YAML, we must internalize the core concept that makes Cilium powerful: security identity. Instead of filtering traffic based on source/destination IP addresses, Cilium assigns a unique numerical identity to each pod based on its Kubernetes labels. This identity is embedded within the network packets themselves (e.g., using VXLAN or Geneve encapsulation) or used in kernel-level maps for policy decisions.

This decouples security policy from network topology. It doesn't matter what IP a pod has or which node or cluster it's running on. A pod with labels app=frontend,env=prod has the same security identity everywhere. Policies are defined in terms of these stable labels, not ephemeral IPs.
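If you want to see these identities concretely, they surface both as Kubernetes resources and inside the agent itself. A quick sketch (the in-pod CLI is named cilium-dbg in newer agent images, cilium in older ones):

bash
# List the numeric security identities and the label sets they were derived from
kubectl get ciliumidentities.cilium.io

# Ask a Cilium agent for its view of the identity cache
kubectl -n kube-system exec ds/cilium -- cilium identity list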

When we extend this to a multi-cluster environment, Cilium's Cluster Mesh synchronizes these identities and services across all connected clusters. A control plane, clustermesh-apiserver, runs on each cluster and uses etcd to maintain a consistent view of identities and services across the mesh. This allows a pod in cluster-a to discover and securely communicate with a service in cluster-b as if it were local, with policies applied seamlessly across the boundary.

Let's model our scenario:

* cluster-a (us-west-2): Hosts the billing-service which exposes a critical API for processing payments.

* cluster-b (eu-central-1): Hosts the frontend-service that needs to call the billing-service.

Our goal is to allow frontend-service to call only the POST /v1/charge endpoint on billing-service and nothing else, while blocking all other cross-cluster communication by default.

Setting Up the Multi-Cluster Mesh

First, we establish the mesh. We assume you have two Kubernetes clusters with Cilium installed. The key is to enable the cluster mesh feature and configure them to connect.

Using the cilium-cli, the process is straightforward:

bash
# On cluster-a
cilium clustermesh enable --context cluster-a

# On cluster-b
cilium clustermesh enable --context cluster-b

# Connect the clusters
cilium clustermesh connect --context cluster-a --destination-context cluster-b

This command handles the complexity of exchanging certificates and configuring the control planes to communicate. Once connected, you can verify the status:

bash
# Run on either cluster
cilium clustermesh status --wait

✅ All clusters connected!

Cluster Connections:
- cluster-b: 2/2 configured, 2/2 connected

Under the hood, this creates the necessary deployments (clustermesh-apiserver) and secrets. The Cilium agents on each node will now watch for services and identities from remote clusters.
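You can inspect these pieces directly; exact object names can differ by install method, but something along these lines shows what the CLI created:

bash
# The Cluster Mesh control plane runs as an ordinary Deployment in kube-system
kubectl --context cluster-a -n kube-system get deployment clustermesh-apiserver

# The connect step materializes remote-cluster endpoints and certificates as Secrets
kubectl --context cluster-a -n kube-system get secrets | grep -i clustermesh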

Global Services: The Key to Cross-Cluster Discovery

For a service in one cluster to be accessible from another, it must be marked as a global service. This is done with a simple annotation.

Let's define our billing-service in cluster-a:

yaml
# cluster-a/billing-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: billing-service
  namespace: payments
  annotations:
    # This annotation makes the service discoverable across the mesh
    io.cilium/global-service: "true"
spec:
  type: ClusterIP
  ports:
    - port: 8080
      protocol: TCP
  selector:
    app: billing-service
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing-service
  namespace: payments
  labels:
    app: billing-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: billing-service
  template:
    metadata:
      labels:
        app: billing-service
        # Critical labels for identity
        team: payments
        env: prod
    spec:
      containers:
        - name: billing-service
          image: my-org/billing-service:1.2.0
          ports:
            - containerPort: 8080

Once the manifest is applied in cluster-a, the billing-service endpoints are announced to the mesh. One detail that trips people up: Cluster Mesh does not copy Service objects between clusters. For pods in cluster-b to address the service by name, an identically named Service, carrying the same io.cilium/global-service annotation, must also exist in the payments namespace of cluster-b; it can have zero local backends, in which case all traffic is sent to the remote backends.

With that in place, any pod in cluster-b can resolve billing-service.payments.svc.cluster.local via DNS, and Cilium will automatically load-balance requests to the healthy billing-service pods in cluster-a.
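As a rough sketch of the consuming side (names mirror the manifest above; the probe pod and its flags are purely illustrative), you would create the Service, without the Deployment, in cluster-b and then confirm that it answers from the remote backends:

bash
# cluster-b needs an identically named global Service in the payments namespace;
# with no local Deployment, all of its backends come from cluster-a
kubectl --context cluster-b apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: billing-service
  namespace: payments
  annotations:
    io.cilium/global-service: "true"
spec:
  type: ClusterIP
  ports:
    - port: 8080
      protocol: TCP
  selector:
    app: billing-service
EOF

# Quick connectivity probe from a throwaway pod in cluster-b
# (whatever the billing app serves on "/" is fine; the point is that the path works)
kubectl --context cluster-b -n payments run probe --rm -it --restart=Never \
  --image=curlimages/curl -- curl -sv http://billing-service.payments:8080/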

Implementing a Cluster-Wide Default Deny Policy

Before creating specific allow rules, a robust security posture starts with a default deny policy. We'll create a CiliumClusterwideNetworkPolicy that denies all traffic between pods unless explicitly allowed. This is a crucial step often missed in simpler setups.

yaml
# common/default-deny-all.yaml
apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "default-deny-all-pods"
spec:
  description: "Deny all pod-to-pod traffic cluster-wide by default"
  # Select all pods in all namespaces
  endpointSelector: {}
  # Deny all ingress and egress traffic
  ingress: []
  egress: []

Applying this policy will immediately sever all pod-to-pod communication, including DNS lookups to kube-dns, so expect everything to break until the allow rules land. This is our secure baseline.
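In practice you will almost always pair the default deny with a narrow carve-out so pods can still resolve service names. A minimal sketch, assuming CoreDNS runs in kube-system with the usual k8s-app: kube-dns label:

bash
# Allow all pods egress to kube-dns on port 53 only, and let Cilium parse DNS for visibility
kubectl apply -f - <<'EOF'
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-dns-egress
spec:
  description: "Allow DNS lookups to kube-dns from all pods"
  endpointSelector: {}
  egress:
    - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": kube-system
            "k8s:k8s-app": kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
EOF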

Crafting the Advanced L7 Cross-Cluster Policy

Now for the core of our task: allowing the specific API call. We will use a CiliumClusterwideNetworkPolicy (CCNP) because it can select endpoints and apply rules across the entire mesh, regardless of namespace or cluster boundaries.

Here is the policy that grants our required access:

yaml
# common/allow-frontend-to-billing.yaml
apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "allow-frontend-to-billing-charge-api"
spec:
  description: "Allow frontend in cluster-b to call the POST /v1/charge endpoint on billing-service in cluster-a"
  
  # Step 1: Select the destination pod (the billing-service)
  endpointSelector:
    matchLabels:
      app: billing-service
      team: payments
      env: prod

  # Step 2: Define the ingress rule
  ingress:
    - fromEndpoints:
        # Step 3: Select the source pod (the frontend-service)
        - matchLabels:
            app: frontend-service
            team: web
            env: prod
      # Step 4: Define the L7 API-aware rule
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "POST"
                path: "/v1/charge"

Let's break down this policy's precision:

* endpointSelector: This targets the destination pods. We select all pods with the labels app: billing-service, team: payments, and env: prod. This rule will only apply to traffic coming into these pods.

* fromEndpoints: This specifies the allowed source. We are only allowing traffic from pods with labels app: frontend-service, team: web, and env: prod.

* Cross-Cluster Identity: Notice there's no mention of cluster-a or cluster-b. Cilium's identity mechanism makes the pod's location irrelevant. A frontend-service pod in cluster-b is granted the same access as one in cluster-a (if it existed).

* toPorts and rules: This is the L7 magic. We are not just opening port 8080. We are telling Cilium to inspect the HTTP traffic on that port. The rule is highly specific: only allow requests where the method is POST and the path is exactly /v1/charge. Any other request, like a GET /v1/transactions or even a POST /v1/refund, is rejected: Cilium redirects the flow to its embedded HTTP proxy for inspection, and non-matching requests never reach the application.

After applying this policy, the frontend-service can successfully make the intended API call, but all other access attempts will fail.
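One operational detail worth calling out: Cluster Mesh synchronizes identities and services, but it does not synchronize policy objects, so this CCNP (and the default deny) must be applied to every cluster in the mesh. And because our default deny also covers egress, cluster-b additionally needs an egress rule letting the frontend reach the billing identity. A rough sketch of the rollout (the egress companion policy below is illustrative, not something defined earlier):

bash
# Policies are per-cluster objects: apply the same manifests everywhere
for ctx in cluster-a cluster-b; do
  kubectl --context "$ctx" apply -f common/default-deny-all.yaml
  kubectl --context "$ctx" apply -f common/allow-frontend-to-billing.yaml
done

# The default deny also blocks egress, so the frontend side needs a matching allow
kubectl --context cluster-b apply -f - <<'EOF'
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-frontend-egress-to-billing
spec:
  endpointSelector:
    matchLabels:
      app: frontend-service
      team: web
      env: prod
  egress:
    - toEndpoints:
        - matchLabels:
            app: billing-service
            team: payments
            env: prod
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
EOF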

Debugging and Observability with Hubble

How do we verify this? Traditional tools like tcpdump are insufficient because they can't easily interpret eBPF's actions or show policy verdicts. This is where Hubble, Cilium's observability component, is indispensable.

Let's simulate two API calls from a frontend-service pod in cluster-b and observe the traffic using the Hubble CLI.

bash
# In frontend-service pod (cluster-b)
# This call should succeed
curl -X POST http://billing-service.payments/v1/charge -d '{"amount": 100}'

# This call should be rejected (the L7 proxy answers it with 403 Access Denied)
curl -X GET http://billing-service.payments/v1/transactions

Now, let's use Hubble to see what happened. We'll filter for traffic destined for the billing-service.

bash
# Run on a node in cluster-a
hubble observe --to-pod payments/billing-service-xyz -f

# --- Successful Request --- #
TIMESTAMP          SOURCE                                     DESTINATION                                TYPE          VERDICT   SUMMARY
Apr 26 14:20:10.112  cluster-b/web/frontend-service-abc (10.0.2.55) -> payments/billing-service-xyz (10.0.1.10)   http-request  FORWARDED POST http://billing-service.payments/v1/charge
Apr 26 14:20:10.115  payments/billing-service-xyz (10.0.1.10) -> cluster-b/web/frontend-service-abc (10.0.2.55)   http-response FORWARDED 200

# --- Denied Request --- #
TIMESTAMP          SOURCE                                     DESTINATION                                TYPE          VERDICT   SUMMARY
Apr 26 14:20:15.234  cluster-b/web/frontend-service-abc (10.0.2.55) -> payments/billing-service-xyz (10.0.1.10)   l7            DROPPED   Policy denied (L7)

Hubble's output is explicit. The first request was FORWARDED because it matched the L7 policy. The second request was DROPPED with the reason Policy denied (L7). This level of visibility is critical for debugging complex microservice interactions in a zero-trust environment.
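Beyond tailing a single pod, Hubble's filters are handy for auditing policy behaviour more broadly; a couple of illustrative variants (flag names per recent Hubble CLI releases):

bash
# Only flows that were dropped on their way into the payments namespace
hubble observe --verdict DROPPED --to-namespace payments

# Only HTTP-parsed (L7) flows, including the method and path Cilium saw
hubble observe --protocol http --to-namespace payments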

Edge Cases and Performance Considerations

1. Data Path: Tunneling vs. Native Routing

Cilium Cluster Mesh can operate in two primary modes for cross-cluster traffic:

* Tunneling (Default): Uses VXLAN or Geneve to encapsulate traffic between nodes. This is easy to set up as it works on any underlying network.

* Native Routing: Requires direct network reachability between pods across clusters. This is often achieved with BGP or by peering VPCs in a cloud environment. It offers lower overhead and potentially higher throughput as it avoids the encapsulation tax.

Production Pattern: For performance-critical applications, native routing is superior. However, it introduces significant operational complexity in managing the underlying network. Start with tunneling for simplicity and migrate to native routing only if you can measure a significant performance bottleneck. For most API-driven services, the small latency overhead of encapsulation is acceptable.
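If you are unsure which data path a cluster is actually running, the agent configuration spells it out; the relevant key changed from tunnel to routing-mode in newer releases, so check for either:

bash
# Inspect the rendered agent configuration for the data path mode
kubectl -n kube-system get configmap cilium-config -o yaml | grep -E 'routing-mode|tunnel'

# Or the cilium-cli view of the same settings
cilium config view | grep -E 'routing-mode|tunnel'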

2. Control Plane Failure

What happens if the connection between the clustermesh-apiserver instances is lost? This is a critical edge case. Cilium is designed for resilience.

* Existing Connections: Established connections will continue to function based on the policies and identities that were known before the split.

* New Pods: If a new frontend-service pod starts in cluster-b during the outage, it will receive a local identity. However, cluster-a will not know about this new identity. Traffic from this new pod to billing-service will be dropped because its identity is unknown to the destination Cilium agent.

This is a fail-closed behavior, which is desirable for security. Once connectivity is restored, identities are re-synchronized, and traffic will flow correctly.
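You can watch this behaviour from both vantage points during a drill; the agent's status output includes a ClusterMesh line (via cilium-dbg status on newer images), and the CLI summarizes per-cluster connectivity:

bash
# Per-node agent view: how many remote clusters this agent currently considers ready
kubectl -n kube-system exec ds/cilium -- cilium status | grep -i clustermesh

# Mesh-wide summary
cilium clustermesh status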

3. Mutual Authentication and Transparent Encryption

Our policy controls access, but the data in transit is still plaintext. Cilium can provide mTLS-style guarantees without requiring a sidecar service mesh like Istio or Linkerd: its mutual authentication feature verifies peer workload identities using SPIFFE/SPIRE-issued certificates before policy admits a connection, and it pairs with Cilium's transparent encryption (WireGuard or IPsec) to encrypt traffic in the kernel, with zero application changes.

Enabling mutual authentication is a configuration change in the Cilium ConfigMap (key names vary by version):

yaml
# In cilium-config ConfigMap
mesh-auth-enabled: true
mesh-auth-spiffe-enabled: true # If using SPIRE
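In recent versions the documented route is to enable mutual authentication through Helm values, optionally letting the chart install SPIRE for you. A sketch, assuming a Helm-managed Cilium 1.14+ install:

bash
# Enable mutual authentication with a bundled SPIRE deployment
helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
  --set authentication.mutual.spire.enabled=true \
  --set authentication.mutual.spire.install.enabled=true

# Roll the agents so they pick up the new mesh-auth settings
kubectl -n kube-system rollout restart daemonset/cilium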

A policy can then be set to require authentication:

yaml
# ... inside the CCNP spec.ingress rule ...
- fromEndpoints:
    # ...
  authentication:
    mode: "required"

With this, Cilium only admits traffic between frontend-service and billing-service after both workload identities have been mutually authenticated; combined with transparent encryption, that gives end-to-end protection across cluster boundaries with minimal performance overhead and zero application code changes.

Conclusion: The Future of Cloud-Native Networking

We have moved far beyond simple IP-based ACLs. By leveraging eBPF, Cilium provides a powerful, identity-aware framework for securing modern, distributed applications. We have demonstrated how to build a multi-cluster mesh, enforce granular L7 policies that understand our application's API, and debug the system with purpose-built observability tools. We've also touched on advanced production considerations like data path selection, mutual authentication, and transparent encryption.

This approach solves the fundamental impedance mismatch between static network constructs and dynamic, ephemeral workloads. It allows security policy to be expressed in terms of application logic, not network topology. For senior engineers responsible for the stability, security, and performance of large-scale Kubernetes platforms, mastering these eBPF-based patterns is no longer optional; it is the definitive path forward in cloud-native networking.
