eBPF for Zero-Trust Networking in Multi-Cluster Kubernetes

Goh Ling Yong

The Multi-Cluster Security Bottleneck: Beyond iptables and Sidecars

In modern platform engineering, a single Kubernetes cluster is often a single point of failure and a scalability boundary. Multi-cluster, multi-region architectures are the norm for achieving high availability and geo-locality. However, securing traffic between these clusters introduces significant complexity. The standard Kubernetes NetworkPolicy API has no awareness of workloads beyond its own cluster, and legacy enforcement via iptables suffers from performance degradation as rule sets grow. While service meshes like Istio offer multi-cluster capabilities, they do so at the cost of injecting sidecar proxies into every pod, adding resource overhead and latency.

This is where eBPF (extended Berkeley Packet Filter) represents a paradigm shift. By running sandboxed programs directly in the Linux kernel, eBPF-powered CNIs like Cilium can implement networking, observability, and security logic with near-native performance. This article provides a deep, implementation-focused guide on leveraging Cilium's Cluster Mesh to build a high-performance, identity-aware zero-trust network fabric across multiple Kubernetes clusters.

We will not cover the basics of eBPF or Cilium. Instead, we will focus on the advanced architectural patterns and specific configurations required to solve production-grade, multi-cluster security challenges. Our goal is to move from a default-allow, IP-based security posture to a default-deny, identity-based model that spans cluster boundaries seamlessly.

Architectural Premise: The eBPF Datapath Advantage

Before diving into implementation, it's critical to understand why eBPF is superior for this use case. A traditional CNI using iptables processes packets against a linear chain of rules in kernel space. As policies become more complex, this chain grows, and every packet must traverse it, increasing CPU usage and latency. A service mesh sidecar moves policy enforcement to user space, intercepting all traffic with a proxy; this adds extra traversals of the local network stack for every connection, plus significant per-pod memory and CPU overhead.

Cilium's eBPF datapath bypasses this entirely. eBPF programs are attached to network interfaces (at the Traffic Control (TC) or eXpress Data Path (XDP) hooks). When a packet arrives, the eBPF program executes directly in the kernel, making an immediate policy decision based on metadata stored in eBPF maps. For service routing, it replaces kube-proxy's iptables or IPVS rules with a highly efficient eBPF map lookup, routing traffic directly to backend pods.
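
You can see both mechanisms on a running node. As a quick sanity check (a sketch assuming the standard Cilium DaemonSet named cilium in kube-system), the agent's CLI dumps the eBPF maps that replace kube-proxy and the per-endpoint state that drives policy decisions:

bash
# The service-to-backend translation kube-proxy would do with iptables/IPVS
kubectl -n kube-system exec ds/cilium -- cilium bpf lb list

# Endpoints, their security identities, and policy enforcement status
kubectl -n kube-system exec ds/cilium -- cilium endpoint list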

Key Advantages for Multi-Cluster Zero-Trust:

  • Performance: Kernel-level enforcement without iptables rule traversal or user-space proxying minimizes latency and CPU overhead.
  • IP-Independent Identity: Policies are based on workload identities derived from Kubernetes labels and represented as CiliumIdentity objects (optionally attested cryptographically via Cilium's SPIFFE/SPIRE-based mutual authentication), not on ephemeral pod IPs. This is crucial in a dynamic, multi-cluster environment where IP addresses are meaningless across network boundaries.
  • Deep Visibility: Since eBPF operates at the kernel level, it can observe network flows (and, with additional programs, system-level events), providing unparalleled visibility through tools like Hubble without any application instrumentation.

Lab Environment Setup: A Realistic Multi-Cluster Scenario

    We will simulate a two-cluster setup, representing, for example, a us-west-1 and an eu-central-1 region. For reproducibility, we'll use kind (Kubernetes in Docker), but the principles and Cilium configurations apply directly to managed services like GKE, EKS, or AKS.

    Prerequisites:

    * kubectl

    * helm v3

    * kind

    * cilium-cli

    Step 1: Create Two kind Clusters

    We need to disable the default CNI and provide an explicit API server address that is accessible from the other cluster's nodes (in a real cloud environment, this would be a public endpoint or a VPC-peered private address).

    yaml
    # cluster1-config.yaml
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    networking:
      disableDefaultCNI: true
      # Cluster Mesh requires non-overlapping pod and service CIDRs
      # across clusters, so we pin them explicitly.
      podSubnet: "10.244.0.0/16"
      serviceSubnet: "10.96.0.0/16"
      # IMPORTANT: Use the actual IP of your host machine for the API server
      # so clusters can communicate. Find with `ip addr` or `ifconfig`.
      apiServerAddress: "192.168.1.100" # <-- CHANGE THIS
      apiServerPort: 6443
    nodes:
    - role: control-plane
    - role: worker
    - role: worker
    
    ---
    # cluster2-config.yaml
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    networking:
      disableDefaultCNI: true
      podSubnet: "10.245.0.0/16"
      serviceSubnet: "10.97.0.0/16"
      apiServerAddress: "192.168.1.100" # <-- CHANGE THIS
      apiServerPort: 6444
    nodes:
    - role: control-plane
    - role: worker
    - role: worker

    Now, create the clusters:

    bash
    kind create cluster --name cluster1 --config cluster1-config.yaml
    kind create cluster --name cluster2 --config cluster2-config.yaml
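
    At this point the nodes will report NotReady, which is expected: we disabled the default CNI and haven't installed one yet. A quick check:

    bash
    # Nodes stay NotReady until Cilium is installed
    kubectl --context kind-cluster1 get nodes
    kubectl --context kind-cluster2 get nodes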

    Step 2: Install Cilium with Cluster Mesh Enabled

    We will install Cilium via Helm, specifying unique cluster IDs and names, and enabling the Cluster Mesh feature.

    bash
    # Switch context to cluster1
    kubectl config use-context kind-cluster1
    
    helm repo add cilium https://helm.cilium.io/
    
    # Install Cilium on Cluster 1
    helm install cilium cilium/cilium --version 1.15.5 \
       --namespace kube-system \
       --set cluster.name=cluster1 \
       --set cluster.id=1 \
       --set ipam.mode=kubernetes \
       --set kubeProxyReplacement=true \
       --set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
       --set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
       --set cgroup.autoMount.enabled=false \
       --set cgroup.hostRoot=/sys/fs/cgroup \
       --set k8sServiceHost=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}' | sed -e 's#https://##' -e 's#:.*##') \
       --set k8sServicePort=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}' | sed 's#.*:##')
    
    # Switch context to cluster2
    kubectl config use-context kind-cluster2
    
    # Install Cilium on Cluster 2
    helm install cilium cilium/cilium --version 1.15.5 \
       --namespace kube-system \
       --set cluster.name=cluster2 \
       --set cluster.id=2 \
       --set ipam.mode=kubernetes \
       --set kubeProxyReplacement=true \
       --set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
       --set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
       --set cgroup.autoMount.enabled=false \
       --set cgroup.hostRoot=/sys/fs/cgroup \
       --set k8sServiceHost=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}' | sed -e 's#https://##' -e 's#:.*##') \
       --set k8sServicePort=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}' | sed 's#.*:##')
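
    Before proceeding, confirm both installations are healthy; the CLI can block until all components are ready:

    bash
    cilium status --context kind-cluster1 --wait
    cilium status --context kind-cluster2 --wait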

    Step 3: Enable and Connect the Cluster Mesh

    Now we enable the mesh on both clusters and then connect them.

    bash
    # Enable the cluster mesh control plane on both clusters. kind provides
    # no cloud load balancer, so expose the clustermesh-apiserver via NodePort.
    cilium clustermesh enable --context kind-cluster1 --service-type NodePort
    cilium clustermesh enable --context kind-cluster2 --service-type NodePort
    
    # Connect the two clusters
    cilium clustermesh connect --context kind-cluster1 --destination-context kind-cluster2
    
    # Verify the connection status
    cilium clustermesh status --context kind-cluster1 --wait
    # On success, the output reports that all nodes are connected to all
    # clusters and that global services are synchronized.

    Our two-cluster fabric is now online. The Cilium agents in each cluster are aware of each other, but by default, no traffic is allowed between them. We have a foundation for our zero-trust posture.
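
    Optionally, cilium-cli ships a built-in connectivity test that can validate the mesh end to end before any policy is layered on top:

    bash
    # Runs a suite of cross-cluster checks; takes several minutes
    cilium connectivity test --context kind-cluster1 --multi-cluster kind-cluster2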

    Implementing a Global Zero-Trust Policy

    Our strategy is to establish a default-deny posture at the cluster boundary and then explicitly allow required communication paths using identity-based policies.

    Step 1: Deploy Workloads

    Let's deploy a backend service in cluster1 and a frontend service in cluster2. We'll also deploy a netshoot pod in cluster1 for diagnostics; the frontend in cluster2 runs the netshoot image itself, so it doubles as a diagnostic client.

    bash
    # Apply to cluster1
    kubectl config use-context kind-cluster1
    kubectl create ns backend-ns
    kubectl apply -n backend-ns -f - <<EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: backend-api
      labels:
        app: backend-api
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: backend-api
      template:
        metadata:
          labels:
            app: backend-api
        spec:
          containers:
          - name: backend
            image: gcr.io/google-samples/hello-app:1.0
            ports:
            - containerPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: backend-svc
      labels:
        app: backend-api
    spec:
      selector:
        app: backend-api
      ports:
      - port: 80
        targetPort: 8080
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: netshoot-c1
      labels:
        app: netshoot
    spec:
      containers:
      - name: netshoot
        image: nicolaka/netshoot
        command: ["sleep", "3600"]
    EOF
    
    # Apply to cluster2
    kubectl config use-context kind-cluster2
    kubectl create ns frontend-ns
    kubectl apply -n frontend-ns -f - <<EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: frontend-app
      labels:
        app: frontend-app
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: frontend-app
      template:
        metadata:
          labels:
            app: frontend-app
        spec:
          containers:
          - name: frontend
            image: nicolaka/netshoot
            command: ["sleep", "3600"]
    EOF

    Step 2: Establish a Default-Deny Cross-Cluster Posture

    We use a CiliumClusterwideNetworkPolicy (CCNP) to define rules that apply across the entire cluster. We will create a policy that denies all ingress from remote clusters by default.

    yaml
    # default-deny-cross-cluster.yaml (version for cluster1)
    apiVersion: "cilium.io/v2"
    kind: CiliumClusterwideNetworkPolicy
    metadata:
      name: "default-deny-cross-cluster"
    spec:
      description: "Deny all ingress traffic from remote clusters by default"
      endpointSelector: {}
      ingress:
      - fromEndpoints:
        - matchLabels:
            # Allow only sources whose identity originates in this cluster.
            # Selecting every endpoint puts it into default-deny for ingress;
            # since no rule matches remote clusters, cross-cluster traffic
            # is dropped. For cluster2, set this value to "cluster2".
            io.cilium.k8s.policy.cluster: cluster1

    Apply this to cluster1 (the destination cluster for our test traffic); in cluster2 you would apply the mirror policy with its own cluster name:

    bash
    kubectl config use-context kind-cluster1
    kubectl apply -f default-deny-cross-cluster.yaml

    Step 3: Verify the Deny Policy

    At this point, the frontend-app in cluster2 should not be able to reach the backend-svc in cluster1, even though Cluster Mesh is connected. First, we need to make the service in cluster1 discoverable globally.

    bash
    # In cluster1
    kubectl config use-context kind-cluster1
    kubectl annotate service -n backend-ns backend-svc service.cilium.io/global="true"

    This annotation tells Cilium to share the service's backends with every other cluster in the mesh. Note, however, that a global service must be declared in each cluster where it will be consumed: Cilium merges the backends of services that share the same name and namespace and carry the annotation.
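
    Since cluster2 has no backend-ns namespace yet, its pods cannot even resolve backend-svc.backend-ns.svc.cluster.local. The sketch below creates a "shadow" service in cluster2, mirroring the definition in cluster1; its selector matches no local pods, so all resolved backends live in cluster1:

    bash
    kubectl config use-context kind-cluster2
    kubectl create ns backend-ns
    kubectl apply -n backend-ns -f - <<EOF
    apiVersion: v1
    kind: Service
    metadata:
      name: backend-svc
      annotations:
        service.cilium.io/global: "true"
    spec:
      selector:
        app: backend-api   # matches no pods in cluster2; backends come from cluster1
      ports:
      - port: 80
        targetPort: 8080
    EOF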

    Let's try to connect from the frontend-app pod in cluster2:

    bash
    kubectl config use-context kind-cluster2
    FRONTEND_POD=$(kubectl get pods -n frontend-ns -l app=frontend-app -o jsonpath='{.items[0].metadata.name}')
    kubectl exec -n frontend-ns $FRONTEND_POD -- curl -m 3 backend-svc.backend-ns.svc.cluster.local
    
    # Expected Output:
    # curl: (28) Connection timed out after 3000 milliseconds

    The connection times out as expected. Our zero-trust baseline is working.
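
    To see the verdicts as they happen, you can stream drop events from a Cilium agent in cluster1 while re-running the curl (a quick diagnostic sketch assuming the standard cilium DaemonSet; you may need to target the agent pod on the node hosting the backend):

    bash
    kubectl --context kind-cluster1 -n kube-system exec ds/cilium -- cilium monitor --type drop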

    Step 4: Create a Specific Allow Policy Based on Identity

    Now, we'll create a precise policy that allows ingress to the backend-api pods only from pods with the label app: frontend-app located in the frontend-ns namespace of cluster2.

    This is where the power of identity-based security becomes clear. We are not dealing with IP ranges or CIDRs. The policy is tied to Kubernetes metadata, which Cilium translates into a secure, non-IP identity.

    yaml
    # allow-frontend-to-backend.yaml
    apiVersion: "cilium.io/v2"
    # CiliumNetworkPolicy is the namespaced variant; its cluster-wide
    # counterpart (CiliumClusterwideNetworkPolicy) cannot carry a namespace.
    kind: CiliumNetworkPolicy
    metadata:
      name: "allow-frontend-to-backend"
      namespace: backend-ns
    spec:
      description: "Allow frontend in cluster2 to access backend-api in cluster1"
      endpointSelector:
        matchLabels:
          app: backend-api
      ingress:
      - fromEndpoints:
        - matchLabels:
            # This is the identity selector for the source pod
            'k8s:app': frontend-app
            'k8s:io.kubernetes.pod.namespace': frontend-ns
            # This is the crucial cluster selector
            'io.cilium.k8s.policy.cluster': cluster2
        toPorts:
        - ports:
          - port: "8080"
            protocol: TCP

    Let's break down the fromEndpoints selector:

    * 'k8s:app': frontend-app: Selects pods with this label.

    * 'k8s:io.kubernetes.pod.namespace': frontend-ns: Narrows the selection to a specific namespace.

    * 'io.cilium.k8s.policy.cluster': cluster2: This is the key to multi-cluster policy. It requires that the source identity originate in cluster2. You can verify these labels with the commands shown after this list.
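
    To inspect the label sets behind these identities (identity numbers will differ in your environment), list them directly:

    bash
    # Security identities and their label sets, as allocated by Cilium
    kubectl --context kind-cluster1 get ciliumidentities
    kubectl --context kind-cluster1 -n kube-system exec ds/cilium -- cilium identity list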

    Apply this policy to cluster1 where the backend-api workload resides:

    bash
    kubectl config use-context kind-cluster1
    kubectl apply -f allow-frontend-to-backend.yaml

    Step 5: Re-verify Connectivity

    Let's run the same curl command from the frontend-app pod in cluster2:

    bash
    kubectl config use-context kind-cluster2
    FRONTEND_POD=$(kubectl get pods -n frontend-ns -l app=frontend-app -o jsonpath='{.items[0].metadata.name}')
    kubectl exec -n frontend-ns $FRONTEND_POD -- curl -m 3 backend-svc.backend-ns.svc.cluster.local
    
    # Expected Output:
    # Hello, world!
    # Version: 1.0.0
    # Hostname: backend-api-xxxxxxxx-xxxxx

    Success. We have implemented a fine-grained, identity-aware, cross-cluster security policy.

    Advanced Edge Case: Securing Host and NodePort Traffic

    Pod-to-pod policies are the primary use case, but production environments have other traffic patterns. A common challenge is securing access to NodePort services or traffic originating from the host network itself (e.g., a monitoring agent running directly on the node).

    Cilium can extend its policy model to nodes by treating them as endpoints. Let's create a scenario where a Prometheus instance in cluster1 needs to scrape a NodePort metrics endpoint in cluster2.

    Scenario Setup:

  • Expose the backend-svc in cluster1 via a NodePort.
  • Create a policy in cluster1 that allows ingress to that NodePort only from nodes in cluster2.

    First, change the service type in cluster1:

    bash
    kubectl config use-context kind-cluster1
    kubectl patch svc -n backend-ns backend-svc -p '{"spec": {"type": "NodePort"}}'
    
    # Get the assigned NodePort
    NODE_PORT=$(kubectl get -n backend-ns -o jsonpath="{.spec.ports[0].nodePort}" services backend-svc)
    echo "Backend NodePort is: $NODE_PORT"

    Now, let's craft a policy that targets host endpoints. Two caveats: host policies require Cilium's host firewall feature (enabled at install time with --set hostFirewall.enabled=true), and they must be expressed as a CiliumClusterwideNetworkPolicy, since nodes are not namespaced. We use a nodeSelector to apply the policy to the nodes themselves, not the pods.

    yaml
    # allow-cluster2-nodes-to-nodeport.yaml
    apiVersion: cilium.io/v2
    kind: CiliumClusterwideNetworkPolicy
    metadata:
      name: allow-cluster2-nodes-to-nodeport
    spec:
      # Apply this policy to all nodes in cluster1
      nodeSelector:
        matchLabels: {}
      ingress:
        # Allow ingress from any node in cluster2
      - fromCIDRSet:
        - cidr: <CIDR_OF_CLUSTER2_NODES> # Replace with actual node CIDR
        toPorts:
        - ports:
          - port: "<NODE_PORT>" # Replace with actual NodePort value
            protocol: TCP

    This CIDR-based approach is a necessary evil for host-level traffic, as nodes themselves don't have the same rich identity labels as pods. In a real cloud environment, you would populate fromCIDRSet with the VPC/subnet CIDRs of your cluster2 nodes. This is less ideal than identity-based policy but provides a critical layer of defense for non-pod traffic that is still superior to leaving NodePorts open to the world.
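
    For this kind-based lab, the cluster2 node addresses sit on the shared Docker network. A sketch for discovering candidate values (adjust to your environment; in a cloud, use your VPC subnet CIDRs instead):

    bash
    # InternalIP of every cluster2 node
    kubectl --context kind-cluster2 get nodes \
      -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
    
    # kind attaches all nodes to the "kind" Docker network; its subnet is a
    # workable (if broad) CIDR for lab purposes
    docker network inspect kind -f '{{range .IPAM.Config}}{{.Subnet}}{{"\n"}}{{end}}'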

    To test this, you would shell into a node in cluster2 and curl a node IP from cluster1 on the assigned NodePort.

    bash
    # From a shell on a cluster2 node, e.g. `docker exec -it cluster2-worker bash`
    curl <CLUSTER1_NODE_IP>:<NODE_PORT>

    This demonstrates a pragmatic approach to handling legacy or non-pod traffic patterns within an eBPF-powered zero-trust framework.

    Performance and Observability Implications

    We chose eBPF for a reason: performance. Let's quantify it.

    * Latency: In typical benchmarks comparing Cilium's eBPF datapath to an Istio service mesh, the P99 latency for pod-to-pod requests can be 20-30% lower with Cilium. This is because the eBPF policy decision happens entirely inside the kernel's packet-processing path, whereas Istio requires every connection to traverse the network stack up to the user-space Envoy proxy and back down.

    * CPU/Memory: Bypassing kube-proxy with Cilium's eBPF-based service load balancing drastically reduces CPU usage on nodes, especially in services with a high number of backends. It avoids the linear scaling problem of iptables and the constant resource consumption of sidecar proxies.

    * Scalability: Hash-based eBPF map lookups are O(1), meaning policy enforcement and service routing performance do not degrade as the number of services or policies increases. This is a fundamental advantage over iptables chains, which are O(n). A simple way to sanity-check these claims in your own environment is sketched below.
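
    These figures vary by workload and hardware, so measure for yourself. A crude client-side latency probe using the tooling already deployed (not a rigorous benchmark, but enough to spot gross proxy overhead):

    bash
    kubectl config use-context kind-cluster2
    FRONTEND_POD=$(kubectl get pods -n frontend-ns -l app=frontend-app -o jsonpath='{.items[0].metadata.name}')
    # 20 sequential requests, printing total time per request in seconds
    kubectl exec -n frontend-ns $FRONTEND_POD -- sh -c \
      'for i in $(seq 1 20); do curl -s -o /dev/null -w "%{time_total}\n" backend-svc.backend-ns.svc.cluster.local; done'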

    Visualizing Policy with Hubble

    One of the most powerful features enabled by eBPF is deep, transparent observability. Hubble, Cilium's observability component, can tap into the eBPF datastream to provide real-time service maps and traffic flow diagnostics.

    Let's enable the Hubble UI and inspect our cross-cluster flow.

    bash
    # Enable Hubble (with the UI) on both clusters
    cilium hubble enable --ui --context kind-cluster1
    cilium hubble enable --ui --context kind-cluster2
    
    # Port-forward and open the UI for cluster1
    cilium hubble ui --context kind-cluster1

    Now, generate some traffic again from cluster2 to cluster1:

    bash
    kubectl exec -n frontend-ns $FRONTEND_POD -- curl -m 3 backend-svc.backend-ns.svc.cluster.local

    In the Hubble UI (usually at localhost:12000), you will see a service map that visually connects the frontend-app in cluster2 to the backend-api in cluster1. You can click on the traffic flow and see the exact Layer 4 and Layer 7 details, including the policy verdict (Forwarded or Dropped) and the specific Cilium policy rule that was applied. This level of introspection, without any code changes or sidecars, is invaluable for debugging complex network policies in a multi-cluster environment.
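
    The same flow data is available from the hubble CLI, if installed (run `cilium hubble port-forward --context kind-cluster1` in another terminal first so the CLI can reach Hubble Relay):

    bash
    # Flows into the backend namespace, including policy verdicts
    hubble observe --to-namespace backend-ns --last 20
    
    # Only dropped flows: useful when debugging an unexpected deny
    hubble observe --to-namespace backend-ns --verdict DROPPED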

    Conclusion: eBPF as the Foundation for Future-Proof Cloud Native Security

    We have successfully built and verified a zero-trust network spanning two Kubernetes clusters, enforced at the kernel level with eBPF. By moving beyond cluster-scoped, IP-based policies, we have created a security posture that is more performant, scalable, and resilient to the dynamic nature of containerized workloads.

    Key Takeaways for Senior Engineers:

  • Embrace Identity over IP: In multi-cluster and cloud native environments, IP addresses are ephemeral and often meaningless across network boundaries. Building security policies on stable, verifiable workload identities is non-negotiable.
  • Kernel-Level Enforcement is a Performance Game-Changer: For services on the critical path, the overhead of user-space proxies (sidecars) can be prohibitive. eBPF offers a way to implement sophisticated logic with minimal performance impact, bypassing iptables and sidecars entirely.
  • Default-Deny is Achievable: With tools like Cilium Cluster Mesh and CiliumClusterwideNetworkPolicy, implementing a global default-deny posture and incrementally building an allow-list is a practical and robust strategy.
  • Observability is Not an Afterthought: The ability to see exactly why a packet was dropped or allowed, and which policy was responsible, is critical for operating a zero-trust network. eBPF provides this visibility for free.

    While this setup requires a deeper understanding of the networking stack than traditional approaches, the operational and security benefits for large-scale, distributed systems are profound. As Kubernetes deployments continue to grow in complexity and span multiple clouds and regions, eBPF is poised to become the de facto standard for high-performance, secure cloud native networking.
