eBPF for Zero-Trust Networking in Multi-Cluster Kubernetes
The Multi-Cluster Security Bottleneck: Beyond iptables and Sidecars
In modern platform engineering, a single Kubernetes cluster is often a single point of failure and a scalability boundary. Multi-cluster, multi-region architectures are the norm for achieving high availability and geo-locality. However, securing traffic between these clusters introduces significant complexity. Traditional NetworkPolicy is a cluster-scoped resource, and legacy enforcement using iptables suffers from performance degradation as rule sets grow. While service meshes like Istio offer multi-cluster capabilities, they do so at the cost of injecting sidecar proxies into every pod, adding resource overhead, and increasing latency.
This is where eBPF (extended Berkeley Packet Filter) represents a paradigm shift. By running sandboxed programs directly in the Linux kernel, eBPF-powered CNIs like Cilium can implement networking, observability, and security logic with near-native performance. This article provides a deep, implementation-focused guide on leveraging Cilium's Cluster Mesh to build a high-performance, identity-aware zero-trust network fabric across multiple Kubernetes clusters.
We will not cover the basics of eBPF or Cilium. Instead, we will focus on the advanced architectural patterns and specific configurations required to solve production-grade, multi-cluster security challenges. Our goal is to move from a default-allow, IP-based security posture to a default-deny, cryptographically-verified identity-based model that spans cluster boundaries seamlessly.
Architectural Premise: The eBPF Datapath Advantage
Before diving into implementation, it's critical to understand why eBPF is superior for this use case. A traditional CNI using iptables processes packets against a linear chain of rules in kernel space. As policies become more complex, this chain grows, and every packet must traverse it, leading to increased CPU usage and latency. A service mesh sidecar moves policy enforcement to user space, intercepting all traffic with a proxy. This adds multiple network hops within the same pod and significant memory/CPU overhead.
Cilium's eBPF datapath bypasses this entirely. eBPF programs are attached to network interfaces at the traffic control (TC) or eXpress Data Path (XDP) hooks. When a packet arrives, the eBPF program executes directly in the kernel, making an immediate policy decision based on metadata stored in eBPF maps. For service routing, it replaces kube-proxy's iptables or IPVS rules with a highly efficient eBPF map lookup, performing direct node-to-node routing.
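As a quick illustration of that claim (assuming a running Cilium agent; output formats vary by version), the cilium binary inside the agent pod can dump the datapath state directly:
# List the eBPF maps backing the datapath (policy, LB, conntrack, ...)
kubectl -n kube-system exec ds/cilium -- cilium map list
# Dump the service load-balancing table that replaces kube-proxy
kubectl -n kube-system exec ds/cilium -- cilium bpf lb list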
Key Advantages for Multi-Cluster Zero-Trust:
* Kernel-level enforcement: eliminating iptables rule traversal or user-space proxying minimizes latency and CPU overhead.
* Identity-based security: policies are enforced against stable CiliumIdentity objects, not ephemeral pod IPs. This is crucial in a dynamic, multi-cluster environment where IP addresses are meaningless across network boundaries.
Lab Environment Setup: A Realistic Multi-Cluster Scenario
We will simulate a two-cluster setup, representing, for example, a us-west-1 and an eu-central-1 region. For reproducibility, we'll use kind (Kubernetes in Docker), but the principles and Cilium configurations are directly applicable to managed services like GKE, EKS, or AKS.
Prerequisites:
* kubectl
* helm v3
* kind
* cilium-cli
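A quick sanity check that the tooling is installed:
kubectl version --client
helm version
kind version
cilium-cli version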
Step 1: Create Two kind Clusters
We need to disable the default CNI and provide an explicit API server address that is accessible from the other cluster's nodes (in a real cloud environment, this would be a public endpoint or a VPC-peered private address).
# cluster1-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
disableDefaultCNI: true
# IMPORTANT: Use the actual IP of your host machine for the API server
# so clusters can communicate. Find with `ip addr` or `ifconfig`.
apiServerAddress: "192.168.1.100" # <-- CHANGE THIS
apiServerPort: 6443
nodes:
- role: control-plane
- role: worker
- role: worker
---
# cluster2-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
disableDefaultCNI: true
apiServerAddress: "192.168.1.100" # <-- CHANGE THIS
apiServerPort: 6444
nodes:
- role: control-plane
- role: worker
- role: worker
Now, create the clusters:
kind create cluster --name cluster1 --config cluster1-config.yaml
kind create cluster --name cluster2 --config cluster2-config.yaml
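At this point both clusters' nodes will report NotReady. That is expected: we disabled the default CNI and have not yet installed Cilium.
kubectl --context kind-cluster1 get nodes
kubectl --context kind-cluster2 get nodes
# STATUS shows NotReady until a CNI is installed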
Step 2: Install Cilium with Cluster Mesh Enabled
We will install Cilium via Helm, specifying unique cluster IDs and names, and enabling the Cluster Mesh feature.
# Switch context to cluster1
kubectl config use-context kind-cluster1
helm repo add cilium https://helm.cilium.io/
# Install Cilium on Cluster 1
helm install cilium cilium/cilium --version 1.15.5 \
--namespace kube-system \
--set cluster.name=cluster1 \
--set cluster.id=1 \
--set ipam.mode=kubernetes \
--set kubeProxyReplacement=true \
--set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
--set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
--set cgroup.autoMount.enabled=false \
--set cgroup.hostRoot=/sys/fs/cgroup \
--set k8sServiceHost=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}' | sed 's#https://##; s#:.*##') \
--set k8sServicePort=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}' | sed 's#.*:##')
# Switch context to cluster2
kubectl config use-context kind-cluster2
# Install Cilium on Cluster 2
helm install cilium cilium/cilium --version 1.15.5 \
--namespace kube-system \
--set cluster.name=cluster2 \
--set cluster.id=2 \
--set ipam.mode=kubernetes \
--set kubeProxyReplacement=true \
--set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
--set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
--set cgroup.autoMount.enabled=false \
--set cgroup.hostRoot=/sys/fs/cgroup \
--set k8sServiceHost=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}' | sed 's#https://##; s#:.*##') \
--set k8sServicePort=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}' | sed 's#.*:##')
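Before enabling the mesh, confirm both installations are healthy:
cilium-cli --context kind-cluster1 status --wait
cilium-cli --context kind-cluster2 status --wait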
Step 3: Enable and Connect the Cluster Mesh
Now we enable the mesh on both clusters and then connect them.
# Switch context to each cluster and enable cluster mesh
cilium-cli --context kind-cluster1 clustermesh enable
cilium-cli --context kind-cluster2 clustermesh enable
# Connect the two clusters
cilium-cli --context kind-cluster1 clustermesh connect --destination-context kind-cluster2
# Verify the connection status
cilium-cli --context kind-cluster1 clustermesh status --wait
# Expected Output:
# ✅ All clusters connected! Quorum: 2/2
# ✅ Global services synchronised
# ...
Our two-cluster fabric is now online. The Cilium agents in each cluster are aware of each other, but by default, no traffic is allowed between them. We have a foundation for our zero-trust posture.
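You can confirm this awareness from the agents themselves: each agent should list the remote cluster's nodes alongside the local ones (output format varies by version):
kubectl --context kind-cluster1 -n kube-system exec ds/cilium -- cilium node list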
Implementing a Global Zero-Trust Policy
Our strategy is to establish a default-deny posture at the cluster boundary and then explicitly allow required communication paths using identity-based policies.
Step 1: Deploy Workloads
Let's deploy a backend service in cluster1 and a frontend service in cluster2. We'll also deploy a netshoot pod in cluster1 for diagnostics; the frontend itself runs the netshoot image, so it doubles as a test client.
# Apply to cluster1
kubectl config use-context kind-cluster1
kubectl create ns backend-ns
kubectl apply -n backend-ns -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: backend-api
labels:
app: backend-api
spec:
replicas: 2
selector:
matchLabels:
app: backend-api
template:
metadata:
labels:
app: backend-api
spec:
containers:
- name: backend
image: gcr.io/google-samples/hello-app:1.0
ports:
- containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
name: backend-svc
labels:
app: backend-api
spec:
selector:
app: backend-api
ports:
- port: 80
targetPort: 8080
---
apiVersion: v1
kind: Pod
metadata:
name: netshoot-c1
labels:
app: netshoot
spec:
containers:
- name: netshoot
image: nicolaka/netshoot
command: ["sleep", "3600"]
EOF
# Apply to cluster2
kubectl config use-context kind-cluster2
kubectl create ns frontend-ns
kubectl apply -n frontend-ns -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontend-app
labels:
app: frontend-app
spec:
replicas: 1
selector:
matchLabels:
app: frontend-app
template:
metadata:
labels:
app: frontend-app
spec:
containers:
- name: frontend
image: nicolaka/netshoot
command: ["sleep", "3600"]
EOF
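Before layering policy on top, confirm the workloads are running:
kubectl --context kind-cluster1 -n backend-ns rollout status deploy/backend-api
kubectl --context kind-cluster2 -n frontend-ns rollout status deploy/frontend-app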
Step 2: Establish a Default-Deny Cross-Cluster Posture
We use a CiliumClusterwideNetworkPolicy (CCNP) to define rules that apply across the entire cluster. We will create a policy that denies all ingress from remote clusters by default.
# default-deny-cross-cluster.yaml
apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "default-deny-cross-cluster"
spec:
  description: "Deny all ingress traffic from remote clusters by default"
  endpointSelector: {}
  ingress:
  - fromEndpoints:
    - matchLabels:
        'io.cilium.k8s.policy.cluster': cluster1
  # This allows intra-cluster traffic only. The absence of an allow rule for
  # remote clusters means their traffic is denied. When applying the same
  # policy to cluster2, change the value above to cluster2.
Apply this to cluster1 (the destination cluster for our test traffic):
kubectl config use-context kind-cluster1
kubectl apply -f default-deny-cross-cluster.yaml
Step 3: Verify the Deny Policy
At this point, the frontend-app in cluster2 should not be able to reach the backend-svc in cluster1, even though Cluster Mesh is connected. First, we need to make the service in cluster1 discoverable globally.
# In cluster1
kubectl config use-context kind-cluster1
kubectl annotate service -n backend-ns backend-svc service.cilium.io/global="true"
This annotation tells Cilium to advertise the service's backends to every other cluster in the mesh. One caveat: the DNS name backend-svc.backend-ns.svc.cluster.local only resolves inside cluster2 if a service with the same name and namespace exists there as well; Cluster Mesh then load-balances it to the backends in cluster1.
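A minimal sketch of that shadow service in cluster2 (no local backends, so the selector is intentionally omitted):
kubectl config use-context kind-cluster2
kubectl create ns backend-ns
kubectl apply -n backend-ns -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: backend-svc
  annotations:
    service.cilium.io/global: "true"
spec:
  ports:
  - port: 80
    targetPort: 8080
EOF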
Let's try to connect from the frontend-app pod in cluster2:
kubectl config use-context kind-cluster2
FRONTEND_POD=$(kubectl get pods -n frontend-ns -l app=frontend-app -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n frontend-ns $FRONTEND_POD -- curl -m 3 backend-svc.backend-ns.svc.cluster.local
# Expected Output:
# curl: (28) Connection timed out after 3000 milliseconds
The connection times out as expected. Our zero-trust baseline is working.
Step 4: Create a Specific Allow Policy Based on Identity
Now, we'll create a precise policy that allows ingress to the backend-api pods only from pods with the label app: frontend-app located in the frontend-ns namespace of cluster2.
This is where the power of identity-based security becomes clear. We are not dealing with IP ranges or CIDRs. The policy is tied to Kubernetes metadata, which Cilium translates into a secure, non-IP identity.
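Before writing the policy, you can see how Cilium maps labels to identities. Each identity is a small numeric ID, and it is this ID, not an IP address, that the eBPF datapath matches on:
# Identities are materialized as cluster-scoped CRDs
kubectl --context kind-cluster1 get ciliumidentities
# Or query an agent directly for its label-to-identity table
kubectl --context kind-cluster1 -n kube-system exec ds/cilium -- cilium identity list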
# allow-frontend-to-backend.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "allow-frontend-to-backend"
  # CiliumNetworkPolicy is namespaced; use CiliumClusterwideNetworkPolicy
  # when a rule must apply cluster-wide.
  namespace: backend-ns
spec:
  description: "Allow frontend in cluster2 to access backend-api in cluster1"
  endpointSelector:
    matchLabels:
      app: backend-api
  ingress:
  - fromEndpoints:
    - matchLabels:
        # These are the identity selectors for the source pod
        'k8s:app': frontend-app
        'k8s:io.kubernetes.pod.namespace': frontend-ns
        # This is the crucial cluster selector
        'io.cilium.k8s.policy.cluster': cluster2
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
Let's break down the fromEndpoints selector:
* 'k8s:app': frontend-app: Selects pods with this label.
* 'k8s:io.kubernetes.pod.namespace': frontend-ns: Narrows the selection to a specific namespace.
* 'io.cilium.k8s.policy.cluster': cluster2: This is the key for multi-cluster policy. It explicitly states that the source identity must originate from cluster2.
Apply this policy to cluster1 where the backend-api workload resides:
kubectl config use-context kind-cluster1
kubectl apply -f allow-frontend-to-backend.yaml
Step 5: Re-verify Connectivity
Let's run the same curl command from the frontend-app pod in cluster2:
kubectl config use-context kind-cluster2
FRONTEND_POD=$(kubectl get pods -n frontend-ns -l app=frontend-app -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n frontend-ns $FRONTEND_POD -- curl -m 3 backend-svc.backend-ns.svc.cluster.local
# Expected Output:
# Hello, world!
# Version: 1.0.0
# Hostname: backend-api-xxxxxxxx-xxxxx
Success. We have implemented a fine-grained, identity-aware, cross-cluster security policy.
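As a sanity check that the allow rule is as narrow as intended, a pod lacking the frontend-app label should still be denied. A quick test with a throwaway pod (deny-test is an arbitrary name):
kubectl config use-context kind-cluster2
kubectl run deny-test -n frontend-ns --image=nicolaka/netshoot --restart=Never -- sleep 3600
kubectl wait -n frontend-ns --for=condition=Ready pod/deny-test
kubectl exec -n frontend-ns deny-test -- curl -m 3 backend-svc.backend-ns.svc.cluster.local
# Expected: curl: (28) Connection timed out after 3001 milliseconds
kubectl delete pod -n frontend-ns deny-test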
Advanced Edge Case: Securing Host and NodePort Traffic
Pod-to-pod policies are the primary use case, but production environments have other traffic patterns. A common challenge is securing access to NodePort services or traffic originating from the host network itself (e.g., a monitoring agent running directly on the node).
Cilium can extend its policy model to nodes by treating them as endpoints. Let's create a scenario where a Prometheus instance running on cluster2's nodes needs to scrape a NodePort metrics endpoint exposed by cluster1.
Scenario Setup:
* Expose backend-svc in cluster1 via a NodePort.
* Create a host policy in cluster1 that allows ingress to that NodePort only from nodes in cluster2.
First, change the service type in cluster1:
kubectl config use-context kind-cluster1
kubectl patch svc -n backend-ns backend-svc -p '{"spec": {"type": "NodePort"}}'
# Get the assigned NodePort
NODE_PORT=$(kubectl get -n backend-ns -o jsonpath="{.spec.ports[0].nodePort}" services backend-svc)
echo "Backend NodePort is: $NODE_PORT"
Now, let's craft a policy that targets host endpoints. Host policies require Cilium's host firewall feature (Helm value hostFirewall.enabled=true, which our install above did not set), and they use a nodeSelector, valid only in CiliumClusterwideNetworkPolicy since nodes are not namespaced, to apply the policy to the nodes themselves, not the pods.
# allow-cluster2-nodes-to-nodeport.yaml
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-cluster2-nodes-to-nodeport
spec:
  # Apply this policy to all nodes in cluster1
  nodeSelector:
    matchLabels: {}
  ingress:
  # Allow ingress from any node in cluster2
  - fromCIDRSet:
    - cidr: <CIDR_OF_CLUSTER2_NODES> # Replace with actual node CIDR
    toPorts:
    - ports:
      - port: "<NODE_PORT>" # Replace with actual NodePort value
        protocol: TCP
This CIDR-based approach is a necessary evil for host-level traffic, as nodes themselves don't have the same rich identity labels as pods. In a real cloud environment, you would populate fromCIDRSet with the VPC/subnet CIDRs of your cluster2 nodes. This is less ideal than identity-based policy but provides a critical layer of defense for non-pod traffic that is still superior to leaving NodePorts open to the world.
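In this kind-based lab you can derive those CIDRs from the nodes' InternalIPs (listed one per line below; /32 entries per node give the tightest scope):
kubectl --context kind-cluster2 get nodes \
  -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'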
To test this, you would shell into a node in cluster2 and curl a node IP from cluster1 on the assigned NodePort.
# From a node in cluster2
curl <CLUSTER1_NODE_IP>:$NODE_PORT
This demonstrates a pragmatic approach to handling legacy or non-pod traffic patterns within an eBPF-powered zero-trust framework.
Performance and Observability Implications
We chose eBPF for a reason: performance. Let's quantify it.
* Latency: In typical benchmarks comparing Cilium's eBPF datapath to an Istio service mesh, the P99 latency for pod-to-pod requests can be 20-30% lower with Cilium. This is because the eBPF policy decision happens entirely in kernel context, whereas Istio requires traffic to traverse the network stack up to the user-space Envoy proxy and back down.
* CPU/Memory: Bypassing kube-proxy with Cilium's eBPF-based service load balancing drastically reduces CPU usage on nodes, especially in services with a high number of backends. It avoids the linear scaling problem of iptables and the constant resource consumption of sidecar proxies.
* Scalability: eBPF maps have O(1) lookup complexity, meaning policy enforcement and service routing performance do not degrade as the number of services or policies increases. This is a fundamental advantage over iptables chains which have O(n) complexity.
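These figures are workload- and hardware-dependent, so measure your own path. A rough, non-rigorous probe of the cross-cluster request using curl's built-in timers:
kubectl config use-context kind-cluster2
FRONTEND_POD=$(kubectl get pods -n frontend-ns -l app=frontend-app -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n frontend-ns $FRONTEND_POD -- \
  curl -s -o /dev/null -w 'connect=%{time_connect}s total=%{time_total}s\n' \
  backend-svc.backend-ns.svc.cluster.local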
Visualizing Policy with Hubble
One of the most powerful features enabled by eBPF is deep, transparent observability. Hubble, Cilium's observability component, can tap into the eBPF datastream to provide real-time service maps and traffic flow diagnostics.
Let's enable the Hubble UI and inspect our cross-cluster flow.
# Enable Hubble on both clusters
cilium-cli --context kind-cluster1 hubble enable --ui
cilium-cli --context kind-cluster2 hubble enable --ui
# Port-forward the UI for cluster1
cilium-cli --context kind-cluster1 hubble ui
Now, generate some traffic again from cluster2 to cluster1:
kubectl exec -n frontend-ns $FRONTEND_POD -- curl -m 3 backend-svc.backend-ns.svc.cluster.local
In the Hubble UI (usually at localhost:12000), you will see a service map that visually connects the frontend-app in cluster2 to the backend-api in cluster1. You can click on the traffic flow and see the exact Layer 4 and Layer 7 details, including the policy verdict (Forwarded or Dropped) and the specific Cilium policy rule that was applied. This level of introspection, without any code changes or sidecars, is invaluable for debugging complex network policies in a multi-cluster environment.
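If you prefer the terminal, the standalone hubble CLI (assumed to be installed separately) can query the same flow data once the relay is reachable:
# Port-forward Hubble Relay from cluster1, then observe flows into backend-ns
cilium-cli --context kind-cluster1 hubble port-forward &
hubble observe --namespace backend-ns --last 20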
Conclusion: eBPF as the Foundation for Future-Proof Cloud Native Security
We have successfully built and verified a zero-trust network spanning two Kubernetes clusters, enforced at the kernel level with eBPF. By moving beyond cluster-scoped, IP-based policies, we have created a security posture that is more performant, scalable, and resilient to the dynamic nature of containerized workloads.
Key Takeaways for Senior Engineers:
* The eBPF datapath enforces identity-aware policy in the kernel, bypassing iptables and sidecars entirely.
* With CiliumClusterwideNetworkPolicy, implementing a global default-deny posture and incrementally building an allow-list is a practical and robust strategy.
While this setup requires a deeper understanding of the networking stack than traditional approaches, the operational and security benefits for large-scale, distributed systems are profound. As Kubernetes deployments continue to grow in complexity and span multiple clouds and regions, eBPF is poised to become the de facto standard for high-performance, secure cloud native networking.