Advanced Cilium Network Policies with eBPF for Zero-Trust K8s
The Inadequacy of Native Kubernetes NetworkPolicy for Zero-Trust
As architects of distributed systems, we understand that a foundational principle of modern security is zero-trust, which dictates that no actor, system, or service operating within the security perimeter should be trusted by default. In the context of Kubernetes, the primary tool for network segmentation is the NetworkPolicy resource. While essential, its capabilities are fundamentally limited to Layer 3/4 (IP address and port) constructs.
This presents a significant challenge in microservices architectures. A policy allowing pod A to communicate with pod B on port 8080 is a blunt instrument. It permits A to access any endpoint on B's API, including sensitive administrative endpoints like /debug/pprof or /metrics. A compromised pod A has a wide attack surface on pod B. This is where the standard API fails to deliver on the promise of zero-trust. True zero-trust requires identity-aware, L7-aware controls.
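For concreteness, here is roughly what the native API can express for that scenario; a minimal sketch, with illustrative labels and namespaces. Note that nothing below the port number can be constrained, so every path on port 8080 is reachable once the rule matches.
# native-allow.yaml (illustrative): pod A (frontend) may reach pod B (api) on port 8080.
# Nothing here can distinguish POST /v1/payments from GET /debug/pprof.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: api
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: frontend
      podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080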
This is the problem space where Cilium, leveraging the revolutionary capabilities of eBPF, provides a paradigm shift. By operating directly within the Linux kernel, eBPF allows Cilium to inspect and make policy decisions on network packets with extreme performance and context awareness. This article bypasses the basics of Cilium and dives directly into the advanced implementation patterns of CiliumNetworkPolicy (CNP) and CiliumClusterwideNetworkPolicy (CCNP) that are critical for enforcing a robust zero-trust model in a production environment.
We will explore:
- Granular L7 API control with CiliumNetworkPolicy
- DNS-aware egress policies for traffic leaving the cluster
- Cluster-wide policies and non-pod entities with CiliumClusterwideNetworkPolicy
- Performance considerations and troubleshooting with Hubble
Pattern 1: Granular L7 API Control with `CiliumNetworkPolicy`
The most immediate and powerful upgrade from the standard NetworkPolicy is the ability to enforce rules at the application layer. Let's consider a common scenario: a billing-api service that exposes multiple endpoints, and a frontend-app that should only be able to invoke the public-facing endpoints.
Scenario:
- frontend-app (in namespace frontend): Needs to call POST /v1/payments on the billing-api.
- billing-api (in namespace billing): Exposes POST /v1/payments and a sensitive GET /admin/metrics endpoint.
First, let's define our workloads: an Nginx pod standing in for the billing API and a curl pod acting as the frontend client.
# workloads.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: frontend
---
apiVersion: v1
kind: Namespace
metadata:
  name: billing
---
apiVersion: v1
kind: Pod
metadata:
  name: frontend-app
  namespace: frontend
  labels:
    app: frontend
    role: client
spec:
  containers:
  - name: frontend-container
    image: curlimages/curl
    command: ["sleep", "3600"]
---
apiVersion: v1
kind: Pod
metadata:
  name: billing-api
  namespace: billing
  labels:
    app: billing
    role: server
spec:
  containers:
  - name: billing-container
    image: nginx
    ports:
    - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: billing-api-svc
  namespace: billing
spec:
  selector:
    app: billing
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
Now, we'll apply the CiliumNetworkPolicy (CNP). Note the ingress section, which specifies toPorts with an L7 rules block.
# billing-l7-policy.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: billing-api-l7-policy
  namespace: billing
spec:
  endpointSelector:
    matchLabels:
      app: billing
      role: server
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
        # A namespaced CNP only matches endpoints in its own namespace
        # unless the namespace label is specified explicitly.
        "k8s:io.kubernetes.pod.namespace": frontend
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP
      rules:
        http:
        - method: "POST"
          path: "/v1/payments"
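If the frontend later needs read access as well, the http block can carry multiple rules; path values are evaluated as regular expressions. A sketch of an extended rules section follows, as a drop-in replacement for the toPorts section above; the GET rule is hypothetical and not part of the scenario.
    # (drop-in replacement for the toPorts section of billing-l7-policy.yaml)
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP
      rules:
        http:
        - method: "POST"
          path: "/v1/payments"
        # Hypothetical additional rule: read access to individual payments.
        - method: "GET"
          path: "/v1/payments/.*"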
Implementation and Verification
- Apply the workloads and the policy:
kubectl apply -f workloads.yaml
kubectl apply -f billing-l7-policy.yaml
- Exec into the frontend-app pod to test connectivity:
kubectl exec -it -n frontend frontend-app -- /bin/sh
- Test both endpoints with curl. billing-api-svc.billing.svc.cluster.local is the FQDN of the service.
# This request matches the policy (POST to /v1/payments) and should succeed (or receive a 404 from Nginx, which is still a successful network transaction).
# The key is that the connection is not dropped.
curl -X POST http://billing-api-svc.billing.svc.cluster.local/v1/payments -v
# Expected output: connection established, HTTP response (e.g., 404 Not Found from default Nginx)
# This request does NOT match the policy (GET to /admin/metrics) and should be denied.
curl -X GET http://billing-api-svc.billing.svc.cluster.local/admin/metrics -v
# Expected output: the connection is established, but the L7 proxy rejects the request, typically with HTTP 403 and an "Access denied" body
How eBPF Makes This Possible
Cilium doesn't just use eBPF for L3/L4 packet filtering. For L7 protocols like HTTP, it transparently injects a proxy (built on Envoy) when an L7 rule is defined. However, the initial packet handling and redirection to this proxy are managed by eBPF programs attached to the socket system calls (connect, sendmsg, recvmsg) of the pod's network namespace.
- When frontend-app makes a TCP connection, the eBPF program on its socket intercepts it.
- Cilium determines that a CiliumNetworkPolicy with an L7 rule applies to the destination (billing-api) and redirects the connection to the node-local Envoy proxy.
- The proxy parses each HTTP request, evaluates it against the path and method rules, and either forwards it to the billing-api pod or rejects it with an HTTP 403 Access Denied response.
This is significantly more efficient than a full-mesh service mesh, where every pod has a sidecar proxy. Cilium's per-node proxy model reduces resource overhead while providing the same L7 visibility for policy enforcement.
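A quick way to confirm that enforcement is actually active on the billing endpoint is to ask the Cilium agent on that node. A sketch, assuming the agent pods carry the usual k8s-app=cilium label and the in-agent CLI is available (named cilium, or cilium-dbg in newer releases); the pod name placeholder is yours to fill in:
# Find the Cilium agent running on the same node as billing-api.
kubectl -n kube-system get pods -l k8s-app=cilium -o wide
# List endpoints known to that agent; the billing-api endpoint should show
# ingress policy enforcement as "Enabled" once the L7 policy is applied.
kubectl -n kube-system exec <cilium-agent-pod> -- cilium endpoint list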
Pattern 2: Securing Egress with DNS-Aware Policies
Microservices often need to communicate with external APIs (e.g., Stripe, GitHub, AWS services). Hardcoding IP addresses in network policies is brittle and unmaintainable, as these IPs can change frequently. Cilium's DNS-aware policies solve this problem elegantly.
Scenario:
- A ci-runner pod in the ci namespace needs to clone code from github.com and push artifacts to an S3 bucket at s3.us-west-2.amazonaws.com.
Let's define the runner pod and the corresponding CNP.
# ci-runner.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ci
---
apiVersion: v1
kind: Pod
metadata:
  name: ci-runner
  namespace: ci
  labels:
    app: ci-runner
spec:
  containers:
  - name: runner
    image: curlimages/curl
    command: ["sleep", "3600"]
Now for the sophisticated egress policy. The key is the toFQDNs block.
# ci-egress-policy.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: ci-runner-egress-policy
  namespace: ci
spec:
  endpointSelector:
    matchLabels:
      app: ci-runner
  egress:
  - toFQDNs:
    - matchName: "github.com"
    - matchName: "s3.us-west-2.amazonaws.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
  # This rule is crucial! It allows DNS queries to kube-dns.
  # Without it, the pod can't resolve the FQDNs in the first place.
  - toEndpoints:
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": kube-system
        "k8s:k8s-app": kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: UDP
      rules:
        dns:
        - matchPattern: "*"
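The matchName entries require exact names. If the bucket endpoint varies by region, toFQDNs also accepts wildcard patterns via matchPattern. A sketch of an alternative first egress rule follows; the pattern is an assumption about your endpoint naming, not part of the scenario.
  # (drop-in alternative for the first egress rule in ci-egress-policy.yaml)
  egress:
  - toFQDNs:
    - matchName: "github.com"
    # '*' matches DNS-valid characters, so this covers s3.<region>.amazonaws.com
    - matchPattern: "s3.*.amazonaws.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP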
Implementation Details and Kernel-Level Magic
- Apply the resources:
kubectl apply -f ci-runner.yaml
kubectl apply -f ci-egress-policy.yaml
- Exec into the ci-runner pod:
kubectl exec -it -n ci ci-runner -- /bin/sh
# This should succeed because github.com is in the allowlist.
curl -v --head https://github.com --connect-timeout 5
# This should also succeed.
curl -v --head https://s3.us-west-2.amazonaws.com --connect-timeout 5
# This should be blocked and time out.
curl -v --head https://api.stripe.com --connect-timeout 5
Cilium's eBPF implementation for this is ingenious:
- When ci-runner makes a DNS query for github.com, eBPF redirects the request to Cilium's transparent DNS proxy (this is what the dns rule in the policy permits). The proxy observes the response from kube-dns and extracts the returned IP addresses (e.g., 140.82.121.4).
- Those IP-to-name mappings are programmed into eBPF maps alongside the security identities derived from the policy.
- When ci-runner then connects to 140.82.121.4 on port 443, the eBPF program on the connection path fires, performs a lookup in the eBPF map, finds that 140.82.121.4 corresponds to github.com, which is allowed by the policy, and permits the connection.
The per-connection enforcement happens entirely in the kernel, making it fast and efficient, and it provides a robust way to manage egress to dynamic cloud services without maintaining brittle IP allowlists.
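You can inspect the learned name-to-IP mappings directly on the agent. A sketch, assuming the in-agent CLI (cilium, or cilium-dbg in newer releases) is available in the Cilium pod on the runner's node; the pod name placeholder is yours to fill in:
# Show the FQDN-to-IP cache Cilium has built from observed DNS responses.
kubectl -n kube-system exec <cilium-agent-pod> -- cilium fqdn cache list
# Optionally watch DNS flows for the ci namespace via Hubble.
hubble observe --namespace ci --protocol dns -f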
Pattern 3: Cluster-wide Policies and Non-Pod Entities
Some security rules are not namespace-specific; they are global concerns. For example, you might want to prevent all pods (except those in a monitoring namespace) from scraping the kubelet's /metrics endpoint on the host node, or you might want to establish a baseline deny-all policy for the entire cluster.
This is where CiliumClusterwideNetworkPolicy (CCNP) and the toEntities selector become indispensable.
Scenario:
- Implement a default-deny ingress policy for the entire cluster.
- Allow only pods labeled role: ingress-controller in the networking namespace to receive traffic from outside the cluster.
- Allow all intra-cluster communication.
The following pair of CCNPs creates this baseline: the first allows intra-cluster traffic (and, by selecting every endpoint, implicitly denies all other ingress, including from the outside world), and the second re-admits external traffic only for the ingress controllers.
# cluster-baseline-policy.yaml
apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "cluster-baseline-deny-ingress"
spec:
  # Apply this policy to all pods in the cluster
  endpointSelector: {}
  ingress:
  # The 'cluster' entity represents all endpoints within the cluster.
  # This rule allows any pod to talk to any other pod; because the policy
  # selects every endpoint, all other ingress (including 'world') is
  # implicitly denied.
  - fromEntities:
    - cluster
---
# A second, narrower policy re-admits external traffic, but only to the
# ingress controllers in the networking namespace and only on port 443.
apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "allow-world-to-ingress-controller"
spec:
  endpointSelector:
    matchLabels:
      role: ingress-controller
      "k8s:io.kubernetes.pod.namespace": networking
  ingress:
  - fromEntities:
    # The 'world' entity represents any IP address outside the cluster.
    - world
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
Dissecting the `fromEntities` Selector
fromEntities is a powerful construct that extends policy beyond pod labels:
- cluster: Represents all endpoints within the cluster (pods, but also the local host and remote nodes). The rule fromEntities: [cluster] is a simple way to allow all intra-cluster communication.
- world: Represents all IP addresses external to the cluster. This is used for controlling traffic from the internet or other external networks.
- host: Represents the local node itself. This is critical for securing communication between pods and host-level services (like the kubelet or node-exporter).
- remote-node: Represents the other nodes in the cluster.
Edge Case: Securing Kubelet Access
A common production requirement is to lock down access to the kubelet's read-only port (10255) and its secure port (10250). A malicious pod could otherwise access sensitive metrics and logs.
Here's a CCNP to restrict access to the kubelet to only Prometheus pods in the monitoring namespace.
# secure-kubelet-access.yaml
apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "secure-kubelet-access"
spec:
  # This policy applies to the host entity itself, not pods.
  nodeSelector: {}
  ingress:
  - fromEndpoints:
    - matchLabels:
        app.kubernetes.io/name: prometheus
        "k8s:io.kubernetes.pod.namespace": monitoring
    toPorts:
    - ports:
      # Kubelet secure port
      - port: "10250"
        protocol: TCP
      # Kubelet read-only port (if enabled)
      - port: "10255"
        protocol: TCP
This policy is unique because it uses nodeSelector: {} to apply to the host network namespace on every node. It then specifies that only pods matching the fromEndpoints selector can initiate connections to the specified kubelet ports. All other pods attempting to connect will be dropped by the eBPF program attached to the host's network interface.
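Two operational caveats apply. Host policies are only enforced when Cilium's host firewall is enabled, and a default-deny on the host endpoint can also cut off legitimate node-level traffic (for example, the API server reaching the kubelet), so testing in audit mode before enforcing is prudent. A minimal sketch of the toggle, assuming Cilium was installed with Helm from the cilium chart repo and the file name is illustrative:
# values-hostfw.yaml (illustrative file name)
hostFirewall:
  enabled: true

helm upgrade cilium cilium/cilium -n kube-system --reuse-values -f values-hostfw.yaml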
Performance Considerations & Troubleshooting with Hubble
A primary reason for choosing an eBPF-based CNI is performance. Traditional datapaths built on iptables (kube-proxy for Services, and iptables-based CNI plugins for NetworkPolicy) degrade as the number of rules grows: iptables evaluates rules as a sequential chain, giving O(n) complexity, so a large number of services and policies can add significant per-packet latency.
In contrast, eBPF uses hash table lookups in its maps, providing roughly O(1) lookup cost regardless of the number of policies. This results in:
- Per-packet latency that stays flat as services and policies are added.
- Lower CPU overhead on busy nodes, because there is no long rule chain to walk for every packet.
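If you would rather see these maps than take the claim on faith, the in-agent CLI can dump them. A sketch, with the same assumptions as before (agent CLI named cilium or cilium-dbg; placeholders for the pod name and the endpoint ID, which comes from cilium endpoint list):
# List the eBPF maps the agent maintains (policy, connection tracking, NAT, ...).
kubectl -n kube-system exec <cilium-agent-pod> -- cilium map list
# Dump the per-endpoint policy map that the datapath consults for each packet.
kubectl -n kube-system exec <cilium-agent-pod> -- cilium bpf policy get <endpoint-id>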
However, when policies don't behave as expected, debugging can be challenging. A curl timing out doesn't tell you why. This is where Hubble, Cilium's observability platform, becomes essential.
Debugging a Policy Violation with Hubble
Let's revisit our first L7 policy example. When we tried to access the forbidden /admin/metrics endpoint, the request was denied. Let's see what Hubble shows us.
First, enable the Hubble UI or use the CLI. For the CLI:
# Port-forward to the hubble-relay service
kubectl port-forward -n kube-system svc/hubble-relay 4245:80
# In another terminal, run the curl command that is expected to fail
kubectl exec -it -n frontend frontend-app -- curl -X GET http://billing-api-svc.billing.svc.cluster.local/admin/metrics --connect-timeout 5
# Now, use the Hubble CLI to observe the traffic flow for the frontend namespace
hubble observe --namespace frontend -f
The output will provide a real-time stream of network flows. When the forbidden curl is executed, you will see an entry like this:
TIMESTAMP SOURCE DESTINATION TYPE VERDICT SUMMARY
May 20 14:30:15.123 frontend/frontend-app-xyz -> billing/billing-api-abc:80 http-request DROPPED HTTP/1.1 GET /admin/metrics
Policy verdict: L7_POLICY_DENIED
Reason: Policy denied
This output is incredibly valuable for a senior engineer debugging a production issue:
- VERDICT: DROPPED: Confirms the request was dropped by policy, not lost elsewhere.
- TYPE: http-request: Shows that Cilium understood the traffic at L7.
- SUMMARY: HTTP/1.1 GET /admin/metrics: Pinpoints the exact request that was blocked.
- Policy verdict: L7_POLICY_DENIED: Explicitly states that the drop was due to an L7 policy, immediately ruling out L3/L4 connectivity issues.
This level of detail dramatically reduces Mean Time to Resolution (MTTR) for network-related issues in a zero-trust environment. You can instantly see which policy is being enforced and why a specific flow is being denied, without having to parse complex iptables rules or sift through ambiguous logs.
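Hubble's CLI filters make it easy to narrow in on exactly this class of event. A few useful invocations, sketched with the standard hubble observe flags:
# Only show dropped flows destined for the billing namespace.
hubble observe --namespace billing --verdict DROPPED -f
# Only show L7 HTTP flows, to confirm the proxy is seeing the traffic.
hubble observe --namespace billing --protocol http -f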
Conclusion
Implementing a true zero-trust security posture in Kubernetes requires moving beyond the limitations of the standard NetworkPolicy API. Cilium, through its advanced CRDs and the underlying power of eBPF, provides the necessary tools for senior engineers to build a highly secure, performant, and observable network fabric.
By mastering patterns like L7-aware rules, DNS-based egress controls, and cluster-wide policies targeting non-pod entities, you can enforce the principle of least privilege at a granular level that was previously only achievable with a heavyweight service mesh. The ability to do this directly in the CNI layer, with the performance benefits of in-kernel processing and the deep observability of tools like Hubble, makes eBPF a cornerstone technology for modern cloud-native security architecture.