Advanced eBPF Routing for Sidecar-less Kubernetes Service Mesh
The Inescapable Overhead of the Sidecar Pattern
As senior engineers building distributed systems on Kubernetes, we've largely accepted the sidecar proxy as a necessary cost for the benefits of a service mesh. Observability, mTLS, and advanced traffic routing provided by tools like Istio or Linkerd are indispensable. However, in high-performance or resource-constrained environments, the cost of injecting an Envoy or Linkerd2-proxy into every application pod becomes a significant bottleneck. This isn't a theoretical concern; it's a production reality.
The fundamental problem is the repeated traversal of the network stack. A request from Service A to Service B doesn't go directly from pod A to pod B. It follows this path:
- The application in pod A makes a request to service-b.namespace.svc.cluster.local.
- The request is intercepted by iptables rules and redirected to the pod's local Envoy sidecar (e.g., localhost:15001).
- The sidecar applies policy, telemetry, and mTLS, then forwards the request across the node network to pod B, where it is intercepted again by that pod's sidecar.
- Pod B's sidecar processes the request and delivers it to the application over localhost.
For a single hop, we've added two full user-space proxy traversals and multiple trips through the TCP/IP stack within the pod's network namespace. In a complex call chain involving five services, this amounts to ten additional proxy hops. The cumulative impact on P99 latency and the aggregate CPU/memory consumption across a large cluster can be staggering. We're talking about reserving hundreds of millicores and megabytes of RAM per pod just for the mesh infrastructure.
This is where eBPF (extended Berkeley Packet Filter) presents a paradigm shift. By moving networking logic from a user-space sidecar directly into the Linux kernel, we can achieve the same service mesh goals with a fraction of the performance overhead.
eBPF: Kernel-Level Programmability for Networking
We will dispense with the "What is eBPF?" primer. We assume you understand it allows sandboxed programs to run in the kernel. The crucial part for our discussion is which kernel hooks enable a sidecar-less mesh and how they work.
The two primary hooks leveraged by platforms like Cilium are:
- Traffic control (TC) hooks (cls_bpf): These hooks attach eBPF programs to network interfaces (both physical and virtual, like veth pairs). When a packet enters or leaves an interface, the eBPF program can inspect, modify, redirect, or drop it before it proceeds further up the network stack. This is ideal for enforcing network policies and performing load balancing at L3/L4.
- Socket-level cgroup hooks (cgroup/connect4, sock_ops): These hooks are even more powerful. They attach to cgroups and can intercept socket operations like connect(), sendmsg(), and recvmsg(). This allows an eBPF program to redirect a connection from one destination IP/port to another before a single packet is sent. It also enables transparently intercepting application data for tasks like mTLS encryption/decryption.
The eBPF Redirection Mechanism
Let's visualize the new request flow from Service A to Service B in an eBPF-powered mesh:
- The application in Pod A calls connect() on a socket for Service B's ClusterIP.
- The cgroup/connect4 eBPF hook triggers.
- The eBPF program consults an eBPF map (a highly efficient kernel-space key/value store) that contains the mapping from ClusterIPs to real backend pod IPs.
- The program rewrites the socket's destination address to a selected Pod B IP.
- The connection is established directly to Pod B, completely bypassing any user-space proxies and iptables rules.
This is not packet forwarding; it's connection-time destination rewriting. The application is entirely unaware this has happened. The performance gain is immense because we've eliminated the two user-space hops and the associated context switching and memory copies.
Here's a conceptual C-like representation of what such an eBPF program might do:
// Simplified pseudo-code for an eBPF sock_addr hook.
// The service_map definition and the select_backend_pod() helper are elided;
// in a real datapath they are BPF maps populated by a user-space agent.
#include <linux/bpf.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>

SEC("cgroup/connect4")
int bpf_socket_redirect(struct bpf_sock_addr *ctx)
{
    // Only act on TCP connections
    if (ctx->protocol != IPPROTO_TCP)
        return 1; // for cgroup/connect4, returning 1 allows the call to proceed

    // Check if the destination IP is a ClusterIP we manage;
    // bpf_map_lookup_elem is a helper to read from an eBPF map
    struct service_info *svc = bpf_map_lookup_elem(&service_map, &ctx->user_ip4);
    if (svc) {
        // This is a service we need to load balance.
        // select_backend_pod() would contain the LB logic
        // (e.g., round-robin, consistent hashing) reading from another map.
        struct pod_info *backend = select_backend_pod(svc);
        if (backend) {
            // Rewrite the destination IP and port in the connection context
            ctx->user_ip4 = backend->ip;
            ctx->user_port = backend->port; // both in network byte order
        }
    }
    return 1; // allow the (possibly rewritten) connect()
}
This kernel-level agility is the foundation of the sidecar-less service mesh.
Production Implementation: L7 Traffic Splitting with Cilium
Cilium is a production-grade CNI that leverages eBPF for networking, observability, and security. Its sidecar-less service mesh capability is built on the principles described above. Let's implement an advanced canary deployment scenario.
Scenario: We have a product-api service. We want to route 90% of traffic to the stable v1 and 10% of traffic with a specific HTTP header (X-Canary-User: true) to the new v2.
First, deploy the two versions of our application:
# product-api-v1-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: product-api-v1
  labels:
    app: product-api
    version: v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: product-api
      version: v1
  template:
    metadata:
      labels:
        app: product-api
        version: v1
    spec:
      containers:
        - name: api
          image: my-repo/product-api:v1
          ports:
            - containerPort: 8080
---
# product-api-v2-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: product-api-v2
  labels:
    app: product-api
    version: v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: product-api
      version: v2
  template:
    metadata:
      labels:
        app: product-api
        version: v2
    spec:
      containers:
        - name: api
          image: my-repo/product-api:v2
          ports:
            - containerPort: 8080
Next, define the Kubernetes Service that acts as the stable endpoint:
# product-api-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: product-api
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: product-api # Selects both v1 and v2 initially
Now for the core logic. We use the Gateway API's HTTPRoute resource, which Cilium understands and programs into its datapath (a combination of eBPF and a shared per-node proxy, as we'll see shortly). This is far more expressive than the legacy Ingress resource.
# product-api-httproute.yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: product-api-routing
spec:
  parentRefs:
    - name: product-api # Attaches this route to the Service
      kind: Service
      group: ""
      port: 80
  rules:
    - matches:
        - headers:
            - type: Exact
              name: X-Canary-User
              value: "true"
      backendRefs:
        - name: product-api-v2 # Cilium needs Services for backends
          kind: Service
          port: 80
          weight: 100
    - backendRefs:
        - name: product-api-v1
          kind: Service
          port: 80
          weight: 100
Note: To make this work, you also need to create Service objects for product-api-v1 and product-api-v2 that select only their respective pods. This HTTPRoute attaches to the main product-api service and defines the routing logic.
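A minimal sketch of those per-version Services follows; the names match the backendRefs above, and the selectors reuse the version labels from the Deployments:
# product-api-versioned-svcs.yaml (illustrative)
apiVersion: v1
kind: Service
metadata:
  name: product-api-v1
spec:
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: product-api
    version: v1
---
apiVersion: v1
kind: Service
metadata:
  name: product-api-v2
spec:
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: product-api
    version: v2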
When this HTTPRoute is applied, Cilium's operator does the following:
- It recognizes the L7 routing rules.
- It realizes that simple eBPF socket redirection is insufficient, as it needs to inspect HTTP headers.
- It provisions (or reuses) a shared, per-node Envoy proxy managed by the Cilium agent and attaches the routing configuration (for product-api on port 80) to it.
- At the socket/TC layer, eBPF detects traffic destined for the product-api ClusterIP. It will redirect this traffic to the local Envoy listener.
- Envoy inspects the HTTP headers, applies the routing rules (v1 or v2), and then sends the request to the final destination pod IP. The return path is similarly handled by eBPF.
This hybrid approach is key. eBPF is used for the heavy lifting of L3/L4 redirection and policy, while a shared, optimized proxy is used only for complex L7 logic, avoiding the per-pod resource cost of a traditional sidecar.
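You can confirm the sidecar-less part directly: the application pods contain only their own container. Depending on your Cilium version, the proxy configuration derived from the HTTPRoute is also visible as CiliumEnvoyConfig objects:
# Each product-api pod should list only the "api" container -- no injected proxy
kubectl get pods -l app=product-api -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[*].name}{"\n"}{end}'

# If present, inspect the Envoy configuration Cilium generated for the route
kubectl get ciliumenvoyconfigs -A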
You can verify this behavior using Cilium's observability tool, Hubble:
# Install Hubble CLI
# Enable and open the Hubble UI (via the cilium CLI)
cilium hubble enable --ui
cilium hubble ui
# From another terminal, send traffic
# Normal user
kubectl exec -it client-pod -- curl http://product-api
# Canary user
kubectl exec -it client-pod -- curl -H "X-Canary-User: true" http://product-api
The Hubble UI will visually trace the requests, showing that normal traffic flows to v1 pods and canary traffic flows to v2 pods, with the policy decision being applied at the source.
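If you prefer the CLI to the UI, the Hubble flow log tells the same story. A quick spot check (flag names are taken from recent Hubble CLI releases and may vary slightly) might look like:
# Show recent HTTP flows; the destination pod name reveals whether v1 or v2 served the request
hubble observe --protocol http --last 20

# Narrow it down to flows that reached the canary
hubble observe --protocol http --to-label version=v2 --last 20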
Advanced Use Case: Truly Transparent mTLS
The most impressive feature of an eBPF-based mesh is transparent mTLS without a user-space proxy. Sidecars terminate TLS in user space, meaning traffic between the application and its sidecar is unencrypted. eBPF handles this at a lower level.
Mechanism:
- Each workload gets a strong cryptographic (SPIFFE-style) identity, tracked by Cilium via the CiliumIdentity CRD.
- eBPF programs hook the sendmsg() and recvmsg() socket calls.
- When the application in Pod A writes plaintext data to a socket destined for Pod B, the sendmsg eBPF program intercepts this data. It uses the Linux Kernel TLS (KTLS) module to encrypt the data using session keys derived from the SPIFFE identities. The now-encrypted data is then handed to the TCP/IP stack.
- On the receiving side, the recvmsg eBPF program receives the encrypted data from the socket buffer, uses KTLS to decrypt it, and presents the plaintext data to the application in Pod B.
The application has no idea TLS is involved. It simply reads and writes to a standard TCP socket. This eliminates the unencrypted traffic leg within the pod and avoids the overhead of user-space TLS termination.
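To make the KTLS building block concrete, here is a minimal userspace sketch of handing TLS record encryption to the kernel on an existing TCP socket. It is purely illustrative of the kernel API; in the mesh, the session keys come from the identity/handshake machinery and the wiring is driven by the agent and eBPF, not by the application:
// Minimal kTLS sketch: offload TLS 1.2 AES-GCM-128 record encryption to the kernel.
// Key material would normally come from a completed TLS handshake.
#include <linux/tls.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

#ifndef TCP_ULP
#define TCP_ULP 31
#endif
#ifndef SOL_TLS
#define SOL_TLS 282
#endif

int enable_ktls_tx(int fd,
                   const unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE],
                   const unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE],
                   const unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE],
                   const unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE])
{
    struct tls12_crypto_info_aes_gcm_128 crypto = {0};

    crypto.info.version = TLS_1_2_VERSION;
    crypto.info.cipher_type = TLS_CIPHER_AES_GCM_128;
    memcpy(crypto.key, key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
    memcpy(crypto.iv, iv, TLS_CIPHER_AES_GCM_128_IV_SIZE);
    memcpy(crypto.salt, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
    memcpy(crypto.rec_seq, rec_seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);

    // Attach the "tls" upper-layer protocol, then install the TX keys;
    // after this, ordinary send()/write() calls are encrypted by the kernel.
    if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
        return -1;
    return setsockopt(fd, SOL_TLS, TLS_TX, &crypto, sizeof(crypto));
}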
Enabling this is remarkably simple:
In the Cilium ConfigMap, you enable mTLS:
# In cilium-config ConfigMap
...
enable-mutual-authentication: "true"
tls-secrets-namespace: "cilium-secrets"
And define a CiliumNetworkPolicy to enforce it:
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "api-mtls-enforcement"
spec:
  endpointSelector:
    matchLabels:
      app: product-api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: client-app
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
      authentication:
        mode: "required" # Enforce mTLS
To verify, use cilium monitor:
# On a node, run:
cilium monitor --type l7
# Look for output showing TLS handshake and encrypted traffic
-> Pod client-app/client-pod-xyz -> product-api/product-api-v1-abc identity 12345 -> 54321
TCP 8080 TLS SNI: product-api.default.svc.cluster.local
Performance Analysis: Sidecar vs. Sidecar-less
This is where the eBPF approach truly shines. Let's analyze the performance delta across key metrics.
| Metric | Sidecar-based (Istio) | eBPF-based (Cilium) | Impact |
|---|---|---|---|
| P99 Request Latency | Adds ~5-15ms per hop (2 proxies) | Adds <1ms per hop | In a 5-service chain, this can be the difference between a 75ms overhead and a 5ms overhead. |
| CPU Usage (per pod) | ~50-200m per Envoy sidecar | 0 (logic is in the node-level Cilium agent) | Across 1000 pods, this saves 50-200 full CPU cores dedicated solely to the service mesh. |
| Memory Usage (per pod) | ~50-150MB per Envoy sidecar | 0 | Saves 50-150GB of RAM in a 1000-pod cluster, allowing for significantly higher pod density per node. |
| Network Path | App -> localhost -> Sidecar -> Node -> Sidecar -> App | App -> Node -> App | Drastically simpler, faster, and easier to debug. Fewer points of failure. |
These are not just theoretical numbers. Benchmarks consistently show that by avoiding the user-space data path, eBPF-based meshes reduce latency by an order of magnitude and cut resource consumption dramatically.
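If you want numbers from your own cluster rather than published benchmarks, a simple load test against the same service under both setups makes the delta visible. A sketch using fortio (assuming a client pod that has the fortio binary, which is not part of the manifests above):
# Drive 1000 QPS for 60 seconds and report the latency percentiles
kubectl exec -it client-pod -- fortio load -qps 1000 -c 32 -t 60s http://product-api/

# Repeat on a cluster running a sidecar-injected mesh and compare the P99 values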
Edge Cases and Operational Gotchas
A senior engineer knows there's no such thing as a free lunch. The power of eBPF comes with its own set of complexities.
- Debuggability: You can no longer just kubectl logs a sidecar. Debugging becomes a kernel-level activity. You need to become proficient with tools like the following (a few example invocations appear after this list):
* bpftool: The Swiss Army knife for inspecting loaded eBPF programs and maps. You can dump the JIT-compiled assembly, view map contents, and see which programs are attached to which hooks.
* cilium monitor: An essential tool for viewing real-time packet-level events, policy verdicts, and L7 traffic as processed by eBPF.
* Hubble: A higher-level observability platform that uses the data from cilium monitor to build service maps and visualize traffic flows.
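For example, a first pass at answering "what eBPF is actually loaded and where?" on a node might look like this (the pinned map path is illustrative and varies by Cilium version):
# List loaded eBPF programs and see which cgroup hooks have programs attached
bpftool prog show
bpftool cgroup tree /sys/fs/cgroup

# Dump a Cilium service map to see ClusterIP -> backend translations (path varies)
bpftool map dump pinned /sys/fs/bpf/tc/globals/cilium_lb4_services_v2

# Watch live policy verdicts from the Cilium agent
cilium monitor --type policy-verdict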
Conclusion: A New Architectural Default
The sidecar-less service mesh powered by eBPF is not a niche optimization; it is the logical evolution of service mesh architecture for modern, performance-sensitive infrastructure. By moving policy enforcement, load balancing, and observability into the kernel, we eliminate the primary sources of overhead that have plagued sidecar-based implementations.
While the operational model requires a deeper understanding of Linux kernel primitives and a new set of debugging tools, the benefits are undeniable: radically lower latency, significantly reduced resource consumption, and a simpler data path. For engineering teams pushing the boundaries of scale and performance on Kubernetes, the transition from sidecar to eBPF is no longer a question of if, but when.