Sidecarless Service Mesh with eBPF: A Performance Deep Dive
The Inescapable Overhead of the Sidecar Pattern
For years, the sidecar proxy—popularized by service meshes like Istio and Linkerd—has been the de facto standard for introducing observability, security, and reliability features into cloud-native applications. By injecting an L7 proxy like Envoy next to every application container, we gained language-agnostic mTLS, traffic splitting, and detailed metrics without modifying application code. However, this elegance comes at a significant and often underestimated cost in production environments.
Every packet destined for or originating from your application pod must traverse the full TCP/IP stack twice within the same network namespace: once for the application and once for the sidecar proxy. This journey involves multiple context switches between userspace and the kernel, memory copies between buffers, and the serialization/deserialization overhead of the proxy itself.
Let's visualize the data path for a simple request between two pods, pod-a and pod-b, on the same node in a sidecar-based mesh:
- pod-a (App): The application writes data to its socket.
- pod-a (Sidecar): The request is redirected via iptables to the sidecar's listening port. The sidecar receives the data from its socket, moving it from kernel space to its userspace buffer.
- pod-a (Sidecar): The sidecar writes the (potentially modified) data to its outbound socket, and the packet traverses the kernel's TCP/IP stack again before leaving the pod through its virtual ethernet interface (veth).
- Node (Kernel): The node's network stack routes the packet to pod-b's veth.
- pod-b (Kernel): The packet enters pod-b's network namespace and is again redirected by iptables to its sidecar.
- pod-b (Sidecar): The sidecar proxy receives, processes, and forwards the request to the application container via the localhost interface.

This round trip adds non-trivial latency (P99 latency can increase by 10-50ms+ under load) and consumes significant CPU and memory resources across the cluster, as every single pod requires its own dedicated proxy instance. For latency-sensitive services or clusters with high pod density, this overhead becomes a primary operational bottleneck.
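To see this redirection machinery concretely, you can dump the NAT rules that Istio's init container installs inside an injected pod. This is an illustrative session, assuming an Istio-injected pod named pod-a and that the proxy container retains the privileges needed to read iptables (otherwise, inspect from the node); the chains and ports shown are Istio's documented defaults:

$ kubectl exec -n production pod-a -c istio-proxy -- iptables-save -t nat | grep ISTIO
# Abridged output: inbound TCP is redirected to Envoy on 15006, outbound TCP to Envoy on 15001
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_OUTPUT -j ISTIO_REDIRECT
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001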
This article is not an introduction to eBPF. It assumes you understand its core concepts: kernel-level programmability, hooks (like TC, XDP, sock_ops), and verifiers. We will focus exclusively on how this technology enables a fundamentally more efficient service mesh architecture.
The eBPF Alternative: In-Kernel Service Routing and Policy
eBPF (extended Berkeley Packet Filter) allows us to run sandboxed programs directly within the Linux kernel, triggered by various events, including network packet processing. This enables a sidecarless service mesh to move core data plane logic from a per-pod userspace proxy into a shared, highly efficient kernel-level layer.
Let's re-examine the data path for the same intra-node request, this time with an eBPF-powered mesh like Cilium:
- pod-a (App): The application writes data to its socket.
- Kernel (sock_ops hook): An eBPF program attached to the socket operation hooks (sock_ops) identifies that this connection is between two managed pods on the same node.
- Kernel (bpf_sockmap): The eBPF program directly connects the sockets of the two pods using a bpf_sockmap, effectively bypassing the entire TCP/IP stack for both pods. Data is written from pod-a's socket buffer directly to pod-b's socket buffer.
- pod-b (App): The application reads the data from its socket as if it came over the network, completely unaware of the kernel-level optimization.

The TCP/IP stack, iptables redirection, and multiple context switches are entirely eliminated. This isn't just a minor optimization; it's a fundamental re-architecture of pod-to-pod communication, leading to near-bare-metal network performance.
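You can inspect this machinery from the node or from the cilium-agent pod (which mounts the host BPF filesystem). A sketch, assuming bpftool is installed, socket-level acceleration is enabled, and Cilium's default cgroup v2 mount path; exact program and map names vary by Cilium version:

$ bpftool cgroup tree /run/cilium/cgroupv2        # sock_ops/connect programs attached at the cgroup root
$ bpftool map list | grep -iE 'sock(hash|map)'    # sockhash/sockmap maps used for socket redirection
$ bpftool prog list | grep -iE 'sock_ops|sk_msg'  # programs that populate and consume those maps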
For inter-node traffic, eBPF programs attached at the Traffic Control (TC) ingress/egress hooks can perform L3/L4 load balancing, apply network policies, and collect metrics directly as packets enter and leave the node, still avoiding the per-pod proxy overhead.
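The TC-attached programs and the BPF maps backing in-kernel load balancing can be listed in a similar way; the commands below are illustrative, and interface names will differ per node:

$ bpftool net show dev eth0                                    # eBPF programs attached at TC ingress/egress
$ kubectl -n kube-system exec ds/cilium -- cilium bpf lb list  # service-to-backend mappings used for L4 load balancing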
Achieving Service Mesh Features with eBPF
- L4 load balancing and routing: sock_ops and TC hooks, combined with BPF maps to store service endpoint information, allow for highly efficient L4 routing and load balancing directly in the kernel.
- Observability: eBPF programs can attach to kernel probes (kprobes) to monitor system calls related to networking (connect, send, recv). This allows the mesh to gather granular, low-overhead metrics on latency, throughput, and error rates directly from the kernel, providing a source of truth that is independent of any userspace proxy. A minimal sketch of this style of instrumentation follows.
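As a minimal, standalone illustration of kprobe-based socket instrumentation (not Cilium's actual implementation), a bpftrace one-liner can aggregate per-process TCP send volume in the kernel, assuming bpftrace is available on the node:

$ bpftrace -e 'kprobe:tcp_sendmsg { @tx_bytes[comm] = sum(arg2); }'
# Attaches a kprobe to tcp_sendmsg and sums the size argument per command name;
# the resulting map is printed when the tracer exits (Ctrl-C)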
Production Implementation Pattern: Cilium Service Mesh
Cilium is a CNI (Container Network Interface) plugin that has evolved to provide a full-featured service mesh using eBPF. Let's walk through some production-grade configurations.
Architecture Overview
- cilium-agent: A DaemonSet that runs on every node. It manages the eBPF programs, updates BPF maps with service/policy information from the Kubernetes API, and runs the optional node-local Envoy proxy for L7 policies.
- eBPF data plane: Programs loaded by the agent and attached to network interfaces (veth pairs) and kernel hooks (sock_ops, TC).

Code Example 1: Advanced L7 HTTP Policy Enforcement
Imagine a scenario where a billing-api service needs to expose a /metrics endpoint to a prometheus service but restrict access to a sensitive /api/v1/invoices/export endpoint to only a finance-batch service.
First, we define a CiliumNetworkPolicy. Unlike a standard NetworkPolicy, this CRD is L7-aware.
# cilium-l7-policy.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "billing-api-l7-policy"
  namespace: "production"
spec:
  endpointSelector:
    matchLabels:
      app: billing-api
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: prometheus
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/metrics"
  - fromEndpoints:
    - matchLabels:
        app: finance-batch
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "POST"
          path: "/api/v1/invoices/export"When you apply this manifest, the Cilium agent on the node hosting the billing-api pod does the following:
- It identifies that an L7 policy is required for traffic to port 8080.
- It configures the eBPF program at the TC hook to redirect incoming TCP traffic on port 8080 for this pod to the node-local Envoy proxy.
- It pushes the specific HTTP rules from the policy (GET /metrics, POST /...) to this Envoy instance.

We can observe this in real time. If we exec into the Cilium agent pod on that node and run cilium monitor, we can see the policy decisions.
# Attempting a forbidden request from a 'dev-tool' pod
# kubectl exec -n production dev-tool-pod -- curl -X GET http://billing-api:8080/api/v1/invoices/export
# Output from 'cilium monitor -n production --type drop'
xx drop (L7 proxy policy denied) flow 0x0... identity dev-tool-pod-identity -> billing-api-identity:8080 tcp

The key takeaway is that only traffic subject to L7 rules ever touches a proxy. All other traffic (e.g., database connections on other ports) is handled purely at the L4 eBPF layer, preserving maximum performance.
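Conversely, an allowed request from the finance-batch workload surfaces as an L7 access-log event emitted by the node-local proxy. A sketch, with an illustrative pod name and an abridged output format that varies by Cilium version:

# Allowed request from the 'finance-batch' pod
# kubectl exec -n production finance-batch-pod -- curl -s -X POST http://billing-api:8080/api/v1/invoices/export
# Output from 'cilium monitor --type l7'
<- Request http ... verdict Forwarded POST /api/v1/invoices/export => 200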
Code Example 2: Transparent Encryption with WireGuard
Enabling transparent, node-to-node encryption is remarkably simple compared to managing certificates for a sidecar-based mTLS system. You typically enable it via the Cilium Helm chart or ConfigMap.
# Example values for Cilium Helm chart
encryption:
  enabled: true
  type: wireguard

Once deployed, Cilium handles everything:
- It generates WireGuard keys for each node and distributes them via Kubernetes secrets.
- The cilium-agent on each node configures a cilium_wg0 WireGuard interface.
- eBPF programs at the TC egress hook are updated. When a packet leaves a pod destined for a pod on another node, the eBPF program steers it into the cilium_wg0 interface, where the kernel's WireGuard implementation encrypts it, encapsulates it in a UDP packet, and sends it to the destination node's WireGuard endpoint.
- The receiving node's kernel WireGuard module decrypts the packet, and its eBPF program forwards the original packet to the destination pod.
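If you manage Cilium with Helm, the same values can be applied to an existing installation. A sketch, assuming the standard cilium/cilium chart deployed in kube-system:

$ helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
    --set encryption.enabled=true --set encryption.type=wireguard
$ kubectl -n kube-system rollout restart ds/cilium   # agents restart and configure cilium_wg0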
We can verify this is working using cilium status:
$ kubectl exec -n kube-system cilium-xxxxx -- cilium status | grep Encryption
Encryption: WireGuard

This provides strong encryption for all inter-node pod traffic without any per-pod proxies, certificate rotation complexities, or application-level TLS handshakes, dramatically reducing both operational and performance overhead.
Performance Analysis: Theory vs. Reality
To quantify the difference, let's consider a benchmark setup using fortio in a 3-node Kubernetes cluster. We'll test pod-to-pod requests-per-second (RPS) and P99 latency.
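A representative fortio invocation for each scenario might look like the following; the client pod, target service, connection count, and duration are illustrative rather than the exact benchmark configuration:

$ kubectl exec -n benchmark fortio-client -- \
    fortio load -qps 0 -c 64 -t 60s http://fortio-server:8080/
# -qps 0 drives maximum throughput; fortio reports achieved RPS and latency percentiles (P50/P99/P99.9)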
Test Scenarios:
- Baseline (No Mesh): direct pod-to-pod traffic with no mesh features enabled.
- Istio (Sidecar): the same workload with an Envoy sidecar injected into each pod.
- Cilium (Sidecarless/eBPF): the same workload with Cilium's eBPF data path, using the node-local proxy only where L7 rules apply.
Hypothetical Benchmark Results:
| Scenario | Throughput (RPS) | P99 Latency (ms) | Avg CPU/Pod (Proxy) | Avg Memory/Pod (Proxy) | 
|---|---|---|---|---|
| Baseline (No Mesh) | 15,000 | 1.2 | 0m | 0Mi | 
| Istio (Sidecar) | 9,500 (-37%) | 8.5 (+608%) | 150m | 100Mi | 
| Cilium (Sidecarless/eBPF) | 14,200 (-5%) | 1.8 (+50%) | ~15m (amortized) | ~10Mi (amortized) | 
Note: The CPU/Memory for Cilium is amortized. A single node-local proxy serves all pods on that node, so the per-pod cost is the total proxy cost divided by the number of pods.
The results are stark. The sidecar model imposes a significant penalty on both throughput and latency. The eBPF model, while not entirely free of overhead (the L7 inspection still has a cost), performs remarkably close to the baseline. The resource savings are even more dramatic, especially in high-density clusters where hundreds of sidecars are replaced by a handful of node-local agents.
Advanced Edge Cases and Production Caveats
Adopting an eBPF-based mesh is not a silver bullet. It introduces a new set of complexities that senior engineers must consider.
- Kernel version requirements: To use features like sock_ops acceleration or modern TC hooks, you need a relatively recent Linux kernel (5.4+ is a good baseline, 5.10+ is even better). This can be a major blocker in enterprises with slow kernel upgrade cycles or those using managed Kubernetes services that offer older kernel versions on their nodes.
- Debugging complexity: With a sidecar, you can rely on kubectl logs on the Envoy container, Envoy's admin endpoint, and standard userspace debugging tools. When a packet is dropped or misrouted by an eBPF program in the kernel, debugging becomes much harder. You need to become proficient with tools like:
    *   bpftool: The Swiss Army knife for inspecting loaded eBPF programs and maps.
    *   cilium monitor: Provides a high-level view of packet flows and policy decisions.
    *   Kernel tracing tools (trace-cmd, perf): For deep, low-level analysis.
This represents a significant learning curve for teams accustomed to userspace debugging.
- Privileged agents: The cilium-agent DaemonSet requires powerful capabilities (CAP_SYS_ADMIN, CAP_NET_ADMIN) to load eBPF programs into the kernel. This is a significant security consideration. While the eBPF verifier provides strong safety guarantees against kernel panics, a compromised Cilium agent could potentially gain deep control over the node's networking. This privileged access must be tightly controlled and audited.

Conclusion: A Decision Framework for Senior Engineers
eBPF-based sidecarless service meshes represent a fundamental architectural evolution, shifting the data plane from a distributed set of userspace proxies into the kernel. This move delivers undeniable and substantial gains in performance and resource efficiency.
However, it is not a universal replacement for the sidecar model. The decision to adopt it should be based on a clear-eyed assessment of these trade-offs:
Choose a sidecarless eBPF mesh (like Cilium) when:
- Your services are highly sensitive to P99 latency.
- You are running large, high-density clusters where the cumulative resource cost of sidecars is a significant expense.
- Your primary use cases are L3/L4 policy, transparent encryption, and basic L7 observability, which can be handled almost entirely in-kernel.
- Your operations team has the expertise (or is willing to build it) to manage and debug kernel-level infrastructure and can ensure nodes run modern Linux kernels.
Stick with a traditional sidecar mesh (like Istio) when:
- Your organization's infrastructure has strict, slow-moving kernel version constraints.
- Your team's debugging expertise is firmly rooted in userspace and container logs.
- You rely heavily on a vast ecosystem of complex, Envoy-specific L7 features (e.g., custom Wasm plugins) that benefit from the per-pod isolation and configuration model.
- Simplicity of operation and a mature ecosystem are more important than raw performance and resource efficiency.
The future of service mesh is likely hybrid. But for engineers pushing the boundaries of performance and scale in cloud-native environments, the sidecarless eBPF architecture is no longer an emerging trend—it's a production-ready reality that demands serious consideration.