Advanced K8s Security: Sidecar-less Policies with Cilium & eBPF

Goh Ling Yong

The Performance Tax of the Sidecar Pattern at Scale

In modern cloud-native architectures, the service mesh, typically implemented via a sidecar proxy like Envoy (popularized by Istio and Linkerd), has become a standard for achieving observability, security, and reliability. For network security, the model is straightforward: a proxy is injected into every application pod, intercepting all ingress and egress traffic. This allows the proxy to enforce L3-L7 policies, manage mTLS, and collect detailed telemetry. While powerful, this pattern introduces a non-trivial performance and operational tax that becomes increasingly problematic at scale.

Senior engineers responsible for platform stability and cost optimization are often the first to encounter these limitations. The issues are not with the concept, but with the user-space implementation's inherent trade-offs:

  • Resource Overhead: Every sidecar is a running process. It consumes a baseline of CPU and memory, independent of the application it accompanies. In a cluster with thousands of pods, this translates to a significant portion of node resources dedicated solely to proxying, resources that could otherwise be used by applications. A typical Envoy proxy might consume anywhere from 50-100m CPU and 50-100Mi of memory under light load, a cost that is multiplied across every replica of every service.
  • Increased Network Latency: Traffic no longer flows directly from one application's network namespace to another. Instead, it follows a more convoluted path: App A -> App A's Envoy (localhost) -> Node Network Stack -> App B's Envoy (localhost) -> App B. Each trip through a user-space proxy adds latency. While a few milliseconds may seem negligible for a single request, this overhead accumulates across complex call chains and can degrade the P99 latency of user-facing services.
  • Operational Complexity: The sidecar injection mechanism, while often automated, is another point of failure. It can interfere with pod startup, complicate debugging (is it the app or the proxy?), and increase the complexity of the CI/CD pipeline. Managing certificates for mTLS across thousands of proxies requires a robust and highly available control plane, adding another critical component to the infrastructure stack.
    To quantify this, consider a hypothetical scenario comparing latency and resource costs:

    Metric | Without Sidecar (Baseline) | With Sidecar (per pod) | Impact on 1000-Pod Cluster
    P99 Request Latency | 5ms | +2-5ms per hop | Significant degradation in deep stacks
    CPU Reservation | N/A | ~100m CPU | 100 vCPUs dedicated to proxies
    Memory Reservation | N/A | ~100Mi Memory | ~97 GiB RAM dedicated to proxies

    These are not just numbers; they represent real infrastructure costs and potential SLA breaches. This is the core problem that kernel-native solutions using eBPF are designed to solve.


    eBPF and Cilium: A Kernel-Native Paradigm Shift

    eBPF (extended Berkeley Packet Filter) fundamentally changes the equation by allowing us to run sandboxed programs directly within the Linux kernel. Instead of forcing network packets to detour into a user-space proxy for inspection, we can inspect and manipulate them as they traverse the kernel's own networking stack. This is not a new networking API; it's a generic, event-driven compute engine inside the kernel.

    Cilium is a CNI (Container Network Interface) that leverages eBPF to provide networking, observability, and security. For network policy, its mechanism is profoundly different from a sidecar's:

  • eBPF Program Attachment: Cilium attaches eBPF programs to specific hooks in the kernel, primarily the Traffic Control (TC) ingress/egress hooks on the network devices (e.g., veth pairs connected to pods).
  • Packet-in-Kernel Processing: When a packet leaves a pod, it hits the egress TC hook. The attached eBPF program executes. This program has access to the packet data and metadata.
  • Policy Decision in Kernel: Cilium's control plane translates CiliumNetworkPolicy objects into eBPF maps. These maps are highly efficient key-value stores in the kernel. The eBPF program on the TC hook can perform a lookup in these maps to determine if the packet's source, destination, port, and protocol are allowed. The decision to DROP or ACCEPT the packet happens right there, in kernel space, without the packet ever being copied to a user-space process.
    Let's visualize the difference:

    Sidecar Model Data Path:

    mermaid
    graph TD
        A[Pod A: Application] --> B(Pod A: Envoy Proxy);
        B --> C{Node Kernel Networking Stack};
        C --> D(Pod B: Envoy Proxy);
        D --> E[Pod B: Application];

    Cilium eBPF Model Data Path:

    mermaid
    graph TD
        A[Pod A: Application] --> B{Node Kernel Networking Stack};
        subgraph Kernel Space
            B -- eBPF Program Executes --> B;
        end
        B --> C[Pod B: Application];

    The eBPF path is shorter, faster, and more efficient. The context switch and memory copy overhead of moving packets between kernel and user space is eliminated entirely for policy enforcement.
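
    To make this concrete, you can look at the building blocks of these in-kernel decisions from any Cilium agent. The commands below are a sketch assuming a standard install in kube-system with the agent DaemonSet named cilium; the endpoint ID in the last command is a placeholder you would take from the first command's output.

    bash
        # Endpoints (pods) managed by this agent and their policy state
        kubectl -n kube-system exec -it ds/cilium -- cilium endpoint list

        # Numeric security identities derived from pod labels
        kubectl -n kube-system exec -it ds/cilium -- cilium identity list

        # Dump the in-kernel policy map for one endpoint (replace 1234 with a
        # real endpoint ID from the first command)
        kubectl -n kube-system exec -it ds/cilium -- cilium bpf policy get 1234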


    Core Implementation: A Production-Grade Cilium Deployment

    To leverage eBPF's full potential, Cilium must be configured correctly. A default installation may not enable all the performance-critical features. Here is a production-grade helm values configuration for deploying Cilium, focusing on replacing kube-proxy and optimizing for an eBPF-native data path.

    yaml
    # values-production.yaml
    
    # Enable eBPF-based kube-proxy replacement for ultimate performance.
    # This removes iptables/ipvs from the node's service routing path.
    kubeProxyReplacement: strict
    
    # eBPF datapath tuning. Note: eBPF host routing for pod-to-pod traffic on
    # the same node (which avoids the overhead of the host IP stack) is enabled
    # automatically on supported kernels.
    bpf:
      # Pre-allocate eBPF maps to avoid runtime allocation failures under load.
      preallocateMaps: true
    
    # Enable Hubble for deep network observability without performance impact.
    hubble:
      enabled: true
      # Relay is the backend service that collects data from the agents.
      relay:
        enabled: true
      # The UI is optional but highly recommended for visualization.
      ui:
        enabled: true
    
    # Operator settings for managing Cilium-specific resources.
    operator:
      replicas: 2 # Run in HA mode
    
    # IP Address Management (IPAM)
    # Using 'cluster-pool' with a specific CIDR is common for on-prem or
    # environments where you need to control pod IP ranges.
    ipam:
      mode: "cluster-pool"
      operator:
        clusterPoolIPv4PodCIDR: "10.0.0.0/16"
    
    # Install a per-endpoint route for each pod instead of routing through the
    # cilium_host interface (Helm key: endpointRoutes.enabled).
    endpointRoutes:
      enabled: true
    
    # L7 Policy Settings
    policyEnforcementMode: "default"
    # Even for L7 rules, Cilium uses a node-local Envoy proxy rather than
    # per-pod sidecars. This enables it (default: true).
    l7Proxy: true

    Deployment Steps:

  • Add the Cilium Helm repository:

    bash
        helm repo add cilium https://helm.cilium.io/

  • Install Cilium using the production values file:

    bash
        helm install cilium cilium/cilium --version 1.15.1 \
          --namespace kube-system \
          -f values-production.yaml

  • Verify the installation and eBPF status. After the pods are running, confirm that Cilium is operating in the desired mode. This is a critical step.

    bash
        # Check the overall status of the Cilium deployment
        cilium status --wait
    
        # Expected Output Snippet:
        # ...
        # KubeProxyReplacement:   Strict
        # ...
        # Status:                 OK   Health: OK

    The KubeProxyReplacement: Strict line confirms that iptables rules are no longer being used for service routing, a major performance win.

    You can also inspect the eBPF maps Cilium has loaded on a node:

    bash
        # Exec into a cilium-agent pod and list the BPF maps the agent manages
        kubectl -n kube-system exec -it ds/cilium -- cilium map list
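
    With kube-proxy replacement active, Kubernetes Services are resolved from eBPF maps rather than iptables rules. A quick way to confirm this, again assuming the default DaemonSet name, is to dump the eBPF service table from an agent:

    bash
        # The service/backend table that replaces iptables/IPVS rules
        kubectl -n kube-system exec -it ds/cilium -- cilium service list

        # The raw view of the underlying eBPF load-balancer maps
        kubectl -n kube-system exec -it ds/cilium -- cilium bpf lb list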

    Advanced L3/L4 Policy Enforcement without Proxies

    Now that Cilium is running, let's implement a realistic, complex policy for a multi-tenant application. Imagine the following namespaces and services:

    * ns: web-tier: Contains frontend pods (label app=frontend).

    * ns: api-tier: Contains backend-api pods (label app=backend-api).

    * ns: data-tier: Contains multiple database pods, each with a tenant-id label (e.g., app=postgres, tenant-id=t1).

    Goal: Enforce strict, least-privilege network access.

  • frontend can only make egress calls to backend-api on TCP port 8080.
  • backend-api can only make egress calls to pods in the data-tier with the label app=postgres on TCP port 5432.
  • All other traffic (inter-namespace, intra-namespace, and egress to the internet) should be denied by default.

    First, we apply a default-deny policy to the relevant namespaces. This is a critical security posture.

    yaml
    # default-deny-all.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "default-deny-all"
      namespace: web-tier
    spec:
      endpointSelector: {}
      ingress: []
      egress: []
    ---
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "default-deny-all"
      namespace: api-tier
    spec:
      endpointSelector: {}
      ingress: []
      egress: []
    ---
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "default-deny-all"
      namespace: data-tier
    spec:
      endpointSelector: {}
      ingress: []
      egress: []

    Now, we layer in the specific allow rules.

    yaml
    # allow-traffic-policy.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "allow-frontend-to-backend"
      namespace: web-tier
    spec:
      endpointSelector:
        matchLabels:
          app: frontend
      egress:
      - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": api-tier
            app: backend-api
        toPorts:
        - ports:
          - port: "8080"
            protocol: TCP
    --- 
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "allow-backend-to-database"
      namespace: api-tier
    spec:
      endpointSelector:
        matchLabels:
          app: backend-api
      egress:
      - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": data-tier
            app: postgres
        toPorts:
        - ports:
          - port: "5432"
            protocol: TCP
    # We also need to allow ingress into the backend and database
    --- 
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "allow-ingress-to-backend"
      namespace: api-tier
    spec:
      endpointSelector:
        matchLabels:
          app: backend-api
      ingress:
      - fromEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": web-tier
            app: frontend
    --- 
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "allow-ingress-to-database"
      namespace: data-tier
    spec:
      endpointSelector:
        matchLabels:
          app: postgres
      ingress:
      - fromEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": api-tier
            app: backend-api
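
    Applying and sanity-checking the allow rules follows the same workflow. The file name below matches the snippet above; describe is simply a convenient way to confirm the selectors and ports Cilium parsed.

    bash
        kubectl apply -f allow-traffic-policy.yaml

        # List and inspect the policies per namespace
        kubectl get cnp -n web-tier
        kubectl get cnp -n api-tier
        kubectl describe cnp allow-backend-to-database -n api-tier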

    Verification with Hubble:

    This is where the observability power of eBPF shines. We can use Hubble's CLI to watch the traffic and see the policy decisions happening in real-time.

    bash
    # Port-forward the Hubble Relay service
    kubectl port-forward -n kube-system svc/hubble-relay 4245:80 &
    
    # Watch for allowed traffic from frontend to backend
    hubble observe --from-pod web-tier/frontend-pod-name --to-pod api-tier/backend-api-pod-name --verdict FORWARDED -f
    
    # Attempt a blocked connection (e.g., curl from frontend to an external site)
    kubectl exec -n web-tier frontend-pod-name -- curl -m 2 https://google.com
    
    # Watch for the dropped packet in Hubble
    hubble observe --from-pod web-tier/frontend-pod-name --verdict DROPPED -f
    # Expected Output:
    # TIMESTAMP            SOURCE -> DESTINATION              VERDICT   REASON
    # Oct 26 12:30:00.123  web-tier/frontend -> world         DROPPED   Policy denied on egress

    The key is that this drop happened in the kernel via an eBPF program. No sidecar was involved.
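
    If you want to see the same verdicts straight from the datapath, without going through Hubble Relay, the agent's low-level monitor streams the drop notifications emitted by the eBPF programs themselves. This is a debugging sketch rather than something to leave running; the pod name is a placeholder.

    bash
        # Stream only drop events from the eBPF datapath on this node
        kubectl -n kube-system exec -it ds/cilium -- cilium monitor --type drop

        # In another terminal, trigger a violation again
        kubectl exec -n web-tier frontend-pod-name -- curl -m 2 https://google.com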


    The Holy Grail: Sidecar-less L7 Policy Enforcement

    While L3/L4 policies are powerful, many critical security rules operate at the application layer (L7). A common assumption is that L7 enforcement requires a proxy injected into every pod. With Cilium, it does not.

    Cilium enforces L7 rules with a lightweight, Cilium-managed Envoy proxy that runs once per node rather than once per pod. The eBPF data path continues to handle identity, L3/L4 checks, and forwarding in the kernel, and it transparently redirects only the flows that are actually covered by an L7 rule (HTTP, gRPC, Kafka, and so on) to this node-local proxy. This is a hybrid approach that offers the best of both worlds: kernel-native performance for the bulk of traffic, with escalation to a shared proxy only where application-layer inspection is required.

    Scenario: The backend-api service has public and admin endpoints.

    Any authenticated service can access GET /api/v1/public/.

    Only pods with the label role=admin-tool can access POST /api/v1/admin/.

    Here is the CiliumNetworkPolicy to enforce this. It refines our previous ingress rule for the backend-api; because Cilium allow rules are additive, in practice you would replace the broader L4-only allow-ingress-to-backend policy with this one.

    yaml
    # l7-policy-backend-api.yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "l7-policy-for-backend-api"
      namespace: api-tier
    spec:
      endpointSelector:
        matchLabels:
          app: backend-api
      ingress:
      # Rule 1: any pod in the web-tier namespace may GET the public paths.
      - fromEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": web-tier
        toPorts:
        - ports:
          - port: "8080"
            protocol: TCP
          rules:
            http:
            - method: "GET"
              path: "/api/v1/public/.*"
      # Rule 2: pods labeled role=admin-tool may additionally POST to the admin paths.
      - fromEndpoints:
        - matchLabels:
            role: admin-tool
        toPorts:
        - ports:
          - port: "8080"
            protocol: TCP
          rules:
            http:
            - method: "GET"
              path: "/api/v1/public/.*"
            - method: "POST"
              path: "/api/v1/admin/.*"

    How it works under the hood:

  • When the policy is applied, Cilium programs the eBPF data path so that traffic arriving on port 8080 for backend-api endpoints is flagged as subject to L7 rules.
  • The eBPF program at the TC hook still performs the L3/L4 work entirely in the kernel: it resolves the source pod's security identity from an eBPF map and checks that the connection is permitted at all.
  • Flows covered by an HTTP rule are then transparently redirected to Cilium's node-local Envoy proxy, which matches the method and path against the policy for that source identity. A GET /api/v1/public/foo from the web-tier is forwarded to the application; a POST /api/v1/admin/bar from a pod without the role=admin-tool label is rejected by the proxy with an HTTP 403.
  • Traffic on ports and endpoints with no L7 rules never leaves the kernel data path at all.

    The crucial difference from the sidecar model is that this proxy is shared per node and managed by Cilium, not injected into every pod, and only the flows that actually require L7 inspection pass through it. The result is application-layer security with resource overhead and latency far below the per-pod sidecar approach, and performance close to the pure L4 kernel path for everything else.
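
    A quick functional test of the L7 rules is sketched below. It assumes DNS egress has been allowed for the calling pods, uses placeholder pod and service names, and assumes the L7 policy above is the rule governing port 8080 for backend-api. A request denied at L7 is answered with an HTTP 403 by the node-local proxy rather than silently dropped, which is itself a useful signal that the proxy path, not a plain L4 drop, handled it.

    bash
        # Allowed: GET on a public path from a frontend pod (expect 200)
        kubectl exec -n web-tier frontend-pod-name -- \
          curl -s -o /dev/null -w "%{http_code}\n" \
          http://backend-api.api-tier.svc.cluster.local:8080/api/v1/public/status

        # Denied: POST to an admin path from a pod without role=admin-tool (expect 403)
        kubectl exec -n web-tier frontend-pod-name -- \
          curl -s -o /dev/null -w "%{http_code}\n" -X POST \
          http://backend-api.api-tier.svc.cluster.local:8080/api/v1/admin/reindex

        # L7 flow records (method, path, status) are visible in Hubble
        hubble observe --namespace api-tier --type l7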


    Edge Cases and Performance Considerations

    A production deployment requires understanding the system's limits and potential failure modes.

    Kernel Version Dependencies

    eBPF is a rapidly evolving kernel feature. Advanced Cilium functionality depends on a sufficiently new kernel. A production cluster should ideally run a kernel version of 5.10 or newer. Older kernels (e.g., 4.19) have support but may lack optimizations or specific features, forcing Cilium to fall back to less performant data paths. Always check the Cilium documentation for the minimum kernel version required for features like kubeProxyReplacement or L7 parsing.
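
    Checking what your nodes are actually running is a one-liner; nodeInfo is part of the standard node status, so this works on any conformant cluster.

    bash
        kubectl get nodes -o custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion,OS:.status.nodeInfo.osImage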

    eBPF Map Limits

    Cilium uses eBPF maps extensively to store state, such as connection tracking entries, policy identities, and service mappings. These maps consume kernel memory and have configurable limits. In a large cluster with high connection churn, you might exhaust these limits.

    Monitoring:

    bash
    # Exec into a cilium-agent pod
    kubectl -n kube-system exec -it ds/cilium -- cilium bpf ct list global
    kubectl -n kube-system exec -it ds/cilium -- cilium bpf nat list

    What happens when a map is full? New connection-tracking or NAT entries cannot be created, and the affected packets are dropped. In Hubble and cilium monitor these appear as drops whose reason points to a CT/NAT map insertion failure.

    Mitigation: You can increase the map sizes in the Cilium ConfigMap or via Helm values. For example:

    yaml
    bpf:
      ctTcpMax: 1000000   # Default is 524288
      ctAnyMax: 500000    # Default is 262144
      natMax: 1000000     # Default is 524288

    Monitor your map usage and adjust these values proactively based on your cluster's load.
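
    Rather than waiting for drops, you can watch map utilization directly. The agent exposes its Prometheus metrics via cilium metrics list; the map-pressure metric name below is what recent Cilium releases use, but treat it as an assumption and grep loosely if your version differs.

    bash
        # Per-map fill ratio (0.0 - 1.0); alert well before it approaches 1.0
        kubectl -n kube-system exec -it ds/cilium -- \
          sh -c 'cilium metrics list | grep -i map_pressure'

        # Raw entry counts for the connection-tracking and NAT tables
        kubectl -n kube-system exec -it ds/cilium -- sh -c 'cilium bpf ct list global | wc -l'
        kubectl -n kube-system exec -it ds/cilium -- sh -c 'cilium bpf nat list | wc -l'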

    Performance Benchmarking: eBPF vs. Sidecar

    To provide concrete evidence, we can run a simple benchmark. We'll use the wrk load testing tool to measure latency between a client and server pod.

    Setup:

    * pod-client and pod-server in the same cluster.

    * pod-server runs a simple Nginx server.

    Test Scenarios:

  • Baseline: No network policy applied.
  • Cilium eBPF: A simple L4 allow policy is applied between the client and server.
  • Istio Sidecar: The namespace is injected with Istio sidecars, and an AuthorizationPolicy is applied to allow the traffic.
    Methodology:

    From pod-client, run: wrk -t4 -c100 -d30s http://pod-server
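
    In practice you would run wrk from inside the client pod so the measurement covers only the in-cluster path, and use its --latency flag to obtain the P99 figure. A runnable sketch, with placeholder namespace, pod, and service names, and assuming the client image ships wrk:

    bash
        # Run the load test from the client pod against the server's Service
        kubectl exec -n bench pod-client -- \
          wrk -t4 -c100 -d30s --latency http://pod-server.bench.svc.cluster.local/

        # Repeat once per scenario (baseline, Cilium policy applied, Istio-injected
        # namespace) and compare the "Latency Distribution" section of the output.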

    Expected Results (Illustrative):

    Scenario | Avg. Latency | P99 Latency | Server Pod CPU (sidecar)
    1. Baseline | 0.51ms | 1.2ms | N/A
    2. Cilium eBPF | 0.55ms | 1.4ms | N/A
    3. Istio Sidecar | 2.85ms | 6.5ms | ~120m CPU

    Analysis: The results clearly show the performance advantage of the eBPF approach. The latency added by the Cilium policy is negligible because the decision happens in-kernel. The Istio sidecar, requiring two trips through a user-space proxy, adds several milliseconds of latency and consumes significant CPU resources just to forward traffic.

    Interoperability with External Services

    Policies often need to allow traffic to services outside the cluster. Cilium handles this with toCIDRs and toEntities rules.

    Example: Allow backend-api to talk to the AWS EC2 instance metadata endpoint (169.254.169.254) and a specific external monitoring service (e.g., 198.51.100.10/32).

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "allow-egress-to-external"
      namespace: api-tier
    spec:
      endpointSelector:
        matchLabels:
          app: backend-api
      egress:
      - toCIDR:
        - 169.254.169.254/32
        - 198.51.100.10/32

    This rule is also compiled into eBPF maps, making lookups for external IPs just as efficient as for internal pods.
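
    To confirm that only the whitelisted external destinations are reachable, Hubble's IP filters are convenient. The flags below are standard hubble observe filters; the namespace and IP simply mirror the policy above.

    bash
        # Allowed: flows to the whitelisted monitoring endpoint
        hubble observe --namespace api-tier --to-ip 198.51.100.10 --verdict FORWARDED

        # Everything else leaving the namespace should show up as policy drops
        hubble observe --namespace api-tier --verdict DROPPED --last 20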


    Conclusion: The Future is Kernel-Native

    For Kubernetes network security, the sidecar model is no longer the only option. By leveraging eBPF, Cilium provides a more performant, resource-efficient, and operationally simpler alternative for enforcing L3-L7 network policies. The ability to make security decisions directly in the Linux kernel eliminates the latency and resource tax imposed by user-space proxies, leading to a faster and more cost-effective platform.

    This approach is not a full replacement for all service mesh functionality—features like advanced traffic splitting, fault injection, and transparent retries often still benefit from a sophisticated user-space proxy. However, with the emergence of Cilium Service Mesh, even these features are being integrated into a more eBPF-native architecture.

    For senior engineers building and maintaining large-scale Kubernetes infrastructure, moving security enforcement from the sidecar to the kernel is a strategic decision. It reduces complexity, lowers operational costs, and strengthens the security posture without compromising the performance of the applications it's designed to protect. The future of cloud-native networking and security is increasingly being written in eBPF, and it's happening directly inside the kernel.
