Cilium & eBPF: Enforcing L7 Zero-Trust in Production Kubernetes


Beyond IP Addresses: The Imperative for Application-Aware Security

In any non-trivial Kubernetes environment, the default networking.k8s.io/v1.NetworkPolicy resource quickly reveals its limitations. While effective for basic L3/L4 segmentation, it operates on an outdated security paradigm: IP addresses and ports. In a dynamic microservices architecture where pods are ephemeral and IPs are constantly changing, IP-based rules are not only difficult to manage but fundamentally insecure. A compromised pod that is allowed to communicate with a database on port 5432 has full access, regardless of the legitimacy of its requests.

This is where the principle of zero-trust, specifically applied at the application layer (L7), becomes critical. We must enforce security based on a verifiable workload identity and the specific intent of the communication. A frontend service should not just be allowed to talk to the api-service on port 8080; it should be restricted to making GET requests to /api/v1/products and nothing else.

Traditional solutions to this problem often involve a service mesh with sidecar proxies (e.g., Istio with Envoy). While powerful, this approach introduces significant operational overhead, resource consumption, and added latency for every network call. This article presents a more efficient, kernel-native approach using Cilium and its revolutionary eBPF-powered datapath.

We will assume you are familiar with Kubernetes fundamentals, basic network policies, and the concept of eBPF. Our focus will be on the advanced implementation details, edge cases, and production patterns required to build a robust, L7-aware zero-trust network fabric.

The eBPF Advantage: Kernel-Level Enforcement Without Sidecars

Before diving into policy specifics, it's crucial to understand why Cilium's eBPF implementation is a paradigm shift from traditional iptables-based CNI plugins.

  • Identity over IP: Cilium assigns a numeric security identity to every pod based on its Kubernetes labels. This identity is a simple integer (e.g., 53181). When a policy is created allowing app=frontend to talk to app=api, Cilium translates this into a rule allowing identity X to communicate with identity Y. This mapping is stored in highly efficient eBPF maps directly within the Linux kernel.
  • Direct Kernel Path: When a pod initiates a network connection, an eBPF program attached to the network socket or traffic control layer intercepts it. It extracts the security identity of the source and destination, performs a lookup in the eBPF policy map, and makes an enforcement decision—all within the kernel context. This avoids the convoluted chains and performance penalties of iptables.
  • Protocol-Aware Enforcement: For L7 policies, Cilium's eBPF datapath transparently redirects only the flows that need inspection to a node-local proxy embedded in the agent, which parses protocols like HTTP, Kafka, and gRPC. Because this is a single shared per-node proxy rather than a sidecar injected into every pod, you get policy decisions based on data like HTTP paths or Kafka topics without the per-pod resource and latency cost of a sidecar mesh.
This architecture provides the security benefits of a service mesh's L7 awareness while keeping all L3/L4 enforcement entirely in the kernel, so traffic that no L7 rule selects never leaves the optimized kernel datapath.
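
You can inspect this identity machinery directly. A quick sketch using the agent's embedded CLI (this assumes Cilium runs as the cilium DaemonSet in kube-system, and the endpoint ID 1234 is a placeholder to substitute):

bash
# List the label-to-identity mappings known to the agent
kubectl -n kube-system exec ds/cilium -- cilium identity list

# List local endpoints with their numeric identities and policy enforcement status
kubectl -n kube-system exec ds/cilium -- cilium endpoint list

# Dump the in-kernel eBPF policy map for one endpoint (use a real ID from the list above)
kubectl -n kube-system exec ds/cilium -- cilium bpf policy get 1234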


    Scenario: Securing a Multi-Tier E-Commerce Application

    Let's model a realistic application with the following components:

    * frontend: The public-facing web server.

    * api-gateway: Handles business logic, talks to backend services.

    * products-db: A PostgreSQL database for product catalogs.

    * prometheus: A monitoring service that scrapes metrics.

    Our goal is to implement a strict zero-trust policy:

    • Deny all traffic by default.
    • Allow frontend to make GET requests to /products on the api-gateway.
    • Allow api-gateway to connect to products-db on port 5432.
    • Allow prometheus to make GET requests to /metrics on api-gateway.
    • Block all other communication, such as frontend trying to access /metrics or api-gateway trying to execute a DELETE request.

    Step 1: Baseline Deployments and Default Deny

    First, let's define our application components. For brevity, we'll use simple NGINX and PostgreSQL placeholders.

    Application Manifest (app-deployments.yaml):

    yaml
    apiVersion: v1
    kind: Namespace
    metadata:
      name: e-commerce
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: frontend
      namespace: e-commerce
      labels:
        app: frontend
        tier: presentation
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: frontend
      template:
        metadata:
          labels:
            app: frontend
            tier: presentation
        spec:
          containers:
          - name: nginx
            image: nginx
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: frontend-svc
      namespace: e-commerce
    spec:
      selector:
        app: frontend
      ports:
      - port: 80
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api-gateway
      namespace: e-commerce
      labels:
        app: api-gateway
        tier: business
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: api-gateway
      template:
        metadata:
          labels:
            app: api-gateway
            tier: business
        spec:
          containers:
          - name: nginx
            image: nginx
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: api-gateway-svc
      namespace: e-commerce
    spec:
      selector:
        app: api-gateway
      ports:
      - name: http
        port: 80
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: products-db
      namespace: e-commerce
      labels:
        app: products-db
        tier: data
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: products-db
      template:
        metadata:
          labels:
            app: products-db
            tier: data
        spec:
          containers:
          - name: postgres
            image: postgres:13
            env:
            - name: POSTGRES_PASSWORD
              value: "supersecret"  # demo only; use a Secret in production
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: products-db-svc
      namespace: e-commerce
    spec:
      selector:
        app: products-db
      ports:
      - port: 5432
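
    Apply the manifests and confirm the pods come up:

    bash
    kubectl apply -f app-deployments.yaml
    kubectl get pods -n e-commerce -w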

    Now, we apply the cornerstone of zero-trust: a default deny policy for the entire namespace.

    Default Deny Policy (default-deny.yaml):

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "default-deny-all"
      namespace: e-commerce
    spec:
      endpointSelector: {}
      ingress: []
      egress: []

    This policy selects all endpoints in the e-commerce namespace (endpointSelector: {}) and attaches explicitly empty ingress and egress rule sets. In Cilium, an empty rule set means deny all in that direction. Once applied, no pod in the namespace can initiate or receive any connection, not even DNS lookups, until allow policies are layered on top.
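
    A quick smoke test confirms the lockdown:

    bash
    FRONTEND_POD=$(kubectl get pods -n e-commerce -l app=frontend -o jsonpath='{.items[0].metadata.name}')

    # Expected to fail: even DNS resolution is blocked by the default deny
    kubectl exec -it $FRONTEND_POD -n e-commerce -- curl -s --connect-timeout 2 http://api-gateway-svc/products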

    Step 2: Implementing Identity-Based L4 and L7 Policies

    Now we will layer our specific allow rules on top of the default deny. Cilium policies are additive; traffic is permitted if any policy allows it.

    Application-Specific Policies (app-policies.yaml):

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "api-gateway-policy"
      namespace: e-commerce
    spec:
      endpointSelector:
        matchLabels:
          app: api-gateway
      # Ingress rules for the api-gateway
      ingress:
      - fromEndpoints:
        - matchLabels:
            app: frontend
        # L7 HTTP-specific rules
        toPorts:
        - ports:
          - port: "80"
            protocol: TCP
          rules:
            http:
            - method: "GET"
              path: "/products"
      - fromEndpoints:
        - matchLabels:
            # Assuming prometheus runs in the 'monitoring' namespace
            # with the label 'app: prometheus'
            'k8s:io.kubernetes.pod.namespace': monitoring
            app: prometheus
        toPorts:
        - ports:
          - port: "80"
            protocol: TCP
          rules:
            http:
            - method: "GET"
              path: "/metrics"
    
      # Egress rules for the api-gateway
      egress:
      - toEndpoints:
        - matchLabels:
            app: products-db
        # L4 TCP-specific rule
        toPorts:
        - ports:
          - port: "5432"
            protocol: TCP
    ---
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "frontend-policy"
      namespace: e-commerce
    spec:
      endpointSelector:
        matchLabels:
          app: frontend
      # Frontend only needs to talk out to the api-gateway
      egress:
      - toEndpoints:
        - matchLabels:
            app: api-gateway
        toPorts:
        - ports:
          - port: "80"
            protocol: TCP
      # Allow egress to DNS for FQDN policies later
      - toPorts:
        - ports:
          - port: '53'
            protocol: UDP
          rules:
            dns:
            - matchPattern: "*"

    Let's break down the api-gateway-policy:

    * endpointSelector: This policy applies only to pods with the label app: api-gateway.

    * First Ingress Rule:

      * fromEndpoints: Allows traffic from pods with the label app: frontend.

      * toPorts: Specifies the destination port 80.

      * rules.http: This is the L7 magic. It instructs Cilium to parse this traffic as HTTP and only allow GET requests to the exact path /products. Any other request, like a POST or a GET to /admin, will be rejected before it ever reaches the application, even if it comes from a legitimate frontend pod.

    * Second Ingress Rule:

      * fromEndpoints: This demonstrates a cross-namespace policy. It allows traffic from pods in the monitoring namespace that have the app: prometheus label.

      * rules.http: This rule specifically allows GET requests to /metrics for scraping.

    * Egress Rule:

      * toEndpoints: Allows traffic to pods with the label app: products-db.

      * toPorts: This is a pure L4 rule. It allows TCP traffic on port 5432. Since PostgreSQL speaks a binary protocol, we rely on L4 identity enforcement here rather than L7 parsing. Note that under the default deny, egress from api-gateway is only half the connection; products-db needs a matching ingress rule, shown below.
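
    A minimal companion ingress policy for the database side closes that gap:

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "products-db-policy"
      namespace: e-commerce
    spec:
      endpointSelector:
        matchLabels:
          app: products-db
      ingress:
      - fromEndpoints:
        - matchLabels:
            app: api-gateway
        toPorts:
        - ports:
          - port: "5432"
            protocol: TCP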

    Step 3: Verification and Observability with Hubble

    Policies are meaningless without verification. We will use Hubble, Cilium's observability tool, to inspect traffic flows and policy decisions.

    First, enable the Hubble UI:

    bash
    cilium hubble enable --ui
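
    To confirm everything is healthy before testing:

    bash
    cilium status --wait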

    Now, let's test our rules. Get the frontend pod name:

    bash
    FRONTEND_POD=$(kubectl get pods -n e-commerce -l app=frontend -o jsonpath='{.items[0].metadata.name}')

    Test 1: Allowed Request

    bash
    # Allowed by policy. The placeholder NGINX has no /products route, so the
    # app itself returns a 404 page; the point is that the request gets through.
    kubectl exec -it $FRONTEND_POD -n e-commerce -- curl -s --connect-timeout 2 http://api-gateway-svc/products

    Test 2: Blocked L7 Request (Wrong Path)

    bash
    # This should fail: blocked by the L7 rule, curl receives a 403 Access Denied
    kubectl exec -it $FRONTEND_POD -n e-commerce -- curl -s --connect-timeout 2 http://api-gateway-svc/metrics

    Test 3: Blocked L7 Request (Wrong Method)

    bash
    # This should also fail with a 403: the method is wrong even though the path is allowed
    kubectl exec -it $FRONTEND_POD -n e-commerce -- curl -s -X POST --connect-timeout 2 http://api-gateway-svc/products

    Now, let's observe this in Hubble. Open a new terminal and run:

    bash
    # Expose the Hubble relay locally, then follow traffic from the frontend pod
    cilium hubble port-forward &
    hubble observe -n e-commerce --from-pod e-commerce/$FRONTEND_POD -f

    When you run the allowed request, you'll see a FORWARDED verdict:

    text
    TIMESTAMP           SOURCE                  DESTINATION              TYPE          VERDICT     SUMMARY
    ...                 e-commerce/frontend... -> e-commerce/api-gateway...  L7-request    FORWARDED   GET http://api-gateway-svc/products HTTP/1.1
    ...                 e-commerce/api-gateway... -> e-commerce/frontend...  L7-response   FORWARDED   404 Not Found

    When you run the blocked request (e.g., to /metrics), you'll see a DROPPED verdict with a clear reason:

    text
    TIMESTAMP           SOURCE                  DESTINATION              TYPE          VERDICT     SUMMARY
    ...                 e-commerce/frontend... -> e-commerce/api-gateway...  L7-request    DROPPED     Policy denied (L7)

    Hubble provides incontrovertible, real-time proof of policy enforcement. The Policy denied (L7) verdict is explicit: the L3/L4 connection was permitted, but the HTTP request violated the L7 rules and was rejected before reaching the application.
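
    Hubble's filters also make namespace-wide audits easy. For example, with the hubble CLI:

    bash
    # Show only dropped flows in the namespace, including L7 denials
    hubble observe -n e-commerce --verdict DROPPED

    # Narrow to HTTP flows to see which paths and methods are being rejected
    hubble observe -n e-commerce --verdict DROPPED --protocol http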


    Advanced Pattern: FQDN-Based Egress Control

    Microservices often need to communicate with external, third-party APIs (e.g., Stripe, Twilio). Hardcoding IP ranges for these services in egress policies is a fragile anti-pattern. Cilium provides a robust solution with FQDN-aware policies.

    Let's say our api-gateway needs to call api.github.com.

    FQDN Egress Policy (fqdn-policy.yaml):

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "api-gateway-egress-github"
      namespace: e-commerce
    spec:
      endpointSelector:
        matchLabels:
          app: api-gateway
      egress:
      - toFQDNs:
        - matchName: "api.github.com"
      # Must also allow DNS, with an L7 DNS rule so Cilium's DNS proxy can
      # observe lookups and learn which IPs api.github.com resolves to.
      - toEndpoints:
        - matchLabels:
            'k8s:io.kubernetes.pod.namespace': kube-system
            'k8s:k8s-app': kube-dns
        toPorts:
        - ports:
          - port: '53'
            protocol: UDP
          rules:
            dns:
            - matchPattern: "*"

    How it works under the hood:

  • When this policy is applied, the Cilium agent on the node running the api-gateway pod detects it and begins proxying the pod's DNS traffic through its transparent DNS proxy.
  • When the pod resolves api.github.com via the cluster DNS (e.g., CoreDNS), the proxy observes the response and records the returned IP addresses.
  • Crucially, it programs the pod's eBPF maps to allow egress traffic to exactly that set of IPs.
  • Cilium keeps watching DNS responses for this FQDN. If the addresses change, it automatically and atomically updates the eBPF map, ensuring seamless connectivity without policy changes.

    This dynamic, DNS-aware approach is vastly superior to static IP rules for managing external service access.
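
    To verify, repeat the curl pattern from the api-gateway pod (assuming curl is available in the image, as in the earlier tests) and inspect the agent's learned FQDN mappings:

    bash
    API_POD=$(kubectl get pods -n e-commerce -l app=api-gateway -o jsonpath='{.items[0].metadata.name}')

    # Allowed: matches the toFQDNs rule
    kubectl exec -it $API_POD -n e-commerce -- curl -s --connect-timeout 3 https://api.github.com

    # Denied: no policy allows this destination
    kubectl exec -it $API_POD -n e-commerce -- curl -s --connect-timeout 3 https://example.com

    # Inspect the FQDN-to-IP cache Cilium has learned
    kubectl -n kube-system exec ds/cilium -- cilium fqdn cache list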

    Performance and Edge Case Considerations

    While powerful, deploying L7 policies in production requires careful consideration of performance and complex scenarios.

    Performance Impact of L7 Parsing

    While eBPF is incredibly fast, L7 parsing is not free. Flows selected by an L7 rule take a detour through the node-local proxy, which costs more CPU and latency than a pure in-kernel L3/L4 identity lookup. The key is to apply L7 policies judiciously.

    * Guideline: Use L4 identity-based policies for internal, high-trust, high-throughput connections (e.g., between an application and its dedicated database). Reserve L7 policies for critical security boundaries, such as ingress from other teams' services, public ingress, or egress to external services.

    * Benchmarks: The Cilium project publishes extensive benchmarks, but always test in your own environment. For most HTTP workloads, the added latency of Cilium's L7 path is in the sub-millisecond range, and a single shared per-node proxy is significantly cheaper than running a userspace sidecar proxy in every pod.

    Edge Case: Encrypted Traffic (TLS)

    A common question is how L7 policies work with TLS-encrypted traffic. By default, they don't: the parser sees only ciphertext and cannot inspect the HTTP path or method.

    You have a few production-ready options:

  • Service Mesh Integration: If you are already running a service mesh like Istio or Linkerd for mTLS, you can let it terminate TLS and enforce L7 rules at the proxy, where the traffic is already decrypted. Cilium has shipped integrations along these lines, combining the mesh's mTLS and identity with Cilium's efficient policy enforcement.
  • Cilium Service Mesh: Newer versions of Cilium include built-in service mesh capabilities that handle mTLS and L7 routing without per-pod sidecars, reusing the same node-local proxy. This provides a more integrated solution.
  • Application-Level TLS: If TLS is terminated inside your application, Cilium cannot apply L7 policies to that traffic. In this case, you must fall back to L4 policies (allowing traffic on port 443 from a specific identity) and rely on application-level authorization, as sketched below.
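
    For that third option, the fallback is an ordinary identity-scoped L4 rule. A minimal sketch (the payments-api label is hypothetical):

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "payments-l4-fallback"
      namespace: e-commerce
    spec:
      endpointSelector:
        matchLabels:
          app: payments-api   # hypothetical service terminating its own TLS
      ingress:
      - fromEndpoints:
        - matchLabels:
            app: api-gateway
        toPorts:
        - ports:
          - port: "443"
            protocol: TCP     # no L7 rules: payload stays encrypted end-to-end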

    Edge Case: Non-HTTP L7 Protocols

    Cilium's L7 capabilities extend beyond HTTP. It ships parsers for protocols such as Kafka and DNS, handles gRPC as HTTP/2, and has offered experimental support for others like Cassandra. The policy syntax is adapted to each protocol.

    For example, to allow a kafka-producer pod to only write to the orders topic, the policy would look like this:

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "kafka-producer-policy"
    spec:
      endpointSelector:
        matchLabels:
          app: kafka-producer
      egress:
      - toEndpoints:
        - matchLabels:
            app: kafka-broker
        toPorts:
        - ports:
          - port: "9092"
            protocol: TCP
          rules:
            kafka:
            - role: "produce"
              topic: "orders"

    This level of protocol-specific L7 enforcement is exceptionally powerful for securing data infrastructure and event-driven architectures within Kubernetes.
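
    Since gRPC runs over HTTP/2, gRPC methods can likewise be pinned down with ordinary http rules matching the /Package.Service/Method path. A sketch with hypothetical service and label names:

    yaml
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "grpc-products-policy"
      namespace: e-commerce
    spec:
      endpointSelector:
        matchLabels:
          app: products-grpc        # hypothetical gRPC backend
      ingress:
      - fromEndpoints:
        - matchLabels:
            app: api-gateway
        toPorts:
        - ports:
          - port: "50051"
            protocol: TCP
          rules:
            http:
            - method: "POST"        # gRPC calls are HTTP/2 POSTs
              path: "/products.ProductService/GetProduct"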

    Conclusion: A New Foundation for Cloud-Native Security

    By leveraging the programmability of eBPF at the kernel level, Cilium provides a security enforcement mechanism that is both more powerful and more performant than traditional network policy implementations. Moving from IP-based rules to identity-based, L7-aware policies is not just an incremental improvement; it is a fundamental shift that aligns security posture with the reality of modern, dynamic microservice applications.

    For senior engineers and platform architects, mastering these patterns is key to building a secure, efficient, and observable Kubernetes platform. It enables the implementation of a true zero-trust model that is deeply integrated into the cloud-native stack, reducing reliance on cumbersome sidecars and providing granular control that was previously impossible to achieve at scale.
