eBPF for Real-time Intrusion Detection in Kubernetes Clusters

Goh Ling Yong

The Observability and Performance Gap in Container Security

In a dynamic Kubernetes environment, traditional security monitoring paradigms struggle. Host-based intrusion detection systems (HIDS) often lack container context, treating the entire node as a single entity and failing to attribute events to specific pods or namespaces. The alternative, injecting a sidecar container into every pod for monitoring, introduces significant resource overhead, increases latency, and complicates deployment manifests. A service mesh like Istio provides powerful L7 observability but at a cost, and its visibility is limited to network traffic, leaving blind spots for process execution, file access, and other host-level activities.

This creates a critical challenge: how do we gain deep, contextualized visibility into every pod's behavior—from network connections to system calls—without imposing a crippling performance penalty or architectural complexity? The answer lies in moving the observation point from user space or a sidecar proxy down into the kernel itself.

This is where eBPF (extended Berkeley Packet Filter) becomes a transformative technology for security engineering. By allowing us to run sandboxed programs directly in the kernel, eBPF provides a programmable, performant, and secure mechanism to observe and react to system events in real-time. This article is not an introduction to eBPF; it is a deep dive into specific, production-ready patterns for building a Kubernetes-aware intrusion detection system using eBPF, focusing on syscall tracing and network monitoring.


Core Pattern: Syscall Monitoring with Kprobes and Tracepoints

Our first objective is to detect anomalous process execution within a container. A common indicator of compromise is a pod, designed to run a specific application (e.g., a Go web server), suddenly executing a shell (/bin/sh) or a network utility like curl or wget. We can intercept this activity by attaching an eBPF program to the execve system call.

While we could use a kprobe on the do_execve kernel function, the more stable and recommended approach is to use a tracepoint, specifically syscalls/sys_enter_execve, which provides a stable ABI across kernel versions.
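As a quick sanity check before writing the program, you can dump the tracepoint's declared fields from debugfs to confirm the stable layout (a minimal sketch, assuming debugfs is mounted at /sys/kernel/debug, as it is in the DaemonSet shown later):

python
# Print the stable field layout of the sys_enter_execve tracepoint.
TRACEPOINT_FORMAT = "/sys/kernel/debug/tracing/events/syscalls/sys_enter_execve/format"

with open(TRACEPOINT_FORMAT) as fmt:
    print(fmt.read())  # fields include __syscall_nr, filename, argv, and envp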

The eBPF Kernel-Space Program

The C code below defines our eBPF program. It attaches to the sys_enter_execve tracepoint, extracts the filename being executed, and sends this information to a user-space agent via a perf buffer.

c
// BPF C code (exec_monitor.c)
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>
#include <linux/fs.h>

#define MAX_FILENAME_LEN 256

// Data structure to send to user space
struct event_t {
    u32 pid;
    u32 ppid;
    char comm[TASK_COMM_LEN];
    char filename[MAX_FILENAME_LEN];
};

// Perf buffer to send data to user space
BPF_PERF_OUTPUT(events);

// Attach to the execve syscall tracepoint
TRACEPOINT_PROBE(syscalls, sys_enter_execve) {
    struct event_t event = {};
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    struct task_struct *task = (struct task_struct *)bpf_get_current_task();

    // Get parent PID for context (BCC rewrites this dereference into a
    // safe bpf_probe_read() call at compile time)
    event.ppid = task->real_parent->tgid;
    event.pid = pid;

    // Get command and filename
    bpf_get_current_comm(&event.comm, sizeof(event.comm));
    bpf_probe_read_user_str(&event.filename, sizeof(event.filename), (void *)args->filename);

    // Submit the event to user space
    events.perf_submit(args, &event, sizeof(event));

    return 0;
}

Key Implementation Details:

* BPF_PERF_OUTPUT(events): This macro from the BCC (BPF Compiler Collection) framework declares a perf buffer named events. This is a high-performance, memory-mapped channel for sending data from kernel to user space.

* TRACEPOINT_PROBE: A BCC macro that simplifies attaching to a tracepoint. It handles the low-level details of creating the eBPF program of type BPF_PROG_TYPE_TRACEPOINT.

* bpf_probe_read_user_str(): A crucial helper function. The args->filename pointer points to user-space memory. The eBPF program cannot directly dereference this pointer for security reasons. This helper safely copies the string from user space into the BPF stack.

* task->real_parent->tgid: BCC rewrites this pointer dereference into a safe bpf_probe_read() call at compile time, so kernel structs like task_struct can be read without manual helpers. A libbpf-based program would instead use the CO-RE helper BPF_CORE_READ(task, real_parent, tgid); CO-RE (Compile Once - Run Everywhere) is covered in the hardening section below.

The User-Space Agent

The user-space component is responsible for loading the eBPF program, reading from the perf buffer, and applying logic to the received events. Here is a Python example using the BCC framework.

python
# Python user-space agent using BCC (agent.py)
from bcc import BPF
import ctypes as ct
import time

# Define the C data structure in Python
class Event(ct.Structure):
    _fields_ = [
        ("pid", ct.c_uint32),
        ("ppid", ct.c_uint32),
        ("comm", ct.c_char * 16), # TASK_COMM_LEN
        ("filename", ct.c_char * 256) # MAX_FILENAME_LEN
    ]

# Suspicious executables, matched by full path or basename
SUSPICIOUS_EXECS = {
    b"/bin/sh", b"/bin/bash", b"/usr/bin/curl", b"/usr/bin/wget", b"nc"
}

def print_event(cpu, data, size):
    event = ct.cast(data, ct.POINTER(Event)).contents
    filename = event.filename  # ctypes returns the NUL-terminated bytes

    # The core detection logic: match on the full path or the basename
    if filename in SUSPICIOUS_EXECS or filename.split(b"/")[-1] in SUSPICIOUS_EXECS:
        print("[ALERT] Suspicious execution detected!")
        print(f"  PID: {event.pid}, PPID: {event.ppid}")
        print(f"  Command: {event.comm.decode('utf-8', 'replace')}")
        print(f"  Executed: {filename.decode('utf-8', 'replace')}")
        # In a real system, you would enrich this with K8s metadata (pod name, namespace)
        # and send it to a SIEM.

# Load the eBPF program
with open("exec_monitor.c", "r") as f:
    bpf_text = f.read()

b = BPF(text=bpf_text)

# Attach to the perf buffer
print("Attaching to perf buffer... Monitoring for execve() calls.")
b["events"].open_perf_buffer(print_event)

# Main loop
while True:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        exit()

This agent loads our C code, attaches a Python callback (print_event) to the events perf buffer, and enters a loop to poll for data. The crucial next step, not fully implemented here for brevity, is enriching this event with Kubernetes metadata. The user-space agent would need to query the Kubernetes API server (or a local cache) using the PID to find the corresponding container ID, pod name, and namespace, providing full context for the alert.
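Below is a hedged sketch of that enrichment step, using the official kubernetes Python client. It assumes a kubelet configured with the systemd cgroup driver (so /proc/<pid>/cgroup contains a "...pod<UID>..." slice name); the regex and helper names are illustrative, and a production agent would use a watch-based informer cache rather than listing pods on every event.

python
# Hypothetical K8s metadata enrichment for an execve event (sketch only).
# Requires: pip install kubernetes
import re
from kubernetes import client, config

# Pod UID as it appears in systemd-driver cgroup paths (dashes become underscores).
POD_UID_RE = re.compile(
    r"pod([0-9a-f]{8}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{12})"
)

def pod_uid_for_pid(pid):
    """Extract the pod UID from /proc/<pid>/cgroup, or None if not found."""
    try:
        with open(f"/proc/{pid}/cgroup") as f:
            match = POD_UID_RE.search(f.read())
        return match.group(1).replace("_", "-") if match else None
    except FileNotFoundError:
        return None  # process already exited (see the PID-reuse discussion later)

def pod_metadata(pod_uid):
    """Resolve (namespace, pod name) for a pod UID via the API server."""
    config.load_incluster_config()  # uses the DaemonSet pod's service account
    for pod in client.CoreV1Api().list_pod_for_all_namespaces().items:
        if pod.metadata.uid == pod_uid:
            return pod.metadata.namespace, pod.metadata.name
    return None, None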


Production Deployment via Kubernetes DaemonSet

To monitor every node in the cluster, we must deploy our agent as a DaemonSet. This requires careful consideration of permissions, as the agent needs access to the host kernel's tracing capabilities.

Here is a production-grade DaemonSet manifest:

yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ebpf-security-monitor
  namespace: kube-system
  labels:
    app: ebpf-security-monitor
spec:
  selector:
    matchLabels:
      app: ebpf-security-monitor
  template:
    metadata:
      labels:
        app: ebpf-security-monitor
    spec:
      # Run on the host's PID and network namespaces
      hostPID: true
      hostNetwork: true
      # Tolerations to run on all nodes, including control-plane
      tolerations:
      - operator: Exists
      containers:
      - name: monitor-agent
        image: your-repo/ebpf-agent:latest # Your agent container image
        command: ["python", "/app/agent.py"]
        # CRITICAL: Privileged access is required to load eBPF programs
        securityContext:
          privileged: true
        volumeMounts:
        # Mount necessary host paths for eBPF
        - name: sys-kernel-debug
          mountPath: /sys/kernel/debug
          readOnly: true
        - name: kernel-src
          mountPath: /usr/src
          readOnly: true
        - name: lib-modules
          mountPath: /lib/modules
          readOnly: true
      volumes:
      - name: sys-kernel-debug
        hostPath:
          path: /sys/kernel/debug
      - name: kernel-src
        hostPath:
          path: /usr/src
      - name: lib-modules
        hostPath:
          path: /lib/modules

Critical Security and Configuration Points:

  • securityContext.privileged: true: This is the simplest way to grant the access this use case needs. Loading tracing eBPF programs requires CAP_BPF plus CAP_PERFMON on kernels 5.8 and later, or CAP_SYS_ADMIN on older kernels, and privileged mode grants all of these. It has significant security implications: the container effectively has root access to the host node. The agent's container image must be minimal, hardened, and from a trusted source.
  • hostPID: true: Essential for the agent to see processes from all pods on the node, not just its own PID namespace.
  • Volume Mounts: We mount /sys/kernel/debug for access to tracing filesystems, /lib/modules for kernel module information, and /usr/src for kernel headers, which the BCC framework often needs for compiling the eBPF program on the fly.

Advanced Pattern: In-Kernel Aggregation and Filtering

The execve monitor is powerful, but a noisy system can generate thousands of events per second, overwhelming the perf buffer and the user-space agent. A more advanced pattern is to perform filtering and aggregation directly in the kernel using eBPF maps.

Let's evolve our use case to detect anomalous outbound network connections. A compromised pod might try to connect to a known command-and-control (C2) server. Sending every single connect() event to user space for analysis is inefficient.

Instead, we can maintain a blocklist of IP addresses in an eBPF map. The kernel-space program checks against this map and only sends an event to user space if a forbidden connection is attempted.

eBPF Program with Map-based Filtering

c
// BPF C code (connect_monitor.c)
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>
#include <linux/in.h>
#include <net/sock.h>
#include <bcc/proto.h>

// A hash map to store the blocked destination IPs (key: u32 IP, value: u8 flag)
BPF_HASH(blocked_ips, u32, u8);

struct event_t {
    u32 pid;
    u32 saddr;
    u32 daddr;
    u16 dport;
    char comm[TASK_COMM_LEN];
};

BPF_PERF_OUTPUT(events);

int kprobe__tcp_v4_connect(struct pt_regs *ctx, struct sock *sk, struct sockaddr *uaddr) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;

    // At kprobe entry the destination lives in the sockaddr argument; the sock
    // struct is only populated later inside tcp_v4_connect(). BCC rewrites these
    // dereferences into safe bpf_probe_read calls.
    struct sockaddr_in *sin = (struct sockaddr_in *)uaddr;
    u32 daddr = sin->sin_addr.s_addr;
    u16 dport = sin->sin_port;

    // Check if the destination IP is in our blocklist map
    u8 *is_blocked = blocked_ips.lookup(&daddr);
    if (is_blocked == NULL) {
        return 0; // Not a blocked IP, do nothing
    }

    // If it IS a blocked IP, send an event
    struct event_t event = {};
    event.pid = pid;
    event.saddr = sk->__sk_common.skc_rcv_saddr; // may be 0 if the socket is not yet bound
    event.daddr = daddr;
    event.dport = ntohs(dport); // Convert from network to host byte order
    bpf_get_current_comm(&event.comm, sizeof(event.comm));

    events.perf_submit(ctx, &event, sizeof(event));

    return 0;
}

Key Implementation Details:

* BPF_HASH(blocked_ips, u32, u8): Defines a hash map in the kernel. The user-space agent will populate this map.

* kprobe__tcp_v4_connect: We attach a kprobe to the tcp_v4_connect kernel function to intercept IPv4 connection attempts. The destination address is read from the sockaddr argument because the sock struct is not yet populated at function entry.

* blocked_ips.lookup(&daddr): This is the critical optimization. The lookup happens entirely within the kernel, so even very high connection rates can be checked with minimal overhead. An event is only generated for the rare case of a match.

User-Space Agent Managing the eBPF Map

The user-space agent now has an additional responsibility: populating and updating the blocked_ips map. This can be done dynamically from a threat intelligence feed.

python
# Python agent for connect monitoring
from bcc import BPF
import ctypes as ct
import socket
import struct

# ... (Event class definition similar to before) ...

# --- Threat Intelligence Feed (example) ---
C2_SERVER_IPS = [
    "198.51.100.23", # Known malicious IP
    "203.0.113.88"   # Another known C2 server
]

# Load eBPF program
b = BPF(src_file="connect_monitor.c")

# Get a reference to the eBPF map
blocked_ips_map = b.get_table("blocked_ips")

# Populate the map from our threat feed
print("Populating blocklist map...")
for ip_str in C2_SERVER_IPS:
    # inet_aton yields network-order bytes; unpack natively so the map key's
    # raw bytes match the kernel's __be32 daddr
    ip_int = struct.unpack("I", socket.inet_aton(ip_str))[0]
    blocked_ips_map[ct.c_uint32(ip_int)] = ct.c_uint8(1)

print("Map populated. Monitoring for connections to blocked IPs...")

# ... (open_perf_buffer and poll loop as before) ...

This pattern dramatically reduces the data volume sent to user space, making the solution scalable even on nodes with extremely high network traffic. It shifts the detection logic from a reactive user-space process to a proactive in-kernel filter.
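
The static population shown above can be turned into the dynamic feed-driven update mentioned earlier. The following is a hedged sketch: fetch_blocklist() and the refresh interval are placeholders, not a specific threat-intelligence API.

python
# Hypothetical periodic refresh of the blocked_ips map from a threat feed.
import ctypes as ct
import socket
import struct
import threading

REFRESH_SECONDS = 300  # placeholder interval

def fetch_blocklist():
    # Placeholder: return an iterable of IPv4 strings from your own feed client.
    return ["198.51.100.23", "203.0.113.88"]

def refresh_map(blocked_ips_map):
    desired = {
        struct.unpack("I", socket.inet_aton(ip))[0]  # raw bytes match __be32 daddr
        for ip in fetch_blocklist()
    }
    # Drop entries that left the feed, then (re)insert the current set.
    for key in list(blocked_ips_map.keys()):
        if key.value not in desired:
            del blocked_ips_map[key]
    for ip_int in desired:
        blocked_ips_map[ct.c_uint32(ip_int)] = ct.c_uint8(1)
    # Re-arm the timer for the next refresh cycle.
    threading.Timer(REFRESH_SECONDS, refresh_map, args=(blocked_ips_map,)).start()

# Usage: call refresh_map(b.get_table("blocked_ips")) once after loading the program.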


Edge Cases and Production Hardening

Deploying eBPF at scale requires addressing several complex challenges.

1. Kernel Version and Feature Skew (The CO-RE Solution)

A major historical pain point of eBPF was its dependency on kernel headers. The BCC framework solves this by including a Clang/LLVM toolchain in the agent container and compiling the C code on the target host at runtime. This works, but results in large container images and high CPU usage on agent startup.

The modern solution is CO-RE (Compile Once - Run Everywhere), enabled by BTF (BPF Type Format). BTF embeds debugging type information about kernel structures directly into the kernel itself. An eBPF program compiled with BTF awareness can read this information at load time and perform runtime relocations, adjusting its own code to match the specific kernel struct layouts of the host it's running on.

Practical Implication: Instead of shipping C source code and a compiler (BCC), you ship a pre-compiled eBPF object file. Your user-space loader (using a library like libbpf) loads this object and the libbpf runtime performs the CO-RE relocations. This leads to:

* Dramatically smaller agent container images.

* Faster agent startup times.

* Reduced dependencies on host-mounted kernel headers.

Adopting a libbpf-based loader (e.g., libbpfgo for Go, libbpf-rs for Rust, or libbpf itself in C) is a critical step for mature, production-grade eBPF deployments.

2. The Verifier and Program Complexity

The eBPF verifier is a static analysis engine in the kernel that ensures an eBPF program is safe to run. It checks for out-of-bounds memory access, infinite loops, and excessive complexity. For security monitoring, you may run into its limitations:

* Bounded Loops: The verifier must be able to prove that all loops will terminate. If you need to iterate, you must use constructs that the verifier can analyze, such as #pragma unroll for a fixed number of iterations.

* Instruction Limit: Programs are typically limited to 1 million instructions (prior to kernel 5.2, it was 4096). Complex logic must be broken down.

* Tail Calls: To overcome the instruction limit, you can chain eBPF programs together using bpf_tail_call. This allows one program to jump to another, effectively creating a state machine for more complex analysis, such as parsing a multi-packet protocol. A minimal sketch follows below.
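
As a concrete illustration of the tail-call mechanism, here is a minimal, hedged BCC sketch separate from the monitors above; the program names, the choice of attach point, and the 4-slot program array are all illustrative.

python
# Minimal BCC tail-call sketch: a dispatcher kprobe jumps into a second
# eBPF program through a BPF_PROG_ARRAY map.
import ctypes as ct
from bcc import BPF

bpf_text = r"""
#include <uapi/linux/ptrace.h>

BPF_PROG_ARRAY(analysis_progs, 4);  // slots holding eBPF program fds

int dispatcher(struct pt_regs *ctx) {
    // Rewritten by BCC into bpf_tail_call(); falls through if slot 0 is empty.
    analysis_progs.call(ctx, 0);
    return 0;
}

int deep_analysis(struct pt_regs *ctx) {
    bpf_trace_printk("tail-called analysis stage\n");
    return 0;
}
"""

b = BPF(text=bpf_text)

# Load the second stage and register its fd in slot 0 of the program array.
tail_fn = b.load_func("deep_analysis", BPF.KPROBE)
b.get_table("analysis_progs")[ct.c_int(0)] = ct.c_int(tail_fn.fd)

# Attach the dispatcher; tcp_v4_connect is only an example attach point.
b.attach_kprobe(event="tcp_v4_connect", fn_name="dispatcher")
print("Dispatcher attached; stage two runs via bpf_tail_call().")
b.trace_print()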

3. Handling Container Exits and PID Reuse

In our execve example, we get a PID. By the time our user-space agent processes the event, the container might have exited and the PID could have been reused by a new process. This is a classic race condition.

A robust solution involves capturing more context at the time of the event. We can use bpf_get_current_cgroup_id() in the eBPF program to get the cgroup ID, which is a much more stable identifier for a container than a PID. The user-space agent can then correlate the cgroup ID with Kubernetes pod metadata, even if the process has already terminated.
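
A hedged sketch of the user-space half of that correlation is below. It assumes cgroup v2, where the value returned by bpf_get_current_cgroup_id() corresponds to the inode number of the cgroup directory, and a kubelet using the systemd cgroup driver (the "kubepods.slice" layout); the path and regex are illustrative.

python
# Build a {cgroup_id -> pod UID} index so events can still be attributed
# after the originating process has exited.
import os
import re

CGROUP_ROOT = "/sys/fs/cgroup/kubepods.slice"  # systemd cgroup driver layout
POD_UID_RE = re.compile(
    r"pod([0-9a-f]{8}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{12})"
)

def build_cgroup_index():
    index = {}
    for dirpath, _dirs, _files in os.walk(CGROUP_ROOT):
        match = POD_UID_RE.search(os.path.basename(dirpath))
        if match:
            # On cgroup v2 the BPF-visible cgroup ID is the directory's inode number.
            index[os.stat(dirpath).st_ino] = match.group(1).replace("_", "-")
    return index

# Usage: refresh on pod lifecycle events and look up the event's cgroup ID in the index.
cgroup_to_pod_uid = build_cgroup_index()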

Conclusion: eBPF as a Foundation for Modern Security

eBPF is not just another tool; it represents a fundamental shift in how we build observability and security systems for cloud-native infrastructure. By moving detection logic into the kernel, we can build solutions that are orders of magnitude more performant and have deeper visibility than their user-space counterparts.

The patterns discussed here—attaching to tracepoints for process monitoring and using kprobes with maps for efficient network filtering—are the building blocks for sophisticated intrusion detection systems. While powerful open-source tools like Falco and Cilium's Tetragon have built comprehensive solutions on these principles, understanding the underlying eBPF implementation details is crucial for senior engineers tasked with securing complex Kubernetes environments. The ability to write and deploy custom eBPF programs allows you to address specific security concerns, optimize for unique performance characteristics, and build a truly kernel-native defense-in-depth strategy.
