Kernel-Level Container Security with eBPF for Anomaly Detection

Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

Beyond the Sidecar: Runtime Anomaly Detection with eBPF

In modern cloud-native environments, container security is paramount. Static image scanning and network policies are foundational, but they fail to address runtime threats—malicious processes executed or network connections initiated after a container is already running. Traditional runtime security tools often rely on ptrace, LD_PRELOAD hooking, or network-level sidecar proxies. These approaches, while functional, introduce significant performance overhead, increase attack surface, or lack the complete visibility needed to detect sophisticated attacks.

Enter eBPF (extended Berkeley Packet Filter). eBPF allows us to run sandboxed programs directly within the Linux kernel, triggered by events like syscalls, function entries/exits, or network packets. For security engineering, this is a paradigm shift. We can achieve unparalleled visibility into container behavior with minimal performance impact, all without modifying the application code or container image.

This article is not an introduction to eBPF. It assumes you understand the basic concepts of eBPF programs, maps, and the loader/userspace controller architecture. We will dive straight into building a practical, production-oriented runtime security monitor that detects two common indicators of compromise:

  • Unauthorized Process Execution: Detecting when a process inside a container executes a command that is not on a pre-defined allowlist (e.g., a shell spawned by a web server vulnerability).
  • Suspicious Outbound Network Connections: Identifying when a container attempts to connect to a known malicious IP address or an external address when it should only be communicating internally.

We will implement this using a CO-RE (Compile Once - Run Everywhere) approach with libbpf and a Go-based userspace controller, ensuring our solution is portable across different kernel versions.


    The Architecture: Kernel Hooks and a Userspace Brain

    Our security monitor consists of two main components:

  • eBPF Kernel Programs (C): Small, efficient C programs that will be JIT-compiled and attached to specific kernel functions (kprobes). These programs will collect event data (e.g., filename for execve, destination IP for connect) and send it to userspace.
  • Userspace Controller (Go): A Go application responsible for loading and attaching the eBPF programs, listening for events sent from the kernel, and applying security logic to these events (e.g., checking against an allowlist/blocklist).

    We'll use a BPF_MAP_TYPE_PERF_EVENT_ARRAY to stream data from kernel to userspace. This is a highly efficient mechanism for handling a high volume of events.

    Let's start by setting up our project structure.

    sh
    . 
    ├── go.mod 
    ├── go.sum 
    ├── main.go               # Go userspace controller 
    ├── bpf/                  # Directory for eBPF C code 
    │   ├── bpf_helpers.h     # Helper header 
    │   ├── monitor.bpf.c     # Our eBPF program 
    │   └── vmlinux.h         # Kernel type definitions for CO-RE 
    └── Makefile              # To compile the eBPF C code

    To generate vmlinux.h for CO-RE, you'll need bpftool:

    bash
    bpftool btf dump file /sys/kernel/btf/vmlinux format c > bpf/vmlinux.h

    Our Makefile will use clang to compile the C code into an eBPF object file.

    makefile
    # Makefile
    
    CLANG ?= clang
    LLC ?= llc
    
    GO_APP := monitor
    EBPF_SRC := ./bpf/monitor.bpf.c
    EBPF_OBJ := ./bpf/monitor.bpf.o
    
    .PHONY: all clean
    
    all: $(GO_APP)
    
    $(EBPF_OBJ): $(EBPF_SRC)
    	$(CLANG) -g -O2 -target bpf -D__TARGET_ARCH_x86 \
    		-I./bpf \
    		-c $(EBPF_SRC) -o $(EBPF_OBJ)
    
    # bpf2go (run by go generate) produces the Go bindings that main.go relies on
    $(GO_APP): main.go $(EBPF_OBJ)
    	go generate ./...
    	go build -o $(GO_APP) .
    
    clean:
    	rm -f $(GO_APP) $(EBPF_OBJ)

    Phase 1: Detecting Unauthorized Process Execution

    Our first goal is to trace every execve syscall across the system, capture the filename being executed, and send it to our Go application for analysis.

    The eBPF Program (`monitor.bpf.c`)

    We'll attach a kprobe to the sys_execve syscall. When triggered, our BPF program will read the filename argument and submit it to a perf event buffer.

    c
    // bpf/monitor.bpf.c
    
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>
    
    #define TASK_COMM_LEN 16
    #define MAX_FILENAME_LEN 256
    
    // Event structure sent to userspace
    struct exec_event {
        u32 pid;
        u32 ppid;
        u64 cgroup_id;
        char comm[TASK_COMM_LEN];
        char filename[MAX_FILENAME_LEN];
    };
    
    // Perf event map to send data to userspace
    struct {
        __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
        __uint(key_size, sizeof(u32));
        __uint(value_size, sizeof(u32));
    } exec_events SEC(".maps");
    
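    // On x86-64 kernels with syscall wrappers (4.17+), the probed symbol is
    // __x64_sys_execve and its only argument is a pointer to the caller's saved
    // registers; the real syscall arguments live inside that pt_regs.
    // (On kernels without wrappers, the first argument is the filename itself.)
    SEC("kprobe/sys_execve")
    int BPF_KPROBE(handle_execve, struct pt_regs *regs)
    {
        struct exec_event event = {};
        u64 id = bpf_get_current_pid_tgid();
        u32 pid = id >> 32;
    
        // Get parent PID
        struct task_struct *task = (struct task_struct *)bpf_get_current_task();
        event.ppid = BPF_CORE_READ(task, real_parent, tgid);
    
        event.pid = pid;
        event.cgroup_id = bpf_get_current_cgroup_id();
        bpf_get_current_comm(&event.comm, sizeof(event.comm));
    
        // First syscall argument: user-space pointer to the path being executed
        const char *filename = (const char *)PT_REGS_PARM1_CORE(regs);
        bpf_probe_read_user_str(&event.filename, sizeof(event.filename), filename);
    
        // Submit the event to the perf buffer
        bpf_perf_event_output(ctx, &exec_events, BPF_F_CURRENT_CPU, &event, sizeof(event));
        return 0;
    }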
    
    char LICENSE[] SEC("license") = "GPL";

    Advanced Implementation Details:

    * vmlinux.h and BPF_CORE_READ: We are using BTF (BPF Type Format) and CO-RE. Instead of including dozens of kernel headers, vmlinux.h provides all kernel type definitions. BPF_CORE_READ is a macro that allows safe, portable access to kernel struct fields. This makes our program resilient to changes in kernel data structures across different Linux versions.

    * bpf_get_current_cgroup_id(): Relying on PID alone is insufficient in a containerized environment due to PID namespace virtualization and PID wrapping. The cgroup ID is a much more stable identifier for a container. We will use this in userspace to associate events with specific containers.

    * bpf_probe_read_user_str(): This helper safely copies the filename from userspace memory (where the syscall arguments reside) into the event struct on our BPF program's stack. BPF programs cannot dereference user pointers directly; if the pointer is invalid, the helper returns an error instead of faulting.

    The Go Userspace Controller (`main.go` - Part 1)

    Now, let's write the Go code to load this eBPF program and process the events.

    We will use the cilium/ebpf library, which provides excellent Go bindings for interacting with the eBPF subsystem.

    go
    // main.go
    
    package main
    
    import (
    	"bytes"
    	"encoding/binary"
    	"errors"
    	"log"
    	"os"
    	"os/signal"
    	"syscall"
    
    	"github.com/cilium/ebpf/link"
    	"github.com/cilium/ebpf/perf"
    	"github.com/cilium/ebpf/rlimit"
    )
    
    //go:generate go run github.com/cilium/ebpf/cmd/bpf2go -cc clang -cflags "-O2 -g -Wall -D__TARGET_ARCH_x86" bpf ./bpf/monitor.bpf.c -- -I./bpf
    
    const ( 
        taskCommLen = 16 
        maxFilenameLen = 256 
    ) 
    
    // This mirrors the C struct
    type execEvent struct {
    	PID        uint32
    	PPID       uint32
    	CgroupID   uint64
    	Comm       [taskCommLen]byte
    	Filename   [maxFilenameLen]byte
    }
    
    func main() {
    	// Allow the current process to lock memory for eBPF maps.
    	if err := rlimit.RemoveMemlock(); err != nil {
    		log.Fatal(err)
    	}
    
    	// Load pre-compiled programs and maps into the kernel.
    	objs := bpfObjects{}
    	if err := loadBpfObjects(&objs, nil); err != nil {
    		log.Fatalf("loading objects: %v", err)
    	}
    	defer objs.Close()
    
    	// Attach the kprobe for execve
    	kp, err := link.Kprobe("sys_execve", objs.HandleExecve, nil)
    	if err != nil {
    		log.Fatalf("attaching kprobe: %s", err)
    	}
    	defer kp.Close()
    
    	log.Println("eBPF programs attached. Waiting for events...")
    
    	// Set up a PerfReader to read events from the perf buffer map
    	execRd, err := perf.NewReader(objs.ExecEvents, os.Getpagesize())
    	if err != nil {
    		log.Fatalf("creating perf event reader: %s", err)
    	}
    	defer execRd.Close()
    
        // Set up a channel to handle OS signals
    	stopper := make(chan os.Signal, 1)
    	signal.Notify(stopper, os.Interrupt, syscall.SIGTERM)
    
    	go handleExecEvents(execRd)
    
    	// Wait for a signal
    	<-stopper
    	log.Println("Received signal, exiting...")
    }
    
    func handleExecEvents(rd *perf.Reader) {
        // This is our rudimentary security policy: an allowlist of binaries.
    	// In a real system, this would be dynamically configured per-container.
    	allowedBinaries := map[string]bool{
    		"/usr/bin/ls":   true,
    		"/usr/bin/cat":  true,
    		"/usr/bin/ps":   true,
            "/bin/busybox":  true,
    	}
    
    	var event execEvent
    	for {
    		record, err := rd.Read()
    		if err != nil {
    			if errors.Is(err, perf.ErrClosed) {
    				return
    			}
    			log.Printf("reading from perf buffer: %s", err)
    			continue
    		}
    
    		if record.LostSamples > 0 {
    			log.Printf("perf buffer dropped %d samples", record.LostSamples)
    			continue
    		}
    
    		if err := binary.Read(bytes.NewReader(record.RawSample), binary.LittleEndian, &event); err != nil {
    			log.Printf("parsing perf event: %s", err)
    			continue
    		}
    
    		filename := string(bytes.TrimRight(event.Filename[:], "\x00"))
    		comm := string(bytes.TrimRight(event.Comm[:], "\x00"))
    
    		// Apply security policy
    		if !allowedBinaries[filename] {
    			log.Printf(
    				"SECURITY ALERT: Unauthorized execution detected! PID: %d, PPID: %d, Cgroup: %d, Comm: %s, Filename: %s",
    				event.PID,
    				event.PPID,
    				event.CgroupID,
    				comm,
    				filename,
    			)
    		} else {
    			log.Printf("Authorized execution: PID: %d, Filename: %s", event.PID, filename)
    		}
    	}
    }

    Running the Example:

  • Run make to compile both the eBPF program and the Go controller.
  • Run the controller with sudo: sudo ./monitor.
  • In another terminal, execute some commands:

    * ls / (Should be logged as allowed)

    * ps aux (Should be logged as allowed)

    * nmap localhost (If nmap is not in our allowlist, this will trigger a security alert).

    text
    # Output from ./monitor (the date/time prefix added by Go's log package is omitted)
    
    eBPF programs attached. Waiting for events...
    Authorized execution: PID: 12345, Filename: /usr/bin/ls
    SECURITY ALERT: Unauthorized execution detected! PID: 12348, PPID: 5678, Cgroup: 67231, Comm: bash, Filename: /usr/bin/nmap
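
    The allowlist in handleExecEvents is global, while the comment in that function notes that a real system would configure it per container. One possible shape for that is sketched here: policies keyed by cgroup ID, with unknown cgroups denied so that they raise alerts. The names execPolicy and isAllowed and the sample IDs are hypothetical.

```go
package main

import "fmt"

// execPolicy maps a cgroup ID to the set of binaries that container may execute.
type execPolicy map[uint64]map[string]struct{}

// isAllowed reports whether the container identified by cgroupID may run filename.
// A cgroup with no policy entry is denied, so unknown workloads raise alerts
// instead of passing silently.
func (p execPolicy) isAllowed(cgroupID uint64, filename string) bool {
	allowed, ok := p[cgroupID]
	if !ok {
		return false
	}
	_, ok = allowed[filename]
	return ok
}

func main() {
	policy := execPolicy{
		67231: {"/usr/bin/ls": {}, "/usr/bin/cat": {}}, // hypothetical web container
	}
	fmt.Println(policy.isAllowed(67231, "/usr/bin/ls"))   // true
	fmt.Println(policy.isAllowed(67231, "/usr/bin/nmap")) // false
	fmt.Println(policy.isAllowed(42, "/usr/bin/ls"))      // false: no policy for this cgroup
}
```

    Whether an unknown cgroup should default to deny or allow is a design choice; default-deny is noisier but avoids silently missing new workloads.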

    Phase 2: Detecting Malicious Network Connections

    Now, let's extend our monitor to detect suspicious outbound TCP connections. We'll hook into the tcp_v4_connect kernel function, extract the destination IP and port, and check it against a blocklist.

    Extending the eBPF Program (`monitor.bpf.c`)

    We add a new event struct and a new kprobe. Extracting network information is more complex as it involves traversing nested kernel structs.

    c
    // Additions to bpf/monitor.bpf.c
    
    // ... (previous code for exec_event and exec_events map) ...
    
    // Event structure for network connections
    struct net_event {
        u32 pid;
        u64 cgroup_id;
        u32 daddr; // Destination IPv4 address
        u16 dport; // Destination port
        char comm[TASK_COMM_LEN];
    };
    
    // Perf event map for network events
    struct {
        __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
        __uint(key_size, sizeof(u32));
        __uint(value_size, sizeof(u32));
    } net_events SEC(".maps");
    
    SEC("kprobe/tcp_v4_connect")
    int BPF_KPROBE(handle_tcp_connect, struct sock *sk, struct sockaddr *uaddr)
    {
        struct net_event event = {};
        u64 id = bpf_get_current_pid_tgid();
        event.pid = id >> 32;
        event.cgroup_id = bpf_get_current_cgroup_id();
        bpf_get_current_comm(&event.comm, sizeof(event.comm));
    
        // At function entry the socket's skc_daddr/skc_dport are not populated yet;
        // tcp_v4_connect fills them in from uaddr. Read the destination from
        // uaddr, which the connect() path has already copied into kernel memory.
        struct sockaddr_in *sin = (struct sockaddr_in *)uaddr;
        event.daddr = BPF_CORE_READ(sin, sin_addr.s_addr);
        event.dport = BPF_CORE_READ(sin, sin_port);
    
        // Address and port are in network byte order; we convert them in userspace.
    
        bpf_perf_event_output(ctx, &net_events, BPF_F_CURRENT_CPU, &event, sizeof(event));
        return 0;
    }

    Advanced Implementation Details:

    * Hooking tcp_v4_connect: We are attaching to the kernel function that implements the connect syscall for TCP over IPv4. Its first argument is a struct sock, which holds the socket's state. Its second argument, uaddr, is the destination address the caller asked for, already copied into kernel memory.

    * Navigating struct sock: The socket structure is one of the most complex in the kernel. Fields like skc_daddr (destination address) are nested within __sk_common, but they are only filled in while tcp_v4_connect runs, so an entry kprobe reads the destination from uaddr instead. Reading it from the socket requires a later hook, such as a kretprobe paired with a map that remembers the sock pointer. Either way, without CO-RE and BPF_CORE_READ we would need to hardcode struct offsets, which could break with any kernel update. This is a classic example of why CO-RE is a production requirement for eBPF.

    Extending the Go Controller (`main.go`)

    We'll add a new event handler for network events.

    go
    // Additions to main.go
    
    // ... (imports as before, plus "fmt" for intToIP; execEvent struct) ...
    
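    // Add the netEvent struct. It must match the C layout byte for byte:
    // the C compiler aligns cgroup_id to 8 bytes, leaving 4 padding bytes after pid.
    type netEvent struct {
    	PID      uint32
    	_        uint32 // alignment padding; binary.Read skips blank fields
    	CgroupID uint64
    	DAddr    uint32
    	DPort    uint16
    	Comm     [taskCommLen]byte
    }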
    
    func main() {
        // ... (setup code as before) ...
    
    	// Attach kprobe for execve (same as before)
    	kp_exec, err := link.Kprobe("sys_execve", objs.HandleExecve, nil)
    	if err != nil {
    		log.Fatalf("attaching execve kprobe: %s", err)
    	}
    	defer kp_exec.Close()
    
    	// Attach kprobe for tcp_v4_connect
    	kp_net, err := link.Kprobe("tcp_v4_connect", objs.HandleTcpConnect, nil)
    	if err != nil {
    		log.Fatalf("attaching tcp_connect kprobe: %s", err)
    	}
    	defer kp_net.Close()
    
    	log.Println("eBPF programs attached. Waiting for events...")
    
    	// Set up PerfReader for exec events
    	execRd, err := perf.NewReader(objs.ExecEvents, os.Getpagesize())
    	if err != nil {
    		log.Fatalf("creating exec perf reader: %s", err)
    	}
    	defer execRd.Close()
    
    	// Set up PerfReader for net events
    	netRd, err := perf.NewReader(objs.NetEvents, os.Getpagesize())
    	if err != nil {
    		log.Fatalf("creating net perf reader: %s", err)
    	}
    	defer netRd.Close()
    
    	stopper := make(chan os.Signal, 1)
    	signal.Notify(stopper, os.Interrupt, syscall.SIGTERM)
    
    	go handleExecEvents(execRd) // From Part 1
    	go handleNetEvents(netRd)   // New handler
    
    	<-stopper
    	log.Println("Received signal, exiting...")
    }
    
    // ... (handleExecEvents function as before) ...
    
    func handleNetEvents(rd *perf.Reader) {
        // A blocklist of known malicious IPs.
        // In a real system, this would be fed from a threat intelligence source.
    	maliciousIPs := map[string]bool{
    		"1.2.3.4": true, // Example C2 server
    		"8.8.8.8": false, // Benign, but good for testing
    	}
    
    	var event netEvent
    	for {
    		record, err := rd.Read()
    		if err != nil {
    			if errors.Is(err, perf.ErrClosed) {
    				return
    			}
    			log.Printf("reading from net perf buffer: %s", err)
    			continue
    		}
    
    		if record.LostSamples > 0 {
    			log.Printf("net perf buffer dropped %d samples", record.LostSamples)
    			continue
    		}
    
    		if err := binary.Read(bytes.NewReader(record.RawSample), binary.LittleEndian, &event); err != nil {
    			log.Printf("parsing net perf event: %s", err)
    			continue
    		}
    
    		// Convert IP and port to human-readable form. The port arrives in
    		// network byte order but was decoded as little-endian, so swap its bytes.
    		destIP := intToIP(event.DAddr)
    		destPort := event.DPort>>8 | event.DPort<<8
    
    		comm := string(bytes.TrimRight(event.Comm[:], "\x00"))
    
    		// Apply security policy
    		if maliciousIPs[destIP] {
    			log.Printf(
    				"SECURITY ALERT: Malicious outbound connection detected! PID: %d, Cgroup: %d, Comm: %s, Destination: %s:%d",
    				event.PID,
    				event.CgroupID,
    				comm,
    				destIP,
    				destPort,
    			)
    		}
    	}
    }
    
    // Helper to convert a uint32 IPv4 address into dotted-quad notation
    func intToIP(ipInt uint32) string {
    	// The kernel stores the address in network byte order. Because the event was
    	// decoded as little-endian, the lowest byte of ipInt is the first octet.
    	b := make([]byte, 4)
    	b[0] = byte(ipInt)
    	b[1] = byte(ipInt >> 8)
    	b[2] = byte(ipInt >> 16)
    	b[3] = byte(ipInt >> 24)
    	return fmt.Sprintf("%d.%d.%d.%d", b[0], b[1], b[2], b[3])
    }

    Now, when you run sudo ./monitor and execute curl 1.2.3.4 in another terminal, you'll see the security alert for the malicious connection.


    Edge Cases and Production Hardening

    The examples above work, but deploying them in a high-traffic production environment requires addressing several critical edge cases.

    1. The High-Volume Syscall Problem (The "Thirsty Syscall")

    Syscalls like execve and connect are relatively infrequent. But what if you wanted to monitor read or write for data exfiltration? These can occur millions of times per second, overwhelming the perf buffer and the userspace controller. Sending every event is not feasible.

    Solution: In-Kernel Aggregation with BPF Maps

    Instead of sending an event for every syscall, we can use a BPF hash map to count syscalls per process inside the kernel. We then periodically read this map from userspace.

    Example: Counting openat syscalls (eBPF C code)

    c
    // A map to store counts: key=pid, value=count
    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 10240);
        __type(key, u32);
        __type(value, u64);
    } syscall_counts SEC(".maps");
    
    SEC("kprobe/sys_openat")
    int BPF_KPROBE(handle_openat)
    {
        u32 pid = bpf_get_current_pid_tgid() >> 32;
        u64 *count;
    
        count = bpf_map_lookup_elem(&syscall_counts, &pid);
        if (count) {
            __sync_fetch_and_add(count, 1);
        } else {
            u64 init_val = 1;
            bpf_map_update_elem(&syscall_counts, &pid, &init_val, BPF_ANY);
        }
        return 0;
    }

    In userspace, you would then have a goroutine that iterates over this map every few seconds, reads the counts, and resets them. This reduces data transfer from millions of events per second to a few kilobytes every polling interval.
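
    The userspace half of this pattern can be sketched without touching the kernel. The sketch assumes each poll yields a snapshot of syscall_counts as a Go map (with cilium/ebpf, that snapshot would come from iterating the map). Instead of resetting counters, which can race with concurrent in-kernel increments, it computes per-interval deltas between two snapshots. The name countDeltas and the threshold value are hypothetical.

```go
package main

import "fmt"

// countDeltas returns, for every PID in cur, how many calls were made since prev.
// A PID missing from prev, or whose counter went backwards (for example after
// its entry was deleted and the PID reused), is treated as starting from zero.
func countDeltas(prev, cur map[uint32]uint64) map[uint32]uint64 {
	deltas := make(map[uint32]uint64, len(cur))
	for pid, n := range cur {
		old := prev[pid]
		if n < old {
			old = 0
		}
		deltas[pid] = n - old
	}
	return deltas
}

func main() {
	const threshold = 1000 // hypothetical alert threshold per polling interval

	prev := map[uint32]uint64{100: 50, 200: 10}
	cur := map[uint32]uint64{100: 5050, 200: 12, 300: 7}

	for pid, d := range countDeltas(prev, cur) {
		if d > threshold {
			// Only PID 100 exceeds the threshold (5050 - 50 = 5000).
			fmt.Printf("PID %d made %d openat calls this interval\n", pid, d)
		}
	}
}
```

    A long-running agent would also need to delete entries for exited processes so the map does not fill up to max_entries.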

    2. The Verifier Gauntlet: Writing Safe eBPF Code

    The eBPF verifier is a static analyzer in the kernel that ensures your eBPF program is safe to run. It checks for unbounded loops, out-of-bounds memory access, and null pointer dereferences. Writing verifier-friendly code is an art.

    Common Pitfall: Unbounded String Reads

    The verifier rejects a call to bpf_probe_read_user_str() if it cannot prove that the size argument fits within the destination buffer. Pass a compile-time constant such as sizeof(event.filename), or a value the verifier can see is bounded.

    Common Pitfall: Complex Loops

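    The verifier must be able to prove that every loop terminates. Kernels before 5.3 reject loops altogether, so there the loop must have a compile-time constant bound and be unrolled with #pragma unroll; kernels 5.3 and later also accept loops whose bound the verifier can prove.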

    c
    // Verifier will accept this
    #pragma unroll
    for (int i = 0; i < 10; i++) {
        // ...
    }

    3. Deployment in Kubernetes

    To monitor all containers on a cluster, this agent must run on every node. The standard pattern is to deploy it as a DaemonSet.

    daemonset.yaml Snippet:

    yaml
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: ebpf-security-monitor
    spec:
      # ...
      template:
        spec:
          hostPID: true # Required to see PIDs outside the container's namespace
          containers:
          - name: monitor
            image: my-registry/ebpf-monitor:latest
            securityContext:
              privileged: true # Simplest way, but risky. 
              # Better: use specific capabilities
              # capabilities:
              #   add: ["SYS_ADMIN", "BPF"]
            volumeMounts:
            - name: bpf-fs
              mountPath: /sys/fs/bpf
          volumes:
          - name: bpf-fs
            hostPath:
              path: /sys/fs/bpf

    Key Considerations:

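    * Permissions: eBPF requires powerful capabilities. On kernels 5.8 and later, CAP_BPF together with CAP_PERFMON is generally sufficient for loading and attaching tracing programs such as kprobes; older kernels need CAP_SYS_ADMIN. Running as a privileged container is common but should be done with extreme caution.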

    * BPF FS: The agent needs access to the BPF filesystem (/sys/fs/bpf) to pin maps and programs, allowing them to persist even if the userspace agent restarts.

    Performance Considerations: eBPF vs. The World

    The primary advantage of eBPF is its performance. Let's consider a hypothetical benchmark on a node handling 10,000 requests per second, each request triggering several execve and connect calls.

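    Method                | CPU Overhead (on Node) | Latency Impact (per request) | Intrusiveness | Visibility
    ----------------------|------------------------|------------------------------|---------------|--------------
    eBPF Kprobes          | < 1-2%                 | < 1µs                        | Very Low      | Kernel-level
    ptrace (e.g., strace) | 10-500% (catastrophic) | 100s of µs to ms             | High          | Syscall-level
    Sidecar Proxy (Istio) | 5-15%                  | 1-5ms                        | Medium        | Network-level
    LD_PRELOAD            | 2-10%                  | 10-50µs                      | High          | Libc-level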

    If figures in this range hold for your workload, eBPF offers the best visibility-to-performance ratio of the four: its overhead is far below that of ptrace and noticeably below that of even a highly optimized service mesh proxy, while it observes behavior at the kernel level. Because the benchmark is hypothetical, read the numbers as rough orders of magnitude.

    Conclusion

    eBPF is not just another tool; it's a fundamental capability of the modern Linux kernel that enables a new generation of highly performant and deeply insightful security and observability tools. By hooking directly into kernel operations, we can build runtime security monitors that are more efficient, more comprehensive, and less intrusive than the ptrace, LD_PRELOAD, and sidecar approaches described at the start of this article.

    We have demonstrated a practical, CO-RE based implementation for detecting unauthorized process execution and malicious network activity. We also explored critical production concerns like handling high-volume events, navigating the eBPF verifier, and deploying within a Kubernetes cluster. While we've only scratched the surface, these patterns form the foundation of powerful, next-generation cloud-native security solutions.
