Kernel-Level K8s Runtime Security with eBPF and Custom Falco Rules

Goh Ling Yong

The Observability Gap in Ephemeral Infrastructure

In a Kubernetes environment, traditional host-based intrusion detection systems (HIDS) and security agents often fail. They either lack the context of container namespaces and cgroups, leading to noisy and irrelevant alerts, or they require intrusive sidecars and privileged daemons that increase the attack surface and introduce performance overhead. The core problem is observing process and network behavior within a container's isolated context without compromising the host or the container itself.

Static image scanning is necessary but insufficient. It cannot detect zero-day vulnerabilities or threats introduced at runtime, such as a compromised dependency that opens a reverse shell or an application that begins reading sensitive files it shouldn't access. We need to monitor system calls (syscalls) — the fundamental interface between an application and the kernel — to understand a workload's true behavior.

This is where eBPF (extended Berkeley Packet Filter) provides a paradigm shift. By allowing us to run sandboxed programs directly in the kernel, eBPF gives us a safe, performant, and context-aware mechanism to observe every syscall made by any process on the system. We can attach eBPF programs to kernel probes (kprobes) or tracepoints to capture events like execve, openat, and connect without modifying application code or kernel source.
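
To make the mechanism concrete, here is a minimal illustration, separate from Falco itself, of the kind of hook its probe relies on: a bpftrace one-liner that attaches to the execve tracepoint and prints every process execution on the node. It assumes bpftrace is installed on the host and that you have root access.

bash
# Print every execve on the node: PID, the command name of the caller, and the binary executed.
# Requires bpftrace and root privileges; press Ctrl-C to stop.
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%d %s -> %s\n", pid, comm, str(args->filename)); }'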

However, writing, compiling, and loading raw eBPF programs using libbpf or BCC is a complex, low-level task. For production runtime security, we need a higher-level abstraction. This is where the CNCF project Falco excels. Falco uses an eBPF probe to collect a stream of syscall events, enriches them with Kubernetes metadata (pod name, namespace, labels), and evaluates them against a powerful, declarative rule engine.

This post will not cover the basics of installing Falco. We assume you have a running Kubernetes cluster and have deployed the Falco Helm chart. Instead, we will focus on the advanced techniques required to make Falco a truly effective runtime security tool in a production environment.

Advanced Falco Configuration for Production

Your default Helm values.yaml is a starting point. For a production deployment, several key areas require careful tuning.

1. Forcing the eBPF Driver

Falco can use either a kernel module (falco-kmod) or an eBPF probe (falco-bpf). The kernel module was the original method, but it can be brittle, requiring recompilation for each new kernel version and potentially causing kernel panics if incompatible. The eBPF probe is the modern, safer, and often more performant choice.

Ensure you are explicitly using the eBPF driver and that it's correctly configured in your values.yaml.

yaml
# values.yaml for Falco Helm Chart

driver:
  enabled: true
  kind: ebpf

ebpf:
  # Path to a pre-built eBPF probe object on the node. Leave this empty to let
  # Falco's driver loader fetch or build a probe that matches the running kernel.
  # On modern kernels (5.8+ with BTF available), this rarely needs to be set.
  probe: ""

# Resource allocation is critical. Default requests/limits are too low for busy nodes.
falco:
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: 2
      memory: 2Gi
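
Before relying on the eBPF driver, it is worth confirming what your nodes actually provide. A quick check from a node shell or a privileged debug pod:

bash
# If this file exists, the kernel exposes BTF type information
ls -lh /sys/kernel/btf/vmlinux

# The kernel version is a useful sanity check as well (BTF is standard on most 5.8+ kernels)
uname -r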

2. Performance Tuning the Syscall Buffer

Falco's eBPF probe uses a per-CPU buffer to pass syscall events from the kernel to the userspace Falco process. If the syscall rate on a node is extremely high, this buffer can fill up and events get dropped; Falco reports this with "syscall event drop" messages in its logs. Drops are a critical failure, because every dropped event is a blind spot in your security monitoring.

To mitigate this, increase the syscall buffer size. The default per-CPU buffer is 8MB; recent Falco releases expose this through the syscall_buf_size_preset setting in falco.yaml, which the Helm chart renders from the falco: section of values.yaml. For nodes running high-throughput applications like databases or message queues, you may need to go higher.

yaml
# values.yaml

falco:
  # Preset 4 is the 8MB default; higher presets allocate progressively larger
  # per-CPU buffers (see the syscall_buf_size_preset documentation for exact sizes).
  syscall_buf_size_preset: 6

Benchmarking this change is crucial. A larger buffer consumes more non-swappable kernel memory on every node, so monitor the memory usage of the Falco DaemonSet pods and overall node memory pressure after applying it. Use Falco's own drop statistics, namely the syscall event drop counters it logs (tunable via syscall_event_drops in falco.yaml) and the drop metrics it can expose for scraping, to track the drop rate and validate that your tuning is effective.
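
A quick way to spot drops before any dashboards exist is to grep the DaemonSet logs directly; the namespace and label selector below assume a default Helm install:

bash
# Look for syscall event drop messages across the Falco pods
kubectl logs -n falco -l app.kubernetes.io/name=falco --tail=500 | grep -i "drop"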

Crafting Context-Aware Custom Rules

Falco's default ruleset is excellent but generic. True value comes from writing rules tailored to your specific applications and security policies. We'll store our custom rules in a ConfigMap and mount it into the Falco pods.

First, configure your values.yaml to load them:

yaml
# values.yaml

falco:
  customRules:
    # The key of the ConfigMap entry
    rules.yaml: |-
      # Custom rules will be placed here
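
Before any custom rule ships to the cluster, it is worth validating it locally with the Falco CLI. The -V flag reads and checks a rules file without starting the engine and can be repeated, so passing the default ruleset first lets shared macros such as spawned_process resolve (paths assume a standard Falco install):

bash
# Validate the default rules plus the custom file; exits non-zero on errors
falco -V /etc/falco/falco_rules.yaml -V rules.yaml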

Now, let's build some advanced, production-grade rules.

Scenario 1: Detecting Shells in a Distroless Container

A common best practice is to use distroless base images, which contain only the application and its runtime dependencies, omitting shells and other utilities. A shell spawning in such a container is a massive red flag, often indicating a successful remote code execution (RCE) exploit.

A naive rule might just look for execve of bash. A better rule uses Falco's list and macro system and is conditioned on Kubernetes metadata.

yaml
# rules.yaml (inside the ConfigMap)

# List of common shell binaries
- list: shell_binaries
  items: [sh, bash, csh, tcsh, ksh, zsh, dash]

# Macro to identify a container that SHOULD be distroless
- macro: distroless_container
  condition: k8s.pod.label.runtime = 'distroless'

# The actual rule
- rule: Unexpected Shell in Distroless Container
  desc: >
    A shell process was spawned in a container that is labeled as distroless.
    This is highly suspicious and could indicate a container escape or RCE.
  condition: >
    spawned_process and
    distroless_container and
    proc.name in (shell_binaries)
  output: >
    Unexpected shell spawned in distroless container (user=%user.name command=%proc.cmdline %container.info parent=%proc.pname)
  priority: CRITICAL
  tags: [k8s, runtime, security, mitre_execution]

To test this, apply a label to one of your deployments:

yaml
# vulnerable-app-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vulnerable-app
  labels:
    app: vulnerable
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vulnerable
  template:
    metadata:
      labels:
        app: vulnerable
        runtime: distroless # <-- The critical label for our rule
    spec:
      containers:
      - name: main
        # Using a standard ubuntu image to simulate a compromise
        image: ubuntu:latest
        command: ["sleep", "3600"]

Now, kubectl exec into that pod and run bash:

bash
kubectl exec -it $(kubectl get pods -l app=vulnerable -o jsonpath='{.items[0].metadata.name}') -- /bin/bash

Falco will immediately generate a CRITICAL alert:

text
14:35:01.234567890: Critical Unexpected shell spawned in distroless container (user=root command=bash container.id=... container.name=main k8s.ns.name=default k8s.pod.name=vulnerable-app-... parent=runc)

This rule is powerful because it's context-aware. It won't trigger for pods that are expected to have a shell, reducing alert fatigue.
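
Rolling this convention out to existing workloads is just a labeling exercise. The label must land on the pod template, not only on the Deployment object; my-app below is a placeholder name:

bash
# Add runtime=distroless to the pod template of an existing deployment;
# changing the template triggers a rolling restart so new pods carry the label
kubectl patch deployment my-app --type merge \
  -p '{"spec":{"template":{"metadata":{"labels":{"runtime":"distroless"}}}}}'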

Scenario 2: Monitoring Service Account Token Access

By default, every Kubernetes pod has a service account token mounted at /var/run/secrets/kubernetes.io/serviceaccount/token. If an attacker gains RCE, one of their first actions is often to exfiltrate this token and use it to pivot against the Kubernetes API server.

We want to detect any process that reads this token, except for legitimate processes that need it (e.g., a service mesh client, a metrics scraper, or the application itself on startup).

yaml
# rules.yaml

- macro: k8s_sa_token_read
  condition: >
    (open_read or open_directory) and
    fd.name contains /var/run/secrets/kubernetes.io/serviceaccount

- list: legitimate_sa_token_readers
  items: [istio-agent, linkerd-proxy, prometheus, jaeger-agent, my-app-binary]

- rule: Suspicious K8s Service Account Token Read
  desc: >
    A process read the K8s service account token. This is often a precursor to privilege escalation.
    Whitelist legitimate processes in the 'legitimate_sa_token_readers' list.
  condition: k8s_sa_token_read and not proc.name in (legitimate_sa_token_readers)
  output: >
    Suspicious read of K8s SA token (user=%user.name command=%proc.cmdline file=%fd.name %container.info)
  priority: WARNING
  tags: [k8s, security, mitre_credential_access]

This rule demonstrates the power of whitelisting. By maintaining a list of known-good processes, you can create a high-fidelity alert that only triggers on anomalous behavior. When a new legitimate tool is introduced, you simply update the list.
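
To confirm the rule fires end-to-end, deliberately read the token with a process that is not in the whitelist, reusing the vulnerable-app deployment from the earlier scenario:

bash
# cat is not in legitimate_sa_token_readers, so this should raise a WARNING alert;
# the redirect just avoids echoing the token to your terminal
kubectl exec deploy/vulnerable-app -- \
  cat /var/run/secrets/kubernetes.io/serviceaccount/token > /dev/null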

Scenario 3: Detecting Outbound Connections to Crypto-Mining Pools

Cryptojacking is a common attack where a compromised container is used to mine cryptocurrency. This often involves an outbound network connection to a known mining pool address on a specific port.

yaml
# rules.yaml

- list: crypto_miner_domains
  items: ["pool.monero.hashvault.pro", "xmr-us-west1.nanopool.org", "ca.minexmr.com"]

- list: crypto_miner_ports
  items: [3333, 4444, 5555, 6666, 7777, 8888, 14444]

- rule: Outbound Connection to Crypto-Mining Pool
  desc: >
    An outbound network connection was made to a known crypto-mining pool domain or IP.
  condition: >
    outbound and
    fd.sip.name in (crypto_miner_domains) and
    fd.sport in (crypto_miner_ports)
  output: >
    Outbound connection to crypto-mining pool detected (user=%user.name command=%proc.cmdline connection=%fd.name %container.info)
  priority: CRITICAL
  tags: [network, security, mitre_impact]

To make this rule even more robust in a production environment, you would likely use a threat intelligence feed to dynamically populate the crypto_miner_domains list (plus a companion IP list matched against fd.rip for pools contacted by address), rather than hardcoding it.
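
As a hedged sketch of that approach, the script below rebuilds the list from a feed and pushes it into a ConfigMap. The feed URL, ConfigMap name, and namespace are placeholders, and it assumes the Falco pods mount that ConfigMap into their rules directory:

bash
#!/usr/bin/env bash
# Rebuild the crypto_miner_domains list from a threat-intel feed (one domain per
# line) and sync it into a ConfigMap mounted alongside the other custom rules.
set -euo pipefail
FEED_URL="https://threat-intel.example.com/mining-pools.txt"   # placeholder feed
DOMAINS=$(curl -fsSL "$FEED_URL" | grep . | sed 's/.*/"&"/' | paste -sd, -)
cat > miner_domains.yaml <<EOF
- list: crypto_miner_domains
  items: [${DOMAINS}]
EOF
kubectl -n falco create configmap falco-miner-domains \
  --from-file=miner_domains.yaml --dry-run=client -o yaml | kubectl apply -f -

Because Falco can watch its rules files for changes (watch_config_files in falco.yaml), a refreshed list can take effect without restarting the DaemonSet.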

Advanced Edge Case: Handling False Positives with Overrides

Even with well-crafted rules, false positives are inevitable. A developer might run a diagnostic script that reads a sensitive file, or a new version of an application might change its behavior. Falco's rule override mechanism is essential for managing this without disabling a rule entirely.

Let's say our Suspicious K8s Service Account Token Read rule is triggering for a nightly backup script called kube-backup.sh.

Instead of adding kube-backup.sh to the global whitelist, which might be too permissive, we can create a more specific exception.

Create a new file for overrides, e.g., rules_override.yaml:

yaml
# rules_override.yaml

- rule: Suspicious K8s Service Account Token Read
  append: true # Appends to the existing rule's condition instead of replacing it
  condition: >
    and not (proc.name = 'kube-backup.sh' and container.image.repository = 'my-org/backup-tools')

In your values.yaml, load this file after the main rules file. The append: true flag tells Falco to append the extra clause to the existing rule's condition rather than replace it, so you only restate the exception, not the whole condition. Here, we've created a highly specific exception: the rule will not trigger if the process name is kube-backup.sh AND it's running from a specific, trusted container image.
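
Mechanically, that means adding rules_override.yaml as a second key under falco.customRules and rolling out the release. Falco loads files in a rules directory in alphabetical order, so name the override file so it sorts after the base rules file. The release name and namespace below are assumptions from a default Helm install:

bash
# Apply the updated values, including the new customRules entry
helm upgrade falco falcosecurity/falco -n falco -f values.yaml

# Confirm that both rules files were loaded
kubectl logs -n falco daemonset/falco | grep -i "loading rules"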

This granular approach is critical for maintaining a strong security posture while adapting to the operational realities of a complex system.

Integrating Alerts with Production Systems

Alerts that never leave stdout are effectively useless. Falco needs to integrate with your existing monitoring and incident response workflows, and the standard way to do this is with falcosidekick.

falcosidekick is a small proxy that receives alerts from Falco and forwards them to dozens of possible outputs like Slack, PagerDuty, Elasticsearch, Loki, or a generic webhook.

Enable it in your values.yaml and configure your desired output.

yaml
# values.yaml

falcosidekick:
  enabled: true
  # Pod resources for falcosidekick itself
  resources: {}
  webui:
    enabled: false # Disable for production unless needed
  config:
    # Example: Sending alerts to Slack and an Elasticsearch cluster
    slack:
      webhookurl: "YOUR_SLACK_WEBHOOK_URL"
      # minimumpriority filters what gets posted; messageformat (a Go template)
      # can further customize the message text if needed.
      minimumpriority: "error"
    elasticsearch:
      hostport: "http://elasticsearch-master:9200"
      index: "falco"
      type: "events"
      minimumpriority: "debug"
      # Buffer settings for high-volume environments
      buffer_size: 1000
      buffer_max_payload_size: 512

By shipping structured JSON events to your SIEM (Security Information and Event Management) system, like Elasticsearch, you unlock the ability to perform advanced analysis:

* Correlation: Correlate Falco runtime events with application logs, network flow data, and other telemetry.

* Dashboarding: Create dashboards to visualize threat trends, top-triggering rules, and most-targeted pods.

* Automated Response: Set up automated actions based on specific high-priority alerts, such as cordoning a Kubernetes node, scaling down a compromised deployment, or triggering a memory dump for forensic analysis. A minimal response-hook sketch follows this list.
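
As an illustration of that last point, here is a heavily hedged sketch of a response hook: a small script that a webhook receiver could invoke with the Falco alert JSON on stdin, deleting the offending pod for one specific critical rule. It assumes the rule's output string has been extended to include %k8s.ns.name% and %k8s.pod.name% (so they appear in output_fields) and that jq is available; treat it as a starting point, not a drop-in remediation.

bash
#!/usr/bin/env bash
# Hedged sketch: read a Falco alert (JSON on stdin, e.g. forwarded by a webhook
# receiver) and delete the offending pod for one specific critical rule.
set -euo pipefail
ALERT=$(cat)
RULE=$(jq -r '.rule' <<< "$ALERT")
NS=$(jq -r '.output_fields["k8s.ns.name"] // empty' <<< "$ALERT")
POD=$(jq -r '.output_fields["k8s.pod.name"] // empty' <<< "$ALERT")
if [[ "$RULE" == "Unexpected Shell in Distroless Container" && -n "$NS" && -n "$POD" ]]; then
  kubectl delete pod -n "$NS" "$POD" --wait=false
fi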

Conclusion: From Detection to Defense

eBPF and Falco provide an unprecedented level of visibility into the runtime behavior of your Kubernetes workloads. By moving beyond default configurations and implementing these advanced patterns, you can transform Falco from a noisy observability tool into a high-fidelity, context-aware runtime security engine.

The key takeaways for senior engineers are:

  • Tune for Performance: Don't accept the defaults. Profile your nodes, monitor for dropped events, and adjust buffer sizes and resource allocations to ensure complete visibility without impacting performance.
  • Context is King: Write rules that leverage Kubernetes metadata. A rule conditioned on a pod label, namespace, or service account is infinitely more valuable than a generic process or file-based rule.
  • Embrace the Override: False positives will happen. Use the override mechanism to create granular exceptions that maintain security integrity while allowing for legitimate operational behavior.
  • Integrate and Automate: A security alert that isn't seen or acted upon is worthless. Pipe Falco events into your central SIEM and build dashboards, correlation rules, and automated response playbooks.

Runtime security is not a one-time setup. It's an iterative process of observing behavior, refining rules, and reducing noise. By mastering these advanced eBPF and Falco techniques, you can build a robust, kernel-level defense layer that is purpose-built for the dynamic, ephemeral nature of modern cloud-native applications.
