Kernel-Level K8s Runtime Security with eBPF and Custom Falco Rules
The Observability Gap in Ephemeral Infrastructure
In a Kubernetes environment, traditional host-based intrusion detection systems (HIDS) and security agents often fail. They either lack the context of container namespaces and cgroups, leading to noisy and irrelevant alerts, or they require intrusive sidecars and privileged daemons that increase the attack surface and introduce performance overhead. The core problem is observing process and network behavior within a container's isolated context without compromising the host or the container itself.
Static image scanning is necessary but insufficient. It cannot detect zero-day vulnerabilities or threats introduced at runtime, such as a compromised dependency that opens a reverse shell or an application that begins reading sensitive files it shouldn't access. We need to monitor system calls (syscalls) — the fundamental interface between an application and the kernel — to understand a workload's true behavior.
This is where eBPF (extended Berkeley Packet Filter) provides a paradigm shift. By allowing us to run sandboxed programs directly in the kernel, eBPF gives us a safe, performant, and context-aware mechanism to observe every syscall made by any process on the system. We can attach eBPF programs to kernel probes (kprobes) or tracepoints to capture events like execve, openat, and connect without modifying application code or kernel source.
However, writing, compiling, and loading raw eBPF programs using libbpf or BCC is a complex, low-level task. For production runtime security, we need a higher-level abstraction. This is where the CNCF project Falco excels. Falco uses an eBPF probe to collect a stream of syscall events, enriches them with Kubernetes metadata (pod name, namespace, labels), and evaluates them against a powerful, declarative rule engine.
This post will not cover the basics of installing Falco. We assume you have a running Kubernetes cluster and have deployed the Falco Helm chart. Instead, we will focus on the advanced techniques required to make Falco a truly effective runtime security tool in a production environment.
Advanced Falco Configuration for Production
Your default Helm values.yaml is a starting point. For a production deployment, several key areas require careful tuning.
1. Forcing the eBPF Driver
Falco can use either a kernel module (driver kind kmod) or an eBPF probe (kind ebpf). The kernel module was the original collection mechanism, but it can be brittle: it must be rebuilt for each new kernel version, and an incompatible module can panic the node. The eBPF probe is the modern, safer choice, with comparable performance for most workloads.
Ensure you are explicitly using the eBPF driver and that it's correctly configured in your values.yaml.
# values.yaml for Falco Helm Chart
driver:
  enabled: true
  kind: ebpf
  ebpf:
    # Path to a pre-built eBPF probe object. Leave empty to let the
    # falco-driver-loader locate, download, or build a probe for the node's
    # kernel. On modern kernels with BTF available (roughly 5.8+), this
    # usually works out of the box; older or custom-built kernels may need
    # a probe compiled against their exact kernel headers.
    probe: ""

# Resource allocation is critical. Default requests/limits are too low for busy nodes.
falco:
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: 2
      memory: 2Gi
2. Performance Tuning the Syscall Buffer
Falco's eBPF probe uses a per-CPU buffer to pass syscall events from the kernel to the userspace Falco process. If the rate of syscalls on a node is extremely high, this buffer can fill up, leading to dropped events. You'll see a "Falco drop" message in the logs. This is a critical failure, as it means you have a blind spot in your security monitoring.
To mitigate this, you can increase the size of the buffer via an environment variable. The default is 8MB. For nodes running high-throughput applications like databases or message queues, you may need to increase this.
# values.yaml
falco:
  extraEnv:
    - name: SYSDIG_BPF_PROBE_CPU_BUFFER_BYTES
      # Increase buffer from the 8MB default to 32MB
      value: "33554432"
Benchmarking this change is crucial. Increasing the buffer consumes more non-swappable kernel memory on each node. Monitor the memory usage of the Falco daemonset pods and the overall node memory pressure after applying this change. Use Falco's own metrics (falco_stats_sc_evt_drop_perc) exposed via Prometheus to track the drop rate and validate that your tuning is effective.
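As a concrete starting point, here is a minimal sketch of a PrometheusRule that pages on sustained drops. It assumes the Prometheus Operator is installed, that Falco's metrics endpoint is already being scraped, and that the drop-percentage metric carries the name mentioned above; verify the exact metric name exposed by your Falco version before relying on it.
# falco-drops-prometheusrule.yaml (sketch; adjust the namespace to wherever
# your Prometheus Operator watches for rules)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: falco-event-drops
  namespace: monitoring
spec:
  groups:
    - name: falco.drops
      rules:
        - alert: FalcoSyscallEventDrops
          # Fire when any Falco pod reports more than 1% dropped events for 10 minutes
          expr: max by (pod) (falco_stats_sc_evt_drop_perc) > 1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Falco is dropping syscall events on {{ $labels.pod }}"
            description: "Consider increasing the eBPF per-CPU buffer or investigating node load."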
Crafting Context-Aware Custom Rules
Falco's default ruleset is excellent but generic. True value comes from writing rules tailored to your specific applications and security policies. We'll store our custom rules in a ConfigMap and mount it into the Falco pods.
First, configure your values.yaml to load them:
# values.yaml
falco:
  customRules:
    # The key of the ConfigMap entry
    rules.yaml: |-
      # Custom rules will be placed here
Now, let's build some advanced, production-grade rules.
Scenario 1: Detecting Shells in a Distroless Container
A common best practice is to use distroless base images, which contain only the application and its runtime dependencies, omitting shells and other utilities. A shell spawning in such a container is a massive red flag, often indicating a successful remote code execution (RCE) exploit.
A naive rule might just look for execve of bash. A better rule uses Falco's list and macro system and is conditioned on Kubernetes metadata.
# rules.yaml (inside the ConfigMap)

# List of common shell binaries
- list: shell_binaries
  items: [sh, bash, csh, tcsh, ksh, zsh, dash]

# Macro to identify a container that SHOULD be distroless
- macro: distroless_container
  condition: k8s.pod.label.runtime = 'distroless'

# The actual rule
- rule: Unexpected Shell in Distroless Container
  desc: >
    A shell process was spawned in a container that is labeled as distroless.
    This is highly suspicious and could indicate a container escape or RCE.
  condition: >
    spawned_process and
    distroless_container and
    proc.name in (shell_binaries)
  output: >
    Unexpected shell spawned in distroless container
    (user=%user.name command=%proc.cmdline %container.info parent=%proc.pname)
  priority: CRITICAL
  tags: [k8s, runtime, security, mitre_execution]
To test this, apply a label to one of your deployments:
# vulnerable-app-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vulnerable-app
  labels:
    app: vulnerable
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vulnerable
  template:
    metadata:
      labels:
        app: vulnerable
        runtime: distroless # <-- The critical label for our rule
    spec:
      containers:
        - name: main
          # Using a standard ubuntu image to simulate a compromise
          image: ubuntu:latest
          command: ["sleep", "3600"]
Now, kubectl exec into that pod and run bash:
kubectl exec -it $(kubectl get pods -l app=vulnerable -o jsonpath='{.items[0].metadata.name}') -- /bin/bash
Falco will immediately generate a CRITICAL alert:
14:35:01.234567890: Critical Unexpected shell spawned in distroless container (user=root command=bash container.id=... container.name=main k8s.ns.name=default k8s.pod.name=vulnerable-app-... parent=runc)
This rule is powerful because it's context-aware. It won't trigger for pods that are expected to have a shell, reducing alert fatigue.
Scenario 2: Monitoring Service Account Token Access
By default, every Kubernetes pod has a service account token mounted at /var/run/secrets/kubernetes.io/serviceaccount/token. If an attacker gains RCE, one of their first actions is often to exfiltrate this token to pivot and attack the Kubernetes API server.
We want to detect any process that reads this token, except for legitimate processes that need it (e.g., a service mesh client, a metrics scraper, or the application itself on startup).
# rules.yaml

- macro: k8s_sa_token_read
  condition: >
    (open_read or open_directory) and
    fd.name contains /var/run/secrets/kubernetes.io/serviceaccount

- list: legitimate_sa_token_readers
  items: [istio-agent, linkerd-proxy, prometheus, jaeger-agent, my-app-binary]

- rule: Suspicious K8s Service Account Token Read
  desc: >
    A process read the K8s service account token. This is often a precursor to privilege escalation.
    Whitelist legitimate processes in the 'legitimate_sa_token_readers' list.
  condition: k8s_sa_token_read and not proc.name in (legitimate_sa_token_readers)
  output: >
    Suspicious read of K8s SA token
    (user=%user.name command=%proc.cmdline file=%fd.name %container.info)
  priority: WARNING
  tags: [k8s, security, mitre_credential_access]
This rule demonstrates the power of whitelisting. By maintaining a list of known-good processes, you can create a high-fidelity alert that only triggers on anomalous behavior. When a new legitimate tool is introduced, you simply update the list.
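Because Falco lists support append: true, you can keep such additions in a separate rules file that is loaded after the base rules, rather than editing the original list in place. A minimal sketch, using vault-agent as a hypothetical newly introduced reader:
# rules_allowlist_additions.yaml (loaded after rules.yaml)
- list: legitimate_sa_token_readers
  append: true
  # Hypothetical example: a secrets sidecar that legitimately reads the token on startup
  items: [vault-agent]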
Scenario 3: Detecting Outbound Connections to Crypto-Mining Pools
Cryptojacking is a common attack where a compromised container is used to mine cryptocurrency. This often involves an outbound network connection to a known mining pool address on a specific port.
# rules.yaml

- list: crypto_miner_domains
  items: ["pool.monero.hashvault.pro", "xmr-us-west1.nanopool.org", "ca.minexmr.com"]

- list: crypto_miner_ports
  items: [3333, 4444, 5555, 6666, 7777, 8888, 14444]

- rule: Outbound Connection to Crypto-Mining Pool
  desc: >
    An outbound network connection was made to a known crypto-mining pool domain.
  condition: >
    outbound and
    fd.sip.name in (crypto_miner_domains) and
    fd.sport in (crypto_miner_ports)
    # To match raw IPs as well, maintain a separate list of addresses and compare it against fd.sip.
  output: >
    Outbound connection to crypto-mining pool detected
    (user=%user.name command=%proc.cmdline connection=%fd.name %container.info)
  priority: CRITICAL
  tags: [network, security, mitre_impact]
To make this rule even more robust in a production environment, you would likely use a threat intelligence feed to dynamically populate the crypto_miner_domains list, rather than hardcoding it.
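One lightweight way to do that is to have a CI job or init container render an extra rules file from the feed and load it after the base rules, appending to the list rather than replacing it. A hypothetical generated fragment (placeholder entries, not real feed data) might look like this:
# rules_threatintel_generated.yaml (regenerated periodically from your threat-intel feed)
- list: crypto_miner_domains
  append: true
  items: ["feed-entry-1.example.com", "feed-entry-2.example.com"]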
Advanced Edge Case: Handling False Positives with Overrides
Even with well-crafted rules, false positives are inevitable. A developer might run a diagnostic script that reads a sensitive file, or a new version of an application might change its behavior. Falco's rule override mechanism is essential for managing this without disabling a rule entirely.
Let's say our Suspicious K8s Service Account Token Read rule is triggering for a nightly backup script called kube-backup.sh.
Instead of adding kube-backup.sh to the global whitelist, which might be too permissive, we can create a more specific exception.
Create a new file for overrides, e.g., rules_override.yaml:
# rules_override.yaml
- rule: Suspicious K8s Service Account Token Read
  append: true # This APPENDS to the existing rule's condition
  condition: >
    and not (proc.name = 'kube-backup.sh' and container.image.repository = 'my-org/backup-tools')
In your values.yaml, load this file after the main rules file. The append: true flag tells Falco to append the extra clause to the existing rule's condition rather than replace it. Here, we've created a highly specific exception: the rule will not trigger if the process name is kube-backup.sh AND it's running from a specific, trusted container image.
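Newer Falco releases also provide a structured exceptions field on rules, which is less error-prone than appending raw boolean logic. If you control the base rule, the same carve-out can be expressed directly on it; a sketch reusing the rule from Scenario 2 (check the exceptions syntax against your Falco version):
# rules.yaml (base rule expressed with a structured exception instead of a condition override)
- rule: Suspicious K8s Service Account Token Read
  desc: >
    A process read the K8s service account token. This is often a precursor to privilege escalation.
  condition: k8s_sa_token_read and not proc.name in (legitimate_sa_token_readers)
  exceptions:
    # Each exception names the fields to compare and the value tuples that are allowed
    - name: trusted_backup_job
      fields: [proc.name, container.image.repository]
      comps: [=, =]
      values:
        - [kube-backup.sh, my-org/backup-tools]
  output: >
    Suspicious read of K8s SA token (user=%user.name command=%proc.cmdline file=%fd.name %container.info)
  priority: WARNING
  tags: [k8s, security, mitre_credential_access]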
This granular approach is critical for maintaining a strong security posture while adapting to the operational realities of a complex system.
Integrating Alerts with Production Systems
Alerts that only land in a pod's stdout are of little use on their own. Falco needs to integrate with your existing monitoring and incident response workflows, and the standard way to do this is with falcosidekick.
falcosidekick is a small proxy that receives alerts from Falco and forwards them to dozens of possible outputs like Slack, PagerDuty, Elasticsearch, Loki, or a generic webhook.
Enable it in your values.yaml and configure your desired output.
# values.yaml
falcosidekick:
  enabled: true
  # Pod resources for falcosidekick itself
  resources: {}
  webui:
    enabled: false # Disable for production unless needed
  config:
    # Example: Sending alerts to Slack and an Elasticsearch cluster
    slack:
      webhookurl: "YOUR_SLACK_WEBHOOK_URL"
      # You can customize the message format
      outputformat: "Time: %falco.time%\nRule: %falco.rule%\nPriority: %falco.priority%\nPod: %k8s.pod.name% (%k8s.ns.name%)\nCommand: %proc.cmdline%\nUser: %user.name%"
    elasticsearch:
      hostport: "http://elasticsearch-master:9200"
      index: "falco"
      type: "events"
      minimumpriority: "debug"
      # Buffer settings for high-volume environments
      buffer_size: 1000
      buffer_max_payload_size: 512
By shipping structured JSON events to your SIEM (Security Information and Event Management) system, like Elasticsearch, you unlock the ability to perform advanced analysis:
* Correlation: Correlate Falco runtime events with application logs, network flow data, and other telemetry.
* Dashboarding: Create dashboards to visualize threat trends, top-triggering rules, and most-targeted pods.
* Automated Response: Set up automated actions based on specific high-priority alerts, such as cordoning a Kubernetes node, scaling down a compromised deployment, or triggering a memory dump for forensic analysis.
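To sketch that last point: falcosidekick's generic webhook output can forward only the highest-priority events to an in-cluster responder service that performs the remediation. The service name and endpoint below are hypothetical; the responder itself is something you would build and authorize carefully.
# values.yaml (sketch; incident-responder is a hypothetical service you operate)
falcosidekick:
  config:
    webhook:
      address: "http://incident-responder.security-tools.svc.cluster.local:8080/falco"
      # Only CRITICAL and higher priorities reach the automation path
      minimumpriority: "critical"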
Conclusion: From Detection to Defense
eBPF and Falco provide an unprecedented level of visibility into the runtime behavior of your Kubernetes workloads. By moving beyond default configurations and implementing these advanced patterns, you can transform Falco from a noisy observability tool into a high-fidelity, context-aware runtime security engine.
The key takeaway for senior engineers: runtime security is not a one-time setup. It's an iterative process of observing behavior, refining rules, and reducing noise. By mastering these advanced eBPF and Falco techniques, you can build a robust, kernel-level defense layer that is purpose-built for the dynamic, ephemeral nature of modern cloud-native applications.