eBPF-based TLS Sniffing for Istio Service Mesh Observability
The Observability Gap in Encrypted Service Meshes
In a standard Istio service mesh, mutual TLS (mTLS) is the bedrock of inter-service security. While invaluable, it creates an observability challenge. The Envoy sidecar proxy, which handles traffic encryption and decryption, is the primary source of L7 telemetry. However, this model has inherent limitations that become critical in high-performance or complex environments:
*   Telemetry covers only traffic that actually traverses the sidecar, and proxy-based collection adds its own latency and CPU cost.
*   When traffic is configured to bypass the proxy's L7 processing (e.g., TLS_PASSTHROUGH), the mesh loses all L7 visibility.

Traditional Application Performance Monitoring (APM) agents can solve this, but they require language-specific instrumentation, introduce their own overhead, and increase the application's dependency footprint. We need a more universal, lower-level, and less intrusive method.
This is where eBPF (extended Berkeley Packet Filter) provides a paradigm shift. By executing sandboxed programs within the Linux kernel, we can gain unprecedented visibility into system and application behavior. Instead of relying on proxy logs or application-level agents, we can intercept data at the source: the moment it's read from or written to a TLS library by the application, but before it's encrypted and sent to the network stack.
This article focuses on a specific, powerful technique: using eBPF uprobes (user-space probes) to hook into the OpenSSL shared library functions (SSL_write and SSL_read) used by a microservice running in an Istio-enabled Kubernetes pod. This provides raw, unencrypted payload data with minimal performance impact.
The eBPF Strategy: Intercepting TLS at the Source
Our goal is to capture the plaintext data of an RPC call (e.g., a gRPC request) made by an application before it gets encrypted by TLS. Sniffing packets at the kernel's network layer (tcp_sendmsg, tcp_recvmsg) using kprobes is insufficient, as the data is already encrypted by the time it reaches this level.
The solution is to move our probes from kernel space to user space, targeting the shared library responsible for TLS encryption. For a vast number of applications, including those written in Python, Ruby, Node.js, and C/C++, this library is OpenSSL (libssl.so). (Go is a notable exception: it statically links its own crypto/tls implementation, so it requires probing symbols inside the application binary rather than libssl.so.)
The core functions for handling application data in OpenSSL are:
*   SSL_write(SSL *ssl, const void *buf, int num): the application calls this to send plaintext data.
*   SSL_read(SSL *ssl, void *buf, int num): the application calls this to receive plaintext data.
Our strategy will be to attach eBPF programs to the entry and exit points of these two functions for any target process.
*   SSL_write entry (uprobe): When the application calls SSL_write, our eBPF program executes. We capture the pointer to the data buffer (buf) and store it in an eBPF map, keyed by the unique thread identifier (pid_tgid). The requested length (num) is less interesting, since only the return value tells us how much was actually written.
*   SSL_write exit (uretprobe): When SSL_write returns, our exit probe executes and retrieves the stored buffer pointer from the map. The function's return value tells us how many bytes were actually written, so we read exactly that much data from the user-space buffer and submit it to our user-space controller via a ring buffer.
*   SSL_read: The process is analogous for SSL_read, allowing us to capture response data.

This approach is powerful because it's application-agnostic. As long as the application dynamically links OpenSSL, we can observe its traffic without any code changes.
The Challenge of State and CO-RE
This isn't without its challenges. We need to handle:
*   Library Versions: Different container images might use different versions of OpenSSL. Hardcoding offsets is brittle. We will use eBPF CO-RE (Compile Once - Run Everywhere) with BTF (BPF Type Format) to make our eBPF program portable.
*   Finding the Library: We need to locate the path to libssl.so within the container's mount namespace from our host-level BPF controller.
*   Asynchronous I/O: A single logical message (e.g., a large gRPC payload) might be broken into multiple SSL_write calls. Our BPF program needs to manage this state, buffering partial data in a map until a complete message is formed.
Implementation: eBPF Program and User-space Controller
We will use libbpf for our C-based eBPF program and Go for our user-space controller, as this combination is common in production cloud-native tooling (e.g., Cilium).
1. The eBPF C Program (`bpf_tls_sniffer.c`)
This program defines the probes, maps, and logic for capturing data. It's designed with CO-RE in mind.
// +build ignore
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
// Define a structure to hold data sent to user-space
struct tls_data_event {
    u64 timestamp_ns;
    u32 pid;
    u32 tid;
    char comm[16];
    u8 is_write; // 1 for SSL_write, 0 for SSL_read
    u32 data_len;
    u8 data[4096]; // Max data to capture per event
};
// BPF ring buffer for sending data to user-space
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024); // 256 KB
} events SEC(".maps");
// Map to store buffer pointers between function entry and exit
struct ssl_write_args_t {
    const char *buf;
};
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, u64);
    __type(value, struct ssl_write_args_t);
} active_ssl_write_args SEC(".maps");
struct ssl_read_args_t {
    char *buf;
};
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, u64);
    __type(value, struct ssl_read_args_t);
} active_ssl_read_args SEC(".maps");
// --- SSL_write Probes ---
SEC("uprobe//usr/lib/x86_64-linux-gnu/libssl.so.3:SSL_write")
int BPF_KPROBE(uprobe_ssl_write, void *ssl, const void *buf, int num) {
    u64 id = bpf_get_current_pid_tgid();
    struct ssl_write_args_t args = {};
    args.buf = (const char*)buf;
    bpf_map_update_elem(&active_ssl_write_args, &id, &args, BPF_ANY);
    return 0;
}
SEC("uretprobe//usr/lib/x86_64-linux-gnu/libssl.so.3:SSL_write")
int BPF_KRETPROBE(uretprobe_ssl_write, int ret) {
    u64 id = bpf_get_current_pid_tgid();
    struct ssl_write_args_t *args = bpf_map_lookup_elem(&active_ssl_write_args, &id);
    if (!args) {
        return 0;
    }
    bpf_map_delete_elem(&active_ssl_write_args, &id);
    if (ret <= 0) { // Error or no data written
        return 0;
    }
    struct tls_data_event *event = bpf_ringbuf_reserve(&events, sizeof(*event), 0);
    if (!event) {
        return 0;
    }
    // Clamp the copy length in a local variable so the verifier can prove
    // it never exceeds the destination buffer.
    u32 len = (u32)ret;
    if (len > sizeof(event->data)) {
        len = sizeof(event->data);
    }
    event->timestamp_ns = bpf_ktime_get_ns();
    event->pid = id >> 32;
    event->tid = (u32)id;
    bpf_get_current_comm(&event->comm, sizeof(event->comm));
    event->is_write = 1;
    event->data_len = len;
    bpf_probe_read_user(&event->data, len, args->buf);
    bpf_ringbuf_submit(event, 0);
    return 0;
}
// --- SSL_read Probes ---
SEC("uprobe//usr/lib/x86_64-linux-gnu/libssl.so.3:SSL_read")
int BPF_KPROBE(uprobe_ssl_read, void *ssl, void *buf, int num) {
    u64 id = bpf_get_current_pid_tgid();
    struct ssl_read_args_t args = {};
    args.buf = (char*)buf;
    bpf_map_update_elem(&active_ssl_read_args, &id, &args, BPF_ANY);
    return 0;
}
SEC("uretprobe//usr/lib/x86_64-linux-gnu/libssl.so.3:SSL_read")
int BPF_KRETPROBE(uretprobe_ssl_read, int ret) {
    u64 id = bpf_get_current_pid_tgid();
    struct ssl_read_args_t *args = bpf_map_lookup_elem(&active_ssl_read_args, &id);
    if (!args) {
        return 0;
    }
    bpf_map_delete_elem(&active_ssl_read_args, &id);
    if (ret <= 0) { // Error or connection closed
        return 0;
    }
    struct tls_data_event *event = bpf_ringbuf_reserve(&events, sizeof(*event), 0);
    if (!event) {
        return 0;
    }
    // Clamp the copy length in a local variable so the verifier can prove
    // it never exceeds the destination buffer.
    u32 len = (u32)ret;
    if (len > sizeof(event->data)) {
        len = sizeof(event->data);
    }
    event->timestamp_ns = bpf_ktime_get_ns();
    event->pid = id >> 32;
    event->tid = (u32)id;
    bpf_get_current_comm(&event->comm, sizeof(event->comm));
    event->is_write = 0;
    event->data_len = len;
    bpf_probe_read_user(&event->data, len, args->buf);
    bpf_ringbuf_submit(event, 0);
    return 0;
}
char LICENSE[] SEC("license") = "GPL";

Key Points in the BPF Code:
*   vmlinux.h: This header is generated by bpftool (bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h) and contains kernel type definitions, essential for CO-RE.
*   uprobe/.../libssl.so.3:SSL_write: The SEC macro defines the probe type and target. Note the hardcoded path. We will address this in the deployment section.
*   BPF Maps: We use separate hash maps (active_ssl_write_args, active_ssl_read_args) to pass the buffer pointer from the entry probe to the return probe. The key is the full pid_tgid value (thread group ID in the upper 32 bits, thread ID in the lower 32), so concurrent threads in the same process never clobber each other's in-flight arguments.
*   bpf_ringbuf_reserve/submit: We use a modern ring buffer for efficient data transfer to user-space, which is superior to the older perf buffer mechanism.
*   bpf_probe_read_user: This helper safely reads memory from the probed process's user-space address space into the BPF program's stack.
2. The Go User-space Controller
This Go application will load the compiled eBPF object, attach the probes, and listen for data.
//go:build linux
package main
import (
	"bytes"
	"encoding/binary"
	"log"
	"os"
	"os/signal"
	"syscall"
	"github.com/cilium/ebpf/link"
	"github.com/cilium/ebpf/ringbuf"
	"github.com/cilium/ebpf/rlimit"
)
//go:generate go run github.com/cilium/ebpf/cmd/bpf2go -cc clang bpf ./bpf_tls_sniffer.c -- -I./headers
// This struct must match the C struct exactly.
type tlsDataEvent struct {
	TimestampNs uint64
	Pid         uint32
	Tid         uint32
	Comm        [16]byte
	IsWrite     uint8
	_           [3]byte // Padding
	DataLen     uint32
	Data        [4096]byte
}
func main() {
	stopper := make(chan os.Signal, 1)
	signal.Notify(stopper, os.Interrupt, syscall.SIGTERM)
	// Allow the current process to lock memory for BPF maps.
	if err := rlimit.RemoveMemlock(); err != nil {
		log.Fatal(err)
	}
	// Load pre-compiled programs and maps into the kernel.
	objs := bpfObjects{}
	if err := loadBpfObjects(&objs, nil); err != nil {
		log.Fatalf("loading objects: %v", err)
	}
	defer objs.Close()
	// --- Attach Probes ---
	// IMPORTANT: The path to libssl.so must be correct for the target container.
	// This is a simplified example. Production code needs to resolve this dynamically.
	ex, err := link.OpenExecutable("/usr/lib/x86_64-linux-gnu/libssl.so.3")
	if err != nil {
		log.Fatalf("opening executable: %s", err)
	}
	upWrite, err := ex.Uprobe("SSL_write", objs.UprobeSslWrite, nil)
	if err != nil {
		log.Fatalf("creating uprobe for SSL_write: %s", err)
	}
	defer upWrite.Close()
	urpWrite, err := ex.Uretprobe("SSL_write", objs.UretprobeSslWrite, nil)
	if err != nil {
		log.Fatalf("creating uretprobe for SSL_write: %s", err)
	}
	defer urpWrite.Close()
	upRead, err := ex.Uprobe("SSL_read", objs.UprobeSslRead, nil)
	if err != nil {
		log.Fatalf("creating uprobe for SSL_read: %s", err)
	}
	defer upRead.Close()
	urpRead, err := ex.Uretprobe("SSL_read", objs.UretprobeSslRead, nil)
	if err != nil {
		log.Fatalf("creating uretprobe for SSL_read: %s", err)
	}
	defer urpRead.Close()
	// --- Read from Ring Buffer ---
	rd, err := ringbuf.NewReader(objs.Events)
	if err != nil {
		log.Fatalf("opening ringbuf reader: %s", err)
	}
	defer rd.Close()
	go func() {
		<-stopper
		rd.Close()
	}()
	log.Println("Waiting for events...")
	var event tlsDataEvent
	for {
		record, err := rd.Read()
		if err != nil {
			if errors.Is(err, ringbuf.ErrClosed) {
				log.Println("Received signal, exiting...")
				return
			}
			log.Printf("error reading from reader: %s", err)
			continue
		}
		if err := binary.Read(bytes.NewBuffer(record.RawSample), binary.LittleEndian, &event); err != nil {
			log.Printf("parsing ringbuf event: %s", err)
			continue
		}
		comm := string(bytes.TrimRight(event.Comm[:], "\x00"))
		direction := "Read"
		if event.IsWrite == 1 {
			direction = "Write"
		}
		// For demonstration, we print the raw data. In a real system, 
		// this would be parsed as HTTP/2, gRPC, etc.
		log.Printf("PID: %d, Comm: %s, Direction: %s\n--- Data ---\n%s\n----------\n", 
			event.Pid, comm, direction, string(event.Data[:event.DataLen]))
	}
}

This controller uses the cilium/ebpf library to handle loading the BPF object file, attaching the probes to a specified executable path, and reading from the ring buffer. The go:generate directive automates compiling the C code into a Go-embeddable object file; run go generate ./... before go build.
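The demo simply prints raw bytes; in a real system the captured plaintext would be parsed as HTTP/1.1, HTTP/2, or gRPC, as the comment in the loop notes. As a minimal sketch of that step, the following standalone Go program decodes the fixed 9-byte HTTP/2 frame header (RFC 7540 §4.1) from a captured buffer. The names http2FrameHeader and parseHTTP2FrameHeader are our own illustrative choices, not part of any library, and a production decoder would also have to skip the HTTP/2 client connection preface and maintain HPACK state for HEADERS frames.

package main

import (
	"encoding/binary"
	"fmt"
)

// http2FrameHeader mirrors the fixed 9-octet frame header from RFC 7540 §4.1:
// 24-bit payload length, 8-bit type, 8-bit flags, 31-bit stream identifier.
type http2FrameHeader struct {
	Length   uint32
	Type     uint8 // e.g., 0x0 DATA, 0x1 HEADERS
	Flags    uint8
	StreamID uint32
}

// parseHTTP2FrameHeader decodes one frame header from captured plaintext.
// It returns false if fewer than 9 bytes are available.
func parseHTTP2FrameHeader(buf []byte) (http2FrameHeader, bool) {
	if len(buf) < 9 {
		return http2FrameHeader{}, false
	}
	return http2FrameHeader{
		Length:   uint32(buf[0])<<16 | uint32(buf[1])<<8 | uint32(buf[2]),
		Type:     buf[3],
		Flags:    buf[4],
		StreamID: binary.BigEndian.Uint32(buf[5:9]) & 0x7fffffff, // mask reserved bit
	}, true
}

func main() {
	// A HEADERS frame header: length=13, type=0x1, flags=0x4 (END_HEADERS), stream 1.
	sample := []byte{0x00, 0x00, 0x0d, 0x01, 0x04, 0x00, 0x00, 0x00, 0x01}
	if h, ok := parseHTTP2FrameHeader(sample); ok {
		fmt.Printf("len=%d type=%#x flags=%#x stream=%d\n", h.Length, h.Type, h.Flags, h.StreamID)
	}
}

Frame-level parsing like this is also the foundation for the in-kernel filtering and message reassembly discussed later: once you can find frame boundaries, you can decide per-message what to keep.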
Production Deployment in a Kubernetes/Istio Cluster
Running this on a developer machine is one thing; deploying it reliably in a cluster is another. The standard pattern is a DaemonSet, which ensures our observability agent runs on every node.
1. The DaemonSet Manifest
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: tls-sniffer-agent
  namespace: kube-system
  labels:
    app: tls-sniffer-agent
spec:
  selector:
    matchLabels:
      app: tls-sniffer-agent
  template:
    metadata:
      labels:
        app: tls-sniffer-agent
    spec:
      hostPID: true
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      tolerations:
      - operator: Exists
      containers:
      - name: sniffer
        image: your-repo/tls-sniffer-agent:latest
        securityContext:
          privileged: true # Required for BPF operations
        volumeMounts:
        - name: proc
          mountPath: /proc
          readOnly: true
        - name: cgroup
          mountPath: /sys/fs/cgroup
          readOnly: true
        - name: bpf
          mountPath: /sys/fs/bpf
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: cgroup
        hostPath:
          path: /sys/fs/cgroup
      - name: bpf
        hostPath:
          path: /sys/fs/bpf

Critical DaemonSet settings:
*   hostPID: true: This allows our agent running on the host to see the process IDs of all containers on that node.
*   privileged: true: This is the simplest way to grant the necessary capabilities (CAP_BPF and CAP_PERFMON on kernels 5.8+, or the broader CAP_SYS_ADMIN on older ones). In a hardened environment, you would use a more granular security context that grants only those specific capabilities.
*   Volume Mounts: We mount the host's /proc, /sys/fs/cgroup, and /sys/fs/bpf filesystems to give the BPF loader the context it needs.
2. The Dynamic Library Path Problem
The most significant challenge is that the hardcoded path /usr/lib/x86_64-linux-gnu/libssl.so.3 in our BPF C code is incorrect. The path is relative to the target container's filesystem, not our agent's.
Our Go controller, running on the host, must dynamically discover the correct path for each target container and attach the probes accordingly. This is a non-trivial systems engineering problem.
Solution Strategy:
1.  Discover target processes: The agent enumerates PIDs from the host's /proc (possible thanks to hostPID: true). Using the Kubernetes API, it can correlate PIDs with Pods and containers, filtering for those we want to monitor (e.g., pods with a specific annotation like ebpf.observability.io/tls-sniff: "true").
2.  Locate the library: For each target PID, parse /proc/<PID>/maps to find memory mappings for shared libraries, looking for an entry corresponding to libssl.so.
3.  Resolve a host-visible path: The mapped path is relative to the container's mount namespace, so the agent accesses the file through /proc/<PID>/root/<path>.
4.  Attach dynamically: Instead of relying on the hardcoded path in the SEC macro for auto-attachment, the Go controller loads the BPF program and then uses the link.Uprobe API to attach it to the dynamically resolved library path and PID (a sketch of steps 2 and 3 follows below).

This makes the solution far more robust and adaptable to the heterogeneous nature of a microservices environment.
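Here is a minimal sketch of steps 2 and 3, assuming the agent runs with hostPID: true and can read the host's /proc. The helper resolveLibssl is our own illustrative name; production code would additionally handle deleted mappings, multiple OpenSSL copies, symlinks, and PID reuse races.

package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// resolveLibssl scans /proc/<pid>/maps for a mapped libssl and returns a
// path reachable from the host through the target's mount namespace via
// /proc/<pid>/root. Illustrative only: it returns the first match.
func resolveLibssl(pid int) (string, error) {
	f, err := os.Open(fmt.Sprintf("/proc/%d/maps", pid))
	if err != nil {
		return "", err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// A maps line ends with the backing file, if any, e.g.:
		// 7f2a...-7f2b... r-xp ... /usr/lib/x86_64-linux-gnu/libssl.so.3
		fields := strings.Fields(scanner.Text())
		if len(fields) < 6 {
			continue
		}
		path := fields[len(fields)-1]
		if strings.Contains(path, "libssl.so") {
			// Re-root the container-relative path for host-side access.
			return fmt.Sprintf("/proc/%d/root%s", pid, path), nil
		}
	}
	if err := scanner.Err(); err != nil {
		return "", err
	}
	return "", fmt.Errorf("pid %d: no libssl mapping found", pid)
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: resolve-libssl <pid>")
		os.Exit(1)
	}
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid pid:", err)
		os.Exit(1)
	}
	path, err := resolveLibssl(pid)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(path) // pass this to link.OpenExecutable
}

The returned /proc/<PID>/root/... path can be handed straight to link.OpenExecutable in the controller above, replacing the hardcoded host path.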
Advanced Edge Cases and Performance Considerations
A production-ready system must handle numerous edge cases.
1. Handling Asynchronous I/O and Partial Writes
High-performance applications (e.g., written in Go or using Netty in Java) often perform non-blocking I/O. A single logical message, like a 1MB gRPC payload, might be sent via multiple SSL_write calls. Our current BPF program would fire an event for each partial write, making reassembly in user-space difficult and inefficient.
Advanced Solution: In-Kernel Buffering
We can enhance our BPF program to handle this statefully.
*   Buffer map: Use a BPF_MAP_TYPE_PERCPU_ARRAY or BPF_MAP_TYPE_HASH to act as a per-thread buffer.

    struct data_buffer {
        u32 len;
        u8 buf[MAX_BUFFER_SIZE];
    };
    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 10240);
        __type(key, u64); // key is pid_tgid
        __type(value, struct data_buffer);
    } write_buffers SEC(".maps");

*   Append in uretprobe_ssl_write: Instead of immediately sending the data, append it to the buffer in the map.
*   Detect message boundaries: For HTTP/1.1 this could mean scanning for the header terminator \r\n\r\n. For HTTP/2 or gRPC, the framing protocol is binary, so the BPF program would need a basic parser to identify frame headers and determine if a full message has been buffered.
*   Flush: When a full message is detected, it's sent to the ring buffer and the per-thread buffer is cleared.

This significantly reduces the data volume sent to user-space and simplifies the controller logic, at the cost of more complex BPF code.

2. Performance and Overhead
While eBPF is highly efficient, it's not free.
*   Probe Overhead: Each SSL_write/SSL_read call incurs the cost of trapping into the kernel, executing our BPF program, and returning. This is typically measured in microseconds, but for services with extremely high I/O rates, it can add up.
*   Data Transfer: The ringbuf is efficient, but sending gigabytes of payload data per second from kernel to user-space will consume CPU.
Mitigation Strategies:
* In-Kernel Filtering: Don't send everything. Enhance the BPF program to parse just enough of the payload to make a filtering decision. For example, parse the gRPC path from the HTTP/2 headers and only send data for specific, critical methods.
    // Pseudo-code in BPF program
    if (is_http2_header_frame(args->buf)) {
        char path[128];
        parse_http2_path(args->buf, path, sizeof(path));
        if (bpf_strncmp(path, 21, "/my.critical.Service/") == 0) {
            // Only submit data for this service
            bpf_ringbuf_submit(...);
        }
    }

*   Sampling: For high-traffic services, only capture a fraction of requests. This can be implemented in the BPF program using bpf_get_prandom_u32() to make a probabilistic decision on whether to process an event.
3. CO-RE and BTF Portability
To avoid recompiling the BPF program for every Linux kernel version, we rely on CO-RE. This requires BTF (BPF Type Format) information for the kernel, which modern distributions ship at /sys/kernel/btf/vmlinux.
Note that CO-RE relocations cover kernel types; for user-space libraries like OpenSSL there is no equivalent distribution-wide BTF. As long as the probes only touch function arguments and return values, as ours do, this doesn't matter. Tools that need to read internal library structures (e.g., fields of the SSL struct) typically maintain per-version offset tables or derive them from DWARF debug information with tools like pahole.
Conclusion: The Future of Mesh Observability
Using eBPF to hook into user-space TLS libraries represents a sophisticated and powerful approach to service mesh observability. It moves data collection from the proxy sidecar to a universal, kernel-level layer, offering several key advantages:
* Lower Overhead: Bypasses the proxy for data collection, reducing latency.
* Universal Applicability: Works for any application that dynamically links a targeted TLS library, without requiring code instrumentation.
* Complete Visibility: Captures traffic that may never traverse the sidecar proxy.
This technique is not a replacement for service meshes like Istio, but rather a powerful enhancement. It provides a new data source that can feed into existing observability platforms, offering a more detailed and performant view of service behavior. Projects like Cilium's Hubble and Pixie are already productizing these concepts, abstracting away the complexities of writing and deploying eBPF programs. However, for senior engineers, understanding the underlying mechanics of eBPF, uprobes, and CO-RE is crucial for debugging, optimizing, and extending the next generation of cloud-native observability tools.