Enforcing Multi-Tenant Security with Kubernetes Admission Controllers
The Limits of Static Policy in Dynamic Environments
In any non-trivial multi-tenant Kubernetes cluster, the limitations of declarative, static authorization mechanisms like Role-Based Access Control (RBAC) quickly become apparent. RBAC is excellent for defining who can do what to which resources (e.g., "Team A can create Deployments in namespace tenant-a"). However, it cannot enforce policies based on the content or context of those resources. 
Consider these common multi-tenancy requirements:
- Pods in a tenant's namespace (e.g., tenant-a-ns) must only pull images from a dedicated, scanned container registry path (e.g., gcr.io/my-corp/tenant-a/*).
- A tenant's containers must stay within the per-container resource limits of its tier (standard vs. premium), a constraint that cannot be expressed with a static ResourceQuota.
- Ingress objects created must contain a tenancy.my-corp.com/tenant-id label that matches the label on their containing namespace.
RBAC has no mechanism to inspect the spec.containers[].image field of a Pod or cross-reference a namespace's labels during an API request. This is where the Kubernetes API server's extension mechanism, Dynamic Admission Control, becomes indispensable. It provides webhooks—ValidatingAdmissionWebhook and MutatingAdmissionWebhook—that intercept API requests before they are persisted to etcd, allowing for custom, programmatic validation and modification.
This article is not an introduction. We assume you understand the basic concept of admission controllers. Instead, we will build a production-ready, high-performance Validating Admission Webhook in Go from the ground up to solve the complex multi-tenant policy challenges outlined above. We will focus on the nuances of production deployment, performance optimization, and failure handling that are critical for a component that sits in the API server's critical path.
Section 1: Architecting the Go Webhook Server and TLS
The core of our admission controller is an HTTPS server that exposes an endpoint (e.g., /validate) for the Kubernetes API server to call. The TLS requirement is non-negotiable; the API server will refuse to communicate over unencrypted HTTP.
1.1. The AdmissionReview Request/Response Lifecycle
When a user runs kubectl apply -f pod.yaml, the API server, upon successful authentication and authorization, serializes the request into an AdmissionReview object and POSTs it to our webhook. Our server's responsibility is to:
- Deserialize the incoming AdmissionReview request.
- Extract the embedded AdmissionRequest payload.
- Perform our custom validation logic against the object within the request.
- Construct an AdmissionResponse indicating whether the request is allowed or denied (with a reason).
- Wrap that response in a new AdmissionReview object and serialize it back to the API server.
Here is the Go struct mapping for these critical objects from the k8s.io/api/admission/v1 package:
// AdmissionReview encapsulates an admission request and a response.
// Both AdmissionRequest and AdmissionResponse are embedded.
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
type AdmissionReview struct {
	metav1.TypeMeta `json:",inline"`
	// Request describes the attributes for the admission request.
	Request *AdmissionRequest `json:"request,omitempty"`
	// Response describes the attributes for the admission response.
	Response *AdmissionResponse `json:"response,omitempty"`
}
// AdmissionRequest describes the admission request parameters.
type AdmissionRequest struct {
	// UID is an identifier for the individual request/response.
	UID types.UID `json:"uid"`
	// Kind is the type of object being manipulated.
	Kind metav1.GroupVersionKind `json:"kind"`
	// Resource is the name of the resource being manipulated.
	Resource metav1.GroupVersionResource `json:"resource"`
	// Object is the object from the incoming request.
	Object runtime.RawExtension `json:"object"`
	// OldObject is the existing object. Only populated for UPDATE and DELETE.
	OldObject runtime.RawExtension `json:"oldObject,omitempty"`
    // ... other fields
}
// AdmissionResponse describes an admission response.
type AdmissionResponse struct {
	// UID is an identifier for the individual request/response.
	UID types.UID `json:"uid"`
	// Allowed indicates whether or not the admission request was permitted.
	Allowed bool `json:"allowed"`
	// Result contains extra details into why an admission request was denied.
	Result *metav1.Status `json:"status,omitempty"`
}
1.2. The Core HTTP Server Implementation
We'll use Go's standard net/http library. The key is to handle JSON serialization/deserialization correctly and set up the TLS configuration.
// main.go
package main
import (
	"encoding/json"
	"fmt"
	"io/ioutil"
	"net/http"
	admissionv1 "k8s.io/api/admission/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/serializer"
)
var (
	universalDeserializer = serializer.NewCodecFactory(runtime.NewScheme()).UniversalDeserializer()
)
// admissionHandler handles the webhook requests from the Kubernetes API server.
func admissionHandler(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "could not read request body", http.StatusBadRequest)
		return
	}
	var admissionReview admissionv1.AdmissionReview
	if _, _, err := universalDeserializer.Decode(body, nil, &admissionReview); err != nil {
		http.Error(w, "could not deserialize request", http.StatusBadRequest)
		return
	}
	if admissionReview.Request == nil {
		http.Error(w, "malformed admission review: request is nil", http.StatusBadRequest)
		return
	}
	// The core validation logic goes here.
	// For now, let's just create a basic response.
	admissionResponse := &admissionv1.AdmissionResponse{
		UID:     admissionReview.Request.UID,
		Allowed: true, // Default to allowed
	}
	// In a real implementation, you would call a validation function:
	// admissionResponse = validatePod(admissionReview.Request)
	// Wrap the response in a new AdmissionReview object.
	responseReview := admissionv1.AdmissionReview{
		TypeMeta: metav1.TypeMeta{
			APIVersion: "admission.k8s.io/v1",
			Kind:       "AdmissionReview",
		},
		Response: admissionResponse,
	}
	respBytes, err := json.Marshal(responseReview)
	if err != nil {
		http.Error(w, "could not serialize response", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(respBytes)
}
func main() {
	http.HandleFunc("/validate", admissionHandler)
	// Paths to the TLS certificate and key.
	// These will be mounted from a Kubernetes Secret.
	certPath := "/etc/webhook/certs/tls.crt"
	keyPath := "/etc/webhook/certs/tls.key"
	fmt.Println("Starting webhook server on :8443...")
	if err := http.ListenAndServeTLS(":8443", certPath, keyPath, nil); err != nil {
		panic(err)
	}
}
This provides the boilerplate for our server. Note the use of universalDeserializer from k8s.io/apimachinery; this is the canonical way to decode Kubernetes API objects.
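Because this server sits directly in the API server's request path, it is worth hardening beyond the minimal ListenAndServeTLS call. The following is a small sketch, not part of the boilerplate above: it swaps the default mux for an explicit http.Server with timeouts and adds a /healthz endpoint that the Deployment's probes can target. The timeout values are illustrative assumptions, and the snippet requires adding "time" to the imports.
// A hardened variant of main(): explicit server timeouts plus a /healthz probe endpoint.
func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/validate", admissionHandler)
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK) // liveness/readiness probe target
	})

	server := &http.Server{
		Addr:         ":8443",
		Handler:      mux,
		ReadTimeout:  5 * time.Second,   // bound time spent reading an AdmissionReview body
		WriteTimeout: 10 * time.Second,  // bound time spent writing the response
		IdleTimeout:  120 * time.Second, // recycle idle keep-alive connections
	}

	fmt.Println("Starting webhook server on :8443...")
	if err := server.ListenAndServeTLS("/etc/webhook/certs/tls.crt", "/etc/webhook/certs/tls.key"); err != nil {
		panic(err)
	}
}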
Section 2: Implementing Context-Aware Validation Logic
Now we'll implement the logic to enforce our multi-tenant policies. This requires our webhook to not only inspect the incoming object but also to query the Kubernetes API server for additional context (like namespace labels).
2.1. Setting up the Kubernetes Client
We need client-go to interact with the API server. We'll use an in-cluster configuration, which assumes our webhook is running as a Pod inside the cluster.
// client.go
package main
import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)
var clientset *kubernetes.Clientset
func init() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err.Error())
	}
	clientset, err = kubernetes.NewForConfig(config)
	if err != nil {
		panic(err.Error())
	}
}
By placing this in an init() function, the clientset is initialized once when the application starts.
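For local development it is convenient to run the webhook binary outside the cluster, where rest.InClusterConfig() fails. A minimal sketch of that fallback is shown below; it assumes the extra imports "os" and "k8s.io/client-go/tools/clientcmd", and a KUBECONFIG environment variable pointing at a valid kubeconfig.
// A local-development variant of init(): fall back to $KUBECONFIG when not running in a Pod.
func init() {
	config, err := rest.InClusterConfig()
	if err != nil {
		// Not in a cluster; build the client config from a local kubeconfig instead.
		config, err = clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
		if err != nil {
			panic(err.Error())
		}
	}
	clientset, err = kubernetes.NewForConfig(config)
	if err != nil {
		panic(err.Error())
	}
}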
2.2. The Validation Function
This function will contain the core logic. It receives the AdmissionRequest and returns an AdmissionResponse.
// validator.go
package main
import (
	"context"
	"encoding/json"
	"fmt"
	"strings"
	admissionv1 "k8s.io/api/admission/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
const (
	tenantIDLabel = "tenancy.my-corp.com/tenant-id"
	tenantTierLabel = "tenancy.my-corp.com/tier"
	premiumTierValue = "premium"
	standardCPULimit = "2"
	premiumCPULimit = "16"
)
func validatePod(req *admissionv1.AdmissionRequest) *admissionv1.AdmissionResponse {
	// We only care about Pod creation
	if req.Resource.Resource != "pods" || req.Operation != admissionv1.Create {
		return &admissionv1.AdmissionResponse{Allowed: true}
	}
	// Deserialize the Pod object from the request
	pod := &corev1.Pod{}
	if err := json.Unmarshal(req.Object.Raw, pod); err != nil {
		return toAdmissionResponse(false, fmt.Sprintf("failed to unmarshal pod: %v", err))
	}
	// Fetch the namespace for context
	ns, err := clientset.CoreV1().Namespaces().Get(context.TODO(), req.Namespace, metav1.GetOptions{})
	if err != nil {
		// IMPORTANT: If we can't get the namespace, should we allow or deny?
		// A fail-closed approach is more secure.
		return toAdmissionResponse(false, fmt.Sprintf("failed to get namespace '%s': %v", req.Namespace, err))
	}
	// --- Policy 1: Image Registry Enforcement ---
	tenantID, ok := ns.Labels[tenantIDLabel]
	if !ok {
		// If the namespace isn't a tenant namespace, we don't apply the policy.
		// This prevents us from blocking system pods.
		return &admissionv1.AdmissionResponse{Allowed: true}
	}
	allowedRegistryPrefix := fmt.Sprintf("gcr.io/my-corp/%s/", tenantID)
	for _, container := range pod.Spec.Containers {
		if !strings.HasPrefix(container.Image, allowedRegistryPrefix) {
			msg := fmt.Sprintf("invalid image registry for tenant '%s'. Image '%s' must be from '%s'", tenantID, container.Image, allowedRegistryPrefix)
			return toAdmissionResponse(false, msg)
		}
	}
	// --- Policy 2: Tiered Resource Allocation ---
	tenantTier := ns.Labels[tenantTierLabel]
	var cpuLimit resource.Quantity
	if tenantTier == premiumTierValue {
		cpuLimit = resource.MustParse(premiumCPULimit)
	} else {
		cpuLimit = resource.MustParse(standardCPULimit)
	}
	for _, container := range pod.Spec.Containers {
		if container.Resources.Limits != nil {
			if container.Resources.Limits.Cpu().Cmp(cpuLimit) > 0 {
				msg := fmt.Sprintf("CPU limit %s exceeds tier limit of %s for container '%s'", container.Resources.Limits.Cpu().String(), cpuLimit.String(), container.Name)
				return toAdmissionResponse(false, msg)
			}
		}
	}
	return &admissionv1.AdmissionResponse{Allowed: true}
}
// toAdmissionResponse builds a response with a status message (used here for denials).
func toAdmissionResponse(allowed bool, message string) *admissionv1.AdmissionResponse {
	return &admissionv1.AdmissionResponse{
		Allowed: allowed,
		Result: &metav1.Status{
			Message: message,
		},
	}
}
// Update main.go to call this function:
// admissionResponse = validatePod(admissionReview.Request)
This implementation demonstrates the power of a dynamic controller. It fetches the pod's namespace, inspects its labels (tenant-id and tier), and then applies logic to the incoming pod spec based on that external context. This is impossible with RBAC alone.
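Because the policy logic is plain Go, it is straightforward to unit test without a cluster. The sketch below uses the fake clientset from client-go; it assumes the package-level clientset variable is declared as kubernetes.Interface (rather than the concrete *kubernetes.Clientset) so the fake can be substituted, and the namespace, image, and test name are illustrative.
// validator_test.go (a sketch)
package main

import (
	"encoding/json"
	"testing"

	admissionv1 "k8s.io/api/admission/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/kubernetes/fake"
)

func TestValidatePodRejectsForeignRegistry(t *testing.T) {
	// Seed the fake clientset with a labelled tenant namespace.
	clientset = fake.NewSimpleClientset(&corev1.Namespace{
		ObjectMeta: metav1.ObjectMeta{
			Name:   "tenant-a-ns",
			Labels: map[string]string{tenantIDLabel: "tenant-a"},
		},
	})

	// A pod pulling from a registry outside the tenant's allowed prefix.
	pod := corev1.Pod{
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{Name: "app", Image: "docker.io/library/nginx:latest"}},
		},
	}
	raw, err := json.Marshal(pod)
	if err != nil {
		t.Fatal(err)
	}

	resp := validatePod(&admissionv1.AdmissionRequest{
		Namespace: "tenant-a-ns",
		Operation: admissionv1.Create,
		Resource:  metav1.GroupVersionResource{Version: "v1", Resource: "pods"},
		Object:    runtime.RawExtension{Raw: raw},
	})
	if resp.Allowed {
		t.Fatalf("expected image %q to be denied for tenant-a", pod.Spec.Containers[0].Image)
	}
}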
Section 3: Production Deployment and Configuration
Deploying an admission controller requires more than just a Deployment. We need to manage TLS certificates, configure the webhook registration, and ensure high availability.
3.1. Dockerizing the Go Application
We'll use a multi-stage Dockerfile for a minimal, secure final image.
# --- Build Stage ---
FROM golang:1.19-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Build the binary with optimizations
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o webhook ./
# --- Final Stage ---
FROM gcr.io/distroless/static-debian11
WORKDIR /root/
COPY --from=builder /app/webhook .
# The webhook binary will be run by the Kubernetes deployment spec
CMD ["/root/webhook"]This results in a tiny image containing only our statically linked Go binary, reducing the attack surface.
3.2. Kubernetes Manifests
This is the most complex part. We need a Deployment, a Service, a mechanism for TLS, and the ValidatingWebhookConfiguration.
A common production pattern is to use a tool like cert-manager to automatically provision and rotate the TLS certificates. The cert-manager CA injector will also automatically populate the caBundle field in the webhook configuration, which is a frequent point of failure when managed manually.
Here's a simplified set of manifests assuming cert-manager is installed.
# 01-tls-certificate.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: multi-tenant-webhook-cert
  namespace: security-tools
spec:
  secretName: multi-tenant-webhook-tls
  dnsNames:
  - multi-tenant-webhook.security-tools.svc
  - multi-tenant-webhook.security-tools.svc.cluster.local
  issuerRef:
    name: selfsigned-cluster-issuer # Or your production issuer
    kind: ClusterIssuer
---
# 02-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: multi-tenant-webhook
  namespace: security-tools
  labels:
    app: multi-tenant-webhook
spec:
  replicas: 2 # For High Availability
  selector:
    matchLabels:
      app: multi-tenant-webhook
  template:
    metadata:
      labels:
        app: multi-tenant-webhook
    spec:
      containers:
      - name: webhook
        image: gcr.io/my-corp/multi-tenant-webhook:v1.0.0
        ports:
        - containerPort: 8443
          name: webhook-tls
        volumeMounts:
        - name: webhook-tls-certs
          mountPath: /etc/webhook/certs
          readOnly: true
      volumes:
      - name: webhook-tls-certs
        secret:
          secretName: multi-tenant-webhook-tls
---
# 03-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: multi-tenant-webhook
  namespace: security-tools
spec:
  selector:
    app: multi-tenant-webhook
  ports:
  - port: 443
    targetPort: webhook-tls
---
# 04-validating-webhook-configuration.yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: multi-tenant-policy.my-corp.com
  annotations:
    # Use cert-manager to inject the CA bundle
    cert-manager.io/inject-ca-from: "security-tools/multi-tenant-webhook-cert"
webhooks:
- name: multi-tenant-policy.my-corp.com
  admissionReviewVersions: ["v1"]
  failurePolicy: Fail # Critical for security enforcement
  sideEffects: None
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
    scope: "Namespaced"
  clientConfig:
    service:
      namespace: security-tools
      name: multi-tenant-webhook
      path: "/validate"
  # IMPORTANT: Scope the webhook to avoid hitting system namespaces
  namespaceSelector:
    matchExpressions:
    - key: tenancy.my-corp.com/tenant-id
      operator: Exists
  timeoutSeconds: 5
Key Production Considerations in these Manifests:
*   replicas: 2: Running multiple webhook pods prevents the controller from being a single point of failure.
*   cert-manager Integration: Automates the complex and error-prone process of managing the CA bundle that the API server uses to trust our webhook.
*   failurePolicy: Fail: This is crucial. If set to Ignore, any failure to reach the webhook (e.g., network issue, pod crash) would result in the API server *allowing* the request, silently bypassing our security policy. Fail ensures the API call is rejected, maintaining a secure posture.
*   namespaceSelector: This is a critical performance optimization. It tells the API server to *only* call our webhook for pods being created in namespaces that have the tenancy.my-corp.com/tenant-id label. This prevents our webhook from being invoked for every single pod creation in the cluster (e.g., in kube-system), reducing load and latency.
*   timeoutSeconds: A reasonable timeout (e.g., 5 seconds) prevents a slow webhook from catastrophically blocking the API server; a sketch of propagating that budget into the namespace lookup follows below.
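To keep the webhook comfortably inside that timeoutSeconds budget, the in-process work can be bounded as well. The following is a minimal sketch, assuming the handler is refactored to pass the request's context (e.g., r.Context()) down into the validation path and that "time" is added to validator.go's imports; the 3-second figure is an arbitrary choice under the 5-second webhook timeout.
// getNamespaceWithBudget bounds the namespace lookup so a slow API server
// call cannot consume the webhook's entire admission timeout.
func getNamespaceWithBudget(ctx context.Context, name string) (*corev1.Namespace, error) {
	ctx, cancel := context.WithTimeout(ctx, 3*time.Second)
	defer cancel()
	return clientset.CoreV1().Namespaces().Get(ctx, name, metav1.GetOptions{})
}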
Section 4: Advanced Edge Cases and Performance Tuning
A component in the API server's critical path must be robust and performant. Here we discuss common failure modes and optimization strategies.
4.1. The Latency Problem: Caching with Informers
In our current implementation, every call to the /validate endpoint results in a GET request to the API server to fetch the namespace. On a busy cluster, this can add significant latency to every pod creation and put undue load on the API server itself.
The solution is to maintain a local, in-memory cache of namespaces. The client-go library provides an excellent mechanism for this: Informers.
An informer watches a resource type (like Namespaces) and maintains an up-to-date local cache. Queries against this cache are near-instantaneous and do not hit the API server.
// cache.go
package main
import (
	"time"
	"k8s.io/client-go/informers"
	listersv1 "k8s.io/client-go/listers/core/v1"
	"k8s.io/client-go/tools/cache"
)
var namespaceLister listersv1.NamespaceLister
// startInformer initializes and starts a shared informer for namespaces.
func startInformer(stopCh <-chan struct{}) {
	factory := informers.NewSharedInformerFactory(clientset, 30*time.Minute)
	namespaceInformer := factory.Core().V1().Namespaces().Informer()
	namespaceLister = factory.Core().V1().Namespaces().Lister()
	factory.Start(stopCh) // non-blocking; starts the informer goroutines
	// Wait for the initial cache sync.
	if !cache.WaitForCacheSync(stopCh, namespaceInformer.HasSynced) {
		panic("failed to sync cache")
	}
}
// In main.go, start the informer before serving traffic:
// stopCh := make(chan struct{})
// defer close(stopCh)
// startInformer(stopCh) // returns once the namespace cache has synced
// Then, in validator.go, replace the API call:
/*
// OLD WAY:
ns, err := clientset.CoreV1().Namespaces().Get(context.TODO(), req.Namespace, metav1.GetOptions{})
*/
// NEW WAY: the typed lister returns a *corev1.Namespace directly, no assertion needed.
ns, err := namespaceLister.Get(req.Namespace)
if err != nil {
	return toAdmissionResponse(false, fmt.Sprintf("failed to get namespace '%s' from cache: %v", req.Namespace, err))
}
By replacing the direct API call with a lookup against the informer's Lister, we reduce the validation latency from potentially hundreds of milliseconds to microseconds, dramatically improving the performance and scalability of our webhook.
4.2. Operational Risk: The `failurePolicy: Fail` Deadlock
The failurePolicy: Fail setting creates a significant operational risk. If a bug is deployed to the webhook, or if all its pods crash, it may become impossible to create or update any pods in the selected namespaces. You could effectively lock yourself out of deploying a fix.
Mitigation Strategies:
*   Canary rollouts: Deploy new webhook versions behind a separate ValidatingWebhookConfiguration for the canary that only applies to a test namespace.
*   Monitoring and alerting: Track http_requests_total, http_request_duration_seconds, and error rates (a minimal instrumentation sketch follows below). Alert immediately if the webhook becomes unavailable or starts erroring.
*   A break-glass procedure: Document (and guard with RBAC) a procedure to delete the ValidatingWebhookConfiguration in an emergency. This immediately disables the webhook, allowing normal operations to resume.
    # Emergency command, should be guarded by strong RBAC
    kubectl delete validatingwebhookconfiguration multi-tenant-policy.my-corp.com
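For the monitoring point above, here is a minimal instrumentation sketch using Prometheus client_golang. The metric names mirror those mentioned; the /metrics wiring, label set, and helper names are illustrative assumptions, not part of the original server.
import (
	"net/http"
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	requestsTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "http_requests_total", Help: "Admission requests by HTTP status code."},
		[]string{"code"},
	)
	requestDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{Name: "http_request_duration_seconds", Help: "Admission request latency."},
		[]string{"path"},
	)
)

func init() {
	prometheus.MustRegister(requestsTotal, requestDuration)
	http.Handle("/metrics", promhttp.Handler()) // scrape endpoint
}

// instrument wraps a handler and records latency and response status codes.
func instrument(path string, next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next(rec, r)
		requestDuration.WithLabelValues(path).Observe(time.Since(start).Seconds())
		requestsTotal.WithLabelValues(strconv.Itoa(rec.status)).Inc()
	}
}

type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}
Wiring it up is then a matter of registering instrument("/validate", admissionHandler) in place of admissionHandler.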
4.3. Race Conditions
Consider this scenario: an attacker submits a valid pod spec for a namespace, and simultaneously changes the namespace's tenant-id label to point to a different tenant. Could the pod be validated against the old label and created with access to the new tenant's resources?
Fortunately, the admission control process is synchronous: the pod is not persisted to etcd until after our webhook returns an allowed: true response, and the namespace lookup (whether a direct GET or an informer cache read) reflects the state the webhook observes at that moment. The main caveat is that an informer cache can lag slightly behind the API server, so a very recent label change may not yet be visible; for lookups where that staleness is unacceptable, the webhook can fall back to a direct read, as sketched below. In practice, this synchronous evaluation largely mitigates this class of race condition.
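Where even that brief cache lag matters, for example immediately after a namespace is created or relabelled, a hybrid lookup is a reasonable compromise. The sketch below assumes the lister from Section 4.1, the clientset from Section 2.1, and the extra import apierrors "k8s.io/apimachinery/pkg/api/errors"; the helper name is hypothetical.
// getNamespace prefers the informer cache but confirms "not found" results
// against the live API, since the cache may briefly lag behind the API server.
func getNamespace(name string) (*corev1.Namespace, error) {
	ns, err := namespaceLister.Get(name)
	if apierrors.IsNotFound(err) {
		return clientset.CoreV1().Namespaces().Get(context.TODO(), name, metav1.GetOptions{})
	}
	return ns, err
}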
Conclusion
Dynamic Admission Controllers are a powerful tool for implementing the kind of nuanced, context-aware security policies that are essential in a multi-tenant Kubernetes environment. By moving beyond simple RBAC, we can enforce fine-grained rules about image provenance, resource consumption, and metadata consistency.
However, this power comes with significant responsibility. An admission controller is a critical, synchronous component in the Kubernetes control plane. Building and operating one requires a deep understanding of its failure modes, performance characteristics, and the operational risks involved. By employing strategies like client-go informer caches for performance, cert-manager for robust TLS, and carefully planned high-availability and disaster recovery procedures, you can build admission controllers that are not only powerful but also production-grade, secure, and resilient.