Building a K8s Dynamic Admission Controller in Go for Policy Enforcement


The Governance Gap: Why RBAC Isn't Enough

In any mature Kubernetes environment, the native Role-Based Access Control (RBAC) system is the bedrock of security, dictating who can perform what actions on which resources. However, its scope is limited to authorizing actions, not validating the content of the resources being created or updated. This leaves a significant governance gap. For instance, RBAC can't enforce policies such as:

  • All Deployment resources must include an owner label for cost allocation and accountability.
  • Container images must originate exclusively from a trusted corporate registry (e.g., gcr.io/my-company/*).
  • Ingress objects must not use wildcard hosts (*.example.com) in production namespaces.
  • Custom Resource Definitions (CRDs) must conform to a specific internal schema beyond basic OpenAPI validation.

    With the removal of PodSecurityPolicy (PSP) in Kubernetes 1.25, the responsibility for enforcing such fine-grained, business-logic-driven policies shifts squarely to admission controllers. While frameworks like OPA/Gatekeeper and Kyverno offer powerful policy-as-code solutions, they introduce their own abstractions (Gatekeeper's Rego language, Kyverno's YAML rules) and performance characteristics. For ultimate control, performance, and integration with internal systems, a custom Dynamic Admission Controller is hard to beat.

    This post is a deep dive for platform engineers and SREs on building, deploying, and operating a production-ready validating admission webhook in Go. We will skip the basics and focus on the architecture, implementation details, and operational realities of running such a critical component in your control plane.

    Kubernetes API Request Lifecycle: The Admission Controller's Role

    Before we write a single line of Go, it's critical to understand precisely where our webhook fits into the Kubernetes API server's request flow. When a client (like kubectl) sends a request to create or update a resource, it passes through several stages:

    1. Authentication: The requestor's identity is verified.
    2. Authorization: RBAC checks if the authenticated user is permitted to perform the requested action.
    3. Mutating Admission: A series of mutating admission webhooks are called sequentially. These can modify the incoming object. A common example is a service mesh sidecar injector.
    4. Object Schema Validation: The API server validates that the (potentially mutated) object conforms to its schema (e.g., a Deployment has a valid spec).
    5. Validating Admission: This is our domain. A series of validating admission webhooks are called in parallel. These webhooks can inspect the object and reject the request, but they cannot modify it. Our policy enforcement logic lives here.
    6. Persistence: If all checks pass, the object is written to etcd.

    Our focus is on the ValidatingAdmissionWebhook. By intercepting requests at this stage, we can enforce complex invariants without altering the user's original intent, providing clear, immediate feedback on policy violations.
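
    To make the wire format concrete, here is a trimmed example of the AdmissionReview payload the API server POSTs to a validating webhook, decoded with the typed admission/v1 API. This is a minimal sketch: real payloads carry the complete serialized object and many more request fields.

    go
    package main

    import (
    	"encoding/json"
    	"fmt"

    	admissionv1 "k8s.io/api/admission/v1"
    )

    // A trimmed AdmissionReview request, shaped as the API server would POST it.
    // Real payloads include the full object under request.object.
    const sampleReview = `{
      "apiVersion": "admission.k8s.io/v1",
      "kind": "AdmissionReview",
      "request": {
        "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
        "kind": {"group": "apps", "version": "v1", "kind": "Deployment"},
        "operation": "CREATE",
        "namespace": "default",
        "object": {"metadata": {"labels": {"owner": "team-a"}}}
      }
    }`

    func main() {
    	var review admissionv1.AdmissionReview
    	if err := json.Unmarshal([]byte(sampleReview), &review); err != nil {
    		panic(err)
    	}
    	fmt.Println(review.Request.Operation, review.Request.Kind.Kind) // CREATE Deployment
    }

    The request's uid must be echoed back in the response; the server we build below does exactly that.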

    The Anatomy of a `ValidatingWebhookConfiguration`

    The entire mechanism is orchestrated by a ValidatingWebhookConfiguration resource. This object tells the API server how and when to call our webhook. Let's dissect a production-grade example:

    yaml
    apiVersion: admissionregistration.k8s.io/v1
    kind: ValidatingWebhookConfiguration
    metadata:
      name: my-company.policy-enforcer.webhook
    webhooks:
      - name: policy-enforcer.my-company.com
        rules:
          - apiGroups: ["apps"]
            apiVersions: ["v1"]
            operations: ["CREATE", "UPDATE"]
            resources: ["deployments"]
            scope: "Namespaced"
        clientConfig:
          service:
            namespace: policy-enforcer
            name: policy-enforcer-webhook-svc
            path: "/validate/deployment"
            port: 443
          caBundle: "LS0tLS1CR...=" # Base64-encoded CA certificate
        admissionReviewVersions: ["v1"]
        sideEffects: None
        timeoutSeconds: 3
        failurePolicy: Fail
        namespaceSelector:
          matchExpressions:
            - key: kubernetes.io/metadata.name
              operator: NotIn
              values: [kube-system, policy-enforcer]

    Key fields for senior engineers:

  • rules: This is a critical performance and reliability lever. Be as specific as possible. Don't listen for * operations or resources unless absolutely necessary. Every matching request adds latency and a potential point of failure.
  • clientConfig.service: Points to the Service that routes traffic to our webhook pods. The API server initiates this connection.
  • caBundle: The PEM-encoded CA certificate that signed the webhook server's certificate. The API server uses this to verify the identity of our webhook. This is non-negotiable for security. We'll discuss managing this with cert-manager later.
  • sideEffects: In admissionregistration.k8s.io/v1 this must be None or NoneOnDryRun. For a pure validating webhook like ours, None is the correct declaration: it guarantees to the API server that the webhook changes no other state, which also makes it safe to invoke for dry-run requests.
  • timeoutSeconds: A low timeout (e.g., 1-3 seconds) is crucial. A slow webhook can cripple your cluster's control plane. If your validation logic requires external calls, it must be extremely fast and reliable.
  • failurePolicy: This is arguably the most important operational decision.
    - Fail: If the webhook is unreachable or times out, the API request fails (fail-closed). This guarantees policy enforcement but risks control plane availability if your webhook deployment fails.
    - Ignore: If the webhook is unreachable, the API request is allowed (fail-open). This prioritizes availability but allows temporary policy bypasses. The choice depends on the criticality of the policy being enforced.

  • namespaceSelector: An essential tool for avoiding chaos. It prevents the webhook from acting on system namespaces or even its own namespace, which could lead to a deadlocked cluster where you can't fix a broken webhook because the webhook itself is blocking the fix.

    Building the Webhook Server in Go

    Let's implement the server that will enforce two policies on Deployment objects:

  • Must have a non-empty owner label.
  • All container images must come from the gcr.io/my-company registry.

    Project Setup

    Initialize a Go module and fetch the necessary Kubernetes API libraries.

    bash
    go mod init github.com/my-company/policy-enforcer
    go get k8s.io/api@<version>           # pick the version matching your cluster
    go get k8s.io/apimachinery@<version>

    The Core HTTP Handler Logic

    The webhook server is a standard Go net/http server. The key is correctly decoding the AdmissionReview request from the API server and encoding an AdmissionReview response.

    Here is a complete main.go that you can build on.

    go
    // main.go
    package main
    
    import (
    	"encoding/json"
    	"fmt"
    	"io"
    	"log"
    	"net/http"
    	"strings"
    
    	admissionv1 "k8s.io/api/admission/v1"
    	appsv1 "k8s.io/api/apps/v1"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/apimachinery/pkg/runtime"
    	"k8s.io/apimachinery/pkg/runtime/serializer"
    )
    
    var (
    	universalDeserializer = serializer.NewCodecFactory(runtime.NewScheme()).UniversalDeserializer()
    )
    
    // WebhookServer is the main server struct
    type WebhookServer struct {
    	Server *http.Server
    }
    
    // main handler for the webhook
    func (ws *WebhookServer) handleValidate(w http.ResponseWriter, r *http.Request) {
    	// 1. Read and validate request body
    	body, err := io.ReadAll(r.Body)
    	if err != nil {
    		log.Printf("Error reading request body: %v", err)
    		w.WriteHeader(http.StatusBadRequest)
    		return
    	}
    
    	// 2. Decode the AdmissionReview request
    	admissionReviewReq := admissionv1.AdmissionReview{}
    	if _, _, err := universalDeserializer.Decode(body, nil, &admissionReviewReq); err != nil {
    		log.Printf("Error decoding admission review: %v", err)
    		w.WriteHeader(http.StatusBadRequest)
    		fmt.Fprintf(w, "error decoding admission review: %v", err)
    		return
    	}

    	// Guard against a review with no request payload; dereferencing a nil
    	// Request below would panic and crash the handler.
    	if admissionReviewReq.Request == nil {
    		log.Printf("Received AdmissionReview with empty request")
    		w.WriteHeader(http.StatusBadRequest)
    		return
    	}

    	// 3. Construct the AdmissionReview response
    	admissionReviewResp := admissionv1.AdmissionReview{
    		TypeMeta: metav1.TypeMeta{
    			Kind:       "AdmissionReview",
    			APIVersion: "admission.k8s.io/v1",
    		},
    		Response: &admissionv1.AdmissionResponse{
    			UID: admissionReviewReq.Request.UID,
    		},
    	}
    
    	// 4. Apply validation logic
    	allowed, reason, err := validateDeployment(admissionReviewReq.Request)
    	if err != nil {
    		admissionReviewResp.Response.Allowed = false
    		admissionReviewResp.Response.Result = &metav1.Status{
    			Message: err.Error(),
    			Code:    http.StatusInternalServerError,
    		}
    	} else {
    		admissionReviewResp.Response.Allowed = allowed
    		if !allowed {
    			admissionReviewResp.Response.Result = &metav1.Status{
    				Message: reason,
    				Code:    http.StatusForbidden,
    			}
    		}
    	}
    
    	// 5. Send the response
    	respBytes, err := json.Marshal(admissionReviewResp)
    	if err != nil {
    		log.Printf("Error marshalling response: %v", err)
    		w.WriteHeader(http.StatusInternalServerError)
    		return
    	}
    
    	w.Header().Set("Content-Type", "application/json")
    	w.Write(respBytes)
    }
    
    // validateDeployment contains the core policy logic
    func validateDeployment(req *admissionv1.AdmissionRequest) (bool, string, error) {
    	// We only care about Deployment objects
    	if req.Kind.Kind != "Deployment" {
    		return true, "", nil // Allow other resources
    	}
    
    	deployment := appsv1.Deployment{}
    	if _, _, err := universalDeserializer.Decode(req.Object.Raw, nil, &deployment); err != nil {
    		return false, "", fmt.Errorf("could not deserialize deployment object: %v", err)
    	}
    
    	// Policy 1: Check for 'owner' label
    	if owner, ok := deployment.Labels["owner"]; !ok || owner == "" {
    		return false, "Deployment must have a non-empty 'owner' label", nil
    	}
    
    	// Policy 2: Check container image registry
    	allowedRegistry := "gcr.io/my-company"
    	for _, container := range deployment.Spec.Template.Spec.Containers {
    		if !strings.HasPrefix(container.Image, allowedRegistry) {
    			msg := fmt.Sprintf("Invalid container image registry for image '%s'. Only images from '%s' are allowed.", container.Image, allowedRegistry)
    			return false, msg, nil
    		}
    	}
    
    	return true, "", nil
    }
    
    func main() {
    	// Paths to TLS certificate and key
    	certPath := "/etc/webhook/certs/tls.crt"
    	keyPath := "/etc/webhook/certs/tls.key"
    
    	mux := http.NewServeMux()
    	ws := &WebhookServer{}
    	mux.HandleFunc("/validate/deployment", ws.handleValidate)
    	// Health endpoint for the readiness probe in the Deployment manifest below.
    	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
    		w.WriteHeader(http.StatusOK)
    	})
    
    	ws.Server = &http.Server{
    		Addr:      ":8443",
    		Handler:   mux,
    	}
    
    	log.Println("Starting webhook server on :8443")
    	if err := ws.Server.ListenAndServeTLS(certPath, keyPath); err != nil {
    		log.Fatalf("Failed to start server: %v", err)
    	}
    }

    This implementation is robust: it correctly handles JSON serialization, separates the HTTP handling from the validation logic, and provides clear, actionable error messages back to the user via the AdmissionResponse.Result.Message field.
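
    One operational wrinkle the server above glosses over: ListenAndServeTLS loads the key pair once at startup, so when cert-manager (covered next) renews the certificate, the pod keeps serving the old one until it restarts. A common fix is the standard library's tls.Config.GetCertificate hook backed by a reloader. Here is a minimal sketch; the type name and the hourly re-read interval are illustrative choices, not a standard API.

    go
    // tlsreload.go: a sketch of serving with certificates that are re-read
    // from disk, so renewals written by cert-manager are picked up live.
    package main

    import (
    	"crypto/tls"
    	"log"
    	"sync"
    	"time"
    )

    type keypairReloader struct {
    	certPath, keyPath string
    	mu                sync.RWMutex
    	cert              *tls.Certificate
    }

    func newKeypairReloader(certPath, keyPath string) (*keypairReloader, error) {
    	kr := &keypairReloader{certPath: certPath, keyPath: keyPath}
    	if err := kr.reload(); err != nil {
    		return nil, err
    	}
    	// Re-read periodically; an fsnotify watcher would react faster.
    	go func() {
    		for range time.Tick(time.Hour) {
    			if err := kr.reload(); err != nil {
    				log.Printf("keeping previous cert, reload failed: %v", err)
    			}
    		}
    	}()
    	return kr, nil
    }

    func (kr *keypairReloader) reload() error {
    	cert, err := tls.LoadX509KeyPair(kr.certPath, kr.keyPath)
    	if err != nil {
    		return err
    	}
    	kr.mu.Lock()
    	kr.cert = &cert
    	kr.mu.Unlock()
    	return nil
    }

    func (kr *keypairReloader) getCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
    	kr.mu.RLock()
    	defer kr.mu.RUnlock()
    	return kr.cert, nil
    }

    To wire it in, build the reloader in main(), set ws.Server.TLSConfig = &tls.Config{GetCertificate: kr.getCertificate}, and call ListenAndServeTLS("", "") with empty paths so the certificate is taken from the TLSConfig.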

    Production-Grade TLS with `cert-manager`

    Hard-coding certificates or using openssl scripts for generation is not a viable production strategy. Certificates expire, and manual rotation is error-prone. This is a solved problem in Kubernetes using cert-manager.

    cert-manager will automatically issue a certificate from a CA (e.g., Let's Encrypt, Vault, or a self-signed issuer for internal services), keep it renewed, and store it in a Secret. Crucially, it can also automatically inject the CA bundle into our ValidatingWebhookConfiguration, completing the trust chain.

    Here’s how to set it up:

    1. Install cert-manager: Follow the official installation guide. Typically kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/vX.Y.Z/cert-manager.yaml.

    2. Create an Issuer: For internal services, a self-signed Issuer is appropriate. This will act as our internal Certificate Authority.

    yaml
    # issuer.yaml
    apiVersion: cert-manager.io/v1
    kind: Issuer
    metadata:
      name: self-signed-issuer
      namespace: policy-enforcer
    spec:
      selfSigned: {}

    3. Create a Certificate: This resource requests a certificate from the Issuer. cert-manager will fulfill this request and store the result in a Secret named policy-enforcer-tls.

    yaml
    # certificate.yaml
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: policy-enforcer-cert
      namespace: policy-enforcer
    spec:
      secretName: policy-enforcer-tls # The secret that will be created
      dnsNames:
        - policy-enforcer-webhook-svc.policy-enforcer.svc
        - policy-enforcer-webhook-svc.policy-enforcer.svc.cluster.local
      issuerRef:
        name: self-signed-issuer
        kind: Issuer

    The dnsNames are critical. The Kubernetes API server reaches our webhook via its internal Service DNS name, so that name must appear in the certificate's Subject Alternative Names (SANs); modern TLS clients, including the API server, ignore the legacy Common Name field.

    4. Automate caBundle Injection: This is the magic. Instead of manually populating caBundle in our ValidatingWebhookConfiguration, we add an annotation:

    yaml
    # validating-webhook-configuration.yaml
    apiVersion: admissionregistration.k8s.io/v1
    kind: ValidatingWebhookConfiguration
    metadata:
      name: my-company.policy-enforcer.webhook
      annotations:
        cert-manager.io/inject-ca-from: "policy-enforcer/policy-enforcer-cert"
    webhooks:
      - name: policy-enforcer.my-company.com
        # ... rest of the configuration
        clientConfig:
          service:
            namespace: policy-enforcer
            name: policy-enforcer-webhook-svc
            path: "/validate/deployment"
            port: 443
          # caBundle is now managed by cert-manager!

    The cert-manager-cainjector controller will watch for this annotation and automatically patch this resource with the correct CA from the Issuer.
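
    If the API server ever reports an x509 verification failure when calling the webhook, a quick sanity check is to parse the tls.crt you extract from the Secret and confirm the DNS SANs cover the Service name. A small helper sketch (the input path is whatever you saved the extracted certificate to):

    go
    // sancheck.go: print the DNS SANs of a PEM-encoded certificate file.
    package main

    import (
    	"crypto/x509"
    	"encoding/pem"
    	"fmt"
    	"log"
    	"os"
    )

    func main() {
    	pemBytes, err := os.ReadFile("tls.crt") // extracted from the policy-enforcer-tls Secret
    	if err != nil {
    		log.Fatal(err)
    	}
    	block, _ := pem.Decode(pemBytes)
    	if block == nil {
    		log.Fatal("no PEM block found")
    	}
    	cert, err := x509.ParseCertificate(block.Bytes)
    	if err != nil {
    		log.Fatal(err)
    	}
    	fmt.Println("DNS SANs:", cert.DNSNames)
    }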

    Deployment Manifests

    Now, let's tie it all together with the necessary Kubernetes manifests.

    1. Dockerfile (Multi-stage build):

    Dockerfile
    # Build stage
    FROM golang:1.21-alpine AS builder
    WORKDIR /app
    COPY go.mod go.sum ./
    RUN go mod download
    COPY . .
    # Build a static binary (CGO disabled) so it runs on a minimal base image
    RUN CGO_ENABLED=0 GOOS=linux go build -o policy-enforcer .
    
    # Final stage
    FROM alpine:latest
    WORKDIR /app
    COPY --from=builder /app/policy-enforcer .
    
    # Create a non-root user
    RUN addgroup -S appgroup && adduser -S appuser -G appgroup
    USER appuser
    
    EXPOSE 8443
    CMD ["./policy-enforcer"]

    2. Deployment and Service:

    yaml
    # deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: policy-enforcer-webhook
      namespace: policy-enforcer
      labels:
        app: policy-enforcer
    spec:
      replicas: 2 # Start with HA in mind
      selector:
        matchLabels:
          app: policy-enforcer
      template:
        metadata:
          labels:
            app: policy-enforcer
        spec:
          containers:
            - name: webhook
              image: gcr.io/my-company/policy-enforcer:v1.0.0
              ports:
                - containerPort: 8443
                  name: webhook-tls
              volumeMounts:
                - name: tls-certs
                  mountPath: /etc/webhook/certs
                  readOnly: true
              readinessProbe:
                httpGet:
                  scheme: HTTPS
                  path: /healthz # Served by the handler registered in main.go
                  port: 8443
          volumes:
            - name: tls-certs
              secret:
                secretName: policy-enforcer-tls # Mount the secret created by cert-manager
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: policy-enforcer-webhook-svc
      namespace: policy-enforcer
    spec:
      selector:
        app: policy-enforcer
      ports:
        - port: 443
          targetPort: webhook-tls

    Notice how the Deployment mounts the Secret (policy-enforcer-tls) created by cert-manager into the path (/etc/webhook/certs) that our Go application expects.

    Advanced Considerations and Edge Cases

    Running this in production requires thinking beyond the happy path.

    The `failurePolicy` Dilemma

    Choosing between failurePolicy: Fail and failurePolicy: Ignore is a trade-off between security and availability.

  • Use Fail for: Critical security policies where a bypass would be a major incident (e.g., blocking root containers, enforcing network policies). Your monitoring and alerting on the webhook's health must be flawless. If the webhook deployment fails, you could block kubectl, operators, and even system components from making changes.
  • Use Ignore for: Softer governance policies (e.g., label enforcement, annotations). This ensures the cluster remains fully operational even if the webhook is down. You must have a secondary process (like a daily report) to detect non-compliant resources created during a webhook outage; a sketch of such a report follows this list.
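
    A minimal sketch of such an out-of-band report, checking our two Deployment policies across the cluster. It assumes in-cluster credentials (run it as a CronJob, for example) and the k8s.io/client-go dependency; printing to stdout stands in for whatever reporting channel you actually use.

    go
    // report.go: list all Deployments and print policy violations.
    package main

    import (
    	"context"
    	"fmt"
    	"log"
    	"strings"

    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/client-go/kubernetes"
    	"k8s.io/client-go/rest"
    )

    func main() {
    	cfg, err := rest.InClusterConfig()
    	if err != nil {
    		log.Fatal(err)
    	}
    	client, err := kubernetes.NewForConfig(cfg)
    	if err != nil {
    		log.Fatal(err)
    	}

    	// An empty namespace argument lists Deployments across all namespaces.
    	deployments, err := client.AppsV1().Deployments("").List(context.TODO(), metav1.ListOptions{})
    	if err != nil {
    		log.Fatal(err)
    	}

    	for _, d := range deployments.Items {
    		if d.Labels["owner"] == "" {
    			fmt.Printf("%s/%s: missing 'owner' label\n", d.Namespace, d.Name)
    		}
    		for _, c := range d.Spec.Template.Spec.Containers {
    			if !strings.HasPrefix(c.Image, "gcr.io/my-company") {
    				fmt.Printf("%s/%s: image %q not from trusted registry\n", d.Namespace, d.Name, c.Image)
    			}
    		}
    	}
    }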

    Performance and Latency

    Your webhook is in the critical path of the control plane. Every millisecond counts.

  • Avoid external calls: Do not call external APIs (e.g., a user database, a vulnerability scanner) synchronously within the webhook handler. This is a recipe for cascading failures. If you need external data, use a caching controller that populates a local ConfigMap or CRD, which the webhook can read from quickly.
  • Benchmark your logic: The validation code should be highly efficient. For our example, label and string-prefix checks are nanosecond operations. Complex regex or deep object-graph traversals should be benchmarked; a sketch follows this list.
  • Optimize resource consumption: The Go binary is small, but monitor its CPU and memory usage. Set appropriate resource requests and limits in your Deployment spec.
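
    As a sketch, here is a micro-benchmark for the validateDeployment function from main.go; the benchRequest fixture helper is hypothetical, but the rest is standard go test -bench fare.

    go
    // validate_bench_test.go
    package main

    import (
    	"encoding/json"
    	"testing"

    	admissionv1 "k8s.io/api/admission/v1"
    	appsv1 "k8s.io/api/apps/v1"
    	corev1 "k8s.io/api/core/v1"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/apimachinery/pkg/runtime"
    )

    // benchRequest builds a compliant Deployment wrapped in an AdmissionRequest.
    func benchRequest(b *testing.B) *admissionv1.AdmissionRequest {
    	dep := appsv1.Deployment{
    		ObjectMeta: metav1.ObjectMeta{Name: "demo", Labels: map[string]string{"owner": "team-a"}},
    		Spec: appsv1.DeploymentSpec{
    			Template: corev1.PodTemplateSpec{
    				Spec: corev1.PodSpec{
    					Containers: []corev1.Container{{Name: "app", Image: "gcr.io/my-company/app:v1"}},
    				},
    			},
    		},
    	}
    	raw, err := json.Marshal(dep)
    	if err != nil {
    		b.Fatal(err)
    	}
    	return &admissionv1.AdmissionRequest{
    		Kind:   metav1.GroupVersionKind{Group: "apps", Version: "v1", Kind: "Deployment"},
    		Object: runtime.RawExtension{Raw: raw},
    	}
    }

    func BenchmarkValidateDeployment(b *testing.B) {
    	req := benchRequest(b)
    	b.ResetTimer()
    	for i := 0; i < b.N; i++ {
    		if _, _, err := validateDeployment(req); err != nil {
    			b.Fatal(err)
    		}
    	}
    }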

    A Robust Testing Strategy

    How do you test a component that integrates so deeply with the API server?

  • Unit Tests: The validation logic (the validateDeployment function) should be covered by standard Go unit tests. This is straightforward; a table-driven example appears at the end of this section.
  • Integration Tests with envtest: The controller-runtime project provides the envtest package, which can spin up a local, temporary etcd and kube-apiserver binary for testing. This allows you to write integration tests that create a real ValidatingWebhookConfiguration and send actual resources to the test API server, which then calls your running webhook. This provides the highest fidelity testing outside of a real cluster.
    Example test flow using envtest:

    1. Start the envtest control plane.
    2. Run your webhook server in a goroutine, pointing at the test API server.
    3. Create the ValidatingWebhookConfiguration in the test API server.
    4. Use a client to create a non-compliant Deployment.
    5. Assert that the create call fails with the expected error message from your webhook.
    6. Create a compliant Deployment and assert it succeeds.
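
    Here is the table-driven unit test referenced above. The deploymentRequest helper is hypothetical; it simply marshals a Deployment into the AdmissionRequest shape that validateDeployment expects.

    go
    // validate_test.go
    package main

    import (
    	"encoding/json"
    	"testing"

    	admissionv1 "k8s.io/api/admission/v1"
    	appsv1 "k8s.io/api/apps/v1"
    	corev1 "k8s.io/api/core/v1"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/apimachinery/pkg/runtime"
    )

    func deploymentRequest(t *testing.T, labels map[string]string, image string) *admissionv1.AdmissionRequest {
    	t.Helper()
    	dep := appsv1.Deployment{
    		ObjectMeta: metav1.ObjectMeta{Name: "demo", Labels: labels},
    		Spec: appsv1.DeploymentSpec{
    			Template: corev1.PodTemplateSpec{
    				Spec: corev1.PodSpec{Containers: []corev1.Container{{Name: "app", Image: image}}},
    			},
    		},
    	}
    	raw, err := json.Marshal(dep)
    	if err != nil {
    		t.Fatal(err)
    	}
    	return &admissionv1.AdmissionRequest{
    		Kind:   metav1.GroupVersionKind{Group: "apps", Version: "v1", Kind: "Deployment"},
    		Object: runtime.RawExtension{Raw: raw},
    	}
    }

    func TestValidateDeployment(t *testing.T) {
    	cases := []struct {
    		name    string
    		labels  map[string]string
    		image   string
    		allowed bool
    	}{
    		{"compliant", map[string]string{"owner": "team-a"}, "gcr.io/my-company/app:v1", true},
    		{"missing owner label", nil, "gcr.io/my-company/app:v1", false},
    		{"untrusted registry", map[string]string{"owner": "team-a"}, "docker.io/library/nginx:latest", false},
    	}
    	for _, tc := range cases {
    		t.Run(tc.name, func(t *testing.T) {
    			allowed, reason, err := validateDeployment(deploymentRequest(t, tc.labels, tc.image))
    			if err != nil {
    				t.Fatalf("unexpected error: %v", err)
    			}
    			if allowed != tc.allowed {
    				t.Fatalf("allowed = %v, want %v (reason %q)", allowed, tc.allowed, reason)
    			}
    		})
    	}
    }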

    Handling API Version Skew

    What if a user submits a Deployment as apps/v1beta1 (on an older cluster) while your webhook is strongly typed to apps/v1? At best your typed decoding fails outright; at worst it silently drops fields that moved between versions.

    The robust solution is to use unstructured.Unstructured from k8s.io/apimachinery/pkg/apis/meta/v1/unstructured. This allows you to work with the object as a map[string]interface{}, making your validation logic resilient to API version changes.

    go
    import "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"

    // ... inside the validation function (json, fmt, strings imported as before)
    var obj unstructured.Unstructured
    if err := json.Unmarshal(req.Object.Raw, &obj); err != nil {
    	return false, "", fmt.Errorf("could not unmarshal object: %v", err)
    }

    // Access top-level fields with helper methods
    labels := obj.GetLabels()
    if owner, ok := labels["owner"]; !ok || owner == "" {
    	return false, "Missing owner label", nil
    }

    // For nested fields, use the unstructured helpers
    containers, found, err := unstructured.NestedSlice(obj.Object, "spec", "template", "spec", "containers")
    if err != nil || !found {
    	return false, "", fmt.Errorf("could not read containers: %v", err)
    }
    for _, c := range containers {
    	container, ok := c.(map[string]interface{})
    	if !ok {
    		continue
    	}
    	image, _, _ := unstructured.NestedString(container, "image")
    	if !strings.HasPrefix(image, "gcr.io/my-company") {
    		return false, fmt.Sprintf("image %q is not from the allowed registry", image), nil
    	}
    }

    This approach is more defensive and recommended for webhooks intended to run across a fleet of clusters with varying Kubernetes versions.

    Conclusion: A Powerful, Precision Tool

    Building a custom Dynamic Admission Controller in Go is a significant engineering effort, but it provides the ultimate power to enforce the specific, nuanced policies that define a well-governed and secure Kubernetes platform. It's a scalpel in a world of blunter instruments.

    By focusing on a production-ready implementation with automated TLS via cert-manager, architecting for high availability, and critically evaluating failure modes and performance, you can build a component that becomes a reliable and indispensable part of your control plane. While tools like Gatekeeper have their place, understanding how to build a webhook from scratch gives you a deeper understanding of the Kubernetes API machinery and a powerful tool for those situations that demand a custom-fit solution.
