K8s Dynamic Admission Controllers for Multi-Cluster GitOps Policy

14 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Enforcement Gap in Declarative GitOps

In a mature multi-cluster Kubernetes environment managed by GitOps controllers like ArgoCD or Flux, the declarative state in a Git repository is the single source of truth. While this provides unparalleled auditability and consistency, it also presents a critical challenge: how do you enforce policies that cannot be expressed purely through static YAML linting or post-sync checks? GitOps ensures what is in the repo gets applied, but it doesn't inherently validate the compliance of that state at the moment of application.

This is the enforcement gap. For instance, you might require all production workloads to have specific resource limits, a team ownership label, and a seccompProfile set to RuntimeDefault. While tools like OPA Gatekeeper or Kyverno are powerful for policy-as-code, building a custom Dynamic Admission Controller offers ultimate programmatic flexibility, allowing for complex logic that might involve external API calls, intricate business rules, or dynamic configuration that's difficult to express in Rego or other policy languages.

A dynamic admission controller intercepts requests to the Kubernetes API server before an object is persisted in etcd. This provides a real-time, synchronous gate. When a GitOps controller attempts to apply a non-compliant manifest, the API server forwards the request to our webhook. The webhook rejects it, causing the apply operation to fail. The GitOps controller then correctly reports a sync failure, immediately alerting the responsible team that their proposed change violates cluster policy. This is a powerful, preventative control mechanism.

This article details the end-to-end process of building, deploying, and managing a production-grade validating admission webhook in Go. We will not cover the basics of what a webhook is, but rather the practical engineering challenges involved in making one reliable, secure, and performant.


The Admission Control Flow: A Technical Refresher

Before we write any code, it's crucial to have a precise mental model of the API server's interaction with a ValidatingAdmissionWebhook. When a request (e.g., CREATE a Deployment) arrives, the API server, after authentication and authorization, checks its configured webhooks.

  • Request Initiation: A client (e.g., kubectl or an ArgoCD pod) sends a manifest to the Kubernetes API server.
  • API Server Processing: The server authenticates, authorizes, and performs schema validation.
  • Webhook Invocation: If a ValidatingWebhookConfiguration matches the request's resource type, version, and operation, the API server constructs an admission.k8s.io/v1.AdmissionReview object. This object encapsulates the original AdmissionRequest.
  • HTTP POST to Webhook: The API server sends this AdmissionReview object as the body of an HTTP POST request to the Service endpoint defined in the webhook configuration. This call is synchronous and blocking.
  • Webhook Logic Execution: Our custom webhook server receives the request, deserializes the AdmissionReview, inspects the request.object.raw field (which contains the full YAML/JSON of the resource being created/updated), and applies its validation logic.
  • Webhook Response: The webhook constructs a new AdmissionReview object containing an AdmissionResponse. The key fields are:
  • * uid: Must match the UID from the incoming request.

    * allowed: A boolean (true or false).

    * status: If allowed: false, this contains an HTTP status code and a human-readable message explaining the reason for denial.

  • API Server Action: The API server receives the response. If allowed: true, it proceeds to persist the object. If allowed: false, it rejects the entire request and sends the webhook's status message back to the original client.
  • This synchronous nature is both a strength and a liability. It provides a hard guarantee of enforcement but also introduces a new point of failure and a source of latency for all matching API requests. Our implementation must be fast and highly available.


    Building the Go Webhook Server

    We'll build a webhook that enforces two policies on all Deployment objects created or updated in namespaces not labeled control-plane=true:

  • Every container must have CPU and memory limits defined.
  • The Deployment must have a spec.template.metadata.labels.team label.
  • Project Setup and Dependencies

    Initialize a new Go module:

    bash
    go mod init github.com/your-org/gitops-validator
    go get k8s.io/api/admission/v1
    go get k8s.io/api/apps/v1
    go get k8s.io/apimachinery/pkg/runtime
    go get k8s.io/apimachinery/pkg/runtime/serializer

    We directly use the Kubernetes API types to ensure correctness when handling the AdmissionReview and Deployment objects.

    The Core HTTP Handler

    Our server needs to handle JSON, so we'll set up a universal deserializer.

    main.go

    go
    package main
    
    import (
    	"encoding/json"
    	"fmt"
    	"io/ioutil"
    	"net/http"
    	"strings"
    
    	admissionv1 "k8s.io/api/admission/v1"
    	appsv1 "k8s.io/api/apps/v1"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/apimachinery/pkg/runtime"
    	"k8s.io/apimachinery/pkg/runtime/serializer"
    	"k8s.io/klog/v2"
    )
    
    var (
    	universalDeserializer = serializer.NewCodecFactory(runtime.NewScheme()).UniversalDeserializer()
    )
    
    // admissionResponse is a helper to create an AdmissionResponse
    func admissionResponse(allowed bool, message string) *admissionv1.AdmissionResponse {
    	return &admissionv1.AdmissionResponse{
    		Allowed: allowed,
    		Result: &metav1.Status{
    			Message: message,
    		},
    	}
    }
    
    // validateDeployment is our core policy logic
    func validateDeployment(ar *admissionv1.AdmissionReview) *admissionv1.AdmissionResponse {
    	req := ar.Request
    	klog.Infof("AdmissionReview for Kind=%v, Namespace=%v Name=%v UID=%v operation=%v UserInfo=%v",
    		req.Kind, req.Namespace, req.Name, req.UID, req.Operation, req.UserInfo)
    
    	if req.Kind.Kind != "Deployment" {
    		klog.Errorf("Unexpected kind: %s", req.Kind.Kind)
    		return admissionResponse(false, "This webhook only validates Deployments.")
    	}
    
    	deployment := appsv1.Deployment{}
    	if _, _, err := universalDeserializer.Decode(req.Object.Raw, nil, &deployment); err != nil {
    		msg := fmt.Sprintf("Could not deserialize deployment object: %v", err)
    		klog.Error(msg)
    		return admissionResponse(false, msg)
    	}
    
    	// Policy 1: Enforce 'team' label
    	if _, ok := deployment.Spec.Template.ObjectMeta.Labels["team"]; !ok {
    		msg := "Validation failed: Deployment must have a 'spec.template.metadata.labels.team' label."
    		klog.Warningf("Denying deployment %s/%s: %s", deployment.Namespace, deployment.Name, msg)
    		return admissionResponse(false, msg)
    	}
    
    	// Policy 2: Enforce resource limits on all containers
    	var validationErrors []string
    	for _, container := range deployment.Spec.Template.Spec.Containers {
    		if container.Resources.Limits == nil {
    			validationErrors = append(validationErrors, fmt.Sprintf("container '%s' is missing resource limits", container.Name))
    			continue
    		}
    		if _, ok := container.Resources.Limits["cpu"]; !ok {
    			validationErrors = append(validationErrors, fmt.Sprintf("container '%s' is missing cpu limits", container.Name))
    		}
    		if _, ok := container.Resources.Limits["memory"]; !ok {
    			validationErrors = append(validationErrors, fmt.Sprintf("container '%s' is missing memory limits", container.Name))
    		}
    	}
    
    	if len(validationErrors) > 0 {
    		msg := fmt.Sprintf("Validation failed: %s", strings.Join(validationErrors, "; "))
    		klog.Warningf("Denying deployment %s/%s: %s", deployment.Namespace, deployment.Name, msg)
    		return admissionResponse(false, msg)
    	}
    
    	klog.Infof("Allowing deployment %s/%s", deployment.Namespace, deployment.Name)
    	return admissionResponse(true, "Deployment is compliant.")
    }
    
    // handleValidate is the main HTTP handler function
    func handleValidate(w http.ResponseWriter, r *http.Request) {
    	body, err := ioutil.ReadAll(r.Body)
    	if err != nil {
    		klog.Errorf("Could not read request body: %v", err)
    		http.Error(w, "Could not read request body", http.StatusBadRequest)
    		return
    	}
    
    	var admissionReview admissionv1.AdmissionReview
    	if _, _, err := universalDeserializer.Decode(body, nil, &admissionReview); err != nil {
    		klog.Errorf("Could not deserialize AdmissionReview: %v", err)
    		http.Error(w, "Could not deserialize AdmissionReview", http.StatusBadRequest)
    		return
    	}
    
    	if admissionReview.Request == nil {
    		klog.Error("AdmissionReview contains no request")
    		http.Error(w, "AdmissionReview contains no request", http.StatusBadRequest)
    		return
    	}
    
    	admissionResponse := validateDeployment(&admissionReview)
    
    	// Construct the final response AdmissionReview
    	responseReview := admissionv1.AdmissionReview{
    		TypeMeta: metav1.TypeMeta{
    			APIVersion: "admission.k8s.io/v1",
    			Kind:       "AdmissionReview",
    		},
    		Response: admissionResponse,
    	}
    	// The UID of the response MUST match the UID of the request.
    	responseReview.Response.UID = admissionReview.Request.UID
    
    	respBytes, err := json.Marshal(responseReview)
    	if err != nil {
    		klog.Errorf("Could not marshal response: %v", err)
    		http.Error(w, "Could not marshal response", http.StatusInternalServerError)
    		return
    	}
    
    	w.Header().Set("Content-Type", "application/json")
    	w.WriteHeader(http.StatusOK)
    	_, _ = w.Write(respBytes)
    }
    
    func main() {
    	// Paths to TLS certificate and key
    	certPath := "/etc/webhook/certs/tls.crt"
    	keyPath := "/etc/webhook/certs/tls.key"
    
    	http.HandleFunc("/validate", handleValidate)
    	klog.Info("Starting webhook server on :8443")
    
    	if err := http.ListenAndServeTLS(":8443", certPath, keyPath, nil); err != nil {
    		klog.Fatalf("Failed to start HTTPS server: %v", err)
    	}
    }

    This code sets up a complete, albeit simple, webhook. The key takeaway is the strict handling of the AdmissionReview object and ensuring the response UID matches the request UID. Failure to do so will cause the API server to reject the webhook's response.


    Production-Grade Deployment and TLS

    A webhook that isn't running is worse than no webhook at all, as it can bring down your entire CI/CD pipeline if the failurePolicy is set to Fail. Security is also paramount, as the webhook receives sensitive information about cluster state changes.

    Dockerizing the Webhook

    We use a multi-stage Docker build to create a minimal, secure container image.

    Dockerfile

    dockerfile
    # --- Build Stage ---
    FROM golang:1.21-alpine AS builder
    
    WORKDIR /app
    
    COPY go.mod go.sum ./
    RUN go mod download
    
    COPY . .
    
    # Build the binary with optimizations
    RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o gitops-validator .
    
    # --- Final Stage ---
    FROM alpine:latest
    
    WORKDIR /app
    
    # Copy the binary from the builder stage
    COPY --from=builder /app/gitops-validator .
    
    # Non-root user for security
    RUN addgroup -S appgroup && adduser -S appuser -G appgroup
    USER appuser
    
    # The server will listen on port 8443
    EXPOSE 8443
    
    ENTRYPOINT ["/app/gitops-validator"]

    Automated Certificate Management with `cert-manager`

    Hard-coding certificates is a non-starter in production. The Kubernetes API server must trust the TLS certificate presented by the webhook. We will use cert-manager to automate this entire process.

  • Install cert-manager: If not already present in your cluster, install it.
  • Create an Issuer: We'll use a self-signed issuer for simplicity, but in a production multi-cluster setup, you might use a central Vault or Let's Encrypt issuer.
  • issuer.yaml

    yaml
        apiVersion: cert-manager.io/v1
        kind: Issuer
        metadata:
          name: selfsigned-issuer
          namespace: gitops-validator
        spec:
          selfSigned: {}
  • Create a Certificate: This Certificate resource tells cert-manager to generate a key/cert pair and store it in a Secret. The dnsNames must match the internal Service name of our webhook.
  • certificate.yaml

    yaml
        apiVersion: cert-manager.io/v1
        kind: Certificate
        metadata:
          name: gitops-validator-cert
          namespace: gitops-validator
        spec:
          secretName: gitops-validator-tls
          duration: 2160h # 90d
          renewBefore: 360h # 15d
          dnsNames:
          - gitops-validator-svc.gitops-validator.svc
          - gitops-validator-svc.gitops-validator.svc.cluster.local
          issuerRef:
            name: selfsigned-issuer
            kind: Issuer

    cert-manager will now create a secret named gitops-validator-tls containing tls.crt, tls.key, and ca.crt.

    Kubernetes Manifests for the Webhook

    Now we tie everything together with a Deployment, Service, and ValidatingWebhookConfiguration.

    deployment.yaml

    yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: gitops-validator
      namespace: gitops-validator
      labels:
        app: gitops-validator
    spec:
      replicas: 2 # For High Availability
      selector:
        matchLabels:
          app: gitops-validator
      template:
        metadata:
          labels:
            app: gitops-validator
        spec:
          containers:
          - name: server
            image: your-registry/gitops-validator:latest
            ports:
            - containerPort: 8443
              name: webhook-tls
            volumeMounts:
            - name: tls-certs
              mountPath: /etc/webhook/certs
              readOnly: true
          volumes:
          - name: tls-certs
            secret:
              secretName: gitops-validator-tls

    Note the volume mount, which projects the cert-manager-created secret into the path our Go application expects.

    service.yaml

    yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: gitops-validator-svc
      namespace: gitops-validator
    spec:
      selector:
        app: gitops-validator
      ports:
      - port: 443
        targetPort: webhook-tls

    We map port 443 on the service to our container's 8443 port. This is standard practice.

    webhook-configuration.yaml

    yaml
    apiVersion: admissionregistration.k8s.io/v1
    kind: ValidatingWebhookConfiguration
    metadata:
      name: gitops-policy-validator
      annotations:
        # This annotation tells cert-manager to inject the CA bundle from our secret
        cert-manager.io/inject-ca-from: "gitops-validator/gitops-validator-cert"
    webhooks:
    - name: validator.your-domain.com
      clientConfig:
        # The caBundle will be automatically populated by cert-manager
        service:
          namespace: gitops-validator
          name: gitops-validator-svc
          path: "/validate"
          port: 443
      rules:
      - operations: ["CREATE", "UPDATE"]
        apiGroups: ["apps"]
        apiVersions: ["v1"]
        resources: ["deployments"]
      # Exclude our own namespace and kube-system to prevent deadlocks
      namespaceSelector:
        matchExpressions:
        - key: control-plane
          operator: DoesNotExist
      sideEffects: None
      admissionReviewVersions: ["v1"]
      # CRITICAL: What happens if the webhook is down?
      # 'Fail' blocks API requests. 'Ignore' bypasses the webhook.
      # Use 'Fail' for critical security policies.
      failurePolicy: Fail
      # How long the API server will wait for a response.
      # Keep this low to minimize latency impact.
      timeoutSeconds: 5

    The cert-manager.io/inject-ca-from annotation is the magic that solves the caBundle problem. cert-manager will watch this resource and automatically patch the caBundle field with the CA from our generated certificate, establishing the chain of trust.

    Critically, the namespaceSelector prevents the webhook from validating its own deployment or critical system components, avoiding a catastrophic circular dependency where the webhook can't be deployed because it needs to be validated by itself.


    Edge Cases and Performance Considerations

    Latency Impact

    Every Deployment CREATE or UPDATE request in a validating namespace now incurs a round-trip network hop to your webhook pod. This adds latency. A timeoutSeconds of 5 is a reasonable starting point, but your webhook logic must be highly efficient.

    * Avoid External Calls: Do not make blocking calls to external databases or APIs within your validation logic. If you must, use aggressive caching and short timeouts.

    * Monitor Performance: Expose Prometheus metrics from your webhook server. Track the duration of validation requests (http_request_duration_seconds). Set up alerts if the p95 or p99 latency exceeds a threshold (e.g., 200ms).

    * Resource Allocation: Ensure the webhook deployment has adequate CPU and memory requests/limits to handle the load of API server requests. Under-provisioning will lead to throttling and increased latency.

    High Availability

    Running a single replica of the webhook is a single point of failure. If that pod crashes and your failurePolicy is Fail, you can no longer create or update Deployments. Always run at least two replicas (replicas: 2) spread across different nodes using pod anti-affinity.

    yaml
    # In the Deployment spec.template.spec
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - gitops-validator
            topologyKey: "kubernetes.io/hostname"

    Debugging a Failing Webhook

    When a webhook misbehaves, it can be opaque. Here's a debugging checklist:

  • Check Webhook Pod Logs: This is the first step. Look for errors in deserialization, panics in your logic, or network issues.
  • kubectl logs -n gitops-validator -l app=gitops-validator

  • Check API Server Logs: On your control plane nodes, the API server logs will show errors when trying to call the webhook endpoint (e.g., TLS handshake failures, timeouts).
  • Check cert-manager Status: Ensure the Certificate and Issuer are in a Ready state.
  • kubectl get certificate -n gitops-validator gitops-validator-cert

  • Inspect the caBundle: Verify that cert-manager has correctly injected the CA into the ValidatingWebhookConfiguration.
  • kubectl get validatingwebhookconfiguration gitops-policy-validator -o yaml

    The caBundle field should be a large base64-encoded string.

  • Simulate a Request Locally: You can test your handler logic without the API server. Save a sample AdmissionReview JSON to a file and use curl to POST it to your webhook Service from within the cluster.
  • bash
        # From a debug pod inside the cluster
        curl -k -X POST -H "Content-Type: application/json" --data @review.json https://gitops-validator-svc.gitops-validator.svc/validate

    Handling Object Versions and Updates

    Our current code only handles v1 Deployment objects. In a real-world scenario, you might need to handle different versions or even different kinds of objects (StatefulSet, DaemonSet, etc.). The universalDeserializer can handle this, but your logic must be robust. For UPDATE operations, the AdmissionRequest contains both object (the new state) and oldObject (the state before the change), allowing for complex validation, such as preventing immutable fields from being changed.

    Conclusion: Programmatic Guardrails for GitOps

    While declarative policy engines like OPA/Gatekeeper are excellent for many use cases, a custom dynamic admission controller provides the ultimate escape hatch for complex, programmatic policy enforcement. It integrates seamlessly into a GitOps workflow, acting as a real-time, synchronous guardrail that prevents non-compliant configuration from ever reaching the cluster's desired state.

    By building a robust Go service, automating TLS with cert-manager, and carefully configuring the ValidatingWebhookConfiguration, you can create a powerful enforcement point that scales across a fleet of clusters. The engineering discipline required—focusing on performance, high availability, and debuggability—is what elevates this from a simple webhook to a critical piece of production infrastructure, ensuring that your declarative GitOps environment remains not only consistent but also compliant and secure.

    Found this article helpful?

    Share it with others who might benefit from it.

    More Articles