K8s Dynamic Admission Controllers for Multi-Cluster GitOps Policy
The Enforcement Gap in Declarative GitOps
In a mature multi-cluster Kubernetes environment managed by GitOps controllers like ArgoCD or Flux, the declarative state in a Git repository is the single source of truth. While this provides unparalleled auditability and consistency, it also presents a critical challenge: how do you enforce policies that cannot be expressed purely through static YAML linting or post-sync checks? GitOps ensures what is in the repo gets applied, but it doesn't inherently validate the compliance of that state at the moment of application.
This is the enforcement gap. For instance, you might require all production workloads to have specific resource limits, a team ownership label, and a seccompProfile set to RuntimeDefault. While tools like OPA Gatekeeper or Kyverno are powerful for policy-as-code, building a custom Dynamic Admission Controller offers ultimate programmatic flexibility, allowing for complex logic that might involve external API calls, intricate business rules, or dynamic configuration that's difficult to express in Rego or other policy languages.
A dynamic admission controller intercepts requests to the Kubernetes API server before an object is persisted in etcd. This provides a real-time, synchronous gate. When a GitOps controller attempts to apply
a non-compliant manifest, the API server forwards the request to our webhook. The webhook rejects it, causing the apply
operation to fail. The GitOps controller then correctly reports a sync failure, immediately alerting the responsible team that their proposed change violates cluster policy. This is a powerful, preventative control mechanism.
This article details the end-to-end process of building, deploying, and managing a production-grade validating admission webhook in Go. We will not cover the basics of what a webhook is, but rather the practical engineering challenges involved in making one reliable, secure, and performant.
The Admission Control Flow: A Technical Refresher
Before we write any code, it's crucial to have a precise mental model of the API server's interaction with a ValidatingAdmissionWebhook. When a request (e.g., CREATE a Deployment) arrives, the API server, after authentication and authorization, checks its configured webhooks.
The flow proceeds as follows:
1. A client (e.g., kubectl or an ArgoCD pod) sends a manifest to the Kubernetes API server.
2. If a ValidatingWebhookConfiguration matches the request's resource type, version, and operation, the API server constructs an admission.k8s.io/v1.AdmissionReview object. This object encapsulates the original AdmissionRequest.
3. The API server sends the AdmissionReview object as the body of an HTTP POST request to the Service endpoint defined in the webhook configuration. This call is synchronous and blocking.
4. The webhook server deserializes the AdmissionReview, inspects the request.object.raw field (which contains the full YAML/JSON of the resource being created/updated), and applies its validation logic.
5. The webhook returns an AdmissionReview object containing an AdmissionResponse. The key fields are:
   * uid: Must match the UID from the incoming request.
   * allowed: A boolean (true or false).
   * status: If allowed: false, this contains an HTTP status code and a human-readable message explaining the reason for denial.
6. If the API server receives allowed: true, it proceeds to persist the object. If allowed: false, it rejects the entire request and sends the webhook's status message back to the original client.
This synchronous nature is both a strength and a liability. It provides a hard guarantee of enforcement but also introduces a new point of failure and a source of latency for all matching API requests. Our implementation must be fast and highly available.
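Concretely, a denial response from step 5 looks like this on the wire (the uid and message values here are illustrative):
{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "response": {
    "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
    "allowed": false,
    "status": {
      "code": 403,
      "message": "Validation failed: container 'app' is missing resource limits"
    }
  }
}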
Building the Go Webhook Server
We'll build a webhook that enforces two policies on all Deployment objects created or updated in namespaces that do not carry a control-plane label:
1. Every container in the Deployment must have resource limits defined.
2. The Deployment must have a spec.template.metadata.labels.team label.
Project Setup and Dependencies
Initialize a new Go module:
go mod init github.com/your-org/gitops-validator
go get k8s.io/api/admission/v1
go get k8s.io/api/apps/v1
go get k8s.io/apimachinery/pkg/runtime
go get k8s.io/apimachinery/pkg/runtime/serializer
go get k8s.io/klog/v2
We directly use the Kubernetes API types to ensure correctness when handling the AdmissionReview and Deployment objects.
The Core HTTP Handler
Our server needs to handle JSON, so we'll set up a universal deserializer.
main.go
package main

import (
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "strings"

    admissionv1 "k8s.io/api/admission/v1"
    appsv1 "k8s.io/api/apps/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/runtime/serializer"
    "k8s.io/klog/v2"
)

var (
    universalDeserializer = serializer.NewCodecFactory(runtime.NewScheme()).UniversalDeserializer()
)

// admissionResponse is a helper to create an AdmissionResponse
func admissionResponse(allowed bool, message string) *admissionv1.AdmissionResponse {
    return &admissionv1.AdmissionResponse{
        Allowed: allowed,
        Result: &metav1.Status{
            Message: message,
        },
    }
}

// validateDeployment is our core policy logic
func validateDeployment(ar *admissionv1.AdmissionReview) *admissionv1.AdmissionResponse {
    req := ar.Request
    klog.Infof("AdmissionReview for Kind=%v, Namespace=%v Name=%v UID=%v operation=%v UserInfo=%v",
        req.Kind, req.Namespace, req.Name, req.UID, req.Operation, req.UserInfo)

    if req.Kind.Kind != "Deployment" {
        klog.Errorf("Unexpected kind: %s", req.Kind.Kind)
        return admissionResponse(false, "This webhook only validates Deployments.")
    }

    deployment := appsv1.Deployment{}
    if _, _, err := universalDeserializer.Decode(req.Object.Raw, nil, &deployment); err != nil {
        msg := fmt.Sprintf("Could not deserialize deployment object: %v", err)
        klog.Error(msg)
        return admissionResponse(false, msg)
    }

    // Policy 1: Enforce 'team' label
    if _, ok := deployment.Spec.Template.ObjectMeta.Labels["team"]; !ok {
        msg := "Validation failed: Deployment must have a 'spec.template.metadata.labels.team' label."
        klog.Warningf("Denying deployment %s/%s: %s", deployment.Namespace, deployment.Name, msg)
        return admissionResponse(false, msg)
    }

    // Policy 2: Enforce resource limits on all containers
    var validationErrors []string
    for _, container := range deployment.Spec.Template.Spec.Containers {
        if container.Resources.Limits == nil {
            validationErrors = append(validationErrors, fmt.Sprintf("container '%s' is missing resource limits", container.Name))
            continue
        }
        if _, ok := container.Resources.Limits["cpu"]; !ok {
            validationErrors = append(validationErrors, fmt.Sprintf("container '%s' is missing cpu limits", container.Name))
        }
        if _, ok := container.Resources.Limits["memory"]; !ok {
            validationErrors = append(validationErrors, fmt.Sprintf("container '%s' is missing memory limits", container.Name))
        }
    }
    if len(validationErrors) > 0 {
        msg := fmt.Sprintf("Validation failed: %s", strings.Join(validationErrors, "; "))
        klog.Warningf("Denying deployment %s/%s: %s", deployment.Namespace, deployment.Name, msg)
        return admissionResponse(false, msg)
    }

    klog.Infof("Allowing deployment %s/%s", deployment.Namespace, deployment.Name)
    return admissionResponse(true, "Deployment is compliant.")
}

// handleValidate is the main HTTP handler function
func handleValidate(w http.ResponseWriter, r *http.Request) {
    body, err := io.ReadAll(r.Body)
    if err != nil {
        klog.Errorf("Could not read request body: %v", err)
        http.Error(w, "Could not read request body", http.StatusBadRequest)
        return
    }

    var admissionReview admissionv1.AdmissionReview
    if _, _, err := universalDeserializer.Decode(body, nil, &admissionReview); err != nil {
        klog.Errorf("Could not deserialize AdmissionReview: %v", err)
        http.Error(w, "Could not deserialize AdmissionReview", http.StatusBadRequest)
        return
    }

    if admissionReview.Request == nil {
        klog.Error("AdmissionReview contains no request")
        http.Error(w, "AdmissionReview contains no request", http.StatusBadRequest)
        return
    }

    admissionResponse := validateDeployment(&admissionReview)

    // Construct the final response AdmissionReview
    responseReview := admissionv1.AdmissionReview{
        TypeMeta: metav1.TypeMeta{
            APIVersion: "admission.k8s.io/v1",
            Kind:       "AdmissionReview",
        },
        Response: admissionResponse,
    }
    // The UID of the response MUST match the UID of the request.
    responseReview.Response.UID = admissionReview.Request.UID

    respBytes, err := json.Marshal(responseReview)
    if err != nil {
        klog.Errorf("Could not marshal response: %v", err)
        http.Error(w, "Could not marshal response", http.StatusInternalServerError)
        return
    }

    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusOK)
    _, _ = w.Write(respBytes)
}

func main() {
    // Paths to TLS certificate and key
    certPath := "/etc/webhook/certs/tls.crt"
    keyPath := "/etc/webhook/certs/tls.key"

    http.HandleFunc("/validate", handleValidate)

    klog.Info("Starting webhook server on :8443")
    if err := http.ListenAndServeTLS(":8443", certPath, keyPath, nil); err != nil {
        klog.Fatalf("Failed to start HTTPS server: %v", err)
    }
}
This code sets up a complete, albeit simple, webhook. The key takeaway is the strict handling of the AdmissionReview
object and ensuring the response UID matches the request UID. Failure to do so will cause the API server to reject the webhook's response.
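Because the policy logic is a plain function, it can also be unit-tested without a cluster. A minimal sketch, assuming it lives next to the code above in package main; the sample manifest, helper, and test name are illustrative:
main_test.go
package main

import (
    "encoding/json"
    "testing"

    admissionv1 "k8s.io/api/admission/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/types"
)

// newDeploymentReview wraps a raw Deployment manifest in an AdmissionReview,
// mimicking what the API server sends to /validate.
func newDeploymentReview(t *testing.T, manifest map[string]interface{}) *admissionv1.AdmissionReview {
    t.Helper()
    raw, err := json.Marshal(manifest)
    if err != nil {
        t.Fatalf("marshal manifest: %v", err)
    }
    return &admissionv1.AdmissionReview{
        Request: &admissionv1.AdmissionRequest{
            UID:    types.UID("test-uid"),
            Kind:   metav1.GroupVersionKind{Group: "apps", Version: "v1", Kind: "Deployment"},
            Object: runtime.RawExtension{Raw: raw},
        },
    }
}

func TestDenyDeploymentWithoutTeamLabel(t *testing.T) {
    review := newDeploymentReview(t, map[string]interface{}{
        "apiVersion": "apps/v1",
        "kind":       "Deployment",
        "metadata":   map[string]interface{}{"name": "demo", "namespace": "default"},
        "spec": map[string]interface{}{
            "template": map[string]interface{}{
                "metadata": map[string]interface{}{"labels": map[string]interface{}{}},
                "spec": map[string]interface{}{
                    "containers": []interface{}{
                        map[string]interface{}{"name": "app"},
                    },
                },
            },
        },
    })
    if resp := validateDeployment(review); resp.Allowed {
        t.Fatal("expected denial for a Deployment without a 'team' label")
    }
}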
Production-Grade Deployment and TLS
A webhook that isn't running is worse than no webhook at all, as it can bring down your entire CI/CD pipeline if the failurePolicy
is set to Fail
. Security is also paramount, as the webhook receives sensitive information about cluster state changes.
Dockerizing the Webhook
We use a multi-stage Docker build to create a minimal, secure container image.
Dockerfile
# --- Build Stage ---
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Build the binary with optimizations
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o gitops-validator .
# --- Final Stage ---
FROM alpine:latest
WORKDIR /app
# Copy the binary from the builder stage
COPY --from=builder /app/gitops-validator .
# Non-root user for security
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
# The server will listen on port 8443
EXPOSE 8443
ENTRYPOINT ["/app/gitops-validator"]
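Build and push the image with the usual commands; the registry path is a placeholder that matches the image reference used in the Deployment manifest later in this article:
docker build -t your-registry/gitops-validator:latest .
docker push your-registry/gitops-validator:latest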
Automated Certificate Management with `cert-manager`
Hard-coding certificates is a non-starter in production. The Kubernetes API server must trust the TLS certificate presented by the webhook. We will use cert-manager
to automate this entire process.
1. Install cert-manager: If not already present in your cluster, install it.
2. Create a self-signed Issuer:
issuer.yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
  namespace: gitops-validator
spec:
  selfSigned: {}
3. Create a Certificate: The Certificate resource tells cert-manager to generate a key/cert pair and store it in a Secret. The dnsNames must match the internal Service name of our webhook.
certificate.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: gitops-validator-cert
  namespace: gitops-validator
spec:
  secretName: gitops-validator-tls
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  dnsNames:
    - gitops-validator-svc.gitops-validator.svc
    - gitops-validator-svc.gitops-validator.svc.cluster.local
  issuerRef:
    name: selfsigned-issuer
    kind: Issuer
cert-manager will now create a secret named gitops-validator-tls containing tls.crt, tls.key, and ca.crt.
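Before wiring up the webhook, you can confirm the certificate machinery worked (namespace and names as defined above):
kubectl get certificate -n gitops-validator gitops-validator-cert
kubectl get secret -n gitops-validator gitops-validator-tls
The Certificate should report Ready=True and the Secret should exist.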
Kubernetes Manifests for the Webhook
Now we tie everything together with a Deployment, Service, and ValidatingWebhookConfiguration.
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitops-validator
  namespace: gitops-validator
  labels:
    app: gitops-validator
spec:
  replicas: 2 # For High Availability
  selector:
    matchLabels:
      app: gitops-validator
  template:
    metadata:
      labels:
        app: gitops-validator
    spec:
      containers:
        - name: server
          image: your-registry/gitops-validator:latest
          ports:
            - containerPort: 8443
              name: webhook-tls
          volumeMounts:
            - name: tls-certs
              mountPath: /etc/webhook/certs
              readOnly: true
      volumes:
        - name: tls-certs
          secret:
            secretName: gitops-validator-tls
Note the volume mount, which projects the cert-manager-created secret into the path our Go application expects.
service.yaml
apiVersion: v1
kind: Service
metadata:
  name: gitops-validator-svc
  namespace: gitops-validator
spec:
  selector:
    app: gitops-validator
  ports:
    - port: 443
      targetPort: webhook-tls
We map port 443
on the service to our container's 8443
port. This is standard practice.
webhook-configuration.yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: gitops-policy-validator
  annotations:
    # This annotation tells cert-manager to inject the CA bundle from our secret
    cert-manager.io/inject-ca-from: "gitops-validator/gitops-validator-cert"
webhooks:
  - name: validator.your-domain.com
    clientConfig:
      # The caBundle will be automatically populated by cert-manager
      service:
        namespace: gitops-validator
        name: gitops-validator-svc
        path: "/validate"
        port: 443
    rules:
      - operations: ["CREATE", "UPDATE"]
        apiGroups: ["apps"]
        apiVersions: ["v1"]
        resources: ["deployments"]
    # Exclude our own namespace and kube-system to prevent deadlocks
    namespaceSelector:
      matchExpressions:
        - key: control-plane
          operator: DoesNotExist
    sideEffects: None
    admissionReviewVersions: ["v1"]
    # CRITICAL: What happens if the webhook is down?
    # 'Fail' blocks API requests. 'Ignore' bypasses the webhook.
    # Use 'Fail' for critical security policies.
    failurePolicy: Fail
    # How long the API server will wait for a response.
    # Keep this low to minimize latency impact.
    timeoutSeconds: 5
The cert-manager.io/inject-ca-from
annotation is the magic that solves the caBundle
problem. cert-manager
will watch this resource and automatically patch the caBundle
field with the CA from our generated certificate, establishing the chain of trust.
Critically, the namespaceSelector
prevents the webhook from validating its own deployment or critical system components, avoiding a catastrophic circular dependency where the webhook can't be deployed because it needs to be validated by itself.
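Note that DoesNotExist only excludes namespaces that actually carry a control-plane label, so the webhook's own namespace (and kube-system) must be labeled accordingly. Shown imperatively here for brevity; in a GitOps setup these labels would live in the namespace manifests:
kubectl label namespace gitops-validator control-plane=true
kubectl label namespace kube-system control-plane=true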
Edge Cases and Performance Considerations
Latency Impact
Every Deployment CREATE or UPDATE request in a namespace matched by the webhook now incurs a round-trip network hop to your webhook pod. This adds latency. A timeoutSeconds of 5 is a reasonable starting point, but your webhook logic must be highly efficient.
* Avoid External Calls: Do not make blocking calls to external databases or APIs within your validation logic. If you must, use aggressive caching and short timeouts.
* Monitor Performance: Expose Prometheus metrics from your webhook server. Track the duration of validation requests (http_request_duration_seconds); see the instrumentation sketch after this list. Set up alerts if the p95 or p99 latency exceeds a threshold (e.g., 200ms).
* Resource Allocation: Ensure the webhook deployment has adequate CPU and memory requests/limits to handle the load of API server requests. Under-provisioning will lead to throttling and increased latency.
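A minimal instrumentation sketch using prometheus/client_golang follows; the helper names, the separate plain-HTTP :9090 metrics port, and the wiring shown in the trailing comment are assumptions rather than part of the server built earlier:
package main

import (
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// requestDuration records how long each admission request takes to process.
var requestDuration = promauto.NewHistogramVec(
    prometheus.HistogramOpts{
        Name:    "http_request_duration_seconds",
        Help:    "Duration of admission webhook requests.",
        Buckets: prometheus.DefBuckets,
    },
    []string{"path"},
)

// instrument wraps an HTTP handler and observes its duration.
func instrument(path string, next http.HandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        next(w, r)
        requestDuration.WithLabelValues(path).Observe(time.Since(start).Seconds())
    }
}

// serveMetrics exposes /metrics on a separate plain-HTTP listener.
func serveMetrics(addr string) error {
    mux := http.NewServeMux()
    mux.Handle("/metrics", promhttp.Handler())
    return http.ListenAndServe(addr, mux)
}

// In main(), the wiring would look roughly like this:
//   http.HandleFunc("/validate", instrument("/validate", handleValidate))
//   go func() { _ = serveMetrics(":9090") }()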
High Availability
Running a single replica of the webhook is a single point of failure. If that pod crashes and your failurePolicy
is Fail
, you can no longer create or update Deployments. Always run at least two replicas (replicas: 2
) spread across different nodes using pod anti-affinity.
# In the Deployment spec.template.spec
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - gitops-validator
          topologyKey: "kubernetes.io/hostname"
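Anti-affinity spreads the replicas across nodes, but voluntary disruptions (node drains, cluster upgrades) can still evict both at once. A PodDisruptionBudget is a reasonable companion; a sketch, with an illustrative resource name:
pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: gitops-validator
  namespace: gitops-validator
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: gitops-validator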
Debugging a Failing Webhook
When a webhook misbehaves, it can be opaque. Here's a debugging checklist:
1. Check the webhook logs:
kubectl logs -n gitops-validator -l app=gitops-validator
2. Check cert-manager status: Ensure the Certificate and Issuer are in a Ready state.
kubectl get certificate -n gitops-validator gitops-validator-cert
3. Inspect the caBundle: Verify that cert-manager has correctly injected the CA into the ValidatingWebhookConfiguration.
kubectl get validatingwebhookconfiguration gitops-policy-validator -o yaml
The caBundle field should be a large base64-encoded string.
4. Test the webhook directly: Save a sample AdmissionReview JSON to a file and use curl to POST it to your webhook Service from within the cluster.
# From a debug pod inside the cluster
curl -k -X POST -H "Content-Type: application/json" --data @review.json https://gitops-validator-svc.gitops-validator.svc/validate
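If you don't have a captured request handy, a stripped-down review.json like the following is enough to exercise the handler; the UID and the embedded Deployment are illustrative, and this particular request should be denied because it lacks the team label and resource limits:
review.json
{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "request": {
    "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
    "kind": {"group": "apps", "version": "v1", "kind": "Deployment"},
    "resource": {"group": "apps", "version": "v1", "resource": "deployments"},
    "namespace": "default",
    "operation": "CREATE",
    "object": {
      "apiVersion": "apps/v1",
      "kind": "Deployment",
      "metadata": {"name": "demo", "namespace": "default"},
      "spec": {
        "selector": {"matchLabels": {"app": "demo"}},
        "template": {
          "metadata": {"labels": {"app": "demo"}},
          "spec": {"containers": [{"name": "app", "image": "nginx"}]}
        }
      }
    }
  }
}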
Handling Object Versions and Updates
Our current code only handles v1 Deployment objects. In a real-world scenario, you might need to handle different versions or even different kinds of objects (StatefulSet, DaemonSet, etc.). The universalDeserializer can handle this, but your logic must be robust. For UPDATE operations, the AdmissionRequest contains both object (the new state) and oldObject (the state before the change), allowing for complex validation, such as preventing immutable fields from being changed.
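As a sketch of what that could look like, the helper below (added alongside the existing code in main.go, reusing universalDeserializer and admissionResponse) rejects UPDATEs that change the team label once it has been set; the immutability rule itself is an invented example, not one of the two policies above:
// validateImmutableTeamLabel compares the old and new pod-template 'team' labels
// on UPDATE and denies the request if an existing label is being changed.
func validateImmutableTeamLabel(req *admissionv1.AdmissionRequest, newDep *appsv1.Deployment) *admissionv1.AdmissionResponse {
    // Only UPDATE requests carry a meaningful oldObject.
    if req.Operation != admissionv1.Update {
        return nil
    }
    oldDep := appsv1.Deployment{}
    if _, _, err := universalDeserializer.Decode(req.OldObject.Raw, nil, &oldDep); err != nil {
        return admissionResponse(false, fmt.Sprintf("Could not deserialize old deployment object: %v", err))
    }
    oldTeam := oldDep.Spec.Template.ObjectMeta.Labels["team"]
    newTeam := newDep.Spec.Template.ObjectMeta.Labels["team"]
    if oldTeam != "" && oldTeam != newTeam {
        return admissionResponse(false, fmt.Sprintf("Validation failed: the 'team' label is immutable once set (was %q, got %q)", oldTeam, newTeam))
    }
    // nil means this particular check passed.
    return nil
}
validateDeployment would call it after decoding the new object and return the response whenever it is non-nil.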
Conclusion: Programmatic Guardrails for GitOps
While declarative policy engines like OPA/Gatekeeper are excellent for many use cases, a custom dynamic admission controller provides the ultimate escape hatch for complex, programmatic policy enforcement. It integrates seamlessly into a GitOps workflow, acting as a real-time, synchronous guardrail that prevents non-compliant configuration from ever reaching the cluster's desired state.
By building a robust Go service, automating TLS with cert-manager, and carefully configuring the ValidatingWebhookConfiguration, you can create a powerful enforcement point that scales across a fleet of clusters. The engineering discipline required—focusing on performance, high availability, and debuggability—is what elevates this from a simple webhook to a critical piece of production infrastructure, ensuring that your declarative GitOps environment remains not only consistent but also compliant and secure.