Building a K8s Dynamic Admission Controller in Go for Policy Enforcement
The Governance Gap: Why RBAC Isn't Enough
In any mature Kubernetes environment, the native Role-Based Access Control (RBAC) system is the bedrock of security, dictating who can perform what actions on which resources. However, its scope is limited to authorizing actions, not validating the content of the resources being created or updated. This leaves a significant governance gap. For instance, RBAC can't enforce policies such as:
- Deployment resources must include an owner label for cost allocation and accountability.
- Container images must be pulled from an approved internal registry (gcr.io/my-company/*).
- Ingress objects must not use wildcard hosts (*.example.com) in production namespaces.
- Custom Resource Definitions (CRDs) must conform to a specific internal schema beyond basic OpenAPI validation.
With the deprecation of PodSecurityPolicy (PSP), the responsibility for enforcing such fine-grained, business-logic-driven policies shifts squarely to admission controllers. While frameworks like OPA/Gatekeeper and Kyverno offer powerful policy-as-code solutions, they introduce their own abstractions (like the Rego language) and performance characteristics. For ultimate control, performance, and integration with internal systems, building a custom Dynamic Admission Controller provides an unparalleled solution.
This post is a deep dive for platform engineers and SREs on building, deploying, and operating a production-ready validating admission webhook in Go. We will skip the basics and focus on the architecture, implementation details, and operational realities of running such a critical component in your control plane.
Kubernetes API Request Lifecycle: The Admission Controller's Role
Before we write a single line of Go, it's critical to understand precisely where our webhook fits into the Kubernetes API server's request flow. When a client (like kubectl) sends a request to create or update a resource, it passes through several stages:
1. Authentication and authorization: the API server establishes who the caller is and whether RBAC permits the action.
2. Mutating admission: MutatingAdmissionWebhooks may modify the object (for example, injecting sidecars or defaults).
3. Object schema validation: the API server checks that the object is structurally valid (e.g., the Deployment has a valid spec).
4. Validating admission: ValidatingAdmissionWebhooks accept or reject the (possibly mutated) object without changing it.
5. Persistence: the object is written to etcd.
Our focus is on the ValidatingAdmissionWebhook. By intercepting requests at this stage, we can enforce complex invariants without altering the user's original intent, providing clear, immediate feedback on policy violations.
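For reference, the payload the API server POSTs to the webhook is an AdmissionReview object. The abridged example below (field values are illustrative and the embedded object is truncated) shows the shape our handler will decode later; note the uid, which the webhook must echo back in its response.

```json
{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "request": {
    "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
    "kind": {"group": "apps", "version": "v1", "kind": "Deployment"},
    "resource": {"group": "apps", "version": "v1", "resource": "deployments"},
    "namespace": "team-a",
    "operation": "CREATE",
    "userInfo": {"username": "jane@example.com", "groups": ["system:authenticated"]},
    "object": {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "my-app"}},
    "dryRun": false
  }
}
```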
The Anatomy of a `ValidatingWebhookConfiguration`
The entire mechanism is orchestrated by a ValidatingWebhookConfiguration resource. This object tells the API server how and when to call our webhook. Let's dissect a production-grade example:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: my-company.policy-enforcer.webhook
webhooks:
- name: policy-enforcer.my-company.com
  rules:
  - apiGroups: ["apps"]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["deployments"]
    scope: "Namespaced"
  clientConfig:
    service:
      namespace: policy-enforcer
      name: policy-enforcer-webhook-svc
      path: "/validate/deployment"
      port: 443
    caBundle: "LS0tLS1CR...=" # Base64-encoded CA certificate
  admissionReviewVersions: ["v1"]
  sideEffects: None
  timeoutSeconds: 3
  failurePolicy: Fail
  namespaceSelector:
    matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: NotIn
      values: [kube-system, policy-enforcer]
Key fields for senior engineers:
- rules: This is a critical performance and reliability lever. Be as specific as possible. Don't listen for * operations or resources unless absolutely necessary. Every matching request adds latency and a potential point of failure.
- clientConfig.service: Points to the Service that routes traffic to our webhook pods. The API server initiates this connection.
- caBundle: The PEM-encoded CA certificate that signed the webhook server's certificate. The API server uses this to verify the identity of our webhook. This is non-negotiable for security. We'll discuss managing this with cert-manager later.
- sideEffects: Must be None for validating webhooks. This is a guarantee to the API server that your webhook has no side effects on other resources.
- timeoutSeconds: A low timeout (e.g., 1-3 seconds) is crucial. A slow webhook can cripple your cluster's control plane. If your validation logic requires external calls, it must be extremely fast and reliable.
- failurePolicy: This is arguably the most important operational decision.
  - Fail: If the webhook is unreachable or times out, the API request fails (fail-closed). This guarantees policy enforcement but risks control plane availability if your webhook deployment fails.
  - Ignore: If the webhook is unreachable, the API request is allowed (fail-open). This prioritizes availability but allows temporary policy bypasses. The choice depends on the criticality of the policy being enforced.
- namespaceSelector: An essential tool for avoiding chaos. It prevents the webhook from acting on system namespaces or even its own namespace, which could lead to a deadlocked cluster where you can't fix a broken webhook because the webhook itself is blocking the fix.
Building the Webhook Server in Go
Let's implement the server that will enforce two policies on Deployment objects:
1. Every Deployment must carry a non-empty owner label.
2. Every container image must come from the gcr.io/my-company registry.
Project Setup
Initialize a Go module and fetch the necessary Kubernetes API libraries.
go mod init github.com/my-company/policy-enforcer
go get k8s.io/api
go get k8s.io/apimachinery
The Core HTTP Handler Logic
The webhook server is a standard Go net/http server. The key is correctly decoding the AdmissionReview request from the API server and encoding an AdmissionReview response.
Here is a complete, production-ready main.go.
// main.go
package main

import (
    "encoding/json"
    "fmt"
    "io"
    "log"
    "net/http"
    "strings"

    admissionv1 "k8s.io/api/admission/v1"
    appsv1 "k8s.io/api/apps/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/runtime/serializer"
)

var (
    universalDeserializer = serializer.NewCodecFactory(runtime.NewScheme()).UniversalDeserializer()
)

// WebhookServer is the main server struct
type WebhookServer struct {
    Server *http.Server
}

// handleValidate is the main handler for the webhook
func (ws *WebhookServer) handleValidate(w http.ResponseWriter, r *http.Request) {
    // 1. Read and validate request body
    body, err := io.ReadAll(r.Body)
    if err != nil {
        log.Printf("Error reading request body: %v", err)
        w.WriteHeader(http.StatusBadRequest)
        return
    }

    // 2. Decode the AdmissionReview request
    admissionReviewReq := admissionv1.AdmissionReview{}
    if _, _, err := universalDeserializer.Decode(body, nil, &admissionReviewReq); err != nil {
        log.Printf("Error decoding admission review: %v", err)
        w.WriteHeader(http.StatusBadRequest)
        fmt.Fprintf(w, "error decoding admission review: %v", err)
        return
    }
    if admissionReviewReq.Request == nil {
        log.Print("Received AdmissionReview with an empty request")
        w.WriteHeader(http.StatusBadRequest)
        return
    }

    // 3. Construct the AdmissionReview response
    admissionReviewResp := admissionv1.AdmissionReview{
        TypeMeta: metav1.TypeMeta{
            Kind:       "AdmissionReview",
            APIVersion: "admission.k8s.io/v1",
        },
        Response: &admissionv1.AdmissionResponse{
            // The response must echo the request UID
            UID: admissionReviewReq.Request.UID,
        },
    }

    // 4. Apply validation logic
    allowed, reason, err := validateDeployment(admissionReviewReq.Request)
    if err != nil {
        admissionReviewResp.Response.Allowed = false
        admissionReviewResp.Response.Result = &metav1.Status{
            Message: err.Error(),
            Code:    http.StatusInternalServerError,
        }
    } else {
        admissionReviewResp.Response.Allowed = allowed
        if !allowed {
            admissionReviewResp.Response.Result = &metav1.Status{
                Message: reason,
                Code:    http.StatusForbidden,
            }
        }
    }

    // 5. Send the response
    respBytes, err := json.Marshal(admissionReviewResp)
    if err != nil {
        log.Printf("Error marshalling response: %v", err)
        w.WriteHeader(http.StatusInternalServerError)
        return
    }
    w.Header().Set("Content-Type", "application/json")
    w.Write(respBytes)
}

// validateDeployment contains the core policy logic
func validateDeployment(req *admissionv1.AdmissionRequest) (bool, string, error) {
    // We only care about Deployment objects
    if req.Kind.Kind != "Deployment" {
        return true, "", nil // Allow other resources
    }

    deployment := appsv1.Deployment{}
    if _, _, err := universalDeserializer.Decode(req.Object.Raw, nil, &deployment); err != nil {
        return false, "", fmt.Errorf("could not deserialize deployment object: %v", err)
    }

    // Policy 1: Check for 'owner' label
    if owner, ok := deployment.Labels["owner"]; !ok || owner == "" {
        return false, "Deployment must have a non-empty 'owner' label", nil
    }

    // Policy 2: Check container image registry
    allowedRegistry := "gcr.io/my-company"
    for _, container := range deployment.Spec.Template.Spec.Containers {
        if !strings.HasPrefix(container.Image, allowedRegistry) {
            msg := fmt.Sprintf("Invalid container image registry for image '%s'. Only images from '%s' are allowed.", container.Image, allowedRegistry)
            return false, msg, nil
        }
    }

    return true, "", nil
}

func main() {
    // Paths to TLS certificate and key (mounted from the cert-manager Secret)
    certPath := "/etc/webhook/certs/tls.crt"
    keyPath := "/etc/webhook/certs/tls.key"

    mux := http.NewServeMux()
    ws := &WebhookServer{}
    mux.HandleFunc("/validate/deployment", ws.handleValidate)
    // Health check endpoint used by the Deployment's readiness probe
    mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
    })

    ws.Server = &http.Server{
        Addr:    ":8443",
        Handler: mux,
    }

    log.Println("Starting webhook server on :8443")
    if err := ws.Server.ListenAndServeTLS(certPath, keyPath); err != nil {
        log.Fatalf("Failed to start server: %v", err)
    }
}
This implementation is robust: it correctly handles JSON serialization, separates the HTTP handling from the validation logic, and provides clear, actionable error messages back to the user via the AdmissionResponse.Result.Message field.
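Because the policy logic is a plain function, it can be unit tested without a cluster. Here is a minimal sketch of such a test; the file name, test case, and fixture values are illustrative rather than part of the project above.

```go
// validate_test.go
package main

import (
    "encoding/json"
    "testing"

    admissionv1 "k8s.io/api/admission/v1"
    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
)

func TestValidateDeploymentRejectsMissingOwner(t *testing.T) {
    // A Deployment with a compliant image but no 'owner' label
    dep := appsv1.Deployment{
        TypeMeta:   metav1.TypeMeta{APIVersion: "apps/v1", Kind: "Deployment"},
        ObjectMeta: metav1.ObjectMeta{Name: "demo"},
        Spec: appsv1.DeploymentSpec{
            Template: corev1.PodTemplateSpec{
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{{Name: "app", Image: "gcr.io/my-company/app:v1"}},
                },
            },
        },
    }
    raw, err := json.Marshal(dep)
    if err != nil {
        t.Fatalf("failed to marshal deployment: %v", err)
    }

    req := &admissionv1.AdmissionRequest{
        Kind:   metav1.GroupVersionKind{Group: "apps", Version: "v1", Kind: "Deployment"},
        Object: runtime.RawExtension{Raw: raw},
    }

    allowed, reason, err := validateDeployment(req)
    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
    if allowed {
        t.Fatalf("expected rejection for missing owner label, got allowed (reason=%q)", reason)
    }
}
```

A mirror-image case with the owner label set (and a compliant image) should assert that the request is allowed.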
Production-Grade TLS with `cert-manager`
Hard-coding certificates or using openssl scripts for generation is not a viable production strategy. Certificates expire, and manual rotation is error-prone. This is a solved problem in Kubernetes using cert-manager.
cert-manager will automatically issue a certificate from a CA (e.g., Let's Encrypt, Vault, or a self-signed issuer for internal services), keep it renewed, and store it in a Secret. Crucially, it can also automatically inject the CA bundle into our ValidatingWebhookConfiguration, completing the trust chain.
Here’s how to set it up:
1. Install cert-manager: Follow the official installation guide. Typically kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/vX.Y.Z/cert-manager.yaml.
2. Create an Issuer: For internal services, a self-signed Issuer is appropriate. This will act as our internal Certificate Authority.
# issuer.yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: self-signed-issuer
  namespace: policy-enforcer
spec:
  selfSigned: {}
3. Create a Certificate: This resource requests a certificate from the Issuer. cert-manager will fulfill this request and store the result in a Secret named policy-enforcer-tls.
# certificate.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: policy-enforcer-cert
  namespace: policy-enforcer
spec:
  secretName: policy-enforcer-tls # The secret that will be created
  dnsNames:
  - policy-enforcer-webhook-svc.policy-enforcer.svc
  - policy-enforcer-webhook-svc.policy-enforcer.svc.cluster.local
  issuerRef:
    name: self-signed-issuer
    kind: Issuer
The dnsNames are critical. The Kubernetes API server reaches our webhook via its internal service DNS name, so those names must appear in the certificate's Subject Alternative Names (SANs); modern TLS clients, including the API server, no longer fall back to the Common Name (CN).
4. Automate caBundle Injection: This is the magic. Instead of manually populating caBundle in our ValidatingWebhookConfiguration, we add an annotation:
# validating-webhook-configuration.yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: my-company.policy-enforcer.webhook
  annotations:
    cert-manager.io/inject-ca-from: "policy-enforcer/policy-enforcer-cert"
webhooks:
- name: policy-enforcer.my-company.com
  # ... rest of the configuration
  clientConfig:
    service:
      namespace: policy-enforcer
      name: policy-enforcer-webhook-svc
      path: "/validate/deployment"
      port: 443
    # caBundle is now managed by cert-manager!
The cert-manager-cainjector controller will watch for this annotation and automatically patch this resource with the correct CA from the Issuer.
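As a suggested spot-check (these commands are not part of the setup itself), you can inspect the SANs on the issued certificate and confirm the cainjector has populated caBundle:

```bash
# Check the SANs on the certificate cert-manager issued
kubectl -n policy-enforcer get secret policy-enforcer-tls -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"

# Confirm the injected caBundle (prints the first few characters)
kubectl get validatingwebhookconfiguration my-company.policy-enforcer.webhook \
  -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | head -c 40; echo
```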
Deployment Manifests
Now, let's tie it all together with the necessary Kubernetes manifests.
1. Dockerfile (Multi-stage build):
# Build stage
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Build the binary with optimizations
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o policy-enforcer .
# Final stage
FROM alpine:latest
WORKDIR /app
COPY --from=builder /app/policy-enforcer .
# Create a non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
EXPOSE 8443
CMD ["./policy-enforcer"]
2. Deployment and Service:
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: policy-enforcer-webhook
  namespace: policy-enforcer
  labels:
    app: policy-enforcer
spec:
  replicas: 2 # Start with HA in mind
  selector:
    matchLabels:
      app: policy-enforcer
  template:
    metadata:
      labels:
        app: policy-enforcer
    spec:
      containers:
      - name: webhook
        image: gcr.io/my-company/policy-enforcer:v1.0.0
        ports:
        - containerPort: 8443
          name: webhook-tls
        volumeMounts:
        - name: tls-certs
          mountPath: /etc/webhook/certs
          readOnly: true
        readinessProbe:
          httpGet:
            scheme: HTTPS
            path: /healthz # Served by the /healthz handler in main.go
            port: 8443
      volumes:
      - name: tls-certs
        secret:
          secretName: policy-enforcer-tls # Mount the secret created by cert-manager
---
apiVersion: v1
kind: Service
metadata:
  name: policy-enforcer-webhook-svc
  namespace: policy-enforcer
spec:
  selector:
    app: policy-enforcer
  ports:
  - port: 443
    targetPort: webhook-tls
Notice how the Deployment mounts the Secret (policy-enforcer-tls) created by cert-manager into the path (/etc/webhook/certs) that our Go application expects.
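Putting it together, one reasonable rollout order is sketched below (file names match the snippets above; the webhook configuration goes last so it never references a service that isn't serving yet):

```bash
kubectl create namespace policy-enforcer
kubectl apply -f issuer.yaml -f certificate.yaml
kubectl apply -f deployment.yaml   # Deployment and Service
kubectl apply -f validating-webhook-configuration.yaml
```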
Advanced Considerations and Edge Cases
Running this in production requires thinking beyond the happy path.
The `failurePolicy` Dilemma
Choosing between failurePolicy: Fail and failurePolicy: Ignore is a trade-off between security and availability.
- Fail for: Critical security policies where a bypass would be a major incident (e.g., blocking root containers, enforcing network policies). Your monitoring and alerting on the webhook's health must be flawless. If the webhook deployment fails, you could block kubectl, operators, and even system components from making changes.
- Ignore for: Softer governance policies (e.g., label enforcement, annotations). This ensures the cluster remains fully operational even if the webhook is down. You must have a secondary process (like a daily report) to detect non-compliant resources created during a webhook outage.
Performance and Latency
Your webhook is in the critical path of the control plane. Every millisecond counts.
- Avoid external network calls (databases, internal APIs) in the validation path. If the webhook needs policy data, cache it locally, for example in a ConfigMap or CRD, which the webhook can read from quickly.
- Keep per-request work small: decode once and validate only the handful of fields you actually need from the Deployment spec.
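To know whether you are staying inside the 1-3 second timeout budget, measure the handler. Below is a minimal sketch using only the standard library (the withTiming wrapper is an illustration, not part of the code above); in production you would more likely export a Prometheus histogram.

```go
// withTiming wraps a handler and logs how long each admission call takes.
// Requires the "time" package in addition to the imports in main.go.
func withTiming(next http.HandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        next(w, r)
        log.Printf("admission request %s handled in %s", r.URL.Path, time.Since(start))
    }
}

// In main(): mux.HandleFunc("/validate/deployment", withTiming(ws.handleValidate))
```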
A Robust Testing Strategy
How do you test a component that integrates so deeply with the API server?
- Unit tests: The core validation logic (the validateDeployment function) should be covered by standard Go unit tests. This is straightforward.
- Integration tests with envtest: The controller-runtime project provides the envtest package, which can spin up a local, temporary etcd and kube-apiserver binary for testing. This allows you to write integration tests that create a real ValidatingWebhookConfiguration and send actual resources to the test API server, which then calls your running webhook. This provides the highest fidelity testing outside of a real cluster.
Example test flow using envtest:
1. Start the envtest control plane.
2. Run your webhook server in a goroutine.
3. Register the ValidatingWebhookConfiguration in the test API server, pointing it at the local webhook endpoint.
4. Attempt to create a non-compliant Deployment.
5. Assert that the create call fails with the expected error message from your webhook.
6. Create a compliant Deployment and assert it succeeds.
Handling API Version Skew
What if a user submits a Deployment using apps/v1beta1 (in an older cluster) but your webhook is strongly typed to apps/v1? Your deserialization will fail.
The robust solution is to use unstructured.Unstructured from k8s.io/apimachinery/pkg/apis/meta/v1/unstructured. This allows you to work with the object as a map[string]interface{}, making your validation logic resilient to API version changes.
import "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
// ... inside validation function
var obj unstructured.Unstructured
if err := json.Unmarshal(req.Object.Raw, &obj); err != nil { /* ... */ }
// Access fields with helper functions
labels := obj.GetLabels()
if owner, ok := labels["owner"]; !ok || owner == "" {
return false, "Missing owner label", nil
}
// For nested fields, use the unstructured helpers
containers, found, err := unstructured.NestedSlice(obj.Object, "spec", "template", "spec", "containers")
// ... loop through containers and check image
This approach is more defensive and recommended for webhooks intended to run across a fleet of clusters with varying Kubernetes versions.
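Once the webhook is deployed, a quick smoke test from kubectl is worthwhile. Server-side dry run still goes through admission (our webhook declares sideEffects: None, so it is invoked), so the request below should be denied without creating anything:

```bash
# No 'owner' label and a non-approved image: the webhook should reject this
kubectl create deployment demo --image=nginx -n default --dry-run=server
```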
Conclusion: A Powerful, Precision Tool
Building a custom Dynamic Admission Controller in Go is a significant engineering effort, but it provides the ultimate power to enforce the specific, nuanced policies that define a well-governed and secure Kubernetes platform. It's a scalpel in a world of blunter instruments.
By focusing on a production-ready implementation with automated TLS via cert-manager, architecting for high availability, and critically evaluating failure modes and performance, you can build a component that becomes a reliable and indispensable part of your control plane. While tools like Gatekeeper have their place, understanding how to build a webhook from scratch gives you a deeper understanding of the Kubernetes API machinery and a powerful tool for those situations that demand a custom-fit solution.