Kubernetes VAP: Fine-Grained Pod Security with Advanced CEL
The Post-PSP Era: Why VAP is More Than Just a Replacement
For engineers who have managed Kubernetes clusters for any length of time, the deprecation and eventual removal of PodSecurityPolicy (PSP) left a significant gap in native security governance. While Pod Security Standards (PSS) provide three fixed profiles (privileged, baseline, restricted), they lack the granular, customizable control that production environments demand. This is where Validating Admission Policies (VAP) step in: the feature arrived as alpha in Kubernetes 1.26, was promoted to beta in 1.28, and reached GA in 1.30.
VAP isn't merely a direct replacement for PSP; it's a fundamental shift towards a more declarative, expressive, and integrated policy-as-code model within the Kubernetes API server itself. It leverages the Common Expression Language (CEL) to execute validation logic directly at the point of admission, without requiring external webhooks or third-party policy engines like OPA/Gatekeeper or Kyverno for a large class of use cases.
This article assumes you understand the basics of admission control and the limitations of PSP/PSS. We will not cover introductory concepts. Instead, we will dive directly into crafting sophisticated, production-ready policies with VAP and CEL, focusing on patterns you would implement to secure a multi-tenant, enterprise-grade cluster.
Production Pattern 1: Enforcing Strict Container Image Provenance
A foundational security posture for any cluster is controlling the origin and tagging of container images. A naive policy might just check for a private registry prefix. A production-grade policy must be far more robust, handling multiple container types, enforcing semantic versioning, and providing clear failure messages.
The Problem:
- Disallow all images from public registries like Docker Hub.
- Only allow images from a trusted, private registry (e.g., artifactory.my-company.com or gcr.io/my-project).
- Disallow mutable tags like :latest or :stable to ensure declarative, deterministic deployments.
- The policy must apply to standard containers, init containers, and ephemeral containers.
The Advanced CEL Implementation
We'll use CEL's list comprehensions (all()) and regular expressions (matches()) to build a comprehensive policy. The logic will iterate through every container definition in a pod spec and apply our rules.
Here is the complete ValidatingAdmissionPolicy manifest:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "prod-image-provenance-policy"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        # Ephemeral containers are added through a subresource, which must be listed explicitly.
        resources: ["pods", "pods/ephemeralcontainers"]
  validations:
    # Init containers (the field is absent when a pod defines none).
    - expression: >
        !has(object.spec.initContainers) || object.spec.initContainers.all(container,
          container.image.startsWith('artifactory.my-company.com/') &&
          !container.image.endsWith(':latest') &&
          container.image.matches('^artifactory\\.my-company\\.com/[-a-zA-Z0-9/]+:v(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$')
        )
      message: "Image policy violation: All images must come from 'artifactory.my-company.com', use a valid semantic version tag (e.g., v1.2.3), and not use ':latest'."
    # Regular containers (always present).
    - expression: >
        object.spec.containers.all(container,
          container.image.startsWith('artifactory.my-company.com/') &&
          !container.image.endsWith(':latest') &&
          container.image.matches('^artifactory\\.my-company\\.com/[-a-zA-Z0-9/]+:v(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$')
        )
      message: "Image policy violation: All images must come from 'artifactory.my-company.com', use a valid semantic version tag (e.g., v1.2.3), and not use ':latest'."
    # Ephemeral (debug) containers.
    - expression: >
        !has(object.spec.ephemeralContainers) || object.spec.ephemeralContainers.all(container,
          container.image.startsWith('artifactory.my-company.com/') &&
          !container.image.endsWith(':latest') &&
          container.image.matches('^artifactory\\.my-company\\.com/[-a-zA-Z0-9/]+:v(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$')
        )
      message: "Image policy violation: All images must come from 'artifactory.my-company.com', use a valid semantic version tag (e.g., v1.2.3), and not use ':latest'."
And the corresponding ValidatingAdmissionPolicyBinding to apply it cluster-wide:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "prod-image-provenance-policy-binding"
spec:
  policyName: "prod-image-provenance-policy"
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: NotIn
          values: ["kube-system", "gatekeeper-system"]
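The policy above repeats one rule for each container list. As a possible refactoring, assuming a cluster that serves the v1 VAP API (which includes spec.variables), the lists can be merged once and the rule written a single time. This is a sketch rather than a drop-in replacement: the policy name is hypothetical, and the tag pattern is shortened to plain major.minor.patch for readability.

```yaml
# Sketch only: a deduplicated variant of the image provenance policy.
# The name "prod-image-provenance-policy-compact" is hypothetical.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "prod-image-provenance-policy-compact"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods", "pods/ephemeralcontainers"]
  variables:
    # Merge all container lists once; absent lists are replaced by empty lists.
    - name: allContainers
      expression: >
        object.spec.containers +
        (has(object.spec.initContainers) ? object.spec.initContainers : []) +
        (has(object.spec.ephemeralContainers) ? object.spec.ephemeralContainers : [])
  validations:
    # Simplified tag pattern (vMAJOR.MINOR.PATCH only), anchored at both ends.
    - expression: >
        variables.allContainers.all(c,
          c.image.matches('^artifactory\\.my-company\\.com/[-a-zA-Z0-9/]+:v(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)$')
        )
      message: "All images must come from 'artifactory.my-company.com' and use a vMAJOR.MINOR.PATCH tag."
```

The trade-off is granularity: a single validation yields a single failure message, whereas separate validations can report which container type violated the rule.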
Dissecting the Advanced Logic
- Null safety: the !has(object.spec.initContainers) || ... pattern is crucial. If a pod has no init containers, the initContainers field is absent from the object, and calling .all() on it would cause an evaluation error. The has() guard short-circuits the logic, ensuring the policy doesn't fail on pods without init or ephemeral containers.
- List comprehension with all(): The all() macro is a powerful tool for enforcing a rule across every element in a list. The expression list.all(variable, expression) returns true only if the expression is true for every variable in the list.
- Regular expression: the tag portion of the pattern, v(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*) followed by optional pre-release and build-metadata groups, is the standard, albeit complex, regular expression for validating Semantic Versioning 2.0.0, prefixed with a literal v. This is far more precise than simply disallowing :latest.
- Multiple validations: each container type gets its own entry in the validations list, and the request is denied if any of them fail.
- Scoped binding: the ValidatingAdmissionPolicyBinding uses a namespaceSelector to avoid applying this strict policy to system namespaces like kube-system, which is a critical production practice.
Testing the Policy
Let's try to apply a non-compliant pod:
# bad-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: bad-image-pod
  namespace: default
spec:
  containers:
    - name: nginx
      image: nginx:latest # Violates registry and tag policy
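The API server rejects this manifest. VAP is evaluated in-process rather than by a webhook, so the denial names the policy and the binding. The exact prefix varies with the Kubernetes version and the configured reason, but the output looks roughly like this:
$ kubectl apply -f bad-pod.yaml
The pods "bad-image-pod" is invalid: : ValidatingAdmissionPolicy 'prod-image-provenance-policy' with binding 'prod-image-provenance-policy-binding' denied request: Image policy violation: All images must come from 'artifactory.my-company.com', use a valid semantic version tag (e.g., v1.2.3), and not use ':latest'.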
This immediate, clear feedback is invaluable for developers and CI/CD systems.
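A static message says which rule failed but not which image caused it. Where that matters, the messageExpression field can compute the text at admission time. The fragment below is a minimal sketch that checks only regular containers and only the registry prefix; it reports the first offending image.

```yaml
# Fragment of spec.validations in a ValidatingAdmissionPolicy (sketch).
validations:
  - expression: >
      object.spec.containers.all(c,
        c.image.startsWith('artifactory.my-company.com/'))
    # Static fallback, used if messageExpression fails to evaluate.
    message: "All images must come from 'artifactory.my-company.com'."
    # Evaluated only when the validation fails, so at least one
    # offending container exists and index [0] is safe here.
    messageExpression: >
      'Image from an untrusted registry: ' +
      object.spec.containers.filter(c,
        !c.image.startsWith('artifactory.my-company.com/'))[0].image
```

A messageExpression must evaluate to a string and counts toward the same cost budget as the validation itself, so it is worth keeping as simple as the static message allows.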
Production Pattern 2: Parameterized Policies with `paramKind`
The previous policy is effective but rigid. The trusted registry name is hardcoded. What if you have multiple trusted registries? Or want different teams to use different policies? Hardcoding values in policies leads to policy sprawl and maintenance nightmares.
VAP's paramKind feature solves this by allowing policies to reference external configuration objects, typically a ConfigMap or a Custom Resource (CRD). This enables true policy reusability.
The Problem:
- We need a single, generic image policy.
- Different namespaces or teams require different sets of allowed registries.
- The policy logic should be decoupled from its configuration.
The Implementation: CRD, Policy, and Params
Step 1: Define a CRD for our parameters.
A CRD provides a typed, schema-validated way to manage configuration.
# imagepolicyconfig-crd.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: imagepolicyconfigs.policy.my-company.com
spec:
  group: policy.my-company.com
  names:
    kind: ImagePolicyConfig
    listKind: ImagePolicyConfigList
    plural: imagepolicyconfigs
    singular: imagepolicyconfig
  scope: Namespaced
  versions:
    - name: v1alpha1
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                allowedRegistries:
                  type: array
                  items:
                    type: string
                disallowedTags:
                  type: array
                  items:
                    type: string
              required:
                - allowedRegistries
      served: true
      storage: true
Step 2: Create the parameterized ValidatingAdmissionPolicy.
This policy now references the ImagePolicyConfig CRD via paramKind and uses the params object in its CEL expressions.
# parameterized-image-policy.yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "parameterized-image-provenance-policy"
spec:
  failurePolicy: Fail
  paramKind:
    apiVersion: policy.my-company.com/v1alpha1
    kind: ImagePolicyConfig
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        # Ephemeral containers are added through a subresource.
        resources: ["pods", "pods/ephemeralcontainers"]
  # CEL has no 'let' statement; named sub-expressions are declared here.
  variables:
    - name: allContainers
      expression: >
        (has(object.spec.initContainers) ? object.spec.initContainers : []) +
        object.spec.containers +
        (has(object.spec.ephemeralContainers) ? object.spec.ephemeralContainers : [])
  validations:
    - expression: >
        variables.allContainers.all(container,
          params.spec.allowedRegistries.exists(registry, container.image.startsWith(registry)) &&
          (!has(params.spec.disallowedTags) ||
            !params.spec.disallowedTags.exists(tag, container.image.endsWith(tag)))
        )
      message: "Image does not conform to the configured policy for this namespace."
Dissecting the CEL:
- paramKind: This block tells VAP that this policy expects configuration from an ImagePolicyConfig resource.
- params object: The params variable is now available in CEL. It holds the entire custom resource object that the binding points to.
- Composition with variables: CEL itself has no let statement or statement separator. Instead, the policy declares allContainers under spec.variables and references it as variables.allContainers. The variable concatenates all three container lists, safely handling pods where initContainers or ephemeralContainers are absent by using the ternary operator (condition ? true_val : false_val).
- exists() macro: Instead of hardcoding the registry, we use params.spec.allowedRegistries.exists(registry, container.image.startsWith(registry)). This checks if the container's image starts with any of the strings in the allowedRegistries list from our config object.
- Optional parameters: disallowedTags is not required by the CRD schema, so the expression guards it with has() before iterating.
Step 3: Create a configuration instance and bind it.
Now, a team managing the app-team-1 namespace can define their own specific configuration.
# team-1-config.yaml
apiVersion: policy.my-company.com/v1alpha1
kind: ImagePolicyConfig
metadata:
  name: team-1-image-rules
  namespace: app-team-1
spec:
  allowedRegistries:
    - "artifactory.my-company.com/team-1/"
    - "gcr.io/my-project/team-1-images/"
  disallowedTags:
    - ":latest"
    - ":unstable"
Finally, the binding connects the generic policy to the specific configuration for a target namespace.
# team-1-binding.yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "team-1-image-policy-binding"
spec:
  policyName: "parameterized-image-provenance-policy"
  paramRef:
    name: team-1-image-rules
    namespace: app-team-1
    # Required in the v1 API: reject requests if the referenced config is missing.
    parameterNotFoundAction: Deny
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: app-team-1
This pattern is exceptionally powerful. The platform team maintains a library of generic, parameterized policies, and application teams self-service their own configuration via type-safe custom resources. This drastically reduces policy duplication and operational burden.
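One binding per team still means the platform team touches the cluster for every new namespace. A possible refinement, sketched below, relies on how paramRef resolves a namespaced paramKind: when paramRef.namespace is left unset, the parameter object is looked up in the namespace of the object under review. The resource name image-rules and the opt-in label are hypothetical conventions, not part of the setup above.

```yaml
# Sketch: a single binding for every opted-in namespace.
# Assumptions (hypothetical): each team creates an ImagePolicyConfig named
# "image-rules" in its own namespace, and opted-in namespaces carry the
# label policy.my-company.com/image-policy=enforced.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "per-namespace-image-policy-binding"
spec:
  policyName: "parameterized-image-provenance-policy"
  paramRef:
    name: image-rules
    # namespace is deliberately omitted: for a namespaced paramKind, the
    # parameter is resolved in the namespace of the pod being admitted.
    parameterNotFoundAction: Deny
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        policy.my-company.com/image-policy: enforced
```

With parameterNotFoundAction set to Deny, a labeled namespace that has no image-rules object rejects all pods, which fails closed but makes the order of onboarding steps matter: create the config first, then add the label.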
Edge Cases and Performance Optimization
Senior engineers know that correctness is only half the battle; performance and handling edge cases are paramount, especially in a critical path like the API server's admission controller.
`matchConditions` for Pre-filtering: A Critical Performance Win
The CEL expressions in the validations block are evaluated for every request that matches the matchConstraints, which adds work on a hot path. matchConditions (part of the VAP API that went GA in Kubernetes 1.30) provides a way to pre-filter requests using CEL before any validation is evaluated. For policies that apply only to a small subset of objects, this keeps admission cheap and the validation logic simple.
Scenario: Imagine a policy that should only apply to pods that have a specific annotation security.my-company.com/scan-required: "true".
Inefficient Approach (validation expression only):
validations:
  - expression: >
      !has(object.metadata.annotations) ||
      !('security.my-company.com/scan-required' in object.metadata.annotations) ||
      object.metadata.annotations['security.my-company.com/scan-required'] != 'true' ||
      object.spec.containers.all(c, has(c.securityContext) &&
        has(c.securityContext.readOnlyRootFilesystem) && c.securityContext.readOnlyRootFilesystem)
    message: "Scanned pods must have a read-only root filesystem."
Here, the scoping check and the enforcement logic share one expression. CEL's || short-circuits, so the securityContext checks are skipped for pods without the annotation, but the validation is still entered for every single pod creation/update, and every additional validation in the policy has to repeat the same guard. Note the explicit in check: indexing a map with a missing key is an evaluation error, not a false result.
Efficient Approach (using matchConditions):
matchConditions:
  - name: 'check-for-scan-annotation'
    expression: >
      has(object.metadata.annotations) &&
      'security.my-company.com/scan-required' in object.metadata.annotations &&
      object.metadata.annotations['security.my-company.com/scan-required'] == 'true'
validations:
  - expression: >
      object.spec.containers.all(c, has(c.securityContext) &&
        has(c.securityContext.readOnlyRootFilesystem) && c.securityContext.readOnlyRootFilesystem)
    message: "Scanned pods must have a read-only root filesystem."
With this structure, the validations are only evaluated if every matchConditions expression evaluates to true; otherwise the policy is skipped for that request. matchConditions can use the same object, oldObject, request, and params variables as validations, but not the named expressions declared under spec.variables, because match conditions are evaluated before the rest of the policy. A match condition that raises an error is handled according to failurePolicy, so map lookups need the same guards here as in validations.
Rule of Thumb: Use matchConditions to define the scope of your policy based on immutable or metadata fields. Use the validations block to enforce the logic of the policy on the objects within that scope.
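The fragments above omit the surrounding object. The following sketch shows where matchConditions sits in a complete policy and how several conditions combine: all of them must evaluate to true for the validations to run. The policy name is hypothetical, and the second condition (skipping requests made by nodes, a pattern shown in the Kubernetes admission documentation) is an illustrative assumption rather than part of the scenario.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "scanned-pods-readonly-rootfs"  # hypothetical name
spec:
  failurePolicy: Fail
  # Coarse, declarative scope: which resources and operations.
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
  # Fine-grained CEL scope: every condition must be true.
  matchConditions:
    - name: 'check-for-scan-annotation'
      expression: >
        has(object.metadata.annotations) &&
        'security.my-company.com/scan-required' in object.metadata.annotations &&
        object.metadata.annotations['security.my-company.com/scan-required'] == 'true'
    # Illustrative assumption: ignore pods submitted by kubelets.
    - name: 'exclude-node-requests'
      expression: '!("system:nodes" in request.userInfo.groups)'
  # Enforcement logic, evaluated only for requests in scope.
  validations:
    - expression: >
        object.spec.containers.all(c, has(c.securityContext) &&
          has(c.securityContext.readOnlyRootFilesystem) &&
          c.securityContext.readOnlyRootFilesystem)
      message: "Scanned pods must have a read-only root filesystem."
```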
CEL Cost Budget and Expression Complexity
To prevent denial-of-service attacks or runaway expressions from crippling the API server, CEL has a computational "cost budget." Each operation (a function call, a regex match, a list traversal) has an assigned cost. If an expression's evaluation exceeds the budget, it fails.
While the exact budget is an implementation detail, you should be mindful of expression complexity:
- Avoid nested iteration: list1.all(x, list2.all(y, x == y)) is quadratic in complexity and can easily exhaust the cost budget on large lists.
- Prefer exists() over filtering and size checks: list.filter(x, x.matches('...')).size() > 0 is less efficient than list.exists(x, x.matches('...')) because exists can stop as soon as it finds a match.
Handling `oldObject` in UPDATE Operations
When handling UPDATE operations, your CEL expression has access to oldObject, the object's state before the change. (oldSelf is the equivalent variable in CRD validation rules; it is not defined in VAP.) On CREATE requests, oldObject is null. This variable is critical for policies that enforce immutability.
Problem: Prevent developers from changing the team-owner label on a Deployment once it's set.
# in a ValidatingAdmissionPolicy targeting Deployments
validations:
  - expression: >
      request.operation == 'CREATE' ||
      !has(oldObject.metadata.labels) ||
      !('team-owner' in oldObject.metadata.labels) ||
      (has(object.metadata.labels) &&
        'team-owner' in object.metadata.labels &&
        object.metadata.labels['team-owner'] == oldObject.metadata.labels['team-owner'])
    message: "The 'team-owner' label is immutable and cannot be changed after creation."
This expression passes on creation (where there is no previous object) and on updates to Deployments that never had the label. Once the label exists, it must remain present and unchanged. Map keys are tested with the in operator, because has() only accepts field selections and indexing a missing key is an evaluation error.
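A minimal sketch of a complete policy around that fragment, assuming it only needs to run on updates. VAP exposes the pre-change object as oldObject, and restricting matchConstraints to UPDATE means the expression never sees a request without one. The policy name is hypothetical.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "immutable-team-owner-label"  # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        # UPDATE only: oldObject is always populated for these requests.
        operations: ["UPDATE"]
        resources: ["deployments"]
  validations:
    - expression: >
        !has(oldObject.metadata.labels) ||
        !('team-owner' in oldObject.metadata.labels) ||
        (has(object.metadata.labels) &&
          'team-owner' in object.metadata.labels &&
          object.metadata.labels['team-owner'] == oldObject.metadata.labels['team-owner'])
      message: "The 'team-owner' label is immutable and cannot be changed after creation."
```

Scoping by operation in matchConstraints rather than inside the expression keeps the CEL shorter and avoids evaluating the policy at all on CREATE.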
VAP vs. OPA/Gatekeeper vs. Kyverno: A Senior Engineer's Heuristic
With VAP now generally available, the inevitable question is: "When should I use VAP, and when do I still need a full-fledged policy engine?"
Here's a decision framework:
| Feature/Requirement | Validating Admission Policies (VAP) | OPA/Gatekeeper, Kyverno |
|---|---|---|
| Operational Overhead | Very Low. Native to API server. No new pods. | Medium. Requires installing/managing controllers. |
| Policy Language | CEL. Familiar to developers. | Rego (OPA) or YAML-based DSL (Kyverno). Steeper curve. |
| Policy Type | Validation only. No mutation. | Validation and Mutation (e.g., adding default labels). |
| External Data | No. Cannot call out to external systems. | Yes. Can query external APIs, other k8s objects, etc. |
| Policy Complexity | Ideal for low-to-medium complexity logic, bounded by the CEL cost budget. | Unbounded. Can handle extremely complex logic. |
| Audit / Dry-Run | validationActions: [Audit] sends to audit log. | Full-featured dry-run modes and audit dashboards. |
| Use Case Sweet Spot | Enforcing schema, labels, image sources, security contexts, resource limits. | Cross-object validation, policies requiring external state, complex mutations. |
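On the VAP side of the Audit / Dry-Run row, a common rollout approach is to bind a policy in a non-blocking mode first and switch to Deny once the findings are understood. The binding below is a sketch of that first stage; its name is hypothetical, and it reuses the policy name from Pattern 1.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "prod-image-provenance-policy-dry-run"  # hypothetical name
spec:
  policyName: "prod-image-provenance-policy"
  # Warn:  the request succeeds, and the client receives a warning.
  # Audit: the request succeeds, and the failure is recorded in the
  #        audit event for the request.
  # Deny and Warn cannot be combined in the same binding.
  validationActions: [Warn, Audit]
```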
Heuristic:
- Start with VAP. For validation that depends only on the object being admitted (e.g., disallowing hostPath, requiring resource limits, enforcing securityContext settings), VAP is sufficient, more performant because it avoids a webhook round trip, and adds no extra components to operate. You can likely cover 80% of your needs with it.
- Reach for OPA/Gatekeeper or Kyverno when you need:
  * Mutation: You need to automatically add a sidecar.istio.io/inject: "true" annotation to new pods. VAP cannot do this.
  * External State: Your policy needs to check if a new Ingress hostname is already registered in an external DNS provider. VAP cannot make external calls.
  * Cross-Object State: You need a policy that ensures a ServiceMonitor's labels match the labels on the Service it's targeting. VAP evaluates one request at a time and sees only that request's object and oldObject, the bound params, and the namespace of the object (namespaceObject).
By embracing VAP as the default, you keep your cluster lean and leverage a powerful, native capability. You can then strategically deploy a more heavyweight policy engine only when its unique features are truly required.
Conclusion: VAP as a First-Class Citizen in Cluster Governance
Validating Admission Policies are a game-changer for native Kubernetes security. By moving beyond simple examples and mastering advanced CEL constructs like list comprehensions, paramKind for reusability, and performance tuning with matchConditions, platform and security engineers can build a robust, efficient, and maintainable policy-as-code framework. VAP is not just a PSP replacement; it is a powerful tool that should be the first choice for implementing validation-based governance in any modern Kubernetes cluster.