Kubernetes VAP: Fine-Grained Pod Security with Advanced CEL

October 18, 2025

Goh Ling Yong

Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Post-PSP Era: Why VAP is More Than Just a Replacement

For engineers who have managed Kubernetes clusters for any length of time, the deprecation and eventual removal of PodSecurityPolicy (PSP) left a significant gap in native security governance. While Pod Security Standards (PSS) provide a fixed set of three profiles (privileged, baseline, restricted), they lack the granular, customizable control that production environments demand. This is where Validating Admission Policies (VAP) step in: alpha in Kubernetes 1.26, beta in 1.28, and GA in 1.30.

VAP isn't merely a direct replacement for PSP; it's a fundamental shift towards a more declarative, expressive, and integrated policy-as-code model within the Kubernetes API server itself. It leverages the Common Expression Language (CEL) to execute validation logic directly at the point of admission, without requiring external webhooks or third-party policy engines like OPA/Gatekeeper or Kyverno for a large class of use cases.

This article assumes you understand the basics of admission control and the limitations of PSP/PSS. We will not cover introductory concepts. Instead, we will dive directly into crafting sophisticated, production-ready policies with VAP and CEL, focusing on patterns you would implement to secure a multi-tenant, enterprise-grade cluster.

Production Pattern 1: Enforcing Strict Container Image Provenance

A foundational security posture for any cluster is controlling the origin and tagging of container images. A naive policy might just check for a private registry prefix. A production-grade policy must be far more robust, handling multiple container types, enforcing semantic versioning, and providing clear failure messages.

The Problem:

Disallow all images from public registries like Docker Hub.

Enforce usage of a specific set of internal, trusted registries (e.g., artifactory.my-company.com or gcr.io/my-project).

Prohibit the use of floating tags like :latest or :stable to ensure declarative, deterministic deployments.

The policy must apply to standard containers, init containers, and ephemeral containers.

The Advanced CEL Implementation

We'll use CEL's list comprehensions (all()) and regular expressions (matches()) to build a comprehensive policy. The logic will iterate through every container definition in a pod spec and apply our rules.

Here is the complete ValidatingAdmissionPolicy manifest:

yaml

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "prod-image-provenance-policy"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   [""]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
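      # Ephemeral containers are added through a pod subresource, so it must be matched explicitly.
      resources:   ["pods", "pods/ephemeralcontainers"]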
  validations:
    - expression: >
        !has(object.spec.initContainers) || object.spec.initContainers.all(container,
          container.image.startsWith('artifactory.my-company.com/') &&
          !container.image.endsWith(':latest') &&
          container.image.matches('^artifactory.my-company.com/[-a-zA-Z0-9/]+:v(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$')
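        )
      message: "Image policy violation: All images must come from 'artifactory.my-company.com', use a valid semantic version tag (e.g., v1.2.3), and not use ':latest'."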
    - expression: >
        object.spec.containers.all(container,
          container.image.startsWith('artifactory.my-company.com/') &&
          !container.image.endsWith(':latest') &&
          container.image.matches('^artifactory.my-company.com/[-a-zA-Z0-9/]+:v(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$')
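        )
      message: "Image policy violation: All images must come from 'artifactory.my-company.com', use a valid semantic version tag (e.g., v1.2.3), and not use ':latest'."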
    - expression: >
        !has(object.spec.ephemeralContainers) || object.spec.ephemeralContainers.all(container,
          container.image.startsWith('artifactory.my-company.com/') &&
          !container.image.endsWith(':latest') &&
          container.image.matches('^artifactory.my-company.com/[-a-zA-Z0-9/]+:v(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$')
        )
      message: "Image policy violation: All images must come from 'artifactory.my-company.com', use a valid semantic version tag (e.g., v1.2.3), and not use ':latest'."

And the corresponding ValidatingAdmissionPolicyBinding to apply it cluster-wide:

yaml

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "prod-image-provenance-policy-binding"
spec:
  policyName: "prod-image-provenance-policy"
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values: ["kube-system", "gatekeeper-system"]

Dissecting the Advanced Logic

Handling Optional Lists: Notice the !has(object.spec.initContainers) || ... pattern. This is crucial. If a pod has no initContainers, the field is absent from the object, and accessing an absent field raises a CEL evaluation error, which failurePolicy: Fail turns into a rejection. The has() guard short-circuits the check, so the policy doesn't fail on pods without init or ephemeral containers.

Iterating with all(): The all() macro is a powerful tool for enforcing a rule across every element in a list. The expression list.all(variable, expression) returns true only if the expression is true for every variable in the list.

Complex Regex for SemVer: The regex

^artifactory.my-company.com/[-a-zA-Z0-9/]+:v(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$

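is the regular expression suggested by the Semantic Versioning 2.0.0 specification, prefixed with the registry path and a leading v. The doubled backslashes are CEL string escapes, so the regex engine sees a single backslash. This is far more precise than simply disallowing :latest.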

Multiple Validation Expressions: Instead of one giant, unreadable CEL expression, we've broken down the validation for each container type into its own expression. VAP evaluates all expressions in the validations list, and the request is denied if any of them fail.

Scoped Binding: The ValidatingAdmissionPolicyBinding uses a namespaceSelector to avoid applying this strict policy to system namespaces like kube-system, which is a critical production practice.
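The three near-identical expressions can also be collapsed with spec.variables and a messageExpression that names the offending images. The following is a minimal sketch, not a drop-in replacement: the policy name is hypothetical, and the tag regex is deliberately simplified to vMAJOR.MINOR.PATCH rather than the full SemVer pattern above.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "prod-image-provenance-policy-compact"   # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   [""]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["pods", "pods/ephemeralcontainers"]
  variables:
    # Every image in the pod, across all three container lists.
    - name: allImages
      expression: >
        ((has(object.spec.initContainers) ? object.spec.initContainers : []) +
         object.spec.containers +
         (has(object.spec.ephemeralContainers) ? object.spec.ephemeralContainers : [])
        ).map(c, c.image)
    # Images that fail the registry + tag check (simplified regex, an assumption).
    - name: badImages
      expression: >
        variables.allImages.filter(i,
          !i.matches('^artifactory\\.my-company\\.com/[-a-zA-Z0-9/]+:v[0-9]+\\.[0-9]+\\.[0-9]+$'))
  validations:
    - expression: "size(variables.badImages) == 0"
      messageExpression: "'Image policy violation, non-compliant images: ' + variables.badImages.join(', ')"
```

The trade-off is a single combined failure message instead of one per container type, in exchange for writing the regex once.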

Testing the Policy

Let's try to apply a non-compliant pod:

yaml

# bad-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: bad-image-pod
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx:latest # Violates registry and tag policy

Applying this manifest is rejected by the API server. The exact prefix varies with the Kubernetes version and the validation's reason, but the denial names the policy and its binding rather than a webhook:

$ kubectl apply -f bad-pod.yaml
... ValidatingAdmissionPolicy 'prod-image-provenance-policy' with binding 'prod-image-provenance-policy-binding' denied request: Image policy violation: All images must come from 'artifactory.my-company.com', use a valid semantic version tag (e.g., v1.2.3), and not use ':latest'.

This immediate, clear feedback is invaluable for developers and CI/CD systems.
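For contrast, here is a pod that should pass the same policy. The repository path platform/nginx and the tag are made up for illustration; any image under the trusted registry with a v-prefixed semantic version tag would do.

```yaml
# good-pod.yaml (hypothetical image path, for illustration only)
apiVersion: v1
kind: Pod
metadata:
  name: good-image-pod
  namespace: default
spec:
  containers:
  - name: nginx
    image: artifactory.my-company.com/platform/nginx:v1.25.3 # Trusted registry, pinned SemVer tag
```

Running kubectl apply --dry-run=server -f good-pod.yaml sends the request through admission without persisting the object, which is a convenient way to check both manifests in CI.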

Production Pattern 2: Parameterized Policies with `paramKind`

The previous policy is effective but rigid. The trusted registry name is hardcoded. What if you have multiple trusted registries? Or want different teams to use different policies? Hardcoding values in policies leads to policy sprawl and maintenance nightmares.

VAP's paramKind feature solves this by allowing policies to reference external configuration objects, typically a ConfigMap or a Custom Resource (CRD). This enables true policy reusability.

The Problem:

We need a single, generic image policy.
Different namespaces or teams require different sets of allowed registries.
The policy logic should be decoupled from its configuration.

The Implementation: CRD, Policy, and Params

Step 1: Define a CRD for our parameters.

A CRD provides a typed, schema-validated way to manage configuration.

yaml

# imagepolicyconfig-crd.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: imagepolicyconfigs.policy.my-company.com
spec:
  group: policy.my-company.com
  names:
    kind: ImagePolicyConfig
    listKind: ImagePolicyConfigList
    plural: imagepolicyconfigs
    singular: imagepolicyconfig
  scope: Namespaced
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              allowedRegistries:
                type: array
                items:
                  type: string
              disallowedTags:
                type: array
                items:
                  type: string
            required:
            - allowedRegistries
    served: true
    storage: true

Step 2: Create the parameterized ValidatingAdmissionPolicy.

This policy now references the ImagePolicyConfig CRD via paramKind and uses the params object in its CEL expressions.

yaml

# parameterized-image-policy.yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "parameterized-image-provenance-policy"
spec:
  failurePolicy: Fail
  paramKind:
    apiVersion: policy.my-company.com/v1alpha1
    kind: ImagePolicyConfig
  matchConstraints:
    resourceRules:
    - apiGroups:   [""]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
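      # Ephemeral containers are added through a pod subresource, so it must be matched explicitly.
      resources:   ["pods", "pods/ephemeralcontainers"]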
  variables:
    - name: allContainers
      expression: >
        (has(object.spec.initContainers) ? object.spec.initContainers : []) +
        object.spec.containers +
        (has(object.spec.ephemeralContainers) ? object.spec.ephemeralContainers : [])
  validations:
    - expression: >
        variables.allContainers.all(container,
          params.spec.allowedRegistries.exists(registry, container.image.startsWith(registry)) &&
          (!has(params.spec.disallowedTags) ||
           !params.spec.disallowedTags.exists(tag, container.image.endsWith(tag)))
        )
      message: "Image does not conform to the configured policy for this namespace."

Dissecting the CEL:

paramKind: This block tells VAP that this policy expects configuration from an ImagePolicyConfig resource.

params Object: The params variable is now available in CEL. It holds the entire custom resource object that the binding points to.

Composition with variables: CEL has no let statement, so the policy defines allContainers under spec.variables and references it as variables.allContainers. The variable concatenates all three container lists, safely handling pods where initContainers or ephemeralContainers are absent by using the ternary operator (condition ? true_val : false_val). Because disallowedTags is optional in the CRD schema, the tag check is guarded with has() as well.

exists() Macro: Instead of hardcoding the registry, we use params.spec.allowedRegistries.exists(registry, container.image.startsWith(registry)). This checks if the container's image starts with any of the strings in the allowedRegistries list from our config object.

Step 3: Create a configuration instance and bind it.

Now, a team managing the app-team-1 namespace can define their own specific configuration.

yaml

# team-1-config.yaml
apiVersion: policy.my-company.com/v1alpha1
kind: ImagePolicyConfig
metadata:
  name: team-1-image-rules
  namespace: app-team-1
spec:
  allowedRegistries:
    - "artifactory.my-company.com/team-1/"
    - "gcr.io/my-project/team-1-images/"
  disallowedTags:
    - ":latest"
    - ":unstable"

Finally, the binding connects the generic policy to the specific configuration for a target namespace.

yaml

# team-1-binding.yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "team-1-image-policy-binding"
spec:
  policyName: "parameterized-image-provenance-policy"
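  paramRef:
    name: team-1-image-rules
    namespace: app-team-1
    # Required alongside paramRef: reject requests if the config object is missing.
    parameterNotFoundAction: Deny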
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: app-team-1

This pattern is exceptionally powerful. The platform team maintains a library of generic, parameterized policies, and application teams self-service their own configuration via type-safe custom resources. This drastically reduces policy duplication and operational burden.
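To make the reuse concrete, onboarding a second tenant needs only a new config object and a new binding; the policy itself is untouched. This is a sketch with hypothetical names (app-team-2, team-2-image-rules) that mirror the team-1 example.

```yaml
# team-2-config.yaml (hypothetical second tenant)
apiVersion: policy.my-company.com/v1alpha1
kind: ImagePolicyConfig
metadata:
  name: team-2-image-rules
  namespace: app-team-2
spec:
  allowedRegistries:
    - "artifactory.my-company.com/team-2/"
  disallowedTags:
    - ":latest"
---
# team-2-binding.yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "team-2-image-policy-binding"
spec:
  policyName: "parameterized-image-provenance-policy"   # same generic policy as team 1
  paramRef:
    name: team-2-image-rules
    namespace: app-team-2
    parameterNotFoundAction: Deny   # fail closed if the config object is deleted
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: app-team-2
```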

Edge Cases and Performance Optimization

Senior engineers know that correctness is only half the battle; performance and handling edge cases are paramount, especially in a critical path like the API server's admission controller.

`matchConditions` for Pre-filtering: A Critical Performance Win

The CEL expressions in the validations block are executed for every request that matches the matchConstraints, which can be computationally expensive. matchConditions, a field of the policy spec that reached GA together with VAP in Kubernetes 1.30, pre-filters requests using CEL before the main validation logic is invoked.

Scenario: Imagine a policy that should only apply to pods that have a specific annotation security.my-company.com/scan-required: "true".

Inefficient Approach (validation expression only):

yaml

  validations:
    - expression: >
        !has(object.metadata.annotations) ||
        !('security.my-company.com/scan-required' in object.metadata.annotations) ||
        object.metadata.annotations['security.my-company.com/scan-required'] != 'true' ||
        (has(object.spec.containers) && object.spec.containers.all(c, has(c.securityContext) && has(c.securityContext.readOnlyRootFilesystem) && c.securityContext.readOnlyRootFilesystem == true))
      message: "Scanned pods must have a read-only root filesystem."

Here, every pod creation/update that matches the policy evaluates the validation expression, even for pods without the annotation. CEL's short-circuiting skips the securityContext check in that case, but the scoping logic is tangled into the rule itself, and every additional validation in the policy has to repeat the same guard.

Efficient Approach (using matchConditions):

yaml

  matchConditions:
    - name: 'check-for-scan-annotation'
      expression: 'has(object.metadata.annotations) && "security.my-company.com/scan-required" in object.metadata.annotations && object.metadata.annotations["security.my-company.com/scan-required"] == "true"'
  validations:
    - expression: >
        object.spec.containers.all(c, has(c.securityContext) && has(c.securityContext.readOnlyRootFilesystem) && c.securityContext.readOnlyRootFilesystem == true)
      message: "Scanned pods must have a read-only root filesystem."

With this structure, the validations are only evaluated if every matchConditions expression evaluates to true; otherwise the policy is skipped for that request. matchConditions see the same request data as validations (object, oldObject, request, and params when a paramKind is set), but they cannot reference spec.variables because they run before the rest of the policy. That makes them ideal for filtering based on object metadata.

Rule of Thumb: Use matchConditions to define the scope of your policy based on immutable or metadata fields. Use the validations block to enforce the logic of the policy on the objects within that scope.
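Put together, a complete policy following this rule of thumb might look like the sketch below. The policy name is hypothetical, and the extra condition that skips kubelet-originated requests is an illustrative assumption, not something the scenario requires.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "scanned-pods-readonly-rootfs"   # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   [""]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["pods"]
  matchConditions:
    # Scope: skip requests made by kubelets (illustrative assumption).
    - name: 'exclude-node-requests'
      expression: '!("system:nodes" in request.userInfo.groups)'
    # Scope: only pods that opted in via the annotation.
    - name: 'check-for-scan-annotation'
      expression: >
        has(object.metadata.annotations) &&
        "security.my-company.com/scan-required" in object.metadata.annotations &&
        object.metadata.annotations["security.my-company.com/scan-required"] == "true"
  validations:
    # Logic: every container must set readOnlyRootFilesystem to true.
    - expression: >
        object.spec.containers.all(c,
          has(c.securityContext) &&
          has(c.securityContext.readOnlyRootFilesystem) &&
          c.securityContext.readOnlyRootFilesystem)
      message: "Scanned pods must have a read-only root filesystem."
```

All matchConditions must evaluate to true for the validations to run, so each condition narrows the scope further.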

CEL Cost Budget and Expression Complexity

To prevent denial-of-service attacks or runaway expressions from crippling the API server, CEL has a computational "cost budget." Each operation (a function call, a regex match, a list traversal) has an assigned cost. If an expression's evaluation exceeds the budget, it fails.

While the exact budget is an implementation detail, you should be mindful of expression complexity:

Avoid Nested Loops: An expression like list1.all(x, list2.all(y, x == y)) is quadratic in complexity and can easily exhaust the cost budget on large lists.

Prefer exists() over filtering and size checks: list.filter(x, x.matches('...')).size() > 0 is less efficient than list.exists(x, x.matches('...')) because exists can stop as soon as it finds a match.

Compile-time vs. Run-time costs: Regex matching contributes to the cost, and a long pattern applied to every container adds up. If you use the same complex regex multiple times, consider whether your logic can be restructured so it is evaluated once.
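One way to apply these guidelines is to compute shared intermediate values once under spec.variables, which are evaluated lazily and memoized per request, and to use exists() or all() rather than filter(...).size(). The fragment below is a sketch of a policy spec, with messages and the registry name reused from the earlier example.

```yaml
# Fragment of a ValidatingAdmissionPolicy spec (sketch)
  variables:
    # Built once per request and reused by both validations below.
    - name: images
      expression: "object.spec.containers.map(c, c.image)"
  validations:
    # exists() can stop at the first ':latest' image instead of building a filtered list.
    - expression: "!variables.images.exists(i, i.endsWith(':latest'))"
      message: "Images must not use the ':latest' tag."
    # A single linear pass; no nested iteration over a second list.
    - expression: "variables.images.all(i, i.startsWith('artifactory.my-company.com/'))"
      message: "Images must come from 'artifactory.my-company.com'."
```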

Handling `oldObject` in UPDATE Operations

When handling UPDATE operations, your CEL expression has access to oldObject, the object's state before the change. (oldSelf is the equivalent in CRD validation rules; it does not exist in VAP.) This is critical for policies that enforce immutability.

Problem: Prevent developers from changing the team-owner label on a Deployment once it's set.

yaml

# in a ValidatingAdmissionPolicy targeting Deployments
  validations:
    - expression: >
        request.operation != 'UPDATE' ||
        !has(oldObject.metadata.labels) ||
        !('team-owner' in oldObject.metadata.labels) ||
        (has(object.metadata.labels) &&
         'team-owner' in object.metadata.labels &&
         object.metadata.labels['team-owner'] == oldObject.metadata.labels['team-owner'])
      message: "The 'team-owner' label is immutable and cannot be changed after creation."

This expression safely handles the initial creation (where oldObject is null, so the request.operation guard passes first) and subsequent updates, ensuring the label, if it exists, remains unchanged. Map keys are tested with the in operator because has() only accepts field selections, not index expressions.
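An alternative is to push the scoping out of the validation entirely. The sketch below matches only UPDATE requests, where the pre-update state (exposed to VAP as oldObject) is always present, and uses a matchCondition so the rule runs only when the stored object already carries the label. The policy name is hypothetical.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "immutable-team-owner-label"   # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["UPDATE"]          # oldObject is always set for UPDATE
      resources:   ["deployments"]
  matchConditions:
    # Only evaluate when the stored object already has the label.
    - name: 'label-already-set'
      expression: >
        has(oldObject.metadata.labels) &&
        'team-owner' in oldObject.metadata.labels
  validations:
    - expression: >
        has(object.metadata.labels) &&
        'team-owner' in object.metadata.labels &&
        object.metadata.labels['team-owner'] == oldObject.metadata.labels['team-owner']
      # Include the expected value so the developer knows what to restore.
      messageExpression: >
        "The 'team-owner' label is immutable; it must stay '" +
        oldObject.metadata.labels['team-owner'] + "'."
```

With the scope handled by matchConstraints and matchConditions, the validation reduces to the single comparison that expresses the rule.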

VAP vs. OPA/Gatekeeper vs. Kyverno: A Senior Engineer's Heuristic

With VAP now generally available, the inevitable question is: "When should I use VAP, and when do I still need a full-fledged policy engine?"

Here's a decision framework:

Feature/Requirement	Validating Admission Policies (VAP)	OPA/Gatekeeper, Kyverno
Operational Overhead	Very Low. Native to API server. No new pods.	Medium. Requires installing/managing controllers.
Policy Language	CEL. Familiar to developers.	Rego (OPA) or YAML-based DSL (Kyverno). Steeper curve.
Policy Type	Validation only. No mutation.	Validation and Mutation (e.g., adding default labels).
External Data	No. Cannot call out to external systems.	Yes. Can query external APIs, other k8s objects, etc.
Policy Complexity	Ideal for low-to-high complexity logic.	Unbounded. Can handle extremely complex logic.
Audit / Dry-Run	`validationActions: [Audit]` sends to audit log.	Full-featured dry-run modes and audit dashboards.
Use Case Sweet Spot	Enforcing schema, labels, image sources, security contexts, resource limits.	Cross-object validation, policies requiring external state, complex mutations.
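
The Audit / Dry-Run row can be made concrete with a second binding for the image policy from Pattern 1. This is a sketch with a hypothetical binding name; it reports violations without blocking anything, which is one way to stage a rollout before switching to Deny.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "prod-image-provenance-policy-audit"   # hypothetical name
spec:
  policyName: "prod-image-provenance-policy"
  # Warn returns a warning to the client; Audit records the failure in the
  # API server audit log. Neither action rejects the request.
  validationActions: [Warn, Audit]
```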

Heuristic:

Start with VAP. For the vast majority of common security and governance policies (image provenance, label enforcement, disallowing hostPath, requiring resource limits, enforcing securityContext settings), VAP is sufficient, more performant, and has zero operational overhead. You can likely cover 80% of your needs with it.

Reach for Gatekeeper/Kyverno when you need:

* Mutation: You need to automatically add a sidecar.istio.io/inject: "true" annotation to new pods. VAP cannot do this.

* External State: Your policy needs to check if a new Ingress hostname is already registered in an external DNS provider. VAP cannot make external calls.

* Cross-Object State: You need a policy that ensures a ServiceMonitor's labels match the labels on the Service it's targeting. VAP evaluates one request at a time and sees only that request's object, oldObject, and params.

By embracing VAP as the default, you keep your cluster lean and leverage a powerful, native capability. You can then strategically deploy a more heavyweight policy engine only when its unique features are truly required.

Conclusion: VAP as a First-Class Citizen in Cluster Governance

Validating Admission Policies are a game-changer for native Kubernetes security. By moving beyond simple examples and mastering advanced CEL constructs like list comprehensions, paramKind for reusability, and performance tuning with matchConditions, platform and security engineers can build a robust, efficient, and maintainable policy-as-code framework. VAP is not just a PSP replacement; it is a powerful tool that should be the first choice for implementing validation-based governance in any modern Kubernetes cluster.
