Kubernetes VAP: Fine-Grained Pod Security with Advanced CEL

Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Post-PSP Era: Why VAP is More Than Just a Replacement

For engineers who have managed Kubernetes clusters for any length of time, the removal of PodSecurityPolicy (PSP), deprecated in 1.21 and removed in 1.25, left a significant gap in native security governance. While Pod Security Standards (PSS) provide a fixed set of profiles (privileged, baseline, restricted), they lack the granular, customizable control that production environments demand. This is where Validating Admission Policies (VAP) step in: the feature arrived as alpha in Kubernetes 1.26, moved to beta in 1.28, and became GA in 1.30.

VAP isn't merely a direct replacement for PSP; it's a fundamental shift towards a more declarative, expressive, and integrated policy-as-code model within the Kubernetes API server itself. It leverages the Common Expression Language (CEL) to execute validation logic directly at the point of admission, without requiring external webhooks or third-party policy engines like OPA/Gatekeeper or Kyverno for a large class of use cases.

This article assumes you understand the basics of admission control and the limitations of PSP/PSS. We will not cover introductory concepts. Instead, we will dive directly into crafting sophisticated, production-ready policies with VAP and CEL, focusing on patterns you would implement to secure a multi-tenant, enterprise-grade cluster.

Production Pattern 1: Enforcing Strict Container Image Provenance

A foundational security posture for any cluster is controlling the origin and tagging of container images. A naive policy might just check for a private registry prefix. A production-grade policy must be far more robust, handling multiple container types, enforcing semantic versioning, and providing clear failure messages.

The Problem:

  • Disallow all images from public registries like Docker Hub.
  • Enforce usage of a specific set of internal, trusted registries (e.g., artifactory.my-company.com or gcr.io/my-project).
  • Prohibit the use of floating tags like :latest or :stable to ensure declarative, deterministic deployments.
  • The policy must apply to standard containers, init containers, and ephemeral containers.

    The Advanced CEL Implementation

    We'll use CEL's list comprehensions (all()) and regular expressions (matches()) to build a comprehensive policy. The logic will iterate through every container definition in a pod spec and apply our rules.

    Here is the complete ValidatingAdmissionPolicy manifest:

    yaml
    apiVersion: admissionregistration.k8s.io/v1
    kind: ValidatingAdmissionPolicy
    metadata:
      name: "prod-image-provenance-policy"
    spec:
      failurePolicy: Fail
      matchConstraints:
        resourceRules:
        - apiGroups:   [""]
          apiVersions: ["v1"]
          operations:  ["CREATE", "UPDATE"]
          resources:   ["pods"]
      validations:
        # Each validation carries its own message; a validation without one
        # falls back to a generic "failed expression" error.
        - expression: >
            !has(object.spec.initContainers) || object.spec.initContainers.all(container,
              container.image.startsWith('artifactory.my-company.com/') &&
              !container.image.endsWith(':latest') &&
              container.image.matches('^artifactory.my-company.com/[-a-zA-Z0-9/]+:v(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$')
            )
          message: "Image policy violation (init containers): All images must come from 'artifactory.my-company.com', use a valid semantic version tag (e.g., v1.2.3), and not use ':latest'."
        - expression: >
            object.spec.containers.all(container,
              container.image.startsWith('artifactory.my-company.com/') &&
              !container.image.endsWith(':latest') &&
              container.image.matches('^artifactory.my-company.com/[-a-zA-Z0-9/]+:v(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$')
            )
          message: "Image policy violation: All images must come from 'artifactory.my-company.com', use a valid semantic version tag (e.g., v1.2.3), and not use ':latest'."
        - expression: >
            !has(object.spec.ephemeralContainers) || object.spec.ephemeralContainers.all(container,
              container.image.startsWith('artifactory.my-company.com/') &&
              !container.image.endsWith(':latest') &&
              container.image.matches('^artifactory.my-company.com/[-a-zA-Z0-9/]+:v(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$')
            )
          message: "Image policy violation (ephemeral containers): All images must come from 'artifactory.my-company.com', use a valid semantic version tag (e.g., v1.2.3), and not use ':latest'."

    And the corresponding ValidatingAdmissionPolicyBinding to apply it cluster-wide:

    yaml
    apiVersion: admissionregistration.k8s.io/v1
    kind: ValidatingAdmissionPolicyBinding
    metadata:
      name: "prod-image-provenance-policy-binding"
    spec:
      policyName: "prod-image-provenance-policy"
      validationActions: [Deny]
      matchResources:
        namespaceSelector:
          matchExpressions:
          - key: kubernetes.io/metadata.name
            operator: NotIn
            values: ["kube-system", "gatekeeper-system"]

    Dissecting the Advanced Logic

  • Handling Optional Lists: Notice the !has(object.spec.initContainers) || ... pattern. This is crucial. If a pod has no initContainers, the field is absent from the object, and accessing object.spec.initContainers in order to call .all() on it would cause an evaluation error (which, with failurePolicy: Fail, rejects the request). The has() check short-circuits the logic, ensuring the policy doesn't fail on pods without init or ephemeral containers.
  • Iterating with all(): The all() macro is a powerful tool for enforcing a rule across every element in a list. The expression list.all(variable, expression) returns true only if the expression is true for every variable in the list.
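  • Complex Regex for SemVer: The regex ^artifactory.my-company.com/[-a-zA-Z0-9/]+:v(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$ combines the registry host, a repository path, and a leading v with the standard, albeit complex, regular expression for validating Semantic Versioning 2.0.0. This is far more precise than simply disallowing :latest. Note that image tags cannot contain the + character, so the build-metadata branch at the end never matches a real image reference and can be dropped.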
  • Multiple Validation Expressions: Instead of one giant, unreadable CEL expression, we've broken down the validation for each container type into its own expression. VAP evaluates all expressions in the validations list, and the request is denied if any of them fail.
  • Scoped Binding: The ValidatingAdmissionPolicyBinding uses a namespaceSelector to avoid applying this strict policy to system namespaces like kube-system, which is a critical production practice.
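Clear failure messages were one of the stated goals, and a static message cannot say which image was rejected. The fragment below is a sketch of how a validation could build its message at evaluation time with messageExpression. It simplifies the rule to the registry check and reuses the article's registry prefix; treat it as an illustration rather than a drop-in replacement for the policy above.

```yaml
# Sketch: one entry for spec.validations that names the offending images.
validations:
  - expression: >
      object.spec.containers.all(c,
        c.image.startsWith('artifactory.my-company.com/'))
    # Static fallback, used if messageExpression fails to evaluate.
    message: "All images must come from 'artifactory.my-company.com'."
    # Evaluated only when the expression above is false; must yield a string.
    messageExpression: >
      'Untrusted image(s): ' +
      object.spec.containers
        .filter(c, !c.image.startsWith('artifactory.my-company.com/'))
        .map(c, c.image)
        .join(', ')
```

When both fields are present, messageExpression takes precedence, and message is what the user sees if the expression errors at runtime.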

    Testing the Policy

    Let's try to apply a non-compliant pod:

    yaml
    # bad-pod.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: bad-image-pod
      namespace: default
    spec:
      containers:
      - name: nginx
        image: nginx:latest # Violates registry and tag policy

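    Applying this manifest is rejected by the API server. Because VAP runs inside the API server rather than as a webhook, the denial names the policy and its binding. The exact wording varies by Kubernetes version, but it looks similar to this:

    sh
    $ kubectl apply -f bad-pod.yaml
    The pods "bad-image-pod" is invalid: : ValidatingAdmissionPolicy 'prod-image-provenance-policy' with binding 'prod-image-provenance-policy-binding' denied request: Image policy violation: All images must come from 'artifactory.my-company.com', use a valid semantic version tag (e.g., v1.2.3), and not use ':latest'.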

    This immediate, clear feedback is invaluable for developers and CI/CD systems.
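For comparison, here is a sketch of a pod that should pass the same policy. The file name, repository path (platform/nginx), and tag are illustrative assumptions; what matters is that the image starts with the trusted registry prefix and ends in a v-prefixed semantic version.

```yaml
# good-pod.yaml (hypothetical; repository path and tag are illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: good-image-pod
  namespace: default
spec:
  containers:
  - name: nginx
    # Trusted registry prefix + pinned semantic version tag.
    image: artifactory.my-company.com/platform/nginx:v1.25.3
```

Keeping one passing and one failing manifest next to the policy makes it easy to re-check behaviour after every policy change.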

    Production Pattern 2: Parameterized Policies with `paramKind`

    The previous policy is effective but rigid. The trusted registry name is hardcoded. What if you have multiple trusted registries? Or want different teams to use different policies? Hardcoding values in policies leads to policy sprawl and maintenance nightmares.

    VAP's paramKind feature solves this by allowing policies to reference external configuration objects, typically a ConfigMap or a Custom Resource (CRD). This enables true policy reusability.

    The Problem:

    • We need a single, generic image policy.
    • Different namespaces or teams require different sets of allowed registries.
    • The policy logic should be decoupled from its configuration.

    The Implementation: CRD, Policy, and Params

    Step 1: Define a CRD for our parameters.

    A CRD provides a typed, schema-validated way to manage configuration.

    yaml
    # imagepolicyconfig-crd.yaml
    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: imagepolicyconfigs.policy.my-company.com
    spec:
      group: policy.my-company.com
      names:
        kind: ImagePolicyConfig
        listKind: ImagePolicyConfigList
        plural: imagepolicyconfigs
        singular: imagepolicyconfig
      scope: Namespaced
      versions:
      - name: v1alpha1
        schema:
          openAPIV3Schema:
            type: object
            properties:
              spec:
                type: object
                properties:
                  allowedRegistries:
                    type: array
                    items:
                      type: string
                  disallowedTags:
                    type: array
                    items:
                      type: string
                required:
                - allowedRegistries
        served: true
        storage: true

    Step 2: Create the parameterized ValidatingAdmissionPolicy.

    This policy now references the ImagePolicyConfig CRD via paramKind and uses the params object in its CEL expressions.

    yaml
    # parameterized-image-policy.yaml
    apiVersion: admissionregistration.k8s.io/v1
    kind: ValidatingAdmissionPolicy
    metadata:
      name: "parameterized-image-provenance-policy"
    spec:
      failurePolicy: Fail
      paramKind:
        apiVersion: policy.my-company.com/v1alpha1
        kind: ImagePolicyConfig
      matchConstraints:
        resourceRules:
        - apiGroups:   [""]
          apiVersions: ["v1"]
          operations:  ["CREATE", "UPDATE"]
          resources:   ["pods"]
      variables:
        # Computed once per request and referenced as variables.allContainers.
        - name: allContainers
          expression: >
            (has(object.spec.initContainers) ? object.spec.initContainers : []) +
            object.spec.containers +
            (has(object.spec.ephemeralContainers) ? object.spec.ephemeralContainers : [])
      validations:
        - expression: >
            variables.allContainers.all(container,
              params.spec.allowedRegistries.exists(registry, container.image.startsWith(registry)) &&
              (!has(params.spec.disallowedTags) ||
                !params.spec.disallowedTags.exists(tag, container.image.endsWith(tag)))
            )
          message: "Image does not conform to the configured policy for this namespace."

    Dissecting the CEL:

  • paramKind: This block tells VAP that this policy expects configuration from an ImagePolicyConfig resource.
  • params Object: The params variable is now available in CEL. It holds the entire custom resource object that the binding points to.
  • Composition with variables: CEL has no let statement, so we declare allContainers under spec.variables and reference it as variables.allContainers. It concatenates all three container lists, safely handling pods where initContainers or ephemeralContainers are absent by using the conditional operator (condition ? true_val : false_val).
  • Optional Parameters: disallowedTags is not marked as required in the CRD schema, so the expression guards it with has() before iterating over it.
  • exists() Macro: Instead of hardcoding the registry, we use params.spec.allowedRegistries.exists(registry, container.image.startsWith(registry)). This checks if the container's image starts with any of the strings in the allowedRegistries list from our config object.

    Step 3: Create a configuration instance and bind it.

    Now, a team managing the app-team-1 namespace can define their own specific configuration.

    yaml
    # team-1-config.yaml
    apiVersion: policy.my-company.com/v1alpha1
    kind: ImagePolicyConfig
    metadata:
      name: team-1-image-rules
      namespace: app-team-1
    spec:
      allowedRegistries:
        - "artifactory.my-company.com/team-1/"
        - "gcr.io/my-project/team-1-images/"
      disallowedTags:
        - ":latest"
        - ":unstable"

    Finally, the binding connects the generic policy to the specific configuration for a target namespace.

    yaml
    # team-1-binding.yaml
    apiVersion: admissionregistration.k8s.io/v1
    kind: ValidatingAdmissionPolicyBinding
    metadata:
      name: "team-1-image-policy-binding"
    spec:
      policyName: "parameterized-image-provenance-policy"
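      paramRef:
        name: team-1-image-rules
        namespace: app-team-1
        # Required in v1: what to do if the referenced object does not exist.
        parameterNotFoundAction: Deny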
      validationActions: [Deny]
      matchResources:
        namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: app-team-1

    This pattern is exceptionally powerful. The platform team maintains a library of generic, parameterized policies, and application teams self-service their own configuration via type-safe custom resources. This drastically reduces policy duplication and operational burden.
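To illustrate the reuse, onboarding a second tenant only takes another parameter object and another binding; the policy itself is untouched. The sketch below assumes a hypothetical app-team-2 namespace and registry path, mirroring the team-1 example.

```yaml
# Hypothetical second tenant: same policy, different parameters.
apiVersion: policy.my-company.com/v1alpha1
kind: ImagePolicyConfig
metadata:
  name: team-2-image-rules
  namespace: app-team-2
spec:
  allowedRegistries:
    - "artifactory.my-company.com/team-2/"
  disallowedTags:
    - ":latest"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "team-2-image-policy-binding"
spec:
  policyName: "parameterized-image-provenance-policy"
  paramRef:
    name: team-2-image-rules
    namespace: app-team-2
    # Deny requests if the parameter object is missing.
    parameterNotFoundAction: Deny
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: app-team-2
```

Who may create or edit ImagePolicyConfig objects is then an ordinary RBAC question, which is worth settling before handing the configuration over to application teams.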

    Edge Cases and Performance Optimization

    Senior engineers know that correctness is only half the battle; performance and handling edge cases are paramount, especially in a critical path like the API server's admission controller.

    `matchConditions` for Pre-filtering: A Critical Performance Win

    The CEL expressions in the validations block are executed for every request that matches the matchConstraints. This can be computationally expensive. matchConditions (part of the VAP API since Kubernetes 1.27, and GA along with it in 1.30) provides a way to pre-filter requests using CEL before the main validation logic is ever invoked. This keeps expensive validation logic off the hot path for requests the policy does not care about.

    Scenario: Imagine a policy that should only apply to pods that have a specific annotation security.my-company.com/scan-required: "true".

    Inefficient Approach (validation expression only):

    yaml
      validations:
        - expression: >
            !has(object.metadata.annotations) ||
            !('security.my-company.com/scan-required' in object.metadata.annotations) ||
            object.metadata.annotations['security.my-company.com/scan-required'] != 'true' ||
            object.spec.containers.all(c,
              has(c.securityContext) &&
              has(c.securityContext.readOnlyRootFilesystem) &&
              c.securityContext.readOnlyRootFilesystem == true)
          message: "Scanned pods must have a read-only root filesystem."

    Here, the validation expression runs for every single pod creation/update, even for pods without the annotation, and the scoping checks are tangled up with the actual rule. The expression must first establish that the annotation exists (indexing a missing map key is an error) and that it has the right value before it reaches the securityContext logic.

    Efficient Approach (using matchConditions):

    yaml
      matchConditions:
        - name: 'check-for-scan-annotation'
          expression: >
            has(object.metadata.annotations) &&
            ('security.my-company.com/scan-required' in object.metadata.annotations) &&
            object.metadata.annotations['security.my-company.com/scan-required'] == 'true'
      validations:
        - expression: >
            object.spec.containers.all(c,
              has(c.securityContext) &&
              has(c.securityContext.readOnlyRootFilesystem) &&
              c.securityContext.readOnlyRootFilesystem == true)
          message: "Scanned pods must have a read-only root filesystem."

    With this structure, the main validation CEL is only executed if every matchConditions expression evaluates to true. The API server can discard irrelevant requests early, and the policy's scope is stated separately from its rule. Match conditions work best as cheap filters on the request and on object metadata.

    Rule of Thumb: Use matchConditions to define the scope of your policy based on immutable or metadata fields. Use the validations block to enforce the logic of the policy on the objects within that scope.
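As a sketch of that rule of thumb, several match conditions can be stacked, and all of them must hold before any validation runs. The tier label and its prod value below are hypothetical; the kubelet exclusion relies on the standard system:nodes group.

```yaml
# Sketch: every condition must evaluate to true for validations to run.
matchConditions:
  # Skip requests made by kubelets (for example, mirror pods).
  - name: 'exclude-kubelet-requests'
    expression: '!("system:nodes" in request.userInfo.groups)'
  # Only consider pods carrying a (hypothetical) tier label set to "prod".
  - name: 'prod-tier-only'
    expression: >
      has(object.metadata.labels) &&
      ('tier' in object.metadata.labels) &&
      object.metadata.labels['tier'] == 'prod'
```

Giving each condition a descriptive name pays off later, since the name is what identifies the condition when something goes wrong during evaluation.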

    CEL Cost Budget and Expression Complexity

    To prevent denial-of-service attacks or runaway expressions from crippling the API server, CEL has a computational "cost budget." Each operation (a function call, a regex match, a list traversal) has an assigned cost. If an expression's evaluation exceeds the budget, it fails.

    While the exact budget is an implementation detail, you should be mindful of expression complexity:

  • Avoid Nested Loops: An expression like list1.all(x, list2.all(y, x == y)) is quadratic in complexity and can easily exhaust the cost budget on large lists.
  • Prefer exists() over filtering and size checks: list.filter(x, x.matches('...')).size() > 0 is less efficient than list.exists(x, x.matches('...')) because exists can stop as soon as it finds a match.
  • Compile-time vs. Run-time costs: Regex compilation contributes to the cost. If you use the same complex regex multiple times, consider if your logic can be restructured.
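The exists() advice above can be sketched with a rule that rejects privileged containers. The rule itself is only an illustration; the point is the shape of the expression, with the costlier filter-and-count form left as a comment for comparison.

```yaml
# Sketch: "no container may be privileged", written two ways.
validations:
  # Costlier form, shown for comparison only: builds an intermediate list.
  #   object.spec.containers.filter(c, <privileged check>).size() == 0
  #
  # Preferred form: exists() can stop at the first privileged container.
  - expression: >
      !object.spec.containers.exists(c,
        has(c.securityContext) &&
        has(c.securityContext.privileged) &&
        c.securityContext.privileged == true)
    message: "Privileged containers are not allowed."
```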

    Handling `oldObject` in UPDATE Operations

    When handling UPDATE operations, your CEL expression has access to oldObject, the object's state before the change (it is null on CREATE). This is critical for policies that enforce immutability. Note that oldSelf is the equivalent variable in CRD validation rules (x-kubernetes-validations); it is not available in VAP.

    Problem: Prevent developers from changing the team-owner label on a Deployment once it's set.

    yaml
    # in a ValidatingAdmissionPolicy targeting Deployments
      validations:
        - expression: >
            oldObject == null ||
            !has(oldObject.metadata.labels) ||
            !('team-owner' in oldObject.metadata.labels) ||
            (has(object.metadata.labels) &&
              ('team-owner' in object.metadata.labels) &&
              object.metadata.labels['team-owner'] == oldObject.metadata.labels['team-owner'])
          message: "The 'team-owner' label is immutable and cannot be changed after creation."

    This expression safely handles the initial creation (where oldObject is null), Deployments that never had the label, and subsequent updates, ensuring the label, if it exists, remains unchanged. The membership tests use the in operator because has() only accepts field selections, not map index expressions such as labels['team-owner'].
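For context, a complete policy around this rule might look like the sketch below. It assumes the VAP variable oldObject for the previous state and restricts the policy to UPDATE, so the null check for CREATE is not needed; the policy name is hypothetical.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "immutable-team-owner-label"   # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["UPDATE"]          # oldObject is always set on UPDATE
      resources:   ["deployments"]
  validations:
    - expression: >
        !has(oldObject.metadata.labels) ||
        !('team-owner' in oldObject.metadata.labels) ||
        (has(object.metadata.labels) &&
          ('team-owner' in object.metadata.labels) &&
          object.metadata.labels['team-owner'] == oldObject.metadata.labels['team-owner'])
      message: "The 'team-owner' label is immutable and cannot be changed after creation."
```

As with the earlier policies, this has no effect until a ValidatingAdmissionPolicyBinding references it.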

    VAP vs. OPA/Gatekeeper vs. Kyverno: A Senior Engineer's Heuristic

    With VAP now generally available, the inevitable question is: "When should I use VAP, and when do I still need a full-fledged policy engine?"

    Here's a decision framework:

    | Feature/Requirement | Validating Admission Policies (VAP) | OPA/Gatekeeper, Kyverno |
    |---|---|---|
    | Operational Overhead | Very Low. Native to API server. No new pods. | Medium. Requires installing/managing controllers. |
    | Policy Language | CEL. Familiar to developers. | Rego (OPA) or YAML-based DSL (Kyverno). Steeper curve. |
    | Policy Type | Validation only. No mutation. | Validation and Mutation (e.g., adding default labels). |
    | External Data | No. Cannot call out to external systems. | Yes. Can query external APIs, other k8s objects, etc. |
    | Policy Complexity | Ideal for low-to-high complexity logic. | Unbounded. Can handle extremely complex logic. |
    | Audit / Dry-Run | validationActions: [Audit] sends to audit log. | Full-featured dry-run modes and audit dashboards. |
    | Use Case Sweet Spot | Enforcing schema, labels, image sources, security contexts, resource limits. | Cross-object validation, policies requiring external state, complex mutations. |
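The Audit / Dry-Run row deserves a quick illustration. A common way to introduce a policy is to bind it in a non-blocking mode first and switch to Deny once the violations have been cleaned up. The sketch below reuses the policy name from Pattern 1; the binding name is hypothetical.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "prod-image-provenance-policy-dry-run"   # hypothetical name
spec:
  policyName: "prod-image-provenance-policy"
  # Warn returns a warning to the client; Audit records the violation in the
  # API server audit event. Neither blocks the request.
  validationActions: [Warn, Audit]
```

Deny and Warn cannot be combined in one binding, so promoting the policy means replacing this list with [Deny], optionally keeping Audit alongside it.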

    Heuristic:

  • Start with VAP. For the vast majority of common security and governance policies (image provenance, label enforcement, disallowing hostPath, requiring resource limits, enforcing securityContext settings), VAP is sufficient, more performant, and has zero operational overhead. You can likely cover 80% of your needs with it.
  • Reach for Gatekeeper/Kyverno when you need:
    • Mutation: You need to automatically add a sidecar.istio.io/inject: "true" annotation to new pods. VAP cannot do this.
    • External State: Your policy needs to check if a new Ingress hostname is already registered in an external DNS provider. VAP cannot make external calls.
    • Cross-Object State: You need a policy that ensures a ServiceMonitor's labels match the labels on the Service it's targeting. VAP operates on a single object at a time (object, oldObject, params).

    By embracing VAP as the default, you keep your cluster lean and leverage a powerful, native capability. You can then strategically deploy a more heavyweight policy engine only when its unique features are truly required.
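As one more example of the kind of rule that fits comfortably in VAP, here is a sketch of a policy that disallows hostPath volumes, one of the common cases listed above. The policy name is hypothetical, and a binding is still required to put it into effect.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "disallow-hostpath-volumes"   # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   [""]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["pods"]
  validations:
    # Pods without volumes pass; otherwise no volume may define hostPath.
    - expression: >
        !has(object.spec.volumes) ||
        object.spec.volumes.all(v, !has(v.hostPath))
      message: "hostPath volumes are not allowed."
```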

    Conclusion: VAP as a First-Class Citizen in Cluster Governance

    Validating Admission Policies are a game-changer for native Kubernetes security. By moving beyond simple examples and mastering advanced CEL constructs like list comprehensions, paramKind for reusability, and performance tuning with matchConditions, platform and security engineers can build a robust, efficient, and maintainable policy-as-code framework. VAP is not just a PSP replacement; it is a powerful tool that should be the first choice for implementing validation-based governance in any modern Kubernetes cluster.
