Fleet-Wide GitOps: Taming Drift with ArgoCD ApplicationSets & Kustomize

Goh Ling Yong

The Inevitability of Drift in Large-Scale GitOps

As a senior engineer responsible for a Kubernetes platform, you understand the core value proposition of GitOps: Git as the single source of truth for your cluster's desired state. For a single application on a single cluster, this is a straightforward and powerful paradigm. But reality is rarely that simple. Your organization runs dozens of microservices, deployed across development, staging, and production environments, often spanning multiple geographic regions and cloud providers.

This immediately translates into a combinatorial explosion of configuration. Consider a modest setup: 30 microservices x 3 environments x 4 regions = 360 distinct application deployments. In a naive ArgoCD implementation, this could mean 360 manually created Application CRD manifests in your Git repository.

The problem with this approach isn't just the initial setup effort; it's the ongoing maintenance burden and the high probability of configuration drift. What happens when you need to update a universal annotation, change the syncPolicy for all production apps, or point every staging deployment to a new container registry? You are now faced with a high-risk, error-prone task of updating hundreds of YAML files. A missed file or a copy-paste error can lead to subtle, hard-to-diagnose inconsistencies between environments—the very definition of configuration drift.

This manual, repetitive approach violates the DRY (Don't Repeat Yourself) principle and simply does not scale. It creates operational fragility and turns your GitOps repository into a liability rather than an asset. To manage a fleet of applications effectively, we need a mechanism to abstract, template, and programmatically generate our Application resources. This is precisely the problem the ArgoCD ApplicationSet controller was designed to solve.

This article will dissect a production-proven pattern that combines the power of ApplicationSet generators with the surgical precision of Kustomize overlays to manage fleet-wide configuration declaratively, safely, and at scale.


The Anti-Pattern: Manual `Application` Manifest Proliferation

Before we dive into the solution, it's critical to understand why the most obvious approach fails. Let's visualize the problematic repository structure.

bash
# A repository structure destined for failure at scale
. 
└── argocd-apps/
    ├── dev/
    │   ├── service-a-app.yaml
    │   ├── service-b-app.yaml
    │   └── ... 50 more files
    ├── staging/
    │   ├── service-a-app.yaml
    │   ├── service-b-app.yaml
    │   └── ... 50 more files
    └── prod/
        ├── prod-us-east-1/
        │   ├── service-a-app.yaml
        │   └── ...
        └── prod-eu-west-1/
            ├── service-a-app.yaml
            └── ...

Each .yaml file is a slightly different variant of an ArgoCD Application manifest:

yaml
# argocd-apps/staging/service-a-app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: staging-service-a
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: 'https://github.com/my-org/service-a-config.git'
    targetRevision: staging
    path: .
  destination:
    server: 'https://1.2.3.4'
    namespace: service-a-staging
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true

The prod-us-east-1 version of this file might only differ by targetRevision, destination.server, and destination.namespace. While this works for a handful of applications, the scaling issues are severe:

  • Massive Duplication: 90% of the content in these files is boilerplate. A change to a common field, like adding a syncOption, requires a tedious and risky find-and-replace operation.
  • Inconsistent Configuration: It becomes nearly impossible to enforce standards. A developer might forget to add the resources-finalizer to a new application, or set selfHeal: false in production by mistake.
  • Onboarding Friction: Adding a new service or a new cluster requires manually creating and tailoring dozens of new files, increasing the cognitive load for development teams.
Some teams attempt to solve this by using Helm to template the Application resources themselves. While clever, this often becomes an anti-pattern, creating an awkward layer of abstraction and making it difficult to see the final rendered Application manifests in Git.

The correct solution is to treat Application resources as the product of a higher-level abstraction: a factory. That factory is the ApplicationSet.


    The `ApplicationSet` Controller: A Factory for Applications

    The ApplicationSet controller, now a core part of ArgoCD, operates on a simple but powerful principle: it uses a template to generate ArgoCD Application resources based on a set of parameters. These parameters are provided by a generator.

    An ApplicationSet resource has two key components:

    * generators: An array of sources that provide parameters. These can be a static list, a query of Kubernetes clusters known to ArgoCD, or files/directories discovered in a Git repository.

    * template: A blueprint for the Application resources that will be generated. The fields in this template can be populated with values from the generators.
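
Before looking at individual generators, a minimal skeleton helps show how the two pieces fit together. Everything in this sketch (the list generator's single element, the repository URL, and the names) is an illustrative placeholder rather than part of the examples that follow:

yaml
# Minimal ApplicationSet skeleton: the generator supplies parameter sets,
# and the template is stamped out once per set.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: example
  namespace: argocd
spec:
  generators:
  - list:
      elements:
      - cluster: dev-cluster                 # becomes {{cluster}} in the template
        url: https://kubernetes.default.svc  # becomes {{url}}
  template:
    metadata:
      name: '{{cluster}}-example'
    spec:
      project: default
      source:
        repoURL: https://github.com/my-org/example-config.git  # hypothetical repo
        targetRevision: HEAD
        path: .
      destination:
        server: '{{url}}'
        namespace: example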

    Let's explore the most powerful generators for managing a fleet.

    The `Cluster` Generator: For Multi-Cluster Consistency

    The Cluster generator is the cornerstone of multi-cluster management. It automatically discovers clusters that have been registered with ArgoCD (by creating a Secret in the argocd namespace) and uses their metadata as parameters.

    Scenario: You need to deploy a standard monitoring stack (e.g., kube-prometheus-stack) to every single production cluster in your fleet.

    First, ensure your clusters are labeled appropriately within their ArgoCD Secret manifests:

    yaml
    # Example secret for a prod cluster
    apiVersion: v1
    kind: Secret
    metadata:
      name: prod-us-east-1-cluster-secret
      namespace: argocd
      labels:
        argocd.argoproj.io/secret-type: cluster
        environment: production
        region: us-east-1
    # ... rest of the secret data

    Now, you can create a single ApplicationSet to target all clusters with the environment: production label.

    yaml
    # applicationset-prometheus.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: ApplicationSet
    metadata:
      name: kube-prometheus-stack
      namespace: argocd
    spec:
      generators:
      - clusters:
          selector:
            matchLabels:
              environment: 'production'
      template:
        metadata:
          name: '{{name}}-prometheus' # 'name' is the cluster name from the secret
          namespace: 'argocd'
        spec:
          project: 'platform-services'
          source:
            repoURL: 'https://prometheus-community.github.io/helm-charts'
            chart: 'kube-prometheus-stack'
            targetRevision: '45.2.1'
            helm:
              releaseName: 'prometheus'
              values: |-
                grafana:
                  adminPassword: $grafana-admin-password # Using a secret management solution
                prometheus:
                  prometheusSpec:
                    # Use cluster region label as an external label
                    externalLabels:
                      region: '{{metadata.labels.region}}'
          destination:
            server: '{{server}}' # 'server' is the API server URL from the secret
            namespace: 'monitoring'
          syncPolicy:
            automated:
              prune: true
              selfHeal: true
            syncOptions:
            - CreateNamespace=true

    Analysis of this pattern:

* Declarative Fleet Management: With this one file, you have defined the desired state for the Prometheus stack on all current and *future* production clusters. Adding a new production cluster is as simple as registering its secret with the correct label; the ApplicationSet controller will automatically deploy the stack.

* Dynamic Templating: We are using parameters from the generator like {{name}}, {{server}}, and {{metadata.labels.region}}. This allows us to customize the generated Application for each specific cluster, such as setting a Prometheus external label based on the cluster's region.

* Enforced Consistency: Every generated application will have the exact same syncPolicy, project, and chart version, eliminating a major source of configuration drift.

    The `Git` Generator: For Application-Centric Onboarding

    While the Cluster generator is excellent for deploying the same application to many clusters, the Git generator excels at deploying many different applications. It works by discovering either directories or files in a Git repository.

    Scenario: You want to empower application teams to self-service their deployments. They should be able to create a new folder in a Git repository to have their application deployed to a development cluster.

    Let's use the directory discovery mode.

    Git Repository Structure (app-configs repo):

    bash
    . 
    └── apps/
        ├── checkout-service/
        │   ├── config.json
        │   └── manifests/
        │       └── ... k8s manifests
        ├── payment-service/
        │   ├── config.json
        │   └── manifests/
        │       └── ... k8s manifests
        └── inventory-service/
            ├── config.json
            └── manifests/
                └── ... k8s manifests

    Each config.json file contains metadata specific to that application:

    json
    // apps/checkout-service/config.json
    {
      "projectName": "e-commerce",
      "targetNamespace": "checkout-dev"
    }

    The ApplicationSet uses the Git generator to scan for directories under apps/ and reads the config.json in each one.

    yaml
    # applicationset-dev-apps.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: ApplicationSet
    metadata:
      name: dev-applications
      namespace: argocd
    spec:
      generators:
      - git:
          repoURL: https://github.com/my-org/app-configs.git
          revision: HEAD
          # Discover all directories under the 'apps' path
          directories:
          - path: apps/*
      # Note: the directory generator only exposes path-based parameters such as
      # {{path}} and {{path.basename}}. Reading fields out of config.json requires
      # the Git file generator instead (see the note below this example).
    
      template:
        metadata:
          # 'path.basename' is the directory name, e.g., 'checkout-service'
          name: 'dev-{{path.basename}}'
        spec:
          project: '{{projectName}}' # This would require a file-based generator or advanced templating
          source:
            repoURL: https://github.com/my-org/app-configs.git
            targetRevision: HEAD
            path: '{{path}}/manifests' # Path to the k8s manifests
          destination:
            server: 'https://dev-cluster.my-org.com'
            namespace: '{{targetNamespace}}'

Note: The directory generator cannot read parameters out of a JSON/YAML file inside the discovered directory. That requires the Git file generator (pointed at apps/*/config.json), which exposes the file's top-level fields as template parameters alongside the usual path parameters. For simplicity, we'll focus on the more powerful pattern of using Kustomize overlays.
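
A sketch of that variant, using the same hypothetical repository: the file generator matches each service's config.json, so projectName and targetNamespace resolve directly from the file contents while {{path.basename}} still yields the service's directory name.

yaml
# Sketch: a Git file generator that reads apps/*/config.json
generators:
- git:
    repoURL: https://github.com/my-org/app-configs.git
    revision: HEAD
    files:
    - path: 'apps/*/config.json'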

    This pattern creates a powerful self-service workflow. A developer simply creates a new directory for their service, adds their manifests and a config file, and pushes to Git. The ApplicationSet controller detects the new directory and automatically deploys their application.


    The Ultimate Pattern: `ApplicationSets` + Kustomize Overlays

    The true power of this model is realized when you combine the application generation capabilities of ApplicationSet with the configuration customization of Kustomize. Generators are great for stamping out identical resources, but in the real world, every environment has its unique variables: resource limits, replica counts, ingress hostnames, and ConfigMap values.

    Kustomize's base and overlay model is perfectly suited for this. We define a common base configuration for an application, and then create overlays that only specify the differences for each environment.

    Scenario: We are deploying a guestbook application to a staging and a production cluster. Production requires more replicas, higher resource limits, and a different welcome message.

    Step 1: Structure the Kustomize Application Repository

    bash
    . # Git repo: my-org/guestbook-config
    ├── base/
    │   ├── deployment.yaml
    │   ├── service.yaml
    │   └── kustomization.yaml
    └── overlays/
        ├── staging/
        │   ├── config.patch.yaml
        │   ├── replicas.patch.yaml
        │   └── kustomization.yaml
        └── production/
            ├── config.patch.yaml
            ├── replicas.patch.yaml
            └── kustomization.yaml

    base/deployment.yaml (excerpt):

    yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: guestbook-ui
    spec:
      replicas: 1 # Default replica count
      template:
        spec:
          containers:
          - name: guestbook-ui
            image: gcr.io/heptio-images/ks-guestbook-demo:0.2
            env:
            - name: WELCOME_MESSAGE
              value: "Welcome! (Default)"
            resources:
              requests:
                cpu: 100m
                memory: 128Mi

    base/kustomization.yaml:

    yaml
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
    - deployment.yaml
    - service.yaml
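
For completeness, the service.yaml referenced above could be a minimal Service such as the sketch below; the selector label is an assumption about the pod labels in the base Deployment, which are elided in the excerpt.

yaml
# base/service.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: guestbook-ui
spec:
  selector:
    app: guestbook-ui   # assumed pod label on the guestbook Deployment
  ports:
  - port: 80
    targetPort: 80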

    overlays/production/replicas.patch.yaml:

    yaml
    # This file only specifies the change
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: guestbook-ui
    spec:
      replicas: 5 # Override for production

    overlays/production/kustomization.yaml:

    yaml
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    # Inherit from the base
resources:
- ../../base

# Apply patches (strategic merge patch files)
patches:
- path: replicas.patch.yaml
- path: config.patch.yaml # Assume this patch changes the WELCOME_MESSAGE env var
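
The config.patch.yaml referenced above is not shown in detail; a plausible sketch, overriding only the environment variable defined in the base:

yaml
# overlays/production/config.patch.yaml (sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: guestbook-ui
spec:
  template:
    spec:
      containers:
      - name: guestbook-ui
        env:
        - name: WELCOME_MESSAGE
          value: "Welcome to production!"

Running kustomize build overlays/production locally is a quick way to confirm that the rendered output matches expectations, since ArgoCD's repo-server performs the same rendering.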

    Step 2: Create the ApplicationSet to Target the Overlays

    Now, we create an ApplicationSet that uses a generator to decide which Kustomize overlay to deploy to which cluster. Here, we'll use a List generator for clarity, but a Cluster generator is ideal for a real-world multi-cluster setup.

    yaml
    # applicationset-guestbook.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: ApplicationSet
    metadata:
      name: guestbook
      namespace: argocd
    spec:
      generators:
      - list:
          elements:
          - cluster: 'staging-cluster'
            url: 'https://1.2.3.4'
            env: 'staging'
          - cluster: 'production-cluster'
            url: 'https://5.6.7.8'
            env: 'production'
    
      template:
        metadata:
          name: '{{cluster}}-guestbook'
        spec:
          project: default
          source:
            repoURL: 'https://github.com/my-org/guestbook-config.git'
            targetRevision: main
            # This is the magic! Dynamically point to the correct overlay.
            path: 'overlays/{{env}}'
          destination:
            server: '{{url}}'
            namespace: 'guestbook'
          syncPolicy:
            automated:
              prune: true
              selfHeal: true
            syncOptions:
            - CreateNamespace=true

    How It Works:

  • The ApplicationSet's list generator produces two sets of parameters, one for staging and one for production.
  • For each set, it stamps out an Application resource from the template.
  • The key is the spec.source.path field: path: 'overlays/{{env}}'. This expression is interpolated with the env parameter from the generator.
  • The staging Application will point its Kustomize source to overlays/staging.
  • The production Application will point its Kustomize source to overlays/production.
This pattern is exceptionally powerful. You have achieved complete separation of concerns:

* The application repository (guestbook-config) defines *what* the application is and how it differs between environments.

* The ApplicationSet manifest defines *where* and *how* the application is deployed, binding environments to specific clusters.

    This decouples platform configuration from application configuration, allowing teams to work in parallel with greater safety and clarity.
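
As noted above, a cluster generator is the natural replacement for the list generator in a real multi-cluster fleet. A sketch of that substitution, assuming each cluster secret carries an environment label as in the earlier Prometheus example:

yaml
# Sketch: drive the same template from cluster labels instead of a static list
generators:
- clusters:
    selector:
      matchExpressions:
      - key: environment
        operator: In
        values:
        - staging
        - production
# ...in the template, the overlay and destination then follow cluster metadata:
#   source:
#     path: 'overlays/{{metadata.labels.environment}}'
#   destination:
#     server: '{{server}}'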


    Advanced Patterns and Production Considerations

    While the core pattern is powerful, real-world production systems require handling more complex scenarios.

    Edge Case 1: Progressive Delivery with Sync Waves and Phases

Deploying a complex application isn't always an atomic operation. You might need to run a database migration Job before the main Deployment is updated. ArgoCD's Sync Waves handle this ordering, and because the annotations live on the manifests themselves, they apply unchanged to every Application generated by the ApplicationSet.

    Let's add a database migrator to our guestbook app's Kustomize base, annotated with a negative sync wave:

    yaml
    # base/db-migration-job.yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: guestbook-db-migration
      annotations:
        argocd.argoproj.io/sync-wave: "-10"
    # ... Job spec

The main Deployment would have a default or positive sync wave (e.g., argocd.argoproj.io/sync-wave: "0"). When ArgoCD syncs a generated Application, it applies the Job first and waits for it to become healthy (i.e., to complete) before proceeding to the Deployment in the next wave.

    This ensures a safe, ordered deployment process, defined declaratively and applied consistently across all environments targeted by the ApplicationSet.
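
If the completed migration Job should also be cleaned up after each successful sync, the same manifest can be run as a resource hook instead; a sketch using ArgoCD's standard hook annotations:

yaml
# Sketch: run the migration as a Sync hook that is deleted once it succeeds
metadata:
  name: guestbook-db-migration
  annotations:
    argocd.argoproj.io/hook: Sync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
    argocd.argoproj.io/sync-wave: "-10"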

    Edge Case 2: Drift Remediation Strategy at Scale

    selfHeal: true is powerful but can be dangerous at scale. An accidental change in the Git repository could be automatically rolled out across your entire fleet, causing a widespread outage. A more cautious and robust strategy is to separate drift detection from remediation.

  • Configure the ApplicationSet template for detection only:

    yaml
    # In the ApplicationSet template spec.syncPolicy
    syncPolicy:
      automated: null # Disable automated sync

  • Monitor for OutOfSync status: Use the ArgoCD API or Prometheus metrics (argocd_app_info) to monitor for applications that are in an OutOfSync state.
  • Alerting: Fire an alert to the responsible team when an application remains OutOfSync for a specified period (see the sketch after this list).
  • Manual or Semi-Automated Remediation: The team can then inspect the drift (via the ArgoCD UI/CLI) and manually trigger a sync. This human-in-the-loop approach prevents catastrophic automated rollouts.

For non-critical applications or environments, you can still enable selfHeal, but for production workloads, this decoupled approach provides a crucial safety net.
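
A minimal Prometheus alerting rule for the detection step might look like this sketch; the metric and label names follow ArgoCD's built-in metrics, while the alert name, duration, and severity are illustrative choices:

yaml
# Sketch: alert on applications that stay OutOfSync for 30 minutes
groups:
- name: argocd-drift
  rules:
  - alert: ArgoCDApplicationOutOfSync
    expr: argocd_app_info{sync_status="OutOfSync"} == 1
    for: 30m
    labels:
      severity: warning
    annotations:
      summary: 'Application {{ $labels.name }} has been OutOfSync for 30 minutes'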

    Performance Considerations for 1000+ Applications

    An ApplicationSet can easily generate thousands of Application resources. This places significant load on the ArgoCD components.

* argocd-application-controller: This controller is responsible for reconciling the state of each Application. If you have thousands of applications, you may need to increase the number of replicas for this controller. ArgoCD 1.9+ introduced sharding, which distributes clusters (and the applications deployed to them) across multiple controller replicas and is essential at massive scale.

    * argocd-repo-server: This component is responsible for cloning Git repositories and rendering manifests (e.g., running kustomize build). High application counts or large repositories can make it a bottleneck. Monitor its CPU/memory usage closely and increase replicas as needed. You can also optimize by enabling repository caching and tuning Git polling intervals.

    * Git Provider API Limits: With thousands of applications polling a single repository, you can easily hit API rate limits on providers like GitHub. Use webhooks instead of polling where possible. The Git generator in ApplicationSet can be configured to re-scan on a webhook event, making it far more efficient.
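
For webhook-driven refreshes, ArgoCD validates deliveries against a shared secret; a sketch of the relevant entry in argocd-secret for GitHub (the placeholder value is yours to choose, and the exact webhook URL to configure on the repository depends on your installation, so consult the ApplicationSet webhook documentation):

yaml
# Sketch: shared secret used to validate incoming GitHub webhook payloads
apiVersion: v1
kind: Secret
metadata:
  name: argocd-secret
  namespace: argocd
stringData:
  webhook.github.secret: <same-secret-configured-on-the-github-webhook>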

    Conclusion: From Configuration Management to a Configuration Factory

    By moving from manually managed Application manifests to a factory model using ApplicationSets and Kustomize, we fundamentally change our operational posture. We are no longer just managing configuration; we are building a declarative platform that enforces consistency, reduces toil, and enables safe, scalable fleet-wide changes.

    The combination of ApplicationSet generators to define what gets deployed where, and Kustomize overlays to manage the deltas between environments, provides a robust and flexible framework. This pattern directly addresses the problem of configuration drift at its source, replacing error-prone manual tasks with a programmable, auditable, and truly GitOps-native workflow. It is this level of automation and control that transforms ArgoCD from a simple deployment tool into the backbone of a modern, scalable internal developer platform.
