Argo CD ApplicationSets for Git-Generated Multi-Cluster Deployments

Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Scaling Problem with Vanilla GitOps

In a mature Kubernetes ecosystem, managing a single cluster with Argo CD is straightforward. You define Application custom resources in a Git repository, and Argo CD ensures the cluster state matches the repository's desired state. However, this model breaks down at scale. When you're managing dozens or hundreds of clusters—for different environments (dev, staging, prod), regions (us-east-1, eu-west-2), or tenants—the operational overhead of maintaining a separate Application manifest for each target becomes untenable.

Duplicating manifests for each cluster violates the DRY (Don't Repeat Yourself) principle and introduces significant risk. A simple change, like updating a monitoring agent's version, requires modifying dozens of files, a process prone to human error. This is the precise problem the ApplicationSet controller was designed to solve. It began as a separate Argo project and has shipped as part of Argo CD since v2.3.

While the ApplicationSet controller offers several generators (List, Cluster, Matrix), the Git generator provides the most powerful and flexible pattern for production environments. It allows you to use a Git repository not just as the source of application manifests, but as the source of truth for which applications should be deployed to which clusters, and with what configuration. This article provides a deep dive into implementing this pattern, focusing on advanced use cases and production-ready configurations.


Prerequisites: A Multi-Cluster Argo CD Setup

This article assumes you have a running Argo CD instance with multiple Kubernetes clusters registered as deployment targets. The core mechanism for this is creating a Secret in the Argo CD namespace for each remote cluster. The secret contains the cluster's API server URL and authentication credentials. The argocd.argoproj.io/secret-type: cluster label marks a Secret as a cluster definition, and any additional labels you add (such as environment or region) can later be used to select clusters.

For our examples, let's assume we have two clusters registered, labeled for environment and region:

Cluster 1 Secret (prod-us-east-1-cluster):

yaml
apiVersion: v1
kind: Secret
metadata:
  name: prod-us-east-1-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    environment: production
    region: us-east-1
stringData:
  name: prod-us-east-1
  server: https://123.45.67.89
  config: |
    {
      "bearerToken": "<token>",
      "tlsClientConfig": {
        "insecure": false,
        "caData": "<ca-data>"
      }
    }

Cluster 2 Secret (staging-eu-west-1-cluster):

yaml
apiVersion: v1
kind: Secret
metadata:
  name: staging-eu-west-1-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    environment: staging
    region: eu-west-1
stringData:
  name: staging-eu-west-1
  server: https://987.65.43.21
  config: |
    {
      "bearerToken": "<token>",
      "tlsClientConfig": {
        "insecure": false,
        "caData": "<ca-data>"
      }
    }

With this setup, we can proceed to define ApplicationSet resources that dynamically generate Application resources targeting these clusters.


Core Pattern: Git Directory Generator

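The Git generator has two subtypes. The directory subtype scans a specified path within a Git repository and generates one parameter set for each subdirectory found, exposing only path-derived parameters such as path and path.basename. The files subtype matches JSON or YAML configuration files and additionally exposes their parsed contents as parameters. The pattern below keeps one directory per cluster and reads per-cluster values from a config.json inside it, so it depends on the files subtype. This pattern is exceptionally effective for managing cluster-specific configurations.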
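
The Git generator can be declared in two forms, and the difference matters for everything that follows. The fragment below is a comparison sketch only: it reuses the example repository URL from this article and places the two forms side by side, which a real ApplicationSet would not normally do.

```yaml
# Fragment of an ApplicationSet spec, shown for comparison only
generators:
  # Directory form: one parameter set per matching directory.
  # Exposes path parameters such as {{path}} and {{path.basename}}.
  - git:
      repoURL: https://github.com/my-org/my-cluster-configs.git
      revision: HEAD
      directories:
        - path: clusters/*
  # Files form: one parameter set per matching file.
  # Exposes the same path parameters plus the parsed contents of each file.
  - git:
      repoURL: https://github.com/my-org/my-cluster-configs.git
      revision: HEAD
      files:
        - path: clusters/*/config.json
```

Only the second form makes values such as clusterUrl from a config.json available to the template.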

1. The Git Repository Structure

First, we establish a clear structure in our GitOps configuration repository. This structure itself becomes part of the declarative configuration.

sh
# Example git repository: my-cluster-configs
.
└── clusters
    ├── prod-us-east-1
    │   └── config.json
    └── staging-eu-west-1
        └── config.json

Each subdirectory under clusters/ represents a target cluster. Inside each, a config.json file contains cluster-specific parameters.

clusters/prod-us-east-1/config.json:

json
{
  "clusterName": "prod-us-east-1",
  "clusterUrl": "https://123.45.67.89",
  "environment": "production",
  "monitoring": {
    "namespace": "monitoring-prod",
    "prometheus": {
      "retention": "30d",
      "replicas": 3
    }
  }
}

clusters/staging-eu-west-1/config.json:

json
{
  "clusterName": "staging-eu-west-1",
  "clusterUrl": "https://987.65.43.21",
  "environment": "staging",
  "monitoring": {
    "namespace": "monitoring-staging",
    "prometheus": {
      "retention": "7d",
      "replicas": 1
    }
  }
}

This structure is highly scalable. Onboarding a new cluster is as simple as creating a new directory and config.json file in a pull request.

2. The ApplicationSet Manifest

Now, we create the ApplicationSet that consumes this repository structure. This single manifest will generate an Argo CD Application for each subdirectory it finds.

yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-monitoring-stack
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/my-org/my-cluster-configs.git
        revision: HEAD
        files:
          # The files subtype parses each matched config.json into template parameters
          - path: clusters/*/config.json
  template:
    metadata:
      name: '{{path.basename}}-monitoring'
      # path.basename is the directory holding config.json, giving e.g. 'prod-us-east-1-monitoring'
    spec:
      project: default
      source:
        repoURL: https://github.com/my-org/my-app-charts.git
        targetRevision: 1.2.3
        path: helm-charts/prometheus-stack
        helm:
          valueFiles:
            - values.yaml
          # --- Advanced Parameter Overrides --- #
          parameters:
            - name: "prometheus.prometheusSpec.retention"
              value: "{{monitoring.prometheus.retention}}"
            - name: "prometheus.replicaCount"
              value: "{{monitoring.prometheus.replicas}}"
            # Assumption: the chart can read the Grafana admin password from an existing
            # Secret (the upstream Grafana chart exposes admin.existingSecret for this).
            # The Secret name is hypothetical; only the name lives in Git, never the value.
            - name: "grafana.admin.existingSecret"
              value: "grafana-admin-{{environment}}"

      destination:
        server: '{{clusterUrl}}' # From config.json
        namespace: '{{monitoring.namespace}}' # From config.json

      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true

Deconstructing the Manifest:

  • spec.generators.git: This is the core of the pattern.
    • repoURL: Points to our configuration repository.
    • files.path: clusters/*/config.json: This is the glob pattern. The ApplicationSet controller matches every config.json one level below clusters/ and produces one parameter set per file. The directories subtype would also find each subdirectory, but it exposes only path parameters and never reads config.json.

  • spec.template: This is the blueprint for the Application resources that will be generated. It's heavily parameterized.
    • metadata.name: '{{path.basename}}-monitoring': The ApplicationSet controller makes several path parameters available. path.basename resolves to the name of the directory that contains the matched file (e.g., prod-us-east-1), so Application names stay unique as long as directory names are.
    • source.helm.parameters: We override Helm chart values using data parsed from the corresponding config.json file. The controller parses the JSON and flattens nested keys into dotted parameter names (e.g., {{monitoring.prometheus.retention}}).
    • Secret handling: Argo CD has no documented syntax for resolving Kubernetes Secret references inside Helm parameter values, so a value such as "$argocd-secrets:grafana-passwords:production" would be passed to Helm as a literal string. The manifest therefore passes only the name of a Secret, derived from the environment field in config.json, and relies on the chart to read the password from it. The Secret itself must already exist in the target namespace, created by a tool such as External Secrets Operator or Sealed Secrets. This avoids committing secrets to Git while maintaining per-environment configuration.
    • destination.server: '{{clusterUrl}}': The target Kubernetes API server is set from our configuration file, ensuring the generated application targets the correct cluster. The URL must match the server field of a registered cluster Secret.

When this ApplicationSet is applied, the controller will:

  • Clone https://github.com/my-org/my-cluster-configs.git.
  • Find two files matching the pattern: clusters/prod-us-east-1/config.json and clusters/staging-eu-west-1/config.json.
  • Parse each config.json into a set of parameters.
  • Generate two distinct Application resources, templating the values from each respective config.json file.
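
For illustration, the Application generated for the prod-us-east-1 directory would look roughly like the sketch below, assuming the parameters from its config.json are resolved as described above. Metadata that the controller adds on its own (owner references, finalizers) and the remaining Helm parameters are omitted.

```yaml
# Approximate rendering for clusters/prod-us-east-1; controller-managed metadata omitted
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prod-us-east-1-monitoring
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/my-org/my-app-charts.git
    targetRevision: 1.2.3
    path: helm-charts/prometheus-stack
    helm:
      valueFiles:
        - values.yaml
      parameters:
        - name: prometheus.prometheusSpec.retention
          value: "30d"
        - name: prometheus.replicaCount
          value: "3" # parameters are substituted as strings
        # ... remaining parameters omitted
  destination:
    server: https://123.45.67.89
    namespace: monitoring-prod
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```

The staging Application differs only in the values taken from its own config.json: the name, the destination, a retention of 7d, and a single replica.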

Advanced Pattern: The Matrix Generator for Many-to-Many Deployments

    The Git Directory generator is excellent for cluster-specific bootstrapping. However, you often need to deploy a standard set of applications (e.g., observability, security, networking) to a specific group of clusters. This is a many-to-many mapping problem. The Matrix generator solves this by combining the outputs of two or more other generators.

    Let's model a scenario: Deploy a list of addon applications to all production clusters.

    1. The Git Repository Structure

    We'll define our applications and their configurations in one part of the repo and our clusters in another.

    sh
    # Example git repository: my-platform-configs
    .
    ├── addons
    │   ├── cert-manager
    │   │   └── config.json
    │   └── external-dns
    │       └── config.json
    └── clusters
        ├── prod-us-east-1
        │   └── config.json
        ├── prod-eu-west-1
        │   └── config.json
        └── staging-eu-west-1
            └── config.json

    addons/cert-manager/config.json:

    json
    {
      "appName": "cert-manager",
      "namespace": "cert-manager",
      "repoURL": "https://charts.jetstack.io",
      "chart": "cert-manager",
      "version": "v1.8.0"
    }

    clusters/prod-us-east-1/config.json:

    json
    {
      "name": "prod-us-east-1",
      "server": "https://123.45.67.89",
      "environment": "production"
    }

    2. The Matrix ApplicationSet Manifest

    This ApplicationSet will generate the Cartesian product of the two generators: (every addon) x (every cluster).

    yaml
    apiVersion: argoproj.io/v1alpha1
    kind: ApplicationSet
    metadata:
      name: platform-addons
      namespace: argocd
    spec:
      # goTemplate is deliberately left disabled: the {{name}}-style placeholders in the
      # template use the default syntax. With goTemplate: true they would need the
      # Go template form instead, e.g. {{ .name }}.
      generators:
        - matrix:
            generators:
              # Generator 1: Discover all addons and parse each addon's config.json
              - git:
                  repoURL: https://github.com/my-org/my-platform-configs.git
                  revision: HEAD
                  files:
                    - path: addons/*/config.json
              # Generator 2: Discover all clusters
              - clusters:
                  # Use a label selector to target only production clusters
                  selector:
                    matchLabels:
                      environment: production
    
      # Every discovered addon is combined with every selected cluster.
      # The label selector above is the only filter applied.
    
    
      template:
        metadata:
          # Create a unique name like 'prod-us-east-1-cert-manager'
          name: '{{name}}-{{path.basename}}'
          labels:
            cluster: '{{name}}'
            addon: '{{path.basename}}'
        spec:
          project: platform
          source:
            repoURL: '{{repoURL}}' # From addons/../config.json
            chart: '{{chart}}' # From addons/../config.json
            targetRevision: '{{version}}' # From addons/../config.json
            helm:
              releaseName: '{{appName}}'
    
          destination:
            server: '{{server}}' # From cluster secret
            namespace: '{{namespace}}' # From addons/../config.json
    
          syncPolicy:
            automated:
              prune: true
              selfHeal: true
            syncOptions:
              - CreateNamespace=true

    Deconstructing the Matrix Pattern:

  • generators.matrix: This is the key element. It contains a list of sub-generators.
  • matrix.generators[0]: A Git generator that scans addons/. Because the template reads repoURL, chart, version, appName, and namespace from each addon's config.json, it needs the files subtype (path: addons/*/config.json); the directories subtype would expose only path parameters. It produces one parameter set per addon: one for cert-manager and one for external-dns.
  • matrix.generators[1]: A Cluster generator. This generator does not use Git. Instead, it reads the Secrets in the argocd namespace that are labeled argocd.argoproj.io/secret-type: cluster, so the clusters/ directory in the repository plays no part here. The selector filters these clusters; in this case, only Secrets with the environment: production label are selected. Assuming a prod-eu-west-1 cluster has been registered with that label alongside the two clusters from the prerequisites, this produces parameter sets for prod-us-east-1 and prod-eu-west-1.
  • Cartesian Product: The Matrix generator combines these two lists into a parameter set for every possible combination:
    • (cert-manager params) + (prod-us-east-1 params)
    • (cert-manager params) + (prod-eu-west-1 params)
    • (external-dns params) + (prod-us-east-1 params)
    • (external-dns params) + (prod-eu-west-1 params)
  • Templating: The template section has access to parameters from both generators. {{path.basename}}, {{repoURL}}, and {{chart}} come from the Git generator. {{name}} and {{server}} come from the Cluster generator. These placeholders use the default templating syntax; with goTemplate: true they would need the Go template form, e.g. {{ .name }}.

    This results in four Application resources being created, deploying both addons to both production clusters without deploying anything to the staging cluster.
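
As a rough sketch of one of those four results, the combination of cert-manager and prod-us-east-1 would render to something like the Application below. This assumes the addon values are read from addons/cert-manager/config.json and the cluster values from the prod-us-east-1 cluster Secret shown in the prerequisites; controller-managed metadata is omitted.

```yaml
# Approximate rendering of (cert-manager params) + (prod-us-east-1 params)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prod-us-east-1-cert-manager
  namespace: argocd
  labels:
    cluster: prod-us-east-1
    addon: cert-manager
spec:
  project: platform
  source:
    repoURL: https://charts.jetstack.io   # from addons/cert-manager/config.json
    chart: cert-manager
    targetRevision: v1.8.0
    helm:
      releaseName: cert-manager
  destination:
    server: https://123.45.67.89          # from the cluster Secret
    namespace: cert-manager
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```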


    Edge Cases and Performance Considerations

    While powerful, ApplicationSet controllers in a large-scale environment require careful management.

    Edge Case: Handling Deletion

    When a directory is deleted from the clusters/ path in our Git repository, the ApplicationSet controller detects the change on its next refresh and deletes the corresponding generated Application. By default, generated Applications carry Argo CD's resources finalizer, so deleting the Application also deletes the Kubernetes resources it deployed to the target cluster. This cascade does not depend on prune: true, which only governs resources that disappear from Git while the Application still exists. Cascading deletion is the desired behavior for decommissioning a cluster or an application, but it highlights the critical importance of code review and branch protection on your GitOps repository. If workloads must survive such a deletion, the ApplicationSet's own syncPolicy offers a preserveResourcesOnDeletion setting.
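
If an accidental directory deletion should not tear down running workloads, one option is the preserveResourcesOnDeletion setting on the ApplicationSet itself. The fragment below is a minimal sketch that reuses the cluster-monitoring-stack name from the earlier example and leaves out the generators and template.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-monitoring-stack
  namespace: argocd
spec:
  syncPolicy:
    # Keep deployed resources on the cluster when a generated Application is deleted
    preserveResourcesOnDeletion: true
  # generators and template omitted; unchanged from the earlier example
```

The trade-off is that decommissioning then becomes a manual step, because resources left behind on the cluster have to be cleaned up separately.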

    Edge Case: Merging Helm Values from Multiple Sources

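    Sometimes you need a base set of Helm values for an application, with specific overrides per cluster. A Git generator can supply the per-cluster directory, and Helm's valueFiles list does the merging: files are applied in order, so values in a later file override those in an earlier one. Here that means a shared base-values.yaml followed by a per-cluster override-values.yaml.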

    Git structure:

    sh
    └── prometheus
        ├── base-values.yaml
        └── clusters
            └── prod-us-east-1
                └── override-values.yaml

    ApplicationSet Snippet:

    yaml
    spec:
      # ... generator discovers clusters/prod-us-east-1
      template:
        spec:
          source:
            repoURL: ...
            path: helm-charts/prometheus
            helm:
              valueFiles:
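                - /apps/prometheus/base-values.yaml # Leading slash: resolved from the repository root
                - '/{{path}}/override-values.yaml'   # Generator path, also resolved from the repository root

    This pattern provides a clean separation of concerns between default configuration and per-cluster exceptions. Without the leading slash, a valueFiles entry is resolved relative to the chart directory, which is why the generator path is prefixed with one. The entries are read from the same repository as the chart, so the values files and the directories the generator scans must live in that repository.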
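
When the values files live in a different repository from the chart, a single source cannot reach them. One possible approach, sketched below, is Argo CD's multiple-sources feature, where a second source is given a ref name and the first source's valueFiles refer to it with a $ prefix. This assumes an Argo CD version that supports spec.sources; the repository URLs and file layout are hypothetical and follow the examples in this article.

```yaml
spec:
  # ... generator discovers per-cluster directories in the configuration repository
  template:
    spec:
      sources:
        # Source 1: the chart, with values files resolved from the "values" ref below
        - repoURL: https://github.com/my-org/my-app-charts.git
          targetRevision: 1.2.3
          path: helm-charts/prometheus
          helm:
            valueFiles:
              - $values/prometheus/base-values.yaml
              - '$values/{{path}}/override-values.yaml'
        # Source 2: the configuration repository, exposed under the name "values"
        - repoURL: https://github.com/my-org/my-cluster-configs.git
          targetRevision: HEAD
          ref: values
```

Paths after $values are taken from the root of the referenced repository, so {{path}} here is assumed to be a directory that the generator found in that same repository.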

    Performance at Scale (100+ Clusters)

    The ApplicationSet controller's performance is primarily bound by two factors: Git repository polling and Kubernetes API server load.

  • Git Refresh Interval: By default, Argo CD polls Git repositories every 3 minutes. With hundreds of ApplicationSet resources, this can put significant load on your Git provider and the Argo CD repo-server. You can adjust this interval in the argocd-cm ConfigMap:

    yaml
    data:
      timeout.reconciliation: 180s

    For ApplicationSet, you can also set the requeueAfterSeconds field on the Git generator to control how often it re-runs, independent of the Git refresh.

  • Controller Resource Limits: The argocd-applicationset-controller deployment itself needs adequate CPU and memory. A single controller can manage hundreds of generated applications, but if you observe high reconciliation latency, you may need to increase its resource limits. The controller is built on controller-runtime, so the standard metrics to watch are workqueue_depth and controller_runtime_reconcile_time_seconds.
  • Webhook vs. Polling: For very large installations, consider configuring Git webhooks. This changes the model from pull (polling) to push. Your Git provider notifies Argo CD of a new commit, triggering an immediate refresh. This is far more efficient than constant polling, but it requires exposing a webhook endpoint to your Git provider and securing it with a shared secret. Note that the ApplicationSet controller serves its own webhook endpoint, separate from the one on the Argo CD API server, so Git generators only react to webhooks delivered there.
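
As a small example of the per-generator setting mentioned in the list above, the fragment below slows a Git generator down to one evaluation every ten minutes. The value 600 is an arbitrary illustration, and the generator reuses the example repository from this article.

```yaml
spec:
  generators:
    - git:
        repoURL: https://github.com/my-org/my-cluster-configs.git
        revision: HEAD
        files:
          - path: clusters/*/config.json
        # Re-evaluate this generator every 10 minutes instead of the default 3
        requeueAfterSeconds: 600
```

A longer interval reduces load but also delays how quickly a newly added cluster directory is picked up, unless a webhook triggers the refresh sooner.
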
Conclusion

The ApplicationSet controller, particularly with the Git and Matrix generators, is a transformative tool for platform teams managing Kubernetes at scale. By shifting the definition of what runs where into a structured Git repository, you create a scalable, auditable, and declarative system for multi-cluster application management. The patterns discussed here (using Git directories for cluster bootstrapping, combining generators with the Matrix strategy for addon management, and keeping secret values out of Git) are not theoretical exercises. They are battle-tested strategies for building a robust and maintainable internal developer platform on top of Kubernetes and the Argo ecosystem.
