AWS Lambda Cold Starts: Provisioned Concurrency vs. SnapStart for Java

19 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Unforgiving Latency: Deconstructing the Java Lambda Cold Start

For senior engineers building serverless systems, the term "cold start" is a familiar adversary. While present in all Lambda runtimes, it manifests as a particularly stubborn performance bottleneck in the Java ecosystem. Before we dissect the advanced mitigation strategies, it's crucial to move beyond the high-level understanding and analyze the precise components contributing to this latency. A standard cold start isn't a monolithic event; it's a sequence of time-consuming operations:

  • Execution Environment Provisioning: AWS allocates resources, downloads the function code, and starts the microVM (Firecracker).
  • Runtime Bootstrap: The Java Virtual Machine (JVM) itself is initialized. This is a non-trivial step involving loading core libraries, setting up memory heaps, and initializing internal structures.
  • Static Initialization & Class Loading: Your application code's static initializers are executed. The JVM's classloader finds and loads necessary classes from your JAR file. With large frameworks like Spring or Quarkus, this can involve scanning the classpath and loading thousands of classes.
  • Framework Initialization (The Great Bottleneck): Dependency Injection (DI) containers are built. Spring's ApplicationContext is refreshed, beans are instantiated, proxies are created, and component scans are performed. This is often the single largest contributor to Java's cold start latency.
  • Handler Execution: Finally, your actual handler method is invoked.
Here's a conceptual breakdown of where the time is spent in a typical Spring Boot application's cold start:

    plaintext
|----------------|-----------------|---------------------------------------|--------------|
| Env Provision  |   JVM Startup   |      Application Initialization       |    Invoke    |
| (AWS Internal) | (e.g., 200ms)   | (e.g., Spring Context, DB Pool, etc.) | (e.g., 50ms) |
|                |                 |         (CAN BE 2-10+ SECONDS)        |              |
|----------------|-----------------|---------------------------------------|--------------|

    A P99 cold start latency of 5-8 seconds for a non-trivial Java Lambda is not uncommon. For latency-sensitive APIs or synchronous data processing pipelines, this is unacceptable. To address this, AWS has provided two powerful, yet fundamentally different, solutions: Provisioned Concurrency (PC) and Lambda SnapStart. This article provides a deep, comparative analysis to help you architect the right solution for your production workload.
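To make these phases concrete, here is a minimal, hypothetical handler that logs where time is spent. The class name and timings are illustrative, not from a real workload; everything outside handleRequest runs during the init phase:

java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class ColdStartTimingHandler implements RequestHandler<String, String> {

    // Captured while the class is loaded, i.e., during the Init phase.
    private static final long CLASS_LOADED_AT = System.currentTimeMillis();

    static {
        // Heavy framework bootstrap (Spring context, DB pools, etc.) would run here.
        System.out.println("Static init started at " + CLASS_LOADED_AT);
    }

    public ColdStartTimingHandler() {
        // Constructors also run during Init, before the first invocation.
        System.out.println("Constructor finished after "
                + (System.currentTimeMillis() - CLASS_LOADED_AT) + " ms");
    }

    @Override
    public String handleRequest(String input, Context context) {
        // The Invoke phase: the only code that runs on warm invocations.
        System.out.println("Invoked after "
                + (System.currentTimeMillis() - CLASS_LOADED_AT) + " ms");
        return "ok";
    }
}

In production, the Init Duration field on the REPORT line in CloudWatch Logs is the simplest way to measure steps 2-4 for cold invocations.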

    Strategy 1: Provisioned Concurrency (PC) - The Brute Force Guarantee

    Provisioned Concurrency is the original solution to the cold start problem. Its philosophy is simple: don't have a cold start by ensuring the environment is already initialized before the request arrives. It achieves this by pre-emptively executing the entire initialization phase (steps 1-4 from our list above) for a specified number of concurrent environments and holding them in a "hot" state, ready to immediately execute the handler method.

    The Underlying Mechanism

    When you configure PC, Lambda does the following:

• It immediately provisions and initializes the number of execution environments you requested.
• It runs your function's initialization code—everything outside the handler method—including JVM startup and framework bootstrapping.
• These environments are now "primed." When an invocation arrives that is routed to a provisioned environment, Lambda skips the entire init phase and proceeds directly to the Invoke phase.
• This effectively transforms a cold start into a warm start, providing the most predictable, low-latency performance possible.

    Production Implementation with Terraform

    Configuring PC is an infrastructure concern. Here is a production-grade example using Terraform to configure a Lambda function with 10 units of provisioned concurrency, tied to a specific alias. Using an alias is critical for blue/green deployments.

    hcl
    # main.tf
    
    resource "aws_lambda_function" "java_api" {
      function_name = "MyLatencySensitiveJavaAPI"
      role          = aws_iam_role.lambda_exec.arn
      handler       = "com.example.StreamLambdaHandler::handleRequest"
      runtime       = "java17"
      memory_size   = 1024
      timeout       = 30
    
      filename         = "target/my-app-1.0.0-aws.jar"
      source_code_hash = filebase64sha256("target/my-app-1.0.0-aws.jar")
    
      # We publish a new version on every code change to enable aliases
      publish = true
    }
    
    resource "aws_lambda_alias" "live" {
      name             = "live"
      function_name    = aws_lambda_function.java_api.function_name
      function_version = aws_lambda_function.java_api.version
    }
    
    resource "aws_lambda_provisioned_concurrency_config" "api_pc" {
      function_name                     = aws_lambda_alias.live.function_name
      provisioned_concurrent_executions = 10
      qualifier                         = aws_lambda_alias.live.name
    
      # Ensure PC is configured only after the alias is pointing to the new version
      depends_on = [aws_lambda_alias.live]
    }
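After a deployment, provisioned environments take time to initialize, so in a blue/green rollout you should confirm allocation before shifting traffic. Here is a minimal sketch using the AWS SDK for Java v2 to poll the allocation status; the function and alias names match the Terraform above:

java
import software.amazon.awssdk.services.lambda.LambdaClient;
import software.amazon.awssdk.services.lambda.model.GetProvisionedConcurrencyConfigRequest;
import software.amazon.awssdk.services.lambda.model.GetProvisionedConcurrencyConfigResponse;

public class ProvisionedConcurrencyCheck {
    public static void main(String[] args) throws InterruptedException {
        try (LambdaClient lambda = LambdaClient.create()) {
            GetProvisionedConcurrencyConfigRequest request =
                    GetProvisionedConcurrencyConfigRequest.builder()
                            .functionName("MyLatencySensitiveJavaAPI")
                            .qualifier("live")
                            .build();

            // Poll until all requested environments have been initialized.
            while (true) {
                GetProvisionedConcurrencyConfigResponse response =
                        lambda.getProvisionedConcurrencyConfig(request);
                System.out.printf("Status: %s (%d/%d ready)%n",
                        response.statusAsString(),
                        response.availableProvisionedConcurrentExecutions(),
                        response.requestedProvisionedConcurrentExecutions());
                if ("READY".equals(response.statusAsString())) {
                    break;
                }
                Thread.sleep(5_000);
            }
        }
    }
}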

    Performance & Cost Analysis

    Performance: With PC, the P99 latency for requests within the provisioned limit is virtually identical to a warm start. You can expect consistent sub-50ms invocation times (excluding your business logic's execution time). However, a critical edge case is concurrency spillover. If you receive 11 concurrent requests for your function configured with provisioned_concurrent_executions = 10, the 11th request will experience a full cold start. This makes accurate capacity planning essential.
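You can quantify spillover from inside the function itself: a new execution environment runs its first invocation exactly once, and Lambda sets the AWS_LAMBDA_INITIALIZATION_TYPE environment variable to "provisioned-concurrency" in pre-warmed environments and "on-demand" otherwise. A minimal sketch combining the two to flag requests that paid the full cold-start penalty:

java
public final class ColdStartDetector {

    // Static state persists across invocations within one execution environment.
    private static boolean firstInvocation = true;

    // "provisioned-concurrency" for pre-warmed environments, "on-demand" otherwise.
    private static final String INIT_TYPE =
            System.getenv().getOrDefault("AWS_LAMBDA_INITIALIZATION_TYPE", "on-demand");

    private ColdStartDetector() {}

    /** Call once per invocation; true means this request hit a spillover cold start. */
    public static boolean isSpilloverColdStart() {
        boolean first = firstInvocation;
        firstInvocation = false;
        return first && "on-demand".equals(INIT_TYPE);
    }
}

Logging or emitting a metric when this returns true tells you how often real traffic exceeds provisioned_concurrent_executions, which feeds directly into capacity planning.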

    Cost Model: PC's power comes at a significant cost. You are billed for two components:

• Provisioned Concurrency Cost: You pay for the amount of concurrency you provision, for the time it is provisioned, regardless of whether it's used. The price is (Memory in GB) × (Price per GB-second) × (Number of Concurrent Environments) × (Time in seconds). This is like paying for an idle EC2 instance.
• Invocation Cost: You still pay the standard per-request and per-GB-second execution cost when the function is invoked.

Example Calculation (us-east-1):

    • Lambda Memory: 1024 MB (1 GB)
    • Provisioned Concurrency: 10 units
    • PC Price: ~$0.0000046875 per GB-second

Hourly PC Cost = 1 GB × 10 units × 3600 seconds/hour × $0.0000046875/GB-sec = $0.16875 per hour

Monthly PC Cost = $0.16875 × 24 × 30 = ~$121.50 per month (This is before a single request is processed).
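For capacity planning it helps to parameterize this calculation. A small sketch using the figures above (the rate constant is the approximate us-east-1 price quoted earlier, not an authoritative value):

java
public class ProvisionedConcurrencyCost {

    // Approximate us-east-1 Provisioned Concurrency rate quoted above.
    private static final double PRICE_PER_GB_SECOND = 0.0000046875;

    /** Monthly cost of holding `units` environments of `memoryGb` each, 24/7. */
    public static double monthlyCostUsd(double memoryGb, int units) {
        double hourly = memoryGb * units * 3600 * PRICE_PER_GB_SECOND;
        return hourly * 24 * 30;
    }

    public static void main(String[] args) {
        // 1 GB x 10 units -> ~$0.17/hour -> ~$121.50/month, before any invocations
        System.out.printf("Monthly PC cost: $%.2f%n", monthlyCostUsd(1.0, 10));
    }
}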

    Advanced Pattern: Dynamic Scaling with Application Auto Scaling

    For workloads with predictable traffic patterns (e.g., high traffic during business hours), a static PC value is inefficient. You can use Application Auto Scaling to dynamically adjust PC, optimizing cost.

    Here's a Terraform implementation for scaling based on a schedule:

    hcl
    # auto_scaling.tf
    
    resource "aws_appautoscaling_target" "lambda_pc_target" {
      max_capacity       = 50
      min_capacity       = 5
      resource_id        = "function:${aws_lambda_alias.live.function_name}:${aws_lambda_alias.live.name}"
      scalable_dimension = "lambda:function:ProvisionedConcurrency"
      service_namespace  = "lambda"
    }
    
    # Scale up for business hours (9 AM UTC)
    resource "aws_appautoscaling_scheduled_action" "scale_up" {
      name               = "scale-up-weekdays"
      service_namespace  = aws_appautoscaling_target.lambda_pc_target.service_namespace
      resource_id        = aws_appautoscaling_target.lambda_pc_target.resource_id
      scalable_dimension = aws_appautoscaling_target.lambda_pc_target.scalable_dimension
  schedule           = "cron(0 9 ? * MON-FRI *)" # Weekdays at 9 AM UTC
      scalable_target_action {
        min_capacity = 20
        max_capacity = 50
      }
    }
    
    # Scale down for off-peak hours (5 PM UTC)
    resource "aws_appautoscaling_scheduled_action" "scale_down" {
      name               = "scale-down-weekdays"
      service_namespace  = aws_appautoscaling_target.lambda_pc_target.service_namespace
      resource_id        = aws_appautoscaling_target.lambda_pc_target.resource_id
      scalable_dimension = aws_appautoscaling_target.lambda_pc_target.scalable_dimension
  schedule           = "cron(0 17 ? * MON-FRI *)" # Weekdays at 5 PM UTC
      scalable_target_action {
        min_capacity = 5
        max_capacity = 50
      }
    }

    Strategy 2: Lambda SnapStart - The Intelligent Snapshot

Introduced at re:Invent 2022, SnapStart is a fundamentally different approach, initially available only for Java runtimes. Instead of keeping environments constantly running, SnapStart dramatically speeds up the initialization process by leveraging Firecracker's microVM snapshotting capabilities. It's an opt-in feature that changes the Lambda lifecycle.

    The Underlying Mechanism: Snapshot and Resume

    SnapStart splits the function lifecycle into two distinct phases:

  • Deployment (Snapshot Phase): When you publish a new version of a SnapStart-enabled function, Lambda executes the entire initialization phase once. It starts the JVM, loads all classes, and runs your framework's bootstrap code. Just before your handler would be invoked, Lambda pauses the environment and takes a full snapshot of the memory and disk state of the microVM. This encrypted snapshot is then cached and optimized for fast resumption.
  • Invocation (Resume Phase): When a new request arrives that requires a new environment, Lambda doesn't start from scratch. It finds a cached snapshot, loads it into a Firecracker microVM, and resumes execution from the point where the snapshot was taken. This entire process bypasses the costly JVM and framework initialization, reducing the startup time by up to 90%.
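You can verify which path an environment actually took: for SnapStart resumes, the AWS_LAMBDA_INITIALIZATION_TYPE environment variable mentioned earlier is set to "snap-start". A tiny sketch you might call from init code or a handler:

java
public class InitTypeLogger {

    /** Logs whether this environment is a snapshot resume, PC, or a plain cold start. */
    public static void logInitializationType() {
        // Documented values: "on-demand", "provisioned-concurrency", "snap-start".
        String initType = System.getenv("AWS_LAMBDA_INITIALIZATION_TYPE");
        System.out.println("Initialization type: " + initType);
    }
}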
Production Implementation with Terraform

Enabling SnapStart is remarkably simple at the infrastructure level. It's a single configuration block on the Lambda resource. Crucially, SnapStart only works on published versions and aliases, not on the unpublished $LATEST version.

    hcl
    # main.tf for SnapStart
    
    resource "aws_lambda_function" "java_api_snapstart" {
      function_name = "MyFastJavaAPISnapStart"
      role          = aws_iam_role.lambda_exec.arn
      handler       = "com.example.StreamLambdaHandler::handleRequest"
      runtime       = "java17"
      memory_size   = 1024
      timeout       = 30
    
      filename         = "target/my-app-1.0.0-aws.jar"
      source_code_hash = filebase64sha256("target/my-app-1.0.0-aws.jar")
    
      publish = true
    
      # Enable SnapStart
      snap_start {
        apply_on = "PublishedVersions"
      }
    }
    
    resource "aws_lambda_alias" "live_snapstart" {
      name             = "live"
      function_name    = aws_lambda_function.java_api_snapstart.function_name
      function_version = aws_lambda_function.java_api_snapstart.version
    }
    

    Performance & Cost Analysis

    Performance: SnapStart delivers dramatic improvements. A 6-second cold start can often be reduced to 400-600ms. While not as instantaneous as a fully warm PC environment (which might be <50ms), it's a massive reduction that makes Java viable for a much wider range of synchronous use cases. The resume-from-snapshot process itself has a small overhead.

    Cost Model: This is SnapStart's killer feature: it is free. There are no additional charges for enabling SnapStart. You pay the standard per-request and per-GB-second execution costs. This fundamentally alters the cost-performance trade-off for serverless Java.

    Advanced Considerations & Edge Cases: The Uniqueness Constraint

    SnapStart's power comes with a critical caveat: the state captured in the snapshot is reused for every resumed environment. This can lead to subtle, hard-to-debug issues if your initialization code generates state that must be unique per execution environment. This is known as the "uniqueness constraint."

    Edge Case 1: Cryptographic Randomness

    If you initialize a java.security.SecureRandom instance during the init phase, the initial seed will be part of the snapshot. Every Lambda environment resumed from that snapshot will start with the exact same seed, producing the same sequence of "random" numbers. This is a severe security vulnerability.

    Solution: CRaC Hooks

Coordinated Restore at Checkpoint (CRaC) is an open-source OpenJDK project, originated by Azul, that defines a simple API for responding to snapshot/restore events. The Lambda Java runtime supports these hooks via the org.crac package.

    You implement the org.crac.Resource interface and register it with the global context. The afterRestore method is invoked immediately after the environment is resumed from a snapshot, giving you a chance to fix non-unique state.

    java
    import org.crac.Context;
    import org.crac.Core;
    import org.crac.Resource;
    import java.security.SecureRandom;
    
    // Assuming this is part of a dependency injection managed bean or handler
    public class CryptographyService implements Resource {
    
        private SecureRandom secureRandom;
    
        public CryptographyService() {
            // Initial setup
            this.secureRandom = new SecureRandom();
            System.out.println("CryptographyService initialized with random: " + secureRandom.toString());
            
            // Register this instance to receive CRaC events
            Core.getGlobalContext().register(this);
        }
    
        @Override
        public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
            // Called before the snapshot is taken. Can be used for cleanup.
            System.out.println("CRaC: beforeCheckpoint hook triggered.");
        }
    
        @Override
        public void afterRestore(Context<? extends Resource> context) throws Exception {
            // Called after restoring from a snapshot. THIS IS THE CRITICAL PART.
            System.out.println("CRaC: afterRestore hook triggered. Re-seeding SecureRandom.");
            // Create a new instance or re-seed to ensure uniqueness
            this.secureRandom = new SecureRandom(); 
        }
    
        public byte[] generateRandomBytes(int length) {
            byte[] bytes = new byte[length];
            this.secureRandom.nextBytes(bytes);
            return bytes;
        }
    }

    To use this, you need the CRaC dependency:

    xml
    <dependency>
        <groupId>io.github.crac</groupId>
        <artifactId>crac</artifactId>
        <version>1.4.0</version>
    </dependency>

    Edge Case 2: Network Connections (Database Pools)

    Any network connections (e.g., to a PostgreSQL or MySQL database) established during the init phase will be captured in the snapshot. When the environment is resumed minutes or hours later, these connections will be stale, closed by a firewall, or otherwise invalid, leading to runtime errors.

    Solution: Lazy Initialization or CRaC Hooks

• Lazy Initialization (Simple): Don't initialize the connection pool in the constructor. Initialize it on the first invocation within the handler method. This is simple but adds latency to the first request in a new environment (a minimal sketch appears after the CRaC example below).
• CRaC Hooks (Advanced & Recommended): The ideal pattern is to initialize the connection pool during the init phase to get the performance benefit, but use CRaC hooks to manage its lifecycle.

Here's a conceptual example with a HikariCP connection pool in a Spring Boot application. The manager owns the pool and rebuilds it from the retained configuration after a restore, since a closed HikariDataSource cannot be reopened:

    java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;

@Component
public class SnapstartDatabaseConnectionManager implements Resource {

    private final HikariConfig config;
    private volatile HikariDataSource dataSource;

    public SnapstartDatabaseConnectionManager(HikariConfig config) {
        this.config = config;
        // We don't register in the constructor because Spring needs to fully initialize the bean first.
    }

    @PostConstruct
    public void start() {
        this.dataSource = new HikariDataSource(config);
        System.out.println("Registering DB connection manager with CRaC context.");
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        if (dataSource != null && !dataSource.isClosed()) {
            System.out.println("CRaC: Closing database connection pool before checkpoint.");
            // Ensure the snapshot contains no open TCP connections.
            dataSource.close();
        }
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        System.out.println("CRaC: Re-creating database connection pool after restore.");
        // A closed HikariDataSource cannot be restarted, so build a fresh pool
        // from the configuration that was captured in the snapshot.
        this.dataSource = new HikariDataSource(config);
    }

    public HikariDataSource getDataSource() {
        return dataSource;
    }
}

    This pattern ensures that you take the snapshot with a clean slate (no active connections) and re-establish fresh connections upon restoration.
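For completeness, here is the lazy-initialization alternative from the list above: a minimal sketch in which nothing connects during init, so the snapshot never contains a live connection. The environment-variable names are illustrative:

java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public final class LazyDataSourceHolder {

    private static volatile HikariDataSource dataSource;

    private LazyDataSourceHolder() {}

    /** First caller in each (resumed) environment pays the pool start-up cost. */
    public static HikariDataSource getDataSource() {
        if (dataSource == null) {
            synchronized (LazyDataSourceHolder.class) {
                if (dataSource == null) {
                    HikariConfig config = new HikariConfig();
                    config.setJdbcUrl(System.getenv("JDBC_URL"));      // illustrative
                    config.setUsername(System.getenv("DB_USER"));      // illustrative
                    config.setPassword(System.getenv("DB_PASSWORD"));  // illustrative
                    dataSource = new HikariDataSource(config);
                }
            }
        }
        return dataSource;
    }
}

The trade-off is exactly as described above: the snapshot is always safe, but the first request in every resumed environment absorbs the connection-establishment latency.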

    Head-to-Head Comparison & Decision Framework

| Feature | Provisioned Concurrency (PC) | Lambda SnapStart |
|---|---|---|
| Performance | Best-in-class. P99 latency is identical to a warm start (<50ms). | Excellent. P99 latency reduced by up to 90% (e.g., 6s -> 500ms). |
| Cost | Expensive. Billed for idle capacity 24/7. | Free. No additional cost beyond standard Lambda pricing. |
| Spillover Behavior | Requests exceeding the provisioned limit suffer a full cold start. | No spillover concept; all new environments benefit from SnapStart. |
| Implementation | Infrastructure-only change (Terraform/SAM). | Infrastructure change, plus code changes (CRaC hooks) for stateful apps. |
| Code Impact | None. Your application code is unaware of PC. | Significant. Requires careful handling of randomness, network connections, etc. |
| Deployment | Fast. Simply updates the alias configuration. | Slower. Adds a snapshotting step to the version publishing process. |
| Predictability | Highly predictable latency as long as you stay within capacity. | Highly predictable, but with a slightly higher baseline latency than PC. |
| Supported Runtimes | All runtimes. | Java runtimes only (as of early 2024). |

    The Senior Engineer's Decision Framework

    Use this framework to make a production-ready decision:

1. Is your application written in a supported Java runtime?
   • No: Your only option is Provisioned Concurrency.
   • Yes: Proceed.

2. What is your primary constraint: absolute lowest latency or budget?
   • Budget: SnapStart is the clear winner. Its performance is excellent for the vast majority of use cases at zero additional cost.
   • Absolute lowest latency: If you are building a high-frequency trading system or an ad-bidding platform where every millisecond counts, PC's guarantee of warm-start performance might be worth the cost.

3. Can your application code be made "snapshot-safe"?
   • Yes: You are comfortable implementing CRaC hooks and auditing your code for uniqueness constraints. -> Choose SnapStart.
   • No: The application is a legacy monolith, has complex native dependencies, or the engineering effort to refactor is too high. -> Choose Provisioned Concurrency as a safer, albeit more expensive, option.

4. What does your traffic pattern look like?
   • Spiky & unpredictable: SnapStart excels here. It handles sudden bursts of traffic gracefully without requiring you to pay for idle capacity.
   • Predictable & sustained: PC with Application Auto Scaling can be very cost-effective and performant, matching capacity precisely to your known traffic patterns.

    Conclusion: A New Default for Serverless Java

    For years, Provisioned Concurrency was the only tool available to tame the Java cold start beast, forcing teams into a difficult trade-off between performance and cost. Lambda SnapStart has fundamentally changed this equation.

    For the vast majority of new, latency-sensitive serverless Java applications, SnapStart should be the default starting point. Its combination of massive performance improvement and zero cost is a game-changer. The engineering investment required to implement CRaC hooks and ensure snapshot safety is a one-time effort that pays long-term dividends in both performance and cost savings.

    Provisioned Concurrency remains a valid and powerful tool, but its role has become more niche. It is best reserved for applications with ultra-low latency requirements (where the ~400ms overhead of SnapStart is still too high), applications with unpredictable state that cannot be refactored, or workloads running on non-Java runtimes. By understanding the deep technical trade-offs presented here, you can now make an informed, architectural decision that best fits the unique constraints of your system.
