AWS Lambda Cold Starts: Provisioned Concurrency vs. SnapStart for Java

19 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Unforgiving Latency: Deconstructing the Java Lambda Cold Start

For senior engineers building serverless systems, the term "cold start" is a familiar adversary. While present in all Lambda runtimes, it manifests as a particularly stubborn performance bottleneck in the Java ecosystem. Before we dissect the advanced mitigation strategies, it's crucial to move beyond the high-level understanding and analyze the precise components contributing to this latency. A standard cold start isn't a monolithic event; it's a sequence of time-consuming operations:

  • Execution Environment Provisioning: AWS allocates resources, downloads the function code, and starts the microVM (Firecracker).
  • Runtime Bootstrap: The Java Virtual Machine (JVM) itself is initialized. This is a non-trivial step involving loading core libraries, setting up memory heaps, and initializing internal structures.
  • Static Initialization & Class Loading: Your application code's static initializers are executed. The JVM's classloader finds and loads necessary classes from your JAR file. With large frameworks like Spring or Quarkus, this can involve scanning the classpath and loading thousands of classes.
  • Framework Initialization (The Great Bottleneck): Dependency Injection (DI) containers are built. Spring's ApplicationContext is refreshed, beans are instantiated, proxies are created, and component scans are performed. This is often the single largest contributor to Java's cold start latency.
  • Handler Execution: Finally, your actual handler method is invoked.
Here's a conceptual breakdown of where the time is spent in a typical Spring Boot application's cold start:

    plaintext
|----------------|-----------------|---------------------------------------|--------------|
| Env Provision  |   JVM Startup   |      Application Initialization       |    Invoke    |
| (AWS Internal) | (e.g., 200ms)   | (e.g., Spring Context, DB Pool, etc.) | (e.g., 50ms) |
|                |                 |         (CAN BE 2-10+ SECONDS)        |              |
|----------------|-----------------|---------------------------------------|--------------|

    A P99 cold start latency of 5-8 seconds for a non-trivial Java Lambda is not uncommon. For latency-sensitive APIs or synchronous data processing pipelines, this is unacceptable. To address this, AWS has provided two powerful, yet fundamentally different, solutions: Provisioned Concurrency (PC) and Lambda SnapStart. This article provides a deep, comparative analysis to help you architect the right solution for your production workload.
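To make these phases concrete, here is a minimal, hypothetical handler that logs where time is spent. The class name and timings are illustrative, not from a real workload; everything outside handleRequest runs during the init phase:

java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class ColdStartTimingHandler implements RequestHandler<String, String> {

    // Captured while the class is loaded, i.e., during the Init phase.
    private static final long CLASS_LOADED_AT = System.currentTimeMillis();

    static {
        // Heavy framework bootstrap (Spring context, DB pools, etc.) would run here.
        System.out.println("Static init started at " + CLASS_LOADED_AT);
    }

    public ColdStartTimingHandler() {
        // Constructors also run during Init, before the first invocation.
        System.out.println("Constructor finished after "
                + (System.currentTimeMillis() - CLASS_LOADED_AT) + " ms");
    }

    @Override
    public String handleRequest(String input, Context context) {
        // The Invoke phase: the only code that runs on warm invocations.
        System.out.println("Invoked after "
                + (System.currentTimeMillis() - CLASS_LOADED_AT) + " ms");
        return "ok";
    }
}

In production, the Init Duration field on the REPORT line in CloudWatch Logs is the simplest way to measure steps 2-4 for cold invocations.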

    Strategy 1: Provisioned Concurrency (PC) - The Brute Force Guarantee

    Provisioned Concurrency is the original solution to the cold start problem. Its philosophy is simple: don't have a cold start by ensuring the environment is already initialized before the request arrives. It achieves this by pre-emptively executing the entire initialization phase (steps 1-4 from our list above) for a specified number of concurrent environments and holding them in a "hot" state, ready to immediately execute the handler method.

    The Underlying Mechanism

    When you configure PC, Lambda does the following:

• It immediately provisions and initializes the number of execution environments you requested.
• It runs your function's initialization code—everything outside the handler method—including JVM startup and framework bootstrapping.
• These environments are now "primed." When an invocation arrives that is routed to a provisioned environment, Lambda skips the entire init phase and proceeds directly to the Invoke phase.
• This effectively transforms a cold start into a warm start, providing the most predictable, low-latency performance possible.

    Production Implementation with Terraform

    Configuring PC is an infrastructure concern. Here is a production-grade example using Terraform to configure a Lambda function with 10 units of provisioned concurrency, tied to a specific alias. Using an alias is critical for blue/green deployments.

    hcl
    # main.tf
    
    resource "aws_lambda_function" "java_api" {
      function_name = "MyLatencySensitiveJavaAPI"
      role          = aws_iam_role.lambda_exec.arn
      handler       = "com.example.StreamLambdaHandler::handleRequest"
      runtime       = "java17"
      memory_size   = 1024
      timeout       = 30
    
      filename         = "target/my-app-1.0.0-aws.jar"
      source_code_hash = filebase64sha256("target/my-app-1.0.0-aws.jar")
    
      # We publish a new version on every code change to enable aliases
      publish = true
    }
    
    resource "aws_lambda_alias" "live" {
      name             = "live"
      function_name    = aws_lambda_function.java_api.function_name
      function_version = aws_lambda_function.java_api.version
    }
    
    resource "aws_lambda_provisioned_concurrency_config" "api_pc" {
      function_name                     = aws_lambda_alias.live.function_name
      provisioned_concurrent_executions = 10
      qualifier                         = aws_lambda_alias.live.name
    
      # Ensure PC is configured only after the alias is pointing to the new version
      depends_on = [aws_lambda_alias.live]
    }
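After a deployment, provisioned environments take time to initialize, so in a blue/green rollout you should confirm allocation before shifting traffic. Here is a minimal sketch using the AWS SDK for Java v2 to poll the allocation status; the function and alias names match the Terraform above:

java
import software.amazon.awssdk.services.lambda.LambdaClient;
import software.amazon.awssdk.services.lambda.model.GetProvisionedConcurrencyConfigRequest;
import software.amazon.awssdk.services.lambda.model.GetProvisionedConcurrencyConfigResponse;

public class ProvisionedConcurrencyCheck {
    public static void main(String[] args) throws InterruptedException {
        try (LambdaClient lambda = LambdaClient.create()) {
            GetProvisionedConcurrencyConfigRequest request =
                    GetProvisionedConcurrencyConfigRequest.builder()
                            .functionName("MyLatencySensitiveJavaAPI")
                            .qualifier("live")
                            .build();

            // Poll until all requested environments have been initialized.
            while (true) {
                GetProvisionedConcurrencyConfigResponse response =
                        lambda.getProvisionedConcurrencyConfig(request);
                System.out.printf("Status: %s (%d/%d ready)%n",
                        response.statusAsString(),
                        response.availableProvisionedConcurrentExecutions(),
                        response.requestedProvisionedConcurrentExecutions());
                if ("READY".equals(response.statusAsString())) {
                    break;
                }
                Thread.sleep(5_000);
            }
        }
    }
}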

    Performance & Cost Analysis

    Performance: With PC, the P99 latency for requests within the provisioned limit is virtually identical to a warm start. You can expect consistent sub-50ms invocation times (excluding your business logic's execution time). However, a critical edge case is concurrency spillover. If you receive 11 concurrent requests for your function configured with provisioned_concurrent_executions = 10, the 11th request will experience a full cold start. This makes accurate capacity planning essential.
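You can quantify spillover from inside the function itself: a new execution environment runs its first invocation exactly once, and Lambda sets the AWS_LAMBDA_INITIALIZATION_TYPE environment variable to "provisioned-concurrency" in pre-warmed environments and "on-demand" otherwise. A minimal sketch combining the two to flag requests that paid the full cold-start penalty:

java
public final class ColdStartDetector {

    // Static state persists across invocations within one execution environment.
    private static boolean firstInvocation = true;

    // "provisioned-concurrency" for pre-warmed environments, "on-demand" otherwise.
    private static final String INIT_TYPE =
            System.getenv().getOrDefault("AWS_LAMBDA_INITIALIZATION_TYPE", "on-demand");

    private ColdStartDetector() {}

    /** Call once per invocation; true means this request hit a spillover cold start. */
    public static boolean isSpilloverColdStart() {
        boolean first = firstInvocation;
        firstInvocation = false;
        return first && "on-demand".equals(INIT_TYPE);
    }
}

Logging or emitting a metric when this returns true tells you how often real traffic exceeds provisioned_concurrent_executions, which feeds directly into capacity planning.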

    Cost Model: PC's power comes at a significant cost. You are billed for two components:

• Provisioned Concurrency Cost: You pay for the amount of concurrency you provision, for the time it is provisioned, regardless of whether it's used. The price is (Memory in GB) × (Price per GB-second) × (Number of Concurrent Environments) × (Time in seconds). This is like paying for an idle EC2 instance.
• Invocation Cost: You still pay the standard per-request and per-GB-second execution cost when the function is invoked.

Example Calculation (us-east-1):

    • Lambda Memory: 1024 MB (1 GB)
    • Provisioned Concurrency: 10 units
    • PC Price: ~$0.0000046875 per GB-second

Hourly PC Cost = 1 GB × 10 units × 3600 seconds/hour × $0.0000046875/GB-sec = $0.16875 per hour

Monthly PC Cost = $0.16875 × 24 × 30 = ~$121.50 per month (This is before a single request is processed).
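For capacity planning it helps to parameterize this calculation. A small sketch using the figures above (the rate constant is the approximate us-east-1 price quoted earlier, not an authoritative value):

java
public class ProvisionedConcurrencyCost {

    // Approximate us-east-1 Provisioned Concurrency rate quoted above.
    private static final double PRICE_PER_GB_SECOND = 0.0000046875;

    /** Monthly cost of holding `units` environments of `memoryGb` each, 24/7. */
    public static double monthlyCostUsd(double memoryGb, int units) {
        double hourly = memoryGb * units * 3600 * PRICE_PER_GB_SECOND;
        return hourly * 24 * 30;
    }

    public static void main(String[] args) {
        // 1 GB x 10 units -> ~$0.17/hour -> ~$121.50/month, before any invocations
        System.out.printf("Monthly PC cost: $%.2f%n", monthlyCostUsd(1.0, 10));
    }
}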

    Advanced Pattern: Dynamic Scaling with Application Auto Scaling

    For workloads with predictable traffic patterns (e.g., high traffic during business hours), a static PC value is inefficient. You can use Application Auto Scaling to dynamically adjust PC, optimizing cost.

    Here's a Terraform implementation for scaling based on a schedule:

    hcl
    # auto_scaling.tf
    
    resource "aws_appautoscaling_target" "lambda_pc_target" {
      max_capacity       = 50
      min_capacity       = 5
      resource_id        = "function:${aws_lambda_alias.live.function_name}:${aws_lambda_alias.live.name}"
      scalable_dimension = "lambda:function:ProvisionedConcurrency"
      service_namespace  = "lambda"
    }
    
    # Scale up for business hours (9 AM UTC)
    resource "aws_appautoscaling_scheduled_action" "scale_up" {
      name               = "scale-up-weekdays"
      service_namespace  = aws_appautoscaling_target.lambda_pc_target.service_namespace
      resource_id        = aws_appautoscaling_target.lambda_pc_target.resource_id
      scalable_dimension = aws_appautoscaling_target.lambda_pc_target.scalable_dimension
  schedule           = "cron(0 9 ? * MON-FRI *)" # Weekdays at 9 AM UTC
      scalable_target_action {
        min_capacity = 20
        max_capacity = 50
      }
    }
    
    # Scale down for off-peak hours (5 PM UTC)
    resource "aws_appautoscaling_scheduled_action" "scale_down" {
      name               = "scale-down-weekdays"
      service_namespace  = aws_appautoscaling_target.lambda_pc_target.service_namespace
      resource_id        = aws_appautoscaling_target.lambda_pc_target.resource_id
      scalable_dimension = aws_appautoscaling_target.lambda_pc_target.scalable_dimension
  schedule           = "cron(0 17 ? * MON-FRI *)" # Weekdays at 5 PM UTC
      scalable_target_action {
        min_capacity = 5
        max_capacity = 50
      }
    }

    Strategy 2: Lambda SnapStart - The Intelligent Snapshot

Introduced at re:Invent 2022, SnapStart is a fundamentally different approach, initially available only for Java runtimes. Instead of keeping environments constantly running, SnapStart dramatically speeds up the initialization process by leveraging Firecracker's microVM snapshotting capabilities. It's an opt-in feature that changes the Lambda lifecycle.

    The Underlying Mechanism: Snapshot and Resume

    SnapStart splits the function lifecycle into two distinct phases:

  • Deployment (Snapshot Phase): When you publish a new version of a SnapStart-enabled function, Lambda executes the entire initialization phase once. It starts the JVM, loads all classes, and runs your framework's bootstrap code. Just before your handler would be invoked, Lambda pauses the environment and takes a full snapshot of the memory and disk state of the microVM. This encrypted snapshot is then cached and optimized for fast resumption.
  • Invocation (Resume Phase): When a new request arrives that requires a new environment, Lambda doesn't start from scratch. It finds a cached snapshot, loads it into a Firecracker microVM, and resumes execution from the point where the snapshot was taken. This entire process bypasses the costly JVM and framework initialization, reducing the startup time by up to 90%.
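You can verify which path an environment actually took: for SnapStart resumes, the AWS_LAMBDA_INITIALIZATION_TYPE environment variable mentioned earlier is set to "snap-start". A tiny sketch you might call from init code or a handler:

java
public class InitTypeLogger {

    /** Logs whether this environment is a snapshot resume, PC, or a plain cold start. */
    public static void logInitializationType() {
        // Documented values: "on-demand", "provisioned-concurrency", "snap-start".
        String initType = System.getenv("AWS_LAMBDA_INITIALIZATION_TYPE");
        System.out.println("Initialization type: " + initType);
    }
}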
Production Implementation with Terraform

Enabling SnapStart is remarkably simple at the infrastructure level. It's a single configuration block on the Lambda resource. Crucially, SnapStart only works on published versions and aliases, not on the unpublished $LATEST version.

    hcl
    # main.tf for SnapStart
    
    resource "aws_lambda_function" "java_api_snapstart" {
      function_name = "MyFastJavaAPISnapStart"
      role          = aws_iam_role.lambda_exec.arn
      handler       = "com.example.StreamLambdaHandler::handleRequest"
      runtime       = "java17"
      memory_size   = 1024
      timeout       = 30
    
      filename         = "target/my-app-1.0.0-aws.jar"
      source_code_hash = filebase64sha256("target/my-app-1.0.0-aws.jar")
    
      publish = true
    
      # Enable SnapStart
      snap_start {
        apply_on = "PublishedVersions"
      }
    }
    
    resource "aws_lambda_alias" "live_snapstart" {
      name             = "live"
      function_name    = aws_lambda_function.java_api_snapstart.function_name
      function_version = aws_lambda_function.java_api_snapstart.version
    }
    

    Performance & Cost Analysis

    Performance: SnapStart delivers dramatic improvements. A 6-second cold start can often be reduced to 400-600ms. While not as instantaneous as a fully warm PC environment (which might be <50ms), it's a massive reduction that makes Java viable for a much wider range of synchronous use cases. The resume-from-snapshot process itself has a small overhead.

    Cost Model: This is SnapStart's killer feature: it is free. There are no additional charges for enabling SnapStart. You pay the standard per-request and per-GB-second execution costs. This fundamentally alters the cost-performance trade-off for serverless Java.

    Advanced Considerations & Edge Cases: The Uniqueness Constraint

    SnapStart's power comes with a critical caveat: the state captured in the snapshot is reused for every resumed environment. This can lead to subtle, hard-to-debug issues if your initialization code generates state that must be unique per execution environment. This is known as the "uniqueness constraint."

    Edge Case 1: Cryptographic Randomness

    If you initialize a java.security.SecureRandom instance during the init phase, the initial seed will be part of the snapshot. Every Lambda environment resumed from that snapshot will start with the exact same seed, producing the same sequence of "random" numbers. This is a severe security vulnerability.

    Solution: CRaC Hooks

Coordinated Restore at Checkpoint (CRaC) is an open-source OpenJDK project, originated by Azul, that defines a simple API for responding to snapshot/restore events. The Lambda Java runtime supports these hooks via the org.crac package.

    You implement the org.crac.Resource interface and register it with the global context. The afterRestore method is invoked immediately after the environment is resumed from a snapshot, giving you a chance to fix non-unique state.

    java
    import org.crac.Context;
    import org.crac.Core;
    import org.crac.Resource;
    import java.security.SecureRandom;
    
    // Assuming this is part of a dependency injection managed bean or handler
    public class CryptographyService implements Resource {
    
        private SecureRandom secureRandom;
    
        public CryptographyService() {
            // Initial setup
            this.secureRandom = new SecureRandom();
            System.out.println("CryptographyService initialized with random: " + secureRandom.toString());
            
            // Register this instance to receive CRaC events
            Core.getGlobalContext().register(this);
        }
    
        @Override
        public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
            // Called before the snapshot is taken. Can be used for cleanup.
            System.out.println("CRaC: beforeCheckpoint hook triggered.");
        }
    
        @Override
        public void afterRestore(Context<? extends Resource> context) throws Exception {
            // Called after restoring from a snapshot. THIS IS THE CRITICAL PART.
            System.out.println("CRaC: afterRestore hook triggered. Re-seeding SecureRandom.");
            // Create a new instance or re-seed to ensure uniqueness
            this.secureRandom = new SecureRandom(); 
        }
    
        public byte[] generateRandomBytes(int length) {
            byte[] bytes = new byte[length];
            this.secureRandom.nextBytes(bytes);
            return bytes;
        }
    }

    To use this, you need the CRaC dependency:

    xml
    <dependency>
        <groupId>io.github.crac</groupId>
        <artifactId>crac</artifactId>
        <version>1.4.0</version>
    </dependency>

    Edge Case 2: Network Connections (Database Pools)

    Any network connections (e.g., to a PostgreSQL or MySQL database) established during the init phase will be captured in the snapshot. When the environment is resumed minutes or hours later, these connections will be stale, closed by a firewall, or otherwise invalid, leading to runtime errors.

    Solution: Lazy Initialization or CRaC Hooks

• Lazy Initialization (Simple): Don't initialize the connection pool in the constructor. Initialize it on the first invocation within the handler method. This is simple but adds latency to the first request in a new environment (a minimal sketch appears after the CRaC example below).
• CRaC Hooks (Advanced & Recommended): The ideal pattern is to initialize the connection pool during the init phase to get the performance benefit, but use CRaC hooks to manage its lifecycle.

Here's a conceptual example with a HikariCP connection pool in a Spring Boot application. The manager owns the pool and rebuilds it from the retained configuration after a restore, since a closed HikariDataSource cannot be reopened:

    java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;

@Component
public class SnapstartDatabaseConnectionManager implements Resource {

    private final HikariConfig config;
    private volatile HikariDataSource dataSource;

    public SnapstartDatabaseConnectionManager(HikariConfig config) {
        this.config = config;
        // We don't register in the constructor because Spring needs to fully initialize the bean first.
    }

    @PostConstruct
    public void start() {
        this.dataSource = new HikariDataSource(config);
        System.out.println("Registering DB connection manager with CRaC context.");
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        if (dataSource != null && !dataSource.isClosed()) {
            System.out.println("CRaC: Closing database connection pool before checkpoint.");
            // Ensure the snapshot contains no open TCP connections.
            dataSource.close();
        }
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        System.out.println("CRaC: Re-creating database connection pool after restore.");
        // A closed HikariDataSource cannot be restarted, so build a fresh pool
        // from the configuration that was captured in the snapshot.
        this.dataSource = new HikariDataSource(config);
    }

    public HikariDataSource getDataSource() {
        return dataSource;
    }
}

    This pattern ensures that you take the snapshot with a clean slate (no active connections) and re-establish fresh connections upon restoration.
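For completeness, here is the lazy-initialization alternative from the list above: a minimal sketch in which nothing connects during init, so the snapshot never contains a live connection. The environment-variable names are illustrative:

java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public final class LazyDataSourceHolder {

    private static volatile HikariDataSource dataSource;

    private LazyDataSourceHolder() {}

    /** First caller in each (resumed) environment pays the pool start-up cost. */
    public static HikariDataSource getDataSource() {
        if (dataSource == null) {
            synchronized (LazyDataSourceHolder.class) {
                if (dataSource == null) {
                    HikariConfig config = new HikariConfig();
                    config.setJdbcUrl(System.getenv("JDBC_URL"));      // illustrative
                    config.setUsername(System.getenv("DB_USER"));      // illustrative
                    config.setPassword(System.getenv("DB_PASSWORD"));  // illustrative
                    dataSource = new HikariDataSource(config);
                }
            }
        }
        return dataSource;
    }
}

The trade-off is exactly as described above: the snapshot is always safe, but the first request in every resumed environment absorbs the connection-establishment latency.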

    Head-to-Head Comparison & Decision Framework

| Feature | Provisioned Concurrency (PC) | Lambda SnapStart |
|---|---|---|
| Performance | Best-in-class. P99 latency is identical to a warm start (<50ms). | Excellent. P99 latency reduced by up to 90% (e.g., 6s -> 500ms). |
| Cost | Expensive. Billed for idle capacity 24/7. | Free. No additional cost beyond standard Lambda pricing. |
| Spillover Behavior | Requests exceeding the provisioned limit suffer a full cold start. | No spillover concept; all new environments benefit from SnapStart. |
| Implementation | Infrastructure-only change (Terraform/SAM). | Infrastructure change, plus code changes (CRaC hooks) for stateful apps. |
| Code Impact | None. Your application code is unaware of PC. | Significant. Requires careful handling of randomness, network connections, etc. |
| Deployment | Fast. Simply updates the alias configuration. | Slower. Adds a snapshotting step to the version publishing process. |
| Predictability | Highly predictable latency as long as you stay within capacity. | Highly predictable, but with a slightly higher baseline latency than PC. |
| Supported Runtimes | All runtimes. | Java runtimes only (as of early 2024). |

    The Senior Engineer's Decision Framework

    Use this framework to make a production-ready decision:

1. Is your application written in a supported Java runtime?
   • No: Your only option is Provisioned Concurrency.
   • Yes: Proceed.

2. What is your primary constraint: absolute lowest latency or budget?
   • Budget: SnapStart is the clear winner. Its performance is excellent for the vast majority of use cases at zero additional cost.
   • Absolute lowest latency: If you are building a high-frequency trading system or an ad-bidding platform where every millisecond counts, PC's guarantee of warm-start performance might be worth the cost.

3. Can your application code be made "snapshot-safe"?
   • Yes: You are comfortable implementing CRaC hooks and auditing your code for uniqueness constraints. -> Choose SnapStart.
   • No: The application is a legacy monolith, has complex native dependencies, or the engineering effort to refactor is too high. -> Choose Provisioned Concurrency as a safer, albeit more expensive, option.

4. What does your traffic pattern look like?
   • Spiky & unpredictable: SnapStart excels here. It handles sudden bursts of traffic gracefully without requiring you to pay for idle capacity.
   • Predictable & sustained: PC with Application Auto Scaling can be very cost-effective and performant, matching capacity precisely to your known traffic patterns.

    Conclusion: A New Default for Serverless Java

    For years, Provisioned Concurrency was the only tool available to tame the Java cold start beast, forcing teams into a difficult trade-off between performance and cost. Lambda SnapStart has fundamentally changed this equation.

    For the vast majority of new, latency-sensitive serverless Java applications, SnapStart should be the default starting point. Its combination of massive performance improvement and zero cost is a game-changer. The engineering investment required to implement CRaC hooks and ensure snapshot safety is a one-time effort that pays long-term dividends in both performance and cost savings.

    Provisioned Concurrency remains a valid and powerful tool, but its role has become more niche. It is best reserved for applications with ultra-low latency requirements (where the ~400ms overhead of SnapStart is still too high), applications with unpredictable state that cannot be refactored, or workloads running on non-Java runtimes. By understanding the deep technical trade-offs presented here, you can now make an informed, architectural decision that best fits the unique constraints of your system.
