JVM on Lambda: SnapStart vs. Provisioned Concurrency Deep Dive
The Unavoidable Tax: Deconstructing the JVM Cold Start
For senior engineers building serverless systems, the term "cold start" is a familiar adversary. While lighter runtimes like Node.js and Python pay a startup penalty too, the Just-In-Time (JIT) compilation and extensive class loading inherent to the Java Virtual Machine (JVM) impose a particularly heavy tax. This isn't a beginner's guide; we assume you're already painfully aware of multi-second P99 latencies on initial invocations. Our goal here is to dissect and compare the two premier, production-grade solutions offered by AWS: Provisioned Concurrency (PC) and Lambda SnapStart.
The choice is not merely about performance; it's a complex trade-off between latency guarantees, cost, architectural complexity, and operational overhead. Let's move past the superficial and into the mechanics.
A typical JVM cold start on Lambda isn't a monolithic event. It's a sequence:
1. Request routing: Lambda receives the invocation and finds no warm execution environment available for it.
2. Environment provisioning: the deployment package is downloaded and a fresh Firecracker microVM is started.
3. Runtime startup and class loading: the JVM boots, then loads and initializes classes; for a framework-heavy application (Spring, for example) this can mean thousands of classes and their static initializers.
4. Application initialization: your init code runs (constructors, dependency injection, configuration fetches, connection pools).
5. Invocation: the handler finally executes, and even then the JIT compiler is still warming up, so the first few invocations remain slower than steady state.
This entire sequence can easily span 5-10 seconds for a non-trivial application, an unacceptable delay for synchronous APIs. PC and SnapStart attack this problem from fundamentally different angles.
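To make those phases concrete, here is a minimal sketch of where each one lands in handler code (the class and field names are illustrative, not from any particular codebase): everything in static initializers and the constructor belongs to steps 3 and 4 and runs once per environment, while only handleRequest runs per invocation.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.util.HashMap;
import java.util.Map;

public class ColdStartAnatomyHandler implements RequestHandler<Map<String, Object>, Map<String, Object>> {

    // Step 3: class loading and static initialization run once, during the init phase.
    private static final Map<String, String> STATIC_CONFIG = loadConfig();

    private final Map<String, String> connectionSettings;

    public ColdStartAnatomyHandler() {
        // Step 4: the constructor (framework bootstrap, dependency injection,
        // connection pools) also runs during the init phase.
        this.connectionSettings = new HashMap<>(STATIC_CONFIG);
    }

    private static Map<String, String> loadConfig() {
        // Stand-in for the real work: reading configuration, building a Spring context, etc.
        return new HashMap<>();
    }

    @Override
    public Map<String, Object> handleRequest(Map<String, Object> input, Context context) {
        // Step 5: only this method runs per invocation, and the first few calls
        // are still slower while the JIT compiles the hot paths.
        Map<String, Object> response = new HashMap<>(connectionSettings);
        response.putAll(input);
        return response;
    }
}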
Strategy 1: Provisioned Concurrency - The Predictable Powerhouse
Provisioned Concurrency (PC) is the more mature of the two solutions. Its premise is straightforward: you instruct AWS to pre-initialize a specified number of execution environments and keep them in a hyper-ready state before any requests arrive. When an invocation can be routed to a provisioned environment, it entirely bypasses steps 2, 3, and 4 of the cold start sequence. The experience is that of a perpetually warm function.
Implementation Patterns and Infrastructure as Code
Configuring PC is primarily an infrastructure concern. You don't change your application code. Here’s a production-grade example using Terraform to configure PC with application auto-scaling.
# main.tf
resource "aws_lambda_function" "payment_processor" {
# ... remaining function configuration (filename or image_uri, environment variables, VPC config, etc.)
function_name = "payment-processor-prod"
role = aws_iam_role.lambda_exec.arn
handler = "com.example.PaymentHandler::handleRequest"
runtime = "java17"
memory_size = 1024
timeout = 30
publish = true # A published version or alias is required for PC
}
resource "aws_lambda_provisioned_concurrency_config" "payment_processor_pc" {
function_name = aws_lambda_function.payment_processor.function_name
provisioned_concurrent_executions = 10 # Baseline number of provisioned environments
qualifier = aws_lambda_function.payment_processor.version
}
# Auto-scaling configuration for PC
resource "aws_appautoscaling_target" "lambda_target" {
max_capacity = 100
min_capacity = 10
resource_id = "function:${aws_lambda_function.payment_processor.function_name}:${aws_lambda_function.payment_processor.version}"
scalable_dimension = "lambda:function:ProvisionedConcurrency"
service_namespace = "lambda"
}
resource "aws_appautoscaling_policy" "lambda_policy" {
name = "ScaleOnProvisionedConcurrencyUtilization"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.lambda_target.resource_id
scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension
service_namespace = aws_appautoscaling_target.lambda_target.service_namespace
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "LambdaProvisionedConcurrencyUtilization"
}
target_value = 0.7 # Target 70% utilization
scale_in_cooldown = 300
scale_out_cooldown = 60
}
}
Key Production Considerations:
* publish = true: PC can only be applied to a specific function version or alias, not $LATEST. This is a critical best practice for production deployments, ensuring stability and enabling canary releases.
* Auto-Scaling: Statically setting provisioned_concurrent_executions is brittle. For any dynamic workload, aws_appautoscaling_target and aws_appautoscaling_policy are non-negotiable. The target_value of 0.7 (70%) is a common starting point, providing a 30% buffer for traffic spikes before the scaling policy reacts (see the Terraform caveat after this list).
* Cooldown Periods: The scale_in_cooldown and scale_out_cooldown values are crucial for preventing thrashing, where the system rapidly scales up and down. A longer scale-in cooldown (e.g., 300 seconds) prevents premature de-provisioning after a brief traffic lull.
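One Terraform-specific caveat to the configuration above: once Application Auto Scaling is managing the provisioned concurrency level, a statically declared aws_lambda_provisioned_concurrency_config for the same qualifier can fight with it, because each terraform apply will try to reset the value the scaler has since changed. Common workarounds are to let the scalable target's min_capacity define the baseline instead, or to add lifecycle { ignore_changes = [provisioned_concurrent_executions] } to the static resource.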
Performance & Cost Implications
Performance: The latency of an invocation served by a provisioned instance is consistently low, typically indistinguishable from a subsequent warm invocation. The cold start is, for all practical purposes, eliminated.
Benchmark Example (P99 Latency):
| Invocation Type | P99 Latency (ms) for a Spring Boot Lambda |
|---|---|
| Standard Cold Start | 8,500 ms |
| Subsequent Warm Start | 150 ms |
| Provisioned Concurrency | 150 ms |
The Cost Model: This performance guarantee comes at a significant cost. You pay for the configured concurrency for the entire duration it is active, in addition to the standard per-request and GB-second fees when it's invoked.
Cost Formula (Simplified): (Provisioned Concurrency × Memory in GB × Hours Active × Price per GB-hour) + (Request Count × Price per Request), plus the standard GB-second duration charges for the invocations themselves.
This model means you are paying for idle capacity. If you provision 50 instances but only receive traffic for 10, you pay for all 50 to be ready.
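As a rough worked example (rates are illustrative; check current regional pricing): keeping 10 environments of 1 GB provisioned for a 30-day month at roughly $0.015 per GB-hour costs about 10 × 1 GB × 720 h × $0.015 ≈ $108 per month in pure readiness cost, before a single request is served.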
Edge Case: Concurrency Spillover
The most critical edge case with PC is spillover. If you receive a burst of traffic that exceeds your currently provisioned level (e.g., 120 concurrent requests for a PC level of 100), the 20 excess requests will be served by standard, on-demand Lambda instances. These 20 requests will incur a full cold start.
This is why the auto-scaling target_value is so important. Setting it too high (e.g., 0.95) leaves little room for error and increases the likelihood of spillover. The business must decide what percentage of requests can tolerate a cold start, which directly informs this configuration.
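To put numbers on it: with a provisioned level of 100 and target_value = 0.7, scaling begins once sustained utilization crosses roughly 70 concurrent executions, but a burst that outpaces the minutes it takes to provision additional environments still spills onto on-demand instances and their cold starts. The ProvisionedConcurrencySpilloverInvocations CloudWatch metric tells you how often that is actually happening.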
Strategy 2: Lambda SnapStart - The State-Snapshotting Savant
SnapStart, available for Java managed runtimes (Java 11 and later), is a more technologically sophisticated solution. Instead of keeping environments running, SnapStart leverages Firecracker's microVM snapshotting capabilities. The process is:
1. When you publish a function version with SnapStart enabled, Lambda runs the full initialization phase once, up front.
2. Lambda then takes a snapshot of the initialized microVM (its memory and disk state), encrypts it, and caches it for low-latency access.
3. When the function is invoked and no warm environment exists, Lambda resumes a fresh environment from the cached snapshot instead of initializing from scratch, then invokes the handler.
Implementation with CRaC Hooks
Enabling SnapStart is a simple configuration change, but using it correctly in a stateful application requires code-level changes using the Coordinated Restore at Checkpoint (CRaC) API. This is where the complexity lies.
Terraform Configuration:
# main.tf
resource "aws_lambda_function" "order_service" {
# ... other function configurations
function_name = "order-service-prod"
publish = true # SnapStart also requires a published version
snap_start {
apply_on = "PublishedVersions"
}
}
# Note: You still need an alias to point to the new version for invocation
resource "aws_lambda_alias" "order_service_live" {
name = "live"
function_name = aws_lambda_function.order_service.function_name
function_version = aws_lambda_function.order_service.version
}
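Once SnapStart is active, invocations that resume from a snapshot report a Restore Duration field in the function's REPORT log line; that is the number to watch when benchmarking the restore overhead.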
The CRaC Challenge: Handling State
Any state established during the initialization phase becomes part of the snapshot. This is particularly problematic for network connections, file handles, or any resource that relies on uniqueness.
Consider a database connection pool like HikariCP. If the pool is created during init, those TCP connections are frozen in the snapshot. When restored, the database server will have long since closed them, leading to errors. CRaC provides hooks to manage this.
Production-Grade Java Example with CRaC:
import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.SQLException;
public class DatabaseHandler implements Resource {
private HikariDataSource dataSource;
public DatabaseHandler() {
// Register this instance with the CRaC Core
Core.getGlobalContext().register(this);
// Initial connection pool setup during init
this.initializeDataSource();
}
private void initializeDataSource() {
// Standard HikariCP configuration
// ...
this.dataSource = new HikariDataSource(/* config */);
}
// This hook is called BY LAMBDA before the snapshot is taken.
@Override
public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
System.out.println("Executing beforeCheckpoint: Closing DB connections...");
if (dataSource != null) {
dataSource.close(); // Gracefully close all connections in the pool
}
}
// This hook is called BY LAMBDA after the snapshot is restored.
@Override
public void afterRestore(Context<? extends Resource> context) throws Exception {
System.out.println("Executing afterRestore: Re-initializing DB connections...");
this.initializeDataSource(); // Re-create the connection pool
}
// Hands out a connection from the (possibly re-created) pool.
public Connection getConnection() throws SQLException {
    return this.dataSource.getConnection();
}
}
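For completeness, a minimal sketch of how this class might be wired into a handler (OrderHandler and the plain String input/output are simplified placeholders): the DatabaseHandler is created during init, so it is registered with CRaC before the snapshot is ever taken.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.sql.Connection;
import java.sql.SQLException;

public class OrderHandler implements RequestHandler<String, String> {

    // Created during init; its CRaC hooks fire around checkpoint and restore.
    private final DatabaseHandler database = new DatabaseHandler();

    @Override
    public String handleRequest(String orderId, Context context) {
        try (Connection connection = database.getConnection()) {
            // Run queries against the freshly restored pool here.
            return connection.isValid(2) ? "processed:" + orderId : "db-unavailable:" + orderId;
        } catch (SQLException e) {
            throw new RuntimeException("Database unavailable", e);
        }
    }
}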
Why This is Critical:
* beforeCheckpoint: You *must* release external resources. Failure to do so results in stale handles post-restore.
* afterRestore: You *must* re-establish those resources. This hook is your new initialization point for every "snap-started" invocation.
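The beforeCheckpoint hook is also the natural place for what AWS's SnapStart guidance calls priming: exercising your hot code paths once before the checkpoint so that the relevant classes are already loaded and initialized (and some JIT compilation has already happened) inside the snapshot. A minimal sketch, with the PrimingResource name and the Runnable-based design being illustrative choices rather than a prescribed API:
import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

public class PrimingResource implements Resource {

    private final Runnable warmUpAction;

    // Pass in any warm-up action, e.g. deserializing a sample payload or running a dummy query.
    public PrimingResource(Runnable warmUpAction) {
        this.warmUpAction = warmUpAction;
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        // Exercise the hot path once so its class loading and initialization
        // are captured in the snapshot rather than paid again after restore.
        warmUpAction.run();
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        // Nothing to re-establish; the primed state lives on in the snapshot.
    }
}
If you prime and tear down connections in the same class, do the priming first so the warm-up still has live resources to work with.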
Performance & Cost Implications
Performance: SnapStart dramatically reduces cold start latency, typically by around 90%. It is not zero, however: resuming the VM from its snapshot still takes time, and whatever work your afterRestore hooks do (re-opening connection pools, refreshing caches) is paid on every restore.
Benchmark Example (P99 Latency):
| Invocation Type | P99 Latency (ms) for a Spring Boot Lambda |
|---|---|
| Standard Cold Start | 8,500 ms |
| Subsequent Warm Start | 150 ms |
| SnapStart "Cold" Start | 750 ms |
The Cost Model: This is SnapStart's killer feature. For Java functions there is no additional charge: you pay standard Lambda pricing, and there is nothing to pay for idle capacity. This makes it incredibly compelling from a financial perspective.
Edge Cases and Gotchas
SnapStart's power comes with sharp edges:
* Uniqueness and randomness: Any unique identifier, random seed, or similar value generated during initialization is frozen into the snapshot and will be identical across every restored environment. Generate such values in the afterRestore hook or within the handler itself.

// BAD: This value will be the same for all snap-started instances
private static final UUID INSTANCE_ID = UUID.randomUUID();

// GOOD: Generate within the handler
public APIGatewayProxyResponseEvent handleRequest(...) {
    UUID invocationId = UUID.randomUUID();
    // ...
}

* Stale timestamps and cached data: If you cache data during init and stamp it with System.currentTimeMillis(), that timestamp will be stale upon restore. Re-validate or refresh such data in the afterRestore hook.
* Ephemeral storage: Anything written to /tmp before the snapshot is taken becomes part of the snapshot. This can be a feature (pre-warming a local cache) or a bug if you expect a clean temporary directory.

Head-to-Head: A Decision Framework
Neither solution is universally superior. The choice requires a clear understanding of your application's specific requirements.
| Feature | Provisioned Concurrency | Lambda SnapStart |
|---|---|---|
| Best-Case Latency | Lowest possible (~warm invoke) | Very low (~10% of cold start), but not zero |
| Cost | High (pay for idle capacity) | No additional cost |
| Implementation | Infrastructure-only change (Terraform/CloudFormation) | Requires code changes (CRaC hooks) for stateful apps |
| Scalability | Limited by provisioned amount + auto-scaling reaction time | Scales instantly like a standard on-demand function |
| Predictability | Highly predictable latency (up to provisioned limit) | Latency is low but can vary slightly based on snapshot size |
| State Management | No special considerations needed | Complex; requires careful handling of connections/randomness |
Use Case Analysis
Choose Provisioned Concurrency when:
* Hard Latency SLOs: Your service absolutely cannot tolerate even a sub-second delay for any request (e.g., ad bidding, real-time financial transaction processing).
* Predictable Traffic: You have a stable, predictable traffic pattern where you can confidently provision capacity without excessive waste.
* Cost is Secondary to Performance: The business value of consistent, ultra-low latency outweighs the infrastructure cost.
* Legacy/Complex Codebase: You cannot easily refactor the application to be snapshot-safe with CRaC hooks.
Choose Lambda SnapStart when:
* Cost is a Primary Driver: You want to eliminate cold starts without a significant increase in your AWS bill.
* Spiky or Unpredictable Traffic: Your workload experiences sudden bursts of traffic that would be difficult and expensive to handle with PC auto-scaling.
* Latency Tolerance: A latency of 500-800ms on initial load is acceptable, and a massive improvement over 8 seconds is a huge win (e.g., internal admin panels, asynchronous processing jobs, user-facing but non-critical APIs).
* Greenfield or Modern Java Apps: You have control over the codebase and can properly implement the CRaC Resource interface to manage state.
The Hybrid Approach: The Best of Both Worlds
For the most demanding applications, a hybrid strategy is tempting: Provisioned Concurrency for the predictable base load, SnapStart for everything that spills over. Be aware, however, that AWS currently lists provisioned concurrency among the features that cannot be combined with SnapStart on the same function version, so you cannot simply enable both on one alias.
The hybrid therefore has to be assembled at the architecture level: keep Provisioned Concurrency on the functions (or aliases) that serve your latency-critical, predictable traffic, and enable SnapStart on separate functions that absorb spiky or less latency-sensitive workloads. You still get guaranteed performance where it matters most and cost-effective, dramatically improved scaling for bursts; the combination simply lives across function boundaries rather than on a single version.
Conclusion
Mitigating JVM cold starts on AWS Lambda has evolved from hacky keep-alive functions to sophisticated, first-class platform features. Provisioned Concurrency offers the ultimate performance guarantee at a premium price, best suited for predictable workloads with stringent latency requirements. Lambda SnapStart presents a revolutionary, cost-effective alternative that drastically reduces latency for the majority of use cases, provided you are willing to invest the engineering effort to make your application snapshot-aware.
The decision rests on a thorough analysis of your service's specific non-functional requirements. By understanding the deep mechanics, edge cases, and cost models of each, you can make an informed architectural choice that balances performance and financial prudence in a way that was previously impossible in the serverless JVM ecosystem.