CockroachDB Geo-Partitioning for Global Low-Latency Reads

Goh Ling Yong

The Achilles' Heel of Global Applications: Data Latency

As senior engineers, we've all faced the immutable constraint of physics in distributed systems: the speed of light. Building a globally available application is one thing; building one that feels fast for every user, whether in Tokyo, Frankfurt, or São Paulo, is another challenge entirely. Traditional database architectures force a difficult trade-off. A centralized database in us-east-1 serves New York users with sub-10ms latency but imposes a punishing 150-200ms penalty on users in Sydney. Read replicas can mitigate this for read-heavy workloads, but they introduce eventual consistency, which is often a non-starter for transactional applications, and they do nothing to solve write latency.

This is where distributed SQL databases like CockroachDB change the game. However, simply deploying a multi-region CockroachDB cluster isn't a silver bullet. Without a deliberate data topology strategy, your data can still be scattered across the globe, and the query planner may need to perform expensive cross-continent RPCs to satisfy a simple query. The key to unlocking true low-latency performance is to co-locate data with its users.

This article is a deep dive into CockroachDB's geo-partitioning feature, a powerful mechanism for explicitly controlling data placement. We will move beyond the documentation's introductory examples and explore production-ready patterns, complex schema designs, and the operational realities of managing a geo-partitioned database.

Prerequisite Understanding

This post assumes you are familiar with CockroachDB's core architecture, including:

* Ranges: How data is sharded into contiguous key-space chunks.

* Raft Consensus: The protocol used for replication and consistency.

* Leaseholders: The Raft replica that coordinates reads and writes for a range.

* Locality: How nodes are started with --locality flags to be region-aware.

We will not be covering these fundamentals. Our focus is squarely on the advanced application of these concepts through partitioning.

The Mechanics of Geo-Partitioning: Beyond the Basics

At its core, geo-partitioning in CockroachDB is about splitting a table's data (its ranges) based on a specific column and then using Zone Configurations to pin those ranges to specific geographic localities. This ensures that both the data and the Raft leaseholder for that data reside in the same region as the users who access it most frequently.

Let's model a realistic scenario: a global e-commerce platform with users, products, and orders. We need to comply with GDPR for European users and provide low latency for all users.

Our cluster has nodes in three regions:

* us-east-1 (USA)

* eu-west-1 (Europe)

* ap-southeast-1 (Asia-Pacific)

Step 1: Defining the Partition Key

The first and most critical step is choosing the right column to partition on. This column must be part of the primary key or a secondary index you intend to partition. For our users table, a region column is the natural choice.

sql
-- We'll use a custom ENUM type for region for data integrity.
CREATE TYPE region AS ENUM ('us-east-1', 'eu-west-1', 'ap-southeast-1');

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    region region NOT NULL,
    email STRING NOT NULL,
    name STRING,
    created_at TIMESTAMPTZ DEFAULT now(),
    -- Other user fields...
    INDEX users_region_idx (region) -- This index will be partitioned
);

Notice we haven't added the partitioning DDL yet. It's crucial to understand that partitioning is a physical layout strategy applied to an index. Let's partition the primary key itself to co-locate the entire row.

sql
-- This won't work directly, as the partition columns must be a prefix of the primary key.
-- We must redesign the table.

-- CORRECTED SCHEMA for Primary Key Partitioning
CREATE TABLE users (
    id UUID NOT NULL,
    region region NOT NULL,
    email STRING NOT NULL UNIQUE,
    name STRING,
    created_at TIMESTAMPTZ DEFAULT now(),
    -- Other user fields...
    PRIMARY KEY (region, id) -- region is now the first part of the PK
);

This is a fundamental design decision. By making region the prefix of the primary key, all user data is sorted by region first, then by ID, which lets CockroachDB split the key space into ranges along region boundaries.

Step 2: Applying the Partitioning DDL

Now we create the partitions. We are creating logical partitions that CockroachDB will use to group ranges.

sql
ALTER TABLE users PARTITION BY LIST (region) (
    PARTITION us_users VALUES IN ('us-east-1'),
    PARTITION eu_users VALUES IN ('eu-west-1'),
    PARTITION ap_users VALUES IN ('ap-southeast-1')
);

At this point, we've only defined the logical partitions. The data is not yet pinned to any specific location. We can verify the partitions exist:

sql
SHOW PARTITIONS FROM TABLE users;
--  table_name | partition_name | parent_partition | column_names | list_value          | index_name | partition_id
-- ------------+----------------+------------------+--------------+---------------------+------------+--------------
--  users      | us_users       |                  | region       | ('us-east-1')       | primary    |           56
--  users      | eu_users       |                  | region       | ('eu-west-1')       | primary    |           57
--  users      | ap_users       |                  | region       | ('ap-southeast-1')  | primary    |           58

Step 3: Zone Configurations to Pin Data

This is where the magic happens. We use CONFIGURE ZONE to apply constraints to each partition. This tells CockroachDB where to place the replicas (and importantly, the leaseholder) for the ranges belonging to each partition.

sql
-- For US users, require all 3 replicas to be in us-east-1.
-- The leaseholder will also be in us-east-1, ensuring low-latency reads/writes.
ALTER PARTITION us_users OF TABLE users
    CONFIGURE ZONE USING
    num_replicas = 3,
    constraints = '{ "+region=us-east-1": 3 }';

-- For EU users, pin all replicas to Europe for GDPR and latency.
ALTER PARTITION eu_users OF TABLE users
    CONFIGURE ZONE USING
    num_replicas = 3,
    constraints = '{ "+region=eu-west-1": 3 }';

-- For AP users, pin all replicas to Asia-Pacific.
ALTER PARTITION ap_users OF TABLE users
    CONFIGURE ZONE USING
    num_replicas = 3,
    constraints = '{ "+region=ap-southeast-1": 3 }';

Now, when you insert a user with region = 'eu-west-1', the entire lifecycle of that row's range (Raft consensus, writes, and leaseholder reads) will be handled exclusively by nodes in eu-west-1. An application server in that same region will experience sub-5ms read latency.

Let's verify the physical layout. After inserting some data:

sql
-- Insert some sample data
INSERT INTO users (id, region, email) VALUES
    (gen_random_uuid(), 'us-east-1', 'us-user@example.com'),
    (gen_random_uuid(), 'eu-west-1', 'eu-user@example.com'),
    (gen_random_uuid(), 'ap-southeast-1', 'ap-user@example.com');

-- Check where the ranges are (output simplified for clarity)
SHOW RANGES FROM TABLE users WITH DETAILS;
--  start_key                          | end_key                            | lease_holder | replicas
-- ------------------------------------+------------------------------------+--------------+-----------------------------
-- /Table/55/1/"ap-southeast-1"       | /Table/55/1/"eu-west-1"            | 3            | {3,4,5} -- (nodes in ap-southeast-1)
-- /Table/55/1/"eu-west-1"            | /Table/55/1/"us-east-1"            | 6            | {6,7,8} -- (nodes in eu-west-1)
-- /Table/55/1/"us-east-1"            | /Table/55/2                        | 9            | {9,10,11} -- (nodes in us-east-1)

The start_key clearly shows how CockroachDB has split the key space based on the region column, and the replicas column confirms they are located on the correct nodes.
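
To spot-check a single row, you can also ask which range, and therefore which leaseholder, would serve it. A quick sketch with a placeholder UUID:

sql
-- Shows the range boundaries, leaseholder, and replica placement for this key.
SHOW RANGE FROM TABLE users FOR ROW ('eu-west-1', 'a4f2b6e0-1c3d-4e5f-8a9b-0c1d2e3f4a5b');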

Advanced Schema Design: Handling Relationships

Partitioning a simple users table is straightforward. The real complexity arises when dealing with related data. Let's consider a posts table.

sql
CREATE TABLE posts (
    id UUID NOT NULL,
    author_id UUID NOT NULL,
    region region NOT NULL, -- Denormalized for partitioning
    content STRING,
    created_at TIMESTAMPTZ DEFAULT now(),
    PRIMARY KEY (region, id),
    CONSTRAINT fk_author FOREIGN KEY (region, author_id) REFERENCES users (region, id)
);

ALTER TABLE posts PARTITION BY LIST (region) (
    PARTITION us_posts VALUES IN ('us-east-1'),
    PARTITION eu_posts VALUES IN ('eu-west-1'),
    PARTITION ap_posts VALUES IN ('ap-southeast-1')
);

-- Apply zone configs similar to the users table
ALTER PARTITION us_posts OF TABLE posts CONFIGURE ZONE USING num_replicas = 3, constraints = '{ "+region=us-east-1": 3 }';
ALTER PARTITION eu_posts OF TABLE posts CONFIGURE ZONE USING num_replicas = 3, constraints = '{ "+region=eu-west-1": 3 }';
ALTER PARTITION ap_posts OF TABLE posts CONFIGURE ZONE USING num_replicas = 3, constraints = '{ "+region=ap-southeast-1": 3 }';

Here, we've denormalized the region from the users table into the posts table. This is a critical pattern: it allows us to partition posts the same way as users. The foreign key now includes region, ensuring that a post is always stored in the same partition as its author. A query like SELECT * FROM posts WHERE region = ? AND author_id = ? will be extremely fast, provided the application knows the author's region and can route the query to a local node.
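
In practice, the application supplies the author's (cached) region whenever it writes a post, so the denormalized column costs nothing extra at write time. A minimal sketch with placeholder values:

sql
-- The application already knows the author's region (e.g. from the session)
-- and copies it into the post row, keeping the post in the author's partition.
INSERT INTO posts (id, author_id, region, content)
VALUES (gen_random_uuid(), 'some-eu-user-id', 'eu-west-1', 'Hello from Dublin');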

The Many-to-Many Challenge: Followers

What about a followers table? A user in the US can follow a user in the EU. Where do you store that relationship record?

followers (follower_id, follower_region, followee_id, followee_region)

This is a classic distributed data modeling problem. You have several options, each with significant trade-offs:

Option 1: Partition by Follower

Partition the table on follower_region. This optimizes for the "who am I following?" query. However, the "who are my followers?" query becomes a costly scatter-gather operation across all regions.
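
To make the trade-off concrete, here is a sketch of the two access patterns under Option 1, assuming the table is keyed and partitioned on (follower_region, follower_id, ...) as in the full design shown below:

sql
-- Local: the leading partition column is in the predicate, so only the
-- follower's home region is touched.
SELECT followee_id FROM followers
WHERE follower_region = 'us-east-1' AND follower_id = 'user-x-id';

-- Scatter-gather: no follower_region is available, so the query must consult
-- the followers partitions in every region.
SELECT follower_id FROM followers WHERE followee_id = 'user-y-id';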

Option 2: Partition by Followee

Partition on followee_region. This optimizes for fetching a user's followers, which is often a more frequent operation (e.g., for notification delivery). But fetching the list of people a user follows becomes a global query.

Option 3: Duplicate the Data (Advanced Pattern)

This is often the best solution for read-heavy social graphs. You store the relationship twice, once partitioned by the follower and once by the followee. This can be done with two separate tables or with a single table and two partitioned indexes.

Let's implement this with two indexes:

sql
CREATE TABLE followers (
    follower_id UUID NOT NULL,
    follower_region region NOT NULL,
    followee_id UUID NOT NULL,
    followee_region region NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now(),
    PRIMARY KEY (follower_region, follower_id, followee_id),
    INDEX by_followee (followee_region, followee_id, follower_id)
);

-- Partition the primary index by follower
ALTER TABLE followers PARTITION BY LIST (follower_region) (
    PARTITION us_followers VALUES IN ('us-east-1'),
    PARTITION eu_followers VALUES IN ('eu-west-1'),
    PARTITION ap_followers VALUES IN ('ap-southeast-1')
);

-- Partition the secondary index by followee
ALTER INDEX followers@by_followee PARTITION BY LIST (followee_region) (
    PARTITION us_followees VALUES IN ('us-east-1'),
    PARTITION eu_followees VALUES IN ('eu-west-1'),
    PARTITION ap_followees VALUES IN ('ap-southeast-1')
);

-- Apply zone configs to all partitions of both indexes
-- ... (omitted for brevity, but you would pin each partition to its region)
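-- For example, for the secondary index (a sketch; the primary-index partitions
-- us_followers, eu_followers, and ap_followers follow the same pattern as the users table):
ALTER PARTITION us_followees OF INDEX followers@by_followee
    CONFIGURE ZONE USING num_replicas = 3, constraints = '{ "+region=us-east-1": 3 }';
ALTER PARTITION eu_followees OF INDEX followers@by_followee
    CONFIGURE ZONE USING num_replicas = 3, constraints = '{ "+region=eu-west-1": 3 }';
ALTER PARTITION ap_followees OF INDEX followers@by_followee
    CONFIGURE ZONE USING num_replicas = 3, constraints = '{ "+region=ap-southeast-1": 3 }';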

Now, when your application needs to fetch who user X is following, it executes:

sql
-- Assuming the app knows user X is in us-east-1
-- This query will hit the primary index, which is partitioned by follower_region
-- It will be served entirely by nodes in us-east-1
SELECT followee_id FROM followers WHERE follower_region = 'us-east-1' AND follower_id = 'user-x-id';

When it needs to fetch the followers of user Y:

sql
-- Assuming the app knows user Y is in eu-west-1
-- CockroachDB's optimizer will choose the 'by_followee' index
-- This query will be served entirely by nodes in eu-west-1
SELECT follower_id FROM followers@by_followee WHERE followee_region = 'eu-west-1' AND followee_id = 'user-y-id';

This pattern provides single-digit millisecond latency for both queries, at the cost of doubling the storage and write amplification for the followers table. For many social applications, this read-performance trade-off is highly favorable.
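
Note that with the single-table approach a follow is still one INSERT; the write amplification comes from CockroachDB maintaining both copies, writing the row into the primary index (pinned to the follower's region) and into by_followee (pinned to the followee's region). A cross-region follow therefore pays a cross-region write once, at follow time. A minimal sketch with placeholder IDs:

sql
-- A US user follows an EU user: one statement, two partitioned copies.
-- The primary-index entry lands in us-east-1 and the by_followee entry
-- lands in eu-west-1, so this transaction spans both regions.
INSERT INTO followers (follower_id, follower_region, followee_id, followee_region)
VALUES ('user-x-id', 'us-east-1', 'user-y-id', 'eu-west-1');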

Performance Analysis: The Power of `EXPLAIN`

Let's prove the performance difference. Consider a query to fetch a user's posts. The application is running in eu-west-1 and is connected to a local CockroachDB node.

The Good Query (Partition-Aware)

The application knows the user's region and includes it in the WHERE clause.

sql
EXPLAIN (VERBOSE) 
SELECT * FROM posts WHERE region = 'eu-west-1' AND author_id = 'some-eu-user-id';
text
• distribution: local
• spans: /Table/posts/eu_posts/Start to /Table/posts/eu_posts/End

The query plan is beautiful in its simplicity. distribution: local means no RPCs to other regions are needed. The query planner immediately identifies the eu_posts partition and only scans ranges within that partition, all of which are hosted on local nodes.

The Bad Query (Partition-Agnostic)

The application only has the author_id and doesn't provide the region.

sql
-- Let's create an index to make this query possible
CREATE INDEX ON posts (author_id);

EXPLAIN (VERBOSE) 
SELECT * FROM posts WHERE author_id = 'some-eu-user-id';
text
• distribution: full
• spans: ALL

The query plan tells a different story. Because the index on author_id is not partitioned by region, its ranges (and their leaseholders) can live in any region, so the planner cannot prune the lookup to a single locality. The query may have to contact nodes in us-east-1, eu-west-1, and ap-southeast-1 to find the relevant data. Even though the rows exist in only one region, the database still has to do cross-region work to locate them, subjecting the query to cross-continent network latency: 100-200ms response times instead of <5ms.

Key Takeaway: Your application logic must be aware of the partitioning scheme to benefit from it. This often means fetching a user's region once upon login and caching it in the application session.
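
A minimal sketch of that pattern, assuming the application authenticates by email (a hypothetical address below; the unique email index is not itself partitioned, so this one lookup may cross regions, but it runs once per login):

sql
-- Once, at login: resolve and cache the user's home region for the session.
SELECT id, region FROM users WHERE email = 'alice@example.com';

-- For the rest of the session: include the cached region in every predicate
-- so queries stay inside the local partition.
SELECT * FROM posts WHERE region = 'eu-west-1' AND author_id = 'cached-user-id';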

Edge Cases and Operational Realities

1. User Data Migration

What happens when a user from the EU moves to the US and wants to change their home region?

This requires an UPDATE statement:

sql
UPDATE users SET region = 'us-east-1' WHERE id = 'some-eu-user-id' AND region = 'eu-west-1';

Behind the scenes, CockroachDB handles this with an elegant, asynchronous process:

  • The UPDATE transaction writes a new version of the row with region = 'us-east-1'. Since the primary key is (region, id), this is effectively a DELETE of the old key and an INSERT of a new key.
  • The new key ('us-east-1', 'user-id') falls into the key-space of the us_users partition.
  • CockroachDB's background rebalancing process detects that this range now belongs in us-east-1 according to the zone configuration.
  • It will transparently replicate the data to nodes in us-east-1, transfer the leaseholder, and then remove the old replicas from eu-west-1.
  • This process is consistent, but the physical move is not instantaneous: the UPDATE commits immediately, while the replica migration and lease transfer happen asynchronously, so for a brief period the data may still be served from the old region. You also need to move the region value in all related tables (posts, etc.) within the same transaction to maintain referential integrity, for example via ON UPDATE CASCADE on the foreign key, as sketched below.
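
A minimal sketch of such a migration, assuming fk_author was declared with ON UPDATE CASCADE (a variant of the schema shown earlier) so that dependent posts rows move in the same transaction:

sql
-- Assumes: FOREIGN KEY (region, author_id) REFERENCES users (region, id) ON UPDATE CASCADE
BEGIN;

-- Rewrites the user's key from ('eu-west-1', id) to ('us-east-1', id);
-- the cascade rewrites the matching posts keys in the same transaction.
UPDATE users SET region = 'us-east-1'
WHERE region = 'eu-west-1' AND id = 'some-eu-user-id';

COMMIT;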

2. Cross-Region Analytics and Follower Reads

What if a data analyst needs to run a query like SELECT region, COUNT(*) FROM users GROUP BY region? This query, by its nature, must touch all partitions.

Running this as a standard transaction will be slow, as it requires coordinating with leaseholders in every region. For analytics where data can be slightly stale, follower reads are an excellent solution.

sql
BEGIN;
-- follower_read_timestamp() returns a timestamp just far enough in the past
-- that the read can be served by the nearest replica instead of the leaseholder.
SET TRANSACTION AS OF SYSTEM TIME follower_read_timestamp();

SELECT region, COUNT(*) FROM users GROUP BY region;

COMMIT;

By using AS OF SYSTEM TIME, you are telling CockroachDB that you are willing to accept slightly stale data. This allows it to serve the read from the closest available replica instead of routing it to the leaseholder. If your analytics node is in us-east-1, it can read the EU and AP partitions from local replicas, dramatically reducing latency for the query. Note that this only helps if those partitions have at least one replica outside their home region; the strictly pinned configuration above would need to be relaxed where data-residency rules allow.

    3. The "Global Table" Pattern

    Some tables don't have a clear data locality. A features table that stores feature flags, for example, needs to be read with low latency from all regions.

    In this case, you should not partition the table. Instead, you create a standard table and configure its zone to have replicas in every region.

sql
CREATE TABLE features (
    name STRING PRIMARY KEY,
    enabled BOOL
);

-- Configure this table to keep a replica in every region.
-- This increases write latency; see the trade-off below.
ALTER TABLE features CONFIGURE ZONE USING
    num_replicas = 5, -- or 3, or 7...
    constraints = '{ "+region=us-east-1": 1, "+region=eu-west-1": 1, "+region=ap-southeast-1": 1 }',
    lease_preferences = '[[+region=us-east-1], [+region=eu-west-1], [+region=ap-southeast-1]]';

This configuration has a critical trade-off:

* Low Read Latency (with a caveat): Strongly consistent reads are served by the leaseholder, and lease_preferences is an ordered list, so the leaseholder will sit in the first preferred region that can host it (us-east-1 here). Reads from that region are local; reads from the other regions either pay one cross-region hop or use follower reads for local-latency, slightly stale results.

* High Write Latency: Every write to this table must achieve Raft consensus across regions. A write originating in the US must be acknowledged by a quorum of replicas (3 of 5 here), which will generally include nodes in the EU or AP, so every write pays at least one cross-continent round trip, typically 100ms or more.

This pattern is ideal for small, infrequently updated tables where read performance is paramount.
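
If reads from the non-preferred regions can tolerate slight staleness, the follower-read technique from the previous section applies here as well. A sketch, using a hypothetical flag name:

sql
-- Served by the nearest replica of features, wherever the leaseholder happens to be.
SELECT enabled
FROM features AS OF SYSTEM TIME follower_read_timestamp()
WHERE name = 'new-checkout-flow';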

Conclusion: An Architectural Commitment

Geo-partitioning in CockroachDB is not a feature you simply turn on; it is a fundamental architectural decision that impacts your schema design, your application logic, and your operational playbooks. When implemented correctly, it provides an unparalleled ability to deliver low-latency performance to a global user base while respecting data sovereignty requirements.

The key takeaways for senior engineers and architects are:

  • Schema is Paramount: The choice of partition key and the denormalization of that key into related tables are the most critical design decisions.
  • Application Awareness: Your application must be architected to be partition-aware, providing the partition key in queries to avoid expensive, distributed lookups.
  • Model Relationships Deliberately: For many-to-many relationships, be prepared to trade storage for read performance by duplicating data across partitions via multiple indexes.
  • Understand the Trade-offs: Every decision has consequences. A global table provides fast reads at the cost of slow writes. A partitioned table provides fast localized I/O but makes global queries complex.
  • Leverage EXPLAIN and Follower Reads: Use the tools CockroachDB provides. Analyze your query plans to ensure locality is being used, and offload expensive analytical queries to follower replicas to avoid impacting transactional workloads.

By moving beyond a simple multi-region deployment and embracing a deliberate, partition-based data topology, you can build truly world-class, high-performance global applications.
