Optimizing Nested JSONB Queries in Postgres with GIN & B-Tree

October 14, 2025

15 min read

Goh Ling Yong

Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The High Cost of Naive JSONB Queries in Production

In modern application development, using PostgreSQL's JSONB type to store semi-structured data is a common and powerful pattern. It offers schema flexibility without sacrificing the transactional integrity of a relational database. However, this flexibility comes with a hidden performance cost. A JSONB column, without proper indexing, is an opaque blob to the query planner. Any query attempting to filter on its contents will inevitably lead to a full sequential scan, a performance killer for tables with millions of rows.

Consider a typical multi-tenant SaaS application managing user-generated documents. A simplified table might look like this:

sql

CREATE TABLE documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    data JSONB NOT NULL
);

-- Let's populate it with a significant amount of data
INSERT INTO documents (tenant_id, data)
SELECT
    'a1b2c3d4-e5f6-a7b8-c9d0-e1f2a3b4c5d6'::uuid,
    jsonb_build_object(
        'type', CASE WHEN random() < 0.5 THEN 'invoice' ELSE 'report' END,
        'status', 'processed',
        'metadata', jsonb_build_object(
            'source', 'api',
            'version', 2.1,
            'tags', ARRAY['internal', 'q3-2024']
        ),
        'content', jsonb_build_object(
            'customer_id', (random() * 10000)::int,
            'line_items', 5
        )
    )
FROM generate_series(1, 1000000);

Now, imagine a frequent query: finding all documents for a specific customer within the nested content object.

sql

EXPLAIN ANALYZE
SELECT id, data->'content'->'customer_id' as customer_id
FROM documents
WHERE data->'content'->>'customer_id' = '42';

The query plan will be grim:

text

QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..63605.53 rows=500 width=40) (actual time=2.919..385.127 rows=95 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on documents  (cost=0.00..62555.53 rows=208 width=40) (actual time=1.841..378.895 rows=32 loops=3)
         Filter: ((data -> 'content' ->> 'customer_id'::text) = '42'::text)
         Rows Removed by Filter: 333301
 Planning Time: 0.116 ms
 Execution Time: 385.183 ms

A Parallel Sequential Scan. On a 1 million row table, this takes ~385ms. On a 100 million row table, this query becomes unusable. This is the baseline problem we will systematically solve using advanced indexing techniques.

Level 1: The Standard GIN Index (`jsonb_ops`)

The first line of defense is a Generalized Inverted Index (GIN). A standard GIN index on a JSONB column creates an index entry for every key and every value within the JSON document.

sql

CREATE INDEX idx_gin_documents_data ON documents USING GIN (data);

This index is a powerhouse for operators like existence (?), key existence (?|, ?&), and containment (@>). The containment operator is particularly useful for finding documents that have a specific key-value pair at any level of nesting.

Let's rewrite our query to use the containment operator:

sql

EXPLAIN ANALYZE
SELECT id, data->'content'->'customer_id' as customer_id
FROM documents
WHERE data @> '{"content": {"customer_id": 42}}';

Now, the query plan tells a different story:

text

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on documents  (cost=124.50..524.53 rows=1000 width=40) (actual time=0.291..0.435 rows=95 loops=1)
   Recheck Cond: (data @> '{"content": {"customer_id": 42}}'::jsonb)
   Heap Blocks: exact=95
   ->  Bitmap Index Scan on idx_gin_documents_data  (cost=0.00..124.25 rows=1000 width=0) (actual time=0.274..0.274 rows=95 loops=1)
         Index Cond: (data @> '{"content": {"customer_id": 42}}'::jsonb)
 Planning Time: 0.158 ms
 Execution Time: 0.467 ms

From 385ms to under 1ms. A massive improvement. The Bitmap Index Scan shows the GIN index is being used effectively.

Production Consideration: The default jsonb_ops GIN index is versatile but comes with a significant cost. It indexes everything. This leads to:

Large Index Size: The index can often be larger than the table data itself.

Slow Writes: INSERT and UPDATE operations become more expensive because every key and value in the JSONB payload must be processed and added to the index. This can be a major bottleneck in write-heavy systems.

For many use cases, especially those involving querying for the presence of data rather than specific values, this index is sufficient. But for performance-critical path-based lookups, we can do better.

Level 2: Path-Optimized GIN Index (`jsonb_path_ops`)

PostgreSQL offers an alternative GIN operator class: jsonb_path_ops. Unlike jsonb_ops, which indexes keys and values, jsonb_path_ops only indexes hashes of the values at specific paths. It does not index keys at all.

This has two primary effects:

Smaller Index Size: The resulting index is significantly smaller because it doesn't store every unique key.

Faster Indexing: Writes are faster as there is less information to process.

The trade-off is that jsonb_path_ops only supports the containment (@>) operator. It cannot accelerate key existence operators like ?.

Let's build this index and compare.

sql

-- First, drop the old index
DROP INDEX idx_gin_documents_data;

-- Create the path-optimized index
CREATE INDEX idx_gin_path_ops_documents_data ON documents USING GIN (data jsonb_path_ops);

Now, we run the exact same containment query:

sql

EXPLAIN ANALYZE
SELECT id, data->'content'->'customer_id' as customer_id
FROM documents
WHERE data @> '{"content": {"customer_id": 42}}';

text

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on documents  (cost=64.50..464.53 rows=1000 width=40) (actual time=0.187..0.321 rows=95 loops=1)
   Recheck Cond: (data @> '{"content": {"customer_id": 42}}'::jsonb)
   Heap Blocks: exact=95
   ->  Bitmap Index Scan on idx_gin_path_ops_documents_data  (cost=0.00..64.25 rows=1000 width=0) (actual time=0.173..0.173 rows=95 loops=1)
         Index Cond: (data @> '{"content": {"customer_id": 42}}'::jsonb)
 Planning Time: 0.121 ms
 Execution Time: 0.350 ms

The performance is slightly better (~0.35ms vs ~0.46ms), primarily because the smaller index can be scanned faster. The real win is in index size and write performance.

Let's check the index sizes:

sql

SELECT relname, pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE relname IN ('idx_gin_documents_data', 'idx_gin_path_ops_documents_data');

-- Assuming you recreate the first index to compare
-- idx_gin_documents_data         | 152 MB
-- idx_gin_path_ops_documents_data | 88 MB

The jsonb_path_ops index is nearly 45% smaller. In a production environment with billions of rows and frequent writes, this difference is monumental. It translates to lower storage costs, faster backups, and reduced write amplification.

When to use jsonb_path_ops: If your query patterns exclusively rely on checking if a JSON document contains a specific structure with specific values (@>), jsonb_path_ops is almost always the superior choice over the default jsonb_ops.

Level 3: B-Tree Indexes on Expressions for High Selectivity

GIN indexes are powerful, but they have a fundamental limitation: they are not ideal for operators that rely on ordering, such as range queries (>, <, BETWEEN) or even simple equality on specific, typed data. The query planner can use a GIN index for equality, as we saw, but it's a multi-step process (Bitmap Index Scan -> Bitmap Heap Scan) that is less efficient than a direct lookup.

This is where B-Tree indexes on expressions shine. If you consistently query a specific, strongly-typed field deep within your JSONB structure, you can create a B-Tree index directly on the expression that extracts that value.

Let's return to our original query that used the ->> operator, which extracts a value as text.

sql

-- This expression extracts the customer_id and casts it to an integer
(data->'content'->>'customer_id')::int

We can create an index on this exact expression.

sql

-- Drop the GIN index for a clean comparison
DROP INDEX idx_gin_path_ops_documents_data;

CREATE INDEX idx_btree_documents_customer_id ON documents (((data->'content'->>'customer_id')::int));

Note the double parentheses. The first set is for the expression, and the second is for the CREATE INDEX syntax.

Now, let's run the original query that performed so poorly.

sql

EXPLAIN ANALYZE
SELECT id, data->'content'->'customer_id' as customer_id
FROM documents
WHERE (data->'content'->>'customer_id')::int = 42;

text

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_btree_documents_customer_id on documents  (cost=0.43..12.35 rows=95 width=40) (actual time=0.038..0.125 rows=95 loops=1)
   Index Cond: (((data -> 'content' ->> 'customer_id'::text))::integer = 42)
 Planning Time: 0.174 ms
 Execution Time: 0.155 ms

The query plan now shows an Index Scan. This is the most efficient access method in PostgreSQL. The planner directly navigates the B-Tree structure to find the pointers to the matching rows. The execution time has dropped to ~0.15ms, more than twice as fast as our best GIN index performance.

Advantages of B-Tree on Expressions:

Extreme Performance: For equality and range queries on highly selective fields, it's the fastest option.

Supports All B-Tree Operators: You can now perform efficient range scans: WHERE (data->'content'->>'customer_id')::int > 1000 AND (data->'content'->>'customer_id')::int < 1050.

Small Index Size: The index only stores the extracted value and the row pointer, making it very compact compared to a GIN index.

Disadvantages:

Inflexibility: It only accelerates queries that match the exact expression. A query for data->'content'->>'line_items' gets no benefit.

Schema Rigidity: You are implicitly enforcing a schema. If a document is missing the customer_id key, the expression will evaluate to NULL. You may need to handle this in your index or queries (e.g., WHERE COALESCE(...)).

Advanced Pattern: Partial Indexes for Heterogeneous Collections

Real-world JSONB columns are often used to store different types of documents within the same table. Our example table contains both invoice and report documents. It's likely that queries for invoices look at different fields than queries for reports.

Creating a global B-Tree index on (data->'content'->>'customer_id')::int is wasteful if only invoice documents contain this field. The index will be bloated with NULL entries for every report document.

The solution is a partial index.

Let's say we only ever search for the customer_id on documents of type invoice.

sql

-- Drop the full index
DROP INDEX idx_btree_documents_customer_id;

-- Create a partial index that only includes 'invoice' documents
CREATE INDEX idx_partial_btree_documents_invoice_customer_id 
ON documents (((data->'content'->>'customer_id')::int))
WHERE (data->>'type') = 'invoice';

The WHERE clause is the key. PostgreSQL will only add an entry to this index if the document satisfies the condition. This results in an index that is roughly half the size and faster to scan, as it doesn't have to sift through irrelevant entries.

Now, when we run a query that includes this condition, the planner will intelligently select the partial index:

sql

EXPLAIN ANALYZE
SELECT id
FROM documents
WHERE (data->>'type') = 'invoice' 
  AND (data->'content'->>'customer_id')::int = 42;

text

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_partial_btree_documents_invoice_customer_id on documents  (cost=0.42..10.25 rows=47 width=16) (actual time=0.035..0.075 rows=48 loops=1)
   Index Cond: (((data -> 'content' ->> 'customer_id'::text))::integer = 42)
 Planning Time: 0.201 ms
 Execution Time: 0.098 ms

The performance is now under 0.1ms. The query planner is smart enough to see that the WHERE clause in the query is a superset of the index's WHERE clause, and therefore uses the smaller, more efficient partial index.

This pattern is incredibly powerful for optimizing large, multi-type tables. You can create multiple, small, highly-specialized partial indexes for different document types, keeping overall index overhead low while providing lightning-fast lookups for specific query patterns.

Production Benchmarking and Decision Framework

To make an informed decision, you must benchmark against your own data and query patterns. Here is a summary of our findings on the 1M row table:

Indexing Strategy	Query Type	Execution Time (ms)	Index Size (MB)	Pros	Cons
None	Equality (`->>`)	385.183	0	No write overhead	Unusable performance on large tables
GIN (`jsonb_ops`)	Containment (`@>`)	0.467	~152	Versatile, supports existence & containment	Large index, slower writes
GIN (`jsonb_path_ops`)	Containment (`@>`)	0.350	~88	Smaller, faster writes, great for `@>`	Only supports `@>`, no existence checks (`?`)
B-Tree on Expression	Typed Equality (`(->>'key')::type = val`)	0.155	~25	Fastest for selective queries, supports ranges	Inflexible, only works for the specific expression
Partial B-Tree on Expression	Typed Equality with `WHERE` clause	0.098	~13	Extremely fast & small for filtered queries	Even more specific, requires matching `WHERE` clause

Decision Framework

When faced with a slow JSONB query, use this thought process:

What operators are you using?

If you need to find documents containing a set of keys or a nested structure (@>), start with a GIN index. Choose jsonb_path_ops if @> is your only* use case, for its smaller size and better write performance. Choose jsonb_ops if you also need key existence checks (?, ?|, ?&).

Are you filtering on a specific, strongly-typed value deep in the JSON?

* If you are frequently running queries like WHERE (data->'path'->>'key')::int = 123 or ... > 100, a B-Tree index on that specific expression will almost certainly outperform a GIN index.

Does the table contain different types of documents?

* If yes, and your queries target specific document types, always prefer a partial index. Create specialized B-Tree or GIN indexes with a WHERE (data->>'type') = 'your_type' clause. This prevents index bloat and dramatically improves performance by reducing the search space.

Can you combine indexes?

* Yes. It's common to have a general GIN (data jsonb_path_ops) index for ad-hoc exploration and several highly specific partial B-Tree indexes on expressions for performance-critical queries. PostgreSQL's query planner is capable of picking the best index for a given query, or even combining multiple indexes in some scenarios.

By moving beyond a one-size-fits-all GIN index and embracing these advanced, specific indexing strategies, you can unlock the full performance potential of JSONB in PostgreSQL, ensuring your application remains fast and scalable even as your data grows in volume and complexity.

The High Cost of Naive JSONB Queries in Production

Level 1: The Standard GIN Index (`jsonb_ops`)

Level 2: Path-Optimized GIN Index (`jsonb_path_ops`)

Level 3: B-Tree Indexes on Expressions for High Selectivity

Advanced Pattern: Partial Indexes for Heterogeneous Collections

Production Benchmarking and Decision Framework

Decision Framework

Found this article helpful?