PostgreSQL LATERAL Joins for Efficient Time-Series Top-N Queries
The Ubiquitous "Greatest-N-Per-Group" Bottleneck
As a senior engineer, you've inevitably encountered the "greatest-N-per-group" problem. It's a classic query pattern that surfaces in countless domains: retrieving the last 5 comments on a blog post, the 3 most recent sensor readings for each IoT device, or the top 10 trades for every stock symbol in a given day. While the problem statement is simple, the implementation in SQL can be a minefield of performance anti-patterns, especially when dealing with large-scale time-series data.
Many developers initially reach for solutions involving ROW_NUMBER() window functions or correlated subqueries. While these can produce the correct results on small datasets, they often collapse under production load. They can lead to full table scans, massive in-memory or on-disk sorts, and execution times that scale poorly with data volume.
This article dissects this problem and demonstrates a superior, highly performant solution using PostgreSQL's LATERAL join. We will not just show you the syntax; we will explore the why, delving into query execution plans, critical indexing strategies, and advanced patterns that make this approach production-ready. We will benchmark the common anti-patterns against the LATERAL join to provide concrete evidence of its efficiency.
Setting the Stage: A Production-Scale Scenario
Let's model a common IoT monitoring system. We have millions of devices, each reporting metrics frequently. Our task is to build a dashboard that, for a given list of devices, efficiently retrieves the last 3 critical readings for each one.
First, let's define our schema and populate it with a significant amount of data to simulate a real-world environment.
-- Our devices table
CREATE TABLE devices (
id SERIAL PRIMARY KEY,
hostname TEXT NOT NULL UNIQUE,
location TEXT
);
-- Our time-series readings table
CREATE TABLE device_readings (
id BIGSERIAL PRIMARY KEY,
device_id INT NOT NULL REFERENCES devices(id),
metric_value DOUBLE PRECISION NOT NULL,
status TEXT NOT NULL, -- e.g., 'NORMAL', 'WARNING', 'CRITICAL'
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Create an index on the foreign key for standard join performance
CREATE INDEX ON device_readings (device_id);
-- Create a more advanced index for our specific query later
-- We'll discuss the importance of this index in detail
CREATE INDEX idx_device_readings_device_id_created_at_desc ON device_readings (device_id, created_at DESC);
Now, let's populate these tables with a realistic dataset. We'll add 1,000 devices and give each device 10,000 readings.
-- Use a procedural block to generate data
DO $$
DECLARE
device_id_var INT;
BEGIN
-- Insert 1,000 devices
INSERT INTO devices (hostname, location)
SELECT 'device-' || i, 'rack-' || (i % 100)
FROM generate_series(1, 1000) AS i;
-- Insert 10,000 readings for each device (10 million total)
FOR device_id_var IN 1..1000 LOOP
INSERT INTO device_readings (device_id, metric_value, status, created_at)
SELECT
device_id_var,
random() * 100,
CASE floor(random() * 10)::INT -- floor() makes each of 0-9 equally likely (~10% CRITICAL, ~10% WARNING)
WHEN 0 THEN 'CRITICAL'
WHEN 1 THEN 'WARNING'
ELSE 'NORMAL'
END,
NOW() - (random() * 365 || ' days')::INTERVAL
FROM generate_series(1, 10000);
END LOOP;
END $$;
-- Ensure our statistics are up to date for the query planner
ANALYZE devices;
ANALYZE device_readings;
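As a quick sanity check on the generated volume (a trivial verification query, included for convenience):
-- Expect 1,000 devices and 10,000,000 readings
SELECT
(SELECT count(*) FROM devices) AS device_count,
(SELECT count(*) FROM device_readings) AS reading_count;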
We now have a device_readings table with 10 million rows, a scenario where inefficient queries will be painfully obvious.
Anti-Pattern 1: The `ROW_NUMBER()` Window Function
A common first attempt involves using a window function like ROW_NUMBER() within a Common Table Expression (CTE) or subquery.
-- The goal: Get the last 3 CRITICAL readings for a subset of devices.
EXPLAIN ANALYZE
WITH ranked_readings AS (
SELECT
r.device_id,
r.metric_value,
r.status,
r.created_at,
ROW_NUMBER() OVER(PARTITION BY r.device_id ORDER BY r.created_at DESC) as rn
FROM
device_readings r
WHERE
r.status = 'CRITICAL'
AND r.device_id IN (1, 2, 3, 4, 5) -- Querying for 5 specific devices
)
SELECT
device_id,
metric_value,
status,
created_at
FROM
ranked_readings
WHERE
rn <= 3;
On the surface, this looks clean and declarative. However, let's dissect the EXPLAIN ANALYZE output.
Subquery Scan on ranked_readings (cost=17489.67..17711.09 rows=83 width=29) (actual time=143.513..144.755 rows=15 loops=1)
Filter: (ranked_readings.rn <= 3)
-> WindowAgg (cost=17489.67..17643.09 rows=16801 width=37) (actual time=143.510..144.606 rows=5003 loops=1)
-> Sort (cost=17489.67..17531.67 rows=16801 width=29) (actual time=143.490..143.725 rows=5003 loops=1)
Sort Key: r.device_id, r.created_at DESC
Sort Method: quicksort Memory: 576kB
-> Bitmap Heap Scan on device_readings r (cost=358.37..17125.66 rows=16801 width=29) (actual time=3.109..140.697 rows=5003 loops=1)
Recheck Cond: (device_id = ANY ('{1,2,3,4,5}'::integer[]))
Filter: (status = 'CRITICAL'::text)
Heap Blocks: exact=4991
-> Bitmap Index Scan on device_readings_device_id_idx (cost=0.00..354.17 rows=50000 width=0) (actual time=2.012..2.012 rows=50000 loops=1)
Index Cond: (device_id = ANY ('{1,2,3,4,5}'::integer[]))
Planning Time: 0.224 ms
Execution Time: 144.921 ms
Analysis of the `ROW_NUMBER()` Plan
The key bottleneck is the WindowAgg node preceded by a Sort. Here's the sequence of events:
1. Bitmap Index Scan: The index on device_id locates every reading for device_id IN (1, 2, 3, 4, 5). This is relatively efficient.
2. Bitmap Heap Scan: The ~50,000 fetched rows are then filtered on status = 'CRITICAL'. In our sample data, this leaves ~5,000 rows.
3. Sort: All ~5,000 surviving rows are sorted by device_id and then created_at DESC to prepare for the window function.
4. WindowAgg: The ROW_NUMBER() function is applied to the sorted data.
5. Filter: Finally, every row except those with rn <= 3 is discarded.

The database has to fetch, filter, and sort a large intermediate result set just to discard most of it. While 144 ms might seem acceptable, this execution time will grow significantly as the number of readings per device or the number of queried devices increases.
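The correlated-subquery variant mentioned at the start fares no better. Here is a sketch against our schema, shown for completeness rather than as a recommendation: for each candidate row, it counts how many newer CRITICAL readings the same device has.
-- Anti-pattern sketch: the correlated subquery runs once per outer row
SELECT r.device_id, r.metric_value, r.status, r.created_at
FROM device_readings r
WHERE r.status = 'CRITICAL'
AND r.device_id IN (1, 2, 3, 4, 5)
AND (
SELECT count(*)
FROM device_readings r2
WHERE r2.device_id = r.device_id
AND r2.status = 'CRITICAL'
AND r2.created_at > r.created_at
) < 3;
Every outer row triggers its own probe-and-count, so the work grows roughly quadratically with the number of critical readings per device.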
The `LATERAL` Join: A Paradigm Shift
A LATERAL join can be thought of as a for-each loop in SQL. For each row from the left-hand table, it executes the right-hand subquery, which can reference columns from the left-hand row. This is a powerful concept that allows us to change the fundamental query pattern.
Instead of fetching a massive dataset and then ranking it, we can say: "For each device, go and find its top 3 critical readings."
Here is the equivalent query using LATERAL:
EXPLAIN ANALYZE
SELECT
d.id AS device_id,
r.metric_value,
r.status,
r.created_at
FROM
devices d
JOIN LATERAL (
SELECT
rr.metric_value,
rr.status,
rr.created_at
FROM
device_readings rr
WHERE
rr.device_id = d.id
AND rr.status = 'CRITICAL'
ORDER BY
rr.created_at DESC
LIMIT 3
) r ON true
WHERE
d.id IN (1, 2, 3, 4, 5);
Let's examine the execution plan. This is where the magic happens.
Nested Loop (cost=0.86..41.24 rows=15 width=29) (actual time=0.046..0.129 rows=15 loops=1)
-> Index Scan using devices_pkey on devices d (cost=0.29..20.34 rows=5 width=4) (actual time=0.027..0.035 rows=5 loops=1)
Index Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
-> Limit (cost=0.57..4.17 rows=3 width=25) (actual time=0.016..0.017 rows=3 loops=5)
-> Index Scan using idx_device_readings_device_id_created_at_desc on device_readings rr (cost=0.57..1373.18 rows=1000 width=25) (actual time=0.015..0.016 rows=3 loops=5)
Index Cond: (device_id = d.id)
Filter: (status = 'CRITICAL'::text)
Planning Time: 0.457 ms
Execution Time: 0.183 ms
Analysis of the `LATERAL` Plan
This plan is fundamentally different and vastly more efficient.
1. The Nested Loop starts with the devices table. It performs a fast Index Scan on the primary key to find the 5 devices we requested.
2. For each of those 5 rows, the inner side of the loop runs the LATERAL subquery. For a given d.id, PostgreSQL performs an Index Scan on our composite index idx_device_readings_device_id_created_at_desc. Because the index is on (device_id, created_at DESC), the database can instantly jump to the section of the index for the correct device_id and read the entries in the exact order required (created_at DESC), filtering out non-CRITICAL rows as it goes.
3. The Limit node stops each inner scan as soon as 3 qualifying rows have been produced; note rows=3 and loops=5 in the plan.

The result is a staggering performance improvement: 0.183 ms vs. 144.921 ms, nearly an 800x speedup. This approach scales beautifully. Querying 100 devices will simply execute the hyper-efficient inner index scan 100 times, rather than creating a massive intermediate dataset to sort.
Indexing for `LATERAL` Supremacy: The Non-Negotiable Prerequisite
The performance of the LATERAL join depends entirely on having the correct index. Without it, every execution of the subquery would have to fetch all of a device's readings and sort them just to keep three, and the per-device loop would become disastrously slow.
Let's revisit our index:
CREATE INDEX idx_device_readings_device_id_created_at_desc
ON device_readings (device_id, created_at DESC);
Why is this specific structure so critical?
1. device_id first: The equality condition (rr.device_id = d.id) is on device_id. Placing this column first in the index allows the planner to perform a highly efficient "seek" to the block of index entries for the specific device being processed in the loop.
2. created_at DESC second: The ORDER BY rr.created_at DESC clause in the subquery matches the sort order of this part of the index. This is the key: because the index entries are already pre-sorted in the correct order, PostgreSQL doesn't need to perform a separate sort operation. It simply reads the entries from the index sequentially until the LIMIT is satisfied.

What if we added the status filter to the index? A (device_id, status, created_at DESC) index would be even better.
-- A potentially even better index
CREATE INDEX idx_device_readings_filtered_sorted
ON device_readings (device_id, status, created_at DESC);
With this index, the planner can seek to the (device_id, 'CRITICAL') block and then read the pre-sorted entries by created_at DESC. This is the pinnacle of optimization for this query pattern.
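If CRITICAL rows are a small fraction of the table, a partial index is another option worth benchmarking; this is a sketch of an alternative we have not measured above. It stores only the rows this query can ever return, so it stays smaller and cache-friendlier, at the cost of being usable only by queries whose WHERE clause implies status = 'CRITICAL'.
-- Alternative: a partial index covering only CRITICAL readings
CREATE INDEX idx_device_readings_critical
ON device_readings (device_id, created_at DESC)
WHERE status = 'CRITICAL';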
Advanced `LATERAL` Patterns and Production Edge Cases
Mastering the basic LATERAL join is just the beginning. Let's explore more complex scenarios you'll encounter in production.
Edge Case 1: Handling Devices with Fewer Than N Readings (or None)
What happens if a device has only one 'CRITICAL' reading, or none at all? JOIN LATERAL (plain JOIN is shorthand for INNER JOIN; with ON true it is equivalent to CROSS JOIN LATERAL) will exclude that device from the result set entirely, because the subquery returns no rows for it and an inner join emits output only when both sides produce a match.
This is often not the desired behavior. You might want to see all devices, even if they have no recent critical readings. The solution is LEFT JOIN LATERAL.
-- Let's add a device with no critical readings
INSERT INTO devices (hostname, location) VALUES ('new-device-ok', 'lab-1');
EXPLAIN ANALYZE
SELECT
d.id AS device_id,
d.hostname,
r.metric_value,
r.status,
r.created_at
FROM
devices d
LEFT JOIN LATERAL (
SELECT
rr.metric_value,
rr.status,
rr.created_at
FROM
device_readings rr
WHERE
rr.device_id = d.id
AND rr.status = 'CRITICAL'
ORDER BY
rr.created_at DESC
LIMIT 3
) r ON true
WHERE
d.id IN (1, 1001); -- Device 1 has readings, 1001 does not
Result:
| device_id | hostname | metric_value | status | created_at |
|---|---|---|---|---|
| 1 | device-1 | 85.12 | CRITICAL | 2023-10-26 14:30:00+00 |
| 1 | device-1 | 92.45 | CRITICAL | 2023-10-25 11:20:00+00 |
| 1 | device-1 | 88.78 | CRITICAL | 2023-10-24 09:15:00+00 |
| 1001 | new-device-ok | NULL | NULL | NULL |
The LEFT JOIN ensures that device-1001 is preserved in the output, with NULL values for the columns coming from the LATERAL subquery. This is essential for building complete and accurate dashboards.
Pattern 2: Aggregating Results with `jsonb_agg`
Often, your API doesn't want to return a flattened list of rows. It's more convenient to have one row per device, with its recent readings nested as a JSON array. This is a perfect use case for combining LATERAL with PostgreSQL's powerful JSON functions.
SELECT
d.id AS device_id,
d.hostname,
r.latest_readings
FROM
devices d
LEFT JOIN LATERAL (
SELECT
jsonb_agg(
jsonb_build_object(
'metric_value', rr.metric_value,
'status', rr.status,
'created_at', rr.created_at
) ORDER BY rr.created_at DESC
) AS latest_readings
FROM (
SELECT
rr_inner.metric_value,
rr_inner.status,
rr_inner.created_at
FROM
device_readings rr_inner
WHERE
rr_inner.device_id = d.id
AND rr_inner.status = 'CRITICAL'
ORDER BY
rr_inner.created_at DESC
LIMIT 3
) rr
) r ON true
WHERE
d.id IN (1, 2, 1001);
This query is more complex, but follows the same efficient pattern. The innermost query is our familiar, high-performance LIMIT query. The result of that is then passed to jsonb_agg within the LATERAL subquery. This aggregation happens per device, on a tiny dataset of at most 3 rows, making it extremely fast.
The final output is clean and API-ready:
| device_id | hostname | latest_readings |
|---|---|---|
| 1 | device-1 | [{"status": "CRITICAL", "created_at": "...", "metric_value": 85.12}, {"status": "CRITICAL", "created_at": "...", "metric_value": 92.45}, ...] |
| 2 | device-2 | [{"status": "CRITICAL", "created_at": "...", "metric_value": 76.54}, ...] |
| 1001 | new-device-ok | NULL |
This pattern avoids sending redundant device_id and hostname data over the wire and simplifies client-side data processing.
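One refinement worth considering, if your API prefers an empty array over NULL for devices with no critical readings: wrap the aggregated column in COALESCE in the outer select list. A minor variation on the query above:
-- Return '[]' instead of NULL when a device has no critical readings
SELECT
d.id AS device_id,
d.hostname,
COALESCE(r.latest_readings, '[]'::jsonb) AS latest_readings
FROM
devices d
LEFT JOIN LATERAL (
SELECT
jsonb_agg(
jsonb_build_object(
'metric_value', rr.metric_value,
'status', rr.status,
'created_at', rr.created_at
) ORDER BY rr.created_at DESC
) AS latest_readings
FROM (
SELECT rr_inner.metric_value, rr_inner.status, rr_inner.created_at
FROM device_readings rr_inner
WHERE rr_inner.device_id = d.id
AND rr_inner.status = 'CRITICAL'
ORDER BY rr_inner.created_at DESC
LIMIT 3
) rr
) r ON true
WHERE
d.id IN (1, 2, 1001);
Since an aggregate without GROUP BY always returns exactly one row, the LATERAL subquery here never comes back empty; COALESCE simply rewrites the NULL it produces for devices with no matching readings.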
When *Not* to Use `LATERAL` Joins
While LATERAL is a powerful tool, it's not a silver bullet. There are scenarios where a window function might be more appropriate, or even more performant.
The LATERAL subquery is executed once for every row passed to it from the left side. If your outer query is SELECT ... FROM devices LEFT JOIN LATERAL ... without a WHERE clause, you will execute the subquery 1,000 times in our example. If you had 1 million devices, that's 1 million executions. While each execution is fast, the cumulative overhead can become significant. In such cases, a single large sort for a window function might be faster, depending on the data distribution and available memory.

Always use EXPLAIN ANALYZE on a representative data sample to validate your assumptions.
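To make that concrete, here is the window-function form from earlier with the device filter removed. When you genuinely need the top 3 for all groups, this single-pass variant is the one to benchmark against the LATERAL loop:
-- All-devices variant: one scan and one sort instead of a million index probes
SELECT device_id, metric_value, status, created_at
FROM (
SELECT
r.device_id,
r.metric_value,
r.status,
r.created_at,
ROW_NUMBER() OVER (PARTITION BY r.device_id ORDER BY r.created_at DESC) AS rn
FROM device_readings r
WHERE r.status = 'CRITICAL'
) ranked
WHERE rn <= 3;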
Conclusion: A Production-Ready Pattern
The "greatest-N-per-group" query is a common source of hidden performance debt in many applications. By moving away from naive ROW_NUMBER() implementations and embracing the JOIN LATERAL pattern, you can achieve orders-of-magnitude performance improvements.
The key takeaways for senior engineers are:
- Reach for LATERAL: For this pattern, JOIN LATERAL (or LEFT JOIN LATERAL) should be your default tool in PostgreSQL.
- Index for the loop: The performance of a LATERAL join is inextricably linked to your indexing strategy. Create a composite index that matches the join condition (device_id) and the ORDER BY clause (created_at DESC) of your subquery.
- Mind the join type: Use LEFT JOIN LATERAL to ensure you don't inadvertently drop groups that have no matching data in the subquery.
- Benchmark, don't guess: Use EXPLAIN ANALYZE to compare query plans and execution times. Prove that your chosen solution is the most efficient for your specific dataset and workload.

By internalizing this pattern, you can write more scalable, efficient, and maintainable SQL, eliminating a common class of performance bottlenecks before they ever reach production.