Scaling IoT Metrics: TimescaleDB Continuous Aggregates & Compression
The Inevitable Failure of Naive Time-Series Tables
As a senior engineer tasked with building an observability platform for a fleet of IoT devices, you've likely encountered this scenario. You start with a standard PostgreSQL table, which works beautifully for the first 100 devices. By the time you reach 10,000 devices, each reporting multiple sensor readings every 10 seconds, your metrics table is growing by over 86 million rows per day. Your dashboard queries, once snappy, now take minutes to execute, locking up resources and causing cascading failures. The core problem is that traditional B-tree indexes become inefficient for time-series queries that scan vast time ranges, and aggregate functions like AVG(), MAX(), and MIN() must perform massive sequential scans over terabytes of data.
Let's define a concrete, production-grade schema to illustrate the problem. We're tracking environmental sensors across a global fleet of devices.
-- The schema for our IoT sensor readings
CREATE TABLE sensor_data (
ts TIMESTAMPTZ NOT NULL,
device_id UUID NOT NULL,
sensor_type TEXT NOT NULL, -- e.g., 'temperature', 'humidity', 'pressure'
location TEXT NOT NULL, -- e.g., 'us-east-1a-bldg7-floor3'
reading DOUBLE PRECISION NOT NULL,
PRIMARY KEY (ts, device_id, sensor_type)
);
-- An index to support per-device queries
CREATE INDEX ON sensor_data (device_id, ts DESC);
A typical dashboard query might be to find the average hourly temperature for all devices in a specific building.
-- A query that becomes prohibitively slow at scale
SELECT
time_bucket('1 hour', ts) AS hour,
location,
AVG(reading)
FROM sensor_data
WHERE
ts > NOW() - INTERVAL '7 days'
AND sensor_type = 'temperature'
AND location LIKE 'us-east-1a-bldg7%'
GROUP BY hour, location
ORDER BY hour DESC;
On a table with a few billion rows, this query forces the database to scan seven days' worth of raw data, filter it, and then perform a costly sort and group operation. This is unsustainable. This article assumes you've already made the leap to TimescaleDB and converted this table into a hypertable. We will not cover the basics of create_hypertable. Instead, we will focus on the advanced patterns required to make this system performant and cost-effective at massive scale.
Advanced Hypertable Architecture: Beyond Time Partitioning
Simply calling create_hypertable('sensor_data', 'ts') is just the first step. For high-cardinality workloads, you must leverage space partitioning to avoid creating unmanageably large chunks and to improve query performance where a secondary dimension is frequently used as a filter.
In our IoT scenario, device_id is the source of our high cardinality. By partitioning on it, we can ensure that data for a single device is co-located within a smaller set of chunks, dramatically improving the performance of queries that filter by device_id.
-- Convert to a hypertable with both time and space partitioning
-- Let's assume we want to create 16 space partitions (or 'slices')
SELECT create_hypertable('sensor_data', 'ts', 'device_id', 16);
The Trade-offs of Space Partitioning
Choosing the number of space partitions is not arbitrary. It's a trade-off:
* Too Few Partitions: If you have millions of devices, a small number of partitions (e.g., 4) means each partition still contains data from a massive number of devices. The locality benefit is diminished.
* Too Many Partitions: If you create too many partitions (e.g., 1024), you risk write amplification and connection management overhead. A single batch insert that contains data for many devices may need to write to hundreds of different chunk tables simultaneously.
* Cardinality Mismatch: The number of partitions should ideally be a multiple of the number of data nodes in a multi-node setup to ensure even distribution. For a single-node instance, a good starting point is a number close to the number of available CPU cores (e.g., 8, 16, 32) to allow for parallel operations.
Production Pattern: Start with a moderate number of space partitions (e.g., 2 * num_cores). Monitor I/O performance and query plans using EXPLAIN (ANALYZE, BUFFERS). If single-device queries are not showing significant improvements in buffer hits, and you see I/O contention during ingestion, you may need to adjust your strategy. Remember that changing partition keys on a hypertable with data is a complex migration task, so getting this right early is critical.
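A quick way to validate the partitioning choice is to look at the plan for a representative single-device query. The sketch below uses a placeholder UUID; what you want to see is chunk exclusion (only a handful of chunks touched) and buffer hits dominating reads:
-- Inspect the plan and buffer usage for a single-device query
-- (the UUID below is a placeholder, not a real device)
EXPLAIN (ANALYZE, BUFFERS)
SELECT
time_bucket('5 minutes', ts) AS bucket,
AVG(reading)
FROM sensor_data
WHERE
device_id = '00000000-0000-0000-0000-000000000001'
AND sensor_type = 'temperature'
AND ts > NOW() - INTERVAL '24 hours'
GROUP BY bucket;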
Continuous Aggregates: The Core of Time-Series Performance
Continuous Aggregates (C-Aggs) are the cornerstone of query performance in TimescaleDB. They are essentially materialized views that are automatically and incrementally updated in the background. The key is that they store pre-computed aggregates, allowing your dashboard queries to hit a much smaller, pre-summarized table instead of the raw, multi-billion-row hypertable.
Let's create a C-Agg for our slow dashboard query.
-- Create the continuous aggregate view
CREATE MATERIALIZED VIEW sensor_hourly_aggregates
WITH (timescaledb.continuous)
AS
SELECT
time_bucket('1 hour', ts) AS hour,
device_id,
location,
sensor_type,
MIN(reading) as min_reading,
MAX(reading) as max_reading,
AVG(reading) as avg_reading,
COUNT(reading) as num_readings
FROM sensor_data
GROUP BY hour, device_id, location, sensor_type;
Now, we need a policy to keep it updated.
-- Add a policy to refresh the aggregate automatically
SELECT add_continuous_aggregate_policy('sensor_hourly_aggregates',
start_offset => INTERVAL '3 hours',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '30 minutes');
Dissecting the Refresh Policy: Handling Late Data
This is where we move from basic usage to production-grade implementation. The start_offset and end_offset parameters are critical for correctness in systems where data can arrive late.
* end_offset => INTERVAL '1 hour': This tells the materialization job not to aggregate data from the most recent hour. Why? Because data for that hour might still be in transit from devices. If we materialized it immediately, we would get an incomplete, incorrect aggregate. By setting a 1-hour offset, we give data a grace period to arrive before it's summarized.
* start_offset => INTERVAL '3 hours': This tells the job how far back in time it should look for changes. When the job runs, it will refresh the region between 3 hours ago and 1 hour ago. This means if a device was offline and its data from 2.5 hours ago just arrived, the next policy run will correctly pick it up and update the aggregate. Setting this too low risks missing late data; setting it too high causes the refresh job to do unnecessary work.
Production Pattern: Your end_offset should be determined by the maximum expected data latency in your system (P99 or P99.9 latency). Your start_offset should be larger than your end_offset and is often set to a value that covers the refresh interval of the C-Agg itself, plus some buffer. For example, if the C-Agg refreshes every hour, a start_offset of 2 or 3 hours is reasonable.
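A related knob worth knowing about is real-time aggregation: if the 1-hour end_offset leaves a freshness gap your dashboards cannot tolerate, TimescaleDB can union materialized results with not-yet-materialized raw data at query time. A minimal sketch (note that the default for materialized_only has changed across TimescaleDB releases, so verify the behavior on your version):
-- Allow queries on the aggregate to include the most recent, unmaterialized data
ALTER MATERIALIZED VIEW sensor_hourly_aggregates
SET (timescaledb.materialized_only = false);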
Hierarchical Continuous Aggregates for Multi-Layer Dashboards
What if you have another dashboard that shows daily summaries? Querying the hourly aggregate we just created is much faster than the raw data, but it can still be slow if you need to summarize years of hourly data. The solution is to build a C-Agg on top of another C-Agg.
-- Create a DAILY aggregate based on our HOURLY aggregate
CREATE MATERIALIZED VIEW sensor_daily_aggregates
WITH (timescaledb.continuous)
AS
SELECT
time_bucket('1 day', hour) AS day,
location,
sensor_type,
MIN(min_reading) as min_reading_day,
MAX(max_reading) as max_reading_day,
-- Note the weighted average calculation for correctness!
SUM(avg_reading * num_readings) / SUM(num_readings) as avg_reading_day
FROM sensor_hourly_aggregates
GROUP BY day, location, sensor_type;
-- Add a policy for the daily aggregate
SELECT add_continuous_aggregate_policy('sensor_daily_aggregates',
start_offset => INTERVAL '2 days',
end_offset => INTERVAL '1 day',
schedule_interval => INTERVAL '1 hour');
CRITICAL DETAIL: Notice the calculation for avg_reading_day. You cannot simply take the average of averages (AVG(avg_reading)). This is statistically incorrect. You must store the COUNT and SUM (or AVG and COUNT) in the lower-level aggregate to correctly compute the weighted average in the higher-level aggregate. This is a common and serious mistake.
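To see why this matters, here is a toy illustration with made-up numbers: one hour with 100 readings averaging 10.0 and another with 300 readings averaging 20.0. The true average over all 400 readings is 17.5, not 15.0:
-- Averaging the averages yields 15.0; weighting by counts yields the correct 17.5
SELECT
AVG(avg_reading) AS naive_average,
SUM(avg_reading * num_readings) / SUM(num_readings) AS weighted_average
FROM (VALUES (10.0, 100), (20.0, 300)) AS hourly(avg_reading, num_readings);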
Your dashboard queries can now intelligently choose which aggregate to query based on the time range, ensuring consistently fast responses.
-- Query for the last 24 hours hits the HOURLY aggregate
SELECT * FROM sensor_hourly_aggregates WHERE hour > NOW() - INTERVAL '1 day';
-- Query for the last 90 days hits the much smaller DAILY aggregate
SELECT * FROM sensor_daily_aggregates WHERE day > NOW() - INTERVAL '90 days';
Advanced Compression: The Key to Cost-Effective Storage
With billions of rows ingested daily, storage is a primary concern. TimescaleDB's native compression is columnar, meaning it can achieve incredible compression ratios (often 90-98%) on time-series data. But default settings won't give you the best results.
Effective compression depends on ordering the data correctly before compressing it. We can control this with the compress_segmentby and compress_orderby options.
-- Enable compression with an optimized strategy
ALTER TABLE sensor_data SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'device_id, sensor_type',
timescaledb.compress_orderby = 'ts'
);
Dissecting `compress_segmentby` and `compress_orderby`
* compress_orderby = 'ts': This is the most crucial part. TimescaleDB uses algorithms like delta-of-delta encoding for timestamps and Gorilla-style compression for floats. These algorithms are most effective when data is ordered by time, as consecutive values are often very similar. Always order by your time column.
* compress_segmentby = 'device_id, sensor_type': This option tells TimescaleDB to create separate compression segments for each unique combination of device_id and sensor_type. This is extremely powerful. All the data for a single sensor on a single device will be stored together in a highly compressed format. This not only improves the compression ratio but also makes queries that filter on these columns (WHERE device_id = '...' AND sensor_type = '...') incredibly fast, as the database can decompress only the specific segments it needs.
Finally, we add a policy to compress old data chunks automatically.
-- Add a policy to compress chunks older than 7 days
SELECT add_compression_policy('sensor_data', compress_after => INTERVAL '7 days');
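Once the policy has compressed some chunks, it is worth verifying that the segmentby and orderby choices are actually paying off. A quick check using TimescaleDB's built-in stats function (column names can vary slightly between versions):
-- Compare the hypertable's size before and after compression
SELECT
pg_size_pretty(before_compression_total_bytes) AS before,
pg_size_pretty(after_compression_total_bytes) AS after
FROM hypertable_compression_stats('sensor_data');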
Production Pattern: The compress_after interval must be carefully coordinated with your C-Agg policies. You must ensure that data is aggregated before it is compressed. A common pattern is:
- Data arrives and is written to an uncompressed chunk.
- The C-Agg policy runs, materializing aggregates from recent, uncompressed chunks (e.g., data from 1-3 hours ago).
- The compression policy runs later, compressing chunks that are much older (e.g., > 7 days) and have already been fully aggregated.
This ensures that the frequent C-Agg jobs are always working with fast, uncompressed row-based data, while your long-term storage is optimized for size and scan speed.
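You can confirm this ordering is holding in production by listing chunks with their time ranges and compression state; recent chunks that are still being aggregated should show is_compressed = false, while chunks past the 7-day threshold should be compressed:
-- List chunks for sensor_data with their time ranges and compression status
SELECT chunk_name, range_start, range_end, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data'
ORDER BY range_end DESC;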
Edge Cases and Production Monitoring
Operating this system at scale requires handling inevitable real-world problems.
Edge Case 1: Schema Evolution
What happens when you need to add a firmware_version column to your sensor_data table and include it in your aggregates?
ALTER TABLE sensor_data ADD COLUMN firmware_version TEXT;
You cannot ALTER the C-Agg view directly to add a new GROUP BY column. You must drop and recreate it.
-- 1. Drop the policy first
SELECT remove_continuous_aggregate_policy('sensor_hourly_aggregates');
-- 2. Drop the view
DROP MATERIALIZED VIEW sensor_hourly_aggregates;
-- 3. Recreate the view with the new column
CREATE MATERIALIZED VIEW sensor_hourly_aggregates ...
AS
SELECT
time_bucket('1 hour', ts) AS hour,
device_id,
location,
sensor_type,
firmware_version, -- The new column
MIN(reading) as min_reading, ...
FROM sensor_data
GROUP BY hour, device_id, location, sensor_type, firmware_version;
-- 4. Re-add the policy
SELECT add_continuous_aggregate_policy(...);
Backfilling: After recreating the view, it will be empty if you created it WITH NO DATA (recommended on a large hypertable, since the default is to materialize all existing data at creation time), and the policy will only start materializing new data. To backfill historical data, you must manually call refresh_continuous_aggregate for the desired time range. This can be a long-running, resource-intensive operation that should be done during off-peak hours.
-- Manually backfill the last year of data into the new aggregate
CALL refresh_continuous_aggregate('sensor_hourly_aggregates', '2022-01-01', '2023-01-01');
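To keep the backfill manageable, one option (a sketch, not a requirement) is to refresh in smaller windows rather than a single year-long call, so each refresh stays short and can be interleaved with normal load:
-- Backfill in quarterly windows instead of one large refresh
CALL refresh_continuous_aggregate('sensor_hourly_aggregates', '2022-01-01', '2022-04-01');
CALL refresh_continuous_aggregate('sensor_hourly_aggregates', '2022-04-01', '2022-07-01');
CALL refresh_continuous_aggregate('sensor_hourly_aggregates', '2022-07-01', '2022-10-01');
CALL refresh_continuous_aggregate('sensor_hourly_aggregates', '2022-10-01', '2023-01-01');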
Edge Case 2: Monitoring Background Jobs
C-Agg refreshes, compression, and data retention policies all run as background jobs managed by the TimescaleDB scheduler. When a dashboard seems slow or data appears missing, these jobs are the first place to look.
-- Check the configuration and schedule of all TimescaleDB background jobs
SELECT
job_id,
application_name,
schedule_interval,
config,
scheduled
FROM timescaledb_information.jobs;
-- Check the run statistics for a specific job (e.g., job_id 1000)
SELECT
job_id,
job_status,
last_run_status,
last_run_started_at,
last_run_duration,
next_start,
total_runs,
total_successes,
total_failures
FROM timescaledb_information.job_stats
WHERE job_id = 1000;
These views are indispensable for debugging. A high total_failures count indicates a problem (e.g., insufficient work_mem for the job, data corruption, permissions issues) that needs immediate investigation. You can also manually trigger a job with CALL run_job(<job_id>) to test changes or force a refresh outside of its schedule.
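When total_failures is climbing, the next step is to read the actual error messages. TimescaleDB 2.9+ exposes a job_errors view for this (a sketch; on older versions, check the PostgreSQL logs instead):
-- Inspect recent error messages from failing background jobs (TimescaleDB 2.9+)
SELECT job_id, proc_name, start_time, finish_time, sqlerrcode, err_message
FROM timescaledb_information.job_errors
ORDER BY finish_time DESC
LIMIT 50;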
Performance Benchmarking: A Comparative Analysis
To demonstrate the cumulative impact of these strategies, consider the following conceptual benchmark for our hourly aggregation query over 7 days of data on a system with 5 billion rows of raw sensor data.
| Strategy | Storage Size (Est.) | Query Time (Est.) | Implementation Complexity |
|---|---|---|---|
| Standard PostgreSQL Table | 10 TB | 300 - 1800 seconds | Low |
| TimescaleDB Hypertable (Time Partitioning Only) | 10 TB | 60 - 120 seconds | Medium |
| Hypertable + Space Partitioning | 10 TB | 45 - 90 seconds | Medium |
| Hypertable + Hourly Continuous Aggregate | 10.2 TB* | 50 - 200 milliseconds | High |
| Hypertable + C-Agg + Compression | 1.2 TB** | 50 - 200 milliseconds | High |
\* Storage increases slightly due to the aggregate table.
\*\* Total storage is drastically reduced as raw data chunks are compressed.
Conclusion: From Architecture to Operation
Scaling time-series workloads in PostgreSQL is not about a single feature; it's about the strategic combination of hypertables, advanced partitioning, hierarchical continuous aggregates, and intelligent compression. Moving from a simple hypertable to this multi-layered architecture is the difference between a system that fails under load and one that can scale cost-effectively to petabytes of data while providing millisecond-level query responses.
The key takeaways for senior engineers are:
* Use space partitioning on high-cardinality keys such as device_id, and validate chunk behavior with EXPLAIN (ANALYZE, BUFFERS).
* Build hierarchical continuous aggregates, tune start_offset and end_offset around your real data latency, and compute weighted averages instead of averaging averages.
* Use compress_segmentby to align compression with your query patterns, and schedule compression to run after aggregation.
* Use the timescaledb_information schema to monitor the health and performance of the background jobs that are the lifeblood of your system.
By implementing these production-grade patterns, you can transform PostgreSQL into a world-class, highly performant time-series database capable of handling the most demanding IoT and observability workloads.