22 - Time-Series & Analytics Databases
Previous: 21 - Search Systems | Next: 23 - Distributed Task Scheduling
Time-Series Data Characteristics
Time-series data is fundamentally different from transactional data. Understanding these differences drives every architectural decision.
Example time-series record:
{ metric: "cpu_usage", host: "web-01", value: 73.2, timestamp: 1704067200 }
Characteristics:
- Append-heavy (write-once, read-many)
- Naturally ordered by time
- High cardinality (many unique tag combinations)
- Recent data queried far more than old data
- Aggregation queries dominate (avg, sum, percentile over time ranges)
- Individual records rarely updated or deleted
| Property | OLTP (Transactional) | Time-Series / OLAP |
|---|---|---|
| Write pattern | Random inserts/updates | Sequential appends |
| Read pattern | Point lookups by PK | Range scans over time |
| Query type | Single-row CRUD | Aggregations over millions of rows |
| Data lifecycle | Retained indefinitely | TTL / downsampled over time |
| Schema | Normalized (3NF) | Denormalized, wide tables |
| Index strategy | B-tree on PK/FK | Time-partitioned, columnar |
OLTP vs OLAP
OLTP OLAP
+------------------------------+ +------------------------------+
| User places order | | "What was total revenue |
| UPDATE inventory SET qty=.. | | by region last quarter?" |
| INSERT INTO orders ... | | |
| | | SELECT region, SUM(revenue) |
| Latency: < 10ms | | FROM sales |
| Rows affected: 1-10 | | WHERE date BETWEEN ... |
| Concurrency: thousands | | GROUP BY region |
+------------------------------+ | |
| Latency: seconds to minutes |
| Rows scanned: millions |
| Concurrency: tens |
+------------------------------+
| Dimension | OLTP | OLAP |
|---|---|---|
| Users | Application users | Analysts, dashboards |
| Data | Current state | Historical |
| Schema | Normalized | Star/snowflake |
| Optimized for | Writes, point reads | Reads, aggregations |
| Storage | Row-oriented | Column-oriented |
| Examples | PostgreSQL, MySQL | ClickHouse, BigQuery, Redshift |
Columnar Storage: Why It Is Fast for Analytics
Row vs Column Storage Layout
Row-oriented (PostgreSQL, MySQL):
Row 1: | user_id=1 | name="Alice" | age=30 | city="NYC" |
Row 2: | user_id=2 | name="Bob" | age=25 | city="London" |
Row 3: | user_id=3 | name="Carol" | age=35 | city="NYC" |
Disk: [1,Alice,30,NYC] [2,Bob,25,London] [3,Carol,35,NYC]
Column-oriented (ClickHouse, Parquet):
user_id: [1, 2, 3]
name: ["Alice", "Bob", "Carol"]
age: [30, 25, 35]
city: ["NYC", "London", "NYC"]
Disk: [1,2,3] [Alice,Bob,Carol] [30,25,35] [NYC,London,NYC]
Why Columnar Wins for Analytics
Query: SELECT AVG(age) FROM users WHERE city = 'NYC'
Row store: Must read ALL columns for EVERY row
Read: [1,Alice,30,NYC] [2,Bob,25,London] [3,Carol,35,NYC]
IO: 3 full rows * all columns = lots of wasted reads
Column store: Only read the 2 columns needed
Read: city = [NYC, London, NYC] --> filter: rows 0, 2
Read: age = [30, 35] --> AVG = 32.5
IO: only 2 columns, skip the rest
Three key advantages:
- Read amplification reduction: Only read columns referenced in the query
- Compression: Same-type values compress extremely well (run-length encoding, dictionary encoding, delta encoding)
- Vectorized execution: Process entire column vectors using SIMD instructions
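The read-amplification advantage can be sketched in a few lines of plain Python: the same query touches every field in a row layout, but only two columns in a columnar layout (dataset and names invented for illustration).

```python
# Illustrative sketch: the query SELECT AVG(age) WHERE city = 'NYC'
# against row-oriented vs column-oriented layouts of a tiny dataset.

rows = [
    (1, "Alice", 30, "NYC"),
    (2, "Bob", 25, "London"),
    (3, "Carol", 35, "NYC"),
]

# Column-oriented layout: one list per column.
columns = {
    "user_id": [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
    "city": ["NYC", "London", "NYC"],
}

def avg_age_row_store(rows):
    # Row store: every row (all four fields) is touched.
    ages = [age for (_, _, age, city) in rows if city == "NYC"]
    return sum(ages) / len(ages)

def avg_age_column_store(columns):
    # Column store: only the 'city' and 'age' columns are touched.
    keep = [i for i, c in enumerate(columns["city"]) if c == "NYC"]
    ages = [columns["age"][i] for i in keep]
    return sum(ages) / len(ages)

print(avg_age_row_store(rows))      # 32.5
print(avg_age_column_store(columns))  # 32.5
```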
Compression Techniques
| Technique | Best For | Example |
|---|---|---|
| Dictionary encoding | Low-cardinality strings | city: {0: "NYC", 1: "London"} -> [0, 1, 0] |
| Run-length encoding | Sorted repeated values | [NYC, NYC, NYC, London] -> [(NYC, 3), (London, 1)] |
| Delta encoding | Timestamps, sequences | [1000, 1001, 1002] -> [1000, +1, +1] |
| Bit packing | Small integers | Store 3-bit values packed into bytes |
| LZ4 / ZSTD | General blocks | Block-level compression after encoding |
Real-world compression ratios: 5-10x is typical for columnar stores. ClickHouse often achieves 10-20x on real workloads.
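Three of the encodings in the table are simple enough to sketch directly; these are minimal illustrations, not production encoders.

```python
# Minimal sketches of three encodings from the table above.

def run_length_encode(values):
    """[NYC, NYC, NYC, London] -> [(NYC, 3), (London, 1)]"""
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1] = (v, out[-1][1] + 1)
        else:
            out.append((v, 1))
    return out

def dictionary_encode(values):
    """Low-cardinality strings -> small integer codes plus a dictionary."""
    codes, mapping = [], {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
        codes.append(mapping[v])
    return codes, mapping

def delta_encode(values):
    """[1000, 1001, 1002] -> [1000, 1, 1] (first value, then diffs)."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

print(run_length_encode(["NYC", "NYC", "NYC", "London"]))
print(dictionary_encode(["NYC", "London", "NYC"]))
print(delta_encode([1000, 1001, 1002]))
```

Note how the encodings compose: dictionary-encoded codes over sorted data become long runs for RLE, and the result is then compressed again at the block level with LZ4/ZSTD.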
Time-Series Databases
InfluxDB
Purpose-built for metrics and events.
Write (line protocol):
```
cpu,host=web01,region=us-east value=73.2 1704067200000000000
```
Query (Flux language):
```flux
from(bucket: "metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu" and r.host == "web01")
  |> aggregateWindow(every: 5m, fn: mean)
```
| Feature | Detail |
|---|---|
| Storage engine | TSM (Time-Structured Merge Tree) |
| Data model | Measurement + tags + fields + timestamp |
| Retention policies | Auto-delete data after TTL |
| Downsampling | Continuous queries / tasks |
| Best for | DevOps metrics, IoT sensor data |
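A hedged sketch of how a line-protocol point is assembled (the helper name is invented; the real client libraries also handle type suffixes and escaping):

```python
# Hypothetical helper that formats an InfluxDB line-protocol point:
#   measurement,tag=... field=value timestamp-in-ns
# Illustration only -- real clients also escape special characters
# and mark integer fields with an "i" suffix.

def to_line_protocol(measurement, tags, fields, ts_ns):
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

line = to_line_protocol(
    "cpu",
    {"host": "web01", "region": "us-east"},
    {"value": 73.2},
    1704067200000000000,
)
print(line)
# cpu,host=web01,region=us-east value=73.2 1704067200000000000
```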
TimescaleDB
PostgreSQL extension for time-series. Full SQL compatibility.
```sql
-- Create hypertable (auto-partitioned by time)
SELECT create_hypertable('metrics', 'time');

-- Insert like normal SQL
INSERT INTO metrics (time, host, cpu) VALUES (NOW(), 'web01', 73.2);

-- Query with full SQL power
SELECT time_bucket('5 minutes', time) AS bucket,
       host,
       AVG(cpu) AS avg_cpu
FROM metrics
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY bucket, host
ORDER BY bucket DESC;
```
| Feature | Detail |
|---|---|
| Engine | PostgreSQL + chunked hypertables |
| Data model | Regular SQL tables, auto-partitioned by time |
| Advantage | Full SQL, JOIN with relational data, extensions |
| Compression | Native columnar compression (90%+ reduction) |
| Best for | When you need SQL + time-series in one system |
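The effect of `time_bucket` can be approximated in stock SQLite for illustration: truncate epoch seconds to 5-minute windows with integer division (table and column names invented for the sketch).

```python
# Approximating TimescaleDB's time_bucket('5 minutes', ...) in SQLite:
# (ts / 300) * 300 truncates an epoch-seconds timestamp to its bucket.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (ts INTEGER, host TEXT, cpu REAL)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [(1704067200, "web01", 70.0),
     (1704067260, "web01", 74.0),   # same 5-minute bucket
     (1704067500, "web01", 80.0)],  # next bucket
)

rows = conn.execute(
    """
    SELECT (ts / 300) * 300 AS bucket, host, AVG(cpu) AS avg_cpu
    FROM metrics
    GROUP BY bucket, host
    ORDER BY bucket
    """
).fetchall()
print(rows)  # [(1704067200, 'web01', 72.0), (1704067500, 'web01', 80.0)]
```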
Prometheus
Pull-based monitoring system with its own TSDB.
Architecture:
+-------------+ +------------+ +-----------+
| Application | <--- | Prometheus | ---> | Grafana |
| /metrics | pull | TSDB | query| Dashboard |
+-------------+ +------------+ +-----------+
|
[Alert Manager]
```promql
# Average idle-CPU rate over 5 minutes, by instance
avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# 99th percentile request latency
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
```
| Feature | Detail |
|---|---|
| Collection | Pull-based (scrapes HTTP endpoints) |
| Storage | Local TSDB (WAL + blocks, 2-hour chunks) |
| Query | PromQL (purpose-built query language) |
| Retention | Default 15 days (local), longer via Thanos/Cortex |
| Best for | Infrastructure monitoring, alerting |
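The core of PromQL's `rate()` is the per-second increase of a counter over a window; a simplified sketch over hand-made samples (real Prometheus additionally extrapolates to the window boundaries and handles counter resets):

```python
# Simplified sketch of what rate() computes for a counter: per-second
# increase over the window, from (timestamp, value) samples.

def simple_rate(samples):
    """Per-second rate between first and last sample in the window.
    (Real Prometheus also extrapolates and handles counter resets.)"""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# A counter scraped every 15s over a 5-minute window, growing 2/sec:
samples = [(t, t * 2.0) for t in range(0, 301, 15)]
print(simple_rate(samples))  # 2.0
```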
Analytics Databases
ClickHouse
Open-source columnar OLAP database. Extremely fast for analytical queries.
```sql
-- Create a MergeTree table (ClickHouse's core engine)
CREATE TABLE events (
    event_date Date,
    event_time DateTime,
    user_id UInt64,
    event_type LowCardinality(String),
    properties String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_type, user_id, event_time);

-- Query: count events by type in the last hour
SELECT event_type, count()
FROM events
WHERE event_time > now() - INTERVAL 1 HOUR
GROUP BY event_type
ORDER BY count() DESC;
```
| Feature | Detail |
|---|---|
| Storage | Columnar, compressed, sorted by primary key |
| Engine types | MergeTree family (ReplacingMergeTree, AggregatingMergeTree, etc.) |
| Performance | Scans billions of rows/sec on commodity hardware |
| Replication | ZooKeeper-based (or ClickHouse Keeper) |
| Best for | Real-time analytics, event tracking, log analysis |
Why ClickHouse is fast:
- Columnar storage with aggressive compression
- Vectorized query execution (processes data in batches, SIMD)
- Sparse primary index (granule-based, not per-row)
- Parallel query execution across cores and shards
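The sparse-index idea can be sketched in a few lines: store one "mark" per granule of a key-sorted column instead of one index entry per row, then skip whole granules whose key range cannot match the query (granule size and keys invented for illustration).

```python
# Sketch of a ClickHouse-style sparse primary index: one mark per
# granule (here, every 4 rows of a key-sorted column), so whole
# granules can be skipped during a range scan.

GRANULE = 4
sorted_keys = [10, 12, 15, 20, 31, 35, 40, 44, 52, 58, 60, 67]

# One mark per granule: the first key stored in that granule.
marks = [sorted_keys[i] for i in range(0, len(sorted_keys), GRANULE)]

def granules_to_scan(lo, hi):
    """Granule g covers keys [marks[g], next mark); return granules
    whose key range overlaps the query range [lo, hi]."""
    hits = []
    for g, start in enumerate(marks):
        end = marks[g + 1] if g + 1 < len(marks) else float("inf")
        if start <= hi and end > lo:
            hits.append(g)
    return hits

print(granules_to_scan(33, 45))  # [1] -- only the middle granule is read
```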
BigQuery (Google)
Serverless analytics warehouse. No infrastructure to manage.
```sql
SELECT DATE(created_at) AS day,
       COUNT(DISTINCT user_id) AS dau
FROM `project.dataset.events`
WHERE created_at >= '2024-01-01'
GROUP BY day
ORDER BY day;
```
| Feature | Detail |
|---|---|
| Architecture | Separation of storage (Colossus) and compute (Dremel) |
| Pricing | Per-query (bytes scanned) + storage |
| Performance | Massively parallel, auto-scales |
| Best for | Ad-hoc analytics, data warehousing, ML features |
Redshift (AWS)
Managed columnar data warehouse based on ParAccel (PostgreSQL fork).
| Feature | Detail |
|---|---|
| Architecture | Cluster of nodes (leader + compute nodes) |
| Storage | Columnar, zone maps, sort keys |
| Scaling | Resize cluster or use Redshift Serverless |
| Distribution styles | KEY, EVEN, ALL (affects join performance) |
| Best for | Enterprise data warehousing, BI dashboards |
Data Warehousing Concepts
Star Schema
The most common analytical schema. Simple, fast to query.
+----------------+
| dim_date |
| date_key (PK) |
| year, month, |
| quarter, ... |
+-------+--------+
|
+---------------+ | +----------------+
| dim_product | | | dim_customer |
| product_key +--+ | +--+ customer_key |
| name, category| | | | | name, region |
+---------------+ | | | +----------------+
| | |
+----+--+--+----+
| fact_sales |
| date_key (FK) |
| product_key |
| customer_key |
| quantity |
| revenue |
| discount |
+----------------+
Properties:
- Central fact table (measurable events: sales, clicks, orders)
- Dimension tables radiate outward (who, what, when, where)
- Denormalized dimensions (one JOIN to get any dimension attribute)
- Optimized for aggregation queries
Snowflake Schema
Normalized dimensions. Reduces redundancy at the cost of more JOINs.
dim_category dim_product fact_sales
+------------+ +-------------+ +------------+
| cat_key | <----- | product_key | <--- | product_key|
| cat_name | | name | | date_key |
+------------+ | cat_key (FK)| | revenue |
+-------------+ +------------+
Star vs Snowflake
| Property | Star | Snowflake |
|---|---|---|
| Query complexity | Simpler (fewer JOINs) | More JOINs |
| Query performance | Faster | Slower |
| Storage | More redundancy | Less redundancy |
| ETL complexity | Simpler | More complex |
| Recommendation | Default choice | Only if storage is a concern |
Fact and Dimension Tables
| Aspect | Fact Table | Dimension Table |
|---|---|---|
| Content | Measurements, metrics | Descriptive attributes |
| Size | Very large (billions of rows) | Small to medium |
| Growth | Rapid (new events constantly) | Slow (new products, users) |
| Examples | Sales, clicks, impressions | Products, users, dates, regions |
| Keys | Foreign keys to dimensions | Surrogate keys (auto-increment) |
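A star-schema query in miniature, using SQLite for illustration: the fact table is aggregated with one JOIN per dimension (names mirror the diagram above; only one dimension is shown).

```python
# The star schema above, sketched in SQLite: aggregate the fact table,
# one JOIN to a denormalized dimension.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY,
                               name TEXT, region TEXT);
    CREATE TABLE fact_sales (customer_key INTEGER, revenue REAL);
""")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                 [(1, "Alice", "NA"), (2, "Bob", "EU")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 75.0)])

# Typical star-schema query: one hop from fact to dimension attribute.
rows = conn.execute("""
    SELECT c.region, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    GROUP BY c.region ORDER BY c.region
""").fetchall()
print(rows)  # [('EU', 75.0), ('NA', 150.0)]
```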
Materialized Views and Pre-Aggregation
Materialized Views
Pre-computed query results stored as tables. Trade storage for query speed.
```sql
-- PostgreSQL materialized view
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT date_trunc('day', order_time) AS day,
       product_category,
       SUM(revenue) AS total_revenue,
       COUNT(*) AS order_count
FROM orders
GROUP BY 1, 2;

-- Refresh (full recomputation)
REFRESH MATERIALIZED VIEW daily_revenue;

-- ClickHouse: use AggregatingMergeTree for incremental materialized views
CREATE MATERIALIZED VIEW daily_revenue_mv
ENGINE = AggregatingMergeTree()
ORDER BY (day, product_category)
AS SELECT
    toDate(order_time) AS day,
    product_category,
    sumState(revenue) AS total_revenue,
    countState() AS order_count
FROM orders
GROUP BY day, product_category;
```
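The incremental variant boils down to keeping a partial aggregate state per group and folding each new row into it, rather than re-scanning the base table on every refresh; a minimal sketch:

```python
# Sketch of the idea behind incremental materialized views: a partial
# aggregate state per (day, category) group, updated row by row.
from collections import defaultdict

state = defaultdict(lambda: {"sum": 0.0, "count": 0})

def ingest(day, category, revenue):
    s = state[(day, category)]
    s["sum"] += revenue
    s["count"] += 1

ingest("2024-01-01", "books", 10.0)
ingest("2024-01-01", "books", 15.0)
ingest("2024-01-02", "books", 7.0)

s = state[("2024-01-01", "books")]
print(s["sum"], s["count"])  # 25.0 2
```

This is why sum and count states (rather than a stored average) appear in such views: partial states merge cleanly, while averages do not.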
Rollup and Downsampling
Reduce storage by aggregating old high-resolution data into lower resolution.
Raw data (1-second resolution):
12:00:01 cpu=72.3
12:00:02 cpu=73.1
12:00:03 cpu=71.8
...
(86,400 points per day per metric)
After 7 days --> downsample to 1-minute:
12:00 cpu_avg=72.4, cpu_max=73.1, cpu_min=71.8
12:01 cpu_avg=74.2, cpu_max=76.0, cpu_min=73.5
(1,440 points per day -- 60x reduction)
After 30 days --> downsample to 1-hour:
12:00 cpu_avg=73.8, cpu_max=82.1, cpu_min=68.2
(24 points per day -- 3,600x reduction)
Retention policy example:
| Age | Resolution | Storage per metric/day |
|---|---|---|
| 0-7 days | 1 second | 86,400 points |
| 7-30 days | 1 minute | 1,440 points |
| 30-365 days | 1 hour | 24 points |
| 1+ years | 1 day | 1 point |
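The rollup step itself is a small grouping operation; a sketch that buckets 1-second samples into 1-minute avg/max/min (timestamps invented):

```python
# Sketch of the downsampling step above: roll raw (epoch_seconds, value)
# samples up into fixed-width avg/max/min buckets.
from collections import defaultdict

def downsample(points, bucket_seconds=60):
    """points: list of (epoch_seconds, value) -> {bucket: (avg, max, min)}"""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return {b: (round(sum(v) / len(v), 2), max(v), min(v))
            for b, v in sorted(buckets.items())}

raw = [(43201, 72.3), (43202, 73.1), (43203, 71.8), (43262, 74.2)]
print(downsample(raw))
# {43200: (72.4, 73.1, 71.8), 43260: (74.2, 74.2, 74.2)}
```

Note that max and min must be kept alongside the average: once downsampled, a spike inside a bucket is unrecoverable from the average alone.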
Lambda vs Kappa Architecture
Lambda Architecture
Separate batch and real-time processing paths. Merge results at query time.
+-------------------+
| Data Source |
+--------+----------+
|
+------------+------------+
| |
+--------+--------+ +--------+--------+
| Batch Layer | | Speed Layer |
| (Hadoop/Spark) | | (Flink/Storm) |
| Complete, accurate| | Low-latency, |
| Hours-old results | | approximate |
+--------+---------+ +--------+---------+
| |
+--------+---------+ +--------+---------+
| Batch Views | | Real-time Views |
| (pre-computed) | | (incremental) |
+--------+---------+ +--------+---------+
| |
+------------+------------+
|
+--------+----------+
| Serving Layer |
| Merge batch + |
| real-time results |
+-------------------+
Pros: Batch layer is simple and recomputable; speed layer handles freshness.
Cons: Two codebases, complex merging logic, operational overhead.
Kappa Architecture
Single stream processing path handles everything. Reprocessing = replay the stream.
Data Source --> Kafka (immutable log) --> Stream Processor (Flink)
| |
| +------+------+
| | Serving |
| | Layer |
| +-------------+
|
To reprocess:
Reset consumer offset to beginning
Replay all events through new processor version
Pros: One codebase, simpler operations, single source of truth (the log).
Cons: Reprocessing can be slow for large history; not all problems fit streaming.
When to Use Which
| Scenario | Recommendation |
|---|---|
| Legacy batch + new real-time need | Lambda (add speed layer) |
| Greenfield analytics pipeline | Kappa (simpler) |
| Complex ML training + real-time serving | Lambda (batch for training) |
| Event-driven microservices | Kappa (Kafka + Flink) |
| Both approaches are overkill | Just use a good OLAP DB (ClickHouse, BigQuery) |
Interview Tips
- Know when to reach for a columnar store. If the interviewer mentions dashboards, analytics, or "aggregate over millions of rows," that is your signal.
- Star schema is the default. Don't overcomplicate with snowflake schema unless storage is explicitly constrained.
- Explain why columnar compression works. Same data types in a column compress far better than mixed types in a row. This is the key insight.
- Downsampling is essential for time-series at scale. Show you understand retention policies and the storage/resolution trade-off.
- Lambda vs Kappa is a common follow-up in streaming questions. Know the trade-offs, and lean toward Kappa for newer systems.
- Mention separation of storage and compute. Modern warehouses (BigQuery, Snowflake, Redshift Serverless) separate these, enabling independent scaling.
Resources
- DDIA (Kleppmann) -- Chapter 3: Storage and Retrieval (column-oriented storage), Chapter 10: Batch Processing, Chapter 11: Stream Processing
- System Design Interview Vol. 2 (Alex Xu) -- Metrics monitoring and alerting system
- ClickHouse Documentation: MergeTree engine family, query optimization
- "The Lambda Architecture" -- Nathan Marz (original blog post)
- "Questioning the Lambda Architecture" -- Jay Kreps (Kappa proposal)
- Google Dremel Paper: "Dremel: Interactive Analysis of Web-Scale Datasets"
- Prometheus Documentation: TSDB storage, PromQL
- "The Data Warehouse Toolkit" (Ralph Kimball) -- Star schema, dimensional modeling
- TimescaleDB Documentation: Hypertables, continuous aggregates
- InfluxDB Documentation: TSM engine, Flux query language