Stop Tuning `work_mem` – AI Finds the Perfect Setting Per Query
It happens at least once a week. A developer runs a report that sorts a large intermediate result. The database, obeying the default `work_mem = 4MB`, starts writing tuples to disk. The query takes 40 seconds instead of 2. You increase `work_mem` globally to 32MB. Now every sort and hash operation in every session may use up to 32MB, and your server quickly runs out of RAM under concurrency. You lower it back to 8MB. The cycle repeats.
This is the fundamental problem with static memory knobs. They are set once, for all queries, and must fit the worst‑case memory pressure of your entire workload. But not every query needs the same memory. A simple `SELECT * FROM small_table` requires almost no sort memory; a complex analytical query with a `GROUP BY` on millions of rows might need gigabytes. One size does not fit all.
AI‑driven per‑query memory allocation solves this by treating memory as a resource to be dynamically allocated per query, based on that query's specific characteristics. The AI analyses the query plan – estimated row counts, column data types, distinct values – and predicts the optimal `work_mem` for the sorts, hashes, and aggregates. It sets the memory grant locally for that query (using `SET work_mem` inside the session), then resets it afterwards. The result: disk sorts nearly vanish, memory usage is efficient, and you stop playing the global tuning game. This article dives into the technology, provides production‑ready code, and shares real‑world results.
Definition: Per‑query adaptive memory allocation is a technique that uses machine learning models to predict the optimal `work_mem` (PostgreSQL) or `sort_buffer_size`/`join_buffer_size` (MySQL) for each individual query based on its execution plan, cardinality estimates, and access patterns – replacing a single global setting.
The High Cost of Static Memory Knobs
Traditional databases have a handful of memory parameters that control internal operations:
- PostgreSQL `work_mem` – memory used for sort and hash operations before spilling to disk.
- MySQL `sort_buffer_size` and `join_buffer_size` – similar role.
- SQL Server `query memory grant` – adaptive but still requires manual base configuration.
Setting these parameters globally forces a compromise. Low settings cause disk spills and slow queries. High settings waste memory and reduce concurrency. A 2026 analysis of 500 production PostgreSQL databases found that over 60% of queries that performed sorts experienced at least one disk spill due to conservative `work_mem` settings, and 40% of those spills could have been eliminated by increasing `work_mem` for just that query without affecting others. The total wasted time due to disk spills across these databases exceeded 10,000 core‑hours per day.
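The compromise is easy to put in numbers. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes a hypothetical workload of 200 concurrent queries with two sort or hash nodes each, and it ignores `hash_mem_multiplier` and parallel workers, both of which can raise the real ceiling.

```python
def worst_case_sort_memory_mb(work_mem_mb: int, active_queries: int, nodes_per_query: int) -> int:
    """Upper bound if every sort/hash node in every active query fills its work_mem."""
    return work_mem_mb * active_queries * nodes_per_query


# Hypothetical workload: 200 concurrent queries, 2 sort/hash nodes each.
for setting_mb in (4, 32):
    total_mb = worst_case_sort_memory_mb(setting_mb, active_queries=200, nodes_per_query=2)
    print(f"work_mem={setting_mb}MB -> up to {total_mb} MB of sort/hash memory")
```

Because `work_mem` is a limit per operation rather than per connection, the bound grows with plan complexity as well as with concurrency.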
Worse, static settings cannot adapt to workload changes. A morning batch job that sorts a large table might require 1GB of memory; an evening OLTP query needs only 4MB. A single global value will always be wrong for one of them.
What this article covers:
- Query fingerprinting & classification – AI extracts plan features (row estimates, distinct counts, join types) to categorise queries into memory‑need buckets.
- Lightweight memory prediction model – XGBoost or linear regression trained on historical query execution data predicts optimal `work_mem` for each query.
- Session‑level memory overrides – Automatically issue `SET work_mem = '...'` before the query and restore after execution.
- Fallback and safety limits – Never exceed a configurable max (e.g., 1GB) and never set below a global minimum to avoid thrashing.
- Real‑time feedback loop – After execution, spill stats are fed back to the model, enabling continuous improvement.
- Production case studies – Companies eliminating disk spills completely while keeping peak memory usage under control.
- Open‑source reference implementation – Python agent that integrates with PostgreSQL via `pg_stat_statements` and plan hooks.
How AI Predicts the Perfect Memory Per Query
The core of AI‑driven memory allocation is a supervised machine learning model that takes a query’s execution plan features as input and outputs a recommended `work_mem` value. The training data comes from historical query execution with `log_temp_files` enabled, which logs each temporary file a query creates along with its size, showing which queries spilled to disk and how much temporary data they wrote. Enabling `track_io_timing` as well adds I/O timing to the collected statistics.
Step 1: Feature Extraction from Execution Plans
Each query plan is parsed to extract numerical features that correlate with memory need:
- Estimated number of rows to sort (from `EXPLAIN`).
- Estimated width of each row (bytes).
- Number of sort keys (columns).
- Use of hash aggregates vs. group aggregates.
- Presence of `DISTINCT` or `ORDER BY`.
- Number of parallel workers.
- Estimated memory used by hash tables (for hash joins).
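-- Example: Extracting sort row estimate from EXPLAIN (FORMAT JSON)
-- Assumes explain_result.plan holds the raw jsonb output (a one-element array)
-- and that the node of interest is the first child of the top plan node
SELECT (plan->0->'Plan'->'Plans'->0->>'Plan Rows')::bigint AS sort_rows
FROM explain_result;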
For PostgreSQL, the `pg_stat_statements` extension also provides `temp_blks_read` and `temp_blks_written` – actual spill indicators that become the model’s training label.
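A fixed JSON path only works when the sort sits at a known position in the plan. As a minimal sketch of a more general approach, the function below walks the whole plan tree and collects a few of the features listed above. The feature names and the sample plan are illustrative assumptions; the JSON keys (`Node Type`, `Plan Rows`, `Plan Width`, `Sort Key`, `Strategy`, `Workers Planned`) are the ones PostgreSQL emits for `EXPLAIN (FORMAT JSON)`.

```python
import json
from typing import Any, Dict, Iterator

# Illustrative plan, shaped like EXPLAIN (FORMAT JSON) output (values are made up).
SAMPLE_PLAN = """[{"Plan": {"Node Type": "Sort", "Plan Rows": 1000, "Plan Width": 40,
  "Sort Key": ["(sum(amount)) DESC"],
  "Plans": [{"Node Type": "Aggregate", "Strategy": "Hashed", "Plan Rows": 1000, "Plan Width": 40,
    "Plans": [{"Node Type": "Seq Scan", "Relation Name": "orders",
               "Plan Rows": 500000, "Plan Width": 12}]}]}}]"""


def walk_plan(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield a plan node and all of its descendants, depth first."""
    yield node
    for child in node.get("Plans", []):
        yield from walk_plan(child)


def extract_features(explain_json: str) -> Dict[str, int]:
    """Collect memory-related features from one EXPLAIN (FORMAT JSON) document."""
    root = json.loads(explain_json)[0]["Plan"]
    feats = {"sort_rows": 0, "sort_width": 0, "sort_keys": 0,
             "hash_agg_nodes": 0, "hash_join_nodes": 0, "workers_planned": 0}
    for node in walk_plan(root):
        kind = node.get("Node Type")
        rows, width = node.get("Plan Rows", 0), node.get("Plan Width", 0)
        # Keep the largest sort by estimated size (rows * width).
        if kind == "Sort" and rows * width >= feats["sort_rows"] * feats["sort_width"]:
            feats.update(sort_rows=rows, sort_width=width,
                         sort_keys=len(node.get("Sort Key", [])))
        elif kind == "Aggregate" and node.get("Strategy") == "Hashed":
            feats["hash_agg_nodes"] += 1
        elif kind == "Hash Join":
            feats["hash_join_nodes"] += 1
        feats["workers_planned"] = max(feats["workers_planned"], node.get("Workers Planned", 0))
    return feats


print(extract_features(SAMPLE_PLAN))
```

Tracking only the largest sort keeps the feature vector at a fixed length, which is a simplification; a plan with several large sorts would need per-node features or aggregates.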
Step 2: Model Training
Using historical data of query fingerprints and whether they spilled at the current `work_mem`, the AI trains a quantile regression model or a light gradient boosting machine (e.g., LightGBM) to predict the minimum `work_mem` required to avoid a spill.
# Example: Training a memory predictor (Python)
# X_train: one row of plan features per query; y_train: work_mem that prevented a spill
import lightgbm as lgb
model = lgb.LGBMRegressor(objective='quantile', alpha=0.9)  # 90th percentile for safety headroom
model.fit(X_train, y_train)
The model outputs a recommended memory grant (e.g., 128MB). A safety multiplier (e.g., 1.5x) can be added to account for plan estimate errors.
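Turning a raw prediction into a setting might look like the sketch below. The function name is hypothetical; the 1.5x multiplier and the 1GB ceiling come from the examples in this article, and the 4MB floor is an assumption that mirrors the PostgreSQL default.

```python
import math


def to_work_mem_setting(predicted_mb: float, safety: float = 1.5,
                        min_mb: int = 4, max_mb: int = 1024) -> str:
    """Pad a predicted grant, clamp it to [min_mb, max_mb], and format it for SET."""
    padded = math.ceil(predicted_mb * safety)      # headroom for plan estimate errors
    clamped = max(min_mb, min(padded, max_mb))     # never below the floor or above the cap
    return f"{clamped}MB"


print(to_work_mem_setting(128))    # padded to 192MB
print(to_work_mem_setting(2000))   # capped at the 1GB ceiling
```

Emitting whole megabytes with an explicit unit avoids any ambiguity about how the server interprets a bare number.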
Step 3: Real‑Time Application via Query Proxy
A lightweight proxy sits between the application and the database. It intercepts each `SELECT` or analytical query, calls the ML model (cached in memory, <5ms overhead), computes the recommended `work_mem`, wraps the query in a transaction that issues `SET LOCAL work_mem = '...'` before the original query, then sends the combined command. `SET LOCAL` lasts only until the end of the transaction, so `work_mem` returns to the default on commit or rollback without an explicit reset. This approach works with any PostgreSQL driver and requires zero application changes.
-- Query executed by proxy (SET LOCAL only takes effect inside a transaction block)
BEGIN;
SET LOCAL work_mem = '256MB';
SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id ORDER BY SUM(amount) DESC; -- No disk spill expected
COMMIT;
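The wrapping step itself is small. The sketch below shows one way a proxy could build the statement sequence; `wrap_with_work_mem` is a hypothetical helper, not part of any driver, and it validates the setting against a strict pattern because the value is interpolated into SQL text.

```python
import re
from typing import List

# Accept only an integer followed by a PostgreSQL memory unit, e.g. "256MB".
_VALID_SETTING = re.compile(r"^\d+(kB|MB|GB)$")


def wrap_with_work_mem(query: str, work_mem: str) -> List[str]:
    """Return the statements to run so that work_mem applies to this query only."""
    if not _VALID_SETTING.match(work_mem):
        raise ValueError(f"unexpected work_mem value: {work_mem!r}")
    return [
        "BEGIN",
        f"SET LOCAL work_mem = '{work_mem}'",   # reverts automatically at COMMIT/ROLLBACK
        query.rstrip().rstrip(";"),
        "COMMIT",
    ]


for statement in wrap_with_work_mem("SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;", "256MB"):
    print(statement)
```

If the application already manages its own transactions, a proxy would presumably inject only the `SET LOCAL` line rather than opening a second transaction.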
Step 4: Feedback Loop and Continuous Learning
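After each query, the proxy checks `temp_blks_written` to see whether a spill occurred despite the prediction. If so, it logs the query fingerprint, the grant that was used, and the volume of temporary data written. `pg_stat_statements` reports temporary blocks rather than the memory a query would have needed, so the spill size serves as a signal that the grant was too small; `EXPLAIN (ANALYZE)` on the offending query shows the sort method and disk usage in more detail. Periodically (e.g., nightly), the model is retrained on the expanded dataset, improving accuracy.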
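A feedback record can be derived from two readings of the cumulative spill counter, one before and one after the query. The sketch below assumes the default 8kB block size and uses hypothetical names (`FeedbackRow`, `build_feedback`); it treats a counter that went backwards as a statistics reset rather than a spill.

```python
from dataclasses import dataclass

BLOCK_SIZE_BYTES = 8192  # PostgreSQL default block size; adjust for non-default builds


@dataclass
class FeedbackRow:
    fingerprint: str
    granted_mb: int
    spilled_mb: float
    spilled: bool


def build_feedback(fingerprint: str, granted_mb: int,
                   temp_blks_before: int, temp_blks_after: int) -> FeedbackRow:
    """Turn two readings of the temp-block counter into one training row."""
    delta_blocks = max(0, temp_blks_after - temp_blks_before)  # negative means stats were reset
    spilled_mb = delta_blocks * BLOCK_SIZE_BYTES / (1024 * 1024)
    return FeedbackRow(fingerprint, granted_mb, spilled_mb, delta_blocks > 0)


print(build_feedback("orders_by_customer", 256, temp_blks_before=1000, temp_blks_after=1128))
```

The counters are cumulative per statement, so concurrent executions of the same statement would blend into one delta; a real collector would need to account for that.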
Real‑World Results: From 30 Seconds to 1.5 Seconds
A SaaS company running a PostgreSQL data warehouse had a daily analytics query that sorted 50 million rows. The global `work_mem` was set to 32MB to avoid OOM kills. The query spilled to disk, writing 180GB of temporary files, taking 30 seconds. After deploying AI per‑query memory allocation, the model predicted that the query needed 2.5GB of memory (available because the server had 128GB RAM and few concurrent queries). The query executed with `SET LOCAL work_mem = '2.5GB'` and finished in 1.5 seconds – a 20x speedup. Other queries continued to use the default 32MB, preserving memory for concurrency.
Across 20 analytical queries, the system reduced total elapsed time by 78% while increasing peak memory usage by only 12% (because only the heavy queries received large grants). The DBA team stopped getting paged about slow sorts.
Implementing AI‑Driven Per‑Query Work Memory
The ebook Database Management Using AI provides a complete, production‑ready implementation. The blueprint includes:
- Telemetry collector: Periodically extracts query stats from `pg_stat_statements` and execution plans from `EXPLAIN (FORMAT JSON)` for the top 100 queries by total time.
- Training pipeline: Uses historical data to train an XGBoost or LightGBM quantile regression model. Stores the model as a pickle file.
- Prediction proxy: A Python asyncio proxy that implements the PostgreSQL wire protocol (using `asyncpg` or `pg_proxy`). It caches the model in memory and applies predictions.
- Safety guards: Configurable global max (`work_mem_max`) and min (`work_mem_min`) to prevent runaway memory usage or starvation.
- Monitoring dashboard: Grafana panels showing memory grant distribution, spill elimination rate, and model accuracy over time.
For organisations not ready to deploy a full proxy, the system can run in “recommendation mode” – logging suggested `work_mem` values for DBAs to review and apply manually (using `ALTER USER` or per‑query hints).
Advanced Techniques: Plan‑Hint Integration and Concurrency Awareness
Beyond simple memory prediction, AI can also consider current system load. If the server already has high memory utilisation, the prediction model can be scaled down (e.g., multiply by a factor <1) to avoid swapping. This requires a central memory broker that tracks per‑node memory usage and coordinates grants – similar to SQL Server’s resource governor, but adaptive.
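One simple scaling rule is linear back-off above a utilisation threshold. The sketch below is an illustration of the idea, not the broker itself: the function name, the 75% soft limit, and the 4MB floor are all assumptions.

```python
def scale_for_load(recommended_mb: int, used_fraction: float,
                   soft_limit: float = 0.75, floor_mb: int = 4) -> int:
    """Shrink a recommended grant linearly once memory use passes soft_limit."""
    if not 0.0 <= used_fraction <= 1.0:
        raise ValueError("used_fraction must be between 0 and 1")
    if not 0.0 <= soft_limit < 1.0:
        raise ValueError("soft_limit must be in [0, 1)")
    if used_fraction <= soft_limit:
        return recommended_mb                      # plenty of headroom: grant in full
    factor = (1.0 - used_fraction) / (1.0 - soft_limit)   # 1.0 at the limit, 0.0 when full
    return max(floor_mb, round(recommended_mb * factor))


print(scale_for_load(1000, used_fraction=0.50))    # full grant
print(scale_for_load(1000, used_fraction=0.875))   # halved
```

A linear ramp is the simplest choice; a broker that tracks outstanding grants could make a tighter decision than one based on utilisation alone.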
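For PostgreSQL, the AI can also embed memory settings as `pg_hint_plan` comments, e.g., `/*+ Set(work_mem '1GB') */`, allowing fine‑tuning without changing `SET` commands. Because `pg_hint_plan` documents `Set` hints as changing a parameter while the planner runs, confirm on your version that the value also governs execution‑time memory before relying on it in place of `SET`. The ebook provides code to automatically inject these hints based on the ML model.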
Observability and Trust
To trust AI‑driven memory allocation, you need visibility. The ebook includes Prometheus metrics exporters that track:
- Number of queries that received a custom `work_mem` vs. default.
- Distribution of recommended memory grants (histogram).
- Spill rate before and after AI deployment.
- Model prediction error (difference between recommended and actually needed memory).
- Peak memory usage per server.
A Grafana dashboard provides real‑time feedback, and if the model’s accuracy falls below a threshold, an alert can trigger a retraining job.
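The retraining trigger can be as simple as a threshold on relative prediction error. In the sketch below, the function name and the 25% threshold are assumptions, and the median is used so that a single badly mispredicted query does not force a retrain.

```python
from statistics import median
from typing import Sequence


def should_retrain(recommended_mb: Sequence[float], needed_mb: Sequence[float],
                   max_median_error: float = 0.25) -> bool:
    """Return True when the median relative error of recent predictions is too high."""
    errors = [abs(rec - need) / need
              for rec, need in zip(recommended_mb, needed_mb) if need > 0]
    return bool(errors) and median(errors) > max_median_error


print(should_retrain([100, 110, 90], [100, 100, 100]))   # small errors
print(should_retrain([100, 200, 300], [100, 100, 100]))  # large errors
```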
Common Pitfalls and How to Avoid Them
- Over‑allocation on highly concurrent servers: The model may recommend a large grant that, if multiplied by many concurrent queries, exceeds total RAM. Solution: Implement a central memory broker that caps grants based on free memory, or use a pessimistic multiplier for high‑concurrency times.
- Cold start: No historical data → no model. Solution: Use a heuristic baseline (e.g., `work_mem = min(estimated_sort_size * 2, max_limit)`) for the first week while collecting training data.
- Plan estimate errors: The planner’s row estimates can be off by orders of magnitude, leading to incorrect memory predictions. Solution: Train the quantile regression on a high quantile (e.g., the 90th or 95th percentile) so that predictions err toward extra headroom, keep the safety multiplier, and refresh statistics regularly with `ANALYZE`.
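- Security: Granting large memory to a user query could exhaust resources. Solution: Enforce per‑user or per‑database ceilings in the proxy. `ALTER ROLE ... SET work_mem` sets only a default for new sessions, which a user can still raise with `SET`, so it cannot serve as a hard limit.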
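The cold‑start heuristic from the list above can be sketched in a few lines. It assumes the estimated sort size is simply planner rows times row width, which ignores per‑tuple overhead, and the function name and default limits are illustrative.

```python
import math


def cold_start_work_mem_mb(est_rows: int, est_width_bytes: int,
                           max_limit_mb: int = 1024, min_limit_mb: int = 4) -> int:
    """Heuristic grant: min(estimated_sort_size * 2, max_limit), with a floor."""
    est_sort_mb = est_rows * est_width_bytes / (1024 * 1024)
    doubled = math.ceil(est_sort_mb * 2)           # 2x margin while no model exists
    return max(min_limit_mb, min(doubled, max_limit_mb))


print(cold_start_work_mem_mb(1_000_000, 64))       # medium sort
print(cold_start_work_mem_mb(50_000_000, 100))     # capped at the maximum
```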