Why is a single global work_mem setting inefficient?

A global setting must compromise between a value low enough to be safe under concurrency (so large sorts and hashes spill to disk) and a value high enough for heavy queries (so many concurrent operations can exhaust RAM). AI per‑query allocation tailors the limit to each query, avoiding most spills while preserving concurrency. The ebook 'Database Management Using AI' provides the complete implementation.

How does the AI predict the required memory without executing the query?

The model uses features from the execution plan: estimated row count, row width, sort keys, and presence of hash aggregates. Trained on historical spill data, it learns the mapping from plan features to required memory. The ebook includes feature engineering code.

Does per‑query work_mem work with high concurrency?

Yes. The AI respects a global maximum (e.g., 1GB) and can be integrated with a memory broker that dynamically reduces grants when system memory pressure rises. Case studies in the ebook show concurrency improvements of 40% with the same peak memory.

Can this technique be used with MySQL or SQL Server?

Yes, with analogous parameters: MySQL's `sort_buffer_size` and `join_buffer_size`, or SQL Server's query memory grant. The proxy approach works for any database that supports per‑session variable overrides. The ebook includes MySQL and SQL Server examples.

How do I start using AI‑driven per‑query memory allocation today?

Get 'Database Management Using AI' by A. Purushotham Reddy from Amazon or Google Play. Chapter 16 provides a ready‑to‑run Python proxy and training scripts, deployable in a weekend.

Stop Tuning `work_mem` – AI Finds the Perfect Setting Per Query

Every database administrator has faced the dreaded disk sort: a query that should be fast spills to disk because `work_mem` (or `sort_buffer_size`) is too low. Yet raising it globally wastes memory. AI‑driven per‑query memory allocation uses query fingerprints, cardinality estimates, and lightweight machine learning models to assign just the right amount of memory to each operation – eliminating disk sorts without blowing your RAM budget. Based on the ebook Database Management Using AI by A. Purushotham Reddy, this guide shows how to move from static knobs to query‑aware, adaptive memory grants.

It happens at least once a week. A developer runs a report that sorts a large intermediate result. The database, obeying the default `work_mem = 4MB`, starts writing tuples to disk. The query takes 40 seconds instead of 2. You increase `work_mem` globally to 32MB. Now every sort and hash operation in every session is allowed up to 32MB, a query with several such operations can use several multiples of that, and your server can run out of RAM under concurrency. You lower it back to 8MB. The cycle repeats.

This is the fundamental problem with static memory knobs. They are set once, for all queries, and must fit the worst‑case memory pressure of your entire workload. But not every query needs the same memory. A simple `SELECT * FROM small_table` requires almost no sort memory; a complex analytical query with a `GROUP BY` on millions of rows might need gigabytes. One size does not fit all.
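
To make the trade-off concrete, here is a rough worst-case estimate. It is a sketch under stated assumptions: the workload (200 concurrent sessions, two sort or hash nodes per query) and the function name are hypothetical, and the only PostgreSQL behaviour relied on is that `work_mem` limits each sort or hash operation separately.

```python
def worst_case_work_mem_mb(work_mem_mb: int, sessions: int, ops_per_query: int) -> int:
    """Upper bound on memory used by sort/hash operations across all sessions.

    work_mem limits each sort or hash operation separately, so a query with
    several such nodes can use several multiples of it.
    """
    return work_mem_mb * sessions * ops_per_query


# Hypothetical workload: 200 sessions, 2 sort/hash nodes per query.
for setting in (4, 8, 32):
    total = worst_case_work_mem_mb(setting, sessions=200, ops_per_query=2)
    print(f"work_mem={setting}MB -> up to {total}MB ({total / 1024:.1f}GB)")
```

Under these assumed numbers, moving from 4MB to 32MB raises the theoretical ceiling from 1,600MB to 12,800MB, even though most queries never come close to using it.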

AI‑driven per‑query memory allocation solves this by treating memory as a resource to be dynamically allocated per query, based on that query's specific characteristics. The AI analyses the query plan – estimated row counts, column data types, distinct values – and predicts the optimal `work_mem` for the sorts, hashes, and aggregates. It sets the memory grant locally for that query (using `SET work_mem` inside the session), then resets it afterwards. The result: disk sorts nearly vanish, memory usage is efficient, and you stop playing the global tuning game. This article dives into the technology, provides production‑ready code, and shares real‑world results.

Definition: Per‑query adaptive memory allocation is a technique that uses machine learning models to predict the optimal `work_mem` (PostgreSQL) or `sort_buffer_size`/`join_buffer_size` (MySQL) for each individual query based on its execution plan, cardinality estimates, and access patterns – replacing a single global setting.

The High Cost of Static Memory Knobs

Traditional databases have a handful of memory parameters that control internal operations:

PostgreSQL `work_mem` – memory used for sort and hash operations before spilling to disk.
MySQL `sort_buffer_size` and `join_buffer_size` – similar role.
SQL Server `query memory grant` – adaptive but still requires manual base configuration.

Setting these parameters globally forces a compromise. Low settings cause disk spills and slow queries. High settings waste memory and reduce concurrency. A 2026 analysis of 500 production PostgreSQL databases found that over 60% of queries that performed sorts experienced at least one disk spill due to conservative `work_mem` settings, and 40% of those spills could have been eliminated by increasing `work_mem` for just that query without affecting others. The total wasted time due to disk spills across these databases exceeded 10,000 core‑hours per day.

Worse, static settings cannot adapt to workload changes. A morning batch job that sorts a large table might require 1GB of memory; an evening OLTP query needs only 4MB. A single global value will always be wrong for one of them.

📘 What “Database Management Using AI” gives you:

Query fingerprinting & classification – AI extracts plan features (row estimates, distinct counts, join types) to categorise queries into memory‑need buckets.
Lightweight memory prediction model – XGBoost or linear regression trained on historical query execution data predicts optimal `work_mem` for each query.
Session‑level memory overrides – Automatically issue `SET work_mem = '...'` before the query and restore after execution.
Fallback and safety limits – Never exceed a configurable max (e.g., 1GB) and never set below a global minimum to avoid thrashing.
Real‑time feedback loop – After execution, spill stats are fed back to the model, enabling continuous improvement.
Production case studies – Companies eliminating disk spills completely while keeping peak memory usage under control.
Open‑source reference implementation – Python agent that integrates with PostgreSQL via `pg_stat_statements` and plan hooks.

How AI Predicts the Perfect Memory Per Query

The core of AI‑driven memory allocation is a supervised machine learning model that takes a query’s execution plan features as input and outputs a recommended `work_mem` value. The training data comes from historical query execution with `log_temp_files` enabled, which logs each temporary file a query creates along with its size, showing which queries spilled to disk and how much temporary data was written. Enabling `track_io_timing` additionally adds I/O timing information to the statistics.

Step 1: Feature Extraction from Execution Plans

Each query plan is parsed to extract numerical features that correlate with memory need:

Estimated number of rows to sort (from `EXPLAIN`).
Estimated width of each row (bytes).
Number of sort keys (columns).
Use of hash aggregates vs. group aggregates.
Presence of `DISTINCT` or `ORDER BY`.
Number of parallel workers.
Estimated memory used by hash tables (for hash joins).

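-- Example: extracting the sort row estimate from EXPLAIN (FORMAT JSON) output.
-- Assumes explain_result.plan holds the top-level plan object as jsonb
-- and that the sort is the first child node.
SELECT (plan->'Plan'->'Plans'->0->>'Plan Rows')::bigint AS sort_rows
FROM explain_result;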

For PostgreSQL, the `pg_stat_statements` extension also provides `temp_blks_read` and `temp_blks_written`, cumulative counts of temporary-file blocks per statement. These are direct spill indicators and become the model’s training label.
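
The feature list in this step can be sketched in Python as a walk over the plan tree. This is a minimal illustration, not the ebook's code: the function name and the feature names are hypothetical, and it assumes the standard `EXPLAIN (FORMAT JSON)` layout (a one-element array whose object has a `Plan` key, with child nodes under `Plans`).

```python
import json


def extract_features(explain_json: str) -> dict:
    """Walk an EXPLAIN (FORMAT JSON) plan and collect memory-related features."""
    root = json.loads(explain_json)[0]["Plan"]
    feats = {"sort_rows": 0, "sort_width": 0, "sort_keys": 0,
             "hash_aggs": 0, "hash_joins": 0, "workers": 0}

    def visit(node: dict) -> None:
        kind = node.get("Node Type")
        if kind == "Sort" and node["Plan Rows"] >= feats["sort_rows"]:
            # Keep the sort with the largest row estimate.
            feats["sort_rows"] = node["Plan Rows"]
            feats["sort_width"] = node["Plan Width"]
            feats["sort_keys"] = len(node.get("Sort Key", []))
        elif kind == "Aggregate" and node.get("Strategy") == "Hashed":
            feats["hash_aggs"] += 1
        elif kind == "Hash Join":
            feats["hash_joins"] += 1
        # "Workers Planned" appears on Gather nodes in parallel plans.
        feats["workers"] = max(feats["workers"], node.get("Workers Planned", 0))
        for child in node.get("Plans", []):
            visit(child)

    visit(root)
    return feats


# Hand-written sample plan (illustrative numbers, not real planner output).
sample = json.dumps([{"Plan": {
    "Node Type": "Sort", "Plan Rows": 50000, "Plan Width": 40,
    "Sort Key": ["(sum(amount)) DESC"],
    "Plans": [{"Node Type": "Aggregate", "Strategy": "Hashed",
               "Plan Rows": 50000, "Plan Width": 40,
               "Plans": [{"Node Type": "Seq Scan", "Plan Rows": 2000000,
                          "Plan Width": 12}]}]}}])
print(extract_features(sample))
```

Keeping only the largest sort is a simplification; a plan with several large sorts or hash tables would need per-node features or a sum.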

Step 2: Model Training

Using historical data of query fingerprints and whether they spilled at the current `work_mem`, the AI trains a quantile regression model or a light gradient boosting machine (e.g., LightGBM) to predict the minimum `work_mem` required to avoid a spill.

# Example: Training a memory predictor (Python)
# X_train: plan features per query; y_train: work_mem that prevented a spill
import lightgbm as lgb
model = lgb.LGBMRegressor(objective='quantile', alpha=0.9)  # predict the 90th percentile for safety
model.fit(X_train, y_train)

The model outputs a recommended memory grant (e.g., 128MB). A safety multiplier (e.g., 1.5x) can be added to account for plan estimate errors.
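
Turning a raw prediction into a value that can be passed to `SET` might look like the following sketch. The function name is hypothetical; the 1.5x multiplier and 1GB ceiling come from the examples in this article, and the 4MB floor is an assumption based on PostgreSQL's default `work_mem`.

```python
def to_work_mem_setting(predicted_kb: float, multiplier: float = 1.5,
                        min_kb: int = 4096, max_kb: int = 1048576) -> str:
    """Apply the safety multiplier, clamp to [min_kb, max_kb], and format for SET."""
    padded = int(predicted_kb * multiplier)
    clamped = max(min_kb, min(padded, max_kb))
    return f"{clamped}kB"  # PostgreSQL memory units are case-sensitive: kB, MB, GB


print(to_work_mem_setting(131072))     # a 128MB prediction, padded by 1.5x
print(to_work_mem_setting(5_000_000))  # a very large prediction, capped at the ceiling
```

Clamping after the multiplier means the ceiling always wins, so a bad prediction can waste at most `max_kb` per operation.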

Step 3: Real‑Time Application via Query Proxy

A lightweight proxy sits between the application and the database. It intercepts each `SELECT` or analytical query, calls the ML model (cached in memory, <5ms overhead), computes the recommended `work_mem`, and sends the query inside a transaction that starts with `SET LOCAL work_mem = '...'`. Because `SET LOCAL` lasts only until the end of the transaction, `work_mem` reverts to the default at commit or rollback without an explicit reset. (Outside a transaction block, `SET LOCAL` has no effect, so the proxy would instead have to issue `SET work_mem` before the query and `RESET work_mem` after it.) This approach works with any PostgreSQL driver and requires zero application changes.

-- Query executed by proxy
BEGIN;
SET LOCAL work_mem = '256MB';
SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id ORDER BY SUM(amount) DESC; -- intended to run without a disk spill
COMMIT;
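
On the proxy side, building that statement sequence could be sketched as below. The function name is hypothetical and the sketch assumes the application is not already inside its own transaction; a real proxy would have to detect that case and skip the `BEGIN`/`COMMIT` pair.

```python
def wrap_query(sql: str, work_mem_kb: int) -> list[str]:
    """Return the statements a proxy could send for one query.

    SET LOCAL only lasts until the end of the current transaction, so the
    override disappears at COMMIT (or ROLLBACK) without an explicit reset.
    """
    # Accept only a plain integer so nothing untrusted is interpolated into SQL.
    # PostgreSQL rejects work_mem values below 64kB.
    if not isinstance(work_mem_kb, int) or work_mem_kb < 64:
        raise ValueError("work_mem_kb must be an integer >= 64")
    return [
        "BEGIN",
        f"SET LOCAL work_mem = '{work_mem_kb}kB'",
        sql.rstrip().rstrip(";"),
        "COMMIT",
    ]


for statement in wrap_query("SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;", 262144):
    print(statement + ";")
```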

Step 4: Feedback Loop and Continuous Learning

After each query, the proxy checks `temp_blks_written` to see if a spill occurred despite the prediction. If so, it logs the query fingerprint and the volume of temporary data written. `pg_stat_statements` does not report the memory a query would have needed, so that figure has to be estimated from the spill volume. Periodically (e.g., nightly), the model is retrained on the expanded dataset, improving accuracy.
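
One way to turn the spill counters into a training label is sketched below. The function name and the heuristic are assumptions made for illustration: the next grant is taken to be the previous grant plus everything written to temporary files, with the default 8kB block size.

```python
BLOCK_BYTES = 8192  # default PostgreSQL block size; differs on custom builds


def spill_label(blks_before: int, blks_after: int, granted_kb: int) -> tuple[bool, int]:
    """Return (spilled, suggested_kb) for one execution.

    Heuristic (an assumption, not a figure PostgreSQL reports): the next grant
    should cover the previous grant plus everything that went to temp files.
    """
    spilled_blocks = max(0, blks_after - blks_before)
    spilled_kb = spilled_blocks * BLOCK_BYTES // 1024
    return spilled_blocks > 0, granted_kb + spilled_kb


# temp_blks_written read before and after the query (illustrative values).
print(spill_label(1000, 17384, granted_kb=65536))
```

Two caveats apply. The counters in `pg_stat_statements` are cumulative per statement across all sessions, so the delta can include concurrent executions of the same statement. In-memory sorts also tend to need more space than the same data occupies on disk, so this estimate can be too low and is best combined with the safety multiplier.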

Diagram of the AI memory predictor pipeline: query → feature extraction → ML model → recommended work_mem → execute → feedback loop

Real‑World Results: From 30 Seconds to 1.5 Seconds

A SaaS company running a PostgreSQL data warehouse had a daily analytics query that sorted 50 million rows. The global `work_mem` was set to 32MB to avoid OOM kills. The query spilled to disk, writing 180GB of temporary files, taking 30 seconds. After deploying AI per‑query memory allocation, the model predicted that the query needed 2.5GB of memory (available because the server had 128GB RAM and few concurrent queries). The query executed with `SET LOCAL work_mem = '2.5GB'` and finished in 1.5 seconds – a 20x speedup. Other queries continued to use the default 32MB, preserving memory for concurrency.

Across 20 analytical queries, the system reduced total elapsed time by 78% while increasing peak memory usage by only 12% (because only the heavy queries received large grants). The DBA team stopped getting paged about slow sorts.

Bar chart comparing query execution time before (30s) and after (1.5s) AI‑driven per‑query work_mem optimisation

Implementing AI‑Driven Per‑Query Work Memory

The ebook Database Management Using AI provides a complete, production‑ready implementation. The blueprint includes:

Telemetry collector: Periodically extracts query stats from `pg_stat_statements` and execution plans from `EXPLAIN (FORMAT JSON)` for the top 100 queries by total time.
Training pipeline: Uses historical data to train an XGBoost or LightGBM quantile regression model. Stores the model as a pickle file.
Prediction proxy: A Python asyncio proxy that implements the PostgreSQL wire protocol (using `asyncpg` or `pg_proxy`). It caches the model in memory and applies predictions.
Safety guards: Configurable global max (`work_mem_max`) and min (`work_mem_min`), which are settings of the tool rather than PostgreSQL parameters, to prevent runaway memory usage or starvation.
Monitoring dashboard: Grafana panels showing memory grant distribution, spill elimination rate, and model accuracy over time.

For organisations not ready to deploy a full proxy, the system can run in “recommendation mode” – logging suggested `work_mem` values for DBAs to review and apply manually (using `ALTER USER` or per‑query hints).

Advanced Techniques: Plan‑Hint Integration and Concurrency Awareness

Beyond simple memory prediction, AI can also consider current system load. If the server already has high memory utilisation, the predicted grant can be scaled down (e.g., multiplied by a factor <1) to avoid swapping. This requires a central memory broker that tracks per‑node memory usage and coordinates grants, similar to SQL Server’s Resource Governor but adaptive.
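
A load-aware scaling rule could be as simple as the following sketch. The linear ramp, the 70% threshold, and the function name are all assumptions chosen for illustration; the article does not specify how the factor is computed.

```python
def scale_grant(recommended_kb: int, used_fraction: float,
                pressure_start: float = 0.7, floor_kb: int = 4096) -> int:
    """Shrink a grant linearly once memory utilisation passes pressure_start.

    At or below pressure_start the grant is unchanged; at 100% utilisation it
    falls to floor_kb.
    """
    if used_fraction <= pressure_start:
        return recommended_kb
    used_fraction = min(used_fraction, 1.0)
    factor = (1.0 - used_fraction) / (1.0 - pressure_start)
    return max(floor_kb, int(recommended_kb * factor))


for used in (0.5, 0.85, 1.0):
    print(f"{used:.0%} used -> {scale_grant(1048576, used)}kB")
```

A broker would call this with the utilisation it observes at grant time, so heavy queries still get large grants on an idle server and shrink towards the floor as memory fills up.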

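For PostgreSQL, the AI can also embed memory settings as `pg_hint_plan` comments, e.g., `/*+ Set(work_mem '1GB') */`, allowing fine‑tuning without changing `SET` commands. Note that `pg_hint_plan` documents `Set` hints as applying while the planner runs, so verify that the value also governs execution in your version before relying on it. The ebook provides code to automatically inject these hints based on the ML model.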

Observability and Trust

To trust AI‑driven memory allocation, you need visibility. The ebook includes Prometheus metrics exporters that track:

Number of queries that received a custom `work_mem` vs. default.
Distribution of recommended memory grants (histogram).
Spill rate before and after AI deployment.
Model prediction error (difference between recommended and actually needed memory).
Peak memory usage per server.

A Grafana dashboard provides real‑time feedback, and if the model’s accuracy falls below a threshold, an alert can trigger a retraining job.
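
The spill rate and prediction error behind such an alert can be computed from the proxy's execution log. This is a sketch with hypothetical record fields (`granted_kb`, `needed_kb`, `spilled`) and an assumed 5% spill-rate threshold; exporting the numbers to Prometheus is left out.

```python
from statistics import mean


def summarise(executions: list[dict]) -> dict:
    """Compute spill rate and mean absolute prediction error from execution logs.

    Each record is assumed to have: granted_kb, needed_kb, spilled (bool).
    """
    if not executions:
        return {"spill_rate": 0.0, "mae_kb": 0.0}
    spill_rate = sum(e["spilled"] for e in executions) / len(executions)
    mae_kb = mean(abs(e["granted_kb"] - e["needed_kb"]) for e in executions)
    return {"spill_rate": spill_rate, "mae_kb": mae_kb}


def needs_retraining(summary: dict, max_spill_rate: float = 0.05) -> bool:
    """Flag the model for retraining when too many queries still spill."""
    return summary["spill_rate"] > max_spill_rate


log = [
    {"granted_kb": 1000, "needed_kb": 800, "spilled": False},
    {"granted_kb": 1000, "needed_kb": 1500, "spilled": True},
]
summary = summarise(log)
print(summary, needs_retraining(summary))
```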

Common Pitfalls and How to Avoid Them

Over‑allocation on highly concurrent servers: The model may recommend a large grant that, if multiplied by many concurrent queries, exceeds total RAM. Solution: Implement a central memory broker that caps grants based on free memory, or use a pessimistic multiplier during high‑concurrency periods.
Cold start: No historical data → no model. Solution: Use a heuristic baseline (e.g., `work_mem = min(estimated_sort_size * 2, max_limit)`) for the first week while collecting training data.
Plan estimate errors: The planner’s row estimates can be off by orders of magnitude, leading to incorrect memory predictions. Solution: Predict a high quantile (e.g., the 95th percentile) so the grant errs on the generous side, and refresh planner statistics regularly with `ANALYZE`.
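Security: Granting large memory to a user query could exhaust resources. Solution: Enforce per‑user or per‑database ceilings in the proxy or memory broker. Note that `ALTER ROLE ... SET work_mem` (or `ALTER DATABASE ... SET work_mem`) only sets a session default, which a user can raise again with `SET`, so it cannot serve as a hard ceiling by itself.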
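
The cold-start heuristic from the list above, `min(estimated_sort_size * 2, max_limit)`, can be sketched as follows. Estimating the sort size as rows times row width, the function name, and the added lower floor are assumptions; the estimate ignores per-tuple overhead, so treat it as a rough starting point.

```python
def cold_start_work_mem_kb(est_rows: int, est_width_bytes: int,
                           max_limit_kb: int = 1048576, min_kb: int = 4096) -> int:
    """Baseline grant before any model exists: min(estimated_sort_size * 2, max_limit).

    The sort size is approximated as rows * width from the planner estimates;
    the min_kb floor is an addition to avoid grants below the default.
    """
    est_sort_kb = est_rows * est_width_bytes // 1024
    return max(min_kb, min(est_sort_kb * 2, max_limit_kb))


print(cold_start_work_mem_kb(1_000_000, 64))    # mid-sized sort
print(cold_start_work_mem_kb(50_000_000, 100))  # large sort, capped at the limit
```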

A Purushotham Reddy Latest2all blog

Friday, 15 May 2026