Why Your Read Replicas Are Wasted – AI Turns Them Into Active Learners
Your cloud bill shows three read replicas, each costing $500/month. They were added to handle occasional analytics queries. But most of the time, they sit at 5% CPU utilisation. You're paying $1,500 monthly for insurance. This is the dirty secret of database replication: read replicas are massively underutilised. They exist “just in case” traffic spikes, but for 22 hours a day, they do almost nothing.
Meanwhile, your data science team is starved for compute. They run ML training jobs on the primary database, causing lock contention and slowing down transactions. They export large datasets to S3 and then to GPU instances – an expensive, slow dance. What if those idle replicas could be the compute engine for AI training, running models directly on the data without impacting the primary?
AI‑driven workload steering makes this possible. An intelligent controller monitors query patterns, replica lag, and resource utilisation. It decides, in real time, which replica serves which read query – and which replica can be repurposed for background ML training, incremental model updates, or complex analytical scans. The primary never sees the extra load. Replicas become active learners, continuously improving models while still ready to take over if the primary fails. This article dives into the architecture, provides production patterns, and shares case studies where companies cut replica costs by 50% while accelerating AI training by 10x.
Definition: Active replicas are read‑only copies of a primary database that can be dynamically repurposed for compute‑intensive workloads (ML training, analytics, indexing) during low‑read periods, using AI workload steering to balance read traffic and background jobs without impacting the primary.
The Billion‑Dollar Waste of Passive Replicas
Read replicas are a standard pattern for scaling read traffic and providing high availability. Yet their utilisation is abysmal. A 2026 study of 5,000 cloud databases found that the average read replica utilisation was only 22% (based on CPU and QPS). 63% of replicas had periods of more than 4 hours daily where they served zero queries. The total wasted cloud spend on idle replicas exceeded $1.2 billion annually across surveyed companies.
Even worse, many organisations run multiple replicas for redundancy, but only one is ever used for reads. The others are pure standby – costing money without delivering any value until a failover. The compute on every replica is billed around the clock whether or not it serves a single query, while the data stored there is already paid for. Every idle hour is capacity you bought and did not use.
- Workload‑aware read splitting – AI directs read queries to the least loaded replica based on real‑time latency predictions.
- Dynamic replica repurposing – During low‑read periods, AI automatically runs ML training, model inference, or analytics on idle replicas.
- Training directly on replica data – Use SQL‑based ML (e.g., PostgresML's `pgml` extension or Apache MADlib) or export to a DataFrame for PyTorch/TensorFlow, all from the replica.
- Cost‑optimised replica provisioning – AI predicts future read traffic and recommends scaling replicas up/down or stopping idle ones.
- Safe workload isolation – ML jobs can be killed instantly if read traffic spikes; AI manages pre‑emption gracefully.
- Production case studies – Real examples of companies cutting replica costs by 50% while accelerating AI training by 10x.
- Open‑source reference controller – Python service that integrates with AWS RDS, Azure Database, and PostgreSQL replicas.
How AI Steers Reads and Background Workloads
The AI controller operates at two layers: routing production read queries, and scheduling background compute jobs.
Layer 1: Adaptive Read Steering
Traditional read splitting uses simple round‑robin or a fixed priority list. AI does better: it collects per‑replica metrics every 5 seconds (CPU, QPS, replication lag, connection count, and historical latency). It then uses a lightweight gradient‑boosted model to predict which replica will return the current query fastest, factoring in:
- Current load on each replica.
- Replication lag (to avoid serving stale reads if the application requires strong consistency).
- Query complexity (estimation from `EXPLAIN`).
- Historical latency for similar queries on each replica.
The controller runs as a proxy (or sidecar) that accepts SQL read queries, runs the prediction, and forwards to the chosen replica. It adds less than 1ms overhead.
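# Example: AI steering decision (sketch; predict_latency and route_to are assumed helpers)
def choose_replica(replicas, query, max_allowed_lag):
    # Skip replicas whose replication lag exceeds what the application tolerates.
    fresh = [r for r in replicas if r.lag <= max_allowed_lag]
    if not fresh:
        raise RuntimeError("no replica within max_allowed_lag")
    # Among the remaining replicas, pick the lowest predicted latency for this query.
    return min(fresh, key=lambda r: predict_latency(r, query))

route_to(choose_replica(replicas, query, max_allowed_lag))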
Over time, the model learns that certain replicas are faster for specific query patterns (e.g., one replica has a larger cache for the `orders` table). It exploits this asymmetry, which static load balancers cannot.
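To make the idea of per-replica asymmetry concrete, here is a minimal sketch that swaps the gradient‑boosted model for something much simpler: an exponentially weighted moving average (EWMA) of observed latency per replica and query pattern. The class name, the `alpha` and `default_ms` parameters, and the notion of a "query pattern" string are all assumptions made for illustration, not part of the controller described here.

```python
class LatencyTracker:
    """EWMA of observed latency per (replica, query pattern).

    A deliberately simple stand-in for a learned latency model.
    """

    def __init__(self, alpha=0.2, default_ms=50.0):
        self.alpha = alpha            # weight given to the newest observation
        self.default_ms = default_ms  # estimate used before any data exists
        self._ewma = {}               # (replica, pattern) -> latency estimate in ms

    def observe(self, replica, pattern, latency_ms):
        key = (replica, pattern)
        previous = self._ewma.get(key)
        if previous is None:
            self._ewma[key] = latency_ms
        else:
            self._ewma[key] = self.alpha * latency_ms + (1 - self.alpha) * previous

    def estimate(self, replica, pattern):
        return self._ewma.get((replica, pattern), self.default_ms)

    def fastest(self, replicas, pattern):
        # Choose the replica with the lowest estimated latency for this pattern.
        return min(replicas, key=lambda r: self.estimate(r, pattern))


tracker = LatencyTracker(alpha=0.5)
tracker.observe("replica-1", "orders_by_user", 10.0)
tracker.observe("replica-1", "orders_by_user", 20.0)
tracker.observe("replica-2", "orders_by_user", 40.0)
print(tracker.fastest(["replica-1", "replica-2"], "orders_by_user"))
```

A tracker like this captures the "one replica has a warmer cache for `orders`" effect, but unlike a trained model it cannot take current load or query complexity into account.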
Layer 2: Dynamic Background Workload Scheduling
During periods when read traffic is low (e.g., weekends, after midnight), the AI controller automatically re‑purposes replicas to run background jobs. It uses a workload forecasting model (from the previous article) to predict the next 4 hours of read QPS. If the predicted QPS is below a threshold, the AI marks one or more replicas as “available for compute”.
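The threshold rule in this paragraph can be sketched as a small function. This is a hedged illustration only: the function name, the shape of the forecast (one list of predicted QPS values per replica covering the forecast window), and the `reserved` parameter for standbys that must never take background work are all assumptions.

```python
def replicas_available_for_compute(forecast_qps, qps_threshold, reserved=frozenset()):
    """Return replicas whose forecast read QPS stays below the threshold
    for the whole forecast window, skipping reserved standbys."""
    available = []
    for replica, forecast in sorted(forecast_qps.items()):
        if replica in reserved:
            continue  # e.g. a hot standby kept free for failover
        # Use the peak of the forecast so one busy interval disqualifies the replica.
        if forecast and max(forecast) < qps_threshold:
            available.append(replica)
    return available


forecast = {
    "replica-a": [12.0, 8.0, 5.0, 9.0],
    "replica-b": [300.0, 280.0, 150.0, 90.0],
    "replica-c": [3.0, 2.0, 2.0, 4.0],
}
print(replicas_available_for_compute(forecast, 50.0, reserved=frozenset({"replica-c"})))
```

Comparing the peak rather than the mean of the forecast is a conservative choice: a replica is only handed over if it is expected to be quiet for the entire window.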
Available replicas can be used for:
- In‑database ML training – Run algorithms like linear regression, k‑means, or XGBoost inside the database using extensions such as PostgresML (`pgml`) or Apache MADlib.
- Data export for advanced ML – Stream data to a DataFrame in Python/R and train larger models (PyTorch, TensorFlow, LightGBM) on the replica, without impacting the primary.
- Incremental model updates – Retrain a model on new data daily, using the replica as the data source.
- Heavy analytics and reporting – Run complex OLAP queries that would otherwise cause contention on the primary.
- Pre‑warming caches – Load frequently accessed data into memory, reducing future read latency.
The AI controller enforces resource limits (e.g., CPU cap, memory limit) on background jobs and can pre‑empt them instantly if read traffic spikes. It uses a priority queue: read queries always have highest priority. If a background job is using a replica and a read query needs it, the job is paused and resumes later.
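One way to picture the priority rule is a queue in which read work always sorts ahead of background work. The sketch below assumes background jobs are split into small steps, so "pausing" simply means the remaining steps wait in the queue until no reads are pending. `WorkQueue`, `READ`, and `BACKGROUND` are hypothetical names.

```python
import heapq
import itertools

READ, BACKGROUND = 0, 1  # lower number means higher priority


class WorkQueue:
    """Priority queue where read queries always run before background steps."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def submit(self, priority, item):
        heapq.heappush(self._heap, (priority, next(self._seq), item))

    def next_item(self):
        if not self._heap:
            return None
        _, _, item = heapq.heappop(self._heap)
        return item


queue = WorkQueue()
queue.submit(BACKGROUND, "train-step-1")
queue.submit(READ, "read-1")
queue.submit(BACKGROUND, "train-step-2")
queue.submit(READ, "read-2")
print([queue.next_item() for _ in range(4)])
```

Both reads are served first even though a training step was queued before them. A real controller would also need to interrupt a step that is already running, which this sketch does not attempt.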
Training Machine Learning Models Directly on Replicas
The most transformative use of active replicas is running ML training workloads without data movement. Traditional ML pipelines export data from the primary database to a data lake or object store, then to GPU instances. This adds hours of delay and significant cost.
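With active replicas, you can train models in place. For example, using the PostgresML extension (`pgml`). Note that `pgml.train` stores the trained model in its own `pgml` schema, so the target must accept writes: a physical hot standby is read‑only, and a logical replica is one way to get a writable copy.
-- Train a linear regression model with PostgresML.
-- pgml.train reads from a table or view (relation_name), not from a query string,
-- so the feature query is wrapped in a view; sales_training is an illustrative name.
CREATE VIEW sales_training AS
    SELECT year, month, marketing_spend, sales FROM orders_table;
SELECT * FROM pgml.train(
    project_name  => 'sales_forecast',
    task          => 'regression',
    relation_name => 'sales_training',
    y_column_name => 'sales',
    algorithm     => 'linear'
);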
For deep learning, you can stream data from the replica to a training script without exporting to files:
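# Python example: read from the replica in batches using a database connector
import pandas as pd
import psycopg2
from sklearn.linear_model import SGDRegressor  # stand-in for a PyTorch/TensorFlow model

conn = psycopg2.connect(host='replica-host', dbname='production')
model = SGDRegressor()
# chunksize makes read_sql yield DataFrames of 10,000 rows, so training proceeds batch by batch
for chunk in pd.read_sql("SELECT features, target FROM training_data", conn, chunksize=10_000):
    model.partial_fit(chunk[["features"]], chunk["target"])  # assumes "features" is one numeric column
conn.close()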
The AI controller ensures that the replica’s resources are not exhausted by the training job; it can dynamically adjust batch sizes or throttle the data stream. In case of a read traffic spike, the training is paused within seconds.
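A simple way to "dynamically adjust batch sizes" is to halve the batch when the replica is busier than a target and grow it gently otherwise. The sketch below is an assumption about how such throttling might look; the function name, the 60% CPU target, and the size bounds are illustrative values, not figures from the controller.

```python
def next_batch_size(current, cpu_utilisation, target=0.6, min_size=256, max_size=65536):
    """Shrink the batch quickly under load, grow it slowly when there is headroom."""
    if cpu_utilisation > target:
        proposed = current // 2             # back off fast when the replica is busy
    else:
        proposed = current + current // 4   # recover by 25% per step
    # Clamp to the configured bounds.
    return max(min_size, min(max_size, proposed))


size = 4096
for cpu in (0.30, 0.90, 0.95, 0.40):
    size = next_batch_size(size, cpu)
    print(cpu, size)
```

Backing off faster than recovering keeps the training job from oscillating around the target when read traffic is bursty.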
Case Study: E‑Commerce Company Trains Recommendation Models on Replicas
A large online retailer had a primary database handling 10,000 writes/second and four read replicas. Three replicas served user traffic; one was idle. The data science team needed to retrain a collaborative filtering model daily on the `purchase_history` table (2TB). They used to export data to S3 (3 hours) and then train on EC2 (2 hours). Total daily pipeline: 5 hours.
After deploying the AI active replica controller, the idle replica was repurposed for training. The AI scheduled the training job during the lowest read traffic period (3‑5 AM). The replica ran the training directly using an in‑database matrix factorisation library, consuming 10% of its CPU. Completion time dropped to 45 minutes, and the export step was eliminated. The primary never saw the load. The company saved $40,000/year in data transfer and EC2 costs while getting fresher models.
Implementing AI‑Driven Active Replicas
The ebook Database Management Using AI provides a complete reference implementation. The blueprint includes:
- Telemetry collector: Scrapes replica metrics (CPU, QPS, lag, connections) every 5 seconds via Prometheus or cloud monitoring APIs.
- Read steering proxy: A high‑performance proxy (written in Rust or Go) that implements the database wire protocol (PostgreSQL or MySQL). It runs the latency prediction model and routes queries.
- Workload forecasting model: Prophet or LSTM model predicting read QPS for each replica 4 hours ahead.
- Job scheduler: A Kubernetes CronJob or a Celery worker that receives instructions from the AI controller to start/stop background jobs on replicas. It respects a pre‑emption timeout (e.g., 10 seconds) to kill long‑running queries.
- Fallback mode: If the AI controller fails, a static routing table takes over (e.g., round‑robin).
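The fallback item above can be sketched as a router that asks the AI controller first and drops to round‑robin when the controller raises an error or names an unknown replica. `FallbackRouter` and the `ai_choice` callable are hypothetical; a real proxy would also apply a timeout to the controller call.

```python
import itertools


class FallbackRouter:
    """Route via the AI controller, falling back to static round-robin on failure."""

    def __init__(self, replicas, ai_choice):
        self._replicas = list(replicas)
        self._round_robin = itertools.cycle(self._replicas)
        self._ai_choice = ai_choice  # callable: query -> replica name

    def route(self, query):
        try:
            choice = self._ai_choice(query)
            if choice in self._replicas:
                return choice
        except Exception:
            pass  # controller unavailable: fall through to static routing
        return next(self._round_robin)


def broken_controller(query):
    raise TimeoutError("controller down")


router = FallbackRouter(["replica-a", "replica-b"], broken_controller)
print([router.route("SELECT 1") for _ in range(3)])
```

Keeping the static path free of any dependency on the controller means a controller outage degrades routing quality, not availability.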
The system can run in “advisory mode” – recommending which replica should serve which query and what background jobs to run – before enabling automated execution.
Advanced Techniques: Federated Learning Across Replicas
For organisations with many replicas across regions, AI can orchestrate federated learning. Each replica trains a local model on its subset of data (e.g., regional sales). A central coordinator aggregates model updates without moving raw data – satisfying data residency requirements. The AI controller manages the aggregation schedule and handles replica failures. This pattern is increasingly used in finance and healthcare.
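The aggregation step is commonly done with federated averaging: each replica sends its model weights and the number of rows it trained on, and the coordinator computes a sample‑weighted mean. The sketch below assumes flat lists of floats for the weights and a hypothetical function name; replicas that failed in a round are simply left out of `updates`.

```python
def federated_average(updates):
    """Sample-weighted average of model weight vectors.

    updates: list of (weights, n_samples) pairs, one per replica that reported.
    """
    if not updates:
        raise ValueError("no replica reported an update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(updates[0][0])
    if any(len(weights) != dim for weights, _ in updates):
        raise ValueError("all weight vectors must have the same length")
    # Each coordinate is the mean of the replicas' values, weighted by sample count.
    return [sum(weights[i] * n for weights, n in updates) / total for i in range(dim)]


eu_update = ([1.0, 2.0], 100)   # (weights, rows trained on)
us_update = ([3.0, 6.0], 300)
print(federated_average([eu_update, us_update]))
```

Only the weight vectors and row counts leave each region, which is what allows the raw data to stay where it is.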
Observability and Trust
To trust the AI controller, you need visibility. The ebook includes Prometheus metrics that track:
- Read steering decisions – distribution of queries per replica.
- Prediction error (actual vs predicted latency).
- Number of times background jobs were pre‑empted and the reason (read spike).
- Replica utilisation (CPU, memory) before and after AI steering.
- Cost savings per replica per month.
A Grafana dashboard provides drill‑down views. DBAs can also manually pin certain queries to specific replicas or pause AI decisions for specific workloads.
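Two of the metrics listed above are easy to compute from a log of steering decisions. The sketch below assumes each decision is recorded as a `(replica, predicted_ms, actual_ms)` tuple, which is a made-up format for illustration; exporting the results to Prometheus is left out.

```python
from collections import Counter


def steering_distribution(decisions):
    """Fraction of queries routed to each replica."""
    counts = Counter(replica for replica, _, _ in decisions)
    total = sum(counts.values())
    return {replica: count / total for replica, count in counts.items()}


def mean_absolute_error_ms(decisions):
    """Average gap between predicted and actual latency, in milliseconds."""
    if not decisions:
        raise ValueError("no decisions recorded")
    return sum(abs(predicted - actual) for _, predicted, actual in decisions) / len(decisions)


log = [
    ("replica-a", 10.0, 12.0),
    ("replica-a", 20.0, 18.0),
    ("replica-b", 30.0, 36.0),
    ("replica-a", 15.0, 15.0),
]
print(steering_distribution(log))
print(mean_absolute_error_ms(log))
```

A rising prediction error is a useful early signal that the latency model has drifted and that advisory mode or the static fallback may be the safer choice.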
Common Pitfalls and How to Avoid Them
- Stale reads from background workloads: Long‑running training queries on a replica can hold back the replay of changes arriving from the primary, which increases replication lag. Solution: Run training on a separate replica (not the one serving reads) and set a maximum lag threshold.
- Under‑provisioning for failover: If all replicas are busy with ML, a primary failure could leave no standby. Solution: Reserve one replica as a dedicated hot standby with no background jobs.
- In‑database ML library limitations: Not every algorithm is supported natively. Solution: Use the replica to stream data to a separate training instance; the AI can coordinate the streaming.
- Over‑steering during micro‑spikes: Short bursts of read traffic could cause frequent pre‑emption of ML jobs. Solution: Use a sliding window average rather than instantaneous metrics for steering decisions.
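The sliding‑window fix from the last item can be sketched with a fixed‑length deque: pre‑emption is triggered only when the average read QPS over the last few samples crosses the threshold. The class name, window length, and threshold are illustrative assumptions.

```python
from collections import deque


class SlidingWindowSpikeDetector:
    """Signal a read spike only when the windowed average crosses the threshold."""

    def __init__(self, window, threshold_qps):
        self._samples = deque(maxlen=window)  # oldest sample drops out automatically
        self._threshold = threshold_qps

    def add(self, qps):
        """Record a sample and return True if background jobs should be pre-empted."""
        self._samples.append(qps)
        average = sum(self._samples) / len(self._samples)
        return average >= self._threshold


detector = SlidingWindowSpikeDetector(window=5, threshold_qps=100.0)
for sample in (20.0, 20.0, 20.0, 20.0, 300.0, 300.0):
    print(sample, detector.add(sample))
```

The single 300 QPS sample is absorbed by the window, while a second consecutive one pushes the average over the threshold. The trade-off is reaction time: a longer window ignores more noise but delays pre‑emption during a genuine spike.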