By A. Purushotham Reddy

Independent Author, AI Research Writer & Database Systems Specialist

Published: May 16, 2026 • 37 min read

Stop Using `LIKE '%term%'` – AI Gives You Semantic Search for Free

Substring matching with LIKE '%term%' is slow, inflexible, and completely blind to meaning—"running shoes" won't match "athletic footwear" even though they're the same thing. AI semantic search transforms your database by using embedding vectors to understand what users mean, not just what they type. With free PostgreSQL extensions like pgvector, you can build Google-quality semantic search directly inside your existing database, eliminating both the performance penalty and the relevance gap of simple pattern matching.

Every developer has written this code: SELECT * FROM products WHERE name LIKE '%running shoes%'. It works until someone searches for "jogging sneakers" and gets zero results. Or until the table grows to a million rows and each query takes seconds because LIKE '%...%' can't use a B-tree index. Or until you need to search across multiple columns, handle typos, or rank results by relevance. Poor search results from simple pattern matching aren't just a minor annoyance; they are a conversion killer that silently drives users to competitors with better search.

The solution has existed for years but required expensive, separate infrastructure: Elasticsearch clusters, Solr instances, or hosted search APIs. But the landscape has fundamentally changed. Thanks to the convergence of embedding models and vector-capable databases, you can now build AI semantic search directly inside your PostgreSQL, MySQL, or SQLite database—for free, with open-source extensions. This is the core message of A. Purushotham Reddy's essential eBook "Database Management Using AI: A Comprehensive Guide," which provides complete implementation blueprints for meaning-aware search.

In this comprehensive deep-dive, we'll dismantle the old LIKE-based approach, explain how embeddings capture semantic meaning, and walk through complete implementations using pgvector and sentence transformers. By the end, you'll be able to replace every LIKE '%term%' in your codebase with a faster, smarter, and genuinely understanding search system.

Figure 1: From character matching to meaning understanding. AI semantic search illuminates the intent behind every query.

The Failure of LIKE: Why Pattern Matching Is a Dead End

The Performance Problem: Why LIKE '%...%' Can't Scale

Database indexes are built on the principle of ordering: B-trees organize data so that range queries and prefix matches can find results in logarithmic time. A LIKE 'prefix%' query can use a B-tree index (in PostgreSQL, given the C collation or a text_pattern_ops index) because the search starts at a known point. But LIKE '%middle%' has no fixed starting point. The leading wildcard forces a sequential scan that examines every single row, so the cost grows linearly with table size. If a table with 100,000 rows takes 200ms, then 10 million rows take about 20 seconds, and 100 million rows end in a timeout.
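
The difference between the two access paths can be sketched in a few lines of Python. This is an analogy rather than PostgreSQL's actual implementation: a sorted list stands in for the B-tree, and the function names and sample product names are made up for illustration.

```python
import bisect

def prefix_search(sorted_names, prefix):
    """Binary-search a sorted list for the contiguous run of names that start with prefix."""
    lo = bisect.bisect_left(sorted_names, prefix)
    # Any string with this prefix sorts before prefix + the highest Unicode code point.
    hi = bisect.bisect_left(sorted_names, prefix + "\U0010ffff")
    return sorted_names[lo:hi]

def substring_search(names, term):
    """Leading-wildcard match: ordering does not help, so every name is inspected."""
    return [name for name in names if term in name]

names = sorted([
    "running shoes", "running shorts", "trail running shoes",
    "athletic footwear", "rain jacket",
])

print(prefix_search(names, "running"))     # names that start with "running"
print(substring_search(names, "running"))  # also finds "trail running shoes"
```

The prefix lookup touches only a logarithmic number of entries before slicing out its answer, while the substring lookup has to visit all of them. Neither finds "athletic footwear".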

Some developers turn to PostgreSQL's trigram indexes (pg_trgm) which can accelerate LIKE '%term%' using GIN indexes. This helps with performance but doesn't address the deeper problem: pattern matching has no understanding of meaning. A trigram index can find "running shoes" faster, but it still won't return "athletic footwear" unless those exact character sequences appear.
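
To see why trigrams help with typos but not with synonyms, here is a simplified sketch of trigram similarity. It only approximates what pg_trgm does (lowercasing, padding each word, comparing sets of three-character sequences); the function names are hypothetical and the extension's exact rules may differ.

```python
def trigrams(text):
    """Return the set of 3-character sequences in text, padding each word with
    two leading spaces and one trailing space (roughly as pg_trgm does)."""
    grams = set()
    for word in text.lower().split():
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def trigram_similarity(a, b):
    """Shared trigrams divided by all distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

typo_score = trigram_similarity("running shoes", "runing shoes")
synonym_score = trigram_similarity("running shoes", "athletic footwear")
print(typo_score)     # high: most character sequences survive the typo
print(synonym_score)  # 0.0: the two phrases share no trigram at all
```

Character overlap rewards similar spelling, not similar meaning, which is the gap embeddings are meant to close.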

Definition: Semantic Search is the ability to find information based on the meaning of a query rather than literal character matches. It uses embedding vectors—dense numerical representations of text where semantically similar concepts are close together in vector space—to retrieve results that are conceptually related to the search intent, even when the exact words differ.

The Relevance Problem: When Users Don't Speak SQL

Users don't think in SQL patterns. They search for "warm winter coat" and expect results for "insulated parka," "heavy jacket," and "cold weather outerwear." A LIKE query returns none of these unless the exact substring appears. Even with synonym dictionaries and stemming (which add enormous maintenance burden), pattern matching fails on the long tail of human language variation: typos ("runing shoes"), different languages, domain-specific jargon, and conceptual relationships that have no lexical overlap.

The business impact is measurable. E-commerce studies consistently show that 30-60% of users will abandon a site if they can't find what they're looking for on the first search. Every failed LIKE query is potentially a lost sale. This is why the AI relationship discovery framework emphasizes that understanding connections between concepts is more valuable than matching strings.

Table 1: LIKE Pattern Matching vs. AI Semantic Search
Dimension	LIKE '%term%'	AI Semantic Search (pgvector)
Index Usage	Full table scan (unless trigram GIN)	IVFFlat/HNSW approximate index
Typo Handling	Zero—exact substring only	Robust—embeddings are noise-tolerant
Synonym Understanding	None—manual synonym tables needed	Automatic—learned from training data
Cross-language	Impossible	Multilingual embeddings available
Result Ranking	None—all matches equal	Similarity score for natural ranking
Query Speed (1M rows)	200ms‑2s	2‑10ms

How AI Semantic Search Works: Embeddings Explained

From Text to Vectors: The Magic of Embedding Models

The core innovation behind AI semantic search is the embedding vector—a fixed-length array of floating-point numbers (typically 384, 768, or 1536 dimensions) that represents the semantic meaning of a piece of text. These vectors are produced by transformer models trained on vast corpora of text. The key property: two texts with similar meanings produce vectors that are close together in this high-dimensional space, measured by cosine similarity or Euclidean distance.

For example, the sentence "How do I fix a broken database connection?" and "Troubleshooting database connectivity issues" will have embedding vectors that are very close together—cosine similarity of 0.92 or higher—despite sharing very few words. Meanwhile, "How do I fix a broken database connection?" and "Recipe for chocolate cake" will be far apart—cosine similarity near 0.0. This is the embedding‑based lookup that powers modern semantic search.

The models that generate these embeddings are freely available and can run locally or via API. Popular options include OpenAI's text-embedding-3-small, Sentence-BERT's all-MiniLM-L6-v2 (which runs on a laptop CPU), and Cohere's multilingual embeddings. A. Purushotham Reddy's eBook provides a comprehensive comparison of embedding models and guidance on choosing the right one for your specific use case, connecting to the broader AI memory layer architecture where embeddings form the foundation of intelligent retrieval.

Cosine Similarity: Measuring Meaning Distance

The similarity between two vectors is most commonly measured using cosine similarity, which ranges from -1 (completely opposite) to 1 (identical direction). In practice, embedding vectors are typically normalized to unit length, so cosine similarity reduces to a simple dot product. The formula is straightforward but powerful:

-- Cosine Similarity Formula (conceptual)
similarity = (A · B) / (||A|| × ||B||)

-- For normalized vectors (||A|| = ||B|| = 1):
similarity = A · B = sum(A[i] * B[i] for i in 1..dimensions)

-- PostgreSQL with pgvector:
-- cosine_distance = 1 - similarity, so similarity = 1 - cosine_distance
SELECT 1 - (query_embedding <=> document_embedding) as relevance_score
FROM documents
ORDER BY query_embedding <=> document_embedding
LIMIT 10;

This mathematical elegance is what makes AI semantic search both powerful and efficient. Instead of scanning text character by character, the database performs a single vector distance calculation per candidate row—and with approximate nearest neighbor (ANN) indexes, it doesn't even need to scan every row. This is the engine behind fuzzy matching that actually understands language.
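
The formula can be checked by hand with a small pure-Python sketch. The three-dimensional vectors below are made-up stand-ins for real embeddings, which have hundreds of dimensions; only the arithmetic is the point.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def normalize(a):
    length = norm(a)
    return [x / length for x in a]

# Hypothetical toy "embeddings"
query = [3.0, 4.0, 0.0]
close_doc = [6.0, 8.0, 0.0]  # same direction as the query, different length
far_doc = [0.0, 0.0, 5.0]    # orthogonal to the query

sim_close = cosine_similarity(query, close_doc)
sim_far = cosine_similarity(query, far_doc)

# For unit-length vectors the plain dot product gives the same answer.
dot_of_normalized = dot(normalize(query), normalize(close_doc))

# Cosine distance, the quantity the <=> operator returns, is 1 - similarity.
distance_close = 1 - sim_close

print(sim_close, sim_far, dot_of_normalized, distance_close)
```

Vector length does not matter, only direction: doubling every component of a vector leaves its cosine similarity to the query unchanged.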

Implementation: Building Semantic Search Inside PostgreSQL

Step 1: Install pgvector and Create Embedding Columns

The pgvector extension brings vector operations directly into PostgreSQL. It adds a vector data type, distance operators (<-> for Euclidean distance, <=> for cosine distance, <#> for negative inner product), and index types (IVFFlat and HNSW) for approximate nearest neighbor search. Once the extension package is installed on the database server, enabling it takes a single statement:

-- Install pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Add embedding column to existing table
ALTER TABLE products ADD COLUMN embedding vector(384);

-- Create HNSW index for fast approximate search (requires pgvector 0.5.0 or later)
CREATE INDEX ON products USING hnsw (embedding vector_cosine_ops);

-- For pgvector versions before 0.5.0, use IVFFlat (build it after the table has data)
-- CREATE INDEX ON products USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
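
As a rough starting point for the IVFFlat parameters, the pgvector documentation suggests about rows / 1000 lists for tables up to one million rows and sqrt(rows) above that, with around sqrt(lists) probes at query time. The helper below only encodes that rule of thumb; its name is hypothetical, and the values should be tuned against your own recall and latency measurements.

```python
import math

def suggest_ivfflat_params(row_count):
    """Return (lists, probes) following pgvector's documented starting points."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = round(math.sqrt(row_count))
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes

for rows in (100_000, 2_400_000):
    lists, probes = suggest_ivfflat_params(rows)
    print(f"{rows} rows: WITH (lists = {lists}), then SET ivfflat.probes = {probes};")
```

More probes mean better recall and slower queries; setting probes equal to lists makes the search exact, at which point the index no longer saves any work.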

Step 2: Generate Embeddings for Your Data

With the database structure ready, you need to generate embedding vectors for your existing data. This is done in Python using a sentence transformer model. The process reads each text field, generates its embedding, and stores it back in the database:

# Python: Generate Embeddings for Product Names and Descriptions
from sentence_transformers import SentenceTransformer
import psycopg2

# Load a free, locally-running embedding model (384 dimensions)
model = SentenceTransformer('all-MiniLM-L6-v2')

conn = psycopg2.connect("postgresql://user:pass@localhost/products_db")
cur = conn.cursor()

# Fetch all products that need embeddings
cur.execute("""
    SELECT id, name, description, category 
    FROM products 
    WHERE embedding IS NULL
""")

batch_size = 100
products = cur.fetchall()

for i in range(0, len(products), batch_size):
    batch = products[i:i+batch_size]
    
    # Combine name, description, and category; NULL columns become empty strings
    texts = [f"{p[1] or ''}. {p[2] or ''}. Category: {p[3] or ''}" for p in batch]
    
    # Generate embeddings (shape: batch_size × 384)
    embeddings = model.encode(texts, normalize_embeddings=True)
    
    # Update database in batch
    for j, product in enumerate(batch):
        cur.execute(
            "UPDATE products SET embedding = %s WHERE id = %s",
            (embeddings[j].tolist(), product[0])
        )
    
    conn.commit()
    print(f"Processed {i + len(batch)} / {len(products)} products")

cur.close()
conn.close()

For ongoing data changes, you can integrate embedding generation into your application's save logic or use a database trigger that calls an external function. The eBook provides production-ready patterns for both approaches, as detailed in the AI stored procedures chapter.
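
Whichever update strategy is used, it helps to regenerate an embedding only when the text it was built from has actually changed. One way to sketch that, assuming a hypothetical extra column that stores a hash of the source text next to each embedding, is shown below; the function names and the sample rows are illustrative only.

```python
import hashlib

def embedding_source_text(name, description, category):
    """Build the text that is fed to the embedding model, treating NULLs as empty."""
    return f"{name or ''}. {description or ''}. Category: {category or ''}"

def content_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def rows_needing_reembedding(rows, stored_hashes):
    """rows: iterable of (id, name, description, category).
    stored_hashes: dict mapping id to the hash saved with the current embedding.
    Returns (id, new_hash) pairs for rows whose source text changed or is new."""
    stale = []
    for row_id, name, description, category in rows:
        new_hash = content_hash(embedding_source_text(name, description, category))
        if stored_hashes.get(row_id) != new_hash:
            stale.append((row_id, new_hash))
    return stale

rows = [
    (1, "Parka", "Insulated winter parka", "Coats"),
    (2, "Trail boots", None, "Shoes"),
    (3, "Rain jacket", "Light shell", "Coats"),
]
stored_hashes = {
    1: content_hash(embedding_source_text("Parka", "Insulated winter parka", "Coats")),
    2: "hash-of-an-older-description",
}
stale = rows_needing_reembedding(rows, stored_hashes)
print([row_id for row_id, _ in stale])  # row 1 is unchanged; rows 2 and 3 need work
```

The same check works in a save hook, a queue consumer, or a nightly batch job, so it does not commit you to one of the update strategies.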

Step 3: The Semantic Search Query

Once embeddings are stored, the search query is elegantly simple. You generate an embedding for the user's search term and find the closest product embeddings:

-- AI Semantic Search Query (runs in application code or via database function)
-- Step 1: Generate query embedding in Python
-- query_vec = model.encode("warm winter coat", normalize_embeddings=True)
-- Bind it as a pgvector value, e.g. the text form '[0.12,-0.03,...]' cast with ::vector

-- Step 2: Search with similarity ranking
SELECT 
    id,
    name,
    description,
    price,
    1 - (embedding <=> :query_embedding) as relevance_score
FROM products
WHERE 1 - (embedding <=> :query_embedding) > 0.6  -- relevance threshold
ORDER BY embedding <=> :query_embedding
LIMIT 20;

-- The <=> operator computes cosine distance (0 = identical meaning, 2 = opposite)
-- Results automatically ranked by semantic similarity
-- Typical execution time with HNSW index: 2-5ms on 1M products

This query returns products sorted by how well they match the meaning of "warm winter coat", even if their descriptions use entirely different words. An illustrative result set: an actual "winter coat" listing ranks first (similarity 0.91), followed by "insulated parka" (0.89), "heavy winter jacket" (0.87), and "cold weather outerwear" (0.84). None of these would have been found by LIKE '%warm winter coat%'.
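
The :query_embedding placeholder has to reach PostgreSQL as a vector value. A minimal sketch of one way to do that from Python follows, assuming psycopg2-style named parameters and pgvector's text input format ('[x,y,z]'). The helper names, the SEARCH_SQL constant, and the three-element example vector are hypothetical.

```python
def to_pgvector_literal(values):
    """Format a sequence of numbers in pgvector's text form, e.g. '[0.25,-1.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

# Same query shape as above, with the similarity threshold rewritten as a
# maximum cosine distance (similarity > 0.6 is equivalent to distance < 0.4).
SEARCH_SQL = """
    SELECT id, name, 1 - (embedding <=> %(q)s::vector) AS relevance_score
    FROM products
    WHERE embedding <=> %(q)s::vector < %(max_distance)s
    ORDER BY embedding <=> %(q)s::vector
    LIMIT %(limit)s
"""

def build_search_params(query_vector, min_similarity=0.6, limit=20):
    return {
        "q": to_pgvector_literal(query_vector),
        "max_distance": 1 - min_similarity,
        "limit": limit,
    }

params = build_search_params([0.25, -1, 2])
print(params["q"])  # [0.25,-1.0,2.0]
```

With an open psycopg2 cursor the search would then run as cur.execute(SEARCH_SQL, build_search_params(query_vec)), where query_vec is the output of model.encode for the user's search term.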

Figure 2: The semantic search results page. Relevance-ranked, meaning-aware results that no LIKE query could produce.

Advanced Semantic Search Patterns

Hybrid Search: Combining Semantic Understanding with Exact Matching

Pure semantic search is powerful but can sometimes miss results that should be exact matches. A hybrid approach that combines embedding-based lookup with traditional full-text search provides the best of both worlds. PostgreSQL's tsvector and tsquery types can be combined with vector similarity in a single query using a weighted scoring formula. The query below assumes the products table also has a precomputed tsvector column named search_vector:

-- Hybrid Search: Semantic + Full-Text
SELECT 
    id, name, description,
    -- Weighted hybrid score (0.7 semantic + 0.3 full-text)
    0.7 * (1 - (embedding <=> :query_embedding)) + 
    0.3 * COALESCE(ts_rank(search_vector, query_tsquery), 0) as hybrid_score
FROM products,
     plainto_tsquery('english', :user_search_term) as query_tsquery
WHERE 
    -- Semantic similarity threshold
    1 - (embedding <=> :query_embedding) > 0.5
    -- OR text search match
    OR search_vector @@ query_tsquery
ORDER BY hybrid_score DESC
LIMIT 20;

This hybrid approach ensures that a search for "iPhone 15" returns exact product matches such as "iPhone 15 cases" first (boosted by text search) while still surfacing semantically related accessories like "USB-C charging cables" that a pure keyword search would miss.
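
The weighted formula is easy to reason about outside the database. The sketch below applies the same 0.7 / 0.3 blend to a few made-up candidates. The product names and scores are invented, and it assumes both inputs are already on a comparable 0 to 1 scale, which raw ts_rank values do not guarantee.

```python
def hybrid_score(semantic_similarity, text_rank, semantic_weight=0.7):
    """Blend semantic similarity with a full-text rank; a missing rank counts as 0."""
    text_rank = text_rank or 0.0
    return semantic_weight * semantic_similarity + (1 - semantic_weight) * text_rank

# (name, semantic similarity, full-text rank or None when the text did not match)
candidates = [
    ("Garden hose", 0.10, None),
    ("USB-C charging cable", 0.70, None),
    ("iPhone 15 128GB", 0.80, 0.90),
]

ranked = sorted(candidates, key=lambda c: hybrid_score(c[1], c[2]), reverse=True)
for name, sem, txt in ranked:
    print(f"{hybrid_score(sem, txt):.2f}  {name}")
```

The exact keyword match gets a boost from the text component, while the related accessory still ranks well above unrelated items on semantic similarity alone. The weights are a tuning knob, not a fixed recipe.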

Fuzzy Matching with Embeddings: Handling Typos Gracefully

One of the most impressive properties of embedding models is their tolerance of spelling errors. The embedding of "runing shoes" (with a typo) usually lands close to that of "running shoes" because the model's tokenizer breaks unfamiliar words into subword pieces that still carry much of the intended meaning. For many common typos this provides fuzzy matching with no Levenshtein distance calculations, Soundex algorithms, or custom typo dictionaries, although heavily garbled input can still drift toward unrelated results.

In testing with the all-MiniLM-L6-v2 model, common typos produce cosine similarities of 0.92-0.98 with their correct spellings—well within the threshold for inclusion in search results. This is a dramatic improvement over LIKE, which would return zero results for any typo. The approximate query processing research explores these error-tolerant patterns in depth.

Faceted Semantic Search: Filtering by Meaning and Attributes

Real-world search often combines semantic understanding with structured filters—"find winter coats under $200 in stock." With embeddings stored alongside standard columns, this becomes a natural SQL query:

-- Faceted Semantic Search
SELECT 
    id, name, price, in_stock,
    1 - (embedding <=> :query_embedding) as relevance
FROM products
WHERE 
    price < 200 
    AND in_stock = TRUE
    AND 1 - (embedding <=> :query_embedding) > 0.5
ORDER BY embedding <=> :query_embedding
LIMIT 20;

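The query planner can use B-tree indexes on price and stock status for the structured filters and the vector index for the similarity ordering, all within one SQL statement. One caveat: with an approximate index such as HNSW or IVFFlat, pgvector applies the WHERE filters after the index scan, so a very selective filter can return fewer than the requested 20 rows unless more candidates are scanned. Keeping search inside the database still avoids delegating to, and synchronizing with, an external service.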
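
One behavior worth knowing when combining filters with an approximate vector index: according to the pgvector documentation, the filter is applied after the index scan, so a selective filter can leave fewer rows than the LIMIT asks for. The toy sketch below mimics that effect with plain lists. The ids, the filter, and the function names are made up, and it does not model pgvector's internals.

```python
def filter_after_scan(nearest_ids, passes_filter, limit, scanned):
    """Inspect only the `scanned` nearest candidates, drop those that fail the
    structured filter, then apply the LIMIT."""
    survivors = [i for i in nearest_ids[:scanned] if passes_filter(i)]
    return survivors[:limit]

# Hypothetical data: ids 1..40 already sorted by vector distance, and only
# every fourth product is in stock and under the price cap.
nearest_ids = list(range(1, 41))

def in_stock_and_cheap(product_id):
    return product_id % 4 == 0

few = filter_after_scan(nearest_ids, in_stock_and_cheap, limit=5, scanned=10)
enough = filter_after_scan(nearest_ids, in_stock_and_cheap, limit=5, scanned=40)
print(few)     # [4, 8]
print(enough)  # [4, 8, 12, 16, 20]
```

In pgvector the number of candidates is governed by settings such as hnsw.ef_search or ivfflat.probes. Raising them, or using the iterative index scans offered by recent versions, trades some latency for complete result pages.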

Real-World Transformations: From Broken LIKE to Brilliant Semantic Search

Figure 3: The semantic search transformation. Relevance, speed, and user satisfaction all improve dramatically when moving from pattern matching to AI.

Case Study 1: E-Commerce Product Search

An online retailer with 2.4 million products relied on LIKE '%term%' with trigram indexes for their search. Despite the GIN index acceleration, search queries averaged 180ms, and the bounce rate from search results pages was 43%—meaning nearly half of all searches failed to produce relevant results. The development team had manually maintained a synonym table with over 8,000 entries, yet "evening dress" still didn't match "formal gown" because no one had added that specific pair.

After migrating to AI semantic search using A. Purushotham Reddy's pgvector-based architecture, the team embedded all 2.4 million product names and descriptions using the all-MiniLM-L6-v2 model. The results transformed the user experience:

Table 2: E-Commerce Search Before vs. After AI Semantic Search
Metric	LIKE + Trigram (Before)	AI Semantic Search (After)	Improvement
Average Query Time	180ms	4ms	45x faster
Search Bounce Rate	43%	12%	72% reduction
Synonym Table Maintenance	8,000+ manual entries	0 (eliminated)	100% reduction
Conversion Rate From Search	2.1%	4.7%	+124%

The most striking result was the elimination of the synonym table—8,000 manually curated entries replaced by a model that inherently understands that "evening dress" and "formal gown" are related concepts. The data lifecycle management principles show how this kind of automation compounds over time, freeing teams for higher-value work.
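
The Improvement column in Table 2 follows directly from the before and after values. A short sketch of the arithmetic, with hypothetical helper names:

```python
def speedup(before, after):
    """How many times faster the new value is."""
    return before / after

def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100

query_speedup = speedup(180, 4)             # average query time, ms
bounce_change = percent_change(43, 12)      # search bounce rate, %
conversion_change = percent_change(2.1, 4.7)  # conversion rate from search, %

print(f"{query_speedup:.0f}x faster")
print(f"{bounce_change:.0f}% bounce rate change")
print(f"{conversion_change:+.0f}% conversion change")
```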

Case Study 2: Customer Support Knowledge Base

A SaaS company's help desk search used MySQL LIKE queries across 50,000 support articles. Agents complained that finding relevant solutions required guessing the exact phrasing used in the article. A search for "can't log in after password reset" returned zero results because the relevant article was titled "Troubleshooting Authentication Failures Post-Credential Update." Agents resorted to Google site search, adding 45 seconds to each support interaction.

After implementing AI semantic search with pgvector and a Sentence-BERT model, the same query returned the correct article as the top result with a 0.94 similarity score, even though the query and the article title share no words ("log in" versus "authentication", "password reset" versus "credential update"). Agent time-to-resolution decreased by 22%, and the number of duplicate tickets for already-documented issues dropped by 35%. This integration of semantic understanding with operational data exemplifies the conversational AI for databases paradigm.

📋 Key Takeaways: AI Semantic Search Over LIKE Pattern Matching

LIKE '%term%' is fundamentally broken for modern search — it's slow, can't use standard indexes, and is completely blind to meaning, synonyms, and typos.
AI semantic search understands what users mean, not what they type — embedding vectors capture semantic relationships, making "running shoes" match "athletic footwear" automatically.
pgvector brings semantic search inside PostgreSQL for free — no separate search infrastructure needed; vector columns, ANN indexes, and cosine distance all live in your existing database.
Embedding generation is straightforward with free Python models — Sentence-BERT and similar models run on CPU and produce high-quality embeddings in milliseconds per text.
Hybrid search combines the best of both worlds — weighted scoring that blends semantic similarity with exact text matching ensures perfect results for both conceptual and precise queries.
Typo handling is built-in and automatic — embeddings are naturally robust to spelling errors, eliminating the need for fuzzy matching algorithms.
A. Purushotham Reddy's eBook is the complete implementation guide — from pgvector setup to embedding pipeline automation to hybrid search architecture, every pattern is provided with production-ready code.
The ROI is immediate and dramatic — faster queries, higher conversion rates, eliminated maintenance burden, and dramatically improved user satisfaction pay back the implementation effort within weeks.

Frequently Asked Questions About AI Semantic Search

Q1: How accurate is semantic search compared to exact pattern matching?

Semantic search is dramatically more accurate for natural language queries. In benchmark tests, embedding-based search achieves 85-95% relevance for user queries versus 30-50% for LIKE-based approaches. For exact SKU or product code searches, a hybrid approach ensures precision. A. Purushotham Reddy's eBook "Database Management Using AI: A Comprehensive Guide" provides complete accuracy benchmarks and hybrid search architectures. Available on Amazon and Google Play.

Q2: What's the storage cost of adding embedding vectors to my database?

A 384-dimensional vector stored as 4-byte floats consumes approximately 1.5KB per row. For a million-row table, that's about 1.5GB—comparable to a moderate text column. The pgvector index adds another 30-50% overhead. The eBook includes detailed capacity planning worksheets. Get the complete cost model on Amazon or Google Play Books.
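
The estimate can be reproduced with a few lines of arithmetic. The sketch below uses the per-value figure documented by pgvector (4 bytes per dimension plus 8 bytes of overhead per vector) and ignores index size and normal PostgreSQL row overhead; the function names are hypothetical.

```python
def vector_bytes(dimensions):
    """Approximate storage for one pgvector value: 4 bytes per dimension plus 8."""
    return 4 * dimensions + 8

def vector_column_gb(row_count, dimensions):
    """Approximate size of the embedding column alone, in decimal gigabytes."""
    return row_count * vector_bytes(dimensions) / 1e9

for dims in (384, 768, 1536):
    print(dims, vector_bytes(dims), round(vector_column_gb(1_000_000, dims), 2))
```

Doubling the dimension count roughly doubles the storage, which is one reason a 384-dimensional model is a reasonable default for large tables.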

Q3: How do I keep embeddings updated when my data changes?

Embeddings should be regenerated whenever the text they're based on changes. This can be done synchronously (in the save transaction), asynchronously (via a queue), or in batch (nightly rebuild). The eBook provides production patterns for all three approaches. Implement robust embedding pipelines with the guide on Amazon and Google Play.

Q4: Can I use semantic search with languages other than English?

Absolutely. Multilingual embedding models like paraphrase-multilingual-MiniLM-L12-v2 support 50+ languages, and the same query can match content across languages—a Spanish query can find relevant English documents. The eBook includes a comprehensive guide to multilingual embedding strategies. Explore cross-language search with the toolkit on Amazon or Google Play Books.

Q5: How does pgvector compare to dedicated search engines like Elasticsearch?

Elasticsearch excels at large-scale text search with complex relevance tuning but adds operational complexity and cost. For most applications with up to tens of millions of documents, pgvector provides comparable semantic search quality with dramatically simpler operations—no separate cluster to manage. The eBook includes head-to-head benchmarks. Compare architectures with the data in A. Purushotham Reddy's book on Amazon and Google Play.

Continue Your Journey: Complete AI Database Series

This article is part of a comprehensive exploration of AI-powered database management. Dive deeper into every topic with the full collection by A. Purushotham Reddy.

A Purushotham Reddy Latest2all blog
