query-optimizer

Analyze and optimize SQL queries for better performance and efficiency.

About query-optimizer

query-optimizer is a Claude Code skill by CuriousLearner that analyzes and optimizes SQL queries for better performance and efficiency, helping developers find slow queries, missing indexes, and inefficient query patterns.



name: query-optimizer
description: Analyze and optimize SQL queries for better performance and efficiency.

Query Optimizer Skill

Analyze and optimize SQL queries for better performance and efficiency.

Instructions

You are a database performance optimization expert. When invoked:

  1. Analyze Query Performance:

    • Use EXPLAIN/EXPLAIN ANALYZE to understand execution plan
    • Identify slow queries from logs
    • Measure query execution time
    • Detect full table scans and missing indexes
  2. Identify Bottlenecks:

    • Find N+1 query problems
    • Detect inefficient JOINs
    • Identify missing or unused indexes
    • Spot suboptimal WHERE clauses
  3. Optimize Queries:

    • Add appropriate indexes
    • Rewrite queries for better performance
    • Suggest caching strategies
    • Recommend query restructuring
  4. Provide Recommendations:

    • Index creation suggestions
    • Query rewriting alternatives
    • Database configuration tuning
    • Monitoring and alerting setup
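
A minimal end-to-end sketch of steps 1-4 on PostgreSQL, using a hypothetical orders table with an unindexed customer_id column (names are illustrative, not from a real schema):

-- 1. Measure the query before optimizing
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
-- Expect: Seq Scan on orders if customer_id has no index

-- 2. Add the missing index
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- 3. Refresh planner statistics and re-check the plan
ANALYZE orders;
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
-- Expect: Index Scan (or Bitmap Index Scan) using idx_orders_customer_id

-- 4. Record the before/after timings from EXPLAIN ANALYZE as evidence for the recommendation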

Supported Databases

  • SQL: PostgreSQL, MySQL, MariaDB, SQL Server, SQLite
  • Analysis Tools: EXPLAIN, EXPLAIN ANALYZE, Query Profiler (see the profiler sketch after this list)
  • Monitoring: pg_stat_statements, slow query log, performance schema
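
Of the analysis tools above, the MySQL query profiler and SQLite's plan output are not shown later in this document; a minimal sketch of each (SHOW PROFILE is deprecated in favor of the Performance Schema but still available):

-- MySQL query profiler
SET profiling = 1;
SELECT * FROM users WHERE email = 'test@example.com';
SHOW PROFILES;               -- recent queries with durations
SHOW PROFILE FOR QUERY 1;    -- per-stage timing for query 1

-- SQLite execution plan
EXPLAIN QUERY PLAN
SELECT * FROM users WHERE email = 'test@example.com';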

Usage Examples

@query-optimizer
@query-optimizer --analyze-slow-queries
@query-optimizer --suggest-indexes
@query-optimizer --explain SELECT * FROM users WHERE email = 'test@example.com'
@query-optimizer --fix-n-plus-one

Query Analysis Tools

PostgreSQL - EXPLAIN ANALYZE

-- Basic EXPLAIN
EXPLAIN
SELECT u.id, u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- EXPLAIN ANALYZE - actually runs the query
EXPLAIN ANALYZE
SELECT u.id, u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- EXPLAIN with all options (PostgreSQL)
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT JSON)
SELECT * FROM orders
WHERE user_id = 123 AND created_at >= '2024-01-01';

Reading EXPLAIN Output:

Seq Scan on users  (cost=0.00..1234.56 rows=10000 width=32)
  Filter: (active = true)

-- Seq Scan = Sequential Scan (full table scan) - BAD for large tables
-- cost=0.00..1234.56 = startup cost..total cost
-- rows=10000 = estimated rows
-- width=32 = average row size in bytes
Index Scan using idx_users_email on users  (cost=0.29..8.30 rows=1 width=32)
  Index Cond: (email = 'test@example.com'::text)

-- Index Scan = Using index - GOOD
-- Much lower cost than Seq Scan
-- rows=1 = accurate estimate

MySQL - EXPLAIN

-- MySQL EXPLAIN
EXPLAIN
SELECT u.id, u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- EXPLAIN with execution stats (MySQL 8.0+)
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 123;

-- Show warnings for optimization info
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
SHOW WARNINGS;

MySQL EXPLAIN Output:

+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | users | ALL  | NULL          | NULL    | NULL    | NULL  | 1000 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+

-- type=ALL means full table scan - BAD
-- key=NULL means no index used

+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys   | key             | key_len | ref   | rows | Extra |
+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------+
|  1 | SIMPLE      | users | ref  | idx_users_email | idx_users_email | 767     | const |    1 | NULL  |
+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------+

-- type=ref means index lookup - GOOD
-- key shows index being used

Common Performance Issues

1. Missing Indexes

Problem:

-- Slow query - full table scan
SELECT * FROM users WHERE email = 'john@example.com';

-- EXPLAIN shows:
-- Seq Scan on users (cost=0.00..1500.00 rows=1 width=100)
--   Filter: (email = 'john@example.com')

Solution:

-- Add index on email column
CREATE INDEX idx_users_email ON users(email);

-- Now EXPLAIN shows:
-- Index Scan using idx_users_email on users (cost=0.29..8.30 rows=1 width=100)
--   Index Cond: (email = 'john@example.com')

-- Query is typically orders of magnitude faster on large tables

2. N+1 Query Problem

Problem:

// ORM code causing N+1 queries
const users = await User.findAll();  // 1 query

for (const user of users) {
  const orders = await Order.findAll({
    where: { userId: user.id }       // N queries (one per user)
  });
  console.log(`${user.name}: ${orders.length} orders`);
}

// Total: 1 + N queries for N users
// For 100 users = 101 queries!

Solution:

// Use eager loading - single query with JOIN
const users = await User.findAll({
  include: [{
    model: Order,
    attributes: ['id', 'total_amount']
  }]
});

for (const user of users) {
  console.log(`${user.name}: ${user.orders.length} orders`);
}

// Total: 1 query regardless of user count

SQL Equivalent:

-- Instead of multiple queries:
SELECT * FROM users;
SELECT * FROM orders WHERE user_id = 1;
SELECT * FROM orders WHERE user_id = 2;
-- ... (N more queries)

-- Use single JOIN query:
SELECT u.id, u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

3. SELECT * Inefficiency

Problem:

-- Fetching all columns when only a few are needed
SELECT * FROM products WHERE category_id = 5;

-- Fetches: id, name, description (large text), image_url, specs (json),
--          price, stock, created_at, updated_at, etc.

Solution:

-- Only select needed columns
SELECT id, name, price, stock
FROM products
WHERE category_id = 5;

-- Benefits:
-- - Less data transferred
-- - Faster query execution
-- - Lower memory usage
-- - Can use covering indexes

4. Inefficient Pagination

Problem:

-- OFFSET becomes slow with large offsets
SELECT * FROM users
ORDER BY created_at DESC
LIMIT 20 OFFSET 10000;

-- Database must:
-- 1. Sort all rows
-- 2. Skip 10,000 rows
-- 3. Return next 20
-- Gets slower as the offset increases

Solution:

-- Use cursor-based (keyset) pagination
-- (last row of previous page: created_at = '2024-01-01 12:00:00', id = 12345)
SELECT * FROM users
WHERE created_at < '2024-01-01 12:00:00'
   OR (created_at = '2024-01-01 12:00:00' AND id < 12345)
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Or with indexed column:
SELECT * FROM users
WHERE id < 10000
ORDER BY id DESC
LIMIT 20;

-- Benefits:
-- - Consistent performance regardless of page
-- - Uses index efficiently
-- - No need to skip rows

5. Function on Indexed Column

Problem:

-- Function prevents index usage
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';

-- EXPLAIN shows Seq Scan (index not used)

Solution 1 - Store lowercase:

-- Add computed column
ALTER TABLE users ADD COLUMN email_lower VARCHAR(255)
  GENERATED ALWAYS AS (LOWER(email)) STORED;

CREATE INDEX idx_users_email_lower ON users(email_lower);

-- Query:
SELECT * FROM users WHERE email_lower = 'john@example.com';

Solution 2 - Functional index (PostgreSQL):

-- Create index on function result
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Now original query uses index
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';

Solution 3 - Case-insensitive collation:

-- PostgreSQL - use citext type (requires the citext extension)
CREATE EXTENSION IF NOT EXISTS citext;
ALTER TABLE users ALTER COLUMN email TYPE citext;

-- Query without LOWER:
SELECT * FROM users WHERE email = 'john@example.com';
-- Automatically case-insensitive

6. Inefficient JOINs

Problem:

-- Multiple JOINs without proper indexes
SELECT u.username, o.id as order_id, p.name as product_name
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE u.email = 'john@example.com';

-- Slow if missing indexes on:
-- - users.email
-- - orders.user_id
-- - order_items.order_id
-- - order_items.product_id

Solution:

-- Add necessary indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);

-- Now the query uses indexes for all JOINs
-- EXPLAIN will show "Index Scan" for each table

7. OR Conditions

Problem:

-- OR prevents efficient index usage
SELECT * FROM users
WHERE username = 'john' OR email = 'john@example.com';

-- May not use indexes optimally

Solution:

-- Use UNION for better index usage
SELECT * FROM users WHERE username = 'john'
UNION
SELECT * FROM users WHERE email = 'john@example.com';

-- Each subquery uses its own index
-- UNION deduplicates results automatically

8. NOT IN with Subquery

Problem:

-- Slow subquery execution
SELECT * FROM users
WHERE id NOT IN (
  SELECT user_id FROM banned_users
);

-- Can be very slow with large subquery results
-- NOT IN also returns no rows at all if the subquery produces any NULLs

Solution:

-- Use LEFT JOIN with NULL check
SELECT u.*
FROM users u
LEFT JOIN banned_users bu ON u.id = bu.user_id
WHERE bu.user_id IS NULL;

-- Or use NOT EXISTS (often faster):
SELECT u.*
FROM users u
WHERE NOT EXISTS (
  SELECT 1 FROM banned_users bu WHERE bu.user_id = u.id
);

Index Optimization

When to Add Indexes

Add indexes for:

  • Primary keys (automatic in most databases)
  • Foreign keys (critical for JOINs)
  • Columns in WHERE clauses
  • Columns in ORDER BY clauses
  • Columns in GROUP BY clauses
  • Columns in JOIN conditions
  • Columns with high cardinality (many unique values)
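
To find tables that may be missing indexes for the cases above, PostgreSQL's statistics views can be queried directly; a minimal sketch (the thresholds are arbitrary assumptions - tune them for your workload):

-- Tables read mostly by sequential scans are index candidates (PostgreSQL)
SELECT relname, seq_scan, idx_scan, n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan     -- heuristic: more seq scans than index scans
  AND n_live_tup > 10000      -- ignore small tables
ORDER BY seq_scan DESC;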

Index Types

B-Tree Index (Default):

-- Best for: equality (=) and range (<, >, BETWEEN) queries
CREATE INDEX idx_users_created_at ON users(created_at);

-- Good for:
SELECT * FROM users WHERE created_at > '2024-01-01';
SELECT * FROM users WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31';

Composite Index:

-- Index on multiple columns
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Used for queries filtering on both columns
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

-- Also used for queries on the first column only
SELECT * FROM orders WHERE user_id = 123;

-- NOT used for queries on the second column only
SELECT * FROM orders WHERE status = 'pending';  -- Won't use this index

-- Column order matters! Most selective first

Partial Index (PostgreSQL):

-- Index only a subset of rows
CREATE INDEX idx_active_users ON users(email) WHERE active = true;

-- Smaller index, faster queries for active users
SELECT * FROM users
WHERE email = 'john@example.com' AND active = true;

GIN Index (PostgreSQL - for arrays, JSONB, full-text):

-- For JSONB columns
CREATE INDEX idx_products_metadata ON products USING GIN(metadata);

-- Query JSONB data
SELECT * FROM products WHERE metadata @> '{"brand": "Apple"}';

-- For array columns
CREATE INDEX idx_tags ON posts USING GIN(tags);

-- Query arrays
SELECT * FROM posts WHERE tags @> ARRAY['postgresql'];

Full-Text Search Index:

-- PostgreSQL
CREATE INDEX idx_products_search ON products
  USING GIN(to_tsvector('english', name || ' ' || description));

-- Full-text search query
SELECT * FROM products
WHERE to_tsvector('english', name || ' ' || description)
      @@ to_tsquery('english', 'laptop & gaming');

Covering Index

Concept:

-- Covering index includes all columns needed by the query
CREATE INDEX idx_users_email_username ON users(email, username);

-- This query can be answered entirely from the index (no table access)
SELECT username FROM users WHERE email = 'john@example.com';

-- PostgreSQL: Index-Only Scan
-- MySQL: Using index

With INCLUDE (PostgreSQL 11+):

-- Include non-key columns in the index leaf nodes
CREATE INDEX idx_users_email ON users(email) INCLUDE (username, created_at);

-- Query can use the index without table access
SELECT username, created_at FROM users WHERE email = 'john@example.com';

Index Maintenance

Find Unused Indexes (PostgreSQL):

SELECT schemaname,
       relname AS table_name,
       indexrelname AS index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Drop unused indexes to save space and improve write performance

Find Duplicate Indexes:

-- PostgreSQL query to find duplicate indexes
SELECT indrelid::regclass AS table_name,
       array_agg(indexrelid::regclass) AS indexes
FROM pg_index
GROUP BY indrelid, indkey
HAVING COUNT(*) > 1;

Rebuild Fragmented Indexes:

-- PostgreSQL
REINDEX INDEX idx_users_email;
REINDEX TABLE users;

-- MySQL
OPTIMIZE TABLE users;

Query Rewriting Examples

Example 1: Aggregation Optimization

Before:

SELECT u.id,
       u.username,
       (SELECT COUNT(*) FROM orders WHERE user_id = u.id) as order_count,
       (SELECT SUM(total_amount) FROM orders WHERE user_id = u.id) as total_spent
FROM users u
WHERE u.active = true;

-- N+1 problem: 1 query + 2 correlated subqueries per user row

After:

SELECT u.id,
       u.username,
       COUNT(o.id) as order_count,
       COALESCE(SUM(o.total_amount), 0) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- Single query with JOIN
-- Much faster!

Example 2: EXISTS vs IN

Before:

SELECT * FROM products
WHERE id IN (
  SELECT product_id FROM order_items WHERE created_at >= '2024-01-01'
);

-- Subquery returns all product_ids (potentially large result set)

After:

SELECT p.*
FROM products p
WHERE EXISTS (
  SELECT 1 FROM order_items oi
  WHERE oi.product_id = p.id
    AND oi.created_at >= '2024-01-01'
);

-- EXISTS stops at the first match (more efficient)

Example 3: Avoid Cartesian Products

Before:

-- Accidental cartesian product
SELECT * FROM users u, orders o
WHERE u.active = true AND o.status = 'completed';

-- Returns every user combined with every completed order!
-- Missing JOIN condition

After:

SELECT u.*, o.*
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE u.active = true AND o.status = 'completed';

-- Proper JOIN condition

Example 4: Optimize DISTINCT

Before:

SELECT DISTINCT user_id FROM orders WHERE status = 'completed';

-- DISTINCT requires sorting/deduplication

After:

SELECT user_id
FROM orders
WHERE status = 'completed'
GROUP BY user_id;

-- GROUP BY is often faster than DISTINCT

-- Or, to keep one row per user (e.g. the most recent order) - PostgreSQL:
SELECT DISTINCT ON (user_id) user_id, created_at
FROM orders
WHERE status = 'completed'
ORDER BY user_id, created_at DESC;

Monitoring Slow Queries

PostgreSQL - pg_stat_statements

-- Enable extension
-- (requires pg_stat_statements in shared_preload_libraries and a server restart)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find slowest queries
SELECT substring(query, 1, 50) AS short_query,
       round(total_exec_time::numeric, 2) AS total_time,
       calls,
       round(mean_exec_time::numeric, 2) AS mean_time,
       round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percentage
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Find queries with the most calls
SELECT substring(query, 1, 50) AS short_query,
       calls,
       round(mean_exec_time::numeric, 2) AS mean_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;

-- Reset statistics
SELECT pg_stat_statements_reset();

MySQL - Slow Query Log

-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- Log queries taking > 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow-query.log';

-- Log queries not using indexes
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Analyze the slow query log with the mysqldumpslow tool:
-- mysqldumpslow -s t -t 10 /var/log/mysql/slow-query.log

Performance Schema (MySQL)

-- The Performance Schema is enabled by default in modern MySQL; it cannot be
-- turned on at runtime (set performance_schema = ON in my.cnf if it is off)

-- Find slowest statements
SELECT DIGEST_TEXT,
       COUNT_STAR AS executions,
       ROUND(AVG_TIMER_WAIT / 1000000000, 2) AS avg_ms,
       ROUND(SUM_TIMER_WAIT / 1000000000, 2) AS total_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;

Best Practices

DO ✓

  • Use EXPLAIN before and after optimization
  • Add indexes on foreign keys - Critical for JOINs
  • Use covering indexes when possible
  • Paginate large result sets - Avoid loading all data
  • Monitor query performance - Use pg_stat_statements or slow query log
  • Test on production-like data - Performance differs with data volume
  • Use connection pooling - Reduce connection overhead
  • Cache frequently accessed data - Redis, Memcached
  • Archive old data - Keep active tables smaller
  • Regular VACUUM/ANALYZE (PostgreSQL) - Update statistics
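
For the VACUUM/ANALYZE point above, a minimal PostgreSQL sketch (table names are illustrative):

-- Reclaim dead tuples and refresh planner statistics
VACUUM (ANALYZE, VERBOSE) users;

-- Statistics only (cheaper, can run more often)
ANALYZE orders;

-- Check when autovacuum/autoanalyze last ran
SELECT relname, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_autoanalyze NULLS FIRST;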

DON'T ✗

  • Don't use SELECT * - Fetch only needed columns
  • Don't over-index - Each index slows down writes
  • Don't ignore EXPLAIN warnings - They indicate problems
  • Don't use functions on indexed columns - Prevents index usage
  • Don't fetch more data than needed - Use LIMIT
  • Don't use OFFSET for deep pagination - Use cursor-based instead
  • Don't ignore database logs - Monitor for errors
  • Don't optimize prematurely - Profile first, optimize bottlenecks
  • Don't forget about write performance - Indexes slow down INSERTs
  • Don't skip testing - Verify optimizations actually help

Query Optimization Checklist

- [ ] Run EXPLAIN/EXPLAIN ANALYZE on query
- [ ] Check if query uses indexes (no Seq Scan on large tables)
- [ ] Verify indexes exist on:
  - [ ] Foreign key columns
  - [ ] WHERE clause columns
  - [ ] JOIN condition columns
  - [ ] ORDER BY columns
- [ ] SELECT only needed columns (avoid SELECT *)
- [ ] Use appropriate JOIN type (INNER vs LEFT)
- [ ] Avoid N+1 queries (use JOINs or eager loading)
- [ ] Use pagination for large result sets
- [ ] Check for unused indexes (slow down writes)
- [ ] Consider query caching for frequent queries
- [ ] Test with production-like data volumes
- [ ] Monitor query performance over time

Notes

  • Always measure before and after optimization
  • Index creation can take time on large tables (see the CREATE INDEX CONCURRENTLY sketch after this list)
  • Too many indexes slow down INSERT/UPDATE/DELETE
  • Keep database statistics up to date (ANALYZE)
  • Consider read replicas for read-heavy workloads
  • Use database-specific features when beneficial
  • Document optimization decisions for team
  • Regular performance audits prevent degradation
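
Related to the note on index creation time above: on PostgreSQL, CREATE INDEX CONCURRENTLY avoids blocking writes on large tables, at the cost of a slower build, and it cannot run inside a transaction block. A minimal sketch with an illustrative index name:

-- Build the index without locking out writes (PostgreSQL)
CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders(created_at);

-- A failed concurrent build leaves an INVALID index behind; drop it and retry
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_created_at;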