# postgres-best-practices

A skill by dirnbauer.

# Install this skill

Install this specific skill from the multi-skill repository:

npx skills add dirnbauer/webconsulting-skills --skill "postgres-best-practices"

# Description

Postgres performance optimization and best practices from Supabase. Query performance, connection pooling, Row Level Security, schema design, and diagnostics.

# SKILL.md


---
name: postgres-best-practices
description: Postgres performance optimization and best practices from Supabase. Query performance, connection pooling, Row Level Security, schema design, and diagnostics.
version: 1.0.0
triggers:
  - postgres
  - postgresql
  - sql
  - database
  - query
  - index
  - rls
  - row-level-security
  - supabase
  - connection-pooling
---


# Postgres Best Practices

Source: This skill is based on Supabase's Postgres Best Practices
for AI agents. See the announcement blog post.

A comprehensive Postgres performance optimization guide containing rules across 8 categories, prioritized by impact, from critical (query performance, connection management) down to incremental (advanced features).

## When to Apply

Reference these guidelines when:
- Writing SQL queries or designing schemas
- Implementing indexes or query optimization
- Reviewing database performance issues
- Configuring connection pooling or scaling
- Optimizing for Postgres-specific features
- Working with Row-Level Security (RLS)

## Rule Categories by Priority

| Priority | Category | Impact |
|----------|----------|--------|
| 1 | Query Performance | CRITICAL |
| 2 | Connection Management | CRITICAL |
| 3 | Security & RLS | CRITICAL |
| 4 | Schema Design | HIGH |
| 5 | Concurrency & Locking | MEDIUM-HIGH |
| 6 | Data Access Patterns | MEDIUM |
| 7 | Monitoring & Diagnostics | LOW-MEDIUM |
| 8 | Advanced Features | LOW |

## 1. Query Performance (CRITICAL)

Slow queries, missing indexes, and inefficient query plans are the most common sources of Postgres performance issues.

### 1.1 Add Indexes on WHERE and JOIN Columns

Impact: CRITICAL (100-1000x faster queries on large tables)

Queries filtering or joining on unindexed columns cause full table scans.

Incorrect (sequential scan on large table):

-- No index on customer_id causes full table scan
select * from orders where customer_id = 123;

-- EXPLAIN shows: Seq Scan on orders (cost=0.00..25000.00 rows=100 width=85)

Correct (index scan):

-- Create index on frequently filtered column
create index orders_customer_id_idx on orders (customer_id);

select * from orders where customer_id = 123;

-- EXPLAIN shows: Index Scan using orders_customer_id_idx (cost=0.42..8.44 rows=100 width=85)

For JOIN columns, always index the foreign key side:

create index orders_customer_id_idx on orders (customer_id);

select c.name, o.total
from customers c
join orders o on o.customer_id = c.id;

### 1.2 Choose the Right Index Type

Impact: HIGH (10-100x improvement with correct index type)

| Index Type | Use Case | Operators |
|------------|----------|-----------|
| B-tree (default) | Equality, range queries | =, <, >, BETWEEN, IN, IS NULL |
| GIN | Arrays, JSONB, full-text search | @>, ?, ?&, ?\|, @@ |
| BRIN | Large time-series tables | Range queries (10-100x smaller index) |
| Hash | Equality-only | = (slightly faster than B-tree) |

Incorrect (B-tree for JSONB containment):

-- B-tree cannot optimize containment operators
create index products_attrs_idx on products (attributes);
select * from products where attributes @> '{"color": "red"}';
-- Full table scan - B-tree doesn't support @> operator

Correct (GIN for JSONB):

-- GIN supports @>, ?, ?&, ?| operators
create index products_attrs_idx on products using gin (attributes);
select * from products where attributes @> '{"color": "red"}';
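
BRIN is worth a quick illustration too, since it behaves very differently from B-tree. A minimal sketch, assuming a large append-only `readings` table (hypothetical) with a `recorded_at` timestamptz column:

-- BRIN stores only min/max per block range: a tiny index, ideal for
-- append-only time-series data inserted roughly in timestamp order
create index readings_recorded_at_brin on readings using brin (recorded_at);

-- Range predicates that follow the physical insert order benefit most
select count(*) from readings
where recorded_at >= now() - interval '1 day';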

### 1.3 Create Composite Indexes for Multi-Column Queries

Impact: HIGH (5-10x faster multi-column queries)

Column order matters. Place equality columns first and range columns last:

-- Good: status (=) before created_at (>)
create index orders_status_created_idx on orders (status, created_at);

-- Works for: WHERE status = 'pending'
-- Works for: WHERE status = 'pending' AND created_at > '2024-01-01'
-- Does NOT work for: WHERE created_at > '2024-01-01' (leftmost prefix rule)

### 1.4 Use Covering Indexes to Avoid Table Lookups

Impact: MEDIUM-HIGH (2-5x faster queries by eliminating heap fetches)

-- Include non-searchable columns in the index
create index users_email_idx on users (email) include (name, created_at);

-- All columns served from index, no table access needed
select email, name, created_at from users where email = 'user@example.com';

### 1.5 Use Partial Indexes for Filtered Queries

Impact: HIGH (5-20x smaller indexes, faster writes and queries)

-- Index only includes active users
create index users_active_email_idx on users (email)
where deleted_at is null;

-- Only pending orders (status rarely changes once completed)
create index orders_pending_idx on orders (created_at)
where status = 'pending';

## 2. Connection Management (CRITICAL)

### 2.1 Use Connection Pooling

Impact: CRITICAL (Handle 10-100x more concurrent users)

Postgres connections are expensive (1-3MB RAM each). Use a pooler like PgBouncer.

-- Without pooling: 500 concurrent users = 500 connections = crashed database
-- With pooling: 500 concurrent users share 10 actual connections

Pool modes:
- Transaction mode: Connection returned after each transaction (best for most apps)
- Session mode: Connection held for entire session (needed for prepared statements, temp tables)
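
A minimal PgBouncer configuration sketch (hypothetical host, paths, and pool sizes; tune these to your workload and RAM):

[databases]
app = host=127.0.0.1 port=5432 dbname=app

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 10
max_client_conn = 500

With this setup, up to 500 application clients share 10 actual server connections per database/user pair.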

### 2.2 Configure Connection Limits

Impact: CRITICAL (Prevent database crashes and memory exhaustion)

-- Formula: max_connections = (RAM in MB / 5MB per connection) - reserved
-- For 4GB RAM: (4096 / 5) - 10 = ~800 theoretical max
-- Practically, 100-200 is better for query performance

alter system set max_connections = 100;
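-- Note: max_connections only takes effect after a server restart
-- (pg_reload_conf() is not enough for this setting)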

-- work_mem * max_connections should not exceed 25% of RAM
alter system set work_mem = '8MB';  -- 8MB * 100 = 800MB max

### 2.3 Configure Idle Connection Timeouts

-- Terminate connections idle in transaction after 30 seconds
alter system set idle_in_transaction_session_timeout = '30s';

-- Terminate completely idle connections after 10 minutes
alter system set idle_session_timeout = '10min';

select pg_reload_conf();
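
Before enabling these timeouts, it can help to see what would be affected. A diagnostic sketch using pg_stat_activity:

-- List idle sessions, longest-idle first
select pid, usename, state,
  now() - state_change as time_in_state,
  left(query, 60) as last_query
from pg_stat_activity
where state in ('idle', 'idle in transaction')
order by time_in_state desc;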

## 3. Security & RLS (CRITICAL)

### 3.1 Enable Row Level Security for Multi-Tenant Data

Impact: CRITICAL (Database-enforced tenant isolation, prevent data leaks)

Incorrect (application-level filtering only):

-- Relying only on application to filter
select * from orders where user_id = $current_user_id;

-- Bug or bypass means all data is exposed!
select * from orders;  -- Returns ALL orders

Correct (database-enforced RLS):

-- Enable RLS on the table
alter table orders enable row level security;

-- Create policy for users to see only their orders
create policy orders_user_policy on orders
  for all
  using (user_id = current_setting('app.current_user_id')::bigint);

-- Force RLS even for table owners
alter table orders force row level security;

-- Set user context and query
set app.current_user_id = '123';
select * from orders;  -- Only returns orders for user 123

Supabase pattern with auth.uid():

create policy orders_user_policy on orders
  for all
  to authenticated
  using (user_id = auth.uid());

### 3.2 Optimize RLS Policies for Performance

Impact: HIGH (5-10x faster RLS queries with proper patterns)

Incorrect (function called for every row):

create policy orders_policy on orders
  using (auth.uid() = user_id);  -- auth.uid() called per row!

Correct (wrap functions in SELECT for caching):

create policy orders_policy on orders
  using ((select auth.uid()) = user_id);  -- Called once, cached

-- 100x+ faster on large tables

Always add indexes on columns used in RLS policies:

create index orders_user_id_idx on orders (user_id);

### 3.3 Apply Principle of Least Privilege

-- Create role with no default privileges
create role app_readonly nologin;

-- Grant only SELECT on specific tables
grant usage on schema public to app_readonly;
grant select on public.products, public.categories to app_readonly;

-- Create role for writes with limited scope
create role app_writer nologin;
grant select, insert, update on public.orders to app_writer;
grant usage on sequence orders_id_seq to app_writer;
-- No DELETE permission
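
To have future tables pick up the same grants automatically, default privileges can be configured. A sketch, assuming objects are created by a hypothetical app_owner role:

-- Tables created by app_owner in public will automatically carry these grants
alter default privileges for role app_owner in schema public
  grant select on tables to app_readonly;

alter default privileges for role app_owner in schema public
  grant select, insert, update on tables to app_writer;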

## 4. Schema Design (HIGH)

### 4.1 Choose Appropriate Data Types

| Use Case | Correct Type | Avoid |
|----------|--------------|-------|
| IDs | bigint | int (overflows at 2.1B) |
| Strings | text | varchar(n), unless a length constraint is needed |
| Timestamps | timestamptz | timestamp (no timezone) |
| Money | numeric(10,2) | float (precision issues) |
| Boolean | boolean | varchar(5) |

create table users (
  id bigint generated always as identity primary key,
  email text,
  created_at timestamptz default now(),
  is_active boolean default true,
  price numeric(10,2)
);

### 4.2 Index Foreign Key Columns

Impact: HIGH (10-100x faster JOINs and CASCADE operations)

Postgres does NOT automatically index foreign key columns!

create table orders (
  id bigint generated always as identity primary key,
  customer_id bigint references customers(id) on delete cascade,
  total numeric(10,2)
);

-- Always index the FK column
create index orders_customer_id_idx on orders (customer_id);

Find missing FK indexes:

select
  conrelid::regclass as table_name,
  a.attname as fk_column
from pg_constraint c
join pg_attribute a on a.attrelid = c.conrelid and a.attnum = any(c.conkey)
where c.contype = 'f'
  and not exists (
    select 1 from pg_index i
    where i.indrelid = c.conrelid and a.attnum = any(i.indkey)
  );

### 4.3 Partition Large Tables

Impact: MEDIUM-HIGH (5-20x faster queries and maintenance on large tables)

When to partition: Tables > 100M rows, time-series data, need to efficiently drop old data.

create table events (
  id bigint generated always as identity,
  created_at timestamptz not null,
  data jsonb
) partition by range (created_at);

-- Create partitions for each month
create table events_2024_01 partition of events
  for values from ('2024-01-01') to ('2024-02-01');

-- Drop old data instantly
drop table events_2023_01;  -- Instant vs DELETE taking hours
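
Indexes created on the partitioned parent propagate to all current and future partitions (Postgres 11+), so new partitions stay covered automatically:

-- Created once on the parent, cascades to every partition
create index events_created_at_idx on events (created_at);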

### 4.4 Select Optimal Primary Key Strategy

-- Use IDENTITY for sequential IDs (SQL-standard, best for most cases)
create table users (
  id bigint generated always as identity primary key
);

-- For distributed systems needing UUIDs, use UUIDv7 (time-ordered)
-- Requires pg_uuidv7 extension
create table orders (
  id uuid default uuid_generate_v7() primary key
);

Guidelines:
- Single database: bigint identity (sequential, 8 bytes)
- Distributed/exposed IDs: UUIDv7 (time-ordered, no fragmentation)
- Avoid random UUIDs (v4) as primary keys on large tables (causes index fragmentation)

### 4.5 Use Lowercase Identifiers

-- Correct: unquoted lowercase identifiers are portable
create table users (
  user_id bigint primary key,
  first_name text,
  last_name text
);

-- Works without quotes, recognized by all tools
select first_name from users where user_id = 1;
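
For contrast, quoted mixed-case identifiers become case-sensitive and must be quoted identically on every reference:

-- Avoid: quoting freezes the exact case forever
create table "Users" ("userId" bigint primary key);

select userid from "Users";   -- ERROR: column "userid" does not exist
select "userId" from "Users"; -- quotes required everywhere, in every tool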

## 5. Concurrency & Locking (MEDIUM-HIGH)

### 5.1 Keep Transactions Short

Impact: MEDIUM-HIGH (3-5x throughput improvement, fewer deadlocks)

Incorrect:

begin;
select * from orders where id = 1 for update;  -- Lock acquired
-- Application makes HTTP call to payment API (2-5 seconds)
-- Other queries on this row are blocked!
update orders set status = 'paid' where id = 1;
commit;

Correct:

-- Validate and call APIs outside transaction
-- Only hold lock for the actual update
begin;
update orders
set status = 'paid', payment_id = $1
where id = $2 and status = 'pending'
returning *;
commit;  -- Lock held for milliseconds

### 5.2 Prevent Deadlocks with Consistent Lock Ordering

-- Explicitly acquire locks in ID order before updating
begin;
select * from accounts where id in (1, 2) order by id for update;

-- Now perform updates in any order
update accounts set balance = balance - 100 where id = 1;
update accounts set balance = balance + 100 where id = 2;
commit;

### 5.3 Use SKIP LOCKED for Queue Processing

Impact: MEDIUM-HIGH (10x throughput for worker queues)

-- Each worker skips locked rows and gets the next available
begin;
select * from jobs
where status = 'pending'
order by created_at
limit 1
for update skip locked;

update jobs set status = 'processing' where id = $1;
commit;

Complete queue pattern (atomic claim-and-update):

update jobs
set status = 'processing', worker_id = $1, started_at = now()
where id = (
  select id from jobs
  where status = 'pending'
  order by created_at
  limit 1
  for update skip locked
)
returning *;

## 6. Data Access Patterns (MEDIUM)

### 6.1 Batch INSERT Statements

Impact: MEDIUM (10-50x faster bulk inserts)

-- Multiple rows in single statement
insert into events (user_id, action) values
  (1, 'click'),
  (1, 'view'),
  (2, 'click');

-- For large imports, use COPY
copy events (user_id, action, created_at)
from '/path/to/data.csv'
with (format csv, header true);
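
From application code with bind parameters, a batch can also be expressed by unnesting arrays. A sketch, assuming your driver binds $1 and $2 as arrays:

-- One statement, arbitrarily many rows, only two parameters
insert into events (user_id, action)
select * from unnest($1::bigint[], $2::text[]);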

### 6.2 Eliminate N+1 Queries

Impact: MEDIUM-HIGH (10-100x fewer database round trips)

Incorrect:

-- 101 round trips
select id from users where active = true;  -- Returns 100 IDs
select * from orders where user_id = 1;
select * from orders where user_id = 2;
-- ... 98 more queries

Correct:

-- 1 round trip
select * from orders where user_id = any(array[1, 2, 3, ...]);

-- Or use JOIN
select u.id, u.name, o.*
from users u
left join orders o on o.user_id = u.id
where u.active = true;

### 6.3 Use Cursor-Based Pagination

Impact: MEDIUM-HIGH (Consistent O(1) performance regardless of page depth)

Incorrect (OFFSET):

-- Page 10000: scans 200,000 rows!
select * from products order by id limit 20 offset 199980;

Correct (cursor/keyset):

-- Page 10000: same speed as page 1
select * from products where id > 199980 order by id limit 20;

For multi-column sorting:

select * from products
where (created_at, id) > ('2024-01-15 10:00:00', 12345)
order by created_at, id
limit 20;
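
Keyset pagination is only O(1) when the sort columns are backed by a matching index:

create index products_created_at_id_idx on products (created_at, id);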

### 6.4 Use UPSERT for Insert-or-Update

-- Atomic operation, eliminates race conditions
insert into settings (user_id, key, value)
values (123, 'theme', 'dark')
on conflict (user_id, key)
do update set value = excluded.value, updated_at = now()
returning *;
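
ON CONFLICT requires a unique constraint or unique index covering the conflict target, for example:

alter table settings
  add constraint settings_user_id_key_uniq unique (user_id, key);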

## 7. Monitoring & Diagnostics (LOW-MEDIUM)

### 7.1 Enable pg_stat_statements

create extension if not exists pg_stat_statements;

-- Find slowest queries by total time
select
  calls,
  round(total_exec_time::numeric, 2) as total_time_ms,
  round(mean_exec_time::numeric, 2) as mean_time_ms,
  query
from pg_stat_statements
order by total_exec_time desc
limit 10;
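
Note that the extension must also be preloaded for it to collect statistics. A sketch (this overwrites any existing shared_preload_libraries value, and a server restart is required; managed platforms often preload it already):

alter system set shared_preload_libraries = 'pg_stat_statements';
-- restart the server for this to take effect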

### 7.2 Use EXPLAIN ANALYZE

explain (analyze, buffers, format text)
select * from orders where customer_id = 123 and status = 'pending';

-- Key things to look for:
-- Seq Scan on large tables = missing index
-- Rows Removed by Filter = poor selectivity or missing index
-- Buffers: read >> hit = data not cached
-- Nested Loop with high loops = consider different join strategy

### 7.3 Maintain Table Statistics

-- Manually analyze after large data changes
analyze orders;

-- Autovacuum tuning for busy tables
alter table orders set (
  autovacuum_vacuum_scale_factor = 0.05,
  autovacuum_analyze_scale_factor = 0.02
);

## 8. Advanced Features (LOW)

### 8.1 Index JSONB Columns

-- GIN for containment queries
create index products_attrs_gin on products using gin (attributes);

-- Expression index for specific key lookups
create index products_brand_idx on products ((attributes->>'brand'));

### 8.2 Use Full-Text Search with tsvector

Impact: MEDIUM (100x faster than LIKE, with ranking support)

-- Add tsvector column and index
alter table articles add column search_vector tsvector
  generated always as (to_tsvector('english', coalesce(title,'') || ' ' || coalesce(content,''))) stored;

create index articles_search_idx on articles using gin (search_vector);

-- Fast full-text search with ranking
select *, ts_rank(search_vector, query) as rank
from articles, to_tsquery('english', 'postgresql') query
where search_vector @@ query
order by rank desc;

## Quick Reference: Common Issues

| Symptom | Likely Cause | Solution |
|---------|--------------|----------|
| Slow queries on large tables | Missing index | Add B-tree index on WHERE/JOIN columns |
| JSONB queries slow | Wrong index type | Use GIN index |
| Connection errors under load | No pooling | Use PgBouncer in transaction mode |
| Memory exhaustion | Too many connections | Reduce max_connections, add pooler |
| Multi-tenant data leak | No RLS | Enable RLS with proper policies |
| RLS queries slow | Function called per row | Wrap in (select ...) for caching |
| Slow pagination on deep pages | Using OFFSET | Use cursor-based pagination |
| INSERT bottleneck | Single-row inserts | Batch INSERTs or use COPY |
| N+1 queries | Loop in application | Use ANY() or JOIN |
| Deadlocks | Inconsistent lock order | Lock in consistent order (by ID) |

## Credits & Attribution

This skill is adapted from the excellent Postgres best practices developed by
Supabase.

Original repository: https://github.com/supabase/agent-skills

Copyright (c) Supabase - Methodology and best practices
Adapted by webconsulting.at for this skill collection

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.