BEST PRACTICES

How to Seed PostgreSQL Databases in 2025: Complete Guide

From manual SQL scripts to automated tools—learn the best practices for seeding PostgreSQL databases with foreign keys, constraints, and realistic data.

January 2, 2025 · 15 min read · Database Development

Why Database Seeding Matters

Every PostgreSQL application needs test data. Whether you're:

  • Developing locally and need realistic data
  • Running integration tests in CI/CD
  • Setting up staging environments
  • Demoing your application to stakeholders

You need a reliable way to populate your database with valid, constraint-safe data.

This guide covers 5 approaches to database seeding, from manual SQL scripts to fully automated tools, with real code examples and performance comparisons.

Table of Contents

  1. Manual SQL Scripts (The Old Way)
  2. ORM Seeders (Prisma, TypeORM, Sequelize)
  3. Faker.js + Custom Scripts
  4. Database Cloning Tools
  5. Automated Synthetic Data Generators
  6. Best Practices & Performance Tips
  7. Choosing the Right Approach

Method 1: Manual SQL Scripts

The traditional approach: write SQL INSERT statements by hand.

Example: seed.sql

-- seed.sql
BEGIN;

-- Insert users first (no dependencies)
INSERT INTO users (id, name, email, created_at) VALUES
  (1, 'Alice Johnson', 'alice@example.com', NOW()),
  (2, 'Bob Smith', 'bob@example.com', NOW()),
  (3, 'Carol White', 'carol@example.com', NOW());

-- Insert orders (depends on users)
INSERT INTO orders (id, user_id, total, status, created_at) VALUES
  (1, 1, 99.99, 'completed', NOW()),
  (2, 1, 149.50, 'pending', NOW()),
  (3, 2, 75.00, 'shipped', NOW());

COMMIT;

Pros

  • ✓ Simple and transparent
  • ✓ Version controlled
  • ✓ Fast execution
  • ✓ No dependencies

Cons

  • ✗ Not scalable (hard to maintain 1000+ rows)
  • ✗ Data is static (same every time)
  • ✗ Manual FK tracking required
  • ✗ Breaks when schema changes
  • ✗ No realistic data (all hardcoded)

When to Use

Manual SQL scripts work well for small, static datasets (< 100 rows) that rarely change. Perfect for initial schema setup or minimal test fixtures.
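One refinement worth adopting even for hand-written scripts: make them idempotent, so re-running the seed doesn't fail on duplicate keys. A sketch using ON CONFLICT (available since PostgreSQL 9.5), reusing the users table from the example above:

```sql
-- Re-runnable seed: rows that already exist are left untouched
INSERT INTO users (id, name, email, created_at) VALUES
  (1, 'Alice Johnson', 'alice@example.com', NOW()),
  (2, 'Bob Smith', 'bob@example.com', NOW())
ON CONFLICT (id) DO NOTHING;
```

This lets the same script serve both first-run setup and repeated local resets.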

Method 2: ORM Seeders (Prisma, TypeORM, Sequelize)

Most ORMs provide seeding functionality. Here's how they compare:

Prisma Example

// prisma/seed.ts
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient();

async function main() {
  // Create users
  const alice = await prisma.user.create({
    data: {
      name: 'Alice Johnson',
      email: 'alice@example.com',
      orders: {
        create: [
          { total: 99.99, status: 'completed' },
          { total: 149.50, status: 'pending' }
        ]
      }
    }
  });

  const bob = await prisma.user.create({
    data: {
      name: 'Bob Smith',
      email: 'bob@example.com',
      orders: {
        create: [
          { total: 75.00, status: 'shipped' }
        ]
      }
    }
  });

  console.log({ alice, bob });
}

main()
  .catch((e) => console.error(e))
  .finally(async () => await prisma.$disconnect());

TypeORM Example

// src/database/seeds/user.seed.ts
import { Factory, Seeder } from 'typeorm-seeding';
import { User } from '../entities/User';
import { Order } from '../entities/Order';

export default class CreateUsers implements Seeder {
  public async run(factory: Factory): Promise<void> {
    await factory(User)()
      .map(async (user) => {
        user.orders = await factory(Order)().createMany(3);
        return user;
      })
      .createMany(10);
  }
}

Pros

  • ✓ Type-safe (TypeScript)
  • ✓ Handles FK relationships automatically
  • ✓ Integrates with your ORM
  • ✓ Can use factories for randomization

Cons

  • ✗ Slow for large datasets (N+1 queries)
  • ✗ Still requires manual data definition
  • ✗ Limited to your ORM's capabilities
  • ✗ Doesn't work without an ORM

When to Use

ORM seeders are great if you're already using Prisma/TypeORM and need type-safe seeding for small to medium datasets (< 10,000 rows).
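If the N+1 cost of per-row create calls starts to bite, most ORMs also offer a bulk path; Prisma's createMany, for example, inserts all rows in a single statement (at the cost of not creating nested relations). A minimal sketch: the row-building half is plain JavaScript, while the createMany call is shown commented because it assumes a configured Prisma client and a matching User model.

```javascript
// Build a deterministic batch of user rows (no ORM needed for this part).
function buildUsers(count) {
  const rows = [];
  for (let i = 1; i <= count; i++) {
    rows.push({
      name: `User ${i}`,
      // Unique per row, so a UNIQUE(email) constraint stays satisfied
      email: `user${i}@example.com`,
    });
  }
  return rows;
}

// With a configured Prisma client this becomes one INSERT instead of N:
// await prisma.user.createMany({ data: buildUsers(1000), skipDuplicates: true });
```

skipDuplicates makes the bulk insert idempotent against unique constraints, which pairs well with repeated local resets.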

Method 3: Faker.js + Custom Scripts

Use Faker.js to generate realistic fake data, but you handle the database logic.

Example

// seed.js
const { faker } = require('@faker-js/faker');
const { Pool } = require('pg');

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function seed() {
  // Insert 1000 users
  const userIds = [];
  for (let i = 0; i < 1000; i++) {
    const result = await pool.query(
      'INSERT INTO users (name, email) VALUES ($1, $2) RETURNING id',
      [faker.person.fullName(), faker.internet.email()]
    );
    userIds.push(result.rows[0].id);
  }

  // Insert 3000 orders (3 per user on average)
  for (const userId of userIds) {
    const orderCount = faker.number.int({ min: 1, max: 5 });
    for (let i = 0; i < orderCount; i++) {
      await pool.query(
        'INSERT INTO orders (user_id, total, status) VALUES ($1, $2, $3)',
        [
          userId,
          faker.number.float({ min: 10, max: 1000, multipleOf: 0.01 }),
          faker.helpers.arrayElement(['pending', 'shipped', 'completed'])
        ]
      );
    }
  }

  console.log('Seeded 1000 users and ~3000 orders');
}

seed().catch(console.error);

Pros

  • ✓ Realistic fake data
  • ✓ Flexible and customizable
  • ✓ Works with any database
  • ✓ Large ecosystem of generators

Cons

  • ✗ Manual FK tracking (userIds array)
  • ✗ Slow (one INSERT per row)
  • ✗ No automatic constraint handling
  • ✗ Breaks on circular dependencies
  • ✗ 100+ lines of boilerplate for complex schemas

When to Use

Faker.js is excellent for simple schemas (< 5 tables) or when you need very specific fake data formats. Not recommended for complex relational databases.
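The "one INSERT per row" cost above is easy to cut without abandoning Faker: batch rows into multi-row INSERTs. A hedged sketch of a helper that builds the parameterized statement node-postgres expects (the table and column names are illustrative):

```javascript
// Build one parameterized multi-row INSERT: VALUES ($1, $2), ($3, $4), ...
function buildBulkInsert(table, columns, rows) {
  const values = [];
  const tuples = rows.map((row) => {
    const placeholders = columns.map((col) => {
      values.push(row[col]);
      return `$${values.length}`; // next positional parameter
    });
    return `(${placeholders.join(', ')})`;
  });
  return {
    text: `INSERT INTO ${table} (${columns.join(', ')}) VALUES ${tuples.join(', ')}`,
    values,
  };
}

// Usage with pg (assumes a connected pool and a batch of row objects):
// const { text, values } = buildBulkInsert('users', ['name', 'email'], batch);
// await pool.query(text, values);
```

Batches of ~1,000 rows per statement typically turn minutes of per-row INSERTs into seconds.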

Read more: Faker.js vs Aphelion: When to Use Each

Method 4: Database Cloning Tools

Clone your production database structure and generate synthetic data automatically.

Example: Using Aphelion

# Clone production schema, generate test data
aphelion clone postgresql://localhost/production \
  test_db --rows 10000 --seed 42

# Output:
# 🔍 Introspecting schema...
#    ✓ Found 23 tables
#    ✓ Detected 47 foreign keys
#    ✓ Resolved 3 circular dependencies
#
# 📊 Generating data...
#    ✓ users (10,000 rows)
#    ✓ orders (34,567 rows)
#    ✓ products (5,000 rows)
#    ... (20 more tables)
#
# ✅ Generated 156,789 rows in 52 seconds
#    All constraints satisfied. Zero errors.

Pros

  • ✓ Fully automated (no manual scripting)
  • ✓ Handles all constraints automatically
  • ✓ Scales to millions of rows
  • ✓ Deterministic (same seed = same data)
  • ✓ Fast (bulk inserts)
  • ✓ Works with complex schemas

Cons

  • ✗ Less control over specific data values
  • ✗ Requires external tool
  • ✗ May need customization for domain-specific data

When to Use

Database cloning tools are ideal for complex schemas (10+ tables), CI/CD pipelines, and when you need production-like data volumes without manual scripting.

Method 5: Automated Synthetic Data Generators

Enterprise-grade tools that generate statistically similar data to production.

Options

  • Aphelion - PostgreSQL-native, $49/year, constraint-safe
  • Tonic.ai - Multi-database, $20k+/year, ML-based
  • MOSTLY AI - Enterprise focus, $3k+/year
  • Gretel.ai - Cloud-based, API-first

Comparison

Tool       Price        Best For
---------  -----------  --------------------------------
Aphelion   $0-$49/year  PostgreSQL specialists, startups
Tonic.ai   $20k+/year   Multi-database enterprises
MOSTLY AI  $3k+/year    ML-based similarity

Best Practices for PostgreSQL Seeding

1. Use Transactions

Always wrap your seed scripts in transactions to ensure atomicity:

BEGIN;
-- Your INSERT statements
COMMIT;
-- Or ROLLBACK on error

2. Disable Triggers Temporarily

For faster seeding, disable triggers during bulk inserts. Be aware that DISABLE TRIGGER ALL also disables the internal triggers that enforce foreign keys (and doing so typically requires elevated privileges), so only use it when you trust the data you're loading:

ALTER TABLE users DISABLE TRIGGER ALL;
-- Insert data
ALTER TABLE users ENABLE TRIGGER ALL;

3. Use COPY for Bulk Inserts

COPY is 10-100x faster than INSERT for large datasets. Run this via psql, which feeds the lines after the command as STDIN (the \. marks end of input):

COPY users (name, email, created_at) FROM STDIN WITH CSV;
Alice Johnson,alice@example.com,2025-01-01
Bob Smith,bob@example.com,2025-01-01
\.
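When a script generates the CSV that feeds COPY, quoting matters: a name containing a comma, quote, or newline will corrupt the stream. A small escaping helper (a sketch following RFC 4180-style quoting, which matches the defaults of COPY's CSV mode):

```javascript
// Join values into one CSV line, quoting any field that contains
// a delimiter, a double quote, or a newline.
function toCsvRow(values) {
  return values
    .map((v) => {
      const s = String(v);
      return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
    })
    .join(',');
}
```

Pipe the resulting lines into COPY ... FROM STDIN WITH CSV, e.g. via psql or a streaming client library.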

4. Set Explicit IDs for Reproducibility

Use deterministic IDs for consistent test data:

INSERT INTO users (id, name, email) VALUES
  (1, 'Alice', 'alice@example.com'),
  (2, 'Bob', 'bob@example.com');

-- Reset sequence
SELECT setval('users_id_seq', (SELECT MAX(id) FROM users));

5. Validate Constraints After Seeding

Verify all constraints are satisfied:

-- Check for FK violations (LEFT JOIN avoids the NOT IN + NULL trap,
-- where a single NULL id would silently hide every violation)
SELECT o.* FROM orders o
LEFT JOIN users u ON u.id = o.user_id
WHERE u.id IS NULL;

-- Check for NULL violations
SELECT * FROM users WHERE email IS NULL;

-- Check for unique violations
SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1;

Performance Tips

Benchmark: 100,000 Rows

Method                      Time         Notes
--------------------------  -----------  ----------------------
Individual INSERTs          ~10 minutes  Slow, not recommended
Batch INSERTs (1000/batch)  ~2 minutes   Better, but still slow
COPY command                ~15 seconds  Fast, recommended
Aphelion (automated)        ~20 seconds  Fast + automatic FKs

Optimization Checklist

  • ✓ Use COPY instead of INSERT for bulk data
  • ✓ Disable indexes during seeding, rebuild after
  • ✓ Disable triggers temporarily
  • ✓ Use UNLOGGED tables for temporary data
  • ✓ Increase maintenance_work_mem for faster index creation
  • ✓ Use parallel workers for large datasets
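The index and memory items on the checklist combine into a common pattern, sketched here against the example users table (the index name is illustrative):

```sql
-- Give the rebuild step more memory (session-level setting)
SET maintenance_work_mem = '1GB';

-- Drop secondary indexes before the bulk load...
DROP INDEX IF EXISTS users_email_idx;

-- ...load the data with COPY here...

-- ...then rebuild once, which is far cheaper than updating
-- the index incrementally on every inserted row.
CREATE INDEX users_email_idx ON users (email);
```

Primary key and unique indexes should stay in place if you rely on them to catch bad seed data.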

Choosing the Right Approach

Decision Matrix

  • Small, static data (< 100 rows)?
    → Use Manual SQL scripts
  • Using an ORM, small dataset (< 10k rows)?
    → Use ORM seeders
  • Simple schema, need realistic data?
    → Use Faker.js + custom scripts
  • Complex schema, many FKs, large dataset?
    → Use Aphelion or database cloning tools
  • Enterprise, multi-database, huge budget?
    → Use Tonic.ai or MOSTLY AI

Conclusion

Database seeding has evolved significantly. In 2025, you have options ranging from manual SQL scripts to fully automated synthetic data generators.

Our recommendation:

  • For simple projects: Manual SQL or ORM seeders
  • For complex PostgreSQL schemas: Aphelion (automated, $49/year)
  • For multi-database enterprises: Tonic.ai ($20k+/year)

The key is choosing a method that scales with your schema complexity and doesn't require constant maintenance.

Try Automated PostgreSQL Seeding

Generate constraint-safe test data in seconds. No manual scripting required.

Free forever for local development • 1,000 rows per table

Tags: #PostgreSQL #DatabaseSeeding #TestData #BestPractices #DevOps

Related: Faker.js vs Aphelion • Aphelion vs Tonic.ai