Aphelion is a high-performance synthetic data generator built in Rust. It creates realistic, constraint-safe test data for PostgreSQL and MySQL databases, handling complex foreign keys and exotic types like PostGIS and JSONB without manual configuration.

How does Aphelion handle foreign keys?

Aphelion uses a topological sort algorithm to resolve table dependencies automatically. It guarantees 100% foreign key integrity by generating data in the correct order, ensuring that referenced records exist before dependent ones are created.
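
As a rough illustration of what ordering tables by dependency means, here is a minimal sketch of Kahn's algorithm in Python. It is not Aphelion's implementation. The function name `insertion_order`, the dictionary layout, and the sample healthcare schema are assumptions made for this example.

```python
from collections import deque

def insertion_order(foreign_keys):
    """Order tables so every referenced table comes before its dependents.

    foreign_keys maps each table name to the set of tables it references.
    """
    tables = set(foreign_keys)
    for refs in foreign_keys.values():
        tables |= set(refs)
    # Self-references do not affect the order between tables, so drop them.
    pending = {t: {r for r in foreign_keys.get(t, ()) if r != t} for t in tables}
    dependents = {t: set() for t in tables}
    for table, refs in pending.items():
        for ref in refs:
            dependents[ref].add(table)
    # Start with tables that reference nothing (sorted for a stable result).
    ready = deque(sorted(t for t, refs in pending.items() if not refs))
    order = []
    while ready:
        table = ready.popleft()
        order.append(table)
        for child in sorted(dependents[table]):
            pending[child].discard(table)
            if not pending[child]:
                ready.append(child)
    if len(order) != len(tables):
        stuck = sorted(tables - set(order))
        raise ValueError("circular dependency among: " + ", ".join(stuck))
    return order

# Hypothetical schema used only for this example.
schema = {
    "patients": set(),
    "providers": set(),
    "encounters": {"patients", "providers"},
    "observations": {"encounters"},
}
print(insertion_order(schema))
# ['patients', 'providers', 'encounters', 'observations']
```

A plain topological sort fails on cycles, which is why the sketch raises an error when it finds one. Circular references need separate handling.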

Aphelion offers a free version for developers that generates up to 1,000 rows per table. The Pro version, available for $49/year, offers unlimited row generation and advanced features.

Now Powered by Rust (v1.7.8)

by Algomimic

Enterprise Synthetic Data
at 1/400th the Cost

$49/year vs. $20,000+ enterprise platforms

52 PostgreSQL types (30 exotic: ltree, PostGIS, ranges, hstore) + 36+ MySQL types (JSON, spatial, ENUM/SET, partitioning)

80 total exotic types. 84 production-ready tables across 6 industries. Zero foreign key violations.
✅ Production Validated: Magento 2 (401 tables, 191K rows, 0.7% circular deps)

Built for healthcare, fintech, and SaaS teams who refuse to overpay.

Save $19,951/year vs. Tonic.ai

Download for Linux

No Credit Card • 1,000 Rows Per Table

Rust native binary (x64)

Linux x64 supported • macOS & Windows via Docker

curl -L https://algomimic.com/api/download/free -o aphelion && chmod +x aphelion


Works perfectly with your modern stack

Native CLI integration for Docker, CI/CD, and Seed Scripts

Docker

Node.js

PostgreSQL

MySQL

React

GitHub Actions

What is Aphelion?

Aphelion is a high-performance, Rust-native synthetic data generator designed specifically for PostgreSQL and MySQL databases. Unlike other tools that require manual configuration, Aphelion automatically introspects your database schema, resolves complex foreign key dependencies (including circular references), and generates realistic, constraint-safe test data in seconds.

Why use Aphelion?

Speed: Written in Rust, it generates 10,000+ rows/second.
Accuracy: Guarantees 100% foreign key integrity.
Coverage: Supports 80+ exotic types (PostGIS, ltree, JSON).
Compliance: Generates HIPAA and GDPR-safe synthetic data locally.

How it works

Introspect: Connects to your DB to learn the schema.
Plan: Builds a dependency graph to strictly order inserts.
Generate: Creates and inserts data in parallel.
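
The Plan and Generate steps can be pictured as grouping tables into phases, where tables in the same phase do not depend on each other and could be filled in parallel. The Python sketch below shows that idea under stated assumptions: `plan_phases` and the sample schema are hypothetical, and the real planner may group tables differently.

```python
def plan_phases(foreign_keys):
    """Group tables into phases that can each be generated in parallel.

    foreign_keys maps each table to the tables it references. A table joins
    a phase once every table it references was placed in an earlier phase.
    References to tables missing from the mapping are treated as satisfied.
    """
    remaining = {t: {r for r in refs if r != t} for t, refs in foreign_keys.items()}
    phases = []
    while remaining:
        phase = sorted(
            t for t, refs in remaining.items()
            if not any(r in remaining for r in refs)
        )
        if not phase:
            raise ValueError(f"circular dependency among {sorted(remaining)}")
        phases.append(phase)
        for table in phase:
            del remaining[table]
    return phases

# Hypothetical schema, loosely modeled on a small shop database.
schema = {
    "users": [],
    "products": [],
    "orders": ["users"],
    "items": ["orders", "products"],
}
for number, phase in enumerate(plan_phases(schema), start=1):
    print(f"Phase {number}: {', '.join(phase)}")
```

Within a phase the insert order does not matter, so the work can be split across threads without risking a missing parent row.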

Comprehensive Database Coverage

100% exotic type support for both PostgreSQL and MySQL/MariaDB

PostgreSQL

52 Types

Exotic Types Supported:

▸ PostGIS: geometry, geography
▸ Hierarchical: ltree paths
▸ Key-Value: hstore
▸ Network: inet, cidr, macaddr, macaddr8
▸ Ranges: int4range, tsrange, daterange (6 types)
▸ Geometric: point, line, polygon, circle (7 types)
▸ Full-Text: tsvector, tsquery
▸ Advanced: arrays, JSONB, UUID, XML, money
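
To make one of these types concrete, the sketch below builds a random ltree path in Python. It assumes labels made of letters, digits, and underscores joined by dots, and borrows the 5 to 11 level depth quoted elsewhere on this page. The label vocabulary and the name `random_ltree_path` are invented for the example and say nothing about Aphelion's internal generators.

```python
import random
import re

# Hypothetical label vocabulary for illustration.
LABELS = ["org", "region", "dept", "team", "unit", "group", "cell"]

def random_ltree_path(rng, min_depth=5, max_depth=11):
    """Return a dot-separated path whose labels are valid ltree labels."""
    depth = rng.randint(min_depth, max_depth)
    # Suffix each label with its level so siblings at different depths differ.
    return ".".join(f"{rng.choice(LABELS)}_{level}" for level in range(depth))

rng = random.Random(7)  # fixed seed so the same path comes back every run
path = random_ltree_path(rng)
print(path)
```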

Perfect For:

✓ Complex hierarchical data
✓ Geospatial applications
✓ Full-text search
✓ Advanced analytics

MySQL/MariaDB

28 Types

Exotic Types Supported:

▸ JSON: Native + MariaDB (via json_valid)
▸ UUIDs: Auto-detect CHAR(36) & BINARY(16)
▸ Network: IP/MAC heuristics (polyfill)
▸ Spatial: POINT, LINESTRING, POLYGON (8 types)
▸ Coded Values: ENUM, SET
▸ Binary: BIT, BINARY, VARBINARY, BLOB
▸ Scale: PARTITION BY RANGE, GENERATED columns
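
The two UUID storage layouts mentioned above differ only in encoding: CHAR(36) holds the hyphenated text form and BINARY(16) holds the raw bytes. The Python sketch below shows one way a generator could emit the right form per column. The function `encode_uuid` is hypothetical, and the detection rule (an exact match on the column type string) is an assumption, not a description of Aphelion's heuristics.

```python
import random
import uuid

def encode_uuid(value, column_type):
    """Encode a UUID for a CHAR(36) or BINARY(16) column."""
    normalized = column_type.upper().replace(" ", "")
    if normalized == "CHAR(36)":
        return str(value)      # 36-character hyphenated text
    if normalized == "BINARY(16)":
        return value.bytes     # 16 raw bytes
    raise ValueError(f"not a recognized UUID column type: {column_type}")

rng = random.Random(42)  # seeded so the UUID is reproducible
value = uuid.UUID(int=rng.getrandbits(128), version=4)

text = encode_uuid(value, "CHAR(36)")
raw = encode_uuid(value, "binary(16)")
print(len(text), len(raw))  # 36 16
```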

Perfect For:

✓ E-commerce (Magento, WooCommerce)
✓ WordPress/Drupal schemas
✓ Legacy apps with UUIDs in CHAR(36)
✓ MariaDB with JSON constraints

SQL Server

Coming Soon

Oracle

Coming Soon

84 Production-Ready Tables Across 6 Industries

Available for both PostgreSQL and MySQL/MariaDB

🏥 Healthcare (15 tables)
💰 Finance (16 tables)
🛒 E-commerce (17 tables)
🏢 Insurance (11 tables)
📱 Telecom (13 tables)
⚖️ Legal (12 tables)

Trusted by developers at

2.5M+ Rows Generated

500+ Developers

99.9% Constraint Success

Industry Schemas

"Finally, realistic healthcare data without HIPAA violations. The ICD-10 and LOINC generators are perfect."

Sarah J.

Senior Engineer, HealthTech

"Saved us 40+ hours per sprint. CI/CD auto-approve mode is a game-changer for our testing pipeline."

Mike K.

DevOps Lead, FinTech

"The constraint-safe generation is incredible. No more foreign key violations. Just works."

Alex L.

CTO, E-commerce Startup

Compliance-Ready Data Generation

HIPAA-Ready: Synthetic PHI Generation

PCI-DSS Safe: Fake Payment Data

GDPR-Ready: No Real PII

Privacy-First: Runs Locally

Aphelion generates 100% synthetic data to help you maintain compliance. No real patient data, financial records, or personal information is used or required.

Why Developers, Startups & Enterprises Love Us

Built by engineers tired of SQL seeds. Perfect for MVP velocity and Enterprise scale.

Deploy with Confidence

Never break staging again. Our topological dependency graph ensures 100% referential integrity for every insert.

Scale Before You Fail

Simulate massive datasets on your laptop. Test partitioning strategies and query performance against production-scale volume.

Pass Audits Instantly

Compliance comes standard. HIPAA-ready patient records and PCI-safe financial transactions generated without real PII.

Privacy by Default

All data is generated locally in your infrastructure. We never see your schema, we never see your data. Zero external API calls.

Catch Edge Cases

Deterministic seeding guarantees reproducible bugs. Need a user with exactly 3 failed payments? Script it once, reuse forever.

Zero Tech Debt

Stop maintaining fragile SQL scripts. Aphelion auto-introspects schema changes, so your seed data never rots.

View Deep Technical Capabilities

▸ Generated Columns: Auto-computed values based on expressions.
▸ Partitioning: Intelligent data distribution for ranged partitions.
▸ Ltree / HierarchyID: Recursive tree generation (5-11 levels deep).
▸ Spatial Types: PostGIS geometry/geography, MySQL spatial.
▸ Weighted Distributions: Realistic demographic & geographic spread.

▸ Composite Keys: Correctly handles multi-column uniqueness.
▸ Domains & Enums: Respects custom types and constraints.
▸ Circular Dependencies: Automatically resolves FK cycles.
▸ Array Types: Generates realistic array distributions.
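
This page does not say how FK cycles are resolved. One common technique, shown here as an assumption rather than as Aphelion's method, is to find a cycle, pick an edge whose foreign key column is nullable, insert that column as NULL first, and fill it in with an UPDATE once both tables have rows. The Python sketch below covers the detection and edge selection steps. All names and the sample schema are hypothetical.

```python
def find_cycle(foreign_keys):
    """Return one FK cycle as a list of tables, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {t: WHITE for t in foreign_keys}
    stack = []

    def visit(table):
        color[table] = GREY
        stack.append(table)
        for ref in sorted(foreign_keys.get(table, ())):
            state = color.get(ref, BLACK)  # unknown tables have no outgoing FKs
            if state == GREY:
                return stack[stack.index(ref):] + [ref]
            if state == WHITE:
                found = visit(ref)
                if found:
                    return found
        stack.pop()
        color[table] = BLACK
        return None

    for table in sorted(foreign_keys):
        if color[table] == WHITE:
            found = visit(table)
            if found:
                return found
    return None

def deferrable_edge(cycle, nullable_fks):
    """Pick the first (child, parent) edge in the cycle with a nullable FK."""
    for child, parent in zip(cycle, cycle[1:]):
        if (child, parent) in nullable_fks:
            return (child, parent)
    return None

# Hypothetical schema: departments.manager_id -> employees (nullable),
# employees.department_id -> departments (NOT NULL).
schema = {"departments": {"employees"}, "employees": {"departments"}}
cycle = find_cycle(schema)
print(cycle)  # ['departments', 'employees', 'departments']
print(deferrable_edge(cycle, {("departments", "employees")}))
# ('departments', 'employees')
```

With that edge deferred, departments can be inserted with a NULL manager, employees can then reference them, and a final UPDATE assigns managers.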

Built for Your Entire Team

Different goals, one source of truth.

For QA Teams

Spin up realistic test envs in under 10 minutes
Eliminate PII from test data while preserving edge cases
Reproducible bugs with deterministic seeds

For Compliance

HIPAA & PCI-DSS safe by design (no real data used)
Generate audit-ready datasets for penetration testing
Zero risk of data leaks in staging/dev

For Data Science

Version entire synthetic corpora with code
Benchmark drift detection against stable baselines

Industry-Specific Solutions

Pre-built generators for healthcare, finance, e-commerce, and more.

🏥 Healthcare: HIPAA-compliant, HL7 FHIR, clinical terminologies
🛒 E-commerce: Rich content, code snippets, reputation systems
💳 Financial Services: PCI-DSS, SOX compliance, fraud detection
🚀 SaaS & Startups: CI/CD integration, rapid iteration, zero config
📡 Telecommunications: IMSI, IMEI, CDRs, billing, network topology
🛡️ Insurance: P&C policies, claims, actuarial data
⚖️ Legal Tech: Case management, contracts, compliance audits

Works With Your Existing Schema

No configuration needed to start. We introspect your database, detect types, and map them to realistic Faker generators automatically.

Smart Type Detection: Maps `user_email` to `internet.email` automatically
Zero Config Start: Just point it at your DB URL and go
JSON Export: Export layout to JSON for fine-tuning
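
Name-based type detection of this kind can be approximated with a small rule table. The Python sketch below is illustrative only: apart from the `user_email` to `internet.email` mapping quoted above, every pattern and generator name is a hypothetical stand-in written in the same Faker-style naming, and Aphelion's actual rules are not documented here.

```python
import re

# (pattern, generator) pairs checked in order. Only "internet.email" comes
# from this page; the other generator names are assumptions for illustration.
RULES = [
    (re.compile(r"(^|_)e?mail($|_)"), "internet.email"),
    (re.compile(r"(^|_)phone($|_)"), "phone.number"),
    (re.compile(r"(^|_)first_name$"), "person.firstName"),
    (re.compile(r"(^|_)city$"), "location.city"),
]

def guess_generator(column_name, default=None):
    """Return a generator name for a column, or default if nothing matches."""
    name = column_name.lower()
    for pattern, generator in RULES:
        if pattern.search(name):
            return generator
    return default

for column in ["user_email", "billing_city", "notes"]:
    print(column, "->", guess_generator(column))
```

Columns that match no rule return the default, so a caller can fall back to a generator chosen from the column's SQL type.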


➜ ~ aphelion introspect postgres://localhost/myapp

> Connected to database 'myapp'
> Found 14 tables
> Detected 3 circular dependencies
> Generating schema map... Done

➜ ~ aphelion generate --rows 1000 --seed 42

> Generating data plan
> Phase 1: Base tables (users, products)...
> Phase 2: Dependent tables (orders, items)...
> Phase 3: Resolving circular refs...
> Successfully generated 14,000 rows in 1.2s

See Aphelion in Action

Real workflows for real teams.

Workflow: Seed Healthcare Sandbox

# 1. Initialize with Healthcare template (HIPAA ready)

$ aphelion init --template healthcare-fhir

# 2. Point to your sandbox DB

$ export DB_URL="postgres://admin@localhost:5432/sandbox"

# 3. Generate 50k patients with history

$ aphelion generate --rows 50000 --seed 2024

> Generating patients (fhir_patients)... Done
> Generating encounters (linked to patients)... Done
> Generating observations (LOINC codes)... Done
> 0 FK Violations. 0 PII Leaks.

Workflow: Inject Fraud Patterns

# 1. Introspect existing transaction schema

$ aphelion introspect postgres://prod-replica/payments

# 2. Generate data with specific fraud signals

$ aphelion generate --scenario "velocity_attack" --rate 0.02

> Modeling normal transactions (98%)...
> Injecting velocity attacks (2%)...
> Ensuring standard deviation matches prod...
> Dataset ready for model training.

Perfect For

Database Seeding & Cloning: You need to fill a complex Postgres schema with 10M+ rows that respect FKs and constraints.
Integration Testing: You need deterministic data for CI/CD pipelines. Seed 42 always produces the exact same users.
Regulated Industries: We specialize in Healthcare (HIPAA), Finance (PCI), and Telecom schemas.

Not Designed For

ML Model Training: We generate structured relational data, not statistical duplicates of production data distributions for ML.
Unstructured Media: We don't generate synthetic images, video, or long-form generated text/audio.
SaaS Hosting: Aphelion is a CLI tool that runs in your infrastructure. We don't host your data.

Why Aphelion is Different

We fill the gap between hacking together scripts and expensive enterprise platforms.

Vs. Scripts & Libraries

Feature	Aphelion	Faker.js / Seeds	Custom SQL Scripts
Relational Integrity (FKs)	Automated	Manual ID tracking	Complex CTEs needed
Circular Dependencies	Handled	Impossible	⚠️ Very hard to write
Maintenance	Zero (auto-introspects schema)	High (breaks on schema change)	High (rewrite queries on change)

Vs. Enterprise Platforms

Feature	Aphelion	Enterprise AI Platforms (Gretel, MOSTLY AI, Tonic)
Primary Focus	Relational structure (perfect DB seeding & foreign keys)	Statistical similarity (ML model training & privacy)
Developer Experience	CLI native (runs locally, works in CI)	Web UI / SaaS (upload data to cloud)
Postgres Depth	Native support (ltree, hierarchyid, jsonb, ranges)	Generic SQL (often treats everything as tables)
Price	Free / $49/year	$20k+ / year

Start Building Free →

Simple, Transparent Pricing

Start free on your local machine. Scale when your team grows.

Developer (CLI)

$0/forever

Run locally on your laptop. No credit card required.

Unlimited tables & databases
1,000 Rows per table limit
Constraint-safe generation
All industry templates included

Download for Linux (x64)

macOS & Windows coming soon

Team (CI/CD)

$49/year

That's ~0.2% the cost of enterprise tools ($20k+)

For teams automating testing in CI pipelines.

1.5 Million Rows (per seed)
Auto-Approve CI Mode
Priority Email Support

Secure payment via Stripe

🔒 You get realistic data without inheriting production risk.

We never copy, store, hash, or transform real data — we observe structure and generate new data from scratch. All PII is automatically detected and replaced with safe synthetic values.

Scale Transparency: Tested and proven with up to 1.5M rows (100K patients in healthcare demos). Production-ready for datasets up to 250K patients (~3.75M rows) with current configuration. For larger datasets, we offer streaming implementation and direct database loading options.
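
The scale figures above imply a fixed ratio of rows to patients. The short Python check below works through that arithmetic using only the numbers quoted in the paragraph. The helper names are made up for the example.

```python
def rows_per_patient(total_rows, patients):
    """Average number of rows generated per patient."""
    return total_rows / patients

def projected_rows(patients, per_patient):
    """Total rows expected for a given patient count."""
    return patients * per_patient

ratio = rows_per_patient(1_500_000, 100_000)
print(ratio)                            # 15.0
print(projected_rows(250_000, ratio))   # 3750000.0
```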

Frequently Asked Questions

Everything you need to know about Aphelion

How is Aphelion different from Faker.js?

Faker.js generates random data but doesn't understand database constraints. Aphelion introspects your schema to ensure zero foreign key violations, handles circular dependencies, and generates realistic healthcare/finance codes (ICD-10, LOINC, etc.) that Faker.js doesn't support.

Is the generated data truly realistic?

Yes! Aphelion uses weighted distributions and industry-specific generators. Healthcare schemas support ICD-10 codes, LOINC lab tests, and MRN formats. Financial schemas include realistic transaction patterns and account hierarchies. It's designed to mirror production data without the compliance risk. (Advanced generators coming to Rust version)
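
A weighted distribution simply means that some values are drawn more often than others. The Python sketch below samples diagnosis codes with `random.choices`. The codes are real ICD-10 codes, but the weights are invented for illustration and are not prevalence data. Nothing here reflects how Aphelion configures its own generators.

```python
import random
from collections import Counter

# Illustrative weights only; these are not real prevalence figures.
DIAGNOSIS_WEIGHTS = {
    "I10": 50,    # essential hypertension
    "E11.9": 30,  # type 2 diabetes without complications
    "J06.9": 15,  # acute upper respiratory infection, unspecified
    "K21.9": 5,   # gastro-oesophageal reflux disease without oesophagitis
}

def sample_codes(rng, weights, k):
    """Draw k codes, each with probability proportional to its weight."""
    codes = list(weights)
    return rng.choices(codes, weights=[weights[c] for c in codes], k=k)

rng = random.Random(42)  # seeded so the sample is reproducible
counts = Counter(sample_codes(rng, DIAGNOSIS_WEIGHTS, 10_000))
print(counts.most_common())
```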

Can I use this in production?

No. Aphelion is for testing, development, and staging environments only. The data is synthetic and realistic, but not suitable for production use. It's designed to replace production data in non-production environments to maintain HIPAA/PCI-DSS compliance.

What databases are supported?

Currently, Aphelion supports PostgreSQL (including complex features like ltree, JSONB, arrays, and enums), MySQL, MariaDB, and SQLite. Support for SQL Server and Oracle is on the roadmap.

How does deterministic generation work?

Use the --seed flag to generate identical data every time. Same seed = same data. Perfect for reproducible testing, CI/CD pipelines, and debugging. Different team members can generate the exact same dataset.
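
The principle behind seeded generation can be shown in a few lines of Python. This is a sketch, not Aphelion's code: it assumes one random number generator per table, derived from the seed and the table name, so that the data for a table does not depend on the order in which tables are processed. The names `table_rng` and `generate_users` and the sample values are hypothetical.

```python
import random

FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret", "Dennis"]

def table_rng(seed, table):
    """Create an isolated generator for one table.

    Seeding random.Random with a string is deterministic across runs.
    """
    return random.Random(f"{seed}:{table}")

def generate_users(seed, count):
    rng = table_rng(seed, "users")
    return [
        {"id": i + 1, "name": rng.choice(FIRST_NAMES), "age": rng.randint(18, 90)}
        for i in range(count)
    ]

# Same seed, same rows, on every run and every machine.
first_run = generate_users(42, 100)
second_run = generate_users(42, 100)
print(first_run == second_run)  # True
```

Using a local `random.Random` instance instead of the module-level functions keeps other code from disturbing the sequence.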

Which features are in the Rust vs. Node.js version?

The current Rust binary (Linux x64) includes:

✅ All 52 PostgreSQL & 36+ MySQL types
✅ Constraint-safe generation (FK, unique, check, PK)
✅ Topological sorting for correct insertion order
✅ License validation (Free/Pro/Enterprise)
✅ Industry-specific generators (ICD-10, LOINC, financial)
✅ Weighted distributions
✅ Temporal constraint patterns

Do I need to write code or configuration files?

No coding required! Aphelion introspects your database schema automatically. Just point it at your database, and it generates a JSON configuration with smart defaults. You can customize if needed, but it works out of the box.

What's included in the Team tier?

Team ($49/year) includes: unlimited rows (tested up to 1.5M), CI/CD auto-approve mode (no manual confirmations), priority email support, and advanced custom generators. Perfect for teams with automated testing pipelines.

How fast is data generation?

Aphelion generates ~10,000 rows/second on modern hardware. A 100K row dataset typically takes 10-15 seconds. The constraint-safe algorithm adds minimal overhead while ensuring perfect referential integrity.
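
For a quick estimate of run time from these figures, the Python snippet below divides row count by throughput. It uses the ~10,000 rows/second quoted above and, for comparison, the 14,000 rows in 1.2s shown in the sample terminal session on this page. The function name is made up, and real throughput will vary with schema and hardware.

```python
def estimated_seconds(rows, rows_per_second=10_000):
    """Estimate generation time at a steady throughput."""
    return rows / rows_per_second

print(estimated_seconds(100_000))  # 10.0

# Throughput implied by the sample session (14,000 rows in 1.2 seconds).
print(round(14_000 / 1.2))         # 11667
```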

Still have questions?

Contact Sales & Support

Latest from the Blog

Updates, tutorials, and announcements.


New Release v1.6.0

Aphelion v1.6.0: Smart Partitions & Generated Columns

Solving the hardest PostgreSQL data generation challenges: intelligent partition support and automatic generated column handling. Zero config required.

Jan 4, 2026 • 5 min read

HEALTHCARE

Healthcare Data: OMOP & OpenMRS

P&C Insurance Data Model

E-commerce: Stores & Shoppers
