init
This commit is contained in:
87
.opencode/skills/databases/SKILL.md
Normal file
87
.opencode/skills/databases/SKILL.md
Normal file
@@ -0,0 +1,87 @@
|
||||
---
|
||||
name: ck:databases
|
||||
description: Design schemas, write queries for MongoDB and PostgreSQL. Use for database design, SQL/NoSQL queries, aggregation pipelines, indexes, migrations, replication, performance optimization, psql CLI.
|
||||
license: MIT
|
||||
argument-hint: "[query or schema task]"
|
||||
metadata:
|
||||
author: claudekit
|
||||
version: "1.0.0"
|
||||
---
|
||||
|
||||
# Databases Skill
|
||||
|
||||
Unified guide for working with MongoDB (document-oriented) and PostgreSQL (relational) databases. Choose the right database for your use case and master both systems.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use when:
|
||||
- Designing database schemas and data models
|
||||
- Writing queries (SQL or MongoDB query language)
|
||||
- Building aggregation pipelines or complex joins
|
||||
- Optimizing indexes and query performance
|
||||
- Implementing database migrations
|
||||
- Setting up replication, sharding, or clustering
|
||||
- Configuring backups and disaster recovery
|
||||
- Managing database users and permissions
|
||||
- Analyzing slow queries and performance issues
|
||||
- Administering production database deployments
|
||||
|
||||
## Reference Navigation
|
||||
|
||||
### Database Design
|
||||
- **[db-design.md](references/db-design.md)** - Activate when user requests: Database/table design for transactional (OLTP), analytics (OLAP), create or extend schema, design fact/dimension tables, analyze/review CSV/JSON/SQL files to create tables, or need advice on data storage structure.
|
||||
|
||||
### MongoDB References
|
||||
- **[mongodb-crud.md](references/mongodb-crud.md)** - CRUD operations, query operators, atomic updates
|
||||
- **[mongodb-aggregation.md](references/mongodb-aggregation.md)** - Aggregation pipeline, stages, operators, patterns
|
||||
- **[mongodb-indexing.md](references/mongodb-indexing.md)** - Index types, compound indexes, performance optimization
|
||||
- **[mongodb-atlas.md](references/mongodb-atlas.md)** - Atlas cloud setup, clusters, monitoring, search
|
||||
|
||||
### PostgreSQL References
|
||||
- **[postgresql-queries.md](references/postgresql-queries.md)** - SELECT, JOINs, subqueries, CTEs, window functions
|
||||
- **[postgresql-psql-cli.md](references/postgresql-psql-cli.md)** - psql commands, meta-commands, scripting
|
||||
- **[postgresql-performance.md](references/postgresql-performance.md)** - EXPLAIN, query optimization, vacuum, indexes
|
||||
- **[postgresql-administration.md](references/postgresql-administration.md)** - User management, backups, replication, maintenance
|
||||
|
||||
## Python Utilities
|
||||
|
||||
Database utility scripts in `scripts/`:
|
||||
- **db_migrate.py** - Generate and apply migrations for both databases (MongoDB and PostgreSQL)
|
||||
- **db_backup.py** - Backup and restore MongoDB and PostgreSQL
|
||||
- **db_performance_check.py** - Analyze slow queries and recommend indexes
|
||||
|
||||
```bash
|
||||
# Generate migration
|
||||
python scripts/db_migrate.py --db mongodb --generate "add_user_index"
|
||||
|
||||
# Run backup
|
||||
python scripts/db_backup.py --db postgres --output /backups/
|
||||
|
||||
# Check performance
|
||||
python scripts/db_performance_check.py --db mongodb --threshold 100ms
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
**MongoDB:**
|
||||
- Use embedded documents for 1-to-few relationships
|
||||
- Reference documents for 1-to-many or many-to-many
|
||||
- Index frequently queried fields
|
||||
- Use aggregation pipeline for complex transformations
|
||||
- Enable authentication and TLS in production
|
||||
- Use Atlas for managed hosting
|
||||
|
||||
**PostgreSQL:**
|
||||
- Normalize schema to 3NF, denormalize for performance
|
||||
- Use foreign keys for referential integrity
|
||||
- Index foreign keys and frequently filtered columns
|
||||
- Use EXPLAIN ANALYZE to optimize queries
|
||||
- Regular VACUUM and ANALYZE maintenance
|
||||
- Connection pooling (pgBouncer) for web apps
|
||||
|
||||
## Resources
|
||||
|
||||
- MongoDB: https://www.mongodb.com/docs/
|
||||
- PostgreSQL: https://www.postgresql.org/docs/
|
||||
- MongoDB University: https://learn.mongodb.com/
|
||||
- PostgreSQL Tutorial: https://www.postgresqltutorial.com/
|
||||
198
.opencode/skills/databases/analytics.md
Normal file
198
.opencode/skills/databases/analytics.md
Normal file
@@ -0,0 +1,198 @@
|
||||
# Analytics (OLAP) Rules
|
||||
|
||||
> **Note:** Core naming conventions, workflow, and checklist are in `SKILL.md` or `db-design.md` (always loaded).
|
||||
|
||||
Guidelines for designing schemas for statistics and reporting tables.
|
||||
|
||||
---
|
||||
|
||||
## General Principles
|
||||
|
||||
- **Separate** from transactional tables - don't mix analytics logic into business tables
|
||||
- When heavy analytics queries/aggregations repeat → create separate tables
|
||||
- Use **Star Schema**: Fact tables at center, Dimension tables around
|
||||
|
||||
---
|
||||
|
||||
## Design Process
|
||||
|
||||
### 1. Analyze Statistics Requirements
|
||||
|
||||
Ask user to clarify:
|
||||
- **Analysis dimensions**: by date, by customer, by product, by channel, by region?
|
||||
- **Granularity**: per order, per item, per day, per month?
|
||||
- **Metrics**: order_count, revenue, margin, conversion_rate, avg_order_value?
|
||||
|
||||
### 2. Define Fact Granularity
|
||||
|
||||
**Important**: What does 1 row in fact table represent?
|
||||
|
||||
| Fact Table | Granularity | Use case |
|
||||
|------------|-------------|----------|
|
||||
| `fact_orders` | 1 row = 1 order | Statistics by order |
|
||||
| `fact_order_items` | 1 row = 1 order item | Statistics by product |
|
||||
| `fact_daily_sales` | 1 row = 1 day + store | Daily summary |
|
||||
|
||||
### 3. Identify Required Dimensions
|
||||
|
||||
Create separate dim table when:
|
||||
- Reused in multiple places
|
||||
- Has many descriptive attributes
|
||||
- Subject to slow changes (Slowly Changing Dimension)
|
||||
|
||||
---
|
||||
|
||||
## Fact Tables
|
||||
|
||||
### Fact table structure
|
||||
|
||||
```sql
|
||||
CREATE TABLE fact_orders (
|
||||
fact_id BIGINT PRIMARY KEY AUTO_INCREMENT,
|
||||
-- Dimension keys
|
||||
date_key INT NOT NULL, -- FK to dim_date
|
||||
customer_key BIGINT NOT NULL, -- FK to dim_customer
|
||||
store_key INT,
|
||||
channel_key INT,
|
||||
-- Degenerate dimensions (no separate dim needed)
|
||||
order_id BIGINT NOT NULL,
|
||||
order_number VARCHAR(50),
|
||||
-- Measures
|
||||
item_count INT NOT NULL,
|
||||
gross_amount DECIMAL(18,2) NOT NULL,
|
||||
discount_amount DECIMAL(18,2) DEFAULT 0,
|
||||
net_amount DECIMAL(18,2) NOT NULL,
|
||||
|
||||
INDEX idx_fact_orders_date (date_key),
|
||||
INDEX idx_fact_orders_customer (customer_key),
|
||||
INDEX idx_fact_orders_date_store (date_key, store_key)
|
||||
);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Dimension Tables
|
||||
|
||||
### dim_date (required for every analytics schema)
|
||||
|
||||
```sql
|
||||
CREATE TABLE dim_date (
|
||||
date_key INT PRIMARY KEY, -- Format: YYYYMMDD (20241215)
|
||||
full_date DATE NOT NULL,
|
||||
year INT NOT NULL,
|
||||
quarter INT NOT NULL, -- 1-4
|
||||
month INT NOT NULL, -- 1-12
|
||||
month_name VARCHAR(20), -- 'January', 'February'
|
||||
week_of_year INT NOT NULL,
|
||||
day_of_month INT NOT NULL,
|
||||
day_of_week INT NOT NULL, -- 1=Monday, 7=Sunday
|
||||
day_name VARCHAR(20),
|
||||
is_weekend BOOLEAN NOT NULL,
|
||||
is_holiday BOOLEAN DEFAULT FALSE,
|
||||
|
||||
UNIQUE (full_date)
|
||||
);
|
||||
-- Pre-populate for multiple years (2020-2030)
|
||||
```
|
||||
|
||||
### dim_customer
|
||||
|
||||
```sql
|
||||
CREATE TABLE dim_customer (
|
||||
customer_key BIGINT PRIMARY KEY AUTO_INCREMENT, -- Surrogate key
|
||||
customer_id BIGINT NOT NULL, -- Natural key from users
|
||||
customer_name VARCHAR(255),
|
||||
email VARCHAR(255),
|
||||
segment VARCHAR(50), -- 'VIP', 'Regular', 'New'
|
||||
city VARCHAR(100),
|
||||
region VARCHAR(100),
|
||||
first_order_date DATE,
|
||||
-- SCD Type 2 columns (if history needed)
|
||||
effective_from DATE NOT NULL,
|
||||
effective_to DATE,
|
||||
is_current BOOLEAN DEFAULT TRUE,
|
||||
|
||||
INDEX idx_dim_customer_id (customer_id),
|
||||
INDEX idx_dim_customer_current (is_current, customer_id)
|
||||
);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Summary Tables (Pre-aggregated)
|
||||
|
||||
When pre-aggregation needed for dashboard performance:
|
||||
|
||||
```sql
|
||||
CREATE TABLE summary_daily_sales (
|
||||
id BIGINT PRIMARY KEY AUTO_INCREMENT,
|
||||
date_key INT NOT NULL,
|
||||
store_key INT,
|
||||
channel_key INT,
|
||||
-- Pre-aggregated measures
|
||||
order_count INT NOT NULL,
|
||||
item_count INT NOT NULL,
|
||||
gross_revenue DECIMAL(18,2) NOT NULL,
|
||||
net_revenue DECIMAL(18,2) NOT NULL,
|
||||
unique_customers INT NOT NULL,
|
||||
avg_order_value DECIMAL(18,2),
|
||||
|
||||
UNIQUE (date_key, store_key, channel_key),
|
||||
INDEX idx_summary_date (date_key)
|
||||
);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Slowly Changing Dimensions (SCD)
|
||||
|
||||
### Type 1 - Overwrite
|
||||
Overwrite old value, no history kept:
|
||||
```sql
|
||||
UPDATE dim_customer SET segment = 'VIP' WHERE customer_id = 123;
|
||||
```
|
||||
|
||||
### Type 2 - Add new row (Recommended when history needed)
|
||||
```sql
|
||||
-- 1. Close old row
|
||||
UPDATE dim_customer
|
||||
SET effective_to = CURRENT_DATE - 1, is_current = FALSE
|
||||
WHERE customer_id = 123 AND is_current = TRUE;
|
||||
|
||||
-- 2. Add new row
|
||||
INSERT INTO dim_customer (customer_id, segment, effective_from, is_current)
|
||||
VALUES (123, 'VIP', CURRENT_DATE, TRUE);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Indexing for Analytics
|
||||
|
||||
### Fact tables
|
||||
- Index FKs to dimensions: `date_key`, `customer_key`, `product_key`
|
||||
- Composite index based on query patterns: `INDEX (date_key, store_key)`
|
||||
|
||||
### Dimension tables
|
||||
- PK: surrogate key
|
||||
- Index natural key: `customer_id`, `product_id`
|
||||
- Index for SCD: `(is_current, customer_id)`
|
||||
|
||||
---
|
||||
|
||||
## Naming Convention
|
||||
|
||||
- Fact tables: `fact_*` or `fct_*`
|
||||
- Dimension tables: `dim_*`
|
||||
- Summary tables: `summary_*` or `agg_*`
|
||||
|
||||
---
|
||||
|
||||
## Checklist
|
||||
|
||||
- [ ] Granularity defined for each fact table
|
||||
- [ ] dim_date exists or created (pre-populate multiple years)
|
||||
- [ ] Surrogate keys for dimensions
|
||||
- [ ] Index FKs in fact tables
|
||||
- [ ] SCD strategy for changing dimensions (Type 1 or Type 2)
|
||||
- [ ] Naming: `fact_*`, `dim_*`, `summary_*`
|
||||
- [ ] Refresh strategy: see [incremental-etl.md](incremental-etl.md)
|
||||
188
.opencode/skills/databases/db-design.md
Normal file
188
.opencode/skills/databases/db-design.md
Normal file
@@ -0,0 +1,188 @@
|
||||
---
|
||||
name: db-design
|
||||
description: "Activate when user requests: Database/table design for transactional (OLTP), analytics (OLAP), create or extend schema, design fact/dimension tables, analyze/review CSV/JSON/SQL files to create tables, or need advice on data storage structure."
|
||||
---
|
||||
|
||||
# SKILL: Database Schema Design
|
||||
|
||||
## Execution Process
|
||||
|
||||
### Step 1: Analyze Context (MANDATORY)
|
||||
|
||||
Before ANY design work, AI MUST check:
|
||||
|
||||
- [ ] Database stack identified? (MySQL / PostgreSQL / BigQuery / SQLite / D1)
|
||||
- [ ] Existing schema explored via tools/MCP?
|
||||
- [ ] Use case clear? (transactional vs analytics vs ETL)
|
||||
|
||||
**If ANY checkbox is unclear → STOP and ask user before proceeding.**
|
||||
|
||||
### Step 2: Gather Requirements
|
||||
|
||||
**MUST ask if missing:**
|
||||
|
||||
| Missing Info | Required Question |
|
||||
|--------------|-------------------|
|
||||
| Entities unclear | "What are the main entities involved in this feature?" |
|
||||
| Query patterns unknown | "What queries/reports will run most often?" |
|
||||
| Granularity unclear | "What level of detail needed? (per order, per item, per day)" |
|
||||
| Constraints unknown | "Any uniqueness, retention, or multi-tenant requirements?" |
|
||||
| Relationships unclear | "How do these entities relate to each other?" |
|
||||
|
||||
**DO NOT assume or proceed without answers. DO NOT design schema without understanding the business context.**
|
||||
|
||||
### Step 3: Load Documentation by Intent
|
||||
|
||||
| Intent | Keywords | Load file |
|
||||
|--------|----------|-----------|
|
||||
| **OLTP** | "business tables", "transactional", "CRUD", "orders", "users" | [transactional.md](transactional.md) |
|
||||
| **OLAP** | "analytics", "statistics", "reports", "fact", "dimension" | [analytics.md](analytics.md) |
|
||||
| **ETL** | "incremental", "load", "sync", "watermark", "etl" | [incremental-etl.md](incremental-etl.md) |
|
||||
|
||||
**Database stacks:** [MySQL](stacks/mysql.md) · [PostgreSQL](stacks/postgres.md) · [BigQuery](stacks/bigquery.md) · [SQLite](stacks/sqlite.md) · [D1 Cloudflare](stacks/d1_cloudflare.md)
|
||||
|
||||
### Step 4: Design & Propose
|
||||
|
||||
Only after Steps 1-3 complete:
|
||||
- Propose DDL with reasoning
|
||||
- **Wait for user approval before execution**
|
||||
- Never auto-execute DDL
|
||||
|
||||
### Step 5: Verify Checklist
|
||||
|
||||
Before delivering final DDL, verify the **Final Checklist** at the end of this file.
|
||||
|
||||
---
|
||||
|
||||
## Scope & Objectives
|
||||
|
||||
Design or extend database schemas ensuring:
|
||||
|
||||
- Consistent, understandable, maintainable data structures
|
||||
- Good performance with appropriate indexes
|
||||
- Design that fits existing tables (avoid duplication, avoid breaking logic)
|
||||
|
||||
---
|
||||
|
||||
## Naming Conventions
|
||||
|
||||
### Table naming
|
||||
|
||||
- Use **snake_case**, English, plural for main entities:
|
||||
- `users`, `orders`, `order_items`, `product_variants`
|
||||
- Junction/mapping tables:
|
||||
- `role_permissions`, `user_roles`, `product_categories`
|
||||
- Fact & dimension tables (analytics):
|
||||
- Fact: `fact_orders`, `fact_order_items`, `fact_pageviews`
|
||||
- Dimension: `dim_date`, `dim_customer`, `dim_product`, `dim_store`
|
||||
- Avoid unclear abbreviations:
|
||||
- ❌ `usr`, `ord`, `inv_tx`
|
||||
- ✅ `users`, `orders`, `inventory_transactions`
|
||||
|
||||
### Column naming
|
||||
|
||||
- Use **snake_case**, clear meaning, consistent across tables
|
||||
- **Primary key**: `id` (main PK of table)
|
||||
- **Foreign key**: `xxx_id` pointing to table `xxx`:
|
||||
- `user_id` → `users.id`
|
||||
- `order_id` → `orders.id`
|
||||
- **Audit & lifecycle**:
|
||||
- `created_at`, `updated_at`, `deleted_at` (soft delete)
|
||||
- **Boolean flags**:
|
||||
- `is_active`, `is_deleted`, `is_primary`, `is_verified`
|
||||
- **Quantities & prices**:
|
||||
- `quantity`, `unit_price`, `total_price`, `discount_amount`, `tax_amount`
|
||||
- **Time ranges**:
|
||||
- `start_at`, `end_at` or `valid_from`, `valid_to` (pick one convention and use consistently)
|
||||
- **Enum/state**:
|
||||
- `status` (combine with ENUM / CHECK constraint)
|
||||
|
||||
---
|
||||
|
||||
## Workflow
|
||||
|
||||
### 1. Understand business requirements
|
||||
|
||||
- Summarize requirements in your own words
|
||||
- If missing context, ask:
|
||||
- What entities are involved?
|
||||
- What is the main business flow?
|
||||
- What queries/reports are commonly used?
|
||||
|
||||
### 2. Explore existing schema (MANDATORY before proposing)
|
||||
|
||||
- Use tools/MCP to list tables and describe structure
|
||||
- For DBs with comments (MySQL/PostgreSQL/BigQuery): read comments directly
|
||||
- For SQLite/D1: query `metadata_tables` and `metadata_columns` tables
|
||||
|
||||
### 3. Analyze and propose draft
|
||||
|
||||
- Which tables already partially cover requirements?
|
||||
- Which tables need additional columns/indexes?
|
||||
- Which tables are completely missing (need to create)?
|
||||
|
||||
### 4. Confirm with user before finalizing design
|
||||
|
||||
- Required granularity
|
||||
- Metrics to calculate
|
||||
- Important business constraints (uniqueness, retention, privacy)
|
||||
|
||||
### 5. Design detailed schema
|
||||
|
||||
- Provide proposed DDL - **do NOT auto-execute**
|
||||
- Wait for user approval before execution
|
||||
|
||||
---
|
||||
|
||||
## Processing Data Files (CSV/JSON/SQL)
|
||||
|
||||
When user provides files:
|
||||
|
||||
1. **Read and identify structure**: headers, data types, relationships
|
||||
2. **Large files**: write Python script to parse and analyze
|
||||
3. **SQL files**: review naming, indexes, constraints, propose improvements
|
||||
4. **Output**: Proposed DDL with reasoning
|
||||
|
||||
---
|
||||
|
||||
## Output Format
|
||||
|
||||
When delivering design results:
|
||||
|
||||
1. **Summary**: list of new tables / tables to modify
|
||||
2. **Complete DDL SQL**: columns, types, PK, FK, indexes, comments
|
||||
3. **Reasoning**: explain why this structure/index was chosen, tied to actual queries
|
||||
|
||||
---
|
||||
|
||||
## Final Checklist
|
||||
|
||||
Before delivering final DDL:
|
||||
|
||||
- [ ] Explored existing schema via tools/MCP
|
||||
- [ ] No duplicate tables with existing schema
|
||||
- [ ] All FKs have indexes
|
||||
- [ ] Composite indexes match main query patterns
|
||||
- [ ] Naming convention consistent with existing schema
|
||||
- [ ] Added comments/descriptions for tables and columns
|
||||
- [ ] For SQLite/D1: prepared INSERTs for metadata tables
|
||||
- [ ] Explained reasoning for important design decisions
|
||||
|
||||
---
|
||||
|
||||
## File Structure
|
||||
|
||||
```
|
||||
.opencode/skills/databases/
|
||||
├── SKILL.md # Skill description (entry + execution process + checklist)
|
||||
├── db-design.md # This file
|
||||
├── transactional.md # OLTP design rules
|
||||
├── analytics.md # OLAP / Star Schema rules
|
||||
├── incremental-etl.md # ETL & watermark patterns
|
||||
└── stacks/
|
||||
├── mysql.md # MySQL-specific rules
|
||||
├── postgres.md # PostgreSQL-specific rules
|
||||
├── bigquery.md # BigQuery-specific rules
|
||||
├── sqlite.md # SQLite-specific rules
|
||||
└── d1_cloudflare.md # D1 Cloudflare-specific rules
|
||||
```
|
||||
213
.opencode/skills/databases/incremental-etl.md
Normal file
213
.opencode/skills/databases/incremental-etl.md
Normal file
@@ -0,0 +1,213 @@
|
||||
# Incremental Loading & ETL Logs
|
||||
|
||||
> **Note:** Core naming conventions, workflow, and checklist are in `SKILL.md` or `db-design.md` (always loaded).
|
||||
|
||||
Guidelines for designing schemas to support periodic data loading with tracking and control.
|
||||
|
||||
---
|
||||
|
||||
## 1. Source Tables Requirements
|
||||
|
||||
All source tables that need to support incremental loading must have:
|
||||
|
||||
```sql
|
||||
-- Required audit columns
|
||||
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
|
||||
|
||||
-- Index for incremental query
|
||||
CREATE INDEX idx_tablename_updated_at ON tablename(updated_at);
|
||||
```
|
||||
|
||||
### Soft delete support
|
||||
|
||||
If source has soft delete, need to track deleted records too:
|
||||
|
||||
```sql
|
||||
deleted_at TIMESTAMP NULL
|
||||
|
||||
-- Index to get all modified records (update or delete)
|
||||
CREATE INDEX idx_tablename_modified ON tablename(updated_at);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 2. ETL Control Tables
|
||||
|
||||
### etl_jobs - Job definitions
|
||||
|
||||
```sql
|
||||
CREATE TABLE etl_jobs (
|
||||
job_id VARCHAR(100) PRIMARY KEY,
|
||||
job_name VARCHAR(255) NOT NULL,
|
||||
description TEXT,
|
||||
source_table VARCHAR(255),
|
||||
target_table VARCHAR(255),
|
||||
load_type VARCHAR(50) NOT NULL, -- 'full', 'incremental', 'cdc'
|
||||
schedule VARCHAR(100), -- Cron: '0 */1 * * *'
|
||||
is_active BOOLEAN DEFAULT TRUE,
|
||||
timeout_seconds INT DEFAULT 3600,
|
||||
retry_count INT DEFAULT 3,
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
|
||||
|
||||
INDEX idx_etl_jobs_active (is_active)
|
||||
);
|
||||
```
|
||||
|
||||
### etl_job_runs - Execution tracking
|
||||
|
||||
```sql
|
||||
CREATE TABLE etl_job_runs (
|
||||
run_id BIGINT PRIMARY KEY AUTO_INCREMENT,
|
||||
job_id VARCHAR(100) NOT NULL,
|
||||
run_status VARCHAR(50) NOT NULL, -- 'running', 'success', 'failed', 'cancelled'
|
||||
started_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
|
||||
finished_at TIMESTAMP NULL,
|
||||
duration_seconds INT,
|
||||
-- Metrics
|
||||
rows_read BIGINT DEFAULT 0,
|
||||
rows_written BIGINT DEFAULT 0,
|
||||
rows_updated BIGINT DEFAULT 0,
|
||||
rows_deleted BIGINT DEFAULT 0,
|
||||
-- Watermarks
|
||||
watermark_start TIMESTAMP NULL,
|
||||
watermark_end TIMESTAMP NULL,
|
||||
-- Error info
|
||||
error_message TEXT,
|
||||
error_details JSON,
|
||||
-- Metadata
|
||||
triggered_by VARCHAR(100), -- 'scheduler', 'manual', 'api'
|
||||
|
||||
INDEX idx_etl_runs_job (job_id, started_at DESC),
|
||||
INDEX idx_etl_runs_status (run_status, started_at),
|
||||
FOREIGN KEY (job_id) REFERENCES etl_jobs(job_id)
|
||||
);
|
||||
```
|
||||
|
||||
### etl_watermarks - Checkpoint tracking
|
||||
|
||||
```sql
|
||||
CREATE TABLE etl_watermarks (
|
||||
job_id VARCHAR(100) NOT NULL,
|
||||
source_table VARCHAR(255) NOT NULL,
|
||||
watermark_column VARCHAR(100) NOT NULL, -- 'updated_at', 'id'
|
||||
watermark_type VARCHAR(50) NOT NULL, -- 'timestamp', 'integer'
|
||||
last_value_timestamp TIMESTAMP NULL,
|
||||
last_value_integer BIGINT NULL,
|
||||
last_successful_run_id BIGINT,
|
||||
last_updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
|
||||
|
||||
PRIMARY KEY (job_id, source_table),
|
||||
FOREIGN KEY (job_id) REFERENCES etl_jobs(job_id)
|
||||
);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Incremental Load Patterns
|
||||
|
||||
### Timestamp-based (most common)
|
||||
|
||||
```sql
|
||||
-- Query incremental data from source
|
||||
SELECT * FROM orders
|
||||
WHERE updated_at > :last_watermark
|
||||
AND updated_at <= :current_watermark
|
||||
ORDER BY updated_at;
|
||||
|
||||
-- Update watermark AFTER successful load
|
||||
UPDATE etl_watermarks
|
||||
SET last_value_timestamp = :current_watermark,
|
||||
last_successful_run_id = :run_id,
|
||||
last_updated_at = CURRENT_TIMESTAMP
|
||||
WHERE job_id = :job_id AND source_table = 'orders';
|
||||
```
|
||||
|
||||
### ID-based (for append-only tables)
|
||||
|
||||
```sql
|
||||
-- Query incremental data
|
||||
SELECT * FROM events
|
||||
WHERE id > :last_id
|
||||
ORDER BY id
|
||||
LIMIT 10000;
|
||||
|
||||
-- Update watermark
|
||||
UPDATE etl_watermarks
|
||||
SET last_value_integer = :max_id_loaded,
|
||||
last_successful_run_id = :run_id
|
||||
WHERE job_id = :job_id;
|
||||
```
|
||||
|
||||
### Combined (timestamp + ID)
|
||||
|
||||
```sql
|
||||
SELECT * FROM orders
|
||||
WHERE (updated_at > :last_timestamp)
|
||||
OR (updated_at = :last_timestamp AND id > :last_id)
|
||||
ORDER BY updated_at, id
|
||||
LIMIT 10000;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Example Queries
|
||||
|
||||
### Get latest job status
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
j.job_id, j.job_name, j.load_type,
|
||||
r.run_status, r.started_at, r.finished_at,
|
||||
r.rows_written, r.error_message
|
||||
FROM etl_jobs j
|
||||
LEFT JOIN (
|
||||
SELECT * FROM etl_job_runs r1
|
||||
WHERE started_at = (
|
||||
SELECT MAX(started_at) FROM etl_job_runs r2
|
||||
WHERE r2.job_id = r1.job_id
|
||||
)
|
||||
) r ON j.job_id = r.job_id
|
||||
WHERE j.is_active = TRUE;
|
||||
```
|
||||
|
||||
### Jobs failed in last 24 hours
|
||||
|
||||
```sql
|
||||
SELECT j.job_name, r.run_id, r.started_at, r.error_message
|
||||
FROM etl_job_runs r
|
||||
JOIN etl_jobs j ON r.job_id = j.job_id
|
||||
WHERE r.run_status = 'failed'
|
||||
AND r.started_at >= NOW() - INTERVAL 24 HOUR;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Best Practices
|
||||
|
||||
### Watermark management
|
||||
- Commit watermark **AFTER** data has been successfully loaded
|
||||
- Use transactions to ensure consistency
|
||||
- Store both watermark_start and watermark_end for each run
|
||||
|
||||
### Error handling
|
||||
- Log enough context for debugging: source_record_id, source_data, error_stack
|
||||
- Classify errors: retryable vs non-retryable
|
||||
- Set up alerts for critical failures
|
||||
|
||||
### Retention policy
|
||||
- Keep detailed logs: 30-90 days
|
||||
- Keep summary: 1-2 years
|
||||
- Keep watermarks: indefinitely
|
||||
|
||||
---
|
||||
|
||||
## Checklist
|
||||
|
||||
- [ ] Source tables have `updated_at` with index
|
||||
- [ ] ETL control tables created
|
||||
- [ ] Watermark committed AFTER successful load
|
||||
- [ ] Sufficient logging for debugging: rows, timestamps, errors
|
||||
- [ ] Retention policy for ETL logs (30-90 days)
|
||||
- [ ] Alerts for failed jobs
|
||||
447
.opencode/skills/databases/references/mongodb-aggregation.md
Normal file
447
.opencode/skills/databases/references/mongodb-aggregation.md
Normal file
@@ -0,0 +1,447 @@
|
||||
# MongoDB Aggregation Pipeline
|
||||
|
||||
Aggregation pipeline for complex data transformations, analytics, and multi-stage processing.
|
||||
|
||||
## Pipeline Concept
|
||||
|
||||
Aggregation processes documents through multiple stages. Each stage transforms documents and passes results to next stage.
|
||||
|
||||
```javascript
|
||||
db.collection.aggregate([
|
||||
{ /* Stage 1 */ },
|
||||
{ /* Stage 2 */ },
|
||||
{ /* Stage 3 */ }
|
||||
])
|
||||
```
|
||||
|
||||
## Core Pipeline Stages
|
||||
|
||||
### $match (Filter Documents)
|
||||
```javascript
|
||||
// Filter early in pipeline for efficiency
|
||||
db.orders.aggregate([
|
||||
{ $match: { status: "completed", total: { $gte: 100 } } },
|
||||
// Subsequent stages process only matched documents
|
||||
])
|
||||
|
||||
// Multiple conditions
|
||||
db.orders.aggregate([
|
||||
{ $match: {
|
||||
$and: [
|
||||
{ orderDate: { $gte: startDate } },
|
||||
{ status: { $in: ["completed", "shipped"] } }
|
||||
]
|
||||
}}
|
||||
])
|
||||
```
|
||||
|
||||
### $project (Reshape Documents)
|
||||
```javascript
|
||||
// Select and reshape fields
|
||||
db.orders.aggregate([
|
||||
{ $project: {
|
||||
orderNumber: 1,
|
||||
total: 1,
|
||||
customerName: "$customer.name",
|
||||
year: { $year: "$orderDate" },
|
||||
_id: 0 // Exclude _id
|
||||
}}
|
||||
])
|
||||
|
||||
// Computed fields
|
||||
db.orders.aggregate([
|
||||
{ $project: {
|
||||
total: 1,
|
||||
tax: { $multiply: ["$total", 0.1] },
|
||||
grandTotal: { $add: ["$total", { $multiply: ["$total", 0.1] }] }
|
||||
}}
|
||||
])
|
||||
```
|
||||
|
||||
### $group (Aggregate Data)
|
||||
```javascript
|
||||
// Group and count
|
||||
db.orders.aggregate([
|
||||
{ $group: {
|
||||
_id: "$status",
|
||||
count: { $sum: 1 }
|
||||
}}
|
||||
])
|
||||
|
||||
// Multiple aggregations
|
||||
db.orders.aggregate([
|
||||
{ $group: {
|
||||
_id: "$customerId",
|
||||
totalSpent: { $sum: "$total" },
|
||||
orderCount: { $sum: 1 },
|
||||
avgOrderValue: { $avg: "$total" },
|
||||
maxOrder: { $max: "$total" },
|
||||
minOrder: { $min: "$total" }
|
||||
}}
|
||||
])
|
||||
|
||||
// Group by multiple fields
|
||||
db.sales.aggregate([
|
||||
{ $group: {
|
||||
_id: {
|
||||
year: { $year: "$date" },
|
||||
month: { $month: "$date" },
|
||||
product: "$productId"
|
||||
},
|
||||
revenue: { $sum: "$amount" }
|
||||
}}
|
||||
])
|
||||
```
|
||||
|
||||
### $sort (Order Results)
|
||||
```javascript
|
||||
// Sort by field
|
||||
db.orders.aggregate([
|
||||
{ $sort: { total: -1 } } // -1: descending, 1: ascending
|
||||
])
|
||||
|
||||
// Sort by multiple fields
|
||||
db.orders.aggregate([
|
||||
{ $sort: { status: 1, orderDate: -1 } }
|
||||
])
|
||||
```
|
||||
|
||||
### $limit / $skip (Pagination)
|
||||
```javascript
|
||||
// Limit results
|
||||
db.orders.aggregate([
|
||||
{ $sort: { orderDate: -1 } },
|
||||
{ $limit: 10 }
|
||||
])
|
||||
|
||||
// Pagination
|
||||
const page = 2;
|
||||
const pageSize = 20;
|
||||
db.orders.aggregate([
|
||||
{ $sort: { orderDate: -1 } },
|
||||
{ $skip: (page - 1) * pageSize },
|
||||
{ $limit: pageSize }
|
||||
])
|
||||
```
|
||||
|
||||
### $lookup (Join Collections)
|
||||
```javascript
|
||||
// Simple join
|
||||
db.orders.aggregate([
|
||||
{ $lookup: {
|
||||
from: "customers",
|
||||
localField: "customerId",
|
||||
foreignField: "_id",
|
||||
as: "customer"
|
||||
}},
|
||||
{ $unwind: "$customer" } // Convert array to object
|
||||
])
|
||||
|
||||
// Pipeline join (more powerful)
|
||||
db.orders.aggregate([
|
||||
{ $lookup: {
|
||||
from: "products",
|
||||
let: { items: "$items" },
|
||||
pipeline: [
|
||||
{ $match: { $expr: { $in: ["$_id", "$$items.productId"] } } },
|
||||
{ $project: { name: 1, price: 1 } }
|
||||
],
|
||||
as: "productDetails"
|
||||
}}
|
||||
])
|
||||
```
|
||||
|
||||
### $unwind (Deconstruct Arrays)
|
||||
```javascript
|
||||
// Unwind array field
|
||||
db.orders.aggregate([
|
||||
{ $unwind: "$items" }
|
||||
])
|
||||
|
||||
// Preserve null/empty arrays
|
||||
db.orders.aggregate([
|
||||
{ $unwind: {
|
||||
path: "$items",
|
||||
preserveNullAndEmptyArrays: true
|
||||
}}
|
||||
])
|
||||
|
||||
// Include array index
|
||||
db.orders.aggregate([
|
||||
{ $unwind: {
|
||||
path: "$items",
|
||||
includeArrayIndex: "itemIndex"
|
||||
}}
|
||||
])
|
||||
```
|
||||
|
||||
### $addFields (Add New Fields)
|
||||
```javascript
|
||||
// Add computed fields
|
||||
db.orders.aggregate([
|
||||
{ $addFields: {
|
||||
totalWithTax: { $multiply: ["$total", 1.1] },
|
||||
year: { $year: "$orderDate" }
|
||||
}}
|
||||
])
|
||||
```
|
||||
|
||||
### $replaceRoot (Replace Document Root)
|
||||
```javascript
|
||||
// Promote subdocument to root
|
||||
db.orders.aggregate([
|
||||
{ $replaceRoot: { newRoot: "$customer" } }
|
||||
])
|
||||
|
||||
// Merge fields
|
||||
db.orders.aggregate([
|
||||
{ $replaceRoot: {
|
||||
newRoot: { $mergeObjects: ["$customer", { orderId: "$_id" }] }
|
||||
}}
|
||||
])
|
||||
```
|
||||
|
||||
## Aggregation Operators
|
||||
|
||||
### Arithmetic Operators
|
||||
```javascript
|
||||
// Basic math
|
||||
db.products.aggregate([
|
||||
{ $project: {
|
||||
name: 1,
|
||||
profit: { $subtract: ["$price", "$cost"] },
|
||||
margin: { $multiply: [
|
||||
{ $divide: [
|
||||
{ $subtract: ["$price", "$cost"] },
|
||||
"$price"
|
||||
]},
|
||||
100
|
||||
]}
|
||||
}}
|
||||
])
|
||||
|
||||
// Other operators: $add, $multiply, $divide, $mod, $abs, $ceil, $floor, $round
|
||||
```
|
||||
|
||||
### String Operators
|
||||
```javascript
|
||||
// String manipulation
|
||||
db.users.aggregate([
|
||||
{ $project: {
|
||||
fullName: { $concat: ["$firstName", " ", "$lastName"] },
|
||||
email: { $toLower: "$email" },
|
||||
initials: { $concat: [
|
||||
{ $substr: ["$firstName", 0, 1] },
|
||||
{ $substr: ["$lastName", 0, 1] }
|
||||
]}
|
||||
}}
|
||||
])
|
||||
|
||||
// Other: $toUpper, $trim, $split, $substr, $regexMatch
|
||||
```
|
||||
|
||||
### Date Operators
|
||||
```javascript
|
||||
// Date extraction
|
||||
db.events.aggregate([
|
||||
{ $project: {
|
||||
event: 1,
|
||||
year: { $year: "$timestamp" },
|
||||
month: { $month: "$timestamp" },
|
||||
day: { $dayOfMonth: "$timestamp" },
|
||||
hour: { $hour: "$timestamp" },
|
||||
dayOfWeek: { $dayOfWeek: "$timestamp" }
|
||||
}}
|
||||
])
|
||||
|
||||
// Date math
|
||||
db.events.aggregate([
|
||||
{ $project: {
|
||||
event: 1,
|
||||
expiresAt: { $add: ["$createdAt", 1000 * 60 * 60 * 24 * 30] }, // +30 days
|
||||
ageInDays: { $divide: [
|
||||
{ $subtract: [new Date(), "$createdAt"] },
|
||||
1000 * 60 * 60 * 24
|
||||
]}
|
||||
}}
|
||||
])
|
||||
```
|
||||
|
||||
### Array Operators
|
||||
```javascript
|
||||
// Array operations
|
||||
db.posts.aggregate([
|
||||
{ $project: {
|
||||
title: 1,
|
||||
tagCount: { $size: "$tags" },
|
||||
firstTag: { $arrayElemAt: ["$tags", 0] },
|
||||
lastTag: { $arrayElemAt: ["$tags", -1] },
|
||||
hasMongoDBTag: { $in: ["mongodb", "$tags"] }
|
||||
}}
|
||||
])
|
||||
|
||||
// Array filtering
|
||||
db.posts.aggregate([
|
||||
{ $project: {
|
||||
title: 1,
|
||||
activeTags: {
|
||||
$filter: {
|
||||
input: "$tags",
|
||||
as: "tag",
|
||||
cond: { $ne: ["$$tag.status", "deprecated"] }
|
||||
}
|
||||
}
|
||||
}}
|
||||
])
|
||||
```
|
||||
|
||||
### Conditional Operators
|
||||
```javascript
|
||||
// $cond (ternary)
|
||||
db.products.aggregate([
|
||||
{ $project: {
|
||||
name: 1,
|
||||
status: {
|
||||
$cond: {
|
||||
if: { $gte: ["$stock", 10] },
|
||||
then: "In Stock",
|
||||
else: "Low Stock"
|
||||
}
|
||||
}
|
||||
}}
|
||||
])
|
||||
|
||||
// $switch (multiple conditions)
|
||||
db.orders.aggregate([
|
||||
{ $project: {
|
||||
status: 1,
|
||||
priority: {
|
||||
$switch: {
|
||||
branches: [
|
||||
{ case: { $gte: ["$total", 1000] }, then: "High" },
|
||||
{ case: { $gte: ["$total", 100] }, then: "Medium" }
|
||||
],
|
||||
default: "Low"
|
||||
}
|
||||
}
|
||||
}}
|
||||
])
|
||||
```
|
||||
|
||||
## Advanced Patterns
|
||||
|
||||
### Time-Based Aggregation
|
||||
```javascript
|
||||
// Daily sales
|
||||
db.orders.aggregate([
|
||||
{ $match: { orderDate: { $gte: startDate } } },
|
||||
{ $group: {
|
||||
_id: {
|
||||
year: { $year: "$orderDate" },
|
||||
month: { $month: "$orderDate" },
|
||||
day: { $dayOfMonth: "$orderDate" }
|
||||
},
|
||||
revenue: { $sum: "$total" },
|
||||
orderCount: { $sum: 1 }
|
||||
}},
|
||||
{ $sort: { "_id.year": 1, "_id.month": 1, "_id.day": 1 } }
|
||||
])
|
||||
```
|
||||
|
||||
### Faceted Search
|
||||
```javascript
|
||||
// Multiple aggregations in one query
|
||||
db.products.aggregate([
|
||||
{ $match: { category: "electronics" } },
|
||||
{ $facet: {
|
||||
priceRanges: [
|
||||
{ $bucket: {
|
||||
groupBy: "$price",
|
||||
boundaries: [0, 100, 500, 1000, 5000],
|
||||
default: "5000+",
|
||||
output: { count: { $sum: 1 } }
|
||||
}}
|
||||
],
|
||||
topBrands: [
|
||||
{ $group: { _id: "$brand", count: { $sum: 1 } } },
|
||||
{ $sort: { count: -1 } },
|
||||
{ $limit: 5 }
|
||||
],
|
||||
avgPrice: [
|
||||
{ $group: { _id: null, avg: { $avg: "$price" } } }
|
||||
]
|
||||
}}
|
||||
])
|
||||
```
|
||||
|
||||
### Window Functions
|
||||
```javascript
|
||||
// Running totals and moving averages
|
||||
db.sales.aggregate([
|
||||
{ $setWindowFields: {
|
||||
partitionBy: "$region",
|
||||
sortBy: { date: 1 },
|
||||
output: {
|
||||
runningTotal: {
|
||||
$sum: "$amount",
|
||||
window: { documents: ["unbounded", "current"] }
|
||||
},
|
||||
movingAvg: {
|
||||
$avg: "$amount",
|
||||
window: { documents: [-7, 0] } // Last 7 days
|
||||
}
|
||||
}
|
||||
}}
|
||||
])
|
||||
```
|
||||
|
||||
### Text Search with Aggregation
|
||||
```javascript
|
||||
// Full-text search (requires text index)
|
||||
db.articles.aggregate([
|
||||
{ $match: { $text: { $search: "mongodb database" } } },
|
||||
{ $addFields: { score: { $meta: "textScore" } } },
|
||||
{ $sort: { score: -1 } },
|
||||
{ $limit: 10 }
|
||||
])
|
||||
```
|
||||
|
||||
### Geospatial Aggregation
|
||||
```javascript
|
||||
// Find nearby locations
|
||||
db.places.aggregate([
|
||||
{ $geoNear: {
|
||||
near: { type: "Point", coordinates: [lon, lat] },
|
||||
distanceField: "distance",
|
||||
maxDistance: 5000,
|
||||
spherical: true
|
||||
}},
|
||||
{ $limit: 10 }
|
||||
])
|
||||
```
|
||||
|
||||
## Performance Tips
|
||||
|
||||
1. **$match early** - Filter documents before other stages
|
||||
2. **$project early** - Reduce document size
|
||||
3. **Index usage** - $match and $sort can use indexes (only at start)
|
||||
4. **$limit after $sort** - Reduce memory usage
|
||||
5. **Avoid $lookup** - Prefer embedded documents when possible
|
||||
6. **Use $facet sparingly** - Can be memory intensive
|
||||
7. **allowDiskUse** - Enable for large datasets
|
||||
```javascript
|
||||
db.collection.aggregate(pipeline, { allowDiskUse: true })
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Order stages efficiently** - $match → $project → $group → $sort → $limit
|
||||
2. **Use $expr carefully** - Can prevent index usage
|
||||
3. **Monitor memory** - Default limit: 100MB per stage
|
||||
4. **Test with explain** - Analyze pipeline performance
|
||||
```javascript
|
||||
db.collection.explain("executionStats").aggregate(pipeline)
|
||||
```
|
||||
5. **Break complex pipelines** - Use $out/$merge for intermediate results
|
||||
6. **Use $sample** - For random document selection
|
||||
7. **Leverage $addFields** - Cleaner than $project for adding fields
|
||||
465
.opencode/skills/databases/references/mongodb-atlas.md
Normal file
465
.opencode/skills/databases/references/mongodb-atlas.md
Normal file
@@ -0,0 +1,465 @@
|
||||
# MongoDB Atlas Cloud Platform
|
||||
|
||||
MongoDB Atlas is fully-managed cloud database service with automated backups, monitoring, and scaling.
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Create Free Cluster
|
||||
1. Sign up at mongodb.com/atlas
|
||||
2. Create organization and project
|
||||
3. Build cluster (M0 Free Tier)
|
||||
- Cloud provider: AWS/GCP/Azure
|
||||
- Region: closest to users
|
||||
- Cluster name
|
||||
4. Create database user (username/password)
|
||||
5. Whitelist IP address (or 0.0.0.0/0 for development)
|
||||
6. Get connection string
|
||||
|
||||
### Connection String Format
|
||||
```
|
||||
mongodb+srv://username:password@cluster.mongodb.net/database?retryWrites=true&w=majority
|
||||
```
|
||||
|
||||
### Connect
|
||||
```javascript
|
||||
// Node.js
|
||||
const { MongoClient } = require("mongodb");
|
||||
const uri = "mongodb+srv://...";
|
||||
const client = new MongoClient(uri);
|
||||
|
||||
await client.connect();
|
||||
const db = client.db("myDatabase");
|
||||
```
|
||||
|
||||
```python
|
||||
# Python
|
||||
from pymongo import MongoClient
|
||||
uri = "mongodb+srv://..."
|
||||
client = MongoClient(uri)
|
||||
db = client.myDatabase
|
||||
```
|
||||
|
||||
## Cluster Tiers
|
||||
|
||||
### M0 (Free Tier)
|
||||
- 512 MB storage
|
||||
- Shared CPU/RAM
|
||||
- Perfect for development/learning
|
||||
- Limited to 100 connections
|
||||
- No backups
|
||||
|
||||
### M10+ (Dedicated Clusters)
|
||||
- Dedicated resources
|
||||
- 2GB - 4TB+ storage
|
||||
- Automated backups
|
||||
- Advanced monitoring
|
||||
- Performance Advisor
|
||||
- Multi-region support
|
||||
- VPC peering
|
||||
|
||||
### Serverless
|
||||
- Pay per operation
|
||||
- Auto-scales to zero
|
||||
- Good for sporadic workloads
|
||||
- 1GB+ storage
|
||||
- Limited features (no full-text search)
|
||||
|
||||
## Database Configuration
|
||||
|
||||
### Create Database
|
||||
```javascript
|
||||
// Via Atlas UI: Database → Add Database
|
||||
// Via shell
|
||||
use myNewDatabase
|
||||
db.createCollection("myCollection")
|
||||
|
||||
// Via driver
|
||||
const db = client.db("myNewDatabase");
|
||||
await db.createCollection("myCollection");
|
||||
```
|
||||
|
||||
### Schema Validation
|
||||
```javascript
|
||||
// Set validation rules in Atlas UI or via shell
|
||||
db.createCollection("users", {
|
||||
validator: {
|
||||
$jsonSchema: {
|
||||
bsonType: "object",
|
||||
required: ["email", "name"],
|
||||
properties: {
|
||||
email: { bsonType: "string", pattern: "^.+@.+$" },
|
||||
age: { bsonType: "int", minimum: 0 }
|
||||
}
|
||||
}
|
||||
}
|
||||
})
|
||||
```
|
||||
|
||||
## Security
|
||||
|
||||
### Network Access
|
||||
```javascript
|
||||
// IP Whitelist (Atlas UI → Network Access)
|
||||
// - Add IP Address: specific IPs
|
||||
// - 0.0.0.0/0: allow from anywhere (dev only)
|
||||
// - VPC Peering: private connection
|
||||
|
||||
// Connection string includes options
|
||||
mongodb+srv://cluster.mongodb.net/?retryWrites=true&w=majority&ssl=true
|
||||
```
|
||||
|
||||
### Database Users
|
||||
```javascript
|
||||
// Create via Atlas UI → Database Access
|
||||
// - Username/password authentication
|
||||
// - AWS IAM authentication
|
||||
// - X.509 certificates
|
||||
|
||||
// Roles:
|
||||
// - atlasAdmin: full access
|
||||
// - readWriteAnyDatabase: read/write all databases
|
||||
// - readAnyDatabase: read-only all databases
|
||||
// - read/readWrite: database-specific
|
||||
```
|
||||
|
||||
### Encryption
|
||||
```javascript
|
||||
// Encryption at rest (automatic on M10+)
|
||||
// Encryption in transit (TLS/SSL, always enabled)
|
||||
|
||||
// Client-Side Field Level Encryption (CSFLE)
|
||||
const autoEncryptionOpts = {
|
||||
keyVaultNamespace: "encryption.__keyVault",
|
||||
kmsProviders: {
|
||||
aws: {
|
||||
accessKeyId: process.env.AWS_ACCESS_KEY_ID,
|
||||
secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
const client = new MongoClient(uri, { autoEncryption: autoEncryptionOpts });
|
||||
```
|
||||
|
||||
## Backups and Snapshots
|
||||
|
||||
### Cloud Backups (M10+)
|
||||
```javascript
|
||||
// Automatic continuous backups
|
||||
// - Snapshots every 6-24 hours
|
||||
// - Oplog for point-in-time recovery
|
||||
// - Retention: 2+ days configurable
|
||||
|
||||
// Restore via Atlas UI:
|
||||
// 1. Clusters → cluster name → Backup tab
|
||||
// 2. Select snapshot or point in time
|
||||
// 3. Download or restore to cluster
|
||||
```
|
||||
|
||||
### Manual Backups
|
||||
```bash
|
||||
# Export using mongodump
|
||||
mongodump --uri="mongodb+srv://user:pass@cluster.mongodb.net/mydb" --out=/backup
|
||||
|
||||
# Restore using mongorestore
|
||||
mongorestore --uri="mongodb+srv://..." /backup/mydb
|
||||
```
|
||||
|
||||
## Monitoring and Alerts
|
||||
|
||||
### Metrics Dashboard
|
||||
```javascript
|
||||
// Atlas UI → Metrics
|
||||
// Key metrics:
|
||||
// - Operations per second
|
||||
// - Query execution times
|
||||
// - Connections
|
||||
// - Network I/O
|
||||
// - Disk usage
|
||||
// - CPU utilization
|
||||
|
||||
// Real-time Performance panel
|
||||
// - Current operations
|
||||
// - Slow queries
|
||||
// - Index suggestions
|
||||
```
|
||||
|
||||
### Alerts
|
||||
```javascript
|
||||
// Configure via Atlas UI → Alerts
|
||||
// Alert types:
|
||||
// - High connections (> threshold)
|
||||
// - High CPU usage (> 80%)
|
||||
// - Disk usage (> 90%)
|
||||
// - Replication lag
|
||||
// - Backup failures
|
||||
|
||||
// Notification channels:
|
||||
// - Email
|
||||
// - SMS
|
||||
// - Slack
|
||||
// - PagerDuty
|
||||
// - Webhook
|
||||
```
|
||||
|
||||
### Performance Advisor
|
||||
```javascript
|
||||
// Automatic index recommendations
|
||||
// Atlas UI → Performance Advisor
|
||||
|
||||
// Analyzes:
|
||||
// - Slow queries
|
||||
// - Missing indexes
|
||||
// - Redundant indexes
|
||||
// - Index usage statistics
|
||||
|
||||
// Provides:
|
||||
// - Index creation commands
|
||||
// - Expected performance improvement
|
||||
// - Schema design suggestions
|
||||
```
|
||||
|
||||
## Atlas Search (Full-Text Search)
|
||||
|
||||
### Create Search Index
|
||||
```javascript
|
||||
// Atlas UI → Search → Create Index
|
||||
|
||||
// JSON definition
|
||||
{
|
||||
"mappings": {
|
||||
"dynamic": false,
|
||||
"fields": {
|
||||
"title": {
|
||||
"type": "string",
|
||||
"analyzer": "lucene.standard"
|
||||
},
|
||||
"description": {
|
||||
"type": "string",
|
||||
"analyzer": "lucene.english"
|
||||
},
|
||||
"tags": {
|
||||
"type": "string"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Search Queries
|
||||
```javascript
|
||||
// Aggregation pipeline with $search
|
||||
db.articles.aggregate([
|
||||
{
|
||||
$search: {
|
||||
text: {
|
||||
query: "mongodb database tutorial",
|
||||
path: ["title", "description"],
|
||||
fuzzy: { maxEdits: 1 }
|
||||
}
|
||||
}
|
||||
},
|
||||
{ $limit: 10 },
|
||||
{
|
||||
$project: {
|
||||
title: 1,
|
||||
description: 1,
|
||||
score: { $meta: "searchScore" }
|
||||
}
|
||||
}
|
||||
])
|
||||
|
||||
// Autocomplete
|
||||
db.articles.aggregate([
|
||||
{
|
||||
$search: {
|
||||
autocomplete: {
|
||||
query: "mong",
|
||||
path: "title",
|
||||
tokenOrder: "sequential"
|
||||
}
|
||||
}
|
||||
}
|
||||
])
|
||||
```
|
||||
|
||||
## Atlas Vector Search (AI/ML)
|
||||
|
||||
### Create Vector Search Index
|
||||
```javascript
|
||||
// For AI similarity search (embeddings)
|
||||
{
|
||||
"fields": [
|
||||
{
|
||||
"type": "vector",
|
||||
"path": "embedding",
|
||||
"numDimensions": 1536, // OpenAI embeddings
|
||||
"similarity": "cosine"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Vector Search Query
|
||||
```javascript
|
||||
// Search by similarity
|
||||
db.products.aggregate([
|
||||
{
|
||||
$vectorSearch: {
|
||||
index: "vector_index",
|
||||
path: "embedding",
|
||||
queryVector: [0.123, 0.456, ...], // 1536 dimensions
|
||||
numCandidates: 100,
|
||||
limit: 10
|
||||
}
|
||||
},
|
||||
{
|
||||
$project: {
|
||||
name: 1,
|
||||
description: 1,
|
||||
score: { $meta: "vectorSearchScore" }
|
||||
}
|
||||
}
|
||||
])
|
||||
```
|
||||
|
||||
## Data Federation
|
||||
|
||||
### Query Across Sources
|
||||
```javascript
|
||||
// Federated database instance
|
||||
// Query data from:
|
||||
// - Atlas clusters
|
||||
// - AWS S3
|
||||
// - HTTP endpoints
|
||||
|
||||
// Create virtual collection
|
||||
{
|
||||
"databases": [{
|
||||
"name": "federated",
|
||||
"collections": [{
|
||||
"name": "sales",
|
||||
"dataSources": [{
|
||||
"storeName": "s3Store",
|
||||
"path": "/sales/*.json"
|
||||
}]
|
||||
}]
|
||||
}]
|
||||
}
|
||||
|
||||
// Query like normal collection
|
||||
use federated
|
||||
db.sales.find({ region: "US" })
|
||||
```
|
||||
|
||||
## Atlas Charts (Embedded Analytics)
|
||||
|
||||
### Create Dashboard
|
||||
```javascript
|
||||
// Atlas UI → Charts → New Dashboard
|
||||
// Data source: Atlas cluster
|
||||
// Chart types: bar, line, pie, scatter, etc.
|
||||
|
||||
// Embed in application
|
||||
<iframe
|
||||
src="https://charts.mongodb.com/charts-project/embed/charts?id=..."
|
||||
width="800"
|
||||
height="600"
|
||||
/>
|
||||
```
|
||||
|
||||
## Atlas CLI
|
||||
|
||||
```bash
|
||||
# Install
|
||||
npm install -g mongodb-atlas-cli
|
||||
|
||||
# Login
|
||||
atlas auth login
|
||||
|
||||
# List clusters
|
||||
atlas clusters list
|
||||
|
||||
# Create cluster
|
||||
atlas clusters create myCluster --provider AWS --region US_EAST_1 --tier M10
|
||||
|
||||
# Manage users
|
||||
atlas dbusers create --username myuser --password mypass
|
||||
|
||||
# Backups
|
||||
atlas backups snapshots list --clusterName myCluster
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Use connection pooling** - Reuse connections
|
||||
```javascript
|
||||
const client = new MongoClient(uri, {
|
||||
maxPoolSize: 50,
|
||||
minPoolSize: 10
|
||||
});
|
||||
```
|
||||
|
||||
2. **Enable authentication** - Always use database users, not Atlas users
|
||||
|
||||
3. **Restrict network access** - IP whitelist or VPC peering
|
||||
|
||||
4. **Monitor regularly** - Set up alerts for key metrics
|
||||
|
||||
5. **Index optimization** - Use Performance Advisor recommendations
|
||||
|
||||
6. **Backup verification** - Regularly test restores
|
||||
|
||||
7. **Right-size clusters** - Start small, scale as needed
|
||||
|
||||
8. **Multi-region** - For global applications (M10+)
|
||||
|
||||
9. **Read preferences** - Use secondaries for read-heavy workloads
|
||||
```javascript
|
||||
const client = new MongoClient(uri, {
|
||||
readPreference: "secondaryPreferred"
|
||||
});
|
||||
```
|
||||
|
||||
10. **Connection string security** - Use environment variables
|
||||
```javascript
|
||||
const uri = process.env.MONGODB_URI;
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Connection Issues
|
||||
```javascript
|
||||
// Check IP whitelist
|
||||
// Verify credentials
|
||||
// Test connection string
|
||||
|
||||
// Verbose logging
|
||||
const client = new MongoClient(uri, {
|
||||
serverSelectionTimeoutMS: 5000,
|
||||
loggerLevel: "debug"
|
||||
});
|
||||
```
|
||||
|
||||
### Performance Issues
|
||||
```javascript
|
||||
// Check Performance Advisor
|
||||
// Review slow query logs
|
||||
// Analyze index usage
|
||||
db.collection.aggregate([{ $indexStats: {} }])
|
||||
|
||||
// Check connection count
|
||||
db.serverStatus().connections
|
||||
```
|
||||
|
||||
### Common Errors
|
||||
```javascript
|
||||
// MongoNetworkError: IP not whitelisted
|
||||
// → Add IP to Network Access
|
||||
|
||||
// Authentication failed: wrong credentials
|
||||
// → Verify username/password in Database Access
|
||||
|
||||
// Timeout: connection string or network issue
|
||||
// → Check connection string format, DNS resolution
|
||||
```
|
||||
408
.opencode/skills/databases/references/mongodb-crud.md
Normal file
408
.opencode/skills/databases/references/mongodb-crud.md
Normal file
@@ -0,0 +1,408 @@
|
||||
# MongoDB CRUD Operations
|
||||
|
||||
CRUD operations (Create, Read, Update, Delete) in MongoDB with query operators and atomic updates.
|
||||
|
||||
## Create Operations
|
||||
|
||||
### insertOne
|
||||
```javascript
|
||||
// Insert single document
|
||||
db.users.insertOne({
|
||||
name: "Alice",
|
||||
email: "alice@example.com",
|
||||
age: 30,
|
||||
createdAt: new Date()
|
||||
})
|
||||
|
||||
// Returns: { acknowledged: true, insertedId: ObjectId("...") }
|
||||
```
|
||||
|
||||
### insertMany
|
||||
```javascript
|
||||
// Insert multiple documents
|
||||
db.users.insertMany([
|
||||
{ name: "Bob", age: 25 },
|
||||
{ name: "Charlie", age: 35 },
|
||||
{ name: "Diana", age: 28 }
|
||||
])
|
||||
|
||||
// With ordered: false (continue on error)
|
||||
db.users.insertMany(docs, { ordered: false })
|
||||
```
|
||||
|
||||
## Read Operations
|
||||
|
||||
### find
|
||||
```javascript
|
||||
// Find all documents
|
||||
db.users.find()
|
||||
|
||||
// Find with filter
|
||||
db.users.find({ age: { $gte: 18 } })
|
||||
|
||||
// Projection (select fields)
|
||||
db.users.find({ status: "active" }, { name: 1, email: 1, _id: 0 })
|
||||
|
||||
// Cursor operations
|
||||
db.users.find()
|
||||
.sort({ createdAt: -1 })
|
||||
.limit(10)
|
||||
.skip(20)
|
||||
```
|
||||
|
||||
### findOne
|
||||
```javascript
|
||||
// Get single document
|
||||
db.users.findOne({ email: "alice@example.com" })
|
||||
|
||||
// With projection
|
||||
db.users.findOne({ _id: ObjectId("...") }, { name: 1, email: 1 })
|
||||
```
|
||||
|
||||
### count/estimatedDocumentCount
|
||||
```javascript
|
||||
// Count matching documents
|
||||
db.users.countDocuments({ status: "active" })
|
||||
|
||||
// Fast estimate (uses metadata)
|
||||
db.users.estimatedDocumentCount()
|
||||
```
|
||||
|
||||
### distinct
|
||||
```javascript
|
||||
// Get unique values
|
||||
db.users.distinct("status")
|
||||
db.users.distinct("city", { country: "USA" })
|
||||
```
|
||||
|
||||
## Update Operations
|
||||
|
||||
### updateOne
|
||||
```javascript
|
||||
// Update first matching document
|
||||
db.users.updateOne(
|
||||
{ email: "alice@example.com" },
|
||||
{ $set: { status: "verified" } }
|
||||
)
|
||||
|
||||
// Upsert (insert if not exists)
|
||||
db.users.updateOne(
|
||||
{ email: "new@example.com" },
|
||||
{ $set: { name: "New User" } },
|
||||
{ upsert: true }
|
||||
)
|
||||
```
|
||||
|
||||
### updateMany
|
||||
```javascript
|
||||
// Update all matching documents
|
||||
db.users.updateMany(
|
||||
{ lastLogin: { $lt: cutoffDate } },
|
||||
{ $set: { status: "inactive" } }
|
||||
)
|
||||
|
||||
// Multiple updates
|
||||
db.users.updateMany(
|
||||
{ status: "pending" },
|
||||
{
|
||||
$set: { status: "active" },
|
||||
$currentDate: { updatedAt: true }
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
### replaceOne
|
||||
```javascript
|
||||
// Replace entire document (except _id)
|
||||
db.users.replaceOne(
|
||||
{ _id: ObjectId("...") },
|
||||
{ name: "Alice", email: "alice@example.com", age: 31 }
|
||||
)
|
||||
```
|
||||
|
||||
## Delete Operations
|
||||
|
||||
### deleteOne
|
||||
```javascript
|
||||
// Delete first matching document
|
||||
db.users.deleteOne({ email: "alice@example.com" })
|
||||
```
|
||||
|
||||
### deleteMany
|
||||
```javascript
|
||||
// Delete all matching documents
|
||||
db.users.deleteMany({ status: "deleted" })
|
||||
|
||||
// Delete all documents in collection
|
||||
db.users.deleteMany({})
|
||||
```
|
||||
|
||||
## Query Operators
|
||||
|
||||
### Comparison Operators
|
||||
```javascript
|
||||
// $eq (equals)
|
||||
db.users.find({ age: { $eq: 30 } })
|
||||
db.users.find({ age: 30 }) // Implicit $eq
|
||||
|
||||
// $ne (not equals)
|
||||
db.users.find({ status: { $ne: "deleted" } })
|
||||
|
||||
// $gt, $gte, $lt, $lte
|
||||
db.users.find({ age: { $gt: 18, $lte: 65 } })
|
||||
|
||||
// $in (in array)
|
||||
db.users.find({ status: { $in: ["active", "pending"] } })
|
||||
|
||||
// $nin (not in array)
|
||||
db.users.find({ status: { $nin: ["deleted", "banned"] } })
|
||||
```
|
||||
|
||||
### Logical Operators
|
||||
```javascript
|
||||
// $and (implicit for multiple conditions)
|
||||
db.users.find({ age: { $gte: 18 }, status: "active" })
|
||||
|
||||
// $and (explicit)
|
||||
db.users.find({
|
||||
$and: [
|
||||
{ age: { $gte: 18 } },
|
||||
{ status: "active" }
|
||||
]
|
||||
})
|
||||
|
||||
// $or
|
||||
db.users.find({
|
||||
$or: [
|
||||
{ status: "active" },
|
||||
{ verified: true }
|
||||
]
|
||||
})
|
||||
|
||||
// $not
|
||||
db.users.find({ age: { $not: { $lt: 18 } } })
|
||||
|
||||
// $nor (not any condition)
|
||||
db.users.find({
|
||||
$nor: [
|
||||
{ status: "deleted" },
|
||||
{ status: "banned" }
|
||||
]
|
||||
})
|
||||
```
|
||||
|
||||
### Element Operators
|
||||
```javascript
|
||||
// $exists
|
||||
db.users.find({ phoneNumber: { $exists: true } })
|
||||
db.users.find({ deletedAt: { $exists: false } })
|
||||
|
||||
// $type
|
||||
db.users.find({ age: { $type: "int" } })
|
||||
db.users.find({ age: { $type: ["int", "double"] } })
|
||||
```
|
||||
|
||||
### Array Operators
|
||||
```javascript
|
||||
// $all (contains all elements)
|
||||
db.posts.find({ tags: { $all: ["mongodb", "database"] } })
|
||||
|
||||
// $elemMatch (array element matches all conditions)
|
||||
db.products.find({
|
||||
reviews: {
|
||||
$elemMatch: { rating: { $gte: 4 }, verified: true }
|
||||
}
|
||||
})
|
||||
|
||||
// $size (array length)
|
||||
db.posts.find({ tags: { $size: 3 } })
|
||||
```
|
||||
|
||||
### String Operators
|
||||
```javascript
|
||||
// $regex (regular expression)
|
||||
db.users.find({ name: { $regex: /^A/i } })
|
||||
db.users.find({ email: { $regex: "@example\\.com$" } })
|
||||
|
||||
// Text search (requires text index)
|
||||
db.articles.find({ $text: { $search: "mongodb database" } })
|
||||
```
|
||||
|
||||
## Update Operators

### Field Update Operators

```javascript
// $set (set field value)
db.users.updateOne(
  { _id: userId },
  { $set: { status: "active", updatedAt: new Date() } }
)

// $unset (remove field)
db.users.updateOne(
  { _id: userId },
  { $unset: { tempField: "" } }
)

// $rename (rename field)
db.users.updateMany(
  {},
  { $rename: { "oldName": "newName" } }
)

// $currentDate (set to current date)
db.users.updateOne(
  { _id: userId },
  { $currentDate: { lastModified: true } }
)
```

### Numeric Update Operators

```javascript
// $inc (increment)
db.posts.updateOne(
  { _id: postId },
  { $inc: { views: 1, likes: 5 } }
)

// $mul (multiply)
db.products.updateOne(
  { _id: productId },
  { $mul: { price: 1.1 } } // 10% increase
)

// $min (update if new value is less)
db.scores.updateOne(
  { _id: scoreId },
  { $min: { lowestScore: 50 } }
)

// $max (update if new value is greater)
db.scores.updateOne(
  { _id: scoreId },
  { $max: { highestScore: 100 } }
)
```

### Array Update Operators

```javascript
// $push (add to array)
db.posts.updateOne(
  { _id: postId },
  { $push: { comments: { author: "Alice", text: "Great!" } } }
)

// $push with $each (multiple elements)
db.posts.updateOne(
  { _id: postId },
  { $push: { tags: { $each: ["mongodb", "database"] } } }
)

// $addToSet (add if not exists)
db.users.updateOne(
  { _id: userId },
  { $addToSet: { interests: "coding" } }
)

// $pull (remove matching elements)
db.users.updateOne(
  { _id: userId },
  { $pull: { tags: "deprecated" } }
)

// $pop (remove first/last element)
db.users.updateOne(
  { _id: userId },
  { $pop: { notifications: -1 } } // -1: first, 1: last
)

// $ (update first matching array element)
db.posts.updateOne(
  { _id: postId, "comments.author": "Alice" },
  { $set: { "comments.$.text": "Updated comment" } }
)

// $[] (update all array elements)
db.posts.updateOne(
  { _id: postId },
  { $set: { "comments.$[].verified": true } }
)

// $[<identifier>] (filtered positional)
db.posts.updateOne(
  { _id: postId },
  { $set: { "comments.$[elem].flagged": true } },
  { arrayFilters: [{ "elem.rating": { $lt: 2 } }] }
)
```

## Atomic Operations

### findAndModify / findOneAndUpdate

```javascript
// Find and update (returns old document by default)
db.users.findOneAndUpdate(
  { email: "alice@example.com" },
  { $set: { status: "active" } }
)

// Return new document
db.users.findOneAndUpdate(
  { email: "alice@example.com" },
  { $set: { status: "active" } },
  { returnNewDocument: true }
)

// Upsert and return new
db.counters.findOneAndUpdate(
  { _id: "sequence" },
  { $inc: { value: 1 } },
  { upsert: true, returnNewDocument: true }
)
```

### findOneAndReplace

```javascript
// Find and replace entire document
db.users.findOneAndReplace(
  { _id: ObjectId("...") },
  { name: "Alice", email: "alice@example.com" },
  { returnNewDocument: true }
)
```

### findOneAndDelete

```javascript
// Find and delete (returns deleted document)
const deletedUser = db.users.findOneAndDelete(
  { email: "alice@example.com" }
)
```

## Bulk Operations

```javascript
// Ordered bulk write (stops on first error)
db.users.bulkWrite([
  { insertOne: { document: { name: "Alice" } } },
  { updateOne: {
    filter: { name: "Bob" },
    update: { $set: { age: 25 } }
  }},
  { deleteOne: { filter: { name: "Charlie" } } }
])

// Unordered (continues on errors)
db.users.bulkWrite(operations, { ordered: false })
```

## Best Practices

1. **Use projection** to return only needed fields
2. **Create indexes** on frequently queried fields
3. **Use updateMany carefully** - it can affect many documents at once
4. **Use upsert** for "create or update" patterns
5. **Use atomic operators** ($inc, $push) for concurrent updates (see the sketch below)
6. **Avoid large arrays** in documents (embed vs reference)
7. **Use findOneAndUpdate / findAndModify** for atomic read-modify-write
8. **Batch operations** with insertMany/bulkWrite for efficiency

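A minimal sketch combining practices 4, 5, and 8: an upsert-based counter plus a batched write. The `pageViews` collection and its field names are assumptions for illustration, not part of any schema defined in this skill.

```javascript
// Upsert + atomic $inc: create the counter document on first hit, increment afterwards
db.pageViews.updateOne(
  { page: "/pricing", day: "2024-01-15" },                              // assumed key fields
  { $inc: { hits: 1 }, $setOnInsert: { createdAt: new Date() } },
  { upsert: true }
)

// Batch several independent upserts into a single round trip
db.pageViews.bulkWrite([
  { updateOne: { filter: { page: "/docs", day: "2024-01-15" }, update: { $inc: { hits: 1 } }, upsert: true } },
  { updateOne: { filter: { page: "/blog", day: "2024-01-15" }, update: { $inc: { hits: 1 } }, upsert: true } }
], { ordered: false })
```
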
442
.opencode/skills/databases/references/mongodb-indexing.md
Normal file
@@ -0,0 +1,442 @@

# MongoDB Indexing and Performance

Index types, strategies, and performance optimization techniques for MongoDB.

## Index Fundamentals

Indexes improve query performance by allowing MongoDB to scan fewer documents. Without indexes, MongoDB performs collection scans (reads every document).

```javascript
// Check if query uses index
db.users.find({ email: "user@example.com" }).explain("executionStats")

// Key metrics:
// - executionTimeMillis: query duration
// - totalDocsExamined: documents scanned
// - nReturned: documents returned
// - stage: IXSCAN (index) vs COLLSCAN (full scan)
```

## Index Types

### Single Field Index
```javascript
// Create index on single field
db.users.createIndex({ email: 1 }) // 1: ascending, -1: descending

// Use case: queries filtering by email
db.users.find({ email: "user@example.com" })

// Drop index
db.users.dropIndex({ email: 1 })
db.users.dropIndex("email_1") // By name
```

### Compound Index
```javascript
// Index on multiple fields (order matters!)
db.orders.createIndex({ status: 1, createdAt: -1 })

// Supports queries on:
// 1. { status: "..." }
// 2. { status: "...", createdAt: ... }
// Does NOT efficiently support: { createdAt: ... } alone

// Left-to-right prefix rule
db.orders.createIndex({ a: 1, b: 1, c: 1 })
// Supports: {a}, {a,b}, {a,b,c}
// Not: {b}, {c}, {b,c}
```

### Text Index (Full-Text Search)
```javascript
// Create text index
db.articles.createIndex({ title: "text", body: "text" })

// Only one text index is allowed per collection, so include every searchable field
// in that single index (with optional weights)
db.articles.createIndex({
  title: "text",
  body: "text",
  tags: "text"
}, {
  weights: {
    title: 10, // Title matches weighted higher
    body: 5,
    tags: 3
  }
})

// Search
db.articles.find({ $text: { $search: "mongodb database" } })

// Search with score
db.articles.find(
  { $text: { $search: "mongodb" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })
```

### Geospatial Indexes
```javascript
// 2dsphere index (spherical geometry)
db.places.createIndex({ location: "2dsphere" })

// Document format
db.places.insertOne({
  name: "Coffee Shop",
  location: {
    type: "Point",
    coordinates: [-73.97, 40.77] // [longitude, latitude]
  }
})

// Find nearby
db.places.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [-73.97, 40.77] },
      $maxDistance: 5000 // meters
    }
  }
})

// Within polygon
db.places.find({
  location: {
    $geoWithin: {
      $geometry: {
        type: "Polygon",
        coordinates: [[
          [lon1, lat1], [lon2, lat2], [lon3, lat3], [lon1, lat1]
        ]]
      }
    }
  }
})
```

### Wildcard Index
```javascript
// Index all fields in subdocuments
db.products.createIndex({ "attributes.$**": 1 })

// Supports queries on any nested field
db.products.find({ "attributes.color": "red" })
db.products.find({ "attributes.size": "large" })

// Specific paths only
db.products.createIndex(
  { "$**": 1 },
  { wildcardProjection: { "attributes.color": 1, "attributes.size": 1 } }
)
```

### Hashed Index
```javascript
// Hashed index (for even distribution in sharding)
db.users.createIndex({ userId: "hashed" })

// Use case: shard key
sh.shardCollection("mydb.users", { userId: "hashed" })
```

### TTL Index (Auto-Expiration)
```javascript
// Delete documents after specified time
db.sessions.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 3600 } // 1 hour
)

// Documents automatically deleted after createdAt + 3600 seconds
// Background task runs every 60 seconds
```

### Partial Index
```javascript
// Index only documents matching filter
db.orders.createIndex(
  { customerId: 1 },
  { partialFilterExpression: { status: "active" } }
)

// Index only used when query includes filter
db.orders.find({ customerId: "123", status: "active" }) // Uses index
db.orders.find({ customerId: "123" }) // Does not use index
```

### Unique Index
```javascript
// Enforce uniqueness
db.users.createIndex({ email: 1 }, { unique: true })

// Compound unique index
db.users.createIndex({ firstName: 1, lastName: 1 }, { unique: true })

// Sparse unique index (documents missing the field are not indexed)
db.users.createIndex({ email: 1 }, { unique: true, sparse: true })
```

### Sparse Index
```javascript
// Index only documents with field present
db.users.createIndex({ phoneNumber: 1 }, { sparse: true })

// Useful for optional fields
// Documents without phoneNumber not in index
```

## Index Management

### List Indexes
```javascript
// Show all indexes
db.collection.getIndexes()

// Index statistics
db.collection.aggregate([{ $indexStats: {} }])
```

### Create Index Options
```javascript
// Background index (doesn't block operations; the background option is
// deprecated since MongoDB 4.2, where index builds no longer hold blocking locks)
db.collection.createIndex({ field: 1 }, { background: true })

// Index name
db.collection.createIndex({ field: 1 }, { name: "custom_index_name" })

// Case-insensitive index (collation)
db.collection.createIndex(
  { name: 1 },
  { collation: { locale: "en", strength: 2 } }
)
```

### Hide/Unhide Index
```javascript
// Hide index (test before dropping)
db.collection.hideIndex("index_name")

// Check performance without index
// ...

// Unhide or drop
db.collection.unhideIndex("index_name")
db.collection.dropIndex("index_name")
```

### Rebuild Indexes
```javascript
// Rebuild all indexes (after data changes)
db.collection.reIndex()

// Useful after bulk deletions to reclaim space
```

## Query Optimization

### Covered Queries
```javascript
// Query covered by index (no document fetch)
db.users.createIndex({ email: 1, name: 1 })

// Covered query (all fields in index)
db.users.find(
  { email: "user@example.com" },
  { email: 1, name: 1, _id: 0 } // Must exclude _id
)

// Check with explain: stage should be "IXSCAN" with no "FETCH"
```

### Index Intersection
```javascript
// MongoDB can use multiple indexes
db.collection.createIndex({ a: 1 })
db.collection.createIndex({ b: 1 })

// Query may use both indexes
db.collection.find({ a: 1, b: 1 })

// Usually compound index is better
db.collection.createIndex({ a: 1, b: 1 })
```

### Index Hints
```javascript
// Force specific index
db.orders.find({ status: "active", city: "NYC" })
  .hint({ status: 1, createdAt: -1 })

// Force no index (for testing)
db.orders.find({ status: "active" }).hint({ $natural: 1 })
```

### ESR Rule (Equality, Sort, Range)
```javascript
// Optimal compound index order: Equality → Sort → Range

// Query
db.orders.find({
  status: "completed",      // Equality
  category: "electronics"   // Equality
}).sort({
  orderDate: -1             // Sort
}).limit(10)

// Optimal index
db.orders.createIndex({
  status: 1,       // Equality first
  category: 1,     // Equality
  orderDate: -1    // Sort last
})

// With range
db.orders.find({
  status: "completed",    // Equality
  total: { $gte: 100 }    // Range
}).sort({
  orderDate: -1           // Sort
})

// Optimal index
db.orders.createIndex({
  status: 1,       // Equality
  orderDate: -1,   // Sort
  total: 1         // Range last
})
```

## Performance Analysis

### explain() Modes
```javascript
// Query planner (default)
db.collection.find({ field: value }).explain()

// Execution stats
db.collection.find({ field: value }).explain("executionStats")

// All execution stats
db.collection.find({ field: value }).explain("allPlansExecution")
```

### Key Metrics
```javascript
// Good performance indicators:
// - executionTimeMillis < 100ms
// - totalDocsExamined ≈ nReturned (examine only what's needed)
// - stage: "IXSCAN" (using index)
// - totalKeysExamined ≈ nReturned (index selectivity)

// Bad indicators:
// - stage: "COLLSCAN" (full collection scan)
// - totalDocsExamined >> nReturned (scanning too many docs)
// - executionTimeMillis > 1000ms
```

### Index Selectivity
```javascript
// High selectivity = good (returns few documents)
// Low selectivity = bad (returns many documents)

// Check selectivity
db.collection.aggregate([
  { $group: { _id: "$status", count: { $sum: 1 } } }
])

// Good for indexing: email, userId, orderId
// Bad for indexing: gender, status (few unique values)
```

## Index Strategies

### Multi-Tenant Applications
```javascript
// Always filter by tenant first
db.data.createIndex({ tenantId: 1, createdAt: -1 })

// All queries include tenantId
db.data.find({ tenantId: "tenant1", createdAt: { $gte: date } })
```

### Time-Series Data
```javascript
// Index on timestamp descending (recent data accessed more)
db.events.createIndex({ timestamp: -1 })

// Compound with filter fields
db.events.createIndex({ userId: 1, timestamp: -1 })
```

### Lookup Optimization
```javascript
// Index foreign key fields
db.orders.createIndex({ customerId: 1 })
// _id is already indexed by default on db.customers, so no extra index is needed there

// Aggregation $lookup uses these indexes
```

## Best Practices

1. **Create indexes for frequent queries** - Analyze slow query logs
2. **Limit number of indexes** - Each index adds write overhead
3. **Use compound indexes** - More efficient than multiple single indexes
4. **Follow ESR rule** - Equality, Sort, Range order
5. **Use covered queries** - When possible, avoid document fetches
6. **Monitor index usage** - Drop unused indexes
   ```javascript
   db.collection.aggregate([{ $indexStats: {} }])
   ```
7. **Partial indexes for filtered queries** - Reduce index size
8. **Consider index size** - Should fit in RAM
   ```javascript
   db.collection.stats().indexSizes
   ```
9. **Background index creation** - Don't block operations (deprecated in 4.2+)
10. **Test with explain** - Verify query plan before production

## Common Pitfalls

1. **Over-indexing** - Too many indexes slow writes
2. **Unused indexes** - Waste space and write performance
3. **Regex without prefix** - `/pattern/` can't use index bounds, `/^pattern/` can (see the sketch below)
4. **$ne, $nin queries** - Often scan entire collection
5. **$or with multiple branches** - May not use indexes efficiently
6. **Sort without index** - In-memory sorts are size-limited (32MB historically, 100MB in recent versions)
7. **Compound index order** - Wrong order makes index useless
8. **Case-sensitive queries** - Use collation for case-insensitive matching

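A quick way to check pitfall 3 in the shell, assuming a `users` collection with an index on `name`: compare `totalKeysExamined` against `nReturned` for an anchored versus an unanchored regex.

```javascript
db.users.createIndex({ name: 1 })

// Anchored regex: index bounds stay tight, totalKeysExamined stays close to nReturned
db.users.find({ name: /^Ali/ }).explain("executionStats").executionStats

// Unanchored regex: every index key (or every document, without an index) has to be examined
db.users.find({ name: /li/ }).explain("executionStats").executionStats
```
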
## Monitoring

```javascript
// Current operations
db.currentOp()

// Slow queries (enable profiling)
db.setProfilingLevel(1, { slowms: 100 })
db.system.profile.find().sort({ ts: -1 }).limit(10)

// Index statistics
db.collection.aggregate([
  { $indexStats: {} },
  { $sort: { "accesses.ops": -1 } }
])

// Collection statistics
db.collection.stats()
```

## Index Size Calculation

```javascript
// Check index sizes
db.collection.stats().indexSizes

// Total index size
db.collection.totalIndexSize()

// Recommend: indexes fit in RAM
// Monitor: db.serverStatus().mem
```

594
.opencode/skills/databases/references/postgresql-administration.md
Normal file
@@ -0,0 +1,594 @@

# PostgreSQL Administration

User management, backups, replication, maintenance, and production database administration.

## User and Role Management

### Create Users
```sql
-- Create user with password
CREATE USER appuser WITH PASSWORD 'secure_password';

-- Create superuser
CREATE USER admin WITH SUPERUSER PASSWORD 'admin_password';

-- Create role without login
CREATE ROLE readonly;

-- Create user with attributes
CREATE USER developer WITH
  PASSWORD 'dev_pass'
  CREATEDB
  VALID UNTIL '2025-12-31';
```

### Alter Users
```sql
-- Change password
ALTER USER appuser WITH PASSWORD 'new_password';

-- Add attributes
ALTER USER appuser WITH CREATEDB CREATEROLE;

-- Remove attributes
ALTER USER appuser WITH NOSUPERUSER;

-- Rename user
ALTER USER oldname RENAME TO newname;

-- Set connection limit
ALTER USER appuser CONNECTION LIMIT 10;
```

### Roles and Inheritance
```sql
-- Create role hierarchy
CREATE ROLE readonly;
CREATE ROLE readwrite;

-- Grant role to user
GRANT readonly TO appuser;
GRANT readwrite TO developer;

-- Revoke role
REVOKE readonly FROM appuser;

-- Role membership
\du
```

### Permissions

#### Database Level
```sql
-- Grant database access
GRANT CONNECT ON DATABASE mydb TO appuser;

-- Grant schema usage
GRANT USAGE ON SCHEMA public TO appuser;

-- Revoke access
REVOKE CONNECT ON DATABASE mydb FROM appuser;
```

#### Table Level
```sql
-- Grant table permissions
GRANT SELECT ON users TO appuser;
GRANT SELECT, INSERT, UPDATE ON orders TO appuser;
GRANT ALL PRIVILEGES ON products TO appuser;

-- Grant on all tables
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly;

-- Revoke permissions
REVOKE INSERT ON users FROM appuser;
```

#### Column Level
```sql
-- Grant specific columns
GRANT SELECT (id, name, email) ON users TO appuser;
GRANT UPDATE (status) ON orders TO appuser;
```

#### Sequence Permissions
```sql
-- Grant sequence usage (for SERIAL/auto-increment)
GRANT USAGE, SELECT ON SEQUENCE users_id_seq TO appuser;
GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO appuser;
```

#### Function Permissions
```sql
-- Grant execute on function
GRANT EXECUTE ON FUNCTION get_user(integer) TO appuser;
```

### Default Privileges
```sql
-- Set default privileges for future objects
ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT SELECT ON TABLES TO readonly;

ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO readwrite;

ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT USAGE ON SEQUENCES TO readwrite;
```

### View Permissions
```sql
-- Show table permissions
\dp users

-- Show role memberships
\du

-- Query permissions
SELECT grantee, privilege_type
FROM information_schema.role_table_grants
WHERE table_name = 'users';
```

## Backup and Restore

### pg_dump (Logical Backup)
```bash
# Dump database to SQL file
pg_dump mydb > mydb.sql

# Custom format (compressed, allows selective restore)
pg_dump -Fc mydb > mydb.dump

# Directory format (parallel dump)
pg_dump -Fd mydb -j 4 -f mydb_dir

# Specific table
pg_dump -t users mydb > users.sql

# Multiple tables
pg_dump -t users -t orders mydb > tables.sql

# Schema only
pg_dump -s mydb > schema.sql

# Data only
pg_dump -a mydb > data.sql

# Exclude table
pg_dump --exclude-table=logs mydb > mydb.sql

# With compression
pg_dump -Fc -Z 9 mydb > mydb.dump
```

### pg_dumpall (All Databases)
```bash
# Dump all databases
pg_dumpall > all_databases.sql

# Only globals (roles, tablespaces)
pg_dumpall --globals-only > globals.sql
```

### pg_restore
```bash
# Restore from custom format
pg_restore -d mydb mydb.dump

# Restore specific table
pg_restore -d mydb -t users mydb.dump

# List contents
pg_restore -l mydb.dump

# Parallel restore
pg_restore -d mydb -j 4 mydb.dump

# Clean database first
pg_restore -d mydb --clean mydb.dump

# Create the database first, then restore into it
pg_restore -C -d postgres mydb.dump
```

### Restore from SQL
```bash
# Restore SQL dump
psql mydb < mydb.sql

# Create database and restore
createdb mydb
psql mydb < mydb.sql

# Single transaction
psql -1 mydb < mydb.sql

# Stop on error
psql --set ON_ERROR_STOP=on mydb < mydb.sql
```

### Automated Backup Script
```bash
#!/bin/bash
# backup.sh

# Configuration
DB_NAME="mydb"
BACKUP_DIR="/backups"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=7

# Create backup
pg_dump -Fc "$DB_NAME" > "$BACKUP_DIR/${DB_NAME}_${DATE}.dump"

# Remove old backups
find "$BACKUP_DIR" -name "${DB_NAME}_*.dump" -mtime +$RETENTION_DAYS -delete

# Log
echo "Backup completed: ${DB_NAME}_${DATE}.dump"
```

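To run this on a schedule, a crontab entry like the one below works; the install path and log location are assumptions, so adjust them to wherever `backup.sh` actually lives.

```bash
# Run the backup script daily at 02:00 and append its output to a log
0 2 * * * /usr/local/bin/backup.sh >> /var/log/pg_backup.log 2>&1
```
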
### Point-in-Time Recovery (PITR)
```bash
# Enable WAL archiving (postgresql.conf)
wal_level = replica
archive_mode = on
archive_command = 'cp %p /archive/%f'
max_wal_senders = 3

# Base backup
pg_basebackup -D /backup/base -Ft -z -P

# Restore to point in time
# 1. Stop PostgreSQL
# 2. Restore base backup
# 3. Create recovery.conf with recovery_target_time
# 4. Start PostgreSQL
```

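Step 3 as written applies to PostgreSQL 11 and earlier. From PostgreSQL 12 onward, recovery.conf is gone: recovery is requested with a `recovery.signal` file plus settings in postgresql.conf. A hedged sketch, with example paths and timestamp:

```conf
# postgresql.conf on the restored data directory (PostgreSQL 12+)
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2024-01-15 12:00:00'

# Then create an empty recovery.signal file in the data directory and start the server:
#   touch /var/lib/postgresql/data/recovery.signal
```
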
## Replication

### Streaming Replication (Primary-Replica)

#### Primary Setup
```sql
-- Create replication user
CREATE USER replicator WITH REPLICATION PASSWORD 'replica_pass';

-- Configure postgresql.conf
wal_level = replica
max_wal_senders = 3
wal_keep_size = 64MB

-- Configure pg_hba.conf
host replication replicator replica_ip/32 md5
```

#### Replica Setup
```bash
# Stop replica PostgreSQL
systemctl stop postgresql

# Remove data directory
rm -rf /var/lib/postgresql/data/*

# Clone from primary
pg_basebackup -h primary_host -D /var/lib/postgresql/data -U replicator -P -R

# Start replica
systemctl start postgresql

# Check replication status
SELECT * FROM pg_stat_replication; -- On primary
```

### Logical Replication

#### Publisher (Primary)
```sql
-- Create publication
CREATE PUBLICATION my_publication FOR ALL TABLES;

-- Or specific tables
CREATE PUBLICATION my_publication FOR TABLE users, orders;

-- Check publications
\dRp
SELECT * FROM pg_publication;
```

#### Subscriber (Replica)
```sql
-- Create subscription
CREATE SUBSCRIPTION my_subscription
  CONNECTION 'host=primary_host dbname=mydb user=replicator password=replica_pass'
  PUBLICATION my_publication;

-- Check subscriptions
\dRs
SELECT * FROM pg_subscription;

-- Monitor replication
SELECT * FROM pg_stat_subscription;
```

## Monitoring

### Database Size
```sql
-- Database size
SELECT pg_size_pretty(pg_database_size('mydb'));

-- Table sizes
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Index sizes
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
```

### Connections
```sql
-- Current connections
SELECT count(*) FROM pg_stat_activity;

-- Connections by database
SELECT datname, count(*) FROM pg_stat_activity GROUP BY datname;

-- Connection limit
SHOW max_connections;

-- Kill connection
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE pid = 12345;
```

### Activity
```sql
-- Active queries
SELECT pid, usename, state, query, query_start
FROM pg_stat_activity
WHERE state != 'idle';

-- Long-running queries
SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- Blocking queries
SELECT blocked.pid AS blocked_pid,
       blocked.query AS blocked_query,
       blocking.pid AS blocking_pid,
       blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
  ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));
```

### Cache Hit Ratio
```sql
-- Should be > 0.99 for good performance
SELECT
  sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS cache_hit_ratio
FROM pg_statio_user_tables;
```

### Table Bloat
```sql
-- Check for table bloat (requires pgstattuple extension)
CREATE EXTENSION pgstattuple;

SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
       pgstattuple(schemaname||'.'||tablename) AS stats
FROM pg_tables
WHERE schemaname = 'public';
```

## Maintenance

### VACUUM
```sql
-- Reclaim storage
VACUUM users;

-- Verbose
VACUUM VERBOSE users;

-- Full (locks table, rewrites)
VACUUM FULL users;

-- With analyze
VACUUM ANALYZE users;

-- All tables
VACUUM;
```

### Auto-Vacuum
```sql
-- Check last vacuum
SELECT schemaname, relname, last_vacuum, last_autovacuum
FROM pg_stat_user_tables;

-- Configure postgresql.conf
autovacuum = on
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.1
```

### REINDEX
```sql
-- Rebuild index
REINDEX INDEX idx_users_email;

-- Rebuild all indexes on table
REINDEX TABLE users;

-- Rebuild database indexes
REINDEX DATABASE mydb;

-- Concurrently (doesn't lock)
REINDEX INDEX CONCURRENTLY idx_users_email;
```

### ANALYZE
```sql
-- Update statistics
ANALYZE users;

-- Specific columns
ANALYZE users(email, status);

-- All tables
ANALYZE;

-- Verbose
ANALYZE VERBOSE users;
```

## Configuration

### postgresql.conf Location
```sql
SHOW config_file;
```

### Key Settings
```conf
# Memory
shared_buffers = 4GB              # 25% of RAM
work_mem = 64MB                   # Per operation
maintenance_work_mem = 512MB      # VACUUM, CREATE INDEX
effective_cache_size = 12GB       # OS cache estimate

# Query Planner
random_page_cost = 1.1            # Lower for SSD
effective_io_concurrency = 200    # Concurrent disk ops

# Connections
max_connections = 100
superuser_reserved_connections = 3

# Logging
log_destination = 'stderr'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d.log'
log_rotation_age = 1d
log_min_duration_statement = 100  # Log slow queries

# Replication
wal_level = replica
max_wal_senders = 3
wal_keep_size = 64MB

# Autovacuum
autovacuum = on
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
```

### Reload Configuration
```sql
-- Reload config without restart
SELECT pg_reload_conf();

-- Or from shell
pg_ctl reload
```

## Security

### SSL/TLS
```conf
# postgresql.conf
ssl = on
ssl_cert_file = '/path/to/server.crt'
ssl_key_file = '/path/to/server.key'
ssl_ca_file = '/path/to/ca.crt'
```

### pg_hba.conf (Host-Based Authentication)
```conf
# TYPE  DATABASE     USER        ADDRESS          METHOD

# Local connections
local   all          postgres                     peer
local   all          all                          md5

# Remote connections
host    all          all         0.0.0.0/0        md5
host    all          all         ::0/0            md5

# Replication
host    replication  replicator  replica_ip/32    md5

# SSL required
hostssl all          all         0.0.0.0/0        md5
```

### Row Level Security
```sql
-- Enable RLS
ALTER TABLE users ENABLE ROW LEVEL SECURITY;

-- Create policy
-- (current_user_id() stands for an application-defined function; see the sketch below)
CREATE POLICY user_policy ON users
  USING (user_id = current_user_id());

-- Drop policy
DROP POLICY user_policy ON users;

-- View policies
\d+ users
```

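A slightly fuller sketch of a policy driven by a session variable, since `current_user_id()` above is not a built-in. Here `app.current_user_id` is an assumed application-set setting, and the integer cast assumes an integer `user_id` column.

```sql
ALTER TABLE users ENABLE ROW LEVEL SECURITY;

-- A row is visible only when its user_id matches the session variable
CREATE POLICY tenant_isolation ON users
  USING (user_id = current_setting('app.current_user_id')::int);

-- The application sets the variable per connection or transaction:
SET app.current_user_id = '42';
SELECT * FROM users;  -- now filtered by the policy (table owners/superusers bypass RLS by default)
```
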
## Best Practices

1. **Backups**
   - Daily automated backups
   - Test restores regularly
   - Store backups off-site
   - Use pg_dump custom format for flexibility

2. **Monitoring**
   - Monitor connections, queries, cache hit ratio
   - Set up alerts for critical metrics
   - Log slow queries
   - Use pg_stat_statements

3. **Security**
   - Use strong passwords
   - Restrict network access (pg_hba.conf)
   - Enable SSL/TLS
   - Regular security updates
   - Principle of least privilege

4. **Maintenance**
   - Regular VACUUM and ANALYZE
   - Monitor autovacuum
   - REINDEX periodically
   - Check for table bloat

5. **Configuration**
   - Tune for workload
   - Use connection pooling (pgBouncer; see the sketch below)
   - Monitor and adjust memory settings
   - Keep PostgreSQL updated

6. **Replication**
   - At least one replica for HA
   - Monitor replication lag
   - Test failover procedures
   - Use logical replication for selective replication

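A minimal pgBouncer configuration sketch for the pooling recommendation above. Database names, file paths, and pool sizes are assumptions to adapt, not required values.

```ini
; pgbouncer.ini (minimal sketch)
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20
max_client_conn = 200
```

Applications then connect to port 6432 instead of 5432, and pgBouncer multiplexes those client connections onto a small pool of server connections.
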
527
.opencode/skills/databases/references/postgresql-performance.md
Normal file
@@ -0,0 +1,527 @@

# PostgreSQL Performance Optimization

Query optimization, indexing strategies, EXPLAIN analysis, and performance tuning for PostgreSQL.

## EXPLAIN Command

### Basic EXPLAIN
```sql
-- Show query plan
EXPLAIN SELECT * FROM users WHERE id = 1;

-- Output shows:
-- - Execution plan nodes
-- - Estimated costs
-- - Estimated rows
```

### EXPLAIN ANALYZE
```sql
-- Execute query and show actual performance
EXPLAIN ANALYZE SELECT * FROM users WHERE age > 18;

-- Shows:
-- - Actual execution time
-- - Actual rows returned
-- - Planning time
-- - Execution time
```

### EXPLAIN Options
```sql
-- Verbose output
EXPLAIN (VERBOSE) SELECT * FROM users;

-- Show buffer usage
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE active = true;

-- JSON format
EXPLAIN (FORMAT JSON, ANALYZE) SELECT * FROM users;

-- All options
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, TIMING, COSTS)
SELECT * FROM users WHERE id = 1;
```

## Understanding Query Plans

### Scan Methods

#### Sequential Scan
```sql
-- Full table scan (reads all rows)
EXPLAIN SELECT * FROM users WHERE name = 'Alice';

-- Output: Seq Scan on users
-- Indicates: no suitable index or small table
```

#### Index Scan
```sql
-- Uses index to find rows
EXPLAIN SELECT * FROM users WHERE id = 1;

-- Output: Index Scan using users_pkey on users
-- Best for: selective queries, small result sets
```

#### Index Only Scan
```sql
-- Query covered by index (no table access)
CREATE INDEX idx_users_email_name ON users(email, name);
EXPLAIN SELECT email, name FROM users WHERE email = 'alice@example.com';

-- Output: Index Only Scan using idx_users_email_name
-- Best performance: no heap fetch needed
```

#### Bitmap Scan
```sql
-- Combines multiple indexes or handles large result sets
EXPLAIN SELECT * FROM users WHERE age > 18 AND status = 'active';

-- Output:
-- Bitmap Heap Scan on users
--   Recheck Cond: ...
--   -> Bitmap Index Scan on idx_age

-- Good for: moderate selectivity
```

### Join Methods

#### Nested Loop
```sql
-- For each row in outer table, scan inner table
EXPLAIN SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.id = 1;

-- Output: Nested Loop
-- Best for: small outer table, indexed inner table
```

#### Hash Join
```sql
-- Build hash table from smaller table
EXPLAIN SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id;

-- Output: Hash Join
-- Best for: large tables, equality conditions
```

#### Merge Join
```sql
-- Both inputs sorted on join key
EXPLAIN SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
ORDER BY o.customer_id;

-- Output: Merge Join
-- Best for: pre-sorted data, large sorted inputs
```

## Indexing Strategies

### B-tree Index (Default)
```sql
-- General purpose index
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_date ON orders(order_date);

-- Supports: =, <, <=, >, >=, BETWEEN, IN, IS NULL
-- Supports: ORDER BY, MIN/MAX
```

### Composite Index
```sql
-- Multiple columns (order matters!)
CREATE INDEX idx_users_status_created ON users(status, created_at);

-- Supports queries on:
-- - status
-- - status, created_at
-- Does NOT support: created_at alone

-- Column order: most selective first
-- Exception: match query WHERE/ORDER BY patterns
```

### Partial Index
```sql
-- Index subset of rows
CREATE INDEX idx_active_users ON users(email)
WHERE status = 'active';

-- Smaller index, faster queries with matching WHERE clause
-- Query must include WHERE status = 'active' to use index
```

### Expression Index
```sql
-- Index on computed value
CREATE INDEX idx_users_lower_email ON users(LOWER(email));

-- Query must use same expression
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
```

### GIN Index (Generalized Inverted Index)
```sql
-- For array, JSONB, full-text search
CREATE INDEX idx_products_tags ON products USING GIN(tags);
CREATE INDEX idx_documents_data ON documents USING GIN(data);

-- Array queries
SELECT * FROM products WHERE tags @> ARRAY['featured'];

-- JSONB queries
SELECT * FROM documents WHERE data @> '{"status": "active"}';
```

### GiST Index (Generalized Search Tree)
```sql
-- For geometric data, range types, full-text
CREATE INDEX idx_locations_geom ON locations USING GiST(geom);

-- Geometric queries
SELECT * FROM locations WHERE geom && ST_MakeEnvelope(...);
```

### Hash Index
```sql
-- Equality comparisons only
CREATE INDEX idx_users_hash_email ON users USING HASH(email);

-- Only supports: =
-- Rarely used (B-tree usually better)
```

### BRIN Index (Block Range Index)
```sql
-- For very large tables with natural clustering
CREATE INDEX idx_logs_brin_created ON logs USING BRIN(created_at);

-- Tiny index size, good for append-only data
-- Best for: time-series, logging, large tables
```

## Query Optimization Techniques

### Avoid SELECT *
```sql
-- Bad
SELECT * FROM users WHERE id = 1;

-- Good (only needed columns)
SELECT id, name, email FROM users WHERE id = 1;
```

### Use LIMIT
```sql
-- Limit result set
SELECT * FROM users ORDER BY created_at DESC LIMIT 10;

-- PostgreSQL can stop early with LIMIT
```

### Index for ORDER BY
```sql
-- Create index matching sort order
CREATE INDEX idx_users_created_desc ON users(created_at DESC);

-- Query uses index for sorting
SELECT * FROM users ORDER BY created_at DESC LIMIT 10;
```

### Covering Index
```sql
-- Include all queried columns in index
CREATE INDEX idx_users_email_name_status ON users(email, name, status);

-- Query covered by index (no table access)
SELECT name, status FROM users WHERE email = 'alice@example.com';
```

### EXISTS vs IN
```sql
-- Prefer EXISTS for large subqueries
-- Bad
SELECT * FROM customers
WHERE id IN (SELECT customer_id FROM orders WHERE total > 1000);

-- Good
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id AND o.total > 1000);
```

### JOIN Order
```sql
-- Filter before joining
-- Bad
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'completed' AND c.country = 'USA';

-- Good (filter in subquery)
SELECT * FROM (
  SELECT * FROM orders WHERE status = 'completed'
) o
JOIN (
  SELECT * FROM customers WHERE country = 'USA'
) c ON o.customer_id = c.id;

-- Or use CTE
WITH filtered_orders AS (
  SELECT * FROM orders WHERE status = 'completed'
),
filtered_customers AS (
  SELECT * FROM customers WHERE country = 'USA'
)
SELECT * FROM filtered_orders o
JOIN filtered_customers c ON o.customer_id = c.id;
```

### Avoid Functions in WHERE
```sql
-- Bad (index not used)
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- Good (create expression index)
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Then query uses index

-- Or store lowercase separately
ALTER TABLE users ADD COLUMN email_lower TEXT;
UPDATE users SET email_lower = LOWER(email);
CREATE INDEX idx_users_email_lower ON users(email_lower);
```

## Statistics and ANALYZE

### Update Statistics
```sql
-- Analyze table (update statistics)
ANALYZE users;

-- Analyze specific columns
ANALYZE users(email, status);

-- Analyze all tables
ANALYZE;

-- Auto-analyze (configured in postgresql.conf)
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.1
```

### Check Statistics
```sql
-- Last analyze time
SELECT schemaname, relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables;

-- Statistics targets (adjust for important columns)
ALTER TABLE users ALTER COLUMN email SET STATISTICS 1000;
```

## VACUUM and Maintenance

### VACUUM
```sql
-- Reclaim storage, update statistics
VACUUM users;

-- Verbose output
VACUUM VERBOSE users;

-- Full vacuum (rewrites table, locks table)
VACUUM FULL users;

-- Analyze after vacuum
VACUUM ANALYZE users;
```

### Auto-Vacuum
```sql
-- Check autovacuum status
SELECT schemaname, relname, last_vacuum, last_autovacuum
FROM pg_stat_user_tables;

-- Configure in postgresql.conf
autovacuum = on
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
```

### REINDEX
```sql
-- Rebuild index
REINDEX INDEX idx_users_email;

-- Rebuild all indexes on table
REINDEX TABLE users;

-- Rebuild all indexes in schema
REINDEX SCHEMA public;
```

## Monitoring Queries

### Active Queries
```sql
-- Current queries
SELECT pid, usename, state, query, query_start
FROM pg_stat_activity
WHERE state != 'idle';

-- Long-running queries
SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state != 'idle' AND now() - query_start > interval '5 minutes'
ORDER BY duration DESC;
```

### Slow Query Log
```sql
-- Enable slow query logging (postgresql.conf)
log_min_duration_statement = 100  -- milliseconds

-- Or per session
SET log_min_duration_statement = 100;

-- Logs appear in PostgreSQL log files
```

### pg_stat_statements Extension
```sql
-- Enable extension
-- (requires shared_preload_libraries = 'pg_stat_statements' in postgresql.conf and a restart)
CREATE EXTENSION pg_stat_statements;

-- View query statistics
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Reset statistics
SELECT pg_stat_statements_reset();
```

## Index Usage Analysis

### Check Index Usage
```sql
-- Index usage statistics
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan;

-- Unused indexes (idx_scan = 0)
SELECT schemaname, relname, indexrelname
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexrelname NOT LIKE '%_pkey';
```

### Index Size
```sql
-- Index sizes
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
```

### Missing Indexes
```sql
-- Tables with sequential scans
SELECT schemaname, relname, seq_scan, seq_tup_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC;

-- Consider adding indexes to high seq_scan tables
```

## Configuration Tuning

### Memory Settings (postgresql.conf)
```conf
# Shared buffers (25% of RAM)
shared_buffers = 4GB

# Work memory (per operation)
work_mem = 64MB

# Maintenance work memory (VACUUM, CREATE INDEX)
maintenance_work_mem = 512MB

# Effective cache size (estimate of OS cache)
effective_cache_size = 12GB
```

### Query Planner Settings
```conf
# Random page cost (lower for SSD)
random_page_cost = 1.1

# Effective IO concurrency (number of concurrent disk operations)
effective_io_concurrency = 200

# Cost of parallel query startup
parallel_setup_cost = 1000
parallel_tuple_cost = 0.1
```

### Connection Settings
```conf
# Max connections
max_connections = 100

# Connection pooling recommended (pgBouncer)
```

## Best Practices

1. **Index strategy**
   - Index foreign keys (see the helper query below)
   - Index WHERE clause columns
   - Index ORDER BY columns
   - Use composite indexes for multi-column queries
   - Keep index count reasonable (5-10 per table)

2. **Query optimization**
   - Use EXPLAIN ANALYZE
   - Avoid SELECT *
   - Use LIMIT when possible
   - Filter before joining
   - Use appropriate join type

3. **Statistics**
   - Regular ANALYZE
   - Increase statistics target for skewed distributions
   - Monitor autovacuum

4. **Monitoring**
   - Enable pg_stat_statements
   - Log slow queries
   - Monitor index usage
   - Check table bloat

5. **Maintenance**
   - Regular VACUUM
   - REINDEX periodically
   - Update PostgreSQL version
   - Monitor disk space

6. **Configuration**
   - Tune memory settings
   - Adjust for workload (OLTP vs OLAP)
   - Use connection pooling
   - Enable query logging

7. **Testing**
   - Test queries with production-like data volume
   - Benchmark before/after changes
   - Monitor production metrics

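A hedged helper for the "index foreign keys" recommendation: list every foreign-key constraint with its definition so the referencing columns can be cross-checked against existing indexes (this sketch only lists constraints; it does not decide which ones already have indexes).

```sql
SELECT conrelid::regclass            AS table_name,
       conname                       AS constraint_name,
       pg_get_constraintdef(oid)     AS definition
FROM pg_constraint
WHERE contype = 'f'
ORDER BY conrelid::regclass::text;
```
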
467
.opencode/skills/databases/references/postgresql-psql-cli.md
Normal file
@@ -0,0 +1,467 @@

# PostgreSQL psql CLI

Command-line interface for PostgreSQL: connection, meta-commands, scripting, and interactive usage.

## Connection

### Basic Connection
```bash
# Connect to database
psql -U username -d database -h hostname -p 5432

# Connect using URI
psql postgresql://username:password@hostname:5432/database

# Environment variables
export PGUSER=postgres
export PGPASSWORD=mypassword
export PGHOST=localhost
export PGPORT=5432
export PGDATABASE=mydb
psql
```

### Password File (~/.pgpass)
```bash
# Format: hostname:port:database:username:password
# Each field is either a literal value or *, which matches anything
# chmod 600 ~/.pgpass
localhost:5432:mydb:postgres:mypassword
*:5432:*:appuser:apppass
```

### SSL Connection
```bash
# Require SSL
psql "host=hostname sslmode=require user=username dbname=database"

# Verify certificate
psql "host=hostname sslmode=verify-full \
      sslcert=/path/to/client.crt \
      sslkey=/path/to/client.key \
      sslrootcert=/path/to/ca.crt"
```

## Essential Meta-Commands

### Database Navigation
```bash
\l or \list           # List databases
\l+                   # List with sizes
\c database           # Connect to database
\c database username  # Connect as user
\conninfo             # Connection info
```

### Schema Inspection
```bash
\dn       # List schemas
\dt       # List tables
\dt+      # Tables with sizes
\dt *.*   # All tables, all schemas
\di       # List indexes
\dv       # List views
\dm       # List materialized views
\ds       # List sequences
\df       # List functions
```

### Object Description
```bash
\d tablename      # Describe table
\d+ tablename     # Detailed description
\d indexname      # Describe index
\df functionname  # Describe function
\du               # List users/roles
\dp tablename     # Show permissions
```

### Output Formatting
```bash
\x                   # Toggle expanded output
\x on                # Enable expanded
\x off               # Disable expanded
\a                   # Toggle aligned output
\t                   # Toggle tuples only
\H                   # HTML output
\pset format csv     # CSV format
\pset null '[NULL]'  # Show NULL values
```

### Execution Commands
```bash
\i filename.sql  # Execute SQL file
\o output.txt    # Redirect output to file
\o               # Stop redirecting
\! command       # Execute shell command
\timing          # Toggle timing
\q               # Quit
```

## psql Command-Line Options

```bash
# Connection
-h hostname          # Host
-p port              # Port (default 5432)
-U username          # Username
-d database          # Database
-W                   # Prompt for password

# Execution
-c "SQL"             # Execute command and exit
-f file.sql          # Execute file
--command="SQL"      # Execute command

# Output
-t                   # Tuples only (no headers)
-A                   # Unaligned output
-F ","               # Field separator
-o output.txt        # Output to file
-q                   # Quiet mode
-x                   # Expanded output

# Script options
-1                   # Execute as transaction
-v ON_ERROR_STOP=1   # Stop on error
-v variable=value    # Set variable
-L logfile.log       # Log session
```

## Running SQL

### Interactive Queries
```sql
-- Simple query
SELECT * FROM users;

-- Multi-line (ends with semicolon)
SELECT id, name, email
FROM users
WHERE active = true;

-- Edit in editor
\e

-- Repeat last query
\g

-- Send to file
\g output.txt
```

### Variables
```bash
# Set variable
\set myvar 'value'
\set limit 10

# Use variable
SELECT * FROM users LIMIT :limit;

# String variable (quoted)
\set username 'alice'
SELECT * FROM users WHERE name = :'username';

# Show all variables
\set

# Unset variable
\unset myvar
```

### Scripts
```sql
-- script.sql
\set ON_ERROR_STOP on

BEGIN;

CREATE TABLE IF NOT EXISTS users (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  email TEXT UNIQUE
);

INSERT INTO users (name, email) VALUES
  ('Alice', 'alice@example.com'),
  ('Bob', 'bob@example.com');

COMMIT;

\echo 'Script completed!'
```

```bash
# Execute script
psql -d mydb -f script.sql

# Stop on error
psql -d mydb -f script.sql -v ON_ERROR_STOP=1

# In single transaction
psql -d mydb -1 -f script.sql
```

## Data Import/Export

### COPY (Server-side)
```sql
-- Export to CSV
COPY users TO '/tmp/users.csv' WITH (FORMAT CSV, HEADER);

-- Import from CSV
COPY users FROM '/tmp/users.csv' WITH (FORMAT CSV, HEADER);

-- Query to file
COPY (SELECT * FROM users WHERE active = true)
TO '/tmp/active_users.csv' WITH (FORMAT CSV, HEADER);
```

### \copy (Client-side)
```bash
# Export (from psql)
\copy users TO 'users.csv' WITH (FORMAT CSV, HEADER)

# Export query results
\copy (SELECT * FROM users WHERE active = true) TO 'active.csv' CSV HEADER

# Import
\copy users FROM 'users.csv' WITH (FORMAT CSV, HEADER)

# To stdout (invoked from the shell, redirecting into a file)
psql -d mydb -c "\copy users TO STDOUT CSV HEADER" > users.csv
```

### pg_dump / pg_restore
```bash
# Dump database
pg_dump mydb > mydb.sql
pg_dump -d mydb -Fc > mydb.dump  # Custom format

# Dump specific table
pg_dump -t users mydb > users.sql

# Schema only
pg_dump -s mydb > schema.sql

# Data only
pg_dump -a mydb > data.sql

# Restore
psql mydb < mydb.sql
pg_restore -d mydb mydb.dump
```

## Configuration

### ~/.psqlrc
```bash
-- Auto-loaded on psql startup
\set QUIET ON

-- Prompt customization
\set PROMPT1 '%n@%m:%>/%/%R%# '

-- Output settings
\pset null '[NULL]'
\pset border 2
\pset linestyle unicode
\pset expanded auto

-- Timing
\timing ON

-- Pager
\pset pager always

-- History
\set HISTSIZE 10000

-- Custom shortcuts
\set active_users 'SELECT * FROM users WHERE status = ''active'';'
\set dbsize 'SELECT pg_size_pretty(pg_database_size(current_database()));'

\set QUIET OFF
```

### Useful Aliases
```bash
-- Add to ~/.psqlrc
\set locks 'SELECT pid, usename, pg_blocking_pids(pid) AS blocked_by, query FROM pg_stat_activity WHERE cardinality(pg_blocking_pids(pid)) > 0;'

\set activity 'SELECT pid, usename, state, query FROM pg_stat_activity WHERE state != ''idle'';'

\set table_sizes 'SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||''.''||tablename)) FROM pg_tables ORDER BY pg_total_relation_size(schemaname||''.''||tablename) DESC;'

\set index_usage 'SELECT schemaname, relname, indexrelname, idx_scan FROM pg_stat_user_indexes ORDER BY idx_scan;'

-- Usage: :locks, :activity, :table_sizes
```

## Transactions

```sql
-- Begin transaction
BEGIN;

-- Or
START TRANSACTION;

-- Savepoint
SAVEPOINT sp1;

-- Rollback to savepoint
ROLLBACK TO sp1;

-- Commit
COMMIT;

-- Rollback
ROLLBACK;
```

## Performance Analysis
|
||||
|
||||
### EXPLAIN
|
||||
```sql
|
||||
-- Show query plan
|
||||
EXPLAIN SELECT * FROM users WHERE id = 1;
|
||||
|
||||
-- With execution
|
||||
EXPLAIN ANALYZE SELECT * FROM users WHERE age > 18;
|
||||
|
||||
-- Verbose
|
||||
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
|
||||
SELECT * FROM users WHERE active = true;
|
||||
```
### Current Activity
```sql
-- Active queries
SELECT pid, usename, state, query
FROM pg_stat_activity;

-- Long-running queries
SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- Blocking queries
SELECT blocked.pid, blocking.pid AS blocking_pid,
       blocked.query AS blocked_query,
       blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
  ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));
```

### Statistics
```sql
-- Database size
SELECT pg_size_pretty(pg_database_size(current_database()));

-- Table sizes
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Index usage (pg_stat_user_indexes exposes relname/indexrelname, not tablename/indexname)
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan;
```

## User Management

```sql
-- Create user
CREATE USER appuser WITH PASSWORD 'secure_password';

-- Create superuser
CREATE USER admin WITH PASSWORD 'password' SUPERUSER;

-- Alter user
ALTER USER appuser WITH PASSWORD 'new_password';

-- Grant permissions
GRANT CONNECT ON DATABASE mydb TO appuser;
GRANT USAGE ON SCHEMA public TO appuser;
GRANT SELECT, INSERT, UPDATE, DELETE ON users TO appuser;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO appuser;

-- Default privileges
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO appuser;

-- View permissions
\dp users

-- Drop user
DROP USER appuser;
```

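For reporting and analytics users, a common least-privilege pattern is a dedicated read-only role; a sketch with illustrative names:

```sql
-- Read-only reporting role (role, database, and password are hypothetical)
CREATE ROLE readonly_reporting LOGIN PASSWORD 'reporting_password';
GRANT CONNECT ON DATABASE mydb TO readonly_reporting;
GRANT USAGE ON SCHEMA public TO readonly_reporting;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly_reporting;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO readonly_reporting;
```
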
## Backup Patterns

```bash
#!/bin/bash
# Daily backup script
DATE=$(date +%Y%m%d)
pg_dump -Fc mydb > /backups/mydb_$DATE.dump

# Restore latest
pg_restore -d mydb /backups/mydb_latest.dump

# Backup all databases
pg_dumpall > /backups/all_databases.sql

# Backup specific schema
pg_dump -n public mydb > public_schema.sql
```

## Troubleshooting

### Connection Issues
```bash
# Test connection
psql -h hostname -U username -d postgres -c "SELECT 1;"

# Check pg_hba.conf
# /var/lib/postgresql/data/pg_hba.conf

# Echo all input lines (useful when a script fails mid-way)
psql -h hostname -d mydb --echo-all
```

### Performance Issues
```sql
-- Enable slow query logging
ALTER DATABASE mydb SET log_min_duration_statement = 100;

-- Check cache hit ratio
SELECT
    sum(heap_blks_read) AS heap_read,
    sum(heap_blks_hit) AS heap_hit,
    sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS ratio
FROM pg_statio_user_tables;

-- Find slow queries
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```

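The `pg_stat_statements` view only exists once the extension is installed; it also has to be preloaded via `shared_preload_libraries = 'pg_stat_statements'` in postgresql.conf, which requires a restart. A minimal setup sketch:

```sql
-- After adding pg_stat_statements to shared_preload_libraries and restarting:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Reset counters before a measurement window
SELECT pg_stat_statements_reset();
```
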
## Best Practices

1. **Use .pgpass** for credential management
2. **Set ON_ERROR_STOP** in scripts (see the sketch after this list)
3. **Use transactions** for multi-statement changes
4. **Test with EXPLAIN** before running expensive queries
5. **Use \timing** to measure query performance
6. **Configure ~/.psqlrc** for productivity
7. **Use variables** for dynamic queries
8. **Log sessions** with -L for auditing
9. **Use \copy** instead of COPY for client operations
10. **Regular backups** with pg_dump

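A small sketch combining several of these practices in a script run with `psql -f deploy.sql`; the table and column names are illustrative:

```sql
-- deploy.sql: stop on the first error, time each statement, wrap changes in one transaction
\set ON_ERROR_STOP on
\timing on

BEGIN;

ALTER TABLE users ADD COLUMN last_seen timestamptz;
CREATE INDEX idx_users_last_seen ON users (last_seen);

COMMIT;
```
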
475
.opencode/skills/databases/references/postgresql-queries.md
Normal file
475
.opencode/skills/databases/references/postgresql-queries.md
Normal file
@@ -0,0 +1,475 @@
# PostgreSQL SQL Queries

SQL queries in PostgreSQL: SELECT, JOINs, subqueries, CTEs, window functions, and advanced patterns.

## Basic SELECT

### Simple Queries
```sql
-- Select all columns
SELECT * FROM users;

-- Select specific columns
SELECT id, name, email FROM users;

-- With alias
SELECT name AS full_name, email AS contact_email FROM users;

-- Distinct values
SELECT DISTINCT status FROM orders;

-- Count rows
SELECT COUNT(*) FROM users;
SELECT COUNT(DISTINCT status) FROM orders;
```

### WHERE Clause
```sql
-- Equality
SELECT * FROM users WHERE status = 'active';

-- Comparison
SELECT * FROM products WHERE price > 100;
SELECT * FROM orders WHERE total BETWEEN 100 AND 500;

-- Pattern matching
SELECT * FROM users WHERE email LIKE '%@example.com';
SELECT * FROM users WHERE name ILIKE 'john%'; -- case-insensitive

-- IN operator
SELECT * FROM orders WHERE status IN ('pending', 'processing');

-- NULL checks
SELECT * FROM users WHERE deleted_at IS NULL;
SELECT * FROM users WHERE phone_number IS NOT NULL;

-- Logical operators
SELECT * FROM products WHERE price > 100 AND stock > 0;
SELECT * FROM users WHERE status = 'active' OR verified = true;
SELECT * FROM products WHERE NOT (price > 1000);
```

### ORDER BY
```sql
-- Ascending (default)
SELECT * FROM users ORDER BY created_at;

-- Descending
SELECT * FROM users ORDER BY created_at DESC;

-- Multiple columns
SELECT * FROM orders ORDER BY status ASC, created_at DESC;

-- NULL handling
SELECT * FROM users ORDER BY last_login NULLS FIRST;
SELECT * FROM users ORDER BY last_login NULLS LAST;
```

### LIMIT and OFFSET
```sql
-- Limit results
SELECT * FROM users LIMIT 10;

-- Pagination
SELECT * FROM users ORDER BY id LIMIT 10 OFFSET 20;

-- Alternative: FETCH
SELECT * FROM users OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;
```

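OFFSET still reads and discards all of the skipped rows, so deep pages get progressively slower. Keyset (cursor) pagination is a common alternative when an indexed, unique ordering column is available; a sketch:

```sql
-- Keyset pagination: remember the last id from the previous page instead of using OFFSET
SELECT * FROM users
WHERE id > 1020   -- last id seen on the previous page
ORDER BY id
LIMIT 10;
```
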
## JOINs

### INNER JOIN
```sql
-- Match rows from both tables
SELECT orders.id, orders.total, customers.name
FROM orders
INNER JOIN customers ON orders.customer_id = customers.id;

-- Short syntax
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;

-- Multiple joins
SELECT o.id, c.name, p.name AS product
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON oi.product_id = p.id;
```

### LEFT JOIN (LEFT OUTER JOIN)
```sql
-- All rows from left table, matching rows from right
SELECT c.name, o.id AS order_id
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id;

-- Find customers without orders
SELECT c.name
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.id IS NULL;
```

### RIGHT JOIN (RIGHT OUTER JOIN)
```sql
-- All rows from right table, matching rows from left
SELECT c.name, o.id AS order_id
FROM orders o
RIGHT JOIN customers c ON o.customer_id = c.id;
```

### FULL OUTER JOIN
```sql
-- All rows from both tables
SELECT c.name, o.id AS order_id
FROM customers c
FULL OUTER JOIN orders o ON c.id = o.customer_id;
```

### CROSS JOIN
```sql
-- Cartesian product (all combinations)
SELECT c.name AS color, p.name AS product
FROM colors c
CROSS JOIN products p;
```

### Self Join
```sql
-- Join table to itself
SELECT e1.name AS employee, e2.name AS manager
FROM employees e1
LEFT JOIN employees e2 ON e1.manager_id = e2.id;
```

## Subqueries

### Scalar Subquery
```sql
-- Return single value
SELECT name, salary,
       (SELECT AVG(salary) FROM employees) AS avg_salary
FROM employees;
```

### IN Subquery
```sql
-- Match against set of values
SELECT name FROM customers
WHERE id IN (
    SELECT customer_id FROM orders WHERE total > 1000
);
```

### EXISTS Subquery
```sql
-- Check if subquery returns any rows
SELECT name FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.id
);

-- NOT EXISTS
SELECT name FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
```

### Correlated Subquery
```sql
-- Subquery references outer query
SELECT name, salary FROM employees e1
WHERE salary > (
    SELECT AVG(salary) FROM employees e2
    WHERE e2.department_id = e1.department_id
);
```

## Common Table Expressions (CTEs)

### Simple CTE
```sql
-- Named temporary result set (created_at must be selected in the CTE to filter on it later)
WITH active_users AS (
    SELECT id, name, email, created_at FROM users WHERE status = 'active'
)
SELECT * FROM active_users WHERE created_at > '2024-01-01';
```

### Multiple CTEs
```sql
WITH
active_customers AS (
    SELECT id, name FROM customers WHERE active = true
),
recent_orders AS (
    SELECT customer_id, SUM(total) AS total_spent
    FROM orders
    WHERE order_date > CURRENT_DATE - INTERVAL '30 days'
    GROUP BY customer_id
)
SELECT c.name, COALESCE(o.total_spent, 0) AS spent
FROM active_customers c
LEFT JOIN recent_orders o ON c.id = o.customer_id;
```

### Recursive CTE
```sql
-- Tree traversal, hierarchical data
WITH RECURSIVE category_tree AS (
    -- Base case: root categories
    SELECT id, name, parent_id, 0 AS level
    FROM categories
    WHERE parent_id IS NULL

    UNION ALL

    -- Recursive case: child categories
    SELECT c.id, c.name, c.parent_id, ct.level + 1
    FROM categories c
    JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT * FROM category_tree ORDER BY level, name;

-- Employee hierarchy
WITH RECURSIVE org_chart AS (
    SELECT id, name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    SELECT e.id, e.name, e.manager_id, oc.level + 1
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart;
```

## Aggregate Functions

### Basic Aggregates
```sql
-- COUNT, SUM, AVG, MIN, MAX
SELECT
    COUNT(*) AS total_orders,
    SUM(total) AS total_revenue,
    AVG(total) AS avg_order_value,
    MIN(total) AS min_order,
    MAX(total) AS max_order
FROM orders;

-- COUNT variations
SELECT COUNT(*) FROM users;                 -- All rows
SELECT COUNT(phone_number) FROM users;      -- Non-NULL values
SELECT COUNT(DISTINCT status) FROM orders;  -- Unique values
```

### GROUP BY
```sql
-- Aggregate by groups
SELECT status, COUNT(*) AS count
FROM orders
GROUP BY status;

-- Multiple grouping columns
SELECT customer_id, status, COUNT(*) AS count
FROM orders
GROUP BY customer_id, status;

-- With aggregate functions
SELECT customer_id,
       COUNT(*) AS order_count,
       SUM(total) AS total_spent,
       AVG(total) AS avg_order
FROM orders
GROUP BY customer_id;
```

### HAVING
```sql
-- Filter after aggregation
SELECT customer_id, SUM(total) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(total) > 1000;

-- Filter groups by row count
SELECT status, COUNT(*) AS count
FROM orders
GROUP BY status
HAVING COUNT(*) > 10;
```

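PostgreSQL also supports the `FILTER` clause for conditional aggregation, which often replaces several CASE-wrapped aggregates; a sketch against the same `orders` table:

```sql
-- Per-customer counts and revenue, split by status, in a single pass
SELECT customer_id,
       COUNT(*) AS all_orders,
       COUNT(*) FILTER (WHERE status = 'pending') AS pending_orders,
       SUM(total) FILTER (WHERE status = 'completed') AS completed_revenue
FROM orders
GROUP BY customer_id;
```
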
## Window Functions

### ROW_NUMBER
```sql
-- Assign unique number to each row
SELECT id, name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees;

-- Partition by group
SELECT id, department, salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;
```

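A common use of the partitioned form is "top N per group": number rows inside each partition in a CTE, then keep only the first few. A sketch:

```sql
-- Top 3 earners per department
WITH ranked AS (
    SELECT id, department, name, salary,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
)
SELECT id, department, name, salary
FROM ranked
WHERE rn <= 3;
```
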
### RANK / DENSE_RANK
```sql
-- RANK: gaps in ranking for ties
-- DENSE_RANK: no gaps
SELECT id, name, salary,
       RANK() OVER (ORDER BY salary DESC) AS rank,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
```

### LAG / LEAD
```sql
-- Access previous/next row
SELECT date, revenue,
       LAG(revenue) OVER (ORDER BY date) AS prev_revenue,
       LEAD(revenue) OVER (ORDER BY date) AS next_revenue,
       revenue - LAG(revenue) OVER (ORDER BY date) AS change
FROM daily_sales;
```

### Running Totals
```sql
-- Cumulative sum
SELECT date, amount,
       SUM(amount) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM transactions;

-- Simpler syntax (default frame is RANGE, so rows with the same date share a total)
SELECT date, amount,
       SUM(amount) OVER (ORDER BY date) AS running_total
FROM transactions;
```

### Moving Averages
```sql
-- 7-day moving average (assumes one row per day)
SELECT date, value,
       AVG(value) OVER (
           ORDER BY date
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS moving_avg_7d
FROM metrics;
```

## Advanced Patterns

### CASE Expressions
```sql
-- Simple CASE
SELECT name,
       CASE status
           WHEN 'active' THEN 'Active User'
           WHEN 'pending' THEN 'Pending Verification'
           ELSE 'Inactive'
       END AS status_label
FROM users;

-- Searched CASE
SELECT name, age,
       CASE
           WHEN age < 18 THEN 'Minor'
           WHEN age BETWEEN 18 AND 65 THEN 'Adult'
           ELSE 'Senior'
       END AS age_group
FROM users;
```

### COALESCE
```sql
-- Return first non-NULL value
SELECT name, COALESCE(phone_number, email, 'No contact') AS contact
FROM users;
```

### NULLIF
```sql
-- Return NULL if values equal
SELECT name, NULLIF(status, 'deleted') AS active_status
FROM users;
```

### Array Operations
```sql
-- Array aggregate
SELECT customer_id, ARRAY_AGG(product_id) AS products
FROM order_items
GROUP BY customer_id;

-- Unnest array
SELECT unnest(ARRAY[1, 2, 3, 4, 5]);

-- Array contains
SELECT * FROM products WHERE tags @> ARRAY['featured'];
```

### JSON Operations
```sql
-- Query JSON/JSONB
SELECT data->>'name' AS name FROM documents;
SELECT data->'address'->>'city' AS city FROM documents;

-- Check key exists
SELECT * FROM documents WHERE data ? 'email';

-- JSONB operators
SELECT * FROM documents WHERE data @> '{"status": "active"}';

-- JSON aggregation
SELECT json_agg(name) FROM users;
SELECT json_object_agg(id, name) FROM users;
```

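Containment (`@>`) and key-existence (`?`) predicates on large `jsonb` columns usually need a GIN index to avoid sequential scans; a sketch, assuming `data` is a `jsonb` column:

```sql
-- GIN index supporting both @> and ? on the whole document
CREATE INDEX idx_documents_data ON documents USING GIN (data);

-- Smaller, faster variant when only containment (@>) is needed
CREATE INDEX idx_documents_data_path ON documents USING GIN (data jsonb_path_ops);
```
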
## Set Operations

### UNION
```sql
-- Combine results (removes duplicates)
SELECT name FROM customers
UNION
SELECT name FROM suppliers;

-- Keep duplicates
SELECT name FROM customers
UNION ALL
SELECT name FROM suppliers;
```

### INTERSECT
```sql
-- Common rows
SELECT email FROM users
INTERSECT
SELECT email FROM subscribers;
```

### EXCEPT
```sql
-- Rows in first query but not second
SELECT email FROM users
EXCEPT
SELECT email FROM unsubscribed;
```

## Best Practices

1. **Use indexes** on WHERE, JOIN, ORDER BY columns
2. **Avoid `SELECT *`** - specify needed columns
3. **Use EXISTS** instead of IN for large subqueries (see the sketch after this list)
4. **Filter early** - WHERE before JOIN when possible
5. **Use CTEs** for readability over nested subqueries
6. **Parameterize queries** - prevent SQL injection
7. **Use window functions** instead of self-joins
8. **Test with EXPLAIN** - analyze query plans

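A sketch of practice 3, rewriting an IN subquery as EXISTS so the planner can stop at the first matching order per customer:

```sql
-- Customers with at least one large order
SELECT name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.id
      AND o.total > 1000
);
```
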
BIN
.opencode/skills/databases/scripts/.coverage
Normal file
BIN
.opencode/skills/databases/scripts/.coverage
Normal file
Binary file not shown.
502
.opencode/skills/databases/scripts/db_backup.py
Executable file
502
.opencode/skills/databases/scripts/db_backup.py
Executable file
@@ -0,0 +1,502 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Database backup and restore tool for MongoDB and PostgreSQL.
|
||||
Supports compression, scheduling, and verification.
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import gzip
|
||||
import json
|
||||
import os
|
||||
import shutil
|
||||
import subprocess
|
||||
import sys
|
||||
from dataclasses import dataclass
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Dict, List, Optional
|
||||
|
||||
|
||||
@dataclass
|
||||
class BackupInfo:
|
||||
"""Backup metadata."""
|
||||
|
||||
filename: str
|
||||
database_type: str
|
||||
database_name: str
|
||||
timestamp: datetime
|
||||
size_bytes: int
|
||||
compressed: bool
|
||||
verified: bool = False
|
||||
|
||||
|
||||
class BackupManager:
|
||||
"""Manages database backups for MongoDB and PostgreSQL."""
|
||||
|
||||
def __init__(self, db_type: str, backup_dir: str = "./backups"):
|
||||
"""
|
||||
Initialize backup manager.
|
||||
|
||||
Args:
|
||||
db_type: Database type ('mongodb' or 'postgres')
|
||||
backup_dir: Directory to store backups
|
||||
"""
|
||||
self.db_type = db_type.lower()
|
||||
self.backup_dir = Path(backup_dir)
|
||||
self.backup_dir.mkdir(exist_ok=True)
|
||||
|
||||
def create_backup(
|
||||
self,
|
||||
uri: str,
|
||||
database: Optional[str] = None,
|
||||
compress: bool = True,
|
||||
verify: bool = True
|
||||
) -> Optional[BackupInfo]:
|
||||
"""
|
||||
Create database backup.
|
||||
|
||||
Args:
|
||||
uri: Database connection string
|
||||
database: Database name (optional for MongoDB)
|
||||
compress: Compress backup file
|
||||
verify: Verify backup after creation
|
||||
|
||||
Returns:
|
||||
BackupInfo if successful, None otherwise
|
||||
"""
|
||||
timestamp = datetime.now()
|
||||
date_str = timestamp.strftime("%Y%m%d_%H%M%S")
|
||||
|
||||
if self.db_type == "mongodb":
|
||||
return self._backup_mongodb(uri, database, date_str, compress, verify)
|
||||
elif self.db_type == "postgres":
|
||||
return self._backup_postgres(uri, database, date_str, compress, verify)
|
||||
else:
|
||||
print(f"Error: Unsupported database type: {self.db_type}")
|
||||
return None
|
||||
|
||||
def _backup_mongodb(
|
||||
self,
|
||||
uri: str,
|
||||
database: Optional[str],
|
||||
date_str: str,
|
||||
compress: bool,
|
||||
verify: bool
|
||||
) -> Optional[BackupInfo]:
|
||||
"""Create MongoDB backup using mongodump."""
|
||||
db_name = database or "all"
|
||||
filename = f"mongodb_{db_name}_{date_str}"
|
||||
backup_path = self.backup_dir / filename
|
||||
|
||||
try:
|
||||
cmd = ["mongodump", "--uri", uri, "--out", str(backup_path)]
|
||||
|
||||
if database:
|
||||
cmd.extend(["--db", database])
|
||||
|
||||
print(f"Creating MongoDB backup: {filename}")
|
||||
result = subprocess.run(cmd, capture_output=True, text=True)
|
||||
|
||||
if result.returncode != 0:
|
||||
print(f"Error: {result.stderr}")
|
||||
return None
|
||||
|
||||
# Compress if requested
|
||||
if compress:
|
||||
archive_path = backup_path.with_suffix(".tar.gz")
|
||||
print(f"Compressing backup...")
|
||||
shutil.make_archive(str(backup_path), "gztar", backup_path)
|
||||
shutil.rmtree(backup_path)
|
||||
backup_path = archive_path
|
||||
filename = archive_path.name
|
||||
|
||||
size_bytes = self._get_size(backup_path)
|
||||
|
||||
backup_info = BackupInfo(
|
||||
filename=filename,
|
||||
database_type="mongodb",
|
||||
database_name=db_name,
|
||||
timestamp=datetime.now(),
|
||||
size_bytes=size_bytes,
|
||||
compressed=compress
|
||||
)
|
||||
|
||||
if verify:
|
||||
backup_info.verified = self._verify_backup(backup_info)
|
||||
|
||||
self._save_metadata(backup_info)
|
||||
print(f"✓ Backup created: {filename} ({self._format_size(size_bytes)})")
|
||||
|
||||
return backup_info
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error creating MongoDB backup: {e}")
|
||||
return None
|
||||
|
||||
def _backup_postgres(
|
||||
self,
|
||||
uri: str,
|
||||
database: str,
|
||||
date_str: str,
|
||||
compress: bool,
|
||||
verify: bool
|
||||
) -> Optional[BackupInfo]:
|
||||
"""Create PostgreSQL backup using pg_dump."""
|
||||
if not database:
|
||||
print("Error: Database name required for PostgreSQL backup")
|
||||
return None
|
||||
|
||||
ext = ".sql.gz" if compress else ".sql"
|
||||
filename = f"postgres_{database}_{date_str}{ext}"
|
||||
backup_path = self.backup_dir / filename
|
||||
|
||||
try:
|
||||
cmd = ["pg_dump", uri]
|
||||
|
||||
if compress:
|
||||
# Use pg_dump with gzip
|
||||
with open(backup_path, "wb") as f:
|
||||
dump_proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
|
||||
gzip_proc = subprocess.Popen(
|
||||
["gzip"],
|
||||
stdin=dump_proc.stdout,
|
||||
stdout=f
|
||||
)
|
||||
dump_proc.stdout.close()
|
||||
gzip_proc.communicate()
|
||||
|
||||
if dump_proc.returncode != 0:
|
||||
print("Error: pg_dump failed")
|
||||
return None
|
||||
else:
|
||||
with open(backup_path, "w") as f:
|
||||
result = subprocess.run(cmd, stdout=f, stderr=subprocess.PIPE, text=True)
|
||||
|
||||
if result.returncode != 0:
|
||||
print(f"Error: {result.stderr}")
|
||||
return None
|
||||
|
||||
size_bytes = backup_path.stat().st_size
|
||||
|
||||
backup_info = BackupInfo(
|
||||
filename=filename,
|
||||
database_type="postgres",
|
||||
database_name=database,
|
||||
timestamp=datetime.now(),
|
||||
size_bytes=size_bytes,
|
||||
compressed=compress
|
||||
)
|
||||
|
||||
if verify:
|
||||
backup_info.verified = self._verify_backup(backup_info)
|
||||
|
||||
self._save_metadata(backup_info)
|
||||
print(f"✓ Backup created: {filename} ({self._format_size(size_bytes)})")
|
||||
|
||||
return backup_info
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error creating PostgreSQL backup: {e}")
|
||||
return None
|
||||
|
||||
def restore_backup(self, filename: str, uri: str, dry_run: bool = False) -> bool:
|
||||
"""
|
||||
Restore database from backup.
|
||||
|
||||
Args:
|
||||
filename: Backup filename
|
||||
uri: Database connection string
|
||||
dry_run: If True, only show what would be done
|
||||
|
||||
Returns:
|
||||
True if successful, False otherwise
|
||||
"""
|
||||
backup_path = self.backup_dir / filename
|
||||
|
||||
if not backup_path.exists():
|
||||
print(f"Error: Backup not found: {filename}")
|
||||
return False
|
||||
|
||||
# Load metadata
|
||||
metadata_path = backup_path.with_suffix(".json")
|
||||
if metadata_path.exists():
|
||||
with open(metadata_path) as f:
|
||||
metadata = json.load(f)
|
||||
print(f"Restoring backup from {metadata['timestamp']}")
|
||||
print(f"Database: {metadata['database_name']}")
|
||||
|
||||
if dry_run:
|
||||
print(f"Would restore from: {backup_path}")
|
||||
return True
|
||||
|
||||
print(f"Restoring backup: {filename}")
|
||||
|
||||
try:
|
||||
if self.db_type == "mongodb":
|
||||
return self._restore_mongodb(backup_path, uri)
|
||||
elif self.db_type == "postgres":
|
||||
return self._restore_postgres(backup_path, uri)
|
||||
else:
|
||||
print(f"Error: Unsupported database type: {self.db_type}")
|
||||
return False
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error restoring backup: {e}")
|
||||
return False
|
||||
|
||||
def _restore_mongodb(self, backup_path: Path, uri: str) -> bool:
|
||||
"""Restore MongoDB backup using mongorestore."""
|
||||
try:
|
||||
# Extract if compressed
|
||||
restore_path = backup_path
|
||||
if backup_path.suffix == ".gz":
|
||||
print("Extracting backup...")
|
||||
extract_path = backup_path.with_suffix("")
|
||||
shutil.unpack_archive(backup_path, extract_path)
|
||||
restore_path = extract_path
|
||||
|
||||
cmd = ["mongorestore", "--uri", uri, str(restore_path)]
|
||||
|
||||
result = subprocess.run(cmd, capture_output=True, text=True)
|
||||
|
||||
# Cleanup extracted files
|
||||
if restore_path != backup_path and restore_path.is_dir():
|
||||
shutil.rmtree(restore_path)
|
||||
|
||||
if result.returncode != 0:
|
||||
print(f"Error: {result.stderr}")
|
||||
return False
|
||||
|
||||
print("✓ Restore completed")
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error restoring MongoDB: {e}")
|
||||
return False
|
||||
|
||||
def _restore_postgres(self, backup_path: Path, uri: str) -> bool:
|
||||
"""Restore PostgreSQL backup using psql."""
|
||||
try:
|
||||
if backup_path.suffix == ".gz":
|
||||
# Decompress and restore
|
||||
with gzip.open(backup_path, "rb") as f:
|
||||
cmd = ["psql", uri]
|
||||
result = subprocess.run(
|
||||
cmd,
|
||||
stdin=f,
|
||||
capture_output=True,
|
||||
text=False
|
||||
)
|
||||
else:
|
||||
with open(backup_path) as f:
|
||||
cmd = ["psql", uri]
|
||||
result = subprocess.run(
|
||||
cmd,
|
||||
stdin=f,
|
||||
capture_output=True,
|
||||
text=True
|
||||
)
|
||||
|
||||
if result.returncode != 0:
|
||||
print(f"Error: {result.stderr}")
|
||||
return False
|
||||
|
||||
print("✓ Restore completed")
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error restoring PostgreSQL: {e}")
|
||||
return False
|
||||
|
||||
def list_backups(self) -> List[BackupInfo]:
|
||||
"""
|
||||
List all backups.
|
||||
|
||||
Returns:
|
||||
List of BackupInfo objects
|
||||
"""
|
||||
backups = []
|
||||
|
||||
for metadata_file in sorted(self.backup_dir.glob("*.json")):
|
||||
try:
|
||||
with open(metadata_file) as f:
|
||||
data = json.load(f)
|
||||
|
||||
backup_info = BackupInfo(
|
||||
filename=data["filename"],
|
||||
database_type=data["database_type"],
|
||||
database_name=data["database_name"],
|
||||
timestamp=datetime.fromisoformat(data["timestamp"]),
|
||||
size_bytes=data["size_bytes"],
|
||||
compressed=data["compressed"],
|
||||
verified=data.get("verified", False)
|
||||
)
|
||||
backups.append(backup_info)
|
||||
except Exception as e:
|
||||
print(f"Error reading metadata {metadata_file}: {e}")
|
||||
|
||||
return backups
|
||||
|
||||
def cleanup_old_backups(self, retention_days: int, dry_run: bool = False) -> int:
|
||||
"""
|
||||
Remove backups older than retention period.
|
||||
|
||||
Args:
|
||||
retention_days: Number of days to retain backups
|
||||
dry_run: If True, only show what would be deleted
|
||||
|
||||
Returns:
|
||||
Number of backups removed
|
||||
"""
|
||||
cutoff = datetime.now().timestamp() - (retention_days * 24 * 3600)
|
||||
removed = 0
|
||||
|
||||
for backup_file in self.backup_dir.glob("*"):
|
||||
if backup_file.suffix == ".json":
|
||||
continue
|
||||
|
||||
if backup_file.stat().st_mtime < cutoff:
|
||||
if dry_run:
|
||||
print(f"Would remove: {backup_file.name}")
|
||||
else:
|
||||
print(f"Removing: {backup_file.name}")
|
||||
backup_file.unlink()
|
||||
# Remove metadata
|
||||
metadata_file = backup_file.with_suffix(".json")
|
||||
if metadata_file.exists():
|
||||
metadata_file.unlink()
|
||||
removed += 1
|
||||
|
||||
return removed
|
||||
|
||||
def _verify_backup(self, backup_info: BackupInfo) -> bool:
|
||||
"""
|
||||
Verify backup integrity.
|
||||
|
||||
Args:
|
||||
backup_info: Backup information
|
||||
|
||||
Returns:
|
||||
True if backup is valid, False otherwise
|
||||
"""
|
||||
backup_path = self.backup_dir / backup_info.filename
|
||||
|
||||
if not backup_path.exists():
|
||||
return False
|
||||
|
||||
# Basic verification: file exists and has size > 0
|
||||
if backup_path.stat().st_size == 0:
|
||||
return False
|
||||
|
||||
# Could add more verification here (checksums, test restore, etc.)
|
||||
return True
|
||||
|
||||
def _get_size(self, path: Path) -> int:
|
||||
"""Get total size of file or directory."""
|
||||
if path.is_file():
|
||||
return path.stat().st_size
|
||||
elif path.is_dir():
|
||||
total = 0
|
||||
for item in path.rglob("*"):
|
||||
if item.is_file():
|
||||
total += item.stat().st_size
|
||||
return total
|
||||
return 0
|
||||
|
||||
def _format_size(self, size_bytes: int) -> str:
|
||||
"""Format size in human-readable format."""
|
||||
for unit in ["B", "KB", "MB", "GB", "TB"]:
|
||||
if size_bytes < 1024:
|
||||
return f"{size_bytes:.2f} {unit}"
|
||||
size_bytes /= 1024
|
||||
return f"{size_bytes:.2f} PB"
|
||||
|
||||
def _save_metadata(self, backup_info: BackupInfo):
|
||||
"""Save backup metadata to JSON file."""
|
||||
metadata_path = self.backup_dir / f"{backup_info.filename}.json"
|
||||
|
||||
metadata = {
|
||||
"filename": backup_info.filename,
|
||||
"database_type": backup_info.database_type,
|
||||
"database_name": backup_info.database_name,
|
||||
"timestamp": backup_info.timestamp.isoformat(),
|
||||
"size_bytes": backup_info.size_bytes,
|
||||
"compressed": backup_info.compressed,
|
||||
"verified": backup_info.verified
|
||||
}
|
||||
|
||||
with open(metadata_path, "w") as f:
|
||||
json.dump(metadata, f, indent=2)
|
||||
|
||||
|
||||
def main():
|
||||
"""Main entry point."""
|
||||
parser = argparse.ArgumentParser(description="Database backup tool")
|
||||
parser.add_argument("--db", required=True, choices=["mongodb", "postgres"],
|
||||
help="Database type")
|
||||
parser.add_argument("--backup-dir", default="./backups",
|
||||
help="Backup directory")
|
||||
|
||||
subparsers = parser.add_subparsers(dest="command", required=True)
|
||||
|
||||
# Backup command
|
||||
backup_parser = subparsers.add_parser("backup", help="Create backup")
|
||||
backup_parser.add_argument("--uri", required=True, help="Database connection string")
|
||||
backup_parser.add_argument("--database", help="Database name")
|
||||
backup_parser.add_argument("--no-compress", action="store_true",
|
||||
help="Disable compression")
|
||||
backup_parser.add_argument("--no-verify", action="store_true",
|
||||
help="Skip verification")
|
||||
|
||||
# Restore command
|
||||
restore_parser = subparsers.add_parser("restore", help="Restore backup")
|
||||
restore_parser.add_argument("filename", help="Backup filename")
|
||||
restore_parser.add_argument("--uri", required=True, help="Database connection string")
|
||||
restore_parser.add_argument("--dry-run", action="store_true",
|
||||
help="Show what would be done")
|
||||
|
||||
# List command
|
||||
subparsers.add_parser("list", help="List backups")
|
||||
|
||||
# Cleanup command
|
||||
cleanup_parser = subparsers.add_parser("cleanup", help="Remove old backups")
|
||||
cleanup_parser.add_argument("--retention-days", type=int, default=7,
|
||||
help="Days to retain backups (default: 7)")
|
||||
cleanup_parser.add_argument("--dry-run", action="store_true",
|
||||
help="Show what would be removed")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
manager = BackupManager(args.db, args.backup_dir)
|
||||
|
||||
if args.command == "backup":
|
||||
backup_info = manager.create_backup(
|
||||
args.uri,
|
||||
args.database,
|
||||
compress=not args.no_compress,
|
||||
verify=not args.no_verify
|
||||
)
|
||||
sys.exit(0 if backup_info else 1)
|
||||
|
||||
elif args.command == "restore":
|
||||
success = manager.restore_backup(args.filename, args.uri, args.dry_run)
|
||||
sys.exit(0 if success else 1)
|
||||
|
||||
elif args.command == "list":
|
||||
backups = manager.list_backups()
|
||||
print(f"Total backups: {len(backups)}\n")
|
||||
for backup in backups:
|
||||
verified_str = "✓" if backup.verified else "?"
|
||||
print(f"[{verified_str}] {backup.filename}")
|
||||
print(f" Database: {backup.database_name}")
|
||||
print(f" Created: {backup.timestamp}")
|
||||
print(f" Size: {manager._format_size(backup.size_bytes)}")
|
||||
print()
|
||||
|
||||
elif args.command == "cleanup":
|
||||
removed = manager.cleanup_old_backups(args.retention_days, args.dry_run)
|
||||
print(f"Removed {removed} backup(s)")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
426
.opencode/skills/databases/scripts/db_migrate.py
Executable file
426
.opencode/skills/databases/scripts/db_migrate.py
Executable file
@@ -0,0 +1,426 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Database migration tool for MongoDB and PostgreSQL.
|
||||
Generates and applies schema migrations with rollback support.
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
from dataclasses import dataclass
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
# Windows UTF-8 compatibility (works for both local and global installs)
|
||||
CLAUDE_ROOT = Path(__file__).parent.parent.parent.parent
|
||||
sys.path.insert(0, str(CLAUDE_ROOT / 'scripts'))
|
||||
try:
|
||||
from win_compat import ensure_utf8_stdout
|
||||
ensure_utf8_stdout()
|
||||
except ImportError:
|
||||
if sys.platform == 'win32':
|
||||
import io
|
||||
if hasattr(sys.stdout, 'buffer'):
|
||||
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
|
||||
|
||||
try:
|
||||
from pymongo import MongoClient
|
||||
MONGO_AVAILABLE = True
|
||||
except ImportError:
|
||||
MONGO_AVAILABLE = False
|
||||
|
||||
try:
|
||||
import psycopg2
|
||||
from psycopg2 import sql
|
||||
POSTGRES_AVAILABLE = True
|
||||
except ImportError:
|
||||
POSTGRES_AVAILABLE = False
|
||||
|
||||
|
||||
@dataclass
|
||||
class Migration:
|
||||
"""Represents a database migration."""
|
||||
|
||||
id: str
|
||||
name: str
|
||||
timestamp: datetime
|
||||
database_type: str
|
||||
up_sql: Optional[str] = None
|
||||
down_sql: Optional[str] = None
|
||||
mongodb_operations: Optional[List[Dict[str, Any]]] = None
|
||||
applied: bool = False
|
||||
|
||||
|
||||
class MigrationManager:
|
||||
"""Manages database migrations for MongoDB and PostgreSQL."""
|
||||
|
||||
def __init__(self, db_type: str, connection_string: str, migrations_dir: str = "./migrations"):
|
||||
"""
|
||||
Initialize migration manager.
|
||||
|
||||
Args:
|
||||
db_type: Database type ('mongodb' or 'postgres')
|
||||
connection_string: Database connection string
|
||||
migrations_dir: Directory to store migration files
|
||||
"""
|
||||
self.db_type = db_type.lower()
|
||||
self.connection_string = connection_string
|
||||
self.migrations_dir = Path(migrations_dir)
|
||||
self.migrations_dir.mkdir(exist_ok=True)
|
||||
|
||||
self.client = None
|
||||
self.db = None
|
||||
self.conn = None
|
||||
|
||||
def connect(self) -> bool:
|
||||
"""
|
||||
Connect to database.
|
||||
|
||||
Returns:
|
||||
True if connection successful, False otherwise
|
||||
"""
|
||||
try:
|
||||
if self.db_type == "mongodb":
|
||||
if not MONGO_AVAILABLE:
|
||||
print("Error: pymongo not installed")
|
||||
return False
|
||||
self.client = MongoClient(self.connection_string)
|
||||
self.db = self.client.get_default_database()
|
||||
# Test connection
|
||||
self.client.server_info()
|
||||
return True
|
||||
|
||||
elif self.db_type == "postgres":
|
||||
if not POSTGRES_AVAILABLE:
|
||||
print("Error: psycopg2 not installed")
|
||||
return False
|
||||
self.conn = psycopg2.connect(self.connection_string)
|
||||
return True
|
||||
|
||||
else:
|
||||
print(f"Error: Unsupported database type: {self.db_type}")
|
||||
return False
|
||||
|
||||
except Exception as e:
|
||||
print(f"Connection error: {e}")
|
||||
return False
|
||||
|
||||
def disconnect(self):
|
||||
"""Disconnect from database."""
|
||||
try:
|
||||
if self.client:
|
||||
self.client.close()
|
||||
if self.conn:
|
||||
self.conn.close()
|
||||
except Exception as e:
|
||||
print(f"Disconnect error: {e}")
|
||||
|
||||
def _ensure_migrations_table(self):
|
||||
"""Create migrations tracking table/collection if not exists."""
|
||||
if self.db_type == "mongodb":
|
||||
# MongoDB creates collection automatically
|
||||
pass
|
||||
elif self.db_type == "postgres":
|
||||
with self.conn.cursor() as cur:
|
||||
cur.execute("""
|
||||
CREATE TABLE IF NOT EXISTS migrations (
|
||||
id VARCHAR(255) PRIMARY KEY,
|
||||
name VARCHAR(255) NOT NULL,
|
||||
applied_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
|
||||
)
|
||||
""")
|
||||
self.conn.commit()
|
||||
|
||||
def generate_migration(self, name: str, dry_run: bool = False) -> Optional[Migration]:
|
||||
"""
|
||||
Generate new migration file.
|
||||
|
||||
Args:
|
||||
name: Migration name
|
||||
dry_run: If True, only show what would be generated
|
||||
|
||||
Returns:
|
||||
Migration object if successful, None otherwise
|
||||
"""
|
||||
timestamp = datetime.now()
|
||||
migration_id = timestamp.strftime("%Y%m%d%H%M%S")
|
||||
filename = f"{migration_id}_{name}.json"
|
||||
filepath = self.migrations_dir / filename
|
||||
|
||||
migration = Migration(
|
||||
id=migration_id,
|
||||
name=name,
|
||||
timestamp=timestamp,
|
||||
database_type=self.db_type
|
||||
)
|
||||
|
||||
if self.db_type == "mongodb":
|
||||
migration.mongodb_operations = [
|
||||
{
|
||||
"operation": "createIndex",
|
||||
"collection": "example_collection",
|
||||
"index": {"field": 1},
|
||||
"options": {}
|
||||
}
|
||||
]
|
||||
elif self.db_type == "postgres":
|
||||
migration.up_sql = "-- Add your SQL here\n"
|
||||
migration.down_sql = "-- Add rollback SQL here\n"
|
||||
|
||||
migration_data = {
|
||||
"id": migration.id,
|
||||
"name": migration.name,
|
||||
"timestamp": migration.timestamp.isoformat(),
|
||||
"database_type": migration.database_type,
|
||||
"up_sql": migration.up_sql,
|
||||
"down_sql": migration.down_sql,
|
||||
"mongodb_operations": migration.mongodb_operations
|
||||
}
|
||||
|
||||
if dry_run:
|
||||
print(f"Would create: {filepath}")
|
||||
print(json.dumps(migration_data, indent=2))
|
||||
return migration
|
||||
|
||||
try:
|
||||
with open(filepath, "w") as f:
|
||||
json.dump(migration_data, f, indent=2)
|
||||
print(f"Created migration: {filepath}")
|
||||
return migration
|
||||
except Exception as e:
|
||||
print(f"Error creating migration: {e}")
|
||||
return None
|
||||
|
||||
def get_pending_migrations(self) -> List[Migration]:
|
||||
"""
|
||||
Get list of pending migrations.
|
||||
|
||||
Returns:
|
||||
List of pending Migration objects
|
||||
"""
|
||||
# Get applied migrations
|
||||
applied_ids = set()
|
||||
|
||||
try:
|
||||
if self.db_type == "mongodb":
|
||||
applied_ids = {
|
||||
doc["id"] for doc in self.db.migrations.find({}, {"id": 1})
|
||||
}
|
||||
elif self.db_type == "postgres":
|
||||
with self.conn.cursor() as cur:
|
||||
cur.execute("SELECT id FROM migrations")
|
||||
applied_ids = {row[0] for row in cur.fetchall()}
|
||||
except Exception as e:
|
||||
print(f"Error reading applied migrations: {e}")
|
||||
|
||||
# Get all migration files
|
||||
pending = []
|
||||
for filepath in sorted(self.migrations_dir.glob("*.json")):
|
||||
try:
|
||||
with open(filepath) as f:
|
||||
data = json.load(f)
|
||||
|
||||
if data["id"] not in applied_ids:
|
||||
migration = Migration(
|
||||
id=data["id"],
|
||||
name=data["name"],
|
||||
timestamp=datetime.fromisoformat(data["timestamp"]),
|
||||
database_type=data["database_type"],
|
||||
up_sql=data.get("up_sql"),
|
||||
down_sql=data.get("down_sql"),
|
||||
mongodb_operations=data.get("mongodb_operations")
|
||||
)
|
||||
pending.append(migration)
|
||||
except Exception as e:
|
||||
print(f"Error reading {filepath}: {e}")
|
||||
|
||||
return pending
|
||||
|
||||
def apply_migration(self, migration: Migration, dry_run: bool = False) -> bool:
|
||||
"""
|
||||
Apply migration.
|
||||
|
||||
Args:
|
||||
migration: Migration to apply
|
||||
dry_run: If True, only show what would be executed
|
||||
|
||||
Returns:
|
||||
True if successful, False otherwise
|
||||
"""
|
||||
print(f"Applying migration: {migration.id} - {migration.name}")
|
||||
|
||||
if dry_run:
|
||||
if self.db_type == "mongodb":
|
||||
print("MongoDB operations:")
|
||||
print(json.dumps(migration.mongodb_operations, indent=2))
|
||||
elif self.db_type == "postgres":
|
||||
print("SQL to execute:")
|
||||
print(migration.up_sql)
|
||||
return True
|
||||
|
||||
try:
|
||||
if self.db_type == "mongodb":
|
||||
for op in migration.mongodb_operations or []:
|
||||
if op["operation"] == "createIndex":
|
||||
self.db[op["collection"]].create_index(
|
||||
list(op["index"].items()),
|
||||
**op.get("options", {})
|
||||
)
|
||||
|
||||
# Record migration
|
||||
self.db.migrations.insert_one({
|
||||
"id": migration.id,
|
||||
"name": migration.name,
|
||||
"applied_at": datetime.now()
|
||||
})
|
||||
|
||||
elif self.db_type == "postgres":
|
||||
with self.conn.cursor() as cur:
|
||||
cur.execute(migration.up_sql)
|
||||
|
||||
# Record migration
|
||||
cur.execute(
|
||||
"INSERT INTO migrations (id, name) VALUES (%s, %s)",
|
||||
(migration.id, migration.name)
|
||||
)
|
||||
self.conn.commit()
|
||||
|
||||
print(f"✓ Applied: {migration.id}")
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
print(f"✗ Error applying migration: {e}")
|
||||
if self.conn:
|
||||
self.conn.rollback()
|
||||
return False
|
||||
|
||||
def rollback_migration(self, migration_id: str, dry_run: bool = False) -> bool:
|
||||
"""
|
||||
Rollback migration.
|
||||
|
||||
Args:
|
||||
migration_id: Migration ID to rollback
|
||||
dry_run: If True, only show what would be executed
|
||||
|
||||
Returns:
|
||||
True if successful, False otherwise
|
||||
"""
|
||||
# Find migration file
|
||||
migration_file = None
|
||||
for filepath in self.migrations_dir.glob(f"{migration_id}_*.json"):
|
||||
migration_file = filepath
|
||||
break
|
||||
|
||||
if not migration_file:
|
||||
print(f"Migration not found: {migration_id}")
|
||||
return False
|
||||
|
||||
try:
|
||||
with open(migration_file) as f:
|
||||
data = json.load(f)
|
||||
|
||||
print(f"Rolling back: {migration_id} - {data['name']}")
|
||||
|
||||
if dry_run:
|
||||
if self.db_type == "postgres":
|
||||
print("SQL to execute:")
|
||||
print(data.get("down_sql", "-- No rollback defined"))
|
||||
return True
|
||||
|
||||
if self.db_type == "postgres" and data.get("down_sql"):
|
||||
with self.conn.cursor() as cur:
|
||||
cur.execute(data["down_sql"])
|
||||
cur.execute("DELETE FROM migrations WHERE id = %s", (migration_id,))
|
||||
self.conn.commit()
|
||||
elif self.db_type == "mongodb":
|
||||
self.db.migrations.delete_one({"id": migration_id})
|
||||
|
||||
print(f"✓ Rolled back: {migration_id}")
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
print(f"✗ Error rolling back: {e}")
|
||||
if self.conn:
|
||||
self.conn.rollback()
|
||||
return False
|
||||
|
||||
|
||||
def main():
|
||||
"""Main entry point."""
|
||||
parser = argparse.ArgumentParser(description="Database migration tool")
|
||||
parser.add_argument("--db", required=True, choices=["mongodb", "postgres"],
|
||||
help="Database type")
|
||||
parser.add_argument("--uri", help="Database connection string")
|
||||
parser.add_argument("--migrations-dir", default="./migrations",
|
||||
help="Migrations directory")
|
||||
|
||||
subparsers = parser.add_subparsers(dest="command", required=True)
|
||||
|
||||
# Generate command
|
||||
gen_parser = subparsers.add_parser("generate", help="Generate new migration")
|
||||
gen_parser.add_argument("name", help="Migration name")
|
||||
gen_parser.add_argument("--dry-run", action="store_true",
|
||||
help="Show what would be generated")
|
||||
|
||||
# Apply command
|
||||
apply_parser = subparsers.add_parser("apply", help="Apply pending migrations")
|
||||
apply_parser.add_argument("--dry-run", action="store_true",
|
||||
help="Show what would be executed")
|
||||
|
||||
# Rollback command
|
||||
rollback_parser = subparsers.add_parser("rollback", help="Rollback migration")
|
||||
rollback_parser.add_argument("id", help="Migration ID to rollback")
|
||||
rollback_parser.add_argument("--dry-run", action="store_true",
|
||||
help="Show what would be executed")
|
||||
|
||||
# Status command
|
||||
subparsers.add_parser("status", help="Show migration status")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# For generate, we don't need connection
|
||||
if args.command == "generate":
|
||||
manager = MigrationManager(args.db, "", args.migrations_dir)
|
||||
migration = manager.generate_migration(args.name, args.dry_run)
|
||||
sys.exit(0 if migration else 1)
|
||||
|
||||
# Other commands need connection
|
||||
if not args.uri:
|
||||
print("Error: --uri required for this command")
|
||||
sys.exit(1)
|
||||
|
||||
manager = MigrationManager(args.db, args.uri, args.migrations_dir)
|
||||
|
||||
if not manager.connect():
|
||||
sys.exit(1)
|
||||
|
||||
try:
|
||||
manager._ensure_migrations_table()
|
||||
|
||||
if args.command == "status":
|
||||
pending = manager.get_pending_migrations()
|
||||
print(f"Pending migrations: {len(pending)}")
|
||||
for migration in pending:
|
||||
print(f" {migration.id} - {migration.name}")
|
||||
|
||||
elif args.command == "apply":
|
||||
pending = manager.get_pending_migrations()
|
||||
if not pending:
|
||||
print("No pending migrations")
|
||||
else:
|
||||
for migration in pending:
|
||||
if not manager.apply_migration(migration, args.dry_run):
|
||||
sys.exit(1)
|
||||
|
||||
elif args.command == "rollback":
|
||||
if not manager.rollback_migration(args.id, args.dry_run):
|
||||
sys.exit(1)
|
||||
|
||||
finally:
|
||||
manager.disconnect()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
457
.opencode/skills/databases/scripts/db_performance_check.py
Executable file
457
.opencode/skills/databases/scripts/db_performance_check.py
Executable file
@@ -0,0 +1,457 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Database performance analysis tool for MongoDB and PostgreSQL.
|
||||
Analyzes slow queries, recommends indexes, and generates reports.
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import sys
|
||||
from dataclasses import dataclass, asdict
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Dict, List, Optional
|
||||
|
||||
# Windows UTF-8 compatibility (works for both local and global installs)
|
||||
CLAUDE_ROOT = Path(__file__).parent.parent.parent.parent
|
||||
sys.path.insert(0, str(CLAUDE_ROOT / 'scripts'))
|
||||
try:
|
||||
from win_compat import ensure_utf8_stdout
|
||||
ensure_utf8_stdout()
|
||||
except ImportError:
|
||||
if sys.platform == 'win32':
|
||||
import io
|
||||
if hasattr(sys.stdout, 'buffer'):
|
||||
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
|
||||
|
||||
try:
|
||||
from pymongo import MongoClient
|
||||
MONGO_AVAILABLE = True
|
||||
except ImportError:
|
||||
MONGO_AVAILABLE = False
|
||||
|
||||
try:
|
||||
import psycopg2
|
||||
from psycopg2.extras import RealDictCursor
|
||||
POSTGRES_AVAILABLE = True
|
||||
except ImportError:
|
||||
POSTGRES_AVAILABLE = False
|
||||
|
||||
|
||||
@dataclass
|
||||
class SlowQuery:
|
||||
"""Represents a slow query."""
|
||||
|
||||
query: str
|
||||
execution_time_ms: float
|
||||
count: int
|
||||
collection_or_table: Optional[str] = None
|
||||
index_used: Optional[str] = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class IndexRecommendation:
|
||||
"""Index recommendation."""
|
||||
|
||||
collection_or_table: str
|
||||
fields: List[str]
|
||||
reason: str
|
||||
estimated_benefit: str
|
||||
|
||||
|
||||
@dataclass
|
||||
class PerformanceReport:
|
||||
"""Performance analysis report."""
|
||||
|
||||
database_type: str
|
||||
database_name: str
|
||||
timestamp: datetime
|
||||
slow_queries: List[SlowQuery]
|
||||
index_recommendations: List[IndexRecommendation]
|
||||
database_metrics: Dict[str, any]
|
||||
|
||||
|
||||
class PerformanceAnalyzer:
|
||||
"""Analyzes database performance."""
|
||||
|
||||
def __init__(self, db_type: str, connection_string: str, threshold_ms: int = 100):
|
||||
"""
|
||||
Initialize performance analyzer.
|
||||
|
||||
Args:
|
||||
db_type: Database type ('mongodb' or 'postgres')
|
||||
connection_string: Database connection string
|
||||
threshold_ms: Slow query threshold in milliseconds
|
||||
"""
|
||||
self.db_type = db_type.lower()
|
||||
self.connection_string = connection_string
|
||||
self.threshold_ms = threshold_ms
|
||||
|
||||
self.client = None
|
||||
self.db = None
|
||||
self.conn = None
|
||||
|
||||
def connect(self) -> bool:
|
||||
"""Connect to database."""
|
||||
try:
|
||||
if self.db_type == "mongodb":
|
||||
if not MONGO_AVAILABLE:
|
||||
print("Error: pymongo not installed")
|
||||
return False
|
||||
self.client = MongoClient(self.connection_string)
|
||||
self.db = self.client.get_default_database()
|
||||
self.client.server_info()
|
||||
return True
|
||||
|
||||
elif self.db_type == "postgres":
|
||||
if not POSTGRES_AVAILABLE:
|
||||
print("Error: psycopg2 not installed")
|
||||
return False
|
||||
self.conn = psycopg2.connect(self.connection_string)
|
||||
return True
|
||||
|
||||
else:
|
||||
print(f"Error: Unsupported database type: {self.db_type}")
|
||||
return False
|
||||
|
||||
except Exception as e:
|
||||
print(f"Connection error: {e}")
|
||||
return False
|
||||
|
||||
def disconnect(self):
|
||||
"""Disconnect from database."""
|
||||
try:
|
||||
if self.client:
|
||||
self.client.close()
|
||||
if self.conn:
|
||||
self.conn.close()
|
||||
except Exception as e:
|
||||
print(f"Disconnect error: {e}")
|
||||
|
||||
def analyze(self) -> Optional[PerformanceReport]:
|
||||
"""
|
||||
Analyze database performance.
|
||||
|
||||
Returns:
|
||||
PerformanceReport if successful, None otherwise
|
||||
"""
|
||||
try:
|
||||
if self.db_type == "mongodb":
|
||||
return self._analyze_mongodb()
|
||||
elif self.db_type == "postgres":
|
||||
return self._analyze_postgres()
|
||||
else:
|
||||
return None
|
||||
|
||||
except Exception as e:
|
||||
print(f"Analysis error: {e}")
|
||||
return None
|
||||
|
||||
def _analyze_mongodb(self) -> PerformanceReport:
|
||||
"""Analyze MongoDB performance."""
|
||||
slow_queries = []
|
||||
index_recommendations = []
|
||||
|
||||
# Enable profiling if not enabled
|
||||
profiling_level = self.db.command("profile", -1)
|
||||
if profiling_level.get("was", 0) == 0:
|
||||
self.db.command("profile", 1, slowms=self.threshold_ms)
|
||||
|
||||
# Get slow queries from system.profile
|
||||
for doc in self.db.system.profile.find(
|
||||
{"millis": {"$gte": self.threshold_ms}},
|
||||
limit=50
|
||||
).sort("millis", -1):
|
||||
|
||||
query_str = json.dumps(doc.get("command", {}), default=str)
|
||||
|
||||
slow_queries.append(SlowQuery(
|
||||
query=query_str,
|
||||
execution_time_ms=doc.get("millis", 0),
|
||||
count=1,
|
||||
collection_or_table=doc.get("ns", "").split(".")[-1] if "ns" in doc else None,
|
||||
index_used=doc.get("planSummary")
|
||||
))
|
||||
|
||||
# Analyze collections for index recommendations
|
||||
for coll_name in self.db.list_collection_names():
|
||||
if coll_name.startswith("system."):
|
||||
continue
|
||||
|
||||
coll = self.db[coll_name]
|
||||
|
||||
# Check for collection scans / collections missing secondary indexes
|
||||
stats = coll.aggregate([
|
||||
{"$collStats": {"storageStats": {}}}
|
||||
]).next()
|
||||
|
||||
# Check if collection has indexes
|
||||
indexes = list(coll.list_indexes())
|
||||
|
||||
if len(indexes) <= 1: # Only _id index
|
||||
# Recommend indexes based on common patterns
|
||||
# Sample documents to find frequently queried fields
|
||||
sample = list(coll.find().limit(100))
|
||||
|
||||
if sample:
|
||||
# Find fields that appear in most documents
|
||||
field_freq = {}
|
||||
for doc in sample:
|
||||
for field in doc.keys():
|
||||
if field != "_id":
|
||||
field_freq[field] = field_freq.get(field, 0) + 1
|
||||
|
||||
# Recommend index on most common field
|
||||
if field_freq:
|
||||
top_field = max(field_freq.items(), key=lambda x: x[1])[0]
|
||||
index_recommendations.append(IndexRecommendation(
|
||||
collection_or_table=coll_name,
|
||||
fields=[top_field],
|
||||
reason="Frequently queried field without index",
|
||||
estimated_benefit="High"
|
||||
))
|
||||
|
||||
# Get database metrics
|
||||
server_status = self.client.admin.command("serverStatus")
|
||||
db_stats = self.db.command("dbStats")
|
||||
|
||||
metrics = {
|
||||
"connections": server_status.get("connections", {}).get("current", 0),
|
||||
"operations_per_sec": server_status.get("opcounters", {}).get("query", 0),
|
||||
"database_size_mb": db_stats.get("dataSize", 0) / (1024 * 1024),
|
||||
"index_size_mb": db_stats.get("indexSize", 0) / (1024 * 1024),
|
||||
"collections": db_stats.get("collections", 0)
|
||||
}
|
||||
|
||||
return PerformanceReport(
|
||||
database_type="mongodb",
|
||||
database_name=self.db.name,
|
||||
timestamp=datetime.now(),
|
||||
slow_queries=slow_queries[:10], # Top 10
|
||||
index_recommendations=index_recommendations,
|
||||
database_metrics=metrics
|
||||
)
|
||||
|
||||
def _analyze_postgres(self) -> PerformanceReport:
|
||||
"""Analyze PostgreSQL performance."""
|
||||
slow_queries = []
|
||||
index_recommendations = []
|
||||
|
||||
with self.conn.cursor(cursor_factory=RealDictCursor) as cur:
|
||||
# Check if pg_stat_statements extension is available
|
||||
cur.execute("""
|
||||
SELECT EXISTS (
|
||||
SELECT 1 FROM pg_extension WHERE extname = 'pg_stat_statements'
|
||||
) AS has_extension
|
||||
""")
|
||||
has_pg_stat_statements = cur.fetchone()["has_extension"]
|
||||
|
||||
if has_pg_stat_statements:
|
||||
# Get slow queries from pg_stat_statements
|
||||
cur.execute("""
|
||||
SELECT
|
||||
query,
|
||||
mean_exec_time,
|
||||
calls,
|
||||
total_exec_time
|
||||
FROM pg_stat_statements
|
||||
WHERE mean_exec_time >= %s
|
||||
ORDER BY mean_exec_time DESC
|
||||
LIMIT 10
|
||||
""", (self.threshold_ms,))
|
||||
|
||||
for row in cur.fetchall():
|
||||
slow_queries.append(SlowQuery(
|
||||
query=row["query"],
|
||||
execution_time_ms=row["mean_exec_time"],
|
||||
count=row["calls"]
|
||||
))
|
||||
|
||||
# Find tables with sequential scans (potential index candidates)
|
||||
cur.execute("""
|
||||
SELECT
|
||||
schemaname,
|
||||
relname,
|
||||
seq_scan,
|
||||
seq_tup_read,
|
||||
idx_scan
|
||||
FROM pg_stat_user_tables
|
||||
WHERE seq_scan > 1000
|
||||
AND (idx_scan IS NULL OR seq_scan > idx_scan * 2)
|
||||
ORDER BY seq_tup_read DESC
|
||||
LIMIT 10
|
||||
""")
|
||||
|
||||
for row in cur.fetchall():
|
||||
index_recommendations.append(IndexRecommendation(
|
||||
collection_or_table=f"{row['schemaname']}.{row['relname']}",
|
||||
fields=["<analyze query patterns>"],
|
||||
reason=f"High sequential scans ({row['seq_scan']}) vs index scans ({row['idx_scan'] or 0})",
|
||||
estimated_benefit="High" if row["seq_tup_read"] > 100000 else "Medium"
|
||||
))
|
||||
|
||||
# Find unused indexes
|
||||
cur.execute("""
|
||||
SELECT
|
||||
schemaname,
|
||||
relname,
|
||||
indexrelname,
|
||||
idx_scan
|
||||
FROM pg_stat_user_indexes
|
||||
WHERE idx_scan = 0
|
||||
AND indexrelname NOT LIKE '%_pkey'
|
||||
ORDER BY pg_relation_size(indexrelid) DESC
|
||||
""")
|
||||
|
||||
unused_indexes = []
|
||||
for row in cur.fetchall():
|
||||
unused_indexes.append(
|
||||
f"{row['schemaname']}.{row['tablename']}.{row['indexname']}"
|
||||
)
|
||||
|
||||
# Database metrics
|
||||
cur.execute("""
|
||||
SELECT
|
||||
sum(numbackends) AS connections,
|
||||
sum(xact_commit) AS commits,
|
||||
sum(xact_rollback) AS rollbacks
|
||||
FROM pg_stat_database
|
||||
WHERE datname = current_database()
|
||||
""")
|
||||
stats = cur.fetchone()
|
||||
|
||||
cur.execute("""
|
||||
SELECT pg_database_size(current_database()) AS db_size
|
||||
""")
|
||||
db_size = cur.fetchone()["db_size"]
|
||||
|
||||
cur.execute("""
|
||||
SELECT
|
||||
sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS cache_hit_ratio
|
||||
FROM pg_statio_user_tables
|
||||
""")
|
||||
cache_ratio = cur.fetchone()["cache_hit_ratio"] or 0
|
||||
|
||||
metrics = {
|
||||
"connections": stats["connections"],
|
||||
"commits": stats["commits"],
|
||||
"rollbacks": stats["rollbacks"],
|
||||
"database_size_mb": db_size / (1024 * 1024),
|
||||
"cache_hit_ratio": float(cache_ratio),
|
||||
"unused_indexes": unused_indexes
|
||||
}
|
||||
|
||||
return PerformanceReport(
|
||||
database_type="postgres",
|
||||
database_name=self.conn.info.dbname,
|
||||
timestamp=datetime.now(),
|
||||
slow_queries=slow_queries,
|
||||
index_recommendations=index_recommendations,
|
||||
database_metrics=metrics
|
||||
)
|
||||
|
||||
def print_report(self, report: PerformanceReport):
|
||||
"""Print performance report."""
|
||||
print("=" * 80)
|
||||
print(f"Database Performance Report - {report.database_type.upper()}")
|
||||
print(f"Database: {report.database_name}")
|
||||
print(f"Timestamp: {report.timestamp}")
|
||||
print("=" * 80)
|
||||
|
||||
print("\n## Database Metrics")
|
||||
print("-" * 80)
|
||||
for key, value in report.database_metrics.items():
|
||||
if isinstance(value, float):
|
||||
print(f"{key}: {value:.2f}")
|
||||
else:
|
||||
print(f"{key}: {value}")
|
||||
|
||||
print("\n## Slow Queries")
|
||||
print("-" * 80)
|
||||
if report.slow_queries:
|
||||
for i, query in enumerate(report.slow_queries, 1):
|
||||
print(f"\n{i}. Execution Time: {query.execution_time_ms:.2f}ms | Count: {query.count}")
|
||||
if query.collection_or_table:
|
||||
print(f" Collection/Table: {query.collection_or_table}")
|
||||
if query.index_used:
|
||||
print(f" Index Used: {query.index_used}")
|
||||
print(f" Query: {query.query[:200]}...")
|
||||
else:
|
||||
print("No slow queries found")
|
||||
|
||||
print("\n## Index Recommendations")
|
||||
print("-" * 80)
|
||||
if report.index_recommendations:
|
||||
for i, rec in enumerate(report.index_recommendations, 1):
|
||||
print(f"\n{i}. {rec.collection_or_table}")
|
||||
print(f" Fields: {', '.join(rec.fields)}")
|
||||
print(f" Reason: {rec.reason}")
|
||||
print(f" Estimated Benefit: {rec.estimated_benefit}")
|
||||
|
||||
if report.database_type == "mongodb":
|
||||
index_spec = {field: 1 for field in rec.fields}
|
||||
print(f" Command: db.{rec.collection_or_table}.createIndex({json.dumps(index_spec)})")
|
||||
elif report.database_type == "postgres":
|
||||
fields_str = ", ".join(rec.fields)
|
||||
print(f" Command: CREATE INDEX idx_{rec.collection_or_table.replace('.', '_')}_{rec.fields[0]} ON {rec.collection_or_table}({fields_str});")
|
||||
else:
|
||||
print("No index recommendations")
|
||||
|
||||
print("\n" + "=" * 80)
|
||||
|
||||
def save_report(self, report: PerformanceReport, filename: str):
|
||||
"""Save report to JSON file."""
|
||||
# Convert dataclasses to dict
|
||||
report_dict = {
|
||||
"database_type": report.database_type,
|
||||
"database_name": report.database_name,
|
||||
"timestamp": report.timestamp.isoformat(),
|
||||
"slow_queries": [asdict(q) for q in report.slow_queries],
|
||||
"index_recommendations": [asdict(r) for r in report.index_recommendations],
|
||||
"database_metrics": report.database_metrics
|
||||
}
|
||||
|
||||
with open(filename, "w") as f:
|
||||
json.dump(report_dict, f, indent=2, default=str)
|
||||
|
||||
print(f"\nReport saved to: {filename}")
|
||||
|
||||
|
||||
def main():
|
||||
"""Main entry point."""
|
||||
parser = argparse.ArgumentParser(description="Database performance analysis tool")
|
||||
parser.add_argument("--db", required=True, choices=["mongodb", "postgres"],
|
||||
help="Database type")
|
||||
parser.add_argument("--uri", required=True, help="Database connection string")
|
||||
parser.add_argument("--threshold", type=int, default=100,
|
||||
help="Slow query threshold in milliseconds (default: 100)")
|
||||
parser.add_argument("--output", help="Save report to JSON file")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
analyzer = PerformanceAnalyzer(args.db, args.uri, args.threshold)
|
||||
|
||||
if not analyzer.connect():
|
||||
sys.exit(1)
|
||||
|
||||
try:
|
||||
print(f"Analyzing {args.db} performance (threshold: {args.threshold}ms)...")
|
||||
report = analyzer.analyze()
|
||||
|
||||
if report:
|
||||
analyzer.print_report(report)
|
||||
|
||||
if args.output:
|
||||
analyzer.save_report(report, args.output)
|
||||
|
||||
sys.exit(0)
|
||||
else:
|
||||
print("Analysis failed")
|
||||
sys.exit(1)
|
||||
|
||||
finally:
|
||||
analyzer.disconnect()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
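
# Example usage (illustrative; adjust the connection URI for your environment):
#   python db_performance_check.py --db postgres --uri postgresql://localhost/mydb --threshold 200 --output report.json
#   python db_performance_check.py --db mongodb --uri mongodb://localhost:27017/mydb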
|
||||
20
.opencode/skills/databases/scripts/requirements.txt
Normal file
@@ -0,0 +1,20 @@
# Databases Skill Dependencies
# Python 3.10+ required

# No Python package dependencies - uses only standard library

# Testing dependencies (dev)
pytest>=8.0.0
pytest-cov>=4.1.0
pytest-mock>=3.12.0

# Note: This skill requires database CLI tools:
#
# PostgreSQL:
# - psql CLI (comes with PostgreSQL)
# - Ubuntu/Debian: sudo apt-get install postgresql-client
# - macOS: brew install postgresql
#
# MongoDB:
# - mongosh CLI: https://www.mongodb.com/try/download/shell
# - mongodump/mongorestore: https://www.mongodb.com/try/download/database-tools
File diff suppressed because one or more lines are too long
@@ -0,0 +1,4 @@
pytest>=7.0.0
pytest-cov>=4.0.0
pytest-mock>=3.10.0
mongomock>=4.1.0
340
.opencode/skills/databases/scripts/tests/test_db_backup.py
Normal file
@@ -0,0 +1,340 @@
|
||||
"""Tests for db_backup.py"""
|
||||
|
||||
import json
import os
|
||||
import sys
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from unittest.mock import Mock, patch, MagicMock, call
|
||||
|
||||
import pytest
|
||||
|
||||
# Add parent directory to path
|
||||
sys.path.insert(0, str(Path(__file__).parent.parent))
|
||||
|
||||
from db_backup import BackupInfo, BackupManager
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def temp_backup_dir(tmp_path):
|
||||
"""Create temporary backup directory."""
|
||||
backup_dir = tmp_path / "backups"
|
||||
backup_dir.mkdir()
|
||||
return str(backup_dir)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def sample_backup_info():
|
||||
"""Create sample backup info."""
|
||||
return BackupInfo(
|
||||
filename="test_backup_20250101_120000.dump",
|
||||
database_type="mongodb",
|
||||
database_name="testdb",
|
||||
timestamp=datetime.now(),
|
||||
size_bytes=1024000,
|
||||
compressed=True,
|
||||
verified=True
|
||||
)
|
||||
|
||||
|
||||
class TestBackupInfo:
|
||||
"""Test BackupInfo dataclass."""
|
||||
|
||||
def test_backup_info_creation(self):
|
||||
"""Test creating backup info object."""
|
||||
info = BackupInfo(
|
||||
filename="backup.dump",
|
||||
database_type="mongodb",
|
||||
database_name="mydb",
|
||||
timestamp=datetime.now(),
|
||||
size_bytes=1024,
|
||||
compressed=False
|
||||
)
|
||||
|
||||
assert info.filename == "backup.dump"
|
||||
assert info.database_type == "mongodb"
|
||||
assert info.database_name == "mydb"
|
||||
assert info.size_bytes == 1024
|
||||
assert not info.compressed
|
||||
assert not info.verified
|
||||
|
||||
|
||||
class TestBackupManager:
|
||||
"""Test BackupManager class."""
|
||||
|
||||
def test_init(self, temp_backup_dir):
|
||||
"""Test manager initialization."""
|
||||
manager = BackupManager("mongodb", temp_backup_dir)
|
||||
|
||||
assert manager.db_type == "mongodb"
|
||||
assert Path(temp_backup_dir).exists()
|
||||
|
||||
@patch('subprocess.run')
|
||||
def test_backup_mongodb(self, mock_run, temp_backup_dir):
|
||||
"""Test MongoDB backup creation."""
|
||||
mock_run.return_value = Mock(returncode=0, stderr="")
|
||||
|
||||
manager = BackupManager("mongodb", temp_backup_dir)
|
||||
backup_info = manager.create_backup(
|
||||
"mongodb://localhost",
|
||||
"testdb",
|
||||
compress=False,
|
||||
verify=False
|
||||
)
|
||||
|
||||
assert backup_info is not None
|
||||
assert backup_info.database_type == "mongodb"
|
||||
assert backup_info.database_name == "testdb"
|
||||
mock_run.assert_called_once()
|
||||
|
||||
@patch('subprocess.run')
|
||||
def test_backup_postgres(self, mock_run, temp_backup_dir):
|
||||
"""Test PostgreSQL backup creation."""
|
||||
mock_run.return_value = Mock(returncode=0, stderr="")
|
||||
|
||||
manager = BackupManager("postgres", temp_backup_dir)
|
||||
|
||||
with patch('builtins.open', create=True) as mock_open:
|
||||
mock_open.return_value.__enter__.return_value = MagicMock()
|
||||
|
||||
backup_info = manager.create_backup(
|
||||
"postgresql://localhost/testdb",
|
||||
"testdb",
|
||||
compress=False,
|
||||
verify=False
|
||||
)
|
||||
|
||||
assert backup_info is not None
|
||||
assert backup_info.database_type == "postgres"
|
||||
assert backup_info.database_name == "testdb"
|
||||
|
||||
def test_backup_postgres_no_database(self, temp_backup_dir):
|
||||
"""Test PostgreSQL backup without database name."""
|
||||
manager = BackupManager("postgres", temp_backup_dir)
|
||||
backup_info = manager.create_backup(
|
||||
"postgresql://localhost",
|
||||
database=None,
|
||||
compress=False,
|
||||
verify=False
|
||||
)
|
||||
|
||||
assert backup_info is None
|
||||
|
||||
@patch('subprocess.run')
|
||||
def test_backup_with_compression(self, mock_run, temp_backup_dir):
|
||||
"""Test backup with compression."""
|
||||
mock_run.return_value = Mock(returncode=0, stderr="")
|
||||
|
||||
manager = BackupManager("mongodb", temp_backup_dir)
|
||||
|
||||
with patch('shutil.make_archive') as mock_archive, \
|
||||
patch('shutil.rmtree') as mock_rmtree:
|
||||
|
||||
backup_info = manager.create_backup(
|
||||
"mongodb://localhost",
|
||||
"testdb",
|
||||
compress=True,
|
||||
verify=False
|
||||
)
|
||||
|
||||
assert backup_info is not None
|
||||
assert backup_info.compressed
|
||||
mock_archive.assert_called_once()
|
||||
|
||||
def test_save_and_load_metadata(self, temp_backup_dir, sample_backup_info):
|
||||
"""Test saving and loading backup metadata."""
|
||||
manager = BackupManager("mongodb", temp_backup_dir)
|
||||
|
||||
# Save metadata
|
||||
manager._save_metadata(sample_backup_info)
|
||||
|
||||
# Check file was created
|
||||
metadata_file = Path(temp_backup_dir) / f"{sample_backup_info.filename}.json"
|
||||
assert metadata_file.exists()
|
||||
|
||||
# Load metadata
|
||||
with open(metadata_file) as f:
|
||||
data = json.load(f)
|
||||
assert data["filename"] == sample_backup_info.filename
|
||||
assert data["database_type"] == "mongodb"
|
||||
assert data["database_name"] == "testdb"
|
||||
|
||||
def test_list_backups(self, temp_backup_dir, sample_backup_info):
|
||||
"""Test listing backups."""
|
||||
manager = BackupManager("mongodb", temp_backup_dir)
|
||||
|
||||
# Create test backup metadata
|
||||
manager._save_metadata(sample_backup_info)
|
||||
|
||||
# List backups
|
||||
backups = manager.list_backups()
|
||||
|
||||
assert len(backups) == 1
|
||||
assert backups[0].filename == sample_backup_info.filename
|
||||
assert backups[0].database_name == "testdb"
|
||||
|
||||
@patch('subprocess.run')
|
||||
def test_restore_mongodb(self, mock_run, temp_backup_dir):
|
||||
"""Test MongoDB restore."""
|
||||
mock_run.return_value = Mock(returncode=0, stderr="")
|
||||
|
||||
manager = BackupManager("mongodb", temp_backup_dir)
|
||||
|
||||
# Create dummy backup file
|
||||
backup_file = Path(temp_backup_dir) / "test_backup.dump"
|
||||
backup_file.touch()
|
||||
|
||||
result = manager.restore_backup(
|
||||
"test_backup.dump",
|
||||
"mongodb://localhost"
|
||||
)
|
||||
|
||||
assert result is True
|
||||
mock_run.assert_called_once()
|
||||
|
||||
@patch('subprocess.run')
|
||||
def test_restore_postgres(self, mock_run, temp_backup_dir):
|
||||
"""Test PostgreSQL restore."""
|
||||
mock_run.return_value = Mock(returncode=0, stderr="")
|
||||
|
||||
manager = BackupManager("postgres", temp_backup_dir)
|
||||
|
||||
# Create dummy backup file
|
||||
backup_file = Path(temp_backup_dir) / "test_backup.sql"
|
||||
backup_file.write_text("SELECT 1;")
|
||||
|
||||
with patch('builtins.open', create=True) as mock_open:
|
||||
mock_open.return_value.__enter__.return_value = MagicMock()
|
||||
|
||||
result = manager.restore_backup(
|
||||
"test_backup.sql",
|
||||
"postgresql://localhost/testdb"
|
||||
)
|
||||
|
||||
assert result is True
|
||||
|
||||
def test_restore_nonexistent_backup(self, temp_backup_dir):
|
||||
"""Test restore with non-existent backup file."""
|
||||
manager = BackupManager("mongodb", temp_backup_dir)
|
||||
|
||||
result = manager.restore_backup(
|
||||
"nonexistent.dump",
|
||||
"mongodb://localhost"
|
||||
)
|
||||
|
||||
assert result is False
|
||||
|
||||
def test_restore_dry_run(self, temp_backup_dir):
|
||||
"""Test restore in dry-run mode."""
|
||||
manager = BackupManager("mongodb", temp_backup_dir)
|
||||
|
||||
# Create dummy backup file
|
||||
backup_file = Path(temp_backup_dir) / "test_backup.dump"
|
||||
backup_file.touch()
|
||||
|
||||
result = manager.restore_backup(
|
||||
"test_backup.dump",
|
||||
"mongodb://localhost",
|
||||
dry_run=True
|
||||
)
|
||||
|
||||
assert result is True
|
||||
|
||||
def test_cleanup_old_backups(self, temp_backup_dir):
|
||||
"""Test cleaning up old backups."""
|
||||
manager = BackupManager("mongodb", temp_backup_dir)
|
||||
|
||||
# Create old backup file (simulate by setting mtime)
|
||||
old_backup = Path(temp_backup_dir) / "old_backup.dump"
|
||||
old_backup.touch()
|
||||
|
||||
# Set mtime to 10 days ago
|
||||
old_time = datetime.now().timestamp() - (10 * 24 * 3600)
|
||||
os.utime(old_backup, (old_time, old_time))
|
||||
|
||||
# Cleanup with 7-day retention
|
||||
removed = manager.cleanup_old_backups(retention_days=7)
|
||||
|
||||
assert removed == 1
|
||||
assert not old_backup.exists()
|
||||
|
||||
def test_cleanup_dry_run(self, temp_backup_dir):
|
||||
"""Test cleanup in dry-run mode."""
|
||||
manager = BackupManager("mongodb", temp_backup_dir)
|
||||
|
||||
# Create old backup file
|
||||
old_backup = Path(temp_backup_dir) / "old_backup.dump"
|
||||
old_backup.touch()
|
||||
|
||||
old_time = datetime.now().timestamp() - (10 * 24 * 3600)
|
||||
os.utime(old_backup, (old_time, old_time))
|
||||
|
||||
# Cleanup with dry-run
|
||||
removed = manager.cleanup_old_backups(retention_days=7, dry_run=True)
|
||||
|
||||
assert removed == 1
|
||||
assert old_backup.exists() # File should still exist
|
||||
|
||||
def test_verify_backup(self, temp_backup_dir, sample_backup_info):
|
||||
"""Test backup verification."""
|
||||
manager = BackupManager("mongodb", temp_backup_dir)
|
||||
|
||||
# Create dummy backup file
|
||||
backup_file = Path(temp_backup_dir) / sample_backup_info.filename
|
||||
backup_file.write_text("backup data")
|
||||
|
||||
result = manager._verify_backup(sample_backup_info)
|
||||
|
||||
assert result is True
|
||||
|
||||
def test_verify_empty_backup(self, temp_backup_dir, sample_backup_info):
|
||||
"""Test verification of empty backup file."""
|
||||
manager = BackupManager("mongodb", temp_backup_dir)
|
||||
|
||||
# Create empty backup file
|
||||
backup_file = Path(temp_backup_dir) / sample_backup_info.filename
|
||||
backup_file.touch()
|
||||
|
||||
result = manager._verify_backup(sample_backup_info)
|
||||
|
||||
assert result is False
|
||||
|
||||
def test_format_size(self, temp_backup_dir):
|
||||
"""Test size formatting."""
|
||||
manager = BackupManager("mongodb", temp_backup_dir)
|
||||
|
||||
assert manager._format_size(500) == "500.00 B"
|
||||
assert manager._format_size(1024) == "1.00 KB"
|
||||
assert manager._format_size(1024 * 1024) == "1.00 MB"
|
||||
assert manager._format_size(1024 * 1024 * 1024) == "1.00 GB"
|
||||
|
||||
def test_get_size_file(self, temp_backup_dir):
|
||||
"""Test getting size of file."""
|
||||
manager = BackupManager("mongodb", temp_backup_dir)
|
||||
|
||||
test_file = Path(temp_backup_dir) / "test.txt"
|
||||
test_file.write_text("test data")
|
||||
|
||||
size = manager._get_size(test_file)
|
||||
|
||||
assert size > 0
|
||||
|
||||
def test_get_size_directory(self, temp_backup_dir):
|
||||
"""Test getting size of directory."""
|
||||
manager = BackupManager("mongodb", temp_backup_dir)
|
||||
|
||||
test_dir = Path(temp_backup_dir) / "test_dir"
|
||||
test_dir.mkdir()
|
||||
(test_dir / "file1.txt").write_text("data1")
|
||||
(test_dir / "file2.txt").write_text("data2")
|
||||
|
||||
size = manager._get_size(test_dir)
|
||||
|
||||
assert size > 0
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
pytest.main([__file__, "-v"])
|
||||
277
.opencode/skills/databases/scripts/tests/test_db_migrate.py
Normal file
@@ -0,0 +1,277 @@
|
||||
"""Tests for db_migrate.py"""
|
||||
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from unittest.mock import Mock, patch, MagicMock
|
||||
|
||||
import pytest
|
||||
|
||||
# Add parent directory to path
|
||||
sys.path.insert(0, str(Path(__file__).parent.parent))
|
||||
|
||||
from db_migrate import Migration, MigrationManager
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def temp_migrations_dir(tmp_path):
|
||||
"""Create temporary migrations directory."""
|
||||
migrations_dir = tmp_path / "migrations"
|
||||
migrations_dir.mkdir()
|
||||
return str(migrations_dir)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def mock_mongo_client():
|
||||
"""Mock MongoDB client."""
|
||||
mock_client = MagicMock()
|
||||
mock_db = MagicMock()
|
||||
mock_client.get_default_database.return_value = mock_db
|
||||
mock_client.server_info.return_value = {}
|
||||
return mock_client, mock_db
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def mock_postgres_conn():
|
||||
"""Mock PostgreSQL connection."""
|
||||
mock_conn = MagicMock()
|
||||
mock_cursor = MagicMock()
|
||||
mock_conn.cursor.return_value.__enter__.return_value = mock_cursor
|
||||
return mock_conn, mock_cursor
|
||||
|
||||
|
||||
class TestMigration:
|
||||
"""Test Migration dataclass."""
|
||||
|
||||
def test_migration_creation(self):
|
||||
"""Test creating migration object."""
|
||||
migration = Migration(
|
||||
id="20250101120000",
|
||||
name="test_migration",
|
||||
timestamp=datetime.now(),
|
||||
database_type="mongodb"
|
||||
)
|
||||
|
||||
assert migration.id == "20250101120000"
|
||||
assert migration.name == "test_migration"
|
||||
assert migration.database_type == "mongodb"
|
||||
assert not migration.applied
|
||||
|
||||
|
||||
class TestMigrationManager:
|
||||
"""Test MigrationManager class."""
|
||||
|
||||
def test_init(self, temp_migrations_dir):
|
||||
"""Test manager initialization."""
|
||||
manager = MigrationManager("mongodb", "mongodb://localhost", temp_migrations_dir)
|
||||
|
||||
assert manager.db_type == "mongodb"
|
||||
assert manager.connection_string == "mongodb://localhost"
|
||||
assert Path(temp_migrations_dir).exists()
|
||||
|
||||
@patch('db_migrate.MongoClient')
|
||||
def test_connect_mongodb(self, mock_client_class, temp_migrations_dir, mock_mongo_client):
|
||||
"""Test MongoDB connection."""
|
||||
mock_client, mock_db = mock_mongo_client
|
||||
mock_client_class.return_value = mock_client
|
||||
|
||||
manager = MigrationManager("mongodb", "mongodb://localhost", temp_migrations_dir)
|
||||
result = manager.connect()
|
||||
|
||||
assert result is True
|
||||
assert manager.client == mock_client
|
||||
assert manager.db == mock_db
|
||||
|
||||
@patch('db_migrate.psycopg2')
|
||||
def test_connect_postgres(self, mock_psycopg2, temp_migrations_dir, mock_postgres_conn):
|
||||
"""Test PostgreSQL connection."""
|
||||
mock_conn, mock_cursor = mock_postgres_conn
|
||||
mock_psycopg2.connect.return_value = mock_conn
|
||||
|
||||
manager = MigrationManager("postgres", "postgresql://localhost", temp_migrations_dir)
|
||||
result = manager.connect()
|
||||
|
||||
assert result is True
|
||||
assert manager.conn == mock_conn
|
||||
|
||||
def test_connect_unsupported_db(self, temp_migrations_dir):
|
||||
"""Test connection with unsupported database type."""
|
||||
manager = MigrationManager("unsupported", "connection_string", temp_migrations_dir)
|
||||
result = manager.connect()
|
||||
|
||||
assert result is False
|
||||
|
||||
def test_generate_migration(self, temp_migrations_dir):
|
||||
"""Test migration generation."""
|
||||
manager = MigrationManager("mongodb", "mongodb://localhost", temp_migrations_dir)
|
||||
migration = manager.generate_migration("test_migration")
|
||||
|
||||
assert migration is not None
|
||||
assert migration.name == "test_migration"
|
||||
|
||||
# Check file was created
|
||||
migration_files = list(Path(temp_migrations_dir).glob("*.json"))
|
||||
assert len(migration_files) == 1
|
||||
|
||||
# Check file content
|
||||
with open(migration_files[0]) as f:
|
||||
data = json.load(f)
|
||||
assert data["name"] == "test_migration"
|
||||
assert data["database_type"] == "mongodb"
|
||||
|
||||
def test_generate_migration_dry_run(self, temp_migrations_dir):
|
||||
"""Test migration generation in dry-run mode."""
|
||||
manager = MigrationManager("postgres", "postgresql://localhost", temp_migrations_dir)
|
||||
migration = manager.generate_migration("test_migration", dry_run=True)
|
||||
|
||||
assert migration is not None
|
||||
|
||||
# Check no file was created
|
||||
migration_files = list(Path(temp_migrations_dir).glob("*.json"))
|
||||
assert len(migration_files) == 0
|
||||
|
||||
def test_get_pending_migrations(self, temp_migrations_dir):
|
||||
"""Test getting pending migrations."""
|
||||
manager = MigrationManager("mongodb", "mongodb://localhost", temp_migrations_dir)
|
||||
|
||||
# Create test migration file
|
||||
migration_data = {
|
||||
"id": "20250101120000",
|
||||
"name": "test_migration",
|
||||
"timestamp": datetime.now().isoformat(),
|
||||
"database_type": "mongodb",
|
||||
"mongodb_operations": []
|
||||
}
|
||||
|
||||
migration_file = Path(temp_migrations_dir) / "20250101120000_test.json"
|
||||
with open(migration_file, "w") as f:
|
||||
json.dump(migration_data, f)
|
||||
|
||||
# Mock database connection
|
||||
with patch.object(manager, 'db', MagicMock()):
|
||||
manager.db.migrations.find.return_value = []
|
||||
|
||||
pending = manager.get_pending_migrations()
|
||||
|
||||
assert len(pending) == 1
|
||||
assert pending[0].id == "20250101120000"
|
||||
assert pending[0].name == "test_migration"
|
||||
|
||||
@patch('db_migrate.MongoClient')
|
||||
def test_apply_mongodb_migration(self, mock_client_class, temp_migrations_dir, mock_mongo_client):
|
||||
"""Test applying MongoDB migration."""
|
||||
mock_client, mock_db = mock_mongo_client
|
||||
mock_client_class.return_value = mock_client
|
||||
|
||||
manager = MigrationManager("mongodb", "mongodb://localhost", temp_migrations_dir)
|
||||
manager.connect()
|
||||
|
||||
migration = Migration(
|
||||
id="20250101120000",
|
||||
name="test_migration",
|
||||
timestamp=datetime.now(),
|
||||
database_type="mongodb",
|
||||
mongodb_operations=[
|
||||
{
|
||||
"operation": "createIndex",
|
||||
"collection": "users",
|
||||
"index": {"email": 1},
|
||||
"options": {}
|
||||
}
|
||||
]
|
||||
)
|
||||
|
||||
result = manager.apply_migration(migration)
|
||||
|
||||
assert result is True
|
||||
mock_db["users"].create_index.assert_called_once()
|
||||
mock_db.migrations.insert_one.assert_called_once()
|
||||
|
||||
def test_apply_migration_dry_run(self, temp_migrations_dir):
|
||||
"""Test applying migration in dry-run mode."""
|
||||
manager = MigrationManager("mongodb", "mongodb://localhost", temp_migrations_dir)
|
||||
|
||||
migration = Migration(
|
||||
id="20250101120000",
|
||||
name="test_migration",
|
||||
timestamp=datetime.now(),
|
||||
database_type="mongodb",
|
||||
mongodb_operations=[]
|
||||
)
|
||||
|
||||
result = manager.apply_migration(migration, dry_run=True)
|
||||
|
||||
assert result is True
|
||||
|
||||
@patch('db_migrate.psycopg2')
|
||||
def test_rollback_postgres_migration(self, mock_psycopg2, temp_migrations_dir, mock_postgres_conn):
|
||||
"""Test rolling back PostgreSQL migration."""
|
||||
mock_conn, mock_cursor = mock_postgres_conn
|
||||
mock_psycopg2.connect.return_value = mock_conn
|
||||
|
||||
manager = MigrationManager("postgres", "postgresql://localhost", temp_migrations_dir)
|
||||
manager.connect()
|
||||
|
||||
# Create migration file
|
||||
migration_data = {
|
||||
"id": "20250101120000",
|
||||
"name": "test_migration",
|
||||
"timestamp": datetime.now().isoformat(),
|
||||
"database_type": "postgres",
|
||||
"up_sql": "CREATE TABLE test (id INT);",
|
||||
"down_sql": "DROP TABLE test;"
|
||||
}
|
||||
|
||||
migration_file = Path(temp_migrations_dir) / "20250101120000_test.json"
|
||||
with open(migration_file, "w") as f:
|
||||
json.dump(migration_data, f)
|
||||
|
||||
result = manager.rollback_migration("20250101120000")
|
||||
|
||||
assert result is True
|
||||
# Verify SQL was executed
|
||||
assert mock_cursor.execute.call_count >= 1
|
||||
|
||||
def test_rollback_migration_not_found(self, temp_migrations_dir):
|
||||
"""Test rollback with non-existent migration."""
|
||||
manager = MigrationManager("mongodb", "mongodb://localhost", temp_migrations_dir)
|
||||
|
||||
result = manager.rollback_migration("99999999999999")
|
||||
|
||||
assert result is False
|
||||
|
||||
|
||||
def test_migration_sorting(temp_migrations_dir):
|
||||
"""Test that migrations are applied in correct order."""
|
||||
manager = MigrationManager("mongodb", "mongodb://localhost", temp_migrations_dir)
|
||||
|
||||
# Create multiple migration files
|
||||
for i in range(3):
|
||||
migration_data = {
|
||||
"id": f"2025010112000{i}",
|
||||
"name": f"migration_{i}",
|
||||
"timestamp": datetime.now().isoformat(),
|
||||
"database_type": "mongodb",
|
||||
"mongodb_operations": []
|
||||
}
|
||||
|
||||
migration_file = Path(temp_migrations_dir) / f"2025010112000{i}_test.json"
|
||||
with open(migration_file, "w") as f:
|
||||
json.dump(migration_data, f)
|
||||
|
||||
with patch.object(manager, 'db', MagicMock()):
|
||||
manager.db.migrations.find.return_value = []
|
||||
|
||||
pending = manager.get_pending_migrations()
|
||||
|
||||
# Check they're in order
|
||||
assert len(pending) == 3
|
||||
assert pending[0].id == "20250101120000"
|
||||
assert pending[1].id == "20250101120001"
|
||||
assert pending[2].id == "20250101120002"
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
pytest.main([__file__, "-v"])
|
||||
@@ -0,0 +1,370 @@
|
||||
"""Tests for db_performance_check.py"""
|
||||
|
||||
import json
|
||||
import sys
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from unittest.mock import Mock, patch, MagicMock
|
||||
|
||||
import pytest
|
||||
|
||||
# Add parent directory to path
|
||||
sys.path.insert(0, str(Path(__file__).parent.parent))
|
||||
|
||||
from db_performance_check import (
|
||||
SlowQuery, IndexRecommendation, PerformanceReport, PerformanceAnalyzer
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def mock_mongo_client():
|
||||
"""Mock MongoDB client."""
|
||||
mock_client = MagicMock()
|
||||
mock_db = MagicMock()
|
||||
mock_client.get_default_database.return_value = mock_db
|
||||
mock_client.server_info.return_value = {}
|
||||
return mock_client, mock_db
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def mock_postgres_conn():
|
||||
"""Mock PostgreSQL connection."""
|
||||
mock_conn = MagicMock()
|
||||
mock_cursor = MagicMock()
|
||||
mock_conn.cursor.return_value.__enter__.return_value = mock_cursor
|
||||
return mock_conn, mock_cursor
|
||||
|
||||
|
||||
class TestSlowQuery:
|
||||
"""Test SlowQuery dataclass."""
|
||||
|
||||
def test_slow_query_creation(self):
|
||||
"""Test creating slow query object."""
|
||||
query = SlowQuery(
|
||||
query="SELECT * FROM users",
|
||||
execution_time_ms=150.5,
|
||||
count=10
|
||||
)
|
||||
|
||||
assert query.query == "SELECT * FROM users"
|
||||
assert query.execution_time_ms == 150.5
|
||||
assert query.count == 10
|
||||
|
||||
|
||||
class TestIndexRecommendation:
|
||||
"""Test IndexRecommendation dataclass."""
|
||||
|
||||
def test_recommendation_creation(self):
|
||||
"""Test creating index recommendation."""
|
||||
rec = IndexRecommendation(
|
||||
collection_or_table="users",
|
||||
fields=["email"],
|
||||
reason="Frequently queried field",
|
||||
estimated_benefit="High"
|
||||
)
|
||||
|
||||
assert rec.collection_or_table == "users"
|
||||
assert rec.fields == ["email"]
|
||||
assert rec.reason == "Frequently queried field"
|
||||
assert rec.estimated_benefit == "High"
|
||||
|
||||
|
||||
class TestPerformanceReport:
|
||||
"""Test PerformanceReport dataclass."""
|
||||
|
||||
def test_report_creation(self):
|
||||
"""Test creating performance report."""
|
||||
report = PerformanceReport(
|
||||
database_type="mongodb",
|
||||
database_name="testdb",
|
||||
timestamp=datetime.now(),
|
||||
slow_queries=[],
|
||||
index_recommendations=[],
|
||||
database_metrics={}
|
||||
)
|
||||
|
||||
assert report.database_type == "mongodb"
|
||||
assert report.database_name == "testdb"
|
||||
assert isinstance(report.slow_queries, list)
|
||||
assert isinstance(report.index_recommendations, list)
|
||||
assert isinstance(report.database_metrics, dict)
|
||||
|
||||
|
||||
class TestPerformanceAnalyzer:
|
||||
"""Test PerformanceAnalyzer class."""
|
||||
|
||||
def test_init(self):
|
||||
"""Test analyzer initialization."""
|
||||
analyzer = PerformanceAnalyzer("mongodb", "mongodb://localhost", 100)
|
||||
|
||||
assert analyzer.db_type == "mongodb"
|
||||
assert analyzer.connection_string == "mongodb://localhost"
|
||||
assert analyzer.threshold_ms == 100
|
||||
|
||||
@patch('db_performance_check.MongoClient')
|
||||
def test_connect_mongodb(self, mock_client_class, mock_mongo_client):
|
||||
"""Test MongoDB connection."""
|
||||
mock_client, mock_db = mock_mongo_client
|
||||
mock_client_class.return_value = mock_client
|
||||
|
||||
analyzer = PerformanceAnalyzer("mongodb", "mongodb://localhost")
|
||||
result = analyzer.connect()
|
||||
|
||||
assert result is True
|
||||
assert analyzer.client == mock_client
|
||||
assert analyzer.db == mock_db
|
||||
|
||||
@patch('db_performance_check.psycopg2')
|
||||
def test_connect_postgres(self, mock_psycopg2, mock_postgres_conn):
|
||||
"""Test PostgreSQL connection."""
|
||||
mock_conn, mock_cursor = mock_postgres_conn
|
||||
mock_psycopg2.connect.return_value = mock_conn
|
||||
|
||||
analyzer = PerformanceAnalyzer("postgres", "postgresql://localhost")
|
||||
result = analyzer.connect()
|
||||
|
||||
assert result is True
|
||||
assert analyzer.conn == mock_conn
|
||||
|
||||
def test_connect_unsupported_db(self):
|
||||
"""Test connection with unsupported database type."""
|
||||
analyzer = PerformanceAnalyzer("unsupported", "connection_string")
|
||||
result = analyzer.connect()
|
||||
|
||||
assert result is False
|
||||
|
||||
@patch('db_performance_check.MongoClient')
|
||||
def test_analyze_mongodb(self, mock_client_class, mock_mongo_client):
|
||||
"""Test MongoDB performance analysis."""
|
||||
mock_client, mock_db = mock_mongo_client
|
||||
mock_client_class.return_value = mock_client
|
||||
|
||||
# Mock profiling
|
||||
mock_db.command.side_effect = [
|
||||
{"was": 0}, # profile -1 (get status)
|
||||
{}, # profile 1 (enable)
|
||||
]
|
||||
|
||||
# Mock slow queries
|
||||
mock_profile_cursor = MagicMock()
|
||||
mock_profile_cursor.sort.return_value = [
|
||||
{
|
||||
"command": {"find": "users"},
|
||||
"millis": 150,
|
||||
"ns": "testdb.users",
|
||||
"planSummary": "COLLSCAN"
|
||||
}
|
||||
]
|
||||
mock_db.system.profile.find.return_value = mock_profile_cursor
|
||||
|
||||
# Mock collections
|
||||
mock_db.list_collection_names.return_value = ["users", "orders"]
|
||||
|
||||
# Mock collection stats
|
||||
mock_coll = MagicMock()
|
||||
mock_coll.aggregate.return_value = [{"storageStats": {}}]
|
||||
mock_coll.list_indexes.return_value = [{"name": "_id_"}]
|
||||
mock_coll.find.return_value.limit.return_value = [
|
||||
{"_id": 1, "name": "Alice", "email": "alice@example.com"}
|
||||
]
|
||||
mock_db.__getitem__.return_value = mock_coll
|
||||
|
||||
# Mock server status and db stats
|
||||
mock_client.admin.command.return_value = {
|
||||
"connections": {"current": 10},
|
||||
"opcounters": {"query": 1000}
|
||||
}
|
||||
mock_db.command.return_value = {
|
||||
"dataSize": 1024 * 1024 * 100,
|
||||
"indexSize": 1024 * 1024 * 10,
|
||||
"collections": 5
|
||||
}
|
||||
|
||||
analyzer = PerformanceAnalyzer("mongodb", "mongodb://localhost")
|
||||
analyzer.connect()
|
||||
|
||||
report = analyzer.analyze()
|
||||
|
||||
assert report is not None
|
||||
assert report.database_type == "mongodb"
|
||||
assert isinstance(report.slow_queries, list)
|
||||
assert isinstance(report.index_recommendations, list)
|
||||
assert isinstance(report.database_metrics, dict)
|
||||
|
||||
@patch('db_performance_check.psycopg2')
|
||||
def test_analyze_postgres(self, mock_psycopg2, mock_postgres_conn):
|
||||
"""Test PostgreSQL performance analysis."""
|
||||
mock_conn, mock_cursor = mock_postgres_conn
|
||||
mock_psycopg2.connect.return_value = mock_conn
|
||||
|
||||
# Mock cursor results
|
||||
mock_cursor.fetchone.side_effect = [
|
||||
{"has_extension": True}, # pg_stat_statements check
|
||||
{"connections": 10, "commits": 1000, "rollbacks": 5}, # stats
|
||||
{"db_size": 1024 * 1024 * 500}, # database size
|
||||
{"cache_hit_ratio": 0.95} # cache hit ratio
|
||||
]
|
||||
|
||||
mock_cursor.fetchall.side_effect = [
|
||||
# Slow queries
|
||||
[
|
||||
{
|
||||
"query": "SELECT * FROM users",
|
||||
"mean_exec_time": 150.5,
|
||||
"calls": 100,
|
||||
"total_exec_time": 15050
|
||||
}
|
||||
],
|
||||
# Sequential scans
|
||||
[
|
||||
{
|
||||
"schemaname": "public",
|
||||
"tablename": "users",
|
||||
"seq_scan": 5000,
|
||||
"seq_tup_read": 500000,
|
||||
"idx_scan": 100
|
||||
}
|
||||
],
|
||||
# Unused indexes
|
||||
[]
|
||||
]
|
||||
|
||||
analyzer = PerformanceAnalyzer("postgres", "postgresql://localhost")
|
||||
analyzer.connect()
|
||||
|
||||
report = analyzer.analyze()
|
||||
|
||||
assert report is not None
|
||||
assert report.database_type == "postgres"
|
||||
assert len(report.slow_queries) > 0
|
||||
assert len(report.index_recommendations) > 0
|
||||
|
||||
def test_print_report(self, capsys):
|
||||
"""Test report printing."""
|
||||
analyzer = PerformanceAnalyzer("mongodb", "mongodb://localhost")
|
||||
|
||||
report = PerformanceReport(
|
||||
database_type="mongodb",
|
||||
database_name="testdb",
|
||||
timestamp=datetime.now(),
|
||||
slow_queries=[
|
||||
SlowQuery(
|
||||
query="db.users.find({age: {$gte: 18}})",
|
||||
execution_time_ms=150.5,
|
||||
count=10,
|
||||
collection_or_table="users"
|
||||
)
|
||||
],
|
||||
index_recommendations=[
|
||||
IndexRecommendation(
|
||||
collection_or_table="users",
|
||||
fields=["age"],
|
||||
reason="Frequently queried field",
|
||||
estimated_benefit="High"
|
||||
)
|
||||
],
|
||||
database_metrics={
|
||||
"connections": 10,
|
||||
"database_size_mb": 100.5
|
||||
}
|
||||
)
|
||||
|
||||
analyzer.print_report(report)
|
||||
|
||||
captured = capsys.readouterr()
|
||||
assert "Database Performance Report" in captured.out
|
||||
assert "testdb" in captured.out
|
||||
assert "150.5ms" in captured.out
|
||||
assert "users" in captured.out
|
||||
|
||||
def test_save_report(self, tmp_path):
|
||||
"""Test saving report to JSON."""
|
||||
analyzer = PerformanceAnalyzer("mongodb", "mongodb://localhost")
|
||||
|
||||
report = PerformanceReport(
|
||||
database_type="mongodb",
|
||||
database_name="testdb",
|
||||
timestamp=datetime.now(),
|
||||
slow_queries=[],
|
||||
index_recommendations=[],
|
||||
database_metrics={}
|
||||
)
|
||||
|
||||
output_file = tmp_path / "report.json"
|
||||
analyzer.save_report(report, str(output_file))
|
||||
|
||||
assert output_file.exists()
|
||||
|
||||
with open(output_file) as f:
|
||||
data = json.load(f)
|
||||
assert data["database_type"] == "mongodb"
|
||||
assert data["database_name"] == "testdb"
|
||||
|
||||
def test_disconnect(self):
|
||||
"""Test disconnection."""
|
||||
analyzer = PerformanceAnalyzer("mongodb", "mongodb://localhost")
|
||||
|
||||
# Mock client and connection
|
||||
analyzer.client = MagicMock()
|
||||
analyzer.conn = MagicMock()
|
||||
|
||||
analyzer.disconnect()
|
||||
|
||||
analyzer.client.close.assert_called_once()
|
||||
analyzer.conn.close.assert_called_once()
|
||||
|
||||
@patch('db_performance_check.MongoClient')
|
||||
def test_analyze_error_handling(self, mock_client_class, mock_mongo_client):
|
||||
"""Test error handling during analysis."""
|
||||
mock_client, mock_db = mock_mongo_client
|
||||
mock_client_class.return_value = mock_client
|
||||
|
||||
# Simulate error
|
||||
mock_db.command.side_effect = Exception("Database error")
|
||||
|
||||
analyzer = PerformanceAnalyzer("mongodb", "mongodb://localhost")
|
||||
analyzer.connect()
|
||||
|
||||
report = analyzer.analyze()
|
||||
|
||||
assert report is None
|
||||
|
||||
|
||||
class TestIntegration:
|
||||
"""Integration tests."""
|
||||
|
||||
@patch('db_performance_check.MongoClient')
|
||||
def test_full_mongodb_workflow(self, mock_client_class, mock_mongo_client, tmp_path):
|
||||
"""Test complete MongoDB analysis workflow."""
|
||||
mock_client, mock_db = mock_mongo_client
|
||||
mock_client_class.return_value = mock_client
|
||||
|
||||
# Setup mocks
|
||||
mock_db.command.return_value = {"was": 0}
|
||||
mock_db.system.profile.find.return_value.sort.return_value = []
|
||||
mock_db.list_collection_names.return_value = []
|
||||
mock_client.admin.command.return_value = {
|
||||
"connections": {"current": 10},
|
||||
"opcounters": {"query": 1000}
|
||||
}
|
||||
|
||||
analyzer = PerformanceAnalyzer("mongodb", "mongodb://localhost", 100)
|
||||
|
||||
# Connect
|
||||
assert analyzer.connect() is True
|
||||
|
||||
# Analyze
|
||||
report = analyzer.analyze()
|
||||
assert report is not None
|
||||
|
||||
# Save report
|
||||
output_file = tmp_path / "report.json"
|
||||
analyzer.save_report(report, str(output_file))
|
||||
assert output_file.exists()
|
||||
|
||||
# Disconnect
|
||||
analyzer.disconnect()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
pytest.main([__file__, "-v"])
|
||||
231
.opencode/skills/databases/stacks/bigquery.md
Normal file
@@ -0,0 +1,231 @@
|
||||
# BigQuery Rules
|
||||
|
||||
Guidelines for designing schemas specific to Google BigQuery.
|
||||
|
||||
---
|
||||
|
||||
## Data Types
|
||||
|
||||
| Use case | Type | Notes |
|
||||
|----------|------|-------|
|
||||
| Integer | `INT64` | |
|
||||
| Decimal | `NUMERIC` or `BIGNUMERIC` | |
|
||||
| Float | `FLOAT64` | |
|
||||
| Boolean | `BOOL` | |
|
||||
| String | `STRING` | |
|
||||
| Timestamp | `TIMESTAMP` | |
|
||||
| Date | `DATE` | |
|
||||
| Time | `TIME` | |
|
||||
| DateTime | `DATETIME` | No timezone |
|
||||
| JSON | `JSON` | |
|
||||
| Bytes | `BYTES` | |
|
||||
| Nested | `STRUCT` | Record type |
|
||||
| Repeated | `ARRAY` | |
|
||||
|
||||
---
|
||||
|
||||
## Description (NO metadata tables needed)
|
||||
|
||||
BigQuery supports description via `OPTIONS`:
|
||||
|
||||
### Table description
|
||||
```sql
|
||||
CREATE TABLE orders (
|
||||
...
|
||||
) OPTIONS(description='Table storing customer orders');
|
||||
```
|
||||
|
||||
### Column descriptions
|
||||
```sql
|
||||
CREATE TABLE orders (
|
||||
id INT64 NOT NULL OPTIONS(description='Primary key'),
|
||||
order_number STRING NOT NULL OPTIONS(description='Unique order code'),
|
||||
status STRING NOT NULL OPTIONS(description='pending|confirmed|shipped|cancelled'),
|
||||
total_amount NUMERIC OPTIONS(description='Total value')
|
||||
);
|
||||
```
|
||||
|
||||
### Query descriptions
|
||||
```sql
|
||||
SELECT column_name, description
|
||||
FROM `project.dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
|
||||
WHERE table_name = 'orders';
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Partitioning & Clustering
|
||||
|
||||
BigQuery **does not have traditional indexes**. Instead, use partitioning and clustering:
|
||||
|
||||
### Partitioning (required for large tables)
|
||||
|
||||
```sql
|
||||
-- Partition by date column
|
||||
CREATE TABLE fact_orders (
|
||||
order_date DATE,
|
||||
customer_id INT64,
|
||||
order_id INT64,
|
||||
amount NUMERIC
|
||||
)
|
||||
PARTITION BY order_date;
|
||||
|
||||
-- Partition by timestamp (daily)
|
||||
CREATE TABLE events (
|
||||
event_time TIMESTAMP,
|
||||
...
|
||||
)
|
||||
PARTITION BY DATE(event_time);
|
||||
|
||||
-- Integer range partitioning
|
||||
CREATE TABLE logs (
|
||||
log_id INT64,
|
||||
...
|
||||
)
|
||||
PARTITION BY RANGE_BUCKET(log_id, GENERATE_ARRAY(0, 1000000, 10000));
|
||||
```
|
||||
|
||||
### Clustering (optimize filter/group)
|
||||
|
||||
```sql
|
||||
CREATE TABLE fact_orders (
|
||||
order_date DATE,
|
||||
customer_id INT64,
|
||||
product_id INT64,
|
||||
store_id INT64,
|
||||
amount NUMERIC
|
||||
)
|
||||
PARTITION BY order_date
|
||||
CLUSTER BY customer_id, product_id;
|
||||
-- Can cluster up to 4 columns
|
||||
```
|
||||
|
||||
### When to use what?
|
||||
- **Partition**: filter by date range → reduces the amount of data scanned
- **Cluster**: filter/group by specific columns → related rows are stored together, so less data is read
|
||||
|
||||
---
|
||||
|
||||
## Nested & Repeated Fields (Denormalization)
|
||||
|
||||
BigQuery encourages denormalization with STRUCT and ARRAY:
|
||||
|
||||
```sql
|
||||
CREATE TABLE orders (
|
||||
order_id INT64,
|
||||
order_date DATE,
|
||||
customer STRUCT<
|
||||
id INT64,
|
||||
name STRING,
|
||||
email STRING
|
||||
>,
|
||||
items ARRAY<STRUCT<
|
||||
product_id INT64,
|
||||
product_name STRING,
|
||||
quantity INT64,
|
||||
unit_price NUMERIC
|
||||
>>
|
||||
);
|
||||
|
||||
-- Query nested fields
|
||||
SELECT
|
||||
order_id,
|
||||
customer.name,
|
||||
item.product_name,
|
||||
item.quantity
|
||||
FROM orders, UNNEST(items) AS item;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Query optimization
|
||||
- **Avoid `SELECT *`** - only select needed columns
|
||||
- **Filter early** - use partition column in WHERE
|
||||
- **Avoid cross-join** with large tables
|
||||
|
||||
### Table design
|
||||
- Partition tables > 1GB
|
||||
- Cluster by frequently filtered/grouped columns
|
||||
- Denormalize with STRUCT/ARRAY when appropriate
|
||||
- Avoid too many small tables
|
||||
|
||||
### Cost control
|
||||
- Partition pruning: always filter by partition column
|
||||
- Use `--dry_run` to estimate cost
|
||||
- Set up query quotas
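
For example, a query against the `fact_orders` table sketched above that prunes partitions by filtering on the partition column (illustrative; adjust the project and dataset names):

```sql
-- Scans only the January 2025 partitions instead of the whole table
SELECT
  customer_id,
  SUM(net_amount) AS revenue
FROM `project.dataset.fact_orders`
WHERE order_date BETWEEN '2025-01-01' AND '2025-01-31'  -- filter on the partition column
GROUP BY customer_id;
```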
|
||||
|
||||
---
|
||||
|
||||
## Example DDL
|
||||
|
||||
```sql
|
||||
CREATE TABLE `project.dataset.fact_orders` (
|
||||
-- Keys
|
||||
order_id INT64 NOT NULL OPTIONS(description='PK from source'),
|
||||
order_date DATE NOT NULL OPTIONS(description='Order date'),
|
||||
|
||||
-- Dimensions (denormalized)
|
||||
customer_id INT64 NOT NULL,
|
||||
customer_name STRING,
|
||||
customer_segment STRING OPTIONS(description='VIP|Regular|New'),
|
||||
|
||||
-- Measures
|
||||
item_count INT64 NOT NULL,
|
||||
gross_amount NUMERIC NOT NULL,
|
||||
discount_amount NUMERIC DEFAULT 0,
|
||||
net_amount NUMERIC NOT NULL,
|
||||
|
||||
-- Nested items
|
||||
items ARRAY<STRUCT<
|
||||
product_id INT64,
|
||||
product_name STRING,
|
||||
quantity INT64,
|
||||
unit_price NUMERIC
|
||||
>> OPTIONS(description='Product details in order'),
|
||||
|
||||
-- Metadata
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP(),
|
||||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
|
||||
)
|
||||
PARTITION BY order_date
|
||||
CLUSTER BY customer_id
|
||||
OPTIONS(
|
||||
description='Fact table for orders, partitioned by date, clustered by customer'
|
||||
);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Scheduled Queries / Materialized Views
|
||||
|
||||
### Materialized View
|
||||
```sql
|
||||
CREATE MATERIALIZED VIEW `project.dataset.mv_daily_sales`
|
||||
PARTITION BY report_date
|
||||
CLUSTER BY store_id
|
||||
AS
|
||||
SELECT
|
||||
DATE(order_date) as report_date,
|
||||
store_id,
|
||||
COUNT(*) as order_count,
|
||||
SUM(net_amount) as revenue
|
||||
FROM `project.dataset.fact_orders`
|
||||
GROUP BY 1, 2;
|
||||
```
|
||||
|
||||
### Scheduled Query (ETL)
|
||||
Create via BigQuery Console or API with schedule expression.
|
||||
|
||||
---
|
||||
|
||||
## Checklist
|
||||
|
||||
- [ ] Partition large tables by date column
|
||||
- [ ] Cluster by frequently filtered/grouped columns (max 4)
|
||||
- [ ] `OPTIONS(description=...)` for tables and columns
|
||||
- [ ] Denormalize with STRUCT/ARRAY when appropriate
|
||||
- [ ] Avoid `SELECT *` and cross-join large tables
|
||||
- [ ] Filter by partition column in queries
|
||||
- [ ] Consider Materialized Views for frequently used aggregations
|
||||
137
.opencode/skills/databases/stacks/d1_cloudflare.md
Normal file
@@ -0,0 +1,137 @@
|
||||
# D1 Cloudflare Rules
|
||||
|
||||
D1 is a SQLite-based database running on Cloudflare's edge network. It inherits the guidance in [sqlite.md](sqlite.md), with the D1-specific rules below.
|
||||
|
||||
---
|
||||
|
||||
## Important Limitations
|
||||
|
||||
| Limit | Value |
|
||||
|-------|-------|
|
||||
| Max database size | **10 GB** |
|
||||
| Concurrent queries | Single-threaded (one query at a time) |
|
||||
| Batch operations | No more than 100k rows/query |
|
||||
| Max tables | No hard limit, but recommend < 100 |
|
||||
|
||||
**Throughput**: roughly 1,000 queries/s when the average query takes 1 ms; at 100 ms per query, throughput drops to about 10 queries/s.
|
||||
|
||||
---
|
||||
|
||||
## Data Types
|
||||
|
||||
Same as SQLite:
|
||||
|
||||
| Storage Class | Use case |
|
||||
|---------------|----------|
|
||||
| `INTEGER` | PK, FK, counts, booleans |
|
||||
| `REAL` | Floats, decimals |
|
||||
| `TEXT` | Strings, dates, JSON |
|
||||
| `BLOB` | Binary |
|
||||
|
||||
**Date/Time**: Store as `TEXT` with ISO format: `datetime('now')`
|
||||
|
||||
---
|
||||
|
||||
## Metadata Tables (REQUIRED)
|
||||
|
||||
D1 has no native comment support, so create metadata tables to document the schema:
|
||||
|
||||
```sql
|
||||
CREATE TABLE metadata_tables (
|
||||
table_name TEXT PRIMARY KEY,
|
||||
description TEXT NOT NULL,
|
||||
created_at TEXT DEFAULT (datetime('now'))
|
||||
);
|
||||
|
||||
CREATE TABLE metadata_columns (
|
||||
table_name TEXT NOT NULL,
|
||||
column_name TEXT NOT NULL,
|
||||
description TEXT NOT NULL,
|
||||
data_type TEXT,
|
||||
PRIMARY KEY (table_name, column_name),
|
||||
FOREIGN KEY (table_name) REFERENCES metadata_tables(table_name) ON DELETE CASCADE
|
||||
);
|
||||
|
||||
-- After CREATE TABLE
|
||||
INSERT INTO metadata_tables VALUES ('orders', 'Orders table', datetime('now'));
|
||||
INSERT INTO metadata_columns VALUES
|
||||
('orders', 'id', 'PK', 'INTEGER'),
|
||||
('orders', 'status', 'pending|confirmed|shipped', 'TEXT');
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Foreign Keys
|
||||
|
||||
```sql
|
||||
-- Enable FK (ON by default in D1, but should be explicit)
|
||||
PRAGMA foreign_keys = ON;
|
||||
|
||||
CREATE TABLE order_items (
|
||||
id INTEGER PRIMARY KEY,
|
||||
order_id INTEGER NOT NULL,
|
||||
FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE
|
||||
);
|
||||
CREATE INDEX idx_order_items_order ON order_items(order_id);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Performance
|
||||
- **Use indexes** - since single-threaded, queries must be fast
|
||||
- **Batch writes** to reduce latency
|
||||
- **Avoid large transactions** (>100k rows)
|
||||
- **Pagination** with LIMIT/OFFSET or cursor-based
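
A sketch of keyset (cursor-based) pagination on the `orders` table from the example DDL below; `?1` is the last `id` seen on the previous page, supplied by the client:

```sql
-- Keyset pagination: avoids the cost of large OFFSETs on a single-threaded database
SELECT id, order_number, status, created_at
FROM orders
WHERE id > ?1        -- cursor from the previous page
ORDER BY id
LIMIT 50;
```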
|
||||
|
||||
### Monitoring
|
||||
- Track query duration
|
||||
- Monitor database size
|
||||
- Alert when approaching 10GB limit
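
One way to check the current database size from SQL uses SQLite's pragma functions; D1 exposes most but not all pragmas, so verify this works on your deployment:

```sql
-- Approximate on-disk size in bytes
SELECT page_count * page_size AS size_bytes
FROM pragma_page_count(), pragma_page_size();
```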
|
||||
|
||||
---
|
||||
|
||||
## Example DDL
|
||||
|
||||
```sql
|
||||
-- Enable FK
|
||||
PRAGMA foreign_keys = ON;
|
||||
|
||||
CREATE TABLE orders (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
order_number TEXT NOT NULL UNIQUE,
|
||||
user_id INTEGER NOT NULL,
|
||||
status TEXT DEFAULT 'pending' CHECK (status IN ('pending', 'confirmed', 'shipped', 'cancelled')),
|
||||
total_amount REAL DEFAULT 0,
|
||||
created_at TEXT DEFAULT (datetime('now')),
|
||||
updated_at TEXT DEFAULT (datetime('now')),
|
||||
FOREIGN KEY (user_id) REFERENCES users(id)
|
||||
);
|
||||
|
||||
-- Indexes
|
||||
CREATE INDEX idx_orders_user ON orders(user_id);
|
||||
CREATE INDEX idx_orders_status ON orders(status);
|
||||
CREATE INDEX idx_orders_updated ON orders(updated_at);
|
||||
|
||||
-- Metadata
|
||||
INSERT INTO metadata_tables (table_name, description)
|
||||
VALUES ('orders', 'Orders table');
|
||||
|
||||
INSERT INTO metadata_columns (table_name, column_name, description, data_type)
|
||||
VALUES
|
||||
('orders', 'id', 'PK', 'INTEGER'),
|
||||
('orders', 'order_number', 'Unique order code', 'TEXT'),
|
||||
('orders', 'status', 'pending|confirmed|shipped|cancelled', 'TEXT');
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Checklist
|
||||
|
||||
- [ ] Database size < 10GB
|
||||
- [ ] Metadata tables for documentation
|
||||
- [ ] Index for every query (single-threaded!)
|
||||
- [ ] FK has index
|
||||
- [ ] Batch operations < 100k rows
|
||||
- [ ] Monitor size and query performance
|
||||
216
.opencode/skills/databases/stacks/mysql.md
Normal file
@@ -0,0 +1,216 @@
|
||||
# MySQL Rules
|
||||
|
||||
Guidelines for designing schemas specific to MySQL / MariaDB.
|
||||
|
||||
---
|
||||
|
||||
## Data Types
|
||||
|
||||
| Use case | Type | Notes |
|
||||
|----------|------|-------|
|
||||
| Primary key | `BIGINT UNSIGNED AUTO_INCREMENT` | For large tables |
|
||||
| Primary key (small) | `INT UNSIGNED AUTO_INCREMENT` | Tables < 2 billion rows |
|
||||
| Foreign key | Match with PK type | |
|
||||
| Money/Price | `DECIMAL(18,2)` | **Do NOT use FLOAT/DOUBLE** |
|
||||
| Quantity | `INT` or `DECIMAL(18,4)` | |
|
||||
| Boolean | `TINYINT(1)` or `BOOLEAN` | |
|
||||
| Short text | `VARCHAR(n)` | n ≤ 255 for indexing |
|
||||
| Long text | `TEXT` | Cannot index directly |
|
||||
| JSON | `JSON` | MySQL 5.7+ |
|
||||
| Timestamp | `DATETIME` or `TIMESTAMP` | |
|
||||
| Date | `DATE` | |
|
||||
| UUID | `CHAR(36)` or `BINARY(16)` | |
|
||||
|
||||
### TIMESTAMP vs DATETIME
|
||||
- `TIMESTAMP`: Auto-converts timezone, range 1970-2038
|
||||
- `DATETIME`: No timezone conversion, range 1000-9999
|
||||
|
||||
---
|
||||
|
||||
## Comments (NO metadata tables needed)
|
||||
|
||||
MySQL supports comments directly in DDL:
|
||||
|
||||
### Table comment
|
||||
```sql
|
||||
CREATE TABLE orders (
|
||||
...
|
||||
) COMMENT='Table storing customer orders';
|
||||
```
|
||||
|
||||
### Column comment
|
||||
```sql
|
||||
CREATE TABLE orders (
|
||||
id BIGINT UNSIGNED PRIMARY KEY AUTO_INCREMENT
|
||||
COMMENT 'Primary key, auto increment',
|
||||
order_number VARCHAR(50) NOT NULL
|
||||
COMMENT 'Unique order code, format: ORD-YYYYMMDD-XXXXX',
|
||||
status VARCHAR(32) NOT NULL DEFAULT 'pending'
|
||||
COMMENT 'Status: pending|confirmed|shipped|cancelled',
|
||||
total_amount DECIMAL(18,2) NOT NULL
|
||||
COMMENT 'Total value including tax'
|
||||
);
|
||||
```
|
||||
|
||||
### Query comments
|
||||
```sql
|
||||
-- Table comment
|
||||
SELECT TABLE_COMMENT FROM INFORMATION_SCHEMA.TABLES
|
||||
WHERE TABLE_SCHEMA = 'your_db' AND TABLE_NAME = 'orders';
|
||||
|
||||
-- Column comments
|
||||
SELECT COLUMN_NAME, COLUMN_COMMENT FROM INFORMATION_SCHEMA.COLUMNS
|
||||
WHERE TABLE_SCHEMA = 'your_db' AND TABLE_NAME = 'orders';
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Index Types
|
||||
|
||||
### B-Tree (default)
|
||||
```sql
|
||||
CREATE INDEX idx_orders_status ON orders(status);
|
||||
CREATE INDEX idx_orders_user_status_created ON orders(user_id, status, created_at DESC);
|
||||
CREATE UNIQUE INDEX idx_orders_number ON orders(order_number);
|
||||
```
|
||||
|
||||
### FULLTEXT (text search)
|
||||
```sql
|
||||
CREATE FULLTEXT INDEX idx_products_search ON products(name, description);
|
||||
|
||||
-- Query
|
||||
SELECT * FROM products
|
||||
WHERE MATCH(name, description) AGAINST('gaming laptop' IN NATURAL LANGUAGE MODE);
|
||||
```
|
||||
|
||||
### Notes
|
||||
- **Partial index**: Not natively supported in MySQL
|
||||
- Index prefix for TEXT: `CREATE INDEX idx ON table(col(255))`
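
A common workaround for the missing partial index is indexing a generated flag column (a sketch, MySQL 8.0+; it still indexes every row, but keeps the hot predicate cheap):

```sql
ALTER TABLE orders
  ADD COLUMN is_pending TINYINT(1)
    GENERATED ALWAYS AS (status = 'pending') STORED,
  ADD INDEX idx_orders_is_pending (is_pending, created_at);
```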
|
||||
|
||||
---
|
||||
|
||||
## Auto Increment
|
||||
|
||||
```sql
|
||||
CREATE TABLE users (
|
||||
id BIGINT UNSIGNED PRIMARY KEY AUTO_INCREMENT
|
||||
);
|
||||
|
||||
-- Set starting value
|
||||
ALTER TABLE users AUTO_INCREMENT = 1000;
|
||||
|
||||
-- Get last inserted ID
|
||||
SELECT LAST_INSERT_ID();
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Auto update timestamp
|
||||
|
||||
```sql
|
||||
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## ENUM vs VARCHAR
|
||||
|
||||
### Use ENUM when values are fixed and rarely change
|
||||
```sql
|
||||
status ENUM('pending', 'confirmed', 'shipped', 'cancelled') NOT NULL DEFAULT 'pending'
|
||||
```
|
||||
|
||||
### Use VARCHAR + CHECK when flexibility needed
|
||||
```sql
|
||||
status VARCHAR(32) NOT NULL DEFAULT 'pending',
|
||||
CONSTRAINT chk_orders_status CHECK (status IN ('pending', 'confirmed', 'shipped'))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Foreign Keys
|
||||
|
||||
```sql
|
||||
CREATE TABLE order_items (
|
||||
id BIGINT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
|
||||
order_id BIGINT UNSIGNED NOT NULL,
|
||||
product_id BIGINT UNSIGNED NOT NULL,
|
||||
|
||||
CONSTRAINT fk_orderitems_order
|
||||
FOREIGN KEY (order_id) REFERENCES orders(id)
|
||||
ON DELETE CASCADE ON UPDATE CASCADE,
|
||||
|
||||
CONSTRAINT fk_orderitems_product
|
||||
FOREIGN KEY (product_id) REFERENCES products(id)
|
||||
ON DELETE RESTRICT
|
||||
);
|
||||
|
||||
-- IMPORTANT: Index for FK
|
||||
CREATE INDEX idx_orderitems_order ON order_items(order_id);
|
||||
CREATE INDEX idx_orderitems_product ON order_items(product_id);
|
||||
```
|
||||
|
||||
**Note:** FK only works with **InnoDB** engine.
|
||||
|
||||
---
|
||||
|
||||
## Partitioning
|
||||
|
||||
### Range partitioning (by date)
|
||||
```sql
|
||||
CREATE TABLE fact_orders (
|
||||
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
|
||||
order_date DATE NOT NULL,
|
||||
...
|
||||
PRIMARY KEY (id, order_date)
|
||||
) PARTITION BY RANGE (YEAR(order_date)) (
|
||||
PARTITION p2023 VALUES LESS THAN (2024),
|
||||
PARTITION p2024 VALUES LESS THAN (2025),
|
||||
PARTITION pmax VALUES LESS THAN MAXVALUE
|
||||
);
|
||||
```
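
Adding the next year's partition later usually means splitting `pmax` (a sketch; keep partition names consistent with your scheme):

```sql
ALTER TABLE fact_orders
  REORGANIZE PARTITION pmax INTO (
    PARTITION p2025 VALUES LESS THAN (2026),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
  );
```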
|
||||
|
||||
---
|
||||
|
||||
## Example DDL
|
||||
|
||||
```sql
|
||||
CREATE TABLE orders (
|
||||
id BIGINT UNSIGNED PRIMARY KEY AUTO_INCREMENT
|
||||
COMMENT 'Primary key',
|
||||
order_number VARCHAR(50) NOT NULL
|
||||
COMMENT 'Unique order code',
|
||||
user_id BIGINT UNSIGNED NOT NULL
|
||||
COMMENT 'FK to users.id',
|
||||
status ENUM('pending', 'confirmed', 'shipped', 'cancelled', 'completed')
|
||||
NOT NULL DEFAULT 'pending'
|
||||
COMMENT 'Order status',
|
||||
subtotal DECIMAL(18,2) NOT NULL DEFAULT 0,
|
||||
discount_amount DECIMAL(18,2) NOT NULL DEFAULT 0,
|
||||
total_amount DECIMAL(18,2) NOT NULL DEFAULT 0,
|
||||
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
|
||||
|
||||
UNIQUE KEY uk_orders_number (order_number),
|
||||
KEY idx_orders_user (user_id),
|
||||
KEY idx_orders_status (status),
|
||||
KEY idx_orders_user_status_created (user_id, status, created_at DESC),
|
||||
|
||||
CONSTRAINT fk_orders_user FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE RESTRICT
|
||||
) ENGINE=InnoDB
|
||||
DEFAULT CHARSET=utf8mb4
|
||||
COLLATE=utf8mb4_unicode_ci
|
||||
COMMENT='Table storing order information';
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Checklist
|
||||
|
||||
- [ ] Engine: `InnoDB` (required for FK)
|
||||
- [ ] Charset: `utf8mb4` + `utf8mb4_unicode_ci`
|
||||
- [ ] Comments for tables and columns
|
||||
- [ ] All FKs have indexes
|
||||
- [ ] Use `DECIMAL` for money, not `FLOAT`
|
||||
- [ ] `ON UPDATE CURRENT_TIMESTAMP` for `updated_at`
|
||||
235
.opencode/skills/databases/stacks/postgres.md
Normal file
@@ -0,0 +1,235 @@
# PostgreSQL Rules

Guidelines for designing schemas specific to PostgreSQL.

---

## Data Types

| Use case | Type | Notes |
|----------|------|-------|
| Primary key | `BIGSERIAL` | Auto-increment BIGINT |
| Primary key (small) | `SERIAL` | Auto-increment INT |
| Foreign key | Match the referenced PK type | |
| Money/Price | `NUMERIC(18,2)` | Precise |
| Boolean | `BOOLEAN` | TRUE/FALSE/NULL |
| Short text | `VARCHAR(n)` or `TEXT` | TEXT has no length limit |
| JSON | `JSONB` | **Queryable**, prefer over JSON |
| Timestamp | `TIMESTAMPTZ` | **Always use with timezone** |
| Date | `DATE` | |
| UUID | `UUID` | Native type |
| Array | `TEXT[]`, `INT[]` | |
| IP Address | `INET` | |

### TIMESTAMPTZ vs TIMESTAMP

**IMPORTANT**: Always use `TIMESTAMPTZ` to avoid timezone issues: `TIMESTAMP` stores a bare wall-clock value, while `TIMESTAMPTZ` normalizes to UTC and renders in the session time zone.

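A minimal sketch of the difference, assuming a throwaway `ts_demo` table in a psql session:

```sql
-- ts_demo exists only for this demonstration
CREATE TABLE ts_demo (
    plain TIMESTAMP,     -- stores the wall-clock value as given, no zone info
    zoned TIMESTAMPTZ    -- normalized to UTC, displayed in the session time zone
);

SET TIME ZONE 'Asia/Bangkok';
INSERT INTO ts_demo VALUES (now(), now());

SET TIME ZONE 'UTC';
SELECT plain, zoned FROM ts_demo;
-- plain still shows the +07:00 wall-clock time unchanged;
-- zoned is correctly shown 7 hours earlier in UTC
```
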
---

## Comments (NO metadata tables needed)

PostgreSQL supports `COMMENT ON`:

### Table comment
```sql
COMMENT ON TABLE orders IS 'Table storing customer orders';
```

### Column comments
```sql
COMMENT ON COLUMN orders.id IS 'Primary key, auto increment';
COMMENT ON COLUMN orders.order_number IS 'Unique order code, format: ORD-YYYYMMDD-XXXXX';
COMMENT ON COLUMN orders.status IS 'Status: pending|confirmed|shipped|cancelled';
```

### Querying comments
```sql
-- Table comment
SELECT obj_description('orders'::regclass);

-- Column comments
SELECT
    c.column_name,
    pgd.description
FROM information_schema.columns c
LEFT JOIN pg_catalog.pg_statio_all_tables st
    ON c.table_schema = st.schemaname AND c.table_name = st.relname
LEFT JOIN pg_catalog.pg_description pgd
    ON pgd.objoid = st.relid AND pgd.objsubid = c.ordinal_position
WHERE c.table_name = 'orders';
```

---

## Index Types

### B-Tree (default)
```sql
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_user_status_created ON orders(user_id, status, created_at DESC);
```

### Partial Index (very useful)
```sql
-- Only index non-deleted records
CREATE INDEX idx_orders_active ON orders(user_id, status)
WHERE deleted_at IS NULL;

-- Only index pending orders
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';
```

### GIN (for JSONB, Arrays, Full-text)
```sql
CREATE INDEX idx_orders_metadata ON orders USING GIN(metadata);
CREATE INDEX idx_products_tags ON products USING GIN(tags);

-- Full-text search
CREATE INDEX idx_products_search ON products
USING GIN(to_tsvector('english', name || ' ' || description));
```

### BRIN (for large tables with ordered data)
```sql
CREATE INDEX idx_logs_created ON logs USING BRIN(created_at);
```

---

## Sequences (Auto Increment)

```sql
-- SERIAL/BIGSERIAL auto-creates a sequence
CREATE TABLE users (id BIGSERIAL PRIMARY KEY);

-- Identity columns (PostgreSQL 10+)
CREATE TABLE users (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY
);

-- Custom sequence
CREATE SEQUENCE order_number_seq START 1000;
```

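One way the custom sequence might be consumed — the `ORD-...` format below is purely illustrative, not a convention from this guide:

```sql
-- Generate the next order number from the custom sequence
SELECT 'ORD-' || to_char(now(), 'YYYYMMDD') || '-' || nextval('order_number_seq') AS order_number;

-- Or wire it into a column default
ALTER TABLE orders
    ALTER COLUMN order_number SET DEFAULT
        'ORD-' || to_char(now(), 'YYYYMMDD') || '-' || nextval('order_number_seq');
```
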
---

## Auto update timestamp (requires trigger)

PostgreSQL **does not have** `ON UPDATE CURRENT_TIMESTAMP`; create a trigger instead:

```sql
-- Function
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Apply to table
CREATE TRIGGER trg_orders_updated_at
    BEFORE UPDATE ON orders
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();
```

---

## ENUM Type

```sql
-- Create enum type
CREATE TYPE order_status AS ENUM (
    'pending', 'confirmed', 'shipped', 'cancelled', 'completed'
);

-- Use in table
CREATE TABLE orders (
    status order_status NOT NULL DEFAULT 'pending'
);

-- Add new value
ALTER TYPE order_status ADD VALUE 'refunded' AFTER 'completed';
```

**Note:** Values cannot be removed from an ENUM type.

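If a value ever does have to go, the usual workaround is to swap in a replacement type — a sketch, assuming no rows still hold the value being dropped:

```sql
-- Recreate the type without the unwanted value, then swap it in
CREATE TYPE order_status_new AS ENUM ('pending', 'confirmed', 'shipped', 'cancelled', 'completed');

ALTER TABLE orders
    ALTER COLUMN status TYPE order_status_new
    USING status::text::order_status_new;

DROP TYPE order_status;
ALTER TYPE order_status_new RENAME TO order_status;
```
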
---

## Partitioning

### Range partitioning
```sql
CREATE TABLE fact_orders (
    id BIGSERIAL,
    order_date DATE NOT NULL,
    ...
) PARTITION BY RANGE (order_date);

CREATE TABLE fact_orders_2024 PARTITION OF fact_orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE TABLE fact_orders_default PARTITION OF fact_orders DEFAULT;
```

### Hash partitioning
```sql
CREATE TABLE logs (
    id BIGSERIAL,
    user_id BIGINT NOT NULL,
    ...
) PARTITION BY HASH (user_id);

CREATE TABLE logs_0 PARTITION OF logs FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE logs_1 PARTITION OF logs FOR VALUES WITH (MODULUS 4, REMAINDER 1);
```

---

## Example DDL

```sql
-- Create enum
CREATE TYPE order_status AS ENUM ('pending', 'confirmed', 'shipped', 'cancelled');

-- Create table
CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    order_number VARCHAR(50) NOT NULL,
    user_id BIGINT NOT NULL,
    status order_status NOT NULL DEFAULT 'pending',
    subtotal NUMERIC(18,2) NOT NULL DEFAULT 0,
    discount_amount NUMERIC(18,2) NOT NULL DEFAULT 0,
    total_amount NUMERIC(18,2) NOT NULL DEFAULT 0,
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

    CONSTRAINT uk_orders_number UNIQUE (order_number),
    CONSTRAINT fk_orders_user FOREIGN KEY (user_id) REFERENCES users(id)
);

-- Indexes
CREATE INDEX idx_orders_user ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_user_status_created ON orders(user_id, status, created_at DESC);
CREATE INDEX idx_orders_metadata ON orders USING GIN(metadata);

-- Comments
COMMENT ON TABLE orders IS 'Table storing order information';
COMMENT ON COLUMN orders.order_number IS 'Unique order code';
COMMENT ON COLUMN orders.status IS 'Order status';

-- Trigger for updated_at
CREATE TRIGGER trg_orders_updated_at
    BEFORE UPDATE ON orders
    FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();
```

---

## Checklist

- [ ] Use `TIMESTAMPTZ` instead of `TIMESTAMP`
- [ ] Use `JSONB` instead of `JSON`
- [ ] `COMMENT ON` for tables and columns
- [ ] Trigger for `updated_at`
- [ ] Partial indexes for soft delete / filtered queries
- [ ] GIN index for JSONB columns

244
.opencode/skills/databases/stacks/sqlite.md
Normal file
@@ -0,0 +1,244 @@
# SQLite Rules

Guidelines for designing schemas specific to SQLite.

---

## Data Types

SQLite has **dynamic typing** with only 5 storage classes:

| Storage Class | Use case | Notes |
|---------------|----------|-------|
| `INTEGER` | PK, FK, counts, booleans | |
| `REAL` | Floats, decimals | |
| `TEXT` | Strings, dates, JSON, enums | |
| `BLOB` | Binary data | |
| `NULL` | Null values | |

### Type affinity
SQLite does not strictly enforce types; a column's declared type is only a "hint" (its affinity):

```sql
-- All of these inserts succeed
CREATE TABLE test (price REAL);
INSERT INTO test VALUES (100);      -- coerced to REAL (100.0)
INSERT INTO test VALUES (99.99);    -- stored as REAL
INSERT INTO test VALUES ('free');   -- stored as TEXT (!)
```

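If you want real type enforcement, newer SQLite versions offer STRICT tables — a brief sketch, requires SQLite 3.37+:

```sql
-- STRICT tables reject values that do not match the declared type
CREATE TABLE prices_strict (price REAL) STRICT;

INSERT INTO prices_strict VALUES (99.99);   -- OK
INSERT INTO prices_strict VALUES ('free');  -- Error: cannot store TEXT value in REAL column
```
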
### Date/Time
SQLite **does not have** a DATE/TIME type. Store dates as TEXT in ISO 8601 format:

```sql
created_at TEXT DEFAULT (datetime('now'))
-- Format: YYYY-MM-DD HH:MM:SS
```

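Because ISO 8601 strings sort lexicographically in time order, ordinary comparisons and SQLite's date functions work directly on these TEXT columns — a small illustration, assuming the `orders` table used throughout:

```sql
-- Range filter and ordering work on ISO-formatted TEXT dates
SELECT * FROM orders
WHERE created_at >= datetime('now', '-7 days')
ORDER BY created_at DESC;

-- Grouping by month with strftime
SELECT strftime('%Y-%m', created_at) AS month, COUNT(*) AS order_count
FROM orders
GROUP BY month;
```
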
---

## Metadata Tables (REQUIRED)

SQLite **does not support comments** on tables and columns, so keep descriptions in metadata tables:

### Create metadata tables

```sql
CREATE TABLE IF NOT EXISTS metadata_tables (
    table_name TEXT PRIMARY KEY,
    description TEXT NOT NULL,
    created_at TEXT DEFAULT (datetime('now')),
    updated_at TEXT DEFAULT (datetime('now'))
);

CREATE TABLE IF NOT EXISTS metadata_columns (
    table_name TEXT NOT NULL,
    column_name TEXT NOT NULL,
    description TEXT NOT NULL,
    data_type TEXT,
    is_nullable INTEGER DEFAULT 1, -- 0 = NOT NULL, 1 = nullable
    default_value TEXT,
    created_at TEXT DEFAULT (datetime('now')),
    PRIMARY KEY (table_name, column_name),
    FOREIGN KEY (table_name) REFERENCES metadata_tables(table_name) ON DELETE CASCADE
);

CREATE INDEX idx_metadata_columns_table ON metadata_columns(table_name);
```

### Insert metadata after creating a table

```sql
-- Table metadata
INSERT INTO metadata_tables (table_name, description)
VALUES ('orders', 'Table storing customer order information');

-- Column metadata
INSERT INTO metadata_columns (table_name, column_name, description, data_type, is_nullable)
VALUES
    ('orders', 'id', 'Primary key, auto increment', 'INTEGER', 0),
    ('orders', 'order_number', 'Unique order code', 'TEXT', 0),
    ('orders', 'user_id', 'FK to users.id', 'INTEGER', 0),
    ('orders', 'status', 'pending|confirmed|shipped|cancelled', 'TEXT', 0),
    ('orders', 'total_amount', 'Total value', 'REAL', 0);
```

### Query metadata

```sql
-- View schema with descriptions
SELECT
    t.table_name,
    t.description AS table_description,
    c.column_name,
    c.description AS column_description,
    c.data_type
FROM metadata_tables t
LEFT JOIN metadata_columns c ON t.table_name = c.table_name
ORDER BY t.table_name, c.column_name;
```

---

## Auto Increment

```sql
-- Option 1: INTEGER PRIMARY KEY (recommended)
-- SQLite auto-increments INTEGER PRIMARY KEY automatically
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT
);

-- Option 2: AUTOINCREMENT keyword
-- Guarantees deleted IDs are never reused (slower)
CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT
);
```

---

## Foreign Keys

**IMPORTANT**: Foreign key enforcement is disabled by default in SQLite!

```sql
-- Enable FK enforcement for each connection
PRAGMA foreign_keys = ON;

-- Create table with FKs
CREATE TABLE order_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE,
    FOREIGN KEY (product_id) REFERENCES products(id) ON DELETE RESTRICT
);

-- Index the FK columns
CREATE INDEX idx_order_items_order ON order_items(order_id);
CREATE INDEX idx_order_items_product ON order_items(product_id);
```

---

## ENUM substitute

SQLite does not have ENUM. Use a CHECK constraint instead:

```sql
CREATE TABLE orders (
    status TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'confirmed', 'shipped', 'cancelled'))
);
```

---

## Index Types

SQLite only supports B-Tree indexes:

```sql
-- Standard index
CREATE INDEX idx_orders_status ON orders(status);

-- Composite index
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Unique index
CREATE UNIQUE INDEX idx_orders_number ON orders(order_number);

-- Partial index (SQLite 3.8.0+)
CREATE INDEX idx_orders_active ON orders(user_id)
WHERE deleted_at IS NULL;
```

---

## Limitations

| Feature | Status | Workaround |
|---------|--------|------------|
| `ALTER COLUMN` | ❌ Not supported | Recreate table |
| `DROP COLUMN` | ✅ SQLite 3.35+ | Recreate table (older versions) |
| Comments | ❌ Not supported | Metadata tables |
| ENUM | ❌ Not supported | CHECK constraint |
| Stored procedures | ❌ Not supported | App logic |
| Concurrent writes | ⚠️ Limited | Single writer |

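The "recreate table" workaround mentioned above follows a standard shape — a sketch, assuming you want to change the definition of `orders.status`:

```sql
-- Recreate-table pattern for an unsupported ALTER COLUMN
PRAGMA foreign_keys = OFF;   -- avoid FK errors while swapping tables
BEGIN TRANSACTION;

CREATE TABLE orders_new (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending'   -- new definition goes here
    -- ...copy the remaining column definitions unchanged
);

INSERT INTO orders_new (id, status)
SELECT id, status FROM orders;

DROP TABLE orders;
ALTER TABLE orders_new RENAME TO orders;

COMMIT;
PRAGMA foreign_keys = ON;
```
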
---

## Example DDL

```sql
-- Enable foreign keys
PRAGMA foreign_keys = ON;

-- Create table
CREATE TABLE orders (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    order_number TEXT NOT NULL,
    user_id INTEGER NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'confirmed', 'shipped', 'cancelled')),
    subtotal REAL NOT NULL DEFAULT 0,
    discount_amount REAL NOT NULL DEFAULT 0,
    total_amount REAL NOT NULL DEFAULT 0,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now')),
    deleted_at TEXT,

    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE RESTRICT
);

-- Indexes
CREATE UNIQUE INDEX idx_orders_number ON orders(order_number);
CREATE INDEX idx_orders_user ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_updated ON orders(updated_at);

-- Metadata
INSERT INTO metadata_tables (table_name, description)
VALUES ('orders', 'Table storing order information');

INSERT INTO metadata_columns (table_name, column_name, description, data_type, is_nullable)
VALUES
    ('orders', 'id', 'Primary key', 'INTEGER', 0),
    ('orders', 'order_number', 'Unique order code', 'TEXT', 0),
    ('orders', 'user_id', 'FK to users.id', 'INTEGER', 0),
    ('orders', 'status', 'pending|confirmed|shipped|cancelled', 'TEXT', 0),
    ('orders', 'total_amount', 'Total payment', 'REAL', 0);
```

---

## Checklist

- [ ] `PRAGMA foreign_keys = ON` at start of each connection
- [ ] Metadata tables created
- [ ] INSERT metadata after each CREATE TABLE
- [ ] Use `TEXT` for dates (ISO format: YYYY-MM-DD HH:MM:SS)
- [ ] CHECK constraint instead of ENUM
- [ ] Index for FK columns
- [ ] Update metadata when ALTER/DROP table

176
.opencode/skills/databases/transactional.md
Normal file
@@ -0,0 +1,176 @@
# Transactional (OLTP) Rules

> **Note:** Core naming conventions, workflow, and checklist are in `SKILL.md` or `db-design.md` (always loaded).

Guidelines for designing schemas for day-to-day business operations.

---

## Normalization Principles

### Prefer 3NF (Third Normal Form)

- Each table represents one clear entity or relationship
- Don't repeat information that can be referenced via FK instead
- Keep a clear separation (sketched below):
  - `orders` (header) vs `order_items` (line items)
  - `products` vs `product_variants`, `product_prices`

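A minimal sketch of the header vs line-item split described above — column names beyond the FK pairs are illustrative:

```sql
-- Order header: one row per order, no product data duplicated here
CREATE TABLE orders (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id BIGINT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (user_id) REFERENCES users(id)
);

-- Line items: one row per product in an order; product details stay in products
CREATE TABLE order_items (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    order_id BIGINT NOT NULL,
    product_id BIGINT NOT NULL,
    quantity INT NOT NULL,
    unit_price DECIMAL(18,2) NOT NULL,   -- price captured at time of sale
    FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE,
    FOREIGN KEY (product_id) REFERENCES products(id) ON DELETE RESTRICT
);
```
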
### Foreign Key Constraints

Use FKs with the appropriate ON DELETE / ON UPDATE behavior:

```sql
-- Cascade: delete order → delete order_items
FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE

-- Restrict: cannot delete user if orders exist
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE RESTRICT

-- Set null: delete category → product.category_id = NULL
FOREIGN KEY (category_id) REFERENCES categories(id) ON DELETE SET NULL
```

---

## Indexing Rules

### 1. Primary Key
- Usually `BIGINT` auto-increment or a UUID (see the UUID sketch below)
- Format: `PRIMARY KEY (id)`

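Where a UUID key is preferred (for example, keys generated by the application or exposed publicly), one hedged sketch — PostgreSQL syntax, where `gen_random_uuid()` is built in from version 13:

```sql
-- UUID primary key variant (older PostgreSQL versions need the pgcrypto extension first)
CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_number VARCHAR(50) NOT NULL UNIQUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
```
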
### 2. Foreign Key Indexes
**IMPORTANT**: Create indexes for ALL foreign keys so JOINs stay efficient:
```sql
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);
```

### 3. Frequently Filtered Columns
Index columns commonly used in WHERE clauses:
- `status`, `created_at`, `updated_at`
- Code/reference columns: `order_number`, `sku`

### 4. Composite Indexes
Based on actual query patterns:
```sql
-- Query: WHERE user_id = ? AND status = ? ORDER BY created_at DESC
CREATE INDEX idx_orders_user_status_created ON orders(user_id, status, created_at DESC);

-- Query: WHERE store_id = ? AND created_at BETWEEN ...
CREATE INDEX idx_orders_store_created ON orders(store_id, created_at);
```

**Composite index rules:**
- Put columns with **high selectivity** (fewer duplicate values) first
- Avoid duplicate/redundant indexes
- The index should cover the query's WHERE + ORDER BY (verify with EXPLAIN, as sketched below)

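One way to verify the composite index is actually picked up — plain `EXPLAIN` works in both MySQL and PostgreSQL, though the output format differs by engine:

```sql
-- Should report idx_orders_user_status_created rather than a full table scan
EXPLAIN
SELECT id, order_number, total_amount
FROM orders
WHERE user_id = 42 AND status = 'pending'
ORDER BY created_at DESC
LIMIT 20;
```
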
### 5. Unique Constraints
```sql
UNIQUE (order_number)
UNIQUE (sku)
UNIQUE (user_id, email)  -- compound unique
```

---

## Soft Delete Pattern

When you need to keep deleted data instead of permanently deleting:

```sql
-- Add deleted_at column
deleted_at TIMESTAMP NULL

-- Partial index for non-deleted records (PostgreSQL)
CREATE INDEX idx_orders_active ON orders(user_id, status)
WHERE deleted_at IS NULL;

-- Query only active records
SELECT * FROM orders WHERE deleted_at IS NULL;
```

---

## Anti-patterns to Avoid

### Missing FK Index
```sql
-- ❌ BAD: FK without index → slow JOINs
FOREIGN KEY (user_id) REFERENCES users(id)
-- Forgot CREATE INDEX
```

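For contrast, the corrected version (the same pattern as rule 2 above):

```sql
-- ✅ GOOD: declare the FK and index the referencing column
FOREIGN KEY (user_id) REFERENCES users(id)
-- ...and in a separate statement:
CREATE INDEX idx_orders_user_id ON orders(user_id);
```
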
### Over-indexing
```sql
-- ❌ BAD: Indexing each column separately
CREATE INDEX idx_a ON orders(user_id);
CREATE INDEX idx_b ON orders(status);
CREATE INDEX idx_c ON orders(created_at);

-- ✅ GOOD: Composite index based on query pattern
CREATE INDEX idx_orders_user_status_created ON orders(user_id, status, created_at DESC);
```

### Using TEXT instead of ENUM
```sql
-- ❌ BAD: Cannot validate values
status TEXT

-- ✅ GOOD: Use ENUM or CHECK
status ENUM('pending', 'confirmed', 'shipped', 'cancelled')
-- or
status VARCHAR(32) CHECK (status IN ('pending', 'confirmed', 'shipped'))
```

### Missing Audit Columns
```sql
-- ❌ BAD
CREATE TABLE products (id INT, name VARCHAR(255));

-- ✅ GOOD
CREATE TABLE products (
    id INT,
    name VARCHAR(255),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
```

---

## Example DDL

```sql
CREATE TABLE orders (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    order_number VARCHAR(50) NOT NULL,
    user_id BIGINT NOT NULL,
    status VARCHAR(32) NOT NULL DEFAULT 'pending',
    subtotal DECIMAL(18,2) NOT NULL DEFAULT 0,
    discount_amount DECIMAL(18,2) NOT NULL DEFAULT 0,
    total_amount DECIMAL(18,2) NOT NULL DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

    UNIQUE (order_number),
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE RESTRICT
);

CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_user_status_created ON orders(user_id, status, created_at DESC);
```

---

## Checklist

- [ ] Audit columns: `created_at`, `updated_at`
- [ ] All FKs have indexes
- [ ] Unique constraints for business keys (`order_number`, `sku`, `email`)
- [ ] ENUM or CHECK for status/type columns
- [ ] Composite index based on main query patterns
- [ ] Soft delete if needed: `deleted_at` + partial index
