# Analytics (OLAP) Rules > **Note:** Core naming conventions, workflow, and checklist are in `SKILL.md` or `db-design.md` (always loaded). Guidelines for designing schemas for statistics and reporting tables. --- ## General Principles - **Separate** from transactional tables - don't mix analytics logic into business tables - When heavy analytics queries/aggregations repeat → create separate tables - Use **Star Schema**: Fact tables at center, Dimension tables around --- ## Design Process ### 1. Analyze Statistics Requirements Ask user to clarify: - **Analysis dimensions**: by date, by customer, by product, by channel, by region? - **Granularity**: per order, per item, per day, per month? - **Metrics**: order_count, revenue, margin, conversion_rate, avg_order_value? ### 2. Define Fact Granularity **Important**: What does 1 row in fact table represent? | Fact Table | Granularity | Use case | |------------|-------------|----------| | `fact_orders` | 1 row = 1 order | Statistics by order | | `fact_order_items` | 1 row = 1 order item | Statistics by product | | `fact_daily_sales` | 1 row = 1 day + store | Daily summary | ### 3. Identify Required Dimensions Create separate dim table when: - Reused in multiple places - Has many descriptive attributes - Subject to slow changes (Slowly Changing Dimension) --- ## Fact Tables ### Fact table structure ```sql CREATE TABLE fact_orders ( fact_id BIGINT PRIMARY KEY AUTO_INCREMENT, -- Dimension keys date_key INT NOT NULL, -- FK to dim_date customer_key BIGINT NOT NULL, -- FK to dim_customer store_key INT, channel_key INT, -- Degenerate dimensions (no separate dim needed) order_id BIGINT NOT NULL, order_number VARCHAR(50), -- Measures item_count INT NOT NULL, gross_amount DECIMAL(18,2) NOT NULL, discount_amount DECIMAL(18,2) DEFAULT 0, net_amount DECIMAL(18,2) NOT NULL, INDEX idx_fact_orders_date (date_key), INDEX idx_fact_orders_customer (customer_key), INDEX idx_fact_orders_date_store (date_key, store_key) ); ``` --- ## Dimension Tables ### dim_date (required for every analytics schema) ```sql CREATE TABLE dim_date ( date_key INT PRIMARY KEY, -- Format: YYYYMMDD (20241215) full_date DATE NOT NULL, year INT NOT NULL, quarter INT NOT NULL, -- 1-4 month INT NOT NULL, -- 1-12 month_name VARCHAR(20), -- 'January', 'February' week_of_year INT NOT NULL, day_of_month INT NOT NULL, day_of_week INT NOT NULL, -- 1=Monday, 7=Sunday day_name VARCHAR(20), is_weekend BOOLEAN NOT NULL, is_holiday BOOLEAN DEFAULT FALSE, UNIQUE (full_date) ); -- Pre-populate for multiple years (2020-2030) ``` ### dim_customer ```sql CREATE TABLE dim_customer ( customer_key BIGINT PRIMARY KEY AUTO_INCREMENT, -- Surrogate key customer_id BIGINT NOT NULL, -- Natural key from users customer_name VARCHAR(255), email VARCHAR(255), segment VARCHAR(50), -- 'VIP', 'Regular', 'New' city VARCHAR(100), region VARCHAR(100), first_order_date DATE, -- SCD Type 2 columns (if history needed) effective_from DATE NOT NULL, effective_to DATE, is_current BOOLEAN DEFAULT TRUE, INDEX idx_dim_customer_id (customer_id), INDEX idx_dim_customer_current (is_current, customer_id) ); ``` --- ## Summary Tables (Pre-aggregated) When pre-aggregation needed for dashboard performance: ```sql CREATE TABLE summary_daily_sales ( id BIGINT PRIMARY KEY AUTO_INCREMENT, date_key INT NOT NULL, store_key INT, channel_key INT, -- Pre-aggregated measures order_count INT NOT NULL, item_count INT NOT NULL, gross_revenue DECIMAL(18,2) NOT NULL, net_revenue DECIMAL(18,2) NOT NULL, unique_customers INT NOT NULL, avg_order_value DECIMAL(18,2), UNIQUE (date_key, store_key, channel_key), INDEX idx_summary_date (date_key) ); ``` --- ## Slowly Changing Dimensions (SCD) ### Type 1 - Overwrite Overwrite old value, no history kept: ```sql UPDATE dim_customer SET segment = 'VIP' WHERE customer_id = 123; ``` ### Type 2 - Add new row (Recommended when history needed) ```sql -- 1. Close old row UPDATE dim_customer SET effective_to = CURRENT_DATE - 1, is_current = FALSE WHERE customer_id = 123 AND is_current = TRUE; -- 2. Add new row INSERT INTO dim_customer (customer_id, segment, effective_from, is_current) VALUES (123, 'VIP', CURRENT_DATE, TRUE); ``` --- ## Indexing for Analytics ### Fact tables - Index FKs to dimensions: `date_key`, `customer_key`, `product_key` - Composite index based on query patterns: `INDEX (date_key, store_key)` ### Dimension tables - PK: surrogate key - Index natural key: `customer_id`, `product_id` - Index for SCD: `(is_current, customer_id)` --- ## Naming Convention - Fact tables: `fact_*` or `fct_*` - Dimension tables: `dim_*` - Summary tables: `summary_*` or `agg_*` --- ## Checklist - [ ] Granularity defined for each fact table - [ ] dim_date exists or created (pre-populate multiple years) - [ ] Surrogate keys for dimensions - [ ] Index FKs in fact tables - [ ] SCD strategy for changing dimensions (Type 1 or Type 2) - [ ] Naming: `fact_*`, `dim_*`, `summary_*` - [ ] Refresh strategy: see [incremental-etl.md](incremental-etl.md)