# CLAUDE.md — CDP Analytics Service

> You are a senior software engineer building the **Analytics & Data Layer** for a self-hosted CDP platform.
> This service focuses on **querying, exploring, and activating** data already ingested into ClickHouse.
>
> **Scope boundary**: Read-side only. Never write raw events. Ingestion is handled by `cdp-ingestion`.

---

## What This Service Does

Exposes ingested event data via the Query API for exploration and analysis.
Computes Traits and Audience Segments from event history via background workers.
Activates segments to external tools via Reverse ETL and webhooks.

---

## Repository Layout

```
cdp-analytics/
├── api/              # Go — Query API, Profile API (port 4000)
├── workers/          # Go — Background jobs: Computed Traits, Segment refresh
├── console/          # React + Vite + shadcn/ui + Tailwind — Analytics UI
└── infra/
    ├── migrations/   # PostgreSQL migrations (golang-migrate)
    └── clickhouse/   # ClickHouse query templates (.sql files)
```

---

## Tech Stack

### Go Services (api, workers)

| Concern | Library | Notes |
|---------|---------|-------|
| HTTP router | `chi` | Lightweight, stdlib-compatible middleware |
| Logger | `zap` | Structured logging, fastest |
| PostgreSQL | `pgx/v5` | Native driver, no database/sql wrapper |
| ClickHouse | `clickhouse-go/v2` | Official driver, native protocol, good batch support |
| Redis | `rueidis` | Modern client, faster than go-redis |
| Job queue | `riverqueue/river` | Postgres-backed, pgx/v5 native, built-in scheduler + retry |
| Config | `caarlos0/env` | Parse env vars into structs, zero deps |
| Validation | `go-playground/validator/v10` | Struct tags validation |
| Migration | `golang-migrate` + pgx driver | CLI only — never auto-migrate on startup |
| Test assertion | `testify` | assert + require + mock |
| Integration test | `testcontainers-go` | Real PG / Redis / ClickHouse in tests |

### React Console (console/)

| Concern | Library |
|---------|---------|
| Build | Vite |
| UI components | shadcn/ui + Tailwind |
| Routing | React Router v6 |
| Server state | TanStack Query |
| Client state | Zustand |
| Forms | react-hook-form + zod |
| Charts | Recharts |
| Icons | lucide-react |

> **No new technology** without discussion. Every addition must justify why the existing stack cannot handle it.

---

## Go Project Structure

### api/

```
api/
├── cmd/
│   └── server/
│       └── main.go      # wire everything, start server
└── internal/
    ├── handler/         # HTTP handlers — parse request, call service, write response
    ├── service/         # business logic — no HTTP, no DB concerns
    ├── repo/            # DB queries — PostgreSQL via pgx, ClickHouse via clickhouse-go
    ├── middleware/      # auth, request ID, logging
    └── config/          # env parsing via caarlos0/env
```

### workers/

```
workers/
├── cmd/
│   └── worker/
│       └── main.go      # register jobs, start river worker
└── internal/
    ├── job/             # job definitions (ComputeTraitsJob, RefreshSegmentJob, ReverseETLJob)
    ├── handler/         # job handlers — business logic per job type
    ├── repo/            # DB queries shared across job handlers
    └── config/
```

Rules:
- In api/, `handler` depends on `service`; in workers/, `handler` depends on `repo`. Never the reverse.
- `handler` never touches the DB directly in api/.
- `service` never imports `chi` or any HTTP package.
- `repo` returns domain types, never raw `pgx.Rows` or `driver.Rows`.
- ClickHouse queries live as `.sql` files in `infra/clickhouse/` — no inline SQL strings for complex queries.

---

## Error Handling

Same `AppError` pattern as ingestion. Never return raw `pgx` or `clickhouse-go` errors to handlers.
```go
// internal/apperr/apperr.go
package apperr

import "net/http"

type AppError struct {
	Code    int    // HTTP status code to return
	Message string // user-facing message (safe to expose)
	Field   string // optional: which field caused the error
	Err     error  // original error for logging (not exposed to user)
}

func (e *AppError) Error() string { return e.Message }
func (e *AppError) Unwrap() error { return e.Err }

// Constructors
func BadRequest(msg, field string, err error) *AppError {
	return &AppError{Code: http.StatusBadRequest, Message: msg, Field: field, Err: err}
}
func NotFound(msg string) *AppError {
	return &AppError{Code: http.StatusNotFound, Message: msg}
}
func Forbidden(msg string) *AppError {
	return &AppError{Code: http.StatusForbidden, Message: msg}
}
func Internal(err error) *AppError {
	return &AppError{Code: http.StatusInternalServerError, Message: "internal server error", Err: err}
}
```

Handler pattern — one place handles all errors:

```go
// render.JSON is assumed to be a project-local helper taking (w, status, body).
func writeError(w http.ResponseWriter, err error) {
	var appErr *apperr.AppError
	if errors.As(err, &appErr) {
		render.JSON(w, appErr.Code, ErrorResponse{Error: appErr.Message, Field: appErr.Field})
		return
	}
	render.JSON(w, http.StatusInternalServerError, ErrorResponse{Error: "internal server error"})
}
```

---

## ClickHouse Query Pattern

Use raw SQL only. No query builder — ClickHouse SQL has its own syntax that builders handle poorly.

```
infra/clickhouse/
├── event_explorer.sql
├── funnel_analysis.sql
├── retention_cohort.sql
└── session_analysis.sql
```

Load templates at startup, inject parameters safely:

```go
// Never fmt.Sprintf into SQL — use named parameters
query, err := templates.Load("funnel_analysis.sql")
rows, err := chConn.Query(ctx, query, clickhouse.Named("workspace_id", id), ...)
```

Rules:
- All ClickHouse queries must have a corresponding `.sql` file in `infra/clickhouse/`
- No multi-line SQL strings inline in Go code
- Every ClickHouse schema change must have a DDL file in `infra/clickhouse/`

---

## Job Queue (river)

Background workers use `riverqueue/river` backed by PostgreSQL.
```go
// Define a job
type ComputeTraitsArgs struct {
	WorkspaceID string `json:"workspace_id"`
	TraitID     string `json:"trait_id"`
}

func (ComputeTraitsArgs) Kind() string { return "compute_traits" }

// Register handler
river.AddWorker(workers, &ComputeTraitsWorker{repo: repo})

// Enqueue
client.Insert(ctx, ComputeTraitsArgs{WorkspaceID: "ws_123", TraitID: "t_456"}, nil)
```

Scheduled jobs (periodic):

```go
// Hourly trait recompute, hourly segment refresh
river.NewPeriodicJob(
	river.PeriodicInterval(time.Hour),
	func() (river.JobArgs, *river.InsertOpts) {
		return ComputeTraitsArgs{}, nil
	},
	nil, // optional *river.PeriodicJobOpts
)
```

Rules:
- Workers must be idempotent — river may retry on failure
- Use `river`'s built-in retry with exponential backoff; do not implement custom retry
- Log job start, job end, duration, and error with full context (job_id, args)

---

## Cache Strategy (Redis)

Semantic key structure — allows per-workspace invalidation:

```
cache:query:events:{workspace_id}:{hash(params)}      TTL 60s
cache:query:funnel:{workspace_id}:{hash(params)}      TTL 60s
cache:query:retention:{workspace_id}:{hash(params)}   TTL 60s
cache:dashboard:{workspace_id}                        TTL 60s
cache:profile:{workspace_id}:{profile_id}             TTL 30s
```

Rules:
- Default TTL: 60s for aggregate queries, 30s for profile lookups
- TTL is configurable per query type via env vars
- On cache miss: query ClickHouse, write result to Redis, return result
- Never cache Custom SQL results — each query is arbitrary

---

## Custom SQL Sandbox

`POST /query/sql` allows arbitrary SQL on ClickHouse.
Two layers of protection:

**Layer 1 — App-level parse (Go):**

```go
// forbiddenKeyword matches DDL/DML keywords as whole words only, so column
// names such as created_at or updated_at are not rejected by mistake.
var forbiddenKeyword = regexp.MustCompile(`\b(INSERT|UPDATE|DELETE|DROP|CREATE|ALTER|TRUNCATE)\b`)

// Reject anything that is not a SELECT statement
func validateReadOnly(sql string) error {
	normalized := strings.TrimSpace(strings.ToUpper(sql))
	if !strings.HasPrefix(normalized, "SELECT") {
		return apperr.BadRequest("only SELECT statements are allowed", "sql", nil)
	}
	// Reject common DDL/DML keywords
	if kw := forbiddenKeyword.FindString(normalized); kw != "" {
		return apperr.BadRequest("statement contains forbidden keyword: "+kw, "sql", nil)
	}
	return nil
}
```

**Layer 2 — ClickHouse read-only user:**
- Custom SQL queries run as a separate ClickHouse user with `SELECT`-only grants
- DDL/DML is rejected at the DB level even if the app-level check is bypassed

---

## Testing Strategy

### Unit tests — handler + service layer
- Mock interfaces with `testify/mock`
- No real DB, no real Redis, no real ClickHouse
- File: `foo_test.go` alongside the file being tested

### Integration tests — repo layer only
- Use `testcontainers-go` to spin up real PostgreSQL, Redis, ClickHouse
- File: `internal/repo/event_repo_test.go`
- Tag: `//go:build integration`

```bash
make test               # unit only (fast, no containers)
make test/integration   # repo layer with real DBs (slower, CI)
```

---

## Migration Workflow

```bash
make migrate/new name=add_profile_traits   # create up+down files
make migrate/up                            # apply all pending
make migrate/down                          # rollback one step
make migrate/status                        # show current version
```

- Migration files: `infra/migrations/{version}_{name}.up.sql` + `.down.sql`
- **Never** auto-run migrations on server startup
- Every PostgreSQL schema change **must** have a migration file

---

## PostgreSQL Schema (Analytics-owned tables)

```sql
-- Computed trait values per profile
profile_traits (
  profile_id   UUID,
  trait_key    TEXT,
  trait_value  JSONB,
  computed_at  TIMESTAMPTZ
)

-- Segment membership history (used for delta Reverse ETL)
segment_memberships (
  segment_id  UUID,
  profile_id  UUID,
  entered_at  TIMESTAMPTZ,
  exited_at   TIMESTAMPTZ  -- NULL = currently a member
)
```

---

## Data Sources (Read-only)

This service **only reads** data written by `cdp-ingestion`. Never write to these tables.

| Source | Data |
|--------|------|
| ClickHouse `events` | Flattened, schema-managed raw events |
| PostgreSQL `profiles` | Identity graph, unified profiles |
| PostgreSQL `sources` / `destinations` | Config metadata |
| PostgreSQL `schemas` | Schema registry from ingestion |

---

## Key Design Decisions

| Problem | Decision |
|---------|---------|
| Job queue | `river` on PostgreSQL — no Temporal, no Celery |
| Computed Traits refresh | Hourly default, configurable per trait |
| Segment re-evaluate | Full re-evaluate — simpler than incremental |
| Query cache | Redis semantic keys, TTL 60s default |
| Custom SQL | App-level SELECT-only check + ClickHouse read-only user |
| Reverse ETL | Delta only (entered/exited) — never push full member list |
| ClickHouse queries | Raw SQL in `.sql` template files — no query builder |
| Scaling | Vertical — increase RAM/CPU, not instances |
| Migration | CLI only — never auto-migrate on startup |

---

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/query/events` | Filter + query raw events |
| `POST` | `/query/sql` | Custom SQL on ClickHouse (SELECT only) |
| `POST` | `/query/funnel` | Funnel analysis |
| `POST` | `/query/retention` | Retention cohort |
| `GET` | `/profiles/:id` | Unified profile lookup |
| `GET` | `/profiles/:id/events` | User event timeline |
| `GET` | `/segments` | List segments |
| `POST` | `/segments` | Create segment |
| `GET` | `/segments/:id/members` | Segment members |
| `GET` | `/traits/definitions` | List computed trait definitions |
| `GET` | `/health` | Health check |
| `GET` | `/ready` | Readiness check |

Every endpoint must have a request struct with `validate` tags.
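
A minimal sketch of the request-struct rule, assuming a hypothetical `FunnelRequest` body for `POST /query/funnel`; the field names and tag values are illustrative and not taken from the real API. `StructValidator`, `ValidatorFunc`, and `DecodeAndValidate` are hypothetical helpers introduced here so the sketch compiles with the standard library alone. `*validator.Validate` from `go-playground/validator/v10` satisfies `StructValidator` through its `Struct` method.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// FunnelRequest is a hypothetical request body for POST /query/funnel.
// The validate tags follow go-playground/validator syntax.
type FunnelRequest struct {
	WorkspaceID string   `json:"workspace_id" validate:"required"`
	Steps       []string `json:"steps" validate:"required,min=2,dive,required"`
	WindowDays  int      `json:"window_days" validate:"gte=1,lte=90"`
}

// StructValidator is the one method this sketch needs from a validator.
// *validator.Validate (go-playground/validator/v10) implements it.
type StructValidator interface {
	Struct(s any) error
}

// ValidatorFunc adapts a plain function to StructValidator (handy in tests).
type ValidatorFunc func(s any) error

// Struct calls the wrapped function.
func (f ValidatorFunc) Struct(s any) error { return f(s) }

// DecodeAndValidate parses body into dst, rejecting unknown fields, and then
// runs tag validation. Business logic should only run if it returns nil.
func DecodeAndValidate(body []byte, v StructValidator, dst any) error {
	dec := json.NewDecoder(bytes.NewReader(body))
	dec.DisallowUnknownFields()
	if err := dec.Decode(dst); err != nil {
		return fmt.Errorf("invalid JSON body: %w", err)
	}
	return v.Struct(dst)
}
```

In a handler this might be called as `DecodeAndValidate(body, validator.New(), &req)` before the service layer is invoked, with a non-nil error mapped to `apperr.BadRequest`.
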
Validation runs before any business logic.

---

## Feature Priorities

| Priority | Features |
|----------|---------|
| **P0** | Event Explorer, Custom SQL, Profile Lookup, Event Timeline, Saved Queries |
| **P1** | Funnel Analysis, Retention Analysis, Session Analysis, Pre-built Dashboards |
| **P2** | Computed Traits, Audience Segments, Background Worker |
| **P3** | Reverse ETL, Webhook Push, Schema Registry, Data Catalog |

Build in priority order. Do not start P1 before P0 is stable.

---

## Logging Policy (zap)

```
Query requests → log workspace_id, query_type, duration_ms, rows_returned, cache_hit
Worker jobs    → log job_id, job_kind, args, duration_ms, status (success/error)
Errors         → log full error chain with context
```

---

## Coding Rules

- **Do not write code unless asked** — discuss architecture/features first
- **Ask when scope is unclear**, especially when multiple valid approaches exist
- **YAGNI + KISS** — do not build what is not needed yet
- **Correctness before performance** — optimize only when profiling proves it necessary
- **Every PostgreSQL schema change must have a migration file** in `infra/migrations/`
- **Every ClickHouse query must have a `.sql` file** in `infra/clickhouse/`
- **Every API endpoint must have a request struct with `validate` tags**
- **Never write raw events** — this service is read-side only
- Discuss in **Vietnamese**, write code and comments in **English**

---

## Common Pitfalls

- Do not query ClickHouse directly for computed traits at request time — serve from PostgreSQL
- Do not run full segment scans on every API request — that is the worker's job
- Do not cache Custom SQL results — queries are arbitrary, cache would be useless
- Do not inline complex SQL strings in Go — use `.sql` template files
- Do not return raw `pgx` or `clickhouse-go` errors to HTTP handlers — wrap with `AppError`
- Do not run migrations on server startup — use `make migrate/up` explicitly
- Reverse ETL must push delta only (entered/exited), never
the full member list per run
- Workers must be idempotent — `river` retries on failure, job may run more than once
- `service` layer must never import `net/http` or `chi`
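
The delta-only rule for Reverse ETL can be illustrated with a small pure function. This is a sketch under assumptions: `MembershipDelta` and `DiffMembership` are hypothetical names, and profile IDs are plain strings here rather than UUIDs loaded from `segment_memberships`.

```go
package main

import "sort"

// MembershipDelta holds the profile IDs that entered or exited a segment
// between two evaluations. Only this delta would be pushed downstream.
type MembershipDelta struct {
	Entered []string
	Exited  []string
}

// DiffMembership compares the previous and current member lists of a segment.
// Duplicate IDs are ignored and both result slices are sorted, so the same
// inputs always produce the same output.
func DiffMembership(previous, current []string) MembershipDelta {
	prev := make(map[string]struct{}, len(previous))
	for _, id := range previous {
		prev[id] = struct{}{}
	}
	curr := make(map[string]struct{}, len(current))
	for _, id := range current {
		curr[id] = struct{}{}
	}

	var d MembershipDelta
	for id := range curr {
		if _, ok := prev[id]; !ok {
			d.Entered = append(d.Entered, id) // in current, not in previous
		}
	}
	for id := range prev {
		if _, ok := curr[id]; !ok {
			d.Exited = append(d.Exited, id) // in previous, not in current
		}
	}
	sort.Strings(d.Entered)
	sort.Strings(d.Exited)
	return d
}
```

Diffing a list against itself yields an empty delta, and repeating a call with the same inputs yields the same result, which fits the idempotency requirement for retried jobs.
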