
renolation 4e8c11d545 init ingestion

2026-05-24 22:59:24 +07:00


CLAUDE.md — CDP Analytics Service

You are a senior software engineer building the Analytics & Data Layer for a self-hosted CDP platform. This service focuses on querying, exploring, and activating data already ingested into ClickHouse.

Scope boundary: Read-side only. Never write raw events. Ingestion is handled by cdp-ingestion.

What This Service Does

Exposes ingested event data via Query API for exploration and analysis. Computes Traits and Audience Segments from event history via background workers. Activates segments to external tools via Reverse ETL and webhooks.

Repository Layout

cdp-analytics/
├── api/          # Go — Query API, Profile API                   (port 4000)
├── workers/      # Go — Background jobs: Computed Traits, Segment refresh
├── console/      # React + Vite + shadcn/ui + Tailwind — Analytics UI
└── infra/
    ├── migrations/   # PostgreSQL migrations (golang-migrate)
    └── clickhouse/   # ClickHouse query templates (.sql files)

Tech Stack

Go Services (api, workers)

Concern	Library	Notes
HTTP router	`chi`	Lightweight, stdlib-compatible middleware
Logger	`zap`	Structured logging, fastest
PostgreSQL	`pgx/v5`	Native driver, no database/sql wrapper
ClickHouse	`clickhouse-go/v2`	Official driver, native protocol, good batch support
Redis	`rueidis`	Modern client, faster than go-redis
Job queue	`riverqueue/river`	Postgres-backed, pgx/v5 native, built-in scheduler + retry
Config	`caarlos0/env`	Parse env vars into structs, zero deps
Validation	`go-playground/validator/v10`	Struct tags validation
Migration	`golang-migrate` + pgx driver	CLI only — never auto-migrate on startup
Test assertion	`testify`	assert + require + mock
Integration test	`testcontainers-go`	Real PG / Redis / ClickHouse in tests

React Console (console/)

Concern	Library
Build	Vite
UI components	shadcn/ui + Tailwind
Routing	React Router v6
Server state	TanStack Query
Client state	Zustand
Forms	react-hook-form + zod
Charts	Recharts
Icons	lucide-react

No new technology without discussion. All additions must justify why existing stack cannot handle it.

Go Project Structure

api/

api/
├── cmd/
│   └── server/
│       └── main.go        # wire everything, start server
└── internal/
    ├── handler/            # HTTP handlers — parse request, call service, write response
    ├── service/            # business logic — no HTTP, no DB concerns
    ├── repo/               # DB queries — PostgreSQL via pgx, ClickHouse via clickhouse-go
    ├── middleware/         # auth, request ID, logging
    └── config/             # env parsing via caarlos0/env

workers/

workers/
├── cmd/
│   └── worker/
│       └── main.go        # register jobs, start river worker
└── internal/
    ├── job/                # job definitions (ComputeTraitsJob, RefreshSegmentJob, ReverseETLJob)
    ├── handler/            # job handlers — business logic per job type
    ├── repo/               # DB queries shared across job handlers
    └── config/

Rules:

In api/, handler depends on service; in workers/, handler depends on repo. Never the reverse.
handler never touches DB directly in api/.
service never imports chi or any HTTP package.
repo returns domain types, never raw pgx.Rows or driver.Rows.
ClickHouse queries live as .sql files in infra/clickhouse/ — no inline SQL strings for complex queries.
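
The layering rules above can be sketched with a small interface boundary. This is a minimal illustration, not the project's actual code: `Event`, `EventRepo`, `EventService`, `memoryRepo`, and the `maxLimit` value are hypothetical names chosen for the example. The point is that the service sees only an interface returning domain types, so it needs neither an HTTP package nor a database driver.

```go
package main

import (
	"context"
	"errors"
)

// Event is a hypothetical domain type. The repo returns these,
// never raw pgx.Rows or driver.Rows.
type Event struct {
	ID   string
	Name string
}

// EventRepo is the only view of storage that the service layer has.
type EventRepo interface {
	ListEvents(ctx context.Context, workspaceID string, limit int) ([]Event, error)
}

// ErrWorkspaceRequired is returned when no workspace is given.
var ErrWorkspaceRequired = errors.New("workspace_id is required")

// maxLimit is an assumed upper bound on rows per request.
const maxLimit = 1000

// EventService holds business logic. It imports no HTTP package and no driver.
type EventService struct {
	repo EventRepo
}

// NewEventService wires a repo into the service (done in cmd/server/main.go).
func NewEventService(repo EventRepo) *EventService {
	return &EventService{repo: repo}
}

// Recent applies business rules, then delegates storage access to the repo.
func (s *EventService) Recent(ctx context.Context, workspaceID string, limit int) ([]Event, error) {
	if workspaceID == "" {
		return nil, ErrWorkspaceRequired
	}
	if limit <= 0 || limit > maxLimit {
		limit = maxLimit
	}
	return s.repo.ListEvents(ctx, workspaceID, limit)
}

// memoryRepo is an in-memory stand-in for the ClickHouse-backed repo.
type memoryRepo struct {
	events []Event
}

// ListEvents returns at most limit events and ignores the workspace filter.
func (r memoryRepo) ListEvents(_ context.Context, _ string, limit int) ([]Event, error) {
	if limit > len(r.events) {
		limit = len(r.events)
	}
	return r.events[:limit], nil
}
```

Because the dependency points one way only (handler to service to repo), a unit test can hand the service a stand-in such as `memoryRepo` without starting any database.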

Error Handling

Same AppError pattern as ingestion. Never return raw pgx or clickhouse-go errors to handlers.

// internal/apperr/apperr.go

type AppError struct {
    Code    int    // HTTP status code to return
    Message string // user-facing message (safe to expose)
    Field   string // optional: which field caused the error
    Err     error  // original error for logging (not exposed to user)
}

func (e *AppError) Error() string { return e.Message }
func (e *AppError) Unwrap() error { return e.Err }

// Constructors
func BadRequest(msg, field string, err error) *AppError
func NotFound(msg string) *AppError
func Forbidden(msg string) *AppError
func Internal(err error) *AppError

Handler pattern — one place handles all errors:

func writeError(w http.ResponseWriter, err error) {
    var appErr *apperr.AppError
    if errors.As(err, &appErr) {
        render.JSON(w, appErr.Code, ErrorResponse{Error: appErr.Message, Field: appErr.Field})
        return
    }
    render.JSON(w, 500, ErrorResponse{Error: "internal server error"})
}
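
For reference, here is a self-contained sketch of the same pattern using only the standard library. The constructor bodies, the HTTP status chosen for each constructor, the `ErrorResponse` JSON shape, and the helpers `toResponse` and `lookupProfile` are assumptions made for illustration; the real `apperr` package may differ.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

// AppError mirrors the struct described above.
type AppError struct {
	Code    int
	Message string
	Field   string
	Err     error
}

func (e *AppError) Error() string { return e.Message }
func (e *AppError) Unwrap() error { return e.Err }

// Constructor bodies are assumed; only the signatures come from the document.
func BadRequest(msg, field string, err error) *AppError {
	return &AppError{Code: http.StatusBadRequest, Message: msg, Field: field, Err: err}
}

func NotFound(msg string) *AppError {
	return &AppError{Code: http.StatusNotFound, Message: msg}
}

func Forbidden(msg string) *AppError {
	return &AppError{Code: http.StatusForbidden, Message: msg}
}

func Internal(err error) *AppError {
	return &AppError{Code: http.StatusInternalServerError, Message: "internal server error", Err: err}
}

// ErrorResponse is a hypothetical JSON body shape.
type ErrorResponse struct {
	Error string `json:"error"`
	Field string `json:"field,omitempty"`
}

// toResponse maps any error to a status code and a safe body.
// Errors that are not (and do not wrap) an *AppError become a generic 500.
func toResponse(err error) (int, ErrorResponse) {
	var appErr *AppError
	if errors.As(err, &appErr) {
		return appErr.Code, ErrorResponse{Error: appErr.Message, Field: appErr.Field}
	}
	return http.StatusInternalServerError, ErrorResponse{Error: "internal server error"}
}

// writeError is the single place where handlers turn an error into a response.
func writeError(w http.ResponseWriter, err error) {
	status, body := toResponse(err)
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(body)
}

// lookupProfile shows that errors.As still finds the AppError
// after a caller adds context with %w.
func lookupProfile(id string) error {
	return fmt.Errorf("lookup profile %q: %w", id, NotFound("profile not found"))
}
```

Keeping the mapping in a pure function such as `toResponse` makes it easy to unit test without an HTTP recorder, while `Err` stays available for logging and is never written to the response.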

ClickHouse Query Pattern

Use raw SQL only. No query builder — ClickHouse SQL has its own syntax that builders handle poorly.

infra/clickhouse/
├── event_explorer.sql
├── funnel_analysis.sql
├── retention_cohort.sql
└── session_analysis.sql

Load templates at startup, inject parameters safely:

// Never fmt.Sprintf into SQL; use named parameters
query, err := templates.Load("funnel_analysis.sql")
if err != nil {
    return nil, apperr.Internal(err) // check before err is reused below
}
rows, err := chConn.Query(ctx, query, clickhouse.Named("workspace_id", id), ...)

Rules:

All ClickHouse queries must have a corresponding .sql file in infra/clickhouse/
No multi-line SQL strings inline in Go code
Every ClickHouse schema change must have a DDL file in infra/clickhouse/
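
One possible shape for the template loader is sketched below. The document only shows a call to `templates.Load`, so everything else here is an assumption: the `Templates` type, `LoadTemplates`, reading from an `fs.FS`, and the `exampleFS` helper with its placeholder query (including the `workspace_id` column). In the real service the file system could come from `os.DirFS("infra/clickhouse")` or an `embed.FS`.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"strings"
	"testing/fstest"
)

// Templates holds every .sql file found at startup, keyed by file name.
type Templates struct {
	queries map[string]string
}

// LoadTemplates reads all *.sql files in the root of fsys once, so a missing
// or empty template fails at startup instead of at request time.
func LoadTemplates(fsys fs.FS) (*Templates, error) {
	entries, err := fs.ReadDir(fsys, ".")
	if err != nil {
		return nil, fmt.Errorf("read template dir: %w", err)
	}
	t := &Templates{queries: make(map[string]string)}
	for _, e := range entries {
		if e.IsDir() || path.Ext(e.Name()) != ".sql" {
			continue // skip subdirectories and non-SQL files
		}
		raw, err := fs.ReadFile(fsys, e.Name())
		if err != nil {
			return nil, fmt.Errorf("read %s: %w", e.Name(), err)
		}
		query := strings.TrimSpace(string(raw))
		if query == "" {
			return nil, fmt.Errorf("template %s is empty", e.Name())
		}
		t.queries[e.Name()] = query
	}
	return t, nil
}

// Load returns the SQL text for a template name such as "funnel_analysis.sql".
func (t *Templates) Load(name string) (string, error) {
	query, ok := t.queries[name]
	if !ok {
		return "", fmt.Errorf("unknown query template %q", name)
	}
	return query, nil
}

// exampleFS is an in-memory stand-in for infra/clickhouse/ used for illustration.
// The query text is a placeholder, not one of the project's real templates.
func exampleFS() fs.FS {
	return fstest.MapFS{
		"funnel_analysis.sql": {Data: []byte("SELECT count() FROM events WHERE workspace_id = @workspace_id\n")},
		"README.md":           {Data: []byte("not a template")},
	}
}
```

The `@workspace_id` placeholder is the form that `clickhouse.Named("workspace_id", id)` binds to, so user input never has to be concatenated into the SQL text.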

Job Queue (river)

Background workers use riverqueue/river backed by PostgreSQL.

// Define a job
type ComputeTraitsArgs struct {
    WorkspaceID string `json:"workspace_id"`
    TraitID     string `json:"trait_id"`
}
func (ComputeTraitsArgs) Kind() string { return "compute_traits" }

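// Register handler
workers := river.NewWorkers()
river.AddWorker(workers, &ComputeTraitsWorker{repo: repo})

// Enqueue (Insert returns the inserted job and an error; always check the error)
_, err := client.Insert(ctx, ComputeTraitsArgs{WorkspaceID: "ws_123", TraitID: "t_456"}, nil)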

Scheduled jobs (periodic):

// Hourly trait recompute, hourly segment refresh.
// PeriodicJob has no exported fields, so build it with NewPeriodicJob.
river.NewPeriodicJob(
    river.PeriodicInterval(time.Hour),
    func() (river.JobArgs, *river.InsertOpts) {
        return ComputeTraitsArgs{}, nil
    },
    nil, // *river.PeriodicJobOpts; nil keeps the defaults
)

Rules:

Workers must be idempotent — river may retry on failure
Use river's built-in retry with exponential backoff, do not implement custom retry
Log job start, job end, duration, and error with full context (job_id, args)
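
One way to satisfy the idempotency rule is to recompute a value from its source and overwrite it, rather than increment it. The sketch below shows the idea with an in-memory store standing in for the `profile_traits` table; `TraitStore`, `ComputeEventCounts`, and the `event_count` trait are hypothetical names, and the overwrite corresponds to an upsert keyed on `(profile_id, trait_key)` in PostgreSQL.

```go
package main

import "sync"

// traitRowKey identifies one row: a trait value for one profile.
type traitRowKey struct {
	ProfileID string
	TraitKey  string
}

// TraitStore is an in-memory stand-in for the profile_traits table.
type TraitStore struct {
	mu   sync.Mutex
	rows map[traitRowKey]int64
}

// NewTraitStore returns an empty store.
func NewTraitStore() *TraitStore {
	return &TraitStore{rows: make(map[traitRowKey]int64)}
}

// Upsert overwrites the value for (profileID, key). Writing the same value
// twice leaves the store unchanged, which is what makes retries safe.
func (s *TraitStore) Upsert(profileID, key string, value int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.rows[traitRowKey{ProfileID: profileID, TraitKey: key}] = value
}

// Get returns the stored value and whether it exists.
func (s *TraitStore) Get(profileID, key string) (int64, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.rows[traitRowKey{ProfileID: profileID, TraitKey: key}]
	return v, ok
}

// Len returns the number of stored rows.
func (s *TraitStore) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.rows)
}

// ComputeEventCounts recomputes a per-profile event count from the full list
// of event owners and overwrites the stored value. It never adds to the old
// value, so running it again after a retry produces the same rows.
func ComputeEventCounts(store *TraitStore, key string, eventProfileIDs []string) {
	counts := make(map[string]int64)
	for _, id := range eventProfileIDs {
		counts[id]++
	}
	for id, n := range counts {
		store.Upsert(id, key, n)
	}
}
```

Had the job incremented the stored count instead, a retry after a partial failure would double-count some profiles.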

Cache Strategy (Redis)

Semantic key structure — allows per-workspace invalidation:

cache:query:events:{workspace_id}:{hash(params)}     TTL 60s
cache:query:funnel:{workspace_id}:{hash(params)}     TTL 60s
cache:query:retention:{workspace_id}:{hash(params)}  TTL 60s
cache:dashboard:{workspace_id}                       TTL 60s
cache:profile:{workspace_id}:{profile_id}            TTL 30s

Rules:

Default TTL: 60s for aggregate queries, 30s for profile lookups
TTL is configurable per query type via env vars
On cache miss: query ClickHouse, write result to Redis, return result
Never cache Custom SQL results — each query is arbitrary
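
The key layout and the cache-miss rule above could be sketched as follows. This is an illustration under assumptions: the choice of SHA-256 over canonical JSON for `hash(params)`, the `Cache` interface, `GetOrCompute`, `memoryCache`, and `DefaultQueryTTL` are all hypothetical, and a Redis client would replace the in-memory cache in the real service.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"time"
)

// DefaultQueryTTL matches the 60s default for aggregate queries.
const DefaultQueryTTL = 60 * time.Second

// QueryCacheKey builds cache:query:{queryType}:{workspaceID}:{hash(params)}.
// json.Marshal writes map keys in sorted order, so equal params produce the
// same hash regardless of insertion order.
func QueryCacheKey(queryType, workspaceID string, params map[string]any) (string, error) {
	canonical, err := json.Marshal(params)
	if err != nil {
		return "", fmt.Errorf("marshal params: %w", err)
	}
	sum := sha256.Sum256(canonical)
	return fmt.Sprintf("cache:query:%s:%s:%s", queryType, workspaceID, hex.EncodeToString(sum[:])), nil
}

// Cache is the minimal surface the read-through helper needs.
type Cache interface {
	Get(key string) (string, bool)
	Set(key, value string, ttl time.Duration)
}

// GetOrCompute returns the cached value on a hit. On a miss it runs compute
// (the ClickHouse query), stores the result with the given TTL, and returns it.
func GetOrCompute(c Cache, key string, ttl time.Duration, compute func() (string, error)) (string, bool, error) {
	if v, ok := c.Get(key); ok {
		return v, true, nil
	}
	v, err := compute()
	if err != nil {
		return "", false, err // failures are not cached
	}
	c.Set(key, v, ttl)
	return v, false, nil
}

// cacheItem is one stored value with its expiry time.
type cacheItem struct {
	value   string
	expires time.Time
}

// memoryCache is an in-memory stand-in for Redis, not safe for concurrent use.
type memoryCache struct {
	items map[string]cacheItem
}

func newMemoryCache() *memoryCache {
	return &memoryCache{items: make(map[string]cacheItem)}
}

// Get treats expired entries as misses.
func (m *memoryCache) Get(key string) (string, bool) {
	item, ok := m.items[key]
	if !ok || time.Now().After(item.expires) {
		return "", false
	}
	return item.value, true
}

func (m *memoryCache) Set(key, value string, ttl time.Duration) {
	m.items[key] = cacheItem{value: value, expires: time.Now().Add(ttl)}
}
```

Because the workspace ID sits in the key before the hash, every cached query for one workspace shares a common prefix per query type, which is what allows per-workspace invalidation. The returned hit flag can feed the `cache_hit` field in the query log.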

Custom SQL Sandbox

POST /query/sql allows arbitrary SQL on ClickHouse. Two layers of protection:

Layer 1 — App-level parse (Go):

// forbiddenKeyword matches DDL/DML keywords as whole words only, so column
// names such as created_at or updated_at are not rejected (requires "regexp").
var forbiddenKeyword = regexp.MustCompile(`\b(INSERT|UPDATE|DELETE|DROP|CREATE|ALTER|TRUNCATE)\b`)

// Reject anything that is not a SELECT statement.
// This is a coarse filter: keywords inside string literals are still rejected.
func validateReadOnly(sql string) error {
    normalized := strings.TrimSpace(strings.ToUpper(sql))
    if !strings.HasPrefix(normalized, "SELECT") {
        return apperr.BadRequest("only SELECT statements are allowed", "sql", nil)
    }
    // Reject common DDL/DML keywords
    if kw := forbiddenKeyword.FindString(normalized); kw != "" {
        return apperr.BadRequest("statement contains forbidden keyword: "+kw, "sql", nil)
    }
    return nil
}

Layer 2 — ClickHouse read-only user:

Custom SQL queries run as a separate ClickHouse user with SELECT-only grants
DDL/DML rejected at DB level even if app-level check is bypassed

Testing Strategy

Unit tests — handler + service layer

Mock interfaces with testify/mock
No real DB, no real Redis, no real ClickHouse
File: foo_test.go alongside the file being tested

Integration tests — repo layer only

Use testcontainers-go to spin up real PostgreSQL, Redis, ClickHouse
File: internal/repo/event_repo_test.go
Tag: //go:build integration

make test              # unit only (fast, no containers)
make test/integration  # repo layer with real DBs (slower, CI)

Migration Workflow

make migrate/new name=add_profile_traits   # create up+down files
make migrate/up                            # apply all pending
make migrate/down                          # rollback one step
make migrate/status                        # show current version

Migration files: infra/migrations/{version}_{name}.up.sql + .down.sql
Never auto-run migrations on server startup
Every PostgreSQL schema change must have a migration file

PostgreSQL Schema (Analytics-owned tables)

-- Computed trait values per profile
profile_traits (
    profile_id   UUID,
    trait_key    TEXT,
    trait_value  JSONB,
    computed_at  TIMESTAMPTZ
)

-- Segment membership history (used for delta Reverse ETL)
segment_memberships (
    segment_id   UUID,
    profile_id   UUID,
    entered_at   TIMESTAMPTZ,
    exited_at    TIMESTAMPTZ   -- NULL = currently a member
)
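
The delta that Reverse ETL needs can be derived by comparing the previous member set with the freshly evaluated one. The sketch below is a minimal illustration; `SegmentDelta` and `DiffMembers` are hypothetical names, and profile IDs are plain strings here for brevity. An entered profile would get a new `segment_memberships` row with `entered_at` set, and an exited profile would have `exited_at` filled in on its open row.

```go
package main

import "sort"

// SegmentDelta lists the profiles whose membership changed between two evaluations.
type SegmentDelta struct {
	Entered []string // in current, not in previous
	Exited  []string // in previous, not in current
}

// DiffMembers compares the previous member list with the current one.
// Results are sorted so the output is deterministic.
func DiffMembers(previous, current []string) SegmentDelta {
	prev := make(map[string]struct{}, len(previous))
	for _, id := range previous {
		prev[id] = struct{}{}
	}
	cur := make(map[string]struct{}, len(current))
	for _, id := range current {
		cur[id] = struct{}{}
	}

	var delta SegmentDelta
	for id := range cur {
		if _, ok := prev[id]; !ok {
			delta.Entered = append(delta.Entered, id)
		}
	}
	for id := range prev {
		if _, ok := cur[id]; !ok {
			delta.Exited = append(delta.Exited, id)
		}
	}
	sort.Strings(delta.Entered)
	sort.Strings(delta.Exited)
	return delta
}
```

For example, going from members `a, b, c` to `b, c, d, e` yields `d` and `e` as entered and `a` as exited; only those three profiles would be pushed downstream.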

Data Sources (Read-only)

This service only reads data written by cdp-ingestion. Never write to these tables.

Source	Data
ClickHouse `events`	Flattened, schema-managed raw events
PostgreSQL `profiles`	Identity graph, unified profiles
PostgreSQL `sources` / `destinations`	Config metadata
PostgreSQL `schemas`	Schema registry from ingestion

Key Design Decisions

Problem	Decision
Job queue	`river` on PostgreSQL — no Temporal, no Celery
Computed Traits refresh	Hourly default, configurable per trait
Segment re-evaluate	Full re-evaluate — simpler than incremental
Query cache	Redis semantic keys, TTL 60s default
Custom SQL	App-level SELECT-only check + ClickHouse read-only user
Reverse ETL	Delta only (entered/exited) — never push full member list
ClickHouse queries	Raw SQL in `.sql` template files — no query builder
Scaling	Vertical — increase RAM/CPU, not instances
Migration	CLI only — never auto-migrate on startup

API Endpoints

Method	Path	Description
`POST`	`/query/events`	Filter + query raw events
`POST`	`/query/sql`	Custom SQL on ClickHouse (SELECT only)
`POST`	`/query/funnel`	Funnel analysis
`POST`	`/query/retention`	Retention cohort
`GET`	`/profiles/:id`	Unified profile lookup
`GET`	`/profiles/:id/events`	User event timeline
`GET`	`/segments`	List segments
`POST`	`/segments`	Create segment
`GET`	`/segments/:id/members`	Segment members
`GET`	`/traits/definitions`	List computed trait definitions
`GET`	`/health`	Health check
`GET`	`/ready`	Readiness check

Every endpoint must have a request struct with validate tags. Validation runs before any business logic.

Feature Priorities

Priority	Features
P0	Event Explorer, Custom SQL, Profile Lookup, Event Timeline, Saved Queries
P1	Funnel Analysis, Retention Analysis, Session Analysis, Pre-built Dashboards
P2	Computed Traits, Audience Segments, Background Worker
P3	Reverse ETL, Webhook Push, Schema Registry, Data Catalog

Build in priority order. Do not start P1 before P0 is stable.

Logging Policy (zap)

Query requests  → log workspace_id, query_type, duration_ms, rows_returned, cache_hit
Worker jobs     → log job_id, job_kind, args, duration_ms, status (success/error)
Errors          → log full error chain with context

Coding Rules

Do not write code unless asked — discuss architecture/features first
Ask when scope is unclear, especially when multiple valid approaches exist
YAGNI + KISS — do not build what is not needed yet
Correctness before performance — optimize only when profiling proves it necessary
Every PostgreSQL schema change must have a migration file in infra/migrations/
Every ClickHouse query must have a .sql file in infra/clickhouse/
Every API endpoint must have a request struct with validate tags
Never write raw events — this service is read-side only
Discuss in Vietnamese, write code and comments in English

Common Pitfalls

Do not query ClickHouse directly for computed traits at request time — serve from PostgreSQL
Do not run full segment scans on every API request — that is the worker's job
Do not cache Custom SQL results — queries are arbitrary, cache would be useless
Do not inline complex SQL strings in Go — use .sql template files
Do not return raw pgx or clickhouse-go errors to HTTP handlers — wrap with AppError
Do not run migrations on server startup — use make migrate/up explicitly
Reverse ETL must push delta only (entered/exited), never the full member list per run
Workers must be idempotent — river retries on failure, job may run more than once
service layer must never import net/http or chi
