data layer

2026-05-25 08:38:26 +07:00
parent 4e8c11d545
commit a428170fef
81 changed files with 3941 additions and 0 deletions

data-layer/README.md Normal file

@@ -0,0 +1,91 @@
# CDP Analytics (data-layer)
Read-side of the self-hosted CDP platform. Queries events written by
`cdp-ingestion`, computes traits and segments, and activates segments
to external tools.
## Services
| Service | Lang | Port | Role |
|-----------|-------------|------|------|
| `api` | Go | 4000 | Query API, Profile API, Custom SQL sandbox |
| `workers` | Go (river) | 4001 | Computed Traits, Segment refresh, Reverse ETL |
| `console` | React + Vite | 4002 | Analytics UI |
## Quick start
Shared infra (Postgres / Redis / ClickHouse) is brought up by the ingestion
repo. Start it there first:
```bash
cd ../ingestion && make up
```
Then in this directory:
```bash
make migrate/up # apply analytics PostgreSQL migrations
make clickhouse/up # apply analytics ClickHouse DDL (if any)
# First time only:
(cd api && go mod tidy)
(cd workers && go mod tidy)
(cd console && npm install)
make run/api # start API on :4000
make run/workers # start worker on :4001
make run/console # start console on :4002
```
## Endpoints (shipped)
All endpoints below require an `X-Workspace-Id` header (UUID). Workspace
membership checks and authentication are still a TODO, so for now the header
is the only thing that scopes a request to a workspace.
| Method | Path | Priority | Description |
|--------|-------------------------------|----------|-------------|
| GET | `/health` | - | Liveness |
| GET | `/ready` | - | Readiness |
| POST | `/query/events` | P0 | Filter raw events from one of `events_track`, `events_identify`, `events_page`, or `events_group` |
| POST | `/query/sql` | P0 | Arbitrary `SELECT` on ClickHouse (read-only user) |
| GET | `/profiles/{id}` | P0 | Unified profile lookup |
| GET | `/profiles/{id}/events` | P0 | Merged event timeline for the profile's `user_id` |
| POST | `/queries` | P0 | Create saved query |
| GET | `/queries` | P0 | List saved queries |
| GET | `/queries/{id}` | P0 | Get saved query |
| PUT | `/queries/{id}` | P0 | Update saved query |
| DELETE | `/queries/{id}` | P0 | Delete saved query |
| POST | `/query/funnel` | P1 | Windowed funnel via ClickHouse `windowFunnel()` |
| POST | `/query/retention` | P1 | Cohort retention via ClickHouse `retention()` |
| POST | `/query/session` | P1 | Session bucketing with inactivity timeout |
Caching: query results are cached for 60s by default and profile lookups for
30s. Per-query TTLs are configurable via `ANALYTICS_CACHE_TTL_*_SECONDS`.
Custom SQL is never cached.
## Console pages (shipped)
- **Explore** — wired to `/query/events`
- **Custom SQL** — wired to `/query/sql`
- Profiles / Funnels / Retention / Segments / Traits — placeholders
## Testing
```bash
make test # unit tests (no containers)
make test/integration # repo-layer integration tests (testcontainers)
```
## Caveats
- The `profiles` table is a **read-only contract owned by cdp-ingestion**, but
it does not exist yet in the ingestion migrations. `repo/profile_repo.go`
assumes `profiles(id, workspace_id, user_id, anonymous_ids, traits,
first_seen_at, last_seen_at)`; align the two before shipping.
- `/query/sql` should run against an `analytics_ro` ClickHouse user with
`SELECT`-only grants. If that account does not exist, the server falls back
to the main connection and logs a warning. Fix this before production.
- Auth: every request must supply `X-Workspace-Id`. Wire the console's
workspace store to a real session/JWT once the auth scheme is decided.
See [CLAUDE_analytics.md](./CLAUDE_analytics.md) for the full design contract.