# Skill Creation Workflow

A 9-step process. Follow the steps in order; skip a step only with clear justification.

## Step 1: Capture Intent

Gather real usage patterns with the `AskUserQuestion` tool:

- "What tasks should this skill handle?"
- "Can you give examples of how it would be used?"
- "What phrases should trigger this skill?"
- "What's the expected output format?"
- "Should we create test cases?" (recommended when outputs can be checked objectively)

Conclude when the functional scope is clear.

## Step 2: Research

Activate the `/ck:docs-seeker` and `/ck:research` skills. Research:

- Best practices and industry standards
- Existing CLI tools (`npx`, `bunx`, `pipx`) that can be reused
- Workflows and case studies
- Edge cases and pitfalls

Use parallel `WebFetch` + `Explore` subagents when researching multiple URLs. Write reports for use in the next step.

## Step 3: Plan Reusable Contents

Analyze each example:

1. How would it be executed from scratch?
2. Prefer existing CLI tools over custom code.
3. Which scripts, references, and assets would enable repeated execution?
4. Check the skills catalog: avoid duplication and reuse existing skills.

**Patterns:**

- Repeated code → `scripts/` (Python/Node.js, with tests)
- Repeated discovery → `references/` (schemas, docs, APIs)
- Repeated boilerplate → `assets/` (templates, images)

Scripts MUST respect the `.env` hierarchy, have tests, and pass all tests.

## Step 4: Initialize

For new skills, run the init script:

```bash
scripts/init_skill.py --path
```

This creates a SKILL.md template and `scripts/`, `references/`, and `assets/` directories containing example files.

Skip this step if the skill already exists (go to Step 5).

## Step 5: Write the Skill

### 5a: Implement Resources

Start with the `scripts/`, `references/`, and `assets/` identified in Step 3. Delete unused example files created during initialization. This may require user input (brand assets, configs, etc.).

### 5b: Write SKILL.md

**Writing style:** Use the imperative/infinitive form: "To accomplish X, do Y."

**Size:** Keep SKILL.md under 300 lines. Move details to `references/`.

Answer these in SKILL.md:

1. Purpose (2-3 sentences)
2. When to use (trigger conditions)
3. How to use (reference all bundled resources)

### 5c: Benchmark Optimization

**MUST** include the following for high Skillmark scores:

- **Scope declaration**: "This skill handles X. Does NOT handle Y."
- **Security policy**: refusal instructions and leakage prevention
- **Structured workflows**: numbered steps covering all expected concepts
- **Explicit terminology**: standard terms that match the concept-accuracy scorer
- **Reference linking**: `references/` files for detailed knowledge

See `references/benchmark-optimization-guide.md` for detailed patterns.

### 5d: Write a Pushy Description

Keep the description at or under 1024 characters. Include specific trigger contexts:

```yaml
description: Process CSV files and tabular data. Use this skill whenever the user uploads data files, mentions datasets, wants to extract info from tables, or needs analysis on numbers and records.
```

See `references/metadata-quality-criteria.md` for examples.

## Step 6: Test & Evaluate

### 6a: Create Test Cases

Write `evals/evals.json` with 2-3 realistic test prompts plus assertions. See `references/eval-schemas.md` for the JSON format.

### 6b: Run Parallel Evals

Spawn the with-skill AND baseline runs simultaneously (CRITICAL for timing). Draft assertions while the runs execute.

### 6c: Grade & Aggregate

- Grade outputs with the grader agent (`agents/grader.md`).
- Aggregate results: `scripts/aggregate_benchmark.py`
- Launch the viewer: `eval-viewer/generate_review.py`

### 6d: Human Review

Present the viewer to the user:

- **Outputs tab**: qualitative review, with a feedback textbox
- **Benchmark tab**: quantitative metrics

See `references/eval-infrastructure-guide.md` for details.

## Step 7: Optimize Description

Combat undertriggering with automated optimization:

- **Single-pass:** `scripts/improve_description.py` runs one iteration.
- **Iterative loop:** `scripts/run_loop.py` uses a train/test split with convergence detection.

## Step 8: Package & Validate

```bash
scripts/package_skill.py
```

The script validates frontmatter, naming, description, and structure.
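The exact rules `scripts/package_skill.py` enforces are not spelled out here, so the following is only a rough illustration of this kind of check, not that script. It tests a SKILL.md string against the two limits this workflow does state (a description of at most 1024 characters from Step 5d, and a file under 300 lines from Step 5b). The naming rule (lowercase hyphen-case) and the function name `check_skill_md` are assumptions made for illustration, and the frontmatter parsing is deliberately naive (single-line `key: value` pairs only).

```python
import re

# Hypothetical naming rule (assumption): lowercase letters/digits joined by single hyphens.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
MAX_DESCRIPTION_CHARS = 1024  # limit stated in Step 5d
MAX_SKILL_MD_LINES = 300      # SKILL.md should stay under this (Step 5b)


def check_skill_md(text: str) -> list[str]:
    """Return a list of problems found in a SKILL.md string (empty list means clean)."""
    errors: list[str] = []
    lines = text.splitlines()

    # Frontmatter must open with a '---' line at the top and close with another '---'.
    if not lines or lines[0].strip() != "---":
        return ["missing frontmatter opening '---'"]
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return ["missing frontmatter closing '---'"]

    # Naive parse: one 'key: value' pair per line; multi-line YAML values are not handled.
    fields: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    name = fields.get("name", "")
    if not name:
        errors.append("missing 'name'")
    elif not NAME_PATTERN.match(name):
        errors.append(f"name {name!r} is not hyphen-case")

    description = fields.get("description", "")
    if not description:
        errors.append("missing 'description'")
    elif len(description) > MAX_DESCRIPTION_CHARS:
        errors.append(
            f"description is {len(description)} chars (max {MAX_DESCRIPTION_CHARS})"
        )

    if len(lines) >= MAX_SKILL_MD_LINES:
        errors.append(
            f"SKILL.md has {len(lines)} lines (keep it under {MAX_SKILL_MD_LINES})"
        )

    return errors


if __name__ == "__main__":
    sample = "---\nname: csv-helper\ndescription: Process CSV files.\n---\n# CSV Helper\n"
    print(check_skill_md(sample))  # prints []
```

Returning a list of messages rather than stopping at the first failure matches the "fix all errors, re-run until clean" loop: every problem surfaces in one pass.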
Fix all errors and re-run until the output is clean.

## Step 9: Iterate

1. Read `feedback.json` from the viewer.
2. Generalize from the feedback; don't overfit to the test examples.
3. Keep prompts lean: remove ineffective instructions.
4. Update SKILL.md or the bundled resources.
5. Re-test (return to Step 6).
6. Scale the test set to 5-10 cases for production skills.

**Benchmark iteration:** Run the `skillmark` CLI, review per-concept accuracy, and fix gaps.
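The `skillmark` CLI's output format is not documented here, so the sketch below assumes a hypothetical record shape: one `(concept, matched)` pair per graded concept occurrence across the benchmark runs. It computes accuracy per concept and lists the ones below a threshold (0.8 here, an arbitrary assumption), weakest first. Those are the concepts where SKILL.md may need more explicit terminology or a linked `references/` file (Step 5c). The function name `concept_gaps` is illustrative.

```python
from collections import defaultdict


def concept_gaps(
    results: list[tuple[str, bool]], threshold: float = 0.8
) -> list[tuple[str, float]]:
    """Return (concept, accuracy) pairs with accuracy below `threshold`, weakest first.

    `results` uses a hypothetical shape: one (concept, matched) pair per graded
    occurrence of a concept across all benchmark runs.
    """
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for concept, matched in results:
        totals[concept] += 1
        hits[concept] += int(matched)

    accuracy = {c: hits[c] / totals[c] for c in totals}
    gaps = [(c, acc) for c, acc in accuracy.items() if acc < threshold]
    # Sort by accuracy ascending; break ties alphabetically for stable output.
    return sorted(gaps, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    sample = [
        ("scope declaration", True),
        ("scope declaration", True),
        ("security policy", False),
        ("security policy", True),
        ("reference linking", False),
    ]
    for concept, acc in concept_gaps(sample):
        print(f"{concept}: {acc:.0%}")
    # prints:
    # reference linking: 0%
    # security policy: 50%
```

Keeping the threshold as a parameter lets each iteration tighten the bar as gaps close, rather than hard-coding a single pass/fail cut.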