Skill Creation Workflow
9-step process. Follow in order; skip only with clear justification.
Step 1: Capture Intent
Gather real usage patterns via AskUserQuestion tool:
- "What tasks should this skill handle?"
- "Give examples of how it would be used?"
- "What phrases should trigger this skill?"
- "What's the expected output format?"
- "Should we create test cases?" (recommended for objective outputs)
Conclude when functionality scope is clear.
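As an illustration, the gathered answers can be held in a structured record before moving on; this shape is a sketch mirroring the questions above, not part of the AskUserQuestion API:
```python
from dataclasses import dataclass, field

# Hypothetical intake record; field names are illustrative only.
@dataclass
class SkillIntent:
    tasks: list[str] = field(default_factory=list)
    usage_examples: list[str] = field(default_factory=list)
    trigger_phrases: list[str] = field(default_factory=list)
    output_format: str = ""
    wants_test_cases: bool = True  # recommended for objective outputs
```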
Step 2: Research
Activate /ck:docs-seeker and /ck:research skills. Research:
- Best practices & industry standards
- Existing CLI tools (`npx`, `bunx`, `pipx`) for reuse
- Workflows & case studies
- Edge cases & pitfalls
Use parallel WebFetch + Explore subagents for multiple URLs.
Write reports for next step.
Step 3: Plan Reusable Contents
Analyze each example:
- How to execute from scratch?
- Prefer existing CLI tools over custom code
- What scripts/references/assets enable repeated execution?
- Check skills catalog — avoid duplication, reuse existing
Patterns:
- Repeated code → `scripts/` (Python/Node.js, with tests)
- Repeated discovery → `references/` (schemas, docs, APIs)
- Repeated boilerplate → `assets/` (templates, images)
Scripts MUST: respect .env hierarchy, have tests, pass all tests.
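What respecting the .env hierarchy entails is not pinned down here; a minimal stdlib sketch, assuming files closer to the script's directory override those above them:
```python
import os
from pathlib import Path

def load_env_hierarchy(start: Path) -> None:
    """Load every .env from the filesystem root down to `start`.

    Files nearer `start` load last and therefore win. The precedence
    rule here is an assumption, not part of this skill's spec.
    """
    start = start.resolve()
    for directory in [*reversed(start.parents), start]:
        env_file = directory / ".env"
        if not env_file.is_file():
            continue
        for line in env_file.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip()
```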
Step 4: Initialize
For new skills, run init script:
```
scripts/init_skill.py <skill-name> --path <output-directory>
```
Creates: SKILL.md template, scripts/, references/, assets/ with examples.
Skip if skill already exists (go to Step 5).
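A hypothetical invocation (skill name and output path are placeholders):
```
scripts/init_skill.py pdf-form-filler --path ./skills
```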
Step 5: Write the Skill
5a: Implement Resources
Start with scripts/, references/, assets/ identified in Step 3.
Delete unused example files from initialization.
May require user input (brand assets, configs, etc.).
5b: Write SKILL.md
Writing style: Imperative/infinitive form. "To accomplish X, do Y."
Size: Under 300 lines. Move details to references/.
Answer these in SKILL.md:
- Purpose (2-3 sentences)
- When to use (trigger conditions)
- How to use (reference all bundled resources)
5c: Benchmark Optimization
MUST include for high Skillmark scores:
- Scope declaration — "This skill handles X. Does NOT handle Y."
- Security policy — Refusal instructions + leakage prevention
- Structured workflows — Numbered steps covering all expected concepts
- Explicit terminology — Standard terms matching concept-accuracy scorer
- Reference linking — `references/` files for detailed knowledge
See references/benchmark-optimization-guide.md for detailed patterns.
5d: Write Pushy Description
Description ≤1024 chars. Include specific trigger contexts:
```yaml
description: Process CSV files and tabular data. Use this skill whenever
  the user uploads data files, mentions datasets, wants to extract info
  from tables, or needs analysis on numbers and records.
```
See references/metadata-quality-criteria.md for examples.
Step 6: Test & Evaluate
6a: Create Test Cases
Write evals/evals.json with 2-3 realistic test prompts + assertions.
See references/eval-schemas.md for JSON format.
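The shape below is an assumed illustration only (the authoritative schema is in references/eval-schemas.md); prompt and assertion wording are invented:
```python
import json
from pathlib import Path

# Illustrative evals.json contents; field names are assumptions.
evals = [
    {
        "prompt": "Summarize the columns and row count of the attached sales.csv",
        "assertions": ["names every column", "reports the correct row count"],
    },
    {
        "prompt": "Extract all email addresses from this contact table",
        "assertions": ["returns only valid addresses", "preserves row order"],
    },
]
Path("evals").mkdir(exist_ok=True)
Path("evals/evals.json").write_text(json.dumps(evals, indent=2))
```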
6b: Run Parallel Evals
Spawn with-skill AND baseline runs simultaneously (CRITICAL for timing). Draft assertions while runs execute.
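One way to launch both variants at the same time, assuming a hypothetical `run_eval.sh` wrapper around the real harness:
```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# `./run_eval.sh --variant ...` is a placeholder; the point is that the
# with-skill and baseline runs start simultaneously.
def run(variant: str) -> subprocess.CompletedProcess:
    return subprocess.run(
        ["./run_eval.sh", "--variant", variant],
        capture_output=True,
        text=True,
    )

with ThreadPoolExecutor(max_workers=2) as pool:
    with_skill, baseline = pool.map(run, ["with-skill", "baseline"])
```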
6c: Grade & Aggregate
- Grade outputs with grader agent (`agents/grader.md`)
- Aggregate results: `scripts/aggregate_benchmark.py`
- Launch viewer: `eval-viewer/generate_review.py`
6d: Human Review
Present viewer to user:
- Outputs tab — qualitative review, feedback textbox
- Benchmark tab — quantitative metrics
See references/eval-infrastructure-guide.md for details.
Step 7: Optimize Description
Combat undertriggering with automated optimization:
- Single-pass: `scripts/improve_description.py` — one iteration
- Iterative loop: `scripts/run_loop.py` — train/test split, convergence detection
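Conceptually, the iterative loop behaves like the sketch below; `rewrite` and `trigger_rate` are stand-ins, and the real logic lives in scripts/run_loop.py:
```python
import random

def rewrite(desc: str, train_cases: list[str]) -> str:
    return desc + " (revised)"      # stand-in for an LLM-driven rewrite

def trigger_rate(desc: str, cases: list[str]) -> float:
    return random.random()          # stand-in for real with-skill eval runs

cases = ["prompt A", "prompt B", "prompt C", "prompt D"]
random.shuffle(cases)
train, test = cases[:3], cases[3:]  # train/test split

desc, best = "initial description", 0.0
for _ in range(10):
    desc = rewrite(desc, train)     # revise against train-set failures
    score = trigger_rate(desc, test)
    if score <= best + 0.01:        # convergence: no gain on held-out cases
        break
    best = score
```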
Step 8: Package & Validate
```
scripts/package_skill.py <path/to/skill-folder>
```
Validates: frontmatter, naming, description, structure. Fix all errors, re-run until clean.
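A sketch of the kind of checks involved; only the 1024-character description cap comes from this document, and the naming rule is an assumption:
```python
import re

def validate_frontmatter(fm: dict) -> list[str]:
    errors = []
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", fm.get("name", "")):
        errors.append("name: lowercase words separated by hyphens (assumed rule)")
    if not 0 < len(fm.get("description", "")) <= 1024:
        errors.append("description: required, at most 1024 characters")
    return errors
```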
Step 9: Iterate
- Read `feedback.json` from viewer
- Generalize from feedback — don't overfit to test examples
- Keep prompts lean — remove ineffective instructions
- Update SKILL.md or resources
- Re-test (return to Step 6)
- Scale test set to 5-10 cases for production skills
Benchmark iteration: Run skillmark CLI, review per-concept accuracy, fix gaps.