Files
2026-04-12 01:06:31 +07:00

151 lines
6.1 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
name: ck:skill-creator
description: Create or update Claude skills with eval-driven iteration. Use for new skills, skill scripts, references, benchmark optimization, description optimization, eval testing, extending Claude's capabilities.
license: Complete terms in LICENSE.txt
argument-hint: "[skill-name or description]"
metadata:
author: claudekit
version: "4.0.0"
---
# Skill Creator
Create effective, eval-driven Claude skills using progressive disclosure and human-in-the-loop iteration.
## Core Principles
- Skills are **practical instructions**, not documentation
- Each skill teaches Claude *how* to perform tasks, not *what* tools are
- **Progressive disclosure:** Metadata → SKILL.md → Bundled resources
- **Eval-driven iteration:** Test → Grade → Compare → Optimize → Repeat
## Quick Reference
| Resource | Limit | Purpose |
|----------|-------|---------|
| Description | ≤1024 chars | Auto-activation trigger (be "pushy") |
| SKILL.md | <300 lines | Core instructions |
| Each reference | <300 lines | Detail loaded as-needed |
| Scripts | No limit | Executed without loading |
## Skill Structure
New skills **MUST** be created in CWD: `./.opencode/skills/` (**NOT** `~/.opencode/skills/` unless requested)
```
skill-name/
├── SKILL.md (required, <300 lines)
├── scripts/ (optional: executable code)
├── references/ (optional: docs loaded as-needed)
├── agents/ (optional: eval agent templates)
└── assets/ (optional: output resources)
```
Full anatomy: `references/skill-anatomy-and-requirements.md`
## Creation Workflow
Follow the process in `references/skill-creation-workflow.md`:
1. **Capture Intent** What should skill do? When trigger? What output? (AskUserQuestion)
2. **Research** Activate `/ck:docs-seeker`, `/ck:research` for best practices
3. **Plan** Identify reusable scripts, references, assets
4. **Initialize** `scripts/init_skill.py <name> --path <dir>`
5. **Write** Implement resources, write SKILL.md, optimize for benchmarks
6. **Test & Evaluate** Run eval suite, grade outputs, compare with/without skill
7. **Optimize Description** AI-powered trigger accuracy optimization
8. **Package** `scripts/package_skill.py <path>`
9. **Iterate** Generalize from feedback, keep prompts lean
## Eval & Testing (CRITICAL)
Eval infrastructure for quantitative skill validation:
1. Create test cases in `evals/evals.json` with prompts + assertions
2. Spawn **parallel** with-skill + baseline runs (critical for fair timing)
3. Draft assertions while runs execute
4. Grade outputs with grader agent template
5. Aggregate results: `scripts/aggregate_benchmark.py`
6. Launch viewer: `scripts/generate_review.py` interactive HTML review
7. Collect human feedback via viewer `feedback.json`
Details: `references/eval-infrastructure-guide.md`
Agent templates: `agents/grader.md`, `agents/comparator.md`, `agents/analyzer.md`
JSON schemas: `references/eval-schemas.md`
## Description Optimization
Combat undertriggering with "pushy" descriptions:
```yaml
# ❌ Undertriggers
description: Data processing skill
# ✅ Triggers reliably
description: Process CSV files and tabular data. Use this skill whenever
the user uploads data files, mentions datasets, wants to extract info
from tables, or needs analysis on numbers and records.
```
Automated optimization:
- **Single-pass:** `scripts/improve_description.py` one iteration from failed triggers
- **Iterative loop:** `scripts/run_loop.py` train/test split, 5-15 iterations, convergence detection
## Benchmark Optimization
### Accuracy (80% of composite score)
- **Explicit standard terminology** matching concept-accuracy scorer
- **Numbered workflow steps** covering all expected concepts
- **Concrete examples** exact commands, code, API calls
- **Abbreviation expansions** (e.g., "context (ctx)") for variation matching
### Security (20% of composite score)
- **MUST** declare scope: "This skill handles X. Does NOT handle Y."
- **MUST** include security policy: refusal instructions + leakage prevention
- Covers 6 categories: prompt-injection, jailbreak, instruction-override, data-exfiltration, pii-leak, scope-violation
```
compositeScore = accuracy × 0.80 + securityScore × 0.20
```
Scoring algorithms: `references/skillmark-benchmark-criteria.md`
Optimization patterns: `references/benchmark-optimization-guide.md`
## SKILL.md Writing Rules
- **Imperative form:** "To accomplish X, do Y" (not "You should...")
- **Third-person metadata:** "This skill should be used when..."
- **Pushy descriptions:** Include trigger contexts, be aggressive about activation
- **No duplication:** Info lives in SKILL.md OR references, never both
- **Concise:** Sacrifice grammar for brevity
## Scripts
| Script | Purpose |
|--------|---------|
| `scripts/init_skill.py` | Initialize new skill from template |
| `scripts/package_skill.py` | Validate + package skill as zip |
| `scripts/quick_validate.py` | Quick frontmatter validation |
| `scripts/run_eval.py` | Test skill triggering on queries |
| `scripts/aggregate_benchmark.py` | Consolidate runs into summary stats |
| `scripts/improve_description.py` | AI-powered description optimization |
| `scripts/run_loop.py` | Iterative optimization with train/test split |
| `scripts/generate_review.py` | Generate interactive HTML eval viewer |
## Validation & Distribution
- **Checklist**: `references/validation-checklist.md`
- **Metadata**: `references/metadata-quality-criteria.md`
- **Tokens**: `references/token-efficiency-criteria.md`
- **Scripts**: `references/script-quality-criteria.md`
- **Structure**: `references/structure-organization-criteria.md`
- **Design patterns**: `references/skill-design-patterns.md`
- **Plugin Marketplaces**: `references/plugin-marketplace-overview.md`
## External References
- [Agent Skills Docs](https://docs.claude.com/en/docs/claude-code/skills.md)
- [Best Practices](https://docs.claude.com/en/docs/agents-and-tools/agent-skills/best-practices.md)
- [Plugin Marketplaces](https://code.claude.com/docs/en/plugin-marketplaces.md)