init
This commit is contained in:
141
.opencode/skills/ai-multimodal/references/minimax-generation.md
Normal file
141
.opencode/skills/ai-multimodal/references/minimax-generation.md
Normal file
@@ -0,0 +1,141 @@
|
||||
# MiniMax Generation Reference
|
||||
|
||||
## Overview
|
||||
|
||||
MiniMax provides image, video (Hailuo), speech (TTS), and music generation APIs.
|
||||
Base URL: `https://api.minimax.io/v1` | Auth: `Bearer {MINIMAX_API_KEY}`
|
||||
|
||||
## Image Generation
|
||||
|
||||
**Endpoint**: `POST /image_generation`
|
||||
**Models**: `image-01` (standard), `image-01-live` (enhanced)
|
||||
**Rate**: 10 RPM | **Cost**: ~$0.03/image
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "image-01",
|
||||
"prompt": "A girl looking into the distance",
|
||||
"aspect_ratio": "16:9",
|
||||
"n": 2,
|
||||
"response_format": "url",
|
||||
"prompt_optimizer": true,
|
||||
"subject_reference": [{"type": "character", "image_file": "url", "weight": 0.8}]
|
||||
}
|
||||
```
|
||||
|
||||
**Aspect ratios**: 1:1, 16:9, 4:3, 3:2, 2:3, 3:4, 9:16, 21:9
|
||||
**Custom dims**: 512-2048px (divisible by 8)
|
||||
**Batch**: 1-9 images per request
|
||||
|
||||
## Video Generation (Hailuo)
|
||||
|
||||
**Endpoints**: POST `/video_generation` → GET `/query/video_generation` → GET `/files/retrieve`
|
||||
**Async workflow**: Submit task → poll every 10s → download file (URL valid 9h)
|
||||
|
||||
### Models
|
||||
| Model | Features | Resolution |
|
||||
|-------|----------|-----------|
|
||||
| `MiniMax-Hailuo-2.3` | Text/image-to-video | 720p/1080p |
|
||||
| `MiniMax-Hailuo-2.3-Fast` | Same, 50% faster+cheaper | 720p/1080p |
|
||||
| `MiniMax-Hailuo-02` | First+last frame mode | 720p |
|
||||
| `S2V-01` | Subject reference | 720p |
|
||||
|
||||
**Rate**: 5 RPM | **Cost**: $0.25 (6s/768p), $0.52 (10s/768p)
|
||||
|
||||
```json
|
||||
// Text-to-video
|
||||
{"prompt": "A dancer", "model": "MiniMax-Hailuo-2.3", "duration": 6, "resolution": "1080P"}
|
||||
|
||||
// Image-to-video
|
||||
{"prompt": "Scene desc", "first_frame_image": "url", "model": "MiniMax-Hailuo-2.3", "duration": 6}
|
||||
|
||||
// First+last frame
|
||||
{"prompt": "Transition", "first_frame_image": "url", "last_frame_image": "url", "model": "MiniMax-Hailuo-02"}
|
||||
|
||||
// Subject reference
|
||||
{"prompt": "Scene with character", "subject_reference": [{"type": "character", "image": ["url"]}], "model": "S2V-01"}
|
||||
```
|
||||
|
||||
## Speech/TTS
|
||||
|
||||
**Endpoint**: `POST /speech/speech_t2a_input`
|
||||
**Models**: `speech-2.8-hd` (best), `speech-2.8-turbo` (fast), `speech-2.6-hd/turbo`, `speech-02-hd/turbo`
|
||||
**Rate**: 60 RPM | **Cost**: $30-50/1M chars
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "speech-2.8-hd",
|
||||
"text": "Your text here",
|
||||
"voice": "English_Warm_Bestie",
|
||||
"emotion": "happy",
|
||||
"rate": 1.0,
|
||||
"volume": 1.0,
|
||||
"pitch": 1.0,
|
||||
"output_format": "mp3"
|
||||
}
|
||||
```
|
||||
|
||||
**Voices**: 300+ system voices, 40+ languages
|
||||
**Emotions**: happy, sad, angry, fearful, disgusted, surprised, neutral
|
||||
**Formats**: mp3, wav, pcm, flac
|
||||
**Text limit**: 10,000 chars
|
||||
|
||||
### Voice Cloning
|
||||
```json
|
||||
POST /voice_clone
|
||||
{"audio_url": "https://sample.wav", "clone_name": "my_voice"}
|
||||
```
|
||||
Requires 10+ seconds of reference audio. Rate: 60 RPM.
|
||||
|
||||
## Music Generation
|
||||
|
||||
**Endpoint**: `POST /music_generation`
|
||||
**Models**: `music-2.5` (latest, vocals+accompaniment, 4min songs)
|
||||
**Rate**: 120 RPM | **Cost**: $0.03-0.075/generation
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "music-2.5",
|
||||
"lyrics": "Verse 1\nLine one\n\n[Chorus]\nChorus line",
|
||||
"prompt": "Upbeat pop with electronic elements",
|
||||
"output_format": "url",
|
||||
"audio_setting": {"sample_rate": 44100, "bitrate": 128000, "format": "mp3"}
|
||||
}
|
||||
```
|
||||
|
||||
**Lyrics**: 1-3500 chars, supports structure tags ([Verse], [Chorus], etc.)
|
||||
**Prompt**: 0-2000 chars, style/mood description
|
||||
**Sample rates**: 16000, 24000, 32000, 44100 Hz
|
||||
**Bitrates**: 32000, 64000, 128000, 256000 bps
|
||||
|
||||
## Error Codes
|
||||
|
||||
| Code | Meaning |
|
||||
|------|---------|
|
||||
| 0 | Success |
|
||||
| 1002 | Rate limit exceeded |
|
||||
| 1008 | Insufficient balance |
|
||||
| 2013 | Invalid parameters |
|
||||
|
||||
## CLI Examples
|
||||
|
||||
```bash
|
||||
# Image
|
||||
python minimax_cli.py --task generate --prompt "A cyberpunk city" --model image-01 --aspect-ratio 16:9
|
||||
|
||||
# Video
|
||||
python minimax_cli.py --task generate-video --prompt "A dancer" --model MiniMax-Hailuo-2.3 --duration 6
|
||||
|
||||
# Speech
|
||||
python minimax_cli.py --task generate-speech --text "Hello world" --model speech-2.8-hd --voice English_Warm_Bestie --emotion happy
|
||||
|
||||
# Music
|
||||
python minimax_cli.py --task generate-music --lyrics "La la la\nOh yeah" --prompt "upbeat pop" --model music-2.5
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
- [API Overview](https://platform.minimax.io/docs/api-reference/api-overview)
|
||||
- [Video Guide](https://platform.minimax.io/docs/guides/video-generation)
|
||||
- [Speech API](https://platform.minimax.io/docs/api-reference/speech-t2a-intro)
|
||||
- [Music API](https://platform.minimax.io/docs/api-reference/music-generation)
|
||||
Reference in New Issue
Block a user