init

2026-04-12 01:06:31 +07:00
commit 10d660cbcb
1066 changed files with 228596 additions and 0 deletions
--- a/.opencode/skills/ai-multimodal/references/music-generation.md
+++ b/.opencode/skills/ai-multimodal/references/music-generation.md
@@ -0,0 +1,311 @@
+# Music Generation Reference
+
+Real-time music generation with Lyria RealTime over a WebSocket API.
+
+## Core Capabilities
+
+- **Real-time streaming**: Bidirectional WebSocket for continuous generation
+- **Dynamic control**: Modify music in real-time during generation
+- **Style steering**: Genre, mood, instrumentation guidance
+- **Audio output**: 48kHz stereo 16-bit PCM
+
+## Model
+
+**Lyria RealTime** (Experimental)
+- WebSocket-based streaming
+- Real-time parameter adjustment
+- Instrumental only (no vocals)
+- Watermarked output
+
+## Quick Start
+
+### Python
+
+```python
+import asyncio
+import os
+
+from google import genai
+
+client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))
+
+async def generate_music():
+    async with client.aio.live.music.connect() as session:
+        # Set style prompts with weights (0.0-1.0)
+        await session.set_weighted_prompts([
+            {"prompt": "Upbeat corporate background music", "weight": 0.8},
+            {"prompt": "Modern electronic elements", "weight": 0.5}
+        ])
+
+        # Configure generation parameters
+        await session.set_music_generation_config(
+            guidance=4.0,     # Prompt adherence (0.0-6.0)
+            bpm=120,          # Tempo (60-200)
+            density=0.6,      # Note density (0.0-1.0)
+            brightness=0.5    # Tonal quality (0.0-1.0)
+        )
+
+        # Start playback and collect audio
+        await session.play()
+
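+        # The stream is continuous, so stop after a fixed chunk budget
+        max_chunks = 1000  # Illustrative limit; chunk duration is not specified here
+        audio_chunks = []
+        async for chunk in session:
+            audio_chunks.append(chunk.audio_data)
+            if len(audio_chunks) >= max_chunks:
+                break
+
+        return b''.join(audio_chunks)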
+```
+
+### JavaScript
+
+```javascript
+const client = new GenaiClient({ apiKey: process.env.GEMINI_API_KEY });
+
+async function generateMusic() {
+    const session = await client.live.music.connect();
+
+    await session.setWeightedPrompts([
+        { prompt: "Calm ambient background", weight: 0.9 },
+        { prompt: "Nature sounds influence", weight: 0.3 }
+    ]);
+
+    await session.setMusicGenerationConfig({
+        guidance: 3.5,
+        bpm: 80,
+        density: 0.4,
+        brightness: 0.6
+    });
+
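+    const audioBuffer = [];  // Collected PCM chunks
+    session.onAudio((audioChunk) => {
+        // Process 48kHz stereo PCM audio
+        audioBuffer.push(audioChunk);
+    });
+
+    await session.play();
+    return audioBuffer;  // Keeps filling while the session plays
+}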
+```
+
+## Configuration Parameters
+
+| Parameter | Range | Default | Description |
+|-----------|-------|---------|-------------|
+| `guidance` | 0.0-6.0 | 4.0 | Prompt adherence (higher = stricter) |
+| `bpm` | 60-200 | 120 | Tempo in beats per minute |
+| `density` | 0.0-1.0 | 0.5 | Note/sound density |
+| `brightness` | 0.0-1.0 | 0.5 | Tonal quality (higher = brighter) |
+| `scale` | 12 keys | C Major | Musical key |
+| `mute_bass` | bool | false | Remove bass elements |
+| `mute_drums` | bool | false | Remove drum elements |
+| `mode` | enum | QUALITY | QUALITY, DIVERSITY, VOCALIZATION |
+| `temperature` | 0.0-2.0 | 1.0 | Sampling randomness |
+| `top_k` | int | 40 | Sampling top-k |
+| `seed` | int | random | Reproducibility seed |
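+
+The numeric ranges in the table can be checked before a config is sent. The sketch below is a hypothetical helper, not part of the SDK: `PARAMETER_RANGES` and `validate_config` are names invented for illustration, and the ranges are copied from the table above.
+
+```python
+# Ranges copied from the table above; this helper is hypothetical
+# and is not part of the google-genai SDK.
+PARAMETER_RANGES = {
+    "guidance": (0.0, 6.0),
+    "bpm": (60, 200),
+    "density": (0.0, 1.0),
+    "brightness": (0.0, 1.0),
+    "temperature": (0.0, 2.0),
+}
+
+
+def validate_config(config):
+    """Return a list of human-readable problems found in a config dict."""
+    problems = []
+    for name, value in config.items():
+        if name not in PARAMETER_RANGES:
+            continue  # Unlisted or non-numeric parameters are not checked here
+        low, high = PARAMETER_RANGES[name]
+        if not low <= value <= high:
+            problems.append(f"{name}={value} is outside {low}-{high}")
+    return problems
+
+
+print(validate_config({"bpm": 240, "density": 0.4}))
+# ['bpm=240 is outside 60-200']
+```
+
+An empty list means every checked value is inside its documented range.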
+
+## Weighted Prompts
+
+Control generation direction with weighted prompts:
+
+```python
+await session.set_weighted_prompts([
+    {"prompt": "Main style description", "weight": 1.0},    # Primary
+    {"prompt": "Secondary influence", "weight": 0.5},       # Supporting
+    {"prompt": "Subtle element", "weight": 0.2}             # Accent
+])
+```
+
+**Weight guidelines**:
+- 0.8-1.0: Dominant influence
+- 0.5-0.7: Secondary contribution
+- 0.2-0.4: Subtle accent
+- 0.0-0.1: Minimal effect
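+
+The guideline bands leave small gaps (for example between 0.4 and 0.5). A minimal sketch of one way to label a weight, assuming each band starts at its lower bound and extends up to the next one; `weight_tier` is a hypothetical helper name:
+
+```python
+def weight_tier(weight):
+    """Map a prompt weight to the tier names used in the guidelines."""
+    if not 0.0 <= weight <= 1.0:
+        raise ValueError("weight must be between 0.0 and 1.0")
+    if weight >= 0.8:
+        return "dominant"
+    if weight >= 0.5:
+        return "secondary"
+    if weight >= 0.2:
+        return "subtle"
+    return "minimal"
+
+
+for w in (1.0, 0.5, 0.2, 0.05):
+    print(w, weight_tier(w))
+```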
+
+## Style Prompts by Use Case
+
+### Corporate/Marketing
+
+```python
+prompts = [
+    {"prompt": "Professional corporate background music, modern", "weight": 0.9},
+    {"prompt": "Uplifting, optimistic mood", "weight": 0.6},
+    {"prompt": "Clean production, minimal complexity", "weight": 0.5}
+]
+config = {"bpm": 100, "brightness": 0.6, "density": 0.5}
+```
+
+### Social Media/Short-form
+
+```python
+prompts = [
+    {"prompt": "Trending pop electronic beat", "weight": 0.9},
+    {"prompt": "Energetic, catchy rhythm", "weight": 0.7},
+    {"prompt": "Bass-heavy, punchy", "weight": 0.5}
+]
+config = {"bpm": 128, "brightness": 0.7, "density": 0.7}
+```
+
+### Emotional/Cinematic
+
+```python
+prompts = [
+    {"prompt": "Cinematic orchestral underscore", "weight": 0.9},
+    {"prompt": "Emotional, inspiring", "weight": 0.7},
+    {"prompt": "Building tension and release", "weight": 0.5}
+]
+config = {"bpm": 70, "brightness": 0.4, "density": 0.4}
+```
+
+### Ambient/Background
+
+```python
+prompts = [
+    {"prompt": "Calm ambient soundscape", "weight": 0.9},
+    {"prompt": "Minimal, atmospheric", "weight": 0.6},
+    {"prompt": "Lo-fi textures", "weight": 0.4}
+]
+config = {"bpm": 80, "brightness": 0.4, "density": 0.3}
+```
+
+## Real-time Transitions
+
+Smoothly transition between styles during generation:
+
+```python
+async def dynamic_music_generation():
+    async with client.aio.live.music.connect() as session:
+        # Start with intro style
+        await session.set_weighted_prompts([
+            {"prompt": "Soft ambient intro", "weight": 0.9}
+        ])
+        await session.play()
+
+        # Collect intro (~4 seconds, assuming ~512 frames per chunk at 48 kHz)
+        intro_chunks = []
+        for _ in range(4 * 48000 // 512):  # 375 chunks; chunk size is an estimate
+            chunk = await session.__anext__()
+            intro_chunks.append(chunk.audio_data)
+
+        # Transition to main section
+        await session.set_weighted_prompts([
+            {"prompt": "Building energy", "weight": 0.7},
+            {"prompt": "Full beat drop", "weight": 0.5}
+        ])
+
+        # Continue with new style...
+```
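+
+One way to soften the switch is to move the weights in several small steps instead of a single jump. The sketch below only builds the prompt lists and makes no API calls. `transition_schedule`, `steps`, and `peak` are hypothetical names, and the linear ramp is an assumption rather than documented behaviour.
+
+```python
+def transition_schedule(old_prompt, new_prompt, steps=4, peak=0.9):
+    """Build weighted-prompt lists that ramp from old_prompt to new_prompt."""
+    if steps < 1:
+        raise ValueError("steps must be at least 1")
+    schedule = []
+    for i in range(1, steps + 1):
+        t = i / steps  # Fraction of the transition completed, 0 < t <= 1
+        weights = [
+            {"prompt": old_prompt, "weight": round(peak * (1 - t), 3)},
+            {"prompt": new_prompt, "weight": round(peak * t, 3)},
+        ]
+        # Drop prompts that have faded out completely
+        schedule.append([w for w in weights if w["weight"] > 0])
+    return schedule
+
+
+for step in transition_schedule("Soft ambient intro", "Building energy", steps=3):
+    print(step)
+```
+
+Each list would then be passed to `session.set_weighted_prompts(...)` in turn, with some audio collected between steps.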
+
+## Output Specifications
+
+- **Format**: Raw 16-bit PCM
+- **Sample Rate**: 48,000 Hz
+- **Channels**: 2 (stereo)
+- **Bit Depth**: 16 bits
+- **Watermarking**: Always enabled (SynthID)
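+
+These figures fix the data rate of the stream, which is useful when sizing buffers or converting between byte counts and durations. A small worked example follows. The helper names are illustrative, and only the sample rate, channel count, and bit depth come from the list above.
+
+```python
+SAMPLE_RATE = 48_000   # Frames per second
+CHANNELS = 2           # Stereo
+BYTES_PER_SAMPLE = 2   # 16-bit
+
+BYTES_PER_SECOND = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE
+
+
+def pcm_duration_seconds(num_bytes):
+    """Duration of a raw PCM buffer in seconds."""
+    return num_bytes / BYTES_PER_SECOND
+
+
+def pcm_bytes_for(seconds):
+    """Number of bytes needed to hold the given duration (whole frames)."""
+    return int(seconds * SAMPLE_RATE) * CHANNELS * BYTES_PER_SAMPLE
+
+
+print(BYTES_PER_SECOND)               # 192000
+print(pcm_bytes_for(30))              # 5760000
+print(pcm_duration_seconds(960_000))  # 5.0
+```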
+
+### Save to WAV
+
+```python
+import wave
+
+def save_pcm_to_wav(pcm_data, filename):
+    with wave.open(filename, 'wb') as wav_file:
+        wav_file.setnchannels(2)        # Stereo
+        wav_file.setsampwidth(2)        # 16-bit
+        wav_file.setframerate(48000)    # 48kHz
+        wav_file.writeframes(pcm_data)
+```
+
+### Convert to MP3
+
+```bash
+# Using FFmpeg
+ffmpeg -f s16le -ar 48000 -ac 2 -i input.pcm output.mp3
+```
+
+## Integration with Video Production
+
+### Generate Background Music for Video
+
+```python
+async def generate_video_background(duration_seconds, mood):
+    """Generate background music matching video length"""
+
+    # Configure for video background
+    prompts = [
+        {"prompt": f"{mood} background music for video", "weight": 0.9},
+        {"prompt": "Non-distracting, supportive underscore", "weight": 0.6}
+    ]
+
+    async with client.aio.live.music.connect() as session:
+        await session.set_weighted_prompts(prompts)
+        await session.set_music_generation_config(
+            guidance=4.0,
+            density=0.4,  # Keep sparse for background
+            brightness=0.5
+        )
+        await session.play()
+
+        # 48 kHz stereo 16-bit PCM = 192,000 bytes/second.
+        # Chunk count assumes ~512 frames per chunk (an estimate).
+        total_chunks = int(duration_seconds * 48000 // 512)
+
+        audio_data = []
+        # enumerate() does not accept async iterators, so count manually
+        async for chunk in session:
+            audio_data.append(chunk.audio_data)
+            if len(audio_data) >= total_chunks:
+                break
+
+        return b''.join(audio_data)
+```
+
+### Sync with Storyboard Timing
+
+```python
+async def generate_scene_music(scenes):
+    """Generate music with transitions matching scene changes"""
+
+    all_audio = []
+
+    async with client.aio.live.music.connect() as session:
+        for scene in scenes:
+            # Update style for each scene
+            await session.set_weighted_prompts([
+                {"prompt": scene['mood'], "weight": 0.9},
+                {"prompt": scene['style'], "weight": 0.5}
+            ])
+
+            if scene['index'] == 0:
+                await session.play()
+
+            # Collect audio for scene duration
+            chunks = int(scene['duration'] * 48000 / 512)
+            for _ in range(chunks):
+                chunk = await session.__anext__()
+                all_audio.append(chunk.audio_data)
+
+    return b''.join(all_audio)
+```
+
+## Limitations
+
+- **Instrumental only**: No vocal/singing generation
+- **WebSocket required**: Real-time streaming connection
+- **Safety filtering**: Prompts undergo safety review
+- **Watermarking**: All output contains SynthID watermark
+- **Experimental**: API may change
+
+## Best Practices
+
+1. **Buffer audio**: Implement robust buffering for smooth playback
+2. **Gradual transitions**: Avoid drastic prompt changes mid-stream
+3. **Sparse for backgrounds**: Lower density for video backgrounds
+4. **Test prompts**: Iterate on prompt combinations
+5. **Cross-fade transitions**: Blend audio at style changes
+6. **Match video mood**: Align music tempo/energy with visuals
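+
+The cross-fade in item 5 can be sketched on raw PCM using only the standard library. This is a minimal linear fade under stated assumptions: both segments are 16-bit little-endian stereo, as in the output specifications. `crossfade` and `_decode` are hypothetical helper names, and a production mixer would more likely use NumPy or a dedicated audio library.
+
+```python
+import sys
+from array import array
+
+CHANNELS = 2  # Stereo, as listed in the output specifications
+
+
+def _decode(pcm):
+    """Turn little-endian 16-bit PCM bytes into an array of samples."""
+    samples = array("h")
+    samples.frombytes(pcm)
+    if sys.byteorder == "big":
+        samples.byteswap()  # The stream is little-endian
+    return samples
+
+
+def crossfade(pcm_a, pcm_b, fade_frames):
+    """Join two PCM segments, blending the last fade_frames of pcm_a
+    into the first fade_frames of pcm_b with a linear ramp."""
+    a, b = _decode(pcm_a), _decode(pcm_b)
+    n = fade_frames * CHANNELS  # Number of samples in the overlap
+    if fade_frames < 1 or n > len(a) or n > len(b):
+        raise ValueError("fade_frames must fit inside both segments")
+
+    out = a[:len(a) - n]  # Untouched head of the first segment
+    for i in range(n):
+        t = (i // CHANNELS + 1) / (fade_frames + 1)  # Ramp position, 0 < t < 1
+        out.append(int(a[len(a) - n + i] * (1 - t) + b[i] * t))
+    out.extend(b[n:])  # Untouched tail of the second segment
+
+    if sys.byteorder == "big":
+        out.byteswap()
+    return out.tobytes()
+```
+
+Because the two segments overlap, the result is `fade_frames` frames shorter than the two inputs laid end to end. At 48 kHz, `fade_frames=4800` gives a 100 ms blend.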
+
+## Resources
+
+- [Lyria RealTime Docs](https://ai.google.dev/gemini-api/docs/music-generation)
+- [Audio Processing Guide](./audio-processing.md)
+- [Video Generation](./video-generation.md)
+
+---
+
+**Related**: [Audio Processing](./audio-processing.md) | [Video Generation](./video-generation.md)
+
+**Back to**: [AI Multimodal Skill](../SKILL.md)