init
This commit is contained in:
311
.opencode/skills/ai-multimodal/references/music-generation.md
Normal file
311
.opencode/skills/ai-multimodal/references/music-generation.md
Normal file
@@ -0,0 +1,311 @@
|
||||
# Music Generation Reference
|
||||
|
||||
Real-time music generation using Lyria RealTime via WebSocket API.
|
||||
|
||||
## Core Capabilities
|
||||
|
||||
- **Real-time streaming**: Bidirectional WebSocket for continuous generation
|
||||
- **Dynamic control**: Modify music in real-time during generation
|
||||
- **Style steering**: Genre, mood, instrumentation guidance
|
||||
- **Audio output**: 48kHz stereo 16-bit PCM
|
||||
|
||||
## Model
|
||||
|
||||
**Lyria RealTime** (Experimental)
|
||||
- WebSocket-based streaming
|
||||
- Real-time parameter adjustment
|
||||
- Instrumental only (no vocals)
|
||||
- Watermarked output
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Python
|
||||
|
||||
```python
|
||||
from google import genai
|
||||
import asyncio
|
||||
|
||||
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))
|
||||
|
||||
async def generate_music():
|
||||
async with client.aio.live.music.connect() as session:
|
||||
# Set style prompts with weights (0.0-1.0)
|
||||
await session.set_weighted_prompts([
|
||||
{"prompt": "Upbeat corporate background music", "weight": 0.8},
|
||||
{"prompt": "Modern electronic elements", "weight": 0.5}
|
||||
])
|
||||
|
||||
# Configure generation parameters
|
||||
await session.set_music_generation_config(
|
||||
guidance=4.0, # Prompt adherence (0.0-6.0)
|
||||
bpm=120, # Tempo (60-200)
|
||||
density=0.6, # Note density (0.0-1.0)
|
||||
brightness=0.5 # Tonal quality (0.0-1.0)
|
||||
)
|
||||
|
||||
# Start playback and collect audio
|
||||
await session.play()
|
||||
|
||||
audio_chunks = []
|
||||
async for chunk in session:
|
||||
audio_chunks.append(chunk.audio_data)
|
||||
|
||||
return b''.join(audio_chunks)
|
||||
```
|
||||
|
||||
### JavaScript
|
||||
|
||||
```javascript
|
||||
const client = new GenaiClient({ apiKey: process.env.GEMINI_API_KEY });
|
||||
|
||||
async function generateMusic() {
|
||||
const session = await client.live.music.connect();
|
||||
|
||||
await session.setWeightedPrompts([
|
||||
{ prompt: "Calm ambient background", weight: 0.9 },
|
||||
{ prompt: "Nature sounds influence", weight: 0.3 }
|
||||
]);
|
||||
|
||||
await session.setMusicGenerationConfig({
|
||||
guidance: 3.5,
|
||||
bpm: 80,
|
||||
density: 0.4,
|
||||
brightness: 0.6
|
||||
});
|
||||
|
||||
session.onAudio((audioChunk) => {
|
||||
// Process 48kHz stereo PCM audio
|
||||
audioBuffer.push(audioChunk);
|
||||
});
|
||||
|
||||
await session.play();
|
||||
}
|
||||
```
|
||||
|
||||
## Configuration Parameters
|
||||
|
||||
| Parameter | Range | Default | Description |
|
||||
|-----------|-------|---------|-------------|
|
||||
| `guidance` | 0.0-6.0 | 4.0 | Prompt adherence (higher = stricter) |
|
||||
| `bpm` | 60-200 | 120 | Tempo in beats per minute |
|
||||
| `density` | 0.0-1.0 | 0.5 | Note/sound density |
|
||||
| `brightness` | 0.0-1.0 | 0.5 | Tonal quality (higher = brighter) |
|
||||
| `scale` | 12 keys | C Major | Musical key |
|
||||
| `mute_bass` | bool | false | Remove bass elements |
|
||||
| `mute_drums` | bool | false | Remove drum elements |
|
||||
| `mode` | enum | QUALITY | QUALITY, DIVERSITY, VOCALIZATION |
|
||||
| `temperature` | 0.0-2.0 | 1.0 | Sampling randomness |
|
||||
| `top_k` | int | 40 | Sampling top-k |
|
||||
| `seed` | int | random | Reproducibility seed |
|
||||
|
||||
## Weighted Prompts
|
||||
|
||||
Control generation direction with weighted prompts:
|
||||
|
||||
```python
|
||||
await session.set_weighted_prompts([
|
||||
{"prompt": "Main style description", "weight": 1.0}, # Primary
|
||||
{"prompt": "Secondary influence", "weight": 0.5}, # Supporting
|
||||
{"prompt": "Subtle element", "weight": 0.2} # Accent
|
||||
])
|
||||
```
|
||||
|
||||
**Weight guidelines**:
|
||||
- 0.8-1.0: Dominant influence
|
||||
- 0.5-0.7: Secondary contribution
|
||||
- 0.2-0.4: Subtle accent
|
||||
- 0.0-0.1: Minimal effect
|
||||
|
||||
## Style Prompts by Use Case
|
||||
|
||||
### Corporate/Marketing
|
||||
|
||||
```python
|
||||
prompts = [
|
||||
{"prompt": "Professional corporate background music, modern", "weight": 0.9},
|
||||
{"prompt": "Uplifting, optimistic mood", "weight": 0.6},
|
||||
{"prompt": "Clean production, minimal complexity", "weight": 0.5}
|
||||
]
|
||||
config = {"bpm": 100, "brightness": 0.6, "density": 0.5}
|
||||
```
|
||||
|
||||
### Social Media/Short-form
|
||||
|
||||
```python
|
||||
prompts = [
|
||||
{"prompt": "Trending pop electronic beat", "weight": 0.9},
|
||||
{"prompt": "Energetic, catchy rhythm", "weight": 0.7},
|
||||
{"prompt": "Bass-heavy, punchy", "weight": 0.5}
|
||||
]
|
||||
config = {"bpm": 128, "brightness": 0.7, "density": 0.7}
|
||||
```
|
||||
|
||||
### Emotional/Cinematic
|
||||
|
||||
```python
|
||||
prompts = [
|
||||
{"prompt": "Cinematic orchestral underscore", "weight": 0.9},
|
||||
{"prompt": "Emotional, inspiring", "weight": 0.7},
|
||||
{"prompt": "Building tension and release", "weight": 0.5}
|
||||
]
|
||||
config = {"bpm": 70, "brightness": 0.4, "density": 0.4}
|
||||
```
|
||||
|
||||
### Ambient/Background
|
||||
|
||||
```python
|
||||
prompts = [
|
||||
{"prompt": "Calm ambient soundscape", "weight": 0.9},
|
||||
{"prompt": "Minimal, atmospheric", "weight": 0.6},
|
||||
{"prompt": "Lo-fi textures", "weight": 0.4}
|
||||
]
|
||||
config = {"bpm": 80, "brightness": 0.4, "density": 0.3}
|
||||
```
|
||||
|
||||
## Real-time Transitions
|
||||
|
||||
Smoothly transition between styles during generation:
|
||||
|
||||
```python
|
||||
async def dynamic_music_generation():
|
||||
async with client.aio.live.music.connect() as session:
|
||||
# Start with intro style
|
||||
await session.set_weighted_prompts([
|
||||
{"prompt": "Soft ambient intro", "weight": 0.9}
|
||||
])
|
||||
await session.play()
|
||||
|
||||
# Collect intro (4 seconds)
|
||||
intro_chunks = []
|
||||
for _ in range(192): # ~4 seconds at 48kHz
|
||||
chunk = await session.__anext__()
|
||||
intro_chunks.append(chunk.audio_data)
|
||||
|
||||
# Transition to main section
|
||||
await session.set_weighted_prompts([
|
||||
{"prompt": "Building energy", "weight": 0.7},
|
||||
{"prompt": "Full beat drop", "weight": 0.5}
|
||||
])
|
||||
|
||||
# Continue with new style...
|
||||
```
|
||||
|
||||
## Output Specifications
|
||||
|
||||
- **Format**: Raw 16-bit PCM
|
||||
- **Sample Rate**: 48,000 Hz
|
||||
- **Channels**: 2 (stereo)
|
||||
- **Bit Depth**: 16 bits
|
||||
- **Watermarking**: Always enabled (SynthID)
|
||||
|
||||
### Save to WAV
|
||||
|
||||
```python
|
||||
import wave
|
||||
|
||||
def save_pcm_to_wav(pcm_data, filename):
|
||||
with wave.open(filename, 'wb') as wav_file:
|
||||
wav_file.setnchannels(2) # Stereo
|
||||
wav_file.setsampwidth(2) # 16-bit
|
||||
wav_file.setframerate(48000) # 48kHz
|
||||
wav_file.writeframes(pcm_data)
|
||||
```
|
||||
|
||||
### Convert to MP3
|
||||
|
||||
```bash
|
||||
# Using FFmpeg
|
||||
ffmpeg -f s16le -ar 48000 -ac 2 -i input.pcm output.mp3
|
||||
```
|
||||
|
||||
## Integration with Video Production
|
||||
|
||||
### Generate Background Music for Video
|
||||
|
||||
```python
|
||||
async def generate_video_background(duration_seconds, mood):
|
||||
"""Generate background music matching video length"""
|
||||
|
||||
# Configure for video background
|
||||
prompts = [
|
||||
{"prompt": f"{mood} background music for video", "weight": 0.9},
|
||||
{"prompt": "Non-distracting, supportive underscore", "weight": 0.6}
|
||||
]
|
||||
|
||||
async with client.aio.live.music.connect() as session:
|
||||
await session.set_weighted_prompts(prompts)
|
||||
await session.set_music_generation_config(
|
||||
guidance=4.0,
|
||||
density=0.4, # Keep sparse for background
|
||||
brightness=0.5
|
||||
)
|
||||
await session.play()
|
||||
|
||||
# Calculate chunks needed (48kHz stereo = 192000 bytes/second)
|
||||
total_chunks = duration_seconds * 48000 // 512 # Chunk size estimate
|
||||
|
||||
audio_data = []
|
||||
async for i, chunk in enumerate(session):
|
||||
audio_data.append(chunk.audio_data)
|
||||
if i >= total_chunks:
|
||||
break
|
||||
|
||||
return b''.join(audio_data)
|
||||
```
|
||||
|
||||
### Sync with Storyboard Timing
|
||||
|
||||
```python
|
||||
async def generate_scene_music(scenes):
|
||||
"""Generate music with transitions matching scene changes"""
|
||||
|
||||
all_audio = []
|
||||
|
||||
async with client.aio.live.music.connect() as session:
|
||||
for scene in scenes:
|
||||
# Update style for each scene
|
||||
await session.set_weighted_prompts([
|
||||
{"prompt": scene['mood'], "weight": 0.9},
|
||||
{"prompt": scene['style'], "weight": 0.5}
|
||||
])
|
||||
|
||||
if scene['index'] == 0:
|
||||
await session.play()
|
||||
|
||||
# Collect audio for scene duration
|
||||
chunks = int(scene['duration'] * 48000 / 512)
|
||||
for _ in range(chunks):
|
||||
chunk = await session.__anext__()
|
||||
all_audio.append(chunk.audio_data)
|
||||
|
||||
return b''.join(all_audio)
|
||||
```
|
||||
|
||||
## Limitations
|
||||
|
||||
- **Instrumental only**: No vocal/singing generation
|
||||
- **WebSocket required**: Real-time streaming connection
|
||||
- **Safety filtering**: Prompts undergo safety review
|
||||
- **Watermarking**: All output contains SynthID watermark
|
||||
- **Experimental**: API may change
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Buffer audio**: Implement robust buffering for smooth playback
|
||||
2. **Gradual transitions**: Avoid drastic prompt changes mid-stream
|
||||
3. **Sparse for backgrounds**: Lower density for video backgrounds
|
||||
4. **Test prompts**: Iterate on prompt combinations
|
||||
5. **Cross-fade transitions**: Blend audio at style changes
|
||||
6. **Match video mood**: Align music tempo/energy with visuals
|
||||
|
||||
## Resources
|
||||
|
||||
- [Lyria RealTime Docs](https://ai.google.dev/gemini-api/docs/music-generation)
|
||||
- [Audio Processing Guide](./audio-processing.md)
|
||||
- [Video Generation](./video-generation.md)
|
||||
|
||||
---
|
||||
|
||||
**Related**: [Audio Processing](./audio-processing.md) | [Video Generation](./video-generation.md)
|
||||
|
||||
**Back to**: [AI Multimodal Skill](../SKILL.md)
|
||||
Reference in New Issue
Block a user