8.8 KiB
8.8 KiB
Music Generation Reference
Real-time music generation using Lyria RealTime via WebSocket API.
Core Capabilities
- Real-time streaming: Bidirectional WebSocket for continuous generation
- Dynamic control: Modify music in real-time during generation
- Style steering: Genre, mood, instrumentation guidance
- Audio output: 48kHz stereo 16-bit PCM
Model
Lyria RealTime (Experimental)
- WebSocket-based streaming
- Real-time parameter adjustment
- Instrumental only (no vocals)
- Watermarked output
Quick Start
Python
from google import genai
import asyncio
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))
async def generate_music():
async with client.aio.live.music.connect() as session:
# Set style prompts with weights (0.0-1.0)
await session.set_weighted_prompts([
{"prompt": "Upbeat corporate background music", "weight": 0.8},
{"prompt": "Modern electronic elements", "weight": 0.5}
])
# Configure generation parameters
await session.set_music_generation_config(
guidance=4.0, # Prompt adherence (0.0-6.0)
bpm=120, # Tempo (60-200)
density=0.6, # Note density (0.0-1.0)
brightness=0.5 # Tonal quality (0.0-1.0)
)
# Start playback and collect audio
await session.play()
audio_chunks = []
async for chunk in session:
audio_chunks.append(chunk.audio_data)
return b''.join(audio_chunks)
JavaScript
const client = new GenaiClient({ apiKey: process.env.GEMINI_API_KEY });
async function generateMusic() {
const session = await client.live.music.connect();
await session.setWeightedPrompts([
{ prompt: "Calm ambient background", weight: 0.9 },
{ prompt: "Nature sounds influence", weight: 0.3 }
]);
await session.setMusicGenerationConfig({
guidance: 3.5,
bpm: 80,
density: 0.4,
brightness: 0.6
});
session.onAudio((audioChunk) => {
// Process 48kHz stereo PCM audio
audioBuffer.push(audioChunk);
});
await session.play();
}
Configuration Parameters
| Parameter | Range | Default | Description |
|---|---|---|---|
guidance |
0.0-6.0 | 4.0 | Prompt adherence (higher = stricter) |
bpm |
60-200 | 120 | Tempo in beats per minute |
density |
0.0-1.0 | 0.5 | Note/sound density |
brightness |
0.0-1.0 | 0.5 | Tonal quality (higher = brighter) |
scale |
12 keys | C Major | Musical key |
mute_bass |
bool | false | Remove bass elements |
mute_drums |
bool | false | Remove drum elements |
mode |
enum | QUALITY | QUALITY, DIVERSITY, VOCALIZATION |
temperature |
0.0-2.0 | 1.0 | Sampling randomness |
top_k |
int | 40 | Sampling top-k |
seed |
int | random | Reproducibility seed |
Weighted Prompts
Control generation direction with weighted prompts:
await session.set_weighted_prompts([
{"prompt": "Main style description", "weight": 1.0}, # Primary
{"prompt": "Secondary influence", "weight": 0.5}, # Supporting
{"prompt": "Subtle element", "weight": 0.2} # Accent
])
Weight guidelines:
- 0.8-1.0: Dominant influence
- 0.5-0.7: Secondary contribution
- 0.2-0.4: Subtle accent
- 0.0-0.1: Minimal effect
Style Prompts by Use Case
Corporate/Marketing
prompts = [
{"prompt": "Professional corporate background music, modern", "weight": 0.9},
{"prompt": "Uplifting, optimistic mood", "weight": 0.6},
{"prompt": "Clean production, minimal complexity", "weight": 0.5}
]
config = {"bpm": 100, "brightness": 0.6, "density": 0.5}
Social Media/Short-form
prompts = [
{"prompt": "Trending pop electronic beat", "weight": 0.9},
{"prompt": "Energetic, catchy rhythm", "weight": 0.7},
{"prompt": "Bass-heavy, punchy", "weight": 0.5}
]
config = {"bpm": 128, "brightness": 0.7, "density": 0.7}
Emotional/Cinematic
prompts = [
{"prompt": "Cinematic orchestral underscore", "weight": 0.9},
{"prompt": "Emotional, inspiring", "weight": 0.7},
{"prompt": "Building tension and release", "weight": 0.5}
]
config = {"bpm": 70, "brightness": 0.4, "density": 0.4}
Ambient/Background
prompts = [
{"prompt": "Calm ambient soundscape", "weight": 0.9},
{"prompt": "Minimal, atmospheric", "weight": 0.6},
{"prompt": "Lo-fi textures", "weight": 0.4}
]
config = {"bpm": 80, "brightness": 0.4, "density": 0.3}
Real-time Transitions
Smoothly transition between styles during generation:
async def dynamic_music_generation():
async with client.aio.live.music.connect() as session:
# Start with intro style
await session.set_weighted_prompts([
{"prompt": "Soft ambient intro", "weight": 0.9}
])
await session.play()
# Collect intro (4 seconds)
intro_chunks = []
for _ in range(192): # ~4 seconds at 48kHz
chunk = await session.__anext__()
intro_chunks.append(chunk.audio_data)
# Transition to main section
await session.set_weighted_prompts([
{"prompt": "Building energy", "weight": 0.7},
{"prompt": "Full beat drop", "weight": 0.5}
])
# Continue with new style...
Output Specifications
- Format: Raw 16-bit PCM
- Sample Rate: 48,000 Hz
- Channels: 2 (stereo)
- Bit Depth: 16 bits
- Watermarking: Always enabled (SynthID)
Save to WAV
import wave
def save_pcm_to_wav(pcm_data, filename):
with wave.open(filename, 'wb') as wav_file:
wav_file.setnchannels(2) # Stereo
wav_file.setsampwidth(2) # 16-bit
wav_file.setframerate(48000) # 48kHz
wav_file.writeframes(pcm_data)
Convert to MP3
# Using FFmpeg
ffmpeg -f s16le -ar 48000 -ac 2 -i input.pcm output.mp3
Integration with Video Production
Generate Background Music for Video
async def generate_video_background(duration_seconds, mood):
"""Generate background music matching video length"""
# Configure for video background
prompts = [
{"prompt": f"{mood} background music for video", "weight": 0.9},
{"prompt": "Non-distracting, supportive underscore", "weight": 0.6}
]
async with client.aio.live.music.connect() as session:
await session.set_weighted_prompts(prompts)
await session.set_music_generation_config(
guidance=4.0,
density=0.4, # Keep sparse for background
brightness=0.5
)
await session.play()
# Calculate chunks needed (48kHz stereo = 192000 bytes/second)
total_chunks = duration_seconds * 48000 // 512 # Chunk size estimate
audio_data = []
async for i, chunk in enumerate(session):
audio_data.append(chunk.audio_data)
if i >= total_chunks:
break
return b''.join(audio_data)
Sync with Storyboard Timing
async def generate_scene_music(scenes):
"""Generate music with transitions matching scene changes"""
all_audio = []
async with client.aio.live.music.connect() as session:
for scene in scenes:
# Update style for each scene
await session.set_weighted_prompts([
{"prompt": scene['mood'], "weight": 0.9},
{"prompt": scene['style'], "weight": 0.5}
])
if scene['index'] == 0:
await session.play()
# Collect audio for scene duration
chunks = int(scene['duration'] * 48000 / 512)
for _ in range(chunks):
chunk = await session.__anext__()
all_audio.append(chunk.audio_data)
return b''.join(all_audio)
Limitations
- Instrumental only: No vocal/singing generation
- WebSocket required: Real-time streaming connection
- Safety filtering: Prompts undergo safety review
- Watermarking: All output contains SynthID watermark
- Experimental: API may change
Best Practices
- Buffer audio: Implement robust buffering for smooth playback
- Gradual transitions: Avoid drastic prompt changes mid-stream
- Sparse for backgrounds: Lower density for video backgrounds
- Test prompts: Iterate on prompt combinations
- Cross-fade transitions: Blend audio at style changes
- Match video mood: Align music tempo/energy with visuals
Resources
Related: Audio Processing | Video Generation
Back to: AI Multimodal Skill