Video Generation Reference
Comprehensive guide for video creation using Veo models via Gemini API.
Core Capabilities
- Text-to-Video: Generate 8-second videos from text prompts
- Image-to-Video: Animate images with text direction
- Video Extension: Continue previously generated videos
- Frame Control: Precise camera movements and effects
- Native Audio: Synchronized audio generation
- Multiple Resolutions: 720p and 1080p output
- Aspect Ratios: 16:9, 9:16, 1:1
Models
Veo 3.1 Preview (Latest)
veo-3.1-generate-preview - Latest with advanced controls
- Frame-specific generation
- Up to 3 reference images for image-to-video
- Video extension capability
- Native audio generation
- Resolution: 720p, 1080p
- Duration: 8 seconds at 24fps
- Status: Preview (API may change)
- Updated: September 2025
veo-3.1-fast-generate-preview - Speed-optimized
- Optimized for business use cases
- Programmatic ad creation
- Social media content
- Same features as the standard 3.1 model, with faster generation
- Status: Preview
- Updated: September 2025
Veo 3.0 Stable
veo-3.0-generate-001 - Production-ready
- Native audio generation
- Text-to-video and image-to-video
- 720p and 1080p (16:9 only)
- 8 seconds at 24fps
- Status: Stable
- Updated: July 2025
veo-3.0-fast-generate-001 - Stable fast variant
- Speed-optimized stable version
- Same reliability as the standard 3.0 model
- Status: Stable
- Updated: July 2025
Model Comparison
| Model | Speed | Features | Audio | Status | Best For |
|---|---|---|---|---|---|
| veo-3.1-generate-preview | Medium | All | ✓ | Preview | Latest features |
| veo-3.1-fast-generate-preview | Fast | All | ✓ | Preview | Business/speed |
| veo-3.0-generate-001 | Medium | Standard | ✓ | Stable | Production |
| veo-3.0-fast-generate-001 | Fast | Standard | ✓ | Stable | Production/speed |
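The comparison above can be expressed as a small lookup. This is only an illustrative sketch: `pick_model` is a hypothetical helper, not part of any SDK, and it assumes the four model IDs listed in this guide.

```python
# Hypothetical helper: choose a model ID from the comparison table.
MODELS = {
    # (fast, stable) -> model ID as listed in this guide
    (False, False): "veo-3.1-generate-preview",
    (True, False): "veo-3.1-fast-generate-preview",
    (False, True): "veo-3.0-generate-001",
    (True, True): "veo-3.0-fast-generate-001",
}


def pick_model(fast: bool = False, stable: bool = False) -> str:
    """Return the model ID matching the speed and stability requirements."""
    return MODELS[(fast, stable)]


print(pick_model(fast=True, stable=True))
```

Keeping the choice in one place makes it easier to switch from a preview model to a stable one later.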
Quick Start
Text-to-Video
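import os
import time

from google import genai
from google.genai import types

client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

# Start generation; the call returns a long-running operation, not the video itself
operation = client.models.generate_videos(
    model='veo-3.1-generate-preview',
    prompt='A serene beach at sunset with gentle waves rolling onto the shore',
    config=types.GenerateVideosConfig(
        resolution='1080p',
        aspect_ratio='16:9'
    )
)

# Poll until the operation finishes
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download and save the video
video = operation.response.generated_videos[0].video
client.files.download(file=video)
video.save('output.mp4')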
Image-to-Video
# Load the reference image
ref_image = types.Image.from_file(location='beach.jpg')

# Animate the image; the call returns an operation to poll until operation.done
operation = client.models.generate_videos(
    model='veo-3.1-generate-preview',
    prompt='Camera slowly pans across the scene from left to right',
    image=ref_image,
    config=types.GenerateVideosConfig(
        resolution='1080p'
    )
)
Multiple Reference Images
# Use up to 3 reference images for complex scenes
reference_images = [
    types.VideoGenerationReferenceImage(
        image=types.Image.from_file(location=path),
        reference_type='asset'
    )
    for path in ('foreground.jpg', 'background.jpg', 'subject.jpg')
]

# The call returns an operation to poll until operation.done
operation = client.models.generate_videos(
    model='veo-3.1-generate-preview',
    prompt='Combine these elements into a cohesive animated scene',
    config=types.GenerateVideosConfig(
        reference_images=reference_images,
        resolution='1080p',
        aspect_ratio='16:9'
    )
)
Advanced Features
Video Extension
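# Continue from a previously generated video.
# `previous_operation` is a placeholder for a completed generation operation.
previous_video = previous_operation.response.generated_videos[0].video

# The call returns an operation to poll until operation.done
operation = client.models.generate_videos(
    model='veo-3.1-generate-preview',
    video=previous_video,
    prompt='The scene transitions to nighttime with stars appearing'
)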
Frame Control
# Direct camera movement through the prompt
# Movements: zoom in, zoom out, pan left, pan right, tilt up, tilt down, static
# Speeds: slow, medium, fast
operation = client.models.generate_videos(
    model='veo-3.1-generate-preview',
    prompt='A mountain landscape. The camera slowly zooms in.',
    config=types.GenerateVideosConfig(
        resolution='1080p'
    )
)
Prompt Engineering
Effective Video Prompts
Structure:
- Subject: What's in the scene
- Action: What's happening
- Camera: How it's filmed
- Style: Visual treatment
- Timing: Pacing details
Example:
"A hummingbird [subject] hovers near a red flower, then flies away [action].
Slow-motion close-up shot [camera] with vibrant colors and soft focus background [style].
Gentle, peaceful pacing [timing]."
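The five-part structure can be assembled programmatically. The sketch below is one possible way to do it; `build_prompt` is a hypothetical helper introduced here for illustration, and the sentence-joining rule is an assumption rather than a requirement of the model.

```python
def build_prompt(subject, action, camera, style, timing):
    """Join the five prompt components into one paragraph."""
    parts = [f"{subject} {action}", camera, style, timing]
    # Drop empty parts, trim whitespace and trailing periods, then end each with one period
    sentences = [p.strip().rstrip(".") + "." for p in parts if p and p.strip()]
    return " ".join(sentences)


prompt = build_prompt(
    subject="A hummingbird",
    action="hovers near a red flower, then flies away",
    camera="Slow-motion close-up shot",
    style="Vibrant colors and soft focus background",
    timing="Gentle, peaceful pacing",
)
print(prompt)
```

Keeping the components as separate arguments makes it easy to vary one of them, for example the camera, while holding the rest fixed.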
Action Verbs
Movement:
- "walks", "runs", "flies", "swims", "dances"
- "rotates", "spins", "rolls", "bounces"
- "emerges", "disappears", "transforms"
Camera:
- "zoom in on", "pull back from", "follow"
- "orbit around", "track alongside"
- "tilt up to reveal", "pan across"
Transitions:
- "gradually changes from... to..."
- "morphs into", "dissolves into"
- "cuts to", "fades to"
Timing Control
# Explicit timing in prompt
prompt = '''
0-2s: Close-up of a seed in soil
2-4s: Time-lapse of sprout emerging
4-6s: Growing into a small plant
6-8s: Zoom out to show garden context
'''
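A timing prompt like the one above is easy to get wrong by leaving gaps or overrunning the clip. The following sketch builds the prompt from a list of segments and checks that they are contiguous; `timed_prompt` is a hypothetical helper, and the 8-second length is the fixed duration stated in this guide.

```python
CLIP_SECONDS = 8  # fixed clip length stated in this guide


def timed_prompt(segments):
    """Build a timing prompt from (start, end, description) tuples.

    Segments must be contiguous and cover the whole clip.
    """
    expected_start = 0
    lines = []
    for start, end, description in segments:
        if start != expected_start or end <= start:
            raise ValueError(f"segment {start}-{end}s does not follow {expected_start}s")
        lines.append(f"{start}-{end}s: {description}")
        expected_start = end
    if expected_start != CLIP_SECONDS:
        raise ValueError(f"segments end at {expected_start}s, expected {CLIP_SECONDS}s")
    return "\n".join(lines)


print(timed_prompt([
    (0, 2, "Close-up of a seed in soil"),
    (2, 4, "Time-lapse of sprout emerging"),
    (4, 6, "Growing into a small plant"),
    (6, 8, "Zoom out to show garden context"),
]))
```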
Configuration Options
Resolution
config = types.GenerateVideosConfig(
    resolution='1080p'  # Options: 720p, 1080p
)
Considerations:
- 1080p: Higher quality, longer generation time, larger file
- 720p: Faster generation, smaller file, good for drafts
Aspect Ratios
config = types.GenerateVideosConfig(
    aspect_ratio='16:9'  # Options: 16:9, 9:16, 1:1
)
Use Cases:
- 16:9: Landscape, YouTube, traditional video
- 9:16: Mobile, TikTok, Instagram Stories
- 1:1: Square, Instagram feed, versatile
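The use cases above amount to a platform-to-ratio mapping. A minimal sketch follows; the platform keys and the `aspect_ratio_for` helper are illustrative names chosen here, not identifiers from any API.

```python
# Mapping taken from the use cases listed above; platform keys are illustrative.
ASPECT_RATIO_BY_PLATFORM = {
    "youtube": "16:9",
    "tiktok": "9:16",
    "instagram_stories": "9:16",
    "instagram_feed": "1:1",
}


def aspect_ratio_for(platform: str, default: str = "16:9") -> str:
    """Return the aspect ratio for a platform, falling back to landscape."""
    return ASPECT_RATIO_BY_PLATFORM.get(platform.strip().lower(), default)


print(aspect_ratio_for("TikTok"))
```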
Audio Control
config = types.GenerateVideosConfig(
    # Assumed field name; the Gemini API may not accept it, since audio is on by default
    generate_audio=True
)
Native audio is generated automatically and synchronized with video content.
Best Practices
1. Prompt Quality
Be specific:
- ❌ "A person walking"
- ✅ "A young woman in a red coat walking through a park in autumn"
Include motion:
- ❌ "A city street"
- ✅ "A busy city street with cars passing and people crossing"
Specify camera:
- ❌ "A mountain"
- ✅ "Aerial drone shot slowly ascending over a snow-capped mountain"
2. Reference Images
Quality:
- Use high-resolution images (1080p+)
- Clear, well-lit subjects
- Minimal motion blur
Composition:
- Match desired final aspect ratio
- Leave room for motion/movement
- Consider camera angle in prompt
3. Performance Optimization
Generation Time:
- 720p: ~30-60 seconds
- 1080p: ~60-120 seconds
- Fast models: 30-50% faster
Strategies:
- Use 720p for iteration/drafts
- Use fast models for rapid feedback
- Batch multiple requests
- Use async processing for UI responsiveness
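The batching and async strategies can be combined by running blocking generation calls in worker threads with a cap on concurrency. The sketch below assumes a blocking `generate(prompt)` callable supplied by the caller; `generate_many` and `fake_generate` are hypothetical names, and the stand-in function returns a label rather than calling any real API.

```python
import asyncio


async def generate_many(prompts, generate, max_concurrent=2):
    """Run a blocking `generate(prompt)` callable for each prompt, a few at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(prompt):
        async with semaphore:
            # Run the blocking call in a thread so the event loop stays responsive
            return await asyncio.to_thread(generate, prompt)

    # gather preserves the order of the input prompts
    return await asyncio.gather(*(worker(p) for p in prompts))


def fake_generate(prompt):
    """Stand-in for a real generation call; returns a label instead of a video."""
    return f"video for: {prompt}"


results = asyncio.run(generate_many(["a beach", "a forest"], fake_generate))
print(results)
```

The semaphore bounds how many requests are in flight at once, which helps stay under rate limits while still overlapping the long waits.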
Common Use Cases
1. Product Demos
# The call returns an operation to poll until operation.done
operation = client.models.generate_videos(
    model='veo-3.0-fast-generate-001',
    prompt='''
    Professional product video:
    - Sleek smartphone rotating on a pedestal
    - Clean white background with soft shadows
    - Slow 360-degree rotation
    - Spotlight highlighting premium design
    - Modern, minimalist aesthetic
    ''',
    config=types.GenerateVideosConfig(
        resolution='1080p',
        aspect_ratio='16:9'  # Veo 3.0 models are listed as 16:9 only
    )
)
2. Social Media Content
# The call returns an operation to poll until operation.done
operation = client.models.generate_videos(
    model='veo-3.1-fast-generate-preview',
    prompt='''
    Trendy social media clip:
    - Text overlay "NEW ARRIVAL" appears
    - Fashion product showcase
    - Quick cuts and dynamic camera
    - Vibrant colors, high energy
    - Upbeat pacing
    ''',
    config=types.GenerateVideosConfig(
        resolution='1080p',
        aspect_ratio='9:16'  # Mobile
    )
)
3. Explainer Animations
# The call returns an operation to poll until operation.done
operation = client.models.generate_videos(
    model='veo-3.1-generate-preview',
    prompt='''
    Educational animation:
    - Simple diagram illustrating data flow
    - Arrows and icons animating in sequence
    - Clean, clear visual hierarchy
    - Smooth transitions between steps
    - Professional corporate style
    ''',
    config=types.GenerateVideosConfig(
        resolution='720p',
        aspect_ratio='16:9'
    )
)
Safety & Content Policy
Safety Settings
config = types.VideoGenerationConfig(
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ]
)
Prohibited Content
- Violence, gore, harm
- Sexually explicit content
- Hate speech, harassment
- Copyrighted characters/brands
- Real people (without consent)
- Misleading/deceptive content
Limitations
- Duration: Fixed 8 seconds (as of Sept 2025)
- Frame Rate: 24fps only
- File Size: ~5-20MB per video
- Generation Time: 30s-2min depending on resolution
- Reference Images: Max 3 images
- Preview Status: API may change (3.1 models)
- Audio: Cannot upload custom audio (native only)
- No real-time: Pre-generation required
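Checking a request against these limits before sending it avoids spending generation time on a call that will be rejected. The sketch below encodes only the limits stated in this guide; `check_settings` is a hypothetical helper, and the allowed values should be confirmed against current API documentation.

```python
# Limits as stated in this guide; confirm against current API documentation.
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
MAX_REFERENCE_IMAGES = 3


def check_settings(resolution, aspect_ratio, reference_images=0):
    """Return a list of problems; an empty list means the settings fit the limits."""
    problems = []
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    if reference_images > MAX_REFERENCE_IMAGES:
        problems.append(
            f"too many reference images: {reference_images} > {MAX_REFERENCE_IMAGES}"
        )
    return problems


print(check_settings("4k", "16:9", reference_images=4))
```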
Troubleshooting
Long Generation Times
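import time

# Measure total time: submitting the request plus polling until completion
start = time.time()
operation = client.models.generate_videos(...)
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)
duration = time.time() - start
print(f"Generated in {duration:.1f}s")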
Expected times:
- Fast models + 720p: 30-45s
- Standard models + 720p: 45-90s
- Fast models + 1080p: 45-60s
- Standard models + 1080p: 60-120s
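When a job runs well past the expected times, it helps to stop waiting rather than block forever. The following is a minimal sketch of polling with a timeout. It assumes the job object exposes a boolean `done` attribute and that a `refresh` callable returns an updated job; both are placeholders supplied by the caller, and `wait_until_done` is a hypothetical helper.

```python
import time


def wait_until_done(operation, refresh, interval=10.0, timeout=300.0, sleep=time.sleep):
    """Poll `operation` until it reports done or `timeout` seconds have passed.

    `operation` is assumed to expose a boolean `done` attribute, and `refresh`
    is assumed to return an updated operation object. Both are placeholders.
    """
    deadline = time.monotonic() + timeout
    while not operation.done:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"generation did not finish within {timeout:.0f}s")
        sleep(interval)
        operation = refresh(operation)
    return operation
```

The `sleep` parameter exists so the function can be exercised without real delays. A timeout of a few multiples of the expected time above is a reasonable starting point.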
Safety Filter Blocking
try:
    operation = client.models.generate_videos(...)
except Exception as e:
    if 'safety' in str(e).lower():
        print("Video blocked by safety filters")
        # Modify prompt and retry
    else:
        # Do not swallow unrelated errors
        raise
Quota Exceeded
import time

# Retry with exponential backoff when the request is rate limited
def generate_with_retry(model, prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.models.generate_videos(model=model, prompt=prompt)
        except Exception as e:
            if '429' in str(e):  # Rate limit
                wait = 2 ** attempt
                print(f"Rate limited, waiting {wait}s...")
                time.sleep(wait)
            else:
                raise
    raise RuntimeError("Max retries exceeded")
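Plain exponential backoff can cause many clients to retry at the same moment. A common refinement, shown here as a sketch rather than a requirement of the API, caps the delay and adds random jitter. `backoff_delays` is a hypothetical helper; the `base` and `cap` defaults are arbitrary choices.

```python
import random


def backoff_delays(max_retries=3, base=1.0, cap=30.0, rng=random.random):
    """Yield one delay per retry: exponential growth, capped, with full jitter."""
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)
        # Full jitter: pick a random delay between 0 and the current ceiling
        yield rng() * ceiling


for delay in backoff_delays(max_retries=4):
    print(f"would wait {delay:.2f}s")
```

The `rng` parameter makes the schedule deterministic when needed, for example by passing `lambda: 1.0` to see the upper bound of each delay.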
Cost Estimation
Pricing: TBD (preview models)
Estimated based on compute:
- Fast + 720p: ~$0.05-$0.10 per video
- Standard + 1080p: ~$0.15-$0.25 per video
Monitor: https://ai.google.dev/pricing
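For rough budgeting, the per-video estimates above can be multiplied out. This sketch uses only the figures given in this guide, which are estimates and not official prices; `estimate_cost` is a hypothetical helper.

```python
# Per-video estimates copied from this guide; they are not official prices.
ESTIMATES_USD = {
    ("fast", "720p"): (0.05, 0.10),
    ("standard", "1080p"): (0.15, 0.25),
}


def estimate_cost(videos, tier="fast", resolution="720p"):
    """Return a (low, high) cost range in USD for a number of videos."""
    low, high = ESTIMATES_USD[(tier, resolution)]
    return round(videos * low, 2), round(videos * high, 2)


print(estimate_cost(100, "standard", "1080p"))
```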
Resources
Related References
Related Capabilities:
- Video Analysis - Understanding existing videos
- Image Generation - Creating static images
- Image Understanding - Analyzing reference images