english/.opencode/skills/ai-multimodal/references/image-generation.md
2026-04-12 01:06:31 +07:00

# Image Generation Reference
Comprehensive guide for image creation, editing, and composition using Imagen 4 and Gemini models ("Nano Banana").
> **Nano Banana** = Google's internal name for native image generation in the Gemini API. Three variants:
> - **Nano Banana 2** (`gemini-3.1-flash-image-preview`) - NEW DEFAULT. 3-5x faster, 95% Pro quality, web grounding, 100+ language text rendering, character consistency (5 chars/14 objects). Released Feb 2026.
> - **Nano Banana Flash** (`gemini-2.5-flash-image`) - Previous default, still stable.
> - **Nano Banana Pro** (`gemini-3-pro-image-preview`) - Quality with reasoning, 4K text.
## Core Capabilities
- **Text-to-Image**: Generate images from text prompts
- **Image Editing**: Modify existing images with text instructions
- **Multi-Image Composition**: Combine up to 14 reference images (Pro model)
- **Iterative Refinement**: Multi-turn conversational refinement
- **Aspect Ratios**: 10 formats (1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9)
- **Image Sizes**: 1K, 2K, 4K (uppercase K required)
- **Quality Variants**: Standard/Ultra/Fast for different needs
- **Text in Images**: Up to 25 chars optimal (4K text in Pro)
- **Search Grounding**: Real-time data integration (Nano Banana 2 and Pro)
- **Thinking Mode**: Advanced reasoning for complex prompts (Pro; Nano Banana 2 offers reasoning levels instead)
## Models
### Nano Banana 2 (Default - Recommended)
**gemini-3.1-flash-image-preview** - Nano Banana 2 ⭐ NEW DEFAULT
- Best for: General use, fast generation with near-Pro quality
- Quality: High (95% parity with Pro)
- Speed: 3-5x faster than previous Flash
- Cost: ~$0.045/image (512px) to ~$0.151/image (4K); ~25-30% cheaper than Pro
- Resolution: 512px to 4K with expanded aspect ratios
- Text rendering: 100+ languages with proper formatting
- Character consistency: Up to 5 characters and 14 objects
- Reasoning levels: Minimal/High/Dynamic for complex prompts
- Web grounding: Real-time data integration for brands, landmarks, recent events
- Status: Preview (Feb 2026)
### Nano Banana Flash (Previous Default)
**gemini-2.5-flash-image** - Nano Banana Flash
- Best for: Speed, high-volume generation, rapid prototyping
- Quality: High
- Context: 65,536 input / 32,768 output tokens
- Speed: Fast (~5-10s per image)
- Cost: ~$1/1M input tokens
- Aspect Ratios: All 10 supported
- Image Sizes: 1K, 2K, 4K (if `image_size` is rejected, see Troubleshooting)
- Status: Stable (Oct 2025)
### Nano Banana Pro (4K Text, Reasoning)
**gemini-3-pro-image-preview** - Nano Banana Pro
- Best for: Professional assets, 4K text rendering, complex prompts
- Quality: Ultra (with advanced reasoning)
- Context: 65,536 input / 32,768 output tokens
- Speed: Medium
- Cost: ~$2/1M text input, $0.134/image (resolution-dependent)
- Multi-Image: Up to 14 reference images in total (recommended mix: up to 6 objects and 5 humans)
- Features: Thinking mode, Google Search grounding
- Status: Preview (Nov 2025)
### Imagen 4 (Alternative - Production)
**imagen-4.0-generate-001** - Standard quality, balanced performance
- Best for: Production workflows, marketing assets
- Quality: High
- Speed: Medium (~5-10s per image)
- Cost: ~$0.02/image (estimated)
- Output: 1-4 images per request
- Resolution: 1K or 2K
- Updated: June 2025
**imagen-4.0-ultra-generate-001** - Maximum quality
- Best for: Final production, marketing assets, detailed artwork
- Quality: Ultra (highest available)
- Speed: Slow (~15-25s per image)
- Cost: ~$0.04/image (estimated)
- Output: 1-4 images per request
- Resolution: 2K preferred
- Updated: June 2025
**imagen-4.0-fast-generate-001** - Fastest generation
- Best for: Rapid iteration, bulk generation, real-time use
- Quality: Good
- Speed: Fast (~2-5s per image)
- Cost: ~$0.01/image (estimated)
- Output: 1-4 images per request
- Resolution: 1K
- Updated: June 2025
### Legacy Models
**gemini-2.0-flash-preview-image-generation** - Legacy
- Status: Deprecated (use Nano Banana or Imagen 4 instead)
- Context: 32,768 input / 8,192 output tokens
## Model Comparison
| Model | Quality | Speed | Cost | Best For |
|-------|---------|-------|------|----------|
| gemini-3.1-flash-image-preview | ⭐⭐⭐⭐½ | 🚀🚀 Fastest | 💵 Low | **NEW DEFAULT** - General use |
| gemini-2.5-flash-image | ⭐⭐⭐⭐ | 🚀 Fast | 💵 Low | Previous default, stable |
| gemini-3-pro-image-preview | ⭐⭐⭐⭐⭐ | 💡 Medium | 💰 Medium | Text/reasoning |
| imagen-4.0-generate-001 | ⭐⭐⭐⭐ | 💡 Medium | 💰 Medium | Production (alternative) |
| imagen-4.0-ultra-generate-001 | ⭐⭐⭐⭐⭐ | 🐢 Slow | 💰💰 High | Marketing assets |
| imagen-4.0-fast-generate-001 | ⭐⭐⭐ | 🚀 Fast | 💵 Low | Bulk generation |
**Selection Guide**:
- **Default/General**: Use `gemini-3.1-flash-image-preview` (fastest, near-Pro quality, web grounding)
- **Stable Alternative**: Use `gemini-2.5-flash-image` (previous default, fully stable)
- **Production Quality**: Use `imagen-4.0-generate-001` (alternative for final assets)
- **Marketing/Ultra Quality**: Use `imagen-4.0-ultra-generate-001` for maximum quality
- **Text-Heavy Images**: Use `gemini-3-pro-image-preview` for 4K text rendering
- **Complex Prompts with Reasoning**: Use `gemini-3-pro-image-preview` with Thinking mode
- **Real-time Data Integration**: Use `gemini-3.1-flash-image-preview` or `gemini-3-pro-image-preview` with Search grounding
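The selection guide above can be captured as a small lookup table. This is a minimal sketch: the use-case keys (`general`, `stable`, and so on) and the `pick_model` helper are hypothetical names introduced here for illustration; only the model IDs come from this reference.

```python
# Hypothetical use-case keys mapped to the model IDs from the selection guide.
MODEL_BY_USE_CASE = {
    "general": "gemini-3.1-flash-image-preview",
    "stable": "gemini-2.5-flash-image",
    "production": "imagen-4.0-generate-001",
    "marketing": "imagen-4.0-ultra-generate-001",
    "text_heavy": "gemini-3-pro-image-preview",
    "complex_reasoning": "gemini-3-pro-image-preview",
}


def pick_model(use_case: str = "general") -> str:
    """Return the model ID for a use case, falling back to the default model."""
    return MODEL_BY_USE_CASE.get(use_case, MODEL_BY_USE_CASE["general"])


print(pick_model("text_heavy"))  # gemini-3-pro-image-preview
print(pick_model("unknown"))     # gemini-3.1-flash-image-preview
```

Keeping the mapping in one place means a model rename or a new default only needs a single edit.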
## Quick Start
### Basic Generation (Default - Nano Banana 2)
```python
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

# Nano Banana 2 - NEW DEFAULT (fastest, near-Pro quality, web grounding)
response = client.models.generate_content(
    model='gemini-3.1-flash-image-preview',
    contents='A serene mountain landscape at sunset with snow-capped peaks',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='2K',  # 512px, 1K, 2K, 4K - uppercase K required
        ),
    ),
)

# Save images (parts without inline_data are text and are skipped)
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'output-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)
```
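The save loop above recurs throughout this reference, so it can be factored into a helper. The sketch below is an illustration, not part of the SDK: `save_image_parts` is a hypothetical name, and it only assumes that each part exposes an `inline_data` attribute with a `data` field, as in the loop above. It numbers files by how many images were saved rather than by part index, and the demo uses stand-in objects instead of a real response.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_image_parts(parts, out_dir='.', stem='output'):
    """Write every part that carries inline image data to disk; skip text parts."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for part in parts:
        inline = getattr(part, 'inline_data', None)
        if inline is None or not inline.data:
            continue  # text-only part, nothing to save
        path = out / f'{stem}-{len(saved)}.png'
        path.write_bytes(inline.data)
        saved.append(path)
    return saved


# Demo with stand-in parts: a text part followed by an image part
demo_parts = [
    SimpleNamespace(text='Here is your image', inline_data=None),
    SimpleNamespace(text=None, inline_data=SimpleNamespace(data=b'fake-image-bytes')),
]
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image_parts(demo_parts, out_dir=tmp)
    print([p.name for p in saved])  # ['output-0.png']
```

With a real response, pass `response.candidates[0].content.parts` as `parts`.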
### Alternative - Imagen 4 (Production Quality)
```python
# Imagen 4 Standard - alternative for production workflows
response = client.models.generate_images(
    model='imagen-4.0-generate-001',
    prompt='Professional product photography of smartphone',
    config=types.GenerateImagesConfig(
        numberOfImages=1,
        aspectRatio='16:9',
        imageSize='1K',
    ),
)

# Save Imagen 4 output
for i, generated_image in enumerate(response.generated_images):
    with open(f'output-{i}.png', 'wb') as f:
        f.write(generated_image.image.image_bytes)
```
### Imagen 4 Quality Variants
```python
# Ultra quality (marketing assets)
response = client.models.generate_images(
    model='imagen-4.0-ultra-generate-001',
    prompt='Professional product photography of smartphone',
    config=types.GenerateImagesConfig(
        numberOfImages=1,
        imageSize='2K',  # Use 2K for ultra (Standard/Ultra only)
    ),
)

# Fast generation (bulk)
# Note: Fast model doesn't support imageSize parameter
response = client.models.generate_images(
    model='imagen-4.0-fast-generate-001',
    prompt='Quick concept sketch of robot character',
    config=types.GenerateImagesConfig(
        numberOfImages=4,  # Generate multiple variants (default: 4)
        aspectRatio='1:1',
    ),
)
```
### Nano Banana Pro (4K Text, Reasoning)
```python
# Nano Banana Pro - for text rendering and complex prompts
response = client.models.generate_content(
    model='gemini-3-pro-image-preview',
    contents='A futuristic cityscape with neon lights',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='4K',  # 4K text rendering
        ),
    ),
)

# Nano Banana Pro - with Thinking mode and Search grounding
response = client.models.generate_content(
    model='gemini-3-pro-image-preview',
    contents='Current weather in Tokyo visualized as artistic infographic',
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],  # Both text and image
        image_config=types.ImageConfig(
            aspect_ratio='1:1',
            image_size='4K',
        ),
        # tools belongs inside the config; generate_content() takes no tools argument
        tools=[{'google_search': {}}],  # Enable search grounding
    ),
)

# Save from content parts
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'output-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)
```
### Multi-Image Reference (Nano Banana Pro)
```python
from PIL import Image

# Up to 14 reference images (6 objects + 5 humans recommended)
img1 = Image.open('style_ref.png')
img2 = Image.open('color_ref.png')
img3 = Image.open('composition_ref.png')

response = client.models.generate_content(
    model='gemini-3-pro-image-preview',
    contents=[
        'Blend these reference styles into a cohesive hero image for a tech product',
        img1, img2, img3,
    ],
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='4K',
        ),
    ),
)
```
### Nano Banana 2 with Web Grounding
```python
# Nano Banana 2 - real-time web integration for brands, landmarks, events
response = client.models.generate_content(
    model='gemini-3.1-flash-image-preview',
    contents='Current Apple Vision Pro product shot with accurate branding',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='2K',
        ),
        # Assumption: grounding is enabled with the google_search tool, as for Pro
        tools=[{'google_search': {}}],
    ),
)
```
### Nano Banana 2 with Reasoning Levels
```python
# Use reasoning levels for complex prompts
response = client.models.generate_content(
    model='gemini-3.1-flash-image-preview',
    contents='A photorealistic scene of 5 diverse characters sitting around a campfire, each with distinct clothing and accessories, consistent lighting from the fire',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='4K',
        ),
    ),
)
# Nano Banana 2 auto-selects reasoning level (Minimal/High/Dynamic)
# For explicit control, check API docs for reasoning_level parameter
```
### Multi-Turn Refinement Chat
```python
# Conversational image refinement (works with any Nano Banana model)
chat = client.chats.create(
    model='gemini-3.1-flash-image-preview',  # or gemini-2.5-flash-image
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
    ),
)

# Initial generation
response1 = chat.send_message('Create a minimalist logo for a coffee brand called "Brew"')

# Iterative refinement
response2 = chat.send_message('Make the text bolder and add steam rising from the cup')
response3 = chat.send_message('Change the color palette to warm earth tones')
```
## API Differences
### Imagen 4 vs Nano Banana (Gemini Native)
| Feature | Imagen 4 | Nano Banana (Gemini) |
|---------|----------|---------------------|
| Method | `generate_images()` | `generate_content()` |
| Config | `GenerateImagesConfig` | `GenerateContentConfig` |
| Prompt param | `prompt` (string) | `contents` (string/list) |
| Image count | `numberOfImages` (camelCase) | N/A (single per request) |
| Aspect ratio | `aspectRatio` (camelCase) | `aspect_ratio` (snake_case) |
| Size | `imageSize` | `image_size` |
| Response | `generated_images[i].image.image_bytes` | `candidates[0].content.parts[i].inline_data.data` |
| Multi-image input | ❌ | ✅ Up to 14 references |
| Multi-turn chat | ❌ | ✅ Conversational |
| Search grounding | ❌ | ✅ (Nano Banana 2 and Pro) |
| Thinking mode | ❌ | ✅ (Pro; reasoning levels on Nano Banana 2) |
| Text rendering | Limited | 4K (Pro) |
**Imagen 4** uses `generate_images()`:
```python
response = client.models.generate_images(
    model='imagen-4.0-generate-001',
    prompt='...',
    config=types.GenerateImagesConfig(
        numberOfImages=1,    # camelCase
        aspectRatio='16:9',  # camelCase
        imageSize='1K',      # Standard/Ultra only
    ),
)
# Access: response.generated_images[0].image.image_bytes
```
**Nano Banana** uses `generate_content()`:
```python
response = client.models.generate_content(
    model='gemini-3.1-flash-image-preview',  # or gemini-2.5-flash-image, gemini-3-pro-image-preview
    contents='...',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(
            aspect_ratio='16:9',  # snake_case
            image_size='2K',      # 1K, 2K, 4K - uppercase K
        ),
    ),
)
# Access: response.candidates[0].content.parts[0].inline_data.data
```
**Critical Notes**:
1. `response_modalities` values MUST be uppercase: `'IMAGE'`, `'TEXT'`
2. `image_size` value MUST have uppercase K: `'1K'`, `'2K'`, `'4K'`
3. Imagen 4 Fast model doesn't support `imageSize` parameter
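The first two notes are easy to get wrong when values come from user input or config files. A minimal sketch of a normalizing step follows; `normalize_modalities` and `normalize_image_size` are hypothetical helpers, not SDK functions, and the accepted sets are limited to the values named in the notes above (Nano Banana 2 additionally lists a 512px size, which this sketch does not cover).

```python
VALID_MODALITIES = {'IMAGE', 'TEXT'}
VALID_SIZES = {'1K', '2K', '4K'}


def normalize_modalities(modalities):
    """Uppercase modality names and reject anything other than IMAGE or TEXT."""
    result = [m.upper() for m in modalities]
    unknown = [m for m in result if m not in VALID_MODALITIES]
    if unknown:
        raise ValueError(f'Unsupported modalities: {unknown}')
    return result


def normalize_image_size(size):
    """Uppercase the trailing K ('2k' -> '2K') and reject unknown sizes."""
    result = size.upper()
    if result not in VALID_SIZES:
        raise ValueError(f'Unsupported image_size: {size!r}')
    return result


print(normalize_modalities(['image', 'Text']))  # ['IMAGE', 'TEXT']
print(normalize_image_size('2k'))               # 2K
```

Running values through these helpers before building `GenerateContentConfig` turns a case mistake into an early, readable `ValueError`.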
## Aspect Ratios
| Ratio | Resolution (1K) | Use Case | Token Cost |
|-------|----------------|----------|------------|
| 1:1 | 1024×1024 | Social media, avatars, icons | 1290 |
| 2:3 | 682×1024 | Vertical portraits | 1290 |
| 3:2 | 1024×682 | Landscape photos | 1290 |
| 3:4 | 768×1024 | Vertical posters | 1290 |
| 4:3 | 1024×768 | Traditional media | 1290 |
| 4:5 | 819×1024 | Instagram portrait | 1290 |
| 5:4 | 1024×819 | Horizontal photos | 1290 |
| 9:16 | 576×1024 | Mobile/stories/reels | 1290 |
| 16:9 | 1024×576 | Landscapes, banners, YouTube | 1290 |
| 21:9 | 1024×438 | Ultrawide/cinematic | 1290 |
All ratios cost the same: 1,290 tokens per image (Gemini models).
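The 1K resolutions in the table follow a simple rule: the longer side is 1024 pixels and the shorter side is scaled by the ratio and rounded down. The sketch below reproduces the table's figures; the rounding rule is inferred from the table itself, and `resolution_1k` is a hypothetical helper, so treat actual API output dimensions as something to verify.

```python
def resolution_1k(ratio: str, long_side: int = 1024) -> tuple[int, int]:
    """Return (width, height) with the longer side fixed at long_side pixels."""
    w, h = (int(x) for x in ratio.split(':'))
    if w >= h:
        # Landscape or square: width is the long side
        return long_side, long_side * h // w
    # Portrait: height is the long side
    return long_side * w // h, long_side


for ratio in ('1:1', '2:3', '16:9', '21:9'):
    print(ratio, resolution_1k(ratio))
# 1:1 (1024, 1024)
# 2:3 (682, 1024)
# 16:9 (1024, 576)
# 21:9 (1024, 438)
```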
## Response Modalities
### Image Only
```python
config = types.GenerateContentConfig(
    response_modalities=['IMAGE'],  # Uppercase required
    image_config=types.ImageConfig(aspect_ratio='1:1'),
)
```
### Text Only (No Image)
```python
config = types.GenerateContentConfig(
    response_modalities=['TEXT'],  # Uppercase required
)
# Returns text description instead of generating image
```
### Both Image and Text
```python
config = types.GenerateContentConfig(
    response_modalities=['IMAGE', 'TEXT'],  # Uppercase required
    image_config=types.ImageConfig(aspect_ratio='16:9'),
)
# Returns both generated image and description
```
## Image Editing
### Modify Existing Image
```python
import PIL.Image

# Load original
img = PIL.Image.open('original.png')

# Edit with instructions
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a red balloon floating in the sky',
        img,
    ],
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(aspect_ratio='16:9'),
    ),
)
```
### Style Transfer
```python
img = PIL.Image.open('photo.jpg')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Transform this into an oil painting style',
        img,
    ],
)
```
### Object Addition/Removal
```python
# Add object (img is the PIL image loaded earlier)
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a vintage car parked on the street',
        img,
    ],
)

# Remove object
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Remove the person on the left side',
        img,
    ],
)
```
## Multi-Image Composition
### Combine Multiple Images
```python
img1 = PIL.Image.open('background.png')
img2 = PIL.Image.open('foreground.png')
img3 = PIL.Image.open('overlay.png')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Combine these images into a cohesive scene',
        img1,
        img2,
        img3,
    ],
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(aspect_ratio='16:9'),
    ),
)
```
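**Note**: With `gemini-2.5-flash-image`, keep to about 3 input images for best results. Use `gemini-3-pro-image-preview` when you need more references (up to 14).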
## Prompt Engineering
### Core Principle: Narrative > Keywords
> **Nano Banana prompting**: Write like you're briefing a photographer, not providing SEO keywords. Narrative paragraphs outperform keyword lists.
**Bad**: "cat, 4k, masterpiece, trending, professional, ultra detailed, cinematic"
**Good**: "A fluffy orange tabby cat with green eyes lounging on a sun-drenched windowsill. Soft morning light creates a warm glow. Shot with a 50mm lens at f/1.8 for shallow depth of field. Natural lighting, documentary photography style."
### Effective Prompt Structure
**Three key elements**:
1. **Subject**: What to generate (be specific)
2. **Context**: Environmental setting (lighting, location, time)
3. **Style**: Artistic treatment (photography, illustration, etc.)
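The three elements can be assembled into a narrative prompt programmatically. This is a minimal sketch; `build_prompt` is a hypothetical helper that simply joins the pieces as sentences, which keeps the result closer to a photographer's brief than to a keyword list.

```python
def build_prompt(subject: str, context: str, style: str) -> str:
    """Join subject, context, and style into one narrative prompt."""
    sentences = []
    for piece in (subject, context, style):
        piece = piece.strip().rstrip('.')
        if piece:  # skip elements that were left empty
            sentences.append(piece + '.')
    return ' '.join(sentences)


prompt = build_prompt(
    subject='A fluffy orange tabby cat with green eyes lounging on a sun-drenched windowsill',
    context='Soft morning light creates a warm glow',
    style='Natural lighting, documentary photography style',
)
print(prompt)
```

The resulting string can be passed as `contents` to `generate_content()`.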
### Quality Modifiers
**Technical terms**:
- "4K", "8K", "high resolution"
- "HDR", "high dynamic range"
- "professional photography"
- "studio lighting"
- "ultra detailed"
**Camera settings**:
- "35mm lens", "50mm lens"
- "shallow depth of field"
- "wide angle shot"
- "macro photography"
- "golden hour lighting"
### Style Keywords
**Art styles**:
- "oil painting", "watercolor", "sketch"
- "digital art", "concept art"
- "photorealistic", "hyperrealistic"
- "minimalist", "abstract"
- "cyberpunk", "steampunk", "fantasy"
**Mood and atmosphere**:
- "dramatic lighting", "soft lighting"
- "moody", "bright and cheerful"
- "mysterious", "whimsical"
- "dark and gritty", "pastel colors"
### Subject Description
**Be specific**:
- ❌ "A cat"
- ✅ "A fluffy orange tabby cat with green eyes"
**Add context**:
- ❌ "A building"
- ✅ "A modern glass skyscraper reflecting sunset clouds"
**Include details**:
- ❌ "A person"
- ✅ "A young woman in a red dress holding an umbrella"
### Composition and Framing
**Camera angles**:
- "bird's eye view", "aerial shot"
- "low angle", "high angle"
- "close-up", "wide shot"
- "centered composition"
- "rule of thirds"
**Perspective**:
- "first person view"
- "third person perspective"
- "isometric view"
- "forced perspective"
### Text in Images
**Limitations**:
- Maximum 25 characters total for optimal results
- Up to 3 distinct text phrases
- For 4K text rendering, use `gemini-3-pro-image-preview`
**Text prompt template**:
```
Image with text "[EXACT TEXT]" in [font style].
Font: [style description].
Color: [hex code like #FF5733].
Position: [top/center/bottom].
Background: [description].
Context: [poster/sign/label].
```
**Example**:
```python
response = client.models.generate_content(
    model='gemini-3-pro-image-preview',  # Use Pro for better text
    contents='''
Create a vintage travel poster with text "EXPLORE TOKYO" at the top.
Font: Bold retro sans-serif, slightly condensed.
Color: #F5E6D3 (cream white).
Position: Top third of image.
Background: Stylized Tokyo skyline with Mt. Fuji, sunset colors.
Style: 1950s travel poster aesthetic, muted warm colors.
''',
)
```
**Font keywords**:
- "bold sans-serif", "handwritten script", "vintage letterpress"
- "modern minimalist", "art deco", "neon sign"
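The text limits listed above (about 25 characters in total, up to 3 phrases) can be checked before sending a prompt. The sketch below assumes that text to render is written in double quotes, as in the template; `check_text_limits` is a hypothetical helper and the limits are the recommendations from this section, not values enforced by the API.

```python
import re

MAX_TEXT_CHARS = 25   # total characters recommended for rendered text
MAX_TEXT_PHRASES = 3  # distinct quoted phrases


def check_text_limits(prompt: str) -> list[str]:
    """Return warnings for double-quoted text that exceeds the recommended limits."""
    phrases = re.findall(r'"([^"]+)"', prompt)
    warnings = []
    total = sum(len(p) for p in phrases)
    if total > MAX_TEXT_CHARS:
        warnings.append(f'{total} characters of text (recommended max {MAX_TEXT_CHARS})')
    distinct = len(set(phrases))
    if distinct > MAX_TEXT_PHRASES:
        warnings.append(f'{distinct} phrases (recommended max {MAX_TEXT_PHRASES})')
    return warnings


print(check_text_limits('Vintage poster with text "EXPLORE TOKYO" at the top'))  # []
```

An empty list means the prompt stays within the recommendations; otherwise consider shortening the text or switching to `gemini-3-pro-image-preview`.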
### Nano Banana Prompt Techniques
| Technique | Example | Purpose |
|-----------|---------|---------|
| ALL CAPS emphasis | `The logo MUST be centered` | Force attention to critical requirements |
| Hex colors | `#9F2B68` instead of "dark magenta" | Exact color control |
| Negative constraints | `NEVER include text/watermarks. DO NOT add labels.` | Explicit exclusions |
| Realism trigger | `Natural lighting, DOF. Captured with Canon EOS 90D DSLR.` | Photography authenticity |
| Structured edits | `Make ALL edits: - [1] - [2] - [3]` | Multi-step changes |
| Complex logic | `Kittens MUST have heterochromatic eyes matching fur colors` | Precise conditions |
**Prompt Templates**:
**Photorealistic**:
```
A [subject] in [location], [lens] lens. [Lighting] creates [mood]. [Details].
[Camera angle]. Professional photography, natural lighting.
```
**Illustration**:
```
[Art style] illustration of [subject]. [Color palette]. [Line style].
[Background]. [Mood].
```
**Product**:
```
[Product] on [surface]. Materials: [finish]. Lighting: [setup].
Camera: [angle]. Background: [type]. Style: [commercial/lifestyle].
```
## Advanced Techniques
### Iterative Refinement
```python
import PIL.Image

# Initial generation
response1 = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A futuristic city skyline',
)

# Save first version (use the first part that carries image data)
image_part = next(
    p for p in response1.candidates[0].content.parts if p.inline_data
)
with open('v1.png', 'wb') as f:
    f.write(image_part.inline_data.data)

# Refine
img = PIL.Image.open('v1.png')
response2 = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add flying vehicles and neon signs',
        img,
    ],
)
```
### Negative Prompts (Indirect)
```python
# Instead of "no blur", be specific about what you want
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A crystal clear, sharp photograph of a diamond ring with perfect focus and high detail',
)
```
### Consistent Style Across Images
```python
base_prompt = "Digital art, vibrant colors, cel-shaded style, clean lines"
prompts = [
    f"{base_prompt}, a warrior character",
    f"{base_prompt}, a mage character",
    f"{base_prompt}, a rogue character",
]

for i, prompt in enumerate(prompts):
    response = client.models.generate_content(
        model='gemini-2.5-flash-image',
        contents=prompt,
    )
    # Save each character
    for part in response.candidates[0].content.parts:
        if part.inline_data:
            with open(f'character-{i}.png', 'wb') as f:
                f.write(part.inline_data.data)
```
## Safety Settings
### Configure Safety Filters
```python
config = types.GenerateContentConfig(
    response_modalities=['IMAGE'],  # Uppercase required
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        ),
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        ),
    ],
)
```
### Available Categories
- `HARM_CATEGORY_HATE_SPEECH`
- `HARM_CATEGORY_DANGEROUS_CONTENT`
- `HARM_CATEGORY_HARASSMENT`
- `HARM_CATEGORY_SEXUALLY_EXPLICIT`
### Thresholds
- `BLOCK_NONE`: No blocking
- `BLOCK_LOW_AND_ABOVE`: Block low probability and above
- `BLOCK_MEDIUM_AND_ABOVE`: Block medium and above (default)
- `BLOCK_ONLY_HIGH`: Block only high probability
## Common Use Cases
### 1. Marketing Assets
```python
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Professional product photography:
- Sleek smartphone on minimalist white surface
- Dramatic side lighting creating subtle shadows
- Shallow depth of field, crisp focus
- Clean, modern aesthetic
- 4K quality
''',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(aspect_ratio='4:3'),
    ),
)
```
### 2. Concept Art
```python
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Fantasy concept art:
- Ancient floating islands connected by chains
- Waterfalls cascading into clouds below
- Magical crystals glowing on the islands
- Epic scale, dramatic lighting
- Detailed digital painting style
''',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(aspect_ratio='16:9'),
    ),
)
```
### 3. Social Media Graphics
```python
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Instagram post design:
- Pastel gradient background (pink to blue)
- Motivational quote layout
- Modern minimalist style
- Clean typography
- Mobile-friendly composition
''',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(aspect_ratio='1:1'),
    ),
)
```
### 4. Illustration
```python
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Children's book illustration:
- Friendly cartoon dragon reading a book
- Bright, cheerful colors
- Soft, rounded shapes
- Whimsical forest background
- Warm, inviting atmosphere
''',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(aspect_ratio='4:3'),
    ),
)
```
### 5. UI/UX Mockups
```python
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Modern mobile app interface:
- Clean dashboard design
- Card-based layout
- Soft shadows and gradients
- Contemporary color scheme (blue and white)
- Professional fintech aesthetic
''',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(aspect_ratio='9:16'),
    ),
)
```
## Best Practices
### Prompt Quality
1. **Be specific**: More detail = better results
2. **Order matters**: Most important elements first
3. **Use examples**: Reference known styles or artists
4. **Avoid contradictions**: Don't ask for opposing styles
5. **Test and iterate**: Refine prompts based on results
### File Management
```python
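import time

# Save with descriptive names (aspect_ratio and image_data come from your own code)
timestamp = int(time.time())
safe_ratio = aspect_ratio.replace(':', 'x')  # ':' is not allowed in Windows filenames
filename = f'generated_{timestamp}_{safe_ratio}.png'
with open(filename, 'wb') as f:
    f.write(image_data)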
```
### Cost Optimization
**Nano Banana 2 pricing (per image)**:
| Resolution | Cost/Image | Batch (50% off) |
|-----------|-----------|-----------------|
| 512px | $0.045 | $0.023 |
| 1K | $0.067 | $0.034 |
| 2K | $0.101 | $0.051 |
| 4K | $0.151 | $0.076 |
**Flash Image token costs**:
- 1 image: 1,290 output tokens ≈ $0.039 (image output is billed at $30/1M tokens, not at the ~$1/1M input rate)
- 10 images: 12,900 tokens ≈ $0.39
- 100 images: 129,000 tokens ≈ $3.87
**Strategies**:
- Generate fewer iterations
- Use text modality first to validate concept
- Batch similar requests
- Cache prompts for consistent style
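Budgeting from the Nano Banana 2 table above is a one-line multiplication. The sketch below copies the per-image prices from that table; `estimate_cost` is a hypothetical helper, and because it applies the 50% batch discount to the unrounded price, totals can differ slightly from multiplying the table's rounded batch column.

```python
# Per-image prices in USD, copied from the Nano Banana 2 pricing table.
PRICE_PER_IMAGE = {'512px': 0.045, '1K': 0.067, '2K': 0.101, '4K': 0.151}
BATCH_DISCOUNT = 0.5  # the table's "Batch (50% off)" column


def estimate_cost(num_images: int, size: str = '1K', batch: bool = False) -> float:
    """Estimate the total cost in USD for num_images at the given size."""
    total = PRICE_PER_IMAGE[size] * num_images
    if batch:
        total *= BATCH_DISCOUNT
    return round(total, 4)


print(estimate_cost(100, '2K'))              # 10.1
print(estimate_cost(100, '2K', batch=True))  # 5.05
```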
## Error Handling
### Safety Filter Blocking
```python
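from google.genai import errors

try:
    response = client.models.generate_content(
        model='gemini-2.5-flash-image',
        contents=prompt,
    )
except errors.APIError as e:
    # Request-level failure (invalid argument, quota, server error)
    print(f"API error {e.code}: {e.message}")
else:
    # A blocked prompt is reported on the response rather than raised
    feedback = response.prompt_feedback
    if feedback and feedback.block_reason:
        print(f"Blocked: {feedback.block_reason}")
        # Modify prompt and retry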
```
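The "modify prompt and retry" step can be sketched as a loop over progressively softened prompts. Everything here is illustrative: `generate_with_fallbacks` is a hypothetical helper, and it assumes you wrap the API call in a function that returns image bytes on success and `None` when the prompt is blocked. The demo uses a stand-in function instead of a real API call.

```python
from typing import Callable, Iterable, Optional


def generate_with_fallbacks(
    generate: Callable[[str], Optional[bytes]],
    prompts: Iterable[str],
) -> tuple[Optional[str], Optional[bytes]]:
    """Try each prompt in order; return the first (prompt, image bytes) that succeeds."""
    for prompt in prompts:
        image = generate(prompt)
        if image is not None:
            return prompt, image
    return None, None  # every variant was blocked


# Stand-in for a real API call: pretends prompts containing "weapon" are blocked.
def fake_generate(prompt: str) -> Optional[bytes]:
    return None if 'weapon' in prompt else b'image-bytes'


used, image = generate_with_fallbacks(
    fake_generate,
    ['A knight holding a weapon', 'A knight holding a banner'],
)
print(used)  # A knight holding a banner
```

Ordering the variants from most to least specific keeps the result as close as possible to the original intent.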
### Token Limit Exceeded
```python
# Keep prompts concise. This check counts characters, which only
# approximates the model's token limit.
if len(prompt) > 1000:
    # Truncate or simplify
    prompt = prompt[:1000]
```
## Limitations
### Imagen 4 Constraints
- **Language**: English prompts only
- **Prompt length**: Maximum 480 tokens
- **Output**: 1-4 images per request
- **Watermark**: All images include SynthID watermark
- **Fast model**: No `imageSize` parameter support (fixed resolution)
- **Text rendering**: Limited to ~25 characters for optimal results
- **Regional restrictions**: Child images restricted in EEA, CH, UK
- **Cannot replicate**: Specific people or copyrighted characters
### Nano Banana (Gemini) Constraints
- **Language**: English prompts are best supported
- **Context**: 65,536 input / 32,768 output tokens for current models (32,768 input for the legacy model)
- **Multi-image**: Standard models ~3-5 refs; Pro up to 14 refs
- **Text rendering**: Standard limited; Pro supports 4K text
- **Watermark**: All images include SynthID watermark
- **Case sensitivity**: `response_modalities` must be uppercase (`'IMAGE'`, `'TEXT'`)
- **Size format**: `image_size` must have uppercase K (`'1K'`, `'2K'`, `'4K'`)
### General Limitations
- Maximum 14 input images for composition (Pro only)
- No video or animation generation (use Veo for video)
- No real-time generation
## Troubleshooting
### aspect_ratio Parameter Error
**Error**: `Extra inputs are not permitted [type=extra_forbidden, input_value='1:1', input_type=str]`
**Cause**: The `aspect_ratio` parameter must be nested inside an `image_config` object, not passed directly to `GenerateContentConfig`.
**Incorrect Usage**:
```python
# ❌ This will fail
config = types.GenerateContentConfig(
    response_modalities=['IMAGE'],
    aspect_ratio='16:9',  # Wrong - not a direct parameter
)
```
**Correct Usage**:
```python
# ✅ Correct implementation
config = types.GenerateContentConfig(
    response_modalities=['IMAGE'],  # Uppercase required
    image_config=types.ImageConfig(
        aspect_ratio='16:9',
    ),
)
```
### Response Modality Case Sensitivity
The `response_modalities` parameter expects uppercase values:
- ✅ Correct: `['IMAGE']`, `['TEXT']`, `['IMAGE', 'TEXT']`
- ❌ Wrong: `['image']`, `['text']`, `['Image']`
### Image Size Parameter Not Supported
**Error**: `400 INVALID_ARGUMENT`
**Cause**: The `image_size` parameter in `ImageConfig` is not supported by all Nano Banana models.
**Solution**: Don't pass `image_size` unless explicitly needed. The API uses sensible defaults.
```python
# ✅ Works - no image_size
config = types.GenerateContentConfig(
    response_modalities=['IMAGE'],
    image_config=types.ImageConfig(
        aspect_ratio='16:9',  # Only aspect_ratio
    ),
)

# ⚠️ May fail - with image_size (model-dependent)
config = types.GenerateContentConfig(
    response_modalities=['IMAGE'],
    image_config=types.ImageConfig(
        aspect_ratio='16:9',
        image_size='2K',  # Not supported by all models
    ),
)
```
### Multi-Image Reference Issues
**Problem**: Poor composition with multiple reference images
**Solutions**:
1. Limit to 3-5 reference images for standard models
2. Use Pro model for up to 14 references
3. Collage multiple style refs into single image
4. Provide clear textual descriptions of how to blend styles
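The first two solutions amount to choosing a model by reference count. A minimal sketch follows, using the limits stated in this reference (roughly 3-5 references for standard models, up to 14 for Pro); `model_for_references` and the constant names are hypothetical.

```python
# Limits as stated in this reference, not values queried from the API.
STANDARD_MAX_REFS = 5
PRO_MAX_REFS = 14
PRO_MODEL = 'gemini-3-pro-image-preview'


def model_for_references(num_refs: int, default_model: str = 'gemini-2.5-flash-image') -> str:
    """Pick a model that can accept num_refs reference images, or raise if none can."""
    if num_refs < 0:
        raise ValueError('num_refs must be non-negative')
    if num_refs <= STANDARD_MAX_REFS:
        return default_model
    if num_refs <= PRO_MAX_REFS:
        return PRO_MODEL
    raise ValueError(
        f'{num_refs} references exceeds the limit of {PRO_MAX_REFS}; '
        'collage several references into one image first'
    )


print(model_for_references(3))   # gemini-2.5-flash-image
print(model_for_references(10))  # gemini-3-pro-image-preview
```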
---
## Related References
**Current**: Image Generation
**Related Capabilities**:
- [Image Understanding](./vision-understanding.md) - Analyzing and editing reference images
- [Video Generation](./video-generation.md) - Creating animated video content
- [Audio Processing](./audio-processing.md) - Text-to-speech for multimedia
**Back to**: [AI Multimodal Skill](../SKILL.md)