29 KiB
Image Generation Reference
Comprehensive guide for image creation, editing, and composition using Imagen 4 and Gemini models ("Nano Banana").
Nano Banana = Google's internal name for native image generation in Gemini API. Three variants:
- Nano Banana 2 (
gemini-3.1-flash-image-preview) - NEW DEFAULT. 3-5x faster, 95% Pro quality, web grounding, 100+ language text rendering, character consistency (5 chars/14 objects). Released Feb 2026.- Nano Banana Flash (
gemini-2.5-flash-image) - Previous default, still stable.- Nano Banana Pro (
gemini-3-pro-image-preview) - Quality with reasoning, 4K text.
Core Capabilities
- Text-to-Image: Generate images from text prompts
- Image Editing: Modify existing images with text instructions
- Multi-Image Composition: Combine up to 14 reference images (Pro model)
- Iterative Refinement: Multi-turn conversational refinement
- Aspect Ratios: 10 formats (1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9)
- Image Sizes: 1K, 2K, 4K (uppercase K required)
- Quality Variants: Standard/Ultra/Fast for different needs
- Text in Images: Up to 25 chars optimal (4K text in Pro)
- Search Grounding: Real-time data integration (Pro only)
- Thinking Mode: Advanced reasoning for complex prompts (Pro only)
Models
Nano Banana 2 (Default - Recommended)
gemini-3.1-flash-image-preview - Nano Banana 2 ⭐ NEW DEFAULT
- Best for: General use, fast generation with near-Pro quality
- Quality: High (95% parity with Pro)
- Speed: 3-5x faster than previous Flash
- Cost: ~$0.045/image (512px) to ~$0.151/image (4K); ~25-30% cheaper than Pro
- Resolution: 512px to 4K with expanded aspect ratios
- Text rendering: 100+ languages with proper formatting
- Character consistency: Up to 5 characters and 14 objects
- Reasoning levels: Minimal/High/Dynamic for complex prompts
- Web grounding: Real-time data integration for brands, landmarks, recent events
- Status: Preview (Feb 2026)
Nano Banana Flash (Previous Default)
gemini-2.5-flash-image - Nano Banana Flash
- Best for: Speed, high-volume generation, rapid prototyping
- Quality: High
- Context: 65,536 input / 32,768 output tokens
- Speed: Fast (~5-10s per image)
- Cost: ~$1/1M input tokens
- Aspect Ratios: All 10 supported
- Image Sizes: 1K, 2K, 4K
- Status: Stable (Oct 2025)
gemini-3-pro-image-preview - Nano Banana Pro
- Best for: Professional assets, 4K text rendering, complex prompts
- Quality: Ultra (with advanced reasoning)
- Context: 65,536 input / 32,768 output tokens
- Speed: Medium
- Cost: ~$2/1M text input, $0.134/image (resolution-dependent)
- Multi-Image: Up to 14 reference images (6 objects + 5 humans)
- Features: Thinking mode, Google Search grounding
- Status: Preview (Nov 2025)
Imagen 4 (Alternative - Production)
imagen-4.0-generate-001 - Standard quality, balanced performance
- Best for: Production workflows, marketing assets
- Quality: High
- Speed: Medium (~5-10s per image)
- Cost: ~$0.02/image (estimated)
- Output: 1-4 images per request
- Resolution: 1K or 2K
- Updated: June 2025
imagen-4.0-ultra-generate-001 - Maximum quality
- Best for: Final production, marketing assets, detailed artwork
- Quality: Ultra (highest available)
- Speed: Slow (~15-25s per image)
- Cost: ~$0.04/image (estimated)
- Output: 1-4 images per request
- Resolution: 2K preferred
- Updated: June 2025
imagen-4.0-fast-generate-001 - Fastest generation
- Best for: Rapid iteration, bulk generation, real-time use
- Quality: Good
- Speed: Fast (~2-5s per image)
- Cost: ~$0.01/image (estimated)
- Output: 1-4 images per request
- Resolution: 1K
- Updated: June 2025
Legacy Models
gemini-2.0-flash-preview-image-generation - Legacy
- Status: Deprecated (use Nano Banana or Imagen 4 instead)
- Context: 32,768 input / 8,192 output tokens
Model Comparison
| Model | Quality | Speed | Cost | Best For |
|---|---|---|---|---|
| gemini-3.1-flash-image-preview | ⭐⭐⭐⭐½ | 🚀🚀 Fastest | 💵 Low | NEW DEFAULT - General use |
| gemini-2.5-flash-image | ⭐⭐⭐⭐ | 🚀 Fast | 💵 Low | Previous default, stable |
| gemini-3-pro-image | ⭐⭐⭐⭐⭐ | 💡 Medium | 💰 Medium | Text/reasoning |
| imagen-4.0-generate | ⭐⭐⭐⭐ | 💡 Medium | 💰 Medium | Production (alternative) |
| imagen-4.0-ultra | ⭐⭐⭐⭐⭐ | 🐢 Slow | 💰💰 High | Marketing assets |
| imagen-4.0-fast | ⭐⭐⭐ | 🚀 Fast | 💵 Low | Bulk generation |
Selection Guide:
- Default/General: Use
gemini-3.1-flash-image-preview(fastest, near-Pro quality, web grounding) - Stable Alternative: Use
gemini-2.5-flash-image(previous default, fully stable) - Production Quality: Use
imagen-4.0-generate-001(alternative for final assets) - Marketing/Ultra Quality: Use
imagen-4.0-ultrafor maximum quality - Text-Heavy Images: Use
gemini-3-pro-image-previewfor 4K text rendering - Complex Prompts with Reasoning: Use
gemini-3-pro-image-previewwith Thinking mode - Real-time Data Integration: Use
gemini-3.1-flash-image-previeworgemini-3-pro-image-previewwith Search grounding
Quick Start
Basic Generation (Default - Nano Banana 2)
from google import genai
from google.genai import types
import os
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))
# Nano Banana 2 - NEW DEFAULT (fastest, near-Pro quality, web grounding)
response = client.models.generate_content(
model='gemini-3.1-flash-image-preview',
contents='A serene mountain landscape at sunset with snow-capped peaks',
config=types.GenerateContentConfig(
response_modalities=['IMAGE'], # Uppercase required
image_config=types.ImageConfig(
aspect_ratio='16:9',
image_size='2K' # 512px, 1K, 2K, 4K - uppercase K required
)
)
)
# Save images
for i, part in enumerate(response.candidates[0].content.parts):
if part.inline_data:
with open(f'output-{i}.png', 'wb') as f:
f.write(part.inline_data.data)
Alternative - Imagen 4 (Production Quality)
# Imagen 4 Standard - alternative for production workflows
response = client.models.generate_images(
model='imagen-4.0-generate-001',
prompt='Professional product photography of smartphone',
config=types.GenerateImagesConfig(
numberOfImages=1,
aspectRatio='16:9',
imageSize='1K'
)
)
# Save Imagen 4 output
for i, generated_image in enumerate(response.generated_images):
with open(f'output-{i}.png', 'wb') as f:
f.write(generated_image.image.image_bytes)
Imagen 4 Quality Variants
# Ultra quality (marketing assets)
response = client.models.generate_images(
model='imagen-4.0-ultra-generate-001',
prompt='Professional product photography of smartphone',
config=types.GenerateImagesConfig(
numberOfImages=1,
imageSize='2K' # Use 2K for ultra (Standard/Ultra only)
)
)
# Fast generation (bulk)
# Note: Fast model doesn't support imageSize parameter
response = client.models.generate_images(
model='imagen-4.0-fast-generate-001',
prompt='Quick concept sketch of robot character',
config=types.GenerateImagesConfig(
numberOfImages=4, # Generate multiple variants (default: 4)
aspectRatio='1:1'
)
)
Nano Banana Pro (4K Text, Reasoning)
# Nano Banana Pro - for text rendering and complex prompts
response = client.models.generate_content(
model='gemini-3-pro-image-preview',
contents='A futuristic cityscape with neon lights',
config=types.GenerateContentConfig(
response_modalities=['IMAGE'], # Uppercase required
image_config=types.ImageConfig(
aspect_ratio='16:9',
image_size='4K' # 4K text rendering
)
)
)
# Nano Banana Pro - with Thinking mode and Search grounding
response = client.models.generate_content(
model='gemini-3-pro-image-preview',
contents='Current weather in Tokyo visualized as artistic infographic',
config=types.GenerateContentConfig(
response_modalities=['TEXT', 'IMAGE'], # Both text and image
image_config=types.ImageConfig(
aspect_ratio='1:1',
image_size='4K'
)
),
tools=[{'google_search': {}}] # Enable search grounding
)
# Save from content parts
for i, part in enumerate(response.candidates[0].content.parts):
if part.inline_data:
with open(f'output-{i}.png', 'wb') as f:
f.write(part.inline_data.data)
Multi-Image Reference (Nano Banana Pro)
from PIL import Image
# Up to 14 reference images (6 objects + 5 humans recommended)
img1 = Image.open('style_ref.png')
img2 = Image.open('color_ref.png')
img3 = Image.open('composition_ref.png')
response = client.models.generate_content(
model='gemini-3-pro-image-preview',
contents=[
'Blend these reference styles into a cohesive hero image for a tech product',
img1, img2, img3
],
config=types.GenerateContentConfig(
response_modalities=['IMAGE'],
image_config=types.ImageConfig(
aspect_ratio='16:9',
image_size='4K'
)
)
)
Nano Banana 2 with Web Grounding
# Nano Banana 2 - real-time web integration for brands, landmarks, events
response = client.models.generate_content(
model='gemini-3.1-flash-image-preview',
contents='Current Apple Vision Pro product shot with accurate branding',
config=types.GenerateContentConfig(
response_modalities=['IMAGE'],
image_config=types.ImageConfig(
aspect_ratio='16:9',
image_size='2K'
)
)
)
Nano Banana 2 with Reasoning Levels
# Use reasoning levels for complex prompts
response = client.models.generate_content(
model='gemini-3.1-flash-image-preview',
contents='A photorealistic scene of 5 diverse characters sitting around a campfire, each with distinct clothing and accessories, consistent lighting from the fire',
config=types.GenerateContentConfig(
response_modalities=['IMAGE'],
image_config=types.ImageConfig(
aspect_ratio='16:9',
image_size='4K'
)
)
)
# Nano Banana 2 auto-selects reasoning level (Minimal/High/Dynamic)
# For explicit control, check API docs for reasoning_level parameter
Multi-Turn Refinement Chat
# Conversational image refinement (works with any Nano Banana model)
chat = client.chats.create(
model='gemini-3.1-flash-image-preview', # or gemini-2.5-flash-image
config=types.GenerateContentConfig(
response_modalities=['TEXT', 'IMAGE']
)
)
# Initial generation
response1 = chat.send_message('Create a minimalist logo for a coffee brand called "Brew"')
# Iterative refinement
response2 = chat.send_message('Make the text bolder and add steam rising from the cup')
response3 = chat.send_message('Change the color palette to warm earth tones')
API Differences
Imagen 4 vs Nano Banana (Gemini Native)
| Feature | Imagen 4 | Nano Banana (Gemini) |
|---|---|---|
| Method | generate_images() |
generate_content() |
| Config | GenerateImagesConfig |
GenerateContentConfig |
| Prompt param | prompt (string) |
contents (string/list) |
| Image count | numberOfImages (camelCase) |
N/A (single per request) |
| Aspect ratio | aspectRatio (camelCase) |
aspect_ratio (snake_case) |
| Size | imageSize |
image_size |
| Response | generated_images[i].image.image_bytes |
candidates[0].content.parts[i].inline_data.data |
| Multi-image input | ❌ | ✅ Up to 14 references |
| Multi-turn chat | ❌ | ✅ Conversational |
| Search grounding | ❌ | ✅ (Pro only) |
| Thinking mode | ❌ | ✅ (Pro only) |
| Text rendering | Limited | 4K (Pro) |
Imagen 4 uses generate_images():
response = client.models.generate_images(
model='imagen-4.0-generate-001',
prompt='...',
config=types.GenerateImagesConfig(
numberOfImages=1, # camelCase
aspectRatio='16:9', # camelCase
imageSize='1K' # Standard/Ultra only
)
)
# Access: response.generated_images[0].image.image_bytes
Nano Banana uses generate_content():
response = client.models.generate_content(
model='gemini-3.1-flash-image-preview', # or gemini-2.5-flash-image, gemini-3-pro-image-preview
contents='...',
config=types.GenerateContentConfig(
response_modalities=['IMAGE'], # Uppercase required
image_config=types.ImageConfig(
aspect_ratio='16:9', # snake_case
image_size='2K' # 1K, 2K, 4K - uppercase K
)
)
)
# Access: response.candidates[0].content.parts[0].inline_data.data
Critical Notes:
response_modalitiesvalues MUST be uppercase:'IMAGE','TEXT'image_sizevalue MUST have uppercase K:'1K','2K','4K'- Imagen 4 Fast model doesn't support
imageSizeparameter
Aspect Ratios
| Ratio | Resolution (1K) | Use Case | Token Cost |
|---|---|---|---|
| 1:1 | 1024×1024 | Social media, avatars, icons | 1290 |
| 2:3 | 682×1024 | Vertical portraits | 1290 |
| 3:2 | 1024×682 | Horizontal portraits | 1290 |
| 3:4 | 768×1024 | Vertical posters | 1290 |
| 4:3 | 1024×768 | Traditional media | 1290 |
| 4:5 | 819×1024 | Instagram portrait | 1290 |
| 5:4 | 1024×819 | Horizontal photos | 1290 |
| 9:16 | 576×1024 | Mobile/stories/reels | 1290 |
| 16:9 | 1024×576 | Landscapes, banners, YouTube | 1290 |
| 21:9 | 1024×438 | Ultrawide/cinematic | 1290 |
All ratios cost the same: 1,290 tokens per image (Gemini models).
Response Modalities
Image Only
config = types.GenerateContentConfig(
response_modalities=['image'],
aspect_ratio='1:1'
)
Text Only (No Image)
config = types.GenerateContentConfig(
response_modalities=['text']
)
# Returns text description instead of generating image
Both Image and Text
config = types.GenerateContentConfig(
response_modalities=['image', 'text'],
aspect_ratio='16:9'
)
# Returns both generated image and description
Image Editing
Modify Existing Image
import PIL.Image
# Load original
img = PIL.Image.open('original.png')
# Edit with instructions
response = client.models.generate_content(
model='gemini-2.5-flash-image',
contents=[
'Add a red balloon floating in the sky',
img
],
config=types.GenerateContentConfig(
response_modalities=['image'],
aspect_ratio='16:9'
)
)
Style Transfer
img = PIL.Image.open('photo.jpg')
response = client.models.generate_content(
model='gemini-2.5-flash-image',
contents=[
'Transform this into an oil painting style',
img
]
)
Object Addition/Removal
# Add object
response = client.models.generate_content(
model='gemini-2.5-flash-image',
contents=[
'Add a vintage car parked on the street',
img
]
)
# Remove object
response = client.models.generate_content(
model='gemini-2.5-flash-image',
contents=[
'Remove the person on the left side',
img
]
)
Multi-Image Composition
Combine Multiple Images
img1 = PIL.Image.open('background.png')
img2 = PIL.Image.open('foreground.png')
img3 = PIL.Image.open('overlay.png')
response = client.models.generate_content(
model='gemini-2.5-flash-image',
contents=[
'Combine these images into a cohesive scene',
img1,
img2,
img3
],
config=types.GenerateContentConfig(
response_modalities=['image'],
aspect_ratio='16:9'
)
)
Note: Recommended maximum 3 input images for best results.
Prompt Engineering
Core Principle: Narrative > Keywords
Nano Banana prompting: Write like you're briefing a photographer, not providing SEO keywords. Narrative paragraphs outperform keyword lists.
❌ Bad: "cat, 4k, masterpiece, trending, professional, ultra detailed, cinematic" ✅ Good: "A fluffy orange tabby cat with green eyes lounging on a sun-drenched windowsill. Soft morning light creates a warm glow. Shot with a 50mm lens at f/1.8 for shallow depth of field. Natural lighting, documentary photography style."
Effective Prompt Structure
Three key elements:
- Subject: What to generate (be specific)
- Context: Environmental setting (lighting, location, time)
- Style: Artistic treatment (photography, illustration, etc.)
Quality Modifiers
Technical terms:
- "4K", "8K", "high resolution"
- "HDR", "high dynamic range"
- "professional photography"
- "studio lighting"
- "ultra detailed"
Camera settings:
- "35mm lens", "50mm lens"
- "shallow depth of field"
- "wide angle shot"
- "macro photography"
- "golden hour lighting"
Style Keywords
Art styles:
- "oil painting", "watercolor", "sketch"
- "digital art", "concept art"
- "photorealistic", "hyperrealistic"
- "minimalist", "abstract"
- "cyberpunk", "steampunk", "fantasy"
Mood and atmosphere:
- "dramatic lighting", "soft lighting"
- "moody", "bright and cheerful"
- "mysterious", "whimsical"
- "dark and gritty", "pastel colors"
Subject Description
Be specific:
- ❌ "A cat"
- ✅ "A fluffy orange tabby cat with green eyes"
Add context:
- ❌ "A building"
- ✅ "A modern glass skyscraper reflecting sunset clouds"
Include details:
- ❌ "A person"
- ✅ "A young woman in a red dress holding an umbrella"
Composition and Framing
Camera angles:
- "bird's eye view", "aerial shot"
- "low angle", "high angle"
- "close-up", "wide shot"
- "centered composition"
- "rule of thirds"
Perspective:
- "first person view"
- "third person perspective"
- "isometric view"
- "forced perspective"
Text in Images
Limitations:
- Maximum 25 characters total for optimal results
- Up to 3 distinct text phrases
- For 4K text rendering, use
gemini-3-pro-image-preview
Text prompt template:
Image with text "[EXACT TEXT]" in [font style].
Font: [style description].
Color: [hex code like #FF5733].
Position: [top/center/bottom].
Background: [description].
Context: [poster/sign/label].
Example:
response = client.models.generate_content(
model='gemini-3-pro-image-preview', # Use Pro for better text
contents='''
Create a vintage travel poster with text "EXPLORE TOKYO" at the top.
Font: Bold retro sans-serif, slightly condensed.
Color: #F5E6D3 (cream white).
Position: Top third of image.
Background: Stylized Tokyo skyline with Mt. Fuji, sunset colors.
Style: 1950s travel poster aesthetic, muted warm colors.
'''
)
Font keywords:
- "bold sans-serif", "handwritten script", "vintage letterpress"
- "modern minimalist", "art deco", "neon sign"
Nano Banana Prompt Techniques
| Technique | Example | Purpose |
|---|---|---|
| ALL CAPS emphasis | The logo MUST be centered |
Force attention to critical requirements |
| Hex colors | #9F2B68 instead of "dark magenta" |
Exact color control |
| Negative constraints | NEVER include text/watermarks. DO NOT add labels. |
Explicit exclusions |
| Realism trigger | Natural lighting, DOF. Captured with Canon EOS 90D DSLR. |
Photography authenticity |
| Structured edits | Make ALL edits: - [1] - [2] - [3] |
Multi-step changes |
| Complex logic | Kittens MUST have heterochromatic eyes matching fur colors |
Precise conditions |
Prompt Templates:
Photorealistic:
A [subject] in [location], [lens] lens. [Lighting] creates [mood]. [Details].
[Camera angle]. Professional photography, natural lighting.
Illustration:
[Art style] illustration of [subject]. [Color palette]. [Line style].
[Background]. [Mood].
Product:
[Product] on [surface]. Materials: [finish]. Lighting: [setup].
Camera: [angle]. Background: [type]. Style: [commercial/lifestyle].
Advanced Techniques
Iterative Refinement
# Initial generation
response1 = client.models.generate_content(
model='gemini-2.5-flash-image',
contents='A futuristic city skyline'
)
# Save first version
with open('v1.png', 'wb') as f:
f.write(response1.candidates[0].content.parts[0].inline_data.data)
# Refine
img = PIL.Image.open('v1.png')
response2 = client.models.generate_content(
model='gemini-2.5-flash-image',
contents=[
'Add flying vehicles and neon signs',
img
]
)
Negative Prompts (Indirect)
# Instead of "no blur", be specific about what you want
response = client.models.generate_content(
model='gemini-2.5-flash-image',
contents='A crystal clear, sharp photograph of a diamond ring with perfect focus and high detail'
)
Consistent Style Across Images
base_prompt = "Digital art, vibrant colors, cel-shaded style, clean lines"
prompts = [
f"{base_prompt}, a warrior character",
f"{base_prompt}, a mage character",
f"{base_prompt}, a rogue character"
]
for i, prompt in enumerate(prompts):
response = client.models.generate_content(
model='gemini-2.5-flash-image',
contents=prompt
)
# Save each character
Safety Settings
Configure Safety Filters
config = types.GenerateContentConfig(
response_modalities=['image'],
safety_settings=[
types.SafetySetting(
category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
),
types.SafetySetting(
category=types.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
)
]
)
Available Categories
HARM_CATEGORY_HATE_SPEECHHARM_CATEGORY_DANGEROUS_CONTENTHARM_CATEGORY_HARASSMENTHARM_CATEGORY_SEXUALLY_EXPLICIT
Thresholds
BLOCK_NONE: No blockingBLOCK_LOW_AND_ABOVE: Block low probability and aboveBLOCK_MEDIUM_AND_ABOVE: Block medium and above (default)BLOCK_ONLY_HIGH: Block only high probability
Common Use Cases
1. Marketing Assets
response = client.models.generate_content(
model='gemini-2.5-flash-image',
contents='''Professional product photography:
- Sleek smartphone on minimalist white surface
- Dramatic side lighting creating subtle shadows
- Shallow depth of field, crisp focus
- Clean, modern aesthetic
- 4K quality
''',
config=types.GenerateContentConfig(
response_modalities=['image'],
aspect_ratio='4:3'
)
)
2. Concept Art
response = client.models.generate_content(
model='gemini-2.5-flash-image',
contents='''Fantasy concept art:
- Ancient floating islands connected by chains
- Waterfalls cascading into clouds below
- Magical crystals glowing on the islands
- Epic scale, dramatic lighting
- Detailed digital painting style
''',
config=types.GenerateContentConfig(
response_modalities=['image'],
aspect_ratio='16:9'
)
)
3. Social Media Graphics
response = client.models.generate_content(
model='gemini-2.5-flash-image',
contents='''Instagram post design:
- Pastel gradient background (pink to blue)
- Motivational quote layout
- Modern minimalist style
- Clean typography
- Mobile-friendly composition
''',
config=types.GenerateContentConfig(
response_modalities=['image'],
aspect_ratio='1:1'
)
)
4. Illustration
response = client.models.generate_content(
model='gemini-2.5-flash-image',
contents='''Children's book illustration:
- Friendly cartoon dragon reading a book
- Bright, cheerful colors
- Soft, rounded shapes
- Whimsical forest background
- Warm, inviting atmosphere
''',
config=types.GenerateContentConfig(
response_modalities=['image'],
aspect_ratio='4:3'
)
)
5. UI/UX Mockups
response = client.models.generate_content(
model='gemini-2.5-flash-image',
contents='''Modern mobile app interface:
- Clean dashboard design
- Card-based layout
- Soft shadows and gradients
- Contemporary color scheme (blue and white)
- Professional fintech aesthetic
''',
config=types.GenerateContentConfig(
response_modalities=['image'],
aspect_ratio='9:16'
)
)
Best Practices
Prompt Quality
- Be specific: More detail = better results
- Order matters: Most important elements first
- Use examples: Reference known styles or artists
- Avoid contradictions: Don't ask for opposing styles
- Test and iterate: Refine prompts based on results
File Management
# Save with descriptive names
timestamp = int(time.time())
filename = f'generated_{timestamp}_{aspect_ratio}.png'
with open(filename, 'wb') as f:
f.write(image_data)
Cost Optimization
Nano Banana 2 pricing (per image):
| Resolution | Cost/Image | Batch (50% off) |
|---|---|---|
| 512px | $0.045 | $0.023 |
| 1K | $0.067 | $0.034 |
| 2K | $0.101 | $0.051 |
| 4K | $0.151 | $0.076 |
Flash Image token costs:
- 1 image: 1,290 tokens = $0.00129 (Flash Image at $1/1M)
- 10 images: 12,900 tokens = $0.0129
- 100 images: 129,000 tokens = $0.129
Strategies:
- Generate fewer iterations
- Use text modality first to validate concept
- Batch similar requests
- Cache prompts for consistent style
Error Handling
Safety Filter Blocking
try:
response = client.models.generate_content(
model='gemini-2.5-flash-image',
contents=prompt
)
except Exception as e:
# Check block reason
if hasattr(e, 'prompt_feedback'):
print(f"Blocked: {e.prompt_feedback.block_reason}")
# Modify prompt and retry
Token Limit Exceeded
# Keep prompts concise
if len(prompt) > 1000:
# Truncate or simplify
prompt = prompt[:1000]
Limitations
Imagen 4 Constraints
- Language: English prompts only
- Prompt length: Maximum 480 tokens
- Output: 1-4 images per request
- Watermark: All images include SynthID watermark
- Fast model: No
imageSizeparameter support (fixed resolution) - Text rendering: Limited to ~25 characters for optimal results
- Regional restrictions: Child images restricted in EEA, CH, UK
- Cannot replicate: Specific people or copyrighted characters
Nano Banana (Gemini) Constraints
- Language: English prompts primary support
- Context: 32K token window
- Multi-image: Standard models ~3-5 refs; Pro up to 14 refs
- Text rendering: Standard limited; Pro supports 4K text
- Watermark: All images include SynthID watermark
- Case sensitivity:
response_modalitiesmust be uppercase ('IMAGE','TEXT') - Size format:
image_sizemust have uppercase K ('1K','2K','4K')
General Limitations
- Maximum 14 input images for composition (Pro only)
- No video or animation generation (use Veo for video)
- No real-time generation
Troubleshooting
aspect_ratio Parameter Error
Error: Extra inputs are not permitted [type=extra_forbidden, input_value='1:1', input_type=str]
Cause: The aspect_ratio parameter must be nested inside an image_config object, not passed directly to GenerateContentConfig.
Incorrect Usage:
# ❌ This will fail
config = types.GenerateContentConfig(
response_modalities=['image'],
aspect_ratio='16:9' # Wrong - not a direct parameter
)
Correct Usage:
# ✅ Correct implementation
config = types.GenerateContentConfig(
response_modalities=['Image'], # Note: Capital 'I'
image_config=types.ImageConfig(
aspect_ratio='16:9'
)
)
Response Modality Case Sensitivity
The response_modalities parameter expects uppercase values:
- ✅ Correct:
['IMAGE'],['TEXT'],['IMAGE', 'TEXT'] - ❌ Wrong:
['image'],['text'],['Image']
Image Size Parameter Not Supported
Error: 400 INVALID_ARGUMENT
Cause: The image_size parameter in ImageConfig is not supported by all Nano Banana models.
Solution: Don't pass image_size unless explicitly needed. The API uses sensible defaults.
# ✅ Works - no image_size
config=types.GenerateContentConfig(
response_modalities=['IMAGE'],
image_config=types.ImageConfig(
aspect_ratio='16:9' # Only aspect_ratio
)
)
# ⚠️ May fail - with image_size (model-dependent)
config=types.GenerateContentConfig(
response_modalities=['IMAGE'],
image_config=types.ImageConfig(
aspect_ratio='16:9',
image_size='2K' # Not supported by all models
)
)
Multi-Image Reference Issues
Problem: Poor composition with multiple reference images
Solutions:
- Limit to 3-5 reference images for standard models
- Use Pro model for up to 14 references
- Collage multiple style refs into single image
- Provide clear textual descriptions of how to blend styles
Related References
Current: Image Generation
Related Capabilities:
- Image Understanding - Analyzing and editing reference images
- Video Generation - Creating animated video content
- Audio Processing - Text-to-speech for multimedia
Back to: AI Multimodal Skill