Files
english/.opencode/skills/ai-multimodal/references/image-generation.md
2026-04-12 01:06:31 +07:00

29 KiB
Raw Blame History

Image Generation Reference

Comprehensive guide for image creation, editing, and composition using Imagen 4 and Gemini models ("Nano Banana").

Nano Banana = Google's internal name for native image generation in Gemini API. Three variants:

  • Nano Banana 2 (gemini-3.1-flash-image-preview) - NEW DEFAULT. 3-5x faster, 95% Pro quality, web grounding, 100+ language text rendering, character consistency (5 chars/14 objects). Released Feb 2026.
  • Nano Banana Flash (gemini-2.5-flash-image) - Previous default, still stable.
  • Nano Banana Pro (gemini-3-pro-image-preview) - Quality with reasoning, 4K text.

Core Capabilities

  • Text-to-Image: Generate images from text prompts
  • Image Editing: Modify existing images with text instructions
  • Multi-Image Composition: Combine up to 14 reference images (Pro model)
  • Iterative Refinement: Multi-turn conversational refinement
  • Aspect Ratios: 10 formats (1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9)
  • Image Sizes: 1K, 2K, 4K (uppercase K required)
  • Quality Variants: Standard/Ultra/Fast for different needs
  • Text in Images: Up to 25 chars optimal (4K text in Pro)
  • Search Grounding: Real-time data integration (Pro only)
  • Thinking Mode: Advanced reasoning for complex prompts (Pro only)

Models

gemini-3.1-flash-image-preview - Nano Banana 2 NEW DEFAULT

  • Best for: General use, fast generation with near-Pro quality
  • Quality: High (95% parity with Pro)
  • Speed: 3-5x faster than previous Flash
  • Cost: ~$0.045/image (512px) to ~$0.151/image (4K); ~25-30% cheaper than Pro
  • Resolution: 512px to 4K with expanded aspect ratios
  • Text rendering: 100+ languages with proper formatting
  • Character consistency: Up to 5 characters and 14 objects
  • Reasoning levels: Minimal/High/Dynamic for complex prompts
  • Web grounding: Real-time data integration for brands, landmarks, recent events
  • Status: Preview (Feb 2026)

Nano Banana Flash (Previous Default)

gemini-2.5-flash-image - Nano Banana Flash

  • Best for: Speed, high-volume generation, rapid prototyping
  • Quality: High
  • Context: 65,536 input / 32,768 output tokens
  • Speed: Fast (~5-10s per image)
  • Cost: ~$1/1M input tokens
  • Aspect Ratios: All 10 supported
  • Image Sizes: 1K, 2K, 4K
  • Status: Stable (Oct 2025)

gemini-3-pro-image-preview - Nano Banana Pro

  • Best for: Professional assets, 4K text rendering, complex prompts
  • Quality: Ultra (with advanced reasoning)
  • Context: 65,536 input / 32,768 output tokens
  • Speed: Medium
  • Cost: ~$2/1M text input, $0.134/image (resolution-dependent)
  • Multi-Image: Up to 14 reference images (6 objects + 5 humans)
  • Features: Thinking mode, Google Search grounding
  • Status: Preview (Nov 2025)

Imagen 4 (Alternative - Production)

imagen-4.0-generate-001 - Standard quality, balanced performance

  • Best for: Production workflows, marketing assets
  • Quality: High
  • Speed: Medium (~5-10s per image)
  • Cost: ~$0.02/image (estimated)
  • Output: 1-4 images per request
  • Resolution: 1K or 2K
  • Updated: June 2025

imagen-4.0-ultra-generate-001 - Maximum quality

  • Best for: Final production, marketing assets, detailed artwork
  • Quality: Ultra (highest available)
  • Speed: Slow (~15-25s per image)
  • Cost: ~$0.04/image (estimated)
  • Output: 1-4 images per request
  • Resolution: 2K preferred
  • Updated: June 2025

imagen-4.0-fast-generate-001 - Fastest generation

  • Best for: Rapid iteration, bulk generation, real-time use
  • Quality: Good
  • Speed: Fast (~2-5s per image)
  • Cost: ~$0.01/image (estimated)
  • Output: 1-4 images per request
  • Resolution: 1K
  • Updated: June 2025

Legacy Models

gemini-2.0-flash-preview-image-generation - Legacy

  • Status: Deprecated (use Nano Banana or Imagen 4 instead)
  • Context: 32,768 input / 8,192 output tokens

Model Comparison

Model Quality Speed Cost Best For
gemini-3.1-flash-image-preview ½ 🚀🚀 Fastest 💵 Low NEW DEFAULT - General use
gemini-2.5-flash-image 🚀 Fast 💵 Low Previous default, stable
gemini-3-pro-image 💡 Medium 💰 Medium Text/reasoning
imagen-4.0-generate 💡 Medium 💰 Medium Production (alternative)
imagen-4.0-ultra 🐢 Slow 💰💰 High Marketing assets
imagen-4.0-fast 🚀 Fast 💵 Low Bulk generation

Selection Guide:

  • Default/General: Use gemini-3.1-flash-image-preview (fastest, near-Pro quality, web grounding)
  • Stable Alternative: Use gemini-2.5-flash-image (previous default, fully stable)
  • Production Quality: Use imagen-4.0-generate-001 (alternative for final assets)
  • Marketing/Ultra Quality: Use imagen-4.0-ultra for maximum quality
  • Text-Heavy Images: Use gemini-3-pro-image-preview for 4K text rendering
  • Complex Prompts with Reasoning: Use gemini-3-pro-image-preview with Thinking mode
  • Real-time Data Integration: Use gemini-3.1-flash-image-preview or gemini-3-pro-image-preview with Search grounding

Quick Start

Basic Generation (Default - Nano Banana 2)

from google import genai
from google.genai import types
import os

client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

# Nano Banana 2 - NEW DEFAULT (fastest, near-Pro quality, web grounding)
response = client.models.generate_content(
    model='gemini-3.1-flash-image-preview',
    contents='A serene mountain landscape at sunset with snow-capped peaks',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='2K'  # 512px, 1K, 2K, 4K - uppercase K required
        )
    )
)

# Save images
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'output-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)

Alternative - Imagen 4 (Production Quality)

# Imagen 4 Standard - alternative for production workflows
response = client.models.generate_images(
    model='imagen-4.0-generate-001',
    prompt='Professional product photography of smartphone',
    config=types.GenerateImagesConfig(
        numberOfImages=1,
        aspectRatio='16:9',
        imageSize='1K'
    )
)

# Save Imagen 4 output
for i, generated_image in enumerate(response.generated_images):
    with open(f'output-{i}.png', 'wb') as f:
        f.write(generated_image.image.image_bytes)

Imagen 4 Quality Variants

# Ultra quality (marketing assets)
response = client.models.generate_images(
    model='imagen-4.0-ultra-generate-001',
    prompt='Professional product photography of smartphone',
    config=types.GenerateImagesConfig(
        numberOfImages=1,
        imageSize='2K'  # Use 2K for ultra (Standard/Ultra only)
    )
)

# Fast generation (bulk)
# Note: Fast model doesn't support imageSize parameter
response = client.models.generate_images(
    model='imagen-4.0-fast-generate-001',
    prompt='Quick concept sketch of robot character',
    config=types.GenerateImagesConfig(
        numberOfImages=4,  # Generate multiple variants (default: 4)
        aspectRatio='1:1'
    )
)

Nano Banana Pro (4K Text, Reasoning)

# Nano Banana Pro - for text rendering and complex prompts
response = client.models.generate_content(
    model='gemini-3-pro-image-preview',
    contents='A futuristic cityscape with neon lights',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='4K'  # 4K text rendering
        )
    )
)

# Nano Banana Pro - with Thinking mode and Search grounding
response = client.models.generate_content(
    model='gemini-3-pro-image-preview',
    contents='Current weather in Tokyo visualized as artistic infographic',
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],  # Both text and image
        image_config=types.ImageConfig(
            aspect_ratio='1:1',
            image_size='4K'
        )
    ),
    tools=[{'google_search': {}}]  # Enable search grounding
)

# Save from content parts
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'output-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)

Multi-Image Reference (Nano Banana Pro)

from PIL import Image

# Up to 14 reference images (6 objects + 5 humans recommended)
img1 = Image.open('style_ref.png')
img2 = Image.open('color_ref.png')
img3 = Image.open('composition_ref.png')

response = client.models.generate_content(
    model='gemini-3-pro-image-preview',
    contents=[
        'Blend these reference styles into a cohesive hero image for a tech product',
        img1, img2, img3
    ],
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='4K'
        )
    )
)

Nano Banana 2 with Web Grounding

# Nano Banana 2 - real-time web integration for brands, landmarks, events
response = client.models.generate_content(
    model='gemini-3.1-flash-image-preview',
    contents='Current Apple Vision Pro product shot with accurate branding',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='2K'
        )
    )
)

Nano Banana 2 with Reasoning Levels

# Use reasoning levels for complex prompts
response = client.models.generate_content(
    model='gemini-3.1-flash-image-preview',
    contents='A photorealistic scene of 5 diverse characters sitting around a campfire, each with distinct clothing and accessories, consistent lighting from the fire',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='4K'
        )
    )
)
# Nano Banana 2 auto-selects reasoning level (Minimal/High/Dynamic)
# For explicit control, check API docs for reasoning_level parameter

Multi-Turn Refinement Chat

# Conversational image refinement (works with any Nano Banana model)
chat = client.chats.create(
    model='gemini-3.1-flash-image-preview',  # or gemini-2.5-flash-image
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

# Initial generation
response1 = chat.send_message('Create a minimalist logo for a coffee brand called "Brew"')

# Iterative refinement
response2 = chat.send_message('Make the text bolder and add steam rising from the cup')
response3 = chat.send_message('Change the color palette to warm earth tones')

API Differences

Imagen 4 vs Nano Banana (Gemini Native)

Feature Imagen 4 Nano Banana (Gemini)
Method generate_images() generate_content()
Config GenerateImagesConfig GenerateContentConfig
Prompt param prompt (string) contents (string/list)
Image count numberOfImages (camelCase) N/A (single per request)
Aspect ratio aspectRatio (camelCase) aspect_ratio (snake_case)
Size imageSize image_size
Response generated_images[i].image.image_bytes candidates[0].content.parts[i].inline_data.data
Multi-image input Up to 14 references
Multi-turn chat Conversational
Search grounding (Pro only)
Thinking mode (Pro only)
Text rendering Limited 4K (Pro)

Imagen 4 uses generate_images():

response = client.models.generate_images(
    model='imagen-4.0-generate-001',
    prompt='...',
    config=types.GenerateImagesConfig(
        numberOfImages=1,      # camelCase
        aspectRatio='16:9',    # camelCase
        imageSize='1K'         # Standard/Ultra only
    )
)
# Access: response.generated_images[0].image.image_bytes

Nano Banana uses generate_content():

response = client.models.generate_content(
    model='gemini-3.1-flash-image-preview',  # or gemini-2.5-flash-image, gemini-3-pro-image-preview
    contents='...',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(
            aspect_ratio='16:9',        # snake_case
            image_size='2K'             # 1K, 2K, 4K - uppercase K
        )
    )
)
# Access: response.candidates[0].content.parts[0].inline_data.data

Critical Notes:

  1. response_modalities values MUST be uppercase: 'IMAGE', 'TEXT'
  2. image_size value MUST have uppercase K: '1K', '2K', '4K'
  3. Imagen 4 Fast model doesn't support imageSize parameter

Aspect Ratios

Ratio Resolution (1K) Use Case Token Cost
1:1 1024×1024 Social media, avatars, icons 1290
2:3 682×1024 Vertical portraits 1290
3:2 1024×682 Horizontal portraits 1290
3:4 768×1024 Vertical posters 1290
4:3 1024×768 Traditional media 1290
4:5 819×1024 Instagram portrait 1290
5:4 1024×819 Horizontal photos 1290
9:16 576×1024 Mobile/stories/reels 1290
16:9 1024×576 Landscapes, banners, YouTube 1290
21:9 1024×438 Ultrawide/cinematic 1290

All ratios cost the same: 1,290 tokens per image (Gemini models).

Response Modalities

Image Only

config = types.GenerateContentConfig(
    response_modalities=['image'],
    aspect_ratio='1:1'
)

Text Only (No Image)

config = types.GenerateContentConfig(
    response_modalities=['text']
)
# Returns text description instead of generating image

Both Image and Text

config = types.GenerateContentConfig(
    response_modalities=['image', 'text'],
    aspect_ratio='16:9'
)
# Returns both generated image and description

Image Editing

Modify Existing Image

import PIL.Image

# Load original
img = PIL.Image.open('original.png')

# Edit with instructions
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a red balloon floating in the sky',
        img
    ],
    config=types.GenerateContentConfig(
        response_modalities=['image'],
        aspect_ratio='16:9'
    )
)

Style Transfer

img = PIL.Image.open('photo.jpg')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Transform this into an oil painting style',
        img
    ]
)

Object Addition/Removal

# Add object
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a vintage car parked on the street',
        img
    ]
)

# Remove object
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Remove the person on the left side',
        img
    ]
)

Multi-Image Composition

Combine Multiple Images

img1 = PIL.Image.open('background.png')
img2 = PIL.Image.open('foreground.png')
img3 = PIL.Image.open('overlay.png')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Combine these images into a cohesive scene',
        img1,
        img2,
        img3
    ],
    config=types.GenerateContentConfig(
        response_modalities=['image'],
        aspect_ratio='16:9'
    )
)

Note: Recommended maximum 3 input images for best results.

Prompt Engineering

Core Principle: Narrative > Keywords

Nano Banana prompting: Write like you're briefing a photographer, not providing SEO keywords. Narrative paragraphs outperform keyword lists.

Bad: "cat, 4k, masterpiece, trending, professional, ultra detailed, cinematic" Good: "A fluffy orange tabby cat with green eyes lounging on a sun-drenched windowsill. Soft morning light creates a warm glow. Shot with a 50mm lens at f/1.8 for shallow depth of field. Natural lighting, documentary photography style."

Effective Prompt Structure

Three key elements:

  1. Subject: What to generate (be specific)
  2. Context: Environmental setting (lighting, location, time)
  3. Style: Artistic treatment (photography, illustration, etc.)

Quality Modifiers

Technical terms:

  • "4K", "8K", "high resolution"
  • "HDR", "high dynamic range"
  • "professional photography"
  • "studio lighting"
  • "ultra detailed"

Camera settings:

  • "35mm lens", "50mm lens"
  • "shallow depth of field"
  • "wide angle shot"
  • "macro photography"
  • "golden hour lighting"

Style Keywords

Art styles:

  • "oil painting", "watercolor", "sketch"
  • "digital art", "concept art"
  • "photorealistic", "hyperrealistic"
  • "minimalist", "abstract"
  • "cyberpunk", "steampunk", "fantasy"

Mood and atmosphere:

  • "dramatic lighting", "soft lighting"
  • "moody", "bright and cheerful"
  • "mysterious", "whimsical"
  • "dark and gritty", "pastel colors"

Subject Description

Be specific:

  • "A cat"
  • "A fluffy orange tabby cat with green eyes"

Add context:

  • "A building"
  • "A modern glass skyscraper reflecting sunset clouds"

Include details:

  • "A person"
  • "A young woman in a red dress holding an umbrella"

Composition and Framing

Camera angles:

  • "bird's eye view", "aerial shot"
  • "low angle", "high angle"
  • "close-up", "wide shot"
  • "centered composition"
  • "rule of thirds"

Perspective:

  • "first person view"
  • "third person perspective"
  • "isometric view"
  • "forced perspective"

Text in Images

Limitations:

  • Maximum 25 characters total for optimal results
  • Up to 3 distinct text phrases
  • For 4K text rendering, use gemini-3-pro-image-preview

Text prompt template:

Image with text "[EXACT TEXT]" in [font style].
Font: [style description].
Color: [hex code like #FF5733].
Position: [top/center/bottom].
Background: [description].
Context: [poster/sign/label].

Example:

response = client.models.generate_content(
    model='gemini-3-pro-image-preview',  # Use Pro for better text
    contents='''
    Create a vintage travel poster with text "EXPLORE TOKYO" at the top.
    Font: Bold retro sans-serif, slightly condensed.
    Color: #F5E6D3 (cream white).
    Position: Top third of image.
    Background: Stylized Tokyo skyline with Mt. Fuji, sunset colors.
    Style: 1950s travel poster aesthetic, muted warm colors.
    '''
)

Font keywords:

  • "bold sans-serif", "handwritten script", "vintage letterpress"
  • "modern minimalist", "art deco", "neon sign"

Nano Banana Prompt Techniques

Technique Example Purpose
ALL CAPS emphasis The logo MUST be centered Force attention to critical requirements
Hex colors #9F2B68 instead of "dark magenta" Exact color control
Negative constraints NEVER include text/watermarks. DO NOT add labels. Explicit exclusions
Realism trigger Natural lighting, DOF. Captured with Canon EOS 90D DSLR. Photography authenticity
Structured edits Make ALL edits: - [1] - [2] - [3] Multi-step changes
Complex logic Kittens MUST have heterochromatic eyes matching fur colors Precise conditions

Prompt Templates:

Photorealistic:

A [subject] in [location], [lens] lens. [Lighting] creates [mood]. [Details].
[Camera angle]. Professional photography, natural lighting.

Illustration:

[Art style] illustration of [subject]. [Color palette]. [Line style].
[Background]. [Mood].

Product:

[Product] on [surface]. Materials: [finish]. Lighting: [setup].
Camera: [angle]. Background: [type]. Style: [commercial/lifestyle].

Advanced Techniques

Iterative Refinement

# Initial generation
response1 = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A futuristic city skyline'
)

# Save first version
with open('v1.png', 'wb') as f:
    f.write(response1.candidates[0].content.parts[0].inline_data.data)

# Refine
img = PIL.Image.open('v1.png')
response2 = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add flying vehicles and neon signs',
        img
    ]
)

Negative Prompts (Indirect)

# Instead of "no blur", be specific about what you want
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A crystal clear, sharp photograph of a diamond ring with perfect focus and high detail'
)

Consistent Style Across Images

base_prompt = "Digital art, vibrant colors, cel-shaded style, clean lines"

prompts = [
    f"{base_prompt}, a warrior character",
    f"{base_prompt}, a mage character",
    f"{base_prompt}, a rogue character"
]

for i, prompt in enumerate(prompts):
    response = client.models.generate_content(
        model='gemini-2.5-flash-image',
        contents=prompt
    )
    # Save each character

Safety Settings

Configure Safety Filters

config = types.GenerateContentConfig(
    response_modalities=['image'],
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        ),
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ]
)

Available Categories

  • HARM_CATEGORY_HATE_SPEECH
  • HARM_CATEGORY_DANGEROUS_CONTENT
  • HARM_CATEGORY_HARASSMENT
  • HARM_CATEGORY_SEXUALLY_EXPLICIT

Thresholds

  • BLOCK_NONE: No blocking
  • BLOCK_LOW_AND_ABOVE: Block low probability and above
  • BLOCK_MEDIUM_AND_ABOVE: Block medium and above (default)
  • BLOCK_ONLY_HIGH: Block only high probability

Common Use Cases

1. Marketing Assets

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Professional product photography:
    - Sleek smartphone on minimalist white surface
    - Dramatic side lighting creating subtle shadows
    - Shallow depth of field, crisp focus
    - Clean, modern aesthetic
    - 4K quality
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['image'],
        aspect_ratio='4:3'
    )
)

2. Concept Art

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Fantasy concept art:
    - Ancient floating islands connected by chains
    - Waterfalls cascading into clouds below
    - Magical crystals glowing on the islands
    - Epic scale, dramatic lighting
    - Detailed digital painting style
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['image'],
        aspect_ratio='16:9'
    )
)

3. Social Media Graphics

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Instagram post design:
    - Pastel gradient background (pink to blue)
    - Motivational quote layout
    - Modern minimalist style
    - Clean typography
    - Mobile-friendly composition
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['image'],
        aspect_ratio='1:1'
    )
)

4. Illustration

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Children's book illustration:
    - Friendly cartoon dragon reading a book
    - Bright, cheerful colors
    - Soft, rounded shapes
    - Whimsical forest background
    - Warm, inviting atmosphere
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['image'],
        aspect_ratio='4:3'
    )
)

5. UI/UX Mockups

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Modern mobile app interface:
    - Clean dashboard design
    - Card-based layout
    - Soft shadows and gradients
    - Contemporary color scheme (blue and white)
    - Professional fintech aesthetic
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['image'],
        aspect_ratio='9:16'
    )
)

Best Practices

Prompt Quality

  1. Be specific: More detail = better results
  2. Order matters: Most important elements first
  3. Use examples: Reference known styles or artists
  4. Avoid contradictions: Don't ask for opposing styles
  5. Test and iterate: Refine prompts based on results

File Management

# Save with descriptive names
timestamp = int(time.time())
filename = f'generated_{timestamp}_{aspect_ratio}.png'

with open(filename, 'wb') as f:
    f.write(image_data)

Cost Optimization

Nano Banana 2 pricing (per image):

Resolution Cost/Image Batch (50% off)
512px $0.045 $0.023
1K $0.067 $0.034
2K $0.101 $0.051
4K $0.151 $0.076

Flash Image token costs:

  • 1 image: 1,290 tokens = $0.00129 (Flash Image at $1/1M)
  • 10 images: 12,900 tokens = $0.0129
  • 100 images: 129,000 tokens = $0.129

Strategies:

  • Generate fewer iterations
  • Use text modality first to validate concept
  • Batch similar requests
  • Cache prompts for consistent style

Error Handling

Safety Filter Blocking

try:
    response = client.models.generate_content(
        model='gemini-2.5-flash-image',
        contents=prompt
    )
except Exception as e:
    # Check block reason
    if hasattr(e, 'prompt_feedback'):
        print(f"Blocked: {e.prompt_feedback.block_reason}")
        # Modify prompt and retry

Token Limit Exceeded

# Keep prompts concise
if len(prompt) > 1000:
    # Truncate or simplify
    prompt = prompt[:1000]

Limitations

Imagen 4 Constraints

  • Language: English prompts only
  • Prompt length: Maximum 480 tokens
  • Output: 1-4 images per request
  • Watermark: All images include SynthID watermark
  • Fast model: No imageSize parameter support (fixed resolution)
  • Text rendering: Limited to ~25 characters for optimal results
  • Regional restrictions: Child images restricted in EEA, CH, UK
  • Cannot replicate: Specific people or copyrighted characters

Nano Banana (Gemini) Constraints

  • Language: English prompts primary support
  • Context: 32K token window
  • Multi-image: Standard models ~3-5 refs; Pro up to 14 refs
  • Text rendering: Standard limited; Pro supports 4K text
  • Watermark: All images include SynthID watermark
  • Case sensitivity: response_modalities must be uppercase ('IMAGE', 'TEXT')
  • Size format: image_size must have uppercase K ('1K', '2K', '4K')

General Limitations

  • Maximum 14 input images for composition (Pro only)
  • No video or animation generation (use Veo for video)
  • No real-time generation

Troubleshooting

aspect_ratio Parameter Error

Error: Extra inputs are not permitted [type=extra_forbidden, input_value='1:1', input_type=str]

Cause: The aspect_ratio parameter must be nested inside an image_config object, not passed directly to GenerateContentConfig.

Incorrect Usage:

# ❌ This will fail
config = types.GenerateContentConfig(
    response_modalities=['image'],
    aspect_ratio='16:9'  # Wrong - not a direct parameter
)

Correct Usage:

# ✅ Correct implementation
config = types.GenerateContentConfig(
    response_modalities=['Image'],  # Note: Capital 'I'
    image_config=types.ImageConfig(
        aspect_ratio='16:9'
    )
)

Response Modality Case Sensitivity

The response_modalities parameter expects uppercase values:

  • Correct: ['IMAGE'], ['TEXT'], ['IMAGE', 'TEXT']
  • Wrong: ['image'], ['text'], ['Image']

Image Size Parameter Not Supported

Error: 400 INVALID_ARGUMENT

Cause: The image_size parameter in ImageConfig is not supported by all Nano Banana models.

Solution: Don't pass image_size unless explicitly needed. The API uses sensible defaults.

# ✅ Works - no image_size
config=types.GenerateContentConfig(
    response_modalities=['IMAGE'],
    image_config=types.ImageConfig(
        aspect_ratio='16:9'  # Only aspect_ratio
    )
)

# ⚠️ May fail - with image_size (model-dependent)
config=types.GenerateContentConfig(
    response_modalities=['IMAGE'],
    image_config=types.ImageConfig(
        aspect_ratio='16:9',
        image_size='2K'  # Not supported by all models
    )
)

Multi-Image Reference Issues

Problem: Poor composition with multiple reference images

Solutions:

  1. Limit to 3-5 reference images for standard models
  2. Use Pro model for up to 14 references
  3. Collage multiple style refs into single image
  4. Provide clear textual descriptions of how to blend styles

Current: Image Generation

Related Capabilities:

Back to: AI Multimodal Skill