2026-04-12 01:06:31 +07:00

Image Generation Reference

Comprehensive guide for image creation, editing, and composition using Imagen 4 and Gemini models ("Nano Banana").

Nano Banana is Google's internal name for native image generation in the Gemini API. It comes in three variants:

Nano Banana 2 (gemini-3.1-flash-image-preview) - NEW DEFAULT. 3-5x faster, 95% Pro quality, web grounding, text rendering in 100+ languages, character consistency (up to 5 characters and 14 objects). Released Feb 2026.

Nano Banana Flash (gemini-2.5-flash-image) - Previous default, still stable.

Nano Banana Pro (gemini-3-pro-image-preview) - Quality with reasoning, 4K text.

Core Capabilities

Text-to-Image: Generate images from text prompts
Image Editing: Modify existing images with text instructions
Multi-Image Composition: Combine up to 14 reference images (Pro model)
Iterative Refinement: Multi-turn conversational refinement
Aspect Ratios: 10 formats (1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9)
Image Sizes: 1K, 2K, 4K (uppercase K required)
Quality Variants: Standard/Ultra/Fast for different needs
Text in Images: Up to 25 characters for best results (4K text in Pro)
Search Grounding: Real-time data integration (Nano Banana 2 and Pro)
Thinking Mode: Advanced reasoning for complex prompts (Pro only)

Models

Nano Banana 2 (Default - Recommended)

gemini-3.1-flash-image-preview - Nano Banana 2 ⭐ NEW DEFAULT

Best for: General use, fast generation with near-Pro quality
Quality: High (95% parity with Pro)
Speed: 3-5x faster than previous Flash
Cost: ~$0.045/image (512px) to ~$0.151/image (4K); ~25-30% cheaper than Pro
Resolution: 512px to 4K with expanded aspect ratios
Text rendering: 100+ languages with proper formatting
Character consistency: Up to 5 characters and 14 objects
Reasoning levels: Minimal/High/Dynamic for complex prompts
Web grounding: Real-time data integration for brands, landmarks, recent events
Status: Preview (Feb 2026)

Nano Banana Flash (Previous Default)

gemini-2.5-flash-image - Nano Banana Flash

Best for: Speed, high-volume generation, rapid prototyping
Quality: High
Context: 65,536 input / 32,768 output tokens
Speed: Fast (~5-10s per image)
Cost: ~$1/1M input tokens
Aspect Ratios: All 10 supported
Image Sizes: 1K, 2K, 4K
Status: Stable (Oct 2025)

Nano Banana Pro (Quality and Reasoning)

gemini-3-pro-image-preview - Nano Banana Pro

Best for: Professional assets, 4K text rendering, complex prompts
Quality: Ultra (with advanced reasoning)
Context: 65,536 input / 32,768 output tokens
Speed: Medium
Cost: ~$2/1M text input, $0.134/image (resolution-dependent)
Multi-Image: Up to 14 reference images (up to 6 objects and 5 humans recommended)
Features: Thinking mode, Google Search grounding
Status: Preview (Nov 2025)

Imagen 4 (Alternative - Production)

imagen-4.0-generate-001 - Standard quality, balanced performance

Best for: Production workflows, marketing assets
Quality: High
Speed: Medium (~5-10s per image)
Cost: ~$0.02/image (estimated)
Output: 1-4 images per request
Resolution: 1K or 2K
Updated: June 2025

imagen-4.0-ultra-generate-001 - Maximum quality

Best for: Final production, marketing assets, detailed artwork
Quality: Ultra (highest available)
Speed: Slow (~15-25s per image)
Cost: ~$0.04/image (estimated)
Output: 1-4 images per request
Resolution: 2K preferred
Updated: June 2025

imagen-4.0-fast-generate-001 - Fastest generation

Best for: Rapid iteration, bulk generation, real-time use
Quality: Good
Speed: Fast (~2-5s per image)
Cost: ~$0.01/image (estimated)
Output: 1-4 images per request
Resolution: 1K
Updated: June 2025

Legacy Models

gemini-2.0-flash-preview-image-generation - Legacy

Status: Deprecated (use Nano Banana or Imagen 4 instead)
Context: 32,768 input / 8,192 output tokens

Model Comparison

Model	Quality	Speed	Cost	Best For
gemini-3.1-flash-image-preview	⭐⭐⭐⭐½	🚀🚀 Fastest	💵 Low	NEW DEFAULT - General use
gemini-2.5-flash-image	⭐⭐⭐⭐	🚀 Fast	💵 Low	Previous default, stable
gemini-3-pro-image-preview	⭐⭐⭐⭐⭐	💡 Medium	💰 Medium	Text/reasoning
imagen-4.0-generate-001	⭐⭐⭐⭐	💡 Medium	💰 Medium	Production (alternative)
imagen-4.0-ultra-generate-001	⭐⭐⭐⭐⭐	🐢 Slow	💰💰 High	Marketing assets
imagen-4.0-fast-generate-001	⭐⭐⭐	🚀 Fast	💵 Low	Bulk generation

Selection Guide:

Default/General: Use gemini-3.1-flash-image-preview (fastest, near-Pro quality, web grounding)
Stable Alternative: Use gemini-2.5-flash-image (previous default, fully stable)
Production Quality: Use imagen-4.0-generate-001 (alternative for final assets)
Marketing/Ultra Quality: Use imagen-4.0-ultra-generate-001 for maximum quality
Text-Heavy Images: Use gemini-3-pro-image-preview for 4K text rendering
Complex Prompts with Reasoning: Use gemini-3-pro-image-preview with Thinking mode
Real-time Data Integration: Use gemini-3.1-flash-image-preview or gemini-3-pro-image-preview with Search grounding
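
The selection guide above can be written as a small lookup. This is a minimal sketch, not part of any SDK: `pick_model` and its flag names are hypothetical, and the model IDs are simply the ones listed in this document.

```python
def pick_model(
    *,
    needs_4k_text: bool = False,
    needs_thinking: bool = False,
    max_quality: bool = False,
    needs_stable: bool = False,
) -> str:
    """Return a model ID following the selection guide in this document."""
    # Text-heavy images and complex prompts go to Nano Banana Pro.
    if needs_4k_text or needs_thinking:
        return "gemini-3-pro-image-preview"
    # Marketing/ultra quality goes to Imagen 4 Ultra.
    if max_quality:
        return "imagen-4.0-ultra-generate-001"
    # A fully stable (non-preview) Gemini model.
    if needs_stable:
        return "gemini-2.5-flash-image"
    # Default: Nano Banana 2.
    return "gemini-3.1-flash-image-preview"


print(pick_model())
print(pick_model(needs_4k_text=True))
```

The order of the checks is a design choice: requirements that only one model satisfies are tested first, so the default is reached only when nothing special is requested.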

Quick Start

Basic Generation (Default - Nano Banana 2)

from google import genai
from google.genai import types
import os

client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

# Nano Banana 2 - NEW DEFAULT (fastest, near-Pro quality, web grounding)
response = client.models.generate_content(
    model='gemini-3.1-flash-image-preview',
    contents='A serene mountain landscape at sunset with snow-capped peaks',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='2K'  # 512px, 1K, 2K, 4K - uppercase K required
        )
    )
)

# Save images
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'output-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)
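
A response can mix text parts and image parts, so the save loop is worth wrapping in a helper. The sketch below uses only the standard library and assumes each part exposes `inline_data.data` (bytes) and `inline_data.mime_type`, as in the loop above; `save_image_parts` and the `EXTENSIONS` mapping are illustrative names, not SDK functions.

```python
from pathlib import Path

# Illustrative mapping; extend it if other image types are returned.
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def save_image_parts(parts, out_dir=".", stem="output"):
    """Write every part that carries inline image bytes; return the saved paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, part in enumerate(parts):
        blob = getattr(part, "inline_data", None)
        if blob is None or not blob.data:
            continue  # text parts carry no inline_data
        mime = getattr(blob, "mime_type", None)
        ext = EXTENSIONS.get(mime, ".png")  # fall back to .png when unknown
        path = out / f"{stem}-{i}{ext}"
        path.write_bytes(blob.data)
        saved.append(path)
    return saved
```

Typical use would be `save_image_parts(response.candidates[0].content.parts, 'out')`. The index in the filename is the part's position in the response, so a leading text part means the first image is saved as `output-1`.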

Alternative - Imagen 4 (Production Quality)

# Imagen 4 Standard - alternative for production workflows
response = client.models.generate_images(
    model='imagen-4.0-generate-001',
    prompt='Professional product photography of smartphone',
    config=types.GenerateImagesConfig(
        numberOfImages=1,
        aspectRatio='16:9',
        imageSize='1K'
    )
)

# Save Imagen 4 output
for i, generated_image in enumerate(response.generated_images):
    with open(f'output-{i}.png', 'wb') as f:
        f.write(generated_image.image.image_bytes)

Imagen 4 Quality Variants

# Ultra quality (marketing assets)
response = client.models.generate_images(
    model='imagen-4.0-ultra-generate-001',
    prompt='Professional product photography of smartphone',
    config=types.GenerateImagesConfig(
        numberOfImages=1,
        imageSize='2K'  # Use 2K for ultra (Standard/Ultra only)
    )
)

# Fast generation (bulk)
# Note: Fast model doesn't support imageSize parameter
response = client.models.generate_images(
    model='imagen-4.0-fast-generate-001',
    prompt='Quick concept sketch of robot character',
    config=types.GenerateImagesConfig(
        numberOfImages=4,  # Generate multiple variants (default: 4)
        aspectRatio='1:1'
    )
)

Nano Banana Pro (4K Text, Reasoning)

# Nano Banana Pro - for text rendering and complex prompts
response = client.models.generate_content(
    model='gemini-3-pro-image-preview',
    contents='A futuristic cityscape with neon lights',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='4K'  # 4K text rendering
        )
    )
)

# Nano Banana Pro - with Thinking mode and Search grounding
response = client.models.generate_content(
    model='gemini-3-pro-image-preview',
    contents='Current weather in Tokyo visualized as artistic infographic',
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],  # Both text and image
        image_config=types.ImageConfig(
            aspect_ratio='1:1',
            image_size='4K'
        ),
        # Tools go inside the config; generate_content() has no tools argument
        tools=[types.Tool(google_search=types.GoogleSearch())]  # Enable search grounding
    )
)

# Save from content parts
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'output-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)

Multi-Image Reference (Nano Banana Pro)

from PIL import Image

# Up to 14 reference images (6 objects + 5 humans recommended)
img1 = Image.open('style_ref.png')
img2 = Image.open('color_ref.png')
img3 = Image.open('composition_ref.png')

response = client.models.generate_content(
    model='gemini-3-pro-image-preview',
    contents=[
        'Blend these reference styles into a cohesive hero image for a tech product',
        img1, img2, img3
    ],
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='4K'
        )
    )
)

Nano Banana 2 with Web Grounding

# Nano Banana 2 - real-time web integration for brands, landmarks, events
response = client.models.generate_content(
    model='gemini-3.1-flash-image-preview',
    contents='Current Apple Vision Pro product shot with accurate branding',
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='2K'
        ),
        # Grounding is only active when the search tool is passed in the config
        tools=[types.Tool(google_search=types.GoogleSearch())]
    )
)

Nano Banana 2 with Reasoning Levels

# Use reasoning levels for complex prompts
response = client.models.generate_content(
    model='gemini-3.1-flash-image-preview',
    contents='A photorealistic scene of 5 diverse characters sitting around a campfire, each with distinct clothing and accessories, consistent lighting from the fire',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='4K'
        )
    )
)
# Nano Banana 2 auto-selects reasoning level (Minimal/High/Dynamic)
# For explicit control, check API docs for reasoning_level parameter

Multi-Turn Refinement Chat

# Conversational image refinement (works with any Nano Banana model)
chat = client.chats.create(
    model='gemini-3.1-flash-image-preview',  # or gemini-2.5-flash-image
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

# Initial generation
response1 = chat.send_message('Create a minimalist logo for a coffee brand called "Brew"')

# Iterative refinement
response2 = chat.send_message('Make the text bolder and add steam rising from the cup')
response3 = chat.send_message('Change the color palette to warm earth tones')

API Differences

Imagen 4 vs Nano Banana (Gemini Native)

Feature	Imagen 4	Nano Banana (Gemini)
Method	`generate_images()`	`generate_content()`
Config	`GenerateImagesConfig`	`GenerateContentConfig`
Prompt param	`prompt` (string)	`contents` (string/list)
Image count	`numberOfImages` (camelCase)	N/A (single per request)
Aspect ratio	`aspectRatio` (camelCase)	`aspect_ratio` (snake_case)
Size	`imageSize`	`image_size`
Response	`generated_images[i].image.image_bytes`	`candidates[0].content.parts[i].inline_data.data`
Multi-image input	❌	✅ Up to 14 references
Multi-turn chat	❌	✅ Conversational
Search grounding	❌	✅ (Nano Banana 2 and Pro)
Thinking mode	❌	✅ (Pro only)
Text rendering	Limited	4K (Pro)

Imagen 4 uses generate_images():

response = client.models.generate_images(
    model='imagen-4.0-generate-001',
    prompt='...',
    config=types.GenerateImagesConfig(
        numberOfImages=1,      # camelCase
        aspectRatio='16:9',    # camelCase
        imageSize='1K'         # Standard/Ultra only
    )
)
# Access: response.generated_images[0].image.image_bytes

Nano Banana uses generate_content():

response = client.models.generate_content(
    model='gemini-3.1-flash-image-preview',  # or gemini-2.5-flash-image, gemini-3-pro-image-preview
    contents='...',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(
            aspect_ratio='16:9',        # snake_case
            image_size='2K'             # 1K, 2K, 4K - uppercase K
        )
    )
)
# Access: response.candidates[0].content.parts[0].inline_data.data

Critical Notes:

response_modalities values MUST be uppercase: 'IMAGE', 'TEXT'
image_size value MUST have uppercase K: '1K', '2K', '4K'
Imagen 4 Fast model doesn't support imageSize parameter
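
Because the case rules above are easy to get wrong, values can be normalized before a config is built. This is a sketch under the assumption that only the values named in the notes are valid (the 512px size mentioned for Nano Banana 2 is deliberately left out); both function names are hypothetical.

```python
VALID_MODALITIES = {"IMAGE", "TEXT"}
VALID_SIZES = {"1K", "2K", "4K"}


def normalize_modalities(modalities):
    """Uppercase each modality and reject anything other than IMAGE/TEXT."""
    result = [m.strip().upper() for m in modalities]
    unknown = [m for m in result if m not in VALID_MODALITIES]
    if unknown:
        raise ValueError(f"Unsupported response modality: {unknown}")
    return result


def normalize_image_size(size):
    """Return the size with an uppercase K, for example '2k' -> '2K'."""
    normalized = size.strip().upper()
    if normalized not in VALID_SIZES:
        raise ValueError(f"Unsupported image_size: {size!r}")
    return normalized


print(normalize_modalities(["image", "Text"]))
print(normalize_image_size("2k"))
```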

Aspect Ratios

Ratio	Resolution (1K)	Use Case	Token Cost
1:1	1024×1024	Social media, avatars, icons	1290
2:3	682×1024	Vertical portraits	1290
3:2	1024×682	Horizontal portraits	1290
3:4	768×1024	Vertical posters	1290
4:3	1024×768	Traditional media	1290
4:5	819×1024	Instagram portrait	1290
5:4	1024×819	Horizontal photos	1290
9:16	576×1024	Mobile/stories/reels	1290
16:9	1024×576	Landscapes, banners, YouTube	1290
21:9	1024×438	Ultrawide/cinematic	1290

All ratios cost the same: 1,290 tokens per image (Gemini models).
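
The 1K resolutions in the table follow one rule: the longer side is 1024 pixels and the shorter side is scaled by the ratio and rounded down. The sketch below reproduces the table and picks the nearest supported ratio for an arbitrary source size. The function names are illustrative, and the rounding rule is inferred from the table rather than taken from API documentation.

```python
SUPPORTED_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]


def dimensions_1k(ratio: str, long_side: int = 1024) -> tuple[int, int]:
    """Return (width, height) with the longer side fixed at long_side pixels."""
    w, h = (int(n) for n in ratio.split(":"))
    if w >= h:
        return long_side, long_side * h // w  # landscape or square
    return long_side * w // h, long_side      # portrait


def closest_ratio(width: int, height: int) -> str:
    """Pick the supported ratio nearest to an arbitrary width/height."""
    target = width / height

    def distance(ratio: str) -> float:
        w, h = (int(n) for n in ratio.split(":"))
        return abs(w / h - target)

    return min(SUPPORTED_RATIOS, key=distance)


for ratio in SUPPORTED_RATIOS:
    print(ratio, dimensions_1k(ratio))
print(closest_ratio(1920, 1080))
```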

Response Modalities

Image Only

config = types.GenerateContentConfig(
    response_modalities=['IMAGE'],
    image_config=types.ImageConfig(aspect_ratio='1:1')
)

Text Only (No Image)

config = types.GenerateContentConfig(
    response_modalities=['TEXT']
)
# Returns text description instead of generating image

Both Image and Text

config = types.GenerateContentConfig(
    response_modalities=['IMAGE', 'TEXT'],
    image_config=types.ImageConfig(aspect_ratio='16:9')
)
# Returns both generated image and description

Image Editing

Modify Existing Image

import PIL.Image

# Load original
img = PIL.Image.open('original.png')

# Edit with instructions
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a red balloon floating in the sky',
        img
    ],
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(
            aspect_ratio='16:9'  # Must be nested in image_config
        )
    )
)

Style Transfer

img = PIL.Image.open('photo.jpg')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Transform this into an oil painting style',
        img
    ]
)

Object Addition/Removal

# Add object
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a vintage car parked on the street',
        img
    ]
)

# Remove object
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Remove the person on the left side',
        img
    ]
)

Multi-Image Composition

Combine Multiple Images

img1 = PIL.Image.open('background.png')
img2 = PIL.Image.open('foreground.png')
img3 = PIL.Image.open('overlay.png')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Combine these images into a cohesive scene',
        img1,
        img2,
        img3
    ],
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)

Note: Recommended maximum 3 input images for best results.

Prompt Engineering

Core Principle: Narrative > Keywords

Nano Banana prompting: Write like you're briefing a photographer, not providing SEO keywords. Narrative paragraphs outperform keyword lists.

❌ Bad: "cat, 4k, masterpiece, trending, professional, ultra detailed, cinematic"
✅ Good: "A fluffy orange tabby cat with green eyes lounging on a sun-drenched windowsill. Soft morning light creates a warm glow. Shot with a 50mm lens at f/1.8 for shallow depth of field. Natural lighting, documentary photography style."

Effective Prompt Structure

Three key elements:

Subject: What to generate (be specific)
Context: Environmental setting (lighting, location, time)
Style: Artistic treatment (photography, illustration, etc.)

Quality Modifiers

Technical terms:

"4K", "8K", "high resolution"
"HDR", "high dynamic range"
"professional photography"
"studio lighting"
"ultra detailed"

Camera settings:

"35mm lens", "50mm lens"
"shallow depth of field"
"wide angle shot"
"macro photography"
"golden hour lighting"

Style Keywords

Art styles:

"oil painting", "watercolor", "sketch"
"digital art", "concept art"
"photorealistic", "hyperrealistic"
"minimalist", "abstract"
"cyberpunk", "steampunk", "fantasy"

Mood and atmosphere:

"dramatic lighting", "soft lighting"
"moody", "bright and cheerful"
"mysterious", "whimsical"
"dark and gritty", "pastel colors"

Subject Description

Be specific:

❌ "A cat"
✅ "A fluffy orange tabby cat with green eyes"

Add context:

❌ "A building"
✅ "A modern glass skyscraper reflecting sunset clouds"

Include details:

❌ "A person"
✅ "A young woman in a red dress holding an umbrella"

Composition and Framing

Camera angles:

"bird's eye view", "aerial shot"
"low angle", "high angle"
"close-up", "wide shot"
"centered composition"
"rule of thirds"

Perspective:

"first person view"
"third person perspective"
"isometric view"
"forced perspective"

Text in Images

Limitations:

Maximum 25 characters total for optimal results
Up to 3 distinct text phrases
For 4K text rendering, use gemini-3-pro-image-preview

Text prompt template:

Image with text "[EXACT TEXT]" in [font style].
Font: [style description].
Color: [hex code like #FF5733].
Position: [top/center/bottom].
Background: [description].
Context: [poster/sign/label].

Example:

response = client.models.generate_content(
    model='gemini-3-pro-image-preview',  # Use Pro for better text
    contents='''
    Create a vintage travel poster with text "EXPLORE TOKYO" at the top.
    Font: Bold retro sans-serif, slightly condensed.
    Color: #F5E6D3 (cream white).
    Position: Top third of image.
    Background: Stylized Tokyo skyline with Mt. Fuji, sunset colors.
    Style: 1950s travel poster aesthetic, muted warm colors.
    '''
)
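
The limits and the template above can be enforced before a request is sent. This is a sketch only: `build_text_prompt` is a hypothetical helper, and counting spaces toward the 25-character limit is an assumption, since the limit above does not say how characters are counted.

```python
import re

MAX_TEXT_CHARS = 25  # "Maximum 25 characters total" (spaces counted, by assumption)
MAX_PHRASES = 3      # "Up to 3 distinct text phrases"
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def build_text_prompt(phrases, font_style, font_description, color,
                      position, background, context):
    """Fill the text prompt template after checking the documented limits."""
    if not 1 <= len(phrases) <= MAX_PHRASES:
        raise ValueError(f"Expected 1 to {MAX_PHRASES} phrases, got {len(phrases)}")
    total = sum(len(p) for p in phrases)
    if total > MAX_TEXT_CHARS:
        raise ValueError(f"Text totals {total} characters; limit is {MAX_TEXT_CHARS}")
    if not HEX_COLOR.fullmatch(color):
        raise ValueError(f"Color must be a hex code like #FF5733, got {color!r}")
    quoted = " and ".join(f'"{p}"' for p in phrases)
    return "\n".join([
        f"Image with text {quoted} in {font_style}.",
        f"Font: {font_description}.",
        f"Color: {color}.",
        f"Position: {position}.",
        f"Background: {background}.",
        f"Context: {context}.",
    ])


prompt = build_text_prompt(
    phrases=["EXPLORE TOKYO"],
    font_style="bold retro sans-serif",
    font_description="slightly condensed, all caps",
    color="#F5E6D3",
    position="top",
    background="stylized Tokyo skyline with Mt. Fuji at sunset",
    context="poster",
)
print(prompt)
```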

Font keywords:

"bold sans-serif", "handwritten script", "vintage letterpress"
"modern minimalist", "art deco", "neon sign"

Nano Banana Prompt Techniques

Technique	Example	Purpose
ALL CAPS emphasis	`The logo MUST be centered`	Force attention to critical requirements
Hex colors	`#9F2B68` instead of "dark magenta"	Exact color control
Negative constraints	`NEVER include text/watermarks. DO NOT add labels.`	Explicit exclusions
Realism trigger	`Natural lighting, DOF. Captured with Canon EOS 90D DSLR.`	Photography authenticity
Structured edits	`Make ALL edits: - [1] - [2] - [3]`	Multi-step changes
Complex logic	`Kittens MUST have heterochromatic eyes matching fur colors`	Precise conditions

Prompt Templates:

Photorealistic:

A [subject] in [location], [lens] lens. [Lighting] creates [mood]. [Details].
[Camera angle]. Professional photography, natural lighting.

Illustration:

[Art style] illustration of [subject]. [Color palette]. [Line style].
[Background]. [Mood].

Product:

[Product] on [surface]. Materials: [finish]. Lighting: [setup].
Camera: [angle]. Background: [type]. Style: [commercial/lifestyle].
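
The bracketed placeholders in these templates can be filled programmatically, which makes a forgotten field fail loudly instead of sending a literal `[Mood]` to the model. A minimal sketch, with `fill_template` as an illustrative helper; the two templates are copied from above and joined onto one line.

```python
import re

TEMPLATES = {
    "illustration": (
        "[Art style] illustration of [subject]. [Color palette]. [Line style]. "
        "[Background]. [Mood]."
    ),
    "product": (
        "[Product] on [surface]. Materials: [finish]. Lighting: [setup]. "
        "Camera: [angle]. Background: [type]. Style: [commercial/lifestyle]."
    ),
}

PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_template(name, values):
    """Replace each [placeholder] with values[placeholder.lower()]."""
    template = TEMPLATES[name]
    needed = [key.lower() for key in PLACEHOLDER.findall(template)]
    missing = [key for key in needed if key not in values]
    if missing:
        raise KeyError(f"Missing template values: {missing}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1).lower()], template)


illustration = fill_template("illustration", {
    "art style": "Flat vector",
    "subject": "a lighthouse",
    "color palette": "Muted teal and coral",
    "line style": "Thin uniform outlines",
    "background": "Plain off-white background",
    "mood": "Calm",
})
print(illustration)
```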

Advanced Techniques

Iterative Refinement

# Initial generation
response1 = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A futuristic city skyline'
)

# Save first version
with open('v1.png', 'wb') as f:
    f.write(response1.candidates[0].content.parts[0].inline_data.data)

# Refine
img = PIL.Image.open('v1.png')
response2 = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add flying vehicles and neon signs',
        img
    ]
)

Negative Prompts (Indirect)

# Instead of "no blur", be specific about what you want
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A crystal clear, sharp photograph of a diamond ring with perfect focus and high detail'
)

Consistent Style Across Images

base_prompt = "Digital art, vibrant colors, cel-shaded style, clean lines"

prompts = [
    f"{base_prompt}, a warrior character",
    f"{base_prompt}, a mage character",
    f"{base_prompt}, a rogue character"
]

for i, prompt in enumerate(prompts):
    response = client.models.generate_content(
        model='gemini-2.5-flash-image',
        contents=prompt
    )
    # Save each character

Safety Settings

Configure Safety Filters

config = types.GenerateContentConfig(
    response_modalities=['IMAGE'],
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        ),
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ]
)

Available Categories

HARM_CATEGORY_HATE_SPEECH
HARM_CATEGORY_DANGEROUS_CONTENT
HARM_CATEGORY_HARASSMENT
HARM_CATEGORY_SEXUALLY_EXPLICIT

Thresholds

BLOCK_NONE: No blocking
BLOCK_LOW_AND_ABOVE: Block low probability and above
BLOCK_MEDIUM_AND_ABOVE: Block medium and above (default)
BLOCK_ONLY_HIGH: Block only high probability
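
Applying one threshold to every category, with a few per-category exceptions, can be scripted. The sketch below builds plain dictionaries from the category and threshold names listed above. Passing dictionaries instead of `types.SafetySetting` objects is an assumption about the SDK; if it does not hold, the same values can be fed to `types.SafetySetting(...)`. The helper name is hypothetical.

```python
CATEGORIES = [
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
]
THRESHOLDS = {
    "BLOCK_NONE",
    "BLOCK_LOW_AND_ABOVE",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_ONLY_HIGH",
}


def uniform_safety_settings(threshold="BLOCK_MEDIUM_AND_ABOVE", overrides=None):
    """Return one {'category', 'threshold'} dict per category listed above."""
    overrides = overrides or {}
    unknown = [c for c in overrides if c not in CATEGORIES]
    if unknown:
        raise ValueError(f"Unknown categories: {unknown}")
    settings = []
    for category in CATEGORIES:
        level = overrides.get(category, threshold)
        if level not in THRESHOLDS:
            raise ValueError(f"Unknown threshold: {level!r}")
        settings.append({"category": category, "threshold": level})
    return settings


settings = uniform_safety_settings(
    overrides={"HARM_CATEGORY_HARASSMENT": "BLOCK_ONLY_HIGH"}
)
for entry in settings:
    print(entry)
```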

Common Use Cases

1. Marketing Assets

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Professional product photography:
    - Sleek smartphone on minimalist white surface
    - Dramatic side lighting creating subtle shadows
    - Shallow depth of field, crisp focus
    - Clean, modern aesthetic
    - 4K quality
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(aspect_ratio='4:3')
    )
)

2. Concept Art

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Fantasy concept art:
    - Ancient floating islands connected by chains
    - Waterfalls cascading into clouds below
    - Magical crystals glowing on the islands
    - Epic scale, dramatic lighting
    - Detailed digital painting style
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)

3. Social Media Graphics

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Instagram post design:
    - Pastel gradient background (pink to blue)
    - Motivational quote layout
    - Modern minimalist style
    - Clean typography
    - Mobile-friendly composition
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(aspect_ratio='1:1')
    )
)

4. Illustration

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Children's book illustration:
    - Friendly cartoon dragon reading a book
    - Bright, cheerful colors
    - Soft, rounded shapes
    - Whimsical forest background
    - Warm, inviting atmosphere
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(aspect_ratio='4:3')
    )
)

5. UI/UX Mockups

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Modern mobile app interface:
    - Clean dashboard design
    - Card-based layout
    - Soft shadows and gradients
    - Contemporary color scheme (blue and white)
    - Professional fintech aesthetic
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(aspect_ratio='9:16')
    )
)

Best Practices

Prompt Quality

Be specific: More detail = better results
Order matters: Most important elements first
Use examples: Reference known styles or artists
Avoid contradictions: Don't ask for opposing styles
Test and iterate: Refine prompts based on results

File Management

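import time

# Save with descriptive names
aspect_ratio = '16:9'  # whichever ratio was requested
timestamp = int(time.time())
# ':' is not allowed in Windows filenames, so write 16:9 as 16x9
filename = f"generated_{timestamp}_{aspect_ratio.replace(':', 'x')}.png"

with open(filename, 'wb') as f:
    f.write(image_data)  # image_data: bytes taken from part.inline_data.data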

Cost Optimization

Nano Banana 2 pricing (per image):

Resolution	Cost/Image	Batch (50% off)
512px	$0.045	$0.023
1K	$0.067	$0.034
2K	$0.101	$0.051
4K	$0.151	$0.076

Flash Image token costs:

1 image: 1,290 tokens = $0.00129 (Flash Image at $1/1M)
10 images: 12,900 tokens = $0.0129
100 images: 129,000 tokens = $0.129
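
The arithmetic behind both pricing views can be checked with a short script. This sketch uses only the figures stated above (the per-image table, the 50% batch discount rounded to three decimals, and 1,290 tokens per image); the function names are illustrative, and the rate passed to `token_cost` is whatever per-million-token price applies.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-image prices in USD, copied from the Nano Banana 2 table above.
NB2_PRICE = {
    "512px": Decimal("0.045"),
    "1K": Decimal("0.067"),
    "2K": Decimal("0.101"),
    "4K": Decimal("0.151"),
}
TOKENS_PER_IMAGE = 1290


def nb2_cost(resolution, images, batch=False):
    """Total USD for `images` Nano Banana 2 images at one resolution."""
    unit = NB2_PRICE[resolution]
    if batch:
        # Half price, rounded to three decimals as in the table's batch column.
        unit = (unit / 2).quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
    return unit * images


def token_cost(images, usd_per_million_tokens):
    """Token-based cost: images x 1,290 tokens at the given rate."""
    tokens = Decimal(images) * TOKENS_PER_IMAGE
    return tokens * Decimal(str(usd_per_million_tokens)) / Decimal(1_000_000)


print(nb2_cost("2K", 100))              # standard pricing
print(nb2_cost("2K", 100, batch=True))  # batch pricing
print(token_cost(100, 1))               # 100 images at $1 per 1M tokens
```

`Decimal` is used instead of `float` so the results match the table to the last digit.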

Strategies:

Generate fewer iterations
Use text modality first to validate concept
Batch similar requests
Cache prompts for consistent style

Error Handling

Safety Filter Blocking

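from google.genai import errors

try:
    response = client.models.generate_content(
        model='gemini-2.5-flash-image',
        contents=prompt
    )
except errors.APIError as e:
    # Request-level failure (invalid argument, quota, server error)
    print(f"API error {e.code}: {e.message}")
else:
    # A blocked prompt returns a response with prompt_feedback set
    feedback = response.prompt_feedback
    if feedback and feedback.block_reason:
        print(f"Blocked: {feedback.block_reason}")
        # Modify prompt and retry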

Token Limit Exceeded

# Keep prompts concise (character count is only a rough proxy for token count)
if len(prompt) > 1000:
    # Truncate or simplify
    prompt = prompt[:1000]
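
Slicing at a fixed index can cut the prompt mid-word. A small sketch of a gentler truncation that collapses whitespace and cuts at the last word boundary within the limit; `truncate_prompt` is an illustrative helper and still measures characters, not tokens.

```python
def truncate_prompt(prompt: str, max_chars: int = 1000) -> str:
    """Shorten a prompt to at most max_chars, cutting at a word boundary."""
    prompt = " ".join(prompt.split())  # collapse runs of whitespace
    if len(prompt) <= max_chars:
        return prompt
    # Look one character past the limit so a word ending exactly at the
    # limit is kept whole.
    head, sep, _ = prompt[: max_chars + 1].rpartition(" ")
    # With no space to cut at, fall back to a hard slice.
    return head if sep else prompt[:max_chars]


print(truncate_prompt("A fluffy orange tabby cat on a windowsill", 20))
```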

Limitations

Imagen 4 Constraints

Language: English prompts only
Prompt length: Maximum 480 tokens
Output: 1-4 images per request
Watermark: All images include SynthID watermark
Fast model: No imageSize parameter support (fixed resolution)
Text rendering: Limited to ~25 characters for optimal results
Regional restrictions: Child images restricted in EEA, CH, UK
Cannot replicate: Specific people or copyrighted characters

Nano Banana (Gemini) Constraints

Language: Prompts work best in English
Context: 32K token window
Multi-image: Standard models ~3-5 refs; Pro up to 14 refs
Text rendering: Standard limited; Pro supports 4K text
Watermark: All images include SynthID watermark
Case sensitivity: response_modalities must be uppercase ('IMAGE', 'TEXT')
Size format: image_size must have uppercase K ('1K', '2K', '4K')

General Limitations

Maximum 14 input images for composition (Pro only)
No video or animation generation (use Veo for video)
No real-time generation

Troubleshooting

aspect_ratio Parameter Error

Error: Extra inputs are not permitted [type=extra_forbidden, input_value='1:1', input_type=str]

Cause: The aspect_ratio parameter must be nested inside an image_config object, not passed directly to GenerateContentConfig.

Incorrect Usage:

# ❌ This will fail
config = types.GenerateContentConfig(
    response_modalities=['IMAGE'],
    aspect_ratio='16:9'  # Wrong - not a direct parameter
)

Correct Usage:

# ✅ Correct implementation
config = types.GenerateContentConfig(
    response_modalities=['IMAGE'],  # Note: all uppercase
    image_config=types.ImageConfig(
        aspect_ratio='16:9'
    )
)

Response Modality Case Sensitivity

The response_modalities parameter expects uppercase values:

✅ Correct: ['IMAGE'], ['TEXT'], ['IMAGE', 'TEXT']
❌ Wrong: ['image'], ['text'], ['Image']

Image Size Parameter Not Supported

Error: 400 INVALID_ARGUMENT

Cause: The image_size parameter in ImageConfig is not supported by all Nano Banana models.

Solution: Don't pass image_size unless explicitly needed. The API uses sensible defaults.

# ✅ Works - no image_size
config=types.GenerateContentConfig(
    response_modalities=['IMAGE'],
    image_config=types.ImageConfig(
        aspect_ratio='16:9'  # Only aspect_ratio
    )
)

# ⚠️ May fail - with image_size (model-dependent)
config=types.GenerateContentConfig(
    response_modalities=['IMAGE'],
    image_config=types.ImageConfig(
        aspect_ratio='16:9',
        image_size='2K'  # Not supported by all models
    )
)

Multi-Image Reference Issues

Problem: Poor composition with multiple reference images

Solutions:

Limit to 3-5 reference images for standard models
Use Pro model for up to 14 references
Collage multiple style refs into single image
Provide clear textual descriptions of how to blend styles
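
The reference-count guidance can be checked before a request is built. In this sketch the standard-model limit of 5 is the upper end of the "3-5" guidance above and the Pro limit of 14 is the documented maximum; `check_references` and the constants are hypothetical names.

```python
PRO_MODEL = "gemini-3-pro-image-preview"
MAX_REFS_PRO = 14
MAX_REFS_STANDARD = 5  # upper end of the "3-5" guidance for standard models


def reference_limit(model: str) -> int:
    """Return the reference-image limit this document gives for a model."""
    return MAX_REFS_PRO if model == PRO_MODEL else MAX_REFS_STANDARD


def check_references(model, refs):
    """Return refs as a list, or raise if there are too many for the model."""
    limit = reference_limit(model)
    if len(refs) > limit:
        raise ValueError(
            f"{model} is limited to {limit} reference images here; got {len(refs)}. "
            f"Switch to {PRO_MODEL} or collage several references into one image."
        )
    return list(refs)


print(reference_limit("gemini-2.5-flash-image"))
print(reference_limit(PRO_MODEL))
```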

Current: Image Generation

Related Capabilities:

Image Understanding - Analyzing and editing reference images
Video Generation - Creating animated video content
Audio Processing - Text-to-speech for multimedia

Back to: AI Multimodal Skill
