# Image Generation Reference

Comprehensive guide for image creation, editing, and composition using Imagen 4 and Gemini models ("Nano Banana").

> **Nano Banana** = Google's internal name for native image generation in the Gemini API. Three variants:
> - **Nano Banana 2** (`gemini-3.1-flash-image-preview`) - NEW DEFAULT. 3-5x faster, 95% Pro quality, web grounding, 100+ language text rendering, character consistency (up to 5 characters/14 objects). Released Feb 2026.
> - **Nano Banana Flash** (`gemini-2.5-flash-image`) - Previous default, still stable.
> - **Nano Banana Pro** (`gemini-3-pro-image-preview`) - Quality with reasoning, 4K text.

## Core Capabilities

- **Text-to-Image**: Generate images from text prompts
- **Image Editing**: Modify existing images with text instructions
- **Multi-Image Composition**: Combine up to 14 reference images (Pro model)
- **Iterative Refinement**: Multi-turn conversational refinement
- **Aspect Ratios**: 10 formats (1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9)
- **Image Sizes**: 1K, 2K, 4K (uppercase K required)
- **Quality Variants**: Standard/Ultra/Fast (Imagen 4) for different needs
- **Text in Images**: Up to 25 characters for optimal results (4K text in Pro)
- **Search Grounding**: Real-time data integration (Nano Banana 2 and Pro)
- **Thinking Mode**: Advanced reasoning for complex prompts (Pro only)

## Models

### Nano Banana 2 (Default - Recommended)

**gemini-3.1-flash-image-preview** - Nano Banana 2 ⭐ NEW DEFAULT
- Best for: General use, fast generation with near-Pro quality
- Quality: High (95% parity with Pro)
- Speed: 3-5x faster than previous Flash
- Cost: ~$0.045/image (512px) to ~$0.151/image (4K); ~25-30% cheaper than Pro
- Resolution: 512px to 4K with expanded aspect ratios
- Text rendering: 100+ languages with proper formatting
- Character consistency: Up to 5 characters and 14 objects
- Reasoning levels: Minimal/High/Dynamic for complex prompts
- Web grounding: Real-time data integration for brands, landmarks, recent events
- Status: Preview (Feb 2026)

### Nano Banana Flash (Previous Default)

**gemini-2.5-flash-image** - Nano Banana Flash
- Best for: Speed, high-volume generation, rapid prototyping
- Quality: High
- Context: 65,536 input / 32,768 output tokens
- Speed: Fast (~5-10s per image)
- Cost: ~$1/1M input tokens
- Aspect Ratios: All 10 supported
- Image Sizes: 1K, 2K, 4K
- Status: Stable (Oct 2025)

### Nano Banana Pro (Quality and Reasoning)

**gemini-3-pro-image-preview** - Nano Banana Pro
- Best for: Professional assets, 4K text rendering, complex prompts
- Quality: Ultra (with advanced reasoning)
- Context: 65,536 input / 32,768 output tokens
- Speed: Medium
- Cost: ~$2/1M text input, $0.134/image (resolution-dependent)
- Multi-Image: Up to 14 reference images in total (recommended mix: up to 6 objects and 5 humans)
- Features: Thinking mode, Google Search grounding
- Status: Preview (Nov 2025)

### Imagen 4 (Alternative - Production)

**imagen-4.0-generate-001** - Standard quality, balanced performance
- Best for: Production workflows, marketing assets
- Quality: High
- Speed: Medium (~5-10s per image)
- Cost: ~$0.02/image (estimated)
- Output: 1-4 images per request
- Resolution: 1K or 2K
- Updated: June 2025

**imagen-4.0-ultra-generate-001** - Maximum quality
- Best for: Final production, marketing assets, detailed artwork
- Quality: Ultra (highest available)
- Speed: Slow (~15-25s per image)
- Cost: ~$0.04/image (estimated)
- Output: 1-4 images per request
- Resolution: 2K preferred
- Updated: June 2025

**imagen-4.0-fast-generate-001** - Fastest generation
- Best for: Rapid iteration, bulk generation, real-time use
- Quality: Good
- Speed: Fast (~2-5s per image)
- Cost: ~$0.01/image (estimated)
- Output: 1-4 images per request
- Resolution: 1K
- Updated: June 2025

### Legacy Models

**gemini-2.0-flash-preview-image-generation** - Legacy
- Status: Deprecated (use Nano Banana or Imagen 4 instead)
- Context: 32,768 input / 8,192 output tokens

## Model Comparison

| Model | Quality | Speed | Cost | Best For |
|-------|---------|-------|------|----------|
| gemini-3.1-flash-image-preview | ⭐⭐⭐⭐½ | 🚀🚀 Fastest | 💵 Low | **NEW DEFAULT** - General use |
| gemini-2.5-flash-image | ⭐⭐⭐⭐ | 🚀 Fast | 💵 Low | Previous default, stable |
| gemini-3-pro-image-preview | ⭐⭐⭐⭐⭐ | 💡 Medium | 💰 Medium | Text/reasoning |
| imagen-4.0-generate-001 | ⭐⭐⭐⭐ | 💡 Medium | 💰 Medium | Production (alternative) |
| imagen-4.0-ultra-generate-001 | ⭐⭐⭐⭐⭐ | 🐢 Slow | 💰💰 High | Marketing assets |
| imagen-4.0-fast-generate-001 | ⭐⭐⭐ | 🚀 Fast | 💵 Low | Bulk generation |

**Selection Guide**:
- **Default/General**: Use `gemini-3.1-flash-image-preview` (fastest, near-Pro quality, web grounding)
- **Stable Alternative**: Use `gemini-2.5-flash-image` (previous default, fully stable)
- **Production Quality**: Use `imagen-4.0-generate-001` (alternative for final assets)
- **Marketing/Ultra Quality**: Use `imagen-4.0-ultra-generate-001` for maximum quality
- **Text-Heavy Images**: Use `gemini-3-pro-image-preview` for 4K text rendering
- **Complex Prompts with Reasoning**: Use `gemini-3-pro-image-preview` with Thinking mode
- **Real-time Data Integration**: Use `gemini-3.1-flash-image-preview` or `gemini-3-pro-image-preview` with Search grounding

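The Selection Guide can be captured as a small lookup so that model choice lives in one place. This is a minimal sketch, not part of any SDK: the need names (`general`, `text_heavy`, and so on) and the `pick_model` helper are hypothetical labels invented here, and the model IDs are the full IDs listed in the Models section.

```python
# Hypothetical lookup mirroring the Selection Guide; the keys are illustrative
# labels, not API values. For grounded requests the guide allows either
# Nano Banana 2 or Pro; this sketch picks the default model.
GUIDE = {
    "general": "gemini-3.1-flash-image-preview",
    "stable": "gemini-2.5-flash-image",
    "production": "imagen-4.0-generate-001",
    "marketing": "imagen-4.0-ultra-generate-001",
    "text_heavy": "gemini-3-pro-image-preview",
    "reasoning": "gemini-3-pro-image-preview",
    "grounded": "gemini-3.1-flash-image-preview",
}


def pick_model(need: str = "general") -> str:
    """Return the model ID the Selection Guide suggests for a need."""
    key = need.strip().lower().replace("-", "_").replace(" ", "_")
    if key not in GUIDE:
        raise ValueError(f"Unknown need {need!r}; expected one of {sorted(GUIDE)}")
    return GUIDE[key]
```

For example, `pick_model("text-heavy")` resolves to the Pro model, while `pick_model()` falls back to the default.
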
## Quick Start

### Basic Generation (Default - Nano Banana 2)

```python
from google import genai
from google.genai import types
import os

client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

# Nano Banana 2 - NEW DEFAULT (fastest, near-Pro quality, web grounding)
response = client.models.generate_content(
    model='gemini-3.1-flash-image-preview',
    contents='A serene mountain landscape at sunset with snow-capped peaks',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='2K'  # 512px, 1K, 2K, 4K - uppercase K required
        )
    )
)

# Save images
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'output-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)
```

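The save loop above recurs throughout this guide, so it may be worth factoring into a helper. The sketch below is one possible shape: `save_image_parts` is a hypothetical name, it relies only on the `inline_data.data` attributes used in the loop above, and it assumes the bytes are PNG (the response may report a different MIME type).

```python
from pathlib import Path
from typing import Iterable, List


def save_image_parts(parts: Iterable, prefix: str = "output", out_dir: str = ".") -> List[Path]:
    """Write every part that carries inline image bytes to disk; skip text parts."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for i, part in enumerate(parts):
        inline = getattr(part, "inline_data", None)
        data = getattr(inline, "data", None) if inline is not None else None
        if not data:
            continue  # text part or empty payload
        path = target / f"{prefix}-{i}.png"  # assumption: PNG bytes
        path.write_bytes(data)
        saved.append(path)
    return saved
```

With a real response this would be called as `save_image_parts(response.candidates[0].content.parts)`. The index in each filename is the part's position in the response, so a leading text part shifts the first image to `output-1.png`.
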
### Alternative - Imagen 4 (Production Quality)

```python
# Imagen 4 Standard - alternative for production workflows
response = client.models.generate_images(
    model='imagen-4.0-generate-001',
    prompt='Professional product photography of smartphone',
    config=types.GenerateImagesConfig(
        numberOfImages=1,
        aspectRatio='16:9',
        imageSize='1K'
    )
)

# Save Imagen 4 output
for i, generated_image in enumerate(response.generated_images):
    with open(f'output-{i}.png', 'wb') as f:
        f.write(generated_image.image.image_bytes)
```

### Imagen 4 Quality Variants

```python
# Ultra quality (marketing assets)
response = client.models.generate_images(
    model='imagen-4.0-ultra-generate-001',
    prompt='Professional product photography of smartphone',
    config=types.GenerateImagesConfig(
        numberOfImages=1,
        imageSize='2K'  # Use 2K for ultra (Standard/Ultra only)
    )
)

# Fast generation (bulk)
# Note: Fast model doesn't support imageSize parameter
response = client.models.generate_images(
    model='imagen-4.0-fast-generate-001',
    prompt='Quick concept sketch of robot character',
    config=types.GenerateImagesConfig(
        numberOfImages=4,  # Generate multiple variants (default: 4)
        aspectRatio='1:1'
    )
)
```

### Nano Banana Pro (4K Text, Reasoning)

```python
# Nano Banana Pro - for text rendering and complex prompts
response = client.models.generate_content(
    model='gemini-3-pro-image-preview',
    contents='A futuristic cityscape with neon lights',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='4K'  # 4K text rendering
        )
    )
)

# Nano Banana Pro - with Thinking mode and Search grounding
# (no thinking parameter is set here; grounding is enabled via `tools` in the config)
response = client.models.generate_content(
    model='gemini-3-pro-image-preview',
    contents='Current weather in Tokyo visualized as artistic infographic',
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],  # Both text and image
        image_config=types.ImageConfig(
            aspect_ratio='1:1',
            image_size='4K'
        ),
        # tools is a field of GenerateContentConfig, not an argument of generate_content()
        tools=[types.Tool(google_search=types.GoogleSearch())]  # Enable search grounding
    )
)

# Save from content parts
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'output-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)
```

### Multi-Image Reference (Nano Banana Pro)

```python
from PIL import Image

# Up to 14 reference images (6 objects + 5 humans recommended)
img1 = Image.open('style_ref.png')
img2 = Image.open('color_ref.png')
img3 = Image.open('composition_ref.png')

response = client.models.generate_content(
    model='gemini-3-pro-image-preview',
    contents=[
        'Blend these reference styles into a cohesive hero image for a tech product',
        img1, img2, img3
    ],
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='4K'
        )
    )
)
```

### Nano Banana 2 with Web Grounding

```python
# Nano Banana 2 - real-time web integration for brands, landmarks, events
# Note: this request attaches no search tool. If grounding has to be enabled
# explicitly, adding a google_search tool to the config (as in the Pro example)
# is the likely mechanism; that is an assumption, so check the API docs.
response = client.models.generate_content(
    model='gemini-3.1-flash-image-preview',
    contents='Current Apple Vision Pro product shot with accurate branding',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='2K'
        )
    )
)
```

### Nano Banana 2 with Reasoning Levels

```python
# Use reasoning levels for complex prompts
response = client.models.generate_content(
    model='gemini-3.1-flash-image-preview',
    contents='A photorealistic scene of 5 diverse characters sitting around a campfire, each with distinct clothing and accessories, consistent lighting from the fire',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio='16:9',
            image_size='4K'
        )
    )
)
# Nano Banana 2 auto-selects reasoning level (Minimal/High/Dynamic)
# For explicit control, check API docs for reasoning_level parameter
```

### Multi-Turn Refinement Chat

```python
# Conversational image refinement (works with any Nano Banana model)
chat = client.chats.create(
    model='gemini-3.1-flash-image-preview',  # or gemini-2.5-flash-image
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

# Initial generation
response1 = chat.send_message('Create a minimalist logo for a coffee brand called "Brew"')

# Iterative refinement
response2 = chat.send_message('Make the text bolder and add steam rising from the cup')
response3 = chat.send_message('Change the color palette to warm earth tones')
```

## API Differences

### Imagen 4 vs Nano Banana (Gemini Native)

| Feature | Imagen 4 | Nano Banana (Gemini) |
|---------|----------|---------------------|
| Method | `generate_images()` | `generate_content()` |
| Config | `GenerateImagesConfig` | `GenerateContentConfig` |
| Prompt param | `prompt` (string) | `contents` (string/list) |
| Image count | `numberOfImages` (camelCase) | N/A (single per request) |
| Aspect ratio | `aspectRatio` (camelCase) | `aspect_ratio` (snake_case) |
| Size | `imageSize` | `image_size` |
| Response | `generated_images[i].image.image_bytes` | `candidates[0].content.parts[i].inline_data.data` |
| Multi-image input | ❌ | ✅ Up to 14 references |
| Multi-turn chat | ❌ | ✅ Conversational |
| Search grounding | ❌ | ✅ (Nano Banana 2 and Pro) |
| Thinking mode | ❌ | ✅ (Pro only) |
| Text rendering | Limited | 4K (Pro) |

**Imagen 4** uses `generate_images()`:
```python
response = client.models.generate_images(
    model='imagen-4.0-generate-001',
    prompt='...',
    config=types.GenerateImagesConfig(
        numberOfImages=1,    # camelCase
        aspectRatio='16:9',  # camelCase
        imageSize='1K'       # Standard/Ultra only
    )
)
# Access: response.generated_images[0].image.image_bytes
```

**Nano Banana** uses `generate_content()`:
```python
response = client.models.generate_content(
    model='gemini-3.1-flash-image-preview',  # or gemini-2.5-flash-image, gemini-3-pro-image-preview
    contents='...',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(
            aspect_ratio='16:9',  # snake_case
            image_size='2K'       # 1K, 2K, 4K - uppercase K
        )
    )
)
# Access: response.candidates[0].content.parts[0].inline_data.data
```

**Critical Notes**:
1. `response_modalities` values MUST be uppercase: `'IMAGE'`, `'TEXT'`
2. `image_size` value MUST have uppercase K: `'1K'`, `'2K'`, `'4K'`
3. Imagen 4 Fast model doesn't support `imageSize` parameter

## Aspect Ratios

| Ratio | Resolution (1K) | Use Case | Token Cost (per image) |
|-------|----------------|----------|------------|
| 1:1 | 1024×1024 | Social media, avatars, icons | 1290 |
| 2:3 | 682×1024 | Vertical portraits | 1290 |
| 3:2 | 1024×682 | Horizontal portraits | 1290 |
| 3:4 | 768×1024 | Vertical posters | 1290 |
| 4:3 | 1024×768 | Traditional media | 1290 |
| 4:5 | 819×1024 | Instagram portrait | 1290 |
| 5:4 | 1024×819 | Horizontal photos | 1290 |
| 9:16 | 576×1024 | Mobile/stories/reels | 1290 |
| 16:9 | 1024×576 | Landscapes, banners, YouTube | 1290 |
| 21:9 | 1024×438 | Ultrawide/cinematic | 1290 |

All ratios cost the same: 1,290 tokens per image (Gemini models).

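The 1K resolutions in the table appear to follow one rule: the long side is fixed at 1024 pixels and the short side is scaled by the ratio and rounded down. That rule is inferred from the table values, not taken from API documentation, so treat the sketch below as a planning aid; the hypothetical `resolution_for` helper may not match the pixel dimensions the API actually returns.

```python
from typing import Tuple


def resolution_for(ratio: str, long_side: int = 1024) -> Tuple[int, int]:
    """Return (width, height) for a 'W:H' ratio, with the long side fixed."""
    w, h = (int(x) for x in ratio.split(":"))
    if w <= 0 or h <= 0:
        raise ValueError(f"Invalid aspect ratio: {ratio!r}")
    if w >= h:
        # Landscape or square: width is the long side
        return long_side, long_side * h // w
    # Portrait: height is the long side
    return long_side * w // h, long_side
```

For instance, `resolution_for("16:9")` gives `(1024, 576)` and `resolution_for("2:3")` gives `(682, 1024)`, matching the table rows.
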
## Response Modalities

### Image Only

```python
config = types.GenerateContentConfig(
    response_modalities=['IMAGE'],  # Uppercase required
    image_config=types.ImageConfig(aspect_ratio='1:1')  # aspect_ratio goes inside image_config
)
```

### Text Only (No Image)

```python
config = types.GenerateContentConfig(
    response_modalities=['TEXT']
)
# Returns text description instead of generating image
```

### Both Image and Text

```python
config = types.GenerateContentConfig(
    response_modalities=['IMAGE', 'TEXT'],
    image_config=types.ImageConfig(aspect_ratio='16:9')
)
# Returns both generated image and description
```

## Image Editing

### Modify Existing Image

```python
import PIL.Image

# Load original
img = PIL.Image.open('original.png')

# Edit with instructions
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a red balloon floating in the sky',
        img
    ],
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)
```

### Style Transfer

```python
img = PIL.Image.open('photo.jpg')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Transform this into an oil painting style',
        img
    ]
)
```

### Object Addition/Removal

```python
# Add object
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a vintage car parked on the street',
        img
    ]
)

# Remove object
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Remove the person on the left side',
        img
    ]
)
```

## Multi-Image Composition

### Combine Multiple Images

```python
img1 = PIL.Image.open('background.png')
img2 = PIL.Image.open('foreground.png')
img3 = PIL.Image.open('overlay.png')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Combine these images into a cohesive scene',
        img1,
        img2,
        img3
    ],
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)
```

**Note**: With `gemini-2.5-flash-image`, a maximum of 3 input images is recommended for best results.

## Prompt Engineering

### Core Principle: Narrative > Keywords

> **Nano Banana prompting**: Write like you're briefing a photographer, not providing SEO keywords. Narrative paragraphs outperform keyword lists.

❌ **Bad**: "cat, 4k, masterpiece, trending, professional, ultra detailed, cinematic"

✅ **Good**: "A fluffy orange tabby cat with green eyes lounging on a sun-drenched windowsill. Soft morning light creates a warm glow. Shot with a 50mm lens at f/1.8 for shallow depth of field. Natural lighting, documentary photography style."

### Effective Prompt Structure

**Three key elements**:
1. **Subject**: What to generate (be specific)
2. **Context**: Environmental setting (lighting, location, time)
3. **Style**: Artistic treatment (photography, illustration, etc.)

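One way to keep prompts in the Subject / Context / Style shape is to assemble them from three explicit fields. The `build_prompt` helper below is a hypothetical illustration of that structure, written as full sentences in line with the narrative-over-keywords principle; it is not part of the SDK.

```python
def build_prompt(subject: str, context: str, style: str) -> str:
    """Join subject, context, and style into a short narrative prompt."""
    # Normalize each element so the result has exactly one period per sentence
    cleaned = [part.strip().rstrip(".") for part in (subject, context, style)]
    if not all(cleaned):
        raise ValueError("subject, context, and style must all be non-empty")
    return ". ".join(cleaned) + "."
```

Usage: `build_prompt("A fluffy orange tabby cat with green eyes", "Soft morning light creates a warm glow", "Documentary photography style")` yields a three-sentence prompt that can be passed as `contents`.
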
### Quality Modifiers

**Technical terms**:
- "4K", "8K", "high resolution"
- "HDR", "high dynamic range"
- "professional photography"
- "studio lighting"
- "ultra detailed"

**Camera settings**:
- "35mm lens", "50mm lens"
- "shallow depth of field"
- "wide angle shot"
- "macro photography"
- "golden hour lighting"

### Style Keywords

**Art styles**:
- "oil painting", "watercolor", "sketch"
- "digital art", "concept art"
- "photorealistic", "hyperrealistic"
- "minimalist", "abstract"
- "cyberpunk", "steampunk", "fantasy"

**Mood and atmosphere**:
- "dramatic lighting", "soft lighting"
- "moody", "bright and cheerful"
- "mysterious", "whimsical"
- "dark and gritty", "pastel colors"

### Subject Description

**Be specific**:
- ❌ "A cat"
- ✅ "A fluffy orange tabby cat with green eyes"

**Add context**:
- ❌ "A building"
- ✅ "A modern glass skyscraper reflecting sunset clouds"

**Include details**:
- ❌ "A person"
- ✅ "A young woman in a red dress holding an umbrella"

### Composition and Framing

**Camera angles**:
- "bird's eye view", "aerial shot"
- "low angle", "high angle"
- "close-up", "wide shot"
- "centered composition"
- "rule of thirds"

**Perspective**:
- "first person view"
- "third person perspective"
- "isometric view"
- "forced perspective"

### Text in Images

**Limitations**:
- Maximum 25 characters total for optimal results
- Up to 3 distinct text phrases
- For 4K text rendering, use `gemini-3-pro-image-preview`

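The two numeric limits above are easy to check before sending a request. The sketch below is a hypothetical pre-flight check: `check_text_phrases` is not an SDK function, and it assumes that spaces count toward the 25-character total, which the guideline does not specify.

```python
from typing import List

MAX_TOTAL_CHARS = 25  # guideline from the list above
MAX_PHRASES = 3       # guideline from the list above


def check_text_phrases(phrases: List[str]) -> List[str]:
    """Return warnings for text that exceeds the guidelines; empty means OK."""
    warnings = []
    if len(phrases) > MAX_PHRASES:
        warnings.append(f"{len(phrases)} phrases; keep to {MAX_PHRASES} or fewer")
    # Assumption: spaces and punctuation count toward the total
    total = sum(len(p) for p in phrases)
    if total > MAX_TOTAL_CHARS:
        warnings.append(f"{total} characters; keep to {MAX_TOTAL_CHARS} or fewer")
    return warnings
```

A short headline such as `["EXPLORE TOKYO"]` (13 characters) passes with no warnings. These are soft guidelines for "optimal results", so the function reports warnings instead of raising.
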
**Text prompt template**:
```
Image with text "[EXACT TEXT]" in [font style].
Font: [style description].
Color: [hex code like #FF5733].
Position: [top/center/bottom].
Background: [description].
Context: [poster/sign/label].
```

**Example**:
```python
response = client.models.generate_content(
    model='gemini-3-pro-image-preview',  # Use Pro for better text
    contents='''
    Create a vintage travel poster with text "EXPLORE TOKYO" at the top.
    Font: Bold retro sans-serif, slightly condensed.
    Color: #F5E6D3 (cream white).
    Position: Top third of image.
    Background: Stylized Tokyo skyline with Mt. Fuji, sunset colors.
    Style: 1950s travel poster aesthetic, muted warm colors.
    '''
)
```

**Font keywords**:
- "bold sans-serif", "handwritten script", "vintage letterpress"
- "modern minimalist", "art deco", "neon sign"

### Nano Banana Prompt Techniques

| Technique | Example | Purpose |
|-----------|---------|---------|
| ALL CAPS emphasis | `The logo MUST be centered` | Force attention to critical requirements |
| Hex colors | `#9F2B68` instead of "dark magenta" | Exact color control |
| Negative constraints | `NEVER include text/watermarks. DO NOT add labels.` | Explicit exclusions |
| Realism trigger | `Natural lighting, DOF. Captured with Canon EOS 90D DSLR.` | Photography authenticity |
| Structured edits | `Make ALL edits: - [1] - [2] - [3]` | Multi-step changes |
| Complex logic | `Kittens MUST have heterochromatic eyes matching fur colors` | Precise conditions |

**Prompt Templates**:

**Photorealistic**:
```
A [subject] in [location], [lens] lens. [Lighting] creates [mood]. [Details].
[Camera angle]. Professional photography, natural lighting.
```

**Illustration**:
```
[Art style] illustration of [subject]. [Color palette]. [Line style].
[Background]. [Mood].
```

**Product**:
```
[Product] on [surface]. Materials: [finish]. Lighting: [setup].
Camera: [angle]. Background: [type]. Style: [commercial/lifestyle].
```

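The "Structured edits" and "Negative constraints" rows of the techniques table combine naturally into one edit prompt. The sketch below shows one possible way to build such a prompt from lists; `structured_edit_prompt` is a hypothetical helper, and the exact wording it emits is only an illustration of the table's examples.

```python
from typing import Iterable, List


def structured_edit_prompt(edits: Iterable[str], exclusions: Iterable[str] = ()) -> str:
    """Build a 'Make ALL edits' prompt with optional NEVER-include constraints."""
    items: List[str] = [e.strip() for e in edits if e.strip()]
    if not items:
        raise ValueError("at least one edit is required")
    lines = ["Make ALL edits:"] + [f"- {item}" for item in items]
    # Negative constraints go last, phrased explicitly as in the table
    for excluded in exclusions:
        if excluded.strip():
            lines.append(f"NEVER include {excluded.strip()}.")
    return "\n".join(lines)
```

The returned string can be sent together with the image to edit, for example `contents=[structured_edit_prompt([...]), img]`.
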
## Advanced Techniques

### Iterative Refinement

```python
import PIL.Image

# Initial generation
response1 = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A futuristic city skyline'
)

# Save first version (the first part may be text, so look for the image part)
image_part = next(
    p for p in response1.candidates[0].content.parts if p.inline_data
)
with open('v1.png', 'wb') as f:
    f.write(image_part.inline_data.data)

# Refine
img = PIL.Image.open('v1.png')
response2 = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add flying vehicles and neon signs',
        img
    ]
)
```

### Negative Prompts (Indirect)

```python
# Instead of "no blur", be specific about what you want
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A crystal clear, sharp photograph of a diamond ring with perfect focus and high detail'
)
```

### Consistent Style Across Images

```python
base_prompt = "Digital art, vibrant colors, cel-shaded style, clean lines"

prompts = [
    f"{base_prompt}, a warrior character",
    f"{base_prompt}, a mage character",
    f"{base_prompt}, a rogue character"
]

for i, prompt in enumerate(prompts):
    response = client.models.generate_content(
        model='gemini-2.5-flash-image',
        contents=prompt
    )
    # Save each character
```

## Safety Settings

### Configure Safety Filters

```python
config = types.GenerateContentConfig(
    response_modalities=['IMAGE'],  # Uppercase required
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        ),
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ]
)
```

### Available Categories

- `HARM_CATEGORY_HATE_SPEECH`
- `HARM_CATEGORY_DANGEROUS_CONTENT`
- `HARM_CATEGORY_HARASSMENT`
- `HARM_CATEGORY_SEXUALLY_EXPLICIT`

### Thresholds

- `BLOCK_NONE`: No blocking
- `BLOCK_LOW_AND_ABOVE`: Block low probability and above
- `BLOCK_MEDIUM_AND_ABOVE`: Block medium and above (default)
- `BLOCK_ONLY_HIGH`: Block only high probability

## Common Use Cases

### 1. Marketing Assets

```python
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Professional product photography:
    - Sleek smartphone on minimalist white surface
    - Dramatic side lighting creating subtle shadows
    - Shallow depth of field, crisp focus
    - Clean, modern aesthetic
    - 4K quality
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(aspect_ratio='4:3')
    )
)
```

### 2. Concept Art

```python
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Fantasy concept art:
    - Ancient floating islands connected by chains
    - Waterfalls cascading into clouds below
    - Magical crystals glowing on the islands
    - Epic scale, dramatic lighting
    - Detailed digital painting style
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)
```

### 3. Social Media Graphics

```python
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Instagram post design:
    - Pastel gradient background (pink to blue)
    - Motivational quote layout
    - Modern minimalist style
    - Clean typography
    - Mobile-friendly composition
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(aspect_ratio='1:1')
    )
)
```

### 4. Illustration

```python
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Children's book illustration:
    - Friendly cartoon dragon reading a book
    - Bright, cheerful colors
    - Soft, rounded shapes
    - Whimsical forest background
    - Warm, inviting atmosphere
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(aspect_ratio='4:3')
    )
)
```

### 5. UI/UX Mockups

```python
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Modern mobile app interface:
    - Clean dashboard design
    - Card-based layout
    - Soft shadows and gradients
    - Contemporary color scheme (blue and white)
    - Professional fintech aesthetic
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],  # Uppercase required
        image_config=types.ImageConfig(aspect_ratio='9:16')
    )
)
```

## Best Practices

### Prompt Quality

1. **Be specific**: More detail = better results
2. **Order matters**: Most important elements first
3. **Use examples**: Reference known styles or artists
4. **Avoid contradictions**: Don't ask for opposing styles
5. **Test and iterate**: Refine prompts based on results

### File Management

```python
import time

# Save with descriptive names.
# aspect_ratio and image_data come from your own request and response,
# e.g. aspect_ratio = '16:9' and image_data = part.inline_data.data
timestamp = int(time.time())
safe_ratio = aspect_ratio.replace(':', 'x')  # ':' is not allowed in Windows filenames
filename = f'generated_{timestamp}_{safe_ratio}.png'

with open(filename, 'wb') as f:
    f.write(image_data)
```

### Cost Optimization

**Nano Banana 2 pricing (per image)**:
| Resolution | Cost/Image | Batch (50% off) |
|-----------|-----------|-----------------|
| 512px | $0.045 | $0.023 |
| 1K | $0.067 | $0.034 |
| 2K | $0.101 | $0.051 |
| 4K | $0.151 | $0.076 |

**Flash Image token costs** (computed at the ~$1/1M token rate quoted for Flash Image in the Models section; image output tokens may be billed at a different rate, so check current pricing):
- 1 image: 1,290 tokens = $0.00129
- 10 images: 12,900 tokens = $0.0129
- 100 images: 129,000 tokens = $0.129

**Strategies**:
- Generate fewer iterations
- Use text modality first to validate concept
- Batch similar requests
- Cache prompts for consistent style

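For budgeting, the Nano Banana 2 price table can be turned into a small estimator. This is a sketch under stated assumptions: the prices are copied from the table above and may be out of date, `estimate_cost` is a hypothetical helper, and the batch discount is applied to the total, so results can differ slightly from the per-image batch column, which is rounded.

```python
from decimal import Decimal

# Per-image prices in USD, copied from the table above (may be out of date)
PRICE_PER_IMAGE = {
    "512px": Decimal("0.045"),
    "1K": Decimal("0.067"),
    "2K": Decimal("0.101"),
    "4K": Decimal("0.151"),
}
BATCH_DISCOUNT = Decimal("0.5")  # "50% off" column


def estimate_cost(count: int, size: str = "1K", batch: bool = False) -> Decimal:
    """Estimate the USD cost of generating `count` images at `size`."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if size not in PRICE_PER_IMAGE:
        raise ValueError(f"Unknown size {size!r}; expected one of {list(PRICE_PER_IMAGE)}")
    total = PRICE_PER_IMAGE[size] * count
    return total * BATCH_DISCOUNT if batch else total
```

`Decimal` is used instead of `float` so that totals such as 100 images at $0.101 come out exactly as $10.10.
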
## Error Handling

### Safety Filter Blocking

```python
from google.genai import errors

try:
    response = client.models.generate_content(
        model='gemini-2.5-flash-image',
        contents=prompt
    )
except errors.APIError as e:
    # Request-level failure (invalid argument, quota, server error)
    print(f"API error {e.code}: {e.message}")
else:
    # A blocked prompt is reported on the response, not raised as an exception
    feedback = response.prompt_feedback
    if feedback and feedback.block_reason:
        print(f"Blocked: {feedback.block_reason}")
        # Modify prompt and retry
```

### Token Limit Exceeded

```python
# Keep prompts concise.
# len() counts characters, not tokens, so this is only a rough guard.
if len(prompt) > 1000:
    # Truncate or simplify
    prompt = prompt[:1000]
```

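The "modify prompt and retry" step can be sketched as a loop over progressively softened prompt variants. Both helpers below are hypothetical and only assume the response shape used elsewhere in this guide (`candidates[...].content.parts[...].inline_data`); `generate` stands for any callable that sends a prompt and returns a response, so the sketch makes no claim about how the API signals a block.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def has_image(response: Any) -> bool:
    """True if any candidate part carries inline image data."""
    for candidate in getattr(response, "candidates", None) or []:
        content = getattr(candidate, "content", None)
        for part in getattr(content, "parts", None) or []:
            if getattr(part, "inline_data", None) is not None:
                return True
    return False


def first_successful(
    generate: Callable[[str], Any], prompts: Iterable[str]
) -> Tuple[Optional[str], Any]:
    """Try each prompt in order; return (prompt, response) for the first image.

    Returns (None, last_response) if no variant produced an image.
    """
    last = None
    for prompt in prompts:
        last = generate(prompt)
        if has_image(last):
            return prompt, last
    return None, last
```

In practice `generate` could be a lambda wrapping `client.models.generate_content(...)` with a fixed model and config, and `prompts` a short list running from the preferred wording to safer rewrites.
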
## Limitations

### Imagen 4 Constraints
- **Language**: English prompts only
- **Prompt length**: Maximum 480 tokens
- **Output**: 1-4 images per request
- **Watermark**: All images include SynthID watermark
- **Fast model**: No `imageSize` parameter support (fixed resolution)
- **Text rendering**: Limited to ~25 characters for optimal results
- **Regional restrictions**: Child images restricted in EEA, CH, UK
- **Cannot replicate**: Specific people or copyrighted characters

### Nano Banana (Gemini) Constraints
- **Language**: English prompts are best supported
- **Context**: 65,536 input / 32,768 output tokens on current models (32,768 / 8,192 on the legacy model)
- **Multi-image**: Standard models ~3-5 refs; Pro up to 14 refs
- **Text rendering**: Standard limited; Pro supports 4K text
- **Watermark**: All images include SynthID watermark
- **Case sensitivity**: `response_modalities` must be uppercase (`'IMAGE'`, `'TEXT'`)
- **Size format**: `image_size` must have uppercase K (`'1K'`, `'2K'`, `'4K'`)

### General Limitations
- Maximum 14 input images for composition (Pro only)
- No video or animation generation (use Veo for video)
- No real-time generation

## Troubleshooting

### aspect_ratio Parameter Error

**Error**: `Extra inputs are not permitted [type=extra_forbidden, input_value='1:1', input_type=str]`

**Cause**: The `aspect_ratio` parameter must be nested inside an `image_config` object, not passed directly to `GenerateContentConfig`.

**Incorrect Usage**:
```python
# ❌ This will fail
config = types.GenerateContentConfig(
    response_modalities=['IMAGE'],
    aspect_ratio='16:9'  # Wrong - not a direct parameter
)
```

**Correct Usage**:
```python
# ✅ Correct implementation
config = types.GenerateContentConfig(
    response_modalities=['IMAGE'],  # Note: uppercase required
    image_config=types.ImageConfig(
        aspect_ratio='16:9'
    )
)
```

### Response Modality Case Sensitivity

The `response_modalities` parameter expects uppercase values:
- ✅ Correct: `['IMAGE']`, `['TEXT']`, `['IMAGE', 'TEXT']`
- ❌ Wrong: `['image']`, `['text']`, `['Image']`

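Because the casing rules are easy to get wrong, it can help to normalize user-supplied values before building the config. The helpers below are hypothetical and encode only the rules stated in this guide: modalities are `'IMAGE'` or `'TEXT'`, and sizes are `'1K'`, `'2K'`, or `'4K'`. The `'512px'` size mentioned for Nano Banana 2 is deliberately left out, since uppercasing would corrupt it.

```python
from typing import Iterable, List

VALID_MODALITIES = ("IMAGE", "TEXT")
VALID_SIZES = ("1K", "2K", "4K")


def normalize_modalities(values: Iterable[str]) -> List[str]:
    """Uppercase and validate response modalities, dropping duplicates."""
    result: List[str] = []
    for value in values:
        upper = value.strip().upper()
        if upper not in VALID_MODALITIES:
            raise ValueError(f"Unsupported modality: {value!r}")
        if upper not in result:
            result.append(upper)
    return result


def normalize_image_size(value: str) -> str:
    """Uppercase and validate an image size such as '2k'."""
    upper = value.strip().upper()
    if upper not in VALID_SIZES:
        raise ValueError(f"Unsupported image size: {value!r}")
    return upper
```

For example, `normalize_modalities(['image', 'Text'])` returns `['IMAGE', 'TEXT']`, which can be passed directly as `response_modalities`.
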
### Image Size Parameter Not Supported

**Error**: `400 INVALID_ARGUMENT`

**Cause**: The `image_size` parameter in `ImageConfig` is not supported by all Nano Banana models.

**Solution**: Don't pass `image_size` unless explicitly needed. The API uses sensible defaults.

```python
# ✅ Works - no image_size
config = types.GenerateContentConfig(
    response_modalities=['IMAGE'],
    image_config=types.ImageConfig(
        aspect_ratio='16:9'  # Only aspect_ratio
    )
)

# ⚠️ May fail - with image_size (model-dependent)
config = types.GenerateContentConfig(
    response_modalities=['IMAGE'],
    image_config=types.ImageConfig(
        aspect_ratio='16:9',
        image_size='2K'  # Not supported by all models
    )
)
```

### Multi-Image Reference Issues

**Problem**: Poor composition with multiple reference images

**Solutions**:
1. Limit to 3-5 reference images for standard models
2. Use Pro model for up to 14 references
3. Collage multiple style refs into a single image
4. Provide clear textual descriptions of how to blend styles

---

## Related References

**Current**: Image Generation

**Related Capabilities**:
- [Image Understanding](./vision-understanding.md) - Analyzing and editing reference images
- [Video Generation](./video-generation.md) - Creating animated video content
- [Audio Processing](./audio-processing.md) - Text-to-speech for multimedia

**Back to**: [AI Multimodal Skill](../SKILL.md)