Customizing Reflection Prompts

This guide covers how to customize the reflection prompt used during evolution to improve instruction mutations.

Deprecation Notice

Direct LiteLLM reflection via reflection_model and reflection_prompt parameters is deprecated and will be removed in a future version. Use reflection_agent with an ADK LlmAgent instead for consistent execution and session management. See Issue #144.

Recommended approach:

from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm

reflection_agent = LlmAgent(
    name="Reflector",
    model=LiteLlm(model="ollama_chat/llama3.2:latest"),
    instruction="Your custom reflection prompt with {component_text} and {trials}",
)

result = await evolve(
    agent=my_agent,
    trainset=test_cases,
    critic=my_critic,  # Optional
    reflection_agent=reflection_agent,  # Recommended
)

Overview

The reflection prompt is the template sent to the reflection model (e.g., ollama_chat/gpt-oss:20b) to generate improved agent instructions. By customizing this prompt, you can:

  • Tailor the mutation strategy to your specific use case
  • Request specific output formats (e.g., JSON)
  • Add domain-specific guidelines
  • Optimize for different model capabilities

Available Placeholders

The reflection prompt template supports two placeholders that are filled at runtime:

Placeholder      | Content                 | Description
---------------- | ----------------------- | --------------------------------------------------------------
{component_text} | Component being evolved | The text content of the component being evolved (e.g., the agent instruction)
{trials}         | Trial data              | Collection of trials with feedback and trajectory for each test case

ADK Template Syntax

gepa-adk uses ADK's native template substitution for injecting session state values into agent instructions. This provides cleaner separation between data (in session state) and instruction logic.

How It Works

ADK automatically replaces {key} placeholders in agent instructions with values from session.state[key]:

from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm

# Define agent with template placeholders
agent = LlmAgent(
    name="Reflector",
    model=LiteLlm(model="ollama_chat/llama3.2:latest"),
    instruction="""## Component Text
{component_text}

## Trial Results
{trials}

Improve the component text based on the trials.""",
)

# Session state is set up automatically by gepa-adk
# ADK's inject_session_state() replaces {component_text} and {trials}

Template Syntax Reference

Syntax     | Description                                            | Example
---------- | ------------------------------------------------------ | -------------------
{key}      | Required placeholder - raises KeyError if missing      | {component_text}
{key?}     | Optional placeholder - returns empty string if missing | {context?}
{app:key}  | Application-scoped state (advanced)                    | {app:shared_config}
{user:key} | User-scoped state (advanced)                           | {user:preferences}
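
For example, an optional placeholder lets the same instruction work whether or not an extra value is present in session state. A minimal sketch, assuming a hypothetical "context" state key ({component_text} and {trials} are the only keys gepa-adk populates for you):

from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm

# {context?} is optional: if "context" is missing from session state,
# ADK substitutes an empty string instead of raising KeyError.
agent = LlmAgent(
    name="Reflector",
    model=LiteLlm(model="ollama_chat/llama3.2:latest"),
    instruction="""## Component Text
{component_text}

## Trial Results
{trials}

## Extra Context (may be empty)
{context?}

Improve the component text based on the trials.""",
)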

Type Handling

ADK converts all session state values to strings using str(). For complex types, gepa-adk pre-serializes to JSON:

# gepa-adk automatically handles serialization:
session_state = {
    "component_text": "Be helpful",  # String - used as-is
    "trials": json.dumps(trials),    # List → JSON string
}

Important: If you pass a dict/list directly without JSON serialization, ADK uses Python's repr() which may not be readable by the LLM.
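
To see why the pre-serialization matters, compare Python's default string conversion of a list of dicts with its JSON form (a standalone sketch, independent of gepa-adk):

import json

trials = [{"input": "How do I read a file?", "score": 0.3}]

# str() on a list falls back to repr(): Python syntax with single quotes
print(str(trials))  # [{'input': 'How do I read a file?', 'score': 0.3}]

# json.dumps() produces valid, indented JSON that reads cleanly in a prompt
print(json.dumps(trials, indent=2))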

Example Placeholder Values

{component_text}:

You are a helpful assistant that answers questions about Python programming.
Be concise and provide code examples when relevant.

{trials}:

Example 1:
  Input: "How do I read a file?"
  Expected: "Use open() with 'r' mode"
  Actual: "Files can be read using Python"
  Score: 0.3
  Feedback: Response too vague, missing code example

Example 2:
  Input: "What is a list?"
  Expected: "A mutable sequence type"
  Actual: "A list is a mutable sequence type in Python"
  Score: 0.9
  Feedback: Good explanation

Basic Usage

Using a Custom Prompt

from gepa_adk import evolve, EvolutionConfig

config = EvolutionConfig(
    reflection_prompt="""You are improving an AI agent's instructions.

## Current Instruction
{component_text}

## Evaluation Results
{trials}

## Your Task
Based on the feedback, propose ONE specific improvement to the instruction.
Focus on the most impactful change.

Return ONLY the improved instruction text."""
)

result = await evolve(
    agent=my_agent,
    trainset=test_cases,
    critic=my_critic,
    config=config,
)

Extending the Default Prompt

You can import and extend the default prompt template:

from gepa_adk import evolve, EvolutionConfig
from gepa_adk.engine.adk_reflection import REFLECTION_INSTRUCTION

# Add domain-specific context to the default
custom_prompt = REFLECTION_INSTRUCTION + """

Additional Guidelines:
- Focus on clarity and conciseness
- Preserve any safety constraints in the original
- Consider edge cases mentioned in feedback
"""

config = EvolutionConfig(reflection_prompt=custom_prompt)

Using the Default (No Configuration)

from gepa_adk import evolve

# reflection_prompt defaults to None → uses REFLECTION_INSTRUCTION
result = await evolve(
    agent=my_agent,
    trainset=test_cases,
    critic=my_critic,
)

Prompt Design Guidelines

1. Include Both Placeholders

Always include {component_text} and {trials} in your prompt. The system will warn (but not error) if either is missing:

# This will log a warning about missing {trials}
config = EvolutionConfig(
    reflection_prompt="Improve this: {component_text}"
)

2. Request Clear Output Format

Specify exactly what format you want the response in:

# Good: Clear output expectation
prompt = """...
Return ONLY the improved instruction text, with no additional commentary.
"""

# Also good: Structured output
prompt = """...
Respond with exactly this JSON structure:
{
  "analysis": "Brief analysis",
  "improved_instruction": "The improved text"
}
"""

3. Be Specific About the Task

Tell the model exactly what kind of improvements to make:

prompt = """...
## Your Task
1. Address the issues identified in negative feedback
2. Preserve elements that worked well in positive feedback
3. Maintain clarity and specificity
4. Keep the instruction concise (under 200 words)
"""

4. Consider Model Capabilities

Adjust prompt complexity based on your reflection model:

  • Smaller models (7B-13B): Use simpler, more direct prompts
  • Larger models (70B+): Can handle chain-of-thought, structured reasoning

Example Prompts

Minimal/Fast Prompt

For quick iterations with capable models:

minimal_prompt = """Instruction:
{component_text}

Feedback:
{trials}

Improved instruction:"""

Chain-of-Thought Prompt

For more thoughtful improvements:

cot_prompt = """You are an expert at improving AI instructions.

## Current Instruction
{component_text}

## Performance Feedback
{trials}

## Analysis Process
1. What patterns appear in successful examples?
2. What patterns appear in failed examples?
3. What specific changes would address the failures while preserving successes?

Think through each step, then provide the improved instruction.

## Improved Instruction
"""

JSON Output Format

For structured responses:

json_prompt = """Analyze the agent instruction and feedback, then respond with JSON.

## Current Instruction
{component_text}

## Feedback
{trials}

Respond with exactly this JSON structure:
{
  "analysis": "Brief analysis of what's working and what isn't",
  "improved_instruction": "The complete improved instruction text"
}
"""

Domain-Specific Prompt

For specialized use cases:

code_review_prompt = """You are improving a code review agent's instructions.

## Current Instruction
{component_text}

## Evaluation Feedback
{trials}

## Domain Guidelines
- The agent should identify bugs, security issues, and style problems
- Feedback should be actionable and specific
- Code examples should be provided when suggesting fixes
- Tone should be constructive, not critical

Provide an improved instruction that addresses the feedback while
following these domain guidelines.

## Improved Instruction
"""

Model Selection Guidance

The reflection model processes your custom prompt to generate improved instructions. Choosing the right model affects quality, speed, and cost.

Token Budget Considerations

Your reflection prompt, together with the substituted placeholder values, consumes context tokens:

Component              | Typical Size
---------------------- | -----------------------------------------
Custom prompt template | 100-500 tokens
{component_text}       | 50-500 tokens
{trials}               | 200-2000 tokens (depends on trial count)
Response               | 50-500 tokens

Recommendation: Keep your prompt template under 500 tokens. For larger instruction sets, consider reducing batch size or using a model with larger context.
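
As a rough sanity check before running evolution, you can estimate your template's size with the common ~4 characters-per-token heuristic (an approximation only; the helper below is illustrative, not part of gepa-adk):

def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return len(text) // 4

template = """You are improving an AI agent's instructions.

## Current Instruction
{component_text}

## Evaluation Results
{trials}

Return ONLY the improved instruction text."""

print(f"Template: ~{estimate_tokens(template)} tokens")  # Well under the 500-token guideline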

Model Capability vs Task Complexity

Task Complexity         | Recommended Model Tier | Examples
----------------------- | ---------------------- | -----------------------------
Simple rewording        | Local (7B-13B)         | Ollama gpt-oss:7b
Structured improvements | Local (20B+)           | Ollama gpt-oss:20b (default)
Complex reasoning       | Cloud (cheap)          | GPT-4o-mini, Claude Haiku
Domain expertise        | Cloud (premium)        | GPT-4o, Claude Sonnet

Cost vs Quality Tradeoffs

Model Tier      | Cost             | Quality | Speed  | When to Use
--------------- | ---------------- | ------- | ------ | --------------------------------------------------
Local (Ollama)  | Free             | Good    | Medium | Development, iteration, cost-sensitive production
Cloud (cheap)   | ~$0.15/1M tokens | Better  | Fast   | Production with budget constraints
Cloud (premium) | ~$5-15/1M tokens | Best    | Fast   | High-stakes applications, complex domains

Configuring the Reflection Model

from gepa_adk import evolve, EvolutionConfig

# Local model (default)
config = EvolutionConfig(
    reflection_model="ollama_chat/gpt-oss:20b",
)

# Cloud model (OpenAI)
config = EvolutionConfig(
    reflection_model="gpt-4o-mini",
)

# Cloud model (Anthropic)
config = EvolutionConfig(
    reflection_model="claude-3-haiku-20240307",
)

Validation and Debugging

Check for Missing Placeholders

The system logs warnings if placeholders are missing:

WARNING: reflection_prompt is missing {trials} placeholder
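
You can also check for both placeholders yourself before building the config (a plain string check, not a gepa-adk API):

my_prompt = "Improve this: {component_text}"

required = ("{component_text}", "{trials}")
missing = [p for p in required if p not in my_prompt]
if missing:
    print(f"Warning: prompt is missing {missing}")  # Warning: prompt is missing ['{trials}']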

Test Your Prompt

Before running evolution, test your prompt manually:

from gepa_adk.engine.adk_reflection import REFLECTION_INSTRUCTION

# See what the default looks like
print(REFLECTION_INSTRUCTION)

# Test your custom prompt with sample values
my_prompt = "..."
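# Note: str.format() treats every "{" as a placeholder, so if your prompt
# embeds literal braces (e.g., a JSON example), double them as "{{" and "}}".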
formatted = my_prompt.format(
    component_text="Be helpful",
    trials="Example 1: Score 0.5..."
)
print(formatted)

Migration from f-string Workaround

If you have custom reflection code that previously embedded data in user messages via f-strings, you can migrate to ADK's template syntax.

Before (f-string in user message)

# OLD: Data embedded directly in user message
user_message = f"""## Component Text to Improve
{component_text}

## Trials
{json.dumps(trials, indent=2)}

Propose an improved version..."""

async for event in runner.run_async(
    user_id="reflection",
    session_id=session_id,
    new_message=Content(role="user", parts=[Part(text=user_message)]),
):
    events.append(event)

After (ADK template substitution)

# NEW: Data in session state, placeholders in instruction
agent = LlmAgent(
    name="Reflector",
    model=LiteLlm(model="ollama_chat/llama3.2:latest"),
    instruction="""## Component Text to Improve
{component_text}

## Trials
{trials}

Propose an improved version...""",
)

# Session state set during session creation
session_state = {
    "component_text": component_text,
    "trials": json.dumps(trials),  # Pre-serialize complex types
}

await session_service.create_session(
    app_name="gepa_reflection",
    user_id="reflection",
    session_id=session_id,
    state=session_state,
)

# Simple trigger message - ADK handles substitution
async for event in runner.run_async(
    user_id="reflection",
    session_id=session_id,
    new_message=Content(role="user", parts=[Part(text="Improve the component.")]),
):
    events.append(event)

Benefits of Template Syntax

  1. Cleaner separation: Data lives in session state, logic in instruction
  2. ADK-native: Uses documented ADK patterns
  3. Testable: Session state can be inspected/mocked independently
  4. Consistent: Aligns with other ADK agents using template substitution