Single-Agent Evolution¶
This document explains how single-agent evolution works: the roles of the critic and reflection agents, how trials are structured, and how these pieces fit together.
Overview¶
Single-agent evolution optimizes one agent's components (instruction, output_schema, generate_content_config) through iterative improvement.
from gepa_adk import evolve

result = await evolve(
    agent=my_agent,
    trainset=examples,
    critic=critic_agent,
)
Evolvable Components¶
| Component | Type | What It Controls |
|---|---|---|
| instruction | str | The agent's prompt/instructions |
| output_schema | type[BaseModel] | Pydantic model for structured output |
| generate_content_config | GenerateContentConfig | LLM parameters (temperature, etc.) |
By default, only instruction evolves. Specify others explicitly:
result = await evolve(
    agent=my_agent,
    trainset=examples,
    components=["instruction", "output_schema"],
)
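For the extra components to have something concrete to act on, the agent needs to define them. A minimal sketch of an agent that sets all three evolvable components up front (the Haiku schema, the temperature value, and the model name are illustrative assumptions, not part of the library):

from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm
from google.genai import types
from pydantic import BaseModel


class Haiku(BaseModel):
    lines: list[str]  # illustrative structured output


writer = LlmAgent(
    name="writer",
    model=LiteLlm(model="ollama_chat/llama3.2:latest"),
    instruction="Write a haiku about the given topic.",  # evolvable: instruction
    output_schema=Haiku,  # evolvable: output_schema
    generate_content_config=types.GenerateContentConfig(temperature=0.7),  # evolvable: generate_content_config
)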
The Critic Agent¶
The critic evaluates agent outputs and provides feedback for reflection.
Output Schemas¶
SimpleCriticOutput - Basic evaluation:
class SimpleCriticOutput(BaseModel):
    score: float = Field(..., ge=0.0, le=1.0)  # Required
    feedback: str = Field(...)  # Required
CriticOutput - Advanced with dimensions:
class CriticOutput(BaseModel):
    score: float = Field(..., ge=0.0, le=1.0)  # Required
    feedback: str = Field(default="")  # Optional
    dimension_scores: dict[str, float] = Field(default_factory=dict)
    actionable_guidance: str = Field(default="")
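As a sketch, a critic that targets this advanced schema could look like the following (assuming CriticOutput is exported by gepa_adk the same way SimpleCriticOutput is; the model and instruction wording are illustrative):

from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm
from gepa_adk import CriticOutput  # assumed to be exported like SimpleCriticOutput

advanced_critic = LlmAgent(
    name="advanced_critic",
    model=LiteLlm(model="ollama_chat/llama3.2:latest"),
    instruction=(
        "Evaluate the output for clarity, accuracy, and completeness. "
        "Return an overall score from 0.0 to 1.0, a score per dimension, "
        "feedback, and actionable guidance for improvement."
    ),
    output_schema=CriticOutput,
)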
Default Instructions¶
Simple critic:
Evaluate the quality of the output.
Provide:
- A score from 0.0 (poor) to 1.0 (excellent)
- Feedback explaining what works and what doesn't
Focus on clarity, accuracy, and completeness in your evaluation.
Advanced critic:
Evaluate the quality of the output across multiple dimensions.
Provide:
- An overall score from 0.0 (poor) to 1.0 (excellent)
- Feedback explaining what works and what doesn't
- Dimension scores for specific quality aspects you identify
- Actionable guidance for concrete improvement steps
Critic Requirements¶
| Requirement | Details |
|---|---|
| Output Format | JSON with score field (required) |
| Score Range | 0.0 to 1.0 (float) |
| Feedback | Recommended for reflection quality |
| Normalization | normalize_feedback() converts to trial format |
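Putting these requirements together, a minimal valid critic response looks like this (values are illustrative):

{
  "score": 0.72,
  "feedback": "Accurate and well organized, but the answer skips the edge case asked about in the prompt."
}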
The Reflection Agent¶
The reflection agent analyzes trial results and proposes improved component text.
Default Instruction¶
## Component Text to Improve
{component_text}
## Trials
{trials}
Propose an improved version of the component text based on the trials above.
Return ONLY the improved component text, nothing else.
Component-Aware Reflection¶
Different components get specialized reflection agents:
| Component | Factory | Special Tools |
|---|---|---|
| output_schema | create_schema_reflection_agent | validate_output_schema tool |
| generate_content_config | create_config_reflection_agent | None (YAML validation) |
| default | create_text_reflection_agent | None |
Reflection Requirements¶
| Requirement | Details |
|---|---|
| Placeholders | Must accept {component_text} and {trials} |
| Output Key | Must use output_key="proposed_component_text" |
| Return Format | Plain text (the improved component) |
| Session State | Receives component_text (str) and trials (JSON string) |
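A sketch of a custom reflection agent that meets these requirements (the model and prompt wording are illustrative; wiring it into evolve() is not shown here):

from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm

instruction_reflector = LlmAgent(
    name="instruction_reflector",
    model=LiteLlm(model="ollama_chat/llama3.2:latest"),
    instruction=(
        "## Component Text to Improve\n{component_text}\n\n"
        "## Trials\n{trials}\n\n"
        "Rewrite the component text so it addresses the weaknesses seen in "
        "the low-scoring trials. Return ONLY the improved text."
    ),
    output_key="proposed_component_text",  # required output key
)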
Trial Structure¶
Each trial record passed to reflection contains:
{
  "feedback": {
    "score": 0.85,                  # From critic
    "feedback_text": "...",         # From critic
    "dimension_scores": {...},      # Optional
    "actionable_guidance": "..."    # Optional
  },
  "trajectory": {
    "input": "...",                 # Original task input
    "output": "...",                # Agent's generated output
    "trace": {...}                  # ADK execution trace
  }
}
The trajectory captures execution context—tool calls, state changes, token usage—that helps the reflection agent understand how the agent arrived at its output.
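For illustration only, this is roughly how the default reflection prompt is assembled from the session state described above (the library performs the substitution itself; plain str.format is used here just to show the shape):

import json

REFLECTION_TEMPLATE = """\
## Component Text to Improve
{component_text}

## Trials
{trials}

Propose an improved version of the component text based on the trials above.
Return ONLY the improved component text, nothing else.
"""

trials = [
    {
        "feedback": {"score": 0.85, "feedback_text": "Vivid imagery, but line two has eight syllables."},
        "trajectory": {"input": "autumn leaves", "output": "Red leaves drift and fall / ..."},
    }
]

prompt = REFLECTION_TEMPLATE.format(
    component_text="Write a haiku about the given topic.",
    trials=json.dumps(trials, indent=2),  # trials arrive as a JSON string
)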
How Critic + Reflection Work Together¶
┌─────────────┐    output    ┌─────────────┐    score,     ┌─────────────────┐
│    Agent    │─────────────▶│   Critic    │───feedback───▶│  Trial Builder  │
│ (evolving)  │              │  (scoring)  │               │                 │
└─────────────┘              └─────────────┘               └────────┬────────┘
                                                                    │
                                                                    │ trials
                                                                    ▼
┌─────────────┐   proposed   ┌─────────────────┐                    │
│  Component  │◀────text─────│   Reflection    │◀───────────────────┘
│   Handler   │              │      Agent      │
└─────────────┘              └─────────────────┘
The flow:
- Agent produces output from input
- Critic scores output → {score, feedback, dimensions, guidance}
- Trial Builder combines into {feedback, trajectory}
- Reflection Agent receives {component_text, trials} → proposes improvement
- Component Handler applies proposed text to candidate
- Engine re-evaluates → accepts if score improves
Example: Complete Single-Agent Evolution¶
from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm
from gepa_adk import evolve, SimpleCriticOutput
# Agent to evolve
writer = LlmAgent(
    name="writer",
    model=LiteLlm(model="ollama_chat/llama3.2:latest"),
    instruction="Write a haiku about the given topic.",
)
# Critic to evaluate
critic = LlmAgent(
    name="critic",
    model=LiteLlm(model="ollama_chat/llama3.2:latest"),
    instruction="""
    Evaluate this haiku for:
    - Correct 5-7-5 syllable structure
    - Imagery and emotional impact
    - Connection to the topic
    Score 0.0-1.0 and explain your reasoning.
    """,
    output_schema=SimpleCriticOutput,
)
# Training examples
trainset = [
    {"topic": "autumn leaves"},
    {"topic": "morning coffee"},
    {"topic": "city rain"},
]
# Run evolution
result = await evolve(
    agent=writer,
    trainset=trainset,
    critic=critic,
)
print(f"Improved: {result.original_score:.2f} → {result.final_score:.2f}")
print(f"New instruction: {result.evolved_components['instruction']}")
Next Steps¶
- Multi-Agent Evolution - How multiple agents evolve together
- Workflow Agents - How workflow structures evolve
- Critic Agents Guide - Practical guide to building critics