Utils
utils
Utility functions for trajectory extraction and processing.
This module provides utilities for extracting, redacting, and truncating trajectory data from ADK agent execution events.
The primary function is extract_trajectory, which orchestrates the complete extraction pipeline: raw data extraction → redaction → truncation → trajectory construction. Configuration is provided via TrajectoryConfig from the domain layer.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| extract_trajectory | Main trajectory extraction API with configuration support for redaction and truncation. |
Examples:
Extract with default configuration (redaction + truncation enabled):
from gepa_adk.utils import extract_trajectory
from gepa_adk.domain.types import TrajectoryConfig
# `events` is the list of ADK Event objects collected from an agent run
# Extract with defaults
trajectory = extract_trajectory(events, final_output="Response text")
# Custom configuration
config = TrajectoryConfig(
    include_tool_calls=True,
    redact_sensitive=True,
    sensitive_keys=("password", "api_key", "secret"),
    max_string_length=5000,
)
trajectory = extract_trajectory(events, config=config)
See Also
- gepa_adk.utils.events: Implementation of extraction utilities
- gepa_adk.domain.types.TrajectoryConfig: Configuration dataclass
- gepa_adk.domain.trajectory: Trajectory domain models
Note
These utilities are infrastructure concerns, not domain logic. They consume domain models but don't define them.
EncodingSafeProcessor
Sanitize strings for console encoding compatibility.
A structlog processor that ensures all string values in the event dict can be safely written to the console regardless of its encoding (cp1252 on Windows, UTF-8 on macOS/Linux).
The sanitization strategy:
1. Apply smart character replacements (preserve meaning).
2. Encode to the console encoding with the 'replace' error handler.
3. Decode back to a string.
This produces strings that are guaranteed to be writable to the console without raising UnicodeEncodeError.
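The three-step strategy can be sketched in plain Python as below. This is a minimal illustration, not the library's implementation: the replacement table is a hypothetical subset (the real REPLACEMENTS constant may map more or different characters), and the target encoding is passed in explicitly rather than detected from the console.

```python
# Minimal sketch of the replace -> encode -> decode strategy.
# HYPOTHETICAL replacement table; the real REPLACEMENTS constant may differ.
REPLACEMENTS: dict[str, str] = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis
}


def sanitize_string(s: str, encoding: str = "utf-8") -> str:
    """Return a version of ``s`` that can be encoded with ``encoding``."""
    # Step 1: meaning-preserving replacements run first.
    for char, ascii_equivalent in REPLACEMENTS.items():
        s = s.replace(char, ascii_equivalent)
    # Steps 2 and 3: anything still unencodable becomes "?" via the
    # 'replace' error handler, then the bytes are decoded back to str.
    return s.encode(encoding, errors="replace").decode(encoding)


print(sanitize_string("User said \u2018hello\u2019", "cp1252"))  # User said 'hello'
print(sanitize_string("snowman: \u2603", "cp1252"))  # snowman: ?
```

Because every replacement and every "?" substitute is ASCII, which round-trips unchanged through cp1252 and UTF-8, running the function a second time returns the same string, in line with the idempotence note below.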
| ATTRIBUTE | DESCRIPTION |
|---|---|
| REPLACEMENTS | Class constant mapping Unicode characters to ASCII equivalents that preserve semantic meaning. |
| encoding | The target console encoding detected at initialization. |
| sanitize_string | Sanitize a single string for console encoding. |
Examples:
Use as a structlog processor:
from gepa_adk.utils.encoding import EncodingSafeProcessor
processor = EncodingSafeProcessor()
event = {"event": "User said \u2018hello\u2019"}
result = processor(None, "info", event)
# result["event"] == "User said 'hello'"
Note
All sanitization is idempotent: processing already-sanitized strings produces identical output.
Source code in src/gepa_adk/utils/encoding.py
__init__
Initialize the processor with console encoding detection.
Detects the console encoding from sys.stdout.encoding, falling back to UTF-8 if detection fails (e.g., when stdout is redirected or unavailable).
Note
Console encoding is cached at initialization time to avoid repeated attribute lookups during log processing.
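The fallback described above amounts to a single attribute lookup. A sketch, assuming detection reads the stream's `encoding` attribute and treats a missing stream or a missing or empty value as UTF-8; the helper name `detect_console_encoding` is hypothetical.

```python
import io
import sys
from typing import Any


def detect_console_encoding(stream: Any) -> str:
    """Return ``stream.encoding``, or "utf-8" when it cannot be determined.

    Hypothetical helper. ``stream`` may be None (for example, sys.stdout
    under pythonw on Windows) or may lack an encoding (io.StringIO).
    """
    # getattr(None, "encoding", None) is None, so both cases fall back.
    return getattr(stream, "encoding", None) or "utf-8"


# Detect once and reuse, in line with the caching note.
CONSOLE_ENCODING = detect_console_encoding(sys.stdout)

print(detect_console_encoding(io.StringIO()))  # utf-8
```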
Source code in src/gepa_adk/utils/encoding.py
__call__
__call__(
    logger: Any,
    method_name: str,
    event_dict: MutableMapping[str, Any],
) -> MutableMapping[str, Any]
Process event dictionary, sanitizing all string values.
Implements the structlog processor protocol. Recursively sanitizes all string values in the event dictionary, including nested dicts and lists.
| PARAMETER | DESCRIPTION |
|---|---|
| logger | The wrapped logger instance (unused but required by the protocol). |
| method_name | Name of the log method called (e.g., "info", "debug"). Unused but required by the protocol. |
| event_dict | Mutable mapping of log event data to sanitize. |
| RETURNS | DESCRIPTION |
|---|---|
| MutableMapping[str, Any] | The event data with all string values sanitized for console encoding compatibility. |
Note
Original event_dict is not modified; a new dict is returned with sanitized values.
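A minimal sketch of the recursive walk described above, assuming nested dicts and lists are rebuilt (so the caller's event_dict is left untouched) and all other values pass through unchanged. `sanitize_value` and `process_event` are hypothetical names, and the `sanitize` argument stands in for `sanitize_string`.

```python
from typing import Any, Callable, MutableMapping


def sanitize_value(value: Any, sanitize: Callable[[str], str]) -> Any:
    """Recursively apply ``sanitize`` to every string inside ``value``."""
    if isinstance(value, str):
        return sanitize(value)
    if isinstance(value, dict):
        # Build a new dict rather than mutating the input in place.
        return {key: sanitize_value(item, sanitize) for key, item in value.items()}
    if isinstance(value, list):
        return [sanitize_value(item, sanitize) for item in value]
    return value  # numbers, None, tuples and other objects are left as-is


def process_event(
    logger: Any,
    method_name: str,
    event_dict: MutableMapping[str, Any],
    sanitize: Callable[[str], str],
) -> MutableMapping[str, Any]:
    """structlog-style (logger, method_name, event_dict) plus a sanitizer."""
    return {key: sanitize_value(item, sanitize) for key, item in event_dict.items()}
```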
Source code in src/gepa_adk/utils/encoding.py
sanitize_string
Sanitize a single string for console encoding.
Applies smart character replacements first to preserve meaning, then uses encode/decode with 'replace' error handler for any remaining unencodable characters.
| PARAMETER | DESCRIPTION |
|---|---|
| s | The string to sanitize. |
| RETURNS | DESCRIPTION |
|---|---|
| str | A string that is guaranteed to be encodable to the console encoding without raising UnicodeEncodeError. |
Examples:
Sanitize a string with smart quotes:
from gepa_adk.utils.encoding import EncodingSafeProcessor
processor = EncodingSafeProcessor()
result = processor.sanitize_string("User said ‘hello’")
assert result == "User said 'hello'"
Note
Order of operations: smart replacements run first to preserve semantic meaning, then encode/decode handles remaining characters.
Source code in src/gepa_adk/utils/encoding.py
SchemaValidationResult dataclass
Result of validating schema text.
This dataclass contains both the deserialized Pydantic model class and metadata about the schema structure.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| schema_class | The deserialized Pydantic BaseModel subclass. |
| class_name | Name of the class found in the schema text. |
| field_count | Number of fields defined in the schema. |
| field_names | Tuple of field names in the schema. |
Examples:
Validating schema text and inspecting results:
# schema_text defines a model with `name` and `value` fields
result = validate_schema_text(schema_text)
print(f"Class: {result.class_name}")
print(f"Fields: {result.field_names}")
instance = result.schema_class(name="test", value=42)
Creating a result directly (typically done by validate_schema_text):
result = SchemaValidationResult(
    schema_class=MySchema,
    class_name="MySchema",
    field_count=2,
    field_names=("name", "value"),
)
Source code in src/gepa_adk/utils/schema_utils.py
StateGuard
Validates and repairs mutated component_text to preserve ADK state tokens.
StateGuard ensures that required state injection tokens are preserved during component_text evolution, and escapes unauthorized new tokens introduced by reflection. Supports simple tokens ({name}), prefixed tokens ({app:settings}), optional tokens ({name?}), and combined formats ({app:config?}).
| ATTRIBUTE | DESCRIPTION |
|---|---|
| required_tokens | List of tokens that must always be present, including braces (e.g., ["{user_id}", "{app:settings}", "{name?}"]). |
| repair_missing | Whether to re-append missing tokens. Defaults to True. |
| escape_unauthorized | Whether to escape new unauthorized tokens. Defaults to True. |
| _token_pattern | Compiled regex for token detection (private). |
Examples:
Basic usage with token repair:
guard = StateGuard(required_tokens=["{user_id}", "{context}"])
original = "Hello {user_id}, context: {context}"
mutated = "Hello {user_id}, welcome!"
result = guard.validate(original, mutated)
# result == "Hello {user_id}, welcome!\n\n{context}"
Escaping unauthorized tokens:
guard = StateGuard(required_tokens=["{user_id}"])
original = "Process for {user_id}"
mutated = "Process for {user_id} with {malicious}"
result = guard.validate(original, mutated)
# result == "Process for {user_id} with {{malicious}}"
Note
All validation logic is stateless and operates on string inputs only. No external dependencies or I/O operations are performed.
Supported token formats:
- Simple tokens: {name}, {user_id}, {context}
- Prefixed tokens: {app:settings}, {user:api_key}, {temp:session}
- Optional tokens: {name?}, {user_id?}
- Combined: {app:config?}, {user:pref?}
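One way to detect the four formats listed above is a single regular expression. This is a sketch, not StateGuard's actual `_token_pattern`: it assumes identifiers follow Python naming rules, that the only prefixes are `app:`, `user:`, and `temp:` (the ones used in these examples), and that already-escaped `{{...}}` text should not match.

```python
import re

# HYPOTHETICAL pattern; StateGuard's real _token_pattern may differ.
TOKEN_PATTERN = re.compile(
    r"(?<!\{)"                  # not preceded by "{" (skip escaped "{{...")
    r"\{"
    r"(?:(?:app|user|temp):)?"  # optional prefix
    r"[A-Za-z_][A-Za-z0-9_]*"   # identifier
    r"\??"                      # optional-token marker
    r"\}"
    r"(?!\})"                   # not followed by "}" (skip "...}}")
)


def find_tokens(text: str) -> list[str]:
    """Return every unescaped state token in ``text``, braces included."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return TOKEN_PATTERN.findall(text)
```

Requiring an identifier right after the opening brace also keeps JSON-like text such as `{"a": 1}` from being mistaken for a token.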
Source code in src/gepa_adk/utils/state_guard.py
__init__
__init__(
    required_tokens: list[str] | None = None,
    repair_missing: bool = True,
    escape_unauthorized: bool = True,
) -> None
Initialize StateGuard with configuration.
| PARAMETER | DESCRIPTION |
|---|---|
| required_tokens | List of tokens that must be preserved, including braces (e.g., ["{user_id}", "{context}"]). Defaults to empty list. |
| repair_missing | If True, re-append missing required tokens. Defaults to True. |
| escape_unauthorized | If True, escape new unauthorized tokens. Defaults to True. |
Note
Configuration determines which tokens are protected and which behaviors are enabled. Both repair and escape are enabled by default for maximum safety.
Source code in src/gepa_adk/utils/state_guard.py
get_validation_summary
Summarize missing token repairs and unauthorized token escapes.
| PARAMETER | DESCRIPTION |
|---|---|
| original | The component_text before mutation (reference for tokens). |
| mutated | The component_text after mutation (pre-validation). |
| RETURNS | DESCRIPTION |
|---|---|
| tuple[list[str], list[str]] | Tuple of (repaired_tokens, escaped_tokens) as token strings with braces. |
Examples:
Summarize repairs and escapes before validation:
guard = StateGuard(required_tokens=["{user_id}"])
repaired, escaped = guard.get_validation_summary(
    "Hello {user_id}",
    "Hello {user_id} {malicious}",
)
# repaired == []
# escaped == ["{malicious}"]
Note
Reports what validate() would repair or escape for the same inputs and configuration, using the same token detection logic as validate(), without modifying the component_text.
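The summary logic can be sketched as follows, under stated assumptions: a token counts as repairable when it is required, present in the original, and absent from the mutation; it counts as unauthorized when it appears in the mutation but in neither the original nor required_tokens. The token regex and the function name `validation_summary` are hypothetical stand-ins for StateGuard internals.

```python
import re

# HYPOTHETICAL token pattern: {name}, {app:name}, {name?}, not {{escaped}}.
TOKEN_PATTERN = re.compile(r"(?<!\{)\{(?:(?:app|user|temp):)?[A-Za-z_]\w*\??\}(?!\})")


def validation_summary(
    original: str,
    mutated: str,
    required_tokens: list[str],
    repair_missing: bool = True,
    escape_unauthorized: bool = True,
) -> tuple[list[str], list[str]]:
    """Return (repaired_tokens, escaped_tokens) without changing any text."""
    original_tokens = set(TOKEN_PATTERN.findall(original))
    mutated_tokens = TOKEN_PATTERN.findall(mutated)
    required = set(required_tokens)

    repaired: list[str] = []
    if repair_missing:
        repaired = [
            token
            for token in required_tokens
            if token in original_tokens and token not in mutated_tokens
        ]

    escaped: list[str] = []
    if escape_unauthorized:
        for token in mutated_tokens:
            is_new = token not in original_tokens and token not in required
            if is_new and token not in escaped:  # report each token once
                escaped.append(token)
    return repaired, escaped
```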
Source code in src/gepa_adk/utils/state_guard.py
validate
Validate and repair mutated component_text.
Compares the original and mutated component_text to:
1. Re-append missing required tokens (if repair_missing=True).
2. Escape unauthorized new tokens (if escape_unauthorized=True).
| PARAMETER | DESCRIPTION |
|---|---|
| original | The component_text before mutation (reference for tokens). Used to determine which tokens were present initially. |
| mutated | The component_text after mutation (to be validated). This is the component_text that may have missing or unauthorized tokens. |
| RETURNS | DESCRIPTION |
|---|---|
| str | The mutated component_text with repairs and escapes applied. Missing required tokens are appended at the end, separated by a blank line (\n\n). Unauthorized new tokens are escaped by doubling braces: {token} becomes {{token}}. |
Examples:
Repair missing token:
guard = StateGuard(required_tokens=["{user_id}"])
result = guard.validate("Hello {user_id}", "Hello")
# result == "Hello\n\n{user_id}"
Escape unauthorized token:
guard = StateGuard(required_tokens=["{user_id}"])
result = guard.validate(
    "Process {user_id}", "Process {user_id} {malicious}"
)
# result == "Process {user_id} {{malicious}}"
Note
Only tokens present in both the original component_text and the required_tokens list are eligible for repair. New tokens not in required_tokens are escaped by default.
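A sketch of the repair-and-escape behavior described above, written as a standalone function. It assumes the token rules used in the examples (prefixes `app:`, `user:`, `temp:`; optional `?` suffix), appends each missing token after a blank line, and treats a token as unauthorized when it is in neither the original text nor required_tokens. `guard_text` and its pattern are hypothetical, not StateGuard's source.

```python
import re

# HYPOTHETICAL token pattern: {name}, {app:name}, {name?}, not {{escaped}}.
TOKEN_PATTERN = re.compile(r"(?<!\{)\{(?:(?:app|user|temp):)?[A-Za-z_]\w*\??\}(?!\})")


def guard_text(
    original: str,
    mutated: str,
    required_tokens: list[str],
    repair_missing: bool = True,
    escape_unauthorized: bool = True,
) -> str:
    """Escape new unauthorized tokens, then re-append missing required ones."""
    original_tokens = set(TOKEN_PATTERN.findall(original))
    required = set(required_tokens)
    result = mutated

    if escape_unauthorized:
        def escape(match: re.Match) -> str:
            token = match.group(0)
            if token in original_tokens or token in required:
                return token  # authorized: leave untouched
            return "{" + token + "}"  # "{x}" becomes "{{x}}"

        result = TOKEN_PATTERN.sub(escape, result)

    if repair_missing:
        present = set(TOKEN_PATTERN.findall(result))
        for token in required_tokens:
            # Only tokens that were in the original are eligible for repair.
            if token in original_tokens and token not in present:
                result += "\n\n" + token
    return result
```

Escaping before repairing is safe here because required tokens are never escaped, so the repair step sees the same required tokens either way.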
Source code in src/gepa_adk/utils/state_guard.py
deserialize_generate_config
deserialize_generate_config(
    yaml_text: str,
    existing: "GenerateContentConfig | None" = None,
) -> "GenerateContentConfig"
Parse YAML text into GenerateContentConfig, optionally merging with existing.
Parses the YAML text and creates a new GenerateContentConfig. If an existing config is provided, unspecified parameters retain existing values (merge).
| PARAMETER | DESCRIPTION |
|---|---|
| yaml_text | YAML string to parse. May be empty. |
| existing | Optional existing config to merge with. If provided, unspecified parameters retain existing values. |
| RETURNS | DESCRIPTION |
|---|---|
| 'GenerateContentConfig' | New GenerateContentConfig instance. |
| RAISES | DESCRIPTION |
|---|---|
| ConfigValidationError | If yaml_text is malformed YAML. |
Examples:
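from google.genai.types import GenerateContentConfig
from gepa_adk.utils.config_utils import deserialize_generate_config
# Parse standalone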
config = deserialize_generate_config("temperature: 0.5")
assert config.temperature == 0.5
# Merge with existing
existing = GenerateContentConfig(temperature=0.7, top_p=0.9)
merged = deserialize_generate_config("temperature: 0.5", existing)
assert merged.temperature == 0.5
assert merged.top_p == 0.9 # Preserved from existing
Note
Does NOT validate parameter constraints - use validate_generate_config() separately. This allows inspection of invalid values before rejection.
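The merge rule (explicit YAML keys win, everything else is kept from `existing`) can be illustrated without the google-genai dependency. In this sketch, `SamplingParams` is a hypothetical frozen dataclass standing in for GenerateContentConfig, and the YAML is assumed to have been parsed already (for example by `yaml.safe_load`, which returns None for empty input).

```python
from dataclasses import dataclass, replace
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class SamplingParams:
    """HYPOTHETICAL stand-in for GenerateContentConfig."""

    temperature: Optional[float] = None
    top_p: Optional[float] = None
    top_k: Optional[int] = None
    max_output_tokens: Optional[int] = None


def merge_params(
    parsed: Optional[Mapping[str, Any]],
    existing: Optional[SamplingParams] = None,
) -> SamplingParams:
    """Build a new config from parsed YAML, keeping unspecified values."""
    updates = dict(parsed or {})  # empty YAML parses to None
    if existing is None:
        return SamplingParams(**updates)
    # replace() copies `existing` and overrides only the given fields,
    # so the caller's object is never mutated. No range checks happen here.
    return replace(existing, **updates)
```

For a Pydantic v2 model, `model_copy(update=...)` plays a role similar to `dataclasses.replace`; whether gepa_adk uses it is not stated in this reference.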
Source code in src/gepa_adk/utils/config_utils.py
serialize_generate_config
Convert GenerateContentConfig to YAML string with parameter descriptions.
Serializes only the evolvable parameters (temperature, top_p, top_k, max_output_tokens, presence_penalty, frequency_penalty) with YAML comments describing each parameter.
| PARAMETER | DESCRIPTION |
|---|---|
| config | The GenerateContentConfig instance to serialize. Returns an empty string if None. |
| RETURNS | DESCRIPTION |
|---|---|
| str | YAML string with parameter descriptions as comments. Empty string if config is None or has no evolvable parameters set. |
Examples:
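from google.genai.types import GenerateContentConfig
from gepa_adk.utils.config_utils import serialize_generate_config
config = GenerateContentConfig(temperature=0.7, top_p=0.9)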
yaml_text = serialize_generate_config(config)
# yaml_text contains:
# # temperature: Controls randomness (0.0=deterministic, 2.0=creative)
# temperature: 0.7
# # top_p: Nucleus sampling threshold (0.0-1.0)
# top_p: 0.9
Note
Output is parseable by yaml.safe_load() and includes only evolvable parameters that have non-None values.
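A sketch of the serialization format shown in the example, assuming the output is one comment line followed by one `key: value` line per set parameter, in a fixed order. It reads attributes with getattr so it accepts any object exposing them (a GenerateContentConfig, or a SimpleNamespace as below). The comments for temperature and top_p are copied from the example above; the other four descriptions are placeholders, not the library's wording.

```python
from types import SimpleNamespace
from typing import Any

# Order is an ASSUMPTION; only the first two descriptions come from the docs.
EVOLVABLE_PARAMS: dict[str, str] = {
    "temperature": "Controls randomness (0.0=deterministic, 2.0=creative)",
    "top_p": "Nucleus sampling threshold (0.0-1.0)",
    # The four descriptions below are placeholders, not the library's wording.
    "top_k": "Number of highest-probability tokens considered",
    "max_output_tokens": "Maximum number of tokens in the response",
    "presence_penalty": "Penalty for tokens that already appeared",
    "frequency_penalty": "Penalty scaled by how often a token appeared",
}


def serialize_params(config: Any) -> str:
    """Render non-None evolvable parameters as commented YAML."""
    if config is None:
        return ""
    lines: list[str] = []
    for name, description in EVOLVABLE_PARAMS.items():
        value = getattr(config, name, None)
        if value is None:
            continue  # unset parameters are omitted entirely
        lines.append(f"# {name}: {description}")
        lines.append(f"{name}: {value}")
    return ("\n".join(lines) + "\n") if lines else ""


example = SimpleNamespace(temperature=0.7, top_p=0.9, top_k=None)
print(serialize_params(example))
```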
Source code in src/gepa_adk/utils/config_utils.py
validate_generate_config
Validate config dict against known parameter constraints.
Checks each parameter value against its defined constraints:
- temperature: 0.0 to 2.0
- top_p: 0.0 to 1.0
- top_k: > 0
- max_output_tokens: > 0
- presence_penalty: -2.0 to 2.0
- frequency_penalty: -2.0 to 2.0
Unknown parameters trigger a warning log but do NOT cause errors (they may be model-specific parameters).
| PARAMETER | DESCRIPTION |
|---|---|
| config_dict | Dictionary mapping parameter names to values. |
| RETURNS | DESCRIPTION |
|---|---|
| list[str] | List of validation error messages. An empty list means the config is valid. |
Examples:
from gepa_adk.utils.config_utils import validate_generate_config
# Valid config
errors = validate_generate_config({"temperature": 0.7, "top_p": 0.9})
assert errors == []
# Invalid config
errors = validate_generate_config({"temperature": 3.0})
assert len(errors) == 1
assert "temperature" in errors[0]
# Unknown param - no error, just warning
errors = validate_generate_config({"unknown_param": 42})
assert errors == [] # Warning logged, but no error
Note
Only validates known evolvable parameters. Unknown parameters are logged as warnings but accepted (may be model-specific).
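The constraint list above maps naturally onto a dictionary of bounds. This sketch assumes ranges are inclusive, that "> 0" means strictly positive, and that the logger name and message wording are placeholders; it mirrors the documented contract (errors for known but out-of-range values, only a warning for unknown keys) rather than reproducing gepa_adk's code.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger("config_sketch")  # hypothetical logger name

# (minimum, maximum, minimum_is_exclusive); None means unbounded.
CONSTRAINTS: dict[str, tuple[Optional[float], Optional[float], bool]] = {
    "temperature": (0.0, 2.0, False),
    "top_p": (0.0, 1.0, False),
    "top_k": (0.0, None, True),
    "max_output_tokens": (0.0, None, True),
    "presence_penalty": (-2.0, 2.0, False),
    "frequency_penalty": (-2.0, 2.0, False),
}


def validate_params(config_dict: dict[str, Any]) -> list[str]:
    """Return error messages; an empty list means the dict is valid."""
    errors: list[str] = []
    for name, value in config_dict.items():
        if name not in CONSTRAINTS:
            # Possibly a model-specific parameter: warn, do not reject.
            logger.warning("Unknown parameter %r accepted without validation", name)
            continue
        low, high, low_exclusive = CONSTRAINTS[name]
        too_low = low is not None and (value <= low if low_exclusive else value < low)
        too_high = high is not None and value > high
        if too_low or too_high:
            errors.append(f"{name}={value!r} is outside the allowed range")
    return errors
```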
Source code in src/gepa_adk/utils/config_utils.py
extract_final_output
Extract final output text from ADK event stream.
Extracts the final response text from ADK events, handling both event.actions.response_content (preferred) and event.content.parts (fallback) response sources. Filters out reasoning/thought content marked with part.thought=True.
| PARAMETER | DESCRIPTION |
|---|---|
| events | List of ADK Event objects from agent execution. |
| prefer_concatenated | If True, concatenate all non-thought text parts from all final response events. If False (default), return only the last non-thought text part from the last final response event. |
| RETURNS | DESCRIPTION |
|---|---|
| str | Extracted output text as a string. Returns an empty string if no valid output can be extracted (empty events, no final responses, all thought parts, or missing attributes). |
Examples:
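Basic extraction (default mode, returns the LAST final response):
output = extract_final_output(events)
Streaming/concatenation mode for CriticScorer:
# runner.run_async() yields events asynchronously; collect them into a list
events = [event async for event in runner.run_async(...)]
output = extract_final_output(events, prefer_concatenated=True)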
Note
Scans events for is_final_response()=True, filters thought parts, skips empty/None text, and handles missing attributes gracefully. Response source priority: response_content > content.parts. Default mode returns LAST final response (for multi-agent pipelines).
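The default and concatenated modes can be sketched as below. Assumptions: only `event.content.parts` is read (the `actions.response_content` path that the real function prefers is omitted), concatenated text is joined with no separator, and final events whose parts are all thoughts are skipped. `make_event` is a hypothetical stand-in for ADK's Event that supplies only the attributes this sketch reads.

```python
from types import SimpleNamespace
from typing import Any


def extract_final_output(events: list[Any], prefer_concatenated: bool = False) -> str:
    """Sketch: pull non-thought text from final-response events."""
    texts_per_event: list[list[str]] = []
    for event in events:
        is_final = getattr(event, "is_final_response", None)
        if not callable(is_final) or not is_final():
            continue
        content = getattr(event, "content", None)
        parts = getattr(content, "parts", None) or []
        texts = [
            part.text
            for part in parts
            if getattr(part, "text", None) and not getattr(part, "thought", False)
        ]
        if texts:  # events with only thoughts or empty text are skipped
            texts_per_event.append(texts)
    if not texts_per_event:
        return ""
    if prefer_concatenated:
        return "".join(text for texts in texts_per_event for text in texts)
    return texts_per_event[-1][-1]  # last text part of the last final event


def make_event(final: bool, *parts: tuple[str, bool]) -> SimpleNamespace:
    """Hypothetical Event stand-in built from (text, thought) pairs."""
    return SimpleNamespace(
        is_final_response=lambda: final,
        content=SimpleNamespace(
            parts=[SimpleNamespace(text=text, thought=thought) for text, thought in parts]
        ),
    )
```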
Source code in src/gepa_adk/utils/events.py
extract_trajectory
extract_trajectory(
    events: list[Any],
    final_output: str = "",
    error: str | None = None,
    config: TrajectoryConfig | None = None,
) -> ADKTrajectory
Extract trajectory from ADK execution events with optional processing.
Extracts tool calls, state deltas, and token usage from ADK event stream, applying redaction and truncation based on configuration.
| PARAMETER | DESCRIPTION |
|---|---|
| events | List of ADK Event objects from agent execution. |
| final_output | Final text response from the agent. Defaults to empty string. |
| error | Error message if execution failed. Defaults to None. |
| config | Extraction configuration. If None, uses TrajectoryConfig defaults. |
| RETURNS | DESCRIPTION |
|---|---|
| ADKTrajectory | ADKTrajectory with extracted and processed data according to config. |
Examples:
Basic extraction with defaults:
from gepa_adk.utils import extract_trajectory
events = [...]  # google.adk.events.Event objects from the ADK runner
trajectory = extract_trajectory(events, final_output="Response")
Custom configuration:
from gepa_adk.domain.types import TrajectoryConfig
config = TrajectoryConfig(
    include_tool_calls=True,
    include_state_deltas=False,
    redact_sensitive=True,
    max_string_length=5000,
)
trajectory = extract_trajectory(events, config=config)
With error:
trajectory = extract_trajectory(events, error="Agent execution failed")
Note
Extraction follows this order:
1. Extract raw data from events (tool calls, state, tokens).
2. Apply redaction if config.redact_sensitive is True.
3. Apply truncation if config.max_string_length is not None.
4. Build an immutable ADKTrajectory.
Empty events list is valid and returns empty trajectory.
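Steps 2 and 3 of that order can be sketched over plain nested data. This is not gepa_adk's code: `SketchConfig` models only three TrajectoryConfig fields with illustrative defaults, sensitive keys are matched case-insensitively by exact name, and the redaction placeholder and truncation marker strings are hypothetical. Extracting from events (step 1) and building the ADKTrajectory (step 4) are left out.

```python
from dataclasses import dataclass
from typing import Any, Optional

REDACTED = "[REDACTED]"  # hypothetical placeholder
TRUNCATION_MARKER = "...[truncated]"  # hypothetical marker


@dataclass(frozen=True)
class SketchConfig:
    """HYPOTHETICAL subset of TrajectoryConfig fields (illustrative defaults)."""

    redact_sensitive: bool = True
    sensitive_keys: tuple[str, ...] = ("password", "api_key", "secret")
    max_string_length: Optional[int] = 5000


def redact(value: Any, keys: frozenset[str]) -> Any:
    """Replace values stored under sensitive keys, recursing into containers."""
    if isinstance(value, dict):
        return {
            k: REDACTED if (isinstance(k, str) and k.lower() in keys) else redact(v, keys)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(item, keys) for item in value]
    return value


def truncate(value: Any, max_len: int) -> Any:
    """Shorten every string longer than max_len, recursing into containers."""
    if isinstance(value, str) and len(value) > max_len:
        return value[:max_len] + TRUNCATION_MARKER
    if isinstance(value, dict):
        return {k: truncate(v, max_len) for k, v in value.items()}
    if isinstance(value, list):
        return [truncate(item, max_len) for item in value]
    return value


def process(data: Any, config: SketchConfig) -> Any:
    """Apply the documented order: redaction first, then truncation."""
    if config.redact_sensitive:
        data = redact(data, frozenset(k.lower() for k in config.sensitive_keys))
    if config.max_string_length is not None:
        data = truncate(data, config.max_string_length)
    return data
```

Redacting before truncating means a secret is replaced wholesale, so truncation can never leave a visible prefix of it.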
Source code in src/gepa_adk/utils/events.py
deserialize_schema
Deserialize validated schema text to a Pydantic model class.
Convenience function that calls validate_schema_text and returns only the schema class. Use this when you only need the class and don't need the validation metadata.
| PARAMETER | DESCRIPTION |
|---|---|
| schema_text | Python source code defining a Pydantic model. |
| RETURNS | DESCRIPTION |
|---|---|
| type[BaseModel] | The deserialized Pydantic BaseModel subclass. |
| RAISES | DESCRIPTION |
|---|---|
| SchemaValidationError | If validation fails. |
Examples:
Deserialize schema text and create an instance:
from gepa_adk.utils.schema_utils import deserialize_schema
schema_text = '''
class EvolvedSchema(BaseModel):
    result: str
    confidence: float = Field(ge=0.0, le=1.0)
'''
EvolvedSchema = deserialize_schema(schema_text)
instance = EvolvedSchema(result="success", confidence=0.95)
Source code in src/gepa_adk/utils/schema_utils.py
serialize_pydantic_schema
Serialize a Pydantic model class to Python source code.
Uses inspect.getsource() to retrieve the original source code definition of the schema class. This preserves Field() constraints, defaults, and docstrings.
| PARAMETER | DESCRIPTION |
|---|---|
| schema_class | The Pydantic BaseModel subclass to serialize. |
| RETURNS | DESCRIPTION |
|---|---|
| str | Python source code string defining the class. |
| RAISES | DESCRIPTION |
|---|---|
| TypeError | If schema_class is not a BaseModel subclass or is an instance. |
| OSError | If source code cannot be retrieved (e.g., dynamic class). |
Examples:
Serialize a schema to source code:
from pydantic import BaseModel, Field
from gepa_adk.utils.schema_utils import serialize_pydantic_schema
class MySchema(BaseModel):
    name: str
    value: int = Field(ge=0)
text = serialize_pydantic_schema(MySchema)
# text contains the Python source code for MySchema
Source code in src/gepa_adk/utils/schema_utils.py
validate_schema_text
validate_schema_text(
    schema_text: str,
    *,
    allowed_namespace: dict[str, Any] | None = None,
) -> SchemaValidationResult
Validate schema text and return the deserialized class.
Performs validation in four stages:
1. Preprocessing: strip markdown code fences if present.
2. Syntax: parse Python syntax with ast.parse().
3. Structure: check for security violations and BaseModel inheritance.
4. Execution: execute in a controlled namespace and verify the class.
| PARAMETER | DESCRIPTION |
|---|---|
| schema_text | Python source code defining a Pydantic model. May be wrapped in markdown code fences, which are stripped before validation. |
| allowed_namespace | Keyword-only. Overrides the default namespace; if None, SCHEMA_NAMESPACE is used. |
| RETURNS | DESCRIPTION |
|---|---|
| SchemaValidationResult | SchemaValidationResult with the deserialized class and metadata. |
| RAISES | DESCRIPTION |
|---|---|
| SchemaValidationError | If validation fails at any stage. |
Examples:
Validate schema text and inspect results:
from gepa_adk.utils.schema_utils import validate_schema_text
schema_text = '''
class MySchema(BaseModel):
    name: str
    value: int
'''
result = validate_schema_text(schema_text)
assert result.class_name == "MySchema"
assert result.field_names == ("name", "value")
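The first three validation stages can be approximated with the standard library alone. In this sketch, the security policy (rejecting any import statement), the rule that the first class whose bases include the bare name `BaseModel` is the schema, and raising ValueError instead of SchemaValidationError are all assumptions; the execution stage, which needs Pydantic and the controlled namespace, is omitted.

```python
import ast

FENCE = "`" * 3  # built indirectly so this snippet can sit inside a Markdown fence


def strip_code_fences(text: str) -> str:
    """Stage 1: drop a surrounding Markdown fence (with optional language tag)."""
    stripped = text.strip()
    if stripped.startswith(FENCE) and stripped.endswith(FENCE):
        lines = stripped.splitlines()
        return "\n".join(lines[1:-1])  # remove opening and closing fence lines
    return text


def inspect_schema_source(schema_text: str) -> tuple[str, tuple[str, ...]]:
    """Stages 1-3: return (class_name, field_names) or raise ValueError."""
    source = strip_code_fences(schema_text)
    try:
        tree = ast.parse(source)  # stage 2: syntax
    except SyntaxError as exc:
        raise ValueError(f"invalid Python syntax: {exc}") from exc

    # Stage 3: structure. HYPOTHETICAL policy: no imports at all.
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("import statements are not allowed in schema text")

    for node in tree.body:
        if isinstance(node, ast.ClassDef) and any(
            isinstance(base, ast.Name) and base.id == "BaseModel" for base in node.bases
        ):
            # Annotated class attributes (with or without defaults) are fields.
            fields = tuple(
                stmt.target.id
                for stmt in node.body
                if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)
            )
            return node.name, fields
    raise ValueError("no class inheriting from BaseModel was found")
```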
Source code in src/gepa_adk/utils/schema_utils.py