
Adapters

adapters

Adapters layer - External implementations of ports.

Adapters connect the domain logic to external systems (Google ADK, LiteLLM, etc.). Each adapter implements one or more protocol interfaces from the ports layer.

| Attribute | Type | Description |
| --- | --- | --- |
| `ADKAdapter` | class | AsyncGEPAAdapter implementation for Google ADK agents. |
| `TrialBuilder` | class | Shared utility for building trial records in reflection datasets. |

Examples:

Basic usage with Google ADK agent:

```python
from google.adk.agents import LlmAgent

from gepa_adk.adapters import ADKAdapter
from gepa_adk.adapters.agent_executor import AgentExecutor

agent = LlmAgent(name="helper", model="gemini-2.5-flash")
reflector = LlmAgent(name="reflector", model="gemini-2.5-flash")
adapter = ADKAdapter(agent, my_scorer, AgentExecutor(), reflection_agent=reflector)
result = await adapter.evaluate(batch, candidate)
```
Note

This layer ONLY contains adapters - they import from ports/ and domain/ but never the reverse. This maintains hexagonal architecture boundaries.
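
To illustrate the dependency rule, a new adapter would depend only on abstractions exported by the ports and domain layers. The sketch below assumes a hypothetical module path for the Scorer protocol (its real location may differ) and is not the package's verified layout:

```python
# Hypothetical module path, shown only to illustrate the import direction:
# adapters import from ports (and domain), never the reverse.
from gepa_adk.ports.scorer import Scorer


class MyServiceAdapter:
    """Hypothetical adapter that depends only on the ports-layer protocol."""

    def __init__(self, scorer: Scorer) -> None:
        self.scorer = scorer

    async def score_output(self, input_text: str, output: str) -> float:
        result = await self.scorer.async_score(input_text, output, None)
        # async_score may return a bare score or a (score, metadata) tuple.
        return result[0] if isinstance(result, tuple) else float(result)
```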

ADKAdapter

ADK implementation of AsyncGEPAAdapter protocol.

Bridges GEPA evaluation patterns to Google ADK's agent/runner architecture, enabling evolutionary optimization of ADK agents through instruction mutation and reflective learning.

| Attribute | Type | Description |
| --- | --- | --- |
| `agent` | `LlmAgent` | The ADK LlmAgent to evaluate with different candidate instructions. |
| `scorer` | `Scorer` | Scoring implementation for evaluating agent outputs. |
| `max_concurrent_evals` | `int` | Maximum number of concurrent evaluations to run in parallel. |
| `trajectory_config` | `TrajectoryConfig` | Configuration for trajectory extraction behavior (redaction, truncation, feature selection). |
| `_session_service` | `BaseSessionService` | Session service for managing agent state isolation. |
| `_app_name` | `str` | Application name used for session management. |
| `_proposer` | `AsyncReflectiveMutationProposer` | Mutation proposer for generating improved instructions via LLM reflection. |
| `_logger` | `BoundLogger` | Bound logger with adapter context for structured logging. |

Examples:

Basic adapter setup:

```python
from google.adk.agents import LlmAgent
from gepa_adk.adapters import ADKAdapter
from gepa_adk.adapters.agent_executor import AgentExecutor

agent = LlmAgent(
    name="helper",
    model="gemini-2.5-flash",
    instruction="Be helpful and concise",
)
scorer = MyScorer()  # Implements Scorer protocol
executor = AgentExecutor()
reflector = LlmAgent(name="reflector", model="gemini-2.5-flash")
adapter = ADKAdapter(agent, scorer, executor, reflection_agent=reflector)

# Evaluate with candidate instruction
batch = [{"input": "What is 2+2?", "expected": "4"}]
candidate = {"instruction": "Be very precise with math"}
result = await adapter.evaluate(batch, candidate)
```
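
The MyScorer placeholder above stands for any object that satisfies the Scorer protocol. Below is a minimal sketch, assuming only what the adapter itself checks and calls: a `score` method plus an `async_score(input, output, expected)` that may return a bare score or a (score, metadata) tuple; the exact protocol signature lives in the ports layer.

```python
from typing import Any


class ExactMatchScorer:
    """Hypothetical scorer: full credit when the output matches the expected answer."""

    def score(self, input_text: str, output: str, expected: Any = None) -> float:
        return 1.0 if expected is not None and output.strip() == str(expected) else 0.0

    async def async_score(
        self, input_text: str, output: str, expected: Any = None
    ) -> tuple[float, dict[str, Any]]:
        value = self.score(input_text, output, expected)
        # The metadata dict is surfaced via EvaluationBatch.metadata and folded
        # into each trial's feedback during reflection.
        return value, {"feedback_text": "exact match" if value else "differs from expected"}
```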
Note

Adheres to AsyncGEPAAdapter[dict[str, Any], ADKTrajectory, str] protocol. All methods are async and follow ADK's async-first patterns.

Source code in src/gepa_adk/adapters/adk_adapter.py
```python
class ADKAdapter:
    """ADK implementation of AsyncGEPAAdapter protocol.

    Bridges GEPA evaluation patterns to Google ADK's agent/runner architecture,
    enabling evolutionary optimization of ADK agents through instruction mutation
    and reflective learning.

    Attributes:
        agent (LlmAgent): The ADK LlmAgent to evaluate with different candidate
            instructions.
        scorer (Scorer): Scoring implementation for evaluating agent outputs.
        max_concurrent_evals (int): Maximum number of concurrent evaluations
            to run in parallel.
        trajectory_config (TrajectoryConfig): Configuration for trajectory
            extraction behavior (redaction, truncation, feature selection).
        _session_service (BaseSessionService): Session service for managing
            agent state isolation.
        _app_name (str): Application name used for session management.
        _proposer (AsyncReflectiveMutationProposer): Mutation proposer for
            generating improved instructions via LLM reflection.
        _logger (structlog.BoundLogger): Bound logger with adapter context for
            structured logging.

    Examples:
        Basic adapter setup:

        ```python
        from google.adk.agents import LlmAgent
        from gepa_adk.adapters import ADKAdapter
        from gepa_adk.adapters.agent_executor import AgentExecutor

        agent = LlmAgent(
            name="helper",
            model="gemini-2.5-flash",
            instruction="Be helpful and concise",
        )
        scorer = MyScorer()  # Implements Scorer protocol
        executor = AgentExecutor()
        reflector = LlmAgent(name="reflector", model="gemini-2.5-flash")
        adapter = ADKAdapter(agent, scorer, executor, reflection_agent=reflector)

        # Evaluate with candidate instruction
        batch = [{"input": "What is 2+2?", "expected": "4"}]
        candidate = {"instruction": "Be very precise with math"}
        result = await adapter.evaluate(batch, candidate)
        ```

    Note:
        Adheres to AsyncGEPAAdapter[dict[str, Any], ADKTrajectory, str] protocol.
        All methods are async and follow ADK's async-first patterns.
    """

    def __init__(
        self,
        agent: LlmAgent,
        scorer: Scorer,
        executor: AgentExecutorProtocol,
        max_concurrent_evals: int = 5,
        session_service: BaseSessionService | None = None,
        app_name: str = "gepa_adk_eval",
        trajectory_config: TrajectoryConfig | None = None,
        proposer: AsyncReflectiveMutationProposer | None = None,
        reflection_agent: LlmAgent | None = None,
        reflection_output_field: str | None = None,
        schema_constraints: SchemaConstraints | None = None,
        video_service: VideoBlobServiceProtocol | None = None,
    ) -> None:
        """Initialize the ADK adapter with agent and scorer.

        Args:
            agent: The ADK LlmAgent to evaluate with different instructions.
            scorer: Scorer implementation for evaluating agent outputs.
            executor: AgentExecutorProtocol implementation for unified agent execution.
                The executor handles session management and execution, enabling feature
                parity across all agent types.
            max_concurrent_evals: Maximum number of concurrent evaluations to run
                in parallel. Must be at least 1. Defaults to 5.
            session_service: Optional session service for state management.
                If None, creates an InMemorySessionService.
            app_name: Application name for session identification.
            trajectory_config: Configuration for trajectory extraction behavior.
                If None, uses TrajectoryConfig defaults (secure, all features enabled).
            proposer: Optional mutation proposer for generating improved instructions
                via LLM reflection. If provided, takes precedence over reflection_agent.
            reflection_agent: ADK LlmAgent to use for reflection operations.
                Either this or proposer must be provided. When provided, creates an
                ADK-based reflection function and passes it to a new proposer.
            reflection_output_field: Field name to extract from structured output when
                reflection_agent has an output_schema. When the reflection agent returns
                structured output (dict), this specifies which field contains the proposed
                text. For schema evolution, use "class_definition" with a SchemaProposal
                output_schema. Only used when reflection_agent is provided.
            schema_constraints: Optional SchemaConstraints for output_schema evolution.
                When provided, proposed schema mutations are validated against these
                constraints. Mutations that violate constraints (e.g., remove required
                fields) are rejected and the original schema is preserved.
            video_service: Optional VideoBlobServiceProtocol for multimodal input support.
                When provided, enables processing of trainset examples with 'videos' field.
                If None, defaults to a new VideoBlobService instance.

        Raises:
            TypeError: If agent is not an LlmAgent instance.
            TypeError: If scorer does not satisfy Scorer protocol.
            TypeError: If reflection_agent is provided but not an LlmAgent instance.
            ValueError: If app_name is empty string or max_concurrent_evals < 1.
            ValueError: If neither proposer nor reflection_agent is provided.

        Examples:
            Basic setup with reflection agent:

            ```python
            from gepa_adk.adapters.agent_executor import AgentExecutor

            reflection_agent = LlmAgent(name="reflector", model="gemini-2.5-flash")
            executor = AgentExecutor()
            adapter = ADKAdapter(
                agent, scorer, executor, reflection_agent=reflection_agent
            )
            ```

            With custom trajectory configuration:

            ```python
            config = TrajectoryConfig(
                redact_sensitive=True,
                max_string_length=5000,
            )
            executor = AgentExecutor()
            adapter = ADKAdapter(
                agent,
                scorer,
                executor,
                reflection_agent=reflection_agent,
                trajectory_config=config,
            )
            ```

            With shared session service:

            ```python
            from google.adk.sessions import InMemorySessionService
            from gepa_adk.adapters.agent_executor import AgentExecutor

            session_service = InMemorySessionService()
            executor = AgentExecutor(session_service=session_service)
            adapter = ADKAdapter(
                agent,
                scorer,
                executor,
                reflection_agent=reflection_agent,
                session_service=session_service,
            )
            ```

        Note:
            Caches the agent's original instruction and restores it after
            each evaluation to ensure no side effects between evaluations.
        """
        # Type validation
        if not isinstance(agent, LlmAgent):
            raise TypeError(f"agent must be LlmAgent, got {type(agent)}")

        # Scorer protocol check (runtime_checkable)
        if not hasattr(scorer, "score") or not hasattr(scorer, "async_score"):
            raise TypeError(
                f"scorer must implement Scorer protocol, got {type(scorer)}"
            )

        if not app_name or not app_name.strip():
            raise ValueError("app_name cannot be empty")

        if max_concurrent_evals < 1:
            raise ValueError(
                f"max_concurrent_evals must be at least 1, got {max_concurrent_evals}"
            )

        # Validate reflection_agent if provided
        if reflection_agent is not None and not isinstance(reflection_agent, LlmAgent):
            raise TypeError(
                f"reflection_agent must be LlmAgent, got {type(reflection_agent)}"
            )

        self.agent = agent
        self.scorer = scorer
        self.max_concurrent_evals = max_concurrent_evals
        self.trajectory_config = trajectory_config or TrajectoryConfig()
        self._session_service = session_service or InMemorySessionService()
        self._app_name = app_name.strip()

        # Store executor for unified execution (T048)
        self._executor = executor

        # Store video service for multimodal input support
        self._video_service = video_service or VideoBlobService()

        # Store and apply schema constraints to output_schema handler
        self._schema_constraints = schema_constraints
        if schema_constraints is not None:
            handler = get_handler(COMPONENT_OUTPUT_SCHEMA)
            if isinstance(handler, OutputSchemaHandler):
                handler.set_constraints(schema_constraints)

        # Bind logger with adapter context
        self._logger = logger.bind(
            adapter="ADKAdapter",
            agent_name=self.agent.name,
            app_name=self._app_name,
        )

        # Initialize trial builder for reflective dataset construction
        self._trial_builder = TrialBuilder()

        # Create proposer with clear precedence: proposer overrides reflection_agent.
        if proposer is not None:
            if reflection_agent is not None:
                self._logger.warning(
                    "adapter.proposer.precedence",
                    message="proposer parameter takes precedence over reflection_agent",
                )
            self._proposer = proposer
        elif reflection_agent is not None:
            # Create ADK reflection function and pass to proposer
            adk_reflection_fn = create_adk_reflection_fn(
                reflection_agent,
                executor=self._executor,
                session_service=self._session_service,
                output_field=reflection_output_field,
            )
            self._proposer = AsyncReflectiveMutationProposer(
                adk_reflection_fn=adk_reflection_fn
            )
        else:
            raise ValueError(
                "Either proposer or reflection_agent must be provided. "
                "Use reflection_agent with an ADK LlmAgent for reflection operations."
            )

        self._logger.info("adapter.initialized")

    def cleanup(self) -> None:
        """Clean up adapter resources and clear handler constraints.

        Clears any schema constraints set on the OutputSchemaHandler to prevent
        constraint leakage between evolution runs. Should be called when the
        adapter is no longer needed.

        Note:
            OutputSchemaHandler is a singleton, so constraints set during one
            evolution run could affect subsequent runs if not cleared.
        """
        if self._schema_constraints is not None:
            handler = get_handler(COMPONENT_OUTPUT_SCHEMA)
            if isinstance(handler, OutputSchemaHandler):
                handler.set_constraints(None)
            self._logger.debug("adapter.cleanup.constraints_cleared")

    async def evaluate(
        self,
        batch: list[dict[str, Any]],
        candidate: dict[str, str],
        capture_traces: bool = False,
    ) -> EvaluationBatch[ADKTrajectory, str]:
        """Evaluate agent with candidate instruction over a batch of inputs.

        Args:
            batch: List of input examples, each with "input" key and optional
                "expected" key for scoring.
            candidate: Component name to text mapping. If "instruction" key
                is present, it overrides the agent's instruction.
            capture_traces: Whether to capture execution traces (tool calls,
                state deltas, token usage).

        Returns:
            EvaluationBatch containing outputs, scores, and optional trajectories.

        Examples:
            Basic evaluation without traces:

            ```python
            batch = [
                {"input": "What is 2+2?", "expected": "4"},
                {"input": "Capital of France?", "expected": "Paris"},
            ]
            candidate = {"instruction": "Be concise"}
            result = await adapter.evaluate(batch, candidate)
            assert len(result.outputs) == 2
            assert len(result.scores) == 2
            ```

            With trace capture:

            ```python
            result = await adapter.evaluate(batch, candidate, capture_traces=True)
            assert result.trajectories is not None
            assert len(result.trajectories) == len(batch)
            ```

        Note:
            Original instruction is restored after evaluation completes,
            even if an exception occurs during evaluation.
            Evaluations run in parallel with concurrency controlled by
            max_concurrent_evals parameter. Results maintain input order
            despite parallel execution.
        """
        self._logger.info(
            "adapter.evaluate.start",
            batch_size=len(batch),
            max_concurrent=self.max_concurrent_evals,
            capture_traces=capture_traces,
        )

        # Handle empty batch case
        if not batch:
            self._logger.info("adapter.evaluate.complete", batch_size=0)
            return EvaluationBatch(outputs=[], scores=[], trajectories=None)

        # Apply candidate components (instruction and/or output_schema) and save originals
        originals = self._apply_candidate(candidate)

        try:
            # Create semaphore to limit concurrent evaluations
            semaphore = asyncio.Semaphore(self.max_concurrent_evals)

            # Create tasks for all examples
            tasks = [
                self._eval_single_with_semaphore(
                    example=example,
                    example_index=i,
                    candidate=candidate,
                    capture_traces=capture_traces,
                    semaphore=semaphore,
                )
                for i, example in enumerate(batch)
            ]

            # Execute all tasks in parallel with exception handling
            results = await asyncio.gather(*tasks, return_exceptions=True)

            # Process results and maintain order
            outputs: list[str] = []
            scores: list[float] = []
            trajectories: list[ADKTrajectory] | None = [] if capture_traces else None
            metadata_list: list[dict[str, Any]] = []
            inputs: list[str] = []  # Collect inputs for reflection

            successful = 0
            failed = 0

            for i, result in enumerate(results):
                # Collect input text for this example
                inputs.append(batch[i].get("input", ""))
                if isinstance(result, Exception):
                    # Handle exception case
                    self._logger.warning(
                        "adapter.evaluate.example.error",
                        example_index=i,
                        error=str(result),
                    )
                    outputs.append("")
                    scores.append(0.0)
                    metadata_list.append({})
                    failed += 1

                    if capture_traces:
                        error_trajectory = self._build_trajectory(
                            events=[],
                            final_output="",
                            error=str(result),
                        )
                        trajectories.append(error_trajectory)  # type: ignore
                else:
                    # Unpack success: (output_text, score, trajectory_or_none, metadata_or_none)
                    # After isinstance check, result is guaranteed to be the tuple type
                    output_text, score, trajectory, metadata = result  # type: ignore[misc]
                    outputs.append(output_text)
                    scores.append(score)
                    metadata_list.append(metadata if metadata is not None else {})
                    successful += 1

                    if capture_traces and trajectory is not None:
                        trajectories.append(trajectory)  # type: ignore

            avg_score = sum(scores) / len(scores) if scores else 0.0

            self._logger.info(
                "adapter.evaluate.complete",
                batch_size=len(batch),
                successful=successful,
                failed=failed,
                avg_score=avg_score,
            )

            # Only include metadata if at least one example has non-empty metadata
            # Check if any metadata dict has content (not just empty dicts)
            has_metadata = any(
                meta and isinstance(meta, dict) and meta for meta in metadata_list
            )
            final_metadata = metadata_list if has_metadata else None

            return EvaluationBatch(
                outputs=outputs,
                scores=scores,
                trajectories=trajectories,
                metadata=final_metadata,
                inputs=inputs,
            )

        finally:
            # Always restore original components via registry dispatch
            self._restore_agent(originals)

    def _apply_candidate(self, candidate: dict[str, str]) -> dict[str, Any]:
        """Apply candidate components to agent via registry dispatch.

        Args:
            candidate: Component name to evolved text mapping.
                Example: {"instruction": "Be helpful", "output_schema": "class ..."}

        Returns:
            Dictionary mapping component names to their original values.
            Original values are typed per handler (str for instruction,
            type[BaseModel] or None for output_schema).

        Raises:
            KeyError: If candidate contains unregistered component name.

        Note:
            Original values are captured before overwriting via ComponentHandler
            registry dispatch instead of hardcoded if/elif logic. Each handler's
            apply() method sets the new value and returns the original for
            later restoration.

        Examples:
            >>> originals = adapter._apply_candidate(
            ...     {
            ...         "instruction": "New prompt",
            ...         "output_schema": "class X(BaseModel): ...",
            ...     }
            ... )
            >>> originals
            {"instruction": "Original prompt", "output_schema": <class 'OriginalSchema'>}
        """
        originals: dict[str, Any] = {}

        for component_name, value in candidate.items():
            handler = get_handler(component_name)
            originals[component_name] = handler.apply(self.agent, value)
            self._logger.debug(
                "adapter.component.applied",
                component=component_name,
                value_preview=value[:50] if value else "",
            )

        return originals

    def _restore_agent(self, originals: dict[str, Any]) -> None:
        """Restore agent to original state via registry dispatch.

        Args:
            originals: Component name to original value mapping,
                as returned by _apply_candidate().

        Raises:
            KeyError: If originals contains unregistered component name.

        Note:
            Operates via ComponentHandler registry for dispatch. Each handler's
            restore() method reinstates the original value. Should always
            be called in finally block to ensure restoration even if
            evaluation fails.

        Examples:
            >>> adapter._restore_agent(
            ...     {"instruction": "Original prompt", "output_schema": OriginalSchema}
            ... )
            # agent.instruction == "Original prompt"
            # agent.output_schema == OriginalSchema
        """
        for component_name, original in originals.items():
            handler = get_handler(component_name)
            handler.restore(self.agent, original)

        self._logger.debug(
            "adapter.agent.restored",
            components=list(originals.keys()),
        )

    def _extract_tool_calls(self, events: list[Any]) -> list[ToolCallRecord]:
        """Extract tool call records from ADK Event stream.

        Args:
            events: List of ADK Event objects from runner.

        Returns:
            List of ToolCallRecord instances with tool name, arguments,
            and result (if available).

        Note:
            Scans function_call and function_response parts from
            Event.actions.function_calls if present. Tool calls without
            responses are still recorded. Handles both real ADK Events
            and test mocks gracefully.
        """
        tool_calls: list[ToolCallRecord] = []

        for event in events:
            # Check if event has function_calls in actions
            if hasattr(event, "actions") and hasattr(event.actions, "function_calls"):
                function_calls = event.actions.function_calls
                # function_calls could be None, a list, or a single object
                if function_calls:
                    # Ensure it's iterable
                    if not hasattr(function_calls, "__iter__"):
                        function_calls = [function_calls]

                    try:
                        for fc in function_calls:
                            # Extract name - be defensive for mocks and real objects
                            name = "unknown"

                            # Try direct access first
                            if hasattr(fc, "name"):
                                try:
                                    name_val = fc.name
                                    # Real string value
                                    if isinstance(name_val, str) and name_val != "name":
                                        name = name_val
                                    # Check if it's a Mock (has _mock_name attribute)
                                    elif hasattr(name_val, "_mock_name"):
                                        # Extract actual mock name, not the attribute name
                                        mock_name = str(name_val._mock_name)
                                        # Mock names often have format "mock.attribute.name"
                                        if "." in mock_name:
                                            name = mock_name.split(".")[-1]
                                        else:
                                            name = mock_name
                                except Exception as exc:
                                    logger.debug(
                                        "Failed to extract tool call name; using fallback.",
                                        error=str(exc),
                                        function_call_repr=repr(fc),
                                    )

                            # Extract arguments
                            args = getattr(fc, "args", {})
                            if not isinstance(args, dict):
                                args = {}

                            tool_calls.append(
                                ToolCallRecord(
                                    name=name,
                                    arguments=args,
                                    result=None,  # Will be populated if response found
                                    timestamp=0.0,  # Not tracked in current impl
                                )
                            )
                    except (TypeError, AttributeError):
                        # function_calls not iterable or other issues, skip
                        pass

        return tool_calls

    def _extract_state_deltas(self, events: list[Any]) -> list[dict[str, Any]]:
        """Extract state change records from ADK Event stream.

        Args:
            events: List of ADK Event objects from runner.

        Returns:
            List of dictionaries containing state delta information.
            Each dict has 'key' and 'value' fields from Event.state_delta.

        Note:
            Skips events with None state_delta attributes. State deltas
            capture changes to session or agent state during execution.
        """
        state_deltas: list[dict[str, Any]] = []

        for event in events:
            if hasattr(event, "state_delta") and event.state_delta is not None:
                state_deltas.append(
                    {
                        "key": getattr(event.state_delta, "key", "unknown"),
                        "value": getattr(event.state_delta, "value", None),
                    }
                )

        return state_deltas

    def _extract_token_usage(self, events: list[Any]) -> TokenUsage | None:
        """Extract token usage metadata from ADK Event stream.

        Args:
            events: List of ADK Event objects from runner.

        Returns:
            TokenUsage instance if usage metadata found, None otherwise.

        Note:
            Searches for usage_metadata on final response events. Returns
            the last found usage data (most complete metrics).
        """
        usage_data = None

        for event in events:
            if hasattr(event, "usage_metadata") and event.usage_metadata is not None:
                metadata = event.usage_metadata
                usage_data = TokenUsage(
                    input_tokens=getattr(metadata, "input_tokens", 0),
                    output_tokens=getattr(metadata, "output_tokens", 0),
                    total_tokens=getattr(metadata, "total_tokens", 0),
                )

        return usage_data

    def _build_trajectory(
        self,
        events: list[Any],
        final_output: str,
        error: str | None = None,
    ) -> ADKTrajectory:
        """Assemble complete trajectory from event stream.

        Args:
            events: List of ADK Event objects collected during execution.
            final_output: The final text response from the agent.
            error: Error message if execution failed, None otherwise.

        Returns:
            ADKTrajectory with tool calls, state deltas, token usage,
            final output, and error (if any).

        Note:
            Synthesizes trajectory by delegating to extract_trajectory utility
            with configured trajectory_config. This is the complete execution
            record for one batch example.
        """
        return extract_trajectory(
            events=events,
            final_output=final_output,
            error=error,
            config=self.trajectory_config,
        )

    async def _eval_single_with_semaphore(
        self,
        example: dict[str, Any],
        example_index: int,
        candidate: dict[str, str],
        capture_traces: bool,
        semaphore: asyncio.Semaphore,
    ) -> tuple[str, float, ADKTrajectory | None, dict[str, Any] | None]:
        """Evaluate a single example with semaphore-controlled concurrency.

        Args:
            example: Input example with "input" key and optional "expected" key.
            example_index: Index of example in batch (for logging).
            candidate: Candidate component values (for instruction override).
            capture_traces: Whether to capture execution traces.
            semaphore: Semaphore to control concurrent execution.

        Returns:
            Tuple of (output_text, score, trajectory_or_none, metadata_or_none).
            On failure, returns ("", 0.0, error_trajectory, None).

        Note:
            Semaphore-controlled wrapper around single example evaluation.
            Called from evaluate() for each example in the batch to ensure
            at most max_concurrent_evals evaluations run simultaneously.
        """
        async with semaphore:
            self._logger.debug(
                "adapter.evaluate.example",
                example_index=example_index,
                example_input=example.get("input", "")[:50],
            )

            try:
                # Run the agent for this example
                output_text: str
                if capture_traces:
                    result = await self._run_single_example(
                        example, capture_events=True
                    )
                    # When capture_events=True, result is tuple[str, list[Any]]
                    output_text, events = result
                    # Build trajectory from collected events
                    trajectory = self._build_trajectory(
                        events=list(events),
                        final_output=output_text,
                        error=None,
                    )
                else:
                    # When capture_events=False, result is str
                    run_result = await self._run_single_example(example)
                    output_text = str(run_result)
                    trajectory = None

                # Score the output
                input_text = example.get("input", "")
                expected = example.get("expected")
                score_result = await self.scorer.async_score(
                    input_text, output_text, expected
                )
                # Handle both float and tuple[float, dict] return types
                if isinstance(score_result, tuple):
                    score = score_result[0]
                    metadata = score_result[1]
                else:
                    score = float(score_result)
                    metadata = None

                return (output_text, score, trajectory, metadata)

            except Exception as e:
                # Wrap in domain exception per ADR-009 (preserve batch resilience)
                wrapped = EvaluationError(
                    f"Example {example_index} evaluation failed",
                    cause=e,
                    example_index=example_index,
                )

                self._logger.warning(
                    "adapter.evaluate.example.error",
                    example_index=example_index,
                    error=str(wrapped),
                )

                # Create error trajectory if capturing traces
                error_trajectory = None
                if capture_traces:
                    error_trajectory = self._build_trajectory(
                        events=[],
                        final_output="",
                        error=str(wrapped),
                    )

                return ("", 0.0, error_trajectory, None)

    async def _prepare_multimodal_content(
        self, example: dict[str, Any]
    ) -> Content | None:
        """Prepare multimodal Content from example with videos field.

        Args:
            example: Input example dict with optional 'input' text and 'videos' paths.

        Returns:
            Content object with text and video parts, or None if no videos present.

        Note:
            Sequences video loading via VideoBlobService then Content assembly.
            Text part is included first, followed by video parts in order.
        """
        videos = example.get("videos")
        if not videos:
            return None

        parts: list[Part] = []

        # Add text part first if present
        input_text = example.get("input")
        if input_text:
            parts.append(Part(text=input_text))

        # Load video parts
        video_parts = await self._video_service.prepare_video_parts(videos)
        parts.extend(video_parts)

        self._logger.debug(
            "adapter.multimodal_content.prepared",
            text_present=bool(input_text),
            video_count=len(videos),
            total_parts=len(parts),
        )

        return Content(parts=parts, role="user")

    async def _run_single_example(
        self, example: dict[str, Any], capture_events: bool = False
    ) -> tuple[str, list[Any]] | str:
        """Execute agent on a single input example via AgentExecutor.

        Args:
            example: Input example with "input" key and/or "videos" key.
            capture_events: If True, return (output, events) tuple.
                           If False, return just output string.

        Returns:
            If capture_events=True: (final_output, event_list) tuple
            If capture_events=False: final_output string only

        Raises:
            RuntimeError: If agent execution fails.

        Note:
            Delegates to AgentExecutor for unified execution path.
            When capture_events=True, collects all events for trace extraction.
            Supports multimodal inputs via 'videos' field in example.
        """
        input_text = example.get("input", "")
        videos = example.get("videos")

        # Check if we have any input (text or videos)
        if not input_text and not videos:
            return ("", []) if capture_events else ""

        # Prepare multimodal content if videos present
        input_content = await self._prepare_multimodal_content(example)

        result = await self._executor.execute_agent(
            agent=self.agent,
            input_text=input_text,
            input_content=input_content,
        )

        if result.status == ExecutionStatus.FAILED:
            raise RuntimeError(result.error_message or "Executor returned FAILED")

        final_output = result.extracted_value or ""
        events = result.captured_events or []

        return (final_output, events) if capture_events else final_output

    async def make_reflective_dataset(
        self,
        candidate: dict[str, str],
        eval_batch: EvaluationBatch[ADKTrajectory, str],
        components_to_update: list[str],
    ) -> Mapping[str, Sequence[Mapping[str, Any]]]:
        """Build trials from evaluation results for reflection.

        Terminology:
            - trial: One performance record {input, output, feedback, trajectory}
            - trials: Collection of trial records for a component

        Args:
            candidate: Current candidate component values.
            eval_batch: Evaluation results including trajectories and optional
                scorer metadata (e.g., from CriticScorer).
            components_to_update: List of component names to generate
                trials for.

        Returns:
            Mapping from component name to sequence of trials.
            Each trial contains input, output, feedback (with score and
            feedback_text), and optional trajectory.

        Examples:
            Generate trials for reflection:

            ```python
            result = await adapter.evaluate(batch, candidate, capture_traces=True)
            trials_dataset = await adapter.make_reflective_dataset(
                candidate, result, ["instruction"]
            )
            assert "instruction" in trials_dataset
            # Each trial has structured feedback
            trial = trials_dataset["instruction"][0]
            assert "input" in trial
            assert "output" in trial
            assert "feedback" in trial
            assert trial["feedback"]["score"] == 0.75
            ```

        Note:
            Operates on eval_batch trajectories (capture_traces=True required).
            Dataset format is compatible with proposer's trial-based interface.
            Scorer metadata (feedback_text, feedback_dimensions) from
            eval_batch.metadata is included in each trial's feedback dict.
        """
        self._logger.info(
            "adapter.make_reflective_dataset.start",
            num_trials=len(eval_batch.outputs),
            components=components_to_update,
        )

        # Build trials for each requested component
        result: dict[str, list[dict[str, Any]]] = {}

        for component in components_to_update:
            trials: list[dict[str, Any]] = []

            for i, (output, score) in enumerate(
                zip(eval_batch.outputs, eval_batch.scores, strict=True)
            ):
                # Get trajectory info if available
                trajectory = None
                if eval_batch.trajectories and i < len(eval_batch.trajectories):
                    trajectory = eval_batch.trajectories[i]

                # Get metadata if available
                metadata = None
                if eval_batch.metadata and i < len(eval_batch.metadata):
                    metadata = eval_batch.metadata[i]

                # Get input text if available
                input_text = ""
                if eval_batch.inputs and i < len(eval_batch.inputs):
                    input_text = eval_batch.inputs[i]

                trial = self._build_trial(
                    input_text=input_text,
                    output=output,
                    score=score,
                    trajectory=trajectory,
                    metadata=metadata,
                )
                trials.append(trial)

            result[component] = trials

        self._logger.info(
            "adapter.make_reflective_dataset.complete",
            num_components=len(result),
            total_trials=sum(len(t) for t in result.values()),
        )

        return result

    def _build_trace(self, trajectory: ADKTrajectory) -> dict[str, Any] | None:
        """Build trace dict from ADK trajectory data.

        Extracts execution details from ADK trajectory - the intermediate
        steps between input and output (tool calls, token usage, errors).

        Args:
            trajectory: ADK execution record with tool calls, tokens, etc.

        Returns:
            Trace dict with available details, or None if no trace data.
            Can include: tool_calls, tokens, error, and future fields
            like reasoning, state_deltas, etc.
        """
        trace: dict[str, Any] = {}

        if trajectory.tool_calls:
            trace["tool_calls"] = len(trajectory.tool_calls)
        if trajectory.error:
            trace["error"] = trajectory.error
        if trajectory.token_usage:
            trace["tokens"] = trajectory.token_usage.total_tokens

        # Return None if no trace data collected
        return trace if trace else None

    def _build_trial(
        self,
        input_text: str,
        output: str,
        score: float,
        trajectory: ADKTrajectory | None,
        metadata: dict[str, Any] | None = None,
    ) -> dict[str, Any]:
        """Build a single trial record for reflection.

        Delegates to shared TrialBuilder for consistent trial structure.

        Terminology:
            - trial: One performance record {feedback, trajectory}
            - feedback: Critic evaluation {score, feedback_text, feedback_*}
            - trajectory: The journey from input to output with optional trace

        Args:
            input_text: The input that was given to the system.
            output: What the system produced.
            score: Evaluation score for this output.
            trajectory: Optional execution record with tool calls, state, etc.
            metadata: Optional scorer metadata dict (e.g., from CriticScorer).

        Returns:
            Trial dict with keys: feedback, trajectory.
            - feedback: score (mandatory), feedback_text, feedback_* (optional)
            - trajectory: input, output (mandatory), trace (optional)
        """
        # Build trace from ADK trajectory if available
        trace = self._build_trace(trajectory) if trajectory else None

        return self._trial_builder.build_trial(
            input_text=input_text,
            output=output,
            score=score,
            metadata=metadata,
            trace=trace,
            log_passthrough=True,
        )

    async def propose_new_texts(
        self,
        candidate: dict[str, str],
        reflective_dataset: Mapping[str, Sequence[Mapping[str, Any]]],
        components_to_update: list[str],
    ) -> dict[str, str]:
        """Propose new component texts based on trials.

        Delegates to AsyncReflectiveMutationProposer to generate improved
        component text via LLM reflection on trials. When the proposer returns
        None (no trials), falls back to unchanged candidate values.

        Args:
            candidate: Current candidate component texts (name → text).
            reflective_dataset: Trials from make_reflective_dataset().
                Maps component name to list of trial records.
            components_to_update: Components to generate proposals for.

        Returns:
            Dictionary mapping component names to proposed component text.
            When proposer returns None, returns unchanged candidate values.

        Examples:
            Using the proposer to generate improved component text:

            ```python
            # After evaluation with traces
            result = await adapter.evaluate(batch, candidate, capture_traces=True)
            trials = await adapter.make_reflective_dataset(
                candidate, result, ["instruction"]
            )

            # Propose new component text via LLM reflection on trials
            new_texts = await adapter.propose_new_texts(
                candidate, trials, ["instruction"]
            )
            # new_texts["instruction"] contains proposed component text
            ```

        Note:
            Delegates to AsyncReflectiveMutationProposer for actual mutation
            generation. Falls back gracefully when no trials available.
        """
        self._logger.debug(
            "propose_new_texts.delegating",
            components_requested=components_to_update,
        )

        result = await self._proposer.propose(
            candidate, reflective_dataset, components_to_update
        )

        if result is None:
            self._logger.info(
                "propose_new_texts.fallback",
                reason="proposer_returned_none",
                components_requested=components_to_update,
            )
            return {
                component: candidate.get(component, "")
                for component in components_to_update
            }

        # Merge with candidate for any missing components
        merged = {
            component: result.get(component, candidate.get(component, ""))
            for component in components_to_update
        }

        # Log proposed texts for each component
        for component, proposed_text in result.items():
            self._logger.info(
                "proposal.text",
                component=component,
                proposed_length=len(proposed_text),
                proposed_preview=proposed_text[:300] + "..."
                if len(proposed_text) > 300
                else proposed_text,
            )

        self._logger.info(
            "propose_new_texts.complete",
            components_proposed=list(result.keys()),
            components_requested=components_to_update,
        )

        return merged
```

__init__

```python
__init__(
    agent: LlmAgent,
    scorer: Scorer,
    executor: AgentExecutorProtocol,
    max_concurrent_evals: int = 5,
    session_service: BaseSessionService | None = None,
    app_name: str = "gepa_adk_eval",
    trajectory_config: TrajectoryConfig | None = None,
    proposer: AsyncReflectiveMutationProposer | None = None,
    reflection_agent: LlmAgent | None = None,
    reflection_output_field: str | None = None,
    schema_constraints: SchemaConstraints | None = None,
    video_service: VideoBlobServiceProtocol | None = None,
) -> None
```

Initialize the ADK adapter with agent and scorer.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `agent` | `LlmAgent` | required | The ADK LlmAgent to evaluate with different instructions. |
| `scorer` | `Scorer` | required | Scorer implementation for evaluating agent outputs. |
| `executor` | `AgentExecutorProtocol` | required | AgentExecutorProtocol implementation for unified agent execution. The executor handles session management and execution, enabling feature parity across all agent types. |
| `max_concurrent_evals` | `int` | `5` | Maximum number of concurrent evaluations to run in parallel. Must be at least 1. |
| `session_service` | `BaseSessionService \| None` | `None` | Optional session service for state management. If None, creates an InMemorySessionService. |
| `app_name` | `str` | `'gepa_adk_eval'` | Application name for session identification. |
| `trajectory_config` | `TrajectoryConfig \| None` | `None` | Configuration for trajectory extraction behavior. If None, uses TrajectoryConfig defaults (secure, all features enabled). |
| `proposer` | `AsyncReflectiveMutationProposer \| None` | `None` | Optional mutation proposer for generating improved instructions via LLM reflection. If provided, takes precedence over reflection_agent. |
| `reflection_agent` | `LlmAgent \| None` | `None` | ADK LlmAgent to use for reflection operations. Either this or proposer must be provided. When provided, creates an ADK-based reflection function and passes it to a new proposer. |
| `reflection_output_field` | `str \| None` | `None` | Field name to extract from structured output when reflection_agent has an output_schema. When the reflection agent returns structured output (dict), this specifies which field contains the proposed text. For schema evolution, use "class_definition" with a SchemaProposal output_schema. Only used when reflection_agent is provided. |
| `schema_constraints` | `SchemaConstraints \| None` | `None` | Optional SchemaConstraints for output_schema evolution. When provided, proposed schema mutations are validated against these constraints. Mutations that violate constraints (e.g., remove required fields) are rejected and the original schema is preserved. |
| `video_service` | `VideoBlobServiceProtocol \| None` | `None` | Optional VideoBlobServiceProtocol for multimodal input support. When provided, enables processing of trainset examples with a 'videos' field. If None, defaults to a new VideoBlobService instance. |

| Raises | Description |
| --- | --- |
| `TypeError` | If agent is not an LlmAgent instance. |
| `TypeError` | If scorer does not satisfy the Scorer protocol. |
| `TypeError` | If reflection_agent is provided but not an LlmAgent instance. |
| `ValueError` | If app_name is an empty string or max_concurrent_evals < 1. |
| `ValueError` | If neither proposer nor reflection_agent is provided. |

Examples:

Basic setup with reflection agent:

```python
from gepa_adk.adapters.agent_executor import AgentExecutor

reflection_agent = LlmAgent(name="reflector", model="gemini-2.5-flash")
executor = AgentExecutor()
adapter = ADKAdapter(
    agent, scorer, executor, reflection_agent=reflection_agent
)
```

With custom trajectory configuration:

```python
config = TrajectoryConfig(
    redact_sensitive=True,
    max_string_length=5000,
)
executor = AgentExecutor()
adapter = ADKAdapter(
    agent,
    scorer,
    executor,
    reflection_agent=reflection_agent,
    trajectory_config=config,
)
```

With shared session service:

```python
from google.adk.sessions import InMemorySessionService
from gepa_adk.adapters.agent_executor import AgentExecutor

session_service = InMemorySessionService()
executor = AgentExecutor(session_service=session_service)
adapter = ADKAdapter(
    agent,
    scorer,
    executor,
    reflection_agent=reflection_agent,
    session_service=session_service,
)
```
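
With multimodal video inputs (a hedged sketch: the file path and expected answer are illustrative placeholders, and the 'videos' key is what routes the example through the configured video service):

```python
batch = [
    {
        "input": "Describe what happens in this clip.",
        "videos": ["clips/demo.mp4"],  # hypothetical local video path
        "expected": "A person waves at the camera.",
    }
]
result = await adapter.evaluate(batch, candidate)
```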
Note

Caches the agent's original instruction and restores it after each evaluation to ensure no side effects between evaluations.
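
Since the adapter may set schema constraints on a shared OutputSchemaHandler, a typical optimization pass pairs evaluation with cleanup(). A minimal sketch of that lifecycle, assuming an adapter constructed as in the examples above:

```python
try:
    result = await adapter.evaluate(batch, candidate, capture_traces=True)
    dataset = await adapter.make_reflective_dataset(candidate, result, ["instruction"])
    new_texts = await adapter.propose_new_texts(candidate, dataset, ["instruction"])
finally:
    # Clears constraints from the shared OutputSchemaHandler so one run
    # cannot leak schema constraints into the next.
    adapter.cleanup()
```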

Source code in src/gepa_adk/adapters/adk_adapter.py
def __init__(
    self,
    agent: LlmAgent,
    scorer: Scorer,
    executor: AgentExecutorProtocol,
    max_concurrent_evals: int = 5,
    session_service: BaseSessionService | None = None,
    app_name: str = "gepa_adk_eval",
    trajectory_config: TrajectoryConfig | None = None,
    proposer: AsyncReflectiveMutationProposer | None = None,
    reflection_agent: LlmAgent | None = None,
    reflection_output_field: str | None = None,
    schema_constraints: SchemaConstraints | None = None,
    video_service: VideoBlobServiceProtocol | None = None,
) -> None:
    """Initialize the ADK adapter with agent and scorer.

    Args:
        agent: The ADK LlmAgent to evaluate with different instructions.
        scorer: Scorer implementation for evaluating agent outputs.
        executor: AgentExecutorProtocol implementation for unified agent execution.
            The executor handles session management and execution, enabling feature
            parity across all agent types.
        max_concurrent_evals: Maximum number of concurrent evaluations to run
            in parallel. Must be at least 1. Defaults to 5.
        session_service: Optional session service for state management.
            If None, creates an InMemorySessionService.
        app_name: Application name for session identification.
        trajectory_config: Configuration for trajectory extraction behavior.
            If None, uses TrajectoryConfig defaults (secure, all features enabled).
        proposer: Optional mutation proposer for generating improved instructions
            via LLM reflection. If provided, takes precedence over reflection_agent.
        reflection_agent: ADK LlmAgent to use for reflection operations.
            Either this or proposer must be provided. When provided, creates an
            ADK-based reflection function and passes it to a new proposer.
        reflection_output_field: Field name to extract from structured output when
            reflection_agent has an output_schema. When the reflection agent returns
            structured output (dict), this specifies which field contains the proposed
            text. For schema evolution, use "class_definition" with a SchemaProposal
            output_schema. Only used when reflection_agent is provided.
        schema_constraints: Optional SchemaConstraints for output_schema evolution.
            When provided, proposed schema mutations are validated against these
            constraints. Mutations that violate constraints (e.g., remove required
            fields) are rejected and the original schema is preserved.
        video_service: Optional VideoBlobServiceProtocol for multimodal input support.
            When provided, enables processing of trainset examples with 'videos' field.
            If None, defaults to a new VideoBlobService instance.

    Raises:
        TypeError: If agent is not an LlmAgent instance.
        TypeError: If scorer does not satisfy Scorer protocol.
        TypeError: If reflection_agent is provided but not an LlmAgent instance.
        ValueError: If app_name is empty string or max_concurrent_evals < 1.
        ValueError: If neither proposer nor reflection_agent is provided.

    Examples:
        Basic setup with reflection agent:

        ```python
        from gepa_adk.adapters.agent_executor import AgentExecutor

        reflection_agent = LlmAgent(name="reflector", model="gemini-2.5-flash")
        executor = AgentExecutor()
        adapter = ADKAdapter(
            agent, scorer, executor, reflection_agent=reflection_agent
        )
        ```

        With custom trajectory configuration:

        ```python
        config = TrajectoryConfig(
            redact_sensitive=True,
            max_string_length=5000,
        )
        executor = AgentExecutor()
        adapter = ADKAdapter(
            agent,
            scorer,
            executor,
            reflection_agent=reflection_agent,
            trajectory_config=config,
        )
        ```

        With shared session service:

        ```python
        from google.adk.sessions import InMemorySessionService
        from gepa_adk.adapters.agent_executor import AgentExecutor

        session_service = InMemorySessionService()
        executor = AgentExecutor(session_service=session_service)
        adapter = ADKAdapter(
            agent,
            scorer,
            executor,
            reflection_agent=reflection_agent,
            session_service=session_service,
        )
        ```

    Note:
        Caches the agent's original instruction and restores it after
        each evaluation to ensure no side effects between evaluations.
    """
    # Type validation
    if not isinstance(agent, LlmAgent):
        raise TypeError(f"agent must be LlmAgent, got {type(agent)}")

    # Scorer protocol check (runtime_checkable)
    if not hasattr(scorer, "score") or not hasattr(scorer, "async_score"):
        raise TypeError(
            f"scorer must implement Scorer protocol, got {type(scorer)}"
        )

    if not app_name or not app_name.strip():
        raise ValueError("app_name cannot be empty")

    if max_concurrent_evals < 1:
        raise ValueError(
            f"max_concurrent_evals must be at least 1, got {max_concurrent_evals}"
        )

    # Validate reflection_agent if provided
    if reflection_agent is not None and not isinstance(reflection_agent, LlmAgent):
        raise TypeError(
            f"reflection_agent must be LlmAgent, got {type(reflection_agent)}"
        )

    self.agent = agent
    self.scorer = scorer
    self.max_concurrent_evals = max_concurrent_evals
    self.trajectory_config = trajectory_config or TrajectoryConfig()
    self._session_service = session_service or InMemorySessionService()
    self._app_name = app_name.strip()

    # Store executor for unified execution (T048)
    self._executor = executor

    # Store video service for multimodal input support
    self._video_service = video_service or VideoBlobService()

    # Store and apply schema constraints to output_schema handler
    self._schema_constraints = schema_constraints
    if schema_constraints is not None:
        handler = get_handler(COMPONENT_OUTPUT_SCHEMA)
        if isinstance(handler, OutputSchemaHandler):
            handler.set_constraints(schema_constraints)

    # Bind logger with adapter context
    self._logger = logger.bind(
        adapter="ADKAdapter",
        agent_name=self.agent.name,
        app_name=self._app_name,
    )

    # Initialize trial builder for reflective dataset construction
    self._trial_builder = TrialBuilder()

    # Create proposer with clear precedence: proposer overrides reflection_agent.
    if proposer is not None:
        if reflection_agent is not None:
            self._logger.warning(
                "adapter.proposer.precedence",
                message="proposer parameter takes precedence over reflection_agent",
            )
        self._proposer = proposer
    elif reflection_agent is not None:
        # Create ADK reflection function and pass to proposer
        adk_reflection_fn = create_adk_reflection_fn(
            reflection_agent,
            executor=self._executor,
            session_service=self._session_service,
            output_field=reflection_output_field,
        )
        self._proposer = AsyncReflectiveMutationProposer(
            adk_reflection_fn=adk_reflection_fn
        )
    else:
        raise ValueError(
            "Either proposer or reflection_agent must be provided. "
            "Use reflection_agent with an ADK LlmAgent for reflection operations."
        )

    self._logger.info("adapter.initialized")

cleanup

cleanup() -> None

Clean up adapter resources and clear handler constraints.

Clears any schema constraints set on the OutputSchemaHandler to prevent constraint leakage between evolution runs. Should be called when the adapter is no longer needed.
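
Examples:

Typical teardown after an evolution run (a minimal sketch using only the documented adapter API):

try:
    result = await adapter.evaluate(batch, candidate)
finally:
    adapter.cleanup()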

Note

OutputSchemaHandler is a singleton, so constraints set during one evolution run could affect subsequent runs if not cleared.

Source code in src/gepa_adk/adapters/adk_adapter.py
def cleanup(self) -> None:
    """Clean up adapter resources and clear handler constraints.

    Clears any schema constraints set on the OutputSchemaHandler to prevent
    constraint leakage between evolution runs. Should be called when the
    adapter is no longer needed.

    Note:
        OutputSchemaHandler is a singleton, so constraints set during one
        evolution run could affect subsequent runs if not cleared.
    """
    if self._schema_constraints is not None:
        handler = get_handler(COMPONENT_OUTPUT_SCHEMA)
        if isinstance(handler, OutputSchemaHandler):
            handler.set_constraints(None)
        self._logger.debug("adapter.cleanup.constraints_cleared")

evaluate async

evaluate(
    batch: list[dict[str, Any]],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[ADKTrajectory, str]

Evaluate agent with candidate instruction over a batch of inputs.

PARAMETER DESCRIPTION
batch

List of input examples, each with "input" key and optional "expected" key for scoring.

TYPE: list[dict[str, Any]]

candidate

Component name to text mapping. If "instruction" key is present, it overrides the agent's instruction.

TYPE: dict[str, str]

capture_traces

Whether to capture execution traces (tool calls, state deltas, token usage).

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
EvaluationBatch[ADKTrajectory, str]

EvaluationBatch containing outputs, scores, and optional trajectories.

Examples:

Basic evaluation without traces:

batch = [
    {"input": "What is 2+2?", "expected": "4"},
    {"input": "Capital of France?", "expected": "Paris"},
]
candidate = {"instruction": "Be concise"}
result = await adapter.evaluate(batch, candidate)
assert len(result.outputs) == 2
assert len(result.scores) == 2

With trace capture:

result = await adapter.evaluate(batch, candidate, capture_traces=True)
assert result.trajectories is not None
assert len(result.trajectories) == len(batch)
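
Inspecting optional scorer metadata (a sketch; whether metadata is present, and which keys such as feedback_text it contains, depends on the scorer):

result = await adapter.evaluate(batch, candidate, capture_traces=True)
if result.metadata is not None:
    print(result.metadata[0].get("feedback_text"))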
Note

Original instruction is restored after evaluation completes, even if an exception occurs during evaluation. Evaluations run in parallel with concurrency controlled by the max_concurrent_evals parameter. Results maintain input order despite parallel execution.

Source code in src/gepa_adk/adapters/adk_adapter.py
async def evaluate(
    self,
    batch: list[dict[str, Any]],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[ADKTrajectory, str]:
    """Evaluate agent with candidate instruction over a batch of inputs.

    Args:
        batch: List of input examples, each with "input" key and optional
            "expected" key for scoring.
        candidate: Component name to text mapping. If "instruction" key
            is present, it overrides the agent's instruction.
        capture_traces: Whether to capture execution traces (tool calls,
            state deltas, token usage).

    Returns:
        EvaluationBatch containing outputs, scores, and optional trajectories.

    Examples:
        Basic evaluation without traces:

        ```python
        batch = [
            {"input": "What is 2+2?", "expected": "4"},
            {"input": "Capital of France?", "expected": "Paris"},
        ]
        candidate = {"instruction": "Be concise"}
        result = await adapter.evaluate(batch, candidate)
        assert len(result.outputs) == 2
        assert len(result.scores) == 2
        ```

        With trace capture:

        ```python
        result = await adapter.evaluate(batch, candidate, capture_traces=True)
        assert result.trajectories is not None
        assert len(result.trajectories) == len(batch)
        ```

    Note:
        Original instruction is restored after evaluation completes,
        even if an exception occurs during evaluation.
        Evaluations run in parallel with concurrency controlled by
        max_concurrent_evals parameter. Results maintain input order
        despite parallel execution.
    """
    self._logger.info(
        "adapter.evaluate.start",
        batch_size=len(batch),
        max_concurrent=self.max_concurrent_evals,
        capture_traces=capture_traces,
    )

    # Handle empty batch case
    if not batch:
        self._logger.info("adapter.evaluate.complete", batch_size=0)
        return EvaluationBatch(outputs=[], scores=[], trajectories=None)

    # Apply candidate components (instruction and/or output_schema) and save originals
    originals = self._apply_candidate(candidate)

    try:
        # Create semaphore to limit concurrent evaluations
        semaphore = asyncio.Semaphore(self.max_concurrent_evals)

        # Create tasks for all examples
        tasks = [
            self._eval_single_with_semaphore(
                example=example,
                example_index=i,
                candidate=candidate,
                capture_traces=capture_traces,
                semaphore=semaphore,
            )
            for i, example in enumerate(batch)
        ]

        # Execute all tasks in parallel with exception handling
        results = await asyncio.gather(*tasks, return_exceptions=True)

        # Process results and maintain order
        outputs: list[str] = []
        scores: list[float] = []
        trajectories: list[ADKTrajectory] | None = [] if capture_traces else None
        metadata_list: list[dict[str, Any]] = []
        inputs: list[str] = []  # Collect inputs for reflection

        successful = 0
        failed = 0

        for i, result in enumerate(results):
            # Collect input text for this example
            inputs.append(batch[i].get("input", ""))
            if isinstance(result, Exception):
                # Handle exception case
                self._logger.warning(
                    "adapter.evaluate.example.error",
                    example_index=i,
                    error=str(result),
                )
                outputs.append("")
                scores.append(0.0)
                metadata_list.append({})
                failed += 1

                if capture_traces:
                    error_trajectory = self._build_trajectory(
                        events=[],
                        final_output="",
                        error=str(result),
                    )
                    trajectories.append(error_trajectory)  # type: ignore
            else:
                # Unpack success: (output_text, score, trajectory_or_none, metadata_or_none)
                # After isinstance check, result is guaranteed to be the tuple type
                output_text, score, trajectory, metadata = result  # type: ignore[misc]
                outputs.append(output_text)
                scores.append(score)
                metadata_list.append(metadata if metadata is not None else {})
                successful += 1

                if capture_traces and trajectory is not None:
                    trajectories.append(trajectory)  # type: ignore

        avg_score = sum(scores) / len(scores) if scores else 0.0

        self._logger.info(
            "adapter.evaluate.complete",
            batch_size=len(batch),
            successful=successful,
            failed=failed,
            avg_score=avg_score,
        )

        # Only include metadata if at least one example has non-empty metadata
        # Check if any metadata dict has content (not just empty dicts)
        has_metadata = any(
            meta and isinstance(meta, dict) and meta for meta in metadata_list
        )
        final_metadata = metadata_list if has_metadata else None

        return EvaluationBatch(
            outputs=outputs,
            scores=scores,
            trajectories=trajectories,
            metadata=final_metadata,
            inputs=inputs,
        )

    finally:
        # Always restore original components via registry dispatch
        self._restore_agent(originals)

make_reflective_dataset async

make_reflective_dataset(
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[ADKTrajectory, str],
    components_to_update: list[str],
) -> Mapping[str, Sequence[Mapping[str, Any]]]

Build trials from evaluation results for reflection.

Terminology
  • trial: One performance record {input, output, feedback, trajectory}
  • trials: Collection of trial records for a component
PARAMETER DESCRIPTION
candidate

Current candidate component values.

TYPE: dict[str, str]

eval_batch

Evaluation results including trajectories and optional scorer metadata (e.g., from CriticScorer).

TYPE: EvaluationBatch[ADKTrajectory, str]

components_to_update

List of component names to generate trials for.

TYPE: list[str]

RETURNS DESCRIPTION
Mapping[str, Sequence[Mapping[str, Any]]]

Mapping from component name to sequence of trials. Each trial contains input, output, feedback (with score and feedback_text), and optional trajectory.

Examples:

Generate trials for reflection:

result = await adapter.evaluate(batch, candidate, capture_traces=True)
trials_dataset = await adapter.make_reflective_dataset(
    candidate, result, ["instruction"]
)
assert "instruction" in trials_dataset
# Each trial has structured feedback
trial = trials_dataset["instruction"][0]
assert "input" in trial
assert "output" in trial
assert "feedback" in trial
assert trial["feedback"]["score"] == 0.75
Note

Operates on eval_batch trajectories (capture_traces=True required). Dataset format is compatible with proposer's trial-based interface. Scorer metadata (feedback_text, feedback_dimensions) from eval_batch.metadata is included in each trial's feedback dict.

Source code in src/gepa_adk/adapters/adk_adapter.py
async def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[ADKTrajectory, str],
    components_to_update: list[str],
) -> Mapping[str, Sequence[Mapping[str, Any]]]:
    """Build trials from evaluation results for reflection.

    Terminology:
        - trial: One performance record {input, output, feedback, trajectory}
        - trials: Collection of trial records for a component

    Args:
        candidate: Current candidate component values.
        eval_batch: Evaluation results including trajectories and optional
            scorer metadata (e.g., from CriticScorer).
        components_to_update: List of component names to generate
            trials for.

    Returns:
        Mapping from component name to sequence of trials.
        Each trial contains input, output, feedback (with score and
        feedback_text), and optional trajectory.

    Examples:
        Generate trials for reflection:

        ```python
        result = await adapter.evaluate(batch, candidate, capture_traces=True)
        trials_dataset = await adapter.make_reflective_dataset(
            candidate, result, ["instruction"]
        )
        assert "instruction" in trials_dataset
        # Each trial has structured feedback
        trial = trials_dataset["instruction"][0]
        assert "input" in trial
        assert "output" in trial
        assert "feedback" in trial
        assert trial["feedback"]["score"] == 0.75
        ```

    Note:
        Operates on eval_batch trajectories (capture_traces=True required).
        Dataset format is compatible with proposer's trial-based interface.
        Scorer metadata (feedback_text, feedback_dimensions) from
        eval_batch.metadata is included in each trial's feedback dict.
    """
    self._logger.info(
        "adapter.make_reflective_dataset.start",
        num_trials=len(eval_batch.outputs),
        components=components_to_update,
    )

    # Build trials for each requested component
    result: dict[str, list[dict[str, Any]]] = {}

    for component in components_to_update:
        trials: list[dict[str, Any]] = []

        for i, (output, score) in enumerate(
            zip(eval_batch.outputs, eval_batch.scores, strict=True)
        ):
            # Get trajectory info if available
            trajectory = None
            if eval_batch.trajectories and i < len(eval_batch.trajectories):
                trajectory = eval_batch.trajectories[i]

            # Get metadata if available
            metadata = None
            if eval_batch.metadata and i < len(eval_batch.metadata):
                metadata = eval_batch.metadata[i]

            # Get input text if available
            input_text = ""
            if eval_batch.inputs and i < len(eval_batch.inputs):
                input_text = eval_batch.inputs[i]

            trial = self._build_trial(
                input_text=input_text,
                output=output,
                score=score,
                trajectory=trajectory,
                metadata=metadata,
            )
            trials.append(trial)

        result[component] = trials

    self._logger.info(
        "adapter.make_reflective_dataset.complete",
        num_components=len(result),
        total_trials=sum(len(t) for t in result.values()),
    )

    return result

propose_new_texts async

propose_new_texts(
    candidate: dict[str, str],
    reflective_dataset: Mapping[
        str, Sequence[Mapping[str, Any]]
    ],
    components_to_update: list[str],
) -> dict[str, str]

Propose new component texts based on trials.

Delegates to AsyncReflectiveMutationProposer to generate improved component text via LLM reflection on trials. When the proposer returns None (no trials), falls back to unchanged candidate values.

PARAMETER DESCRIPTION
candidate

Current candidate component texts (name → text).

TYPE: dict[str, str]

reflective_dataset

Trials from make_reflective_dataset(). Maps component name to list of trial records.

TYPE: Mapping[str, Sequence[Mapping[str, Any]]]

components_to_update

Components to generate proposals for.

TYPE: list[str]

RETURNS DESCRIPTION
dict[str, str]

Dictionary mapping component names to proposed component text. When proposer returns None, returns unchanged candidate values.

Examples:

Using the proposer to generate improved component text:

# After evaluation with traces
result = await adapter.evaluate(batch, candidate, capture_traces=True)
trials = await adapter.make_reflective_dataset(
    candidate, result, ["instruction"]
)

# Propose new component text via LLM reflection on trials
new_texts = await adapter.propose_new_texts(
    candidate, trials, ["instruction"]
)
# new_texts["instruction"] contains proposed component text
Note

Delegates to AsyncReflectiveMutationProposer for actual mutation generation. Falls back gracefully when no trials are available.

Source code in src/gepa_adk/adapters/adk_adapter.py
async def propose_new_texts(
    self,
    candidate: dict[str, str],
    reflective_dataset: Mapping[str, Sequence[Mapping[str, Any]]],
    components_to_update: list[str],
) -> dict[str, str]:
    """Propose new component texts based on trials.

    Delegates to AsyncReflectiveMutationProposer to generate improved
    component text via LLM reflection on trials. When the proposer returns
    None (no trials), falls back to unchanged candidate values.

    Args:
        candidate: Current candidate component texts (name → text).
        reflective_dataset: Trials from make_reflective_dataset().
            Maps component name to list of trial records.
        components_to_update: Components to generate proposals for.

    Returns:
        Dictionary mapping component names to proposed component text.
        When proposer returns None, returns unchanged candidate values.

    Examples:
        Using the proposer to generate improved component text:

        ```python
        # After evaluation with traces
        result = await adapter.evaluate(batch, candidate, capture_traces=True)
        trials = await adapter.make_reflective_dataset(
            candidate, result, ["instruction"]
        )

        # Propose new component text via LLM reflection on trials
        new_texts = await adapter.propose_new_texts(
            candidate, trials, ["instruction"]
        )
        # new_texts["instruction"] contains proposed component text
        ```

    Note:
        Delegates to AsyncReflectiveMutationProposer for actual mutation
        generation. Falls back gracefully when no trials available.
    """
    self._logger.debug(
        "propose_new_texts.delegating",
        components_requested=components_to_update,
    )

    result = await self._proposer.propose(
        candidate, reflective_dataset, components_to_update
    )

    if result is None:
        self._logger.info(
            "propose_new_texts.fallback",
            reason="proposer_returned_none",
            components_requested=components_to_update,
        )
        return {
            component: candidate.get(component, "")
            for component in components_to_update
        }

    # Merge with candidate for any missing components
    merged = {
        component: result.get(component, candidate.get(component, ""))
        for component in components_to_update
    }

    # Log proposed texts for each component
    for component, proposed_text in result.items():
        self._logger.info(
            "proposal.text",
            component=component,
            proposed_length=len(proposed_text),
            proposed_preview=proposed_text[:300] + "..."
            if len(proposed_text) > 300
            else proposed_text,
        )

    self._logger.info(
        "propose_new_texts.complete",
        components_proposed=list(result.keys()),
        components_requested=components_to_update,
    )

    return merged

AgentExecutor

Unified agent execution adapter.

Provides a single execution path for all ADK agent types (generator, critic, reflection) with consistent session management, event capture, and result handling.

ATTRIBUTE DESCRIPTION
_session_service

ADK session service for state management.

TYPE: BaseSessionService

_app_name

Application name for ADK runner.

TYPE: str

Examples:

Basic usage:

executor = AgentExecutor()
result = await executor.execute_agent(
    agent=my_agent,
    input_text="Hello, world!",
)
if result.status == ExecutionStatus.SUCCESS:
    print(f"Output: {result.extracted_value}")

With custom session service:

from google.adk.sessions import InMemorySessionService

session_service = InMemorySessionService()
executor = AgentExecutor(session_service=session_service)
Note

The adapter implements AgentExecutorProtocol for dependency injection and testing. All ADK-specific logic is encapsulated here.

Source code in src/gepa_adk/adapters/agent_executor.py
class AgentExecutor:
    """Unified agent execution adapter.

    Provides a single execution path for all ADK agent types (generator, critic,
    reflection) with consistent session management, event capture, and result
    handling.

    Attributes:
        _session_service (BaseSessionService): ADK session service for state management.
        _app_name (str): Application name for ADK runner.

    Examples:
        Basic usage:

        ```python
        executor = AgentExecutor()
        result = await executor.execute_agent(
            agent=my_agent,
            input_text="Hello, world!",
        )
        if result.status == ExecutionStatus.SUCCESS:
            print(f"Output: {result.extracted_value}")
        ```

        With custom session service:

        ```python
        from google.adk.sessions import InMemorySessionService

        session_service = InMemorySessionService()
        executor = AgentExecutor(session_service=session_service)
        ```

    Note:
        Adapter implements AgentExecutorProtocol for dependency injection
        and testing. All ADK-specific logic is encapsulated here.
    """

    def __init__(
        self,
        session_service: BaseSessionService | None = None,
        app_name: str = "gepa_executor",
    ) -> None:
        """Initialize AgentExecutor.

        Args:
            session_service: ADK session service for state management.
                If None, creates an InMemorySessionService.
            app_name: Application name for ADK runner. Defaults to "gepa_executor".

        Examples:
            Default initialization:

            ```python
            executor = AgentExecutor()
            ```

            With custom app name:

            ```python
            executor = AgentExecutor(app_name="my_app")
            ```

        Note:
            Creates a shared executor that uses the session service for all
            agent executions, allowing session state to be shared between
            executions when desired.
        """
        self._session_service = session_service or InMemorySessionService()
        self._app_name = app_name
        self._logger = logger.bind(component="AgentExecutor", app_name=app_name)

    async def _create_session(
        self,
        user_id: str,
        session_state: dict[str, Any] | None = None,
    ) -> Session:
        """Create a new session with optional initial state.

        Args:
            user_id: User identifier for the session.
            session_state: Initial state to inject into the session.

        Returns:
            Created ADK Session object.

        Note:
            Optional session state enables template variable substitution in
            agent instructions (e.g., {component_text} and {trials}).
        """
        session_id = f"exec_{uuid4()}"

        self._logger.debug(
            "session.creating",
            session_id=session_id,
            user_id=user_id,
            has_initial_state=session_state is not None,
        )

        session = await self._session_service.create_session(
            app_name=self._app_name,
            user_id=user_id,
            session_id=session_id,
            state=session_state,
        )

        self._logger.debug(
            "session.created",
            session_id=session.id,
        )

        return session

    async def _get_session(self, session_id: str, user_id: str) -> Session:
        """Retrieve an existing session by ID.

        Args:
            session_id: The session ID to retrieve.
            user_id: User identifier for the session.

        Returns:
            Existing ADK Session object.

        Raises:
            SessionNotFoundError: If the session does not exist.

        Note:
            Only performs strict existence checks for sessions that must
            already exist. Callers use this to fail fast instead of creating
            a new session. For get-or-create semantics, use _get_or_create_session.
        """
        session = await self._session_service.get_session(
            app_name=self._app_name,
            user_id=user_id,
            session_id=session_id,
        )

        if session is None:
            raise SessionNotFoundError(session_id)

        self._logger.debug(
            "session.retrieved",
            session_id=session_id,
        )

        return session

    async def _get_or_create_session(
        self,
        session_id: str,
        user_id: str,
        session_state: dict[str, Any] | None = None,
    ) -> Session:
        """Get an existing session or create a new one with the specified ID.

        Implements "get or create" semantics for session management. If the
        session exists, returns it. If not, creates a new session with the
        specified ID and optional initial state.

        Args:
            session_id: The session ID to retrieve or create.
            user_id: User identifier for the session.
            session_state: Initial state to inject if creating a new session.
                Ignored if session already exists.

        Returns:
            ADK Session object (existing or newly created).

        Note:
            Only applies initial state when creating new sessions. Existing
            sessions retain their current state regardless of session_state
            parameter.
        """
        # Try to get existing session first
        session = await self._session_service.get_session(
            app_name=self._app_name,
            user_id=user_id,
            session_id=session_id,
        )

        if session is not None:
            self._logger.debug(
                "session.retrieved",
                session_id=session_id,
            )
            return session

        # Session doesn't exist, create it with the specified ID
        self._logger.debug(
            "session.creating_with_id",
            session_id=session_id,
            user_id=user_id,
            has_initial_state=session_state is not None,
        )

        session = await self._session_service.create_session(
            app_name=self._app_name,
            user_id=user_id,
            session_id=session_id,
            state=session_state,
        )

        self._logger.debug(
            "session.created",
            session_id=session.id,
        )

        return session

    def _apply_overrides(
        self,
        agent: Any,
        instruction_override: str | None,
        output_schema_override: Any | None,
    ) -> Any:
        """Apply instruction and schema overrides to create modified agent copy.

        Args:
            agent: Original ADK LlmAgent.
            instruction_override: If provided, replaces agent instruction.
            output_schema_override: If provided, replaces output schema.

        Returns:
            Modified agent copy (or original if no overrides).

        Note:
            Original agent is preserved by creating a shallow copy with
            overridden attributes. The original agent is never modified.
        """
        if instruction_override is None and output_schema_override is None:
            return agent

        # Import LlmAgent here to avoid circular imports
        from google.adk.agents import LlmAgent

        # Create a copy with overrides
        # LlmAgent doesn't have a simple copy mechanism, so we recreate it
        # with the same parameters but modified instruction/schema
        # Extract agent attributes with proper defaults
        agent_tools = getattr(agent, "tools", None)
        modified_agent = LlmAgent(
            name=agent.name,
            model=agent.model,
            instruction=instruction_override or agent.instruction,
            output_schema=output_schema_override
            or getattr(agent, "output_schema", None),
            output_key=getattr(agent, "output_key", None),
            tools=agent_tools if agent_tools else [],
            before_model_callback=getattr(agent, "before_model_callback", None),
            after_model_callback=getattr(agent, "after_model_callback", None),
        )

        self._logger.debug(
            "agent.overrides_applied",
            instruction_override=instruction_override is not None,
            schema_override=output_schema_override is not None,
        )

        return modified_agent

    def _build_content(
        self,
        input_text: str,
        input_content: types.Content | None = None,
    ) -> types.Content:
        """Build Content for agent execution.

        Assembles the Content object to send to the agent. If input_content
        is provided, uses it directly. Otherwise, wraps input_text in a
        Content with a single text Part.

        Args:
            input_text: Text input for the agent.
            input_content: Pre-assembled multimodal Content. Takes precedence
                over input_text when provided.

        Returns:
            Content object for agent execution.

        Note:
            Serves as the central point for Content assembly, supporting
            both text-only (backward compatible) and multimodal inputs.
        """
        if input_content is not None:
            return input_content

        return types.Content(
            role="user",
            parts=[types.Part(text=input_text)],
        )

    async def _execute_runner(
        self,
        runner: Runner,
        session: Session,
        user_id: str,
        input_text: str,
        input_content: types.Content | None = None,
    ) -> list[Any]:
        """Execute the ADK Runner and capture events.

        Args:
            runner: ADK Runner instance.
            session: ADK Session for execution.
            user_id: User identifier.
            input_text: User message to send (used if input_content is None).
            input_content: Pre-assembled multimodal Content. Takes precedence
                over input_text when provided.

        Returns:
            List of captured ADK events.

        Note:
            Orchestrates the core Runner.run_async() loop, capturing all
            events for later output extraction.
        """
        content = self._build_content(input_text, input_content)

        events: list[Any] = []

        async for event in runner.run_async(
            user_id=user_id,
            session_id=session.id,
            new_message=content,
        ):
            events.append(event)

        return events

    async def _execute_with_timeout(
        self,
        runner: Runner,
        session: Session,
        user_id: str,
        input_text: str,
        timeout_seconds: int,
        input_content: types.Content | None = None,
    ) -> tuple[list[Any], bool]:
        """Execute runner with timeout handling.

        Args:
            runner: ADK Runner instance.
            session: ADK Session for execution.
            user_id: User identifier.
            input_text: User message to send (used if input_content is None).
            timeout_seconds: Maximum execution time.
            input_content: Pre-assembled multimodal Content. Takes precedence
                over input_text when provided.

        Returns:
            Tuple of (captured_events, timed_out).

        Note:
            On timeout, returns partial events captured before timeout.
            Uses asyncio.timeout for cancellation.
        """
        events: list[Any] = []
        timed_out = False

        try:
            async with asyncio.timeout(timeout_seconds):
                events = await self._execute_runner(
                    runner, session, user_id, input_text, input_content
                )
        except TimeoutError:
            timed_out = True
            self._logger.warning(
                "execution.timeout",
                session_id=session.id,
                timeout_seconds=timeout_seconds,
                events_captured=len(events),
            )

        return events, timed_out

    async def _extract_output(
        self,
        session: Session,
        events: list[Any],
        agent: Any,
    ) -> str | None:
        """Extract output from session state with event fallback.

        Args:
            session: ADK Session after execution.
            events: Captured events from execution.
            agent: Agent that was executed (for output_key).

        Returns:
            Extracted output string, or None if no output found.

        Note:
            Output extraction prioritizes state-based approach (using
            output_key), then falls back to event-based extraction.
        """
        # Try state-based extraction first (if agent has output_key)
        output_key = getattr(agent, "output_key", None)
        if output_key:
            # Refresh session state
            refreshed_session = await self._session_service.get_session(
                app_name=self._app_name,
                user_id=session.user_id,
                session_id=session.id,
            )
            if refreshed_session and refreshed_session.state:
                state_output = extract_output_from_state(
                    refreshed_session.state, output_key
                )
                if state_output:
                    self._logger.debug(
                        "output.extracted_from_state",
                        output_key=output_key,
                    )
                    return state_output

        # Fallback to event-based extraction
        event_output = extract_final_output(events)
        if event_output:
            self._logger.debug("output.extracted_from_events")
            return event_output

        return None

    async def execute_agent(
        self,
        agent: Any,
        input_text: str,
        *,
        input_content: types.Content | None = None,
        instruction_override: str | None = None,
        output_schema_override: Any | None = None,
        session_state: dict[str, Any] | None = None,
        existing_session_id: str | None = None,
        timeout_seconds: int = 300,
    ) -> ExecutionResult:
        """Execute an agent and return structured result.

        Runs the specified agent with the given input, optionally applying
        instruction or schema overrides for evolution scenarios. Manages
        session lifecycle and captures execution events.

        Args:
            agent: ADK LlmAgent to execute. The agent's tools, output_key,
                and other ADK features are preserved during execution.
            input_text: User message to send to the agent. Used when
                input_content is None for backward compatibility.
            input_content: Pre-assembled multimodal Content for the agent.
                When provided, takes precedence over input_text. Use this
                for multimodal inputs containing video or other media.
            instruction_override: If provided, replaces the agent's instruction
                for this execution only. Original agent is not modified.
            output_schema_override: If provided, replaces the agent's output
                schema for this execution only (type[BaseModel]). Used for schema evolution.
            session_state: Initial state to inject into the session. Used for
                template variable substitution (e.g., {component_text}).
            existing_session_id: If provided, uses get-or-create semantics to
                retrieve or create a session with this ID. Enables session sharing
                between agents (e.g., critic accessing generator state).
            timeout_seconds: Maximum execution time in seconds. Defaults to 300.
                Execution terminates with TIMEOUT status if exceeded.

        Returns:
            ExecutionResult with status, output, and debugging information.

        Examples:
            Basic execution:

            ```python
            result = await executor.execute_agent(
                agent=greeter,
                input_text="Hello!",
            )
            print(result.extracted_value)
            ```

            With session state for reflection:

            ```python
            result = await executor.execute_agent(
                agent=reflector,
                input_text="Improve the instruction",
                session_state={
                    "component_text": "Be helpful.",
                    "trials": '[{"score": 0.5}]',
                },
            )
            ```

            With multimodal content:

            ```python
            from google.genai.types import Content, Part

            content = Content(
                role="user",
                parts=[Part(text="Describe this video"), video_part],
            )
            result = await executor.execute_agent(
                agent=analyzer,
                input_text="",  # Can be empty when content provided
                input_content=content,
            )
            ```

        Note:
            Optional typing (Any) is used for agent parameter to avoid
            coupling to ADK types in the ports layer. Implementations
            should validate that the agent is a valid LlmAgent.
        """
        start_time = time.perf_counter()
        user_id = "exec_user"
        is_multimodal = input_content is not None

        self._logger.info(
            "execution.start",
            agent_name=getattr(agent, "name", "unknown"),
            input_length=len(input_text),
            is_multimodal=is_multimodal,
            has_instruction_override=instruction_override is not None,
            has_schema_override=output_schema_override is not None,
            has_session_state=session_state is not None,
            existing_session_id=existing_session_id,
            timeout_seconds=timeout_seconds,
        )

        # Get or create session
        session: Session
        if existing_session_id:
            session = await self._get_or_create_session(
                existing_session_id, user_id, session_state
            )
        else:
            session = await self._create_session(user_id, session_state)

        # Apply overrides if provided
        effective_agent = self._apply_overrides(
            agent, instruction_override, output_schema_override
        )

        # Create runner
        runner = Runner(
            agent=effective_agent,
            app_name=self._app_name,
            session_service=self._session_service,
        )

        # Execute with timeout and capture events
        events: list[Any] = []
        timed_out = False
        error_message: str | None = None

        try:
            events, timed_out = await self._execute_with_timeout(
                runner, session, user_id, input_text, timeout_seconds, input_content
            )
        except Exception as e:
            error_message = str(e)
            self._logger.error(
                "execution.error",
                session_id=session.id,
                error=error_message,
            )

        # Calculate execution time
        execution_time = time.perf_counter() - start_time

        # Determine status
        if error_message:
            status = ExecutionStatus.FAILED
        elif timed_out:
            status = ExecutionStatus.TIMEOUT
            error_message = f"Execution timed out after {timeout_seconds}s"
        else:
            status = ExecutionStatus.SUCCESS

        # Extract output (even on timeout, we try to get partial results)
        extracted_value: str | None = None
        if status == ExecutionStatus.SUCCESS or (timed_out and events):
            extracted_value = await self._extract_output(session, events, agent)

        self._logger.info(
            "execution.complete",
            session_id=session.id,
            status=status.value,
            execution_time_seconds=execution_time,
            events_captured=len(events),
            has_output=extracted_value is not None,
        )

        return ExecutionResult(
            status=status,
            session_id=session.id,
            extracted_value=extracted_value,
            error_message=error_message,
            execution_time_seconds=execution_time,
            captured_events=events,
        )

__init__

__init__(
    session_service: BaseSessionService | None = None,
    app_name: str = "gepa_executor",
) -> None

Initialize AgentExecutor.

PARAMETER DESCRIPTION
session_service

ADK session service for state management. If None, creates an InMemorySessionService.

TYPE: BaseSessionService | None DEFAULT: None

app_name

Application name for ADK runner. Defaults to "gepa_executor".

TYPE: str DEFAULT: 'gepa_executor'

Examples:

Default initialization:

executor = AgentExecutor()

With custom app name:

executor = AgentExecutor(app_name="my_app")
Note

Creates a shared executor that uses the session service for all agent executions, allowing session state to be shared between executions when desired.

Source code in src/gepa_adk/adapters/agent_executor.py
def __init__(
    self,
    session_service: BaseSessionService | None = None,
    app_name: str = "gepa_executor",
) -> None:
    """Initialize AgentExecutor.

    Args:
        session_service: ADK session service for state management.
            If None, creates an InMemorySessionService.
        app_name: Application name for ADK runner. Defaults to "gepa_executor".

    Examples:
        Default initialization:

        ```python
        executor = AgentExecutor()
        ```

        With custom app name:

        ```python
        executor = AgentExecutor(app_name="my_app")
        ```

    Note:
        Creates a shared executor that uses the session service for all
        agent executions, allowing session state to be shared between
        executions when desired.
    """
    self._session_service = session_service or InMemorySessionService()
    self._app_name = app_name
    self._logger = logger.bind(component="AgentExecutor", app_name=app_name)

execute_agent async

execute_agent(
    agent: Any,
    input_text: str,
    *,
    input_content: Content | None = None,
    instruction_override: str | None = None,
    output_schema_override: Any | None = None,
    session_state: dict[str, Any] | None = None,
    existing_session_id: str | None = None,
    timeout_seconds: int = 300,
) -> ExecutionResult

Execute an agent and return structured result.

Runs the specified agent with the given input, optionally applying instruction or schema overrides for evolution scenarios. Manages session lifecycle and captures execution events.

PARAMETER DESCRIPTION
agent

ADK LlmAgent to execute. The agent's tools, output_key, and other ADK features are preserved during execution.

TYPE: Any

input_text

User message to send to the agent. Used when input_content is None for backward compatibility.

TYPE: str

input_content

Pre-assembled multimodal Content for the agent. When provided, takes precedence over input_text. Use this for multimodal inputs containing video or other media.

TYPE: Content | None DEFAULT: None

instruction_override

If provided, replaces the agent's instruction for this execution only. Original agent is not modified.

TYPE: str | None DEFAULT: None

output_schema_override

If provided, replaces the agent's output schema for this execution only (type[BaseModel]). Used for schema evolution.

TYPE: Any | None DEFAULT: None

session_state

Initial state to inject into the session. Used for template variable substitution (e.g., {component_text}).

TYPE: dict[str, Any] | None DEFAULT: None

existing_session_id

If provided, uses get-or-create semantics to retrieve or create a session with this ID. Enables session sharing between agents (e.g., critic accessing generator state).

TYPE: str | None DEFAULT: None

timeout_seconds

Maximum execution time in seconds. Defaults to 300. Execution terminates with TIMEOUT status if exceeded.

TYPE: int DEFAULT: 300

RETURNS DESCRIPTION
ExecutionResult

ExecutionResult with status, output, and debugging information.

Examples:

Basic execution:

result = await executor.execute_agent(
    agent=greeter,
    input_text="Hello!",
)
print(result.extracted_value)
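
With a per-execution instruction override (a sketch; the override text is illustrative and the original agent is left unmodified):

result = await executor.execute_agent(
    agent=greeter,
    input_text="Hello!",
    instruction_override="Greet the user in exactly five words.",
)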

With session state for reflection:

result = await executor.execute_agent(
    agent=reflector,
    input_text="Improve the instruction",
    session_state={
        "component_text": "Be helpful.",
        "trials": '[{"score": 0.5}]',
    },
)

With multimodal content:

from google.genai.types import Content, Part

content = Content(
    role="user",
    parts=[Part(text="Describe this video"), video_part],
)
result = await executor.execute_agent(
    agent=analyzer,
    input_text="",  # Can be empty when content provided
    input_content=content,
)
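
With a shared session across two agents (a sketch; generator and critic are hypothetical agents and the shared ID is arbitrary):

shared_id = "shared_eval_session"
await executor.execute_agent(
    agent=generator,
    input_text="Draft a short summary",
    existing_session_id=shared_id,
)
critique = await executor.execute_agent(
    agent=critic,
    input_text="Critique the draft using the shared session state",
    existing_session_id=shared_id,
)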
Note

The agent parameter is typed as Any to avoid coupling the ports layer to ADK types. Implementations should validate that the agent is a valid LlmAgent.

Source code in src/gepa_adk/adapters/agent_executor.py
async def execute_agent(
    self,
    agent: Any,
    input_text: str,
    *,
    input_content: types.Content | None = None,
    instruction_override: str | None = None,
    output_schema_override: Any | None = None,
    session_state: dict[str, Any] | None = None,
    existing_session_id: str | None = None,
    timeout_seconds: int = 300,
) -> ExecutionResult:
    """Execute an agent and return structured result.

    Runs the specified agent with the given input, optionally applying
    instruction or schema overrides for evolution scenarios. Manages
    session lifecycle and captures execution events.

    Args:
        agent: ADK LlmAgent to execute. The agent's tools, output_key,
            and other ADK features are preserved during execution.
        input_text: User message to send to the agent. Used when
            input_content is None for backward compatibility.
        input_content: Pre-assembled multimodal Content for the agent.
            When provided, takes precedence over input_text. Use this
            for multimodal inputs containing video or other media.
        instruction_override: If provided, replaces the agent's instruction
            for this execution only. Original agent is not modified.
        output_schema_override: If provided, replaces the agent's output
            schema for this execution only (type[BaseModel]). Used for schema evolution.
        session_state: Initial state to inject into the session. Used for
            template variable substitution (e.g., {component_text}).
        existing_session_id: If provided, uses get-or-create semantics to
            retrieve or create a session with this ID. Enables session sharing
            between agents (e.g., critic accessing generator state).
        timeout_seconds: Maximum execution time in seconds. Defaults to 300.
            Execution terminates with TIMEOUT status if exceeded.

    Returns:
        ExecutionResult with status, output, and debugging information.

    Examples:
        Basic execution:

        ```python
        result = await executor.execute_agent(
            agent=greeter,
            input_text="Hello!",
        )
        print(result.extracted_value)
        ```

        With session state for reflection:

        ```python
        result = await executor.execute_agent(
            agent=reflector,
            input_text="Improve the instruction",
            session_state={
                "component_text": "Be helpful.",
                "trials": '[{"score": 0.5}]',
            },
        )
        ```

        With multimodal content:

        ```python
        from google.genai.types import Content, Part

        content = Content(
            role="user",
            parts=[Part(text="Describe this video"), video_part],
        )
        result = await executor.execute_agent(
            agent=analyzer,
            input_text="",  # Can be empty when content provided
            input_content=content,
        )
        ```

    Note:
        Optional typing (Any) is used for agent parameter to avoid
        coupling to ADK types in the ports layer. Implementations
        should validate that the agent is a valid LlmAgent.
    """
    start_time = time.perf_counter()
    user_id = "exec_user"
    is_multimodal = input_content is not None

    self._logger.info(
        "execution.start",
        agent_name=getattr(agent, "name", "unknown"),
        input_length=len(input_text),
        is_multimodal=is_multimodal,
        has_instruction_override=instruction_override is not None,
        has_schema_override=output_schema_override is not None,
        has_session_state=session_state is not None,
        existing_session_id=existing_session_id,
        timeout_seconds=timeout_seconds,
    )

    # Get or create session
    session: Session
    if existing_session_id:
        session = await self._get_or_create_session(
            existing_session_id, user_id, session_state
        )
    else:
        session = await self._create_session(user_id, session_state)

    # Apply overrides if provided
    effective_agent = self._apply_overrides(
        agent, instruction_override, output_schema_override
    )

    # Create runner
    runner = Runner(
        agent=effective_agent,
        app_name=self._app_name,
        session_service=self._session_service,
    )

    # Execute with timeout and capture events
    events: list[Any] = []
    timed_out = False
    error_message: str | None = None

    try:
        events, timed_out = await self._execute_with_timeout(
            runner, session, user_id, input_text, timeout_seconds, input_content
        )
    except Exception as e:
        error_message = str(e)
        self._logger.error(
            "execution.error",
            session_id=session.id,
            error=error_message,
        )

    # Calculate execution time
    execution_time = time.perf_counter() - start_time

    # Determine status
    if error_message:
        status = ExecutionStatus.FAILED
    elif timed_out:
        status = ExecutionStatus.TIMEOUT
        error_message = f"Execution timed out after {timeout_seconds}s"
    else:
        status = ExecutionStatus.SUCCESS

    # Extract output (even on timeout, we try to get partial results)
    extracted_value: str | None = None
    if status == ExecutionStatus.SUCCESS or (timed_out and events):
        extracted_value = await self._extract_output(session, events, agent)

    self._logger.info(
        "execution.complete",
        session_id=session.id,
        status=status.value,
        execution_time_seconds=execution_time,
        events_captured=len(events),
        has_output=extracted_value is not None,
    )

    return ExecutionResult(
        status=status,
        session_id=session.id,
        extracted_value=extracted_value,
        error_message=error_message,
        execution_time_seconds=execution_time,
        captured_events=events,
    )
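
The parameter list above notes that existing_session_id enables session sharing between agents (e.g., a critic reading a generator's state). A minimal sketch of that pattern, assuming the executor from the earlier examples and illustrative generator/critic agents:

```python
# Run the first agent; its ExecutionResult carries the session ID.
gen_result = await executor.execute_agent(
    agent=generator,
    input_text="Draft a short summary of the report",
)

# Reuse the same session so the second agent can read the first agent's state.
critic_result = await executor.execute_agent(
    agent=critic,
    input_text="Critique the draft stored in this session",
    existing_session_id=gen_result.session_id,
)
```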

SessionNotFoundError

Bases: EvolutionError



Raised when a requested session does not exist.

ATTRIBUTE DESCRIPTION
session_id

The session ID that was not found.

TYPE: str

Examples:

Handling session not found with strict existence checking:

from gepa_adk.adapters.agent_executor import SessionNotFoundError

try:
    session = await executor._get_session(
        session_id="invalid_session",
        user_id="user_123",
    )
except SessionNotFoundError as e:
    print(f"Session not found: {e.session_id}")
Note

Arises only from strict existence-checking paths like _get_session(). The execute_agent() method uses get-or-create semantics and will not raise this exception.
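
For contrast with the strict lookup above, a hedged sketch of the get-or-create path (the session ID is arbitrary; greeter is the agent from the earlier example):

```python
# execute_agent() creates the session if it does not already exist,
# so no SessionNotFoundError is raised here.
result = await executor.execute_agent(
    agent=greeter,
    input_text="Hello!",
    existing_session_id="maybe-missing-session",
)
```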

Source code in src/gepa_adk/adapters/agent_executor.py
class SessionNotFoundError(EvolutionError):
    """Raised when a requested session does not exist.

    Attributes:
        session_id (str): The session ID that was not found.

    Examples:
        Handling session not found with strict existence checking:

        ```python
        from gepa_adk.adapters.agent_executor import SessionNotFoundError

        try:
            session = await executor._get_session(
                session_id="invalid_session",
                user_id="user_123",
            )
        except SessionNotFoundError as e:
            print(f"Session not found: {e.session_id}")
        ```

    Note:
        Arises only from strict existence-checking paths like _get_session().
        The execute_agent() method uses get-or-create semantics and will not
        raise this exception.
    """

    def __init__(self, session_id: str) -> None:
        """Initialize SessionNotFoundError.

        Args:
            session_id: The session ID that was not found.
        """
        self.session_id = session_id
        super().__init__(f"Session not found: {session_id}")

__init__

__init__(session_id: str) -> None

Initialize SessionNotFoundError.

PARAMETER DESCRIPTION
session_id

The session ID that was not found.

TYPE: str

Source code in src/gepa_adk/adapters/agent_executor.py
def __init__(self, session_id: str) -> None:
    """Initialize SessionNotFoundError.

    Args:
        session_id: The session ID that was not found.
    """
    self.session_id = session_id
    super().__init__(f"Session not found: {session_id}")

CurrentBestCandidateSelector

Always select the candidate with the highest average score.

Note

A greedy selector always exploits the best-average candidate.

Examples:

selector = CurrentBestCandidateSelector()
candidate_idx = await selector.select_candidate(state)
Source code in src/gepa_adk/adapters/candidate_selector.py
class CurrentBestCandidateSelector:
    """Always select the candidate with the highest average score.

    Note:
        A greedy selector always exploits the best-average candidate.

    Examples:
        ```python
        selector = CurrentBestCandidateSelector()
        candidate_idx = await selector.select_candidate(state)
        ```
    """

    async def select_candidate(self, state: ParetoState) -> int:
        """Return the best-average candidate index.

        Args:
            state: Current evolution state with Pareto tracking.

        Returns:
            Selected candidate index.

        Raises:
            NoCandidateAvailableError: If no candidates are available.

        Examples:
            ```python
            candidate_idx = await selector.select_candidate(state)
            ```
        """
        if state.best_average_idx is None:
            raise NoCandidateAvailableError("No candidates available for selection")
        return state.best_average_idx

select_candidate async

select_candidate(state: ParetoState) -> int

Return the best-average candidate index.

PARAMETER DESCRIPTION
state

Current evolution state with Pareto tracking.

TYPE: ParetoState

RETURNS DESCRIPTION
int

Selected candidate index.

RAISES DESCRIPTION
NoCandidateAvailableError

If no candidates are available.

Examples:

candidate_idx = await selector.select_candidate(state)
Source code in src/gepa_adk/adapters/candidate_selector.py
async def select_candidate(self, state: ParetoState) -> int:
    """Return the best-average candidate index.

    Args:
        state: Current evolution state with Pareto tracking.

    Returns:
        Selected candidate index.

    Raises:
        NoCandidateAvailableError: If no candidates are available.

    Examples:
        ```python
        candidate_idx = await selector.select_candidate(state)
        ```
    """
    if state.best_average_idx is None:
        raise NoCandidateAvailableError("No candidates available for selection")
    return state.best_average_idx

EpsilonGreedyCandidateSelector

Epsilon-greedy selection balancing exploration and exploitation.

ATTRIBUTE DESCRIPTION
_epsilon

Exploration probability.

TYPE: float

_rng

RNG for exploration decisions.

TYPE: Random

Note

With probability epsilon the selector explores a uniformly random candidate; otherwise it exploits the best-average candidate.

Examples:

selector = EpsilonGreedyCandidateSelector(epsilon=0.1, rng=random.Random(7))
candidate_idx = await selector.select_candidate(state)
Source code in src/gepa_adk/adapters/candidate_selector.py
class EpsilonGreedyCandidateSelector:
    """Epsilon-greedy selection balancing exploration and exploitation.

    Attributes:
        _epsilon (float): Exploration probability.
        _rng (random.Random): RNG for exploration decisions.

    Note:
        With probability epsilon the selector explores a uniformly random
        candidate; otherwise it exploits the best-average candidate.

    Examples:
        ```python
        selector = EpsilonGreedyCandidateSelector(epsilon=0.1, rng=random.Random(7))
        candidate_idx = await selector.select_candidate(state)
        ```
    """

    def __init__(self, epsilon: float, rng: random.Random | None = None) -> None:
        """Initialize the selector.

        Args:
            epsilon: Probability of random exploration.
            rng: Optional random number generator for reproducibility.

        Raises:
            ConfigurationError: If epsilon is outside [0.0, 1.0].
        """
        if not 0.0 <= epsilon <= 1.0:
            raise ConfigurationError(
                "epsilon must be between 0.0 and 1.0",
                field="epsilon",
                value=epsilon,
                constraint="0.0 <= epsilon <= 1.0",
            )
        self._epsilon = epsilon
        self._rng = rng or random.Random()

    async def select_candidate(self, state: ParetoState) -> int:
        """Select a candidate using epsilon-greedy strategy.

        Args:
            state: Current evolution state with Pareto tracking.

        Returns:
            Selected candidate index.

        Raises:
            NoCandidateAvailableError: If no candidates are available.

        Examples:
            ```python
            candidate_idx = await selector.select_candidate(state)
            ```
        """
        if not state.candidates:
            raise NoCandidateAvailableError("No candidates available for selection")

        if self._rng.random() < self._epsilon:
            return self._rng.randint(0, len(state.candidates) - 1)

        if state.best_average_idx is None:
            raise NoCandidateAvailableError("No candidates available for selection")

        return state.best_average_idx

__init__

__init__(epsilon: float, rng: Random | None = None) -> None

Initialize the selector.

PARAMETER DESCRIPTION
epsilon

Probability of random exploration.

TYPE: float

rng

Optional random number generator for reproducibility.

TYPE: Random | None DEFAULT: None

RAISES DESCRIPTION
ConfigurationError

If epsilon is outside [0.0, 1.0].

Source code in src/gepa_adk/adapters/candidate_selector.py
def __init__(self, epsilon: float, rng: random.Random | None = None) -> None:
    """Initialize the selector.

    Args:
        epsilon: Probability of random exploration.
        rng: Optional random number generator for reproducibility.

    Raises:
        ConfigurationError: If epsilon is outside [0.0, 1.0].
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ConfigurationError(
            "epsilon must be between 0.0 and 1.0",
            field="epsilon",
            value=epsilon,
            constraint="0.0 <= epsilon <= 1.0",
        )
    self._epsilon = epsilon
    self._rng = rng or random.Random()

select_candidate async

select_candidate(state: ParetoState) -> int

Select a candidate using epsilon-greedy strategy.

PARAMETER DESCRIPTION
state

Current evolution state with Pareto tracking.

TYPE: ParetoState

RETURNS DESCRIPTION
int

Selected candidate index.

RAISES DESCRIPTION
NoCandidateAvailableError

If no candidates are available.

Examples:

candidate_idx = await selector.select_candidate(state)
Source code in src/gepa_adk/adapters/candidate_selector.py
async def select_candidate(self, state: ParetoState) -> int:
    """Select a candidate using epsilon-greedy strategy.

    Args:
        state: Current evolution state with Pareto tracking.

    Returns:
        Selected candidate index.

    Raises:
        NoCandidateAvailableError: If no candidates are available.

    Examples:
        ```python
        candidate_idx = await selector.select_candidate(state)
        ```
    """
    if not state.candidates:
        raise NoCandidateAvailableError("No candidates available for selection")

    if self._rng.random() < self._epsilon:
        return self._rng.randint(0, len(state.candidates) - 1)

    if state.best_average_idx is None:
        raise NoCandidateAvailableError("No candidates available for selection")

    return state.best_average_idx

ParetoCandidateSelector

Sample from the Pareto front proportional to leadership frequency.

ATTRIBUTE DESCRIPTION
_rng

RNG for sampling candidates.

TYPE: Random

Note

A Pareto selector emphasizes candidates that lead on more examples.

Examples:

selector = ParetoCandidateSelector(rng=random.Random(42))
candidate_idx = await selector.select_candidate(state)
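
To make the weighting concrete, a small worked sketch of the proportional sampling described above (the weights dict stands in for what frontier.get_selection_weights() returns; the numbers are invented):

```python
# Candidate 0 leads on 3 validation examples, candidate 1 on 1.
weights = {0: 3, 1: 1}
sampling_list = [idx for idx, weight in weights.items() for _ in range(weight)]
# sampling_list == [0, 0, 0, 1], so candidate 0 is drawn roughly 75% of the time.
```
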
Source code in src/gepa_adk/adapters/candidate_selector.py
class ParetoCandidateSelector:
    """Sample from the Pareto front proportional to leadership frequency.

    Attributes:
        _rng (random.Random): RNG for sampling candidates.

    Note:
        A Pareto selector emphasizes candidates that lead on more examples.

    Examples:
        ```python
        selector = ParetoCandidateSelector(rng=random.Random(42))
        candidate_idx = await selector.select_candidate(state)
        ```
    """

    def __init__(self, rng: random.Random | None = None) -> None:
        """Initialize the selector.

        Args:
            rng: Optional random number generator for reproducibility.
        """
        self._rng = rng or random.Random()

    async def select_candidate(self, state: ParetoState) -> int:
        """Select a candidate index from the Pareto frontier.

        Args:
            state: Current evolution state with Pareto tracking.

        Returns:
            Selected candidate index.

        Raises:
            NoCandidateAvailableError: If no candidates or leaders exist.

        Examples:
            ```python
            candidate_idx = await selector.select_candidate(state)
            ```
        """
        if not state.candidates:
            raise NoCandidateAvailableError("No candidates available for selection")

        weights = state.frontier.get_selection_weights()
        if not weights:
            raise NoCandidateAvailableError("Pareto frontier is empty")

        sampling_list = [
            candidate_idx
            for candidate_idx, weight in weights.items()
            for _ in range(weight)
        ]
        if not sampling_list:
            raise NoCandidateAvailableError("Pareto frontier has no leaders")

        return self._rng.choice(sampling_list)

__init__

__init__(rng: Random | None = None) -> None

Initialize the selector.

PARAMETER DESCRIPTION
rng

Optional random number generator for reproducibility.

TYPE: Random | None DEFAULT: None

Source code in src/gepa_adk/adapters/candidate_selector.py
def __init__(self, rng: random.Random | None = None) -> None:
    """Initialize the selector.

    Args:
        rng: Optional random number generator for reproducibility.
    """
    self._rng = rng or random.Random()

select_candidate async

select_candidate(state: ParetoState) -> int

Select a candidate index from the Pareto frontier.

PARAMETER DESCRIPTION
state

Current evolution state with Pareto tracking.

TYPE: ParetoState

RETURNS DESCRIPTION
int

Selected candidate index.

RAISES DESCRIPTION
NoCandidateAvailableError

If no candidates or leaders exist.

Examples:

candidate_idx = await selector.select_candidate(state)
Source code in src/gepa_adk/adapters/candidate_selector.py
async def select_candidate(self, state: ParetoState) -> int:
    """Select a candidate index from the Pareto frontier.

    Args:
        state: Current evolution state with Pareto tracking.

    Returns:
        Selected candidate index.

    Raises:
        NoCandidateAvailableError: If no candidates or leaders exist.

    Examples:
        ```python
        candidate_idx = await selector.select_candidate(state)
        ```
    """
    if not state.candidates:
        raise NoCandidateAvailableError("No candidates available for selection")

    weights = state.frontier.get_selection_weights()
    if not weights:
        raise NoCandidateAvailableError("Pareto frontier is empty")

    sampling_list = [
        candidate_idx
        for candidate_idx, weight in weights.items()
        for _ in range(weight)
    ]
    if not sampling_list:
        raise NoCandidateAvailableError("Pareto frontier has no leaders")

    return self._rng.choice(sampling_list)

ComponentHandlerRegistry

Registry for component handlers with O(1) lookup.

Stores component handlers keyed by component name, providing registration, lookup, and existence checking operations.

ATTRIBUTE DESCRIPTION
_handlers

Internal dict mapping component names to handlers.

TYPE: dict[str, ComponentHandler]

Examples:

Create and use a registry:

registry = ComponentHandlerRegistry()
registry.register("instruction", InstructionHandler())
handler = registry.get("instruction")
See Also
Note

A default registry instance is available as component_handlers module variable, with convenience functions get_handler() and register_handler().
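
A hedged sketch of those module-level helpers (their exact signatures are assumed here to mirror the registry methods documented below):

```python
from gepa_adk.adapters.component_handlers import (
    InstructionHandler,
    get_handler,
    register_handler,
)

# Assumed to delegate to the default module-level registry.
register_handler("instruction", InstructionHandler())
handler = get_handler("instruction")
```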

Source code in src/gepa_adk/adapters/component_handlers.py
class ComponentHandlerRegistry:
    """Registry for component handlers with O(1) lookup.

    Stores component handlers keyed by component name, providing
    registration, lookup, and existence checking operations.

    Attributes:
        _handlers: Internal dict mapping component names to handlers.

    Examples:
        Create and use a registry:

        ```python
        registry = ComponentHandlerRegistry()
        registry.register("instruction", InstructionHandler())
        handler = registry.get("instruction")
        ```

    See Also:
        - [`get_handler()`][gepa_adk.adapters.component_handlers.get_handler]:
            Convenience function for default registry.

    Note:
        A default registry instance is available as `component_handlers`
        module variable, with convenience functions `get_handler()` and
        `register_handler()`.
    """

    def __init__(self) -> None:
        """Initialize empty registry.

        Examples:
            ```python
            registry = ComponentHandlerRegistry()
            assert not registry.has("instruction")
            ```

        Note:
            Creates an empty internal dict for handler storage.
        """
        self._handlers: dict[str, ComponentHandler] = {}

    def register(self, name: str, handler: ComponentHandler) -> None:
        """Register a handler for a component name.

        Args:
            name: Component name (e.g., "instruction", "output_schema").
                Must be a non-empty string.
            handler: Handler implementing ComponentHandler protocol.

        Raises:
            ValueError: If name is empty or None.
            TypeError: If handler doesn't implement ComponentHandler protocol.

        Examples:
            ```python
            registry.register("instruction", InstructionHandler())
            registry.register("output_schema", OutputSchemaHandler())
            ```

        Note:
            Overwrites existing handler if name already registered.
            Logs a debug message on replacement.
        """
        if not name:
            raise ValueError("Component name must be a non-empty string")

        if not isinstance(handler, ComponentHandler):
            raise TypeError(
                f"Handler does not implement ComponentHandler protocol: {type(handler)}"
            )

        if name in self._handlers:
            logger.debug(
                "component_handler.register.replace",
                name=name,
                old_handler=type(self._handlers[name]).__name__,
                new_handler=type(handler).__name__,
            )

        self._handlers[name] = handler
        logger.debug(
            "component_handler.register.success",
            name=name,
            handler=type(handler).__name__,
        )

    def get(self, name: str) -> ComponentHandler:
        """Retrieve handler for component name.

        Args:
            name: Component name to look up.

        Returns:
            The registered ComponentHandler.

        Raises:
            ValueError: If name is empty or None.
            KeyError: If no handler registered for name.

        Examples:
            ```python
            handler = registry.get("instruction")
            original = handler.apply(agent, "New value")
            ```
        """
        if not name:
            raise ValueError("Component name must be a non-empty string")

        if name not in self._handlers:
            raise KeyError(f"No handler registered for component: {name}")

        return self._handlers[name]

    def has(self, name: str) -> bool:
        """Check if handler exists for component name.

        Args:
            name: Component name to check.

        Returns:
            True if handler registered, False otherwise.

        Examples:
            ```python
            if registry.has("instruction"):
                handler = registry.get("instruction")
            ```

        Note:
            Returns False for empty/None names (no ValueError).
            This allows safe checking without exception handling.
        """
        if not name:
            return False
        return name in self._handlers

    def names(self) -> list[str]:
        """Return sorted list of registered handler names.

        Returns:
            Sorted list of component names with registered handlers.

        Examples:
            ```python
            available = registry.names()
            # ["generate_content_config", "instruction", "output_schema"]
            ```

        Note:
            Output is sorted alphabetically for consistent error messages
            and validation feedback.
        """
        return sorted(self._handlers.keys())

__init__

__init__() -> None

Initialize empty registry.

Examples:

registry = ComponentHandlerRegistry()
assert not registry.has("instruction")
Note

Creates an empty internal dict for handler storage.

Source code in src/gepa_adk/adapters/component_handlers.py
def __init__(self) -> None:
    """Initialize empty registry.

    Examples:
        ```python
        registry = ComponentHandlerRegistry()
        assert not registry.has("instruction")
        ```

    Note:
        Creates an empty internal dict for handler storage.
    """
    self._handlers: dict[str, ComponentHandler] = {}

register

register(name: str, handler: ComponentHandler) -> None

Register a handler for a component name.

PARAMETER DESCRIPTION
name

Component name (e.g., "instruction", "output_schema"). Must be a non-empty string.

TYPE: str

handler

Handler implementing ComponentHandler protocol.

TYPE: ComponentHandler

RAISES DESCRIPTION
ValueError

If name is empty or None.

TypeError

If handler doesn't implement ComponentHandler protocol.

Examples:

registry.register("instruction", InstructionHandler())
registry.register("output_schema", OutputSchemaHandler())
Note

Overwrites existing handler if name already registered. Logs a debug message on replacement.
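
Because register() type-checks its argument against the ComponentHandler protocol, any custom handler must expose the serialize/apply/restore surface used by the handlers documented below. A sketch of the shape that check presumably expects (the real protocol is defined elsewhere in the package; this is inferred, not the library definition):

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ComponentHandler(Protocol):
    """Inferred handler shape (sketch only, not the library's definition)."""

    def serialize(self, agent: Any) -> str: ...

    def apply(self, agent: Any, value: str) -> Any: ...

    def restore(self, agent: Any, original: Any) -> None: ...
```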

Source code in src/gepa_adk/adapters/component_handlers.py
def register(self, name: str, handler: ComponentHandler) -> None:
    """Register a handler for a component name.

    Args:
        name: Component name (e.g., "instruction", "output_schema").
            Must be a non-empty string.
        handler: Handler implementing ComponentHandler protocol.

    Raises:
        ValueError: If name is empty or None.
        TypeError: If handler doesn't implement ComponentHandler protocol.

    Examples:
        ```python
        registry.register("instruction", InstructionHandler())
        registry.register("output_schema", OutputSchemaHandler())
        ```

    Note:
        Overwrites existing handler if name already registered.
        Logs a debug message on replacement.
    """
    if not name:
        raise ValueError("Component name must be a non-empty string")

    if not isinstance(handler, ComponentHandler):
        raise TypeError(
            f"Handler does not implement ComponentHandler protocol: {type(handler)}"
        )

    if name in self._handlers:
        logger.debug(
            "component_handler.register.replace",
            name=name,
            old_handler=type(self._handlers[name]).__name__,
            new_handler=type(handler).__name__,
        )

    self._handlers[name] = handler
    logger.debug(
        "component_handler.register.success",
        name=name,
        handler=type(handler).__name__,
    )

get

get(name: str) -> ComponentHandler

Retrieve handler for component name.

PARAMETER DESCRIPTION
name

Component name to look up.

TYPE: str

RETURNS DESCRIPTION
ComponentHandler

The registered ComponentHandler.

RAISES DESCRIPTION
ValueError

If name is empty or None.

KeyError

If no handler registered for name.

Examples:

handler = registry.get("instruction")
original = handler.apply(agent, "New value")
Source code in src/gepa_adk/adapters/component_handlers.py
def get(self, name: str) -> ComponentHandler:
    """Retrieve handler for component name.

    Args:
        name: Component name to look up.

    Returns:
        The registered ComponentHandler.

    Raises:
        ValueError: If name is empty or None.
        KeyError: If no handler registered for name.

    Examples:
        ```python
        handler = registry.get("instruction")
        original = handler.apply(agent, "New value")
        ```
    """
    if not name:
        raise ValueError("Component name must be a non-empty string")

    if name not in self._handlers:
        raise KeyError(f"No handler registered for component: {name}")

    return self._handlers[name]

has

has(name: str) -> bool

Check if handler exists for component name.

PARAMETER DESCRIPTION
name

Component name to check.

TYPE: str

RETURNS DESCRIPTION
bool

True if handler registered, False otherwise.

Examples:

if registry.has("instruction"):
    handler = registry.get("instruction")
Note

Returns False for empty/None names (no ValueError). This allows safe checking without exception handling.

Source code in src/gepa_adk/adapters/component_handlers.py
def has(self, name: str) -> bool:
    """Check if handler exists for component name.

    Args:
        name: Component name to check.

    Returns:
        True if handler registered, False otherwise.

    Examples:
        ```python
        if registry.has("instruction"):
            handler = registry.get("instruction")
        ```

    Note:
        Returns False for empty/None names (no ValueError).
        This allows safe checking without exception handling.
    """
    if not name:
        return False
    return name in self._handlers

names

names() -> list[str]

Return sorted list of registered handler names.

RETURNS DESCRIPTION
list[str]

Sorted list of component names with registered handlers.

Examples:

available = registry.names()
# ["generate_content_config", "instruction", "output_schema"]
Note

Output is sorted alphabetically for consistent error messages and validation feedback.

Source code in src/gepa_adk/adapters/component_handlers.py
def names(self) -> list[str]:
    """Return sorted list of registered handler names.

    Returns:
        Sorted list of component names with registered handlers.

    Examples:
        ```python
        available = registry.names()
        # ["generate_content_config", "instruction", "output_schema"]
        ```

    Note:
        Output is sorted alphabetically for consistent error messages
        and validation feedback.
    """
    return sorted(self._handlers.keys())

GenerateContentConfigHandler

Handler for agent.generate_content_config component.

Manages serialization, application, and restoration of the agent's LLM generation configuration during evolution.

Examples:

handler = GenerateContentConfigHandler()
original = handler.serialize(agent)  # YAML string
handler.apply(agent, "temperature: 0.5")
# ... evaluate ...
handler.restore(agent, original_config)
Note

All state is stored in the agent object - handler is stateless. On invalid config, logs warning and keeps original.

Source code in src/gepa_adk/adapters/component_handlers.py
class GenerateContentConfigHandler:
    """Handler for agent.generate_content_config component.

    Manages serialization, application, and restoration of the
    agent's LLM generation configuration during evolution.

    Examples:
        ```python
        handler = GenerateContentConfigHandler()
        original = handler.serialize(agent)  # YAML string
        handler.apply(agent, "temperature: 0.5")
        # ... evaluate ...
        handler.restore(agent, original_config)
        ```

    Note:
        All state is stored in the agent object - handler is stateless.
        On invalid config, logs warning and keeps original.
    """

    def serialize(self, agent: "LlmAgent") -> str:
        """Extract generate_content_config from agent as YAML.

        Args:
            agent: The LlmAgent instance.

        Returns:
            YAML string with parameter descriptions as comments.
            Returns empty string if generate_content_config is None.

        Examples:
            ```python
            yaml_str = handler.serialize(agent)
            # yaml_str contains YAML with temperature, top_p, etc.
            ```
        """
        config = getattr(agent, "generate_content_config", None)
        return serialize_generate_config(config)

    def apply(self, agent: "LlmAgent", value: str) -> Any:
        """Apply new generate_content_config to agent, return original.

        Args:
            agent: The LlmAgent instance to modify.
            value: YAML string defining the new config parameters.

        Returns:
            The original GenerateContentConfig (or None).

        Examples:
            ```python
            original = handler.apply(agent, "temperature: 0.5")
            # agent.generate_content_config.temperature is now 0.5
            ```

        Note:
            On deserialization or validation failure, logs warning and
            keeps original config. Never raises exceptions.
        """
        original = getattr(agent, "generate_content_config", None)

        try:
            new_config = deserialize_generate_config(value, original)

            # Validate the parsed config
            config_dict = {}
            for param in [
                "temperature",
                "top_p",
                "top_k",
                "max_output_tokens",
                "presence_penalty",
                "frequency_penalty",
            ]:
                param_value = getattr(new_config, param, None)
                if param_value is not None:
                    config_dict[param] = param_value

            errors = validate_generate_config(config_dict)
            if errors:
                logger.warning(
                    "generate_content_config_handler.apply.validation_failed",
                    errors=errors,
                    config_preview=value[:100] if value else "",
                )
                # Keep original - don't modify agent
                return original

            agent.generate_content_config = new_config
            logger.debug(
                "generate_content_config_handler.apply",
                original_temp=getattr(original, "temperature", None)
                if original
                else None,
                new_temp=getattr(new_config, "temperature", None),
            )
        except ConfigValidationError as e:
            logger.warning(
                "generate_content_config_handler.apply.failed",
                error=str(e),
                config_preview=value[:100] if value else "",
            )
            # Keep original - don't modify agent

        return original

    def restore(self, agent: "LlmAgent", original: Any) -> None:
        """Restore original generate_content_config to agent.

        Args:
            agent: The LlmAgent instance to restore.
            original: The original GenerateContentConfig (or None).

        Examples:
            ```python
            handler.restore(agent, original_config)
            # agent.generate_content_config is back to original
            ```
        """
        agent.generate_content_config = original
        logger.debug(
            "generate_content_config_handler.restore",
            has_config=original is not None,
        )

serialize

serialize(agent: 'LlmAgent') -> str

Extract generate_content_config from agent as YAML.

PARAMETER DESCRIPTION
agent

The LlmAgent instance.

TYPE: 'LlmAgent'

RETURNS DESCRIPTION
str

YAML string with parameter descriptions as comments.

str

Returns empty string if generate_content_config is None.

Examples:

yaml_str = handler.serialize(agent)
# yaml_str contains YAML with temperature, top_p, etc.
Source code in src/gepa_adk/adapters/component_handlers.py
def serialize(self, agent: "LlmAgent") -> str:
    """Extract generate_content_config from agent as YAML.

    Args:
        agent: The LlmAgent instance.

    Returns:
        YAML string with parameter descriptions as comments.
        Returns empty string if generate_content_config is None.

    Examples:
        ```python
        yaml_str = handler.serialize(agent)
        # yaml_str contains YAML with temperature, top_p, etc.
        ```
    """
    config = getattr(agent, "generate_content_config", None)
    return serialize_generate_config(config)

apply

apply(agent: 'LlmAgent', value: str) -> Any

Apply new generate_content_config to agent, return original.

PARAMETER DESCRIPTION
agent

The LlmAgent instance to modify.

TYPE: 'LlmAgent'

value

YAML string defining the new config parameters.

TYPE: str

RETURNS DESCRIPTION
Any

The original GenerateContentConfig (or None).

Examples:

original = handler.apply(agent, "temperature: 0.5")
# agent.generate_content_config.temperature is now 0.5
Note

On deserialization or validation failure, logs warning and keeps original config. Never raises exceptions.

Source code in src/gepa_adk/adapters/component_handlers.py
def apply(self, agent: "LlmAgent", value: str) -> Any:
    """Apply new generate_content_config to agent, return original.

    Args:
        agent: The LlmAgent instance to modify.
        value: YAML string defining the new config parameters.

    Returns:
        The original GenerateContentConfig (or None).

    Examples:
        ```python
        original = handler.apply(agent, "temperature: 0.5")
        # agent.generate_content_config.temperature is now 0.5
        ```

    Note:
        On deserialization or validation failure, logs warning and
        keeps original config. Never raises exceptions.
    """
    original = getattr(agent, "generate_content_config", None)

    try:
        new_config = deserialize_generate_config(value, original)

        # Validate the parsed config
        config_dict = {}
        for param in [
            "temperature",
            "top_p",
            "top_k",
            "max_output_tokens",
            "presence_penalty",
            "frequency_penalty",
        ]:
            param_value = getattr(new_config, param, None)
            if param_value is not None:
                config_dict[param] = param_value

        errors = validate_generate_config(config_dict)
        if errors:
            logger.warning(
                "generate_content_config_handler.apply.validation_failed",
                errors=errors,
                config_preview=value[:100] if value else "",
            )
            # Keep original - don't modify agent
            return original

        agent.generate_content_config = new_config
        logger.debug(
            "generate_content_config_handler.apply",
            original_temp=getattr(original, "temperature", None)
            if original
            else None,
            new_temp=getattr(new_config, "temperature", None),
        )
    except ConfigValidationError as e:
        logger.warning(
            "generate_content_config_handler.apply.failed",
            error=str(e),
            config_preview=value[:100] if value else "",
        )
        # Keep original - don't modify agent

    return original

restore

restore(agent: 'LlmAgent', original: Any) -> None

Restore original generate_content_config to agent.

PARAMETER DESCRIPTION
agent

The LlmAgent instance to restore.

TYPE: 'LlmAgent'

original

The original GenerateContentConfig (or None).

TYPE: Any

Examples:

handler.restore(agent, original_config)
# agent.generate_content_config is back to original
Source code in src/gepa_adk/adapters/component_handlers.py
def restore(self, agent: "LlmAgent", original: Any) -> None:
    """Restore original generate_content_config to agent.

    Args:
        agent: The LlmAgent instance to restore.
        original: The original GenerateContentConfig (or None).

    Examples:
        ```python
        handler.restore(agent, original_config)
        # agent.generate_content_config is back to original
        ```
    """
    agent.generate_content_config = original
    logger.debug(
        "generate_content_config_handler.restore",
        has_config=original is not None,
    )

InstructionHandler

Handler for agent.instruction component.

Manages serialization, application, and restoration of the agent's instruction (system prompt) during evolution.

Examples:

handler = InstructionHandler()
original = handler.serialize(agent)  # "Be helpful"
handler.apply(agent, "Be concise")
# ... evaluate ...
handler.restore(agent, original)  # Back to "Be helpful"
Note

All state is stored in the agent object - handler is stateless. No instance attributes are maintained.

Source code in src/gepa_adk/adapters/component_handlers.py
class InstructionHandler:
    """Handler for agent.instruction component.

    Manages serialization, application, and restoration of the
    agent's instruction (system prompt) during evolution.

    Examples:
        ```python
        handler = InstructionHandler()
        original = handler.serialize(agent)  # "Be helpful"
        handler.apply(agent, "Be concise")
        # ... evaluate ...
        handler.restore(agent, original)  # Back to "Be helpful"
        ```

    Note:
        All state is stored in the agent object - handler is stateless.
        No instance attributes are maintained.
    """

    def serialize(self, agent: "LlmAgent") -> str:
        """Extract instruction from agent as string.

        Args:
            agent: The LlmAgent instance.

        Returns:
            The agent's instruction as string.
            Returns empty string if instruction is None.

        Examples:
            ```python
            text = handler.serialize(agent)
            # text == "You are a helpful assistant."
            ```
        """
        instruction = agent.instruction
        if instruction is None:
            return ""
        return str(instruction)

    def apply(self, agent: "LlmAgent", value: str) -> str:
        """Apply new instruction to agent, return original.

        Args:
            agent: The LlmAgent instance to modify.
            value: The new instruction string.

        Returns:
            The original instruction value.

        Examples:
            ```python
            original = handler.apply(agent, "New instruction")
            # agent.instruction is now "New instruction"
            ```
        """
        original = self.serialize(agent)
        agent.instruction = value
        logger.debug(
            "instruction_handler.apply",
            original_preview=original[:50] if original else "",
            new_preview=value[:50] if value else "",
        )
        return original

    def restore(self, agent: "LlmAgent", original: str) -> None:
        """Restore original instruction to agent.

        Args:
            agent: The LlmAgent instance to restore.
            original: The original instruction value.

        Examples:
            ```python
            handler.restore(agent, original)
            # agent.instruction is back to original
            ```
        """
        agent.instruction = original
        logger.debug(
            "instruction_handler.restore",
            instruction_preview=original[:50] if original else "",
        )

serialize

serialize(agent: 'LlmAgent') -> str

Extract instruction from agent as string.

PARAMETER DESCRIPTION
agent

The LlmAgent instance.

TYPE: 'LlmAgent'

RETURNS DESCRIPTION
str

The agent's instruction as string.

str

Returns empty string if instruction is None.

Examples:

text = handler.serialize(agent)
# text == "You are a helpful assistant."
Source code in src/gepa_adk/adapters/component_handlers.py
def serialize(self, agent: "LlmAgent") -> str:
    """Extract instruction from agent as string.

    Args:
        agent: The LlmAgent instance.

    Returns:
        The agent's instruction as string.
        Returns empty string if instruction is None.

    Examples:
        ```python
        text = handler.serialize(agent)
        # text == "You are a helpful assistant."
        ```
    """
    instruction = agent.instruction
    if instruction is None:
        return ""
    return str(instruction)

apply

apply(agent: 'LlmAgent', value: str) -> str

Apply new instruction to agent, return original.

PARAMETER DESCRIPTION
agent

The LlmAgent instance to modify.

TYPE: 'LlmAgent'

value

The new instruction string.

TYPE: str

RETURNS DESCRIPTION
str

The original instruction value.

Examples:

original = handler.apply(agent, "New instruction")
# agent.instruction is now "New instruction"
Source code in src/gepa_adk/adapters/component_handlers.py
def apply(self, agent: "LlmAgent", value: str) -> str:
    """Apply new instruction to agent, return original.

    Args:
        agent: The LlmAgent instance to modify.
        value: The new instruction string.

    Returns:
        The original instruction value.

    Examples:
        ```python
        original = handler.apply(agent, "New instruction")
        # agent.instruction is now "New instruction"
        ```
    """
    original = self.serialize(agent)
    agent.instruction = value
    logger.debug(
        "instruction_handler.apply",
        original_preview=original[:50] if original else "",
        new_preview=value[:50] if value else "",
    )
    return original

restore

restore(agent: 'LlmAgent', original: str) -> None

Restore original instruction to agent.

PARAMETER DESCRIPTION
agent

The LlmAgent instance to restore.

TYPE: 'LlmAgent'

original

The original instruction value.

TYPE: str

Examples:

handler.restore(agent, original)
# agent.instruction is back to original
Source code in src/gepa_adk/adapters/component_handlers.py
def restore(self, agent: "LlmAgent", original: str) -> None:
    """Restore original instruction to agent.

    Args:
        agent: The LlmAgent instance to restore.
        original: The original instruction value.

    Examples:
        ```python
        handler.restore(agent, original)
        # agent.instruction is back to original
        ```
    """
    agent.instruction = original
    logger.debug(
        "instruction_handler.restore",
        instruction_preview=original[:50] if original else "",
    )

OutputSchemaHandler

Handler for agent.output_schema component.

Manages serialization, application, and restoration of the agent's output schema (Pydantic model) during evolution.

ATTRIBUTE DESCRIPTION
_constraints

Optional SchemaConstraints for field preservation.

TYPE: SchemaConstraints | None

Examples:

handler = OutputSchemaHandler()
handler.set_constraints(SchemaConstraints(required_fields=("score",)))
original_schema = handler.apply(agent, new_schema_text)
# ... evaluate ...
handler.restore(agent, original_schema)
Note

Uses the serialize_pydantic_schema and deserialize_schema utilities. On invalid schema text, keeps original and logs warning. When constraints are set, validates proposed schemas before applying.

Source code in src/gepa_adk/adapters/component_handlers.py
class OutputSchemaHandler:
    """Handler for agent.output_schema component.

    Manages serialization, application, and restoration of the
    agent's output schema (Pydantic model) during evolution.

    Attributes:
        _constraints: Optional SchemaConstraints for field preservation.

    Examples:
        ```python
        handler = OutputSchemaHandler()
        handler.set_constraints(SchemaConstraints(required_fields=("score",)))
        original_schema = handler.apply(agent, new_schema_text)
        # ... evaluate ...
        handler.restore(agent, original_schema)
        ```

    Note:
        Uses the serialize_pydantic_schema and deserialize_schema utilities.
        On invalid schema text, keeps original and logs warning.
        When constraints are set, validates proposed schemas before applying.
    """

    def __init__(self) -> None:
        """Initialize handler with no constraints.

        Examples:
            ```python
            handler = OutputSchemaHandler()
            assert handler._constraints is None
            ```
        """
        self._constraints: SchemaConstraints | None = None

    def set_constraints(self, constraints: SchemaConstraints | None) -> None:
        """Set schema constraints for field preservation.

        Args:
            constraints: SchemaConstraints specifying required fields and
                type preservation rules. Pass None to clear constraints.

        Examples:
            ```python
            handler.set_constraints(SchemaConstraints(required_fields=("score",)))
            handler.set_constraints(None)  # Clear constraints
            ```

        Note:
            Once set, constraints are checked during apply() - proposed schemas
            that violate constraints will be rejected and the original kept.
        """
        self._constraints = constraints
        logger.debug(
            "output_schema_handler.set_constraints",
            has_constraints=constraints is not None,
            required_fields=constraints.required_fields if constraints else None,
        )

    def serialize(self, agent: "LlmAgent") -> str:
        """Extract output schema from agent as Python source.

        Args:
            agent: The LlmAgent instance.

        Returns:
            Python source code defining the schema class.
            Returns empty string if output_schema is None.

        Examples:
            ```python
            schema_text = handler.serialize(agent)
            # schema_text contains Python class definition
            ```
        """
        output_schema = getattr(agent, "output_schema", None)
        if output_schema is None:
            return ""
        try:
            return serialize_pydantic_schema(output_schema)
        except (TypeError, OSError) as e:
            logger.warning(
                "output_schema_handler.serialize.failed",
                error=str(e),
                schema_type=type(output_schema).__name__,
            )
            return ""

    def apply(self, agent: "LlmAgent", value: str) -> Any:
        """Apply new output schema to agent, return original.

        Args:
            agent: The LlmAgent instance to modify.
            value: Python source code defining the new schema.

        Returns:
            The original output_schema (class or None).

        Examples:
            ```python
            original = handler.apply(
                agent,
                '''
            class NewSchema(BaseModel):
                result: str
            ''',
            )
            ```

        Note:
            On deserialization failure, logs warning and keeps original.
            On constraint violation, keeps original.
            Never raises exceptions - graceful degradation.
        """
        original = getattr(agent, "output_schema", None)

        try:
            new_schema = deserialize_schema(value)

            # Validate against constraints if set
            if self._constraints is not None:
                is_valid, violations = validate_schema_against_constraints(
                    new_schema, original, self._constraints
                )
                if not is_valid:
                    logger.warning(
                        "output_schema_handler.apply.constraint_violation",
                        violations=violations,
                        schema_preview=value[:100] if value else "",
                    )
                    # Keep original - constraint violation
                    return original

            agent.output_schema = new_schema
            logger.debug(
                "output_schema_handler.apply",
                original_name=original.__name__ if original else None,
                new_name=new_schema.__name__,
            )
        except SchemaValidationError as e:
            logger.warning(
                "output_schema_handler.apply.failed",
                error=str(e),
                schema_preview=value[:100] if value else "",
            )
            # Keep original - don't modify agent

        return original

    def restore(self, agent: "LlmAgent", original: Any) -> None:
        """Restore original output schema to agent.

        Args:
            agent: The LlmAgent instance to restore.
            original: The original output_schema (class or None).

        Examples:
            ```python
            handler.restore(agent, original_schema)
            # agent.output_schema is back to original
            ```
        """
        agent.output_schema = original
        logger.debug(
            "output_schema_handler.restore",
            schema_name=original.__name__ if original else None,
        )

__init__

__init__() -> None

Initialize handler with no constraints.

Examples:

handler = OutputSchemaHandler()
assert handler._constraints is None
Source code in src/gepa_adk/adapters/component_handlers.py
def __init__(self) -> None:
    """Initialize handler with no constraints.

    Examples:
        ```python
        handler = OutputSchemaHandler()
        assert handler._constraints is None
        ```
    """
    self._constraints: SchemaConstraints | None = None

set_constraints

set_constraints(
    constraints: SchemaConstraints | None,
) -> None

Set schema constraints for field preservation.

PARAMETER DESCRIPTION
constraints

SchemaConstraints specifying required fields and type preservation rules. Pass None to clear constraints.

TYPE: SchemaConstraints | None

Examples:

handler.set_constraints(SchemaConstraints(required_fields=("score",)))
handler.set_constraints(None)  # Clear constraints
Note

Once set, constraints are checked during apply() - proposed schemas that violate constraints will be rejected and the original kept.

Source code in src/gepa_adk/adapters/component_handlers.py
def set_constraints(self, constraints: SchemaConstraints | None) -> None:
    """Set schema constraints for field preservation.

    Args:
        constraints: SchemaConstraints specifying required fields and
            type preservation rules. Pass None to clear constraints.

    Examples:
        ```python
        handler.set_constraints(SchemaConstraints(required_fields=("score",)))
        handler.set_constraints(None)  # Clear constraints
        ```

    Note:
        Once set, constraints are checked during apply() - proposed schemas
        that violate constraints will be rejected and the original kept.
    """
    self._constraints = constraints
    logger.debug(
        "output_schema_handler.set_constraints",
        has_constraints=constraints is not None,
        required_fields=constraints.required_fields if constraints else None,
    )

serialize

serialize(agent: 'LlmAgent') -> str

Extract output schema from agent as Python source.

PARAMETER DESCRIPTION
agent

The LlmAgent instance.

TYPE: 'LlmAgent'

RETURNS DESCRIPTION
str

Python source code defining the schema class.

str

Returns empty string if output_schema is None.

Examples:

schema_text = handler.serialize(agent)
# schema_text contains Python class definition
Source code in src/gepa_adk/adapters/component_handlers.py
def serialize(self, agent: "LlmAgent") -> str:
    """Extract output schema from agent as Python source.

    Args:
        agent: The LlmAgent instance.

    Returns:
        Python source code defining the schema class.
        Returns empty string if output_schema is None.

    Examples:
        ```python
        schema_text = handler.serialize(agent)
        # schema_text contains Python class definition
        ```
    """
    output_schema = getattr(agent, "output_schema", None)
    if output_schema is None:
        return ""
    try:
        return serialize_pydantic_schema(output_schema)
    except (TypeError, OSError) as e:
        logger.warning(
            "output_schema_handler.serialize.failed",
            error=str(e),
            schema_type=type(output_schema).__name__,
        )
        return ""

apply

apply(agent: 'LlmAgent', value: str) -> Any

Apply new output schema to agent, return original.

PARAMETER DESCRIPTION
agent

The LlmAgent instance to modify.

TYPE: 'LlmAgent'

value

Python source code defining the new schema.

TYPE: str

RETURNS DESCRIPTION
Any

The original output_schema (class or None).

Examples:

original = handler.apply(
    agent,
    '''
class NewSchema(BaseModel):
    result: str
''',
)
Note

On deserialization failure, logs warning and keeps original. On constraint violation, keeps original. Never raises exceptions - graceful degradation.

Source code in src/gepa_adk/adapters/component_handlers.py
def apply(self, agent: "LlmAgent", value: str) -> Any:
    """Apply new output schema to agent, return original.

    Args:
        agent: The LlmAgent instance to modify.
        value: Python source code defining the new schema.

    Returns:
        The original output_schema (class or None).

    Examples:
        ```python
        original = handler.apply(
            agent,
            '''
        class NewSchema(BaseModel):
            result: str
        ''',
        )
        ```

    Note:
        On deserialization failure, logs warning and keeps original.
        On constraint violation, keeps original.
        Never raises exceptions - graceful degradation.
    """
    original = getattr(agent, "output_schema", None)

    try:
        new_schema = deserialize_schema(value)

        # Validate against constraints if set
        if self._constraints is not None:
            is_valid, violations = validate_schema_against_constraints(
                new_schema, original, self._constraints
            )
            if not is_valid:
                logger.warning(
                    "output_schema_handler.apply.constraint_violation",
                    violations=violations,
                    schema_preview=value[:100] if value else "",
                )
                # Keep original - constraint violation
                return original

        agent.output_schema = new_schema
        logger.debug(
            "output_schema_handler.apply",
            original_name=original.__name__ if original else None,
            new_name=new_schema.__name__,
        )
    except SchemaValidationError as e:
        logger.warning(
            "output_schema_handler.apply.failed",
            error=str(e),
            schema_preview=value[:100] if value else "",
        )
        # Keep original - don't modify agent

    return original

restore

restore(agent: 'LlmAgent', original: Any) -> None

Restore original output schema to agent.

PARAMETER DESCRIPTION
agent

The LlmAgent instance to restore.

TYPE: 'LlmAgent'

original

The original output_schema (class or None).

TYPE: Any

Examples:

handler.restore(agent, original_schema)
# agent.output_schema is back to original
Source code in src/gepa_adk/adapters/component_handlers.py
def restore(self, agent: "LlmAgent", original: Any) -> None:
    """Restore original output schema to agent.

    Args:
        agent: The LlmAgent instance to restore.
        original: The original output_schema (class or None).

    Examples:
        ```python
        handler.restore(agent, original_schema)
        # agent.output_schema is back to original
        ```
    """
    agent.output_schema = original
    logger.debug(
        "output_schema_handler.restore",
        schema_name=original.__name__ if original else None,
    )

AllComponentSelector

Selects all available components for simultaneous update.

This selector returns the full list of components every time, enabling simultaneous evolution of all parts of the candidate.

Examples:

selector = AllComponentSelector()
all_comps = await selector.select_components(["a", "b"], 1, 0)
# Returns ["a", "b"]
Note

Always returns all components, enabling comprehensive mutations across the entire candidate in a single iteration.

Source code in src/gepa_adk/adapters/component_selector.py
class AllComponentSelector:
    """Selects all available components for simultaneous update.

    This selector returns the full list of components every time, enabling
    simultaneous evolution of all parts of the candidate.

    Examples:
        ```python
        selector = AllComponentSelector()
        all_comps = await selector.select_components(["a", "b"], 1, 0)
        # Returns ["a", "b"]
        ```

    Note:
        Always returns all components, enabling comprehensive mutations
        across the entire candidate in a single iteration.
    """

    async def select_components(
        self, components: list[str], iteration: int, candidate_idx: int
    ) -> list[str]:
        """Select all components to update.

        Args:
            components: List of available component keys.
            iteration: Current global iteration number (unused).
            candidate_idx: Index of the candidate being evolved (unused).

        Returns:
            List containing all component keys.

        Raises:
            ValueError: If components list is empty.

        Examples:
            ```python
            selected = await selector.select_components(["a", "b"], 1, 0)
            ```

        Note:
            Outputs the complete component list unchanged, enabling
            simultaneous evolution of all candidate parts.
        """
        if not components:
            raise ValueError("No components provided for selection")

        return list(components)

select_components async

select_components(
    components: list[str],
    iteration: int,
    candidate_idx: int,
) -> list[str]

Select all components to update.

PARAMETER DESCRIPTION
components

List of available component keys.

TYPE: list[str]

iteration

Current global iteration number (unused).

TYPE: int

candidate_idx

Index of the candidate being evolved (unused).

TYPE: int

RETURNS DESCRIPTION
list[str]

List containing all component keys.

RAISES DESCRIPTION
ValueError

If components list is empty.

Examples:

selected = await selector.select_components(["a", "b"], 1, 0)
Note

Outputs the complete component list unchanged, enabling simultaneous evolution of all candidate parts.

Source code in src/gepa_adk/adapters/component_selector.py
async def select_components(
    self, components: list[str], iteration: int, candidate_idx: int
) -> list[str]:
    """Select all components to update.

    Args:
        components: List of available component keys.
        iteration: Current global iteration number (unused).
        candidate_idx: Index of the candidate being evolved (unused).

    Returns:
        List containing all component keys.

    Raises:
        ValueError: If components list is empty.

    Examples:
        ```python
        selected = await selector.select_components(["a", "b"], 1, 0)
        ```

    Note:
        Outputs the complete component list unchanged, enabling
        simultaneous evolution of all candidate parts.
    """
    if not components:
        raise ValueError("No components provided for selection")

    return list(components)

RoundRobinComponentSelector

Selects components in a round-robin fashion.

This selector cycles through the list of components one by one, maintaining state per candidate index to ensure consistent rotation.

ATTRIBUTE DESCRIPTION
_next_index

Mapping of candidate_idx to next component index.

TYPE: dict[int, int]

Examples:

selector = RoundRobinComponentSelector()
# First call selects first component
c1 = await selector.select_components(["a", "b"], 1, 0)  # ["a"]
# Second call selects second component
c2 = await selector.select_components(["a", "b"], 2, 0)  # ["b"]
Note

Alternates through components sequentially, ensuring balanced evolution across all candidate parts.
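
A brief sketch of the per-candidate rotation state (the component names are arbitrary): each candidate_idx keeps its own position, so interleaved calls for different candidates do not disturb each other's rotation.

selector = RoundRobinComponentSelector()
c = await selector.select_components(["a", "b"], 1, 0)  # ["a"] for candidate 0
c = await selector.select_components(["a", "b"], 1, 1)  # ["a"] for candidate 1 (fresh rotation)
c = await selector.select_components(["a", "b"], 2, 0)  # ["b"] for candidate 0 (rotation advanced)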

Source code in src/gepa_adk/adapters/component_selector.py
class RoundRobinComponentSelector:
    """Selects components in a round-robin fashion.

    This selector cycles through the list of components one by one, maintaining
    state per candidate index to ensure consistent rotation.

    Attributes:
        _next_index (dict[int, int]): Mapping of candidate_idx to next component index.

    Examples:
        ```python
        selector = RoundRobinComponentSelector()
        # First call selects first component
        c1 = await selector.select_components(["a", "b"], 1, 0)  # ["a"]
        # Second call selects second component
        c2 = await selector.select_components(["a", "b"], 2, 0)  # ["b"]
        ```

    Note:
        Alternates through components sequentially, ensuring balanced evolution
        across all candidate parts.
    """

    def __init__(self) -> None:
        """Initialize the round-robin selector.

        Note:
            Creates empty index tracking dictionary for per-candidate rotation state.
        """
        self._next_index: dict[int, int] = defaultdict(int)

    async def select_components(
        self, components: list[str], iteration: int, candidate_idx: int
    ) -> list[str]:
        """Select a single component to update using round-robin logic.

        Args:
            components: List of available component keys.
            iteration: Current global iteration number (unused by this strategy).
            candidate_idx: Index of the candidate being evolved.

        Returns:
            List containing the single selected component key.

        Raises:
            ValueError: If components list is empty.

        Examples:
            ```python
            selected = await selector.select_components(["a", "b"], 1, 0)
            ```

        Note:
            Outputs one component per call, advancing the rotation index for
            the specified candidate.
        """
        if not components:
            raise ValueError("No components provided for selection")

        # Get current index for this candidate
        current_idx = self._next_index[candidate_idx]

        # Select component
        selected = components[current_idx % len(components)]

        # Advance index for next time
        self._next_index[candidate_idx] = (current_idx + 1) % len(components)

        return [selected]

__init__

__init__() -> None

Initialize the round-robin selector.

Note

Creates empty index tracking dictionary for per-candidate rotation state.

Source code in src/gepa_adk/adapters/component_selector.py
def __init__(self) -> None:
    """Initialize the round-robin selector.

    Note:
        Creates empty index tracking dictionary for per-candidate rotation state.
    """
    self._next_index: dict[int, int] = defaultdict(int)

select_components async

select_components(
    components: list[str],
    iteration: int,
    candidate_idx: int,
) -> list[str]

Select a single component to update using round-robin logic.

PARAMETER DESCRIPTION
components

List of available component keys.

TYPE: list[str]

iteration

Current global iteration number (unused by this strategy).

TYPE: int

candidate_idx

Index of the candidate being evolved.

TYPE: int

RETURNS DESCRIPTION
list[str]

List containing the single selected component key.

RAISES DESCRIPTION
ValueError

If components list is empty.

Examples:

selected = await selector.select_components(["a", "b"], 1, 0)
Note

Outputs one component per call, advancing the rotation index for the specified candidate.

Source code in src/gepa_adk/adapters/component_selector.py
async def select_components(
    self, components: list[str], iteration: int, candidate_idx: int
) -> list[str]:
    """Select a single component to update using round-robin logic.

    Args:
        components: List of available component keys.
        iteration: Current global iteration number (unused by this strategy).
        candidate_idx: Index of the candidate being evolved.

    Returns:
        List containing the single selected component key.

    Raises:
        ValueError: If components list is empty.

    Examples:
        ```python
        selected = await selector.select_components(["a", "b"], 1, 0)
        ```

    Note:
        Outputs one component per call, advancing the rotation index for
        the specified candidate.
    """
    if not components:
        raise ValueError("No components provided for selection")

    # Get current index for this candidate
    current_idx = self._next_index[candidate_idx]

    # Select component
    selected = components[current_idx % len(components)]

    # Advance index for next time
    self._next_index[candidate_idx] = (current_idx + 1) % len(components)

    return [selected]

CriticOutput

Bases: BaseModel



Advanced schema for structured critic feedback with dimensions.

This schema defines the expected JSON structure that critic agents should return when configured with output_schema. The score field is required, while other fields are optional and will be preserved in metadata.

ATTRIBUTE DESCRIPTION
score

Score value between 0.0 and 1.0 (required).

TYPE: float

feedback

Human-readable feedback text (optional).

TYPE: str

dimension_scores

Per-dimension evaluation scores (optional).

TYPE: dict[str, float]

actionable_guidance

Specific improvement suggestions (optional).

TYPE: str

Examples:

Advanced critic output:

{
    "score": 0.75,
    "feedback": "Good response but could be more concise",
    "dimension_scores": {
        "accuracy": 0.9,
        "clarity": 0.6,
        "completeness": 0.8
    },
    "actionable_guidance": "Reduce response length by 30%"
}
Note

All critic agents using this schema must return structured JSON. When this schema is used as output_schema on an LlmAgent, the agent can ONLY reply and CANNOT use any tools. This is acceptable for critic agents focused on scoring.
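
As a hedged sketch of wiring this schema into a critic agent (the agent name and instruction text below are illustrative, not part of the library):

from google.adk.agents import LlmAgent
from gepa_adk.adapters.critic_scorer import CriticOutput

critic = LlmAgent(
    name="dimension_critic",
    model="gemini-2.5-flash",
    instruction=(
        "Evaluate the agent output and reply with JSON containing "
        "score, feedback, dimension_scores, and actionable_guidance."
    ),
    output_schema=CriticOutput,
)
# With output_schema set, this agent can only reply with structured JSON
# and cannot call tools.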

See Also

SimpleCriticOutput: KISS schema with just score + feedback.

Source code in src/gepa_adk/adapters/critic_scorer.py
class CriticOutput(BaseModel):
    """Advanced schema for structured critic feedback with dimensions.

    This schema defines the expected JSON structure that critic agents
    should return when configured with output_schema. The score field is
    required, while other fields are optional and will be preserved in
    metadata.

    Attributes:
        score: Score value between 0.0 and 1.0 (required).
        feedback: Human-readable feedback text (optional).
        dimension_scores: Per-dimension evaluation scores (optional).
        actionable_guidance: Specific improvement suggestions (optional).

    Examples:
        Advanced critic output:

        ```json
        {
            "score": 0.75,
            "feedback": "Good response but could be more concise",
            "dimension_scores": {
                "accuracy": 0.9,
                "clarity": 0.6,
                "completeness": 0.8
            },
            "actionable_guidance": "Reduce response length by 30%"
        }
        ```

    Note:
        All critic agents using this schema must return structured JSON.
        When this schema is used as output_schema on an LlmAgent, the
        agent can ONLY reply and CANNOT use any tools. This is acceptable
        for critic agents focused on scoring.

    See Also:
        SimpleCriticOutput: KISS schema with just score + feedback.
    """

    score: float = Field(
        ...,
        ge=0.0,
        le=1.0,
        description="Score from 0.0 to 1.0",
    )
    feedback: str = Field(
        default="",
        description="Human-readable feedback",
    )
    dimension_scores: dict[str, float] = Field(
        default_factory=dict,
        description="Per-dimension scores",
    )
    actionable_guidance: str = Field(
        default="",
        description="Improvement suggestions",
    )

CriticScorer

Adapter that wraps ADK critic agents to provide structured scoring.

CriticScorer implements the Scorer protocol, enabling integration with gepa-adk's evaluation and evolution workflows. It executes ADK critic agents (LlmAgent, SequentialAgent, etc.) and extracts structured scores with metadata from their outputs.

ATTRIBUTE DESCRIPTION
critic_agent

ADK agent configured for evaluation.

TYPE: BaseAgent

_session_service

Session service for state management.

TYPE: BaseSessionService

_app_name

Application name for session identification.

TYPE: str

_logger

Bound logger with scorer context.

TYPE: BoundLogger

Examples:

Basic usage:

from google.adk.agents import LlmAgent
from gepa_adk.adapters.critic_scorer import CriticScorer, CriticOutput
from gepa_adk.adapters.agent_executor import AgentExecutor

critic = LlmAgent(
    name="quality_critic",
    model="gemini-2.5-flash",
    instruction="Evaluate response quality...",
    output_schema=CriticOutput,
)

executor = AgentExecutor()
scorer = CriticScorer(critic_agent=critic, executor=executor)
score, metadata = await scorer.async_score(
    input_text="What is Python?",
    output="Python is a programming language.",
)
Note

Adapter wraps ADK critic agents to provide structured scoring. Implements Scorer protocol for compatibility with evolution engine. Creates isolated sessions per scoring call unless session_id provided.

Source code in src/gepa_adk/adapters/critic_scorer.py
class CriticScorer:
    """Adapter that wraps ADK critic agents to provide structured scoring.

    CriticScorer implements the Scorer protocol, enabling integration with
    gepa-adk's evaluation and evolution workflows. It executes ADK critic
    agents (LlmAgent, SequentialAgent, etc.) and extracts structured scores
    with metadata from their outputs.

    Attributes:
        critic_agent (BaseAgent): ADK agent configured for evaluation.
        _session_service (BaseSessionService): Session service for state
            management.
        _app_name (str): Application name for session identification.
        _logger (structlog.BoundLogger): Bound logger with scorer context.

    Examples:
        Basic usage:

        ```python
        from google.adk.agents import LlmAgent
        from gepa_adk.adapters.critic_scorer import CriticScorer, CriticOutput
        from gepa_adk.adapters.agent_executor import AgentExecutor

        critic = LlmAgent(
            name="quality_critic",
            model="gemini-2.5-flash",
            instruction="Evaluate response quality...",
            output_schema=CriticOutput,
        )

        executor = AgentExecutor()
        scorer = CriticScorer(critic_agent=critic, executor=executor)
        score, metadata = await scorer.async_score(
            input_text="What is Python?",
            output="Python is a programming language.",
        )
        ```

    Note:
        Adapter wraps ADK critic agents to provide structured scoring.
        Implements Scorer protocol for compatibility with evolution engine.
        Creates isolated sessions per scoring call unless session_id provided.
    """

    def __init__(
        self,
        critic_agent: BaseAgent,
        executor: AgentExecutorProtocol,
        session_service: BaseSessionService | None = None,
        app_name: str = "critic_scorer",
    ) -> None:
        """Initialize CriticScorer with critic agent.

        Args:
            critic_agent: ADK agent (LlmAgent or workflow agent) configured
                for evaluation.
            executor: AgentExecutorProtocol implementation for unified agent
                execution. Handles session management and execution, enabling
                feature parity across all agent types.
            session_service: Optional session service for state management.
                If None, creates an InMemorySessionService.
            app_name: Application name for session identification.

        Raises:
            TypeError: If critic_agent is not a BaseAgent instance.
            ValueError: If app_name is empty string.

        Examples:
            Basic setup with executor:

            ```python
            from gepa_adk.adapters.agent_executor import AgentExecutor

            executor = AgentExecutor()
            scorer = CriticScorer(critic_agent=critic, executor=executor)
            ```

            With shared session service:

            ```python
            from google.adk.sessions import InMemorySessionService
            from gepa_adk.adapters.agent_executor import AgentExecutor

            session_service = InMemorySessionService()
            executor = AgentExecutor(session_service=session_service)
            scorer = CriticScorer(
                critic_agent=critic,
                executor=executor,
                session_service=session_service,
            )
            ```

        Note:
            Creates logger with scorer context and validates agent type.
        """
        if not isinstance(critic_agent, BaseAgent):
            raise TypeError(f"critic_agent must be BaseAgent, got {type(critic_agent)}")

        if not app_name or not app_name.strip():
            raise ValueError("app_name cannot be empty")

        self.critic_agent = critic_agent
        self._session_service = session_service or InMemorySessionService()
        self._app_name = app_name.strip()
        self._executor = executor

        # Bind logger with scorer context
        self._logger = logger.bind(
            scorer="CriticScorer",
            agent_name=self.critic_agent.name,
            app_name=self._app_name,
            uses_executor=True,  # Always true since executor is required
        )

        self._logger.info("scorer.initialized")

    def _format_critic_input(
        self,
        input_text: str,
        output: str,
        expected: str | None = None,
    ) -> str:
        """Format input for critic agent evaluation.

        Builds a prompt that presents the input query, agent output, and
        optionally the expected output for the critic to evaluate.

        Args:
            input_text: The original input provided to the agent being evaluated.
            output: The agent's generated output to score.
            expected: Optional expected/reference output for comparison.

        Returns:
            Formatted prompt string for the critic agent.

        Examples:
            Basic formatting:

            ```python
            prompt = scorer._format_critic_input(
                input_text="What is 2+2?",
                output="4",
                expected="4",
            )
            ```

        Note:
            Organizes input for critic evaluation with clearly labeled sections.
            Format is designed to give critic context for evaluation.
            Expected output is included only if provided.
        """
        parts = [
            "Input Query:",
            input_text,
            "",
            "Agent Output:",
            output,
        ]

        if expected is not None:
            parts.extend(
                [
                    "",
                    "Expected Output:",
                    expected,
                ]
            )

        parts.append("")
        parts.append(
            "Please evaluate the agent output and provide a score with feedback."
        )

        return "\n".join(parts)

    def _parse_critic_output(self, output_text: str) -> tuple[float, dict[str, Any]]:
        """Parse critic agent output and extract score with metadata.

        Parses the critic's output text as JSON and extracts the score field
        along with optional metadata (feedback, dimension_scores,
        actionable_guidance, and any additional fields).

        Args:
            output_text: Raw text output from critic agent.

        Returns:
            Tuple of (score, metadata) where:
            - score: Float value extracted from output
            - metadata: Dict containing feedback, dimension_scores,
                actionable_guidance, and any additional fields

        Raises:
            CriticOutputParseError: If output cannot be parsed as JSON.
            MissingScoreFieldError: If parsed JSON lacks required score field.

        Examples:
            Parse structured output:

            ```python
            output = '{"score": 0.75, "feedback": "Good", "dimension_scores": {"accuracy": 0.9}}'
            score, metadata = scorer._parse_critic_output(output)
            assert score == 0.75
            assert metadata["feedback"] == "Good"
            ```

        Note:
            Obtains score and metadata from critic JSON output with validation.
            Preserves all fields from parsed JSON in metadata, not just
            the known CriticOutput schema fields. This allows for extensibility.
        """
        # Parse JSON output
        try:
            parsed = json.loads(output_text)
        except json.JSONDecodeError as e:
            raise CriticOutputParseError(
                f"Critic output is not valid JSON: {e}",
                raw_output=output_text,
                parse_error=str(e),
                cause=e,
            ) from e

        # Validate parsed output is a dict
        if not isinstance(parsed, dict):
            raise CriticOutputParseError(
                f"Critic output must be a JSON object, got {type(parsed).__name__}",
                raw_output=output_text,
                parse_error="Not a JSON object",
            )

        # Extract required score field
        if "score" not in parsed:
            raise MissingScoreFieldError(
                "Critic output missing required 'score' field",
                parsed_output=parsed,
            )

        score = parsed["score"]
        if not isinstance(score, (int, float)):
            raise MissingScoreFieldError(
                f"Score field must be numeric, got {type(score).__name__}",
                parsed_output=parsed,
            )

        # Build metadata dict with known fields and any additional fields
        metadata: dict[str, Any] = {}

        # Extract known fields if present
        if "feedback" in parsed:
            metadata["feedback"] = str(parsed["feedback"])
        if "dimension_scores" in parsed:
            # Preserve dimension_scores as-is (may contain non-numeric values)
            metadata["dimension_scores"] = parsed["dimension_scores"]
        if "actionable_guidance" in parsed:
            metadata["actionable_guidance"] = str(parsed["actionable_guidance"])

        # Preserve any additional fields
        known_fields = {"score", "feedback", "dimension_scores", "actionable_guidance"}
        for key, value in parsed.items():
            if key not in known_fields:
                metadata[key] = value

        return float(score), metadata

    def _extract_json_from_text(self, text: str) -> str:
        """Extract JSON from text that may contain markdown code blocks.

        Minimal implementation - tries direct parse and markdown extraction.
        A more robust implementation will be added per GitHub issue #78.

        Args:
            text: Text that may contain JSON.

        Returns:
            Extracted JSON string, or original text if extraction fails.

        Note:
            Operates as a minimal JSON extractor; robust implementation planned
            per GitHub issue #78.
        """
        # Try parsing the entire text as-is
        try:
            json.loads(text.strip())
            return text.strip()
        except json.JSONDecodeError:
            pass

        # Extract from markdown code blocks (```json ... ``` or ``` ... ```)
        json_block_pattern = r"```(?:json)?\s*\n?(.*?)\n?```"
        matches = re.findall(json_block_pattern, text, re.DOTALL | re.IGNORECASE)
        for match in matches:
            try:
                json.loads(match.strip())
                return match.strip()
            except json.JSONDecodeError:
                continue

        # Try to find JSON object embedded in text (minimal regex for { ... })
        # Look for opening brace and try to find matching closing brace
        brace_start = text.find("{")
        if brace_start != -1:
            # Try to find the matching closing brace
            # NOTE: This algorithm doesn't account for braces within string literals
            # (e.g., JSON with template strings like "instruction": "Use {variable}").
            # This is a minimal implementation; a more robust parser will be added
            # per GitHub issue #78.
            depth = 0
            for i in range(brace_start, len(text)):
                if text[i] == "{":
                    depth += 1
                elif text[i] == "}":
                    depth -= 1
                    if depth == 0:
                        candidate = text[brace_start : i + 1]
                        try:
                            json.loads(candidate)
                            return candidate
                        except json.JSONDecodeError:
                            break

        # Return original text (will fail with clear error message)
        return text

    async def async_score(
        self,
        input_text: str,
        output: str,
        expected: str | None = None,
        session_id: str | None = None,
    ) -> tuple[float, dict[str, Any]]:
        """Score an agent output asynchronously using the critic agent.

        Executes the critic agent with formatted input and extracts structured
        score and metadata from the response.

        Args:
            input_text: The original input provided to the agent being evaluated.
            output: The agent's generated output to score.
            expected: Optional expected/reference output for comparison.
            session_id: Optional session ID to share state with main agent
                workflow. If None, creates an isolated session.

        Returns:
            Tuple of (score, metadata) where:
            - score: Float value, conventionally 0.0-1.0
            - metadata: Dict with feedback, dimension_scores,
                actionable_guidance, and any additional fields

        Raises:
            CriticOutputParseError: If critic output is not valid JSON.
            MissingScoreFieldError: If score field missing from output.

        Examples:
            Basic async scoring:

            ```python
            score, metadata = await scorer.async_score(
                input_text="What is Python?",
                output="Python is a programming language.",
            )
            ```

            With session sharing:

            ```python
            score, metadata = await scorer.async_score(
                input_text="...",
                output="...",
                session_id="existing_session_123",
            )
            ```

        Note:
            Orchestrates critic agent execution via AgentExecutor and extracts
            structured output. Creates isolated session unless session_id provided
            for state sharing.
        """
        self._logger.debug(
            "scorer.async_score.start",
            input_preview=input_text[:50] if input_text else "",
            output_preview=output[:50] if output else "",
            has_expected=expected is not None,
            session_id=session_id,
        )

        # Format input for critic
        critic_input = self._format_critic_input(input_text, output, expected)

        # Execute via AgentExecutor
        result = await self._executor.execute_agent(
            agent=self.critic_agent,
            input_text=critic_input,
            existing_session_id=session_id,
        )

        if result.status == ExecutionStatus.FAILED:
            raise ScoringError(
                f"Critic agent execution failed: {result.error_message}",
            )

        final_output = result.extracted_value

        if not final_output:
            raise ScoringError("Critic agent returned empty output")

        # Parse output and extract score
        try:
            score, metadata = self._parse_critic_output(final_output)
        except (CriticOutputParseError, MissingScoreFieldError) as e:
            self._logger.error(
                "scorer.async_score.parse_error",
                error=str(e),
                error_type=type(e).__name__,
            )
            raise

        # Log multi-dimensional scoring context if present
        log_context: dict[str, Any] = {
            "score": score,
            "has_feedback": "feedback" in metadata,
            "has_dimension_scores": "dimension_scores" in metadata,
            "has_actionable_guidance": "actionable_guidance" in metadata,
        }
        if "dimension_scores" in metadata:
            log_context["dimension_count"] = len(metadata["dimension_scores"])

        self._logger.info(
            "scorer.async_score.complete",
            **log_context,
        )

        return score, metadata

    def score(
        self,
        input_text: str,
        output: str,
        expected: str | None = None,
    ) -> tuple[float, dict[str, Any]]:
        """Score an agent output synchronously using the critic agent.

        Synchronous wrapper around async_score() using asyncio.run().

        Args:
            input_text: The original input provided to the agent being evaluated.
            output: The agent's generated output to score.
            expected: Optional expected/reference output for comparison.

        Returns:
            Tuple of (score, metadata) where:
            - score: Float value, conventionally 0.0-1.0
            - metadata: Dict with feedback, dimension_scores,
                actionable_guidance, and any additional fields

        Raises:
            CriticOutputParseError: If critic output is not valid JSON.
            MissingScoreFieldError: If score field missing from output.

        Examples:
            Basic sync scoring:

            ```python
            score, metadata = scorer.score(
                input_text="What is 2+2?",
                output="4",
                expected="4",
            )
            ```

        Note:
            Operates synchronously by wrapping async_score() with asyncio.run().
            Uses asyncio.run() to execute async_score(). Prefer async_score()
            for better performance in async contexts.
        """
        return asyncio.run(self.async_score(input_text, output, expected))

__init__

__init__(
    critic_agent: BaseAgent,
    executor: AgentExecutorProtocol,
    session_service: BaseSessionService | None = None,
    app_name: str = "critic_scorer",
) -> None

Initialize CriticScorer with critic agent.

PARAMETER DESCRIPTION
critic_agent

ADK agent (LlmAgent or workflow agent) configured for evaluation.

TYPE: BaseAgent

executor

AgentExecutorProtocol implementation for unified agent execution. Handles session management and execution, enabling feature parity across all agent types.

TYPE: AgentExecutorProtocol

session_service

Optional session service for state management. If None, creates an InMemorySessionService.

TYPE: BaseSessionService | None DEFAULT: None

app_name

Application name for session identification.

TYPE: str DEFAULT: 'critic_scorer'

RAISES DESCRIPTION
TypeError

If critic_agent is not a BaseAgent instance.

ValueError

If app_name is an empty or whitespace-only string.

Examples:

Basic setup with executor:

from gepa_adk.adapters.agent_executor import AgentExecutor

executor = AgentExecutor()
scorer = CriticScorer(critic_agent=critic, executor=executor)

With shared session service:

from google.adk.sessions import InMemorySessionService
from gepa_adk.adapters.agent_executor import AgentExecutor

session_service = InMemorySessionService()
executor = AgentExecutor(session_service=session_service)
scorer = CriticScorer(
    critic_agent=critic,
    executor=executor,
    session_service=session_service,
)
Note

Creates logger with scorer context and validates agent type.

Source code in src/gepa_adk/adapters/critic_scorer.py
def __init__(
    self,
    critic_agent: BaseAgent,
    executor: AgentExecutorProtocol,
    session_service: BaseSessionService | None = None,
    app_name: str = "critic_scorer",
) -> None:
    """Initialize CriticScorer with critic agent.

    Args:
        critic_agent: ADK agent (LlmAgent or workflow agent) configured
            for evaluation.
        executor: AgentExecutorProtocol implementation for unified agent
            execution. Handles session management and execution, enabling
            feature parity across all agent types.
        session_service: Optional session service for state management.
            If None, creates an InMemorySessionService.
        app_name: Application name for session identification.

    Raises:
        TypeError: If critic_agent is not a BaseAgent instance.
        ValueError: If app_name is empty string.

    Examples:
        Basic setup with executor:

        ```python
        from gepa_adk.adapters.agent_executor import AgentExecutor

        executor = AgentExecutor()
        scorer = CriticScorer(critic_agent=critic, executor=executor)
        ```

        With shared session service:

        ```python
        from google.adk.sessions import InMemorySessionService
        from gepa_adk.adapters.agent_executor import AgentExecutor

        session_service = InMemorySessionService()
        executor = AgentExecutor(session_service=session_service)
        scorer = CriticScorer(
            critic_agent=critic,
            executor=executor,
            session_service=session_service,
        )
        ```

    Note:
        Creates logger with scorer context and validates agent type.
    """
    if not isinstance(critic_agent, BaseAgent):
        raise TypeError(f"critic_agent must be BaseAgent, got {type(critic_agent)}")

    if not app_name or not app_name.strip():
        raise ValueError("app_name cannot be empty")

    self.critic_agent = critic_agent
    self._session_service = session_service or InMemorySessionService()
    self._app_name = app_name.strip()
    self._executor = executor

    # Bind logger with scorer context
    self._logger = logger.bind(
        scorer="CriticScorer",
        agent_name=self.critic_agent.name,
        app_name=self._app_name,
        uses_executor=True,  # Always true since executor is required
    )

    self._logger.info("scorer.initialized")

async_score async

async_score(
    input_text: str,
    output: str,
    expected: str | None = None,
    session_id: str | None = None,
) -> tuple[float, dict[str, Any]]

Score an agent output asynchronously using the critic agent.

Executes the critic agent with formatted input and extracts structured score and metadata from the response.

PARAMETER DESCRIPTION
input_text

The original input provided to the agent being evaluated.

TYPE: str

output

The agent's generated output to score.

TYPE: str

expected

Optional expected/reference output for comparison.

TYPE: str | None DEFAULT: None

session_id

Optional session ID to share state with main agent workflow. If None, creates an isolated session.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
tuple[float, dict[str, Any]]

Tuple of (score, metadata) where:
  • score: Float value, conventionally 0.0-1.0
  • metadata: Dict with feedback, dimension_scores, actionable_guidance, and any additional fields
RAISES DESCRIPTION
CriticOutputParseError

If critic output is not valid JSON.

MissingScoreFieldError

If score field missing from output.

Examples:

Basic async scoring:

score, metadata = await scorer.async_score(
    input_text="What is Python?",
    output="Python is a programming language.",
)

With session sharing:

score, metadata = await scorer.async_score(
    input_text="...",
    output="...",
    session_id="existing_session_123",
)
Note

Orchestrates critic agent execution via AgentExecutor and extracts structured output. Creates isolated session unless session_id provided for state sharing.
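
A short usage sketch for consuming the returned metadata (the input and output values here are placeholders; run inside an async context):

score, metadata = await scorer.async_score(
    input_text="What is Python?",
    output="Python is a programming language.",
)
if "dimension_scores" in metadata:
    # Per-dimension scores are preserved exactly as the critic returned them
    for dimension, value in metadata["dimension_scores"].items():
        print(f"{dimension}: {value}")
print(metadata.get("actionable_guidance", ""))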

Source code in src/gepa_adk/adapters/critic_scorer.py
async def async_score(
    self,
    input_text: str,
    output: str,
    expected: str | None = None,
    session_id: str | None = None,
) -> tuple[float, dict[str, Any]]:
    """Score an agent output asynchronously using the critic agent.

    Executes the critic agent with formatted input and extracts structured
    score and metadata from the response.

    Args:
        input_text: The original input provided to the agent being evaluated.
        output: The agent's generated output to score.
        expected: Optional expected/reference output for comparison.
        session_id: Optional session ID to share state with main agent
            workflow. If None, creates an isolated session.

    Returns:
        Tuple of (score, metadata) where:
        - score: Float value, conventionally 0.0-1.0
        - metadata: Dict with feedback, dimension_scores,
            actionable_guidance, and any additional fields

    Raises:
        CriticOutputParseError: If critic output is not valid JSON.
        MissingScoreFieldError: If score field missing from output.

    Examples:
        Basic async scoring:

        ```python
        score, metadata = await scorer.async_score(
            input_text="What is Python?",
            output="Python is a programming language.",
        )
        ```

        With session sharing:

        ```python
        score, metadata = await scorer.async_score(
            input_text="...",
            output="...",
            session_id="existing_session_123",
        )
        ```

    Note:
        Orchestrates critic agent execution via AgentExecutor and extracts
        structured output. Creates isolated session unless session_id provided
        for state sharing.
    """
    self._logger.debug(
        "scorer.async_score.start",
        input_preview=input_text[:50] if input_text else "",
        output_preview=output[:50] if output else "",
        has_expected=expected is not None,
        session_id=session_id,
    )

    # Format input for critic
    critic_input = self._format_critic_input(input_text, output, expected)

    # Execute via AgentExecutor
    result = await self._executor.execute_agent(
        agent=self.critic_agent,
        input_text=critic_input,
        existing_session_id=session_id,
    )

    if result.status == ExecutionStatus.FAILED:
        raise ScoringError(
            f"Critic agent execution failed: {result.error_message}",
        )

    final_output = result.extracted_value

    if not final_output:
        raise ScoringError("Critic agent returned empty output")

    # Parse output and extract score
    try:
        score, metadata = self._parse_critic_output(final_output)
    except (CriticOutputParseError, MissingScoreFieldError) as e:
        self._logger.error(
            "scorer.async_score.parse_error",
            error=str(e),
            error_type=type(e).__name__,
        )
        raise

    # Log multi-dimensional scoring context if present
    log_context: dict[str, Any] = {
        "score": score,
        "has_feedback": "feedback" in metadata,
        "has_dimension_scores": "dimension_scores" in metadata,
        "has_actionable_guidance": "actionable_guidance" in metadata,
    }
    if "dimension_scores" in metadata:
        log_context["dimension_count"] = len(metadata["dimension_scores"])

    self._logger.info(
        "scorer.async_score.complete",
        **log_context,
    )

    return score, metadata

score

score(
    input_text: str,
    output: str,
    expected: str | None = None,
) -> tuple[float, dict[str, Any]]

Score an agent output synchronously using the critic agent.

Synchronous wrapper around async_score() using asyncio.run().

PARAMETER DESCRIPTION
input_text

The original input provided to the agent being evaluated.

TYPE: str

output

The agent's generated output to score.

TYPE: str

expected

Optional expected/reference output for comparison.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
tuple[float, dict[str, Any]]

Tuple of (score, metadata) where:
  • score: Float value, conventionally 0.0-1.0
  • metadata: Dict with feedback, dimension_scores, actionable_guidance, and any additional fields
RAISES DESCRIPTION
CriticOutputParseError

If critic output is not valid JSON.

MissingScoreFieldError

If score field missing from output.

Examples:

Basic sync scoring:

score, metadata = scorer.score(
    input_text="What is 2+2?",
    output="4",
    expected="4",
)
Note

Wraps async_score() with asyncio.run() to execute it synchronously. Prefer async_score() for better performance in async contexts.

Source code in src/gepa_adk/adapters/critic_scorer.py
def score(
    self,
    input_text: str,
    output: str,
    expected: str | None = None,
) -> tuple[float, dict[str, Any]]:
    """Score an agent output synchronously using the critic agent.

    Synchronous wrapper around async_score() using asyncio.run().

    Args:
        input_text: The original input provided to the agent being evaluated.
        output: The agent's generated output to score.
        expected: Optional expected/reference output for comparison.

    Returns:
        Tuple of (score, metadata) where:
        - score: Float value, conventionally 0.0-1.0
        - metadata: Dict with feedback, dimension_scores,
            actionable_guidance, and any additional fields

    Raises:
        CriticOutputParseError: If critic output is not valid JSON.
        MissingScoreFieldError: If score field missing from output.

    Examples:
        Basic sync scoring:

        ```python
        score, metadata = scorer.score(
            input_text="What is 2+2?",
            output="4",
            expected="4",
        )
        ```

    Note:
        Operates synchronously by wrapping async_score() with asyncio.run().
        Uses asyncio.run() to execute async_score(). Prefer async_score()
        for better performance in async contexts.
    """
    return asyncio.run(self.async_score(input_text, output, expected))

SimpleCriticOutput

Bases: BaseModel


              flowchart TD
              gepa_adk.adapters.SimpleCriticOutput[SimpleCriticOutput]

              

              click gepa_adk.adapters.SimpleCriticOutput href "" "gepa_adk.adapters.SimpleCriticOutput"
            

KISS schema for basic critic feedback.

This is the minimal schema for critic agents that only need to provide a score and text feedback. Use this for straightforward evaluation tasks where dimension breakdowns are not needed.

ATTRIBUTE DESCRIPTION
score

Score value between 0.0 and 1.0 (required).

TYPE: float

feedback

Human-readable feedback text (required).

TYPE: str

Examples:

Simple critic output:

{
    "score": 0.75,
    "feedback": "Good response but could be more concise."
}

Using with LlmAgent:

from google.adk.agents import LlmAgent
from gepa_adk.adapters.critic_scorer import SimpleCriticOutput

critic = LlmAgent(
    name="simple_critic",
    model="gemini-2.5-flash",
    instruction=SIMPLE_CRITIC_INSTRUCTION,
    output_schema=SimpleCriticOutput,
)
Note

Applies to basic evaluation tasks where only a score and feedback are needed. For more detailed evaluations with dimension scores, use CriticOutput instead.

See Also

CriticOutput: Advanced schema with dimension scores and guidance.

Source code in src/gepa_adk/adapters/critic_scorer.py
class SimpleCriticOutput(BaseModel):
    """KISS schema for basic critic feedback.

    This is the minimal schema for critic agents that only need to provide
    a score and text feedback. Use this for straightforward evaluation tasks
    where dimension breakdowns are not needed.

    Attributes:
        score: Score value between 0.0 and 1.0 (required).
        feedback: Human-readable feedback text (required).

    Examples:
        Simple critic output:

        ```json
        {
            "score": 0.75,
            "feedback": "Good response but could be more concise."
        }
        ```

        Using with LlmAgent:

        ```python
        from google.adk.agents import LlmAgent
        from gepa_adk.adapters.critic_scorer import SimpleCriticOutput

        critic = LlmAgent(
            name="simple_critic",
            model="gemini-2.5-flash",
            instruction=SIMPLE_CRITIC_INSTRUCTION,
            output_schema=SimpleCriticOutput,
        )
        ```

    Note:
        Applies to basic evaluation tasks where only a score and feedback
        are needed. For more detailed evaluations with dimension scores,
        use CriticOutput instead.

    See Also:
        CriticOutput: Advanced schema with dimension scores and guidance.
    """

    score: float = Field(
        ...,
        ge=0.0,
        le=1.0,
        description="Score from 0.0 to 1.0",
    )
    feedback: str = Field(
        ...,
        description="Human-readable feedback explaining the score",
    )

FullEvaluationPolicy

Evaluation policy that scores all validation examples every iteration.

This is the default evaluation policy, providing complete visibility into solution performance across all validation examples.

Note

Always returns all valset IDs, ensuring complete evaluation coverage each iteration.

Examples:

policy = FullEvaluationPolicy()
batch = policy.get_eval_batch([0, 1, 2, 3, 4], state)
# Returns: [0, 1, 2, 3, 4]
Source code in src/gepa_adk/adapters/evaluation_policy.py
class FullEvaluationPolicy:
    """Evaluation policy that scores all validation examples every iteration.

    This is the default evaluation policy, providing complete visibility
    into solution performance across all validation examples.

    Note:
        Always returns all valset IDs, ensuring complete evaluation coverage
        each iteration.

    Examples:
        ```python
        policy = FullEvaluationPolicy()
        batch = policy.get_eval_batch([0, 1, 2, 3, 4], state)
        # Returns: [0, 1, 2, 3, 4]
        ```
    """

    def get_eval_batch(
        self,
        valset_ids: Sequence[int],
        state: ParetoState,
        target_candidate_idx: int | None = None,
    ) -> list[int]:
        """Return all validation example indices.

        Args:
            valset_ids (Sequence[int]): All available validation example indices.
            state (ParetoState): Current evolution state (unused for full evaluation).
            target_candidate_idx (int | None): Optional candidate being evaluated
                (unused).

        Returns:
            list[int]: List of all valset_ids.

        Note:
            Outputs the complete valset for comprehensive evaluation coverage.

        Examples:
            ```python
            policy = FullEvaluationPolicy()
            batch = policy.get_eval_batch([0, 1, 2], state)
            assert batch == [0, 1, 2]
            ```
        """
        return list(valset_ids)

    def get_best_candidate(self, state: ParetoState) -> int:
        """Return index of candidate with highest average score.

        Args:
            state: Current evolution state with candidate scores.

        Returns:
            Index of best performing candidate.

        Raises:
            NoCandidateAvailableError: If state has no candidates.

        Note:
            Outputs the candidate index with the highest mean score across
            all evaluated examples.

        Examples:
            ```python
            policy = FullEvaluationPolicy()
            best_idx = policy.get_best_candidate(state)
            ```
        """
        if not state.candidates:
            raise NoCandidateAvailableError("No candidates available")

        best_idx = None
        best_score = float("-inf")
        for candidate_idx in range(len(state.candidates)):
            score = self.get_valset_score(candidate_idx, state)
            if score > best_score:
                best_score = score
                best_idx = candidate_idx

        if best_idx is None:
            raise NoCandidateAvailableError("No scored candidates available")
        return best_idx

    def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
        """Return mean score across all evaluated examples for a candidate.

        Args:
            candidate_idx (int): Index of candidate to score.
            state (ParetoState): Current evolution state.

        Returns:
            float: Mean score across all examples, or float('-inf') if no scores.

        Note:
            Outputs the arithmetic mean of all scores for the candidate,
            or negative infinity if no scores exist.

        Examples:
            ```python
            policy = FullEvaluationPolicy()
            score = policy.get_valset_score(0, state)
            ```
        """
        scores = state.candidate_scores.get(candidate_idx)
        if not scores:
            return float("-inf")
        return fmean(scores.values())

get_eval_batch

get_eval_batch(
    valset_ids: Sequence[int],
    state: ParetoState,
    target_candidate_idx: int | None = None,
) -> list[int]

Return all validation example indices.

PARAMETER DESCRIPTION
valset_ids

All available validation example indices.

TYPE: Sequence[int]

state

Current evolution state (unused for full evaluation).

TYPE: ParetoState

target_candidate_idx

Optional candidate being evaluated (unused).

TYPE: int | None DEFAULT: None

RETURNS DESCRIPTION
list[int]

list[int]: List of all valset_ids.

Note

Outputs the complete valset for comprehensive evaluation coverage.

Examples:

policy = FullEvaluationPolicy()
batch = policy.get_eval_batch([0, 1, 2], state)
assert batch == [0, 1, 2]
Source code in src/gepa_adk/adapters/evaluation_policy.py
def get_eval_batch(
    self,
    valset_ids: Sequence[int],
    state: ParetoState,
    target_candidate_idx: int | None = None,
) -> list[int]:
    """Return all validation example indices.

    Args:
        valset_ids (Sequence[int]): All available validation example indices.
        state (ParetoState): Current evolution state (unused for full evaluation).
        target_candidate_idx (int | None): Optional candidate being evaluated
            (unused).

    Returns:
        list[int]: List of all valset_ids.

    Note:
        Outputs the complete valset for comprehensive evaluation coverage.

    Examples:
        ```python
        policy = FullEvaluationPolicy()
        batch = policy.get_eval_batch([0, 1, 2], state)
        assert batch == [0, 1, 2]
        ```
    """
    return list(valset_ids)

get_best_candidate

get_best_candidate(state: ParetoState) -> int

Return index of candidate with highest average score.

PARAMETER DESCRIPTION
state

Current evolution state with candidate scores.

TYPE: ParetoState

RETURNS DESCRIPTION
int

Index of best performing candidate.

RAISES DESCRIPTION
NoCandidateAvailableError

If state has no candidates.

Note

Outputs the candidate index with the highest mean score across all evaluated examples.

Examples:

policy = FullEvaluationPolicy()
best_idx = policy.get_best_candidate(state)
Source code in src/gepa_adk/adapters/evaluation_policy.py
def get_best_candidate(self, state: ParetoState) -> int:
    """Return index of candidate with highest average score.

    Args:
        state: Current evolution state with candidate scores.

    Returns:
        Index of best performing candidate.

    Raises:
        NoCandidateAvailableError: If state has no candidates.

    Note:
        Outputs the candidate index with the highest mean score across
        all evaluated examples.

    Examples:
        ```python
        policy = FullEvaluationPolicy()
        best_idx = policy.get_best_candidate(state)
        ```
    """
    if not state.candidates:
        raise NoCandidateAvailableError("No candidates available")

    best_idx = None
    best_score = float("-inf")
    for candidate_idx in range(len(state.candidates)):
        score = self.get_valset_score(candidate_idx, state)
        if score > best_score:
            best_score = score
            best_idx = candidate_idx

    if best_idx is None:
        raise NoCandidateAvailableError("No scored candidates available")
    return best_idx

get_valset_score

get_valset_score(
    candidate_idx: int, state: ParetoState
) -> float

Return mean score across all evaluated examples for a candidate.

PARAMETER DESCRIPTION
candidate_idx

Index of candidate to score.

TYPE: int

state

Current evolution state.

TYPE: ParetoState

RETURNS DESCRIPTION
float

Mean score across all examples, or float('-inf') if no scores.

TYPE: float

Note

Outputs the arithmetic mean of all scores for the candidate, or negative infinity if no scores exist.

Examples:

policy = FullEvaluationPolicy()
score = policy.get_valset_score(0, state)
Source code in src/gepa_adk/adapters/evaluation_policy.py
def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
    """Return mean score across all evaluated examples for a candidate.

    Args:
        candidate_idx (int): Index of candidate to score.
        state (ParetoState): Current evolution state.

    Returns:
        float: Mean score across all examples, or float('-inf') if no scores.

    Note:
        Outputs the arithmetic mean of all scores for the candidate,
        or negative infinity if no scores exist.

    Examples:
        ```python
        policy = FullEvaluationPolicy()
        score = policy.get_valset_score(0, state)
        ```
    """
    scores = state.candidate_scores.get(candidate_idx)
    if not scores:
        return float("-inf")
    return fmean(scores.values())

SubsetEvaluationPolicy

Evaluation policy that scores a configurable subset with round-robin coverage.

This policy reduces evaluation cost for large validation sets by evaluating only a subset of examples per iteration, using round-robin selection to ensure all examples are eventually covered.

ATTRIBUTE DESCRIPTION
subset_size

If int, absolute count of examples to evaluate per iteration. If float (0.0-1.0), fraction of total valset size.

TYPE: int | float

_offset

Internal state tracking current position for round-robin selection.

TYPE: int

Note

Advances offset each iteration to provide round-robin coverage across the full valset over multiple iterations.

Examples:

# Evaluate 20% of valset per iteration
policy = SubsetEvaluationPolicy(subset_size=0.2)

# Evaluate exactly 5 examples per iteration
policy = SubsetEvaluationPolicy(subset_size=5)
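
A minimal sketch of the round-robin coverage (state is assumed to be a ParetoState instance; get_eval_batch ignores its contents when selecting the subset):

policy = SubsetEvaluationPolicy(subset_size=2)
valset = [0, 1, 2, 3, 4]

policy.get_eval_batch(valset, state)  # [0, 1]
policy.get_eval_batch(valset, state)  # [2, 3]
policy.get_eval_batch(valset, state)  # [4, 0]  (wraps around)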
Source code in src/gepa_adk/adapters/evaluation_policy.py
class SubsetEvaluationPolicy:
    """Evaluation policy that scores a configurable subset with round-robin coverage.

    This policy reduces evaluation cost for large validation sets by evaluating
    only a subset of examples per iteration, using round-robin selection to
    ensure all examples are eventually covered.

    Attributes:
        subset_size (int | float): If int, absolute count of examples to evaluate
            per iteration. If float (0.0-1.0), fraction of total valset size.
        _offset (int): Internal state tracking current position for round-robin
            selection.

    Note:
        Advances offset each iteration to provide round-robin coverage across
        the full valset over multiple iterations.

    Examples:
        ```python
        # Evaluate 20% of valset per iteration
        policy = SubsetEvaluationPolicy(subset_size=0.2)

        # Evaluate exactly 5 examples per iteration
        policy = SubsetEvaluationPolicy(subset_size=5)
        ```
    """

    def __init__(self, subset_size: int | float = 0.2) -> None:
        """Initialize subset evaluation policy.

        Args:
            subset_size: If int, evaluate this many examples per iteration.
                If float, evaluate this fraction of total valset.
                Default: 0.2 (20% of valset per iteration).

        Note:
            Creates policy with initial offset of 0 for round-robin selection.
        """
        self.subset_size = subset_size
        self._offset = 0

    def get_eval_batch(
        self,
        valset_ids: Sequence[int],
        state: ParetoState,
        target_candidate_idx: int | None = None,
    ) -> list[int]:
        """Return subset of validation example indices with round-robin selection.

        Args:
            valset_ids: All available validation example indices.
            state: Current evolution state (unused for subset selection).
            target_candidate_idx: Optional candidate being evaluated (unused).

        Returns:
            List of example indices to evaluate this iteration.
            Uses round-robin to ensure all examples are eventually covered.

        Raises:
            ValueError: If subset_size is outside the allowed range.

        Note:
            Outputs a subset of valset IDs starting at the current offset,
            wrapping around if needed to provide round-robin coverage.

        Examples:
            ```python
            policy = SubsetEvaluationPolicy(subset_size=0.25)
            batch = policy.get_eval_batch(list(range(8)), state)
            assert len(batch) == 2
            ```
        """
        if not valset_ids:
            return []

        total_size = len(valset_ids)

        # Calculate subset count
        if isinstance(self.subset_size, float):
            if not (0.0 < self.subset_size <= 1.0):
                raise ValueError(
                    f"subset_size float must be in (0.0, 1.0], got {self.subset_size}"
                )
            subset_count = max(1, int(self.subset_size * total_size))
        else:
            subset_count = self.subset_size

        # Fallback to full evaluation if subset_size exceeds valset size
        if subset_count >= total_size:
            return list(valset_ids)

        # Round-robin selection: slice starting at _offset, wrapping around
        result: list[int] = []
        for i in range(subset_count):
            idx = (self._offset + i) % total_size
            result.append(valset_ids[idx])

        # Advance offset for next iteration
        self._offset = (self._offset + subset_count) % total_size

        return result

    def get_best_candidate(self, state: ParetoState) -> int:
        """Return index of candidate with highest average score.

        Args:
            state (ParetoState): Current evolution state with candidate scores.

        Returns:
            int: Index of best performing candidate.

        Raises:
            NoCandidateAvailableError: If state has no candidates.

        Note:
            Outputs the candidate index with the highest mean score across
            evaluated examples, consistent with FullEvaluationPolicy behavior.

        Examples:
            ```python
            policy = SubsetEvaluationPolicy()
            best_idx = policy.get_best_candidate(state)
            ```
        """
        if not state.candidates:
            raise NoCandidateAvailableError("No candidates available")

        best_idx = None
        best_score = float("-inf")
        for candidate_idx in range(len(state.candidates)):
            score = self.get_valset_score(candidate_idx, state)
            if score > best_score:
                best_score = score
                best_idx = candidate_idx

        if best_idx is None:
            raise NoCandidateAvailableError("No scored candidates available")
        return best_idx

    def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
        """Return mean score across evaluated examples for a candidate.

        Args:
            candidate_idx (int): Index of candidate to score.
            state (ParetoState): Current evolution state.

        Returns:
            float: Mean score across evaluated examples, or float('-inf') if no scores.

        Note:
            Outputs the arithmetic mean of scores for the candidate across
            only the examples that were actually evaluated (subset).

        Examples:
            ```python
            policy = SubsetEvaluationPolicy()
            score = policy.get_valset_score(0, state)
            ```
        """
        scores = state.candidate_scores.get(candidate_idx)
        if not scores:
            return float("-inf")
        return fmean(scores.values())

__init__

__init__(subset_size: int | float = 0.2) -> None

Initialize subset evaluation policy.

PARAMETER DESCRIPTION
subset_size

If int, evaluate this many examples per iteration. If float, evaluate this fraction of total valset. Default: 0.2 (20% of valset per iteration).

TYPE: int | float DEFAULT: 0.2

Note

Creates policy with initial offset of 0 for round-robin selection.

Source code in src/gepa_adk/adapters/evaluation_policy.py
def __init__(self, subset_size: int | float = 0.2) -> None:
    """Initialize subset evaluation policy.

    Args:
        subset_size: If int, evaluate this many examples per iteration.
            If float, evaluate this fraction of total valset.
            Default: 0.2 (20% of valset per iteration).

    Note:
        Creates policy with initial offset of 0 for round-robin selection.
    """
    self.subset_size = subset_size
    self._offset = 0

get_eval_batch

get_eval_batch(
    valset_ids: Sequence[int],
    state: ParetoState,
    target_candidate_idx: int | None = None,
) -> list[int]

Return subset of validation example indices with round-robin selection.

PARAMETER DESCRIPTION
valset_ids

All available validation example indices.

TYPE: Sequence[int]

state

Current evolution state (unused for subset selection).

TYPE: ParetoState

target_candidate_idx

Optional candidate being evaluated (unused).

TYPE: int | None DEFAULT: None

RETURNS DESCRIPTION
list[int]

List of example indices to evaluate this iteration.

list[int]

Uses round-robin to ensure all examples are eventually covered.

RAISES DESCRIPTION
ValueError

If subset_size is outside the allowed range.

Note

Outputs a subset of valset IDs starting at the current offset, wrapping around if needed to provide round-robin coverage.

Examples:

policy = SubsetEvaluationPolicy(subset_size=0.25)
batch = policy.get_eval_batch(list(range(8)), state)
assert len(batch) == 2
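
When subset_size is an integer that does not divide the valset evenly, selection wraps around the end of the list. A minimal sketch, assuming state is an existing ParetoState instance (unused for subset selection):

```python
# Sketch only: integer subset_size with wrap-around.
policy = SubsetEvaluationPolicy(subset_size=3)
valset_ids = list(range(5))

policy.get_eval_batch(valset_ids, state)  # [0, 1, 2]  (offset advances to 3)
policy.get_eval_batch(valset_ids, state)  # [3, 4, 0]  (wraps around; offset -> 1)
policy.get_eval_batch(valset_ids, state)  # [1, 2, 3]  (offset -> 4)
```
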
Source code in src/gepa_adk/adapters/evaluation_policy.py
def get_eval_batch(
    self,
    valset_ids: Sequence[int],
    state: ParetoState,
    target_candidate_idx: int | None = None,
) -> list[int]:
    """Return subset of validation example indices with round-robin selection.

    Args:
        valset_ids: All available validation example indices.
        state: Current evolution state (unused for subset selection).
        target_candidate_idx: Optional candidate being evaluated (unused).

    Returns:
        List of example indices to evaluate this iteration.
        Uses round-robin to ensure all examples are eventually covered.

    Raises:
        ValueError: If subset_size is outside the allowed range.

    Note:
        Outputs a subset of valset IDs starting at the current offset,
        wrapping around if needed to provide round-robin coverage.

    Examples:
        ```python
        policy = SubsetEvaluationPolicy(subset_size=0.25)
        batch = policy.get_eval_batch(list(range(8)), state)
        assert len(batch) == 2
        ```
    """
    if not valset_ids:
        return []

    total_size = len(valset_ids)

    # Calculate subset count
    if isinstance(self.subset_size, float):
        if not (0.0 < self.subset_size <= 1.0):
            raise ValueError(
                f"subset_size float must be in (0.0, 1.0], got {self.subset_size}"
            )
        subset_count = max(1, int(self.subset_size * total_size))
    else:
        subset_count = self.subset_size

    # Fallback to full evaluation if subset_size exceeds valset size
    if subset_count >= total_size:
        return list(valset_ids)

    # Round-robin selection: slice starting at _offset, wrapping around
    result: list[int] = []
    for i in range(subset_count):
        idx = (self._offset + i) % total_size
        result.append(valset_ids[idx])

    # Advance offset for next iteration
    self._offset = (self._offset + subset_count) % total_size

    return result

get_best_candidate

get_best_candidate(state: ParetoState) -> int

Return index of candidate with highest average score.

PARAMETER DESCRIPTION
state

Current evolution state with candidate scores.

TYPE: ParetoState

RETURNS DESCRIPTION
int

Index of best performing candidate.

TYPE: int

RAISES DESCRIPTION
NoCandidateAvailableError

If state has no candidates.

Note

Outputs the candidate index with the highest mean score across evaluated examples, consistent with FullEvaluationPolicy behavior.

Examples:

policy = SubsetEvaluationPolicy()
best_idx = policy.get_best_candidate(state)
Source code in src/gepa_adk/adapters/evaluation_policy.py
def get_best_candidate(self, state: ParetoState) -> int:
    """Return index of candidate with highest average score.

    Args:
        state (ParetoState): Current evolution state with candidate scores.

    Returns:
        int: Index of best performing candidate.

    Raises:
        NoCandidateAvailableError: If state has no candidates.

    Note:
        Outputs the candidate index with the highest mean score across
        evaluated examples, consistent with FullEvaluationPolicy behavior.

    Examples:
        ```python
        policy = SubsetEvaluationPolicy()
        best_idx = policy.get_best_candidate(state)
        ```
    """
    if not state.candidates:
        raise NoCandidateAvailableError("No candidates available")

    best_idx = None
    best_score = float("-inf")
    for candidate_idx in range(len(state.candidates)):
        score = self.get_valset_score(candidate_idx, state)
        if score > best_score:
            best_score = score
            best_idx = candidate_idx

    if best_idx is None:
        raise NoCandidateAvailableError("No scored candidates available")
    return best_idx

get_valset_score

get_valset_score(
    candidate_idx: int, state: ParetoState
) -> float

Return mean score across evaluated examples for a candidate.

PARAMETER DESCRIPTION
candidate_idx

Index of candidate to score.

TYPE: int

state

Current evolution state.

TYPE: ParetoState

RETURNS DESCRIPTION
float

Mean score across evaluated examples, or float('-inf') if no scores.

TYPE: float

Note

Outputs the arithmetic mean of scores for the candidate across only the examples that were actually evaluated (subset).

Examples:

policy = SubsetEvaluationPolicy()
score = policy.get_valset_score(0, state)
Source code in src/gepa_adk/adapters/evaluation_policy.py
def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
    """Return mean score across evaluated examples for a candidate.

    Args:
        candidate_idx (int): Index of candidate to score.
        state (ParetoState): Current evolution state.

    Returns:
        float: Mean score across evaluated examples, or float('-inf') if no scores.

    Note:
        Outputs the arithmetic mean of scores for the candidate across
        only the examples that were actually evaluated (subset).

    Examples:
        ```python
        policy = SubsetEvaluationPolicy()
        score = policy.get_valset_score(0, state)
        ```
    """
    scores = state.candidate_scores.get(candidate_idx)
    if not scores:
        return float("-inf")
    return fmean(scores.values())

MultiAgentAdapter

Adapter for multi-agent pipeline evaluation with per-agent component routing.

Wraps multiple ADK agents into a SequentialAgent for evaluation, enabling session state sharing between agents. Implements AsyncGEPAAdapter protocol for use with AsyncGEPAEngine.

Supports per-agent component configuration via the components parameter, allowing different components to be evolved for each agent (e.g., evolve generator's instruction while evolving critic's generate_content_config).

ATTRIBUTE DESCRIPTION
agents

Named ADK agents to evaluate together.

TYPE: dict[str, LlmAgent]

components

Per-agent component configuration.

TYPE: ComponentsMapping

primary

Name of agent whose output is used for scoring.

TYPE: str

scorer

Scoring implementation (CriticScorer or similar).

TYPE: Scorer

share_session

Whether agents share session state.

TYPE: bool

session_service

Session service for state management.

TYPE: InMemorySessionService

trajectory_config

Configuration for trajectory extraction.

TYPE: TrajectoryConfig | None

_executor

Optional unified executor for consistent agent execution. When None, uses legacy execution path.

TYPE: AgentExecutorProtocol | None

_proposer

Mutation proposer for generating improved instructions via LLM reflection.

TYPE: AsyncReflectiveMutationProposer

_logger

Bound structlog logger with context.

TYPE: BoundLogger

Examples:

Basic adapter setup with per-agent components (API v0.3.x):

from google.adk.agents import LlmAgent
from gepa_adk.adapters import MultiAgentAdapter
from gepa_adk.ports.scorer import Scorer

generator = LlmAgent(
    name="generator",
    model="gemini-2.5-flash",
    output_key="generated_code",
)
critic = LlmAgent(
    name="critic",
    model="gemini-2.5-flash",
    instruction="Review the code in {generated_code}.",
)
scorer = MyScorer()

adapter = MultiAgentAdapter(
    agents={"generator": generator, "critic": critic},
    primary="generator",
    scorer=scorer,
    components={
        "generator": ["instruction", "output_schema"],
        "critic": ["generate_content_config"],
    },
)

# Candidates use qualified names (agent.component format per ADR-012)
candidate = {
    "generator.instruction": "Generate high-quality code",
    "generator.output_schema": "class Output(BaseModel): ...",
    "critic.generate_content_config": "temperature: 0.3",
}
result = await adapter.evaluate(batch, candidate)

Exclude an agent from evolution:

adapter = MultiAgentAdapter(
    agents={"generator": gen, "validator": val},
    primary="generator",
    scorer=scorer,
    components={
        "generator": ["instruction"],
        "validator": [],  # Empty list = no evolution
    },
)
Note

Adheres to AsyncGEPAAdapter[dict[str, Any], MultiAgentTrajectory, str] protocol. All methods are async and follow ADK's async-first patterns.

Breaking Change (0.3.x):

- agents parameter changed from list[LlmAgent] to dict[str, LlmAgent]
- components parameter is now required
- Candidate keys use qualified names (agent.component) instead of the {agent_name}_instruction format

Session Sharing Behavior:

- When share_session=True (default): Uses SequentialAgent to execute agents sequentially with a shared InvocationContext. Earlier agents can write to session state (via output_key), and later agents can read that state via template strings like {output_key} in their instructions.
- When share_session=False: Each agent executes with an isolated session. Agents cannot access each other's outputs. This is useful when agents should not interfere with each other's state (EdgeCase-5: incompatible outputs behavior). See the sketch after these notes.

output_key State Propagation:

- When an agent has output_key set, its final response is automatically saved to session.state[output_key].
- With share_session=True, subsequent agents can reference this via template strings: instruction="Process {output_key}".
- With share_session=False, state is not shared and template references will not resolve (agents see empty or undefined state).
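
To make the isolation behavior concrete, here is a minimal sketch that disables session sharing; gen, critic, scorer, proposer, batch, and candidate are assumed to be constructed as in the earlier examples:

```python
# Sketch only: isolated sessions (share_session=False).
# gen, critic, scorer, proposer, batch, and candidate are assumed to exist,
# built as in the examples above.
adapter = MultiAgentAdapter(
    agents={"generator": gen, "critic": critic},
    primary="generator",
    components={"generator": ["instruction"], "critic": ["instruction"]},
    scorer=scorer,
    proposer=proposer,
    share_session=False,
)

# With isolated sessions the critic never sees the generator's output_key
# state, so a template reference like {generated_code} in its instruction
# will not resolve.
result = await adapter.evaluate(batch, candidate)
```
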

Source code in src/gepa_adk/adapters/multi_agent.py
class MultiAgentAdapter:
    """Adapter for multi-agent pipeline evaluation with per-agent component routing.

    Wraps multiple ADK agents into a SequentialAgent for evaluation,
    enabling session state sharing between agents. Implements
    AsyncGEPAAdapter protocol for use with AsyncGEPAEngine.

    Supports per-agent component configuration via the `components` parameter,
    allowing different components to be evolved for each agent (e.g., evolve
    generator's instruction while evolving critic's generate_content_config).

    Attributes:
        agents (dict[str, LlmAgent]): Named ADK agents to evaluate together.
        components (ComponentsMapping): Per-agent component configuration.
        primary (str): Name of agent whose output is used for scoring.
        scorer (Scorer): Scoring implementation (CriticScorer or similar).
        share_session (bool): Whether agents share session state.
        session_service (InMemorySessionService): Session service for state management.
        trajectory_config (TrajectoryConfig | None): Configuration for trajectory extraction.
        _executor (AgentExecutorProtocol | None): Optional unified executor for
            consistent agent execution. When None, uses legacy execution path.
        _proposer (AsyncReflectiveMutationProposer): Mutation proposer for
            generating improved instructions via LLM reflection.
        _logger (BoundLogger): Bound structlog logger with context.

    Examples:
        Basic adapter setup with per-agent components (API v0.3.x):

        ```python
        from google.adk.agents import LlmAgent
        from gepa_adk.adapters import MultiAgentAdapter
        from gepa_adk.ports.scorer import Scorer

        generator = LlmAgent(
            name="generator",
            model="gemini-2.5-flash",
            output_key="generated_code",
        )
        critic = LlmAgent(
            name="critic",
            model="gemini-2.5-flash",
            instruction="Review the code in {generated_code}.",
        )
        scorer = MyScorer()

        adapter = MultiAgentAdapter(
            agents={"generator": generator, "critic": critic},
            primary="generator",
            scorer=scorer,
            components={
                "generator": ["instruction", "output_schema"],
                "critic": ["generate_content_config"],
            },
        )

        # Candidates use qualified names (agent.component format per ADR-012)
        candidate = {
            "generator.instruction": "Generate high-quality code",
            "generator.output_schema": "class Output(BaseModel): ...",
            "critic.generate_content_config": "temperature: 0.3",
        }
        result = await adapter.evaluate(batch, candidate)
        ```

        Exclude an agent from evolution:

        ```python
        adapter = MultiAgentAdapter(
            agents={"generator": gen, "validator": val},
            primary="generator",
            scorer=scorer,
            components={
                "generator": ["instruction"],
                "validator": [],  # Empty list = no evolution
            },
        )
        ```

    Note:
        Adheres to AsyncGEPAAdapter[dict[str, Any], MultiAgentTrajectory, str] protocol.
        All methods are async and follow ADK's async-first patterns.

        **Breaking Change (0.3.x)**:
        - `agents` parameter changed from `list[LlmAgent]` to `dict[str, LlmAgent]`
        - `components` parameter is now required
        - Candidate keys use qualified names (agent.component) instead of
          {agent_name}_instruction format

        **Session Sharing Behavior**:
        - When `share_session=True` (default): Uses SequentialAgent to execute
          agents sequentially with shared InvocationContext. Earlier agents can
          write to session state (via `output_key`), later agents can read that
          state via template strings like `{output_key}` in their instructions.
        - When `share_session=False`: Each agent executes with an isolated session.
          Agents cannot access each other's outputs. This is useful when agents
          should not interfere with each other's state (EdgeCase-5: incompatible
          outputs behavior).

        **output_key State Propagation**:
        - When an agent has `output_key` set, its final response is automatically
          saved to `session.state[output_key]`.
        - With `share_session=True`, subsequent agents can reference this via
          template strings: `instruction="Process {output_key}"`.
        - With `share_session=False`, state is not shared and template references
          will not resolve (agents see empty or undefined state).
    """

    def __init__(
        self,
        agents: dict[str, LlmAgent],
        primary: str,
        components: ComponentsMapping,
        scorer: Scorer | None = None,
        share_session: bool = True,
        session_service: BaseSessionService | None = None,
        app_name: str = "multi_agent_eval",
        trajectory_config: TrajectoryConfig | None = None,
        proposer: AsyncReflectiveMutationProposer | None = None,
        executor: AgentExecutorProtocol | None = None,
        workflow: AnyAgentType | None = None,
    ) -> None:
        """Initialize the MultiAgent adapter with named agents and component config.

        Args:
            agents: Named ADK agents to evolve together. Must have at least
                one agent. Keys are agent names, values are LlmAgent instances.
            primary: Name of the agent whose output is used for scoring.
                Must match one of the agent names in the dict.
            components: Per-agent component configuration mapping agent names
                to lists of component names to evolve. All agents must have
                an entry (use empty list to exclude from evolution). Component
                names must have registered handlers.
            scorer: Optional scorer implementation. If None, the primary agent
                must have an output_schema for schema-based scoring.
            share_session: Whether agents share session state during execution.
                When True (default), uses SequentialAgent. When False, agents
                execute with isolated sessions.
            session_service: Optional session service for state management.
                If None, creates an InMemorySessionService.
            app_name: Application name for session identification.
            trajectory_config: Configuration for trajectory extraction behavior.
                If None, uses TrajectoryConfig defaults.
            proposer: Mutation proposer for generating improved instructions
                via LLM reflection. Required. Create using `create_adk_reflection_fn()`
                with an ADK LlmAgent for reflection.
            executor: Optional unified executor for consistent agent execution.
                If None, uses legacy execution path with direct Runner calls.
                When provided, all agent executions use the executor's execute_agent
                method for consistent session management and feature parity (FR-001).
            workflow: Optional original workflow structure to preserve during
                cloning. When provided, _build_pipeline() uses
                clone_workflow_with_overrides() to preserve workflow type
                (LoopAgent iterations, ParallelAgent concurrency). When None,
                creates a flat SequentialAgent (legacy behavior).

        Raises:
            MultiAgentValidationError: If agents dict is empty, primary agent
                not found, or no scorer and primary lacks output_schema.
            ValueError: If proposer is not provided, or if components mapping
                contains unknown agents, unknown component handlers, or is
                missing entries for agents in the agents dict.

        Examples:
            With per-agent components (API v0.3.x):

            ```python
            from gepa_adk.engine import (
                create_adk_reflection_fn,
                AsyncReflectiveMutationProposer,
            )

            reflection_fn = create_adk_reflection_fn(reflection_agent, executor)
            proposer = AsyncReflectiveMutationProposer(adk_reflection_fn=reflection_fn)
            adapter = MultiAgentAdapter(
                agents={"generator": gen, "critic": critic},
                primary="generator",
                components={
                    "generator": ["instruction", "output_schema"],
                    "critic": ["instruction"],
                },
                scorer=scorer,
                proposer=proposer,
            )
            ```

            Excluding an agent from evolution:

            ```python
            adapter = MultiAgentAdapter(
                agents={"generator": gen, "validator": val},
                primary="generator",
                components={
                    "generator": ["instruction"],
                    "validator": [],  # Excluded from evolution
                },
                scorer=scorer,
                proposer=proposer,
            )
            ```

        Note:
            Clones agents during evaluation to apply candidate instructions.
            Original agents are never mutated.
        """
        # Validation
        if not agents:
            raise MultiAgentValidationError(
                "agents dict cannot be empty",
                field="agents",
                value={},
                constraint="len >= 1",
            )

        agent_names = list(agents.keys())

        # Check primary agent exists
        if primary not in agent_names:
            raise MultiAgentValidationError(
                f"primary agent '{primary}' not found in agents dict",
                field="primary",
                value=primary,
                constraint=f"must be one of {agent_names}",
            )

        # Check scorer or output_schema
        primary_agent = agents[primary]
        if scorer is None and primary_agent.output_schema is None:
            raise MultiAgentValidationError(
                "no scorer and primary agent lacks output_schema",
                field="scorer",
                value=None,
                constraint="scorer must be provided or primary agent must have output_schema",
            )

        self.agents = agents
        self.components = components
        self.primary = primary
        self.scorer = scorer
        self.share_session = share_session
        self.session_service = session_service or InMemorySessionService()
        self.app_name = app_name
        self.trajectory_config = trajectory_config or TrajectoryConfig()
        self._workflow = workflow  # Store original workflow for structure preservation
        if proposer is None:
            raise ValueError(
                "proposer is required. Create one using create_adk_reflection_fn() "
                "with an ADK LlmAgent for reflection operations."
            )
        self._proposer = proposer
        self._executor = executor

        # Validate components mapping (fail-fast)
        self._validate_components()

        # Bind logger with adapter context (FR-008)
        self._logger = logger.bind(
            adapter="MultiAgentAdapter",
            primary_agent=self.primary,
            agent_count=len(self.agents),
            app_name=self.app_name,
            uses_executor=executor is not None,
        )

        # Initialize trial builder for reflective dataset construction
        self._trial_builder = TrialBuilder()

        self._logger.info("adapter.initialized")

    def _validate_components(self) -> None:
        """Validate components mapping at initialization time.

        Performs fail-fast validation to ensure:
        1. All agent names in components exist in agents dict
        2. All agents in agents dict have entries in components
        3. All component names have registered handlers

        Raises:
            ValueError: If validation fails with descriptive error message.

        Note:
            Called during __init__ to catch configuration errors early.
        """
        agent_names = set(self.agents.keys())

        # Check all agent names in components exist in agents dict
        for agent_name in self.components:
            if agent_name not in agent_names:
                raise ValueError(
                    f"Agent '{agent_name}' not found in agents dict. "
                    f"Available: {sorted(agent_names)}"
                )

        # Check all agents in agents dict have entries in components
        missing_agents = agent_names - set(self.components.keys())
        if missing_agents:
            raise ValueError(
                f"Agents {sorted(missing_agents)} missing from components mapping. "
                "All agents must have an entry in components (use empty list to exclude)."
            )

        # Check all component names have handlers
        for agent_name, comp_list in self.components.items():
            for comp_name in comp_list:
                if not component_handlers.has(comp_name):
                    available = component_handlers.names()
                    raise ValueError(
                        f"No handler registered for component '{comp_name}'. "
                        f"Available: {available}"
                    )

        logger.debug(
            "components.validated",
            agents=list(self.components.keys()),
            components_per_agent={k: len(v) for k, v in self.components.items()},
        )

    def _apply_candidate(self, candidate: dict[str, str]) -> dict[str, Any]:
        """Apply candidate component values to agents, tracking originals.

        Routes each candidate component to the correct agent based on
        qualified component names (agent.component format per ADR-012).

        Args:
            candidate: Mapping of qualified component names to new values.
                Keys must be in format 'agent.component'.

        Returns:
            Dictionary mapping qualified names to original values for restoration.

        Raises:
            ValueError: If qualified name format is invalid.
            KeyError: If agent not found or handler not registered.

        Examples:
            Apply candidate and get originals:

            ```python
            candidate = {
                "generator.instruction": "evolved text",
                "critic.generate_content_config": "temperature: 0.3",
            }
            originals = adapter._apply_candidate(candidate)
            # originals["generator.instruction"] contains original instruction
            ```

        Note:
            Returns originals dict for use with _restore_agents(). Does not
            modify self.agents - modifications are applied in-place to agent
            objects which are later cloned for pipeline execution.
        """
        originals: dict[str, Any] = {}

        for qualified_name, value in candidate.items():
            # Skip non-component keys (e.g., evolution_id)
            if "." not in qualified_name:
                continue

            spec = ComponentSpec.parse(qualified_name)

            if spec.agent not in self.agents:
                raise KeyError(
                    f"Agent '{spec.agent}' not found. "
                    f"Available: {list(self.agents.keys())}"
                )

            agent = self.agents[spec.agent]
            handler = get_handler(spec.component)
            originals[qualified_name] = handler.apply(agent, value)

            self._logger.debug(
                "component.applied",
                qualified_name=qualified_name,
                agent=spec.agent,
                component=spec.component,
            )

        return originals

    def _restore_agents(self, originals: dict[str, Any]) -> None:
        """Restore all agents to original state after evaluation.

        Uses best-effort restoration: all components are attempted even if
        some fail. Errors are aggregated and raised as RestoreError after
        all restoration attempts complete.

        Args:
            originals: Mapping of qualified names to original values,
                as returned by _apply_candidate().

        Raises:
            RestoreError: If one or more components fail to restore, containing
                list of (qualified_name, exception) pairs.

        Note:
            Always attempts to restore all components even if some fail,
            to minimize state corruption. Uses try/except for each component.
        """
        errors: list[tuple[str, Exception]] = []

        for qualified_name, original in originals.items():
            try:
                spec = ComponentSpec.parse(qualified_name)
                agent = self.agents[spec.agent]
                handler = get_handler(spec.component)
                handler.restore(agent, original)

                self._logger.debug(
                    "component.restored",
                    qualified_name=qualified_name,
                )
            except Exception as e:
                errors.append((qualified_name, e))
                self._logger.warning(
                    "component.restore_failed",
                    qualified_name=qualified_name,
                    error=str(e),
                )

        if errors:
            raise RestoreError(
                f"Failed to restore {len(errors)} components",
                errors=errors,
            )

    def _build_pipeline(
        self,
        candidate: dict[str, str],
    ) -> AnyAgentType:
        """Build pipeline with instruction overrides, preserving workflow structure.

        When a workflow was provided at initialization, clones the workflow
        structure recursively using clone_workflow_with_overrides(), preserving
        LoopAgent iterations and ParallelAgent concurrency. Otherwise, creates
        a flat SequentialAgent (legacy behavior).

        Non-instruction components (output_schema, generate_content_config)
        must be applied to original agents via _apply_candidate() before
        calling this method.

        Args:
            candidate: Qualified component name to text mapping. Keys should
                follow the pattern `{agent_name}.{component_name}` per ADR-012.

        Returns:
            Cloned workflow with same structure as original, or SequentialAgent
            if no workflow was provided. LlmAgents have instruction overrides
            applied; container agents preserve properties like max_iterations.

        Examples:
            Building pipeline with instruction overrides:

            ```python
            candidate = {
                "generator.instruction": "Generate code...",
                "critic.instruction": "Review code...",
            }
            pipeline = adapter._build_pipeline(candidate)
            ```

            With workflow preservation (LoopAgent):

            ```python
            # If adapter was initialized with workflow=LoopAgent(max_iterations=3, ...)
            pipeline = adapter._build_pipeline(candidate)
            # pipeline is LoopAgent with max_iterations=3 preserved
            ```

        Note:
            Only instruction components are applied via cloning. Other components
            are inherited from the (pre-modified) original agents.
            See evaluate() for the full execution flow.
        """
        # Use structure-preserving cloning when workflow is provided
        if self._workflow is not None:
            self._logger.debug(
                "Using structure-preserving clone",
                workflow_type=type(self._workflow).__name__,
            )
            return clone_workflow_with_overrides(self._workflow, candidate)

        # Legacy behavior: flatten to SequentialAgent
        cloned_agents = []

        # Build set of updates per agent from qualified names
        agent_updates: dict[str, dict[str, Any]] = {name: {} for name in self.agents}

        for qualified_name, value in candidate.items():
            # Skip non-component keys (e.g., evolution_id)
            if "." not in qualified_name:
                continue

            spec = ComponentSpec.parse(qualified_name)
            if spec.agent in agent_updates:
                # Only instruction is cloned via model_copy; other components
                # (output_schema, generate_content_config) are applied to the
                # original agents via _apply_candidate() BEFORE this method.
                #
                # Full execution flow in evaluate():
                # 1. _apply_candidate() - applies ALL components to originals
                # 2. _build_pipeline() - clones agents with instruction override
                # 3. Run pipeline (clones inherit non-instruction from originals)
                # 4. _restore_agents() - restores original state
                if spec.component == "instruction":
                    agent_updates[spec.agent]["instruction"] = value

        for agent_name, agent in self.agents.items():
            updates = agent_updates.get(agent_name, {})
            # Always clear parent_agent to allow re-parenting
            updates["parent_agent"] = None

            cloned = agent.model_copy(update=updates)
            cloned_agents.append(cloned)

        pipeline = SequentialAgent(
            name="MultiAgentPipeline",
            sub_agents=cloned_agents,
        )

        return pipeline

    def _extract_primary_output(
        self,
        pipeline_output: str,
        session_state: dict[str, Any],
        primary_agent: LlmAgent,
    ) -> str:
        """Extract primary agent's output from pipeline execution.

        If the primary agent has an output_key, retrieves the output from
        session state using the shared extract_output_from_state utility.
        Otherwise, uses the pipeline's final output.

        Args:
            pipeline_output: Final output from SequentialAgent execution.
            session_state: Session state dictionary from execution.
            primary_agent: The primary agent instance.

        Returns:
            Primary agent's output text for scoring.

        Examples:
            Extracting output with output_key:

            ```python
            primary_agent = LlmAgent(name="generator", output_key="generated_code")
            output = adapter._extract_primary_output(
                pipeline_output="...",
                session_state={"generated_code": "def foo(): ..."},
                primary_agent=primary_agent,
            )
            # output == "def foo(): ..."
            ```

        Note:
            Outputs are saved to session state via the agent's output_key property.
            When share_session=True, later agents can access earlier outputs via
            template strings like {output_key} in their instructions. When
            share_session=False, agents have isolated sessions and cannot
            access each other's outputs (EdgeCase-5: incompatible outputs behavior).

            Uses the shared extract_output_from_state utility for consistent
            state-based output extraction across the codebase.
        """
        # Try state-based extraction using shared utility
        output_key = getattr(primary_agent, "output_key", None)
        state_output = extract_output_from_state(session_state, output_key)
        if state_output is not None:
            return state_output

        # Fallback to pipeline output
        return pipeline_output

    def _create_session_id(self) -> str:
        """Create a unique session ID for evaluation isolation.

        Returns:
            Unique session identifier string.

        Note:
            Outputs unique session identifiers using timestamp and random components.
        """
        import time
        import uuid

        return f"eval_{int(time.time() * 1000)}_{uuid.uuid4().hex[:8]}"

    def _cleanup_session(self, session_id: str) -> None:
        """Clean up session after evaluation.

        Args:
            session_id: Session identifier to clean up.

        Note:
            Optional cleanup method. Currently a no-op. Future implementations
            may delete sessions from persistent storage.
        """
        # No-op for now - sessions are in-memory and cleaned up automatically
        pass

    async def evaluate(
        self,
        batch: list[dict[str, Any]],
        candidate: dict[str, str],
        capture_traces: bool = False,
    ) -> EvaluationBatch[MultiAgentTrajectory, str]:
        """Evaluate multi-agent pipeline with candidate component values over a batch.

        Args:
            batch: List of input examples, each with "input" key and optional
                "expected" key for scoring.
            candidate: Qualified component name to text mapping. Keys should
                follow the pattern `{agent_name}.{component_name}` per ADR-012.
            capture_traces: Whether to capture execution traces (tool calls,
                state deltas, token usage).

        Returns:
            EvaluationBatch containing outputs, scores, and optional trajectories.

        Examples:
            Basic evaluation without traces:

            ```python
            batch = [
                {"input": "Generate code...", "expected": "def foo(): ..."},
            ]
            candidate = {
                "generator.instruction": "Generate high-quality code",
                "critic.instruction": "Review thoroughly",
            }
            result = await adapter.evaluate(batch, candidate)
            assert len(result.outputs) == 1
            assert len(result.scores) == 1
            ```

            With trace capture:

            ```python
            result = await adapter.evaluate(batch, candidate, capture_traces=True)
            assert result.trajectories is not None
            assert len(result.trajectories) == len(batch)
            ```

        Note:
            Orchestrates evaluation by applying candidate components to agents,
            building a SequentialAgent pipeline with cloned agents, then restoring
            original agent state. Primary agent's output is scored.

            Uses try/finally to ensure agents are restored even on evaluation errors,
            preventing state corruption between candidate evaluations.
        """
        self._logger.info(
            "adapter.evaluate.start",
            batch_size=len(batch),
            capture_traces=capture_traces,
            evolution_id=candidate.get("evolution_id", "unknown"),
        )

        # Handle empty batch case
        if not batch:
            self._logger.info("adapter.evaluate.complete", batch_size=0)
            return EvaluationBatch(outputs=[], scores=[], trajectories=None)

        # Apply candidate components to agents, track originals for restoration
        # This applies all component types (instruction, output_schema,
        # generate_content_config) via their handlers per FR-003
        originals = self._apply_candidate(candidate)

        try:
            # Build pipeline with candidate instructions (if sharing session)
            # For isolated sessions, we'll clone agents with candidate instructions
            # Note: _build_pipeline clones agents; non-instruction components are
            # inherited from the (now-modified) original agents
            if self.share_session:
                pipeline = self._build_pipeline(candidate)
            else:
                pipeline = None

            primary_agent = self.agents[self.primary]

            # Create semaphore for concurrency control
            semaphore = asyncio.Semaphore(5)  # Default max concurrent evals

            # Create tasks for all examples
            tasks = [
                self._eval_single_with_semaphore(
                    example=example,
                    example_index=i,
                    pipeline=pipeline,
                    primary_agent=primary_agent,
                    candidate=candidate,
                    capture_traces=capture_traces,
                    semaphore=semaphore,
                )
                for i, example in enumerate(batch)
            ]

            # Execute all tasks in parallel with exception handling
            results = await asyncio.gather(*tasks, return_exceptions=True)

            # Process results and maintain order
            outputs: list[str] = []
            scores: list[float] = []
            trajectories: list[MultiAgentTrajectory] | None = (
                [] if capture_traces else None
            )
            metadata_list: list[dict[str, Any]] = []
            inputs: list[str] = []

            successful = 0
            failed = 0

            for i, result in enumerate(results):
                if isinstance(result, Exception):
                    # Handle exception case
                    self._logger.warning(
                        "adapter.evaluate.example.error",
                        example_index=i,
                        error=str(result),
                    )
                    outputs.append("")
                    scores.append(0.0)
                    metadata_list.append({})
                    inputs.append(batch[i].get("input", "") if i < len(batch) else "")
                    failed += 1

                    if capture_traces:
                        error_trajectory = MultiAgentTrajectory(
                            agent_trajectories={},
                            pipeline_output="",
                            total_token_usage=None,
                            error=str(result),
                        )
                        trajectories.append(error_trajectory)  # type: ignore
                else:
                    # Unpack success case: (output, score, trajectory, metadata, input_text)
                    output_text, score, trajectory, metadata, input_text = result  # type: ignore[misc]
                    outputs.append(output_text)
                    scores.append(score)
                    metadata_list.append(metadata or {})
                    inputs.append(input_text or "")
                    successful += 1

                    if capture_traces and trajectory is not None:
                        trajectories.append(trajectory)  # type: ignore

            avg_score = sum(scores) / len(scores) if scores else 0.0

            self._logger.info(
                "adapter.evaluate.complete",
                batch_size=len(batch),
                successful=successful,
                failed=failed,
                avg_score=avg_score,
            )

            return EvaluationBatch(
                outputs=outputs,
                scores=scores,
                trajectories=trajectories,
                metadata=metadata_list,
                inputs=inputs,
            )
        finally:
            # Restore all agents to original state per FR-004
            # Uses best-effort restoration: attempts all components even if some fail
            self._restore_agents(originals)

    async def _eval_single_with_semaphore(
        self,
        example: dict[str, Any],
        example_index: int,
        pipeline: AnyAgentType | None,
        primary_agent: LlmAgent,
        candidate: dict[str, str],
        capture_traces: bool,
        semaphore: asyncio.Semaphore,
    ) -> tuple[str, float, MultiAgentTrajectory | None, dict[str, Any] | None, str]:
        """Evaluate a single example with semaphore-controlled concurrency.

        Args:
            example: Input example with "input" key and optional "expected" key.
            example_index: Index of example in batch (for logging).
            pipeline: Workflow pipeline to execute (if share_session=True).
            primary_agent: Primary agent for output extraction.
            candidate: Candidate instructions to apply (for isolated sessions).
            capture_traces: Whether to capture execution traces.
            semaphore: Semaphore to control concurrent execution.

        Returns:
            Tuple of (output_text, score, trajectory_or_none, metadata, input_text).
            On failure, returns ("", 0.0, error_trajectory, None, input_text).

        Note:
            Orchestrates single example evaluation with semaphore-controlled concurrency.
        """
        async with semaphore:
            self._logger.debug(
                "adapter.evaluate.example",
                example_index=example_index,
                example_input=example.get("input", "")[:50],
            )

            try:
                # Run the pipeline for this example
                output_text: str
                session_state: dict[str, Any] = {}

                if capture_traces:
                    result = await self._run_single_example(
                        example, pipeline, candidate, capture_events=True
                    )
                    # When capture_events=True, result is a 3-tuple of
                    # (output_text, events, session_state).
                    output_text, events, session_state = result  # type: ignore[misc]
                    # Build trajectory from collected events
                    trajectory = self._build_trajectory(
                        events=list(events),
                        final_output=output_text,
                        session_state=session_state,
                        error=None,
                    )
                else:
                    # When capture_events=False, result is a 2-tuple of
                    # (output_text, session_state).
                    run_result = await self._run_single_example(
                        example, pipeline, candidate, capture_events=False
                    )
                    output_text, session_state = run_result  # type: ignore[misc]
                    trajectory = None

                # Extract primary agent output
                primary_output = self._extract_primary_output(
                    output_text, session_state, primary_agent
                )

                # Score the output
                input_text = example.get("input", "")
                expected = example.get("expected")
                metadata: dict[str, Any] | None = None
                if self.scorer:
                    score_result = await self.scorer.async_score(
                        input_text, primary_output, expected
                    )
                    # Handle both float and tuple[float, dict] return types
                    # The dict contains metadata like feedback, dimension_scores, etc.
                    if isinstance(score_result, tuple):
                        score = score_result[0]
                        metadata = score_result[1] if len(score_result) > 1 else None
                    else:
                        score = float(score_result)
                else:
                    # Simple schema-based scoring fallback when no external scorer is provided.
                    # If an expected value is given, return 1.0 on exact match of the primary
                    # output and 0.0 otherwise; if no expected is provided, return 0.0.
                    if expected is None:
                        score = 0.0
                    else:
                        score = 1.0 if primary_output == expected else 0.0

                return (primary_output, score, trajectory, metadata, input_text)

            except Exception as e:
                # Wrap in domain exception per ADR-009 (preserve batch resilience)
                wrapped = EvaluationError(
                    f"Example {example_index} evaluation failed",
                    cause=e,
                    example_index=example_index,
                )

                self._logger.warning(
                    "adapter.evaluate.example.error",
                    example_index=example_index,
                    error=str(wrapped),
                )

                # Create error trajectory if capturing traces
                error_trajectory = None
                if capture_traces:
                    error_trajectory = MultiAgentTrajectory(
                        agent_trajectories={},
                        pipeline_output="",
                        total_token_usage=None,
                        error=str(wrapped),
                    )

                return ("", 0.0, error_trajectory, None, example.get("input", ""))

    async def _run_single_example(
        self,
        example: dict[str, Any],
        pipeline: AnyAgentType | None,
        candidate: dict[str, str],
        capture_events: bool = False,
    ) -> tuple[str, list[Any], dict[str, Any]] | tuple[str, dict[str, Any]]:
        """Execute pipeline on a single input example.

        Args:
            example: Input example with "input" key.
            pipeline: Workflow pipeline to execute (if share_session=True).
                      If None, executes agents independently (share_session=False).
            candidate: Candidate instructions to apply (for isolated sessions).
            capture_events: If True, return (output, events, state) tuple.
                           If False, return (output, state) tuple.

        Returns:
            If capture_events=True: (final_output, event_list, session_state) tuple
            If capture_events=False: (final_output, session_state) tuple

        Raises:
            RuntimeError: If pipeline execution fails.

        Note:
            Orchestrates execution based on session sharing mode. When
            share_session=False, executes agents independently with isolated
            sessions. When share_session=True, uses SequentialAgent pipeline.
            Streams events via ADK Runner pattern with async iteration.
        """
        input_text = example.get("input", "")
        if not input_text:
            return ("", [], {}) if capture_events else ("", {})

        if self.share_session:
            # Use SequentialAgent pipeline (shared session)
            if pipeline is None:
                raise RuntimeError("Pipeline required for shared session mode")
            return await self._run_shared_session(
                input_text,
                pipeline,
                capture_events,
            )
        else:
            # Execute agents independently (isolated sessions)
            return await self._run_isolated_sessions(
                input_text, candidate, capture_events
            )

    async def _run_shared_session(
        self,
        input_text: str,
        pipeline: AnyAgentType,
        capture_events: bool,
    ) -> tuple[str, list[Any], dict[str, Any]] | tuple[str, dict[str, Any]]:
        """Execute agents with shared session state via workflow pipeline.

        Args:
            input_text: Input text for the pipeline.
            pipeline: Workflow pipeline to execute.
            capture_events: Whether to capture events.

        Returns:
            Tuple of (output, events, state) or (output, state).

        Note:
            Uses unified AgentExecutor when available (FR-002), otherwise
            falls back to legacy execution via direct Runner calls.
        """
        # Use executor if available (FR-002)
        if self._executor is not None:
            from gepa_adk.ports.agent_executor import ExecutionStatus

            result = await self._executor.execute_agent(
                agent=pipeline,
                input_text=input_text,
            )

            if result.status == ExecutionStatus.FAILED:
                # Log failure and return empty output
                self._logger.error(
                    "pipeline.execution.failed",
                    session_id=result.session_id,
                    error=result.error_message,
                )
                if capture_events:
                    return ("", result.captured_events or [], {})
                return ("", {})

            final_output = result.extracted_value or ""
            session_state: dict[str, Any] = {}

            # Retrieve session state from session service
            try:
                session = await self.session_service.get_session(
                    app_name=self.app_name,
                    user_id="eval_user",
                    session_id=result.session_id,
                )
                if session and hasattr(session, "state"):
                    session_state = dict(session.state)
            except (KeyError, AttributeError, TypeError) as exc:
                # Session state retrieval failed due to expected lookup/attribute issues.
                self._logger.debug(
                    "session_state.retrieval_failed",
                    session_id=result.session_id,
                    error=str(exc),
                )

            if capture_events:
                return (final_output, result.captured_events or [], session_state)
            return (final_output, session_state)

        # Legacy execution path (no executor)
        from google.genai import types

        runner = Runner(
            agent=pipeline,
            app_name=self.app_name,
            session_service=self.session_service,
        )

        content = types.Content(
            role="user",
            parts=[types.Part(text=input_text)],
        )

        # Create session in session service first (required by ADK Runner)
        session = await self.session_service.create_session(
            app_name=self.app_name,
            user_id="eval_user",
        )
        session_id = session.id

        events: list[Any] = []
        session_state_legacy: dict[str, Any] = {}

        try:
            async for event in runner.run_async(
                user_id="eval_user",
                session_id=session_id,
                new_message=content,
            ):
                events.append(event)

                if hasattr(event, "session") and event.session:
                    if hasattr(event.session, "state"):
                        session_state_legacy = dict(event.session.state)  # type: ignore
        finally:
            self._cleanup_session(session_id)

        # Extract final output using shared utility (filters thought parts)
        final_output = extract_final_output(events)

        if capture_events:
            return (final_output, events, session_state_legacy)
        return (final_output, session_state_legacy)

    async def _run_isolated_sessions(
        self,
        input_text: str,
        candidate: dict[str, str],
        capture_events: bool,
    ) -> tuple[str, list[Any], dict[str, Any]] | tuple[str, dict[str, Any]]:
        """Execute agents independently with isolated sessions.

        Args:
            input_text: Input text for the first agent.
            candidate: Candidate component values using qualified names.
            capture_events: Whether to capture events.

        Returns:
            Tuple of (output, events, state) or (output, state).
            Returns the primary agent's output.

        Note:
            Orchestrates independent execution of each agent with its own session.
            State is not shared between agents. The primary agent's output is returned.
            Agents are cloned with candidate instructions to avoid mutation.
        """
        from google.genai import types

        # Clone primary agent with candidate instruction using qualified name
        primary_agent = self.agents[self.primary]
        instruction_key = f"{self.primary}.instruction"
        if instruction_key in candidate:
            primary_agent = primary_agent.model_copy(
                update={"instruction": candidate[instruction_key]}
            )

        all_events: list[Any] = []
        session_state: dict[str, Any] = {}

        # Use executor if available (FR-002)
        if self._executor is not None:
            from gepa_adk.ports.agent_executor import ExecutionStatus

            result = await self._executor.execute_agent(
                agent=primary_agent,
                input_text=input_text,
            )

            if result.status == ExecutionStatus.FAILED:
                self._logger.warning(
                    "isolated_session_failed",
                    agent=primary_agent.name,
                    session_id=result.session_id,
                    error=result.error_message,
                )
                # Return empty output on failure
                if capture_events:
                    return ("", result.captured_events or [], {})
                return ("", {})

            final_output = result.extracted_value or ""

            # Retrieve session state from session service if session was created
            if result.session_id:
                try:
                    session = await self.session_service.get_session(
                        app_name=self.app_name,
                        user_id="eval_user",
                        session_id=result.session_id,
                    )
                    if session and hasattr(session, "state"):
                        session_state = dict(session.state)
                except KeyError:
                    # Session might not exist or might have been cleaned up.
                    self._logger.debug(
                        "session_state_unavailable",
                        app_name=self.app_name,
                        user_id="eval_user",
                        session_id=result.session_id,
                    )
                except ValueError:
                    # Session identifier or parameters may be invalid; ignore for evaluation.
                    self._logger.debug(
                        "session_state_invalid",
                        app_name=self.app_name,
                        user_id="eval_user",
                        session_id=result.session_id,
                    )

            if capture_events:
                return (final_output, result.captured_events or [], session_state)
            return (final_output, session_state)

        # Legacy execution path (no executor)
        # Execute primary agent with isolated session
        runner = Runner(
            agent=primary_agent,
            app_name=self.app_name,
            session_service=self.session_service,
        )

        content = types.Content(
            role="user",
            parts=[types.Part(text=input_text)],
        )

        # Create session in session service first (required by ADK Runner)
        session = await self.session_service.create_session(
            app_name=self.app_name,
            user_id="eval_user",
        )
        session_id = session.id

        try:
            async for event in runner.run_async(
                user_id="eval_user",
                session_id=session_id,
                new_message=content,
            ):
                all_events.append(event)

                if hasattr(event, "session") and event.session:
                    if hasattr(event.session, "state"):
                        session_state = dict(event.session.state)  # type: ignore
        finally:
            self._cleanup_session(session_id)

        # Extract final output using shared utility (filters thought parts)
        final_output = extract_final_output(all_events)

        if capture_events:
            return (final_output, all_events, session_state)
        return (final_output, session_state)

    def _aggregate_token_usage(
        self,
        agent_trajectories: dict[str, ADKTrajectory],
    ) -> TokenUsage | None:
        """Sum token usage across all agent trajectories.

        Args:
            agent_trajectories: Mapping of agent names to their trajectories.

        Returns:
            TokenUsage with summed totals if any agent has usage data,
            None if no usage data available.

        Note:
            Sums input, output, and total tokens across all agents to provide
            pipeline-level token consumption metrics.
        """
        total_input = 0
        total_output = 0
        total = 0
        has_usage = False

        for trajectory in agent_trajectories.values():
            if trajectory.token_usage:
                has_usage = True
                total_input += trajectory.token_usage.input_tokens
                total_output += trajectory.token_usage.output_tokens
                total += trajectory.token_usage.total_tokens

        if has_usage:
            return TokenUsage(
                input_tokens=total_input,
                output_tokens=total_output,
                total_tokens=total,
            )
        return None

    def _build_trajectory(
        self,
        events: list[Any],
        final_output: str,
        session_state: dict[str, Any],
        error: str | None = None,
    ) -> MultiAgentTrajectory:
        """Assemble complete multi-agent trajectory from event stream.

        Partitions events by originating agent (using event.author) and builds
        individual ADKTrajectory objects for each agent. This enables fine-grained
        observability into which sub-agent contributed which tool calls, state
        changes, and token usage.

        Args:
            events: List of ADK Event objects collected during execution.
            final_output: The final text response from the pipeline.
            session_state: Session state dictionary from execution.
            error: Error message if execution failed, None otherwise.

        Returns:
            MultiAgentTrajectory with per-agent trajectories, pipeline output,
            aggregated token usage, and error (if any).

        Note:
            Organizes trajectory data by extracting individual agent trajectories
            from events and aggregating token usage across all agents.
        """
        # Partition events by originating agent
        partitions = partition_events_by_agent(events)

        # Build per-agent trajectories
        if not partitions:
            # If no agent-authored events were found, fall back to a minimal
            # trajectory for the primary agent using the full event list.
            agent_trajectories: dict[str, ADKTrajectory] = {
                self.primary: extract_trajectory(
                    events=events,
                    final_output=final_output,
                    error=error,
                    config=self.trajectory_config,
                )
            }
        else:
            agent_trajectories = {}
            for agent_name, agent_events in partitions.items():
                # Only primary agent gets final_output and error attribution
                agent_output = final_output if agent_name == self.primary else ""
                agent_error = error if agent_name == self.primary else None

                agent_trajectories[agent_name] = extract_trajectory(
                    events=agent_events,
                    final_output=agent_output,
                    error=agent_error,
                    config=self.trajectory_config,
                )

        # Aggregate token usage across all agents
        total_token_usage = self._aggregate_token_usage(agent_trajectories)

        return MultiAgentTrajectory(
            agent_trajectories=agent_trajectories,
            pipeline_output=final_output,
            total_token_usage=total_token_usage,
            error=error,
        )

    async def make_reflective_dataset(
        self,
        candidate: dict[str, str],
        eval_batch: EvaluationBatch[MultiAgentTrajectory, str],
        components_to_update: list[str],
    ) -> Mapping[str, Sequence[Mapping[str, Any]]]:
        """Build reflective datasets from evaluation results with traces.

        Args:
            candidate: Current candidate component values.
            eval_batch: Evaluation results including trajectories.
            components_to_update: List of component names to generate
                datasets for.

        Returns:
            Mapping from component name to sequence of reflection examples.
            Each example contains input, output, score, and trace context.

        Examples:
            Generate reflection dataset:

            ```python
            result = await adapter.evaluate(batch, candidate, capture_traces=True)
            dataset = await adapter.make_reflective_dataset(
                candidate, result, ["generator_instruction", "critic_instruction"]
            )
            assert "generator_instruction" in dataset
            ```

        Note:
            Operates on eval_batch trajectories (capture_traces=True required).
            Dataset format is compatible with MutationProposer interface.
        """
        self._logger.info(
            "adapter.make_reflective_dataset.start",
            num_examples=len(eval_batch.outputs),
            components=components_to_update,
        )

        # Build reflective dataset for each requested component
        result: dict[str, list[dict[str, Any]]] = {}

        for component in components_to_update:
            examples: list[dict[str, Any]] = []

            for i, (output, score) in enumerate(
                zip(eval_batch.outputs, eval_batch.scores, strict=True)
            ):
                # Get trajectory info if available
                trajectory = None
                if eval_batch.trajectories and i < len(eval_batch.trajectories):
                    trajectory = eval_batch.trajectories[i]

                # Get metadata (critic feedback) if available
                metadata = None
                if eval_batch.metadata and i < len(eval_batch.metadata):
                    metadata = eval_batch.metadata[i]

                # Get input text if available
                input_text = None
                if eval_batch.inputs and i < len(eval_batch.inputs):
                    input_text = eval_batch.inputs[i]

                example = self._build_reflection_example(
                    input_text=input_text,
                    output=output,
                    score=score,
                    trajectory=trajectory,
                    metadata=metadata,
                    component_name=component,
                    component_value=candidate.get(component, ""),
                )
                examples.append(example)

            result[component] = examples

        self._logger.info(
            "adapter.make_reflective_dataset.complete",
            num_components=len(result),
            total_examples=sum(len(exs) for exs in result.values()),
        )

        return result

    def _build_reflection_example(
        self,
        input_text: str | None,
        output: str,
        score: float,
        trajectory: MultiAgentTrajectory | None,
        metadata: dict[str, Any] | None,
        component_name: str,
        component_value: str,
    ) -> dict[str, Any]:
        """Build a single trial record for reflection.

        Delegates to shared TrialBuilder for consistent trial structure.

        Terminology:
            - trial: One performance record {feedback, trajectory}
            - feedback: Critic evaluation {score, feedback_text, ...}
            - trajectory: The journey from input to output with optional trace

        Args:
            input_text: The input that was given to the pipeline.
            output: Pipeline output text.
            score: Evaluation score for this output.
            trajectory: Optional trajectory with execution trace.
            metadata: Optional scorer metadata dict (e.g., from CriticScorer)
                containing 'feedback', 'dimension_scores', 'actionable_guidance'.
            component_name: Name of the component being evaluated.
            component_value: Current value of the component.

        Returns:
            Trial dict with keys: feedback, trajectory.
            - feedback: score (mandatory), feedback_text (mandatory if available)
            - trajectory: input, output (mandatory), component context

        Note:
            Aligns with whitepaper requirements: score + feedback_text are the
            minimum required fields. Extra metadata (dimensions, guidance) is
            passed through when available.
        """
        # Extract error from trajectory if present
        error = trajectory.error if trajectory else None

        # Build extra trajectory fields specific to multi-agent pipelines
        extra_trajectory: dict[str, Any] = {
            "component": component_name,
            "component_value": component_value,
        }

        # Add token usage from trajectory if available
        if trajectory and trajectory.total_token_usage:
            extra_trajectory["tokens"] = trajectory.total_token_usage.total_tokens

        return self._trial_builder.build_trial(
            input_text=input_text,
            output=output,
            score=score,
            metadata=metadata,
            error=error,
            extra_trajectory=extra_trajectory,
        )

    async def propose_new_texts(
        self,
        candidate: dict[str, str],
        reflective_dataset: Mapping[str, Sequence[Mapping[str, Any]]],
        components_to_update: list[str],
    ) -> dict[str, str]:
        """Propose new component texts based on reflective dataset.

        Delegates to AsyncReflectiveMutationProposer to generate improved
        instruction text via LLM reflection. When the proposer returns None
        (empty dataset), falls back to unchanged candidate values.

        Args:
            candidate: Current candidate component values.
            reflective_dataset: Dataset from make_reflective_dataset(), keyed by
                component name with sequences of reflection examples in the format
                produced by build_reflection_example().
            components_to_update: Components to generate proposals for.

        Returns:
            Dictionary mapping component names to proposed new text values.
            When proposer returns None, returns unchanged candidate values.

        Examples:
            Propose new component texts via LLM reflection:

            ```python
            # After evaluation with traces
            result = await adapter.evaluate(batch, candidate, capture_traces=True)
            dataset = await adapter.make_reflective_dataset(
                candidate, result, ["generator_instruction", "critic_instruction"]
            )

            # Propose new texts via LLM reflection
            proposals = await adapter.propose_new_texts(
                candidate,
                dataset,
                ["generator_instruction", "critic_instruction"],
            )
            # proposals contains improved instructions based on feedback
            ```

        Note:
            Delegates to AsyncReflectiveMutationProposer for actual mutation
            generation. Falls back gracefully when dataset is empty.
        """
        self._logger.debug(
            "propose_new_texts.delegating",
            components_requested=components_to_update,
        )

        result = await self._proposer.propose(
            candidate, reflective_dataset, components_to_update
        )

        if result is None:
            self._logger.info(
                "propose_new_texts.fallback",
                reason="proposer_returned_none",
                components_requested=components_to_update,
            )
            return {
                component: candidate.get(component, "")
                for component in components_to_update
            }

        # Merge with candidate for any missing components
        merged = {
            component: result.get(component, candidate.get(component, ""))
            for component in components_to_update
        }

        # Log proposed texts for each component
        for component, proposed_text in result.items():
            self._logger.info(
                "proposal.text",
                component=component,
                proposed_length=len(proposed_text),
                proposed_preview=proposed_text[:300] + "..."
                if len(proposed_text) > 300
                else proposed_text,
            )

        self._logger.info(
            "propose_new_texts.complete",
            components_proposed=list(result.keys()),
            components_requested=components_to_update,
        )

        return merged

__init__

__init__(
    agents: dict[str, LlmAgent],
    primary: str,
    components: ComponentsMapping,
    scorer: Scorer | None = None,
    share_session: bool = True,
    session_service: BaseSessionService | None = None,
    app_name: str = "multi_agent_eval",
    trajectory_config: TrajectoryConfig | None = None,
    proposer: AsyncReflectiveMutationProposer | None = None,
    executor: AgentExecutorProtocol | None = None,
    workflow: AnyAgentType | None = None,
) -> None

Initialize the MultiAgent adapter with named agents and component config.

PARAMETER DESCRIPTION
agents

Named ADK agents to evolve together. Must have at least one agent. Keys are agent names, values are LlmAgent instances.

TYPE: dict[str, LlmAgent]

primary

Name of the agent whose output is used for scoring. Must match one of the agent names in the dict.

TYPE: str

components

Per-agent component configuration mapping agent names to lists of component names to evolve. All agents must have an entry (use an empty list to exclude an agent from evolution). Component names must have registered handlers.

TYPE: ComponentsMapping

scorer

Optional scorer implementation. If None, the primary agent must have an output_schema for schema-based scoring.

TYPE: Scorer | None DEFAULT: None

share_session

Whether agents share session state during execution. When True (default), uses SequentialAgent. When False, agents execute with isolated sessions.

TYPE: bool DEFAULT: True

session_service

Optional session service for state management. If None, creates an InMemorySessionService.

TYPE: BaseSessionService | None DEFAULT: None

app_name

Application name for session identification.

TYPE: str DEFAULT: 'multi_agent_eval'

trajectory_config

Configuration for trajectory extraction behavior. If None, uses TrajectoryConfig defaults.

TYPE: TrajectoryConfig | None DEFAULT: None

proposer

Mutation proposer for generating improved instructions via LLM reflection. Required. Create using create_adk_reflection_fn() with an ADK LlmAgent for reflection.

TYPE: AsyncReflectiveMutationProposer | None DEFAULT: None

executor

Optional unified executor for consistent agent execution. If None, uses legacy execution path with direct Runner calls. When provided, all agent executions use the executor's execute_agent method for consistent session management and feature parity (FR-001).

TYPE: AgentExecutorProtocol | None DEFAULT: None

workflow

Optional original workflow structure to preserve during cloning. When provided, _build_pipeline() uses clone_workflow_with_overrides() to preserve workflow type (LoopAgent iterations, ParallelAgent concurrency). When None, creates a flat SequentialAgent (legacy behavior).

TYPE: AnyAgentType | None DEFAULT: None

RAISES DESCRIPTION
MultiAgentValidationError

If the agents dict is empty, the primary agent is not found, or no scorer is provided and the primary agent lacks an output_schema.

ValueError

If proposer is not provided, or if the components mapping names unknown agents, references unregistered component handlers, or is missing entries for agents in the agents dict.

Examples:

With per-agent components (API v0.3.x):

from gepa_adk.engine import (
    create_adk_reflection_fn,
    AsyncReflectiveMutationProposer,
)

reflection_fn = create_adk_reflection_fn(reflection_agent, executor)
proposer = AsyncReflectiveMutationProposer(adk_reflection_fn=reflection_fn)
adapter = MultiAgentAdapter(
    agents={"generator": gen, "critic": critic},
    primary="generator",
    components={
        "generator": ["instruction", "output_schema"],
        "critic": ["instruction"],
    },
    scorer=scorer,
    proposer=proposer,
)

Excluding an agent from evolution:

adapter = MultiAgentAdapter(
    agents={"generator": gen, "validator": val},
    primary="generator",
    components={
        "generator": ["instruction"],
        "validator": [],  # Excluded from evolution
    },
    scorer=scorer,
    proposer=proposer,
)
Note

Clones agents during evaluation to apply candidate instructions. Original agents are never mutated.
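
Scorer-less construction relies on the exact-match fallback: when no scorer is given, the primary agent must declare an output_schema, and each example's primary output is scored 1.0 only if it exactly equals the "expected" value (0.0 otherwise). A minimal sketch of that configuration; the Answer schema is a hypothetical placeholder and the proposer is assumed to exist as in the examples above:

from pydantic import BaseModel
from google.adk.agents import LlmAgent


class Answer(BaseModel):
    # Hypothetical output schema used only for illustration
    text: str


generator = LlmAgent(
    name="generator",
    model="gemini-2.5-flash",
    instruction="Answer concisely",
    output_schema=Answer,
)

adapter = MultiAgentAdapter(
    agents={"generator": generator},
    primary="generator",
    components={"generator": ["instruction"]},
    proposer=proposer,  # scorer omitted: exact-match fallback against "expected"
)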

Source code in src/gepa_adk/adapters/multi_agent.py
def __init__(
    self,
    agents: dict[str, LlmAgent],
    primary: str,
    components: ComponentsMapping,
    scorer: Scorer | None = None,
    share_session: bool = True,
    session_service: BaseSessionService | None = None,
    app_name: str = "multi_agent_eval",
    trajectory_config: TrajectoryConfig | None = None,
    proposer: AsyncReflectiveMutationProposer | None = None,
    executor: AgentExecutorProtocol | None = None,
    workflow: AnyAgentType | None = None,
) -> None:
    """Initialize the MultiAgent adapter with named agents and component config.

    Args:
        agents: Named ADK agents to evolve together. Must have at least
            one agent. Keys are agent names, values are LlmAgent instances.
        primary: Name of the agent whose output is used for scoring.
            Must match one of the agent names in the dict.
        components: Per-agent component configuration mapping agent names
            to lists of component names to evolve. All agents must have
            an entry (use empty list to exclude from evolution). Component
            names must have registered handlers.
        scorer: Optional scorer implementation. If None, the primary agent
            must have an output_schema for schema-based scoring.
        share_session: Whether agents share session state during execution.
            When True (default), uses SequentialAgent. When False, agents
            execute with isolated sessions.
        session_service: Optional session service for state management.
            If None, creates an InMemorySessionService.
        app_name: Application name for session identification.
        trajectory_config: Configuration for trajectory extraction behavior.
            If None, uses TrajectoryConfig defaults.
        proposer: Mutation proposer for generating improved instructions
            via LLM reflection. Required. Create using `create_adk_reflection_fn()`
            with an ADK LlmAgent for reflection.
        executor: Optional unified executor for consistent agent execution.
            If None, uses legacy execution path with direct Runner calls.
            When provided, all agent executions use the executor's execute_agent
            method for consistent session management and feature parity (FR-001).
        workflow: Optional original workflow structure to preserve during
            cloning. When provided, _build_pipeline() uses
            clone_workflow_with_overrides() to preserve workflow type
            (LoopAgent iterations, ParallelAgent concurrency). When None,
            creates a flat SequentialAgent (legacy behavior).

    Raises:
        MultiAgentValidationError: If agents dict is empty, primary agent
            not found, or no scorer and primary lacks output_schema.
        ValueError: If proposer is not provided, or if components mapping
            contains unknown agents, unknown component handlers, or is
            missing entries for agents in the agents dict.

    Examples:
        With per-agent components (API v0.3.x):

        ```python
        from gepa_adk.engine import (
            create_adk_reflection_fn,
            AsyncReflectiveMutationProposer,
        )

        reflection_fn = create_adk_reflection_fn(reflection_agent, executor)
        proposer = AsyncReflectiveMutationProposer(adk_reflection_fn=reflection_fn)
        adapter = MultiAgentAdapter(
            agents={"generator": gen, "critic": critic},
            primary="generator",
            components={
                "generator": ["instruction", "output_schema"],
                "critic": ["instruction"],
            },
            scorer=scorer,
            proposer=proposer,
        )
        ```

        Excluding an agent from evolution:

        ```python
        adapter = MultiAgentAdapter(
            agents={"generator": gen, "validator": val},
            primary="generator",
            components={
                "generator": ["instruction"],
                "validator": [],  # Excluded from evolution
            },
            scorer=scorer,
            proposer=proposer,
        )
        ```

    Note:
        Clones agents during evaluation to apply candidate instructions.
        Original agents are never mutated.
    """
    # Validation
    if not agents:
        raise MultiAgentValidationError(
            "agents dict cannot be empty",
            field="agents",
            value={},
            constraint="len >= 1",
        )

    agent_names = list(agents.keys())

    # Check primary agent exists
    if primary not in agent_names:
        raise MultiAgentValidationError(
            f"primary agent '{primary}' not found in agents dict",
            field="primary",
            value=primary,
            constraint=f"must be one of {agent_names}",
        )

    # Check scorer or output_schema
    primary_agent = agents[primary]
    if scorer is None and primary_agent.output_schema is None:
        raise MultiAgentValidationError(
            "no scorer and primary agent lacks output_schema",
            field="scorer",
            value=None,
            constraint="scorer must be provided or primary agent must have output_schema",
        )

    self.agents = agents
    self.components = components
    self.primary = primary
    self.scorer = scorer
    self.share_session = share_session
    self.session_service = session_service or InMemorySessionService()
    self.app_name = app_name
    self.trajectory_config = trajectory_config or TrajectoryConfig()
    self._workflow = workflow  # Store original workflow for structure preservation
    if proposer is None:
        raise ValueError(
            "proposer is required. Create one using create_adk_reflection_fn() "
            "with an ADK LlmAgent for reflection operations."
        )
    self._proposer = proposer
    self._executor = executor

    # Validate components mapping (fail-fast)
    self._validate_components()

    # Bind logger with adapter context (FR-008)
    self._logger = logger.bind(
        adapter="MultiAgentAdapter",
        primary_agent=self.primary,
        agent_count=len(self.agents),
        app_name=self.app_name,
        uses_executor=executor is not None,
    )

    # Initialize trial builder for reflective dataset construction
    self._trial_builder = TrialBuilder()

    self._logger.info("adapter.initialized")

evaluate async

evaluate(
    batch: list[dict[str, Any]],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[MultiAgentTrajectory, str]

Evaluate multi-agent pipeline with candidate component values over a batch.

PARAMETER DESCRIPTION
batch

List of input examples, each with an "input" key and an optional "expected" key for scoring.

TYPE: list[dict[str, Any]]

candidate

Qualified component name to text mapping. Keys should follow the pattern {agent_name}.{component_name} per ADR-012.

TYPE: dict[str, str]

capture_traces

Whether to capture execution traces (tool calls, state deltas, token usage).

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
EvaluationBatch[MultiAgentTrajectory, str]

EvaluationBatch containing outputs, scores, and optional trajectories.

Examples:

Basic evaluation without traces:

batch = [
    {"input": "Generate code...", "expected": "def foo(): ..."},
]
candidate = {
    "generator.instruction": "Generate high-quality code",
    "critic.instruction": "Review thoroughly",
}
result = await adapter.evaluate(batch, candidate)
assert len(result.outputs) == 1
assert len(result.scores) == 1

With trace capture:

result = await adapter.evaluate(batch, candidate, capture_traces=True)
assert result.trajectories is not None
assert len(result.trajectories) == len(batch)
Note

Orchestrates evaluation by applying candidate components to agents, building a SequentialAgent pipeline with cloned agents, then restoring original agent state. Primary agent's output is scored.

Uses try/finally to ensure agents are restored even on evaluation errors, preventing state corruption between candidate evaluations.
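
Failed examples do not abort the batch: they keep their position in the results with an empty output and a 0.0 score, and with capture_traces=True their trajectory carries the error message. A minimal sketch of inspecting failures after an evaluation (batch and candidate as in the examples above):

result = await adapter.evaluate(batch, candidate, capture_traces=True)

for i, trajectory in enumerate(result.trajectories or []):
    if trajectory.error is not None:
        # Failure slot: output is "" and score is 0.0
        print(f"example {i} failed: {trajectory.error}")
    else:
        print(f"example {i} scored {result.scores[i]:.2f}")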

Source code in src/gepa_adk/adapters/multi_agent.py
async def evaluate(
    self,
    batch: list[dict[str, Any]],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[MultiAgentTrajectory, str]:
    """Evaluate multi-agent pipeline with candidate component values over a batch.

    Args:
        batch: List of input examples, each with "input" key and optional
            "expected" key for scoring.
        candidate: Qualified component name to text mapping. Keys should
            follow the pattern `{agent_name}.{component_name}` per ADR-012.
        capture_traces: Whether to capture execution traces (tool calls,
            state deltas, token usage).

    Returns:
        EvaluationBatch containing outputs, scores, and optional trajectories.

    Examples:
        Basic evaluation without traces:

        ```python
        batch = [
            {"input": "Generate code...", "expected": "def foo(): ..."},
        ]
        candidate = {
            "generator.instruction": "Generate high-quality code",
            "critic.instruction": "Review thoroughly",
        }
        result = await adapter.evaluate(batch, candidate)
        assert len(result.outputs) == 1
        assert len(result.scores) == 1
        ```

        With trace capture:

        ```python
        result = await adapter.evaluate(batch, candidate, capture_traces=True)
        assert result.trajectories is not None
        assert len(result.trajectories) == len(batch)
        ```

    Note:
        Orchestrates evaluation by applying candidate components to agents,
        building a SequentialAgent pipeline with cloned agents, then restoring
        original agent state. Primary agent's output is scored.

        Uses try/finally to ensure agents are restored even on evaluation errors,
        preventing state corruption between candidate evaluations.
    """
    self._logger.info(
        "adapter.evaluate.start",
        batch_size=len(batch),
        capture_traces=capture_traces,
        evolution_id=candidate.get("evolution_id", "unknown"),
    )

    # Handle empty batch case
    if not batch:
        self._logger.info("adapter.evaluate.complete", batch_size=0)
        return EvaluationBatch(outputs=[], scores=[], trajectories=None)

    # Apply candidate components to agents, track originals for restoration
    # This applies all component types (instruction, output_schema,
    # generate_content_config) via their handlers per FR-003
    originals = self._apply_candidate(candidate)

    try:
        # Build pipeline with candidate instructions (if sharing session)
        # For isolated sessions, we'll clone agents with candidate instructions
        # Note: _build_pipeline clones agents; non-instruction components are
        # inherited from the (now-modified) original agents
        if self.share_session:
            pipeline = self._build_pipeline(candidate)
        else:
            pipeline = None

        primary_agent = self.agents[self.primary]

        # Create semaphore for concurrency control
        semaphore = asyncio.Semaphore(5)  # Default max concurrent evals

        # Create tasks for all examples
        tasks = [
            self._eval_single_with_semaphore(
                example=example,
                example_index=i,
                pipeline=pipeline,
                primary_agent=primary_agent,
                candidate=candidate,
                capture_traces=capture_traces,
                semaphore=semaphore,
            )
            for i, example in enumerate(batch)
        ]

        # Execute all tasks in parallel with exception handling
        results = await asyncio.gather(*tasks, return_exceptions=True)

        # Process results and maintain order
        outputs: list[str] = []
        scores: list[float] = []
        trajectories: list[MultiAgentTrajectory] | None = (
            [] if capture_traces else None
        )
        metadata_list: list[dict[str, Any]] = []
        inputs: list[str] = []

        successful = 0
        failed = 0

        for i, result in enumerate(results):
            if isinstance(result, Exception):
                # Handle exception case
                self._logger.warning(
                    "adapter.evaluate.example.error",
                    example_index=i,
                    error=str(result),
                )
                outputs.append("")
                scores.append(0.0)
                metadata_list.append({})
                inputs.append(batch[i].get("input", "") if i < len(batch) else "")
                failed += 1

                if capture_traces:
                    error_trajectory = MultiAgentTrajectory(
                        agent_trajectories={},
                        pipeline_output="",
                        total_token_usage=None,
                        error=str(result),
                    )
                    trajectories.append(error_trajectory)  # type: ignore
            else:
                # Unpack success case: (output, score, trajectory, metadata, input_text)
                output_text, score, trajectory, metadata, input_text = result  # type: ignore[misc]
                outputs.append(output_text)
                scores.append(score)
                metadata_list.append(metadata or {})
                inputs.append(input_text or "")
                successful += 1

                if capture_traces and trajectory is not None:
                    trajectories.append(trajectory)  # type: ignore

        avg_score = sum(scores) / len(scores) if scores else 0.0

        self._logger.info(
            "adapter.evaluate.complete",
            batch_size=len(batch),
            successful=successful,
            failed=failed,
            avg_score=avg_score,
        )

        return EvaluationBatch(
            outputs=outputs,
            scores=scores,
            trajectories=trajectories,
            metadata=metadata_list,
            inputs=inputs,
        )
    finally:
        # Restore all agents to original state per FR-004
        # Uses best-effort restoration: attempts all components even if some fail
        self._restore_agents(originals)

make_reflective_dataset async

make_reflective_dataset(
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[MultiAgentTrajectory, str],
    components_to_update: list[str],
) -> Mapping[str, Sequence[Mapping[str, Any]]]

Build reflective datasets from evaluation results with traces.

PARAMETER DESCRIPTION
candidate

Current candidate component values.

TYPE: dict[str, str]

eval_batch

Evaluation results including trajectories.

TYPE: EvaluationBatch[MultiAgentTrajectory, str]

components_to_update

List of component names to generate datasets for.

TYPE: list[str]

RETURNS DESCRIPTION
Mapping[str, Sequence[Mapping[str, Any]]]

Mapping from component name to sequence of reflection examples. Each example contains input, output, score, and trace context.

Examples:

Generate reflection dataset:

result = await adapter.evaluate(batch, candidate, capture_traces=True)
dataset = await adapter.make_reflective_dataset(
    candidate, result, ["generator_instruction", "critic_instruction"]
)
assert "generator_instruction" in dataset
Note

Operates on eval_batch trajectories (capture_traces=True required). Dataset format is compatible with MutationProposer interface.
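
Each entry in the returned mapping is a trial record assembled by TrialBuilder: a feedback block carrying at least the score (plus feedback_text and other scorer metadata when available) and a trajectory block carrying the input, output, and component context. A hedged sketch of consuming the dataset; nested keys beyond score, input, and output depend on the scorer:

result = await adapter.evaluate(batch, candidate, capture_traces=True)
dataset = await adapter.make_reflective_dataset(
    candidate, result, ["generator_instruction"]
)

for trial in dataset["generator_instruction"]:
    feedback = trial["feedback"]      # score, feedback_text when available
    trajectory = trial["trajectory"]  # input, output, component context
    print(feedback["score"], trajectory["output"][:80])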

Source code in src/gepa_adk/adapters/multi_agent.py
async def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[MultiAgentTrajectory, str],
    components_to_update: list[str],
) -> Mapping[str, Sequence[Mapping[str, Any]]]:
    """Build reflective datasets from evaluation results with traces.

    Args:
        candidate: Current candidate component values.
        eval_batch: Evaluation results including trajectories.
        components_to_update: List of component names to generate
            datasets for.

    Returns:
        Mapping from component name to sequence of reflection examples.
        Each example contains input, output, score, and trace context.

    Examples:
        Generate reflection dataset:

        ```python
        result = await adapter.evaluate(batch, candidate, capture_traces=True)
        dataset = await adapter.make_reflective_dataset(
            candidate, result, ["generator_instruction", "critic_instruction"]
        )
        assert "generator_instruction" in dataset
        ```

    Note:
        Operates on eval_batch trajectories (capture_traces=True required).
        Dataset format is compatible with MutationProposer interface.
    """
    self._logger.info(
        "adapter.make_reflective_dataset.start",
        num_examples=len(eval_batch.outputs),
        components=components_to_update,
    )

    # Build reflective dataset for each requested component
    result: dict[str, list[dict[str, Any]]] = {}

    for component in components_to_update:
        examples: list[dict[str, Any]] = []

        for i, (output, score) in enumerate(
            zip(eval_batch.outputs, eval_batch.scores, strict=True)
        ):
            # Get trajectory info if available
            trajectory = None
            if eval_batch.trajectories and i < len(eval_batch.trajectories):
                trajectory = eval_batch.trajectories[i]

            # Get metadata (critic feedback) if available
            metadata = None
            if eval_batch.metadata and i < len(eval_batch.metadata):
                metadata = eval_batch.metadata[i]

            # Get input text if available
            input_text = None
            if eval_batch.inputs and i < len(eval_batch.inputs):
                input_text = eval_batch.inputs[i]

            example = self._build_reflection_example(
                input_text=input_text,
                output=output,
                score=score,
                trajectory=trajectory,
                metadata=metadata,
                component_name=component,
                component_value=candidate.get(component, ""),
            )
            examples.append(example)

        result[component] = examples

    self._logger.info(
        "adapter.make_reflective_dataset.complete",
        num_components=len(result),
        total_examples=sum(len(exs) for exs in result.values()),
    )

    return result

propose_new_texts async

propose_new_texts(
    candidate: dict[str, str],
    reflective_dataset: Mapping[
        str, Sequence[Mapping[str, Any]]
    ],
    components_to_update: list[str],
) -> dict[str, str]

Propose new component texts based on reflective dataset.

Delegates to AsyncReflectiveMutationProposer to generate improved instruction text via LLM reflection. When the proposer returns None (empty dataset), falls back to unchanged candidate values.

PARAMETER DESCRIPTION
candidate

Current candidate component values.

TYPE: dict[str, str]

reflective_dataset

Dataset from make_reflective_dataset(), keyed by component name with sequences of reflection examples in the format produced by build_reflection_example().

TYPE: Mapping[str, Sequence[Mapping[str, Any]]]

components_to_update

Components to generate proposals for.

TYPE: list[str]

RETURNS DESCRIPTION
dict[str, str]

Dictionary mapping component names to proposed new text values. When proposer returns None, returns unchanged candidate values.

Examples:

Propose new component texts via LLM reflection:

# After evaluation with traces
result = await adapter.evaluate(batch, candidate, capture_traces=True)
dataset = await adapter.make_reflective_dataset(
    candidate, result, ["generator_instruction", "critic_instruction"]
)

# Propose new texts via LLM reflection
proposals = await adapter.propose_new_texts(
    candidate,
    dataset,
    ["generator_instruction", "critic_instruction"],
)
# proposals contains improved instructions based on feedback
Note

Delegates to AsyncReflectiveMutationProposer for actual mutation generation. Falls back gracefully when dataset is empty.
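
The fallback path returns the requested components unchanged, so callers can rely on every key in components_to_update being present in the result. A minimal sketch, assuming the proposer returns None for an empty reflective dataset as described above:

proposals = await adapter.propose_new_texts(
    candidate, {}, ["generator_instruction"]
)
assert proposals["generator_instruction"] == candidate.get("generator_instruction", "")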

Source code in src/gepa_adk/adapters/multi_agent.py
async def propose_new_texts(
    self,
    candidate: dict[str, str],
    reflective_dataset: Mapping[str, Sequence[Mapping[str, Any]]],
    components_to_update: list[str],
) -> dict[str, str]:
    """Propose new component texts based on reflective dataset.

    Delegates to AsyncReflectiveMutationProposer to generate improved
    instruction text via LLM reflection. When the proposer returns None
    (empty dataset), falls back to unchanged candidate values.

    Args:
        candidate: Current candidate component values.
        reflective_dataset: Dataset from make_reflective_dataset(), keyed by
            component name with sequences of reflection examples in the format
            produced by build_reflection_example().
        components_to_update: Components to generate proposals for.

    Returns:
        Dictionary mapping component names to proposed new text values.
        When proposer returns None, returns unchanged candidate values.

    Examples:
        Propose new component texts via LLM reflection:

        ```python
        # After evaluation with traces
        result = await adapter.evaluate(batch, candidate, capture_traces=True)
        dataset = await adapter.make_reflective_dataset(
            candidate, result, ["generator_instruction", "critic_instruction"]
        )

        # Propose new texts via LLM reflection
        proposals = await adapter.propose_new_texts(
            candidate,
            dataset,
            ["generator_instruction", "critic_instruction"],
        )
        # proposals contains improved instructions based on feedback
        ```

    Note:
        Delegates to AsyncReflectiveMutationProposer for actual mutation
        generation. Falls back gracefully when dataset is empty.
    """
    self._logger.debug(
        "propose_new_texts.delegating",
        components_requested=components_to_update,
    )

    result = await self._proposer.propose(
        candidate, reflective_dataset, components_to_update
    )

    if result is None:
        self._logger.info(
            "propose_new_texts.fallback",
            reason="proposer_returned_none",
            components_requested=components_to_update,
        )
        return {
            component: candidate.get(component, "")
            for component in components_to_update
        }

    # Merge with candidate for any missing components
    merged = {
        component: result.get(component, candidate.get(component, ""))
        for component in components_to_update
    }

    # Log proposed texts for each component
    for component, proposed_text in result.items():
        self._logger.info(
            "proposal.text",
            component=component,
            proposed_length=len(proposed_text),
            proposed_preview=proposed_text[:300] + "..."
            if len(proposed_text) > 300
            else proposed_text,
        )

    self._logger.info(
        "propose_new_texts.complete",
        components_proposed=list(result.keys()),
        components_requested=components_to_update,
    )

    return merged

TimeoutStopper

Stop evolution after a specified timeout.

Terminates evolution when wall-clock time exceeds the configured timeout duration. Useful for resource management and CI/CD pipelines.

ATTRIBUTE DESCRIPTION
timeout_seconds

Maximum wall-clock time for evolution.

TYPE: float

Examples:

Creating a 5-minute timeout:

from gepa_adk.adapters.stoppers import TimeoutStopper
from gepa_adk.domain.stopper import StopperState

stopper = TimeoutStopper(300.0)  # 5 minutes

state = StopperState(
    iteration=10,
    best_score=0.8,
    stagnation_counter=2,
    total_evaluations=50,
    candidates_count=3,
    elapsed_seconds=400.0,  # Exceeds timeout
)

stopper(state)  # Returns True (should stop)
Note

All timeout values must be positive. Zero and negative values raise ValueError to prevent invalid configurations.
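
Because a stopper is just a callable that accepts a StopperState and returns a bool, multiple stop conditions can be combined with a small wrapper. The any_stopper helper below is a hypothetical sketch, not a gepa_adk API:

from collections.abc import Callable

from gepa_adk.adapters.stoppers import TimeoutStopper
from gepa_adk.domain.stopper import StopperState


def any_stopper(
    *stoppers: Callable[[StopperState], bool],
) -> Callable[[StopperState], bool]:
    """Stop as soon as any wrapped condition requests a stop."""

    def check(state: StopperState) -> bool:
        return any(stopper(state) for stopper in stoppers)

    return check


combined = any_stopper(
    TimeoutStopper(600.0),                      # stop after 10 minutes
    lambda state: state.best_score >= 0.95,     # or once the score is good enough
)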

Source code in src/gepa_adk/adapters/stoppers/timeout.py
class TimeoutStopper:
    """Stop evolution after a specified timeout.

    Terminates evolution when wall-clock time exceeds the configured
    timeout duration. Useful for resource management and CI/CD pipelines.

    Attributes:
        timeout_seconds (float): Maximum wall-clock time for evolution.

    Examples:
        Creating a 5-minute timeout:

        ```python
        from gepa_adk.adapters.stoppers import TimeoutStopper
        from gepa_adk.domain.stopper import StopperState

        stopper = TimeoutStopper(300.0)  # 5 minutes

        state = StopperState(
            iteration=10,
            best_score=0.8,
            stagnation_counter=2,
            total_evaluations=50,
            candidates_count=3,
            elapsed_seconds=400.0,  # Exceeds timeout
        )

        stopper(state)  # Returns True (should stop)
        ```

    Note:
        All timeout values must be positive. Zero and negative values
        raise ValueError to prevent invalid configurations.
    """

    def __init__(self, timeout_seconds: float) -> None:
        """Initialize timeout stopper with maximum duration.

        Args:
            timeout_seconds: Maximum wall-clock time for evolution in seconds.
                Must be a positive value.

        Raises:
            ValueError: If timeout_seconds is zero or negative.

        Examples:
            ```python
            stopper = TimeoutStopper(60.0)  # 1 minute
            stopper = TimeoutStopper(3600.0)  # 1 hour
            ```

        Note:
            Consider using reasonable timeouts for your use case. Very
            short timeouts may not allow sufficient evolution progress.
        """
        if timeout_seconds <= 0:
            raise ValueError("timeout_seconds must be positive")
        self.timeout_seconds = timeout_seconds

    def __call__(self, state: StopperState) -> bool:
        """Check if evolution should stop due to timeout.

        Args:
            state: Current evolution state snapshot containing elapsed_seconds.

        Returns:
            True if elapsed time meets or exceeds timeout, False otherwise.

        Examples:
            ```python
            stopper = TimeoutStopper(60.0)

            # Not yet timed out
            state1 = StopperState(
                iteration=5,
                best_score=0.5,
                stagnation_counter=0,
                total_evaluations=25,
                candidates_count=1,
                elapsed_seconds=30.0,
            )
            stopper(state1)  # False

            # Timed out
            state2 = StopperState(
                iteration=10,
                best_score=0.7,
                stagnation_counter=1,
                total_evaluations=50,
                candidates_count=2,
                elapsed_seconds=65.0,
            )
            stopper(state2)  # True
            ```

        Note:
            Often called after each iteration. Returns True as soon as
            the time limit is reached.
        """
        return state.elapsed_seconds >= self.timeout_seconds

__init__

__init__(timeout_seconds: float) -> None

Initialize timeout stopper with maximum duration.

PARAMETER DESCRIPTION
timeout_seconds

Maximum wall-clock time for evolution in seconds. Must be a positive value.

TYPE: float

RAISES DESCRIPTION
ValueError

If timeout_seconds is zero or negative.

Examples:

stopper = TimeoutStopper(60.0)  # 1 minute
stopper = TimeoutStopper(3600.0)  # 1 hour
Note

Consider using reasonable timeouts for your use case. Very short timeouts may not allow sufficient evolution progress.

Source code in src/gepa_adk/adapters/stoppers/timeout.py
def __init__(self, timeout_seconds: float) -> None:
    """Initialize timeout stopper with maximum duration.

    Args:
        timeout_seconds: Maximum wall-clock time for evolution in seconds.
            Must be a positive value.

    Raises:
        ValueError: If timeout_seconds is zero or negative.

    Examples:
        ```python
        stopper = TimeoutStopper(60.0)  # 1 minute
        stopper = TimeoutStopper(3600.0)  # 1 hour
        ```

    Note:
        Consider using reasonable timeouts for your use case. Very
        short timeouts may not allow sufficient evolution progress.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    self.timeout_seconds = timeout_seconds

__call__

__call__(state: StopperState) -> bool

Check if evolution should stop due to timeout.

PARAMETER DESCRIPTION
state

Current evolution state snapshot containing elapsed_seconds.

TYPE: StopperState

RETURNS DESCRIPTION
bool

True if elapsed time meets or exceeds timeout, False otherwise.

Examples:

stopper = TimeoutStopper(60.0)

# Not yet timed out
state1 = StopperState(
    iteration=5,
    best_score=0.5,
    stagnation_counter=0,
    total_evaluations=25,
    candidates_count=1,
    elapsed_seconds=30.0,
)
stopper(state1)  # False

# Timed out
state2 = StopperState(
    iteration=10,
    best_score=0.7,
    stagnation_counter=1,
    total_evaluations=50,
    candidates_count=2,
    elapsed_seconds=65.0,
)
stopper(state2)  # True
Note

Often called after each iteration. Returns True as soon as the time limit is reached.

Source code in src/gepa_adk/adapters/stoppers/timeout.py
def __call__(self, state: StopperState) -> bool:
    """Check if evolution should stop due to timeout.

    Args:
        state: Current evolution state snapshot containing elapsed_seconds.

    Returns:
        True if elapsed time meets or exceeds timeout, False otherwise.

    Examples:
        ```python
        stopper = TimeoutStopper(60.0)

        # Not yet timed out
        state1 = StopperState(
            iteration=5,
            best_score=0.5,
            stagnation_counter=0,
            total_evaluations=25,
            candidates_count=1,
            elapsed_seconds=30.0,
        )
        stopper(state1)  # False

        # Timed out
        state2 = StopperState(
            iteration=10,
            best_score=0.7,
            stagnation_counter=1,
            total_evaluations=50,
            candidates_count=2,
            elapsed_seconds=65.0,
        )
        stopper(state2)  # True
        ```

    Note:
        Often called after each iteration. Returns True as soon as
        the time limit is reached.
    """
    return state.elapsed_seconds >= self.timeout_seconds

TrialBuilder

Build trial records for reflection datasets.

Constructs consistent trial structures following the GEPA whitepaper format. Extracts feedback fields from scorer metadata and builds trajectory dicts.

ATTRIBUTE DESCRIPTION
_logger

Logger for metadata passthrough debugging.

TYPE: BoundLogger

Examples:

Basic trial building:

builder = TrialBuilder()

# With minimal data
trial = builder.build_trial(
    input_text="Hello",
    output="Hi there!",
    score=0.8,
)

# With full metadata
trial = builder.build_trial(
    input_text="Explain AI",
    output="AI is...",
    score=0.9,
    metadata={
        "feedback": "Clear explanation",
        "dimension_scores": {"clarity": 0.95},
        "actionable_guidance": "Add examples",
    },
    extra_trajectory={"component": "instruction"},
)
See Also
  • [build_feedback]: Build just the feedback dict.
Note

All optional metadata fields are validated before inclusion to prevent malformed data from propagating to reflection prompts.
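
To make the trial shape concrete, a minimal sketch (input and score values illustrative; keys follow the build_trial source below):

builder = TrialBuilder()
trial = builder.build_trial(input_text="Hi", output="Hello!", score=0.7)

# Two top-level keys, per build_trial:
# trial["feedback"]   -> {"score": 0.7, "feedback_text": ""}
# trial["trajectory"] -> {"output": "Hello!", "input": "Hi"}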

Source code in src/gepa_adk/adapters/trial_builder.py
class TrialBuilder:
    """Build trial records for reflection datasets.

    Constructs consistent trial structures following the GEPA whitepaper format.
    Extracts feedback fields from scorer metadata and builds trajectory dicts.

    Attributes:
        _logger (structlog.BoundLogger): Logger for metadata passthrough debugging.

    Examples:
        Basic trial building:

        ```python
        builder = TrialBuilder()

        # With minimal data
        trial = builder.build_trial(
            input_text="Hello",
            output="Hi there!",
            score=0.8,
        )

        # With full metadata
        trial = builder.build_trial(
            input_text="Explain AI",
            output="AI is...",
            score=0.9,
            metadata={
                "feedback": "Clear explanation",
                "dimension_scores": {"clarity": 0.95},
                "actionable_guidance": "Add examples",
            },
            extra_trajectory={"component": "instruction"},
        )
        ```

    See Also:
        - [`build_feedback`][gepa_adk.adapters.trial_builder.TrialBuilder.build_feedback]:
          Build just the feedback dict.

    Note:
        All optional metadata fields are validated before inclusion to prevent
        malformed data from propagating to reflection prompts.
    """

    def __init__(self) -> None:
        """Initialize the TrialBuilder.

        Examples:
            ```python
            builder = TrialBuilder()
            ```

        Note:
            Creates a module-scoped logger for metadata passthrough debugging.
        """
        self._logger = structlog.get_logger(__name__)

    def build_feedback(
        self,
        score: float,
        metadata: dict[str, Any] | None = None,
        *,
        error: str | None = None,
        log_passthrough: bool = False,
    ) -> dict[str, Any]:
        """Build the feedback dict from score and metadata.

        Extracts and validates feedback fields from scorer metadata, including
        feedback_text, actionable_guidance, and dimension_scores.

        Args:
            score: Evaluation score (mandatory).
            metadata: Optional scorer metadata dict containing:
                - feedback: Text feedback from critic.
                - actionable_guidance: Improvement suggestions.
                - dimension_scores: Per-dimension score breakdown.
            error: Optional error message to include in feedback.
            log_passthrough: If True, log debug info about metadata extraction.

        Returns:
            Feedback dict with keys:
                - score (mandatory): The evaluation score.
                - feedback_text (if available): Text feedback from critic.
                - actionable_guidance (if available): Improvement suggestions.
                - dimension_scores (if available): Dimension score breakdown.
                - error (if provided): Error message from execution.

        Examples:
            ```python
            builder = TrialBuilder()

            # Minimal feedback
            feedback = builder.build_feedback(0.75)
            assert feedback == {"score": 0.75}

            # With metadata
            feedback = builder.build_feedback(
                0.85,
                metadata={
                    "feedback": "Good work",
                    "dimension_scores": {"accuracy": 0.9},
                },
            )
            assert "feedback_text" in feedback
            ```

        Note:
            Only non-empty strings and dicts are included to keep feedback clean.
        """
        # Use normalize_feedback for consistent field mapping
        feedback = normalize_feedback(score, metadata)

        # Log metadata passthrough for debugging if requested
        if log_passthrough and metadata and isinstance(metadata, dict):
            has_feedback = bool(
                metadata.get("feedback") or metadata.get("feedback_text")
            )
            has_guidance = bool(metadata.get("actionable_guidance"))
            has_dimensions = bool(metadata.get("dimension_scores"))
            self._logger.debug(
                "trial_builder.metadata.passthrough",
                has_feedback=has_feedback,
                has_guidance=has_guidance,
                has_dimensions=has_dimensions,
            )

        # Add error if present
        if error:
            feedback["error"] = error

        return feedback

    def build_trial(
        self,
        input_text: str | None,
        output: str,
        score: float,
        metadata: dict[str, Any] | None = None,
        *,
        error: str | None = None,
        trace: dict[str, Any] | None = None,
        extra_trajectory: dict[str, Any] | None = None,
        log_passthrough: bool = False,
    ) -> dict[str, Any]:
        """Build a complete trial record for reflection.

        Constructs a trial with feedback and trajectory dicts following the
        GEPA whitepaper structure.

        Args:
            input_text: The input that was given to the system. Can be None for
                pipelines where input context is implicit.
            output: What the system produced.
            score: Evaluation score for this output.
            metadata: Optional scorer metadata dict (from CriticScorer).
            error: Optional error message from execution.
            trace: Optional execution trace dict (tool calls, state, tokens).
            extra_trajectory: Optional extra fields to include in trajectory
                (e.g., component name, component value, tokens).
            log_passthrough: If True, log debug info about metadata extraction.

        Returns:
            Trial dict with keys:
                - feedback: Evaluation feedback (score, feedback_text, etc.)
                - trajectory: Execution journey (input, output, trace, etc.)

        Examples:
            ```python
            builder = TrialBuilder()

            # Simple trial
            trial = builder.build_trial(
                input_text="What is Python?",
                output="A programming language",
                score=0.9,
            )
            assert trial["feedback"]["score"] == 0.9
            assert trial["trajectory"]["output"] == "A programming language"

            # With trace and extra trajectory data
            trial = builder.build_trial(
                input_text="Count to 3",
                output="1, 2, 3",
                score=1.0,
                trace={"tool_calls": [{"name": "count"}]},
                extra_trajectory={"component": "counter"},
            )
            assert "trace" in trial["trajectory"]
            assert trial["trajectory"]["component"] == "counter"
            ```

        Note:
            Optional input is only included in trajectory when not None,
            supporting pipelines where input context is implicit.
        """
        # Build feedback dict
        feedback = self.build_feedback(
            score,
            metadata,
            error=error,
            log_passthrough=log_passthrough,
        )

        # Build trajectory dict
        trajectory: dict[str, Any] = {"output": output}

        # Add input if available
        if input_text is not None:
            trajectory["input"] = input_text

        # Add trace if available
        if trace:
            trajectory["trace"] = trace

        # Add extra trajectory fields if provided
        if extra_trajectory:
            trajectory.update(extra_trajectory)

        # Build trial record
        return {
            "feedback": feedback,
            "trajectory": trajectory,
        }

__init__

__init__() -> None

Initialize the TrialBuilder.

Examples:

builder = TrialBuilder()
Note

Creates a module-scoped logger for metadata passthrough debugging.

Source code in src/gepa_adk/adapters/trial_builder.py
def __init__(self) -> None:
    """Initialize the TrialBuilder.

    Examples:
        ```python
        builder = TrialBuilder()
        ```

    Note:
        Creates a module-scoped logger for metadata passthrough debugging.
    """
    self._logger = structlog.get_logger(__name__)

build_feedback

build_feedback(
    score: float,
    metadata: dict[str, Any] | None = None,
    *,
    error: str | None = None,
    log_passthrough: bool = False,
) -> dict[str, Any]

Build the feedback dict from score and metadata.

Extracts and validates feedback fields from scorer metadata, including feedback_text, actionable_guidance, and dimension_scores.

PARAMETER DESCRIPTION
score

Evaluation score (mandatory).

TYPE: float

metadata

Optional scorer metadata dict containing:

- feedback: Text feedback from critic.
- actionable_guidance: Improvement suggestions.
- dimension_scores: Per-dimension score breakdown.

TYPE: dict[str, Any] | None DEFAULT: None

error

Optional error message to include in feedback.

TYPE: str | None DEFAULT: None

log_passthrough

If True, log debug info about metadata extraction.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
dict[str, Any]

Feedback dict with keys:

- score (mandatory): The evaluation score.
- feedback_text (if available): Text feedback from critic.
- actionable_guidance (if available): Improvement suggestions.
- dimension_scores (if available): Dimension score breakdown.
- error (if provided): Error message from execution.

Examples:

builder = TrialBuilder()

# Minimal feedback
feedback = builder.build_feedback(0.75)
assert feedback == {"score": 0.75}

# With metadata
feedback = builder.build_feedback(
    0.85,
    metadata={
        "feedback": "Good work",
        "dimension_scores": {"accuracy": 0.9},
    },
)
assert "feedback_text" in feedback
Note

Only non-empty strings and dicts are included to keep feedback clean.

Source code in src/gepa_adk/adapters/trial_builder.py
def build_feedback(
    self,
    score: float,
    metadata: dict[str, Any] | None = None,
    *,
    error: str | None = None,
    log_passthrough: bool = False,
) -> dict[str, Any]:
    """Build the feedback dict from score and metadata.

    Extracts and validates feedback fields from scorer metadata, including
    feedback_text, actionable_guidance, and dimension_scores.

    Args:
        score: Evaluation score (mandatory).
        metadata: Optional scorer metadata dict containing:
            - feedback: Text feedback from critic.
            - actionable_guidance: Improvement suggestions.
            - dimension_scores: Per-dimension score breakdown.
        error: Optional error message to include in feedback.
        log_passthrough: If True, log debug info about metadata extraction.

    Returns:
        Feedback dict with keys:
            - score (mandatory): The evaluation score.
            - feedback_text (if available): Text feedback from critic.
            - actionable_guidance (if available): Improvement suggestions.
            - dimension_scores (if available): Dimension score breakdown.
            - error (if provided): Error message from execution.

    Examples:
        ```python
        builder = TrialBuilder()

        # Minimal feedback
        feedback = builder.build_feedback(0.75)
        assert feedback == {"score": 0.75}

        # With metadata
        feedback = builder.build_feedback(
            0.85,
            metadata={
                "feedback": "Good work",
                "dimension_scores": {"accuracy": 0.9},
            },
        )
        assert "feedback_text" in feedback
        ```

    Note:
        Only non-empty strings and dicts are included to keep feedback clean.
    """
    # Use normalize_feedback for consistent field mapping
    feedback = normalize_feedback(score, metadata)

    # Log metadata passthrough for debugging if requested
    if log_passthrough and metadata and isinstance(metadata, dict):
        has_feedback = bool(
            metadata.get("feedback") or metadata.get("feedback_text")
        )
        has_guidance = bool(metadata.get("actionable_guidance"))
        has_dimensions = bool(metadata.get("dimension_scores"))
        self._logger.debug(
            "trial_builder.metadata.passthrough",
            has_feedback=has_feedback,
            has_guidance=has_guidance,
            has_dimensions=has_dimensions,
        )

    # Add error if present
    if error:
        feedback["error"] = error

    return feedback

build_trial

build_trial(
    input_text: str | None,
    output: str,
    score: float,
    metadata: dict[str, Any] | None = None,
    *,
    error: str | None = None,
    trace: dict[str, Any] | None = None,
    extra_trajectory: dict[str, Any] | None = None,
    log_passthrough: bool = False,
) -> dict[str, Any]

Build a complete trial record for reflection.

Constructs a trial with feedback and trajectory dicts following the GEPA whitepaper structure.

PARAMETER DESCRIPTION
input_text

The input that was given to the system. Can be None for pipelines where input context is implicit.

TYPE: str | None

output

What the system produced.

TYPE: str

score

Evaluation score for this output.

TYPE: float

metadata

Optional scorer metadata dict (from CriticScorer).

TYPE: dict[str, Any] | None DEFAULT: None

error

Optional error message from execution.

TYPE: str | None DEFAULT: None

trace

Optional execution trace dict (tool calls, state, tokens).

TYPE: dict[str, Any] | None DEFAULT: None

extra_trajectory

Optional extra fields to include in trajectory (e.g., component name, component value, tokens).

TYPE: dict[str, Any] | None DEFAULT: None

log_passthrough

If True, log debug info about metadata extraction.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
dict[str, Any]

Trial dict with keys:

- feedback: Evaluation feedback (score, feedback_text, etc.)
- trajectory: Execution journey (input, output, trace, etc.)

Examples:

builder = TrialBuilder()

# Simple trial
trial = builder.build_trial(
    input_text="What is Python?",
    output="A programming language",
    score=0.9,
)
assert trial["feedback"]["score"] == 0.9
assert trial["trajectory"]["output"] == "A programming language"

# With trace and extra trajectory data
trial = builder.build_trial(
    input_text="Count to 3",
    output="1, 2, 3",
    score=1.0,
    trace={"tool_calls": [{"name": "count"}]},
    extra_trajectory={"component": "counter"},
)
assert "trace" in trial["trajectory"]
assert trial["trajectory"]["component"] == "counter"
Note

Optional input is only included in trajectory when not None, supporting pipelines where input context is implicit.

Source code in src/gepa_adk/adapters/trial_builder.py
def build_trial(
    self,
    input_text: str | None,
    output: str,
    score: float,
    metadata: dict[str, Any] | None = None,
    *,
    error: str | None = None,
    trace: dict[str, Any] | None = None,
    extra_trajectory: dict[str, Any] | None = None,
    log_passthrough: bool = False,
) -> dict[str, Any]:
    """Build a complete trial record for reflection.

    Constructs a trial with feedback and trajectory dicts following the
    GEPA whitepaper structure.

    Args:
        input_text: The input that was given to the system. Can be None for
            pipelines where input context is implicit.
        output: What the system produced.
        score: Evaluation score for this output.
        metadata: Optional scorer metadata dict (from CriticScorer).
        error: Optional error message from execution.
        trace: Optional execution trace dict (tool calls, state, tokens).
        extra_trajectory: Optional extra fields to include in trajectory
            (e.g., component name, component value, tokens).
        log_passthrough: If True, log debug info about metadata extraction.

    Returns:
        Trial dict with keys:
            - feedback: Evaluation feedback (score, feedback_text, etc.)
            - trajectory: Execution journey (input, output, trace, etc.)

    Examples:
        ```python
        builder = TrialBuilder()

        # Simple trial
        trial = builder.build_trial(
            input_text="What is Python?",
            output="A programming language",
            score=0.9,
        )
        assert trial["feedback"]["score"] == 0.9
        assert trial["trajectory"]["output"] == "A programming language"

        # With trace and extra trajectory data
        trial = builder.build_trial(
            input_text="Count to 3",
            output="1, 2, 3",
            score=1.0,
            trace={"tool_calls": [{"name": "count"}]},
            extra_trajectory={"component": "counter"},
        )
        assert "trace" in trial["trajectory"]
        assert trial["trajectory"]["component"] == "counter"
        ```

    Note:
        Optional input is only included in trajectory when not None,
        supporting pipelines where input context is implicit.
    """
    # Build feedback dict
    feedback = self.build_feedback(
        score,
        metadata,
        error=error,
        log_passthrough=log_passthrough,
    )

    # Build trajectory dict
    trajectory: dict[str, Any] = {"output": output}

    # Add input if available
    if input_text is not None:
        trajectory["input"] = input_text

    # Add trace if available
    if trace:
        trajectory["trace"] = trace

    # Add extra trajectory fields if provided
    if extra_trajectory:
        trajectory.update(extra_trajectory)

    # Build trial record
    return {
        "feedback": feedback,
        "trajectory": trajectory,
    }

VideoBlobService

Video blob loading service for multimodal content.

Implements VideoBlobServiceProtocol to convert video files to ADK Part objects. Validates video files for existence, size limits, and MIME types before loading.

ATTRIBUTE DESCRIPTION
_logger

Structured logger for video operations.

TYPE: BoundLogger

Examples:

Load a single video:

service = VideoBlobService()
parts = await service.prepare_video_parts(["/data/lecture.mp4"])
assert len(parts) == 1

Load multiple videos:

paths = ["/data/intro.mp4", "/data/main.mp4"]
parts = await service.prepare_video_parts(paths)
assert len(parts) == 2

Handle validation errors:

from gepa_adk.domain.exceptions import VideoValidationError

try:
    parts = await service.prepare_video_parts(["/missing.mp4"])
except VideoValidationError as e:
    print(f"Invalid: {e.video_path}")
Note

Adapter implements VideoBlobServiceProtocol for dependency injection and testing. All ADK-specific Part creation is encapsulated here.

Source code in src/gepa_adk/adapters/video_blob_service.py
class VideoBlobService:
    """Video blob loading service for multimodal content.

    Implements VideoBlobServiceProtocol to convert video files to ADK Part
    objects. Validates video files for existence, size limits, and MIME types
    before loading.

    Attributes:
        _logger (BoundLogger): Structured logger for video operations.

    Examples:
        Load a single video:

        ```python
        service = VideoBlobService()
        parts = await service.prepare_video_parts(["/data/lecture.mp4"])
        assert len(parts) == 1
        ```

        Load multiple videos:

        ```python
        paths = ["/data/intro.mp4", "/data/main.mp4"]
        parts = await service.prepare_video_parts(paths)
        assert len(parts) == 2
        ```

        Handle validation errors:

        ```python
        from gepa_adk.domain.exceptions import VideoValidationError

        try:
            parts = await service.prepare_video_parts(["/missing.mp4"])
        except VideoValidationError as e:
            print(f"Invalid: {e.video_path}")
        ```

    Note:
        Adapter implements VideoBlobServiceProtocol for dependency injection
        and testing. All ADK-specific Part creation is encapsulated here.
    """

    def __init__(self) -> None:
        """Initialize VideoBlobService.

        Examples:
            Default initialization:

            ```python
            service = VideoBlobService()
            ```

        Note:
            Creates a service instance with a bound logger for video
            operations. No external dependencies are required.
        """
        self._logger = logger.bind(component="VideoBlobService")

    def _detect_mime_type(self, video_path: str) -> str:
        """Detect MIME type of a video file.

        Uses Python's mimetypes module to guess the MIME type based on
        file extension.

        Args:
            video_path: Path to the video file.

        Returns:
            Detected MIME type string (e.g., "video/mp4").

        Note:
            Scans file extension only, not file content. Falls back to
            "application/octet-stream" for unknown types.
        """
        mime_type, _ = mimetypes.guess_type(video_path)
        return mime_type or "application/octet-stream"

    def validate_video_file(self, video_path: str) -> VideoFileInfo:
        """Validate a video file and return its metadata.

        Checks that the file exists, is within size limits, and has
        a valid video MIME type.

        Args:
            video_path: Absolute path to the video file to validate.

        Returns:
            VideoFileInfo containing validated metadata.

        Raises:
            VideoValidationError: If validation fails.

        Examples:
            Validate a video file:

            ```python
            info = service.validate_video_file("/data/video.mp4")
            print(f"Size: {info.size_bytes}")
            ```

        Note:
            Operates synchronously for fast pre-validation. File content
            is not read, only metadata is checked.
        """
        path = Path(video_path)

        # Check file existence
        if not path.exists():
            self._logger.warning(
                "video.validation_failed",
                video_path=video_path,
                reason="file_not_found",
            )
            raise VideoValidationError(
                f"Video file not found: {video_path}",
                video_path=video_path,
                constraint="file must exist",
            )

        # Check file is not a directory
        if not path.is_file():
            self._logger.warning(
                "video.validation_failed",
                video_path=video_path,
                reason="not_a_file",
            )
            raise VideoValidationError(
                f"Path is not a file: {video_path}",
                video_path=video_path,
                constraint="path must be a file",
            )

        # Check file size
        size_bytes = path.stat().st_size
        if size_bytes > MAX_VIDEO_SIZE_BYTES:
            self._logger.warning(
                "video.validation_failed",
                video_path=video_path,
                reason="file_too_large",
                size_bytes=size_bytes,
                max_bytes=MAX_VIDEO_SIZE_BYTES,
            )
            raise VideoValidationError(
                f"Video exceeds 2GB limit: {size_bytes} bytes",
                video_path=video_path,
                constraint="size <= 2GB",
            )

        # Check MIME type
        mime_type = self._detect_mime_type(video_path)
        if not mime_type.startswith("video/"):
            self._logger.warning(
                "video.validation_failed",
                video_path=video_path,
                reason="invalid_mime_type",
                mime_type=mime_type,
            )
            raise VideoValidationError(
                f"Not a video file: {mime_type}",
                video_path=video_path,
                constraint="must be video/* MIME type",
            )

        self._logger.debug(
            "video.validated",
            video_path=video_path,
            size_bytes=size_bytes,
            mime_type=mime_type,
        )

        return VideoFileInfo(
            path=video_path,
            size_bytes=size_bytes,
            mime_type=mime_type,
        )

    async def _load_video_bytes(self, video_path: str) -> bytes:
        """Load video file bytes asynchronously.

        Reads the file content in a thread pool to avoid blocking
        the event loop.

        Args:
            video_path: Path to the video file.

        Returns:
            Raw bytes of the video file.

        Note:
            Spawns blocking file read in thread pool via run_in_executor
            to prevent blocking the event loop during large file reads.
        """
        loop = asyncio.get_running_loop()

        def read_file() -> bytes:
            with open(video_path, "rb") as f:
                return f.read()

        return await loop.run_in_executor(None, read_file)

    async def prepare_video_parts(self, video_paths: list[str]) -> list[Part]:
        """Load video files and create Part objects for multimodal content.

        Validates and loads all video files, converting them to ADK Part
        objects with inline video data.

        Args:
            video_paths: List of absolute paths to video files.

        Returns:
            List of Part objects, one per input path. Order is preserved.

        Raises:
            ValueError: If video_paths is empty.
            VideoValidationError: If any file fails validation.

        Examples:
            Load videos:

            ```python
            parts = await service.prepare_video_parts(
                [
                    "/data/intro.mp4",
                    "/data/main.mp4",
                ]
            )
            assert len(parts) == 2
            ```

        Note:
            Operations include async file I/O. Videos are validated before
            loading to provide early failure with clear error messages.
        """
        if not video_paths:
            raise ValueError("video_paths cannot be empty")

        self._logger.info(
            "video.prepare_start",
            video_count=len(video_paths),
        )

        parts: list[Part] = []

        for video_path in video_paths:
            # Validate first
            info = self.validate_video_file(video_path)

            # Load video bytes
            video_bytes = await self._load_video_bytes(video_path)

            # Create Part with inline data
            part = Part.from_bytes(
                data=video_bytes,
                mime_type=info.mime_type,
            )
            parts.append(part)

            self._logger.debug(
                "video.loaded",
                video_path=video_path,
                size_bytes=info.size_bytes,
                mime_type=info.mime_type,
            )

        self._logger.info(
            "video.prepare_complete",
            video_count=len(parts),
        )

        return parts

__init__

__init__() -> None

Initialize VideoBlobService.

Examples:

Default initialization:

service = VideoBlobService()
Note

Creates a service instance with a bound logger for video operations. No external dependencies are required.

Source code in src/gepa_adk/adapters/video_blob_service.py
def __init__(self) -> None:
    """Initialize VideoBlobService.

    Examples:
        Default initialization:

        ```python
        service = VideoBlobService()
        ```

    Note:
        Creates a service instance with a bound logger for video
        operations. No external dependencies are required.
    """
    self._logger = logger.bind(component="VideoBlobService")

validate_video_file

validate_video_file(video_path: str) -> VideoFileInfo

Validate a video file and return its metadata.

Checks that the file exists, is within size limits, and has a valid video MIME type.

PARAMETER DESCRIPTION
video_path

Absolute path to the video file to validate.

TYPE: str

RETURNS DESCRIPTION
VideoFileInfo

VideoFileInfo containing validated metadata.

RAISES DESCRIPTION
VideoValidationError

If validation fails.

Examples:

Validate a video file:

info = service.validate_video_file("/data/video.mp4")
print(f"Size: {info.size_bytes}")
Note

Operates synchronously for fast pre-validation. File content is not read, only metadata is checked.

Source code in src/gepa_adk/adapters/video_blob_service.py
def validate_video_file(self, video_path: str) -> VideoFileInfo:
    """Validate a video file and return its metadata.

    Checks that the file exists, is within size limits, and has
    a valid video MIME type.

    Args:
        video_path: Absolute path to the video file to validate.

    Returns:
        VideoFileInfo containing validated metadata.

    Raises:
        VideoValidationError: If validation fails.

    Examples:
        Validate a video file:

        ```python
        info = service.validate_video_file("/data/video.mp4")
        print(f"Size: {info.size_bytes}")
        ```

    Note:
        Operates synchronously for fast pre-validation. File content
        is not read, only metadata is checked.
    """
    path = Path(video_path)

    # Check file existence
    if not path.exists():
        self._logger.warning(
            "video.validation_failed",
            video_path=video_path,
            reason="file_not_found",
        )
        raise VideoValidationError(
            f"Video file not found: {video_path}",
            video_path=video_path,
            constraint="file must exist",
        )

    # Check file is not a directory
    if not path.is_file():
        self._logger.warning(
            "video.validation_failed",
            video_path=video_path,
            reason="not_a_file",
        )
        raise VideoValidationError(
            f"Path is not a file: {video_path}",
            video_path=video_path,
            constraint="path must be a file",
        )

    # Check file size
    size_bytes = path.stat().st_size
    if size_bytes > MAX_VIDEO_SIZE_BYTES:
        self._logger.warning(
            "video.validation_failed",
            video_path=video_path,
            reason="file_too_large",
            size_bytes=size_bytes,
            max_bytes=MAX_VIDEO_SIZE_BYTES,
        )
        raise VideoValidationError(
            f"Video exceeds 2GB limit: {size_bytes} bytes",
            video_path=video_path,
            constraint="size <= 2GB",
        )

    # Check MIME type
    mime_type = self._detect_mime_type(video_path)
    if not mime_type.startswith("video/"):
        self._logger.warning(
            "video.validation_failed",
            video_path=video_path,
            reason="invalid_mime_type",
            mime_type=mime_type,
        )
        raise VideoValidationError(
            f"Not a video file: {mime_type}",
            video_path=video_path,
            constraint="must be video/* MIME type",
        )

    self._logger.debug(
        "video.validated",
        video_path=video_path,
        size_bytes=size_bytes,
        mime_type=mime_type,
    )

    return VideoFileInfo(
        path=video_path,
        size_bytes=size_bytes,
        mime_type=mime_type,
    )

prepare_video_parts async

prepare_video_parts(video_paths: list[str]) -> list[Part]

Load video files and create Part objects for multimodal content.

Validates and loads all video files, converting them to ADK Part objects with inline video data.

PARAMETER DESCRIPTION
video_paths

List of absolute paths to video files.

TYPE: list[str]

RETURNS DESCRIPTION
list[Part]

List of Part objects, one per input path. Order is preserved.

RAISES DESCRIPTION
ValueError

If video_paths is empty.

VideoValidationError

If any file fails validation.

Examples:

Load videos:

parts = await service.prepare_video_parts(
    [
        "/data/intro.mp4",
        "/data/main.mp4",
    ]
)
assert len(parts) == 2
Note

Operations include async file I/O. Videos are validated before loading to provide early failure with clear error messages.
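
An empty path list is rejected before any file I/O; a short sketch of the guard shown in the source below:

try:
    await service.prepare_video_parts([])
except ValueError:
    pass  # "video_paths cannot be empty"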

Source code in src/gepa_adk/adapters/video_blob_service.py
async def prepare_video_parts(self, video_paths: list[str]) -> list[Part]:
    """Load video files and create Part objects for multimodal content.

    Validates and loads all video files, converting them to ADK Part
    objects with inline video data.

    Args:
        video_paths: List of absolute paths to video files.

    Returns:
        List of Part objects, one per input path. Order is preserved.

    Raises:
        ValueError: If video_paths is empty.
        VideoValidationError: If any file fails validation.

    Examples:
        Load videos:

        ```python
        parts = await service.prepare_video_parts(
            [
                "/data/intro.mp4",
                "/data/main.mp4",
            ]
        )
        assert len(parts) == 2
        ```

    Note:
        Operations include async file I/O. Videos are validated before
        loading to provide early failure with clear error messages.
    """
    if not video_paths:
        raise ValueError("video_paths cannot be empty")

    self._logger.info(
        "video.prepare_start",
        video_count=len(video_paths),
    )

    parts: list[Part] = []

    for video_path in video_paths:
        # Validate first
        info = self.validate_video_file(video_path)

        # Load video bytes
        video_bytes = await self._load_video_bytes(video_path)

        # Create Part with inline data
        part = Part.from_bytes(
            data=video_bytes,
            mime_type=info.mime_type,
        )
        parts.append(part)

        self._logger.debug(
            "video.loaded",
            video_path=video_path,
            size_bytes=info.size_bytes,
            mime_type=info.mime_type,
        )

    self._logger.info(
        "video.prepare_complete",
        video_count=len(parts),
    )

    return parts

create_candidate_selector

create_candidate_selector(
    selector_type: str,
    *,
    epsilon: float = 0.1,
    rng: Random | None = None,
) -> CandidateSelectorProtocol

Create a candidate selector by name.

PARAMETER DESCRIPTION
selector_type

Selector identifier (pareto, greedy, epsilon_greedy).

TYPE: str

epsilon

Exploration rate for epsilon-greedy selector.

TYPE: float DEFAULT: 0.1

rng

Optional RNG for selectors using randomness.

TYPE: Random | None DEFAULT: None

RETURNS DESCRIPTION
CandidateSelectorProtocol

CandidateSelectorProtocol implementation.

RAISES DESCRIPTION
ConfigurationError

If selector_type is unsupported.

Examples:

selector = create_candidate_selector("pareto")
candidate_idx = await selector.select_candidate(state)
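
A hedged sketch of the epsilon-greedy variant with a seeded RNG (values illustrative):

import random

selector = create_candidate_selector(
    "epsilon_greedy",
    epsilon=0.2,
    rng=random.Random(42),  # seed for reproducible exploration
)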
Source code in src/gepa_adk/adapters/candidate_selector.py
def create_candidate_selector(
    selector_type: str,
    *,
    epsilon: float = 0.1,
    rng: random.Random | None = None,
) -> CandidateSelectorProtocol:
    """Create a candidate selector by name.

    Args:
        selector_type: Selector identifier (pareto, greedy, epsilon_greedy).
        epsilon: Exploration rate for epsilon-greedy selector.
        rng: Optional RNG for selectors using randomness.

    Returns:
        CandidateSelectorProtocol implementation.

    Raises:
        ConfigurationError: If selector_type is unsupported.

    Examples:
        ```python
        selector = create_candidate_selector("pareto")
        candidate_idx = await selector.select_candidate(state)
        ```
    """
    normalized = selector_type.strip().lower()
    if normalized in {"pareto"}:
        return ParetoCandidateSelector(rng=rng)
    if normalized in {"greedy", "current_best", "current-best"}:
        return CurrentBestCandidateSelector()
    if normalized in {"epsilon_greedy", "epsilon-greedy"}:
        return EpsilonGreedyCandidateSelector(epsilon=epsilon, rng=rng)
    raise ConfigurationError(
        "selector_type must be one of pareto, greedy, epsilon_greedy",
        field="selector_type",
        value=selector_type,
        constraint="pareto|greedy|epsilon_greedy",
    )

get_handler

get_handler(name: str) -> ComponentHandler

Get handler from default registry.

PARAMETER DESCRIPTION
name

Component name to look up.

TYPE: str

RETURNS DESCRIPTION
ComponentHandler

The registered ComponentHandler.

RAISES DESCRIPTION
ValueError

If name is empty or None.

KeyError

If no handler registered for name.

Examples:

handler = get_handler("instruction")
original = handler.apply(agent, "New instruction")
See Also
  • [ComponentHandlerRegistry.get()]: Underlying registry method.
Note

Shortcut for component_handlers.get(name).
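
Lookups for unregistered names raise KeyError, so callers can guard explicitly (component name hypothetical):

try:
    handler = get_handler("nonexistent_component")
except KeyError:
    handler = None  # no handler registered under that name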

Source code in src/gepa_adk/adapters/component_handlers.py
def get_handler(name: str) -> ComponentHandler:
    """Get handler from default registry.

    Args:
        name: Component name to look up.

    Returns:
        The registered ComponentHandler.

    Raises:
        ValueError: If name is empty or None.
        KeyError: If no handler registered for name.

    Examples:
        ```python
        handler = get_handler("instruction")
        original = handler.apply(agent, "New instruction")
        ```

    See Also:
        - [`ComponentHandlerRegistry.get()`]: Underlying registry method.

    Note:
        Shortcut for component_handlers.get(name).
    """
    return component_handlers.get(name)

register_handler

register_handler(
    name: str, handler: ComponentHandler
) -> None

Register handler in default registry.

PARAMETER DESCRIPTION
name

Component name to register.

TYPE: str

handler

Handler implementing ComponentHandler protocol.

TYPE: ComponentHandler

RAISES DESCRIPTION
ValueError

If name is empty or None.

TypeError

If handler doesn't implement ComponentHandler protocol.

Examples:

register_handler("my_component", MyHandler())
handler = get_handler("my_component")
See Also
  • [ComponentHandlerRegistry.register()]: Underlying registry method.
Note

Shortcut for component_handlers.register(name, handler).

Source code in src/gepa_adk/adapters/component_handlers.py
def register_handler(name: str, handler: ComponentHandler) -> None:
    """Register handler in default registry.

    Args:
        name: Component name to register.
        handler: Handler implementing ComponentHandler protocol.

    Raises:
        ValueError: If name is empty or None.
        TypeError: If handler doesn't implement ComponentHandler protocol.

    Examples:
        ```python
        register_handler("my_component", MyHandler())
        handler = get_handler("my_component")
        ```

    See Also:
        - [`ComponentHandlerRegistry.register()`]: Underlying registry method.

    Note:
        Shortcut for component_handlers.register(name, handler).
    """
    component_handlers.register(name, handler)

create_component_selector

create_component_selector(
    selector_type: str,
) -> ComponentSelectorProtocol

Create a component selector strategy from a string alias.

PARAMETER DESCRIPTION
selector_type

Name of the selector strategy. Supported values:

- 'round_robin', 'roundrobin': Round-robin cycling.
- 'all', 'all_components': All components simultaneously.

TYPE: str

RETURNS DESCRIPTION
ComponentSelectorProtocol

Instance of requested component selector.

RAISES DESCRIPTION
ValueError

If selector_type is unknown.

Examples:

# Create round-robin selector
selector = create_component_selector("round_robin")

# Create all-components selector
selector = create_component_selector("all")
Note

Supports flexible string aliases with normalization for common variations (underscores, hyphens, case-insensitive).
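
A short sketch of that normalization (spellings illustrative):

selector = create_component_selector("Round-Robin")  # normalized to "roundrobin"
selector = create_component_selector("all_components")  # AllComponentSelector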

Source code in src/gepa_adk/adapters/component_selector.py
def create_component_selector(selector_type: str) -> ComponentSelectorProtocol:
    """Create a component selector strategy from a string alias.

    Args:
        selector_type: Name of the selector strategy.
            Supported values:
            - 'round_robin', 'roundrobin': Round-robin cycling.
            - 'all', 'all_components': All components simultaneously.

    Returns:
        Instance of requested component selector.

    Raises:
        ValueError: If selector_type is unknown.

    Examples:
        ```python
        # Create round-robin selector
        selector = create_component_selector("round_robin")

        # Create all-components selector
        selector = create_component_selector("all")
        ```

    Note:
        Supports flexible string aliases with normalization for common
        variations (underscores, hyphens, case-insensitive).
    """
    normalized = selector_type.lower().replace("_", "").replace("-", "")

    if normalized == "roundrobin":
        return RoundRobinComponentSelector()
    elif normalized in ("all", "allcomponents"):
        return AllComponentSelector()

    raise ValueError(f"Unknown component selector: {selector_type}")

normalize_feedback

normalize_feedback(
    score: float, metadata: dict[str, Any] | None
) -> dict[str, Any]

Normalize critic feedback to consistent trial format.

Converts both simple and advanced critic outputs to a standardized format for use in trial records. This enables the reflection agent to receive consistent feedback regardless of which critic schema was used.

PARAMETER DESCRIPTION
score

The numeric score from the critic (0.0-1.0).

TYPE: float

metadata

Optional metadata dict from critic output. May contain:

- feedback (str): Simple feedback text
- dimension_scores (dict): Per-dimension scores
- actionable_guidance (str): Improvement suggestions
- Any additional fields from critic output

TYPE: dict[str, Any] | None

RETURNS DESCRIPTION
dict[str, Any]

Normalized feedback dict with structure:

{
    "score": 0.75,
    "feedback_text": "Main feedback message",
    "dimension_scores": {...},  # Optional
    "actionable_guidance": "...",  # Optional
}

Examples:

Normalize simple feedback:

normalized = normalize_feedback(0.8, {"feedback": "Good job"})
# {"score": 0.8, "feedback_text": "Good job"}

Normalize advanced feedback:

normalized = normalize_feedback(
    0.6,
    {
        "feedback": "Needs work",
        "dimension_scores": {"clarity": 0.5},
        "actionable_guidance": "Add examples",
    },
)
# {
#     "score": 0.6,
#     "feedback_text": "Needs work",
#     "dimension_scores": {"clarity": 0.5},
#     "actionable_guidance": "Add examples",
# }

Handle missing feedback:

normalized = normalize_feedback(0.5, None)
# {"score": 0.5, "feedback_text": ""}
Note

Supports both SimpleCriticOutput and CriticOutput schemas for flexible critic integration. Extracts the "feedback" field and renames it to "feedback_text" for consistent trial structure. Additional fields like dimension_scores are preserved when present.
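
As the source below shows, a pre-normalized "feedback_text" key is also accepted and takes precedence over "feedback" (values illustrative):

normalized = normalize_feedback(0.9, {"feedback_text": "Concise and correct"})
# {"score": 0.9, "feedback_text": "Concise and correct"}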

Source code in src/gepa_adk/adapters/critic_scorer.py
def normalize_feedback(
    score: float,
    metadata: dict[str, Any] | None,
) -> dict[str, Any]:
    """Normalize critic feedback to consistent trial format.

    Converts both simple and advanced critic outputs to a standardized
    format for use in trial records. This enables the reflection agent
    to receive consistent feedback regardless of which critic schema
    was used.

    Args:
        score: The numeric score from the critic (0.0-1.0).
        metadata: Optional metadata dict from critic output. May contain:
            - feedback (str): Simple feedback text
            - dimension_scores (dict): Per-dimension scores
            - actionable_guidance (str): Improvement suggestions
            - Any additional fields from critic output

    Returns:
        Normalized feedback dict with structure:
        ```python
        {
            "score": 0.75,
            "feedback_text": "Main feedback message",
            "dimension_scores": {...},  # Optional
            "actionable_guidance": "...",  # Optional
        }
        ```

    Examples:
        Normalize simple feedback:

        ```python
        normalized = normalize_feedback(0.8, {"feedback": "Good job"})
        # {"score": 0.8, "feedback_text": "Good job"}
        ```

        Normalize advanced feedback:

        ```python
        normalized = normalize_feedback(
            0.6,
            {
                "feedback": "Needs work",
                "dimension_scores": {"clarity": 0.5},
                "actionable_guidance": "Add examples",
            },
        )
        # {
        #     "score": 0.6,
        #     "feedback_text": "Needs work",
        #     "dimension_scores": {"clarity": 0.5},
        #     "actionable_guidance": "Add examples",
        # }
        ```

        Handle missing feedback:

        ```python
        normalized = normalize_feedback(0.5, None)
        # {"score": 0.5, "feedback_text": ""}
        ```

    Note:
        Supports both SimpleCriticOutput and CriticOutput schemas for flexible
        critic integration. Extracts the "feedback" field and renames it to
        "feedback_text" for consistent trial structure. Additional fields
        like dimension_scores are preserved when present.
    """
    result: dict[str, Any] = {"score": score}

    if metadata is None:
        result["feedback_text"] = ""
        return result

    # Extract feedback text - handle both "feedback" and "feedback_text" keys
    feedback_text = metadata.get("feedback_text") or metadata.get("feedback") or ""
    if isinstance(feedback_text, str) and feedback_text.strip():
        result["feedback_text"] = feedback_text.strip()
    else:
        result["feedback_text"] = ""

    # Preserve dimension_scores if present
    dimension_scores = metadata.get("dimension_scores")
    if dimension_scores and isinstance(dimension_scores, dict):
        result["dimension_scores"] = dimension_scores

    # Preserve actionable_guidance if present
    actionable_guidance = metadata.get("actionable_guidance")
    if actionable_guidance and isinstance(actionable_guidance, str):
        guidance_str = actionable_guidance.strip()
        if guidance_str:
            result["actionable_guidance"] = guidance_str

    return result

find_llm_agents

find_llm_agents(
    agent: object,
    max_depth: int = 5,
    current_depth: int = 0,
) -> list["LlmAgent"]

Find all LlmAgents in a workflow (recursive traversal with depth limiting).

Traverses a workflow agent structure recursively to discover all LlmAgent instances at any nesting level, up to the specified maximum depth.

PARAMETER DESCRIPTION
agent

Agent or workflow to search. Can be LlmAgent, workflow agent, or any object.

TYPE: object

max_depth

Maximum recursion depth (default: 5). Traversal stops once current_depth exceeds max_depth, so agents at depth max_depth are still processed. Must be >= 1 for meaningful results.

TYPE: int DEFAULT: 5

current_depth

Current recursion level (internal use, default: 0).

TYPE: int DEFAULT: 0

RETURNS DESCRIPTION
list['LlmAgent']

List of LlmAgent instances found. Only includes agents with string instructions (skips InstructionProvider callables).

Examples:

Finding LlmAgents in a SequentialAgent:

from google.adk.agents import LlmAgent, SequentialAgent
from gepa_adk.adapters.workflow import find_llm_agents

agent1 = LlmAgent(name="agent1", instruction="First")
agent2 = LlmAgent(name="agent2", instruction="Second")
workflow = SequentialAgent(name="pipeline", sub_agents=[agent1, agent2])

agents = find_llm_agents(workflow)
assert len(agents) == 2

Finding LlmAgents in nested workflows:

# Sequential -> Parallel -> LlmAgents
nested_parallel = ParallelAgent(name="parallel", sub_agents=[agent2, agent3])
workflow = SequentialAgent(
    name="pipeline", sub_agents=[agent1, nested_parallel]
)

agents = find_llm_agents(workflow, max_depth=5)
assert len(agents) == 3  # Finds all agents across levels
Note

Operates recursively with depth limiting to discover nested LlmAgents. Skips LlmAgents with InstructionProvider callables (non-string instructions). Respects max_depth to prevent infinite recursion.
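
As a hedged illustration of the depth limit (continuing the agent names from the examples above): direct children sit at depth 1, and agents inside a nested workflow sit at depth 2.

from google.adk.agents import ParallelAgent, SequentialAgent

inner = ParallelAgent(name="inner", sub_agents=[agent2])
outer = SequentialAgent(name="outer", sub_agents=[agent1, inner])

find_llm_agents(outer, max_depth=1)  # [agent1]; agent2 is at depth 2, beyond the limit
find_llm_agents(outer, max_depth=2)  # [agent1, agent2]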

Source code in src/gepa_adk/adapters/workflow.py
def find_llm_agents(
    agent: object,
    max_depth: int = 5,
    current_depth: int = 0,
) -> list["LlmAgent"]:
    """Find all LlmAgents in a workflow (recursive traversal with depth limiting).

    Traverses a workflow agent structure recursively to discover all LlmAgent
    instances at any nesting level, up to the specified maximum depth.

    Args:
        agent: Agent or workflow to search. Can be LlmAgent, workflow agent,
            or any object.
        max_depth: Maximum recursion depth (default: 5). Traversal stops once
            current_depth exceeds max_depth, so agents at depth max_depth are
            still processed. Must be >= 1 for meaningful results.
        current_depth: Current recursion level (internal use, default: 0).

    Returns:
        List of LlmAgent instances found. Only includes agents with string
        instructions (skips InstructionProvider callables).

    Examples:
        Finding LlmAgents in a SequentialAgent:

        ```python
        from google.adk.agents import LlmAgent, SequentialAgent
        from gepa_adk.adapters.workflow import find_llm_agents

        agent1 = LlmAgent(name="agent1", instruction="First")
        agent2 = LlmAgent(name="agent2", instruction="Second")
        workflow = SequentialAgent(name="pipeline", sub_agents=[agent1, agent2])

        agents = find_llm_agents(workflow)
        assert len(agents) == 2
        ```

        Finding LlmAgents in nested workflows:

        ```python
        # Sequential -> Parallel -> LlmAgents
        nested_parallel = ParallelAgent(name="parallel", sub_agents=[agent2, agent3])
        workflow = SequentialAgent(
            name="pipeline", sub_agents=[agent1, nested_parallel]
        )

        agents = find_llm_agents(workflow, max_depth=5)
        assert len(agents) == 3  # Finds all agents across levels
        ```

    Note:
        Operates recursively with depth limiting to discover nested LlmAgents.
        Skips LlmAgents with InstructionProvider callables (non-string
        instructions). Respects max_depth to prevent infinite recursion.
    """
    from google.adk.agents import LlmAgent

    # Check depth limit first (before processing)
    # Use > instead of >= to allow processing at max_depth
    # e.g., with max_depth=3, we can process agents at depth 0, 1, 2, 3
    if current_depth > max_depth:
        logger.debug(
            "Max depth exceeded, stopping traversal",
            current_depth=current_depth,
            max_depth=max_depth,
        )
        return []

    # If it's an LlmAgent with string instruction, return it
    if isinstance(agent, LlmAgent):
        # Only include string instructions (skip InstructionProvider callables)
        if isinstance(agent.instruction, str):
            logger.debug(
                "Found LlmAgent",
                agent_name=agent.name,
                depth=current_depth,
            )
            return [agent]
        logger.warning(
            "Skipping LlmAgent with non-string instruction",
            agent_name=agent.name,
            instruction_type=type(agent.instruction).__name__,
            depth=current_depth,
        )
        return []

    # If it's a workflow agent, recursively search sub_agents
    if is_workflow_agent(agent):
        logger.debug(
            "Traversing workflow agent",
            workflow_name=getattr(agent, "name", "unknown"),
            workflow_type=type(agent).__name__,
            depth=current_depth,
            max_depth=max_depth,
        )
        agents: list[LlmAgent] = []
        # Recursive traversal: iterate sub_agents and recurse into nested workflows
        # Type narrowing: is_workflow_agent() ensures agent is SequentialAgent,
        # LoopAgent, or ParallelAgent. All inherit from BaseAgent with sub_agents.
        if isinstance(agent, (SequentialAgent, LoopAgent, ParallelAgent)):
            for sub_agent in agent.sub_agents:
                # Recursively search each sub-agent
                nested_agents = find_llm_agents(
                    sub_agent, max_depth=max_depth, current_depth=current_depth + 1
                )
                agents.extend(nested_agents)
        logger.debug(
            "Completed workflow traversal",
            workflow_name=getattr(agent, "name", "unknown"),
            agents_found=len(agents),
            depth=current_depth,
        )
        return agents

    # Not an agent or workflow - return empty list
    logger.debug(
        "Skipping non-agent object",
        object_type=type(agent).__name__,
        depth=current_depth,
    )
    return []
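
A minimal sketch of how the depth comparison above plays out in practice: with max_depth=0 the workflow itself is processed at depth 0, but its sub-agents at depth 1 are never reached, while the default limit of 5 finds them as usual:

from google.adk.agents import LlmAgent, SequentialAgent
from gepa_adk.adapters.workflow import find_llm_agents

agent1 = LlmAgent(name="agent1", instruction="First")
workflow = SequentialAgent(name="pipeline", sub_agents=[agent1])

# Sub-agents sit at depth 1, so a limit of 0 stops before reaching them.
assert find_llm_agents(workflow, max_depth=0) == []
assert len(find_llm_agents(workflow)) == 1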

is_workflow_agent

is_workflow_agent(agent: object) -> bool

Check if an agent is a workflow type.

Detects whether an agent is a workflow agent (SequentialAgent, LoopAgent, or ParallelAgent) versus a regular LlmAgent or other agent type.

PARAMETER DESCRIPTION
agent

Agent instance to check. Can be any object type.

TYPE: object

RETURNS DESCRIPTION
bool

True if agent is SequentialAgent, LoopAgent, or ParallelAgent; False otherwise (including LlmAgent, None, or non-agent objects).

Examples:

Detecting workflow agents:

from google.adk.agents import SequentialAgent, LlmAgent
from gepa_adk.adapters.workflow import is_workflow_agent

sequential = SequentialAgent(name="Pipeline", sub_agents=[])
assert is_workflow_agent(sequential) is True

llm = LlmAgent(name="Agent", instruction="Be helpful")
assert is_workflow_agent(llm) is False
Note

Only workflow agent types (SequentialAgent, LoopAgent, ParallelAgent) are detected. All workflow agents inherit from BaseAgent and have sub_agents, but type detection uses specific class checks for accuracy.
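
A short sketch of the fallback behavior for inputs that are not agents at all, which follows from the return contract above:

from gepa_adk.adapters.workflow import is_workflow_agent

# Non-agent objects (and None) are reported as not-workflow rather than raising.
assert is_workflow_agent(None) is False
assert is_workflow_agent("not an agent") is False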

Source code in src/gepa_adk/adapters/workflow.py
def is_workflow_agent(agent: object) -> bool:
    """Check if an agent is a workflow type.

    Detects whether an agent is a workflow agent (SequentialAgent, LoopAgent,
    or ParallelAgent) versus a regular LlmAgent or other agent type.

    Args:
        agent: Agent instance to check. Can be any object type.

    Returns:
        True if agent is SequentialAgent, LoopAgent, or ParallelAgent.
        False otherwise (including LlmAgent, None, or non-agent objects).

    Examples:
        Detecting workflow agents:

        ```python
        from google.adk.agents import SequentialAgent, LlmAgent
        from gepa_adk.adapters.workflow import is_workflow_agent

        sequential = SequentialAgent(name="Pipeline", sub_agents=[])
        assert is_workflow_agent(sequential) is True

        llm = LlmAgent(name="Agent", instruction="Be helpful")
        assert is_workflow_agent(llm) is False
        ```

    Note:
        Only workflow agent types (SequentialAgent, LoopAgent, ParallelAgent)
        are detected. All workflow agents inherit from BaseAgent and have
        sub_agents, but type detection uses specific class checks for accuracy.
    """
    return isinstance(agent, (SequentialAgent, LoopAgent, ParallelAgent))