
Trial builder

trial_builder

Trial building utilities for reflection datasets.

This module provides a shared TrialBuilder class for constructing trial records from evaluation results. Both ADKAdapter and MultiAgentAdapter use this to build consistent trial structures for reflection.

Terminology
  • trial: One performance record {feedback, trajectory}
  • feedback: Critic evaluation {score, feedback_text, feedback_*}
  • trajectory: The journey from input to output, with an optional trace (see the sketch after this list)
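
A minimal sketch of how these terms map onto a built trial; the input, output, and score below are illustrative, not from a real run:

```python
from gepa_adk.adapters.trial_builder import TrialBuilder

builder = TrialBuilder()

# One trial: the critic's verdict plus the journey that produced it.
trial = builder.build_trial(
    input_text="Summarize the report",         # what the system was asked
    output="The report covers Q3 revenue...",  # what it produced
    score=0.7,                                 # critic score
    metadata={"feedback": "Accurate but omits the risks section"},
)

assert trial["feedback"]["score"] == 0.7                       # feedback side
assert trial["trajectory"]["input"] == "Summarize the report"  # trajectory side
assert trial["trajectory"]["output"].startswith("The report")
```
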
ATTRIBUTES

TrialBuilder (class)
    Builder for trial records from evaluation results.

Examples:

Build a trial record:

```python
from gepa_adk.adapters.trial_builder import TrialBuilder

builder = TrialBuilder()
trial = builder.build_trial(
    input_text="What is 2+2?",
    output="4",
    score=0.95,
    metadata={"feedback": "Correct answer"},
)
assert trial["feedback"]["score"] == 0.95
assert trial["trajectory"]["input"] == "What is 2+2?"
```
See Also
Note

This implements the GEPA whitepaper trial structure where score and feedback_text are mandatory, with optional extras like feedback_dimensions and feedback_guidance passed through when available.

TrialBuilder

Build trial records for reflection datasets.

Constructs consistent trial structures following the GEPA whitepaper format. Extracts feedback fields from scorer metadata and builds trajectory dicts.

ATTRIBUTES

_logger (BoundLogger)
    Logger for metadata passthrough debugging.

Examples:

Basic trial building:

```python
builder = TrialBuilder()

# With minimal data
trial = builder.build_trial(
    input_text="Hello",
    output="Hi there!",
    score=0.8,
)

# With full metadata
trial = builder.build_trial(
    input_text="Explain AI",
    output="AI is...",
    score=0.9,
    metadata={
        "feedback": "Clear explanation",
        "dimension_scores": {"clarity": 0.95},
        "actionable_guidance": "Add examples",
    },
    extra_trajectory={"component": "instruction"},
)
```
See Also
  • build_feedback: Build just the feedback dict.
Note

All optional metadata fields are validated before inclusion to prevent malformed data from propagating to reflection prompts.

Source code in src/gepa_adk/adapters/trial_builder.py
class TrialBuilder:
    """Build trial records for reflection datasets.

    Constructs consistent trial structures following the GEPA whitepaper format.
    Extracts feedback fields from scorer metadata and builds trajectory dicts.

    Attributes:
        _logger (structlog.BoundLogger): Logger for metadata passthrough debugging.

    Examples:
        Basic trial building:

        ```python
        builder = TrialBuilder()

        # With minimal data
        trial = builder.build_trial(
            input_text="Hello",
            output="Hi there!",
            score=0.8,
        )

        # With full metadata
        trial = builder.build_trial(
            input_text="Explain AI",
            output="AI is...",
            score=0.9,
            metadata={
                "feedback": "Clear explanation",
                "dimension_scores": {"clarity": 0.95},
                "actionable_guidance": "Add examples",
            },
            extra_trajectory={"component": "instruction"},
        )
        ```

    See Also:
        - [`build_feedback`][gepa_adk.adapters.trial_builder.TrialBuilder.build_feedback]:
          Build just the feedback dict.

    Note:
        All optional metadata fields are validated before inclusion to prevent
        malformed data from propagating to reflection prompts.
    """

    def __init__(self) -> None:
        """Initialize the TrialBuilder.

        Examples:
            ```python
            builder = TrialBuilder()
            ```

        Note:
            Creates a module-scoped logger for metadata passthrough debugging.
        """
        self._logger = structlog.get_logger(__name__)

    def build_feedback(
        self,
        score: float,
        metadata: dict[str, Any] | None = None,
        *,
        error: str | None = None,
        log_passthrough: bool = False,
    ) -> dict[str, Any]:
        """Build the feedback dict from score and metadata.

        Extracts and validates feedback fields from scorer metadata, including
        feedback_text, feedback_guidance, and feedback_dimensions.

        Args:
            score: Evaluation score (mandatory).
            metadata: Optional scorer metadata dict containing:
                - feedback: Text feedback from critic.
                - actionable_guidance: Improvement suggestions.
                - dimension_scores: Per-dimension score breakdown.
            error: Optional error message to include in feedback.
            log_passthrough: If True, log debug info about metadata extraction.

        Returns:
            Feedback dict with keys:
                - score (mandatory): The evaluation score.
                - feedback_text (if available): Text feedback from critic.
                - feedback_guidance (if available): Improvement suggestions.
                - feedback_dimensions (if available): Dimension score breakdown.
                - error (if provided): Error message from execution.

        Examples:
            ```python
            builder = TrialBuilder()

            # Minimal feedback
            feedback = builder.build_feedback(0.75)
            assert feedback == {"score": 0.75}

            # With metadata
            feedback = builder.build_feedback(
                0.85,
                metadata={
                    "feedback": "Good work",
                    "dimension_scores": {"accuracy": 0.9},
                },
            )
            assert "feedback_text" in feedback
            ```

        Note:
            Only non-empty strings and dicts are included to keep feedback clean.
        """
        # Use normalize_feedback for consistent field mapping
        feedback = normalize_feedback(score, metadata)

        # Log metadata passthrough for debugging if requested
        if log_passthrough and metadata and isinstance(metadata, dict):
            has_feedback = bool(
                metadata.get("feedback") or metadata.get("feedback_text")
            )
            has_guidance = bool(metadata.get("actionable_guidance"))
            has_dimensions = bool(metadata.get("dimension_scores"))
            self._logger.debug(
                "trial_builder.metadata.passthrough",
                has_feedback=has_feedback,
                has_guidance=has_guidance,
                has_dimensions=has_dimensions,
            )

        # Add error if present
        if error:
            feedback["error"] = error

        return feedback

    def build_trial(
        self,
        input_text: str | None,
        output: str,
        score: float,
        metadata: dict[str, Any] | None = None,
        *,
        error: str | None = None,
        trace: dict[str, Any] | None = None,
        extra_trajectory: dict[str, Any] | None = None,
        log_passthrough: bool = False,
    ) -> dict[str, Any]:
        """Build a complete trial record for reflection.

        Constructs a trial with feedback and trajectory dicts following the
        GEPA whitepaper structure.

        Args:
            input_text: The input that was given to the system. Can be None for
                pipelines where input context is implicit.
            output: What the system produced.
            score: Evaluation score for this output.
            metadata: Optional scorer metadata dict (from CriticScorer).
            error: Optional error message from execution.
            trace: Optional execution trace dict (tool calls, state, tokens).
            extra_trajectory: Optional extra fields to include in trajectory
                (e.g., component name, component value, tokens).
            log_passthrough: If True, log debug info about metadata extraction.

        Returns:
            Trial dict with keys:
                - feedback: Evaluation feedback (score, feedback_text, etc.)
                - trajectory: Execution journey (input, output, trace, etc.)

        Examples:
            ```python
            builder = TrialBuilder()

            # Simple trial
            trial = builder.build_trial(
                input_text="What is Python?",
                output="A programming language",
                score=0.9,
            )
            assert trial["feedback"]["score"] == 0.9
            assert trial["trajectory"]["output"] == "A programming language"

            # With trace and extra trajectory data
            trial = builder.build_trial(
                input_text="Count to 3",
                output="1, 2, 3",
                score=1.0,
                trace={"tool_calls": [{"name": "count"}]},
                extra_trajectory={"component": "counter"},
            )
            assert "trace" in trial["trajectory"]
            assert trial["trajectory"]["component"] == "counter"
            ```

        Note:
            Optional input is only included in trajectory when not None,
            supporting pipelines where input context is implicit.
        """
        # Build feedback dict
        feedback = self.build_feedback(
            score,
            metadata,
            error=error,
            log_passthrough=log_passthrough,
        )

        # Build trajectory dict
        trajectory: dict[str, Any] = {"output": output}

        # Add input if available
        if input_text is not None:
            trajectory["input"] = input_text

        # Add trace if available
        if trace:
            trajectory["trace"] = trace

        # Add extra trajectory fields if provided
        if extra_trajectory:
            trajectory.update(extra_trajectory)

        # Build trial record
        return {
            "feedback": feedback,
            "trajectory": trajectory,
        }

__init__

__init__() -> None

Initialize the TrialBuilder.

Examples:

```python
builder = TrialBuilder()
```
Note

Creates a module-scoped logger for metadata passthrough debugging.

Source code in src/gepa_adk/adapters/trial_builder.py
def __init__(self) -> None:
    """Initialize the TrialBuilder.

    Examples:
        ```python
        builder = TrialBuilder()
        ```

    Note:
        Creates a module-scoped logger for metadata passthrough debugging.
    """
    self._logger = structlog.get_logger(__name__)

build_feedback

build_feedback(
    score: float,
    metadata: dict[str, Any] | None = None,
    *,
    error: str | None = None,
    log_passthrough: bool = False,
) -> dict[str, Any]

Build the feedback dict from score and metadata.

Extracts and validates feedback fields from scorer metadata, including feedback_text, feedback_guidance, and feedback_dimensions.

PARAMETERS

score (float)
    Evaluation score (mandatory).

metadata (dict[str, Any] | None, default None)
    Optional scorer metadata dict containing:
      - feedback: Text feedback from critic.
      - actionable_guidance: Improvement suggestions.
      - dimension_scores: Per-dimension score breakdown.

error (str | None, default None)
    Optional error message to include in feedback.

log_passthrough (bool, default False)
    If True, log debug info about metadata extraction.

RETURNS

dict[str, Any]
    Feedback dict with keys:
      - score (mandatory): The evaluation score.
      - feedback_text (if available): Text feedback from critic.
      - feedback_guidance (if available): Improvement suggestions.
      - feedback_dimensions (if available): Dimension score breakdown.
      - error (if provided): Error message from execution.

Examples:

```python
builder = TrialBuilder()

# Minimal feedback
feedback = builder.build_feedback(0.75)
assert feedback == {"score": 0.75}

# With metadata
feedback = builder.build_feedback(
    0.85,
    metadata={
        "feedback": "Good work",
        "dimension_scores": {"accuracy": 0.9},
    },
)
assert "feedback_text" in feedback
```
Note

Only non-empty strings and dicts are included to keep feedback clean.
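
A brief sketch of that filtering with illustrative metadata values: empty or whitespace-only optional fields are dropped rather than forwarded.

```python
builder = TrialBuilder()

feedback = builder.build_feedback(
    0.5,
    metadata={
        "feedback": "Needs more supporting detail",
        "dimension_scores": {},        # empty dict: excluded
        "actionable_guidance": "   ",  # whitespace-only: excluded
    },
)
assert feedback["score"] == 0.5
assert feedback["feedback_text"] == "Needs more supporting detail"
# Neither guidance nor dimensions appear, since both inputs were empty.
assert set(feedback) == {"score", "feedback_text"}
```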

Source code in src/gepa_adk/adapters/trial_builder.py
def build_feedback(
    self,
    score: float,
    metadata: dict[str, Any] | None = None,
    *,
    error: str | None = None,
    log_passthrough: bool = False,
) -> dict[str, Any]:
    """Build the feedback dict from score and metadata.

    Extracts and validates feedback fields from scorer metadata, including
    feedback_text, feedback_guidance, and feedback_dimensions.

    Args:
        score: Evaluation score (mandatory).
        metadata: Optional scorer metadata dict containing:
            - feedback: Text feedback from critic.
            - actionable_guidance: Improvement suggestions.
            - dimension_scores: Per-dimension score breakdown.
        error: Optional error message to include in feedback.
        log_passthrough: If True, log debug info about metadata extraction.

    Returns:
        Feedback dict with keys:
            - score (mandatory): The evaluation score.
            - feedback_text (if available): Text feedback from critic.
            - feedback_guidance (if available): Improvement suggestions.
            - feedback_dimensions (if available): Dimension score breakdown.
            - error (if provided): Error message from execution.

    Examples:
        ```python
        builder = TrialBuilder()

        # Minimal feedback
        feedback = builder.build_feedback(0.75)
        assert feedback == {"score": 0.75}

        # With metadata
        feedback = builder.build_feedback(
            0.85,
            metadata={
                "feedback": "Good work",
                "dimension_scores": {"accuracy": 0.9},
            },
        )
        assert "feedback_text" in feedback
        ```

    Note:
        Only non-empty strings and dicts are included to keep feedback clean.
    """
    # Use normalize_feedback for consistent field mapping
    feedback = normalize_feedback(score, metadata)

    # Log metadata passthrough for debugging if requested
    if log_passthrough and metadata and isinstance(metadata, dict):
        has_feedback = bool(
            metadata.get("feedback") or metadata.get("feedback_text")
        )
        has_guidance = bool(metadata.get("actionable_guidance"))
        has_dimensions = bool(metadata.get("dimension_scores"))
        self._logger.debug(
            "trial_builder.metadata.passthrough",
            has_feedback=has_feedback,
            has_guidance=has_guidance,
            has_dimensions=has_dimensions,
        )

    # Add error if present
    if error:
        feedback["error"] = error

    return feedback

build_trial

build_trial(
    input_text: str | None,
    output: str,
    score: float,
    metadata: dict[str, Any] | None = None,
    *,
    error: str | None = None,
    trace: dict[str, Any] | None = None,
    extra_trajectory: dict[str, Any] | None = None,
    log_passthrough: bool = False,
) -> dict[str, Any]

Build a complete trial record for reflection.

Constructs a trial with feedback and trajectory dicts following the GEPA whitepaper structure.

PARAMETERS

input_text (str | None)
    The input that was given to the system. Can be None for pipelines where input context is implicit.

output (str)
    What the system produced.

score (float)
    Evaluation score for this output.

metadata (dict[str, Any] | None, default None)
    Optional scorer metadata dict (from CriticScorer).

error (str | None, default None)
    Optional error message from execution.

trace (dict[str, Any] | None, default None)
    Optional execution trace dict (tool calls, state, tokens).

extra_trajectory (dict[str, Any] | None, default None)
    Optional extra fields to include in trajectory (e.g., component name, component value, tokens).

log_passthrough (bool, default False)
    If True, log debug info about metadata extraction.

RETURNS

dict[str, Any]
    Trial dict with keys:
      - feedback: Evaluation feedback (score, feedback_text, etc.)
      - trajectory: Execution journey (input, output, trace, etc.)

Examples:

```python
builder = TrialBuilder()

# Simple trial
trial = builder.build_trial(
    input_text="What is Python?",
    output="A programming language",
    score=0.9,
)
assert trial["feedback"]["score"] == 0.9
assert trial["trajectory"]["output"] == "A programming language"

# With trace and extra trajectory data
trial = builder.build_trial(
    input_text="Count to 3",
    output="1, 2, 3",
    score=1.0,
    trace={"tool_calls": [{"name": "count"}]},
    extra_trajectory={"component": "counter"},
)
assert "trace" in trial["trajectory"]
assert trial["trajectory"]["component"] == "counter"
```
Note

Optional input is only included in trajectory when not None, supporting pipelines where input context is implicit.
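
A short sketch of the None-input case, paired with the error passthrough; the error message is illustrative:

```python
builder = TrialBuilder()

# Implicit-input pipeline: no input_text, but an execution error to record.
trial = builder.build_trial(
    input_text=None,
    output="",
    score=0.0,
    error="Agent timed out after 30s",  # illustrative error message
)

assert "input" not in trial["trajectory"]  # omitted when input_text is None
assert trial["trajectory"]["output"] == ""
assert trial["feedback"]["error"] == "Agent timed out after 30s"
assert trial["feedback"]["score"] == 0.0
```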

Source code in src/gepa_adk/adapters/trial_builder.py
def build_trial(
    self,
    input_text: str | None,
    output: str,
    score: float,
    metadata: dict[str, Any] | None = None,
    *,
    error: str | None = None,
    trace: dict[str, Any] | None = None,
    extra_trajectory: dict[str, Any] | None = None,
    log_passthrough: bool = False,
) -> dict[str, Any]:
    """Build a complete trial record for reflection.

    Constructs a trial with feedback and trajectory dicts following the
    GEPA whitepaper structure.

    Args:
        input_text: The input that was given to the system. Can be None for
            pipelines where input context is implicit.
        output: What the system produced.
        score: Evaluation score for this output.
        metadata: Optional scorer metadata dict (from CriticScorer).
        error: Optional error message from execution.
        trace: Optional execution trace dict (tool calls, state, tokens).
        extra_trajectory: Optional extra fields to include in trajectory
            (e.g., component name, component value, tokens).
        log_passthrough: If True, log debug info about metadata extraction.

    Returns:
        Trial dict with keys:
            - feedback: Evaluation feedback (score, feedback_text, etc.)
            - trajectory: Execution journey (input, output, trace, etc.)

    Examples:
        ```python
        builder = TrialBuilder()

        # Simple trial
        trial = builder.build_trial(
            input_text="What is Python?",
            output="A programming language",
            score=0.9,
        )
        assert trial["feedback"]["score"] == 0.9
        assert trial["trajectory"]["output"] == "A programming language"

        # With trace and extra trajectory data
        trial = builder.build_trial(
            input_text="Count to 3",
            output="1, 2, 3",
            score=1.0,
            trace={"tool_calls": [{"name": "count"}]},
            extra_trajectory={"component": "counter"},
        )
        assert "trace" in trial["trajectory"]
        assert trial["trajectory"]["component"] == "counter"
        ```

    Note:
        Optional input is only included in trajectory when not None,
        supporting pipelines where input context is implicit.
    """
    # Build feedback dict
    feedback = self.build_feedback(
        score,
        metadata,
        error=error,
        log_passthrough=log_passthrough,
    )

    # Build trajectory dict
    trajectory: dict[str, Any] = {"output": output}

    # Add input if available
    if input_text is not None:
        trajectory["input"] = input_text

    # Add trace if available
    if trace:
        trajectory["trace"] = trace

    # Add extra trajectory fields if provided
    if extra_trajectory:
        trajectory.update(extra_trajectory)

    # Build trial record
    return {
        "feedback": feedback,
        "trajectory": trajectory,
    }

normalize_feedback

normalize_feedback(
    score: float, raw_feedback: str | dict[str, Any] | None
) -> dict[str, Any]

Normalize simple or advanced feedback to consistent trial format.

Converts both simple string feedback and advanced dict feedback to a standardized format for use in trial records. This enables the reflection agent to receive consistent feedback regardless of which format the scorer used.

PARAMETERS

score (float)
    The evaluation score (0.0-1.0).

raw_feedback (str | dict[str, Any] | None)
    Either a string (simple format) or dict (advanced format). If None, feedback_text defaults to an empty string.

RETURNS

dict[str, Any]
    Normalized feedback dict with at minimum:
      - score: float
      - feedback_text: str
    Plus optional fields if provided in advanced format:
      - dimensions: dict[str, float]
      - guidance: str
      - Any custom fields from the input dict

Examples:

Simple string feedback:

```python
result = normalize_feedback(0.75, "Good but verbose")
# {"score": 0.75, "feedback_text": "Good but verbose"}
```

Advanced dict feedback:

```python
result = normalize_feedback(
    0.45,
    {
        "feedback_text": "Too clinical",
        "dimension_scores": {"voice": 0.2},
        "actionable_guidance": "Add I statements",
    },
)
# {
#     "score": 0.45,
#     "feedback_text": "Too clinical",
#     "dimensions": {"voice": 0.2},
#     "guidance": "Add I statements"
# }
```

Fallback to legacy "feedback" key:

```python
result = normalize_feedback(0.6, {"feedback": "Legacy format"})
# {"score": 0.6, "feedback_text": "Legacy format"}
```

Handle None:

```python
result = normalize_feedback(1.0, None)
# {"score": 1.0, "feedback_text": ""}
```
Note

Score parameter always takes precedence over any "score" key in dict. Field mapping applies: "dimension_scores" → "dimensions", "actionable_guidance" → "guidance". Non-string feedback_text values convert to strings. Empty dimension dicts are excluded. Custom fields pass through unchanged.
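
A compact sketch of those rules; the trace_id key is a made-up custom field, included only to show passthrough:

```python
result = normalize_feedback(
    0.9,
    {
        "score": 0.2,            # ignored: the score parameter takes precedence
        "feedback_text": 42,     # non-string: converted to "42"
        "dimension_scores": {},  # empty dict: dropped
        "trace_id": "run-17",    # hypothetical custom field: passed through as-is
    },
)
assert result["score"] == 0.9
assert result["feedback_text"] == "42"
assert "dimensions" not in result
assert result["trace_id"] == "run-17"
```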

Source code in src/gepa_adk/adapters/trial_builder.py
def normalize_feedback(
    score: float,
    raw_feedback: str | dict[str, Any] | None,
) -> dict[str, Any]:
    """Normalize simple or advanced feedback to consistent trial format.

    Converts both simple string feedback and advanced dict feedback to a
    standardized format for use in trial records. This enables the reflection
    agent to receive consistent feedback regardless of which format the scorer
    used.

    Args:
        score: The evaluation score (0.0-1.0).
        raw_feedback: Either a string (simple format) or dict (advanced format).
            If None, feedback_text defaults to empty string.

    Returns:
        Normalized feedback dict with at minimum:
            - score: float
            - feedback_text: str

        Plus optional fields if provided in advanced format:
            - dimensions: dict[str, float]
            - guidance: str
            - Any custom fields from input dict

    Examples:
        Simple string feedback:

        ```python
        result = normalize_feedback(0.75, "Good but verbose")
        # {"score": 0.75, "feedback_text": "Good but verbose"}
        ```

        Advanced dict feedback:

        ```python
        result = normalize_feedback(
            0.45,
            {
                "feedback_text": "Too clinical",
                "dimension_scores": {"voice": 0.2},
                "actionable_guidance": "Add I statements",
            },
        )
        # {
        #     "score": 0.45,
        #     "feedback_text": "Too clinical",
        #     "dimensions": {"voice": 0.2},
        #     "guidance": "Add I statements"
        # }
        ```

        Fallback to legacy "feedback" key:

        ```python
        result = normalize_feedback(0.6, {"feedback": "Legacy format"})
        # {"score": 0.6, "feedback_text": "Legacy format"}
        ```

        Handle None:

        ```python
        result = normalize_feedback(1.0, None)
        # {"score": 1.0, "feedback_text": ""}
        ```

    Note:
        Score parameter always takes precedence over any "score" key in dict.
        Field mapping applies: "dimension_scores" → "dimensions",
        "actionable_guidance" → "guidance". Non-string feedback_text values
        convert to strings. Empty dimension dicts are excluded. Custom fields
        pass through unchanged.
    """
    result: dict[str, Any] = {"score": score}

    # Handle None input
    if raw_feedback is None:
        result["feedback_text"] = ""
        return result

    # Handle string input - wrap in dict
    if isinstance(raw_feedback, str):
        result["feedback_text"] = raw_feedback
        return result

    # Handle dict input
    if not isinstance(raw_feedback, dict):
        # Fallback: convert non-dict to string
        result["feedback_text"] = str(raw_feedback)
        return result

    # Extract feedback_text (primary) or fall back to "feedback" key only if missing
    feedback_text = raw_feedback.get("feedback_text")
    if feedback_text is None:
        feedback_text = raw_feedback.get("feedback")
    if feedback_text is None:
        result["feedback_text"] = ""
    elif isinstance(feedback_text, str):
        result["feedback_text"] = feedback_text
    else:
        # Convert non-string to string
        result["feedback_text"] = str(feedback_text)

    # Map dimension_scores → dimensions (rename + preserve if non-empty dict)
    dimension_scores = raw_feedback.get("dimension_scores")
    if dimension_scores and isinstance(dimension_scores, dict):
        result["dimensions"] = dimension_scores

    # Map actionable_guidance → guidance (rename + preserve if non-empty string)
    actionable_guidance = raw_feedback.get("actionable_guidance")
    if actionable_guidance and isinstance(actionable_guidance, str):
        guidance_str = actionable_guidance.strip()
        if guidance_str:
            result["guidance"] = guidance_str

    # Pass through any custom fields (excluding already-processed keys)
    processed_keys = {
        "feedback_text",
        "feedback",
        "dimension_scores",
        "actionable_guidance",
        "score",  # Explicit parameter wins
    }
    for key, value in raw_feedback.items():
        if key not in processed_keys:
            result[key] = value

    return result