
Evaluation policy

evaluation_policy

Protocol definition for valset evaluation strategies.

| ATTRIBUTE | DESCRIPTION |
| --- | --- |
| `EvaluationPolicyProtocol` | Protocol for valset evaluation strategies. |

Examples:

Implement a full-evaluation policy:

```python
from gepa_adk.ports.evaluation_policy import EvaluationPolicyProtocol
from gepa_adk.domain.state import ParetoState
from typing import Sequence


class FullEvalPolicy:
    def get_eval_batch(
        self,
        valset_ids: Sequence[int],
        state: ParetoState,
        target_candidate_idx: int | None = None,
    ) -> list[int]:
        return list(valset_ids)

    def get_best_candidate(self, state: ParetoState) -> int:
        return 0

    def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
        return 0.0


policy = FullEvalPolicy()
assert isinstance(policy, EvaluationPolicyProtocol)
```

EvaluationPolicyProtocol

Bases: Protocol



Protocol for valset evaluation strategies.

Determines which validation examples to evaluate per iteration and how to identify the best candidate based on evaluation results.

Note

Adapters implementing this protocol control evaluation cost and candidate selection strategies for scalable evolution runs.

Examples:

```python
class MyPolicy:
    def get_eval_batch(
        self, valset_ids: Sequence[int], state: ParetoState
    ) -> list[int]:
        return list(valset_ids)

    def get_best_candidate(self, state: ParetoState) -> int:
        return 0

    def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
        return 0.5
```
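Beyond the trivial full-evaluation case, the protocol also admits cost-capping policies. The sketch below is illustrative only: `RandomSubsetPolicy` is not part of the library, and `FakeParetoState` is a hypothetical stand-in assuming `candidate_scores` maps candidate index to a dict of per-example scores (the real `ParetoState` in `gepa_adk.domain.state` may differ).

```python
import random
from dataclasses import dataclass, field


# Hypothetical stand-in for gepa_adk.domain.state.ParetoState.
# Assumption: candidate_scores maps candidate index -> {example index: score}.
@dataclass
class FakeParetoState:
    candidate_scores: dict[int, dict[int, float]] = field(default_factory=dict)


class RandomSubsetPolicy:
    """Caps evaluation cost by sampling a fixed-size random subset per iteration."""

    def __init__(self, batch_size: int = 3, seed: int = 0) -> None:
        self.batch_size = batch_size
        self._rng = random.Random(seed)

    def get_eval_batch(self, valset_ids, state, target_candidate_idx=None):
        # Return at most batch_size indices; always a subset of valset_ids.
        k = min(self.batch_size, len(valset_ids))
        return sorted(self._rng.sample(list(valset_ids), k))

    def get_best_candidate(self, state):
        if not state.candidate_scores:
            # Stand-in for the library's NoCandidateAvailableError.
            raise LookupError("state has no candidates")
        return max(
            state.candidate_scores,
            key=lambda idx: self.get_valset_score(idx, state),
        )

    def get_valset_score(self, candidate_idx, state):
        scores = state.candidate_scores.get(candidate_idx, {})
        if not scores:
            return float("-inf")
        return sum(scores.values()) / len(scores)


policy = RandomSubsetPolicy(batch_size=2, seed=42)
state = FakeParetoState(candidate_scores={0: {0: 0.4, 1: 0.6}, 1: {0: 0.9}})
batch = policy.get_eval_batch([0, 1, 2, 3, 4], state)
```

Because the protocol is `@runtime_checkable`, such a class satisfies `isinstance(policy, EvaluationPolicyProtocol)` structurally, without inheriting from it.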
Source code in src/gepa_adk/ports/evaluation_policy.py
````python
@runtime_checkable
class EvaluationPolicyProtocol(Protocol):
    """Protocol for valset evaluation strategies.

    Determines which validation examples to evaluate per iteration
    and how to identify the best candidate based on evaluation results.

    Note:
        Adapters implementing this protocol control evaluation cost and
        candidate selection strategies for scalable evolution runs.

    Examples:
        ```python
        class MyPolicy:
            def get_eval_batch(
                self, valset_ids: Sequence[int], state: ParetoState
            ) -> list[int]:
                return list(valset_ids)

            def get_best_candidate(self, state: ParetoState) -> int:
                return 0

            def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
                return 0.5
        ```
    """

    def get_eval_batch(
        self,
        valset_ids: Sequence[int],
        state: ParetoState,
        target_candidate_idx: int | None = None,
    ) -> list[int]:
        """Select validation example indices to evaluate.

        Args:
            valset_ids: All available validation example indices (0 to N-1).
            state: Current evolution state with candidate scores.
            target_candidate_idx: Optional candidate being evaluated (for adaptive policies).

        Returns:
            List of example indices to evaluate this iteration.
            Must be a subset of valset_ids (or equal).

        Note:
            Outputs a list of valset indices to evaluate, which may be a subset
            or the full valset depending on the policy implementation.

        Examples:
            ```python
            batch = policy.get_eval_batch([0, 1, 2, 3, 4], state)
            # Returns: [0, 1, 2, 3, 4] for full evaluation
            ```
        """
        ...

    def get_best_candidate(self, state: ParetoState) -> int:
        """Return index of best candidate based on evaluation results.

        Args:
            state: Current evolution state with candidate_scores populated.

        Returns:
            Index of best performing candidate.

        Raises:
            NoCandidateAvailableError: If state has no candidates.

        Note:
            Outputs the index of the best candidate based on the policy's
            scoring strategy (e.g., highest average score).

        Examples:
            ```python
            best_idx = policy.get_best_candidate(state)
            ```
        """
        ...

    def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
        """Return aggregated valset score for a candidate.

        Args:
            candidate_idx: Index of candidate to score.
            state: Current evolution state.

        Returns:
            Aggregated score (typically mean across evaluated examples).
            Returns float('-inf') if candidate has no evaluated scores.

        Note:
            Outputs an aggregated score for the candidate, typically the mean
            across evaluated examples, or negative infinity if no scores exist.

        Examples:
            ```python
            score = policy.get_valset_score(0, state)
            ```
        """
        ...
````

get_eval_batch

```python
get_eval_batch(
    valset_ids: Sequence[int],
    state: ParetoState,
    target_candidate_idx: int | None = None,
) -> list[int]
```

Select validation example indices to evaluate.

| PARAMETER | TYPE | DESCRIPTION |
| --- | --- | --- |
| `valset_ids` | `Sequence[int]` | All available validation example indices (0 to N-1). |
| `state` | `ParetoState` | Current evolution state with candidate scores. |
| `target_candidate_idx` | `int \| None` (default `None`) | Optional candidate being evaluated (for adaptive policies). |

| RETURNS | DESCRIPTION |
| --- | --- |
| `list[int]` | List of example indices to evaluate this iteration. Must be a subset of `valset_ids` (or equal). |

Note

Outputs a list of valset indices to evaluate, which may be a subset or the full valset depending on the policy implementation.

Examples:

```python
batch = policy.get_eval_batch([0, 1, 2, 3, 4], state)
# Returns: [0, 1, 2, 3, 4] for full evaluation
```
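A policy need not return the full valset; any subset satisfies the contract. A minimal sketch of subset selection (the stride sampling and the `stride_batch` helper are illustrative, not part of the library):

```python
from typing import Sequence


def stride_batch(valset_ids: Sequence[int], stride: int = 2) -> list[int]:
    # Illustrative subset selection: keep every `stride`-th example in order.
    # The result is always a subset of valset_ids, as the protocol requires.
    return list(valset_ids)[::stride]


batch = stride_batch([0, 1, 2, 3, 4], stride=2)
# batch == [0, 2, 4], a subset of the full valset
```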
Source code in src/gepa_adk/ports/evaluation_policy.py
````python
def get_eval_batch(
    self,
    valset_ids: Sequence[int],
    state: ParetoState,
    target_candidate_idx: int | None = None,
) -> list[int]:
    """Select validation example indices to evaluate.

    Args:
        valset_ids: All available validation example indices (0 to N-1).
        state: Current evolution state with candidate scores.
        target_candidate_idx: Optional candidate being evaluated (for adaptive policies).

    Returns:
        List of example indices to evaluate this iteration.
        Must be a subset of valset_ids (or equal).

    Note:
        Outputs a list of valset indices to evaluate, which may be a subset
        or the full valset depending on the policy implementation.

    Examples:
        ```python
        batch = policy.get_eval_batch([0, 1, 2, 3, 4], state)
        # Returns: [0, 1, 2, 3, 4] for full evaluation
        ```
    """
    ...
````

get_best_candidate

```python
get_best_candidate(state: ParetoState) -> int
```

Return index of best candidate based on evaluation results.

| PARAMETER | TYPE | DESCRIPTION |
| --- | --- | --- |
| `state` | `ParetoState` | Current evolution state with `candidate_scores` populated. |

| RETURNS | DESCRIPTION |
| --- | --- |
| `int` | Index of best performing candidate. |

| RAISES | DESCRIPTION |
| --- | --- |
| `NoCandidateAvailableError` | If state has no candidates. |

Note

Outputs the index of the best candidate based on the policy's scoring strategy (e.g., highest average score).

Examples:

```python
best_idx = policy.get_best_candidate(state)
```
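A typical implementation is an argmax over aggregated scores. A sketch, assuming `candidate_scores` maps candidate index to per-example scores (the real `ParetoState` shape may differ, and `ValueError` stands in for the library's `NoCandidateAvailableError`):

```python
def best_candidate(candidate_scores: dict[int, dict[int, float]]) -> int:
    # Nothing to choose from: the real protocol raises NoCandidateAvailableError.
    if not candidate_scores:
        raise ValueError("state has no candidates")

    def mean_score(scores: dict[int, float]) -> float:
        # Unevaluated candidates get -inf so they never win the argmax.
        return sum(scores.values()) / len(scores) if scores else float("-inf")

    return max(candidate_scores, key=lambda idx: mean_score(candidate_scores[idx]))


best = best_candidate({0: {0: 0.4, 1: 0.6}, 1: {0: 0.9}})
# candidate 1 wins: 0.9 > (0.4 + 0.6) / 2
```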
Source code in src/gepa_adk/ports/evaluation_policy.py
````python
def get_best_candidate(self, state: ParetoState) -> int:
    """Return index of best candidate based on evaluation results.

    Args:
        state: Current evolution state with candidate_scores populated.

    Returns:
        Index of best performing candidate.

    Raises:
        NoCandidateAvailableError: If state has no candidates.

    Note:
        Outputs the index of the best candidate based on the policy's
        scoring strategy (e.g., highest average score).

    Examples:
        ```python
        best_idx = policy.get_best_candidate(state)
        ```
    """
    ...
````

get_valset_score

```python
get_valset_score(candidate_idx: int, state: ParetoState) -> float
```

Return aggregated valset score for a candidate.

| PARAMETER | TYPE | DESCRIPTION |
| --- | --- | --- |
| `candidate_idx` | `int` | Index of candidate to score. |
| `state` | `ParetoState` | Current evolution state. |

| RETURNS | DESCRIPTION |
| --- | --- |
| `float` | Aggregated score (typically mean across evaluated examples). Returns `float('-inf')` if candidate has no evaluated scores. |

Note

Outputs an aggregated score for the candidate, typically the mean across evaluated examples, or negative infinity if no scores exist.

Examples:

```python
score = policy.get_valset_score(0, state)
```
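The mean-with-sentinel aggregation described above can be sketched as follows; the flat `candidate_scores` dict is an assumed stand-in for the scores held on `ParetoState`:

```python
def valset_score(candidate_idx: int, candidate_scores: dict[int, dict[int, float]]) -> float:
    # Mean over evaluated examples; negative infinity signals "never evaluated"
    # so unevaluated candidates always lose score comparisons.
    scores = candidate_scores.get(candidate_idx, {})
    if not scores:
        return float("-inf")
    return sum(scores.values()) / len(scores)


scores_by_candidate = {0: {0: 0.4, 1: 0.8}}
```

Here `valset_score(0, scores_by_candidate)` averages the two evaluated examples (about 0.6), while `valset_score(1, scores_by_candidate)` falls back to negative infinity.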
Source code in src/gepa_adk/ports/evaluation_policy.py
````python
def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
    """Return aggregated valset score for a candidate.

    Args:
        candidate_idx: Index of candidate to score.
        state: Current evolution state.

    Returns:
        Aggregated score (typically mean across evaluated examples).
        Returns float('-inf') if candidate has no evaluated scores.

    Note:
        Outputs an aggregated score for the candidate, typically the mean
        across evaluated examples, or negative infinity if no scores exist.

    Examples:
        ```python
        score = policy.get_valset_score(0, state)
        ```
    """
    ...
````