
Evaluation policy

evaluation_policy

Protocol definition for valset evaluation strategies.

| ATTRIBUTE | DESCRIPTION |
| --- | --- |
| `EvaluationPolicyProtocol` | Protocol for valset evaluation strategies. |

Examples:

Implement a full-evaluation policy:

```python
from gepa_adk.ports.evaluation_policy import EvaluationPolicyProtocol
from gepa_adk.domain.state import ParetoState
from typing import Sequence


class FullEvalPolicy:
    def get_eval_batch(
        self,
        valset_ids: Sequence[int],
        state: ParetoState,
        target_candidate_idx: int | None = None,
    ) -> list[int]:
        return list(valset_ids)

    def get_best_candidate(self, state: ParetoState) -> int:
        return 0

    def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
        return 0.0


policy = FullEvalPolicy()
assert isinstance(policy, EvaluationPolicyProtocol)
```

EvaluationPolicyProtocol

Bases: Protocol



Protocol for valset evaluation strategies.

Determines which validation examples to evaluate per iteration and how to identify the best candidate based on evaluation results.

Note

Adapters implementing this protocol control evaluation cost and candidate selection strategies for scalable evolution runs.

Examples:

```python
class MyPolicy:
    def get_eval_batch(
        self, valset_ids: Sequence[int], state: ParetoState
    ) -> list[int]:
        return list(valset_ids)

    def get_best_candidate(self, state: ParetoState) -> int:
        return 0

    def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
        return 0.5
```
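Beyond the trivial full-evaluation case, the protocol also admits cost-capping policies. The sketch below is illustrative only: `RandomSubsetPolicy` is not part of the library, and `FakeParetoState` is a hypothetical stand-in assuming `candidate_scores` maps candidate index to a dict of per-example scores (the real `ParetoState` in `gepa_adk.domain.state` may differ).

```python
import random
from dataclasses import dataclass, field


# Hypothetical stand-in for gepa_adk.domain.state.ParetoState.
# Assumption: candidate_scores maps candidate index -> {example index: score}.
@dataclass
class FakeParetoState:
    candidate_scores: dict[int, dict[int, float]] = field(default_factory=dict)


class RandomSubsetPolicy:
    """Caps evaluation cost by sampling a fixed-size random subset per iteration."""

    def __init__(self, batch_size: int = 3, seed: int = 0) -> None:
        self.batch_size = batch_size
        self._rng = random.Random(seed)

    def get_eval_batch(self, valset_ids, state, target_candidate_idx=None):
        # Return at most batch_size indices; always a subset of valset_ids.
        k = min(self.batch_size, len(valset_ids))
        return sorted(self._rng.sample(list(valset_ids), k))

    def get_best_candidate(self, state):
        if not state.candidate_scores:
            # Stand-in for the library's NoCandidateAvailableError.
            raise LookupError("state has no candidates")
        return max(
            state.candidate_scores,
            key=lambda idx: self.get_valset_score(idx, state),
        )

    def get_valset_score(self, candidate_idx, state):
        scores = state.candidate_scores.get(candidate_idx, {})
        if not scores:
            return float("-inf")
        return sum(scores.values()) / len(scores)


policy = RandomSubsetPolicy(batch_size=2, seed=42)
state = FakeParetoState(candidate_scores={0: {0: 0.4, 1: 0.6}, 1: {0: 0.9}})
batch = policy.get_eval_batch([0, 1, 2, 3, 4], state)
```

Because the protocol is `@runtime_checkable`, such a class satisfies `isinstance(policy, EvaluationPolicyProtocol)` structurally, without inheriting from it.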
Source code in src/gepa_adk/ports/evaluation_policy.py
````python
@runtime_checkable
class EvaluationPolicyProtocol(Protocol):
    """Protocol for valset evaluation strategies.

    Determines which validation examples to evaluate per iteration
    and how to identify the best candidate based on evaluation results.

    Note:
        Adapters implementing this protocol control evaluation cost and
        candidate selection strategies for scalable evolution runs.

    Examples:
        ```python
        class MyPolicy:
            def get_eval_batch(
                self, valset_ids: Sequence[int], state: ParetoState
            ) -> list[int]:
                return list(valset_ids)

            def get_best_candidate(self, state: ParetoState) -> int:
                return 0

            def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
                return 0.5
        ```
    """

    def get_eval_batch(
        self,
        valset_ids: Sequence[int],
        state: ParetoState,
        target_candidate_idx: int | None = None,
    ) -> list[int]:
        """Select validation example indices to evaluate.

        Args:
            valset_ids: All available validation example indices (0 to N-1).
            state: Current evolution state with candidate scores.
            target_candidate_idx: Optional candidate being evaluated (for adaptive policies).

        Returns:
            List of example indices to evaluate this iteration.
            Must be a subset of valset_ids (or equal).

        Note:
            Outputs a list of valset indices to evaluate, which may be a subset
            or the full valset depending on the policy implementation.

        Examples:
            ```python
            batch = policy.get_eval_batch([0, 1, 2, 3, 4], state)
            # Returns: [0, 1, 2, 3, 4] for full evaluation
            ```
        """
        ...

    def get_best_candidate(self, state: ParetoState) -> int:
        """Return index of best candidate based on evaluation results.

        Args:
            state: Current evolution state with candidate_scores populated.

        Returns:
            Index of best performing candidate.

        Raises:
            NoCandidateAvailableError: If state has no candidates.

        Note:
            Outputs the index of the best candidate based on the policy's
            scoring strategy (e.g., highest average score).

        Examples:
            ```python
            best_idx = policy.get_best_candidate(state)
            ```
        """
        ...

    def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
        """Return aggregated valset score for a candidate.

        Args:
            candidate_idx: Index of candidate to score.
            state: Current evolution state.

        Returns:
            Aggregated score (typically mean across evaluated examples).
            Returns float('-inf') if candidate has no evaluated scores.

        Note:
            Outputs an aggregated score for the candidate, typically the mean
            across evaluated examples, or negative infinity if no scores exist.

        Examples:
            ```python
            score = policy.get_valset_score(0, state)
            ```
        """
        ...
````

get_eval_batch

```python
get_eval_batch(
    valset_ids: Sequence[int],
    state: ParetoState,
    target_candidate_idx: int | None = None,
) -> list[int]
```

Select validation example indices to evaluate.

| PARAMETER | TYPE | DESCRIPTION |
| --- | --- | --- |
| `valset_ids` | `Sequence[int]` | All available validation example indices (0 to N-1). |
| `state` | `ParetoState` | Current evolution state with candidate scores. |
| `target_candidate_idx` | `int \| None` (default `None`) | Optional candidate being evaluated (for adaptive policies). |

| RETURNS | DESCRIPTION |
| --- | --- |
| `list[int]` | List of example indices to evaluate this iteration. Must be a subset of `valset_ids` (or equal). |

Note

Outputs a list of valset indices to evaluate, which may be a subset or the full valset depending on the policy implementation.

Examples:

```python
batch = policy.get_eval_batch([0, 1, 2, 3, 4], state)
# Returns: [0, 1, 2, 3, 4] for full evaluation
```
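A policy need not return the full valset; any subset satisfies the contract. A minimal sketch of subset selection (the stride sampling and the `stride_batch` helper are illustrative, not part of the library):

```python
from typing import Sequence


def stride_batch(valset_ids: Sequence[int], stride: int = 2) -> list[int]:
    # Illustrative subset selection: keep every `stride`-th example in order.
    # The result is always a subset of valset_ids, as the protocol requires.
    return list(valset_ids)[::stride]


batch = stride_batch([0, 1, 2, 3, 4], stride=2)
# batch == [0, 2, 4], a subset of the full valset
```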
Source code in src/gepa_adk/ports/evaluation_policy.py
````python
def get_eval_batch(
    self,
    valset_ids: Sequence[int],
    state: ParetoState,
    target_candidate_idx: int | None = None,
) -> list[int]:
    """Select validation example indices to evaluate.

    Args:
        valset_ids: All available validation example indices (0 to N-1).
        state: Current evolution state with candidate scores.
        target_candidate_idx: Optional candidate being evaluated (for adaptive policies).

    Returns:
        List of example indices to evaluate this iteration.
        Must be a subset of valset_ids (or equal).

    Note:
        Outputs a list of valset indices to evaluate, which may be a subset
        or the full valset depending on the policy implementation.

    Examples:
        ```python
        batch = policy.get_eval_batch([0, 1, 2, 3, 4], state)
        # Returns: [0, 1, 2, 3, 4] for full evaluation
        ```
    """
    ...
````

get_best_candidate

```python
get_best_candidate(state: ParetoState) -> int
```

Return index of best candidate based on evaluation results.

| PARAMETER | TYPE | DESCRIPTION |
| --- | --- | --- |
| `state` | `ParetoState` | Current evolution state with `candidate_scores` populated. |

| RETURNS | DESCRIPTION |
| --- | --- |
| `int` | Index of best performing candidate. |

| RAISES | DESCRIPTION |
| --- | --- |
| `NoCandidateAvailableError` | If state has no candidates. |

Note

Outputs the index of the best candidate based on the policy's scoring strategy (e.g., highest average score).

Examples:

```python
best_idx = policy.get_best_candidate(state)
```
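A typical implementation is an argmax over aggregated scores. A sketch, assuming `candidate_scores` maps candidate index to per-example scores (the real `ParetoState` shape may differ, and `ValueError` stands in for the library's `NoCandidateAvailableError`):

```python
def best_candidate(candidate_scores: dict[int, dict[int, float]]) -> int:
    # Nothing to choose from: the real protocol raises NoCandidateAvailableError.
    if not candidate_scores:
        raise ValueError("state has no candidates")

    def mean_score(scores: dict[int, float]) -> float:
        # Unevaluated candidates get -inf so they never win the argmax.
        return sum(scores.values()) / len(scores) if scores else float("-inf")

    return max(candidate_scores, key=lambda idx: mean_score(candidate_scores[idx]))


best = best_candidate({0: {0: 0.4, 1: 0.6}, 1: {0: 0.9}})
# candidate 1 wins: 0.9 > (0.4 + 0.6) / 2
```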
Source code in src/gepa_adk/ports/evaluation_policy.py
````python
def get_best_candidate(self, state: ParetoState) -> int:
    """Return index of best candidate based on evaluation results.

    Args:
        state: Current evolution state with candidate_scores populated.

    Returns:
        Index of best performing candidate.

    Raises:
        NoCandidateAvailableError: If state has no candidates.

    Note:
        Outputs the index of the best candidate based on the policy's
        scoring strategy (e.g., highest average score).

    Examples:
        ```python
        best_idx = policy.get_best_candidate(state)
        ```
    """
    ...
````

get_valset_score

```python
get_valset_score(candidate_idx: int, state: ParetoState) -> float
```

Return aggregated valset score for a candidate.

| PARAMETER | TYPE | DESCRIPTION |
| --- | --- | --- |
| `candidate_idx` | `int` | Index of candidate to score. |
| `state` | `ParetoState` | Current evolution state. |

| RETURNS | DESCRIPTION |
| --- | --- |
| `float` | Aggregated score (typically mean across evaluated examples). Returns `float('-inf')` if candidate has no evaluated scores. |

Note

Outputs an aggregated score for the candidate, typically the mean across evaluated examples, or negative infinity if no scores exist.

Examples:

```python
score = policy.get_valset_score(0, state)
```
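The mean-with-sentinel aggregation described above can be sketched as follows; the flat `candidate_scores` dict is an assumed stand-in for the scores held on `ParetoState`:

```python
def valset_score(candidate_idx: int, candidate_scores: dict[int, dict[int, float]]) -> float:
    # Mean over evaluated examples; negative infinity signals "never evaluated"
    # so unevaluated candidates always lose score comparisons.
    scores = candidate_scores.get(candidate_idx, {})
    if not scores:
        return float("-inf")
    return sum(scores.values()) / len(scores)


scores_by_candidate = {0: {0: 0.4, 1: 0.8}}
```

Here `valset_score(0, scores_by_candidate)` averages the two evaluated examples (about 0.6), while `valset_score(1, scores_by_candidate)` falls back to negative infinity.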
Source code in src/gepa_adk/ports/evaluation_policy.py
````python
def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
    """Return aggregated valset score for a candidate.

    Args:
        candidate_idx: Index of candidate to score.
        state: Current evolution state.

    Returns:
        Aggregated score (typically mean across evaluated examples).
        Returns float('-inf') if candidate has no evaluated scores.

    Note:
        Outputs an aggregated score for the candidate, typically the mean
        across evaluated examples, or negative infinity if no scores exist.

    Examples:
        ```python
        score = policy.get_valset_score(0, state)
        ```
    """
    ...
````