Evaluation policy
evaluation_policy
Protocol definition for valset evaluation strategies.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| EvaluationPolicyProtocol | Protocol for valset evaluation strategies. |
Examples:
Implement a full-evaluation policy:
from gepa_adk.ports.evaluation_policy import EvaluationPolicyProtocol
from gepa_adk.domain.state import ParetoState
from typing import Sequence

class FullEvalPolicy:
    def get_eval_batch(
        self,
        valset_ids: Sequence[int],
        state: ParetoState,
        target_candidate_idx: int | None = None,
    ) -> list[int]:
        # Full evaluation: every validation example, every iteration.
        return list(valset_ids)

    def get_best_candidate(self, state: ParetoState) -> int:
        return 0

    def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
        return 0.0

policy = FullEvalPolicy()
assert isinstance(policy, EvaluationPolicyProtocol)
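The isinstance check above works structurally: FullEvalPolicy never inherits from the protocol. The sketch below reproduces that behaviour with a local stand-in, PolicyLike, which is hypothetical and only mirrors the three methods documented on this page; it assumes the library's protocol is decorated with typing.runtime_checkable in the same way. Note that a runtime protocol check only confirms that the methods exist, not that their signatures match.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class PolicyLike(Protocol):
    """Hypothetical local stand-in for EvaluationPolicyProtocol."""

    def get_eval_batch(
        self,
        valset_ids: Sequence[int],
        state: Any,
        target_candidate_idx: int | None = None,
    ) -> list[int]: ...

    def get_best_candidate(self, state: Any) -> int: ...

    def get_valset_score(self, candidate_idx: int, state: Any) -> float: ...


class CompletePolicy:
    """Defines all three methods, so it satisfies the protocol."""

    def get_eval_batch(self, valset_ids, state, target_candidate_idx=None):
        return list(valset_ids)

    def get_best_candidate(self, state):
        return 0

    def get_valset_score(self, candidate_idx, state):
        return 0.0


class IncompletePolicy:
    """Lacks get_valset_score, so the runtime check rejects it."""

    def get_eval_batch(self, valset_ids, state, target_candidate_idx=None):
        return list(valset_ids)

    def get_best_candidate(self, state):
        return 0


complete_ok = isinstance(CompletePolicy(), PolicyLike)
incomplete_ok = isinstance(IncompletePolicy(), PolicyLike)
```

A static type checker remains the better tool for catching signature mismatches, since the runtime check cannot see them.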
See Also
gepa_adk.engine: Evolution engine that uses evaluation policies.
AsyncGEPAAdapter: Adapter protocol that performs the evaluations selected by policies.
EvaluationPolicyProtocol
Bases: Protocol
Protocol for valset evaluation strategies.
Determines which validation examples to evaluate per iteration and how to identify the best candidate based on evaluation results.
Note
Adapters implementing this protocol control evaluation cost and candidate selection strategies for scalable evolution runs.
Examples:
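from typing import Sequence
from gepa_adk.domain.state import ParetoState

class MyPolicy:
    def get_eval_batch(
        self,
        valset_ids: Sequence[int],
        state: ParetoState,
        target_candidate_idx: int | None = None,  # matches the protocol signature
    ) -> list[int]:
        return list(valset_ids)

    def get_best_candidate(self, state: ParetoState) -> int:
        return 0

    def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
        return 0.5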
Source code in src/gepa_adk/ports/evaluation_policy.py
get_eval_batch
get_eval_batch(
    valset_ids: Sequence[int],
    state: ParetoState,
    target_candidate_idx: int | None = None,
) -> list[int]
Select validation example indices to evaluate.
| PARAMETER | DESCRIPTION |
|---|---|
| valset_ids | All available validation example indices (0 to N-1). TYPE: Sequence[int] |
| state | Current evolution state with candidate scores. TYPE: ParetoState |
| target_candidate_idx | Optional candidate being evaluated (for adaptive policies). TYPE: int \| None. DEFAULT: None |

| RETURNS | DESCRIPTION |
|---|---|
| list[int] | List of example indices to evaluate this iteration. Must be a subset of valset_ids (or equal). |
Note
Outputs a list of valset indices to evaluate, which may be a subset or the full valset depending on the policy implementation.
Examples:
batch = policy.get_eval_batch([0, 1, 2, 3, 4], state)
# Returns: [0, 1, 2, 3, 4] for full evaluation
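For contrast with full evaluation, here is a minimal sketch of a policy that returns only a subset of valset_ids. SubsetEvalPolicy, its batch_size and seed parameters, and the random-sampling strategy are all assumptions made for illustration, not part of gepa_adk. The state argument is typed as object and ignored so the block runs without the library installed.

```python
from __future__ import annotations

import random
from collections.abc import Sequence


class SubsetEvalPolicy:
    """Hypothetical policy: evaluate a fixed-size random sample per call."""

    def __init__(self, batch_size: int, seed: int = 0) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        self._rng = random.Random(seed)  # seeded for reproducible sampling

    def get_eval_batch(
        self,
        valset_ids: Sequence[int],
        state: object,  # stand-in for ParetoState; unused in this sketch
        target_candidate_idx: int | None = None,
    ) -> list[int]:
        ids = list(valset_ids)
        # Fall back to the full valset when it is no larger than the batch.
        if len(ids) <= self.batch_size:
            return ids
        # Sample without replacement so the result is a subset of valset_ids.
        return sorted(self._rng.sample(ids, self.batch_size))


subset_policy = SubsetEvalPolicy(batch_size=3)
subset_batch = subset_policy.get_eval_batch([0, 1, 2, 3, 4], state=None)
```

Sampling without replacement keeps the documented contract: every returned index comes from valset_ids and none is repeated.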
get_best_candidate
get_best_candidate(state: ParetoState) -> int
Return index of best candidate based on evaluation results.
| PARAMETER | DESCRIPTION |
|---|---|
| state | Current evolution state with candidate_scores populated. TYPE: ParetoState |

| RETURNS | DESCRIPTION |
|---|---|
| int | Index of best performing candidate. |

| RAISES | DESCRIPTION |
|---|---|
| NoCandidateAvailableError | If state has no candidates. |
Note
Outputs the index of the best candidate based on the policy's scoring strategy (e.g., highest average score).
Examples:
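One way to realize the "highest average score" strategy is sketched below. ParetoState is replaced by a hypothetical SketchState whose candidate_scores maps each candidate index to its per-example scores; the real attribute layout may differ. NoCandidateAvailableError is redefined locally as a stand-in for the library's exception, and mean_score is a helper introduced only for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class NoCandidateAvailableError(Exception):
    """Local stand-in for the library exception of the same name."""


@dataclass
class SketchState:
    """Hypothetical stand-in for ParetoState.

    Assumed layout: candidate index -> {example index: score}.
    """

    candidate_scores: dict[int, dict[int, float]] = field(default_factory=dict)


def mean_score(candidate_idx: int, state: SketchState) -> float:
    """Mean score of a candidate, or -inf if it has no evaluated examples."""
    scores = state.candidate_scores.get(candidate_idx, {})
    if not scores:
        return float("-inf")
    return sum(scores.values()) / len(scores)


def get_best_candidate(state: SketchState) -> int:
    """Return the candidate index with the highest mean score."""
    if not state.candidate_scores:
        raise NoCandidateAvailableError("state has no candidates")
    # On ties, max() keeps the first candidate in dict insertion order.
    return max(state.candidate_scores, key=lambda idx: mean_score(idx, state))


example_state = SketchState(
    candidate_scores={
        0: {0: 0.2, 1: 0.4},  # two evaluated examples
        1: {0: 0.9},          # one evaluated example
        2: {},                # not evaluated yet
    }
)
best_idx = get_best_candidate(example_state)
```

Treating an unevaluated candidate as negative infinity means it can never win against a candidate that has at least one score.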
get_valset_score
get_valset_score(
    candidate_idx: int, state: ParetoState
) -> float
Return aggregated valset score for a candidate.
| PARAMETER | DESCRIPTION |
|---|---|
| candidate_idx | Index of candidate to score. TYPE: int |
| state | Current evolution state. TYPE: ParetoState |

| RETURNS | DESCRIPTION |
|---|---|
| float | Aggregated score (typically mean across evaluated examples). Returns float('-inf') if candidate has no evaluated scores. |
Note
Outputs an aggregated score for the candidate, typically the mean across evaluated examples, or negative infinity if no scores exist.
Examples:
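A minimal sketch of the mean-or-negative-infinity behaviour follows. It assumes a hypothetical SketchState in place of ParetoState, with candidate_scores mapping each candidate index to its per-example scores; the real state object may be organized differently. Treating an unknown candidate index the same as an unevaluated one is also an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchState:
    """Hypothetical stand-in for ParetoState.

    Assumed layout: candidate index -> {example index: score}.
    """

    candidate_scores: dict[int, dict[int, float]] = field(default_factory=dict)


def get_valset_score(candidate_idx: int, state: SketchState) -> float:
    """Mean over evaluated examples, or -inf when nothing was evaluated."""
    scores = state.candidate_scores.get(candidate_idx, {})
    if not scores:
        # No evaluated scores (or unknown candidate): signal "worst possible".
        return float("-inf")
    return sum(scores.values()) / len(scores)


score_state = SketchState(candidate_scores={0: {0: 0.5, 1: 1.0}, 1: {}})
evaluated_score = get_valset_score(0, score_state)    # mean of 0.5 and 1.0
unevaluated_score = get_valset_score(1, score_state)  # no scores recorded
```

Because the mean is taken only over evaluated examples, scores from policies that evaluate different subsets are not directly comparable without further care.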