
Adapters

adapters

Adapters layer - External implementations of ports.

Adapters connect the domain logic to external systems (Google ADK, LiteLLM, etc.). Each adapter implements one or more protocol interfaces from the ports layer.

| Attribute | Type | Description |
| --- | --- | --- |
| `ADKAdapter` | class | AsyncGEPAAdapter implementation for Google ADK agents. |
| `TrialBuilder` | class | Shared utility for building trial records in reflection datasets. |

Examples:

Basic usage with Google ADK agent:

```python
from google.adk.agents import LlmAgent

from gepa_adk.adapters import ADKAdapter
from gepa_adk.adapters.agent_executor import AgentExecutor

agent = LlmAgent(name="helper", model="gemini-2.5-flash")
reflector = LlmAgent(name="reflector", model="gemini-2.5-flash")
adapter = ADKAdapter(agent, my_scorer, AgentExecutor(), reflection_agent=reflector)
result = await adapter.evaluate(batch, candidate)
```
Note

This layer ONLY contains adapters - they import from ports/ and domain/ but never the reverse. This maintains hexagonal architecture boundaries.
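
To illustrate the dependency rule, a new adapter would depend only on abstractions exported by the ports and domain layers. The sketch below assumes a hypothetical module path for the Scorer protocol (its real location may differ) and is not the package's verified layout:

```python
# Hypothetical module path, shown only to illustrate the import direction:
# adapters import from ports (and domain), never the reverse.
from gepa_adk.ports.scorer import Scorer


class MyServiceAdapter:
    """Hypothetical adapter that depends only on the ports-layer protocol."""

    def __init__(self, scorer: Scorer) -> None:
        self.scorer = scorer

    async def score_output(self, input_text: str, output: str) -> float:
        result = await self.scorer.async_score(input_text, output, None)
        # async_score may return a bare score or a (score, metadata) tuple.
        return result[0] if isinstance(result, tuple) else float(result)
```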

ADKAdapter

ADK implementation of AsyncGEPAAdapter protocol.

Bridges GEPA evaluation patterns to Google ADK's agent/runner architecture, enabling evolutionary optimization of ADK agents through instruction mutation and reflective learning.

| Attribute | Type | Description |
| --- | --- | --- |
| `agent` | `LlmAgent` | The ADK LlmAgent to evaluate with different candidate instructions. |
| `scorer` | `Scorer` | Scoring implementation for evaluating agent outputs. |
| `max_concurrent_evals` | `int` | Maximum number of concurrent evaluations to run in parallel. |
| `trajectory_config` | `TrajectoryConfig` | Configuration for trajectory extraction behavior (redaction, truncation, feature selection). |
| `_session_service` | `BaseSessionService` | Session service for managing agent state isolation. |
| `_app_name` | `str` | Application name used for session management. |
| `_proposer` | `AsyncReflectiveMutationProposer` | Mutation proposer for generating improved instructions via LLM reflection. |
| `_logger` | `BoundLogger` | Bound logger with adapter context for structured logging. |

Examples:

Basic adapter setup:

```python
from google.adk.agents import LlmAgent
from gepa_adk.adapters import ADKAdapter
from gepa_adk.adapters.agent_executor import AgentExecutor

agent = LlmAgent(
    name="helper",
    model="gemini-2.5-flash",
    instruction="Be helpful and concise",
)
scorer = MyScorer()  # Implements Scorer protocol
executor = AgentExecutor()
reflector = LlmAgent(name="reflector", model="gemini-2.5-flash")
adapter = ADKAdapter(agent, scorer, executor, reflection_agent=reflector)

# Evaluate with candidate instruction
batch = [{"input": "What is 2+2?", "expected": "4"}]
candidate = {"instruction": "Be very precise with math"}
result = await adapter.evaluate(batch, candidate)
```
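
The MyScorer placeholder above stands for any object that satisfies the Scorer protocol. Below is a minimal sketch, assuming only what the adapter itself checks and calls: a `score` method plus an `async_score(input, output, expected)` that may return a bare score or a (score, metadata) tuple; the exact protocol signature lives in the ports layer.

```python
from typing import Any


class ExactMatchScorer:
    """Hypothetical scorer: full credit when the output matches the expected answer."""

    def score(self, input_text: str, output: str, expected: Any = None) -> float:
        return 1.0 if expected is not None and output.strip() == str(expected) else 0.0

    async def async_score(
        self, input_text: str, output: str, expected: Any = None
    ) -> tuple[float, dict[str, Any]]:
        value = self.score(input_text, output, expected)
        # The metadata dict is surfaced via EvaluationBatch.metadata and folded
        # into each trial's feedback during reflection.
        return value, {"feedback_text": "exact match" if value else "differs from expected"}
```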
Note

Adheres to AsyncGEPAAdapter[dict[str, Any], ADKTrajectory, str] protocol. All methods are async and follow ADK's async-first patterns.

Source code in src/gepa_adk/adapters/adk_adapter.py
```python
class ADKAdapter:
    """ADK implementation of AsyncGEPAAdapter protocol.

    Bridges GEPA evaluation patterns to Google ADK's agent/runner architecture,
    enabling evolutionary optimization of ADK agents through instruction mutation
    and reflective learning.

    Attributes:
        agent (LlmAgent): The ADK LlmAgent to evaluate with different candidate
            instructions.
        scorer (Scorer): Scoring implementation for evaluating agent outputs.
        max_concurrent_evals (int): Maximum number of concurrent evaluations
            to run in parallel.
        trajectory_config (TrajectoryConfig): Configuration for trajectory
            extraction behavior (redaction, truncation, feature selection).
        _session_service (BaseSessionService): Session service for managing
            agent state isolation.
        _app_name (str): Application name used for session management.
        _proposer (AsyncReflectiveMutationProposer): Mutation proposer for
            generating improved instructions via LLM reflection.
        _logger (structlog.BoundLogger): Bound logger with adapter context for
            structured logging.

    Examples:
        Basic adapter setup:

        ```python
        from google.adk.agents import LlmAgent
        from gepa_adk.adapters import ADKAdapter
        from gepa_adk.adapters.agent_executor import AgentExecutor

        agent = LlmAgent(
            name="helper",
            model="gemini-2.5-flash",
            instruction="Be helpful and concise",
        )
        scorer = MyScorer()  # Implements Scorer protocol
        executor = AgentExecutor()
        reflector = LlmAgent(name="reflector", model="gemini-2.5-flash")
        adapter = ADKAdapter(agent, scorer, executor, reflection_agent=reflector)

        # Evaluate with candidate instruction
        batch = [{"input": "What is 2+2?", "expected": "4"}]
        candidate = {"instruction": "Be very precise with math"}
        result = await adapter.evaluate(batch, candidate)
        ```

    Note:
        Adheres to AsyncGEPAAdapter[dict[str, Any], ADKTrajectory, str] protocol.
        All methods are async and follow ADK's async-first patterns.
    """

    def __init__(
        self,
        agent: LlmAgent,
        scorer: Scorer,
        executor: AgentExecutorProtocol,
        max_concurrent_evals: int = 5,
        session_service: BaseSessionService | None = None,
        app_name: str = "gepa_adk_eval",
        trajectory_config: TrajectoryConfig | None = None,
        proposer: AsyncReflectiveMutationProposer | None = None,
        reflection_agent: LlmAgent | None = None,
        reflection_output_field: str | None = None,
        schema_constraints: SchemaConstraints | None = None,
        video_service: VideoBlobServiceProtocol | None = None,
    ) -> None:
        """Initialize the ADK adapter with agent and scorer.

        Args:
            agent: The ADK LlmAgent to evaluate with different instructions.
            scorer: Scorer implementation for evaluating agent outputs.
            executor: AgentExecutorProtocol implementation for unified agent execution.
                The executor handles session management and execution, enabling feature
                parity across all agent types.
            max_concurrent_evals: Maximum number of concurrent evaluations to run
                in parallel. Must be at least 1. Defaults to 5.
            session_service: Optional session service for state management.
                If None, creates an InMemorySessionService.
            app_name: Application name for session identification.
            trajectory_config: Configuration for trajectory extraction behavior.
                If None, uses TrajectoryConfig defaults (secure, all features enabled).
            proposer: Optional mutation proposer for generating improved instructions
                via LLM reflection. If provided, takes precedence over reflection_agent.
            reflection_agent: ADK LlmAgent to use for reflection operations.
                Either this or proposer must be provided. When provided, creates an
                ADK-based reflection function and passes it to a new proposer.
            reflection_output_field: Field name to extract from structured output when
                reflection_agent has an output_schema. When the reflection agent returns
                structured output (dict), this specifies which field contains the proposed
                text. For schema evolution, use "class_definition" with a SchemaProposal
                output_schema. Only used when reflection_agent is provided.
            schema_constraints: Optional SchemaConstraints for output_schema evolution.
                When provided, proposed schema mutations are validated against these
                constraints. Mutations that violate constraints (e.g., remove required
                fields) are rejected and the original schema is preserved.
            video_service: Optional VideoBlobServiceProtocol for multimodal input support.
                When provided, enables processing of trainset examples with 'videos' field.
                If None, defaults to a new VideoBlobService instance.

        Raises:
            TypeError: If agent is not an LlmAgent instance.
            TypeError: If scorer does not satisfy Scorer protocol.
            TypeError: If reflection_agent is provided but not an LlmAgent instance.
            ValueError: If app_name is empty string or max_concurrent_evals < 1.
            ValueError: If neither proposer nor reflection_agent is provided.

        Examples:
            Basic setup with reflection agent:

            ```python
            from gepa_adk.adapters.agent_executor import AgentExecutor

            reflection_agent = LlmAgent(name="reflector", model="gemini-2.5-flash")
            executor = AgentExecutor()
            adapter = ADKAdapter(
                agent, scorer, executor, reflection_agent=reflection_agent
            )
            ```

            With custom trajectory configuration:

            ```python
            config = TrajectoryConfig(
                redact_sensitive=True,
                max_string_length=5000,
            )
            executor = AgentExecutor()
            adapter = ADKAdapter(
                agent,
                scorer,
                executor,
                reflection_agent=reflection_agent,
                trajectory_config=config,
            )
            ```

            With shared session service:

            ```python
            from google.adk.sessions import InMemorySessionService
            from gepa_adk.adapters.agent_executor import AgentExecutor

            session_service = InMemorySessionService()
            executor = AgentExecutor(session_service=session_service)
            adapter = ADKAdapter(
                agent,
                scorer,
                executor,
                reflection_agent=reflection_agent,
                session_service=session_service,
            )
            ```

        Note:
            Caches the agent's original instruction and restores it after
            each evaluation to ensure no side effects between evaluations.
        """
        # Type validation
        if not isinstance(agent, LlmAgent):
            raise TypeError(f"agent must be LlmAgent, got {type(agent)}")

        # Scorer protocol check (runtime_checkable)
        if not hasattr(scorer, "score") or not hasattr(scorer, "async_score"):
            raise TypeError(
                f"scorer must implement Scorer protocol, got {type(scorer)}"
            )

        if not app_name or not app_name.strip():
            raise ValueError("app_name cannot be empty")

        if max_concurrent_evals < 1:
            raise ValueError(
                f"max_concurrent_evals must be at least 1, got {max_concurrent_evals}"
            )

        # Validate reflection_agent if provided
        if reflection_agent is not None and not isinstance(reflection_agent, LlmAgent):
            raise TypeError(
                f"reflection_agent must be LlmAgent, got {type(reflection_agent)}"
            )

        self.agent = agent
        self.scorer = scorer
        self.max_concurrent_evals = max_concurrent_evals
        self.trajectory_config = trajectory_config or TrajectoryConfig()
        self._session_service = session_service or InMemorySessionService()
        self._app_name = app_name.strip()

        # Store executor for unified execution (T048)
        self._executor = executor

        # Store video service for multimodal input support
        self._video_service = video_service or VideoBlobService()

        # Store and apply schema constraints to output_schema handler
        self._schema_constraints = schema_constraints
        if schema_constraints is not None:
            handler = get_handler(COMPONENT_OUTPUT_SCHEMA)
            if isinstance(handler, OutputSchemaHandler):
                handler.set_constraints(schema_constraints)

        # Bind logger with adapter context
        self._logger = logger.bind(
            adapter="ADKAdapter",
            agent_name=self.agent.name,
            app_name=self._app_name,
        )

        # Initialize trial builder for reflective dataset construction
        self._trial_builder = TrialBuilder()

        # Create proposer with clear precedence: proposer overrides reflection_agent.
        if proposer is not None:
            if reflection_agent is not None:
                self._logger.warning(
                    "adapter.proposer.precedence",
                    message="proposer parameter takes precedence over reflection_agent",
                )
            self._proposer = proposer
        elif reflection_agent is not None:
            # Create ADK reflection function and pass to proposer
            adk_reflection_fn = create_adk_reflection_fn(
                reflection_agent,
                executor=self._executor,
                session_service=self._session_service,
                output_field=reflection_output_field,
            )
            self._proposer = AsyncReflectiveMutationProposer(
                adk_reflection_fn=adk_reflection_fn
            )
        else:
            raise ValueError(
                "Either proposer or reflection_agent must be provided. "
                "Use reflection_agent with an ADK LlmAgent for reflection operations."
            )

        self._logger.info("adapter.initialized")

    def cleanup(self) -> None:
        """Clean up adapter resources and clear handler constraints.

        Clears any schema constraints set on the OutputSchemaHandler to prevent
        constraint leakage between evolution runs. Should be called when the
        adapter is no longer needed.

        Note:
            OutputSchemaHandler is a singleton, so constraints set during one
            evolution run could affect subsequent runs if not cleared.
        """
        if self._schema_constraints is not None:
            handler = get_handler(COMPONENT_OUTPUT_SCHEMA)
            if isinstance(handler, OutputSchemaHandler):
                handler.set_constraints(None)
            self._logger.debug("adapter.cleanup.constraints_cleared")

    async def evaluate(
        self,
        batch: list[dict[str, Any]],
        candidate: dict[str, str],
        capture_traces: bool = False,
    ) -> EvaluationBatch[ADKTrajectory, str]:
        """Evaluate agent with candidate instruction over a batch of inputs.

        Args:
            batch: List of input examples, each with "input" key and optional
                "expected" key for scoring.
            candidate: Component name to text mapping. If "instruction" key
                is present, it overrides the agent's instruction.
            capture_traces: Whether to capture execution traces (tool calls,
                state deltas, token usage).

        Returns:
            EvaluationBatch containing outputs, scores, and optional trajectories.

        Examples:
            Basic evaluation without traces:

            ```python
            batch = [
                {"input": "What is 2+2?", "expected": "4"},
                {"input": "Capital of France?", "expected": "Paris"},
            ]
            candidate = {"instruction": "Be concise"}
            result = await adapter.evaluate(batch, candidate)
            assert len(result.outputs) == 2
            assert len(result.scores) == 2
            ```

            With trace capture:

            ```python
            result = await adapter.evaluate(batch, candidate, capture_traces=True)
            assert result.trajectories is not None
            assert len(result.trajectories) == len(batch)
            ```

        Note:
            Original instruction is restored after evaluation completes,
            even if an exception occurs during evaluation.
            Evaluations run in parallel with concurrency controlled by
            max_concurrent_evals parameter. Results maintain input order
            despite parallel execution.
        """
        self._logger.info(
            "adapter.evaluate.start",
            batch_size=len(batch),
            max_concurrent=self.max_concurrent_evals,
            capture_traces=capture_traces,
        )

        # Handle empty batch case
        if not batch:
            self._logger.info("adapter.evaluate.complete", batch_size=0)
            return EvaluationBatch(outputs=[], scores=[], trajectories=None)

        # Apply candidate components (instruction and/or output_schema) and save originals
        originals = self._apply_candidate(candidate)

        try:
            # Create semaphore to limit concurrent evaluations
            semaphore = asyncio.Semaphore(self.max_concurrent_evals)

            # Create tasks for all examples
            tasks = [
                self._eval_single_with_semaphore(
                    example=example,
                    example_index=i,
                    candidate=candidate,
                    capture_traces=capture_traces,
                    semaphore=semaphore,
                )
                for i, example in enumerate(batch)
            ]

            # Execute all tasks in parallel with exception handling
            results = await asyncio.gather(*tasks, return_exceptions=True)

            # Process results and maintain order
            outputs: list[str] = []
            scores: list[float] = []
            trajectories: list[ADKTrajectory] | None = [] if capture_traces else None
            metadata_list: list[dict[str, Any]] = []
            inputs: list[str] = []  # Collect inputs for reflection

            successful = 0
            failed = 0

            for i, result in enumerate(results):
                # Collect input text for this example
                inputs.append(batch[i].get("input", ""))
                if isinstance(result, Exception):
                    # Handle exception case
                    self._logger.warning(
                        "adapter.evaluate.example.error",
                        example_index=i,
                        error=str(result),
                    )
                    outputs.append("")
                    scores.append(0.0)
                    metadata_list.append({})
                    failed += 1

                    if capture_traces:
                        error_trajectory = self._build_trajectory(
                            events=[],
                            final_output="",
                            error=str(result),
                        )
                        trajectories.append(error_trajectory)  # type: ignore
                else:
                    # Unpack success: (output_text, score, trajectory_or_none, metadata_or_none)
                    # After isinstance check, result is guaranteed to be the tuple type
                    output_text, score, trajectory, metadata = result  # type: ignore[misc]
                    outputs.append(output_text)
                    scores.append(score)
                    metadata_list.append(metadata if metadata is not None else {})
                    successful += 1

                    if capture_traces and trajectory is not None:
                        trajectories.append(trajectory)  # type: ignore

            avg_score = sum(scores) / len(scores) if scores else 0.0

            self._logger.info(
                "adapter.evaluate.complete",
                batch_size=len(batch),
                successful=successful,
                failed=failed,
                avg_score=avg_score,
            )

            # Only include metadata if at least one example has non-empty metadata
            # Check if any metadata dict has content (not just empty dicts)
            has_metadata = any(
                meta and isinstance(meta, dict) and meta for meta in metadata_list
            )
            final_metadata = metadata_list if has_metadata else None

            return EvaluationBatch(
                outputs=outputs,
                scores=scores,
                trajectories=trajectories,
                metadata=final_metadata,
                inputs=inputs,
            )

        finally:
            # Always restore original components via registry dispatch
            self._restore_agent(originals)

    def _apply_candidate(self, candidate: dict[str, str]) -> dict[str, Any]:
        """Apply candidate components to agent via registry dispatch.

        Args:
            candidate: Component name to evolved text mapping.
                Example: {"instruction": "Be helpful", "output_schema": "class ..."}

        Returns:
            Dictionary mapping component names to their original values.
            Original values are typed per handler (str for instruction,
            type[BaseModel] or None for output_schema).

        Raises:
            KeyError: If candidate contains unregistered component name.

        Note:
            Original values are captured before overwriting via ComponentHandler
            registry dispatch instead of hardcoded if/elif logic. Each handler's
            apply() method sets the new value and returns the original for
            later restoration.

        Examples:
            >>> originals = adapter._apply_candidate(
            ...     {
            ...         "instruction": "New prompt",
            ...         "output_schema": "class X(BaseModel): ...",
            ...     }
            ... )
            >>> originals
            {"instruction": "Original prompt", "output_schema": <class 'OriginalSchema'>}
        """
        originals: dict[str, Any] = {}

        for component_name, value in candidate.items():
            handler = get_handler(component_name)
            originals[component_name] = handler.apply(self.agent, value)
            self._logger.debug(
                "adapter.component.applied",
                component=component_name,
                value_preview=value[:50] if value else "",
            )

        return originals

    def _restore_agent(self, originals: dict[str, Any]) -> None:
        """Restore agent to original state via registry dispatch.

        Args:
            originals: Component name to original value mapping,
                as returned by _apply_candidate().

        Raises:
            KeyError: If originals contains unregistered component name.

        Note:
            Operates via ComponentHandler registry for dispatch. Each handler's
            restore() method reinstates the original value. Should always
            be called in finally block to ensure restoration even if
            evaluation fails.

        Examples:
            >>> adapter._restore_agent(
            ...     {"instruction": "Original prompt", "output_schema": OriginalSchema}
            ... )
            # agent.instruction == "Original prompt"
            # agent.output_schema == OriginalSchema
        """
        for component_name, original in originals.items():
            handler = get_handler(component_name)
            handler.restore(self.agent, original)

        self._logger.debug(
            "adapter.agent.restored",
            components=list(originals.keys()),
        )

    def _extract_tool_calls(self, events: list[Any]) -> list[ToolCallRecord]:
        """Extract tool call records from ADK Event stream.

        Args:
            events: List of ADK Event objects from runner.

        Returns:
            List of ToolCallRecord instances with tool name, arguments,
            and result (if available).

        Note:
            Scans function_call and function_response parts from
            Event.actions.function_calls if present. Tool calls without
            responses are still recorded. Handles both real ADK Events
            and test mocks gracefully.
        """
        tool_calls: list[ToolCallRecord] = []

        for event in events:
            # Check if event has function_calls in actions
            if hasattr(event, "actions") and hasattr(event.actions, "function_calls"):
                function_calls = event.actions.function_calls
                # function_calls could be None, a list, or a single object
                if function_calls:
                    # Ensure it's iterable
                    if not hasattr(function_calls, "__iter__"):
                        function_calls = [function_calls]

                    try:
                        for fc in function_calls:
                            # Extract name - be defensive for mocks and real objects
                            name = "unknown"

                            # Try direct access first
                            if hasattr(fc, "name"):
                                try:
                                    name_val = fc.name
                                    # Real string value
                                    if isinstance(name_val, str) and name_val != "name":
                                        name = name_val
                                    # Check if it's a Mock (has _mock_name attribute)
                                    elif hasattr(name_val, "_mock_name"):
                                        # Extract actual mock name, not the attribute name
                                        mock_name = str(name_val._mock_name)
                                        # Mock names often have format "mock.attribute.name"
                                        if "." in mock_name:
                                            name = mock_name.split(".")[-1]
                                        else:
                                            name = mock_name
                                except Exception as exc:
                                    logger.debug(
                                        "Failed to extract tool call name; using fallback.",
                                        error=str(exc),
                                        function_call_repr=repr(fc),
                                    )

                            # Extract arguments
                            args = getattr(fc, "args", {})
                            if not isinstance(args, dict):
                                args = {}

                            tool_calls.append(
                                ToolCallRecord(
                                    name=name,
                                    arguments=args,
                                    result=None,  # Will be populated if response found
                                    timestamp=0.0,  # Not tracked in current impl
                                )
                            )
                    except (TypeError, AttributeError):
                        # function_calls not iterable or other issues, skip
                        pass

        return tool_calls

    def _extract_state_deltas(self, events: list[Any]) -> list[dict[str, Any]]:
        """Extract state change records from ADK Event stream.

        Args:
            events: List of ADK Event objects from runner.

        Returns:
            List of dictionaries containing state delta information.
            Each dict has 'key' and 'value' fields from Event.state_delta.

        Note:
            Skips events with None state_delta attributes. State deltas
            capture changes to session or agent state during execution.
        """
        state_deltas: list[dict[str, Any]] = []

        for event in events:
            if hasattr(event, "state_delta") and event.state_delta is not None:
                state_deltas.append(
                    {
                        "key": getattr(event.state_delta, "key", "unknown"),
                        "value": getattr(event.state_delta, "value", None),
                    }
                )

        return state_deltas

    def _extract_token_usage(self, events: list[Any]) -> TokenUsage | None:
        """Extract token usage metadata from ADK Event stream.

        Args:
            events: List of ADK Event objects from runner.

        Returns:
            TokenUsage instance if usage metadata found, None otherwise.

        Note:
            Searches for usage_metadata on final response events. Returns
            the last found usage data (most complete metrics).
        """
        usage_data = None

        for event in events:
            if hasattr(event, "usage_metadata") and event.usage_metadata is not None:
                metadata = event.usage_metadata
                usage_data = TokenUsage(
                    input_tokens=getattr(metadata, "input_tokens", 0),
                    output_tokens=getattr(metadata, "output_tokens", 0),
                    total_tokens=getattr(metadata, "total_tokens", 0),
                )

        return usage_data

    def _build_trajectory(
        self,
        events: list[Any],
        final_output: str,
        error: str | None = None,
    ) -> ADKTrajectory:
        """Assemble complete trajectory from event stream.

        Args:
            events: List of ADK Event objects collected during execution.
            final_output: The final text response from the agent.
            error: Error message if execution failed, None otherwise.

        Returns:
            ADKTrajectory with tool calls, state deltas, token usage,
            final output, and error (if any).

        Note:
            Synthesizes trajectory by delegating to extract_trajectory utility
            with configured trajectory_config. This is the complete execution
            record for one batch example.
        """
        return extract_trajectory(
            events=events,
            final_output=final_output,
            error=error,
            config=self.trajectory_config,
        )

    async def _eval_single_with_semaphore(
        self,
        example: dict[str, Any],
        example_index: int,
        candidate: dict[str, str],
        capture_traces: bool,
        semaphore: asyncio.Semaphore,
    ) -> tuple[str, float, ADKTrajectory | None, dict[str, Any] | None]:
        """Evaluate a single example with semaphore-controlled concurrency.

        Args:
            example: Input example with "input" key and optional "expected" key.
            example_index: Index of example in batch (for logging).
            candidate: Candidate component values (for instruction override).
            capture_traces: Whether to capture execution traces.
            semaphore: Semaphore to control concurrent execution.

        Returns:
            Tuple of (output_text, score, trajectory_or_none, metadata_or_none).
            On failure, returns ("", 0.0, error_trajectory, None).

        Note:
            Semaphore-controlled wrapper around single example evaluation.
            Called from evaluate() for each example in the batch to ensure
            at most max_concurrent_evals evaluations run simultaneously.
        """
        async with semaphore:
            self._logger.debug(
                "adapter.evaluate.example",
                example_index=example_index,
                example_input=example.get("input", "")[:50],
            )

            try:
                # Run the agent for this example
                output_text: str
                if capture_traces:
                    result = await self._run_single_example(
                        example, capture_events=True
                    )
                    # When capture_events=True, result is tuple[str, list[Any]]
                    output_text, events = result
                    # Build trajectory from collected events
                    trajectory = self._build_trajectory(
                        events=list(events),
                        final_output=output_text,
                        error=None,
                    )
                else:
                    # When capture_events=False, result is str
                    run_result = await self._run_single_example(example)
                    output_text = str(run_result)
                    trajectory = None

                # Score the output
                input_text = example.get("input", "")
                expected = example.get("expected")
                score_result = await self.scorer.async_score(
                    input_text, output_text, expected
                )
                # Handle both float and tuple[float, dict] return types
                if isinstance(score_result, tuple):
                    score = score_result[0]
                    metadata = score_result[1]
                else:
                    score = float(score_result)
                    metadata = None

                return (output_text, score, trajectory, metadata)

            except Exception as e:
                # Wrap in domain exception per ADR-009 (preserve batch resilience)
                wrapped = EvaluationError(
                    f"Example {example_index} evaluation failed",
                    cause=e,
                    example_index=example_index,
                )

                self._logger.warning(
                    "adapter.evaluate.example.error",
                    example_index=example_index,
                    error=str(wrapped),
                )

                # Create error trajectory if capturing traces
                error_trajectory = None
                if capture_traces:
                    error_trajectory = self._build_trajectory(
                        events=[],
                        final_output="",
                        error=str(wrapped),
                    )

                return ("", 0.0, error_trajectory, None)

    async def _prepare_multimodal_content(
        self, example: dict[str, Any]
    ) -> Content | None:
        """Prepare multimodal Content from example with videos field.

        Args:
            example: Input example dict with optional 'input' text and 'videos' paths.

        Returns:
            Content object with text and video parts, or None if no videos present.

        Note:
            Sequences video loading via VideoBlobService then Content assembly.
            Text part is included first, followed by video parts in order.
        """
        videos = example.get("videos")
        if not videos:
            return None

        parts: list[Part] = []

        # Add text part first if present
        input_text = example.get("input")
        if input_text:
            parts.append(Part(text=input_text))

        # Load video parts
        video_parts = await self._video_service.prepare_video_parts(videos)
        parts.extend(video_parts)

        self._logger.debug(
            "adapter.multimodal_content.prepared",
            text_present=bool(input_text),
            video_count=len(videos),
            total_parts=len(parts),
        )

        return Content(parts=parts, role="user")

    async def _run_single_example(
        self, example: dict[str, Any], capture_events: bool = False
    ) -> tuple[str, list[Any]] | str:
        """Execute agent on a single input example via AgentExecutor.

        Args:
            example: Input example with "input" key and/or "videos" key.
            capture_events: If True, return (output, events) tuple.
                           If False, return just output string.

        Returns:
            If capture_events=True: (final_output, event_list) tuple
            If capture_events=False: final_output string only

        Raises:
            RuntimeError: If agent execution fails.

        Note:
            Delegates to AgentExecutor for unified execution path.
            When capture_events=True, collects all events for trace extraction.
            Supports multimodal inputs via 'videos' field in example.
        """
        input_text = example.get("input", "")
        videos = example.get("videos")

        # Check if we have any input (text or videos)
        if not input_text and not videos:
            return ("", []) if capture_events else ""

        # Prepare multimodal content if videos present
        input_content = await self._prepare_multimodal_content(example)

        result = await self._executor.execute_agent(
            agent=self.agent,
            input_text=input_text,
            input_content=input_content,
        )

        if result.status == ExecutionStatus.FAILED:
            raise RuntimeError(result.error_message or "Executor returned FAILED")

        final_output = result.extracted_value or ""
        events = result.captured_events or []

        return (final_output, events) if capture_events else final_output

    async def make_reflective_dataset(
        self,
        candidate: dict[str, str],
        eval_batch: EvaluationBatch[ADKTrajectory, str],
        components_to_update: list[str],
    ) -> Mapping[str, Sequence[Mapping[str, Any]]]:
        """Build trials from evaluation results for reflection.

        Terminology:
            - trial: One performance record {input, output, feedback, trajectory}
            - trials: Collection of trial records for a component

        Args:
            candidate: Current candidate component values.
            eval_batch: Evaluation results including trajectories and optional
                scorer metadata (e.g., from CriticScorer).
            components_to_update: List of component names to generate
                trials for.

        Returns:
            Mapping from component name to sequence of trials.
            Each trial contains input, output, feedback (with score and
            feedback_text), and optional trajectory.

        Examples:
            Generate trials for reflection:

            ```python
            result = await adapter.evaluate(batch, candidate, capture_traces=True)
            trials_dataset = await adapter.make_reflective_dataset(
                candidate, result, ["instruction"]
            )
            assert "instruction" in trials_dataset
            # Each trial has structured feedback
            trial = trials_dataset["instruction"][0]
            assert "input" in trial
            assert "output" in trial
            assert "feedback" in trial
            assert trial["feedback"]["score"] == 0.75
            ```

        Note:
            Operates on eval_batch trajectories (capture_traces=True required).
            Dataset format is compatible with proposer's trial-based interface.
            Scorer metadata (feedback_text, feedback_dimensions) from
            eval_batch.metadata is included in each trial's feedback dict.
        """
        self._logger.info(
            "adapter.make_reflective_dataset.start",
            num_trials=len(eval_batch.outputs),
            components=components_to_update,
        )

        # Build trials for each requested component
        result: dict[str, list[dict[str, Any]]] = {}

        for component in components_to_update:
            trials: list[dict[str, Any]] = []

            for i, (output, score) in enumerate(
                zip(eval_batch.outputs, eval_batch.scores, strict=True)
            ):
                # Get trajectory info if available
                trajectory = None
                if eval_batch.trajectories and i < len(eval_batch.trajectories):
                    trajectory = eval_batch.trajectories[i]

                # Get metadata if available
                metadata = None
                if eval_batch.metadata and i < len(eval_batch.metadata):
                    metadata = eval_batch.metadata[i]

                # Get input text if available
                input_text = ""
                if eval_batch.inputs and i < len(eval_batch.inputs):
                    input_text = eval_batch.inputs[i]

                trial = self._build_trial(
                    input_text=input_text,
                    output=output,
                    score=score,
                    trajectory=trajectory,
                    metadata=metadata,
                )
                trials.append(trial)

            result[component] = trials

        self._logger.info(
            "adapter.make_reflective_dataset.complete",
            num_components=len(result),
            total_trials=sum(len(t) for t in result.values()),
        )

        return result

    def _build_trace(self, trajectory: ADKTrajectory) -> dict[str, Any] | None:
        """Build trace dict from ADK trajectory data.

        Extracts execution details from ADK trajectory - the intermediate
        steps between input and output (tool calls, token usage, errors).

        Args:
            trajectory: ADK execution record with tool calls, tokens, etc.

        Returns:
            Trace dict with available details, or None if no trace data.
            Can include: tool_calls, tokens, error, and future fields
            like reasoning, state_deltas, etc.
        """
        trace: dict[str, Any] = {}

        if trajectory.tool_calls:
            trace["tool_calls"] = len(trajectory.tool_calls)
        if trajectory.error:
            trace["error"] = trajectory.error
        if trajectory.token_usage:
            trace["tokens"] = trajectory.token_usage.total_tokens

        # Return None if no trace data collected
        return trace if trace else None

    def _build_trial(
        self,
        input_text: str,
        output: str,
        score: float,
        trajectory: ADKTrajectory | None,
        metadata: dict[str, Any] | None = None,
    ) -> dict[str, Any]:
        """Build a single trial record for reflection.

        Delegates to shared TrialBuilder for consistent trial structure.

        Terminology:
            - trial: One performance record {feedback, trajectory}
            - feedback: Critic evaluation {score, feedback_text, feedback_*}
            - trajectory: The journey from input to output with optional trace

        Args:
            input_text: The input that was given to the system.
            output: What the system produced.
            score: Evaluation score for this output.
            trajectory: Optional execution record with tool calls, state, etc.
            metadata: Optional scorer metadata dict (e.g., from CriticScorer).

        Returns:
            Trial dict with keys: feedback, trajectory.
            - feedback: score (mandatory), feedback_text, feedback_* (optional)
            - trajectory: input, output (mandatory), trace (optional)
        """
        # Build trace from ADK trajectory if available
        trace = self._build_trace(trajectory) if trajectory else None

        return self._trial_builder.build_trial(
            input_text=input_text,
            output=output,
            score=score,
            metadata=metadata,
            trace=trace,
            log_passthrough=True,
        )

    async def propose_new_texts(
        self,
        candidate: dict[str, str],
        reflective_dataset: Mapping[str, Sequence[Mapping[str, Any]]],
        components_to_update: list[str],
    ) -> dict[str, str]:
        """Propose new component texts based on trials.

        Delegates to AsyncReflectiveMutationProposer to generate improved
        component text via LLM reflection on trials. When the proposer returns
        None (no trials), falls back to unchanged candidate values.

        Args:
            candidate: Current candidate component texts (name → text).
            reflective_dataset: Trials from make_reflective_dataset().
                Maps component name to list of trial records.
            components_to_update: Components to generate proposals for.

        Returns:
            Dictionary mapping component names to proposed component text.
            When proposer returns None, returns unchanged candidate values.

        Examples:
            Using the proposer to generate improved component text:

            ```python
            # After evaluation with traces
            result = await adapter.evaluate(batch, candidate, capture_traces=True)
            trials = await adapter.make_reflective_dataset(
                candidate, result, ["instruction"]
            )

            # Propose new component text via LLM reflection on trials
            new_texts = await adapter.propose_new_texts(
                candidate, trials, ["instruction"]
            )
            # new_texts["instruction"] contains proposed component text
            ```

        Note:
            Delegates to AsyncReflectiveMutationProposer for actual mutation
            generation. Falls back gracefully when no trials available.
        """
        self._logger.debug(
            "propose_new_texts.delegating",
            components_requested=components_to_update,
        )

        result = await self._proposer.propose(
            candidate, reflective_dataset, components_to_update
        )

        if result is None:
            self._logger.info(
                "propose_new_texts.fallback",
                reason="proposer_returned_none",
                components_requested=components_to_update,
            )
            return {
                component: candidate.get(component, "")
                for component in components_to_update
            }

        # Merge with candidate for any missing components
        merged = {
            component: result.get(component, candidate.get(component, ""))
            for component in components_to_update
        }

        # Log proposed texts for each component
        for component, proposed_text in result.items():
            self._logger.info(
                "proposal.text",
                component=component,
                proposed_length=len(proposed_text),
                proposed_preview=proposed_text[:300] + "..."
                if len(proposed_text) > 300
                else proposed_text,
            )

        self._logger.info(
            "propose_new_texts.complete",
            components_proposed=list(result.keys()),
            components_requested=components_to_update,
        )

        return merged
```

__init__

```python
__init__(
    agent: LlmAgent,
    scorer: Scorer,
    executor: AgentExecutorProtocol,
    max_concurrent_evals: int = 5,
    session_service: BaseSessionService | None = None,
    app_name: str = "gepa_adk_eval",
    trajectory_config: TrajectoryConfig | None = None,
    proposer: AsyncReflectiveMutationProposer | None = None,
    reflection_agent: LlmAgent | None = None,
    reflection_output_field: str | None = None,
    schema_constraints: SchemaConstraints | None = None,
    video_service: VideoBlobServiceProtocol | None = None,
) -> None
```

Initialize the ADK adapter with agent and scorer.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `agent` | `LlmAgent` | required | The ADK LlmAgent to evaluate with different instructions. |
| `scorer` | `Scorer` | required | Scorer implementation for evaluating agent outputs. |
| `executor` | `AgentExecutorProtocol` | required | AgentExecutorProtocol implementation for unified agent execution. The executor handles session management and execution, enabling feature parity across all agent types. |
| `max_concurrent_evals` | `int` | `5` | Maximum number of concurrent evaluations to run in parallel. Must be at least 1. |
| `session_service` | `BaseSessionService \| None` | `None` | Optional session service for state management. If None, creates an InMemorySessionService. |
| `app_name` | `str` | `'gepa_adk_eval'` | Application name for session identification. |
| `trajectory_config` | `TrajectoryConfig \| None` | `None` | Configuration for trajectory extraction behavior. If None, uses TrajectoryConfig defaults (secure, all features enabled). |
| `proposer` | `AsyncReflectiveMutationProposer \| None` | `None` | Optional mutation proposer for generating improved instructions via LLM reflection. If provided, takes precedence over reflection_agent. |
| `reflection_agent` | `LlmAgent \| None` | `None` | ADK LlmAgent to use for reflection operations. Either this or proposer must be provided. When provided, creates an ADK-based reflection function and passes it to a new proposer. |
| `reflection_output_field` | `str \| None` | `None` | Field name to extract from structured output when reflection_agent has an output_schema. When the reflection agent returns structured output (dict), this specifies which field contains the proposed text. For schema evolution, use "class_definition" with a SchemaProposal output_schema. Only used when reflection_agent is provided. |
| `schema_constraints` | `SchemaConstraints \| None` | `None` | Optional SchemaConstraints for output_schema evolution. When provided, proposed schema mutations are validated against these constraints. Mutations that violate constraints (e.g., remove required fields) are rejected and the original schema is preserved. |
| `video_service` | `VideoBlobServiceProtocol \| None` | `None` | Optional VideoBlobServiceProtocol for multimodal input support. When provided, enables processing of trainset examples with a 'videos' field. If None, defaults to a new VideoBlobService instance. |

| Raises | Description |
| --- | --- |
| `TypeError` | If agent is not an LlmAgent instance. |
| `TypeError` | If scorer does not satisfy the Scorer protocol. |
| `TypeError` | If reflection_agent is provided but not an LlmAgent instance. |
| `ValueError` | If app_name is an empty string or max_concurrent_evals < 1. |
| `ValueError` | If neither proposer nor reflection_agent is provided. |

Examples:

Basic setup with reflection agent:

```python
from gepa_adk.adapters.agent_executor import AgentExecutor

reflection_agent = LlmAgent(name="reflector", model="gemini-2.5-flash")
executor = AgentExecutor()
adapter = ADKAdapter(
    agent, scorer, executor, reflection_agent=reflection_agent
)
```

With custom trajectory configuration:

```python
config = TrajectoryConfig(
    redact_sensitive=True,
    max_string_length=5000,
)
executor = AgentExecutor()
adapter = ADKAdapter(
    agent,
    scorer,
    executor,
    reflection_agent=reflection_agent,
    trajectory_config=config,
)
```

With shared session service:

```python
from google.adk.sessions import InMemorySessionService
from gepa_adk.adapters.agent_executor import AgentExecutor

session_service = InMemorySessionService()
executor = AgentExecutor(session_service=session_service)
adapter = ADKAdapter(
    agent,
    scorer,
    executor,
    reflection_agent=reflection_agent,
    session_service=session_service,
)
```
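
With multimodal video inputs (a hedged sketch: the file path and expected answer are illustrative placeholders, and the 'videos' key is what routes the example through the configured video service):

```python
batch = [
    {
        "input": "Describe what happens in this clip.",
        "videos": ["clips/demo.mp4"],  # hypothetical local video path
        "expected": "A person waves at the camera.",
    }
]
result = await adapter.evaluate(batch, candidate)
```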
Note

Caches the agent's original instruction and restores it after each evaluation to ensure no side effects between evaluations.
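
Since the adapter may set schema constraints on a shared OutputSchemaHandler, a typical optimization pass pairs evaluation with cleanup(). A minimal sketch of that lifecycle, assuming an adapter constructed as in the examples above:

```python
try:
    result = await adapter.evaluate(batch, candidate, capture_traces=True)
    dataset = await adapter.make_reflective_dataset(candidate, result, ["instruction"])
    new_texts = await adapter.propose_new_texts(candidate, dataset, ["instruction"])
finally:
    # Clears constraints from the shared OutputSchemaHandler so one run
    # cannot leak schema constraints into the next.
    adapter.cleanup()
```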

Source code in src/gepa_adk/adapters/adk_adapter.py
def __init__(
    self,
    agent: LlmAgent,
    scorer: Scorer,
    executor: AgentExecutorProtocol,
    max_concurrent_evals: int = 5,
    session_service: BaseSessionService | None = None,
    app_name: str = "gepa_adk_eval",
    trajectory_config: TrajectoryConfig | None = None,
    proposer: AsyncReflectiveMutationProposer | None = None,
    reflection_agent: LlmAgent | None = None,
    reflection_output_field: str | None = None,
    schema_constraints: SchemaConstraints | None = None,
    video_service: VideoBlobServiceProtocol | None = None,
) -> None:
    """Initialize the ADK adapter with agent and scorer.

    Args:
        agent: The ADK LlmAgent to evaluate with different instructions.
        scorer: Scorer implementation for evaluating agent outputs.
        executor: AgentExecutorProtocol implementation for unified agent execution.
            The executor handles session management and execution, enabling feature
            parity across all agent types.
        max_concurrent_evals: Maximum number of concurrent evaluations to run
            in parallel. Must be at least 1. Defaults to 5.
        session_service: Optional session service for state management.
            If None, creates an InMemorySessionService.
        app_name: Application name for session identification.
        trajectory_config: Configuration for trajectory extraction behavior.
            If None, uses TrajectoryConfig defaults (secure, all features enabled).
        proposer: Optional mutation proposer for generating improved instructions
            via LLM reflection. If provided, takes precedence over reflection_agent.
        reflection_agent: ADK LlmAgent to use for reflection operations.
            Either this or proposer must be provided. When provided, creates an
            ADK-based reflection function and passes it to a new proposer.
        reflection_output_field: Field name to extract from structured output when
            reflection_agent has an output_schema. When the reflection agent returns
            structured output (dict), this specifies which field contains the proposed
            text. For schema evolution, use "class_definition" with a SchemaProposal
            output_schema. Only used when reflection_agent is provided.
        schema_constraints: Optional SchemaConstraints for output_schema evolution.
            When provided, proposed schema mutations are validated against these
            constraints. Mutations that violate constraints (e.g., remove required
            fields) are rejected and the original schema is preserved.
        video_service: Optional VideoBlobServiceProtocol for multimodal input support.
            When provided, enables processing of trainset examples with 'videos' field.
            If None, defaults to a new VideoBlobService instance.

    Raises:
        TypeError: If agent is not an LlmAgent instance.
        TypeError: If scorer does not satisfy Scorer protocol.
        TypeError: If reflection_agent is provided but not an LlmAgent instance.
        ValueError: If app_name is empty string or max_concurrent_evals < 1.
        ValueError: If neither proposer nor reflection_agent is provided.

    Examples:
        Basic setup with reflection agent:

        ```python
        from gepa_adk.adapters.agent_executor import AgentExecutor

        reflection_agent = LlmAgent(name="reflector", model="gemini-2.5-flash")
        executor = AgentExecutor()
        adapter = ADKAdapter(
            agent, scorer, executor, reflection_agent=reflection_agent
        )
        ```

        With custom trajectory configuration:

        ```python
        config = TrajectoryConfig(
            redact_sensitive=True,
            max_string_length=5000,
        )
        executor = AgentExecutor()
        adapter = ADKAdapter(
            agent,
            scorer,
            executor,
            reflection_agent=reflection_agent,
            trajectory_config=config,
        )
        ```

        With shared session service:

        ```python
        from google.adk.sessions import InMemorySessionService
        from gepa_adk.adapters.agent_executor import AgentExecutor

        session_service = InMemorySessionService()
        executor = AgentExecutor(session_service=session_service)
        adapter = ADKAdapter(
            agent,
            scorer,
            executor,
            reflection_agent=reflection_agent,
            session_service=session_service,
        )
        ```

    Note:
        Caches the agent's original instruction and restores it after
        each evaluation to ensure no side effects between evaluations.
    """
    # Type validation
    if not isinstance(agent, LlmAgent):
        raise TypeError(f"agent must be LlmAgent, got {type(agent)}")

    # Scorer protocol check (runtime_checkable)
    if not hasattr(scorer, "score") or not hasattr(scorer, "async_score"):
        raise TypeError(
            f"scorer must implement Scorer protocol, got {type(scorer)}"
        )

    if not app_name or not app_name.strip():
        raise ValueError("app_name cannot be empty")

    if max_concurrent_evals < 1:
        raise ValueError(
            f"max_concurrent_evals must be at least 1, got {max_concurrent_evals}"
        )

    # Validate reflection_agent if provided
    if reflection_agent is not None and not isinstance(reflection_agent, LlmAgent):
        raise TypeError(
            f"reflection_agent must be LlmAgent, got {type(reflection_agent)}"
        )

    self.agent = agent
    self.scorer = scorer
    self.max_concurrent_evals = max_concurrent_evals
    self.trajectory_config = trajectory_config or TrajectoryConfig()
    self._session_service = session_service or InMemorySessionService()
    self._app_name = app_name.strip()

    # Store executor for unified execution (T048)
    self._executor = executor

    # Store video service for multimodal input support
    self._video_service = video_service or VideoBlobService()

    # Store and apply schema constraints to output_schema handler
    self._schema_constraints = schema_constraints
    if schema_constraints is not None:
        handler = get_handler(COMPONENT_OUTPUT_SCHEMA)
        if isinstance(handler, OutputSchemaHandler):
            handler.set_constraints(schema_constraints)

    # Bind logger with adapter context
    self._logger = logger.bind(
        adapter="ADKAdapter",
        agent_name=self.agent.name,
        app_name=self._app_name,
    )

    # Initialize trial builder for reflective dataset construction
    self._trial_builder = TrialBuilder()

    # Create proposer with clear precedence: proposer overrides reflection_agent.
    if proposer is not None:
        if reflection_agent is not None:
            self._logger.warning(
                "adapter.proposer.precedence",
                message="proposer parameter takes precedence over reflection_agent",
            )
        self._proposer = proposer
    elif reflection_agent is not None:
        # Create ADK reflection function and pass to proposer
        adk_reflection_fn = create_adk_reflection_fn(
            reflection_agent,
            executor=self._executor,
            session_service=self._session_service,
            output_field=reflection_output_field,
        )
        self._proposer = AsyncReflectiveMutationProposer(
            adk_reflection_fn=adk_reflection_fn
        )
    else:
        raise ValueError(
            "Either proposer or reflection_agent must be provided. "
            "Use reflection_agent with an ADK LlmAgent for reflection operations."
        )

    self._logger.info("adapter.initialized")

cleanup

cleanup() -> None

Clean up adapter resources and clear handler constraints.

Clears any schema constraints set on the OutputSchemaHandler to prevent constraint leakage between evolution runs. Should be called when the adapter is no longer needed.
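
Examples:

Typical teardown after an evolution run (a minimal sketch using only the documented adapter API):

try:
    result = await adapter.evaluate(batch, candidate)
finally:
    adapter.cleanup()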

Note

OutputSchemaHandler is a singleton, so constraints set during one evolution run could affect subsequent runs if not cleared.

Source code in src/gepa_adk/adapters/adk_adapter.py
def cleanup(self) -> None:
    """Clean up adapter resources and clear handler constraints.

    Clears any schema constraints set on the OutputSchemaHandler to prevent
    constraint leakage between evolution runs. Should be called when the
    adapter is no longer needed.

    Note:
        OutputSchemaHandler is a singleton, so constraints set during one
        evolution run could affect subsequent runs if not cleared.
    """
    if self._schema_constraints is not None:
        handler = get_handler(COMPONENT_OUTPUT_SCHEMA)
        if isinstance(handler, OutputSchemaHandler):
            handler.set_constraints(None)
        self._logger.debug("adapter.cleanup.constraints_cleared")

evaluate async

evaluate(
    batch: list[dict[str, Any]],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[ADKTrajectory, str]

Evaluate agent with candidate instruction over a batch of inputs.

PARAMETER DESCRIPTION
batch

List of input examples, each with "input" key and optional "expected" key for scoring.

TYPE: list[dict[str, Any]]

candidate

Component name to text mapping. If "instruction" key is present, it overrides the agent's instruction.

TYPE: dict[str, str]

capture_traces

Whether to capture execution traces (tool calls, state deltas, token usage).

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
EvaluationBatch[ADKTrajectory, str]

EvaluationBatch containing outputs, scores, and optional trajectories.

Examples:

Basic evaluation without traces:

batch = [
    {"input": "What is 2+2?", "expected": "4"},
    {"input": "Capital of France?", "expected": "Paris"},
]
candidate = {"instruction": "Be concise"}
result = await adapter.evaluate(batch, candidate)
assert len(result.outputs) == 2
assert len(result.scores) == 2

With trace capture:

result = await adapter.evaluate(batch, candidate, capture_traces=True)
assert result.trajectories is not None
assert len(result.trajectories) == len(batch)
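
Inspecting optional scorer metadata (a sketch; whether metadata is present, and which keys such as feedback_text it contains, depends on the scorer):

result = await adapter.evaluate(batch, candidate, capture_traces=True)
if result.metadata is not None:
    print(result.metadata[0].get("feedback_text"))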
Note

Original instruction is restored after evaluation completes, even if an exception occurs during evaluation. Evaluations run in parallel with concurrency controlled by the max_concurrent_evals parameter. Results maintain input order despite parallel execution.

Source code in src/gepa_adk/adapters/adk_adapter.py
async def evaluate(
    self,
    batch: list[dict[str, Any]],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[ADKTrajectory, str]:
    """Evaluate agent with candidate instruction over a batch of inputs.

    Args:
        batch: List of input examples, each with "input" key and optional
            "expected" key for scoring.
        candidate: Component name to text mapping. If "instruction" key
            is present, it overrides the agent's instruction.
        capture_traces: Whether to capture execution traces (tool calls,
            state deltas, token usage).

    Returns:
        EvaluationBatch containing outputs, scores, and optional trajectories.

    Examples:
        Basic evaluation without traces:

        ```python
        batch = [
            {"input": "What is 2+2?", "expected": "4"},
            {"input": "Capital of France?", "expected": "Paris"},
        ]
        candidate = {"instruction": "Be concise"}
        result = await adapter.evaluate(batch, candidate)
        assert len(result.outputs) == 2
        assert len(result.scores) == 2
        ```

        With trace capture:

        ```python
        result = await adapter.evaluate(batch, candidate, capture_traces=True)
        assert result.trajectories is not None
        assert len(result.trajectories) == len(batch)
        ```

    Note:
        Original instruction is restored after evaluation completes,
        even if an exception occurs during evaluation.
        Evaluations run in parallel with concurrency controlled by
        max_concurrent_evals parameter. Results maintain input order
        despite parallel execution.
    """
    self._logger.info(
        "adapter.evaluate.start",
        batch_size=len(batch),
        max_concurrent=self.max_concurrent_evals,
        capture_traces=capture_traces,
    )

    # Handle empty batch case
    if not batch:
        self._logger.info("adapter.evaluate.complete", batch_size=0)
        return EvaluationBatch(outputs=[], scores=[], trajectories=None)

    # Apply candidate components (instruction and/or output_schema) and save originals
    originals = self._apply_candidate(candidate)

    try:
        # Create semaphore to limit concurrent evaluations
        semaphore = asyncio.Semaphore(self.max_concurrent_evals)

        # Create tasks for all examples
        tasks = [
            self._eval_single_with_semaphore(
                example=example,
                example_index=i,
                candidate=candidate,
                capture_traces=capture_traces,
                semaphore=semaphore,
            )
            for i, example in enumerate(batch)
        ]

        # Execute all tasks in parallel with exception handling
        results = await asyncio.gather(*tasks, return_exceptions=True)

        # Process results and maintain order
        outputs: list[str] = []
        scores: list[float] = []
        trajectories: list[ADKTrajectory] | None = [] if capture_traces else None
        metadata_list: list[dict[str, Any]] = []
        inputs: list[str] = []  # Collect inputs for reflection

        successful = 0
        failed = 0

        for i, result in enumerate(results):
            # Collect input text for this example
            inputs.append(batch[i].get("input", ""))
            if isinstance(result, Exception):
                # Handle exception case
                self._logger.warning(
                    "adapter.evaluate.example.error",
                    example_index=i,
                    error=str(result),
                )
                outputs.append("")
                scores.append(0.0)
                metadata_list.append({})
                failed += 1

                if capture_traces:
                    error_trajectory = self._build_trajectory(
                        events=[],
                        final_output="",
                        error=str(result),
                    )
                    trajectories.append(error_trajectory)  # type: ignore
            else:
                # Unpack success: (output_text, score, trajectory_or_none, metadata_or_none)
                # After isinstance check, result is guaranteed to be the tuple type
                output_text, score, trajectory, metadata = result  # type: ignore[misc]
                outputs.append(output_text)
                scores.append(score)
                metadata_list.append(metadata if metadata is not None else {})
                successful += 1

                if capture_traces and trajectory is not None:
                    trajectories.append(trajectory)  # type: ignore

        avg_score = sum(scores) / len(scores) if scores else 0.0

        self._logger.info(
            "adapter.evaluate.complete",
            batch_size=len(batch),
            successful=successful,
            failed=failed,
            avg_score=avg_score,
        )

        # Only include metadata if at least one example has non-empty metadata
        # Check if any metadata dict has content (not just empty dicts)
        has_metadata = any(
            meta and isinstance(meta, dict) and meta for meta in metadata_list
        )
        final_metadata = metadata_list if has_metadata else None

        return EvaluationBatch(
            outputs=outputs,
            scores=scores,
            trajectories=trajectories,
            metadata=final_metadata,
            inputs=inputs,
        )

    finally:
        # Always restore original components via registry dispatch
        self._restore_agent(originals)

make_reflective_dataset async

make_reflective_dataset(
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[ADKTrajectory, str],
    components_to_update: list[str],
) -> Mapping[str, Sequence[Mapping[str, Any]]]

Build trials from evaluation results for reflection.

Terminology
  • trial: One performance record {input, output, feedback, trajectory}
  • trials: Collection of trial records for a component
PARAMETER DESCRIPTION
candidate

Current candidate component values.

TYPE: dict[str, str]

eval_batch

Evaluation results including trajectories and optional scorer metadata (e.g., from CriticScorer).

TYPE: EvaluationBatch[ADKTrajectory, str]

components_to_update

List of component names to generate trials for.

TYPE: list[str]

RETURNS DESCRIPTION
Mapping[str, Sequence[Mapping[str, Any]]]

Mapping from component name to sequence of trials. Each trial contains input, output, feedback (with score and feedback_text), and optional trajectory.

Examples:

Generate trials for reflection:

result = await adapter.evaluate(batch, candidate, capture_traces=True)
trials_dataset = await adapter.make_reflective_dataset(
    candidate, result, ["instruction"]
)
assert "instruction" in trials_dataset
# Each trial has structured feedback
trial = trials_dataset["instruction"][0]
assert "input" in trial
assert "output" in trial
assert "feedback" in trial
assert trial["feedback"]["score"] == 0.75
Note

Operates on eval_batch trajectories (capture_traces=True required). Dataset format is compatible with proposer's trial-based interface. Scorer metadata (feedback_text, feedback_dimensions) from eval_batch.metadata is included in each trial's feedback dict.

Source code in src/gepa_adk/adapters/adk_adapter.py
async def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[ADKTrajectory, str],
    components_to_update: list[str],
) -> Mapping[str, Sequence[Mapping[str, Any]]]:
    """Build trials from evaluation results for reflection.

    Terminology:
        - trial: One performance record {input, output, feedback, trajectory}
        - trials: Collection of trial records for a component

    Args:
        candidate: Current candidate component values.
        eval_batch: Evaluation results including trajectories and optional
            scorer metadata (e.g., from CriticScorer).
        components_to_update: List of component names to generate
            trials for.

    Returns:
        Mapping from component name to sequence of trials.
        Each trial contains input, output, feedback (with score and
        feedback_text), and optional trajectory.

    Examples:
        Generate trials for reflection:

        ```python
        result = await adapter.evaluate(batch, candidate, capture_traces=True)
        trials_dataset = await adapter.make_reflective_dataset(
            candidate, result, ["instruction"]
        )
        assert "instruction" in trials_dataset
        # Each trial has structured feedback
        trial = trials_dataset["instruction"][0]
        assert "input" in trial
        assert "output" in trial
        assert "feedback" in trial
        assert trial["feedback"]["score"] == 0.75
        ```

    Note:
        Operates on eval_batch trajectories (capture_traces=True required).
        Dataset format is compatible with proposer's trial-based interface.
        Scorer metadata (feedback_text, feedback_dimensions) from
        eval_batch.metadata is included in each trial's feedback dict.
    """
    self._logger.info(
        "adapter.make_reflective_dataset.start",
        num_trials=len(eval_batch.outputs),
        components=components_to_update,
    )

    # Build trials for each requested component
    result: dict[str, list[dict[str, Any]]] = {}

    for component in components_to_update:
        trials: list[dict[str, Any]] = []

        for i, (output, score) in enumerate(
            zip(eval_batch.outputs, eval_batch.scores, strict=True)
        ):
            # Get trajectory info if available
            trajectory = None
            if eval_batch.trajectories and i < len(eval_batch.trajectories):
                trajectory = eval_batch.trajectories[i]

            # Get metadata if available
            metadata = None
            if eval_batch.metadata and i < len(eval_batch.metadata):
                metadata = eval_batch.metadata[i]

            # Get input text if available
            input_text = ""
            if eval_batch.inputs and i < len(eval_batch.inputs):
                input_text = eval_batch.inputs[i]

            trial = self._build_trial(
                input_text=input_text,
                output=output,
                score=score,
                trajectory=trajectory,
                metadata=metadata,
            )
            trials.append(trial)

        result[component] = trials

    self._logger.info(
        "adapter.make_reflective_dataset.complete",
        num_components=len(result),
        total_trials=sum(len(t) for t in result.values()),
    )

    return result

propose_new_texts async

propose_new_texts(
    candidate: dict[str, str],
    reflective_dataset: Mapping[
        str, Sequence[Mapping[str, Any]]
    ],
    components_to_update: list[str],
) -> dict[str, str]

Propose new component texts based on trials.

Delegates to AsyncReflectiveMutationProposer to generate improved component text via LLM reflection on trials. When the proposer returns None (no trials), falls back to unchanged candidate values.

PARAMETER DESCRIPTION
candidate

Current candidate component texts (name → text).

TYPE: dict[str, str]

reflective_dataset

Trials from make_reflective_dataset(). Maps component name to list of trial records.

TYPE: Mapping[str, Sequence[Mapping[str, Any]]]

components_to_update

Components to generate proposals for.

TYPE: list[str]

RETURNS DESCRIPTION
dict[str, str]

Dictionary mapping component names to proposed component text. When proposer returns None, returns unchanged candidate values.

Examples:

Using the proposer to generate improved component text:

# After evaluation with traces
result = await adapter.evaluate(batch, candidate, capture_traces=True)
trials = await adapter.make_reflective_dataset(
    candidate, result, ["instruction"]
)

# Propose new component text via LLM reflection on trials
new_texts = await adapter.propose_new_texts(
    candidate, trials, ["instruction"]
)
# new_texts["instruction"] contains proposed component text
Note

Delegates to AsyncReflectiveMutationProposer for actual mutation generation. Falls back gracefully when no trials are available.

Source code in src/gepa_adk/adapters/adk_adapter.py
async def propose_new_texts(
    self,
    candidate: dict[str, str],
    reflective_dataset: Mapping[str, Sequence[Mapping[str, Any]]],
    components_to_update: list[str],
) -> dict[str, str]:
    """Propose new component texts based on trials.

    Delegates to AsyncReflectiveMutationProposer to generate improved
    component text via LLM reflection on trials. When the proposer returns
    None (no trials), falls back to unchanged candidate values.

    Args:
        candidate: Current candidate component texts (name → text).
        reflective_dataset: Trials from make_reflective_dataset().
            Maps component name to list of trial records.
        components_to_update: Components to generate proposals for.

    Returns:
        Dictionary mapping component names to proposed component text.
        When proposer returns None, returns unchanged candidate values.

    Examples:
        Using the proposer to generate improved component text:

        ```python
        # After evaluation with traces
        result = await adapter.evaluate(batch, candidate, capture_traces=True)
        trials = await adapter.make_reflective_dataset(
            candidate, result, ["instruction"]
        )

        # Propose new component text via LLM reflection on trials
        new_texts = await adapter.propose_new_texts(
            candidate, trials, ["instruction"]
        )
        # new_texts["instruction"] contains proposed component text
        ```

    Note:
        Delegates to AsyncReflectiveMutationProposer for actual mutation
        generation. Falls back gracefully when no trials available.
    """
    self._logger.debug(
        "propose_new_texts.delegating",
        components_requested=components_to_update,
    )

    result = await self._proposer.propose(
        candidate, reflective_dataset, components_to_update
    )

    if result is None:
        self._logger.info(
            "propose_new_texts.fallback",
            reason="proposer_returned_none",
            components_requested=components_to_update,
        )
        return {
            component: candidate.get(component, "")
            for component in components_to_update
        }

    # Merge with candidate for any missing components
    merged = {
        component: result.get(component, candidate.get(component, ""))
        for component in components_to_update
    }

    # Log proposed texts for each component
    for component, proposed_text in result.items():
        self._logger.info(
            "proposal.text",
            component=component,
            proposed_length=len(proposed_text),
            proposed_preview=proposed_text[:300] + "..."
            if len(proposed_text) > 300
            else proposed_text,
        )

    self._logger.info(
        "propose_new_texts.complete",
        components_proposed=list(result.keys()),
        components_requested=components_to_update,
    )

    return merged

AgentExecutor

Unified agent execution adapter.

Provides a single execution path for all ADK agent types (generator, critic, reflection) with consistent session management, event capture, and result handling.

ATTRIBUTE DESCRIPTION
_session_service

ADK session service for state management.

TYPE: BaseSessionService

_app_name

Application name for ADK runner.

TYPE: str

Examples:

Basic usage:

executor = AgentExecutor()
result = await executor.execute_agent(
    agent=my_agent,
    input_text="Hello, world!",
)
if result.status == ExecutionStatus.SUCCESS:
    print(f"Output: {result.extracted_value}")

With custom session service:

from google.adk.sessions import InMemorySessionService

session_service = InMemorySessionService()
executor = AgentExecutor(session_service=session_service)
Note

The adapter implements AgentExecutorProtocol for dependency injection and testing. All ADK-specific logic is encapsulated here.

Source code in src/gepa_adk/adapters/agent_executor.py
class AgentExecutor:
    """Unified agent execution adapter.

    Provides a single execution path for all ADK agent types (generator, critic,
    reflection) with consistent session management, event capture, and result
    handling.

    Attributes:
        _session_service (BaseSessionService): ADK session service for state management.
        _app_name (str): Application name for ADK runner.

    Examples:
        Basic usage:

        ```python
        executor = AgentExecutor()
        result = await executor.execute_agent(
            agent=my_agent,
            input_text="Hello, world!",
        )
        if result.status == ExecutionStatus.SUCCESS:
            print(f"Output: {result.extracted_value}")
        ```

        With custom session service:

        ```python
        from google.adk.sessions import InMemorySessionService

        session_service = InMemorySessionService()
        executor = AgentExecutor(session_service=session_service)
        ```

    Note:
        Adapter implements AgentExecutorProtocol for dependency injection
        and testing. All ADK-specific logic is encapsulated here.
    """

    def __init__(
        self,
        session_service: BaseSessionService | None = None,
        app_name: str = "gepa_executor",
    ) -> None:
        """Initialize AgentExecutor.

        Args:
            session_service: ADK session service for state management.
                If None, creates an InMemorySessionService.
            app_name: Application name for ADK runner. Defaults to "gepa_executor".

        Examples:
            Default initialization:

            ```python
            executor = AgentExecutor()
            ```

            With custom app name:

            ```python
            executor = AgentExecutor(app_name="my_app")
            ```

        Note:
            Creates a shared executor that uses the session service for all
            agent executions, allowing session state to be shared between
            executions when desired.
        """
        self._session_service = session_service or InMemorySessionService()
        self._app_name = app_name
        self._logger = logger.bind(component="AgentExecutor", app_name=app_name)

    async def _create_session(
        self,
        user_id: str,
        session_state: dict[str, Any] | None = None,
    ) -> Session:
        """Create a new session with optional initial state.

        Args:
            user_id: User identifier for the session.
            session_state: Initial state to inject into the session.

        Returns:
            Created ADK Session object.

        Note:
            Optional session state enables template variable substitution in
            agent instructions (e.g., {component_text} and {trials}).
        """
        session_id = f"exec_{uuid4()}"

        self._logger.debug(
            "session.creating",
            session_id=session_id,
            user_id=user_id,
            has_initial_state=session_state is not None,
        )

        session = await self._session_service.create_session(
            app_name=self._app_name,
            user_id=user_id,
            session_id=session_id,
            state=session_state,
        )

        self._logger.debug(
            "session.created",
            session_id=session.id,
        )

        return session

    async def _get_session(self, session_id: str, user_id: str) -> Session:
        """Retrieve an existing session by ID.

        Args:
            session_id: The session ID to retrieve.
            user_id: User identifier for the session.

        Returns:
            Existing ADK Session object.

        Raises:
            SessionNotFoundError: If the session does not exist.

        Note:
            Only performs strict existence checks for sessions that must
            already exist. Callers use this to fail fast instead of creating
            a new session. For get-or-create semantics, use _get_or_create_session.
        """
        session = await self._session_service.get_session(
            app_name=self._app_name,
            user_id=user_id,
            session_id=session_id,
        )

        if session is None:
            raise SessionNotFoundError(session_id)

        self._logger.debug(
            "session.retrieved",
            session_id=session_id,
        )

        return session

    async def _get_or_create_session(
        self,
        session_id: str,
        user_id: str,
        session_state: dict[str, Any] | None = None,
    ) -> Session:
        """Get an existing session or create a new one with the specified ID.

        Implements "get or create" semantics for session management. If the
        session exists, returns it. If not, creates a new session with the
        specified ID and optional initial state.

        Args:
            session_id: The session ID to retrieve or create.
            user_id: User identifier for the session.
            session_state: Initial state to inject if creating a new session.
                Ignored if session already exists.

        Returns:
            ADK Session object (existing or newly created).

        Note:
            Only applies initial state when creating new sessions. Existing
            sessions retain their current state regardless of session_state
            parameter.
        """
        # Try to get existing session first
        session = await self._session_service.get_session(
            app_name=self._app_name,
            user_id=user_id,
            session_id=session_id,
        )

        if session is not None:
            self._logger.debug(
                "session.retrieved",
                session_id=session_id,
            )
            return session

        # Session doesn't exist, create it with the specified ID
        self._logger.debug(
            "session.creating_with_id",
            session_id=session_id,
            user_id=user_id,
            has_initial_state=session_state is not None,
        )

        session = await self._session_service.create_session(
            app_name=self._app_name,
            user_id=user_id,
            session_id=session_id,
            state=session_state,
        )

        self._logger.debug(
            "session.created",
            session_id=session.id,
        )

        return session

    def _apply_overrides(
        self,
        agent: Any,
        instruction_override: str | None,
        output_schema_override: Any | None,
    ) -> Any:
        """Apply instruction and schema overrides to create modified agent copy.

        Args:
            agent: Original ADK LlmAgent.
            instruction_override: If provided, replaces agent instruction.
            output_schema_override: If provided, replaces output schema.

        Returns:
            Modified agent copy (or original if no overrides).

        Note:
            Original agent is preserved by creating a shallow copy with
            overridden attributes. The original agent is never modified.
        """
        if instruction_override is None and output_schema_override is None:
            return agent

        # Import LlmAgent here to avoid circular imports
        from google.adk.agents import LlmAgent

        # Create a copy with overrides
        # LlmAgent doesn't have a simple copy mechanism, so we recreate it
        # with the same parameters but modified instruction/schema
        # Extract agent attributes with proper defaults
        agent_tools = getattr(agent, "tools", None)
        modified_agent = LlmAgent(
            name=agent.name,
            model=agent.model,
            instruction=instruction_override or agent.instruction,
            output_schema=output_schema_override
            or getattr(agent, "output_schema", None),
            output_key=getattr(agent, "output_key", None),
            tools=agent_tools if agent_tools else [],
            before_model_callback=getattr(agent, "before_model_callback", None),
            after_model_callback=getattr(agent, "after_model_callback", None),
        )

        self._logger.debug(
            "agent.overrides_applied",
            instruction_override=instruction_override is not None,
            schema_override=output_schema_override is not None,
        )

        return modified_agent

    def _build_content(
        self,
        input_text: str,
        input_content: types.Content | None = None,
    ) -> types.Content:
        """Build Content for agent execution.

        Assembles the Content object to send to the agent. If input_content
        is provided, uses it directly. Otherwise, wraps input_text in a
        Content with a single text Part.

        Args:
            input_text: Text input for the agent.
            input_content: Pre-assembled multimodal Content. Takes precedence
                over input_text when provided.

        Returns:
            Content object for agent execution.

        Note:
            Serves as the central point for Content assembly, supporting
            both text-only (backward compatible) and multimodal inputs.
        """
        if input_content is not None:
            return input_content

        return types.Content(
            role="user",
            parts=[types.Part(text=input_text)],
        )

    async def _execute_runner(
        self,
        runner: Runner,
        session: Session,
        user_id: str,
        input_text: str,
        input_content: types.Content | None = None,
    ) -> list[Any]:
        """Execute the ADK Runner and capture events.

        Args:
            runner: ADK Runner instance.
            session: ADK Session for execution.
            user_id: User identifier.
            input_text: User message to send (used if input_content is None).
            input_content: Pre-assembled multimodal Content. Takes precedence
                over input_text when provided.

        Returns:
            List of captured ADK events.

        Note:
            Orchestrates the core Runner.run_async() loop, capturing all
            events for later output extraction.
        """
        content = self._build_content(input_text, input_content)

        events: list[Any] = []

        async for event in runner.run_async(
            user_id=user_id,
            session_id=session.id,
            new_message=content,
        ):
            events.append(event)

        return events

    async def _execute_with_timeout(
        self,
        runner: Runner,
        session: Session,
        user_id: str,
        input_text: str,
        timeout_seconds: int,
        input_content: types.Content | None = None,
    ) -> tuple[list[Any], bool]:
        """Execute runner with timeout handling.

        Args:
            runner: ADK Runner instance.
            session: ADK Session for execution.
            user_id: User identifier.
            input_text: User message to send (used if input_content is None).
            timeout_seconds: Maximum execution time.
            input_content: Pre-assembled multimodal Content. Takes precedence
                over input_text when provided.

        Returns:
            Tuple of (captured_events, timed_out).

        Note:
            On timeout, returns partial events captured before timeout.
            Uses asyncio.timeout for cancellation.
        """
        events: list[Any] = []
        timed_out = False

        try:
            async with asyncio.timeout(timeout_seconds):
                events = await self._execute_runner(
                    runner, session, user_id, input_text, input_content
                )
        except TimeoutError:
            timed_out = True
            self._logger.warning(
                "execution.timeout",
                session_id=session.id,
                timeout_seconds=timeout_seconds,
                events_captured=len(events),
            )

        return events, timed_out

    async def _extract_output(
        self,
        session: Session,
        events: list[Any],
        agent: Any,
    ) -> str | None:
        """Extract output from session state with event fallback.

        Args:
            session: ADK Session after execution.
            events: Captured events from execution.
            agent: Agent that was executed (for output_key).

        Returns:
            Extracted output string, or None if no output found.

        Note:
            Output extraction prioritizes state-based approach (using
            output_key), then falls back to event-based extraction.
        """
        # Try state-based extraction first (if agent has output_key)
        output_key = getattr(agent, "output_key", None)
        if output_key:
            # Refresh session state
            refreshed_session = await self._session_service.get_session(
                app_name=self._app_name,
                user_id=session.user_id,
                session_id=session.id,
            )
            if refreshed_session and refreshed_session.state:
                state_output = extract_output_from_state(
                    refreshed_session.state, output_key
                )
                if state_output:
                    self._logger.debug(
                        "output.extracted_from_state",
                        output_key=output_key,
                    )
                    return state_output

        # Fallback to event-based extraction
        event_output = extract_final_output(events)
        if event_output:
            self._logger.debug("output.extracted_from_events")
            return event_output

        return None

    async def execute_agent(
        self,
        agent: Any,
        input_text: str,
        *,
        input_content: types.Content | None = None,
        instruction_override: str | None = None,
        output_schema_override: Any | None = None,
        session_state: dict[str, Any] | None = None,
        existing_session_id: str | None = None,
        timeout_seconds: int = 300,
    ) -> ExecutionResult:
        """Execute an agent and return structured result.

        Runs the specified agent with the given input, optionally applying
        instruction or schema overrides for evolution scenarios. Manages
        session lifecycle and captures execution events.

        Args:
            agent: ADK LlmAgent to execute. The agent's tools, output_key,
                and other ADK features are preserved during execution.
            input_text: User message to send to the agent. Used when
                input_content is None for backward compatibility.
            input_content: Pre-assembled multimodal Content for the agent.
                When provided, takes precedence over input_text. Use this
                for multimodal inputs containing video or other media.
            instruction_override: If provided, replaces the agent's instruction
                for this execution only. Original agent is not modified.
            output_schema_override: If provided, replaces the agent's output
                schema for this execution only (type[BaseModel]). Used for schema evolution.
            session_state: Initial state to inject into the session. Used for
                template variable substitution (e.g., {component_text}).
            existing_session_id: If provided, uses get-or-create semantics to
                retrieve or create a session with this ID. Enables session sharing
                between agents (e.g., critic accessing generator state).
            timeout_seconds: Maximum execution time in seconds. Defaults to 300.
                Execution terminates with TIMEOUT status if exceeded.

        Returns:
            ExecutionResult with status, output, and debugging information.

        Examples:
            Basic execution:

            ```python
            result = await executor.execute_agent(
                agent=greeter,
                input_text="Hello!",
            )
            print(result.extracted_value)
            ```

            With session state for reflection:

            ```python
            result = await executor.execute_agent(
                agent=reflector,
                input_text="Improve the instruction",
                session_state={
                    "component_text": "Be helpful.",
                    "trials": '[{"score": 0.5}]',
                },
            )
            ```

            With multimodal content:

            ```python
            from google.genai.types import Content, Part

            content = Content(
                role="user",
                parts=[Part(text="Describe this video"), video_part],
            )
            result = await executor.execute_agent(
                agent=analyzer,
                input_text="",  # Can be empty when content provided
                input_content=content,
            )
            ```

        Note:
            Optional typing (Any) is used for agent parameter to avoid
            coupling to ADK types in the ports layer. Implementations
            should validate that the agent is a valid LlmAgent.
        """
        start_time = time.perf_counter()
        user_id = "exec_user"
        is_multimodal = input_content is not None

        self._logger.info(
            "execution.start",
            agent_name=getattr(agent, "name", "unknown"),
            input_length=len(input_text),
            is_multimodal=is_multimodal,
            has_instruction_override=instruction_override is not None,
            has_schema_override=output_schema_override is not None,
            has_session_state=session_state is not None,
            existing_session_id=existing_session_id,
            timeout_seconds=timeout_seconds,
        )

        # Get or create session
        session: Session
        if existing_session_id:
            session = await self._get_or_create_session(
                existing_session_id, user_id, session_state
            )
        else:
            session = await self._create_session(user_id, session_state)

        # Apply overrides if provided
        effective_agent = self._apply_overrides(
            agent, instruction_override, output_schema_override
        )

        # Create runner
        runner = Runner(
            agent=effective_agent,
            app_name=self._app_name,
            session_service=self._session_service,
        )

        # Execute with timeout and capture events
        events: list[Any] = []
        timed_out = False
        error_message: str | None = None

        try:
            events, timed_out = await self._execute_with_timeout(
                runner, session, user_id, input_text, timeout_seconds, input_content
            )
        except Exception as e:
            error_message = str(e)
            self._logger.error(
                "execution.error",
                session_id=session.id,
                error=error_message,
            )

        # Calculate execution time
        execution_time = time.perf_counter() - start_time

        # Determine status
        if error_message:
            status = ExecutionStatus.FAILED
        elif timed_out:
            status = ExecutionStatus.TIMEOUT
            error_message = f"Execution timed out after {timeout_seconds}s"
        else:
            status = ExecutionStatus.SUCCESS

        # Extract output (even on timeout, we try to get partial results)
        extracted_value: str | None = None
        if status == ExecutionStatus.SUCCESS or (timed_out and events):
            extracted_value = await self._extract_output(session, events, agent)

        self._logger.info(
            "execution.complete",
            session_id=session.id,
            status=status.value,
            execution_time_seconds=execution_time,
            events_captured=len(events),
            has_output=extracted_value is not None,
        )

        return ExecutionResult(
            status=status,
            session_id=session.id,
            extracted_value=extracted_value,
            error_message=error_message,
            execution_time_seconds=execution_time,
            captured_events=events,
        )

__init__

__init__(
    session_service: BaseSessionService | None = None,
    app_name: str = "gepa_executor",
) -> None

Initialize AgentExecutor.

PARAMETER DESCRIPTION
session_service

ADK session service for state management. If None, creates an InMemorySessionService.

TYPE: BaseSessionService | None DEFAULT: None

app_name

Application name for ADK runner. Defaults to "gepa_executor".

TYPE: str DEFAULT: 'gepa_executor'

Examples:

Default initialization:

executor = AgentExecutor()

With custom app name:

executor = AgentExecutor(app_name="my_app")
Note

Creates a shared executor that uses the session service for all agent executions, allowing session state to be shared between executions when desired.

Source code in src/gepa_adk/adapters/agent_executor.py
def __init__(
    self,
    session_service: BaseSessionService | None = None,
    app_name: str = "gepa_executor",
) -> None:
    """Initialize AgentExecutor.

    Args:
        session_service: ADK session service for state management.
            If None, creates an InMemorySessionService.
        app_name: Application name for ADK runner. Defaults to "gepa_executor".

    Examples:
        Default initialization:

        ```python
        executor = AgentExecutor()
        ```

        With custom app name:

        ```python
        executor = AgentExecutor(app_name="my_app")
        ```

    Note:
        Creates a shared executor that uses the session service for all
        agent executions, allowing session state to be shared between
        executions when desired.
    """
    self._session_service = session_service or InMemorySessionService()
    self._app_name = app_name
    self._logger = logger.bind(component="AgentExecutor", app_name=app_name)

execute_agent async

execute_agent(
    agent: Any,
    input_text: str,
    *,
    input_content: Content | None = None,
    instruction_override: str | None = None,
    output_schema_override: Any | None = None,
    session_state: dict[str, Any] | None = None,
    existing_session_id: str | None = None,
    timeout_seconds: int = 300,
) -> ExecutionResult

Execute an agent and return structured result.

Runs the specified agent with the given input, optionally applying instruction or schema overrides for evolution scenarios. Manages session lifecycle and captures execution events.

PARAMETER DESCRIPTION
agent

ADK LlmAgent to execute. The agent's tools, output_key, and other ADK features are preserved during execution.

TYPE: Any

input_text

User message to send to the agent. Used when input_content is None for backward compatibility.

TYPE: str

input_content

Pre-assembled multimodal Content for the agent. When provided, takes precedence over input_text. Use this for multimodal inputs containing video or other media.

TYPE: Content | None DEFAULT: None

instruction_override

If provided, replaces the agent's instruction for this execution only. Original agent is not modified.

TYPE: str | None DEFAULT: None

output_schema_override

If provided, replaces the agent's output schema for this execution only (type[BaseModel]). Used for schema evolution.

TYPE: Any | None DEFAULT: None

session_state

Initial state to inject into the session. Used for template variable substitution (e.g., {component_text}).

TYPE: dict[str, Any] | None DEFAULT: None

existing_session_id

If provided, uses get-or-create semantics to retrieve or create a session with this ID. Enables session sharing between agents (e.g., critic accessing generator state).

TYPE: str | None DEFAULT: None

timeout_seconds

Maximum execution time in seconds. Defaults to 300. Execution terminates with TIMEOUT status if exceeded.

TYPE: int DEFAULT: 300

RETURNS DESCRIPTION
ExecutionResult

ExecutionResult with status, output, and debugging information.

Examples:

Basic execution:

result = await executor.execute_agent(
    agent=greeter,
    input_text="Hello!",
)
print(result.extracted_value)
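
With a per-execution instruction override (a sketch; the override text is illustrative and the original agent is left unmodified):

result = await executor.execute_agent(
    agent=greeter,
    input_text="Hello!",
    instruction_override="Greet the user in exactly five words.",
)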

With session state for reflection:

result = await executor.execute_agent(
    agent=reflector,
    input_text="Improve the instruction",
    session_state={
        "component_text": "Be helpful.",
        "trials": '[{"score": 0.5}]',
    },
)

With multimodal content:

from google.genai.types import Content, Part

content = Content(
    role="user",
    parts=[Part(text="Describe this video"), video_part],
)
result = await executor.execute_agent(
    agent=analyzer,
    input_text="",  # Can be empty when content provided
    input_content=content,
)
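
With a shared session across two agents (a sketch; generator and critic are hypothetical agents and the shared ID is arbitrary):

shared_id = "shared_eval_session"
await executor.execute_agent(
    agent=generator,
    input_text="Draft a short summary",
    existing_session_id=shared_id,
)
critique = await executor.execute_agent(
    agent=critic,
    input_text="Critique the draft using the shared session state",
    existing_session_id=shared_id,
)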
Note

The agent parameter is typed as Any to avoid coupling the ports layer to ADK types. Implementations should validate that the agent is a valid LlmAgent.

Source code in src/gepa_adk/adapters/agent_executor.py
async def execute_agent(
    self,
    agent: Any,
    input_text: str,
    *,
    input_content: types.Content | None = None,
    instruction_override: str | None = None,
    output_schema_override: Any | None = None,
    session_state: dict[str, Any] | None = None,
    existing_session_id: str | None = None,
    timeout_seconds: int = 300,
) -> ExecutionResult:
    """Execute an agent and return structured result.

    Runs the specified agent with the given input, optionally applying
    instruction or schema overrides for evolution scenarios. Manages
    session lifecycle and captures execution events.

    Args:
        agent: ADK LlmAgent to execute. The agent's tools, output_key,
            and other ADK features are preserved during execution.
        input_text: User message to send to the agent. Used when
            input_content is None for backward compatibility.
        input_content: Pre-assembled multimodal Content for the agent.
            When provided, takes precedence over input_text. Use this
            for multimodal inputs containing video or other media.
        instruction_override: If provided, replaces the agent's instruction
            for this execution only. Original agent is not modified.
        output_schema_override: If provided, replaces the agent's output
            schema for this execution only (type[BaseModel]). Used for schema evolution.
        session_state: Initial state to inject into the session. Used for
            template variable substitution (e.g., {component_text}).
        existing_session_id: If provided, uses get-or-create semantics to
            retrieve or create a session with this ID. Enables session sharing
            between agents (e.g., critic accessing generator state).
        timeout_seconds: Maximum execution time in seconds. Defaults to 300.
            Execution terminates with TIMEOUT status if exceeded.

    Returns:
        ExecutionResult with status, output, and debugging information.

    Examples:
        Basic execution:

        ```python
        result = await executor.execute_agent(
            agent=greeter,
            input_text="Hello!",
        )
        print(result.extracted_value)
        ```

        With session state for reflection:

        ```python
        result = await executor.execute_agent(
            agent=reflector,
            input_text="Improve the instruction",
            session_state={
                "component_text": "Be helpful.",
                "trials": '[{"score": 0.5}]',
            },
        )
        ```

        With multimodal content:

        ```python
        from google.genai.types import Content, Part

        content = Content(
            role="user",
            parts=[Part(text="Describe this video"), video_part],
        )
        result = await executor.execute_agent(
            agent=analyzer,
            input_text="",  # Can be empty when content provided
            input_content=content,
        )
        ```

    Note:
        Optional typing (Any) is used for agent parameter to avoid
        coupling to ADK types in the ports layer. Implementations
        should validate that the agent is a valid LlmAgent.
    """
    start_time = time.perf_counter()
    user_id = "exec_user"
    is_multimodal = input_content is not None

    self._logger.info(
        "execution.start",
        agent_name=getattr(agent, "name", "unknown"),
        input_length=len(input_text),
        is_multimodal=is_multimodal,
        has_instruction_override=instruction_override is not None,
        has_schema_override=output_schema_override is not None,
        has_session_state=session_state is not None,
        existing_session_id=existing_session_id,
        timeout_seconds=timeout_seconds,
    )

    # Get or create session
    session: Session
    if existing_session_id:
        session = await self._get_or_create_session(
            existing_session_id, user_id, session_state
        )
    else:
        session = await self._create_session(user_id, session_state)

    # Apply overrides if provided
    effective_agent = self._apply_overrides(
        agent, instruction_override, output_schema_override
    )

    # Create runner
    runner = Runner(
        agent=effective_agent,
        app_name=self._app_name,
        session_service=self._session_service,
    )

    # Execute with timeout and capture events
    events: list[Any] = []
    timed_out = False
    error_message: str | None = None

    try:
        events, timed_out = await self._execute_with_timeout(
            runner, session, user_id, input_text, timeout_seconds, input_content
        )
    except Exception as e:
        error_message = str(e)
        self._logger.error(
            "execution.error",
            session_id=session.id,
            error=error_message,
        )

    # Calculate execution time
    execution_time = time.perf_counter() - start_time

    # Determine status
    if error_message:
        status = ExecutionStatus.FAILED
    elif timed_out:
        status = ExecutionStatus.TIMEOUT
        error_message = f"Execution timed out after {timeout_seconds}s"
    else:
        status = ExecutionStatus.SUCCESS

    # Extract output (even on timeout, we try to get partial results)
    extracted_value: str | None = None
    if status == ExecutionStatus.SUCCESS or (timed_out and events):
        extracted_value = await self._extract_output(session, events, agent)

    self._logger.info(
        "execution.complete",
        session_id=session.id,
        status=status.value,
        execution_time_seconds=execution_time,
        events_captured=len(events),
        has_output=extracted_value is not None,
    )

    return ExecutionResult(
        status=status,
        session_id=session.id,
        extracted_value=extracted_value,
        error_message=error_message,
        execution_time_seconds=execution_time,
        captured_events=events,
    )
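
The parameter list above notes that existing_session_id enables session sharing between agents (e.g., a critic reading a generator's state). A minimal sketch of that pattern, assuming the executor from the earlier examples and illustrative generator/critic agents:

```python
# Run the first agent; its ExecutionResult carries the session ID.
gen_result = await executor.execute_agent(
    agent=generator,
    input_text="Draft a short summary of the report",
)

# Reuse the same session so the second agent can read the first agent's state.
critic_result = await executor.execute_agent(
    agent=critic,
    input_text="Critique the draft stored in this session",
    existing_session_id=gen_result.session_id,
)
```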

SessionNotFoundError

Bases: EvolutionError



Raised when a requested session does not exist.

ATTRIBUTE DESCRIPTION
session_id

The session ID that was not found.

TYPE: str

Examples:

Handling session not found with strict existence checking:

from gepa_adk.adapters.agent_executor import SessionNotFoundError

try:
    session = await executor._get_session(
        session_id="invalid_session",
        user_id="user_123",
    )
except SessionNotFoundError as e:
    print(f"Session not found: {e.session_id}")
Note

Arises only from strict existence-checking paths like _get_session(). The execute_agent() method uses get-or-create semantics and will not raise this exception.
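
For contrast with the strict lookup above, a hedged sketch of the get-or-create path (the session ID is arbitrary; greeter is the agent from the earlier example):

```python
# execute_agent() creates the session if it does not already exist,
# so no SessionNotFoundError is raised here.
result = await executor.execute_agent(
    agent=greeter,
    input_text="Hello!",
    existing_session_id="maybe-missing-session",
)
```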

Source code in src/gepa_adk/adapters/agent_executor.py
class SessionNotFoundError(EvolutionError):
    """Raised when a requested session does not exist.

    Attributes:
        session_id (str): The session ID that was not found.

    Examples:
        Handling session not found with strict existence checking:

        ```python
        from gepa_adk.adapters.agent_executor import SessionNotFoundError

        try:
            session = await executor._get_session(
                session_id="invalid_session",
                user_id="user_123",
            )
        except SessionNotFoundError as e:
            print(f"Session not found: {e.session_id}")
        ```

    Note:
        Arises only from strict existence-checking paths like _get_session().
        The execute_agent() method uses get-or-create semantics and will not
        raise this exception.
    """

    def __init__(self, session_id: str) -> None:
        """Initialize SessionNotFoundError.

        Args:
            session_id: The session ID that was not found.
        """
        self.session_id = session_id
        super().__init__(f"Session not found: {session_id}")

__init__

__init__(session_id: str) -> None

Initialize SessionNotFoundError.

PARAMETER DESCRIPTION
session_id

The session ID that was not found.

TYPE: str

Source code in src/gepa_adk/adapters/agent_executor.py
def __init__(self, session_id: str) -> None:
    """Initialize SessionNotFoundError.

    Args:
        session_id: The session ID that was not found.
    """
    self.session_id = session_id
    super().__init__(f"Session not found: {session_id}")

CurrentBestCandidateSelector

Always select the candidate with the highest average score.

Note

A greedy selector always exploits the best-average candidate.

Examples:

selector = CurrentBestCandidateSelector()
candidate_idx = await selector.select_candidate(state)
Source code in src/gepa_adk/adapters/candidate_selector.py
class CurrentBestCandidateSelector:
    """Always select the candidate with the highest average score.

    Note:
        A greedy selector always exploits the best-average candidate.

    Examples:
        ```python
        selector = CurrentBestCandidateSelector()
        candidate_idx = await selector.select_candidate(state)
        ```
    """

    async def select_candidate(self, state: ParetoState) -> int:
        """Return the best-average candidate index.

        Args:
            state: Current evolution state with Pareto tracking.

        Returns:
            Selected candidate index.

        Raises:
            NoCandidateAvailableError: If no candidates are available.

        Examples:
            ```python
            candidate_idx = await selector.select_candidate(state)
            ```
        """
        if state.best_average_idx is None:
            raise NoCandidateAvailableError("No candidates available for selection")
        return state.best_average_idx

select_candidate async

select_candidate(state: ParetoState) -> int

Return the best-average candidate index.

PARAMETER DESCRIPTION
state

Current evolution state with Pareto tracking.

TYPE: ParetoState

RETURNS DESCRIPTION
int

Selected candidate index.

RAISES DESCRIPTION
NoCandidateAvailableError

If no candidates are available.

Examples:

candidate_idx = await selector.select_candidate(state)
Source code in src/gepa_adk/adapters/candidate_selector.py
async def select_candidate(self, state: ParetoState) -> int:
    """Return the best-average candidate index.

    Args:
        state: Current evolution state with Pareto tracking.

    Returns:
        Selected candidate index.

    Raises:
        NoCandidateAvailableError: If no candidates are available.

    Examples:
        ```python
        candidate_idx = await selector.select_candidate(state)
        ```
    """
    if state.best_average_idx is None:
        raise NoCandidateAvailableError("No candidates available for selection")
    return state.best_average_idx

EpsilonGreedyCandidateSelector

Epsilon-greedy selection balancing exploration and exploitation.

ATTRIBUTE DESCRIPTION
_epsilon

Exploration probability.

TYPE: float

_rng

RNG for exploration decisions.

TYPE: Random

Note

With probability epsilon the selector explores a uniformly random candidate; otherwise it exploits the best-average candidate.

Examples:

selector = EpsilonGreedyCandidateSelector(epsilon=0.1, rng=random.Random(7))
candidate_idx = await selector.select_candidate(state)
Source code in src/gepa_adk/adapters/candidate_selector.py
class EpsilonGreedyCandidateSelector:
    """Epsilon-greedy selection balancing exploration and exploitation.

    Attributes:
        _epsilon (float): Exploration probability.
        _rng (random.Random): RNG for exploration decisions.

    Note:
        With probability epsilon the selector explores a uniformly random
        candidate; otherwise it exploits the best-average candidate.

    Examples:
        ```python
        selector = EpsilonGreedyCandidateSelector(epsilon=0.1, rng=random.Random(7))
        candidate_idx = await selector.select_candidate(state)
        ```
    """

    def __init__(self, epsilon: float, rng: random.Random | None = None) -> None:
        """Initialize the selector.

        Args:
            epsilon: Probability of random exploration.
            rng: Optional random number generator for reproducibility.

        Raises:
            ConfigurationError: If epsilon is outside [0.0, 1.0].
        """
        if not 0.0 <= epsilon <= 1.0:
            raise ConfigurationError(
                "epsilon must be between 0.0 and 1.0",
                field="epsilon",
                value=epsilon,
                constraint="0.0 <= epsilon <= 1.0",
            )
        self._epsilon = epsilon
        self._rng = rng or random.Random()

    async def select_candidate(self, state: ParetoState) -> int:
        """Select a candidate using epsilon-greedy strategy.

        Args:
            state: Current evolution state with Pareto tracking.

        Returns:
            Selected candidate index.

        Raises:
            NoCandidateAvailableError: If no candidates are available.

        Examples:
            ```python
            candidate_idx = await selector.select_candidate(state)
            ```
        """
        if not state.candidates:
            raise NoCandidateAvailableError("No candidates available for selection")

        if self._rng.random() < self._epsilon:
            return self._rng.randint(0, len(state.candidates) - 1)

        if state.best_average_idx is None:
            raise NoCandidateAvailableError("No candidates available for selection")

        return state.best_average_idx

__init__

__init__(epsilon: float, rng: Random | None = None) -> None

Initialize the selector.

PARAMETER DESCRIPTION
epsilon

Probability of random exploration.

TYPE: float

rng

Optional random number generator for reproducibility.

TYPE: Random | None DEFAULT: None

RAISES DESCRIPTION
ConfigurationError

If epsilon is outside [0.0, 1.0].

Source code in src/gepa_adk/adapters/candidate_selector.py
def __init__(self, epsilon: float, rng: random.Random | None = None) -> None:
    """Initialize the selector.

    Args:
        epsilon: Probability of random exploration.
        rng: Optional random number generator for reproducibility.

    Raises:
        ConfigurationError: If epsilon is outside [0.0, 1.0].
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ConfigurationError(
            "epsilon must be between 0.0 and 1.0",
            field="epsilon",
            value=epsilon,
            constraint="0.0 <= epsilon <= 1.0",
        )
    self._epsilon = epsilon
    self._rng = rng or random.Random()

select_candidate async

select_candidate(state: ParetoState) -> int

Select a candidate using epsilon-greedy strategy.

PARAMETER DESCRIPTION
state

Current evolution state with Pareto tracking.

TYPE: ParetoState

RETURNS DESCRIPTION
int

Selected candidate index.

RAISES DESCRIPTION
NoCandidateAvailableError

If no candidates are available.

Examples:

candidate_idx = await selector.select_candidate(state)
Source code in src/gepa_adk/adapters/candidate_selector.py
async def select_candidate(self, state: ParetoState) -> int:
    """Select a candidate using epsilon-greedy strategy.

    Args:
        state: Current evolution state with Pareto tracking.

    Returns:
        Selected candidate index.

    Raises:
        NoCandidateAvailableError: If no candidates are available.

    Examples:
        ```python
        candidate_idx = await selector.select_candidate(state)
        ```
    """
    if not state.candidates:
        raise NoCandidateAvailableError("No candidates available for selection")

    if self._rng.random() < self._epsilon:
        return self._rng.randint(0, len(state.candidates) - 1)

    if state.best_average_idx is None:
        raise NoCandidateAvailableError("No candidates available for selection")

    return state.best_average_idx

ParetoCandidateSelector

Sample from the Pareto front proportional to leadership frequency.

ATTRIBUTE DESCRIPTION
_rng

RNG for sampling candidates.

TYPE: Random

Note

A Pareto selector emphasizes candidates that lead on more examples.

Examples:

selector = ParetoCandidateSelector(rng=random.Random(42))
candidate_idx = await selector.select_candidate(state)
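
To make the weighting concrete, a small worked sketch of the proportional sampling described above (the weights dict stands in for what frontier.get_selection_weights() returns; the numbers are invented):

```python
# Candidate 0 leads on 3 validation examples, candidate 1 on 1.
weights = {0: 3, 1: 1}
sampling_list = [idx for idx, weight in weights.items() for _ in range(weight)]
# sampling_list == [0, 0, 0, 1], so candidate 0 is drawn roughly 75% of the time.
```
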
Source code in src/gepa_adk/adapters/candidate_selector.py
class ParetoCandidateSelector:
    """Sample from the Pareto front proportional to leadership frequency.

    Attributes:
        _rng (random.Random): RNG for sampling candidates.

    Note:
        A Pareto selector emphasizes candidates that lead on more examples.

    Examples:
        ```python
        selector = ParetoCandidateSelector(rng=random.Random(42))
        candidate_idx = await selector.select_candidate(state)
        ```
    """

    def __init__(self, rng: random.Random | None = None) -> None:
        """Initialize the selector.

        Args:
            rng: Optional random number generator for reproducibility.
        """
        self._rng = rng or random.Random()

    async def select_candidate(self, state: ParetoState) -> int:
        """Select a candidate index from the Pareto frontier.

        Args:
            state: Current evolution state with Pareto tracking.

        Returns:
            Selected candidate index.

        Raises:
            NoCandidateAvailableError: If no candidates or leaders exist.

        Examples:
            ```python
            candidate_idx = await selector.select_candidate(state)
            ```
        """
        if not state.candidates:
            raise NoCandidateAvailableError("No candidates available for selection")

        weights = state.frontier.get_selection_weights()
        if not weights:
            raise NoCandidateAvailableError("Pareto frontier is empty")

        sampling_list = [
            candidate_idx
            for candidate_idx, weight in weights.items()
            for _ in range(weight)
        ]
        if not sampling_list:
            raise NoCandidateAvailableError("Pareto frontier has no leaders")

        return self._rng.choice(sampling_list)

__init__

__init__(rng: Random | None = None) -> None

Initialize the selector.

PARAMETER DESCRIPTION
rng

Optional random number generator for reproducibility.

TYPE: Random | None DEFAULT: None

Source code in src/gepa_adk/adapters/candidate_selector.py
def __init__(self, rng: random.Random | None = None) -> None:
    """Initialize the selector.

    Args:
        rng: Optional random number generator for reproducibility.
    """
    self._rng = rng or random.Random()

select_candidate async

select_candidate(state: ParetoState) -> int

Select a candidate index from the Pareto frontier.

PARAMETER DESCRIPTION
state

Current evolution state with Pareto tracking.

TYPE: ParetoState

RETURNS DESCRIPTION
int

Selected candidate index.

RAISES DESCRIPTION
NoCandidateAvailableError

If no candidates or leaders exist.

Examples:

candidate_idx = await selector.select_candidate(state)
Source code in src/gepa_adk/adapters/candidate_selector.py
async def select_candidate(self, state: ParetoState) -> int:
    """Select a candidate index from the Pareto frontier.

    Args:
        state: Current evolution state with Pareto tracking.

    Returns:
        Selected candidate index.

    Raises:
        NoCandidateAvailableError: If no candidates or leaders exist.

    Examples:
        ```python
        candidate_idx = await selector.select_candidate(state)
        ```
    """
    if not state.candidates:
        raise NoCandidateAvailableError("No candidates available for selection")

    weights = state.frontier.get_selection_weights()
    if not weights:
        raise NoCandidateAvailableError("Pareto frontier is empty")

    sampling_list = [
        candidate_idx
        for candidate_idx, weight in weights.items()
        for _ in range(weight)
    ]
    if not sampling_list:
        raise NoCandidateAvailableError("Pareto frontier has no leaders")

    return self._rng.choice(sampling_list)

ComponentHandlerRegistry

Registry for component handlers with O(1) lookup.

Stores component handlers keyed by component name, providing registration, lookup, and existence checking operations.

ATTRIBUTE DESCRIPTION
_handlers

Internal dict mapping component names to handlers.

TYPE: dict[str, ComponentHandler]

Examples:

Create and use a registry:

registry = ComponentHandlerRegistry()
registry.register("instruction", InstructionHandler())
handler = registry.get("instruction")
See Also
Note

A default registry instance is available as component_handlers module variable, with convenience functions get_handler() and register_handler().
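
A hedged sketch of those module-level helpers (their exact signatures are assumed here to mirror the registry methods documented below):

```python
from gepa_adk.adapters.component_handlers import (
    InstructionHandler,
    get_handler,
    register_handler,
)

# Assumed to delegate to the default module-level registry.
register_handler("instruction", InstructionHandler())
handler = get_handler("instruction")
```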

Source code in src/gepa_adk/adapters/component_handlers.py
class ComponentHandlerRegistry:
    """Registry for component handlers with O(1) lookup.

    Stores component handlers keyed by component name, providing
    registration, lookup, and existence checking operations.

    Attributes:
        _handlers: Internal dict mapping component names to handlers.

    Examples:
        Create and use a registry:

        ```python
        registry = ComponentHandlerRegistry()
        registry.register("instruction", InstructionHandler())
        handler = registry.get("instruction")
        ```

    See Also:
        - [`get_handler()`][gepa_adk.adapters.component_handlers.get_handler]:
            Convenience function for default registry.

    Note:
        A default registry instance is available as `component_handlers`
        module variable, with convenience functions `get_handler()` and
        `register_handler()`.
    """

    def __init__(self) -> None:
        """Initialize empty registry.

        Examples:
            ```python
            registry = ComponentHandlerRegistry()
            assert not registry.has("instruction")
            ```

        Note:
            Creates an empty internal dict for handler storage.
        """
        self._handlers: dict[str, ComponentHandler] = {}

    def register(self, name: str, handler: ComponentHandler) -> None:
        """Register a handler for a component name.

        Args:
            name: Component name (e.g., "instruction", "output_schema").
                Must be a non-empty string.
            handler: Handler implementing ComponentHandler protocol.

        Raises:
            ValueError: If name is empty or None.
            TypeError: If handler doesn't implement ComponentHandler protocol.

        Examples:
            ```python
            registry.register("instruction", InstructionHandler())
            registry.register("output_schema", OutputSchemaHandler())
            ```

        Note:
            Overwrites existing handler if name already registered.
            Logs a debug message on replacement.
        """
        if not name:
            raise ValueError("Component name must be a non-empty string")

        if not isinstance(handler, ComponentHandler):
            raise TypeError(
                f"Handler does not implement ComponentHandler protocol: {type(handler)}"
            )

        if name in self._handlers:
            logger.debug(
                "component_handler.register.replace",
                name=name,
                old_handler=type(self._handlers[name]).__name__,
                new_handler=type(handler).__name__,
            )

        self._handlers[name] = handler
        logger.debug(
            "component_handler.register.success",
            name=name,
            handler=type(handler).__name__,
        )

    def get(self, name: str) -> ComponentHandler:
        """Retrieve handler for component name.

        Args:
            name: Component name to look up.

        Returns:
            The registered ComponentHandler.

        Raises:
            ValueError: If name is empty or None.
            KeyError: If no handler registered for name.

        Examples:
            ```python
            handler = registry.get("instruction")
            original = handler.apply(agent, "New value")
            ```
        """
        if not name:
            raise ValueError("Component name must be a non-empty string")

        if name not in self._handlers:
            raise KeyError(f"No handler registered for component: {name}")

        return self._handlers[name]

    def has(self, name: str) -> bool:
        """Check if handler exists for component name.

        Args:
            name: Component name to check.

        Returns:
            True if handler registered, False otherwise.

        Examples:
            ```python
            if registry.has("instruction"):
                handler = registry.get("instruction")
            ```

        Note:
            Returns False for empty/None names (no ValueError).
            This allows safe checking without exception handling.
        """
        if not name:
            return False
        return name in self._handlers

    def names(self) -> list[str]:
        """Return sorted list of registered handler names.

        Returns:
            Sorted list of component names with registered handlers.

        Examples:
            ```python
            available = registry.names()
            # ["generate_content_config", "instruction", "output_schema"]
            ```

        Note:
            Output is sorted alphabetically for consistent error messages
            and validation feedback.
        """
        return sorted(self._handlers.keys())

__init__

__init__() -> None

Initialize empty registry.

Examples:

registry = ComponentHandlerRegistry()
assert not registry.has("instruction")
Note

Creates an empty internal dict for handler storage.

Source code in src/gepa_adk/adapters/component_handlers.py
def __init__(self) -> None:
    """Initialize empty registry.

    Examples:
        ```python
        registry = ComponentHandlerRegistry()
        assert not registry.has("instruction")
        ```

    Note:
        Creates an empty internal dict for handler storage.
    """
    self._handlers: dict[str, ComponentHandler] = {}

register

register(name: str, handler: ComponentHandler) -> None

Register a handler for a component name.

PARAMETER DESCRIPTION
name

Component name (e.g., "instruction", "output_schema"). Must be a non-empty string.

TYPE: str

handler

Handler implementing ComponentHandler protocol.

TYPE: ComponentHandler

RAISES DESCRIPTION
ValueError

If name is empty or None.

TypeError

If handler doesn't implement ComponentHandler protocol.

Examples:

registry.register("instruction", InstructionHandler())
registry.register("output_schema", OutputSchemaHandler())
Note

Overwrites existing handler if name already registered. Logs a debug message on replacement.
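
Because register() type-checks its argument against the ComponentHandler protocol, any custom handler must expose the serialize/apply/restore surface used by the handlers documented below. A sketch of the shape that check presumably expects (the real protocol is defined elsewhere in the package; this is inferred, not the library definition):

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ComponentHandler(Protocol):
    """Inferred handler shape (sketch only, not the library's definition)."""

    def serialize(self, agent: Any) -> str: ...

    def apply(self, agent: Any, value: str) -> Any: ...

    def restore(self, agent: Any, original: Any) -> None: ...
```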

Source code in src/gepa_adk/adapters/component_handlers.py
def register(self, name: str, handler: ComponentHandler) -> None:
    """Register a handler for a component name.

    Args:
        name: Component name (e.g., "instruction", "output_schema").
            Must be a non-empty string.
        handler: Handler implementing ComponentHandler protocol.

    Raises:
        ValueError: If name is empty or None.
        TypeError: If handler doesn't implement ComponentHandler protocol.

    Examples:
        ```python
        registry.register("instruction", InstructionHandler())
        registry.register("output_schema", OutputSchemaHandler())
        ```

    Note:
        Overwrites existing handler if name already registered.
        Logs a debug message on replacement.
    """
    if not name:
        raise ValueError("Component name must be a non-empty string")

    if not isinstance(handler, ComponentHandler):
        raise TypeError(
            f"Handler does not implement ComponentHandler protocol: {type(handler)}"
        )

    if name in self._handlers:
        logger.debug(
            "component_handler.register.replace",
            name=name,
            old_handler=type(self._handlers[name]).__name__,
            new_handler=type(handler).__name__,
        )

    self._handlers[name] = handler
    logger.debug(
        "component_handler.register.success",
        name=name,
        handler=type(handler).__name__,
    )

get

get(name: str) -> ComponentHandler

Retrieve handler for component name.

PARAMETER DESCRIPTION
name

Component name to look up.

TYPE: str

RETURNS DESCRIPTION
ComponentHandler

The registered ComponentHandler.

RAISES DESCRIPTION
ValueError

If name is empty or None.

KeyError

If no handler registered for name.

Examples:

handler = registry.get("instruction")
original = handler.apply(agent, "New value")
Source code in src/gepa_adk/adapters/component_handlers.py
def get(self, name: str) -> ComponentHandler:
    """Retrieve handler for component name.

    Args:
        name: Component name to look up.

    Returns:
        The registered ComponentHandler.

    Raises:
        ValueError: If name is empty or None.
        KeyError: If no handler registered for name.

    Examples:
        ```python
        handler = registry.get("instruction")
        original = handler.apply(agent, "New value")
        ```
    """
    if not name:
        raise ValueError("Component name must be a non-empty string")

    if name not in self._handlers:
        raise KeyError(f"No handler registered for component: {name}")

    return self._handlers[name]

has

has(name: str) -> bool

Check if handler exists for component name.

PARAMETER DESCRIPTION
name

Component name to check.

TYPE: str

RETURNS DESCRIPTION
bool

True if handler registered, False otherwise.

Examples:

if registry.has("instruction"):
    handler = registry.get("instruction")
Note

Returns False for empty/None names (no ValueError). This allows safe checking without exception handling.

Source code in src/gepa_adk/adapters/component_handlers.py
def has(self, name: str) -> bool:
    """Check if handler exists for component name.

    Args:
        name: Component name to check.

    Returns:
        True if handler registered, False otherwise.

    Examples:
        ```python
        if registry.has("instruction"):
            handler = registry.get("instruction")
        ```

    Note:
        Returns False for empty/None names (no ValueError).
        This allows safe checking without exception handling.
    """
    if not name:
        return False
    return name in self._handlers

names

names() -> list[str]

Return sorted list of registered handler names.

RETURNS DESCRIPTION
list[str]

Sorted list of component names with registered handlers.

Examples:

available = registry.names()
# ["generate_content_config", "instruction", "output_schema"]
Note

Output is sorted alphabetically for consistent error messages and validation feedback.

Source code in src/gepa_adk/adapters/component_handlers.py
def names(self) -> list[str]:
    """Return sorted list of registered handler names.

    Returns:
        Sorted list of component names with registered handlers.

    Examples:
        ```python
        available = registry.names()
        # ["generate_content_config", "instruction", "output_schema"]
        ```

    Note:
        Output is sorted alphabetically for consistent error messages
        and validation feedback.
    """
    return sorted(self._handlers.keys())

GenerateContentConfigHandler

Handler for agent.generate_content_config component.

Manages serialization, application, and restoration of the agent's LLM generation configuration during evolution.

Examples:

handler = GenerateContentConfigHandler()
original = handler.serialize(agent)  # YAML string
handler.apply(agent, "temperature: 0.5")
# ... evaluate ...
handler.restore(agent, original_config)
Note

All state is stored in the agent object - handler is stateless. On invalid config, logs warning and keeps original.

Source code in src/gepa_adk/adapters/component_handlers.py
class GenerateContentConfigHandler:
    """Handler for agent.generate_content_config component.

    Manages serialization, application, and restoration of the
    agent's LLM generation configuration during evolution.

    Examples:
        ```python
        handler = GenerateContentConfigHandler()
        original = handler.serialize(agent)  # YAML string
        handler.apply(agent, "temperature: 0.5")
        # ... evaluate ...
        handler.restore(agent, original_config)
        ```

    Note:
        All state is stored in the agent object - handler is stateless.
        On invalid config, logs warning and keeps original.
    """

    def serialize(self, agent: "LlmAgent") -> str:
        """Extract generate_content_config from agent as YAML.

        Args:
            agent: The LlmAgent instance.

        Returns:
            YAML string with parameter descriptions as comments.
            Returns empty string if generate_content_config is None.

        Examples:
            ```python
            yaml_str = handler.serialize(agent)
            # yaml_str contains YAML with temperature, top_p, etc.
            ```
        """
        config = getattr(agent, "generate_content_config", None)
        return serialize_generate_config(config)

    def apply(self, agent: "LlmAgent", value: str) -> Any:
        """Apply new generate_content_config to agent, return original.

        Args:
            agent: The LlmAgent instance to modify.
            value: YAML string defining the new config parameters.

        Returns:
            The original GenerateContentConfig (or None).

        Examples:
            ```python
            original = handler.apply(agent, "temperature: 0.5")
            # agent.generate_content_config.temperature is now 0.5
            ```

        Note:
            On deserialization or validation failure, logs warning and
            keeps original config. Never raises exceptions.
        """
        original = getattr(agent, "generate_content_config", None)

        try:
            new_config = deserialize_generate_config(value, original)

            # Validate the parsed config
            config_dict = {}
            for param in [
                "temperature",
                "top_p",
                "top_k",
                "max_output_tokens",
                "presence_penalty",
                "frequency_penalty",
            ]:
                param_value = getattr(new_config, param, None)
                if param_value is not None:
                    config_dict[param] = param_value

            errors = validate_generate_config(config_dict)
            if errors:
                logger.warning(
                    "generate_content_config_handler.apply.validation_failed",
                    errors=errors,
                    config_preview=value[:100] if value else "",
                )
                # Keep original - don't modify agent
                return original

            agent.generate_content_config = new_config
            logger.debug(
                "generate_content_config_handler.apply",
                original_temp=getattr(original, "temperature", None)
                if original
                else None,
                new_temp=getattr(new_config, "temperature", None),
            )
        except ConfigValidationError as e:
            logger.warning(
                "generate_content_config_handler.apply.failed",
                error=str(e),
                config_preview=value[:100] if value else "",
            )
            # Keep original - don't modify agent

        return original

    def restore(self, agent: "LlmAgent", original: Any) -> None:
        """Restore original generate_content_config to agent.

        Args:
            agent: The LlmAgent instance to restore.
            original: The original GenerateContentConfig (or None).

        Examples:
            ```python
            handler.restore(agent, original_config)
            # agent.generate_content_config is back to original
            ```
        """
        agent.generate_content_config = original
        logger.debug(
            "generate_content_config_handler.restore",
            has_config=original is not None,
        )

serialize

serialize(agent: 'LlmAgent') -> str

Extract generate_content_config from agent as YAML.

PARAMETER DESCRIPTION
agent

The LlmAgent instance.

TYPE: 'LlmAgent'

RETURNS DESCRIPTION
str

YAML string with parameter descriptions as comments.

str

Returns empty string if generate_content_config is None.

Examples:

yaml_str = handler.serialize(agent)
# yaml_str contains YAML with temperature, top_p, etc.
Source code in src/gepa_adk/adapters/component_handlers.py
def serialize(self, agent: "LlmAgent") -> str:
    """Extract generate_content_config from agent as YAML.

    Args:
        agent: The LlmAgent instance.

    Returns:
        YAML string with parameter descriptions as comments.
        Returns empty string if generate_content_config is None.

    Examples:
        ```python
        yaml_str = handler.serialize(agent)
        # yaml_str contains YAML with temperature, top_p, etc.
        ```
    """
    config = getattr(agent, "generate_content_config", None)
    return serialize_generate_config(config)

apply

apply(agent: 'LlmAgent', value: str) -> Any

Apply new generate_content_config to agent, return original.

PARAMETER DESCRIPTION
agent

The LlmAgent instance to modify.

TYPE: 'LlmAgent'

value

YAML string defining the new config parameters.

TYPE: str

RETURNS DESCRIPTION
Any

The original GenerateContentConfig (or None).

Examples:

original = handler.apply(agent, "temperature: 0.5")
# agent.generate_content_config.temperature is now 0.5
Note

On deserialization or validation failure, logs warning and keeps original config. Never raises exceptions.

Source code in src/gepa_adk/adapters/component_handlers.py
def apply(self, agent: "LlmAgent", value: str) -> Any:
    """Apply new generate_content_config to agent, return original.

    Args:
        agent: The LlmAgent instance to modify.
        value: YAML string defining the new config parameters.

    Returns:
        The original GenerateContentConfig (or None).

    Examples:
        ```python
        original = handler.apply(agent, "temperature: 0.5")
        # agent.generate_content_config.temperature is now 0.5
        ```

    Note:
        On deserialization or validation failure, logs warning and
        keeps original config. Never raises exceptions.
    """
    original = getattr(agent, "generate_content_config", None)

    try:
        new_config = deserialize_generate_config(value, original)

        # Validate the parsed config
        config_dict = {}
        for param in [
            "temperature",
            "top_p",
            "top_k",
            "max_output_tokens",
            "presence_penalty",
            "frequency_penalty",
        ]:
            param_value = getattr(new_config, param, None)
            if param_value is not None:
                config_dict[param] = param_value

        errors = validate_generate_config(config_dict)
        if errors:
            logger.warning(
                "generate_content_config_handler.apply.validation_failed",
                errors=errors,
                config_preview=value[:100] if value else "",
            )
            # Keep original - don't modify agent
            return original

        agent.generate_content_config = new_config
        logger.debug(
            "generate_content_config_handler.apply",
            original_temp=getattr(original, "temperature", None)
            if original
            else None,
            new_temp=getattr(new_config, "temperature", None),
        )
    except ConfigValidationError as e:
        logger.warning(
            "generate_content_config_handler.apply.failed",
            error=str(e),
            config_preview=value[:100] if value else "",
        )
        # Keep original - don't modify agent

    return original

restore

restore(agent: 'LlmAgent', original: Any) -> None

Restore original generate_content_config to agent.

PARAMETER DESCRIPTION
agent

The LlmAgent instance to restore.

TYPE: 'LlmAgent'

original

The original GenerateContentConfig (or None).

TYPE: Any

Examples:

handler.restore(agent, original_config)
# agent.generate_content_config is back to original
Source code in src/gepa_adk/adapters/component_handlers.py
def restore(self, agent: "LlmAgent", original: Any) -> None:
    """Restore original generate_content_config to agent.

    Args:
        agent: The LlmAgent instance to restore.
        original: The original GenerateContentConfig (or None).

    Examples:
        ```python
        handler.restore(agent, original_config)
        # agent.generate_content_config is back to original
        ```
    """
    agent.generate_content_config = original
    logger.debug(
        "generate_content_config_handler.restore",
        has_config=original is not None,
    )

InstructionHandler

Handler for agent.instruction component.

Manages serialization, application, and restoration of the agent's instruction (system prompt) during evolution.

Examples:

handler = InstructionHandler()
original = handler.serialize(agent)  # "Be helpful"
handler.apply(agent, "Be concise")
# ... evaluate ...
handler.restore(agent, original)  # Back to "Be helpful"
Note

All state is stored in the agent object - handler is stateless. No instance attributes are maintained.

Source code in src/gepa_adk/adapters/component_handlers.py
class InstructionHandler:
    """Handler for agent.instruction component.

    Manages serialization, application, and restoration of the
    agent's instruction (system prompt) during evolution.

    Examples:
        ```python
        handler = InstructionHandler()
        original = handler.serialize(agent)  # "Be helpful"
        handler.apply(agent, "Be concise")
        # ... evaluate ...
        handler.restore(agent, original)  # Back to "Be helpful"
        ```

    Note:
        All state is stored in the agent object - handler is stateless.
        No instance attributes are maintained.
    """

    def serialize(self, agent: "LlmAgent") -> str:
        """Extract instruction from agent as string.

        Args:
            agent: The LlmAgent instance.

        Returns:
            The agent's instruction as string.
            Returns empty string if instruction is None.

        Examples:
            ```python
            text = handler.serialize(agent)
            # text == "You are a helpful assistant."
            ```
        """
        instruction = agent.instruction
        if instruction is None:
            return ""
        return str(instruction)

    def apply(self, agent: "LlmAgent", value: str) -> str:
        """Apply new instruction to agent, return original.

        Args:
            agent: The LlmAgent instance to modify.
            value: The new instruction string.

        Returns:
            The original instruction value.

        Examples:
            ```python
            original = handler.apply(agent, "New instruction")
            # agent.instruction is now "New instruction"
            ```
        """
        original = self.serialize(agent)
        agent.instruction = value
        logger.debug(
            "instruction_handler.apply",
            original_preview=original[:50] if original else "",
            new_preview=value[:50] if value else "",
        )
        return original

    def restore(self, agent: "LlmAgent", original: str) -> None:
        """Restore original instruction to agent.

        Args:
            agent: The LlmAgent instance to restore.
            original: The original instruction value.

        Examples:
            ```python
            handler.restore(agent, original)
            # agent.instruction is back to original
            ```
        """
        agent.instruction = original
        logger.debug(
            "instruction_handler.restore",
            instruction_preview=original[:50] if original else "",
        )

serialize

serialize(agent: 'LlmAgent') -> str

Extract instruction from agent as string.

PARAMETER DESCRIPTION
agent

The LlmAgent instance.

TYPE: 'LlmAgent'

RETURNS DESCRIPTION
str

The agent's instruction as string.

str

Returns empty string if instruction is None.

Examples:

text = handler.serialize(agent)
# text == "You are a helpful assistant."
Source code in src/gepa_adk/adapters/component_handlers.py
def serialize(self, agent: "LlmAgent") -> str:
    """Extract instruction from agent as string.

    Args:
        agent: The LlmAgent instance.

    Returns:
        The agent's instruction as string.
        Returns empty string if instruction is None.

    Examples:
        ```python
        text = handler.serialize(agent)
        # text == "You are a helpful assistant."
        ```
    """
    instruction = agent.instruction
    if instruction is None:
        return ""
    return str(instruction)

apply

apply(agent: 'LlmAgent', value: str) -> str

Apply new instruction to agent, return original.

PARAMETER DESCRIPTION
agent

The LlmAgent instance to modify.

TYPE: 'LlmAgent'

value

The new instruction string.

TYPE: str

RETURNS DESCRIPTION
str

The original instruction value.

Examples:

original = handler.apply(agent, "New instruction")
# agent.instruction is now "New instruction"
Source code in src/gepa_adk/adapters/component_handlers.py
def apply(self, agent: "LlmAgent", value: str) -> str:
    """Apply new instruction to agent, return original.

    Args:
        agent: The LlmAgent instance to modify.
        value: The new instruction string.

    Returns:
        The original instruction value.

    Examples:
        ```python
        original = handler.apply(agent, "New instruction")
        # agent.instruction is now "New instruction"
        ```
    """
    original = self.serialize(agent)
    agent.instruction = value
    logger.debug(
        "instruction_handler.apply",
        original_preview=original[:50] if original else "",
        new_preview=value[:50] if value else "",
    )
    return original

restore

restore(agent: 'LlmAgent', original: str) -> None

Restore original instruction to agent.

PARAMETER DESCRIPTION
agent

The LlmAgent instance to restore.

TYPE: 'LlmAgent'

original

The original instruction value.

TYPE: str

Examples:

handler.restore(agent, original)
# agent.instruction is back to original
Source code in src/gepa_adk/adapters/component_handlers.py
def restore(self, agent: "LlmAgent", original: str) -> None:
    """Restore original instruction to agent.

    Args:
        agent: The LlmAgent instance to restore.
        original: The original instruction value.

    Examples:
        ```python
        handler.restore(agent, original)
        # agent.instruction is back to original
        ```
    """
    agent.instruction = original
    logger.debug(
        "instruction_handler.restore",
        instruction_preview=original[:50] if original else "",
    )

OutputSchemaHandler

Handler for agent.output_schema component.

Manages serialization, application, and restoration of the agent's output schema (Pydantic model) during evolution.

ATTRIBUTE DESCRIPTION
_constraints

Optional SchemaConstraints for field preservation.

TYPE: SchemaConstraints | None

Examples:

handler = OutputSchemaHandler()
handler.set_constraints(SchemaConstraints(required_fields=("score",)))
original_schema = handler.apply(agent, new_schema_text)
# ... evaluate ...
handler.restore(agent, original_schema)
Note

Uses the serialize_pydantic_schema and deserialize_schema utilities. On invalid schema text, keeps original and logs warning. When constraints are set, validates proposed schemas before applying.

Source code in src/gepa_adk/adapters/component_handlers.py
class OutputSchemaHandler:
    """Handler for agent.output_schema component.

    Manages serialization, application, and restoration of the
    agent's output schema (Pydantic model) during evolution.

    Attributes:
        _constraints: Optional SchemaConstraints for field preservation.

    Examples:
        ```python
        handler = OutputSchemaHandler()
        handler.set_constraints(SchemaConstraints(required_fields=("score",)))
        original_schema = handler.apply(agent, new_schema_text)
        # ... evaluate ...
        handler.restore(agent, original_schema)
        ```

    Note:
        Uses the serialize_pydantic_schema and deserialize_schema utilities.
        On invalid schema text, keeps original and logs warning.
        When constraints are set, validates proposed schemas before applying.
    """

    def __init__(self) -> None:
        """Initialize handler with no constraints.

        Examples:
            ```python
            handler = OutputSchemaHandler()
            assert handler._constraints is None
            ```
        """
        self._constraints: SchemaConstraints | None = None

    def set_constraints(self, constraints: SchemaConstraints | None) -> None:
        """Set schema constraints for field preservation.

        Args:
            constraints: SchemaConstraints specifying required fields and
                type preservation rules. Pass None to clear constraints.

        Examples:
            ```python
            handler.set_constraints(SchemaConstraints(required_fields=("score",)))
            handler.set_constraints(None)  # Clear constraints
            ```

        Note:
            Once set, constraints are checked during apply() - proposed schemas
            that violate constraints will be rejected and the original kept.
        """
        self._constraints = constraints
        logger.debug(
            "output_schema_handler.set_constraints",
            has_constraints=constraints is not None,
            required_fields=constraints.required_fields if constraints else None,
        )

    def serialize(self, agent: "LlmAgent") -> str:
        """Extract output schema from agent as Python source.

        Args:
            agent: The LlmAgent instance.

        Returns:
            Python source code defining the schema class.
            Returns empty string if output_schema is None.

        Examples:
            ```python
            schema_text = handler.serialize(agent)
            # schema_text contains Python class definition
            ```
        """
        output_schema = getattr(agent, "output_schema", None)
        if output_schema is None:
            return ""
        try:
            return serialize_pydantic_schema(output_schema)
        except (TypeError, OSError) as e:
            logger.warning(
                "output_schema_handler.serialize.failed",
                error=str(e),
                schema_type=type(output_schema).__name__,
            )
            return ""

    def apply(self, agent: "LlmAgent", value: str) -> Any:
        """Apply new output schema to agent, return original.

        Args:
            agent: The LlmAgent instance to modify.
            value: Python source code defining the new schema.

        Returns:
            The original output_schema (class or None).

        Examples:
            ```python
            original = handler.apply(
                agent,
                '''
            class NewSchema(BaseModel):
                result: str
            ''',
            )
            ```

        Note:
            On deserialization failure, logs warning and keeps original.
            On constraint violation, keeps original.
            Never raises exceptions - graceful degradation.
        """
        original = getattr(agent, "output_schema", None)

        try:
            new_schema = deserialize_schema(value)

            # Validate against constraints if set
            if self._constraints is not None:
                is_valid, violations = validate_schema_against_constraints(
                    new_schema, original, self._constraints
                )
                if not is_valid:
                    logger.warning(
                        "output_schema_handler.apply.constraint_violation",
                        violations=violations,
                        schema_preview=value[:100] if value else "",
                    )
                    # Keep original - constraint violation
                    return original

            agent.output_schema = new_schema
            logger.debug(
                "output_schema_handler.apply",
                original_name=original.__name__ if original else None,
                new_name=new_schema.__name__,
            )
        except SchemaValidationError as e:
            logger.warning(
                "output_schema_handler.apply.failed",
                error=str(e),
                schema_preview=value[:100] if value else "",
            )
            # Keep original - don't modify agent

        return original

    def restore(self, agent: "LlmAgent", original: Any) -> None:
        """Restore original output schema to agent.

        Args:
            agent: The LlmAgent instance to restore.
            original: The original output_schema (class or None).

        Examples:
            ```python
            handler.restore(agent, original_schema)
            # agent.output_schema is back to original
            ```
        """
        agent.output_schema = original
        logger.debug(
            "output_schema_handler.restore",
            schema_name=original.__name__ if original else None,
        )

__init__

__init__() -> None

Initialize handler with no constraints.

Examples:

handler = OutputSchemaHandler()
assert handler._constraints is None
Source code in src/gepa_adk/adapters/component_handlers.py
def __init__(self) -> None:
    """Initialize handler with no constraints.

    Examples:
        ```python
        handler = OutputSchemaHandler()
        assert handler._constraints is None
        ```
    """
    self._constraints: SchemaConstraints | None = None

set_constraints

set_constraints(
    constraints: SchemaConstraints | None,
) -> None

Set schema constraints for field preservation.

PARAMETER DESCRIPTION
constraints

SchemaConstraints specifying required fields and type preservation rules. Pass None to clear constraints.

TYPE: SchemaConstraints | None

Examples:

handler.set_constraints(SchemaConstraints(required_fields=("score",)))
handler.set_constraints(None)  # Clear constraints
Note

Once set, constraints are checked during apply() - proposed schemas that violate constraints will be rejected and the original kept.

Source code in src/gepa_adk/adapters/component_handlers.py
def set_constraints(self, constraints: SchemaConstraints | None) -> None:
    """Set schema constraints for field preservation.

    Args:
        constraints: SchemaConstraints specifying required fields and
            type preservation rules. Pass None to clear constraints.

    Examples:
        ```python
        handler.set_constraints(SchemaConstraints(required_fields=("score",)))
        handler.set_constraints(None)  # Clear constraints
        ```

    Note:
        Once set, constraints are checked during apply() - proposed schemas
        that violate constraints will be rejected and the original kept.
    """
    self._constraints = constraints
    logger.debug(
        "output_schema_handler.set_constraints",
        has_constraints=constraints is not None,
        required_fields=constraints.required_fields if constraints else None,
    )

serialize

serialize(agent: 'LlmAgent') -> str

Extract output schema from agent as Python source.

PARAMETER DESCRIPTION
agent

The LlmAgent instance.

TYPE: 'LlmAgent'

RETURNS DESCRIPTION
str

Python source code defining the schema class.

str

Returns empty string if output_schema is None.

Examples:

schema_text = handler.serialize(agent)
# schema_text contains Python class definition
Source code in src/gepa_adk/adapters/component_handlers.py
def serialize(self, agent: "LlmAgent") -> str:
    """Extract output schema from agent as Python source.

    Args:
        agent: The LlmAgent instance.

    Returns:
        Python source code defining the schema class.
        Returns empty string if output_schema is None.

    Examples:
        ```python
        schema_text = handler.serialize(agent)
        # schema_text contains Python class definition
        ```
    """
    output_schema = getattr(agent, "output_schema", None)
    if output_schema is None:
        return ""
    try:
        return serialize_pydantic_schema(output_schema)
    except (TypeError, OSError) as e:
        logger.warning(
            "output_schema_handler.serialize.failed",
            error=str(e),
            schema_type=type(output_schema).__name__,
        )
        return ""

apply

apply(agent: 'LlmAgent', value: str) -> Any

Apply new output schema to agent, return original.

PARAMETER DESCRIPTION
agent

The LlmAgent instance to modify.

TYPE: 'LlmAgent'

value

Python source code defining the new schema.

TYPE: str

RETURNS DESCRIPTION
Any

The original output_schema (class or None).

Examples:

original = handler.apply(
    agent,
    '''
class NewSchema(BaseModel):
    result: str
''',
)
Note

On deserialization failure, logs warning and keeps original. On constraint violation, keeps original. Never raises exceptions - graceful degradation.

Source code in src/gepa_adk/adapters/component_handlers.py
def apply(self, agent: "LlmAgent", value: str) -> Any:
    """Apply new output schema to agent, return original.

    Args:
        agent: The LlmAgent instance to modify.
        value: Python source code defining the new schema.

    Returns:
        The original output_schema (class or None).

    Examples:
        ```python
        original = handler.apply(
            agent,
            '''
        class NewSchema(BaseModel):
            result: str
        ''',
        )
        ```

    Note:
        On deserialization failure, logs warning and keeps original.
        On constraint violation, keeps original.
        Never raises exceptions - graceful degradation.
    """
    original = getattr(agent, "output_schema", None)

    try:
        new_schema = deserialize_schema(value)

        # Validate against constraints if set
        if self._constraints is not None:
            is_valid, violations = validate_schema_against_constraints(
                new_schema, original, self._constraints
            )
            if not is_valid:
                logger.warning(
                    "output_schema_handler.apply.constraint_violation",
                    violations=violations,
                    schema_preview=value[:100] if value else "",
                )
                # Keep original - constraint violation
                return original

        agent.output_schema = new_schema
        logger.debug(
            "output_schema_handler.apply",
            original_name=original.__name__ if original else None,
            new_name=new_schema.__name__,
        )
    except SchemaValidationError as e:
        logger.warning(
            "output_schema_handler.apply.failed",
            error=str(e),
            schema_preview=value[:100] if value else "",
        )
        # Keep original - don't modify agent

    return original

restore

restore(agent: 'LlmAgent', original: Any) -> None

Restore original output schema to agent.

PARAMETER DESCRIPTION
agent

The LlmAgent instance to restore.

TYPE: 'LlmAgent'

original

The original output_schema (class or None).

TYPE: Any

Examples:

handler.restore(agent, original_schema)
# agent.output_schema is back to original
Source code in src/gepa_adk/adapters/component_handlers.py
def restore(self, agent: "LlmAgent", original: Any) -> None:
    """Restore original output schema to agent.

    Args:
        agent: The LlmAgent instance to restore.
        original: The original output_schema (class or None).

    Examples:
        ```python
        handler.restore(agent, original_schema)
        # agent.output_schema is back to original
        ```
    """
    agent.output_schema = original
    logger.debug(
        "output_schema_handler.restore",
        schema_name=original.__name__ if original else None,
    )

AllComponentSelector

Selects all available components for simultaneous update.

This selector returns the full list of components every time, enabling simultaneous evolution of all parts of the candidate.

Examples:

selector = AllComponentSelector()
all_comps = await selector.select_components(["a", "b"], 1, 0)
# Returns ["a", "b"]
Note

Always returns all components, enabling comprehensive mutations across the entire candidate in a single iteration.

Source code in src/gepa_adk/adapters/component_selector.py
class AllComponentSelector:
    """Selects all available components for simultaneous update.

    This selector returns the full list of components every time, enabling
    simultaneous evolution of all parts of the candidate.

    Examples:
        ```python
        selector = AllComponentSelector()
        all_comps = await selector.select_components(["a", "b"], 1, 0)
        # Returns ["a", "b"]
        ```

    Note:
        Always returns all components, enabling comprehensive mutations
        across the entire candidate in a single iteration.
    """

    async def select_components(
        self, components: list[str], iteration: int, candidate_idx: int
    ) -> list[str]:
        """Select all components to update.

        Args:
            components: List of available component keys.
            iteration: Current global iteration number (unused).
            candidate_idx: Index of the candidate being evolved (unused).

        Returns:
            List containing all component keys.

        Raises:
            ValueError: If components list is empty.

        Examples:
            ```python
            selected = await selector.select_components(["a", "b"], 1, 0)
            ```

        Note:
            Outputs the complete component list unchanged, enabling
            simultaneous evolution of all candidate parts.
        """
        if not components:
            raise ValueError("No components provided for selection")

        return list(components)

select_components async

select_components(
    components: list[str],
    iteration: int,
    candidate_idx: int,
) -> list[str]

Select all components to update.

PARAMETER DESCRIPTION
components

List of available component keys.

TYPE: list[str]

iteration

Current global iteration number (unused).

TYPE: int

candidate_idx

Index of the candidate being evolved (unused).

TYPE: int

RETURNS DESCRIPTION
list[str]

List containing all component keys.

RAISES DESCRIPTION
ValueError

If components list is empty.

Examples:

selected = await selector.select_components(["a", "b"], 1, 0)
Note

Outputs the complete component list unchanged, enabling simultaneous evolution of all candidate parts.

Source code in src/gepa_adk/adapters/component_selector.py
async def select_components(
    self, components: list[str], iteration: int, candidate_idx: int
) -> list[str]:
    """Select all components to update.

    Args:
        components: List of available component keys.
        iteration: Current global iteration number (unused).
        candidate_idx: Index of the candidate being evolved (unused).

    Returns:
        List containing all component keys.

    Raises:
        ValueError: If components list is empty.

    Examples:
        ```python
        selected = await selector.select_components(["a", "b"], 1, 0)
        ```

    Note:
        Outputs the complete component list unchanged, enabling
        simultaneous evolution of all candidate parts.
    """
    if not components:
        raise ValueError("No components provided for selection")

    return list(components)

RoundRobinComponentSelector

Selects components in a round-robin fashion.

This selector cycles through the list of components one by one, maintaining state per candidate index to ensure consistent rotation.

ATTRIBUTE DESCRIPTION
_next_index

Mapping of candidate_idx to next component index.

TYPE: dict[int, int]

Examples:

selector = RoundRobinComponentSelector()
# First call selects first component
c1 = await selector.select_components(["a", "b"], 1, 0)  # ["a"]
# Second call selects second component
c2 = await selector.select_components(["a", "b"], 2, 0)  # ["b"]
Note

Alternates through components sequentially, ensuring balanced evolution across all candidate parts.
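
A brief sketch of the per-candidate rotation state (the component names are arbitrary): each candidate_idx keeps its own position, so interleaved calls for different candidates do not disturb each other's rotation.

selector = RoundRobinComponentSelector()
c = await selector.select_components(["a", "b"], 1, 0)  # ["a"] for candidate 0
c = await selector.select_components(["a", "b"], 1, 1)  # ["a"] for candidate 1 (fresh rotation)
c = await selector.select_components(["a", "b"], 2, 0)  # ["b"] for candidate 0 (rotation advanced)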

Source code in src/gepa_adk/adapters/component_selector.py
class RoundRobinComponentSelector:
    """Selects components in a round-robin fashion.

    This selector cycles through the list of components one by one, maintaining
    state per candidate index to ensure consistent rotation.

    Attributes:
        _next_index (dict[int, int]): Mapping of candidate_idx to next component index.

    Examples:
        ```python
        selector = RoundRobinComponentSelector()
        # First call selects first component
        c1 = await selector.select_components(["a", "b"], 1, 0)  # ["a"]
        # Second call selects second component
        c2 = await selector.select_components(["a", "b"], 2, 0)  # ["b"]
        ```

    Note:
        Alternates through components sequentially, ensuring balanced evolution
        across all candidate parts.
    """

    def __init__(self) -> None:
        """Initialize the round-robin selector.

        Note:
            Creates empty index tracking dictionary for per-candidate rotation state.
        """
        self._next_index: dict[int, int] = defaultdict(int)

    async def select_components(
        self, components: list[str], iteration: int, candidate_idx: int
    ) -> list[str]:
        """Select a single component to update using round-robin logic.

        Args:
            components: List of available component keys.
            iteration: Current global iteration number (unused by this strategy).
            candidate_idx: Index of the candidate being evolved.

        Returns:
            List containing the single selected component key.

        Raises:
            ValueError: If components list is empty.

        Examples:
            ```python
            selected = await selector.select_components(["a", "b"], 1, 0)
            ```

        Note:
            Outputs one component per call, advancing the rotation index for
            the specified candidate.
        """
        if not components:
            raise ValueError("No components provided for selection")

        # Get current index for this candidate
        current_idx = self._next_index[candidate_idx]

        # Select component
        selected = components[current_idx % len(components)]

        # Advance index for next time
        self._next_index[candidate_idx] = (current_idx + 1) % len(components)

        return [selected]

__init__

__init__() -> None

Initialize the round-robin selector.

Note

Creates empty index tracking dictionary for per-candidate rotation state.

Source code in src/gepa_adk/adapters/component_selector.py
def __init__(self) -> None:
    """Initialize the round-robin selector.

    Note:
        Creates empty index tracking dictionary for per-candidate rotation state.
    """
    self._next_index: dict[int, int] = defaultdict(int)

select_components async

select_components(
    components: list[str],
    iteration: int,
    candidate_idx: int,
) -> list[str]

Select a single component to update using round-robin logic.

PARAMETER DESCRIPTION
components

List of available component keys.

TYPE: list[str]

iteration

Current global iteration number (unused by this strategy).

TYPE: int

candidate_idx

Index of the candidate being evolved.

TYPE: int

RETURNS DESCRIPTION
list[str]

List containing the single selected component key.

RAISES DESCRIPTION
ValueError

If components list is empty.

Examples:

selected = await selector.select_components(["a", "b"], 1, 0)
Note

Outputs one component per call, advancing the rotation index for the specified candidate.

Source code in src/gepa_adk/adapters/component_selector.py
async def select_components(
    self, components: list[str], iteration: int, candidate_idx: int
) -> list[str]:
    """Select a single component to update using round-robin logic.

    Args:
        components: List of available component keys.
        iteration: Current global iteration number (unused by this strategy).
        candidate_idx: Index of the candidate being evolved.

    Returns:
        List containing the single selected component key.

    Raises:
        ValueError: If components list is empty.

    Examples:
        ```python
        selected = await selector.select_components(["a", "b"], 1, 0)
        ```

    Note:
        Outputs one component per call, advancing the rotation index for
        the specified candidate.
    """
    if not components:
        raise ValueError("No components provided for selection")

    # Get current index for this candidate
    current_idx = self._next_index[candidate_idx]

    # Select component
    selected = components[current_idx % len(components)]

    # Advance index for next time
    self._next_index[candidate_idx] = (current_idx + 1) % len(components)

    return [selected]

CriticOutput

Bases: BaseModel



Advanced schema for structured critic feedback with dimensions.

This schema defines the expected JSON structure that critic agents should return when configured with output_schema. The score field is required, while other fields are optional and will be preserved in metadata.

ATTRIBUTE DESCRIPTION
score

Score value between 0.0 and 1.0 (required).

TYPE: float

feedback

Human-readable feedback text (optional).

TYPE: str

dimension_scores

Per-dimension evaluation scores (optional).

TYPE: dict[str, float]

actionable_guidance

Specific improvement suggestions (optional).

TYPE: str

Examples:

Advanced critic output:

{
    "score": 0.75,
    "feedback": "Good response but could be more concise",
    "dimension_scores": {
        "accuracy": 0.9,
        "clarity": 0.6,
        "completeness": 0.8
    },
    "actionable_guidance": "Reduce response length by 30%"
}
Note

All critic agents using this schema must return structured JSON. When this schema is used as output_schema on an LlmAgent, the agent can ONLY reply and CANNOT use any tools. This is acceptable for critic agents focused on scoring.
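
As a hedged sketch of wiring this schema into a critic agent (the agent name and instruction text below are illustrative, not part of the library):

from google.adk.agents import LlmAgent
from gepa_adk.adapters.critic_scorer import CriticOutput

critic = LlmAgent(
    name="dimension_critic",
    model="gemini-2.5-flash",
    instruction=(
        "Evaluate the agent output and reply with JSON containing "
        "score, feedback, dimension_scores, and actionable_guidance."
    ),
    output_schema=CriticOutput,
)
# With output_schema set, this agent can only reply with structured JSON
# and cannot call tools.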

See Also

SimpleCriticOutput: KISS schema with just score + feedback.

Source code in src/gepa_adk/adapters/critic_scorer.py
class CriticOutput(BaseModel):
    """Advanced schema for structured critic feedback with dimensions.

    This schema defines the expected JSON structure that critic agents
    should return when configured with output_schema. The score field is
    required, while other fields are optional and will be preserved in
    metadata.

    Attributes:
        score: Score value between 0.0 and 1.0 (required).
        feedback: Human-readable feedback text (optional).
        dimension_scores: Per-dimension evaluation scores (optional).
        actionable_guidance: Specific improvement suggestions (optional).

    Examples:
        Advanced critic output:

        ```json
        {
            "score": 0.75,
            "feedback": "Good response but could be more concise",
            "dimension_scores": {
                "accuracy": 0.9,
                "clarity": 0.6,
                "completeness": 0.8
            },
            "actionable_guidance": "Reduce response length by 30%"
        }
        ```

    Note:
        All critic agents using this schema must return structured JSON.
        When this schema is used as output_schema on an LlmAgent, the
        agent can ONLY reply and CANNOT use any tools. This is acceptable
        for critic agents focused on scoring.

    See Also:
        SimpleCriticOutput: KISS schema with just score + feedback.
    """

    score: float = Field(
        ...,
        ge=0.0,
        le=1.0,
        description="Score from 0.0 to 1.0",
    )
    feedback: str = Field(
        default="",
        description="Human-readable feedback",
    )
    dimension_scores: dict[str, float] = Field(
        default_factory=dict,
        description="Per-dimension scores",
    )
    actionable_guidance: str = Field(
        default="",
        description="Improvement suggestions",
    )

CriticScorer

Adapter that wraps ADK critic agents to provide structured scoring.

CriticScorer implements the Scorer protocol, enabling integration with gepa-adk's evaluation and evolution workflows. It executes ADK critic agents (LlmAgent, SequentialAgent, etc.) and extracts structured scores with metadata from their outputs.

ATTRIBUTE DESCRIPTION
critic_agent

ADK agent configured for evaluation.

TYPE: BaseAgent

_session_service

Session service for state management.

TYPE: BaseSessionService

_app_name

Application name for session identification.

TYPE: str

_logger

Bound logger with scorer context.

TYPE: BoundLogger

Examples:

Basic usage:

from google.adk.agents import LlmAgent
from gepa_adk.adapters.critic_scorer import CriticScorer, CriticOutput
from gepa_adk.adapters.agent_executor import AgentExecutor

critic = LlmAgent(
    name="quality_critic",
    model="gemini-2.5-flash",
    instruction="Evaluate response quality...",
    output_schema=CriticOutput,
)

executor = AgentExecutor()
scorer = CriticScorer(critic_agent=critic, executor=executor)
score, metadata = await scorer.async_score(
    input_text="What is Python?",
    output="Python is a programming language.",
)
Note

Adapter wraps ADK critic agents to provide structured scoring. Implements Scorer protocol for compatibility with evolution engine. Creates isolated sessions per scoring call unless session_id provided.

Source code in src/gepa_adk/adapters/critic_scorer.py
class CriticScorer:
    """Adapter that wraps ADK critic agents to provide structured scoring.

    CriticScorer implements the Scorer protocol, enabling integration with
    gepa-adk's evaluation and evolution workflows. It executes ADK critic
    agents (LlmAgent, SequentialAgent, etc.) and extracts structured scores
    with metadata from their outputs.

    Attributes:
        critic_agent (BaseAgent): ADK agent configured for evaluation.
        _session_service (BaseSessionService): Session service for state
            management.
        _app_name (str): Application name for session identification.
        _logger (structlog.BoundLogger): Bound logger with scorer context.

    Examples:
        Basic usage:

        ```python
        from google.adk.agents import LlmAgent
        from gepa_adk.adapters.critic_scorer import CriticScorer, CriticOutput
        from gepa_adk.adapters.agent_executor import AgentExecutor

        critic = LlmAgent(
            name="quality_critic",
            model="gemini-2.5-flash",
            instruction="Evaluate response quality...",
            output_schema=CriticOutput,
        )

        executor = AgentExecutor()
        scorer = CriticScorer(critic_agent=critic, executor=executor)
        score, metadata = await scorer.async_score(
            input_text="What is Python?",
            output="Python is a programming language.",
        )
        ```

    Note:
        Adapter wraps ADK critic agents to provide structured scoring.
        Implements Scorer protocol for compatibility with evolution engine.
        Creates isolated sessions per scoring call unless session_id provided.
    """

    def __init__(
        self,
        critic_agent: BaseAgent,
        executor: AgentExecutorProtocol,
        session_service: BaseSessionService | None = None,
        app_name: str = "critic_scorer",
    ) -> None:
        """Initialize CriticScorer with critic agent.

        Args:
            critic_agent: ADK agent (LlmAgent or workflow agent) configured
                for evaluation.
            executor: AgentExecutorProtocol implementation for unified agent
                execution. Handles session management and execution, enabling
                feature parity across all agent types.
            session_service: Optional session service for state management.
                If None, creates an InMemorySessionService.
            app_name: Application name for session identification.

        Raises:
            TypeError: If critic_agent is not a BaseAgent instance.
            ValueError: If app_name is empty string.

        Examples:
            Basic setup with executor:

            ```python
            from gepa_adk.adapters.agent_executor import AgentExecutor

            executor = AgentExecutor()
            scorer = CriticScorer(critic_agent=critic, executor=executor)
            ```

            With shared session service:

            ```python
            from google.adk.sessions import InMemorySessionService
            from gepa_adk.adapters.agent_executor import AgentExecutor

            session_service = InMemorySessionService()
            executor = AgentExecutor(session_service=session_service)
            scorer = CriticScorer(
                critic_agent=critic,
                executor=executor,
                session_service=session_service,
            )
            ```

        Note:
            Creates logger with scorer context and validates agent type.
        """
        if not isinstance(critic_agent, BaseAgent):
            raise TypeError(f"critic_agent must be BaseAgent, got {type(critic_agent)}")

        if not app_name or not app_name.strip():
            raise ValueError("app_name cannot be empty")

        self.critic_agent = critic_agent
        self._session_service = session_service or InMemorySessionService()
        self._app_name = app_name.strip()
        self._executor = executor

        # Bind logger with scorer context
        self._logger = logger.bind(
            scorer="CriticScorer",
            agent_name=self.critic_agent.name,
            app_name=self._app_name,
            uses_executor=True,  # Always true since executor is required
        )

        self._logger.info("scorer.initialized")

    def _format_critic_input(
        self,
        input_text: str,
        output: str,
        expected: str | None = None,
    ) -> str:
        """Format input for critic agent evaluation.

        Builds a prompt that presents the input query, agent output, and
        optionally the expected output for the critic to evaluate.

        Args:
            input_text: The original input provided to the agent being evaluated.
            output: The agent's generated output to score.
            expected: Optional expected/reference output for comparison.

        Returns:
            Formatted prompt string for the critic agent.

        Examples:
            Basic formatting:

            ```python
            prompt = scorer._format_critic_input(
                input_text="What is 2+2?",
                output="4",
                expected="4",
            )
            ```

        Note:
            Organizes input for critic evaluation with clearly labeled sections.
            Format is designed to give critic context for evaluation.
            Expected output is included only if provided.
        """
        parts = [
            "Input Query:",
            input_text,
            "",
            "Agent Output:",
            output,
        ]

        if expected is not None:
            parts.extend(
                [
                    "",
                    "Expected Output:",
                    expected,
                ]
            )

        parts.append("")
        parts.append(
            "Please evaluate the agent output and provide a score with feedback."
        )

        return "\n".join(parts)

    def _parse_critic_output(self, output_text: str) -> tuple[float, dict[str, Any]]:
        """Parse critic agent output and extract score with metadata.

        Parses the critic's output text as JSON and extracts the score field
        along with optional metadata (feedback, dimension_scores,
        actionable_guidance, and any additional fields).

        Args:
            output_text: Raw text output from critic agent.

        Returns:
            Tuple of (score, metadata) where:
            - score: Float value extracted from output
            - metadata: Dict containing feedback, dimension_scores,
                actionable_guidance, and any additional fields

        Raises:
            CriticOutputParseError: If output cannot be parsed as JSON.
            MissingScoreFieldError: If parsed JSON lacks required score field.

        Examples:
            Parse structured output:

            ```python
            output = '{"score": 0.75, "feedback": "Good", "dimension_scores": {"accuracy": 0.9}}'
            score, metadata = scorer._parse_critic_output(output)
            assert score == 0.75
            assert metadata["feedback"] == "Good"
            ```

        Note:
            Obtains score and metadata from critic JSON output with validation.
            Preserves all fields from parsed JSON in metadata, not just
            the known CriticOutput schema fields. This allows for extensibility.
        """
        # Parse JSON output
        try:
            parsed = json.loads(output_text)
        except json.JSONDecodeError as e:
            raise CriticOutputParseError(
                f"Critic output is not valid JSON: {e}",
                raw_output=output_text,
                parse_error=str(e),
                cause=e,
            ) from e

        # Validate parsed output is a dict
        if not isinstance(parsed, dict):
            raise CriticOutputParseError(
                f"Critic output must be a JSON object, got {type(parsed).__name__}",
                raw_output=output_text,
                parse_error="Not a JSON object",
            )

        # Extract required score field
        if "score" not in parsed:
            raise MissingScoreFieldError(
                "Critic output missing required 'score' field",
                parsed_output=parsed,
            )

        score = parsed["score"]
        if not isinstance(score, (int, float)):
            raise MissingScoreFieldError(
                f"Score field must be numeric, got {type(score).__name__}",
                parsed_output=parsed,
            )

        # Build metadata dict with known fields and any additional fields
        metadata: dict[str, Any] = {}

        # Extract known fields if present
        if "feedback" in parsed:
            metadata["feedback"] = str(parsed["feedback"])
        if "dimension_scores" in parsed:
            # Preserve dimension_scores as-is (may contain non-numeric values)
            metadata["dimension_scores"] = parsed["dimension_scores"]
        if "actionable_guidance" in parsed:
            metadata["actionable_guidance"] = str(parsed["actionable_guidance"])

        # Preserve any additional fields
        known_fields = {"score", "feedback", "dimension_scores", "actionable_guidance"}
        for key, value in parsed.items():
            if key not in known_fields:
                metadata[key] = value

        return float(score), metadata

    def _extract_json_from_text(self, text: str) -> str:
        """Extract JSON from text that may contain markdown code blocks.

        Minimal implementation - tries direct parse and markdown extraction.
        A more robust implementation will be added per GitHub issue #78.

        Args:
            text: Text that may contain JSON.

        Returns:
            Extracted JSON string, or original text if extraction fails.

        Note:
            Operates as a minimal JSON extractor; robust implementation planned
            per GitHub issue #78.
        """
        # Try parsing the entire text as-is
        try:
            json.loads(text.strip())
            return text.strip()
        except json.JSONDecodeError:
            pass

        # Extract from markdown code blocks (```json ... ``` or ``` ... ```)
        json_block_pattern = r"```(?:json)?\s*\n?(.*?)\n?```"
        matches = re.findall(json_block_pattern, text, re.DOTALL | re.IGNORECASE)
        for match in matches:
            try:
                json.loads(match.strip())
                return match.strip()
            except json.JSONDecodeError:
                continue

        # Try to find JSON object embedded in text (minimal regex for { ... })
        # Look for opening brace and try to find matching closing brace
        brace_start = text.find("{")
        if brace_start != -1:
            # Try to find the matching closing brace
            # NOTE: This algorithm doesn't account for braces within string literals
            # (e.g., JSON with template strings like "instruction": "Use {variable}").
            # This is a minimal implementation; a more robust parser will be added
            # per GitHub issue #78.
            depth = 0
            for i in range(brace_start, len(text)):
                if text[i] == "{":
                    depth += 1
                elif text[i] == "}":
                    depth -= 1
                    if depth == 0:
                        candidate = text[brace_start : i + 1]
                        try:
                            json.loads(candidate)
                            return candidate
                        except json.JSONDecodeError:
                            break

        # Return original text (will fail with clear error message)
        return text

    async def async_score(
        self,
        input_text: str,
        output: str,
        expected: str | None = None,
        session_id: str | None = None,
    ) -> tuple[float, dict[str, Any]]:
        """Score an agent output asynchronously using the critic agent.

        Executes the critic agent with formatted input and extracts structured
        score and metadata from the response.

        Args:
            input_text: The original input provided to the agent being evaluated.
            output: The agent's generated output to score.
            expected: Optional expected/reference output for comparison.
            session_id: Optional session ID to share state with main agent
                workflow. If None, creates an isolated session.

        Returns:
            Tuple of (score, metadata) where:
            - score: Float value, conventionally 0.0-1.0
            - metadata: Dict with feedback, dimension_scores,
                actionable_guidance, and any additional fields

        Raises:
            CriticOutputParseError: If critic output is not valid JSON.
            MissingScoreFieldError: If score field missing from output.

        Examples:
            Basic async scoring:

            ```python
            score, metadata = await scorer.async_score(
                input_text="What is Python?",
                output="Python is a programming language.",
            )
            ```

            With session sharing:

            ```python
            score, metadata = await scorer.async_score(
                input_text="...",
                output="...",
                session_id="existing_session_123",
            )
            ```

        Note:
            Orchestrates critic agent execution via AgentExecutor and extracts
            structured output. Creates isolated session unless session_id provided
            for state sharing.
        """
        self._logger.debug(
            "scorer.async_score.start",
            input_preview=input_text[:50] if input_text else "",
            output_preview=output[:50] if output else "",
            has_expected=expected is not None,
            session_id=session_id,
        )

        # Format input for critic
        critic_input = self._format_critic_input(input_text, output, expected)

        # Execute via AgentExecutor
        result = await self._executor.execute_agent(
            agent=self.critic_agent,
            input_text=critic_input,
            existing_session_id=session_id,
        )

        if result.status == ExecutionStatus.FAILED:
            raise ScoringError(
                f"Critic agent execution failed: {result.error_message}",
            )

        final_output = result.extracted_value

        if not final_output:
            raise ScoringError("Critic agent returned empty output")

        # Parse output and extract score
        try:
            score, metadata = self._parse_critic_output(final_output)
        except (CriticOutputParseError, MissingScoreFieldError) as e:
            self._logger.error(
                "scorer.async_score.parse_error",
                error=str(e),
                error_type=type(e).__name__,
            )
            raise

        # Log multi-dimensional scoring context if present
        log_context: dict[str, Any] = {
            "score": score,
            "has_feedback": "feedback" in metadata,
            "has_dimension_scores": "dimension_scores" in metadata,
            "has_actionable_guidance": "actionable_guidance" in metadata,
        }
        if "dimension_scores" in metadata:
            log_context["dimension_count"] = len(metadata["dimension_scores"])

        self._logger.info(
            "scorer.async_score.complete",
            **log_context,
        )

        return score, metadata

    def score(
        self,
        input_text: str,
        output: str,
        expected: str | None = None,
    ) -> tuple[float, dict[str, Any]]:
        """Score an agent output synchronously using the critic agent.

        Synchronous wrapper around async_score() using asyncio.run().

        Args:
            input_text: The original input provided to the agent being evaluated.
            output: The agent's generated output to score.
            expected: Optional expected/reference output for comparison.

        Returns:
            Tuple of (score, metadata) where:
            - score: Float value, conventionally 0.0-1.0
            - metadata: Dict with feedback, dimension_scores,
                actionable_guidance, and any additional fields

        Raises:
            CriticOutputParseError: If critic output is not valid JSON.
            MissingScoreFieldError: If score field missing from output.

        Examples:
            Basic sync scoring:

            ```python
            score, metadata = scorer.score(
                input_text="What is 2+2?",
                output="4",
                expected="4",
            )
            ```

        Note:
            Operates synchronously by wrapping async_score() with asyncio.run().
            Uses asyncio.run() to execute async_score(). Prefer async_score()
            for better performance in async contexts.
        """
        return asyncio.run(self.async_score(input_text, output, expected))

__init__

__init__(
    critic_agent: BaseAgent,
    executor: AgentExecutorProtocol,
    session_service: BaseSessionService | None = None,
    app_name: str = "critic_scorer",
) -> None

Initialize CriticScorer with critic agent.

PARAMETER DESCRIPTION
critic_agent

ADK agent (LlmAgent or workflow agent) configured for evaluation.

TYPE: BaseAgent

executor

AgentExecutorProtocol implementation for unified agent execution. Handles session management and execution, enabling feature parity across all agent types.

TYPE: AgentExecutorProtocol

session_service

Optional session service for state management. If None, creates an InMemorySessionService.

TYPE: BaseSessionService | None DEFAULT: None

app_name

Application name for session identification.

TYPE: str DEFAULT: 'critic_scorer'

RAISES DESCRIPTION
TypeError

If critic_agent is not a BaseAgent instance.

ValueError

If app_name is an empty or whitespace-only string.

Examples:

Basic setup with executor:

from gepa_adk.adapters.agent_executor import AgentExecutor

executor = AgentExecutor()
scorer = CriticScorer(critic_agent=critic, executor=executor)

With shared session service:

from google.adk.sessions import InMemorySessionService
from gepa_adk.adapters.agent_executor import AgentExecutor

session_service = InMemorySessionService()
executor = AgentExecutor(session_service=session_service)
scorer = CriticScorer(
    critic_agent=critic,
    executor=executor,
    session_service=session_service,
)
Note

Creates logger with scorer context and validates agent type.

Source code in src/gepa_adk/adapters/critic_scorer.py
def __init__(
    self,
    critic_agent: BaseAgent,
    executor: AgentExecutorProtocol,
    session_service: BaseSessionService | None = None,
    app_name: str = "critic_scorer",
) -> None:
    """Initialize CriticScorer with critic agent.

    Args:
        critic_agent: ADK agent (LlmAgent or workflow agent) configured
            for evaluation.
        executor: AgentExecutorProtocol implementation for unified agent
            execution. Handles session management and execution, enabling
            feature parity across all agent types.
        session_service: Optional session service for state management.
            If None, creates an InMemorySessionService.
        app_name: Application name for session identification.

    Raises:
        TypeError: If critic_agent is not a BaseAgent instance.
        ValueError: If app_name is empty string.

    Examples:
        Basic setup with executor:

        ```python
        from gepa_adk.adapters.agent_executor import AgentExecutor

        executor = AgentExecutor()
        scorer = CriticScorer(critic_agent=critic, executor=executor)
        ```

        With shared session service:

        ```python
        from google.adk.sessions import InMemorySessionService
        from gepa_adk.adapters.agent_executor import AgentExecutor

        session_service = InMemorySessionService()
        executor = AgentExecutor(session_service=session_service)
        scorer = CriticScorer(
            critic_agent=critic,
            executor=executor,
            session_service=session_service,
        )
        ```

    Note:
        Creates logger with scorer context and validates agent type.
    """
    if not isinstance(critic_agent, BaseAgent):
        raise TypeError(f"critic_agent must be BaseAgent, got {type(critic_agent)}")

    if not app_name or not app_name.strip():
        raise ValueError("app_name cannot be empty")

    self.critic_agent = critic_agent
    self._session_service = session_service or InMemorySessionService()
    self._app_name = app_name.strip()
    self._executor = executor

    # Bind logger with scorer context
    self._logger = logger.bind(
        scorer="CriticScorer",
        agent_name=self.critic_agent.name,
        app_name=self._app_name,
        uses_executor=True,  # Always true since executor is required
    )

    self._logger.info("scorer.initialized")

async_score async

async_score(
    input_text: str,
    output: str,
    expected: str | None = None,
    session_id: str | None = None,
) -> tuple[float, dict[str, Any]]

Score an agent output asynchronously using the critic agent.

Executes the critic agent with formatted input and extracts structured score and metadata from the response.

PARAMETER DESCRIPTION
input_text

The original input provided to the agent being evaluated.

TYPE: str

output

The agent's generated output to score.

TYPE: str

expected

Optional expected/reference output for comparison.

TYPE: str | None DEFAULT: None

session_id

Optional session ID to share state with main agent workflow. If None, creates an isolated session.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
tuple[float, dict[str, Any]]

Tuple of (score, metadata) where:
  • score: Float value, conventionally 0.0-1.0
  • metadata: Dict with feedback, dimension_scores, actionable_guidance, and any additional fields
RAISES DESCRIPTION
CriticOutputParseError

If critic output is not valid JSON.

MissingScoreFieldError

If score field missing from output.

Examples:

Basic async scoring:

score, metadata = await scorer.async_score(
    input_text="What is Python?",
    output="Python is a programming language.",
)

With session sharing:

score, metadata = await scorer.async_score(
    input_text="...",
    output="...",
    session_id="existing_session_123",
)
Note

Orchestrates critic agent execution via AgentExecutor and extracts structured output. Creates isolated session unless session_id provided for state sharing.
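
A short usage sketch for consuming the returned metadata (the input and output values here are placeholders; run inside an async context):

score, metadata = await scorer.async_score(
    input_text="What is Python?",
    output="Python is a programming language.",
)
if "dimension_scores" in metadata:
    # Per-dimension scores are preserved exactly as the critic returned them
    for dimension, value in metadata["dimension_scores"].items():
        print(f"{dimension}: {value}")
print(metadata.get("actionable_guidance", ""))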

Source code in src/gepa_adk/adapters/critic_scorer.py
async def async_score(
    self,
    input_text: str,
    output: str,
    expected: str | None = None,
    session_id: str | None = None,
) -> tuple[float, dict[str, Any]]:
    """Score an agent output asynchronously using the critic agent.

    Executes the critic agent with formatted input and extracts structured
    score and metadata from the response.

    Args:
        input_text: The original input provided to the agent being evaluated.
        output: The agent's generated output to score.
        expected: Optional expected/reference output for comparison.
        session_id: Optional session ID to share state with main agent
            workflow. If None, creates an isolated session.

    Returns:
        Tuple of (score, metadata) where:
        - score: Float value, conventionally 0.0-1.0
        - metadata: Dict with feedback, dimension_scores,
            actionable_guidance, and any additional fields

    Raises:
        CriticOutputParseError: If critic output is not valid JSON.
        MissingScoreFieldError: If score field missing from output.

    Examples:
        Basic async scoring:

        ```python
        score, metadata = await scorer.async_score(
            input_text="What is Python?",
            output="Python is a programming language.",
        )
        ```

        With session sharing:

        ```python
        score, metadata = await scorer.async_score(
            input_text="...",
            output="...",
            session_id="existing_session_123",
        )
        ```

    Note:
        Orchestrates critic agent execution via AgentExecutor and extracts
        structured output. Creates isolated session unless session_id provided
        for state sharing.
    """
    self._logger.debug(
        "scorer.async_score.start",
        input_preview=input_text[:50] if input_text else "",
        output_preview=output[:50] if output else "",
        has_expected=expected is not None,
        session_id=session_id,
    )

    # Format input for critic
    critic_input = self._format_critic_input(input_text, output, expected)

    # Execute via AgentExecutor
    result = await self._executor.execute_agent(
        agent=self.critic_agent,
        input_text=critic_input,
        existing_session_id=session_id,
    )

    if result.status == ExecutionStatus.FAILED:
        raise ScoringError(
            f"Critic agent execution failed: {result.error_message}",
        )

    final_output = result.extracted_value

    if not final_output:
        raise ScoringError("Critic agent returned empty output")

    # Parse output and extract score
    try:
        score, metadata = self._parse_critic_output(final_output)
    except (CriticOutputParseError, MissingScoreFieldError) as e:
        self._logger.error(
            "scorer.async_score.parse_error",
            error=str(e),
            error_type=type(e).__name__,
        )
        raise

    # Log multi-dimensional scoring context if present
    log_context: dict[str, Any] = {
        "score": score,
        "has_feedback": "feedback" in metadata,
        "has_dimension_scores": "dimension_scores" in metadata,
        "has_actionable_guidance": "actionable_guidance" in metadata,
    }
    if "dimension_scores" in metadata:
        log_context["dimension_count"] = len(metadata["dimension_scores"])

    self._logger.info(
        "scorer.async_score.complete",
        **log_context,
    )

    return score, metadata

score

score(
    input_text: str,
    output: str,
    expected: str | None = None,
) -> tuple[float, dict[str, Any]]

Score an agent output synchronously using the critic agent.

Synchronous wrapper around async_score() using asyncio.run().

PARAMETER DESCRIPTION
input_text

The original input provided to the agent being evaluated.

TYPE: str

output

The agent's generated output to score.

TYPE: str

expected

Optional expected/reference output for comparison.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
tuple[float, dict[str, Any]]

Tuple of (score, metadata) where:
  • score: Float value, conventionally 0.0-1.0
  • metadata: Dict with feedback, dimension_scores, actionable_guidance, and any additional fields
RAISES DESCRIPTION
CriticOutputParseError

If critic output is not valid JSON.

MissingScoreFieldError

If score field missing from output.

Examples:

Basic sync scoring:

score, metadata = scorer.score(
    input_text="What is 2+2?",
    output="4",
    expected="4",
)
Note

Wraps async_score() with asyncio.run() to execute it synchronously. Prefer async_score() for better performance in async contexts.

Source code in src/gepa_adk/adapters/critic_scorer.py
def score(
    self,
    input_text: str,
    output: str,
    expected: str | None = None,
) -> tuple[float, dict[str, Any]]:
    """Score an agent output synchronously using the critic agent.

    Synchronous wrapper around async_score() using asyncio.run().

    Args:
        input_text: The original input provided to the agent being evaluated.
        output: The agent's generated output to score.
        expected: Optional expected/reference output for comparison.

    Returns:
        Tuple of (score, metadata) where:
        - score: Float value, conventionally 0.0-1.0
        - metadata: Dict with feedback, dimension_scores,
            actionable_guidance, and any additional fields

    Raises:
        CriticOutputParseError: If critic output is not valid JSON.
        MissingScoreFieldError: If score field missing from output.

    Examples:
        Basic sync scoring:

        ```python
        score, metadata = scorer.score(
            input_text="What is 2+2?",
            output="4",
            expected="4",
        )
        ```

    Note:
        Operates synchronously by wrapping async_score() with asyncio.run().
        Uses asyncio.run() to execute async_score(). Prefer async_score()
        for better performance in async contexts.
    """
    return asyncio.run(self.async_score(input_text, output, expected))

SimpleCriticOutput

Bases: BaseModel


              flowchart TD
              gepa_adk.adapters.SimpleCriticOutput[SimpleCriticOutput]

              

              click gepa_adk.adapters.SimpleCriticOutput href "" "gepa_adk.adapters.SimpleCriticOutput"
            

KISS schema for basic critic feedback.

This is the minimal schema for critic agents that only need to provide a score and text feedback. Use this for straightforward evaluation tasks where dimension breakdowns are not needed.

ATTRIBUTE DESCRIPTION
score

Score value between 0.0 and 1.0 (required).

TYPE: float

feedback

Human-readable feedback text (required).

TYPE: str

Examples:

Simple critic output:

{
    "score": 0.75,
    "feedback": "Good response but could be more concise."
}

Using with LlmAgent:

from google.adk.agents import LlmAgent
from gepa_adk.adapters.critic_scorer import SimpleCriticOutput

critic = LlmAgent(
    name="simple_critic",
    model="gemini-2.5-flash",
    instruction=SIMPLE_CRITIC_INSTRUCTION,
    output_schema=SimpleCriticOutput,
)
Note

Applies to basic evaluation tasks where only a score and feedback are needed. For more detailed evaluations with dimension scores, use CriticOutput instead.

See Also

CriticOutput: Advanced schema with dimension scores and guidance.

Source code in src/gepa_adk/adapters/critic_scorer.py
class SimpleCriticOutput(BaseModel):
    """KISS schema for basic critic feedback.

    This is the minimal schema for critic agents that only need to provide
    a score and text feedback. Use this for straightforward evaluation tasks
    where dimension breakdowns are not needed.

    Attributes:
        score: Score value between 0.0 and 1.0 (required).
        feedback: Human-readable feedback text (required).

    Examples:
        Simple critic output:

        ```json
        {
            "score": 0.75,
            "feedback": "Good response but could be more concise."
        }
        ```

        Using with LlmAgent:

        ```python
        from google.adk.agents import LlmAgent
        from gepa_adk.adapters.critic_scorer import SimpleCriticOutput

        critic = LlmAgent(
            name="simple_critic",
            model="gemini-2.5-flash",
            instruction=SIMPLE_CRITIC_INSTRUCTION,
            output_schema=SimpleCriticOutput,
        )
        ```

    Note:
        Applies to basic evaluation tasks where only a score and feedback
        are needed. For more detailed evaluations with dimension scores,
        use CriticOutput instead.

    See Also:
        CriticOutput: Advanced schema with dimension scores and guidance.
    """

    score: float = Field(
        ...,
        ge=0.0,
        le=1.0,
        description="Score from 0.0 to 1.0",
    )
    feedback: str = Field(
        ...,
        description="Human-readable feedback explaining the score",
    )

FullEvaluationPolicy

Evaluation policy that scores all validation examples every iteration.

This is the default evaluation policy, providing complete visibility into solution performance across all validation examples.

Note

Always returns all valset IDs, ensuring complete evaluation coverage each iteration.

Examples:

policy = FullEvaluationPolicy()
batch = policy.get_eval_batch([0, 1, 2, 3, 4], state)
# Returns: [0, 1, 2, 3, 4]
Source code in src/gepa_adk/adapters/evaluation_policy.py
class FullEvaluationPolicy:
    """Evaluation policy that scores all validation examples every iteration.

    This is the default evaluation policy, providing complete visibility
    into solution performance across all validation examples.

    Note:
        Always returns all valset IDs, ensuring complete evaluation coverage
        each iteration.

    Examples:
        ```python
        policy = FullEvaluationPolicy()
        batch = policy.get_eval_batch([0, 1, 2, 3, 4], state)
        # Returns: [0, 1, 2, 3, 4]
        ```
    """

    def get_eval_batch(
        self,
        valset_ids: Sequence[int],
        state: ParetoState,
        target_candidate_idx: int | None = None,
    ) -> list[int]:
        """Return all validation example indices.

        Args:
            valset_ids (Sequence[int]): All available validation example indices.
            state (ParetoState): Current evolution state (unused for full evaluation).
            target_candidate_idx (int | None): Optional candidate being evaluated
                (unused).

        Returns:
            list[int]: List of all valset_ids.

        Note:
            Outputs the complete valset for comprehensive evaluation coverage.

        Examples:
            ```python
            policy = FullEvaluationPolicy()
            batch = policy.get_eval_batch([0, 1, 2], state)
            assert batch == [0, 1, 2]
            ```
        """
        return list(valset_ids)

    def get_best_candidate(self, state: ParetoState) -> int:
        """Return index of candidate with highest average score.

        Args:
            state: Current evolution state with candidate scores.

        Returns:
            Index of best performing candidate.

        Raises:
            NoCandidateAvailableError: If state has no candidates.

        Note:
            Outputs the candidate index with the highest mean score across
            all evaluated examples.

        Examples:
            ```python
            policy = FullEvaluationPolicy()
            best_idx = policy.get_best_candidate(state)
            ```
        """
        if not state.candidates:
            raise NoCandidateAvailableError("No candidates available")

        best_idx = None
        best_score = float("-inf")
        for candidate_idx in range(len(state.candidates)):
            score = self.get_valset_score(candidate_idx, state)
            if score > best_score:
                best_score = score
                best_idx = candidate_idx

        if best_idx is None:
            raise NoCandidateAvailableError("No scored candidates available")
        return best_idx

    def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
        """Return mean score across all evaluated examples for a candidate.

        Args:
            candidate_idx (int): Index of candidate to score.
            state (ParetoState): Current evolution state.

        Returns:
            float: Mean score across all examples, or float('-inf') if no scores.

        Note:
            Outputs the arithmetic mean of all scores for the candidate,
            or negative infinity if no scores exist.

        Examples:
            ```python
            policy = FullEvaluationPolicy()
            score = policy.get_valset_score(0, state)
            ```
        """
        scores = state.candidate_scores.get(candidate_idx)
        if not scores:
            return float("-inf")
        return fmean(scores.values())

get_eval_batch

get_eval_batch(
    valset_ids: Sequence[int],
    state: ParetoState,
    target_candidate_idx: int | None = None,
) -> list[int]

Return all validation example indices.

PARAMETER DESCRIPTION
valset_ids

All available validation example indices.

TYPE: Sequence[int]

state

Current evolution state (unused for full evaluation).

TYPE: ParetoState

target_candidate_idx

Optional candidate being evaluated (unused).

TYPE: int | None DEFAULT: None

RETURNS DESCRIPTION
list[int]

list[int]: List of all valset_ids.

Note

Outputs the complete valset for comprehensive evaluation coverage.

Examples:

policy = FullEvaluationPolicy()
batch = policy.get_eval_batch([0, 1, 2], state)
assert batch == [0, 1, 2]
Source code in src/gepa_adk/adapters/evaluation_policy.py
def get_eval_batch(
    self,
    valset_ids: Sequence[int],
    state: ParetoState,
    target_candidate_idx: int | None = None,
) -> list[int]:
    """Return all validation example indices.

    Args:
        valset_ids (Sequence[int]): All available validation example indices.
        state (ParetoState): Current evolution state (unused for full evaluation).
        target_candidate_idx (int | None): Optional candidate being evaluated
            (unused).

    Returns:
        list[int]: List of all valset_ids.

    Note:
        Outputs the complete valset for comprehensive evaluation coverage.

    Examples:
        ```python
        policy = FullEvaluationPolicy()
        batch = policy.get_eval_batch([0, 1, 2], state)
        assert batch == [0, 1, 2]
        ```
    """
    return list(valset_ids)

get_best_candidate

get_best_candidate(state: ParetoState) -> int

Return index of candidate with highest average score.

PARAMETER DESCRIPTION
state

Current evolution state with candidate scores.

TYPE: ParetoState

RETURNS DESCRIPTION
int

Index of best performing candidate.

RAISES DESCRIPTION
NoCandidateAvailableError

If state has no candidates.

Note

Outputs the candidate index with the highest mean score across all evaluated examples.

Examples:

policy = FullEvaluationPolicy()
best_idx = policy.get_best_candidate(state)
Source code in src/gepa_adk/adapters/evaluation_policy.py
def get_best_candidate(self, state: ParetoState) -> int:
    """Return index of candidate with highest average score.

    Args:
        state: Current evolution state with candidate scores.

    Returns:
        Index of best performing candidate.

    Raises:
        NoCandidateAvailableError: If state has no candidates.

    Note:
        Outputs the candidate index with the highest mean score across
        all evaluated examples.

    Examples:
        ```python
        policy = FullEvaluationPolicy()
        best_idx = policy.get_best_candidate(state)
        ```
    """
    if not state.candidates:
        raise NoCandidateAvailableError("No candidates available")

    best_idx = None
    best_score = float("-inf")
    for candidate_idx in range(len(state.candidates)):
        score = self.get_valset_score(candidate_idx, state)
        if score > best_score:
            best_score = score
            best_idx = candidate_idx

    if best_idx is None:
        raise NoCandidateAvailableError("No scored candidates available")
    return best_idx

get_valset_score

get_valset_score(
    candidate_idx: int, state: ParetoState
) -> float

Return mean score across all evaluated examples for a candidate.

PARAMETER DESCRIPTION
candidate_idx

Index of candidate to score.

TYPE: int

state

Current evolution state.

TYPE: ParetoState

RETURNS DESCRIPTION
float

Mean score across all examples, or float('-inf') if no scores.

TYPE: float

Note

Outputs the arithmetic mean of all scores for the candidate, or negative infinity if no scores exist.

Examples:

policy = FullEvaluationPolicy()
score = policy.get_valset_score(0, state)
Source code in src/gepa_adk/adapters/evaluation_policy.py
def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
    """Return mean score across all evaluated examples for a candidate.

    Args:
        candidate_idx (int): Index of candidate to score.
        state (ParetoState): Current evolution state.

    Returns:
        float: Mean score across all examples, or float('-inf') if no scores.

    Note:
        Outputs the arithmetic mean of all scores for the candidate,
        or negative infinity if no scores exist.

    Examples:
        ```python
        policy = FullEvaluationPolicy()
        score = policy.get_valset_score(0, state)
        ```
    """
    scores = state.candidate_scores.get(candidate_idx)
    if not scores:
        return float("-inf")
    return fmean(scores.values())

SubsetEvaluationPolicy

Evaluation policy that scores a configurable subset with round-robin coverage.

This policy reduces evaluation cost for large validation sets by evaluating only a subset of examples per iteration, using round-robin selection to ensure all examples are eventually covered.

ATTRIBUTE DESCRIPTION
subset_size

If int, absolute count of examples to evaluate per iteration. If float (0.0-1.0), fraction of total valset size.

TYPE: int | float

_offset

Internal state tracking current position for round-robin selection.

TYPE: int

Note

Advances offset each iteration to provide round-robin coverage across the full valset over multiple iterations.

Examples:

# Evaluate 20% of valset per iteration
policy = SubsetEvaluationPolicy(subset_size=0.2)

# Evaluate exactly 5 examples per iteration
policy = SubsetEvaluationPolicy(subset_size=5)
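
A minimal sketch of the round-robin coverage (state is assumed to be a ParetoState instance; get_eval_batch ignores its contents when selecting the subset):

policy = SubsetEvaluationPolicy(subset_size=2)
valset = [0, 1, 2, 3, 4]

policy.get_eval_batch(valset, state)  # [0, 1]
policy.get_eval_batch(valset, state)  # [2, 3]
policy.get_eval_batch(valset, state)  # [4, 0]  (wraps around)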
Source code in src/gepa_adk/adapters/evaluation_policy.py
class SubsetEvaluationPolicy:
    """Evaluation policy that scores a configurable subset with round-robin coverage.

    This policy reduces evaluation cost for large validation sets by evaluating
    only a subset of examples per iteration, using round-robin selection to
    ensure all examples are eventually covered.

    Attributes:
        subset_size (int | float): If int, absolute count of examples to evaluate
            per iteration. If float (0.0-1.0), fraction of total valset size.
        _offset (int): Internal state tracking current position for round-robin
            selection.

    Note:
        Advances offset each iteration to provide round-robin coverage across
        the full valset over multiple iterations.

    Examples:
        ```python
        # Evaluate 20% of valset per iteration
        policy = SubsetEvaluationPolicy(subset_size=0.2)

        # Evaluate exactly 5 examples per iteration
        policy = SubsetEvaluationPolicy(subset_size=5)
        ```
    """

    def __init__(self, subset_size: int | float = 0.2) -> None:
        """Initialize subset evaluation policy.

        Args:
            subset_size: If int, evaluate this many examples per iteration.
                If float, evaluate this fraction of total valset.
                Default: 0.2 (20% of valset per iteration).

        Note:
            Creates policy with initial offset of 0 for round-robin selection.
        """
        self.subset_size = subset_size
        self._offset = 0

    def get_eval_batch(
        self,
        valset_ids: Sequence[int],
        state: ParetoState,
        target_candidate_idx: int | None = None,
    ) -> list[int]:
        """Return subset of validation example indices with round-robin selection.

        Args:
            valset_ids: All available validation example indices.
            state: Current evolution state (unused for subset selection).
            target_candidate_idx: Optional candidate being evaluated (unused).

        Returns:
            List of example indices to evaluate this iteration.
            Uses round-robin to ensure all examples are eventually covered.

        Raises:
            ValueError: If subset_size is outside the allowed range.

        Note:
            Outputs a subset of valset IDs starting at the current offset,
            wrapping around if needed to provide round-robin coverage.

        Examples:
            ```python
            policy = SubsetEvaluationPolicy(subset_size=0.25)
            batch = policy.get_eval_batch(list(range(8)), state)
            assert len(batch) == 2
            ```
        """
        if not valset_ids:
            return []

        total_size = len(valset_ids)

        # Calculate subset count
        if isinstance(self.subset_size, float):
            if not (0.0 < self.subset_size <= 1.0):
                raise ValueError(
                    f"subset_size float must be in (0.0, 1.0], got {self.subset_size}"
                )
            subset_count = max(1, int(self.subset_size * total_size))
        else:
            subset_count = self.subset_size

        # Fallback to full evaluation if subset_size exceeds valset size
        if subset_count >= total_size:
            return list(valset_ids)

        # Round-robin selection: slice starting at _offset, wrapping around
        result: list[int] = []
        for i in range(subset_count):
            idx = (self._offset + i) % total_size
            result.append(valset_ids[idx])

        # Advance offset for next iteration
        self._offset = (self._offset + subset_count) % total_size

        return result

    def get_best_candidate(self, state: ParetoState) -> int:
        """Return index of candidate with highest average score.

        Args:
            state (ParetoState): Current evolution state with candidate scores.

        Returns:
            int: Index of best performing candidate.

        Raises:
            NoCandidateAvailableError: If state has no candidates.

        Note:
            Outputs the candidate index with the highest mean score across
            evaluated examples, consistent with FullEvaluationPolicy behavior.

        Examples:
            ```python
            policy = SubsetEvaluationPolicy()
            best_idx = policy.get_best_candidate(state)
            ```
        """
        if not state.candidates:
            raise NoCandidateAvailableError("No candidates available")

        best_idx = None
        best_score = float("-inf")
        for candidate_idx in range(len(state.candidates)):
            score = self.get_valset_score(candidate_idx, state)
            if score > best_score:
                best_score = score
                best_idx = candidate_idx

        if best_idx is None:
            raise NoCandidateAvailableError("No scored candidates available")
        return best_idx

    def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
        """Return mean score across evaluated examples for a candidate.

        Args:
            candidate_idx (int): Index of candidate to score.
            state (ParetoState): Current evolution state.

        Returns:
            float: Mean score across evaluated examples, or float('-inf') if no scores.

        Note:
            Outputs the arithmetic mean of scores for the candidate across
            only the examples that were actually evaluated (subset).

        Examples:
            ```python
            policy = SubsetEvaluationPolicy()
            score = policy.get_valset_score(0, state)
            ```
        """
        scores = state.candidate_scores.get(candidate_idx)
        if not scores:
            return float("-inf")
        return fmean(scores.values())

__init__

__init__(subset_size: int | float = 0.2) -> None

Initialize subset evaluation policy.

PARAMETER DESCRIPTION
subset_size

If int, evaluate this many examples per iteration. If float, evaluate this fraction of total valset. Default: 0.2 (20% of valset per iteration).

TYPE: int | float DEFAULT: 0.2

Note

Creates policy with initial offset of 0 for round-robin selection.

Source code in src/gepa_adk/adapters/evaluation_policy.py
def __init__(self, subset_size: int | float = 0.2) -> None:
    """Initialize subset evaluation policy.

    Args:
        subset_size: If int, evaluate this many examples per iteration.
            If float, evaluate this fraction of total valset.
            Default: 0.2 (20% of valset per iteration).

    Note:
        Creates policy with initial offset of 0 for round-robin selection.
    """
    self.subset_size = subset_size
    self._offset = 0

get_eval_batch

get_eval_batch(
    valset_ids: Sequence[int],
    state: ParetoState,
    target_candidate_idx: int | None = None,
) -> list[int]

Return subset of validation example indices with round-robin selection.

PARAMETER DESCRIPTION
valset_ids

All available validation example indices.

TYPE: Sequence[int]

state

Current evolution state (unused for subset selection).

TYPE: ParetoState

target_candidate_idx

Optional candidate being evaluated (unused).

TYPE: int | None DEFAULT: None

RETURNS DESCRIPTION
list[int]

List of example indices to evaluate this iteration.

list[int]

Uses round-robin to ensure all examples are eventually covered.

RAISES DESCRIPTION
ValueError

If subset_size is outside the allowed range.

Note

Outputs a subset of valset IDs starting at the current offset, wrapping around if needed to provide round-robin coverage.

Examples:

policy = SubsetEvaluationPolicy(subset_size=0.25)
batch = policy.get_eval_batch(list(range(8)), state)
assert len(batch) == 2
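
When subset_size is an integer that does not divide the valset evenly, selection wraps around the end of the list. A minimal sketch, assuming state is an existing ParetoState instance (unused for subset selection):

```python
# Sketch only: integer subset_size with wrap-around.
policy = SubsetEvaluationPolicy(subset_size=3)
valset_ids = list(range(5))

policy.get_eval_batch(valset_ids, state)  # [0, 1, 2]  (offset advances to 3)
policy.get_eval_batch(valset_ids, state)  # [3, 4, 0]  (wraps around; offset -> 1)
policy.get_eval_batch(valset_ids, state)  # [1, 2, 3]  (offset -> 4)
```
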
Source code in src/gepa_adk/adapters/evaluation_policy.py
def get_eval_batch(
    self,
    valset_ids: Sequence[int],
    state: ParetoState,
    target_candidate_idx: int | None = None,
) -> list[int]:
    """Return subset of validation example indices with round-robin selection.

    Args:
        valset_ids: All available validation example indices.
        state: Current evolution state (unused for subset selection).
        target_candidate_idx: Optional candidate being evaluated (unused).

    Returns:
        List of example indices to evaluate this iteration.
        Uses round-robin to ensure all examples are eventually covered.

    Raises:
        ValueError: If subset_size is outside the allowed range.

    Note:
        Outputs a subset of valset IDs starting at the current offset,
        wrapping around if needed to provide round-robin coverage.

    Examples:
        ```python
        policy = SubsetEvaluationPolicy(subset_size=0.25)
        batch = policy.get_eval_batch(list(range(8)), state)
        assert len(batch) == 2
        ```
    """
    if not valset_ids:
        return []

    total_size = len(valset_ids)

    # Calculate subset count
    if isinstance(self.subset_size, float):
        if not (0.0 < self.subset_size <= 1.0):
            raise ValueError(
                f"subset_size float must be in (0.0, 1.0], got {self.subset_size}"
            )
        subset_count = max(1, int(self.subset_size * total_size))
    else:
        subset_count = self.subset_size

    # Fallback to full evaluation if subset_size exceeds valset size
    if subset_count >= total_size:
        return list(valset_ids)

    # Round-robin selection: slice starting at _offset, wrapping around
    result: list[int] = []
    for i in range(subset_count):
        idx = (self._offset + i) % total_size
        result.append(valset_ids[idx])

    # Advance offset for next iteration
    self._offset = (self._offset + subset_count) % total_size

    return result

get_best_candidate

get_best_candidate(state: ParetoState) -> int

Return index of candidate with highest average score.

PARAMETER DESCRIPTION
state

Current evolution state with candidate scores.

TYPE: ParetoState

RETURNS DESCRIPTION
int

Index of best performing candidate.

TYPE: int

RAISES DESCRIPTION
NoCandidateAvailableError

If state has no candidates.

Note

Outputs the candidate index with the highest mean score across evaluated examples, consistent with FullEvaluationPolicy behavior.

Examples:

policy = SubsetEvaluationPolicy()
best_idx = policy.get_best_candidate(state)
Source code in src/gepa_adk/adapters/evaluation_policy.py
def get_best_candidate(self, state: ParetoState) -> int:
    """Return index of candidate with highest average score.

    Args:
        state (ParetoState): Current evolution state with candidate scores.

    Returns:
        int: Index of best performing candidate.

    Raises:
        NoCandidateAvailableError: If state has no candidates.

    Note:
        Outputs the candidate index with the highest mean score across
        evaluated examples, consistent with FullEvaluationPolicy behavior.

    Examples:
        ```python
        policy = SubsetEvaluationPolicy()
        best_idx = policy.get_best_candidate(state)
        ```
    """
    if not state.candidates:
        raise NoCandidateAvailableError("No candidates available")

    best_idx = None
    best_score = float("-inf")
    for candidate_idx in range(len(state.candidates)):
        score = self.get_valset_score(candidate_idx, state)
        if score > best_score:
            best_score = score
            best_idx = candidate_idx

    if best_idx is None:
        raise NoCandidateAvailableError("No scored candidates available")
    return best_idx

get_valset_score

get_valset_score(
    candidate_idx: int, state: ParetoState
) -> float

Return mean score across evaluated examples for a candidate.

PARAMETER DESCRIPTION
candidate_idx

Index of candidate to score.

TYPE: int

state

Current evolution state.

TYPE: ParetoState

RETURNS DESCRIPTION
float

Mean score across evaluated examples, or float('-inf') if no scores.

TYPE: float

Note

Outputs the arithmetic mean of scores for the candidate across only the examples that were actually evaluated (subset).

Examples:

policy = SubsetEvaluationPolicy()
score = policy.get_valset_score(0, state)
Source code in src/gepa_adk/adapters/evaluation_policy.py
def get_valset_score(self, candidate_idx: int, state: ParetoState) -> float:
    """Return mean score across evaluated examples for a candidate.

    Args:
        candidate_idx (int): Index of candidate to score.
        state (ParetoState): Current evolution state.

    Returns:
        float: Mean score across evaluated examples, or float('-inf') if no scores.

    Note:
        Outputs the arithmetic mean of scores for the candidate across
        only the examples that were actually evaluated (subset).

    Examples:
        ```python
        policy = SubsetEvaluationPolicy()
        score = policy.get_valset_score(0, state)
        ```
    """
    scores = state.candidate_scores.get(candidate_idx)
    if not scores:
        return float("-inf")
    return fmean(scores.values())

MultiAgentAdapter

Adapter for multi-agent pipeline evaluation with per-agent component routing.

Wraps multiple ADK agents into a SequentialAgent for evaluation, enabling session state sharing between agents. Implements AsyncGEPAAdapter protocol for use with AsyncGEPAEngine.

Supports per-agent component configuration via the components parameter, allowing different components to be evolved for each agent (e.g., evolve generator's instruction while evolving critic's generate_content_config).

ATTRIBUTE DESCRIPTION
agents

Named ADK agents to evaluate together.

TYPE: dict[str, LlmAgent]

components

Per-agent component configuration.

TYPE: ComponentsMapping

primary

Name of agent whose output is used for scoring.

TYPE: str

scorer

Scoring implementation (CriticScorer or similar).

TYPE: Scorer

share_session

Whether agents share session state.

TYPE: bool

session_service

Session service for state management.

TYPE: InMemorySessionService

trajectory_config

Configuration for trajectory extraction.

TYPE: TrajectoryConfig | None

_executor

Optional unified executor for consistent agent execution. When None, uses legacy execution path.

TYPE: AgentExecutorProtocol | None

_proposer

Mutation proposer for generating improved instructions via LLM reflection.

TYPE: AsyncReflectiveMutationProposer

_logger

Bound structlog logger with context.

TYPE: BoundLogger

Examples:

Basic adapter setup with per-agent components (API v0.3.x):

from google.adk.agents import LlmAgent
from gepa_adk.adapters import MultiAgentAdapter
from gepa_adk.ports.scorer import Scorer

generator = LlmAgent(
    name="generator",
    model="gemini-2.5-flash",
    output_key="generated_code",
)
critic = LlmAgent(
    name="critic",
    model="gemini-2.5-flash",
    instruction="Review the code in {generated_code}.",
)
scorer = MyScorer()

adapter = MultiAgentAdapter(
    agents={"generator": generator, "critic": critic},
    primary="generator",
    scorer=scorer,
    components={
        "generator": ["instruction", "output_schema"],
        "critic": ["generate_content_config"],
    },
)

# Candidates use qualified names (agent.component format per ADR-012)
candidate = {
    "generator.instruction": "Generate high-quality code",
    "generator.output_schema": "class Output(BaseModel): ...",
    "critic.generate_content_config": "temperature: 0.3",
}
result = await adapter.evaluate(batch, candidate)

Exclude an agent from evolution:

adapter = MultiAgentAdapter(
    agents={"generator": gen, "validator": val},
    primary="generator",
    scorer=scorer,
    components={
        "generator": ["instruction"],
        "validator": [],  # Empty list = no evolution
    },
)
Note

Adheres to AsyncGEPAAdapter[dict[str, Any], MultiAgentTrajectory, str] protocol. All methods are async and follow ADK's async-first patterns.

Breaking Change (0.3.x):

- agents parameter changed from list[LlmAgent] to dict[str, LlmAgent]
- components parameter is now required
- Candidate keys use qualified names (agent.component) instead of the {agent_name}_instruction format

Session Sharing Behavior:

- When share_session=True (default): Uses SequentialAgent to execute agents sequentially with a shared InvocationContext. Earlier agents can write to session state (via output_key), and later agents can read that state via template strings like {output_key} in their instructions.
- When share_session=False: Each agent executes with an isolated session. Agents cannot access each other's outputs. This is useful when agents should not interfere with each other's state (EdgeCase-5: incompatible outputs behavior). See the sketch after these notes.

output_key State Propagation:

- When an agent has output_key set, its final response is automatically saved to session.state[output_key].
- With share_session=True, subsequent agents can reference this via template strings: instruction="Process {output_key}".
- With share_session=False, state is not shared and template references will not resolve (agents see empty or undefined state).
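
To make the isolation behavior concrete, here is a minimal sketch that disables session sharing; gen, critic, scorer, proposer, batch, and candidate are assumed to be constructed as in the earlier examples:

```python
# Sketch only: isolated sessions (share_session=False).
# gen, critic, scorer, proposer, batch, and candidate are assumed to exist,
# built as in the examples above.
adapter = MultiAgentAdapter(
    agents={"generator": gen, "critic": critic},
    primary="generator",
    components={"generator": ["instruction"], "critic": ["instruction"]},
    scorer=scorer,
    proposer=proposer,
    share_session=False,
)

# With isolated sessions the critic never sees the generator's output_key
# state, so a template reference like {generated_code} in its instruction
# will not resolve.
result = await adapter.evaluate(batch, candidate)
```
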

Source code in src/gepa_adk/adapters/multi_agent.py
class MultiAgentAdapter:
    """Adapter for multi-agent pipeline evaluation with per-agent component routing.

    Wraps multiple ADK agents into a SequentialAgent for evaluation,
    enabling session state sharing between agents. Implements
    AsyncGEPAAdapter protocol for use with AsyncGEPAEngine.

    Supports per-agent component configuration via the `components` parameter,
    allowing different components to be evolved for each agent (e.g., evolve
    generator's instruction while evolving critic's generate_content_config).

    Attributes:
        agents (dict[str, LlmAgent]): Named ADK agents to evaluate together.
        components (ComponentsMapping): Per-agent component configuration.
        primary (str): Name of agent whose output is used for scoring.
        scorer (Scorer): Scoring implementation (CriticScorer or similar).
        share_session (bool): Whether agents share session state.
        session_service (InMemorySessionService): Session service for state management.
        trajectory_config (TrajectoryConfig | None): Configuration for trajectory extraction.
        _executor (AgentExecutorProtocol | None): Optional unified executor for
            consistent agent execution. When None, uses legacy execution path.
        _proposer (AsyncReflectiveMutationProposer): Mutation proposer for
            generating improved instructions via LLM reflection.
        _logger (BoundLogger): Bound structlog logger with context.

    Examples:
        Basic adapter setup with per-agent components (API v0.3.x):

        ```python
        from google.adk.agents import LlmAgent
        from gepa_adk.adapters import MultiAgentAdapter
        from gepa_adk.ports.scorer import Scorer

        generator = LlmAgent(
            name="generator",
            model="gemini-2.5-flash",
            output_key="generated_code",
        )
        critic = LlmAgent(
            name="critic",
            model="gemini-2.5-flash",
            instruction="Review the code in {generated_code}.",
        )
        scorer = MyScorer()

        adapter = MultiAgentAdapter(
            agents={"generator": generator, "critic": critic},
            primary="generator",
            scorer=scorer,
            components={
                "generator": ["instruction", "output_schema"],
                "critic": ["generate_content_config"],
            },
        )

        # Candidates use qualified names (agent.component format per ADR-012)
        candidate = {
            "generator.instruction": "Generate high-quality code",
            "generator.output_schema": "class Output(BaseModel): ...",
            "critic.generate_content_config": "temperature: 0.3",
        }
        result = await adapter.evaluate(batch, candidate)
        ```

        Exclude an agent from evolution:

        ```python
        adapter = MultiAgentAdapter(
            agents={"generator": gen, "validator": val},
            primary="generator",
            scorer=scorer,
            components={
                "generator": ["instruction"],
                "validator": [],  # Empty list = no evolution
            },
        )
        ```

    Note:
        Adheres to AsyncGEPAAdapter[dict[str, Any], MultiAgentTrajectory, str] protocol.
        All methods are async and follow ADK's async-first patterns.

        **Breaking Change (0.3.x)**:
        - `agents` parameter changed from `list[LlmAgent]` to `dict[str, LlmAgent]`
        - `components` parameter is now required
        - Candidate keys use qualified names (agent.component) instead of
          {agent_name}_instruction format

        **Session Sharing Behavior**:
        - When `share_session=True` (default): Uses SequentialAgent to execute
          agents sequentially with shared InvocationContext. Earlier agents can
          write to session state (via `output_key`), later agents can read that
          state via template strings like `{output_key}` in their instructions.
        - When `share_session=False`: Each agent executes with an isolated session.
          Agents cannot access each other's outputs. This is useful when agents
          should not interfere with each other's state (EdgeCase-5: incompatible
          outputs behavior).

        **output_key State Propagation**:
        - When an agent has `output_key` set, its final response is automatically
          saved to `session.state[output_key]`.
        - With `share_session=True`, subsequent agents can reference this via
          template strings: `instruction="Process {output_key}"`.
        - With `share_session=False`, state is not shared and template references
          will not resolve (agents see empty or undefined state).
    """

    def __init__(
        self,
        agents: dict[str, LlmAgent],
        primary: str,
        components: ComponentsMapping,
        scorer: Scorer | None = None,
        share_session: bool = True,
        session_service: BaseSessionService | None = None,
        app_name: str = "multi_agent_eval",
        trajectory_config: TrajectoryConfig | None = None,
        proposer: AsyncReflectiveMutationProposer | None = None,
        executor: AgentExecutorProtocol | None = None,
        workflow: AnyAgentType | None = None,
    ) -> None:
        """Initialize the MultiAgent adapter with named agents and component config.

        Args:
            agents: Named ADK agents to evolve together. Must have at least
                one agent. Keys are agent names, values are LlmAgent instances.
            primary: Name of the agent whose output is used for scoring.
                Must match one of the agent names in the dict.
            components: Per-agent component configuration mapping agent names
                to lists of component names to evolve. All agents must have
                an entry (use empty list to exclude from evolution). Component
                names must have registered handlers.
            scorer: Optional scorer implementation. If None, the primary agent
                must have an output_schema for schema-based scoring.
            share_session: Whether agents share session state during execution.
                When True (default), uses SequentialAgent. When False, agents
                execute with isolated sessions.
            session_service: Optional session service for state management.
                If None, creates an InMemorySessionService.
            app_name: Application name for session identification.
            trajectory_config: Configuration for trajectory extraction behavior.
                If None, uses TrajectoryConfig defaults.
            proposer: Mutation proposer for generating improved instructions
                via LLM reflection. Required. Create using `create_adk_reflection_fn()`
                with an ADK LlmAgent for reflection.
            executor: Optional unified executor for consistent agent execution.
                If None, uses legacy execution path with direct Runner calls.
                When provided, all agent executions use the executor's execute_agent
                method for consistent session management and feature parity (FR-001).
            workflow: Optional original workflow structure to preserve during
                cloning. When provided, _build_pipeline() uses
                clone_workflow_with_overrides() to preserve workflow type
                (LoopAgent iterations, ParallelAgent concurrency). When None,
                creates a flat SequentialAgent (legacy behavior).

        Raises:
            MultiAgentValidationError: If agents dict is empty, primary agent
                not found, or no scorer and primary lacks output_schema.
            ValueError: If proposer is not provided, or if components mapping
                contains unknown agents, unknown component handlers, or is
                missing entries for agents in the agents dict.

        Examples:
            With per-agent components (API v0.3.x):

            ```python
            from gepa_adk.engine import (
                create_adk_reflection_fn,
                AsyncReflectiveMutationProposer,
            )

            reflection_fn = create_adk_reflection_fn(reflection_agent, executor)
            proposer = AsyncReflectiveMutationProposer(adk_reflection_fn=reflection_fn)
            adapter = MultiAgentAdapter(
                agents={"generator": gen, "critic": critic},
                primary="generator",
                components={
                    "generator": ["instruction", "output_schema"],
                    "critic": ["instruction"],
                },
                scorer=scorer,
                proposer=proposer,
            )
            ```

            Excluding an agent from evolution:

            ```python
            adapter = MultiAgentAdapter(
                agents={"generator": gen, "validator": val},
                primary="generator",
                components={
                    "generator": ["instruction"],
                    "validator": [],  # Excluded from evolution
                },
                scorer=scorer,
                proposer=proposer,
            )
            ```

        Note:
            Clones agents during evaluation to apply candidate instructions.
            Original agents are never mutated.
        """
        # Validation
        if not agents:
            raise MultiAgentValidationError(
                "agents dict cannot be empty",
                field="agents",
                value={},
                constraint="len >= 1",
            )

        agent_names = list(agents.keys())

        # Check primary agent exists
        if primary not in agent_names:
            raise MultiAgentValidationError(
                f"primary agent '{primary}' not found in agents dict",
                field="primary",
                value=primary,
                constraint=f"must be one of {agent_names}",
            )

        # Check scorer or output_schema
        primary_agent = agents[primary]
        if scorer is None and primary_agent.output_schema is None:
            raise MultiAgentValidationError(
                "no scorer and primary agent lacks output_schema",
                field="scorer",
                value=None,
                constraint="scorer must be provided or primary agent must have output_schema",
            )

        self.agents = agents
        self.components = components
        self.primary = primary
        self.scorer = scorer
        self.share_session = share_session
        self.session_service = session_service or InMemorySessionService()
        self.app_name = app_name
        self.trajectory_config = trajectory_config or TrajectoryConfig()
        self._workflow = workflow  # Store original workflow for structure preservation
        if proposer is None:
            raise ValueError(
                "proposer is required. Create one using create_adk_reflection_fn() "
                "with an ADK LlmAgent for reflection operations."
            )
        self._proposer = proposer
        self._executor = executor

        # Validate components mapping (fail-fast)
        self._validate_components()

        # Bind logger with adapter context (FR-008)
        self._logger = logger.bind(
            adapter="MultiAgentAdapter",
            primary_agent=self.primary,
            agent_count=len(self.agents),
            app_name=self.app_name,
            uses_executor=executor is not None,
        )

        # Initialize trial builder for reflective dataset construction
        self._trial_builder = TrialBuilder()

        self._logger.info("adapter.initialized")

    def _validate_components(self) -> None:
        """Validate components mapping at initialization time.

        Performs fail-fast validation to ensure:
        1. All agent names in components exist in agents dict
        2. All agents in agents dict have entries in components
        3. All component names have registered handlers

        Raises:
            ValueError: If validation fails with descriptive error message.

        Note:
            Called during __init__ to catch configuration errors early.
        """
        agent_names = set(self.agents.keys())

        # Check all agent names in components exist in agents dict
        for agent_name in self.components:
            if agent_name not in agent_names:
                raise ValueError(
                    f"Agent '{agent_name}' not found in agents dict. "
                    f"Available: {sorted(agent_names)}"
                )

        # Check all agents in agents dict have entries in components
        missing_agents = agent_names - set(self.components.keys())
        if missing_agents:
            raise ValueError(
                f"Agents {sorted(missing_agents)} missing from components mapping. "
                "All agents must have an entry in components (use empty list to exclude)."
            )

        # Check all component names have handlers
        for agent_name, comp_list in self.components.items():
            for comp_name in comp_list:
                if not component_handlers.has(comp_name):
                    available = component_handlers.names()
                    raise ValueError(
                        f"No handler registered for component '{comp_name}'. "
                        f"Available: {available}"
                    )

        logger.debug(
            "components.validated",
            agents=list(self.components.keys()),
            components_per_agent={k: len(v) for k, v in self.components.items()},
        )

    def _apply_candidate(self, candidate: dict[str, str]) -> dict[str, Any]:
        """Apply candidate component values to agents, tracking originals.

        Routes each candidate component to the correct agent based on
        qualified component names (agent.component format per ADR-012).

        Args:
            candidate: Mapping of qualified component names to new values.
                Keys must be in format 'agent.component'.

        Returns:
            Dictionary mapping qualified names to original values for restoration.

        Raises:
            ValueError: If qualified name format is invalid.
            KeyError: If agent not found or handler not registered.

        Examples:
            Apply candidate and get originals:

            ```python
            candidate = {
                "generator.instruction": "evolved text",
                "critic.generate_content_config": "temperature: 0.3",
            }
            originals = adapter._apply_candidate(candidate)
            # originals["generator.instruction"] contains original instruction
            ```

        Note:
            Returns originals dict for use with _restore_agents(). Does not
            modify self.agents - modifications are applied in-place to agent
            objects which are later cloned for pipeline execution.
        """
        originals: dict[str, Any] = {}

        for qualified_name, value in candidate.items():
            # Skip non-component keys (e.g., evolution_id)
            if "." not in qualified_name:
                continue

            spec = ComponentSpec.parse(qualified_name)

            if spec.agent not in self.agents:
                raise KeyError(
                    f"Agent '{spec.agent}' not found. "
                    f"Available: {list(self.agents.keys())}"
                )

            agent = self.agents[spec.agent]
            handler = get_handler(spec.component)
            originals[qualified_name] = handler.apply(agent, value)

            self._logger.debug(
                "component.applied",
                qualified_name=qualified_name,
                agent=spec.agent,
                component=spec.component,
            )

        return originals

    def _restore_agents(self, originals: dict[str, Any]) -> None:
        """Restore all agents to original state after evaluation.

        Uses best-effort restoration: all components are attempted even if
        some fail. Errors are aggregated and raised as RestoreError after
        all restoration attempts complete.

        Args:
            originals: Mapping of qualified names to original values,
                as returned by _apply_candidate().

        Raises:
            RestoreError: If one or more components fail to restore, containing
                list of (qualified_name, exception) pairs.

        Note:
            Always attempts to restore all components even if some fail,
            to minimize state corruption. Uses try/except for each component.
        """
        errors: list[tuple[str, Exception]] = []

        for qualified_name, original in originals.items():
            try:
                spec = ComponentSpec.parse(qualified_name)
                agent = self.agents[spec.agent]
                handler = get_handler(spec.component)
                handler.restore(agent, original)

                self._logger.debug(
                    "component.restored",
                    qualified_name=qualified_name,
                )
            except Exception as e:
                errors.append((qualified_name, e))
                self._logger.warning(
                    "component.restore_failed",
                    qualified_name=qualified_name,
                    error=str(e),
                )

        if errors:
            raise RestoreError(
                f"Failed to restore {len(errors)} components",
                errors=errors,
            )

    def _build_pipeline(
        self,
        candidate: dict[str, str],
    ) -> AnyAgentType:
        """Build pipeline with instruction overrides, preserving workflow structure.

        When a workflow was provided at initialization, clones the workflow
        structure recursively using clone_workflow_with_overrides(), preserving
        LoopAgent iterations and ParallelAgent concurrency. Otherwise, creates
        a flat SequentialAgent (legacy behavior).

        Non-instruction components (output_schema, generate_content_config)
        must be applied to original agents via _apply_candidate() before
        calling this method.

        Args:
            candidate: Qualified component name to text mapping. Keys should
                follow the pattern `{agent_name}.{component_name}` per ADR-012.

        Returns:
            Cloned workflow with same structure as original, or SequentialAgent
            if no workflow was provided. LlmAgents have instruction overrides
            applied; container agents preserve properties like max_iterations.

        Examples:
            Building pipeline with instruction overrides:

            ```python
            candidate = {
                "generator.instruction": "Generate code...",
                "critic.instruction": "Review code...",
            }
            pipeline = adapter._build_pipeline(candidate)
            ```

            With workflow preservation (LoopAgent):

            ```python
            # If adapter was initialized with workflow=LoopAgent(max_iterations=3, ...)
            pipeline = adapter._build_pipeline(candidate)
            # pipeline is LoopAgent with max_iterations=3 preserved
            ```

        Note:
            Only instruction components are applied via cloning. Other components
            are inherited from the (pre-modified) original agents.
            See evaluate() for the full execution flow.
        """
        # Use structure-preserving cloning when workflow is provided
        if self._workflow is not None:
            self._logger.debug(
                "Using structure-preserving clone",
                workflow_type=type(self._workflow).__name__,
            )
            return clone_workflow_with_overrides(self._workflow, candidate)

        # Legacy behavior: flatten to SequentialAgent
        cloned_agents = []

        # Build set of updates per agent from qualified names
        agent_updates: dict[str, dict[str, Any]] = {name: {} for name in self.agents}

        for qualified_name, value in candidate.items():
            # Skip non-component keys (e.g., evolution_id)
            if "." not in qualified_name:
                continue

            spec = ComponentSpec.parse(qualified_name)
            if spec.agent in agent_updates:
                # Only instruction is cloned via model_copy; other components
                # (output_schema, generate_content_config) are applied to the
                # original agents via _apply_candidate() BEFORE this method.
                #
                # Full execution flow in evaluate():
                # 1. _apply_candidate() - applies ALL components to originals
                # 2. _build_pipeline() - clones agents with instruction override
                # 3. Run pipeline (clones inherit non-instruction from originals)
                # 4. _restore_agents() - restores original state
                if spec.component == "instruction":
                    agent_updates[spec.agent]["instruction"] = value

        for agent_name, agent in self.agents.items():
            updates = agent_updates.get(agent_name, {})
            # Always clear parent_agent to allow re-parenting
            updates["parent_agent"] = None

            cloned = agent.model_copy(update=updates)
            cloned_agents.append(cloned)

        pipeline = SequentialAgent(
            name="MultiAgentPipeline",
            sub_agents=cloned_agents,
        )

        return pipeline

    def _extract_primary_output(
        self,
        pipeline_output: str,
        session_state: dict[str, Any],
        primary_agent: LlmAgent,
    ) -> str:
        """Extract primary agent's output from pipeline execution.

        If the primary agent has an output_key, retrieves the output from
        session state using the shared extract_output_from_state utility.
        Otherwise, uses the pipeline's final output.

        Args:
            pipeline_output: Final output from SequentialAgent execution.
            session_state: Session state dictionary from execution.
            primary_agent: The primary agent instance.

        Returns:
            Primary agent's output text for scoring.

        Examples:
            Extracting output with output_key:

            ```python
            primary_agent = LlmAgent(name="generator", output_key="generated_code")
            output = adapter._extract_primary_output(
                pipeline_output="...",
                session_state={"generated_code": "def foo(): ..."},
                primary_agent=primary_agent,
            )
            # output == "def foo(): ..."
            ```

        Note:
            Outputs are saved to session state via the agent's output_key property.
            When share_session=True, later agents can access earlier outputs via
            template strings like {output_key} in their instructions. When
            share_session=False, agents have isolated sessions and cannot
            access each other's outputs (EdgeCase-5: incompatible outputs behavior).

            Uses the shared extract_output_from_state utility for consistent
            state-based output extraction across the codebase.
        """
        # Try state-based extraction using shared utility
        output_key = getattr(primary_agent, "output_key", None)
        state_output = extract_output_from_state(session_state, output_key)
        if state_output is not None:
            return state_output

        # Fallback to pipeline output
        return pipeline_output

    def _create_session_id(self) -> str:
        """Create a unique session ID for evaluation isolation.

        Returns:
            Unique session identifier string.

        Note:
            Outputs unique session identifiers using timestamp and random components.
        """
        import time
        import uuid

        return f"eval_{int(time.time() * 1000)}_{uuid.uuid4().hex[:8]}"

    def _cleanup_session(self, session_id: str) -> None:
        """Clean up session after evaluation.

        Args:
            session_id: Session identifier to clean up.

        Note:
            Optional cleanup method. Currently a no-op. Future implementations
            may delete sessions from persistent storage.
        """
        # No-op for now - sessions are in-memory and cleaned up automatically
        pass

    async def evaluate(
        self,
        batch: list[dict[str, Any]],
        candidate: dict[str, str],
        capture_traces: bool = False,
    ) -> EvaluationBatch[MultiAgentTrajectory, str]:
        """Evaluate multi-agent pipeline with candidate component values over a batch.

        Args:
            batch: List of input examples, each with "input" key and optional
                "expected" key for scoring.
            candidate: Qualified component name to text mapping. Keys should
                follow the pattern `{agent_name}.{component_name}` per ADR-012.
            capture_traces: Whether to capture execution traces (tool calls,
                state deltas, token usage).

        Returns:
            EvaluationBatch containing outputs, scores, and optional trajectories.

        Examples:
            Basic evaluation without traces:

            ```python
            batch = [
                {"input": "Generate code...", "expected": "def foo(): ..."},
            ]
            candidate = {
                "generator.instruction": "Generate high-quality code",
                "critic.instruction": "Review thoroughly",
            }
            result = await adapter.evaluate(batch, candidate)
            assert len(result.outputs) == 1
            assert len(result.scores) == 1
            ```

            With trace capture:

            ```python
            result = await adapter.evaluate(batch, candidate, capture_traces=True)
            assert result.trajectories is not None
            assert len(result.trajectories) == len(batch)
            ```

        Note:
            Orchestrates evaluation by applying candidate components to agents,
            building a SequentialAgent pipeline with cloned agents, then restoring
            original agent state. Primary agent's output is scored.

            Uses try/finally to ensure agents are restored even on evaluation errors,
            preventing state corruption between candidate evaluations.
        """
        self._logger.info(
            "adapter.evaluate.start",
            batch_size=len(batch),
            capture_traces=capture_traces,
            evolution_id=candidate.get("evolution_id", "unknown"),
        )

        # Handle empty batch case
        if not batch:
            self._logger.info("adapter.evaluate.complete", batch_size=0)
            return EvaluationBatch(outputs=[], scores=[], trajectories=None)

        # Apply candidate components to agents, track originals for restoration
        # This applies all component types (instruction, output_schema,
        # generate_content_config) via their handlers per FR-003
        originals = self._apply_candidate(candidate)

        try:
            # Build pipeline with candidate instructions (if sharing session)
            # For isolated sessions, we'll clone agents with candidate instructions
            # Note: _build_pipeline clones agents; non-instruction components are
            # inherited from the (now-modified) original agents
            if self.share_session:
                pipeline = self._build_pipeline(candidate)
            else:
                pipeline = None

            primary_agent = self.agents[self.primary]

            # Create semaphore for concurrency control
            semaphore = asyncio.Semaphore(5)  # Default max concurrent evals

            # Create tasks for all examples
            tasks = [
                self._eval_single_with_semaphore(
                    example=example,
                    example_index=i,
                    pipeline=pipeline,
                    primary_agent=primary_agent,
                    candidate=candidate,
                    capture_traces=capture_traces,
                    semaphore=semaphore,
                )
                for i, example in enumerate(batch)
            ]

            # Execute all tasks in parallel with exception handling
            results = await asyncio.gather(*tasks, return_exceptions=True)

            # Process results and maintain order
            outputs: list[str] = []
            scores: list[float] = []
            trajectories: list[MultiAgentTrajectory] | None = (
                [] if capture_traces else None
            )
            metadata_list: list[dict[str, Any]] = []
            inputs: list[str] = []

            successful = 0
            failed = 0

            for i, result in enumerate(results):
                if isinstance(result, Exception):
                    # Handle exception case
                    self._logger.warning(
                        "adapter.evaluate.example.error",
                        example_index=i,
                        error=str(result),
                    )
                    outputs.append("")
                    scores.append(0.0)
                    metadata_list.append({})
                    inputs.append(batch[i].get("input", "") if i < len(batch) else "")
                    failed += 1

                    if capture_traces:
                        error_trajectory = MultiAgentTrajectory(
                            agent_trajectories={},
                            pipeline_output="",
                            total_token_usage=None,
                            error=str(result),
                        )
                        trajectories.append(error_trajectory)  # type: ignore
                else:
                    # Unpack success case: (output, score, trajectory, metadata, input_text)
                    output_text, score, trajectory, metadata, input_text = result  # type: ignore[misc]
                    outputs.append(output_text)
                    scores.append(score)
                    metadata_list.append(metadata or {})
                    inputs.append(input_text or "")
                    successful += 1

                    if capture_traces and trajectory is not None:
                        trajectories.append(trajectory)  # type: ignore

            avg_score = sum(scores) / len(scores) if scores else 0.0

            self._logger.info(
                "adapter.evaluate.complete",
                batch_size=len(batch),
                successful=successful,
                failed=failed,
                avg_score=avg_score,
            )

            return EvaluationBatch(
                outputs=outputs,
                scores=scores,
                trajectories=trajectories,
                metadata=metadata_list,
                inputs=inputs,
            )
        finally:
            # Restore all agents to original state per FR-004
            # Uses best-effort restoration: attempts all components even if some fail
            self._restore_agents(originals)

    async def _eval_single_with_semaphore(
        self,
        example: dict[str, Any],
        example_index: int,
        pipeline: AnyAgentType | None,
        primary_agent: LlmAgent,
        candidate: dict[str, str],
        capture_traces: bool,
        semaphore: asyncio.Semaphore,
    ) -> tuple[str, float, MultiAgentTrajectory | None, dict[str, Any] | None, str]:
        """Evaluate a single example with semaphore-controlled concurrency.

        Args:
            example: Input example with "input" key and optional "expected" key.
            example_index: Index of example in batch (for logging).
            pipeline: Workflow pipeline to execute (if share_session=True).
            primary_agent: Primary agent for output extraction.
            candidate: Candidate instructions to apply (for isolated sessions).
            capture_traces: Whether to capture execution traces.
            semaphore: Semaphore to control concurrent execution.

        Returns:
            Tuple of (output_text, score, trajectory_or_none, metadata, input_text).
            On failure, returns ("", 0.0, error_trajectory, None, input_text).

        Note:
            Orchestrates single example evaluation with semaphore-controlled concurrency.
        """
        async with semaphore:
            self._logger.debug(
                "adapter.evaluate.example",
                example_index=example_index,
                example_input=example.get("input", "")[:50],
            )

            try:
                # Run the pipeline for this example
                output_text: str
                session_state: dict[str, Any] = {}

                if capture_traces:
                    result = await self._run_single_example(
                        example, pipeline, candidate, capture_events=True
                    )
                    # When capture_events=True, result is a 3-tuple of
                    # (output_text, events, session_state).
                    output_text, events, session_state = result  # type: ignore[misc]
                    # Build trajectory from collected events
                    trajectory = self._build_trajectory(
                        events=list(events),
                        final_output=output_text,
                        session_state=session_state,
                        error=None,
                    )
                else:
                    # When capture_events=False, result is a 2-tuple of
                    # (output_text, session_state).
                    run_result = await self._run_single_example(
                        example, pipeline, candidate, capture_events=False
                    )
                    output_text, session_state = run_result  # type: ignore[misc]
                    trajectory = None

                # Extract primary agent output
                primary_output = self._extract_primary_output(
                    output_text, session_state, primary_agent
                )

                # Score the output
                input_text = example.get("input", "")
                expected = example.get("expected")
                metadata: dict[str, Any] | None = None
                if self.scorer:
                    score_result = await self.scorer.async_score(
                        input_text, primary_output, expected
                    )
                    # Handle both float and tuple[float, dict] return types
                    # The dict contains metadata like feedback, dimension_scores, etc.
                    if isinstance(score_result, tuple):
                        score = score_result[0]
                        metadata = score_result[1] if len(score_result) > 1 else None
                    else:
                        score = float(score_result)
                else:
                    # Simple schema-based scoring fallback when no external scorer is provided.
                    # If an expected value is given, return 1.0 on exact match of the primary
                    # output and 0.0 otherwise; if no expected is provided, return 0.0.
                    if expected is None:
                        score = 0.0
                    else:
                        score = 1.0 if primary_output == expected else 0.0

                return (primary_output, score, trajectory, metadata, input_text)

            except Exception as e:
                # Wrap in domain exception per ADR-009 (preserve batch resilience)
                wrapped = EvaluationError(
                    f"Example {example_index} evaluation failed",
                    cause=e,
                    example_index=example_index,
                )

                self._logger.warning(
                    "adapter.evaluate.example.error",
                    example_index=example_index,
                    error=str(wrapped),
                )

                # Create error trajectory if capturing traces
                error_trajectory = None
                if capture_traces:
                    error_trajectory = MultiAgentTrajectory(
                        agent_trajectories={},
                        pipeline_output="",
                        total_token_usage=None,
                        error=str(wrapped),
                    )

                return ("", 0.0, error_trajectory, None, example.get("input", ""))

    async def _run_single_example(
        self,
        example: dict[str, Any],
        pipeline: AnyAgentType | None,
        candidate: dict[str, str],
        capture_events: bool = False,
    ) -> tuple[str, list[Any], dict[str, Any]] | tuple[str, dict[str, Any]]:
        """Execute pipeline on a single input example.

        Args:
            example: Input example with "input" key.
            pipeline: Workflow pipeline to execute (if share_session=True).
                      If None, executes agents independently (share_session=False).
            candidate: Candidate instructions to apply (for isolated sessions).
            capture_events: If True, return (output, events, state) tuple.
                           If False, return (output, state) tuple.

        Returns:
            If capture_events=True: (final_output, event_list, session_state) tuple
            If capture_events=False: (final_output, session_state) tuple

        Raises:
            RuntimeError: If pipeline execution fails.

        Note:
            Orchestrates execution based on session sharing mode. When
            share_session=False, executes agents independently with isolated
            sessions. When share_session=True, uses SequentialAgent pipeline.
            Streams events via ADK Runner pattern with async iteration.
        """
        input_text = example.get("input", "")
        if not input_text:
            return ("", [], {}) if capture_events else ("", {})

        if self.share_session:
            # Use SequentialAgent pipeline (shared session)
            if pipeline is None:
                raise RuntimeError("Pipeline required for shared session mode")
            return await self._run_shared_session(
                input_text,
                pipeline,
                capture_events,
            )
        else:
            # Execute agents independently (isolated sessions)
            return await self._run_isolated_sessions(
                input_text, candidate, capture_events
            )

    async def _run_shared_session(
        self,
        input_text: str,
        pipeline: AnyAgentType,
        capture_events: bool,
    ) -> tuple[str, list[Any], dict[str, Any]] | tuple[str, dict[str, Any]]:
        """Execute agents with shared session state via workflow pipeline.

        Args:
            input_text: Input text for the pipeline.
            pipeline: Workflow pipeline to execute.
            capture_events: Whether to capture events.

        Returns:
            Tuple of (output, events, state) or (output, state).

        Note:
            Uses unified AgentExecutor when available (FR-002), otherwise
            falls back to legacy execution via direct Runner calls.
        """
        # Use executor if available (FR-002)
        if self._executor is not None:
            from gepa_adk.ports.agent_executor import ExecutionStatus

            result = await self._executor.execute_agent(
                agent=pipeline,
                input_text=input_text,
            )

            if result.status == ExecutionStatus.FAILED:
                # Log failure and return empty output
                self._logger.error(
                    "pipeline.execution.failed",
                    session_id=result.session_id,
                    error=result.error_message,
                )
                if capture_events:
                    return ("", result.captured_events or [], {})
                return ("", {})

            final_output = result.extracted_value or ""
            session_state: dict[str, Any] = {}

            # Retrieve session state from session service
            try:
                session = await self.session_service.get_session(
                    app_name=self.app_name,
                    user_id="eval_user",
                    session_id=result.session_id,
                )
                if session and hasattr(session, "state"):
                    session_state = dict(session.state)
            except (KeyError, AttributeError, TypeError) as exc:
                # Session state retrieval failed due to expected lookup/attribute issues.
                self._logger.debug(
                    "session_state.retrieval_failed",
                    session_id=result.session_id,
                    error=str(exc),
                )

            if capture_events:
                return (final_output, result.captured_events or [], session_state)
            return (final_output, session_state)

        # Legacy execution path (no executor)
        from google.genai import types

        runner = Runner(
            agent=pipeline,
            app_name=self.app_name,
            session_service=self.session_service,
        )

        content = types.Content(
            role="user",
            parts=[types.Part(text=input_text)],
        )

        # Create session in session service first (required by ADK Runner)
        session = await self.session_service.create_session(
            app_name=self.app_name,
            user_id="eval_user",
        )
        session_id = session.id

        events: list[Any] = []
        session_state_legacy: dict[str, Any] = {}

        try:
            async for event in runner.run_async(
                user_id="eval_user",
                session_id=session_id,
                new_message=content,
            ):
                events.append(event)

                if hasattr(event, "session") and event.session:
                    if hasattr(event.session, "state"):
                        session_state_legacy = dict(event.session.state)  # type: ignore
        finally:
            self._cleanup_session(session_id)

        # Extract final output using shared utility (filters thought parts)
        final_output = extract_final_output(events)

        if capture_events:
            return (final_output, events, session_state_legacy)
        return (final_output, session_state_legacy)

    async def _run_isolated_sessions(
        self,
        input_text: str,
        candidate: dict[str, str],
        capture_events: bool,
    ) -> tuple[str, list[Any], dict[str, Any]] | tuple[str, dict[str, Any]]:
        """Execute agents independently with isolated sessions.

        Args:
            input_text: Input text for the first agent.
            candidate: Candidate component values using qualified names.
            capture_events: Whether to capture events.

        Returns:
            Tuple of (output, events, state) or (output, state).
            Returns the primary agent's output.

        Note:
            Orchestrates independent execution of each agent with its own session.
            State is not shared between agents. The primary agent's output is returned.
            Agents are cloned with candidate instructions to avoid mutation.
        """
        from google.genai import types

        # Clone primary agent with candidate instruction using qualified name
        primary_agent = self.agents[self.primary]
        instruction_key = f"{self.primary}.instruction"
        if instruction_key in candidate:
            primary_agent = primary_agent.model_copy(
                update={"instruction": candidate[instruction_key]}
            )

        all_events: list[Any] = []
        session_state: dict[str, Any] = {}

        # Use executor if available (FR-002)
        if self._executor is not None:
            from gepa_adk.ports.agent_executor import ExecutionStatus

            result = await self._executor.execute_agent(
                agent=primary_agent,
                input_text=input_text,
            )

            if result.status == ExecutionStatus.FAILED:
                self._logger.warning(
                    "isolated_session_failed",
                    agent=primary_agent.name,
                    session_id=result.session_id,
                    error=result.error_message,
                )
                # Return empty output on failure
                if capture_events:
                    return ("", result.captured_events or [], {})
                return ("", {})

            final_output = result.extracted_value or ""

            # Retrieve session state from session service if session was created
            if result.session_id:
                try:
                    session = await self.session_service.get_session(
                        app_name=self.app_name,
                        user_id="eval_user",
                        session_id=result.session_id,
                    )
                    if session and hasattr(session, "state"):
                        session_state = dict(session.state)
                except KeyError:
                    # Session might not exist or might have been cleaned up.
                    self._logger.debug(
                        "session_state_unavailable",
                        app_name=self.app_name,
                        user_id="eval_user",
                        session_id=result.session_id,
                    )
                except ValueError:
                    # Session identifier or parameters may be invalid; ignore for evaluation.
                    self._logger.debug(
                        "session_state_invalid",
                        app_name=self.app_name,
                        user_id="eval_user",
                        session_id=result.session_id,
                    )

            if capture_events:
                return (final_output, result.captured_events or [], session_state)
            return (final_output, session_state)

        # Legacy execution path (no executor)
        # Execute primary agent with isolated session
        runner = Runner(
            agent=primary_agent,
            app_name=self.app_name,
            session_service=self.session_service,
        )

        content = types.Content(
            role="user",
            parts=[types.Part(text=input_text)],
        )

        # Create session in session service first (required by ADK Runner)
        session = await self.session_service.create_session(
            app_name=self.app_name,
            user_id="eval_user",
        )
        session_id = session.id

        try:
            async for event in runner.run_async(
                user_id="eval_user",
                session_id=session_id,
                new_message=content,
            ):
                all_events.append(event)

                if hasattr(event, "session") and event.session:
                    if hasattr(event.session, "state"):
                        session_state = dict(event.session.state)  # type: ignore
        finally:
            self._cleanup_session(session_id)

        # Extract final output using shared utility (filters thought parts)
        final_output = extract_final_output(all_events)

        if capture_events:
            return (final_output, all_events, session_state)
        return (final_output, session_state)

    def _aggregate_token_usage(
        self,
        agent_trajectories: dict[str, ADKTrajectory],
    ) -> TokenUsage | None:
        """Sum token usage across all agent trajectories.

        Args:
            agent_trajectories: Mapping of agent names to their trajectories.

        Returns:
            TokenUsage with summed totals if any agent has usage data,
            None if no usage data available.

        Note:
            Sums input, output, and total tokens across all agents to provide
            pipeline-level token consumption metrics.
        """
        total_input = 0
        total_output = 0
        total = 0
        has_usage = False

        for trajectory in agent_trajectories.values():
            if trajectory.token_usage:
                has_usage = True
                total_input += trajectory.token_usage.input_tokens
                total_output += trajectory.token_usage.output_tokens
                total += trajectory.token_usage.total_tokens

        if has_usage:
            return TokenUsage(
                input_tokens=total_input,
                output_tokens=total_output,
                total_tokens=total,
            )
        return None

    def _build_trajectory(
        self,
        events: list[Any],
        final_output: str,
        session_state: dict[str, Any],
        error: str | None = None,
    ) -> MultiAgentTrajectory:
        """Assemble complete multi-agent trajectory from event stream.

        Partitions events by originating agent (using event.author) and builds
        individual ADKTrajectory objects for each agent. This enables fine-grained
        observability into which sub-agent contributed which tool calls, state
        changes, and token usage.

        Args:
            events: List of ADK Event objects collected during execution.
            final_output: The final text response from the pipeline.
            session_state: Session state dictionary from execution.
            error: Error message if execution failed, None otherwise.

        Returns:
            MultiAgentTrajectory with per-agent trajectories, pipeline output,
            aggregated token usage, and error (if any).

        Note:
            Organizes trajectory data by extracting individual agent trajectories
            from events and aggregating token usage across all agents.
        """
        # Partition events by originating agent
        partitions = partition_events_by_agent(events)

        # Build per-agent trajectories
        if not partitions:
            # If no agent-authored events were found, fall back to a minimal
            # trajectory for the primary agent using the full event list.
            agent_trajectories: dict[str, ADKTrajectory] = {
                self.primary: extract_trajectory(
                    events=events,
                    final_output=final_output,
                    error=error,
                    config=self.trajectory_config,
                )
            }
        else:
            agent_trajectories = {}
            for agent_name, agent_events in partitions.items():
                # Only primary agent gets final_output and error attribution
                agent_output = final_output if agent_name == self.primary else ""
                agent_error = error if agent_name == self.primary else None

                agent_trajectories[agent_name] = extract_trajectory(
                    events=agent_events,
                    final_output=agent_output,
                    error=agent_error,
                    config=self.trajectory_config,
                )

        # Aggregate token usage across all agents
        total_token_usage = self._aggregate_token_usage(agent_trajectories)

        return MultiAgentTrajectory(
            agent_trajectories=agent_trajectories,
            pipeline_output=final_output,
            total_token_usage=total_token_usage,
            error=error,
        )

    async def make_reflective_dataset(
        self,
        candidate: dict[str, str],
        eval_batch: EvaluationBatch[MultiAgentTrajectory, str],
        components_to_update: list[str],
    ) -> Mapping[str, Sequence[Mapping[str, Any]]]:
        """Build reflective datasets from evaluation results with traces.

        Args:
            candidate: Current candidate component values.
            eval_batch: Evaluation results including trajectories.
            components_to_update: List of component names to generate
                datasets for.

        Returns:
            Mapping from component name to sequence of reflection examples.
            Each example contains input, output, score, and trace context.

        Examples:
            Generate reflection dataset:

            ```python
            result = await adapter.evaluate(batch, candidate, capture_traces=True)
            dataset = await adapter.make_reflective_dataset(
                candidate, result, ["generator_instruction", "critic_instruction"]
            )
            assert "generator_instruction" in dataset
            ```

        Note:
            Operates on eval_batch trajectories (capture_traces=True required).
            Dataset format is compatible with MutationProposer interface.
        """
        self._logger.info(
            "adapter.make_reflective_dataset.start",
            num_examples=len(eval_batch.outputs),
            components=components_to_update,
        )

        # Build reflective dataset for each requested component
        result: dict[str, list[dict[str, Any]]] = {}

        for component in components_to_update:
            examples: list[dict[str, Any]] = []

            for i, (output, score) in enumerate(
                zip(eval_batch.outputs, eval_batch.scores, strict=True)
            ):
                # Get trajectory info if available
                trajectory = None
                if eval_batch.trajectories and i < len(eval_batch.trajectories):
                    trajectory = eval_batch.trajectories[i]

                # Get metadata (critic feedback) if available
                metadata = None
                if eval_batch.metadata and i < len(eval_batch.metadata):
                    metadata = eval_batch.metadata[i]

                # Get input text if available
                input_text = None
                if eval_batch.inputs and i < len(eval_batch.inputs):
                    input_text = eval_batch.inputs[i]

                example = self._build_reflection_example(
                    input_text=input_text,
                    output=output,
                    score=score,
                    trajectory=trajectory,
                    metadata=metadata,
                    component_name=component,
                    component_value=candidate.get(component, ""),
                )
                examples.append(example)

            result[component] = examples

        self._logger.info(
            "adapter.make_reflective_dataset.complete",
            num_components=len(result),
            total_examples=sum(len(exs) for exs in result.values()),
        )

        return result

    def _build_reflection_example(
        self,
        input_text: str | None,
        output: str,
        score: float,
        trajectory: MultiAgentTrajectory | None,
        metadata: dict[str, Any] | None,
        component_name: str,
        component_value: str,
    ) -> dict[str, Any]:
        """Build a single trial record for reflection.

        Delegates to shared TrialBuilder for consistent trial structure.

        Terminology:
            - trial: One performance record {feedback, trajectory}
            - feedback: Critic evaluation {score, feedback_text, ...}
            - trajectory: The journey from input to output with optional trace

        Args:
            input_text: The input that was given to the pipeline.
            output: Pipeline output text.
            score: Evaluation score for this output.
            trajectory: Optional trajectory with execution trace.
            metadata: Optional scorer metadata dict (e.g., from CriticScorer)
                containing 'feedback', 'dimension_scores', 'actionable_guidance'.
            component_name: Name of the component being evaluated.
            component_value: Current value of the component.

        Returns:
            Trial dict with keys: feedback, trajectory.
            - feedback: score (mandatory), feedback_text (mandatory if available)
            - trajectory: input, output (mandatory), component context

        Note:
            Aligns with whitepaper requirements: score + feedback_text are the
            minimum required fields. Extra metadata (dimensions, guidance) is
            passed through when available.
        """
        # Extract error from trajectory if present
        error = trajectory.error if trajectory else None

        # Build extra trajectory fields specific to multi-agent pipelines
        extra_trajectory: dict[str, Any] = {
            "component": component_name,
            "component_value": component_value,
        }

        # Add token usage from trajectory if available
        if trajectory and trajectory.total_token_usage:
            extra_trajectory["tokens"] = trajectory.total_token_usage.total_tokens

        return self._trial_builder.build_trial(
            input_text=input_text,
            output=output,
            score=score,
            metadata=metadata,
            error=error,
            extra_trajectory=extra_trajectory,
        )

    async def propose_new_texts(
        self,
        candidate: dict[str, str],
        reflective_dataset: Mapping[str, Sequence[Mapping[str, Any]]],
        components_to_update: list[str],
    ) -> dict[str, str]:
        """Propose new component texts based on reflective dataset.

        Delegates to AsyncReflectiveMutationProposer to generate improved
        instruction text via LLM reflection. When the proposer returns None
        (empty dataset), falls back to unchanged candidate values.

        Args:
            candidate: Current candidate component values.
            reflective_dataset: Dataset from make_reflective_dataset(), keyed by
                component name with sequences of reflection examples in the format
                produced by build_reflection_example().
            components_to_update: Components to generate proposals for.

        Returns:
            Dictionary mapping component names to proposed new text values.
            When proposer returns None, returns unchanged candidate values.

        Examples:
            Propose new component texts via LLM reflection:

            ```python
            # After evaluation with traces
            result = await adapter.evaluate(batch, candidate, capture_traces=True)
            dataset = await adapter.make_reflective_dataset(
                candidate, result, ["generator_instruction", "critic_instruction"]
            )

            # Propose new texts via LLM reflection
            proposals = await adapter.propose_new_texts(
                candidate,
                dataset,
                ["generator_instruction", "critic_instruction"],
            )
            # proposals contains improved instructions based on feedback
            ```

        Note:
            Delegates to AsyncReflectiveMutationProposer for actual mutation
            generation. Falls back gracefully when dataset is empty.
        """
        self._logger.debug(
            "propose_new_texts.delegating",
            components_requested=components_to_update,
        )

        result = await self._proposer.propose(
            candidate, reflective_dataset, components_to_update
        )

        if result is None:
            self._logger.info(
                "propose_new_texts.fallback",
                reason="proposer_returned_none",
                components_requested=components_to_update,
            )
            return {
                component: candidate.get(component, "")
                for component in components_to_update
            }

        # Merge with candidate for any missing components
        merged = {
            component: result.get(component, candidate.get(component, ""))
            for component in components_to_update
        }

        # Log proposed texts for each component
        for component, proposed_text in result.items():
            self._logger.info(
                "proposal.text",
                component=component,
                proposed_length=len(proposed_text),
                proposed_preview=proposed_text[:300] + "..."
                if len(proposed_text) > 300
                else proposed_text,
            )

        self._logger.info(
            "propose_new_texts.complete",
            components_proposed=list(result.keys()),
            components_requested=components_to_update,
        )

        return merged

__init__

__init__(
    agents: dict[str, LlmAgent],
    primary: str,
    components: ComponentsMapping,
    scorer: Scorer | None = None,
    share_session: bool = True,
    session_service: BaseSessionService | None = None,
    app_name: str = "multi_agent_eval",
    trajectory_config: TrajectoryConfig | None = None,
    proposer: AsyncReflectiveMutationProposer | None = None,
    executor: AgentExecutorProtocol | None = None,
    workflow: AnyAgentType | None = None,
) -> None

Initialize the MultiAgent adapter with named agents and component config.

PARAMETER DESCRIPTION
agents

Named ADK agents to evolve together. Must have at least one agent. Keys are agent names, values are LlmAgent instances.

TYPE: dict[str, LlmAgent]

primary

Name of the agent whose output is used for scoring. Must match one of the agent names in the dict.

TYPE: str

components

Per-agent component configuration mapping agent names to lists of component names to evolve. All agents must have an entry (use an empty list to exclude an agent from evolution). Component names must have registered handlers.

TYPE: ComponentsMapping

scorer

Optional scorer implementation. If None, the primary agent must have an output_schema for schema-based scoring.

TYPE: Scorer | None DEFAULT: None

share_session

Whether agents share session state during execution. When True (default), uses SequentialAgent. When False, agents execute with isolated sessions.

TYPE: bool DEFAULT: True

session_service

Optional session service for state management. If None, creates an InMemorySessionService.

TYPE: BaseSessionService | None DEFAULT: None

app_name

Application name for session identification.

TYPE: str DEFAULT: 'multi_agent_eval'

trajectory_config

Configuration for trajectory extraction behavior. If None, uses TrajectoryConfig defaults.

TYPE: TrajectoryConfig | None DEFAULT: None

proposer

Mutation proposer for generating improved instructions via LLM reflection. Required. Create using create_adk_reflection_fn() with an ADK LlmAgent for reflection.

TYPE: AsyncReflectiveMutationProposer | None DEFAULT: None

executor

Optional unified executor for consistent agent execution. If None, uses legacy execution path with direct Runner calls. When provided, all agent executions use the executor's execute_agent method for consistent session management and feature parity (FR-001).

TYPE: AgentExecutorProtocol | None DEFAULT: None

workflow

Optional original workflow structure to preserve during cloning. When provided, _build_pipeline() uses clone_workflow_with_overrides() to preserve workflow type (LoopAgent iterations, ParallelAgent concurrency). When None, creates a flat SequentialAgent (legacy behavior).

TYPE: AnyAgentType | None DEFAULT: None

RAISES DESCRIPTION
MultiAgentValidationError

If the agents dict is empty, the primary agent is not found, or no scorer is provided and the primary agent lacks an output_schema.

ValueError

If proposer is not provided, or if the components mapping names unknown agents, references unregistered component handlers, or is missing entries for agents in the agents dict.

Examples:

With per-agent components (API v0.3.x):

from gepa_adk.engine import (
    create_adk_reflection_fn,
    AsyncReflectiveMutationProposer,
)

reflection_fn = create_adk_reflection_fn(reflection_agent, executor)
proposer = AsyncReflectiveMutationProposer(adk_reflection_fn=reflection_fn)
adapter = MultiAgentAdapter(
    agents={"generator": gen, "critic": critic},
    primary="generator",
    components={
        "generator": ["instruction", "output_schema"],
        "critic": ["instruction"],
    },
    scorer=scorer,
    proposer=proposer,
)

Excluding an agent from evolution:

adapter = MultiAgentAdapter(
    agents={"generator": gen, "validator": val},
    primary="generator",
    components={
        "generator": ["instruction"],
        "validator": [],  # Excluded from evolution
    },
    scorer=scorer,
    proposer=proposer,
)
Note

Clones agents during evaluation to apply candidate instructions. Original agents are never mutated.
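
Scorer-less construction relies on the exact-match fallback: when no scorer is given, the primary agent must declare an output_schema, and each example's primary output is scored 1.0 only if it exactly equals the "expected" value (0.0 otherwise). A minimal sketch of that configuration; the Answer schema is a hypothetical placeholder and the proposer is assumed to exist as in the examples above:

from pydantic import BaseModel
from google.adk.agents import LlmAgent


class Answer(BaseModel):
    # Hypothetical output schema used only for illustration
    text: str


generator = LlmAgent(
    name="generator",
    model="gemini-2.5-flash",
    instruction="Answer concisely",
    output_schema=Answer,
)

adapter = MultiAgentAdapter(
    agents={"generator": generator},
    primary="generator",
    components={"generator": ["instruction"]},
    proposer=proposer,  # scorer omitted: exact-match fallback against "expected"
)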

Source code in src/gepa_adk/adapters/multi_agent.py
def __init__(
    self,
    agents: dict[str, LlmAgent],
    primary: str,
    components: ComponentsMapping,
    scorer: Scorer | None = None,
    share_session: bool = True,
    session_service: BaseSessionService | None = None,
    app_name: str = "multi_agent_eval",
    trajectory_config: TrajectoryConfig | None = None,
    proposer: AsyncReflectiveMutationProposer | None = None,
    executor: AgentExecutorProtocol | None = None,
    workflow: AnyAgentType | None = None,
) -> None:
    """Initialize the MultiAgent adapter with named agents and component config.

    Args:
        agents: Named ADK agents to evolve together. Must have at least
            one agent. Keys are agent names, values are LlmAgent instances.
        primary: Name of the agent whose output is used for scoring.
            Must match one of the agent names in the dict.
        components: Per-agent component configuration mapping agent names
            to lists of component names to evolve. All agents must have
            an entry (use empty list to exclude from evolution). Component
            names must have registered handlers.
        scorer: Optional scorer implementation. If None, the primary agent
            must have an output_schema for schema-based scoring.
        share_session: Whether agents share session state during execution.
            When True (default), uses SequentialAgent. When False, agents
            execute with isolated sessions.
        session_service: Optional session service for state management.
            If None, creates an InMemorySessionService.
        app_name: Application name for session identification.
        trajectory_config: Configuration for trajectory extraction behavior.
            If None, uses TrajectoryConfig defaults.
        proposer: Mutation proposer for generating improved instructions
            via LLM reflection. Required. Create using `create_adk_reflection_fn()`
            with an ADK LlmAgent for reflection.
        executor: Optional unified executor for consistent agent execution.
            If None, uses legacy execution path with direct Runner calls.
            When provided, all agent executions use the executor's execute_agent
            method for consistent session management and feature parity (FR-001).
        workflow: Optional original workflow structure to preserve during
            cloning. When provided, _build_pipeline() uses
            clone_workflow_with_overrides() to preserve workflow type
            (LoopAgent iterations, ParallelAgent concurrency). When None,
            creates a flat SequentialAgent (legacy behavior).

    Raises:
        MultiAgentValidationError: If agents dict is empty, primary agent
            not found, or no scorer and primary lacks output_schema.
        ValueError: If proposer is not provided, or if components mapping
            contains unknown agents, unknown component handlers, or is
            missing entries for agents in the agents dict.

    Examples:
        With per-agent components (API v0.3.x):

        ```python
        from gepa_adk.engine import (
            create_adk_reflection_fn,
            AsyncReflectiveMutationProposer,
        )

        reflection_fn = create_adk_reflection_fn(reflection_agent, executor)
        proposer = AsyncReflectiveMutationProposer(adk_reflection_fn=reflection_fn)
        adapter = MultiAgentAdapter(
            agents={"generator": gen, "critic": critic},
            primary="generator",
            components={
                "generator": ["instruction", "output_schema"],
                "critic": ["instruction"],
            },
            scorer=scorer,
            proposer=proposer,
        )
        ```

        Excluding an agent from evolution:

        ```python
        adapter = MultiAgentAdapter(
            agents={"generator": gen, "validator": val},
            primary="generator",
            components={
                "generator": ["instruction"],
                "validator": [],  # Excluded from evolution
            },
            scorer=scorer,
            proposer=proposer,
        )
        ```

    Note:
        Clones agents during evaluation to apply candidate instructions.
        Original agents are never mutated.
    """
    # Validation
    if not agents:
        raise MultiAgentValidationError(
            "agents dict cannot be empty",
            field="agents",
            value={},
            constraint="len >= 1",
        )

    agent_names = list(agents.keys())

    # Check primary agent exists
    if primary not in agent_names:
        raise MultiAgentValidationError(
            f"primary agent '{primary}' not found in agents dict",
            field="primary",
            value=primary,
            constraint=f"must be one of {agent_names}",
        )

    # Check scorer or output_schema
    primary_agent = agents[primary]
    if scorer is None and primary_agent.output_schema is None:
        raise MultiAgentValidationError(
            "no scorer and primary agent lacks output_schema",
            field="scorer",
            value=None,
            constraint="scorer must be provided or primary agent must have output_schema",
        )

    self.agents = agents
    self.components = components
    self.primary = primary
    self.scorer = scorer
    self.share_session = share_session
    self.session_service = session_service or InMemorySessionService()
    self.app_name = app_name
    self.trajectory_config = trajectory_config or TrajectoryConfig()
    self._workflow = workflow  # Store original workflow for structure preservation
    if proposer is None:
        raise ValueError(
            "proposer is required. Create one using create_adk_reflection_fn() "
            "with an ADK LlmAgent for reflection operations."
        )
    self._proposer = proposer
    self._executor = executor

    # Validate components mapping (fail-fast)
    self._validate_components()

    # Bind logger with adapter context (FR-008)
    self._logger = logger.bind(
        adapter="MultiAgentAdapter",
        primary_agent=self.primary,
        agent_count=len(self.agents),
        app_name=self.app_name,
        uses_executor=executor is not None,
    )

    # Initialize trial builder for reflective dataset construction
    self._trial_builder = TrialBuilder()

    self._logger.info("adapter.initialized")

evaluate async

evaluate(
    batch: list[dict[str, Any]],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[MultiAgentTrajectory, str]

Evaluate multi-agent pipeline with candidate component values over a batch.

PARAMETER DESCRIPTION
batch

List of input examples, each with an "input" key and an optional "expected" key for scoring.

TYPE: list[dict[str, Any]]

candidate

Qualified component name to text mapping. Keys should follow the pattern {agent_name}.{component_name} per ADR-012.

TYPE: dict[str, str]

capture_traces

Whether to capture execution traces (tool calls, state deltas, token usage).

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
EvaluationBatch[MultiAgentTrajectory, str]

EvaluationBatch containing outputs, scores, and optional trajectories.

Examples:

Basic evaluation without traces:

batch = [
    {"input": "Generate code...", "expected": "def foo(): ..."},
]
candidate = {
    "generator.instruction": "Generate high-quality code",
    "critic.instruction": "Review thoroughly",
}
result = await adapter.evaluate(batch, candidate)
assert len(result.outputs) == 1
assert len(result.scores) == 1

With trace capture:

result = await adapter.evaluate(batch, candidate, capture_traces=True)
assert result.trajectories is not None
assert len(result.trajectories) == len(batch)
Note

Orchestrates evaluation by applying candidate components to agents, building a SequentialAgent pipeline with cloned agents, then restoring original agent state. Primary agent's output is scored.

Uses try/finally to ensure agents are restored even on evaluation errors, preventing state corruption between candidate evaluations.
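
Failed examples do not abort the batch: they keep their position in the results with an empty output and a 0.0 score, and with capture_traces=True their trajectory carries the error message. A minimal sketch of inspecting failures after an evaluation (batch and candidate as in the examples above):

result = await adapter.evaluate(batch, candidate, capture_traces=True)

for i, trajectory in enumerate(result.trajectories or []):
    if trajectory.error is not None:
        # Failure slot: output is "" and score is 0.0
        print(f"example {i} failed: {trajectory.error}")
    else:
        print(f"example {i} scored {result.scores[i]:.2f}")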

Source code in src/gepa_adk/adapters/multi_agent.py
async def evaluate(
    self,
    batch: list[dict[str, Any]],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[MultiAgentTrajectory, str]:
    """Evaluate multi-agent pipeline with candidate component values over a batch.

    Args:
        batch: List of input examples, each with "input" key and optional
            "expected" key for scoring.
        candidate: Qualified component name to text mapping. Keys should
            follow the pattern `{agent_name}.{component_name}` per ADR-012.
        capture_traces: Whether to capture execution traces (tool calls,
            state deltas, token usage).

    Returns:
        EvaluationBatch containing outputs, scores, and optional trajectories.

    Examples:
        Basic evaluation without traces:

        ```python
        batch = [
            {"input": "Generate code...", "expected": "def foo(): ..."},
        ]
        candidate = {
            "generator.instruction": "Generate high-quality code",
            "critic.instruction": "Review thoroughly",
        }
        result = await adapter.evaluate(batch, candidate)
        assert len(result.outputs) == 1
        assert len(result.scores) == 1
        ```

        With trace capture:

        ```python
        result = await adapter.evaluate(batch, candidate, capture_traces=True)
        assert result.trajectories is not None
        assert len(result.trajectories) == len(batch)
        ```

    Note:
        Orchestrates evaluation by applying candidate components to agents,
        building a SequentialAgent pipeline with cloned agents, then restoring
        original agent state. Primary agent's output is scored.

        Uses try/finally to ensure agents are restored even on evaluation errors,
        preventing state corruption between candidate evaluations.
    """
    self._logger.info(
        "adapter.evaluate.start",
        batch_size=len(batch),
        capture_traces=capture_traces,
        evolution_id=candidate.get("evolution_id", "unknown"),
    )

    # Handle empty batch case
    if not batch:
        self._logger.info("adapter.evaluate.complete", batch_size=0)
        return EvaluationBatch(outputs=[], scores=[], trajectories=None)

    # Apply candidate components to agents, track originals for restoration
    # This applies all component types (instruction, output_schema,
    # generate_content_config) via their handlers per FR-003
    originals = self._apply_candidate(candidate)

    try:
        # Build pipeline with candidate instructions (if sharing session)
        # For isolated sessions, we'll clone agents with candidate instructions
        # Note: _build_pipeline clones agents; non-instruction components are
        # inherited from the (now-modified) original agents
        if self.share_session:
            pipeline = self._build_pipeline(candidate)
        else:
            pipeline = None

        primary_agent = self.agents[self.primary]

        # Create semaphore for concurrency control
        semaphore = asyncio.Semaphore(5)  # Default max concurrent evals

        # Create tasks for all examples
        tasks = [
            self._eval_single_with_semaphore(
                example=example,
                example_index=i,
                pipeline=pipeline,
                primary_agent=primary_agent,
                candidate=candidate,
                capture_traces=capture_traces,
                semaphore=semaphore,
            )
            for i, example in enumerate(batch)
        ]

        # Execute all tasks in parallel with exception handling
        results = await asyncio.gather(*tasks, return_exceptions=True)

        # Process results and maintain order
        outputs: list[str] = []
        scores: list[float] = []
        trajectories: list[MultiAgentTrajectory] | None = (
            [] if capture_traces else None
        )
        metadata_list: list[dict[str, Any]] = []
        inputs: list[str] = []

        successful = 0
        failed = 0

        for i, result in enumerate(results):
            if isinstance(result, Exception):
                # Handle exception case
                self._logger.warning(
                    "adapter.evaluate.example.error",
                    example_index=i,
                    error=str(result),
                )
                outputs.append("")
                scores.append(0.0)
                metadata_list.append({})
                inputs.append(batch[i].get("input", "") if i < len(batch) else "")
                failed += 1

                if capture_traces:
                    error_trajectory = MultiAgentTrajectory(
                        agent_trajectories={},
                        pipeline_output="",
                        total_token_usage=None,
                        error=str(result),
                    )
                    trajectories.append(error_trajectory)  # type: ignore
            else:
                # Unpack success case: (output, score, trajectory, metadata, input_text)
                output_text, score, trajectory, metadata, input_text = result  # type: ignore[misc]
                outputs.append(output_text)
                scores.append(score)
                metadata_list.append(metadata or {})
                inputs.append(input_text or "")
                successful += 1

                if capture_traces and trajectory is not None:
                    trajectories.append(trajectory)  # type: ignore

        avg_score = sum(scores) / len(scores) if scores else 0.0

        self._logger.info(
            "adapter.evaluate.complete",
            batch_size=len(batch),
            successful=successful,
            failed=failed,
            avg_score=avg_score,
        )

        return EvaluationBatch(
            outputs=outputs,
            scores=scores,
            trajectories=trajectories,
            metadata=metadata_list,
            inputs=inputs,
        )
    finally:
        # Restore all agents to original state per FR-004
        # Uses best-effort restoration: attempts all components even if some fail
        self._restore_agents(originals)

make_reflective_dataset async

make_reflective_dataset(
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[MultiAgentTrajectory, str],
    components_to_update: list[str],
) -> Mapping[str, Sequence[Mapping[str, Any]]]

Build reflective datasets from evaluation results with traces.

PARAMETER DESCRIPTION
candidate

Current candidate component values.

TYPE: dict[str, str]

eval_batch

Evaluation results including trajectories.

TYPE: EvaluationBatch[MultiAgentTrajectory, str]

components_to_update

List of component names to generate datasets for.

TYPE: list[str]

RETURNS DESCRIPTION
Mapping[str, Sequence[Mapping[str, Any]]]

Mapping from component name to sequence of reflection examples. Each example contains input, output, score, and trace context.

Examples:

Generate reflection dataset:

result = await adapter.evaluate(batch, candidate, capture_traces=True)
dataset = await adapter.make_reflective_dataset(
    candidate, result, ["generator_instruction", "critic_instruction"]
)
assert "generator_instruction" in dataset
Note

Operates on eval_batch trajectories (capture_traces=True required). Dataset format is compatible with MutationProposer interface.
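
Each entry in the returned mapping is a trial record assembled by TrialBuilder: a feedback block carrying at least the score (plus feedback_text and other scorer metadata when available) and a trajectory block carrying the input, output, and component context. A hedged sketch of consuming the dataset; nested keys beyond score, input, and output depend on the scorer:

result = await adapter.evaluate(batch, candidate, capture_traces=True)
dataset = await adapter.make_reflective_dataset(
    candidate, result, ["generator_instruction"]
)

for trial in dataset["generator_instruction"]:
    feedback = trial["feedback"]      # score, feedback_text when available
    trajectory = trial["trajectory"]  # input, output, component context
    print(feedback["score"], trajectory["output"][:80])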

Source code in src/gepa_adk/adapters/multi_agent.py
async def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[MultiAgentTrajectory, str],
    components_to_update: list[str],
) -> Mapping[str, Sequence[Mapping[str, Any]]]:
    """Build reflective datasets from evaluation results with traces.

    Args:
        candidate: Current candidate component values.
        eval_batch: Evaluation results including trajectories.
        components_to_update: List of component names to generate
            datasets for.

    Returns:
        Mapping from component name to sequence of reflection examples.
        Each example contains input, output, score, and trace context.

    Examples:
        Generate reflection dataset:

        ```python
        result = await adapter.evaluate(batch, candidate, capture_traces=True)
        dataset = await adapter.make_reflective_dataset(
            candidate, result, ["generator_instruction", "critic_instruction"]
        )
        assert "generator_instruction" in dataset
        ```

    Note:
        Operates on eval_batch trajectories (capture_traces=True required).
        Dataset format is compatible with MutationProposer interface.
    """
    self._logger.info(
        "adapter.make_reflective_dataset.start",
        num_examples=len(eval_batch.outputs),
        components=components_to_update,
    )

    # Build reflective dataset for each requested component
    result: dict[str, list[dict[str, Any]]] = {}

    for component in components_to_update:
        examples: list[dict[str, Any]] = []

        for i, (output, score) in enumerate(
            zip(eval_batch.outputs, eval_batch.scores, strict=True)
        ):
            # Get trajectory info if available
            trajectory = None
            if eval_batch.trajectories and i < len(eval_batch.trajectories):
                trajectory = eval_batch.trajectories[i]

            # Get metadata (critic feedback) if available
            metadata = None
            if eval_batch.metadata and i < len(eval_batch.metadata):
                metadata = eval_batch.metadata[i]

            # Get input text if available
            input_text = None
            if eval_batch.inputs and i < len(eval_batch.inputs):
                input_text = eval_batch.inputs[i]

            example = self._build_reflection_example(
                input_text=input_text,
                output=output,
                score=score,
                trajectory=trajectory,
                metadata=metadata,
                component_name=component,
                component_value=candidate.get(component, ""),
            )
            examples.append(example)

        result[component] = examples

    self._logger.info(
        "adapter.make_reflective_dataset.complete",
        num_components=len(result),
        total_examples=sum(len(exs) for exs in result.values()),
    )

    return result

propose_new_texts async

propose_new_texts(
    candidate: dict[str, str],
    reflective_dataset: Mapping[
        str, Sequence[Mapping[str, Any]]
    ],
    components_to_update: list[str],
) -> dict[str, str]

Propose new component texts based on reflective dataset.

Delegates to AsyncReflectiveMutationProposer to generate improved instruction text via LLM reflection. When the proposer returns None (empty dataset), falls back to unchanged candidate values.

PARAMETER DESCRIPTION
candidate

Current candidate component values.

TYPE: dict[str, str]

reflective_dataset

Dataset from make_reflective_dataset(), keyed by component name with sequences of reflection examples in the format produced by build_reflection_example().

TYPE: Mapping[str, Sequence[Mapping[str, Any]]]

components_to_update

Components to generate proposals for.

TYPE: list[str]

RETURNS DESCRIPTION
dict[str, str]

Dictionary mapping component names to proposed new text values. When proposer returns None, returns unchanged candidate values.

Examples:

Propose new component texts via LLM reflection:

# After evaluation with traces
result = await adapter.evaluate(batch, candidate, capture_traces=True)
dataset = await adapter.make_reflective_dataset(
    candidate, result, ["generator_instruction", "critic_instruction"]
)

# Propose new texts via LLM reflection
proposals = await adapter.propose_new_texts(
    candidate,
    dataset,
    ["generator_instruction", "critic_instruction"],
)
# proposals contains improved instructions based on feedback
Note

Delegates to AsyncReflectiveMutationProposer for actual mutation generation. Falls back gracefully when dataset is empty.
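
The fallback path returns the requested components unchanged, so callers can rely on every key in components_to_update being present in the result. A minimal sketch, assuming the proposer returns None for an empty reflective dataset as described above:

proposals = await adapter.propose_new_texts(
    candidate, {}, ["generator_instruction"]
)
assert proposals["generator_instruction"] == candidate.get("generator_instruction", "")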

Source code in src/gepa_adk/adapters/multi_agent.py
async def propose_new_texts(
    self,
    candidate: dict[str, str],
    reflective_dataset: Mapping[str, Sequence[Mapping[str, Any]]],
    components_to_update: list[str],
) -> dict[str, str]:
    """Propose new component texts based on reflective dataset.

    Delegates to AsyncReflectiveMutationProposer to generate improved
    instruction text via LLM reflection. When the proposer returns None
    (empty dataset), falls back to unchanged candidate values.

    Args:
        candidate: Current candidate component values.
        reflective_dataset: Dataset from make_reflective_dataset(), keyed by
            component name with sequences of reflection examples in the format
            produced by build_reflection_example().
        components_to_update: Components to generate proposals for.

    Returns:
        Dictionary mapping component names to proposed new text values.
        When proposer returns None, returns unchanged candidate values.

    Examples:
        Propose new component texts via LLM reflection:

        ```python
        # After evaluation with traces
        result = await adapter.evaluate(batch, candidate, capture_traces=True)
        dataset = await adapter.make_reflective_dataset(
            candidate, result, ["generator_instruction", "critic_instruction"]
        )

        # Propose new texts via LLM reflection
        proposals = await adapter.propose_new_texts(
            candidate,
            dataset,
            ["generator_instruction", "critic_instruction"],
        )
        # proposals contains improved instructions based on feedback
        ```

    Note:
        Delegates to AsyncReflectiveMutationProposer for actual mutation
        generation. Falls back gracefully when dataset is empty.
    """
    self._logger.debug(
        "propose_new_texts.delegating",
        components_requested=components_to_update,
    )

    result = await self._proposer.propose(
        candidate, reflective_dataset, components_to_update
    )

    if result is None:
        self._logger.info(
            "propose_new_texts.fallback",
            reason="proposer_returned_none",
            components_requested=components_to_update,
        )
        return {
            component: candidate.get(component, "")
            for component in components_to_update
        }

    # Merge with candidate for any missing components
    merged = {
        component: result.get(component, candidate.get(component, ""))
        for component in components_to_update
    }

    # Log proposed texts for each component
    for component, proposed_text in result.items():
        self._logger.info(
            "proposal.text",
            component=component,
            proposed_length=len(proposed_text),
            proposed_preview=proposed_text[:300] + "..."
            if len(proposed_text) > 300
            else proposed_text,
        )

    self._logger.info(
        "propose_new_texts.complete",
        components_proposed=list(result.keys()),
        components_requested=components_to_update,
    )

    return merged

TimeoutStopper

Stop evolution after a specified timeout.

Terminates evolution when wall-clock time exceeds the configured timeout duration. Useful for resource management and CI/CD pipelines.

ATTRIBUTE DESCRIPTION
timeout_seconds

Maximum wall-clock time for evolution.

TYPE: float

Examples:

Creating a 5-minute timeout:

from gepa_adk.adapters.stoppers import TimeoutStopper
from gepa_adk.domain.stopper import StopperState

stopper = TimeoutStopper(300.0)  # 5 minutes

state = StopperState(
    iteration=10,
    best_score=0.8,
    stagnation_counter=2,
    total_evaluations=50,
    candidates_count=3,
    elapsed_seconds=400.0,  # Exceeds timeout
)

stopper(state)  # Returns True (should stop)
Note

All timeout values must be positive. Zero and negative values raise ValueError to prevent invalid configurations.
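
Because a stopper is just a callable that accepts a StopperState and returns a bool, multiple stop conditions can be combined with a small wrapper. The any_stopper helper below is a hypothetical sketch, not a gepa_adk API:

from collections.abc import Callable

from gepa_adk.adapters.stoppers import TimeoutStopper
from gepa_adk.domain.stopper import StopperState


def any_stopper(
    *stoppers: Callable[[StopperState], bool],
) -> Callable[[StopperState], bool]:
    """Stop as soon as any wrapped condition requests a stop."""

    def check(state: StopperState) -> bool:
        return any(stopper(state) for stopper in stoppers)

    return check


combined = any_stopper(
    TimeoutStopper(600.0),                      # stop after 10 minutes
    lambda state: state.best_score >= 0.95,     # or once the score is good enough
)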

Source code in src/gepa_adk/adapters/stoppers/timeout.py
class TimeoutStopper:
    """Stop evolution after a specified timeout.

    Terminates evolution when wall-clock time exceeds the configured
    timeout duration. Useful for resource management and CI/CD pipelines.

    Attributes:
        timeout_seconds (float): Maximum wall-clock time for evolution.

    Examples:
        Creating a 5-minute timeout:

        ```python
        from gepa_adk.adapters.stoppers import TimeoutStopper
        from gepa_adk.domain.stopper import StopperState

        stopper = TimeoutStopper(300.0)  # 5 minutes

        state = StopperState(
            iteration=10,
            best_score=0.8,
            stagnation_counter=2,
            total_evaluations=50,
            candidates_count=3,
            elapsed_seconds=400.0,  # Exceeds timeout
        )

        stopper(state)  # Returns True (should stop)
        ```

    Note:
        All timeout values must be positive. Zero and negative values
        raise ValueError to prevent invalid configurations.
    """

    def __init__(self, timeout_seconds: float) -> None:
        """Initialize timeout stopper with maximum duration.

        Args:
            timeout_seconds: Maximum wall-clock time for evolution in seconds.
                Must be a positive value.

        Raises:
            ValueError: If timeout_seconds is zero or negative.

        Examples:
            ```python
            stopper = TimeoutStopper(60.0)  # 1 minute
            stopper = TimeoutStopper(3600.0)  # 1 hour
            ```

        Note:
            Consider using reasonable timeouts for your use case. Very
            short timeouts may not allow sufficient evolution progress.
        """
        if timeout_seconds <= 0:
            raise ValueError("timeout_seconds must be positive")
        self.timeout_seconds = timeout_seconds

    def __call__(self, state: StopperState) -> bool:
        """Check if evolution should stop due to timeout.

        Args:
            state: Current evolution state snapshot containing elapsed_seconds.

        Returns:
            True if elapsed time meets or exceeds timeout, False otherwise.

        Examples:
            ```python
            stopper = TimeoutStopper(60.0)

            # Not yet timed out
            state1 = StopperState(
                iteration=5,
                best_score=0.5,
                stagnation_counter=0,
                total_evaluations=25,
                candidates_count=1,
                elapsed_seconds=30.0,
            )
            stopper(state1)  # False

            # Timed out
            state2 = StopperState(
                iteration=10,
                best_score=0.7,
                stagnation_counter=1,
                total_evaluations=50,
                candidates_count=2,
                elapsed_seconds=65.0,
            )
            stopper(state2)  # True
            ```

        Note:
            Often called after each iteration. Returns True as soon as
            the time limit is reached.
        """
        return state.elapsed_seconds >= self.timeout_seconds

__init__

__init__(timeout_seconds: float) -> None

Initialize timeout stopper with maximum duration.

PARAMETER DESCRIPTION
timeout_seconds

Maximum wall-clock time for evolution in seconds. Must be a positive value.

TYPE: float

RAISES DESCRIPTION
ValueError

If timeout_seconds is zero or negative.

Examples:

stopper = TimeoutStopper(60.0)  # 1 minute
stopper = TimeoutStopper(3600.0)  # 1 hour
Note

Consider using reasonable timeouts for your use case. Very short timeouts may not allow sufficient evolution progress.

Source code in src/gepa_adk/adapters/stoppers/timeout.py
def __init__(self, timeout_seconds: float) -> None:
    """Initialize timeout stopper with maximum duration.

    Args:
        timeout_seconds: Maximum wall-clock time for evolution in seconds.
            Must be a positive value.

    Raises:
        ValueError: If timeout_seconds is zero or negative.

    Examples:
        ```python
        stopper = TimeoutStopper(60.0)  # 1 minute
        stopper = TimeoutStopper(3600.0)  # 1 hour
        ```

    Note:
        Consider using reasonable timeouts for your use case. Very
        short timeouts may not allow sufficient evolution progress.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    self.timeout_seconds = timeout_seconds

__call__

__call__(state: StopperState) -> bool

Check if evolution should stop due to timeout.

PARAMETER DESCRIPTION
state

Current evolution state snapshot containing elapsed_seconds.

TYPE: StopperState

RETURNS DESCRIPTION
bool

True if elapsed time meets or exceeds timeout, False otherwise.

Examples:

stopper = TimeoutStopper(60.0)

# Not yet timed out
state1 = StopperState(
    iteration=5,
    best_score=0.5,
    stagnation_counter=0,
    total_evaluations=25,
    candidates_count=1,
    elapsed_seconds=30.0,
)
stopper(state1)  # False

# Timed out
state2 = StopperState(
    iteration=10,
    best_score=0.7,
    stagnation_counter=1,
    total_evaluations=50,
    candidates_count=2,
    elapsed_seconds=65.0,
)
stopper(state2)  # True
Note

Often called after each iteration. Returns True as soon as the time limit is reached.

Source code in src/gepa_adk/adapters/stoppers/timeout.py
def __call__(self, state: StopperState) -> bool:
    """Check if evolution should stop due to timeout.

    Args:
        state: Current evolution state snapshot containing elapsed_seconds.

    Returns:
        True if elapsed time meets or exceeds timeout, False otherwise.

    Examples:
        ```python
        stopper = TimeoutStopper(60.0)

        # Not yet timed out
        state1 = StopperState(
            iteration=5,
            best_score=0.5,
            stagnation_counter=0,
            total_evaluations=25,
            candidates_count=1,
            elapsed_seconds=30.0,
        )
        stopper(state1)  # False

        # Timed out
        state2 = StopperState(
            iteration=10,
            best_score=0.7,
            stagnation_counter=1,
            total_evaluations=50,
            candidates_count=2,
            elapsed_seconds=65.0,
        )
        stopper(state2)  # True
        ```

    Note:
        Often called after each iteration. Returns True as soon as
        the time limit is reached.
    """
    return state.elapsed_seconds >= self.timeout_seconds

TrialBuilder

Build trial records for reflection datasets.

Constructs consistent trial structures following the GEPA whitepaper format. Extracts feedback fields from scorer metadata and builds trajectory dicts.

ATTRIBUTE DESCRIPTION
_logger

Logger for metadata passthrough debugging.

TYPE: BoundLogger

Examples:

Basic trial building:

builder = TrialBuilder()

# With minimal data
trial = builder.build_trial(
    input_text="Hello",
    output="Hi there!",
    score=0.8,
)

# With full metadata
trial = builder.build_trial(
    input_text="Explain AI",
    output="AI is...",
    score=0.9,
    metadata={
        "feedback": "Clear explanation",
        "dimension_scores": {"clarity": 0.95},
        "actionable_guidance": "Add examples",
    },
    extra_trajectory={"component": "instruction"},
)
See Also
  • [build_feedback]: Build just the feedback dict.
Note

All optional metadata fields are validated before inclusion to prevent malformed data from propagating to reflection prompts.
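
To make the trial shape concrete, a minimal sketch (input and score values illustrative; keys follow the build_trial source below):

builder = TrialBuilder()
trial = builder.build_trial(input_text="Hi", output="Hello!", score=0.7)

# Two top-level keys, per build_trial:
# trial["feedback"]   -> {"score": 0.7, "feedback_text": ""}
# trial["trajectory"] -> {"output": "Hello!", "input": "Hi"}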

Source code in src/gepa_adk/adapters/trial_builder.py
class TrialBuilder:
    """Build trial records for reflection datasets.

    Constructs consistent trial structures following the GEPA whitepaper format.
    Extracts feedback fields from scorer metadata and builds trajectory dicts.

    Attributes:
        _logger (structlog.BoundLogger): Logger for metadata passthrough debugging.

    Examples:
        Basic trial building:

        ```python
        builder = TrialBuilder()

        # With minimal data
        trial = builder.build_trial(
            input_text="Hello",
            output="Hi there!",
            score=0.8,
        )

        # With full metadata
        trial = builder.build_trial(
            input_text="Explain AI",
            output="AI is...",
            score=0.9,
            metadata={
                "feedback": "Clear explanation",
                "dimension_scores": {"clarity": 0.95},
                "actionable_guidance": "Add examples",
            },
            extra_trajectory={"component": "instruction"},
        )
        ```

    See Also:
        - [`build_feedback`][gepa_adk.adapters.trial_builder.TrialBuilder.build_feedback]:
          Build just the feedback dict.

    Note:
        All optional metadata fields are validated before inclusion to prevent
        malformed data from propagating to reflection prompts.
    """

    def __init__(self) -> None:
        """Initialize the TrialBuilder.

        Examples:
            ```python
            builder = TrialBuilder()
            ```

        Note:
            Creates a module-scoped logger for metadata passthrough debugging.
        """
        self._logger = structlog.get_logger(__name__)

    def build_feedback(
        self,
        score: float,
        metadata: dict[str, Any] | None = None,
        *,
        error: str | None = None,
        log_passthrough: bool = False,
    ) -> dict[str, Any]:
        """Build the feedback dict from score and metadata.

        Extracts and validates feedback fields from scorer metadata, including
        feedback_text, actionable_guidance, and dimension_scores.

        Args:
            score: Evaluation score (mandatory).
            metadata: Optional scorer metadata dict containing:
                - feedback: Text feedback from critic.
                - actionable_guidance: Improvement suggestions.
                - dimension_scores: Per-dimension score breakdown.
            error: Optional error message to include in feedback.
            log_passthrough: If True, log debug info about metadata extraction.

        Returns:
            Feedback dict with keys:
                - score (mandatory): The evaluation score.
                - feedback_text (if available): Text feedback from critic.
                - actionable_guidance (if available): Improvement suggestions.
                - dimension_scores (if available): Dimension score breakdown.
                - error (if provided): Error message from execution.

        Examples:
            ```python
            builder = TrialBuilder()

            # Minimal feedback
            feedback = builder.build_feedback(0.75)
            assert feedback == {"score": 0.75}

            # With metadata
            feedback = builder.build_feedback(
                0.85,
                metadata={
                    "feedback": "Good work",
                    "dimension_scores": {"accuracy": 0.9},
                },
            )
            assert "feedback_text" in feedback
            ```

        Note:
            Only non-empty strings and dicts are included to keep feedback clean.
        """
        # Use normalize_feedback for consistent field mapping
        feedback = normalize_feedback(score, metadata)

        # Log metadata passthrough for debugging if requested
        if log_passthrough and metadata and isinstance(metadata, dict):
            has_feedback = bool(
                metadata.get("feedback") or metadata.get("feedback_text")
            )
            has_guidance = bool(metadata.get("actionable_guidance"))
            has_dimensions = bool(metadata.get("dimension_scores"))
            self._logger.debug(
                "trial_builder.metadata.passthrough",
                has_feedback=has_feedback,
                has_guidance=has_guidance,
                has_dimensions=has_dimensions,
            )

        # Add error if present
        if error:
            feedback["error"] = error

        return feedback

    def build_trial(
        self,
        input_text: str | None,
        output: str,
        score: float,
        metadata: dict[str, Any] | None = None,
        *,
        error: str | None = None,
        trace: dict[str, Any] | None = None,
        extra_trajectory: dict[str, Any] | None = None,
        log_passthrough: bool = False,
    ) -> dict[str, Any]:
        """Build a complete trial record for reflection.

        Constructs a trial with feedback and trajectory dicts following the
        GEPA whitepaper structure.

        Args:
            input_text: The input that was given to the system. Can be None for
                pipelines where input context is implicit.
            output: What the system produced.
            score: Evaluation score for this output.
            metadata: Optional scorer metadata dict (from CriticScorer).
            error: Optional error message from execution.
            trace: Optional execution trace dict (tool calls, state, tokens).
            extra_trajectory: Optional extra fields to include in trajectory
                (e.g., component name, component value, tokens).
            log_passthrough: If True, log debug info about metadata extraction.

        Returns:
            Trial dict with keys:
                - feedback: Evaluation feedback (score, feedback_text, etc.)
                - trajectory: Execution journey (input, output, trace, etc.)

        Examples:
            ```python
            builder = TrialBuilder()

            # Simple trial
            trial = builder.build_trial(
                input_text="What is Python?",
                output="A programming language",
                score=0.9,
            )
            assert trial["feedback"]["score"] == 0.9
            assert trial["trajectory"]["output"] == "A programming language"

            # With trace and extra trajectory data
            trial = builder.build_trial(
                input_text="Count to 3",
                output="1, 2, 3",
                score=1.0,
                trace={"tool_calls": [{"name": "count"}]},
                extra_trajectory={"component": "counter"},
            )
            assert "trace" in trial["trajectory"]
            assert trial["trajectory"]["component"] == "counter"
            ```

        Note:
            Optional input is only included in trajectory when not None,
            supporting pipelines where input context is implicit.
        """
        # Build feedback dict
        feedback = self.build_feedback(
            score,
            metadata,
            error=error,
            log_passthrough=log_passthrough,
        )

        # Build trajectory dict
        trajectory: dict[str, Any] = {"output": output}

        # Add input if available
        if input_text is not None:
            trajectory["input"] = input_text

        # Add trace if available
        if trace:
            trajectory["trace"] = trace

        # Add extra trajectory fields if provided
        if extra_trajectory:
            trajectory.update(extra_trajectory)

        # Build trial record
        return {
            "feedback": feedback,
            "trajectory": trajectory,
        }

__init__

__init__() -> None

Initialize the TrialBuilder.

Examples:

builder = TrialBuilder()
Note

Creates a module-scoped logger for metadata passthrough debugging.

Source code in src/gepa_adk/adapters/trial_builder.py
def __init__(self) -> None:
    """Initialize the TrialBuilder.

    Examples:
        ```python
        builder = TrialBuilder()
        ```

    Note:
        Creates a module-scoped logger for metadata passthrough debugging.
    """
    self._logger = structlog.get_logger(__name__)

build_feedback

build_feedback(
    score: float,
    metadata: dict[str, Any] | None = None,
    *,
    error: str | None = None,
    log_passthrough: bool = False,
) -> dict[str, Any]

Build the feedback dict from score and metadata.

Extracts and validates feedback fields from scorer metadata, including feedback_text, actionable_guidance, and dimension_scores.

PARAMETER DESCRIPTION
score

Evaluation score (mandatory).

TYPE: float

metadata

Optional scorer metadata dict containing:

- feedback: Text feedback from critic.
- actionable_guidance: Improvement suggestions.
- dimension_scores: Per-dimension score breakdown.

TYPE: dict[str, Any] | None DEFAULT: None

error

Optional error message to include in feedback.

TYPE: str | None DEFAULT: None

log_passthrough

If True, log debug info about metadata extraction.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
dict[str, Any]

Feedback dict with keys:

- score (mandatory): The evaluation score.
- feedback_text (if available): Text feedback from critic.
- actionable_guidance (if available): Improvement suggestions.
- dimension_scores (if available): Dimension score breakdown.
- error (if provided): Error message from execution.

Examples:

builder = TrialBuilder()

# Minimal feedback
feedback = builder.build_feedback(0.75)
assert feedback == {"score": 0.75}

# With metadata
feedback = builder.build_feedback(
    0.85,
    metadata={
        "feedback": "Good work",
        "dimension_scores": {"accuracy": 0.9},
    },
)
assert "feedback_text" in feedback
Note

Only non-empty strings and dicts are included to keep feedback clean.

Source code in src/gepa_adk/adapters/trial_builder.py
def build_feedback(
    self,
    score: float,
    metadata: dict[str, Any] | None = None,
    *,
    error: str | None = None,
    log_passthrough: bool = False,
) -> dict[str, Any]:
    """Build the feedback dict from score and metadata.

    Extracts and validates feedback fields from scorer metadata, including
    feedback_text, actionable_guidance, and dimension_scores.

    Args:
        score: Evaluation score (mandatory).
        metadata: Optional scorer metadata dict containing:
            - feedback: Text feedback from critic.
            - actionable_guidance: Improvement suggestions.
            - dimension_scores: Per-dimension score breakdown.
        error: Optional error message to include in feedback.
        log_passthrough: If True, log debug info about metadata extraction.

    Returns:
        Feedback dict with keys:
            - score (mandatory): The evaluation score.
            - feedback_text (if available): Text feedback from critic.
            - actionable_guidance (if available): Improvement suggestions.
            - dimension_scores (if available): Dimension score breakdown.
            - error (if provided): Error message from execution.

    Examples:
        ```python
        builder = TrialBuilder()

        # Minimal feedback
        feedback = builder.build_feedback(0.75)
        assert feedback == {"score": 0.75}

        # With metadata
        feedback = builder.build_feedback(
            0.85,
            metadata={
                "feedback": "Good work",
                "dimension_scores": {"accuracy": 0.9},
            },
        )
        assert "feedback_text" in feedback
        ```

    Note:
        Only non-empty strings and dicts are included to keep feedback clean.
    """
    # Use normalize_feedback for consistent field mapping
    feedback = normalize_feedback(score, metadata)

    # Log metadata passthrough for debugging if requested
    if log_passthrough and metadata and isinstance(metadata, dict):
        has_feedback = bool(
            metadata.get("feedback") or metadata.get("feedback_text")
        )
        has_guidance = bool(metadata.get("actionable_guidance"))
        has_dimensions = bool(metadata.get("dimension_scores"))
        self._logger.debug(
            "trial_builder.metadata.passthrough",
            has_feedback=has_feedback,
            has_guidance=has_guidance,
            has_dimensions=has_dimensions,
        )

    # Add error if present
    if error:
        feedback["error"] = error

    return feedback

build_trial

build_trial(
    input_text: str | None,
    output: str,
    score: float,
    metadata: dict[str, Any] | None = None,
    *,
    error: str | None = None,
    trace: dict[str, Any] | None = None,
    extra_trajectory: dict[str, Any] | None = None,
    log_passthrough: bool = False,
) -> dict[str, Any]

Build a complete trial record for reflection.

Constructs a trial with feedback and trajectory dicts following the GEPA whitepaper structure.

PARAMETER DESCRIPTION
input_text

The input that was given to the system. Can be None for pipelines where input context is implicit.

TYPE: str | None

output

What the system produced.

TYPE: str

score

Evaluation score for this output.

TYPE: float

metadata

Optional scorer metadata dict (from CriticScorer).

TYPE: dict[str, Any] | None DEFAULT: None

error

Optional error message from execution.

TYPE: str | None DEFAULT: None

trace

Optional execution trace dict (tool calls, state, tokens).

TYPE: dict[str, Any] | None DEFAULT: None

extra_trajectory

Optional extra fields to include in trajectory (e.g., component name, component value, tokens).

TYPE: dict[str, Any] | None DEFAULT: None

log_passthrough

If True, log debug info about metadata extraction.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
dict[str, Any]

Trial dict with keys:

- feedback: Evaluation feedback (score, feedback_text, etc.)
- trajectory: Execution journey (input, output, trace, etc.)

Examples:

builder = TrialBuilder()

# Simple trial
trial = builder.build_trial(
    input_text="What is Python?",
    output="A programming language",
    score=0.9,
)
assert trial["feedback"]["score"] == 0.9
assert trial["trajectory"]["output"] == "A programming language"

# With trace and extra trajectory data
trial = builder.build_trial(
    input_text="Count to 3",
    output="1, 2, 3",
    score=1.0,
    trace={"tool_calls": [{"name": "count"}]},
    extra_trajectory={"component": "counter"},
)
assert "trace" in trial["trajectory"]
assert trial["trajectory"]["component"] == "counter"
Note

Optional input is only included in trajectory when not None, supporting pipelines where input context is implicit.

Source code in src/gepa_adk/adapters/trial_builder.py
def build_trial(
    self,
    input_text: str | None,
    output: str,
    score: float,
    metadata: dict[str, Any] | None = None,
    *,
    error: str | None = None,
    trace: dict[str, Any] | None = None,
    extra_trajectory: dict[str, Any] | None = None,
    log_passthrough: bool = False,
) -> dict[str, Any]:
    """Build a complete trial record for reflection.

    Constructs a trial with feedback and trajectory dicts following the
    GEPA whitepaper structure.

    Args:
        input_text: The input that was given to the system. Can be None for
            pipelines where input context is implicit.
        output: What the system produced.
        score: Evaluation score for this output.
        metadata: Optional scorer metadata dict (from CriticScorer).
        error: Optional error message from execution.
        trace: Optional execution trace dict (tool calls, state, tokens).
        extra_trajectory: Optional extra fields to include in trajectory
            (e.g., component name, component value, tokens).
        log_passthrough: If True, log debug info about metadata extraction.

    Returns:
        Trial dict with keys:
            - feedback: Evaluation feedback (score, feedback_text, etc.)
            - trajectory: Execution journey (input, output, trace, etc.)

    Examples:
        ```python
        builder = TrialBuilder()

        # Simple trial
        trial = builder.build_trial(
            input_text="What is Python?",
            output="A programming language",
            score=0.9,
        )
        assert trial["feedback"]["score"] == 0.9
        assert trial["trajectory"]["output"] == "A programming language"

        # With trace and extra trajectory data
        trial = builder.build_trial(
            input_text="Count to 3",
            output="1, 2, 3",
            score=1.0,
            trace={"tool_calls": [{"name": "count"}]},
            extra_trajectory={"component": "counter"},
        )
        assert "trace" in trial["trajectory"]
        assert trial["trajectory"]["component"] == "counter"
        ```

    Note:
        Optional input is only included in trajectory when not None,
        supporting pipelines where input context is implicit.
    """
    # Build feedback dict
    feedback = self.build_feedback(
        score,
        metadata,
        error=error,
        log_passthrough=log_passthrough,
    )

    # Build trajectory dict
    trajectory: dict[str, Any] = {"output": output}

    # Add input if available
    if input_text is not None:
        trajectory["input"] = input_text

    # Add trace if available
    if trace:
        trajectory["trace"] = trace

    # Add extra trajectory fields if provided
    if extra_trajectory:
        trajectory.update(extra_trajectory)

    # Build trial record
    return {
        "feedback": feedback,
        "trajectory": trajectory,
    }

VideoBlobService

Video blob loading service for multimodal content.

Implements VideoBlobServiceProtocol to convert video files to ADK Part objects. Validates video files for existence, size limits, and MIME types before loading.

ATTRIBUTE DESCRIPTION
_logger

Structured logger for video operations.

TYPE: BoundLogger

Examples:

Load a single video:

service = VideoBlobService()
parts = await service.prepare_video_parts(["/data/lecture.mp4"])
assert len(parts) == 1

Load multiple videos:

paths = ["/data/intro.mp4", "/data/main.mp4"]
parts = await service.prepare_video_parts(paths)
assert len(parts) == 2

Handle validation errors:

from gepa_adk.domain.exceptions import VideoValidationError

try:
    parts = await service.prepare_video_parts(["/missing.mp4"])
except VideoValidationError as e:
    print(f"Invalid: {e.video_path}")
Note

Adapter implements VideoBlobServiceProtocol for dependency injection and testing. All ADK-specific Part creation is encapsulated here.

Source code in src/gepa_adk/adapters/video_blob_service.py
class VideoBlobService:
    """Video blob loading service for multimodal content.

    Implements VideoBlobServiceProtocol to convert video files to ADK Part
    objects. Validates video files for existence, size limits, and MIME types
    before loading.

    Attributes:
        _logger (BoundLogger): Structured logger for video operations.

    Examples:
        Load a single video:

        ```python
        service = VideoBlobService()
        parts = await service.prepare_video_parts(["/data/lecture.mp4"])
        assert len(parts) == 1
        ```

        Load multiple videos:

        ```python
        paths = ["/data/intro.mp4", "/data/main.mp4"]
        parts = await service.prepare_video_parts(paths)
        assert len(parts) == 2
        ```

        Handle validation errors:

        ```python
        from gepa_adk.domain.exceptions import VideoValidationError

        try:
            parts = await service.prepare_video_parts(["/missing.mp4"])
        except VideoValidationError as e:
            print(f"Invalid: {e.video_path}")
        ```

    Note:
        Adapter implements VideoBlobServiceProtocol for dependency injection
        and testing. All ADK-specific Part creation is encapsulated here.
    """

    def __init__(self) -> None:
        """Initialize VideoBlobService.

        Examples:
            Default initialization:

            ```python
            service = VideoBlobService()
            ```

        Note:
            Creates a service instance with a bound logger for video
            operations. No external dependencies are required.
        """
        self._logger = logger.bind(component="VideoBlobService")

    def _detect_mime_type(self, video_path: str) -> str:
        """Detect MIME type of a video file.

        Uses Python's mimetypes module to guess the MIME type based on
        file extension.

        Args:
            video_path: Path to the video file.

        Returns:
            Detected MIME type string (e.g., "video/mp4").

        Note:
            Scans file extension only, not file content. Falls back to
            "application/octet-stream" for unknown types.
        """
        mime_type, _ = mimetypes.guess_type(video_path)
        return mime_type or "application/octet-stream"

    def validate_video_file(self, video_path: str) -> VideoFileInfo:
        """Validate a video file and return its metadata.

        Checks that the file exists, is within size limits, and has
        a valid video MIME type.

        Args:
            video_path: Absolute path to the video file to validate.

        Returns:
            VideoFileInfo containing validated metadata.

        Raises:
            VideoValidationError: If validation fails.

        Examples:
            Validate a video file:

            ```python
            info = service.validate_video_file("/data/video.mp4")
            print(f"Size: {info.size_bytes}")
            ```

        Note:
            Operates synchronously for fast pre-validation. File content
            is not read, only metadata is checked.
        """
        path = Path(video_path)

        # Check file existence
        if not path.exists():
            self._logger.warning(
                "video.validation_failed",
                video_path=video_path,
                reason="file_not_found",
            )
            raise VideoValidationError(
                f"Video file not found: {video_path}",
                video_path=video_path,
                constraint="file must exist",
            )

        # Check file is not a directory
        if not path.is_file():
            self._logger.warning(
                "video.validation_failed",
                video_path=video_path,
                reason="not_a_file",
            )
            raise VideoValidationError(
                f"Path is not a file: {video_path}",
                video_path=video_path,
                constraint="path must be a file",
            )

        # Check file size
        size_bytes = path.stat().st_size
        if size_bytes > MAX_VIDEO_SIZE_BYTES:
            self._logger.warning(
                "video.validation_failed",
                video_path=video_path,
                reason="file_too_large",
                size_bytes=size_bytes,
                max_bytes=MAX_VIDEO_SIZE_BYTES,
            )
            raise VideoValidationError(
                f"Video exceeds 2GB limit: {size_bytes} bytes",
                video_path=video_path,
                constraint="size <= 2GB",
            )

        # Check MIME type
        mime_type = self._detect_mime_type(video_path)
        if not mime_type.startswith("video/"):
            self._logger.warning(
                "video.validation_failed",
                video_path=video_path,
                reason="invalid_mime_type",
                mime_type=mime_type,
            )
            raise VideoValidationError(
                f"Not a video file: {mime_type}",
                video_path=video_path,
                constraint="must be video/* MIME type",
            )

        self._logger.debug(
            "video.validated",
            video_path=video_path,
            size_bytes=size_bytes,
            mime_type=mime_type,
        )

        return VideoFileInfo(
            path=video_path,
            size_bytes=size_bytes,
            mime_type=mime_type,
        )

    async def _load_video_bytes(self, video_path: str) -> bytes:
        """Load video file bytes asynchronously.

        Reads the file content in a thread pool to avoid blocking
        the event loop.

        Args:
            video_path: Path to the video file.

        Returns:
            Raw bytes of the video file.

        Note:
            Spawns blocking file read in thread pool via run_in_executor
            to prevent blocking the event loop during large file reads.
        """
        loop = asyncio.get_running_loop()

        def read_file() -> bytes:
            with open(video_path, "rb") as f:
                return f.read()

        return await loop.run_in_executor(None, read_file)

    async def prepare_video_parts(self, video_paths: list[str]) -> list[Part]:
        """Load video files and create Part objects for multimodal content.

        Validates and loads all video files, converting them to ADK Part
        objects with inline video data.

        Args:
            video_paths: List of absolute paths to video files.

        Returns:
            List of Part objects, one per input path. Order is preserved.

        Raises:
            ValueError: If video_paths is empty.
            VideoValidationError: If any file fails validation.

        Examples:
            Load videos:

            ```python
            parts = await service.prepare_video_parts(
                [
                    "/data/intro.mp4",
                    "/data/main.mp4",
                ]
            )
            assert len(parts) == 2
            ```

        Note:
            Operations include async file I/O. Videos are validated before
            loading to provide early failure with clear error messages.
        """
        if not video_paths:
            raise ValueError("video_paths cannot be empty")

        self._logger.info(
            "video.prepare_start",
            video_count=len(video_paths),
        )

        parts: list[Part] = []

        for video_path in video_paths:
            # Validate first
            info = self.validate_video_file(video_path)

            # Load video bytes
            video_bytes = await self._load_video_bytes(video_path)

            # Create Part with inline data
            part = Part.from_bytes(
                data=video_bytes,
                mime_type=info.mime_type,
            )
            parts.append(part)

            self._logger.debug(
                "video.loaded",
                video_path=video_path,
                size_bytes=info.size_bytes,
                mime_type=info.mime_type,
            )

        self._logger.info(
            "video.prepare_complete",
            video_count=len(parts),
        )

        return parts

__init__

__init__() -> None

Initialize VideoBlobService.

Examples:

Default initialization:

service = VideoBlobService()
Note

Creates a service instance with a bound logger for video operations. No external dependencies are required.

Source code in src/gepa_adk/adapters/video_blob_service.py
def __init__(self) -> None:
    """Initialize VideoBlobService.

    Examples:
        Default initialization:

        ```python
        service = VideoBlobService()
        ```

    Note:
        Creates a service instance with a bound logger for video
        operations. No external dependencies are required.
    """
    self._logger = logger.bind(component="VideoBlobService")

validate_video_file

validate_video_file(video_path: str) -> VideoFileInfo

Validate a video file and return its metadata.

Checks that the file exists, is within size limits, and has a valid video MIME type.

PARAMETER DESCRIPTION
video_path

Absolute path to the video file to validate.

TYPE: str

RETURNS DESCRIPTION
VideoFileInfo

VideoFileInfo containing validated metadata.

RAISES DESCRIPTION
VideoValidationError

If validation fails.

Examples:

Validate a video file:

info = service.validate_video_file("/data/video.mp4")
print(f"Size: {info.size_bytes}")
Note

Operates synchronously for fast pre-validation. File content is not read, only metadata is checked.

Source code in src/gepa_adk/adapters/video_blob_service.py
def validate_video_file(self, video_path: str) -> VideoFileInfo:
    """Validate a video file and return its metadata.

    Checks that the file exists, is within size limits, and has
    a valid video MIME type.

    Args:
        video_path: Absolute path to the video file to validate.

    Returns:
        VideoFileInfo containing validated metadata.

    Raises:
        VideoValidationError: If validation fails.

    Examples:
        Validate a video file:

        ```python
        info = service.validate_video_file("/data/video.mp4")
        print(f"Size: {info.size_bytes}")
        ```

    Note:
        Operates synchronously for fast pre-validation. File content
        is not read, only metadata is checked.
    """
    path = Path(video_path)

    # Check file existence
    if not path.exists():
        self._logger.warning(
            "video.validation_failed",
            video_path=video_path,
            reason="file_not_found",
        )
        raise VideoValidationError(
            f"Video file not found: {video_path}",
            video_path=video_path,
            constraint="file must exist",
        )

    # Check file is not a directory
    if not path.is_file():
        self._logger.warning(
            "video.validation_failed",
            video_path=video_path,
            reason="not_a_file",
        )
        raise VideoValidationError(
            f"Path is not a file: {video_path}",
            video_path=video_path,
            constraint="path must be a file",
        )

    # Check file size
    size_bytes = path.stat().st_size
    if size_bytes > MAX_VIDEO_SIZE_BYTES:
        self._logger.warning(
            "video.validation_failed",
            video_path=video_path,
            reason="file_too_large",
            size_bytes=size_bytes,
            max_bytes=MAX_VIDEO_SIZE_BYTES,
        )
        raise VideoValidationError(
            f"Video exceeds 2GB limit: {size_bytes} bytes",
            video_path=video_path,
            constraint="size <= 2GB",
        )

    # Check MIME type
    mime_type = self._detect_mime_type(video_path)
    if not mime_type.startswith("video/"):
        self._logger.warning(
            "video.validation_failed",
            video_path=video_path,
            reason="invalid_mime_type",
            mime_type=mime_type,
        )
        raise VideoValidationError(
            f"Not a video file: {mime_type}",
            video_path=video_path,
            constraint="must be video/* MIME type",
        )

    self._logger.debug(
        "video.validated",
        video_path=video_path,
        size_bytes=size_bytes,
        mime_type=mime_type,
    )

    return VideoFileInfo(
        path=video_path,
        size_bytes=size_bytes,
        mime_type=mime_type,
    )

prepare_video_parts async

prepare_video_parts(video_paths: list[str]) -> list[Part]

Load video files and create Part objects for multimodal content.

Validates and loads all video files, converting them to ADK Part objects with inline video data.

PARAMETER DESCRIPTION
video_paths

List of absolute paths to video files.

TYPE: list[str]

RETURNS DESCRIPTION
list[Part]

List of Part objects, one per input path. Order is preserved.

RAISES DESCRIPTION
ValueError

If video_paths is empty.

VideoValidationError

If any file fails validation.

Examples:

Load videos:

parts = await service.prepare_video_parts(
    [
        "/data/intro.mp4",
        "/data/main.mp4",
    ]
)
assert len(parts) == 2
Note

Operations include async file I/O. Videos are validated before loading to provide early failure with clear error messages.
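
An empty path list is rejected before any file I/O; a short sketch of the guard shown in the source below:

try:
    await service.prepare_video_parts([])
except ValueError:
    pass  # "video_paths cannot be empty"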

Source code in src/gepa_adk/adapters/video_blob_service.py
async def prepare_video_parts(self, video_paths: list[str]) -> list[Part]:
    """Load video files and create Part objects for multimodal content.

    Validates and loads all video files, converting them to ADK Part
    objects with inline video data.

    Args:
        video_paths: List of absolute paths to video files.

    Returns:
        List of Part objects, one per input path. Order is preserved.

    Raises:
        ValueError: If video_paths is empty.
        VideoValidationError: If any file fails validation.

    Examples:
        Load videos:

        ```python
        parts = await service.prepare_video_parts(
            [
                "/data/intro.mp4",
                "/data/main.mp4",
            ]
        )
        assert len(parts) == 2
        ```

    Note:
        Operations include async file I/O. Videos are validated before
        loading to provide early failure with clear error messages.
    """
    if not video_paths:
        raise ValueError("video_paths cannot be empty")

    self._logger.info(
        "video.prepare_start",
        video_count=len(video_paths),
    )

    parts: list[Part] = []

    for video_path in video_paths:
        # Validate first
        info = self.validate_video_file(video_path)

        # Load video bytes
        video_bytes = await self._load_video_bytes(video_path)

        # Create Part with inline data
        part = Part.from_bytes(
            data=video_bytes,
            mime_type=info.mime_type,
        )
        parts.append(part)

        self._logger.debug(
            "video.loaded",
            video_path=video_path,
            size_bytes=info.size_bytes,
            mime_type=info.mime_type,
        )

    self._logger.info(
        "video.prepare_complete",
        video_count=len(parts),
    )

    return parts

create_candidate_selector

create_candidate_selector(
    selector_type: str,
    *,
    epsilon: float = 0.1,
    rng: Random | None = None,
) -> CandidateSelectorProtocol

Create a candidate selector by name.

PARAMETER DESCRIPTION
selector_type

Selector identifier (pareto, greedy, epsilon_greedy).

TYPE: str

epsilon

Exploration rate for epsilon-greedy selector.

TYPE: float DEFAULT: 0.1

rng

Optional RNG for selectors using randomness.

TYPE: Random | None DEFAULT: None

RETURNS DESCRIPTION
CandidateSelectorProtocol

CandidateSelectorProtocol implementation.

RAISES DESCRIPTION
ConfigurationError

If selector_type is unsupported.

Examples:

selector = create_candidate_selector("pareto")
candidate_idx = await selector.select_candidate(state)
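
A hedged sketch of the epsilon-greedy variant with a seeded RNG (values illustrative):

import random

selector = create_candidate_selector(
    "epsilon_greedy",
    epsilon=0.2,
    rng=random.Random(42),  # seed for reproducible exploration
)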
Source code in src/gepa_adk/adapters/candidate_selector.py
def create_candidate_selector(
    selector_type: str,
    *,
    epsilon: float = 0.1,
    rng: random.Random | None = None,
) -> CandidateSelectorProtocol:
    """Create a candidate selector by name.

    Args:
        selector_type: Selector identifier (pareto, greedy, epsilon_greedy).
        epsilon: Exploration rate for epsilon-greedy selector.
        rng: Optional RNG for selectors using randomness.

    Returns:
        CandidateSelectorProtocol implementation.

    Raises:
        ConfigurationError: If selector_type is unsupported.

    Examples:
        ```python
        selector = create_candidate_selector("pareto")
        candidate_idx = await selector.select_candidate(state)
        ```
    """
    normalized = selector_type.strip().lower()
    if normalized in {"pareto"}:
        return ParetoCandidateSelector(rng=rng)
    if normalized in {"greedy", "current_best", "current-best"}:
        return CurrentBestCandidateSelector()
    if normalized in {"epsilon_greedy", "epsilon-greedy"}:
        return EpsilonGreedyCandidateSelector(epsilon=epsilon, rng=rng)
    raise ConfigurationError(
        "selector_type must be one of pareto, greedy, epsilon_greedy",
        field="selector_type",
        value=selector_type,
        constraint="pareto|greedy|epsilon_greedy",
    )

get_handler

get_handler(name: str) -> ComponentHandler

Get handler from default registry.

PARAMETER DESCRIPTION
name

Component name to look up.

TYPE: str

RETURNS DESCRIPTION
ComponentHandler

The registered ComponentHandler.

RAISES DESCRIPTION
ValueError

If name is empty or None.

KeyError

If no handler registered for name.

Examples:

handler = get_handler("instruction")
original = handler.apply(agent, "New instruction")
See Also
  • [ComponentHandlerRegistry.get()]: Underlying registry method.
Note

Shortcut for component_handlers.get(name).
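
Lookups for unregistered names raise KeyError, so callers can guard explicitly (component name hypothetical):

try:
    handler = get_handler("nonexistent_component")
except KeyError:
    handler = None  # no handler registered under that name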

Source code in src/gepa_adk/adapters/component_handlers.py
def get_handler(name: str) -> ComponentHandler:
    """Get handler from default registry.

    Args:
        name: Component name to look up.

    Returns:
        The registered ComponentHandler.

    Raises:
        ValueError: If name is empty or None.
        KeyError: If no handler registered for name.

    Examples:
        ```python
        handler = get_handler("instruction")
        original = handler.apply(agent, "New instruction")
        ```

    See Also:
        - [`ComponentHandlerRegistry.get()`]: Underlying registry method.

    Note:
        Shortcut for component_handlers.get(name).
    """
    return component_handlers.get(name)

register_handler

register_handler(
    name: str, handler: ComponentHandler
) -> None

Register handler in default registry.

PARAMETER DESCRIPTION
name

Component name to register.

TYPE: str

handler

Handler implementing ComponentHandler protocol.

TYPE: ComponentHandler

RAISES DESCRIPTION
ValueError

If name is empty or None.

TypeError

If handler doesn't implement ComponentHandler protocol.

Examples:

register_handler("my_component", MyHandler())
handler = get_handler("my_component")
See Also
  • [ComponentHandlerRegistry.register()]: Underlying registry method.
Note

Shortcut for component_handlers.register(name, handler).

Source code in src/gepa_adk/adapters/component_handlers.py
def register_handler(name: str, handler: ComponentHandler) -> None:
    """Register handler in default registry.

    Args:
        name: Component name to register.
        handler: Handler implementing ComponentHandler protocol.

    Raises:
        ValueError: If name is empty or None.
        TypeError: If handler doesn't implement ComponentHandler protocol.

    Examples:
        ```python
        register_handler("my_component", MyHandler())
        handler = get_handler("my_component")
        ```

    See Also:
        - [`ComponentHandlerRegistry.register()`]: Underlying registry method.

    Note:
        Shortcut for component_handlers.register(name, handler).
    """
    component_handlers.register(name, handler)

create_component_selector

create_component_selector(
    selector_type: str,
) -> ComponentSelectorProtocol

Create a component selector strategy from a string alias.

PARAMETER DESCRIPTION
selector_type

Name of the selector strategy. Supported values:

- 'round_robin', 'roundrobin': Round-robin cycling.
- 'all', 'all_components': All components simultaneously.

TYPE: str

RETURNS DESCRIPTION
ComponentSelectorProtocol

Instance of requested component selector.

RAISES DESCRIPTION
ValueError

If selector_type is unknown.

Examples:

# Create round-robin selector
selector = create_component_selector("round_robin")

# Create all-components selector
selector = create_component_selector("all")
Note

Supports flexible string aliases with normalization for common variations (underscores, hyphens, case-insensitive).
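
A short sketch of that normalization (spellings illustrative):

selector = create_component_selector("Round-Robin")  # normalized to "roundrobin"
selector = create_component_selector("all_components")  # AllComponentSelector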

Source code in src/gepa_adk/adapters/component_selector.py
def create_component_selector(selector_type: str) -> ComponentSelectorProtocol:
    """Create a component selector strategy from a string alias.

    Args:
        selector_type: Name of the selector strategy.
            Supported values:
            - 'round_robin', 'roundrobin': Round-robin cycling.
            - 'all', 'all_components': All components simultaneously.

    Returns:
        Instance of requested component selector.

    Raises:
        ValueError: If selector_type is unknown.

    Examples:
        ```python
        # Create round-robin selector
        selector = create_component_selector("round_robin")

        # Create all-components selector
        selector = create_component_selector("all")
        ```

    Note:
        Supports flexible string aliases with normalization for common
        variations (underscores, hyphens, case-insensitive).
    """
    normalized = selector_type.lower().replace("_", "").replace("-", "")

    if normalized == "roundrobin":
        return RoundRobinComponentSelector()
    elif normalized in ("all", "allcomponents"):
        return AllComponentSelector()

    raise ValueError(f"Unknown component selector: {selector_type}")

normalize_feedback

normalize_feedback(
    score: float, metadata: dict[str, Any] | None
) -> dict[str, Any]

Normalize critic feedback to consistent trial format.

Converts both simple and advanced critic outputs to a standardized format for use in trial records. This enables the reflection agent to receive consistent feedback regardless of which critic schema was used.

PARAMETER DESCRIPTION
score

The numeric score from the critic (0.0-1.0).

TYPE: float

metadata

Optional metadata dict from critic output. May contain:

- feedback (str): Simple feedback text
- dimension_scores (dict): Per-dimension scores
- actionable_guidance (str): Improvement suggestions
- Any additional fields from critic output

TYPE: dict[str, Any] | None

RETURNS DESCRIPTION
dict[str, Any]

Normalized feedback dict with structure:

{
    "score": 0.75,
    "feedback_text": "Main feedback message",
    "dimension_scores": {...},  # Optional
    "actionable_guidance": "...",  # Optional
}

Examples:

Normalize simple feedback:

normalized = normalize_feedback(0.8, {"feedback": "Good job"})
# {"score": 0.8, "feedback_text": "Good job"}

Normalize advanced feedback:

normalized = normalize_feedback(
    0.6,
    {
        "feedback": "Needs work",
        "dimension_scores": {"clarity": 0.5},
        "actionable_guidance": "Add examples",
    },
)
# {
#     "score": 0.6,
#     "feedback_text": "Needs work",
#     "dimension_scores": {"clarity": 0.5},
#     "actionable_guidance": "Add examples",
# }

Handle missing feedback:

normalized = normalize_feedback(0.5, None)
# {"score": 0.5, "feedback_text": ""}
Note

Supports both SimpleCriticOutput and CriticOutput schemas for flexible critic integration. Extracts the "feedback" field and renames it to "feedback_text" for consistent trial structure. Additional fields like dimension_scores are preserved when present.
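
As the source below shows, a pre-normalized "feedback_text" key is also accepted and takes precedence over "feedback" (values illustrative):

normalized = normalize_feedback(0.9, {"feedback_text": "Concise and correct"})
# {"score": 0.9, "feedback_text": "Concise and correct"}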

Source code in src/gepa_adk/adapters/critic_scorer.py
def normalize_feedback(
    score: float,
    metadata: dict[str, Any] | None,
) -> dict[str, Any]:
    """Normalize critic feedback to consistent trial format.

    Converts both simple and advanced critic outputs to a standardized
    format for use in trial records. This enables the reflection agent
    to receive consistent feedback regardless of which critic schema
    was used.

    Args:
        score: The numeric score from the critic (0.0-1.0).
        metadata: Optional metadata dict from critic output. May contain:
            - feedback (str): Simple feedback text
            - dimension_scores (dict): Per-dimension scores
            - actionable_guidance (str): Improvement suggestions
            - Any additional fields from critic output

    Returns:
        Normalized feedback dict with structure:
        ```python
        {
            "score": 0.75,
            "feedback_text": "Main feedback message",
            "dimension_scores": {...},  # Optional
            "actionable_guidance": "...",  # Optional
        }
        ```

    Examples:
        Normalize simple feedback:

        ```python
        normalized = normalize_feedback(0.8, {"feedback": "Good job"})
        # {"score": 0.8, "feedback_text": "Good job"}
        ```

        Normalize advanced feedback:

        ```python
        normalized = normalize_feedback(
            0.6,
            {
                "feedback": "Needs work",
                "dimension_scores": {"clarity": 0.5},
                "actionable_guidance": "Add examples",
            },
        )
        # {
        #     "score": 0.6,
        #     "feedback_text": "Needs work",
        #     "dimension_scores": {"clarity": 0.5},
        #     "actionable_guidance": "Add examples",
        # }
        ```

        Handle missing feedback:

        ```python
        normalized = normalize_feedback(0.5, None)
        # {"score": 0.5, "feedback_text": ""}
        ```

    Note:
        Supports both SimpleCriticOutput and CriticOutput schemas for flexible
        critic integration. Extracts the "feedback" field and renames it to
        "feedback_text" for consistent trial structure. Additional fields
        like dimension_scores are preserved when present.
    """
    result: dict[str, Any] = {"score": score}

    if metadata is None:
        result["feedback_text"] = ""
        return result

    # Extract feedback text - handle both "feedback" and "feedback_text" keys
    feedback_text = metadata.get("feedback_text") or metadata.get("feedback") or ""
    if isinstance(feedback_text, str) and feedback_text.strip():
        result["feedback_text"] = feedback_text.strip()
    else:
        result["feedback_text"] = ""

    # Preserve dimension_scores if present
    dimension_scores = metadata.get("dimension_scores")
    if dimension_scores and isinstance(dimension_scores, dict):
        result["dimension_scores"] = dimension_scores

    # Preserve actionable_guidance if present
    actionable_guidance = metadata.get("actionable_guidance")
    if actionable_guidance and isinstance(actionable_guidance, str):
        guidance_str = actionable_guidance.strip()
        if guidance_str:
            result["actionable_guidance"] = guidance_str

    return result

find_llm_agents

find_llm_agents(
    agent: object,
    max_depth: int = 5,
    current_depth: int = 0,
) -> list["LlmAgent"]

Find all LlmAgents in a workflow (recursive traversal with depth limiting).

Traverses a workflow agent structure recursively to discover all LlmAgent instances at any nesting level, up to the specified maximum depth.

PARAMETER DESCRIPTION
agent

Agent or workflow to search. Can be LlmAgent, workflow agent, or any object.

TYPE: object

max_depth

Maximum recursion depth (default: 5). Traversal stops once current_depth exceeds max_depth, so agents at depth max_depth are still processed. Must be >= 1 for meaningful results.

TYPE: int DEFAULT: 5

current_depth

Current recursion level (internal use, default: 0).

TYPE: int DEFAULT: 0

RETURNS DESCRIPTION
list['LlmAgent']

List of LlmAgent instances found. Only includes agents with string instructions (skips InstructionProvider callables).

Examples:

Finding LlmAgents in a SequentialAgent:

from google.adk.agents import LlmAgent, SequentialAgent
from gepa_adk.adapters.workflow import find_llm_agents

agent1 = LlmAgent(name="agent1", instruction="First")
agent2 = LlmAgent(name="agent2", instruction="Second")
workflow = SequentialAgent(name="pipeline", sub_agents=[agent1, agent2])

agents = find_llm_agents(workflow)
assert len(agents) == 2

Finding LlmAgents in nested workflows:

# Sequential -> Parallel -> LlmAgents
nested_parallel = ParallelAgent(name="parallel", sub_agents=[agent2, agent3])
workflow = SequentialAgent(
    name="pipeline", sub_agents=[agent1, nested_parallel]
)

agents = find_llm_agents(workflow, max_depth=5)
assert len(agents) == 3  # Finds all agents across levels
Note

Operates recursively with depth limiting to discover nested LlmAgents. Skips LlmAgents with InstructionProvider callables (non-string instructions). Respects max_depth to prevent infinite recursion.
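
As a hedged illustration of the depth limit (continuing the agent names from the examples above): direct children sit at depth 1, and agents inside a nested workflow sit at depth 2.

from google.adk.agents import ParallelAgent, SequentialAgent

inner = ParallelAgent(name="inner", sub_agents=[agent2])
outer = SequentialAgent(name="outer", sub_agents=[agent1, inner])

find_llm_agents(outer, max_depth=1)  # [agent1]; agent2 is at depth 2, beyond the limit
find_llm_agents(outer, max_depth=2)  # [agent1, agent2]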

Source code in src/gepa_adk/adapters/workflow.py
def find_llm_agents(
    agent: object,
    max_depth: int = 5,
    current_depth: int = 0,
) -> list["LlmAgent"]:
    """Find all LlmAgents in a workflow (recursive traversal with depth limiting).

    Traverses a workflow agent structure recursively to discover all LlmAgent
    instances at any nesting level, up to the specified maximum depth.

    Args:
        agent: Agent or workflow to search. Can be LlmAgent, workflow agent,
            or any object.
        max_depth: Maximum recursion depth (default: 5). Traversal stops once
            current_depth exceeds max_depth, so agents at depth max_depth are
            still processed. Must be >= 1 for meaningful results.
        current_depth: Current recursion level (internal use, default: 0).

    Returns:
        List of LlmAgent instances found. Only includes agents with string
        instructions (skips InstructionProvider callables).

    Examples:
        Finding LlmAgents in a SequentialAgent:

        ```python
        from google.adk.agents import LlmAgent, SequentialAgent
        from gepa_adk.adapters.workflow import find_llm_agents

        agent1 = LlmAgent(name="agent1", instruction="First")
        agent2 = LlmAgent(name="agent2", instruction="Second")
        workflow = SequentialAgent(name="pipeline", sub_agents=[agent1, agent2])

        agents = find_llm_agents(workflow)
        assert len(agents) == 2
        ```

        Finding LlmAgents in nested workflows:

        ```python
        # Sequential -> Parallel -> LlmAgents
        nested_parallel = ParallelAgent(name="parallel", sub_agents=[agent2, agent3])
        workflow = SequentialAgent(
            name="pipeline", sub_agents=[agent1, nested_parallel]
        )

        agents = find_llm_agents(workflow, max_depth=5)
        assert len(agents) == 3  # Finds all agents across levels
        ```

    Note:
        Operates recursively with depth limiting to discover nested LlmAgents.
        Skips LlmAgents with InstructionProvider callables (non-string
        instructions). Respects max_depth to prevent infinite recursion.
    """
    from google.adk.agents import LlmAgent

    # Check depth limit first (before processing)
    # Use > instead of >= to allow processing at max_depth
    # e.g., with max_depth=3, we can process agents at depth 0, 1, 2, 3
    if current_depth > max_depth:
        logger.debug(
            "Max depth exceeded, stopping traversal",
            current_depth=current_depth,
            max_depth=max_depth,
        )
        return []

    # If it's an LlmAgent with string instruction, return it
    if isinstance(agent, LlmAgent):
        # Only include string instructions (skip InstructionProvider callables)
        if isinstance(agent.instruction, str):
            logger.debug(
                "Found LlmAgent",
                agent_name=agent.name,
                depth=current_depth,
            )
            return [agent]
        logger.warning(
            "Skipping LlmAgent with non-string instruction",
            agent_name=agent.name,
            instruction_type=type(agent.instruction).__name__,
            depth=current_depth,
        )
        return []

    # If it's a workflow agent, recursively search sub_agents
    if is_workflow_agent(agent):
        logger.debug(
            "Traversing workflow agent",
            workflow_name=getattr(agent, "name", "unknown"),
            workflow_type=type(agent).__name__,
            depth=current_depth,
            max_depth=max_depth,
        )
        agents: list[LlmAgent] = []
        # Recursive traversal: iterate sub_agents and recurse into nested workflows
        # Type narrowing: is_workflow_agent() ensures agent is SequentialAgent,
        # LoopAgent, or ParallelAgent. All inherit from BaseAgent with sub_agents.
        if isinstance(agent, (SequentialAgent, LoopAgent, ParallelAgent)):
            for sub_agent in agent.sub_agents:
                # Recursively search each sub-agent
                nested_agents = find_llm_agents(
                    sub_agent, max_depth=max_depth, current_depth=current_depth + 1
                )
                agents.extend(nested_agents)
        logger.debug(
            "Completed workflow traversal",
            workflow_name=getattr(agent, "name", "unknown"),
            agents_found=len(agents),
            depth=current_depth,
        )
        return agents

    # Not an agent or workflow - return empty list
    logger.debug(
        "Skipping non-agent object",
        object_type=type(agent).__name__,
        depth=current_depth,
    )
    return []
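
A minimal sketch of how the depth comparison above plays out in practice: with max_depth=0 the workflow itself is processed at depth 0, but its sub-agents at depth 1 are never reached, while the default limit of 5 finds them as usual:

from google.adk.agents import LlmAgent, SequentialAgent
from gepa_adk.adapters.workflow import find_llm_agents

agent1 = LlmAgent(name="agent1", instruction="First")
workflow = SequentialAgent(name="pipeline", sub_agents=[agent1])

# Sub-agents sit at depth 1, so a limit of 0 stops before reaching them.
assert find_llm_agents(workflow, max_depth=0) == []
assert len(find_llm_agents(workflow)) == 1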

is_workflow_agent

is_workflow_agent(agent: object) -> bool

Check if an agent is a workflow type.

Detects whether an agent is a workflow agent (SequentialAgent, LoopAgent, or ParallelAgent) versus a regular LlmAgent or other agent type.

PARAMETER DESCRIPTION
agent

Agent instance to check. Can be any object type.

TYPE: object

RETURNS DESCRIPTION
bool

True if agent is SequentialAgent, LoopAgent, or ParallelAgent; False otherwise (including LlmAgent, None, or non-agent objects).

Examples:

Detecting workflow agents:

from google.adk.agents import SequentialAgent, LlmAgent
from gepa_adk.adapters.workflow import is_workflow_agent

sequential = SequentialAgent(name="Pipeline", sub_agents=[])
assert is_workflow_agent(sequential) is True

llm = LlmAgent(name="Agent", instruction="Be helpful")
assert is_workflow_agent(llm) is False
Note

Only workflow agent types (SequentialAgent, LoopAgent, ParallelAgent) are detected. All workflow agents inherit from BaseAgent and have sub_agents, but type detection uses specific class checks for accuracy.
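
A short sketch of the fallback behavior for inputs that are not agents at all, which follows from the return contract above:

from gepa_adk.adapters.workflow import is_workflow_agent

# Non-agent objects (and None) are reported as not-workflow rather than raising.
assert is_workflow_agent(None) is False
assert is_workflow_agent("not an agent") is False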

Source code in src/gepa_adk/adapters/workflow.py
def is_workflow_agent(agent: object) -> bool:
    """Check if an agent is a workflow type.

    Detects whether an agent is a workflow agent (SequentialAgent, LoopAgent,
    or ParallelAgent) versus a regular LlmAgent or other agent type.

    Args:
        agent: Agent instance to check. Can be any object type.

    Returns:
        True if agent is SequentialAgent, LoopAgent, or ParallelAgent.
        False otherwise (including LlmAgent, None, or non-agent objects).

    Examples:
        Detecting workflow agents:

        ```python
        from google.adk.agents import SequentialAgent, LlmAgent
        from gepa_adk.adapters.workflow import is_workflow_agent

        sequential = SequentialAgent(name="Pipeline", sub_agents=[])
        assert is_workflow_agent(sequential) is True

        llm = LlmAgent(name="Agent", instruction="Be helpful")
        assert is_workflow_agent(llm) is False
        ```

    Note:
        Only workflow agent types (SequentialAgent, LoopAgent, ParallelAgent)
        are detected. All workflow agents inherit from BaseAgent and have
        sub_agents, but type detection uses specific class checks for accuracy.
    """
    return isinstance(agent, (SequentialAgent, LoopAgent, ParallelAgent))