Genealogy

genealogy ¶

Genealogy tracking and ancestor queries for merge operations.

This module provides functions for tracking parent-child relationships between candidates and finding common ancestors, enabling merge operations that combine improvements from different evolutionary branches.

ATTRIBUTE	DESCRIPTION
`get_ancestors`	Return all ancestor indices for a candidate. TYPE: `function`
`find_common_ancestor`	Find the most recent common ancestor of two candidates. TYPE: `function`

Examples:

Getting all ancestors of a candidate:

from gepa_adk.engine.genealogy import get_ancestors

parent_indices = {0: [None], 1: [0], 2: [1]}
ancestors = get_ancestors(2, parent_indices)
# Returns: {0, 1}

Finding common ancestor:

from gepa_adk.engine.genealogy import find_common_ancestor

parent_indices = {0: [None], 1: [0], 2: [0], 3: [1, 2]}
ancestor = find_common_ancestor(1, 2, parent_indices)
# Returns: 0

Note

This module provides genealogy tracking functions for merge operations. Genealogy tracking enables merge operations by identifying which candidates share common ancestry and can be safely combined.

get_ancestors ¶

get_ancestors(
    candidate_idx: int,
    parent_indices: dict[int, list[int | None]],
) -> set[int]

Return all ancestor indices for a candidate.

Traverses the genealogy tree using breadth-first search to find all ancestors transitively. Seed candidates (with [None] parents) have no ancestors.

PARAMETER	DESCRIPTION
`candidate_idx`	Index of the candidate to trace. TYPE: `int`
`parent_indices`	Mapping of candidate index to parent indices list. TYPE: `dict[int, list[int \| None]]`

RETURNS	DESCRIPTION
`set[int]`	Set of all ancestor candidate indices (excluding the candidate itself).

Examples:

Simple linear genealogy:

parent_indices = {0: [None], 1: [0], 2: [1]}
ancestors = get_ancestors(2, parent_indices)
# Returns: {0, 1}

Merge candidate with two parents:

parent_indices = {0: [None], 1: [0], 2: [0], 3: [1, 2]}
ancestors = get_ancestors(3, parent_indices)
# Returns: {0, 1, 2}

Note

Operations use BFS to avoid recursion depth issues with deep genealogies. Prevents cycles by tracking visited nodes.

Source code in src/gepa_adk/engine/genealogy.py

def get_ancestors(
    candidate_idx: int, parent_indices: dict[int, list[int | None]]
) -> set[int]:
    """Return all ancestor indices for a candidate.

    Traverses the genealogy tree using breadth-first search to find all
    ancestors transitively. Seed candidates (with [None] parents) have
    no ancestors.

    Args:
        candidate_idx: Index of the candidate to trace.
        parent_indices: Mapping of candidate index to parent indices list.

    Returns:
        Set of all ancestor candidate indices (excluding the candidate itself).

    Examples:
        Simple linear genealogy:

        ```python
        parent_indices = {0: [None], 1: [0], 2: [1]}
        ancestors = get_ancestors(2, parent_indices)
        # Returns: {0, 1}
        ```

        Merge candidate with two parents:

        ```python
        parent_indices = {0: [None], 1: [0], 2: [0], 3: [1, 2]}
        ancestors = get_ancestors(3, parent_indices)
        # Returns: {0, 1, 2}
        ```

    Note:
        Operations use BFS to avoid recursion depth issues with deep genealogies.
        Prevents cycles by tracking visited nodes.
    """
    ancestors: set[int] = set()
    visited: set[int] = {candidate_idx}
    queue: deque[int] = deque([candidate_idx])

    while queue:
        current = queue.popleft()
        parents = parent_indices.get(current, [])

        for parent in parents:
            if parent is not None and parent not in visited:
                ancestors.add(parent)
                visited.add(parent)
                queue.append(parent)

    logger.debug(
        "genealogy.ancestors_found",
        candidate_idx=candidate_idx,
        ancestor_count=len(ancestors),
        ancestors=sorted(ancestors),
    )

    return ancestors

find_common_ancestor ¶

find_common_ancestor(
    idx1: int,
    idx2: int,
    parent_indices: dict[int, list[int | None]],
) -> int | None

Find the most recent common ancestor of two candidates.

Identifies the common ancestor with the highest index (most recent) between two candidates. Returns None if no common ancestor exists (separate lineages).

PARAMETER	DESCRIPTION
`idx1`	First candidate index. TYPE: `int`
`idx2`	Second candidate index. TYPE: `int`
`parent_indices`	Mapping of candidate index to parent indices list. TYPE: `dict[int, list[int \| None]]`

RETURNS	DESCRIPTION
`int \| None`	Index of the most recent common ancestor, or None if no common ancestor exists.

Examples:

Candidates sharing a seed:

parent_indices = {0: [None], 1: [0], 2: [0]}
ancestor = find_common_ancestor(1, 2, parent_indices)
# Returns: 0

One candidate is ancestor of the other:

parent_indices = {0: [None], 1: [0], 2: [1]}
ancestor = find_common_ancestor(1, 2, parent_indices)
# Returns: 1 (1 is ancestor of 2)

No common ancestor:

parent_indices = {0: [None], 1: [0], 2: [None], 3: [2]}
ancestor = find_common_ancestor(1, 3, parent_indices)
# Returns: None (separate lineages)

Note

Operations return the highest-indexed common ancestor to ensure we find the most recent shared ancestor, which is most useful for merge operations.

Source code in src/gepa_adk/engine/genealogy.py

def find_common_ancestor(
    idx1: int, idx2: int, parent_indices: dict[int, list[int | None]]
) -> int | None:
    """Find the most recent common ancestor of two candidates.

    Identifies the common ancestor with the highest index (most recent) between
    two candidates. Returns None if no common ancestor exists (separate lineages).

    Args:
        idx1: First candidate index.
        idx2: Second candidate index.
        parent_indices: Mapping of candidate index to parent indices list.

    Returns:
        Index of the most recent common ancestor, or None if no common ancestor exists.

    Examples:
        Candidates sharing a seed:

        ```python
        parent_indices = {0: [None], 1: [0], 2: [0]}
        ancestor = find_common_ancestor(1, 2, parent_indices)
        # Returns: 0
        ```

        One candidate is ancestor of the other:

        ```python
        parent_indices = {0: [None], 1: [0], 2: [1]}
        ancestor = find_common_ancestor(1, 2, parent_indices)
        # Returns: 1 (1 is ancestor of 2)
        ```

        No common ancestor:

        ```python
        parent_indices = {0: [None], 1: [0], 2: [None], 3: [2]}
        ancestor = find_common_ancestor(1, 3, parent_indices)
        # Returns: None (separate lineages)
        ```

    Note:
        Operations return the highest-indexed common ancestor to ensure we find the most
        recent shared ancestor, which is most useful for merge operations.
    """
    # If same candidate, return itself
    if idx1 == idx2:
        logger.debug(
            "genealogy.ancestor_found",
            idx1=idx1,
            idx2=idx2,
            ancestor_idx=idx1,
            relationship="same_candidate",
        )
        return idx1

    ancestors1 = get_ancestors(idx1, parent_indices)
    ancestors2 = get_ancestors(idx2, parent_indices)

    # Include the candidates themselves in case one is an ancestor of the other
    if idx1 in ancestors2:
        # idx1 is an ancestor of idx2
        logger.debug(
            "genealogy.ancestor_found",
            idx1=idx1,
            idx2=idx2,
            ancestor_idx=idx1,
            relationship="idx1_is_ancestor",
        )
        return idx1

    if idx2 in ancestors1:
        # idx2 is an ancestor of idx1
        logger.debug(
            "genealogy.ancestor_found",
            idx1=idx1,
            idx2=idx2,
            ancestor_idx=idx2,
            relationship="idx2_is_ancestor",
        )
        return idx2

    # Find common ancestors
    common = ancestors1 & ancestors2

    if not common:
        logger.debug(
            "genealogy.no_common_ancestor",
            idx1=idx1,
            idx2=idx2,
        )
        return None

    # Return most recent (highest index) common ancestor
    ancestor_idx = max(common)
    logger.debug(
        "genealogy.common_ancestor_found",
        idx1=idx1,
        idx2=idx2,
        ancestor_idx=ancestor_idx,
        all_common=sorted(common),
    )

    return ancestor_idx

filter_ancestors_by_score ¶

filter_ancestors_by_score(
    ancestors: set[int],
    candidate_scores: dict[int, dict[int, float]],
    min_avg_score: float,
) -> set[int]

Filter ancestors by minimum average score constraint.

Removes ancestors that don't meet the minimum average score requirement, ensuring only viable ancestors are considered for merge operations.

PARAMETER	DESCRIPTION
`ancestors`	Set of ancestor candidate indices to filter. TYPE: `set[int]`
`candidate_scores`	Mapping of candidate index to per-example scores. TYPE: `dict[int, dict[int, float]]`
`min_avg_score`	Minimum average score threshold. TYPE: `float`

RETURNS	DESCRIPTION
`set[int]`	Set of ancestor indices that meet the score constraint.

Examples:

Filtering ancestors by score:

ancestors = {0, 1, 2}
candidate_scores = {
    0: {0: 0.5, 1: 0.5},  # avg: 0.5
    1: {0: 0.7, 1: 0.7},  # avg: 0.7
    2: {0: 0.9, 1: 0.9},  # avg: 0.9
}
filtered = filter_ancestors_by_score(
    ancestors, candidate_scores, min_avg_score=0.6
)
# Returns: {1, 2} (0 filtered out)

Note

Operations exclude ancestors without scores from the result.

Source code in src/gepa_adk/engine/genealogy.py

def filter_ancestors_by_score(
    ancestors: set[int],
    candidate_scores: dict[int, dict[int, float]],
    min_avg_score: float,
) -> set[int]:
    """Filter ancestors by minimum average score constraint.

    Removes ancestors that don't meet the minimum average score requirement,
    ensuring only viable ancestors are considered for merge operations.

    Args:
        ancestors: Set of ancestor candidate indices to filter.
        candidate_scores: Mapping of candidate index to per-example scores.
        min_avg_score: Minimum average score threshold.

    Returns:
        Set of ancestor indices that meet the score constraint.

    Examples:
        Filtering ancestors by score:

        ```python
        ancestors = {0, 1, 2}
        candidate_scores = {
            0: {0: 0.5, 1: 0.5},  # avg: 0.5
            1: {0: 0.7, 1: 0.7},  # avg: 0.7
            2: {0: 0.9, 1: 0.9},  # avg: 0.9
        }
        filtered = filter_ancestors_by_score(
            ancestors, candidate_scores, min_avg_score=0.6
        )
        # Returns: {1, 2} (0 filtered out)
        ```

    Note:
        Operations exclude ancestors without scores from the result.
    """
    from statistics import fmean

    filtered: set[int] = set()

    for ancestor_idx in ancestors:
        scores = candidate_scores.get(ancestor_idx)
        if scores:
            avg_score = fmean(scores.values())
            if avg_score >= min_avg_score:
                filtered.add(ancestor_idx)

    logger.debug(
        "genealogy.ancestors_filtered",
        original_count=len(ancestors),
        filtered_count=len(filtered),
        min_avg_score=min_avg_score,
    )

    return filtered

detect_component_divergence ¶

detect_component_divergence(
    ancestor_components: dict[str, str],
    parent_components: dict[str, str],
) -> set[str]

Detect which components have diverged from ancestor to parent.

Identifies component keys where the parent's value differs from the ancestor's value, indicating where improvements or changes occurred.

PARAMETER	DESCRIPTION
`ancestor_components`	Component dictionary from ancestor candidate. TYPE: `dict[str, str]`
`parent_components`	Component dictionary from parent candidate. TYPE: `dict[str, str]`

RETURNS	DESCRIPTION
`set[str]`	Set of component keys that have diverged (changed values).

Examples:

Detecting divergence:

ancestor = {"instruction": "A", "output_schema": "B"}
parent = {"instruction": "A", "output_schema": "C"}
divergence = detect_component_divergence(ancestor, parent)
# Returns: {"output_schema"}

Note

Only checks components present in the ancestor. Components added by the parent are ignored. Missing components are not considered diverged.

Source code in src/gepa_adk/engine/genealogy.py

def detect_component_divergence(
    ancestor_components: dict[str, str],
    parent_components: dict[str, str],
) -> set[str]:
    """Detect which components have diverged from ancestor to parent.

    Identifies component keys where the parent's value differs from the
    ancestor's value, indicating where improvements or changes occurred.

    Args:
        ancestor_components: Component dictionary from ancestor candidate.
        parent_components: Component dictionary from parent candidate.

    Returns:
        Set of component keys that have diverged (changed values).

    Examples:
        Detecting divergence:

        ```python
        ancestor = {"instruction": "A", "output_schema": "B"}
        parent = {"instruction": "A", "output_schema": "C"}
        divergence = detect_component_divergence(ancestor, parent)
        # Returns: {"output_schema"}
        ```

    Note:
        Only checks components present in the ancestor. Components added by
        the parent are ignored. Missing components are not considered diverged.
    """
    diverged: set[str] = set()

    for key in ancestor_components:
        if key in parent_components:
            if ancestor_components[key] != parent_components[key]:
                diverged.add(key)

    logger.debug(
        "genealogy.component_divergence",
        diverged_components=sorted(diverged),
        total_components=len(ancestor_components),
    )

    return diverged

has_desirable_predictors ¶

has_desirable_predictors(
    ancestor_components: dict[str, str],
    parent1_components: dict[str, str],
    parent2_components: dict[str, str],
) -> bool

Check if merge has desirable complementary component changes.

A merge is desirable when parents have changed different components from the ancestor, indicating complementary improvements that can be combined.

PARAMETER	DESCRIPTION
`ancestor_components`	Component dictionary from common ancestor. TYPE: `dict[str, str]`
`parent1_components`	Component dictionary from first parent. TYPE: `dict[str, str]`
`parent2_components`	Component dictionary from second parent. TYPE: `dict[str, str]`

RETURNS	DESCRIPTION
`bool`	True if parents have complementary component changes, False otherwise.

Examples:

Complementary changes (desirable):

ancestor = {"instruction": "A", "output_schema": "B"}
parent1 = {"instruction": "A", "output_schema": "C"}  # output_schema changed
parent2 = {"instruction": "D", "output_schema": "B"}  # instruction changed
assert has_desirable_predictors(ancestor, parent1, parent2) is True

Overlapping changes (less desirable):

ancestor = {"instruction": "A", "output_schema": "B"}
parent1 = {"instruction": "C", "output_schema": "B"}
parent2 = {"instruction": "C", "output_schema": "B"}  # Same change
assert has_desirable_predictors(ancestor, parent1, parent2) is False

Note

Operations return False if no components have changed, or if both parents changed the same components identically.

Source code in src/gepa_adk/engine/genealogy.py

def has_desirable_predictors(
    ancestor_components: dict[str, str],
    parent1_components: dict[str, str],
    parent2_components: dict[str, str],
) -> bool:
    """Check if merge has desirable complementary component changes.

    A merge is desirable when parents have changed different components
    from the ancestor, indicating complementary improvements that can
    be combined.

    Args:
        ancestor_components: Component dictionary from common ancestor.
        parent1_components: Component dictionary from first parent.
        parent2_components: Component dictionary from second parent.

    Returns:
        True if parents have complementary component changes, False otherwise.

    Examples:
        Complementary changes (desirable):

        ```python
        ancestor = {"instruction": "A", "output_schema": "B"}
        parent1 = {"instruction": "A", "output_schema": "C"}  # output_schema changed
        parent2 = {"instruction": "D", "output_schema": "B"}  # instruction changed
        assert has_desirable_predictors(ancestor, parent1, parent2) is True
        ```

        Overlapping changes (less desirable):

        ```python
        ancestor = {"instruction": "A", "output_schema": "B"}
        parent1 = {"instruction": "C", "output_schema": "B"}
        parent2 = {"instruction": "C", "output_schema": "B"}  # Same change
        assert has_desirable_predictors(ancestor, parent1, parent2) is False
        ```

    Note:
        Operations return False if no components have changed, or if both parents
        changed the same components identically.
    """
    divergence1 = detect_component_divergence(ancestor_components, parent1_components)
    divergence2 = detect_component_divergence(ancestor_components, parent2_components)

    # No divergence means no desirable merge
    if not divergence1 or not divergence2:
        logger.debug(
            "genealogy.no_desirable_predictors",
            reason="no_divergence",
            divergence1=sorted(divergence1),
            divergence2=sorted(divergence2),
        )
        return False

    # Check if divergences are complementary (different components or different values)
    # If they overlap but have different values, that's still desirable
    if divergence1 != divergence2:
        # Different components changed - complementary
        logger.debug(
            "genealogy.desirable_predictors",
            divergence1=sorted(divergence1),
            divergence2=sorted(divergence2),
            reason="complementary_components",
        )
        return True

    # Same components changed - check if values differ
    for component in divergence1:
        if parent1_components.get(component) != parent2_components.get(component):
            # Same component, different values - still desirable
            logger.debug(
                "genealogy.desirable_predictors",
                divergence1=sorted(divergence1),
                divergence2=sorted(divergence2),
                reason="different_values",
            )
            return True

    # Same components, same values - not desirable
    logger.debug(
        "genealogy.no_desirable_predictors",
        reason="identical_changes",
        divergence1=sorted(divergence1),
        divergence2=sorted(divergence2),
    )
    return False