Schema utils

schema_utils

Schema utilities for output schema evolution.

This module provides serialization, validation, and deserialization utilities for evolving Pydantic output schemas as components.

The module enables the evolution engine to:

1. Serialize Pydantic BaseModel classes to Python source text
2. Validate proposed schema mutations for correctness and security
3. Deserialize validated schema text back to usable Pydantic models

ATTRIBUTE DESCRIPTION
SCHEMA_NAMESPACE

Controlled namespace for schema execution containing Pydantic types, built-in types, and typing constructs.

TYPE: dict

SchemaValidationResult

Dataclass containing validation results and metadata about the deserialized schema.

TYPE: class

serialize_pydantic_schema

Convert a Pydantic model class to Python source code text.

TYPE: function

validate_schema_text

Validate schema text and return the deserialized class with metadata.

TYPE: function

deserialize_schema

Convenience wrapper to deserialize schema text directly to a Pydantic class.

TYPE: function

Examples:

Basic round-trip workflow:

from pydantic import BaseModel
from gepa_adk.utils.schema_utils import (
    serialize_pydantic_schema,
    deserialize_schema,
)


class MySchema(BaseModel):
    name: str
    value: int


# Serialize for evolution
text = serialize_pydantic_schema(MySchema)

# After evolution, deserialize back
EvolvedSchema = deserialize_schema(evolved_text)
Note

The validation pipeline uses AST parsing before controlled exec() to prevent arbitrary code execution. Import statements and function definitions are explicitly rejected.
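The rejection of imports and function definitions can be sketched as a standalone AST walk. This is a simplified illustration of the technique, not the module's actual rule set; `FORBIDDEN_NODES` and the error message are assumptions:

```python
import ast

# Node types a controlled schema pipeline can reject outright
# (a sketch of the idea, not gepa_adk's actual implementation).
FORBIDDEN_NODES = (ast.Import, ast.ImportFrom, ast.FunctionDef, ast.AsyncFunctionDef)


def check_schema_source(source: str) -> None:
    """Raise ValueError if the source contains a forbidden construct."""
    tree = ast.parse(source)  # SyntaxError propagates for invalid Python
    for node in ast.walk(tree):
        if isinstance(node, FORBIDDEN_NODES):
            raise ValueError(
                f"Forbidden construct at line {node.lineno}: {type(node).__name__}"
            )


check_schema_source("class MySchema(BaseModel):\n    name: str\n")  # accepted
try:
    check_schema_source("import os\nclass MySchema(BaseModel):\n    name: str\n")
except ValueError as exc:
    print(exc)  # import statement rejected before any exec()
```

Because the check runs on the parse tree, forbidden constructs are caught before any code is executed.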

SchemaValidationResult dataclass

Result of validating schema text.

This dataclass contains both the deserialized Pydantic model class and metadata about the schema structure.

ATTRIBUTE DESCRIPTION
schema_class

The deserialized Pydantic BaseModel subclass.

TYPE: type[BaseModel]

class_name

Name of the class found in the schema text.

TYPE: str

field_count

Number of fields defined in the schema.

TYPE: int

field_names

Tuple of field names in the schema.

TYPE: tuple[str, ...]

Examples:

Validating schema text and inspecting results:

result = validate_schema_text(schema_text)
print(f"Class: {result.class_name}")
print(f"Fields: {result.field_names}")
instance = result.schema_class(name="test", value=42)

Creating a result directly (typically done by validate_schema_text):

result = SchemaValidationResult(
    schema_class=MySchema,
    class_name="MySchema",
    field_count=2,
    field_names=("name", "value"),
)
Source code in src/gepa_adk/utils/schema_utils.py
@dataclass(frozen=True)
class SchemaValidationResult:
    """Result of validating schema text.

    This dataclass contains both the deserialized Pydantic model class
    and metadata about the schema structure.

    Attributes:
        schema_class (type[BaseModel]): The deserialized Pydantic BaseModel subclass.
        class_name (str): Name of the class found in the schema text.
        field_count (int): Number of fields defined in the schema.
        field_names (tuple[str, ...]): Tuple of field names in the schema.

    Examples:
        Validating schema text and inspecting results:

        ```python
        result = validate_schema_text(schema_text)
        print(f"Class: {result.class_name}")
        print(f"Fields: {result.field_names}")
        instance = result.schema_class(name="test", value=42)
        ```

        Creating a result directly (typically done by validate_schema_text):

        ```python
        result = SchemaValidationResult(
            schema_class=MySchema,
            class_name="MySchema",
            field_count=2,
            field_names=("name", "value"),
        )
        ```
    """

    schema_class: type[BaseModel]
    class_name: str
    field_count: int
    field_names: tuple[str, ...]

serialize_pydantic_schema

serialize_pydantic_schema(
    schema_class: type[BaseModel],
) -> str

Serialize a Pydantic model class to Python source code.

Uses inspect.getsource() to retrieve the original source code definition of the schema class. This preserves Field() constraints, defaults, and docstrings.

PARAMETER DESCRIPTION
schema_class

The Pydantic BaseModel subclass to serialize.

TYPE: type[BaseModel]

RETURNS DESCRIPTION
str

Python source code string defining the class.

RAISES DESCRIPTION
TypeError

If schema_class is not a BaseModel subclass or is an instance.

OSError

If source code cannot be retrieved (e.g., dynamic class).

Examples:

Serialize a schema to source code:

from pydantic import BaseModel, Field
from gepa_adk.utils.schema_utils import serialize_pydantic_schema


class MySchema(BaseModel):
    name: str
    value: int = Field(ge=0)


text = serialize_pydantic_schema(MySchema)
# text contains the Python source code for MySchema
Source code in src/gepa_adk/utils/schema_utils.py
def serialize_pydantic_schema(schema_class: type[BaseModel]) -> str:
    """Serialize a Pydantic model class to Python source code.

    Uses inspect.getsource() to retrieve the original source code definition
    of the schema class. This preserves Field() constraints, defaults, and
    docstrings.

    Args:
        schema_class (type[BaseModel]): The Pydantic BaseModel subclass to serialize.

    Returns:
        Python source code string defining the class.

    Raises:
        TypeError: If schema_class is not a BaseModel subclass or is an instance.
        OSError: If source code cannot be retrieved (e.g., dynamic class).

    Examples:
        Serialize a schema to source code:

        ```python
        from pydantic import BaseModel, Field
        from gepa_adk.utils.schema_utils import serialize_pydantic_schema


        class MySchema(BaseModel):
            name: str
            value: int = Field(ge=0)


        text = serialize_pydantic_schema(MySchema)
        # text contains the Python source code for MySchema
        ```
    """
    # Type validation
    if not isinstance(schema_class, type):
        raise TypeError(
            f"Expected a class, got {type(schema_class).__name__} instance. "
            "Pass the class itself, not an instance."
        )

    if not issubclass(schema_class, BaseModel):
        raise TypeError(
            f"Expected a BaseModel subclass, got {schema_class.__name__}. "
            "Schema class must inherit from pydantic.BaseModel."
        )

    logger.debug(
        "schema.serialize.start",
        class_name=schema_class.__name__,
        field_count=len(schema_class.model_fields),
    )

    try:
        source = inspect.getsource(schema_class)
    except OSError as e:
        logger.warning(
            "schema.serialize.failed",
            class_name=schema_class.__name__,
            error=str(e),
        )
        raise

    logger.debug(
        "schema.serialize.complete",
        class_name=schema_class.__name__,
        source_length=len(source),
    )

    return source
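The OSError path can be reproduced with any dynamically created class, since `inspect.getsource()` needs the class's defining source file. A minimal standalone sketch (the class here is illustrative and intentionally not Pydantic-specific):

```python
import inspect

# A class built at runtime with type() has no textual definition in any
# source file, so inspect.getsource() cannot recover its source.
DynamicSchema = type("DynamicSchema", (), {"__annotations__": {"name": str}})

try:
    inspect.getsource(DynamicSchema)
except (OSError, TypeError) as exc:
    print(f"cannot serialize: {exc}")
```

This is why schemas passed to serialize_pydantic_schema must be defined with a regular `class` statement in an importable module.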

validate_schema_text

validate_schema_text(
    schema_text: str,
    *,
    allowed_namespace: dict[str, Any] | None = None,
) -> SchemaValidationResult

Validate schema text and return the deserialized class.

Performs four-stage validation:

1. Preprocessing: Strip markdown code fences if present
2. Syntax: Parse Python syntax with ast.parse()
3. Structure: Check for security violations and BaseModel inheritance
4. Execution: Execute in a controlled namespace and verify the class

PARAMETER DESCRIPTION
schema_text

Python source code defining a Pydantic model. May be wrapped in markdown code fences (```python...```).

TYPE: str

OTHER PARAMETERS DESCRIPTION
allowed_namespace

Override default namespace. If None, uses SCHEMA_NAMESPACE.

TYPE: dict[str, Any] | None

RETURNS DESCRIPTION
SchemaValidationResult

SchemaValidationResult with the deserialized class and metadata.

RAISES DESCRIPTION
SchemaValidationError

If validation fails at any stage.

Examples:

Validate schema text and inspect results:

from gepa_adk.utils.schema_utils import validate_schema_text

schema_text = '''
class MySchema(BaseModel):
    name: str
    value: int
'''

result = validate_schema_text(schema_text)
assert result.class_name == "MySchema"
assert result.field_names == ("name", "value")
Source code in src/gepa_adk/utils/schema_utils.py
def validate_schema_text(
    schema_text: str,
    *,
    allowed_namespace: dict[str, Any] | None = None,
) -> SchemaValidationResult:
    """Validate schema text and return the deserialized class.

    Performs four-stage validation:
    1. **Preprocessing**: Strip markdown code fences if present
    2. **Syntax**: Parse Python syntax with ast.parse()
    3. **Structure**: Check for security violations and BaseModel inheritance
    4. **Execution**: Execute in controlled namespace and verify class

    Args:
        schema_text (str): Python source code defining a Pydantic model.
            May be wrapped in markdown code fences (` ```python...``` `).

    Other Parameters:
        allowed_namespace (dict[str, Any] | None): Override default namespace.
            If None, uses SCHEMA_NAMESPACE.

    Returns:
        SchemaValidationResult with the deserialized class and metadata.

    Raises:
        SchemaValidationError: If validation fails at any stage.

    Examples:
        Validate schema text and inspect results:

        ```python
        from gepa_adk.utils.schema_utils import validate_schema_text

        schema_text = '''
        class MySchema(BaseModel):
            name: str
            value: int
        '''

        result = validate_schema_text(schema_text)
        assert result.class_name == "MySchema"
        assert result.field_names == ("name", "value")
        ```
    """
    # Preprocess: strip markdown fences if present
    schema_text = _strip_markdown_fences(schema_text)

    logger.debug(
        "schema.validate.start",
        text_length=len(schema_text),
    )

    # Stage 1: Syntax validation
    tree = _validate_syntax(schema_text)

    # Stage 2: Structure validation
    class_name = _validate_structure(tree, schema_text)

    # Stage 3: Execution
    namespace = allowed_namespace if allowed_namespace is not None else SCHEMA_NAMESPACE
    schema_class = _execute_schema(schema_text, class_name, namespace)

    # Build result
    field_names = tuple(schema_class.model_fields.keys())
    result = SchemaValidationResult(
        schema_class=schema_class,
        class_name=class_name,
        field_count=len(field_names),
        field_names=field_names,
    )

    logger.debug(
        "schema.validate.complete",
        class_name=class_name,
        field_count=result.field_count,
        field_names=list(field_names),
    )

    return result
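The preprocessing step can be sketched as a small fence-stripping helper. This is an illustrative stand-in for the private `_strip_markdown_fences`, whose exact behavior may differ:

```python
def strip_markdown_fences(text: str) -> str:
    """Remove a surrounding ```python ... ``` fence, if present."""
    stripped = text.strip()
    if stripped.startswith("```"):
        lines = stripped.splitlines()
        # Drop the opening fence line (``` or ```python) ...
        lines = lines[1:]
        # ... and the closing fence line, if there is one.
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]
        return "\n".join(lines)
    return text


fenced = "```python\nclass MySchema(BaseModel):\n    name: str\n```"
print(strip_markdown_fences(fenced))  # bare class definition, fences removed
```

Unfenced input passes through unchanged, so callers need not know whether the model wrapped its output in a code block.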

deserialize_schema

deserialize_schema(schema_text: str) -> type[BaseModel]

Deserialize validated schema text to a Pydantic model class.

Convenience function that calls validate_schema_text and returns only the schema class. Use this when you only need the class and don't need the validation metadata.

PARAMETER DESCRIPTION
schema_text

Python source code defining a Pydantic model.

TYPE: str

RETURNS DESCRIPTION
type[BaseModel]

The deserialized Pydantic BaseModel subclass.

RAISES DESCRIPTION
SchemaValidationError

If validation fails.

Examples:

Deserialize schema text and create an instance:

from gepa_adk.utils.schema_utils import deserialize_schema

schema_text = '''
class EvolvedSchema(BaseModel):
    result: str
    confidence: float = Field(ge=0.0, le=1.0)
'''

EvolvedSchema = deserialize_schema(schema_text)
instance = EvolvedSchema(result="success", confidence=0.95)
Source code in src/gepa_adk/utils/schema_utils.py
def deserialize_schema(schema_text: str) -> type[BaseModel]:
    """Deserialize validated schema text to a Pydantic model class.

    Convenience function that calls validate_schema_text and returns only
    the schema class. Use this when you only need the class and don't need
    the validation metadata.

    Args:
        schema_text (str): Python source code defining a Pydantic model.

    Returns:
        The deserialized Pydantic BaseModel subclass.

    Raises:
        SchemaValidationError: If validation fails.

    Examples:
        Deserialize schema text and create an instance:

        ```python
        from gepa_adk.utils.schema_utils import deserialize_schema

        schema_text = '''
        class EvolvedSchema(BaseModel):
            result: str
            confidence: float = Field(ge=0.0, le=1.0)
        '''

        EvolvedSchema = deserialize_schema(schema_text)
        instance = EvolvedSchema(result="success", confidence=0.95)
        ```
    """
    logger.debug(
        "schema.deserialize.start",
        text_length=len(schema_text),
    )

    result = validate_schema_text(schema_text)

    logger.debug(
        "schema.deserialize.complete",
        class_name=result.class_name,
    )

    return result.schema_class
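The controlled-execution stage can be sketched with a plain class and a minimal namespace. This is illustrative only: the names in `SAFE_NAMESPACE` are assumptions, and the real SCHEMA_NAMESPACE also exposes Pydantic and typing names:

```python
import builtins

# Minimal globals for executing a class statement: __build_class__ is needed
# by the class machinery, __name__ becomes the class's __module__, and only
# explicitly allowed types are reachable.
SAFE_NAMESPACE = {
    "__builtins__": {"__build_class__": builtins.__build_class__},
    "__name__": "schema_sandbox",
    "str": str,
    "int": int,
}

schema_text = "class Point:\n    pass\n"

namespace = dict(SAFE_NAMESPACE)  # copy so the template stays clean
exec(schema_text, namespace)  # noqa: S102 - input already AST-validated
Point = namespace["Point"]
print(Point)
```

Because `__builtins__` is replaced, names like `open` or `__import__` are simply absent from the execution environment, which is the point of pairing exec() with the AST checks.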

validate_schema_against_constraints

validate_schema_against_constraints(
    proposed_schema: type[BaseModel],
    original_schema: type[BaseModel] | None,
    constraints: "SchemaConstraints",
) -> tuple[bool, list[str]]

Validate proposed schema against constraints.

Checks that the proposed schema satisfies all constraints specified in the SchemaConstraints configuration. Validates:

- Required fields exist in the proposed schema
- Field types match preserve_types constraints

PARAMETER DESCRIPTION
proposed_schema

The proposed Pydantic BaseModel subclass.

TYPE: type[BaseModel]

original_schema

The original schema (may be None if no schema).

TYPE: type[BaseModel] | None

constraints

SchemaConstraints with required_fields and preserve_types.

TYPE: 'SchemaConstraints'

RETURNS DESCRIPTION
tuple[bool, list[str]]

Tuple of (is_valid, list_of_violation_messages). is_valid is True if all constraints are satisfied.

Examples:

Check required fields:

from pydantic import BaseModel
from gepa_adk.domain.types import SchemaConstraints
from gepa_adk.utils.schema_utils import validate_schema_against_constraints


class Proposed(BaseModel):
    score: float


constraints = SchemaConstraints(required_fields=("score",))
is_valid, violations = validate_schema_against_constraints(
    Proposed, original, constraints
)

Check type preservation:

constraints = SchemaConstraints(preserve_types={"score": float})
is_valid, violations = validate_schema_against_constraints(
    Proposed, original, constraints
)
Note

When original_schema is None, validation is short-circuited because there is no baseline model to validate against. Only constraint checks whose fields exist on the original schema are evaluated; constraint entries targeting fields missing from the original schema are skipped, and a debug message is logged for each skipped field.

Source code in src/gepa_adk/utils/schema_utils.py
def validate_schema_against_constraints(
    proposed_schema: type[BaseModel],
    original_schema: type[BaseModel] | None,
    constraints: "SchemaConstraints",
) -> tuple[bool, list[str]]:
    """Validate proposed schema against constraints.

    Checks that the proposed schema satisfies all constraints specified
    in the SchemaConstraints configuration. Validates:
    - Required fields exist in the proposed schema
    - Field types match preserve_types constraints

    Args:
        proposed_schema: The proposed Pydantic BaseModel subclass.
        original_schema: The original schema (may be None if no schema).
        constraints: SchemaConstraints with required_fields and preserve_types.

    Returns:
        Tuple of (is_valid, list_of_violation_messages).
        is_valid is True if all constraints are satisfied.

    Examples:
        Check required fields:

        ```python
        from pydantic import BaseModel
        from gepa_adk.domain.types import SchemaConstraints
        from gepa_adk.utils.schema_utils import validate_schema_against_constraints


        class Proposed(BaseModel):
            score: float


        constraints = SchemaConstraints(required_fields=("score",))
        is_valid, violations = validate_schema_against_constraints(
            Proposed, original, constraints
        )
        ```

        Check type preservation:

        ```python
        constraints = SchemaConstraints(preserve_types={"score": float})
        is_valid, violations = validate_schema_against_constraints(
            Proposed, original, constraints
        )
        ```

    Note:
        When original_schema is None, validation is short-circuited because there
        is no baseline model to validate against. Only constraint checks whose
        fields exist on the original schema are evaluated; constraint entries
        targeting fields missing from the original schema are skipped, and a
        debug message is logged for each skipped field.
    """
    violations: list[str] = []

    # If no original schema, skip validation
    if original_schema is None:
        return True, []

    # If no constraints, allow everything
    if not constraints.required_fields and not constraints.preserve_types:
        return True, []

    # Get original field names for reference
    original_fields = set(original_schema.model_fields.keys())
    proposed_fields = set(proposed_schema.model_fields.keys())

    # Check required fields
    for field in constraints.required_fields:
        # Skip if field wasn't in original (can't require what was never there)
        if field not in original_fields:
            logger.debug(
                "schema.constraint.skip_nonexistent",
                field=field,
            )
            continue

        if field not in proposed_fields:
            violations.append(f"Required field '{field}' not found in evolved schema")
            logger.debug(
                "schema.constraint.missing_required",
                field=field,
            )

    # Check type preservation
    for field, allowed_types in constraints.preserve_types.items():
        # Skip if field wasn't in original (can't preserve what was never there)
        if field not in original_fields:
            logger.debug(
                "schema.constraint.skip_nonexistent_type",
                field=field,
            )
            continue

        # Check if field exists in proposed schema
        if field not in proposed_fields:
            violations.append(
                f"Field '{field}' with type constraint not found in evolved schema"
            )
            logger.debug(
                "schema.constraint.missing_typed_field",
                field=field,
            )
            continue

        # Extract and compare types
        proposed_type = _extract_field_type(proposed_schema, field)
        if not _is_type_compatible(proposed_type, allowed_types):
            type_names = (
                f"({', '.join(t.__name__ for t in allowed_types)})"
                if isinstance(allowed_types, tuple)
                else allowed_types.__name__
            )
            violations.append(
                f"Field '{field}' type changed: expected {type_names}, "
                f"got {proposed_type.__name__ if proposed_type else 'None'}"
            )
            logger.debug(
                "schema.constraint.type_mismatch",
                field=field,
                expected=type_names,
                actual=proposed_type.__name__ if proposed_type else "None",
            )

    is_valid = len(violations) == 0

    if not is_valid:
        logger.debug(
            "schema.constraint.validation_failed",
            violation_count=len(violations),
        )

    return is_valid, violations