Schema utils

schema_utils

Schema utilities for output schema evolution.

This module provides serialization, validation, and deserialization utilities for evolving Pydantic output schemas as components.

The module enables the evolution engine to:

1. Serialize Pydantic BaseModel classes to Python source text
2. Validate proposed schema mutations for correctness and security
3. Deserialize validated schema text back to usable Pydantic models

ATTRIBUTE DESCRIPTION
SCHEMA_NAMESPACE

Controlled namespace for schema execution containing Pydantic types, built-in types, and typing constructs.

TYPE: dict

SchemaValidationResult

Dataclass containing validation results and metadata about the deserialized schema.

TYPE: class

serialize_pydantic_schema

Convert a Pydantic model class to Python source code text.

TYPE: function

validate_schema_text

Validate schema text and return the deserialized class with metadata.

TYPE: function

deserialize_schema

Convenience wrapper to deserialize schema text directly to a Pydantic class.

TYPE: function

Examples:

Basic round-trip workflow:

from pydantic import BaseModel
from gepa_adk.utils.schema_utils import (
    serialize_pydantic_schema,
    deserialize_schema,
)


class MySchema(BaseModel):
    name: str
    value: int


# Serialize for evolution
text = serialize_pydantic_schema(MySchema)

# After evolution, deserialize back
EvolvedSchema = deserialize_schema(evolved_text)
Note

The validation pipeline uses AST parsing before controlled exec() to prevent arbitrary code execution. Import statements and function definitions are explicitly rejected.
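The rejection of imports and function definitions can be sketched as a standalone AST walk. This is a simplified illustration of the technique, not the module's actual rule set; `FORBIDDEN_NODES` and the error message are assumptions:

```python
import ast

# Node types a controlled schema pipeline can reject outright
# (a sketch of the idea, not gepa_adk's actual implementation).
FORBIDDEN_NODES = (ast.Import, ast.ImportFrom, ast.FunctionDef, ast.AsyncFunctionDef)


def check_schema_source(source: str) -> None:
    """Raise ValueError if the source contains a forbidden construct."""
    tree = ast.parse(source)  # SyntaxError propagates for invalid Python
    for node in ast.walk(tree):
        if isinstance(node, FORBIDDEN_NODES):
            raise ValueError(
                f"Forbidden construct at line {node.lineno}: {type(node).__name__}"
            )


check_schema_source("class MySchema(BaseModel):\n    name: str\n")  # accepted
try:
    check_schema_source("import os\nclass MySchema(BaseModel):\n    name: str\n")
except ValueError as exc:
    print(exc)  # import statement rejected before any exec()
```

Because the check runs on the parse tree, forbidden constructs are caught before any code is executed.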

SchemaValidationResult dataclass

Result of validating schema text.

This dataclass contains both the deserialized Pydantic model class and metadata about the schema structure.

ATTRIBUTE DESCRIPTION
schema_class

The deserialized Pydantic BaseModel subclass.

TYPE: type[BaseModel]

class_name

Name of the class found in the schema text.

TYPE: str

field_count

Number of fields defined in the schema.

TYPE: int

field_names

Tuple of field names in the schema.

TYPE: tuple[str, ...]

Examples:

Validating schema text and inspecting results:

result = validate_schema_text(schema_text)
print(f"Class: {result.class_name}")
print(f"Fields: {result.field_names}")
instance = result.schema_class(name="test", value=42)

Creating a result directly (typically done by validate_schema_text):

result = SchemaValidationResult(
    schema_class=MySchema,
    class_name="MySchema",
    field_count=2,
    field_names=("name", "value"),
)
Source code in src/gepa_adk/utils/schema_utils.py
@dataclass(frozen=True)
class SchemaValidationResult:
    """Result of validating schema text.

    This dataclass contains both the deserialized Pydantic model class
    and metadata about the schema structure.

    Attributes:
        schema_class (type[BaseModel]): The deserialized Pydantic BaseModel subclass.
        class_name (str): Name of the class found in the schema text.
        field_count (int): Number of fields defined in the schema.
        field_names (tuple[str, ...]): Tuple of field names in the schema.

    Examples:
        Validating schema text and inspecting results:

        ```python
        result = validate_schema_text(schema_text)
        print(f"Class: {result.class_name}")
        print(f"Fields: {result.field_names}")
        instance = result.schema_class(name="test", value=42)
        ```

        Creating a result directly (typically done by validate_schema_text):

        ```python
        result = SchemaValidationResult(
            schema_class=MySchema,
            class_name="MySchema",
            field_count=2,
            field_names=("name", "value"),
        )
        ```
    """

    schema_class: type[BaseModel]
    class_name: str
    field_count: int
    field_names: tuple[str, ...]

serialize_pydantic_schema

serialize_pydantic_schema(
    schema_class: type[BaseModel],
) -> str

Serialize a Pydantic model class to Python source code.

Uses inspect.getsource() to retrieve the original source code definition of the schema class. This preserves Field() constraints, defaults, and docstrings.

PARAMETER DESCRIPTION
schema_class

The Pydantic BaseModel subclass to serialize.

TYPE: type[BaseModel]

RETURNS DESCRIPTION
str

Python source code string defining the class.

RAISES DESCRIPTION
TypeError

If schema_class is not a BaseModel subclass or is an instance.

OSError

If source code cannot be retrieved (e.g., dynamic class).

Examples:

Serialize a schema to source code:

from pydantic import BaseModel, Field
from gepa_adk.utils.schema_utils import serialize_pydantic_schema


class MySchema(BaseModel):
    name: str
    value: int = Field(ge=0)


text = serialize_pydantic_schema(MySchema)
# text contains the Python source code for MySchema
Source code in src/gepa_adk/utils/schema_utils.py
def serialize_pydantic_schema(schema_class: type[BaseModel]) -> str:
    """Serialize a Pydantic model class to Python source code.

    Uses inspect.getsource() to retrieve the original source code definition
    of the schema class. This preserves Field() constraints, defaults, and
    docstrings.

    Args:
        schema_class (type[BaseModel]): The Pydantic BaseModel subclass to serialize.

    Returns:
        Python source code string defining the class.

    Raises:
        TypeError: If schema_class is not a BaseModel subclass or is an instance.
        OSError: If source code cannot be retrieved (e.g., dynamic class).

    Examples:
        Serialize a schema to source code:

        ```python
        from pydantic import BaseModel, Field
        from gepa_adk.utils.schema_utils import serialize_pydantic_schema


        class MySchema(BaseModel):
            name: str
            value: int = Field(ge=0)


        text = serialize_pydantic_schema(MySchema)
        # text contains the Python source code for MySchema
        ```
    """
    # Type validation
    if not isinstance(schema_class, type):
        raise TypeError(
            f"Expected a class, got {type(schema_class).__name__} instance. "
            "Pass the class itself, not an instance."
        )

    if not issubclass(schema_class, BaseModel):
        raise TypeError(
            f"Expected a BaseModel subclass, got {schema_class.__name__}. "
            "Schema class must inherit from pydantic.BaseModel."
        )

    logger.debug(
        "schema.serialize.start",
        class_name=schema_class.__name__,
        field_count=len(schema_class.model_fields),
    )

    try:
        source = inspect.getsource(schema_class)
    except OSError as e:
        logger.warning(
            "schema.serialize.failed",
            class_name=schema_class.__name__,
            error=str(e),
        )
        raise

    logger.debug(
        "schema.serialize.complete",
        class_name=schema_class.__name__,
        source_length=len(source),
    )

    return source
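The OSError path can be reproduced with any dynamically created class, since `inspect.getsource()` needs the class's defining source file. A minimal standalone sketch (the class here is illustrative and intentionally not Pydantic-specific):

```python
import inspect

# A class built at runtime with type() has no textual definition in any
# source file, so inspect.getsource() cannot recover its source.
DynamicSchema = type("DynamicSchema", (), {"__annotations__": {"name": str}})

try:
    inspect.getsource(DynamicSchema)
except (OSError, TypeError) as exc:
    print(f"cannot serialize: {exc}")
```

This is why schemas passed to serialize_pydantic_schema must be defined with a regular `class` statement in an importable module.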

validate_schema_text

validate_schema_text(
    schema_text: str,
    *,
    allowed_namespace: dict[str, Any] | None = None,
) -> SchemaValidationResult

Validate schema text and return the deserialized class.

Performs four-stage validation:

1. Preprocessing: Strip markdown code fences if present
2. Syntax: Parse Python syntax with ast.parse()
3. Structure: Check for security violations and BaseModel inheritance
4. Execution: Execute in a controlled namespace and verify the class

PARAMETER DESCRIPTION
schema_text

Python source code defining a Pydantic model. May be wrapped in markdown code fences (```python...```).

TYPE: str

OTHER PARAMETERS DESCRIPTION
allowed_namespace

Override default namespace. If None, uses SCHEMA_NAMESPACE.

TYPE: dict[str, Any] | None

RETURNS DESCRIPTION
SchemaValidationResult

SchemaValidationResult with the deserialized class and metadata.

RAISES DESCRIPTION
SchemaValidationError

If validation fails at any stage.

Examples:

Validate schema text and inspect results:

from gepa_adk.utils.schema_utils import validate_schema_text

schema_text = '''
class MySchema(BaseModel):
    name: str
    value: int
'''

result = validate_schema_text(schema_text)
assert result.class_name == "MySchema"
assert result.field_names == ("name", "value")
Source code in src/gepa_adk/utils/schema_utils.py
def validate_schema_text(
    schema_text: str,
    *,
    allowed_namespace: dict[str, Any] | None = None,
) -> SchemaValidationResult:
    """Validate schema text and return the deserialized class.

    Performs four-stage validation:
    1. **Preprocessing**: Strip markdown code fences if present
    2. **Syntax**: Parse Python syntax with ast.parse()
    3. **Structure**: Check for security violations and BaseModel inheritance
    4. **Execution**: Execute in controlled namespace and verify class

    Args:
        schema_text (str): Python source code defining a Pydantic model.
            May be wrapped in markdown code fences (` ```python...``` `).

    Other Parameters:
        allowed_namespace (dict[str, Any] | None): Override default namespace.
            If None, uses SCHEMA_NAMESPACE.

    Returns:
        SchemaValidationResult with the deserialized class and metadata.

    Raises:
        SchemaValidationError: If validation fails at any stage.

    Examples:
        Validate schema text and inspect results:

        ```python
        from gepa_adk.utils.schema_utils import validate_schema_text

        schema_text = '''
        class MySchema(BaseModel):
            name: str
            value: int
        '''

        result = validate_schema_text(schema_text)
        assert result.class_name == "MySchema"
        assert result.field_names == ("name", "value")
        ```
    """
    # Preprocess: strip markdown fences if present
    schema_text = _strip_markdown_fences(schema_text)

    logger.debug(
        "schema.validate.start",
        text_length=len(schema_text),
    )

    # Stage 1: Syntax validation
    tree = _validate_syntax(schema_text)

    # Stage 2: Structure validation
    class_name = _validate_structure(tree, schema_text)

    # Stage 3: Execution
    namespace = allowed_namespace if allowed_namespace is not None else SCHEMA_NAMESPACE
    schema_class = _execute_schema(schema_text, class_name, namespace)

    # Build result
    field_names = tuple(schema_class.model_fields.keys())
    result = SchemaValidationResult(
        schema_class=schema_class,
        class_name=class_name,
        field_count=len(field_names),
        field_names=field_names,
    )

    logger.debug(
        "schema.validate.complete",
        class_name=class_name,
        field_count=result.field_count,
        field_names=list(field_names),
    )

    return result
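The preprocessing step can be sketched as a small fence-stripping helper. This is an illustrative stand-in for the private `_strip_markdown_fences`, whose exact behavior may differ:

```python
def strip_markdown_fences(text: str) -> str:
    """Remove a surrounding ```python ... ``` fence, if present."""
    stripped = text.strip()
    if stripped.startswith("```"):
        lines = stripped.splitlines()
        # Drop the opening fence line (``` or ```python) ...
        lines = lines[1:]
        # ... and the closing fence line, if there is one.
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]
        return "\n".join(lines)
    return text


fenced = "```python\nclass MySchema(BaseModel):\n    name: str\n```"
print(strip_markdown_fences(fenced))  # bare class definition, fences removed
```

Unfenced input passes through unchanged, so callers need not know whether the model wrapped its output in a code block.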

deserialize_schema

deserialize_schema(schema_text: str) -> type[BaseModel]

Deserialize validated schema text to a Pydantic model class.

Convenience function that calls validate_schema_text and returns only the schema class. Use this when you only need the class and don't need the validation metadata.

PARAMETER DESCRIPTION
schema_text

Python source code defining a Pydantic model.

TYPE: str

RETURNS DESCRIPTION
type[BaseModel]

The deserialized Pydantic BaseModel subclass.

RAISES DESCRIPTION
SchemaValidationError

If validation fails.

Examples:

Deserialize schema text and create an instance:

from gepa_adk.utils.schema_utils import deserialize_schema

schema_text = '''
class EvolvedSchema(BaseModel):
    result: str
    confidence: float = Field(ge=0.0, le=1.0)
'''

EvolvedSchema = deserialize_schema(schema_text)
instance = EvolvedSchema(result="success", confidence=0.95)
Source code in src/gepa_adk/utils/schema_utils.py
def deserialize_schema(schema_text: str) -> type[BaseModel]:
    """Deserialize validated schema text to a Pydantic model class.

    Convenience function that calls validate_schema_text and returns only
    the schema class. Use this when you only need the class and don't need
    the validation metadata.

    Args:
        schema_text (str): Python source code defining a Pydantic model.

    Returns:
        The deserialized Pydantic BaseModel subclass.

    Raises:
        SchemaValidationError: If validation fails.

    Examples:
        Deserialize schema text and create an instance:

        ```python
        from gepa_adk.utils.schema_utils import deserialize_schema

        schema_text = '''
        class EvolvedSchema(BaseModel):
            result: str
            confidence: float = Field(ge=0.0, le=1.0)
        '''

        EvolvedSchema = deserialize_schema(schema_text)
        instance = EvolvedSchema(result="success", confidence=0.95)
        ```
    """
    logger.debug(
        "schema.deserialize.start",
        text_length=len(schema_text),
    )

    result = validate_schema_text(schema_text)

    logger.debug(
        "schema.deserialize.complete",
        class_name=result.class_name,
    )

    return result.schema_class
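The controlled-execution stage can be sketched with a plain class and a minimal namespace. This is illustrative only: the names in `SAFE_NAMESPACE` are assumptions, and the real SCHEMA_NAMESPACE also exposes Pydantic and typing names:

```python
import builtins

# Minimal globals for executing a class statement: __build_class__ is needed
# by the class machinery, __name__ becomes the class's __module__, and only
# explicitly allowed types are reachable.
SAFE_NAMESPACE = {
    "__builtins__": {"__build_class__": builtins.__build_class__},
    "__name__": "schema_sandbox",
    "str": str,
    "int": int,
}

schema_text = "class Point:\n    pass\n"

namespace = dict(SAFE_NAMESPACE)  # copy so the template stays clean
exec(schema_text, namespace)  # noqa: S102 - input already AST-validated
Point = namespace["Point"]
print(Point)
```

Because `__builtins__` is replaced, names like `open` or `__import__` are simply absent from the execution environment, which is the point of pairing exec() with the AST checks.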

validate_schema_against_constraints

validate_schema_against_constraints(
    proposed_schema: type[BaseModel],
    original_schema: type[BaseModel] | None,
    constraints: "SchemaConstraints",
) -> tuple[bool, list[str]]

Validate proposed schema against constraints.

Checks that the proposed schema satisfies all constraints specified in the SchemaConstraints configuration. Validates:

- Required fields exist in the proposed schema
- Field types match preserve_types constraints

PARAMETER DESCRIPTION
proposed_schema

The proposed Pydantic BaseModel subclass.

TYPE: type[BaseModel]

original_schema

The original schema (may be None if no schema).

TYPE: type[BaseModel] | None

constraints

SchemaConstraints with required_fields and preserve_types.

TYPE: 'SchemaConstraints'

RETURNS DESCRIPTION
tuple[bool, list[str]]

Tuple of (is_valid, list_of_violation_messages). is_valid is True if all constraints are satisfied.

Examples:

Check required fields:

from pydantic import BaseModel
from gepa_adk.domain.types import SchemaConstraints
from gepa_adk.utils.schema_utils import validate_schema_against_constraints


class Proposed(BaseModel):
    score: float


constraints = SchemaConstraints(required_fields=("score",))
is_valid, violations = validate_schema_against_constraints(
    Proposed, original, constraints
)

Check type preservation:

constraints = SchemaConstraints(preserve_types={"score": float})
is_valid, violations = validate_schema_against_constraints(
    Proposed, original, constraints
)
Note

When original_schema is None, validation is short-circuited because there is no baseline model to validate against. Only constraint checks whose fields exist on the original schema are evaluated; constraint entries targeting fields missing from the original schema are skipped, and a debug message is logged for each skipped field.

Source code in src/gepa_adk/utils/schema_utils.py
def validate_schema_against_constraints(
    proposed_schema: type[BaseModel],
    original_schema: type[BaseModel] | None,
    constraints: "SchemaConstraints",
) -> tuple[bool, list[str]]:
    """Validate proposed schema against constraints.

    Checks that the proposed schema satisfies all constraints specified
    in the SchemaConstraints configuration. Validates:
    - Required fields exist in the proposed schema
    - Field types match preserve_types constraints

    Args:
        proposed_schema: The proposed Pydantic BaseModel subclass.
        original_schema: The original schema (may be None if no schema).
        constraints: SchemaConstraints with required_fields and preserve_types.

    Returns:
        Tuple of (is_valid, list_of_violation_messages).
        is_valid is True if all constraints are satisfied.

    Examples:
        Check required fields:

        ```python
        from pydantic import BaseModel
        from gepa_adk.domain.types import SchemaConstraints
        from gepa_adk.utils.schema_utils import validate_schema_against_constraints


        class Proposed(BaseModel):
            score: float


        constraints = SchemaConstraints(required_fields=("score",))
        is_valid, violations = validate_schema_against_constraints(
            Proposed, original, constraints
        )
        ```

        Check type preservation:

        ```python
        constraints = SchemaConstraints(preserve_types={"score": float})
        is_valid, violations = validate_schema_against_constraints(
            Proposed, original, constraints
        )
        ```

    Note:
        When original_schema is None, validation is short-circuited because there
        is no baseline model to validate against. Only constraint checks whose
        fields exist on the original schema are evaluated; constraint entries
        targeting fields missing from the original schema are skipped, and a
        debug message is logged for each skipped field.
    """
    violations: list[str] = []

    # If no original schema, skip validation
    if original_schema is None:
        return True, []

    # If no constraints, allow everything
    if not constraints.required_fields and not constraints.preserve_types:
        return True, []

    # Get original field names for reference
    original_fields = set(original_schema.model_fields.keys())
    proposed_fields = set(proposed_schema.model_fields.keys())

    # Check required fields
    for field in constraints.required_fields:
        # Skip if field wasn't in original (can't require what was never there)
        if field not in original_fields:
            logger.debug(
                "schema.constraint.skip_nonexistent",
                field=field,
            )
            continue

        if field not in proposed_fields:
            violations.append(f"Required field '{field}' not found in evolved schema")
            logger.debug(
                "schema.constraint.missing_required",
                field=field,
            )

    # Check type preservation
    for field, allowed_types in constraints.preserve_types.items():
        # Skip if field wasn't in original (can't preserve what was never there)
        if field not in original_fields:
            logger.debug(
                "schema.constraint.skip_nonexistent_type",
                field=field,
            )
            continue

        # Check if field exists in proposed schema
        if field not in proposed_fields:
            violations.append(
                f"Field '{field}' with type constraint not found in evolved schema"
            )
            logger.debug(
                "schema.constraint.missing_typed_field",
                field=field,
            )
            continue

        # Extract and compare types
        proposed_type = _extract_field_type(proposed_schema, field)
        if not _is_type_compatible(proposed_type, allowed_types):
            type_names = (
                f"({', '.join(t.__name__ for t in allowed_types)})"
                if isinstance(allowed_types, tuple)
                else allowed_types.__name__
            )
            violations.append(
                f"Field '{field}' type changed: expected {type_names}, "
                f"got {proposed_type.__name__ if proposed_type else 'None'}"
            )
            logger.debug(
                "schema.constraint.type_mismatch",
                field=field,
                expected=type_names,
                actual=proposed_type.__name__ if proposed_type else "None",
            )

    is_valid = len(violations) == 0

    if not is_valid:
        logger.debug(
            "schema.constraint.validation_failed",
            violation_count=len(violations),
        )

    return is_valid, violations