Module: app.extractors.common

Shared extractor runtime primitives (schema + Outlines/OpenAI execution).

Authors:

Roger Erismann (https://hammerdirt.solutions), OpenAI Codex

Purpose:

Provide common extraction utilities used by all section extractors:
  • strict model base config (StrictBaseModel)

  • OpenAI schema normalization helper for Outlines compatibility

  • prompt assembly and model execution via Outlines/OpenAI

Dependencies:
  • outlines for structured generation wrapper

  • openai SDK for model invocation

  • pydantic for response-model validation

class app.extractors.common.StrictBaseModel

Bases: BaseModel

Base model for all extractors with strict schema behavior.

  • Rejects unknown keys (extra="forbid").

  • Applies _enforce_openai_schema before schema export.
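The body of _enforce_openai_schema is not shown here, so the following is only a sketch of what such a hook might do. It assumes the goal is the shape that OpenAI strict structured outputs expect: every object disallows additional properties and lists all of its properties as required. The name enforce_strict_schema and the sample schema are hypothetical. Pydantic accepts a callable for json_schema_extra that receives the generated schema dict and may mutate it in place, which is the shape used below.

```python
from typing import Any, Dict


def enforce_strict_schema(schema: Dict[str, Any]) -> None:
    """Hypothetical stand-in for _enforce_openai_schema (an assumption).

    Mutates the JSON schema in place so that every object node forbids
    additional properties and marks all of its properties as required.
    """
    if schema.get("type") == "object":
        properties = schema.get("properties", {})
        schema["additionalProperties"] = False
        schema["required"] = list(properties)

    # Recurse into nested property schemas, shared definitions, and array items.
    children = [
        *schema.get("properties", {}).values(),
        *schema.get("$defs", {}).values(),
    ]
    if isinstance(schema.get("items"), dict):
        children.append(schema["items"])
    for child in children:
        if isinstance(child, dict):
            enforce_strict_schema(child)


# Hand-written example schema (illustrative field names only).
schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "findings": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"name": {"type": "string"}},
            },
        },
    },
    "required": ["summary"],
}
enforce_strict_schema(schema)
```

After the call, both the top-level object and the nested findings items carry additionalProperties set to False and a complete required list. A function with this signature could be passed as json_schema_extra in a model's ConfigDict, which matches the model_config shown for StrictBaseModel.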

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'json_schema_extra': <function _enforce_openai_schema>}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[Dict[str, FieldInfo]] = {}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

app.extractors.common.run_outlines_extraction(*, transcript, model_cls, system_path, section_path, logger, model_env='TRAQ_OPENAI_MODEL', default_model='gpt-4o-mini')

Run structured extraction for one section transcript.

Parameters:
  • transcript (str) – Section transcript content (must be non-empty).

  • model_cls (Type[T]) – Pydantic model class for structured output.

  • system_path (Path) – Path to system prompt text.

  • section_path (Path) – Path to section prompt text.

  • logger – Logger used for extraction diagnostics.

  • model_env (str) – Environment variable name containing model id.

  • default_model (str) – Fallback model id when model_env is unset.
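The interplay of model_env and default_model can be pictured with a small sketch. The helper name resolve_model_id is hypothetical and the real lookup inside run_outlines_extraction is not shown; in particular, how an empty (but set) variable is treated is not documented, so this version falls back only when the variable is unset.

```python
import os


def resolve_model_id(
    model_env: str = "TRAQ_OPENAI_MODEL",
    default_model: str = "gpt-4o-mini",
) -> str:
    """Hypothetical helper: return the model id named by the environment.

    Reads the variable whose name is given by model_env and falls back to
    default_model when that variable is unset.
    """
    return os.environ.get(model_env, default_model)
```

Keeping the variable name itself as a parameter lets different extractors read different environment variables without changing the function.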

Returns:

Parsed and validated model_cls instance.

Raises:
  • ValueError – Transcript is empty.

  • RuntimeError – OPENAI_API_KEY is missing.

  • Exception – Any model or parsing error; it is logged and then re-raised.

Return type:

T
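The documented error contract can be mirrored in a small wrapper, shown here as a sketch rather than the actual implementation. The name guarded_extraction and the run callable are hypothetical; the order of the checks and the treatment of whitespace-only transcripts as empty are assumptions.

```python
import logging
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def guarded_extraction(
    transcript: str,
    run: Callable[[str], T],
    logger: logging.Logger,
) -> T:
    """Hypothetical wrapper that follows the documented Raises section."""
    # Assumption: a whitespace-only transcript counts as empty.
    if not transcript or not transcript.strip():
        raise ValueError("transcript must be non-empty")

    # The API key is checked before any model call is attempted.
    if not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is missing")

    try:
        return run(transcript)
    except Exception:
        # Log with traceback, then propagate the original error unchanged.
        logger.exception("extraction failed")
        raise
```

Callers can therefore treat ValueError and RuntimeError as precondition failures that never reach the model, and anything else as a model or parsing failure that has already been logged.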