Module: app.extractors.common
Shared extractor runtime primitives (schema + Outlines/OpenAI execution).
- Authors:
Roger Erismann (https://hammerdirt.solutions), OpenAI Codex
- Purpose:
Provide common extraction utilities used by all section extractors:
  - strict model base config (StrictBaseModel)
  - OpenAI schema normalization helper for Outlines compatibility
  - prompt assembly and model execution via Outlines/OpenAI
- Dependencies:
outlines for structured generation wrapper
openai SDK for model invocation
pydantic for response-model validation
- class app.extractors.common.StrictBaseModel
Bases: BaseModel
Base model for all extractors with strict schema behavior.
Rejects unknown keys (extra="forbid").
Applies _enforce_openai_schema before schema export.
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'json_schema_extra': <function _enforce_openai_schema>}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
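The body of _enforce_openai_schema is not shown in this reference, so the following is only a sketch of what such a json_schema_extra hook might do. It assumes the goal is to satisfy OpenAI's strict structured-output rules (every object sets additionalProperties to false and lists all of its properties as required). The function name enforce_openai_schema_sketch and the sample schema are hypothetical.

```python
from typing import Any, Dict


def enforce_openai_schema_sketch(schema: Dict[str, Any]) -> None:
    """Hypothetical stand-in for _enforce_openai_schema (real body not documented here).

    Mutates the schema in place, which is how pydantic calls a
    json_schema_extra callable.
    """
    if schema.get("type") == "object" or "properties" in schema:
        props = schema.get("properties", {})
        # Assumption: strict mode rejects undeclared keys ...
        schema["additionalProperties"] = False
        # ... and expects every declared property to be listed as required.
        schema["required"] = list(props.keys())
        for sub in props.values():
            if isinstance(sub, dict):
                enforce_openai_schema_sketch(sub)

    # Recurse into array items and shared definitions.
    items = schema.get("items")
    if isinstance(items, dict):
        enforce_openai_schema_sketch(items)
    for sub in schema.get("$defs", {}).values():
        if isinstance(sub, dict):
            enforce_openai_schema_sketch(sub)


# Illustrative schema (not taken from the module).
example = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "tags": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"label": {"type": "string"}},
            },
        },
    },
    "required": ["name"],
}
enforce_openai_schema_sketch(example)
```

A hook of this shape could be attached through model_config as shown in the attribute listing above. Whether the real helper also rewrites other keywords (defaults, unions, and so on) is not stated here.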
- app.extractors.common.run_outlines_extraction(*, transcript, model_cls, system_path, section_path, logger, model_env='TRAQ_OPENAI_MODEL', default_model='gpt-4o-mini')
Run structured extraction for one section transcript.
- Parameters:
transcript (str) – Section transcript content (must be non-empty).
model_cls (Type[T]) – Pydantic model class for structured output.
system_path (Path) – Path to system prompt text.
section_path (Path) – Path to section prompt text.
logger – Logger used for extraction diagnostics.
model_env (str) – Environment variable name containing model id.
default_model (str) – Fallback model id when model_env is unset.
- Returns:
Parsed and validated model_cls instance.
- Raises:
ValueError – Transcript is empty.
RuntimeError – OPENAI_API_KEY is missing.
Exception – Propagates model/parse errors after logging.
- Return type:
T
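The documented preconditions and the model-id fallback can be sketched with the standard library alone. This is an illustration of the contract described above, not the module's code: the helper names resolve_model_id and check_preconditions are hypothetical, and treating a whitespace-only transcript or an empty environment value as "missing" is an assumption, as is the order of the two checks.

```python
import os
from typing import Mapping, Optional


def resolve_model_id(
    model_env: str = "TRAQ_OPENAI_MODEL",
    default_model: str = "gpt-4o-mini",
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the model id from the environment, or the fallback (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Assumption: an empty value counts as unset.
    return env.get(model_env) or default_model


def check_preconditions(
    transcript: str,
    environ: Optional[Mapping[str, str]] = None,
) -> None:
    """Raise the documented errors before any model call (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Assumption: whitespace-only input is treated as empty.
    if not transcript or not transcript.strip():
        raise ValueError("transcript must be non-empty")
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is missing")


# Usage with an explicit mapping so the example does not depend on the real environment.
fake_env = {"OPENAI_API_KEY": "placeholder", "TRAQ_OPENAI_MODEL": "gpt-4o"}
check_preconditions("Section transcript text.", environ=fake_env)
model_id = resolve_model_id(environ=fake_env)
fallback_id = resolve_model_id(environ={})
```

Passing the environment as a mapping keeps the checks easy to test; a caller of run_outlines_extraction would normally rely on the process environment instead.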