Legacy Import

Purpose

The legacy importer is the first non-runtime validation step for the PostgreSQL migration. It allows the schema to be exercised against real job data without changing the live server workflow.

Tool

  • tools/import_legacy_jobs.py

Current import scope

The importer reads the legacy job layout and loads it into PostgreSQL:

  • job_record.json

  • rounds/*/manifest.json

  • rounds/*/review.json

  • recording metadata files (*.meta.json)

  • image metadata files (*.meta.json)

  • final.json

  • final_correction.json

  • artifact paths for uploaded media and generated outputs
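The last item, artifact path indexing, can be pictured as a walk over each job directory that records one row per recognised file. The sketch below is a minimal illustration under stated assumptions, not the behaviour of tools/import_legacy_jobs.py. The extension-to-kind mapping, the `index_artifacts` function, and the record fields (`job_id`, `kind`, `relative_path`, `size_bytes`) are all hypothetical, and it assumes the job directory name is the job ID.

```python
from pathlib import Path

# Hypothetical extension-to-kind mapping. The real importer may classify
# artifacts differently, for example from the *.meta.json contents.
ARTIFACT_KINDS = {
    ".wav": "audio",
    ".mp3": "audio",
    ".m4a": "audio",
    ".txt": "transcript",
    ".jpg": "image",
    ".jpeg": "image",
    ".png": "image",
    ".pdf": "pdf",
    ".docx": "docx",
    ".geojson": "geojson",
}


def index_artifacts(job_dir: Path) -> list[dict]:
    """Return one hypothetical path record per recognised artifact under job_dir."""
    records = []
    for path in sorted(job_dir.rglob("*")):
        if not path.is_file():
            continue
        kind = ARTIFACT_KINDS.get(path.suffix.lower())
        if kind is None:
            # JSON metadata (suffix ".json") and unknown files are skipped.
            continue
        records.append({
            "job_id": job_dir.name,  # assumption: directory name == job ID
            "kind": kind,
            # Relative paths keep the index valid if the storage root moves.
            "relative_path": path.relative_to(job_dir).as_posix(),
            "size_bytes": path.stat().st_size,
        })
    return records
```

This sketch does not separate uploaded images from report images; a real importer would likely take that distinction from the metadata files or from the directory a file sits in.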

What this phase is for

This phase is not yet the runtime migration. It is for:

  • validating the schema against real jobs

  • discovering missing columns or weak table boundaries

  • testing import logic for jobs, rounds, finals, and artifact indexing

  • building confidence before replacing any file-backed runtime metadata path
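For the second point, one inexpensive way to find missing columns is to tally the top-level keys in the legacy JSON that no column accounts for. This is a hypothetical helper, assuming each job lives in its own directory directly under the legacy root (`<root>/<job_id>/job_record.json`). The column set is passed in; with SQLAlchemy it could be built from a model's table, e.g. `{c.name for c in Job.__table__.columns}`, where `Job` is a placeholder model name.

```python
import json
from collections import Counter
from pathlib import Path


def find_unmapped_keys(legacy_root: Path, filename: str, columns: set[str]) -> Counter:
    """Count, per key, how many legacy files carry a top-level key with no column.

    Hypothetical helper. Assumes the layout <legacy_root>/<job_id>/<filename>.
    """
    unmapped = Counter()
    for path in sorted(legacy_root.glob(f"*/{filename}")):
        record = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(record, dict):
            continue  # only object-shaped records map onto columns
        for key in record:
            if key not in columns:
                unmapped[key] += 1
    return unmapped
```

A key that appears in most jobs may be a candidate for a real column, while a rare one may be better kept in a JSON payload column.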

Initial success conditions

The first phase is successful when all of the following are true:

  1. Schema bootstrap works

    • PostgreSQL tables can be created from the current SQLAlchemy models.

  2. Real jobs import cleanly

    • legacy jobs under the legacy storage root import without fatal errors.

  3. Imported counts match filesystem reality

    • job count in PostgreSQL matches the imported job directories

    • round count matches the rounds/* directories

    • recording/image counts match the section metadata files

    • final/correction counts match the archived files on disk

  4. Archived snapshots are queryable

    • final.json and final_correction.json land in the database as retained snapshots.

  5. Artifact indexing is usable

    • uploaded audio, transcript text, uploaded images, report images, PDFs, DOCX, and GeoJSON are represented as artifact path records.

  6. No runtime behavior changes

    • the live server runtime remains independent from the importer while the import path is being developed and validated.
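Condition 3 can be checked mechanically by counting the legacy layout and comparing it with row counts taken from PostgreSQL. The sketch assumes every subdirectory of the legacy root is a job directory. The recording and image glob patterns are placeholders, since this document does not say where the section metadata files sit, and the count names are hypothetical. The database side is passed in as a plain dict, which in practice might be filled from a `SELECT count(*)` per table.

```python
from pathlib import Path

# Hypothetical placeholders: replace with the real location of the
# recording and image section metadata files.
RECORDING_META_GLOB = "rounds/*/recordings/*.meta.json"
IMAGE_META_GLOB = "rounds/*/images/*.meta.json"


def filesystem_counts(legacy_root: Path) -> dict[str, int]:
    """Count what the importer should have loaded, straight from disk."""
    counts = {"jobs": 0, "rounds": 0, "recordings": 0, "images": 0,
              "finals": 0, "corrections": 0}
    # Assumption: every subdirectory of the legacy root is one job.
    for job_dir in sorted(p for p in legacy_root.iterdir() if p.is_dir()):
        counts["jobs"] += 1
        counts["rounds"] += sum(1 for p in job_dir.glob("rounds/*") if p.is_dir())
        counts["recordings"] += sum(1 for _ in job_dir.glob(RECORDING_META_GLOB))
        counts["images"] += sum(1 for _ in job_dir.glob(IMAGE_META_GLOB))
        counts["finals"] += int((job_dir / "final.json").is_file())
        counts["corrections"] += int((job_dir / "final_correction.json").is_file())
    return counts


def count_mismatches(fs_counts: dict[str, int],
                     db_counts: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {name: (filesystem, database)} for every count that differs."""
    return {
        name: (fs, db_counts.get(name, 0))
        for name, fs in fs_counts.items()
        if db_counts.get(name, 0) != fs
    }
```

An empty result from `count_mismatches` means the counts agree; anything else names the count that drifted, together with both values.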

Next phase after success

Once the importer and schema are stable, begin replacing runtime metadata areas incrementally:

  1. device auth and tokens

  2. job metadata and assignments

  3. round metadata and manifests

  4. media metadata

  5. final and correction metadata
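One possible shape for this incremental switch is a per-area routing table, so that a single metadata area can move to PostgreSQL while the rest stay file-backed. Everything here is hypothetical: the `Backend` enum, the `MetadataRouter` class, and the area names are illustrative only and are not part of the current server.

```python
from enum import Enum
from typing import Any, Callable


class Backend(Enum):
    FILE = "file"
    POSTGRES = "postgres"


# A reader takes a lookup key (e.g. a job ID) and returns that area's metadata.
Reader = Callable[[str], Any]


class MetadataRouter:
    """Send each metadata area to its file-backed or PostgreSQL reader (hypothetical)."""

    def __init__(self, rollout: dict[str, Backend],
                 readers: dict[tuple[str, Backend], Reader]) -> None:
        self.rollout = rollout
        self.readers = readers

    def read(self, area: str, key: str) -> Any:
        # Areas without an entry have not been migrated yet, so they stay
        # on the file-backed path.
        backend = self.rollout.get(area, Backend.FILE)
        return self.readers[(area, backend)](key)
```

Under this sketch, rolling an area back is a one-line change to the rollout table, and an area with no entry falls back to the file-backed reader.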