Legacy Import Validation 2026-03-14

Purpose

This report captures the first PostgreSQL import run against legacy job data. It is a historical validation note for the import path, not a description of the current runtime authority model.

Environment

  • database: traq_demo

  • role: traq_app

  • driver: psycopg 3.2.9

  • importer: tools/import_legacy_jobs.py

Import command used

TRAQ_DATABASE_URL='postgresql+psycopg://traq_app:change-this-password@127.0.0.1:5432/traq_demo' \
uv run python tools/import_legacy_jobs.py --init-schema

Importer result

{
  "jobs": 7,
  "rounds": 18,
  "recordings": 96,
  "images": 14,
  "finals": 8,
  "artifacts": 250,
  "skipped": [
    "<legacy storage root>/jobs/job_1"
  ]
}

Observed filesystem reality

Job directories under the legacy storage root: 9

Breakdown: - 7 jobs with job_record.json - 6 jobs with final.json - 3 jobs with final_correction.json - 1 job directory (job_1) has final.json but no job_record.json

What imported successfully

Jobs

Imported jobs in PostgreSQL:

  • J0001 -> job_2c5b75bdbd35 -> draft

  • J0002 -> job_45936377e1f3 -> review_returned

  • J0003 -> job_ea452ac859b5 -> archived

  • J0004 -> job_e2d40483cd58 -> archived

  • J0005 -> job_b46cf3952931 -> archived

  • J0006 -> job_c9995ac6e368 -> archived

  • J0007 -> job_dd80fca1357a -> archived

Counts

  • jobs: 7

  • archived jobs: 5

  • rounds: 18

  • recordings: 96

  • images: 14

  • finals: 8 - 5 final - 3 correction

Artifacts indexed

Artifact counts by kind:

  • audio: 96

  • transcript_txt: 96

  • image: 28

  • review_json: 18

  • final_json: 8

  • final_pdf: 8

  • report_pdf: 8

  • report_docx: 3

  • geojson: 8

Per-job working media counts

  • J0001 -> recordings 0, images 0

  • J0002 -> recordings 1, images 0

  • J0003 -> recordings 21, images 4

  • J0004 -> recordings 16, images 4

  • J0005 -> recordings 17, images 3

  • J0006 -> recordings 24, images 0

  • J0007 -> recordings 17, images 3

Important findings

1. Final-only legacy job exists

job_1 was skipped because it has final.json but no job_record.json. The importer currently requires job_record.json to seed a jobs row.

Consequence: - the importer is not yet robust for older archived-only job directories - we need a fallback job import path based on final.json alone

2. Round history imports cleanly enough to test the schema

The existing round/section/media structure maps naturally into: - job_rounds - round_recordings - round_images - artifacts

This validates the decision to retain round/media tables even though rounds may later be pruned after archival.

3. Job metadata and archived final metadata can disagree

Two imported jobs show disagreement between job_record.json and the archived final snapshot:

  • J0004 - latest_round_id = round_5 - final snapshot round_id = round_2

  • J0006 - latest_round_id = round_2 - final snapshot round_id = round_15

Consequence:

  • final/correction snapshot data must be treated as authoritative for archived output provenance

  • job_record.json should not be assumed to be the final truth for archived jobs

4. Artifact indexing is already useful

The current artifact model is sufficient to reference: - uploaded recordings - transcript text files - uploaded images - report images - review payloads - final JSON - final PDFs - report PDFs - report DOCX - GeoJSON

This is enough to support further query/report work without changing runtime.

Assessment against initial success conditions

Met

  • schema bootstrap works

  • real jobs import without fatal schema errors

  • archived snapshots are queryable

  • artifact indexing is usable

  • runtime server behavior remains unchanged

Partially met

  • imported counts match filesystem reality

Reason:

  • the importer skipped one legacy archived-only job because it lacked job_record.json

  • the importer still needs a fallback path for final-only job directories