Legacy Import Validation 2026-03-14¶

Purpose¶

This report captures the first PostgreSQL import run against legacy job data. It is a historical validation note for the import path, not a description of the current runtime authority model.

Environment¶

database: traq_demo
role: traq_app
driver: psycopg 3.2.9
importer: tools/import_legacy_jobs.py

Import command used¶

TRAQ_DATABASE_URL='postgresql+psycopg://traq_app:change-this-password@127.0.0.1:5432/traq_demo' \
uv run python tools/import_legacy_jobs.py --init-schema

Importer result¶

{
  "jobs": 7,
  "rounds": 18,
  "recordings": 96,
  "images": 14,
  "finals": 8,
  "artifacts": 250,
  "skipped": [
    "<legacy storage root>/jobs/job_1"
  ]
}

Observed filesystem reality¶

Job directories under the legacy storage root: 9

Breakdown: - 7 jobs with job_record.json - 6 jobs with final.json - 3 jobs with final_correction.json - 1 job directory (job_1) has final.json but no job_record.json

What imported successfully¶

Jobs¶

Imported jobs in PostgreSQL:

J0001 -> job_2c5b75bdbd35 -> draft
J0002 -> job_45936377e1f3 -> review_returned
J0003 -> job_ea452ac859b5 -> archived
J0004 -> job_e2d40483cd58 -> archived
J0005 -> job_b46cf3952931 -> archived
J0006 -> job_c9995ac6e368 -> archived
J0007 -> job_dd80fca1357a -> archived

Counts¶

jobs: 7
archived jobs: 5
rounds: 18
recordings: 96
images: 14
finals: 8 - 5 final - 3 correction

Artifacts indexed¶

Artifact counts by kind:

audio: 96
transcript_txt: 96
image: 28
review_json: 18
final_json: 8
final_pdf: 8
report_pdf: 8
report_docx: 3
geojson: 8

Per-job working media counts¶

J0001 -> recordings 0, images 0
J0002 -> recordings 1, images 0
J0003 -> recordings 21, images 4
J0004 -> recordings 16, images 4
J0005 -> recordings 17, images 3
J0006 -> recordings 24, images 0
J0007 -> recordings 17, images 3

Important findings¶

1. Final-only legacy job exists¶

job_1 was skipped because it has final.json but no job_record.json. The importer currently requires job_record.json to seed a jobs row.

Consequence: - the importer is not yet robust for older archived-only job directories - we need a fallback job import path based on final.json alone

2. Round history imports cleanly enough to test the schema¶

The existing round/section/media structure maps naturally into: - job_rounds - round_recordings - round_images - artifacts

This validates the decision to retain round/media tables even though rounds may later be pruned after archival.

3. Job metadata and archived final metadata can disagree¶

Two imported jobs show disagreement between job_record.json and the archived final snapshot:

J0004 - latest_round_id = round_5 - final snapshot round_id = round_2
J0006 - latest_round_id = round_2 - final snapshot round_id = round_15

Consequence:

final/correction snapshot data must be treated as authoritative for archived output provenance
job_record.json should not be assumed to be the final truth for archived jobs

4. Artifact indexing is already useful¶

The current artifact model is sufficient to reference: - uploaded recordings - transcript text files - uploaded images - report images - review payloads - final JSON - final PDFs - report PDFs - report DOCX - GeoJSON

This is enough to support further query/report work without changing runtime.

Assessment against initial success conditions¶

Met¶

schema bootstrap works
real jobs import without fatal schema errors
archived snapshots are queryable
artifact indexing is usable
runtime server behavior remains unchanged

Partially met¶

imported counts match filesystem reality

Reason:

the importer skipped one legacy archived-only job because it lacked job_record.json
the importer still needs a fallback path for final-only job directories

Recommended next changes¶

Add fallback import for archived-only jobs with final.json but no job_record.json.
Add a post-import validation script with explicit comparisons to filesystem counts by job.
Begin using imported data to design runtime read/write services, starting with device/auth and job metadata.