Legacy Import Validation 2026-03-14
Purpose
This report captures the first PostgreSQL import run against legacy job data. It is a historical validation note for the import path, not a description of the current runtime authority model.
Environment
- database: traq_demo
- role: traq_app
- driver: psycopg 3.2.9
- importer: tools/import_legacy_jobs.py
Import command used
TRAQ_DATABASE_URL='postgresql+psycopg://traq_app:change-this-password@127.0.0.1:5432/traq_demo' \
uv run python tools/import_legacy_jobs.py --init-schema
Importer result
{
  "jobs": 7,
  "rounds": 18,
  "recordings": 96,
  "images": 14,
  "finals": 8,
  "artifacts": 250,
  "skipped": [
    "<legacy storage root>/jobs/job_1"
  ]
}
Observed filesystem reality
Job directories under the legacy storage root: 9
Breakdown:
- 7 jobs with job_record.json
- 6 jobs with final.json
- 3 jobs with final_correction.json
- 1 job directory (job_1) has final.json but no job_record.json
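The breakdown above can be reproduced with a small directory scan. The following is a minimal sketch, assuming each job lives in its own directory under a `jobs/` folder of the legacy storage root (as the skipped path in the importer result suggests). The function name `classify_job_dirs` is hypothetical and is not part of the importer.

```python
from collections import Counter
from pathlib import Path

# File names taken from the breakdown above.
MARKER_FILES = ("job_record.json", "final.json", "final_correction.json")


def classify_job_dirs(jobs_root: Path) -> tuple[Counter, list[str]]:
    """Count marker files per job directory and list final-only directories.

    Hypothetical helper: assumes one sub-directory per job under jobs_root.
    """
    counts: Counter = Counter()
    final_only: list[str] = []
    for job_dir in sorted(p for p in jobs_root.iterdir() if p.is_dir()):
        counts["directories"] += 1
        present = {name for name in MARKER_FILES if (job_dir / name).is_file()}
        for name in present:
            counts[name] += 1
        # A directory with final.json but no job_record.json is the case
        # the importer skipped in this run.
        if "final.json" in present and "job_record.json" not in present:
            final_only.append(job_dir.name)
    return counts, final_only
```

Running this against the `jobs/` directory and comparing `counts` with the importer result would make a skipped directory such as job_1 visible before the import runs.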
What imported successfully
Jobs
Imported jobs in PostgreSQL:
- J0001 -> job_2c5b75bdbd35 -> draft
- J0002 -> job_45936377e1f3 -> review_returned
- J0003 -> job_ea452ac859b5 -> archived
- J0004 -> job_e2d40483cd58 -> archived
- J0005 -> job_b46cf3952931 -> archived
- J0006 -> job_c9995ac6e368 -> archived
- J0007 -> job_dd80fca1357a -> archived
Counts
- jobs: 7
- archived jobs: 5
- rounds: 18
- recordings: 96
- images: 14
- finals: 8
  - final: 5
  - correction: 3
Artifacts indexed
Artifact counts by kind:
- audio: 96
- transcript_txt: 96
- image: 28
- review_json: 18
- final_json: 8
- final_pdf: 8
- report_pdf: 8
- report_docx: 3
- geojson: 8
Per-job working media counts
- J0001 -> recordings 0, images 0
- J0002 -> recordings 1, images 0
- J0003 -> recordings 21, images 4
- J0004 -> recordings 16, images 4
- J0005 -> recordings 17, images 3
- J0006 -> recordings 24, images 0
- J0007 -> recordings 17, images 3
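As a quick cross-check, the per-job figures above sum to the recordings and images totals reported by the importer. A minimal sketch, with the numbers copied from this report and a hypothetical variable name `per_job`:

```python
# Per-job (recordings, images) counts copied from the list above.
per_job = {
    "J0001": (0, 0),
    "J0002": (1, 0),
    "J0003": (21, 4),
    "J0004": (16, 4),
    "J0005": (17, 3),
    "J0006": (24, 0),
    "J0007": (17, 3),
}

total_recordings = sum(recordings for recordings, _ in per_job.values())
total_images = sum(images for _, images in per_job.values())

# Totals reported by the importer for this run.
assert total_recordings == 96
assert total_images == 14
```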
Important findings
1. Final-only legacy job exists
job_1 was skipped because it has final.json but no job_record.json.
The importer currently requires job_record.json to seed a jobs row.
Consequence:
- the importer is not yet robust for older archived-only job directories
- we need a fallback job import path based on final.json alone
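One possible shape for that fallback is sketched below. It only decides which file should seed the `jobs` row. The function name `pick_seed_file` is an assumption for illustration, and the mapping from `final.json` fields to `jobs` columns is not covered in this report.

```python
from pathlib import Path
from typing import Optional


def pick_seed_file(job_dir: Path) -> Optional[Path]:
    """Return the file a jobs row could be seeded from, or None to skip.

    Hypothetical helper, not the importer's current behavior: the importer
    in this run only accepts job_record.json.
    """
    job_record = job_dir / "job_record.json"
    if job_record.is_file():
        return job_record
    # Fallback for archived-only directories such as job_1.
    final = job_dir / "final.json"
    if final.is_file():
        return final
    return None
```

With a rule like this, a directory is skipped only when neither file is present, so a final-only job would be imported rather than reported under `skipped`.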
2. Round history imports cleanly enough to test the schema
The existing round/section/media structure maps naturally into:
- job_rounds
- round_recordings
- round_images
- artifacts
This validates the decision to retain round/media tables even though rounds may later be pruned after archival.
3. Job metadata and archived final metadata can disagree
Two imported jobs show disagreement between job_record.json and the archived
final snapshot:
- J0004: latest_round_id=round_5, final snapshot round_id=round_2
- J0006: latest_round_id=round_2, final snapshot round_id=round_15
Consequence:
- final/correction snapshot data must be treated as authoritative for archived output provenance
- job_record.json should not be assumed to be the final truth for archived jobs
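The precedence rule can be sketched as a small helper. This is an illustration only: it assumes the job record exposes `latest_round_id` and the final or correction snapshot exposes `round_id`, as in the two examples above, and the function name `provenance_round_id` is hypothetical.

```python
from typing import Any, Mapping, Optional


def provenance_round_id(
    job_record: Mapping[str, Any],
    final_snapshot: Optional[Mapping[str, Any]],
) -> Optional[str]:
    """Pick the round id to report as the source of archived output.

    Assumes job_record carries "latest_round_id" and the final/correction
    snapshot carries "round_id".
    """
    if final_snapshot is not None and final_snapshot.get("round_id"):
        # Archived output: the snapshot wins, even if job_record disagrees.
        return final_snapshot["round_id"]
    # No snapshot yet (for example a draft job): fall back to job metadata.
    return job_record.get("latest_round_id")
```

For the J0004 values, `provenance_round_id({"latest_round_id": "round_5"}, {"round_id": "round_2"})` returns `"round_2"`.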
4. Artifact indexing is already useful
The current artifact model is sufficient to reference:
- uploaded recordings
- transcript text files
- uploaded images
- report images
- review payloads
- final JSON
- final PDFs
- report PDFs
- report DOCX
- GeoJSON
This is enough to support further query/report work without changing runtime.
Assessment against initial success conditions
Met
- schema bootstrap works
- real jobs import without fatal schema errors
- archived snapshots are queryable
- artifact indexing is usable
- runtime server behavior remains unchanged
Partially met
- imported counts match filesystem reality
Reason:
- the importer skipped one legacy archived-only job because it lacked job_record.json
- the importer still needs a fallback path for final-only job directories
Recommended next changes
- Add fallback import for archived-only jobs with final.json but no job_record.json.
- Add a post-import validation script with explicit comparisons to filesystem counts by job.
- Begin using imported data to design runtime read/write services, starting with device/auth and job metadata.
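The core of the post-import validation step could be a per-job comparison like the sketch below. It assumes the filesystem counts and the database counts have already been collected into dictionaries keyed by job and then by kind (for example "recordings" or "images"). The name `diff_counts` and the dictionary layout are hypothetical and not part of the existing tooling.

```python
from typing import Dict, List, Tuple

Counts = Dict[str, int]


def diff_counts(
    filesystem: Dict[str, Counts], database: Dict[str, Counts]
) -> List[Tuple[str, str, int, int]]:
    """Return (job, kind, filesystem_count, database_count) for each mismatch.

    A job missing on one side is treated as having zero of everything there.
    """
    mismatches = []
    for job in sorted(set(filesystem) | set(database)):
        fs_counts = filesystem.get(job, {})
        db_counts = database.get(job, {})
        for kind in sorted(set(fs_counts) | set(db_counts)):
            fs_n = fs_counts.get(kind, 0)
            db_n = db_counts.get(kind, 0)
            if fs_n != db_n:
                mismatches.append((job, kind, fs_n, db_n))
    return mismatches
```

An empty result means the two sides agree. A job present on disk but absent from the database, as job_1 was in this run, shows up as one mismatch per kind it has on disk.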