Legacy Import Validation 2026-03-14 =================================== Purpose ------- This report captures the first PostgreSQL import run against legacy job data. It is a historical validation note for the import path, not a description of the current runtime authority model. Environment ----------- - database: ``traq_demo`` - role: ``traq_app`` - driver: ``psycopg 3.2.9`` - importer: ``tools/import_legacy_jobs.py`` Import command used ------------------- :: TRAQ_DATABASE_URL='postgresql+psycopg://traq_app:change-this-password@127.0.0.1:5432/traq_demo' \ uv run python tools/import_legacy_jobs.py --init-schema Importer result --------------- :: { "jobs": 7, "rounds": 18, "recordings": 96, "images": 14, "finals": 8, "artifacts": 250, "skipped": [ "/jobs/job_1" ] } Observed filesystem reality --------------------------- Job directories under the legacy storage root: ``9`` Breakdown: - ``7`` jobs with ``job_record.json`` - ``6`` jobs with ``final.json`` - ``3`` jobs with ``final_correction.json`` - ``1`` job directory (``job_1``) has ``final.json`` but no ``job_record.json`` What imported successfully -------------------------- Jobs ^^^^ Imported jobs in PostgreSQL: - ``J0001`` -> ``job_2c5b75bdbd35`` -> ``draft`` - ``J0002`` -> ``job_45936377e1f3`` -> ``review_returned`` - ``J0003`` -> ``job_ea452ac859b5`` -> ``archived`` - ``J0004`` -> ``job_e2d40483cd58`` -> ``archived`` - ``J0005`` -> ``job_b46cf3952931`` -> ``archived`` - ``J0006`` -> ``job_c9995ac6e368`` -> ``archived`` - ``J0007`` -> ``job_dd80fca1357a`` -> ``archived`` Counts ^^^^^^ - jobs: ``7`` - archived jobs: ``5`` - rounds: ``18`` - recordings: ``96`` - images: ``14`` - finals: ``8`` - ``5`` final - ``3`` correction Artifacts indexed ^^^^^^^^^^^^^^^^^ Artifact counts by kind: - ``audio``: ``96`` - ``transcript_txt``: ``96`` - ``image``: ``28`` - ``review_json``: ``18`` - ``final_json``: ``8`` - ``final_pdf``: ``8`` - ``report_pdf``: ``8`` - ``report_docx``: ``3`` - ``geojson``: ``8`` Per-job working media counts ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - ``J0001`` -> recordings ``0``, images ``0`` - ``J0002`` -> recordings ``1``, images ``0`` - ``J0003`` -> recordings ``21``, images ``4`` - ``J0004`` -> recordings ``16``, images ``4`` - ``J0005`` -> recordings ``17``, images ``3`` - ``J0006`` -> recordings ``24``, images ``0`` - ``J0007`` -> recordings ``17``, images ``3`` Important findings ------------------ 1. Final-only legacy job exists ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``job_1`` was skipped because it has ``final.json`` but no ``job_record.json``. The importer currently requires ``job_record.json`` to seed a ``jobs`` row. Consequence: - the importer is not yet robust for older archived-only job directories - we need a fallback job import path based on ``final.json`` alone 2. Round history imports cleanly enough to test the schema ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The existing round/section/media structure maps naturally into: - ``job_rounds`` - ``round_recordings`` - ``round_images`` - ``artifacts`` This validates the decision to retain round/media tables even though rounds may later be pruned after archival. 3. Job metadata and archived final metadata can disagree ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Two imported jobs show disagreement between ``job_record.json`` and the archived final snapshot: - ``J0004`` - ``latest_round_id`` = ``round_5`` - final snapshot ``round_id`` = ``round_2`` - ``J0006`` - ``latest_round_id`` = ``round_2`` - final snapshot ``round_id`` = ``round_15`` Consequence: - final/correction snapshot data must be treated as authoritative for archived output provenance - ``job_record.json`` should not be assumed to be the final truth for archived jobs 4. Artifact indexing is already useful ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The current artifact model is sufficient to reference: - uploaded recordings - transcript text files - uploaded images - report images - review payloads - final JSON - final PDFs - report PDFs - report DOCX - GeoJSON This is enough to support further query/report work without changing runtime. Assessment against initial success conditions --------------------------------------------- Met ^^^ - schema bootstrap works - real jobs import without fatal schema errors - archived snapshots are queryable - artifact indexing is usable - runtime server behavior remains unchanged Partially met ^^^^^^^^^^^^^ - imported counts match filesystem reality Reason: - the importer skipped one legacy archived-only job because it lacked ``job_record.json`` - the importer still needs a fallback path for final-only job directories Recommended next changes ------------------------ 1. Add fallback import for archived-only jobs with ``final.json`` but no ``job_record.json``. 2. Add a post-import validation script with explicit comparisons to filesystem counts by job. 3. Begin using imported data to design runtime read/write services, starting with device/auth and job metadata.