Database Workflow
=================

Purpose
-------

This note describes how imported legacy data should be queried, edited,
pruned, and reconsolidated into reporting outputs as the server moves from
filesystem metadata to PostgreSQL.

Read model
----------

Use PostgreSQL as the system of record for metadata and workflow state. Read
paths should converge on:

- ``customers`` and ``billing_profiles`` for reusable operational identity data
- ``operators`` for reusable assessor identity
- ``jobs`` for current job identity and status
- ``job_rounds`` for working review history
- ``round_recordings`` and ``round_images`` for uploaded media metadata
- ``job_finals`` for archived final and correction snapshots
- ``artifacts`` for file references
- ``job_events`` for audit/history

Write model
-----------

Do not edit archived output artifacts directly.

Editing rules:

- active work updates database rows
- uploaded media files remain immutable artifacts on disk
- archived final rows are immutable
- correction rows may be overwritten as the current correction copy
- generated PDFs, DOCX, and GeoJSON are replaced by regenerating them from
  the current final/correction payload

In practice:

- metadata is edited in PostgreSQL
- artifacts are regenerated or replaced at known paths
- the database stores the authoritative payload and references the artifact
  files

Pruning model
-------------

Pruning should happen only after a job is archived and the final/correction
snapshots are intact.

Keep permanently:

- ``jobs``
- ``job_finals``
- ``artifacts`` for retained outputs
- ``job_events``
- any assignment/auth history needed for audit

Prune candidates after archive:

- ``job_rounds`` not referenced by the retained final/correction snapshots
- ``round_recordings`` and ``round_images`` tied only to pruned rounds
- transient manifest/review artifacts tied only to pruned rounds

The current schema keeps working rounds so that pruning policy can be tested
before any deletion logic is automated.
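The prune-candidate rule above can be sketched as a query: select rounds that
no retained final/correction snapshot references. This is a minimal
illustration only, using ``sqlite3`` as a stand-in for PostgreSQL; the column
names (``round_id``, ``job_id``, ``payload``) and the ``source_round`` payload
key are assumptions for the sketch, not the actual schema:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job_rounds (round_id INTEGER PRIMARY KEY, job_id INTEGER);
    CREATE TABLE job_finals (job_id INTEGER, payload TEXT);
""")

# One archived job with three working rounds; the final snapshot
# references only round 3.
conn.executemany("INSERT INTO job_rounds VALUES (?, ?)",
                 [(1, 10), (2, 10), (3, 10)])
conn.execute("INSERT INTO job_finals VALUES (?, ?)",
             (10, json.dumps({"source_round": 3})))

# Rounds referenced by any retained final/correction snapshot are kept.
retained = {
    json.loads(payload)["source_round"]
    for (payload,) in conn.execute("SELECT payload FROM job_finals")
}

# Everything else tied to the archived job is a prune candidate.
candidates = [
    rid
    for (rid,) in conn.execute(
        "SELECT round_id FROM job_rounds WHERE job_id = 10")
    if rid not in retained
]
print(candidates)  # rounds 1 and 2 are safe to prune
```

The same membership test would extend to ``round_recordings`` and
``round_images`` rows tied only to the pruned rounds.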
Reporting model
---------------

The reporting structure should be driven from ``job_finals.payload``.

Why:

- archived final/correction snapshots are the legal/reporting record
- current tests already show that ``job_record.json`` and archived final
  lineage can disagree
- final/correction payloads contain the consolidated form/transcript/report
  data

That means:

- reporting exports should read from ``job_finals``
- customer/billing/admin lookup should read from normalized operational tables
- round tables support provenance, troubleshooting, and edit history; they
  are not the long-term reporting source

Read-only query tool
--------------------

Use:

- ``tools/query_imported_jobs.py``

Current query presets:

- ``summary``
- ``archived-finals``
- ``round-mismatches``
- ``media-by-job``
- ``pruning-candidates``
- ``report-projection``
- ``normalized-entities``

These queries exist to test the schema against real imported jobs before live
runtime code is moved to PostgreSQL.
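As a sketch of the reporting read path, the following shows an export
projecting fields out of ``job_finals.payload`` rather than the round tables.
Again ``sqlite3`` stands in for PostgreSQL, and the payload keys (``report``,
``title``, ``transcript``) are hypothetical placeholders, not the real
payload shape:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_finals (job_id INTEGER, kind TEXT, payload TEXT)")
conn.execute(
    "INSERT INTO job_finals VALUES (?, ?, ?)",
    (10, "final",
     json.dumps({"report": {"title": "Site A"}, "transcript": []})),
)

# Reporting exports read the archived snapshot, never the working rounds.
rows = conn.execute(
    "SELECT job_id, payload FROM job_finals WHERE kind = 'final'")
projection = [
    {"job_id": job_id, "title": json.loads(payload)["report"]["title"]}
    for job_id, payload in rows
]
print(projection)  # [{'job_id': 10, 'title': 'Site A'}]
```

A preset like ``report-projection`` would presumably produce this kind of
flat view, which is what makes it useful for testing the schema against real
imported jobs before runtime code moves over.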