Database Workflow
=================

Purpose
-------

This note describes how imported legacy data should be queried, edited,
pruned, and reconsolidated into reporting outputs as the server moves from
filesystem metadata to PostgreSQL.

Read model
----------

Use PostgreSQL as the system of record for metadata and workflow state. Read
paths should converge on:

- ``customers`` and ``billing_profiles`` for reusable operational identity data
- ``operators`` for reusable assessor identity
- ``jobs`` for current job identity and status
- ``job_rounds`` for working review history
- ``round_recordings`` and ``round_images`` for uploaded media metadata
- ``job_finals`` for archived final and correction snapshots
- ``artifacts`` for file references
- ``job_events`` for audit/history

Write model
-----------

Do not edit archived output artifacts directly.

Editing rules:

- active work updates database rows
- uploaded media files remain immutable artifacts on disk
- archived final rows are immutable
- correction rows may be overwritten as the current correction copy
- generated PDFs, DOCX, and GeoJSON are replaced by regenerating them from
  the current final/correction payload

In practice:

- metadata is edited in PostgreSQL
- artifacts are regenerated or replaced at known paths
- the database stores the authoritative payload and references the artifact
  files

Pruning model
-------------

Pruning should happen only after a job is archived and the final/correction
snapshots are intact.

Keep permanently:

- ``jobs``
- ``job_finals``
- ``artifacts`` for retained outputs
- ``job_events``
- any assignment/auth history needed for audit

Prune candidates after archive:

- ``job_rounds`` not referenced by the retained final/correction snapshots
- ``round_recordings`` and ``round_images`` tied only to pruned rounds
- transient manifest/review artifacts tied only to pruned rounds

The current schema keeps working rounds so that pruning policy can be tested
before any deletion logic is automated.
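The prune-candidate rule above can be sketched as a query: select rounds that
no retained final/correction snapshot references. This is a minimal
illustration only, using ``sqlite3`` as a stand-in for PostgreSQL; the column
names (``round_id``, ``job_id``, ``payload``) and the ``source_round`` payload
key are assumptions for the sketch, not the actual schema:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job_rounds (round_id INTEGER PRIMARY KEY, job_id INTEGER);
    CREATE TABLE job_finals (job_id INTEGER, payload TEXT);
""")

# One archived job with three working rounds; the final snapshot
# references only round 3.
conn.executemany("INSERT INTO job_rounds VALUES (?, ?)",
                 [(1, 10), (2, 10), (3, 10)])
conn.execute("INSERT INTO job_finals VALUES (?, ?)",
             (10, json.dumps({"source_round": 3})))

# Rounds referenced by any retained final/correction snapshot are kept.
retained = {
    json.loads(payload)["source_round"]
    for (payload,) in conn.execute("SELECT payload FROM job_finals")
}

# Everything else tied to the archived job is a prune candidate.
candidates = [
    rid
    for (rid,) in conn.execute(
        "SELECT round_id FROM job_rounds WHERE job_id = 10")
    if rid not in retained
]
print(candidates)  # rounds 1 and 2 are safe to prune
```

The same membership test would extend to ``round_recordings`` and
``round_images`` rows tied only to the pruned rounds.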
Reporting model
---------------

The reporting structure should be driven from ``job_finals.payload``.

Why:

- archived final/correction snapshots are the legal/reporting record
- current tests already show that ``job_record.json`` and archived final
  lineage can disagree
- final/correction payloads contain the consolidated form/transcript/report
  data

That means:

- reporting exports should read from ``job_finals``
- customer/billing/admin lookup should read from normalized operational tables
- round tables support provenance, troubleshooting, and edit history; they
  are not the long-term reporting source

Read-only query tool
--------------------

Use:

- ``tools/query_imported_jobs.py``

Current query presets:

- ``summary``
- ``archived-finals``
- ``round-mismatches``
- ``media-by-job``
- ``pruning-candidates``
- ``report-projection``
- ``normalized-entities``

These queries exist to test the schema against real imported jobs before live
runtime code is moved to PostgreSQL.
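As a sketch of the reporting read path, the following shows an export
projecting fields out of ``job_finals.payload`` rather than the round tables.
Again ``sqlite3`` stands in for PostgreSQL, and the payload keys (``report``,
``title``, ``transcript``) are hypothetical placeholders, not the real
payload shape:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_finals (job_id INTEGER, kind TEXT, payload TEXT)")
conn.execute(
    "INSERT INTO job_finals VALUES (?, ?, ?)",
    (10, "final",
     json.dumps({"report": {"title": "Site A"}, "transcript": []})),
)

# Reporting exports read the archived snapshot, never the working rounds.
rows = conn.execute(
    "SELECT job_id, payload FROM job_finals WHERE kind = 'final'")
projection = [
    {"job_id": job_id, "title": json.loads(payload)["report"]["title"]}
    for job_id, payload in rows
]
print(projection)  # [{'job_id': 10, 'title': 'Site A'}]
```

A preset like ``report-projection`` would presumably produce this kind of
flat view, which is what makes it useful for testing the schema against real
imported jobs before runtime code moves over.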