Database Workflow

Purpose

This note describes how imported legacy data should be queried, edited, pruned, and reconsolidated into reporting outputs as the server moves from filesystem metadata to PostgreSQL.

Read model

Use PostgreSQL as the system of record for metadata and workflow state.

Read paths should converge on:

  • customers and billing_profiles for reusable operational identity data

  • operators for reusable assessor identity

  • jobs for current job identity and status

  • job_rounds for working review history

  • round_recordings and round_images for uploaded media metadata

  • job_finals for archived final and correction snapshots

  • artifacts for file references

  • job_events for audit/history

Write model

Do not edit archived output artifacts directly.

Editing rules:

  • active work updates database rows

  • uploaded media files remain immutable artifacts on disk

  • archived final rows are immutable

  • correction rows may be overwritten as the current correction copy

  • generated PDFs, DOCX, and GeoJSON are replaced by regenerating them from the current final/correction payload

In practice:

  • metadata is edited in PostgreSQL

  • artifacts are regenerated or replaced at known paths

  • the database stores the authoritative payload and references the artifact files

Pruning model

Pruning should happen only after a job is archived and the final/correction snapshots are intact.

Keep permanently:

  • jobs

  • job_finals

  • artifacts for retained outputs

  • job_events

  • any assignment/auth history needed for audit

Prune candidates after archive:

  • job_rounds not referenced by the retained final/correction snapshots

  • round_recordings and round_images tied only to pruned rounds

  • transient manifest/review artifacts tied only to pruned rounds

The current schema keeps working rounds so pruning policy can be tested before any deletion logic is automated.

Reporting model

The reporting structure should be driven from job_finals.payload.

Why:

  • archived final/correction snapshots are the legal/reporting record

  • current tests already show that job_record.json and archived final lineage can disagree

  • final/correction payloads contain the consolidated form/transcript/report data

That means:

  • reporting exports should read from job_finals

  • customer/billing/admin lookup should read from normalized operational tables

  • round tables support provenance, troubleshooting, and edit history

  • they are not the long-term reporting source

Read-only query tool

Use:

  • tools/query_imported_jobs.py

Current query presets:

  • summary

  • archived-finals

  • round-mismatches

  • media-by-job

  • pruning-candidates

  • report-projection

  • normalized-entities

These queries exist to test the schema against real imported jobs before live runtime code is moved to PostgreSQL.