Architecture Source Ledger

  • publication-target: Marius architecture build reference
  • tracked-raw-text: no
  • raw-pdf-tracked: no

Source

  • file: source.pdf
  • original-path: redacted from tracked ledger; see ignored extraction report when local provenance is needed
  • cache-path: .cache/architecture-reference/source-corpus/source.pdf
  • sha256: d7d5b77acaf501ceed0d41f07162d8dc3cdf6b6253a089817678477e3be4bc45
  • size-bytes: 216602954
  • source-pages: 442
  • extracted-pages: 442
  • extracted-at: 2026-06-16T13:38:33.478Z
  • extractor: Bun orchestration over pdftotext, falling back to mutool, then to uvx markitdown (unless --no-markitdown is passed); OCR is optional and enabled with --allow-ocr
  • source-override: pass --source=/path/to/source.pdf or set ARCHITECTURE_REFERENCE_SOURCE_PDF
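The source-override and extractor entries above can be read as a small flag-resolution step. The following is a minimal sketch, not the generator's actual code: the function names `resolveSourcePath` and `extractorOrder` are hypothetical, the precedence of `--source` over the environment variable is an assumption (the ledger does not state which wins), and the step labels are names only, not exact command lines.

```typescript
// Default cache location, taken from the ledger's cache-path entry.
const DEFAULT_SOURCE = ".cache/architecture-reference/source-corpus/source.pdf";

type Env = Record<string, string | undefined>;

// Resolve the PDF path.
// Assumption: --source wins over ARCHITECTURE_REFERENCE_SOURCE_PDF,
// and both fall back to the cache path when absent or empty.
function resolveSourcePath(argv: string[], env: Env): string {
  const prefix = "--source=";
  const flag = argv.find((arg) => arg.startsWith(prefix));
  const fromFlag = flag ? flag.slice(prefix.length) : "";
  if (fromFlag !== "") return fromFlag;
  const fromEnv = env["ARCHITECTURE_REFERENCE_SOURCE_PDF"] ?? "";
  if (fromEnv !== "") return fromEnv;
  return DEFAULT_SOURCE;
}

// Ordered list of extraction steps to try.
// "ocr" is a placeholder label: the ledger does not name the OCR tool.
function extractorOrder(argv: string[]): string[] {
  const steps = ["pdftotext", "mutool"];
  if (!argv.includes("--no-markitdown")) steps.push("uvx markitdown");
  if (argv.includes("--allow-ocr")) steps.push("ocr");
  return steps;
}
```

Under Bun or Node, the arguments would typically come from `process.argv.slice(2)` and `process.env`. Passing them in as parameters keeps both functions easy to test without touching the real environment.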

Registered Reference Seeds

  • source: architecture web corpus
  • url: registered-reference-url
  • status: registered metadata only, not scraped by this generator
  • allowed-use: future curated URL selection, short source pointers after explicit capture
  • excluded-use: blind full-site mirroring, raw article dumps, copied diagram text or images

Corpus Boundary

This repo stores only build-reference notes, topic indexes, corpus pointers, and extraction tooling. The raw PDF, raw page text, OCR images, and intermediate chunk JSON stay under .cache/architecture-reference/source-corpus/, which is ignored by git.

Use this as an architecture build reference: explain patterns in operational terms, cite corpus pages for inspection, and keep raw captures out of git.
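One way to guard the boundary described above is to scan the list of tracked paths for anything under the cache directory. This is a hedged sketch: `findBoundaryViolations` is a hypothetical helper, and it assumes the caller supplies repo-relative paths (for example, the output of `git ls-files`).

```typescript
// Cache directory that must stay out of git, per the Corpus Boundary section.
const CACHE_PREFIX = ".cache/architecture-reference/source-corpus/";

// Return every tracked path that lives under the ignored cache directory.
// Paths are normalized to forward slashes and stripped of a leading "./".
function findBoundaryViolations(trackedPaths: string[]): string[] {
  return trackedPaths
    .map((p) => p.replace(/\\/g, "/").replace(/^\.\//, ""))
    .filter((p) => p.startsWith(CACHE_PREFIX));
}
```

An empty result means no raw capture is tracked. Any returned path points at a file that should be removed from the index.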

Extraction Report

  • report: .cache/architecture-reference/source-corpus/extraction-report.json
  • chunks: 18
  • failed-or-partial-ranges: p. 1-25 (partial), p. 301-325 (partial), p. 351-375 (partial), p. 376-400 (partial)
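The reported ranges are consistent with fixed 25-page chunks: 442 pages split into 25-page chunks gives 18, matching the chunk count. The sketch below checks that reading. The 25-page chunk size is an inference from the listed ranges, not a documented setting, and the helper names are hypothetical.

```typescript
interface PageRange {
  start: number;
  end: number;
}

// Split pages 1..totalPages into consecutive chunks of chunkSize pages.
function chunkRanges(totalPages: number, chunkSize: number): PageRange[] {
  const ranges: PageRange[] = [];
  for (let start = 1; start <= totalPages; start += chunkSize) {
    ranges.push({ start, end: Math.min(start + chunkSize - 1, totalPages) });
  }
  return ranges;
}

// Parse a ledger field such as "p. 1-25 (partial), p. 301-325 (partial)".
function parseReportedRanges(field: string): PageRange[] {
  const pattern = /p\.\s*(\d+)-(\d+)/g;
  const ranges: PageRange[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(field)) !== null) {
    ranges.push({ start: Number(match[1]), end: Number(match[2]) });
  }
  return ranges;
}

// Total number of pages covered by a list of inclusive ranges.
function pagesCovered(ranges: PageRange[]): number {
  return ranges.reduce((sum, r) => sum + (r.end - r.start + 1), 0);
}

// True when every reported range coincides exactly with one chunk.
function allChunkAligned(reported: PageRange[], chunks: PageRange[]): boolean {
  return reported.every((r) =>
    chunks.some((c) => c.start === r.start && c.end === r.end),
  );
}

const chunks = chunkRanges(442, 25); // assumed chunk size of 25 pages
const reported = parseReportedRanges(
  "p. 1-25 (partial), p. 301-325 (partial), p. 351-375 (partial), p. 376-400 (partial)",
);
console.log(chunks.length, pagesCovered(reported), allChunkAligned(reported, chunks));
```

With these inputs the four partial ranges cover 100 of the 442 pages, and each one lines up with a chunk boundary. Those are the pages to inspect in the local extraction report before citing them.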