{
  "schemaVersion": 1,
  "library": "markerPDF",
  "estimatedProgress": 99.99,
  "suiteProgress": "cloned-static-inventory-plus-native-runner-verifier-evidence; 2216 PHP behavior tests tracked after the native upstream CI benchmark evidence, tagged table header graph slice, annotation action-reference slice, supplied Texify model-output adapter boundary, tagged-structure inheritance/reference slice, StructTreeRoot multi-page ParentTree ordering slice, compact CMap row recovery slice, tagged table irregular section ordering slice, predefined CMap CID fallback diagnostics slice, Type0/Type3 broader font behavior slice, qpdf encrypted permission fixture slice, encrypted object/xref combination breadth slice, page-level partial extraction diagnostics slice, layout reading-order reconstruction slice, nested tagged-table preservation slice, image-only OCR handoff boundary slice, and embedded TrueType glyph-order width slice; the replayable benchmark_data_short.zip runner/verifier path is mapped with explicit no-Python/no-model exclusions; continuous markerPDF/Pandoc refill watchdog active",
  "phpPass": 3663,
  "phpFail": 0,
  "wordpressScenarios": 2932,
  "phase": "native no-GPU markerPDF scope; embedded TrueType glyph-order width patch rebased after image-only OCR handoff boundary and current markerPDF burn-down slices; continuous markerPDF/Pandoc refill watchdog active",
  "audit": "2026-06-20 UTC plib-tuzwg.10 embedded TrueType glyph-order width slice rebased after plib-tuzwg.14 image-only OCR handoff and current markerPDF burn-down merges: PdfTextExtractor recovers length-bounded and Flate-compressed /FontFile2 TrueType hmtx advance widths, maps CIDFontType2 widths through /CIDToGIDMap glyph order, and maps simple TrueType Encoding Differences through post format 2 glyph-name indexes before text-gap grouping.",
  "currentWork": "MarkerPDF is supervised as a native PHP searchable-PDF parser/converter plus supplied-boundary review pipeline. This isolated patch covers embedded TrueType width recovery for searchable PDFs: /FontFile2 hmtx widths are parsed natively from raw or Flate-compressed font-program streams, CIDFontType2 /CIDToGIDMap streams map CIDs to glyph IDs, and simple TrueType post glyph-name indexes map Encoding Differences to glyph IDs so WordPress/Pandoc text grouping avoids bogus word splits without Python, OCR/models, multiprocessing, PDFium/PIL, JavaScript execution, media execution, or external PDF tools.",
  "blocker": "No local embedded TrueType glyph-order width blocker remains for this slice. Still out of scope: live Surya layout/order/OCR/table model execution, scanned-PDF OCR/model behavior, Python benchmark process execution, pdftext and pypdfium2/PDFium runtime extraction, Torch model batching, tabled-pdf live model execution, Texify/Nougat model execution, Streamlit/FastAPI/Uvicorn server runtimes, media execution, and external OCR/raster/PDF validation tools.",
  "latestCommit": "pending markerpdf truetype embedded glyph-order width current-base slice",
  "nextTask": "After this patch is accepted, continue native no-GPU markerPDF triage with non-overlapping searchable-PDF parser behavior around fonts, CMaps, stream filters, xref repair, metadata, outlines, annotations, forms, page geometry, ExtGState/image/filter metadata, and supplied-boundary table/equation handoffs.",
  "commit": "pending markerpdf truetype embedded glyph-order width current-base slice",
  "latestFocusedSlice": "2026-06-20 UTC embedded TrueType glyph-order width current-base slice: PdfTextExtractor parses raw or Flate-compressed /FontFile2 hmtx widths and applies them through CIDToGIDMap glyph order and post glyph-name indexes before native text-gap grouping.",
  "latestRunAddendum": "2026-06-20 UTC embedded TrueType glyph-order width slice on final base after image-only OCR handoff boundary. Final-base checks: php -l lanes/markerpdf/src/PdfTextExtractor.php passed; focused TrueType gate php tools/run-tests.php lanes/markerpdf/tests/PdfFontTrueTypeEmbeddedWidthGlyphOrderCurrentBaseTest.php => 1 test file / 14 assertions / 0 failures; blocking StructTree integration gate php tools/run-tests.php lanes/markerpdf/tests/PdfStructTreeRootMultiPageOrderingCurrentBaseTest.php => 1 test file / 12 assertions / 0 failures after preserving inherited Table roles through direct tagged-table MCIDs; nested tagged table guard php tools/run-tests.php lanes/markerpdf/tests/PdfNestedTaggedTablePreservationCurrentBaseTest.php => 1 test file / 44 assertions / 0 failures; adjacent font/CMap/width gate => 10 test files / 716 assertions / 0 failures; adjacent StructTree/table/page-review gate => 4 test files / 364 assertions / 0 failures; full markerPDF lane gate php tools/run-tests.php lanes/markerpdf/tests => 1660 test files / 83505 assertions / 0 failures. No Python, OCR/model runtime, multiprocessing, PDFium/PIL, JavaScript execution, media execution, or external PDF tools were invoked.",
  "latestWordPressScenario": "WordPress PDF imports now use embedded TrueType /FontFile2 hmtx widths through CIDToGIDMap glyph order and post glyph-name indexes for searchable text grouping, preventing bogus spaces or collapsed words without JavaScript, Python, OCR/models, multiprocessing, PDFium/PIL, media execution, or external PDF tools.",
  "wordpressScenario": "WordPress PDF imports now cover nested tagged table preservation where a catalog StructTreeRoot top-level Table with a child Table inside a data cell becomes one Gutenberg table block containing an inner table, visible cell text is preserved, matching legacy text blocks are replaced, and custom glyph ActualText sentinels do not leak into rendered output, plus qpdf-derived AES-256 revision-6 encrypted permission fixtures that stop at encrypted preflight with password-readiness diagnostics, model queuing disabled, and no raw fixture content exposure, plus lightweight attachment preflight for page-level /AF associated FileSpecs with page number/object/index metadata and EmbeddedFiles mirror marking without payload-byte exposure, plus encrypted Standard crypt-filter content-role preflight for StmF/StrF/EFF identity, encrypted, and missing filters before import decisions, plus Image XObject CTM placement through q/Q/cm and nested Form XObject /Matrix boundaries for review-only media bboxes without raster payload leakage, plus outline sibling /Last terminal traversal where malformed same-parent /Next decoys after the declared last item are excluded from document metadata, TOC/navigation review, and remote action review while valid next-object references remain review metadata, plus output-folder file-conflict runtime preflight where convert.py os.makedirs(out_folder, exist_ok=True) blocks before metadata_file loading, model handoff, multiprocessing, or external PDF tools, plus the native no-GPU path: classic xref-table /Prev incremental updates with damaged same-generation explicit offsets now select current WordPress page text, XMP/Info/catalog metadata, and EmbeddedFiles attachments before stale previous-section rows, optional-content-hidden Image XObjects remain review-only/uninvoked while visible painted image resources are counted for media review and all raster payload bytes stay out of Gutenberg paragraphs, same-generation xref-stream /Prev incremental updates with damaged explicit offsets now select current WordPress page text, XMP/Info/catalog metadata, and EmbeddedFiles attachments before stale previous-section rows, malformed outline child /Next parent-boundary traversal that prevents duplicate top-level navigation entries while preserving review-only metadata, process_single_pdf preflight return-value review where unsupported filetypes preserve upstream return 0 while existing-output, short-text, and ready conversion branches preserve Python None/null, searchable text, same-generation malformed xref-stream /Index direct-offset repair for current metadata and EmbeddedFiles attachments before stale /Prev rows, generation-exact xref /Prev chain catalog Metadata and EmbeddedFiles name-tree references that exclude mismatched-generation XMP and stale attachments while preserving current Info/catalog metadata, classic xref rebuild selection for EmbeddedFiles name-tree attachments when final startxref points to an older classic table, escaped Link annotation dictionary keys for WordPress span promotion with hidden escaped-flag exclusion, escaped catalog PageLabels name boundaries with nested private decoy exclusion, selected blank pdftext dictionary pages preserving page metadata without empty paragraph leakage, Flate-first then ASCII85 missing-Length stream stack recovery before fake compressed endstream markers, classic xref rebuild recovery bounded before the selected startxref token so post-EOF xref/trailer garbage cannot replace current page text or metadata, runtime file-list preflight showing directories excluded and regular non-PDF sidecars retained as convert.py task candidates, catalog PageLabels indirect /S /P /St operand preview metadata, sparse page-keyed pdftext dictionary layout/order artifact matching, token-aware AcroForm field key parsing with escaped /Fields resolution and literal/nested decoy /V /Kids exclusion, EOF-bounded outline TOC/navigation metadata that excludes stale trailing /Outlines objects after %%EOF, escaped top-level page /Ann#6fts annotation names for link/markup promotion with hidden /Annots review-only, top-level page /Resources null inheritance, CCITTFax/DCT image filter exclusion, malformed CMap fallback, object-stream header repair, Form-resource Image XObject exclusion, outline/structure metadata, page resource inheritance, selected-page supplied layout/order artifact alignment, named-bbox and numeric-string table geometry normalization, quote-operator styled-span advance geometry, relative Td current-font-advance word-gap boundaries, vertical CIDFont /W2 styled-span bbox geometry, Type3 CharProc fallback-payload exclusion, rotated/UserUnit link rectangle promotion, missing-Length and stale declared-Length stream stack recovery, inline image decode boundaries, XMP/current-trailer metadata, encrypted permission and crypt-filter preflight, attachment summaries, native named destinations, PageLabels limits, xref classic/Prev repair, pdftext dictionary_output normalization, current incremental /Prev xref content selection, encrypted converter preflight short-circuiting, AcroForm field review examples, runtime admission review, table/stream/filter boundaries, CMap/font/width behavior, Image XObject exclusion/review, outline/navigation metadata, security/form/annotation review, table/equation supplied-boundary output, and fail-closed metadata for encrypted or model-required documents."
}
