Document Parser

Batch extract structured fields from PDFs, images, and text documents with annotated visual output and optional comparison against expected values.

About This Tool

Document Parser is a Claude Code skill that extracts structured field data from batches of documents: PDFs, images, Word files, and plain text. It produces annotated visual output showing exactly where each value was found in the source document.

The tool supports two modes: extract a list of fields across documents, or compare extracted values against a user-provided table of expected values for tick-and-tie verification. In comparison mode, extraction agents never see the expected values, so the extracted results cannot be biased toward them.

Key Features

Batch extraction across PDFs, images (.png, .jpg), Word (.docx), and text files
Bounding box annotations on visual documents, with an interactive annotation viewer
In-context text highlighting for Word and text documents
Parallel multi-agent extraction for large document sets
Optional comparison mode with fuzzy matching (dates, currency, whitespace)
Confidence scoring for each extracted value
Subdirectory grouping for organized document batches
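
To make the fuzzy matching in comparison mode concrete, here is a minimal Python sketch of how an extracted value might be compared against an expected one while tolerating differences in whitespace, date format, and currency formatting. It is an illustration only, not the skill's actual implementation: the function names (fuzzy_equal, parse_date, parse_currency) and the list of accepted date formats are assumptions made for this example.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical list of date formats; the skill's real accepted formats may differ.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%d %B %Y")


def normalize_whitespace(value: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(value.split())


def parse_date(value: str):
    """Return a date if the value matches one of DATE_FORMATS, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None


def parse_currency(value: str):
    """Return a Decimal amount, ignoring symbols and thousands separators, else None."""
    cleaned = re.sub(r"[$€£,\s]", "", value)
    # Treat accounting-style parentheses as a negative amount.
    if cleaned.startswith("(") and cleaned.endswith(")"):
        cleaned = "-" + cleaned[1:-1]
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def fuzzy_equal(extracted: str, expected: str) -> bool:
    """Compare two values, tolerating whitespace, date-format, and currency-format differences."""
    a, b = normalize_whitespace(extracted), normalize_whitespace(expected)
    if a == b:
        return True
    date_a, date_b = parse_date(a), parse_date(b)
    if date_a is not None and date_b is not None:
        return date_a == date_b
    amount_a, amount_b = parse_currency(a), parse_currency(b)
    if amount_a is not None and amount_b is not None:
        return amount_a == amount_b
    return False
```

With this sketch, fuzzy_equal("March 5, 2024", "2024-03-05") and fuzzy_equal("$1,250.00", "1250") both count as matches, while fuzzy_equal("$1,250.00", "1251") does not. Trying dates before currency amounts is a design choice of the example, so that a value such as 2024-03-05 is never misread as a number.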

Access the Tool

Download Claude Code Skill (.skill)

Open source under the MIT License. Free to use, modify, and distribute.