Document Parser
Batch extract structured fields from PDFs, images, and text documents with annotated visual output and optional comparison against expected values.
About This Tool
Document Parser is a Claude Code skill that extracts structured field data from batches of documents—PDFs, images, Word files, and plain text. It produces annotated visual output showing exactly where each value was found in the source document.
The skill supports two modes: extracting a list of fields across documents, or comparing extracted values against a user-provided table of expected values for tick-and-tie verification. In comparison mode, extraction agents never see the expected values, so extraction cannot be biased toward the expected answer.
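The separation between extraction and comparison can be illustrated with a minimal Python sketch. This is not the skill's actual implementation: `extract_fields`, `tick_and_tie`, and `Result` are hypothetical names, and the toy extractor only reads `Name: value` lines. The point it shows is that the extraction step receives field names only, and expected values enter at comparison time.

```python
from dataclasses import dataclass


@dataclass
class Result:
    field: str
    extracted: str
    expected: str
    match: bool


def extract_fields(document_text: str, fields: list[str]) -> dict[str, str]:
    """Hypothetical stand-in for an extraction agent.

    It receives the document and the field names, never the expected values.
    This toy version only reads 'Name: value' lines.
    """
    found = {}
    for line in document_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() in fields:
            found[name.strip()] = value.strip()
    return found


def tick_and_tie(document_text: str, expected: dict[str, str]) -> list[Result]:
    # Step 1: extraction sees field names only.
    extracted = extract_fields(document_text, list(expected))
    # Step 2: expected values are introduced only for the comparison.
    results = []
    for field, want in expected.items():
        got = extracted.get(field, "")
        results.append(Result(field, got, want, got == want))
    return results
```

Calling `tick_and_tie(text, {"Total": "1300.00"})` on a document containing `Total: 1250.00` would yield one `Result` with `match=False`, flagging the field for review.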
Key Features
- Batch extraction across PDFs, images (.png, .jpg), Word (.docx), and text files
- Bounding-box annotations on visual documents, with an interactive annotation viewer
- In-context text highlighting for Word and text documents
- Parallel multi-agent extraction for large document sets
- Optional comparison mode with fuzzy matching (dates, currency, whitespace)
- Confidence scoring for each extracted value
- Subdirectory grouping for organized document batches
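To give a sense of what fuzzy matching on dates, currency, and whitespace could look like, here is a minimal Python sketch. It is an assumption about the general approach, not the tool's actual matching logic: `normalize`, `fuzzy_equal`, and the `DATE_FORMATS` list are hypothetical, and the real skill may accept different formats or apply different rules.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical list of accepted date layouts; the real tool may differ.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%b %d, %Y")


def normalize(value: str) -> str:
    """Reduce a value to a canonical string before comparing."""
    # Whitespace: collapse internal runs and trim both ends.
    text = " ".join(value.split())

    # Dates: try each known layout and emit ISO 8601 on the first match.
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            pass

    # Currency: drop symbols and thousands separators, then fix two decimals.
    stripped = re.sub(r"[$€£,\s]", "", text)
    try:
        return str(Decimal(stripped).quantize(Decimal("0.01")))
    except InvalidOperation:
        pass

    # Anything else is compared as whitespace-normalized text.
    return text


def fuzzy_equal(extracted: str, expected: str) -> bool:
    return normalize(extracted) == normalize(expected)
```

Under these assumptions, `fuzzy_equal("$1,250.00", "1250")` and `fuzzy_equal("03/15/2024", "March 15, 2024")` both hold, while a genuine difference such as `"1250"` versus `"1251"` still fails. Normalizing both sides to one canonical form keeps the comparison itself a plain equality check.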
Access the Tool
Open source under the MIT License. Free to use, modify, and distribute.