Plumb a PDF for detailed information about each text character, rectangle, and line. Plus: table extraction and visual debugging.

Works best on machine-generated, rather than scanned, PDFs. Built on pdfminer.six.

Currently tested on Python 3.8, 3.9, 3.10, 3.11.

Translations of this document are available in: Chinese.

To report a bug or request a feature, please file an issue. To ask a question or request assistance with a specific PDF, please use the discussions forum.

This repository's maintainers are available to hire for PDF data-extraction consulting projects. To get a cost estimate, contact Jeremy (for projects of any size or complexity) and/or Samkit (specifically for table extraction).

Command line interface

Basic example

curl "" > background-checks.pdf

The output will be a CSV containing info about every character, line, and rectangle in the PDF.

The json format returns more information; it includes PDF-level and page-level metadata, plus dictionary-nested attributes.

Pages are given as a space-delimited, 1-indexed list of pages or hyphenated page ranges.

For types, the choices are char, rect, line, curve, image, annot, et cetera. Defaults to all available.

Layout parameters are given as a JSON-formatted string.

The top-level pdfplumber.PDF class represents a single PDF and has two main properties. One is a dictionary of metadata key/value pairs, drawn from the PDF's Info trailers. Invalid metadata values are treated as a warning by default. If that is not intended, pass strict_metadata=True to the open method, and pdfplumber.open will raise an exception if it is unable to parse the metadata.
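The page-selection format mentioned above (a space-delimited, 1-indexed list of pages or hyphenated page ranges) is easy to misread, so here is a minimal sketch of how such a string could be expanded into page numbers. The helper name `parse_page_spec` is hypothetical and is not part of pdfplumber; it only illustrates the format and makes no claim about how the library parses it internally.

```python
from typing import List


def parse_page_spec(spec: str) -> List[int]:
    """Expand a spec such as "1 3-5" into 1-indexed page numbers.

    Hypothetical helper for illustration only; not a pdfplumber function.
    """
    pages: List[int] = []
    for token in spec.split():  # tokens are separated by whitespace
        if "-" in token:
            # A hyphenated range such as "3-5" is inclusive on both ends.
            start_text, end_text = token.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(token)
        if start < 1 or end < start:
            raise ValueError(f"invalid page token: {token!r}")
        pages.extend(range(start, end + 1))
    return pages


print(parse_page_spec("1 3-5"))  # [1, 3, 4, 5]
```

Because the pages are 1-indexed, a caller indexing into a zero-based Python list of pages would subtract 1 from each value.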