PDF to Excel (XLSX) Converter
Extract tables and text from PDF and convert them to Excel XLSX spreadsheets securely in your browser.
Pulling tabular data out of a PDF and into a spreadsheet is one of the most common document workflows in offices that handle invoices, financial reports, scientific papers, and government data. The PDF format does not natively understand tables — it just describes glyph positions on a page — so converting to Excel requires inferring table structure from the geometry of the text. Where one cell ends and the next begins must be guessed from horizontal whitespace; where one row ends and the next begins, from vertical whitespace.
This tool parses the PDF using PDF.js, extracts text items with their bounding boxes, and clusters the items into rows and columns based on position. The detected table is written to an Excel workbook using the SheetJS xlsx library. The output is a standard .xlsx file that opens in Excel, Google Sheets, Numbers, or any other spreadsheet application.
PDF table extraction is genuinely hard, and no extractor produces perfect results on every PDF. Tables with consistent column boundaries, no merged cells, and clear vertical alignment convert cleanly. Tables with merged cells, multi-line entries, footnotes, or unusual layouts typically need manual cleanup after extraction. Plan for review.
The reason to convert a PDF to Excel is almost always analysis. Data trapped in a PDF cannot be sorted, filtered, summed, charted, or pivoted. Once it is in Excel, every standard spreadsheet operation becomes available, and that is the difference between staring at a static report and actually working with the numbers in it.
Bulk data work is impractical in PDF. Aggregating quarterly figures across multiple PDF reports, comparing line items across vendors, and pulling specific columns for downstream analysis all require a format that supports those operations. Excel and CSV are such formats, and conversion is the bridge to them.
Drop a PDF containing tabular data and get a workbook with each table on its own sheet.
PDF.js exposes a getTextContent method on each page that returns text items with their positions and sizes. Each item has a string (str), a transform matrix (for position and rotation), and a width and height. The converter sorts items by Y-coordinate to identify lines, then sorts the items within each line by X-coordinate. Items at very similar Y positions form a row.
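The row-grouping step can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code. It assumes the item's X and Y are the last two entries of the transform matrix (transform[4] and transform[5]) and that Y grows upward, as in PDF user space. The TextItem interface covers only the fields named above, and groupIntoRows and yTolerance are hypothetical names.

```typescript
// Only the fields described in the text; real PDF.js items carry more.
interface TextItem {
  str: string;
  transform: number[]; // [a, b, c, d, e, f]; e is x, f is y (assumed layout)
  width: number;
  height: number;
}

const xOf = (item: TextItem): number => item.transform[4] ?? 0;
const yOf = (item: TextItem): number => item.transform[5] ?? 0;

// Groups items whose Y positions lie within yTolerance of the first item
// in the row, then orders each row left to right.
function groupIntoRows(items: TextItem[], yTolerance = 2): TextItem[][] {
  // PDF Y grows upward, so descending Y reads the page top to bottom.
  const sorted = [...items].sort((p, q) => yOf(q) - yOf(p));
  const rows: TextItem[][] = [];
  let rowY = 0;
  for (const item of sorted) {
    const current = rows[rows.length - 1];
    if (current !== undefined && Math.abs(rowY - yOf(item)) <= yTolerance) {
      current.push(item);
    } else {
      rows.push([item]);
      rowY = yOf(item); // the first item in a row anchors its Y position
    }
  }
  for (const row of rows) {
    row.sort((p, q) => xOf(p) - xOf(q));
  }
  return rows;
}
```

A fixed tolerance in points is the simplest choice. Scaling it by item height would likely cope better with mixed font sizes.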
Column detection uses gap analysis: the X-distance between consecutive items in a row indicates whether they belong to the same cell or to adjacent cells. A gap larger than a threshold (typically 1–2 character widths) signals a column boundary. Tuning the threshold is a trade-off: set it too high and adjacent columns merge into one; set it too low and a single column splits at ordinary word spaces.
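As a rough sketch of gap analysis on a single row, the function below starts a new cell whenever the gap to the previous item exceeds a multiple of that item's average character width. The names PositionedText, splitIntoCells, and gapInChars are hypothetical. Estimating character width as width divided by string length is an assumption made for illustration, as is joining merged items with a single space.

```typescript
// A text item reduced to what gap analysis needs.
interface PositionedText {
  str: string;
  x: number;     // left edge
  width: number; // rendered width
}

// Expects the row to be sorted left to right. Returns one string per cell.
function splitIntoCells(row: PositionedText[], gapInChars = 1.5): string[] {
  const cells: string[] = [];
  let current = "";
  let prev: PositionedText | undefined;
  for (const item of row) {
    if (prev === undefined) {
      current = item.str;
    } else {
      // Average character width of the previous item (assumed estimate).
      const charWidth = prev.width / Math.max(prev.str.length, 1);
      const gap = item.x - (prev.x + prev.width);
      if (gap > gapInChars * charWidth) {
        cells.push(current); // wide gap: column boundary
        current = item.str;
      } else {
        current += " " + item.str; // narrow gap: same cell
      }
    }
    prev = item;
  }
  if (prev !== undefined) {
    cells.push(current);
  }
  return cells;
}
```

Raising gapInChars makes the function merge more aggressively, and lowering it makes it split more readily, which is the tuning trade-off in practice.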
Excel output uses SheetJS to construct a workbook in memory, with each detected table on its own sheet named Sheet1, Sheet2, etc. The workbook is serialized to .xlsx (Office Open XML) format and offered as a download. The result opens in Excel 2007+, Google Sheets, LibreOffice Calc, and Apple Numbers.
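The workbook-building step might look something like the sketch below. To keep the example self-contained, the SheetJS module is passed in as a parameter typed by XlsxLike, a hypothetical interface covering only the three SheetJS utilities used here (book_new, aoa_to_sheet, and book_append_sheet). The buildWorkbook function and the tables shape (one array of rows per detected table) are also illustrative assumptions, not the tool's actual code.

```typescript
// Hypothetical minimal view of the SheetJS utilities this sketch relies on.
interface XlsxLike {
  utils: {
    book_new(): unknown;
    aoa_to_sheet(data: string[][]): unknown;
    book_append_sheet(workbook: unknown, sheet: unknown, name: string): void;
  };
}

// Builds an in-memory workbook with one sheet per detected table,
// named Sheet1, Sheet2, and so on.
function buildWorkbook(xlsx: XlsxLike, tables: string[][][]): unknown {
  const workbook = xlsx.utils.book_new();
  tables.forEach((table, index) => {
    // Each table is an array of rows; each row is an array of cell strings.
    const sheet = xlsx.utils.aoa_to_sheet(table);
    xlsx.utils.book_append_sheet(workbook, sheet, `Sheet${index + 1}`);
  });
  return workbook;
}
```

In a browser page, the imported SheetJS module would be passed as the first argument. The returned workbook could then be handed to SheetJS's XLSX.writeFile with a file name ending in .xlsx to trigger the download.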