PDF to Excel (XLSX) Converter

About PDF to Excel Conversion

Pulling tabular data out of a PDF and into a spreadsheet is one of the most common document workflows in offices that handle invoices, financial reports, scientific papers, and government data. The PDF format does not natively understand tables — it just describes glyph positions on a page — so converting to Excel requires inferring table structure from the geometry of the text. Where one cell ends and the next begins must be guessed from horizontal whitespace; where one row ends and the next begins, from vertical whitespace.

This tool parses the PDF using PDF.js, extracts text items with their bounding boxes, and clusters the items into rows and columns based on position. The detected table is written to an Excel workbook using the SheetJS xlsx library. The output is a standard .xlsx file that opens in Excel, Google Sheets, Numbers, or any other spreadsheet application.

PDF table extraction is genuinely hard, and no extractor produces perfect results on every PDF. Tables with consistent column boundaries, no merged cells, and clear vertical alignment convert cleanly. Tables with merged cells, multi-line entries, footnotes, or unusual layouts typically need manual cleanup after extraction. Plan for review.

Why Convert PDF to Excel

The reason is almost always analysis. Data trapped in a PDF cannot be sorted, filtered, summed, charted, or pivoted. Once it is in Excel, every standard spreadsheet operation becomes available, and that is the difference between staring at a static report and actually working with the numbers in it.

Bulk data work is impossible in PDF. Aggregating quarterly figures across multiple PDF reports, comparing line items across vendors, or pulling specific columns for downstream analysis all require getting the data into a format that supports those operations. Excel and CSV are those formats. Conversion is the bridge.

How to Convert PDF to Excel

Drop in a PDF containing tabular data and get back a workbook with each table on its own sheet.

Upload your PDF: Drag the file into the upload area or click to browse. Files up to 50 MB are supported. The PDF must contain actual text; scanned PDFs need OCR first.
Wait for table detection: PDF.js extracts text items and their positions. The converter clusters items into rows and columns by analyzing horizontal and vertical alignment. Detection takes seconds for short documents and longer for multi-page tables.
Review detected tables: Detected tables are previewed before download. Confirm the columns and rows match what you expect; misalignments here become Excel cleanup later.
Download as XLSX: The converter writes each detected table to a separate sheet in an .xlsx workbook using SheetJS. Open the result in Excel or Google Sheets and clean up any residual issues.

Common Use Cases

Extracting financial data from quarterly reports — Public company filings often arrive as PDFs. Pulling tables into Excel makes the figures available for analysis, modeling, and comparison.
Pulling line items from invoices — Invoices in PDF format become tractable for expense categorization, automation, and bookkeeping once the line items are in spreadsheet form.
Aggregating data from multiple report PDFs — Comparing tables across many similarly-structured reports requires getting them all into a common format. Excel is that format.
Preparing PDF tables for further data work — Once in Excel, the data can be exported to CSV for ingestion into databases, BI tools, or scripts.
Migrating historical reports into a database — Organizations digitizing legacy archive material often need to pull tables out of PDF reports as the first step toward database ingestion.

Technical Details

PDF.js exposes a getTextContent API that returns text items with their positions. Each item has a string, a transform matrix (for position and rotation), and width/height, from which a bounding box can be derived. The converter groups items at very similar Y positions into a row, orders the rows from the top of the page down, and then sorts the items within each row by X-coordinate.
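As a rough illustration of the row-grouping step, the sketch below clusters text items by Y position and then orders each row left to right. It is not the converter's actual code. The TextItem interface lists only the fields used here, and groupIntoRows and its yTolerance default of 2 units are hypothetical names and values chosen for the example.

```typescript
// Minimal shape of a PDF.js text item: only the fields this sketch uses.
interface TextItem {
  str: string;
  transform: number[]; // [a, b, c, d, e, f]; e is the x offset, f is the y offset
  width: number;
  height: number;
}

const xOf = (item: TextItem): number => item.transform[4] ?? 0;
const yOf = (item: TextItem): number => item.transform[5] ?? 0;

// Hypothetical helper: items whose y positions lie within yTolerance of the
// first item in a row are treated as belonging to that row.
function groupIntoRows(items: TextItem[], yTolerance = 2): TextItem[][] {
  // PDF y coordinates grow upward, so sort descending to read top to bottom.
  const sorted = [...items].sort((p, q) => yOf(q) - yOf(p));
  const rows: TextItem[][] = [];
  let rowY = 0;
  for (const item of sorted) {
    const current = rows[rows.length - 1];
    if (current !== undefined && Math.abs(rowY - yOf(item)) <= yTolerance) {
      current.push(item);
    } else {
      rows.push([item]);
      rowY = yOf(item); // each row is anchored at its topmost item
    }
  }
  // Within each row, order items left to right.
  for (const row of rows) {
    row.sort((p, q) => xOf(p) - xOf(q));
  }
  return rows;
}
```

A fixed tolerance is the simplest choice. A tolerance scaled to each item's height would likely cope better with mixed font sizes, at the cost of more tuning.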

Column detection uses gap analysis: the X-distance between consecutive items in a row indicates whether they belong to the same cell or adjacent cells. A gap larger than a threshold (typically 1–2 character widths) signals a column boundary. Tuning the threshold is a trade-off: set it too high and adjacent columns merge into one; set it too low and a single column splits wherever its words are widely spaced.
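The gap analysis described above can be sketched as follows. This is a simplified illustration, not the converter's implementation. PositionedText, splitIntoCells, and the gapThreshold parameter are hypothetical names, and joining same-cell fragments with a single space is an assumption made to keep the example short.

```typescript
// One text fragment in a row, reduced to its horizontal extent.
interface PositionedText {
  str: string;
  x: number;     // left edge of the fragment
  width: number; // rendered width of the fragment
}

// Hypothetical helper: walk a row left to right and start a new cell
// whenever the gap to the previous fragment exceeds gapThreshold.
function splitIntoCells(row: PositionedText[], gapThreshold: number): string[] {
  const sorted = [...row].sort((p, q) => p.x - q.x);
  const cells: string[] = [];
  let current = "";
  let prevRight: number | null = null;
  for (const item of sorted) {
    if (prevRight === null) {
      current = item.str; // first fragment opens the first cell
    } else if (item.x - prevRight > gapThreshold) {
      cells.push(current); // large gap: close the cell, start another
      current = item.str;
    } else {
      current += " " + item.str; // small gap: same cell (simplified join)
    }
    prevRight = item.x + item.width;
  }
  if (prevRight !== null) cells.push(current);
  return cells;
}
```

Running the same row through the function with a small, a moderate, and a large threshold shows the trade-off directly: the small value splits a two-word cell, and the large value collapses the whole row into one cell.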

Excel output uses SheetJS to construct a workbook in memory, with each detected table on its own sheet named Sheet1, Sheet2, etc. The workbook is serialized to .xlsx (Office Open XML) format and offered as a download. The result opens in Excel 2007+, Google Sheets, LibreOffice Calc, and Apple Numbers.

Best Practices

Use clean, text-based PDFs: The converter relies on extractable text. Scanned PDFs need to be OCR'd first; born-digital PDFs (generated from Word, Excel, or financial software) work much better than rasterized scans.
Plan for review: No extractor is perfect. Set aside time after conversion to verify rows and columns, fix merged cells, and confirm numeric values match the source.
Watch for currency formatting: PDFs often display amounts such as $1,234.56, where the comma is a thousands separator, not a decimal mark. Depending on how the cell is written and on your locale settings, Excel may treat the value as text or misread the separator. Confirm number formats after conversion.
For complex tables, consider Tabula: If extraction quality matters and the PDF is complex, the open-source Tabula desktop tool offers more control over table boundaries than a browser-based converter like this one.
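For the currency-formatting point above, a small post-processing script can turn extracted amount strings into real numbers before further analysis. The sketch below assumes US-style formatting (comma as thousands separator, period as decimal mark). parseUsAmount is a hypothetical helper, and treating parentheses as a negative sign is an accounting convention assumed here, not something the converter is known to do.

```typescript
// Hypothetical helper: parse a US-formatted amount such as "$1,234.56".
// Returns null when the text does not look like a well-formed amount.
function parseUsAmount(text: string): number | null {
  let s = text.trim();
  let negative = false;
  if (s.startsWith("(") && s.endsWith(")")) {
    negative = true; // assumed accounting style: (1,000) means -1000
    s = s.slice(1, -1);
  } else if (s.startsWith("-")) {
    negative = true;
    s = s.slice(1);
  }
  s = s.replace(/^\$/, ""); // drop a leading dollar sign
  // Commas must separate groups of exactly three digits.
  if (!/^(\d{1,3}(,\d{3})+|\d+)(\.\d+)?$/.test(s)) return null;
  const value = Number(s.replace(/,/g, ""));
  return negative ? -value : value;
}
```

Returning null for anything that does not match, rather than guessing, keeps malformed cells visible so they can be checked against the source PDF.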

Frequently Asked Questions

Will every table convert correctly? No. Simple tables with consistent columns and no merged cells convert well. Complex tables with merged cells, multi-line entries, footnotes, or unusual layouts typically need manual cleanup. The converter is a starting point, not a finished product.
Does it work on scanned PDFs? No. The converter requires actual text in the PDF. Scanned PDFs are images of text and produce no extractable content. Run them through OCR (Tesseract, Adobe Acrobat, ocrmypdf) to add a text layer first.
Can it handle multi-page tables? Tables that span multiple pages are recognized as separate tables on each page. After conversion, copy the rows from successive sheets into a single sheet to reassemble the multi-page table.
Are merged cells preserved? Imperfectly. Merged cells in the PDF often appear as single cells with text spanning multiple columns; the converter may interpret these as column-shifted rows. Plan to fix manually.
What output format is produced? .xlsx (Office Open XML), the modern Excel format. The file opens in Excel 2007+, Google Sheets, LibreOffice Calc, Apple Numbers, and any other modern spreadsheet.
Is my PDF uploaded to a server? No. Parsing and Excel generation happen in your browser using PDF.js and SheetJS.
What is the maximum file size? 50 MB. Conversion time depends on document complexity rather than file size alone: a graphics-heavy 50 MB PDF may take longer to extract than a text-heavy one.
Why are my numbers in the wrong columns? Almost always because the converter's column detection threshold did not match the PDF's actual layout. Open the source PDF, look at where columns visually break, and manually shift cells in Excel as needed.
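The manual reassembly of multi-page tables mentioned above can also be scripted once the sheets are exported as rows of strings. The following is only a sketch under stated assumptions: it assumes every table passed in is a piece of the same logical table, and it drops the first row of a continuation page only when that row exactly repeats the header. Table, sameRow, and mergePageTables are hypothetical names.

```typescript
// A table is a list of rows; each row is a list of cell strings.
type Table = string[][];

function sameRow(a: string[], b: string[]): boolean {
  return a.length === b.length && a.every((cell, i) => cell === b[i]);
}

// Hypothetical helper: concatenate per-page tables into one table,
// skipping a header row that is repeated at the top of later pages.
function mergePageTables(tables: Table[]): Table {
  const merged: Table = [];
  for (const table of tables) {
    const header = merged[0];
    const first = table[0];
    const repeatsHeader =
      header !== undefined && first !== undefined && sameRow(header, first);
    merged.push(...(repeatsHeader ? table.slice(1) : table));
  }
  return merged;
}
```

Deciding which sheets belong together is still left to the reader, since the converter itself treats each page's table as separate.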
