Free Converter

PDF to Excel (XLSX) Converter

Extract tables and text from PDF and convert them to Excel XLSX spreadsheets securely in your browser.

About PDF to Excel Conversion

Pulling tabular data out of a PDF and into a spreadsheet is one of the most common document workflows in offices that handle invoices, financial reports, scientific papers, and government data. The PDF format does not natively understand tables — it just describes glyph positions on a page — so converting to Excel requires inferring table structure from the geometry of the text. Where one cell ends and the next begins must be guessed from horizontal whitespace; where one row ends and the next begins, from vertical whitespace.

This tool parses the PDF using PDF.js, extracts text items with their bounding boxes, and clusters the items into rows and columns based on position. The detected table is written to an Excel workbook using the SheetJS xlsx library. The output is a standard .xlsx file that opens in Excel, Google Sheets, Numbers, or any other spreadsheet application.

PDF table extraction is genuinely hard, and no extractor produces perfect results on every PDF. Tables with consistent column boundaries, no merged cells, and clear vertical alignment convert cleanly. Tables with merged cells, multi-line entries, footnotes, or unusual layouts typically need manual cleanup after extraction. Plan for review.

Why Convert PDF to Excel

The reason is almost always analysis. Data trapped in a PDF cannot be sorted, filtered, summed, charted, or pivoted. Once it is in Excel, every standard spreadsheet operation becomes available, and that is the difference between staring at a static report and actually working with the numbers in it.

Bulk data work is impractical in PDF. Aggregating quarterly figures across multiple PDF reports, comparing line items across vendors, or pulling specific columns for downstream analysis all require getting the data into a format that supports those operations. Excel and CSV are those formats. Conversion is the bridge.

How to Convert PDF to Excel

Drop in a PDF containing tabular data and get back a workbook with each table on its own sheet.

  1. Upload your PDF: Drag the file into the upload area or click to browse. Files up to 50 MB are supported. The PDF must contain actual text; scanned PDFs need OCR first.
  2. Wait for table detection: PDF.js extracts text items and their positions. The converter clusters items into rows and columns by analyzing horizontal and vertical alignment. Detection takes seconds for short documents and longer for multi-page tables.
  3. Review detected tables: Detected tables are previewed before download. Confirm the columns and rows match what you expect; misalignments here become Excel cleanup later.
  4. Download as XLSX: The converter writes each detected table to a separate sheet in an .xlsx workbook using SheetJS. Open the result in Excel or Google Sheets and clean up any residual issues.
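The constraints in step 1 can be checked before any parsing starts. The following is a minimal sketch, not the converter's actual code: the function name checkUpload is hypothetical, "50 MB" is assumed to mean 50 × 1024 × 1024 bytes, and the header test is a strict one (some PDF readers tolerate a few bytes before the marker).

```typescript
// Assumption: the 50 MB limit is interpreted as 50 MiB.
const MAX_BYTES = 50 * 1024 * 1024;

// Returns an error message, or null when the file passes both checks.
// `checkUpload` is a hypothetical helper, not part of PDF.js or SheetJS.
function checkUpload(bytes: Uint8Array): string | null {
  if (bytes.length > MAX_BYTES) {
    return "File exceeds the 50 MB limit.";
  }
  // PDF files begin with the ASCII marker "%PDF-".
  let header = "";
  for (let i = 0; i < 5 && i < bytes.length; i++) {
    header += String.fromCharCode(bytes[i] ?? 0);
  }
  if (header !== "%PDF-") {
    return "File does not look like a PDF.";
  }
  return null;
}
```

This check only confirms that the file is a PDF of acceptable size. Whether it contains a text layer can only be known after text extraction returns items.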

Common Use Cases

Technical Details

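PDF.js exposes a page-level getTextContent() method that returns the text items on a page. Each item has a string (str), a transform matrix that encodes position, scale, and rotation, and a width and height, from which a bounding box can be derived. The converter groups items at very similar Y positions into a row, orders the rows from top to bottom (PDF Y-coordinates increase upward, so a larger Y means higher on the page), and then sorts the items within each row by X-coordinate.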
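The row-grouping step described above can be sketched as follows. This is an illustration under stated assumptions, not the converter's actual code: the TextItem interface lists only the fields used here, the function name groupIntoRows is hypothetical, and the 2-unit tolerance is an arbitrary default that would need tuning per document.

```typescript
// Only the fields this sketch uses; real PDF.js text items carry more.
interface TextItem {
  str: string;
  transform: number[]; // [a, b, c, d, e, f]; e is x, f is the baseline y
  width: number;
  height: number;
}

// Group items into rows. An item joins the current row when its baseline is
// within `tolerance` PDF units of the first item in that row.
// `groupIntoRows` and the default tolerance are assumptions for illustration.
function groupIntoRows(items: TextItem[], tolerance = 2): TextItem[][] {
  const xOf = (it: TextItem): number => it.transform[4] ?? 0;
  const yOf = (it: TextItem): number => it.transform[5] ?? 0;

  // PDF y grows upward, so descending y reads top to bottom.
  // filter() returns a copy, so the caller's array is not reordered.
  const sorted = items
    .filter((it) => it.str.trim() !== "")
    .sort((p, q) => yOf(q) - yOf(p));

  const rows: TextItem[][] = [];
  let current: TextItem[] = [];
  let rowY = 0;
  for (const item of sorted) {
    if (current.length > 0 && Math.abs(yOf(item) - rowY) > tolerance) {
      rows.push(current);
      current = [];
    }
    if (current.length === 0) {
      rowY = yOf(item);
    }
    current.push(item);
  }
  if (current.length > 0) {
    rows.push(current);
  }

  // Within each row, order items left to right.
  for (const row of rows) {
    row.sort((p, q) => xOf(p) - xOf(q));
  }
  return rows;
}
```

A fixed tolerance is the simplest choice. Rows with superscripts or mixed font sizes may need a tolerance scaled by item height instead.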

Column detection uses gap analysis: the X-distance between consecutive items in a row indicates whether they belong to the same cell or to adjacent cells. A gap larger than a threshold (typically 1–2 character widths) signals a column boundary. Tuning the threshold is a trade-off: set it too high and adjacent columns merge into one; set it too low and ordinary word spacing splits a single column into several.
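Gap analysis on a single row can be sketched as follows. The sketch makes several assumptions that are not from the converter itself: the RowItem shape and the name splitIntoCells are hypothetical, character width is estimated as the row's total item width divided by its character count, the default factor of 1.5 sits in the middle of the 1–2 character range, and the quarter-character cutoff for inserting a word space is arbitrary.

```typescript
// Minimal geometry for one item in a row that is already sorted left to right.
interface RowItem {
  str: string;
  x: number;     // left edge
  width: number; // rendered width
}

interface Cell {
  text: string;
  x: number; // left edge of the first item in the cell
}

// Merge consecutive items into cells. A gap wider than `factor` estimated
// character widths starts a new cell; smaller gaps stay in the same cell.
function splitIntoCells(row: RowItem[], factor = 1.5): Cell[] {
  const chars = row.reduce((n, it) => n + it.str.length, 0);
  const totalWidth = row.reduce((w, it) => w + it.width, 0);
  const charWidth = chars > 0 ? totalWidth / chars : 0;
  const threshold = factor * charWidth;

  const cells: Cell[] = [];
  let prevRight = 0;
  for (const item of row) {
    const last = cells[cells.length - 1];
    const gap = item.x - prevRight;
    if (last === undefined || gap > threshold) {
      cells.push({ text: item.str, x: item.x });
    } else {
      // A small but visible gap is treated as a word space inside the cell.
      last.text += (gap > charWidth * 0.25 ? " " : "") + item.str;
    }
    prevRight = item.x + item.width;
  }
  return cells;
}
```

For example, a row holding "Widget" and "Pro" three units apart, followed by "12.50" far to the right, yields two cells: "Widget Pro" and "12.50". Aligning cells to shared columns across rows is a further step not shown here.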

Excel output uses SheetJS to construct a workbook in memory, with each detected table on its own sheet named Sheet1, Sheet2, etc. The workbook is serialized to .xlsx (Office Open XML) format and offered as a download. The result opens in Excel 2007+, Google Sheets, LibreOffice Calc, and Apple Numbers.
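The workbook assembly can be sketched as follows. To keep the example runnable without the library installed, the SheetJS module is passed in as a parameter and typed through a loose, hypothetical XlsxLike interface. The three utility names (book_new, aoa_to_sheet, book_append_sheet) are real SheetJS functions, but buildWorkbook and the string[][][] table representation are assumptions made for illustration.

```typescript
// The three SheetJS utilities this sketch relies on, typed loosely so the
// function can be exercised with a stand-in object.
interface XlsxLike {
  utils: {
    book_new(): unknown;
    aoa_to_sheet(rows: string[][]): unknown;
    book_append_sheet(workbook: unknown, sheet: unknown, name: string): unknown;
  };
}

// Put each detected table (an array of rows of cell strings) on its own
// sheet, named Sheet1, Sheet2, and so on.
// `buildWorkbook` is a hypothetical helper, not a SheetJS function.
function buildWorkbook(xlsx: XlsxLike, tables: string[][][]): unknown {
  const workbook = xlsx.utils.book_new();
  tables.forEach((table, index) => {
    const sheet = xlsx.utils.aoa_to_sheet(table);
    xlsx.utils.book_append_sheet(workbook, sheet, `Sheet${index + 1}`);
  });
  return workbook;
}
```

With the real library, the module object would be passed as the first argument, and XLSX.writeFile is the usual way to serialize the workbook and trigger a download in the browser.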

Best Practices

Frequently Asked Questions

Will every table convert correctly?
No. Simple tables with consistent columns and no merged cells convert well. Complex tables with merged cells, multi-line entries, footnotes, or unusual layouts typically need manual cleanup. The converter is a starting point, not a finished product.
Does it work on scanned PDFs?
No. The converter requires actual text in the PDF. Scanned PDFs are images of text and produce no extractable content. Run them through OCR (Tesseract, Adobe Acrobat, ocrmypdf) to add a text layer first.
Can it handle multi-page tables?
Partially. A table that spans multiple pages is detected as a separate table on each page, so each page's portion lands on its own sheet. After conversion, copy the rows from successive sheets into a single sheet to reassemble the full table.
Are merged cells preserved?
Imperfectly. Merged cells in the PDF often appear as single cells with text spanning multiple columns; the converter may interpret these as column-shifted rows. Plan to fix manually.
What output format is produced?
.xlsx (Office Open XML), the modern Excel format. The file opens in Excel 2007+, Google Sheets, LibreOffice Calc, Apple Numbers, and any other modern spreadsheet.
Is my PDF uploaded to a server?
No. Parsing and Excel generation happen in your browser using PDF.js and SheetJS.
What is the maximum file size?
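50 MB. Conversion time depends on how much text and how many pages the document contains rather than on file size alone: a 50 MB PDF made up mostly of images can have very little text to extract, while a much smaller file with dense multi-page tables can take longer.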
Why are my numbers in the wrong columns?
Almost always because the converter's column detection threshold did not match the PDF's actual layout. Open the source PDF, look at where columns visually break, and manually shift cells in Excel as needed.