PDF Processor API

PDF processing API powered by PyMuPDF. It extracts text (in several output formats), tables, images, and attachments; merges PDFs; converts documents to PDF; renders pages to images; unlocks password-protected files; produces XML with bounding boxes; and retrieves metadata.

https://pdf-process.localapi.ro
Response Time: < 5s
Rate Limit: 1/sec
Auth Required: None
Pricing: Free

Quick Start

Request (cURL)
# Extract text from PDF
curl -X POST "https://pdf-process.localapi.ro/text" \
  -F "file=@document.pdf"

# Extract tables
curl -X POST "https://pdf-process.localapi.ro/tables?pages=1-3" \
  -F "file=@document.pdf"

# Convert PDF to XML with bounding boxes
curl -X POST "https://pdf-process.localapi.ro/xml?pages=1" \
  -F "file=@document.pdf" --output document.xml

Request (Python)
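import requests
from contextlib import ExitStack

# Extract text from pages 1-5
with open("document.pdf", "rb") as pdf:
    response = requests.post(
        "https://pdf-process.localapi.ro/text",
        files={"file": pdf},
        params={"mode": "text", "pages": "1-5"},
    )
response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error body

data = response.json()
for page in data["pages"]:
    print(f"Page {page['page']}: {page['content'][:100]}...")

# Merge PDFs; ExitStack closes every file handle once the upload finishes
with ExitStack() as stack:
    files = [("files", stack.enter_context(open(name, "rb")))
             for name in ["doc1.pdf", "doc2.pdf"]]
    response = requests.post(
        "https://pdf-process.localapi.ro/merge",
        files=files,
    )
response.raise_for_status()

with open("merged.pdf", "wb") as f:
    f.write(response.content)

Request (JavaScript)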
// Extract text from PDF
// pdfFile is a File or Blob, e.g. taken from an <input type="file"> element
const formData = new FormData();
formData.append("file", pdfFile);

const response = await fetch(
  "https://pdf-process.localapi.ro/text?mode=text&pages=1-5",
  { method: "POST", body: formData }
);
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const data = await response.json();
data.pages.forEach(page => {
  console.log(`Page ${page.page}: ${page.content.substring(0, 100)}...`);
});

Response

200 OK application/json
{
  "pages": [
    { "page": 1, "content": "Extracted text from page 1..." },
    { "page": 2, "content": "Extracted text from page 2..." }
  ],
  "mode": "text",
  "page_count": 5
}
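
The response shape above lends itself to a small post-processing helper. The following is a minimal sketch, assuming every entry in "pages" carries the "page" and "content" keys shown in the sample; join_pages is a hypothetical name and not part of the API.

```python
def join_pages(data: dict, separator: str = "\n\n") -> str:
    """Concatenate per-page text in ascending page order."""
    # Sort defensively in case pages arrive out of order. This is an
    # assumption: the sample response lists them in order.
    pages = sorted(data.get("pages", []), key=lambda p: p["page"])
    return separator.join(p["content"] for p in pages)


# Hand-written sample in the documented shape (not real API output).
sample = {
    "pages": [
        {"page": 2, "content": "Second page."},
        {"page": 1, "content": "First page."},
    ],
    "mode": "text",
    "page_count": 2,
}
print(join_pages(sample))
```

The helper only suits mode=text; the other modes presumably return differently structured content.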

Available Endpoints

POST /text

Extract text content from PDF in various formats.

Parameters

Parameter  Type    Required  Description
file       file    Yes       PDF file to process
mode       string  No        Output format: text|blocks|words|html|json|rawdict|xhtml (default: text)
pages      string  No        1-based page selection, e.g., "1,3-5"
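
A client can validate a pages value before spending a request on it. The sketch below expands a selection such as "1,3-5" into explicit page numbers. It accepts only the two forms shown in the table (single pages and closed ranges); the server's exact grammar is not documented here, so treat parse_pages as a hypothetical client-side helper rather than a mirror of the server's parser.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a selection such as "1,3-5" into [1, 3, 4, 5]."""
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in page selection: {spec!r}")
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)  # raises ValueError on non-numeric input
        end = int(end_text) if sep else start
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.extend(range(start, end + 1))
    return pages


print(parse_pages("1,3-5"))
```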

POST /tables

Extract tables as structured data with cell content and bounding boxes.

Parameters

Parameter  Type    Required  Description
file       file    Yes       PDF file to process
pages      string  No        1-based page selection, e.g., "1,3-5"

POST /images

Extract embedded images from PDF pages.

Parameters

Parameter      Type     Required  Description
file           file     Yes       PDF file to process
pages          string   No        1-based page selection, e.g., "1,3-5"
include_bytes  boolean  No        Include base64 image data in JSON response
as_zip         boolean  No        Return images as ZIP archive

POST /render

Render PDF pages to PNG or JPEG images.

Parameters

Parameter  Type     Required  Description
file       file     Yes       PDF file to process
pages      string   No        1-based page selection, e.g., "1,3-5"
dpi        integer  No        Resolution 36-600 (overrides scale)
scale      float    No        Zoom factor 0.1-10.0 (default: 2.0)
fmt        string   No        Output format: png|jpeg (default: png)
as_zip     boolean  No        Return as ZIP for multiple pages (default: true)
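
Because dpi overrides scale, a client only needs to send one of them. The sketch below validates the documented ranges and builds the query parameters. The effective_dpi helper assumes the service follows PyMuPDF's convention, where a zoom factor of 1.0 corresponds to 72 dpi; the page above does not confirm this. Both function names are hypothetical.

```python
PDF_BASE_DPI = 72  # PDF user space uses 72 units per inch


def render_params(dpi=None, scale=None, fmt="png"):
    """Build query parameters for /render, checking the documented ranges."""
    if fmt not in ("png", "jpeg"):
        raise ValueError(f"fmt must be 'png' or 'jpeg', got {fmt!r}")
    params = {"fmt": fmt}
    if dpi is not None:
        if not 36 <= dpi <= 600:
            raise ValueError("dpi must be between 36 and 600")
        params["dpi"] = dpi  # dpi overrides scale, so scale is not sent
    elif scale is not None:
        if not 0.1 <= scale <= 10.0:
            raise ValueError("scale must be between 0.1 and 10.0")
        params["scale"] = scale
    return params


def effective_dpi(dpi=None, scale=2.0):
    """Resolution implied by the parameters (assumes zoom 1.0 == 72 dpi)."""
    return dpi if dpi is not None else scale * PDF_BASE_DPI


print(render_params(dpi=300, fmt="jpeg"))
print(effective_dpi())
```

Under that assumption, the default scale of 2.0 would correspond to 144 dpi.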

POST /merge

Merge multiple PDF files into a single document.

Parameters

Parameter  Type    Required  Description
files      file[]  Yes       PDF files to merge (in order)

POST /convert

Convert documents to PDF. Supports Office formats, images, EPUB, XPS, and more.

Parameters

Parameter  Type  Required  Description
file       file  Yes       Document to convert (doc, docx, xls, xlsx, ppt, pptx, odt, rtf, txt, csv, images, epub, xps, cbz)
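
Checking the file type locally can avoid a 415 response. The sketch below tests a filename against the extensions listed above. The table does not say which image formats count as "images", so the helper treats any file whose guessed MIME type starts with image/ as a candidate; that rule is an assumption, and looks_convertible is a hypothetical name.

```python
import mimetypes
from pathlib import Path

# Extensions named in the /convert parameter table ("images" handled below).
DOCUMENTED_EXTENSIONS = {
    "doc", "docx", "xls", "xlsx", "ppt", "pptx",
    "odt", "rtf", "txt", "csv", "epub", "xps", "cbz",
}


def looks_convertible(filename: str) -> bool:
    """Return True if the filename plausibly matches a supported input type."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in DOCUMENTED_EXTENSIONS:
        return True
    # Assumption: any image/* MIME type qualifies as an "image" input.
    mime, _ = mimetypes.guess_type(filename)
    return mime is not None and mime.startswith("image/")


print(looks_convertible("report.DOCX"))
print(looks_convertible("archive.tar.gz"))
```

The server remains the authority: a file that passes this check can still be rejected with 415 or 422.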

POST /unlock

Remove password protection from encrypted PDF.

Parameters

Parameter  Type    Required  Description
file       file    Yes       Password-protected PDF file
password   string  Yes       Password to unlock the PDF

POST /attachments

Extract embedded/attached files from PDF.

Parameters

Parameter      Type     Required  Description
file           file     Yes       PDF file to process
include_bytes  boolean  No        Include base64 file content in JSON
as_zip         boolean  No        Return attachments as ZIP archive

POST /metadata

Get document metadata including page count, author, title, and table of contents.

Parameters

Parameter  Type  Required  Description
file       file  Yes       PDF file to process

POST /xml

Convert PDF to XML format with text bounding boxes using pdftohtml. Limited to 5 pages per request.

Parameters

Parameter  Type    Required  Description
file       file    Yes       PDF file to process
pages      string  No        1-based page selection (required if document has >5 pages, max 5 pages)
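
A document longer than 5 pages has to be sent to /xml in several requests. The sketch below splits a page count into pages values of at most 5 pages each. xml_page_batches is a hypothetical helper, and it assumes the page count is already known, for example from /metadata.

```python
def xml_page_batches(page_count: int, batch_size: int = 5) -> list[str]:
    """Split pages 1..page_count into "pages" values of at most batch_size pages."""
    if page_count < 1 or batch_size < 1:
        raise ValueError("page_count and batch_size must be positive")
    batches = []
    for start in range(1, page_count + 1, batch_size):
        end = min(start + batch_size - 1, page_count)
        # A single trailing page is written as "11", not "11-11".
        batches.append(str(start) if start == end else f"{start}-{end}")
    return batches


print(xml_page_batches(12))
```

Each batch would be one POST to /xml, spaced at least a second apart to respect the 1/sec rate limit.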

Error Codes

Code Description
400 Invalid request - missing file, invalid pages, or exceeds 5-page limit for /xml
401 Invalid password for /unlock endpoint
404 No attachments found in PDF
415 Unsupported media type or format
422 Processing error - corrupted document
429 Rate limit exceeded
500 Internal server error
504 Conversion timeout (120s limit)
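
Of the codes above, 429 is the one a client can usually recover from by waiting. The sketch below retries only on 429 with a linear backoff and raises on every other non-200 status, attaching a hint taken from the table. It is a sketch under stated assumptions, not a prescribed client: call_with_retry and ERROR_HINTS are hypothetical names, the delays are arbitrary, and the request itself is passed in as a callable so the retry logic stays independent of any HTTP library.

```python
import time
from typing import Callable, Tuple

# Hints paraphrased from the error code table above.
ERROR_HINTS = {
    400: "invalid request (missing file, invalid pages, or /xml page limit)",
    401: "invalid password for /unlock",
    404: "no attachments found in PDF",
    415: "unsupported media type or format",
    422: "processing error (corrupted document)",
    429: "rate limit exceeded",
    500: "internal server error",
    504: "conversion timeout",
}


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call send() until it returns status 200, retrying only on 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 200:
            return body
        if status == 429 and attempt < max_attempts:
            sleep(delay * attempt)  # linear backoff: 1s, 2s, ...
            continue
        hint = ERROR_HINTS.get(status, "unexpected status")
        raise RuntimeError(f"HTTP {status}: {hint}")
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

In practice send would wrap a requests.post call and return (response.status_code, response.content). It should reopen or rewind the uploaded file on each attempt, since a file handle that has already been read is at end of file.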