PDF Processor API
Comprehensive PDF processing API powered by PyMuPDF. Extract text in multiple formats, tables, images, and attachments. Merge PDFs, convert documents to PDF, render pages to images, unlock password-protected files, extract XML with bounding boxes, and retrieve metadata.
Base URL: https://pdf-process.localapi.ro
Quick Start
# Extract text from PDF
curl -X POST "https://pdf-process.localapi.ro/text" \
  -F "file=@document.pdf"

# Extract tables
curl -X POST "https://pdf-process.localapi.ro/tables?pages=1-3" \
  -F "file=@document.pdf"

# Convert PDF to XML with bounding boxes
curl -X POST "https://pdf-process.localapi.ro/xml?pages=1" \
  -F "file=@document.pdf" --output document.xml

import requests

# Extract text
with open("document.pdf", "rb") as pdf:
    response = requests.post(
        "https://pdf-process.localapi.ro/text",
        files={"file": pdf},
        params={"mode": "text", "pages": "1-5"},
    )
response.raise_for_status()  # raise on 4xx/5xx responses
data = response.json()
for page in data["pages"]:
    print(f"Page {page['page']}: {page['content'][:100]}...")

# Merge PDFs (merged in the order given)
paths = ["doc1.pdf", "doc2.pdf"]
handles = [open(path, "rb") for path in paths]
try:
    response = requests.post(
        "https://pdf-process.localapi.ro/merge",
        files=[("files", handle) for handle in handles],
    )
finally:
    # Close every input file even if the request fails
    for handle in handles:
        handle.close()
response.raise_for_status()
with open("merged.pdf", "wb") as f:
    f.write(response.content)

// Extract text from PDF
// pdfFile is a File or Blob, for example from an <input type="file"> element
const formData = new FormData();
formData.append("file", pdfFile);
const response = await fetch(
  "https://pdf-process.localapi.ro/text?mode=text&pages=1-5",
  { method: "POST", body: formData }
);
const data = await response.json();
data.pages.forEach(page => {
  console.log(`Page ${page.page}: ${page.content.substring(0, 100)}...`);
});

Response
{
  "pages": [
    { "page": 1, "content": "Extracted text from page 1..." },
    { "page": 2, "content": "Extracted text from page 2..." }
  ],
  "mode": "text",
  "page_count": 5
}

Available Endpoints
/text
Extract text content from a PDF in one of several formats.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | PDF file to process |
| mode | string | No | Output format, one of: text, blocks, words, html, json, rawdict, xhtml (default: text) |
| pages | string | No | 1-based page selection, e.g., "1,3-5" |
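
The per-page response shown in the Quick Start can be stitched back into a single string. The following is a minimal sketch, assuming the `pages`, `page`, and `content` fields from that sample response (other `mode` values may return a different shape); `join_pages` is a hypothetical helper name, not part of the API.

```python
import json

def join_pages(payload: dict, separator: str = "\n\n") -> str:
    """Concatenate page contents in ascending page order."""
    pages = sorted(payload.get("pages", []), key=lambda p: p["page"])
    return separator.join(p["content"] for p in pages)

# Sample payload shaped like the Quick Start response (contents are made up).
sample = json.loads("""
{
  "pages": [
    {"page": 2, "content": "Second page."},
    {"page": 1, "content": "First page."}
  ],
  "mode": "text",
  "page_count": 2
}
""")

print(join_pages(sample))
```

Sorting by `page` is a defensive choice here; the sample response already lists pages in order.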
/tables
Extract tables as structured data with cell content and bounding boxes.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | PDF file to process |
| pages | string | No | 1-based page selection, e.g., "1,3-5" |
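
Several endpoints accept the same `pages` string. As an illustration, the sketch below builds such a string from a list of page numbers; `format_pages` is a hypothetical client-side helper, and it assumes the server accepts comma-separated numbers and ranges exactly as in the "1,3-5" example.

```python
def format_pages(pages):
    """Compress 1-based page numbers into a string such as "1,3-5"."""
    numbers = sorted(set(pages))
    if any(n < 1 for n in numbers):
        raise ValueError("page numbers are 1-based")
    parts = []
    i = 0
    while i < len(numbers):
        start = end = numbers[i]
        # Extend the run while the next number is consecutive.
        while i + 1 < len(numbers) and numbers[i + 1] == end + 1:
            i += 1
            end = numbers[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)

print(format_pages([5, 1, 3, 4]))  # prints 1,3-5
```

The result can be passed as the `pages` query parameter, for example `params={"pages": format_pages([1, 3, 4, 5])}`.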
/images
Extract embedded images from PDF pages.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | PDF file to process |
| pages | string | No | 1-based page selection, e.g., "1,3-5" |
| include_bytes | boolean | No | Include base64 image data in JSON response |
| as_zip | boolean | No | Return images as ZIP archive |
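
When `as_zip` is set, the response body is a ZIP archive rather than JSON. A minimal sketch of unpacking such a body in memory with the standard library follows; the archive is built locally as a stand-in for `response.content`, and the member name `image_1.png` is invented for the demonstration (the API's actual file naming is not documented here).

```python
import io
import zipfile

def unpack_zip(content: bytes) -> dict:
    """Return {member name: raw bytes} for every file in a ZIP payload."""
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir()
        }

# Stand-in for response.content: build a small archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("image_1.png", b"fake image bytes")

files = unpack_zip(buffer.getvalue())
print(sorted(files))
```

The same helper should apply to any endpoint that offers `as_zip`, since it makes no assumption about what the archive contains.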
/render
Render PDF pages to PNG or JPEG images.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | PDF file to process |
| pages | string | No | 1-based page selection, e.g., "1,3-5" |
| dpi | integer | No | Resolution in DPI, 36-600 (overrides scale) |
| scale | float | No | Zoom factor, 0.1-10.0 (default: 2.0) |
| fmt | string | No | Output format: png or jpeg (default: png) |
| as_zip | boolean | No | Return as ZIP for multiple pages (default: true) |
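
Because `dpi` overrides `scale` and both have documented ranges, it can help to validate them before sending a request. The sketch below is one way to do that; `render_params` is a hypothetical helper, and encoding booleans as the strings "true" and "false" is an assumption about how the server parses query parameters.

```python
def render_params(dpi=None, scale=None, fmt="png", pages=None, as_zip=True):
    """Validate /render options against the documented ranges and build query params."""
    if fmt not in ("png", "jpeg"):
        raise ValueError("fmt must be 'png' or 'jpeg'")
    # Assumption: booleans are sent as lowercase strings.
    params = {"fmt": fmt, "as_zip": "true" if as_zip else "false"}
    if dpi is not None:
        if not 36 <= dpi <= 600:
            raise ValueError("dpi must be between 36 and 600")
        params["dpi"] = dpi  # dpi overrides scale, so scale is not sent
    elif scale is not None:
        if not 0.1 <= scale <= 10.0:
            raise ValueError("scale must be between 0.1 and 10.0")
        params["scale"] = scale
    if pages:
        params["pages"] = pages
    return params

print(render_params(dpi=150, scale=3.0))
```

Dropping `scale` when `dpi` is present mirrors the documented precedence and keeps the request unambiguous.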
/merge
Merge multiple PDF files into a single document.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| files | file[] | Yes | PDF files to merge (in order) |
/convert
Convert documents to PDF. Supports Office formats, images, EPUB, XPS, and more.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Document to convert (doc, docx, xls, xlsx, ppt, pptx, odt, rtf, txt, csv, images, epub, xps, cbz) |
/unlock
Remove password protection from an encrypted PDF.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Password-protected PDF file |
| password | string | Yes | Password to unlock the PDF |
/attachments
Extract embedded/attached files from a PDF.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | PDF file to process |
| include_bytes | boolean | No | Include base64 file content in JSON |
| as_zip | boolean | No | Return attachments as ZIP archive |
/metadata
Get document metadata including page count, author, title, and table of contents.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | PDF file to process |
/xml
Convert a PDF to XML with text bounding boxes using pdftohtml. Each request can process at most 5 pages.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | PDF file to process |
| pages | string | No | 1-based page selection; required when the document has more than 5 pages, and may select at most 5 pages |
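
For documents longer than 5 pages, one option is to call /xml several times with consecutive page ranges. The sketch below only computes those ranges; `xml_batches` is a hypothetical helper, and it assumes the total page count is already known (for example from /metadata, which reports it).

```python
def xml_batches(page_count, batch_size=5):
    """Split pages 1..page_count into "start-end" strings of at most batch_size pages."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    batches = []
    for start in range(1, page_count + 1, batch_size):
        end = min(start + batch_size - 1, page_count)
        batches.append(str(start) if start == end else f"{start}-{end}")
    return batches

print(xml_batches(12))  # prints ['1-5', '6-10', '11-12']
```

Each string can then be sent as the `pages` parameter of a separate /xml request.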
Error Codes
| Code | Description |
|---|---|
| 400 | Invalid request: missing file, invalid pages, or more than 5 pages requested for /xml |
| 401 | Invalid password for /unlock endpoint |
| 404 | No attachments found in PDF |
| 415 | Unsupported media type or format |
| 422 | Processing error: corrupted document |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 504 | Conversion timeout (120s limit) |
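
A client may want to turn these status codes into readable messages and decide whether a retry makes sense. The sketch below is one possible approach; `describe` and the `RETRYABLE` set are hypothetical, and treating 429, 500, and 504 as retryable is an assumption, since no retry policy is documented.

```python
ERROR_DESCRIPTIONS = {
    400: "Invalid request",
    401: "Invalid password for /unlock",
    404: "No attachments found in PDF",
    415: "Unsupported media type or format",
    422: "Processing error (corrupted document)",
    429: "Rate limit exceeded",
    500: "Internal server error",
    504: "Conversion timeout",
}

# Assumption: only these codes are worth retrying.
RETRYABLE = {429, 500, 504}

def describe(status: int) -> str:
    """Map an HTTP status code to a short message and a suggested action."""
    if 200 <= status < 300:
        return "ok"
    text = ERROR_DESCRIPTIONS.get(status, "Unexpected status")
    action = "retry later" if status in RETRYABLE else "do not retry"
    return f"{status}: {text} ({action})"

print(describe(429))  # prints 429: Rate limit exceeded (retry later)
```

With the `requests` library, this could be called as `describe(response.status_code)` before reading the response body.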