How to Convert PDF to Text in Python (Free & Easy Guide)
Converting PDF files to editable text is a common need for researchers, analysts, and professionals who deal with large volumes of documents. Manual copying wastes time—Python offers a faster, more flexible solution. In this guide, you’ll learn how to convert PDF to text in Python efficiently, whether you want to keep the layout or extract specific content.

- Why Choose Spire.PDF for PDF to Text
- General Workflow for PDF to Text in Python
- Convert PDF to Text in Python Without Layout
- Convert PDF to Text in Python With Layout
- Convert a Specific PDF Page to Text
- To Wrap Up
- FAQs
Getting Started: Why Choose Spire.PDF for PDF to Text in Python
To convert PDF files to text using Python, you’ll need a reliable PDF processing library. Spire.PDF for Python is a powerful and developer-friendly API that allows you to read, edit, and convert PDF documents in Python applications — no need for Adobe Acrobat or other third-party software.
This library is ideal for automating PDF workflows such as extracting text, adding annotations, or merging and splitting files. It supports a wide range of PDF features and works in both desktop and server environments. You can download it and install it manually, or install Spire.PDF from PyPI with the following command:
pip install Spire.PDF
For smaller or personal projects, a free version is available with basic functionality. If you need advanced features such as PDF signing or form filling, you can upgrade to the commercial edition at any time.
General Workflow for PDF to Text in Python
Converting a PDF to text becomes simple and efficient with the help of Spire.PDF for Python. You can easily complete the task by reusing the sample code provided in the following sections and customizing it to fit your needs. But before diving into the code, let’s take a quick look at the general workflow behind this process.
- Create an object of PdfDocument class and load a PDF file using LoadFromFile() method.
- Create an object of PdfTextExtractOptions class and set the text extracting options, including extracting all text, showing hidden text, only extracting text in a specified area, and simple extraction.
- Get each page using the PdfDocument.Pages.get_Item() method, create a PdfTextExtractor object for that page, and call its ExtractText() method with the specified options to extract the page's text.
- Save the extracted text as a text file and close the object.
How to Convert PDF to Text in Python Without Layout
If you only need the plain text content from a PDF and don’t care about preserving the original layout, you can use simple extraction. This approach is faster and easier, especially when processing large batches of files. Note that it reads the PDF's text layer, so scanned, image-only pages need OCR first. In this section, we’ll show you how to convert PDF to text in Python without preserving the layout.
To extract text without preserving layout, follow these simplified steps:
- Create an instance of PdfDocument and load the PDF file.
- Create a PdfTextExtractOptions object and configure the text extraction options.
- Set IsSimpleExtraction = True to ignore the layout and extract raw text.
- Loop through all pages of the PDF.
- Extract text from each page and write it to a .txt file.
from spire.pdf import PdfDocument
from spire.pdf import PdfTextExtractOptions
from spire.pdf import PdfTextExtractor
# Create an object of PdfDocument class and load a PDF file
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")
# Create a string object to store the text
extracted_text = ""
# Create an object of PdfTextExtractOptions
extract_options = PdfTextExtractOptions()
# Set to use simple extraction method
extract_options.IsSimpleExtraction = True
# Loop through the pages in the document
for i in range(pdf.Pages.Count):
    # Get a page
    page = pdf.Pages.get_Item(i)
    # Create an object of PdfTextExtractor passing the page as parameter
    text_extractor = PdfTextExtractor(page)
    # Extract the text from the page
    text = text_extractor.ExtractText(extract_options)
    # Add the extracted text to the string object
    extracted_text += text
# Write the extracted text to a text file (the "output" folder must already exist)
with open("output/ExtractedText.txt", "w", encoding="utf-8") as file:
    file.write(extracted_text)
pdf.Close()
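Since this approach is aimed at large batches of files, here is a minimal sketch of a batch driver. It assumes a hypothetical `extract_text` callable, for example the loop above wrapped in a function that takes a PDF path and returns a string. The name `convert_folder` and the folder layout are illustrative and not part of Spire.PDF.

```python
from pathlib import Path
from typing import Callable, List


def convert_folder(input_dir: str, output_dir: str,
                   extract_text: Callable[[str], str]) -> List[Path]:
    """Run extract_text on every PDF in input_dir and save one .txt per PDF."""
    out_root = Path(output_dir)
    # Create the output folder up front so writing cannot fail on a missing directory
    out_root.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort for a stable, repeatable processing order
    for pdf_path in sorted(Path(input_dir).iterdir()):
        if pdf_path.suffix.lower() != ".pdf":
            continue  # Skip anything that is not a PDF
        text = extract_text(str(pdf_path))
        txt_path = out_root / (pdf_path.stem + ".txt")
        # Write as UTF-8 so non-ASCII characters survive on every platform
        txt_path.write_text(text, encoding="utf-8")
        written.append(txt_path)
    return written
```

Passing the extractor in as a parameter keeps the file handling separate from the PDF library, so the same driver works for simple or layout-preserving extraction.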

How to Convert PDF to Text in Python With Layout
Spire.PDF preserves layout such as tables and paragraph spacing by default, so to convert a PDF to text with layout you simply leave IsSimpleExtraction unset. The steps follow the general workflow above, and you still loop through each page to extract the full text.
from spire.pdf import PdfDocument
from spire.pdf import PdfTextExtractOptions
from spire.pdf import PdfTextExtractor
# Create an object of PdfDocument class and load a PDF file
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")
# Create a string object to store the text
extracted_text = ""
# Create an object of PdfTextExtractOptions (default options keep the layout)
extract_options = PdfTextExtractOptions()
# Loop through the pages in the document
for i in range(pdf.Pages.Count):
    # Get a page
    page = pdf.Pages.get_Item(i)
    # Create an object of PdfTextExtractor passing the page as parameter
    text_extractor = PdfTextExtractor(page)
    # Extract the text from the page
    text = text_extractor.ExtractText(extract_options)
    # Add the extracted text to the string object
    extracted_text += text
# Write the extracted text to a text file (the "output" folder must already exist)
with open("output/ExtractedText.txt", "w", encoding="utf-8") as file:
    file.write(extracted_text)
pdf.Close()
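Because the loop concatenates each page's text directly, page boundaries disappear in the output file. The sketch below shows one way to keep them visible, assuming you collect each page's text in a list instead of one string. `join_pages` and the marker format are illustrative choices, not Spire.PDF features.

```python
from typing import List


def join_pages(page_texts: List[str]) -> str:
    """Join per-page text, labelling each page so boundaries stay visible."""
    parts = []
    for number, text in enumerate(page_texts, start=1):
        # rstrip() removes trailing whitespace and newlines left at the end of a page
        parts.append(f"--- Page {number} ---\n{text.rstrip()}")
    # One blank line between pages
    return "\n\n".join(parts)
```

In the loop above, this would mean appending `text` to a list and calling `join_pages` once before writing the file.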

Convert a Specific PDF Page to Text in Python
Need to extract text from only one page of a PDF instead of the entire document? With Spire.PDF, you can target a specific page and convert it to text. The steps are the same as in the general workflow, without the page loop. The example below reads the first page (index 0) and, through ExtractArea, limits extraction to a rectangular region of that page; remove that line to extract the whole page.
from spire.pdf import PdfDocument
from spire.pdf import PdfTextExtractOptions
from spire.pdf import PdfTextExtractor
from spire.pdf import RectangleF
# Create an object of PdfDocument class and load a PDF file
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")
# Create an object of PdfTextExtractOptions
extract_options = PdfTextExtractOptions()
# Set to extract specific page area
extract_options.ExtractArea = RectangleF(50.0, 220.0, 700.0, 230.0)
# Get a page (index 0 is the first page)
page = pdf.Pages.get_Item(0)
# Create an object of PdfTextExtractor passing the page as parameter
text_extractor = PdfTextExtractor(page)
# Extract the text from the page
extracted_text = text_extractor.ExtractText(extract_options)
# Write the extracted text to a text file (the "output" folder must already exist)
with open("output/ExtractedText.txt", "w", encoding="utf-8") as file:
    file.write(extracted_text)
pdf.Close()
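The four numbers passed to RectangleF are easier to choose if you can think in physical units. The sketch below converts millimetres to points, on the assumption (not stated in this article) that the arguments are x, y, width, and height measured in PDF points, where 1 point is 1/72 inch. The helper names are hypothetical.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def mm_to_points(mm: float) -> float:
    """Convert millimetres to PDF points (1 point = 1/72 inch)."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


def area_from_mm(left_mm: float, top_mm: float, width_mm: float, height_mm: float):
    """Return (x, y, width, height) in points for an area given in millimetres."""
    return tuple(mm_to_points(v) for v in (left_mm, top_mm, width_mm, height_mm))
```

The resulting tuple could then be unpacked into RectangleF in the code above, for example an area that starts 25.4 mm from the left edge starts at x = 72 points.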

To Wrap Up
In this post, we covered how to convert PDF to text using Python and Spire.PDF, with clear steps and code examples for fast, efficient conversion. We also highlighted the benefits and pointed to OCR tools for image-based PDFs. For any issues or support, feel free to contact us.
FAQs about Converting PDF to Text
Q1: How do I convert a PDF to readable and editable text in Python?
A: You can convert a PDF to text in Python using the Spire.PDF library. It allows you to extract text from PDF files while optionally keeping the original layout, and you don’t need Adobe Acrobat. Image-based (scanned) PDFs have no text layer, so they need OCR before any text can be extracted.
Q2: Is there a free tool to convert PDF to text?
A: Yes. Spire.PDF for Python provides a free edition that allows you to convert PDF to text without relying on Adobe Acrobat or other software. Online tools are also available, but they’re more suitable for occasional use or small files.
Q3: Can Python extract data from PDF?
A: Yes, Python can extract data from PDF files. Using Spire.PDF, you can easily extract not only text but also other elements such as images, annotations, bookmarks, and even attachments. This makes it a versatile tool for working with PDF content in Python.
How to Merge Excel Files in Python (.xls & .xlsx) - No Excel Required

Merging Excel files is a common task for data analysts and financial teams working with large datasets. While Microsoft Excel supports manual merging, it becomes inefficient and error-prone when dealing with large volumes of files.
In this step-by-step guide, you will learn how to merge multiple Excel files (.xls and .xlsx) using Python and Spire.XLS for Python library. Whether you're combining workbooks, merging worksheets, or automating bulk Excel file processing, this guide will help you save time and streamline your workflow with practical solutions.
Table of Contents
- Why Merge Excel Files with Python?
- Getting Started with Spire.XLS for Python
- How to Merge Multiple Excel Files into One Workbook using Python
- How to Combine Multiple Excel Worksheets into a Single Worksheet using Python
- Conclusion
- FAQs: Merge Excel Files with Python
Why Merge Excel Files with Python?
Using Python to merge Excel files brings several key advantages:
- Automation: Save time and eliminate repetitive manual work by automating the merging process.
- No Excel Dependency: Merge files without installing Microsoft Excel—ideal for headless, server-side, or cloud environments.
- Flexible Merging: Customize merging by selecting specific sheets, ranges, columns, or rows.
- Scalability: Handle hundreds or even thousands of Excel files with consistent performance.
- Error Reduction: Reduce manual errors and ensure data accuracy with automated scripts.
Whether you’re consolidating monthly reports or merging large datasets, Python helps streamline the process efficiently.
Getting Started with Spire.XLS for Python
Spire.XLS for Python is a standalone library that allows developers to create, read, edit, and save Excel files without the need for Microsoft Excel installation.
Key Features Include:
- Supports Multiple Formats: .xls, .xlsx, and more.
- Worksheet Operations: Copy, rename, delete, and merge worksheets seamlessly across workbooks.
- Formula & Formatting Preservation: Retain formulas and formatting during editing or merging.
- Advanced Features: Includes chart creation, conditional formatting, pivot tables, and more.
- File Conversion: Convert Excel files to PDF, HTML, CSV, and more.
Installation
Run the following pip command in your terminal or command prompt to install Spire.XLS from PyPI:
pip install spire.xls
How to Merge Multiple Excel Files into One Workbook using Python
When working with multiple Excel files, consolidating all worksheets into a single workbook can simplify data management and reporting. This approach preserves each original worksheet separately, making it easy to organize and review data from different sources such as department budgets, regional reports, or monthly summaries.
Steps
To merge multiple Excel files into a single workbook using Python, follow these steps:
- Loop through the files.
- Load each Excel file using LoadFromFile().
- For the first file, assign it as the base workbook.
- For subsequent files, copy all worksheets into the base workbook using AddCopy().
- Save the final combined workbook to a new file.
Code Example
import os
from spire.xls import *
# Folder containing Excel files to merge
input_folder = './sample_files'
# Output file name for the merged workbook
output_file = 'merged_workbook.xlsx'
# Initialize merged workbook as None
merged_workbook = None
# Iterate over all files in the input folder
for filename in os.listdir(input_folder):
    # Process only Excel files with .xls or .xlsx extensions
    if filename.endswith('.xlsx') or filename.endswith('.xls'):
        file_path = os.path.join(input_folder, filename)
        # Load the current Excel file into a Workbook object
        source_workbook = Workbook()
        source_workbook.LoadFromFile(file_path)
        if merged_workbook is None:
            # For the first file, assign it as the base merged workbook
            merged_workbook = source_workbook
        else:
            # For subsequent files, copy each worksheet into the merged workbook
            for i in range(source_workbook.Worksheets.Count):
                sheet = source_workbook.Worksheets.get_Item(i)
                merged_workbook.Worksheets.AddCopy(sheet, WorksheetCopyType.CopyAll)
# Save the combined workbook to the specified output file
merged_workbook.SaveToFile(output_file, ExcelVersion.Version2016)
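One detail worth noting in the loop above: `os.listdir()` returns names in arbitrary order and `endswith()` is case-sensitive, so the sheet order in the merged workbook can vary and a file named `REPORT.XLSX` would be skipped. Here is a small sketch of a stricter file filter using only the standard library. `list_excel_files` is a hypothetical helper, not part of Spire.XLS.

```python
import os
from typing import List


def list_excel_files(folder: str) -> List[str]:
    """Return full paths of .xls/.xlsx files in folder, sorted by file name."""
    paths = []
    for name in sorted(os.listdir(folder)):
        # "~$" prefixes mark temporary lock files that Excel creates while a workbook is open
        if name.startswith("~$"):
            continue
        # Compare the extension case-insensitively so "REPORT.XLSX" is included
        if os.path.splitext(name)[1].lower() in (".xls", ".xlsx"):
            paths.append(os.path.join(folder, name))
    return paths
```

The merge loop could then iterate over `list_excel_files(input_folder)` and load each returned path directly.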

How to Combine Multiple Excel Worksheets into a Single Worksheet using Python
Merging data from multiple Excel worksheets into one worksheet allows you to aggregate information efficiently, especially when working with data such as sales logs, survey responses, or performance reports.
Steps
To combine worksheet data from multiple Excel files into a single worksheet using Python, follow these steps:
- Create a new workbook and select its first worksheet as the destination.
- Loop through the files.
- Load each Excel file using LoadFromFile().
- Get the desired worksheet that you want to merge from the current file.
- Copy the used cell range from the desired worksheet to the destination worksheet, placing data consecutively below the previously copied content.
- Save the combined data into a new Excel file.
Code Example
import os
from spire.xls import *
# Folder containing Excel files to merge
input_folder = './excel_worksheets'
# Output file name for the merged workbook
output_file = 'merged_into_one_sheet.xlsx'
# Create a new workbook to hold merged data
merged_workbook = Workbook()
# Use the first worksheet in the new workbook as the merge target
merged_sheet = merged_workbook.Worksheets[0]
# Initialize the starting row for copying data
current_row = 1
# Loop through all files in the input folder
for filename in os.listdir(input_folder):
    # Process only Excel files (.xls or .xlsx)
    if filename.endswith('.xlsx') or filename.endswith('.xls'):
        file_path = os.path.join(input_folder, filename)
        # Load the current Excel file
        workbook = Workbook()
        workbook.LoadFromFile(file_path)
        # Get the first worksheet from the current workbook
        sheet = workbook.Worksheets[0]
        # Get the used range from the first worksheet
        source_range = sheet.Range
        # Set the destination range in the merged worksheet starting at current_row
        dest_range = merged_sheet.Range[current_row, 1]
        # Copy data from the used range to the destination range
        source_range.Copy(dest_range)
        # Update current_row to the row after the last copied row to prevent overlap
        current_row += sheet.LastRow
# Save the merged workbook to the specified output file in Excel 2016 format
merged_workbook.SaveToFile(output_file, ExcelVersion.Version2016)

Conclusion
When merging multiple Excel files into a single document—whether by appending sheets or combining data row by row—using a Python library like Spire.XLS enables automation and improves accuracy. This approach can help streamline workflows, especially in enterprise scenarios that require handling large datasets without relying on Microsoft Excel.
FAQs: Merge Excel Files with Python
Q1: Can I merge .xls and .xlsx files together?
A1: Yes. Spire.XLS handles both formats without needing conversion.
Q2: Do I need Excel installed on my machine to use Spire.XLS?
A2: No. Spire.XLS is standalone and works without Microsoft Office installed.
Q3: Can I merge only specific sheets from each workbook?
A3: Yes. You can customize your code to merge sheets by name or index. For example:
sheet = source_workbook.Worksheets["Summary"]
Q4: How do I avoid copying header rows multiple times?
A4: Add logic like:
if current_row > 1:
    start_row = 2  # Skip header
else:
    start_row = 1
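To see how this header check interacts with the running `current_row` counter, here is a library-free sketch that only does the row arithmetic. It assumes every sheet's data starts at row 1 with a single header row and that rows are 1-based; `plan_copies` is a hypothetical helper for illustration.

```python
from typing import List, Tuple


def plan_copies(last_rows: List[int]) -> List[Tuple[int, int]]:
    """For each sheet, return (first source row to copy, destination row).

    last_rows holds the last used row of each source sheet (1-based).
    """
    plan = []
    current_row = 1  # Next free row in the merged sheet
    for last_row in last_rows:
        # Keep the header only for the first sheet; skip it afterwards
        start_row = 1 if current_row == 1 else 2
        plan.append((start_row, current_row))
        # Advance by the number of rows actually copied
        current_row += last_row - start_row + 1
    return plan
```

For three sheets whose last used rows are 4, 3, and 5, the plan copies rows 1 to 4 to destination row 1, rows 2 to 3 to row 5, and rows 2 to 5 to row 7.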
Q5: Can I keep track of which file each row came from?
A5: Yes. Add a new column in the merged sheet containing the source file name for each row.
Q6: Is there a file size or row limit when using Spire.XLS?
A6: Spire.XLS follows the same row and column limits as Excel: .xlsx supports up to 1,048,576 rows × 16,384 columns, and .xls supports up to 65,536 rows × 256 columns.
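When many files are appended into one sheet, it can be worth checking the total against these limits before saving. A minimal sketch using the row limits quoted above; `fits_row_limit` is a hypothetical helper and does not call Spire.XLS.

```python
import os

# Row limits per format, as quoted in the answer above
ROW_LIMITS = {".xlsx": 1_048_576, ".xls": 65_536}


def fits_row_limit(total_rows: int, output_file: str) -> bool:
    """Return True if total_rows fits in the format implied by the file extension."""
    ext = os.path.splitext(output_file)[1].lower()
    if ext not in ROW_LIMITS:
        raise ValueError(f"Unsupported extension: {ext!r}")
    return total_rows <= ROW_LIMITS[ext]
```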
Q7: Can I preserve formulas and formatting while merging?
A7: Yes. When merging Excel files, formatting and formulas are preserved.
Adding Watermarks to PDF Files Using Python

Watermarking is a critical technique for securing documents, indicating ownership, and preventing unauthorized copying. Whether you're distributing drafts or branding final deliverables, applying watermarks helps protect your content effectively. In this tutorial, you’ll learn how to add watermarks to a PDF in Python using the powerful and easy-to-use Spire.PDF for Python library.
We'll walk through how to insert both text and image watermarks, handle transparency and positioning, and resolve common issues, all with clean, well-documented code examples.
Table of Contents:
- Python Library for Watermarking PDFs
- Adding a Text Watermark to a PDF
- Adding an Image Watermark to a PDF
- Troubleshooting Common Issues
- Wrapping Up
- FAQs
Python Library for Watermarking PDFs
Spire.PDF for Python is a robust library that provides comprehensive PDF manipulation capabilities. For watermarking specifically, it offers:
- High precision in watermark placement and rotation.
- Flexible transparency controls.
- Support for both text and image watermarks.
- Ability to apply watermarks to specific pages or entire documents.
- Preservation of original PDF quality.
Before proceeding, ensure you have Spire.PDF installed in your Python environment:
pip install spire.pdf
Adding a Text Watermark to a PDF
This code snippet demonstrates how to add a diagonal "DO NOT COPY" watermark to each page of a PDF file. It manages the size, color, positioning, rotation, and transparency of the watermark for a professional result.
from spire.pdf import *
from spire.pdf.common import *
# Create an object of PdfDocument class
doc = PdfDocument()
# Load a PDF document from the specified path
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf")
# Create an object of PdfTrueTypeFont class for the watermark font
font = PdfTrueTypeFont("Times New Roman", 48.0, 0, True)
# Specify the watermark text
text = "DO NOT COPY"
# Measure the dimensions of the text to ensure proper positioning
text_width = font.MeasureString(text).Width
text_height = font.MeasureString(text).Height
# Loop through each page in the document
for i in range(doc.Pages.Count):
    # Get the current page
    page = doc.Pages.get_Item(i)
    # Save the current canvas state
    state = page.Canvas.Save()
    # Calculate the center coordinates of the page
    x = page.Canvas.Size.Width / 2
    y = page.Canvas.Size.Height / 2
    # Translate the coordinate system to the center so that the center of the page becomes the origin (0, 0)
    page.Canvas.TranslateTransform(x, y)
    # Rotate the canvas 45 degrees counterclockwise for the watermark
    page.Canvas.RotateTransform(-45.0)
    # Set the transparency of the watermark
    page.Canvas.SetTransparency(0.7)
    # Draw the watermark text at the centered position using negative offsets
    page.Canvas.DrawString(text, font, PdfBrushes.get_Blue(), PointF(-text_width / 2, -text_height / 2))
    # Restore the canvas state to prevent transformations from affecting subsequent drawings
    page.Canvas.Restore(state)
# Save the modified document to a new PDF file
doc.SaveToFile("output/TextWatermark.pdf")
# Dispose resources
doc.Dispose()
Breakdown of the Code:
- Load the PDF Document: The script loads an input PDF file from a specified path using the PdfDocument class.
- Configure Watermark Text: A watermark text ("DO NOT COPY") is set with a specific font (Times New Roman, 48pt) and measured for accurate positioning.
- Apply Transformations: For each page, the script:
  - Centers the coordinate system.
  - Rotates the canvas by 45 degrees counterclockwise.
  - Sets transparency (70%) for the watermark.
- Draw the Watermark: The text is drawn at (-text_width / 2, -text_height / 2), which aligns the text perfectly around the center point of the page, regardless of the rotation applied.
- Save the Document: The modified document is saved to a new PDF file.
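The claim that drawing at (-text_width / 2, -text_height / 2) keeps the text centered for any rotation can be checked with plain arithmetic. The sketch below applies a standard 2D rotation about the origin followed by a shift to the page center. It is not Spire.PDF's internal code, the visual direction of a positive angle depends on the canvas's axis orientation, and `watermark_corners` is a hypothetical name.

```python
import math
from typing import List, Tuple


def watermark_corners(page_w: float, page_h: float, text_w: float, text_h: float,
                      angle_deg: float) -> List[Tuple[float, float]]:
    """Page coordinates of the text box corners after translate-then-rotate."""
    cx, cy = page_w / 2, page_h / 2  # New origin: the page center
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Text box corners relative to the origin, as drawn at (-w/2, -h/2)
    local = [(-text_w / 2, -text_h / 2), (text_w / 2, -text_h / 2),
             (text_w / 2, text_h / 2), (-text_w / 2, text_h / 2)]
    # Rotate each corner about the origin, then shift by the page center
    return [(cx + x * cos_t - y * sin_t, cy + x * sin_t + y * cos_t) for x, y in local]
```

Averaging the four returned corners always gives the page center, whatever the angle, which is why the offsets use half the measured text size. The same function can also be used to check that every corner stays within the page bounds.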

Adding an Image Watermark to a PDF
This code snippet adds a semi-transparent image watermark to each page of a PDF, ensuring proper positioning and a professional appearance.
from spire.pdf import *
from spire.pdf.common import *
# Create an object of PdfDocument class
doc = PdfDocument()
# Load a PDF document from the specified path
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf")
# Load the watermark image from the specified path
image = PdfImage.FromFile("C:\\Users\\Administrator\\Desktop\\logo.png")
# Get the width and height of the loaded image for positioning
imageWidth = float(image.Width)
imageHeight = float(image.Height)
# Loop through each page in the document to apply the watermark
for i in range(doc.Pages.Count):
    # Get the current page
    page = doc.Pages.get_Item(i)
    # Set the transparency of the watermark to 50%
    page.Canvas.SetTransparency(0.5)
    # Get the dimensions of the current page
    pageWidth = page.ActualSize.Width
    pageHeight = page.ActualSize.Height
    # Calculate the x and y coordinates to center the image on the page
    x = (pageWidth - imageWidth) / 2
    y = (pageHeight - imageHeight) / 2
    # Draw the image at the calculated center position on the page
    page.Canvas.DrawImage(image, x, y, imageWidth, imageHeight)
# Save the modified document to a new PDF file
doc.SaveToFile("output/ImageWatermark.pdf")
# Dispose resources
doc.Dispose()
Breakdown of the Code:
- Load the PDF Document: The script loads an input PDF file from a specified path using the PdfDocument class.
- Configure Watermark Image: The watermark image is loaded from a specified path, and its dimensions are retrieved for accurate positioning.
- Apply Transformations: For each page, the script:
  - Sets the watermark transparency (50%).
  - Calculates the center position of the page for the watermark.
- Draw the Watermark: The image is drawn at the calculated center coordinates, ensuring it is centered on each page.
- Save the Document: The modified document is saved to a new PDF file.
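The centering formula works as long as the image is smaller than the page; a larger image produces negative coordinates and is cut off. The sketch below extends the same arithmetic with an optional shrink step. `fit_and_center` and the `max_fraction` parameter are illustrative assumptions, not part of Spire.PDF.

```python
from typing import Tuple


def fit_and_center(page_w: float, page_h: float, img_w: float, img_h: float,
                   max_fraction: float = 0.8) -> Tuple[float, float, float, float]:
    """Return (x, y, width, height) that centers the image, shrinking it if needed."""
    # Largest scale that keeps the image within max_fraction of the page; never upscale
    scale = min(1.0, page_w * max_fraction / img_w, page_h * max_fraction / img_h)
    width, height = img_w * scale, img_h * scale
    # Same centering formula as in the example above, applied to the scaled size
    x = (page_w - width) / 2
    y = (page_h - height) / 2
    return x, y, width, height
```

The four returned values correspond to the x, y, width, and height arguments that the example passes to DrawImage.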

Apart from watermarks, you can also add stamps to PDFs. Unlike watermarks, which are fixed in place, stamps can be freely moved or deleted, offering greater flexibility in document annotation.
Troubleshooting Common Issues
- Watermark Not Appearing:
  - Verify file paths are correct.
  - Check transparency isn't set to 0 (fully transparent).
  - Ensure coordinates place the watermark within page bounds.
- Quality Issues:
  - For text, use higher-quality fonts.
  - For images, ensure adequate resolution.
- Rotation Problems:
  - Remember that rotation occurs around the current origin point.
  - The order of transformations matters (translate then rotate).
Wrapping Up
With Spire.PDF for Python, adding watermarks to PDF documents becomes a simple and powerful process. Whether you need bold "Confidential" text across every page or subtle branding with logos, the library handles it all efficiently. By combining coordinate transformations, transparency settings, and drawing commands, you can create highly customized watermarking workflows tailored to your document's purpose.
FAQs
Q1. Can I add both text and image watermarks to the same PDF?
Yes, you can combine both approaches in a single loop over the PDF pages.
Q2. How can I rotate image watermarks?
Use Canvas.RotateTransform(angle) before drawing the image, similar to the text watermark example.
Q3. Does Spire.PDF support transparent PNGs for watermarks?
Yes, Spire.PDF preserves the transparency of PNG images when used as watermarks.
Q4. Can I apply different watermarks to different pages?
Absolutely. You can implement conditional logic within your page loop to apply different watermarks based on page number or other criteria.
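As a rough illustration of such conditional logic, the selection rule can live in a small function that the page loop calls with the zero-based page index. The rule and the watermark texts below are made up for the example.

```python
def watermark_text_for_page(page_index: int, page_count: int) -> str:
    """Pick a watermark text from a zero-based page index (illustrative rules)."""
    if page_index == 0:
        return "COVER"  # First page
    if page_index == page_count - 1:
        return "END OF DOCUMENT"  # Last page
    return "DO NOT COPY"  # Every page in between
```

Inside the text watermark loop shown earlier, the fixed `text` value would be replaced by the result of this function, and the text would be measured again for each page because its width changes.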
Get a Free License
To fully experience the capabilities of Spire.PDF for Python without any evaluation limitations, you can request a free 30-day trial license.
C++: Create Tables in Word Documents
A table is a powerful tool for organizing and presenting data. It arranges data into rows and columns, making it easier for authors to illustrate the relationships between different data categories and for readers to understand and analyze complex data. In this article, you will learn how to programmatically create tables in Word documents in C++ using Spire.Doc for C++.
Install Spire.Doc for C++
There are two ways to integrate Spire.Doc for C++ into your application. One way is to install it through NuGet, and the other way is to download the package from our website and copy the libraries into your program. Installation via NuGet is simpler and more recommended. You can find more details by visiting the following link.
Integrate Spire.Doc for C++ in a C++ Application
Create a Table in Word in C++
Spire.Doc for C++ offers the Section->AddTable() method to add a table to a section of a Word document. The detailed steps are as follows:
- Initialize an instance of the Document class.
- Add a section to the document using Document->AddSection() method.
- Define the data for the header row and remaining rows, storing them in a one-dimensional vector and a two-dimensional vector respectively.
- Add a table to the section using Section->AddTable() method.
- Specify the number of rows and columns in the table using Table->ResetCells(int, int) method.
- Add data in the one-dimensional vector to the header row and set formatting.
- Add data in the two-dimensional vector to the remaining rows and set formatting.
- Save the result document using Document->SaveToFile() method.
#include "Spire.Doc.o.h"
using namespace Spire::Doc;
using namespace std;
int main()
{
    //Initialize an instance of the Document class
    intrusive_ptr<Document> doc = new Document();
    //Add a section to the document
    intrusive_ptr<Section> section = doc->AddSection();
    //Set page margins for the section
    section->GetPageSetup()->GetMargins()->SetAll(72);
    //Define the data for the header row
    vector<wstring> header = { L"Name", L"Capital", L"Continent", L"Area", L"Population" };
    //Define the data for the remaining rows
    vector<vector<wstring>> data =
    {
        {L"Argentina", L"Buenos Aires", L"South America", L"2777815", L"32300003"},
        {L"Bolivia", L"La Paz", L"South America", L"1098575", L"7300000"},
        {L"Brazil", L"Brasilia", L"South America", L"8511196", L"150400000"},
        {L"Canada", L"Ottawa", L"North America", L"9976147", L"26500000"},
        {L"Chile", L"Santiago", L"South America", L"756943", L"13200000"},
        {L"Colombia", L"Bogota", L"South America", L"1138907", L"33000000"},
        {L"Cuba", L"Havana", L"North America", L"114524", L"10600000"},
        {L"Ecuador", L"Quito", L"South America", L"455502", L"10600000"},
        {L"El Salvador", L"San Salvador", L"North America", L"20865", L"5300000"},
        {L"Guyana", L"Georgetown", L"South America", L"214969", L"800000"},
        {L"Jamaica", L"Kingston", L"North America", L"11424", L"2500000"},
        {L"Mexico", L"Mexico City", L"North America", L"1967180", L"88600000"},
        {L"Nicaragua", L"Managua", L"North America", L"139000", L"3900000"},
        {L"Paraguay", L"Asuncion", L"South America", L"406576", L"4660000"},
        {L"Peru", L"Lima", L"South America", L"1285215", L"21600000"},
        {L"United States", L"Washington", L"North America", L"9363130", L"249200000"},
        {L"Uruguay", L"Montevideo", L"South America", L"176140", L"3002000"},
        {L"Venezuela", L"Caracas", L"South America", L"912047", L"19700000"}
    };
    //Add a table to the section
    intrusive_ptr<Table> table = section->AddTable(true);
    //Specify the number of rows and columns for the table
    table->ResetCells(data.size() + 1, header.size());
    //Set the first row as the header row
    intrusive_ptr<TableRow> row = table->GetRows()->GetItemInRowCollection(0);
    row->SetIsHeader(true);
    //Set height and background color for the header row
    row->SetHeight(20);
    row->SetHeightType(TableRowHeightType::Exactly);
    for (int i = 0; i < row->GetCells()->GetCount(); i++)
    {
        row->GetCells()->GetItemInCellCollection(i)->GetCellFormat()->GetShading()->SetBackgroundPatternColor(Color::FromArgb(142, 170, 219));
    }
    //Add data to the header row and set formatting
    for (size_t i = 0; i < header.size(); i++)
    {
        //Add a paragraph
        intrusive_ptr<Paragraph> p1 = row->GetCells()->GetItemInCellCollection(i)->AddParagraph();
        //Set alignment
        p1->GetFormat()->SetHorizontalAlignment(HorizontalAlignment::Center);
        row->GetCells()->GetItemInCellCollection(i)->GetCellFormat()->SetVerticalAlignment(VerticalAlignment::Middle);
        //Add data
        intrusive_ptr<TextRange> tR1 = p1->AppendText(header[i].c_str());
        //Set data formatting
        tR1->GetCharacterFormat()->SetFontName(L"Calibri");
        tR1->GetCharacterFormat()->SetFontSize(12);
        tR1->GetCharacterFormat()->SetBold(true);
    }
    //Add data to the remaining rows and set formatting
    for (size_t r = 0; r < data.size(); r++)
    {
        //Set height for the remaining rows
        intrusive_ptr<TableRow> dataRow = table->GetRows()->GetItemInRowCollection(r + 1);
        dataRow->SetHeight(20);
        dataRow->SetHeightType(TableRowHeightType::Exactly);
        for (size_t c = 0; c < data[r].size(); c++)
        {
            //Add a paragraph
            intrusive_ptr<Paragraph> p2 = dataRow->GetCells()->GetItemInCellCollection(c)->AddParagraph();
            //Set alignment
            dataRow->GetCells()->GetItemInCellCollection(c)->GetCellFormat()->SetVerticalAlignment(VerticalAlignment::Middle);
            //Add data
            intrusive_ptr<TextRange> tR2 = p2->AppendText(data[r][c].c_str());
            //Set data formatting
            tR2->GetCharacterFormat()->SetFontName(L"Calibri");
            tR2->GetCharacterFormat()->SetFontSize(11);
        }
    }
    //Save the result document
    doc->SaveToFile(L"CreateTable.docx", FileFormat::Docx2013);
    doc->Close();
}

Create a Nested Table in Word in C++
Spire.Doc for C++ offers the TableCell->AddTable() method to add a nested table to a specific table cell. The detailed steps are as follows:
- Initialize an instance of the Document class.
- Add a section to the document using Document->AddSection() method.
- Add a table to the section using Section->AddTable() method.
- Specify the number of rows and columns in the table using Table->ResetCells(int, int) method.
- Get the rows of the table and add data to the cells of each row.
- Add a nested table to a specific table cell using TableCell->AddTable() method.
- Specify the number of rows and columns in the nested table.
- Get the rows of the nested table and add data to the cells of each row.
- Save the result document using Document->SaveToFile() method.
#include "Spire.Doc.o.h"
using namespace Spire::Doc;
using namespace std;
int main()
{
    //Initialize an instance of the Document class
    intrusive_ptr<Document> doc = new Document();
    //Add a section to the document
    intrusive_ptr<Section> section = doc->AddSection();
    //Set page margins for the section
    section->GetPageSetup()->GetMargins()->SetAll(72);
    //Add a table to the section
    intrusive_ptr<Table> table = section->AddTable(true);
    //Set the number of rows and columns in the table
    table->ResetCells(2, 2);
    //Autofit the table width to window
    table->AutoFit(AutoFitBehaviorType::AutoFitToWindow);
    //Get the table rows
    intrusive_ptr<TableRow> row1 = table->GetRows()->GetItemInRowCollection(0);
    intrusive_ptr<TableRow> row2 = table->GetRows()->GetItemInRowCollection(1);
    //Add data to cells of the table
    intrusive_ptr<TableCell> cell1 = row1->GetCells()->GetItemInCellCollection(0);
    intrusive_ptr<TextRange> tR = cell1->AddParagraph()->AppendText(L"Product");
    tR->GetCharacterFormat()->SetFontSize(13);
    tR->GetCharacterFormat()->SetBold(true);
    intrusive_ptr<TableCell> cell2 = row1->GetCells()->GetItemInCellCollection(1);
    tR = cell2->AddParagraph()->AppendText(L"Description");
    tR->GetCharacterFormat()->SetFontSize(13);
    tR->GetCharacterFormat()->SetBold(true);
    intrusive_ptr<TableCell> cell3 = row2->GetCells()->GetItemInCellCollection(0);
    cell3->AddParagraph()->AppendText(L"Spire.Doc for C++");
    intrusive_ptr<TableCell> cell4 = row2->GetCells()->GetItemInCellCollection(1);
    cell4->AddParagraph()->AppendText(L"Spire.Doc for C++ is a professional Word "
        L"library specifically designed for developers to create, "
        L"read, write and convert Word documents in C++ "
        L"applications with fast and high-quality performance.");
    //Add a nested table to the fourth cell
    intrusive_ptr<Table> nestedTable = cell4->AddTable(true);
    //Set the number of rows and columns in the nested table
    nestedTable->ResetCells(3, 2);
    //Autofit the table width to content
    nestedTable->AutoFit(AutoFitBehaviorType::AutoFitToContents);
    //Get table rows
    intrusive_ptr<TableRow> nestedRow1 = nestedTable->GetRows()->GetItemInRowCollection(0);
    intrusive_ptr<TableRow> nestedRow2 = nestedTable->GetRows()->GetItemInRowCollection(1);
    intrusive_ptr<TableRow> nestedRow3 = nestedTable->GetRows()->GetItemInRowCollection(2);
    //Add data to cells of the nested table
    intrusive_ptr<TableCell> nestedCell1 = nestedRow1->GetCells()->GetItemInCellCollection(0);
    tR = nestedCell1->AddParagraph()->AppendText(L"Item");
    tR->GetCharacterFormat()->SetBold(true);
    intrusive_ptr<TableCell> nestedCell2 = nestedRow1->GetCells()->GetItemInCellCollection(1);
    tR = nestedCell2->AddParagraph()->AppendText(L"Price");
    tR->GetCharacterFormat()->SetBold(true);
    intrusive_ptr<TableCell> nestedCell3 = nestedRow2->GetCells()->GetItemInCellCollection(0);
    nestedCell3->AddParagraph()->AppendText(L"Developer Subscription");
    intrusive_ptr<TableCell> nestedCell4 = nestedRow2->GetCells()->GetItemInCellCollection(1);
    nestedCell4->AddParagraph()->AppendText(L"$999");
    intrusive_ptr<TableCell> nestedCell5 = nestedRow3->GetCells()->GetItemInCellCollection(0);
    nestedCell5->AddParagraph()->AppendText(L"Developer OEM Subscription");
    intrusive_ptr<TableCell> nestedCell6 = nestedRow3->GetCells()->GetItemInCellCollection(1);
    nestedCell6->AddParagraph()->AppendText(L"$2999");
    //Save the result document
    doc->SaveToFile(L"CreateNestedTable.docx", FileFormat::Docx2013);
    doc->Close();
}

Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
3 Efficient Methods to Write Data to Excel in Java

Looking to automate Excel data entry in Java? Manually inputting data into Excel worksheets is time-consuming and error-prone, especially when dealing with large datasets. The good news is that with the right Java Excel library, you can streamline this process. This comprehensive guide explores three efficient methods to write data to Excel in Java using the powerful Spire.XLS for Java library, covering basic cell-by-cell entries, bulk array inserts, and DataTable exports.
- Prerequisites: Setup & Installation
- 3 Ways to Write Data to Excel using Java
- Performance Tips for Large Datasets
- Frequently Asked Questions
- Final Thoughts
Prerequisites: Setup & Installation
Before you start, you’ll need to add Spire.XLS for Java to your project. Here’s how to do it quickly:
Option 1: Download the JAR File
- Visit the Spire.XLS for Java download page.
- Download the latest JAR file.
- Add the JAR to your project’s build path.
Option 2: Use Maven
If you’re using Maven, add the following repository and dependency to your pom.xml file. This automatically downloads and integrates the library:
<repositories>
<repository>
<id>com.e-iceblue</id>
<name>e-iceblue</name>
<url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>e-iceblue</groupId>
<artifactId>spire.xls</artifactId>
<version>15.7.7</version>
</dependency>
</dependencies>
3 Ways to Write Data to Excel using Java
Spire.XLS for Java offers flexible methods to write data, tailored to different scenarios. Let’s explore each with complete code samples, explanations, and use cases.
1. Write Text or Numbers to Excel Cells
Need to populate individual cells with text or numbers? Spire.XLS lets you directly target a specific cell using row/column indices (e.g., (2,1) for row 2, column 1) or Excel-style references (e.g., "A1", "B3"):
How It Works:
- Use the Worksheet.get(int row, int column) or Worksheet.get(String name) method to access a specific Excel cell.
- Use the setValue() method to write a text value to the cell.
- Use the setNumberValue() method to write a numeric value to the cell.
Java code to write data to Excel:
import com.spire.xls.*;
public class WriteToCells {
public static void main(String[] args) {
// Create a Workbook object
Workbook workbook = new Workbook();
// Get the first worksheet
Worksheet worksheet = workbook.getWorksheets().get(0);
// Write data to specific cells
worksheet.get("A1").setValue("Name");
worksheet.get("B1").setValue("Age");
worksheet.get("C1").setValue("Department");
worksheet.get("D1").setValue("Hiredate");
worksheet.get(2,1).setValue("Hazel");
worksheet.get(2,2).setNumberValue(29);
worksheet.get(2,3).setValue("Marketing");
worksheet.get(2,4).setValue("2019-07-01");
worksheet.get(3,1).setValue("Tina");
worksheet.get(3,2).setNumberValue(31);
worksheet.get(3,3).setValue("Technical Support");
worksheet.get(3,4).setValue("2015-04-27");
// Autofit column widths
worksheet.getAllocatedRange().autoFitColumns();
// Apply a style to the first row
CellStyle style = workbook.getStyles().addStyle("newStyle");
style.getFont().isBold(true);
worksheet.getRange().get(1,1,1,4).setStyle(style);
// Save to an Excel file
workbook.saveToFile("output/WriteToCells.xlsx", ExcelVersion.Version2016);
}
}
When to use this: Small datasets where you need precise control over cell placement (e.g., adding a title, single-row entries).
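The two addressing styles refer to the same cells: "B3" is row 3, column 2 in 1-based indices. As a minimal sketch of that mapping, the plain-Java helper below converts an Excel-style reference into 1-based row and column numbers. The `CellRef` class and its `parse` method are hypothetical names used for illustration only; they are not part of Spire.XLS.

```java
// Hypothetical helper for illustration only; it is not part of Spire.XLS.
final class CellRef {
    private CellRef() {}

    /** Returns {row, column}, both 1-based, for a reference such as "A1" or "AB12". */
    static int[] parse(String ref) {
        int i = 0;
        int column = 0;
        // Leading letters form a base-26 column number (A=1 ... Z=26, AA=27, ...)
        while (i < ref.length()) {
            char c = Character.toUpperCase(ref.charAt(i));
            if (c < 'A' || c > 'Z') {
                break;
            }
            column = column * 26 + (c - 'A' + 1);
            i++;
        }
        if (i == 0 || i == ref.length()) {
            throw new IllegalArgumentException("Not a cell reference: " + ref);
        }
        // The remaining digits are the 1-based row number
        int row = Integer.parseInt(ref.substring(i));
        return new int[] {row, column};
    }

    public static void main(String[] args) {
        int[] rc = parse("B3");
        System.out.println("row=" + rc[0] + ", column=" + rc[1]); // prints "row=3, column=2"
    }
}
```

Index-based access tends to suit loops, while Excel-style references are easier to read for fixed cells such as headers.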

2. Write Arrays to Excel Worksheets
For bulk data, writing arrays (1D or 2D) is far more efficient than updating cells one by one. Spire.XLS for Java allows inserting arrays into a contiguous cell range.
insertArray() Method Explained:
The insertArray() method handles 1D arrays (single rows) and 2D arrays (multiple rows/columns) effortlessly. Its parameters are:
- Object[] array / Object[][] array: The 1D or 2D array containing the data to insert.
- int firstRow: The starting row index (1-based).
- int firstColumn: The starting column index (1-based).
- boolean isVertical: The insertion direction for the array:
  - false: Insert horizontally (left to right).
  - true: Insert vertically (top to bottom).
Java code to insert arrays into Excel:
import com.spire.xls.*;
public class WriteArrayToWorksheet {
public static void main(String[] args) {
// Create a Workbook instance
Workbook workbook = new Workbook();
// Get the first worksheet
Worksheet worksheet = workbook.getWorksheets().get(0);
// Create a one-dimensional array
Object[] oneDimensionalArray = {"January", "February", "March", "April","May", "June"};
// Write the array to the first row of the worksheet
worksheet.insertArray(oneDimensionalArray, 1, 1, false);
// Create a two-dimensional array
Object[][] twoDimensionalArray = {
{"Name", "Age", "Sex", "Dept.", "Tel."},
{"John", "25", "Male", "Development","654214"},
{"Albert", "24", "Male", "Support","624847"},
{"Amy", "26", "Female", "Sales","624758"}
};
// Write the array to the worksheet starting from the cell A3
worksheet.insertArray(twoDimensionalArray, 3, 1);
// Autofit column widths in the allocated range
worksheet.getAllocatedRange().autoFitColumns();
// Apply a style to the first and the third row
CellStyle style = workbook.getStyles().addStyle("newStyle");
style.getFont().isBold(true);
worksheet.getRange().get(1,1,1,6).setStyle(style);
worksheet.getRange().get(3,1,3,6).setStyle(style);
// Save to an Excel file
workbook.saveToFile("WriteArrays.xlsx", ExcelVersion.Version2016);
}
}
When to use this: Sequential data (e.g., inventory logs, user lists) that needs bulk insertion.
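To make the effect of the isVertical flag concrete, the sketch below mimics the placement logic on a plain 2D grid instead of a real worksheet. The `ArrayPlacement` class and its `place` method are hypothetical and only model the behavior described above (1-based start position, horizontal or vertical direction); they do not call Spire.XLS.

```java
import java.util.Arrays;

// Hypothetical model of the described behavior; it does not use Spire.XLS.
final class ArrayPlacement {
    private ArrayPlacement() {}

    /** Copies values into grid starting at the 1-based position (firstRow, firstColumn). */
    static void place(Object[][] grid, Object[] values,
                      int firstRow, int firstColumn, boolean isVertical) {
        for (int k = 0; k < values.length; k++) {
            // Vertical insertion advances the row; horizontal insertion advances the column
            int row = firstRow - 1 + (isVertical ? k : 0);
            int col = firstColumn - 1 + (isVertical ? 0 : k);
            grid[row][col] = values[k];
        }
    }

    public static void main(String[] args) {
        Object[][] grid = new Object[3][3];
        place(grid, new Object[] {"Jan", "Feb", "Mar"}, 1, 1, false); // fills A1, B1, C1
        place(grid, new Object[] {10, 20}, 2, 1, true);               // fills A2, A3
        System.out.println(Arrays.deepToString(grid));
    }
}
```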

3. Write DataTable to Excel
If your data is stored in a DataTable (e.g., from a database), Spire.XLS lets you directly export it to Excel with insertDataTable(), preserving structure and column headers.
insertDataTable() Method Explained:
The insertDataTable() method bulk-inserts a structured data collection into a worksheet. Its parameters are:
- DataTable dataTable: The DataTable object containing the data to insert.
- boolean columnHeaders: Whether to include the DataTable's column names as headers in Excel.
  - true: Inserts column names as the first row.
  - false: Skips column names; data starts from the first row.
- int firstRow: The starting row index (1-based).
- int firstColumn: The starting column index (1-based).
- boolean transTypes: A boolean indicating whether to preserve data types.
Java code to export DataTable to Excel:
import com.spire.xls.*;
import com.spire.xls.data.table.DataRow;
import com.spire.xls.data.table.DataTable;
public class WriteDataTableToWorksheet {
public static void main(String[] args) throws Exception {
// Create a Workbook instance
Workbook workbook = new Workbook();
// Get the first worksheet
Worksheet worksheet = workbook.getWorksheets().get(0);
// Create a DataTable object
DataTable dataTable = new DataTable();
dataTable.getColumns().add("SKU", Integer.class);
dataTable.getColumns().add("NAME", String.class);
dataTable.getColumns().add("PRICE", String.class);
// Create rows and add data
DataRow dr = dataTable.newRow();
dr.setInt(0, 512900512);
dr.setString(1,"Wireless Mouse M200");
dr.setString(2,"$85");
dataTable.getRows().add(dr);
dr = dataTable.newRow();
dr.setInt(0,512900637);
dr.setString(1,"B100 Corded Mouse");
dr.setString(2,"$99");
dataTable.getRows().add(dr);
dr = dataTable.newRow();
dr.setInt(0,512901829);
dr.setString(1,"Gaming Mouse");
dr.setString(2,"$125");
dataTable.getRows().add(dr);
dr = dataTable.newRow();
dr.setInt(0,512900386);
dr.setString(1,"ZM Optical Mouse");
dr.setString(2,"$89");
dataTable.getRows().add(dr);
// Write datatable to the worksheet
worksheet.insertDataTable(dataTable,true,1,1,true);
// Autofit column widths in the allocated range
worksheet.getAllocatedRange().autoFitColumns();
// Apply a style to the first row
CellStyle style = workbook.getStyles().addStyle("newStyle");
style.getFont().isBold(true);
worksheet.getRange().get(1,1,1,3).setStyle(style);
// Save to an Excel file
workbook.saveToFile("output/WriteDataTable.xlsx", ExcelVersion.Version2016);
}
}
When to use this: Database exports, CRM data, or any structured data stored in a DataTable (e.g., SQL query results, CSV imports).
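The columnHeaders flag decides whether the column names become the first row of the output. The sketch below models that choice with plain Java collections, assuming the data is already available as a list of column names and a list of row arrays. `TableGrid.toGrid` is a hypothetical helper, not a Spire.XLS call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that models the columnHeaders option; it does not use Spire.XLS.
final class TableGrid {
    private TableGrid() {}

    /** Flattens column names and rows into a 2D array, optionally with a header row. */
    static Object[][] toGrid(List<String> columnNames, List<Object[]> rows, boolean columnHeaders) {
        List<Object[]> out = new ArrayList<>();
        if (columnHeaders) {
            // Column names become the first row
            out.add(columnNames.toArray());
        }
        for (Object[] row : rows) {
            // Copy each row so later changes to the source do not affect the grid
            out.add(row.clone());
        }
        return out.toArray(new Object[0][]);
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("SKU", "NAME", "PRICE");
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {512900512, "Wireless Mouse M200", "$85"});
        System.out.println(Arrays.deepToString(toGrid(columns, rows, true)));
    }
}
```

A grid built this way could also be passed to an array-based insert when no DataTable is available.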

Performance Tips for Large Datasets
- Use bulk operations (insertArray()/insertDataTable()) instead of writing cells one by one.
- Disable auto-fit columns or styling during data insertion, then apply them once after all data is written.
- For datasets with 100,000+ rows, consider streaming mode to reduce memory usage.
Frequently Asked Questions
Q1: What Excel formats does Spire.XLS support for writing data?
A: Spire.XLS for Java supports all major Excel formats, including:
- Legacy formats: XLS (Excel 97-2003)
- Modern formats: XLSX, XLSM (macro-enabled), XLSB, and more.
You can specify the output format when saving Excel with the saveToFile() method.
Q2: How do I format cells (colors, fonts, borders) when writing data?
A: Spire.XLS offers robust styling options. Check these guides:
- Set Background Color for Excel Cells in Java
- Apply Different Fonts to Excel Cells in Java
- Add Cell Borders in Excel in Java
Q3: How do I avoid the "Evaluation Warning" in output files?
A: To remove the evaluation sheets, get a 30-day free trial license here and then apply the license key in your code before creating the Workbook object:
com.spire.xls.license.LicenseProvider.setLicenseKey("Key");
Workbook workbook = new Workbook();
Final Thoughts
Mastering Excel export functionality is crucial for Java developers in data-driven applications. The Spire.XLS for Java library provides three efficient approaches to write data to Excel in Java:
- Precision control with cell-by-cell writing
- High-performance bulk inserts using arrays
- Database-style exporting with DataTables
Each method serves distinct use cases - from simple reports to complex enterprise data exports. By following the examples in the article, developers can easily create and write to Excel files in Java applications.
Extract Tables from PDFs in C# - Export to TXT & CSV
Extracting tables from PDF files is a common requirement in data processing, reporting, and automation tasks. PDFs are widely used for sharing structured data, but extracting tables programmatically can be challenging due to their complex layout. Fortunately, with the right tools, this process becomes straightforward. In this guide, we’ll explore how to extract tables from PDF in C# using the Spire.PDF for .NET library, and export the results to TXT and CSV formats for easy reuse.
Table of Contents:
- Prerequisites for Reading PDF Tables in C#
- Understanding PDF Table Structure
- How to Extract Tables from PDF in C#
- Extract PDF Tables to a Text File in C#
- Export PDF Tables to CSV in C#
- Conclusion
- FAQs
Prerequisites for Reading PDF Tables in C#
Spire.PDF for .NET is a powerful library for processing PDF files in C# and VB.NET. It supports a wide range of PDF operations, including table extraction, text extraction, image extraction, and more.
The easiest way to add the Spire.PDF library is via NuGet Package Manager.
1. Open Visual Studio and create a new C# project. (Here we create a Console App)
2. In Visual Studio, right-click your project > Manage NuGet Packages.
3. Search for “Spire.PDF” and install the latest version.
Understanding PDF Table Structure
Before coding, let’s clarify how PDFs store tables. Unlike Excel (which explicitly defines rows/columns), PDFs use:
- Text Blocks: Individual text elements positioned with coordinates.
- Borders/Lines: Visual cues (horizontal/vertical lines) that humans interpret as table edges.
- Spacing: Consistent gaps between text blocks to indicate cells.
The Spire.PDF library infers table structure by analyzing these visual cues, matching text blocks to rows/columns based on proximity and alignment.
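To illustrate what "proximity and alignment" can mean in practice, here is a simplified Java sketch that groups positioned text blocks into rows by comparing their y-coordinates, then orders each row by x. This is an assumption-laden illustration of the general idea, not Spire.PDF's actual algorithm; the `RowGrouping` and `TextBlock` types and the `yTolerance` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: a naive row-grouping heuristic, not the library's implementation.
final class RowGrouping {
    private RowGrouping() {}

    /** A text fragment together with the coordinates of its top-left corner. */
    static final class TextBlock {
        final String text;
        final double x;
        final double y;

        TextBlock(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Groups blocks whose y-coordinates lie within yTolerance of the row's first block. */
    static List<List<String>> groupIntoRows(List<TextBlock> blocks, double yTolerance) {
        List<TextBlock> sorted = new ArrayList<>(blocks);
        sorted.sort(Comparator.comparingDouble((TextBlock b) -> b.y));

        List<List<TextBlock>> rows = new ArrayList<>();
        double rowY = 0;
        for (TextBlock block : sorted) {
            // Start a new row when the block is too far below the current row
            if (rows.isEmpty() || Math.abs(block.y - rowY) > yTolerance) {
                rows.add(new ArrayList<>());
                rowY = block.y;
            }
            rows.get(rows.size() - 1).add(block);
        }

        List<List<String>> result = new ArrayList<>();
        for (List<TextBlock> row : rows) {
            // Within a row, cells are ordered left to right
            row.sort(Comparator.comparingDouble(b -> b.x));
            List<String> cells = new ArrayList<>();
            for (TextBlock block : row) {
                cells.add(block.text);
            }
            result.add(cells);
        }
        return result;
    }

    public static void main(String[] args) {
        List<TextBlock> blocks = new ArrayList<>();
        blocks.add(new TextBlock("Qty", 100, 50.5));
        blocks.add(new TextBlock("Item", 10, 50));
        blocks.add(new TextBlock("2", 100, 70));
        blocks.add(new TextBlock("Pen", 10, 70.2));
        System.out.println(groupIntoRows(blocks, 2.0)); // prints [[Item, Qty], [Pen, 2]]
    }
}
```

A real extractor also has to handle merged cells, wrapped text, and tables without ruling lines, which is why a dedicated library is usually the safer choice.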
How to Extract Tables from PDF in C#
If you need a quick way to preview table data (e.g., debugging or verifying extraction), printing it to the console is a great starting point.
Key classes and methods for extracting data from a PDF table:
- PdfDocument: Represents a PDF file.
- LoadFromFile: Loads the PDF file for processing.
- PdfTableExtractor: Analyzes the PDF to detect tables using visual cues (borders, spacing).
- ExtractTable(pageIndex): Returns an array of PdfTable objects for the specified page.
- GetRowCount()/GetColumnCount(): Retrieve the dimensions of each table.
- GetText(rowIndex, columnIndex): Extracts text from the cell at the specified row and column.
using System;
using Spire.Pdf;
using Spire.Pdf.Utilities;
namespace ExtractPdfTable
{
class Program
{
static void Main(string[] args)
{
// Create a PdfDocument object
PdfDocument pdf = new PdfDocument();
// Load a PDF file
pdf.LoadFromFile("invoice.pdf");
// Initialize an instance of PdfTableExtractor class
PdfTableExtractor extractor = new PdfTableExtractor(pdf);
// Loop through the pages
for (int pageIndex = 0; pageIndex < pdf.Pages.Count; pageIndex++)
{
// Extract tables from a specific page
PdfTable[] tableList = extractor.ExtractTable(pageIndex);
// Determine if the table list is null
if (tableList != null && tableList.Length > 0)
{
int tableNumber = 1;
// Loop through the table in the list
foreach (PdfTable table in tableList)
{
Console.WriteLine($"\nTable {tableNumber} on Page {pageIndex + 1}:");
Console.WriteLine("-----------------------------------");
// Get row number and column number of a certain table
int row = table.GetRowCount();
int column = table.GetColumnCount();
// Loop through rows and columns
for (int i = 0; i < row; i++)
{
for (int j = 0; j < column; j++)
{
// Get text from the specific cell
string text = table.GetText(i, j);
// Print cell text to console with a separator
Console.Write($"{text}\t");
}
// New line after each row
Console.WriteLine();
}
tableNumber++;
}
}
}
// Close the document
pdf.Close();
}
}
}
When to Use This Method
- Quick debugging or validation of extracted data.
- Small datasets where you don’t need persistent storage.
Output: Retrieve PDF table data and output to the console

Extract PDF Tables to a Text File in C#
For lightweight, human-readable storage, saving tables to a text file is ideal. This method uses StringBuilder to efficiently compile table data, preserving row breaks for readability.
Key features of extracting PDF tables and exporting to TXT:
- Efficiency: StringBuilder minimizes memory overhead compared to string concatenation.
- Persistent Storage: Saves data to a text file for later review or sharing.
- Row Preservation: Uses \r\n to maintain row structure, making the text file easy to scan.
using Spire.Pdf;
using Spire.Pdf.Utilities;
using System.IO;
using System.Text;
namespace ExtractTableToTxt
{
class Program
{
static void Main(string[] args)
{
// Create a PdfDocument object
PdfDocument pdf = new PdfDocument();
// Load a PDF file
pdf.LoadFromFile("invoice.pdf");
// Create a StringBuilder object
StringBuilder builder = new StringBuilder();
// Initialize an instance of PdfTableExtractor class
PdfTableExtractor extractor = new PdfTableExtractor(pdf);
// Declare a PdfTable array
PdfTable[] tableList = null;
// Loop through the pages
for (int pageIndex = 0; pageIndex < pdf.Pages.Count; pageIndex++)
{
// Extract tables from a specific page
tableList = extractor.ExtractTable(pageIndex);
// Determine if the table list is null
if (tableList != null && tableList.Length > 0)
{
// Loop through the table in the list
foreach (PdfTable table in tableList)
{
// Get row number and column number of a certain table
int row = table.GetRowCount();
int column = table.GetColumnCount();
// Loop through the rows and columns
for (int i = 0; i < row; i++)
{
for (int j = 0; j < column; j++)
{
// Get text from the specific cell
string text = table.GetText(i, j);
// Add text to the string builder
builder.Append(text).Append(' ');
}
builder.Append("\r\n");
}
}
}
}
// Write to a .txt file
File.WriteAllText("ExtractPDFTable.txt", builder.ToString());
// Close the document
pdf.Close();
}
}
}
When to Use This Method
- Archiving table data in a lightweight, universally accessible format.
- Sharing with teams that need to scan data without spreadsheet tools.
- Using as input for basic scripts (e.g., PowerShell) to extract specific values.
Output: Extract PDF table data and save to a text file.

Pro Tip: For VB.NET demos, convert the above code using our C# ⇆ VB.NET Converter.
Export PDF Tables to CSV in C#
CSV (Comma-Separated Values) is a widely used format for tabular data, compatible with Excel, Google Sheets, and databases. This method formats the extracted tables into a valid CSV file by quoting cells and escaping embedded double quotes.
Key features of extracting tables from PDF to CSV:
- StreamWriter: Writes data incrementally to the CSV file, reducing memory usage for large PDFs.
- Quoted Cells: Cells are wrapped in double quotes (" ") to avoid misinterpreting commas within text as column separators.
- UTF-8 Encoding: Supports special characters in cell text.
- Spreadsheet Ready: Directly opens in Excel, Google Sheets, or spreadsheet tools for analysis.
using Spire.Pdf;
using Spire.Pdf.Utilities;
using System.Collections.Generic;
using System.IO;
using System.Text;
namespace ExtractTableToCsv
{
class Program
{
static void Main(string[] args)
{
// Create a PdfDocument object
PdfDocument pdf = new PdfDocument();
// Load a PDF file
pdf.LoadFromFile("invoice.pdf");
// Create a StreamWriter object for efficient CSV writing
using (StreamWriter csvWriter = new StreamWriter("PDFtable.csv", false, Encoding.UTF8))
{
// Create a PdfTableExtractor object
PdfTableExtractor extractor = new PdfTableExtractor(pdf);
// Loop through the pages
for (int pageIndex = 0; pageIndex < pdf.Pages.Count; pageIndex++)
{
// Extract tables from a specific page
PdfTable[] tableList = extractor.ExtractTable(pageIndex);
// Determine if the table list is null
if (tableList != null && tableList.Length > 0)
{
// Loop through the table in the list
foreach (PdfTable table in tableList)
{
// Get row number and column number of a certain table
int row = table.GetRowCount();
int column = table.GetColumnCount();
// Loop through the rows
for (int i = 0; i < row; i++)
{
// Creates a list to store data
List<string> rowData = new List<string>();
// Loop through the columns
for (int j = 0; j < column; j++)
{
// Retrieve text from table cells
string cellText = table.GetText(i, j).Replace("\"", "\"\"");
// Add the cell text to the list and wrap in double quotes
rowData.Add($"\"{cellText}\"");
}
// Join cells with commas and write to CSV
csvWriter.WriteLine(string.Join(",", rowData));
}
}
}
}
}
// Close the document
pdf.Close();
}
}
}
When to Use This Method
- Data analysis (import into Excel for calculations).
- Migrating PDF tables to databases (e.g., SQL Server, PostgreSQL, MySQL).
- Collaborating with teams that rely on spreadsheets.
Output: Parse PDF table data and export to a CSV file.
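The quoting rule used for CSV output is language-independent: double any embedded quote character, wrap the cell in double quotes, and join cells with commas. As a minimal sketch of the same rule in plain Java (the `CsvRow` class is a hypothetical helper and does not depend on any PDF library):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper showing the quoting rule; it is independent of any PDF library.
final class CsvRow {
    private CsvRow() {}

    /** Doubles embedded quotes and wraps the cell in double quotes. */
    static String quote(String cell) {
        return "\"" + cell.replace("\"", "\"\"") + "\"";
    }

    /** Quotes every cell and joins the cells with commas. */
    static String join(List<String> cells) {
        List<String> quoted = new ArrayList<>();
        for (String cell : cells) {
            quoted.add(quote(cell));
        }
        return String.join(",", quoted);
    }

    public static void main(String[] args) {
        // A comma and a quote inside cells no longer break the column layout
        System.out.println(join(Arrays.asList("Widget, large", "5\" screen", "12.50")));
        // prints "Widget, large","5"" screen","12.50"
    }
}
```

Quoting every cell is slightly more verbose than quoting only when needed, but it keeps the writer simple and is accepted by common spreadsheet tools.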

Recommendation: Integrate with Spire.XLS for .NET to extract tables from PDF to Excel directly.
Conclusion
This guide has outlined three efficient methods for extracting tables from PDFs in C#. By leveraging the Spire.PDF for .NET library, you can automate the PDF table extraction process and export results to console, TXT, or CSV for further analysis. Whether you’re building a data pipeline, report generator, or business tool, these approaches streamline workflows, save time, and minimize human error.
Refer to the online documentation and obtain a free trial license here to explore more advanced PDF operations.
FAQs
Q1: Why use Spire.PDF for .NET to extract tables?
A: Spire.PDF provides a dedicated PdfTableExtractor class that detects tables based on visual cues (borders, spacing, and text alignment), simplifying the process of parsing structured data from PDFs.
Q2: Can Spire.PDF extract tables from scanned (image-based) PDFs?
A: No. The .NET PDF library works only with text-based PDFs (where text is selectable). For scanned PDFs, use Spire.OCR to extract text before parsing tables.
Q3: Can I extract tables from multiple PDFs at once?
A: Yes. To batch-process multiple PDFs, use Directory.GetFiles() to list all PDF files in a folder, then loop through each file and run the extraction logic. For example:
string[] pdfFiles = Directory.GetFiles(@"C:\Invoices\", "*.pdf");
foreach (string file in pdfFiles)
{
// Run extraction code for each file
}
Q4: How can I improve performance when extracting tables from large PDFs?
A: For large PDFs (100+ pages), optimize performance by:
- Processing pages in batches instead of loading the entire PDF at once.
- Disposing of unused PdfTable or PdfDocument objects with using statements to free memory.
- Skipping pages with no tables early (using if (tableList == null || tableList.Length == 0)).
Java: Convert Images to PDF
Converting images to PDF is useful for several reasons. For one, it puts images into a format that is easier to read and share. For another, it can reduce file size while preserving image quality. In this article, you will learn how to convert images to PDF in Java using Spire.PDF for Java.
Spire.PDF does not provide a single method that converts images to PDF directly. You can, however, create a new PDF document and draw images at specified locations. Depending on whether the page size of the generated PDF matches the image, this topic can be divided into two subtopics.
Install Spire.PDF for Java
First, you're required to add the Spire.Pdf.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.
<repositories>
<repository>
<id>com.e-iceblue</id>
<name>e-iceblue</name>
<url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>e-iceblue</groupId>
<artifactId>spire.pdf</artifactId>
<version>11.10.3</version>
</dependency>
</dependencies>
Additionally, the imgscalr library is used in the first code example to resize images. It is not necessary to install it if you do not need to adjust the image’s size.
Add an Image to PDF at a Specified Location
The following are the steps to add an image to PDF at a specified location using Spire.PDF for Java.
- Create a PdfDocument object.
- Set the page margins using PdfDocument.getPageSettings().setMargins() method.
- Add a page using PdfDocument.getPages().add() method.
- Load an image using ImageIO.read() method, and get the image width and height.
- If the image is wider than the page's content area, resize it to fit the page width using the imgscalr library.
- Create a PdfImage object based on the scaled image or the original image.
- Draw the PdfImage object on the first page at (0, 0) using PdfPageBase.getCanvas().drawImage() method.
- Save the document to a PDF file using PdfDocument.saveToFile() method.
- Java
import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.graphics.PdfImage;
import org.imgscalr.Scalr;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.FileInputStream;
import java.io.IOException;
public class AddImageToPdf {
public static void main(String[] args) throws IOException {
//Create a PdfDocument object
PdfDocument doc = new PdfDocument();
//Set the margins
doc.getPageSettings().setMargins(20);
//Add a page
PdfPageBase page = doc.getPages().add();
//Load an image
BufferedImage image = ImageIO.read(new FileInputStream("C:\\Users\\Administrator\\Desktop\\announcement.jpg"));
//Get the image width and height
int width = image.getWidth();
int height = image.getHeight();
//Declare a PdfImage variable
PdfImage pdfImage;
//If the image width is larger than page width
if (width > page.getCanvas().getClientSize().getWidth())
{
//Resize the image so that it fits the page width
//(use floating-point division; integer division would truncate the ratio)
double widthFitRate = width / page.getCanvas().getClientSize().getWidth();
int targetWidth = (int) (width / widthFitRate);
int targetHeight = (int) (height / widthFitRate);
BufferedImage scaledImage = Scalr.resize(image,Scalr.Method.QUALITY,targetWidth,targetHeight);
//Load the scaled image to the PdfImage object
pdfImage = PdfImage.fromImage(scaledImage);
} else
{
//Load the original image to the PdfImage object
pdfImage = PdfImage.fromImage(image);
}
//Draw image at (0, 0)
page.getCanvas().drawImage(pdfImage, 0, 0, pdfImage.getWidth(), pdfImage.getHeight());
//Save to file
doc.saveToFile("output/AddImage.pdf");
}
}
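The resize step comes down to a small calculation that is worth checking in isolation. The sketch below computes target dimensions that fit a maximum width while preserving the aspect ratio. The `FitToWidth` class is a hypothetical helper, and the value 555 is only an example content width, not a value read from Spire.PDF. Computing the ratio in floating point matters: with integer arithmetic, 800 / 555 truncates to 1 and the image would not shrink at all.

```java
// Hypothetical helper for the scaling arithmetic; it does not use Spire.PDF or imgscalr.
final class FitToWidth {
    private FitToWidth() {}

    /** Returns {targetWidth, targetHeight}, scaled down to maxWidth with the aspect ratio kept. */
    static int[] fit(int width, int height, double maxWidth) {
        if (width <= maxWidth) {
            // The image already fits; keep the original size
            return new int[] {width, height};
        }
        // Floating-point scale factor, always below 1 here
        double scale = maxWidth / width;
        int targetWidth = (int) Math.round(width * scale);
        int targetHeight = (int) Math.round(height * scale);
        return new int[] {targetWidth, targetHeight};
    }

    public static void main(String[] args) {
        int[] size = fit(800, 600, 555.0);
        System.out.println(size[0] + " x " + size[1]); // prints "555 x 416"
    }
}
```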

Convert an Image to PDF with the Same Width and Height
The following are the steps to convert an image to a PDF with the same page size as the image using Spire.PDF for Java.
- Create a PdfDocument object.
- Set the page margins to zero using PdfDocument.getPageSettings().setMargins() method.
- Load an image using ImageIO.read() method, and get the image width and height.
- Add a page to PDF based on the size of the image using PdfDocument.getPages().add() method.
- Create a PdfImage object based on the image.
- Draw the PdfImage object on the first page from the coordinate (0, 0) using PdfPageBase.getCanvas().drawImage() method.
- Save the document to a PDF file using PdfDocument.saveToFile() method.
- Java
import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.graphics.PdfImage;
import javax.imageio.ImageIO;
import java.awt.*;
import java.awt.image.BufferedImage;
import java.io.FileInputStream;
import java.io.IOException;
public class ConvertImageToPdfWithSameSize {
public static void main(String[] args) throws IOException {
//Create a PdfDocument object
PdfDocument doc = new PdfDocument();
//Set the margins to 0
doc.getPageSettings().setMargins(0);
//Load an image
BufferedImage image = ImageIO.read(new FileInputStream("C:\\Users\\Administrator\\Desktop\\announcement.jpg"));
//Get the image width and height
int width = image.getWidth();
int height = image.getHeight();
//Add a page of the same size as the image
PdfPageBase page = doc.getPages().add(new Dimension(width, height));
//Create a PdfImage object based on the image
PdfImage pdfImage = PdfImage.fromImage(image);
//Draw image at (0, 0) of the page
page.getCanvas().drawImage(pdfImage, 0, 0, pdfImage.getWidth(), pdfImage.getHeight());
//Save to file
doc.saveToFile("output/ConvertPdfWithSameSize.pdf");
}
}

How to Add a Digital Signature to a PDF Using Java

Digital signatures play a crucial role in ensuring the authenticity and integrity of PDF documents. Whether you need to sign contracts, legal documents, or financial reports, adding a digital signature helps verify the signer's identity and prevents unauthorized modifications.
In this tutorial, we will explore how to add invisible and visible digital signatures to PDFs using Spire.PDF for Java. We will also cover how to create a signature field for later signing.
- Java Library to Digitally Sign PDF Documents
- Adding an Invisible Digital Signature to a PDF
- Adding a Visible Digital Signature to a PDF
- Creating a Signature Field in a PDF
- Wrap Up
- FAQs
Java Library to Digitally Sign PDF Documents
To work with digital signatures in PDFs, we will use Spire.PDF for Java, a powerful library that allows developers to create, edit, and sign PDF documents programmatically.
Key Features
- Supports PFX certificates for digital signing.
- Allows invisible and visible signatures.
- Enables customization of signature appearance (image, text, details).
- Works with existing PDF forms or creates new signature fields.
Prerequisites
Before you start, ensure you have:
- Java Development Kit (JDK) installed.
- Spire.PDF for Java added to your project.
- A PFX certificate (for signing) and a sample PDF file.
Installation
Download Spire.PDF for Java from our website, and manually import the JAR file into your Java project. If you’re using Maven, add the following code to your project's pom.xml.
<repositories>
<repository>
<id>com.e-iceblue</id>
<name>e-iceblue</name>
<url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>e-iceblue</groupId>
<artifactId>spire.pdf</artifactId>
<version>11.7.5</version>
</dependency>
</dependencies>
Adding an Invisible Digital Signature to a PDF
An invisible digital signature embeds cryptographic authentication without displaying a visual element. This is useful for internal verification while keeping the document clean. Below are the steps to add an invisible signature to a PDF using Spire.PDF.
Step-by-Step Guide
- Initialize a PdfDocument object.
- Load the PDF file that you want to sign.
- Use PdfCertificate to load the PFX certificate with password.
- Initialize a PdfOrdinarySignatureMaker object to manage the signing process.
- Use the makeSignature method to embed the signature without visual elements.
- Save the signed PDF to a new file.
Code Example
import com.spire.pdf.PdfDocument;
import com.spire.pdf.interactive.digitalsignatures.PdfCertificate;
import com.spire.pdf.interactive.digitalsignatures.PdfOrdinarySignatureMaker;
public class AddInvisibleSignature {
public static void main(String[] args) {
// Create a new PDF document object
PdfDocument doc = new PdfDocument();
// Load the input PDF file that needs to be signed
doc.loadFromFile("C:/Users/Administrator/Desktop/Input.pdf");
// Specify the path to the PFX certificate and its password
String filePath = "C:/Users/Administrator/Desktop/certificate.pfx";
String password = "e-iceblue";
// Load the digital certificate (PFX format) with the given password
PdfCertificate certificate = new PdfCertificate(filePath, password);
// Create a signature maker object to apply the digital signature
PdfOrdinarySignatureMaker signatureMaker = new PdfOrdinarySignatureMaker(doc, certificate);
// Apply an invisible digital signature with the name "signature 1"
signatureMaker.makeSignature("signature 1");
// Save the signed PDF to a new file
doc.saveToFile("Signed.pdf");
// Release resources
doc.dispose();
}
}

You might also be interested in: How to Verify Signatures in PDF in Java
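For background on what a digital signature guarantees, the sketch below uses only the JDK's java.security API to sign a byte array and verify it, then shows that verification fails once the content changes. It illustrates the general principle (authenticity and integrity), not the PDF signature format or Spire.PDF's internals. The `SignatureDemo` class is hypothetical, and the key pair is generated on the fly rather than loaded from a PFX file.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Illustration of the signing principle with the JDK only; not how a PDF is signed internally.
final class SignatureDemo {
    private SignatureDemo() {}

    /** Signs the data with the private key using SHA-256 with RSA. */
    static byte[] sign(byte[] data, PrivateKey key) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(key);
        signer.update(data);
        return signer.sign();
    }

    /** Returns true only if the signature matches the data and the public key. */
    static boolean verify(byte[] data, byte[] signature, PublicKey key) throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(key);
        verifier.update(data);
        return verifier.verify(signature);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // Throwaway key pair for the demo; a real workflow would load a certificate instead
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair pair = generator.generateKeyPair();

        byte[] original = "contract v1".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(original, pair.getPrivate());
        System.out.println(verify(original, signature, pair.getPublic())); // prints "true"

        byte[] tampered = "contract v2".getBytes(StandardCharsets.UTF_8);
        System.out.println(verify(tampered, signature, pair.getPublic())); // prints "false"
    }
}
```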
Adding a Visible Digital Signature to a PDF
A visible digital signature displays signer details (name, reason, image) at a specified location. This is ideal for contracts where visual confirmation is needed. Here’s how to add a visible signature using Spire.PDF.
Step-by-Step Guide
- Create a PdfDocument object and load your target PDF file.
- Load the PFX certificate by initializing the PdfCertificate object.
- Create a PdfOrdinarySignatureMaker instance.
- Define the signer’s name, contact info, and reason for signing.
- Design signature appearance by adding an image, labels, and setting the layout (SignImageAndSignDetail mode).
- Use the makeSignature method to place the signature at the desired coordinates on the PDF.
- Save the signed PDF to a new file.
Code Example
import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.graphics.PdfImage;
import com.spire.pdf.interactive.digitalsignatures.*;
public class AddVisibleSignature {
public static void main(String[] args) {
// Create a new PDF document object
PdfDocument doc = new PdfDocument();
// Load the input PDF file that needs to be signed
doc.loadFromFile("C:/Users/Administrator/Desktop/Input.pdf");
// Specify the path to the PFX certificate and its password
String filePath = "C:/Users/Administrator/Desktop/certificate.pfx";
String password = "e-iceblue";
// Load the digital certificate (PFX format) with the given password
PdfCertificate certificate = new PdfCertificate(filePath, password);
// Create a signature maker object to apply the digital signature
PdfOrdinarySignatureMaker signatureMaker = new PdfOrdinarySignatureMaker(doc, certificate);
// Get the pdf signature and set the sign details
PdfSignature signature = signatureMaker.getSignature();
signature.setName("Gary");
signature.setContactInfo("112554");
signature.setLocation("U.S.");
signature.setReason("This is the final version.");
// Create a signature appearance
PdfSignatureAppearance appearance = new PdfSignatureAppearance(signature);
// Set labels for the signature
appearance.setNameLabel("Signer: ");
appearance.setContactInfoLabel("Phone: ");
appearance.setLocationLabel("Location: ");
appearance.setReasonLabel("Reason: ");
// Load an image
PdfImage image = PdfImage.fromFile("C:/Users/Administrator/Desktop/signature.png");
// Set the image as the signature image
appearance.setSignatureImage(image);
// Set the graphic mode as SignImageAndSignDetail
appearance.setGraphicMode(GraphicMode.SignImageAndSignDetail);
// Get the last page
PdfPageBase page = doc.getPages().get(doc.getPages().getCount() - 1);
// Add the signature to a specified location of the page
signatureMaker.makeSignature("signature 1", page, 54.0f, 470.0f, 280.0f, 90.0f, appearance);
// Save the signed PDF to a new file
doc.saveToFile("Signed.pdf");
// Release resources
doc.dispose();
}
}
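The sample above hard-codes the signature position (54, 470) and size (280 x 90). As a rough sketch, the position can instead be derived from the page size. The helper below is hypothetical and not part of Spire.PDF; it assumes coordinates are measured in points from the top-left corner of the page, and it uses an A4 page (595 x 842 points) as the example.

```java
// Hypothetical helper (not part of Spire.PDF): computes where a signature box
// of a given size would sit in the bottom-right corner of a page.
// Assumption: units are points and the origin is the page's top-left corner.
class SignaturePlacement {

    // Returns {x, y, width, height} for a box kept `margin` points away
    // from the right and bottom edges of the page.
    static float[] bottomRight(float pageWidth, float pageHeight,
                               float boxWidth, float boxHeight, float margin) {
        if (boxWidth + 2 * margin > pageWidth || boxHeight + 2 * margin > pageHeight) {
            throw new IllegalArgumentException("Signature box does not fit on the page");
        }
        float x = pageWidth - margin - boxWidth;
        float y = pageHeight - margin - boxHeight;
        return new float[] {x, y, boxWidth, boxHeight};
    }

    public static void main(String[] args) {
        // A4 page; box size taken from the sample above (280 x 90)
        float[] box = bottomRight(595f, 842f, 280f, 90f, 54f);
        System.out.println(box[0] + ", " + box[1] + ", " + box[2] + ", " + box[3]);
    }
}
```

The four values could then be passed to makeSignature in place of the literal coordinates, provided the coordinate-system assumption holds for your document.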
Creating a Signature Field in a PDF
A signature field reserves a space in the PDF for later signing. This is useful for forms or documents that require user signatures. To create a signature field in PDF, follow these steps:
Step-by-Step Guide
- Create a PdfDocument object and load your PDF file.
- Get the page (usually the last one) where the signature field will be placed.
- Create a PdfSignatureField object for the selected page.
- Customize the field’s appearance by setting its border style, border color, and bounds.
- Add the signature field to the document's form.
- Save the updated document to a new PDF file.
Code Example
import com.spire.pdf.PdfDocument;
import com.spire.pdf.fields.PdfBorderStyle;
import com.spire.pdf.fields.PdfHighlightMode;
import com.spire.pdf.fields.PdfSignatureField;
import com.spire.pdf.graphics.PdfRGBColor;
import com.spire.pdf.PdfPageBase;
import java.awt.Rectangle;
public class AddDigitalSignatureField {
public static void main(String[] args) {
// Initialize a new PdfDocument object
PdfDocument doc = new PdfDocument();
// Load the existing PDF from the specified path
doc.loadFromFile("C:/Users/Administrator/Desktop/Input.pdf");
// Retrieve the last page of the document
PdfPageBase page = doc.getPages().get(doc.getPages().getCount() - 1);
// Create a signature field on the specified page
PdfSignatureField signatureField = new PdfSignatureField(page, "signature");
// Customize the appearance of the signature field
signatureField.setBorderWidth(1.0f);
signatureField.setBorderStyle(PdfBorderStyle.Solid);
signatureField.setBorderColor(new PdfRGBColor(java.awt.Color.BLACK));
signatureField.setHighlightMode(PdfHighlightMode.Outline);
signatureField.setBounds(new Rectangle(54, 470, 200, 100));
// Enable form creation if none exists in the document
doc.setAllowCreateForm(doc.getForm() == null);
// Add the signature field to the document's form
doc.getForm().getFields().add(signatureField);
// Save the modified document to a new file
doc.saveToFile("SignatureField.pdf");
// Release resources
doc.dispose();
}
}
Wrap Up
In this tutorial, we explored how to add digital signatures to PDF documents in Java using the Spire.PDF library. We covered the steps for adding both invisible and visible signatures, as well as creating interactive signature fields. With these skills, you can enhance document security and ensure the integrity of your digital communications.
FAQs
Q1. What is a digital signature?
A digital signature is an electronic signature that uses cryptographic techniques to provide proof of the authenticity and integrity of a digital message or document.
Q2. Do I need a special certificate for signing PDFs?
Yes, a valid digital certificate (usually in PFX format) is required to sign PDF documents digitally.
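Before handing a PFX file to the signing code, it can help to confirm that the file opens with the given password and actually holds a private key. The sketch below does this with the JDK's own java.security.KeyStore (PFX is the PKCS#12 format). The class and method names are hypothetical and not part of Spire.PDF.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.KeyStore;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical pre-flight check for a PFX (PKCS#12) file, using only the JDK.
class PfxCheck {

    // Returns the aliases of all entries that hold a private key.
    // Loading throws an exception if the data is not valid PKCS#12
    // or the password is wrong.
    static List<String> privateKeyAliases(InputStream pfx, char[] password) throws Exception {
        KeyStore store = KeyStore.getInstance("PKCS12");
        store.load(pfx, password);
        List<String> result = new ArrayList<>();
        for (String alias : Collections.list(store.aliases())) {
            if (store.isKeyEntry(alias)) {
                result.add(alias);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Usage: PfxCheck <file.pfx> <password>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            List<String> aliases = privateKeyAliases(in, args[1].toCharArray());
            System.out.println(aliases.isEmpty()
                    ? "No private key found; this file cannot be used for signing."
                    : "Private key entries: " + aliases);
        }
    }
}
```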
Q3. How do I verify a signed PDF?
You can verify a signed PDF using Adobe Reader or by using Spire.PDF’s PdfSignature.verifySignature() method.
Q4. How can I customize the appearance of my visible digital signature?
With Spire.PDF for Java, you can fully customize visible signatures by:
- Setting text properties (font, color, labels for signer info).
- Adding a signature image (e.g., company logo or scanned handwritten signature).
- Choosing layout modes (SignImageOnly, SignDetail, or SignImageAndSignDetail).
- Adjusting position and dimensions on the page.
Q5. Can I add a timestamp when digitally signing a PDF document?
Yes. The following snippet attaches a timestamp service to the signature before signing:
PdfPKCS7Formatter formatter = new PdfPKCS7Formatter(certificate, false);
formatter.setTimestampService(new TSAHttpService("http://tsa.cesnet.cz:3161/tsa"));
PdfOrdinarySignatureMaker signatureMaker = new PdfOrdinarySignatureMaker(doc, formatter);
signatureMaker.makeSignature("signature 1");
Get a Free License
To fully experience the capabilities of Spire.PDF for Java without any evaluation limitations, you can request a free 30-day trial license.
Master PDF Compression in Java: Reduce PDF File Size Efficiently

Handling large PDF files is a common challenge for Java developers. PDFs with high-resolution images, embedded fonts, and multimedia content can quickly become heavy, slowing down applications, increasing storage costs, and creating a poor user experience—especially on mobile devices.
Mastering PDF compression in Java is essential to reduce file size efficiently while maintaining document quality. This step-by-step guide demonstrates how to compress and optimize PDF files in Java. You’ll learn how to compress document content, optimize images, fonts, and metadata, ensuring faster file transfers, improved performance, and a smoother user experience in your Java applications.
What You Will Learn
- Setting Up Your Development Environment
- Reduce PDF File Size by Compressing Document Content in Java
- Reduce PDF File Size by Optimizing Specific Elements in Java
- Full Java Example that Combines All PDF Compressing Techniques
- Best Practices for PDF Compression
- Conclusion
- FAQs
1. Setting Up Your Development Environment
Before implementing PDF compression in Java, ensure your development environment is properly configured.
1.1. Prerequisites
- Java Development Kit (JDK): Ensure you have JDK 1.8 or later installed.
- Build Tool: Maven or Gradle is recommended for dependency management.
- Integrated Development Environment (IDE): IntelliJ IDEA or Eclipse is suitable.
1.2. Adding Dependencies
To programmatically compress PDF files, you need a PDF library that supports compression features. Spire.PDF for Java provides APIs for loading, reading, editing, and compressing PDF documents. You can include it via Maven or Gradle.
Maven (pom.xml):
Add the following repository and dependency to your project's pom.xml file within the <repositories> and <dependencies> tags, respectively:
<repositories>
<repository>
<id>com.e-iceblue</id>
<name>e-iceblue</name>
<url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>e-iceblue</groupId>
<artifactId>spire.pdf</artifactId>
<version>11.10.3</version>
</dependency>
</dependencies>
Gradle (build.gradle):
For Gradle users, add the repository and dependency as follows:
repositories {
mavenCentral()
maven {
url "https://repo.e-iceblue.com/nexus/content/groups/public/"
}
}
dependencies {
implementation 'e-iceblue:spire.pdf:11.10.3'
}
After adding the dependency, refresh your Maven or Gradle project to download the necessary JAR files.
2. Reduce PDF File Size by Compressing Document Content in Java
One of the most straightforward techniques for reducing PDF file size is to apply document content compression. This approach automatically compresses the internal content streams of the PDF, such as text and graphics data, without requiring any manual fine-tuning. It is especially useful when you want a quick and effective solution that minimizes file size while maintaining document integrity.
The following example demonstrates how to enable and apply content compression in a PDF file using Java.
import com.spire.pdf.conversion.compression.PdfCompressor;
public class CompressContent {
public static void main(String[] args){
// Create a compressor
PdfCompressor compressor = new PdfCompressor("test.pdf");
// Enable document content compression
compressor.getOptions().setCompressContents(true);
// Compress and save
compressor.compressToFile("ContentCompression.pdf");
}
}
Key Points:
- setCompressContents(true) enables document content compression.
- Original PDFs remain unchanged; compressed files are saved separately.
3. Reduce PDF File Size by Optimizing Specific Elements in Java
Beyond compressing content streams, developers can also optimize individual elements of the PDF, such as images, fonts, and metadata. This allows for granular control over file size optimization.
3.1. Image Compression
Images are frequently the primary reason for large files. By lowering the image quality, you can significantly minimize the size of image-heavy PDF files.
import com.spire.pdf.conversion.compression.ImageCompressionOptions;
import com.spire.pdf.conversion.compression.ImageQuality;
import com.spire.pdf.conversion.compression.PdfCompressor;
public class CompressImages {
public static void main(String[] args){
// Load the PDF document
PdfCompressor compressor = new PdfCompressor("test.pdf");
// Get image compression options
ImageCompressionOptions imageCompression = compressor.getOptions().getImageCompressionOptions();
// Compress images and set quality
imageCompression.setCompressImage(true); // Enable image compression
imageCompression.setImageQuality(ImageQuality.Low); // Set image quality (Low, Medium, High)
imageCompression.setResizeImages(true); // Resize images to reduce size
// Save the compressed PDF
compressor.compressToFile("ImageCompression.pdf");
}
}
Key Points:
- setCompressImage(true) enables image compression.
- setImageQuality(...) adjusts the output image quality; the lower the quality, the smaller the image size.
- setResizeImages(true) enables image resizing.
3.2. Font Compression or Unembedding
When a PDF uses custom fonts, the entire font file might be embedded, even if only a few characters are used. Font compression or unembedding is a technique that reduces the size of embedded fonts by compressing them or removing them entirely from the PDF.
import com.spire.pdf.conversion.compression.PdfCompressor;
import com.spire.pdf.conversion.compression.TextCompressionOptions;
public class CompressPDFWithOptions {
public static void main(String[] args){
// Load the PDF document
PdfCompressor compressor = new PdfCompressor("test.pdf");
// Get text compression options
TextCompressionOptions textCompression = compressor.getOptions().getTextCompressionOptions();
// Compress fonts
textCompression.setCompressFonts(true);
// Optional: unembed fonts to reduce size
// textCompression.setUnembedFonts(true);
// Save the compressed PDF
compressor.compressToFile("FontOptimization.pdf");
}
}
Key Points:
- setCompressFonts(true) compresses embedded fonts while preserving document appearance.
- setUnembedFonts(true) removes embedded fonts entirely, which may reduce file size but could affect text rendering if the fonts are not available on the system.
3.3. Metadata Removal
PDFs often store metadata such as author details, timestamps, and editing history that aren’t needed for viewing. Removing metadata reduces file size and protects sensitive information.
import com.spire.pdf.conversion.compression.PdfCompressor;
public class RemoveMetadata {
public static void main(String[] args){
// Load the PDF document
PdfCompressor compressor = new PdfCompressor("test.pdf");
// Remove metadata
compressor.getOptions().setRemoveMetadata(true);
// Save the compressed PDF
compressor.compressToFile("MetadataRemoval.pdf");
}
}
4. Full Java Example that Combines All PDF Compressing Techniques
Having covered both document content compression and element-specific optimizations (images, fonts, and metadata), let’s see how to apply all of these techniques together in one workflow.
import com.spire.pdf.conversion.compression.ImageQuality;
import com.spire.pdf.conversion.compression.OptimizationOptions;
import com.spire.pdf.conversion.compression.PdfCompressor;
public class CompressPDFWithAllTechniques {
public static void main(String[] args){
// Initialize compressor
PdfCompressor compressor = new PdfCompressor("test.pdf");
// Enable document content compression
OptimizationOptions options = compressor.getOptions();
options.setCompressContents(true);
// Optimize images (downsampling and compression)
options.getImageCompressionOptions().setCompressImage(true);
options.getImageCompressionOptions().setImageQuality(ImageQuality.Low);
options.getImageCompressionOptions().setResizeImages(true);
// Optimize fonts (compression or unembedding)
// Compress fonts
options.getTextCompressionOptions().setCompressFonts(true);
// Optional: unembed fonts to reduce size
// options.getTextCompressionOptions().setUnembedFonts(true);
// Remove unnecessary metadata
options.setRemoveMetadata(true);
// Save the compressed PDF
compressor.compressToFile("CompressPDFWithAllTechniques.pdf");
}
}
Reviewing the Compression Effect:
After running the code, the original 3.09 MB sample PDF was reduced to 742 KB, a size reduction of approximately 76%.
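The percentage follows directly from the two file sizes. As a small sketch (the class and method names are made up for illustration), the reduction can be computed like this, using the 3.09 MB and 742 KB figures reported for the sample:

```java
import java.util.Locale;

// Illustrative helper: how much smaller is the compressed file, in percent?
class CompressionStats {

    static double reductionPercent(long originalBytes, long compressedBytes) {
        if (originalBytes <= 0) {
            throw new IllegalArgumentException("originalBytes must be positive");
        }
        return (1.0 - (double) compressedBytes / originalBytes) * 100.0;
    }

    public static void main(String[] args) {
        long original = Math.round(3.09 * 1024 * 1024); // 3.09 MB in bytes
        long compressed = 742L * 1024;                  // 742 KB in bytes
        System.out.println(String.format(Locale.ROOT, "%.1f%%",
                reductionPercent(original, compressed)));
    }
}
```

For your own files, the two sizes could be read with java.nio.file.Files.size(...) on the input and output paths.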

5. Best Practices for PDF Compression
When applying PDF compression in Java, it’s important to follow some practical guidelines to ensure the file size is reduced effectively without sacrificing usability or compatibility.
- Choose methods based on content: PDF compression depends heavily on the type of content. Text-based files may only require content and font optimization, while image-heavy documents benefit more from image compression. In many cases, combining multiple techniques yields the best results.
- Balance quality with file size: Over-compression can hurt the document's readability, so check the output and pick the lowest quality setting that still reads well.
- Test across PDF readers: Ensure compatibility with Adobe Acrobat, browser viewers, and mobile apps.
6. Conclusion
Compressing PDFs in Java is not just about saving disk space: it directly impacts performance, user experience, and system efficiency. Using libraries like Spire.PDF for Java, developers can implement fine-grained compression techniques, from compressing content and optimizing images and fonts to removing unneeded metadata.
By applying the right strategies, you can minimize PDF size in Java significantly without sacrificing quality. This leads to faster file transfers, lower storage costs, and smoother rendering across platforms. Mastering these compression methods ensures your Java applications remain responsive and efficient, even when handling complex, resource-heavy PDFs.
7. FAQs
Q1: Can I reduce PDF file size in Java without losing quality?
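A1: To a large extent. Spire.PDF allows selective compression of images, fonts, and other objects. Content and font compression do not change how the document looks, whereas image compression trades image quality for size, so choose a quality level that keeps images readable.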
Q2: Will compressed PDFs remain compatible with popular PDF readers?
A2: Yes. Compressed PDFs remain compatible with Adobe Acrobat, browser viewers, mobile apps, and other standard PDF readers.
Q3: What’s the difference between image compression and font compression?
A3: Image compression reduces the size of embedded images, while font compression shrinks embedded font data and unembedding removes embedded fonts entirely. Used together, these techniques reduce file size effectively.
Q4: How do I choose the best compression strategy?
A4: Consider the PDF content. Use image compression for image-heavy PDFs and font compression for text-heavy PDFs. Often, combining both techniques yields the best results without affecting readability.
Q5: Can I automate PDF compression for multiple files in Java?
A5: Yes. You can write a Java program that batch compresses multiple PDFs by applying the same compression settings consistently across all files.
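A minimal sketch of such a batch job is shown below. It uses only the JDK to find the PDF files in a folder and derive output names; the actual compression is passed in as a step. The class, interface, and method names are hypothetical, and the stand-in step in main only copies files, so a Spire.PDF-based step using the PdfCompressor calls from the earlier sections would need to be plugged in.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical batch driver: applies one compression step to every PDF in a folder.
class BatchCompress {

    // The compression itself is pluggable, e.g. a wrapper around PdfCompressor.
    interface PdfStep {
        void apply(Path input, Path output) throws Exception;
    }

    static boolean isPdf(Path file) {
        return file.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    // report.pdf -> <outDir>/report-compressed.pdf
    static Path outputFor(Path input, Path outDir) {
        String name = input.getFileName().toString();
        String base = name.substring(0, name.length() - ".pdf".length());
        return outDir.resolve(base + "-compressed.pdf");
    }

    static List<Path> compressAll(Path inDir, Path outDir, PdfStep step) throws Exception {
        Files.createDirectories(outDir);
        List<Path> inputs;
        try (Stream<Path> listing = Files.list(inDir)) {
            inputs = listing.filter(Files::isRegularFile)
                    .filter(BatchCompress::isPdf)
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<Path> outputs = new ArrayList<>();
        for (Path input : inputs) {
            Path output = outputFor(input, outDir);
            step.apply(input, output);
            outputs.add(output);
        }
        return outputs;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Usage: BatchCompress <inputDir> <outputDir>");
            return;
        }
        // Stand-in step that only copies each file; replace it with real compression.
        PdfStep copyOnly = (in, out) -> Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
        List<Path> written = compressAll(Paths.get(args[0]), Paths.get(args[1]), copyOnly);
        System.out.println("Processed " + written.size() + " file(s)");
    }
}
```

Keeping the step behind an interface means the same loop can apply any combination of the compression options described in this guide.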
Merge PDF Files in Java: Full, Partial, and Stream-Based Merging

Merging PDFs in Java is a critical requirement for document-intensive applications, from consolidating financial reports to automating archival systems. However, developers face significant challenges in preserving formatting integrity or managing resource efficiency across diverse PDF sources. Spire.PDF for Java provides a robust and straightforward solution to streamline the PDF merging task.
This guide explores how to combine PDFs in Java, with practical examples for merging entire files, merging selected pages, and merging PDFs supplied as streams.
- Setting Up the Java PDF Merge Library
- Merge Multiple PDF Files in Java
- Merge Specific Pages from Multiple PDFs in Java
- Merge PDF Files by Streams in Java
- Conclusion
- FAQs
Setting Up the Java PDF Merge Library
Why Choose Spire.PDF for Java?
- No External Dependencies: Pure Java implementation.
- Rich Features: Merge, split, encrypt, and annotate PDFs.
- Cross-Platform: Works on Windows, Linux, and macOS.
Installation
Before using Spire.PDF for Java, you need to add it to your project.
Option 1: Maven
Add the repository and dependency to pom.xml:
<repositories>
<repository>
<id>com.e-iceblue</id>
<name>e-iceblue</name>
<url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>e-iceblue</groupId>
<artifactId>spire.pdf</artifactId>
<version>11.10.3</version>
</dependency>
</dependencies>
Option 2: Manual JAR
Download the JAR from the E-iceblue website and add it to your project's build path.
Merge Multiple PDF Files in Java
This example is ideal when you want to merge two or more PDF documents entirely. It’s simple, straightforward, and perfect for batch processing.
How It Works:
- Define File Paths: Create an array of strings containing the full paths to the source PDFs.
- Merge Files: The mergeFiles() method takes the array of paths, combines the PDFs, and returns a PdfDocumentBase object representing the merged file.
- Save the Result: The merged PDF is saved to a new file using the save() method.
Java code to combine PDFs:
import com.spire.pdf.FileFormat;
import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfDocumentBase;
public class MergePdfs {
public static void main(String[] args) {
// Get the paths of the PDF documents to be merged
String[] files = new String[] {"sample-1.pdf", "sample-2.pdf", "sample-3.pdf"};
// Merge these PDF documents
PdfDocumentBase pdf = PdfDocument.mergeFiles(files);
// Save the merged PDF file
pdf.save("MergePDF.pdf", FileFormat.PDF);
}
}
Best For:
- Merging entire PDFs stored locally.
- Simple batch operations where no page selection is needed.
Result: Three PDF files (10 pages in total) are combined into one PDF file.

Merging PDFs often results in large file sizes. To reduce the size, refer to: Compress PDF Files in Java.
Merge Specific Pages from Multiple PDFs in Java
Sometimes, you may only want to merge specific pages from different PDFs (e.g., pages 1-3 from File A and pages 2-5 from File B). This example gives you granular control over which pages to include from each source PDF.
How It Works:
- Load PDFs: Load each source PDF into a PdfDocument object and store them in an array.
- Create a New PDF: A blank PDF document is initialized to serve as the container for merged pages.
- Insert Specific Pages:
- insertPage(): Inserts a specified page into the new PDF.
- insertPageRange(): Inserts a range of pages into the new PDF.
- Save the Result: The merged PDF is saved using the saveToFile() method.
Java code to combine selected PDF pages:
import com.spire.pdf.PdfDocument;
public class MergeSelectedPages {
public static void main(String[] args) {
// Get the paths of the PDF documents to be merged
String[] files = new String[] {"sample-1.pdf", "sample-2.pdf", "sample-3.pdf"};
// Create an array of PdfDocument
PdfDocument[] pdfs = new PdfDocument[files.length];
// Loop through the documents
for (int i = 0; i < files.length; i++)
{
// Load a specific document
pdfs[i] = new PdfDocument(files[i]);
}
// Create a new PDF document
PdfDocument pdf = new PdfDocument();
// Insert the selected pages from different PDFs to the new PDF
pdf.insertPage(pdfs[0], 0);
pdf.insertPageRange(pdfs[1], 1, 3);
pdf.insertPage(pdfs[2], 0);
// Save the merged PDF
pdf.saveToFile("MergePdfPages.pdf");
}
}
Best For:
- Creating custom PDFs with selected pages (e.g., extracting key sections from reports).
- Scenarios where you need to exclude irrelevant pages from source documents.
Result: Selected pages from three separate PDF files are combined into a new PDF.
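Page numbers as readers see them start at 1, while the sample calls insertPage(pdfs[0], 0) to take the first page, which suggests zero-based indexes. Assuming insertPageRange also takes zero-based, inclusive start and end indexes (check the library documentation to confirm), a small hypothetical helper can do the conversion and catch out-of-range requests early:

```java
// Hypothetical helper: converts reader-facing page numbers (1-based, inclusive)
// into zero-based start/end indexes.
// Assumption: the target API expects zero-based, inclusive indexes.
class PageRange {

    // Returns {startIndex, endIndex} for pages firstPage..lastPage.
    static int[] toZeroBased(int firstPage, int lastPage, int pageCount) {
        if (firstPage < 1 || lastPage < firstPage || lastPage > pageCount) {
            throw new IllegalArgumentException(
                    "Invalid range " + firstPage + "-" + lastPage + " for " + pageCount + " page(s)");
        }
        return new int[] {firstPage - 1, lastPage - 1};
    }

    public static void main(String[] args) {
        // Pages 2-4 of a 10-page document
        int[] range = toZeroBased(2, 4, 10);
        System.out.println(range[0] + ".." + range[1]);
    }
}
```

Under that assumption, pages 2 to 4 map to the indexes 1 and 3 used in the insertPageRange call above.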

Merge PDF Files by Streams in Java
In applications where PDFs are stored as streams (e.g., PDFs from network streams, in-memory data, or temporary files), Spire.PDF supports merging without saving files to disk.
How It Works:
- Create Input Streams: The FileInputStream objects read the raw byte data of each PDF file.
- Merge Streams: The mergeFiles() method accepts an array of streams, merges them, and returns a PdfDocumentBase object.
- Save and Clean Up: The merged PDF is saved, and all streams and documents are closed to free system resources (critical for preventing leaks).
Java code to merge PDFs via streams:
import com.spire.pdf.*;
import java.io.*;
public class MergePdfsByStream {
public static void main(String[] args) throws IOException {
// Create FileInputStream objects for each PDF document file
FileInputStream stream1 = new FileInputStream(new File("Template_1.pdf"));
FileInputStream stream2 = new FileInputStream(new File("Template_2.pdf"));
FileInputStream stream3 = new FileInputStream(new File("Template_3.pdf"));
// Initialize an array of InputStream objects containing the file input streams
InputStream[] streams = new InputStream[]{stream1, stream2, stream3};
// Merge the input streams into a single PdfDocumentBase object
PdfDocumentBase pdf = PdfDocument.mergeFiles(streams);
// Save the merged PDF file
pdf.save("MergePdfsByStream.pdf", FileFormat.PDF);
// Releases system resources used by the merged document
pdf.close();
pdf.dispose();
// Closes all input streams to free up resources
stream1.close();
stream2.close();
stream3.close();
}
}
Best For:
- Merging PDFs from non-file sources (e.g., network downloads, in-memory generation).
- Environments where direct file path access is restricted.
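When the number of inputs is not fixed, opening the streams one by one as in the sample becomes awkward, and a failure halfway through can leave earlier streams open. The sketch below, which uses only the JDK and hypothetical class and method names, opens a list of paths as an InputStream array and closes whatever was already opened if a later file cannot be read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for preparing and releasing input streams before and after a merge.
class StreamOpener {

    // Opens every path; if one fails, closes the streams opened so far and rethrows.
    static InputStream[] openAll(List<Path> paths) throws IOException {
        List<InputStream> opened = new ArrayList<>();
        try {
            for (Path path : paths) {
                opened.add(Files.newInputStream(path));
            }
            return opened.toArray(new InputStream[0]);
        } catch (IOException e) {
            closeAll(opened);
            throw e;
        }
    }

    // Closes all streams, ignoring individual close failures.
    static void closeAll(List<InputStream> streams) {
        for (InputStream stream : streams) {
            try {
                stream.close();
            } catch (IOException ignored) {
                // Nothing useful can be done if closing fails
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<Path> paths = new ArrayList<>();
        for (String arg : args) {
            paths.add(java.nio.file.Paths.get(arg));
        }
        InputStream[] streams = openAll(paths);
        try {
            System.out.println("Opened " + streams.length + " stream(s)");
        } finally {
            closeAll(Arrays.asList(streams));
        }
    }
}
```

The resulting array could be passed to PdfDocument.mergeFiles(streams) as in the sample above, with closeAll called in a finally block once the merged file has been saved.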
Conclusion
Spire.PDF for Java simplifies complex PDF merging tasks through its intuitive, user-friendly API. Whether you need to merge entire documents, create custom page sequences, or combine PDFs from stream sources, these examples enable efficient PDF merging in Java to address diverse document processing requirements.
To explore more features (e.g., encrypting merged PDFs, adding bookmarks), refer to the official documentation.
Frequently Asked Questions (FAQs)
Q1: Why do merged PDFs show "Evaluation Warning" watermarks?
A: The watermark appears when the commercial version is used without a license. Solutions:
- Request a 30-day trial license to test without any restrictions.
- Use the free version for documents ≤10 pages.
Q2: How do I control the order of pages in the merged PDF?
A: The order of pages in the merged PDF is determined by the order of input files (or streams) and the pages you select. For example:
- In full-document merging, files in the input array are merged in the order they appear.
- In selective page merging, use insertPage() or insertPageRange() in the sequence you want pages to appear.
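Because the merge order follows the order of the input array, file names that contain numbers deserve some care: a plain alphabetical sort places "sample-10.pdf" before "sample-2.pdf". The sketch below shows one way to sort names so that embedded numbers are compared numerically. The class name is hypothetical and the code uses only the JDK.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical comparator: orders strings so that runs of digits compare as numbers.
class NaturalOrder {

    static final Comparator<String> COMPARATOR = NaturalOrder::compare;

    static int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i);
            char cb = b.charAt(j);
            if (Character.isDigit(ca) && Character.isDigit(cb)) {
                // Read both digit runs in full and compare them as numbers
                int startA = i;
                int startB = j;
                while (i < a.length() && Character.isDigit(a.charAt(i))) i++;
                while (j < b.length() && Character.isDigit(b.charAt(j))) j++;
                int byNumber = new BigInteger(a.substring(startA, i))
                        .compareTo(new BigInteger(b.substring(startB, j)));
                if (byNumber != 0) return byNumber;
            } else {
                if (ca != cb) return Character.compare(ca, cb);
                i++;
                j++;
            }
        }
        // The string with characters left over sorts last
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        String[] files = {"sample-10.pdf", "sample-2.pdf", "sample-1.pdf"};
        Arrays.sort(files, COMPARATOR);
        System.out.println(Arrays.toString(files));
    }
}
```

Sorting the path array this way before calling mergeFiles() keeps numbered documents in their intended sequence.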
Q3: Can I merge password-protected PDFs?
A: Yes. Spire.PDF for Java supports merging encrypted PDFs, but you must provide the password when loading the file. Use the overloaded loadFromFile() method with the password parameter:
PdfDocument pdf = new PdfDocument();
pdf.loadFromFile("sample.pdf", "userPassword"); // Decrypt with password
Q4: How do I merge scanned or image-based PDFs?
A: Spire.PDF handles image-based PDFs like regular PDFs, but the merged file can be considerably larger.