Text Alignment in Python | Left, Right, Center Align & More

In the world of document automation, proper text alignment is crucial for creating professional, readable, and visually appealing documents. For developers and data professionals building reports, drafting letters, or designing invoices, mastering text alignment in Python is essential to producing polished, consistent documents without manual editing.
This guide delivers a step-by-step walkthrough on how to align text in Python using Spire.Doc for Python, a library that enables effortless control over Word document formatting.
- Why Choose Spire.Doc for Python to Align Text?
- Core Text Alignment Types in Spire.Doc
- Step-by-Step: Align Text in Word in Python
- FAQs About Python Text Alignment
- Conclusion
Why Choose Spire.Doc for Python to Align Text?
Before diving into code, let’s clarify why Spire.Doc is a top choice for text alignment tasks:
- Full Alignment Support: Natively supports all standard alignment types (Left, Right, Center, Justify) for paragraphs.
- No Microsoft Word Dependency: Runs independently - no need to install Word on your machine.
- High Compatibility: Works with .docx, .doc, and other Word formats, ensuring your aligned documents open correctly across devices.
- Fine-Grained Control: Adjust alignment for entire paragraphs or table cells.
Core Text Alignment Types in Spire.Doc
Spire.Doc uses the HorizontalAlignment enum to define text alignment. The most common values are:
- HorizontalAlignment.Left: Aligns text to the left margin (default).
- HorizontalAlignment.Right: Aligns text to the right margin.
- HorizontalAlignment.Center: Centers text horizontally between margins.
- HorizontalAlignment.Justify: Adjusts text spacing so both left and right edges align with margins.
- HorizontalAlignment.Distribute: Adjusts character spacing (adds space between letters) and word spacing to fill the line.
Below, we’ll cover how to programmatically set paragraph alignment (left, right, center, justified, and distributed) in Word using Python
Step-by-Step: Align Text in Word in Python
Here are the actionable steps to generate a Word document with 5 paragraphs, each using a different alignment style.
Step 1: Install Spire.Doc for Python
Open your terminal/command prompt, and then run the following command to install the latest version:
pip install Spire.Doc
Step 2: Import Required Modules
Import the core classes from Spire.Doc. These modules let you create documents, sections, paragraphs, and configure formatting:
from spire.doc import *
from spire.doc.common import *
Step 3: Create a New Word Document
Initialize a Document instance that represents your empty Word file:
# Create a Document instance
doc = Document()
Step 4: Add a Section to the Document
Word documents organize content into sections (each section can have its own margins, page size, etc.). We’ll add one section to hold our paragraphs:
# Add a section to the document
section = doc.AddSection()
Step 5: Add Paragraphs with Different Alignments
A section contains paragraphs, and each paragraph’s alignment is controlled via the HorizontalAlignment enum. We’ll create 5 paragraphs, one for each alignment type.
Left alignment is the default for most text (text aligns to the left margin).
# Left aligned text
paragraph1 = section.AddParagraph()
paragraph1.AppendText("This is left-aligned text.")
paragraph1.Format.HorizontalAlignment = HorizontalAlignment.Left
Right alignment is useful for dates, signatures, or page numbers (text aligns to the right margin).
# Right aligned text
paragraph2 = section.AddParagraph()
paragraph2.AppendText("This is right-aligned text.")
paragraph2.Format.HorizontalAlignment = HorizontalAlignment.Right
Center alignment works well for titles or headings (text centers between left and right margins). Use to center text in Python:
# Center aligned text
paragraph3 = section.AddParagraph()
paragraph3.AppendText("This is center-aligned text.")
paragraph3.Format.HorizontalAlignment = HorizontalAlignment.Center
Justified text aligns both left and right margins (spaces between words are adjusted for consistency). Ideal for formal documents like essays or reports.
# Justified
paragraph4 = section.AddParagraph()
paragraph4.AppendText("This is justified text.")
paragraph4.Format.HorizontalAlignment = HorizontalAlignment.Justify
Note: Justified alignment is more visible with longer text - short phrases may not show the spacing adjustment.
Distributed alignment is similar to justified, but evenly distributes single-line text (e.g., unevenly spaced words or short phrases).
# Distributed
Paragraph5 = section.AddParagraph()
Paragraph5.AppendText("This is evenly distributed text.")
Paragraph5.Format.HorizontalAlignment = HorizontalAlignment.Distribute
Step 6: Save and Close the Document
Finally, save the document to a specified path and close the Document instance to free resources:
# Save the document
document.SaveToFile("TextAlignment.docx", FileFormat.Docx2016)
# Close the document to release memory
document.Close()
Output:

Pro Tip: Spire.Doc for Python also provides interfaces to align tables in Word or align text in table cells.
FAQs About Python Text Alignment
Q1: Is Spire.Doc for Python free?
A: Spire.Doc offers a free version with limitations. For full functionality, you can request a 30-day trial license here.
Q2: Can I set text alignment for existing Word documents
A: Yes. Spire.Doc lets you load existing documents and modify text alignment for specific paragraphs. Here’s a quick example:
from spire.doc import *
# Load an existing document
doc = Document()
doc.LoadFromFile("ExistingDocument.docx")
# Get the first section and first paragraph
section = doc.Sections[0]
paragraph = section.Paragraphs[0]
# Change alignment to center
paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
# Save the modified document
doc.SaveToFile("UpdatedDocument.docx", FileFormat.Docx2016)
doc.Close()
Q3: Can I apply different alignments to different parts of the same paragraph?
A: No. Text alignment is a paragraph-level setting in Word, not a character-level setting. This means all text within a single paragraph must share the same alignment (left, right, center, etc.).
If you need mixed alignment in the same line, you’ll need to use a table with invisible borders.
Q4: Can Spire.Doc for Python handle other text formatting?
A: Absolutely! Spire.Doc lets you combine alignment with other formatting like fonts, line spacing, bullet points, and more.
Conclusion
Automating Word text alignment with Python and Spire.Doc saves time, reduces human error, and ensures consistency across documents. The code example provided offers a clear template for implementing left, right, center, justified, and distributed alignment, and adapting it to your needs is as simple as modifying the text or adding more formatting rules.
Try experimenting with different alignment combinations, and explore Spire.Doc’s online documentation to unlock more formatting possibilities.
Read Word Document in C# .NET: Extract Text, Tables, Images

Word documents (.doc and .docx) are widely used in business, education, and professional workflows for reports, contracts, manuals, and other essential content. As a C# developer, you may find the need to programmatically read these files to extract information, analyze content, and integrate document data into applications.
In this complete guide, we will delve into the process of reading Word documents in C#. We will explore various scenarios, including:
- Extracting text, paragraphs, and formatting details
- Retrieving images and structured table data
- Accessing comments and document metadata
- Reading headers and footers for comprehensive document analysis
By the end of this guide, you will have a solid understanding of how to efficiently parse Word documents in C#, allowing your applications to access and utilize document content with accuracy and ease.
Table of Contents
- Set Up Your Development Environment for Reading Word Documents in C#
- Load Word Document (.doc/.docx) in C#
- Read and Extract Content from Word Document in C#
- Advanced Tips and Best Practices for Reading Word Documents in C#
- Conclusion
- FAQs
Set Up Your Development Environment for Reading Word Documents in C#
Before you can read Word documents in C#, it’s crucial to ensure that your development environment is properly set up. This section outlines the necessary prerequisites and step-by-step installation instructions to get you ready for seamless Word document handling.
Prerequisites
- Development Environment: Ensure you have Visual Studio or another compatible C# IDE installed.
- .NET Requirement: Ensure you have .NET Framework or .NET Core installed.
- Library Requirement: Spire.Doc for .NET, a versatile library that allows developers to:
- Create Word documents from scratch
- Edit and format existing Word documents
- Read and extract text, tables, images, and other content programmatically
- Convert Word documents to PDF, HTML, and other formats
- Work independently without requiring Microsoft Word installation
Install Spire.Doc
To incorporate Spire.Doc into your C# project, follow these steps to install it via NuGet:
- Open your project in Visual Studio.
- Right-click on your project in the Solution Explorer and select Manage NuGet Packages.
- In the Browse tab, search for "Spire.Doc" and click Install.
Alternatively, you can use the Package Manager Console with the following command:
PM> Install-Package Spire.Doc
This installation adds the necessary references, enabling you to programmatically work with Word documents.
Load Word Document (.doc/.docx) in C#
To begin, you need to load a Word document into your project. The following example demonstrates how to load a .docx or .doc file in C#:
using Spire.Doc;
using Spire.Doc.Documents;
using System;
namespace LoadWordExample
{
class Program
{
static void Main(string[] args)
{
// Specify the path of the Word document
string filePath = @"C:\Documents\Sample.docx";
// Create a Document object
using (Document document = new Document())
{
// Load the Word .docx or .doc document
document.LoadFromFile(filePath);
}
}
}
}
This code loads a Word file from the specified path into a Document object, which is the entry point for accessing all document elements.
Read and Extract Content from Word Document in C#
After loading the Word document into a Document object, you can access its contents programmatically. This section covers various methods for extracting different types of content effectively.
Extract Text
Extracting text is often the first step in reading Word documents. You can retrieve all text content using the built-in GetText() method:
using (StreamWriter writer = new StreamWriter("ExtractedText.txt", false, Encoding.UTF8))
{
// Get all text from the document
string allText = document.GetText();
// Write the entire text to a file
writer.Write(allText);
}
This method extracts all text, disregarding formatting and non-text elements like images.

Read Paragraphs and Formatting Information
When working with Word documents, it is often useful not only to access the text content of paragraphs but also to understand how each paragraph is formatted. This includes details such as alignment and spacing after the paragraph, which can affect layout and readability.
The following example demonstrates how to iterate through all paragraphs in a Word document and retrieve their text content and paragraph-level formatting in C#:
using (StreamWriter writer = new StreamWriter("Paragraphs.txt", false, Encoding.UTF8))
{
// Loop through all sections
foreach (Section section in document.Sections)
{
// Loop through all paragraphs in the section
foreach (Paragraph paragraph in section.Paragraphs)
{
// Get paragraph alignment
HorizontalAlignment alignment = paragraph.Format.HorizontalAlignment;
// Get spacing after paragraph
float afterSpacing = paragraph.Format.AfterSpacing;
// Write paragraph formatting and text to the file
writer.WriteLine($"[Alignment: {alignment}, AfterSpacing: {afterSpacing}]");
writer.WriteLine(paragraph.Text);
writer.WriteLine(); // Add empty line between paragraphs
}
}
}
This approach allows you to extract both the text and key paragraph formatting attributes, which can be useful for tasks such as document analysis, conditional processing, or preserving layout when exporting content.
Extract Images
Images embedded within Word documents play a vital role in conveying information. To extract these images, you will examine each paragraph's content, identify images (typically represented as DocPicture objects), and save them for further use:
// Create the folder if it does not exist
string imageFolder = "ExtractedImages";
if (!Directory.Exists(imageFolder))
Directory.CreateDirectory(imageFolder);
int imageIndex = 1;
// Loop through sections and paragraphs to find images
foreach (Section section in document.Sections)
{
foreach (Paragraph paragraph in section.Paragraphs)
{
foreach (DocumentObject obj in paragraph.ChildObjects)
{
if (obj is DocPicture picture)
{
// Save each image as a separate PNG file
string fileName = Path.Combine(imageFolder, $"Image_{imageIndex}.png");
picture.Image.Save(fileName, System.Drawing.Imaging.ImageFormat.Png);
imageIndex++;
}
}
}
}
This code saves all images in the document as separate PNG files, with options to choose other formats like JPEG or BMP.

Extract Table Data
Tables are commonly used to organize structured data, such as financial reports or survey results. To access this data, iterate through the tables in each section and retrieve the content of individual cells:
// Create a folder to store tables
string tableDir = "Tables";
if (!Directory.Exists(tableDir))
Directory.CreateDirectory(tableDir);
// Loop through each section
for (int sectionIndex = 0; sectionIndex < document.Sections.Count; sectionIndex++)
{
Section section = document.Sections[sectionIndex];
TableCollection tables = section.Tables;
// Loop through all tables in the section
for (int tableIndex = 0; tableIndex < tables.Count; tableIndex++)
{
ITable table = tables[tableIndex];
string fileName = Path.Combine(tableDir, $"Section{sectionIndex + 1}_Table{tableIndex + 1}.txt");
using (StreamWriter writer = new StreamWriter(fileName, false, Encoding.UTF8))
{
// Loop through each row
for (int rowIndex = 0; rowIndex < table.Rows.Count; rowIndex++)
{
TableRow row = table.Rows[rowIndex];
// Loop through each cell
for (int cellIndex = 0; cellIndex < row.Cells.Count; cellIndex++)
{
TableCell cell = row.Cells[cellIndex];
// Loop through each paragraph in the cell
for (int paraIndex = 0; paraIndex < cell.Paragraphs.Count; paraIndex++)
{
writer.Write(cell.Paragraphs[paraIndex].Text.Trim() + " ");
}
// Add tab between cells
if (cellIndex < row.Cells.Count - 1) writer.Write("\t");
}
// Add newline after each row
writer.WriteLine();
}
}
}
}
This method allows efficient extraction of structured data, making it ideal for generating reports or integrating content into databases.

Read Comments
Comments are valuable for collaboration and feedback within documents. Extracting them is crucial for auditing and understanding the document's revision history.
The Document object provides a Comments collection, which allows you to access all comments in a Word document. Each comment contains one or more paragraphs, and you can extract their text for further processing or save them into a file.
using (StreamWriter writer = new StreamWriter("Comments.txt", false, Encoding.UTF8))
{
// Loop through all comments in the document
foreach (Comment comment in document.Comments)
{
// Loop through each paragraph in the comment
foreach (Paragraph p in comment.Body.Paragraphs)
{
writer.WriteLine(p.Text);
}
// Add empty line to separate different comments
writer.WriteLine();
}
}
This code retrieves the content of all comments and outputs it into a single text file.
Retrieve Document Metadata
Word documents contain metadata such as the title, author, and subject. These metadata items are stored as document properties, which can be accessed through the BuiltinDocumentProperties property of the Document object:
using (StreamWriter writer = new StreamWriter("Metadata.txt", false, Encoding.UTF8))
{
// Write built-in document properties to file
writer.WriteLine("Title: " + document.BuiltinDocumentProperties.Title);
writer.WriteLine("Author: " + document.BuiltinDocumentProperties.Author);
writer.WriteLine("Subject: " + document.BuiltinDocumentProperties.Subject);
}
Read Headers and Footers
Headers and footers frequently contain essential content like page numbers and titles. To programmatically access this information, iterate through each section's header and footer paragraphs and retrieve the text of each paragraph:
using (StreamWriter writer = new StreamWriter("HeadersFooters.txt", false, Encoding.UTF8))
{
// Loop through all sections
foreach (Section section in document.Sections)
{
// Write header paragraphs
foreach (Paragraph headerParagraph in section.HeadersFooters.Header.Paragraphs)
{
writer.WriteLine("Header: " + headerParagraph.Text);
}
// Write footer paragraphs
foreach (Paragraph footerParagraph in section.HeadersFooters.Footer.Paragraphs)
{
writer.WriteLine("Footer: " + footerParagraph.Text);
}
}
}
This method ensures that all recurring content is accurately captured during document processing.
Advanced Tips and Best Practices for Reading Word Documents in C#
To get the most out of programmatically reading Word documents, following these tips can help improve efficiency, reliability, and code maintainability:
- Use using Statements: Always wrap Document objects in using to ensure proper memory management.
- Check for Null or Empty Sections: Prevent errors by verifying sections, paragraphs, tables, or images exist before accessing them.
- Batch Reading Multiple Documents: Loop through a folder of Word files and apply the same extraction logic to each file. This helps automate workflows and consolidate extracted content efficiently.
Conclusion
Efficiently reading Word documents programmatically in C# involves handling various content types. With the techniques outlined in this guide, developers can:
- Load Word documents (.doc and .docx) with ease.
- Extract text, paragraphs, and formatting details for thorough analysis.
- Retrieve images, structured table data, and comments.
- Access headers, footers, and document metadata for complete insights.
FAQs
Q1: Can I read Word documents without installing Microsoft Word?
A1: Yes, libraries like Spire.Doc enable you to read and process Word files without requiring Microsoft Word installation.
Q2: Does this support both .doc and .docx formats?
A2: Absolutely, all methods discussed in this guide work seamlessly with both legacy (.doc) and modern (.docx) Word files.
Q3: Can I extract only specific sections of a document?
A3: Yes, by iterating through sections and paragraphs, you can selectively filter and extract the desired content.
Convert HTML to PDF in Python: Complete Guide with Code

Converting HTML to PDF in Python is a common need when you want to generate printable reports, preserve web content, or create offline documentation with consistent formatting. In this tutorial, you’ll learn how to convert HTML to PDF in Python— whether you're working with a local HTML file or a HTML string. If you're looking for a simple and reliable way to generate PDF files from HTML in Python, this guide is for you.
Install Spire.Doc to Convert HTML to PDF Easily
To convert HTML to PDF in Python, you’ll need a reliable library that supports HTML parsing and PDF rendering. Spire.Doc for Python is a powerful and easy-to-use HTML to PDF converter library that lets you generate PDF documents from HTML content — without relying on a browser, headless engine, or third-party tools.
Install via pip
You can install the library quickly with pip:
pip install spire.doc
Alternative: Manual Installation
You can also download the Spire.Doc package and perform a custom installation if you need more control over the environment.
Tip: Spire.Doc offers a free version suitable for small projects or evaluation purposes.
Once installed, you're ready to convert HTML to PDF in Python in just a few lines of code.
Convert HTML Files to PDF in Python
Spire.Doc for Python makes it easy to convert HTML files to PDF. The Document.LoadFromFile() method supports loading various file formats, including .html, .doc, and .docx. After loading an HTML file, you can convert it to PDF by calling Document.SaveToFile() method. Follow the steps below to convert an HTML file to PDF in Python using Spire.Doc.
Steps to convert an HTML file to PDF in Python:
- Create a Document object.
- Load an HTML file using Document.LoadFromFile() method.
- Convert it to PDF using Document.SaveToFile() method.
The following code shows how to convert an HTML file directly to PDF in Python:
from spire.doc import *
from spire.doc.common import *
# Create a Document object
document = Document()
# Load an HTML file
document.LoadFromFile("Sample.html", FileFormat.Html, XHTMLValidationType.none)
# Save the HTML file to a pdf file
document.SaveToFile("output/ToPdf.pdf", FileFormat.PDF)
document.Close()

Convert an HTML String to PDF in Python
If you want to convert an HTML string to PDF in Python, Spire.Doc for Python provides a straightforward solution. For simple HTML content like paragraphs, text styles, and basic formatting, you can use the Paragraph.AppendHTML() method to insert the HTML into a Word document. Once added, you can save the document as a PDF using the Document.SaveToFile() method.
Here are the steps to convert an HTML string to a PDF file in Python.
- Create a Document object.
- Add a section using Document.AddSection() method and insert a paragraph using Section.AddParagraph() method.
- Specify the HTML string and add it to the paragraph using Paragraph.AppendHTML() method.
- Save the document as a PDF file using Document.SaveToFile() method.
Here's the complete Python code that shows how to convert an HTML string to a PDF:
from spire.doc import *
from spire.doc.common import *
# Create a Document object
document = Document()
# Add a section to the document
sec = document.AddSection()
# Add a paragraph to the section
paragraph = sec.AddParagraph()
# Specify the HTML string
htmlString = """
<html>
<head>
<title>HTML to Word Example</title>
<style>
body {
font-family: Arial, sans-serif;
}
h1 {
color: #FF5733;
font-size: 24px;
margin-bottom: 20px;
}
p {
color: #333333;
font-size: 16px;
margin-bottom: 10px;
}
ul {
list-style-type: disc;
margin-left: 20px;
margin-bottom: 15px;
}
li {
font-size: 14px;
margin-bottom: 5px;
}
table {
border-collapse: collapse;
width: 100%;
margin-bottom: 20px;
}
th, td {
border: 1px solid #CCCCCC;
padding: 8px;
text-align: left;
}
th {
background-color: #F2F2F2;
font-weight: bold;
}
td {
color: #0000FF;
}
</style>
</head>
<body>
<h1>This is a Heading</h1>
<p>This is a paragraph.</p>
<p>Here's an unordered list:</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
</ul>
<p>And here's a table:</p>
<table>
<tr>
<th>Name</th>
<th>Age</th>
<th>Gender</th>
</tr>
<tr>
<td>John Smith</td>
<td>35</td>
<td>Male</td>
</tr>
<tr>
<td>Jenny Garcia</td>
<td>27</td>
<td>Female</td>
</tr>
</table>
</body>
</html>
"""
# Append the HTML string to the paragraph
paragraph.AppendHTML(htmlString)
# Save the document as a pdf file
document.SaveToFile("output/HtmlStringToPdf.pdf", FileFormat.PDF)
document.Close()

Customize the Conversion from HTML to PDF in Python
While converting HTML to PDF in Python is often straightforward, there are times when you need more control over the output. For example, you may want to set a password to protect the PDF document, or embed fonts to ensure consistent formatting across different devices. In this section, you’ll learn how to customize the HTML to PDF conversion using Spire.Doc for Python.
1. Set a Password to Protect the PDF
To prevent unauthorized viewing or editing, you can encrypt the PDF by specifying a user password and an owner password.
# Create a ToPdfParameterList object
toPdf = ToPdfParameterList()
# Set PDF encryption passwords
userPassword = "viewer"
ownerPassword = "E-iceblue"
toPdf.PdfSecurity.Encrypt(userPassword, ownerPassword, PdfPermissionsFlags.Default, PdfEncryptionKeySize.Key128Bit)
# Save as PDF with password protection
document.SaveToFile("/HtmlToPdfWithPassword.pdf", toPdf)
2. Embed Fonts to Preserve Formatting
To ensure the PDF displays correctly across all devices, you can embed all fonts used in the document.
# Create a ToPdfParameterList object
ppl = ToPdfParameterList()
ppl.IsEmbeddedAllFonts = True
# Save as PDF with embedded fonts
document.SaveToFile("/HtmlToPdfWithEmbeddedFonts.pdf", ppl)
These options give you finer control when you convert HTML to PDF in Python, especially for professional document sharing or long-term storage scenarios.
The Conclusion
Converting HTML to PDF in Python becomes simple and flexible with Spire.Doc for Python. Whether you're handling static HTML files or dynamic HTML strings, or need to secure and customize your PDFs, this library provides everything you need — all in just a few lines of code. Get a free 30-day license and start converting HTML to high-quality PDF documents in Python today!
FAQs
Q1: Can I convert an HTML file to PDF in Python? Yes. Using Spire.Doc for Python, you can convert a local HTML file to PDF with just a few lines of code.
Q2: How do I convert HTML to PDF in Chrome? While Chrome allows manual "Save as PDF", it’s not suitable for batch or automated workflows. If you're working in Python, Spire.Doc provides a better solution for programmatically converting HTML to PDF.
Q3: How do I convert HTML to PDF without losing formatting? To preserve formatting:
- Use embedded or inline CSS (not external files).
- Use absolute URLs for images and resources.
- Embed fonts using Spire.Doc options like IsEmbeddedAllFonts(True).
How to Convert PDF to Text in Python (Free & Easy Guide)
Table of Contents
Install with Pip
pip install Spire.PDF
Related Links
Converting PDF files to editable text is a common need for researchers, analysts, and professionals who deal with large volumes of documents. Manual copying wastes time—Python offers a faster, more flexible solution. In this guide, you’ll learn how to convert PDF to text in Python efficiently, whether you want to keep the layout or extract specific content.

- Why Choose Spire.PDF for PDF to Text
- General Workflow for PDF to Text in Python
- Convert PDF to Text in Python Without Layout
- Convert PDF to Text in Python With Layout
- Convert a Specific PDF Page to Text
- To Wrap Up
- FAQs
Getting Started: Why Choose Spire.PDF for PDF to Text in Python
To convert PDF files to text using Python, you’ll need a reliable PDF processing library. Spire.PDF for Python is a powerful and developer-friendly API that allows you to read, edit, and convert PDF documents in Python applications — no need for Adobe Acrobat or other third-party software.
This library is ideal for automating PDF workflows such as extracting text, adding annotations, or merging and splitting files. It supports a wide range of PDF features and works seamlessly in both desktop and server environments. You can donwload it to install mannually or quickly install Spire.PDF via PyPI using the following command:
pip install Spire.PDF
For smaller or personal projects, a free version is available with basic functionality. If you need advanced features such as PDF signing or form filling, you can upgrade to the commercial edition at any time.
General Workflow for PDF to Text in Python
Converting a PDF to text becomes simple and efficient with the help of Spire.PDF for Python. You can easily complete the task by reusing the sample code provided in the following sections and customizing it to fit your needs. But before diving into the code, let’s take a quick look at the general workflow behind this process.
- Create an object of PdfDocument class and load a PDF file using LoadFromFile() method.
- Create an object of PdfTextExtractOptions class and set the text extracting options, including extracting all text, showing hidden text, only extracting text in a specified area, and simple extraction.
- Get a page in the document using PdfDocument.Pages.get_Item() method and create PdfTextExtractor objects based on each page to extract the text from the page using Extract() method with specified options.
- Save the extracted text as a text file and close the object.
How to Convert PDF to Text in Python Without Layout
If you only need the plain text content from a PDF and don’t care about preserving the original layout, you can use a simple method to extract text. This approach is faster and easier, especially when working with scanned documents or large batches of files. In this section, we’ll show you how to convert PDF to text in Python without preserving the layout.
To extract text without preserving layout, follow these simplified steps:
- Create an instance of PdfDocument and load the PDF file.
- Create a PdfTextExtractOptions object and configure the text extraction options.
- Set IsSimpleExtraction = True to ignore the layout and extract raw text.
- Loop through all pages of the PDF.
- Extract text from each page and write it to a .txt file.
from spire.pdf import PdfDocument
from spire.pdf import PdfTextExtractOptions
from spire.pdf import PdfTextExtractor
# Create an object of PdfDocument class and load a PDF file
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")
# Create a string object to store the text
extracted_text = ""
# Create an object of PdfExtractor
extract_options = PdfTextExtractOptions()
# Set to use simple extraction method
extract_options.IsSimpleExtraction = True
# Loop through the pages in the document
for i in range(pdf.Pages.Count):
# Get a page
page = pdf.Pages.get_Item(i)
# Create an object of PdfTextExtractor passing the page as paramter
text_extractor = PdfTextExtractor(page)
# Extract the text from the page
text = text_extractor.ExtractText(extract_options)
# Add the extracted text to the string object
extracted_text += text
# Write the extracted text to a text file
with open("output/ExtractedText.txt", "w") as file:
file.write(extracted_text)
pdf.Close()

How to Convert PDF to Text in Python With Layout
To convert PDF to text in Python with layout, Spire.PDF preserves formatting like tables and paragraphs by default. The steps are similar to the general overview, but you still need to loop through each page for full-text extraction.
from spire.pdf import PdfDocument
from spire.pdf import PdfTextExtractOptions
from spire.pdf import PdfTextExtractor
# Create an object of PdfDocument class and load a PDF file
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")
# Create a string object to store the text
extracted_text = ""
# Create an object of PdfExtractor
extract_options = PdfTextExtractOptions()
# Loop through the pages in the document
for i in range(pdf.Pages.Count):
# Get a page
page = pdf.Pages.get_Item(i)
# Create an object of PdfTextExtractor passing the page as paramter
text_extractor = PdfTextExtractor(page)
# Extract the text from the page
text = text_extractor.ExtractText(extract_options)
# Add the extracted text to the string object
extracted_text += text
# Write the extracted text to a text file
with open("output/ExtractedText.txt", "w") as file:
file.write(extracted_text)
pdf.Close()

Convert a Specific PDF Page to Text in Python
Need to extract text from only one page of a PDF instead of the entire document? With Spire.PDF, the PDF to Text converter in Python, you can easily target and convert a specific PDF page to text. The steps are the same as shown in the general overview. If you're already familiar with them, just copy the code below into any Python editor and automate your PDF to text conversion!
from spire.pdf import PdfDocument
from spire.pdf import PdfTextExtractOptions
from spire.pdf import PdfTextExtractor
from spire.pdf import RectangleF
# Create an object of PdfDocument class and load a PDF file
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")
# Create an object of PdfExtractor
extract_options = PdfTextExtractOptions()
# Set to extract specific page area
extract_options.ExtractArea = RectangleF(50.0, 220.0, 700.0, 230.0)
# Get a page
page = pdf.Pages.get_Item(0)
# Create an object of PdfTextExtractor passing the page as paramter
text_extractor = PdfTextExtractor(page)
# Extract the text from the page
extracted_text = text_extractor.ExtractText(extract_options)
# Write the extracted text to a text file
with open("output/ExtractedText.txt", "w") as file:
file.write(extracted_text)
pdf.Close()

To Wrap Up
In this post, we covered how to convert PDF to text using Python and Spire.PDF, with clear steps and code examples for fast, efficient conversion. We also highlighted the benefits and pointed to OCR tools for image-based PDFs. For any issues or support, feel free to contact us.
FAQs about Converting PDF to Text
Q1: How do I convert a PDF to readable and editable text in Python?
A: You can convert a PDF to text in Python using the Spire.PDF library. It allows you to extract text from PDF files while optionally keeping the original layout. You don’t need Adobe Acrobat, and both visible and image-based PDFs are supported.
Q2: Is there a free tool to convert PDF to text?
A: Yes. Spire.PDF for Python provides a free edition that allows you to convert PDF to text without relying on Adobe Acrobat or other software. Online tools are also available, but they’re more suitable for occasional use or small files.
Q3: Can Python extract data from PDF? A: Yes, Python can extract data from PDF files. Using Spire.PDF, you can easily extract not only text but also other elements such as images, annotations, bookmarks, and even attachments. This makes it a versatile tool for working with PDF content in Python.
SEE ALSO:
How to Merge Excel Files in Python (.xls & .xlsx) - No Excel Required

Merging Excel files is a common task for data analysts and financial teams working with large datasets. While Microsoft Excel supports manual merging, it becomes inefficient and error-prone when dealing with large volumes of files.
In this step-by-step guide, you will learn how to merge multiple Excel files (.xls and .xlsx) using Python and Spire.XLS for Python library. Whether you're combining workbooks, merging worksheets, or automating bulk Excel file processing, this guide will help you save time and streamline your workflow with practical solutions.
Table of Contents
- Why Merge Excel Files with Python?
- Getting Started with Spire.XLS for Python
- How to Merge Multiple Excel Files into One Workbook using Python
- How to Combine Multiple Excel Worksheets into a Single Worksheet using Python
- Conclusion
- FAQs: Merge Excel Files with Python
Why Merge Excel Files with Python?
Using Python to merge Excel files brings several key advantages:
- Automation: Save time and eliminate repetitive manual work by automating the merging process.
- No Excel Dependency: Merge files without installing Microsoft Excel—ideal for headless, server-side, or cloud environments.
- Flexible Merging: Customize merging by selecting specific sheets, ranges, columns, or rows.
- Scalability: Handle hundreds or even thousands of Excel files with consistent performance.
- Error Reduction: Reduce manual errors and ensure data accuracy with automated scripts.
Whether you’re consolidating monthly reports or merging large datasets, Python helps streamline the process efficiently.
Getting Started with Spire.XLS for Python
Spire.XLS for Python is a standalone library that allows developers to create, read, edit, and save Excel files without the need for Microsoft Excel installation.
Key Features Include:
- Supports Multiple Formats: .xls, .xlsx, and more.
- Worksheet Operations: Copy, rename, delete, and merge worksheets seamlessly across workbooks.
- Formula & Formatting Preservation: Retain formulas and formatting during editing or merging.
- Advanced Features: Includes chart creation, conditional formatting, pivot tables, and more.
- File Conversion: Convert Excel files to PDF, HTML, CSV, and more.
Installation
Run the following pip command in your terminal or command prompt to install Spire.XLS from PyPI:
pip install spire.xls
How to Merge Multiple Excel Files into One Workbook using Python
When working with multiple Excel files, consolidating all worksheets into a single workbook can simplify data management and reporting. This approach preserves each original worksheet separately, making it easy to organize and review data from different sources such as department budgets, regional reports, or monthly summaries.
Steps
To merge multiple Excel files into a single workbook using Python, follow these steps:
- Loop through the files.
- Load each Excel file using LoadFromFile().
- For the first file, assign it as the base workbook.
- For subsequent files, copy all worksheets into the base workbook using AddCopy().
- Save the final combined workbook to a new file.
Code Example
import os
from spire.xls import *
# Folder containing Excel files to merge
input_folder = './sample_files'
# Output file name for the merged workbook
output_file = 'merged_workbook.xlsx'
# Initialize merged workbook as None
merged_workbook = None
# Iterate over all files in the input folder
for filename in os.listdir(input_folder):
# Process only Excel files with .xls or .xlsx extensions
if filename.endswith('.xlsx') or filename.endswith('.xls'):
file_path = os.path.join(input_folder, filename)
# Load the current Excel file into a Workbook object
source_workbook = Workbook()
source_workbook.LoadFromFile(file_path)
if merged_workbook is None:
# For the first file, assign it as the base merged workbook
merged_workbook = source_workbook
else:
# For subsequent files, copy each worksheet into the merged workbook
for i in range(source_workbook.Worksheets.Count):
sheet = source_workbook.Worksheets.get_Item(i)
merged_workbook.Worksheets.AddCopy(sheet, WorksheetCopyType.CopyAll)
# Save the combined workbook to the specified output file
merged_workbook.SaveToFile(output_file, ExcelVersion.Version2016)

How to Combine Multiple Excel Worksheets into a Single Worksheet using Python
Merging data from multiple Excel worksheets into one worksheet allows you to aggregate information efficiently, especially when working with data such as sales logs, survey responses, or performance reports.
Steps
To combine worksheet data from multiple Excel files into a single worksheet using Python, follow these steps:
- Create a new workbook and select its first worksheet as the destination.
- Loop through the files.
- Load each Excel file using LoadFromFile().
- Get the desired worksheet that you want to merge from the current file.
- Copy the used cell range from the desired worksheet to the destination worksheet, placing data consecutively below the previously copied content.
- Save the combined data into a new Excel file.
Code Example
import os
from spire.xls import *
# Folder containing Excel files to merge
input_folder = './excel_worksheets'
# Output file name for the merged workbook
output_file = 'merged_into_one_sheet.xlsx'
# Create a new workbook to hold merged data
merged_workbook = Workbook()
# Use the first worksheet in the new workbook as the merge target
merged_sheet = merged_workbook.Worksheets[0]
# Initialize the starting row for copying data
current_row = 1
# Loop through all files in the input folder
for filename in os.listdir(input_folder):
# Process only Excel files (.xls or .xlsx)
if filename.endswith('.xlsx') or filename.endswith('.xls'):
file_path = os.path.join(input_folder, filename)
# Load the current Excel file
workbook = Workbook()
workbook.LoadFromFile(file_path)
# Get the first worksheet from the current workbook
sheet = workbook.Worksheets[0]
# Get the used range from the first worksheet
source_range = sheet.Range
# Set the destination range in the merged worksheet starting at current_row
dest_range = merged_sheet.Range[current_row, 1]
# Copy data from the used range to the destination range
source_range.Copy(dest_range)
# Update current_row to the row after the last copied row to prevent overlap
current_row += sheet.LastRow
# Save the merged workbook to the specified output file in Excel 2016 format
merged_workbook.SaveToFile(output_file, ExcelVersion.Version2016)

Conclusion
When merging multiple Excel files into a single document—whether by appending sheets or combining data row by row—using a Python library like Spire.XLS enables automation and improves accuracy. This approach can help streamline workflows, especially in enterprise scenarios that require handling large datasets without relying on Microsoft Excel.
FAQs: Merge Excel Files with Python
Q1: Can I merge .xls and .xlsx files together?
A1: Yes. Spire.XLS handles both formats without needing conversion.
Q2: Do I need Excel installed on my machine to use Spire.XLS?
A2: No. Spire.XLS is standalone and works without Microsoft Office installed.
Q3: Can I merge only specific sheets from each workbook?
A3: Yes. You can customize your code to merge sheets by name or index. For example:
sheet = source_workbook.Worksheets["Summary"]
Q4: How do I avoid copying header rows multiple times?
A4: Add logic like:
if current_row > 1:
start_row = 2 # Skip header
else:
start_row = 1
Q5: Can I keep track of which file each row came from?
A5: Yes. Add a new column in the merged sheet containing the source file name for each row.
Q6: Is there a file size or row limit when using Spire.XLS?
A6: Spire.XLS follows the same row and column limits as Excel: .xlsx supports up to 1,048,576 rows × 16,384 columns, and .xls supports up to 65,536 rows × 256 columns.
Q7: Can I preserve formulas and formatting while merging?
A7: Yes. When merging Excel files, formatting and formulas are preserved.
Adding Watermarks to PDF Files Using Python

Watermarking is a critical technique for securing documents, indicating ownership, and preventing unauthorized copying. Whether you're distributing drafts or branding final deliverables, applying watermarks helps protect your content effectively. In this tutorial, you’ll learn how to add watermarks to a PDF in Python using the powerful and easy-to-use Spire.PDF for Python library.
We'll walk through how to insert both text and image watermarks , handle transparency and positioning, and resolve common issues — all with clean, well-documented code examples.
Table of Contents:
- Python Library for Watermarking PDFs
- Adding a Text Watermark to a PDF
- Adding an Image Watermark to a PDF
- Troubleshooting Common Issues
- Wrapping Up
- FAQs
Python Library for Watermarking PDFs
Spire.PDF for Python is a robust library that provides comprehensive PDF manipulation capabilities. For watermarking specifically, it offers:
- High precision in watermark placement and rotation.
- Flexible transparency controls.
- Support for both text and image watermarks.
- Ability to apply watermarks to specific pages or entire documents.
- Preservation of original PDF quality.
Before proceeding, ensure you have Spire.PDF installed in your Python environment:
pip install spire.pdf
Adding a Text Watermark to a PDF
This code snippet demonstrates how to add a diagonal "DO NOT COPY" watermark to each page of a PDF file. It manages the size, color, positioning, rotation, and transparency of the watermark for a professional result.
from spire.pdf import *
from spire.pdf.common import *
import math
# Create an object of PdfDocument class
doc = PdfDocument()
# Load a PDF document from the specified path
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf")
# Create an object of PdfTrueTypeFont class for the watermark font
font = PdfTrueTypeFont("Times New Roman", 48.0, 0, True)
# Specify the watermark text
text = "DO NOT COPY"
# Measure the dimensions of the text to ensure proper positioning
text_width = font.MeasureString(text).Width
text_height = font.MeasureString(text).Height
# Loop through each page in the document
for i in range(doc.Pages.Count):
# Get the current page
page = doc.Pages.get_Item(i)
# Save the current canvas state
state = page.Canvas.Save()
# Calculate the center coordinates of the page
x = page.Canvas.Size.Width / 2
y = page.Canvas.Size.Height / 2
# Translate the coodinate system to the center so that the center of the page becomes the origin (0, 0)
page.Canvas.TranslateTransform(x, y)
# Rotate the canvas 45 degrees counterclockwise for the watermark
page.Canvas.RotateTransform(-45.0)
# Set the transparency of the watermark
page.Canvas.SetTransparency(0.7)
# Draw the watermark text at the centered position using negative offsets
page.Canvas.DrawString(text, font, PdfBrushes.get_Blue(), PointF(-text_width / 2, -text_height / 2))
# Restore the canvas state to prevent transformations from affecting subsequent drawings
page.Canvas.Restore(state)
# Save the modified document to a new PDF file
doc.SaveToFile("output/TextWatermark.pdf")
# Dispose resources
doc.Dispose()
Breakdown of the Code :
- Load the PDF Document : The script loads an input PDF file from a specified path using the PdfDocument class.
- Configure Watermark Text : A watermark text ("DO NOT COPY") is set with a specific font (Times New Roman, 48pt) and measured for accurate positioning.
- Apply Transformations : For each page, the script:
- Centers the coordinate system.
- Rotates the canvas by 45 degrees counterclockwise.
- Sets transparency (70%) for the watermark.
- Draw the Watermark : The text is drawn at (-text_width / 2, -text_height / 2), which aligns the text perfectly around the center point of the page, regardless of the rotation applied.
- Save the Document : The modified document is saved to a new PDF file.
Output:

Adding an Image Watermark to a PDF
This code snippet adds a semi-transparent image watermark to each page of a PDF, ensuring proper positioning and a professional appearance.
from spire.pdf import *
from spire.pdf.common import *
# Create an object of PdfDocument class
doc = PdfDocument()
# Load a PDF document from the specified path
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf")
# Load the watermark image from the specified path
image = PdfImage.FromFile("C:\\Users\\Administrator\\Desktop\\logo.png")
# Get the width and height of the loaded image for positioning
imageWidth = float(image.Width)
imageHeight = float(image.Height)
# Loop through each page in the document to apply the watermark
for i in range(doc.Pages.Count):
# Get the current page
page = doc.Pages.get_Item(i)
# Set the transparency of the watermark to 50%
page.Canvas.SetTransparency(0.5)
# Get the dimensions of the current page
pageWidth = page.ActualSize.Width
pageHeight = page.ActualSize.Height
# Calculate the x and y coordinates to center the image on the page
x = (pageWidth - imageWidth) / 2
y = (pageHeight - imageHeight) / 2
# Draw the image at the calculated center position on the page
page.Canvas.DrawImage(image, x, y, imageWidth, imageHeight)
# Save the modified document to a new PDF file
doc.SaveToFile("output/ImageWatermark.pdf")
# Dispose resources
doc.Dispose()
Breakdown of the Code :
- Load the PDF Document : The script loads an input PDFfile from a specified path using the PdfDocument class.
- Configure Watermark Image : The watermark image is loaded from a specified path, and its dimensions are retrieved for accurate positioning.
- Apply Transformations : For each page, the script:
- Sets the watermark transparency (50%).
- Calculates the center position of the page for the watermark.
- Draw the Watermark : The image is drawn at the calculated center coordinates, ensuring it is centered on each page.
- Save the Document : The modified document is saved to a new PDF file.
Output:

Apart from watermarks, you can also add stamps to PDFs. Unlike watermarks, which are fixed in place, stamps can be freely moved or deleted, offering greater flexibility in document annotation.
Troubleshooting Common Issues
- Watermark Not Appearing :
- Verify file paths are correct.
- Check transparency isn't set to 0 (fully transparent).
- Ensure coordinates place the watermark within page bounds.
- Quality Issues :
- For text, use higher-quality fonts.
- For images, ensure adequate resolution.
- Rotation Problems :
- Remember that rotation occurs around the current origin point.
- The order of transformations matters (translate then rotate).
Wrapping Up
With Spire.PDF for Python, adding watermarks to PDF documents becomes a simple and powerful process. Whether you need bold "Confidential" text across every page or subtle branding with logos, the library handles it all efficiently. By combining coordinate transformations, transparency settings, and drawing commands, you can create highly customized watermarking workflows tailored to your document's purpose.
FAQs
Q1. Can I add both text and image watermarks to the same PDF?
Yes, you can combine both approaches in a single loop over the PDF pages.
Q2. How can I rotate image watermarks?
Use Canvas.RotateTransform(angle) before drawing the image, similar to the text watermark example.
Q3. Does Spire.PDF support transparent PNGs for watermarks?
Yes, Spire.PDF preserves the transparency of PNG images when used as watermarks.
Q4. Can I apply different watermarks to different pages?
Absolutely. You can implement conditional logic within your page loop to apply different watermarks based on page number or other criteria.
Get a Free License
To fully experience the capabilities of Spire.PDF for Python without any evaluation limitations, you can request a free 30-day trial license.
C++: Create Tables in Word Documents
A table is a powerful tool for organizing and presenting data. It arranges data into rows and columns, making it easier for authors to illustrate the relationships between different data categories and for readers to understand and analyze complex data. In this article, you will learn how to programmatically create tables in Word documents in C++ using Spire.Doc for C++.
Install Spire.Doc for C++
There are two ways to integrate Spire.Doc for C++ into your application. One way is to install it through NuGet, and the other way is to download the package from our website and copy the libraries into your program. Installation via NuGet is simpler and more recommended. You can find more details by visiting the following link.
Integrate Spire.Doc for C++ in a C++ Application
Create a Table in Word in C++
Spire.Doc for C++ offers the Section->AddTable() method to add a table to a section of a Word document. The detailed steps are as follows:
- Initialize an instance of the Document class.
- Add a section to the document using Document->AddSection() method.
- Define the data for the header row and remaining rows, storing them in a one-dimensional vector and a two-dimensional vector respectively.
- Add a table to the section using Section->AddTable() method.
- Specify the number of rows and columns in the table using Table->ResetCells(int, int) method.
- Add data in the one-dimensional vector to the header row and set formatting.
- Add data in the two-dimensional vector to the remaining rows and set formatting.
- Save the result document using Document->SaveToFile() method.
- C++
#include "Spire.Doc.o.h"
using namespace Spire::Doc;
using namespace std;
int main()
{
//Initialize an instance of the Document class
intrusive_ptr<Document> doc = new Document();
//Add a section to the document
intrusive_ptr<Section> section = doc->AddSection();
//Set page margins for the section
section->GetPageSetup()->GetMargins()->SetAll(72);
//Define the data for the header row
vector<wstring> header = { L"Name", L"Capital", L"Continent", L"Area", L"Population" };
//Define the data for the remaining rows
vector<vector<wstring>> data =
{
{L"Argentina", L"Buenos Aires", L"South America", L"2777815", L"32300003"},
{L"Bolivia", L"La Paz", L"South America", L"1098575", L"7300000"},
{L"Brazil", L"Brasilia", L"South America", L"8511196", L"150400000"},
{L"Canada", L"Ottawa", L"North America", L"9976147", L"26500000"},
{L"Chile", L"Santiago", L"South America", L"756943", L"13200000"},
{L"Colombia", L"Bogota", L"South America", L"1138907", L"33000000"},
{L"Cuba", L"Havana", L"North America", L"114524", L"10600000"},
{L"Ecuador", L"Quito", L"South America", L"455502", L"10600000"},
{L"El Salvador", L"San Salvador", L"North America", L"20865", L"5300000"},
{L"Guyana", L"Georgetown", L"South America", L"214969", L"800000"},
{L"Jamaica", L"Kingston", L"North America", L"11424", L"2500000"},
{L"Mexico", L"Mexico City", L"North America", L"1967180", L"88600000"},
{L"Nicaragua", L"Managua", L"North America", L"139000", L"3900000"},
{L"Paraguay", L"Asuncion", L"South America", L"406576", L"4660000"},
{L"Peru", L"Lima", L"South America", L"1285215", L"21600000"},
{L"United States", L"Washington", L"North America", L"9363130", L"249200000"},
{L"Uruguay", L"Montevideo", L"South America", L"176140", L"3002000"},
{L"Venezuela", L"Caracas", L"South America", L"912047", L"19700000"}
};
//Add a table to the section
intrusive_ptr<Table> table = section->AddTable(true);
//Specify the number of rows and columns for the table
table->ResetCells(data.size() + 1, header.size());
//Set the first row as the header row
intrusive_ptr<TableRow> row = table->GetRows()->GetItemInRowCollection(0);
row->SetIsHeader(true);
//Set height and background color for the header row
row->SetHeight(20);
row->SetHeightType(TableRowHeightType::Exactly);
for (int i = 0; i < row->GetCells()->GetCount(); i++)
{
row->GetCells()->GetItemInCellCollection(i)->GetCellFormat()->GetShading()->SetBackgroundPatternColor(Color::FromArgb(142, 170, 219));
}
//Add data to the header row and set formatting
for (size_t i = 0; i < header.size(); i++)
{
//Add a paragraph
intrusive_ptr<Paragraph> p1 = row->GetCells()->GetItemInCellCollection(i)->AddParagraph();
//Set alignment
p1->GetFormat()->SetHorizontalAlignment(HorizontalAlignment::Center);
row->GetCells()->GetItemInCellCollection(i)->GetCellFormat()->SetVerticalAlignment(VerticalAlignment::Middle);
//Add data
intrusive_ptr<TextRange> tR1 = p1->AppendText(header[i].c_str());
//Set data formatting
tR1->GetCharacterFormat()->SetFontName(L"Calibri");
tR1->GetCharacterFormat()->SetFontSize(12);
tR1->GetCharacterFormat()->SetBold(true);
}
//Add data to the remaining rows and set formatting
for (size_t r = 0; r < data.size(); r++)
{
//Set height for the remaining rows
intrusive_ptr<TableRow> dataRow = table->GetRows()->GetItemInRowCollection(r + 1);
dataRow->SetHeight(20);
dataRow->SetHeightType(TableRowHeightType::Exactly);
for (size_t c = 0; c < data[r].size(); c++)
{
//Add a paragraph
intrusive_ptr<Paragraph> p2 = dataRow->GetCells()->GetItemInCellCollection(c)->AddParagraph();
//Set alignment
dataRow->GetCells()->GetItemInCellCollection(c)->GetCellFormat()->SetVerticalAlignment(VerticalAlignment::Middle);
//Add data
intrusive_ptr<TextRange> tR2 = p2->AppendText(data[r][c].c_str());
//Set data formatting
tR2->GetCharacterFormat()->SetFontName(L"Calibri");
tR2->GetCharacterFormat()->SetFontSize(11);
}
}
//Save the result document
doc->SaveToFile(L"CreateTable.docx", FileFormat::Docx2013);
doc->Close();
}

Create a Nested Table in Word in C++
Spire.Doc for C++ offers the TableCell->AddTable() method to add a nested table to a specific table cell. The detailed steps are as follows:
- Initialize an instance of the Document class.
- Add a section to the document using Document->AddSection() method.
- Add a table to the section using Section.AddTable() method.
- Specify the number of rows and columns in the table using Table->ResetCells(int, int) method.
- Get the rows of the table and add data to the cells of each row.
- Add a nested table to a specific table cell using TableCell->AddTable() method.
- Specify the number of rows and columns in the nested table.
- Get the rows of the nested table and add data to the cells of each row.
- Save the result document using Document->SaveToFile() method.
- C++
#include "Spire.Doc.o.h"
using namespace Spire::Doc;
using namespace std;
int main()
{
//Initialize an instance of the Document class
intrusive_ptr<Document> doc = new Document();
//Add a section to the document
intrusive_ptr<Section> section = doc->AddSection();
//Set page margins for the section
section->GetPageSetup()->GetMargins()->SetAll(72);
//Add a table to the section
intrusive_ptr<Table> table = section->AddTable(true);
//Set the number of rows and columns in the table
table->ResetCells(2, 2);
//Autofit the table width to window
table->AutoFit(AutoFitBehaviorType::AutoFitToWindow);
//Get the table rows
intrusive_ptr<TableRow> row1 = table->GetRows()->GetItemInRowCollection(0);
intrusive_ptr<TableRow> row2 = table->GetRows()->GetItemInRowCollection(1);
//Add data to cells of the table
intrusive_ptr<TableCell> cell1 = row1->GetCells()->GetItemInCellCollection(0);
intrusive_ptr<TextRange> tR = cell1->AddParagraph()->AppendText(L"Product");
tR->GetCharacterFormat()->SetFontSize(13);
tR->GetCharacterFormat()->SetBold(true);
intrusive_ptr<TableCell> cell2 = row1->GetCells()->GetItemInCellCollection(1);
tR = cell2->AddParagraph()->AppendText(L"Description");
tR->GetCharacterFormat()->SetFontSize(13);
tR->GetCharacterFormat()->SetBold(true);
intrusive_ptr<TableCell> cell3 = row2->GetCells()->GetItemInCellCollection(0);
cell3->AddParagraph()->AppendText(L"Spire.Doc for C++");
intrusive_ptr<TableCell> cell4 = row2->GetCells()->GetItemInCellCollection(1);
cell4->AddParagraph()->AppendText(L"Spire.Doc for C++ is a professional Word "
L"library specifically designed for developers to create, "
L"read, write and convert Word documents in C++ "
L"applications with fast and high-quality performance.");
//Add a nested table to the fourth cell
intrusive_ptr<Table> nestedTable = cell4->AddTable(true);
//Set the number of rows and columns in the nested table
nestedTable->ResetCells(3, 2);
//Autofit the table width to content
nestedTable->AutoFit(AutoFitBehaviorType::AutoFitToContents);
//Get table rows
intrusive_ptr<TableRow> nestedRow1 = nestedTable->GetRows()->GetItemInRowCollection(0);
intrusive_ptr<TableRow> nestedRow2 = nestedTable->GetRows()->GetItemInRowCollection(1);
intrusive_ptr<TableRow> nestedRow3 = nestedTable->GetRows()->GetItemInRowCollection(2);
//Add data to cells of the nested table
intrusive_ptr<TableCell> nestedCell1 = nestedRow1->GetCells()->GetItemInCellCollection(0);
tR = nestedCell1->AddParagraph()->AppendText(L"Item");
tR->GetCharacterFormat()->SetBold(true);
intrusive_ptr<TableCell> nestedCell2 = nestedRow1->GetCells()->GetItemInCellCollection(1);
tR = nestedCell2->AddParagraph()->AppendText(L"Price");
tR->GetCharacterFormat()->SetBold(true);
intrusive_ptr<TableCell> nestedCell3 = nestedRow2->GetCells()->GetItemInCellCollection(0);
nestedCell3->AddParagraph()->AppendText(L"Developer Subscription");
intrusive_ptr<TableCell> nestedCell4 = nestedRow2->GetCells()->GetItemInCellCollection(1);
nestedCell4->AddParagraph()->AppendText(L"$999");
intrusive_ptr<TableCell> nestedCell5 = nestedRow3->GetCells()->GetItemInCellCollection(0);
nestedCell5->AddParagraph()->AppendText(L"Developer OEM Subscription");
intrusive_ptr<TableCell> nestedCell6 = nestedRow3->GetCells()->GetItemInCellCollection(1);
nestedCell6->AddParagraph()->AppendText(L"$2999");
//Save the result document
doc->SaveToFile(L"CreateNestedTable.docx", FileFormat::Docx2013);
doc->Close();
}

Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
3 Efficient Methods to Write Data to Excel in Java

Looking to automate Excel data entry in Java? Manually inputting data into Excel worksheets is time-consuming and error-prone, especially when dealing with large datasets. The good news is that with the right Java Excel library, you can streamline this process. This comprehensive guide explores three efficient methods to write data to Excel in Java using the powerful Spire.XLS for Java library, covering basic cell-by-cell entries, bulk array inserts, and DataTable exports.
- Prerequisites: Setup & Installation
- 3 Ways to Write Data to Excel using Java
- Performance Tips for Large Datasets
- Frequently Asked Questions
- Final Thoughts
Prerequisites: Setup & Installation
Before you start, you’ll need to add Spire.XLS for Java to your project. Here’s how to do it quickly:
Option 1: Download the JAR File
- Visit the Spire.XLS for Java download page.
- Download the latest JAR file.
- Add the JAR to your project’s build path.
Option 2: Use Maven
If you’re using Maven, add the following repository and dependency to your pom.xml file. This automatically downloads and integrates the library:
<repositories>
<repository>
<id>com.e-iceblue</id>
<name>e-iceblue</name>
<url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
</repository>
</repositories>
<dependency>
<groupId>e-iceblue</groupId>
<artifactId>spire.xls</artifactId>
<version>15.7.7</version>
</dependency>
3 Ways to Write Data to Excel using Java
Spire.XLS for Java offers flexible methods to write data, tailored to different scenarios. Let’s explore each with complete code samples, explanations, and use cases.
1. Write Text or Numbers to Excel Cells
Need to populate individual cells with text or numbers? Spire.XLS lets you directly target a specific cell using row/column indices (e.g., (2,1) for row 2, column 1) or Excel-style references (e.g., "A1", "B3"):
How It Works:
- Use the Worksheet.get(int row, int column) or Worksheet.get(String name) method to access a specific Excel cell.
- Use the setValue() method to write a text value to the cell.
- Use the setNumberValue() method to write a numeric value to the cell.
**Java code to write data to Excel: **
import com.spire.xls.*;
public class WriteToCells {
public static void main(String[] args) {
// Create a Workbook object
Workbook workbook = new Workbook();
// Get the first worksheet
Worksheet worksheet = workbook.getWorksheets().get(0);
// Write data to specific cells
worksheet.get("A1").setValue("Name");
worksheet.get("B1").setValue("Age");
worksheet.get("C1").setValue("Department");
worksheet.get("D1").setValue("Hiredate");
worksheet.get(2,1).setValue("Hazel");
worksheet.get(2,2).setNumberValue(29);
worksheet.get(2,3).setValue("Marketing");
worksheet.get(2,4).setValue("2019-07-01");
worksheet.get(3,1).setValue("Tina");
worksheet.get(3,2).setNumberValue(31);
worksheet.get(3,3).setValue("Technical Support");
worksheet.get(3,4).setValue("2015-04-27");
// Autofit column widths
worksheet.getAllocatedRange().autoFitColumns();
// Apply a style to the first row
CellStyle style = workbook.getStyles().addStyle("newStyle");
style.getFont().isBold(true);
worksheet.getRange().get(1,1,1,4).setStyle(style);
// Save to an Excel file
workbook.saveToFile("output/WriteToCells.xlsx", ExcelVersion.Version2016);
}
}
When to use this: Small datasets where you need precise control over cell placement (e.g., adding a title, single-row entries).

2. Write Arrays to Excel Worksheets
For bulk data, writing arrays (1D or 2D) is far more efficient than updating cells one by one. Spire.XLS for Java allows inserting arrays into a contiguous cell range.
insertArray() Method Explained:
The insertArray() method handles 1D arrays (single rows) and 2D arrays (multiple rows/columns) effortlessly. Its parameters are:
- Object[] array/ Object[][] array: The 1D or 2D array containing data to insert.
- int firstRow: The starting row index (1-based).
- int firstColumn: The starting column index (1-based).
- boolean isVertical: A boolean indicating the insertion direction:
- false: Insert horizontally (left to right).
- true: Insert vertically (top to bottom).
**Java code to insert arrays into Excel: **
import com.spire.xls.*;
public class WriteArrayToWorksheet {
public static void main(String[] args) {
// Create a Workbook instance
Workbook workbook = new Workbook();
// Get the first worksheet
Worksheet worksheet = workbook.getWorksheets().get(0);
// Create a one-dimensional array
Object[] oneDimensionalArray = {"January", "February", "March", "April","May", "June"};
// Write the array to the first row of the worksheet
worksheet.insertArray(oneDimensionalArray, 1, 1, false);
// Create a two-dimensional array
Object[][] twoDimensionalArray = {
{"Name", "Age", "Sex", "Dept.", "Tel."},
{"John", "25", "Male", "Development","654214"},
{"Albert", "24", "Male", "Support","624847"},
{"Amy", "26", "Female", "Sales","624758"}
};
// Write the array to the worksheet starting from the cell A3
worksheet.insertArray(twoDimensionalArray, 3, 1);
// Autofit column width in the located range
worksheet.getAllocatedRange().autoFitColumns();
// Apply a style to the first and the third row
CellStyle style = workbook.getStyles().addStyle("newStyle");
style.getFont().isBold(true);
worksheet.getRange().get(1,1,1,6).setStyle(style);
worksheet.getRange().get(3,1,3,6).setStyle(style);
// Save to an Excel file
workbook.saveToFile("WriteArrays.xlsx", ExcelVersion.Version2016);
}
}
When to use this: Sequential data (e.g., inventory logs, user lists) that needs bulk insertion.

3. Write DataTable to Excel
If your data is stored in a DataTable (e.g., from a database), Spire.XLS lets you directly export it to Excel with insertDataTable(), preserving structure and column headers.
insertDataTable() Method Explained:
The insertDataTable() method is a sophisticated bulk-insert operation designed specifically for transferring structured data collections into Excel. Its parameters are:
- DataTable dataTable: The DataTable object containing the data to insert.
- boolean columnHeaders: A boolean indicating whether to include column names from the DataTable as headers in Excel.
- true: Inserts column names as the first row.
- false: Skips column names; data starts from the first row.
- int firstRow: The starting row index (1-based).
- int firstColumn: The starting column index (1-based).
- boolean transTypes: A boolean indicating whether to preserve data types.
Java code to export DataTable to Excel:
import com.spire.xls.*;
import com.spire.xls.data.table.DataRow;
import com.spire.xls.data.table.DataTable;
public class WriteDataTableToWorksheet {
public static void main(String[] args) throws Exception {
// Create a Workbook instance
Workbook workbook = new Workbook();
// Get the first worksheet
Worksheet worksheet = workbook.getWorksheets().get(0);
// Create a DataTable object
DataTable dataTable = new DataTable();
dataTable.getColumns().add("SKU", Integer.class);
dataTable.getColumns().add("NAME", String.class);
dataTable.getColumns().add("PRICE", String.class);
// Create rows and add data
DataRow dr = dataTable.newRow();
dr.setInt(0, 512900512);
dr.setString(1,"Wireless Mouse M200");
dr.setString(2,"$85");
dataTable.getRows().add(dr);
dr = dataTable.newRow();
dr.setInt(0,512900637);
dr.setString(1,"B100 Cored Mouse ");
dr.setString(2,"$99");
dataTable.getRows().add(dr);
dr = dataTable.newRow();
dr.setInt(0,512901829);
dr.setString(1,"Gaming Mouse");
dr.setString(2,"$125");
dataTable.getRows().add(dr);
dr = dataTable.newRow();
dr.setInt(0,512900386);
dr.setString(1,"ZM Optical Mouse");
dr.setString(2,"$89");
dataTable.getRows().add(dr);
// Write datatable to the worksheet
worksheet.insertDataTable(dataTable,true,1,1,true);
// Autofit column width in the located range
worksheet.getAllocatedRange().autoFitColumns();
// Apply a style to the first row
CellStyle style = workbook.getStyles().addStyle("newStyle");
style.getFont().isBold(true);
worksheet.getRange().get(1,1,1,3).setStyle(style);
// Save to an Excel file
workbook.saveToFile("output/WriteDataTable.xlsx", ExcelVersion.Version2016);
}
}
When to use this: Database exports, CRM data, or any structured data stored in a DataTable (e.g., SQL query results, CSV imports).

Performance Tips for Large Datasets
- Use bulk operations (insertArray()/insertDataTable()) instead of writing cells one by one.
- Disable auto-fit columns or styling during data insertion, then apply them once after all data is written.
- For datasets with 100,000+ rows, consider streaming mode to reduce memory usage.
Frequently Asked Questions
Q1: What Excel formats does Spire.XLS support for writing data?
A: Spire.XLS for Java supports all major Excel formats, including:
- Legacy formats: XLS (Excel 97-2003)
- Modern formats: XLSX, XLSM (macro-enabled), XLSB, and more.
You can specify the output format when saving Excel with the saveToFile() method.
Q2: How do I format cells (colors, fonts, borders) when writing data?
A: Spire.XLS offers robust styling options. Check these guides:
- Set Background Color for Excel Cells in Java
- Apply Different Fonts to Excel Cells in Java
- Add Cell Borders in Excel in Java
Q3: How do I avoid the "Evaluation Warning" in output files?
A: To remove the evaluation sheets, get a 30-day free trial license here and then apply the license key in your code before creating the Workbook object:
com.spire.xls.license.LicenseProvider.setLicenseKey("Key");
Workbook workbook = new Workbook();
Final Thoughts
Mastering Excel export functionality is crucial for Java developers in data-driven applications. The Spire.XLS for Java library provides three efficient approaches to write data to Excel in Java:
- Precision control with cell-by-cell writing
- High-performance bulk inserts using arrays
- Database-style exporting with DataTables
Each method serves distinct use cases - from simple reports to complex enterprise data exports. By following the examples in the article, developers can easily create and write to Excel files in Java applications.
Extract Tables from PDFs in C# - Export to TXT & CSV
Extracting tables from PDF files is a common requirement in data processing, reporting, and automation tasks. PDFs are widely used for sharing structured data, but extracting tables programmatically can be challenging due to their complex layout. Fortunately, with the right tools, this process becomes straightforward. In this guide, we’ll explore how to extract tables from PDF in C# using the Spire.PDF for .NET library, and export the results to TXT and CSV formats for easy reuse.
Table of Contents:
- Prerequisites for Reading PDF Tables in C#
- Understanding PDF Table Structure
- How to Extract Tables from PDF in C#
- Extract PDF Tables to a Text File in C#
- Export PDF Tables to CSV in C#
- Conclusion
- FAQs
Prerequisites for Reading PDF Tables in C#
Spire.PDF for .NET is a powerful library for processing PDF files in C# and VB.NET. It supports a wide range of PDF operations, including table extraction, text extraction, image extraction, and more.
The easiest way to add the Spire.PDF library is via NuGet Package Manager.
1. Open Visual Studio and create a new C# project. (Here we create a Console App)
2. In Visual Studio, right-click your project > Manage NuGet Packages.
3. Search for “Spire.PDF” and install the latest version.
Understanding PDF Table Structure
Before coding, let’s clarify how PDFs store tables. Unlike Excel (which explicitly defines rows/columns), PDFs use:
- Text Blocks: Individual text elements positioned with coordinates.
- Borders/Lines: Visual cues (horizontal/vertical lines) that humans interpret as table edges.
- Spacing: Consistent gaps between text blocks to indicate cells.
The Spire.PDF library infers table structure by analyzing these visual cues, matching text blocks to rows/columns based on proximity and alignment.
How to Extract Tables from PDF in C#
If you need a quick way to preview table data (e.g., debugging or verifying extraction), printing it to the console is a great starting point.
Key methods to extract data from a PDF table:
- PdfDocument: Represents a PDF file.
- LoadFromFile: Loads the PDF file for processing.
- PdfTableExtractor: Analyzes the PDF to detect tables using visual cues (borders, spacing).
- ExtractTable(pageIndex): Returns an array of PdfTable objects for the specified page.
- GetRowCount()/GetColumnCount(): Retrieve the dimensions of each table.
- GetText(rowIndex, columnIndex): Extracts text from the cell at the specified row and column.
using Spire.Pdf;
using Spire.Pdf.Utilities;
namespace ExtractPdfTable
{
class Program
{
static void Main(string[] args)
{
// Create a PdfDocument object
PdfDocument pdf = new PdfDocument();
// Load a PDF file
pdf.LoadFromFile("invoice.pdf");
// Initialize an instance of PdfTableExtractor class
PdfTableExtractor extractor = new PdfTableExtractor(pdf);
// Loop through the pages
for (int pageIndex = 0; pageIndex < pdf.Pages.Count; pageIndex++)
{
// Extract tables from a specific page
PdfTable[] tableList = extractor.ExtractTable(pageIndex);
// Determine if the table list is null
if (tableList != null && tableList.Length > 0)
{
int tableNumber = 1;
// Loop through the table in the list
foreach (PdfTable table in tableList)
{
Console.WriteLine($"\nTable {tableNumber} on Page {pageIndex + 1}:");
Console.WriteLine("-----------------------------------");
// Get row number and column number of a certain table
int row = table.GetRowCount();
int column = table.GetColumnCount();
// Loop through rows and columns
for (int i = 0; i < row; i++)
{
for (int j = 0; j < column; j++)
{
// Get text from the specific cell
string text = table.GetText(i, j);
// Print cell text to console with a separator
Console.Write($"{text}\t");
}
// New line after each row
Console.WriteLine();
}
tableNumber++;
}
}
}
// Close the document
pdf.Close();
}
}
}
When to Use This Method
- Quick debugging or validation of extracted data.
- Small datasets where you don’t need persistent storage.
Output: Retrieve PDF table data and output to the console

Extract PDF Tables to a Text File in C#
For lightweight, human-readable storage, saving tables to a text file is ideal. This method uses StringBuilder to efficiently compile table data, preserving row breaks for readability.
Key features of extracting PDF tables and exporting to TXT:
- Efficiency: StringBuilder minimizes memory overhead compared to string concatenation.
- Persistent Storage: Saves data to a text file for later review or sharing.
- Row Preservation: Uses \r\n to maintain row structure, making the text file easy to scan.
using Spire.Pdf;
using Spire.Pdf.Utilities;
using System.Text;
namespace ExtractTableToTxt
{
class Program
{
static void Main(string[] args)
{
// Create a PdfDocument object
PdfDocument pdf = new PdfDocument();
// Load a PDF file
pdf.LoadFromFile("invoice.pdf");
// Create a StringBuilder object
StringBuilder builder = new StringBuilder();
// Initialize an instance of PdfTableExtractor class
PdfTableExtractor extractor = new PdfTableExtractor(pdf);
// Declare a PdfTable array
PdfTable[] tableList = null;
// Loop through the pages
for (int pageIndex = 0; pageIndex < pdf.Pages.Count; pageIndex++)
{
// Extract tables from a specific page
tableList = extractor.ExtractTable(pageIndex);
// Determine if the table list is null
if (tableList != null && tableList.Length > 0)
{
// Loop through the table in the list
foreach (PdfTable table in tableList)
{
// Get row number and column number of a certain table
int row = table.GetRowCount();
int column = table.GetColumnCount();
// Loop through the rows and columns
for (int i = 0; i < row; i++)
{
for (int j = 0; j < column; j++)
{
// Get text from the specific cell
string text = table.GetText(i, j);
// Add text to the string builder
builder.Append(text + " ");
}
builder.Append("\r\n");
}
}
}
}
// Write to a .txt file
File.WriteAllText("ExtractPDFTable.txt", builder.ToString());
}
}
}
When to Use This Method
- Archiving table data in a lightweight, universally accessible format.
- Sharing with teams that need to scan data without spreadsheet tools.
- Using as input for basic scripts (e.g., PowerShell) to extract specific values.
Output: Extract PDF table data and save to a text file.

Pro Tip: For VB.NET demos, convert the above code using our C# ⇆ VB.NET Converter.
Export PDF Tables to CSV in C#
CSV (Comma-Separated Values) is the industry standard for tabular data, compatible with Excel, Google Sheets, and databases. This method formats the extracted tables into a valid CSV file by quoting cells and handling special characters.
Key features of extracting tables from PDF to CSV:
- StreamWriter: Writes data incrementally to the CSV file, reducing memory usage for large PDFs.
- Quoted Cells: Cells are wrapped in double quotes (" ") to avoid misinterpreting commas within text as column separators.
- UTF-8 Encoding: Supports special characters in cell text.
- Spreadsheet Ready: Directly opens in Excel, Google Sheets, or spreadsheet tools for analysis.
using Spire.Pdf;
using Spire.Pdf.Utilities;
using System.Text;
namespace ExtractTableToCsv
{
class Program
{
static void Main(string[] args)
{
// Create a PdfDocument object
PdfDocument pdf = new PdfDocument();
// Load a PDF file
pdf.LoadFromFile("invoice.pdf");
// Create a StreamWriter object for efficient CSV writing
using (StreamWriter csvWriter = new StreamWriter("PDFtable.csv", false, Encoding.UTF8))
{
// Create a PdfTableExtractor object
PdfTableExtractor extractor = new PdfTableExtractor(pdf);
// Loop through the pages
for (int pageIndex = 0; pageIndex < pdf.Pages.Count; pageIndex++)
{
// Extract tables from a specific page
PdfTable[] tableList = extractor.ExtractTable(pageIndex);
// Determine if the table list is null
if (tableList != null && tableList.Length > 0)
{
// Loop through the table in the list
foreach (PdfTable table in tableList)
{
// Get row number and column number of a certain table
int row = table.GetRowCount();
int column = table.GetColumnCount();
// Loop through the rows
for (int i = 0; i < row; i++)
{
// Creates a list to store data
List<string> rowData = new List<string>();
// Loop through the columns
for (int j = 0; j < column; j++)
{
// Retrieve text from table cells
string cellText = table.GetText(i, j).Replace("\"", "\"\"");
// Add the cell text to the list and wrap in double quotes
rowData.Add($"\"{cellText}\"");
}
// Join cells with commas and write to CSV
csvWriter.WriteLine(string.Join(",", rowData));
}
}
}
}
}
}
}
}
When to Use This Method
- Data analysis (import into Excel for calculations).
- Migrating PDF tables to databases (e.g., SQL Server, PostgreSQL, MySQL).
- Collaborating with teams that rely on spreadsheets.
Output: Parse PDF table data and export to a CSV file.

Recommendation: Integrate with Spire.XLS for .NET to extract tables from PDF to Excel directly.
Conclusion
This guide has outlined three efficient methods for extracting tables from PDFs in C#. By leveraging the Spire.PDF for .NET library, you can automate the PDF table extraction process and export results to console, TXT, or CSV for further analysis. Whether you’re building a data pipeline, report generator, or business tool, these approaches streamline workflows, save time, and minimize human error.
Refer to the online documentation and obtain a free trial license here to explore more advanced PDF operations.
FAQs
Q1: Why use Spire.PDF for .NET to extract tables?
A: Spire.PDF provides a dedicated PdfTableExtractor class that detects tables based on visual cues (borders, spacing, and text alignment), simplifying the process of parsing structured data from PDFs.
Q2: Can Spire.PDF extract tables from scanned (image-based) PDFs?
A: No. The .NET PDF library works only with text-based PDFs (where text is selectable). For scanned PDFs, use Spire.OCR to extract text before parsing tables.
Q3: Can I extract tables from multiple PDFs at once?
A: Yes. To batch-process multiple PDFs, use Directory.GetFiles() to list all PDF files in a folder, then loop through each file and run the extraction logic. For example:
string[] pdfFiles = Directory.GetFiles(@"C:\Invoices\", "*.pdf");
foreach (string file in pdfFiles)
{
// Run extraction code for each file
}
Q4: How can I improve performance when extracting tables from large PDFs?
A: For large PDFs (100+ pages), optimize performance by:
- Processing pages in batches instead of loading the entire PDF at once.
- Disposing of unused PdfTable or PdfDocument objects with the using statements to free memory.
- Skipping pages with no tables early (
using if (tableList == null || tableList.Length == 0)).
Java: Convert Images to PDF
Converting images to PDF is beneficial for many reasons. For one reason, it allows you to convert images into a format that is more readable and easier to share. For another reason, it dramatically reduces the size of the file while preserving the quality of images. In this article, you will learn how to convert images to PDF in Java using Spire.PDF for Java.
There is no straightforward method provided by Spire.PDF to convert images to PDF. You could, however, create a new PDF document and draw images at the specified locations. Depending on whether the page size of the generated PDF matches the image, this topic can be divided into two subtopics.
Install Spire.PDF for Java
First, you're required to add the Spire.Pdf.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.
<repositories>
<repository>
<id>com.e-iceblue</id>
<name>e-iceblue</name>
<url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>e-iceblue</groupId>
<artifactId>spire.pdf</artifactId>
<version>12.5.1</version>
</dependency>
</dependencies>
Additionally, the imgscalr library is used in the first code example to resize images. It is not necessary to install it if you do not need to adjust the image’s size.
Add an Image to PDF at a Specified Location
The following are the steps to add an image to PDF at a specified location using Spire.PDF for Java.
- Create a PdfDocument object.
- Set the page margins using PdfDocument.getPageSettings().setMargins() method.
- Add a page using PdfDocument.getPages().add() method
- Load an image using ImageIO.read() method, and get the image width and height.
- If the image width is larger than the page (the content area) width, resize the image to make it to fit to the page width using the imgscalr library.
- Create a PdfImage object based on the scaled image or the original image.
- Draw the PdfImage object on the first page at (0, 0) using PdfPageBase.getCanvas().drawImage() method.
- Save the document to a PDF file using PdfDocument.saveToFile() method.
- Java
import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.graphics.PdfImage;
import org.imgscalr.Scalr;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.FileInputStream;
import java.io.IOException;
public class AddImageToPdf {
public static void main(String[] args) throws IOException {
//Create a PdfDocument object
PdfDocument doc = new PdfDocument();
//Set the margins
doc.getPageSettings().setMargins(20);
//Add a page
PdfPageBase page = doc.getPages().add();
//Load an image
BufferedImage image = ImageIO.read(new FileInputStream("C:\\Users\\Administrator\\Desktop\\announcement.jpg"));
//Get the image width and height
int width = image.getWidth();
int height = image.getHeight();
//Declare a PdfImage variable
PdfImage pdfImage;
//If the image width is larger than page width
if (width > page.getCanvas().getClientSize().getWidth())
{
//Resize the image to make it to fit to the page width
int widthFitRate = width / (int)page.getCanvas().getClientSize().getWidth();
int targetWidth = width / widthFitRate;
int targetHeight = height / widthFitRate;
BufferedImage scaledImage = Scalr.resize(image,Scalr.Method.QUALITY,targetWidth,targetHeight);
//Load the scaled image to the PdfImage object
pdfImage = PdfImage.fromImage(scaledImage);
} else
{
//Load the original image to the PdfImage object
pdfImage = PdfImage.fromImage(image);
}
//Draw image at (0, 0)
page.getCanvas().drawImage(pdfImage, 0, 0, pdfImage.getWidth(), pdfImage.getHeight());
//Save to file
doc.saveToFile("output/AddImage.pdf");
}
}

Convert an Image to PDF with the Same Width and Height
The following are the steps to convert an image to a PDF with the same page size as the image using Spire.PDF for Java.
- Create a PdfDocument object.
- Set the page margins to zero using PdfDocument.getPageSettings().setMargins() method.
- Load an image using ImageIO.read() method, and get the image width and height.
- Add a page to PDF based on the size of the image using PdfDocument.getPages().add() method.
- Create a PdfImage object based on the image.
- Draw the PdfImage object on the first page from the coordinate (0, 0) using PdfPageBase.getCanvas().drawImage() method.
- Save the document to a PDF file using PdfDocument.saveToFile() method.
- Java
import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.graphics.PdfImage;
import javax.imageio.ImageIO;
import java.awt.*;
import java.awt.image.BufferedImage;
import java.io.FileInputStream;
import java.io.IOException;
public class ConvertImageToPdfWithSameSize {
public static void main(String[] args) throws IOException {
//Create a PdfDocument object
PdfDocument doc = new PdfDocument();
//Set the margins to 0
doc.getPageSettings().setMargins(0);
//Load an image
BufferedImage image = ImageIO.read(new FileInputStream("C:\\Users\\Administrator\\Desktop\\announcement.jpg"));
//Get the image width and height
int width = image.getWidth();
int height = image.getHeight();
//Add a page of the same size as the image
PdfPageBase page = doc.getPages().add(new Dimension(width, height));
//Create a PdfImage object based on the image
PdfImage pdfImage = PdfImage.fromImage(image);
//Draw image at (0, 0) of the page
page.getCanvas().drawImage(pdfImage, 0, 0, pdfImage.getWidth(), pdfImage.getHeight());
//Save to file
doc.saveToFile("output/ConvertPdfWithSameSize.pdf");
}
}

Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.