How to Convert HEIC to PDF with Online Tools, Desktop Software, & Python

HEIC (High Efficiency Image Container) is the default image format used by Apple devices such as iPhone and iPad. While HEIC offers high image quality with smaller file sizes, it is not universally supported across platforms and applications. This often leads to compatibility issues when sharing, printing, or archiving images.

Converting HEIC files to PDF is a common and effective solution. PDF is a widely accepted format that ensures consistent display across devices and operating systems. In this article, we will explore several ways to convert HEIC to PDF, including free methods and an automated approach using Python code.

Quick Navigation


What Is a HEIC File and Why Convert HEIC to PDF?

HEIC is an image container format based on HEVC (High Efficiency Video Coding). It allows Apple devices to store photos with better compression and high visual quality. However, HEIC files are not natively supported by many Windows applications, web browsers, or document management systems, which can create friction when you try to open or share them.

Typical situations where converting HEIC to PDF becomes useful include:

  • Improving compatibility across platforms
  • Combining multiple HEIC images into a single document
  • Preparing images for printing or submission
  • Archiving images in a stable, non-editable format

Because of these limitations, converting HEIC to PDF is often less about preference and more about ensuring accessibility and consistency across different environments.


Method 1: Convert HEIC to PDF Using Online Tools (Free)

Online converters are often the fastest way to convert HEIC to PDF, especially when you need a quick result without installing additional software.

Steps:

  1. Open an online HEIC to PDF converter (such as CloudConvert).

    CloudConvert Online HEIC to PDF Converter

  2. Upload your HEIC file.

  3. Select PDF as the output format.

  4. Download the converted PDF file.

Pros:

  • Free and easy to use
  • No installation required

Cons:

  • File size and batch limits
  • Potential privacy and data security concerns
  • Limited control over output quality

This method is suitable for one-time or occasional conversions.

For converting other images formats to PDF, you can check How to Convert Images to PDF on Various Platforms.


Method 2: Convert HEIC to PDF Using Desktop Software

Desktop software provides a more stable and private way to convert HEIC to PDF, especially if you frequently work with local files or prefer not to upload images to third-party platforms. Many modern operating systems already include built-in tools that can handle this task without requiring additional downloads.

For example, Microsoft Photos on Windows and macOS Preview allow you to open HEIC files and export or print them as PDF documents. Compared to online tools, desktop solutions typically offer better reliability and fewer file size restrictions.

Steps:

  1. Open the HEIC file using a compatible image viewer (e.g., Microsoft Photos).

    Open HEIC file in Microsoft Photos

  2. Choose the export or print option.

  3. Select PDF as the output format.

    Export HEIC to PDF in Microsoft Photos

  4. Save the PDF file.

Pros:

  • Better privacy than online tools
  • No coding required

Cons:

  • Manual operation
  • Inefficient for batch processing

This approach works well for users who need basic conversion with minimal technical effort.

If the result PDF file is too large, you can try compressing the PDF file to avoid transmission and sharing issues.


Method 3: Convert HEIC to PDF Using Python (Recommended for Automation)

If you need to convert HEIC files to PDF regularly, handle large image collections, or integrate conversion into an existing workflow, manual tools can quickly become inefficient. In these scenarios, automation offers a significant advantage.

Python enables you to build a repeatable conversion process that runs with minimal human intervention. Instead of converting images one by one, you can process entire folders, standardize output settings, and integrate the workflow into backend services or data pipelines.

Required Libraries

To implement this solution, you will need:

  • pillow – for image processing
  • pillow-heif – to enable HEIC image support
  • Spire.PDF for Python – to create and manage PDF documents

You can install these libraries using pip:

pip install pillow pillow-heif spire.pdf

This combination allows you to read HEIC images, process them, and generate high-quality PDF files programmatically.

Step-by-Step: Convert HEIC to PDF with Python

Below is a general workflow for converting a HEIC file to PDF using Python:

  1. Register HEIC support and open the HEIC image with Pillow.
  2. Convert the image to RGB and save it as JPEG or PNG in memory.
  3. Create a PDF document and load the image from a byte stream.
  4. Render the image onto a PDF page and save the PDF file.

Python Example: Convert HEIC to PDF

from spire.pdf import PdfDocument, PdfImage, PointF, Stream, SizeF, PdfMargins
from PIL import Image
import pillow_heif
import io

# Register the HEIF support
pillow_heif.register_heif_opener()

# Open the HEIC image and convert to RGB (HEIC images may be RGBA)
img = Image.open("Image.heic")
img = img.convert("RGB")

# Save the image to JPEG or PNG bytes (for Spire.PDF support)
buffer = io.BytesIO()
img.save(buffer, format="JPEG")  # Or "PNG"
image_bytes = buffer.getvalue()

# Load the image from bytes
stream = Stream(image_bytes)
image = PdfImage.FromStream(stream)

# Create a PDF document
pdf = PdfDocument()

# Set the page size to the image size and margins to 0
# This will make the image fill the page
pdf.PageSettings.Size.Width = image.Width
pdf.PageSettings.Size.Height = image.Height
pdf.PageSettings.Margins.All = 0

# Add a page to the document
page = pdf.AppendPage()

# Draw the image on the page
page.Canvas.DrawImage(image, PointF(0.0, 0.0))

# Save the PDF document
pdf.SaveToFile("output/HEICToPDF.pdf")
pdf.Close()

Below is a preview of the generated PDF file:

PDF converted from HEIC file using Python

This approach supports batch processing and can be extended to merge multiple images into a single PDF. For more advanced image-to-PDF conversions, including other formats and options like page size, margins, and image quality, you can check out our Python guide on converting images to PDF.


How to Choose the Right HEIC to PDF Conversion Method

The best method depends less on technical preference and more on your actual usage scenario. Instead of assuming that one approach fits all situations, consider how often you perform conversions and how much control you need over the output.

Choose an Online Tool if:

  • You only need to convert a few files occasionally
  • You want the fastest possible solution
  • Installing software is not convenient

Online converters are ideal for quick, low-effort tasks. However, they may not be suitable for sensitive images or large batches.

Choose Desktop Software if:

  • You prefer working offline
  • Privacy is important
  • You want a simple, no-code experience

Desktop tools strike a balance between ease of use and reliability. They are often the most practical choice for everyday office workflows.

Choose Python if:

  • You regularly convert large numbers of HEIC files
  • The process needs to be automated
  • You want precise control over layout, resolution, or document structure
  • Conversion is part of a larger system or application

While this approach requires some technical knowledge, it becomes significantly more efficient at scale.

Overall, online tools suit quick tasks, desktop software supports everyday workflows, and Python is ideal for automated or large-scale conversions.


Conclusion

Converting HEIC files to PDF is a practical solution for improving compatibility and usability. While free online tools and desktop software are suitable for simple tasks, they often fall short when automation, privacy, or batch processing is required.

For developers and advanced users, converting HEIC to PDF using Python provides the most flexibility. With libraries such as pillow-heif and Spire.PDF for Python, you can build an efficient and professional conversion workflow tailored to your needs.


FAQ: HEIC to PDF Conversion

What are the free ways to convert HEIC to PDF?

Yes, many online tools offer free HEIC to PDF conversion, but they may have limitations on file size or usage.

Will converting HEIC to PDF affect image quality?

Image quality depends on the conversion method. Programmatic solutions allow better control over resolution and compression.

How can I merge multiple HEIC files into one PDF?

Yes. Using a programming approach, multiple HEIC images can be merged into a single PDF document.

Is it possible to convert HEIC to PDF for free with Spire.PDF?

Yes, Free Spire.PDF for Python allows you to convert HEIC files to PDF at no charge.

Convert ODT to PDF with Online Converters, Desktop Tools, and Python Automation

ODT files (OpenDocument Text) are widely used for creating and editing documents in LibreOffice or Apache OpenOffice. However, sharing or distributing ODT files can be inconvenient, as not all devices or platforms support this format. Converting ODT files to PDF ensures that your document layout, fonts, and formatting remain intact, making it easier to share, print, or archive reliably.

This article guides you through practical methods to convert ODT to PDF, covering online tools, desktop software, and Python-based automation for both occasional and large-scale conversions.

Quick Navigation


What Is an ODT File and Why Convert It to PDF?

An ODT file is a word processing document based on the OpenDocument standard. It is commonly used in environments that prioritize open formats and cross-platform compatibility. ODT files support rich text, images, tables, and styles, making them suitable for everyday document creation.

However, ODT is not universally supported outside office applications. When documents need to be shared with a wider audience, submitted formally, or archived for long-term use, PDF becomes the more practical choice.

Converting ODT to PDF helps address several common needs:

  • Consistent layout: PDFs display the same way on all devices and operating systems.
  • Improved compatibility: PDF readers are widely available and require no editing software.
  • Document integrity: PDFs reduce the risk of accidental content changes.
  • Professional distribution: PDFs are often preferred for reports, contracts, and official documents.

Convert ODT to PDF Online (Free and Web-Based Tools)

Online converters are often the fastest way to convert an ODT file to PDF, especially for users who only need occasional conversions and do not want to install additional software.

Step-by-Step: Convert ODT to PDF Online

  1. Open a web-based ODT to PDF conversion service in your browser (for example, CloudXDocs ODT to PDF Converter). CloudXDocs ODT to PDF Converter
  2. Upload the ODT file from your local device or supported cloud storage.
  3. Wait while the file is processed and converted on the server.
  4. Download the resulting PDF file to your computer. Download ODT to PDF Conversion Result from CloudXDocs

Most online tools follow this simple workflow, making them accessible even to non-technical users.

Advantages of Online Conversion

  • No installation or setup required
  • Accessible from any modern browser
  • Suitable for quick, one-time conversions
  • Often available for free with basic usage

Limitations of Online Converters

Online tools typically impose file size limits or daily conversion caps. Uploading documents to third-party servers may also be unsuitable for confidential or sensitive files. In addition, complex formatting—such as custom fonts or advanced layouts—may not always be preserved accurately, and batch conversion is rarely supported.

In cases where editing and collaboration are required, converting ODT files to Word format can be a more practical option than PDF.


Convert ODT to PDF Using Desktop Software

Desktop office applications provide a more controlled environment for converting ODT files to PDF. LibreOffice and Apache OpenOffice both include built-in PDF export functionality.

Step-by-Step: Convert ODT to PDF with LibreOffice or OpenOffice

  1. Open the ODT file in LibreOffice Writer (or Apache OpenOffice Writer). Open ODT File in LibreOffice Writer
  2. Review the document to ensure formatting and layout are correct.
  3. Click File and select Export As PDF. Export ODT File to PDF in LibreOffice Writer
  4. Configure export options such as image quality, font embedding, or page range.
  5. Save the exported PDF file to your local system.

This approach gives users greater confidence in the final output, particularly for visually complex documents.

Advantages of Desktop-Based Conversion

  • Better preservation of formatting and layout
  • No need to upload files to external servers
  • Greater control over PDF export settings
  • Reliable for documents with tables, images, and styles

Limitations of Manual Conversion

Manual conversion requires user interaction for each document, making it inefficient for high-volume workflows. It is also difficult to automate and unsuitable for server-side or headless environments where no graphical interface is available.


When Online and Manual Methods Are Not Enough

There are many scenarios where online tools and desktop software no longer meet practical requirements. As document volume increases or workflows become more complex, manual conversion quickly turns into a bottleneck.

Typical situations where traditional methods fall short include:

  • Converting large numbers of ODT files on a regular basis
  • Running conversions automatically on a server or backend system
  • Integrating document conversion into an existing application or service
  • Ensuring consistent output without user intervention
  • Operating in environments without a graphical interface

In these cases, a programmatic approach provides greater reliability, scalability, and control.


Convert ODT to PDF Programmatically with Python

Python is widely used for automation, data processing, and backend development. Using Python to convert ODT files to PDF allows the process to be fully automated and integrated into larger systems.

Why Use Python for ODT to PDF Conversion?

Python enables batch processing, repeatable workflows, and seamless integration with existing applications. Once implemented, conversions can run unattended, ensuring consistent results while reducing manual effort.

Using Spire.Doc for Python

Spire.Doc for Python is a document processing library that supports OpenDocument formats and enables direct conversion from ODT to PDF. It preserves text, images, and layout without relying on Microsoft Word or LibreOffice, making it suitable for server-side and enterprise use.

Installation

Spire.Doc for Python can be installed via pip:

pip install spire-doc

Example: Convert ODT to PDF with Python

from spire.doc import Document, FileFormat

# Create a Document object and load the ODT file
doc = Document()
doc.LoadFromFile("Sample.odt", FileFormat.Odt)

# Save the ODT document as a PDF file
doc.SaveToFile("output/ODTToPDF.pdf", FileFormat.PDF)
doc.Close()

Below is a preview of the PDF file converted from ODT using Python:

Conversion Result: ODT to PDF using Python

This approach can easily be extended to handle batch conversions or integrated into automated workflows.

After converting the ODT file to PDF, you can further edit the generated PDF using Python, such as adding watermarks, setting metadata, or applying additional security options.

Save ODT as PDF to a Stream (Optional)

In backend or web-based applications, you may want to generate a PDF file in memory without writing it to disk. Spire.Doc for Python allows saving the converted document to a stream.

from spire.doc import Document, FileFormat, Stream

doc = Document()
doc.LoadFromFile("Sample.odt", FileFormat.Odt)

# Save the ODT document to a PDF stream
pdf_stream = Stream()
doc.SaveToStream(pdf_stream, FileFormat.PDF)
doc.Close()

# Get PDF bytes from the stream
pdf_bytes = bytes(pdf_stream.ToArray())

This approach is useful for returning PDF files in web APIs, uploading to cloud storage, or processing documents in memory.

If your workflow requires further processing of PDF files in memory—such as merging documents, adding watermarks, or applying security settings—you may find it helpful to explore techniques for working with PDF documents directly in streams.


Choosing the Right Method for Your Needs

The best way to convert ODT to PDF depends on how often you perform the task and the level of control required.

  • Online tools are ideal for quick, occasional conversions when convenience is the priority.
  • Desktop software works well for users who need better formatting control and convert files manually.
  • Python-based automation is the most suitable option for large-scale processing, backend systems, and enterprise workflows.

Understanding your usage scenario helps ensure you choose a method that balances efficiency, reliability, and maintainability.


FAQs About Converting ODT to PDF

Is it safe to convert ODT to PDF online?

Online converters can be convenient for non-sensitive documents, but they may not be suitable for sensitive documents due to privacy concerns.

Will the document formatting change after conversion?

Simple documents usually convert accurately, while complex layouts may benefit from desktop or programmatic methods.

Can multiple ODT files be converted at once?

Batch conversion is rarely supported by online tools and is inefficient manually. Programmatic solutions handle this more effectively.

Do Python-based conversions require office software?

No. Libraries such as Spire.Doc for Python operate independently of office applications.


Conclusion

Converting ODT files to PDF can be accomplished in several ways, each suited to different needs. Online tools and desktop applications are effective for everyday use, while Python-based automation provides a scalable solution for advanced and high-volume scenarios. By selecting the appropriate method, you can ensure accurate, efficient, and reliable document conversion.

See Also

Tutorial on Converting PDF Tables to CSV with Manual, Online & Automated Methods

Converting tables from PDF files into CSV format is a common requirement in reporting, analytics, and data integration workflows. CSV files are lightweight, widely supported, and well suited for automation, making them far more useful than static PDFs once tabular data needs to be reused.

In practice, however, converting a PDF table to CSV is rarely straightforward. PDF files are designed to preserve visual appearance rather than logical structure. A table that looks perfectly aligned on screen may not exist as rows and columns internally, which is why naïve conversion methods often fail.

This article focuses on practical PDF table to CSV conversion methods. Instead of covering every theoretical option, it explains the most commonly used approaches, how they behave in practice, and when each method is appropriate.

Table of Contents


Common Practical Ways to Convert PDF Tables to CSV

In most real workflows, converting a PDF table to CSV falls into one of the following categories:

  • Exporting tables via PDF to spreadsheet tools (such as Acrobat)
  • Using online PDF table to CSV converters
  • Extracting tables programmatically using Python code

Simple copy-and-paste techniques are intentionally excluded, as they usually flatten tables into plain text and require extensive manual reconstruction.


Method 1: Export PDF to Spreadsheet Using Acrobat

Exporting a PDF to a spreadsheet format and then saving it as CSV is a common choice for users who prefer desktop tools and visual inspection.

When This Method Works Well

  • The PDF is text-based and well structured
  • Tables have clear row and column boundaries
  • Manual review and correction are acceptable

Typical Acrobat-Based Workflow

  1. Open the PDF file in Acrobat

  2. Choose Export PDF and select Spreadsheet as the output format

    Acrobat Export PDF to Spreadsheet

  3. Export the document to Excel format

  4. Review and adjust the table structure if necessary

  5. Save or export the spreadsheet as a CSV file

    Excel Save as CSV

This workflow often produces better structural results than direct copying, especially for single-page or consistently formatted tables.

Practical Limitations

  • Complex or multi-page tables may be split across sheets
  • Merged cells can lead to misaligned columns in CSV output
  • Manual cleanup is often required before export
  • Not suitable for batch or automated processing

This approach is effective for occasional conversions where visual validation matters, but it does not scale well.

For users looking for a free alternative to Acrobat for converting PDF tables to Excel before saving as CSV, see How to Convert PDF to Excel for Free.


Method 2: Online PDF Table to CSV Conversion

Online converters are widely used because they require no installation and provide fast results.

When Online Conversion Is a Good Fit

  • The PDF contains selectable (non-scanned) text
  • Table layouts are relatively simple
  • Only a small number of files need conversion

Typical Online PDF Table to CSV Workflow

Most online tools follow a similar process (Zamzar example):

  1. Open an online PDF to CSV converter

    Zamzar PDF to CSV Online Converter

  2. Upload the PDF file containing the table

  3. Configure page range or table detection options, if available

  4. Start the conversion process

  5. Download the generated CSV file

    Zamzar PDF to CSV Output

For straightforward PDFs, this process can generate usable CSV output in seconds.

Common Considerations With Online Converters

  • Columns may shift when spacing is inconsistent
  • Converters often export the whole PDF as CSV, not just the tables
  • Line breaks inside cells may create extra rows
  • Output quality varies by document layout
  • File size limits and privacy concerns may apply

Online tools are best treated as a convenience option rather than a predictable or reusable solution.


Method 3: Programmatic PDF Table Extraction with Python

When accuracy, consistency, or automation is required, programmatic extraction is often the most reliable way to convert PDF tables to CSV.

Why Programmatic Extraction Is Often Preferred

  • Tables can be processed page by page
  • Multi-page tables can be handled consistently
  • The same extraction logic can be reused in batch jobs
  • Output is reproducible and easier to validate

This approach is common in data pipelines, reporting systems, and backend services that process PDFs at scale. With Spire.PDF for Python, developers can accurately extract tables from PDF documents, handle multi-page and complex layouts, and automate the conversion to CSV with minimal manual intervention.

Typical Programmatic Workflow for PDF Table to CSV

Most programmatic solutions follow a similar high-level process:

  1. Load the PDF document
  2. Iterate through each page
  3. Detect table structures on each page
  4. Extract rows and columns as structured data
  5. Normalize extracted text where necessary
  6. Write the structured data to CSV files

Python is widely used for this task because it combines readability with strong data-processing capabilities.

Example: Convert PDF Tables to CSV Using Python

Before running the example below, make sure the required PDF processing library is installed.

You can install Spire.PDF for Python using pip:

pip install spire.pdf

Once installed, you can proceed with the table extraction example.

The following example demonstrates how to convert PDF tables to CSV using Spire.PDF for Python.

import os
import csv
from spire.pdf import PdfDocument, PdfTableExtractor

# Load the PDF document
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")

# Create a table extractor
extractor = PdfTableExtractor(pdf)

# Normalize text to handle PDF ligatures and PUA characters
def normalize_text(text: str) -> str:
    if not text:
        return text
    if not any('\uE000' <= ch <= '\uF8FF' for ch in text):
        return text

    ligatures = {
        '\uE000': 'ff',
        '\uE001': 'fi',
        '\uE002': 'fl',
        '\uE003': 'ffl',
        '\uE004': 'ffi',
        '\uE005': 'ft',
        '\uE006': 'st',
    }
    for lig, repl in ligatures.items():
        text = text.replace(lig, repl)
    return text

# Extract tables page by page
for page_index in range(pdf.Pages.Count):
    tables = extractor.ExtractTable(page_index)
    if tables:
        for table_index, table in enumerate(tables):
            rows = []
            for r in range(table.GetRowCount()):
                row = []
                for c in range(table.GetColumnCount()):
                    cell = normalize_text(table.GetText(r, c)).replace("\n", " ")
                    row.append(cell)
                rows.append(row)

            os.makedirs("output/Tables", exist_ok=True)
            with open(
                f"output/Tables/Page{page_index + 1}-Table{table_index + 1}.csv",
                "w",
                newline="",
                encoding="utf-8",
            ) as f:
                writer = csv.writer(f)
                writer.writerows(rows)

pdf.Close()

Below is a preview of the PDF table to CSV conversion results:

PDF Table to CSV Output from Python

How This Implementation Works

This implementation focuses on preserving table structure rather than inferring layout from text positions:

  • Cell-level extraction ensures rows and columns are preserved as logical units instead of being reconstructed from spacing
  • Page-by-page processing prevents tables from being merged incorrectly across page boundaries
  • Explicit text normalization handles common PDF issues such as ligatures and private-use Unicode characters, which can silently corrupt CSV output
  • Direct CSV writing avoids intermediate formats that may introduce additional formatting artifacts

As a result, the generated CSV files are more stable and suitable for automated processing. For a step-by-step guide on extracting tables from PDF documents, see Detailed Guide: Extracting Tables from PDF.


Handling Real-World PDF Table Scenarios

In real-world workflows, PDF tables often behave differently from how they look on screen. Typical issues include:

  • Tables spanning multiple pages with repeated or missing headers
  • Slight column position shifts between pages
  • Rows with empty, wrapped, or irregular cells
  • Large batches of PDFs with similar but not identical layouts

These factors are usually where generic export tools and online converters start to produce inconsistent CSV output.

From a practical perspective, programmatic extraction is better suited to these cases because it allows:

  • Page-by-page processing without accidentally merging unrelated tables
  • Controlled handling of multi-page tables
  • Stable column alignment even when layouts are not perfectly uniform

One additional usability detail worth noting is CSV encoding:

  • When extracted data includes non-ASCII characters, CSV files opened directly in Excel may display garbled text
  • Saving CSV output as UTF-8 with BOM (UTF-8-SIG) helps ensure correct character display without manual import steps

These considerations become especially relevant when working with real-world PDFs rather than idealized examples.


Key Takeaways: Converting PDF Tables to CSV

In practice, converting a PDF table to CSV usually comes down to three options:

  • Acrobat export works well for occasional, visually verified conversions, such as single-page invoices or reports
  • Online converters are convenient for simple, one-off tasks with straightforward tables
  • Programmatic extraction offers the most reliable results for complex, multi-page, or repeated workflows, especially in automated pipelines

Choosing the right method depends less on the tool itself and more on how the extracted data will be used.


FAQ

Can scanned PDF tables be converted to CSV directly?
No. Scanned PDFs require OCR before table extraction is possible. For a step-by-step guide on extracting text from scanned PDFs using Python, see Extracting Text from Scanned PDFs with Python.

Is CSV better than Excel for extracted PDF tables? CSV is simpler and better suited for automation, while Excel is often preferred for manual review.

Is Python suitable for batch PDF table conversion? Yes. Python is widely used for large-scale and automated PDF table extraction due to its flexibility and readability.

See Also

Tutorial on How to Export a DataTable to PDF Using C#

In many .NET-based business systems, structured data is often represented as a DataTable. When this data needs to be distributed, archived, or delivered as a read-only report, exporting a DataTable to PDF using C# becomes a common and practical requirement.

Compared with formats such as Excel or CSV, PDF is typically chosen when layout stability, visual consistency, and document integrity are more important than data editability. This makes PDF especially suitable for reports, invoices, audit records, and system-generated documents.

This tutorial takes a code-first approach to converting a DataTable to PDF in C#, focusing on the technical implementation rather than conceptual explanations. The solution is based on Spire.PDF for .NET, using its PdfGrid component to render DataTable content as a structured table inside a PDF document.

Table of Contents


1. Overview: DataTable to PDF Export in C#

Exporting a DataTable to PDF is fundamentally a data-binding and rendering task, not a low-level drawing problem.

Instead of manually calculating row positions, column widths, or page breaks, the recommended approach is to bind an existing DataTable to a PDF table component and let the rendering engine handle layout and pagination automatically.

In Spire.PDF for .NET, this role is fulfilled by the PdfGrid class.

Why PdfGrid Is the Right Abstraction

PdfGrid is a Spire.PDF for .NET component designed specifically for rendering structured, tabular data in PDF documents. It treats rows, columns, headers, and pagination as first-class concepts rather than graphical primitives.

From a technical standpoint, PdfGrid provides:

  • Direct binding via the DataSource property, which accepts a DataTable
  • Automatic column generation based on the DataTable schema
  • Built-in header and row rendering
  • Automatic page breaking when content exceeds page bounds

As a result, exporting a DataTable to PDF becomes a declarative operation: you describe what data should be rendered, and the PDF engine determines how it is laid out across pages.

The following sections focus on the concrete implementation and practical refinements of this approach.


2. Environment Setup

All examples in this article apply to both .NET Framework and modern .NET (6+) projects. The implementation is based entirely on managed code and does not require platform-specific configuration.

Installing Spire.PDF for .NET

Spire.PDF for .NET can be installed via NuGet:

Install-Package Spire.PDF

You can also download Spire.PDF for .NET and include it in your project manually.

Once installed, the library provides APIs for PDF document creation, page management, table rendering, and style control.


3. DataTable to PDF in C#: Core Workflow and Code Implementation

With the environment prepared, exporting a DataTable to PDF becomes a linear, implementation-driven process.

At its core, the workflow relies on binding an existing DataTable to PdfGrid and delegating layout, pagination, and table rendering to the PDF engine. There is no need to manually draw rows, columns, or borders.

From an implementation perspective, the process consists of the following steps:

  1. Prepare a populated DataTable
  2. Create a PDF document and page
  3. Bind the DataTable to a PdfGrid
  4. Render the grid onto the page
  5. Save the PDF output

These steps are typically executed together as a single, continuous code path in real-world applications. The following example demonstrates the complete workflow in one place.

Complete Example: Exporting a DataTable to PDF

The example below uses a business-oriented DataTable schema to reflect a typical reporting scenario. The source of the DataTable (database, API, or in-memory processing) does not affect the export logic.

DataTable dataTable = new DataTable();
dataTable.Columns.Add("OrderId", typeof(int));
dataTable.Columns.Add("CustomerName", typeof(string));
dataTable.Columns.Add("OrderDate", typeof(DateTime));
dataTable.Columns.Add("TotalAmount", typeof(decimal));

dataTable.Rows.Add(1001, "Contoso Ltd.", DateTime.Today, 1280.50m);
dataTable.Rows.Add(1002, "Northwind Co.", DateTime.Today, 760.00m);
dataTable.Rows.Add(1003, "Adventure Works", DateTime.Today, 2145.75m);
dataTable.Rows.Add(1004, "Wingtip Toys", DateTime.Today, 1230.00m);
dataTable.Rows.Add(1005, "Bike World", DateTime.Today, 1230.00m);
dataTable.Rows.Add(1006, "Woodgrove Bank", DateTime.Today, 1230.00m);

PdfDocument document = new PdfDocument();
PdfPageBase page = document.Pages.Add();

PdfGrid grid = new PdfGrid();
grid.DataSource = dataTable;

grid.Draw(page, new PointF(40f, 0));

document.SaveToFile("DataTableToPDF.pdf");
document.Close();

This single code block completes the entire DataTable-to-PDF export process. Below is a preview of the generated PDF:

PDF document generated from DataTable using C#

Key technical characteristics of this implementation:

  • PdfGrid.DataSource accepts a DataTable directly, with no manual row or column mapping
  • Column headers are generated automatically from DataColumn.ColumnName
  • Row data is populated from each DataRow
  • Pagination and page breaks are handled internally during rendering
  • No coordinate-level table layout logic is required

The result is a structured, paginated PDF table that accurately reflects the DataTable’s schema and data. This method is already a fully functional and production-ready solution for exporting a DataTable to PDF in C#.

In practical applications, however, additional control is often required for layout positioning, page size, orientation, and visual styling. The following sections focus on refining table placement, appearance, and pagination behavior without altering the core export logic.


4. Controlling Table Layout, Page Flow, and Pagination

In real-world documents, table rendering is part of a larger page composition. Page geometry, table start position, and pagination behavior together determine how tabular data flows across one or more pages.

In PdfGrid, these concerns are resolved during rendering. The grid itself does not manage absolute layout or page transitions; instead, layout and pagination are governed by page configuration and the parameters supplied when calling Draw.

The following example demonstrates a typical layout and pagination configuration used in production reports.

Layout and Pagination Example

PdfDocument document = new PdfDocument();

// Create an A4 page with margins
PdfPageBase page = document.Pages.Add(
    PdfPageSize.A4,
    new PdfMargins(40),
    PdfPageRotateAngle.RotateAngle0,  // Rotates the page coordinate system
    PdfPageOrientation.Landscape  // Sets the page orientation
);

PdfGrid grid = new PdfGrid();
grid.DataSource = dataTable;

// Enable header repetition across pages
grid.RepeatHeader = true;

// Define table start position
float startX = 40f;
float startY = 80f;

// Render the table
grid.Draw(page, new PointF(startX, startY));

Below is a preview of the generated PDF with page configuration applied:

PDF generated from DataTable with Page Configuration Using C#

Technical Explanation

The rendering behavior illustrated above can be understood as a sequence of layout and flow decisions applied at draw time:

  • PdfPageBase

    • Pages.Add creates a new page with configurable size, margins, rotation, and orientation.
  • RepeatHeader

    • Boolean property controlling whether column headers are rendered on each page. When enabled, headers repeat automatically during multi-page rendering.
  • Draw method

    • Accepts a PointF defining the starting position on the page.
    • Responsible for rendering the grid and automatically handling pagination.

By configuring page geometry, table start position, and pagination behavior together, PdfGrid enables predictable multi-page table rendering without manual page management or row-level layout control.

Page numbers are also important for PDF reports. Refer to How to Add Pages Numbers to PDF with C# to learn page numbering techniques.


5. Customizing Table Appearance

Once layout is stable, appearance becomes the primary concern. PdfGrid provides a centralized styling model that allows table-wide, column-level, and row-level customization without interfering with data binding or pagination.

The example below consolidates common styling configurations typically applied in reporting scenarios.

Styling Example: Headers, Rows, and Columns

PdfDocument document = new PdfDocument();
PdfPageBase page = document.AppendPage();

PdfGrid grid = new PdfGrid();
grid.DataSource = dataTable;

// Create and apply the header style
PdfGridCellStyle headerStyle = new PdfGridCellStyle();
headerStyle.Font =
    new PdfFont(PdfFontFamily.Helvetica, 10f, PdfFontStyle.Bold);
headerStyle.BackgroundBrush =
    new PdfSolidBrush(Color.FromArgb(60, 120, 200));
headerStyle.TextBrush = PdfBrushes.White;
grid.Headers.ApplyStyle(headerStyle);


// Create row styles
PdfGridCellStyle defaultStyle = new PdfGridCellStyle();
defaultStyle.Font = new PdfFont(PdfFontFamily.Helvetica, 9f);
PdfGridCellStyle alternateStyle = new PdfGridCellStyle();
alternateStyle.BackgroundBrush = new PdfSolidBrush(Color.LightSkyBlue);
// Apply row styles
for (int rowIndex = 0; rowIndex < grid.Rows.Count; rowIndex++)
{
    if (rowIndex % 2 == 0)
    {
        grid.Rows[rowIndex].ApplyStyle(defaultStyle);
    }
    else
    {
        grid.Rows[rowIndex].ApplyStyle(alternateStyle);
    }
}

// Explicit column widths
grid.Columns[0].Width = 60f;    // OrderId
grid.Columns[1].Width = 140f;   // CustomerName
grid.Columns[2].Width = 90f;    // OrderDate
grid.Columns[3].Width = 90f;    // TotalAmount

// Render the table
grid.Draw(page, new PointF(40f, 80f));

Below is a preview of the generated PDF with the above styling applied:

PDF document generated from DataTable with styling applied

Styling Behavior Notes

  • Header styling

    • Header appearance is defined through a dedicated PdfGridCellStyle and applied using grid.Headers.ApplyStyle(...).
    • This ensures all header cells share the same font, background color, text color, and alignment across pages.
  • Row styling

    • Data rows are styled explicitly via grid.Rows[i].ApplyStyle(...).
    • Alternating row appearance is controlled by the row index, making the behavior predictable and easy to extend with additional conditions if needed.
  • Column width control

    • Column widths are assigned directly through grid.Columns[index].Width.
    • Explicit widths avoid layout shifts caused by content length and produce consistent results in report-style documents.

Make sure to bind the styles before applying styles.

All styles (header, rows, and columns) are resolved before calling grid.Draw(...). The rendering process applies these styles without affecting pagination or data binding.

For more complex styling scenarios, check out How to Create and Style Tables in PDF with C#.


6. Output Options: File vs Stream

Once the table has been rendered, the final step is exporting the PDF output.
The rendering logic remains identical regardless of the output destination.

Saving to a File

Saving directly to a file is suitable for desktop applications, background jobs, and batch exports.

document.SaveToFile("DataTableReport.pdf");
document.Close();

This approach is typically used in:

  • Windows desktop applications
  • Scheduled report generation
  • Offline or server-side batch processing

Writing to a Stream (Web and API Scenarios)

In web-based systems, saving to disk is often unnecessary or undesirable. Instead, the PDF can be written directly to a stream.

using (MemoryStream stream = new MemoryStream())
{
    document.SaveToStream(stream);
    document.Close();

    byte[] pdfBytes = stream.ToArray();
    // return pdfBytes as HTTP response
}

Stream output integrates cleanly with ASP.NET controllers or minimal APIs, without the need for temporary file storage.

For a complete example of returning a generated PDF from an ASP.NET application, see how to create and return PDF documents in ASP.NET.


7. Practical Tips and Common Issues

This section focuses on issues commonly encountered in real-world projects when exporting DataTables to PDF.

7.1 Formatting Dates and Numeric Values

PdfGrid renders values using their string representation. To ensure consistent formatting, values should be normalized before binding.

Typical examples include:

  • Formatting DateTime values using a fixed culture
  • Standardizing currency precision
  • Avoiding locale-dependent formats in multi-region systems

This preparation step belongs in the data layer, not the rendering layer.

7.2 Handling Null and Empty Values

DBNull.Value may result in empty cells or inconsistent alignment. Normalizing values before binding avoids layout surprises.

row["TotalAmount"] =
    row["TotalAmount"] == DBNull.Value ? 0m : row["TotalAmount"];

This approach keeps rendering logic simple and predictable.

7.3 Preventing Table Width Overflow

Wide DataTables can exceed page width if left unconfigured.

Common mitigation strategies include:

  • Explicit column width configuration
  • Slight font size reduction
  • Switching to landscape orientation
  • Increasing page margins selectively

These adjustments should be applied at the layout level rather than modifying the underlying data.

7.4 Large DataTables and Performance Considerations

When exporting DataTables with hundreds or thousands of rows, performance characteristics become more visible.

Practical recommendations:

  • Avoid per-cell or per-row styling in large tables.
  • Prefer table-level or column-level styles
  • Use standard fonts instead of custom embedded fonts
  • Keep layout calculations simple and consistent

For example, applying styles using grid.Rows[rowIndex].ApplyStyle(...) inside a loop can introduce unnecessary overhead for large datasets. In such cases, prefer applying a unified style at the row or column collection level (e.g., grid.Rows.ApplyStyle(...)) when individual row differentiation is not required.

In addition to rendering efficiency, in web environments, PDF generation should be performed outside the request thread when possible to avoid blocking.


8. Conclusion

Exporting a DataTable to PDF in C# can be handled directly through PdfGrid without manual table construction or low-level drawing. By binding an existing DataTable, you can generate paginated PDF tables while keeping layout and appearance fully under control.

This article focused on a practical, code-first approach, covering layout positioning, styling, and data preparation as they apply in real-world export scenarios. With these patterns in place, the same workflow scales cleanly from simple reports to large, multi-page documents.

If you plan to evaluate this workflow in a real project, you can apply for a temporary license from E-ICEBLUE to test the full functionality without limitations.


FAQ: DataTable to PDF in C#

When is PdfGrid the right choice for exporting DataTables to PDF?

PdfGrid is most suitable when you need structured, paginated tables with consistent layout. It handles column generation, headers, and page breaks automatically, making it a better choice than manual drawing for reports, invoices, and audit documents.

Should formatting be handled in the DataTable or in PdfGrid?

Data normalization (such as date formats, numeric precision, and null handling) should be done before binding. PdfGrid is best used for layout and visual styling, not for value transformation.

Can PdfGrid handle large DataTables efficiently?

Yes. PdfGrid supports automatic pagination and header repetition. For large datasets, applying table-level or column-level styles instead of per-cell styling helps maintain stable performance.

Tutorial on creating Word documents in Python

Creating Word documents programmatically is a common requirement in Python applications. Reports, invoices, contracts, audit logs, and exported datasets are often expected to be delivered as editable .docx files rather than plain text or PDFs.

Unlike plain text output, a Word document is a structured document composed of sections, paragraphs, styles, and layout rules. When generating Word documents in Python, treating .docx files as simple text containers quickly leads to layout issues and maintenance problems.

This tutorial focuses on practical Word document creation in Python using Spire.Doc for Python. It demonstrates how to construct documents using Word’s native object model, apply formatting at the correct structural level, and generate .docx files that remain stable and editable as content grows.

Content Overview


1. Understanding Word Document Structure in Python

Before writing code, it is important to understand how a Word document is structured internally.

A .docx file is not a linear stream of text. It consists of multiple object layers, each with a specific responsibility:

  • Document – the root container for the entire file
  • Section – defines page-level layout such as size, margins, and orientation
  • Paragraph – represents a logical block of text
  • Run (TextRange) – an inline segment of text with character formatting
  • Style – a reusable formatting definition applied to paragraphs or runs

When you create a Word document in Python, you are explicitly constructing this hierarchy in code. Formatting and layout behave predictably only when content is added at the appropriate level.

Spire.Doc for Python provides direct abstractions for these elements, allowing you to work with Word documents in a way that closely mirrors how Word itself organizes content.


2. Creating a Basic Word Document in Python

This section shows how to generate a valid Word document in Python using Spire.Doc. The example focuses on establishing the correct document structure and essential workflow.

Installing Spire.Doc for Python

pip install spire.doc

Alternatively, you can download Spire.Doc for Python and integrate it manually.

Creating a Simple .docx File

from spire.doc import Document, FileFormat

# Create the document container
document = Document()

# Add a section (defines page-level layout)
section = document.AddSection()

# Add a paragraph to the section
paragraph = section.AddParagraph()
paragraph.AppendText(
    "This document was generated using Python. "
    "It demonstrates basic Word document creation with Spire.Doc."
)

# Save the document
document.SaveToFile("basic_document.docx", FileFormat.Docx)
document.Close()

This example creates a minimal but valid .docx file that can be opened in Microsoft Word. It demonstrates the essential workflow: creating a document, adding a section, inserting a paragraph, and saving the file.

Basic Word document generated with Python

From a technical perspective:

  • The Document object represents the Word file structure and manages its content.
  • The Section defines the page-level layout context for paragraphs.
  • The Paragraph contains the visible text and serves as the basic unit for all paragraph-level formatting.

All Word documents generated with Spire.Doc follow this same structural pattern, which forms the foundation for more advanced operations.


3. Adding and Formatting Text Content

Text in a Word document is organized hierarchically. Formatting can be applied at the paragraph level (controlling alignment, spacing, indentation, etc.) or the character level (controlling font, size, color, bold, italic, etc.). Styles provide a convenient way to store these formatting settings so they can be consistently applied to multiple paragraphs or text ranges without redefining the formatting each time. Understanding the distinction between paragraph formatting, character formatting, and styles is essential when creating or editing Word documents in Python.

Adding and Setting Paragraph Formatting

All visible text in a Word document must be added through paragraphs, which serve as containers for text and layout. Paragraph-level formatting controls alignment, spacing, and indentation, and can be set directly via the Paragraph.Format property. Character-level formatting, such as font size, bold, or color, can be applied to text ranges within the paragraph via the TextRange.CharacterFormat property.

from spire.doc import Document, HorizontalAlignment, FileFormat, Color

document = Document()
section = document.AddSection()

# Add the title paragraph
title = section.AddParagraph()
title.Format.HorizontalAlignment = HorizontalAlignment.Center
title.Format.AfterSpacing = 20  # Space after the title
title.Format.BeforeSpacing = 20
title_range = title.AppendText("Monthly Sales Report")
title_range.CharacterFormat.FontSize = 18
title_range.CharacterFormat.Bold = True
title_range.CharacterFormat.TextColor = Color.get_LightBlue()

# Add the body paragraph
body = section.AddParagraph()
body.Format.FirstLineIndent = 20
body_range = body.AppendText(
    "This report provides an overview of monthly sales performance, "
    "including revenue trends across different regions and product categories. "
    "The data presented below is intended to support management decision-making."
)
body_range.CharacterFormat.FontSize = 12

# Save the document
document.SaveToFile("formatted_paragraph.docx", FileFormat.Docx)
document.Close()

Below is a preview of the generated Word document.

Formatted paragraph in Word document generated with Python

Technical notes

  • Paragraph.Format sets alignment, spacing, and indentation for the entire paragraph
  • AppendText() returns a TextRange object, which allows character-level formatting (font size, bold, color)
  • Every paragraph must belong to a section, and paragraph order determines reading flow and pagination

Creating and Applying Styles

Styles allow you to define paragraph-level and character-level formatting once and reuse it across the document. They can store alignment, spacing, font, and text emphasis, making formatting more consistent and easier to maintain. Word documents support both custom styles and built-in styles, which must be added to the document before being applied.

Creating and Applying a Custom Paragraph Style

from spire.doc import (
    Document, HorizontalAlignment, BuiltinStyle,
    TextAlignment, ParagraphStyle, FileFormat
)

document = Document()

# Create a new custom paragraph style
custom_style = ParagraphStyle(document)
custom_style.Name = "CustomStyle"
custom_style.ParagraphFormat.HorizontalAlignment = HorizontalAlignment.Center
custom_style.ParagraphFormat.TextAlignment = TextAlignment.Auto
custom_style.CharacterFormat.Bold = True
custom_style.CharacterFormat.FontSize = 20

# Inherit properties from a built-in heading style
custom_style.ApplyBaseStyle(BuiltinStyle.Heading1)

# Add the style to the document
document.Styles.Add(custom_style)

# Apply the custom style
title_para = document.AddSection().AddParagraph()
title_para.ApplyStyle(custom_style.Name)
title_para.AppendText("Regional Performance Overview")

Adding and Applying Built-in Styles

# Add a built-in style to the document
built_in_style = document.AddStyle(BuiltinStyle.Heading2)
document.Styles.Add(built_in_style)

# Apply the built-in style
heading_para = document.Sections.get_Item(0).AddParagraph()
heading_para.ApplyStyle(built_in_style.Name)
heading_para.AppendText("Sales by Region")

document.SaveToFile("document_styles.docx", FileFormat.Docx)

Preview of the generated Word document.

Word document with custom and built-in styles applied

Technical Explanation

  • ParagraphStyle(document) creates a reusable style object associated with the current document
  • ParagraphFormat controls layout-related settings such as alignment and text flow
  • CharacterFormat defines font-level properties like size and boldness
  • ApplyBaseStyle() allows the custom style to inherit semantic meaning and default behavior from a built-in Word style
  • Adding the style to document.Styles makes it available for use across all sections

Built-in styles, such as Heading 2, can be added explicitly and applied in the same way, ensuring the document remains compatible with Word features like outlines and tables of contents.


4. Inserting Images into a Word Document

In Word’s document model, images are embedded objects that belong to paragraphs, which ensures they flow naturally with text. Paragraph-anchored images adjust pagination automatically and maintain relative positioning when content changes.

Adding an Image to a Paragraph

from spire.doc import Document, TextWrappingStyle, HorizontalAlignment, FileFormat

document = Document()
section = document.AddSection()
section.AddParagraph().AppendText("\r\n\r\nExample Image\r\n")

# Insert an image
image_para = section.AddParagraph()
image_para.Format.HorizontalAlignment = HorizontalAlignment.Center
image = image_para.AppendPicture("Screen.jpg")

# Set the text wrapping style
image.TextWrappingStyle = TextWrappingStyle.Square
# Set the image size
image.Width = 350
image.Height = 200
# Set the transparency
image.FillTransparency(0.7)
# Set the horizontal alignment
image.HorizontalAlignment = HorizontalAlignment.Center

document.SaveToFile("document_images.docx", FileFormat.Docx)

Preview of the generated Word document.

Word document with an image inserted generated with Python

Technical details

  • AppendPicture() inserts the image into the paragraph, making it part of the text flow
  • TextWrappingStyle determines how surrounding text wraps around the image
  • Width and Height control the displayed size of the image
  • FillTransparency() sets the image opacity
  • HorizontalAlignment can center the image within the paragraph

Adding images to paragraphs ensures they behave like part of the text flow.

  • Pagination adjusts automatically when images change size.
  • Surrounding text reflows correctly when content is edited.
  • When exporting to formats like PDF, images maintain their relative position.

These behaviors are consistent with Word’s handling of inline images.

For more advanced image operations in Word documents using Python, see how to insert images into a Word document with Python for a complete guide.


5. Creating and Populating Tables

Tables are commonly used to present structured data such as reports, summaries, and comparisons.

Internally, a table consists of rows, cells, and paragraphs inside each cell.

Creating and Formatting a Table in a Word Document

from spire.doc import Document, DefaultTableStyle, FileFormat, AutoFitBehaviorType

document = Document()
section = document.AddSection()
section.AddParagraph().AppendText("\r\n\r\nExample Table\r\n")

# Define the table data
table_headers = ["Region", "Product", "Units Sold", "Unit Price ($)", "Total Revenue ($)"]
table_data = [
    ["North", "Laptop", 120, 950, 114000],
    ["North", "Smartphone", 300, 500, 150000],
    ["South", "Laptop", 80, 950, 76000],
    ["South", "Smartphone", 200, 500, 100000],
    ["East", "Laptop", 150, 950, 142500],
    ["East", "Smartphone", 250, 500, 125000],
    ["West", "Laptop", 100, 950, 95000],
    ["West", "Smartphone", 220, 500, 110000]
]

# Add a table to the section
table = section.AddTable()
table.ResetCells(len(table_data) + 1, len(table_headers))

# Populate table headers
for col_index, header in enumerate(table_headers):
    header_range = table.Rows[0].Cells[col_index].AddParagraph().AppendText(header)
    header_range.CharacterFormat.FontSize = 14
    header_range.CharacterFormat.Bold = True

# Populate table data
for row_index, row_data in enumerate(table_data):
    for col_index, cell_data in enumerate(row_data):
        data_range = table.Rows[row_index + 1].Cells[col_index].AddParagraph().AppendText(str(cell_data))
        data_range.CharacterFormat.FontSize = 12

# Apply a default table style and auto-fit columns
table.ApplyStyle(DefaultTableStyle.ColorfulListAccent6)
table.AutoFit(AutoFitBehaviorType.AutoFitToContents)

document.SaveToFile("document_tables.docx", FileFormat.Docx)

Preview of the generated Word document.

Word document with a table generated with Python

Technical details

  • Section.AddTable() inserts the table into the section content flow
  • ResetCells(rows, columns) defines the table grid explicitly
  • Table[row, column] or Table.Rows[row].Cells[col] returns a TableCell

Tables in Word are designed so that each cell acts as an independent content container. Text is always inserted through paragraphs, and each cell can contain multiple paragraphs, images, or formatted text. This structure allows tables to scale from simple grids to complex report layouts, making them flexible for reports, summaries, or any structured content.

For more detailed examples and advanced operations using Python, such as dynamically generating tables, merging cells, or formatting individual cells, see how to insert tables into Word documents with Python for a complete guide.


6. Adding Headers and Footers

Headers and footers in Word are section-level elements. They are not part of the main content flow and do not affect body pagination.

Each section owns its own header and footer, which allows different parts of a document to display different repeated content.

Adding Headers and Footers in a Section

from spire.doc import Document, FileFormat, HorizontalAlignment, FieldType, BreakType

document = Document()
section = document.AddSection()
section.AddParagraph().AppendBreak(BreakType.PageBreak)

# Add a header
header = section.HeadersFooters.Header
header_para1 = header.AddParagraph()
header_para1.AppendText("Monthly Sales Report").CharacterFormat.FontSize = 12
header_para1.Format.HorizontalAlignment = HorizontalAlignment.Left

header_para2 = header.AddParagraph()
header_para2.AppendText("Company Name").CharacterFormat.FontSize = 12
header_para2.Format.HorizontalAlignment = HorizontalAlignment.Right

# Add a footer with page numbers
footer = section.HeadersFooters.Footer
footer_para = footer.AddParagraph()
footer_para.Format.HorizontalAlignment = HorizontalAlignment.Center
footer_para.AppendText("Page ").CharacterFormat.FontSize = 12
footer_para.AppendField("PageNum", FieldType.FieldPage).CharacterFormat.FontSize = 12
footer_para.AppendText(" of ").CharacterFormat.FontSize = 12
footer_para.AppendField("NumPages", FieldType.FieldNumPages).CharacterFormat.FontSize = 12

document.SaveToFile("document_header_footer.docx", FileFormat.Docx)
document.Dispose()

Preview of the generated Word document.

Word document with a header and footer generated with Python

Technical notes

  • section.HeadersFooters.Header / .Footer provides access to header/footer of the section
  • AppendField() inserts dynamic fields like FieldPage or FieldNumPages to display dynamic content

Headers and footers are commonly used for report titles, company information, and page numbering. They update automatically as the document changes and are compatible with Word, PDF, and other export formats.

For more detailed examples and advanced operations, see how to insert headers and footers in Word documents with Python.


7. Controlling Page Layout with Sections

In Spire.Doc for Python, all page-level layout settings are managed through the Section object. Page size, orientation, and margins are defined by the section’s PageSetup and apply to all content within that section.

Configuring Page Size and Orientation

from spire.doc import PageSize, PageOrientation

section.PageSetup.PageSize = PageSize.A4()
section.PageSetup.Orientation = PageOrientation.Portrait

Technical explanation

  • PageSetup is a layout configuration object owned by the Section
  • PageSize defines the physical dimensions of the page
  • Orientation controls whether pages are rendered in portrait or landscape mode

PageSetup defines the layout for the entire section. All paragraphs, tables, and images added to the section will follow these settings. Changing PageSetup in one section does not affect other sections in the document, allowing different sections to have different page layouts.

Setting Page Margins

section.PageSetup.Margins.Top = 50
section.PageSetup.Margins.Bottom = 50
section.PageSetup.Margins.Left = 60
section.PageSetup.Margins.Right = 60

Technical explanation

  • Margins defines the printable content area for the section
  • Margin values are measured in document units

Margins control the body content area for the section. They are evaluated at the section level, so you do not need to set them for individual paragraphs, and header/footer areas are not affected.

Using Multiple Sections for Different Layouts

When a document requires different page layouts, additional sections must be created.

landscape_section = document.AddSection()
landscape_section.PageSetup.Orientation = PageOrientation.Landscape

Technical notes

  • AddSection() creates a new section and appends it to the document
  • Each section maintains its own PageSetup, headers, and footers
  • Content added after this call belongs to the new section

Using multiple sections allows mixing portrait and landscape pages or applying different layouts within a single Word document.

Below is an example preview of the above settings in a Word document:

Settting Page Layout in a Word Document Using Spire.Doc for Python


8. Setting Document Properties and Metadata

In addition to visible content, Word documents expose metadata through built-in document properties. These properties are stored at the document level and do not affect layout or rendering.

Assigning Built-in Document Properties

document.BuiltinDocumentProperties.Title = "Monthly Sales Report"
document.BuiltinDocumentProperties.Author = "Data Analytics System"
document.BuiltinDocumentProperties.Company = "Example Corp"

Technical notes

  • BuiltinDocumentProperties provides access to standard document properties
  • Properties such as Title, Author, and Company can be set programmatically

Document properties are commonly used for file indexing, search, document management, and audit workflows. In addition to built-in properties, Word documents support other metadata such as Keywords, Subject, Comments, and Hyperlink base. You can also define custom properties using Document.CustomDocumentProperties.

For a guide on managing document custom properties with Python, see how to manage custom metadata in Word documents with Python.


9. Saving, Exporting, and Performance Considerations

After constructing a Word document in memory, the final step is saving or exporting it to the required output format. Spire.Doc for Python supports multiple export formats through a unified API, allowing the same document structure to be reused without additional formatting logic.

Saving and Exporting Word Documents in Multiple Formats

A document can be saved as DOCX for editing or exported to other commonly used formats for distribution.

from spire.doc import FileFormat

document.SaveToFile("output.docx", FileFormat.Docx)
document.SaveToFile("output.pdf", FileFormat.PDF)
document.SaveToFile("output.html", FileFormat.Html)
document.SaveToFile("output.rtf", FileFormat.Rtf)

The export process preserves document structure, including sections, tables, images, headers, and footers, ensuring consistent layout across formats. Check out all the supported formats in the FileFormat enumeration.

Performance Considerations for Document Generation

For scenarios involving frequent or large-scale Word document generation, performance can be improved by:

  • Reusing document templates and styles
  • Avoiding unnecessary section creation
  • Writing documents to disk only after all content has been generated
  • After saving or exporting, explicitly releasing resources using document.Close()

When generating many similar documents with different data, mail merge is more efficient than inserting content programmatically for each file. Spire.Doc for Python provides built-in mail merge support for batch document generation. For details, see how to generate Word documents in bulk using mail merge in Python.

Saving and exporting are integral parts of Word document generation in Python. By using Spire.Doc for Python’s export capabilities and following basic performance practices, Word documents can be generated efficiently and reliably for both individual files and batch workflows.


10. Common Pitfalls When Creating Word Documents in Python

The following issues frequently occur when generating Word documents programmatically.

Treating Word Documents as Plain Text

Issue Formatting breaks when content length changes.

Recommendation Always work with sections, paragraphs, and styles rather than inserting raw text.

Hard-Coding Formatting Logic

Issue Global layout changes require editing multiple code locations.

Recommendation Centralize formatting rules using styles and section-level configuration.

Ignoring Section Boundaries

Issue Margins or orientation changes unexpectedly affect the entire document.

Recommendation Use separate sections to isolate layout rules.


11. Conclusion

Creating Word documents in Python involves more than writing text to a file. A .docx document is a structured object composed of sections, paragraphs, styles, and embedded elements.

By using Spire.Doc for Python and aligning code with Word’s document model, you can generate editable, well-structured Word files that remain stable as content and layout requirements evolve. This approach is especially suitable for backend services, reporting pipelines, and document automation systems.

For scenarios involving large documents or document conversion requirements, a licensed version is required.

Create Excel file in Python using code

Creating Excel files in Python is a common requirement in data-driven applications. When application data needs to be delivered in a format that business users can easily review and share, Excel remains one of the most practical and widely accepted choices.

In real projects, generating an Excel file with Python is often the starting point of an automated process. Data may come from databases, APIs, or internal services, and Python is responsible for turning that data into a structured Excel file that follows a consistent layout and naming convention.

This article shows how to create Excel files in Python, from generating a workbook from scratch, to writing data, applying basic formatting, and updating existing files when needed. All examples are presented from a practical perspective, focusing on how Excel files are created and used in real automation scenarios.

Table of Contents


1. Typical Scenarios for Creating Excel Files with Python

Creating Excel files with Python usually happens as part of a larger system rather than a standalone task. Common scenarios include:

  • Generating daily, weekly, or monthly business reports
  • Exporting database query results for analysis or auditing
  • Producing Excel files from backend services or batch jobs
  • Automating data exchange between internal systems or external partners

In these situations, Python is often used to generate Excel files automatically, helping teams reduce manual effort while ensuring data consistency and repeatability.


2. Environment Setup: Preparing to Create Excel Files in Python

In this tutorial, we use Free Spire.XLS for Python to demonstrate Excel file operations. Before generating Excel files with Python, ensure that the development environment is ready.

Python Version

Any modern Python 3.x version is sufficient for Excel automation tasks.

Free Spire.XLS for Python can be installed via pip:

pip install spire.xls.free

You can also download Free Spire.XLS for Python and include it in your project manually.

The library works independently of Microsoft Excel, which makes it suitable for server environments, scheduled jobs, and automated workflows where Excel is not installed.


3. Creating a New Excel File from Scratch in Python

This section focuses on creating an Excel file from scratch using Python. The goal is to define a basic workbook structure, including worksheets and header rows, before any data is written.

By generating the initial layout programmatically, you can ensure that all output files share the same structure and are ready for later data population.

Example: Creating a Blank Excel Template

from spire.xls import Workbook, FileFormat

# Initialize a new workbook
workbook = Workbook()

# Access the default worksheet
sheet = workbook.Worksheets[0]
sheet.Name = "Template"

# Add a placeholder title
sheet.Range["B2"].Text = "Monthly Report Template"

# Save the Excel file
workbook.SaveToFile("template.xlsx", FileFormat.Version2016)
workbook.Dispose()

The preview of the template file:

Creating a Blank Excel Template Using Python

In this example:

  • Workbook() creates a new Excel workbook that already contains three default worksheets.
  • The first worksheet is accessed via Worksheets[0] and renamed to define the basic structure.
  • The Range[].Text property writes text to a specific cell, allowing you to set titles or placeholders before real data is added.
  • The SaveToFile() method saves the workbook to an Excel file. And FileFormat.Version2016 specifies the Excel version or format to use.

Creating Excel Files with Multiple Worksheets in Python

In Python-based Excel generation, a single workbook can contain multiple worksheets to organize related data logically. Each worksheet can store a different data set, summary, or processing result within the same file.

The following example shows how to create an Excel file with multiple worksheets and write data to each one.

from spire.xls import Workbook, FileFormat

workbook = Workbook()

# Default worksheet
data_sheet = workbook.Worksheets[0]
data_sheet.Name = "Raw Data"

# Remove the second default worksheet
workbook.Worksheets.RemoveAt(1)

# Add a summary worksheet
summary_sheet = workbook.Worksheets.Add("Summary")

summary_sheet.Range["A1"].Text = "Summary Report"

workbook.SaveToFile("multi_sheet_report.xlsx", FileFormat.Version2016)
workbook.Dispose()

This pattern is commonly combined with read/write workflows, where raw data is imported into one worksheet and processed results are written to another.

Excel File Formats in Python Automation

When creating Excel files programmatically in Python, XLSX is the most commonly used format and is fully supported by modern versions of Microsoft Excel. It supports worksheets, formulas, styles, and is suitable for most automation scenarios.

In addition to XLSX, Spire.XLS for Python supports generating several common Excel formats, including:

  • XLSX – the default format for modern Excel automation
  • XLS – legacy Excel format for compatibility with older systems
  • CSV – plain-text format often used for data exchange and imports

In this article, all examples use the XLSX format, which is recommended for report generation, structured data exports, and template-based Excel files. You can check the FileFormat enumeration for a complete list of supported formats.


4. Writing Structured Data to an XLSX File Using Python

In real applications, data written to Excel rarely comes from hard-coded lists. It is more commonly generated from database queries, API responses, or intermediate processing results.

A typical pattern is to treat Excel as the final delivery format for already-structured data.

Python Example: Generating a Monthly Sales Report from Application Data

Assume your application has already produced a list of sales records, where each record contains product information and calculated totals. In this example, sales data is represented as a list of dictionaries, simulating records returned from an application or service layer.

from spire.xls import Workbook

workbook = Workbook()
sheet = workbook.Worksheets[0]
sheet.Name = "Sales Report"

headers = ["Product", "Quantity", "Unit Price", "Total Amount"]
for col, header in enumerate(headers, start=1):
    sheet.Range[1, col].Text = header

# Data typically comes from a database or service layer
sales_data = [
    {"product": "Laptop", "qty": 15, "price": 1200},
    {"product": "Monitor", "qty": 30, "price": 250},
    {"product": "Keyboard", "qty": 50, "price": 40},
    {"product": "Mouse", "qty": 80, "price": 20},
    {"product": "Headset", "qty": 100, "price": 10}
]

row = 2
for item in sales_data:
    sheet.Range[row, 1].Text = item["product"]
    sheet.Range[row, 2].NumberValue = item["qty"]
    sheet.Range[row, 3].NumberValue = item["price"]
    sheet.Range[row, 4].NumberValue = item["qty"] * item["price"]
    row += 1

workbook.SaveToFile("monthly_sales_report.xlsx")
workbook.Dispose()

The preview of the monthly sales report:

Generating a Monthly Sales Report from Application Data Using Python

In this example, text values such as product names are written using the CellRange.Text property, while numeric fields use CellRange.NumberValue. This ensures that quantities and prices are stored as numbers in Excel, allowing proper calculation, sorting, and formatting.

This approach scales naturally as the dataset grows and keeps business logic separate from Excel output logic. For more Excel writing examples, please refer to the How to Automate Excel Writing in Python.


5. Formatting Excel Data for Real-World Reports in Python

In real-world reporting, Excel files are often delivered directly to stakeholders. Raw data without formatting can be difficult to read or interpret.

Common formatting tasks include:

  • Making header rows visually distinct
  • Applying background colors or borders
  • Formatting numbers and currencies
  • Automatically adjusting column widths

The following example demonstrates how these common formatting operations can be applied together to improve the overall readability of a generated Excel report.

Python Example: Improving Excel Report Readability

from spire.xls import Workbook, Color, LineStyleType

# Load the created Excel file
workbook = Workbook()
workbook.LoadFromFile("monthly_sales_report.xlsx")

# Get the first worksheet
sheet = workbook.Worksheets[0]

# Format header row
header_range = sheet.Range.Rows[0] # Get the first used row
header_range.Style.Font.IsBold = True
header_range.Style.Color = Color.get_LightBlue()

# Apply currency format
sheet.Range["C2:D6"].NumberFormat = "$#,##0.00"

# Format data rows
for i in range(1, sheet.Range.Rows.Count):
    if i % 2 == 0:
        row_range = sheet.Range[i, 1, i, sheet.Range.Columns.Count]
        row_range.Style.Color = Color.get_LightGreen()
    else:
        row_range = sheet.Range[i, 1, i, sheet.Range.Columns.Count]
        row_range.Style.Color = Color.get_LightYellow()

# Add borders to data rows
sheet.Range["A2:D6"].BorderAround(LineStyleType.Medium, Color.get_LightBlue())

# Auto-fit column widths
sheet.AllocatedRange.AutoFitColumns()

# Save the formatted Excel file
workbook.SaveToFile("monthly_sales_report_formatted.xlsx")
workbook.Dispose()

The preview of the formatted monthly sales report:

Improving Excel Report Readability Using Python

While formatting is not strictly required for data correctness, it is often expected in business reports that are shared or archived. Check How to Format Excel Worksheets with Python for more advanced formatting techniques.


6. Reading and Updating Existing Excel Files in Python Automation

Updating an existing Excel file usually involves locating the correct row before writing new values. Instead of updating a fixed cell, automation scripts often scan rows to find matching records and apply updates conditionally.

Python Example: Updating an Excel File

from spire.xls import Workbook

workbook = Workbook()
workbook.LoadFromFile("monthly_sales_report.xlsx")
sheet = workbook.Worksheets[0]

# Locate the target row by product name
for row in range(2, sheet.LastRow + 1):
    product_name = sheet.Range[row, 1].Text
    if product_name == "Laptop":
        sheet.Range[row, 5].Text = "Reviewed"
        break

sheet.Range["E1"].Text = "Status"

workbook.SaveToFile("monthly_sales_report_updated.xlsx")
workbook.Dispose()

The preview of the updated monthly sales report:

Updating Excel Worksheet Using Python


7. Combining Read and Write Operations in a Single Workflow

When working with imported Excel files, raw data is often not immediately suitable for reporting or further analysis. Common issues include duplicated records, inconsistent values, or incomplete rows.

This section demonstrates how to read existing Excel data, normalize it, and write the processed result to a new file using Python.

In real-world automation systems, Excel files are often used as intermediate data carriers rather than final deliverables.
They may be imported from external platforms, manually edited by different teams, or generated by legacy systems before being processed further.

As a result, raw Excel data frequently contains issues such as:

  • Multiple rows for the same business entity
  • Inconsistent or non-numeric values
  • Empty or incomplete records
  • Data structures that are not suitable for reporting or analysis

A common requirement is to read unrefined Excel data, apply normalization rules in Python, and write the cleaned results into a new worksheet that downstream users can rely on.

Python Example: Normalizing and Aggregating Imported Sales Data

In this example, a raw sales Excel file contains multiple rows per product.
The goal is to generate a clean summary worksheet where each product appears only once, with its total sales amount calculated programmatically.

from spire.xls import Workbook, Color

workbook = Workbook()
workbook.LoadFromFile("raw_sales_data.xlsx")

source = workbook.Worksheets[0]
summary = workbook.Worksheets.Add("Summary")

# Define headers for the normalized output
summary.Range["A1"].Text = "Product"
summary.Range["B1"].Text = "Total Sales"

product_totals = {}

# Read raw data and aggregate values by product
for row in range(2, source.LastRow + 1):
    product = source.Range[row, 1].Text
    value = source.Range[row, 4].Value

    # Skip incomplete or invalid rows
    if not product or value is None:
        continue

    try:
        amount = float(value)
    except ValueError:
        continue

    if product not in product_totals:
        product_totals[product] = 0

    product_totals[product] += amount

# Write aggregated results to the summary worksheet
target_row = 2
for product, total in product_totals.items():
    summary.Range[target_row, 1].Text = product
    summary.Range[target_row, 2].NumberValue = total
    target_row += 1
# Create a total row
summary.Range[summary.LastRow, 1].Text = "Total"
summary.Range[summary.LastRow, 2].Formula = "=SUM(B2:B" + str(summary.LastRow - 1) + ")"

# Format the summary worksheet
summary.Range.Style.Font.FontName = "Arial"
summary.Range[1, 1, 1, summary.LastColumn].Style.Font.Size = 12
summary.Range[1, 1, 1, summary.LastColumn].Style.Font.IsBold = True
for row in range(2, summary.LastRow + 1):
    for column in range(1, summary.LastColumn + 1):
        summary.Range[row, column].Style.Font.Size = 10
summary.Range[summary.LastRow, 1, summary.LastRow, summary.LastColumn].Style.Color = Color.get_LightGray()
summary.Range.AutoFitColumns()

workbook.SaveToFile("normalized_sales_summary.xlsx")
workbook.Dispose()

The preview of the normalized sales summary:

Normalizing and Aggregating Excel Data Using Python

Python handles data validation, aggregation, and normalization logic, while Excel remains the final delivery format for business users—eliminating the need for manual cleanup or complex spreadsheet formulas.


Choosing the Right Python Approach for Excel File Creation

Python offers multiple ways to create Excel files, and the best approach depends on how Excel is used in your workflow.

Free Spire.XLS for Python is particularly well-suited for scenarios where:

  • Excel files are generated or updated without Microsoft Excel installed
  • Files are produced by backend services, batch jobs, or scheduled tasks
  • You need precise control over worksheet structure, formatting, and formulas
  • Excel is used as a delivery or interchange format, not as an interactive analysis tool

For data exploration or statistical analysis, Python users may rely on other libraries upstream, while using Excel generation libraries like Free Spire.XLS for producing structured, presentation-ready files at the final stage.

This separation keeps data processing logic in Python and presentation logic in Excel, improving maintainability and reliability.

For more detailed guidance and examples, see the Spire.XLS for Python Tutorial.


8. Common Issues When Creating and Writing Excel Files in Python

When automating Excel generation, several practical issues are frequently encountered.

  • File path and permission errors

    Always verify that the target directory exists and that the process has write access before saving files.

  • Unexpected data types

    Explicitly control whether values are written as text or numbers to avoid calculation errors in Excel.

  • Accidental file overwrites

    Use timestamped filenames or output directories to prevent overwriting existing reports.

  • Large datasets

    When handling large volumes of data, write rows sequentially and avoid unnecessary formatting operations inside loops.

Addressing these issues early helps ensure Excel automation remains reliable as data size and complexity grow.


9. Conclusion

Creating Excel files in Python is a practical solution for automating reporting, data export, and document updates in real business environments. By combining file creation, structured data writing, formatting, and update workflows, Excel automation can move beyond one-off scripts and become part of a stable system.

Spire.XLS for Python provides a reliable way to implement these operations in environments where automation, consistency, and maintainability are essential. You can apply a temporary license to unlock the full potential of Python automation in Excel file processing.


FAQ: Creating Excel Files in Python

Can Python create Excel files without Microsoft Excel installed?

Yes. Libraries such as Spire.XLS for Python operate independently of Microsoft Excel, making them suitable for servers, cloud environments, and automated workflows.

Is Python suitable for generating large Excel files?

Python can generate large Excel files effectively, provided that data is written sequentially and unnecessary formatting operations inside loops are avoided.

How can I prevent overwriting existing Excel files?

A common approach is to use timestamped filenames or dedicated output directories when saving generated Excel reports.

Can Python update Excel files created by other systems?

Yes. Python can read, modify, and extend Excel files created by other applications, as long as the file format is supported.

See Also

Tutorial on how to convert XML to XLSX files: from basic to advanced

Converting XML to XLSX is a common requirement in data processing, reporting workflows, and system integration tasks. XML remains one of the most commonly used formats for structured or semi-structured data, but Excel’s XLSX format is far more convenient for analysis, filtering, visualization, and sharing with non-technical users.

Although the basic idea of transforming XML files into XLSX files sounds simple, real-world XML files vary widely in structure. Some resemble clean database tables, while others contain deeply nested nodes, attributes, or mixed content.

This guide provides a detailed, practical explanation of how to convert XML to XLSX using online tools, Microsoft Excel, and Python automation. It also discusses how to handle complex scenarios such as large datasets, nested elements, optional fields, and reverse conversion from XLSX back to XML.

Methods Overview:


1. Understanding XML to XLSX Conversion

XML (Extensible Markup Language) is a simple text format that stores data using tags, forming a tree-like structure where parent elements contain children, and information may appear as either elements or attributes. XLSX, by contrast, is strictly row-and-column based, so converting XML to XLSX means flattening this tree into a table while keeping the data meaningful.

For straightforward XML—for example, a file with repeated <item> nodes—each node naturally becomes a row and its children become columns. But real-world XML often contains:

  • nested details
  • nodes that appear only in some records
  • data stored in attributes
  • namespaces used in enterprise systems

Such variations require decisions on how to flatten the hierarchy. Some tools do this automatically, while others need manual mapping. This guide covers both simple and complex cases, including how to convert XML to XLSX without opening Excel, which is common in automated workflows.


2. Method 1: Convert XML to XLSX Online

Online XML-to-XLSX converters—such as Convertion Tools, AConvert, or DataConverter.io—are convenient when you need a quick transformation without installing software. The process is typically very simple:

  1. Visit a website that supports XML-to-XLSX conversion(such as DataConverter.io).

    DataConverter.io XML to XLSX Converter

  2. Upload your XML file or paste the XML string.

  3. Some converter allow you to edit the mapping before conversion.

    Convert XML to XLSX using DataConverter.io Online Tool

  4. Click Download to download the generated .xlsx file.

This method works well for one-time tasks and for XML files with straightforward structures where automatic mapping is usually accurate.

Advantages

  • Fast, no installation required.
  • Suitable for simple or moderate XML structures.
  • Ideal for one-time or occasional conversions.

Limitations

  • Limited understanding of schemas, namespaces, and nested hierarchies.
  • Deep XML may be flattened incorrectly, produce generic column names, or lose optional fields.
  • Upload size limits and possible browser freezes with large files.

Despite these constraints, online tools remain a practical choice for quick, small-scale XML-to-XLSX conversions.

You may also like: How to Convert CSV to Excel Files.


3. Method 2: Convert XML to XLSX in Excel

Excel provides native support for XML import, and for many users, this is the most transparent and controllable method. When used properly, Excel can read XML structures, apply customizable mappings, and save the converted result directly as an XLSX file.

3.1 Opening XML Directly in Excel

When you open an XML file through File → Open, Excel attempts to infer a schema and convert the data into a table. The correct sequence for this method is:

  1. Go to File → Open and select the XML file.

  2. When prompted, choose “As an XML table”.

    Open As an XML Table in Excel

  3. Excel loads the XML and automatically maps child nodes to columns.

This works well for “flat” XML structures, where each repeating element corresponds neatly to a row. However, hierarchical XML often causes issues: nested nodes may be expanded into repeated columns, or Excel may prompt you to define an XML table manually if it cannot determine a clear mapping.

This direct-open method remains useful when the XML resembles a database-style list of records and you need a fast way to inspect or work with the data.

3.2 Importing XML via Excel’s Data Tab

For structured XML files—especially those based on XSD schemas—Excel provides a more user-friendly import method through the Data tab. This approach gives you control over how XML elements are mapped to the worksheet without manually using the XML Source pane.

Steps:

  1. Open an Excel workbook or create a new one.

  2. Go to Data → Get Data → From File → From XML.

    Excel XML Import

  3. Select your XML file and click Import.

  4. Click Transform Data in the pop-up window.

  5. In the Power Query Editor window, select the elements or tables you want to load.

    Excel Power Query Editor

  6. Click Close & Load to save the changes, and the converted data will appear in a new worksheet.

This method allows Excel to automatically interpret the XML structure and map it into a table. It works well for hierarchical XML because you can select which sections to load, keeping optional fields and relationships intact.

This approach is especially useful for:

  • Importing government e-form data
  • Working with ERP/CRM exported XML
  • Handling industry-specific standards such as UBL or HL7

By using this workflow, you can efficiently control how XML data is represented in Excel while minimizing manual mapping steps.

3.3 Saving the Imported XML Data as an XLSX File

Once the XML data has been successfully imported—whether by directly opening the XML file or using Data → Get Data → From XML—the final step is simply saving the workbook in Excel’s native .xlsx format. At this stage, the data behaves like any other Excel table, meaning you can freely adjust column widths, apply filters, format cells, or add formulas.

To save the converted XML as an XLSX file:

  1. Go to File → Save As.
  2. Choose Excel Workbook (*.xlsx) as the file type.
  3. Specify a location and click Save.

Below is a preview of the Excel table imported from XML:

Result of importing XML to XLSX using Excel

If the XML file is based on an XSD schema and the mapping is preserved, Excel can even export the modified worksheet back to XML. However, for deeply nested XML structures, some preprocessing or manual adjustments might still be required before export.


4. Method 3: Convert XML to XLSX Using Python

Python is an excellent choice for converting XML to XLSX when you require automation, large-scale processing, or the ability to perform XML to XLSX conversion without opening Excel. Python scripts can run on servers, schedule tasks, and handle hundreds or thousands of XML files consistently.

4.1 Parsing XML in Python

Parsing XML is the first step in the workflow. Python’s xml.etree.ElementTree or lxml libraries provide event-based or tree-based parsing. They allow you to walk through each node, extract attributes, handle namespaces, and process deeply nested data.

The main challenge is defining how each XML node maps to an Excel row. Most workflows use either:

  • a predefined mapping (e.g., a “schema” defined in code), or
  • an auto-flattening logic that recursively converts nodes into columns.

Core XML Parsing Example:

The following Python code demonstrates how to parse an XML file and flatten it into a list of dictionaries, which can be used to generate an XLSX file.

import xml.etree.ElementTree as ET

xml_file = "Orders.xml"

# Recursively flatten an XML element into a flat dictionary
def flatten(e, prefix=""):
    r = {}
    # Add attributes
    for k, v in e.attrib.items():
        r[prefix + k] = v
    # Add children
    for c in e:
        key = prefix + c.tag
        # Scalar node (no children, has text)
        if len(c) == 0 and c.text and c.text.strip():
            r[key] = c.text.strip()
        else:
            # Nested node → recurse
            r.update(flatten(c, key + "_"))
    return r

# Parse XML
root = ET.parse(xml_file).getroot()

# Flatten all <Order> elements
rows = [flatten(order) for order in root.iter("Order")]

# Collect headers
headers = sorted({k for row in rows for k in row})

This snippet illustrates how to recursively flatten XML nodes and attributes into a structure suitable for Excel. For complex XML, this ensures that no data is lost and that each node maps to the correct column.


4.2 Generating XLSX Files from Parsed XML

Once the XML is parsed and flattened, the next step is writing the data into an Excel .xlsx file. Python libraries such as Free Spire.XLS for Python enable full spreadsheet creation without needing Excel installed, which is ideal for Linux servers or cloud environments.

Install Free Spire.XLS for Python:

pip install spire.xls.free

Steps for generating XLSX:

  1. Create a new workbook.
  2. Write headers and rows from the flattened data.
  3. Optionally, apply styles for better readability.
  4. Save the workbook as .xlsx.

Python Example:

This example demonstrates how to generate an XLSX file from the parsed XML data.

from spire.xls import Workbook, BuiltInStyles

xlsx_output = "output/XMLToExcel1.xlsx"

wb = Workbook()
ws = wb.Worksheets.get_Item(0)

# Header row
for col, h in enumerate(headers, 1):
    ws.Range.get_Item(1, col).Value = h

# Data rows
for row_idx, row in enumerate(rows, 2):
    for col_idx, h in enumerate(headers, 1):
        ws.Range.get_Item(row_idx, col_idx).Value = row.get(h, "")

# Apply styles (optional)
ws.AllocatedRange.Rows.get_Item(0).BuiltInStyle = BuiltInStyles.Heading2
for row in range(1, ws.AllocatedRange.Rows.Count):
    if row % 2 == 0:
        ws.AllocatedRange.Rows.get_Item(row).BuiltInStyle = BuiltInStyles.Accent2_20
    else:
        ws.AllocatedRange.Rows.get_Item(row).BuiltInStyle = BuiltInStyles.Accent2_40

# Save to XLSX
wb.SaveToFile(xlsx_output)
print("Done!")

After running the script, each XML node is flattened into rows, with columns representing attributes and child elements. This approach supports multiple worksheets, custom column names, and integration with further data transformations.

Below is the preview of the generated XLSX file:

Result of converting XML to XLSX using Python

For more examples of writing different types of data to Excel files using Python, see our Python write data to Excel guide.

4.3 Handling Complex XML

Business XML often contains irregular patterns. Using Python, you can:

  • recursively flatten nested elements
  • promote attributes into normal columns
  • skip irrelevant elements
  • create multiple sheets for hierarchical sections
  • handle missing or optional fields by assigning defaults

The example above shows a single XML file; the same logic can be extended to handle complex structures without data loss.

If you are working with Office Open XML (OOXML) files, you can also directly load them and save as XLSX files using Free Spire.XLS for Python. Check out How to Convert OOXML to XLSX conversion with Python.

4.4 Batch Conversion

Python’s strength becomes especially clear when converting large folders of XML files. A script can:

  • scan directories,
  • parse each file using the same flattening logic,
  • generate consistent XLSX files automatically.

This eliminates manual work and ensures reliable, error-free conversion across projects or datasets.

The following snippet illustrates a simple approach for batch converting multiple XML files to XLSX.

import os

input_dir = "xml_folder"
output_dir = "xlsx_folder"

for file_name in os.listdir(input_dir):
    if file_name.endswith(".xml"):
        xml_path = os.path.join(input_dir, file_name)
        
        # Parse XML and generate XLSX (using previously defined logic)
        convert_xml_to_xlsx(xml_path, output_dir)

5. Method 4: Custom Scripts or APIs for Enterprise Workflows

While the previous methods are suitable for one-time or batch conversions, enterprise environments often require automated, standardized, and scalable solutions for XML to XLSX conversion. Many business XML formats follow industry standards, involve complex schemas with mandatory and optional fields, and are integrated into broader data pipelines.

In these cases, companies typically develop custom scripts or API-based workflows to handle conversions reliably. For example:

  • ERP or CRM exports: Daily XML exports containing invoices or orders are automatically converted to XLSX and fed into reporting dashboards.
  • ETL pipelines: XML data from multiple systems is validated, normalized, and converted during Extract-Transform-Load processes.
  • Cloud integration: Scripts or APIs run on cloud platforms (AWS Lambda, Azure Functions) to process large-scale XML files without manual intervention.

Key benefits of this approach include:

  • Ensuring schema compliance through XSD validation.
  • Maintaining consistent mapping rules across multiple systems.
  • Automating conversions as part of regular business processes.
  • Integrating seamlessly with cloud services and workflow automation platforms.

This workflow is ideal for scenarios where XML conversion is a recurring task, part of an enterprise reporting system, or required for compliance with industry data standards.

Tools like Spire.XLS for Python can also be integrated into these workflows to generate XLSX files programmatically on servers or cloud functions, enabling reliable, Excel-free conversion within automated enterprise pipelines.


6. Troubleshooting XML to XLSX Conversion

Depending on the method you choose—online tools, Excel, or Python—different issues may arise during XML conversion. Understanding these common problems helps ensure that your final XLSX file is complete and accurate.

Deeply Nested or Irregular XML

Nested structures may be difficult to flatten into a single sheet.

  • Excel may require manual mapping or splitting into multiple sheets.
  • Python allows recursive flattening or creating multiple sheets programmatically.

Missing or Optional Elements

Not all XML nodes appear in every record. Ensure column consistency by using blank cells for missing fields, rather than skipping them, to avoid misaligned data.

Attributes vs. Elements

Decide which attributes should become columns and which can remain internal.

  • Excel may prompt for mapping.
  • Python can extract all attributes flexibly using recursive parsing.

Encoding Errors

Incorrect character encoding can cause parsing failures.

  • Ensure the XML declares encoding correctly (UTF-8, UTF-16, etc.).
  • Python tip: ET.parse(xml_file, parser=ET.XMLParser(encoding='utf-8')) helps handle encoding explicitly.

Large XML Files

Very large XML files may exceed browser or Excel limits.

  • Online tools might fail or freeze.
  • Excel may become unresponsive.
  • Python can use streaming parsers like iterparse to process large files with minimal memory usage.

7. Frequently Asked Questions

Here are some frequently asked questions about XML to XLSX conversion:

1. How to convert XML file to XLSX?

You can convert XML to XLSX using Excel, online tools, or Python automation, depending on your needs.

  • For quick, simple files, online tools are convenient (see Section 2).
  • For files with structured or nested XML, Excel’s Data import offers control (see Section 3).
  • For large-scale or automated processing, Python provides full flexibility (see Section 4).

2. How do I open an XML file in Excel?

Excel can import XML as a table. Simple XML opens directly, while complex or hierarchical XML may require mapping via the Data tab → Get Data → From XML workflow (see Section 3.2).

3. How can I convert XML to other formats?

Besides XLSX, XML can be converted to CSV, JSON, or databases using Python scripts or specialized tools. Python libraries such as xml.etree.ElementTree or lxml allow parsing and transforming XML into various formats programmatically.

4. How to convert XML to Excel online for free?

Free online converters can handle straightforward XML-to-XLSX conversions without installing software. They are ideal for small or moderate files but may struggle with deeply nested XML or large datasets (see Section 2).


8. Conclusion

XML to XLSX conversion takes multiple forms depending on the structure of your data and the tools available. Online converters offer convenience for quick tasks, while Excel provides greater control with XML mapping and schema support. When automation, large datasets, or custom mapping rules are required, Python is the most flexible and robust solution.

Whether your workflow involves simple XML lists, deeply nested business data, or large-scale batch processing, the methods in this guide offer practical and reliable ways to convert XML to XLSX and manage the data effectively across systems.

See Also

Tutorial on How to Cionvert PDF Table to Word

Converting a PDF table to Word sounds simple, but anyone who has tried it knows the process can be surprisingly inconsistent. PDF files are designed primarily for display, not for structured editing, which often leads to corrupted table layouts when converting or copying. Users frequently encounter broken rows, merged columns, lost borders, inconsistent cell spacing, or tables being exported as images rather than editable Word tables.

This complete guide explains reliable methods to convert PDF tables to Word tables. You will learn online tools, manual approaches, and highly accurate programmatic solutions. If you need to convert PDF tables to Word, extract structured data from PDF, or produce fully editable Word tables for professional or automated workflows, this article provides the practical knowledge and technical insights you need.


1. Why Converting PDF Tables to Word Is Difficult

Before exploring conversion methods, it’s important to understand why tables in PDFs are difficult to interpret. This helps you select the right tool depending on layout complexity.

1.1 PDFs Do Not Contain Real Tables

Unlike Word or HTML, PDF files do not store table structures. Instead, they store:

  • text using absolute positions
  • lines and borders as drawing paths
  • rows/columns only as visual alignment, not structured grid data

As a result:

  • Rows and columns are not recognized as cells
  • Line elements may not correspond to actual table boundaries
  • Choosing text or copying often disrupts the layout

This is why simple copy-paste almost always fails.

1.2 Word Requires Structured Table Elements

Microsoft Word expects:

  • a defined <table> element
  • consistent row/column counts
  • true cell boundaries
  • adjustable column widths

If the PDF content cannot be interpreted into this structure, Word creates unpredictable results—or exports the table as an image.

Understanding these limitations clarifies why reliable PDF table extraction requires intelligent parsing beyond simple visual detection.


2. Overview of Reliable Methods

This guide covers three practical ways to convert PDF tables into Word tables:

  1. Online PDF-to-Word converters – fastest, minimal control
  2. Desktop software – more stable, better accuracy
  3. Programmatic extraction and table reconstruction – highest precision and fully editable results

Tip: Most non-programmatic solutions convert the entire PDF into a Word file. If you only need the tables, you may need to manually remove the surrounding content afterward.

The most accurate method is extracting table data programmatically and rebuilding the Word table—this avoids formatting losses and ensures fully editable, clean table output.


3. Method 1: Convert PDF Table to Word Using Online Tools (Fastest & Easiest)

Online PDF-to-Word converters are convenient for quick conversions. These tools attempt to detect table structures automatically and export them into a Word document.

Typical Workflow

  1. Open an online converter (e.g., Free PDF Converter).

    Free PDF Converter - Convert PDF to Word

  2. Upload your PDF.

  3. Wait for automatic conversion.

  4. Download the Word file.

    Download the Converted Word File

  5. Adjust the table formatting manually if necessary.

Pros

  • No installation
  • Works on any device
  • Very fast

Cons

  • Poor accuracy for complex tables
  • Privacy concerns (cloud upload)
  • May output tables as images
  • Limited customization

Online tools are best for simple, one-time conversions.


4. Method 2: Convert PDF Tables Using Desktop Software (More Stable & Secure)

Desktop applications process files locally, offering better accuracy and privacy. Microsoft Word, Acrobat, and dedicated PDF software often provide acceptable table extraction for standard layouts.

General Workflow

  1. Install the software (e.g., Microsoft Word).

  2. Open the PDF file in the application.

    Open PDF in Microsoft Word

  3. Confirm the conversion by clicking .

  4. Wait for processing.

  5. Edit and save the result as a .docx file.

    Edit and Save the Converted Document as a .docx File

Pros

  • Higher detection accuracy
  • Supports large and multi-page files
  • No upload-related risks

Cons

  • Some software is paid
  • Still unreliable for irregular tables
  • Features differ across tools

Desktop tools work well for moderate complexity—but not for structured data that must remain perfectly editable.


5. Method 3: Extract and Convert PDF Tables Programmatically (Most Accurate Method)

For users needing consistent, automated, and high-fidelity table reconstruction, the programmatic approach is the most reliable. It allows:

  • precise extraction of table content
  • full control over Word table construction
  • batch processing
  • consistent formatting

This method can successfully convert even complex or non-standard PDF tables into perfectly editable Word tables.

5.1 Option A: Convert the Entire PDF to Word Automatically

Using Free Spire.PDF for Python, you can convert a PDF directly into a Word document. The library attempts to infer table structures by analyzing line elements, text positioning, and column alignment.

Install Free Spire.PDF for Python using pip:

pip install spire.pdf.free

Python Code Example for PDF to Word Conversion

from spire.pdf import PdfDocument, FileFormat

input_pdf = "sample.pdf"
output_docx = "output/pdf_to_docx.docx"

# Open a PDF document
pdf = PdfDocument()
pdf.LoadFromFile(input_pdf)

# Save the PDF to a Word document
pdf.SaveToFile(output_docx, FileFormat.DOCX)

Below is a preview of the PDF to Word conversion result:

Python PDF to Word Conversion Result

When to Use

  • Tables with clear grid lines
  • Simple to moderately complex layouts
  • When table fidelity does not need to be 100% perfect

Limitations

5.2 Option B: Extract Table Data and Rebuild Word Tables Manually (Best Accuracy)

You can also extract table data from PDFs using Free Spire.PDF for Python and build Word tables using Free Spire.Doc for Python. This method is the most reliable and precise method for converting PDF tables into Word documents. It provides:

  • Full table editability
  • Predictable structure
  • Complete formatting control
  • Reliable automation

Install Free Spire.Doc for Python:

pip install spire.doc.free

The workflow:

  1. Extract table data from PDF
  2. Create a Word document programmatically
  3. Insert a table using the extracted data
  4. Apply formatting

Python Code Example for Extracting PDF Tables and Building Word Tables

from spire.pdf import PdfDocument, PdfTableExtractor
from spire.doc import Document, FileFormat, DefaultTableStyle, AutoFitBehaviorType, BreakType

input_pdf = "sample.pdf"
output_docx = "output/pdf_table_to_docx.docx"

# Open a PDF document
pdf = PdfDocument()
pdf.LoadFromFile(input_pdf)

# Create a Word document
doc = Document()
section = doc.AddSection()

# Extract table data from the PDF
table_extractor = PdfTableExtractor(pdf)
for i in range(pdf.Pages.Count):
    tables = table_extractor.ExtractTable(i)
    if tables is not None and len(tables) > 0:
        for i in range(len(tables)):
            table = tables[i]
            # Create a table in the Word document
            word_table = section.AddTable()
            word_table.ApplyStyle(DefaultTableStyle.ColorfulGridAccent4)
            word_table.ResetCells(table.GetRowCount(), table.GetColumnCount())
            for j in range(table.GetRowCount()):
                for k in range(table.GetColumnCount()):
                    cell_text = table.GetText(j, k).replace("\n", " ")
                    # Write the cell text to the corresponding cell in the Word table
                    tr = word_table.Rows[j].Cells[k].AddParagraph().AppendText(cell_text)
                    tr.CharacterFormat.FontName = "Arial"
                    tr.CharacterFormat.FontSize = 11
            # Auto-fit the table
            word_table.AutoFit(AutoFitBehaviorType.AutoFitToContents)
            section.AddParagraph().AppendBreak(BreakType.LineBreak)

# Save the Word document
doc.SaveToFile(output_docx, FileFormat.Docx)

Below is a preview of the rebuilt Word tables:

Python Extracting PDF Tables and Building Word Tables

Why This Method Is Superior

  • Output tables are always editable
  • Ideal for automation and batch processing
  • Works even without visible table lines
  • Allows custom formatting, fonts, borders, and styles

This is the recommended solution for professional use cases.

If you need to export PDF tables in other formats, check out How to Extract Tables from PDF Using Python.


6. Accuracy Comparison of All Methods

Method Accuracy Editable Formatting Control Best For
Online converters ★★★★☆ Yes Low Quick one-time use
Desktop software ★★★★☆ Yes Medium Standard professional documents
Programmatic extraction + reconstruction ★★★★★ Yes Full Automation, business workflows
Full PDF → Word conversion (auto) ★★★★☆ Yes Medium Clean, well-structured PDFs

7. Best Practices for High-Quality Conversion

To ensure the best results, follow these best practices:

File Preparation

  • Prefer original text-based PDFs (not scanned)
  • Run OCR before table extraction if the PDF is scanned

Table Design Tips

  • Keep column alignment consistent
  • Avoid unnecessary merged cells
  • Maintain clear spacing between columns

Technical Recommendations

  • Use programmatic extraction for batch workflows
  • Reconstruct Word tables for exact formatting
  • Always validate extracted data for accuracy

8. Frequently Asked Questions

1. How do I convert a PDF table to an editable Word table without losing formatting?

Use either high-quality desktop converters or a programmatic library like Spire.PDF + Spire.Doc. Programmatic extraction provides the most consistent results.

2. Can I extract just the table (not the whole PDF) to Word?

Yes. Extract only the table data and rebuild the table programmatically. This produces fully editable Word tables.

3. Why did my PDF table appear as an image in Word?

The converter could not interpret the structure and exported the content as an image. Use a tool that supports table reconstruction.

4. What is the most accurate method for complex or irregular tables?

Programmatic extraction combined with manual table construction in Word.


9. Conclusion

Converting PDF tables to Word tables ranges from simple to highly complex depending on the structure of the original PDF. Quick online tools and desktop applications work well for simple layouts, but they often struggle with merged cells, irregular spacing, or multi-row structures.

For users requiring precise, editable, and reliable output, especially in business automation and large-scale document processing, the programmatic approach provides unmatched accuracy. It enables true table reconstruction in Word with full control over formatting, style, and cell structure.

Whether you need a fast online conversion or a deeply accurate automated pipeline, the methods in this guide ensure you can reliably convert PDF tables to fully editable Word tables across all complexity levels.

See Also

How to Convert Python .py Files to PDF Documents

Python scripts are commonly shared in development workflows, documentation pipelines, training courses, and academic environments. Converting Python (.py) files to PDF ensures consistent formatting, improves readability during distribution, and provides a non-editable version of your code suitable for archiving, printing, or publishing.

This guide explains every practical method to convert Python code to PDF, including online tools, IDE print-to-PDF workflows, Python-automated conversions with or without syntax highlighting, and batch-processing solutions. Each method includes detailed steps, technical notes, and installation guidance where required.

Overview:


1. Online Python-to-PDF Converters

Online tools offer the quickest way to convert a .py file into PDF without configuring software. They are convenient for users who simply need a readable PDF version of their code.

How Online Converters Work

After uploading a .py file, the service processes the content and outputs a PDF that preserves code indentation and basic formatting. Some platforms provide optional settings such as font size, page margins, and line numbering.

Example: Convert Python Code to PDF with CodeConvert AI

Steps

  1. Navigate to CodeConvert AI (a browser-based code conversion platform).

  2. Upload your .py file or paste the code into the editor.

  3. Choose Generate PDF to generate the PDF document.

    CodeConvert AI Python to PDF Output

  4. Print the document as a PDF file in the pop-up menu.

    Print Document as PDF File

Advantages

  • No installation or setup required.
  • Works across operating systems and devices.
  • Suitable for quick conversions and lightweight needs.

Limitations

  • Avoid uploading confidential or proprietary scripts.
  • Formatting options depend on the service.
  • Syntax highlighting quality varies across platforms.

2. Convert Python to PDF via IDE “Print to PDF”

Most development editors—including VS Code, PyCharm, Sublime Text, and Notepad++—support exporting files to PDF through the operating system’s print subsystem.

How It Works

When printing, the IDE renders the code using its internal syntax highlighting engine and passes the styled output to the OS, which then generates a PDF.

Steps (PyCharm Example)

  1. Open your Python file.

  2. Go to File → Print.

    PyCharm Print Python Code

  3. Optionally adjust page setup (margins, orientation, scaling).

    PyCharm Print Setup

  4. Choose Microsoft Print to PDF as the printer and save the PDF document.

    Print Python to PDF Using Microsoft Print to PDF

Advantages

  • Clean, readable formatting.
  • Syntax highlighting usually preserved.
  • No additional libraries required.

Limitations

  • Minimal control over line wrapping or layout.
  • Not designed for batch workflows.
  • Output quality varies by IDE theme.

3. Python Script–Based PY to PDF Conversion (Automated and Customizable)

Python-based tools provide flexible ways to convert .py files to PDF, supporting automated pipelines, consistent formatting, and optional syntax highlighting.
Before using the following methods, install the required components.

Required Packages

Free Spire.Doc for Python – handles PDF export

pip install spire.doc.free

Pygments – generates syntax-highlighted HTML

pip install pygments

3.1 Method A: Direct Text-to-PDF (No Syntax Highlighting)

This method reads the .py file as plain text and writes it into a PDF. It is suitable for simple exports, documentation snapshots, and internal archiving, which may not require syntax highlighting.

Example Code

from spire.doc import Document, FileFormat, BreakType, Color, LineSpacingRule, LineNumberingRestartMode, TextRange

# Read the Python code as a string
with open("Python.py", "r", encoding="utf-8") as f:
    python_code = f.read()

# Create a new document
doc = Document()

# Add the Python code to the document
section = doc.AddSection()
paragraph = section.AddParagraph()
for line_number, line in enumerate(python_code.split("\n")):
    tr = paragraph.AppendText(line)
    # Set the character format
    tr.CharacterFormat.FontName = "Consolas"
    tr.CharacterFormat.FontSize = 10.0
    if line_number < len(python_code.split("\n")) - 1:
        paragraph.AppendBreak(BreakType.LineBreak)

# Optional settings
# Set the background color and line spacing
paragraph.Format.BackColor = Color.get_LightGray()
paragraph.Format.LineSpacingRule = LineSpacingRule.Multiple
paragraph.Format.LineSpacing = 12.0 # 12pt meaning single spacing
# Set the line numbering
section.PageSetup.LineNumberingStartValue = 1
section.PageSetup.LineNumberingStep = 1
section.PageSetup.LineNumberingRestartMode = LineNumberingRestartMode.RestartPage
section.PageSetup.LineNumberingDistanceFromText = 10.0

# Save the document to a PDF file
doc.SaveToFile("output/Python-PDF.docx", FileFormat.Docx)
doc.SaveToFile("output/Python-PDF.pdf", FileFormat.PDF)

How It Works

This method inserts the Python code line by line using AppendText(), adds line breaks with AppendBreak(), and exports the final document through SaveToFile().

Below is the output effect after applying this method:

Convert Python to PDF Using Spire.Doc

For more details on customizing and inserting text into a PDF document, see our guide on Appending Text to PDF Documents with Python.

3.2 Method B: Syntax-Highlighted PDF (HTML → PDF)

When producing tutorials or readable documentation, syntax highlighting helps distinguish keywords and improves overall clarity. This method uses Pygments to generate inline-styled HTML, then inserts the HTML into a document.

Example Code

from spire.doc import Document, FileFormat, ParagraphStyle, IParagraphStyle
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter

def py_to_inline_html(py_file_path):
    with open(py_file_path, "r", encoding="utf-8") as f:
        code = f.read()

    formatter = HtmlFormatter(noclasses=True, linenostart=1, linenos='inline') # Line numbers are optional
    inline_html = highlight(code, PythonLexer(), formatter)
    return inline_html

html_result = py_to_inline_html("Python.py")

doc = Document()
section = doc.AddSection()
paragraph = section.AddParagraph()

# Insert formatted HTML
paragraph.AppendHTML(html_result)

doc.SaveToFile("output/Python-PDF-Highlighted.pdf", FileFormat.PDF)

How It Works

Pygments first formats the code as inline CSS-based HTML. The HTML is added through AppendHTML(), and the completed document is exported using SaveToFile().

Below is the visual result generated by this styled HTML method:

Convert Python to PDF with Formatting using Pygments and Spire.Doc

3.3 Batch Conversion (Folder-to-PDF Workflows)

For converting multiple .py files at once, you only need a short loop to process all files in a directory and save them as PDFs.

Example Code (Minimal Batch Processing)

import os

input_folder = "scripts"
output_folder = "pdf-output"
os.makedirs(output_folder, exist_ok=True)

for file in os.listdir(input_folder):
    if file.endswith(".py"):
        py_path = os.path.join(input_folder, file)
        pdf_path = os.path.join(output_folder, file.replace(".py", ".pdf"))

        # Call your chosen conversion function here
        convert_py_to_pdf(py_path, pdf_path)

Notes

  • Reuses the conversion function defined earlier.
  • Automatically saves each .py file as a PDF with the same filename.
  • Works for both plain-text and syntax-highlighted methods.

You can also refer to our guide "Directly Convert HTML to PDF in Python" to learn more.


4. Comparison Table: Choosing the Right Conversion Method

Method Advantages Drawbacks Best Use Case
Online Converters Fast, no installation Privacy concerns, limited formatting Small, non-sensitive files
IDE Print-to-PDF Easy, preserves syntax (often) No automation Single-file conversion
Python Script (Direct/HTML) Automation, batch processing, customization Requires scripting knowledge Documentation, tutorials, pipelines

5. Best Practices for Generating Clear and Readable PDF Code

Follow these practices to ensure your Python code PDFs are easy to read, well-formatted, and professional-looking.

Use a Monospace Font

Monospace fonts preserve indentation and alignment, making code easier to read and debug.

Manage Long Lines and Wrapping

Enable line wrapping or adjust page margins to prevent horizontal scrolling and clipped code lines.

Maintain Consistent Formatting

Keep font sizes, colors, spacing, and page layout consistent across multiple PDFs, especially in batch processing.

Preserve Logical Code Blocks

Avoid splitting functions, loops, or multi-line statements across pages to maintain readability and structure.

Organize File Naming and Folder Structure

Use systematic file names and folder organization for batch exports, automated documentation, or project archives.


6. Conclusion

Converting Python (.py) files to PDF provides a reliable way to preserve code formatting, improve readability, and create shareable or archival documents. Whether for documentation, tutorials, educational purposes, or personal use, these methods allow developers, teams, and individual users to generate consistent and professional-looking PDF files from their Python source code.


7. Frequently Asked Questions About Converting Python Files to PDF

Below are some common questions developers, educators, and students ask about converting Python files to PDF. These answers are based on the methods covered in this guide.

How do I turn a Python code file (.py) into a PDF?

You can convert a Python .py file into a PDF using online converters like CodeConvert AI, IDE print-to-PDF functions, or Python-based scripts with Spire.Doc for automated and batch processing. Choose the method based on your workflow, formatting needs, and whether syntax highlighting is required.

How can I convert a Python Jupyter notebook to PDF?

For Jupyter notebooks (.ipynb), you can use the built-in File → Download As → PDF option if LaTeX is installed, or first export the notebook to .py script and then apply the Python-to-PDF methods described in this guide.

How do I ensure syntax highlighting is preserved in the PDF?

To retain syntax colors and formatting, convert the code to HTML with inline CSS styles using Pygments, then append it to a PDF document using AppendHTML(). This preserves keywords, comments, and indentation for clearer, professional-looking output.

How do I convert a text file to a PDF using Python on Windows?

On Windows, you can use free Python libraries like Spire.Doc to read the .txt or .py file, write it to a PDF, and optionally style it. IDE print-to-PDF is another option for single-file conversion without coding. Batch scripts can automate multiple files while maintaining consistent formatting.

See Also

Tutorial on how to save Excel files using C# code

Saving Excel files in C# is a common task in many .NET applications, especially when generating reports, exporting analytical data, or automating system logs. Whether you’re working with financial summaries or daily operations data, being able to create and save Excel files programmatically can significantly improve efficiency and accuracy.

In C#, developers can handle Excel files in multiple ways—creating new workbooks, writing data, and saving them in various formats such as XLSX, CSV, or PDF. With the help of dedicated Excel libraries, these operations can be automated efficiently without relying on Microsoft Excel or manual intervention.

In this article, we will explore how to:


Prepare the Development Environment

Before diving into code, set up your development environment with an Excel library that supports creating, reading, and saving files in .NET. In this tutorial, we’ll use Free Spire.XLS for .NET.

Step 1: Install Spire.XLS via NuGet

Install-Package FreeSpire.XLS

Step 2: Import the Required Namespace

using Spire.Xls;

Step 3: Create, Write, and Save a Simple Excel File

// Create a new workbook and get the first worksheet
Workbook workbook = new Workbook();
Worksheet sheet = workbook.Worksheets[0];

// Write "Hello World!" into cell A1
sheet.Range["A1"].Text = "Hello World!";

// Save the workbook to a file
workbook.SaveToFile("HelloWorld.xlsx", ExcelVersion.Version2016);

This simple example shows the basic workflow: creating a workbook, writing data into a cell, and saving the file.

After this, you can explore key classes and methods such as:

  • Workbook – represents the entire Excel file.
  • Worksheet – represents a single sheet within the workbook.
  • Range – allows access to specific cells for input, formatting, or styling.
  • Workbook.SaveToFile() – saves the workbook to disk in the specified Excel format.

Save Data to an Excel File in C#

Saving structured data like DataTable or DataGridView into an Excel file is one of the most practical tasks in C# development. Whether your application produces database results, UI grid content, or automated reports, exporting these datasets into Excel provides better readability and compatibility.

Example 1: Save DataTable to Excel

using Spire.Xls;
using System.Data;

Workbook workbook = new Workbook();
Worksheet sheet = workbook.Worksheets[0];
sheet.Name = "EmployeeData";

DataTable table = new DataTable();
table.Columns.Add("EmployeeID");
table.Columns.Add("FullName");
table.Columns.Add("Department");
table.Columns.Add("HireDate");
table.Columns.Add("Salary");

// Add sample rows
table.Rows.Add("E001", "Alice Johnson", "Finance", "2020-03-12", "7500");
table.Rows.Add("E002", "Bob Williams", "Human Resources", "2019-08-05", "6800");
table.Rows.Add("E003", "Catherine Lee", "IT", "2021-01-20", "8200");
table.Rows.Add("E004", "David Smith", "Marketing", "2018-11-30", "7100");
table.Rows.Add("E005", "Emily Davis", "Sales", "2022-06-15", "6900");

// Insert the DataTable into worksheet
sheet.InsertDataTable(table, true, 1, 1);

// Apply built-in formats
sheet.AllocatedRange.Rows[0].BuiltInStyle = BuiltInStyles.Heading1;
for (int i = 1; i < sheet.AllocatedRange.Rows.Count(); i++)
{
    sheet.AllocatedRange.Rows[i].BuiltInStyle = BuiltInStyles.Accent1;
}
sheet.AllocatedRange.AutoFitColumns();
sheet.AllocatedRange.AutoFitRows();

// Save to Excel
workbook.SaveToFile("EmployeeDataExport.xlsx", FileFormat.Version2016);

How it works:

  • InsertDataTable() inserts data starting from a specific cell.
  • The true argument includes column headers.
  • SaveToFile() saves the workbook to disk; the second parameter specifies the Excel format version.
  • FileFormat.Version2016 specifies the Excel format version.

Below is a sample output showing how the exported DataTable looks in Excel:

Result of saving DataTable to Excel using C#


Example 2: Save DataGridView to Excel

Workbook workbook = new Workbook();
Worksheet sheet = workbook.Worksheets[0];

sheet.InsertDataTable(((DataTable)dataGridView1.DataSource), true, 1, 1);
workbook.SaveToFile("GridViewExport.xlsx", FileFormat.Version2016);

Tip: Before saving, ensure that your DataGridView’s data source is properly cast to a DataTable.This ensures the exported structure matches the UI grid layout.

If you want learn how to create Excel files with more data types, formatting, and other elements, you can explore the article How to Create Excel Files in C#.


Save Excel File as CSV or XLS in C#

Different systems and platforms require different spreadsheet formats. While XLSX is now the standard, CSV, XLS, and other formats remain common in enterprise environments. Exporting to different formats allows Excel data to be shared, processed, or imported by various applications.

Example 1: Save Excel as CSV

CSV (Comma-Separated Values) is a simple text-based format ideal for exchanging data with databases, web applications, or other systems that support plain text files.

Workbook workbook = new Workbook();
workbook.LoadFromFile("EmployeeDataExport.xlsx");
workbook.SaveToFile("Report.csv", ",", FileFormat.CSV);

Example 2: Save Excel as XLS (Legacy Format)

XLS (Excel 97–2003 format) is a legacy binary format still used in older systems or applications that do not support XLSX. Saving to XLS ensures compatibility with legacy enterprise workflows.

Workbook workbook = new Workbook();
workbook.LoadFromFile("EmployeeDataExport.xlsx");
workbook.SaveToFile("Report_legacy.xls", ExcelVersion.Version97to2003);

Additional Supported Spreadsheet Formats

In addition to the commonly used CSV, XLS, and XLSX formats, the library also supports several other spreadsheet and template formats. The table below lists these formats together with their corresponding FileFormat enumeration values for easy reference when saving files programmatically.

Format Description Corresponding Enum (FileFormat)
ODS OpenDocument Spreadsheet FileFormat.ODS
XLSM Macro-enabled Excel workbook FileFormat.Xlsm
XLSB Binary Excel workbook FileFormat.Xlsb2007 / FileFormat.Xlsb2010
XLT Excel 97–2003 template FileFormat.XLT
XLTX Excel Open XML template FileFormat.XLTX
XLTM Macro-enabled Excel template FileFormat.XLTM

These additional formats are useful for organizations that work with legacy systems, open document standards, or macro/template–based automation workflows.


Save Excel as PDF or HTML in C#

In many cases, Excel files need to be converted into document or web formats for easier publishing, printing, or sharing.
Exporting to PDF is ideal for fixed-layout reports and printing, while HTML is suitable for viewing Excel data in a web browser.

Example 1: Save Excel as PDF

The following example shows how to save an Excel workbook as a PDF file using C#. This is useful for generating reports that preserve layout and formatting.

Workbook workbook = new Workbook();
workbook.LoadFromFile("EmployeeDataExport.xlsx");
workbook.SaveToFile("EmployeeDataExport.pdf", FileFormat.PDF);

Here is an example of the generated PDF file after exporting from Excel:

Result of saving Excel as PDF using C#


Example 2: Save Excel as HTML

This example demonstrates how to save an Excel workbook as an HTML file, making it easy to render the data in a web browser or integrate with web applications.

Workbook workbook = new Workbook();
workbook.LoadFromFile("EmployeeDataExport.xlsx");
workbook.SaveToFile("EmployeeDataExport.html", FileFormat.HTML);

Below is a preview of the exported HTML file rendered in a browser:

Result of saving Excel as HTML using C#


Additional Supported Document & Web Formats

In addition to PDF and HTML, the library supports several other document and web-friendly formats. The table below shows these formats together with their FileFormat enumeration values for easy reference.

Format Description Corresponding Enum (FileFormat)
XML Excel data exported as XML FileFormat.XML
Bitmap / Image Export Excel as Bitmap or other image formats FileFormat.Bitmap
XPS XML Paper Specification document FileFormat.XPS
PostScript PostScript document FileFormat.PostScript
OFD Open Fixed-layout Document format FileFormat.OFD
PCL Printer Command Language file FileFormat.PCL
Markdown Markdown file format FileFormat.Markdown

These formats provide additional flexibility for distributing Excel content across different platforms and workflows, whether for printing, web publishing, or automation.


Save an Excel File to MemoryStream in C#

In web applications or cloud services, saving Excel files directly to disk may not be ideal due to security or performance reasons. Using MemoryStream allows you to generate Excel files in memory and deliver them directly to clients for download. Spire.XLS for .NET also supports both loading and saving workbooks through MemoryStream, making it easy to handle Excel files entirely in memory.

Workbook workbook = new Workbook();
Worksheet sheet = workbook.Worksheets[0];
sheet.Range["A1"].Text = "Export to Stream";

using (MemoryStream stream = new MemoryStream())
{
    workbook.SaveToStream(stream, ExcelVersion.Version2016);
    byte[] bytes = stream.ToArray();

    // Example: send bytes to client for download in ASP.NET
    // Response.BinaryWrite(bytes);
}

This approach is particularly useful for ASP.NET, Web API, or cloud services, where you want to serve Excel files dynamically without creating temporary files on the server.


Open and Re-Save Excel Files in C#

In many applications, you may need to load an existing Excel workbook, apply updates or modifications, and then save it back to disk or convert it to a different format. This is common when updating reports, modifying exported data, or automating Excel file workflows.

Example: Open and Update an Excel File

The following C# code loads the previous workbook, updates the first cell, and saves the changes:

Workbook workbook = new Workbook();
workbook.LoadFromFile("EmployeeDataExport.xlsx");

Worksheet sheet = workbook.Worksheets[0];
sheet.Range["A1"].Text = "Updated Content"; // Update the cell value
sheet.Range["A1"].AutoFitColumns(); // Autofit the column width

// Save the updated workbook
workbook.Save(); // Saves to original file
workbook.SaveToFile("UpdatedCopy.xlsx", ExcelVersion.Version2016); // Save as new file

The screenshot below shows the updated Excel sheet after modifying and saving the file:

Result of updating an Excel file using C#

You can also check out the detailed guide on editing Excel files using C# for more advanced scenarios.


Best Practices When Saving Excel Files

  1. Avoid File Overwrites

    Check if the target file exists before saving to prevent accidental data loss.

  2. Handle Permissions and Paths Properly

    Ensure your application has write access to the target folder, especially in web or cloud environments.

  3. Choose the Right Format

    Use XLSX for modern compatibility, CSV for data exchange, and PDF for printing or sharing reports.


Conclusion

Saving Excel files in C# covers a wide range of operations—from writing structured datasets, exporting to different spreadsheet formats, converting to PDF/HTML, to handling file streams in web applications.

With the flexibility offered by libraries such as Spire.XLS for .NET, developers can implement powerful Excel automation workflows with ease.


FAQ

Q1: How do I save an Excel file in C#?

Use SaveToFile() with the appropriate ExcelVersion or FileFormat:

workbook.SaveToFile("Report.xlsx", ExcelVersion.Version2016);

Q2: How do I open and modify an existing Excel file?

Load the workbook using LoadFromFile(), make changes, then save:

Workbook workbook = new Workbook();
workbook.LoadFromFile("ExistingFile.xlsx");
workbook.Worksheets[0].Range["A1"].Text = "Updated Content";
workbook.SaveToFile("UpdatedFile.xlsx", ExcelVersion.Version2016);

Q3: How do I save as CSV or PDF?

Specify the desired FileFormat in SaveToFile():

workbook.SaveToFile("Report.csv", ",", FileFormat.CSV);
workbook.SaveToFile("Report.pdf", FileFormat.PDF);

Q4: Can I save Excel to memory instead of disk?

Yes. Use SaveToStream() to output to a MemoryStream, useful in web or cloud applications:

using (MemoryStream stream = new MemoryStream())
{
    workbook.SaveToStream(stream, ExcelVersion.Version2016);
    byte[] bytes = stream.ToArray();
}

See Also

Page 3 of 4