How to Convert Word to JSON in Python (DOCX to JSON)

2026-06-17 01:26:12 Written by Allen Yang

Converting Word documents to JSON in Python

Converting Word documents to JSON is a common requirement when building automated document processing pipelines, feeding content into AI models, or migrating structured data from DOCX files into databases and APIs. Unlike flat formats such as CSV, JSON is hierarchical, so it can represent paragraphs, tables, and nested document structures in a single output.

However, Word files do not have a native JSON export format. A .docx file is a rich-text document composed of sections, paragraphs, styles, and tables—not a structured data source. Converting it to JSON requires deciding how to map that content into a meaningful schema.

This tutorial demonstrates how to convert Word to JSON in Python using Spire.Doc for Python. You will learn three progressively advanced methods: extracting plain paragraph text, converting Word tables to JSON arrays, and preserving the full document structure—including headings, paragraphs, and tables—in a hierarchical JSON output. The examples in this tutorial work with both DOCX and legacy DOC files supported by Spire.Doc.

Quick Navigation

How Is Word Converted into JSON?
Install the Required Library
Method 1 – Convert Word Text to JSON
Method 2 – Convert Word Tables to JSON
Method 3 – Preserve Document Structure in JSON
When to Use Word to JSON Conversion
Limitations and Best Practices
FAQ
Conclusion

1. How Is Word Converted into JSON?

A Word document is a rich-text format organized into sections, paragraphs, and tables—not a structured data format. When you convert Word to JSON, there is no single standard for how the content should be represented. The right schema depends on how the JSON will be used:

Goal	Recommended Schema	Key Characteristics
AI embedding / semantic search	Paragraph array	Flat list of text strings, one per paragraph
Full-text search indexing	Text blocks with metadata	Paragraphs with section index and style info
Database import from tables	Table row objects	Header-keyed dictionaries, one per row
RAG pipeline / knowledge base	Hierarchical structure	Nested sections with headings, paragraphs, and tables
Document archival / interchange	Full document model	Sections, styles, metadata, and all content types

For example, a Word document containing a heading and a paragraph could be represented in JSON as:

{
  "document": [
    {"type": "heading", "level": 1, "text": "Project Overview"},
    {"type": "paragraph", "text": "This report summarizes the quarterly results."}
  ]
}

The three methods in this tutorial correspond directly to these schema choices:

Method 1 produces a paragraph array (AI embedding, search indexing)
Method 2 produces table row objects (database import, data extraction)
Method 3 produces a hierarchical structure (RAG, knowledge base, document understanding)

Choose the method that matches your goal, or combine elements from multiple methods to build a custom schema.
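As an illustration of combining schemas, the sketch below takes items in the heading/paragraph form shown in the example above and derives a flat paragraph array from them. The function name to_paragraph_array and the sample data are hypothetical and are not part of Spire.Doc; the code uses only Python's standard library.

```python
import json

def to_paragraph_array(items):
    """Collect the text of heading and paragraph items into a flat list."""
    return [
        item["text"]
        for item in items
        if item.get("type") in ("heading", "paragraph") and item.get("text")
    ]

# Sample data copied from the example schema above
document = {
    "document": [
        {"type": "heading", "level": 1, "text": "Project Overview"},
        {"type": "paragraph", "text": "This report summarizes the quarterly results."},
    ]
}

flat = to_paragraph_array(document["document"])
print(json.dumps({"paragraphs": flat}, indent=2))
```

Items of other types, such as tables, are skipped, so the same structured output can feed both a hierarchical consumer and a plain-text consumer.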

2. Install the Required Library

This tutorial uses Spire.Doc for Python to read and parse DOC/DOCX files. Install it via pip:

pip install spire.doc

Alternatively, you can download Spire.Doc for Python and integrate it manually.

After installation, import the library in your Python script:

from spire.doc import Document, FileFormat
from spire.doc.common import *

Spire.Doc provides APIs to load Word documents, iterate through sections, paragraphs, and tables, and extract text content—everything needed to build a Word-to-JSON pipeline.

3. Method 1 – Convert Word Text to JSON

The simplest way to convert Word to JSON is to extract all paragraph text from the document and store it in a JSON array. This approach works well when you need the full text content without structural metadata—such as for full-text search, AI text embedding, or simple content export.

3.1 Read Paragraphs from a Word Document

Spire.Doc represents a Word document as a collection of Sections, each containing Paragraphs. To extract all text, you iterate through every section and every paragraph within it.

from spire.doc import Document
from spire.doc.common import *

input_file = "ProjectReport.docx"

document = Document()
document.LoadFromFile(input_file)

paragraphs = []
for i in range(document.Sections.Count):
    section = document.Sections.get_Item(i)
    for j in range(section.Paragraphs.Count):
        paragraph = section.Paragraphs.get_Item(j)
        text = paragraph.Text
        if text.strip():
            paragraphs.append(text)

document.Close()

Each paragraph's .Text property returns the plain text content, stripping away formatting. The if text.strip() check filters out empty paragraphs that exist as spacing or layout elements in Word.

3.2 Serialize the Extracted Text to JSON

Assuming the paragraph data extracted in the previous step is stored in the paragraphs list, you can serialize it to JSON and save it to a file as follows:

import json

output_file = "paragraphs.json"

result = {
    "source": input_file,
    "paragraph_count": len(paragraphs),
    "paragraphs": paragraphs
}

with open(output_file, "w", encoding="utf-8") as f:
    json.dump(result, f, indent=2, ensure_ascii=False)

Output Example

The following JSON snippet shows the structure of the generated output file:

{
  "source": "ProjectReport.docx",
  "paragraph_count": 2,
  "paragraphs": [
    "Quarterly Sales Report",
    "This document provides an overview of sales performance across all regions."
  ]
}

Conversion Result

The image below shows the source Word document and the JSON file generated after extracting paragraph text.

Word to JSON conversion result - paragraph extraction

3.3 Explanation

Why iterate through Sections and Paragraphs instead of extracting all text at once? Because Word documents are organized hierarchically. A document contains one or more sections (each with its own page layout), and each section contains paragraphs. Iterating at this level gives you control over which content to include or skip—such as filtering empty paragraphs or limiting extraction to specific sections.

Storing paragraphs as a JSON array is the most straightforward structure. Each element is a string, making the output easy to consume in downstream systems. This approach is well-suited for:

Full-text indexing – feed paragraph text into search engines like Elasticsearch
AI text embedding – convert paragraphs into vector representations for semantic search
Simple content export – extract readable text from Word files without formatting

However, this method loses structural information. Headings, body text, and list items are all treated the same way. If you need to distinguish between them, see Method 3.
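A middle ground is the "text blocks with metadata" schema listed in the table in section 1. The sketch below assumes you have already collected a (section index, style name, text) tuple for each paragraph, for example from paragraph.StyleName and paragraph.Text inside the extraction loop. The helper name build_text_blocks and the sample values are hypothetical.

```python
import json

def build_text_blocks(records):
    """Turn (section_index, style_name, text) tuples into metadata-tagged blocks."""
    blocks = []
    for section_index, style_name, text in records:
        text = text.strip()
        if not text:
            continue  # skip spacing-only paragraphs, as in the extraction loop
        blocks.append({
            "index": len(blocks),      # position among the kept paragraphs
            "section": section_index,
            "style": style_name,
            "text": text,
        })
    return blocks

# Hypothetical values standing in for data read from a Word document
sample = [
    (0, "Heading1", "Quarterly Sales Report"),
    (0, "Normal", "   "),
    (0, "Normal", "This document provides an overview of sales performance."),
]
blocks = build_text_blocks(sample)
print(json.dumps(blocks, indent=2))
```

Each block keeps enough context for a search index to filter by section or style while remaining a flat list.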

If your goal is simply to extract text content from Word documents without converting it to JSON, you may also be interested in our guide on extracting text from Word documents in Python.

4. Method 2 – Convert Word Tables to JSON

In many Word documents—reports, invoices, product lists, configuration tables—the most valuable content lives inside tables, not in paragraphs. Converting Word tables to JSON allows you to extract structured row-and-column data that can be directly loaded into databases, APIs, or data analysis tools.

Why Tables Need Special Handling

Tables in Word are stored as a grid of rows and cells, where each cell contains its own paragraphs. Unlike paragraph text, table data has an inherent two-dimensional structure that maps naturally to JSON objects. The first row often contains column headers, and subsequent rows contain data records.

Extracting Tables from a Word Document

The following code reads all tables from a Word document, uses the first row as column headers, and converts each subsequent row into a JSON object:

import json
from spire.doc import Document
from spire.doc.common import *

input_file = "SalesData.docx"
output_file = "tables.json"

document = Document()
document.LoadFromFile(input_file)

all_tables = []

for i in range(document.Sections.Count):
    section = document.Sections.get_Item(i)
    for t in range(section.Tables.Count):
        table = section.Tables.get_Item(t)
        rows_data = []

        if table.Rows.Count < 2:
            continue

        header_row = table.Rows[0]
        headers = []
        for c in range(header_row.Cells.Count):
            cell_text = header_row.Cells[c].Paragraphs[0].Text.strip()
            headers.append(cell_text)

        for r in range(1, table.Rows.Count):
            row = table.Rows[r]
            row_dict = {}
            for c in range(row.Cells.Count):
                cell_text = row.Cells[c].Paragraphs[0].Text.strip()
                row_dict[headers[c] if c < len(headers) else f"Column_{c}"] = cell_text
            rows_data.append(row_dict)

        all_tables.append({
            "table_index": t,
            "headers": headers,
            "row_count": len(rows_data),
            "rows": rows_data
        })

document.Close()

result = {
    "source": input_file,
    "table_count": len(all_tables),
    "tables": all_tables
}

with open(output_file, "w", encoding="utf-8") as f:
    json.dump(result, f, indent=2, ensure_ascii=False)

Output Example

The following JSON snippet shows the structure of the generated output file, with each table row mapped to a JSON object using the header row as keys:

{
  "source": "SalesData.docx",
  "table_count": 1,
  "tables": [
    {
      "table_index": 0,
      "headers": ["Region", "Product", "Units Sold", "Revenue"],
      "row_count": 2,
      "rows": [
        {"Region": "North", "Product": "Laptop", "Units Sold": "120", "Revenue": "114000"},
        {"Region": "South", "Product": "Laptop", "Units Sold": "80", "Revenue": "76000"}
      ]
    }
  ]
}

Conversion Result

The image below demonstrates how table data from a Word document is converted into structured JSON records.

Word to JSON conversion result - table extraction

Explanation

The code treats the first row as a header row and maps each cell in subsequent rows to the corresponding header key. This produces a JSON array of objects, which is the most common and useful format for tabular data.

Key considerations:

table.Rows.Count < 2 skips tables that have only a header row or are empty
row.Cells[c].Paragraphs[0].Text extracts text from the first paragraph in each cell. For simplicity, the example reads only the first paragraph. If a cell contains multiple paragraphs, iterate through the entire Paragraphs collection and concatenate the results:

cell_text = "\n".join(
    row.Cells[c].Paragraphs[p].Text.strip()
    for p in range(row.Cells[c].Paragraphs.Count)
    if row.Cells[c].Paragraphs[p].Text.strip()
)

headers[c] if c < len(headers) else f"Column_{c}" handles cases where a data row has more cells than the header row

This method is ideal for extracting structured data from reports, invoices, product catalogs, and configuration tables stored in Word documents. The resulting JSON can be directly loaded into databases, used in web APIs, or processed by data analysis tools.
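Because cell text is always extracted as a string, values such as "120" may need converting to numbers before a database import. The following is a minimal sketch of one way to do that on the rows produced above; coerce_value and coerce_rows are hypothetical helper names, not Spire.Doc functions.

```python
def coerce_value(text):
    """Return an int or float when the text parses as a number, else the text."""
    cleaned = text.strip()
    for cast in (int, float):
        try:
            return cast(cleaned)
        except ValueError:
            continue
    return text

def coerce_rows(rows):
    """Apply coerce_value to every cell of every row dictionary."""
    return [{key: coerce_value(value) for key, value in row.items()} for row in rows]

rows = [
    {"Region": "North", "Product": "Laptop", "Units Sold": "120", "Revenue": "114000"},
    {"Region": "South", "Product": "Laptop", "Units Sold": "80", "Revenue": "76000"},
]
typed_rows = coerce_rows(rows)
print(typed_rows[0])
```

Formatted values such as "1,200" or "$580" are left as strings by this sketch. Identifiers with leading zeros (such as "007") would be converted to numbers, so it is safer to apply the conversion only to columns known to be numeric.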

If you need to generate Word documents from structured JSON data, see our tutorial on converting JSON to Word in Python, which covers creating Word content and tables directly from JSON objects and arrays.

5. Method 3 – Preserve Document Structure in JSON

Methods 1 and 2 treat paragraphs and tables as separate, isolated elements. In practice, Word documents have a meaningful hierarchy: headings introduce sections, paragraphs provide detail, and tables present structured data within a specific context.

Preserving this hierarchy in JSON produces output that is far more useful for knowledge base construction, RAG (Retrieval-Augmented Generation) pipelines, and document understanding systems. Instead of a flat list of text, you get a structured representation that maintains the logical flow of the original document.

How to Preserve Headings, Paragraphs, and Tables in a Hierarchical JSON Structure

The approach is to iterate through all child objects in each section's body, determine the type of each object (paragraph or table), and build a structured JSON representation accordingly. For paragraphs, you can detect headings by checking the StyleName property.

import json
from spire.doc import Document, Paragraph, Table
from spire.doc.common import *

input_file = "ProjectReport.docx"
output_file = "structured_output.json"

HEADING_STYLES = {
    "Heading1": 1,
    "Heading2": 2,
    "Heading3": 3,
    "Heading4": 4,
}

def get_heading_level(style_name):
    return HEADING_STYLES.get(style_name, None)

def extract_table_data(table):
    rows_data = []
    if table.Rows.Count < 1:
        return {"headers": [], "rows": []}

    header_row = table.Rows[0]
    headers = []
    for c in range(header_row.Cells.Count):
        headers.append(header_row.Cells[c].Paragraphs[0].Text.strip())

    for r in range(1, table.Rows.Count):
        row = table.Rows[r]
        row_dict = {}
        for c in range(row.Cells.Count):
            cell_text = row.Cells[c].Paragraphs[0].Text.strip()
            row_dict[headers[c] if c < len(headers) else f"Column_{c}"] = cell_text
        rows_data.append(row_dict)

    return {"headers": headers, "rows": rows_data}

document = Document()
document.LoadFromFile(input_file)

sections_data = []

for i in range(document.Sections.Count):
    section = document.Sections.get_Item(i)
    content_items = []

    for j in range(section.Body.ChildObjects.Count):
        obj = section.Body.ChildObjects.get_Item(j)

        if isinstance(obj, Paragraph):
            text = obj.Text.strip()
            if not text:
                continue

            heading_level = get_heading_level(obj.StyleName)
            if heading_level:
                content_items.append({
                    "type": "heading",
                    "level": heading_level,
                    "text": text
                })
            else:
                content_items.append({
                    "type": "paragraph",
                    "text": text
                })

        elif isinstance(obj, Table):
            table_data = extract_table_data(obj)
            content_items.append({
                "type": "table",
                "row_count": len(table_data["rows"]),
                "data": table_data
            })

    sections_data.append({
        "section_index": i,
        "content": content_items
    })

document.Close()

result = {
    "source": input_file,
    "section_count": len(sections_data),
    "sections": sections_data
}

with open(output_file, "w", encoding="utf-8") as f:
    json.dump(result, f, indent=2, ensure_ascii=False)

Output Example

The following JSON snippet shows how headings, paragraphs, and tables are represented in the hierarchical output structure:

{
  "source": "ProjectReport.docx",
  "section_count": 1,
  "sections": [
    {
      "section_index": 0,
      "content": [
        {
          "type": "heading",
          "level": 1,
          "text": "Quarterly Sales Report"
        },
        {
          "type": "paragraph",
          "text": "This report provides an overview of sales performance across all regions."
        },
        {
          "type": "heading",
          "level": 2,
          "text": "Regional Breakdown"
        },
        {
          "type": "table",
          "row_count": 1,
          "data": {
            "headers": ["Region", "Product", "Units Sold", "Revenue"],
            "rows": [
              {"Region": "North", "Product": "Laptop", "Units Sold": "120", "Revenue": "114000"}
            ]
          }
        }
      ]
    }
  ]
}

Conversion Result

The image below illustrates how headings, paragraphs, and tables are preserved in a hierarchical JSON structure.

Word to JSON conversion result - hierarchical structure

Explanation

This method differs from the previous two in a fundamental way: it uses section.Body.ChildObjects to iterate through all content elements in document order, rather than separately iterating paragraphs and tables. This preserves the original sequence and interleaving of headings, paragraphs, and tables.

Key design decisions:

Heading detection via StyleName – Word headings are paragraphs styled with "Heading1", "Heading2", etc. Checking the style name allows you to distinguish headings from body text and record the heading level. Note that the exact heading style names may vary depending on the Word template or language settings (e.g., "Heading 1" with a space, or localized names like "标题 1" in Chinese). Normalizing the style name before lookup handles differences in case and spacing; localized names still need their own entries in the lookup map:

def get_heading_level(style_name):
    normalized = style_name.lower().replace(" ", "")
    heading_map = {"heading1": 1, "heading2": 2, "heading3": 3, "heading4": 4}
    return heading_map.get(normalized, None)

ChildObjects iteration – Unlike section.Paragraphs (which only returns paragraphs) or section.Tables (which only returns tables), ChildObjects returns all elements in their original order. This is essential for preserving the document's logical structure.
Structured JSON output – Each content item includes a type field (heading, paragraph, or table), making it easy for downstream systems to process different content types appropriately.
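As a variation on the normalization function above, a regular expression can cover spacing, case, and a localized prefix in one place. This is a sketch under assumptions: the prefixes "heading" and "标题" are taken from the examples in this section, and other templates or languages may use different names that you would need to add.

```python
import re

# Assumed prefixes: "heading" (English) and "标题" (the Chinese name mentioned
# above). Extend the alternation for the templates you actually encounter.
HEADING_PATTERN = re.compile(r"^(?:heading|标题)\s*([1-9])$", re.IGNORECASE)

def get_heading_level(style_name):
    """Return the heading level (1-9) for a style name, or None if not a heading."""
    if not style_name:
        return None
    match = HEADING_PATTERN.match(style_name.strip())
    return int(match.group(1)) if match else None
```

Unlike a fixed dictionary, this version also returns None for an empty or missing style name instead of raising an error.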

This approach is particularly valuable for:

RAG and AI pipelines – the heading structure enables chunking documents by section, improving retrieval accuracy
Knowledge base construction – hierarchical JSON maps directly to tree-structured knowledge graphs
Document understanding – preserving the relationship between headings and their associated content allows semantic analysis of document sections
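To make the section-level chunking idea concrete, the sketch below groups a content list like the one produced by Method 3 into one chunk per heading. The function name chunk_by_heading and the chunk layout are hypothetical choices for illustration; real pipelines often also split long chunks by size.

```python
def chunk_by_heading(content_items):
    """Group content items into chunks, starting a new chunk at each heading."""
    chunks = []
    current = {"heading": None, "level": None, "items": []}
    for item in content_items:
        if item["type"] == "heading":
            # Close the previous chunk if it holds anything
            if current["heading"] is not None or current["items"]:
                chunks.append(current)
            current = {"heading": item["text"], "level": item["level"], "items": []}
        else:
            current["items"].append(item)
    if current["heading"] is not None or current["items"]:
        chunks.append(current)
    return chunks

content = [
    {"type": "heading", "level": 1, "text": "Quarterly Sales Report"},
    {"type": "paragraph", "text": "This report provides an overview of sales performance across all regions."},
    {"type": "heading", "level": 2, "text": "Regional Breakdown"},
    {"type": "table", "row_count": 1, "data": {"headers": ["Region"], "rows": [{"Region": "North"}]}},
]
chunks = chunk_by_heading(content)
for chunk in chunks:
    print(chunk["heading"], len(chunk["items"]))
```

Content that appears before the first heading ends up in a chunk whose heading is None, so no text is lost.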

If you need to extract specific content types from Word documents, such as headings, paragraphs, or tables, see our tutorial on reading Word documents in Python, which covers content extraction techniques in more detail.

6. When to Use Word to JSON Conversion

Word to JSON conversion is useful in any scenario where structured data needs to be extracted from Word documents at scale. Common use cases include:

AI and RAG document processing – Convert Word documents into JSON chunks for embedding and retrieval in LLM-based applications. The hierarchical structure from Method 3 enables section-level chunking, which produces better retrieval results than flat text splitting.
Knowledge base construction – Build structured knowledge bases from technical documentation, policy documents, or manuals stored as .docx files.
Batch data extraction – Extract data from hundreds of Word reports, invoices, or forms and load the results into a database or data warehouse.
Contract and resume parsing – Convert legal contracts, HR documents, or resumes into structured JSON for automated analysis and comparison.
API and web application data exchange – Serve Word document content through REST APIs as JSON, enabling web and mobile applications to consume document data without handling .docx files directly.
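For the batch scenario, a thin wrapper can apply any of the three methods to a whole folder. The sketch below is deliberately independent of Spire.Doc: it assumes you pass in a convert callable (hypothetical) that takes a file path and returns a JSON-serializable dictionary, such as a function built from the Method 3 code.

```python
import json
from pathlib import Path

def convert_folder(input_dir, output_dir, convert):
    """Run convert(path) -> dict on every .docx file and write one JSON file each."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written, failed = [], []
    for docx_path in sorted(input_dir.glob("*.docx")):
        try:
            result = convert(docx_path)
        except Exception as exc:  # keep going if one file cannot be converted
            failed.append((docx_path.name, str(exc)))
            continue
        target = output_dir / (docx_path.stem + ".json")
        with open(target, "w", encoding="utf-8") as f:
            json.dump(result, f, indent=2, ensure_ascii=False)
        written.append(target)
    return written, failed
```

Returning the list of failures rather than raising on the first error lets a long batch finish and be reviewed afterwards.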

7. Limitations and Best Practices

Limitations

No standard JSON schema for Word – There is no universally accepted schema for representing Word content in JSON, so the structure you choose must be designed for your specific use case.
Complex formatting is not captured – The methods in this tutorial extract text content and basic structural metadata (heading levels, table data). They do not capture fonts, colors, images, page layout, headers/footers, or footnotes. If your application requires these elements, additional extraction logic is needed.
Merged table cells require special handling – Word tables can contain merged cells (both horizontal and vertical). The simple row-by-row extraction in Method 2 assumes a regular grid. Documents with merged cells may produce unexpected results.
Large documents may need chunked processing – For documents with hundreds of pages or dozens of tables, consider processing sections or tables individually to manage memory usage.
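One concrete symptom of irregular tables is header text that is empty or repeated. Because Method 2 uses header text as dictionary keys, a repeated header would silently overwrite an earlier column's value. Whether merged header cells produce such text in Spire.Doc is an assumption here and has not been verified; the sketch below only shows a defensive way to make the keys distinct. The helper name make_unique_headers is hypothetical.

```python
def make_unique_headers(headers):
    """Replace empty header text and add suffixes so every key is distinct."""
    used = set()
    unique = []
    for index, header in enumerate(headers):
        base = header.strip() or f"Column_{index}"
        name, suffix = base, 2
        while name in used:  # keep trying until the name is unused
            name = f"{base}_{suffix}"
            suffix += 1
        used.add(name)
        unique.append(name)
    return unique

print(make_unique_headers(["Region", "", "Revenue", "Revenue"]))
# prints ['Region', 'Column_1', 'Revenue', 'Revenue_2']
```

Applying this to the headers list before building the row dictionaries keeps every cell value in the output.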

Best Practices

Design your JSON schema before writing code – Decide what you need (text only? headings? tables? full structure?) and choose the appropriate extraction method.
Validate output against sample documents – Word documents vary widely in structure and formatting. Test your conversion logic against representative samples from your actual document set.
Handle encoding explicitly – Always specify encoding="utf-8" when writing JSON files to avoid character encoding issues with non-ASCII text.
Use ensure_ascii=False in json.dump – This preserves Unicode characters in the output rather than escaping them, which is important for documents containing non-English text.
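The effect of ensure_ascii can be seen with a small standard-library example. The sample record is hypothetical; it reuses the localized style name mentioned earlier as non-ASCII text.

```python
import json

record = {"type": "heading", "level": 1, "text": "标题 1"}

escaped = json.dumps(record)                       # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # keeps the original characters

# The escaped form contains only ASCII, with \uXXXX sequences for other characters
print(escaped)
print(escaped.isascii(), readable.isascii())

# Both forms decode to the same data; only the stored representation differs
print(json.loads(escaped) == json.loads(readable))
```

Either form is valid JSON. The unescaped form is easier to read and usually smaller, which is why it pairs with writing the file as UTF-8.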

8. FAQ

Can I convert DOCX to JSON in Python?

Yes. Using Spire.Doc for Python, you can load any .docx file, iterate through its sections, paragraphs, and tables, and serialize the extracted content to JSON using Python's built-in json module. This tutorial demonstrates three methods for doing so, from simple text extraction to full structural preservation.

What is the best Word to JSON converter for developers?

For developers who need batch processing, automation, or custom JSON schemas, a Python-based approach using Spire.Doc is more flexible than online converters. Online tools work for one-off conversions but cannot handle large-scale processing, custom output formats, or integration into automated pipelines.

Can I convert Word tables to JSON?

Yes. By iterating through the tables in a Word document and extracting cell text row by row, you can convert table data into a JSON array of objects. Method 2 in this tutorial demonstrates this with header-based key mapping.

Does Word have a native JSON export option?

No. Microsoft Word does not provide a built-in JSON export format. Word files can be saved as DOCX, PDF, HTML, RTF, and plain text, but converting to JSON requires a programmatic approach that reads the document structure and maps it to a JSON schema.

Can I preserve headings and structure when converting Word to JSON?

Yes. By iterating through all child objects in each section's body and checking paragraph style names, you can detect headings, body paragraphs, and tables, then build a hierarchical JSON structure that preserves the document's logical organization. Method 3 in this tutorial provides a complete implementation.

Can I convert Word to JSON online?

Yes, there are online Word to JSON converters that can handle one-off conversions. However, online tools are limited to single-file processing and do not allow customization of the JSON schema. For batch processing, automated pipelines, or custom output structures, a Python-based approach using Spire.Doc is more practical and scalable.

9. Conclusion

In this article, we demonstrated how to convert Word documents to JSON in Python using Spire.Doc for Python. We covered three methods of increasing complexity: extracting paragraph text as a flat JSON array, converting Word tables to structured JSON objects, and preserving the full document hierarchy—including headings, paragraphs, and tables—in a single JSON output.

Each method serves a different purpose. Plain text extraction works for indexing and embedding. Table extraction is ideal for data migration and report parsing. Full structural preservation enables knowledge base construction and RAG pipelines. Choose the approach that matches your requirements, and extend the JSON schema as needed for your specific use case.

Spire.Doc for Python provides comprehensive Word document processing capabilities beyond JSON conversion, including document creation, formatting, mail merge, and format conversion. You can apply for a 30-day free license to evaluate all features.

How to Convert JSON to Word in Python (JSON to DOCX)

2026-06-12 08:48:58 Written by Allen Yang

Convert JSON data to Word documents in Python

JSON is one of the most common formats for exchanging structured data between applications, APIs, and databases. In many business scenarios, however, JSON data needs to be transformed into human-readable Word documents such as reports, invoices, summaries, contracts, or exported records.

Converting JSON to Word is not a simple file format conversion. JSON has no inherent Word structure, so the process requires parsing the JSON data and mapping its elements to appropriate Word document components such as paragraphs, tables, and headings.

This article demonstrates how to convert JSON data into Word documents in Python using Spire.Doc for Python. We'll cover multiple approaches, including exporting JSON as formatted text, creating Word tables from JSON arrays, and generating structured reports from nested JSON data.

Content Overview

Understanding JSON-to-Word Conversion
Install Spire.Doc for Python
Method 1: Convert JSON to Word as Formatted Text
Method 2: Convert JSON Arrays to Word Tables
Method 3: Generate Structured Word Reports from JSON
Handle Nested JSON Objects
Handle Missing or Optional Fields
Convert JSON Files to Word Documents
Why Use Spire.Doc for JSON-to-Word Conversion
FAQ
Conclusion

1. Understanding JSON-to-Word Conversion

JSON and Word documents serve fundamentally different purposes. JSON is a structured data format designed for data exchange and machine processing, while Word documents are intended for human consumption with rich formatting, visual hierarchy, and page layout.

As a result, converting JSON to Word is not a direct format transformation. The JSON data must first be parsed and mapped to appropriate document elements before a Word document can be generated.

The conversion process typically follows this workflow:

JSON Data
      ↓
Parse JSON (json.loads)
      ↓
Map Data Structure
      ↓
Spire.Doc for Python
      ↓
Paragraphs / Tables / Headings
      ↓
DOCX Document

In Python, the built-in json module is commonly used to parse JSON data, while Spire.Doc for Python handles document generation. After the JSON structure is analyzed and mapped, Spire.Doc can create paragraphs, tables, headings, images, and other Word elements programmatically, producing a fully formatted DOCX document.

The table below shows common mappings between JSON structures and Word elements:

JSON Structure	Word Element	Example
Key-Value Pair	Paragraph	`"Name": "John"` → `Name: John`
Array	Table	`[{...}, {...}]` → rows and columns
Object	Section	Nested object → grouped content
Title Field	Heading	`"title": "Report"` → Heading 1
URL/Image Path	Image	`"logo": "img.png"` → embedded image

Understanding these mappings is important because the same JSON data can be presented in different ways depending on the document's purpose. For example, simple key-value data may be exported as paragraphs, while collections of records are usually easier to read when rendered as tables. With Spire.Doc for Python, these mappings can be implemented programmatically to generate professional Word documents from structured JSON data.
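The mapping table can be sketched as a small dispatch function that suggests a Word element for each top-level JSON entry. This is only an illustration of the idea: the function name classify, the rule order, and the image file extensions are assumptions, and a real project would tune them to its own data.

```python
def classify(key, value):
    """Suggest a Word element for one JSON entry, following the mapping table."""
    if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
        return "table"      # array of objects -> rows and columns
    if isinstance(value, dict):
        return "section"    # nested object -> grouped content
    if key.lower() == "title":
        return "heading"
    if isinstance(value, str) and value.lower().endswith((".png", ".jpg", ".jpeg")):
        return "image"      # assumed: image paths are recognized by extension
    return "paragraph"      # plain key-value pair

data = {
    "title": "Monthly Sales Report",
    "logo": "img.png",
    "period": "June 2026",
    "sales": [{"Region": "North", "Revenue": 150000}],
    "contact": {"Name": "John"},
}
plan = {key: classify(key, value) for key, value in data.items()}
print(plan)
```

Separating the decision ("what should this become?") from the document-building code makes it easier to change the presentation later without touching the parsing logic.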

2. Install Spire.Doc for Python

Before converting JSON to Word, you need to install Spire.Doc for Python in your development environment.

Install via pip (Recommended)

pip install spire.doc

Alternatively, you can download Spire.Doc for Python and integrate it manually.

After installation, import the library in your project:

from spire.doc import *
from spire.doc.common import *

3. Method 1: Convert JSON to Word as Formatted Text

This method is the simplest approach for converting JSON to Word. It works well for API responses, configuration files, and simple JSON exports where each key-value pair maps to a paragraph.

Sample JSON

{
  "Name": "John Smith",
  "Department": "Sales",
  "Country": "USA"
}

Python Code

import json
from spire.doc import Document, FileFormat

json_data = '{"Name": "John Smith", "Department": "Sales", "Country": "USA"}'
data = json.loads(json_data)

document = Document()
section = document.AddSection()

for key, value in data.items():
    paragraph = section.AddParagraph()
    text_range = paragraph.AppendText(f"{key}: {value}")
    text_range.CharacterFormat.FontSize = 12
    paragraph.Format.AfterSpacing = 6

document.SaveToFile("json_to_text.docx", FileFormat.Docx)
document.Close()

Output

The following Word document shows how JSON key-value pairs can be converted into formatted paragraphs.

JSON key-value pairs converted to Word paragraphs

When to Use This Approach

This method is best suited for:

Simple key-value JSON objects
API response exports
Configuration file documentation
Quick data snapshots

It is not ideal for large datasets or tabular data, where Method 2 (tables) provides better readability.

If your goal is to analyze, filter, or manipulate structured JSON data in a spreadsheet, you may also be interested in our guide on converting JSON to Excel in Python.

4. Method 2: Convert JSON Arrays to Word Tables

When JSON data contains arrays of objects, tables provide the most effective way to present the data in a Word document. This is the most common scenario for converting JSON to Word, as many APIs and databases return data as JSON arrays.

Sample JSON

[
  {"Product": "Laptop", "Price": 1200, "Stock": 45},
  {"Product": "Mouse", "Price": 30, "Stock": 200},
  {"Product": "Keyboard", "Price": 85, "Stock": 120}
]

Python Code

import json
from spire.doc import (
    Document, FileFormat, HorizontalAlignment,
    VerticalAlignment, TableRowHeightType, Color
)

json_data = '''[
  {"Product": "Laptop", "Price": 1200, "Stock": 45},
  {"Product": "Mouse", "Price": 30, "Stock": 200},
  {"Product": "Keyboard", "Price": 85, "Stock": 120}
]'''
data = json.loads(json_data)

document = Document()
section = document.AddSection()

if data:
    headers = list(data[0].keys())
    table = section.AddTable(True)
    table.ResetCells(len(data) + 1, len(headers))

    header_row = table.Rows[0]
    header_row.IsHeader = True
    header_row.Height = 20
    header_row.HeightType = TableRowHeightType.Exactly

    for col_index, header in enumerate(headers):
        header_row.Cells[col_index].CellFormat.Shading.BackgroundPatternColor = Color.get_Gray()
        header_row.Cells[col_index].CellFormat.VerticalAlignment = VerticalAlignment.Middle
        paragraph = header_row.Cells[col_index].AddParagraph()
        paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
        text_range = paragraph.AppendText(header)
        text_range.CharacterFormat.Bold = True
        text_range.CharacterFormat.FontSize = 12

    for row_index, record in enumerate(data):
        data_row = table.Rows[row_index + 1]
        data_row.Height = 20
        data_row.HeightType = TableRowHeightType.Exactly
        for col_index, key in enumerate(headers):
            data_row.Cells[col_index].CellFormat.VerticalAlignment = VerticalAlignment.Middle
            paragraph = data_row.Cells[col_index].AddParagraph()
            paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
            text_range = paragraph.AppendText(str(record.get(key, "")))
            text_range.CharacterFormat.FontSize = 11

document.SaveToFile("json_to_table.docx", FileFormat.Docx)
document.Close()

Output

The following screenshot shows the generated Word table created from the JSON array.

JSON array converted to Word table

Why Use Tables for JSON Arrays

Tables are the natural fit for JSON array data because:

Each JSON object maps to a table row
Each key maps to a column header
Data is aligned for easy scanning and comparison
Tables are the standard format for reports, inventory lists, and exported database records
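The code above takes its column headers from the first record only, so a key that appears only in later records would not get a column. If your JSON objects do not all share the same keys, one option is to collect the headers from every record first. The helper name collect_headers and the sample records below are hypothetical.

```python
def collect_headers(records):
    """Return every key that appears in any record, in first-seen order."""
    headers = []
    for record in records:
        for key in record:
            if key not in headers:
                headers.append(key)
    return headers

records = [
    {"Product": "Laptop", "Price": 1200},
    {"Product": "Mouse", "Price": 30, "Stock": 200},
]
headers = collect_headers(records)

# Build the cell text grid; missing keys become empty strings
rows = [[str(record.get(key, "")) for key in headers] for record in records]
print(headers)
print(rows)
```

The resulting headers list and rows grid can then be written into the table cells in the same way as in the example above.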

Enhancing JSON Tables with Formatting

Unlike plain text exports, Spire.Doc allows JSON data to be rendered as professionally formatted Word tables. Beyond basic table creation, you can apply:

Table styles – Use DefaultTableStyle or ApplyStyle for consistent, polished table appearances
Borders and shading – Control cell borders, background colors, and alternating row colors
Alignment – Set horizontal and vertical alignment at the cell, row, or table level
Custom formatting – Apply font size, bold, and color to individual cells or ranges
Auto-fit behavior – Use AutoFit to adjust column widths to content or window size

These formatting capabilities transform raw JSON data into professional report layouts suitable for business documents, client deliverables, and automated reporting pipelines.

If you need to create more sophisticated Word tables, such as merged cells, custom table layouts, or advanced formatting, see our guide on creating and formatting tables in Word documents using Python.

5. Method 3: Generate Structured Word Reports from JSON

Real-world JSON data often contains a mix of metadata, summary text, and tabular data. This method combines headings, paragraphs, and tables to generate a complete structured Word report from JSON.

Sample JSON

{
  "title": "Monthly Sales Report",
  "period": "June 2026",
  "summary": "Total revenue reached $580,000 this month, representing a 12% increase over the previous period. All regions showed positive growth.",
  "sales": [
    {"Region": "North", "Revenue": 150000, "Units": 320},
    {"Region": "South", "Revenue": 120000, "Units": 280},
    {"Region": "East", "Revenue": 180000, "Units": 410},
    {"Region": "West", "Revenue": 130000, "Units": 290}
  ]
}

Python Code

import json
from spire.doc import (
    Document, FileFormat, HorizontalAlignment,
    VerticalAlignment, TableRowHeightType, Color,
    BuiltinStyle
)

json_data = '''{
  "title": "Monthly Sales Report",
  "period": "June 2026",
  "summary": "Total revenue reached $580,000 this month, representing a 12% increase over the previous period. All regions showed positive growth.",
  "sales": [
    {"Region": "North", "Revenue": 150000, "Units": 320},
    {"Region": "South", "Revenue": 120000, "Units": 280},
    {"Region": "East", "Revenue": 180000, "Units": 410},
    {"Region": "West", "Revenue": 130000, "Units": 290}
  ]
}'''
data = json.loads(json_data)

document = Document()
section = document.AddSection()

heading_style = document.AddStyle(BuiltinStyle.Heading1)
subheading_style = document.AddStyle(BuiltinStyle.Heading2)

title_para = section.AddParagraph()
title_para.ApplyStyle(heading_style.Name)
title_para.AppendText(data.get("title", "Report"))

period_para = section.AddParagraph()
period_para.AppendText(f"Period: {data.get('period', 'N/A')}")
period_para.Format.AfterSpacing = 12

summary_heading = section.AddParagraph()
summary_heading.ApplyStyle(subheading_style.Name)
summary_heading.AppendText("Executive Summary")

summary_para = section.AddParagraph()
summary_para.AppendText(data.get("summary", ""))
summary_para.Format.AfterSpacing = 12

sales_heading = section.AddParagraph()
sales_heading.ApplyStyle(subheading_style.Name)
sales_heading.AppendText("Sales Data")

sales = data.get("sales", [])
if sales:
    headers = list(sales[0].keys())
    table = section.AddTable(True)
    table.ResetCells(len(sales) + 1, len(headers))

    header_row = table.Rows[0]
    header_row.IsHeader = True
    header_row.Height = 20
    header_row.HeightType = TableRowHeightType.Exactly

    for col_index, header in enumerate(headers):
        header_row.Cells[col_index].CellFormat.Shading.BackgroundPatternColor = Color.get_Gray()
        header_row.Cells[col_index].CellFormat.VerticalAlignment = VerticalAlignment.Middle
        paragraph = header_row.Cells[col_index].AddParagraph()
        paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
        text_range = paragraph.AppendText(header)
        text_range.CharacterFormat.Bold = True

    for row_index, record in enumerate(sales):
        data_row = table.Rows[row_index + 1]
        data_row.Height = 20
        data_row.HeightType = TableRowHeightType.Exactly
        for col_index, key in enumerate(headers):
            data_row.Cells[col_index].CellFormat.VerticalAlignment = VerticalAlignment.Middle
            paragraph = data_row.Cells[col_index].AddParagraph()
            paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
            paragraph.AppendText(str(record.get(key, "")))

document.SaveToFile("json_report.docx", FileFormat.Docx)
document.Close()

Output

The generated Word document combines headings, descriptive text, and tabular data into a structured report, making the JSON data easier to read and share.

Structured Word report generated from JSON data

Key Techniques

This example demonstrates several important techniques for generating Word reports from JSON:

Headings – Use BuiltinStyle.Heading1 and Heading2 for document structure and table-of-contents compatibility
Paragraphs – Add summary and descriptive text between headings
Tables – Render JSON arrays as tabular data within the report
Combinations – Mix multiple Word element types in a single document
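
One way to think about combining these element types is to first turn the JSON object into an ordered plan and only then render it. The sketch below is an illustration under stated assumptions, not the article's method: `plan_report` and the labels "heading", "paragraph", and "table" are hypothetical names, and the rule that a `title` key becomes the main heading is an assumption made for this example.

```python
def plan_report(data):
    """Split a report-style JSON object into an ordered list of elements.

    Each element is a (kind, payload) tuple. The kinds "heading",
    "paragraph" and "table" are labels chosen for this sketch.
    """
    plan = []
    for key, value in data.items():
        is_object_array = (
            isinstance(value, list)
            and value
            and all(isinstance(item, dict) for item in value)
        )
        if is_object_array:
            # A non-empty array of objects becomes a titled table
            plan.append(("heading", key.capitalize()))
            plan.append(("table", value))
        elif key == "title":
            # Assumption: the "title" key is the document heading
            plan.append(("heading", str(value)))
        else:
            plan.append(("paragraph", f"{key.capitalize()}: {value}"))
    return plan

report = {
    "title": "Monthly Sales Report",
    "period": "June 2026",
    "sales": [{"Region": "North", "Revenue": 150000}],
}
for kind, payload in plan_report(report):
    print(kind, payload)
```

A rendering loop could then map each kind to the corresponding paragraph or table calls shown in the example above.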

Why Structured Reports Matter

In business environments, JSON data rarely exists in isolation. It typically comes from APIs, databases, or reporting systems and needs to be transformed into documents that decision-makers can read, share, and archive. Common scenarios include:

Sales reports – Revenue, units, and regional breakdowns from CRM or ERP systems
Inventory reports – Stock levels, reorder alerts, and warehouse summaries
Customer summaries – Contact details, order history, and account status
Compliance reports – Audit logs, access records, and policy status
Automated reporting systems – Scheduled jobs that generate documents from JSON data and distribute them via email or document management systems

Spire.Doc makes it possible to transform structured JSON data into polished business documents automatically, combining headings, paragraphs, and tables in a single output.

If you need to build more sophisticated document layouts, such as multi-section reports, cover pages, tables of contents, headers, footers, or custom document templates, see our guide on creating structured Word documents in Python.

6. Handle Nested JSON Objects

Many real-world JSON responses contain nested objects. For example, a customer record may include an address object with its own fields. Handling these nested structures is essential for complete JSON-to-Word conversion.

Example JSON

{
  "customer": {
    "name": "Tom Wilson",
    "email": "tom@example.com",
    "address": {
      "street": "123 Main St",
      "city": "Springfield",
      "state": "IL"
    }
  }
}

Python Code

import json
from spire.doc import Document, FileFormat, HorizontalAlignment

def add_nested_object(section, obj, indent_level=0):
    for key, value in obj.items():
        if isinstance(value, dict):
            heading_para = section.AddParagraph()
            heading_text = "  " * indent_level + key.capitalize()
            text_range = heading_para.AppendText(heading_text)
            text_range.CharacterFormat.Bold = True
            # Clamp the size so deeply nested headings stay readable
            text_range.CharacterFormat.FontSize = max(8, 12 - indent_level)
            heading_para.Format.AfterSpacing = 4
            add_nested_object(section, value, indent_level + 1)
        else:
            paragraph = section.AddParagraph()
            label = "  " * indent_level + f"{key}: {value}"
            text_range = paragraph.AppendText(label)
            text_range.CharacterFormat.FontSize = 11
            paragraph.Format.AfterSpacing = 2

json_data = '''{
  "customer": {
    "name": "Tom Wilson",
    "email": "tom@example.com",
    "address": {
      "street": "123 Main St",
      "city": "Springfield",
      "state": "IL"
    }
  }
}'''
data = json.loads(json_data)

document = Document()
section = document.AddSection()

add_nested_object(section, data)

document.SaveToFile("json_nested.docx", FileFormat.Docx)
document.Close()

Output

The following screenshot shows the hierarchical Word document generated from the nested JSON structure.

Nested JSON converted to a hierarchical Word document

Nested JSON objects can be represented as hierarchical sections in a Word document, making complex data structures easier to read and navigate.

How It Works

The add_nested_object function recursively traverses the JSON structure:

When it encounters a dict value, it creates a bold heading for the key and recurses into the nested object
When it encounters a scalar value, it creates a paragraph with the key-value pair
The indent_level parameter controls indentation and font size to create a visual hierarchy

This recursive approach handles arbitrarily deep nesting and produces a readable hierarchical layout in the Word document.
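
The traversal can also be separated from the rendering, which makes the recursion easy to inspect without creating a document. The following is a minimal sketch; `outline_lines` is a hypothetical helper name, and like the function above it treats only dict values as nested (lists are printed as plain values).

```python
def outline_lines(obj, depth=0):
    """Yield (depth, is_heading, text) tuples for a nested JSON object."""
    for key, value in obj.items():
        if isinstance(value, dict):
            # A nested object becomes a heading, then its fields one level deeper
            yield depth, True, key.capitalize()
            yield from outline_lines(value, depth + 1)
        else:
            yield depth, False, f"{key}: {value}"

customer = {"customer": {"name": "Tom Wilson", "address": {"city": "Springfield"}}}
for depth, is_heading, text in outline_lines(customer):
    print("  " * depth + text)
```

Each tuple carries the depth, so a renderer can decide indentation and font size in one place.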

7. Handle Missing or Optional JSON Fields

In real-world applications, JSON data from APIs and databases often contains missing or optional fields. Records may have inconsistent keys, and some fields may be absent entirely. Handling these cases gracefully prevents errors and ensures the generated Word document remains complete.

Example JSON with Missing Fields

[
  {"Name": "Tom Wilson", "Email": "tom@example.com", "Phone": "555-0100"},
  {"Name": "Jane Doe", "Email": "jane@example.com"},
  {"Name": "Bob Brown", "Phone": "555-0300"}
]

Python Code

import json
from spire.doc import (
    Document, FileFormat, HorizontalAlignment,
    VerticalAlignment, TableRowHeightType, Color
)

json_data = '''[
  {"Name": "Tom Wilson", "Email": "tom@example.com", "Phone": "555-0100"},
  {"Name": "Jane Doe", "Email": "jane@example.com"},
  {"Name": "Bob Brown", "Phone": "555-0300"}
]'''
data = json.loads(json_data)

document = Document()
section = document.AddSection()

if data:
    all_keys = []
    for record in data:
        for key in record.keys():
            if key not in all_keys:
                all_keys.append(key)

    table = section.AddTable(True)
    table.ResetCells(len(data) + 1, len(all_keys))

    header_row = table.Rows[0]
    header_row.IsHeader = True
    header_row.Height = 20
    header_row.HeightType = TableRowHeightType.Exactly

    for col_index, header in enumerate(all_keys):
        header_row.Cells[col_index].CellFormat.Shading.BackgroundPatternColor = Color.get_Gray()
        header_row.Cells[col_index].CellFormat.VerticalAlignment = VerticalAlignment.Middle
        paragraph = header_row.Cells[col_index].AddParagraph()
        paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
        text_range = paragraph.AppendText(header)
        text_range.CharacterFormat.Bold = True

    for row_index, record in enumerate(data):
        data_row = table.Rows[row_index + 1]
        data_row.Height = 20
        data_row.HeightType = TableRowHeightType.Exactly
        for col_index, key in enumerate(all_keys):
            data_row.Cells[col_index].CellFormat.VerticalAlignment = VerticalAlignment.Middle
            paragraph = data_row.Cells[col_index].AddParagraph()
            paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
            paragraph.AppendText(str(record.get(key, "N/A")))

document.SaveToFile("json_missing_fields.docx", FileFormat.Docx)
document.Close()

Output

The following screenshot shows the generated Word table, where missing fields are automatically filled with placeholder values to maintain a consistent document structure.

Word table generated from JSON data with missing fields

Key Techniques

dict.get(key, "N/A") – Returns a default value when a key is missing, preventing KeyError exceptions
Dynamic column collection – Iterates all records to build a complete set of column headers, ensuring no field is missed even when it appears in only some records
Consistent table structure – All rows have the same number of columns regardless of which fields are present in each record

This approach is essential for production use cases where API responses may vary in structure across different records or over time.
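
One edge case is worth noting: `record.get(key, "N/A")` only applies the default when the key is absent. If the key is present with a JSON `null`, the call returns `None`, and `str(None)` writes the text "None" into the cell. The sketch below shows one way to treat both cases the same. The helper names `collect_columns` and `cell_text` are hypothetical and used only for this illustration.

```python
def collect_columns(records):
    """Ordered union of keys across all records (first appearance wins)."""
    columns = {}
    for record in records:
        # dict preserves insertion order, so existing keys keep their position
        columns.update(dict.fromkeys(record))
    return list(columns)

def cell_text(record, key, placeholder="N/A"):
    """Return the cell text, treating a missing key and null as absent."""
    value = record.get(key)
    return placeholder if value is None else str(value)

records = [
    {"Name": "Tom Wilson", "Email": "tom@example.com", "Phone": "555-0100"},
    {"Name": "Jane Doe", "Email": "jane@example.com", "Phone": None},
    {"Name": "Bob Brown", "Phone": "555-0300"},
]
columns = collect_columns(records)
table = [[cell_text(record, column) for column in columns] for record in records]
for row in table:
    print(row)
```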

8. Convert JSON Files to Word Documents

In practice, JSON data often originates from files rather than inline strings. API export results, configuration files, database dumps, data exchange files, and log data are all commonly stored as .json files that need to be converted to Word documents.

The conversion process for JSON files follows this workflow:

JSON File (.json)
        ↓
Load JSON (json.load)
        ↓
Generate Word Document (Spire.Doc)
        ↓
DOCX Document

Python Code

import json
from spire.doc import Document, FileFormat

with open("data.json", "r", encoding="utf-8") as f:
    data = json.load(f)

document = Document()
section = document.AddSection()

# Process the loaded JSON data
# using any of the techniques shown in Methods 1–3
# (formatted text, tables, or structured reports)

document.SaveToFile("data_report.docx", FileFormat.Docx)
document.Close()

Key Points

json.load() parses JSON from an open file object, whereas json.loads() parses JSON from a string
encoding="utf-8" ensures proper handling of non-ASCII characters in JSON files
Once the JSON file is loaded into a Python dictionary or list, Spire.Doc for Python can generate paragraphs, tables, or structured reports from the parsed data using any of the methods described earlier in this article

For complete examples of processing the loaded data, refer to Method 1 for formatted text, Method 2 for tables, or Method 3 for structured reports.
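
When the input comes from a file, the load step is also where most failures occur. The sketch below wraps `json.load()` with handling for a missing file and for malformed JSON. The function name `load_json_file` and the choice to return `None` on failure are assumptions made for this example; the demonstration writes its own temporary files so it can run anywhere.

```python
import json
import tempfile
from pathlib import Path

def load_json_file(path):
    """Load a JSON file and return the parsed data, or None if it cannot be read."""
    try:
        with Path(path).open("r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"File not found: {path}")
    except json.JSONDecodeError as e:
        print(f"Invalid JSON in {path}: line {e.lineno}, column {e.colno}")
    return None

# Demonstration with temporary files
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.json"
    good.write_text('{"name": "Tom"}', encoding="utf-8")
    bad = Path(tmp) / "bad.json"
    bad.write_text('{"name": ', encoding="utf-8")

    loaded = load_json_file(good)
    broken = load_json_file(bad)
    missing = load_json_file(Path(tmp) / "missing.json")

print(loaded, broken, missing)
```

Note that a file containing only `null` would also produce `None`; a caller that needs to distinguish that case could raise an exception instead.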

9. Why Use Spire.Doc for JSON-to-Word Conversion

Converting JSON to Word involves several practical challenges that go beyond simple data parsing. Generating properly formatted tables, applying consistent styles, creating structured reports with headings and paragraphs, and handling nested or incomplete data all require a capable document generation API.

Challenges of JSON-to-Word Conversion

Table generation – JSON arrays must be mapped to Word tables with headers, rows, and cell formatting
Document formatting – Raw data exports lack the visual hierarchy that makes Word documents readable
Structured reports – Combining headings, paragraphs, and tables in a single document requires coordinating multiple element types
Nested data – Deeply nested JSON objects need recursive traversal and hierarchical layout
Large documents – Generating multi-page reports from large JSON datasets demands efficient resource management

Benefits of Spire.Doc for Python

Spire.Doc for Python addresses these challenges with a straightforward API:

Create Word documents without Microsoft Word – No Office installation or Interop dependencies required
Generate paragraphs, tables, images, headers, and footers – Full coverage of Word document elements
Apply built-in and custom styles – Consistent formatting across documents using BuiltinStyle and ParagraphStyle
Automate report generation – Programmatically build structured reports from any JSON data source
Export to DOCX and other formats – Save to DOCX, PDF, HTML, RTF, and more using FileFormat

With Spire.Doc, the JSON-to-Word conversion process becomes a structured mapping from parsed data to Word elements, rather than manual string formatting or template manipulation.

10. FAQ

How do I convert JSON to Word in Python?

Parse the JSON data using Python's built-in json module, then use Spire.Doc for Python to create a Word document. Map JSON key-value pairs to paragraphs, JSON arrays to tables, and use headings for structure. See Method 1 for a basic example and Method 3 for a complete report.

Can JSON arrays be converted into Word tables?

Yes. JSON arrays of objects map naturally to Word tables, where each object becomes a row and each key becomes a column. See Method 2 for a complete code example that creates a formatted table from a JSON array.

How do I create a DOCX report from API JSON responses?

Fetch the API response as JSON, parse it, and use Spire.Doc for Python to generate the report. Combine headings for titles, paragraphs for summaries, and tables for data arrays. See Method 3 for a structured report example.

Can nested JSON objects be exported to Word?

Yes. Use a recursive function to traverse nested JSON objects, creating headings for object keys and paragraphs for scalar values. See Section 6 for a detailed example of handling nested structures with visual hierarchy.

How do I convert a JSON file to a Word document?

Use Python's json.load() to read the JSON file, then process the parsed data with Spire.Doc for Python. See Section 8 for a code example.

What is the best way to generate Word documents from JSON data?

The best approach depends on the JSON structure. For simple key-value data, use formatted paragraphs. For arrays, use tables. For complex nested data with mixed content, combine headings, paragraphs, and tables as shown in Method 3.
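
As a rough illustration of that decision, the sketch below inspects the parsed JSON and suggests a layout. The function `choose_layout` and its three labels are hypothetical names for this example; real data may call for finer rules.

```python
def choose_layout(data):
    """Suggest "table", "paragraphs" or "report" for parsed JSON data."""
    if isinstance(data, list) and data and all(isinstance(item, dict) for item in data):
        # A non-empty array of objects maps naturally to a table
        return "table"
    if isinstance(data, dict):
        has_nested = any(isinstance(value, (dict, list)) for value in data.values())
        return "report" if has_nested else "paragraphs"
    # Scalars, empty lists and mixed lists fall back to plain text
    return "paragraphs"

print(choose_layout([{"Name": "Tom"}]))
print(choose_layout({"title": "Report", "sales": [{"Region": "North"}]}))
```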

11. Conclusion

Generating Word documents from JSON data is a common requirement in reporting, document automation, and data export workflows. With Spire.Doc for Python, you can create paragraphs, tables, and structured document layouts directly from JSON, making it easier to produce professional DOCX files from application data.

The same approach can be extended to API responses, database records, configuration files, and other structured data sources, helping automate document generation in both small projects and enterprise systems.

For scenarios involving large documents or document conversion requirements, a licensed version is required.

Published in Conversion

Tagged under

doc Python Conversion

How to Convert Excel to Markdown in Python (Files, Sheets & Ranges)

2026-06-08 07:16:12 Written by alice yang

Visual Guide for Converting Excel to Markdown in Python

Excel files are commonly used to store structured data, while Markdown is widely used in technical documentation, static websites, and Git-based publishing workflows. When you need to reuse spreadsheet data in Markdown documents, manually copying and reformatting Excel tables can be time-consuming and error-prone. A more reliable approach is to automate the conversion with Python.

This tutorial demonstrates how to convert Excel to Markdown in Python using Spire.XLS for Python. You will learn how to convert entire workbooks, export specific sheets or cell ranges, and batch-process multiple files, with a simple code example for each task.

Why Convert Excel to Markdown?

Converting Excel tables to Markdown can be useful in several scenarios:

Create documentation: Add Excel tables to README files and wikis.
Use with Git: Markdown is text-based and easier to track than Excel files.
Publish online: Use Excel data in blogs or docs sites.
Share data easily: Markdown tables are lightweight and widely compatible across platforms.

Install Python Excel to Markdown Library

To convert Excel files to Markdown in Python, install Spire.XLS for Python from PyPI:

pip install spire.xls

Markdown conversion is supported in Spire.XLS for Python 16.4.0 and later versions. If you are using an earlier version, upgrade the package first:

pip install --upgrade spire.xls

Basic Excel to Markdown Conversion in Python

The simplest way to convert an Excel file to Markdown is to load the workbook and save it as a .md file.

The process only requires three main steps:

Create a Workbook object.
Load the Excel file using the Workbook.LoadFromFile() method.
Save the workbook as a Markdown file using the Workbook.SaveToMarkdown() method.

from spire.xls import Workbook

# Create a Workbook object
workbook = Workbook()

# Load an Excel file
workbook.LoadFromFile("report.xlsx")

# Save the workbook as a Markdown file
workbook.SaveToMarkdown("output.md")

# Release resources
workbook.Dispose()

Output:

Convert an Excel File to Markdown in Python

Advanced Excel to Markdown Conversion Scenarios

In many real-world projects, you may not always need to convert the entire workbook. You may want to customize how images and hyperlinks are exported, convert only one worksheet, export a selected range, or process a folder of Excel files automatically.

The following sections show how to implement these conversions in Python.

1. Customize Image and Hyperlink Export Options

When exporting Excel to Markdown, images and hyperlinks are written as Markdown syntax. You can use the properties of the MarkdownOptions class to control how image paths and hyperlinks are saved in the output file.

Property	When Set to True	When Set to False
SavePicInRelativePath	Images are saved with relative paths, such as `![Image](pic1.png)`.	Images are saved with absolute paths, such as `![Image](C:/full/path/to/pic1.png)`.
SaveHyperlinkAsRef	Hyperlinks are saved as reference-style links, such as `[Link Text][ref1]`.	Hyperlinks are saved as inline links, such as `[Link Text](https://example.com)`.

Using relative image paths is usually better for documentation projects because the Markdown file and image folder can be moved together. Inline links are often easier to read and maintain in smaller Markdown files.

The following example shows how to convert an Excel workbook to Markdown with custom image and hyperlink options:

from spire.xls import Workbook, MarkdownOptions

# Create a Workbook object
workbook = Workbook()

# Load an Excel file
workbook.LoadFromFile("sample.xlsx")

# Create a MarkdownOptions object
markdown_options = MarkdownOptions()

# Save images with relative paths
markdown_options.SavePicInRelativePath = True

# Save hyperlinks as inline links
markdown_options.SaveHyperlinkAsRef = False

# Save the workbook as a Markdown file
workbook.SaveToMarkdown("custom_options.md", markdown_options)

# Release resources
workbook.Dispose()

Output:

Convert Excel to Markdown with Custom Image and Hyperlink Options

2. Convert a Specific Sheet to Markdown

If an Excel workbook contains multiple worksheets, but you only need to export one sheet, you can copy the target worksheet to a new workbook with the AddCopy method, and then save that new workbook as a .md file.

This approach helps avoid exporting unnecessary sheets into the same Markdown document.

from spire.xls import Workbook


def convert_specific_sheet(excel_file, sheet_name, output_md):
    """
    Convert a specific worksheet in an Excel file to Markdown.
    """
    workbook = Workbook()
    new_workbook = None

    try:
        # Load the Excel file
        workbook.LoadFromFile(excel_file)

        # Find the target worksheet by name
        worksheet = None
        for ws in workbook.Worksheets:
            if ws.Name == sheet_name:
                worksheet = ws
                break

        if worksheet is None:
            print(f"Worksheet '{sheet_name}' was not found.")
            return

        # Create a new workbook that contains only the target worksheet
        new_workbook = Workbook()
        new_workbook.Worksheets.Clear()
        new_workbook.Worksheets.AddCopy(worksheet)

        # Save the new workbook as Markdown
        new_workbook.SaveToMarkdown(output_md)

        print(f"Worksheet '{sheet_name}' converted successfully to {output_md}.")

    finally:
        # Release resources
        if new_workbook is not None:
            new_workbook.Dispose()

        workbook.Dispose()


# Usage
convert_specific_sheet("report.xlsx", "Sheet 1", "sheet1.md")

3. Export a Selected Cell Range to Markdown

Sometimes, you may only need to export part of a worksheet, such as a summary table, a data range, or a report section. In this case, you can copy the required cell range to a new workbook and save it as a Markdown file.

The following example converts a selected range from a specific worksheet to a Markdown file:

from spire.xls import Workbook, CopyRangeOptions


def convert_cell_range_to_markdown(excel_file, sheet_name, cell_range, output_md):
    """Convert a specific cell range from an Excel worksheet to Markdown.

    Example cell range: "A1:C5"
    """
    workbook = Workbook()
    new_workbook = Workbook()

    try:
        # Load the original Excel file
        workbook.LoadFromFile(excel_file)

        # Get the target worksheet by name
        worksheet = workbook.Worksheets[sheet_name]
        if worksheet is None:
            print(f"Worksheet '{sheet_name}' was not found.")
            return

        # Get the specific source cell range (e.g., "A1:C5")
        src_range = worksheet.Range[cell_range]

        # Initialize the new workbook with a single blank sheet
        new_workbook.CreateEmptySheets(1)
        new_sheet = new_workbook.Worksheets[0]

        # Define the destination range starting at cell A1 in the new sheet.
        # We use the row and column count of the source range to match the size perfectly.
        dest_range = new_sheet.Range[
            1, 1, src_range.Rows.Count, src_range.Columns.Count
        ]

        # Copy ONLY the selected range (all data, formulas, and formatting)
        src_range.Copy(dest_range, CopyRangeOptions.All)

        # Save the new isolated workbook as Markdown
        new_workbook.SaveToMarkdown(output_md)

        print(
            f"Cell range '{cell_range}' from worksheet '{sheet_name}' "
            f"converted successfully to {output_md}."
        )

    except Exception as e:
        print(f"An error occurred: {e}")

    finally:
        # Release resources
        new_workbook.Dispose()
        workbook.Dispose()


# Usage
convert_cell_range_to_markdown(
    "report.xlsx", "Sheet 1", "A1:C5", "cell_range.md"
)

This method is useful when you want to reuse only the key part of a worksheet in documentation, instead of exporting the entire sheet.
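
Because the range is passed as a plain string, it can help to validate it before handing it to the library. The following sketch parses an A1-style range with the standard library only. `parse_a1_range` is a hypothetical helper, not a Spire.XLS function, and it deliberately handles only the simple `A1:C5` form (no sheet prefixes or `$` anchors).

```python
import re

def parse_a1_range(cell_range):
    """Parse an A1-style range such as "A1:C5".

    Returns (first_row, first_col, row_count, col_count), all 1-based,
    or raises ValueError for malformed input.
    """
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", cell_range.strip())
    if not match:
        raise ValueError(f"Not an A1-style range: {cell_range!r}")

    def col_number(letters):
        # Column letters are a base-26 number: A=1, Z=26, AA=27
        number = 0
        for ch in letters.upper():
            number = number * 26 + (ord(ch) - ord("A") + 1)
        return number

    c1, r1, c2, r2 = match.groups()
    first_row, last_row = sorted((int(r1), int(r2)))
    first_col, last_col = sorted((col_number(c1), col_number(c2)))
    return first_row, first_col, last_row - first_row + 1, last_col - first_col + 1

print(parse_a1_range("A1:C5"))
```

The returned row and column counts correspond to the size of the destination range used in the example above.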

4. Batch Convert Multiple Excel Files to Markdown

For large-scale conversion tasks, you can loop through a folder and convert all .xlsx and .xls files to Markdown automatically.

This is especially useful when you need to generate documentation from multiple reports, export datasets regularly, or integrate Excel-to-Markdown conversion into a publishing workflow.

from pathlib import Path
from spire.xls import Workbook


def batch_convert_excel_to_markdown(input_folder, output_folder):
    """
    Convert all Excel files in a folder to Markdown files.
    Supported formats: .xlsx and .xls
    """
    input_dir = Path(input_folder)
    output_dir = Path(output_folder)

    # Create the output folder if it does not exist
    output_dir.mkdir(parents=True, exist_ok=True)

    # Supported Excel file extensions
    excel_extensions = {".xlsx", ".xls"}

    converted_count = 0

    for input_file in input_dir.iterdir():
        # Skip folders, temporary Excel files, and unsupported files
        if not input_file.is_file():
            continue

        if input_file.name.startswith("~$"):
            continue

        if input_file.suffix.lower() not in excel_extensions:
            continue

        output_file = output_dir / f"{input_file.stem}.md"

        workbook = Workbook()

        try:
            # Load the Excel file
            workbook.LoadFromFile(str(input_file))

            # Save as Markdown
            workbook.SaveToMarkdown(str(output_file))

            converted_count += 1
            print(f"Converted: {input_file.name} -> {output_file.name}")

        except Exception as e:
            print(f"Failed to convert {input_file.name}: {e}")

        finally:
            workbook.Dispose()

    print(f"\nBatch conversion complete. {converted_count} file(s) converted.")


# Usage
batch_convert_excel_to_markdown("./excel_files", "./markdown_output")
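
The file-selection rules in the loop above can also be pulled into a small helper and checked on their own. This is a minimal sketch using only the standard library; `find_excel_files` is a hypothetical name, and the demonstration creates empty placeholder files in a temporary folder rather than real workbooks.

```python
import tempfile
from pathlib import Path

EXCEL_EXTENSIONS = {".xlsx", ".xls"}

def find_excel_files(folder):
    """Return the convertible Excel files in a folder, sorted by name."""
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file()                          # skip sub-folders
        and not path.name.startswith("~$")         # skip Excel lock files
        and path.suffix.lower() in EXCEL_EXTENSIONS
    )

with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.xlsx", "a.XLS", "~$b.xlsx", "notes.txt"):
        (Path(tmp) / name).touch()
    (Path(tmp) / "sub.xlsx").mkdir()   # a folder that only looks like a workbook
    found = [path.name for path in find_excel_files(tmp)]

print(found)
```

Sorting the result gives a stable processing order, which makes batch logs easier to compare between runs.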

Best Practices for Converting Excel to Markdown

To get cleaner Markdown output, keep the following tips in mind:

Use simple table structures whenever possible.
Unmerge merged cells if the output is intended for Markdown tables.
Remove unused rows and columns before conversion.
Use relative image paths for portable documentation projects.
Review the generated Markdown file before publishing it to GitHub, a wiki, or a static website.
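
Part of that review can be automated. The sketch below scans Markdown text for table rows whose cell count differs from the first row of the same table. It assumes the output uses pipe-delimited tables with a leading `|` on each row, which is an assumption about the generated file rather than a documented guarantee, and it does not account for escaped pipes inside cells. `find_ragged_table_rows` is a hypothetical helper name.

```python
def find_ragged_table_rows(markdown_text):
    """Return 1-based line numbers of pipe-table rows whose cell count
    differs from the first row of the same table."""
    problems = []
    expected = None
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith("|"):
            expected = None            # a non-table line ends the current table
            continue
        # Drop exactly one leading and one trailing pipe, then count cells
        if len(stripped) > 1 and stripped.endswith("|"):
            inner = stripped[1:-1]
        else:
            inner = stripped[1:]
        cell_count = len(inner.split("|"))
        if expected is None:
            expected = cell_count      # the first row sets the table width
        elif cell_count != expected:
            problems.append(number)
    return problems

sample = "\n".join([
    "| A | B |",
    "|---|---|",
    "| 1 | 2 |",
    "| 3 |",
    "",
    "| X |",
    "|---|",
])
ragged = find_ragged_table_rows(sample)
print(ragged)
```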

Conclusion

Converting Excel to Markdown in Python with Spire.XLS for Python makes it easy to generate Markdown files from workbook data with minimal code. It is a practical solution for developers who need to add Excel data export to documentation, reporting, or publishing workflows.

FAQs

Q1: What Excel formats can be converted to Markdown?

A1: Common Excel formats such as .xlsx and .xls can be loaded and saved as Markdown files.

Q2: Are images preserved when converting Excel to Markdown?

A2: Yes. By default, images can be embedded in the Markdown output as Base64 strings. You can also configure the export options to save images with relative or absolute file paths.

Q3: Do I need Microsoft Office to convert Excel to Markdown in Python?

A3: No. Spire.XLS for Python works independently and does not require Microsoft Excel or Microsoft Office to be installed.

Get a Free License

To fully experience the capabilities of Spire.XLS for Python without any evaluation limitations, you can request a free 30-day trial license.

1. Why Use Spire.PDF for Python

Spire.PDF for Python enables loading PDFs directly from memory, without needing a disk path. This makes in-memory processing fast and avoids unnecessary disk I/O.

Key capabilities include:

Load PDFs from bytes or Stream objects
Extract text, images, and metadata
Modify PDFs and convert to other formats
Efficiently handle large files in memory

These capabilities are particularly useful in web scraping pipelines, document archiving systems, automated report generation, and content extraction workflows, where performance and memory efficiency are important.

2. Install Required Libraries

Install Spire.PDF and requests via pip:

pip install spire.pdf requests

Import the necessary modules:

from spire.pdf import *
import requests

3. Download PDF from URL

Here’s a complete example showing how to download a PDF from a URL, load it in memory, and save it to disk. Comments in the code explain each step.

import requests
from spire.pdf import *

def download_pdf_from_url():

    # Specify the PDF URL (placeholder: requests needs an absolute http:// or https:// URL)
    url = "resource/sample.pdf"
    
    # Send HTTP GET request to download the PDF
    response = requests.get(url)
    # Raise an error if the request failed (4xx or 5xx)
    response.raise_for_status()

    # Create a Stream object from the downloaded bytes
    stream = Stream(response.content)

    # Load PDF from Stream
    document = PdfDocument(stream)

    # Save PDF to local file
    document.SaveToFile("Downloaded.pdf")
    document.Close()

    print("PDF downloaded and saved successfully!")

if __name__ == "__main__":
    download_pdf_from_url()

Output:

Download PDF from URL Using Python

Explanation of key components:

requests.get(url) – Sends the HTTP GET request. The server responds with headers and the PDF binary.
response.raise_for_status() – Checks for HTTP errors (e.g., 404, 500).
response.content – Contains raw PDF bytes.
Stream(response.content) – Wraps bytes in a readable, seekable in-memory stream.
PdfDocument(stream) – Loads the PDF into memory for further operations.
document.SaveToFile() – Writes the PDF to disk.

This workflow keeps the downloaded bytes in memory until the final save, so no intermediate temporary file is written to disk.
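
A server can return status 200 with an HTML error or login page instead of a PDF, and `raise_for_status()` will not catch that. A cheap guard is to check the downloaded bytes for the `%PDF-` header that PDF files begin with before loading them. The helper below is a minimal sketch; `looks_like_pdf` is a hypothetical name, and the check is intentionally strict (some tolerant readers also accept a few bytes of leading junk before the header).

```python
def looks_like_pdf(data: bytes) -> bool:
    """Cheap sanity check: PDF files start with the "%PDF-" header."""
    return data.startswith(b"%PDF-")

print(looks_like_pdf(b"%PDF-1.7\n%..."))
print(looks_like_pdf(b"<!DOCTYPE html><html>Not found</html>"))
```

In the download function above, a call such as `looks_like_pdf(response.content)` before creating the Stream would let the script stop early with a clear message.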

4. Processing PDFs Without Saving

You can extract metadata or text directly in memory without writing files:

def process_pdf_from_url():
    url = "resource/sample.pdf"
    response = requests.get(url)
    response.raise_for_status()

    # Load PDF in memory
    document = PdfDocument(Stream(response.content))

    # Retrieve document information
    print(f"Number of pages: {document.Pages.Count}")
    info = document.DocumentInformation
    print(f"Title: {info.Title}")
    print(f"Author: {info.Author}")

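    # Extract text from the first page
    from spire.pdf import PdfTextExtractor, PdfTextExtractOptions
    extractor = PdfTextExtractor(document.Pages[0])
    # ExtractText() takes an options object; the defaults are used here
    text = extractor.ExtractText(PdfTextExtractOptions())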
    print(f"First 100 characters: {text[:100]}")

    document.Close()

if __name__ == "__main__":
    process_pdf_from_url()

Why this is useful: You can analyze content, index text, or extract metadata without creating unnecessary files on disk. This is ideal for server-side scripts, cloud functions, or batch processing.

5. Handling Large PDFs

Downloading very large PDFs (e.g., 100MB+) can consume significant memory. Use streaming download and temporary files to reduce memory usage:

import tempfile
import os

def download_large_pdf(url: str, output_path: str):
    temp_path = None
    try:
        response = requests.get(url, stream=True, timeout=60)
        response.raise_for_status()

        # Write chunks to a temporary file
        with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
            temp_path = tmp.name
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    tmp.write(chunk)

        # Load PDF from temporary file
        document = PdfDocument()
        document.LoadFromFile(temp_path)
        document.SaveToFile(output_path)
        document.Close()
        print(f"Large PDF saved to: {output_path}")

    except Exception as e:
        print(f"Error: {e}")

    finally:
        # Clean up the temporary file, even if an error occurred
        if temp_path and os.path.exists(temp_path):
            os.unlink(temp_path)

Notes:

stream=True avoids loading the entire file into memory.
Writing to a temporary file keeps the download from being buffered in memory all at once.
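
The chunk-writing step can also be isolated in a small helper that enforces a size cap. The sketch below is only an illustration and is not part of Spire.PDF or requests: write_chunks and max_bytes are hypothetical names, and the function accepts any iterable of bytes, such as response.iter_content(chunk_size=8192).

```python
import os
import tempfile
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], max_bytes: int = 500 * 1024 * 1024) -> str:
    """Write byte chunks to a temporary .pdf file and return its path.

    Raises ValueError (and deletes the partial file) if the total size
    exceeds max_bytes. The helper and its parameters are illustrative.
    """
    written = 0
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".pdf")
    try:
        with tmp:
            for chunk in chunks:
                if not chunk:  # skip empty keep-alive chunks
                    continue
                written += len(chunk)
                if written > max_bytes:
                    raise ValueError(f"Download exceeds {max_bytes} bytes")
                tmp.write(chunk)
    except Exception:
        os.unlink(tmp.name)  # do not leave a partial file behind
        raise
    return tmp.name


# Simulated chunks stand in for a real streamed response
path = write_chunks([b"%PDF-1.7\n", b"", b"%%EOF\n"])
print(os.path.getsize(path))  # 15
os.unlink(path)
```

Capping the size up front is one way to reject unexpectedly large responses before they fill the disk; the 500MB default is an arbitrary choice for this sketch.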

6. Adding Retry Logic

Network requests may fail intermittently. Adding retries improves robustness:

import time

def download_with_retry(url: str, output_path: str, max_retries: int = 3):
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            document = PdfDocument(Stream(response.content))
            document.SaveToFile(output_path)
            document.Close()
            print(f"Downloaded successfully: {output_path}")
            return True
        except requests.exceptions.RequestException as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt
                print(f"Retrying in {wait_time} seconds...")
                time.sleep(wait_time)
    print("All retry attempts failed.")
    return False

Why use this: Exponential backoff prevents overwhelming servers and handles transient network failures gracefully.
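
The wait times produced by 2 ** attempt can be inspected on their own. The following sketch assumes a hypothetical helper, backoff_delays, and adds an optional cap that the function above does not have; it only computes the schedule and makes no network requests.

```python
from typing import List


def backoff_delays(max_retries: int = 3, cap: float = 60.0) -> List[float]:
    """Return the wait (in seconds) after each failed attempt except the last."""
    # Mirrors the retry loop: wait 2 ** attempt seconds, but never more than cap
    return [min(2.0 ** attempt, cap) for attempt in range(max_retries - 1)]


print(backoff_delays(3))             # [1.0, 2.0]
print(backoff_delays(8, cap=30.0))   # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

A cap matters once max_retries grows: without it, the eighth attempt alone would wait more than a minute.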

7. Common Issues and Troubleshooting

PDF Not Found (404)

Problem: The URL does not point to a valid PDF, resulting in a 404 error.

Solution: Verify the URL and add a User-Agent header if needed:

import requests

url = "https://example.com/missing.pdf"
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)

if response.status_code == 404:
    print("PDF not found (404)")

Server Returns HTML Instead of PDF

Problem: The URL returns an HTML page instead of a PDF.

Solution: Check the Content-Type and parse HTML to locate the actual PDF:

import requests
from bs4 import BeautifulSoup

url = "https://example.com/download-page"
response = requests.get(url)
content_type = response.headers.get('Content-Type', '')

if 'application/pdf' not in content_type and 'text/html' in content_type:
    soup = BeautifulSoup(response.text, 'html.parser')
    for link in soup.find_all('a', href=True):
        if link['href'].endswith('.pdf'):
            print(f"Found PDF link: {link['href']}")
            # Download the actual PDF URL
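
Two details are easy to miss here: links found in the HTML may be relative, and the Content-Type header is not always reliable. The sketch below uses only the standard library. looks_like_pdf and resolve_pdf_link are hypothetical helper names, and the check relies on the fact that PDF files begin with the %PDF- marker.

```python
from urllib.parse import urljoin


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes carry the PDF header marker near the start."""
    # Some files have a few stray bytes before the header, so scan the first 1 KB
    return b"%PDF-" in data[:1024]


def resolve_pdf_link(page_url: str, href: str) -> str:
    """Turn a (possibly relative) href from the page into an absolute URL."""
    return urljoin(page_url, href)


print(looks_like_pdf(b"%PDF-1.7\n..."))          # True
print(looks_like_pdf(b"<!DOCTYPE html><html>"))  # False
print(resolve_pdf_link("https://example.com/download-page", "files/report.pdf"))
# https://example.com/files/report.pdf
```

Checking response.content with looks_like_pdf before passing it to PdfDocument gives a clearer error than a failed load.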

Extracted Text Shows Garbled Characters

Problem: Text extraction returns unreadable characters, often due to encoding or scanned PDFs.

Solution: Inspect a sample of the extracted text first; if it is still garbled, the PDF is probably image-based and needs OCR:

from spire.pdf import PdfDocument, PdfTextExtractor

document = PdfDocument("example.pdf")
extractor = PdfTextExtractor(document.Pages[0])
text = extractor.ExtractText()
print(text[:200])
# If text is still garbled, the PDF may be image-based; consider OCR

PDF Loads But Has No Pages

Problem: document.Pages.Count returns 0 even though the file exists.

Solution: The PDF may be corrupted or password-protected. For a protected file, pass the password when loading:

from spire.pdf import PdfDocument, Stream

with open("protected.pdf", "rb") as f:
    pdf_bytes = f.read()

# For password-protected PDF
document = PdfDocument(Stream(pdf_bytes), "password")
print(f"Pages: {document.Pages.Count}")

8. Conclusion

In this article, we demonstrated how to download PDF files from URLs in Python using Spire.PDF for Python. By leveraging the Stream class, developers can load PDF data directly from memory without unnecessary disk I/O, enabling efficient document processing pipelines.

We covered the complete workflow: downloading PDF data with the requests library, creating Stream objects from bytes, loading PdfDocument instances, handling network errors, managing large files, and troubleshooting common issues. The production-ready code examples provide a solid foundation for building robust PDF download and processing systems.

To fully experience the capabilities of Spire.PDF for Python without any evaluation limitations, you can request a free 30-day trial license.

9. FAQs

Q1. How do I download a PDF from a URL using Python?

Use the requests library to fetch the PDF data and Spire.PDF to load it from memory:

response = requests.get(url)
stream = Stream(response.content)
document = PdfDocument(stream)

Q2. How do I handle authentication-protected PDFs?

For basic authentication, use the auth parameter:

response = requests.get(url, auth=('username', 'password'))

For token-based authentication, add headers:

headers = {'Authorization': 'Bearer YOUR_TOKEN'}
response = requests.get(url, headers=headers)

Q3. What's the maximum PDF file size I can download?

The practical limit depends on your system's available memory. For very large files (section 5 uses 100MB+ as a rough guide), use the streaming approach with a temporary file instead of loading everything into memory.

Q4. Can I download multiple PDFs in parallel?

Yes. Use concurrent.futures or asyncio to download multiple PDFs simultaneously for better performance.

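from concurrent.futures import ThreadPoolExecutor

# Placeholders; replace with full https:// URLs
urls = ["url1.pdf", "url2.pdf", "url3.pdf"]
outputs = [f"download_{i}.pdf" for i in range(len(urls))]

with ThreadPoolExecutor(max_workers=5) as executor:
    # download_with_retry is the function defined in section 6
    results = list(executor.map(download_with_retry, urls, outputs))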
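
With executor.map, an exception raised in one task only surfaces when its result is read from the returned iterator, so failures are easy to overlook. The sketch below shows one way to collect a result or an error per URL. run_parallel and fake_fetch are hypothetical names, and fake_fetch stands in for a real download function so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def run_parallel(urls: Iterable[str], fetch: Callable[[str], bytes],
                 max_workers: int = 5) -> Dict[str, object]:
    """Run fetch(url) for every URL and map each URL to its result or exception."""
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if one download fails
                results[url] = exc
    return results


def fake_fetch(url: str) -> bytes:
    """Stand-in for a real download function; fails for one URL on purpose."""
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return b"%PDF-1.7"


outcome = run_parallel(["a.pdf", "bad.pdf", "c.pdf"], fake_fetch)
for url in sorted(outcome):
    print(url, "->", type(outcome[url]).__name__)
```

In a real script, fetch would wrap requests.get and return response.content, leaving the decision about retries or logging to the caller.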

Published in Document Operation

Tagged under

pdf Python Document Operation

Inserting Equations into Word in Python (LaTeX & MathML)

2026-05-20 07:15:35 Written by Allen Yang

Tutorial on How to Insert Math Equations into Word in Python

Inserting mathematical equations into Word documents programmatically is essential for developers building scientific document generators, academic reporting systems, educational platforms, or engineering automation tools. Whether you're generating research papers, technical documentation, or mathematics worksheets, automating equation insertion greatly improves efficiency and consistency.

However, manually formatting equations in Microsoft Word is time-consuming, and building a mathematical rendering engine from scratch can be extremely complex. Developers often need a reliable way to add equations in Word while supporting standard mathematical formats such as LaTeX and MathML.

With Spire.Doc for Python, developers can insert mathematical equations into Word documents directly from LaTeX and MathML code using a straightforward API. This article demonstrates how to create Word equations in Python, including how to insert formulas, convert equations between LaTeX, MathML, and Office MathML (OMML), and export Word equations into different mathematical formats.

Quick Navigation

Understanding Mathematical Equations in Word Documents
Install Spire.Doc for Python
Insert Equations into Word from LaTeX in Python
Add MathML Equations to Word Documents in Python
Convert Word Equations to LaTeX, MathML, and OMML
Render Office Math Equations to Images
Complete Example: Multi-Format Equation Processing
Common Pitfalls
FAQ

1. Understanding Mathematical Equations in Word Documents

Microsoft Word uses Office Math Markup Language (OMML) as its internal format for mathematical equations. OMML is an XML-based structure that controls equation layout, symbols, fractions, matrices, and other mathematical elements in Word documents. However, directly creating or editing OMML is cumbersome for most developers.

In real-world applications, mathematical content is more commonly written in LaTeX or MathML:

LaTeX is widely used in academia and scientific publishing because of its concise syntax and powerful mathematical typesetting capabilities.
MathML is an XML-based standard designed for mathematical content on the web and in educational systems.

To generate editable Word equations programmatically, developers often need to convert between these formats and Word's native equation objects.

Why Choose Spire.Doc for Python?

Spire.Doc for Python provides native support for Word equation processing through the OfficeMath class. Instead of manually generating OMML or relying on image-based workarounds, developers can directly create editable Word equations from LaTeX or MathML code.

Key capabilities include:

Capability	Supported
Insert equations from LaTeX	✓
Insert equations from MathML	✓
Export Word equations to LaTeX	✓
Export Word equations to MathML	✓
Access native OMML content	✓
Render equations as images	✓

These capabilities are particularly useful for academic report generation, educational platforms, MathML-to-Word conversion workflows, LaTeX publishing pipelines, and other automated document generation scenarios involving mathematical content.

2. Install Spire.Doc for Python

Install Spire.Doc for Python via pip:

pip install spire.doc

Import the required classes in your Python script:

from spire.doc import *

Alternatively, you can manually install the library from the Spire.Doc for Python download page.

3. Insert Equations into Word from LaTeX in Python

LaTeX is the most widely used format for writing mathematical equations in academic and scientific documents. With Spire.Doc for Python, you can convert LaTeX expressions into native Word equation objects and insert these equations directly into DOCX files.

The following example demonstrates how to insert multiple LaTeX equations into a Word document using the OfficeMath class.

from spire.doc import *

def insert_latex_equations():
    # Create a new Word document
    doc = Document()
    section = doc.AddSection()
    
    # Add a title paragraph
    title_para = section.AddParagraph()
    title_para.AppendText("Mathematical Equations from LaTeX")
    title_para.Format.HorizontalAlignment = HorizontalAlignment.Left
    
    # Define LaTeX equations to insert
    latex_equations = [
        r"x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}",  # Quadratic formula
        r"e^{i\pi} + 1 = 0",  # Euler's identity
        r"\int_0^\infty e^{-x} \, dx = 1",  # Definite integral
        r"\sum_{i=1}^{n} i = \frac{n(n+1)}{2}",  # Summation formula
        r"A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}",  # Matrix
        r"P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}",  # Probability formula
        r"\sin^2\theta + \cos^2\theta = 1",  # Trigonometric identity
    ]
    
    # Insert each LaTeX equation as a separate paragraph
    for latex_code in latex_equations:
        # Create an OfficeMath object from LaTeX code
        office_math = OfficeMath(doc)
        office_math.FromLatexMathCode(latex_code)
        
        # Add the equation to a new paragraph
        para = section.AddParagraph()
        para.Items.Add(office_math)
    
    # Save the document
    doc.SaveToFile("latex_equations.docx", FileFormat.Docx2019)
    doc.Close()
    print("LaTeX equations inserted successfully!")

if __name__ == "__main__":
    insert_latex_equations()

The following screenshot shows the generated Word document with equations converted from LaTeX code.

LaTeX equations inserted into Word document using Python

Key API Methods

Document – Represents the Word document container used to create sections and paragraphs
OfficeMath – Represents a mathematical equation object in Word documents
FromLatexMathCode() – Converts LaTeX mathematical code into an Office Math object that Word can render natively
Items.Add() – Adds the OfficeMath object to a paragraph's content collection
SaveToFile() – Saves the document to disk in DOCX format using FileFormat.Docx2019

This approach supports complex LaTeX constructs such as fractions, integrals, matrices, Greek letters, and other mathematical operators while preserving native Word equation formatting.

Adding Inline Equations

In addition to standalone equations, you can insert inline equations within text paragraphs. This is useful for embedding mathematical expressions within sentences or explanations.

from spire.doc import *

def insert_inline_equation():
    # Create a new Word document
    doc = Document()
    section = doc.AddSection()
    
    # Add introductory text
    para = section.AddParagraph()
    para.AppendText("The quadratic formula is ")
    
    # Insert inline equation
    office_math = OfficeMath(doc)
    office_math.FromLatexMathCode(r"x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}")
    para.Items.Add(office_math)
    
    para.AppendText(", where a ≠ 0.")
    
    # Save the document
    doc.SaveToFile("inline_equation.docx", FileFormat.Docx2019)
    doc.Close()

if __name__ == "__main__":
    insert_inline_equation()

The inserted equation appears inline within the text:

Inline equation inserted into Word document using Python

This approach makes it easy to embed mathematical expressions directly within regular text content, which is useful for educational materials, research papers, and technical documentation.

If you need to combine equations with formatted text, headings, tables, and other structured document elements, you can also refer to our tutorial on creating structured Word documents in Python.

4. Add MathML Equations to Word Documents in Python

MathML (Mathematical Markup Language) is an XML-based standard for representing mathematical expressions on the web and in digital documents. It's commonly used in online education platforms, scientific databases, and content management systems. The following example shows how to convert MathML to Word equations using Spire.Doc for Python.

from spire.doc import *

def insert_mathml_equations():
    # Create a new Word document
    doc = Document()
    section = doc.AddSection()
    
    # Add a title paragraph
    title_para = section.AddParagraph()
    title_para.AppendText("Mathematical Equations from MathML")
    
    # Define MathML equations to insert
    mathml_equations = [
        # Euler's identity
        r'<math xmlns="http://www.w3.org/1998/Math/MathML">'
        r'<msup><mi>e</mi><mrow><mi>i</mi><mi>π</mi></mrow></msup>'
        r'<mo>+</mo><mn>1</mn><mo>=</mo><mn>0</mn>'
        r'</math>',
        # Pythagorean theorem
        r'<math xmlns="http://www.w3.org/1998/Math/MathML">'
        r'<msup><mi>a</mi><mn>2</mn></msup>'
        r'<mo>+</mo>'
        r'<msup><mi>b</mi><mn>2</mn></msup>'
        r'<mo>=</mo>'
        r'<msup><mi>c</mi><mn>2</mn></msup>'
        r'</math>',
        # Fraction expression
        r'<math xmlns="http://www.w3.org/1998/Math/MathML">'
        r'<mfrac>'
        r'<mrow><mi>x</mi><mo>+</mo><mi>y</mi></mrow>'
        r'<mrow><mi>z</mi><mo>−</mo><mn>1</mn></mrow>'
        r'</mfrac>'
        r'</math>',
        # Integral equation
        r'<math xmlns="http://www.w3.org/1998/Math/MathML">'
        r'<msubsup><mo>∫</mo><mn>0</mn><mn>1</mn></msubsup>'
        r'<msup><mi>x</mi><mn>2</mn></msup>'
        r'<mi>d</mi><mi>x</mi>'
        r'<mo>=</mo>'
        r'<mfrac><mn>1</mn><mn>3</mn></mfrac>'
        r'</math>',
    ]
    
    # Insert each MathML equation as a separate paragraph
    for mathml_code in mathml_equations:
        # Create an OfficeMath object from MathML code
        office_math = OfficeMath(doc)
        office_math.FromMathMLCode(mathml_code)
        
        # Add the equation to a new paragraph
        para = section.AddParagraph()
        para.Items.Add(office_math)
    
    # Save the document
    doc.SaveToFile("mathml_equations.docx", FileFormat.Docx2019)
    doc.Close()
    print("MathML equations inserted successfully!")

if __name__ == "__main__":
    insert_mathml_equations()

The following screenshot shows the generated Word document with equations converted from MathML code.

MathML equations converted to Word format using Python

Key API Method

FromMathMLCode() – Parses MathML markup and converts it into a native Word equation object.

MathML support is especially useful when working with XML-based educational content, web-based equation systems, and STEM learning platforms that store mathematical expressions in MathML format.

Combining LaTeX and MathML in One Document

You can mix both LaTeX and MathML equations within the same document, allowing flexibility in content sources:

from spire.doc import *

def insert_mixed_equations():
    # Create a new Word document
    doc = Document()
    section = doc.AddSection()
    
    # Insert LaTeX equation
    latex_para = section.AddParagraph()
    latex_math = OfficeMath(doc)
    latex_math.FromLatexMathCode(r"E = mc^2")
    latex_para.Items.Add(latex_math)
    
    # Insert MathML equation
    mathml_para = section.AddParagraph()
    mathml_math = OfficeMath(doc)
    mathml_math.FromMathMLCode(
        r'<math xmlns="http://www.w3.org/1998/Math/MathML">'
        r'<mi>F</mi><mo>=</mo><mi>m</mi><mi>a</mi>'
        r'</math>'
    )
    mathml_para.Items.Add(mathml_math)
    
    # Save the document
    doc.SaveToFile("mixed_equations.docx", FileFormat.Docx2019)
    doc.Close()

if __name__ == "__main__":
    insert_mixed_equations()

This approach is useful when mathematical content comes from different sources, such as LaTeX-based publishing systems and MathML-based web applications.

If your mathematical content originates from web pages or HTML-based systems, you can also refer to our tutorial on converting HTML content to Word documents in Python.

5. Convert Word Equations to LaTeX, MathML, and OMML

Besides inserting equations into Word documents, Spire.Doc for Python also supports exporting Word equations to multiple mathematical markup formats. This is useful for interoperability between Word, LaTeX publishing systems, web-based MathML platforms, and custom XML workflows.

The following example demonstrates how to extract equations from a Word document and export them as LaTeX, MathML, and Office MathML (OMML).

from spire.doc import *

def export_equation_formats():
    # Load a Word document containing equations
    doc = Document()
    doc.LoadFromFile("equations.docx")

    # Access the first paragraph
    section = doc.Sections[0]
    para = section.Paragraphs[0]

    # Find OfficeMath objects
    for item in para.ChildObjects:
        if isinstance(item, OfficeMath):

            # Export to LaTeX
            latex_code = item.ToLaTexMathCode()
            print("LaTeX:")
            print(latex_code)
            print()

            # Export to MathML
            mathml_code = item.ToMathMLCode()
            print("MathML:")
            print(mathml_code)
            print()

            # Export to Office MathML (OMML)
            omml_code = item.ToOfficeMathMLCode()
            print("OMML:")
            print(omml_code)

            # Save outputs to files
            with open("equation.tex", "w", encoding="utf-8") as f:
                f.write(latex_code)

            with open("equation.xml", "w", encoding="utf-8") as f:
                f.write(mathml_code)

            with open("equation.omml", "w", encoding="utf-8") as f:
                f.write(omml_code)

            break

    doc.Close()

if __name__ == "__main__":
    export_equation_formats()

The following screenshot shows the exported equation formats printed in the Python console.

Export Word equations to LaTeX, MathML, and OMML using Python

Supported Export Formats

Format	Primary Use Case	Characteristics
LaTeX	Academic publishing and scientific papers	Compact syntax widely used in academia
MathML	Web-based mathematical content	XML-based format designed for browsers and educational systems
OMML	Microsoft Word integration	Native Office equation format with full Word compatibility

These export capabilities make it easier to:

Convert Word equations into LaTeX publishing workflows
Publish equations on websites using MathML
Integrate Word documents with XML-based systems
Inspect and debug Word equation structures using OMML

6. Render Office Math Equations to Images

In some scenarios, you may need to export equations as image files for use in presentations, web pages, or other non-editable contexts. Spire.Doc for Python allows you to render Office Math equations into image streams that can be saved as image files.

from spire.doc import *

def render_equation_as_image():
    # Create a new Word document with an equation
    doc = Document()
    section = doc.AddSection()
    para = section.AddParagraph()

    # Insert an equation
    office_math = OfficeMath(doc)
    office_math.FromLatexMathCode(
        r"\int_0^\infty e^{-x^2} dx = \frac{\sqrt{\pi}}{2}"
    )
    para.Items.Add(office_math)

    # Render the equation as an image stream
    image_stream = office_math.SaveImageToStream(ImageType.Bitmap)

    # Save the image to file
    with open("equation.png", "wb") as f:  # current folder, so no missing-directory error
        f.write(image_stream.ToArray())

    # Release unmanaged resources
    image_stream.Dispose()
    doc.Close()

    print("Equation rendered as image successfully!")

if __name__ == "__main__":
    render_equation_as_image()

The following screenshot shows the equation rendered as an image file.

Mathematical equation rendered as image from Word

This feature is particularly useful for:

Embedding equations in presentations
Displaying formulas on web pages
Generating static previews for document systems

If you want to render complete Word documents as images rather than exporting individual equations, check out our tutorial on converting Word documents to images in Python.

7. Complete Example: Multi-Format Equation Processing

The following comprehensive example demonstrates a complete workflow that combines multiple equation operations: inserting equations from different sources, exporting to various formats, and rendering as images.

from spire.doc import *

def complete_equation_workflow():
    """
    Demonstrates a complete workflow for equation processing:
    - Create equations from LaTeX and MathML
    - Export equations to LaTeX and MathML
    - Render equations as images
    """

    # Create a new Word document
    doc = Document()
    section = doc.AddSection()

    # Add document title
    title_para = section.AddParagraph()
    title_text = title_para.AppendText("Complete Equation Processing Workflow")
    title_text.CharacterFormat.FontSize = 16
    title_text.CharacterFormat.Bold = True
    title_para.Format.HorizontalAlignment = HorizontalAlignment.Center

    # Insert equations from LaTeX
    latex_section_title = section.AddParagraph()
    latex_title_text = latex_section_title.AppendText("\nEquations from LaTeX:")
    latex_title_text.CharacterFormat.Bold = True

    latex_examples = [
        (r"E = mc^2", "Einstein's Mass-Energy Equivalence"),
        (r"\sum_{i=1}^{n} i = \frac{n(n+1)}{2}", "Sum of First n Integers"),
        (r"\frac{d}{dx}\left(\int_a^x f(t)dt\right) = f(x)", "Fundamental Theorem of Calculus")
    ]

    first_equation = None

    for latex_code, description in latex_examples:
        # Add description
        desc_para = section.AddParagraph()
        desc_para.AppendText(f"{description}:")

        # Insert equation
        office_math = OfficeMath(doc)
        office_math.FromLatexMathCode(latex_code)

        eq_para = section.AddParagraph()
        eq_para.Items.Add(office_math)

        if first_equation is None:
            first_equation = office_math

    # Insert equations from MathML
    mathml_section_title = section.AddParagraph()
    mathml_title_text = mathml_section_title.AppendText("\nEquations from MathML:")
    mathml_title_text.CharacterFormat.Bold = True

    mathml_examples = [
        (
            r'<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>a</mi><mo>+</mo><mi>b</mi><mo>=</mo><mi>c</mi></math>',
            "Simple Addition"
        ),
        (
            r'<math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mi>e</mi><mrow><mi>i</mi><mi>π</mi></mrow></msup><mo>+</mo><mn>1</mn><mo>=</mo><mn>0</mn></math>',
            "Euler's Identity"
        )
    ]

    for mathml_code, description in mathml_examples:
        # Add description
        desc_para = section.AddParagraph()
        desc_para.AppendText(f"{description}:")

        # Insert equation
        office_math = OfficeMath(doc)
        office_math.FromMathMLCode(mathml_code)

        eq_para = section.AddParagraph()
        eq_para.Items.Add(office_math)

    # Save the Word document
    output_docx = "complete_equations.docx"
    doc.SaveToFile(output_docx, FileFormat.Docx2019)
    print(f"Word document saved: {output_docx}")

    # Export the first equation to LaTeX
    latex_export = first_equation.ToLaTexMathCode()

    with open("exported_equation.tex", "w", encoding="utf-8") as f:
        f.write(latex_export)

    print(f"Exported to LaTeX: {latex_export}")

    # Export the first equation to MathML
    mathml_export = first_equation.ToMathMLCode()

    with open("exported_equation.xml", "w", encoding="utf-8") as f:
        f.write(mathml_export)

    print("Exported to MathML")

    # Render the first equation as an image
    image_stream = first_equation.SaveImageToStream(ImageType.Bitmap)

    with open("equation_render.png", "wb") as f:
        f.write(image_stream.ToArray())

    # Release unmanaged resources
    image_stream.Dispose()

    print("Equation rendered as image successfully!")

    # Clean up
    doc.Close()

    print("\nWorkflow completed successfully!")

if __name__ == "__main__":
    complete_equation_workflow()

The generated Word document will look like this:

Complete Equation Processing Workflow

This complete example demonstrates:

Multi-source equation insertion – Combining LaTeX and MathML inputs
Descriptive labeling – Adding context to each equation
Format conversion – Exporting to LaTeX and MathML
Image rendering – Creating visual representations
Resource management – Proper cleanup of document objects

The resulting Word document contains well-formatted equations with descriptions, while the exported files provide alternative formats for different use cases.

8. Common Pitfalls

Raw String Literals for LaTeX

When writing LaTeX code in Python strings, always use raw strings (prefix with r) to prevent escape sequence interpretation:

# Correct: Use raw string
latex_code = r"\int_0^\infty e^{-x} dx"

# Incorrect: "\f" is read as a form feed, so \frac is silently corrupted
latex_code = "\frac{1}{2}"
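
The effect is easy to verify in plain Python, without Spire.Doc. This short check compares a raw and a non-raw string that contain the same LaTeX source; the variable names are only illustrative.

```python
raw = r"\frac{a}{b} + \theta"
plain = "\frac{a}{b} + \theta"   # "\f" becomes a form feed, "\t" becomes a tab

print(len(raw), len(plain))   # 20 18
print("\\frac" in raw)        # True
print("\\frac" in plain)      # False
print(repr(plain))            # shows the control characters that replaced \f and \t
```

Sequences such as \i or \s are not recognized escapes, so Python leaves them intact (recent versions emit a warning), which is why some non-raw LaTeX strings appear to work while others break.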

Unsupported LaTeX Commands

Not all LaTeX commands are supported by Word's equation engine. Some advanced LaTeX constructs may not render correctly. Stick to standard mathematical notation whenever possible:

# Supported: Standard mathematical notation
office_math.FromLatexMathCode(r"\alpha + \beta = \gamma")

# Some advanced LaTeX constructs may not be supported
# office_math.FromLatexMathCode(r"\begin{align} ... \end{align}")

MathML Namespace Requirements

MathML code must include the proper namespace declaration to parse correctly:

# Correct: Include namespace
mathml = r'<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>'

# Incorrect: Missing namespace may fail
mathml = r'<math><mi>x</mi></math>'
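
A quick pre-check can catch a missing namespace before the markup reaches FromMathMLCode(). The sketch below uses the standard library's XML parser; has_mathml_namespace is a hypothetical helper, not part of Spire.Doc, and it only inspects the root element.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def has_mathml_namespace(code: str) -> bool:
    """Return True if the root element is <math> in the MathML namespace."""
    try:
        root = ET.fromstring(code)
    except ET.ParseError:
        return False  # not well-formed XML at all
    # ElementTree writes namespaced tags as "{namespace}localname"
    return root.tag == f"{{{MATHML_NS}}}math"


print(has_mathml_namespace(f'<math xmlns="{MATHML_NS}"><mi>x</mi></math>'))  # True
print(has_mathml_namespace("<math><mi>x</mi></math>"))                         # False
```

The same function also rejects malformed XML, which helps when MathML arrives from user input or scraped pages.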

Memory Management

Always close documents after processing to release resources, especially in batch operations:

doc = Document()

try:
    # Process equations
    doc.SaveToFile("output.docx", FileFormat.Docx2019)

finally:
    doc.Close()  # Ensure cleanup even if errors occur

Character Encoding

When saving exported LaTeX or MathML to files, ensure proper UTF-8 encoding for special characters:

with open("equation.tex", "w", encoding="utf-8") as f:
    f.write(latex_code)

Image Stream Disposal

Always dispose of image streams after use to properly release resources:

image_stream = office_math.SaveImageToStream(ImageType.Bitmap)

try:
    with open("equation.png", "wb") as f:
        f.write(image_stream.ToArray())

finally:
    image_stream.Dispose()

Conclusion

In this article, we demonstrated how to insert mathematical equations into Word documents in Python using Spire.Doc for Python. By leveraging the Spire API, developers can create Word equations from LaTeX and MathML code, convert between LaTeX, MathML, and Word’s native OMML format, and render equations as images. This capability is essential for automating scientific document generation, educational content creation, and mathematical publishing workflows.

Spire.Doc for Python provides equation processing capabilities beyond basic insertion, including converting LaTeX and MathML into Word’s native OMML format and exporting Word equations back to LaTeX, MathML, and OMML. The library simplifies complex mathematical typesetting while maintaining compatibility with Microsoft Word’s native equation engine.

If you want to evaluate the full capabilities of Spire.Doc for Python, you can apply for a 30-day free license.

9. FAQ

How do I insert equations into Word using Python?

Use the OfficeMath class from Spire.Doc for Python. Create an OfficeMath object, call FromLatexMathCode() or FromMathMLCode() with your equation code, then add it to a paragraph using para.Items.Add(office_math). Finally, save the document using doc.SaveToFile().

Can I add LaTeX equations to Word documents in Python?

Yes. Spire.Doc for Python supports inserting equations from LaTeX code using the FromLatexMathCode() method. Standard mathematical notation such as fractions, integrals, superscripts, subscripts, and Greek letters can be converted into Word-compatible equations.

Does Spire.Doc support MathML equations?

Yes. You can create Word equations from MathML using the FromMathMLCode() method. Make sure the MathML content includes the correct namespace declaration:

<math xmlns="http://www.w3.org/1998/Math/MathML">

Can I export Word equations back to LaTeX or MathML?

Yes. Spire.Doc for Python provides methods such as ToLaTexMathCode() and ToMathMLCode() to export Office Math equations into LaTeX or MathML formats. This is useful for content migration, storage, or integration with other mathematical systems.

How can I render equations as images?

Use the SaveImageToStream() method on an OfficeMath object to render the equation as an image stream. You can then save the stream as an image file and use it in presentations, web pages, or preview systems.

Published in Document Operation

Tagged under

doc Python Document Operation

Convert JavaScript to Word with Python Automation

2026-05-15 09:23:33 Written by Allen Yang

JavaScript code displayed in a formatted Word document with syntax highlighting

Modern development teams often need to share JavaScript or JSX source code with project managers, clients, auditors, or educators who don't use code editors. However, raw .js and .jsx files are difficult to review outside tools like VS Code or WebStorm, while manually copying code into Word documents frequently breaks indentation, formatting, and readability.

Using Spire.Doc for Python together with Pygments, developers can convert JavaScript to Word in Python with syntax highlighting and customizable document formatting. This automated approach is useful for technical documentation, compliance archiving, educational materials, code reviews, and client deliverables.

In this article, you'll learn how to convert JavaScript and JSX files to Word documents in Python using Spire.Doc for Python, including basic conversion, advanced formatting techniques, batch processing, and PDF export.

Quick Navigation

Understanding the Conversion Workflow
Prerequisites
Basic Implementation of JavaScript to Word Conversion
Advanced Scenarios
Common Pitfalls
Conclusion
FAQ

1. Understanding the Conversion Workflow

The conversion process uses Pygments to generate syntax-highlighted HTML, then imports this HTML into a Word document using Spire.Doc's HTML import functionality:

Read source code from .js or .jsx files
Generate syntax-highlighted HTML using Pygments' highlight() function
Import the HTML into Word using AppendHTML()

This approach provides syntax coloring through Pygments' built-in styles, while Spire.Doc handles document structure, including margins, headers, footers, and multi-format export. Together, the two libraries make it straightforward to automate the conversion process.

2. Prerequisites

Before converting JavaScript files to Word documents in Python, you need to install Spire.Doc for Python and Pygments:

pip install spire.doc
pip install pygments

Verify the packages are available:

import spire.doc
from pygments import highlight
from pygments.formatters import HtmlFormatter

Alternatively, you can download Spire.Doc for Python and add it to your project.

3. Basic Implementation

The following example converts a JavaScript file to a Word document with syntax highlighting:

from spire.doc import *
from pygments import highlight
from pygments.lexers import JavascriptLexer
from pygments.formatters import HtmlFormatter

def convert_js_to_word(input_file: str, output_file: str) -> None:
    """Convert JavaScript file to Word document with syntax highlighting."""
    
    with open(input_file, "r", encoding="utf-8") as file:
        js_code = file.read()
    
    document = Document()
    section = document.AddSection()
    section.PageSetup.Margins.All = 50
    
    title_paragraph = section.AddParagraph()
    title_text = title_paragraph.AppendText(f"Source Code: {input_file}")
    title_text.CharacterFormat.FontName = "Arial"
    title_text.CharacterFormat.FontSize = 14
    title_text.CharacterFormat.Bold = True
    title_paragraph.Format.AfterSpacing = 10
    
    html_formatter = HtmlFormatter(
        nowrap=True,
        style='colorful',
        noclasses=True
    )
    
    highlighted_html = highlight(js_code, JavascriptLexer(), html_formatter)
    
    code_paragraph = section.AddParagraph()
    code_paragraph.AppendHTML(f'<pre style="font-family: Consolas; font-size: 10pt;">{highlighted_html}</pre>')
    
    document.SaveToFile(output_file, FileFormat.Docx)
    document.Close()
    
    print(f"Converted {input_file} to {output_file}")

convert_js_to_word("app.js", "JavaScriptCode.docx")

Word document showing JavaScript code with blue keywords, green strings, and gray comments

Key Components

Document – Word document container for sections, paragraphs, and content
Section – Document section with page setup properties (margins, orientation)
Paragraph – Text container with formatting options
AppendHTML() – Imports HTML content into the paragraph, including inline styles for colors and fonts
highlight() – Pygments function that generates syntax-highlighted output
HtmlFormatter – Pygments formatter producing HTML with inline styles (use noclasses=True)
JavascriptLexer – Pygments lexer that identifies JavaScript syntax elements

Spire.Doc can import syntax-highlighted HTML generated by Pygments, allowing JavaScript code formatting and colors to be preserved in Word documents.

4. Advanced Scenarios

Convert JSX Files

For JSX files, it's recommended to use JsxLexer instead of JavascriptLexer to achieve more accurate syntax highlighting for component tags and embedded JSX expressions.

Example JSX input (App.jsx):

```jsx
import React, { useState } from 'react';

const TodoList = () => {
    const [todos, setTodos] = useState([]);

    return (
        <div className="todo-container">
            <h1>My Tasks</h1>
        </div>
    );
};

export default TodoList;
```

Use `JsxLexer` when generating syntax-highlighted HTML:

```python
from pygments.lexers import JsxLexer

# jsx_code holds the contents of App.jsx; html_formatter is the
# HtmlFormatter configured in the basic example
highlighted_html = highlight(
    jsx_code,
    JsxLexer(),
    html_formatter
)
```

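Then convert the highlighted JSX content to Word using the same AppendHTML() workflow. Note that convert_js_to_word() as defined in the basic example always uses JavascriptLexer, so replace it with JsxLexer inside the function before running the call below: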

convert_js_to_word("App.jsx", "ReactComponent.docx")

The conversion result looks like this:

Word document showing JSX code with blue keywords, green strings, and gray comments

JsxLexer provides improved recognition for JSX tags, attributes, and embedded expressions compared to the standard JavaScript lexer, resulting in more accurate syntax coloring in the generated Word document.
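
One way to avoid editing the function for each file type is to choose the lexer from the file extension. The following is a minimal sketch under stated assumptions: LEXER_ALIASES and lexer_alias_for() are hypothetical names that are not part of Pygments or Spire.Doc, and the returned strings are intended to be passed to Pygments' get_lexer_by_name(). The "jsx" alias is only available in Pygments releases that ship JsxLexer.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a Pygments lexer alias.
# Extend it to match the file types in your own project.
LEXER_ALIASES = {
    ".js": "javascript",
    ".mjs": "javascript",
    ".jsx": "jsx",
    ".ts": "typescript",
}


def lexer_alias_for(path: str, default: str = "javascript") -> str:
    """Return the lexer alias to use for a source file, based on its extension."""
    return LEXER_ALIASES.get(Path(path).suffix.lower(), default)
```

With Pygments installed, the alias could then be resolved with get_lexer_by_name(lexer_alias_for("App.jsx")) from pygments.lexers and passed to highlight() in place of a hardcoded lexer.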

Batch Convert Multiple Files

If you need to convert large numbers of JavaScript or JSX files, you can automate the process by scanning a folder and generating Word documents in batches.

import os
from pathlib import Path

def batch_convert_js_files(source_folder: str, output_folder: str) -> None:
    """Convert all JavaScript files in a folder to Word documents."""
    
    Path(output_folder).mkdir(parents=True, exist_ok=True)
    
    js_extensions = ('.js', '.jsx', '.mjs')
    
    converted_count = 0
    error_count = 0
    
    for filename in os.listdir(source_folder):
        if filename.lower().endswith(js_extensions):
            input_path = os.path.join(source_folder, filename)
            
            base_name = os.path.splitext(filename)[0]
            output_path = os.path.join(output_folder, f"{base_name}.docx")
            
            try:
                convert_js_to_word(input_path, output_path)
                converted_count += 1
            except Exception as e:
                print(f"Error converting {filename}: {str(e)}")
                error_count += 1
    
    print(f"\nBatch conversion complete:")
    print(f"  Converted: {converted_count} files")
    print(f"  Errors: {error_count} files")

batch_convert_js_files("src/scripts", "output/docs")
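
The function above only scans the top level of the folder, and two files that share a base name (for example app.js and app.jsx) would both be written to app.docx. The sketch below shows one possible way to plan a recursive batch that avoids this. plan_conversions() and JS_EXTENSIONS are hypothetical names, and keeping the original extension in the output name is just one naming choice.

```python
from pathlib import Path

JS_EXTENSIONS = {".js", ".jsx", ".mjs"}


def plan_conversions(source_folder: str, output_folder: str) -> list:
    """Return (input_path, output_path) pairs for every JavaScript file found recursively."""
    source_root = Path(source_folder)
    output_root = Path(output_folder)
    plan = []
    for path in sorted(source_root.rglob("*")):
        if path.is_file() and path.suffix.lower() in JS_EXTENSIONS:
            relative = path.relative_to(source_root)
            # Keep the source extension in the name so app.js and app.jsx do not collide
            target = output_root / relative.parent / f"{relative.name}.docx"
            plan.append((path, target))
    return plan
```

Each pair can then be passed to convert_js_to_word() after creating the target's parent folder with target.parent.mkdir(parents=True, exist_ok=True).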

Add Line Numbers

Line numbers can improve readability during code reviews, audits, or technical documentation. Since Word HTML rendering may not fully support Pygments' built-in line number layouts, a practical approach is to prepend custom line numbers after syntax highlighting.

html_formatter = HtmlFormatter(
    nowrap=True,
    noclasses=True,
    style="colorful"
)

highlighted_html = highlight(
    js_code,
    JavascriptLexer(),
    html_formatter
)

highlighted_lines = highlighted_html.splitlines()

numbered_lines = []

for index, line in enumerate(highlighted_lines, start=1):

    numbered_line = (
        f'<span style="color: gray; font-weight: bold;">'
        f'{index:4d}  '
        f'</span>{line}'
    )

    numbered_lines.append(numbered_line)

combined_html = (
    '<pre style="font-family: Consolas; '
    'font-size: 10pt; line-height: 1.4;">'
    + '\n'.join(numbered_lines) +
    '</pre>'
)

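# Append to a new paragraph in the section created as in the basic example
paragraph = section.AddParagraph()
paragraph.AppendHTML(combined_html)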

The generated Word document with line numbers looks like this:

Word document showing JavaScript code with blue keywords, green strings, and gray comments with line numbers
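
The fixed width of four characters used for the line numbers can be replaced by a width derived from the total number of lines. This is a small sketch of that idea only; number_html_lines() is a hypothetical helper, and it assumes the input is the list produced by highlighted_html.splitlines().

```python
def number_html_lines(highlighted_lines: list) -> list:
    """Prefix each highlighted HTML line with a right-aligned line number."""
    # Width of the largest line number, so all numbers align in a monospaced font
    width = len(str(len(highlighted_lines)))
    numbered = []
    for index, line in enumerate(highlighted_lines, start=1):
        prefix = (
            '<span style="color: gray; font-weight: bold;">'
            f'{index:>{width}}  '
            '</span>'
        )
        numbered.append(prefix + line)
    return numbered
```

The returned list can be joined with '\n' and wrapped in a <pre> element in the same way as the numbered_lines list.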

Add Headers and Footers

Headers and footers help organize generated Word documents by adding titles, page numbers, and document metadata. This is especially useful for formal reports or exported technical documentation.

def add_document_metadata(section: Section, document_title: str) -> None:
    """Add header and footer to document section."""
    
    header = section.HeadersFooters.Header.AddParagraph()
    header_text = header.AppendText(document_title)
    header_text.CharacterFormat.FontName = "Arial"
    header_text.CharacterFormat.FontSize = 10
    header_text.CharacterFormat.TextColor = Color.get_Black()
    header.Format.HorizontalAlignment = HorizontalAlignment.Left
    header.Format.TextAlignment = TextAlignment.Top
    
    header.Format.Borders.Bottom.BorderType = BorderStyle.Single
    header.Format.Borders.Bottom.Color = Color.get_Black()
    
    footer = section.HeadersFooters.Footer.AddParagraph()
    footer.Format.HorizontalAlignment = HorizontalAlignment.Center
    footer.Format.TextAlignment = TextAlignment.Bottom
    
    page_field = footer.AppendField("page", FieldType.FieldPage)
    page_field.CharacterFormat.FontName = "Arial"
    page_field.CharacterFormat.FontSize = 9
    
    footer.AppendText(" of ")
    total_pages_field = footer.AppendField("numPages", FieldType.FieldNumPages)
    total_pages_field.CharacterFormat.FontName = "Arial"
    total_pages_field.CharacterFormat.FontSize = 9

document = Document()
document.LoadFromFile("CodeWithLines.docx")
section = document.Sections[0]
add_document_metadata(section, "JavaScript Source Code Documentation")
document.SaveToFile("CodeWithHeadersFooters.docx", FileFormat.Docx)

The generated Word document with headers and footers looks like this:

Word document showing JavaScript code with blue keywords, green strings, and gray comments with line numbers and headers and footers

For more advanced customization options, refer to our guide on how to add headers and footers to Word documents in Python.

Export to PDF Format

In addition to DOCX output, Spire.Doc can export syntax-highlighted JavaScript code directly to PDF format. This is useful when distributing read-only documentation or sharing code outside Microsoft Word environments.

def convert_js_to_pdf(input_file: str, output_file: str) -> None:
    """Convert JavaScript file directly to PDF."""
    
    with open(input_file, "r", encoding="utf-8") as file:
        js_code = file.read()
    
    document = Document()
    section = document.AddSection()
    section.PageSetup.Margins.All = 50
    
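    # nowrap=True omits Pygments' own <div>/<pre> wrapper, because the
    # highlighted code is wrapped in a <pre> element below
    html_formatter = HtmlFormatter(noclasses=True, style='colorful', nowrap=True)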
    highlighted_html = highlight(js_code, JavascriptLexer(), html_formatter)
    
    paragraph = section.AddParagraph()
    paragraph.AppendHTML(f'<pre style="font-family: Consolas; font-size: 10pt;">{highlighted_html}</pre>')
    
    document.SaveToFile(output_file, FileFormat.PDF)
    document.Close()

convert_js_to_pdf("app.js", "JavaScriptCode.pdf")

For more advanced PDF conversion techniques, including layout control and document formatting, see our detailed guide on converting Word documents to PDF in Python.

Customize Syntax Highlighting Style

Pygments provides multiple built-in color schemes:

def convert_with_custom_style(input_file: str, output_file: str, style_name: str = 'monokai') -> None:
    """Convert JavaScript to Word with custom highlighting style."""
    
    with open(input_file, "r", encoding="utf-8") as file:
        js_code = file.read()
    
    document = Document()
    section = document.AddSection()
    section.PageSetup.Margins.All = 50
    
    html_formatter = HtmlFormatter(
        noclasses=True,
        style=style_name,
        nowrap=True
    )
    
    highlighted_html = highlight(js_code, JavascriptLexer(), html_formatter)
    
    paragraph = section.AddParagraph()
    paragraph.AppendHTML(f'<pre style="font-family: Consolas; font-size: 10pt;">{highlighted_html}</pre>')
    
    document.SaveToFile(output_file, FileFormat.Docx)
    document.Close()

convert_with_custom_style("app.js", "CodeMonokai.docx", style_name='monokai')

Available styles include: 'monokai', 'colorful', 'vim', 'vs', 'tango', 'friendly', 'default'

5. Common Pitfalls

Missing HtmlFormatter Configuration

Problem: Default HtmlFormatter generates CSS classes instead of inline styles, which Word cannot process without external stylesheets.

Solution: Always use noclasses=True:

html_formatter = HtmlFormatter(noclasses=True, style='colorful')
highlighted_html = highlight(js_code, JavascriptLexer(), html_formatter)

Encoding Errors with Special Characters

Problem: Reading files without UTF-8 encoding causes character corruption on some platforms.

Solution: Explicitly specify UTF-8 encoding:

with open(input_file, "r", encoding="utf-8") as file:
    js_code = file.read()

For files with BOM (Byte Order Mark), use utf-8-sig:

with open(input_file, "r", encoding="utf-8-sig") as file:
    js_code = file.read()
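
Because utf-8-sig decodes files both with and without a BOM, it can serve as the default, with a legacy encoding as a fallback. The sketch below is one possible arrangement; read_source() is a hypothetical helper and the cp1252 fallback is an assumption that should be adjusted to the encodings actually present in your codebase.

```python
def read_source(path: str, fallback_encoding: str = "cp1252") -> str:
    """Read a source file as UTF-8 (with or without BOM), falling back to a legacy encoding."""
    try:
        with open(path, "r", encoding="utf-8-sig") as file:
            return file.read()
    except UnicodeDecodeError:
        # The file is not valid UTF-8; retry with the assumed legacy encoding
        with open(path, "r", encoding=fallback_encoding) as file:
            return file.read()
```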

Indentation Loss

Problem: Not wrapping highlighted code in <pre> tags causes indentation to disappear.

Solution: Wrap syntax-highlighted HTML in <pre> tags:

highlighted_html = highlight(js_code, JavascriptLexer(), html_formatter)
paragraph.AppendHTML(f'<pre style="font-family: Consolas;">{highlighted_html}</pre>')

ModuleNotFoundError

Problem: A required package (spire.doc or pygments) is not installed in the current Python environment.

Solution:

pip install spire.doc
pip install pygments

For virtual environments, activate the environment before installing:

source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows
pip install spire.doc pygments

Performance with Large Files

Problem: Very large JavaScript files (10,000+ lines) may cause slow conversion.

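Solution: Process files in chunks. Keep in mind that each chunk is highlighted independently, so a boundary that falls inside a multi-line comment or template literal can cause the lines after it to be colored incorrectly: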

def convert_large_file(input_file: str, output_file: str, chunk_size: int = 500) -> None:
    """Convert large JavaScript file in chunks."""
    
    with open(input_file, "r", encoding="utf-8") as file:
        lines = file.readlines()
    
    document = Document()
    section = document.AddSection()
    section.PageSetup.Margins.All = 50
    
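    # nowrap=True omits Pygments' own <div>/<pre> wrapper, because each
    # chunk is wrapped in a <pre> element below
    html_formatter = HtmlFormatter(noclasses=True, style='colorful', nowrap=True)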
    
    for i in range(0, len(lines), chunk_size):
        chunk = ''.join(lines[i:i + chunk_size])
        highlighted_html = highlight(chunk, JavascriptLexer(), html_formatter)
        
        paragraph = section.AddParagraph()
        paragraph.AppendHTML(f'<pre style="font-family: Consolas; font-size: 10pt;">{highlighted_html}</pre>')
    
    document.SaveToFile(output_file, FileFormat.Docx)
    document.Close()

6. Conclusion

This article demonstrated how to convert JavaScript and JSX files to Word documents in Python using Spire.Doc for Python and Pygments. By leveraging the highlight() function with HtmlFormatter and Spire.Doc's AppendHTML() method, developers can automate code documentation workflows with syntax highlighting.

Spire.Doc for Python provides document generation capabilities including table creation, image insertion, header/footer management, and multi-format export.

You can apply for a 30-day free license to evaluate all features.

7. FAQ

Can Spire.Doc convert JSX files to Word documents?

Yes. Use Pygments' JsxLexer, as described in the Convert JSX Files section, for accurate highlighting of component tags, props, and embedded expressions. The standard JavaScript lexer can also process JSX files, but JSX-specific syntax may not receive dedicated highlighting categories.

Does this solution require Microsoft Word installation?

No. Spire.Doc for Python operates independently without requiring Microsoft Word. The library generates DOCX files directly, making it suitable for server environments and CI/CD pipelines.

Can I convert JavaScript to formats other than DOCX?

Yes. Spire.Doc supports multiple export formats:

document.SaveToFile("output.pdf", FileFormat.PDF)
document.SaveToFile("output.html", FileFormat.Html)
document.SaveToFile("output.rtf", FileFormat.Rtf)

How do I handle TypeScript files (.ts, .tsx)?

Use Pygments' TypeScriptLexer (note the capital S in the class name):

from pygments.lexers import TypeScriptLexer

highlighted_html = highlight(ts_code, TypeScriptLexer(), html_formatter)

Is this approach suitable for enterprise-scale projects?

Yes. Python automation integrates with CI/CD pipelines and batch processing workflows. Local execution avoids security risks from uploading source code to online converters. Consider implementing logging, progress reporting, and error tracking for large deployments.

Can I customize syntax highlighting colors?

Yes. Pygments offers numerous built-in styles:

html_formatter = HtmlFormatter(noclasses=True, style='monokai')

Available styles: 'monokai', 'colorful', 'vim', 'vs', 'tango', 'friendly', 'default'

Published in Conversion


How to Convert PDF Data to a SQL Database Using Python

2026-04-17 07:34:23 Written by Allen Yang

Tutorial on PDF to Database Conversion Using Python

Converting PDF data to a database is a common requirement in data-driven applications. Many business documents, such as invoices, reports, and financial records, store structured information in PDF format, but this data is not directly usable for querying or analysis.

To make this data accessible, developers often need to convert PDF to SQL by extracting structured content and inserting it into relational databases like SQL Server, MySQL, or PostgreSQL. Manually handling this process is inefficient and error-prone, especially at scale.

In this guide, we focus on extracting table data from PDFs and building a complete pipeline to transform and insert it into an SQL database in Python with Spire.PDF for Python. This approach reflects the most practical and scalable solution for real-world PDF to database workflows.

Quick Navigation

Understanding the Workflow
Prerequisites
Step 1: Extract Table Data from PDF
Step 2: Transform and Insert Data into Database
Complete Pipeline: From PDF Extraction to SQL Storage
Adapting to Other SQL Databases
Handling Other Types of PDF Data
Common Pitfalls When Converting PDF Data to a Database
Conclusion
FAQ

Understanding the Workflow

Before diving into the implementation, it's important to understand the overall process of converting PDF data into a database.

Instead of treating each operation as completely separate, this workflow can be viewed as two main stages:

PDF to Database Workflow with Python

Each stage plays a distinct role in the pipeline:

Extract Tables: Retrieve structured table data from the PDF document
Process & Store Data: Clean, structure, and insert the extracted data into a relational database
- Transform Data: Convert raw rows into structured, database-ready records
- Insert into SQL Database: Persist the processed data into an SQL database

This end-to-end pipeline reflects how most real-world systems handle PDF to database workflows—by first extracting usable data, then processing and storing it in a database for querying and analysis.

Prerequisites

Before getting started, make sure you have the following:

Python 3.x installed
Spire.PDF for Python installed:
```
pip install Spire.PDF
```
You can also download Spire.PDF for Python and add it to your project manually.
A relational database system (e.g., SQLite, SQL Server, MySQL, or PostgreSQL)

This guide demonstrates the workflow using SQLite for simplicity, while also showing how the same approach can be applied to other SQL databases.

Step 1: Extract Table Data from PDF

In most business documents, such as invoices or reports, data is organized in tables. Because these tables already follow a row-and-column structure, they map naturally onto database tables and are the most suitable PDF content for SQL storage.

Extract Tables Using Python

Below is an example of how to extract table data from a PDF file using Spire.PDF:

from spire.pdf import *
from spire.pdf.common import *

# Load PDF document
pdf = PdfDocument()
pdf.LoadFromFile("Quarterly Sales.pdf")

# Method for ligature normalization
def normalize_text(text: str) -> str:
    if not text:
        return text
    ligature_map = {
        '\ue000': 'ff', '\ue001': 'ft', '\ue002': 'ffi', '\ue003': 'ffl', '\ue004': 'ti', '\ue005': 'fi',
    }
    for k, v in ligature_map.items():
        text = text.replace(k, v)
    return text.strip()

table_data = []

# Create the extractor once and reuse it for every page
extractor = PdfTableExtractor(pdf)

# Iterate through pages (page indexes are zero-based)
for i in range(pdf.Pages.Count):
    # Extract tables from the current page
    tables = extractor.ExtractTable(i)

    if tables:
        print(f"Page {i + 1} has {len(tables)} tables.")
        for table in tables:
            for row in range(table.GetRowCount()):
                row_data = []
                for col in range(table.GetColumnCount()):
                    # normalize_text() replaces ligatures and strips whitespace
                    text = normalize_text(table.GetText(row, col))
                    row_data.append(text or "")
                table_data.append(row_data)

pdf.Close()

# Print extracted data
for row in table_data:
    print(row)

Below is a preview of the extraction result:

Extract PDF Table Data Using Python

Code Explanation

LoadFromFile: Loads the PDF document
PdfTableExtractor: Identifies tables within each page
GetText(row, col): Retrieves cell content
table_data: Stores extracted rows as a list of lists

At this stage, the data is a plain list of rows with no column names attached, so it is not yet ready for database insertion. The next step converts it into structured records for SQL insertion.

Alternatively, you can export the extracted data to a CSV file for validation or batch import. See: Convert PDF Tables to CSV in Python

Step 2: Transform and Insert Data into Database

Raw table data extracted from PDFs often requires cleaning and structuring before it can be inserted into an SQL database.

For simplicity, the following examples demonstrate how to process a single extracted table. In real-world scenarios, PDFs may contain multiple tables, which can be handled using the same logic in a loop.

Transform Data (Single Table Example)

structured_data = []

# Assume first row is header
headers = table_data[0]

for row in table_data[1:]:
    if not any(row):
        continue

    record = {}
    for i in range(len(headers)):
        value = row[i] if i < len(row) else ""
        record[headers[i]] = value

    structured_data.append(record)

# Preview structured data
for item in structured_data:
    print(item)

What This Step Does

Converts rows into dictionary-based records
Maps column headers to values
Filters out empty rows
Prepares structured data for database insertion

You can also:

Normalize column names for SQL compatibility
Convert numeric fields
Standardize date formats

Transforming raw PDF data into a structured format ensures it can be reliably inserted into a relational database. After transformation, the data is immediately ready for database insertion, which completes the pipeline.
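
The optional steps listed above (converting numeric fields and standardizing dates) can be sketched as follows. This is an illustration under assumptions, not a general-purpose parser: the helper names are hypothetical, the accepted date formats in DATE_FORMATS are assumed, and only "$" and "," are stripped from numbers.

```python
import re
from datetime import datetime

# Assumed input date formats; adjust to what your PDFs actually contain
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def to_number(value: str):
    """Return an int or float if the value looks numeric, otherwise None."""
    cleaned = value.replace(",", "").replace("$", "").strip()
    if re.fullmatch(r"-?\d+", cleaned):
        return int(cleaned)
    if re.fullmatch(r"-?\d*\.\d+", cleaned):
        return float(cleaned)
    return None


def to_iso_date(value: str):
    """Return the date in ISO 8601 form if it matches a known format, otherwise None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def convert_value(value: str):
    """Try numeric conversion first, then date conversion, else keep the text."""
    number = to_number(value)
    if number is not None:
        return number
    iso_date = to_iso_date(value)
    return iso_date if iso_date is not None else value
```

Applied to a record, this could look like {key: convert_value(val) for key, val in record.items()}. If values are converted this way, the column types in the CREATE TABLE statement would need to be chosen accordingly instead of using TEXT for every column.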

Insert Data into SQLite (Single Table Example)

Using the structured data from a single table, we can dynamically create a database schema and insert records without hardcoding column names.

import sqlite3

# Connect to SQLite database
conn = sqlite3.connect("sales_data.db")
cursor = conn.cursor()

# Create table dynamically based on headers
columns_def = ", ".join([f'"{h}" TEXT' for h in headers])

cursor.execute(f"""
CREATE TABLE IF NOT EXISTS invoices (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    {columns_def}
)
""")

# Prepare insert statement
placeholders = ", ".join(["?" for _ in headers])
column_names = ", ".join([f'"{h}"' for h in headers])

# Insert data
for record in structured_data:
    values = [record.get(h, "") for h in headers]
    cursor.execute(f"""
    INSERT INTO invoices ({column_names})
    VALUES ({placeholders})
    """, values)

# Commit and close
conn.commit()
conn.close()

Key Points

Dynamically creates database tables based on extracted headers
Uses parameterized queries (?) for row values to prevent SQL injection
Keeps the schema flexible without hardcoding column names
Column names are interpolated into the SQL text, so they should be normalized or validated to ensure SQL compatibility
Batch inserts can improve performance for large datasets
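
The last two points can be illustrated with a short sketch that escapes identifiers and inserts all rows in one executemany() call. quote_identifier() and insert_records() are hypothetical helper names, and the doubling of embedded double quotes follows standard SQL identifier quoting, which SQLite accepts.

```python
import sqlite3


def quote_identifier(name: str) -> str:
    """Quote a table or column name, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def insert_records(conn: sqlite3.Connection, table_name: str,
                   headers: list, records: list) -> int:
    """Create the table if needed and insert all records in one batch."""
    columns_def = ", ".join(f"{quote_identifier(h)} TEXT" for h in headers)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {quote_identifier(table_name)} "
        f"(id INTEGER PRIMARY KEY AUTOINCREMENT, {columns_def})"
    )

    column_names = ", ".join(quote_identifier(h) for h in headers)
    placeholders = ", ".join("?" for _ in headers)
    rows = [[record.get(h, "") for h in headers] for record in records]

    # "with conn" commits on success and rolls back if an insert fails
    with conn:
        conn.executemany(
            f"INSERT INTO {quote_identifier(table_name)} ({column_names}) "
            f"VALUES ({placeholders})",
            rows,
        )
    return len(rows)
```

Using the connection as a context manager keeps the batch atomic: either every row is stored or none is.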

This section demonstrates the core workflow for converting PDF table data into a relational database using a single table example. In the next section, we extend this approach to handle multiple tables automatically.

Complete Pipeline: From PDF Extraction to SQL Storage

Here's a complete runnable example that demonstrates the entire workflow from PDF to database:

from spire.pdf import *
from spire.pdf.common import *
import sqlite3
import re

# ---------------------------
# Utility Functions
# ---------------------------

def normalize_text(text: str) -> str:
    if not text:
        return ""
    ligature_map = {
        '\ue000': 'ff', '\ue001': 'ft', '\ue002': 'ffi',
        '\ue003': 'ffl', '\ue004': 'ti', '\ue005': 'fi',
    }
    for k, v in ligature_map.items():
        text = text.replace(k, v)
    return text.strip()


def normalize_column_name(name: str, index: int) -> str:
    if not name:
        return f"column_{index}"
    name = name.lower()
    name = re.sub(r'[^a-z0-9]+', '_', name).strip('_')
    return name or f"column_{index}"


def deduplicate_columns(columns):
    # Reserve "id" because every generated table already has an auto-increment id column
    seen = {"id"}
    result = []
    for col in columns:
        base = col
        count = 1
        while col in seen:
            col = f"{base}_{count}"
            count += 1
        seen.add(col)
        result.append(col)
    return result


# ---------------------------
# Step 1: Extract Tables (STRUCTURED)
# ---------------------------

pdf = PdfDocument()
pdf.LoadFromFile("Quarterly Sales.pdf")

extractor = PdfTableExtractor(pdf)

all_tables = []

for i in range(pdf.Pages.Count):
    tables = extractor.ExtractTable(i)

    if tables:
        for table in tables:
            table_rows = []

            for row in range(table.GetRowCount()):
                row_data = []
                for col in range(table.GetColumnCount()):
                    text = table.GetText(row, col)
                    row_data.append(normalize_text(text))
                table_rows.append(row_data)

            if table_rows:
                all_tables.append(table_rows)

pdf.Close()

if not all_tables:
    raise ValueError("No tables found in PDF.")

# ---------------------------
# Step 2 & 3: Process + Insert Each Table
# ---------------------------

conn = sqlite3.connect("sales_data.db")
cursor = conn.cursor()

for table_index, table in enumerate(all_tables):

    if len(table) < 2:
        continue  # skip invalid tables

    raw_headers = table[0]

    # Normalize headers
    normalized_headers = [
        normalize_column_name(h, i)
        for i, h in enumerate(raw_headers)
    ]
    normalized_headers = deduplicate_columns(normalized_headers)

    # Generate table name
    table_name = f"table_{table_index+1}"

    # Create table
    columns_def = ", ".join([f'"{col}" TEXT' for col in normalized_headers])

    cursor.execute(f"""
    CREATE TABLE IF NOT EXISTS "{table_name}" (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        {columns_def}
    )
    """)

    # Prepare insert
    placeholders = ", ".join(["?" for _ in normalized_headers])
    column_names = ", ".join([f'"{col}"' for col in normalized_headers])

    insert_sql = f"""
    INSERT INTO "{table_name}" ({column_names})
    VALUES ({placeholders})
    """

    # Insert data
    batch = []
    for row in table[1:]:
        if not any(row):
            continue

        values = [
            row[i] if i < len(row) else ""
            for i in range(len(normalized_headers))
        ]
        batch.append(values)

    if batch:
        cursor.executemany(insert_sql, batch)

    print(f"Inserted {len(batch)} rows into {table_name}")

conn.commit()
conn.close()

print(f"Processed {len(all_tables)} tables from PDF.")

Below is a preview of the insertion result in the database:

Extract PDF Tables and Insert into Database with Python

This complete example demonstrates the full PDF to database pipeline:

Load and extract table data from PDF using Spire.PDF
Transform raw data into structured records
Insert into SQLite database with proper schema

SQLite automatically creates a system table called sqlite_sequence when using AUTOINCREMENT to track the current maximum ID. This is expected behavior and does not affect your data. You can run this code directly to convert PDF table data into a database.

Adapting to Other SQL Databases

While this guide uses SQLite for simplicity, the same approach works for other SQL databases. The extraction and transformation steps remain identical—only the database connection and insertion syntax vary slightly.

The following examples assume that headers contains the normalized column names generated in the previous step and that the records in structured_data are keyed by those same names.

SQL Server Example

import pyodbc

# Connect to SQL Server
conn_str = (
    "DRIVER={SQL Server};"
    "SERVER=your_server_name;"
    "DATABASE=your_database_name;"
    "UID=your_username;"
    "PWD=your_password"
)
conn = pyodbc.connect(conn_str)
cursor = conn.cursor()

# Generate dynamic column definitions using normalized headers
columns_def = ", ".join([f"[{h}] NVARCHAR(MAX)" for h in headers])

# Create table dynamically
cursor.execute(f"""
IF NOT EXISTS (SELECT * FROM sys.tables WHERE name = 'invoices')
BEGIN
    CREATE TABLE invoices (
        id INT IDENTITY(1,1) PRIMARY KEY,
        {columns_def}
    )
END
""")

# Prepare insert statement
placeholders = ", ".join(["?" for _ in headers])
column_names = ", ".join([f"[{h}]" for h in headers])

# Insert data
for record in structured_data:
    values = [record.get(h, "") for h in headers]
    cursor.execute(f"""
    INSERT INTO invoices ({column_names})
    VALUES ({placeholders})
    """, values)

# Commit and close
conn.commit()
conn.close()

MySQL Example

import mysql.connector

conn = mysql.connector.connect(
    host="localhost",
    user="your_username",
    password="your_password",
    database="your_database"
)
cursor = conn.cursor()

# Reuse the dynamic table creation and insert logic shown earlier, but use
# %s placeholders instead of ?, backtick-quoted identifiers, and
# AUTO_INCREMENT for the id column

PostgreSQL Example

import psycopg2

conn = psycopg2.connect(
    host="localhost",
    database="your_database",
    user="your_username",
    password="your_password"
)
cursor = conn.cursor()

# Reuse the dynamic table creation and insert logic shown earlier, but use
# %s placeholders instead of ?, double-quoted identifiers, and a SERIAL
# (or identity) column for the id

The core extraction and transformation steps remain the same across different SQL databases, especially when using normalized column names for compatibility.
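
The per-database differences mostly come down to identifier quoting, the placeholder style of the driver, and the auto-increment syntax. The sketch below collects these in one place. DIALECTS and build_statements() are hypothetical names, the column types are a simplification (every column is stored as text), and the helper assumes column names have already been normalized so they contain no quote characters.

```python
# Assumed per-database settings for the drivers used in this guide
DIALECTS = {
    "sqlite": {
        "quote": ('"', '"'), "placeholder": "?",
        "id": "id INTEGER PRIMARY KEY AUTOINCREMENT", "text": "TEXT",
    },
    "sqlserver": {
        "quote": ("[", "]"), "placeholder": "?",
        "id": "id INT IDENTITY(1,1) PRIMARY KEY", "text": "NVARCHAR(MAX)",
    },
    "mysql": {
        "quote": ("`", "`"), "placeholder": "%s",
        "id": "id INT AUTO_INCREMENT PRIMARY KEY", "text": "TEXT",
    },
    "postgresql": {
        "quote": ('"', '"'), "placeholder": "%s",
        "id": "id SERIAL PRIMARY KEY", "text": "TEXT",
    },
}


def build_statements(dialect: str, table: str, columns: list) -> tuple:
    """Return (create_sql, insert_sql) for the given dialect and normalized columns."""
    settings = DIALECTS[dialect]
    open_quote, close_quote = settings["quote"]

    def quote(name: str) -> str:
        return f"{open_quote}{name}{close_quote}"

    columns_def = ", ".join(f"{quote(c)} {settings['text']}" for c in columns)
    column_names = ", ".join(quote(c) for c in columns)
    placeholders = ", ".join(settings["placeholder"] for _ in columns)

    create_sql = f"CREATE TABLE {quote(table)} ({settings['id']}, {columns_def})"
    insert_sql = f"INSERT INTO {quote(table)} ({column_names}) VALUES ({placeholders})"
    return create_sql, insert_sql
```

The returned strings can be passed to cursor.execute() and cursor.executemany() of the matching driver. Guarding against an existing table (for example with IF NOT EXISTS) is left out here because its syntax also differs between databases.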

Handling Other Types of PDF Data

While this guide focuses on table extraction, PDFs often contain other types of data that can also be integrated into a database, depending on your use case.

Text Data (Unstructured → Structured)

In many documents, important information such as invoice numbers, customer names, or dates is embedded in plain text rather than tables.

You can extract raw text using:

from spire.pdf import *

pdf = PdfDocument()
pdf.LoadFromFile("Quarterly Sales.pdf")

for i in range(pdf.Pages.Count):
    page = pdf.Pages.get_Item(i)
    extractor = PdfTextExtractor(page)
    options = PdfTextExtractOptions()
    options.IsExtractAllText = True
    text = extractor.ExtractText(options)
    print(text)

However, raw text has no fields or columns, so it is rarely useful to insert as-is. It typically needs to be parsed into structured fields first, for example:

Using regular expressions to extract key-value pairs
Identifying patterns such as dates, IDs, or totals
Converting text into dictionaries or structured records

Once structured, the data can be inserted into a database as part of the same transformation and insertion pipeline described earlier.
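
As a rough illustration of the regular-expression approach, the sketch below pulls a few key-value pairs out of extracted text. The field names, the patterns in FIELD_PATTERNS, and the sample layout they expect ("Invoice No: ...", "Date: ...", "Total: ...") are all assumptions; real documents will need patterns written for their own wording.

```python
import re

# Hypothetical patterns for an assumed invoice layout
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:Number|No\.?|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "invoice_date": re.compile(r"Date\s*:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total\s*:\s*\$?([\d,]+\.\d{2})"),
}


def parse_fields(text: str) -> dict:
    """Return a record with one entry per pattern; missing fields are None."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[name] = match.group(1) if match else None
    return record
```

The resulting dictionary has the same shape as the records built from table rows, so it can go through the same insertion logic.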

For more advanced techniques, you can learn more in the detailed Python PDF text extraction guide.

Images (OCR or File Reference)

Images in PDFs are usually not directly usable as structured data, but they can still be integrated into database workflows in two ways:

Option 1: OCR (recommended for data extraction). Convert images to text using OCR tools, then process and store the extracted content.

Option 2: File storage (recommended for document systems). Store images as:

File paths in the database
Binary (BLOB) data if needed

Below is an example of extracting images:

from spire.pdf import *

pdf = PdfDocument()
pdf.LoadFromFile("Quarterly Sales.pdf")

helper = PdfImageHelper()

for i in range(pdf.Pages.Count):
    page = pdf.Pages.get_Item(i)
    images = helper.GetImagesInfo(page)
    for j, img in enumerate(images):
        img.Image.Save(f"image_{i}_{j}.png")

To further process image-based content, you can use OCR to extract text from images with Spire.OCR for Python.

Full PDF Storage (BLOB or File Reference)

In some scenarios, the goal is not to extract structured data, but to store the entire PDF file in a database.

This is commonly used in:

Document management systems
Archival systems
Compliance and auditing workflows

You can store PDFs as:

BLOB data in the database
File paths referencing external storage

This is another sense of "PDF to database": the file itself is stored, rather than the data extracted from it.
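
Both storage options can be sketched with SQLite. The documents table, its columns, and the store_document() helper are hypothetical and only meant to show the shape of the approach: the file content goes into a BLOB column when as_blob is true, otherwise only the path is recorded.

```python
import sqlite3
from pathlib import Path


def store_document(conn: sqlite3.Connection, file_path: str, as_blob: bool = True) -> int:
    """Store a file as a BLOB, or only its path, and return the new row id."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS documents (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            file_name TEXT NOT NULL,
            file_path TEXT,
            content BLOB
        )
        """
    )
    path = Path(file_path)
    # Read the bytes only when the file itself should live in the database
    content = path.read_bytes() if as_blob else None
    with conn:
        cursor = conn.execute(
            "INSERT INTO documents (file_name, file_path, content) VALUES (?, ?, ?)",
            (path.name, str(path), content),
        )
    return cursor.lastrowid
```

Storing only the path keeps the database small, while BLOB storage keeps the file and its metadata together in one place. The same pattern applies to extracted images.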

Key Takeaway

While PDFs can contain multiple types of content, table data remains the most efficient and scalable format for database integration. Other data types typically require additional processing before they can be stored or queried effectively.

Common Pitfalls When Converting PDF Data to a Database

While the process of converting PDF to a database may seem straightforward, several practical challenges can arise.

1. Inconsistent Table Structures

Not all PDFs follow a consistent table format:

Missing columns
Merged cells
Irregular layouts

Solution:

Validate row lengths
Normalize structure
Handle missing values
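
A minimal sketch of this validation step, assuming each table has already been extracted as a list of row lists. The normalize_rows helper and the sample rows are hypothetical.

```python
def normalize_rows(rows, expected_columns, fill_value=None):
    """Pad short rows and set aside rows that have too many cells."""
    clean, rejected = [], []
    for row in rows:
        if len(row) > expected_columns:
            rejected.append(row)  # too many cells: needs manual review
            continue
        padded = list(row) + [fill_value] * (expected_columns - len(row))
        clean.append(padded)
    return clean, rejected

rows = [
    ["Laptop", "Electronics", "1200"],
    ["Mouse", "Electronics"],               # missing column
    ["Desk", "Furniture", "300", "extra"],  # unexpected extra cell
]
clean, rejected = normalize_rows(rows, expected_columns=3)
```

Rows that are too long are kept apart instead of being truncated, because extra cells often indicate a merged or split cell that needs a closer look.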

2. Poor Table Detection

Some PDFs do not define tables clearly in their internal structure, for example when there are no grid lines or when cell sizes are irregular.

Solution:

Test with multiple files
Use fallback parsing logic
Preprocess PDFs if needed

3. Data Cleaning Issues

Extracted data may contain:

Extra spaces
Line breaks
Formatting issues

Solution:

Strip whitespace
Normalize values
Validate types
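
The whitespace part of this cleanup could look like the sketch below. The clean_cell helper is an assumption for illustration. It collapses line breaks and repeated spaces and treats empty cells as missing values.

```python
import re

def clean_cell(value):
    """Collapse line breaks and repeated spaces; map empty cells to None."""
    if value is None:
        return None
    text = re.sub(r"\s+", " ", str(value)).strip()
    return text or None

row = ["  Wireless\nHeadphones ", "79.99 ", "", None]
cleaned = [clean_cell(cell) for cell in row]
```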

4. Character Encoding Issues (Ligatures & Fonts)

PDF table extraction can introduce unexpected characters due to font encoding and ligatures. For example, common letter combinations such as:

fi, ff, ffi, ffl, ft, ti

may be stored as single glyphs in the PDF. When extracted, they may appear as:

di\ue000erence   → difference
o\ue002ce        → office
\ue005le         → file

These are typically characters from the Unicode Private Use Area (U+E000–U+F8FF, written \ue000–\uf8ff in Python strings), which appear when a font uses a custom glyph mapping.

Solution:

Detect Private Use Area characters (\ue000–\uf8ff)
Build a mapping table for ligatures, such as:
- \ue000 → ff
- \ue001 → ft
- \ue002 → ffi
- \ue003 → ffl
- \ue004 → ti
- \ue005 → fi
Normalize text before inserting into the database
Optionally log unknown characters for further analysis

Handling encoding issues properly ensures data accuracy and prevents subtle corruption in downstream processing.
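
One way to implement the steps above is a lookup table combined with a regular expression over the private-use range. The mapping below simply copies the example list. A real PDF may assign different code points, so treat the table as an assumption to be verified per font.

```python
import re

# Mapping copied from the example list; actual code points vary by font
LIGATURES = {
    "\ue000": "ff",
    "\ue001": "ft",
    "\ue002": "ffi",
    "\ue003": "ffl",
    "\ue004": "ti",
    "\ue005": "fi",
}
PRIVATE_USE = re.compile("[\ue000-\uf8ff]")

def fix_ligatures(text, unknown=None):
    """Replace known private-use glyphs; collect unknown ones for logging."""
    def replace(match):
        char = match.group(0)
        if char not in LIGATURES and unknown is not None:
            unknown.add(char)
        return LIGATURES.get(char, char)
    return PRIVATE_USE.sub(replace, text)

unknown_chars = set()
fixed = fix_ligatures("di\ue000erence in the o\ue002ce \ue005le \ue010", unknown_chars)
```

Unknown characters are left in place and collected in the set, so they can be logged and added to the mapping later.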

5. Cross-Page Table Fragmentation

Large tables in PDFs are often split across multiple pages. When extracted, each page may be treated as a separate table, leading to:

Broken datasets
Repeated headers
Incomplete records

Solution:

Compare column counts between consecutive tables
Check header consistency or data type patterns in the first row
Merge tables when structure and schema match
Skip duplicated header rows when concatenating data

In practice, combining column structure and value pattern detection provides a reliable way to reconstruct full tables across pages.
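
The merging logic can be sketched as follows, assuming each extracted table is a list of row lists. The merge_page_tables helper is hypothetical, and it relies on column count alone, which is a deliberately simple heuristic. Two unrelated tables with the same number of columns would also be merged, so a production version should add header or data type checks.

```python
def merge_page_tables(tables):
    """Merge consecutive tables with equal column counts, skipping repeated headers."""
    merged = []
    for table in tables:
        if not table:
            continue
        previous = merged[-1] if merged else None
        if previous and len(previous[0]) == len(table[0]):
            # Same column count: treat as a continuation of the previous table
            rows = table[1:] if table[0] == previous[0] else table
            previous.extend(rows)
        else:
            merged.append([list(row) for row in table])
    return merged

page1 = [["Product", "Units"], ["Laptop", "5"], ["Mouse", "12"]]
page2 = [["Product", "Units"], ["Desk", "2"]]  # header repeated on the new page
page3 = [["Chair", "7"]]                       # continuation without a header
other = [["Region", "Manager", "Budget"], ["East", "Kim", "900"]]

result = merge_page_tables([page1, page2, page3, other])
```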

6. Database Schema Mismatch

Incorrect mapping between extracted data and database columns can cause errors.

Solution:

Align headers with schema
Use explicit field mapping
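
Explicit field mapping can be as simple as a dictionary from header text to column names. The header names, column names, and map_row helper below are assumptions for this sketch.

```python
# Hypothetical mapping from PDF header text to database column names
FIELD_MAP = {
    "Product Name": "product",
    "Units Sold": "units_sold",
    "Total Sales": "sales",
}

def map_row(headers, row):
    """Build a column -> value record, failing loudly on unmapped headers."""
    missing = [h for h in headers if h.strip() not in FIELD_MAP]
    if missing:
        raise KeyError(f"No database column mapped for: {missing}")
    return {FIELD_MAP[h.strip()]: value for h, value in zip(headers, row)}

record = map_row(["Product Name", " Units Sold", "Total Sales"], ["Laptop", "5", "6000"])
```

Raising an error for an unmapped header surfaces schema drift early instead of silently writing values into the wrong column.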

7. Performance Issues with Large Files

Processing large PDFs can be slow.

Solution:

Use batch processing
Optimize insert operations
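
Batch processing can be sketched with sqlite3's executemany, inserting a fixed number of rows per call and committing once per batch. The table layout, the batch size of 500, and the generated rows are assumptions for the example.

```python
import sqlite3
from itertools import islice

def insert_in_batches(conn, rows, batch_size=500):
    """Insert rows with executemany, committing once per batch."""
    iterator = iter(rows)
    total = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", batch)
        conn.commit()
        total += len(batch)
    return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, units INTEGER)")

# A generator stands in for rows streamed out of a large PDF
rows = ((f"Product {i}", "East", i) for i in range(1200))
inserted = insert_in_batches(conn, rows, batch_size=500)
count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

Because the rows are consumed from an iterator, the full data set never has to be held in memory at once.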

By anticipating these issues, you can build a more reliable PDF to database workflow.

Conclusion

Converting a PDF to a database is not a one-step operation but a structured process: extract the data, transform it into structured records, and insert it into the database.

By focusing on table data and using Python, you can efficiently implement a complete PDF to database pipeline, making it easier to automate data integration tasks.

This approach is especially useful for handling invoices, reports, and other structured business documents that need to be stored in SQL Server or other relational databases.

If you want to evaluate the performance of Spire.PDF for Python and remove any limitations, you can apply for a 30-day free trial.

FAQ

What does "PDF to database" mean?

It refers to the process of extracting structured data from PDF files and storing it in a database. This typically involves parsing PDF content, transforming it into structured formats, and inserting it into SQL databases for further querying and analysis.

Can Python convert PDF directly to a database?

No. Python cannot directly convert a PDF into a database in one step. The process usually involves extracting data from the PDF first, transforming it into structured records, and then inserting it into a database using SQL connectors.

How do I convert PDF to SQL using Python?

The typical workflow includes:

Extracting table or text data from the PDF
Converting it into structured records (rows and columns)
Inserting the processed data into an SQL database such as SQLite, MySQL, or SQL Server using Python database libraries

Can I store PDF files directly in a database?

Yes. PDF files can be stored as binary (BLOB) data in a database. However, this approach is mainly used for document storage systems, while structured extraction is preferred for data analysis and querying.

What SQL databases can I use for PDF data integration?

You can use almost any SQL database, including SQLite, SQL Server, MySQL, and PostgreSQL. The overall extraction and transformation process remains the same, while only the database connection and insertion syntax differ slightly.

How to Import Excel File in Python (List, Dict, Database)

2026-03-20 07:27:19 Written by alice yang

Tutorial on How to Import Excel Data to Python

Importing an Excel file in Python typically involves more than just reading the file. In most cases, the data needs to be converted into Python structures such as lists, dictionaries, or other formats that can be directly used in your application.

This transformation step is important because Excel data is usually stored in a tabular format, while Python applications often require structured data for processing, integration, or storage. Depending on how the data will be used, it may be represented as a list for sequential processing, a dictionary for field-based access, custom objects for structured modeling, or a database for persistent storage.

This guide demonstrates how to import an Excel file in Python and convert the data into multiple structures using Spire.XLS for Python, with practical examples for each approach.

Overall Implementation Approach and Quick Example

Importing Excel data into Python is essentially a two-step process:

Load Excel file – Load the Excel file and access its raw data
Transform data – Convert the data into Python structures such as lists, dictionaries, or objects

This separation is important because in real-world applications, simply reading Excel is not enough—the data must be transformed into a format that can be processed, stored, or integrated into systems.

Key Components

When importing Excel data using Spire.XLS for Python, the following components are involved:

Workbook – Represents the entire Excel file and is responsible for loading data from disk
Worksheet – Represents a single sheet within the Excel file
CellRange – Represents a group of cells that contain actual data
Data Transformation Layer – Your Python logic that converts cell values into target structures

Data Flow Overview

The typical workflow looks like this:

Excel File → Workbook → Worksheet → CellRange → Python Data Structure

Understanding this pipeline helps you design flexible import logic for different scenarios.

Quick Example: Import Excel File in Python

Before running the example, install Spire.XLS for Python using pip:

pip install spire.xls

If needed, you can also download Spire.XLS for Python manually and include it in your project.

The following example shows the simplest way to import Excel data into Python:

from spire.xls import *

workbook = Workbook()
workbook.LoadFromFile("SalesReport.xlsx")

data = []
sheet = workbook.Worksheets[0]

# Get the used cell range
cellRange = sheet.AllocatedRange

# Get the data from the first row
for col in range(cellRange.Columns.Count):
    data.append(sheet.Range[1, col + 1].Value)

print(data)

workbook.Dispose()

Below is a preview of the data imported from the Excel file:

Import Data from Excel File in Python

This minimal example demonstrates the fundamental workflow: initialize a workbook, load the Excel file, access the worksheet and cell data, and then dispose of the workbook to release resources.

For more advanced scenarios, such as reading Excel files from memory or handling file streams, see how to import Excel data from a stream in Python.

Import Excel Data in Python as a List

One of the simplest ways to import Excel data in Python is to convert it into a list of rows. This structure is useful for iteration and basic data processing.

Example

from spire.xls import *

# Load the Workbook
workbook = Workbook()
workbook.LoadFromFile("SalesReport.xlsx")

# Get the used range in the first worksheet
sheet = workbook.Worksheets[0]
cellRange = sheet.AllocatedRange

# Create a list to store the data
data = []
for row_index in range(cellRange.RowCount):
    row_data = []
    for cell_index in range(cellRange.ColumnCount):
        row_data.append(cellRange[row_index + 1, cell_index + 1].Value)
    data.append(row_data)

workbook.Dispose()

Technical Explanation

Importing Excel data as a list treats each row in the worksheet as a Python list, preserving the original row order.

How the code works:

A nested loop is used to traverse the worksheet in a row-first (row-major) pattern
The outer loop iterates through rows, while the inner loop accesses each cell
Index offsets (+1) are applied because Spire.XLS uses 1-based indexing

Why this design works:

AllocatedRange limits iteration to only populated cells, improving efficiency
Row-by-row extraction keeps the structure consistent with Excel’s layout
The intermediate row_data list ensures clean aggregation before appending

This structure is ideal for sequential processing, simple transformations, or as a base format before converting into dictionaries or objects.

If you want to load more than just text and numeric data, see How to Read Excel Files in Python for more data types.

Import Excel Data as a Dictionary in Python

If your Excel file contains headers, importing it as a dictionary provides better data organization and access by column names.

Example

from spire.xls import *

workbook = Workbook()
workbook.LoadFromFile("SalesReport.xlsx")

sheet = workbook.Worksheets[0]
cellRange = sheet.AllocatedRange

rows = list(cellRange.Rows)

headers = [cellRange[1, cell_index + 1].Value for cell_index in range(cellRange.ColumnCount)]

data_dict = []
for row in rows[1:]:
    row_dict = {}
    for i, cell in enumerate(row.Cells):
        row_dict[headers[i]] = cell.Value
    data_dict.append(row_dict)

workbook.Dispose()

Technical Explanation

Importing Excel data as a dictionary converts each row into a key-value structure using column headers.

How the code works:

The first row is extracted as headers
Each subsequent row is iterated and processed
Cell values are mapped to headers using their column index

Why this design works:

Both headers and row cells follow the same column order, enabling simple index-based mapping
This removes reliance on fixed column positions
The result is a self-descriptive structure with named fields

This method is useful when you need structured data access, such as working with JSON, APIs, or labeled datasets.

Import Excel Data into Custom Objects

For structured applications, you may need to import Excel data into Python objects to maintain type safety and encapsulate business logic.

Example

from spire.xls import *
from spire.xls.common import *

class Employee:
    def __init__(self, name, age, department):
        self.name = name
        self.age = age
        self.department = department

workbook = Workbook()
workbook.LoadFromFile("EmployeeData.xlsx")

sheet = workbook.Worksheets[0]
cellRange = sheet.AllocatedRange

employees = []
for row in list(cellRange.Rows)[1:]:
    name = row.Cells[0].Value
    age = int(row.Cells[1].Value) if row.Cells[1].Value else None
    department = row.Cells[2].Value

    emp = Employee(name, age, department)
    employees.append(emp)

workbook.Dispose()

Technical Explanation

Importing Excel data into objects maps each row to a structured class instance.

How the code works:

A class is defined to represent the data model
Each row is read and its values are extracted
Values are passed into the class constructor to create objects

Why this design works:

The constructor acts as a controlled transformation point
It allows validation, type conversion, or preprocessing
Data is no longer loosely structured, but aligned with domain logic

This is ideal for applications with clear data models, such as backend systems or business logic layers.
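
To illustrate the idea of the constructor as a controlled transformation point, here is a variation that uses a dataclass with a from_row factory method. It works on plain lists of cell values, so it runs without an Excel file. The from_row method and its validation rules are assumptions for this sketch, not part of Spire.XLS.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Employee:
    name: str
    age: Optional[int]
    department: str

    @classmethod
    def from_row(cls, row):
        """Create an Employee from a [name, age, department] list of cell values."""
        name, age, department = row
        if not name or not str(name).strip():
            raise ValueError("Employee name is required")
        # Cell text may hold numbers such as "34" or "34.0"
        age_value = int(float(age)) if age not in (None, "") else None
        return cls(str(name).strip(), age_value, str(department or "").strip())

employees = [Employee.from_row(r) for r in [["Ana", "34.0", "Sales"], ["Ben", "", "IT"]]]
```

Keeping validation and conversion inside from_row means the import loop only has to pass each row's values along.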

Import Excel Data to Database in Python

In many applications, Excel data needs to be stored in a database for persistent storage and querying.

Example

import sqlite3
from spire.xls import *

# Connect to SQLite database
conn = sqlite3.connect("sales.db")
cursor = conn.cursor()

# Create table matching the Excel structure
cursor.execute("""
CREATE TABLE IF NOT EXISTS sales (
    product TEXT,
    category TEXT,
    region TEXT,
    sales REAL,
    units_sold INTEGER
)
""")

# Load the Excel file
workbook = Workbook()
workbook.LoadFromFile("Sales.xlsx")

# Access the first worksheet
sheet = workbook.Worksheets[0]
rows = list(sheet.AllocatedRange.Rows)

# Iterate through rows (skip header row)
for row in rows[1:]:
    product = row.Cells[0].Value
    category = row.Cells[1].Value
    region = row.Cells[2].Value

    # Remove thousand-separators and convert to float
    sales_text = row.Cells[3].Value
    sales = float(str(sales_text).replace(",", "")) if sales_text else 0

    # Convert units sold to integer
    units_text = row.Cells[4].Value
    units_sold = int(units_text) if units_text else 0

    # Insert data into the database
    cursor.execute(
        "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
        (product, category, region, sales, units_sold)
    )

# Commit changes and close connection
conn.commit()
conn.close()

# Release Excel resources
workbook.Dispose()

Here is a preview of the Excel data and the SQLite database structure:

Import Excel Data to Database in Python

Technical Explanation

Importing Excel data into a database converts each row into a persistent record.

How the code works:

A database connection is established and a table is created
The table schema is aligned with the Excel structure
Each row is read and inserted using parameterized SQL queries

Why this design works:

Schema alignment ensures consistent data mapping
Data normalization (e.g., numeric conversion) improves compatibility
Parameterized queries provide safety and proper type handling

When to use this approach:

This approach is suitable for data storage, querying, and integration into larger data pipelines.

For a more detailed guide on importing Excel data into databases, check out How to Transfer Data Between Excel Files and Databases.

Why Use Spire.XLS for Importing Excel Data

The examples in this guide use Spire.XLS for Python because it provides a clear and consistent way to access and transform Excel data. The main advantages in this context include:

Structured Object Model – The library exposes components such as Workbook, Worksheet, and CellRange, which align directly with how Excel data is organized. This makes the data flow easier to understand and implement. See more details on Spire.XLS for Python API Reference.
Focused Data Access Layer – Instead of handling low-level file parsing, you can work directly with cell values and ranges, allowing the import logic to focus on data transformation rather than file structure.
Format Compatibility – It supports common Excel formats, such as XLS and XLSX, and other spreadsheet formats, such as CSV, ODS, and OOXML, enabling the same import logic to be applied across different file types.
No External Dependencies – Excel files can be processed without requiring Microsoft Excel to be installed, which is important for backend services and automated environments.

Common Pitfalls

Incorrect File Path

Ensure the Excel file path is correct and accessible from your script. Use absolute paths or verify the current working directory.

import os
print(os.getcwd())  # Check current directory

Missing Headers

When importing as a dictionary, verify that your Excel file has headers in the first row. Otherwise, the keys will be incorrect.
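
A header check before building dictionaries might look like the sketch below. It operates on rows that have already been read into lists, and the build_records helper is an assumption for illustration.

```python
def build_records(rows):
    """Convert rows (first row = headers) into dictionaries, validating headers first."""
    if not rows:
        return []
    headers = [str(h).strip() if h is not None else "" for h in rows[0]]
    if any(h == "" for h in headers):
        raise ValueError("Header row contains empty cells")
    if len(set(headers)) != len(headers):
        raise ValueError("Header row contains duplicate names")
    return [dict(zip(headers, row)) for row in rows[1:]]

rows = [
    ["Product", "Region", "Sales"],
    ["Laptop", "East", "1200"],
    ["Mouse", "West", "300"],
]
records = build_records(rows)
```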

Memory Management

Always dispose of the workbook object after processing to release resources, especially when processing large files.

workbook.Dispose()

Data Type Conversion

Excel cells may return different data types than expected. Validate and convert data types as needed for your application.
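
As a small sketch of such a conversion, the hypothetical to_number helper below turns numeric-looking cell text into int or float and leaves everything else untouched. It assumes commas are thousands separators, which does not hold for every locale.

```python
def to_number(value):
    """Return int or float for numeric-looking cell text, otherwise the original value."""
    if value is None:
        return None
    text = str(value).strip().replace(",", "")  # assumes "," is a thousands separator
    if text == "":
        return None
    try:
        number = float(text)
    except ValueError:
        return value  # leave non-numeric text untouched
    return int(number) if number.is_integer() else number

converted = [to_number(v) for v in ["1,250", "79.99", " 42 ", "N/A", ""]]
```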

Import vs Read Excel in Python

In Python, "reading" and "importing" Excel files refer to related but distinct steps in data processing.

Read Excel focuses on accessing raw file content. This typically involves retrieving cell values, rows, or specific ranges without changing how the data is structured.

Import Excel includes both reading and transformation. After extracting the data, it is converted into structures such as lists, dictionaries, objects, or database records so that it can be used directly within an application.

In practice, reading is a subset of importing. The distinction lies in the goal—reading retrieves data, while importing prepares it for use.

Conclusion

Importing an Excel file in Python is not just about reading data—it's about converting it into structures that your application can use effectively. In this guide, you learned how to import an Excel file in Python as a list, convert Excel data into dictionaries, map Excel data into Python objects, and import Excel data into a database.

With Spire.XLS for Python, you can easily import Excel data into different structures with minimal code. The library provides a consistent API for handling various Excel formats and complex content, making it suitable for a wide range of data processing scenarios.

To evaluate the full performance of Spire.XLS for Python, you can apply for a 30-day trial license.

FAQ

What does it mean to import an Excel file in Python?

Importing Excel means converting Excel data into Python structures such as lists, dictionaries, or databases for further processing and integration into your applications.

How do I import Excel data into Python?

You can use libraries like Spire.XLS for Python to load Excel files and convert their content into usable Python data structures. The process involves loading the workbook, accessing the worksheet, and iterating through cells to extract data.

Can I import Excel data into a database using Python?

Yes, you can read Excel data and insert it into databases like SQLite, MySQL, or PostgreSQL using Python. This approach is commonly used for data migration and backend system integration.

What is the best structure for importing Excel data?

The best structure depends on your use case. Lists are suitable for simple iteration, dictionaries for structured data access by column names, objects for type safety and business logic, and databases for persistent storage and querying.

Do I need Microsoft Excel installed to import Excel files in Python?

No, libraries like Spire.XLS for Python work independently and do not require Microsoft Excel to be installed on the system.

Convert Python Code to Word (Plain or Syntax-Highlighted)

2026-02-11 05:31:09 Written by Jack Du

Convert Python Code to Word Files

Developers often need to include Python code inside Word documents for technical documentation, tutorials, code reviews, internal reports, or client deliverables. While copying and pasting code manually works for small snippets, automated solutions provide better consistency, formatting control, and scalability — especially when working with long scripts or multiple files.

This tutorial demonstrates multiple practical methods to export Python code into Word documents using Python. Each method has its own strengths depending on whether you prioritize formatting, automation, syntax highlighting, or readability.

On This Page:

Install Required Libraries
Export Python Code to Word as Plain Text
- Method 1. Insert Raw Python Code into a Word Document
- Method 2. Generate a Word File from Markdown-Wrapped Code
Add Syntax-Highlighted Python Code to Word
Conclusion
FAQs

Install Required Libraries

Install the necessary dependencies before running the examples:

pip install spire.doc pygments

Library Overview:

Spire.Doc for Python: creates and manipulates Word documents programmatically
Pygments: generates syntax-highlighted code in RTF, HTML, or image formats
pathlib (built-in): reads Python files from disk
textwrap (built-in): wraps long code lines before they are rendered as an image

Export Python Code to Word as Plain Text

Plain text insertion is the most straightforward method for embedding code in Word. It keeps scripts fully editable and preserves formatting such as indentation and line breaks.

Method 1. Insert Raw Python Code into a Word Document

This method reads a .py file and inserts the code directly into Word while applying a monospace font style.

from pathlib import Path
from spire.doc import *

# Read Python file
code_string = Path("demo.py").read_text(encoding="utf-8")

# Create a Word document
doc = Document()

# Add a section
section = doc.AddSection()
section.PageSetup.Margins.All = 60

# Add a paragraph
paragraph = section.AddParagraph()

# Insert code string to the paragraph
paragraph.AppendText(code_string)

# Create a paragraph style
style = ParagraphStyle(doc)
style.Name = "code"
style.CharacterFormat.FontName = "Consolas"
style.CharacterFormat.FontSize = 12
style.ParagraphFormat.LineSpacing = 12
doc.Styles.Add(style)

# Apply the style to the paragraph
paragraph.ApplyStyle("code")

# Save the document
doc.SaveToFile("Output.docx", FileFormat.Docx2019)
doc.Dispose()

How It Works:

This technique treats Python code as plain text and inserts it directly into a Word paragraph. The script reads the .py file using Path.read_text(), preserving indentation, blank lines, and overall structure.

After inserting the text, a custom paragraph style is created and applied. The use of a monospace font such as Consolas ensures alignment and readability, while fixed line spacing maintains consistent formatting across lines.

Because no intermediate format is used, this is the simplest and fastest approach. However, it does not provide syntax highlighting or semantic styling—Word only displays the code as formatted text.

Output:

Insert Python Code into Word

You May Also Like: Generate Word Documents Using Python

Method 2. Generate a Word File from Markdown-Wrapped Code

If your workflow already uses Markdown, wrapping Python code inside fenced blocks provides a structured way to convert scripts into Word documents.

from pathlib import Path
from spire.doc import *

# Read Python file
code = Path("demo.py").read_text(encoding="utf-8")

# Convert to Markdown
md_content = f"```python\n{code}\n```"
Path("temp.md").write_text(md_content, encoding="utf-8")

# Load Markdown into Word
doc = Document()
doc.LoadFromFile("temp.md")

# Update page settings
doc.Sections[0].PageSetup.Margins.All = 60

# Save as a DOCX file
doc.SaveToFile("Output.docx", FileFormat.Docx)
doc.Dispose()

How It Works:

Instead of inserting text directly, this method wraps Python code inside Markdown fenced code blocks. The generated Markdown file is then loaded into Word using Spire.Doc’s Markdown parsing capability.

When Spire.Doc loads the Markdown file, it preserves code formatting such as indentation and line breaks. This approach is useful when your documentation workflow already uses Markdown or when code needs to coexist with headings, lists, and descriptive text.

Since Markdown itself does not inherently apply syntax coloring inside Word, the result is still plain code formatting—but the structure is cleaner and easier to manage within technical documentation pipelines.

Output:

Convert Markdown-Wrapped Code to Word

Add Syntax-Highlighted Python Code to Word

Syntax highlighting makes code easier to read and understand. By integrating Pygments, Python scripts can be converted into stylized formats before being embedded into Word.

This section explores three approaches — RTF, HTML, and image rendering — each with different strengths depending on your formatting goals.

Method 1. Use RTF for Preformatted Code Blocks

RTF allows syntax-highlighted code to remain fully editable within Word.

from pathlib import Path
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import RtfFormatter
from spire.doc import *

# Read Python file
code = Path("demo.py").read_text(encoding="utf-8")

# Set the font used in the RTF output
formatter = RtfFormatter(fontface="Consolas")

# Highlight the code with the Python lexer and emit RTF
rtf_text = highlight(code, PythonLexer(), formatter)
rtf_text = rtf_text.replace(r"\f0", r"\f0\fs24")  # \fs uses half-points: 24 = 12 pt

# Create a Word document
doc = Document()

# Add a section
section = doc.AddSection()
section.PageSetup.Margins.All = 60

# Add a paragraph
paragraph = section.AddParagraph()

# Insert the syntax-highlighted code as RTF
paragraph.AppendRTF(rtf_text)

# Save the document
doc.SaveToFile("Output.docx", FileFormat.Docx2019)
doc.Dispose()

How It Works:

Pygments analyzes Python syntax using a lexer, identifying tokens such as keywords, strings, and comments. The RTF formatter applies styling rules that represent colors and fonts using RTF control words.

The resulting RTF string is inserted directly into Word using AppendRTF(). Because RTF is a native Word-compatible format, the document preserves fonts, colors, and spacing without requiring additional rendering steps.

Font size is controlled by modifying RTF control words (e.g., \fs24), allowing precise control over appearance. This method produces editable, selectable code with syntax highlighting inside Word.
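
Because the RTF \fs control word is measured in half-points, the number to insert is twice the desired point size. The helper below is a hypothetical wrapper around the same string replacement used in the example, and the sample RTF string is a hand-written stand-in rather than real Pygments output.

```python
def set_rtf_font_size(rtf_text, points):
    """Add an \\fsN control word (N = half-points) after each \\f0 font switch."""
    half_points = int(round(points * 2))
    return rtf_text.replace(r"\f0", rf"\f0\fs{half_points}")

# Hand-written sample; real formatter output contains more control words
sample = r"{\rtf1\ansi{\fonttbl{\f0\fmodern Consolas;}}\f0 print('hi')}"
resized = set_rtf_font_size(sample, 10.5)
```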

Output:

Convert Code to Word with Syntax Highlighting via RTF

Method 2. Render Highlighted Code via HTML Formatting

HTML rendering provides visually rich syntax highlighting and automatic text wrapping.

from pathlib import Path
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter
from spire.doc import *

# Read Python file
code = Path("demo.py").read_text(encoding="utf-8")

# Generate HTML from the Python code with syntax highlighting
html_text = highlight(code, PythonLexer(), HtmlFormatter(full=True))

# Create a Word document
doc = Document()

# Add a section
section = doc.AddSection()
section.PageSetup.Margins.All = 60

# Add a paragraph
paragraph = section.AddParagraph()

# Add the HTML string to the paragraph
paragraph.AppendHTML(html_text)

# Save the document
doc.SaveToFile("Output.docx", FileFormat.Docx2019)
doc.Dispose()

How It Works:

Here, Pygments converts Python code into styled HTML using the HtmlFormatter. The HTML output includes inline styles or CSS rules that represent syntax colors and formatting.

Spire.Doc then interprets the HTML content and renders it into Word. During this process, HTML elements are translated into Word formatting structures, allowing the highlighted code to appear visually similar to web-based code blocks.

This approach is ideal when code originates from web content, static documentation sites, or Markdown-to-HTML workflows.

Output:

Convert Code to Word with Syntax Highlighting via HTML

You May Also Like: Convert HTML to Word DOC or DOCX in Python

Method 3. Insert Syntax-Highlighted Code as Images

For scenarios where visual consistency matters more than editability, code can be rendered as an image before insertion.

from pathlib import Path
import textwrap
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import ImageFormatter
from spire.doc import *

# Read Python file
code = Path("demo.py").read_text(encoding="utf-8")

# Wrap long lines manually
def wrap_code_lines(code_text, max_width=75):
    wrapped_lines = []
    for line in code_text.splitlines():
        if len(line) > max_width:
            wrapped_lines.extend(textwrap.wrap(
                line,
                width=max_width,
                replace_whitespace=False,
                drop_whitespace=False
            ))
        else:
            wrapped_lines.append(line)
    return "\n".join(wrapped_lines)

code = wrap_code_lines(code, max_width=75)

# Render the highlighted code as a PNG image
formatter = ImageFormatter(
    font_name="Consolas",
    font_size=18,
    scale=2,
    image_pad=10,
    line_pad=2,
    background_color="#ffffff"
)

img_bytes = highlight(code, PythonLexer(), formatter)

with open("code.png", "wb") as f:
    f.write(img_bytes)

# Create a Word document
doc = Document()
section = doc.AddSection()
section.PageSetup.Margins.All = 60

# Insert into Word
paragraph = section.AddParagraph()
picture = paragraph.AppendPicture("code.png")

# Ensure image fits page width
page_width = (
    section.PageSetup.PageSize.Width
    - section.PageSetup.Margins.Left
    - section.PageSetup.Margins.Right
)
picture.Width = page_width

# Save the document
doc.SaveToFile("Output.docx", FileFormat.Docx2019)
doc.Dispose()

How It Works:

This method renders Python code as an image instead of editable text. Pygments generates a syntax-highlighted bitmap using the ImageFormatter, allowing full visual control over fonts, colors, padding, and DPI.

Since image rendering does not automatically wrap long lines, the script manually wraps lengthy code lines using Python’s textwrap module before generating the image. This prevents oversized images that exceed page width.

After inserting the image into Word, its width is dynamically resized to fit the printable page area. Because the code is embedded as a graphic, it preserves exact visual appearance across platforms and prevents formatting inconsistencies—but the text is no longer editable.
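
When wrapping code for display, continuation lines are easier to read if they keep the indentation of the line they belong to. The sketch below is a variation on the wrapping step that uses textwrap's subsequent_indent option. The wrap_line helper and the 40-character width are assumptions for the example, and the wrapped text is meant for rendering only, since it is no longer guaranteed to be valid Python.

```python
import textwrap

def wrap_line(line, max_width=40):
    """Wrap one code line, indenting continuation lines deeper than the original."""
    if len(line) <= max_width:
        return [line]
    indent = line[: len(line) - len(line.lstrip())]
    return textwrap.wrap(
        line,
        width=max_width,
        subsequent_indent=indent + "    ",  # extra indent marks a continuation
        replace_whitespace=False,
        break_long_words=False,
    )

line = "    total = compute_total(price, quantity, discount_rate, tax_rate)"
wrapped = wrap_line(line, max_width=40)
print("\n".join(wrapped))
```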

Output:

Insert Syntax-Highlighted Code as Images in Word

Conclusion

Converting Python code to Word documents can be achieved through several approaches depending on your goals. Plain text methods provide simplicity and flexibility, while RTF and HTML techniques offer powerful syntax highlighting with selectable text. Image-based code blocks deliver consistent visual formatting but require careful line wrapping and scaling.

For most documentation workflows:

Use plain text for editable technical content
Use HTML or RTF for syntax-highlighted documentation
Use images when formatting consistency is critical

FAQs

Which method is best for tutorials?

HTML or RTF methods provide clear syntax highlighting with selectable text.

How can I preserve indentation and blank lines?

Read the .py file using .read_text() without stripping or modifying lines.

Why do image-based code blocks become too small?

Word scales images to fit page width. Increasing the image formatter’s scale or adjusting the wrapping width can improve readability.

Can readers copy code from Word?

Yes — except when code is inserted as an image.

Do I need Markdown for conversion?

No. Markdown is optional but useful when working with documentation pipelines.

Can I export the generated document as a PDF file?

Yes. When saving the document, simply specify PDF as the output format in the Document.SaveToFile() method.

Get a Free License

To fully experience the capabilities of Spire.Doc for Python without any evaluation limitations, you can request a 30-day trial license.

Create a CSV File in Python: Simple & Advanced Examples

2026-01-23 03:38:16 Written by Jane Zhao

A guide to create a CSV file using Python

CSV (Comma-Separated Values) files are the backbone of data exchange across industries—from data analysis to backend systems. They’re lightweight, human-readable, and compatible with almost every tool (Excel, Google Sheets, databases). If you’re a developer seeking a reliable way to create a CSV file in Python, Spire.XLS for Python is a powerful library that simplifies the process.

In this comprehensive guide, we'll explore how to generate a CSV file in Python with Spire.XLS, covering basic CSV creation and advanced use cases like list to CSV and Excel to CSV conversion.

What You’ll Learn

Installation and Setup
Basic: Create a Simple CSV File in Python
Dynamic Data: Generate CSV from a List of Dictionaries in Python
Excel-to-CSV: Generate CSV From an Excel File in Python
Best Practices for CSV Creation
FAQ: Create CSV in Python

Installation and Setup

Getting started with Spire.XLS for Python is straightforward. Follow these steps to set up your environment:

Step 1: Ensure Python 3.6 or higher is installed.

Step 2: Install the library via pip (the official package manager for Python):

pip install Spire.XLS

Step 3 (Optional): Request a temporary free license to test full features without any limitations.

Basic: Create a Simple CSV File in Python

Let’s start with a simple scenario: creating a CSV file from scratch with static data (e.g., a sales report). The code below creates a new workbook, populates it with data, and saves it as a CSV file.

from spire.xls import *
from spire.xls.common import *

# 1. Create a new workbook
workbook = Workbook()
    
# 2. Get the first worksheet (default sheet)
worksheet = workbook.Worksheets[0]

# 3. Populate data into cells
# Header row
worksheet.Range["A1"].Text = "ProductID"
worksheet.Range["B1"].Text = "ProductName"
worksheet.Range["C1"].Text = "Price"
worksheet.Range["D1"].Text = "QuantitySold"

# Data rows
worksheet.Range["A2"].NumberValue = 101
worksheet.Range["B2"].Text = "Wireless Headphones"
worksheet.Range["C2"].NumberValue = 79.99
worksheet.Range["D2"].NumberValue = 250

worksheet.Range["A3"].NumberValue = 102
worksheet.Range["B3"].Text = "Bluetooth Speaker"
worksheet.Range["C3"].NumberValue = 49.99
worksheet.Range["D3"].NumberValue = 180

# 4. Save the worksheet as a comma-delimited, UTF-8 CSV file, then release resources
worksheet.SaveToFile("BasicSalesReport.csv", ",", Encoding.get_UTF8())
workbook.Dispose()

Core Workflow

Initialize core objects: Workbook() creates a new Excel workbook, and Worksheets[0] accesses its first sheet.
Fill data into cells: Use .Text for strings and .NumberValue for numbers so each cell keeps the correct data type.
Export and clean up: SaveToFile() exports the worksheet to CSV, and Dispose() releases the workbook's resources.

Output:

The resulting BasicSalesReport.csv will look like this:

[Image: Create a CSV file from scratch using Python]

Dynamic Data: Generate CSV from a List of Dictionaries in Python

In real-world scenarios, data is often stored in dictionaries (e.g., from APIs/databases). The code below converts a list of dictionaries to a CSV:

from spire.xls import *
from spire.xls.common import *

# Sample data (e.g., from a database/API)
customer_data = [
    {"CustomerID": 1, "Name": "John Doe", "Email": "john@example.com", "Country": "USA"},
    {"CustomerID": 2, "Name": "Maria Garcia", "Email": "maria@example.es", "Country": "Spain"},
    {"CustomerID": 3, "Name": "Li Wei", "Email": "wei@example.cn", "Country": "China"}
]

# 1. Create workbook and worksheet
workbook = Workbook()
worksheet = workbook.Worksheets[0]

# 2. Write headers (extract keys from the first dictionary)
headers = list(customer_data[0].keys())
for col_idx, header in enumerate(headers, start=1):
    worksheet.Range[1, col_idx].Text = header  # Row 1 = headers

# 3. Write data rows
for row_idx, customer in enumerate(customer_data, start=2):  # Start at row 2
    for col_idx, key in enumerate(headers, start=1):
        # Write numbers as numeric cells and everything else as text
        value = customer[key]
        if isinstance(value, (int, float)):
            worksheet.Range[row_idx, col_idx].NumberValue = value
        else:
            # str() guards against non-string values such as None or dates
            worksheet.Range[row_idx, col_idx].Text = str(value)

# 4. Save as CSV
worksheet.SaveToFile("CustomerData.csv", ",", Encoding.get_UTF8())
workbook.Dispose()

This example is ideal for JSON to CSV conversion, database dumps, and REST API data exports. Key advantages include:

Dynamic Headers: Automatically extracts headers from the keys of the first dictionary in the dataset.
Scalable: The same two loops handle any number of records and keys, so the code does not change as the dataset grows.
Clean Output: Preserves the original order of dictionary keys (guaranteed by the language from Python 3.7) for a consistent CSV structure.
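Because the headers come from the first dictionary only, a later record with an extra key would lose that column, and a record with a missing key would raise KeyError at customer[key]. One possible way to prepare such data before writing cells is sketched below in plain Python; collect_headers and to_rows are hypothetical helper names, not part of Spire.XLS.

```python
def collect_headers(records):
    """Return every key seen across the records, in first-seen order."""
    headers = []
    seen = set()
    for record in records:
        for key in record:
            if key not in seen:
                seen.add(key)
                headers.append(key)
    return headers


def to_rows(records, headers, missing=""):
    """Build one list per record, filling absent keys with `missing`."""
    return [[record.get(key, missing) for key in headers] for record in records]


# Illustrative records with inconsistent keys
records = [
    {"CustomerID": 1, "Name": "John Doe"},
    {"CustomerID": 2, "Name": "Maria Garcia", "Country": "Spain"},
]
headers = collect_headers(records)
rows = to_rows(records, headers)
```

The resulting headers and rows can be written to the worksheet with the same nested loops used in the example above.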

The generated CSV file:

[Image: Convert a list of dictionaries to CSV file using Python]

Excel-to-CSV: Generate CSV From an Excel File in Python

Spire.XLS excels at converting Excel (XLS/XLSX) to CSV in Python. This is useful if you have Excel reports and need to export them to CSV for data pipelines or third-party tools.

from spire.xls import *

# 1. Initialize a workbook instance
workbook = Workbook()

# 2. Load an XLSX file
workbook.LoadFromFile("Expenses.xlsx")

# 3. Save Excel as a CSV file
workbook.SaveToFile("XLSXToCSV.csv", FileFormat.CSV)
workbook.Dispose()

Conversion result:

[Image: Convert Excel to CSV using Python]

Note: By default, SaveToFile() converts only the first worksheet. For converting multiple sheets to separate CSV files, refer to the comprehensive guide: Convert Excel (XLSX/XLS) to CSV in Python – Batch & Multi-Sheet

Best Practices for CSV Creation

Follow these guidelines to ensure robust and professional CSV output:

Validate Data First: Clean empty rows/columns before exporting to CSV.
Use UTF-8 Encoding: Always specify UTF-8 encoding (Encoding.get_UTF8()) to support international characters seamlessly.
Batch Process Smartly: For 100k+ rows, process data in chunks (avoid loading all data into memory at once).
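Choose the Correct Delimiter: Be mindful of regional settings. In locales where the comma is the decimal separator (much of Europe), spreadsheet tools often expect a semicolon (;) as the delimiter, so use it there to keep values from being parsed into the wrong columns.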
Dispose Objects: Release workbook/worksheet resources with Dispose() to prevent memory leaks.
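The first and third guidelines can be sketched in plain Python, independent of any library. The helper names drop_empty_rows and chunked and the sample rows are illustrative assumptions; the chunk size would be tuned to the real workload.

```python
from itertools import islice


def drop_empty_rows(rows):
    """Skip rows whose cells are all None or blank strings."""
    for row in rows:
        if any(cell is not None and str(cell).strip() != "" for cell in row):
            yield row


def chunked(rows, size):
    """Yield lists of at most `size` rows without materializing the whole input."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Illustrative data containing one empty row
raw_rows = [
    [101, "Wireless Headphones"],
    ["", None],
    [102, "Bluetooth Speaker"],
    [103, "USB Hub"],
]
chunks = list(chunked(drop_empty_rows(raw_rows), size=2))
```

Both helpers are generators, so rows are cleaned and grouped lazily; each chunk could then be written out before the next one is read.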

Conclusion

Spire.XLS makes it straightforward to generate CSV files in Python. Whether you're creating reports from scratch, converting Excel workbooks, or handling dynamic data from APIs and databases, the library covers each case with a few lines of code.

By following this guide, you can easily customize delimiters, specify encodings such as UTF-8, and manage data types—ensuring your CSV files are accurate, compatible, and ready for any application. For more advanced features, you can explore the Spire.XLS for Python tutorials.

FAQ: Create CSV in Python

Q1: Why choose Spire.XLS over Python’s built-in csv module?

A: While Python's csv module is excellent for basic read/write operations, Spire.XLS offers significant advantages:

Better Data Type Handling: Automatic distinction between text and numeric data.
Excel Compatibility: Seamlessly converts between Excel (XLSX/XLS) and CSV—critical for teams using Excel as a data source.
Advanced Customization: Supports customizing the delimiter and encoding of the generated CSV file.
Batch Processing: Efficient handling of large datasets and multiple files.
Cross-Platform Support: Works on Windows, macOS, and Linux (no Excel installation required).
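For comparison, the built-in route mentioned above can be sketched with csv.DictWriter. The sample records are illustrative, and the output goes to an in-memory buffer so the sketch has no file dependencies.

```python
import csv
import io

# Illustrative records, similar in shape to the customer data used earlier
customer_data = [
    {"CustomerID": 1, "Name": "John Doe", "Country": "USA"},
    {"CustomerID": 2, "Name": "Maria Garcia", "Country": "Spain"},
]

# newline="" lets the csv module control line endings itself
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=list(customer_data[0].keys()))
writer.writeheader()
writer.writerows(customer_data)

csv_text = buffer.getvalue()
```

The csv module writes every value as text and knows nothing about Excel workbooks, which is where the points listed above come into play.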

Q2: Can I use Spire.XLS for Python to read CSV files?

A: Yes. Spire.XLS supports parsing CSV files and extracting their data. For details, see: How to Read CSV Files in Python: A Comprehensive Guide

Q3: Can Spire.XLS convert CSV files back to Excel format?

A: Yes! Spire.XLS supports bidirectional conversion. A quick example:

from spire.xls import *

# Create a workbook
workbook = Workbook()

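# Load a CSV file (comma delimiter, data placed starting at row 1, column 1)
workbook.LoadFromFile("sample.csv", ",", 1, 1)

# Save CSV as Excel, then release resources
workbook.SaveToFile("CSVToExcel.xlsx", ExcelVersion.Version2016)
workbook.Dispose()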

Q4: How do I change the CSV delimiter?

A: The SaveToFile() method’s second parameter controls the delimiter:

# Semicolon (for European locales)
worksheet.SaveToFile("EU.csv", ";", Encoding.get_UTF8())
# Tab (for tab-separated values/TSV)
worksheet.SaveToFile("TSV_File.csv", "\t", Encoding.get_UTF8())
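To check that a custom delimiter round-trips, the exported file can be read back with the standard csv module. The sketch below writes a small stand-in file by hand (it is an assumption, not actual Spire.XLS output) and parses it with delimiter=";".

```python
import csv
import tempfile
from pathlib import Path

# Stand-in for a file exported with ";" as the delimiter (contents are made up)
csv_path = Path(tempfile.mkdtemp()) / "EU.csv"
csv_path.write_text("ProductID;Price\n101;79,99\n102;49,99\n", encoding="utf-8")

# Pass the same delimiter when reading; newline="" is what the csv docs recommend
with csv_path.open(newline="", encoding="utf-8") as handle:
    parsed_rows = list(csv.reader(handle, delimiter=";"))
```

With a semicolon delimiter, decimal commas such as 79,99 stay inside a single field.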

Published in Document Operation

Tagged under

xls Python Document Operation

