Python (355)
Convert HTML to Text in Python | Simple Plain Text Output
2025-09-03 01:16:17 Written by Administrator
HTML (HyperText Markup Language) is a markup language used to create web pages, allowing developers to build rich and visually appealing layouts. However, HTML files often contain a large number of tags, which makes them difficult to read if you only need the main content. By using Python to convert HTML to text, this problem can be easily solved. Unlike raw HTML, the converted text file strips away all unnecessary markup, leaving only clean and readable content that is easier to store, analyze, or process further.
- Install HTML to Text Converter in Python
- Python Convert HTML File to Text
- Python Convert HTML String to Text
- The Conclusion
- FAQs
Install HTML to Text Converter in Python
To simplify the task, we recommend using Spire.Doc for Python. This Python Word library allows you to quickly remove HTML markup and extract clean plain text with ease. It not only works as an HTML-to-text converter, but also offers a wide range of features—covering almost everything you can do in Microsoft Word.
To install it, you can run the following pip command:
pip install spire.doc
Alternatively, you can download the Spire.Doc package and install it manually.
Python Convert HTML Files to Text in 3 Steps
After preparing the necessary tools, let's dive into today's main topic: how to convert HTML to plain text using Python. With the help of Spire.Doc, this task can be accomplished in just three simple steps: create a new document object, load the HTML file, and save it as a text file. It’s straightforward and efficient, even for beginners. Let’s take a closer look at how this process can be implemented in code!
Code Example – Converting an HTML File to a Text File:
from spire.doc import *
from spire.doc.common import *
# Open an html file
document = Document()
document.LoadFromFile("/input/htmlsample.html", FileFormat.Html, XHTMLValidationType.none)
# Save it as a Text document.
document.SaveToFile("/output/HtmlFileTotext.txt", FileFormat.Txt)
document.Close()
The following is a preview comparison between the source document (.html) and the output document (.txt):

Note that if the HTML file contains tables, the output text file will only retain the values within the tables and cannot preserve the original table formatting. If you want to keep certain styles while removing markup, it is recommended to convert HTML to a Word document . This way, you can retain headings, tables, and other formatting, making the content easier to edit and use.
How to Convert an HTML String to Text in Python
Sometimes, we don’t need the entire content of a web page and only want to extract specific parts. In such cases, you can convert an HTML string directly to text. This approach allows you to precisely control the information you need without further editing. Using Python to convert an HTML string to a text file is also straightforward. Here’s a detailed step-by-step guide:
Steps to convert an HTML string to a text document using Spire.Doc:
- Input the HTML string directly or read it from a local file.
- Create a Document object and add sections and paragraphs.
- Use Paragraph.AppendHTML() method to insert the HTML string into a paragraph.
- Save the document as a .txt file using Document.SaveToFile() method.
The following code demonstrates how to convert an HTML string to a text file using Python:
from spire.doc import *
from spire.doc.common import *
#Get html string.
#with open(inputFile) as fp:
#HTML = fp.read()
# Load HTML from string
html = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>HTML to Text Example</title>
<style>
body { font-family: Arial, sans-serif; margin: 20px; }
header { background: #f4f4f4; padding: 10px; }
nav a { margin: 0 10px; text-decoration: none; color: #333; }
main { margin-top: 20px; }
</style>
</head>
<body>
<header>
<h1>My Demo Page</h1>
<nav>
<a href="#">Home</a>
<a href="#">About</a>
<a href="#">Contact</a>
</nav>
</header>
<main>
<h2>Convert HTML to Text</h2>
<p>This is a simple demo showing how HTML content can be displayed before converting it to plain text.</p>
</main>
</body>
</html>
"""
# Create a new document
document = Document()
section = document.AddSection()
section.AddParagraph().AppendHTML(html)
# Save directly as TXT
document.SaveToFile("/output/HtmlStringTotext.txt", FileFormat.Txt)
document.Close()
Here's the preview of the converted .txt file: 
The Conclusion
In today’s tutorial, we focused on how to use Python to convert HTML to a text file. With the help of Spire.Doc, you can handle both HTML files and HTML strings in just a few lines of code, easily generating clean plain text files. If you’re interested in the other powerful features of the Python Word library, you can request a 30-day free trial license and explore its full capabilities for yourself.
FAQs about Converting HTML to Text in Python
Q1: How can I convert HTML to plain text using Python?
A: Use Spire.Doc to load an HTML file or string, insert it into a Document object with AppendHTML(), and save it as a .txt file.
Q2: Can I keep some formatting when converting HTML to text?
A: To retain styles like headings or tables, convert HTML to a Word document first, then export to text if needed.
Q3: Is it possible to convert only part of an HTML page to text?
A: Yes, extract the specific HTML segment as a string and convert it to text using Python for precise control.

In development, reading CSV files in Python is a common task in data processing, analytics, and backend integration. While Python offers built-in modules like csv and pandas for handling CSV files, Spire.XLS for Python provides a powerful, feature-rich alternative for working with CSV and Excel files programmatically.
In this article, you’ll learn how to use Python to read CSV files, from basic CSV parsing to advanced techniques.
- Getting Started with Spire.XLS for Python
- Basic Example: Read a CSV in Python
- Advanced CSV Reading Techniques
- Conclusion
Getting Started with Spire.XLS for Python
Spire.XLS for Python is a feature-rich library for processing Excel and CSV files. Unlike basic CSV parsers in Python, it offers advanced capabilities such as:
- Simple API to load, read, and manipulate CSV data.
- Reading/writing CSV files with support for custom delimiters.
- Converting CSV files to Excel formats (XLSX, XLS) and vice versa.
These features make Spire.XLS ideal for data analysts, developers, and anyone working with structured data in CSV format.
Install via pip
Before getting started, install the library via pip. It works with Python 3.6+ on Windows, macOS, and Linux:
pip install Spire.XLS
Basic Example: Read a CSV in Python
Let’s start with a simple example: parsing a CSV file and extracting its data. Suppose we have a CSV file named “input.csv” with the following content:
Name,Age,City,Salary
Alice,30,New York,75000
Bob,28,Los Angeles,68000
Charlie,35,San Francisco,90000
Python Code to Read the CSV File
Here’s how to load and get data from the CSV file with Python:
from spire.xls import *
from spire.xls.common import *
# Create a Workbook object
workbook = Workbook()
# Load a CSV file
workbook.LoadFromFile("input.csv", ",", 1, 1)
# Get the first worksheet (CSV files are loaded as a single sheet)
worksheet = workbook.Worksheets[0]
# Get the number of rows and columns with data
row_count = worksheet.LastRow
col_count = worksheet.LastColumn
# Iterate through rows and columns to print data
print("CSV Data:")
for row in range(row_count):
for col in range(col_count):
# Get cell value
cell_value = worksheet.Range[row+1, col+1].Value
print(cell_value, end="\t")
print() # New line after each row
# Close the workbook
workbook.Dispose()
Explanation:
-
Workbook Initialization: The Workbook class is the core object for handling Excel files.
-
Load CSV File: LoadFromFile() imports the CSV data. Its parameters are:
- fileName: The CSV file to read.
- separator: Specified delimiter (e.g., “,”).
- row/column: The starting row/column index.
-
Access Worksheet: The CSV data is loaded into the first worksheet.
-
Read Data: Iterate through rows and columns to extract cell values via worksheet.Range[].Value.
Output: Get data from a CSV file and print in a tabular format.

Advanced CSV Reading Techniques
1. Read CSV with Custom Delimiters
Not all CSVs use commas. If your CSV file uses a different delimiter (e.g., ;), specify it during loading:
# Load a CSV file
workbook.LoadFromFile("input.csv", ";", 1, 1)
2. Skip Header Rows
If your CSV has headers, skip them by adjusting the row iteration to start from the second row instead of the first.
for row in range(1, row_count):
for col in range(col_count):
# Get cell value (row+1 because Spire.XLS uses 1-based indexing)
cell_value = worksheet.Range[row+1, col+1].Value
print(cell_value, end="\t")
3. Convert CSV to Excel in Python
One of the most powerful features of Spire.XLS is the ability to convert a CSV file into a native Excel format effortlessly. For example, you can read a CSV and then:
- Apply Excel formatting (e.g., set cell colors, borders).
- Create charts (e.g., a bar chart for sales by region).
- Save the data as an Excel file (.xlsx) for sharing.
Code Example: Convert CSV to Excel (XLSX) in Python – Single & Batch
Conclusion
Reading CSV files in Python with Spire.XLS simplifies both basic and advanced data processing tasks. Whether you need to extract CSV data, convert it to Excel, or handle advanced scenarios like custom delimiters, the examples outlined in this guide enables you to implement robust CSV reading capabilities in your projects with minimal effort.
Try the examples above, and explore the online documentation for more advanced features!
Reading PowerPoint Files in Python: Extract Text, Images & More
2025-07-24 03:50:38 Written by zaki zou
PowerPoint (PPT & PPTX) files are rich with diverse content, including text, images, tables, charts, shapes, and metadata. Extracting these elements programmatically can unlock a wide range of use cases, from automating repetitive tasks to performing in-depth data analysis or migrating content across platforms.
In this tutorial, we'll explore how to read PowerPoint documents in Python using Spire.Presentation for Python, a powerful library for processing PowerPoint files.
Table of Contents:
- Python Library to Read PowerPoint Files
- Extracting Text from Slides
- Saving Images from Slides
- Accessing Metadata (Document Properties)
- Conclusion
- FAQs
1. Python Library to Read PowerPoint Files
To work with PowerPoint files in Python, we'll use Spire.Presentation for Python. This feature-rich library enables developers to create, edit, and read content from PowerPoint presentations efficiently. It allows for the extraction of text, images, tables, SmartArt, and metadata with minimal coding effort.
Before we begin, install the library using pip:
pip install spire.presentation
Now, let's dive into different ways to extract content from PowerPoint files.
2. Extracting Text from Slides in Python
PowerPoint slides contain text in various forms—shapes, tables, SmartArt, and more. We'll explore how to extract text from each of these elements.
2.1 Extract Text from Shapes
Most text in PowerPoint slides resides within shapes (text boxes, labels, etc.). Here’s how to extract text from shapes:
Steps-by-Step Guide
- Initialize the Presentation object and load your PowerPoint file.
- Iterate through each slide and its shapes.
- Check if a shape is an IAutoShape (a standard text container).
- Extract text from each paragraph in the shape.
Code Example
from spire.presentation import *
from spire.presentation.common import *
# Create an object of Presentation class
presentation = Presentation()
# Load a PowerPoint presentation
presentation.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pptx")
# Create a list
text = []
# Loop through the slides in the document
for slide_index, slide in enumerate(presentation.Slides):
# Add slide marker
text.append(f"====slide {slide_index + 1}====")
# Loop through the shapes in the slide
for shape in slide.Shapes:
# Check if the shape is an IAutoShape object
if isinstance(shape, IAutoShape):
# Loop through the paragraphs in the shape
for paragraph in shape.TextFrame.Paragraphs:
# Get the paragraph text and append it to the list
text.append(paragraph.Text)
# Write the text to a txt file
with open("output/ExtractAllText.txt", "w", encoding='utf-8') as f:
for s in text:
f.write(s + "\n")
# Dispose resources
presentation.Dispose()
Output:

2.2 Extract Text from Tables
Tables in PowerPoint store structured data. Extracting this data requires iterating through each cell to maintain the table’s structure.
Step-by-Step Guide
- Initialize the Presentation object and load your PowerPoint file.
- Iterate through each slide to access its shapes.
- Identify table shapes (ITable objects).
- Loop through rows and cells to extract text.
Code Example
from spire.presentation import *
from spire.presentation.common import *
# Create a Presentation object
presentation = Presentation()
# Load a PowerPoint file
presentation.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pptx")
# Create a list for tables
tables = []
# Loop through the slides
for slide in presentation.Slides:
# Loop through the shapes in the slide
for shape in slide.Shapes:
# Check whether the shape is a table
if isinstance(shape, ITable):
tableData = "
# Loop through the rows in the table
for row in shape.TableRows:
rowData = "
# Loop through the cells in the row
for i in range(row.Count):
# Get the cell value
cellValue = row[i].TextFrame.Text
# Add cell value with spaces for better readability
rowData += (cellValue + " | " if i < row.Count - 1 else cellValue)
tableData += (rowData + "\n")
tables.append(tableData)
# Write the tables to text files
for idx, table in enumerate(tables, start=1):
fileName = f"output/Table-{idx}.txt"
with open(fileName, "w", encoding='utf-8') as f:
f.write(table)
# Dispose resources
presentation.Dispose()
Output:

2.3 Extract Text from SmartArt
SmartArt is a unique feature in PowerPoint used for creating diagrams. Extracting text from SmartArt involves accessing its nodes and retrieving the text from each node.
Step-by-Step Guide
- Load the PowerPoint file into a Presentation object.
- Iterate through each slide and its shapes.
- Identify ISmartArt shapes in slides.
- Loop through each node in the SmartArt.
- Extract and save the text from each node.
Code Example
from spire.presentation.common import *
from spire.presentation import *
# Create a Presentation object
presentation = Presentation()
# Load a PowerPoint file
presentation.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pptx")
# Iterate through each slide in the presentation
for slide_index, slide in enumerate(presentation.Slides):
# Create a list to store the extracted text for the current slide
extracted_text = []
# Loop through the shapes on the slide and find the SmartArt shapes
for shape in slide.Shapes:
if isinstance(shape, ISmartArt):
smartArt = shape
# Extract text from the SmartArt nodes and append to the list
for node in smartArt.Nodes:
extracted_text.append(node.TextFrame.Text)
# Write the extracted text to a separate text file for each slide
if extracted_text: # Only create a file if there's text extracted
file_name = f"output/SmartArt-from-slide-{slide_index + 1}.txt"
with open(file_name, "w", encoding="utf-8") as text_file:
for text in extracted_text:
text_file.write(text + "\n")
# Dispose resources
presentation.Dispose()
Output:

You might also be interested in: Read Speaker Notes in PowerPoint in Python
3. Saving Images from Slides in Python
In addition to text, slides often contain images that may be important for your analysis. This section will show you how to save images from the slides.
Step-by-Step Guide
- Initialize the Presentation object and load your PowerPoint file.
- Access the Images collection in the presentation.
- Iterate through each image and save it in a desired format (e.g., PNG).
Code Example
from spire.presentation.common import *
from spire.presentation import *
# Create a Presentation object
presentation = Presentation()
# Load a PowerPoint document
presentation.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pptx")
# Get the images in the document
images = presentation.Images
# Iterate through the images in the document
for i, image in enumerate(images):
# Save a certain image in the specified path
ImageName = "Output/Images_"+str(i)+".png"
image_data = (IImageData)(image)
image_data.Image.Save(ImageName)
# Dispose resources
presentation.Dispose()
Output:

4. Accessing Metadata (Document Properties) in Python
Extracting metadata provides insights into the presentation, such as its title, author, and keywords. This section will guide you on how to access and save this metadata.
Step-by-Step Guide
- Create and load your PowerPoint file into a Presentation object.
- Access the DocumentProperty object.
- Extract properties like Title , Author , and Keywords .
Code Example
from spire.presentation.common import *
from spire.presentation import *
# Create a Presentation object
presentation = Presentation()
# Load a PowerPoint document
presentation.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pptx")
# Get the DocumentProperty object
documentProperty = presentation.DocumentProperty
# Prepare the content for the text file
properties = [
f"Title: {documentProperty.Title}",
f"Subject: {documentProperty.Subject}",
f"Author: {documentProperty.Author}",
f"Manager: {documentProperty.Manager}",
f"Company: {documentProperty.Company}",
f"Category: {documentProperty.Category}",
f"Keywords: {documentProperty.Keywords}",
f"Comments: {documentProperty.Comments}",
]
# Write the properties to a text file
with open("output/DocumentProperties.txt", "w", encoding="utf-8") as text_file:
for line in properties:
text_file.write(line + "\n")
# Dispose resources
presentation.Dispose()
Output:

You might also be interested in: Add Document Properties to a PowerPoint File in Python
5. Conclusion
With Spire.Presentation for Python, you can effortlessly read and extract various elements from PowerPoint files—such as text, images, tables, and metadata. This powerful library streamlines automation tasks, content analysis, and data migration, allowing for efficient management of PowerPoint files. Whether you're developing an analytics tool, automating document processing, or managing presentation content, Spire.Presentation offers a robust and seamless solution for programmatically handling PowerPoint files.
6. FAQs
Q1. Can Spire.Presentation handle password-protected PowerPoint files?
Yes, Spire.Presentation can open and process password-protected PowerPoint files. To access an encrypted file, use the LoadFromFile() method with the password parameter:
presentation.LoadFromFile("encrypted.pptx", "yourpassword")
Q2. How can I read comments from PowerPoint slides?
You can read comments from PowerPoint slides using the Spire.Presentation library. Here’s how:
from spire.presentation import *
presentation = Presentation()
presentation.LoadFromFile("Input.pptx")
with open("PowerPoint_Comments.txt", "w", encoding="utf-8") as file:
for slide_idx, slide in enumerate(presentation.Slides):
slide = (ISlide)(slide)
if len(slide.Comments) > 0:
for comment_idx, comment in enumerate(slide.Comments):
file.write(f"Comment {comment_idx + 1} from Slide {slide_idx + 1}: {comment.Text}\n")
Q3. Does Spire.Presentation preserve formatting when extracting text?
Basic text extraction retrieves raw text content. For formatted text (fonts, colors), you would need to access additional properties like TextRange.LatinFont and TextRange.Fill .
Q4. Are there any limitations on file size when reading PowerPoint files in Python?
While Spire.Presentation can handle most standard presentations, extremely large files (hundreds of MB) may require optimization for better performance.
Q5. Can I create or modify PowerPoint documents using Spire.Presentation?
Yes, you can create PowerPoint documents and modify existing ones using Spire.Presentation. The library provides a range of features that allow you to add new slides, insert text, images, tables, and shapes, as well as edit existing content.
Get a Free License
To fully experience the capabilities of Spire.Presentation for Python without any evaluation limitations, you can request a free 30-day trial license.

Creating PowerPoint presentations programmatically can save time and enhance efficiency in generating reports, slideshows, and other visual presentations. By automating the process, you can focus on content and design rather than manual formatting.
In this tutorial, we will explore how to create PowerPoint documents in Python using Spire.Presentation for Python. This powerful tool allows developers to manipulate and generate PPT and PPTX files seamlessly.
Table of Contents:
- Python Library to Work with PowerPoint Files
- Installing Spire.Presentation for Python
- Creating a PowerPoint Document from Scratch
- Creating PowerPoint Documents Based on a Template
- Best Practices for Python PowerPoint Generation
- Wrap Up
- FAQs
1. Python Library to Work with PowerPoint Files
Spire.Presentation is a robust library for creating, reading, and modifying PowerPoint files in Python , without requiring Microsoft Office. This library supports a wide range of features, including:
- Create PowerPoint documents from scratch or templates.
- Add text, images, lists, tables, charts, and shapes.
- Customize fonts, colors, backgrounds, and layouts.
- Save as or export to PPT, PPTX, PDF, or images.
In the following sections, we will walk through the steps to install the library, create presentations, and add various elements to your slides.
2. Installing Spire.Presentation for Python
To get started, you need to install the Spire.Presentation library. You can install it using pip:
pip install spire.presentation
Once installed, you can begin utilizing its features in your Python scripts to create PowerPoint documents.
3. Creating a PowerPoint Document from Scratch
3.1 Generate and Save a Blank Presentation
Let's start by creating a basic PowerPoint presentation from scratch. The following code snippet demonstrates how to generate and save a blank presentation:
from spire.presentation.common import *
from spire.presentation import *
# Create a Presentation object
presentation = Presentation()
# Set the slide size type
presentation.SlideSize.Type = SlideSizeType.Screen16x9
# Add a slide (there is one slide in the document by default)
presentation.Slides.Append()
# Save the document as a PPT or PPTX file
presentation.SaveToFile("BlankPowerPoint.pptx", FileFormat.Pptx2019)
presentation.Dispose()
In this code:
- Presentation : Root class representing the PowerPoint file.
- SlideSize.Type : Sets the slide dimensions (e.g., SlideSizeType.Screen16x9 for widescreen).
- Slides.Append() : Adds a new slide to the presentation. By default, a presentation starts with one slide.
- SaveToFile() : Saves the presentation to the specified file format (PPTX in this case).
3.2 Add Basic Elements to Your Slides
Now that we have a blank presentation, let's add some basic elements like text, images, lists, and tables.
Add Formatted Text
To add formatted text, we can use the following code:
# Get the first slide
first_slide = presentation.Slides[0]
# Add a shape to the slide
rect = RectangleF.FromLTRB (30, 60, 900, 150)
shape = first_slide.Shapes.AppendShape(ShapeType.Rectangle, rect)
shape.ShapeStyle.LineColor.Color = Color.get_Transparent()
shape.Fill.FillType = FillFormatType.none
# Add text to the shape
shape.AppendTextFrame("This guide demonstrates how to create a PowerPoint document using Python.")
# Get text of the shape as a text range
textRange = shape.TextFrame.TextRange
# Set font name, style (bold & italic), size and color
textRange.LatinFont = TextFont("Times New Roman")
textRange.IsBold = TriState.TTrue
textRange.FontHeight = 32
textRange.Fill.FillType = FillFormatType.Solid
textRange.Fill.SolidColor.Color = Color.get_Black()
# Set alignment
textRange.Paragraph.Alignment = TextAlignmentType.Left
In this code:
- AppendShape() : Adds a shape to the slide. We create a rectangle shape that will house our text.
- AppendTextFrame() : Adds a text frame to the shape, allowing us to insert text into it.
- TextFrame.TextRange : Accesses the text range of the shape, enabling further customization such as font style, size, and color.
- Paragraph.Alignment : Sets the alignment of the text within the shape.
Add an Image
To include an image in your presentation, use the following code snippet:
# Get the first slide
first_slide = presentation.Slides[0]
# Load an image file
imageFile = "C:\\Users\\Administrator\\Desktop\\logo.png"
stream = Stream(imageFile)
imageData = presentation.Images.AppendStream(stream)
# Reset size
width = imageData.Width * 0.6
height = imageData.Height * 0.6
# Append it to the slide
rect = RectangleF.FromLTRB (750, 50, 750 + width, 50 + height)
image = first_slide.Shapes.AppendEmbedImageByImageData(ShapeType.Rectangle, imageData, rect)
image.Line.FillType = FillFormatType.none
In this code:
- Stream() : Creates a stream from the specified image file path.
- AppendStream() : Appends the image data to the presentation's image collection.
- AppendEmbedImageByImageData() : Adds the image to the slide at the specified rectangle coordinates.
You may also like: Insert Shapes in PowerPoint in Python
Add a List
To add a bulleted list to your slide, we can use:
# Get the first slide
first_slide = presentation.Slides[0]
# Specify list bounds and content
listBounds = RectangleF.FromLTRB(30, 150, 500, 350)
listContent = [
" Step 1. Install Spire.Presentation for Python.",
" Step 2. Create a Presentation object.",
" Step 3. Add text, images, etc. to slides.",
" Step 5. Set a background image or color.",
" Step 6. Save the presentation to a PPT(X) file."
]
# Add a shape
autoShape = first_slide.Shapes.AppendShape(ShapeType.Rectangle, listBounds)
autoShape.TextFrame.Paragraphs.Clear()
autoShape.Fill.FillType = FillFormatType.none
autoShape.Line.FillType = FillFormatType.none
for content in listContent:
# Create paragraphs based on the list content and add them to the shape
paragraph = TextParagraph()
autoShape.TextFrame.Paragraphs.Append(paragraph)
paragraph.Text = content
paragraph.TextRanges[0].Fill.FillType = FillFormatType.Solid
paragraph.TextRanges[0].Fill.SolidColor.Color = Color.get_Black()
paragraph.TextRanges[0].FontHeight = 20
paragraph.TextRanges[0].LatinFont = TextFont("Arial")
# Set the bullet type for these paragraphs
paragraph.BulletType = TextBulletType.Symbol
# Set line spacing
paragraph.LineSpacing = 150
In this code:
- AppendShape() : Creates a rectangle shape for the list.
- TextFrame.Paragraphs.Append() : Adds paragraphs for each list item.
- BulletType : Sets the bullet style for the list items.
Add a Table
To include a table, you can use the following:
# Get the first slide
first_slide = presentation.Slides[0]
# Define table dimensions and data
widths = [200, 200, 200]
heights = [18, 18, 18, 18]
dataStr = [
["Slide Number", "Title", "Content Type"],
["1", "Introduction", "Text/Image"],
["2", "Project Overview", "Chart/Graph"],
["3", "Key Findings", "Text/List"]
]
# Add table to the slide
table = first_slide.Shapes.AppendTable(30, 360, widths, heights)
# Fill table with data and apply formatting
for rowNum, rowData in enumerate(dataStr):
for colNum, cellData in enumerate(rowData):
cell = table[colNum, rowNum]
cell.TextFrame.Text = cellData
textRange = cell.TextFrame.Paragraphs[0].TextRanges[0]
textRange.LatinFont = TextFont("Times New Roman")
textRange.FontHeight = 20
cell.TextFrame.Paragraphs[0].Alignment = TextAlignmentType.Center
# Apply a built-in table style
table.StylePreset = TableStylePreset.MediumStyle2Accent1
In this code:
- AppendTable() : Adds a table to the slide at specified coordinates with defined widths and heights for columns and rows.
- Cell.TextFrame.Text : Sets the text for each cell in the table.
- StylePreset : Applies a predefined style to the table for enhanced aesthetics.
3.3 Apply a Background Image or Color
To set a custom background for your slide, use the following code:
# Get the first slide
first_slide = presentation.Slides[0]
# Get the background of the first slide
background = first_slide.SlideBackground
# Create a stream from the specified image file
stream = Stream("C:\\Users\\Administrator\\Desktop\\background.jpg")
imageData = presentation.Images.AppendStream(stream)
# Set the image as the background
background.Type = BackgroundType.Custom
background.Fill.FillType = FillFormatType.Picture
background.Fill.PictureFill.FillType = PictureFillType.Stretch
background.Fill.PictureFill.Picture.EmbedImage = imageData
In this code:
- SlideBackground : Accesses the background properties of the slide.
- Fill.FillType : Specifies the type of fill (in this case, an image).
- PictureFill.FillType : Sets how the background image is displayed (stretched, in this case).
- Picture.EmbedImage : Sets image data for the background.
For additional background options, refer to this tutorial: Set Background Color or Picture for PowerPoint Slides in Python
Output:
Below is a screenshot of the PowerPoint document generated by the code snippets provided above.

4. Creating PowerPoint Documents Based on a Template
Using templates can simplify the process of creating presentations by allowing you to replace placeholders with actual data. Below is an example of how to create a PowerPoint document based on a template:
from spire.presentation.common import *
from spire.presentation import *
# Create a Presentation object
presentation = Presentation()
# Load a PowerPoint document from a specified file path
presentation.LoadFromFile("C:\\Users\\Administrator\\Desktop\\template.pptx")
# Get a specific slide from the presentation
slide = presentation.Slides[0]
# Define a list of replacements where each tuple consists of the placeholder and its corresponding replacement text
replacements = [
("{project_name}", "GreenCity Solar Initiative"),
("{budget}", "$1,250,000"),
("{status}", "In Progress (65% Completion)"),
("{start_date}", "March 15, 2023"),
("{end_date}", "November 30, 2024"),
("{manager}", "Emily Carter"),
("{client}", "GreenCity Municipal Government")
]
# Iterate through each replacement pair
for old_string, new_string in replacements:
# Replace the first occurrence of the old string in the slide with the new string
slide.ReplaceFirstText(old_string, new_string, False)
# Save the modified presentation to a new file
presentation.SaveToFile("Template-Based.pptx", FileFormat.Pptx2019)
presentation.Dispose()
In this code:
- LoadFromFile() : Loads an existing PowerPoint file that serves as the template.
- ReplaceFirstText() : Replaces placeholders within the slide with actual values. This is useful for dynamic content generation.
- SaveToFile() : Saves the modified presentation as a new file.
Output:

5. Best Practices for Python PowerPoint Generation
When creating PowerPoint presentations using Python, consider the following best practices:
- Maintain Consistency : Ensure that the formatting (fonts, colors, styles) is consistent across slides for a professional appearance.
- Modular Code: Break document generation into functions (e.g., add_list(), insert_image()) for reusability.
- Optimize Images : Resize and compress images before adding them to presentations to reduce file size and improve loading times.
- Use Templates : Whenever possible, use templates to save time and maintain a cohesive design.
- Test Your Code : Always test your presentation generation code to ensure that all elements are added correctly and appear as expected.
6. Wrap Up
In this tutorial, we explored how to create PowerPoint documents in Python using the Spire.Presentation library. We covered the installation, creation of presentations from scratch, adding various elements, and using templates for dynamic content generation. With these skills, you can automate the creation of presentations, making your workflow more efficient and effective.
7. FAQs
Q1. What is Spire.Presentation?
Spire.Presentation is a powerful library used for creating, reading, and modifying PowerPoint files in various programming languages, including Python.
Q2. Does this library require Microsoft Office to be installed?
No, Spire.Presentation operates independently and does not require Microsoft Office.
Q3. Can I customize the layout of slides in my presentation?
Yes, you can customize the layout of each slide by adjusting properties such as size, background, and the placement of shapes, text, and images.
Q4. Does Spire.Presentation support both PPT and PPTX format?
Yes, Spire.Presentation supports both PPT and PPTX formats, allowing you to create and manipulate presentations in either format.
Q5. Can I add charts to my presentations?
Yes, Spire.Presentation supports the addition of charts to your slides, allowing you to visualize data effectively. For detailed instruction, refer to: How to Create Column Charts in PowerPoint Using Python
Get a Free License
To fully experience the capabilities of Spire.Presentation for Python without any evaluation limitations, you can request a free 30-day trial license.
In daily work, extracting text from PDF files is a common task. For standard digital documents—such as those exported from Word to PDF—this process is usually straightforward. However, things get tricky when dealing with scanned PDFs, which are essentially images of printed documents. In such cases, traditional text extraction methods fail, and OCR (Optical Character Recognition) becomes necessary to recognize and convert the text within images into editable content.
In this article, we’ll walk through how to perform PDF OCR using Python to automate this workflow and significantly reduce manual effort.
- Why OCR is Needed for PDF Text Extraction
- Best Python OCR Libraries for PDF Processing
- Convert PDF Pages to Images Using Python
- Scan and Extract Text from Images Using Spire.OCR
- Conclusion
Why OCR is Needed for PDF Text Extraction
When it comes to extracting text from PDF files, one important factor that determines your approach is the type of PDF. Generally, PDFs fall into two categories: scanned (image-based) PDFs and searchable PDFs. Each requires a different strategy for text extraction.
-
Scanned PDFs are typically created by digitizing physical documents such as books, invoices, contracts, or magazines. While the text appears readable to the human eye, it's actually embedded as an image—making it inaccessible to traditional text extraction tools. Older digital files or password-protected PDFs may also lack an actual text layer.
-
Searchable PDFs, on the other hand, contain a hidden text layer that allows computers to search, copy, or parse the content. These files are usually generated directly from applications like Microsoft Word or PDF editors and are much easier to process programmatically.
This distinction highlights the importance of OCR (Optical Character Recognition) when working with scanned PDFs. With tools like Python PDF OCR, we can convert these image-based PDFs into images, run OCR to recognize the text, and extract it for further use—all in an automated way.
Best Python OCR Libraries for PDF Processing
Before diving into the implementation, let’s take a quick look at the tools we’ll be using in this tutorial. To simplify the process, we’ll use Spire.PDF for Python and Spire.OCR for Python to perform PDF OCR in Python.
- Spire.PDF will handle the conversion from PDF to images.
- Spire.OCR, a powerful OCR tool for PDF files, will recognize the text in those images and extract it as editable content.
You can install Spire.PDF using the following pip command:
pip install spire.pdf
and install Spire.OCR with:
pip install spire.ocr
Alternatively, you can download and install them manually by visiting the official Spire.PDF and Spire.OCR pages.
Convert PDF Pages to Images Using Python
Before we dive into Python PDF OCR, it's crucial to understand a foundational step: OCR technology doesn't directly process PDF files. Especially with image-based PDFs (like those created from scanned documents), we first need to convert them into individual image files.
Converting PDFs to images using the Spire.PDF library is straightforward. You simply load your target PDF document and then iterate through each page. For every page, call the PdfDocument.SaveAsImage() method to save it as a separate image file. Once this step is complete, your images are ready for the subsequent OCR process.
Here's a code example showing how to convert PDF to PNG:
from spire.pdf import *
# Load the PDF file
pdf = PdfDocument()
pdf.LoadFromFile("/AI-Generated Art.pdf")
# Loop through pages and save as images
for i in range(pdf.Pages.Count):
# Convert each page to image
with pdf.SaveAsImage(i) as image:
# Save in different formats as needed
image.Save(f"/output/pdftoimage/ToImage_{i}.png")
# image.Save(f"Output/ToImage_{i}.jpg")
# image.Save(f"Output/ToImage_{i}.bmp")
# Close the PDF document
pdf.Close()
Conversion result preview: 
Scan and Extract Text from Images Using Spire.OCR
After converting the scanned PDF into images, we can now move on to OCR PDF with Python and to extract text from the PDF. With OcrScanner.Scan() from Spire.OCR, recognizing text in images becomes straightforward. It supports multiple languages such as English, Chinese, French, and German. Once the text is extracted, you can easily save it to a .txt file or generate a Word document.
The code example below shows how to OCR the first PDF page and export to text in Python:
from spire.ocr import *
# Create OCR scanner instance
scanner = OcrScanner()
# Configure OCR model path and language
configureOptions = ConfigureOptions()
configureOptions.ModelPath = r'E:/DownloadsNew/win-x64/'
configureOptions.Language = 'English'
scanner.ConfigureDependencies(configureOptions)
# Perform OCR on the image
scanner.Scan(r'/output/pdftoimage/ToImage_0.png')
# Save extracted text to file
text = scanner.Text.ToString()
with open('/output/scannedpdfoutput.txt', 'a', encoding='utf-8') as file:
file.write(text + '\n')
Result preview: 
The Conclusion
In this article, we covered how to perform PDF OCR with Python—from converting PDFs to images, to recognizing text with OCR, and finally saving the extracted content as a plain text file. With this streamlined approach, extracting text from scanned PDFs becomes effortless. If you're looking to automate your PDF processing workflows, feel free to reach out and request a 30-day free trial. It’s time to simplify your document management.
Convert PDF to Markdown in Python – Single & Batch Conversion
2025-07-17 02:36:46 Written by zaki zou
PDFs are ubiquitous in digital document management, but their rigid formatting often makes them less than ideal for content that needs to be easily edited, updated, or integrated into modern workflows. Markdown (.md), on the other hand, offers a lightweight, human-readable syntax perfect for web publishing, documentation, and version control. In this guide, we'll explore how to leverage the Spire.PDF for Python library to perform single or batch conversions from PDF to Markdown in Python efficiently.
- Why Convert PDFs to Markdown?
- Python PDF Converter Library – Installation
- Convert PDF to Markdown in Python
- Batch Convert Multiple PDFs to Markdown in Python
- Frequently Asked Questions
- Conclusion
Why Convert PDFs to Markdown?
Markdown offers several advantages over PDF for content creation and management:
- Version control friendly: Easily track changes in Git
- Lightweight and readable: Plain text format with simple syntax
- Editability: Simple to modify without specialized software
- Web integration: Natively supported by platforms like GitHub, GitLab, and static site generators (e.g., Jekyll, Hugo).
Spire.PDF for Python provides a robust solution for extracting text and structure from PDFs while preserving essential formatting elements like tables, lists, and basic styling.
Python PDF Converter Library - Installation
To use Spire.PDF for Python in your projects, you need to install the library via PyPI (Python Package Index) using pip. Open your terminal/command prompt and run:
pip install Spire.PDF
To upgrade an existing installation to the latest version:
pip install --upgrade spire.pdf
Convert PDF to Markdown in Python
Here’s a basic example demonstrates how to use Python to convert a PDF file to a Markdown (.md) file.
from spire.pdf.common import *
from spire.pdf import *
# Create an instance of PdfDocument class
pdf = PdfDocument()
# Load a PDF document
pdf.LoadFromFile("TestFile.pdf")
# Convert the PDF to a Markdown file
pdf.SaveToFile("PDFToMarkdown.md", FileFormat.Markdown)
pdf.Close()
This Python script loads a PDF file and then uses the SaveToFile() method to convert it to Markdown format. The FileFormat.Markdown parameter specifies the output format.
How Conversion Works
The library extracts text, images, tables, and basic formatting from the PDF and converts them into Markdown syntax.
- Text: Preserved with paragraphs/line breaks.
- Images: Images in the PDF are converted to base64-encoded PNG format and embedded directly in the Markdown.
- Tables: Tabular data is converted to Markdown table syntax (rows/columns with pipes |).
- Styling: Basic formatting (bold, italic) is retained using Markdown syntax.
Output: 
Batch Convert Multiple PDFs to Markdown in Python
This Python script uses a loop to convert all PDF files in a specified directory to Markdown format.
import os
from spire.pdf import *
# Configure paths
input_folder = "pdf_folder/"
output_folder = "markdown_output/"
# Create output directory
os.makedirs(output_folder, exist_ok=True)
# Process all PDFs in folder
for file_name in os.listdir(input_folder):
if file_name.endswith(".pdf"):
# Initialize document
pdf = PdfDocument()
pdf.LoadFromFile(os.path.join(input_folder, file_name))
# Generate output path
md_name = os.path.splitext(file_name)[0] + ".md"
output_path = os.path.join(output_folder, md_name)
# Convert to Markdown
pdf.SaveToFile(output_path, FileFormat.Markdown)
pdf.Close()
Key Characteristics
- Batch Processing: Automatically processes all PDFs in input folder, improving efficiency for bulk operations.
- 1:1 Conversion: Each PDF generates corresponding Markdown file.
- Sequential Execution: Files processed in alphabetical order.
- Resource Management: Each PDF is closed immediately after conversion.
Output:

Need to convert Markdown to PDF? Refer to: Convert Markdown to PDF in Python
Frequently Asked Questions (FAQs)
Q1: Is Spire.PDF for Python free?
A: Spire.PDF offers a free version with limitations (e.g., maximum 3 pages per conversion). For unlimited use, request a 30-day free trial for evaluation.
Q2: Can I convert password-protected PDFs to Markdown?
A: Yes. Use the LoadFromFile method with the password as a second parameter:
pdf.LoadFromFile("ProtectedFile.pdf", "your_password")
Q3: Can Spire.PDF convert scanned/image-based PDFs to Markdown?
A: No. The library extracts text-based content only. For scanned PDFs, use OCR tools (like Spire.OCR for Python) to create searchable PDFs first.
Conclusion
Spire.PDF for Python simplifies PDF to Markdown conversion for both single file and batch processing.
Its advantages include:
- Simple API with minimal code
- Preservation of document structure
- Batch processing capabilities
- Cross-platform compatibility
Whether you're migrating documentation, processing research papers, or building content pipelines, by following the examples in this guide, you can efficiently transform static PDF documents into flexible, editable Markdown content, streamlining workflows and improving collaboration.
Convert JSON to/from Excel in Python – Full Guide with Examples
2025-07-16 05:39:52 Written by zaki zou
In many Python projects, especially those that involve APIs, data analysis, or business reporting, developers often need to convert Excel to JSON or JSON to Excel using Python code. These formats serve different but complementary roles: JSON is ideal for structured data exchange and storage, while Excel is widely used for sharing, editing, and presenting data in business environments.
This tutorial provides a complete, developer-focused guide to converting between JSON and Excel in Python. You'll learn how to handle nested data, apply Excel formatting, and resolve common conversion or encoding issues. We’ll use Python’s built-in json module to handle JSON data, and Spire.XLS for Python to read and write Excel files in .xlsx, .xls, and .csv formats — all without requiring Microsoft Excel or other third-party software.
Topics covered include:
- Install Spire.XLS for Python
- Why Choose Spire.XLS over General-Purpose Libraries?
- Convert JSON to Excel in Python
- Convert Excel to JSON in Python
- Real-World Example: Handling Nested JSON and Complex Excel Formats
- Common Errors and Fixes
- FAQ
- Conclusion
Install Spire.XLS for Python
This library is used in this tutorial to generate and parse Excel files (.xlsx, .xls, .csv) as part of the JSON–Excel conversion workflow.
To get started, install the Spire.XLS for Python package from PyPI:
pip install spire.xls
You can also choose Free Spire.XLS for Python in smaller projects:
pip install spire.xls.free
Spire.XLS for Python runs on Windows, Linux, and macOS. It does not require Microsoft Excel or any COM components to be installed.
Why Choose Spire.XLS over Open-Source Libraries?
Many open-source Python libraries are great for general Excel tasks like simple data export or basic formatting. If your use case only needs straightforward table output, these tools often get the job done quickly.
However, when your project requires rich Excel formatting, multi-sheet reports, or an independent solution without Microsoft Office, Spire.XLS for Python stands out by offering a complete Excel feature set.
| Capability | Open-Source Libraries | Spire.XLS for Python |
|---|---|---|
| Advanced Excel formatting | Basic styling | Full styling API for reports |
| No Office/COM dependency | Fully standalone | Fully standalone |
| Supports .xls, .xlsx, .csv | .xlsx and .csv mostly; .xls may need extra packages | Full support for .xls, .xlsx, .csv |
| Charts, images, shapes | Limited or none | Built-in full support |
For developer teams that need polished Excel files — with complex layouts, visuals, or business-ready styling — Spire.XLS is an efficient, all-in-one alternative.
Convert JSON to Excel in Python
In this section, we’ll walk through how to convert structured JSON data into an Excel file using Python. This is especially useful when exporting API responses or internal data into .xlsx reports for business users or analysts.
Step 1: Prepare JSON Data
We’ll start with a JSON list of employee records:
[
{"employee_id": "E001", "name": "Jane Doe", "department": "HR"},
{"employee_id": "E002", "name": "Michael Smith", "department": "IT"},
{"employee_id": "E003", "name": "Sara Lin", "department": "Finance"}
]
This is a typical structure returned by APIs or stored in log files. For more complex nested structures, see the real-world example section.
Step 2: Convert JSON to Excel in Python with Spire.XLS
from spire.xls import Workbook, FileFormat
import json
# Load JSON data from file
with open("employees.json", "r", encoding="utf-8") as f:
data = json.load(f)
# Create a new Excel workbook and access the first worksheet
workbook = Workbook()
sheet = workbook.Worksheets[0]
# Write headers to the first row
headers = list(data[0].keys())
for col, header in enumerate(headers):
sheet.Range[1, col + 1].Text = header
# Write data rows starting from the second row
for row_idx, row in enumerate(data, start=2):
for col_idx, key in enumerate(headers):
sheet.Range[row_idx, col_idx + 1].Text = str(row.get(key, ""))
# Auto-fit the width of all used columns
for i in range(1, sheet.Range.ColumnCount + 1):
sheet.AutoFitColumn(i)
# Save the Excel file and release resources
workbook.SaveToFile("output/employees.xlsx", FileFormat.Version2016)
workbook.Dispose()
Code Explanation:
- Workbook() initializes the Excel file with three default worksheets.
- workbook.Worksheets[] accesses the specified worksheet.
- sheet.Range(row, col).Text writes string data to a specific cell (1-indexed).
- The first row contains column headers based on JSON keys, and each JSON object is written to a new row beneath it.
- workbook.SaveToFile() saves the Excel workbook to disk. You can specify the format using the FileFormat enum — for example, Version97to2003 saves as .xls, Version2007 and newer save as .xlsx, and CSV saves as .csv.
The generated Excel file (employees.xlsx) with columns employee_id, name, and department.

You can also convert the Excel worksheet to a CSV file using Spire.XLS for Python if you need a plain text output format.
Convert Excel to JSON in Python
This part explains how to convert Excel data back into structured JSON using Python. This is a common need when importing .xlsx files into web apps, APIs, or data pipelines that expect JSON input.
Step 1: Load the Excel File
First, we use Workbook.LoadFromFile() to load the Excel file, and then select the worksheet using workbook.Worksheets[0]. This gives us access to the data we want to convert into JSON format.
from spire.xls import Workbook
# Load the Excel file
workbook = Workbook()
workbook.LoadFromFile("products.xlsx")
sheet = workbook.Worksheets[0]
Step 2: Extract Excel Data and Write to JSON
import json
# Get the index of the last row and column
rows = sheet.LastRow
cols = sheet.LastColumn
# Extract headers from the first row
headers = [sheet.Range[1, i + 1].Text for i in range(cols)]
data = []
# Map each row to a dictionary using headers
for r in range(2, rows + 1):
row_data = {}
for c in range(cols):
value = sheet.Range[r, c + 1].Text
row_data[headers[c]] = value
data.append(row_data)
# Write JSON output
with open("products_out.json", "w", encoding="utf-8") as f:
json.dump(data, f, indent=2, ensure_ascii=False)
Code Explanation:
- sheet.LastRow and sheet.LastColumn detect actual used cell range.
- The first row is used to extract field names (headers).
- Each row is mapped to a dictionary, forming a list of JSON objects.
- sheet.Range[row, col].Text returns the cell’s displayed text, including any formatting (like date formats or currency symbols). If you need the raw numeric value or a real date object, you can use .Value, .NumberValue, or .DateTimeValue instead.
The JSON file generated from the Excel data using Python:

If you’re not yet familiar with how to read Excel files in Python, see our full guide: How to Read Excel Files in Python Using Spire.XLS.
Real-World Example: Handling Nested JSON and Formatting Excel
In real-world Python applications, JSON data often contains nested dictionaries or lists, such as contact details, configuration groups, or progress logs. At the same time, the Excel output is expected to follow a clean, readable layout suitable for business or reporting purposes.
In this section, we'll demonstrate how to flatten nested JSON data and format the resulting Excel sheet using Python and Spire.XLS. This includes merging cells, applying styles, and auto-fitting columns — all features that help present complex data in a clear tabular form.
Let’s walk through the process using a sample file: projects_nested.json.
Step 1: Flatten Nested JSON
Sample JSON file (projects_nested.json):
[
{
"project_id": "PRJ001",
"title": "AI Research",
"manager": {
"name": "Dr. Emily Wang",
"email": "emily@lab.org"
},
"phases": [
{"phase": "Design", "status": "Completed"},
{"phase": "Development", "status": "In Progress"}
]
},
{
"project_id": "PRJ002",
"title": "Cloud Migration",
"manager": {
"name": "Mr. John Lee",
"email": "john@infra.net"
},
"phases": [
{"phase": "Assessment", "status": "Completed"}
]
}
]
We'll flatten all nested structures, including objects like manager, and summarize lists like phases into string fields. Each JSON record becomes a single flat row, with even complex nested data compactly represented in columns using readable summaries.
import json
# Helper: Flatten nested data and summarize list of dicts into strings
# e.g., [{"a":1},{"a":2}] → "a: 1; a: 2"
def flatten(data, parent_key='', sep='.'):
items = {}
for k, v in data.items():
new_key = f"{parent_key}{sep}{k}" if parent_key else k
if isinstance(v, dict):
items.update(flatten(v, new_key, sep=sep))
elif isinstance(v, list):
if all(isinstance(i, dict) for i in v):
items[new_key] = "; ".join(
", ".join(f"{ik}: {iv}" for ik, iv in i.items()) for i in v
)
else:
items[new_key] = ", ".join(map(str, v))
else:
items[new_key] = v
return items
# Load and flatten JSON
with open("projects_nested.json", "r", encoding="utf-8") as f:
raw_data = json.load(f)
flat_data = [flatten(record) for record in raw_data]
# Collect all unique keys from flattened data as headers
all_keys = set()
for item in flat_data:
all_keys.update(item.keys())
headers = list(sorted(all_keys)) # Consistent, sorted column order
This version of flatten() converts lists of dictionaries into concise summary strings (e.g., "phase: Design, status: Completed; phase: Development, status: In Progress"), making complex structures more compact for Excel output.
Step 2: Format and Export Excel with Spire.XLS
Now we’ll export the flattened project data to Excel, and use formatting features in Spire.XLS for Python to improve the layout and readability. This includes setting fonts, colors, merging cells, and automatically adjusting column widths for a professional report look.
from spire.xls import Workbook, Color, FileFormat
# Create workbook and worksheet
workbook = Workbook()
sheet = workbook.Worksheets[0]
sheet.Name = "Projects"
# Title row: merge and style
col_count = len(headers)
sheet.Range[1, 1, 1, col_count].Merge()
sheet.Range[1, 1].Text = "Project Report (Flattened JSON)"
title_style = sheet.Range["A1"].Style
title_style.Font.IsBold = True
title_style.Font.Size = 14
title_style.Font.Color = Color.get_White()
title_style.Color = Color.get_DarkBlue()
# Header row from flattened keys
for col, header in enumerate(headers):
cell = sheet.Range[2, col + 1]
cell.BorderAround() # Add outside borders to a cell or cell range
#cell.BorderInside() # Add inside borders to a cell range
cell.Text = header
style = cell.Style
style.Font.IsBold = True
style.Color = Color.get_LightGray()
# Data rows
for row_idx, row in enumerate(flat_data, start=3):
for col_idx, key in enumerate(headers):
sheet.Range[row_idx, col_idx + 1].Text = str(row.get(key, ""))
# Auto-fit columns and rows
for col in range(len(headers)):
sheet.AutoFitColumn(col + 1)
for row in range(len(flat_data)):
sheet.AutoFitRow(row + 1)
# Save Excel file
workbook.SaveToFile("output/projects_formatted.xlsx", FileFormat.Version2016)
workbook.Dispose()
This produces a clean, styled Excel sheet from a nested JSON file, making your output suitable for reports, stakeholders, or dashboards.
Code Explanation
- sheet.Range[].Merge(): merges a range of cells into one. Here we use it for the report title row (A1:F1).
- .Style.Font / .Style.Color: allow customizing font properties (bold, size, color) and background fill of a cell.
- .BorderAround() / .BorderInside(): add outside/inside borders to a cell range.
- AutoFitColumn(n): automatically adjusts the width of column
nto fit its content.
The Excel file generated after flattening the JSON data using Python:

Common Errors and Fixes in JSON ↔ Excel Conversion
Converting between JSON and Excel may sometimes raise formatting, encoding, or data structure issues. Here are some common problems and how to fix them:
| Error | Fix |
|---|---|
| JSONDecodeError or malformed input | Ensure valid syntax; avoid using eval(); use json.load() and flatten nested objects. |
| TypeError: Object of type ... is not JSON serializable | Use json.dump(data, f, default=str) to convert non-serializable values. |
| Excel file not loading or crashing | Ensure the file is not open in Excel; use the correct extension (.xlsx or .xls). |
| UnicodeEncodeError or corrupted characters | Set encoding="utf-8" and ensure_ascii=False in json.dump(). |
Conclusion
With Spire.XLS for Python, converting between JSON and Excel becomes a streamlined and reliable process. You can easily transform JSON data into well-formatted Excel files, complete with headers and styles, and just as smoothly convert Excel sheets back into structured JSON. The library helps you avoid common issues such as encoding errors, nested data complications, and Excel file format pitfalls.
Whether you're handling data exports, generating reports, or processing API responses, Spire.XLS provides a consistent and efficient way to work with .json and .xlsx formats in both directions.
Want to unlock all features without limitations? Request a free temporary license for full access to Spire.XLS for Python.
FAQ
Q1: How to convert JSON into Excel using Python?
You can use the json module in Python to load structured JSON data, and then use a library like Spire.XLS to export it to .xlsx. Spire.XLS allows writing headers, formatting Excel cells, and handling nested JSON via flattening. See the JSON to Excel section above for step-by-step examples.
Q2: How do I parse JSON data in Python?
Parsing JSON in Python is straightforward with the built-in json module. Use json.load() to parse JSON from a file, or json.loads() to parse a JSON string. After parsing, the result is usually a list of dictionaries, which can be iterated and exported to Excel or other formats.
Q3: Can I export Excel to JSON with Spire.XLS in Python?
Yes. Spire.XLS for Python lets you read Excel files and convert worksheet data into a list of dictionaries, which can be written to JSON using json.dump(). The process includes extracting headers, detecting used rows and columns, and optionally handling formatting. See Excel to JSON for detailed implementation.
How to Generate QR Codes in Python (Full Tutorial with Examples)
2025-07-15 02:49:34 Written by zaki zou
QR codes have transformed how we bridge physical and digital experiences—from marketing campaigns to secure data sharing. For developers looking to generate QR codes in Python , Spire.Barcode for Python provides a complete toolkit for seamless QR code generation, offering both simplicity for basic needs and advanced customization for professional applications.
This step-by-step guide walks you through the entire QR code generation process in Python. You'll learn to programmatically create scannable codes, customize their appearance, embed logos, and optimize error correction - everything needed to implement robust QR code generation solutions for any business or technical requirement.
Table of Contents
- Introduction to Spire.Barcode for Python
- Setting Up the Environment
- Basic Example: Generating QR Codes in Python
- Customizing QR Code Appearance
- Generating QR Code with Logo
- Wrapping up
- FAQs
1. Introduction to Spire.Barcode for Python
Spire.Barcode for Python is a powerful library that enables developers to generate and read various barcode types, including QR codes, in Python applications. This robust solution supports multiple barcode symbologies while offering extensive customization options for appearance and functionality.
Key features of Spire.Barcode include:
- Support for QR Code generation with customizable error correction levels
- Flexible data encoding options (numeric, alphanumber, byte/binary)
- Comprehensive appearance customization (colors, sizes, fonts)
- High-resolution output capabilities
- Logo integration within QR codes
2. Setting Up the Environment
Before we dive into generating QR codes, you need to set up your Python environment. Ensure you have Python installed, and then install the Spire.Barcode library using pip:
pip install spire.barcode
For the best results, obtain a free temporary license from our website. This will allow you to create professional QR code images without evaluation messages, enhancing both user experience and quality of the generated codes.
3. Basic Example: Generating QR Codes in Python
Now that we have everything set up, let's generate our first QR code. Below is the step-by-step process:
-
Initial Setup :
- Import the Spire.Barcode library.
- Activate the library with a valid license key to remove the
-
Configure Barcode Settings :
- Create a BarcodeSettings object to control QR code properties.
- Set barcode type to QR code.
- Configure settings such as data mode and error correction level.
- Define the content to encode.
- Configure visual aspects like module width and text display options.
-
Generate Barcode Image :
- Create a BarCodeGenerator object with the configured settings.
- Convert the configured QR code into an image object in memory.
-
Save Image to File :
- Write the generated QR code image to a specified file path in PNG format.
The following code snippet demonstrates how to generate QR codes in Python:
from spire.barcode import *
# Function to write all bytes to a file
def WriteAllBytes(fname: str, data):
with open(fname, "wb") as fp:
fp.write(data)
fp.close()
# Apply license key for the barcode generation library
License.SetLicenseKey("your license key")
# Create a BarcodeSettings object to configure barcode properties
barcodeSettings = BarcodeSettings()
# Set the type of barcode to QR code
barcodeSettings.Type = BarCodeType.QRCode
# Set the data mode for QR code (automatic detection of data type)
barcodeSettings.QRCodeDataMode = QRCodeDataMode.Auto
# Set the error correction level (M means medium level of error correction)
barcodeSettings.QRCodeECL = QRCodeECL.M
# Set the data that will be encoded in the QR code
barcodeSettings.Data2D = "visit us at www.e-iceblue.com"
# Set the width of each module (the square bars) in the barcode
barcodeSettings.X = 3
# Hide the text that typically accompanies the barcode
barcodeSettings.ShowText = False
# Create a BarCodeGenerator object with the specified settings
barCodeGenerator = BarCodeGenerator(barcodeSettings)
# Generate the image for the barcode based on the settings
image = barCodeGenerator.GenerateImage()
# Write the generated PNG image to disk at the specified path
WriteAllBytes("output/QRCode.png", image)
Key Concepts:
A. QRCodeDataMode (Data Encoding Scheme)
Controls how the input data is encoded in the QR code:
| Mode | Best For | Example Use Cases |
|---|---|---|
| Auto | Let library detect automatically | General purpose (default choice) |
| Numeric | Numbers only (0-9) | Product codes, phone numbers |
| AlphaNumber | A-Z, 0-9, and some symbols | URLs, simple messages |
| Byte | Binary/Unicode data | Complex text, special characters |
Why it matters:
- Different modes have different storage capacities.
- Numeric mode can store more digits than other modes.
- Auto mode is safest for mixed content.
B. QRCodeECL (Error Correction Level)
Determines how much redundancy is built into the QR code:
| Level | Recovery Capacity | Use Case |
|---|---|---|
| L (Low) | 7% damage recovery | Small codes, short URLs |
| M (Medium) | 15% damage recovery | General purpose (recommended) |
| Q (Quartile) | 25% damage recovery | Codes with logos or decorations |
| H (High) | 30% damage recovery | Critical applications |
Visual Impact:
Higher ECLs:
- Increase QR code complexity (more modules/squares).
- Make the code more scannable when damaged.
- Are essential when adding logos (use at least Q or H).
Output:

4. Customizing QR Code Appearance
Once you've generated a basic QR code, you can further customize its appearance to make it more visually appealing or to fit your brand. Here are some customization options:
4.1 Adjusting DPI Settings
The DPI (dots per inch) settings control the image quality of the QR code. Higher DPI values result in sharper images suitable for printing, while lower values (72-150) are typically sufficient for digital use.
barcodeSettings.DpiX = 150
barcodeSettings.DpiY = 150
4.2 Changing Foreground and Background Colors
You can customize your QR code’s color scheme. The ForeColor determines the color of the QR code modules (squares), while BackColor sets the background color. Ensure sufficient contrast for reliable scanning.
barcodeSettings.BackColor = Color.get_GhostWhite()
barcodeSettings.ForeColor = Color.get_Purple()
4.3 Displaying the Encoded Data
If you want users to see the encoded information without scanning, set the following properties:
barcodeSettings.ShowTextOnBottom = True
barcodeSettings.TextColor = Color.get_Purple()
barcodeSettings.SetTextFont("Arial", 13, FontStyle.Bold)
4.4 Adding Text Under QR Code
You can also add custom text under the QR code, which could be a call-to-action or instructions.
barcodeSettings.ShowBottomText = True
barcodeSettings.BottomText = "Scan Me"
barcodeSettings.SetBottomTextFont("Arial", 13, FontStyle.Bold)
barcodeSettings.BottomTextColor = Color.get_Black()
4.5 Setting Margins and Border
Defining margins and border styles enhances the presentation of the QR code. Here’s how to do it:
barcodeSettings.LeftMargin = 2
barcodeSettings.RightMargin = 2
barcodeSettings.TopMargin = 2
barcodeSettings.BottomMargin = 2
barcodeSettings.HasBorder = True
barcodeSettings.BorderWidth = 0.5
barcodeSettings.BorderColor = Color.get_Black()
5. Generating QR Code with Logo
For branding purposes, you might want to add a logo to your QR code. Spire.Barcode makes this process seamless while maintaining scannability. Here’s how:
barcodeSettings.SetQRCodeLogoImage("C:\\Users\\Administrator\\Desktop\\logo.png")
When adding a logo:
- Use a simple, high-contrast logo for best results.
- Test the scannability after adding the logo.
- The QR code's error correction (set earlier) helps compensate for the obscured area.
The logo will be centered within the QR code, and Spire.Barcode will automatically resize it to ensure the QR code remains scannable.
Output:

6. Wrapping up
Generating QR codes in Python using Spire.Barcode is a straightforward process that offers extensive customization options. From basic QR codes to branded versions with logos and custom styling, the library provides all the tools needed for professional barcode generation.
Key Benefits:
- Spire.Barcode simplifies QR code generation with a clean API.
- Extensive customization options allow for branded, visually appealing QR codes.
- Error correction ensures reliability even with added logos.
- High-resolution output supports both digital and print use cases.
Whether you're building an inventory system, creating marketing materials, or developing a mobile app integration, Spire.Barcode provides a robust solution for all your QR code generation needs in Python.
7. FAQs
Q1: What is a QR code?
A QR code (Quick Response code) is a type of matrix barcode that can store URLs and other information. It is widely used for quickly linking users to websites, apps, and digital content through mobile devices.
Q2: Can I generate QR codes without a license key?
Yes, you can generate QR codes without a license key; however, the generated barcode will display an evaluation message along with our company logo, which may detract from the user experience.
Q3: Can I generate different types of barcodes with Spire.Barcode?
Yes, Spire.Barcode supports various barcode types, not just QR codes. Detailed documentation can be found here: How to Generate Barcode in Python
Q4: How can I implement a QR code generator in Python using Spire.Barcode?
To implement a QR code generator in Python with Spire.Barcode, create a BarcodeSettings object to configure the QR code properties. Then, use the BarCodeGenerator class to generate the QR code image by calling the GenerateImage() method.
Q5: Can I scan or read QR code using Spire.Barcode?
Yes, you can scan and read QR codes using Spire.Barcode for Python. The library provides functionality for both generating and decoding QR codes. Follow this guide: How to Read Barcode Using Python
Get a Free License
To fully experience the capabilities of Spire.Barcode for Python without any evaluation limitations, you can request a free 30-day trial license.
How to Add Text to PDF in Python (Create & Edit with Examples)
2025-07-10 08:51:40 Written by zaki zou
Adding text to a PDF is a common task in Python — whether you're generating reports, adding annotations, filling templates, or labeling documents. This guide will walk you through how to write text in a PDF file using Python, including both creating new PDF documents and updating existing ones.
We’ll be using a dedicated Python PDF library - Spire.PDF for Python, which allows precise control over text placement, font styling, and batch processing. The examples are concise, beginner-friendly, and ready for real-world projects.
Sections Covered
- Setup: Install the PDF Library in Python
- Add Text to a New PDF
- Add Text to an Existing PDF
- Control Text Style, Position, Transparency, and Rotation
- Common Pitfalls and Cross-Platform Tips
- Conclusion
- FAQ
Setup: Install the PDF Library in Python
To get started, install Spire.PDF for Python, a flexible and cross-platform PDF library.
pip install Spire.PDF
Or use Free Spire.PDF for Python:
pip install spire.pdf.free
Why use this library?
- Works without Adobe Acrobat or Microsoft Office
- Add and format text at exact positions
- Supports both new and existing PDF editing
- Runs on Windows, macOS, and Linux
Add Text to a New PDF Using Python
If you want to create a PDF from text using Python, the example below shows how to insert a line of text into a blank PDF page using custom font and position settings.
Example: Create and write text to a blank PDF
from spire.pdf import PdfDocument, PdfTrueTypeFont, PdfFontStyle, PdfSolidBrush, PdfRGBColor, PointF, RectangleF, \
PdfStringFormat, PdfTextAlignment, PdfVerticalAlignment
# Create a new PDF document and add a new page
pdf = PdfDocument()
page = pdf.Pages.Add()
text = ("This report summarizes the sales performance of various products in the first quarter of 2025. " +
"Below is a breakdown of the total sales by product category, " +
"followed by a comparison of sales in different regions.")
# Set the font, brush, and point
font = PdfTrueTypeFont("Arial", 14.0, PdfFontStyle.Regular, True)
brush = PdfSolidBrush(PdfRGBColor(0, 0, 0)) # black
point = PointF(50.0, 100.0)
# Set the layout area and string format
layoutArea = RectangleF(50.0, 50.0, page.GetClientSize().Width - 100.0, page.GetClientSize().Height)
stringFormat = PdfStringFormat(PdfTextAlignment.Left, PdfVerticalAlignment.Top)
page.Canvas.DrawString(text, font, brush, layoutArea, stringFormat, False)
pdf.SaveToFile("output/new.pdf")
pdf.Close()
Technical Notes
- PdfTrueTypeFont() loads a TrueType font from the system with customizable size and style (e.g., regular, bold). It ensures consistent text rendering in the PDF.
- PdfSolidBrush() defines the fill color for text or shapes using RGB values. In this example, it's set to black ((0, 0, 0)).
- RectangleF(x, y, width, height) specifies a rectangular layout area for drawing text. It enables automatic line wrapping and precise control of text boundaries.
- PdfStringFormat() controls the alignment of the text inside the rectangle. Here, text is aligned to the top-left (Left and Top).
- DrawString() draws the specified text within the defined layout area without affecting existing content on the page.
Example output PDF showing wrapped black text starting at coordinates (50, 50).

Tip: To display multiple paragraphs or line breaks, consider adjusting the Y-coordinate dynamically or using multiple DrawString() calls with updated positions.
If you want to learn how to convert TXT files to PDF directly using Python, please check: How to Convert Text Files to PDF Using Python.
Add Text to an Existing PDF in Python
Need to add text to an existing PDF using Python? This method lets you load a PDF, access a page, and write new text anywhere on the canvas.
This is helpful for:
- Adding comments or annotations
- Labeling document versions
- Filling pre-designed templates
Example: Open an existing PDF and insert text
from spire.pdf import PdfDocument, PdfFontStyle, PdfSolidBrush, PdfRGBColor, PointF, PdfFont, PdfFontFamily
pdf = PdfDocument()
pdf.LoadFromFile("input.pdf")
page = pdf.Pages.get_Item(0)
font = PdfFont(PdfFontFamily.TimesRoman, 12.0, PdfFontStyle.Bold)
brush = PdfSolidBrush(PdfRGBColor(255, 0, 0)) # red
location = PointF(150.0, 110.0)
page.Canvas.DrawString("This document is approved.", font, brush, location)
pdf.SaveToFile("output/modified.pdf")
pdf.Close()
Technical Notes
- LoadFromFile() loads an existing PDF into memory.
- You can access specific pages via pdf.Pages[index].
- New content is drawn on top of the existing layout, non-destructively.
- The text position is again controlled via PointF(x, y).
Modified PDF page with newly added red text annotation on the first page.

Use different x, y coordinates to insert content at custom positions.
Related article: Replace Text in PDF with Python
Control Text Style, Positioning, Transparency, and Rotation
When adding text to a PDF, you often need more than just plain content—you may want to customize the font, color, placement, rotation, and transparency, especially for annotations or watermarks.
Spire.PDF for Python offers fine-grained control for these visual elements, whether you’re building structured reports or stamping dynamic text overlays.
Set Font Style and Color
# Create PdfTrueTypeFont
font = PdfTrueTypeFont("Calibri", 16.0, PdfFontStyle.Italic, True)
# Create PdfFont
font = PdfFont(PdfFontFamily.TimesRoman, 16.0, PdfFontStyle.Italic)
# Create PdfBrush to specify text drawing color
brush = PdfSolidBrush(PdfRGBColor(34, 139, 34)) # forest green
PdfTrueTypeFont will embed the font into the PDF file. To reduce file size, you may use PdfFont, which uses system fonts without embedding them.
Apply Transparency and Rotation
You can adjust transparency and rotation when drawing text to achieve effects like watermarks or angled labels.
# Save the current canvas state
state = page.Canvas.Save()
# Set semi-transparency (0.0 = fully transparent, 1.0 = fully opaque)
page.Canvas.SetTransparency(0.4)
# Move the origin to the center of the page
page.Canvas.TranslateTransform(page.Size.Width / 2, page.Size.Height / 2)
# Rotate the canvas -45 degrees (counterclockwise)
page.Canvas.RotateTransform(-45)
# Draw text at new origin
page.Canvas.DrawString("DRAFT", font, brush, PointF(-50, -20))
Example: Add a Diagonal Watermark to the Center of the Page
The following example demonstrates how to draw a centered, rotated, semi-transparent watermark using all the style controls above:
from spire.pdf import PdfDocument, PdfTrueTypeFont, PdfFontStyle, PdfSolidBrush, PdfRGBColor, PointF
from spire.pdf.common import Color
pdf = PdfDocument()
pdf.LoadFromFile("input1.pdf")
page = pdf.Pages.get_Item(0)
text = "Confidential"
font = PdfTrueTypeFont("Arial", 40.0, PdfFontStyle.Bold, True)
brush = PdfSolidBrush(PdfRGBColor(Color.get_DarkBlue())) # gray
# Measure text size to calculate center
size = font.MeasureString(text)
x = (page.Canvas.ClientSize.Width - size.Width) / 2
y = (page.Canvas.ClientSize.Height - size.Height) / 2
state = page.Canvas.Save()
page.Canvas.SetTransparency(0.3)
page.Canvas.TranslateTransform(x + size.Width / 2, y + size.Height / 2)
page.Canvas.RotateTransform(-45.0)
page.Canvas.DrawString(text, font, brush, PointF(-size.Width / 2, -size.Height / 2))
page.Canvas.Restore(state)
pdf.SaveToFile("output/with_watermark.pdf")
pdf.Close()
PDF page displaying a centered, rotated, semi-transparent watermark text.

This approach works well for dynamic watermarking, diagonal stamps like "VOID", "COPY", or "ARCHIVED", and supports full automation.
Make sure all files are closed and not in use to avoid PermissionError.
For more details on inserting watermarks into PDF with Python, please refer to: How to Insert Text Watermarks into PDFs Using Python.
Common Pitfalls and Cross-Platform Considerations
Even with the right API, issues can arise when deploying PDF text operations across different environments or font configurations. Here are some common problems and how to resolve them:
| Issue | Cause | Recommended Fix |
|---|---|---|
| Text appears in wrong position | Hardcoded coordinates not accounting for page size | Use ClientSize and MeasureString() for dynamic layout |
| Font not rendered | Font lacks glyphs or isn't supported | Use PdfTrueTypeFont to embed supported fonts like Arial Unicode |
| Unicode text not displayed | Font does not support full Unicode range | Use universal fonts (e.g., Arial Unicode, Noto Sans) |
| Text overlaps existing content | Positioning too close to body text | Adjust Y-offsets or add padding with MeasureString() |
| Watermark text appears on output | You are using the paid version without a license | Use the free version or apply for a temporary license |
| Font file too large | Embedded font increases PDF size | Use PdfFont for system fonts (non-embedded), if portability is not a concern |
| Inconsistent results on macOS/Linux | Fonts not available or different metrics | Ship fonts with your application, or use built-in cross-platform fonts |
Conclusion
With Spire.PDF for Python, adding text to PDFs—whether creating new files, updating existing ones, or automating batch edits—can be done easily and precisely. From annotations to watermarks, the library gives you full control over layout and styling.
You can start with the free version right away, or apply for a temporary license to unlock full features.
FAQ
How to add text to a PDF using Python?
Use a PDF library such as Spire.PDF to insert text via the DrawString() method. You can define font, position, and styling.
Can I write text into an existing PDF file with Python?
Yes. Load the file with LoadFromFile(), then use DrawString() to add text at a specific location.
How do I generate a PDF from text using Python?
Create a new document and use drawing methods to write content line by line with precise positioning.
Can I add the same text to many PDFs automatically?
Yes. Use a loop to process multiple files and insert text programmatically using a template script.

Automating the creation of Word documents is a powerful way to generate reports, and produce professional-looking files. With Python, you can utilize various libraries for this purpose, and one excellent option is Spire.Doc for Python, specifically designed for handling Word documents.
This guide will provide a clear, step-by-step process for creating Word documents in Python using Spire.Doc. We’ll cover everything from setting up the library to adding formatted text, images, tables, and more. Whether you're generating reports, invoices, or any other type of document, thes techniques will equip you with the essential tools to enhance your workflow effectively.
Table of Contents:
- What's Sprie.Doc for Python?
- Set Up Spire.Doc in Your Python Project
- Step 1: Create a Blank Word Document
- Step 2: Add Formatted Text (Headings, Paragraphs)
- Step 3: Insert Images to a Word Document
- Step 4: Create and Format Tables
- Step 5: Add Numbered or Bulleted Lists
- Best Practices for Word Document Creation in Python
- FAQs
- Conclusion
What's Spire.Doc for Python?
Spire.Doc is a powerful library for creating, manipulating, and converting Word documents in Python. It enables developers to generate professional-quality documents programmatically without needing Microsoft Word. Here are some key features:
- Supports Multiple Formats : Works with DOCX, DOC, RTF, and HTML.
- Extensive Functionalities : Add text, images, tables, and charts.
- Styling and Formatting : Apply various styles for consistent document appearance.
- User-Friendly API: Simplifies automation of document generation processes.
- Versatile Applications : Ideal for generating reports, invoices, and other documents.
With Spire.Doc, you have the flexibility and tools to streamline your Word document creation tasks effectively.
Set Up Spire.Doc in Your Python Project
To get started with Spire.Doc in your Python project, follow these simple steps:
- Install Spire.Doc : First, you need to install the Spire.Doc library. You can do this using pip. Open your terminal or command prompt and run the following command:
pip install spire.doc
- Import the Library : Once installed, import the Spire.Doc module in your Python script to access its functionalities. You can do this with the following import statement:
from spire.doc import *
from spire.doc.common import *
With the setup complete, you can begin writing your Python code to create Word documents according to your needs.
Step 1: Create a Blank Word Document in Python
The first step in automating Word document creation is to create a blank document. To begin with, we create a Document object, which serves as the foundation of our Word document. We then add a section to organize content, and set the page size to A4 with 60-unit margins . These configurations are crucial for ensuring proper document layout and readability.
Below is the code to initialize a document and set up the page configuration:
# Create a Document object
doc = Document()
# Add a section
section = doc.AddSection()
# Set page size and page margins
section.PageSetup.PageSize = PageSize.A4()
section.PageSetup.Margins.All = 60
# Save the document
doc.SaveToFile("BlankDocument.docx")
doc.Dispose
Step 2: Add Formatted Text (Headings, Paragraphs)
1. Add Title, Headings, Paragraphs
In this step, we add text content by first creating paragraphs using the AddParagraph method, followed by inserting text with the AppendText method.
Different paragraphs can be styled using various BuiltInStyle options, such as Title , Heading1 , and Normal , allowing for quick generation of document elements. Additionally, the TextRange.CharacterFormat property can be used to adjust the font, size, and other styles of the text, ensuring a polished and organized presentation.
Below is the code to insert and format these elements:
# Add a title
title_paragraph = section.AddParagraph()
textRange = title_paragraph.AppendText("My First Document")
title_paragraph.ApplyStyle(BuiltinStyle.Title)
textRange.CharacterFormat.FontName = "Times New Properties"
textRange.CharacterFormat.FontSize = 24
# Add a heading
heading_paragraph = section.AddParagraph()
textRange = heading_paragraph.AppendText("This Is Heading1")
heading_paragraph.ApplyStyle(BuiltinStyle.Heading1)
textRange.CharacterFormat.FontName = "Times New Properties"
textRange.CharacterFormat.FontSize = 16
# Add a paragraph
normal_paragraph = section.AddParagraph()
textRange = normal_paragraph .AppendText("This is a sample paragraph.")
normal_paragraph .ApplyStyle(BuiltinStyle.Normal)
textRange.CharacterFormat.FontName = "Times New Properties"
textRange.CharacterFormat.FontSize = 12
2. Apply Formatting to Paragraph
To ensure consistent formatting across multiple paragraphs, we can create a ParagraphStyle that defines key properties such as font attributes (name, size, color, boldness) and paragraph settings (spacing, indentation, alignment) within a single object. This style can then be easily applied to the selected paragraphs for uniformity.
Below is the code to define and apply the paragraph style:
# Defined paragraph style
style = ParagraphStyle(doc)
style.Name = "paraStyle"
style.CharacterFormat.FontName = "Arial"
style.CharacterFormat.FontSize = 13
style.CharacterFormat.TextColor = Color.get_Red()
style.CharacterFormat.Bold = True
style.ParagraphFormat.AfterSpacing = 12
style.ParagraphFormat.BeforeSpacing = 12
style.ParagraphFormat.FirstLineIndent = 4
style.ParagraphFormat.LineSpacing = 10
style.ParagraphFormat.HorizontalAlignment = HorizontalAlignment.Left
doc.Styles.Add(style)
# Apply the style to the specific paragraph
normal_paragraph.ApplyStyle("paraStyle")
You may also like: How to Convert Text to Word and Word to Text in Python
Step 3: Insert Images to a Word Document
1. Insert an Image
In this step, we add an image to our document, allowing for visual enhancements that complement the text. We begin by creating a paragraph to host the image and then proceed to insert the desired image file usingthe Paragraph.AppendPicture method. After the image is inserted, we can adjust its dimensions and alignment to ensure it fits well within the document layout.
Below is the code to insert and format the image:
# Add a paragraph
paragraph = section.AddParagraph()
# Insert an image
picture = paragraph.AppendPicture("C:\\Users\\Administrator\\Desktop\\logo.png")
# Scale the image dimensions
picture.Width = picture.Width * 0.9
picture.Height = picture.Height * 0.9
# Set text wrapping style
picture.TextWrappingStyle = TextWrappingStyle.TopAndBottom
# Center-align the image horizontally
picture.HorizontalAlignment = HorizontalAlignment.Center
2. Position Image at Precise Location
To gain precise control over the positioning of images within your Word document, you can adjust both the horizontal and vertical origins and specify the image's coordinates in relation to these margins. This allows for accurate placement of the image, ensuring it aligns perfectly with the overall layout of your document.
Below is the code to set the image's position.
picture.HorizontalOrigin = HorizontalOrigin.LeftMarginArea
picture.VerticalOrigin = VerticalOrigin.TopMarginArea
picture.HorizontalPosition = 180.0
picture.VerticalPosition = 165.0
Note : Absolute positioning does not apply when using the Inline text wrapping style.
Step 4: Create and Format Tables
In this step, we will create a table within the document and customize its appearance and functionality. This includes defining the table's structure, adding header and data rows, and setting formatting options to enhance readability.
Steps for creating and customizing a table in Word:
- Add a Table : Use the Section.AddTablemethod to create a new table.
- Specify Table Data : Define the data that will populate the table.
- Set Rows and Columns : Specify the number of rows and columns with the Table.ResetCells method.
- Access Rows and Cells : Retrieve a specific row using Table.Rows[rowIndex] and a specific cell using TableRow.Cells[cellIndex] .
- Populate the Table : Add paragraphs with text to the designated cells.
- Customize Appearance : Modify the table and cell styles through the Table.TableFormat and TableCell.CellFormat properties.
The following code demonstrates how to add a teble when creating Word documents in Python:
# Add a table
table = section.AddTable(True)
# Specify table data
header_data = ["Header 1", "Header 2", "Header 3"]
row_data = [["Row 1, Col 1", "Row 1, Col 2", "Row 1, Col 3"],
["Row 2, Col 1", "Row 2, Col 2", "Row 2, Col 3"]]
# Set the row number and column number of table
table.ResetCells(len(row_data) + 1, len(header_data))
# Set the width of table
table.PreferredWidth = PreferredWidth(WidthType.Percentage, int(100))
# Get header row
headerRow = table.get_Item(0)
headerRow.IsHeader = True
headerRow.Height = 23
headerRow.RowFormat.BackColor = Color.get_DarkBlue() # Header color
# Fill the header row with data and set the text formatting
for i in range(len(header_data)):
headerRow.get_Item(i).CellFormat.VerticalAlignment = VerticalAlignment.Middle
paragraph = headerRow.get_Item(i).AddParagraph()
paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
txtRange = paragraph.AppendText(header_data[i])
txtRange.CharacterFormat.Bold = True
txtRange.CharacterFormat.FontSize = 15
txtRange.CharacterFormat.TextColor = Color.get_White() # White text color
# Fill the rest rows with data and set the text formatting
for r in range(len(row_data)):
dataRow = table.Rows.get_Item(r + 1)
dataRow.Height = 20
dataRow.HeightType = TableRowHeightType.Exactly
for c in range(len(row_data[r])):
dataRow.Cells[c].CellFormat.VerticalAlignment = VerticalAlignment.Middle
paragraph = dataRow.Cells[c].AddParagraph()
paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
txtRange = paragraph.AppendText(row_data[r][c])
txtRange.CharacterFormat.FontSize = 13
# Alternate row color
for j in range(1, table.Rows.Count):
if j % 2 == 0:
row2 = table.Rows[j]
for f in range(row2.Cells.Count):
row2.Cells[f].CellFormat.BackColor = Color.get_LightGray() # Alternate row color
# Set the border of table
table.TableFormat.Borders.BorderType = BorderStyle.Single
table.TableFormat.Borders.LineWidth = 1.0
table.TableFormat.Borders.Color = Color.get_Black()
You may also like: How to Create Tables in Word Documents in Python
Step 5: Add Numbered or Bulleted Lists
In this step, we create and apply both numbered and bulleted lists to enhance the document's organization. Spire.Doc offers the ListStyle class to define and manage different types of lists with customizable formatting options. Once created, these styles can be applied to any paragraph in the document, ensuring a consistent look across all list items.
Steps for generating numbered/bulleted lists in Word:
- Define the List Style : Initialize a ListStyle for the numbered or bulleted list, specifying properties such as name, pattern type, and text position.
- Add the List Style to Document : Use the Document.ListStyles.Add() method to incorporate the new list style into the document's styles collection.
- Create List Items : For each item, create a paragraph and apply the corresponding list style using the Paragraph.ListFormat.ApplyStyle() method.
- Format Text Properties : Adjust font size and type for each item to ensure consistency and readability.
Below is the code to generate numbered and bulleted lists:
# Create a numbered list style
listStyle = ListStyle(doc, ListType.Numbered)
listStyle.Name = "numberedList"
listStyle.Levels[0].PatternType = ListPatternType.Arabic
listStyle.Levels[0].TextPosition = 60;
doc.ListStyles.Add(listStyle)
# Create a numbered list
for item in ["First item", "Second item", "Third item"]:
paragraph = section.AddParagraph()
textRange = paragraph.AppendText(item)
textRange.CharacterFormat.FontSize = 13
textRange.CharacterFormat.FontName = "Times New Roman"
paragraph.ListFormat.ApplyStyle("numberedList")
# Create a bulleted list style
listStyle = ListStyle(doc, ListType.Bulleted)
listStyle.Name = "bulletedList"
listStyle.Levels[0].BulletCharacter = "\u00B7"
listStyle.Levels[0].CharacterFormat.FontName = "Symbol"
listStyle.Levels[0].TextPosition = 20
doc.ListStyles.Add(listStyle)
# Create a bulleted list
for item in ["Bullet item one", "Bullet item two", "Bullet item three"]:
paragraph = section.AddParagraph()
textRange = paragraph.AppendText(item)
textRange.CharacterFormat.FontSize = 13
textRange.CharacterFormat.FontName = "Times New Roman"
paragraph.ListFormat.ApplyStyle("bulletedList")
Here’s a screenshot of the Word document created using the code snippets provided above:

Best Practices for Word Document Creation in Python
- Reuse Styles : Define paragraph and list styles upfront to maintain consistency.
- Modular Code : Break document generation into functions (e.g., add_heading(), insert_table()) for reusability.
- Error Handling : Validate file paths and inputs to avoid runtime errors.
- Performance Optimization: Dispose of document objects (doc.Dispose()) to free resources.
- Use Templates : For complex documents, create MS Word templates with placeholders and replace them programmatically to save development time.
By implementing these practices, you can streamline document automation, reduce manual effort, and ensure professional-quality outputs.
FAQs
Q1: Does Spire.Doc support adding headers and footers to a Word document?
Yes, you can add and customize headers and footers, including page numbers, images, and custom text.
Q2. Can I generate Word documents on a server without Microsoft Office installed?
Yes, Spire.Doc works without Office dependencies, making it ideal for server-side automation.
Q3: Can I create Word documents from a template using Spire.Doc?
Of course, you can. Refer to the tutorial: Create Word Documents from Templates with Python
Q4: Can I convert Word documents to other formats using Spire.Doc?
Yes, Spire.Doc supports converting Word documents to various formats, including PDF, HTML, and plain text.
Q5. Can Spire.Doc edit existing Word documents?
Yes, Spire.Doc supports reading, editing, and saving DOCX/DOC files programmatically. Check out this documentation: How to Edit or Modify Word Documents in Pyhton
Conclusion
In this article, we've explored how to create Word documents in Python using the Spire.Doc library, highlighting its potential to enhance productivity while enabling the generation of highly customized and professional documents. By following the steps outlined in this guide, you can fully leverage Spire.Doc, making your document creation process both efficient and straightforward.
As you implement best practices and delve into the library's extensive functionalities, you'll discover that automating document generation significantly reduces manual effort, allowing you to concentrate on more critical tasks. Embrace the power of Python and elevate your document creation capabilities today!