page 2

Subscribe to this RSS feed

Python (355)

Children categories

Spire.Presentation for Python (53)

View items...

Spire.OCR for Python (3)

View items...

Convert HTML to Text in Python | Simple Plain Text Output

2025-09-03 01:16:17 Written by Administrator

Python Convert HTML Text Quickly and Easily

HTML (HyperText Markup Language) is a markup language used to create web pages, allowing developers to build rich and visually appealing layouts. However, HTML files often contain a large number of tags, which makes them difficult to read if you only need the main content. By using Python to convert HTML to text, this problem can be easily solved. Unlike raw HTML, the converted text file strips away all unnecessary markup, leaving only clean and readable content that is easier to store, analyze, or process further.

Install HTML to Text Converter in Python
Python Convert HTML File to Text
Python Convert HTML String to Text
The Conclusion
FAQs

Install HTML to Text Converter in Python

To simplify the task, we recommend using Spire.Doc for Python. This Python Word library allows you to quickly remove HTML markup and extract clean plain text with ease. It not only works as an HTML-to-text converter, but also offers a wide range of features—covering almost everything you can do in Microsoft Word.

To install it, you can run the following pip command:

pip install spire.doc

Alternatively, you can download the Spire.Doc package and install it manually.

Python Convert HTML Files to Text in 3 Steps

After preparing the necessary tools, let's dive into today's main topic: how to convert HTML to plain text using Python. With the help of Spire.Doc, this task can be accomplished in just three simple steps: create a new document object, load the HTML file, and save it as a text file. It’s straightforward and efficient, even for beginners. Let’s take a closer look at how this process can be implemented in code!

Code Example – Converting an HTML File to a Text File:

from spire.doc import *
from spire.doc.common import *

# Open an html file
document = Document()
document.LoadFromFile("/input/htmlsample.html", FileFormat.Html, XHTMLValidationType.none)
# Save it as a Text document.
document.SaveToFile("/output/HtmlFileTotext.txt", FileFormat.Txt)

document.Close()

The following is a preview comparison between the source document (.html) and the output document (.txt):

Python Convert an HTML File to a Text Document

Note that if the HTML file contains tables, the output text file will only retain the values within the tables and cannot preserve the original table formatting. If you want to keep certain styles while removing markup, it is recommended to convert HTML to a Word document . This way, you can retain headings, tables, and other formatting, making the content easier to edit and use.

How to Convert an HTML String to Text in Python

Sometimes, we don’t need the entire content of a web page and only want to extract specific parts. In such cases, you can convert an HTML string directly to text. This approach allows you to precisely control the information you need without further editing. Using Python to convert an HTML string to a text file is also straightforward. Here’s a detailed step-by-step guide:

Steps to convert an HTML string to a text document using Spire.Doc:

Input the HTML string directly or read it from a local file.
Create a Document object and add sections and paragraphs.
Use Paragraph.AppendHTML() method to insert the HTML string into a paragraph.
Save the document as a .txt file using Document.SaveToFile() method.

The following code demonstrates how to convert an HTML string to a text file using Python:

from spire.doc import *
from spire.doc.common import *

#Get html string.
#with open(inputFile) as fp:
    #HTML = fp.read()

# Load HTML from string
html = """<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>HTML to Text Example</title>
  <style>
    body { font-family: Arial, sans-serif; margin: 20px; }
    header { background: #f4f4f4; padding: 10px; }
    nav a { margin: 0 10px; text-decoration: none; color: #333; }
    main { margin-top: 20px; }
  </style>
</head>
<body>
  <header>
    <h1>My Demo Page</h1>
    <nav>
      <a href="#">Home</a>
      <a href="#">About</a>
      <a href="#">Contact</a>
    </nav>
  </header>
  
  <main>
    <h2>Convert HTML to Text</h2>
    <p>This is a simple demo showing how HTML content can be displayed before converting it to plain text.</p>
  </main>
</body>
</html>
"""

# Create a new document
document = Document()
section = document.AddSection()
section.AddParagraph().AppendHTML(html)

# Save directly as TXT
document.SaveToFile("/output/HtmlStringTotext.txt", FileFormat.Txt)
document.Close()

Here's the preview of the converted .txt file: Python Convert an HTML String to a Text Document

The Conclusion

In today’s tutorial, we focused on how to use Python to convert HTML to a text file. With the help of Spire.Doc, you can handle both HTML files and HTML strings in just a few lines of code, easily generating clean plain text files. If you’re interested in the other powerful features of the Python Word library, you can request a 30-day free trial license and explore its full capabilities for yourself.

FAQs about Converting HTML to Text in Python

Q1: How can I convert HTML to plain text using Python?

A: Use Spire.Doc to load an HTML file or string, insert it into a Document object with AppendHTML(), and save it as a .txt file.

Q2: Can I keep some formatting when converting HTML to text?

A: To retain styles like headings or tables, convert HTML to a Word document first, then export to text if needed.

Q3: Is it possible to convert only part of an HTML page to text?

A: Yes, extract the specific HTML segment as a string and convert it to text using Python for precise control.

Published in Conversion

Tagged under

doc Python Conversion

How to Read CSV Files in Python: A Comprehensive Guide

2025-08-22 08:47:42 Written by zaki zou

Read CSV file and convert to Excel in Python

In development, reading CSV files in Python is a common task in data processing, analytics, and backend integration. While Python offers built-in modules like csv and pandas for handling CSV files, Spire.XLS for Python provides a powerful, feature-rich alternative for working with CSV and Excel files programmatically.

In this article, you’ll learn how to use Python to read CSV files, from basic CSV parsing to advanced techniques.

Getting Started with Spire.XLS for Python
Basic Example: Read a CSV in Python
Advanced CSV Reading Techniques
Conclusion

Getting Started with Spire.XLS for Python

Spire.XLS for Python is a feature-rich library for processing Excel and CSV files. Unlike basic CSV parsers in Python, it offers advanced capabilities such as:

Simple API to load, read, and manipulate CSV data.
Reading/writing CSV files with support for custom delimiters.
Converting CSV files to Excel formats (XLSX, XLS) and vice versa.

These features make Spire.XLS ideal for data analysts, developers, and anyone working with structured data in CSV format.

Install via pip

Before getting started, install the library via pip. It works with Python 3.6+ on Windows, macOS, and Linux:

pip install Spire.XLS

Basic Example: Read a CSV in Python

Let’s start with a simple example: parsing a CSV file and extracting its data. Suppose we have a CSV file named “input.csv” with the following content:

Name,Age,City,Salary
Alice,30,New York,75000
Bob,28,Los Angeles,68000
Charlie,35,San Francisco,90000

Python Code to Read the CSV File

Here’s how to load and get data from the CSV file with Python:

from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
workbook = Workbook()

# Load a CSV file
workbook.LoadFromFile("input.csv", ",", 1, 1)

# Get the first worksheet (CSV files are loaded as a single sheet)
worksheet = workbook.Worksheets[0]

# Get the number of rows and columns with data
row_count = worksheet.LastRow 
col_count = worksheet.LastColumn 

# Iterate through rows and columns to print data
print("CSV Data:")
for row in range(row_count):
    for col in range(col_count):
        # Get cell value
        cell_value = worksheet.Range[row+1, col+1].Value
        print(cell_value, end="\t")
    print()  # New line after each row

# Close the workbook
workbook.Dispose()

Explanation:

Workbook Initialization: The Workbook class is the core object for handling Excel files.
Load CSV File: LoadFromFile() imports the CSV data. Its parameters are:
- fileName: The CSV file to read.
- separator: Specified delimiter (e.g., “,”).
- row/column: The starting row/column index.
Access Worksheet: The CSV data is loaded into the first worksheet.
Read Data: Iterate through rows and columns to extract cell values via worksheet.Range[].Value.

Output: Get data from a CSV file and print in a tabular format.

Read a CSV file using Python

Advanced CSV Reading Techniques

1. Read CSV with Custom Delimiters

Not all CSVs use commas. If your CSV file uses a different delimiter (e.g., ;), specify it during loading:

# Load a CSV file
workbook.LoadFromFile("input.csv", ";", 1, 1)

2. Skip Header Rows

If your CSV has headers, skip them by adjusting the row iteration to start from the second row instead of the first.

for row in range(1, row_count):
    for col in range(col_count):
        # Get cell value (row+1 because Spire.XLS uses 1-based indexing)
        cell_value = worksheet.Range[row+1, col+1].Value
        print(cell_value, end="\t")

3. Convert CSV to Excel in Python

One of the most powerful features of Spire.XLS is the ability to convert a CSV file into a native Excel format effortlessly. For example, you can read a CSV and then:

Apply Excel formatting (e.g., set cell colors, borders).
Create charts (e.g., a bar chart for sales by region).
Save the data as an Excel file (.xlsx) for sharing.

Code Example: Convert CSV to Excel (XLSX) in Python – Single & Batch

Conclusion

Reading CSV files in Python with Spire.XLS simplifies both basic and advanced data processing tasks. Whether you need to extract CSV data, convert it to Excel, or handle advanced scenarios like custom delimiters, the examples outlined in this guide enables you to implement robust CSV reading capabilities in your projects with minimal effort.

Try the examples above, and explore the online documentation for more advanced features!

Published in Document Operation

Tagged under

xls Python Document Operation

Reading PowerPoint Files in Python: Extract Text, Images & More

2025-07-24 03:50:38 Written by zaki zou

Python Read PowerPoint Documents

PowerPoint (PPT & PPTX) files are rich with diverse content, including text, images, tables, charts, shapes, and metadata. Extracting these elements programmatically can unlock a wide range of use cases, from automating repetitive tasks to performing in-depth data analysis or migrating content across platforms.

In this tutorial, we'll explore how to read PowerPoint documents in Python using Spire.Presentation for Python, a powerful library for processing PowerPoint files.

Table of Contents:

Python Library to Read PowerPoint Files
Extracting Text from Slides
Saving Images from Slides
Accessing Metadata (Document Properties)
Conclusion
FAQs

1. Python Library to Read PowerPoint Files

To work with PowerPoint files in Python, we'll use Spire.Presentation for Python. This feature-rich library enables developers to create, edit, and read content from PowerPoint presentations efficiently. It allows for the extraction of text, images, tables, SmartArt, and metadata with minimal coding effort.

Before we begin, install the library using pip:

pip install spire.presentation

Now, let's dive into different ways to extract content from PowerPoint files.

2. Extracting Text from Slides in Python

PowerPoint slides contain text in various forms—shapes, tables, SmartArt, and more. We'll explore how to extract text from each of these elements.

2.1 Extract Text from Shapes

Most text in PowerPoint slides resides within shapes (text boxes, labels, etc.). Here’s how to extract text from shapes:

Steps-by-Step Guide

Initialize the Presentation object and load your PowerPoint file.
Iterate through each slide and its shapes.
Check if a shape is an IAutoShape (a standard text container).
Extract text from each paragraph in the shape.

Code Example

from spire.presentation import *
from spire.presentation.common import *

# Create an object of Presentation class
presentation = Presentation()

# Load a PowerPoint presentation
presentation.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pptx")

# Create a list
text = []

# Loop through the slides in the document
for slide_index, slide in enumerate(presentation.Slides):
    # Add slide marker
    text.append(f"====slide {slide_index + 1}====")

    # Loop through the shapes in the slide
    for shape in slide.Shapes:
        # Check if the shape is an IAutoShape object
        if isinstance(shape, IAutoShape):
            # Loop through the paragraphs in the shape
            for paragraph in shape.TextFrame.Paragraphs:
                # Get the paragraph text and append it to the list
                text.append(paragraph.Text)

# Write the text to a txt file
with open("output/ExtractAllText.txt", "w", encoding='utf-8') as f:
    for s in text:
        f.write(s + "\n")

# Dispose resources
presentation.Dispose()

Output:

A text file containing all extracted text from shapes in the presentation.

2.2 Extract Text from Tables

Tables in PowerPoint store structured data. Extracting this data requires iterating through each cell to maintain the table’s structure.

Step-by-Step Guide

Initialize the Presentation object and load your PowerPoint file.
Iterate through each slide to access its shapes.
Identify table shapes (ITable objects).
Loop through rows and cells to extract text.

Code Example

from spire.presentation import *
from spire.presentation.common import *

# Create a Presentation object
presentation = Presentation()

# Load a PowerPoint file
presentation.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pptx")

# Create a list for tables
tables = []

# Loop through the slides
for slide in presentation.Slides:
    # Loop through the shapes in the slide
    for shape in slide.Shapes:
        # Check whether the shape is a table
        if isinstance(shape, ITable):
            tableData = "

            # Loop through the rows in the table
            for row in shape.TableRows:
                rowData = "

                # Loop through the cells in the row
                for i in range(row.Count):
                    # Get the cell value
                    cellValue = row[i].TextFrame.Text
                    # Add cell value with spaces for better readability
                    rowData += (cellValue + "  |  " if i < row.Count - 1 else cellValue)
                tableData += (rowData + "\n")
            tables.append(tableData)

# Write the tables to text files
for idx, table in enumerate(tables, start=1):
    fileName = f"output/Table-{idx}.txt"
    with open(fileName, "w", encoding='utf-8') as f:
        f.write(table)

# Dispose resources
presentation.Dispose()

Output:

A text file containing structured table data from the presentation.

2.3 Extract Text from SmartArt

SmartArt is a unique feature in PowerPoint used for creating diagrams. Extracting text from SmartArt involves accessing its nodes and retrieving the text from each node.

Step-by-Step Guide

Load the PowerPoint file into a Presentation object.
Iterate through each slide and its shapes.
Identify ISmartArt shapes in slides.
Loop through each node in the SmartArt.
Extract and save the text from each node.

Code Example

from spire.presentation.common import *
from spire.presentation import *

# Create a Presentation object
presentation = Presentation()

# Load a PowerPoint file
presentation.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pptx")

# Iterate through each slide in the presentation
for slide_index, slide in enumerate(presentation.Slides):

    # Create a list to store the extracted text for the current slide
    extracted_text = []

    # Loop through the shapes on the slide and find the SmartArt shapes
    for shape in slide.Shapes:
        if isinstance(shape, ISmartArt):
            smartArt = shape

            # Extract text from the SmartArt nodes and append to the list
            for node in smartArt.Nodes:
                extracted_text.append(node.TextFrame.Text)

    # Write the extracted text to a separate text file for each slide
    if extracted_text:  # Only create a file if there's text extracted
        file_name = f"output/SmartArt-from-slide-{slide_index + 1}.txt"
        with open(file_name, "w", encoding="utf-8") as text_file:
            for text in extracted_text:
                text_file.write(text + "\n")

# Dispose resources
presentation.Dispose()

Output:

A text file containing node text of a SmartArt.

You might also be interested in: Read Speaker Notes in PowerPoint in Python

3. Saving Images from Slides in Python

In addition to text, slides often contain images that may be important for your analysis. This section will show you how to save images from the slides.

Step-by-Step Guide

Initialize the Presentation object and load your PowerPoint file.
Access the Images collection in the presentation.
Iterate through each image and save it in a desired format (e.g., PNG).

Code Example

from spire.presentation.common import *
from spire.presentation import *

# Create a Presentation object
presentation = Presentation()

# Load a PowerPoint document
presentation.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pptx")

# Get the images in the document
images = presentation.Images

# Iterate through the images in the document
for i, image in enumerate(images):

    # Save a certain image in the specified path
    ImageName = "Output/Images_"+str(i)+".png"
    image_data = (IImageData)(image)
    image_data.Image.Save(ImageName)

# Dispose resources
presentation.Dispose()

Output:

PNG files of all images extracted from the PowerPoint.

4. Accessing Metadata (Document Properties) in Python

Extracting metadata provides insights into the presentation, such as its title, author, and keywords. This section will guide you on how to access and save this metadata.

Step-by-Step Guide

Create and load your PowerPoint file into a Presentation object.
Access the DocumentProperty object.
Extract properties like Title , Author , and Keywords .

Code Example

from spire.presentation.common import *
from spire.presentation import *

# Create a Presentation object
presentation = Presentation()

# Load a PowerPoint document
presentation.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pptx")

# Get the DocumentProperty object
documentProperty = presentation.DocumentProperty

# Prepare the content for the text file
properties = [
    f"Title: {documentProperty.Title}",
    f"Subject: {documentProperty.Subject}",
    f"Author: {documentProperty.Author}",
    f"Manager: {documentProperty.Manager}",
    f"Company: {documentProperty.Company}",
    f"Category: {documentProperty.Category}",
    f"Keywords: {documentProperty.Keywords}",
    f"Comments: {documentProperty.Comments}",
]

# Write the properties to a text file
with open("output/DocumentProperties.txt", "w", encoding="utf-8") as text_file:
    for line in properties:
        text_file.write(line + "\n")

# Dispose resources
presentation.Dispose()

Output:

A text file containing PowerPoint document properties.

You might also be interested in: Add Document Properties to a PowerPoint File in Python

5. Conclusion

With Spire.Presentation for Python, you can effortlessly read and extract various elements from PowerPoint files—such as text, images, tables, and metadata. This powerful library streamlines automation tasks, content analysis, and data migration, allowing for efficient management of PowerPoint files. Whether you're developing an analytics tool, automating document processing, or managing presentation content, Spire.Presentation offers a robust and seamless solution for programmatically handling PowerPoint files.

6. FAQs

Q1. Can Spire.Presentation handle password-protected PowerPoint files?

Yes, Spire.Presentation can open and process password-protected PowerPoint files. To access an encrypted file, use the LoadFromFile() method with the password parameter:

presentation.LoadFromFile("encrypted.pptx", "yourpassword")

Q2. How can I read comments from PowerPoint slides?

You can read comments from PowerPoint slides using the Spire.Presentation library. Here’s how:

from spire.presentation import *

presentation = Presentation()
presentation.LoadFromFile("Input.pptx")

with open("PowerPoint_Comments.txt", "w", encoding="utf-8") as file:
    for slide_idx, slide in enumerate(presentation.Slides):
        slide = (ISlide)(slide)
        if len(slide.Comments) > 0:
            for comment_idx, comment in enumerate(slide.Comments):
                file.write(f"Comment {comment_idx + 1} from Slide {slide_idx + 1}: {comment.Text}\n")

Q3. Does Spire.Presentation preserve formatting when extracting text?

Basic text extraction retrieves raw text content. For formatted text (fonts, colors), you would need to access additional properties like TextRange.LatinFont and TextRange.Fill .

Q4. Are there any limitations on file size when reading PowerPoint files in Python?

While Spire.Presentation can handle most standard presentations, extremely large files (hundreds of MB) may require optimization for better performance.

Q5. Can I create or modify PowerPoint documents using Spire.Presentation?

Yes, you can create PowerPoint documents and modify existing ones using Spire.Presentation. The library provides a range of features that allow you to add new slides, insert text, images, tables, and shapes, as well as edit existing content.

Get a Free License

To fully experience the capabilities of Spire.Presentation for Python without any evaluation limitations, you can request a free 30-day trial license.

Published in Document Operation

Tagged under

ppt Python Document Operation

How to Create PowerPoint Documents in Python

2025-07-23 01:32:28 Written by zaki zou

Python code examples for creating PowerPoint documents

Creating PowerPoint presentations programmatically can save time and enhance efficiency in generating reports, slideshows, and other visual presentations. By automating the process, you can focus on content and design rather than manual formatting.

In this tutorial, we will explore how to create PowerPoint documents in Python using Spire.Presentation for Python. This powerful tool allows developers to manipulate and generate PPT and PPTX files seamlessly.

Table of Contents:

Python Library to Work with PowerPoint Files
Installing Spire.Presentation for Python
Creating a PowerPoint Document from Scratch
Creating PowerPoint Documents Based on a Template
Best Practices for Python PowerPoint Generation
Wrap Up
FAQs

1. Python Library to Work with PowerPoint Files

Spire.Presentation is a robust library for creating, reading, and modifying PowerPoint files in Python , without requiring Microsoft Office. This library supports a wide range of features, including:

Create PowerPoint documents from scratch or templates.
Add text, images, lists, tables, charts, and shapes.
Customize fonts, colors, backgrounds, and layouts.
Save as or export to PPT, PPTX, PDF, or images.

In the following sections, we will walk through the steps to install the library, create presentations, and add various elements to your slides.

2. Installing Spire.Presentation for Python

To get started, you need to install the Spire.Presentation library. You can install it using pip:

pip install spire.presentation

Once installed, you can begin utilizing its features in your Python scripts to create PowerPoint documents.

3. Creating a PowerPoint Document from Scratch

3.1 Generate and Save a Blank Presentation

Let's start by creating a basic PowerPoint presentation from scratch. The following code snippet demonstrates how to generate and save a blank presentation:

from spire.presentation.common import *
from spire.presentation import *

# Create a Presentation object
presentation = Presentation()

# Set the slide size type
presentation.SlideSize.Type = SlideSizeType.Screen16x9

# Add a slide (there is one slide in the document by default)
presentation.Slides.Append()

# Save the document as a PPT or PPTX file
presentation.SaveToFile("BlankPowerPoint.pptx", FileFormat.Pptx2019)
presentation.Dispose()

In this code:

Presentation : Root class representing the PowerPoint file.
SlideSize.Type : Sets the slide dimensions (e.g., SlideSizeType.Screen16x9 for widescreen).
Slides.Append() : Adds a new slide to the presentation. By default, a presentation starts with one slide.
SaveToFile() : Saves the presentation to the specified file format (PPTX in this case).

3.2 Add Basic Elements to Your Slides

Now that we have a blank presentation, let's add some basic elements like text, images, lists, and tables.

Add Formatted Text

To add formatted text, we can use the following code:

# Get the first slide
first_slide = presentation.Slides[0]

# Add a shape to the slide
rect = RectangleF.FromLTRB (30, 60, 900, 150)
shape = first_slide.Shapes.AppendShape(ShapeType.Rectangle, rect)
shape.ShapeStyle.LineColor.Color = Color.get_Transparent()
shape.Fill.FillType = FillFormatType.none

# Add text to the shape
shape.AppendTextFrame("This guide demonstrates how to create a PowerPoint document using Python.")

# Get text of the shape as a text range
textRange = shape.TextFrame.TextRange

# Set font name, style (bold & italic), size and color
textRange.LatinFont = TextFont("Times New Roman")
textRange.IsBold = TriState.TTrue
textRange.FontHeight = 32
textRange.Fill.FillType = FillFormatType.Solid
textRange.Fill.SolidColor.Color = Color.get_Black()

# Set alignment
textRange.Paragraph.Alignment = TextAlignmentType.Left

In this code:

AppendShape() : Adds a shape to the slide. We create a rectangle shape that will house our text.
AppendTextFrame() : Adds a text frame to the shape, allowing us to insert text into it.
TextFrame.TextRange : Accesses the text range of the shape, enabling further customization such as font style, size, and color.
Paragraph.Alignment : Sets the alignment of the text within the shape.

Add an Image

To include an image in your presentation, use the following code snippet:

# Get the first slide
first_slide = presentation.Slides[0]

# Load an image file
imageFile = "C:\\Users\\Administrator\\Desktop\\logo.png"
stream = Stream(imageFile)
imageData = presentation.Images.AppendStream(stream)

# Reset size
width = imageData.Width * 0.6
height = imageData.Height * 0.6

# Append it to the slide
rect = RectangleF.FromLTRB (750, 50, 750 + width, 50 + height)
image = first_slide.Shapes.AppendEmbedImageByImageData(ShapeType.Rectangle, imageData, rect)
image.Line.FillType = FillFormatType.none

In this code:

Stream() : Creates a stream from the specified image file path.
AppendStream() : Appends the image data to the presentation's image collection.
AppendEmbedImageByImageData() : Adds the image to the slide at the specified rectangle coordinates.

You may also like: Insert Shapes in PowerPoint in Python

Add a List

To add a bulleted list to your slide, we can use:

# Get the first slide
first_slide = presentation.Slides[0]

# Specify list bounds and content
listBounds = RectangleF.FromLTRB(30, 150, 500, 350)
listContent = [
    " Step 1. Install Spire.Presentation for Python.",
    " Step 2. Create a Presentation object.",
    " Step 3. Add text, images, etc. to slides.",
    " Step 5. Set a background image or color.",
    " Step 6. Save the presentation to a PPT(X) file."
]

# Add a shape
autoShape = first_slide.Shapes.AppendShape(ShapeType.Rectangle, listBounds)
autoShape.TextFrame.Paragraphs.Clear()
autoShape.Fill.FillType = FillFormatType.none
autoShape.Line.FillType = FillFormatType.none

for content in listContent:
    # Create paragraphs based on the list content and add them to the shape
    paragraph = TextParagraph()
    autoShape.TextFrame.Paragraphs.Append(paragraph)
    paragraph.Text = content
    paragraph.TextRanges[0].Fill.FillType = FillFormatType.Solid
    paragraph.TextRanges[0].Fill.SolidColor.Color = Color.get_Black()
    paragraph.TextRanges[0].FontHeight = 20
    paragraph.TextRanges[0].LatinFont = TextFont("Arial")

    # Set the bullet type for these paragraphs
    paragraph.BulletType = TextBulletType.Symbol

    # Set line spacing
    paragraph.LineSpacing = 150

In this code:

AppendShape() : Creates a rectangle shape for the list.
TextFrame.Paragraphs.Append() : Adds paragraphs for each list item.
BulletType : Sets the bullet style for the list items.

Add a Table

To include a table, you can use the following:

# Get the first slide
first_slide = presentation.Slides[0]

# Define table dimensions and data
widths = [200, 200, 200]
heights = [18, 18, 18, 18]
dataStr = [
    ["Slide Number", "Title", "Content Type"], 
    ["1", "Introduction", "Text/Image"], 
    ["2", "Project Overview", "Chart/Graph"],  
    ["3", "Key Findings", "Text/List"]
]

# Add table to the slide
table = first_slide.Shapes.AppendTable(30, 360, widths, heights)

# Fill table with data and apply formatting
for rowNum, rowData in enumerate(dataStr):
    for colNum, cellData in enumerate(rowData):  
        cell = table[colNum, rowNum]
        cell.TextFrame.Text = cellData
        
        textRange = cell.TextFrame.Paragraphs[0].TextRanges[0]
        textRange.LatinFont = TextFont("Times New Roman")
        textRange.FontHeight = 20
        
        cell.TextFrame.Paragraphs[0].Alignment = TextAlignmentType.Center

# Apply a built-in table style
table.StylePreset = TableStylePreset.MediumStyle2Accent1

In this code:

AppendTable() : Adds a table to the slide at specified coordinates with defined widths and heights for columns and rows.
Cell.TextFrame.Text : Sets the text for each cell in the table.
StylePreset : Applies a predefined style to the table for enhanced aesthetics.

3.3 Apply a Background Image or Color

To set a custom background for your slide, use the following code:

# Get the first slide
first_slide = presentation.Slides[0]

# Get the background of the first slide
background = first_slide.SlideBackground

# Create a stream from the specified image file
stream = Stream("C:\\Users\\Administrator\\Desktop\\background.jpg")
imageData = presentation.Images.AppendStream(stream)

# Set the image as the background 
background.Type = BackgroundType.Custom
background.Fill.FillType = FillFormatType.Picture
background.Fill.PictureFill.FillType = PictureFillType.Stretch
background.Fill.PictureFill.Picture.EmbedImage = imageData

In this code:

SlideBackground : Accesses the background properties of the slide.
Fill.FillType : Specifies the type of fill (in this case, an image).
PictureFill.FillType : Sets how the background image is displayed (stretched, in this case).
Picture.EmbedImage : Sets image data for the background.

For additional background options, refer to this tutorial: Set Background Color or Picture for PowerPoint Slides in Python

Output:

Below is a screenshot of the PowerPoint document generated by the code snippets provided above.

Create a PowerPoint file from scratch in Python

4. Creating PowerPoint Documents Based on a Template

Using templates can simplify the process of creating presentations by allowing you to replace placeholders with actual data. Below is an example of how to create a PowerPoint document based on a template:

from spire.presentation.common import *
from spire.presentation import *

# Create a Presentation object
presentation = Presentation()

# Load a PowerPoint document from a specified file path
presentation.LoadFromFile("C:\\Users\\Administrator\\Desktop\\template.pptx")

# Get a specific slide from the presentation
slide = presentation.Slides[0]

# Define a list of replacements where each tuple consists of the placeholder and its corresponding replacement text
replacements = [
    ("{project_name}", "GreenCity Solar Initiative"),
    ("{budget}", "$1,250,000"),
    ("{status}", "In Progress (65% Completion)"),
    ("{start_date}", "March 15, 2023"),
    ("{end_date}", "November 30, 2024"),
    ("{manager}", "Emily Carter"),
    ("{client}", "GreenCity Municipal Government")
]

# Iterate through each replacement pair
for old_string, new_string in replacements:

    # Replace the first occurrence of the old string in the slide with the new string
    slide.ReplaceFirstText(old_string, new_string, False)

# Save the modified presentation to a new file
presentation.SaveToFile("Template-Based.pptx", FileFormat.Pptx2019)
presentation.Dispose()

In this code:

LoadFromFile() : Loads an existing PowerPoint file that serves as the template.
ReplaceFirstText() : Replaces placeholders within the slide with actual values. This is useful for dynamic content generation.
SaveToFile() : Saves the modified presentation as a new file.

Output:

Create PowerPoint documents based on a template in Python

5. Best Practices for Python PowerPoint Generation

When creating PowerPoint presentations using Python, consider the following best practices:

Maintain Consistency : Ensure that the formatting (fonts, colors, styles) is consistent across slides for a professional appearance.
Modular Code: Break document generation into functions (e.g., add_list(), insert_image()) for reusability.
Optimize Images : Resize and compress images before adding them to presentations to reduce file size and improve loading times.
Use Templates : Whenever possible, use templates to save time and maintain a cohesive design.
Test Your Code : Always test your presentation generation code to ensure that all elements are added correctly and appear as expected.

6. Wrap Up

In this tutorial, we explored how to create PowerPoint documents in Python using the Spire.Presentation library. We covered the installation, creation of presentations from scratch, adding various elements, and using templates for dynamic content generation. With these skills, you can automate the creation of presentations, making your workflow more efficient and effective.

7. FAQs

Q1. What is Spire.Presentation?

Spire.Presentation is a powerful library used for creating, reading, and modifying PowerPoint files in various programming languages, including Python.

Q2. Does this library require Microsoft Office to be installed?

No, Spire.Presentation operates independently and does not require Microsoft Office.

Q3. Can I customize the layout of slides in my presentation?

Yes, you can customize the layout of each slide by adjusting properties such as size, background, and the placement of shapes, text, and images.

Q4. Does Spire.Presentation support both PPT and PPTX format?

Yes, Spire.Presentation supports both PPT and PPTX formats, allowing you to create and manipulate presentations in either format.

Q5. Can I add charts to my presentations?

Yes, Spire.Presentation supports the addition of charts to your slides, allowing you to visualize data effectively. For detailed instruction, refer to: How to Create Column Charts in PowerPoint Using Python

Get a Free License

To fully experience the capabilities of Spire.Presentation for Python without any evaluation limitations, you can request a free 30-day trial license.

Published in Document Operation

Tagged under

ppt Python Document Operation

Perform PDF OCR with Python (Extract Text from Scanned PDF)

2025-07-18 06:44:09 Written by zaki zou

OCR PDF and Extract Text Using Python In daily work, extracting text from PDF files is a common task. For standard digital documents—such as those exported from Word to PDF—this process is usually straightforward. However, things get tricky when dealing with scanned PDFs, which are essentially images of printed documents. In such cases, traditional text extraction methods fail, and OCR (Optical Character Recognition) becomes necessary to recognize and convert the text within images into editable content.

In this article, we’ll walk through how to perform PDF OCR using Python to automate this workflow and significantly reduce manual effort.

Why OCR is Needed for PDF Text Extraction
Best Python OCR Libraries for PDF Processing
Convert PDF Pages to Images Using Python
Scan and Extract Text from Images Using Spire.OCR
Conclusion

Why OCR is Needed for PDF Text Extraction

When it comes to extracting text from PDF files, one important factor that determines your approach is the type of PDF. Generally, PDFs fall into two categories: scanned (image-based) PDFs and searchable PDFs. Each requires a different strategy for text extraction.

Scanned PDFs are typically created by digitizing physical documents such as books, invoices, contracts, or magazines. While the text appears readable to the human eye, it's actually embedded as an image—making it inaccessible to traditional text extraction tools. Older digital files or password-protected PDFs may also lack an actual text layer.
Searchable PDFs, on the other hand, contain a hidden text layer that allows computers to search, copy, or parse the content. These files are usually generated directly from applications like Microsoft Word or PDF editors and are much easier to process programmatically.

This distinction highlights the importance of OCR (Optical Character Recognition) when working with scanned PDFs. With tools like Python PDF OCR, we can convert these image-based PDFs into images, run OCR to recognize the text, and extract it for further use—all in an automated way.

Best Python OCR Libraries for PDF Processing

Before diving into the implementation, let’s take a quick look at the tools we’ll be using in this tutorial. To simplify the process, we’ll use Spire.PDF for Python and Spire.OCR for Python to perform PDF OCR in Python.

Spire.PDF will handle the conversion from PDF to images.
Spire.OCR, a powerful OCR tool for PDF files, will recognize the text in those images and extract it as editable content.

You can install Spire.PDF using the following pip command:

pip install spire.pdf

and install Spire.OCR with:

pip install spire.ocr

Alternatively, you can download and install them manually by visiting the official Spire.PDF and Spire.OCR pages.

Convert PDF Pages to Images Using Python

Before we dive into Python PDF OCR, it's crucial to understand a foundational step: OCR technology doesn't directly process PDF files. Especially with image-based PDFs (like those created from scanned documents), we first need to convert them into individual image files.

Converting PDFs to images using the Spire.PDF library is straightforward. You simply load your target PDF document and then iterate through each page. For every page, call the PdfDocument.SaveAsImage() method to save it as a separate image file. Once this step is complete, your images are ready for the subsequent OCR process.

Here's a code example showing how to convert PDF to PNG:

from spire.pdf import *

# Load the PDF file
pdf = PdfDocument()
pdf.LoadFromFile("/AI-Generated Art.pdf")

# Loop through pages and save as images
for i in range(pdf.Pages.Count):
    # Convert each page to image
    with pdf.SaveAsImage(i) as image:
        
        # Save in different formats as needed
        image.Save(f"/output/pdftoimage/ToImage_{i}.png")
        # image.Save(f"Output/ToImage_{i}.jpg")
        # image.Save(f"Output/ToImage_{i}.bmp")

# Close the PDF document
pdf.Close()

Conversion result preview: Convert PDF to PNG in Python

Scan and Extract Text from Images Using Spire.OCR

After converting the scanned PDF into images, we can now move on to OCR PDF with Python and to extract text from the PDF. With OcrScanner.Scan() from Spire.OCR, recognizing text in images becomes straightforward. It supports multiple languages such as English, Chinese, French, and German. Once the text is extracted, you can easily save it to a .txt file or generate a Word document.

The code example below shows how to OCR the first PDF page and export to text in Python:

from spire.ocr import *

# Create OCR scanner instance
scanner = OcrScanner()

# Configure OCR model path and language
configureOptions = ConfigureOptions()
configureOptions.ModelPath = r'E:/DownloadsNew/win-x64/'
configureOptions.Language = 'English'
scanner.ConfigureDependencies(configureOptions)

# Perform OCR on the image
scanner.Scan(r'/output/pdftoimage/ToImage_0.png')

# Save extracted text to file
text = scanner.Text.ToString()
with open('/output/scannedpdfoutput.txt', 'a', encoding='utf-8') as file:
    file.write(text + '\n')

Result preview: OCR the First PDF Image and Extract Text Using Python

The Conclusion

In this article, we covered how to perform PDF OCR with Python—from converting PDFs to images, to recognizing text with OCR, and finally saving the extracted content as a plain text file. With this streamlined approach, extracting text from scanned PDFs becomes effortless. If you're looking to automate your PDF processing workflows, feel free to reach out and request a 30-day free trial. It’s time to simplify your document management.

Published in Extract/Read

Tagged under

pdf Python Extract Read

Convert PDF to Markdown in Python – Single & Batch Conversion

2025-07-17 02:36:46 Written by zaki zou

Visual guide of PDF to Markdown in Python

PDFs are ubiquitous in digital document management, but their rigid formatting often makes them less than ideal for content that needs to be easily edited, updated, or integrated into modern workflows. Markdown (.md), on the other hand, offers a lightweight, human-readable syntax perfect for web publishing, documentation, and version control. In this guide, we'll explore how to leverage the Spire.PDF for Python library to perform single or batch conversions from PDF to Markdown in Python efficiently.

Why Convert PDFs to Markdown?
Python PDF Converter Library – Installation
Convert PDF to Markdown in Python
Batch Convert Multiple PDFs to Markdown in Python
Frequently Asked Questions
Conclusion

Why Convert PDFs to Markdown?

Markdown offers several advantages over PDF for content creation and management:

Version control friendly: Easily track changes in Git
Lightweight and readable: Plain text format with simple syntax
Editability: Simple to modify without specialized software
Web integration: Natively supported by platforms like GitHub, GitLab, and static site generators (e.g., Jekyll, Hugo).

Spire.PDF for Python provides a robust solution for extracting text and structure from PDFs while preserving essential formatting elements like tables, lists, and basic styling.

Python PDF Converter Library - Installation

To use Spire.PDF for Python in your projects, you need to install the library via PyPI (Python Package Index) using pip. Open your terminal/command prompt and run:

pip install Spire.PDF

To upgrade an existing installation to the latest version:

pip install --upgrade spire.pdf

Convert PDF to Markdown in Python

Here’s a basic example demonstrates how to use Python to convert a PDF file to a Markdown (.md) file.

from spire.pdf.common import *
from spire.pdf import *

# Create an instance of PdfDocument class
pdf = PdfDocument()

# Load a PDF document
pdf.LoadFromFile("TestFile.pdf")

# Convert the PDF to a Markdown file
pdf.SaveToFile("PDFToMarkdown.md", FileFormat.Markdown) 
pdf.Close()

This Python script loads a PDF file and then uses the SaveToFile() method to convert it to Markdown format. The FileFormat.Markdown parameter specifies the output format.

How Conversion Works

The library extracts text, images, tables, and basic formatting from the PDF and converts them into Markdown syntax.

Text: Preserved with paragraphs/line breaks.
Images: Images in the PDF are converted to base64-encoded PNG format and embedded directly in the Markdown.
Tables: Tabular data is converted to Markdown table syntax (rows/columns with pipes |).
Styling: Basic formatting (bold, italic) is retained using Markdown syntax.

Output: Convert a PDF file to a Markdown file.

Batch Convert Multiple PDFs to Markdown in Python

This Python script uses a loop to convert all PDF files in a specified directory to Markdown format.

import os
from spire.pdf import *

# Configure paths
input_folder = "pdf_folder/"
output_folder = "markdown_output/"

# Create output directory
os.makedirs(output_folder, exist_ok=True)

# Process all PDFs in folder
for file_name in os.listdir(input_folder):
    if file_name.endswith(".pdf"):
        # Initialize document
        pdf = PdfDocument()
        pdf.LoadFromFile(os.path.join(input_folder, file_name))
        
        # Generate output path
        md_name = os.path.splitext(file_name)[0] + ".md"
        output_path = os.path.join(output_folder, md_name)
        
        # Convert to Markdown
        pdf.SaveToFile(output_path, FileFormat.Markdown)
        pdf.Close()

Key Characteristics

Batch Processing: Automatically processes all PDFs in input folder, improving efficiency for bulk operations.
1:1 Conversion: Each PDF generates corresponding Markdown file.
Sequential Execution: Files processed in alphabetical order.
Resource Management: Each PDF is closed immediately after conversion.

Output:

Batch convert multiple PDF files to Markdown files.

Need to convert Markdown to PDF? Refer to: Convert Markdown to PDF in Python

Frequently Asked Questions (FAQs)

Q1: Is Spire.PDF for Python free?

A: Spire.PDF offers a free version with limitations (e.g., maximum 3 pages per conversion). For unlimited use, request a 30-day free trial for evaluation.

Q2: Can I convert password-protected PDFs to Markdown?

A: Yes. Use the LoadFromFile method with the password as a second parameter:

pdf.LoadFromFile("ProtectedFile.pdf", "your_password")

Q3: Can Spire.PDF convert scanned/image-based PDFs to Markdown?

A: No. The library extracts text-based content only. For scanned PDFs, use OCR tools (like Spire.OCR for Python) to create searchable PDFs first.

Conclusion

Spire.PDF for Python simplifies PDF to Markdown conversion for both single file and batch processing.

Its advantages include:

Simple API with minimal code
Preservation of document structure
Batch processing capabilities
Cross-platform compatibility

Whether you're migrating documentation, processing research papers, or building content pipelines, by following the examples in this guide, you can efficiently transform static PDF documents into flexible, editable Markdown content, streamlining workflows and improving collaboration.

Published in Conversion

Tagged under

pdf Python Conversion

Convert JSON to/from Excel in Python – Full Guide with Examples

2025-07-16 05:39:52 Written by zaki zou

Convert JSON and Excel in Python using Spire.XLS – tutorial cover image

In many Python projects, especially those that involve APIs, data analysis, or business reporting, developers often need to convert Excel to JSON or JSON to Excel using Python code. These formats serve different but complementary roles: JSON is ideal for structured data exchange and storage, while Excel is widely used for sharing, editing, and presenting data in business environments.

This tutorial provides a complete, developer-focused guide to converting between JSON and Excel in Python. You'll learn how to handle nested data, apply Excel formatting, and resolve common conversion or encoding issues. We’ll use Python’s built-in json module to handle JSON data, and Spire.XLS for Python to read and write Excel files in .xlsx, .xls, and .csv formats — all without requiring Microsoft Excel or other third-party software.

Topics covered include:

Install Spire.XLS for Python
Why Choose Spire.XLS over General-Purpose Libraries?
Convert JSON to Excel in Python
Convert Excel to JSON in Python
Real-World Example: Handling Nested JSON and Complex Excel Formats
Common Errors and Fixes
FAQ
Conclusion

Install Spire.XLS for Python

This library is used in this tutorial to generate and parse Excel files (.xlsx, .xls, .csv) as part of the JSON–Excel conversion workflow.

To get started, install the Spire.XLS for Python package from PyPI:

pip install spire.xls

You can also choose Free Spire.XLS for Python in smaller projects:

pip install spire.xls.free

Spire.XLS for Python runs on Windows, Linux, and macOS. It does not require Microsoft Excel or any COM components to be installed.

Why Choose Spire.XLS over Open-Source Libraries?

Many open-source Python libraries are great for general Excel tasks like simple data export or basic formatting. If your use case only needs straightforward table output, these tools often get the job done quickly.

However, when your project requires rich Excel formatting, multi-sheet reports, or an independent solution without Microsoft Office, Spire.XLS for Python stands out by offering a complete Excel feature set.

Capability	Open-Source Libraries	Spire.XLS for Python
Advanced Excel formatting	Basic styling	Full styling API for reports
No Office/COM dependency	Fully standalone	Fully standalone
Supports .xls, .xlsx, .csv	.xlsx and .csv mostly; .xls may need extra packages	Full support for .xls, .xlsx, .csv
Charts, images, shapes	Limited or none	Built-in full support

For developer teams that need polished Excel files — with complex layouts, visuals, or business-ready styling — Spire.XLS is an efficient, all-in-one alternative.

Convert JSON to Excel in Python

In this section, we’ll walk through how to convert structured JSON data into an Excel file using Python. This is especially useful when exporting API responses or internal data into .xlsx reports for business users or analysts.

Step 1: Prepare JSON Data

We’ll start with a JSON list of employee records:

[
  {"employee_id": "E001", "name": "Jane Doe", "department": "HR"},
  {"employee_id": "E002", "name": "Michael Smith", "department": "IT"},
  {"employee_id": "E003", "name": "Sara Lin", "department": "Finance"}
]

This is a typical structure returned by APIs or stored in log files. For more complex nested structures, see the real-world example section.

Step 2: Convert JSON to Excel in Python with Spire.XLS

from spire.xls import Workbook, FileFormat
import json

# Load JSON data from file
with open("employees.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# Create a new Excel workbook and access the first worksheet
workbook = Workbook()
sheet = workbook.Worksheets[0]

# Write headers to the first row
headers = list(data[0].keys())
for col, header in enumerate(headers):
    sheet.Range[1, col + 1].Text = header

# Write data rows starting from the second row
for row_idx, row in enumerate(data, start=2):
    for col_idx, key in enumerate(headers):
        sheet.Range[row_idx, col_idx + 1].Text = str(row.get(key, ""))

# Auto-fit the width of all used columns
for i in range(1, sheet.Range.ColumnCount + 1):
    sheet.AutoFitColumn(i)

# Save the Excel file and release resources
workbook.SaveToFile("output/employees.xlsx", FileFormat.Version2016)
workbook.Dispose()

Code Explanation:

Workbook() initializes the Excel file with three default worksheets.
workbook.Worksheets[] accesses the specified worksheet.
sheet.Range(row, col).Text writes string data to a specific cell (1-indexed).
The first row contains column headers based on JSON keys, and each JSON object is written to a new row beneath it.
workbook.SaveToFile() saves the Excel workbook to disk. You can specify the format using the FileFormat enum — for example, Version97to2003 saves as .xls, Version2007 and newer save as .xlsx, and CSV saves as .csv.

The generated Excel file (employees.xlsx) with columns employee_id, name, and department.

Export JSON to Excel in Python

You can also convert the Excel worksheet to a CSV file using Spire.XLS for Python if you need a plain text output format.

Convert Excel to JSON in Python

This part explains how to convert Excel data back into structured JSON using Python. This is a common need when importing .xlsx files into web apps, APIs, or data pipelines that expect JSON input.

Step 1: Load the Excel File

First, we use Workbook.LoadFromFile() to load the Excel file, and then select the worksheet using workbook.Worksheets[0]. This gives us access to the data we want to convert into JSON format.

from spire.xls import Workbook

# Load the Excel file
workbook = Workbook()
workbook.LoadFromFile("products.xlsx")
sheet = workbook.Worksheets[0]

Step 2: Extract Excel Data and Write to JSON

import json

# Get the index of the last row and column
rows = sheet.LastRow
cols = sheet.LastColumn

# Extract headers from the first row
headers = [sheet.Range[1, i + 1].Text for i in range(cols)]
data = []

# Map each row to a dictionary using headers
for r in range(2, rows + 1):
    row_data = {}
    for c in range(cols):
        value = sheet.Range[r, c + 1].Text
        row_data[headers[c]] = value
    data.append(row_data)

# Write JSON output
with open("products_out.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)

Code Explanation:

sheet.LastRow and sheet.LastColumn detect actual used cell range.
The first row is used to extract field names (headers).
Each row is mapped to a dictionary, forming a list of JSON objects.
sheet.Range[row, col].Text returns the cell’s displayed text, including any formatting (like date formats or currency symbols). If you need the raw numeric value or a real date object, you can use .Value, .NumberValue, or .DateTimeValue instead.

The JSON file generated from the Excel data using Python:

Excel to JSON using Python

If you’re not yet familiar with how to read Excel files in Python, see our full guide: How to Read Excel Files in Python Using Spire.XLS.

Real-World Example: Handling Nested JSON and Formatting Excel

In real-world Python applications, JSON data often contains nested dictionaries or lists, such as contact details, configuration groups, or progress logs. At the same time, the Excel output is expected to follow a clean, readable layout suitable for business or reporting purposes.

In this section, we'll demonstrate how to flatten nested JSON data and format the resulting Excel sheet using Python and Spire.XLS. This includes merging cells, applying styles, and auto-fitting columns — all features that help present complex data in a clear tabular form.

Let’s walk through the process using a sample file: projects_nested.json.

Step 1: Flatten Nested JSON

Sample JSON file (projects_nested.json):

[
  {
    "project_id": "PRJ001",
    "title": "AI Research",
    "manager": {
      "name": "Dr. Emily Wang",
      "email": "emily@lab.org"
    },
    "phases": [
      {"phase": "Design", "status": "Completed"},
      {"phase": "Development", "status": "In Progress"}
    ]
  },
  {
    "project_id": "PRJ002",
    "title": "Cloud Migration",
    "manager": {
      "name": "Mr. John Lee",
      "email": "john@infra.net"
    },
    "phases": [
      {"phase": "Assessment", "status": "Completed"}
    ]
  }
]

We'll flatten all nested structures, including objects like manager, and summarize lists like phases into string fields. Each JSON record becomes a single flat row, with even complex nested data compactly represented in columns using readable summaries.

import json

# Helper: Flatten nested data and summarize list of dicts into strings
# e.g., [{"a":1},{"a":2}] → "a: 1; a: 2"
def flatten(data, parent_key='', sep='.'):
    items = {}
    for k, v in data.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.update(flatten(v, new_key, sep=sep))
        elif isinstance(v, list):
            if all(isinstance(i, dict) for i in v):
                items[new_key] = "; ".join(
                    ", ".join(f"{ik}: {iv}" for ik, iv in i.items()) for i in v
                )
            else:
                items[new_key] = ", ".join(map(str, v))
        else:
            items[new_key] = v
    return items

# Load and flatten JSON
with open("projects_nested.json", "r", encoding="utf-8") as f:
    raw_data = json.load(f)

flat_data = [flatten(record) for record in raw_data]

# Collect all unique keys from flattened data as headers
all_keys = set()
for item in flat_data:
    all_keys.update(item.keys())
headers = list(sorted(all_keys))  # Consistent, sorted column order

This version of flatten() converts lists of dictionaries into concise summary strings (e.g., "phase: Design, status: Completed; phase: Development, status: In Progress"), making complex structures more compact for Excel output.

Step 2: Format and Export Excel with Spire.XLS

Now we’ll export the flattened project data to Excel, and use formatting features in Spire.XLS for Python to improve the layout and readability. This includes setting fonts, colors, merging cells, and automatically adjusting column widths for a professional report look.

from spire.xls import Workbook, Color, FileFormat

# Create workbook and worksheet
workbook = Workbook()
sheet = workbook.Worksheets[0]
sheet.Name = "Projects"

# Title row: merge and style
col_count = len(headers)
sheet.Range[1, 1, 1, col_count].Merge()
sheet.Range[1, 1].Text = "Project Report (Flattened JSON)"
title_style = sheet.Range["A1"].Style
title_style.Font.IsBold = True
title_style.Font.Size = 14
title_style.Font.Color = Color.get_White()
title_style.Color = Color.get_DarkBlue()

# Header row from flattened keys
for col, header in enumerate(headers):
    cell = sheet.Range[2, col + 1]
    cell.BorderAround() # Add outside borders to a cell or cell range
    #cell.BorderInside() # Add inside borders to a cell range
    cell.Text = header
    style = cell.Style
    style.Font.IsBold = True
    style.Color = Color.get_LightGray()

# Data rows
for row_idx, row in enumerate(flat_data, start=3):
    for col_idx, key in enumerate(headers):
        sheet.Range[row_idx, col_idx + 1].Text = str(row.get(key, ""))

# Auto-fit columns and rows
for col in range(len(headers)):
    sheet.AutoFitColumn(col + 1)
for row in range(len(flat_data)):
    sheet.AutoFitRow(row + 1)

# Save Excel file
workbook.SaveToFile("output/projects_formatted.xlsx", FileFormat.Version2016)
workbook.Dispose()

This produces a clean, styled Excel sheet from a nested JSON file, making your output suitable for reports, stakeholders, or dashboards.

Code Explanation

sheet.Range[].Merge(): merges a range of cells into one. Here we use it for the report title row (A1:F1).
.Style.Font / .Style.Color: allow customizing font properties (bold, size, color) and background fill of a cell.
.BorderAround() / .BorderInside(): add outside/inside borders to a cell range.
AutoFitColumn(n): automatically adjusts the width of column n to fit its content.

The Excel file generated after flattening the JSON data using Python:

Nested JSON converted to formatted Excel in Python

Common Errors and Fixes in JSON ↔ Excel Conversion

Converting between JSON and Excel may sometimes raise formatting, encoding, or data structure issues. Here are some common problems and how to fix them:

Error	Fix
JSONDecodeError or malformed input	Ensure valid syntax; avoid using eval(); use json.load() and flatten nested objects.
TypeError: Object of type ... is not JSON serializable	Use json.dump(data, f, default=str) to convert non-serializable values.
Excel file not loading or crashing	Ensure the file is not open in Excel; use the correct extension (.xlsx or .xls).
UnicodeEncodeError or corrupted characters	Set encoding="utf-8" and ensure_ascii=False in json.dump().

Conclusion

With Spire.XLS for Python, converting between JSON and Excel becomes a streamlined and reliable process. You can easily transform JSON data into well-formatted Excel files, complete with headers and styles, and just as smoothly convert Excel sheets back into structured JSON. The library helps you avoid common issues such as encoding errors, nested data complications, and Excel file format pitfalls.

Whether you're handling data exports, generating reports, or processing API responses, Spire.XLS provides a consistent and efficient way to work with .json and .xlsx formats in both directions.

Want to unlock all features without limitations? Request a free temporary license for full access to Spire.XLS for Python.

FAQ

Q1: How to convert JSON into Excel using Python?

You can use the json module in Python to load structured JSON data, and then use a library like Spire.XLS to export it to .xlsx. Spire.XLS allows writing headers, formatting Excel cells, and handling nested JSON via flattening. See the JSON to Excel section above for step-by-step examples.

Q2: How do I parse JSON data in Python?

Parsing JSON in Python is straightforward with the built-in json module. Use json.load() to parse JSON from a file, or json.loads() to parse a JSON string. After parsing, the result is usually a list of dictionaries, which can be iterated and exported to Excel or other formats.

Q3: Can I export Excel to JSON with Spire.XLS in Python?

Yes. Spire.XLS for Python lets you read Excel files and convert worksheet data into a list of dictionaries, which can be written to JSON using json.dump(). The process includes extracting headers, detecting used rows and columns, and optionally handling formatting. See Excel to JSON for detailed implementation.

Published in Conversion

Tagged under

xls Python Conversion

How to Generate QR Codes in Python (Full Tutorial with Examples)

2025-07-15 02:49:34 Written by zaki zou

Generate QR Codes Using Python Library

QR codes have transformed how we bridge physical and digital experiences—from marketing campaigns to secure data sharing. For developers looking to generate QR codes in Python , Spire.Barcode for Python provides a complete toolkit for seamless QR code generation, offering both simplicity for basic needs and advanced customization for professional applications.

This step-by-step guide walks you through the entire QR code generation process in Python. You'll learn to programmatically create scannable codes, customize their appearance, embed logos, and optimize error correction - everything needed to implement robust QR code generation solutions for any business or technical requirement.

Table of Contents

Introduction to Spire.Barcode for Python
Setting Up the Environment
Basic Example: Generating QR Codes in Python
Customizing QR Code Appearance
Generating QR Code with Logo
Wrapping up
FAQs

1. Introduction to Spire.Barcode for Python

Spire.Barcode for Python is a powerful library that enables developers to generate and read various barcode types, including QR codes, in Python applications. This robust solution supports multiple barcode symbologies while offering extensive customization options for appearance and functionality.

Key features of Spire.Barcode include:

Support for QR Code generation with customizable error correction levels
Flexible data encoding options (numeric, alphanumber, byte/binary)
Comprehensive appearance customization (colors, sizes, fonts)
High-resolution output capabilities
Logo integration within QR codes

2. Setting Up the Environment

Before we dive into generating QR codes, you need to set up your Python environment. Ensure you have Python installed, and then install the Spire.Barcode library using pip:

pip install spire.barcode

For the best results, obtain a free temporary license from our website. This will allow you to create professional QR code images without evaluation messages, enhancing both user experience and quality of the generated codes.

3. Basic Example: Generating QR Codes in Python

Now that we have everything set up, let's generate our first QR code. Below is the step-by-step process:

Initial Setup :
- Import the Spire.Barcode library.
- Activate the library with a valid license key to remove the
Configure Barcode Settings :
- Create a BarcodeSettings object to control QR code properties.
- Set barcode type to QR code.
- Configure settings such as data mode and error correction level.
- Define the content to encode.
- Configure visual aspects like module width and text display options.
Generate Barcode Image :
- Create a BarCodeGenerator object with the configured settings.
- Convert the configured QR code into an image object in memory.
Save Image to File :
- Write the generated QR code image to a specified file path in PNG format.

The following code snippet demonstrates how to generate QR codes in Python:

from spire.barcode import *

# Function to write all bytes to a file
def WriteAllBytes(fname: str, data):
    with open(fname, "wb") as fp:
        fp.write(data)
    fp.close()

# Apply license key for the barcode generation library
License.SetLicenseKey("your license key")

# Create a BarcodeSettings object to configure barcode properties
barcodeSettings = BarcodeSettings()

# Set the type of barcode to QR code
barcodeSettings.Type = BarCodeType.QRCode

# Set the data mode for QR code (automatic detection of data type)
barcodeSettings.QRCodeDataMode = QRCodeDataMode.Auto

# Set the error correction level (M means medium level of error correction)
barcodeSettings.QRCodeECL = QRCodeECL.M

# Set the data that will be encoded in the QR code
barcodeSettings.Data2D = "visit us at www.e-iceblue.com"

# Set the width of each module (the square bars) in the barcode
barcodeSettings.X = 3

# Hide the text that typically accompanies the barcode
barcodeSettings.ShowText = False

# Create a BarCodeGenerator object with the specified settings
barCodeGenerator = BarCodeGenerator(barcodeSettings)

# Generate the image for the barcode based on the settings
image = barCodeGenerator.GenerateImage()

# Write the generated PNG image to disk at the specified path
WriteAllBytes("output/QRCode.png", image)

Key Concepts:

A. QRCodeDataMode (Data Encoding Scheme)

Controls how the input data is encoded in the QR code:

Mode	Best For	Example Use Cases
Auto	Let library detect automatically	General purpose (default choice)
Numeric	Numbers only (0-9)	Product codes, phone numbers
AlphaNumber	A-Z, 0-9, and some symbols	URLs, simple messages
Byte	Binary/Unicode data	Complex text, special characters

Why it matters:

Different modes have different storage capacities.
Numeric mode can store more digits than other modes.
Auto mode is safest for mixed content.

B. QRCodeECL (Error Correction Level)

Determines how much redundancy is built into the QR code:

Level	Recovery Capacity	Use Case
L (Low)	7% damage recovery	Small codes, short URLs
M (Medium)	15% damage recovery	General purpose (recommended)
Q (Quartile)	25% damage recovery	Codes with logos or decorations
H (High)	30% damage recovery	Critical applications

Visual Impact:

Higher ECLs:

Increase QR code complexity (more modules/squares).
Make the code more scannable when damaged.
Are essential when adding logos (use at least Q or H).

Output:

A QR code generated by Spire.Barcode for Python

4. Customizing QR Code Appearance

Once you've generated a basic QR code, you can further customize its appearance to make it more visually appealing or to fit your brand. Here are some customization options:

4.1 Adjusting DPI Settings

The DPI (dots per inch) settings control the image quality of the QR code. Higher DPI values result in sharper images suitable for printing, while lower values (72-150) are typically sufficient for digital use.

barcodeSettings.DpiX = 150
barcodeSettings.DpiY = 150

4.2 Changing Foreground and Background Colors

You can customize your QR code’s color scheme. The ForeColor determines the color of the QR code modules (squares), while BackColor sets the background color. Ensure sufficient contrast for reliable scanning.

barcodeSettings.BackColor = Color.get_GhostWhite()
barcodeSettings.ForeColor = Color.get_Purple()

4.3 Displaying the Encoded Data

If you want users to see the encoded information without scanning, set the following properties:

barcodeSettings.ShowTextOnBottom = True
barcodeSettings.TextColor = Color.get_Purple()
barcodeSettings.SetTextFont("Arial", 13, FontStyle.Bold)

4.4 Adding Text Under QR Code

You can also add custom text under the QR code, which could be a call-to-action or instructions.

barcodeSettings.ShowBottomText = True
barcodeSettings.BottomText = "Scan Me"
barcodeSettings.SetBottomTextFont("Arial", 13, FontStyle.Bold)
barcodeSettings.BottomTextColor = Color.get_Black()

4.5 Setting Margins and Border

Defining margins and border styles enhances the presentation of the QR code. Here’s how to do it:

barcodeSettings.LeftMargin = 2
barcodeSettings.RightMargin = 2
barcodeSettings.TopMargin = 2
barcodeSettings.BottomMargin = 2

barcodeSettings.HasBorder = True
barcodeSettings.BorderWidth = 0.5
barcodeSettings.BorderColor = Color.get_Black()

5. Generating QR Code with Logo

For branding purposes, you might want to add a logo to your QR code. Spire.Barcode makes this process seamless while maintaining scannability. Here’s how:

barcodeSettings.SetQRCodeLogoImage("C:\\Users\\Administrator\\Desktop\\logo.png")

When adding a logo:

Use a simple, high-contrast logo for best results.
Test the scannability after adding the logo.
The QR code's error correction (set earlier) helps compensate for the obscured area.

The logo will be centered within the QR code, and Spire.Barcode will automatically resize it to ensure the QR code remains scannable.

Output:

QR code with a logo at the center

6. Wrapping up

Generating QR codes in Python using Spire.Barcode is a straightforward process that offers extensive customization options. From basic QR codes to branded versions with logos and custom styling, the library provides all the tools needed for professional barcode generation.

Key Benefits:

Spire.Barcode simplifies QR code generation with a clean API.
Extensive customization options allow for branded, visually appealing QR codes.
Error correction ensures reliability even with added logos.
High-resolution output supports both digital and print use cases.

Whether you're building an inventory system, creating marketing materials, or developing a mobile app integration, Spire.Barcode provides a robust solution for all your QR code generation needs in Python.

7. FAQs

Q1: What is a QR code?

A QR code (Quick Response code) is a type of matrix barcode that can store URLs and other information. It is widely used for quickly linking users to websites, apps, and digital content through mobile devices.

Q2: Can I generate QR codes without a license key?

Yes, you can generate QR codes without a license key; however, the generated barcode will display an evaluation message along with our company logo, which may detract from the user experience.

Q3: Can I generate different types of barcodes with Spire.Barcode?

Yes, Spire.Barcode supports various barcode types, not just QR codes. Detailed documentation can be found here: How to Generate Barcode in Python

Q4: How can I implement a QR code generator in Python using Spire.Barcode?

To implement a QR code generator in Python with Spire.Barcode, create a BarcodeSettings object to configure the QR code properties. Then, use the BarCodeGenerator class to generate the QR code image by calling the GenerateImage() method.

Q5: Can I scan or read QR code using Spire.Barcode?

Yes, you can scan and read QR codes using Spire.Barcode for Python. The library provides functionality for both generating and decoding QR codes. Follow this guide: How to Read Barcode Using Python

Get a Free License

To fully experience the capabilities of Spire.Barcode for Python without any evaluation limitations, you can request a free 30-day trial license.

Published in Create and Scan Barcode

Tagged under

Barcode Python Create and Scan Barcode

How to Add Text to PDF in Python (Create & Edit with Examples)

2025-07-10 08:51:40 Written by zaki zou

Illustration showing Python code adding text to a PDF file

Adding text to a PDF is a common task in Python — whether you're generating reports, adding annotations, filling templates, or labeling documents. This guide will walk you through how to write text in a PDF file using Python, including both creating new PDF documents and updating existing ones.

We’ll be using a dedicated Python PDF library - Spire.PDF for Python, which allows precise control over text placement, font styling, and batch processing. The examples are concise, beginner-friendly, and ready for real-world projects.

Sections Covered

Setup: Install the PDF Library in Python
Add Text to a New PDF
Add Text to an Existing PDF
Control Text Style, Position, Transparency, and Rotation
Common Pitfalls and Cross-Platform Tips
Conclusion
FAQ

Setup: Install the PDF Library in Python

To get started, install Spire.PDF for Python, a flexible and cross-platform PDF library.

pip install Spire.PDF

Or use Free Spire.PDF for Python:

pip install spire.pdf.free

Why use this library?

Works without Adobe Acrobat or Microsoft Office
Add and format text at exact positions
Supports both new and existing PDF editing
Runs on Windows, macOS, and Linux

Add Text to a New PDF Using Python

If you want to create a PDF from text using Python, the example below shows how to insert a line of text into a blank PDF page using custom font and position settings.

Example: Create and write text to a blank PDF

from spire.pdf import PdfDocument, PdfTrueTypeFont, PdfFontStyle, PdfSolidBrush, PdfRGBColor, PointF, RectangleF, \
    PdfStringFormat, PdfTextAlignment, PdfVerticalAlignment

# Create a new PDF document and add a new page
pdf = PdfDocument()
page = pdf.Pages.Add()

text = ("This report summarizes the sales performance of various products in the first quarter of 2025. " +
        "Below is a breakdown of the total sales by product category, " +
        "followed by a comparison of sales in different regions.")

# Set the font, brush, and point
font = PdfTrueTypeFont("Arial", 14.0, PdfFontStyle.Regular, True)
brush = PdfSolidBrush(PdfRGBColor(0, 0, 0))  # black
point = PointF(50.0, 100.0)

# Set the layout area and string format
layoutArea = RectangleF(50.0, 50.0, page.GetClientSize().Width - 100.0, page.GetClientSize().Height)
stringFormat = PdfStringFormat(PdfTextAlignment.Left, PdfVerticalAlignment.Top)

page.Canvas.DrawString(text, font, brush, layoutArea, stringFormat, False)

pdf.SaveToFile("output/new.pdf")
pdf.Close()

Technical Notes

PdfTrueTypeFont() loads a TrueType font from the system with customizable size and style (e.g., regular, bold). It ensures consistent text rendering in the PDF.
PdfSolidBrush() defines the fill color for text or shapes using RGB values. In this example, it's set to black ((0, 0, 0)).
RectangleF(x, y, width, height) specifies a rectangular layout area for drawing text. It enables automatic line wrapping and precise control of text boundaries.
PdfStringFormat() controls the alignment of the text inside the rectangle. Here, text is aligned to the top-left (Left and Top).
DrawString() draws the specified text within the defined layout area without affecting existing content on the page.

Example output PDF showing wrapped black text starting at coordinates (50, 50).

Generated PDF showing wrapped black text block starting at position (50, 50)

Tip: To display multiple paragraphs or line breaks, consider adjusting the Y-coordinate dynamically or using multiple DrawString() calls with updated positions.

If you want to learn how to convert TXT files to PDF directly using Python, please check: How to Convert Text Files to PDF Using Python.

Add Text to an Existing PDF in Python

Need to add text to an existing PDF using Python? This method lets you load a PDF, access a page, and write new text anywhere on the canvas.

This is helpful for:

Adding comments or annotations
Labeling document versions
Filling pre-designed templates

Example: Open an existing PDF and insert text

from spire.pdf import PdfDocument, PdfFontStyle, PdfSolidBrush, PdfRGBColor, PointF, PdfFont, PdfFontFamily

pdf = PdfDocument()
pdf.LoadFromFile("input.pdf")
page = pdf.Pages.get_Item(0)

font = PdfFont(PdfFontFamily.TimesRoman, 12.0, PdfFontStyle.Bold)
brush = PdfSolidBrush(PdfRGBColor(255, 0, 0))  # red
location = PointF(150.0, 110.0)

page.Canvas.DrawString("This document is approved.", font, brush, location)

pdf.SaveToFile("output/modified.pdf")
pdf.Close()

Technical Notes

LoadFromFile() loads an existing PDF into memory.
You can access specific pages via pdf.Pages[index].
New content is drawn on top of the existing layout, non-destructively.
The text position is again controlled via PointF(x, y).

Modified PDF page with newly added red text annotation on the first page.

Modified PDF with added red text label on the first page

Use different x, y coordinates to insert content at custom positions.

Related article: Replace Text in PDF with Python

Control Text Style, Positioning, Transparency, and Rotation

When adding text to a PDF, you often need more than just plain content—you may want to customize the font, color, placement, rotation, and transparency, especially for annotations or watermarks.

Spire.PDF for Python offers fine-grained control for these visual elements, whether you’re building structured reports or stamping dynamic text overlays.

Set Font Style and Color

# Create PdfTrueTypeFont
font = PdfTrueTypeFont("Calibri", 16.0, PdfFontStyle.Italic, True)

# Create PdfFont
font = PdfFont(PdfFontFamily.TimesRoman, 16.0, PdfFontStyle.Italic)

# Create PdfBrush to specify text drawing color
brush = PdfSolidBrush(PdfRGBColor(34, 139, 34))  # forest green

PdfTrueTypeFont will embed the font into the PDF file. To reduce file size, you may use PdfFont, which uses system fonts without embedding them.

Apply Transparency and Rotation

You can adjust transparency and rotation when drawing text to achieve effects like watermarks or angled labels.

# Save the current canvas state
state = page.Canvas.Save()

# Set semi-transparency (0.0 = fully transparent, 1.0 = fully opaque)
page.Canvas.SetTransparency(0.4)

# Move the origin to the center of the page
page.Canvas.TranslateTransform(page.Size.Width / 2, page.Size.Height / 2)

# Rotate the canvas -45 degrees (counterclockwise)
page.Canvas.RotateTransform(-45)

# Draw text at new origin
page.Canvas.DrawString("DRAFT", font, brush, PointF(-50, -20))

Example: Add a Diagonal Watermark to the Center of the Page

The following example demonstrates how to draw a centered, rotated, semi-transparent watermark using all the style controls above:

from spire.pdf import PdfDocument, PdfTrueTypeFont, PdfFontStyle, PdfSolidBrush, PdfRGBColor, PointF
from spire.pdf.common import Color

pdf = PdfDocument()
pdf.LoadFromFile("input1.pdf")
page = pdf.Pages.get_Item(0)

text = "Confidential"
font = PdfTrueTypeFont("Arial", 40.0, PdfFontStyle.Bold, True)
brush = PdfSolidBrush(PdfRGBColor(Color.get_DarkBlue()))  # gray

# Measure text size to calculate center
size = font.MeasureString(text)
x = (page.Canvas.ClientSize.Width - size.Width) / 2
y = (page.Canvas.ClientSize.Height - size.Height) / 2

state = page.Canvas.Save()
page.Canvas.SetTransparency(0.3)
page.Canvas.TranslateTransform(x + size.Width / 2, y + size.Height / 2)
page.Canvas.RotateTransform(-45.0)
page.Canvas.DrawString(text, font, brush, PointF(-size.Width / 2, -size.Height / 2))
page.Canvas.Restore(state)

pdf.SaveToFile("output/with_watermark.pdf")
pdf.Close()

PDF page displaying a centered, rotated, semi-transparent watermark text.

Screenshot showing several PDF files with uniform footer text added programmatically

This approach works well for dynamic watermarking, diagonal stamps like "VOID", "COPY", or "ARCHIVED", and supports full automation.

Make sure all files are closed and not in use to avoid PermissionError.

For more details on inserting watermarks into PDF with Python, please refer to: How to Insert Text Watermarks into PDFs Using Python.

Common Pitfalls and Cross-Platform Considerations

Even with the right API, issues can arise when deploying PDF text operations across different environments or font configurations. Here are some common problems and how to resolve them:

Issue	Cause	Recommended Fix
Text appears in wrong position	Hardcoded coordinates not accounting for page size	Use ClientSize and MeasureString() for dynamic layout
Font not rendered	Font lacks glyphs or isn't supported	Use PdfTrueTypeFont to embed supported fonts like Arial Unicode
Unicode text not displayed	Font does not support full Unicode range	Use universal fonts (e.g., Arial Unicode, Noto Sans)
Text overlaps existing content	Positioning too close to body text	Adjust Y-offsets or add padding with MeasureString()
Watermark text appears on output	You are using the paid version without a license	Use the free version or apply for a temporary license
Font file too large	Embedded font increases PDF size	Use PdfFont for system fonts (non-embedded), if portability is not a concern
Inconsistent results on macOS/Linux	Fonts not available or different metrics	Ship fonts with your application, or use built-in cross-platform fonts

Conclusion

With Spire.PDF for Python, adding text to PDFs—whether creating new files, updating existing ones, or automating batch edits—can be done easily and precisely. From annotations to watermarks, the library gives you full control over layout and styling.

You can start with the free version right away, or apply for a temporary license to unlock full features.

FAQ

How to add text to a PDF using Python?

Use a PDF library such as Spire.PDF to insert text via the DrawString() method. You can define font, position, and styling.

Can I write text into an existing PDF file with Python?

Yes. Load the file with LoadFromFile(), then use DrawString() to add text at a specific location.

How do I generate a PDF from text using Python?

Create a new document and use drawing methods to write content line by line with precise positioning.

Can I add the same text to many PDFs automatically?

Yes. Use a loop to process multiple files and insert text programmatically using a template script.

Published in Text

Tagged under

pdf Python Text

Creating Word Documents with Python: A Step-By-Step Guide

2025-07-08 01:35:11 Written by zaki zou

Python Examples for Creating Word Documents

Automating the creation of Word documents is a powerful way to generate reports, and produce professional-looking files. With Python, you can utilize various libraries for this purpose, and one excellent option is Spire.Doc for Python, specifically designed for handling Word documents.

This guide will provide a clear, step-by-step process for creating Word documents in Python using Spire.Doc. We’ll cover everything from setting up the library to adding formatted text, images, tables, and more. Whether you're generating reports, invoices, or any other type of document, thes techniques will equip you with the essential tools to enhance your workflow effectively.

Table of Contents:

What's Sprie.Doc for Python?
Set Up Spire.Doc in Your Python Project
Step 1: Create a Blank Word Document
Step 2: Add Formatted Text (Headings, Paragraphs)
Step 3: Insert Images to a Word Document
Step 4: Create and Format Tables
Step 5: Add Numbered or Bulleted Lists
Best Practices for Word Document Creation in Python
FAQs
Conclusion

What's Spire.Doc for Python?

Spire.Doc is a powerful library for creating, manipulating, and converting Word documents in Python. It enables developers to generate professional-quality documents programmatically without needing Microsoft Word. Here are some key features:

Supports Multiple Formats : Works with DOCX, DOC, RTF, and HTML.
Extensive Functionalities : Add text, images, tables, and charts.
Styling and Formatting : Apply various styles for consistent document appearance.
User-Friendly API: Simplifies automation of document generation processes.
Versatile Applications : Ideal for generating reports, invoices, and other documents.

With Spire.Doc, you have the flexibility and tools to streamline your Word document creation tasks effectively.

Set Up Spire.Doc in Your Python Project

To get started with Spire.Doc in your Python project, follow these simple steps:

Install Spire.Doc : First, you need to install the Spire.Doc library. You can do this using pip. Open your terminal or command prompt and run the following command:

pip install spire.doc

Import the Library : Once installed, import the Spire.Doc module in your Python script to access its functionalities. You can do this with the following import statement:

from spire.doc import *
from spire.doc.common import *

With the setup complete, you can begin writing your Python code to create Word documents according to your needs.

Step 1: Create a Blank Word Document in Python

The first step in automating Word document creation is to create a blank document. To begin with, we create a Document object, which serves as the foundation of our Word document. We then add a section to organize content, and set the page size to A4 with 60-unit margins . These configurations are crucial for ensuring proper document layout and readability.

Below is the code to initialize a document and set up the page configuration:

# Create a Document object
doc = Document()

# Add a section
section = doc.AddSection()

# Set page size and page margins
section.PageSetup.PageSize = PageSize.A4()
section.PageSetup.Margins.All = 60

# Save the document
doc.SaveToFile("BlankDocument.docx")
doc.Dispose

Step 2: Add Formatted Text (Headings, Paragraphs)

1. Add Title, Headings, Paragraphs

In this step, we add text content by first creating paragraphs using the AddParagraph method, followed by inserting text with the AppendText method.

Different paragraphs can be styled using various BuiltInStyle options, such as Title , Heading1 , and Normal , allowing for quick generation of document elements. Additionally, the TextRange.CharacterFormat property can be used to adjust the font, size, and other styles of the text, ensuring a polished and organized presentation.

Below is the code to insert and format these elements:

# Add a title
title_paragraph = section.AddParagraph()
textRange = title_paragraph.AppendText("My First Document")
title_paragraph.ApplyStyle(BuiltinStyle.Title)
textRange.CharacterFormat.FontName = "Times New Properties"
textRange.CharacterFormat.FontSize = 24

# Add a heading
heading_paragraph = section.AddParagraph()
textRange = heading_paragraph.AppendText("This Is Heading1")
heading_paragraph.ApplyStyle(BuiltinStyle.Heading1)
textRange.CharacterFormat.FontName = "Times New Properties"
textRange.CharacterFormat.FontSize = 16

# Add a paragraph
normal_paragraph = section.AddParagraph()
textRange = normal_paragraph .AppendText("This is a sample paragraph.")
normal_paragraph .ApplyStyle(BuiltinStyle.Normal)
textRange.CharacterFormat.FontName = "Times New Properties"
textRange.CharacterFormat.FontSize = 12

2. Apply Formatting to Paragraph

To ensure consistent formatting across multiple paragraphs, we can create a ParagraphStyle that defines key properties such as font attributes (name, size, color, boldness) and paragraph settings (spacing, indentation, alignment) within a single object. This style can then be easily applied to the selected paragraphs for uniformity.

Below is the code to define and apply the paragraph style:

# Defined paragraph style
style = ParagraphStyle(doc)
style.Name = "paraStyle"
style.CharacterFormat.FontName = "Arial"
style.CharacterFormat.FontSize = 13
style.CharacterFormat.TextColor = Color.get_Red()
style.CharacterFormat.Bold = True
style.ParagraphFormat.AfterSpacing = 12
style.ParagraphFormat.BeforeSpacing = 12
style.ParagraphFormat.FirstLineIndent = 4
style.ParagraphFormat.LineSpacing = 10
style.ParagraphFormat.HorizontalAlignment = HorizontalAlignment.Left
doc.Styles.Add(style)

# Apply the style to the specific paragraph
normal_paragraph.ApplyStyle("paraStyle")

Step 3: Insert Images to a Word Document

1. Insert an Image

In this step, we add an image to our document, allowing for visual enhancements that complement the text. We begin by creating a paragraph to host the image and then proceed to insert the desired image file usingthe Paragraph.AppendPicture method. After the image is inserted, we can adjust its dimensions and alignment to ensure it fits well within the document layout.

Below is the code to insert and format the image:

# Add a paragraph
paragraph = section.AddParagraph()

# Insert an image
picture = paragraph.AppendPicture("C:\\Users\\Administrator\\Desktop\\logo.png")

# Scale the image dimensions
picture.Width = picture.Width * 0.9
picture.Height = picture.Height * 0.9

# Set text wrapping style
picture.TextWrappingStyle = TextWrappingStyle.TopAndBottom

# Center-align the image horizontally
picture.HorizontalAlignment = HorizontalAlignment.Center

2. Position Image at Precise Location

To gain precise control over the positioning of images within your Word document, you can adjust both the horizontal and vertical origins and specify the image's coordinates in relation to these margins. This allows for accurate placement of the image, ensuring it aligns perfectly with the overall layout of your document.

Below is the code to set the image's position.

picture.HorizontalOrigin = HorizontalOrigin.LeftMarginArea
picture.VerticalOrigin = VerticalOrigin.TopMarginArea
picture.HorizontalPosition = 180.0
picture.VerticalPosition = 165.0

Note : Absolute positioning does not apply when using the Inline text wrapping style.

Step 4: Create and Format Tables

In this step, we will create a table within the document and customize its appearance and functionality. This includes defining the table's structure, adding header and data rows, and setting formatting options to enhance readability.

Steps for creating and customizing a table in Word:

Add a Table : Use the Section.AddTablemethod to create a new table.
Specify Table Data : Define the data that will populate the table.
Set Rows and Columns : Specify the number of rows and columns with the Table.ResetCells method.
Access Rows and Cells : Retrieve a specific row using Table.Rows[rowIndex] and a specific cell using TableRow.Cells[cellIndex] .
Populate the Table : Add paragraphs with text to the designated cells.
Customize Appearance : Modify the table and cell styles through the Table.TableFormat and TableCell.CellFormat properties.

The following code demonstrates how to add a teble when creating Word documents in Python:

# Add a table
table = section.AddTable(True)

# Specify table data
header_data = ["Header 1", "Header 2", "Header 3"]
row_data = [["Row 1, Col 1", "Row 1, Col 2", "Row 1, Col 3"],
            ["Row 2, Col 1", "Row 2, Col 2", "Row 2, Col 3"]]

# Set the row number and column number of table
table.ResetCells(len(row_data) + 1, len(header_data))

# Set the width of table
table.PreferredWidth = PreferredWidth(WidthType.Percentage, int(100))

# Get header row
headerRow = table.get_Item(0)
headerRow.IsHeader = True
headerRow.Height = 23
headerRow.RowFormat.BackColor = Color.get_DarkBlue()  # Header color

# Fill the header row with data and set the text formatting
for i in range(len(header_data)):
    headerRow.get_Item(i).CellFormat.VerticalAlignment = VerticalAlignment.Middle
    paragraph = headerRow.get_Item(i).AddParagraph()
    paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
    txtRange = paragraph.AppendText(header_data[i])
    txtRange.CharacterFormat.Bold = True
    txtRange.CharacterFormat.FontSize = 15
    txtRange.CharacterFormat.TextColor = Color.get_White()  # White text color

# Fill the rest rows with data and set the text formatting
for r in range(len(row_data)):
    dataRow = table.Rows.get_Item(r + 1)
    dataRow.Height = 20
    dataRow.HeightType = TableRowHeightType.Exactly

    for c in range(len(row_data[r])):
        dataRow.Cells[c].CellFormat.VerticalAlignment = VerticalAlignment.Middle
        paragraph = dataRow.Cells[c].AddParagraph()
        paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
        txtRange = paragraph.AppendText(row_data[r][c])
        txtRange.CharacterFormat.FontSize = 13

# Alternate row color
for j in range(1, table.Rows.Count):
    if j % 2 == 0:
        row2 = table.Rows[j]
        for f in range(row2.Cells.Count):
            row2.Cells[f].CellFormat.BackColor = Color.get_LightGray()  # Alternate row color

# Set the border of table
table.TableFormat.Borders.BorderType = BorderStyle.Single
table.TableFormat.Borders.LineWidth = 1.0
table.TableFormat.Borders.Color = Color.get_Black()

Step 5: Add Numbered or Bulleted Lists

In this step, we create and apply both numbered and bulleted lists to enhance the document's organization. Spire.Doc offers the ListStyle class to define and manage different types of lists with customizable formatting options. Once created, these styles can be applied to any paragraph in the document, ensuring a consistent look across all list items.

Steps for generating numbered/bulleted lists in Word:

Define the List Style : Initialize a ListStyle for the numbered or bulleted list, specifying properties such as name, pattern type, and text position.
Add the List Style to Document : Use the Document.ListStyles.Add() method to incorporate the new list style into the document's styles collection.
Create List Items : For each item, create a paragraph and apply the corresponding list style using the Paragraph.ListFormat.ApplyStyle() method.
Format Text Properties : Adjust font size and type for each item to ensure consistency and readability.

Below is the code to generate numbered and bulleted lists:

# Create a numbered list style
listStyle = ListStyle(doc, ListType.Numbered)
listStyle.Name = "numberedList"
listStyle.Levels[0].PatternType = ListPatternType.Arabic
listStyle.Levels[0].TextPosition = 60;
doc.ListStyles.Add(listStyle)

# Create a numbered list
for item in ["First item", "Second item", "Third item"]:
    paragraph = section.AddParagraph()
    textRange = paragraph.AppendText(item)
    textRange.CharacterFormat.FontSize = 13
    textRange.CharacterFormat.FontName = "Times New Roman"
    paragraph.ListFormat.ApplyStyle("numberedList")

# Create a bulleted list style
listStyle = ListStyle(doc, ListType.Bulleted)
listStyle.Name = "bulletedList"
listStyle.Levels[0].BulletCharacter = "\u00B7"
listStyle.Levels[0].CharacterFormat.FontName = "Symbol"
listStyle.Levels[0].TextPosition = 20
doc.ListStyles.Add(listStyle)

# Create a bulleted list
for item in ["Bullet item one", "Bullet item two", "Bullet item three"]:
    paragraph = section.AddParagraph()
    textRange = paragraph.AppendText(item)
    textRange.CharacterFormat.FontSize = 13
    textRange.CharacterFormat.FontName = "Times New Roman"
paragraph.ListFormat.ApplyStyle("bulletedList")

Here’s a screenshot of the Word document created using the code snippets provided above:

Word document generated with Python code.

Best Practices for Word Document Creation in Python

Reuse Styles : Define paragraph and list styles upfront to maintain consistency.
Modular Code : Break document generation into functions (e.g., add_heading(), insert_table()) for reusability.
Error Handling : Validate file paths and inputs to avoid runtime errors.
Performance Optimization: Dispose of document objects (doc.Dispose()) to free resources.
Use Templates : For complex documents, create MS Word templates with placeholders and replace them programmatically to save development time.

By implementing these practices, you can streamline document automation, reduce manual effort, and ensure professional-quality outputs.

FAQs

Q1: Does Spire.Doc support adding headers and footers to a Word document?

Yes, you can add and customize headers and footers, including page numbers, images, and custom text.

Q2. Can I generate Word documents on a server without Microsoft Office installed?

Yes, Spire.Doc works without Office dependencies, making it ideal for server-side automation.

Q3: Can I create Word documents from a template using Spire.Doc?

Of course, you can. Refer to the tutorial: Create Word Documents from Templates with Python

Q4: Can I convert Word documents to other formats using Spire.Doc?

Yes, Spire.Doc supports converting Word documents to various formats, including PDF, HTML, and plain text.

Q5. Can Spire.Doc edit existing Word documents?

Yes, Spire.Doc supports reading, editing, and saving DOCX/DOC files programmatically. Check out this documentation: How to Edit or Modify Word Documents in Pyhton

Conclusion

In this article, we've explored how to create Word documents in Python using the Spire.Doc library, highlighting its potential to enhance productivity while enabling the generation of highly customized and professional documents. By following the steps outlined in this guide, you can fully leverage Spire.Doc, making your document creation process both efficient and straightforward.

As you implement best practices and delve into the library's extensive functionalities, you'll discover that automating document generation significantly reduces manual effort, allowing you to concentrate on more critical tasks. Embrace the power of Python and elevate your document creation capabilities today!

Published in Document Operation

Tagged under

doc Python Document Operation

News Category

Python (355)

Children categories

Install HTML to Text Converter in Python

Python Convert HTML Files to Text in 3 Steps

How to Convert an HTML String to Text in Python

The Conclusion

FAQs about Converting HTML to Text in Python

Getting Started with Spire.XLS for Python

Install via pip

Basic Example: Read a CSV in Python

Advanced CSV Reading Techniques

1. Read CSV with Custom Delimiters

2. Skip Header Rows

3. Convert CSV to Excel in Python

Conclusion

1. Python Library to Read PowerPoint Files

2. Extracting Text from Slides in Python

2.1 Extract Text from Shapes

2.2 Extract Text from Tables

2.3 Extract Text from SmartArt

3. Saving Images from Slides in Python

4. Accessing Metadata (Document Properties) in Python

5. Conclusion

6. FAQs

Q1. Can Spire.Presentation handle password-protected PowerPoint files?

Q2. How can I read comments from PowerPoint slides?

Q3. Does Spire.Presentation preserve formatting when extracting text?

Q4. Are there any limitations on file size when reading PowerPoint files in Python?

Q5. Can I create or modify PowerPoint documents using Spire.Presentation?

Get a Free License

1. Python Library to Work with PowerPoint Files

2. Installing Spire.Presentation for Python

3. Creating a PowerPoint Document from Scratch

3.1 Generate and Save a Blank Presentation

3.2 Add Basic Elements to Your Slides

3.3 Apply a Background Image or Color

4. Creating PowerPoint Documents Based on a Template

5. Best Practices for Python PowerPoint Generation

6. Wrap Up

7. FAQs

Q1. What is Spire.Presentation?

Q2. Does this library require Microsoft Office to be installed?

Q3. Can I customize the layout of slides in my presentation?

Q4. Does Spire.Presentation support both PPT and PPTX format?

Q5. Can I add charts to my presentations?

Get a Free License

Why OCR is Needed for PDF Text Extraction

Best Python OCR Libraries for PDF Processing

Convert PDF Pages to Images Using Python

Scan and Extract Text from Images Using Spire.OCR

The Conclusion

Why Convert PDFs to Markdown?

Python PDF Converter Library - Installation

Convert PDF to Markdown in Python

Batch Convert Multiple PDFs to Markdown in Python

Frequently Asked Questions (FAQs)

Q1: Is Spire.PDF for Python free?

Q2: Can I convert password-protected PDFs to Markdown?

Q3: Can Spire.PDF convert scanned/image-based PDFs to Markdown?

Conclusion

Install Spire.XLS for Python

Why Choose Spire.XLS over Open-Source Libraries?

Convert JSON to Excel in Python

Step 1: Prepare JSON Data

Step 2: Convert JSON to Excel in Python with Spire.XLS

Code Explanation:

Convert Excel to JSON in Python

Step 1: Load the Excel File

Step 2: Extract Excel Data and Write to JSON

Code Explanation:

Real-World Example: Handling Nested JSON and Formatting Excel

Step 1: Flatten Nested JSON

Step 2: Format and Export Excel with Spire.XLS

Code Explanation

Common Errors and Fixes in JSON ↔ Excel Conversion

Conclusion

FAQ

Q1: How to convert JSON into Excel using Python?

Q2: How do I parse JSON data in Python?