Python (355)
Converting a Word document into images can be a useful and convenient option when you want to share or present the content without worrying about formatting issues or compatibility across devices. By converting a Word document into images, you can ensure that the text, images, and formatting remain intact, making it an ideal solution for sharing documents on social media, websites, or through email. In this article, you will learn how to convert Word to PNG, JPEG or SVG in Python using Spire.Doc for Python.
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.
pip install Spire.Doc
If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows
Convert Word to PNG or JPEG in Python
Spire.Doc for Python offers the Document.SaveImageToStream() method to convert a certain page into a bitmap image. Afterwards, you can save the bitmap image to a popular image format such as PNG, JPEG, or BMP. The detailed steps are as follows.
- Create a Document object.
- Load a Word file using Document.LoadFromFile() method.
- Retrieve each page in the document, and convert a specific page into a bitmap image using Document.SaveImageToStreams() method.
- Save the bitmap image into a PNG or JPEG file.
- Python
from spire.doc import *
from spire.doc.common import *
# Create a Document object
document = Document()
# Load a Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")
# Loop through the pages in the document
for i in range(document.GetPageCount()):
# Convert a specific page to bitmap image
imageStream = document.SaveImageToStreams(i, ImageType.Bitmap)
# Save the bitmap to a PNG file
with open('Output/ToImage-{0}.png'.format(i),'wb') as imageFile:
imageFile.write(imageStream.ToArray())
document.Close()

Convert Word to SVG in Python
To convert a Word document into multiple SVG files, you can simply use the Document.SaveToFile() method. Here are the steps.
- Create a Document object.
- Load a Word file using Document.LoadFromFile() method.
- Convert it to individual SVG files using Document.SaveToFile() method.
- Python
from spire.doc import *
from spire.doc.common import *
# Create a Document object
document = Document()
# Load a Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")
# Convert it to SVG files
document.SaveToFile("output/ToSVG.svg", FileFormat.SVG)
document.Close()

Get a Free License
To fully experience the capabilities of Spire.Doc for Python without any evaluation limitations, you can request a free 30-day trial license.
Watermarks in Word documents serve as overlayed text or pictures that are typically used to indicate documents’ status, confidentiality, draft nature, etc. While they are useful in certain contexts, watermarks often become a hindrance when it comes to presenting documents. They can be distracting, obscuring the readability, and reduce the overall quality of the document. This article will show how to remove watermarks from Word documents in Python programs using Spire.Doc for Python.
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.
pip install Spire.Doc
If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows
Remove the Watermark from a Word Document
Spire.Doc for Python provides the Document.Watermark property which allows users to deal with the watermark of a Word document. Users can assign a null value to this property to remove the watermark of Word document. The detailed steps are as follows:
- Create an object of Document class.
- Load a Word document using Document.LoadFromFile() method.
- Remove the watermark by assigning a null value to Document.Watermark property.
- Save the document using Document.SaveToFile() method.
- Python
from spire.doc import *
from spire.doc.common import *
# Create an object of Document class
doc = Document()
# Load a Word document
doc.LoadFromFile("Sample.docx")
# Remove the watermark
doc.Watermark = None
# Save the document
doc.SaveToFile("output/RemoveWatermark.docx", FileFormat.Auto)

Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Convert PowerPoint PPT or PPTX to PDF in Python: A Comprehensive Guide
2023-09-12 01:08:24 Written by Koohji
Looking to convert PowerPoint PPT or PPTX files to PDF using Python? This comprehensive guide walks you through the process of converting PowerPoint to PDF with ease. You'll learn how to perform quick conversions using default settings, as well as explore advanced features such as exporting specific slides, adjusting slide size for optimal output, including hidden slides, and generating PDF/A-compliant files for archival use.
- Why Convert PowerPoint to PDF
- Python PowerPoint to PDF Converter Library Installation
- Convert PowerPoint to PDF with Default Settings
- Export PowerPoint to PDF with Advanced Settings
- Conclusion
- FAQs
Why Convert PowerPoint to PDF?
Converting PowerPoint files to PDF offers several advantages:
- Universal Compatibility: PDF files can be opened on virtually any device or operating system without needing PowerPoint installed.
- Preserved Formatting: Unlike PowerPoint files, PDFs lock in layout, fonts, and images to avoid rendering inconsistencies.
- Enhanced Security: PDF files can be encrypted, making them ideal for sharing confidential information.
- Reduced File Size: PDFs often have a smaller file size than PowerPoint presentations, making them easier to share via email or upload online.
Python PowerPoint to PDF Converter Library Installation
To convert PowerPoint presentations to PDF in Python, you can use Spire.Presentation for Python. This powerful library allows you to create, read, modify, and convert PowerPoint PPTX and PPT files without needing Microsoft PowerPoint installed.
Why Choose Spire.Presentation for Python?
- High Fidelity: Ensures accurate conversion while preserving formatting and layout.
- User-Friendly: Simple API makes it easy to implement in your projects.
- Versatile: Supports a wide range of PowerPoint features and formats.
Install Spire.Presentation for Python
Before starting with the conversion process, install Spire.Presentation via pip using the following command:
pip install Spire.Presentation
Need help with installation? Refer to this detailed documentation: How to Install Spire.Presentation for Python on Windows
Convert PowerPoint to PDF with Default Settings
Spire.Presentation makes it easy to convert PowerPoint files to PDF with just a few lines of code.
The example below shows how to load a .pptx or .ppt file and export it to PDF using default settings - ideal for quick conversions where no customization is needed.
from spire.presentation import *
# Create a Presentation object
presentation = Presentation()
# Load a PPTX file
presentation.LoadFromFile("Input.pptx")
# Or load a PPT file
# presentation.LoadFromFile("Input.ppt")
# Save the file as a PDF
presentation.SaveToFile("Basic_conversion.pdf", FileFormat.PDF)
presentation.Dispose()

Export PowerPoint to PDF with Advanced Settings
Spire.Presentation provides a range of advanced settings that give you control over how the PDF output is generated, making it ideal for both professional use and archival purposes. For example, you can:
- Export a Particular Slide to PDF
- Adjust Slide Size for Optimal PDF Output
- Include Hidden Slides in the Converted PDF
- Generate PDF/A-compliant Files from PowerPoint
Export a Particular Slide to PDF
If you only need to share a specific part of your presentation, Spire.Presentation allows you to extract and convert individual slides to a PDF. This is especially useful for generating slide-specific reports or handouts.
from spire.presentation import *
# Create a Presentation object
presentation = Presentation()
# Load the PowerPoint file
presentation.LoadFromFile("Input.pptx")
# Get the desired slide (e.g., the second slide)
slide = presentation.Slides.get_Item(1)
# Save the slide as a PDF
slide.SaveToFile("Single_slide.pdf", FileFormat.PDF)
presentation.Dispose()

Adjust Slide Size for Optimal PDF Output
To ensure that your PDF meets printing or layout requirements, you can adjust the slide dimensions before conversion. Spire.Presentation lets you set standard slide sizes as well as custom slide dimensions so the output aligns with your document formatting needs.
from spire.presentation import *
# Create a Presentation object
presentation = Presentation()
# Load the PowerPoint file
presentation.LoadFromFile("Input.pptx")
# Set the slide size to a standard slide size like A4
presentation.SlideSize.Type = SlideSizeType.A4
# # Or you can set custom slide size (e.g., 720x540 points)
# presentation.SlideSize.Size = SizeF(720.0, 540.0)
# Fit content to the new slide size
presentation.SlideSizeAutoFit = True
# Save the presentation as a PDF
presentation.SaveToFile("Resized_output.pdf", FileFormat.PDF)
presentation.Dispose()

Include Hidden Slides in the Converted PDF
By default, hidden slides are excluded from conversion. However, if your workflow requires complete documentation, Spire.Presentation enables you to include hidden slides in the output PDF.
from spire.presentation import *
# Create a Presentation object
presentation = Presentation()
# Load the PowerPoint file
presentation.LoadFromFile("Input.pptx")
# Get the SaveToPdfOption object
option = presentation.SaveToPdfOption
# Enable ContainHiddenSlides option
option.ContainHiddenSlides = True
# Save the presentation as a PDF
presentation.SaveToFile("Include_hidden_slides.pdf", FileFormat.PDF)
presentation.Dispose()
Generate PDF/A-compliant Files from PowerPoint
PDF/A is a specialized format intended for long-term digital preservation. If your organization needs to archive presentations in a standards-compliant format, Spire.Presentation allows you to export PDF/A files that conform to archival best practices.
from spire.presentation import *
# Create a Presentation object
presentation = Presentation()
# Load the PowerPoint file
presentation.LoadFromFile("Input.pptx")
# Get the SaveToPdfOption object
option = presentation.SaveToPdfOption
# Set PDF compliance to PDF/A-1a
option.PdfConformanceLevel = PdfConformanceLevel.Pdf_A1A
# Save the presentation as a PDF
presentation.SaveToFile("Pdf_a_output.pdf", FileFormat.PDF)
presentation.Dispose()
Conclusion
Spire.Presentation for Python offers a robust set of features for converting PowerPoint files to PDF with minimal effort and maximum flexibility. Whether you require simple conversions or advanced customization options, this library gives developers full control over the process. From exporting individual slides to generating archival-quality outputs, it’s a comprehensive tool for PowerPoint-to-PDF conversion workflows in Python.
FAQs
Q1: Can I convert PPTX and PPT files without installing Microsoft PowerPoint?
A1: Yes, Spire.Presentation is a standalone library and does not require Microsoft Office or PowerPoint to be installed.
Q2: Does the library support batch conversion of multiple PowerPoint files?
A2: Yes, you can write scripts to loop through multiple files and convert each to PDF programmatically.
Q3: Is PDF/A-1a the only compliance level supported for PPT to PDF/A conversion?
A3: No, Spire.Presentation supports multiple compliance levels for PPT to PDF/A conversion, including PDF/A-1a, PDF/A-2a, PDF/A-3a, PDF/A-1b, PDF/A-2b, and PDF/A-3b.
Get a Free License
To fully experience the capabilities of Spire.Presentation for Python without any evaluation limitations, you can request a free 30-day trial license.
Images in Word documents can break up large blocks of text and make the content more visually interesting. In addition, they can also effectively illustrate complex ideas or concepts that are difficult to explain solely through text. In this article, you will learn how to programmatically add images to a Word document using Spire.Doc for Python.
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.
pip install Spire.Doc
If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows
Insert an Image in a Word Document in Python
Spire.Doc for Python offers the Paragraph.AppendPicture() method to insert an image into a Word document. The following are the detailed steps.
- Create a Document object.
- Add a section using Document.AddSection() method.
- Add two paragraphs to the section using Section.AddParagraph() method.
- Add text to the paragraphs and set formatting.
- Load an image and add it to a specified paragraph using Paragraph.AppendPicture() method.
- Set width and height for the image using DocPicture.Width and DocPicture.Height properties.
- Set a text wrapping style for the image using DocPicture.TextWrappingStyle property.
- Save the result document using Document.SaveToFile() method.
- Python
from spire.doc import *
from spire.doc.common import *
# Create a Document object
document = Document()
# Add a seciton
section = document.AddSection()
# Add a paragraph
paragraph1 = section.AddParagraph()
# Add text to the paragraph and set formatting
tr = paragraph1.AppendText("Spire.Doc for Python is a professional Word Python API specifically designed for developers to create, read, write, convert, and compare Word documents with fast and high-quality performance.")
tr.CharacterFormat.FontName = "Calibri"
tr.CharacterFormat.FontSize = 11
paragraph1.Format.LineSpacing = 18
paragraph1.Format.BeforeSpacing = 10
paragraph1.Format.AfterSpacing = 10
# Add another paragraph
paragraph2 = section.AddParagraph()
tr = paragraph2.AppendText("Spire.Doc for Python enables to perform many Word document processing tasks. It supports Word 97-2003 /2007/2010/2013/2016/2019 and it has the ability to convert them to commonly used file formats like XML, RTF, TXT, XPS, EPUB, EMF, HTML and vice versa. Furthermore, it supports to convert Word Doc/Docx to PDF using Python, Word to SVG, and Word to PostScript in high quality.")
# Add text to the paragraph and set formatting
tr.CharacterFormat.FontName = "Calibri"
tr.CharacterFormat.FontSize = 11
paragraph2.Format.LineSpacing = 18
# Add an image to the specified paragraph
picture = paragraph1.AppendPicture("Spire.Doc.jpg")
# Set image width and height
picture.Width = 100
picture.Height = 100
# Set text wrapping style for the image
picture.TextWrappingStyle = TextWrappingStyle.Square
#Save the result document
document.SaveToFile("InsertImage.docx", FileFormat.Docx)
document.Close()

Insert an Image at a Specified Location in a Word document in Python
If you wish to place the image at a specified location in the Word document, you can set its position through the DocPicture.HorizontalPosition and DocPicture.VerticalPosition properties. The following are the detailed steps.
- Create a Document object.
- Add a section using Document.AddSection() method.
- Add a paragraph to the section using Section.AddParagraph() method.
- Add text to the paragraph and set formatting.
- Add an image to the paragraph using Paragraph.AppendPicture() method.
- Set width and height for the image using DocPicture.Width and DocPicture.Height properties.
- Set the horizontal position and vertical position for the image using DocPicture.HorizontalPosition and DocPicture.VerticalPosition properties.
- Set a text wrapping style for the image using DocPicture.TextWrappingStyle property.
- Save the result document using Document.SaveToFile() method.
- Python
from spire.doc import *
from spire.doc.common import *
# Create a Document object
doc = Document()
# Add a section
section = doc.AddSection()
# Add a paragraph to the section
paragraph = section.AddParagraph()
# Add text to the paragraph and set formatting
paragraph.AppendText("The sample demonstrates how to insert an image at a specified location in a Word document.")
paragraph.ApplyStyle(BuiltinStyle.Heading2)
# Add an image to the paragraph
picture = paragraph.AppendPicture("pic.jpg")
# Set image position
picture.HorizontalPosition = 150.0
picture.VerticalPosition = 60.0
# Set image size
picture.Width = 120.0
picture.Height = 180.0
# Set a text wrapping style for the image (note that the position settings are not applicable when the text wrapping style is Inline)
picture.TextWrappingStyle = TextWrappingStyle.Through
# Save the result document
doc.SaveToFile("WordImage.docx", FileFormat.Docx)
doc.Close()

Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Text files are a common file type that contain only plain text without any formatting or styles. If you want to apply formatting or add images, charts, tables, and other media elements to text files, one of the recommended solutions is to convert them to Word files.
Conversely, if you want to efficiently extract content or reduce the file size of Word documents, you can convert them to text format. This article will demonstrate how to programmatically convert text files to Word format and convert Word files to text format using Spire.Doc for Python.
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.
pip install Spire.Doc
If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows
Convert Text (TXT) to Word in Python
Conversion from TXT to Word is quite simple that requires only a few lines of code. The following are the detailed steps.
- Create a Document object.
- Load a text file using Document.LoadFromFile(string fileName) method.
- Save the text file as a Word file using Document.SaveToFile(string fileName, FileFormat fileFormat) method.
- Python
from spire.doc import *
from spire.doc.common import *
# Create a Document object
document = Document()
# Load a TXT file
document.LoadFromFile("input.txt")
# Save the TXT file as Word
document.SaveToFile("TxtToWord.docx", FileFormat.Docx2016)
document.Close()

Convert Word to Text (TXT) in Python
The Document.SaveToFile(string fileName, FileFormat.Txt) method provided by Spire.Doc for Python allows you to export a Word file to text format. The following are the detailed steps.
- Create a Document object.
- Load a Word file using Document.LoadFromFile(string fileName) method.
- Save the Word file in txt format using Document.SaveToFile(string fileName, FileFormat.Txt) method.
- Python
from spire.doc import *
from spire.doc.common import *
# Create a Document object
document = Document()
# Load a Word file from disk
document.LoadFromFile("Input.docx")
# Save the Word file in txt format
document.SaveToFile("WordToTxt.txt", FileFormat.Txt)
document.Close()

Get a Free License
To fully experience the capabilities of Spire.Doc for Python without any evaluation limitations, you can request a free 30-day trial license.
Comment in Excel is a function that allows users to add extra details or remarks as explanatory notes. Comments can be in the form of text or images. It enables users to provide additional information to explain or supplement the data in specified cells. After adding a comment, users can view the content of the comment by hovering the mouse over the cell with the comment. This feature enhances the readability and comprehensibility of the document, helping readers better understand and handle the data in Excel. In this article, we will show you how to add comments in Excel by using Spire.XLS for Python.
Install Spire.XLS for Python
This scenario requires Spire.XLS for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.
pip install Spire.XLS
If you are unsure how to install, please refer to this tutorial: How to Install Spire.XLS for Python on Windows
Add Comment with Text in Excel
Spire.XLS for Python allows users to add comment with text in Excel by calling CellRange.AddComment() method. The following are detailed steps.
- Create an object of Workbook class.
- Load an Excel file using Workbook.LoadFromFile() method.
- Get the first worksheet of this file using Workbook.Worksheets[] property.
- Get the specified cell by using Worksheet.Range[] property.
- Set the author and content of the comment and add them to the obtained cell using CellRange.AddComment() method.
- Set the font of the comment.
- Save the result file using Workbook.SaveToFile() method.
- Python
from spire.xls import *
from spire.xls.common import *
inputFile = "sample.xlsx"
outputFile = "CommentWithAuthor.xlsx"
#Create an object of Workbook class
workbook = Workbook()
#Load the sample file from disk
workbook.LoadFromFile(inputFile)
#Get the first worksheet
sheet = workbook.Worksheets[0]
#Get the specified cell
range = sheet.Range["B4"]
#Set the author and content of the comment
author = "Jhon"
text = "Emergency task."
#Add comment to the obtained cell
comment = range.AddComment()
comment.Width = 200
comment.Visible = True
comment.Text = author + ":\n" + text
#Set the font of the comment
font = workbook.CreateFont()
font.FontName = "Tahoma"
font.KnownColor = ExcelColors.Black
font.IsBold = True
comment.RichText.SetFont(0, len(author), font)
#Save the result file
workbook.SaveToFile(outputFile, ExcelVersion.Version2013)
workbook.Dispose()

Add Comment with Picture in Excel
Additionally, Spire.XLS for Python also enable users to add comment with picture to the specified cell in Excel by using CellRange.AddComment() and ExcelCommentObject.Fill.CustomPicture() methods. The following are detailed steps.
- Create an object of Workbook class.
- Get the first worksheet using Workbook.Worksheets[] property.
- Get the specified cell by using Worksheet.Range[] property and set text for it.
- Add comment to the obtained cell by using CellRange.AddComment() method.
- Load an image and fill the comment with it by calling ExcelCommentObject.Fill.CustomPicture() method.
- Set the height and width of the comment.
- Save the result file using Workbook.SaveToFile() method.
- Python
from spire.xls import * from spire.common import * inputFile = "logo.png" outputFile = "CommentWithPicture.xlsx" #Create an object of Workbook class workbook = Workbook() #Get the first worksheet sheet = workbook.Worksheets[0] #Get the specified cell and set text for it range = sheet.Range["C6"] range.Text = "E-iceblue" #Add comment to the obtained cell comment = range["C6"].AddComment() #Load an image file and fill the comment with it image = Image.FromFile(inputFile) comment.Fill.CustomPicture(image, "logo.png") #Set the height and width of the comment comment.Height = image.Height comment.Width = image.Width comment.Visible = True #Save the result file workbook.SaveToFile(outputFile, ExcelVersion.Version2010) workbook.Dispose()

Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Hyperlinks are a commonly used tool in Excel that facilitates navigation between different sheets, workbooks, websites, or even specific cells within a worksheet. There are instances where you may need to manage hyperlinks in Excel files, such as extracting hyperlinks for further analysis, modifying existing links, or removing them entirely. In this article, we will introduce how to extract, modify, and remove hyperlinks in Excel in Python using Spire.XLS for Python.
- Extract Hyperlinks from Excel in Python
- Modify Hyperlinks in Excel in Python
- Remove Hyperlinks from Excel in Python
Install Spire.XLS for Python
This scenario requires Spire.XLS for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.
pip install Spire.XLS
If you are unsure how to install, please refer to this tutorial: How to Install Spire.XLS for Python on Windows
Extract Hyperlinks from Excel in Python
Extracting hyperlinks from an Excel worksheet can be beneficial when you need to analyze or export the link data for further processing.
The following steps demonstrate how to extract hyperlinks from an Excel worksheet in Python using Spire.XLS for Python:
- Create a Workbook object.
- Load an Excel file using Workbook.LoadFromFile() method.
- Get a specific worksheet using Workbook.Worksheets[] property.
- Get the collection of all hyperlinks in the worksheet using Worksheet.HyperLinks property.
- Create an empty list to store the extracted hyperlink information.
- Loop through the hyperlinks in the hyperlink collection.
- Get the address of each hyperlink using XlsHyperlink.Address property and append the address to the list.
- Write the addresses in the list into a text file.
- Python
from spire.xls import *
from spire.xls.common import *
# Create a Workbook object
workbook = Workbook()
# Load an Excel file
workbook.LoadFromFile("Hyperlinks.xlsx")
# Get the first worksheet of the file
sheet = workbook.Worksheets[0]
# Get the hyperlink collection of the worksheet
links = sheet.HyperLinks
# Create an empty list to store the extracted hyperlinks
list = []
# Loop through the hyperlinks in the hyperlink collection
for link in links:
# Get the address of each hyperlink
address = link.Address
# Append the address to the list
list.append(address)
# Write the extracted hyperlink addresses to a text file
with open("ExtractHyperlinks.txt", "w", encoding = "utf-8") as file:
for item in list:
file.write(item + "\n")
workbook.Dispose()

Modify Hyperlinks in Excel in Python
Modifying hyperlinks allows you to update URLs or alter the display text to suit your needs.
The following steps demonstrate how to modify an existing hyperlink in an Excel worksheet in Python using Spire.XLS for Python:
- Create a Workbook object.
- Load an Excel file using Workbook.LoadFromFile() method.
- Get a specific worksheet using Workbook.Worksheets[] property.
- Get a specific hyperlink in the worksheet using Worksheet.HyperLinks[] property.
- Modify the display text and address of the hyperlink using XlsHyperlink.TextToDisplay and XlsHyperlink.Address properties.
- Save the resulting file using Workbook.SaveToFile() method.
- Python
from spire.xls import *
from spire.xls.common import *
# Create a Workbook object
workbook = Workbook()
# Load an Excel file
workbook.LoadFromFile("Hyperlinks.xlsx")
# Get the first worksheet of the file
sheet = workbook.Worksheets[0]
# Get the first hyperlink in the worksheet
link = sheet.HyperLinks[0]
# Change the display text of the hyperlink
link.TextToDisplay = "Spire.XLS for .NET"
# Change the address of the hyperlink
link.Address = "http://www.e-iceblue.com"
# Save the resulting file
workbook.SaveToFile("ModifyHyperlink.xlsx", ExcelVersion.Version2016)
workbook.Dispose()

Remove Hyperlinks from Excel in Python
Removing hyperlinks can help eliminate unnecessary links and clean up your spreadsheet.
The following steps demonstrate how to remove a specific hyperlink from an Excel worksheet in Python using Spire.XLS for Python:
- Create a Workbook object.
- Load an Excel file using Workbook.LoadFromFile() method.
- Get a specific worksheet using Workbook.Worksheets[] property.
- Remove a specific hyperlink from the worksheet using Worksheet.Hyperlinks.RemoveAt() method.
- Save the resulting file using Workbook.SaveToFile() method.
- Python
from spire.xls import *
from spire.xls.common import *
# Create a Workbook object
workbook = Workbook()
# Load an Excel file
workbook.LoadFromFile("Hyperlinks.xlsx")
# Get the first worksheet of the file
sheet = workbook.Worksheets[0]
# Remove the first hyperlink and keep its display text
sheet.HyperLinks.RemoveAt(0)
# Save the resulting file
workbook.SaveToFile("RemoveHyperlink.xlsx", ExcelVersion.Version2016)
workbook.Dispose()

Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
By extracting text from Word documents, you can effortlessly obtain the written information contained within them. This allows for easier manipulation, analysis, and organization of textual content, enabling tasks such as text mining, sentiment analysis, and natural language processing. Extracting images, on the other hand, provides access to visual elements embedded within Word documents, which can be crucial for tasks like image recognition, content extraction, or creating image databases. In this article, you will learn how to extract text and images from a Word document in Python using Spire.Doc for Python.
- Extract Text from a Specific Paragraph in Python
- Extract Text from an Entire Word Document in Python
- Extract Images from an Entire Word Document in Python
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.
pip install Spire.Doc
If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows
Extract Text from a Specific Paragraph in Python
To get a certain paragraph from a section, use Section.Paragraphs[index] property. Then, you can get the text of the paragraph through Paragraph.Text property. The detailed steps are as follows.
- Create a Document object.
- Load a Word file using Document.LoadFromFile() method.
- Get a specific section through Document.Sections[index] property.
- Get a specific paragraph through Section.Paragraphs[index] property.
- Get text from the paragraph through Paragraph.Text property.
- Python
from spire.doc import *
from spire.doc.common import *
# Create a Document object
doc = Document()
# Load a Word document
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")
# Get a specific section
section = doc.Sections.get_Item(0)
# Get a specific paragraph
paragraph = section.Paragraphs.get_Item(2)
# Get text from the paragraph
str = paragraph.Text
# Print result
print(str)

Extract Text from an Entire Word Document in Python
If you want to get text from a whole document, you can simply use Document.GetText() method. Below are the steps.
- Create a Document object.
- Load a Word file using Document.LoadFromFile() method.
- Get text from the document using Document.GetText() method.
- Python
from spire.doc import *
from spire.doc.common import *
# Create a Document object
doc = Document()
# Load a Word file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")
# Get text from the entire document
str = doc.GetText()
# Print result
print(str)

Extract Images from an Entire Word Document in Python
Spire.Doc for Python does not provide a straightforward method to get images from a Word document. You need to iterate through the child objects in the document, and determine if a certain a child object is a DocPicture. If yes, you get the image data using DocPicture.ImageBytes property and then save it as a popular image format file. The main steps are as follows.
- Create a Document object.
- Load a Word file using Document.LoadFromFile() method.
- Loop through the child objects in the document.
- Determine if a specific child object is a DocPicture. If yes, get the image data through DocPicture.ImageBytes property.
- Write the image data as a PNG file.
- Python
import queue
from spire.doc import *
from spire.doc.common import *
# Create a Document object
doc = Document()
# Load a Word file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")
# Create a Queue object
nodes = queue.Queue()
nodes.put(doc)
# Create a list
images = []
while nodes.qsize() > 0:
node = nodes.get()
# Loop through the child objects in the document
for i in range(node.ChildObjects.Count):
child = node.ChildObjects.get_Item(i)
# Determine if a child object is a picture
if child.DocumentObjectType == DocumentObjectType.Picture:
picture = child if isinstance(child, DocPicture) else None
dataBytes = picture.ImageBytes
# Add the image data to the list
images.append(dataBytes)
elif isinstance(child, ICompositeObject):
nodes.put(child if isinstance(child, ICompositeObject) else None)
# Loop through the images in the list
for i, item in enumerate(images):
fileName = "Image-{}.png".format(i)
with open("ExtractedImages/"+fileName,'wb') as imageFile:
# Write the image to a specified path
imageFile.write(item)
doc.Close()

Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
How to Convert PDF to Excel with Formatting in Python (Step-by-Step Guide)
2023-09-07 01:11:29 Written by Koohji
Converting PDF files to Excel spreadsheets in Python is an effective way to extract structured data for analysis, reporting, and automation. While PDFs are excellent for preserving layout across platforms, their static format often makes data extraction challenging.
Excel, on the other hand, provides robust features for sorting, filtering, calculating, and visualizing data. By using Python along with the Spire.PDF for Python library, you can automate the entire PDF to Excel conversion process — from basic one-page documents to complex, multi-page PDFs.
Whether you're automating data extraction from PDFs or integrating PDF content into Excel workflows, this tutorial will walk you through both quick-start and advanced methods for reliable Python PDF to Excel conversion.
Table of Contents
- Why Convert PDF to Excel Programmatically in Python
- Setting Up Your Development Environment
- Quick Start: Convert PDF to Excel in Python
- Advanced PDF to Excel Conversion with Layout Control and Formatting Options
- Conclusion
Why Convert PDF to Excel Programmatically in Python
PDFs are ideal for sharing documents with consistent formatting, but their fixed structure makes them difficult to analyze or reuse, especially if they contain tables.
Converting PDF to Excel allows you to:
- Extract tabular data for analysis or visualization
- Automate monthly or recurring report extraction
- Enable downstream processing in Excel
- Save hours of manual copy-pasting
Using Python for this task adds automation, flexibility, and scalability — ideal for integration into data pipelines or backend services.
Setting Up Your Development Environment
Before you start converting PDF files to Excel using Python, it’s essential to set up your development environment properly. This ensures you have all the necessary tools and libraries installed to follow the tutorial smoothly.
Install Python
If you haven’t already installed Python on your system, download and install the latest version from the official website.
Make sure to add Python to your system PATH during installation to run Python commands from the terminal or command prompt easily.
Install Spire.PDF for Python
Spire.PDF for Python is the core library used in this tutorial to load, manipulate, and convert PDF documents.
To install it, open your terminal and run:
pip install Spire.PDF
This command downloads and installs Spire.PDF along with any required dependencies.
If you encounter any issues or need detailed installation help, please refer to our step-by-step guide: How to Install Spire.PDF for Python on Windows
Quick Start: Convert PDF to Excel in Python
If your PDF has a clean and simple layout without complex formatting or multiple page structures, you can convert it directly to Excel with just 3 lines of code using Spire.PDF for Python.
Steps to Quickly Export PDF to Excel
Follow these straightforward steps to export your PDF file to an Excel spreadsheet in Python:
- Import the required classes.
- Create a PdfDocument object.
- Load your PDF file with the LoadFromFile method.
- Export the PDF to Excel (.xlsx) format using the SaveToFile method and specify FileFormat.XLSX as the output format.
Code Example
from spire.pdf import *
# Create a PdfDocument object
pdf = PdfDocument()
# Load your PDF file
pdf.LoadFromFile("Sample.pdf")
# Convert and save the PDF to Excel
pdf.SaveToFile("output.xlsx", FileFormat.XLSX)
# Close the document
pdf.Close()
Advanced PDF to Excel Conversion with Layout Control and Formatting Options
For more complex PDF documents—such as those containing multiple pages, rotated text, table cells with multiple lines of text, or overlapping content - you can use the XlsxLineLayoutOptions class to gain precise control over the conversion process.
This allows you to preserve the original structure and formatting of your PDF more accurately when exporting to Excel.
Layout Options You Can Configure
The XlsxLineLayoutOptions class in Spire.PDF provides several properties that give you granular control over how PDF content is exported to Excel. Below is a breakdown of each option and its behavior:
| Option | Description |
|---|---|
| convertToMultipleSheet | Determines whether to convert each PDF page into a separate worksheet. The default value is true. |
| rotatedText | Specifies whether to preserve the original rotation of angled text. The default value is true. |
| splitCell | Determines whether to split a PDF table cell with multiple lines of text into separate rows in the Excel output. The default value is true. |
| wrapText | Determines whether to enable word wrap inside Excel cells. The default value is true. |
| overlapText | Specifies whether text overlapping in the original PDF should be preserved in the Excel output. The default value is false. |
Code Example
from spire.pdf import *
# Create a PdfDocument object
pdf = PdfDocument()
# Load your PDF file
pdf.LoadFromFile("Sample.pdf")
# Define layout options
# Parameters: convertToMultipleSheet, rotatedText, splitCell, wrapText, overlapText
layout_options = XlsxLineLayoutOptions(True, True, False, True, False)
# Apply layout options
pdf.ConvertOptions.SetPdfToXlsxOptions(layout_options)
# Convert and save the PDF to Excel
pdf.SaveToFile("advanced_output.xlsx", FileFormat.XLSX)
# Close the document
pdf.Close()

Conclusion
Converting PDF files to Excel in Python is an efficient way to automate data extraction and processing tasks. Whether you need a quick conversion or fine-grained layout control, Spire.PDF for Python offers flexible options that scale from simple to complex scenarios.
Ready to automate your PDF to Excel conversions?
Get a free trial license for Spire.PDF for Python and explore the full Spire.PDF Documentation to get started today!
FAQs
Q1: Can I convert each PDF page into a separate Excel worksheet?
A1: Yes. Use the convertToMultipleSheet=True option in the XlsxLineLayoutOptions class to export each page to its own sheet.
Q2. What Excel format does Spire.PDF export to?
A2: Spire.PDF converts PDFs to .xlsx, the modern Excel format supported by Excel 2007 and later.
Q3: Can I convert a PDF to Excel in Python without losing formatting?
A3: Yes. Using Spire.PDF for Python, you can retain the original formatting, including merged cells, cell background colors, and other format settings when saving PDFs to Excel.
Q4: Can I extract only a specific table from a PDF to Excel instead of converting the whole document?
A4: Yes, Spire.PDF for Python supports extracting specific tables from PDF files. You can then write the extracted table data to Excel using our Excel processing library - Spire.XLS for Python.
Merging PDF files is a common task in many applications, from combining report sections to creating comprehensive document collections. For developers, using Python to merge PDF files programmatically can significantly streamline the process and help build automated workflows.
This article explores how to merge PDFs in Python using Spire.PDF for Python - a robust library designed for efficient PDF manipulation.

Table of Contents:
- 5 Reasons Why You Should Use Python to Combine PDFs
- Step-by-Step: Merge PDF Files in Python
- Advanced: Merge Selected Pages from PDFs in Python
- Batch Processing: Merge Multiple PDF Files in a Folder
- Frequently Asked Questions
- Conclusion
5 Reasons Why You Should Use Python to Combine PDFs
While GUI tools like Adobe Acrobat offer PDF merging capabilities, Python provides distinct advantages for developers and enterprises. Python’s PDF merging feature shines when you need to:
- Process documents in bulk
- Schedule scripts to run automatically (e.g., daily report merging).
- Integrate with data workflows
- Implement business-specific logic
- Deploy in server/cloud environments
Step-by-Step: Merge PDF Files in Python
Step 1: Install Spire.PDF for Python
Before you can start combining PDFs with Spire.PDF for Python, you need to install the library. You can do this using pip, the Python package manager. Open your terminal and run the following command:
pip install Spire.PDF
Step 2: Merge Multiple PDF Files into One
Now, let's dive into the Python code for merging multiple PDF files into a single PDF.
1. Import the Required Classes
First, import the necessary classes from the Spire.PDF library:
from spire.pdf.common import *
from spire.pdf import *
2. Define Paths of PDFs to Merge
Define three PDF file paths and stored them in a list. You can modify these paths or adjust the number of files according to your needs.
inputFile1 = "Sample1.pdf"
inputFile2 = "Sample2.pdf"
inputFile3 = "Sample3.pdf"
files = [inputFile1, inputFile2, inputFile3]
3. Merge PDF Files
The MergeFiles() method combines all PDFs in the list into a new PDF document object.
pdf = PdfDocument.MergeFiles(files)
4. Save the Merged PDF Finally, save the combined PDF to a specified output path.
pdf.Save("output/MergePDF.pdf", FileFormat.PDF)
Full Python Code to Combine PDFs:
from spire.pdf.common import *
from spire.pdf import *
# Create a list of the PDF file paths
inputFile1 = "Sample1.pdf"
inputFile2 = "Sample2.pdf"
inputFile3 = "Sample3.pdf"
files = [inputFile1, inputFile2, inputFile3]
# Merge the PDF documents
pdf = PdfDocument.MergeFiles(files)
# Save the result document
pdf.Save("output/MergePDF.pdf", FileFormat.PDF)
pdf.Close()
Result: Combine three PDF files (total of 6 pages) into one PDF file.

Advanced: Merge Selected Pages from PDFs in Python
In some cases, you may only want to merge specific pages of multiple PDFs. Spire.PDF for Python makes this easy by allowing you to select pages from different PDF documents and insert them into a new PDF file.
from spire.pdf import *
from spire.pdf.common import *
# Create a list of the PDF file paths
file1 = "Sample1.pdf"
file2 = "Sample2.pdf"
file3 = "Sample3.pdf"
files = [file1, file2, file3]
# Load each PDF file as an PdfDocument object and add them to a list
pdfs = []
for file in files:
pdfs.append(PdfDocument(file))
# Create an object of PdfDocument class
newPdf = PdfDocument()
# Insert the selected pages from the loaded PDF documents into the new document
newPdf.InsertPage(pdfs[0], 0)
newPdf.InsertPage(pdfs[1], 1)
newPdf.InsertPageRange(pdfs[2], 0, 1)
# Save the new PDF document
newPdf.SaveToFile("output/SelectedPages.pdf")
Explanation:
- PdfDocument(): Initializes a new PDF document object.
- InsertPage(): Insert a specified page to the new PDF (Page index starts at 0).
- InsertPageRange(): Inserts a range of pages to the new PDF.
- SaveToFile(): Save the combined PDF to the specified output path.
Result: Combine selected pages from three separate PDF files into a new PDF.

Batch Processing: Merge Multiple PDF Files in a Folder
The Python script loops through each source PDF in a specified folder, then appends all pages from the source PDFs to a new PDF file.
import os
from spire.pdf.common import *
from spire.pdf import *
# Specify the directory where the source PDFs are stored
folder = "pdf_folder/"
# Create a new PDF to hold the combined content.
merged_pdf = PdfDocument()
# Loop through each source PDF
for file in os.listdir(folder):
if file.endswith(".pdf"):
pdf = PdfDocument(os.path.join(folder, file))
# Appends all pages from each source PDF to the new PDF
merged_pdf.AppendPage(pdf)
pdf.Close() # Close source PDF
# Save the merged PDF after processing all files
merged_pdf.SaveToFile("BatchCombinePDFs.pdf")
merged_pdf.Close() # Release resources
Frequently Asked Questions
Q1: Is Spire.PDF for Python free?
A: Spire.PDF for Python offers a 30-day free trial with full features. There’s also a free version available but with page limits.
Q2: Can I merge scanned/image-based PDFs?
A: Yes, Spire.PDF handles image-only PDFs. However, OCR/text extraction requires the Spire.OCR for Python library.
Q3: How to add page numbers to the merged PDF?
A: Refer to this comprehensive guide: Add Page Numbers to PDF in Python
Q4: How to reduce the size of the merged PDF?
A: You can compress the high-resolution images and fonts contained in the merged PDF file. A related tutorial: Compress PDF Documents in Python.
Conclusion
Merging PDFs with Python doesn't have to be a complex task. With Spire.PDF for Python, you can efficiently combine multiple PDF files into a single document with just a few lines of code. Whether you need to merge entire documents, specific pages, or a batch merge, this guide outlines step-by-step instructions to help you automate the PDF merging process.
Explore Spire.PDF's online documentation for more PDF prcessing features with Python.