Python: Convert Word to Images

2023-09-14 00:55:04 Written by Koohji

Converting a Word document into images can be a useful and convenient option when you want to share or present the content without worrying about formatting issues or compatibility across devices. By converting a Word document into images, you can ensure that the text, images, and formatting remain intact, making it an ideal solution for sharing documents on social media, websites, or through email. In this article, you will learn how to convert Word to PNG, JPEG or SVG in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Convert Word to PNG or JPEG in Python

Spire.Doc for Python offers the Document.SaveImageToStream() method to convert a certain page into a bitmap image. Afterwards, you can save the bitmap image to a popular image format such as PNG, JPEG, or BMP. The detailed steps are as follows.

  • Create a Document object.
  • Load a Word file using Document.LoadFromFile() method.
  • Retrieve each page in the document, and convert a specific page into a bitmap image using Document.SaveImageToStreams() method.
  • Save the bitmap image into a PNG or JPEG file.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Loop through the pages in the document
for i in range(document.GetPageCount()):

    # Convert a specific page to bitmap image
    imageStream = document.SaveImageToStreams(i, ImageType.Bitmap)

    # Save the bitmap to a PNG file
    with open('Output/ToImage-{0}.png'.format(i),'wb') as imageFile:
        imageFile.write(imageStream.ToArray())

document.Close()

Python: Convert Word to Images

Convert Word to SVG in Python

To convert a Word document into multiple SVG files, you can simply use the Document.SaveToFile() method. Here are the steps.

  • Create a Document object.
  • Load a Word file using Document.LoadFromFile() method.
  • Convert it to individual SVG files using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Convert it to SVG files
document.SaveToFile("output/ToSVG.svg", FileFormat.SVG)

document.Close()

Python: Convert Word to Images

Get a Free License

To fully experience the capabilities of Spire.Doc for Python without any evaluation limitations, you can request a free 30-day trial license.

Watermarks in Word documents serve as overlayed text or pictures that are typically used to indicate documents’ status, confidentiality, draft nature, etc. While they are useful in certain contexts, watermarks often become a hindrance when it comes to presenting documents. They can be distracting, obscuring the readability, and reduce the overall quality of the document. This article will show how to remove watermarks from Word documents in Python programs using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Remove the Watermark from a Word Document

Spire.Doc for Python provides the Document.Watermark property which allows users to deal with the watermark of a Word document. Users can assign a null value to this property to remove the watermark of Word document. The detailed steps are as follows:

  • Create an object of Document class.
  • Load a Word document using Document.LoadFromFile() method.
  • Remove the watermark by assigning a null value to Document.Watermark property.
  • Save the document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of Document class
doc = Document()

# Load a Word document
doc.LoadFromFile("Sample.docx")

# Remove the watermark
doc.Watermark = None

# Save the document
doc.SaveToFile("output/RemoveWatermark.docx", FileFormat.Auto)

Python: Remove Watermarks from Word Documents

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Convert PowerPoint PPT or PPTX to PDF in Python

Looking to convert PowerPoint PPT or PPTX files to PDF using Python? This comprehensive guide walks you through the process of converting PowerPoint to PDF with ease. You'll learn how to perform quick conversions using default settings, as well as explore advanced features such as exporting specific slides, adjusting slide size for optimal output, including hidden slides, and generating PDF/A-compliant files for archival use.

Why Convert PowerPoint to PDF?

Converting PowerPoint files to PDF offers several advantages:

  • Universal Compatibility: PDF files can be opened on virtually any device or operating system without needing PowerPoint installed.
  • Preserved Formatting: Unlike PowerPoint files, PDFs lock in layout, fonts, and images to avoid rendering inconsistencies.
  • Enhanced Security: PDF files can be encrypted, making them ideal for sharing confidential information.
  • Reduced File Size: PDFs often have a smaller file size than PowerPoint presentations, making them easier to share via email or upload online.

Python PowerPoint to PDF Converter Library Installation

To convert PowerPoint presentations to PDF in Python, you can use Spire.Presentation for Python. This powerful library allows you to create, read, modify, and convert PowerPoint PPTX and PPT files without needing Microsoft PowerPoint installed.

Why Choose Spire.Presentation for Python?

  • High Fidelity: Ensures accurate conversion while preserving formatting and layout.
  • User-Friendly: Simple API makes it easy to implement in your projects.
  • Versatile: Supports a wide range of PowerPoint features and formats.

Install Spire.Presentation for Python

Before starting with the conversion process, install Spire.Presentation via pip using the following command:

pip install Spire.Presentation

Need help with installation? Refer to this detailed documentation: How to Install Spire.Presentation for Python on Windows

Convert PowerPoint to PDF with Default Settings

Spire.Presentation makes it easy to convert PowerPoint files to PDF with just a few lines of code.

The example below shows how to load a .pptx or .ppt file and export it to PDF using default settings - ideal for quick conversions where no customization is needed.

from spire.presentation import *

# Create a Presentation object
presentation = Presentation()

# Load a PPTX file
presentation.LoadFromFile("Input.pptx")
# Or load a PPT file
# presentation.LoadFromFile("Input.ppt")

# Save the file as a PDF
presentation.SaveToFile("Basic_conversion.pdf", FileFormat.PDF)
presentation.Dispose()

Python Convert PowerPoint Presentation to PDF

Export PowerPoint to PDF with Advanced Settings

Spire.Presentation provides a range of advanced settings that give you control over how the PDF output is generated, making it ideal for both professional use and archival purposes. For example, you can:

  • Export a Particular Slide to PDF
  • Adjust Slide Size for Optimal PDF Output
  • Include Hidden Slides in the Converted PDF
  • Generate PDF/A-compliant Files from PowerPoint

Export a Particular Slide to PDF

If you only need to share a specific part of your presentation, Spire.Presentation allows you to extract and convert individual slides to a PDF. This is especially useful for generating slide-specific reports or handouts.

from spire.presentation import *

# Create a Presentation object
presentation = Presentation()

# Load the PowerPoint file
presentation.LoadFromFile("Input.pptx")

# Get the desired slide (e.g., the second slide)
slide = presentation.Slides.get_Item(1)

# Save the slide as a PDF
slide.SaveToFile("Single_slide.pdf", FileFormat.PDF)
presentation.Dispose()

Python Convert Individual Slides to PDF

Adjust Slide Size for Optimal PDF Output

To ensure that your PDF meets printing or layout requirements, you can adjust the slide dimensions before conversion. Spire.Presentation lets you set standard slide sizes as well as custom slide dimensions so the output aligns with your document formatting needs.

from spire.presentation import *

# Create a Presentation object
presentation = Presentation()
# Load the PowerPoint file
presentation.LoadFromFile("Input.pptx")

# Set the slide size to a standard slide size like A4
presentation.SlideSize.Type = SlideSizeType.A4

# # Or you can set custom slide size (e.g., 720x540 points)
# presentation.SlideSize.Size = SizeF(720.0, 540.0)

# Fit content to the new slide size
presentation.SlideSizeAutoFit = True

# Save the presentation as a PDF
presentation.SaveToFile("Resized_output.pdf", FileFormat.PDF)
presentation.Dispose()

Python Adjust Slide Size for Optimal PDF Output

Include Hidden Slides in the Converted PDF

By default, hidden slides are excluded from conversion. However, if your workflow requires complete documentation, Spire.Presentation enables you to include hidden slides in the output PDF.

from spire.presentation import *

# Create a Presentation object
presentation = Presentation()

# Load the PowerPoint file
presentation.LoadFromFile("Input.pptx")

# Get the SaveToPdfOption object
option = presentation.SaveToPdfOption

# Enable ContainHiddenSlides option
option.ContainHiddenSlides = True

# Save the presentation as a PDF
presentation.SaveToFile("Include_hidden_slides.pdf", FileFormat.PDF)
presentation.Dispose()

Generate PDF/A-compliant Files from PowerPoint

PDF/A is a specialized format intended for long-term digital preservation. If your organization needs to archive presentations in a standards-compliant format, Spire.Presentation allows you to export PDF/A files that conform to archival best practices.

from spire.presentation import *

# Create a Presentation object
presentation = Presentation()

# Load the PowerPoint file
presentation.LoadFromFile("Input.pptx")

# Get the SaveToPdfOption object
option = presentation.SaveToPdfOption

# Set PDF compliance to PDF/A-1a
option.PdfConformanceLevel = PdfConformanceLevel.Pdf_A1A

# Save the presentation as a PDF
presentation.SaveToFile("Pdf_a_output.pdf", FileFormat.PDF)
presentation.Dispose()

Conclusion

Spire.Presentation for Python offers a robust set of features for converting PowerPoint files to PDF with minimal effort and maximum flexibility. Whether you require simple conversions or advanced customization options, this library gives developers full control over the process. From exporting individual slides to generating archival-quality outputs, it’s a comprehensive tool for PowerPoint-to-PDF conversion workflows in Python.

FAQs

Q1: Can I convert PPTX and PPT files without installing Microsoft PowerPoint?

A1: Yes, Spire.Presentation is a standalone library and does not require Microsoft Office or PowerPoint to be installed.

Q2: Does the library support batch conversion of multiple PowerPoint files?

A2: Yes, you can write scripts to loop through multiple files and convert each to PDF programmatically.

Q3: Is PDF/A-1a the only compliance level supported for PPT to PDF/A conversion?

A3: No, Spire.Presentation supports multiple compliance levels for PPT to PDF/A conversion, including PDF/A-1a, PDF/A-2a, PDF/A-3a, PDF/A-1b, PDF/A-2b, and PDF/A-3b.

Get a Free License

To fully experience the capabilities of Spire.Presentation for Python without any evaluation limitations, you can request a free 30-day trial license.

Python: Insert Images in Word

2023-09-11 01:47:11 Written by Koohji

Images in Word documents can break up large blocks of text and make the content more visually interesting. In addition, they can also effectively illustrate complex ideas or concepts that are difficult to explain solely through text. In this article, you will learn how to programmatically add images to a Word document using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Insert an Image in a Word Document in Python

Spire.Doc for Python offers the Paragraph.AppendPicture() method to insert an image into a Word document. The following are the detailed steps.

  • Create a Document object.
  • Add a section using Document.AddSection() method.
  • Add two paragraphs to the section using Section.AddParagraph() method.
  • Add text to the paragraphs and set formatting.
  • Load an image and add it to a specified paragraph using Paragraph.AppendPicture() method.
  • Set width and height for the image using DocPicture.Width and DocPicture.Height properties.
  • Set a text wrapping style for the image using DocPicture.TextWrappingStyle property.
  • Save the result document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Add a seciton
section = document.AddSection()

# Add a paragraph
paragraph1 = section.AddParagraph()
# Add text to the paragraph and set formatting
tr = paragraph1.AppendText("Spire.Doc for Python is a professional Word Python API specifically designed for developers to create, read, write, convert, and compare Word documents with fast and high-quality performance.")
tr.CharacterFormat.FontName = "Calibri"
tr.CharacterFormat.FontSize = 11
paragraph1.Format.LineSpacing = 18
paragraph1.Format.BeforeSpacing = 10
paragraph1.Format.AfterSpacing = 10

# Add another paragraph
paragraph2 = section.AddParagraph()
tr = paragraph2.AppendText("Spire.Doc for Python enables to perform many Word document processing tasks. It supports Word 97-2003 /2007/2010/2013/2016/2019 and it has the ability to convert them to commonly used file formats like XML, RTF, TXT, XPS, EPUB, EMF, HTML and vice versa. Furthermore, it supports to convert Word Doc/Docx to PDF using Python, Word to SVG, and Word to PostScript in high quality.")
# Add text to the paragraph and set formatting
tr.CharacterFormat.FontName = "Calibri"
tr.CharacterFormat.FontSize = 11
paragraph2.Format.LineSpacing = 18

# Add an image to the specified paragraph
picture = paragraph1.AppendPicture("Spire.Doc.jpg")

# Set image width and height
picture.Width = 100
picture.Height = 100

# Set text wrapping style for the image
picture.TextWrappingStyle = TextWrappingStyle.Square

#Save the result document
document.SaveToFile("InsertImage.docx", FileFormat.Docx)
document.Close()

Python: Insert Images in Word

Insert an Image at a Specified Location in a Word document in Python

If you wish to place the image at a specified location in the Word document, you can set its position through the DocPicture.HorizontalPosition and DocPicture.VerticalPosition properties. The following are the detailed steps.

  • Create a Document object.
  • Add a section using Document.AddSection() method.
  • Add a paragraph to the section using Section.AddParagraph() method.
  • Add text to the paragraph and set formatting.
  • Add an image to the paragraph using Paragraph.AppendPicture() method.
  • Set width and height for the image using DocPicture.Width and DocPicture.Height properties.
  • Set the horizontal position and vertical position for the image using DocPicture.HorizontalPosition and DocPicture.VerticalPosition properties.
  • Set a text wrapping style for the image using DocPicture.TextWrappingStyle property.
  • Save the result document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Add a section
section = doc.AddSection()

# Add a paragraph to the section
paragraph = section.AddParagraph()

# Add text to the paragraph and set formatting
paragraph.AppendText("The sample demonstrates how to insert an image at a specified location in a Word document.")
paragraph.ApplyStyle(BuiltinStyle.Heading2)

# Add an image to the paragraph
picture = paragraph.AppendPicture("pic.jpg")

# Set image position
picture.HorizontalPosition = 150.0
picture.VerticalPosition = 60.0

# Set image size
picture.Width = 120.0
picture.Height = 180.0

# Set a text wrapping style for the image  (note that the position settings are not applicable when the text wrapping style is Inline)
picture.TextWrappingStyle = TextWrappingStyle.Through

# Save the result document
doc.SaveToFile("WordImage.docx", FileFormat.Docx)
doc.Close()

Python: Insert Images in Word

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Text files are a common file type that contain only plain text without any formatting or styles. If you want to apply formatting or add images, charts, tables, and other media elements to text files, one of the recommended solutions is to convert them to Word files.

Conversely, if you want to efficiently extract content or reduce the file size of Word documents, you can convert them to text format. This article will demonstrate how to programmatically convert text files to Word format and convert Word files to text format using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Convert Text (TXT) to Word in Python

Conversion from TXT to Word is quite simple that requires only a few lines of code. The following are the detailed steps.

  • Create a Document object.
  • Load a text file using Document.LoadFromFile(string fileName) method.
  • Save the text file as a Word file using Document.SaveToFile(string fileName, FileFormat fileFormat) method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a TXT file
document.LoadFromFile("input.txt")

# Save the TXT file as Word
document.SaveToFile("TxtToWord.docx", FileFormat.Docx2016)
document.Close()

Python: Convert Text to Word or Word to Text

Convert Word to Text (TXT) in Python

The Document.SaveToFile(string fileName, FileFormat.Txt) method provided by Spire.Doc for Python allows you to export a Word file to text format. The following are the detailed steps.

  • Create a Document object.
  • Load a Word file using Document.LoadFromFile(string fileName) method.
  • Save the Word file in txt format using Document.SaveToFile(string fileName, FileFormat.Txt) method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word file from disk
document.LoadFromFile("Input.docx")

# Save the Word file in txt format
document.SaveToFile("WordToTxt.txt", FileFormat.Txt)
document.Close()

Python: Convert Text to Word or Word to Text

Get a Free License

To fully experience the capabilities of Spire.Doc for Python without any evaluation limitations, you can request a free 30-day trial license.

Python: Add Comments in Excel

2023-09-08 02:37:24 Written by Koohji

Comment in Excel is a function that allows users to add extra details or remarks as explanatory notes. Comments can be in the form of text or images. It enables users to provide additional information to explain or supplement the data in specified cells. After adding a comment, users can view the content of the comment by hovering the mouse over the cell with the comment. This feature enhances the readability and comprehensibility of the document, helping readers better understand and handle the data in Excel. In this article, we will show you how to add comments in Excel by using Spire.XLS for Python.

Install Spire.XLS for Python

This scenario requires Spire.XLS for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.XLS

If you are unsure how to install, please refer to this tutorial: How to Install Spire.XLS for Python on Windows

Add Comment with Text in Excel

Spire.XLS for Python allows users to add comment with text in Excel by calling CellRange.AddComment() method. The following are detailed steps.

  • Create an object of Workbook class.
  • Load an Excel file using Workbook.LoadFromFile() method.
  • Get the first worksheet of this file using Workbook.Worksheets[] property.
  • Get the specified cell by using Worksheet.Range[] property.
  • Set the author and content of the comment and add them to the obtained cell using CellRange.AddComment() method.
  • Set the font of the comment.
  • Save the result file using Workbook.SaveToFile() method.
  • Python
from spire.xls import *
from spire.xls.common import *

inputFile = "sample.xlsx"
outputFile = "CommentWithAuthor.xlsx"
     
#Create an object of Workbook class
workbook = Workbook()

#Load the sample file from disk
workbook.LoadFromFile(inputFile)

#Get the first worksheet
sheet = workbook.Worksheets[0]

#Get the specified cell
range = sheet.Range["B4"]

#Set the author and content of the comment
author = "Jhon"
text = "Emergency task."

#Add comment to the obtained cell 
comment = range.AddComment()
comment.Width = 200
comment.Visible = True
comment.Text = author + ":\n" + text

#Set the font of the comment
font = workbook.CreateFont()
font.FontName = "Tahoma"
font.KnownColor = ExcelColors.Black
font.IsBold = True
comment.RichText.SetFont(0, len(author), font)

#Save the result file
workbook.SaveToFile(outputFile, ExcelVersion.Version2013)
workbook.Dispose()

Python: Add Comments in Excel

Add Comment with Picture in Excel

Additionally, Spire.XLS for Python also enable users to add comment with picture to the specified cell in Excel by using CellRange.AddComment() and ExcelCommentObject.Fill.CustomPicture() methods. The following are detailed steps.

  • Create an object of Workbook class.
  • Get the first worksheet using Workbook.Worksheets[] property.
  • Get the specified cell by using Worksheet.Range[] property and set text for it.
  • Add comment to the obtained cell by using CellRange.AddComment() method.
  • Load an image and fill the comment with it by calling ExcelCommentObject.Fill.CustomPicture() method.
  • Set the height and width of the comment.
  • Save the result file using Workbook.SaveToFile() method.
  • Python
from spire.xls import *
from spire.common import *

inputFile = "logo.png"
outputFile = "CommentWithPicture.xlsx"

#Create an object of Workbook class
workbook = Workbook()

#Get the first worksheet
sheet = workbook.Worksheets[0]

#Get the specified cell and set text for it
range = sheet.Range["C6"]
range.Text = "E-iceblue"

#Add comment to the obtained cell 
comment = range["C6"].AddComment()

#Load an image file and fill the comment with it
image = Image.FromFile(inputFile)
comment.Fill.CustomPicture(image, "logo.png")

#Set the height and width of the comment
comment.Height = image.Height
comment.Width = image.Width
comment.Visible = True

#Save the result file
workbook.SaveToFile(outputFile, ExcelVersion.Version2010)
workbook.Dispose()

Python: Add Comments in Excel

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Hyperlinks are a commonly used tool in Excel that facilitates navigation between different sheets, workbooks, websites, or even specific cells within a worksheet. There are instances where you may need to manage hyperlinks in Excel files, such as extracting hyperlinks for further analysis, modifying existing links, or removing them entirely. In this article, we will introduce how to extract, modify, and remove hyperlinks in Excel in Python using Spire.XLS for Python.

Install Spire.XLS for Python

This scenario requires Spire.XLS for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.XLS

If you are unsure how to install, please refer to this tutorial: How to Install Spire.XLS for Python on Windows

Extract Hyperlinks from Excel in Python

Extracting hyperlinks from an Excel worksheet can be beneficial when you need to analyze or export the link data for further processing.

The following steps demonstrate how to extract hyperlinks from an Excel worksheet in Python using Spire.XLS for Python:

  • Create a Workbook object.
  • Load an Excel file using Workbook.LoadFromFile() method.
  • Get a specific worksheet using Workbook.Worksheets[] property.
  • Get the collection of all hyperlinks in the worksheet using Worksheet.HyperLinks property.
  • Create an empty list to store the extracted hyperlink information.
  • Loop through the hyperlinks in the hyperlink collection.
  • Get the address of each hyperlink using XlsHyperlink.Address property and append the address to the list.
  • Write the addresses in the list into a text file.
  • Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
workbook = Workbook()
# Load an Excel file
workbook.LoadFromFile("Hyperlinks.xlsx")

# Get the first worksheet of the file
sheet = workbook.Worksheets[0]

# Get the hyperlink collection of the worksheet
links = sheet.HyperLinks

# Create an empty list to store the extracted hyperlinks
list = []

# Loop through the hyperlinks in the hyperlink collection
for link in links:
    # Get the address of each hyperlink
    address = link.Address
    # Append the address to the list
    list.append(address)

# Write the extracted hyperlink addresses to a text file
with open("ExtractHyperlinks.txt", "w", encoding = "utf-8") as file:
    for item in list:
        file.write(item + "\n")

workbook.Dispose()

Python: Extract, Modify or Remove Hyperlinks in Excel

Modify Hyperlinks in Excel in Python

Modifying hyperlinks allows you to update URLs or alter the display text to suit your needs.

The following steps demonstrate how to modify an existing hyperlink in an Excel worksheet in Python using Spire.XLS for Python:

  • Create a Workbook object.
  • Load an Excel file using Workbook.LoadFromFile() method.
  • Get a specific worksheet using Workbook.Worksheets[] property.
  • Get a specific hyperlink in the worksheet using Worksheet.HyperLinks[] property.
  • Modify the display text and address of the hyperlink using XlsHyperlink.TextToDisplay and XlsHyperlink.Address properties.
  • Save the resulting file using Workbook.SaveToFile() method.
  • Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
workbook = Workbook()
# Load an Excel file
workbook.LoadFromFile("Hyperlinks.xlsx")

# Get the first worksheet of the file
sheet = workbook.Worksheets[0]

# Get the first hyperlink in the worksheet
link = sheet.HyperLinks[0]

# Change the display text of the hyperlink
link.TextToDisplay = "Spire.XLS for .NET"
# Change the address of the hyperlink
link.Address = "http://www.e-iceblue.com"

# Save the resulting file
workbook.SaveToFile("ModifyHyperlink.xlsx", ExcelVersion.Version2016)

workbook.Dispose()

Python: Extract, Modify or Remove Hyperlinks in Excel

Remove Hyperlinks from Excel in Python

Removing hyperlinks can help eliminate unnecessary links and clean up your spreadsheet.

The following steps demonstrate how to remove a specific hyperlink from an Excel worksheet in Python using Spire.XLS for Python:

  • Create a Workbook object.
  • Load an Excel file using Workbook.LoadFromFile() method.
  • Get a specific worksheet using Workbook.Worksheets[] property.
  • Remove a specific hyperlink from the worksheet using Worksheet.Hyperlinks.RemoveAt() method.
  • Save the resulting file using Workbook.SaveToFile() method.
  • Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
workbook = Workbook()
# Load an Excel file
workbook.LoadFromFile("Hyperlinks.xlsx")

# Get the first worksheet of the file
sheet = workbook.Worksheets[0]

# Remove the first hyperlink and keep its display text
sheet.HyperLinks.RemoveAt(0)

# Save the resulting file
workbook.SaveToFile("RemoveHyperlink.xlsx", ExcelVersion.Version2016)

workbook.Dispose()

Python: Extract, Modify or Remove Hyperlinks in Excel

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

By extracting text from Word documents, you can effortlessly obtain the written information contained within them. This allows for easier manipulation, analysis, and organization of textual content, enabling tasks such as text mining, sentiment analysis, and natural language processing. Extracting images, on the other hand, provides access to visual elements embedded within Word documents, which can be crucial for tasks like image recognition, content extraction, or creating image databases. In this article, you will learn how to extract text and images from a Word document in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Extract Text from a Specific Paragraph in Python

To get a certain paragraph from a section, use Section.Paragraphs[index] property. Then, you can get the text of the paragraph through Paragraph.Text property. The detailed steps are as follows.

  • Create a Document object.
  • Load a Word file using Document.LoadFromFile() method.
  • Get a specific section through Document.Sections[index] property.
  • Get a specific paragraph through Section.Paragraphs[index] property.
  • Get text from the paragraph through Paragraph.Text property.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a Word document
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Get a specific section
section = doc.Sections.get_Item(0)

# Get a specific paragraph
paragraph = section.Paragraphs.get_Item(2)

# Get text from the paragraph
str = paragraph.Text

# Print result
print(str)

Python: Extract Text and Images from Word Documents

Extract Text from an Entire Word Document in Python

If you want to get text from a whole document, you can simply use Document.GetText() method. Below are the steps.

  • Create a Document object.
  • Load a Word file using Document.LoadFromFile() method.
  • Get text from the document using Document.GetText() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a Word file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Get text from the entire document
str = doc.GetText()

# Print result
print(str)

Python: Extract Text and Images from Word Documents

Extract Images from an Entire Word Document in Python

Spire.Doc for Python does not provide a straightforward method to get images from a Word document. You need to iterate through the child objects in the document, and determine if a certain a child object is a DocPicture. If yes, you get the image data using DocPicture.ImageBytes property and then save it as a popular image format file. The main steps are as follows.

  • Create a Document object.
  • Load a Word file using Document.LoadFromFile() method.
  • Loop through the child objects in the document.
  • Determine if a specific child object is a DocPicture. If yes, get the image data through DocPicture.ImageBytes property.
  • Write the image data as a PNG file.
  • Python
import queue
from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a Word file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Create a Queue object
nodes = queue.Queue()
nodes.put(doc)

# Create a list
images = []

while nodes.qsize() > 0:
    node = nodes.get()

    # Loop through the child objects in the document
    for i in range(node.ChildObjects.Count):
        child = node.ChildObjects.get_Item(i)

        # Determine if a child object is a picture
        if child.DocumentObjectType == DocumentObjectType.Picture:
            picture = child if isinstance(child, DocPicture) else None
            dataBytes = picture.ImageBytes

            # Add the image data to the list 
            images.append(dataBytes)
         
        elif isinstance(child, ICompositeObject):
            nodes.put(child if isinstance(child, ICompositeObject) else None)

# Loop through the images in the list
for i, item in enumerate(images):
    fileName = "Image-{}.png".format(i)
    with open("ExtractedImages/"+fileName,'wb') as imageFile:

        # Write the image to a specified path
        imageFile.write(item)
doc.Close()

Python: Extract Text and Images from Word Documents

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Automate PDF to Excel in Python with Spire.PDF

Converting PDF files to Excel spreadsheets in Python is an effective way to extract structured data for analysis, reporting, and automation. While PDFs are excellent for preserving layout across platforms, their static format often makes data extraction challenging.

Excel, on the other hand, provides robust features for sorting, filtering, calculating, and visualizing data. By using Python along with the Spire.PDF for Python library, you can automate the entire PDF to Excel conversion process — from basic one-page documents to complex, multi-page PDFs.

Whether you're automating data extraction from PDFs or integrating PDF content into Excel workflows, this tutorial will walk you through both quick-start and advanced methods for reliable Python PDF to Excel conversion.

Table of Contents

Why Convert PDF to Excel Programmatically in Python

PDFs are ideal for sharing documents with consistent formatting, but their fixed structure makes them difficult to analyze or reuse, especially if they contain tables.

Converting PDF to Excel allows you to:

  • Extract tabular data for analysis or visualization
  • Automate monthly or recurring report extraction
  • Enable downstream processing in Excel
  • Save hours of manual copy-pasting

Using Python for this task adds automation, flexibility, and scalability — ideal for integration into data pipelines or backend services.

Setting Up Your Development Environment

Before you start converting PDF files to Excel using Python, it’s essential to set up your development environment properly. This ensures you have all the necessary tools and libraries installed to follow the tutorial smoothly.

Install Python

If you haven’t already installed Python on your system, download and install the latest version from the official website.

Make sure to add Python to your system PATH during installation to run Python commands from the terminal or command prompt easily.

Install Spire.PDF for Python

Spire.PDF for Python is the core library used in this tutorial to load, manipulate, and convert PDF documents.

To install it, open your terminal and run:

pip install Spire.PDF

This command downloads and installs Spire.PDF along with any required dependencies.

If you encounter any issues or need detailed installation help, please refer to our step-by-step guide: How to Install Spire.PDF for Python on Windows

Quick Start: Convert PDF to Excel in Python

If your PDF has a clean and simple layout without complex formatting or multiple page structures, you can convert it directly to Excel with just 3 lines of code using Spire.PDF for Python.

Steps to Quickly Export PDF to Excel

Follow these straightforward steps to export your PDF file to an Excel spreadsheet in Python:

  • Import the required classes.
  • Create a PdfDocument object.
  • Load your PDF file with the LoadFromFile method.
  • Export the PDF to Excel (.xlsx) format using the SaveToFile method and specify FileFormat.XLSX as the output format.

Code Example

from spire.pdf import *

# Create a PdfDocument object
pdf = PdfDocument()

# Load your PDF file
pdf.LoadFromFile("Sample.pdf")

# Convert and save the PDF to Excel
pdf.SaveToFile("output.xlsx", FileFormat.XLSX)

# Close the document
pdf.Close()

Advanced PDF to Excel Conversion with Layout Control and Formatting Options

For more complex PDF documents—such as those containing multiple pages, rotated text, table cells with multiple lines of text, or overlapping content - you can use the XlsxLineLayoutOptions class to gain precise control over the conversion process.

This allows you to preserve the original structure and formatting of your PDF more accurately when exporting to Excel.

Layout Options You Can Configure

The XlsxLineLayoutOptions class in Spire.PDF provides several properties that give you granular control over how PDF content is exported to Excel. Below is a breakdown of each option and its behavior:

Option Description
convertToMultipleSheet Determines whether to convert each PDF page into a separate worksheet. The default value is true.
rotatedText Specifies whether to preserve the original rotation of angled text. The default value is true.
splitCell Determines whether to split a PDF table cell with multiple lines of text into separate rows in the Excel output. The default value is true.
wrapText Determines whether to enable word wrap inside Excel cells. The default value is true.
overlapText Specifies whether text overlapping in the original PDF should be preserved in the Excel output. The default value is false.

Code Example

from spire.pdf import *

# Create a PdfDocument object
pdf = PdfDocument()

# Load your PDF file
pdf.LoadFromFile("Sample.pdf")

# Define layout options
# Parameters: convertToMultipleSheet, rotatedText, splitCell, wrapText, overlapText
layout_options = XlsxLineLayoutOptions(True, True, False, True, False)

# Apply layout options
pdf.ConvertOptions.SetPdfToXlsxOptions(layout_options)

# Convert and save the PDF to Excel
pdf.SaveToFile("advanced_output.xlsx", FileFormat.XLSX)

# Close the document
pdf.Close()

Advanced PDF to Excel conversion example in Python with Spire.PDF

Conclusion

Converting PDF files to Excel in Python is an efficient way to automate data extraction and processing tasks. Whether you need a quick conversion or fine-grained layout control, Spire.PDF for Python offers flexible options that scale from simple to complex scenarios.

Ready to automate your PDF to Excel conversions?
Get a free trial license for Spire.PDF for Python and explore the full Spire.PDF Documentation to get started today!

FAQs

Q1: Can I convert each PDF page into a separate Excel worksheet?

A1: Yes. Use the convertToMultipleSheet=True option in the XlsxLineLayoutOptions class to export each page to its own sheet.

Q2. What Excel format does Spire.PDF export to?

A2: Spire.PDF converts PDFs to .xlsx, the modern Excel format supported by Excel 2007 and later.

Q3: Can I convert a PDF to Excel in Python without losing formatting?

A3: Yes. Using Spire.PDF for Python, you can retain the original formatting, including merged cells, cell background colors, and other format settings when saving PDFs to Excel.

Q4: Can I extract only a specific table from a PDF to Excel instead of converting the whole document?

A4: Yes, Spire.PDF for Python supports extracting specific tables from PDF files. You can then write the extracted table data to Excel using our Excel processing library - Spire.XLS for Python.

Merging PDF files is a common task in many applications, from combining report sections to creating comprehensive document collections. For developers, using Python to merge PDF files programmatically can significantly streamline the process and help build automated workflows.

This article explores how to merge PDFs in Python using Spire.PDF for Python - a robust library designed for efficient PDF manipulation.

Visual guide of Python Merge PDF

Table of Contents:

5 Reasons Why You Should Use Python to Combine PDFs

While GUI tools like Adobe Acrobat offer PDF merging capabilities, Python provides distinct advantages for developers and enterprises. Python’s PDF merging feature shines when you need to:

  • Process documents in bulk
  • Schedule scripts to run automatically (e.g., daily report merging).
  • Integrate with data workflows
  • Implement business-specific logic
  • Deploy in server/cloud environments

Step-by-Step: Merge PDF Files in Python

Step 1: Install Spire.PDF for Python

Before you can start combining PDFs with Spire.PDF for Python, you need to install the library. You can do this using pip, the Python package manager. Open your terminal and run the following command:

pip install Spire.PDF

Step 2: Merge Multiple PDF Files into One

Now, let's dive into the Python code for merging multiple PDF files into a single PDF.

1. Import the Required Classes

First, import the necessary classes from the Spire.PDF library:

from spire.pdf.common import *
from spire.pdf import *

2. Define Paths of PDFs to Merge

Define three PDF file paths and stored them in a list. You can modify these paths or adjust the number of files according to your needs.

inputFile1 = "Sample1.pdf"
inputFile2 = "Sample2.pdf"
inputFile3 = "Sample3.pdf"
files = [inputFile1, inputFile2, inputFile3]

3. Merge PDF Files

The MergeFiles() method combines all PDFs in the list into a new PDF document object.

pdf = PdfDocument.MergeFiles(files)

4. Save the Merged PDF Finally, save the combined PDF to a specified output path.

pdf.Save("output/MergePDF.pdf", FileFormat.PDF)

Full Python Code to Combine PDFs:

from spire.pdf.common import *
from spire.pdf import *

# Create a list of the PDF file paths
inputFile1 = "Sample1.pdf"
inputFile2 = "Sample2.pdf"
inputFile3 = "Sample3.pdf"
files = [inputFile1, inputFile2, inputFile3]

# Merge the PDF documents
pdf = PdfDocument.MergeFiles(files)

# Save the result document
pdf.Save("output/MergePDF.pdf", FileFormat.PDF)
pdf.Close()

Result: Combine three PDF files (total of 6 pages) into one PDF file.

Merge multiple PDF files into a single PDF

Advanced: Merge Selected Pages from PDFs in Python

In some cases, you may only want to merge specific pages of multiple PDFs. Spire.PDF for Python makes this easy by allowing you to select pages from different PDF documents and insert them into a new PDF file.

from spire.pdf import *
from spire.pdf.common import *

# Create a list of the PDF file paths
file1 = "Sample1.pdf"
file2 = "Sample2.pdf"
file3 = "Sample3.pdf"
files = [file1, file2, file3]

# Load each PDF file as an PdfDocument object and add them to a list
pdfs = []
for file in files:
    pdfs.append(PdfDocument(file))

# Create an object of PdfDocument class
newPdf = PdfDocument()

# Insert the selected pages from the loaded PDF documents into the new document
newPdf.InsertPage(pdfs[0], 0)
newPdf.InsertPage(pdfs[1], 1)
newPdf.InsertPageRange(pdfs[2], 0, 1)

# Save the new PDF document
newPdf.SaveToFile("output/SelectedPages.pdf")

Explanation:

  • PdfDocument(): Initializes a new PDF document object.
  • InsertPage(): Insert a specified page to the new PDF (Page index starts at 0).
  • InsertPageRange(): Inserts a range of pages to the new PDF.
  • SaveToFile(): Save the combined PDF to the specified output path.

Result: Combine selected pages from three separate PDF files into a new PDF.

Combine pages from different PDFs to a new PDF

Batch Processing: Merge Multiple PDF Files in a Folder

The Python script loops through each source PDF in a specified folder, then appends all pages from the source PDFs to a new PDF file.

import os 
from spire.pdf.common import *
from spire.pdf import *

# Specify the directory where the source PDFs are stored
folder = "pdf_folder/"  

# Create a new PDF to hold the combined content.
merged_pdf = PdfDocument()  

# Loop through each source PDF
for file in os.listdir(folder):  
    if file.endswith(".pdf"):  
        pdf = PdfDocument(os.path.join(folder, file))  
        # Appends all pages from each source PDF to the new PDF
        merged_pdf.AppendPage(pdf)  
        pdf.Close()  # Close source PDF

# Save the merged PDF after processing all files
merged_pdf.SaveToFile("BatchCombinePDFs.pdf")
merged_pdf.Close()  # Release resources 

Frequently Asked Questions

Q1: Is Spire.PDF for Python free?

A: Spire.PDF for Python offers a 30-day free trial with full features. There’s also a free version available but with page limits.

Q2: Can I merge scanned/image-based PDFs?

A: Yes, Spire.PDF handles image-only PDFs. However, OCR/text extraction requires the Spire.OCR for Python library.

Q3: How to add page numbers to the merged PDF?

A: Refer to this comprehensive guide: Add Page Numbers to PDF in Python

Q4: How to reduce the size of the merged PDF?

A: You can compress the high-resolution images and fonts contained in the merged PDF file. A related tutorial: Compress PDF Documents in Python.


Conclusion

Merging PDFs with Python doesn't have to be a complex task. With Spire.PDF for Python, you can efficiently combine multiple PDF files into a single document with just a few lines of code. Whether you need to merge entire documents, specific pages, or a batch merge, this guide outlines step-by-step instructions to help you automate the PDF merging process.

Explore Spire.PDF's online documentation for more PDF prcessing features with Python.

Page 24 of 26
page 24