In the realm of document management, the ability to add headers and footers to PDFs has become an essential feature. This functionality allows individuals and businesses to enhance the visual appeal, branding, and organization of their PDF documents.

By incorporating headers and footers, users can customize their PDFs with important contextual information, such as document titles, page numbers, logos, dates, copyright notices, or confidentiality disclaimers. This not only helps establish a professional look but also improves document navigation and ensures compliance with legal requirements.

In this article, we will delve into the process of seamlessly integrating headers and footers into existing PDF files by using the Spire.PDF for Python library.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows

Coordinate System in an Existing PDF

When using Spire.PDF for Python to manipulate an existing PDF document, the coordinate system's origin is positioned at the top left corner of the page. The x-axis extends to the right, while the y-axis extends downward.

Python: Add Header and Footer to an Existing PDF Document

Understanding coordinate system is crucial for us, as nearly all newly added elements on a PDF page need to be positioned using specified coordinates. The process of creating headers and footers on PDF pages involves adding text, images, shapes, automatic fields, or other elements to the upper or lower margins of the page at designated coordinates.

Classes and Methods for Creating Header and Footer

In Spire.PDF for Python, there are several methods available for drawing elements on a PDF page. The PdfCanvas class provides the methods DrawString(), DrawImage(), and DrawLine(), which allow users to draw strings, images, and lines respectively, at specific coordinates on the page.

Additionally, Spire.PDF for Python offers specialized classes such as PdfPageNumberField, PdfPageCountField, and PdfSectionNumberField. These classes enable automatic access to the current page number, page count, and section number. Moreover, these classes include the Draw() method, which facilitates the easy addition of dynamic information to the header or footer section of the PDF document.

Add Header to an Existing PDF Document in Python

A header refers to a section that appears at the top of each page. The header typically contains information such as a logo, document title, date, or any other relevant details that provide context or branding to the document.

To add a header consisting of text, an image, a line and a section number to a PDF document, you can follow these steps:

  • Create a PdfDocuemnt object.
  • Load an existing PDF document from the specified path.
  • Define the header content:
    • Specify the text to be added to the header.
    • Load an image for the header.
    • Create a PdfSectionNumberField object to get the current section number, and create a PdfCompositeField object to combine text and the section number in a single field.
  • Add the header to each page: Iterate through each page of the PDF document and add the header content at the designated position by using the Canvas.DrawString(), Canvas.DrawImage(), Canvas.DrawLine(), and PdfCompositeField.Draw() methods. When calling these methods, you need to consider the page size and margins when determining the position.
  • Save the modified PDF to a new file or overwrite the existing file.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Terms of service.pdf")

# Load an image
headerImage = PdfImage.FromFile("C:\\Users\\Administrator\\Desktop\\Logo-Small.png")

# Get the image width in pixel
width = headerImage.Width

# Get the image width in point
unitCvtr = PdfUnitConvertor()
pointWidth = unitCvtr.ConvertUnits(width, PdfGraphicsUnit.Pixel, PdfGraphicsUnit.Point)

# Create font, brush and pen
firstFont = PdfTrueTypeFont("Times New Roman", 18.0, PdfFontStyle.Bold, True)
secondFont = PdfTrueTypeFont("Times New Roman", 12.0, PdfFontStyle.Regular, True)
brush = PdfBrushes.get_DarkBlue()
pen = PdfPen(PdfBrushes.get_Black(), 1.5)

# Specify text to add to header
headerText = "TERMS OF SERVICE"

# Create a PdfSectionNumberField object
sectionField = PdfSectionNumberField(firstFont, brush)

# Create a PdfCompositeField object 
compositeField = PdfCompositeField(secondFont, brush, "Section: [{0}]", [sectionField])

# Set the location of the composite field
compositeField.Location = PointF(72.0, 45.0)

# Iterate throuh the pages in the document
for i in range(doc.Pages.Count):

    # Get a specific page
    page = doc.Pages[i]  

    # Draw an image at the specified position
    page.Canvas.DrawImage(headerImage, page.ActualSize.Width - pointWidth - 72.0, 20.0)

    # Draw a string at the specified position 
    page.Canvas.DrawString(headerText, firstFont, brush, 72.0, 25.0)

    # Draw a line at the specified position
    page.Canvas.DrawLine(pen, 72.0, 65.0, page.ActualSize.Width - 72.0, 65.0)

    # Draw composite on the page
    compositeField.Draw(page.Canvas, 0.0, 0.0)

# Save the changes to a different PDF file
doc.SaveToFile("Output/AddHeader.pdf")

# Dispose resources
doc.Dispose()

Python: Add Header and Footer to an Existing PDF Document

Add Footer to an Existing PDF Document in Python

A footer refers to a section that appears at the bottom of each page. The footer may contain information such as page numbers, copyright information, author name, date, or any other relevant details that provide additional context or navigation aids to the reader.

To add a footer which includes a line and "Page X of Y" to a PDF document, follow the steps below.

  • Create a PdfDocuemnt object.
  • Load an existing PDF document from the specified path.
  • Define the footer content: Create a PdfPageNumberField object to get the current page number, and a PdfPageCountField object to get the total page count. In order to create a "Page X of Y" format, you can utilize a PdfCompositeField object to combine text and these two automatic fields in a single field.
  • Add the footer to each page: Iterate through each page of the PDF document and add a line using the Canvas.DrawLine() method. Add the page number and page count to the footer space using the PdfCompositeField.Draw() method. When calling these methods, you need to consider the page size and margins when determining the position.
  • Save the modified PDF to a new file or overwrite the existing file.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Terms of service.pdf")

# Create font, brush and pen
font = PdfTrueTypeFont("Times New Roman", 12.0, PdfFontStyle.Bold, True)
brush = PdfBrushes.get_Black()
pen = PdfPen(brush, 1.5)

# Create a PdfPageNumberField object and a PdfPageCountField object
pageNumberField = PdfPageNumberField()
pageCountField = PdfPageCountField()

# Create a PdfCompositeField object to combine page count field and page number field in a single string
compositeField = PdfCompositeField(font, brush, "Page {0} of {1}", [pageNumberField, pageCountField])

# Get the page size
pageSize = doc.Pages[0].Size

# Set the location of the composite field
compositeField.Location = PointF(72.0, pageSize.Height - 45.0)

# Iterate through the pages in the document
for i in range(doc.Pages.Count):

    # Get a specific page
    page = doc.Pages[i]

    # Draw a line at the specified position
    page.Canvas.DrawLine(pen, 72.0, pageSize.Height - 50.0, pageSize.Width - 72.0, pageSize.Height - 50.0)

    # Draw the composite field on the page
    compositeField.Draw(page.Canvas, 0.0, 0.0)

# Save to a different PDF file
doc.SaveToFile("Output/AddFooter.pdf")

# Dispose resources
doc.Dispose()

Python: Add Header and Footer to an Existing PDF Document

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Efficient document organization and navigability are crucial for lengthy Word documents. One powerful way to streamline document readability and accessibility is by incorporating a table of contents (TOC) into a Word document, which allows readers to quickly locate specific sections and jump to relevant content. By harnessing the capabilities of Python, users can effortlessly generate a table of contents that dynamically updates as the document evolves. This article provides a step-by-step guide and code examples for inserting a table of contents into a Word document in Python programs using Spire.Doc for Python, empowering users to create professional-looking documents with ease.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to: How to Install Spire.Doc for Python on Windows

Insert the Default Table of Contents into a Word Document

Spire.Doc for Python supports inserting a table of contents in a Word document based on the headings of different levels. If the document does not have heading levels set, developers can set the heading levels using the Paragraph.ApplyStyle(BuiltinStyle) method before inserting a table of contents.

By using the Paragraph.AppendTOC(lowerLevel: int, upperLevel: int) method, developers can insert a table of contents at any paragraph and specify the titles to be displayed. It is important to note that after inserting the table of contents, developers need to use the Document.UpdateTableOfContents() method to update the table of contents so that its contents are displayed correctly.

  • Create an object of Document class and load a Word document using Document.LoadFromFile() method.
  • Add a section using Document.AddSection() method, add a paragraph to the section using Section.AddParagraph() method, and insert the new section after the cover section using Document.Sections.Insert(index: int, entity: Section) method.
  • Update the table of contents using Document.UpdateTableOfContents() method.
  • Save the document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of Document class
doc = Document()

# Load a Word document
doc.LoadFromFile("Sample.docx")

# Create a section for the table of contents
section = doc.AddSection()

# Add a paragraph in the section
paragraph = section.AddParagraph()

# Append a table of contents in the paragraph
paragraph.AppendTOC(1, 2)

# Insert the section after the cover section
doc.Sections.Insert(1, section)

# Update the table of contents
doc.UpdateTableOfContents()

# Save the document
doc.SaveToFile("output/DefaultTOC.docx")
doc.Close()

Python: Insert a Table of Contents into a Word Document

Insert a Custom Table of Contents into a Word Document

Developers can also create a table of contents by initializing a TableOfContent object, and customize it through switches. For example, the switch "{\\o \"1-2\" \\n 1-1}" indicates showing headings from level one to level three in the table of contents and omitting the page numbers of level one headings. The detailed steps for inserting a customized table of contents into a Word document are as follows:

  • Create an object of Document class and load a Word document using Document.LoadFromFile() method.
  • Add a section to the document using Document.AddSecction() method, add a paragraph to the section using Section.AddParagraph() method, and insert the section after the cover section using Document.Sections.Insert() method.
  • Create an object of TableOfContents class and insert it into the added paragraph using Paragraph.Items.Add() method.
  • Append field separator and field end mark to end the TOC filed using Paragraph.AppendFieldMark() method.
  • Set the created table of contents as the table of contents of the document through Document.TOC property.
  • Save the document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of Document class and load a Word document
doc = Document()
doc.LoadFromFile("Sample.docx")

# Add a section and a paragraph and insert the section after the cover section
section = doc.AddSection()
paragraph = section.AddParagraph()
doc.Sections.Insert(1, section)

# Customize a table of contents with switches
toc = TableOfContent(doc, "{\\o \"1-2\" \\n 1-1}")

# Insert the TOC to the paragraph
paragraph.Items.Add(toc)

# Insert field separator and filed end mark to end the TOC field
paragraph.AppendFieldMark(FieldMarkType.FieldSeparator)
paragraph.AppendFieldMark(FieldMarkType.FieldEnd)

# Set the TOC field as the table of contents of the document
doc.TOC = toc

# Update the TOC
doc.UpdateTableOfContents()

# Save the document
doc.SaveToFile("output/CustomizedTOC.docx")
doc.Close()

Python: Insert a Table of Contents into a Word Document

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Converting Word documents to XPS, PostScript, and OFD documents is of significant importance. Firstly, this conversion makes it easier to share and display documents across different platforms and applications, as these formats typically have broader compatibility.

Secondly, converting to these formats can preserve the document's formatting, layout, and content, ensuring consistent display across different systems.

Additionally, XPS and OFD formats support high-quality printing, helping to maintain the visual appearance and print quality of the document. The PostScript format is commonly used for printing and graphic processing, converting to PostScript can ensure that the document maintains high quality when printed.

In this article, you will learn how to convert Word to XPS, PostScript, or OFD with Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Convert Word to XPS in Python

The Document.SaveToFile(filename:str, FileFormat.XPS) method provided by Spire.Doc for Python can convert a Word document to XPS format. The detailed steps are as follows:

  • Create an object of the Document class.
  • Use the Document.LoadFromFile() method to load the Word document.
  • Use the Document.SaveToFile(filename:str, FileFormat.XPS) method to convert the Word document to an XPS document.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a Word document
doc.LoadFromFile("Sample.docx")

# Save the loaded document as an XPS document
doc.SaveToFile("Result.xps", FileFormat.XPS)

# Close the document object and release the resources occupied by the document object
doc.Close()
doc.Dispose()

Python: Convert Word to XPS, PostScript, or OFD

Convert Word to PostScript in Python

With Document.SaveToFile(filename:str, FileFormat.PostScript) method in Spire.Doc for Python, you can convert a Word document to PostScript format. The detailed steps are as follows:

  • Create an object of the Document class.
  • Use the Document.LoadFromFile() method to load the Word document.
  • Use the Document.SaveToFile(filename:str, FileFormat.PostScript) method to convert the Word document to a PostScript document.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a Word document
doc.LoadFromFile("Sample.docx")

# # Save the loaded document as a PostScript document
doc.SaveToFile("Result.ps", FileFormat.PostScript)

# Close the document object and release the resources occupied by the document object
doc.Close()
doc.Dispose()

Python: Convert Word to XPS, PostScript, or OFD

Convert Word to OFD in Python

By utilizing the Document.SaveToFile() method in the Spire.Doc for Python library and specifying the file format as FileFormat.OFD, you can save a Word document as an OFD file format. The detailed steps are as follows:

  • Create an object of the Document class.
  • Use the Document.LoadFromFile() method to load the Word document.
  • Use the Document.SaveToFile(filename:str, FileFormat.OFD) method to convert the Word document to an OFD document.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a Word document
doc.LoadFromFile("Sample.docx")

# Save the loaded document as an OFD document
doc.SaveToFile("Result.ofd", FileFormat.OFD)

# Close the document object and release the resources occupied by the document object
doc.Close()
doc.Dispose()

Python: Convert Word to XPS, PostScript, or OFD

Get a Free License

To fully experience the capabilities of Spire.Doc for Python without any evaluation limitations, you can request a free 30-day trial license.

Python: Convert PDF to XPS

2024-02-28 06:49:31 Written by Koohji

XPS, or XML Paper Specification, is a file format developed by Microsoft as an alternative to PDF (Portable Document Format). Similar to PDF, XPS is specifically designed to preserve the visual appearance and layout of documents across different platforms and devices, ensuring consistent viewing regardless of the software or hardware being used.

Converting PDF files to XPS format offers several notable benefits. Firstly, XPS files are fully supported within the Windows ecosystem. If you work in a Microsoft-centric environment that heavily relies on Windows operating systems and Microsoft applications, converting PDF files to XPS guarantees smooth compatibility and an optimized viewing experience tailored to the Windows platform.

Secondly, XPS files are optimized for printing, ensuring precise reproduction of the document on paper. This makes XPS the preferred format when high-quality printed copies of the document are required.

Lastly, XPS files are based on XML, a widely adopted standard for structured data representation. This XML foundation enables easy extraction and manipulation of content within the files, as well as seamless integration of file content with other XML-based workflows or systems.

In this article, we will demonstrate how to convert PDF files to XPS format in Python using Spire.PDF for Python.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows

Convert PDF to XPS in Python

Converting a PDF file to the XPS file format is very easy with Spire.PDF for Python. Simply load the PDF file using the PdfDocument.LoadFromFile() method, and then save the PDF file to the XPS file format using the PdfDocument.SaveToFile(filename:str, fileFormat:FileFormat) method. The detailed steps are as follows:

  • Create an object of the PdfDocument class.
  • Load the sample PDF file using the PdfDocument.LoadFromFile() method.
  • Save the PDF file to the XPS file format using the PdfDocument.SaveToFile (filename:str, fileFormat:FileFormat) method.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Specify the input and output file paths
inputFile = "sample.pdf"
outputFile = "ToXPS.xps"

# Create an object of the PdfDocument class
pdf = PdfDocument()
# Load the sample PDF file
pdf.LoadFromFile(inputFile)

# Save the PDF file to the XPS file format
pdf.SaveToFile(outputFile, FileFormat.XPS)
# Close the PdfDocument object
pdf.Close()

Python: Convert PDF to XPS

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Python: Split or Merge PDF Pages

2024-02-26 01:36:28 Written by Koohji

Modifying PDF documents to suit various usage scenarios is a common task for PDF document creators and managers. Among these operations, splitting and merging PDF pages can assist in reorganizing PDF content for printing, typesetting, etc. By using Python programs, developers can easily split one page from a PDF document to several pages or merge multiple PDF pages into a single page. This article will demonstrate how to use Spire.PDF for Python for splitting and merging PDF pages in Python programs.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to: How to Install Spire.PDF for Python on Windows

Split One PDF Page into Several PDF Pages with Python

With Spire.PDF for Python, developers draw a PDF page on a new PDF page using the PdfPageBase.CreateTemplate().Draw(newPage PdfPageBase, PointF) method. When drawing, if the current new page cannot fully accommodate the content of the original page, a new page is automatically created, and the remaining content is drawn on it. Therefore, we can create a new PDF document and control the drawing result by specifying the page size to achieve specified division of PDF pages horizontally or vertically.

Here are the steps to vertically split a PDF page into two separate PDF pages:

  • Create an object of PdfDocument class and load a PDF document using PdfDocument.LoadFromFile() method.
  • Get the first page of the document using PdfDocument.Pages.get_Item() method.
  • Create a new PDF document by creating an object of PdfDocument class.
  • Set the margins of the new document to 0 through PdfDocument.PageSettings.Margins.All property.
  • Get the width and height of the retrieved page through PdfPageBase.Size.Width property and PdfPageBase.Size.Height property.
  • Set the width of the new PDF document to the same as the retrieved page through PdfDocument.PageSettings.Width property and its height to half of the retrieved page's height through PdfDocument.PageSettings.Height property.
  • Add a new page in the new document using PdfDocument.Pages.Add() method.
  • Draw the content of the retrieved page onto the new page using PdfPageBase.CreateTemplate().Draw() method.
  • Save the new document using PdfDocument.SaveToFile() method.
  • Python
from spire.pdf import *
from spire.pdf.common import *

# Create an object of PdfDocument class and load a PDF document
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")

# Get the first page of the document
page = pdf.Pages.get_Item(0)

# Create a new PDF document
newPdf = PdfDocument()

# Set the margins of the new PDF document to 0
newPdf.PageSettings.Margins.All = 0.0

# Get the width and height of the retrieved page
width = page.Size.Width
height = page.Size.Height

# Set the width of the new PDF document to the same as the retrieved page and its height to half of the retrieved page's height
newPdf.PageSettings.Width = width
newPdf.PageSettings.Height = height / 2

# Add a new page to the new PDF document
newPage = newPdf.Pages.Add()

# Draw the content of the retrieved page onto the new page
page.CreateTemplate().Draw(newPage, PointF(0.0, 0.0))

# Save the new PDF document
newPdf.SaveToFile("output/SplitPDFPage.pdf")
pdf.Close()
newPdf.Close()

Python: Split or Merge PDF Pages

Merge Multiple PDF Pages into a Single Page with Python

Similarly, developers can merge PDF pages by drawing different pages on the same PDF page. It should be noted that the pages to be merged are preferably in the same width or height, otherwise it is necessary to take the maximum value to ensure correct drawing.

The detailed steps for merging two PDF pages into a single PDF page are as follows:

  • Create an object of PdfDocument class and load a PDF document using PdfDocument.LoadFromFile() method.
  • Get the first and second pages of the document using PdfDocument.Pages.get_Item() method.
  • Create a new PDF document by creating an object of PdfDocument class.
  • Set the margins of the new document to 0 through PdfDocument.PageSettings.Margins.All property.
  • Get the width and height of the two retrieved pages through PdfPageBase.Size.Width property and PdfPageBase.Size.Height property.
  • Set the width of the new PDF document to the same as the retrieved pages through PdfDocument.PageSettings.Width property and its height to the sum of the two retrieved pages' heights through PdfDocument.PageSettings.Height property.
  • Draw the content of the two retrieved pages onto the new page using PdfPageBase.CreateTemplate().Draw() method.
  • Save the new document using PdfDocument.SaveToFile() method.
  • Python
from spire.pdf import *
from spire.pdf.common import *

# Create an object of PdfDocument class and load a PDF document
pdf = PdfDocument()
pdf.LoadFromFile("Sample1.pdf")

# Get the first page and the second page of the document
page = pdf.Pages.get_Item(0)
page1 = pdf.Pages.get_Item(0)

# Create a new PDF document
newPdf = PdfDocument()

# Set the margins of the new PDF document to 0
newPdf.PageSettings.Margins.All = 0.0

# Set the page width of the new document to the same as the retrieved page
newPdf.PageSettings.Width = page.Size.Width

# Set the page height of the new document to the sum of the heights of the two retrieved pages
newPdf.PageSettings.Height = page.Size.Height + page1.Size.Height

# Add a new page to the new PDF document
newPage = newPdf.Pages.Add()

# Draw the content of the retrieved pages onto the new page
page.CreateTemplate().Draw(newPage, PointF(0.0, 0.0))
page1.CreateTemplate().Draw(newPage, PointF(0.0, page.Size.Height))

# Save the new document
newPdf.SaveToFile("output/MergePDFPages.pdf")
pdf.Close()
newPdf.Close()

Python: Split or Merge PDF Pages

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Managing document properties in PowerPoint is an essential aspect of presentation creation. These properties serve as metadata that provides important information about the file, such as the author, subject, and keywords. By being able to add, retrieve, or remove document properties, users gain control over the organization and customization of their presentations. Whether it's adding relevant tags for easy categorization, accessing authorship details, or removing sensitive data, effectively managing document properties in PowerPoint ensures seamless collaboration and professionalism in your slide decks.

In this article, you will learn how to add, read, and remove document properties in a PowerPoint file in Python by using the Spire.Presentation for Python library.

Install Spire.Presentation for Python

This scenario requires Spire.Presentation for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.

pip install Spire.Presentation

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Presentation for Python on Windows

Prerequisite Knowledge

Document properties can be divided into two types: standard document properties and custom document properties.

  • Standard document properties are pre-defined properties that are commonly used across various PowerPoint presentations. Some examples of standard document properties include title, author, subject, keywords and company. Standard document properties are useful for providing general information and metadata about the presentation.
  • Custom document properties are user-defined properties that allow you to add specific information to a PowerPoint presentation. Unlike standard document properties, custom properties are not predefined and can be tailored to suit your specific needs. Custom properties usually provide information relevant to your presentation that may not be covered by the default properties.

Spire.Presentation for Python offers the DocumentProperty class to work with both standard document properties and custom document properties. The standard document properties can be accessed using the properties like Title, Subject, Author, Manager, Company, etc. of the DocumentProperty class. To add or retrieve custom properties, you can use the set_Item() method and the GetPropertyName() method of the DocumentProperty class.

Add Document Properties to a PowerPoint File in Python

To add or change the standard document properties, you can assign values to the DocumentProperty.Title proerpty, DocumentProperty.Subject property and other similar properties. To add custom properties to a presentation, use the DocumentProperty.set_Item(name: str, value: SpireObject) method. The detailed steps are as follows.

  • Create a Presentation object.
  • Load a PowerPoint document using Presentation.LoadFromFile() method.
  • Get the DocumentProperty object.
  • Add standard document properties to the presentation by assigning values to the Title, Subject, Author, Manager, Company and Keywords properties of the object.
  • Add custom properties to the presentation using set_Item() of the object.
  • Save the presentation to a PPTX file using Presentation.SaveToFile() method.
  • Python
from spire.presentation.common import *
from spire.presentation import *

# Create a Presentation object
presentation = Presentation()

# Load a PowerPoint document
presentation.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.pptx")

# Get the DocumentProperty object
documentProperty = presentation.DocumentProperty

# Set built-in document properties
documentProperty.Title = "Annual Sales Presentation"
documentProperty.Subject = "Company performance and sales strategy"
documentProperty.Author = "John Smith"
documentProperty.Manager = "Sarah Johnson"
documentProperty.Company = "E-iceblue Corporation"
documentProperty.Category = "Business"
documentProperty.Keywords = "sales, strategy, performance"
documentProperty.Comments = "Please review and provide feedback by Friday"

# Add custom document properties
documentProperty.set_Item("Document ID", Int32(12))
documentProperty.set_Item("Authorized by", String("Product Manager"))
documentProperty.set_Item("Authorized Date", DateTime(2024, 1, 10, 0, 0, 0, 0))

# Save to file
presentation.SaveToFile("output/Properties.pptx", FileFormat.Pptx2019)
presentation.Dispose()

Python: Add, Read, or Remove Document Properties in PowerPoint

Read Document Properties of a PowerPoint File in Python

The DocumentProperty.Title and the similar properties are not only used to set standard properties but can return the values of standard properties as well. Since the name of a custom property is not constant, we need to get the name using the DocumentProperty.GetPropertyName(index: int) method. And then, we're able to get the property's value using the DocumentProperty.get_Item(name: str) method.

The steps to read document properties of a PowerPoint file are as follows.

  • Create a Presentation object.
  • Load a PowerPoint document using Presentation.LoadFromFile() method.
  • Get the DocumentProperty object.
  • Get the standard document properties by using the Title, Subject, Author, Manager, Company, and Keywords properties of the object.
  • Get the count of the custom properties, and iterate through the custom properties.
  • Get the name of a specific custom property by its index using DocumentProperty.GetPropertyName() method.
  • Get the value of the property using DocumentProperty.get_Item() method.
  • Python
from spire.presentation.common import *
from spire.presentation import *

# Create a Presentation object
presentation = Presentation()

# Load a PowerPoint document
presentation.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Properties.pptx")

# Get the DocumentProperty object
documentProperty = presentation.DocumentProperty

# Get the built-in document properties
print("Title: " + documentProperty.Title)
print("Subject: " + documentProperty.Subject)
print("Author: " + documentProperty.Author)
print("Manager : " + documentProperty.Manager)
print("Company: " + documentProperty.Company)
print("Category: " + documentProperty.Category)
print("Keywords: " + documentProperty.Keywords)
print("Comments: " + documentProperty.Comments)

# Get the count of the custom document properties
count = documentProperty.Count

# Iterate through the custom properties
for i in range(count):

    # Get the name of a specific custom property
    customPropertyName = documentProperty.GetPropertyName(i)

    # Get the value of the custom property
    customPropertyValue = documentProperty.get_Item(customPropertyName)

    # Print the result
    print(customPropertyName + ": " + str(customPropertyValue))

Python: Add, Read, or Remove Document Properties in PowerPoint

Remove Document Properties from a PowerPoint File in Python

Removing a standard property means assigning an empty string to a property like DocumentProperty.Title. To remove the custom properties, Spire.Presentation provides the DocumentProperty.Remove(name: str) method. The following are the steps to remove document properties from a PowerPoint file in Python.

  • Create a Presentation object.
  • Load a PowerPoint document using Presentation.LoadFromFile() method.
  • Get the DocumentProperty object.
  • Set the Title, Subject, Author, Manager, Company, and Keywords properties of the object to empty strings.
  • Get the count of the custom properties.
  • Get the name of a specific custom property by its index using DocumentProperty.GetPropertyName() method.
  • Remove the custom property using DocumentProperty.Remove() method.
  • Save the presentation to a PPTX file using Presentation.SaveToFile() method.
  • Python
from spire.presentation.common import *
from spire.presentation import *

# Create a Presentation object
presentation = Presentation()

# Load a PowerPoint document
presentation.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Properties.pptx")

# Get the DocumentProperty object
documentProperty = presentation.DocumentProperty

# Set built-in document properties to empty strings
documentProperty.Title = ""
documentProperty.Subject = ""
documentProperty.Author = ""
documentProperty.Manager = ""
documentProperty.Company = ""
documentProperty.Category = ""
documentProperty.Keywords = ""
documentProperty.Comments = ""

# Get the count of the custom document properties
i = documentProperty.Count
while i > 0:

    # Get the name of a specific custom property
    customPropertyName = documentProperty.GetPropertyName(i - 1)

    # Remove the custom property
    documentProperty.Remove(customPropertyName)
    i = i - 1

# Save the presentation to a different pptx file
presentation.SaveToFile("Output/RemoveProperties.pptx",FileFormat.Pptx2019)

Python: Add, Read, or Remove Document Properties in PowerPoint

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Images are an effective tool for conveying complex information. By inserting images into tables, you can enhance data presentation with charts, graphs, diagrams, illustrations, and more. This not only enables readers to easily comprehend the information being presented but also adds visual appeal to your document. In certain cases, you may also come across situations where you need to extract images from tables for various purposes. For example, you might want to reuse an image in a presentation, website, or another document. Extracting images allows you to repurpose them, streamlining your content creation process and increasing efficiency. In this article, we will explore how to insert and extract images in Word tables in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Insert Images into a Word Table in Python

Spire.Doc for Python provides the TableCell.Paragraphs[index].AppendPicture() method to add an image to a specific table cell. The detailed steps are as follows.

  • Create an object of the Document class.
  • Load a Word document using the Document.LoadFromFile() method.
  • Get a specific section in the document using the Document.Sections[index] property.
  • Get a specific table in the section using the Section.Tables[index] property.
  • Access a specific cell in the table using the Table.Row[index].Cells[index] property.
  • Add an image to the cell using the TableCell.Paragraphs[index].AppendPicture() method and set the image width and height.
  • Save the result document using the Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of the Document class
doc = Document()
# Load a Word document
doc.LoadFromFile("Table2.docx")

# Get the first section
section = doc.Sections.get_Item(0)

# Get the first table in the section
table = section.Tables.get_Item(0)

# Add an image to the 3rd cell of the second row in the table
cell = table.Rows[1].Cells[2]
picture = cell.Paragraphs[0].AppendPicture("doc.png")
# Set image width and height
picture.Width = 100
picture.Height = 100

# Add an image to the 3rd cell of the 3rd row in the table
cell = table.Rows[2].Cells[2]
picture = cell.Paragraphs[0].AppendPicture("xls.png")
# Set image width and height
picture.Width = 100
picture.Height = 100

# Save the result document
doc.SaveToFile("AddImagesToTable.docx", FileFormat.Docx2013)
doc.Close()

Python: Insert or Extract Images in Word Tables

Extract Images from a Word Table in Python

To extract images from a Word table, you need to iterate through all objects in the table and identify the ones of the DocPicture type. Once the DocPicture objects are found, you can access their image bytes using the DocPicture.ImageBytes property, and then save the image bytes to image files. The detailed steps are as follows.

  • Create an object of the Document class.
  • Load a Word document using the Document.LoadFromFile() method.
  • Get a specific section in the document using the Document.Sections[index] property.
  • Get a specific table in the section using the Section.Tables[index] property.
  • Create a list to store the extracted image data.
  • Iterate through all rows in the table.
  • Iterate through all cells in each row.
  • Iterate through all paragraphs in each cell.
  • Iterate through all child objects in each paragraph.
  • Check if the current child object is of DocPicture type.
  • Get the image bytes of the DocPicture object using the DocPicture.ImageBytes property and append them to the list.
  • Save the image bytes in the list to image files.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of the Document class
doc = Document()
# Load a Word document
doc.LoadFromFile("AddImagesToTable.docx")

# Get the first section
section = doc.Sections.get_Item(0)

# Get the first table in the section
table = section.Tables.get_Item(0)

# Create a list to store image bytes
image_data = []

# Iterate through all rows in the table
for i in range(table.Rows.Count):
    row = table.Rows.get_Item(i)
    # Iterate through all cells in each row
    for j in range(row.Cells.Count):
        cell = row.Cells[j]
        # Iterate through all paragraphs in each cell
        for k in range(cell.Paragraphs.Count):
            paragraph = cell.Paragraphs[k]
            # Iterate through all child objects in each paragraph
            for o in range(paragraph.ChildObjects.Count):
                child_object = paragraph.ChildObjects[o]
                # Check if the current child object is of DocPicture type
                if isinstance(child_object, DocPicture):
                    picture = child_object
                    # Get the image bytes
                    bytes = picture.ImageBytes
                    # Append the image bytes to the list
                    image_data.append(bytes)

# Save the image bytes in the list to image files
for index, item in enumerate(image_data):
    image_Name = f"Images/Image-{index}.png"
    with open(image_Name, 'wb') as imageFile:
        imageFile.write(item)

doc.Close()

Python: Insert or Extract Images in Word Tables

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Adding or extracting audio and video in a PowerPoint document can greatly enrich the presentation content, enhance audience engagement, and improve comprehension. By adding audio, you can include background music, narration, or sound effects to make the content more lively and emotionally engaging. Inserting videos allows you to showcase dynamic visuals, demonstrate processes, or explain complex concepts, helping the audience to understand the content more intuitively. Extracting audio and video can help preserve important information or resources for reuse when needed. This article will introduce how to use Python and Spire.Presentation for Python to add or extract audio and video in PowerPoint.

Install Spire.Presentation for Python

This scenario requires Spire.Presentation for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.Presentation

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Presentation for Python on Windows

Add Audio in PowerPoint Documents in Python

Spire.Presentation for Python provides the Slide.Shapes.AppendAudioMedia() method, which can be used to add audio files to slides. The specific steps are as follows:

  • Create an object of the Presentation class.
  • Use the RectangleF.FromLTRB() method to create a rectangle.
  • In the shapes collection of the first slide, use the Slide.Shapes.AppendAudioMedia() method to add the audio file to the previously created rectangle.
  • Use the Presentation.SaveToFile() method to save the document as a PowerPoint file.
  • Python
from spire.presentation.common import *
from spire.presentation import *

# Create a presentation object
presentation = Presentation()

# Create an audio rectangle
audioRect = RectangleF.FromLTRB(200, 150, 310, 260)

# Add audio
presentation.Slides[0].Shapes.AppendAudioMedia("data/Music.wav", audioRect)

# Save the presentation to a file
presentation.SaveToFile("AddAudio.pptx", FileFormat.Pptx2016)

# Release resources
presentation.Dispose()

Python: Add or Extract Audio and Video from PowerPoint Documents

Extract Audio from PowerPoint Documents in Python

To determine if a shape is of audio type, you can check if its type is IAudio. If the shape is of audio type, you can use the IAudio.Data property to retrieve audio data. The specific steps are as follows:

  • Create an object of the Presentation class.
  • Use the Presentation.LoadFromFile() method to load the PowerPoint document.
  • Iterate through the shapes collection on the first slide, checking if each shape is of type IAudio.
  • If the shape is of type IAudio, use IAudio.Data property to retrieve the audio data from the audio object.
  • Use the AudioData.SaveToFile() method to save the audio data to a file.
  • Python
from spire.presentation.common import *
from spire.presentation import *

# Create a presentation object
presentation = Presentation()

# Load a presentation from a file
presentation.LoadFromFile("Audio.pptx")

# Initialize a counter
i = 1

# Iterate through shapes in the first slide
for shape in presentation.Slides[0].Shapes:

    # Check if the shape is of audio type
    if isinstance(shape, IAudio):

        # Get the audio data and save it to a file
        AudioData = shape.Data
        AudioData.SaveToFile("ExtractAudio_"+str(i)+".wav")
        i = i + 1

# Release resources
presentation.Dispose()

Python: Add or Extract Audio and Video from PowerPoint Documents

Add Video in PowerPoint Documents in Python

Using the Slide.Shapes.AppendVideoMedia() method, you can add video files to slides. The specific steps are as follows:

  • Create an object of the Presentation class.
  • Use the RectangleF.FromLTRB() method to create a rectangle.
  • In the shapes collection of the first slide, use the Slide.Shapes.AppendVideoMedia() method to add the video file to the previously created rectangle.
  • Use the video.PictureFill.Picture.Url property to set the cover image of the video.
  • Use the Presentation.SaveToFile() method to save the document as a PowerPoint file.
  • Python
from spire.presentation.common import *
from spire.presentation import *

# Create a presentation object
presentation = Presentation()

# Create a video rectangle
videoRect = RectangleF.FromLTRB(200, 150, 450, 350)

# Add video
video = presentation.Slides[0].Shapes.AppendVideoMedia("data/Video.mp4", videoRect)
video.PictureFill.Picture.Url = "data/Video.png"

# Save the presentation to a file
presentation.SaveToFile("AddVideo.pptx", FileFormat.Pptx2016)

# Release resources
presentation.Dispose()

Python: Add or Extract Audio and Video from PowerPoint Documents

Extract Video from PowerPoint Documents in Python

The video type is IVideo. If the shape is of type IVideo, you can use the IVideo.EmbeddedVideoData property to retrieve video data. The specific steps are as follows:

  • Create an object of the Presentation class.
  • Use the Presentation.LoadFromFile() method to load the PowerPoint presentation.
  • Iterate through the shapes collection on the first slide, checking if each shape is of type IVideo.
  • If the shape is of type IVideo, use the IVideo.EmbeddedVideoData property to retrieve the video data from the video object.
  • Use the VideoData.SaveToFile() method to save the video data to a file.
  • Python
from spire.presentation.common import *
from spire.presentation import *

# Create a presentation object
presentation = Presentation()

# Load a presentation from a file
presentation.LoadFromFile("Video.pptx")

# Initialize a counter
i = 1

# Iterate through each slide in the presentation
for slide in presentation.Slides:

    # Iterate through shapes in each slide
    for shape in slide.Shapes:

        # Check if the shape is of video type
        if isinstance(shape, IVideo):

            # Get the video data and save it to a file
            VideoData = shape.EmbeddedVideoData
            VideoData.SaveToFile("ExtractVideo_"+str(i)+".avi")
            i = i + 1

# Release resources
presentation.Dispose()

Python: Add or Extract Audio and Video from PowerPoint Documents

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

OLE (Object Linking and Embedding) objects in Word are files or data from other applications that can be inserted into a document. These objects can be edited and updated within Word, allowing you to seamlessly integrate content from various programs, such as Excel spreadsheets, PowerPoint presentations, or even multimedia files like images, audio, or video. In this article, we will introduce how to insert and extract OLE objects in a Word document in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Insert OLE Objects in Word in Python

Spire.Doc for Python provides the Paragraph.AppendOleObject(pathToFile:str, olePicture:DocPicture, type:OleObjectType) method to embed OLE objects in a Word document. The detailed steps are as follows.

  • Create an object of the Document class.
  • Load a Word document using the Document.LoadFromFile() method.
  • Get a specific section using the Document.Sections.get_Item(index) method.
  • Add a paragraph to the section using the Section.AddParagraph() method.
  • Create an object of the DocPicture class.
  • Load an image that will be used as the icon of the OLE object using the DocPicture.LoadImage() method and then set image width and height.
  • Append an OLE object to the paragraph using the Paragraph.AppendOleObject(pathToFile:str, olePicture:DocPicture, type:OleObjectType) method.
  • Save the result file using the Document.SaveToFile() method.

The following code example shows how to embed an Excel spreadsheet, a PDF file, and a PowerPoint presentation in a Word document using Spire.Doc for Python:

  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of the Document class
doc = Document()
# Load a Word document
doc.LoadFromFile("Example.docx")

# Get the first section
section = doc.Sections.get_Item(0)

# Add a paragraph to the section
para1 = section.AddParagraph()
para1.AppendText("Excel File: ")
# Load an image which will be used as the icon of the OLE object
picture1 = DocPicture(doc)
picture1.LoadImage("Excel-Icon.png")
picture1.Width = 50
picture1.Height = 50
# Append an OLE object (an Excel spreadsheet) to the paragraph 
para1.AppendOleObject("Budget.xlsx", picture1, OleObjectType.ExcelWorksheet)

# Add a paragraph to the section
para2 = section.AddParagraph()
para2.AppendText("PDF File: ")
# Load an image which will be used as the icon of the OLE object
picture2 = DocPicture(doc)
picture2.LoadImage("PDF-Icon.png")
picture2.Width = 50
picture2.Height = 50
# Append an OLE object (a PDF file) to the paragraph 
para2.AppendOleObject("Report.pdf", picture2, OleObjectType.AdobeAcrobatDocument)

# Add a paragraph to the section
para3 = section.AddParagraph()
para3.AppendText("PPT File: ")
# Load an image which will be used as the icon of the OLE object
picture3 = DocPicture(doc)
picture3.LoadImage("PPT-Icon.png")
picture3.Width = 50
picture3.Height = 50
# Append an OLE object (a PowerPoint presentation) to the paragraph 
para3.AppendOleObject("Plan.pptx", picture3, OleObjectType.PowerPointPresentation)

doc.SaveToFile("InsertOLE.docx", FileFormat.Docx2013)
doc.Close()

Python: Insert or Extract OLE Objects in Word

Extract OLE Objects from Word in Python

To extract OLE objects from a Word document, you first need to locate the OLE objects within the document. Once located, you can determine the file format of each OLE object. Finally, you can save the data of each OLE object to a file in its native file format. The detailed steps are as follows.

  • Create an instance of the Document class.
  • Load a Word document using the Document.LoadFromFile() method.
  • Iterate through all sections of the document.
  • Iterate through all child objects in the body of each section.
  • Identify the paragraphs within each section.
  • Iterate through the child objects in each paragraph.
  • Locate the OLE object within the paragraph.
  • Determine the file format of the OLE object.
  • Save the data of the OLE object to a file in its native file format.

The following code example shows how to extract the embedded Excel spreadsheet, PDF file, and PowerPoint presentation from a Word document using Spire.Doc for Python:

  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of the Document class
doc = Document()
# Load a Word document
doc.LoadFromFile("InsertOLE.docx")

i = 1 
# Iterate through all sections of the Word document
for k in range(doc.Sections.Count):
    sec = doc.Sections.get_Item(k)
    # Iterate through all child objects in the body of each section
    for j in range(sec.Body.ChildObjects.Count):
        obj = sec.Body.ChildObjects.get_Item(j)
        # Check if the child object is a paragraph
        if isinstance(obj, Paragraph):
            par = obj if isinstance(obj, Paragraph) else None
            # Iterate through the child objects in the paragraph
            for m in range(par.ChildObjects.Count):
                o = par.ChildObjects.get_Item(m)
                # Check if the child object is an OLE object
                if o.DocumentObjectType == DocumentObjectType.OleObject:
                    ole = o if isinstance(o, DocOleObject) else None
                    s = ole.ObjectType
                    # Check if the OLE object is a PDF file
                    if s.startswith("AcroExch.Document"):
                        ext = ".pdf"
                    # Check if the OLE object is an Excel spreadsheet
                    elif s.startswith("Excel.Sheet"):
                        ext = ".xlsx"
                    # Check if the OLE object is a PowerPoint presentation
                    elif s.startswith("PowerPoint.Show"):
                        ext = ".pptx"
                    else:
                        continue
                    # Write the data of OLE into a file in its native format
                    with open(f"Output/OLE{i}{ext}", "wb") as file:
                            file.write(ole.NativeData)                        
                    i += 1

doc.Close()

Python: Insert or Extract OLE Objects in Word

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

In addition to text and images, PDF files can also contain various types of attachments, such as documents, images, audio files, or other multimedia elements. Extracting attachments from PDF files allows users to retrieve and save the embedded content, enabling easy access and manipulation outside of the PDF environment. This process proves especially useful when dealing with PDFs that contain important supplementary materials, such as reports, spreadsheets, or legal documents.

In this article, you will learn how to extract attachments from a PDF document in Python using Spire.PDF for Python.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows

Prerequisite Knowledge

There are generally two categories of attachments in PDF files: document-level attachments and annotation attachments. Below, you can find a table outlining the disparities between these two types of attachments and how they are represented in Spire.PDF.

Attachment type Represented by Definition
Document level attachment PdfAttachment class A file attached to a PDF at the document level won't appear on a page, but can be viewed in the "Attachments" panel of a PDF reader.
Annotation attachment PdfAnnotationAttachment class A file attached as an annotation can be found on a page or in the "Attachments" panel. An annotation attachment is shown as a paper clip icon on the page; reviewers can double-click the icon to open the file.

Extract Document-Level Attachments from PDF in Python

To retrieve document-level attachments in a PDF document, you can use the PdfDocument.Attachments property. Each attachment has a PdfAttachment.FileName property, which provides the name of the specific attachment, including the file extension. Additionally, the PdfAttachment.Data property allows you to access the attachment's data. To save the attachment to a specific folder, you can utilize the PdfAttachment.Data.Save() method.

The steps to extract document-level attachments from a PDF using Python are as follows.

  • Create a PdfDocument object.
  • Load a PDF file using PdfDocument.LoadFromFile() method.
  • Get a collection of attachments using PdfDocument.Attachments property.
  • Iterate through the attachments in the collection.
  • Get a specific attachment from the collection, and get the file name and data of the attachment using PdfAttachment.FileName property and PdfAttachment.Data property.
  • Save the attachment to a specified folder using PdfAttachment.Data.Save() method.
  • Python
from spire.pdf import *
from spire.pdf.common import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Attachments.pdf")

# Get the attachment collection from the document
collection = doc.Attachments

# Loop through the collection
if collection.Count > 0:
    for i in range(collection.Count):

        # Get a specific attachment
        attactment = collection.get_Item(i)

        # Get the file name and data of the attachment
        fileName= attactment.FileName
        data = attactment.Data

        # Save it to a specified folder
        data.Save("Output\\ExtractedFiles\\" + fileName)

doc.Close()

Python: Extract Attachments from a PDF Document

Extract Annotation Attachments from PDF in Python

The Annotations attachment is a page-based element. To retrieve annotations from a specific page, use the PdfPageBase.AnnotationsWidget property. You then need to determine if a particular annotation is an attachment. If it is, save it to the specified folder while retaining its original filename.

The following are the steps to extract annotation attachments from a PDF using Python.

  • Create a PdfDocument object.
  • Load a PDF file using PdfDocument.LoadFromFile() method.
  • Iterate though the pages in the document.
  • Get the annotations from a particular page using PdfPageBase.AnnotationsWidget property.
  • Iterate though the annotations, and determine if a specific annotation is an attachment annotation.
  • If it is, get the file name and data of the annotation using PdfAttachmentAnnotation.FileName property and PdfAttachmentAnnotation.Data property.
  • Save the annotated attachment to a specified folder.
  • Python
from spire.pdf import *
from spire.pdf.common import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\AnnotationAttachment.pdf")

# Iterate through the pages in the document
for i in range(doc.Pages.Count):

    # Get a specific page
    page = doc.Pages.get_Item(i)

    # Get the annotation collection of the page
    annotationCollection = page.AnnotationsWidget

    # If the page has annotations
    if annotationCollection.Count > 0:

        # Iterate through the annotations
        for j in range(annotationCollection.Count):

            # Get a specific annotation
            annotation = annotationCollection.get_Item(j)

            # Determine if the annotation is an attachment annotation
            if isinstance(annotation, PdfAttachmentAnnotationWidget):
              
                # Get the file name and data of the attachment
                fileName = annotation.FileName
                byteData = annotation.Data
                streamMs = Stream(byteData)

                # Save the attachment into a specified folder 
                streamMs.Save("Output\\ExtractedFiles\\" + fileName)

Python: Extract Attachments from a PDF Document

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

page 15