PDF files often contain important embedded images (e.g., charts, diagrams, scanned documents). Knowing how to extract images from a PDF in Python lets developers reuse this graphical content in automated reports or feed it into machine learning models for analysis and OCR tasks.

This article explores how to leverage the Spire.PDF for Python library to extract images from PDF files via Python, covering the following aspects:
- Installation & Environment Setup
- How to Extract Images from PDFs using Python
- Handle Different Image Formats While Extraction
- Frequently Asked Questions
- Conclusion (Extract Text and More)
Installation & Environment Setup
Before you start using Spire.PDF for Python to extract images from PDF, make sure you have the following in place:
- Python Environment: Ensure that Python is installed on your system. The latest stable version is recommended for the best compatibility and performance.
- Spire.PDF for Python Library: Install the Python PDF SDK. The easiest way is to use pip, the Python package installer. Open your command prompt or terminal and run the following command:
pip install Spire.PDF
How to Extract Images from PDFs using Python
Example 1: Extract Images from a PDF Page
Here’s a complete Python script to extract and save images from a specified page in PDF:
from spire.pdf.common import *
from spire.pdf import *
# Create a PdfDocument instance
pdf = PdfDocument()
# Load a PDF file
pdf.LoadFromFile("template1.pdf")
# Get the first page
page = pdf.Pages.get_Item(0)
# Create a PdfImageHelper instance
imageHelper = PdfImageHelper()
# Get the image information on the page
imageInfo = imageHelper.GetImagesInfo(page)
# Iterate through the image information
for i in range(0, len(imageInfo)):
    # Save images to file
    imageInfo[i].Image.Save("PageImage\\Image" + str(i) + ".png")
# Release resources
pdf.Dispose()
Key Steps Explained:
- Load the PDF: Use the LoadFromFile() method to load a PDF file.
- Access a Page: Access a specified PDF page by index.
- Extract Image Information:
  - Create a PdfImageHelper instance to facilitate image extraction.
  - Use the GetImagesInfo() method to retrieve image information from the specified page; it returns a list of PdfImageInfo objects.
- Save Images to Files:
  - Loop through all detected images on the page.
  - Use the Image.Save() method of each PdfImageInfo object to save the image to disk.
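The script above saves into a PageImage folder. The following is a minimal sketch, assuming that folder may not exist yet and that saving into a missing folder could fail (this is not confirmed for Spire.PDF). The helper name image_output_path is hypothetical and not part of the library; it only uses the standard library to create the folder and build a platform-independent path.

```python
from pathlib import Path


def image_output_path(out_dir, index, ext="png"):
    """Return the file path for the index-th image, creating out_dir if needed.

    Hypothetical helper for illustration; not part of Spire.PDF.
    """
    folder = Path(out_dir)
    # parents=True creates intermediate folders; exist_ok=True makes reruns safe
    folder.mkdir(parents=True, exist_ok=True)
    return str(folder / f"Image{index}.{ext}")


# Possible use inside the loop from the script above:
#     imageInfo[i].Image.Save(image_output_path("PageImage", i))
```

Building the path with pathlib also avoids the hard-coded Windows separator, which may matter if the script runs on Linux or macOS.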
Example 2: Extract All Images from a PDF File
Building on the single-page extraction method, you can iterate through all pages of the PDF document to extract every embedded image.
Python code example:
from spire.pdf.common import *
from spire.pdf import *
# Create a PdfDocument instance
pdf = PdfDocument()
# Load a PDF file
pdf.LoadFromFile("template1.pdf")
# Create a PdfImageHelper instance
imageHelper = PdfImageHelper()
# Iterate through the pages in the document
for i in range(0, pdf.Pages.Count):
    # Get the current page
    page = pdf.Pages.get_Item(i)
    # Get the image information on the page
    imageInfo = imageHelper.GetImagesInfo(page)
    # Iterate through the image information items
    for j in range(0, len(imageInfo)):
        # Save the current image to file
        imageInfo[j].Image.Save(f"Images\\Image{i}_{j}.png")
# Release resources
pdf.Close()
Handle Different Image Formats While Extraction
Spire.PDF for Python supports extracting images in various formats such as PNG, JPG/JPEG, BMP, etc. When saving the extracted images, you can choose the appropriate format based on your needs.
Common Image Formats:
| Format | Best Use Cases | PDF Extraction Notes |
|---|---|---|
| JPG/JPEG | Photos, scanned documents | Common in PDFs; quality loss on re-compress |
| PNG | Web graphics, diagrams, screenshots | Preserves transparency; larger file sizes |
| BMP | Windows applications, temp storage | Rare in modern PDFs; avoid for web use |
| TIFF | Archiving, print, OCR input | Ideal for document preservation; multi-page |
| EMF | Windows vector editing | Editable in Illustrator/Inkscape |
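This article does not state whether the file extension passed to Save() determines the actual encoding, so it can be useful to check what was written to disk. The sketch below uses only the standard library and the well-known file signatures of the raster formats in the table; the function names are hypothetical, and EMF is left out for brevity.

```python
# Leading bytes (file signatures) of the raster formats listed in the table
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"BM", "bmp"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
]


def sniff_image_format(data):
    """Return a format name for the leading bytes, or None if unrecognized."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    return None


def sniff_file(path):
    """Read the first 8 bytes of a saved image and identify its format."""
    with open(path, "rb") as handle:
        return sniff_image_format(handle.read(8))
```

Calling sniff_file() on a file such as Images\Image0_0.png would show whether its content matches the extension.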
Frequently Asked Questions
Q1: Is Spire.PDF for Python a free library?
Spire.PDF for Python offers both free and commercial versions. The free version has limitations, such as a maximum of 10 pages per PDF. For commercial use or to remove these restrictions, you can request a trial license.
Q2: Can I extract images from a specified page range only?
Yes. Instead of iterating through all pages, specify the page indices you want. For example, to extract images from pages 2 to 5:
# Extract images from pages 2 to 5
for i in range(1, 5):  # Pages are zero-indexed, so pages 2 to 5 are indices 1 to 4
    page = pdf.Pages.get_Item(i)
    # Process images as before
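Off-by-one mistakes are easy to make when converting human page numbers to zero-based indices. As a small sketch, the hypothetical helper page_indices below (not part of Spire.PDF) turns a 1-based inclusive page range into the indices to pass to pdf.Pages.get_Item(), and clamps the range to the document length.

```python
def page_indices(first_page, last_page, page_count):
    """Convert a 1-based inclusive page range to zero-based page indices.

    The range is clamped to page_count, so a range that runs past the
    end of the document simply stops at the last page.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    stop = min(last_page, page_count)
    return range(first_page - 1, stop)


# Possible use with the document object from the examples above:
#     for i in page_indices(2, 5, pdf.Pages.Count):
#         page = pdf.Pages.get_Item(i)
```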
Q3: Is it possible to extract text from images?
Yes. For scanned PDF files, you can first extract the images and then recognize the text in them with the Spire.OCR for Python library.
A step-by-step guide: How to Extract Text from Image Using Python (OCR Code Examples)
Conclusion (Extract Text and More)
Spire.PDF simplifies image extraction from PDF in Python with minimal code. By following this guide, you can:
- Extract images from single pages or entire PDF documents.
- Save images from PDF in various formats (PNG, JPG, BMP or TIFF).
As a PDF document can contain different elements, the Python PDF library is also capable of:
Adding bookmarks to Word documents is a useful feature that allows users to mark specific locations within their documents for quick reference or navigation. Bookmarks serve as virtual placeholders, making it easier to find and revisit important sections of a document without scrolling through lengthy pages. In this article, you will learn how to add bookmarks to a Word document in Python using Spire.Doc for Python.
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.
pip install Spire.Doc
If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows
Add Bookmarks to a Paragraph in Python
Spire.Doc for Python offers the BookmarkStart to represent the start of a bookmark and the BookmarkEnd to represent the end of a bookmark. To bookmark a paragraph, a BookmarkStart object is placed at the beginning of the paragraph and a BookmarkEnd object is appended at the end of the paragraph. The following are the detailed steps.
- Create a Document object.
- Load a Word file using Document.LoadFromFile() method.
- Get a specific paragraph through Document.Sections[index].Paragraphs[index] property.
- Create a BookmarkStart using Paragraph.AppendBookmarkStart() method and insert it at the beginning of the paragraph using Paragraph.Items.Insert() method.
- Append a BookmarkEnd at the end of the paragraph using Paragraph.AppendBookmarkEnd() method.
- Save the document to a different Word file using Document.SaveToFile() method.
from spire.doc import *
from spire.doc.common import *
# Create a Document object
doc = Document()
# Load a sample Word file
doc.LoadFromFile('C:/Users/Administrator/Desktop/input.docx')
# Get the third paragraph (index 2; indices are zero-based)
paragraph = doc.Sections[0].Paragraphs[2]
# Create a bookmark start
start = paragraph.AppendBookmarkStart('myBookmark')
# Insert it at the beginning of the paragraph
paragraph.Items.Insert(0, start)
# Append a bookmark end at the end of the paragraph
paragraph.AppendBookmarkEnd('myBookmark')
# Save the file
doc.SaveToFile('output/AddBookmarkToParagraph.docx', FileFormat.Docx2019)

Add Bookmarks to Selected Text in Python
To bookmark a piece of text, first find the text in the document and get its position inside its owner paragraph. Then place a BookmarkStart before it and a BookmarkEnd after it. The detailed steps are as follows.
- Create a Document object.
- Load a Word file using Document.LoadFromFile() method.
- Find the string to be marked from the document.
- Get its owner paragraph and its position inside the paragraph.
- Insert a BookmarkStart before the text and a BookmarkEnd after the text.
- Save the document to a different Word file using Document.SaveToFile() method.
from spire.doc import *
from spire.doc.common import *
# Create a Document object
doc = Document()
# Load a sample Word file
doc.LoadFromFile('C:/Users/Administrator/Desktop/input.docx')
# Specify the string to find
stringToFind = 'programming paradigms'
# Find the selected text from the document
finds = doc.FindAllString(stringToFind, False, True)
specificText = finds[0]
# Find the paragraph where the text is located
paragraph = specificText.GetAsOneRange().OwnerParagraph
# Get the index of the text in the paragraph
index = paragraph.ChildObjects.IndexOf(specificText.GetAsOneRange())
# Create a bookmark start
start = paragraph.AppendBookmarkStart("myBookmark")
# Insert the bookmark start at the index position
paragraph.ChildObjects.Insert(index, start)
# Create a bookmark end
end = paragraph.AppendBookmarkEnd("myBookmark")
# Insert the bookmark end at the end of the selected text
paragraph.ChildObjects.Insert(index + 2, end)
# Save the document to a different file
doc.SaveToFile("output/AddBookmarkToSelectedText.docx", FileFormat.Docx2019)
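The index + 2 in the script above can look surprising. The sketch below models the paragraph's child objects as a plain Python list, assuming ChildObjects.Insert() behaves like list insertion; the function name and the string placeholders are hypothetical and no Spire.Doc objects are involved. Once the bookmark start is inserted at index, the found text shifts to index + 1, so the bookmark end belongs at index + 2.

```python
def wrap_with_bookmark(children, index):
    """Return a copy of children with bookmark markers around children[index]."""
    result = list(children)
    # Insert the start marker before the text; the text moves to index + 1
    result.insert(index, "BookmarkStart")
    # Insert the end marker right after the text, which is position index + 2
    result.insert(index + 2, "BookmarkEnd")
    return result


paragraph = ["Python supports several ", "programming paradigms", "."]
print(wrap_with_bookmark(paragraph, paragraph.index("programming paradigms")))
```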

Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Large PDF files can sometimes be cumbersome to handle, especially when sharing or uploading them. Splitting a large PDF file into multiple smaller PDFs reduces the file size, making it more manageable and quicker to open and process. In this article, we will demonstrate how to split PDF documents in Python using Spire.PDF for Python.
Install Spire.PDF for Python
This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.
pip install Spire.PDF
If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows
Split a PDF File into Multiple Single-Page PDFs in Python
Spire.PDF for Python offers the PdfDocument.Split() method to divide a multi-page PDF document into multiple single-page PDF files. The following are the detailed steps.
- Create a PdfDocument object.
- Load a PDF document using PdfDocument.LoadFromFile() method.
- Split the document into multiple single-page PDFs using PdfDocument.Split() method.
from spire.pdf.common import *
from spire.pdf import *
# Create a PdfDocument object
doc = PdfDocument()
# Load a PDF file
doc.LoadFromFile("Sample.pdf")
# Split the PDF file into multiple single-page PDFs
doc.Split("Output/SplitDocument-{0}.pdf", 1)
# Close the PdfDocument object
doc.Close()
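To find the generated files afterwards, it helps to know their names in advance. The sketch below assumes that the {0} placeholder in the pattern is replaced by a running number that starts at the second argument of Split(); this numbering is an assumption that should be checked against the actual output. The helper name expected_split_names is hypothetical.

```python
def expected_split_names(pattern, page_count, start_number=1):
    """List the file names a '{0}'-style pattern would yield, one per page.

    Assumes one output file per page and consecutive numbering from
    start_number; not verified against Spire.PDF itself.
    """
    return [pattern.format(start_number + i) for i in range(page_count)]


print(expected_split_names("Output/SplitDocument-{0}.pdf", 3))
```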

Split a PDF File by Page Ranges in Python
To split a PDF file into two or more PDF files by page ranges, you need to create two or more new PDF files, and then import the specific page or range of pages from the source PDF into the newly created PDF files. The following are the detailed steps.
- Create a PdfDocument object.
- Load a PDF document using PdfDocument.LoadFromFile() method.
- Create three PdfDocument objects.
- Import the first page from the source file into the first document using PdfDocument.InsertPage() method.
- Import pages 2-4 from the source file into the second document using PdfDocument.InsertPageRange() method.
- Import the remaining pages from the source file into the third document using PdfDocument.InsertPageRange() method.
- Save the three documents using PdfDocument.SaveToFile() method.
from spire.pdf.common import *
from spire.pdf import *
# Create a PdfDocument object
doc = PdfDocument()
# Load a PDF file
doc.LoadFromFile("Sample.pdf")
# Create three PdfDocument objects
newDoc_1 = PdfDocument()
newDoc_2 = PdfDocument()
newDoc_3 = PdfDocument()
# Insert the first page of the source file into the first document
newDoc_1.InsertPage(doc, 0)
# Insert pages 2-4 of the source file into the second document
newDoc_2.InsertPageRange(doc, 1, 3)
# Insert the remaining pages of the source file into the third document
newDoc_3.InsertPageRange(doc, 4, doc.Pages.Count - 1)
# Save the three documents
newDoc_1.SaveToFile("Output1/Split-1.pdf")
newDoc_2.SaveToFile("Output1/Split-2.pdf")
newDoc_3.SaveToFile("Output1/Split-3.pdf")
# Close the PdfDocument objects
doc.Close()
newDoc_1.Close()
newDoc_2.Close()
newDoc_3.Close()
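The script above hard-codes three ranges. For documents of arbitrary length, the ranges can be computed instead. The following sketch uses a hypothetical helper, chunk_ranges, that returns zero-based (start, end) pairs with an inclusive end, matching how InsertPageRange(doc, 1, 3) is used above for pages 2-4.

```python
def chunk_ranges(page_count, pages_per_file):
    """Split page_count pages into consecutive (start, end) index pairs.

    Indices are zero-based and end is inclusive; the last chunk may be shorter.
    """
    if pages_per_file < 1:
        raise ValueError("pages_per_file must be at least 1")
    return [
        (start, min(start + pages_per_file, page_count) - 1)
        for start in range(0, page_count, pages_per_file)
    ]


# Possible use with the calls shown in the script above:
#     for n, (start, end) in enumerate(chunk_ranges(doc.Pages.Count, 4), 1):
#         part = PdfDocument()
#         part.InsertPageRange(doc, start, end)
#         part.SaveToFile(f"Output1/Split-{n}.pdf")
#         part.Close()
```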
