Displaying items by tag: pdf Python Conversion

Thursday, 28 December 2023 00:59

Python: Convert SVG to PDF

SVG files are commonly used for web graphics and vector-based illustrations because they can be scaled and adjusted easily. PDF, on the other hand, is a versatile format widely supported across different devices and operating systems. Converting SVG to PDF allows for easy sharing of graphics and illustrations, ensuring that recipients can open and view the files without requiring specialized software or worrying about browser compatibility issues. In this article, we will demonstrate how to convert SVG files to PDF format in Python using Spire.PDF for Python.

Convert SVG to PDF in Python
Add SVG to PDF in Python

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

Package Manager

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows

Convert SVG to PDF in Python

Spire.PDF for Python provides the PdfDocument.LoadFromSvg() method, which allows users to load an SVG file. Once loaded, users can use the PdfDocument.SaveToFile() method to save the SVG file as a PDF file. The detailed steps are as follows.

Create an object of the PdfDocument class.
Load an SVG file using PdfDocument.LoadFromSvg() method.
Save the SVG file to PDF format using PdfDocument.SaveToFile() method.

Python

from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()
# Load an SVG file
doc.LoadFromSvg("Sample.svg")

# Save the SVG file to PDF format
doc.SaveToFile("ConvertSvgToPdf.pdf", FileFormat.PDF)
# Close the PdfDocument object
doc.Close()

Python: Convert SVG to PDF

Add SVG to PDF in Python

In addition to converting SVG to PDF directly, Spire.PDF for Python also supports adding SVG files to specific locations in PDF. The detailed steps are as follows.

Create an object of the PdfDocument class.
Load an SVG file using PdfDocument.LoadFromSvg() method.
Create a template based on the content of the SVG file using PdfDocument. Pages[].CreateTemplate() method.
Get the width and height of the template.
Create another object of the PdfDocument class and load a PDF file using PdfDocument.LoadFromFile() method.
Draw the template with a custom size at a specific location in the PDF file using PdfDocument.Pages[].Canvas.DrawTemplate() method.
Save the result file using PdfDocument.SaveToFile() method.

Python

from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc1 = PdfDocument()
# Load an SVG file
doc1.LoadFromSvg("Sample.svg")

# Create a template based on the content of the SVG
template = doc1.Pages[0].CreateTemplate()

# Get the width and height of the template 
width = template.Width
height = template.Height

# Create another PdfDocument object
doc2 = PdfDocument()
# Load a PDF file
doc2.LoadFromFile("Sample.pdf")

# Draw the template with a custom size at a specific location on the first page of the loaded PDF file
doc2.Pages[0].Canvas.DrawTemplate(template, PointF(10.0, 100.0), SizeF(width*0.8, height*0.8))

# Save the result file
doc2.SaveToFile("AddSvgToPdf.pdf", FileFormat.PDF)
# Close the PdfDocument objects
doc2.Close()
doc1.Close()

Python: Convert SVG to PDF

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Conversion

Friday, 22 December 2023 01:04

Python: Convert PDF to PDF/A and Vice Versa

PDF/A is a specialized format designed specifically for long-term archiving and preservation of electronic documents. It guarantees that the content, structure, and visual appearance of the documents remain unchanged over time. By converting PDF files to PDF/A format, you ensure the long-term accessibility of the documents, regardless of software, operating systems, or future technological advancements. Conversely, converting PDF/A files to standard PDF format makes it easier to edit, share, and collaborate on the documents, ensuring better compatibility across different applications, devices, and platforms. In this article, we will explain how to convert PDF to PDF/A and vice versa in Python using Spire.PDF for Python.

Convert PDF to PDF/A in Python
Convert PDF/A to PDF in Python

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

Package Manager

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows

Convert PDF to PDF/A in Python

The PdfStandardsConverter class provided by Spire.PDF for Python supports converting PDF to various PDF/A formats, including PDF/A-1a, 2a, 3a, 1b, 2b and 3b. Moreover, it also supports converting PDF to PDF/X-1a:2001. The detailed steps are as follows.

Specify the input file path and output folder.
Create a PdfStandardsConverter object and pass the input file path to the constructor of the class as a parameter.
Convert the input file to a Pdf/A-1a conformance file using PdfStandardsConverter.ToPdfA1A() method.
Convert the input file to a Pdf/A-1b file using PdfStandardsConverter.ToPdfA1B() method.
Convert the input file to a Pdf/A-2a file using PdfStandardsConverter.ToPdfA2A() method.
Convert the input file to a Pdf/A-2b file using PdfStandardsConverter.ToPdfA2B() method.
Convert the input file to a Pdf/A-3a file using PdfStandardsConverter.ToPdfA3A() method.
Convert the input file to a Pdf/A-3b file using PdfStandardsConverter.ToPdfA3B() method.
Convert the input file to a PDF/X-1a:2001 file using PdfStandardsConverter.ToPdfX1A2001() method.

Python

from spire.pdf.common import *
from spire.pdf import *

# Specify the input file path and output folder
inputFile = "Sample.pdf"
outputFolder = "Output/"

# Create an object of the PdfStandardsConverter class
converter = PdfStandardsConverter(inputFile)

# Convert the input file to PdfA1A
converter.ToPdfA1A(outputFolder + "ToPdfA1A.pdf")

# Convert the input file to PdfA1B
converter.ToPdfA1B(outputFolder + "ToPdfA1B.pdf")

# Convert the input file to PdfA2A
converter.ToPdfA2A(outputFolder + "ToPdfA2A.pdf")

# Convert the input file to PdfA2B
converter.ToPdfA2B(outputFolder + "ToPdfA2B.pdf")

# Convert the input file to PdfA3A
converter.ToPdfA3A(outputFolder + "ToPdfA3A.pdf")

# Convert the input file to PdfA3B
converter.ToPdfA3B(outputFolder + "ToPdfA3B.pdf")

# Convert the input file to PDF/X-1a:2001
converter.ToPdfX1A2001(outputFolder + "ToPdfX1a.pdf")

Python: Convert PDF to PDF/A and Vice Versa

Convert PDF/A to PDF in Python

To convert a PDF/A file back to a standard PDF format, you need to create a new standard PDF file, and then draw the page content of the PDF/A file to the newly created PDF file. The detailed steps are as follows.

Create a PdfDocument object.
Load a PDF/A file using PdfDocument.LoadFromFile() method.
Create a PdfNewDocument object and set its compression level as none.
Loop through the pages in the original PDF/A file.
Add pages to the newly created PDF using PdfDocumentBase.Pages.Add() method.
Draw the page content of the original PDF/A file to the corresponding pages of the newly created PDF using PdfPageBase.CreateTemplate.Draw() method.
Create a Stream object and then save the new PDF to the stream using PdfNewDocument.Save() method.

Python

from spire.pdf.common import *
from spire.pdf import *

# Specify the input and output file paths
inputFile = "Output/ToPdfA1A.pdf"
outputFile = "PdfAToPdf.pdf"

# Create an object of the PdfDocument class
doc = PdfDocument()
# Load a PDF file
doc.LoadFromFile(inputFile)

# Create a new standard PDF file
newDoc = PdfNewDocument()
newDoc.CompressionLevel = PdfCompressionLevel.none

# Add pages to the newly created PDF and draw the page content of the loaded PDF onto the corresponding pages of the newly created PDF
for i in range(doc.Pages.Count):
    page = doc.Pages.get_Item(i)
    size = page.Size
    p = newDoc.Pages.Add(size, PdfMargins(0.0))
    page.CreateTemplate().Draw(p, 0.0, 0.0)   

# Save the new PDF to a PDF file   
fileStream = Stream(outputFile)
newDoc.Save(fileStream)
fileStream.Close()
newDoc.Close(True)

Python: Convert PDF to PDF/A and Vice Versa

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Conversion

Thursday, 21 December 2023 01:35

Convert PDF to HTML Using Python: Complete Developer’s Guide

Precisely Convert PDF to HTML Using Python

Making PDF content accessible on the web enhances usability, searchability, and compatibility across devices. Whether you're developing a PDF viewer, automating document workflows, or republishing content online, converting PDF to HTML using Python can significantly improve the user experience.

This comprehensive guide demonstrates how to convert PDF to HTML using Python. It covers everything from basic conversions and advanced customization to stream-based output—each section includes practical, easy-to-follow code snippets to help you get started quickly.

Table of Contents

Why Export PDF as HTML
Install Python PDF to HTML Converter Library
Basic PDF to HTML Conversion in Python
Customize the HTML Output
Save PDF to HTML Stream
Conclusion
FAQs

Why Export PDF as HTML?

HTML (HyperText Markup Language) is the foundation of web content. By exporting PDFs into HTML, you enable seamless viewing, editing, and indexing of document content online. Key advantages include:

Improved Web Accessibility: HTML renders natively in all browsers.
Search Engine Optimization (SEO): Search engines can index content better than in PDFs.
Responsive Layouts: HTML adjusts to different screen sizes.
Interactive Enhancements: HTML allows for styling, scripts, and better user interaction.
Plugin-Free Viewing: No need for third-party PDF viewers.

Install Python PDF to HTML Converter Library

To start exporting PDFs to HTML using Python, you’ll need a reliable library that supports PDF processing and HTML export. For this tutorial, we’re using Spire.PDF for Python, a high-performance PDF library that supports reading, editing, and converting PDF files in various formats, including HTML, with minimal effort.

Installation

The library can be installed easily via pip. Open your terminal and run the following command:

pip install Spire.PDF

This will download and install the latest version of the package along with its dependencies.

Need help with the installation? Follow this step-by-step guide: How to Install Spire.PDF for Python on Windows

Basic PDF to HTML Conversion in Python

Spire.PDF makes it easy to export an entire PDF document to HTML using the SaveToFile() method.

from spire.pdf.common import *
from spire.pdf import *

# Initialize a PdfDocument object
doc = PdfDocument()

# Load your PDF file
doc.LoadFromFile("Sample.pdf")

# Convert and save it as HTML
doc.SaveToFile("PdfToHtml.html", FileFormat.HTML)

# Close the document
doc.Close()

This approach generates a single HTML file that preserves the layout and structure of the original PDF.

The screenshot below showcases the input PDF and the output HTML file:

Basic PDF to HTML Conversion Output

Customize the HTML Output

If you need more control over the conversion process, the SetPdfToHtmlOptions() method lets you fine-tune the HTML output.

You can customize various aspects of the conversion—such as image embedding, page splitting, and SVG quality—using the following parameters:

Parameter	Type	Description
useEmbeddedSvg	bool	If True, embeds SVG for vector content.
useEmbeddedImg	bool	If True, embeds images. Effective only if useEmbeddedSvg is False.
maxPageOneFile	bool	Limits HTML output to one page per file (if not using SVG).
useHighQualityEmbeddedSvg	bool	Enables high-resolution SVG (only when useEmbeddedSvg is True).

Example Code

from spire.pdf.common import *
from spire.pdf import *

# Initialize a PdfDocument object
doc = PdfDocument()
# Load your PDF file
doc.LoadFromFile("Sample.pdf")

# Access conversion settings
options = doc.ConvertOptions

# Customize conversion: use image embedding, one page per file
options.SetPdfToHtmlOptions(False, True, 1, False)

# Save the PDF to HTML with the custom options
doc.SaveToFile("PdfToHtmlWithOptions.html", FileFormat.HTML)
# Close the document
doc.Close()

This configuration disables SVG and instead embeds images, outputting each page as a separate HTML file.

Save PDF to HTML Stream

In web or cloud-based applications, you might prefer to write the HTML output to a stream (e.g., for serving over HTTP) instead of saving directly to the file system. This can be achieved with the SaveToStream() method.

Example Code

from spire.pdf.common import *
from spire.pdf import *

# Initialize a PdfDocument object
doc = PdfDocument()
# Load your PDF file
doc.LoadFromFile("Sample.pdf")

# Create a stream to save the HTML output
fileStream = Stream("PdfToHtmlStream.html")

# Save the PDF to HTML stream
doc.SaveToStream(fileStream, FileFormat.HTML)

# Close the stream and the document
fileStream.Close()
doc.Close()

This approach is ideal for web servers, APIs, or any application that handles files dynamically in memory or over the network.

Conclusion

Converting PDF to HTML using Python is an effective way to make your documents web-compatible and more interactive. With Spire.PDF for Python, you get full control over the conversion process, from simple exports to advanced configurations like embedded images or SVGs and stream output.

Ready to transform your PDFs into interactive web content? Give Spire.PDF for Python a try and streamline your document-to-HTML workflow today.

FAQs

Q1: Can I convert password-protected PDFs to HTML?

A1: Yes, Spire.PDF allows you to open encrypted PDFs using doc.LoadFromFile("file.pdf", "password").

Q2: Does this method support multi-page PDFs?

A2: Yes. By default, it converts all pages. You can control how many pages appear per HTML file using the maxPageOneFile parameter.

Q3: Are images and fonts preserved in HTML output?

A3: Yes, depending on the conversion settings (e.g., embedding images or SVGs), visual fidelity is preserved as closely as possible.

Get a Free License

To fully experience the capabilities of Spire.PDF for Python without any evaluation limitations, you can request a free 30-day trial license.

Published in Conversion

Monday, 18 December 2023 01:36

Python: Convert PDF to Word DOC or DOCX

PDF files are designed to preserve the formatting and layout of the original document, making them ideal for sharing and printing. However, they are typically not editable without specialized software. Converting a PDF to a Word document allows you to make changes, add or delete text, modify formatting, and customize content as needed. This is particularly useful when you want to update or revise existing PDF files. In this article, we will explain how to convert PDF to Word DOC or DOCX formats in Python using Spire.PDF for Python.

Convert PDF to Word DOC or DOCX in Python
Setting Document Properties While Converting PDF to Word in Python

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

Package Manager

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows

Convert PDF to Word DOC or DOCX in Python

Spire.PDF for Python provides the PdfDocument.SaveToFile(filename:str, fileFormat:FileFormat) method to convert PDF documents to a wide range of file formats, including Word DOC, DOCX, and more. The detailed steps are as follows.

Create an object of the PdfDocument class.
Load a PDF document using PdfDocument.LoadFromFile() method.
Convert the PDF document to a Word DOCX or DOC file using PdfDocument.SaveToFile(filename:str, fileFormat:FileFormat) method.

Python

from spire.pdf.common import *
from spire.pdf import *

# Create an object of the PdfDocument class
doc = PdfDocument()
# Load a PDF document
doc.LoadFromFile("Sample.pdf")

# Convert the PDF document to a Word DOCX file
doc.SaveToFile("ToDocx.docx", FileFormat.DOCX)
# Or convert the PDF document to a Word DOC file
doc.SaveToFile("ToDoc.doc", FileFormat.DOC)
# Close the PdfDocument object
doc.Close()

Python: Convert PDF to Word DOC or DOCX

Setting Document Properties While Converting PDF to Word in Python

Document properties are attributes or information associated with a document that provide additional details about the file. These properties offer insights into various aspects of the document, such as its author, title, subject, version, keywords, category, and more.

Spire.PDF for Python provides the PdfToDocConverter class which allows developers to convert a PDF document to a Word DOCX file and set document properties for the file. The detailed steps are as follows.

Create an object of the PdfToDocConverter class.
Set document properties, such as title, subject, comment and author, for the converted Word DOCX file using the properties of the PdfToDocConverter class.
Convert the PDF document to a Word DOCX file using PdfToDocConverter.SaveToDocx() method.

Python

from spire.pdf.common import *
from spire.pdf import *

# Create an object of the PdfToDocConverter class
converter = PdfToDocConverter("Sample.pdf")

# Set document properties such as title, subject, author and keywords for the converted .DOCX file
converter.DocxOptions.Title = "Spire.PDF for Python"
converter.DocxOptions.Subject = "This document provides an overview of the Spire.PDF for Python product."
converter.DocxOptions.Tags = "PDF, Python"
converter.DocxOptions.Categories = "PDF processing library"
converter.DocxOptions.Commments = "Spire.PDF is a versatile library that caters to multiple platforms, including .NET, Java, Python, and C++."
converter.DocxOptions.Authors = "John Smith"
converter.DocxOptions.LastSavedBy = "Alexander Johnson"
converter.DocxOptions.Revision = 8
converter.DocxOptions.Version = "V4.0"
converter.DocxOptions.ProgramName = "Spire.PDF for Python"
converter.DocxOptions.Company = "E-iceblue"
converter.DocxOptions.Manager = "E-iceblue"

# Convert the PDF document to a Word DOCX file
converter.SaveToDocx("ToWordWithDocumentProperties.docx")

Python: Convert PDF to Word DOC or DOCX

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Conversion

Wednesday, 18 October 2023 00:57

Python: Convert PDF to SVG

SVG (Scalable Vector Graphics) is an XML-based vector image format that describes two-dimensional graphics using geometric shapes, text, and other graphical elements. SVG files can be easily scaled without losing image quality, which makes them ideal for various purposes such as web design, illustrations, and animations. In certain situations, you may encounter the need to convert PDF files to SVG format. In this article, we will explain how to convert PDF to SVG in Python using Spire.PDF for Python.

Convert a PDF File to SVG in Python
Convert a PDF File to SVG with Custom Width and Height in Python
Convert Specific Pages of a PDF File to SVG in Python

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

Package Manager

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows

Convert a PDF File to SVG in Python

Spire.PDF for Python provides the PdfDocument.SaveToFile(filename:str, fileFormat:FileFormat) method to convert each page of a PDF file to a separate SVG file. The detailed steps are as follows.

Create an object of the PdfDocument class.
Load a sample PDF file using PdfDocument.LoadFromFile() method.
Convert each page of the PDF file to SVG using PdfDocument.SaveToFile(filename:str, fileFormat:FileFormat) method.

Python

from spire.pdf.common import *
from spire.pdf import *

# Create an object of the PdfDocument class
doc = PdfDocument()
# Load a PDF file
doc.LoadFromFile("Sample.pdf")

# Save each page of the file to a separate SVG file
doc.SaveToFile("PdfToSVG/ToSVG.svg", FileFormat.SVG)

# Close the PdfDocument object
doc.Close()

Python: Convert PDF to SVG

Convert a PDF File to SVG with Custom Width and Height in Python

The PdfDocument.PdfConvertOptions.SetPdfToSvgOptions(wPixel:float, hPixel:float) method provided by Spire.PDF for Python allows you to specify the width and height of the SVG files converted from PDF. The detailed steps are as follows.

Create an object of the PdfDocument class.
Load a sample PDF file using PdfDocument.LoadFromFile() method.
Specify the width and height of the output SVG files using PdfDocument.PdfConvertOptions.SetPdfToSvgOptions(wPixel:float, hPixel:float) method.
Convert each page of the PDF file to SVG using PdfDocument.SaveToFile(filename:str, fileFormat:FileFormat) method.

Python

from spire.pdf.common import *
from spire.pdf import *

# Create an object of the PdfDocument class
doc = PdfDocument()
# Load a PDF file
doc.LoadFromFile("Sample.pdf")

# Specify the width and height of output SVG files
doc.ConvertOptions.SetPdfToSvgOptions(800.0, 1200.0)

# Save each page of the file to a separate SVG file
doc.SaveToFile("PdfToSVGWithCustomWidthAndHeight/ToSVG.svg", FileFormat.SVG)

# Close the PdfDocument object
doc.Close()

Python: Convert PDF to SVG

Convert Specific Pages of a PDF File to SVG in Python

The PdfDocument.SaveToFile(filename:str, startIndex:int, endIndex:int, fileFormat:FileFormat) method provided by Spire.PDF for Python allows you to convert specific pages of a PDF file to SVG files. The detailed steps are as follows.

Create an object of the PdfDocument class.
Load a sample PDF file using PdfDocument.LoadFromFile() method.
Convert specific pages of the PDF file to SVG using PdfDocument.SaveToFile(filename:str, startIndex:int, endIndex:int, fileFormat:FileFormat) method.

Python

from spire.pdf.common import *
from spire.pdf import *

# Create an object of the PdfDocument class
doc = PdfDocument()
# Load a PDF file
doc.LoadFromFile("Sample.pdf")

# Save specific pages of the file to SVG files
doc.SaveToFile("PdfPagesToSVG/ToSVG.svg", 1, 2, FileFormat.SVG)

# Close the PdfDocument object
doc.Close()

Python: Convert PDF to SVG

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Conversion

Wednesday, 27 September 2023 01:15

Python Image to PDF Conversion: Best Practices and Code Examples

Python Examples for Converting Images to PDF

Converting images to PDF programmatically is a common task in document management, as it enhances organization, facilitates sharing, and ensures efficient archiving. By consolidating various image formats into a single PDF document, users can easily manage and distribute their visual content.

In this article, we will explore how to convert a variety of image formats —including PNG , JPEG , TIFF , and SVG —into PDF files using Spire.PDF for Python. We’ll provide detailed instructions and code examples to guide you through the conversion process, highlighting the flexibility and power of this library for handling different image types.

Table of Contents:

Why Convert Image to PDF?
Introducing Spire.PDF: Python Image-to-PDF Library
Convert PNG or JPEG to PDF
- Generate PDF Matching Image Dimensions
- Custom PDF Layout and Image Position
Convert Multi-Page TIFF to PDF
Convert Scalable SVG to PDF
Merge Multiple Images into One PDF
Conclusion
FAQs

1. Why Convert Image to PDF?

PDFs are preferred for their portability, security, and consistent formatting across devices. Converting images to PDF offers several benefits:

Preservation of Quality: PDFs retain image resolution, ensuring no loss in clarity.
Easier Sharing: A single PDF can combine multiple images, simplifying distribution.
Document Standardization: Converting images to PDF ensures compatibility with most document management systems.

Whether you're archiving scanned documents or preparing a portfolio, converting images to PDF enhances usability.

2. Introducing Spire.PDF: Python Image-to-PDF Library

Spire.PDF for Python is a robust library that enables PDF creation, manipulation, and conversion. Key features include:

Support for multiple image formats (PNG, JPEG, BMP, SVG, and more).
Flexible page customization (size, margins, orientation).
Batch processing and multi-image merging.
Advanced options like watermarking and encryption (beyond basic conversion).

To install the library, use:

pip install Spire.PDF

3. Convert PNG or JPEG to PDF in Python

3.1 Generate PDF Matching Image Dimensions

To convert a PNG or JPEG image to PDF while preserving its original size, we start by creating a PdfDocument object, which serves as the container for our PDF. We set the page margins to zero, ensuring that the image will fill the entire page.

After loading the image, we obtain its dimensions to create a new page that matches these dimensions. Finally, we draw the image on the page and save the document to a PDF file. This approach guarantees pixel-perfect conversion without resizing or distortion.

The following code demonstrates how to generate a PDF that perfectly matches your image’s size:

from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
document = PdfDocument()

# Set the page margins to 0
document.PageSettings.SetMargins(0.0)

# Load an image file
image = PdfImage.FromFile("C:\\Users\\Administrator\\Desktop\\robot.jpg")
    
# Get the image width and height
imageWidth = image.PhysicalDimension.Width
imageHeight = image.PhysicalDimension.Height

# Add a page that has the same size as the image
page = document.Pages.Add(SizeF(imageWidth, imageHeight))

# Draw image at (0, 0) of the page
page.Canvas.DrawImage(image, 0.0, 0.0)
      
# Save to file
document.SaveToFile("output/ImageToPdf.pdf")

# Dispose resources
document.Dispose()

Output:

Generated PDF file that has the same size as the image file.

3.2 Custom PDF Layouts and Image Position

To customize the PDF page size and margins, a few modifications to the code are needed. In this example, we set the page size to A4, adjusting the margins accordingly. The image is centered on the page by calculating its position based on the page dimensions. This method creates a more polished layout for the PDF.

The code below shows how to customize PDF settings and position images during conversion:

from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
document = PdfDocument()

# Set the page margins to 5
document.PageSettings.SetMargins(5.0)

# Define page size (A4 or custom)
document.PageSettings.Size = PdfPageSize.A4()

# Add a new page to the document
page = document.Pages.Add()

# Load the image from file
image = PdfImage.FromFile("C:\\Users\\Administrator\\Desktop\\robot.jpg")

# Get the image dimensions
imageWidth = image.PhysicalDimension.Width
imageHeight = image.PhysicalDimension.Height

# Calculate centered position for the image
x = (page.GetClientSize().Width - imageWidth) / 2
y = (page.GetClientSize().Height - imageHeight) / 2 

# Draw the image at the calculated position
page.Canvas.DrawImage(image, x, y, imageWidth, imageHeight)

# Save to a PDF file
document.SaveToFile("output/ImageToPdf.pdf")

# Release resources
document.Dispose()

Output:

An image placed at the center of an A4 page in a PDF.

4. Convert Multi-Page TIFF to PDF in Python

TIFF files are widely used for high-resolution images, making them suitable for applications such as document scanning and medical imaging. However, Spire.PDF does not support TIFF images natively. To handle TIFF files, we can use the Python Imaging Library (PIL) , which can be installed with the following command:

pip install Pillow

Using PIL, we can access each frame of a TIFF file, temporarily save it as a PNG, and then draw each PNG onto a PDF. This method ensures that each frame is added as a separate page in the PDF, preserving the original quality and layout.

Here is the code snippet for converting a multi-page TIFF to PDF in Python:

from spire.pdf.common import *
from spire.pdf import *
from PIL import Image
import io

# Create a PdfDocument object
document = PdfDocument()

# Set the page margins to 0
document.PageSettings.SetMargins(0.0)

# Load a TIFF image
tiff_image = Image.open("C:\\Users\\Administrator\\Desktop\\TIFF.tiff")

# Iterate through the frames in it
for i in range(tiff_image.n_frames):

    # Go to the current frame
    tiff_image.seek(i)
    
    # Extract the image of the current frame
    frame_image = tiff_image.copy()

    # Save the image to a PNG file
    frame_image.save(f"temp/output_frame_{i}.png")
    
    # Load the image file to PdfImage
    image = PdfImage.FromFile(f"temp/output_frame_{i}.png")

    # Get image width and height
    imageWidth = image.PhysicalDimension.Width
    imageHeight = image.PhysicalDimension.Height

    # Add a page to the document
    page = document.Pages.Add(SizeF(imageWidth, imageHeight))

    # Draw image at (0, 0) of the page
    page.Canvas.DrawImage(image, 0.0, 0.0)

# Save the document to a PDF file
document.SaveToFile("Output/TiffToPdf.pdf",FileFormat.PDF)

# Dispose resources
document.Dispose()

Output:

Screenshot of the input TIFF file and output PDF document.

5. Convert Scalable SVG to PDF in Python

SVG files are vector graphics that provide scalability without loss of quality, making them ideal for web graphics and print media. In this example, we create a PdfDocument object and load an SVG file directly into it. Spire.PDF library efficiently handles the conversion, allowing for quick and straightforward saving of the SVG as a PDF with minimal code.

Below is the code snippet for converting an SVG file to a PDF:

from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
document = PdfDocument()

# Load an SVG file
document.LoadFromSvg("C:\\Users\\Administrator\\Desktop\\SVG.svg")

# Save the SVG file to PDF
document.SaveToFile("output/SvgToPdf.pdf", FileFormat.PDF)

# Dispose resources
document.Dispose()

Tip : To combine multiple SVG files into a single PDF, convert them separately and then merge the resulting PDFs. For guidance, check out this article: How to Merge PDF Documents in Python.

Output:

Screenshot of the input SVG file and output PDF document.

6. Merge Multiple Images into One PDF

This process involves iterating through images in a specified directory, loading each one, and creating corresponding pages in the PDF document. Each page is formatted to match the image dimensions, preventing any loss or distortion. Finally, each image is drawn onto its respective page.

Code example for combining a folder of images into a single PDF:

from spire.pdf.common import *
from spire.pdf import *
import os

# Create a PdfDocument object
doc = PdfDocument()

# Set the page margins to 0
doc.PageSettings.SetMargins(0.0)

# Get the folder where the images are stored
path = "C:\\Users\\Administrator\\Desktop\\Images\\"
files = os.listdir(path)

# Iterate through the files in the folder
for root, dirs, files in os.walk(path):
    for file in files:

        # Load a particular image 
        image = PdfImage.FromFile(os.path.join(root, file))
    
        # Get the image width and height
        width = image.PhysicalDimension.Width
        height = image.PhysicalDimension.Height

        # Add a page that has the same size as the image
        page = doc.Pages.Add(SizeF(width, height))

        # Draw image at (0, 0) of the page
        page.Canvas.DrawImage(image, 0.0, 0.0, width, height)
      
# Save to file
doc.SaveToFile("output/CombineImages.pdf")
doc.Dispose()

Output:

Screenshot of the input image files in a folder and the output PDF document.

7. Conclusion

Converting images to PDF in Python using the Spire.PDF library is a straightforward task that can be accomplished through various methods for different image formats. Whether you need to convert single images, customize layouts, or merge multiple images, Spire.PDF provides the necessary tools to achieve your goals efficiently. With just a few lines of code, you can create high-quality PDFs from images, enhancing your document management capabilities.

8. FAQs

Q1: Can I convert images in bulk using Spire.PDF?

Yes, Spire.PDF allows you to iterate through directories and convert multiple images to a single PDF or individual PDFs, making bulk conversions easy.

Q2: What image formats does Spire.PDF support?

Spire.PDF supports various image formats, including PNG, JPEG, BMP, and SVG, providing versatility for different use cases.

Q3: Can I customize the PDF layout?

Absolutely. You can set margins, page sizes, and positions of images within the PDF for a tailored layout that meets your requirements.

Q4: Does converting an image to PDF reduce its quality?

No - when using Spire.PDF with default settings, the original image data is embedded without compression.

Get a Free License

To fully experience the capabilities of Spire.PDF for Python without any evaluation limitations, you can request a free 30-day trial license.

Published in Conversion

Monday, 18 September 2023 01:16

Convert PDF to Images in Python (PNG, JPG, BMP, SVG, TIFF)

Python examples to render PDF to PNG, JPG, BMP, SVG, and TIFF images

Converting PDF files to images in Python is a common need for developers and professionals working with digital documents. Whether you want to generate thumbnails, create previews, extract specific content areas, or prepare files for printing, transforming a PDF into image formats gives you flexibility and compatibility across platforms.

This comprehensive guide demonstrates how to convert PDF files into popular image formats—such as PNG, JPG, BMP, SVG, and TIFF—in Python, using practical, easy-to-follow code examples.

Why Convert PDF to Image
Python PDF-to-Image Converter Library
Simple PDF to PNG, JPG, and BMP Conversion
Advanced Conversion Options
- Enable Transparent Image Background
- Crop Specific PDF Areas to Image
Generate Multi-Page TIFF from PDF
Export PDF as SVG
Conclusion
FAQs

Why Convert PDF to Image?

Converting PDF to image formats offers several benefits:

Cross-platform compatibility: Images are easier to embed in web pages, mobile apps, or presentations.
Preview and thumbnail generation: Quickly create page snapshots without rendering the full PDF.
Selective content extraction: Save specific areas of a PDF as images for focused analysis or reuse.
Simplified sharing: Images can be easily emailed, uploaded, or displayed without special PDF readers.

Python PDF-to-Image Converter Library

Spire.PDF for Python is a powerful and easy-to-use library designed for handling PDF files. It enables developers to convert PDF pages into multiple image formats like PNG, JPG, BMP, SVG, and TIFF with excellent quality and performance.

PDF to Image Library for Python

Installation

You can easily install the library using pip. Simply open your terminal and run the following command:

pip install Spire.PDF

Simple PDF to PNG, JPG, and BMP Conversion

The SaveAsImage method of the PdfDocument class allows you to render each page of a PDF into an image format of your choice.

The code example below demonstrates how to load a PDF file, iterate through its pages, and save each one as a PNG image. You can easily adjust the file format to JPG or BMP by changing the file extension.

from spire.pdf import *

# Load the PDF file
pdf = PdfDocument()
pdf.LoadFromFile("template.pdf")

# Loop through pages and save as images
for i in range(pdf.Pages.Count):
    # Convert each page to image
    with pdf.SaveAsImage(i) as image:
        
        # Save in different formats as needed
        image.Save(f"Output/ToImage_{i}.png")
        # image.Save(f"Output/ToImage_{i}.jpg")
        # image.Save(f"Output/ToImage_{i}.bmp")

# Close the PDF document
pdf.Close()

Python: Convert PDF to Images (JPG, PNG, BMP)

Advanced Conversion Options

Enable Transparent Image Background

Transparent backgrounds help integrate images seamlessly into designs, avoiding unwanted borders or background colors.

To enable a transparent background during PDF-to-image conversion in Python, use the SetPdfToImageOptions() method with an alpha value of 0. This setting ensures that the background of the output image is fully transparent.

The following example demonstrates how to export each PDF page as a transparent PNG image.

from spire.pdf import *

# Load PDF document from file
pdf = PdfDocument()
pdf.LoadFromFile("template.pdf")

# Set the transparent value of the image's background to 0
pdf.ConvertOptions.SetPdfToImageOptions(0)

# Loop through all pages and save each as an image
for i in range(pdf.Pages.Count):
    # Convert each page to an image
    with pdf.SaveAsImage(i) as image:
        # Save the image to the output directory
        image.Save(f"Output/ToImage_{i}_transparent.png")

# Close the PDF document
pdf.Close()

Note: Transparency is supported in PNG but not in JPG or BMP formats.

Crop Specific PDF Areas to Image

In some cases, you may only need to export a specific area of a PDF page—such as a chart, table, or block of text. This can be done by adjusting the page’s CropBox before rendering.

The CropBox property defines the visible region of the page used for display and printing. By setting it to a specific RectangleF(x, y, width, height) value, you can isolate and export only the desired portion of the content.

The example below demonstrates how to crop a rectangular area on the first page of a PDF and save that section as a PNG image.

from spire.pdf import *

# Load the PDF document from file
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")

# Access the first page of the PDF
page = pdf.Pages[0]

# Define the crop area of the page using a rectangle (x, y, width, height)
page.CropBox = RectangleF(0.0, 300.0, 600.0, 260.0)

# Convert the cropped page to an image
with pdf.SaveAsImage(0) as image:
    # Save the image to a PNG file
    image.Save("Output/CropPDFSaveAsImage.png")
    
# Close the PDF document
pdf.Close()

Note: You need to adjust the coordinates based on the location of your target content. Coordinates start from the top-left corner of the page.

Python example to crop PDF page area to image

Generate Multi-Page TIFF from PDF

The TIFF format supports multi-page documents, making it a popular choice for archival and printing purposes. Although Spire.PDF for Python doesn't natively create multi-page TIFFs, you can render individual pages as images and then use the Pillow library to merge them into one .tiff file.

Before proceeding, ensure Pillow is installed by running:

pip install Pillow

The following example illustrates how to:

Load a PDF
Convert each page to an image
Combine all images into a single multi-page TIFF

from spire.pdf import *

from PIL import Image
from io import BytesIO

# Load the PDF document from file
pdf = PdfDocument()
pdf.LoadFromFile("Input.pdf")

# Create an empty list to store PIL Images
images = []

# Iterate through all pages in the document
for i in range(pdf.Pages.Count):

    # Convert a specific page to an image stream
    with pdf.SaveAsImage(i) as imageData:

        # Open the image stream as a PIL image
        img = Image.open(BytesIO(imageData.ToArray())) 

        # Append the PIL image to list
        images.append(img)

# Save the PIL Images as a multi-page TIFF file
images[0].save("Output/ToTIFF.tiff", save_all=True, append_images=images[1:])

# Dispose resources
pdf.Dispose()

Python example to generate multi-page TIFF from PDF

It’s also possible to convert TIFF files back to PDF. For detailed instructions on it, please refer to the tutorial: Python: Convert PDF to TIFF and TIFF to PDF.

Export PDF as SVG

SVG (Scalable Vector Graphics) is an ideal format for content that requires scaling without quality loss, such as charts, vector illustrations, and technical diagrams.

By using the SaveToFile() method with the FileFormat.SVG option, you can export PDF pages as SVG files. This conversion preserves the vector characteristics of the content, making it well-suited for web embedding, responsive design, and further editing in vector graphic tools.

The following example demonstrates how to export an entire PDF document to SVG format.

from spire.pdf import *

# Load the PDF document from file
pdf = PdfDocument()
pdf.LoadFromFile("Example.pdf")

# Save each page of the file to a separate SVG file
pdf.SaveToFile("PdfToSVG/ToSVG.svg", FileFormat.SVG)

# Close the PdfDocument object
pdf.Close()

Note: Each page in the PDF will be saved as a separate SVG file named ToSVG_i.svg, where i is the page number (1-based).

To export specific pages or customize the SVG output size, please refer to our detailed guide: Python: Convert PDF to SVG.

Conclusion

Converting PDF to images in formats like PNG, JPG, BMP, SVG, and TIFF provides flexibility for sharing, displaying, and processing digital documents. With Spire.PDF for Python, you can:

Export high-quality images from PDFs in various formats
Crop specific regions for focused content extraction
Generate multi-page TIFF files for archival purposes
Create scalable SVG vector graphics for diagrams and charts

By automating PDF to image conversion in Python, you can seamlessly integrate image export into your applications and workflows.

FAQs

Q1: Can I convert a range of pages from a PDF to images?

A1: Yes. You can convert specific pages by specifying their indices in a loop. For example, to export pages 1 to 3:

# Convert only pages 1-3
for i in range(0, 3):  # 0-based index
    with pdf.SaveAsImage(i) as img:
        img.Save(f"page_{i}.png")

Q2: Can I batch convert multiple PDF files to images?

A2: Yes, batch conversion is supported. You can iterate through a list of PDF file paths and convert each one within a loop.

pdf_files = ["a.pdf", "b.pdf", "c.pdf"]
for file in pdf_files:
    pdf = PdfDocument()
    pdf.LoadFromFile(file)
    for i in range(pdf.Pages.Count):
        with pdf.SaveAsImage(i) as img:
            img.Save(f"{file}_page_{i}.png")

Q3: Is it possible to convert password-protected PDFs to images?

A3: Yes, you can convert secured PDFs to images as long as you provide the correct password when loading the PDF document.

pdf = PdfDocument()
pdf.LoadFromFile("protected.pdf", "password")

Q4: Is it possible to extract embedded images from a PDF instead of rendering pages?

A4: Yes. Aside from rendering entire pages, the library also supports extracting images directly from the PDF.

Get a Free License

To fully experience the capabilities of Spire.PDF for Python without any evaluation limitations, you can request a free 30-day trial license.

Published in Conversion

Thursday, 07 September 2023 01:11

How to Convert PDF to Excel with Formatting in Python (Step-by-Step Guide)

Automate PDF to Excel in Python with Spire.PDF

Converting PDF files to Excel spreadsheets in Python is an effective way to extract structured data for analysis, reporting, and automation. While PDFs are excellent for preserving layout across platforms, their static format often makes data extraction challenging.

Excel, on the other hand, provides robust features for sorting, filtering, calculating, and visualizing data. By using Python along with the Spire.PDF for Python library, you can automate the entire PDF to Excel conversion process — from basic one-page documents to complex, multi-page PDFs.

Whether you're automating data extraction from PDFs or integrating PDF content into Excel workflows, this tutorial will walk you through both quick-start and advanced methods for reliable Python PDF to Excel conversion.

Why Convert PDF to Excel Programmatically in Python
Setting Up Your Development Environment
Quick Start: Convert PDF to Excel in Python
Advanced PDF to Excel Conversion with Layout Control and Formatting Options
Conclusion

Why Convert PDF to Excel Programmatically in Python

PDFs are ideal for sharing documents with consistent formatting, but their fixed structure makes them difficult to analyze or reuse, especially if they contain tables.

Converting PDF to Excel allows you to:

Extract tabular data for analysis or visualization
Automate monthly or recurring report extraction
Enable downstream processing in Excel
Save hours of manual copy-pasting

Using Python for this task adds automation, flexibility, and scalability — ideal for integration into data pipelines or backend services.

Setting Up Your Development Environment

Before you start converting PDF files to Excel using Python, it’s essential to set up your development environment properly. This ensures you have all the necessary tools and libraries installed to follow the tutorial smoothly.

Install Python

If you haven’t already installed Python on your system, download and install the latest version from the official website.

Make sure to add Python to your system PATH during installation to run Python commands from the terminal or command prompt easily.

Install Spire.PDF for Python

Spire.PDF for Python is the core library used in this tutorial to load, manipulate, and convert PDF documents.

To install it, open your terminal and run:

pip install Spire.PDF

This command downloads and installs Spire.PDF along with any required dependencies.

If you encounter any issues or need detailed installation help, please refer to our step-by-step guide: How to Install Spire.PDF for Python on Windows

Quick Start: Convert PDF to Excel in Python

If your PDF has a clean and simple layout without complex formatting or multiple page structures, you can convert it directly to Excel with just 3 lines of code using Spire.PDF for Python.

Steps to Quickly Export PDF to Excel

Follow these straightforward steps to export your PDF file to an Excel spreadsheet in Python:

Import the required classes.
Create a PdfDocument object.
Load your PDF file with the LoadFromFile method.
Export the PDF to Excel (.xlsx) format using the SaveToFile method and specify FileFormat.XLSX as the output format.

Code Example

from spire.pdf import *

# Create a PdfDocument object
pdf = PdfDocument()

# Load your PDF file
pdf.LoadFromFile("Sample.pdf")

# Convert and save the PDF to Excel
pdf.SaveToFile("output.xlsx", FileFormat.XLSX)

# Close the document
pdf.Close()

Advanced PDF to Excel Conversion with Layout Control and Formatting Options

For more complex PDF documents—such as those containing multiple pages, rotated text, table cells with multiple lines of text, or overlapping content - you can use the XlsxLineLayoutOptions class to gain precise control over the conversion process.

This allows you to preserve the original structure and formatting of your PDF more accurately when exporting to Excel.

Layout Options You Can Configure

The XlsxLineLayoutOptions class in Spire.PDF provides several properties that give you granular control over how PDF content is exported to Excel. Below is a breakdown of each option and its behavior:

Option	Description
convertToMultipleSheet	Determines whether to convert each PDF page into a separate worksheet. The default value is true.
rotatedText	Specifies whether to preserve the original rotation of angled text. The default value is true.
splitCell	Determines whether to split a PDF table cell with multiple lines of text into separate rows in the Excel output. The default value is true.
wrapText	Determines whether to enable word wrap inside Excel cells. The default value is true.
overlapText	Specifies whether text overlapping in the original PDF should be preserved in the Excel output. The default value is false.

Code Example

from spire.pdf import *

# Create a PdfDocument object
pdf = PdfDocument()

# Load your PDF file
pdf.LoadFromFile("Sample.pdf")

# Define layout options
# Parameters: convertToMultipleSheet, rotatedText, splitCell, wrapText, overlapText
layout_options = XlsxLineLayoutOptions(True, True, False, True, False)

# Apply layout options
pdf.ConvertOptions.SetPdfToXlsxOptions(layout_options)

# Convert and save the PDF to Excel
pdf.SaveToFile("advanced_output.xlsx", FileFormat.XLSX)

# Close the document
pdf.Close()

Advanced PDF to Excel conversion example in Python with Spire.PDF

Conclusion

Converting PDF files to Excel in Python is an efficient way to automate data extraction and processing tasks. Whether you need a quick conversion or fine-grained layout control, Spire.PDF for Python offers flexible options that scale from simple to complex scenarios.

Ready to automate your PDF to Excel conversions?
Get a free trial license for Spire.PDF for Python and explore the full Spire.PDF Documentation to get started today!

FAQs

Q1: Can I convert each PDF page into a separate Excel worksheet?

A1: Yes. Use the convertToMultipleSheet=True option in the XlsxLineLayoutOptions class to export each page to its own sheet.

Q2. What Excel format does Spire.PDF export to?

A2: Spire.PDF converts PDFs to .xlsx, the modern Excel format supported by Excel 2007 and later.

Q3: Can I convert a PDF to Excel in Python without losing formatting?

A3: Yes. Using Spire.PDF for Python, you can retain the original formatting, including merged cells, cell background colors, and other format settings when saving PDFs to Excel.

Q4: Can I extract only a specific table from a PDF to Excel instead of converting the whole document?

A4: Yes, Spire.PDF for Python supports extracting specific tables from PDF files. You can then write the extracted table data to Excel using our Excel processing library - Spire.XLS for Python.

Published in Conversion

Start «12

Page 2 of 2