Spire.Office Knowledgebase Page 12 | E-iceblue

PDF documents may occasionally include blank pages. These pages can affect the reading experience, increase the file size and lead to paper waste during printing. To improve the professionalism and usability of a PDF document, detecting and removing blank pages is an essential step.

This article shows how to accurately detect and remove blank pages—including those that appear empty but actually contain invisible elements—using Python, Spire.PDF for Python, and Pillow.

Install Required Libraries

This tutorial requires two Python libraries:

  • Spire.PDF for Python: Used for loading PDFs and detecting/removing blank pages.
  • Pillow: A library for image processing that helps detect visually blank pages, which may contain invisible content.

You can easily install both libraries using pip:

pip install Spire.PDF Pillow

Need help installing Spire.PDF? Refer to this guide:

How to Install Spire.PDF for Python on Windows

How to Effectively Detect and Remove Blank Pages from PDF Files in Python

Spire.PDF provides a method called PdfPageBase.IsBlank() to check if a page is completely empty. However, some pages may appear blank but actually contain hidden content like white text, watermarks, or background images. These cannot be reliably detected using the PdfPageBase.IsBlank() method alone.

To ensure accuracy, this tutorial adopts a two-step detection strategy:

  • Use the PdfPageBase.IsBlank() method to identify and remove fully blank pages.
  • Convert non-blank pages to images and analyze them using Pillow to determine if they are visually blank.

⚠️ Important:

If you don’t use a valid license during the PDF-to-image conversion, an evaluation watermark will appear on the image, potentially affecting the blank page detection.

Contact the E-iceblue sales team to request a temporary license for proper functionality.

Steps to Detect and Remove Blank Pages from PDF in Python

Follow these steps to implement blank page detection and removal in Python:

1. Define a custom is_blank_image() Method

This custom function uses Pillow to check whether the converted image of a PDF page is blank (i.e., if all pixels are white).

2. Load the PDF Document

Load the PDF using the PdfDocument.LoadFromFile() method.

3. Iterate Through Pages

Loop through each page to check if it’s blank using two methods:

  • If the PdfPageBase.IsBlank() method returns True, remove the page directly.
  • If not, convert the page to an image using the PdfDocument.SaveAsImage() method and analyze it with the custom is_blank_image() method.

4. Save the Result PDF

Finally, save the PDF with blank pages removed using the PdfDocument.SaveToFile() method.

Code Example

  • Python
import io
from spire.pdf import PdfDocument
from PIL import Image

# Apply the License Key
License.SetLicenseKey("License-Key")

# Custom function: Check if the image is blank (whether all pixels are white)
def is_blank_image(image):
        # Convert to RGB mode and then get the pixels
        img = image.convert("RGB")
        # Get all pixel points and check if they are all white
        white_pixel = (255, 255, 255)
        return all(pixel == white_pixel for pixel in img.getdata())

# Load the PDF document
pdf = PdfDocument()
pdf.LoadFromFile("Sample1111.pdf")

# Iterate through each page in reverse order to avoid index issues when deleting
for i in range(pdf.Pages.Count - 1, -1, -1):
    page = pdf.Pages[i]
    # Check if the current page is completely blank
    if page.IsBlank():
        # If it's completely blank, remove it directly from the document
        pdf.Pages.RemoveAt(i)
    else:
        # Convert the current page to an image
        with pdf.SaveAsImage(i) as image_data:
            image_bytes = image_data.ToArray()
            pil_image = Image.open(io.BytesIO(image_bytes))
            # Check if the image is blank
            if is_blank_image(pil_image):
                # If it's a blank image, remove the corresponding page from the document
                pdf.Pages.RemoveAt(i)

# Save the resulting PDF
pdf.SaveToFile("RemoveBlankPages.pdf")
pdf.Close()

Python Find and Remove Blank Pages from PDF

Frequently Asked Questions (FAQs)

Q1: What is considered a blank page in a PDF file?

A: A blank page may be truly empty or contain hidden elements such as white text, watermarks, or transparent objects. This solution detects both types using a dual-check strategy.

Q2: Can I use this method without a Spire.PDF license?

A: Yes, you can run it without a license. However, during PDF-to-image conversion, an evaluation watermark will be added to the output images, which may affect the accuracy of blank page detection. It's best to request a free temporary license for testing.

Q3: What versions of Python are compatible with Spire.PDF?

A: Spire.PDF for Python supports Python 3.7 and above. Ensure that Pillow is also installed to perform image-based blank page detection.

Q4: Can I modify the script to only detect blank pages without deleting them?

A: Absolutely. Just remove or comment out the pdf.Pages.RemoveAt(i) line and use print() or logging to list detected blank pages for further review.

Conclusion

Removing unnecessary blank pages from PDF files is an important step in optimizing documents for readability, file size, and professional presentation. With the combined power of Spire.PDF for Python and Pillow, developers can precisely identify both completely blank pages and pages that appear empty but contain invisible content. Whether you're generating reports, cleaning scanned files, or preparing documents for print, this Python-based solution ensures clean and efficient PDFs.

Get a Free License

To fully experience the capabilities of Spire.PDF for Python without any evaluation limitations, you can request a free 30-day trial license.

Generating Word documents programmatically is a common requirement in business applications, whether for creating reports, contracts, or personalized letters. Spire.Doc for .NET provides a comprehensive solution for working with Word documents in C# without Microsoft Office dependencies.

This article explores two effective approaches for document generation from templates: replacing text placeholders and modifying bookmark content.

.NET Library for Creating Word Documents

Spire.Doc for .NET is a professional Word API that enables developers to perform a wide range of document processing tasks. Key features include:

  • Creating, reading, editing, and converting Word documents
  • Support for DOC, DOCX, RTF, and other formats
  • Template-based document generation
  • Mail merge functionality
  • Preservation of original formatting

The library is particularly useful for automating document generation in enterprise applications, where consistency and efficiency are crucial.

To begin generating Word documents from a template, donwload Spire.Doc for .NET from our official website or install it using the NuGet Package Manager with the following command:

PM> Install-Package Spire.Doc

Create a Word Document By Replacing Text Placeholders

The Document.Replace method in the Spire.Doc library is used to find and replace specific text within a Word document. This method allows for the efficient modification of text placeholders, enabling the dynamic generation of documents based on templates.

Here are the steps for Word generation through text pattern replacement:

  • Document Initialization: Create a new Document object and load the template file.
  • Placeholder Definition: Use a dictionary to map placeholders (like #name#) to their replacement values.
  • Text Replacement: The Document.Replace method performs case-sensitive and whole-word replacement of all placeholders.
  • Image Handling: The custom ReplaceTextWithImage method locates a text placeholder and substitutes it with an image.
  • Document Saving: The modified document is saved with a new filename.
  • C#
using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;
using System.Drawing;

namespace CreateWordByReplacingTextPlaceholders
{
    class Program
    {
        static void Main(string[] args)
        {
            // Initialize a new Document object
            Document document = new Document();

            // Load the template Word file
            document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Template.docx");

            // Dictionary to hold text placeholders and their replacements
            Dictionary<string, string> replaceDict = new Dictionary<string, string>
            {
                { "#name#", "John Doe" },
                { "#gender#", "Male" },
                { "#birthdate#", "January 15, 1990" },
                { "#address#", "123 Main Street" },
                { "#city#", "Springfield" },
                { "#state#", "Illinois" },
                { "#postal#", "62701" },
                { "#country#", "United States" }
            };

            // Replace placeholders in the document with corresponding values
            foreach (KeyValuePair<string, string> kvp in replaceDict)
            {
                document.Replace(kvp.Key, kvp.Value, true, true);
            }

            // Path to the image file
            String imagePath = "C:\\Users\\Administrator\\Desktop\\portrait.png";

            // Replace the placeholder “#photo#” with an image
            ReplaceTextWithImage(document, "#photo#", imagePath);

            // Save the modified document
            document.SaveToFile("ReplacePlaceholders.docx", FileFormat.Docx);

            // Release resources
            document.Dispose();
        }

        // Method to replace a placeholder in the document with an image
        static void ReplaceTextWithImage(Document document, String stringToReplace, String imagePath)
        {
            // Load the image from the specified path
            Image image = Image.FromFile(imagePath);
            DocPicture pic = new DocPicture(document);
            pic.LoadImage(image);

            // Find the placeholder in the document
            TextSelection selection = document.FindString(stringToReplace, false, true);

            // Get the range of the found text
            TextRange range = selection.GetAsOneRange();
            int index = range.OwnerParagraph.ChildObjects.IndexOf(range);

            // Insert the image and remove the placeholder text
            range.OwnerParagraph.ChildObjects.Insert(index, pic);
            range.OwnerParagraph.ChildObjects.Remove(range);
        }
    }
}

Create Word from Template by Replacing Text Placeholders

Create a Word Document By Replacing Bookmark Content

The BookmarksNavigator class in Spire.Doc is specifically designed to manage and navigate through bookmarks in a Word document. This class simplifies the process of finding and replacing content in bookmarks, making it easy to update sections of a document without manually searching for each bookmark.

The following are the steps for Word generation using bookmark content replacement:

  • Document Initialization: Create a new Document object and load the template file.
  • Bookmark Content Definitions: Create a dictionary mapping bookmark names to their replacement content.
  • Bookmark Navigation: The BookmarksNavigator class provides precise control over bookmark locations.
  • Content Replacement: The ReplaceBookmarkContent method preserves the bookmark while updating its content.
  • Document Saving: The modified document is saved with a new filename.
  • C#
using Spire.Doc;
using Spire.Doc.Documents;

namespace CreateWordByReplacingBookmarkContent
{
    class Program
    {
        static void Main(string[] args)
        {
            // Initialize a new Document object
            Document document = new Document();

            // Load the template Word file
            document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Template.docx");

            // Define bookmark names and their replacement values
            Dictionary<string, string> replaceDict = new Dictionary<string, string>
            {
                { "name", "Tech Innovations Inc." },
                { "year", "2015" },
                { "headquarter", "San Francisco, California, USA" },
                { "history", "Tech Innovations Inc. was founded by a group of engineers and " +
                              "entrepreneurs with a vision to revolutionize the technology sector. Starting " +
                              "with a focus on software development, the company expanded its portfolio to " +
                              "include artificial intelligence and cloud computing solutions." }
            };

            // Create a BookmarksNavigator to manage bookmarks in the document
            BookmarksNavigator bookmarkNavigator = new BookmarksNavigator(document);

            // Replace each bookmark's content with the corresponding value
            foreach (KeyValuePair<string, string> kvp in replaceDict)
            {
                bookmarkNavigator.MoveToBookmark(kvp.Key); // Navigate to bookmark
                bookmarkNavigator.ReplaceBookmarkContent(kvp.Value, true); // Replace content
            }

            // Save the modified document
            document.SaveToFile("ReplaceBookmarkContent.docx", FileFormat.Docx2013);

            // Release resources
            document.Dispose();
        }
    }
}

Create Word from Template by Replacing Bookmark Content

Conclusion

Both approaches provide effective ways to generate documents from templates, but with important differences:

  • Text Replacement Method:
    • Permanently removes placeholders during replacement
    • Best for one-time document generation
    • Better suited for replacing text with images
  • Bookmark Replacement Method:
    • Preserves bookmarks for future updates
    • Ideal for templates requiring periodic updates

Additionally, Spire.Doc for .NET supports Mail Merge functionality, which provides another powerful way to dynamically generate documents from templates. This feature is particularly useful for creating personalized documents in bulk, such as form letters or reports, where data comes from a database or other structured source.

Get a Free License

To fully experience the capabilities of Spire.Doc for .NET without any evaluation limitations, you can request a free 30-day trial license.

Introduction

Digital signatures help verify the authenticity and integrity of PDF documents. However, if a signing certificate expires or is revoked, the signature alone may no longer be considered valid. To solve this, a timestamp can be added to the digital signature, proving that the document was signed at a specific point in time-validated by a trusted Time Stamp Authority (TSA).

In this tutorial, we will introduce how to use the Spire.PDF for Python library to digitally sign a PDF document with a timestamp in Python.

Prerequisites

To follow this tutorial, ensure you have the following:

  • Spire.PDF for Python library
  • You can install Spire.PDF for Python via pip:
    pip install Spire.PDF
    If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows
  • A valid digital certificate (.pfx file)
  • This certificate is used to create the digital signature.
  • A sample PDF file
  • This is the document you want to sign.
  • An image to display as the signature appearance (optional)
  • For visual representation of the signer.
  • A reliable Time Stamp Authority (TSA) URL
  • This provides the timestamp token during signing.

How to Digitally Sign a PDF with a Timestamp in Python

In Spire.PDF for Python, the Security_PdfSignature class is used to create a digital signature, and the ConfigureTimestamp(tsaUrl) method in this class is used to embed a timestamp into the signature. The tsaUrl parameter specifies the address of the TSA server.

Steps to Add a Timestamped Digital Signature

Follow these steps to add a timestamped digital signature to a PDF in Python using Spire.PDF for Python:

  • Create a PdfDocument instance and use the LoadFromFile() method to load the PDF you want to sign.
  • Create a Security_PdfSignature object, specifying the target page, certificate file path, certificate password, and signature name.
  • Configure the signature's appearance, including its position, size, display labels, and signature image.
  • Embed a timestamp by calling the ConfigureTimestamp(tsaUrl) method with a valid Time Stamp Authority (TSA) URL.
  • Save the signed PDF using the SaveToFile() method.

Code Example

  • Python
from spire.pdf import *

inputFile = "Sample.pdf"
inputFile_pfx = "gary.pfx"
inputImage = "E-iceblueLogo.png"
outputFile = "SignWithTimestamp.pdf"

# Create a PdfDocument instance and load the PDF file to be signed
doc = PdfDocument()
doc.LoadFromFile(inputFile)

# Create a digital signature object by specifying the document, target page, certificate file path, certificate password, and signature name
signature = Security_PdfSignature(doc, doc.Pages.get_Item(0), inputFile_pfx, "e-iceblue", "signature")

# Define the position and size of the signature on the page (unit: point)
signature.Bounds = RectangleF(PointF(90.0, 600.0), SizeF(180.0, 90.0))

# Set the labels and content for the signature details
signature.NameLabel = "Digitally signed by: "
signature.Name = "Gary"
signature.LocationInfoLabel = "Location: "
signature.LocationInfo = "CN"
signature.ReasonLabel = "Reason: "
signature.Reason = "Ensure authenticity"
signature.ContactInfoLabel = "Contact Number: "
signature.ContactInfo = "028-81705109"

# Set document permissions: allow form filling, forbid further changes
signature.DocumentPermissions = PdfCertificationFlags.AllowFormFill.value | PdfCertificationFlags.ForbidChanges.value

# Set the graphic mode to include both image and signature details,
# and set the signature image
signature.GraphicsMode = Security_GraphicMode.SignImageAndSignDetail
signature.SignImageSource = PdfImage.FromFile(inputImage)

# Embed a timestamp into the signature using a Time Stamp Authority (TSA) server
url = "http://tsa.cesnet.cz:3161/tsa"
signature.ConfigureTimestamp(url)

# Save the signed PDF and close the document
doc.SaveToFile(outputFile)
doc.Close()

View the Timestamp in PDF

When you open the signed PDF in a viewer like Adobe Acrobat, you can click the Signature Panel to view both the digital signature and the timestamp, which confirm the document’s validity and the signing time:

Add a Timestamped Digital Signature to PDF in Python

Get a Free License

To fully experience the capabilities of Spire.PDF for Python without any evaluation limitations, you can request a free 30-day trial license.

Conclusion

Timestamping enhances the reliability of digital signatures by proving when a PDF was signed-even after the certificate has expired. With Spire.PDF for Python, implementing a timestamped digital signature is a straightforward process. Whether you're handling contracts, invoices, or confidential records, this approach ensures long-term document validity and compliance.

page 12