
Extract Images from PDF in Java – Preserve Quality & Filter Noise

2024-11-21 07:09:00 Written by zaki zou


When dealing with PDF documents that contain images, such as scanned reports, digital brochures, or design portfolios, you may need to extract those images for reuse or analysis. This article shows how to extract images from PDF in Java with the Spire.PDF for Java library, covering both basic usage and more advanced extraction techniques.

Whether you're creating a PDF image extractor in Java or simply looking to extract images from a PDF file using Java code, this guide will walk you through the process step by step.

Guide Outline

Getting Started – Tools and Setup
Extract All Images from a PDF in Java
Advanced Tips for More Precise Image Extraction
Frequently Asked Questions
Conclusion

Getting Started – Tools and Setup

Extracting images from PDF files in Java is difficult without a third-party library. PDFs may contain valuable image assets, such as scanned pages, charts, or embedded graphics, but the JDK has no built-in PDF parser, and these images are stored inside the file as encoded or compressed streams.

Spire.PDF for Java provides a high-level, reliable way to locate and extract embedded or inline images from PDF files. Whether you’re building an automation tool or a document parser, this library helps you extract image content efficiently and with full quality.

Before getting started, make sure you have the following development tools ready:

Java Development Kit (JDK) 1.6 or above
Spire.PDF for Java (Free or commercial version)
An IDE (e.g., IntelliJ IDEA, Eclipse)

Maven Dependency:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.11.11</version>
    </dependency>
</dependencies>

For smaller tasks, you can use Free Spire.PDF for Java instead; note that the free edition has page-count limitations.

Extract All Images from a PDF in Java

The most straightforward way to extract images from a PDF is by using the PdfImageHelper class in Spire.PDF for Java. This utility scans each page, locates embedded or inline images, and returns both the image data and metadata such as size and position.

Code Example: Basic Image Extraction

import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.utilities.PdfImageHelper;
import com.spire.pdf.utilities.PdfImageInfo;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

public class ExtractAllImagePDF {
    public static void main(String[] args) throws IOException {
        // Create a PdfDocument instance
        PdfDocument pdf = new PdfDocument();
        pdf.loadFromFile("input.pdf");

        // Create an image helper instance
        PdfImageHelper imageHelper = new PdfImageHelper();

        // Loop through each page to extract images
        for (int i = 0; i < pdf.getPages().getCount(); i++) {
            PdfPageBase page = pdf.getPages().get(i);
            PdfImageInfo[] imagesInfo = imageHelper.getImagesInfo(page);

            for (int j = 0; j < imagesInfo.length; j++) {
                BufferedImage image = imagesInfo[j].getImage();
                File file = new File("output/Page" + i + "_Image" + j + ".png");
                ImageIO.write(image, "png", file);
            }
        }

        pdf.close();
    }
}

Make sure the output folder exists before running the code to avoid IOException.
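One way to handle the output folder in code is a small save helper. The sketch below uses only the JDK: Files.createDirectories() creates the folder if it is missing, and the helper checks the boolean returned by ImageIO.write(), because ImageIO reports an unsupported format or image type by returning false rather than throwing. The ImageSaver class and its file-naming scheme are illustrative and not part of Spire.PDF.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ImageSaver {

    // Writes one extracted image as <dir>/Page<page>_Image<index>.<format>,
    // creating the output folder first so the write cannot fail on a missing path.
    static File save(BufferedImage image, Path dir, int page, int index, String format) throws IOException {
        Files.createDirectories(dir); // does nothing if the folder already exists
        File file = dir.resolve("Page" + page + "_Image" + index + "." + format).toFile();
        // ImageIO.write() returns false (instead of throwing) when no installed
        // writer can encode this image in the requested format.
        if (!ImageIO.write(image, format, file)) {
            throw new IOException("No ImageIO writer for format '" + format + "' and this image type");
        }
        return file;
    }
}
```

Inside the basic example, the inner loop body could then call ImageSaver.save(imagesInfo[j].getImage(), Paths.get("output"), i, j, "png") (with java.nio.file.Paths imported).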

How It Works

PdfDocument loads and holds the structure of the input PDF.
PdfPageBase represents a single page inside the PDF.
PdfImageHelper.getImagesInfo(PdfPageBase) scans a specific page and returns an array of PdfImageInfo, each containing a detected image.
Each PdfImageInfo includes:
- The image itself as a BufferedImage
- Its position and size on the page, available through getBounds()
ImageIO.write() supports common formats like "png", "jpg", and "bmp" — you can change the format string as needed.
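If the format string comes from configuration or user input, it can help to validate it before extraction starts. A minimal JDK-only sketch (the ImageFormats helper is hypothetical) that lists the formats ImageIO can write on the current JVM; the exact list depends on the JDK version and any installed ImageIO plugins.

```java
import javax.imageio.ImageIO;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

final class ImageFormats {

    // Lower-cased names of every format this JVM's ImageIO has a writer for.
    static Set<String> writableFormats() {
        Set<String> names = new TreeSet<>();
        for (String name : ImageIO.getWriterFormatNames()) {
            names.add(name.toLowerCase(Locale.ROOT));
        }
        return names;
    }

    // Case-insensitive check, so "PNG" and "png" are treated the same.
    static boolean canWrite(String format) {
        return writableFormats().contains(format.toLowerCase(Locale.ROOT));
    }
}
```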

After running the extraction code, you’ll get a folder containing the exported images from the PDF, each saved in a separate file.

Extracted PDF images saved as PNG files from each page

These high-level abstractions save you from manually decoding image XObjects or parsing raw streams—making PDF image extraction in Java easier and cleaner.

To save full PDF pages as images instead of just extracting embedded images, follow our guide on saving PDF pages as images in Java.

Advanced Tips for More Precise Image Extraction

Extracting images from a PDF is not always a one-size-fits-all operation. Some files contain layout elements like background layers, small decorative icons, or embedded metadata images. The following advanced tips help you refine your extraction logic for better results:

Skip Background Images (Optional)

Some PDF files include background images, such as watermarks or decorative layers. When these are defined using standard PDF background settings, they are often returned as the first image on the page. If that holds for your files, you can skip the first extracted image on each page to focus on meaningful content:

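// Inside the page loop: i is the page index, imagesInfo holds the images found on page i
for (int j = 1; j < imagesInfo.length; j++) {  // Start at 1 to skip the background image
    BufferedImage image = imagesInfo[j].getImage();
    ImageIO.write(image, "png", new File("output/Page" + i + "_Image" + j + ".png"));
}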

You can also call the getBounds() method of each PdfImageInfo to check an image's size and position before deciding to skip it.
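A bounds-based check is an alternative to always skipping the first image. This sketch assumes that PdfImageInfo.getBounds() returns a java.awt.geom.Rectangle2D in the page's coordinate space (verify this against your Spire.PDF version) and that page.getActualSize() supplies the page width and height. The BackgroundFilter class and the coverage threshold are hypothetical and need tuning for your documents.

```java
import java.awt.geom.Rectangle2D;

final class BackgroundFilter {

    // Heuristic: treat an image as a background layer if the part of it that lies
    // on the page covers at least `minCoverage` (0..1) of the page area.
    static boolean looksLikeBackground(Rectangle2D imageBounds, double pageWidth, double pageHeight, double minCoverage) {
        Rectangle2D page = new Rectangle2D.Double(0, 0, pageWidth, pageHeight);
        // Clip to the page so images that bleed past the edges are measured fairly.
        Rectangle2D visible = imageBounds.createIntersection(page);
        if (visible.isEmpty()) {
            return false; // the image lies entirely off the page
        }
        double coverage = (visible.getWidth() * visible.getHeight()) / (pageWidth * pageHeight);
        return coverage >= minCoverage;
    }
}
```

With a threshold of 0.9, an image counts as background only if it covers 90% or more of the page. Do not apply this filter to scanned PDFs, where the full-page image is the content itself.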

Filter by Image Size (Ignore Small Icons)

To exclude small elements like buttons or logos, add a size threshold before saving:

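// Inside the inner loop of the basic example (i = page index, j = image index)
BufferedImage image = imagesInfo[j].getImage();
if (image.getWidth() > 200 && image.getHeight() > 200) {  // 200 px is an example threshold
    ImageIO.write(image, "png", new File("output/Page" + i + "_Image" + j + ".png"));
}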

This helps keep the output folder clean and focused on relevant image content.
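The same check can be factored into a reusable filter. A minimal JDK-only sketch, assuming you first collect the BufferedImage objects returned by getImage() into a list; SizeFilter is a hypothetical name, and the thresholds are parameters rather than fixed values.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

final class SizeFilter {

    // Keeps only images that are at least minWidth x minHeight pixels.
    static List<BufferedImage> keepLarge(List<BufferedImage> images, int minWidth, int minHeight) {
        List<BufferedImage> kept = new ArrayList<>();
        for (BufferedImage image : images) {
            // Both dimensions must pass, so thin strips are dropped as well as small icons.
            if (image.getWidth() >= minWidth && image.getHeight() >= minHeight) {
                kept.add(image);
            }
        }
        return kept;
    }
}
```

Requiring both dimensions to pass also removes long, thin elements such as rules or banners (for example, 1000 x 20 pixels).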

Export Images in Various Formats or Streams

You can output images in various formats or streams depending on your use case:

ImageIO.write(image, "JPEG", new File("output/image_" + i + ".jpg"));  // lossy, smaller files
ImageIO.write(image, "BMP", new File("output/image_" + i + ".bmp"));   // lossless, uncompressed

Use PNG or BMP for lossless quality (e.g., archival or OCR).
Use JPEG for web or lower storage usage.
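One caveat when choosing JPEG or BMP: the JDK's built-in JPEG and BMP writers generally cannot encode images with an alpha channel, and in that case ImageIO.write() returns false without writing a file (older JDKs may instead write a JPEG with wrong colors). A common workaround, sketched below with JDK classes only, is to flatten the image onto an opaque background first. Whether an extracted image carries alpha depends on the PDF, so treat this as a precaution; OpaqueConverter is a hypothetical name.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class OpaqueConverter {

    // Returns an opaque TYPE_INT_RGB copy of src, painting transparent areas with
    // the given background color. Images without alpha are returned unchanged.
    static BufferedImage toRgb(BufferedImage src, Color background) {
        if (!src.getColorModel().hasAlpha()) {
            return src;
        }
        BufferedImage rgb = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(background);
            g.fillRect(0, 0, src.getWidth(), src.getHeight()); // fill first...
            g.drawImage(src, 0, 0, null);                      // ...then composite the image on top
        } finally {
            g.dispose();
        }
        return rgb;
    }
}
```

Usage: ImageIO.write(OpaqueConverter.toRgb(image, Color.WHITE), "JPEG", new File("output/image_" + i + ".jpg"));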

You can also write images to a ByteArrayOutputStream or other output streams for further processing:

ByteArrayOutputStream stream = new ByteArrayOutputStream();  // java.io.ByteArrayOutputStream
ImageIO.write(image, "PNG", stream);
byte[] imageBytes = stream.toByteArray();  // the encoded PNG data
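To pass the encoded data to another component (for example, a database layer or an HTTP client), you can wrap this in a pair of helpers. A JDK-only sketch, with ImageBytes as a hypothetical name, that also decodes the bytes again, which is handy for checking a round trip in tests:

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

final class ImageBytes {

    // Encodes an image in memory without touching the file system.
    static byte[] encode(BufferedImage image, String format) throws IOException {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        if (!ImageIO.write(image, format, stream)) {
            throw new IOException("No ImageIO writer for format '" + format + "'");
        }
        return stream.toByteArray();
    }

    // Decodes bytes produced by encode() back into a BufferedImage.
    static BufferedImage decode(byte[] bytes) throws IOException {
        return ImageIO.read(new ByteArrayInputStream(bytes));
    }
}
```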

Also Want to Extract Images from PDF Attachments?

If your PDF contains embedded file attachments like .jpg or .png images, you'll need a different approach. See our guide here:
How to Extract Attachments from PDF in Java

FAQ for Image Extraction from PDF in Java

Can I extract images from a PDF file using Java?

Yes. Using Spire.PDF for Java, you can extract embedded or inline images from PDF pages with a few lines of code.

Will extracted images retain their original quality?

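Yes, as far as the pixel data is concerned. getImage() returns each image as a decoded BufferedImage at its original resolution, but the original compressed stream (for example, JPEG data) is not kept. Saving in PNG or BMP format preserves those pixels without further loss, while saving as JPEG re-compresses them.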

What’s the difference between image extraction and rendering PDF as an image?

Rendering a PDF page creates a bitmap version of the entire page (including text and layout), while image extraction pulls out only the embedded image objects that were originally inserted in the file.

Does this work for scanned PDFs?

Yes. Many scanned PDFs store each page as a full-page raster image (typically JPEG- or CCITT-compressed). These are extracted just like any other embedded image.

Conclusion

Extracting images from PDF files using Java is fast and efficient with Spire.PDF. Whether you're analyzing marketing materials, scanned reports, or design portfolios, this Java PDF image extractor solution helps you programmatically access and save high-quality images embedded in your documents.

For more advanced cases—such as excluding layout images or processing attachments—the API offers enough flexibility to customize your approach.

To fully unlock the capabilities of Spire.PDF for Java without any evaluation limitations, you can apply for a free temporary license.

Published in Extract/Read

Tagged under

pdf java Extract Read

Java: Encrypt or Decrypt PDF Files

2022-06-29 07:04:00 Written by Koohji

For PDF documents that contain confidential or sensitive information, you may want to password protect these documents to ensure that only the designated person can access the information. This article will demonstrate how to programmatically encrypt a PDF document and decrypt a password-protected document using Spire.PDF for Java.

Encrypt a PDF File with Password
Remove Password to Decrypt a PDF File

Install Spire.PDF for Java

First, add the Spire.Pdf.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can import it by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.11.11</version>
    </dependency>
</dependencies>

Encrypt a PDF File with Password

There are two kinds of passwords for encrypting a PDF file: the open password and the permission password. The open password is required to open the file, while the permission password restricts printing, content copying, commenting, etc. If a PDF file is secured with both types of passwords, it can be opened with either password.

The PdfDocument.getSecurity().encrypt(java.lang.String openPassword, java.lang.String permissionPassword, java.util.EnumSet<PdfPermissionsFlags> permissions, PdfEncryptionKeySize keySize) method offered by Spire.PDF for Java allows you to set both open password and permission password to encrypt PDF files. The detailed steps are as follows.

Create a PdfDocument instance.
Load a sample PDF file using PdfDocument.loadFromFile() method.
Set open password, permission password, encryption key size and permissions.
Encrypt the PDF file using PdfDocument.getSecurity().encrypt(java.lang.String openPassword, java.lang.String permissionPassword, java.util.EnumSet<PdfPermissionsFlags> permissions, PdfEncryptionKeySize keySize) method.
Save the result file using PdfDocument.saveToFile() method.


import java.util.EnumSet;

import com.spire.pdf.PdfDocument;
import com.spire.pdf.security.PdfEncryptionKeySize;
import com.spire.pdf.security.PdfPermissionsFlags;

public class EncryptPDF {

    public static void main(String[] args) {

        //Create a PdfDocument instance
        PdfDocument pdf = new PdfDocument();

        //Load a sample PDF file
        pdf.loadFromFile("E:\\Files\\sample.pdf");

        //Encrypt the file
        PdfEncryptionKeySize keySize = PdfEncryptionKeySize.Key_128_Bit;
        String openPassword = "e-iceblue";
        String permissionPassword = "test";
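        // Permissions granted when the file is opened with the open password: printing and form filling
        EnumSet<PdfPermissionsFlags> flags = EnumSet.of(PdfPermissionsFlags.Print, PdfPermissionsFlags.Fill_Fields);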
        pdf.getSecurity().encrypt(openPassword, permissionPassword, flags, keySize);

        //Save and close
        pdf.saveToFile("Encrypt.pdf");
        pdf.close();

    }

}


Remove Password to Decrypt a PDF File

When you need to remove the password from a PDF file, you can set the open password and permission password to empty while calling the PdfDocument.getSecurity().encrypt(java.lang.String openPassword, java.lang.String permissionPassword, java.util.EnumSet<PdfPermissionsFlags> permissions, PdfEncryptionKeySize keySize, java.lang.String originalPermissionPassword) method. The detailed steps are as follows.

Create a PdfDocument object.
Load the encrypted PDF file with password using PdfDocument.loadFromFile(java.lang.String filename, java.lang.String password) method.
Decrypt the PDF file by setting the open password and permission password to empty using PdfDocument.getSecurity().encrypt(java.lang.String openPassword, java.lang.String permissionPassword, java.util.EnumSet<PdfPermissionsFlags> permissions, PdfEncryptionKeySize keySize, java.lang.String originalPermissionPassword) method.
Save the result file using PdfDocument.saveToFile() method.


import com.spire.pdf.PdfDocument;
import com.spire.pdf.security.PdfEncryptionKeySize;
import com.spire.pdf.security.PdfPermissionsFlags;

public class DecryptPDF {

    public static void main(String[] args) throws Exception {

        //Create a PdfDocument instance
        PdfDocument pdf = new PdfDocument();
        
        //Load the encrypted PDF file with password
        pdf.loadFromFile("Encrypt.pdf", "e-iceblue");

        //Decrypt the file
        pdf.getSecurity().encrypt("", "", PdfPermissionsFlags.getDefaultPermissions(), PdfEncryptionKeySize.Key_256_Bit, "test");

        //Save and close
        pdf.saveToFile("Decrypt.pdf");
        pdf.close();
    }
}


Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Security

Tagged under

pdf java Security

Adding Watermarks to PDF in Java: Text, Images & Custom Styling

2023-02-06 07:02:00 Written by Koohji


Watermarking PDF documents serves as an essential measure for intellectual property protection, document status identification, and brand reinforcement. Java developers can efficiently automate this process using specialized libraries like Spire.PDF for Java, which offers comprehensive solutions for applying both text-based watermarks (such as "CONFIDENTIAL" labels) and graphical watermarks (including corporate logos).

This practical guide provides step-by-step instructions for implementing PDF watermarking in Java using Spire.PDF. You'll learn proven techniques to enhance document security and professional presentation through effective watermark implementation.

Java Library for Watermarking PDF
Steps to Add a Watermark to PDF in Java
Add a Text Watermark to PDF
Add an Image Watermark to PDF
Conclusion
FAQs

Java Library for Watermarking PDF

Spire.PDF for Java is a versatile library that simplifies PDF manipulation, including watermarking. Its intuitive API allows developers to add watermarks with minimal code while offering fine-grained control over appearance and placement.

To get started, download Spire.PDF for Java and reference it in your project. For Maven users, include the following in your pom.xml:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.11.11</version>
    </dependency>
</dependencies>

Steps to Add a Watermark to PDF in Java

Load the PDF using PdfDocument.
Define the watermark (text with font/style or an image file).
Set transparency (e.g., 0.3 for faint, 0.7 for stronger visibility).
Calculate position (e.g., centered, custom location).
Apply the watermark to all pages or specific ones.
Save the modified document to a new file.
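Step 4 (calculating the position) generalizes beyond centering. The sketch below computes the top-left drawing point for a few common placements using only java.awt.geom.Point2D. The Anchor names and the margin parameter are hypothetical, and it assumes the canvas origin is at the top-left corner with y increasing downward (verify this against your Spire.PDF version; with a bottom-left origin, swap the top and bottom cases).

```java
import java.awt.geom.Point2D;

final class WatermarkPosition {

    // Hypothetical placement options.
    enum Anchor { CENTER, TOP_LEFT, TOP_RIGHT, BOTTOM_LEFT, BOTTOM_RIGHT }

    // Top-left drawing point for a watermark of size w x h on a pageW x pageH page.
    // Corner placements keep `margin` points away from the page edges.
    static Point2D.Double topLeft(Anchor anchor, double pageW, double pageH, double w, double h, double margin) {
        switch (anchor) {
            case TOP_LEFT:     return new Point2D.Double(margin, margin);
            case TOP_RIGHT:    return new Point2D.Double(pageW - w - margin, margin);
            case BOTTOM_LEFT:  return new Point2D.Double(margin, pageH - h - margin);
            case BOTTOM_RIGHT: return new Point2D.Double(pageW - w - margin, pageH - h - margin);
            default:           return new Point2D.Double((pageW - w) / 2, (pageH - h) / 2); // CENTER
        }
    }
}
```

For example, a 200 x 100 point watermark on a 612 x 792 point (US Letter) page gets (206, 346) for CENTER and (376, 656) for BOTTOM_RIGHT with a 36-point margin; these x and y values can be passed to drawString() or drawImage() in place of the centered coordinates.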

Add a Text Watermark to PDF

Text watermarks are ideal for adding labels like "DRAFT", "CONFIDENTIAL", or copyright notices. The implementation loads the PDF using PdfDocument, defines the font and brush for styling, and iterates through each page, applying the watermark text with a dedicated method that handles transparency, positioning, and drawing.

import com.spire.pdf.*;
import com.spire.pdf.graphics.*;
import java.awt.*;
import java.awt.geom.Dimension2D;

public class AddTextWatermark {
    public static void main(String[] args) {

        // Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        // Load a PDF document
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\AI.pdf");

        // Create a font and a brush
        PdfTrueTypeFont font = new PdfTrueTypeFont(new Font("Arial Black", Font.PLAIN, 50), true);
        PdfBrush brush = PdfBrushes.getBlue();

        // Specify the watermark text
        String watermarkText = "DO NOT COPY";

        // Specify the opacity level
        float opacity = 0.6f;

        // Iterate through the pages
        for (int i = 0; i < doc.getPages().getCount(); i++) {
            PdfPageBase page = doc.getPages().get(i);
            // Draw watermark text on the page
            addTextWatermark(page, watermarkText, font, brush, opacity);
        }

        // Save the changes to another file
        doc.saveToFile("output/Watermark.pdf");

        // Dispose resources
        doc.dispose();
    }

    // Method to add a text watermark to a given page
    private static void addTextWatermark(PdfPageBase page, String watermarkText, PdfTrueTypeFont font, PdfBrush brush, float opacity) {

        // Set the transparency level for the watermark
        page.getCanvas().setTransparency(opacity);

        // Measure the size of the watermark text
        Dimension2D textSize = font.measureString(watermarkText);

        // Get the width and height of the page
        double pageWidth = page.getActualSize().getWidth();
        double pageHeight = page.getActualSize().getHeight();

        // Calculate the position to center the watermark on the page
        double x = (pageWidth - textSize.getWidth()) / 2;
        double y = (pageHeight - textSize.getHeight()) / 2;

        // Draw the watermark text on the page at the calculated position
        page.getCanvas().drawString(watermarkText, font, brush, x, y);
    }
}

Output:

Output PDF file with a centered text watermark.

Add an Image Watermark to PDF

Image watermarks, such as logos, can elevate the professionalism of a document. This process begins by loading the PDF and specifying the image path and opacity. We then iterate through each page, calling a method that loads the image, calculates its position, and draws it on the page with the specified transparency.

import com.spire.pdf.*;
import com.spire.pdf.graphics.*;

public class AddImageWatermark {
    public static void main(String[] args) {

        // Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        // Load a PDF document
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\AI.pdf");

        // Specify the image path
        String imagePath = "C:\\Users\\Administrator\\Desktop\\logo2.png";

        // Specify the opacity level
        float opacity = 0.3f;

        // Iterate through the pages
        for (int i = 0; i < doc.getPages().getCount(); i++) {

            // Draw the image watermark on the current page
            addImageWatermark(doc.getPages().get(i), imagePath, opacity);
        }

        // Save the changes to another file
        doc.saveToFile("output/Watermark.pdf");

        // Dispose resources
        doc.dispose();
    }

    // Method to add an image watermark to a given page
    private static void addImageWatermark(PdfPageBase page, String imagePath, float opacity) {

        // Load the image
        PdfImage image = PdfImage.fromFile(imagePath);

        // Get the width and height of the image
        double imageWidth = (double)image.getWidth();
        double imageHeight = (double)image.getHeight();

        // Get the width and height of the page
        double pageWidth = page.getActualSize().getWidth();
        double pageHeight = page.getActualSize().getHeight();

        // Calculate the position to center the watermark on the page
        double x = (pageWidth - imageWidth) / 2;
        double y = (pageHeight - imageHeight) / 2;

        // Set the transparency level for the watermark
        page.getCanvas().setTransparency(opacity);

        // Draw the image on the page at the calculated position
        page.getCanvas().drawImage(image, x, y);
    }
}

Output:

Output PDF file with a centered image watermark

Conclusion

In conclusion, adding watermarks to PDF documents in Java is a straightforward task with the right tools and techniques. With the Spire.PDF for Java library, developers can add text watermarks (like copyright notices) or image logos to every page with only a few lines of code.

This guide provided a step-by-step approach, from initial setup to final implementation, ensuring that you can protect your documents effectively. Whether for personal use or professional needs, watermarking is an essential skill that adds a layer of professionalism and integrity to your work.

FAQs

Q1. Can I rotate the watermark text?

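Yes. Call page.getCanvas().rotateTransform(angle) before drawing the text. The rotation pivots around the current canvas origin, so to rotate the watermark in place, first move the origin to the watermark's center with translateTransform() and then draw the text relative to that point.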
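Rotating without translating first typically swings the text away from the page center. The sketch below checks that geometry with java.awt.geom.AffineTransform rather than Spire.PDF. It assumes Spire.PDF's translateTransform and rotateTransform compose in the same order as AffineTransform.translate and rotate; note that AffineTransform takes radians, while the angle passed to rotateTransform is assumed here to be in degrees. RotatedWatermarkGeometry is a hypothetical name.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

final class RotatedWatermarkGeometry {

    // "Translate to the page center, then rotate": the pivot is the page center.
    static AffineTransform aroundCenter(double pageW, double pageH, double angleDegrees) {
        AffineTransform t = new AffineTransform();
        t.translate(pageW / 2, pageH / 2);
        t.rotate(Math.toRadians(angleDegrees));
        return t;
    }

    // "Rotate only": the pivot is the page origin (0, 0).
    static AffineTransform aroundOrigin(double angleDegrees) {
        return AffineTransform.getRotateInstance(Math.toRadians(angleDegrees));
    }

    // Page position of the center of a w x h text box whose top-left corner
    // is drawn at (x, y) in the transformed coordinate system.
    static Point2D boxCenter(AffineTransform t, double x, double y, double w, double h) {
        return t.transform(new Point2D.Double(x + w / 2, y + h / 2), null);
    }
}
```

With aroundCenter, drawing a text box at (-w / 2, -h / 2) keeps its center exactly at the page center for any angle. In Spire.PDF terms, the corresponding sequence would roughly be: save the canvas state, call translateTransform(pageWidth / 2, pageHeight / 2), call rotateTransform(angle), draw the text at (-textWidth / 2, -textHeight / 2), then restore the state. These method names and the degree-based angle are assumptions to verify against the Spire.PDF for Java API reference.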

Q2. How do I adjust the position of the watermark?

You can modify the x and y coordinates in the addTextWatermark and addImageWatermark methods to change the watermark position.

Q3. Is it possible to add multiple watermarks to the same PDF?

Yes, by calling drawString() or drawImage() multiple times with different parameters.
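A common use of repeated drawString() or drawImage() calls is a tiled watermark that covers the whole page. The sketch below only computes the grid of drawing positions with JDK classes; WatermarkTiling and its gap parameter are hypothetical, and each returned point would be passed to drawString() or drawImage() inside the page loop.

```java
import java.awt.geom.Point2D;
import java.util.ArrayList;
import java.util.List;

final class WatermarkTiling {

    // Top-left points for a grid of w x h watermarks that fits on a pageW x pageH page,
    // leaving `gap` points between tiles and around the outer edge.
    static List<Point2D.Double> tilePositions(double pageW, double pageH, double w, double h, double gap) {
        if (w <= 0 || h <= 0 || gap < 0) {
            throw new IllegalArgumentException("tile size must be positive and gap non-negative");
        }
        List<Point2D.Double> points = new ArrayList<>();
        for (double y = gap; y + h <= pageH; y += h + gap) {      // rows, top to bottom
            for (double x = gap; x + w <= pageW; x += w + gap) {  // columns, left to right
                points.add(new Point2D.Double(x, y));
            }
        }
        return points;
    }
}
```

For a 600 x 400 page, 100 x 50 tiles, and a 50-point gap, this returns a 4 x 4 grid starting at (50, 50).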

Q4. Can I use a transparent PNG as a watermark?

Yes, Spire.PDF preserves the transparency of PNG images.

Q5. How do I apply watermarks to specific pages only?

Modify the loop to target specific pages, e.g., if (i == 0) applies the watermark only to the first page.
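For more than one or two pages, a page-range string is easier to maintain than hard-coded conditions. A minimal JDK-only sketch (PageSelection is a hypothetical helper) that turns a 1-based spec such as "1,3-5" into the 0-based indices used by doc.getPages().get(i):

```java
import java.util.Set;
import java.util.TreeSet;

final class PageSelection {

    // Parses a 1-based page spec like "1,3-5" into sorted 0-based indices.
    // Pages beyond pageCount are ignored; malformed ranges are rejected.
    static Set<Integer> parse(String spec, int pageCount) {
        Set<Integer> indices = new TreeSet<>();
        for (String part : spec.split(",")) {
            String token = part.trim();
            if (token.isEmpty()) {
                continue;
            }
            int dash = token.indexOf('-');
            int from = Integer.parseInt(dash < 0 ? token : token.substring(0, dash).trim());
            int to = dash < 0 ? from : Integer.parseInt(token.substring(dash + 1).trim());
            if (from < 1 || to < from) {
                throw new IllegalArgumentException("Invalid page range: " + token);
            }
            for (int page = from; page <= Math.min(to, pageCount); page++) {
                indices.add(page - 1); // convert to 0-based
            }
        }
        return indices;
    }
}
```

In the text watermark example, the loop could then become: for (int i : PageSelection.parse("1,3-5", doc.getPages().getCount())) { addTextWatermark(doc.getPages().get(i), watermarkText, font, brush, opacity); }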

Get a Free License

To fully experience the capabilities of Spire.PDF for Java without any evaluation limitations, you can request a free 30-day trial license.

Published in Watermark

Tagged under

pdf java Watermark
