Extract Images from PDF in Java – Preserve Quality & Filter Noise

When dealing with PDF documents that contain images, such as scanned reports, digital brochures, or design portfolios, you may need to extract these images for reuse or analysis. In this article, we'll show you how to extract images from PDF in Java, covering both basic usage and advanced image extraction techniques using the Spire.PDF for Java library.

Whether you're creating a PDF image extractor in Java or simply looking to extract images from a PDF file using Java code, this guide will walk you through the process step by step.

Guide Outline

Getting Started – Tools and Setup
Extract All Images from a PDF in Java
Advanced Tips for More Precise Image Extraction
Frequently Asked Questions
Conclusion

Getting Started – Tools and Setup

Extracting images from PDF files in Java can be challenging without third-party libraries. While PDFs may contain valuable image assets—such as scanned pages, charts, or embedded graphics—these elements are often encoded or compressed in ways that native Java APIs can’t handle directly.

Spire.PDF for Java provides a high-level, reliable way to locate and extract embedded or inline images from PDF files. Whether you’re building an automation tool or a document parser, this library helps you extract image content efficiently and with full quality.

Before getting started, make sure you have the following development tools ready:

Java Development Kit (JDK) 1.6 or above
Spire.PDF for Java (Free or commercial version)
An IDE (e.g., IntelliJ IDEA, Eclipse)

Maven Dependency:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.10.3</version>
    </dependency>
</dependencies>

You can use Free Spire.PDF for Java for smaller tasks.

Extract All Images from a PDF in Java

The most straightforward way to extract images from a PDF is by using the PdfImageHelper class in Spire.PDF for Java. This utility scans each page, locates embedded or inline images, and returns both the image data and metadata such as size and position.

Code Example: Basic Image Extraction

import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.utilities.PdfImageHelper;
import com.spire.pdf.utilities.PdfImageInfo;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

public class ExtractAllImagePDF {
    public static void main(String[] args) throws IOException {
        // Create a PdfDocument instance
        PdfDocument pdf = new PdfDocument();
        pdf.loadFromFile("input.pdf");

        // Create an image helper instance
        PdfImageHelper imageHelper = new PdfImageHelper();

        // Loop through each page to extract images
        for (int i = 0; i < pdf.getPages().getCount(); i++) {
            PdfPageBase page = pdf.getPages().get(i);
            PdfImageInfo[] imagesInfo = imageHelper.getImagesInfo(page);

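            // Save every image found on this page as a separate PNG file
            // (page and image indexes in the file name are zero-based)
            for (int j = 0; j < imagesInfo.length; j++) {
                BufferedImage image = imagesInfo[j].getImage();
                File file = new File("output/Page" + i + "_Image" + j + ".png");
                ImageIO.write(image, "png", file);
            }
        }

        // Release the resources held by the document
        pdf.close();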
    }
}

Make sure the output folder exists before running the code; otherwise ImageIO.write() fails with an IOException.
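
Instead of creating the folder by hand, the program can create it at startup. The following is a minimal sketch that uses only the standard library; OutputFolders and ensureDirectory are hypothetical names chosen for this illustration and are not part of Spire.PDF.

```java
import java.io.File;
import java.io.IOException;

final class OutputFolders {
    private OutputFolders() {}

    /**
     * Creates the folder (and any missing parent folders) if it does not
     * exist yet, and returns it. Throws IOException if it cannot be created.
     */
    static File ensureDirectory(String dir) throws IOException {
        File folder = new File(dir);
        // mkdirs() returns false when the folder already exists,
        // so check isDirectory() first to treat that case as success.
        if (!folder.isDirectory() && !folder.mkdirs()) {
            throw new IOException("Could not create output folder: "
                    + folder.getAbsolutePath());
        }
        return folder;
    }
}
```

Calling OutputFolders.ensureDirectory("output") once before the page loop would be enough for the example above.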

How It Works

PdfDocument loads and holds the structure of the input PDF.
PdfPageBase represents a single page inside the PDF.
PdfImageHelper.getImagesInfo(PdfPageBase) scans a specific page and returns an array of PdfImageInfo, each containing a detected image.
Each PdfImageInfo includes:
- The image itself as a BufferedImage
- Metadata like size, DPI, and page index
ImageIO.write() supports common formats like "png", "jpg", and "bmp" — you can change the format string as needed.
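
One detail worth guarding against when changing the format string: ImageIO.write() returns false, instead of throwing, when no installed writer can encode the image in the requested format. On many JDK versions this happens when an image with an alpha channel is written as "jpg". The sketch below is one possible way to handle both cases with the standard library only; SafeImageWriter and its methods are hypothetical names for this illustration, not part of Spire.PDF.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

final class SafeImageWriter {
    private SafeImageWriter() {}

    /** Returns an opaque RGB copy of the image, painted over a white background. */
    static BufferedImage toRgb(BufferedImage source) {
        BufferedImage rgb = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    /** Writes the image and fails loudly if ImageIO has no suitable writer. */
    static void write(BufferedImage image, String format, File target) throws IOException {
        boolean isJpeg = format.equalsIgnoreCase("jpg") || format.equalsIgnoreCase("jpeg");
        // JPEG has no alpha channel, so flatten transparent images first.
        BufferedImage toWrite =
                (isJpeg && image.getColorModel().hasAlpha()) ? toRgb(image) : image;
        if (!ImageIO.write(toWrite, format, target)) {
            throw new IOException("No ImageIO writer available for format: " + format);
        }
    }
}
```

The white background is a design choice for this sketch; any opaque color works, depending on where the images will be displayed.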

After running the extraction code, you’ll get a folder containing the exported images from the PDF, each saved in a separate file.

Extracted PDF images saved as PNG files from each page

These high-level abstractions save you from manually decoding image XObjects or parsing raw streams—making PDF image extraction in Java easier and cleaner.

To save full PDF pages as images instead of just extracting embedded images, follow our guide on saving PDF pages as images in Java.

Advanced Tips for More Precise Image Extraction

Extracting images from a PDF is not always a one-size-fits-all operation. Some files contain layout elements like background layers, small decorative icons, or embedded metadata images. The following advanced tips help you refine your extraction logic for better results:

Skip Background Images (Optional)

Some PDF files include background images, such as watermarks or decorative layers. When these are defined using standard PDF background settings, they are typically extracted as the first image on the page. To focus on meaningful content, skip the first extracted image on each page. Apply this only to documents known to use a background, because on a page without one the first image is genuine content and would be lost.

for (int i = 1; i < imagesInfo.length; i++) {  // Skip background image
    BufferedImage image = imagesInfo[i].getImage();
    ImageIO.write(image, "PNG", new File("output/image_" + (i - 1) + ".png"));
}

You can also call getBounds() on each PdfImageInfo to check the image's size and position on the page before deciding whether to skip it.
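
As a rough sketch of such a check, the helper below treats an image as a likely background when its rectangle covers most of the page. It is written against java.awt.geom.Rectangle2D only. It assumes that getBounds() returns a Rectangle2D in the same units as the page size and that the origin is the page's top-left corner, which you should verify against your Spire.PDF version. BackgroundHeuristic, coversPage, and the coverage threshold are hypothetical names and values for this illustration.

```java
import java.awt.geom.Rectangle2D;

final class BackgroundHeuristic {
    private BackgroundHeuristic() {}

    /**
     * Returns true when the image rectangle covers at least minCoverage
     * (0.0 to 1.0) of the page area. Only the part of the rectangle that
     * lies inside the page is counted.
     */
    static boolean coversPage(Rectangle2D imageBounds, double pageWidth,
                              double pageHeight, double minCoverage) {
        if (pageWidth <= 0 || pageHeight <= 0) {
            return false;
        }
        Rectangle2D page = new Rectangle2D.Double(0, 0, pageWidth, pageHeight);
        Rectangle2D visible = imageBounds.createIntersection(page);
        if (visible.isEmpty()) {
            return false; // the image lies entirely outside the page
        }
        double coverage = (visible.getWidth() * visible.getHeight())
                / (pageWidth * pageHeight);
        return coverage >= minCoverage;
    }
}
```

Inside the page loop, an image could then be skipped when coversPage(...) returns true for its bounds and a threshold such as 0.9, with the page width and height taken from the page object.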

Filter by Image Size (Ignore Small Icons)

To exclude small elements like buttons or logos, add a size threshold before saving:

BufferedImage image = imagesInfo[i].getImage();
if (image.getWidth() > 200 && image.getHeight() > 200) {
    ImageIO.write(image, "PNG", new File("output/image_" + i + ".png"));
}

This helps keep the output folder clean and focused on relevant image content.
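
The same check can be pulled into a small reusable helper, sketched below with the standard library only. ImageSizeFilter and its methods are hypothetical names for this illustration, and the thresholds are inclusive minimums here. Note that requiring both a minimum width and a minimum height also drops wide but short images such as banners, so choose thresholds that match your documents.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

final class ImageSizeFilter {
    private ImageSizeFilter() {}

    /** Returns true when the image is at least minWidth x minHeight pixels. */
    static boolean isLargeEnough(BufferedImage image, int minWidth, int minHeight) {
        return image != null
                && image.getWidth() >= minWidth
                && image.getHeight() >= minHeight;
    }

    /** Returns a new list with only the images that pass the size check. */
    static List<BufferedImage> keepLarge(List<BufferedImage> images,
                                         int minWidth, int minHeight) {
        List<BufferedImage> kept = new ArrayList<BufferedImage>();
        for (BufferedImage image : images) {
            if (isLargeEnough(image, minWidth, minHeight)) {
                kept.add(image);
            }
        }
        return kept;
    }
}
```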

Export Images in Various Formats or Streams

You can output images in various formats or streams depending on your use case:

ImageIO.write(image, "JPEG", new File("output/image_" + i + ".jpg"));  // lossy, smaller files
ImageIO.write(image, "BMP", new File("output/image_" + i + ".bmp"));  // lossless, uncompressed

Use PNG or BMP for lossless quality (e.g., archival or OCR).
Use JPEG for web or lower storage usage.

You can also write images to a ByteArrayOutputStream or other output streams for further processing:

ByteArrayOutputStream stream = new ByteArrayOutputStream();
ImageIO.write(image, "PNG", stream);
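
To illustrate what further processing might look like, here is a minimal sketch that encodes an image to a PNG byte array and decodes it again, using only the standard library. InMemoryImages and its methods are hypothetical names for this illustration; the byte array could equally be sent over a network or stored in a database.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

final class InMemoryImages {
    private InMemoryImages() {}

    /** Encodes the image as PNG and returns the encoded bytes. */
    static byte[] toPngBytes(BufferedImage image) throws IOException {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", stream)) {
            throw new IOException("No PNG writer available for this image");
        }
        return stream.toByteArray();
    }

    /** Decodes image bytes in any format that ImageIO can read. */
    static BufferedImage fromBytes(byte[] data) throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(data));
        if (image == null) {
            throw new IOException("Unrecognized image data");
        }
        return image;
    }
}
```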

Also Want to Extract Images from PDF Attachments?

If your PDF contains embedded file attachments like .jpg or .png images, you'll need a different approach. See our guide here:
How to Extract Attachments from PDF in Java

FAQ for Image Extraction from PDF in Java

Can I extract images from a PDF file using Java?

Yes. Using Spire.PDF for Java, you can easily extract embedded or inline images from any PDF page with a few lines of code.

Will extracted images retain their original quality?

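Yes, as far as the pixel data goes. Each image is returned as a BufferedImage at its original resolution. The BufferedImage holds decoded pixels, so the format you pass to ImageIO.write() decides how the file is encoded. Save in PNG or BMP format to avoid additional lossy compression.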

What’s the difference between image extraction and rendering PDF as an image?

Rendering a PDF page creates a bitmap version of the entire page (including text and layout), while image extraction pulls out only the embedded image objects that were originally inserted in the file.

Does this work for scanned PDFs?

Yes. Many scanned PDFs contain full-page raster images (e.g., JPGs or TIFFs). These are extracted just like any other embedded image.

Conclusion

Extracting images from PDF files using Java is fast and efficient with Spire.PDF. Whether you're analyzing marketing materials, scanned reports, or design portfolios, this approach lets you programmatically access and save the high-quality images embedded in your documents.

For more advanced cases—such as excluding layout images or processing attachments—the API offers enough flexibility to customize your approach.

To fully unlock the capabilities of Spire.PDF for Java without any evaluation limitations, you can apply for a free temporary license.