PDF to Text in Java: Extract Text from PDFs (Text-Based & Scanned)

2022-12-02 07:57:00 Written by Koohji

Extracting text from PDF files is a common task for Java developers working on document processing, data extraction, search indexing, and automation. PDFs often contain text in two formats: digital text embedded in the file or scanned images of text. Extracting content from these requires different approaches.

This article explains how to extract text from both text-based PDFs and scanned (image-based) PDFs using Java, complete with detailed code examples and explanations. Whether you need to process reports, invoices, or scanned documents, this guide will help you get started quickly and efficiently.

Why Extract Text from PDFs in Java?
Difference Between Text-Based and Scanned PDFs
How to Extract Text from Text-Based PDFs in Java
How to Extract Text from Scanned PDFs Using Java & OCR
Common Challenges and Best Practices for PDF Text Extraction
Conclusion
Frequently Asked Questions

Why Extract Text from PDFs in Java?

PDF files are designed for consistent visual formatting across platforms. However, extracting the underlying text lets developers:

Enable full-text search
Automate form and invoice processing
Feed text into AI models
Convert content for analysis or reporting
Repurpose documents into other formats (HTML, Markdown, CSV)

Difference Between Text-Based and Scanned PDFs

Before extracting text, it’s important to understand the PDF type because the extraction approach differs:

Text-Based PDFs

Contain embedded, selectable text stored in the document structure
Text can be extracted directly by parsing the PDF’s text objects
Typically created by exporting from word processors, reports, or digital sources

Scanned PDFs

Are images of pages, often created by scanning paper documents
Do not contain embedded text—only images of text
Require Optical Character Recognition (OCR) to convert images into machine-readable text

Knowing your PDF type determines the extraction method and tools you need.
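
When the PDF type is not known in advance, one option is to attempt direct extraction first and inspect the result. The sketch below is a heuristic under our own assumptions, not part of any library: the class name PdfTypeHeuristic, the method looksScanned, and the 20-character threshold are hypothetical, and the input is whatever string a text extractor returned for a page.

```java
public class PdfTypeHeuristic {

    // Hypothetical threshold: pages that yield fewer non-whitespace characters
    // than this are treated as image-only. Tune it for your documents.
    public static final int MIN_TEXT_CHARS = 20;

    // Returns true when the text extracted from a page is too sparse
    // to be a real text layer, which suggests the page needs OCR.
    public static boolean looksScanned(String extractedText) {
        if (extractedText == null) {
            return true;
        }
        long visibleChars = extractedText.chars()
                .filter(c -> !Character.isWhitespace(c))
                .count();
        return visibleChars < MIN_TEXT_CHARS;
    }

    public static void main(String[] args) {
        System.out.println(looksScanned("   \n\n  "));
        System.out.println(looksScanned("Invoice No. 2024-001, total due: 1,250.00 USD"));
    }
}
```

A genuinely blank page in a text-based PDF would also be flagged, so treat the result as a hint rather than a verdict.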

How to Extract Text from Text-Based PDFs in Java

Text-based PDFs allow direct extraction of text content. With libraries like Spire.PDF for Java, you can extract text from an entire PDF, specific pages, or designated rectangular areas. This is useful for a variety of tasks, such as content indexing, document analysis, and data processing.

Key Features

Extract text from full documents or individual pages
Target specific rectangular regions within a page
Preserve original layout
Support for multi-language text extraction

Maven Dependency

To begin, add the following Maven dependency for Spire.PDF to your pom.xml:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.7.5</version>
    </dependency>
</dependencies>

Extract All Text from a PDF

If you want to convert an entire PDF to plain text, you can iterate through all the pages and use the extract method provided by the PdfTextExtractor class to retrieve text from each page in sequence. This method returns a string containing the textual content of the page and preserves the original layout, including spacing, line breaks, and paragraph structure as much as possible.

import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.texts.PdfTextExtractOptions;
import com.spire.pdf.texts.PdfTextExtractor;


import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class ExtractAllTextFromPDF {
    public static void main(String[] args){
        // Create a PdfDocument instance and load the PDF file
        PdfDocument doc = new PdfDocument();
        doc.loadFromFile("sample.pdf");

        // Create a StringBuilder to store extracted text from all pages
        StringBuilder fullText = new StringBuilder();

        // Loop through each page in the PDF
        for (int i = 0; i < doc.getPages().getCount(); i++) {
            // Get the current page
            PdfPageBase page = doc.getPages().get(i);
            // Create a text extractor for the page
            PdfTextExtractor extractor = new PdfTextExtractor(page);
            // Extract text using default options
            String text = extractor.extract(new PdfTextExtractOptions());
            // Append extracted text and add spacing between pages
            fullText.append(text).append("\n\n\n\n");
        }

        // Write the extracted text to a .txt file
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt"))) {
            writer.write(fullText.toString());
        } catch (IOException e) {
            // Print any file I/O errors
            e.printStackTrace();
        }

        // Close the PDF document to free resources
        doc.close();
    }
}

Note: Replace sample.pdf with the path to your own PDF file.
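
The example above writes the output with FileWriter, which uses the platform default charset on Java versions before 18. For multi-language text it can be safer to state the encoding explicitly. The following is a minimal sketch using only the JDK (Java 11 or later); the class name Utf8TextWriter and the method writeUtf8 are our own illustrative names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8TextWriter {

    // Writes the text to the target file as UTF-8, replacing any existing content
    public static void writeUtf8(Path target, String text) throws IOException {
        Files.writeString(target, text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path out = Path.of("output.txt");
        // "\u00e9" is the letter e with an acute accent, written as an escape
        // so that this source file itself stays plain ASCII
        writeUtf8(out, "R\u00e9sum\u00e9 extracted from PDF\n");
        System.out.print(Files.readString(out, StandardCharsets.UTF_8));
    }
}
```

In the extraction example, the call would take the place of the BufferedWriter block, for instance writeUtf8(Path.of("output.txt"), fullText.toString()).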

Extract Text from a Page

When working with multi-page PDFs, you may only need to extract content from a specific page—for example, a summary, cover sheet, or signature page. In such cases, you can access the target page by its index and use the extract method from the PdfTextExtractor class to retrieve text from that individual page.

import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.texts.PdfTextExtractOptions;
import com.spire.pdf.texts.PdfTextExtractor;

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class ExtractTextFromSelectedPage {
    public static void main(String[] args){
        // Create a PdfDocument instance and load the PDF file
        PdfDocument doc = new PdfDocument();
        doc.loadFromFile("sample.pdf");

        // Define the target page index (e.g., 0 for the first page)
        int pageIndex = 0;

        // Check if the specified page exists in the document
        if (pageIndex >= 0 && pageIndex < doc.getPages().getCount()) {
            // Get the specified page
            PdfPageBase page = doc.getPages().get(pageIndex);
            // Create a text extractor for the page
            PdfTextExtractor extractor = new PdfTextExtractor(page);
            // Extract text from the page
            String text = extractor.extract(new PdfTextExtractOptions());

            // Write the extracted text to a .txt file
            try (BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt"))) {
                writer.write(text);
            } catch (IOException e) {
                // Print any file I/O errors
                e.printStackTrace();
            }
        } else {
            System.out.println("Invalid page index.");
        }

        // Close the PDF document to free resources
        doc.close();
    }
}

Note: Set pageIndex to the zero-based index of the page you want to extract.

Extract Text from a Page Area (Rectangular Region)

To extract text from a specific area of a PDF page, first define the rectangular region using a Rectangle2D object, then use the setExtractArea method of the PdfTextExtractOptions class to limit extraction to that area. This helps isolate relevant content and exclude unrelated text outside the defined region.

import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.texts.PdfTextExtractOptions;
import com.spire.pdf.texts.PdfTextExtractor;

import java.awt.geom.Rectangle2D;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class ExtractTextFromPageArea {
    public static void main(String[] args){
        // Create a PdfDocument instance and load the PDF file
        PdfDocument doc = new PdfDocument();
        doc.loadFromFile("sample.pdf");

        // Define the target page index (0-based, 0 means first page)
        int pageIndex = 0;

        // Check if the specified page exists in the document
        if (pageIndex >= 0 && pageIndex < doc.getPages().getCount()) {
            // Get the specified page
            PdfPageBase page = doc.getPages().get(pageIndex);

            // Define the rectangular region (x, y, width, height)
            // Coordinates are measured from the top-left corner of the page
            Rectangle2D region = new Rectangle2D.Float(100, 150, 300, 100);

            // Initialize a text extractor for the page
            PdfTextExtractor extractor = new PdfTextExtractor(page);

            // Create extraction options and set the region to extract text from
            PdfTextExtractOptions options = new PdfTextExtractOptions();
            options.setExtractArea(region);

            // Extract text from the defined rectangular area
            String text = extractor.extract(options);

            // Write the extracted text to a text file
            try (BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt"))) {
                writer.write(text);
            } catch (IOException e) {
                // Print any errors during file writing
                e.printStackTrace();
            }
        } else {
            // Inform if the specified page index is invalid
            System.out.println("Invalid page index.");
        }

        // Close the PDF document to free resources
        doc.close();
    }
}

Tip: Coordinates are relative to the PDF page, with the origin (0,0) at the top-left corner. The X-axis increases to the right, and the Y-axis increases downward. Learn more about PDF coordinate positioning in our guide: Generate PDF Files in Java (Developer Tutorial).
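
Native PDF user space, by contrast, places the origin at the bottom-left corner by default, with the Y-axis increasing upward. If region coordinates come from a tool that reports them that way, they need converting before use with the top-left convention described in the tip (and vice versa). The sketch below shows the arithmetic with JDK classes only; the class name RegionCoordinates, the method toBottomLeftOrigin, and the page height of 842 points (roughly an A4 page) are our own assumptions.

```java
import java.awt.geom.Rectangle2D;

public class RegionCoordinates {

    // Converts a rectangle expressed with a top-left origin (Y grows downward)
    // into the same rectangle expressed with a bottom-left origin (Y grows upward).
    // pageHeight must use the same unit as the rectangle, typically points.
    public static Rectangle2D.Float toBottomLeftOrigin(Rectangle2D.Float topLeftRect, float pageHeight) {
        float y = pageHeight - topLeftRect.y - topLeftRect.height;
        return new Rectangle2D.Float(topLeftRect.x, y, topLeftRect.width, topLeftRect.height);
    }

    public static void main(String[] args) {
        // Region from the example above on an assumed 842-point-high page
        Rectangle2D.Float region = new Rectangle2D.Float(100, 150, 300, 100);
        Rectangle2D.Float converted = toBottomLeftOrigin(region, 842f);
        System.out.println("x=" + converted.x + ", y=" + converted.y);
    }
}
```

For the sample region, the converted Y value is 842 - 150 - 100 = 592. Because the formula is symmetric, the same method also converts bottom-left coordinates back to top-left ones.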

How to Extract Text from Scanned PDFs Using Java & OCR

Scanned PDFs do not contain embedded, selectable text; instead, they store images of the document pages. To extract text from such PDFs, you need to:

Convert each PDF page into an image using a PDF processing library (e.g., Spire.PDF).
Use an OCR (Optical Character Recognition) engine (e.g., Spire.OCR) to recognize and convert text from these images into machine-readable format.
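
Some documents mix text-based and scanned pages. One way to handle them is to try direct extraction for each page and fall back to the two steps above only when no text comes back. The sketch below illustrates that control flow with stand-in functions; HybridExtraction, extractPage, and both extractor parameters are hypothetical names, and in a real project the lambdas would wrap the library calls.

```java
import java.util.function.IntFunction;

public class HybridExtraction {

    // Returns the directly extracted text for a page, or the OCR result
    // when direct extraction yields nothing but whitespace
    public static String extractPage(int pageIndex,
                                     IntFunction<String> directExtractor,
                                     IntFunction<String> ocrExtractor) {
        String direct = directExtractor.apply(pageIndex);
        if (direct != null && !direct.isBlank()) {
            return direct;
        }
        return ocrExtractor.apply(pageIndex);
    }

    public static void main(String[] args) {
        // Stand-in extractors: page 0 has a text layer, page 1 does not
        IntFunction<String> direct = i -> i == 0 ? "Embedded text" : "";
        IntFunction<String> ocr = i -> "OCR text for page " + i;

        for (int i = 0; i < 2; i++) {
            System.out.println(extractPage(i, direct, ocr));
        }
    }
}
```

The OCR function runs only for pages that need it, which matters because recognition is usually much slower than reading an embedded text layer.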

Maven Dependencies

Add the following repositories and dependencies to your pom.xml to include the required libraries in your Java project:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.7.5</version>
    </dependency>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.ocr</artifactId>
        <version>1.9.22</version>
    </dependency>
</dependencies>

Download OCR Model

Spire.OCR for Java requires downloading a language model compatible with your operating system.

After downloading, extract the package to a directory accessible by your application. You'll reference its path in your code.

Java Code Example for OCR Text Extraction from Scanned PDF

The code below demonstrates how to extract text from scanned PDFs that contain only images. Each page is first converted into an image using saveAsImage(). Then, the OCR engine (OcrScanner) reads the image and extracts the text. The recognized text from all pages is saved to a .txt file.

import com.spire.ocr.ConfigureOptions;
import com.spire.ocr.OCRImageFormat;
import com.spire.ocr.OcrException;
import com.spire.ocr.OcrScanner;
import com.spire.pdf.PdfDocument;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.*;

public class ExtractTextFromScannedPDF {
    public static void main(String[] args) throws IOException, OcrException {
        // Load a scanned PDF
        PdfDocument pdf = new PdfDocument();
        pdf.loadFromFile("sample.pdf");

        // Create a StringBuilder to store all extracted text
        StringBuilder allText = new StringBuilder();

        // Loop through each page
        for (int i = 0; i < pdf.getPages().getCount(); i++) {
            // Convert current page to image
            BufferedImage image = pdf.saveAsImage(i);

            // Convert image to input stream
            ByteArrayOutputStream os = new ByteArrayOutputStream();
            ImageIO.write(image, "PNG", os);
            InputStream imageStream = new ByteArrayInputStream(os.toByteArray());

            // Configure OCR options
            OcrScanner scanner = new OcrScanner();
            ConfigureOptions options = new ConfigureOptions();
            // Set the language for OCR engine
            // Supported languages include: English, Chinese, Chinesetraditional, French, German, Japanese, and Korean.
            options.setLanguage("English");
            // Set the path to the OCR model folder
            options.setModelPath("E:\\win-x64");
            scanner.ConfigureDependencies(options);

            // Perform OCR and collect text
            scanner.Scan(imageStream, OCRImageFormat.Png);
            String text = scanner.getText().toString();
            allText.append(text).append(System.lineSeparator()).append(System.lineSeparator());
        }

        // Save all extracted text to a .txt file
        try (FileWriter writer = new FileWriter("OCR_ExtractedText.txt")) {
            writer.write(allText.toString());
        } catch (IOException e) {
            System.out.println("Failed to save extracted text.");
            e.printStackTrace();
        }

        // Close the PDF document to free resources
        pdf.close();
    }
}

Note: The model path should point to the folder that contains the OCR model and language data. Make sure the folder is accessible in your environment.

Common Challenges and Best Practices for PDF Text Extraction

When extracting text from PDFs, developers often face several common challenges; the following table outlines these issues along with practical tips to help overcome them effectively.

Challenge	Description	Tips
Formatting Loss	Extracted text might lose original layout	Use libraries supporting layout retention
OCR Accuracy	Low-quality scans reduce recognition accuracy	Use high-resolution images and appropriate models
Multilingual Support	Scanned PDFs might contain languages other than English	Use corresponding OCR language models
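
For the OCR accuracy row, a common preprocessing step is to convert the page image to grayscale and enlarge it before recognition. The sketch below uses only JDK imaging classes; OcrPreprocessing and toScaledGrayscale are our own names, and whether this step helps depends on the scan and the OCR engine. Upscaling cannot restore detail the scan never had, so rendering the page at a higher resolution in the first place is preferable when that is possible.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class OcrPreprocessing {

    // Returns a grayscale copy of the source image scaled by the given factor,
    // using bicubic interpolation
    public static BufferedImage toScaledGrayscale(BufferedImage source, double scale) {
        int width = Math.max(1, (int) Math.round(source.getWidth() * scale));
        int height = Math.max(1, (int) Math.round(source.getHeight() * scale));

        BufferedImage result = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = result.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            // Drawing into a TYPE_BYTE_GRAY image performs the color conversion
            g.drawImage(source, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder image standing in for a rendered PDF page
        BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        BufferedImage prepared = toScaledGrayscale(page, 2.0);
        System.out.println(prepared.getWidth() + " x " + prepared.getHeight());
    }
}
```

In the OCR example, the returned image would be passed to ImageIO.write in place of the original page image.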

Conclusion

Converting PDF files to text in Java enables efficient document processing, search, and automation. Spire.PDF for Java simplifies text extraction from digital PDFs, while Spire.OCR for Java provides a reliable solution for handling scanned and image-based PDFs. By combining these tools, developers can build robust, end-to-end PDF text extraction systems tailored to any business need.

Frequently Asked Questions

Q1: Can I extract text from scanned PDFs in Java?

A1: Yes. You’ll need to convert each page to an image and then use OCR (Optical Character Recognition) to recognize and extract the text from the image.

Q2: How can I tell if a PDF is scanned or text-based?

A2: Open the PDF and try selecting the text with your mouse. If you can select and copy text, it’s text-based. If not, it's likely a scanned image.

Q3: Can I extract text from a password-protected PDF in Java?

A3: Yes. If the password is known, the PDF can be decrypted before extracting text using a supported library like Spire.PDF.

Q4: Can I extract tables or structured data from PDFs using Java?

A4: Yes. Some Java PDF libraries support extracting tables or structured content by detecting text alignment, cell boundaries, or using region-based extraction. For more accurate results, tools that offer table recognition features - such as Spire.PDF for Java - can help simplify the process.
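
As a rough illustration of the text-alignment idea, the sketch below splits layout-preserved text into cells wherever two or more whitespace characters appear in a row. It is a plain-JDK heuristic of our own (ColumnSplitter and splitColumns are hypothetical names), not a library table-recognition feature, and it assumes the extracted text keeps the column spacing of the page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnSplitter {

    // Splits each non-blank line into cells wherever two or more
    // consecutive whitespace characters occur
    public static List<List<String>> splitColumns(String layoutText) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : layoutText.split("\\R")) {
            if (line.isBlank()) {
                continue;
            }
            rows.add(Arrays.asList(line.trim().split("\\s{2,}")));
        }
        return rows;
    }

    public static void main(String[] args) {
        String text = "Item        Qty    Price\n"
                    + "Blue pen    10     1.50\n";
        for (List<String> row : splitColumns(text)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Single spaces inside a cell, as in "Blue pen", are kept together. Empty cells shift the remaining columns to the left, so this approach only suits simple, fully populated tables.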

Published in Extract/Read

Tagged under

pdf java Extract Read

Java: Draw Shapes in PDF

2024-10-24 05:47:00 Written by Koohji

Drawing shapes such as rectangles, ellipses, and lines in a PDF document can enhance the visual effect of the document and help to highlight key points. While creating a report, presentation or thesis, if there are some concepts or data relationships that are difficult to express clearly in words, adding appropriate shapes can assist in the expression of information. In this article, you will learn how to draw shapes in a PDF document in Java using Spire.PDF for Java.

Draw Lines in PDF in Java
Draw Arcs and Pies in PDF in Java
Draw Rectangles in PDF in Java
Draw Ellipses in PDF in Java

Install Spire.PDF for Java

First, add the Spire.Pdf.jar file as a dependency in your Java program. The JAR file can be downloaded from the vendor's website. If you use Maven, you can import it by adding the following configuration to your project's pom.xml file.

Package Manager

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.12.16</version>
    </dependency>
</dependencies>

Draw Lines in PDF in Java

Spire.PDF for Java provides the PdfPageBase.getCanvas().drawLine(PdfPen pen, float x1, float y1, float x2, float y2) method to draw lines at specified locations on a PDF page. By changing the pen's dash style, you can draw solid or dashed lines as needed. The following are the detailed steps.

Create a PdfDocument object.
Add a PDF page using PdfDocument.getPages().add() method.
Save the current drawing state using PdfPageBase.getCanvas().save() method.
Define the starting x, y coordinates and length of the line.
Create a PdfPen object with specified color and thickness.
Draw a solid line on the page using the pen through PdfPageBase.getCanvas().drawLine() method.
Set the pen style to dashed, and then set the dashed line pattern.
Draw a dashed line on the page using the pen with a dashed line style through PdfPageBase.getCanvas().drawLine() method.
Restore the previous drawing state using PdfPageBase.getCanvas().restore() method.
Save the result document using PdfDocument.saveToFile() method.

Java

import com.spire.pdf.*;
import com.spire.pdf.graphics.*;

import java.awt.*;

public class DrawLine {
    public static void main(String[] args) {
        // Create a PdfDocument object
        PdfDocument pdf = new PdfDocument();

        // Add a Page
        PdfPageBase page = pdf.getPages().add();

        // Save the current graphics state
        PdfGraphicsState state = page.getCanvas().save();

        // Specify the starting x and y coordinates of the line
        float x = 100;
        float y = 70;

        // Specify the length of the line
        float width = 300;

        // Create a PDF pen with blue color and thickness of 2
        PdfPen pen = new PdfPen(new PdfRGBColor(Color.BLUE), 2f);

        // Draw a solid line on the page using the pen
        page.getCanvas().drawLine(pen, x, y, x + width, y);

        // Set the pen style to dashed
        pen.setDashStyle(PdfDashStyle.Dash);

        // Set the dashed line pattern
        pen.setDashPattern(new float[]{1, 4, 1});

        // Draw a dashed line on the page using the pen
        page.getCanvas().drawLine(pen, x, y+30, x + width, y+30);

        // Restore the previous saved graphics state
        page.getCanvas().restore(state);

        // Save the PDF document
        pdf.saveToFile("DrawLines.pdf");

        // Close the document and release resources
        pdf.close();
        pdf.dispose();
    }
}

Draw Arcs and Pies in PDF in Java

To draw arcs or pies at the specified locations on a PDF page, you can use the PdfPageBase.getCanvas().drawArc() and the PdfPageBase.getCanvas().drawPie() methods. The following are the detailed steps.

Create a PdfDocument object.
Add a PDF page using PdfDocument.getPages().add() method.
Save the current drawing state using PdfPageBase.getCanvas().save() method.
Create a PdfPen object with specified color and thickness.
Draw an arc on the page using the pen through PdfPageBase.getCanvas().drawArc() method.
Draw a pie chart on the page using the pen through PdfPageBase.getCanvas().drawPie() method.
Restore the previous drawing state using PdfPageBase.getCanvas().restore() method.
Save the result document using PdfDocument.saveToFile() method.

Java

import com.spire.pdf.*;
import com.spire.pdf.graphics.*;

import java.awt.*;
import java.awt.geom.Rectangle2D;

public class DrawArcAndPie {
    public static void main(String[] args) {
        // Create a PdfDocument object
        PdfDocument pdf = new PdfDocument();

        // Add a Page
        PdfPageBase page = pdf.getPages().add();

        // Save the current graphics state
        PdfGraphicsState state = page.getCanvas().save();

        // Create a PDF pen with specified color and thickness of 2
        PdfPen pen = new PdfPen(new PdfRGBColor(new Color(139,0,0)), 2f);

        // Specify the start and sweep angles of the arc
        float startAngle = 90;
        float sweepAngle = 230;
        // Draw an arc on the page using the pen
        Rectangle2D.Float rect= new Rectangle2D.Float(30, 60, 120, 120);
        page.getCanvas().drawArc(pen, rect, startAngle, sweepAngle);

        // Specify the start and sweep angles of the pie chart
        float startAngle1 = 0;
        float sweepAngle1 = 330;
        // Draw a pie chart on the page using the pen
        Rectangle2D.Float rect2= new Rectangle2D.Float(200, 60, 120, 120);
        page.getCanvas().drawPie(pen, rect2, startAngle1, sweepAngle1);

        // Restore the previous saved graphics state
        page.getCanvas().restore(state);

        // Save the PDF document
        pdf.saveToFile("DrawArcAndPie.pdf");

        // Close the document and release resources
        pdf.close();
        pdf.dispose();
    }
}
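
To reason about where an arc starts and ends, it can help to compute the endpoints from the bounding rectangle and the angles. The sketch below does this with JDK classes only. It rests on an assumption that should be verified against the rendered output: angles are in degrees, measured clockwise from the positive X-axis, with Y increasing downward, as in many 2D graphics APIs. ArcGeometry and pointAt are our own illustrative names, and the formula is exact for circular bounds such as the 120 x 120 rectangle used above.

```java
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;

public class ArcGeometry {

    // Returns the point on the ellipse inscribed in bounds at the given angle.
    // ASSUMPTION: degrees, clockwise from the positive X-axis, Y grows downward.
    // The angle is used as the ellipse parameter, which matches the geometric
    // angle only when the bounds are square (a circle).
    public static Point2D.Double pointAt(Rectangle2D bounds, double angleDegrees) {
        double radians = Math.toRadians(angleDegrees);
        double x = bounds.getCenterX() + bounds.getWidth() / 2 * Math.cos(radians);
        double y = bounds.getCenterY() + bounds.getHeight() / 2 * Math.sin(radians);
        return new Point2D.Double(x, y);
    }

    public static void main(String[] args) {
        // Bounds and angles taken from the arc in the example above
        Rectangle2D.Float rect = new Rectangle2D.Float(30, 60, 120, 120);
        double startAngle = 90;
        double sweepAngle = 230;

        Point2D.Double start = pointAt(rect, startAngle);
        Point2D.Double end = pointAt(rect, startAngle + sweepAngle);
        System.out.printf("start=(%.1f, %.1f), end=(%.1f, %.1f)%n",
                start.x, start.y, end.x, end.y);
    }
}
```

Under that assumption, the arc above starts at the bottom of the circle, since an angle of 90 degrees points straight down when Y grows downward.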

Draw Rectangles in PDF in Java

Spire.PDF for Java provides the PdfPageBase.getCanvas().drawRectangle() method to draw rectangular shapes on PDF pages. You can pass different parameters to the method to define the position, size and fill color of the rectangle. The following are the detailed steps.

Create a PdfDocument object.
Add a PDF page using PdfDocument.getPages().add() method.
Save the current drawing state using PdfPageBase.getCanvas().save() method.
Create a PdfPen object with specified color and thickness.
Draw a rectangle on the page using the pen through PdfPageBase.getCanvas().drawRectangle() method.
Create a PdfLinearGradientBrush object for linear gradient filling.
Draw a filled rectangle using the linear gradient brush through PdfPageBase.getCanvas().drawRectangle() method.
Restore the previous drawing state using PdfPageBase.getCanvas().restore() method.
Save the result document using PdfDocument.saveToFile() method.

Java

import com.spire.pdf.*;
import com.spire.pdf.graphics.*;

import java.awt.*;
import java.awt.geom.Rectangle2D;

public class DrawRectangles {
    public static void main(String[] args) {
        // Create a PdfDocument object
        PdfDocument pdf = new PdfDocument();

        // Add a Page
        PdfPageBase page = pdf.getPages().add();

        // Save the current graphics state
        PdfGraphicsState state = page.getCanvas().save();

        // Create a PDF pen with specified color and thickness of 1.5
        PdfPen pen = new PdfPen(new PdfRGBColor(Color.magenta), 1.5f);

        // Draw a rectangle on the page using the pen
        page.getCanvas().drawRectangle(pen, new Rectangle(20, 60, 150, 90));

        // Create a linear gradient brush
        Rectangle2D.Float rect = new Rectangle2D.Float(220, 60, 150, 90);
        PdfLinearGradientBrush linearGradientBrush = new PdfLinearGradientBrush(rect,new PdfRGBColor(Color.white),new PdfRGBColor(Color.blue),PdfLinearGradientMode.Vertical);

        // Create a new PDF pen with specified color and thickness of 0.5
        PdfPen pen1 = new PdfPen(new PdfRGBColor(Color.black), 0.5f);

        // Draw a filled rectangle using the new pen and linear gradient brush
        page.getCanvas().drawRectangle(pen1, linearGradientBrush, rect);

        // Restore the previous graphics state
        page.getCanvas().restore(state);

        // Save the PDF document
        pdf.saveToFile("DrawRectangles.pdf");

        // Close the document and release resources
        pdf.close();
        pdf.dispose();
    }
}

Draw Ellipses in PDF in Java

The PdfPageBase.getCanvas().drawEllipse() method allows for drawing ellipses on a PDF page. You can use either a PDF pen or a fill brush to draw ellipses in different styles. The following are the detailed steps.

Create a PdfDocument object.
Add a PDF page using PdfDocument.getPages().add() method.
Save the current drawing state using PdfPageBase.getCanvas().save() method.
Create a PdfPen object with specified color and thickness.
Draw an ellipse on the page using the pen through PdfPageBase.getCanvas().drawEllipse() method.
Create a PdfSolidBrush object.
Draw a filled ellipse using the brush through PdfPageBase.getCanvas().drawEllipse() method.
Restore the previous drawing state using PdfPageBase.getCanvas().restore() method.
Save the result document using PdfDocument.saveToFile() method.

Java

import com.spire.pdf.*;
import com.spire.pdf.graphics.*;

import java.awt.*;

public class DrawEllipses {
    public static void main(String[] args) {
        // Create a PdfDocument object
        PdfDocument pdf = new PdfDocument();

        // Add a Page
        PdfPageBase page = pdf.getPages().add();

        // Save the current graphics state
        PdfGraphicsState state = page.getCanvas().save();

        // Create a PDF pen with specified color and thickness
        PdfPen pen = new PdfPen(new PdfRGBColor(new Color(95, 158, 160)), 1f);
        // Draw an ellipse on the page using the pen
        page.getCanvas().drawEllipse(pen, 30, 60, 150, 100);

        // Create a brush with specified color for filling
        PdfBrush brush = new PdfSolidBrush(new PdfRGBColor(new Color(95, 158, 160)));
        // Draw a filled ellipse using the brush
        page.getCanvas().drawEllipse(brush, 220, 60, 150, 100);

        // Restore the previous graphics state
        page.getCanvas().restore(state);

        // Save the PDF document
        pdf.saveToFile("DrawEllipses.pdf");

        // Close the document and release resources
        pdf.close();
        pdf.dispose();
    }
}

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Shape

Tagged under

pdf java Shape

Java: Count the Number of Pages in PDF

2024-10-16 06:47:00 Written by Koohji

Knowing the number of pages in a PDF helps you gauge the length of the document, which is especially useful when a large number of PDF documents need to be processed, such as in office work, academic research, or legal document management. With the page count, you can estimate the time required to process a document and plan tasks accordingly. In this article, you will learn how to get the number of pages in a PDF file in Java using Spire.PDF for Java.

Install Spire.PDF for Java

Package Manager

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.12.16</version>
    </dependency>
</dependencies>

Count the Number of Pages in a PDF File in Java

The PdfDocument.getPages().getCount() method provided by Spire.PDF for Java lets you quickly count the number of pages in a PDF file without opening it in a viewer. The following are the detailed steps.

Create a PdfDocument object.
Load a sample PDF file using PdfDocument.loadFromFile() method.
Count the number of pages in the PDF file using PdfDocument.getPages().getCount() method.
Print out the result.

Java

import com.spire.pdf.PdfDocument;

public class CountPdfPages {

    public static void main(String[] args) {

        //Create a PdfDocument instance
        PdfDocument pdf = new PdfDocument();

        //Load a PDF file
        pdf.loadFromFile("contract.pdf");

        //Count the number of pages in the PDF file
        int pageCount = pdf.getPages().getCount();

        //Output the result
        System.out.println("The number of pages in the PDF is: " + pageCount);

        //Close the document to free resources
        pdf.close();
    }
}
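
Once the page count is known, the time estimate mentioned in the introduction is simple arithmetic. The sketch below is illustrative only: ProcessingEstimate and estimate are hypothetical names, and the per-page cost of 250 ms is a made-up figure that you would replace with a measurement from your own pipeline.

```java
import java.time.Duration;

public class ProcessingEstimate {

    // Estimates total processing time as pageCount multiplied by millisPerPage
    public static Duration estimate(int pageCount, long millisPerPage) {
        if (pageCount < 0 || millisPerPage < 0) {
            throw new IllegalArgumentException("Values must be non-negative");
        }
        return Duration.ofMillis(Math.multiplyExact((long) pageCount, millisPerPage));
    }

    public static void main(String[] args) {
        // Hypothetical figures: 120 pages at 250 ms per page
        Duration total = estimate(120, 250);
        System.out.println("Estimated processing time: " + total.getSeconds() + " s");
    }
}
```

With these figures the estimate is 120 x 250 ms = 30 seconds.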

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Page Setting

Tagged under

pdf java Page Setting
