Thursday, 26 May 2022 06:31

Java: Convert PDF to Word

Converting PDF documents to Word files with software is no longer difficult. Preserving the layout and even the font formatting during conversion, however, is something not every tool can do. Spire.PDF for Java handles this and offers the following two modes for converting PDF to Word in Java.

Fixed Layout mode converts quickly and preserves the original appearance of the PDF as closely as possible. However, the resulting document is harder to edit, because each line of PDF text is placed in a separate frame in the generated Word document.

Flowable Structure is a full recognition mode. The converted content will not be presented in frames, and the structure of the resulting document is flowable. The generated Word document is easy to re-edit but may look different from the original PDF file.

Install Spire.PDF for Java

First, you're required to add the Spire.Pdf.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.10.3</version>
    </dependency>
</dependencies>

Convert PDF to Doc/Docx with Fixed Layout

The following are the steps to convert PDF to Doc or Docx with fixed layout.

  • Create a PdfDocument object.
  • Load a PDF file using the PdfDocument.loadFromFile() method.
  • Convert the PDF document to a Doc or Docx file using the PdfDocument.saveToFile(String fileName, FileFormat fileFormat) method.

import com.spire.pdf.FileFormat;
import com.spire.pdf.PdfDocument;

public class ConvertPdfToWordWithFixedLayout {

    public static void main(String[] args) {

        //Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        //Load a sample PDF document
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\sample.pdf");

        //Convert PDF to Doc and save it to a specified path
        doc.saveToFile("output/ToDoc.doc", FileFormat.DOC);

        //Convert PDF to Docx and save it to a specified path
        doc.saveToFile("output/ToDocx.docx", FileFormat.DOCX);
        doc.close();
    }
}
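
The example above hard-codes both the input and the output path. As a hedged sketch that uses only the JDK, the output file name can be derived from the input file name and the output folder created up front. OutputPaths and forInput are hypothetical names, not part of Spire.PDF, and whether saveToFile() creates missing folders on its own is not confirmed here, so the sketch does not rely on it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Spire.PDF.
class OutputPaths {

    // Returns outDir/<input base name>.<extension>,
    // for example output/sample.docx for the input sample.pdf.
    static Path forInput(Path input, Path outDir, String extension) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outDir.resolve(base + "." + extension);
    }

    public static void main(String[] args) throws IOException {
        Path outDir = Paths.get("output");
        // Creates the folder if needed; does nothing if it already exists.
        Files.createDirectories(outDir);

        Path docx = forInput(Paths.get("sample.pdf"), outDir, "docx");
        System.out.println(docx);
    }
}
```

The resulting path can then be passed as a string, for example docx.toString(), to PdfDocument.saveToFile().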

Convert PDF to Doc/Docx with Flowable Structure

The following are the steps to convert PDF to Doc or Docx with flowable structure.

  • Create a PdfDocument object.
  • Load a PDF file using the PdfDocument.loadFromFile() method.
  • Set the conversion mode to flow using the PdfDocument.getConvertOptions().setConvertToWordUsingFlow() method.
  • Convert the PDF document to a Doc or Docx file using the PdfDocument.saveToFile(String fileName, FileFormat fileFormat) method.

import com.spire.pdf.FileFormat;
import com.spire.pdf.PdfDocument;

public class ConvertPdfToWordWithFlowableStructure {

    public static void main(String[] args) {

        //Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        //Load a sample PDF document
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\sample.pdf");

        //Convert PDF to Word with flowable structure
        doc.getConvertOptions().setConvertToWordUsingFlow(true);

        //Convert PDF to Doc
        doc.saveToFile("output/ToDoc.doc", FileFormat.DOC);

        //Convert PDF to Docx
        doc.saveToFile("output/ToDocx.docx", FileFormat.DOCX);
        doc.close();
    }
}

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Conversion
Wednesday, 29 December 2021 06:05

Java: Convert PDF to PDF/A

PDF/A is an ISO-standardized version of PDF designed for archiving and long-term preservation of electronic documents. Unlike paper documents, which are easily damaged or smeared, the PDF/A format ensures that documents can be reproduced in exactly the same way even after long-term storage. This article demonstrates how to convert a PDF to a PDF/A-1A, 1B, 2A, 2B, 3A, or 3B compliant file using Spire.PDF for Java.

Install Spire.PDF for Java

First of all, you're required to add the Spire.Pdf.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.10.3</version>
    </dependency>
</dependencies>

Convert PDF to PDF/A

The detailed steps are as follows:

  • Create a PdfStandardsConverter instance, and pass in a sample PDF file as a parameter.
  • Convert the sample file to PdfA1A conformance level using the PdfStandardsConverter.toPdfA1A() method.
  • Convert the sample file to PdfA1B conformance level using the PdfStandardsConverter.toPdfA1B() method.
  • Convert the sample file to PdfA2A conformance level using the PdfStandardsConverter.toPdfA2A() method.
  • Convert the sample file to PdfA2B conformance level using the PdfStandardsConverter.toPdfA2B() method.
  • Convert the sample file to PdfA3A conformance level using the PdfStandardsConverter.toPdfA3A() method.
  • Convert the sample file to PdfA3B conformance level using the PdfStandardsConverter.toPdfA3B() method.

import com.spire.pdf.conversion.PdfStandardsConverter;

public class ConvertPdfToPdfA {
    public static void main(String[] args) {

        //Create a PdfStandardsConverter instance, and pass in a sample file as a parameter
        PdfStandardsConverter converter = new PdfStandardsConverter("sample.pdf");

        //Convert to PdfA1A
        converter.toPdfA1A("output/ToPdfA1A.pdf");

        //Convert to PdfA1B
        converter.toPdfA1B("output/ToPdfA1B.pdf");

        //Convert to PdfA2A
        converter.toPdfA2A("output/ToPdfA2A.pdf");

        //Convert to PdfA2B
        converter.toPdfA2B("output/ToPdfA2B.pdf");

        //Convert to PdfA3A
        converter.toPdfA3A("output/ToPdfA3A.pdf");

        //Convert to PdfA3B
        converter.toPdfA3B("output/ToPdfA3B.pdf");
    }
}
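
The six method names follow the PDF/A naming scheme: a part number (1, 2, or 3) followed by a conformance level (A or B). As a small illustration that does not call Spire.PDF at all, the sketch below enumerates the same six combinations and builds the output file names used in the example. PdfAConformanceLevels and outputNames are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: enumerates the six conformance levels used in the example above.
class PdfAConformanceLevels {

    // Parts 1, 2 and 3 correspond to ISO 19005-1, ISO 19005-2 and ISO 19005-3.
    // Level A ("accessible") adds requirements such as tagged document structure;
    // level B ("basic") only targets reliable visual reproduction.
    static List<String> outputNames(String folder) {
        List<String> names = new ArrayList<>();
        for (int part = 1; part <= 3; part++) {
            for (char level : new char[] {'A', 'B'}) {
                names.add(folder + "/ToPdfA" + part + level + ".pdf");
            }
        }
        return names;
    }

    public static void main(String[] args) {
        outputNames("output").forEach(System.out::println);
    }
}
```

Level B is usually the easier target for existing documents, since level A depends on structural information that the source PDF may not contain.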

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.


Convert PDF to Image in Java

Converting PDF files to images is essential for various document processing tasks, including generating thumbnails, archiving, and image manipulation. These conversions allow applications to present PDF content in more accessible formats, enhancing user experience and functionality. Java libraries such as Spire.PDF for Java enable efficient conversions to formats like PNG, JPEG, GIF, BMP, TIFF, and SVG, each serving specific purposes based on their characteristics.

This guide will walk you through the conversion process using Spire.PDF for Java, providing optimized code examples for each format. Additionally, it will explain the key differences among these formats to help you select the most suitable option for your needs.

Java PDF-to-Image Conversion Library

Spire.PDF for Java is a robust, feature-rich library for PDF manipulation. It offers several advantages for image conversion:

  1. High-quality rendering that preserves document formatting and layout
  2. Batch processing capabilities for handling multiple documents
  3. Flexible output options including resolution and format control
  4. Lightweight implementation with minimal memory footprint

The library supports conversion to all major image formats while maintaining excellent text clarity and graphic fidelity, making it suitable for both simple conversions and complex document processing pipelines.

Installation

To get started, download Spire.PDF for Java from our website and add it as a dependency in your project. For Maven users, include the following in your pom.xml:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.6.2</version>
    </dependency>
</dependencies>

Image Format Comparison

Different image formats serve distinct purposes. Below is a comparison of commonly used formats:

Format | Compression  | Transparency   | Best For                              | File Size
PNG    | Lossless     | Yes            | High-quality graphics, logos          | Medium
JPEG   | Lossy        | No             | Photographs, web images               | Small
GIF    | Lossless     | Yes (limited)  | Simple animations, low-color graphics | Small
BMP    | None         | No             | Uncompressed images, Windows apps     | Large
TIFF   | Lossless     | Yes            | High-quality scans, printing          | Very Large
SVG    | Vector-based | Yes (scalable) | Logos, icons, web graphics            | Small

PNG is ideal for images requiring transparency and lossless quality, while JPEG is better for photographs due to its smaller file size. GIF supports simple animations, and BMP is rarely used due to its large size. TIFF is preferred for professional printing, and SVG is perfect for scalable vector graphics.
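
The raster formats in this comparison are written through javax.imageio in the examples that follow, so it can help to check which writers the running JDK actually provides. The sketch below is a minimal check under that assumption; ImageWriterCheck and canWrite are hypothetical names. Note that the standard JDK has no ImageIO writer for SVG (it is not a raster format), and that the built-in TIFF plugin was only added in Java 9.

```java
import javax.imageio.ImageIO;

class ImageWriterCheck {

    // True if the running JDK has an ImageIO writer registered for the format name.
    static boolean canWrite(String formatName) {
        return ImageIO.getImageWritersByFormatName(formatName).hasNext();
    }

    public static void main(String[] args) {
        for (String format : new String[] {"png", "jpeg", "gif", "bmp", "tiff", "svg"}) {
            System.out.println(format + " -> " + canWrite(format));
        }
    }
}
```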

Convert PDF to PNG, JPEG, GIF, and BMP

Converting PDF files into various image formats like PNG, JPEG, GIF, and BMP allows developers to cater to diverse application needs. Each format serves different purposes; for example, PNG is ideal for high-quality graphics with transparency, while JPEG is better suited for photographs with smooth gradients.

By leveraging Spire.PDF's capabilities, developers can easily generate the required image formats for their projects, ensuring compatibility and optimal performance.

Basic Conversion Example

Let's start with the fundamental conversion process. The following code demonstrates how to convert each page of a PDF into individual PNG images:

import com.spire.pdf.PdfDocument;
import com.spire.pdf.graphics.PdfImageType;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

public class ConvertPdfToImage {

    public static void main(String[] args) throws IOException {

        // Create a PdfDocument instance
        PdfDocument doc = new PdfDocument();

        // Load a PDF file
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\input.pdf");

        // Iterate through the pages
        for (int i = 0; i < doc.getPages().getCount(); i++) {

            // Convert the current page to BufferedImage
            BufferedImage image = doc.saveAsImage(i, PdfImageType.Bitmap);

            // Save the image data as a png file
            File file = new File("output/" + String.format("ToImage-img-%d.png", i));
            ImageIO.write(image, "PNG", file);
        }

        // Clear up resources
        doc.close();
    }
}

Explanation:

  • The PdfDocument class loads the input PDF file.
  • saveAsImage() converts each page into a BufferedImage.
  • ImageIO.write() saves the image in PNG format.

Tip: To convert to JPEG, GIF, or BMP, simply replace "PNG" with "JPEG", "GIF", or "BMP" in the ImageIO.write() method.
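
One caveat when switching the format name: ImageIO.write() returns false instead of throwing when no writer accepts the image, and on recent OpenJDK builds the JPEG writer rejects images that carry an alpha channel. Whether the BufferedImage returned by saveAsImage() has an alpha channel is an assumption here, not something this article confirms. The sketch below, with the hypothetical names JpegSafeWriter, flattenToRgb, and write, shows one defensive approach that uses only the JDK.

```java
import javax.imageio.ImageIO;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

class JpegSafeWriter {

    // Paints the source onto an opaque white RGB canvas so that no alpha channel remains.
    static BufferedImage flattenToRgb(BufferedImage source) {
        BufferedImage rgb = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    // Writes the image and fails loudly, because ImageIO.write() returns false
    // (rather than throwing) when no suitable writer accepts the image.
    static void write(BufferedImage image, String formatName, File target) throws IOException {
        if (!ImageIO.write(image, formatName, target)) {
            throw new IOException("No ImageIO writer accepted format " + formatName);
        }
    }
}
```

Inside the page loop this would be used as JpegSafeWriter.write(JpegSafeWriter.flattenToRgb(image), "JPEG", file). The target folder must already exist, since ImageIO does not create it.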

Output:

Screenshot of input PDF document and output PNG image files.

PNG with Transparent Background

Converting a PDF page to PNG with a transparent background involves adjusting the conversion options using the setPdfToImageOptions method before the pages are rendered.

Here’s how to achieve this:

doc.getConvertOptions().setPdfToImageOptions(0);

for (int i = 0; i < doc.getPages().getCount(); i++) {
    BufferedImage image = doc.saveAsImage(i, PdfImageType.Bitmap);
    File file = new File("C:\\Users\\Administrator\\Desktop\\Images\\" + String.format("ToImage-img-%d.png", i));
    ImageIO.write(image, "PNG", file);
}

Explanation:

  • setPdfToImageOptions(0) ensures transparency is preserved.
  • The rest of the process remains the same as the basic conversion.
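
To confirm that a rendered image really has a transparent background, the alpha value of a background pixel can be inspected with plain JDK calls. This is a minimal sketch; TransparencyCheck and isTransparent are hypothetical names, and which pixel counts as "background" (for example the top-left corner) depends on the page content.

```java
import java.awt.image.BufferedImage;

class TransparencyCheck {

    // Returns true if the pixel at (x, y) is fully transparent (alpha == 0).
    static boolean isTransparent(BufferedImage image, int x, int y) {
        if (!image.getColorModel().hasAlpha()) {
            return false; // no alpha channel, so every pixel is opaque
        }
        int alpha = image.getRGB(x, y) >>> 24; // getRGB() packs the pixel as ARGB
        return alpha == 0;
    }
}
```

Calling TransparencyCheck.isTransparent(image, 0, 0) inside the loop gives a quick sanity check before the PNG is written.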

Custom DPI Settings

The saveAsImage method also provides an overload that allows developers to specify the DPI (dots per inch) for the output images. This feature is crucial for ensuring that the images are rendered at the desired resolution, particularly when high-quality images are required.

Here’s an example of converting a PDF to images with a specified DPI:

for (int i = 0; i < doc.getPages().getCount(); i++) {
    BufferedImage image = doc.saveAsImage(i, PdfImageType.Bitmap, 300, 300);
    File file = new File("C:\\Users\\Administrator\\Desktop\\Images\\" + String.format("ToImage-img-%d.png", i));
    ImageIO.write(image, "PNG", file);
}

Explanation:

  • The saveAsImage() method accepts dpiX and dpiY parameters for resolution control.
  • Higher DPI values (e.g., 300) produce sharper images but increase file size.

DPI Selection Tip:

  • 72-100 DPI : Suitable for screen display
  • 150-200 DPI : Good for basic printing
  • 300+ DPI : Professional printing quality
  • 600+ DPI : High-resolution archival
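
The cost of a higher DPI can be estimated with simple arithmetic, because PDF page sizes are expressed in points and one point is 1/72 inch. The sketch below is an estimate only: DpiMath and its methods are hypothetical names, the exact pixel dimensions produced by Spire.PDF may differ slightly, and the 4 bytes per pixel figure assumes an ARGB bitmap held in memory.

```java
class DpiMath {

    static final double POINTS_PER_INCH = 72.0;

    // Pixel length of a page side given in PDF points, rendered at the given DPI.
    static int pixels(double points, int dpi) {
        return (int) Math.round(points / POINTS_PER_INCH * dpi);
    }

    // Approximate size of the uncompressed bitmap at 4 bytes per pixel (ARGB).
    static long rawBytes(double widthPt, double heightPt, int dpi) {
        return 4L * pixels(widthPt, dpi) * pixels(heightPt, dpi);
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points (8.5 x 11 inches).
        for (int dpi : new int[] {72, 150, 300, 600}) {
            System.out.printf("%d DPI: %d x %d px, about %d MB uncompressed%n",
                    dpi, pixels(612, dpi), pixels(792, dpi),
                    rawBytes(612, 792, dpi) / (1024 * 1024));
        }
    }
}
```

For a Letter page at 300 DPI this works out to 2550 x 3300 pixels. Doubling the DPI quadruples the pixel count, which matters when many pages are rendered in one run.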

Convert PDF to TIFF in Java

TIFF (Tagged Image File Format) is another popular image format, especially in the publishing and printing industries. Spire.PDF makes it easy to convert PDF pages to TIFF format using the saveToTiff method.

Here’s a simple example demonstrating how to convert a PDF to a multi-page TIFF:

import com.spire.pdf.PdfDocument;

public class ConvertPdfToTiff {

    public static void main(String[] args) {

        // Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        // Load a PDF file
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\input.pdf");

        // Convert a page range to tiff
        doc.saveToTiff("output/PageToTiff.tiff", 0, 2, 300, 300);

        // Clear up resources
        doc.dispose();
    }
}

Explanation:

  • saveToTiff() converts a specified page range (here, pages 0 to 2).
  • The last two parameters set the DPI for the output image.
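
One way to check the multi-page output is to count the images stored in the TIFF file. The sketch below uses only javax.imageio, whose TIFF plugin ships with Java 9 and later. TiffPageCounter, countPages, and sampleTiff are hypothetical names, and sampleTiff merely fabricates a small in-memory file so that the example runs without Spire.PDF.

```java
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;

class TiffPageCounter {

    static {
        ImageIO.setUseCache(false); // keep ImageIO streams in memory, not in temp files
    }

    // Returns the number of images (pages) stored in a TIFF stream.
    static int countPages(InputStream tiff) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(tiff)) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No ImageIO reader recognizes this stream");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                return reader.getNumImages(true); // true = scan the whole file
            } finally {
                reader.dispose();
            }
        }
    }

    // Demo data only: builds a tiny multi-page TIFF in memory.
    static byte[] sampleTiff(int pages) throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
        if (!writers.hasNext()) {
            throw new IOException("No TIFF writer available (requires Java 9+)");
        }
        ImageWriter writer = writers.next();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.prepareWriteSequence(null);
            for (int i = 0; i < pages; i++) {
                BufferedImage page = new BufferedImage(8, 8, BufferedImage.TYPE_INT_RGB);
                writer.writeToSequence(new IIOImage(page, null, null), null);
            }
            writer.endWriteSequence();
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countPages(new ByteArrayInputStream(sampleTiff(3))));
    }
}
```

For a real file, pass a FileInputStream for the generated TIFF (for example output/PageToTiff.tiff) to countPages() and compare the result with the number of pages requested.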

Output:

Screenshot of input PDF document and output TIFF file.

Convert PDF to SVG in Java

SVG (Scalable Vector Graphics) is a vector image format that is widely used for its scalability and compatibility with web technologies. Converting PDF to SVG can be beneficial for web applications that require responsive images.

To convert a PDF document to SVG using Spire.PDF, the following code can be implemented:

import com.spire.pdf.FileFormat;
import com.spire.pdf.PdfDocument;

public class ConvertPdfToSvg {

    public static void main(String[] args) {

        // Initialize a PdfDocument object
        PdfDocument doc = new PdfDocument();

        // Load the PDF document from the specified path
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\input.pdf");

        // Optionally, convert to a single SVG (uncomment to enable)
        // doc.getConvertOptions().setOutputToOneSvg(true);

        // Save the document as SVG files (one SVG per page by default)
        doc.saveToFile("Output/PDFToSVG.svg", FileFormat.SVG);

        // Clear up resources 
        doc.dispose();
    }
}

Explanation:

  • saveToFile() with FileFormat.SVG exports the PDF as an SVG file.
  • Optionally, setOutputToOneSvg(true) merges all pages into a single SVG.
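
Because SVG is an XML format, a generated file can be sanity-checked with the JDK's XML parser. The sketch below is one possible check, not part of Spire.PDF; SvgSanityCheck and isSvg are hypothetical names, and the inline sample string stands in for a generated file so the example runs on its own.

```java
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class SvgSanityCheck {

    // Parses the stream as XML and reports whether the root element is <svg>.
    static boolean isSvg(InputStream xml) throws IOException {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Avoid fetching an external DTD over the network if the file declares one.
            factory.setFeature(
                    "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document document = factory.newDocumentBuilder().parse(xml);
            return "svg".equals(document.getDocumentElement().getNodeName());
        } catch (ParserConfigurationException | SAXException e) {
            return false; // not well-formed XML, or the parser could not be configured
        }
    }

    public static void main(String[] args) throws IOException {
        String sample = "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"10\" height=\"10\"/>";
        System.out.println(isSvg(new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8))));
    }
}
```

For real output, open each generated .svg file with a FileInputStream and pass it to isSvg().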

Output:

Screenshot of input PDF document and output SVG image files.

Conclusion

Spire.PDF for Java simplifies PDF-to-image conversion with support for popular formats like PNG, JPEG, TIFF, SVG, and more. Key features such as transparency control, custom DPI settings, and multi-page TIFF/SVG output enable tailored solutions for generating thumbnails, high-quality prints, or web-optimized graphics. The library ensures high fidelity and performance, making it ideal for batch processing or dynamic rendering. Easily integrate the provided code snippets and APIs to enhance document handling in your Java applications, whether for reporting, archiving, or interactive content.

FAQs

Q1. Can I specify the DPI when converting to TIFF or SVG?

When converting to TIFF, you can specify the DPI to ensure high-quality output. However, SVG is a vector format and does not require DPI settings, as it scales based on the display size.

Q2. Can I convert specific pages of a PDF to images?

Yes, both the saveAsImage and saveToTiff methods allow you to indicate which pages to include in the conversion.

Q3. What is the difference between lossless and lossy image formats?

Lossless formats (like PNG and TIFF) retain all image quality during compression, while lossy formats (like JPEG) reduce file size by discarding some image information, which may affect quality.
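
The difference can be observed directly by encoding an image and decoding it again. The sketch below uses only the JDK; RoundTripDemo, roundTrip, and changedPixels are hypothetical names, and the gradient test image is synthetic rather than a rendered PDF page.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

class RoundTripDemo {

    static {
        ImageIO.setUseCache(false); // keep ImageIO streams in memory, not in temp files
    }

    // Encodes the image in the given format and decodes it again.
    static BufferedImage roundTrip(BufferedImage image, String formatName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        if (!ImageIO.write(image, formatName, bytes)) {
            throw new IOException("No ImageIO writer accepted format " + formatName);
        }
        return ImageIO.read(new ByteArrayInputStream(bytes.toByteArray()));
    }

    // Counts pixels whose value differs between two images of equal size.
    static int changedPixels(BufferedImage a, BufferedImage b) {
        int changed = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    changed++;
                }
            }
        }
        return changed;
    }

    // Builds a synthetic 64 x 64 color gradient to use as test input.
    static BufferedImage gradient() {
        BufferedImage image = new BufferedImage(64, 64, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 64; y++) {
            for (int x = 0; x < 64; x++) {
                image.setRGB(x, y, (x * 4) << 16 | (y * 4) << 8 | ((x ^ y) * 4));
            }
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage image = gradient();
        System.out.println("PNG changed pixels:  " + changedPixels(image, roundTrip(image, "png")));
        System.out.println("JPEG changed pixels: " + changedPixels(image, roundTrip(image, "jpg")));
    }
}
```

The PNG count is zero because the format is lossless; the JPEG count depends on the encoder and its quality settings.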

Q4. How does converting to SVG differ from raster formats?

Converting to SVG generates vector images that scale without losing quality, while raster formats like PNG and JPEG are pixel-based and can lose quality when resized.

Q5. What other file formats can Spire.PDF convert PDFs to?

Spire.PDF is a Java PDF library that supports converting PDF files to multiple formats, such as Word (Doc and Docx) and PDF/A, in addition to the image formats covered in this guide.

Get a Free License

To fully experience the capabilities of Spire.PDF for Java without any evaluation limitations, you can request a free 30-day trial license.
