
How to Read Excel Files in Java (XLS/XLSX) – Complete Guide

2025-06-13 08:30:27 Written by Administrator


Reading Excel files using Java is a common requirement in enterprise applications, especially when dealing with reports, financial data, user records, or third-party integrations. Whether you're building a data import feature, performing spreadsheet analysis, or integrating Excel parsing into a web application, learning how to read Excel files in Java efficiently is essential.

In this tutorial, you’ll learn how to read .xls and .xlsx Excel files in Java. The practical code examples also cover handling large files, reading Excel files from an InputStream, and processing content row by row.

Table of Contents

1. Set Up Your Java Project
2. How to Read XLSX and XLS Files in Java
3. Best Practices for Large Excel Files
4. Full Example: Java Program to Read Excel File
5. Summary
6. FAQ

1. Set Up Your Java Project

To read Excel files using Java, you need a library that supports spreadsheet file formats. Spire.XLS for Java offers support for both .xls (legacy) and .xlsx (modern XML-based) files and provides a high-level API that makes Excel file reading straightforward.

Add Spire.XLS to Your Project

If you're using Maven, add the following to your pom.xml:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.xls</artifactId>
        <version>15.12.15</version>
    </dependency>
</dependencies>

If you're not using Maven, you can manually download the JAR from the official Spire.XLS website and add it to your classpath.

For smaller Excel processing tasks, you can also choose Free Spire.XLS for Java.

2. How to Read XLSX and XLS Files in Java

Java programs can easily read Excel files by loading the workbook and iterating through worksheets, rows, and cells. The .xlsx format is commonly used in modern Excel, while .xls is its older binary counterpart. Fortunately, Spire.XLS supports both formats seamlessly with the same code.

Load and Read Excel File (XLSX or XLS)

Here’s a basic example that loads an Excel file and prints its content:

import com.spire.xls.*;

public class ReadExcel {
    public static void main(String[] args) {
        // Create a workbook object and load the Excel file
        Workbook workbook = new Workbook();
        workbook.loadFromFile("data.xlsx"); // or "data.xls"

        // Get the first worksheet
        Worksheet sheet = workbook.getWorksheets().get(0);
        // Loop through each used row and column
        for (int i = 1; i <= sheet.getLastRow(); i++) {
            for (int j = 1; j <= sheet.getLastColumn(); j++) {
                // Get cell text of a cell range
                String cellText = sheet.getCellRange(i, j).getValue();
                System.out.print(cellText + "\t");
            }
            System.out.println();
        }
    }
}

You can replace the file path with an .xls file and the code remains unchanged. This makes it simple to read Excel files using Java regardless of format.

The Excel file being read and the output result shown in the console.

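The loops above address cells by 1-based row and column numbers. If your input arrives as A1-style references such as "B2" (for example, from a configuration file), a small conversion helper can bridge the two. This is a minimal sketch: the CellRef class is hypothetical and not part of Spire.XLS, and it assumes plain references without sheet names or "$" anchors.

```java
/**
 * Hypothetical helper: converts an A1-style reference such as "B2" into the
 * 1-based (row, column) pair used by calls like sheet.getCellRange(row, column).
 */
final class CellRef {
    final int row;
    final int column;

    private CellRef(int row, int column) {
        this.row = row;
        this.column = column;
    }

    static CellRef parse(String ref) {
        String s = ref.trim().toUpperCase();
        int i = 0;
        int column = 0;
        // Column letters form a base-26 number where A = 1 ... Z = 26
        while (i < s.length() && Character.isLetter(s.charAt(i))) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("Invalid column letter in: " + ref);
            }
            column = column * 26 + (c - 'A' + 1);
            i++;
        }
        if (column == 0 || i == s.length()) {
            throw new IllegalArgumentException("Not an A1 reference: " + ref);
        }
        // The remaining characters must be the 1-based row number
        int row = Integer.parseInt(s.substring(i));
        if (row < 1) {
            throw new IllegalArgumentException("Row must be >= 1: " + ref);
        }
        return new CellRef(row, column);
    }

    public static void main(String[] args) {
        CellRef b2 = parse("B2");
        System.out.println(b2.row + ", " + b2.column);     // 2, 2
        CellRef aa10 = parse("AA10");
        System.out.println(aa10.row + ", " + aa10.column); // 10, 27
    }
}
```

With a parsed reference, a call such as sheet.getCellRange(ref.row, ref.column) reads the same cell that "B2" names in Excel.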

Read Excel File Line by Line with Row Objects

In scenarios like user input validation or applying business rules, processing each row as a data record is often more intuitive. In such cases, you can read the Excel file line by line using row objects via the getRows() method.

// Fetch the row array once instead of on every iteration
CellRange[] rows = sheet.getRows();
for (int i = 0; i < rows.length; i++) {
    // Get a row
    CellRange row = rows[i];
    if (row != null && !row.isBlank()) {
        CellRange[] cells = row.getColumns();
        for (int j = 0; j < cells.length; j++) {
            String text = cells[j].getText();
            System.out.print((text != null ? text : "") + "\t");
        }
        System.out.println();
    }
}

This technique works particularly well when reading Excel files in Java for batch operations or when you only need to process rows individually.
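When each row is treated as a data record, it can help to separate reading from validation. The sketch below assumes the cell texts of one row have already been collected into a String array (for example, from row.getColumns()[j].getText()) in the order name, email, age. The RowMapper and User types and the validation rules are hypothetical examples, not part of Spire.XLS.

```java
import java.util.Optional;

/** Hypothetical mapper from one spreadsheet row (name, email, age) to a typed record. */
final class RowMapper {

    static final class User {
        final String name;
        final String email;
        final int age;

        User(String name, String email, int age) {
            this.name = name;
            this.email = email;
            this.age = age;
        }
    }

    /** Returns a User, or Optional.empty() if the row breaks one of the example rules. */
    static Optional<User> toUser(String[] cells) {
        if (cells == null || cells.length < 3) {
            return Optional.empty();
        }
        String name = cells[0] == null ? "" : cells[0].trim();
        String email = cells[1] == null ? "" : cells[1].trim();
        String ageText = cells[2] == null ? "" : cells[2].trim();

        // Example business rules: name required, email must contain '@', age must be an integer
        if (name.isEmpty() || !email.contains("@")) {
            return Optional.empty();
        }
        try {
            return Optional.of(new User(name, email, Integer.parseInt(ageText)));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"Alice", "alice@example.com", "34"},
            {"Bob", "not-an-email", "29"},
            {"Carol", "carol@example.com", "n/a"}
        };
        for (String[] row : rows) {
            System.out.println(toUser(row).map(u -> "OK: " + u.name).orElse("Rejected"));
        }
    }
}
```

Keeping the rules in one method makes it easy to log or collect rejected rows instead of silently skipping them.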

Read Excel File from InputStream

In web applications or cloud services, Excel files are often received as streams. Here’s how to read Excel files from an InputStream in Java:

import com.spire.xls.*;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadExcel {
    public static void main(String[] args) throws IOException {
        // Open an InputStream; try-with-resources closes it when loading is done
        try (InputStream stream = new FileInputStream("data.xlsx")) {
            // Load the Excel file from the stream
            Workbook workbook = new Workbook();
            workbook.loadFromStream(stream);
            System.out.println("Excel file loaded successfully.");
        }
    }
}

This is useful when handling file uploads, email attachments, or reading Excel files stored in remote storage.
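For uploads, you may want to reject files that are not spreadsheets before loading them. A .xlsx file is a ZIP container (it starts with the bytes 50 4B 03 04), while a legacy .xls file is an OLE2 compound document (it starts with D0 CF 11 E0 A1 B1 1A E1). The hypothetical helper below peeks at those bytes using mark/reset, so the same stream can still be passed to workbook.loadFromStream() afterwards. Note that a ZIP signature only shows the file is a ZIP archive; a .docx would also match.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical pre-check that tells .xlsx (ZIP) from .xls (OLE2) without consuming the stream. */
final class ExcelFormatSniffer {

    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns "XLSX", "XLS" or "UNKNOWN". The stream must support mark/reset. */
    static String detect(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("Wrap the stream in a BufferedInputStream first");
        }
        try {
            in.mark(OLE2_MAGIC.length);
            byte[] header = new byte[OLE2_MAGIC.length];
            int n = in.readNBytes(header, 0, header.length);
            in.reset(); // rewind so the caller can still load the full stream
            if (n >= ZIP_MAGIC.length
                    && Arrays.equals(Arrays.copyOf(header, ZIP_MAGIC.length), ZIP_MAGIC)) {
                return "XLSX";
            }
            if (n == OLE2_MAGIC.length && Arrays.equals(header, OLE2_MAGIC)) {
                return "XLS";
            }
            return "UNKNOWN";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream upload = new BufferedInputStream(
                new ByteArrayInputStream(new byte[] {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00}));
        System.out.println(detect(upload)); // XLSX
    }
}
```

In a web handler you would wrap the request stream in a BufferedInputStream, call detect(), and only then hand the same stream to the workbook loader.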

Read Excel Cell Values in Different Formats

Once you load an Excel file and get access to individual cells, Spire.XLS allows you to read the contents in various formats—formatted text, raw values, formulas, and more.

Here's a breakdown of what each method does:

CellRange cell = sheet.getRange().get(2, 2); // B2 (row 2, column 2)

// Formatted text (what user sees in Excel)
String text = cell.getText();

// Raw string value
String value = cell.getValue();

// Generic object (number, boolean, date, etc.)
Object rawValue = cell.getValue2();

// Formula (if exists)
String formula = cell.getFormula();

// Evaluated result of the formula
String result = cell.getEnvalutedValue();

// If it's a number cell
double number = cell.getNumberValue();

// If it's a date cell
java.util.Date date = cell.getDateTimeValue();

// If it's a boolean cell
boolean bool = cell.getBooleanValue();

Tip: Use getValue2() for flexible handling, as it returns the actual underlying object. Use getText() when you want to match Excel's visible content.
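Because getValue2() returns a plain Object, calling code usually has to branch on its runtime type. The sketch below assumes the value arrives as a Number, Boolean, java.util.Date, String, or null; that mapping is an assumption, so check which types your Spire.XLS version actually returns. The CellValueFormatter class is hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

/** Hypothetical formatter for values returned by a generic accessor such as getValue2(). */
final class CellValueFormatter {

    static String describe(Object raw) {
        if (raw == null) {
            return "(empty)";
        }
        if (raw instanceof Number) {
            double d = ((Number) raw).doubleValue();
            // Print whole numbers without a trailing ".0"
            return (d == Math.rint(d) && !Double.isInfinite(d))
                    ? "number: " + (long) d
                    : "number: " + d;
        }
        if (raw instanceof Boolean) {
            return "boolean: " + raw;
        }
        if (raw instanceof Date) {
            // Formatted in the JVM's default time zone
            return "date: " + new SimpleDateFormat("yyyy-MM-dd").format((Date) raw);
        }
        return "text: " + raw;
    }

    public static void main(String[] args) {
        System.out.println(describe(42.0));         // number: 42
        System.out.println(describe(3.5));          // number: 3.5
        System.out.println(describe(Boolean.TRUE)); // boolean: true
        System.out.println(describe("Total"));      // text: Total
    }
}
```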


3. Best Practices for Reading Large Excel Files in Java

When your Excel file contains tens of thousands of rows or multiple sheets, performance can become a concern. To ensure your Java application reads large Excel files efficiently:

Load only required sheets
Access only relevant columns or rows
Avoid storing entire worksheets in memory
Use row-by-row reading patterns

Here’s an efficient pattern for reading only non-empty rows:

// Fetch the row array once, then iterate with 0-based array indices
CellRange[] rows = sheet.getRows();
for (int i = 0; i < rows.length; i++) {
    CellRange row = rows[i];
    if (row != null && !row.isBlank()) {
        // Process only rows with data
    }
}

Following these practices helps your Java Excel reading logic scale smoothly as files grow.
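One way to follow the "avoid storing entire worksheets in memory" advice is to hand rows off in fixed-size batches, for example to a database insert, instead of collecting the whole sheet into one list. The RowBatcher class below is a hypothetical, library-independent sketch; inside the row loop you would call batcher.add(...) with whatever you extract from each non-blank row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper that holds at most batchSize rows in memory before flushing them. */
final class RowBatcher<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer;

    RowBatcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Adds one row; flushes automatically when the batch is full. */
    void add(T row) {
        buffer.add(row);
        if (buffer.size() == batchSize) {
            flush();
        }
    }

    /** Sends any buffered rows to the sink and clears the buffer. */
    void flush() {
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer)); // hand over a copy, then reuse the buffer
            buffer.clear();
        }
    }

    @Override
    public void close() {
        flush(); // make sure the final, partially filled batch is not lost
    }

    public static void main(String[] args) {
        try (RowBatcher<String> batcher = new RowBatcher<>(2,
                batch -> System.out.println("Saving batch: " + batch))) {
            for (String row : new String[] {"r1", "r2", "r3"}) {
                batcher.add(row);
            }
        }
    }
}
```

Using try-with-resources guarantees that the last, partially filled batch is flushed even if the loop ends early.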

4. Full Example: Java Program to Read Excel File

Here’s a full working Java example that reads an Excel file (users.xlsx) containing columns such as name, email, age, department, and status. The code skips the header row, extracts only the first three columns (name, email, and age), and prints only users aged 30 or older.

import com.spire.xls.*;

public class ExcelReader {
    public static void main(String[] args) {
        Workbook workbook = new Workbook();
        workbook.loadFromFile("users.xlsx");

        Worksheet sheet = workbook.getWorksheets().get(0);
        System.out.println("Name\tEmail\tAge");

        for (int i = 2; i <= sheet.getLastRow(); i++) {
            String name = sheet.getCellRange(i, 1).getValue();
            String email = sheet.getCellRange(i, 2).getValue();
            String ageText = sheet.getCellRange(i, 3).getValue();

            int age = 0;
            try {
                age = Integer.parseInt(ageText);
            } catch (NumberFormatException e) {
                continue;  // Skip rows with invalid age data
            }

            if (age >= 30) {
                System.out.println(name + "\t" + email + "\t" + age);
            }
        }
    }
}

Result of the Java program reading the Excel file, filtering rows by age, and printing the matching records.

This code demonstrates how to read specific cells from an Excel file in Java and output meaningful tabular data, including applying filters on data such as age.

5. Summary

To summarize, this article showed you how to read Excel files in Java using Spire.XLS, including both .xls and .xlsx formats. You learned how to:

Set up your Java project with Excel-reading capabilities
Read Excel files using Java in row-by-row or stream-based fashion
Handle legacy and modern Excel formats with the same API
Apply best practices when working with large Excel files

Whether you're reading from an uploaded spreadsheet, a static report, or a stream-based file, the examples provided here will help you build robust Excel processing features in your Java applications.

To remove the evaluation limitations and use the full feature set, you can apply for a free temporary license.

6. FAQ

Q1: How to read an Excel file dynamically in Java?

To read an Excel file dynamically in Java—especially when the number of rows or columns is unknown—you can use getLastRow() and getLastColumn() methods to determine the data range at runtime. This ensures that your program can adapt to various spreadsheet sizes without hardcoded limits.
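A related technique is to look up columns by header name instead of hard-coding positions, so the code keeps working when columns are reordered or added. The sketch assumes the first row holds the headers and that you have read its texts into a String array (for example, with getCellRange(1, j).getValue() for j from 1 to getLastColumn()); HeaderIndex is a hypothetical helper.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical header-name to 1-based column index lookup built from the header row. */
final class HeaderIndex {

    static Map<String, Integer> build(String[] headerTexts) {
        Map<String, Integer> index = new HashMap<>();
        for (int j = 0; j < headerTexts.length; j++) {
            String h = headerTexts[j];
            if (h == null || h.trim().isEmpty()) {
                continue; // skip blank header cells
            }
            // Normalize case and whitespace; keep the first column if a name repeats
            index.putIfAbsent(h.trim().toLowerCase(Locale.ROOT), j + 1);
        }
        return index;
    }

    static int require(Map<String, Integer> index, String name) {
        Integer col = index.get(name.trim().toLowerCase(Locale.ROOT));
        if (col == null) {
            throw new IllegalArgumentException("Missing column: " + name);
        }
        return col;
    }

    public static void main(String[] args) {
        // In practice these texts would come from row 1, columns 1..getLastColumn()
        String[] header = {"Name", "Email", " Age ", "Department", "Status"};
        Map<String, Integer> index = build(header);
        System.out.println(require(index, "age"));    // 3
        System.out.println(require(index, "Status")); // 5
    }
}
```

The returned index can then be passed straight to getCellRange(row, index) for each data row.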

Q2: How to extract data from Excel file in Java?

To extract data from Excel files in Java, load the workbook and iterate through the cells using nested loops. You can retrieve values with getCellRange(row, column).getValue(). Libraries like Spire.XLS simplify this process and support both .xls and .xlsx formats.

Q3: How to read a CSV Excel file in Java?

If your Excel data is saved as a CSV file, you can read it using Java’s BufferedReader or file streams. Alternatively, Spire.XLS supports CSV parsing directly: load a CSV file by specifying the separator, for example workbook.loadFromFile("data.csv", ","). This lets you handle CSV files along with Excel formats using the same API.
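If you take the BufferedReader route, splitting each line on commas breaks as soon as a field contains a quoted comma. The minimal sketch below handles quoted fields and doubled quotes; it deliberately does not support line breaks inside quoted fields, so for arbitrary CSV input a dedicated CSV parser or the Spire.XLS loader is the safer choice. SimpleCsv is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-line CSV splitter supporting quoted fields and "" escapes. */
final class SimpleCsv {

    static List<String> parseLine(String line, char separator) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == separator) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field on the line
        return fields;
    }

    public static void main(String[] args) throws IOException {
        String csv = "name,city\n\"Doe, John\",\"Springfield \"\"IL\"\"\"\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(parseLine(line, ','));
            }
        }
    }
}
```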

Q4: How to read Excel file in Java using InputStream?

Reading Excel files from InputStream in Java is useful in server-side applications, such as handling file uploads. With Spire.XLS, simply call workbook.loadFromStream(inputStream) and process it as you would with any file-based Excel workbook.

Published in Document Operation

Tagged under

xls java Operation

How to Read PDFs in Java: Extract Text, Images, and More

2025-06-12 07:07:24 Written by Administrator


In today's data-driven landscape, reading PDF files effectively is essential for Java developers. Whether you're handling scanned invoices, structured reports, or image-rich documents, the ability to read PDFs in Java can enhance workflows and reveal critical insights.

This guide will walk you through practical implementations using Spire.PDF for Java to master PDF reading in Java. You will learn to extract searchable text, retrieve embedded images, read tabular data, and perform OCR on scanned PDF documents.

Table of Contents:

Java Library for Reading PDF Content
Extract Text from Searchable PDFs
Retrieve Images from PDFs
Read Table Data from PDF Files
Convert Scanned PDFs to Text via OCR
Conclusion
FAQs

Java Library for Reading PDF Content

When it comes to reading PDF in Java, choosing the right library is half the battle. Spire.PDF stands out as a robust, feature-rich solution for developers. It supports text extraction, image retrieval, table parsing, and even OCR integration. Its intuitive API and comprehensive documentation make it ideal for both beginners and experts.

To start extracting PDF content, download Spire.PDF for Java from our website and add it as a dependency in your project. If you’re using Maven, include the following in your pom.xml:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.5.2</version>
    </dependency>
</dependencies>

Below, we’ll explore how to leverage Spire.PDF for various PDF reading tasks.

Extract Text from Searchable PDFs in Java

Searchable PDFs store text in a machine-readable format, allowing for efficient content extraction. The PdfTextExtractor class in Spire.PDF provides a straightforward way to access page content, while PdfTextExtractOptions allows for flexible extraction settings, including options for handling special text layouts and specifying areas for extraction.

Step-by-Step Guide

Initialize a new instance of PdfDocument to work with your PDF file.
Use the loadFromFile method to load the desired PDF document.
Loop through each page of the PDF using a for loop.
For each page, create an instance of PdfTextExtractor to facilitate text extraction.
Create a PdfTextExtractOptions object to specify how text should be extracted, including any special strategies.
Call the extract method on the PdfTextExtractor instance to retrieve the text from the page.
Write each page's extracted text to its own text file.

The example below shows how to retrieve text from every page of a PDF and output it to individual text files.

import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.texts.PdfTextExtractOptions;
import com.spire.pdf.texts.PdfTextExtractor;
import com.spire.pdf.texts.PdfTextStrategy;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ExtractTextFromSearchablePdf {

    public static void main(String[] args) throws IOException {

        // Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

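        // Load a PDF file
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf");

        // Make sure the output folder exists before writing files into it
        Files.createDirectories(Paths.get("output"));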

        // Iterate through all pages
        for (int i = 0; i < doc.getPages().getCount(); i++) {
            // Get the current page
            PdfPageBase page = doc.getPages().get(i);

            // Create a PdfTextExtractor object
            PdfTextExtractor textExtractor = new PdfTextExtractor(page);

            // Create a PdfTextExtractOptions object
            PdfTextExtractOptions extractOptions = new PdfTextExtractOptions();

            // Specify extract option
            extractOptions.setStrategy(PdfTextStrategy.None);

            // Extract text from the page
            String text = textExtractor.extract(extractOptions);

            // Define the output file path
            Path outputPath = Paths.get("output/Extracted_Page_" + (i + 1) + ".txt");

            // Write to a txt file
            Files.write(outputPath, text.getBytes(java.nio.charset.StandardCharsets.UTF_8));
        }

        // Close the document
        doc.close();
    }
}

Result:

Input PDF document and output txt file with text extracted from the PDF.

Retrieve Images from PDFs in Java

The PdfImageHelper class in Spire.PDF enables efficient extraction of embedded images from PDF documents. It identifies images using PdfImageInfo objects, allowing for easy saving as standard image files.

Step-by-Step Guide

Initialize a new instance of PdfDocument to work with your PDF file.
Use the loadFromFile method to load the desired PDF.
Instantiate PdfImageHelper to assist with image extraction.
Loop through each page of the PDF.
For each page, retrieve all image information using the getImagesInfo method.
Loop through the retrieved image information, extract each image, and save it as a PNG file.

The following example extracts all embedded images from a PDF document and saves them as individual PNG files.

import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.utilities.PdfImageHelper;
import com.spire.pdf.utilities.PdfImageInfo;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

public class ExtractAllImages {

    public static void main(String[] args) throws IOException {

        // Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        // Load a PDF document
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf");

        // Create a PdfImageHelper object
        PdfImageHelper imageHelper = new PdfImageHelper();

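        // Make sure the output folder exists
        new File("output").mkdirs();

        // Counter that gives each extracted image a unique file name
        int m = 0;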

        // Iterate through the pages
        for (int i = 0; i < doc.getPages().getCount(); i++) {

            // Get a specific page
            PdfPageBase page = doc.getPages().get(i);

            // Get all image information from the page
            PdfImageInfo[] imageInfos = imageHelper.getImagesInfo(page);

            // Iterate through the image information
            for (int j = 0; j < imageInfos.length; j++)
            {
                // Get a specific image information
                PdfImageInfo imageInfo = imageInfos[j];

                // Get the image
                BufferedImage image = imageInfo.getImage();
                File file = new File(String.format("output/Image-%d.png",m));
                m++;

                // Save the image file in PNG format
                ImageIO.write(image, "PNG", file);
            }
        }

        // Clear up resources
        doc.dispose();
    }
}

Result:

Input PDF document and image file extracted from the PDF.

Read Table Data from PDF Files in Java

For PDF tables that need conversion to structured data, PdfTableExtractor intelligently recognizes cell boundaries and relationships. The resulting PdfTable objects maintain the original table organization, allowing for cell-level data export.

Step-by-Step Guide

Initialize an instance of PdfDocument to handle your PDF file.
Use the loadFromFile method to open the desired PDF.
Instantiate PdfTableExtractor to facilitate table extraction.
Iterate through each page of the PDF to extract tables.
For each page, retrieve tables into a PdfTable array using the extractTable method.
For each table, iterate through its rows and columns to extract data.
Write the extracted data to individual text files.

This Java code extracts table data from a PDF document and saves each table as a separate text file.

import com.spire.pdf.PdfDocument;
import com.spire.pdf.utilities.PdfTable;
import com.spire.pdf.utilities.PdfTableExtractor;

import java.io.FileWriter;

public class ExtractTableData {
    public static void main(String[] args) throws Exception {

        // Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        // Load a PDF document
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf");

        // Create a PdfTableExtractor instance
        PdfTableExtractor extractor = new PdfTableExtractor(doc);

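        // Make sure the output folder exists before writing table files
        new java.io.File("output").mkdirs();

        // Initialize a table counter
        int tableCounter = 1;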

        // Loop through the pages in the PDF
        for (int pageIndex = 0; pageIndex < doc.getPages().getCount(); pageIndex++) {

            // Extract tables from the current page into a PdfTable array
            PdfTable[] tableLists = extractor.extractTable(pageIndex);

            // If any tables are found
            if (tableLists != null && tableLists.length > 0) {

                // Loop through the tables in the array
                for (PdfTable table : tableLists) {

                    // Create a StringBuilder for the current table
                    StringBuilder builder = new StringBuilder();

                    // Loop through the rows in the current table
                    for (int i = 0; i < table.getRowCount(); i++) {

                        // Loop through the columns in the current table
                        for (int j = 0; j < table.getColumnCount(); j++) {

                            // Extract data from the current table cell and append to the StringBuilder 
                            String text = table.getText(i, j);
                            builder.append(text).append(" | ");
                        }
                        builder.append("\r\n");
                    }

                    // Write data into a separate .txt document for each table (closed automatically)
                    try (FileWriter fw = new FileWriter("output/Table_" + tableCounter + ".txt")) {
                        fw.write(builder.toString());
                    }

                    // Increment the table counter
                    tableCounter++;
                }
            }
        }

        // Clear up resources
        doc.dispose();
    }
}

Result:

Input PDF document and txt file containing table data extracted from the PDF.
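The example above separates cells with " | ", which is easy to read but hard to import elsewhere. If you want CSV instead, the cell texts from table.getText(i, j) can be collected into a String[][] and escaped as in this sketch, which quotes fields containing commas, quotes, or line breaks and doubles embedded quotes (the RFC 4180 convention). TableToCsv is hypothetical and independent of Spire.PDF.

```java
/** Hypothetical converter from a table of cell texts to RFC 4180-style CSV. */
final class TableToCsv {

    static String escape(String value) {
        String v = value == null ? "" : value;
        // Quote fields containing a comma, quote, or line break; double any embedded quotes
        if (v.contains(",") || v.contains("\"") || v.contains("\n") || v.contains("\r")) {
            return "\"" + v.replace("\"", "\"\"") + "\"";
        }
        return v;
    }

    static String toCsv(String[][] rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            for (int j = 0; j < row.length; j++) {
                if (j > 0) {
                    sb.append(',');
                }
                sb.append(escape(row[j]));
            }
            sb.append("\r\n"); // RFC 4180 uses CRLF line endings
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[][] table = {
            {"Product", "Price"},
            {"Widget, large", "9.99"},
            {"12\" pipe", "4.50"}
        };
        System.out.print(toCsv(table));
    }
}
```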

Convert Scanned PDFs to Text via OCR

Scanned PDFs require an OCR engine such as Spire.OCR for Java. The solution first converts pages to images using Spire.PDF's rendering engine, then applies Spire.OCR's recognition capabilities via the OcrScanner class. This two-step approach turns scanned pages into editable text and supports multiple languages.

Step 1. Install Spire.OCR and Configure the Environment

Download Spire.OCR for Java and add the Jar file as a dependency in your project.
Download the model that matches your operating system from one of the following links, and unzip the package somewhere on your disk.
Configure the model in your code.

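OcrScanner scanner = new OcrScanner();
ConfigureOptions configureOptions = new ConfigureOptions();
configureOptions.setModelPath("D:\\win-x64"); // model path
scanner.ConfigureDependencies(configureOptions);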

For detailed steps, refer to: Extract Text from Images Using the New Model of Spire.OCR for Java

Step 2. Convert a Scanned PDF to Text

This code example converts each page of a scanned PDF into an image, applies OCR to extract text, and saves the results in a text file.

import com.spire.ocr.OcrException;
import com.spire.ocr.OcrScanner;
import com.spire.ocr.ConfigureOptions;
import com.spire.pdf.PdfDocument;
import com.spire.pdf.graphics.PdfImageType;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ExtractTextFromScannedPdf {

    public static void main(String[] args) throws IOException, OcrException {

        // Create an instance of the OcrScanner class
        OcrScanner scanner = new OcrScanner();

        // Configure the scanner
        ConfigureOptions configureOptions = new ConfigureOptions();
        configureOptions.setModelPath("D:\\win-x64"); // Set model path
        configureOptions.setLanguage("English"); // Set language

        // Apply the configuration options
        scanner.ConfigureDependencies(configureOptions);

        // Load a PDF document
        PdfDocument doc = new PdfDocument();
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf");

        // Prepare temporary directory
        String tempDirPath = "temp";
        new File(tempDirPath).mkdirs(); // Create temp directory

        StringBuilder allText = new StringBuilder();

        // Iterate through all pages
        for (int i = 0; i < doc.getPages().getCount(); i++) {

            // Convert page to image
            BufferedImage bufferedImage = doc.saveAsImage(i, PdfImageType.Bitmap);
            String imagePath = tempDirPath + File.separator + String.format("page_%d.png", i);
            ImageIO.write(bufferedImage, "PNG", new File(imagePath));

            // Perform OCR
            scanner.scan(imagePath);
            String pageText = scanner.getText().toString();
            allText.append(String.format("\n--- PAGE %d ---\n%s\n", i + 1, pageText));

            // Clean up temp image
            new File(imagePath).delete();
        }

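        // Save all extracted text to a file (create the output folder if needed)
        Path outputTxtPath = Paths.get("output", "extracted_text.txt");
        Files.createDirectories(outputTxtPath.getParent());
        Files.write(outputTxtPath, allText.toString().getBytes(java.nio.charset.StandardCharsets.UTF_8));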

        // Close the document
        doc.close();
        System.out.println("Text extracted to " + outputTxtPath);
    }
}
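The OCR example writes page images into a fixed temp folder and deletes each file after scanning, so a scan that throws leaves its image behind. A hypothetical alternative is to create a unique temporary file per page and delete it in a finally block, as sketched below. In the loop above you might call TempImage.withTempPng(bufferedImage, path -> ...) and pass path.toString() to scanner.scan(); since scanning may throw the checked OcrException, it would need to be wrapped inside the lambda. This sketch covers only the file handling.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

/** Hypothetical helper: write an image to a temp PNG, use it, and always delete it. */
final class TempImage {

    static <R> R withTempPng(BufferedImage image, Function<Path, R> action) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("page_", ".png");
            ImageIO.write(image, "PNG", tmp.toFile());
            return action.apply(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp);
                } catch (IOException ignored) {
                    // best-effort cleanup
                }
            }
        }
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        long size = withTempPng(img, path -> path.toFile().length());
        System.out.println("Temporary PNG size in bytes: " + size);
    }
}
```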

Conclusion

Mastering how to read PDF in Java opens up a world of possibilities for data extraction and document automation. Whether you’re dealing with searchable text, images, tables, or scanned documents, the right tools and techniques can simplify the process.

By leveraging libraries like Spire.PDF and integrating OCR for scanned files, you can build robust solutions tailored to your needs. Start experimenting with the code snippets provided and unlock the full potential of PDF processing in Java!

FAQs

Q1: Can I extract text from scanned PDFs using Java?

Yes, by combining Spire.PDF with Spire.OCR. Convert PDF pages to images and perform OCR to extract text.

Q2: What’s the best library for reading PDFs in Java?

Spire.PDF is highly recommended for its versatility and ease of use. It supports extraction of text, images, tables, and OCR integration.

Q3: Does Spire.PDF support extraction of PDF elements like metadata, attachments, and hyperlinks?

Yes, Spire.PDF provides comprehensive support for extracting:

Metadata (title, author, keywords)
Attachments (embedded files)
Hyperlinks (URLs and document links)

The library offers dedicated classes such as PdfDocumentInformation for metadata, PdfAttachmentCollection for embedded files, and PdfUriAnnotation for hyperlinks.

Q4: How to parse tables from PDFs into CSV/Excel programmatically?

Using Spire.PDF for Java, you can extract table data from PDFs, then seamlessly export it to Excel (XLSX) or CSV format with Spire.XLS for Java. For a step-by-step guide, refer to our tutorial: Export Table Data from PDF to Excel in Java.

Get a Free License

To fully experience the capabilities of Spire.PDF for Java without any evaluation limitations, you can request a free 30-day trial license.

Published in Document Operation

Tagged under

pdf java Operation

Generate Word Documents from Templates in Java

2025-05-14 02:43:39 Written by Administrator

In modern software development, generating dynamic Word documents from templates is a common requirement for applications that produce reports, contracts, invoices, or other business documents. Java developers seeking efficient solutions for document automation can leverage Spire.Doc for Java, a robust library for processing Word files without requiring Microsoft Office.

This guide explores how to use Spire.Doc for Java to create Word documents from templates. We will cover two key approaches: replacing text placeholders and modifying bookmark content.

Java Library for Generating Word Documents
Generate a Word Document by Replacing Text Placeholders
Generate a Word Document by Modifying Bookmark Content
Conclusion
FAQs

Java Library for Generating Word Documents

Spire.Doc for Java is a powerful library that enables developers to create, manipulate, and convert Word documents. It provides an intuitive API that allows for various operations, including the modification of text, images, and bookmarks in existing documents.

To get started, download the library from our official website and import it into your Java project. If you're using Maven, include the following dependency in your pom.xml file:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>14.1.3</version>
    </dependency>
</dependencies>

Generate a Word Document by Replacing Text Placeholders

This method uses a template document containing marked placeholders (e.g., #name#, #date#) that are dynamically replaced with real data. Spire.Doc's Document.replace() method handles text substitutions efficiently, while additional APIs enable advanced replacements like inserting images at specified locations.

Steps to generate Word documents from templates by replacing text placeholders:

Initialize Document: A new Document object is created to work with the Word file.
Load the template: The template document with placeholders is loaded.
Create replacement mappings: A HashMap is created to store placeholder-replacement pairs.
Perform text replacement: The replace() method finds and replaces all instances of each placeholder.
Handle image insertion: The custom replaceTextWithImage() method replaces a text placeholder with an image.
Save the result: The modified document is saved to a specified path.


import com.spire.doc.Document;
import com.spire.doc.FileFormat;
import com.spire.doc.documents.TextSelection;
import com.spire.doc.fields.DocPicture;
import com.spire.doc.fields.TextRange;

import java.util.HashMap;
import java.util.Map;

public class ReplaceTextPlaceholders {

    public static void main(String[] args) {

        // Initialize a new Document object
        Document document = new Document();

        // Load the template Word file
        document.loadFromFile("C:\\Users\\Administrator\\Desktop\\template.docx");

        // Map to hold text placeholders and their replacements
        Map<String, String> replaceDict = new HashMap<>();
        replaceDict.put("#name#", "John Doe");
        replaceDict.put("#gender#", "Male");
        replaceDict.put("#birthdate#", "January 15, 1990");
        replaceDict.put("#address#", "123 Main Street");
        replaceDict.put("#city#", "Springfield");
        replaceDict.put("#state#", "Illinois");
        replaceDict.put("#postal#", "62701");
        replaceDict.put("#country#", "United States");

        // Replace placeholders in the document with corresponding values
        for (Map.Entry<String, String> entry : replaceDict.entrySet()) {
            document.replace(entry.getKey(), entry.getValue(), true, true);
        }

        // Path to the image file
        String imagePath = "C:\\Users\\Administrator\\Desktop\\portrait.png";

        // Replace the placeholder “#photo#” with an image
        replaceTextWithImage(document, "#photo#", imagePath);

        // Save the modified document
        document.saveToFile("output/ReplacePlaceholders.docx", FileFormat.Docx);

        // Release resources
        document.dispose();
    }

    // Method to replace a placeholder in the document with an image
    static void replaceTextWithImage(Document document, String stringToReplace, String imagePath) {

        // Load the image from the specified path
        DocPicture pic = new DocPicture(document);
        pic.loadImage(imagePath);

        // Find the placeholder in the document
        TextSelection selection = document.findString(stringToReplace, false, true);
        if (selection == null) {
            return; // placeholder not found; leave the document unchanged
        }

        // Get the range of the found text
        TextRange range = selection.getAsOneRange();
        int index = range.getOwnerParagraph().getChildObjects().indexOf(range);

        // Insert the image and remove the placeholder text
        range.getOwnerParagraph().getChildObjects().insert(index, pic);
        range.getOwnerParagraph().getChildObjects().remove(range);
    }
}

Output:

Screenshot of the input template file containing placeholders and the output Word document.
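To see the placeholder idea in isolation, the sketch below fills #key# placeholders in a plain Java string, using the same key style as the map above and leaving unknown placeholders unchanged. It does not operate on .docx files: inside a Word document a placeholder can be split across several formatted text runs, which is one reason to work through a document API such as Document.replace() rather than on raw text. PlaceholderFill is a hypothetical class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical plain-string illustration of #placeholder# substitution. */
final class PlaceholderFill {

    private static final Pattern PLACEHOLDER = Pattern.compile("#\\w+#");

    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = values.get(m.group());
            // Leave unknown placeholders untouched so missing data is easy to spot
            String text = replacement != null ? replacement : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("#name#", "John Doe");
        values.put("#city#", "Springfield");
        System.out.println(fill("Name: #name#, City: #city#, Photo: #photo#", values));
        // Name: John Doe, City: Springfield, Photo: #photo#
    }
}
```

Matcher.quoteReplacement keeps values containing "$" or "\" from being interpreted as group references.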

Generate a Word Document by Modifying Bookmark Content

This approach uses Word bookmarks to identify locations in the document where content should be inserted or modified. The BookmarksNavigator class in Spire.Doc streamlines the process by enabling direct access to bookmarks, allowing targeted content replacement while automatically preserving the document's original structure and formatting.

Steps to generate Word documents from templates by modifying bookmark content:

Initialize the document: Create a new Document object.
Load the template: Load the template document that contains the predefined bookmarks.
Set up replacements: Create a HashMap that maps bookmark names to their replacement values.
Create a navigator: Instantiate a BookmarksNavigator for the document.
Replace content: For each entry, call moveToBookmark() to move to the bookmark, then replaceBookmarkContent() to replace its content.
Save the result: Save the modified document to the specified path.
Release resources: Call dispose() to free the resources held by the document.

Java

import com.spire.doc.*;
import com.spire.doc.documents.*;
import java.util.HashMap;
import java.util.Map;

public class ModifyBookmarkContent {

    public static void main(String[] args) {

        // Initialize a new Document object
        Document document = new Document();

        // Load the template Word file
        document.loadFromFile("C:\\Users\\Administrator\\Desktop\\template.docx");

        // Define bookmark names and their replacement values
        Map<String, String> replaceDict = new HashMap<>();
        replaceDict.put("name", "Tech Innovations Inc.");
        replaceDict.put("year", "2015");
        replaceDict.put("headquarter", "San Francisco, California, USA");
        replaceDict.put("history", "Tech Innovations Inc. was founded by a group of engineers and " +
                "entrepreneurs with a vision to revolutionize the technology sector. Starting " +
                "with a focus on software development, the company expanded its portfolio to " +
                "include artificial intelligence and cloud computing solutions.");

        // Create a BookmarksNavigator to manage bookmarks in the document
        BookmarksNavigator bookmarkNavigator = new BookmarksNavigator(document);

        // Iterate through the bookmark names and their replacement values
        for (Map.Entry<String, String> entry : replaceDict.entrySet()) {

            // Move to the bookmark with this name
            bookmarkNavigator.moveToBookmark(entry.getKey());

            // Replace the bookmark's content; true keeps the existing formatting
            bookmarkNavigator.replaceBookmarkContent(entry.getValue(), true);
        }

        // Save the modified document
        document.saveToFile("output/ReplaceBookmarkContent.docx", FileFormat.Docx);

        // Release resources
        document.dispose();
    }
}

Output:

Screenshot of the input template file containing bookmarks and the output Word document.
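Why the bookmark method suits repeated updates can be seen in a simplified model. The plain-Java sketch below represents a bookmark as a hypothetical {name}...{/name} marker pair standing in for Word's bookmark start and end nodes; it is not the Spire.Doc API, and the helper name replaceBetweenMarkers is made up for illustration. Replacing the content keeps both markers, so the same bookmark can be filled again later.

```java
public class BookmarkModel {

    // Replaces the text between the hypothetical markers {name} and {/name},
    // keeping both markers so the same "bookmark" can be updated again later
    static String replaceBetweenMarkers(String doc, String name, String newText) {
        String start = "{" + name + "}";
        String end = "{/" + name + "}";

        int s = doc.indexOf(start);
        if (s < 0) {
            throw new IllegalArgumentException("Bookmark not found: " + name);
        }
        int contentStart = s + start.length();
        int e = doc.indexOf(end, contentStart);
        if (e < 0) {
            throw new IllegalArgumentException("Bookmark end not found: " + name);
        }
        // Keep everything up to and including the start marker, and from the end marker on
        return doc.substring(0, contentStart) + newText + doc.substring(e);
    }

    public static void main(String[] args) {
        String doc = "Founded: {year}XXXX{/year}.";

        doc = replaceBetweenMarkers(doc, "year", "2015");
        System.out.println(doc); // Founded: {year}2015{/year}.

        // The markers survive, so a second update works the same way
        doc = replaceBetweenMarkers(doc, "year", "2016");
        System.out.println(doc); // Founded: {year}2016{/year}.
    }
}
```

By contrast, once a text placeholder has been replaced it no longer exists in the document, so a second run of the text-replacement method has nothing to find.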

Conclusion

Both methods provide effective ways to generate documents from templates, but they suit different scenarios:

Text Replacement Method is best when:

You need simple text substitutions
You need to insert images at specific locations
You want to replace text anywhere in the document (not just specific locations)

Bookmark Method is preferable when:

You're working with complex documents where precise location matters
You need to replace larger sections of content or paragraphs
You want to preserve bookmarks for future updates

Spire.Doc also offers Mail Merge, which suits high-volume generation from a single template: it produces personalized documents such as form letters or reports by merging the template's merge fields with data from external sources such as databases.

FAQs

Q1: Can I convert the generated Word document to PDF?

A: Yes. Spire.Doc for Java can convert documents to PDF and other formats; call saveToFile() with FileFormat.PDF.

Q2: How can I handle complex formatting in generated documents?

A: Prepare your template with all required formatting in Word, then use placeholders or bookmarks in locations where dynamic content should appear. The formatting around these markers will be preserved.

Q3: What's the difference between mail merge and text replacement?

A: Mail merge is designed for merging database-like data into documents and supports features such as sections that repeat once per record. Text replacement is simpler, but each placeholder receives a single value, so tables with a variable number of rows have to be built by other means.
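The "repeat once per record" idea can be illustrated without any library: a single row template is filled once for each record. This is a hypothetical plain-Java illustration, not Spire.Doc's mail merge API; the #field# syntax, the expand and record helpers, and the sample records are assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RepeatingRegion {

    // Fills one copy of rowTemplate per record: the repeating behavior
    // that one-shot text replacement does not provide on its own
    static String expand(String rowTemplate, List<Map<String, String>> records) {
        StringBuilder out = new StringBuilder();
        for (Map<String, String> record : records) {
            String row = rowTemplate;
            for (Map.Entry<String, String> field : record.entrySet()) {
                row = row.replace("#" + field.getKey() + "#", field.getValue());
            }
            out.append(row).append('\n');
        }
        return out.toString();
    }

    // Builds one hypothetical record with "product" and "qty" fields
    static Map<String, String> record(String product, String qty) {
        Map<String, String> r = new LinkedHashMap<>();
        r.put("product", product);
        r.put("qty", qty);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = Arrays.asList(
                record("Widget", "3"),
                record("Gadget", "7"));

        System.out.print(expand("#product#: #qty#", records));
        // prints:
        // Widget: 3
        // Gadget: 7
    }
}
```

Because fields are replaced one after another, a value that itself contains a #field# token could be substituted again; this sketch does not guard against that.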

Get a Free License

To fully experience the capabilities of Spire.Doc for Java without any evaluation limitations, you can request a free 30-day trial license.

Published in Document Operation
