
Tuesday, 11 January 2022 09:30

Java: Export Table Data from PDF to Excel

Tables are common in PDF invoices and financial reports. You may need to export PDF table data to Excel so that you can analyze it with the tools Microsoft Excel provides. This article explains how to extract the tables from a PDF page and export each one to its own Excel worksheet using Spire.Office for Java.

Install Spire.Office for Java

This scenario uses Spire.PDF for Java to extract tables from the PDF and Spire.XLS for Java to generate the Excel file. To use both in the same project, add the Spire.Office.jar file as a dependency in your Java program.

The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project’s pom.xml file.

Package Manager

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.office</artifactId>
        <version>10.10.0</version>
    </dependency>
</dependencies>

Export Table Data from PDF to Excel

The following are the main steps to extract all tables from a certain page and save each of them as an individual worksheet in an Excel document.

Load a sample PDF document while initializing the PdfDocument object.
Create a PdfTableExtractor object, and call its extractTable(int pageIndex) method to extract all tables from the first page.
Create a Workbook instance and remove its default worksheets.
Loop through the tables in the PdfTable[] array, and get each one by its index.
Add a worksheet to the workbook using the Workbook.getWorksheets().add() method.
Loop through the cells in the PDF table, and get the value of each cell using the PdfTable.getText(int rowIndex, int columnIndex) method. Then insert the value into the worksheet using the Worksheet.get(int row, int column).setText(String string) method. PDF table indices start at 0 while worksheet rows and columns start at 1, so add 1 to each index.
Auto-fit the width of each column using the Worksheet.autoFitColumn() method.
Save the workbook to an Excel document using Workbook.saveToFile() method.

Java

import com.spire.pdf.PdfDocument;
import com.spire.pdf.utilities.PdfTable;
import com.spire.pdf.utilities.PdfTableExtractor;
import com.spire.xls.ExcelVersion;
import com.spire.xls.Workbook;
import com.spire.xls.Worksheet;

public class ExtractTableDataAndSaveInExcel {

    public static void main(String[] args) {

        //Load a sample PDF document
        PdfDocument pdf = new PdfDocument("C:\\Users\\Administrator\\Desktop\\Tables.pdf");

        //Create a PdfTableExtractor instance
        PdfTableExtractor extractor = new PdfTableExtractor(pdf);
        
        //Extract tables from the first page
        PdfTable[] pdfTables = extractor.extractTable(0);

        //Create a Workbook object
        Workbook wb = new Workbook();

        //Remove default worksheets
        wb.getWorksheets().clear();

        //If any tables are found
        if (pdfTables != null && pdfTables.length > 0) {

            //Loop through the tables
            for (int tableNum = 0; tableNum < pdfTables.length; tableNum++) {

                //Add a worksheet to workbook
                String sheetName = String.format("Table - %d", tableNum + 1);
                Worksheet sheet = wb.getWorksheets().add(sheetName);

                //Loop through the rows in the current table
                for (int rowNum = 0; rowNum < pdfTables[tableNum].getRowCount(); rowNum++) {

                    //Loop through the columns in the current table
                    for (int colNum = 0; colNum < pdfTables[tableNum].getColumnCount(); colNum++) {

                        //Extract data from the current table cell
                        String text = pdfTables[tableNum].getText(rowNum, colNum);

                        //Insert data into a specific cell
                        sheet.get(rowNum + 1, colNum + 1).setText(text);

                    }
                }

                //Auto fit column width
                for (int sheetColNum = 0; sheetColNum < sheet.getColumns().length; sheetColNum++) {
                    sheet.autoFitColumn(sheetColNum + 1);
                }
            }
        }

        //Save the workbook to an Excel file
        wb.saveToFile("output/ExportTableToExcel.xlsx", ExcelVersion.Version2016);
    }
}
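
The loop above writes the PDF cell at 0-based position (rowNum, colNum) to worksheet cell (rowNum + 1, colNum + 1), because the worksheet indices used by the example start at 1. The following stdlib-only sketch shows which A1-style address each PDF cell ends up at. The class CellAddress and its methods are hypothetical and not part of Spire.XLS; a plain String[][] stands in for the values returned by PdfTable.getText().

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Spire.XLS
class CellAddress {

    // Converts a 1-based column number to Excel letters: 1 -> "A", 26 -> "Z", 27 -> "AA"
    static String columnLetters(int column) {
        if (column < 1) {
            throw new IllegalArgumentException("column must be >= 1");
        }
        StringBuilder letters = new StringBuilder();
        while (column > 0) {
            int remainder = (column - 1) % 26;
            letters.insert(0, (char) ('A' + remainder));
            column = (column - 1) / 26;
        }
        return letters.toString();
    }

    // Maps 0-based PDF table indices to the address written by sheet.get(rowNum + 1, colNum + 1)
    static String addressOf(int rowNum, int colNum) {
        return columnLetters(colNum + 1) + (rowNum + 1);
    }

    // Lays out a table (rows of cell text) as address -> text, in row-major order
    static Map<String, String> layout(String[][] table) {
        Map<String, String> cells = new LinkedHashMap<>();
        for (int r = 0; r < table.length; r++) {
            for (int c = 0; c < table[r].length; c++) {
                cells.put(addressOf(r, c), table[r][c]);
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        String[][] table = {{"ID", "Name"}, {"1", "David"}};
        System.out.println(layout(table)); // {A1=ID, B1=Name, A2=1, B2=David}
    }
}
```

The first PDF row therefore lands in worksheet row 1, starting at column A.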


Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents or lift the functional limitations, please request a 30-day trial license.

Published in Table

Java: Extract Table Data from PDF Document

Tables are among the most commonly used elements in PDF documents. In some cases, you may need to extract data from PDF tables for further analysis. In this article, you will learn how to do this programmatically in Java using Spire.PDF for Java.

Install Spire.PDF for Java

First, add the Spire.PDF.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can import the JAR file into your application by adding the following code to your project's pom.xml file.

Package Manager

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.11.11</version>
    </dependency>
</dependencies>

Extract Table Data from PDF Document

Spire.PDF for Java uses the PdfTableExtractor.extractTable(int pageIndex) method to detect and extract tables from a desired PDF page.

The following are the steps to extract table data from a PDF file:

Load a sample PDF document using the PdfDocument class.
Create a StringBuilder instance and a PdfTableExtractor instance.
Loop through the pages in the PDF, and extract the tables on each page into a PdfTable array using the PdfTableExtractor.extractTable(int pageIndex) method.
Loop through the tables in the array.
Loop through the rows and columns of each table, extract the data from each cell using the PdfTable.getText(int rowIndex, int columnIndex) method, and append it to the StringBuilder using the StringBuilder.append() method.
Write the extracted data to a .txt file using the Writer.write() method.

Java

import com.spire.pdf.PdfDocument;
import com.spire.pdf.utilities.PdfTable;
import com.spire.pdf.utilities.PdfTableExtractor;

import java.io.FileWriter;

public class ExtractTableData {
    public static void main(String[] args) throws Exception {

        //Load a sample PDF document
        PdfDocument pdf = new PdfDocument("Sample.pdf");

        //Create a StringBuilder instance
        StringBuilder builder = new StringBuilder();
        //Create a PdfTableExtractor instance
        PdfTableExtractor extractor = new PdfTableExtractor(pdf);

        //Loop through the pages in the PDF
        for (int pageIndex = 0; pageIndex < pdf.getPages().getCount(); pageIndex++) {
            //Extract tables from the current page into a PdfTable array
            PdfTable[] tableLists = extractor.extractTable(pageIndex);
            
            //If any tables are found
            if (tableLists != null && tableLists.length > 0) {
                //Loop through the tables in the array
                for (PdfTable table : tableLists) {
                    //Loop through the rows in the current table
                    for (int i = 0; i < table.getRowCount(); i++) {
                        //Loop through the columns in the current table
                        for (int j = 0; j < table.getColumnCount(); j++) {
                            //Extract data from the current table cell and append it to the StringBuilder
                            String text = table.getText(i, j);
                            builder.append(text).append(" | ");
                        }
                        builder.append("\r\n");
                    }
                }
            }
        }

        //Write data into a .txt document (try-with-resources closes the writer even if writing fails)
        try (FileWriter fw = new FileWriter("ExtractTable.txt")) {
            fw.write(builder.toString());
        }
    }
}
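
If you want to check or reuse the text layout without a PDF at hand, the formatting loop can be separated from the extraction. The sketch below is a minimal, hypothetical helper (TableTextWriter is not part of Spire.PDF) that assumes the cell values have already been read into a String[][], for example via PdfTable.getText(). It reproduces the " | " and "\r\n" layout used above and writes the file with an explicit UTF-8 charset, since FileWriter(String) uses the JVM's default charset.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of Spire.PDF
class TableTextWriter {

    // Formats one table the same way as the extraction loop:
    // every cell is followed by " | ", every row ends with "\r\n"
    static String format(String[][] table) {
        StringBuilder builder = new StringBuilder();
        for (String[] row : table) {
            for (String cell : row) {
                builder.append(cell).append(" | ");
            }
            builder.append("\r\n");
        }
        return builder.toString();
    }

    // Writes the text using UTF-8 instead of the JVM's default charset
    static void write(Path target, String text) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Sample values standing in for PdfTable.getText(row, column) results
        String[][] table = {{"Name", "Qty"}, {"Pen", "2"}};
        String text = format(table);
        write(Paths.get("ExtractTable.txt"), text);
        System.out.print(text);
    }
}
```

Keeping the formatting in a pure method makes it easy to test the output layout separately from PDF parsing.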

The input PDF:

Java: Extract Table Data from PDF Document

The output .txt document with extracted table data:

Java: Extract Table Data from PDF Document


Java: Create Tables in a PDF Document

A table presents information or data in horizontal rows and vertical columns. Tables are often more efficient than paragraph text for presenting data, especially when the data is numerical or extensive, and the tabular layout makes it easier to read and understand. In this article, you will learn how to create tables in a PDF document in Java using Spire.PDF for Java.

Spire.PDF for Java offers the PdfTable and PdfGrid classes for working with tables in a PDF document. The PdfTable class is used to quickly create simple, regular tables with little formatting, while the PdfGrid class is used to create more complex tables.

The table below lists the differences between these two classes.

Feature	PdfTable	PdfGrid
Row formatting	Can be set through events. No API support.	Can be set through the API.
Column formatting	Can be set through the API.	Can be set through the API.
Cell formatting	Can be set through events. No API support.	Can be set through the API.
Column span	Not supported.	Can be set through the API.
Row span	Can be set through events. No API support.	Can be set through the API.
Nested tables	Can be set through events. No API support.	Can be set through the API.
Events	BeginCellLayout, EndCellLayout, BeginRowLayout, EndRowLayout, BeginPageLayout, EndPageLayout.	BeginPageLayout, EndPageLayout.

The following sections demonstrate how to create a table in PDF using the PdfTable class and the PdfGrid class, respectively.

Create a Table in PDF Using PdfTable Class
Create a Table in PDF Using PdfGrid Class

Install Spire.PDF for Java

First, add the Spire.PDF.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can import the JAR file into your application by adding the following code to your project's pom.xml file.

Package Manager

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.11.11</version>
    </dependency>
</dependencies>

Create a Table in PDF Using PdfTable Class

The following are the steps to create a table with the PdfTable class in Spire.PDF for Java.

Create a PdfDocument object.
Add a page to it using the PdfDocument.getPages().add() method.
Create a PdfTable object.
Set the table style using the methods of the PdfTableStyle object returned by the PdfTable.getStyle() method.
Insert data into the table using the PdfTable.setDataSource() method.
Set the row height and row color through the BeginRowLayout event.
Draw the table on the PDF page using the PdfTable.draw() method.
Save the document to a PDF file using the PdfDocument.saveToFile() method.

Java

import com.spire.data.table.DataTable;
import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.PdfPageSize;
import com.spire.pdf.graphics.*;
import com.spire.pdf.tables.*;

import java.awt.*;
import java.awt.geom.Point2D;

public class CreateTable {

    public static void main(String[] args) {

        //Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        //Add a page
        PdfPageBase page = doc.getPages().add(PdfPageSize.A4, new PdfMargins(40));

        //Create a PdfTable object
        PdfTable table = new PdfTable();

        //Set font for the header and the remaining cells
        table.getStyle().getDefaultStyle().setFont(new PdfTrueTypeFont(new Font("Times New Roman", Font.PLAIN, 12), true));
        table.getStyle().getHeaderStyle().setFont(new PdfTrueTypeFont(new Font("Times New Roman", Font.BOLD, 12), true));

        //Define data
        String[] data = {"ID;Name;Department;Position;Level",
                "1; David; IT; Manager; 1",
                "3; Julia; HR; Manager; 1",
                "4; Sophie; Marketing; Manager; 1",
                "7; Wickey; Marketing; Sales Rep; 2",
                "9; Wayne; HR; HR Supervisor; 2",
                "11; Mia; Dev; Developer; 2"};
        //Split each row at ';', dropping the spaces around the separators
        String[][] dataSource = new String[data.length][];
        for (int i = 0; i < data.length; i++) {
            dataSource[i] = data[i].split("\\s*;\\s*", -1);
        }

        //Set data as the table data
        table.setDataSource(dataSource);

        //Set the first row as header row
        table.getStyle().setHeaderSource(PdfHeaderSource.Rows);
        table.getStyle().setHeaderRowCount(1);

        //Show the header (it is hidden by default)
        table.getStyle().setShowHeader(true);

        //Set font color and background color of header row
        table.getStyle().getHeaderStyle().setBackgroundBrush(PdfBrushes.getGray());
        table.getStyle().getHeaderStyle().setTextBrush(PdfBrushes.getWhite());

        //Set text alignment in header row
        table.getStyle().getHeaderStyle().setStringFormat(new PdfStringFormat(PdfTextAlignment.Center, PdfVerticalAlignment.Middle));

        //Set text alignment in other cells
        for (int i = 0; i < table.getColumns().getCount(); i++) {
            table.getColumns().get(i).setStringFormat(new PdfStringFormat(PdfTextAlignment.Center, PdfVerticalAlignment.Middle));
        }

        //Register a handler for the BeginRowLayout event
        table.beginRowLayout.add(new BeginRowLayoutEventHandler() {

            public void invoke(Object sender, BeginRowLayoutEventArgs args) {
                Table_BeginRowLayout(sender, args);

            }
        });

        //Draw table on the page
        table.draw(page, new Point2D.Float(0, 30));

        //Save the document to a PDF file
        doc.saveToFile("output/PdfTable.pdf");

    }

    //Event handler
    private static void Table_BeginRowLayout(Object sender, BeginRowLayoutEventArgs args) {

        //Set row height
        args.setMinimalHeight(20f);

        //Alternate color of rows except the header row
        if (args.getRowIndex() == 0) {
            return;
        }
        if (args.getRowIndex() % 2 == 0) {
            args.getCellStyle().setBackgroundBrush(PdfBrushes.getLightGray());
        } else {
            args.getCellStyle().setBackgroundBrush(PdfBrushes.getWhite());
        }
    }
}
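
The example builds its data source by splitting semicolon-delimited strings, and the sample strings contain a space after each ';'. A hypothetical helper like the one below (stdlib only, not part of Spire.PDF) trims those spaces and rejects rows whose cell count differs from the header row. The rectangular check is a defensive assumption, since this article does not say how PdfTable handles ragged rows.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Spire.PDF
class TableDataSource {

    // Splits each "a; b; c" line at ';', trimming whitespace around the separators,
    // and rejects rows whose cell count differs from the first (header) row
    static String[][] parse(String[] lines) {
        String[][] rows = new String[lines.length][];
        for (int i = 0; i < lines.length; i++) {
            rows[i] = lines[i].trim().split("\\s*;\\s*", -1);
            if (rows[i].length != rows[0].length) {
                throw new IllegalArgumentException("Row " + i + " has " + rows[i].length
                        + " cells, expected " + rows[0].length);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] data = {"ID;Name;Department", "1; David; IT", "3; Julia; HR"};
        for (String[] row : parse(data)) {
            System.out.println(Arrays.toString(row)); // e.g. [1, David, IT]
        }
    }
}
```

Since the example passes a String[][] to table.setDataSource(), the array returned by parse() has the same shape as the example's dataSource.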


Create a Table in PDF Using PdfGrid Class

Below are the steps to create a table in PDF with the PdfGrid class in Spire.PDF for Java.

Create a PdfDocument object.
Add a page to it using the PdfDocument.getPages().add() method.
Create a PdfGrid object.
Set the table style using the methods of the PdfGridStyle object returned by the PdfGrid.getStyle() method.
Add rows and columns to the table using the PdfGrid.getRows().add() and PdfGrid.getColumns().add() methods.
Insert data into specific cells using the PdfGridCell.setValue() method.
Span cells across columns or rows using the PdfGridCell.setColumnSpan() or PdfGridCell.setRowSpan() method.
Set the formatting of a specific cell using the PdfGridCell.setStringFormat() method and the methods of the PdfGridCellStyle object.
Draw the table on the PDF page using the PdfGrid.draw() method.
Save the document to a PDF file using the PdfDocument.saveToFile() method.

Java

import com.spire.pdf.*;
import com.spire.pdf.graphics.*;
import com.spire.pdf.grid.PdfGrid;
import com.spire.pdf.grid.PdfGridRow;

import java.awt.*;
import java.awt.geom.Point2D;

public class CreateGrid {

    public static void main(String[] args) {

        //Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        //Add a page
        PdfPageBase page = doc.getPages().add(PdfPageSize.A4, new PdfMargins(40));

        //Create a PdfGrid
        PdfGrid grid = new PdfGrid();

        //Set cell padding
        grid.getStyle().setCellPadding(new PdfPaddings(1, 1, 1, 1));

        //Set font
        grid.getStyle().setFont(new PdfTrueTypeFont(new Font("Times New Roman", Font.PLAIN, 13), true));

        //Add rows and columns
        PdfGridRow row1 = grid.getRows().add();
        PdfGridRow row2 = grid.getRows().add();
        PdfGridRow row3 = grid.getRows().add();
        PdfGridRow row4 = grid.getRows().add();
        grid.getColumns().add(4);

        //Set column width
        for (int i = 0; i < grid.getColumns().getCount(); i++) {

            grid.getColumns().get(i).setWidth(120);
        }

        //Write data into specific cells
        row1.getCells().get(0).setValue("Order and Payment Status");
        row2.getCells().get(0).setValue("Order number");
        row2.getCells().get(1).setValue("Date");
        row2.getCells().get(2).setValue("Customer");
        row2.getCells().get(3).setValue("Paid or not");
        row3.getCells().get(0).setValue("00223");
        row3.getCells().get(1).setValue("2022/06/02");
        row3.getCells().get(2).setValue("Brick Lane Realty");
        row3.getCells().get(3).setValue("Yes");
        row4.getCells().get(0).setValue("00224");
        row4.getCells().get(1).setValue("2022/06/03");
        row4.getCells().get(3).setValue("No");

        //Span cell across columns
        row1.getCells().get(0).setColumnSpan(4);

        //Span cell across rows
        row3.getCells().get(2).setRowSpan(2);

        //Set text alignment of specific cells
        row1.getCells().get(0).setStringFormat(new PdfStringFormat(PdfTextAlignment.Center));
        row3.getCells().get(2).setStringFormat(new PdfStringFormat(PdfTextAlignment.Left, PdfVerticalAlignment.Middle));

        //Set background color of specific cells
        row1.getCells().get(0).getStyle().setBackgroundBrush(PdfBrushes.getOrange());
        row4.getCells().get(3).getStyle().setBackgroundBrush(PdfBrushes.getLightGray());

        //Format cell borders and set row height
        PdfBorders borders = new PdfBorders();
        borders.setAll(new PdfPen(new PdfRGBColor(Color.ORANGE), 0.8f));
        for (int i = 0; i < grid.getRows().getCount(); i++) {

            PdfGridRow gridRow = grid.getRows().get(i);
            gridRow.setHeight(20f);
            for (int j = 0; j < gridRow.getCells().getCount(); j++) {
                gridRow.getCells().get(j).getStyle().setBorders(borders);
            }

        }

        //Draw table on the page
        grid.draw(page, new Point2D.Float(0, 30));

        //Save the document to a PDF file
        doc.saveToFile("output/PdfGrid.pdf");
    }
}
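
In the example above, the first cell of row1 spans all four columns and the third cell of row3 spans two rows, which is why the third cell of row4 is never given a value. The following sketch is a simplified, hypothetical model of that layout; it does not use or reproduce Spire.PDF's internals. For every position in a 4 x 4 grid, it records which cell owns the position once the spans are applied.

```java
import java.util.Arrays;

// Hypothetical model of how row and column spans claim grid positions; not part of Spire.PDF
class SpanLayout {

    // Each span is {row, col, rowSpan, colSpan}; row and col are 0-based.
    // Returns, for every position, the label ("r<row>c<col>") of the cell that owns it.
    static String[][] owners(int rows, int cols, int[][] spans) {
        String[][] owner = new String[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                owner[r][c] = "r" + r + "c" + c;
            }
        }
        for (int[] s : spans) {
            int row = s[0], col = s[1], rowSpan = s[2], colSpan = s[3];
            if (row < 0 || col < 0 || rowSpan < 1 || colSpan < 1
                    || row + rowSpan > rows || col + colSpan > cols) {
                throw new IllegalArgumentException("Span does not fit the grid: " + Arrays.toString(s));
            }
            // Every position covered by the span belongs to the span's top-left cell
            for (int r = row; r < row + rowSpan; r++) {
                for (int c = col; c < col + colSpan; c++) {
                    owner[r][c] = "r" + row + "c" + col;
                }
            }
        }
        return owner;
    }

    public static void main(String[] args) {
        // Spans from the PdfGrid example, 0-based:
        // row1 cell 0 -> {0, 0, 1, 4}, row3 cell 2 -> {2, 2, 2, 1}
        int[][] spans = {{0, 0, 1, 4}, {2, 2, 2, 1}};
        for (String[] row : owners(4, 4, spans)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Running it prints [r0c0, r0c0, r0c0, r0c0] for the first row and [r3c0, r3c1, r2c2, r3c3] for the last, showing that position r3c2 (the third cell of row4) is covered by the row span starting in row3.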
