page 108

Java: Get Annotations from PDF

2021-10-21 07:49:04 Written by Koohji

PDF annotations are additional objects attached to a PDF document. Sometimes you need to extract this additional data from a PDF file to review the annotation details without opening the document. This article describes how to get the annotations from a PDF in Java using Spire.PDF for Java.

Install Spire.PDF for Java

First of all, you need to add the Spire.PDF.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.12.16</version>
    </dependency>
</dependencies>

Get Annotations from a PDF File

Spire.PDF for Java offers the PdfPageBase.getAnnotationsWidget() method to get the annotation collection of a specified page of the document.

The following are the steps to get all the annotations from the first page of a PDF file:

  • Create an object of the PdfDocument class.
  • Load a sample PDF document using the PdfDocument.loadFromFile() method.
  • Create a StringBuilder object.
  • Get the annotation collection of the first page of the document using the PdfPageBase.getAnnotationsWidget() method.
  • Loop through the annotations, skipping pop-up annotations. For each remaining annotation, extract its data (for example, its text using the PdfAnnotation.getText() method), then append the data to the StringBuilder instance using the StringBuilder.append() method.
  • Write the extracted data to a .txt document using the Writer.write() method.
import com.spire.pdf.*;
import com.spire.pdf.annotations.*;

import java.io.FileWriter;

public class Test {
    public static void main(String[] args) throws Exception {
        //Create an object of PdfDocument class.
        PdfDocument pdf = new PdfDocument();
        //Load the sample PDF document
        pdf.loadFromFile("Annotations.pdf");

        //Get the annotation collection of the first page of the document.
        PdfAnnotationCollection annotations = pdf.getPages().get(0).getAnnotationsWidget();

        //Create a StringBuilder object
        StringBuilder content = new StringBuilder();

        //Traverse all the annotations
        for (int i = 0; i < annotations.getCount(); i++) {

            //Skip pop-up annotations
            if (annotations.get(i) instanceof PdfPopupAnnotationWidget) {
                continue;
            }

            //Get the annotation's author
            content.append("Annotation Author: " + annotations.get(i).getAuthor() + "\n");

            //Get the annotation's text
            content.append("Annotation Text: " + annotations.get(i).getText() + "\n");

            //Get the annotation's modified date
            String modifiedDate = annotations.get(i).getModifiedDate().toString();
            content.append("Annotation ModifiedDate: " + modifiedDate + "\n");

            //Get the annotation's name
            content.append("Annotation Name: " + annotations.get(i).getName() + "\n");

            //Get the annotation's location
            content.append("Annotation Location: " + annotations.get(i).getLocation() + "\n");
        }

        //Write to a .txt file
        FileWriter fw = new FileWriter("GetAnnotations.txt");
        fw.write(content.toString());
        fw.flush();
        fw.close();
    }
}
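
The reporting part of the example above can be made a little more defensive. The following is a minimal sketch using only the Java standard library, not part of Spire.PDF: the AnnotationReport class and its Entry holder are hypothetical names standing in for the values read from each annotation. It assumes some values (for example the author or modified date) may be missing, prints a placeholder in that case, and writes the file with try-with-resources so the writer is closed even if writing fails.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

class AnnotationReport {

    // Hypothetical holder for the values read from one annotation
    // (stands in for the getters called in the example above).
    static final class Entry {
        final String author;
        final String text;
        final Date modifiedDate; // assumed to be possibly null

        Entry(String author, String text, Date modifiedDate) {
            this.author = author;
            this.text = text;
            this.modifiedDate = modifiedDate;
        }
    }

    // Builds the report text; missing values are printed as "(none)".
    static String format(List<Entry> entries) {
        StringBuilder content = new StringBuilder();
        for (Entry e : entries) {
            content.append("Annotation Author: ").append(orNone(e.author)).append('\n');
            content.append("Annotation Text: ").append(orNone(e.text)).append('\n');
            content.append("Annotation ModifiedDate: ").append(orNone(e.modifiedDate)).append('\n');
        }
        return content.toString();
    }

    // Returns a placeholder instead of throwing when a value is absent.
    private static String orNone(Object value) {
        return value == null ? "(none)" : value.toString();
    }

    // Writes the report; try-with-resources closes the writer even on failure.
    static void write(Path target, String report) throws IOException {
        try (Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(report);
        }
    }

    public static void main(String[] args) throws IOException {
        List<Entry> entries = Arrays.asList(
                new Entry("Alice", "Please check this figure.", new Date()),
                new Entry(null, "Unsigned note", null));
        write(Paths.get("GetAnnotations.txt"), format(entries));
    }
}
```

In a real program, each Entry would be filled from the annotation getters shown earlier; whether those getters can actually return null depends on the library and the PDF, so treat the null handling here as a precaution rather than documented behavior.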

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Java: Add an Image Stamp to a Word Document

Stamps can help attest to the authenticity and validity of a document and also make the document look more professional. Since Microsoft Word doesn't provide a built-in stamp feature, you can add an image to your Word documents to mimic the stamp effect. This is useful when the document will be printed to paper or PDF. In this article, you will learn how to add a "stamp" to a Word document using Spire.Doc for Java.

Install Spire.Doc for Java

First of all, you're required to add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>14.1.3</version>
    </dependency>
</dependencies>

Add an Image Stamp to Word Document

Spire.Doc for Java allows developers to use the core classes and methods listed in the table below to add an image and format it so that it looks like a stamp in the Word document.

Name                                       Description
DocPicture Class                           Represents a picture in a Word document.
Paragraph.appendPicture() Method           Appends an image to the end of a paragraph.
DocPicture.setHorizontalPosition() Method  Sets the absolute horizontal position of the picture.
DocPicture.setVerticalPosition() Method    Sets the absolute vertical position of the picture.
DocPicture.setWidth() Method               Sets the picture width.
DocPicture.setHeight() Method              Sets the picture height.
DocPicture.setTextWrappingStyle() Method   Sets the text wrapping type of the picture.

The detailed steps are as follows:

  • Create a Document instance.
  • Load a Word document using the Document.loadFromFile() method.
  • Get a specific paragraph using the ParagraphCollection.get() method.
  • Add an image to the Word document using the Paragraph.appendPicture() method.
  • Set the position, size and wrapping style of the image using the methods offered by the DocPicture class.
  • Save the document to another file using the Document.saveToFile() method.
import com.spire.doc.*;
import com.spire.doc.documents.Paragraph;
import com.spire.doc.documents.TextWrappingStyle;
import com.spire.doc.fields.DocPicture;

public class AddStamp {
    public static void main(String[] args) {
        //Create a Document instance
        Document doc = new Document();

        //Load a Word document
        doc.loadFromFile("test.docx");

        //Get the specific paragraph
        Section section = doc.getSections().get(0);
        Paragraph paragraph = section.getParagraphs().get(4);

        //Add an image 
        DocPicture picture = paragraph.appendPicture("cert.png");

        //Set the position of the image
        picture.setHorizontalPosition(240f);
        picture.setVerticalPosition(120f);

        //Set width and height of the image
        picture.setWidth(150);
        picture.setHeight(150);

        //Set wrapping style of the image to In_Front_Of_Text, so that it looks like a stamp
        picture.setTextWrappingStyle(TextWrappingStyle.In_Front_Of_Text);

        //Save the document to file
        doc.saveToFile("AddStamp.docx", FileFormat.Docx);
        doc.dispose();
    }
}
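
The position values in the example above (240f and 120f) are hard-coded. As a rough illustration of how such values might be derived, the sketch below computes offsets that would place a 150 x 150 stamp near the bottom-right corner of an A4 page. It uses only the Java standard library; StampPosition, mmToPoints and farEdgeOffset are hypothetical helper names. It also assumes that positions are expressed in points and measured from the top-left corner of the page, which should be checked against the library documentation before relying on it.

```java
class StampPosition {

    static final float POINTS_PER_INCH = 72f;
    static final float MM_PER_INCH = 25.4f;

    // Converts millimetres to points (1 inch = 72 points = 25.4 mm).
    static float mmToPoints(float mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    // Offset from the near edge that leaves `margin` between the stamp
    // and the far edge of the page (right edge or bottom edge).
    static float farEdgeOffset(float pageExtent, float stampExtent, float margin) {
        float offset = pageExtent - stampExtent - margin;
        if (offset < 0) {
            throw new IllegalArgumentException("Stamp and margin do not fit on the page");
        }
        return offset;
    }

    public static void main(String[] args) {
        // A4 page size, converted from millimetres
        float pageWidth = mmToPoints(210f);
        float pageHeight = mmToPoints(297f);

        float stampSize = 150f;          // same size as in the example above
        float margin = mmToPoints(20f);  // assumed 20 mm gap to the page edge

        float horizontal = farEdgeOffset(pageWidth, stampSize, margin);
        float vertical = farEdgeOffset(pageHeight, stampSize, margin);

        // These values could then be passed to setHorizontalPosition()
        // and setVerticalPosition(), under the assumptions stated above.
        System.out.printf("horizontal=%.1f, vertical=%.1f%n", horizontal, vertical);
    }
}
```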

Java: Extract Table Data from PDF Document

Tables are one of the most commonly used formatting elements in PDF documents. In some cases, you may need to extract data from PDF tables for further analysis. In this article, you will learn how to do this programmatically in Java using Spire.PDF for Java.

Install Spire.PDF for Java

First of all, you're required to add the Spire.PDF.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.12.16</version>
    </dependency>
</dependencies>

Extract Table Data from PDF Document

Spire.PDF for Java uses the PdfTableExtractor.extractTable(int pageIndex) method to detect and extract tables from a desired PDF page.

The following are the steps to extract table data from a PDF file:

  • Load a sample PDF document using the PdfDocument class.
  • Create a StringBuilder instance and a PdfTableExtractor instance.
  • Loop through the pages in the PDF and extract the tables from each page into a PdfTable array using the PdfTableExtractor.extractTable(int pageIndex) method.
  • Loop through the tables in the array.
  • Loop through the rows and columns in each table, extract the data from each table cell using the PdfTable.getText(int rowIndex, int columnIndex) method, and append the data to the StringBuilder instance using the StringBuilder.append() method.
  • Write the extracted data to a .txt document using the Writer.write() method.
import com.spire.pdf.PdfDocument;
import com.spire.pdf.utilities.PdfTable;
import com.spire.pdf.utilities.PdfTableExtractor;

import java.io.FileWriter;

public class ExtractTableData {
    public static void main(String []args) throws Exception {

        //Load a sample PDF document
        PdfDocument pdf = new PdfDocument("Sample.pdf");

        //Create a StringBuilder instance
        StringBuilder builder = new StringBuilder();
        //Create a PdfTableExtractor instance
        PdfTableExtractor extractor = new PdfTableExtractor(pdf);

        //Loop through the pages in the PDF
        for (int pageIndex = 0; pageIndex < pdf.getPages().getCount(); pageIndex++) {
            //Extract tables from the current page into a PdfTable array
            PdfTable[] tableLists = extractor.extractTable(pageIndex);
            
            //If any tables are found
            if (tableLists != null && tableLists.length > 0) {
                //Loop through the tables in the array
                for (PdfTable table : tableLists) {
                    //Loop through the rows in the current table
                    for (int i = 0; i < table.getRowCount(); i++) {
                        //Loop through the columns in the current table
                        for (int j = 0; j < table.getColumnCount(); j++) {
                            //Extract data from the current table cell and append to the StringBuilder 
                            String text = table.getText(i, j);
                            builder.append(text + " | ");
                        }
                        builder.append("\r\n");
                    }
                }
            }
        }

        //Write data into a .txt document
        FileWriter fw = new FileWriter("ExtractTable.txt");
        fw.write(builder.toString());
        fw.flush();
        fw.close();
    }
}
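
The example above joins cells with " | ", which leaves a trailing separator on each row and becomes ambiguous if a cell itself contains that sequence. If the extracted data is meant for further analysis, CSV output may be easier to load elsewhere. The following is a minimal sketch using only the Java standard library, not a Spire.PDF feature: CsvRows, escape and toCsv are hypothetical names, and the String[][] input stands in for the cell values that would be read with PdfTable.getText(i, j).

```java
class CsvRows {

    // Quotes a cell when it contains a comma, a double quote or a line break,
    // doubling any embedded double quotes (the usual CSV convention).
    static String escape(String cell) {
        if (cell == null) {
            return "";
        }
        boolean needsQuotes = cell.contains(",") || cell.contains("\"")
                || cell.contains("\n") || cell.contains("\r");
        String escaped = cell.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Joins the cells of each row with commas and ends each row with CRLF.
    static String toCsv(String[][] rows) {
        StringBuilder builder = new StringBuilder();
        for (String[] row : rows) {
            for (int j = 0; j < row.length; j++) {
                if (j > 0) {
                    builder.append(',');
                }
                builder.append(escape(row[j]));
            }
            builder.append("\r\n");
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // Sample data standing in for cells extracted from a PDF table
        String[][] rows = {
                {"Name", "Note"},
                {"Smith, J", "said \"hi\""}
        };
        System.out.print(toCsv(rows));
    }
}
```

To use this with the extraction loop, each table row could first be collected into a String[] and the resulting text written to a .csv file instead of a .txt file.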
