Bookmarks in a PDF function as an interactive table of contents, letting users jump quickly to specific sections of the document. Extracting them gives an overview of the document's structure and points to its key sections. In this article, you will learn how to extract PDF bookmarks in Java using Spire.PDF for Java.
Install Spire.PDF for Java
First, add the Spire.Pdf.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can import the JAR file into your application by adding the following code to your project's pom.xml file.
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.11.11</version>
    </dependency>
</dependencies>
Extract Bookmarks from PDF in Java
With Spire.PDF for Java, you can create custom methods GetBookmarks() and GetChildBookmark() to get the title and text styles of both parent and child bookmarks in a PDF file, then export them to a TXT file. The following are the detailed steps.
- Create a PdfDocument instance.
- Load a PDF file using the PdfDocument.loadFromFile() method.
- Get the bookmarks collection of the PDF file using the PdfDocument.getBookmarks() method.
- Call the custom methods GetBookmarks() and GetChildBookmark() to get the title and text style of parent and child bookmarks.
- Export the extracted PDF bookmarks to a TXT file.
- Java
import com.spire.pdf.*;
import com.spire.pdf.bookmarks.*;
import java.io.*;

public class getAllPdfBookmarks {
    public static void main(String[] args) throws IOException {
        //Create a PdfDocument instance
        PdfDocument pdf = new PdfDocument();
        //Load a PDF file
        pdf.loadFromFile("AnnualReport.pdf");
        //Get the bookmarks collection of the PDF file
        PdfBookmarkCollection bookmarks = pdf.getBookmarks();
        //Get the contents of the bookmarks and save them to a TXT file
        GetBookmarks(bookmarks, "GetPdfBookmarks.txt");
    }

    private static void GetBookmarks(PdfBookmarkCollection bookmarks, String result) throws IOException {
        //Create a StringBuilder instance
        StringBuilder content = new StringBuilder();
        //Get parent bookmarks information
        if (bookmarks.getCount() > 0) {
            //End the heading with a line break so the first title starts on its own line
            content.append("Pdf bookmarks:" + "\r\n");
            for (int i = 0; i < bookmarks.getCount(); i++) {
                PdfBookmark parentBookmark = bookmarks.get(i);
                content.append(parentBookmark.getTitle() + "\r\n");
                //Get the text style
                String textStyle = parentBookmark.getDisplayStyle().toString();
                content.append(textStyle + "\r\n");
                //Recurse into the children of this bookmark
                GetChildBookmark(parentBookmark, content);
            }
        }
        writeStringToTxt(content.toString(), result);
    }

    private static void GetChildBookmark(PdfBookmark parentBookmark, StringBuilder content) {
        //Get child bookmarks information
        if (parentBookmark.getCount() > 0) {
            content.append("Pdf bookmarks:" + "\r\n");
            for (int i = 0; i < parentBookmark.getCount(); i++) {
                PdfBookmark childBookmark = parentBookmark.get(i);
                content.append(childBookmark.getTitle() + "\r\n");
                //Get the text style
                String textStyle = childBookmark.getDisplayStyle().toString();
                content.append(textStyle + "\r\n");
                //Recurse so that bookmarks at any nesting depth are included
                GetChildBookmark(childBookmark, content);
            }
        }
    }

    public static void writeStringToTxt(String content, String txtFileName) throws IOException {
        //Open the file in append mode; try-with-resources flushes and closes the writer
        try (FileWriter fWriter = new FileWriter(txtFileName, true)) {
            fWriter.write(content);
        }
    }
}

Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Excel comments often carry useful context: text notes, instructions, or even embedded images that can support data analysis and reporting tasks. In this article, we will demonstrate how to extract text and images from comments in Excel files in Python using Spire.XLS for Python.
Install Spire.XLS for Python
This scenario requires Spire.XLS for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.
pip install Spire.XLS
If you are unsure how to install, please refer to this tutorial: How to Install Spire.XLS for Python on Windows
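Because the scenario depends on a specific plum-dispatch version, it can help to confirm what is actually installed before running the examples. The following is a minimal sketch using only the Python standard library. The helper name installed_version is hypothetical, and the names passed to it are assumed to match the distribution names that pip reports.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Distribution names below are assumptions based on the pip command above
for name in ("Spire.XLS", "plum-dispatch"):
    print(name, "->", installed_version(name) or "not installed")
```

If plum-dispatch reports a version other than 1.7.4, reinstalling it with an explicit version pin is one way to match the stated requirement.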
Extract Text from Comments in Excel in Python
You can get the text of comments using the ExcelCommentObject.Text property. The detailed steps are as follows.
- Create an object of the Workbook class.
- Load an Excel file using Workbook.LoadFromFile() method.
- Create a list to store the extracted comment text.
- Get the comments in the worksheet using Worksheet.Comments property.
- Traverse through the comments.
- Get the text of each comment using ExcelCommentObject.Text property and append it to the list.
- Save the content of the list to a text file.
- Python
from spire.xls import *
from spire.xls.common import *
# Create a Workbook object
workbook = Workbook()
# Load an Excel file
workbook.LoadFromFile("Comments.xlsx")
# Get the first worksheet
worksheet = workbook.Worksheets[0]
# Create a list to store the comment text
comment_text = []
# Get all the comments in the worksheet
comments = worksheet.Comments
# Extract the text from each comment and add it to the list
for i, comment in enumerate(comments, start=1):
    comment_text.append(f"Comment {i}:")
    text = comment.Text
    comment_text.append(text)
    comment_text.append("")
# Write the comment text to a file
with open("comments.txt", "w", encoding="utf-8") as file:
    file.write("\n".join(comment_text))

Extract Images from Comments in Excel in Python
To get the images embedded in Excel comments, you can use the ExcelCommentObject.Fill.Picture property. The detailed steps are as follows.
- Create an object of the Workbook class.
- Load an Excel file using Workbook.LoadFromFile() method.
- Get a specific comment in the worksheet using Worksheet.Comments[index] property.
- Get the embedded image in the comment using ExcelCommentObject.Fill.Picture property.
- Save the image to an image file.
- Python
from spire.xls import *
from spire.xls.common import *
# Create a Workbook object
workbook = Workbook()
# Load an Excel file
workbook.LoadFromFile("ImageComment.xlsx")
# Get the first worksheet
worksheet = workbook.Worksheets[0]
# Get a specific comment in the worksheet
comment = worksheet.Comments[0]
# Extract the image from the comment and save it to an image file
image = comment.Fill.Picture
image.Save("CommentImage/Comment.png")
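The example above saves the image into a CommentImage folder. Whether Save() creates a missing folder is not stated here, so a cautious approach is to create the folder before saving. A minimal sketch with the standard library follows; the helper name ensure_parent_dir is hypothetical and not part of Spire.XLS.

```python
import os


def ensure_parent_dir(file_path):
    """Create the parent folder of file_path if needed and return file_path unchanged."""
    parent = os.path.dirname(file_path)
    if parent:  # an empty string means the current folder, so there is nothing to create
        os.makedirs(parent, exist_ok=True)
    return file_path


# Possible use with the example above:
# image.Save(ensure_parent_dir("CommentImage/Comment.png"))
```

Returning the path unchanged lets the helper wrap the argument to Save() without any other change to the script.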

Excel has been a widely used tool for data organization and analysis for many years. Over time, Microsoft has introduced different file formats for storing Excel data, the most common being the older XLS format and the more modern XLSX format.
The XLS format, the default through Excel 2003, has notable limits: each worksheet holds at most 65,536 rows and 256 columns, and a workbook supports roughly 4,000 unique cell formats. The XLSX format, introduced with Excel 2007, raised the worksheet limit to 1,048,576 rows and 16,384 columns and expanded the style capacity. While XLSX is now the standard format, many existing XLS files still need to be accessed and used, which makes converting between the two formats a common task. In this article, we will explain how to convert Excel XLS to XLSX and vice versa in Python using Spire.XLS for Python.
Install Spire.XLS for Python
This scenario requires Spire.XLS for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.
pip install Spire.XLS
If you are unsure how to install, please refer to this tutorial: How to Install Spire.XLS for Python on Windows
Convert XLSX to XLS in Python
To convert an XLSX file to XLS format, you can use the Workbook.SaveToFile(fileName, ExcelVersion.Version97to2003) method. The ExcelVersion.Version97to2003 parameter specifies that the workbook should be saved in the Excel 97-2003 (XLS) format. The detailed steps are as follows.
- Create an object of the Workbook class.
- Load an XLSX file using the Workbook.LoadFromFile() method.
- Save the XLSX file to XLS format using the Workbook.SaveToFile(fileName, ExcelVersion.Version97to2003) method.
- Python
from spire.xls import *
from spire.xls.common import *
# Specify the input and output file paths
inputFile = "Sample1.xlsx"
outputFile = "XlsxToXls.xls"
# Create a Workbook object
workbook = Workbook()
# Load the XLSX file
workbook.LoadFromFile(inputFile)
# Save the XLSX file to XLS format
workbook.SaveToFile(outputFile, ExcelVersion.Version97to2003)
workbook.Dispose()
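Because an XLS worksheet holds at most 65,536 rows and 256 columns, data beyond those bounds cannot be represented in the older format. How the library treats such a workbook is not covered here, so checking the used range before saving may be prudent. The following is a minimal sketch of that check in plain Python. The function name and the idea of passing in row and column counts are assumptions for illustration, not part of Spire.XLS.

```python
# Worksheet limits of the XLS format
XLS_MAX_ROWS = 65_536
XLS_MAX_COLUMNS = 256


def fits_in_xls(row_count, column_count):
    """Return True if a sheet with this many used rows and columns fits the XLS limits."""
    return row_count <= XLS_MAX_ROWS and column_count <= XLS_MAX_COLUMNS


print(fits_in_xls(50_000, 40))   # True
print(fits_in_xls(70_000, 40))   # False: too many rows
print(fits_in_xls(1_000, 300))   # False: too many columns
```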

Convert XLS to XLSX in Python
To convert an XLS file to XLSX format, set the target Excel version to one later than 97-2003, such as 2007 (ExcelVersion.Version2007), 2010 (ExcelVersion.Version2010), 2013 (ExcelVersion.Version2013), or 2016 (ExcelVersion.Version2016). The detailed steps are as follows.
- Create an object of the Workbook class.
- Load an XLS file using the Workbook.LoadFromFile() method.
- Save the XLS file to an Excel 2016 (XLSX) file using the Workbook.SaveToFile(fileName, ExcelVersion.Version2016) method.
- Python
from spire.xls import *
from spire.xls.common import *
# Specify the input and output file paths
inputFile = "Sample2.xls"
outputFile = "XlsToXlsx.xlsx"
# Create a Workbook object
workbook = Workbook()
# Load the XLS file
workbook.LoadFromFile(inputFile)
# Save the XLS file to XLSX format
workbook.SaveToFile(outputFile, ExcelVersion.Version2016)
workbook.Dispose()
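The two conversions differ only in the output extension and the ExcelVersion value, so one helper can cover both directions. The sketch below is one possible arrangement, not the library's own approach. The names plan_conversion and convert are hypothetical, the output file simply reuses the input name with the other extension, and accessing Workbook and ExcelVersion as attributes of the spire.xls module is an assumption based on the star imports used in this article.

```python
from pathlib import Path

# Maps a source extension to (target extension, ExcelVersion member name)
_TARGETS = {
    ".xlsx": (".xls", "Version97to2003"),
    ".xls": (".xlsx", "Version2016"),
}


def plan_conversion(input_file):
    """Return (output file name, ExcelVersion member name) for an input file."""
    path = Path(input_file)
    try:
        suffix, version_name = _TARGETS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"Unsupported extension: {path.suffix!r}") from None
    return str(path.with_suffix(suffix)), version_name


def convert(input_file):
    """Convert between XLS and XLSX with the calls shown in this article."""
    # Imported lazily so plan_conversion() works without the library installed.
    # Assumption: Workbook and ExcelVersion are attributes of spire.xls.
    import spire.xls as xls

    output_file, version_name = plan_conversion(input_file)
    workbook = xls.Workbook()
    try:
        workbook.LoadFromFile(input_file)
        workbook.SaveToFile(output_file, getattr(xls.ExcelVersion, version_name))
    finally:
        # Release the workbook even if loading or saving fails
        workbook.Dispose()
    return output_file


print(plan_conversion("Sample1.xlsx"))  # ('Sample1.xls', 'Version97to2003')
print(plan_conversion("Sample2.xls"))   # ('Sample2.xlsx', 'Version2016')
```

Keeping the extension logic in plan_conversion() separate from the library calls makes it easy to check the planned output names before any file is written.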
