Thursday, 22 August 2024 01:04

Python: Add Annotations to PDF Documents

Adding annotations to PDFs is a common practice for adding comments, highlighting text, drawing shapes, and more. This feature is beneficial for collaborative document review, education, and professional presentations. It allows users to mark up documents digitally, enhancing communication and productivity.

In this article, you will learn how to add various types of annotations to a PDF document in Python using Spire.PDF for Python.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows

Add a Text Markup Annotation to PDF in Python

Text markup in PDF refers to the ability to emphasize important text by selecting and highlighting it. To add a text markup annotation to a PDF, you first need to locate the specific text within the document with the help of the PdfTextFinder class. Once the text is identified, you can create a PdfTextMarkupAnnotation object and apply it to the document.

The following are the steps to add a text markup annotation to PDF using Python:

  • Create a PdfDocument object.
  • Load a PDF file from the specified location.
  • Get a page from the document.
  • Find a specific piece of text within the page using PdfTextFinder class.
  • Create a PdfTextMarkupAnnotation object based on the text found.
  • Add the annotation to the page using PdfPageBase.AnnotationsWidget.Add() method.
  • Save the modified document to a different PDF file.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf")

# Get a specific page
page = doc.Pages.get_Item(0)

# Create a PdfTextFinder object based on the page
finder = PdfTextFinder(page)

# Set the find options
finder.Options.Parameter = TextFindParameter.WholeWord
finder.Options.Parameter = TextFindParameter.IgnoreCase

# Find the instances of the specified text
fragments = finder.Find("However, we cannot and do not guarantee that these measures will prevent "+
                        "every unauthorized attempt to access, use, or disclose your information since "+
                         "despite our efforts, no Internet and/or other electronic transmissions can be completely secure.");

# Get the first instance
textFragment = fragments[0]

# Specify annotation text
text = "Here is a markup annotation."

# Iterate through the text bounds
for i in range(len(textFragment.Bounds)):
    
    # Get a specific bound
    rect = textFragment.Bounds[i]

    # Create a text markup annotation
    annotation = PdfTextMarkupAnnotation("Administrator", text, rect)

    # Set the markup color
    annotation.TextMarkupColor = PdfRGBColor(Color.get_Green())

    # Add the annotation to the collection of the annotations
    page.AnnotationsWidget.Add(annotation)

# Save result to file
doc.SaveToFile("output/MarkupAnnotation.pdf")

# Dispose resources
doc.Dispose()

Python: Add Annotations to PDF Documents

Add a Free Text Annotation to PDF in Python

Free text annotations allow adding freeform text comments directly on a PDF. To add a free text annotation at a specific location, you can use the PdfTextFinder class to obtain the coordinate information of the searched text, then create a PdfFreeTextAnnotation object based on that coordinates and add it to the document.

The following are the steps to add a free text annotation to PDF using Python:

  • Create a PdfDocument object.
  • Load a PDF file from the specified location.
  • Get a page from the document.
  • Find a specific piece of text within the page using PdfTextFinder class.
  • Create a PdfFreeTextAnnotation object based on the coordinate information of the text found.
  • Set the annotation content using PdfFreeTextAnnotation.Text property.
  • Add the annotation to the page using PdfPageBase.AnnotationsWidget.Add() method.
  • Save the modified document to a different PDF file.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf")

# Get a specific page
page = doc.Pages.get_Item(0)

# Create a PdfTextFinder object based on the page
finder = PdfTextFinder(page)

# Set the find options
finder.Options.Parameter = TextFindParameter.WholeWord
finder.Options.Parameter = TextFindParameter.IgnoreCase

# Find the instances of the specified text
fragments = finder.Find("Children");

# Get the first instance
textFragment = fragments[0]

# Get the text bound
rect = textFragment.Bounds[0]

# Get the coordinates to add annotation
right = bound.Right
top = bound.Top

# Create a free text annotation
rectangle = RectangleF(right + 5, top + 2, 160.0, 18.0)
textAnnotation = PdfFreeTextAnnotation(rectangle)

# Set the content of the annotation
textAnnotation.Text = "Here is a free text annotation."

# Set other properties of annotation
font = PdfFont(PdfFontFamily.TimesRoman, 13.0, PdfFontStyle.Regular)
border = PdfAnnotationBorder(1.0)
textAnnotation.Font = font
textAnnotation.Border = border
textAnnotation.BorderColor = PdfRGBColor(Color.get_SkyBlue())
textAnnotation.Color = PdfRGBColor(Color.get_LightBlue())
textAnnotation.Opacity = 1.0

# Add the annotation to the collection of the annotations
page.AnnotationsWidget.Add(textAnnotation)

# Save result to file
doc.SaveToFile("output/FreeTextAnnotation.pdf")

# Dispose resources
doc.Dispose()

Python: Add Annotations to PDF Documents

Add a Popup Annotation to PDF in Python

A popup annotation allows for the display of additional information or content in a pop-up window. To get a specific position to add a popup annotation, you can still use the PdfTextFinder class. Once the coordinate information is obtained, you can create a PdfPopupAnnotation object and add it to the document.

The steps to add a popup annotation to PDF using Python are as follows:

  • Create a PdfDocument object.
  • Load a PDF file from the specified location.
  • Get a page from the document.
  • Find a specific piece of text within the page using PdfTextFinder class.
  • Create a PdfPopupAnnotation object based on the coordinate information of the text found.
  • Set the annotation content using PdfPopupAnnotation.Text property.
  • Add the annotation to the page using PdfPageBase.AnnotationsWidget.Add() method.
  • Save the modified document to a different PDF file.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf")

# Get a specific page
page = doc.Pages.get_Item(0)

# Create a PdfTextFinder object based on the page
finder = PdfTextFinder(page)

# Set the find options
finder.Options.Parameter = TextFindParameter.WholeWord
finder.Options.Parameter = TextFindParameter.IgnoreCase

# Find the instances of the specified text
fragments = finder.Find("Children");

# Get the first instance
textFragment = fragments[0]

# Get the text bound
bound = textFragment.Bounds[0]

# Get the coordinates to add annotation
right = bound.Right
top = bound.Top

# Create a free text annotation
rectangle = RectangleF(right + 5, top, 30.0, 30.0)
popupAnnotation = PdfPopupAnnotation(rectangle)

# Set the content of the annotation
popupAnnotation.Text = "Here is a popup annotation."

# Set the icon and color of the annotation
popupAnnotation.Icon = PdfPopupIcon.Comment
popupAnnotation.Color = PdfRGBColor(Color.get_Red())

# Add the annotation to the collection of the annotations
page.AnnotationsWidget.Add(popupAnnotation)

# Save result to file
doc.SaveToFile("output/PopupAnnotation.pdf")

# Dispose resources
doc.Dispose()

Python: Add Annotations to PDF Documents

Add a Stamp Annotation to PDF in Python

Stamp annotations in PDF documents are a type of annotation that allow users to add custom "stamps" or symbols to a PDF file. To define the appearance of a stamp annotation, use the PdfTemplate class. Then, create a PdfRubberStampAnnotation object and apply the previously defined template as its appearance. Finally, add the annotation to the PDF document.

The steps to add a stamp annotation to PDF using Python are as follows:

  • Create a PdfDocument object.
  • Load a PDF file from the specified location.
  • Get a page from the document.
  • Create a PdfTemplate object and draw an image on the template.
  • Create a PdfRubberStampAnnotation object and apply the template as its appearance.
  • Add the annotation to the page using PdfPageBase.AnnotationsWidget.Add() method.
  • Save the modified document to a different PDF file.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF document
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf")

# Get a specific page
page = doc.Pages.get_Item(0)

# Load an image file
image = PdfImage.FromFile("C:\\Users\\Administrator\\Desktop\\confidential.png")

# Get the width and height of the image
width = (float)(image.Width)
height = (float)(image.Height)

# Create a PdfTemplate object based on the size of the image
template = PdfTemplate(width, height, True)

# Draw image on the template
template.Graphics.DrawImage(image, 0.0, 0.0, width, height)

# Create a rubber stamp annotation, specifying its location and position
rect = RectangleF((float) (page.ActualSize.Width/2 - width/2), 90.0, width, height)
stamp = PdfRubberStampAnnotation(rect)

# Create a PdfAppearance object
pdfAppearance = PdfAppearance(stamp)

# Set the template as the normal state of the appearance
pdfAppearance.Normal = template

# Apply the appearance to the stamp
stamp.Appearance = pdfAppearance

# Add the stamp annotation to PDF
page.AnnotationsWidget.Add(stamp)

# Save the file
doc.SaveToFile("output/StampAnnotation.pdf")

# Dispose resources
doc.Dispose()

Python: Add Annotations to PDF Documents

Add a Shape Annotation to PDF in Python

Shape annotations in PDF documents are a type of annotation that allow users to add various geometric shapes to the PDF file. Spire.PDF for Python offers the classes such as PdfPolyLineAnnotation, PdfLineAnnotation, and PdfPolygonAnnotation, allowing developers to add different types of shape annotations to PDF.

The steps to add a shape annotation to PDF using Python are as follows:

  • Create a PdfDocument object.
  • Load a PDF file from the specified location.
  • Get a page from the document.
  • Find a specific piece of text within the page using PdfTextFinder class.
  • Create a PdfPolyLineAnnotation object based on the coordinate information of the text found.
  • Set the annotation content using PdfPolyLineAnnotation.Text property.
  • Add the annotation to the page using PdfPageBase.AnnotationsWidget.Add() method.
  • Save the modified document to a different PDF file.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf")

# Get a specific page
page = doc.Pages.get_Item(0)

# Create a PdfTextFinder object based on the page
finder = PdfTextFinder(page)

# Set the find options
finder.Options.Parameter = TextFindParameter.WholeWord
finder.Options.Parameter = TextFindParameter.IgnoreCase

# Find the instances of the specified text
fragments = finder.Find("Children");

# Get the first instance
textFragment = fragments[0]

# Get the text bound
bound = textFragment.Bounds[0]

# Get the coordinates to add annotation
left = bound.Left
top = bound.Top
right = bound.Right
bottom = bound.Bottom

# Create a shape annotation
polyLineAnnotation = PdfPolyLineAnnotation(page, [PointF(left, top), PointF(right, top), PointF(right - 5, bottom), PointF(left - 5, bottom), PointF(left, top)])

# Set the annotation text
polyLineAnnotation.Text = "Here is a shape annotation."

# Add the annotation to the collection of the annotations
page.AnnotationsWidget.Add(polyLineAnnotation)

# Save result to file
doc.SaveToFile("output/ShapeAnnotation.pdf")

# Dispose resources
doc.Dispose()

Python: Add Annotations to PDF Documents

Add a Web Link Annotation to PDF in Python

Web link annotations in PDF documents enable users to embed clickable links to webpages, providing easy access to additional resources or related information online. You can use the PdfTextFinder class to find the specified text in a PDF document, and then create a PdfUriAnnotation object based on that text.

The following are the steps to add a web link annotation to PDF using Python:

  • Create a PdfDocument object.
  • Load a PDF file from the specified location.
  • Get a page from the document.
  • Find a specific piece of text within the page using PdfTextFinder class.
  • Create a PdfUriAnnotation object based on the bound of the text.
  • Add the annotation to the page using PdfPageBase.AnnotationsWidget.Add() method.
  • Save the modified document to a different PDF file.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf")

# Get a specific page
page = doc.Pages.get_Item(0)

# Create a PdfTextFinder object based on the page
finder = PdfTextFinder(page)

# Set the find options
finder.Options.Parameter = TextFindParameter.WholeWord
finder.Options.Parameter = TextFindParameter.IgnoreCase

# Find the instances of the specified text
fragments = finder.Find("Children");

# Get the first instance
textFragment = fragments[0]

# Get the text bound
bound = textFragment.Bounds[0]

# Create a Url annotation
urlAnnotation = PdfUriAnnotation(bound, "https://www.e-iceblue.com/");

# Add the annotation to the collection of the annotations
page.AnnotationsWidget.Add(urlAnnotation)

# Save result to file
doc.SaveToFile("output/WebLinkAnnotation.pdf")

# Dispose resources
doc.Dispose()

Python: Add Annotations to PDF Documents

Add a File Link Annotation to PDF in Python

A file link annotation in a PDF document is a type of annotation that allows users to create a clickable link to an external file, such as another PDF, image, or document. Still, you can use the PdfTextFinder class to find the specified text in a PDF document, and then create a PdfFileLinkAnnotation object based on that text.

The following are the steps to add a file link annotation to PDF using Python:

  • Create a PdfDocument object.
  • Load a PDF file from the specified location.
  • Get a page from the document.
  • Find a specific piece of text within the page using PdfTextFinder class.
  • Create a PdfFileLinkAnnotation object based on the bound of the text.
  • Add the annotation to the page using PdfPageBase.AnnotationsWidget.Add() method.
  • Save the modified document to a different PDF file.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf")

# Get a specific page
page = doc.Pages.get_Item(0)

# Create a PdfTextFinder object based on the page
finder = PdfTextFinder(page)

# Set the find options
finder.Options.Parameter = TextFindParameter.WholeWord
finder.Options.Parameter = TextFindParameter.IgnoreCase

# Find the instances of the specified text
fragments = finder.Find("Children");

# Get the first instance
textFragment = fragments[0]

# Get the text bound
bound = textFragment.Bounds[0]

# Create a file link annotation
fileLinkAnnotation = PdfFileLinkAnnotation(bound, "C:\\Users\\Administrator\\Desktop\\Report.docx")

# Add the annotation to the collection of the annotations
page.AnnotationsWidget.Add(fileLinkAnnotation)

# Save result to file
doc.SaveToFile("output/FileLinkAnnotation.pdf")

# Dispose resources
doc.Dispose()

Python: Add Annotations to PDF Documents

Add a Document Link Annotation to PDF in Python

A document link annotation in a PDF document is a type of annotation that creates a clickable link to a specific location within the same PDF file. To create a document link annotation, you can first use the PdfTextFinder class to identify the target text in the PDF document. Then, a PdfDocumentLinkAnnotation object is instantiated, and its destination property is configured to redirect the user to the specified location in the PDF.

The steps to add a document link annotation to PDF using Python are as follows:

  • Create a PdfDocument object.
  • Load a PDF file from the specified location.
  • Get a page from the document.
  • Find a specific piece of text within the page using PdfTextFinder class.
  • Create a PdfDocumentLinkAnnotation object based on the bound of the text.
  • Set the destination of the annotation using PdfDocumentLinkAnnotation.Destination property.
  • Add the annotation to the page using PdfPageBase.AnnotationsWidget.Add() method.
  • Save the modified document to a different PDF file.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf")

# Get a specific page
page = doc.Pages.get_Item(0)

# Create a PdfTextFinder object based on the page
finder = PdfTextFinder(page)

# Set the find options
finder.Options.Parameter = TextFindParameter.WholeWord
finder.Options.Parameter = TextFindParameter.IgnoreCase

# Find the instances of the specified text
fragments = finder.Find("Children");

# Get the first instance
textFragment = fragments[0]

# Get the text bound
bound = textFragment.Bounds[0]

# Create a document link annotation
documentLinkAnnotation = PdfDocumentLinkAnnotation(bound)
     
# Set the destination of the annotation
documentLinkAnnotation.Destination = PdfDestination(doc.Pages[1]);

# Add the annotation to the collection of the annotations
page.AnnotationsWidget.Add(documentLinkAnnotation)

# Save result to file
doc.SaveToFile("output/DocumentLinkAnnotation.pdf")

# Dispose resources
doc.Dispose()

Python: Add Annotations to PDF Documents

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Setting the column width in Word tables is crucial for optimizing document readability and aesthetics. Appropriate column widths prevent long lines of text from hindering readability, particularly in text-dense tables. Word offers two approaches: percentages and fixed values. Setting column widths using percentage values allows tables to adapt to different screen sizes, keeping content neatly aligned and enhancing the reading experience. Using fixed values, on the other hand, precisely controls the structure of the table, ensuring consistency and professionalism, making it suitable for designs with strict data alignment requirements or complex layouts. This article will introduce how to set Word table column width based on percentage or fixed value settings using Spire.Doc for .NET in C# projects.

Install Spire.Doc for .NET

To begin with, you need to add the DLL files included in the Spire.Doc for .NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.

PM> Install-Package Spire.Doc

Set Column Width Based on Percentage in C#

When setting column widths in a Word table using percentage values, you first need to set the table's preferred width type to percentage. This is done with Table.PreferredWidth = new PreferredWidth(WidthType.Percentage, (short)100). Then, iterate through each column and set the width to the same or different percentage values as required. Here are the detailed steps:

  • Create a Document object.
  • Load a document using the Document.LoadFromFile() method.
  • Retrieve the first section of the document using Document.Sections[0].
  • Get the first table within the section using Section.Tables[0].
  • Use a for loop to iterate through all rows in the table.
  • Set the column width for cells in different columns to percentage values using the TableRow.Cells[index].SetCellWidth(value, CellWidthType.Percentage) method.
  • Save the changes to the Word document using the Document.SaveToFile() method.
  • C#
using Spire.Doc;

namespace SpireDocDemo
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create a new Document object
            Document doc = new Document();

            // Load the document named "example.docx"
            doc.LoadFromFile("Sample.docx");

            // Get the first Section of the document
            Section section = doc.Sections[0];

            // Cast the first Table in the Section to Table type
            Table table = (Table)section.Tables[0];

            // Create a PreferredWidth object, set the width type to Percentage, and set the width value to 100%
            PreferredWidth percentageWidth = new PreferredWidth(WidthType.Percentage, (short)100);

            // Set the preferred width of the Table to the PreferredWidth object created above
            table.PreferredWidth = percentageWidth;

            // Define a variable of type TableRow
            TableRow tableRow;

            // Iterate through all rows in the Table
            for (int i = 0; i < table.Rows.Count; i++)
            {
                // Get the current row
                tableRow = table.Rows[i];

                // Set the width of the first column cell to 34%, with the type as Percentage
                tableRow.Cells[0].SetCellWidth(34, CellWidthType.Percentage);

                // Set the width of the second column cell to 33%, with the type as Percentage
                tableRow.Cells[1].SetCellWidth(33, CellWidthType.Percentage);

                // Set the width of the third column cell to 33%, with the type as Percentage
                tableRow.Cells[2].SetCellWidth(33, CellWidthType.Percentage);
            }

            // Save the modified document, specifying the file format as Docx2016
            doc.SaveToFile("set_column_width_by_percentage.docx", FileFormat.Docx2016);

            // Close the document
            doc.Close();
        }
    }
}

C#: Set the Column Width of a Word Table

Set Column Width Based on Fixed Value in C#

When setting column widths in a Word table using fixed values, you first need to set the table's layout to fixed. This is done with Table.TableFormat.LayoutType = LayoutType.Fixed. Then, iterate through each column and set the width to the same or different fixed values as required. Here are the detailed steps:

  • Create a Document object.
  • Load a document using the Document.LoadFromFile() method.
  • Retrieve the first section of the document using Document.Sections[0].
  • Get the first table within the section using Section.Tables[0].
  • Use a for loop to iterate through all rows in the table.
  • Set the column width for cells in different columns to fixed values using the TableRow.Cells[index].SetCellWidth(value, CellWidthType.Point) method. Note that value should be replaced with the desired width in points.
  • Save the changes to the Word document using the Document.SaveToFile() method.
  • C#
using Spire.Doc;

namespace SpireDocDemo
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create a new Document object
            Document doc = new Document();

            // Load the document
            doc.LoadFromFile("Sample.docx");

            // Get the first Section of the document
            Section section = doc.Sections[0];

            // Cast the first Table in the Section to Table type
            Table table = (Table)section.Tables[0];

            // Set the table layout type to Fixed
            table.Format.LayoutType = LayoutType.Fixed;

            // Get the left margin
            float leftMargin = section.PageSetup.Margins.Left;

            // Get the right margin
            float rightMargin = section.PageSetup.Margins.Right;

            // Calculate the page width minus the left and right margins
            double pageWidth = section.PageSetup.PageSize.Width - leftMargin - rightMargin;

            // Define a variable of type TableRow
            TableRow tableRow;

            // Loop through all rows in the Table
            for (int i = 0; i < table.Rows.Count; i++)
            {
                // Get the current row
                tableRow = table.Rows[i];

                // Set the width of the first column cell to 34% of the page width
                tableRow.Cells[0].SetCellWidth((float)(pageWidth * 0.34), CellWidthType.Point);

                // Set the width of the second column cell to 33% of the page width
                tableRow.Cells[1].SetCellWidth((float)(pageWidth * 0.33), CellWidthType.Point);

                // Set the width of the third column cell to 33% of the page width
                tableRow.Cells[2].SetCellWidth((float)(pageWidth * 0.33), CellWidthType.Point);
            }

            // Save the modified document, specifying the file format as Docx2016
            doc.SaveToFile("set_column_width_to_fixed_value.docx", FileFormat.Docx2016);

            // Close the document
            doc.Close();
        }
    }
}

C#: Set the Column Width of a Word Table

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Spire.OCR for .NET offers developers a new model to extract text from images. In this article, we will demonstrate how to extract text from images in C# using the new model of Spire.OCR for .NET.

The detailed steps are as follows.

Step 1: Create a Console App (.NET Framework) in Visual Studio.

C#: Extract Text from Images using the New Model of Spire.OCR for .NET

C#: Extract Text from Images using the New Model of Spire.OCR for .NET

Step 2: Change the platform target of the application to x64.

In the application's Solution Explorer, right-click on the project's name and then select "Properties".

C#: Extract Text from Images using the New Model of Spire.OCR for .NET

Change the platform target of the application to x64. This step must be performed since Spire.OCR only supports 64-bit platforms.

C#: Extract Text from Images using the New Model of Spire.OCR for .NET

Step 3: Install Spire.OCR for .NET in your application.

Install Spire.OCR for .NET through NuGet by executing the following command in the NuGet Package Manager Console:

Install-Package Spire.OCR

C#: Extract Text from Images using the New Model of Spire.OCR for .NET

Step 4: Download the new model of Spire.OCR for .NET.

Download the model that fits in with your operating system from one of the following links.

Windows x64

Linux x64

macOS 10.15 and later

Linux aarch

Then extract the package and save it to a specific directory on your computer. In this example, we saved the package to "D:\".

C#: Extract Text from Images using the New Model of Spire.OCR for .NET

Step 5: Use the new model of Spire.OCR for .NET to extract text from images in C#.

The following code example shows how to extract text from an image using C# and the new model of Spire.OCR for .NET:

  • C#
using Spire.OCR;
using System.IO;

namespace NewOCRModel
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Set license key
            // Spire.OCR.License.LicenseProvider.SetLicenseKey("your-license-key");

            // Create an instance of the OcrScanner class
            OcrScanner scanner = new OcrScanner();

            // Create an instance of the ConfigureOptions class to set up the scanner configuration
            ConfigureOptions configureOptions = new ConfigureOptions();

            // Set the path to the new model
            configureOptions.ModelPath = "D:\\win-x64";

            // Set the language for text recognition. The default is English.
            // Supported languages include English, Chinese, Chinesetraditional, French, German, Japanese, and Korean.
            configureOptions.Language = "English";

            // Apply the configuration options to the scanner
            scanner.ConfigureDependencies(configureOptions);

            // Extract text from an image
            scanner.Scan("test.png");

            //Save the extracted text to a text file
            string text = scanner.Text.ToString();
            File.WriteAllText("Output.txt", text);
        }
    }
}

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Adding, modifying, and removing Word table borders can enhance the readability, aesthetics, and organization of data. Adding borders makes the content of the table clearer, distinguishing between different cells, which helps readers quickly identify information. Modifying border styles (such as line thickness, color, or pattern) can emphasize key data, guide visual flow, or conform to specific document styles and design requirements. Removing borders, in some cases, reduces visual clutter, making the content more compact and minimalist, especially suitable for data presentation where strict divisions are not needed or when you wish to downplay structural visibility. This article will introduce how to add, modify, or remove Word table borders in C# projects using Spire.Doc for .NET.

Install Spire.Doc for .NET

To begin with, you need to add the DLL files included in the Spire.Doc for .NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.

PM> Install-Package Spire.Doc

C# Add Word Table Borders

To set borders for all cells in an entire Word table, you need to iterate over each cell and set its visual border properties. Here are the detailed steps:

  • Create a Document object.
  • Use the Document.LoadFromFile() method to load a document.
  • Retrieve the first section of the document using Document.Sections[0].
  • Get the first table in that section by using Section.Tables[0].
  • Use a for loop to iterate through all the cells in the table.
  • Set TableCell.CellFormat.Borders.BorderType to BorderStyle.Single, which sets the cell border to a single line style.
  • Set TableCell.CellFormat.Borders.LineWidth to 1.5, defining the border width to be 1.5 points.
  • Set TableCell.CellFormat.Borders.Color to Color.Black, setting the border color to black.
  • Save the changes to the Word document using the Document.SaveToFile() method.
  • C#
using Spire.Doc;

namespace SpireDocDemo
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create a new Document object
            Document doc = new Document();

            // Load the document from a file
            doc.LoadFromFile("TableExample1.docx");

            // Get the first section of the document
            Section section = doc.Sections[0];

            // Get the first table in that section
            Table table = (Table)section.Tables[0];

            // Declare TableRow and TableCell variables for use within loops
            TableRow tableRow;
            TableCell tableCell;

            // Iterate through all rows in the table
            for (int i = 0; i < table.Rows.Count; i++)
            {
                // Get the current row
                tableRow = table.Rows[i];

                // Iterate through all cells in the current row
                for (int j = 0; j < tableRow.Cells.Count; j++)
                {
                    // Get the current cell
                    tableCell = tableRow.Cells[j];

                    // Set the border style of the current cell to single line
                    tableCell.CellFormat.Borders.BorderType = Spire.Doc.Documents.BorderStyle.Single;
                }
            }

            // Save the modified document as a new file
            doc.SaveToFile("AddBorders.docx", FileFormat.Docx2016);

            // Close the document to release resources
            doc.Close();
        }
    }
}

C#: Add, Modify, or Remove Word Table Borders

C# Modify Word Table Borders

Spire.Doc offers a range of border properties such as the border style TableCell.CellFormat.Borders.BorderType, border width TableCell.CellFormat.Borders.LineWidth, and border color TableCell.CellFormat.Borders.Color, among others. You can customize these properties to achieve the desired effects. Below are the detailed steps:

  • Create a Document object.
  • Load a document using the Document.LoadFromFile() method.
  • Retrieve the first section of the document using Document.Sections[0].
  • Get the first table in the section using Section.Tables[0].
  • Use a for loop to iterate over the cells in the table whose border styles you wish to change.
  • Change the bottom border color of the cell by setting TableCell.CellFormat.Borders.Bottom.Color to Color.PaleVioletRed.
  • Change the bottom border style of the cell by setting TableCell.CellFormat.Borders.Bottom.BorderType to BorderStyle.DotDash.
  • Change the bottom border width of the cell by setting TableCell.CellFormat.Borders.Bottom.LineWidth to 2 points.
  • Save the changes to the document using the Document.SaveToFile() method.
  • C#
using Spire.Doc;
using Spire.Doc.Documents;
using System.Drawing;

namespace SpireDocDemo
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create a new Document object
            Document doc = new Document();

            // Load the document from a file
            doc.LoadFromFile("TableExample2.docx");

            // Get the first section of the document
            Section section = doc.Sections[0];

            // Get the first table in that section
            Table table = (Table)section.Tables[0];

            // Declare a TableRow to use within the loop
            TableRow tableRow;

            // Iterate through all rows of the table
            for (int i = 1; i < table.Rows.Count - 1; i++)
            {
                tableRow = table.Rows[i];

                // Set the border color of the current cell
                tableRow.Cells[1].CellFormat.Borders.Bottom.Color = Color.PaleVioletRed;

                // Set the border style of the current cell to DotDash
                tableRow.Cells[1].CellFormat.Borders.Bottom.BorderType = Spire.Doc.Documents.BorderStyle.DotDash;

                // Set the width of the border
                tableRow.Cells[1].CellFormat.Borders.Bottom.LineWidth = 2;
            }

            // Save the modified document as a new file
            doc.SaveToFile("ModifiedBorders.docx", FileFormat.Docx2016);

            // Close the document and release resources
            doc.Close();
        }
    }
}

C#: Add, Modify, or Remove Word Table Borders

C# Remove Word Table Borders

During the process of handling Word documents, not only can border styles be applied to entire tables, but customization can also be extended to individual cells. To completely remove all borders from a table, it is recommended to follow a two-step strategy: First, apply border removal settings to the table itself; second, visit each cell within the table individually to clear their border styles. Here are the detailed steps:

  • Create a Document object.
  • Load a document using the Document.LoadFromFile() method.
  • Retrieve the first table in the section using Section.Tables[0].
  • Use a for loop to iterate over all cells in the table.
  • Set Table.TableFormat.Borders.BorderType = BorderStyle.None to remove borders from the table.
  • Set TableCell.CellFormat.Borders.BorderType = BorderStyle.None to remove borders from each cell.
  • Save the changes to the Word document using the Document.SaveToFile() method.
  • C#
using Spire.Doc;
using Spire.Doc.Documents;
using System.Drawing;

namespace SpireDocDemo
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create a new Document object
            Document doc = new Document();

            // Load the document from file
            doc.LoadFromFile("TableExample2.docx");

            // Get the first section of the document
            Section section = doc.Sections[0];

            // Get the first table in that section
            Table table = (Table)section.Tables[0];

            // Remove the borders set on the table
            table.Format.Borders.BorderType = BorderStyle.None;

            // Declare a TableRow to use in the loop
            TableRow tableRow;

            // Iterate through all rows in the table
            for (int i = 0; i < table.Rows.Count; i++)
            {
                tableRow = table.Rows[i];
                for (int j = 0; j < tableRow.Cells.Count; j++)
                {
                    // Remove all borders set on the cell
                    tableRow.Cells[j].CellFormat.Borders.BorderType = BorderStyle.None;
                }
            }

            // Save the modified document as a new file
            doc.SaveToFile("RemoveBorders.docx", FileFormat.Docx2016);

            // Close the document to release resources
            doc.Close();
        }
    }
}

C#: Add, Modify, or Remove Word Table Borders

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Tables are a powerful formatting tool in Word, allowing you to organize and present data effectively. However, the default table borders may not always align with your document's style and purpose. By selectively changing or removing the borders, you can achieve a variety of visual effects to suit your requirements. In this article, we will explore how to change and remove borders for tables in Word documents in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python. It can be easily installed in your Windows through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Change Borders for a Table in Word in Python

Spire.Doc for Python empowers you to retrieve the borders collection of a table by using the Table.TableFormat.Borders property. Once retrieved, you can access individual borders (like top border, bottom border, left border, right border, horizontal border, and vertical border) from the collection and then modify them by adjusting their line style, width, and color. The detailed steps are as follows.

  • Create an object of the Document class.
  • Load a Word document using Document.LoadFromFile() method.
  • Get a specific section using Document.Sections[index] property.
  • Get a specific table using Section.Tables[index] property.
  • Get the borders collection of the table using Table.TableFormat.Borders property.
  • Get an individual border, such as the top border from the borders collection using Borders.Top property, and then change its line style, width and color.
  • Refer to the above step to get other individual borders from the borders collection, and then change their line style, width and color.
  • Save the resulting document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of the Document class
document = Document()
# Load a Word document
document.LoadFromFile("Table.docx")

# Add a section to the document
section = document.Sections.get_Item(0)

# Get the first table in the section
table = section.Tables.get_Item(0) if isinstance(section.Tables.get_Item(0), Table) else None

# Get the collection of the borders
borders = table.TableFormat.Borders

# Get the top border and change border style, line width, and color
topBorder = borders.Top
topBorder.BorderType = BorderStyle.Single
topBorder.LineWidth = 1.0
topBorder.Color = Color.get_YellowGreen()

# Get the left border and change border style, line width, and color
leftBorder = borders.Left
leftBorder.BorderType = BorderStyle.Single
leftBorder.LineWidth = 1.0
leftBorder.Color = Color.get_YellowGreen()

# Get the right border and change border style, line width, and color
rightBorder = borders.Right
rightBorder.BorderType = BorderStyle.Single
rightBorder.LineWidth = 1.0
rightBorder.Color = Color.get_YellowGreen()

# Get the bottom border and change border style, line width, and color
bottomBorder = borders.Bottom
bottomBorder.BorderType = BorderStyle.Single
bottomBorder.LineWidth = 1.0
bottomBorder.Color = Color.get_YellowGreen()

# Get the horizontal border and change border style, line width, and color
horizontalBorder = borders.Horizontal
horizontalBorder.BorderType = BorderStyle.Dot
horizontalBorder.LineWidth = 1.0
horizontalBorder.Color = Color.get_Orange()

# Get the vertical border and change border style, line width, and color
verticalBorder = borders.Vertical
verticalBorder.BorderType = BorderStyle.Dot
verticalBorder.LineWidth = 1.0
verticalBorder.Color = Color.get_CornflowerBlue()

# Save the resulting document
document.SaveToFile("ChangeBorders.docx", FileFormat.Docx2013)
document.Close()

Python: Change or Remove Borders for Tables in Word

Remove Borders from a Table in Word in Python

To remove borders from a table, you need to set the BorderType property of the borders to BorderStyle.none. The detailed steps are as follows.

  • Create an object of the Document class.
  • Load a Word document using Document.LoadFromFile() method.
  • Get a specific section using Document.Sections[index] property.
  • Get a specific table using Section.Tables[index] property.
  • Get the borders collection of the table using Table.TableFormat.Borders property.
  • Get an individual border, such as the top border from the borders collection using Borders.Top property. Then set the BorderType property of the top border to BorderStyle.none.
  • Refer to the above step to get other individual borders from the borders collection and then set the BorderType property of the borders to BorderStyle.none.
  • Save the resulting document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Initialize an instance of the Document class
document = Document()
document.LoadFromFile("ChangeBorders.docx")

# Add a section to the document
section = document.Sections.get_Item(0)

# Get the first table in the section
table = section.Tables.get_Item(0) if isinstance(section.Tables.get_Item(0), Table) else None

# Get the borders collection of the table
borders = table.TableFormat.Borders

# Remove top border
topBorder = borders.Top
topBorder.BorderType = BorderStyle.none

# Remove left border
leftBorder = borders.Left
leftBorder.BorderType = BorderStyle.none

# Remove right border
rightBorder = borders.Right
rightBorder.BorderType = BorderStyle.none

# Remove bottom border
bottomBorder = borders.Bottom
bottomBorder.BorderType = BorderStyle.none

# remove inside horizontal border
horizontalBorder = borders.Horizontal
horizontalBorder.BorderType = BorderStyle.none

# Remove inside vertical border
verticalBorder = borders.Vertical
verticalBorder.BorderType = BorderStyle.none

# Save the resulting document
document.SaveToFile("RemoveBorders.docx", FileFormat.Docx2013)
document.Close()

Python: Change or Remove Borders for Tables in Word

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Various written documents, such as academic papers, reports, and legal materials, often have specific formatting guidelines that encompass word count, page count, and other essential metrics. Accurately measuring these elements is crucial as it ensures that your document adheres to the required standards and meets the expected quality benchmarks. In this article, we will explain how to count words, pages, characters, paragraphs, and lines in a Word document in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python. It can be easily installed in your Windows through the following pip commands.

pip install Spire.Doc

If you are unsure how to install, please refer to: How to Install Spire.Doc for Python on Windows

Count Words, Pages, Characters, Paragraphs, and Lines in a Word Document in Python

Spire.Doc for Python offers the BuiltinDocumentProperties class that empowers you to retrieve crucial information from your Word document. By utilizing this class, you can access a wealth of details, including the built-in document properties, as well as the number of words, pages, characters, paragraphs, and lines contained within the document.

The steps below explain how to get the number of words, pages, characters, paragraphs, and lines in a Word document in Python using Spire.Doc for Python:

  • Create an object of the Document class.
  • Load a Word document using the Document.LoadFromFile() method.
  • Get the BuiltinDocumentProperties object using the Document.BuiltinDocumentProperties property.
  • Get the number of words, characters, paragraphs, lines, and pages in the document using the WordCount, CharCount, ParagraphCount, LinesCount, PageCount properties of the BuiltinDocumentProperties class, and append the result to a list.
  • Write the content of the list into a text file.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of the Document class
doc = Document()
# Load a Word document
doc = Document("Input.docx")

# Create a list
sb = []

# Get the built-in properties of the document
properties = doc.BuiltinDocumentProperties

# Get the number of words, characters, paragraphs, lines, and pages and append the result to the list
sb.append("The number of words: " + str(properties.WordCount))
sb.append("The number of characters: " + str(properties.CharCount))
sb.append("The number of paragraphs: " + str(properties.ParagraphCount))
sb.append("The number of lines: " + str(properties.LinesCount))
sb.append("The number of pages: " + str(properties.PageCount))

# Save the data in the list to a text file
with open("result.txt", "w") as file:
file.write("\n".join(sb))

doc.Close()

Python: Count Words, Pages, Characters, Paragraphs and Lines in Word

Count Words and Characters in a Specific Paragraph of a Word Document in Python

In addition to retrieving the overall word count, page count, and other metrics for an entire Word document, you are also able to get the word count and character count for a specific paragraph by using the Paragraph.WordCount and Paragraph.CharCount properties.

The steps below explain how to get the number of words and characters of a paragraph in a Word document in Python using Spire.Doc for Python:

  • Create an object of the Document class.
  • Load a Word document using the Document.LoadFromFile() method.
  • Get a specific paragraph using the Document.Sections[sectionIndex].Paragraphs[paragraphIndex] property.
  • Get the number of words and characters in the paragraph using the Paragraph.WordCount and Paragraph.CharCount properties, and append the result to a list.
  • Write the content of the list into a text file.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of the Document class
doc = Document()
# Load a Word document
doc = Document("Input.docx")

# Get a specific paragraph
paragraph = doc.Sections.get_Item(0).Paragraphs.get_Item(0)

# Create a list
sb = []

# Get the number of words and characters in the paragraph and append the result to the list
sb.append("The number of words: " + str(paragraph.WordCount))
sb.append("The number of characters: " + str(paragraph.CharCount))

# Save the data in the list to a text file
with open("result.txt", "w") as file:
file.write("\n".join(sb))

doc.Close()

Python: Count Words, Pages, Characters, Paragraphs and Lines in Word

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Section breaks in Word allow users to divide a document into sections, each with unique formatting options. This is especially useful when working with long documents where you want to apply different layouts, headers, footers, margins or page orientations within the same document. In this article, you will learn how to insert or remove section breaks in Word in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Insert Section Breaks in Word in Python

Spire.Doc for Python provides the Paragraph.InsertSectionBreak(breakType: SectionBreakType) method to insert a specified type of section break to a paragraph. The following table provides an overview of the supported section break types, along with their corresponding Enums and descriptions:

Section Break Enum Description
New page SectionBreakType.New_Page Start the new section on a new page.
Continuous SectionBreakType.No_Break Start the new section on the same page, allowing for continuous content flow.
Odd page SectionBreakType.Odd_Page Start the new section on the next odd-numbered page.
Even page SectionBreakType.Even_Page Start the new section on the next even-numbered page.
New column SectionBreakType.New_Column Start the new section in the next column if columns are enabled.

The following are the detailed steps to insert a continuous section break:

  • Create a Document instance.
  • Load a Word document using Document.LoadFromFile() method.
  • Get a specified section using Document.Sections[] property.
  • Get a specified paragraph of the section using Section.Paragraphs[] property.
  • Add a section break to the end of the paragraph using Paragraph.InsertSectionBreak() method.
  • Save the result document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

inputFile = "sample.docx"
outputFile = "InsertSectionBreak.docx"

# Create a Document instance
document = Document()

# Load a Word document
document.LoadFromFile(inputFile)

# Get a specific section
section = document.Sections.get_Item(0)

# Get a specific paragraph
paragraph = section.Paragraphs.get_Item(0)

# Insert a continuous section break
paragraph.InsertSectionBreak(SectionBreakType.NoBreak)

# Save the result document
document.SaveToFile(outputFile, FileFormat.Docx2016)
document.Close()

Python: Insert or Remove Section Breaks in Word

Remove Section Breaks in Word in Python

To delete all sections breaks in a Word document, we need to access the first section in the document, then copy the contents of the other sections to the first section and delete them. The following are the detailed steps:

  • Create a Document instance.
  • Load a Word document using Document.LoadFromFile() method.
  • Get the first section using Document.Sections[] property.
  • Iterate through other sections in the document.
  • Get the second section, and then iterate through to get its child objects.
  • Clone the child objects of the second section and add them to the first section using Section.Body.ChildObjects.Add() method.
  • Delete the second section using Document.Sections.Remove() method.
  • Repeat the process to copy and delete the remaining sections.
  • Save the result document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

inputFile = "Report.docx"
outputFile = "RemoveSectionBreaks.docx"

# Create a Document instance
document = Document()

# Load a Word document
document.LoadFromFile(inputFile)

# Get a specific section
section = document.Sections.get_Item(0)

# Iterate through other sections in the document
for i in range(document.Sections.Count - 1):
    # Get the second section in the document
    section = document.Sections[1]
    
    # Iterate through all child objects of the second section
    for j in range(section.Body.ChildObjects.Count):
        # Get the child objects
        obj = section.Body.ChildObjects.get_Item(j)
        # Clone the child objects to the first section
        sec.Body.ChildObjects.Add(obj.Clone())
        # Remove the second section
        document.Sections.Remove(section)

# Save the result document
document.SaveToFile(outputFile, FileFormat.Docx2016)
document.Close()

Python: Insert or Remove Section Breaks in Word

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Friday, 07 June 2024 07:59

Python: Extract Comments from Word

Comments in Word documents are often used for collaborative review and feedback purposes. They may contain text and images that provide valuable information to guide document improvements. Extracting the text and images from comments allows you to analyze and evaluate the feedback provided by reviewers, helping you gain a comprehensive understanding of the strengths, weaknesses, and suggestions related to the document. In this article, we will demonstrate how to extract text and images from Word comments in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Extract Text from Word Comments in Python

You can easily retrieve the author and text of a Word comment using the Comment.Format.Author and Comment.Body.Paragraphs[index].Text properties provided by Spire.Doc for Python. The detailed steps are as follows.

  • Create an object of the Document class.
  • Load a Word document using the Document.LoadFromFile() method.
  • Create a list to store the extracted comment data.
  • Iterate through the comments in the document.
  • For each comment, iterate through the paragraphs of the comment body.
  • For each paragraph, get the text using the Comment.Body.Paragraphs[index].Text property.
  • Get the author of the comment using the Comment.Format.Author property.
  • Add the text and author of the comment to the list.
  • Save the content of the list to a text file.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of the Document class
document = Document()
# Load a Word document containing comments
document.LoadFromFile("Comments.docx")

# Create a list to store the extracted comment data
comments = []

# Iterate through the comments in the document
for i in range(document.Comments.Count):
    comment = document.Comments[i]
    comment_text = ""

    # Iterate through the paragraphs in the comment body
    for j in range(comment.Body.Paragraphs.Count):
        paragraph = comment.Body.Paragraphs[j]
        comment_text += paragraph.Text + "\n"

    # Get the comment author
    comment_author = comment.Format.Author

    # Append the comment data to the list
    comments.append({
        "author": comment_author,
        "text": comment_text
    })

# Write the comment data to a file
with open("comment_data.txt", "w", encoding="utf-8") as file:
    for comment in comments:
        file.write(f"Author: {comment['author']}\nText: {comment['text']}\n\n")

Python: Extract Comments from Word

Extract Images from Word Comments in Python

To extract images from Word comments, you need to iterate through the child objects in the paragraphs of the comments to find the DocPicture objects, then get the image data using DocPicture.ImageBytes property, finally save the image data to image files.

  • Create an object of the Document class.
  • Load a Word document using the Document.LoadFromFile() method.
  • Create a list to store the extracted image data.
  • Iterate through the comments in the document.
  • For each comment, iterate through the paragraphs of the comment body.
  • For each paragraph, iterate through the child objects of the paragraph.
  • Check if the object is a DocPicture object.
  • If the object is a DocPicture, get the image data using the DocPicture.ImageBytes property and add it to the list.
  • Save the image data in the list to individual image files.
  • Python
from spire.doc import *
from spire.doc.common import *
 
# Create an object of the Document class
document = Document()
# Load a Word document containing comments
document.LoadFromFile("Comments.docx")
 
# Create a list to store the extracted image data
images = []
 
# Iterate through the comments in the document
for i in range(document.Comments.Count):
    comment = document.Comments.get_Item(i)
    # Iterate through the paragraphs in the comment body
    for j in range(comment.Body.Paragraphs.Count):
        paragraph = comment.Body.Paragraphs.get_Item(j)
        # Iterate through the child objects in the paragraph
        for o in range(paragraph.ChildObjects.Count):
            obj = paragraph.ChildObjects.get_Item(o)
            # Find the images
            if isinstance(obj, DocPicture):
                picture = obj
                # Get the image data and add it to the list
                data_bytes = picture.ImageBytes
                images.append(data_bytes)
 
# Save the image data to image files
for i, image_data in enumerate(images):
    file_name = f"CommentImage-{i}.png"
    with open(os.path.join("CommentImages/", file_name), 'wb') as image_file:
        image_file.write(image_data)

Python: Extract Comments from Word

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Retrieving the coordinates of text or images within a PDF document can quickly locate specific elements, which is valuable for extracting content from PDFs. This capability also enables adding annotations, marks, or stamps to the desired locations in a PDF, allowing for more advanced document processing and manipulation.

In this article, you will learn how to get coordinates of the specified text or image in a PDF document using Spire.PDF for Python.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows

Coordinate System in Spire.PDF

When using Spire.PDF to process an existing PDF document, the origin of the coordinate system is located at the top left corner of the page. The X-axis extends horizontally from the origin to the right, and the Y-axis extends vertically downward from the origin (shown as below).

Python: Get Coordinates of the Specified Text or Image in PDF

Get Coordinates of the Specified Text in PDF in Python

To find the coordinates of a specific piece of text within a PDF document, you must first use the PdfTextFinder.Find() method to locate all instances of the target text on a particular page. Once you have found these instances, you can then access the PdfTextFragment.Positions property to retrieve the precise (X, Y) coordinates for each instance of the text.

The steps to get coordinates of the specified text in PDF are as follows.

  • Create a PdfDocument object.
  • Load a PDF document from a specified path.
  • Get a specific page from the document.
  • Create a PdfTextFinder object.
  • Specify find options through PdfTextFinder.Options property.
  • Search for a string within the page using PdfTextFinder.Find() method.
  • Get a specific instance of the search results.
  • Get X and Y coordinates of the text through PdfTextFragment.Positions[0].X and PdfTextFragment.Positions[0].Y properties.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF document
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Privacy Policy.pdf")

# Get a specific page
page = doc.Pages.get_Item(0)

# Create a PdfTextFinder object
textFinder = PdfTextFinder(page)

# Specify find options
findOptions = PdfTextFindOptions()
findOptions.Parameter = TextFindParameter.IgnoreCase
findOptions.Parameter = TextFindParameter.WholeWord
textFinder.Options = findOptions
 
# Search for the string "PRIVACY POLICY" within the page
findResults = textFinder.Find("PRIVACY POLICY") 

# Get the first instance of the results
result = findResults[0]

# Get X/Y coordinates of the found text
x = int(result.Positions[0].X)
y = int(result.Positions[0].Y)
print("The coordinates of the first instance of the found text are:", (x, y))

# Dispose resources
doc.Dispose()

Python: Get Coordinates of the Specified Text or Image in PDF

Get Coordinates of the Specified Image in PDF in Python

Spire.PDF for Python provides the PdfImageHelper class, which allows users to extract image details from a specific page within a PDF file. By doing so, you can leverage the PdfImageInfo.Bounds property to retrieve the (X, Y) coordinates of an individual image.

The steps to get coordinates of the specified image in PDF are as follows.

  • Create a PdfDocument object.
  • Load a PDF document from a specified path.
  • Get a specific page from the document.
  • Create a PdfImageHelper object.
  • Get the image information from the page using PdfImageHelper.GetImagesInfo() method.
  • Get X and Y coordinates of a specific image through PdfImageInfo.Bounds property.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF document
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Privacy Policy.pdf")

# Get a specific page 
page = doc.Pages.get_Item(0)

# Create a PdfImageHelper object
imageHelper = PdfImageHelper()

# Get image information from the page
imageInformation = imageHelper.GetImagesInfo(page)

# Get X/Y coordinates of a specific image
x = int(imageInformation[0].Bounds.X)
y = int(imageInformation[0].Bounds.Y)
print("The coordinates of the specified image are:", (x, y))

# Dispose resources
doc.Dispose()

Python: Get Coordinates of the Specified Text or Image in PDF

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

TIFF is a popular image format used in scanning and archiving due to its high quality and support for a wide range of color spaces. On the other hand, PDFs are widely used for document exchange because they preserve the layout and formatting of a document while compressing the file size. Conversion between these formats can be useful for various purposes such as archival, editing, or sharing documents.

In this article, you will learn how to convert PDF to TIFF and TIFF to PDF using the Spire.PDF for Python and Pillow libraries.

Install Spire.PDF for Python

This situation relies on the combination of Spire.PDF for Python and Pillow (PIL). Spire.PDF is used to read, create and convert PDF documents, while the PIL library is used for handling TIFF files and accessing their frames.

The libraries can be easily installed on your device through the following pip command.

pip install Spire.PDF
pip install pillow

Convert PDF to TIFF in Python

To complete the PDF to TIFF conversion, you first need to load the PDF document and convert the individual pages into image streams using Spire.PDF. Subsequently, these image streams are then merged together using the functionality of the PIL library, resulting in a consolidated TIFF image.

The following are the steps to convert PDF to TIFF using Python.

  • Create a PdfDocument object.
  • Load a PDF document from a specified file path.
  • Iterate through the pages in the document.
    • Convert each page into an image stream using PdfDocument.SaveAsImage() method.
    • Convert the image stream into a PIL image.
  • Combine these PIL images into a single TIFF image.
  • Python
from spire.pdf.common import *
from spire.pdf import *

from PIL import Image
from io import BytesIO

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF document
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf")

# Create an empty list to store PIL Images
images = []

# Iterate through all pages in the document
for i in range(doc.Pages.Count):

    # Convert a specific page to an image stream
    with doc.SaveAsImage(i) as imageData:

        # Open the image stream as a PIL image
        img = Image.open(BytesIO(imageData.ToArray())) 

        # Append the PIL image to list
        images.append(img)

# Save the PIL Images as a multi-page TIFF file
images[0].save("Output/ToTIFF.tiff", save_all=True, append_images=images[1:])

# Dispose resources
doc.Dispose()

Python: Convert PDF to TIFF and TIFF to PDF

Convert TIFF to PDF in Python

With the assistance of the PIL library, you can load a TIFF file and transform each frame into distinct PNG files. Afterwards, you can utilize Spire.PDF to draw these PNG files onto pages within a PDF document.

To convert a TIFF image to a PDF document using Python, follow these steps.

  • Create a PdfDocument object.
  • Load a TIFF image.
  • Iterate though the frames in the TIFF image.
    • Get a specific frame, and save it as a PNG file.
    • Add a page to the PDF document.
    • Draw the image on the page at the specified location using PdfPageBase.Canvas.DrawImage() method.
  • Save the document to a PDF file.
  • Python
from spire.pdf.common import *
from spire.pdf import *

from PIL import Image
import io

# Create a PdfDocument object
doc = PdfDocument()

# Set the page margins to 0
doc.PageSettings.SetMargins(0.0)

# Load a TIFF image
tiff_image = Image.open("C:\\Users\\Administrator\\Desktop\\TIFF.tiff")

    # Go to the current frame
    tiff_image.seek(i)
    
    # Extract the image of the current frame
    frame_image = tiff_image.copy()

    # Save the image to a PNG file
    frame_image.save(f"temp/output_frame_{i}.png")
    
    # Load the image file to PdfImage
    image = PdfImage.FromFile(f"temp/output_frame_{i}.png")

    # Get image width and height
    width = image.PhysicalDimension.Width
    height = image.PhysicalDimension.Height

    # Add a page to the document
    page = doc.Pages.Add(SizeF(width, height))

    # Draw image at (0, 0) of the page
    page.Canvas.DrawImage(image, 0.0, 0.0, width, height)

# Save the document to a PDF file
doc.SaveToFile("Output/TiffToPdf.pdf",FileFormat.PDF)

# Dispose resources
doc.Dispose()

Python: Convert PDF to TIFF and TIFF to PDF

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Page 9 of 51