Spire.Office Knowledgebase Page 29 | E-iceblue

Using Python to manipulate text formatting in PDFs provides a powerful way to automate and customize documents. With the Spire.PDF for Python library, developers can efficiently find text with advanced search options to retrieve and modify text properties like font, size, color, and style, enabling users to find and update text formatting across large document sets, saving time and reducing manual work. This article will demonstrate how to use Spire.PDF for Python to retrieve and modify text formatting in PDF documents with Python code.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to: How to Install Spire.PDF for Python on Windows

Find Text and Retrieve Formatting Information in PDFs

Developers can use the PdfTextFinder and PdfTextFindOptions classes provided by Spire.PDF for Python to precisely search for specific text in a PDF document and obtain a collection of PdfTextFragment objects representing the search results. Then, developers can access the format information of the specified search result text through properties such as FontName, FontSize, and FontFamily, under PdfTextFragment.TextStates[] property.

The detailed steps for finding text in PDF and retrieving its font information are as follows:

  • Create an instance of PdfDocument class and load a PDF document using PdfDocument.LoadFromFile() method.
  • Get a page using PdfDocument.Pages.get_Item() method.
  • Create a PdfTextFinder object using the page.
  • Create a PdfTextFindOptions object, set the search options, and apply the search options through PdfTextFinder.Options property.
  • Find specific text on the page using PdfTextFinder.Find() method and get a collection of PdfTextFragment objects.
  • Get the formatting of the first finding result through PdfTextFragment.TextStates property.
  • Get the font name, font size, and font family of the result through PdfTextStates[0].FontName, PdfTextStates[0].FontSize, and PdfTextStates[0].FontFamily properties.
  • Print the result.
  • Python
from spire.pdf import *

# Create a PdfDocument instance
pdf = PdfDocument()
# Load a PDF file
pdf.LoadFromFile("Sample.pdf")

# Get the first page
page = pdf.Pages.get_Item(0)

# Create a PdfTextFinder instance
finder = PdfTextFinder(page)

# Create a PdfTextFindOptions instance and set the search options
options = PdfTextFindOptions()
options.CaseSensitive = True
options.WholeWords = True

# Apply the options
finder.Options = options

# Find the specified text
fragments = finder.Find("History and Cultural Significance:")

# Get the formatting of the first fragment
formatting = fragments[0].TextStates

# Get the formatting information
fontInfo = ""
fontInfo += "Text: " + fragments[0].Text
fontInfo += "Font: " + formatting[0].FontName
fontInfo += "\nFont Size: " + str(formatting[0].FontSize)
fontInfo += "\nFont Family: " + formatting[0].FontFamily

# Output font information
print(fontInfo)

# Release resources
pdf.Dispose()

Python: Retrieve and Modify Text Formatting in PDF

Find and Modify Text Formatting in PDF Documents

After finding specific text, developers can overlay it with a rectangle in the same color as the background and then redraw the text in a new format at the same position, thus achieving text format modification of simple PDF text fragments on solid color pages. The detailed steps are as follows:

  • Create an instance of PdfDocument class and load a PDF document using PdfDocument.LoadFromFile() method.
  • Get a page using PdfDocument.Pages.get_Item() method.
  • Create a PdfTextFinder object using the page.
  • Create a PdfTextFindOptions object, set the search options, and apply the search options through PdfTextFinder.Options property.
  • Find specific text on the page using PdfTextFinder.Find() method and get the first result.
  • Get the color of the page background through PdfPageBase.BackgroundColor property and change the color to white if the background is empty.
  • Draw rectangles with the obtained color in the position of the found text using PdfPageBase.Canvas.DrawRectangle() method.
  • Create a new font, brush, and string format and calculate the text frame.
  • Draw the text in the new format in the same position using PdfPageBase.Canvas.DrawString() method.
  • Save the document using PdfDocument.SaveToFile() method.
  • Python
from spire.pdf import *

# Create a PdfDocument instance
pdf = PdfDocument()
# Load a PDF file
pdf.LoadFromFile("Sample.pdf")

# Get the first page
page = pdf.Pages.get_Item(0)

# Create a PdfTextFinder instance
finder = PdfTextFinder(page)

# Create a PdfTextFindOptions instance and set the search options
options = PdfTextFindOptions()
options.CaseSensitive = True
options.WholeWords = True
finder.Options = options

# Find the specified text
fragments = finder.Find("History and Cultural Significance:")
# Get the first result
fragment = fragments[0]

# Get the background color and change it to white if its empty
backColor = page.BackgroundColor
if backColor.ToArgb() == 0:
    backColor = Color.get_White()
# Draw a rectangle with the background color to cover the text
for i in range(len(fragment.Bounds)):
    page.Canvas.DrawRectangle(PdfSolidBrush(PdfRGBColor(backColor)), fragment.Bounds[i])

# Create a new font and a new brush
font = PdfTrueTypeFont("Times New Roman", 16.0, 3, True)
brush = PdfBrushes.get_Brown()
# Create a PdfStringFormat instance
stringFormat = PdfStringFormat()
stringFormat.Alignment = PdfTextAlignment.Left
# Calculate the rectangle that contains the text
point = fragment.Bounds[0].Location
size = SizeF(fragment.Bounds[-1].Right, fragment.Bounds[-1].Bottom)
rect = RectangleF(point, size)

# Draw the text with the specified format in the same rectangle
page.Canvas.DrawString("History and Cultural Significance", font, brush, rect, stringFormat)

# Save the document
pdf.SaveToFile("output/FindModifyTextFormat.pdf")

# Release resources
pdf.Close()

Python: Retrieve and Modify Text Formatting in PDF

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Text boxes in Microsoft Word are flexible elements that improve the layout and design of documents. They enable users to place text separately from the main text flow, facilitating the creation of visually attractive documents. At times, you might need to extract text from these text boxes for reuse, or update the content within them to maintain clarity and relevance. This article demonstrates how to extract or update textboxes in a Word document using Java with Spire.Doc for Java.

Install Spire.Doc for Java

First of all, you're required to add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>13.11.2</version>
    </dependency>
</dependencies>

Extract Text from a Textbox in Word in Java

With Spire.Doc for Java, you can access a specific text box in a document using the Document.getTextBoxes().get() method. You can then iterate through the child objects of the text box to check if each one is a paragraph or a table. For paragraphs, retrieve the text using the Paragraph.getText() method. For tables, loop through the cells to extract text from each cell.

Here are the steps to extract text from a text box in a Word document:

  • Create a Document object.
  • Load a Word file using Document.loadFromFile() method.
  • Access a specific text box using Document.getTextBoxes().get() method.
  • Iterate through the child objects of the text box.
  • Check if a child object is a paragraph. If so, use Paragraph.getText() method to get the text.
  • Check if a child object is a table. If so, use extractTextFromTable() method to retrieve the text from the table.
  • Java
import com.spire.doc.*;
import com.spire.doc.documents.DocumentObjectType;
import com.spire.doc.documents.Paragraph;
import com.spire.doc.fields.TextBox;

import java.io.FileWriter;
import java.io.IOException;

public class ExtractTextFromTextbox {

    public static void main(String[] args) throws IOException {

        // Create a Document object
        Document document = new Document();

        // Load a Word file
        document.loadFromFile("C:\\Users\\Administrator\\Desktop\\Input.docx");

        // Get a specific textbox
        TextBox textBox = document.getTextBoxes().get(0);

        // Create a FileWriter to write extracted text to a txt file
        FileWriter fileWriter = new FileWriter("Extracted.txt");

        // Iterate though child objects of the textbox
        for (Object object: textBox.getChildObjects()) {

            // Determine if the child object is a paragraph
            if (((DocumentObject) object).getDocumentObjectType() == DocumentObjectType.Paragraph) {

                // Write paragraph text to the txt file
                fileWriter.write(((Paragraph)object).getText() + "\n");
            }

            // Determine if the child object is a table
            if (((DocumentObject) object).getDocumentObjectType() == DocumentObjectType.Table) {

                // Extract text from table to the txt file
                extractTextFromTable((Table)object, fileWriter);
            }
        }

        // Close the stream
        fileWriter.close();
    }

    // Extract text from a table
    static void extractTextFromTable(Table table, FileWriter fileWriter) throws IOException {
        for (int i = 0; i < table.getRows().getCount(); i++) {
            TableRow row = table.getRows().get(i);
            for (int j = 0; j < row.getCells().getCount(); j++) {
                TableCell cell = row.getCells().get(j);
                for (Object paragraph: cell.getParagraphs()) {
                    fileWriter.write(((Paragraph) paragraph).getText() + "\n");
                }
            }
        }
    }
}

Java: Extract or Update Textboxes in Word

Update a Textbox in Word in Java

To modify a text box, first remove its existing content using TextBox.getChildObjects.clear() method. Then, create a new paragraph and assign the desired text to it.

Here are the steps to update a text box in a Word document:

  • Create a Document object.
  • Load a Word file using Document.loadFromFile() method.
  • Get a specific textbox using Document.getTextBoxes().get() method.
  • Remove existing content of the textbox using TextBox.getChildObjects().clear() method.
  • Add a paragraph to the textbox using TextBox.getBody().addParagraph() method.
  • Add text to the paragraph using Paragraph.appendText() method.
  • Save the document to a different Word file.
  • Java
import com.spire.doc.Document;
import com.spire.doc.FileFormat;
import com.spire.doc.documents.Paragraph;
import com.spire.doc.fields.TextBox;
import com.spire.doc.fields.TextRange;

public class UpdateTextbox {

    public static void main(String[] args) {

        // Create a Document object
        Document document = new Document();

        // Load a Word file
        document.loadFromFile("C:\\Users\\Administrator\\Desktop\\Input.docx");

        // Get a specific textbox
        TextBox textBox = document.getTextBoxes().get(0);

        // Remove child objects of the textbox
        textBox.getChildObjects().clear();

        // Add a new paragraph to the textbox
        Paragraph paragraph = textBox.getBody().addParagraph();

        // Set line spacing
        paragraph.getFormat().setLineSpacing(15f);

        // Add text to the paragraph
        TextRange textRange = paragraph.appendText("The text in this textbox has been updated.");

        // Set font size
        textRange.getCharacterFormat().setFontSize(15f);

        // Save the document to a different Word file
        document.saveToFile("UpdateTextbox.docx", FileFormat.Docx_2019);

        // Dispose resources
        document.dispose();
    }
}

Java: Extract or Update Textboxes in Word

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

C#: Add Filters to Pivot Tables in Excel

2024-10-14 01:02:04 Written by Koohji

Filters in pivot tables enable users to narrow down the displayed data based on specific criteria. By adding filters, users can focus on subsets of data that are most relevant to their analysis, allowing for a more targeted and efficient data exploration. In this article, we will demonstrate how to add filters to pivot tables in Excel in C# using Spire.XLS for .NET.

Install Spire.XLS for .NET

To begin with, you need to add the DLL files included in the Spire.XLS for.NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.

PM> Install-Package Spire.XLS

Add Report Filter to Pivot Table in Excel in C#

Spire.XLS for .NET offers the XlsPivotTable.ReportFilters.Add() method to add report filters to a pivot table. The detailed steps are as follows.

  • Create an object of the Workbook class.
  • Load an Excel file using Workbook.LoadFromFile() method.
  • Get a specific worksheet in the file using Workbook.Worksheets[index] property.
  • Get a specific pivot table in the worksheet using Worksheet.PivotTables[index] property.
  • Create a report filter using PovotReportFilter class.
  • Add the report filter to the pivot table using XlsPivotTable.ReportFilters.Add() method.
  • Save the resulting file using Workbook.SaveToFile() method.
  • C#
using Spire.Xls;
using Spire.Xls.Core.Spreadsheet.PivotTables;

namespace AddReportFilter
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create an object of the Workbook class
            Workbook workbook = new Workbook();
            // Load an Excel file
            workbook.LoadFromFile("Sample.xlsx");

            // Get the first worksheet
            Worksheet sheet = workbook.Worksheets[0];

            // Get the first pivot table
            XlsPivotTable pt = sheet.PivotTables[0] as XlsPivotTable;

            // Create a report filter
            PivotReportFilter reportFilter = new PivotReportFilter("Product", true);

            // Add the report filter to the pivot table
            pt.ReportFilters.Add(reportFilter);

            // Save the resulting file
            workbook.SaveToFile("AddReportFilter.xlsx", FileFormat.Version2016);
            workbook.Dispose();
        }
    }
}

C#: Add Filters to Pivot Tables in Excel

Add Filter to a Row Field of Pivot Table in Excel in C#

You can add a value filter or label filter to a specific row field in a pivot table using the XlsPivotTable.RowFields[index].AddValueFilter() or XlsPivotTable.RowFields[index].AddLabelFilter() method. The detailed steps are as follows.

  • Create an object of the Workbook class.
  • Load an Excel file using Workbook.LoadFromFile() method.
  • Get a specific worksheet in the file using Workbook.Worksheets[index] property.
  • Get a specific pivot table in the worksheet using Worksheet.PivotTables[index] property.
  • Add a value filter or label filter to a specific row field in the pivot table using XlsPivotTable.RowFields[index].AddValueFilter() or XlsPivotTable.RowFields[index].AddLabelFilter() method.
  • Calculate the data in the pivot table using XlsPivotTable.CalculateData() method.
  • Save the resulting file using Workbook.SaveToFile() method.
  • C#
using Spire.Xls;
using Spire.Xls.Core.Spreadsheet.PivotTables;

namespace AddRowFilter
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create an object of the Workbook class
            Workbook workbook = new Workbook();
            // Load an Excel file
            workbook.LoadFromFile("Sample.xlsx");

            // Get the first worksheet
            Worksheet sheet = workbook.Worksheets[0];

            // Get the first pivot table
            XlsPivotTable pt = sheet.PivotTables[0] as XlsPivotTable;


            // Add a value filter to the first row field in the pivot table
            pt.RowFields[0].AddValueFilter(PivotValueFilterType.GreaterThan, pt.DataFields[0], 5000, null);
            // Or add a label filter to the first row field in the pivot table
            //pt.RowFields[0].AddLabelFilter(PivotLabelFilterType.Equal, "Mike", null);

            // Calculate the pivot table data
            pt.CalculateData();

            // Save the resulting file
            workbook.SaveToFile("AddRowFilter.xlsx", FileFormat.Version2016);
            workbook.Dispose();
        }
    }
}

C#: Add Filters to Pivot Tables in Excel

Add Filter to a Column Field of Pivot Table in Excel in C#

To add a value filter or label filter to a specific column field in a pivot table, you can use the XlsPivotTable.ColumnFields[index].AddValueFilter() or XlsPivotTable.ColumnFields[index].AddLabelFilter() method. The detailed steps are as follows.

  • Create an object of the Workbook class.
  • Load an Excel file using Workbook.LoadFromFile() method.
  • Get a specific worksheet in the file using Workbook.Worksheets[index] property.
  • Get a specific pivot table in the worksheet using Worksheet.PivotTables[index] property.
  • Add a value filter or label filter to a specific column field in the pivot table using XlsPivotTable.ColumnFields[index].AddValueFilter() or XlsPivotTable.ColumnFields[index].AddLabelFilter() method.
  • Calculate the data in the pivot table using XlsPivotTable.CalculateData() method.
  • Save the resulting file using Workbook.SaveToFile() method.
  • C#
using Spire.Xls;
using Spire.Xls.Core.Spreadsheet.PivotTables;

namespace AddColumnFilter
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create an object of the Workbook class
            Workbook workbook = new Workbook();
            // Load an Excel file
            workbook.LoadFromFile("Sample.xlsx");

            // Get the first worksheet
            Worksheet sheet = workbook.Worksheets[0];

            // Get the first pivot table
            XlsPivotTable pt = sheet.PivotTables[0] as XlsPivotTable;

            // Add a label filter to the first column field in the pivot table
            pt.ColumnFields[0].AddLabelFilter(PivotLabelFilterType.Equal, "Laptop", null);
            // Or add a value filter to the first column field in the pivot table
            // pt.ColumnFields[0].AddValueFilter(PivotValueFilterType.Between, pt.DataFields[0], 5000, 10000);


            // Calculate the pivot table data
            pt.CalculateData();

            // Save the resulting file
            workbook.SaveToFile("AddColumnFilter.xlsx", FileFormat.Version2016);
            workbook.Dispose();
        }
    }
}

C#: Add Filters to Pivot Tables in Excel

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

page 29