Administrator

Thursday, 22 May 2025 09:16

How to Count Word Frequency in a Word Document Using Python

Want to count the frequency of words in a Word document? Whether you're analyzing content, generating reports, or building a document tool, Python makes it easy to find how often a specific word appears—across the entire document, within specific sections, or even in individual paragraphs. In this guide, you’ll learn how to use Python to count word occurrences accurately and efficiently, helping you extract meaningful insights from your Word files without manual effort.

Count Frequency of Words in Word with Python

Count Frequency of Words in an Entire Word Document
Count Word Frequency by Section
Count Word Frequency by Paragraph
To Wrap Up
FAQ

In this tutorial, we’ll use Spire.Doc for Python, a powerful and easy-to-use library for Word document processing. It supports a wide range of features like reading, editing, and analyzing DOCX files programmatically—without requiring Microsoft Office.

You can install it via pip:

pip install spire.doc

Let’s see how it works in practice, starting with counting word frequency in an entire Word document.

How to Count Frequency of Words in an Entire Word Document

Let’s start by learning how to count how many times a specific word or phrase appears in an entire Word document. This is a common task—imagine you need to check how often the word "contract" appears in a 50-page file.
With the FindAllString() method from Spire.Doc for Python, you can quickly search through the entire document and get an exact count in just a few lines of code—saving you both time and effort.

Steps to count the frequency of a word in the entire Word document:

Create an object of Document class and read a source Word document.
Specify the keyword to find.
Find all occurrences of the keyword in the document using Document.FindAllString() method.
Count the number of matches and print it out.

The following code shows how to count the frequency of the keyword "AI-Generated Art" in the entire Word document:

from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word document
document.LoadFromFile("E:/Administrator/Python1/input/AI-Generated Art.docx")

# Customize the keyword to find
keyword = "AI-Generated Art"

# Find all matches (False: distinguish case; True: full text search)
textSelections = document.FindAllString(keyword, False, True)

# Count the number of matches
count = len(textSelections)

# Print the result
print(f'"{keyword}" appears {count} times in the entire document.')

# Close the document
document.Close()

Count Frequency of Word in the Entire Document with Python

How to Count Word Frequency by Section in a Word Document Using Python

A Word document is typically divided into multiple sections, each containing its own paragraphs, tables, and other elements. Sometimes, instead of counting a word's frequency across the entire document, you may want to know how often it appears in each section. To achieve this, we’ll loop through all the document sections and search for the target word within each one. Let’s see how to count word frequency by section using Python.

Steps to count the frequency of a word by section in Word documents:

Create a Document object and load the Word file.
Define the target keyword to search.
Loop through all sections in the document. Within each section, loop through all paragraphs.
Use regular expressions to count keyword occurrences.
Accumulate and print the count for each section and the total count.

This code demonstrates how to count how many times "AI-Generated Art" appears in each section of a Word document:

import re
from spire.doc import *
from spire.doc.common import *

# Create a Document object and load a Word file
document = Document()
document.LoadFromFile("E:/Administrator/Python1/input/AI.docx")

# Specify the keyword
keyword = "AI-Generated Art"

# The total count of the keyword
total_count = 0

# Get all sections
sections = document.Sections

# Loop through each section
for i in range(sections.Count):
    section = sections.get_Item(i)
    paragraphs = section.Paragraphs

    section_count = 0  
    print(f"\n=== Section {i + 1} ===")

    # Loop through each paragraph in the section
    for j in range(paragraphs.Count):
        paragraph = paragraphs.get_Item(j)
        text = paragraph.Text

        # Find all matches using regular expressions
        count = len(re.findall(re.escape(keyword), text, flags=re.IGNORECASE))
        section_count += count
        total_count += count

    print(f'Total in Section {i + 1}: {section_count} time(s)')

print(f'\n=== Total occurrences in all sections: {total_count} ===')

# Close the document
document.Close()

How to Count Word Frequency by Sections in a Word File

How to Count Word Frequency by Paragraph in a Word Document

When it comes to tasks like sensitive word detection or content auditing, it's crucial to perform a more granular analysis of word frequency. In this section, you’ll learn how to count word frequency by paragraph in a Word document, which gives you deeper insight into how specific terms are distributed across your content. Let’s walk through the steps and see a code example in action.

Steps to count the frequency of words by paragraph in Word files:

Instantiate a Document object and load a Word document from files.
Specify the keyword to search for.
Loop through each section and each paragraph in the document.
Find and count the occurrence of the keyword using regular expressions.
Print out the count for each paragraph where the keyword appears and the total number of occurrences.

Use the following code to calculate the frequency of "AI-Generated Art" by paragraphs in a Word document:

import re
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word document
document.LoadFromFile("E:/Administrator/Python1/input/AI.docx")

# Customize the keyword to find
keyword = "AI-Generated Art"

# Initialize variables
total_count = 0
paragraph_index = 1

# Loop through sections and paragraphs
sections = document.Sections
for i in range(sections.Count):
    section = sections.get_Item(i)
    paragraphs = section.Paragraphs
    for j in range(paragraphs.Count):
        paragraph = paragraphs.get_Item(j)
        text = paragraph.Text

        # Find all occurrences of the keyword while ignoring case
        count = len(re.findall(re.escape(keyword), text, flags=re.IGNORECASE))

        # Print the result
        if count > 0:
            print(f'Paragraph {paragraph_index}: "{keyword}" appears {count} time(s)')
            total_count += count
        paragraph_index += 1

# Print the total count
print(f'\nTotal occurrences in all paragraphs: {total_count}')
document.Close()

Count Word Frequency by Paragraphs Using Python

To Wrap Up

The guide demonstrates how to count the frequency of specific words across an entire Word document, by section, and by paragraph using Python. Whether you're analyzing long reports, filtering sensitive terms, or building smart document tools, automating the task with Spire.Doc for Python can save time and boost accuracy. Give them a try in your own projects and take full control of your Word document content.

FAQs about Counting the Frequency of Words

Q1: How to count the number of times a word appears in Word?

A: You can count word frequency in Word manually using the “Find” feature, or automatically using Python and libraries like Spire.Doc. This lets you scan the entire document or target specific sections or paragraphs.

Q2: Can I analyze word frequency across multiple Word files?

A: Yes. By combining a loop in Python to load multiple documents, you can apply the same word-count logic to each file and aggregate the results—ideal for batch processing or document audits.

Published in Text

Tagged under

doc Python Text

Wednesday, 21 May 2025 08:14

Spire.Presentation 10.5.8 enhances the conversion from Presentation to PDF

We are delighted to announce the release of Spire.Presentation 10.5.8. The latest version enhances the conversion from Presentation to PDF. Besides, the issue where Microsoft Powerpoint displayed an error message when opening a PPT file that was directly loaded and saved is fixed successfully. More details are listed below.

Here is a list of changes made in this release

Category	ID	Description
Bug	SPIREPPT-2858	Fixes the issue that there was incorrect text content when converting a specific PPT document to PDF.
Bug	SPIREPPT-2842	Fixes the issue where Microsoft Powerpoint displayed an error message when opening a PPT file that was directly loaded and saved.

Click the link to download Spire.Presentation 10.5.8:

https://www.e-iceblue.com/Download/download-presentation-for-net-now.html

More information of Spire.Presentation new release or hotfix:

https://www.e-iceblue.com/forum/spire-presentation-new-release-or-hotfix-t4736.html

Published in Spire.Presentation for .NET

Wednesday, 21 May 2025 03:34

Spire.XLS for Java 15.5.1 enhances the conversion from Excel to HTML

We are delighted to announce the release of Spire.XLS for Java 15.5.1. This version enhances the conversion from Excel to HTML. Besides, some known issues are fixed successfully in this version, such as the issue that the program threw an exception when calling Worksheet.findAllString for specific documents. More details are listed below.

Here is a list of changes made in this release

Category	ID	Description
Bug	SPIREXLS-5737	Fixes the issue that the program threw an exception when calling Worksheet.findAllString for specific documents.
Bug	SPIREXLS-5750	Fixes the issue that the images shifted upward when converting specific Excel files to HTML.
Bug	SPIREXLS-5765	Fixes the issue that the program threw “ArrayIndexOutOfBoundsException” when loading specific Excel files.
Bug	SPIREXLS-5773	Optimizes the issue that the time consuming for specific Excel to HTML conversion was too much.
Bug	SPIREXLS-5786	Fixes the issue that the program threw “Invalid end column index” when converting specific Excel files to HTML.

Click the link to download Spire.XLS for Java 15.5.1:

https://www.e-iceblue.com/Download/xls-for-java.html

Published in Spire.XLS for Java

Monday, 19 May 2025 03:43

How to Convert PDF to CSV in Python (Fast & Accurate Table Extraction)

Comprehensive Guide for Converting PDF to CSV by Extracting Tables Using Python

Working with PDFs that contain tables, reports, or invoice data? Manually copying that information into spreadsheets is slow, error-prone, and just plain frustrating. Fortunately, there's a smarter way: you can convert PDF to CSV in Python automatically — making your data easy to analyze, import, or automate.

In this guide, you’ll learn how to use Python for PDF to CSV conversion by directly extracting tables with Spire.PDF for Python — a pure Python library that doesn’t require any external tools.

✅ No Adobe or third-party tools required

✅ High-accuracy table recognition

✅ Ideal for structured data workflows

In this guide, we’ll cover:

Convert PDF to CSV in Python Using Table Extraction
Related Use Cases
Why Use Spire.PDF for PDF to CSV Conversion in Python?
Frequently Asked Questions

Convert PDF to CSV in Python Using Table Extraction

The best way to convert PDF to CSV using Python is by extracting tables directly — no need for intermediate formats like Excel. This method is fast, clean, and highly effective for documents with structured data such as invoices, bank statements, or reports. It gives you usable CSV output with minimal code and high accuracy, making it ideal for automation and data analysis workflows.

Step 1: Install Spire.PDF for Python

Before writing code, make sure to install the required library. You can install Spire.PDF for Python via pip:

pip install spire.pdf

You can also install Free Spire.PDF for Python if you're working on smaller tasks:

pip install spire.pdf.free

Step 2: Python Code — Extract Table from PDF and Save as CSV

Python

from spire.pdf import PdfDocument, PdfTableExtractor
import csv
import os

# Load the PDF document
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")

# Create a table extractor
extractor = PdfTableExtractor(pdf)

# Ensure output directory exists
os.makedirs("output/Tables", exist_ok=True)

# Loop through each page in the PDF
for page_index in range(pdf.Pages.Count):
    # Extract tables on the current page
    tables = extractor.ExtractTable(page_index)
    for table_index, table in enumerate(tables):
        table_data = []

        # Extract all rows and columns
        for row in range(table.GetRowCount()):
            row_data = []
            for col in range(table.GetColumnCount()):
                # Get cleaned cell text
                cell_text = table.GetText(row, col).replace("\n", "").strip()
                row_data.append(cell_text)
            table_data.append(row_data)

        # Write the table to a CSV file
        output_path = os.path.join("output", "Tables", f"Page{page_index + 1}-Table{table_index + 1}.csv")
        with open(output_path, "w", newline="", encoding="utf-8") as csvfile:
            writer = csv.writer(csvfile)
            writer.writerows(table_data)

# Release PDF resources
pdf.Dispose()

The conversion result:

The Result of Converting PDF to CSV with Python Using Spire.PDF

What is PdfTableExtractor?

PdfTableExtractor is a utility class provided by Spire.PDF for Python that detects and extracts table structures from PDF pages. Unlike plain text extraction, it maintains the row-column alignment of tabular data, making it ideal for converting PDF tables to CSV with clean structure.

Best for:

PDFs with structured tabular data
Automated Python PDF to CSV conversion
Fast Python-based data workflows

Relate Article: How to Convert PDFs to Excel XLSX Files with Python

Related Use Cases

If your PDF doesn't contain traditional tables — such as when it's formatted as paragraphs, key-value pairs, or scanned as an image — the following approaches can help you convert such PDFs to CSV using Python effectively:

Extract non-tabular PDF text and save to CSV

Useful when data is in paragraph or report form — format it into table-like CSV using Python logic.

Use OCR to extract tables from scanned PDF in Python

Perfect for image-based PDFs — use OCR to detect and export tables to CSV.

Why Choose Spire.PDF for Python?

Spire.PDF for Python is a robust PDF SDK tailored for developers. Whether you're building automated reports, analytics tools, or ETL pipelines — it just works.

Key Benefits:

Accurate Table Recognition

Smartly extracts structured data from tables

Pure Python, No Adobe Needed

Lightweight and dependency-free

Multi-Format Support

Also supports conversion to text, images, Excel, and more

Frequently Asked Questions

Can I convert PDF to CSV using Python?

Yes, you can convert PDF to CSV in Python using Spire.PDF. It supports both direct table extraction to CSV and an optional workflow that converts PDFs to Excel first. No Adobe Acrobat or third-party tools are required.

What's the best way to extract tables from PDFs in Python?

The most efficient way is using Spire.PDF’s PdfTableExtractor class. It automatically detects tables on each page and lets you export structured data to CSV with just a few lines of Python code — ideal for invoices, reports, and automated processing.

Why would I convert PDF to Excel before CSV?

You might convert PDF to Excel first if the layout is complex or needs manual review. This gives you more control over formatting and cleanup before saving as CSV, but it's slower than direct extraction and not recommended for automation workflows.

Does Spire.PDF work without Adobe Acrobat?

Yes. Spire.PDF for Python is a 100% standalone library that doesn’t rely on Adobe Acrobat or any external software. It's a pure Python solution for converting, extracting, and manipulating PDF content programmatically.

Conclusion

Converting PDF to CSV in Python doesn’t have to be a hassle. With Spire.PDF for Python, you can:

Automatically extract structured tables to CSV
Build seamless, automated workflows in Python
Handle both native PDFs and scanned ones (with OCR)

Get a Free License

Spire.PDF for Python offers a free edition suitable for basic tasks. If you need access to more features, you can also apply for a free license for evaluation use. Simply submit a request, and a license key will be sent to your email after approval.

Published in Conversion

Tagged under

pdf Python Conversion

Friday, 16 May 2025 10:01

How to Filter Excel Pivot Tables with Python: Step-by-Step Guide

filter excel pivot table data in python

Introduction

Pivot Tables in Excel are versatile tools that enable efficient data summarization and analysis. They allow users to explore data, uncover insights, and generate reports dynamically. One of the most powerful features of Pivot Tables is filtering, which lets users focus on specific data subsets without altering the original data structure.

What This Tutorial Covers

This tutorial provides a detailed, step-by-step guide on how to programmatically apply various types of filters to a Pivot Table in Excel using Python with the Spire.XLS for Python library. It covers the following topics:

Benefits of Filtering Data in Pivot Tables
Install Python Excel Library – Spire.XLS for Python
Add Report Filter to Pivot Table
Apply Row Field Filter in Pivot Table
Apply Column Field Filter in Pivot Table
FAQs
Conclusion

Benefits of Filtering Data in Pivot Tables

Filtering is an essential feature of Pivot Tables that provides the following benefits:

Enhanced Data Analysis: Quickly focus on specific segments or categories of your data to draw meaningful insights.
Dynamic Data Updates: Filters automatically adjust to reflect changes when the underlying data is refreshed, keeping your analysis accurate.
Improved Data Organization: Display only relevant data in your reports without altering or deleting the original dataset, preserving data integrity.

Install Python Excel Library – Spire.XLS for Python

Before working with Pivot Tables in Excel using Python, ensure the Spire.XLS for Python library is installed. The quickest way to do this is using pip, Python’s package manager. Simply run the following command in your terminal or command prompt:

pip install spire.xls

Once installed, you’re ready to start automating Pivot Table filtering in your Python projects.

Add Report Filter to Pivot Table

A report filter allows you to filter the entire Pivot Table based on a particular field and value. This type of filter is useful when you want to display data for a specific category or item globally across the Pivot Table, without changing the layout.

Steps to Add a Report Filter

Initialize the Workbook: Create a Workbook object to manage your Excel file.
Load the Excel File: Use Workbook.LoadFromFile() to load an existing file containing a Pivot Table.
Access the Worksheet: Use Workbook.Worksheets[] to select the desired worksheet.
Locate the Pivot Table: Use Worksheet.PivotTables[] to access the specific Pivot Table.
Define the Report Filter: Create a PivotReportFilter object specifying the field to filter.
Apply the Report Filter: Add the filter to the Pivot Table using XlsPivotTable.ReportFilters.Add().
Save the Updated File: Use Workbook.SaveToFile() to save your changes.

Code Example

Python

from spire.xls import *

# Create an object of the Workbook class
workbook = Workbook()

# Load an existing Excel file containing a Pivot Table
workbook.LoadFromFile("Sample.xlsx")

# Access the first worksheet
sheet = workbook.Worksheets[0]

# Access the first Pivot Table in the worksheet
pt = sheet.PivotTables[0]

# Create a report filter for the field "Product"
reportFilter = PivotReportFilter("Product", True)

# Add the report filter to the pivot table
pt.ReportFilters.Add(reportFilter)

# Save the updated workbook to a new file
workbook.SaveToFile("AddReportFilter.xlsx", FileFormat.Version2016)
workbook.Dispose()

add report filter to excel pivot table in python

Apply Row Field Filter in Pivot Table

Row field filters allow you to filter data displayed in the row fields of an Excel Pivot Table. These filters can be based on labels (specific text values) or values (numeric criteria).

Steps to Add a Row Field Filter

Initialize the Workbook: Create a Workbook object to manage the Excel file.
Load the Excel File: Use Workbook.LoadFromFile() to load your target file containing a Pivot Table.
Access the Worksheet: Select the desired worksheet using Workbook.Worksheets[].
Locate the Pivot Table: Access the specific Pivot Table using Worksheet.PivotTables[].
Add a Row Field Filter: Add a label filter or value filter using
XlsPivotTable.RowFields[].AddLabelFilter() or

XlsPivotTable.RowFields[].AddValueFilter().
Calculate Pivot Table Data: Use XlsPivotTable.CalculateData() to calculate the pivot table data.
Save the Updated File: Save your changes using Workbook.SaveToFile().

Code Example

Python

from spire.xls import *

# Create an object of the Workbook class
workbook = Workbook()

# Load an Excel file
workbook.LoadFromFile("Sample.xlsx")

# Get the first worksheet
sheet = workbook.Worksheets[0]

# Get the first pivot table
pt = sheet.PivotTables[0]

# Add a value filter to the first row field in the pivot table
pt.RowFields[0].AddValueFilter(PivotValueFilterType.GreaterThan, pt.DataFields[0], Int32(5000), None)

# Or add a label filter to the first row field in the pivot table
# pt.RowFields[0].AddLabelFilter(PivotLabelFilterType.Equal, "Mike", None)

# Calculate the pivot table data
pt.CalculateData()

# Save the resulting file
workbook.SaveToFile("AddRowFieldFilter.xlsx", FileFormat.Version2016)
workbook.Dispose()

filter row field data in excel pivot table with python

Apply Column Field Filter in Pivot Table

Column field filters in Excel Pivot Tables allow you to filter data displayed in the column fields. Similar to row field filters, column field filters can be based on labels (text values) or values (numeric criteria).

Steps to Add Column Field Filter

Initialize the Workbook: Create a Workbook object to manage your Excel file.
Load the Excel File: Use Workbook.LoadFromFile() to open your file containing a Pivot Table.
Access the Worksheet: Select the target worksheet using Workbook.Worksheets[].
Locate the Pivot Table: Use Worksheet.PivotTables[] to access the desired Pivot Table.
Add a Column Field Filter: Add a label filter or value filter using
XlsPivotTable.ColumnFields[].AddLabelFilter() or

XlsPivotTable.ColumnFields[].AddValueFilter().
Calculate Pivot Table Data: Use XlsPivotTable.CalculateData() to calculate the Pivot Table data.
Save the Updated File: Save your changes using Workbook.SaveToFile().

Code Example

Python

from spire.xls import *

# Create an object of the Workbook class
workbook = Workbook()

# Load the Excel file
workbook.LoadFromFile("Sample.xlsx")

# Access the first worksheet
sheet = workbook.Worksheets[0]

# Access the first Pivot Table
pt = sheet.PivotTables[0]

# Add a label filter to the first column field
pt.ColumnFields[0].AddLabelFilter(PivotLabelFilterType.Equal, String("Laptop"), None)

# # Or add a value filter to the first column field
# pt.ColumnFields[0].AddValueFilter(PivotValueFilterType.Between, pt.DataFields[0], Int32(5000), Int32(10000))

# Calculate the pivot table data
pt.CalculateData()

# Save the updated workbook
workbook.SaveToFile("AddColumnFieldFilter.xlsx", FileFormat.Version2016)
workbook.Dispose()

filter column field data in excel pivot table with python

Conclusion

Filtering Pivot Tables in Excel is crucial for effective data analysis, allowing users to zoom in on relevant information without disrupting the table’s structure. Using Spire.XLS for Python, developers can easily automate adding, modifying, and managing filters on Pivot Tables programmatically. This tutorial covered the primary filter types—report filters, row field filters, and column field filters—with detailed code examples to help you get started quickly.

FAQs

Q: Can I add multiple filters to the same Pivot Table?

A: Yes, you can add multiple report filters, row filters, and column filters simultaneously to customize your data views with Spire.XLS.

Q: Do filters update automatically if the source data changes?

A: Yes, after refreshing the Pivot Table or recalculating with CalculateData(), filters will reflect the latest data.

Q: Can I filter based on custom conditions?

A: Spire.XLS supports many filter types including label filters (equals, begins with, contains) and value filters (greater than, less than, between).

Q: Is it possible to remove filters programmatically?

A: Yes, you can remove filters by clearing or resetting the respective filter collections or fields.

Get a Free License

To fully experience the capabilities of Spire.XLS for .NET without any evaluation limitations, you can request a free 30-day trial license.

Published in Pivot Table

Tagged under

xls Python Pivot Table

Friday, 16 May 2025 09:01

Extract Background Images from PowerPoint Presentations using C# .NET

PowerPoint presentations often contain background images that enhance the visual appeal of slides. Extracting these background images can be crucial for designers and content managers who wish to reuse, analyze, or archive slide visuals independently of the slide content.

This guide provides a clear, step-by-step approach to extracting background images from PowerPoint presentations in .NET using C# and the Spire.Presentation for .NET library.

Table of Contents

Why Extract Background Images from PowerPoint
Install .NET PowerPoint Library – Spire.Presentation for .NET
Extract Background Images from PowerPoint in .NET using C#
- Extract Background Images from Individual Slides
- Extract Background Images from Slide Masters
FAQs
Conclusion

Why Extract Background Images from PowerPoint

Extracting background images from PowerPoint provides several key benefits:

Reuse Designs: Repurpose background images in other presentations or design projects.
Analyze Slides: Review and understand slide designs by examining background images separately.
Archive Visuals: Store background images for documentation, backup, or future use.

Install .NET PowerPoint Library – Spire.Presentation for .NET

Spire.Presentation for .NET is a robust .NET PowerPoint library that enables developers to create, manipulate, and convert PowerPoint presentations without the need for Microsoft PowerPoint.

Key Features of Spire.Presentation for .NET

Here are some key features of Spire.Presentation for .NET:

Create and edit PowerPoint presentations.
Convert PowerPoint to other formats such as PDF, Images, HTML, Markdown, and XPS.
Secure PowerPoint presentations
Merge/split PowerPoint presentations.
Slides management, including adding/removing slides, setting/extracting/removing backgrounds, and more.
Image/shape/chart/smartart insertion and manipulation.
Animate text/shapes.

Install Spire.Presentation for .NET

Before starting the background image extraction process, you will need to install Spire.Presentation for .NET into your C# project using one of the following methods:

Option1. Install via NuGet (Recommended)

Install-Package Spire.Presentation

Option 2: Manually Add DLLs to Your Project

Download the Spire.Presentation package and extract the files.
In Visual Studio, right-click References > Add Reference > Browse, then select the appropriate Spire.Presentation.dll based on your target framework.

Extract Background Images from PowerPoint in .NET using C#

Background images in PowerPoint can be applied directly to individual slides or inherited from slide masters. This section demonstrates how to extract both types of background images using Spire.Presentation.

Extract Background Images from Individual Slides

To extract background images from individual slides in PowerPoint, follow these steps:

Create a Presentation object and load the presentation.
Loop through all slides in the presentation.
Check if the slide’s background fill type is image fill (FillFormatType.Picture).
If yes, retrieve and save the background image.

Sample Code

C#

using Spire.Presentation;
using Spire.Presentation.Drawing;
using System.IO;

namespace ExtractSlideBackgroundImages
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Specify the input file path and output folder
            string inputFile = @"example1.pptx";
            string outputFolder = @"ExtractedBackgrounds\Slides";

            // Load the PowerPoint presentation
            Presentation presentation = new Presentation();
            presentation.LoadFromFile(inputFile);

            // Create the output folder
            Directory.CreateDirectory(outputFolder);

            // Loop through all slides
            for (int i = 0; i < presentation.Slides.Count; i++)
            {
                // Check if the slide's background fill type is image fill
                var fill = presentation.Slides[i].SlideBackground.Fill;
                if (fill.FillType == FillFormatType.Picture)
                {
                    // Extract and save the background image
                    var image = fill.PictureFill.Picture.EmbedImage;
                    if (image != null)
                    {
                        string outputPath = Path.Combine(outputFolder, $"SlideBackground_{i + 1}.png");
                        image.Image.Save(outputPath, ImageFormat.Png);
                    }
                }
            }
        }
    }
}

Extract Background Images from Slide Masters

Slide masters define the design and layout of slides, including background images. To extract background images from slide masters:

Create a Presentation object and load the presentation.
Loop through all slide masters in the presentation.
For each master, check if its background fill type is image fill.
If yes, extract and save the background image.

Sample Code

C#

using Spire.Presentation;
using Spire.Presentation.Drawing;
using System.Drawing.Imaging;
using System.IO;

namespace ExtractBackgroundImages
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Specify the input file path and output folder
            string inputFile = @"example2.pptx";
            string outputFolder = @"C:\ExtractedBackgrounds\Masters";

            // Load the PowerPoint presentation
            Presentation presentation = new Presentation();
            presentation.LoadFromFile(inputFile);

            // Create the output folder
            Directory.CreateDirectory(outputFolder);

            // Loop through all slide masters
            for (int i = 0; i < presentation.Masters.Count; i++)
            {
                // Check if the slide master's background fill type is image fill
                var fill = presentation.Masters[i].SlideBackground.Fill;
                if (fill.FillType == FillFormatType.Picture)
                {
                    // Extract and save the background image
                    var image = fill.PictureFill.Picture.EmbedImage;
                    if (image != null)
                    {
                        string outputPath = Path.Combine(outputFolder, $"MasterBackground_{i + 1}.png");
                        image.Image.Save(outputPath, ImageFormat.Png);
                    }
                }
            }

        }
    }
}

Conclusion

Extracting background images from PowerPoint presentations is a crucial technique for developers and designers who want to access slide visuals independently of content. By leveraging the Spire.Presentation for .NET library with C#, you can programmatically extract background images from both individual slides and slide masters with ease.

FAQs

Q: What image formats are supported for extraction?

A: Extracted images can be saved in PNG, JPEG, BMP, or other formats supported by .NET.

Q: In addition to background images, can I extract other images from PowerPoint slides?

A: Yes, you can extract other images, such as those embedded within slide content or shapes, using Spire.Presentation.

Q: Does Spire.Presentation supports extracting text from PowerPoint presentations?

A: Yes, Spire.Presentation can also extract text from slides, including text in shapes, tables, and more.

Get a Free License

To fully experience the capabilities of Spire.Presentation for .NET without any evaluation limitations, you can request a free 30-day trial license.

Published in Document Operation

Tagged under

ppt net Document Operation

Friday, 16 May 2025 07:31

Spire.XLS for C++ 15.5.0 supports setting page size when converting Excel to PDF

We're pleased to announce the release of Spire.XLS for C++ 15.5.0. This release adds a list of new features, such as supports setting page size during Excel to PDF conversion, embedding images into cells, grouping shapes, and enabling revision mode. Details are shown below.

Here is a list of changes made in this release

Category	ID	Description
Adjustment	-	Upgrades SkiaSharp->3.116.1
New feature	-	Supports setting page size when converting Excel to PDF. intrusive_ptr workbook = new Workbook(); intrusive_ptr sheet = workbook->GetWorksheets()->Add(L""0""); intrusive_ptr a1 = sheet->GetRange(L""A1""); a1->SetValue(L""taatat""); sheet->GetPageSetup()->SetPaperSize(PaperSizeType::PaperA0); workbook->SaveToFile(outputFile.c_str(), ExcelVersion::Version2013); workbook->Dispose();
New feature	-	Supports grouping Shapes. intrusive_ptr book = new Workbook(); book->GetWorksheets()->Clear(); intrusive_ptr sheet = book->GetWorksheets()->Add(L""0""); intrusive_ptr shape1 = sheet->GetPrstGeomShapes()->AddPrstGeomShape(1, 3, 50, 50, PrstGeomShapeType::RoundRect); intrusive_ptr shape2 = sheet->GetPrstGeomShapes()->AddPrstGeomShape(5, 3, 50, 50, PrstGeomShapeType::Triangle); wcout << sheet->GetName() << endl; intrusive_ptr groupShapeCollection = sheet->GetGroupShapeCollection(); vector> shapes; shapes.push_back(shape1); shapes.push_back(shape2); groupShapeCollection->Group(shapes); book->SaveToFile(outputFile.c_str(), ExcelVersion::Version2013); book->Dispose();
New feature	-	Supports retrieving all shapes in an Excel document. intrusive_ptr book = new Workbook(); book->LoadFromFile(inputFile.c_str()); intrusive_ptr sheet = dynamic_pointer_cast(book->GetWorksheets()->Get(0)); std::vector> images; for (int i = 0; i < sheet->GetShapes()->GetCount(); ++i) { intrusive_ptr shape = dynamic_pointer_cast(sheet->GetShapes()->Get(i)); intrusive_ptr image = shape->SaveToImage(); shape->SaveToImage((outputFile + L""Bug_out_"" + std::to_wstring(i) + L"".png"").c_str()); }
New feature	-	Supports enabling revision mode. intrusive_ptr workbook = new Workbook(); workbook->LoadFromFile(inputFile.c_str()); workbook->TrackedChanges(true); workbook->SaveToFile(outputFile.c_str(), ExcelVersion::Version2013);
New feature	-	Supports embedding images into cells. intrusive_ptr workbook = new Workbook(); workbook->LoadFromFile(inputFile.c_str()); intrusive_ptr sheet = dynamic_pointer_cast(workbook->GetWorksheets()->Get(0)); intrusive_ptr range = dynamic_pointer_cast(sheet->GetRange(L""D1"")); intrusive_ptr fs = new Stream(inputFile_Img.c_str()); range->InsertOrUpdateCellImage(fs, true); workbook->SaveToFile(outputFile.c_str(), ExcelVersion::Version2013); workbook->Dispose();
New feature	-	Supports grouping PivotTables. intrusive_ptr workbook(new Workbook()); workbook->LoadFromFile(inputFile.c_str()); intrusive_ptr pivotSheet = workbook->GetWorksheets()->Get(0); intrusive_ptr pivot = dynamic_pointer_cast(pivotSheet->GetPivotTables()->Get(0)); intrusive_ptr dateBaseField = pivot->GetPivotFields()->Get(L""number""); dateBaseField->CreateGroup(3000, 3800, 1); pivot->CalculateData(); workbook->SaveToFile(outputFile.c_str());

Click the link below to download Spire.XLS for C++ 15.5.0:

https://www.e-iceblue.com/Download/xls-for-cpp.html

Published in Spire.XLS for CPP

Thursday, 15 May 2025 09:41

Spire.XLS for Python 15.5.0 supports grouping Shapes

We're pleased to announce the release of Spire.XLS for Python 15.5.0. This version supports grouping Shapes. Besides, some known bugs are fixed successfully, such as the issue where saving a chart to an image caused the program to report a "SpireException: Arg_InvalidCastException" error. More details are listed below.

Here is a list of changes made in this release

Category	ID	Description
New feature	SPIREXLS-4191	Supports grouping Shapes. workbook = Workbook() workbook.LoadFromFile(inputFile) worksheet = workbook.Worksheets[0] shape1 = worksheet.PrstGeomShapes[0] shape2 = worksheet.PrstGeomShapes[1] groupShapeCollection = worksheet.GetGroupShapeCollection() groupShapeCollection.Group([shape1, shape2]) workbook.SaveToFile(outputFile, ExcelVersion.Version2013) workbook.Dispose()
New feature	SPIREXLS-4460	Supports replacing the font. wb = Workbook() wb.LoadFromFile(inputFile) sheet = wb.Worksheets[0] # Create a cell style and set the color style = wb.Styles.Add("Mystyle") style.Font.FontName = "Arial" style2 = wb.Styles.Add("Mystyle1") style2.Font.FontName = "Times New Roman" style3 = wb.Styles.Add("Mystyle2") style3.Font.FontName = "Calibri" wb.Worksheets[0].ReplaceAll("Brillux", style,"e-iceblue", style2) wb.Worksheets[0].ReplaceAll("3342sdf",style, "text", style3) wb.SaveToFile(outputFile) wb.Dispose()
New feature	SPIREXLS-4857	Supported customizing PivotTable field names. workbook = Workbook() workbook.LoadFromFile(inputFile) sheet = workbook.Worksheets.get_Item("Tabular Pivot") table = sheet.PivotTables.get_Item(0) table.PivotFields[1].CustomName = "fieldName1" table.PivotFields[2].CustomName = "fieldName2" table.RowFields[0].CustomName = "rowName" table.ColumnFields[0].CustomName = "colName" table.PivotFields[0].CustomName = "fieldName0" table.DataFields[0].CustomName = "DataName" table.CalculateData() workbook.SaveToFile(outputFile, ExcelVersion.Version2013) workbook.Dispose()
New feature	SPIREXLS-5254	Supports enabling revision mode. workbook = Workbook() workbook.LoadFromFile(inputFile) workbook.TrackedChanges(True); workbook.SaveToFile(outputFile, ExcelVersion.Version2013); workbook.Dispose()
New feature	SPIREXLS-5482	Supports embedding images into cells. workbook = Workbook() workbook.LoadFromFile(inputFile) sheet = workbook.Worksheets[0] sheet.Range["D1"].InsertOrUpdateCellImage(inputFile_Img,True) workbook.SaveToFile(outputFile,ExcelVersion.Version2013) workbook.Dispose()
Adjustment	/	Upgrades SkiaSharp->3.116.1.
Adjustment	/	Adding support for the Linux ARM platform.
Bug	SPIREXLS-5643 SPIREXLS-5677	Fixes the issue where saving a chart to an image caused the program to report a "SpireException: Arg_InvalidCastException" error.
Bug	SPIREXLS-5755	Fixes the issue where loading a specific XLSM document caused the program to report a "SpireException: Arg_NullReferenceException" error.
Bug	SPIREXLS-5783	Fixes the issue where loading a specific XLSX document caused the program to report an "Arg_IndexOutOfRangeException" error.

Click the link to download Spire.XLS for Python 15.5.0:

https://www.e-iceblue.com/Download/Spire-XLS-Python.html

Published in Spire.XLS for Python

Thursday, 15 May 2025 08:33

How to Convert TXT to Excel in C#

In data processing and management scenarios, efficiently transforming raw text (TXT) files into structured Excel spreadsheets is a common requirement. For developers who are automating reports or processing log files, converting TXT to Excel using C# streamlines data organization and analysis. This guide explores how to achieve this using Spire.XLS for .NET, a powerful library designed to handle Excel XLS or XLSX files without requiring Microsoft Office.

Why Convert TXT to Excel Programmatically?
How to Convert Text Files to Excel in C# (Step-by-Step Guide)
- Install Spire.XLS for .NET
- Import a Text File into Excel Using C#
Pro Tips for TXT to Excel Conversion

Why Convert TXT to Excel Programmatically?

Text files are simple but lack the analytical power of Excel. Key advantages of converting TXT to XLS or XLSX format include:

Automation: Process large or recurring files without manual intervention.
Data Structuring: Organize raw text into rows, columns, and sheets.
Advanced Features: Leverage Excel formulas, charts, and pivot tables.
Integration: Embed conversion feature into .NET applications or APIs.

How to Convert Text Files to Excel in C# (Step-by-Step Guide)

Install Spire.XLS for .NET

Spire.XLS for .NET is a professional Excel document processing component, provides efficient and convenient APIs that allow developers to achieve TXT to Excel conversion through simple code.

Before getting started, you can choose one of these methods to install the library:

Method 1: NuGet Package Manager

Open your project in Visual Studio.
Right-click on the project in the Solution Explorer and select "Manage NuGet Packages."
Search for "Spire.XLS" and click "Install".

Method 2: Package Manager Console

Go to "Tools > NuGet Package Manager > Package Manager Console."
Run the following command in the console:
PM> Install-Package Spire.XLS

Method 3: Manual Installation with DLL Files

Visit the Spire.XLS Download Page and get the latest version.
Extract the files and then add the Spire.Xls.dll to your project.

Import a Text File into Excel Using C#

Follow the below steps to write the data in a txt file into an Excel worksheet:

Read TXT File: use the File.ReadAllLines() method to read all lines in a text file and returns them as an array of strings.
Parse each line:
- Use the string.Trim() method to remove the leading/trailing whitespaces.
- Use the string.Split() method to split the data based on specified delimiters.
- Add the split text data to a list.
Create a Workbook instance and get a worksheet
Write Data to specified cells:
- Iterate through the rows and columns in the list.
- Assign the data in the list to the corresponding Excel cells through the Worksheet.Range[].Value property.
Save the Excel File.

Code Example:

C#

using Spire.Xls;
using System.IO;
using System.Collections.Generic;

class TxtToExcelConverter
{
    static void Main()
    {
        // Open a text file and read all lines in it
        string[] lines = File.ReadAllLines("Data.txt");

        // Create a list to store the data in text file
        List data = new List();

        // Split data into rows and columns and add to the list
        foreach (string line in lines)
        {
            data.Add(line.Trim().Split('\t')); // Adjust delimiter as needed
        }

        // Create a Workbook object
        Workbook workbook = new Workbook();

        // Get the first worksheet
        Worksheet sheet = workbook.Worksheets[0];

        // Iterate through the rows and columns in the data list
        for (int row = 0; row < data.Count; row++)
        {
            for (int col = 0; col < data[row].Length; col++)
            {
                // Write the text data in specified cells
                sheet.Range[row + 1, col + 1].Value = data[row][col];

                // Set the header row to Bold
                sheet.Range[1, col + 1].Style.Font.IsBold = true;
            }
        }

        // Autofit columns
        sheet.AllocatedRange.AutoFitColumns();

        // Save the Excel file
        workbook.SaveToFile("TXTtoExcel.xlsx", ExcelVersion.Version2016);
        workbook.Dispose();
    }
}

Result:

Import a text file to an Excel file.

Pro Tips for TXT to Excel Conversion

Handling Different Delimiters:

If your TXT file uses a different delimiter (e.g., space, comma, semicolon), modify the parameter of the Split(params char[] separator) method.

Format Cells:

After a text file being converted to an Excel file, you can take advantage of the Spire.XLS library’s rich features to format cells, such as setting the background colors, adding cell borders, applying number formats, etc.

Conclusion

By following this step-by-step guide, you can efficiently transform unstructured text data into organized Excel spreadsheets, which is ideal for data analysis, reporting, and management. Remember to optimize your implementation for your specific delimiters and leverage Spire.XLS's advanced features for complex conversion scenarios.

Get a Free License

To fully experience the capabilities of Spire.XLS for .NET without any evaluation limitations, you can request a free 30-day trial license.

Published in Conversion

Tagged under

xls net Conversion

Thursday, 15 May 2025 05:38

Spire.XLS for JavaScript 15.5.0 supports multiple new features

We are excited to announce the release of Spire.XLS for JavaScript 15.5.0. This version adds 19 new features, such as grouping Shapes, enabling revision mode, and embedding images into cells. Moreover, it also upgrades SkiaSharp and enhances the conversion from Shape to images. More details are listed below.

Here is a list of changes made in this release

Category	ID	Description
New feature	-	Supports 19 new features, including: Groupe Shapes. const workbook = wasmModule.Workbook.Create(); const sheet = workbook.Worksheets.get(0); const shape1 = sheet.PrstGeomShapes.AddPrstGeomShape(1, 3, 50, 50, spirexls.PrstGeomShapeType.RoundRect); const shape2 = sheet.PrstGeomShapes.AddPrstGeomShape(5, 3, 50, 50, spirexls.PrstGeomShapeType.Triangle); const groupShapeCollection = sheet.GroupShapeCollection; groupShapeCollection.Group([shape1, shape2]); workbook.SaveToFile(outputFileName,spirexls.ExcelVersion.Version2013); Enable revision mode. const workbook = wasmModule.Workbook.Create(); workbook.LoadFromFile(inputFileName); workbook.TrackedChanges = true; const outputFileName = "out.xlsx"; workbook.SaveToFile(outputFileName, spirexls.ExcelVersion.Version2013); Embed images into cells. const workbook = wasmModule.Workbook.Create(); workbook.LoadFromFile(inputFileName); const params = { fileName: imageName, scale: true } const sheet = workbook.Worksheets.get(0); sheet.Range.get("B1").InsertOrUpdateCellImage(params); const outputFileName="out.xlsx"; workbook.SaveToFile(outputFileName);
Adjustment	-	Upgrades SkiaSharp to version 3.116.1.
Bug	SPIREXLS-5726	Fixes the issue that the text overflowed when converting Shape to images.

Click the link below to download Spire.XLS for JavaScript 15.5.0:

https://www.e-iceblue.com/Download/Spire-XLS-JavaScript.html

Published in Spire.XLS for Javascript