Spire.Office Knowledgebase Page 11 | E-iceblue

Comprehensive Guide for Converting PDF to CSV by Extracting Tables Using Python

Working with PDFs that contain tables, reports, or invoice data? Manually copying that information into spreadsheets is slow, error-prone, and just plain frustrating. Fortunately, there's a smarter way: you can convert PDF to CSV in Python automatically — making your data easy to analyze, import, or automate.

In this guide, you’ll learn how to use Python for PDF to CSV conversion by directly extracting tables with Spire.PDF for Python — a pure Python library that doesn’t require any external tools.

✅ No Adobe or third-party tools required

✅ High-accuracy table recognition

✅ Ideal for structured data workflows


Convert PDF to CSV in Python Using Table Extraction

The best way to convert PDF to CSV using Python is by extracting tables directly — no need for intermediate formats like Excel. This method is fast, clean, and highly effective for documents with structured data such as invoices, bank statements, or reports. It gives you usable CSV output with minimal code and high accuracy, making it ideal for automation and data analysis workflows.

Step 1: Install Spire.PDF for Python

Before writing code, make sure to install the required library. You can install Spire.PDF for Python via pip:

pip install spire.pdf

You can also install Free Spire.PDF for Python if you're working on smaller tasks:

pip install spire.pdf.free

Step 2: Python Code — Extract Table from PDF and Save as CSV

  • Python
from spire.pdf import PdfDocument, PdfTableExtractor
import csv
import os

# Load the PDF document
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")

# Create a table extractor
extractor = PdfTableExtractor(pdf)

# Ensure output directory exists
os.makedirs("output/Tables", exist_ok=True)

# Loop through each page in the PDF
for page_index in range(pdf.Pages.Count):
    # Extract tables on the current page
    tables = extractor.ExtractTable(page_index)
    for table_index, table in enumerate(tables):
        table_data = []

        # Extract all rows and columns
        for row in range(table.GetRowCount()):
            row_data = []
            for col in range(table.GetColumnCount()):
                # Get cleaned cell text
                cell_text = table.GetText(row, col).replace("\n", "").strip()
                row_data.append(cell_text)
            table_data.append(row_data)

        # Write the table to a CSV file
        output_path = os.path.join("output", "Tables", f"Page{page_index + 1}-Table{table_index + 1}.csv")
        with open(output_path, "w", newline="", encoding="utf-8") as csvfile:
            writer = csv.writer(csvfile)
            writer.writerows(table_data)

# Release PDF resources
pdf.Dispose()

The conversion result:

The Result of Converting PDF to CSV with Python Using Spire.PDF
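
One detail in the cell-cleaning step is worth a closer look: removing newlines outright can fuse words that wrapped onto separate lines inside a cell. The sketch below uses only Python's standard library and hypothetical sample values rather than real extractor output. It collapses whitespace to single spaces instead, and shows how csv.writer quotes cells that contain commas. The helper names clean_cell and rows_to_csv are illustrative and not part of Spire.PDF.

```python
import csv
import io


def clean_cell(text):
    """Collapse line breaks and repeated whitespace into single spaces."""
    return " ".join(text.split())


def rows_to_csv(rows):
    """Serialize rows to CSV text; csv.writer quotes cells when needed."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerows([[clean_cell(cell) for cell in row] for row in rows])
    return buffer.getvalue()


# Hypothetical cell values, standing in for what a table extractor might return
sample_rows = [
    ["Item", "Unit\nPrice"],
    ["Widget, large", " 1,250.00 "],
]
csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

If this behavior suits your data, the same clean_cell idea could replace the replace-and-strip call in the extraction loop; whether a space or an empty string is the right separator depends on how text wraps in your PDFs.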

What is PdfTableExtractor?

PdfTableExtractor is a utility class provided by Spire.PDF for Python that detects and extracts table structures from PDF pages. Unlike plain text extraction, it maintains the row-column alignment of tabular data, making it ideal for converting PDF tables to CSV with clean structure.

Best for:

  • PDFs with structured tabular data
  • Automated Python PDF to CSV conversion
  • Fast Python-based data workflows

Related Article: How to Convert PDFs to Excel XLSX Files with Python

Related Use Cases

If your PDF doesn't contain traditional tables — such as when it's formatted as paragraphs, key-value pairs, or scanned as an image — the following approaches can help you convert such PDFs to CSV using Python effectively:

  • Text-based extraction: useful when data is in paragraph or report form; format it into table-like CSV using Python logic.
  • OCR: suited to image-based PDFs; use OCR to detect tables and export them to CSV.
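
As a rough illustration of the "Python logic" route for key-value text, the sketch below turns lines of the form "Key: Value" into CSV rows using only the standard library. The sample text, the "Key: Value" line format, and the function name key_values_to_rows are all assumptions made for this example; real extracted text usually needs rules tailored to the document.

```python
import csv
import io
import re

# Matches "Key: Value" lines; this line format is an assumption for the sketch
KEY_VALUE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def key_values_to_rows(text):
    """Return [key, value] pairs for each line that looks like 'Key: Value'."""
    rows = []
    for line in text.splitlines():
        match = KEY_VALUE.match(line)
        if match:
            rows.append([match.group(1), match.group(2)])
    return rows


# Hypothetical text, standing in for text extracted from a PDF page
sample_text = """Invoice Number: INV-1001
Customer: Acme Corp
Total Due: 1,250.00
Thank you for your business."""

rows = key_values_to_rows(sample_text)

# Write a header plus the parsed pairs to an in-memory CSV
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["Field", "Value"])
writer.writerows(rows)
print(buffer.getvalue())
```

Lines without a colon, such as the closing sentence in the sample, are skipped rather than guessed at.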

Why Choose Spire.PDF for Python?

Spire.PDF for Python is a robust PDF SDK tailored for developers. Whether you're building automated reports, analytics tools, or ETL pipelines — it just works.

Key Benefits:

  • Accurate Table Recognition: Smartly extracts structured data from tables
  • Pure Python, No Adobe Needed: Lightweight and dependency-free
  • Multi-Format Support: Also supports conversion to text, images, Excel, and more

Frequently Asked Questions

Can I convert PDF to CSV using Python?

Yes, you can convert PDF to CSV in Python using Spire.PDF. It supports both direct table extraction to CSV and an optional workflow that converts PDFs to Excel first. No Adobe Acrobat or third-party tools are required.

What's the best way to extract tables from PDFs in Python?

The most efficient way is using Spire.PDF’s PdfTableExtractor class. It automatically detects tables on each page and lets you export structured data to CSV with just a few lines of Python code — ideal for invoices, reports, and automated processing.

Why would I convert PDF to Excel before CSV?

You might convert PDF to Excel first if the layout is complex or needs manual review. This gives you more control over formatting and cleanup before saving as CSV, but it's slower than direct extraction and not recommended for automation workflows.

Does Spire.PDF work without Adobe Acrobat?

Yes. Spire.PDF for Python is a 100% standalone library that doesn’t rely on Adobe Acrobat or any external software. It's a pure Python solution for converting, extracting, and manipulating PDF content programmatically.

Conclusion

Converting PDF to CSV in Python doesn’t have to be a hassle. With Spire.PDF for Python, you can:

  • Automatically extract structured tables to CSV
  • Build seamless, automated workflows in Python
  • Handle both native PDFs and scanned ones (with OCR)

Get a Free License

Spire.PDF for Python offers a free edition suitable for basic tasks. If you need access to more features, you can also apply for a free license for evaluation use. Simply submit a request, and a license key will be sent to your email after approval.

Filter Excel Pivot Table Data in Python

Introduction

Pivot Tables in Excel are versatile tools that enable efficient data summarization and analysis. They allow users to explore data, uncover insights, and generate reports dynamically. One of the most powerful features of Pivot Tables is filtering, which lets users focus on specific data subsets without altering the original data structure.

What This Tutorial Covers

This tutorial provides a step-by-step guide to programmatically applying filters to a Pivot Table in Excel using Python with the Spire.XLS for Python library. It covers three filter types: report filters, row field filters, and column field filters.

Benefits of Filtering Data in Pivot Tables

Filtering is an essential feature of Pivot Tables that provides the following benefits:

  • Enhanced Data Analysis: Quickly focus on specific segments or categories of your data to draw meaningful insights.
  • Dynamic Data Updates: Filters automatically adjust to reflect changes when the underlying data is refreshed, keeping your analysis accurate.
  • Improved Data Organization: Display only relevant data in your reports without altering or deleting the original dataset, preserving data integrity.

Install Python Excel Library – Spire.XLS for Python

Before working with Pivot Tables in Excel using Python, ensure the Spire.XLS for Python library is installed. The quickest way to do this is using pip, Python’s package manager. Simply run the following command in your terminal or command prompt:

pip install spire.xls

Once installed, you’re ready to start automating Pivot Table filtering in your Python projects.

Add Report Filter to Pivot Table

A report filter allows you to filter the entire Pivot Table based on a particular field and value. This type of filter is useful when you want to display data for a specific category or item globally across the Pivot Table, without changing the layout.
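
To make the idea concrete without involving Excel, here is a small plain-Python sketch of what a report filter does conceptually: the source records are restricted to one field value before the summary is computed, while the summary's layout stays the same. The records and the function pivot_totals are hypothetical and are not part of Spire.XLS.

```python
from collections import defaultdict

# Hypothetical source records: (product, salesperson, amount)
records = [
    ("Laptop", "Mike", 4000),
    ("Laptop", "Anna", 3500),
    ("Phone", "Mike", 2000),
    ("Phone", "Anna", 6000),
]


def pivot_totals(rows, report_filter=None):
    """Sum amounts per salesperson, optionally restricted to one product."""
    totals = defaultdict(int)
    for product, salesperson, amount in rows:
        if report_filter is None or product == report_filter:
            totals[salesperson] += amount
    return dict(totals)


all_products = pivot_totals(records)            # no report filter
laptops_only = pivot_totals(records, "Laptop")  # report filter set to "Laptop"
print(all_products)
print(laptops_only)
```

Both results are keyed by salesperson; only the records feeding the totals change.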

Steps to Add a Report Filter

  • Initialize the Workbook: Create a Workbook object to manage your Excel file.
  • Load the Excel File: Use Workbook.LoadFromFile() to load an existing file containing a Pivot Table.
  • Access the Worksheet: Use Workbook.Worksheets[] to select the desired worksheet.
  • Locate the Pivot Table: Use Worksheet.PivotTables[] to access the specific Pivot Table.
  • Define the Report Filter: Create a PivotReportFilter object specifying the field to filter.
  • Apply the Report Filter: Add the filter to the Pivot Table using XlsPivotTable.ReportFilters.Add().
  • Save the Updated File: Use Workbook.SaveToFile() to save your changes.

Code Example

  • Python
from spire.xls import *

# Create an object of the Workbook class
workbook = Workbook()

# Load an existing Excel file containing a Pivot Table
workbook.LoadFromFile("Sample.xlsx")

# Access the first worksheet
sheet = workbook.Worksheets[0]

# Access the first Pivot Table in the worksheet
pt = sheet.PivotTables[0]

# Create a report filter for the field "Product"
reportFilter = PivotReportFilter("Product", True)

# Add the report filter to the pivot table
pt.ReportFilters.Add(reportFilter)

# Save the updated workbook to a new file
workbook.SaveToFile("AddReportFilter.xlsx", FileFormat.Version2016)
workbook.Dispose()


Apply Row Field Filter in Pivot Table

Row field filters allow you to filter data displayed in the row fields of an Excel Pivot Table. These filters can be based on labels (specific text values) or values (numeric criteria).
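
The difference between the two kinds of filter can be sketched in plain Python, independent of any Excel library. A label filter tests the row label itself, whereas a value filter tests the aggregated number shown for that row. The summary data and helper names below are hypothetical.

```python
# Hypothetical pivot summary: row label -> aggregated sales total
row_totals = {"Mike": 6000, "Anna": 9500, "Lee": 3200}


def label_filter_equal(totals, label):
    """Keep only the rows whose label equals the given text."""
    return {name: value for name, value in totals.items() if name == label}


def value_filter_greater_than(totals, threshold):
    """Keep only the rows whose aggregated value exceeds the threshold."""
    return {name: value for name, value in totals.items() if value > threshold}


by_label = label_filter_equal(row_totals, "Mike")
by_value = value_filter_greater_than(row_totals, 5000)
print(by_label)
print(by_value)
```

Here the label filter keeps a single row, while the value filter keeps every row whose total is above 5000.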

Steps to Add a Row Field Filter

  • Initialize the Workbook: Create a Workbook object to manage the Excel file.
  • Load the Excel File: Use Workbook.LoadFromFile() to load your target file containing a Pivot Table.
  • Access the Worksheet: Select the desired worksheet using Workbook.Worksheets[].
  • Locate the Pivot Table: Access the specific Pivot Table using Worksheet.PivotTables[].
  • Add a Row Field Filter: Add a label filter with XlsPivotTable.RowFields[].AddLabelFilter() or a value filter with XlsPivotTable.RowFields[].AddValueFilter().

  • Calculate Pivot Table Data: Use XlsPivotTable.CalculateData() to calculate the pivot table data.
  • Save the Updated File: Save your changes using Workbook.SaveToFile().

Code Example

  • Python
from spire.xls import *

# Create an object of the Workbook class
workbook = Workbook()

# Load an Excel file
workbook.LoadFromFile("Sample.xlsx")

# Get the first worksheet
sheet = workbook.Worksheets[0]

# Get the first pivot table
pt = sheet.PivotTables[0]

# Add a value filter to the first row field in the pivot table
pt.RowFields[0].AddValueFilter(PivotValueFilterType.GreaterThan, pt.DataFields[0], Int32(5000), None)

# Or add a label filter to the first row field in the pivot table
# pt.RowFields[0].AddLabelFilter(PivotLabelFilterType.Equal, "Mike", None)

# Calculate the pivot table data
pt.CalculateData()

# Save the resulting file
workbook.SaveToFile("AddRowFieldFilter.xlsx", FileFormat.Version2016)
workbook.Dispose()


Apply Column Field Filter in Pivot Table

Column field filters in Excel Pivot Tables allow you to filter data displayed in the column fields. Similar to row field filters, column field filters can be based on labels (text values) or values (numeric criteria).

Steps to Add Column Field Filter

  • Initialize the Workbook: Create a Workbook object to manage your Excel file.
  • Load the Excel File: Use Workbook.LoadFromFile() to open your file containing a Pivot Table.
  • Access the Worksheet: Select the target worksheet using Workbook.Worksheets[].
  • Locate the Pivot Table: Use Worksheet.PivotTables[] to access the desired Pivot Table.
  • Add a Column Field Filter: Add a label filter with XlsPivotTable.ColumnFields[].AddLabelFilter() or a value filter with XlsPivotTable.ColumnFields[].AddValueFilter().

  • Calculate Pivot Table Data: Use XlsPivotTable.CalculateData() to calculate the Pivot Table data.
  • Save the Updated File: Save your changes using Workbook.SaveToFile().

Code Example

  • Python
from spire.xls import *

# Create an object of the Workbook class
workbook = Workbook()

# Load the Excel file
workbook.LoadFromFile("Sample.xlsx")

# Access the first worksheet
sheet = workbook.Worksheets[0]

# Access the first Pivot Table
pt = sheet.PivotTables[0]

# Add a label filter to the first column field
pt.ColumnFields[0].AddLabelFilter(PivotLabelFilterType.Equal, String("Laptop"), None)

# Or add a value filter to the first column field
# pt.ColumnFields[0].AddValueFilter(PivotValueFilterType.Between, pt.DataFields[0], Int32(5000), Int32(10000))

# Calculate the pivot table data
pt.CalculateData()

# Save the updated workbook
workbook.SaveToFile("AddColumnFieldFilter.xlsx", FileFormat.Version2016)
workbook.Dispose()


Conclusion

Filtering Pivot Tables in Excel is crucial for effective data analysis, allowing users to zoom in on relevant information without disrupting the table’s structure. Using Spire.XLS for Python, developers can easily automate adding, modifying, and managing filters on Pivot Tables programmatically. This tutorial covered the primary filter types—report filters, row field filters, and column field filters—with detailed code examples to help you get started quickly.

FAQs

Q: Can I add multiple filters to the same Pivot Table?

A: Yes, you can add multiple report filters, row filters, and column filters simultaneously to customize your data views with Spire.XLS.

Q: Do filters update automatically if the source data changes?

A: Yes, after refreshing the Pivot Table or recalculating with CalculateData(), filters will reflect the latest data.

Q: Can I filter based on custom conditions?

A: Spire.XLS supports many filter types including label filters (equals, begins with, contains) and value filters (greater than, less than, between).

Q: Is it possible to remove filters programmatically?

A: Yes, you can remove filters by clearing or resetting the respective filter collections or fields.

Get a Free License

To fully experience the capabilities of Spire.XLS for Python without any evaluation limitations, you can request a free 30-day trial license.

PowerPoint presentations often contain background images that enhance the visual appeal of slides. Extracting these background images can be crucial for designers and content managers who wish to reuse, analyze, or archive slide visuals independently of the slide content.

This guide provides a clear, step-by-step approach to extracting background images from PowerPoint presentations in .NET using C# and the Spire.Presentation for .NET library.


Why Extract Background Images from PowerPoint

Extracting background images from PowerPoint provides several key benefits:

  • Reuse Designs: Repurpose background images in other presentations or design projects.
  • Analyze Slides: Review and understand slide designs by examining background images separately.
  • Archive Visuals: Store background images for documentation, backup, or future use.

Install .NET PowerPoint Library – Spire.Presentation for .NET

Spire.Presentation for .NET is a robust .NET PowerPoint library that enables developers to create, manipulate, and convert PowerPoint presentations without the need for Microsoft PowerPoint.

Key Features of Spire.Presentation for .NET

Here are some key features of Spire.Presentation for .NET:

  • Create and edit PowerPoint presentations.
  • Convert PowerPoint to other formats such as PDF, Images, HTML, Markdown, and XPS.
  • Secure PowerPoint presentations.
  • Merge/split PowerPoint presentations.
  • Slides management, including adding/removing slides, setting/extracting/removing backgrounds, and more.
  • Image/shape/chart/smartart insertion and manipulation.
  • Animate text/shapes.

Install Spire.Presentation for .NET

Before starting the background image extraction process, you will need to install Spire.Presentation for .NET into your C# project using one of the following methods:

Option 1: Install via NuGet (Recommended)

Install-Package Spire.Presentation

Option 2: Manually Add DLLs to Your Project

  • Download the Spire.Presentation package and extract the files.
  • In Visual Studio, right-click References > Add Reference > Browse, then select the appropriate Spire.Presentation.dll based on your target framework.

Extract Background Images from PowerPoint in .NET using C#

Background images in PowerPoint can be applied directly to individual slides or inherited from slide masters. This section demonstrates how to extract both types of background images using Spire.Presentation.

Extract Background Images from Individual Slides

To extract background images from individual slides in PowerPoint, follow these steps:

  • Create a Presentation object and load the presentation.
  • Loop through all slides in the presentation.
  • Check if the slide’s background fill type is image fill (FillFormatType.Picture).
  • If yes, retrieve and save the background image.

Sample Code

  • C#
using Spire.Presentation;
using Spire.Presentation.Drawing;
using System.Drawing.Imaging; // Required for ImageFormat.Png
using System.IO;

namespace ExtractSlideBackgroundImages
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Specify the input file path and output folder
            string inputFile = @"example1.pptx";
            string outputFolder = @"ExtractedBackgrounds\Slides";

            // Load the PowerPoint presentation
            Presentation presentation = new Presentation();
            presentation.LoadFromFile(inputFile);

            // Create the output folder
            Directory.CreateDirectory(outputFolder);

            // Loop through all slides
            for (int i = 0; i < presentation.Slides.Count; i++)
            {
                // Check if the slide's background fill type is image fill
                var fill = presentation.Slides[i].SlideBackground.Fill;
                if (fill.FillType == FillFormatType.Picture)
                {
                    // Extract and save the background image
                    var image = fill.PictureFill.Picture.EmbedImage;
                    if (image != null)
                    {
                        string outputPath = Path.Combine(outputFolder, $"SlideBackground_{i + 1}.png");
                        image.Image.Save(outputPath, ImageFormat.Png);
                    }
                }
            }
        }
    }
}

Extract Background Images from Slide Masters

Slide masters define the design and layout of slides, including background images. To extract background images from slide masters:

  • Create a Presentation object and load the presentation.
  • Loop through all slide masters in the presentation.
  • For each master, check if its background fill type is image fill.
  • If yes, extract and save the background image.

Sample Code

  • C#
using Spire.Presentation;
using Spire.Presentation.Drawing;
using System.Drawing.Imaging;
using System.IO;

namespace ExtractBackgroundImages
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Specify the input file path and output folder
            string inputFile = @"example2.pptx";
            string outputFolder = @"C:\ExtractedBackgrounds\Masters";

            // Load the PowerPoint presentation
            Presentation presentation = new Presentation();
            presentation.LoadFromFile(inputFile);

            // Create the output folder
            Directory.CreateDirectory(outputFolder);

            // Loop through all slide masters
            for (int i = 0; i < presentation.Masters.Count; i++)
            {
                // Check if the slide master's background fill type is image fill
                var fill = presentation.Masters[i].SlideBackground.Fill;
                if (fill.FillType == FillFormatType.Picture)
                {
                    // Extract and save the background image
                    var image = fill.PictureFill.Picture.EmbedImage;
                    if (image != null)
                    {
                        string outputPath = Path.Combine(outputFolder, $"MasterBackground_{i + 1}.png");
                        image.Image.Save(outputPath, ImageFormat.Png);
                    }
                }
            }

        }
    }
}

Conclusion

Extracting background images from PowerPoint presentations is a crucial technique for developers and designers who want to access slide visuals independently of content. By leveraging the Spire.Presentation for .NET library with C#, you can programmatically extract background images from both individual slides and slide masters with ease.

FAQs

Q: What image formats are supported for extraction?

A: Extracted images can be saved in PNG, JPEG, BMP, or other formats supported by .NET.

Q: In addition to background images, can I extract other images from PowerPoint slides?

A: Yes, you can extract other images, such as those embedded within slide content or shapes, using Spire.Presentation.

Q: Does Spire.Presentation support extracting text from PowerPoint presentations?

A: Yes, Spire.Presentation can also extract text from slides, including text in shapes, tables, and more.

Get a Free License

To fully experience the capabilities of Spire.Presentation for .NET without any evaluation limitations, you can request a free 30-day trial license.
