Spire.Office Knowledgebase Page 49 | E-iceblue

Python: Split Word Documents

2024-03-27 01:03:24 Written by Koohji

Managing Word documents efficiently often means splitting them into smaller parts, and doing this by hand is time-consuming and labor-intensive. Spire.Doc for Python lets you segment Word documents programmatically, whether you need to extract specific parts of a document, break a lengthy document into smaller chunks, or streamline data extraction. This article demonstrates how to use Spire.Doc for Python to split a Word document into multiple documents.

Because Word content reflows dynamically, a document is usually split at its page breaks or section breaks. This article therefore covers two approaches: splitting by page breaks and splitting by section breaks.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to: How to Install Spire.Doc for Python on Windows
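After installation, a quick check that the package is visible to your interpreter can save debugging time later. The following is a minimal sketch using only the standard library; the helper name is_importable is hypothetical, and "spire" is assumed to be the top-level package name because the samples in this article import from spire.doc.

```python
import importlib.util


def is_importable(top_level_package: str) -> bool:
    """Return True if the named top-level package can be located by the import system."""
    return importlib.util.find_spec(top_level_package) is not None


# "spire" is the top-level package behind `from spire.doc import *`
print("spire importable:", is_importable("spire"))
```

If this prints False, the pip command was most likely run for a different Python interpreter than the one executing your script.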

Split a Word Document by Page Breaks with Python

Page breaks allow for the forced pagination of a document, thereby achieving a fixed division of content. By using page breaks as divisions, we can split a Word document into smaller content-related documents. The detailed steps for splitting a Word document by page breaks are as follows:

  • Create an instance of Document class and load a Word document using Document.LoadFromFile() method.
  • Create a new document, add a section to it using Document.AddSection() method.
  • Iterate through all body child objects in each section in the original document and check if the child object is a paragraph or a table.
  • If the child object is a table, add it to the section in the new document using Section.Body.ChildObjects.Add() method.
  • If the child object is a paragraph, add the paragraph object to the section in the new document. Then, iterate through all child objects of the paragraph and check if a child object is a page break.
  • If the child object in the paragraph is a page break, get its index using Paragraph.ChildObjects.IndexOf() method and remove it from the paragraph by its index.
  • Save the new document using Document.SaveToFile() method and repeat the above process.
  • Python
from spire.doc import *
from spire.doc.common import *

inputFile = "Sample.docx"
outputFolder = "output/SplitDocument/"

# Create an instance of Document
original = Document()
# Load a Word document
original.LoadFromFile(inputFile)

# Create a new word document and add a section to it
newWord = Document()
section = newWord.AddSection()
original.CloneDefaultStyleTo(newWord)
original.CloneThemesTo(newWord)
original.CloneCompatibilityTo(newWord)

index = 0
# Iterate through all sections of original document
for m in range(original.Sections.Count):
    sec = original.Sections.get_Item(m)
    # Iterate through all body child objects of each section
    for k in range(sec.Body.ChildObjects.Count):
        obj = sec.Body.ChildObjects.get_Item(k)
        if isinstance(obj, Paragraph):
            para = obj
            sec.CloneSectionPropertiesTo(section)
            # Add paragraph object in original section into section of new document
            section.Body.ChildObjects.Add(para.Clone())
            for j in range(para.ChildObjects.Count):
                parobj = para.ChildObjects.get_Item(j)
                if isinstance(parobj, Break) and parobj.BreakType == BreakType.PageBreak:
                    # Get the index of page break in paragraph
                    i = para.ChildObjects.IndexOf(parobj)
                    # Remove the page break from its paragraph
                    section.Body.LastParagraph.ChildObjects.RemoveAt(i)
                    # Save the new document
                    resultF = outputFolder
                    resultF += "SplitByPageBreak-{0}.docx".format(index)
                    newWord.SaveToFile(resultF, FileFormat.Docx)
                    index += 1
                    # Create a new document and add a section
                    newWord = Document()
                    section = newWord.AddSection()
                    original.CloneDefaultStyleTo(newWord)
                    original.CloneThemesTo(newWord)
                    original.CloneCompatibilityTo(newWord)
                    sec.CloneSectionPropertiesTo(section)
                    # Add paragraph object in original section into section of new document
                    section.Body.ChildObjects.Add(para.Clone())
                    if section.Paragraphs[0].ChildObjects.Count == 0:
                        # Remove the first blank paragraph
                        section.Body.ChildObjects.RemoveAt(0)
                    else:
                        # Remove the child objects before the page break
                        while i >= 0:
                            section.Paragraphs[0].ChildObjects.RemoveAt(i)
                            i -= 1
        if isinstance(obj, Table):
            # Add table object in original section into section of new document
            section.Body.ChildObjects.Add(obj.Clone())

# Save the document
result = outputFolder+"SplitByPageBreak-{0}.docx".format(index)
newWord.SaveToFile(result, FileFormat.Docx2013)
newWord.Close()
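The page-break logic above can be easier to follow on a simplified model. The sketch below does not use Spire.Doc: it assumes a document is a plain list of paragraphs, each paragraph is a list of items, and a hypothetical PAGE_BREAK sentinel stands in for a Break object whose BreakType is BreakType.PageBreak. As a simplification, paragraphs left empty after the split are dropped rather than removed in a separate step.

```python
# Hypothetical sentinel standing in for a page-break child object
PAGE_BREAK = object()


def split_by_page_breaks(paragraphs):
    """Split a list of paragraphs (each a list of items) at PAGE_BREAK markers.

    Returns a list of parts; each part is a list of paragraphs.
    """
    parts = [[]]  # start with one empty part, like the first new document
    for para in paragraphs:
        current = []  # items of this paragraph that belong to the current part
        for item in para:
            if item is PAGE_BREAK:
                # Content before the break stays in the current part
                if current:
                    parts[-1].append(current)
                # Content after the break starts a new part
                parts.append([])
                current = []
            else:
                current.append(item)
        if current:
            parts[-1].append(current)
    return parts


sample = [["a", "b"], ["c", PAGE_BREAK, "d"], ["e"]]
print(len(split_by_page_breaks(sample)))  # number of resulting parts
```

As in the Spire.Doc sample, a paragraph containing a page break is divided: the items before the break close the current part and the items after it open the next one.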

Split a Word Document by Section Breaks with Python

Sections divide a Word document into different logical parts and allow for independent formatting for each section. By splitting a Word document into sections, we can obtain multiple documents with relatively independent content and formatting. The detailed steps for splitting a Word document by section breaks are as follows:

  • Create an instance of Document class and load a Word document using Document.LoadFromFile() method.
  • Iterate through each section in the document.
  • Get a section using Document.Sections.get_Item() method.
  • Create a new Word document and copy the section in the original document to the new document using Document.Sections.Add() method.
  • Save the new document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an instance of Document class
document = Document()
# Load a Word document
document.LoadFromFile("Sample.docx")

# Iterate through all sections
for i in range(document.Sections.Count):
    section = document.Sections.get_Item(i)
    result = "output/SplitDocument/" + "SplitBySectionBreak_{0}.docx".format(i+1)
    # Create a new Word document
    newWord = Document()
    # Add the section to the new document
    newWord.Sections.Add(section.Clone())
    #Save the new document
    newWord.SaveToFile(result)
    newWord.Close()
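Both samples write into "output/SplitDocument/". The article does not say whether SaveToFile() creates missing folders, so it may be safer to create the folder first. The following is a minimal sketch under that assumption; the helper name numbered_output_path is hypothetical and only the standard library is used.

```python
from pathlib import Path


def numbered_output_path(folder, prefix, number, suffix=".docx"):
    """Create the output folder if needed and return a numbered file path as a string."""
    out_dir = Path(folder)
    # parents=True creates intermediate folders; exist_ok=True avoids an error on reruns
    out_dir.mkdir(parents=True, exist_ok=True)
    return str(out_dir / f"{prefix}{number}{suffix}")
```

In the section-break loop, the result string could then be built as numbered_output_path("output/SplitDocument", "SplitBySectionBreak_", i + 1) and passed to newWord.SaveToFile().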

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Python: Protect or Unprotect Excel Files

2024-03-26 01:06:36 Written by Koohji

Excel files often contain sensitive and confidential information, such as financial data, personal information, trade secrets, or proprietary formulas. When sharing these files over the internet or between organizations, there might be a risk of data leaks, theft, or unauthorized modifications. To address this concern, Excel provides a comprehensive set of protection features, such as password-protecting workbooks, restricting editing on worksheets, and locking cells, which enable users to establish multiple layers of security to control data access and maintain the integrity of their Excel files. In this article, you will learn how to protect and unprotect Excel workbooks and worksheets in Python using Spire.XLS for Python.

Install Spire.XLS for Python

This scenario requires Spire.XLS for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.

pip install Spire.XLS

If you are unsure how to install, please refer to this tutorial: How to Install Spire.XLS for Python on Windows
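Since a specific plum-dispatch version is named above, it can help to confirm which version is actually installed. The following is a minimal sketch using the standard library's importlib.metadata; the helper names are hypothetical, and treating the requirement as an exact match on "1.7.4" is an assumption.

```python
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def has_required_version(distribution: str, required: str) -> bool:
    """Return True only if the distribution is installed at exactly the required version."""
    return installed_version(distribution) == required


print("plum-dispatch:", installed_version("plum-dispatch"))
print("matches 1.7.4:", has_required_version("plum-dispatch", "1.7.4"))
```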

Password Protect an Entire Workbook in Python

By encrypting an Excel document with a password, you ensure that the data within the document remains secure and inaccessible to unauthorized individuals. The following are the steps to password-protect a workbook using Spire.XLS for Python.

  • Create a Workbook object.
  • Load an Excel file using Workbook.LoadFromFile() method.
  • Protect the workbook with a password using Workbook.Protect() method.
  • Save the workbook to another Excel file using Workbook.SaveToFile() method.
  • Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
workbook = Workbook()

# Load an Excel workbook
workbook.LoadFromFile("Sample.xlsx")

# Protect the workbook with a password
workbook.Protect("psd-123")

# Save the workbook to another Excel file
workbook.SaveToFile("Encrypted.xlsx", ExcelVersion.Version2016)
workbook.Dispose()
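The sample above hard-codes the password for brevity. In a real script you may prefer to keep it out of the source file. The following is a minimal sketch, assuming the password is supplied through an environment variable; the helper name password_from_env and the variable name WORKBOOK_PASSWORD are hypothetical.

```python
import os


def password_from_env(var_name: str) -> str:
    """Read a password from an environment variable and fail loudly if it is missing."""
    value = os.environ.get(var_name)
    if not value:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return value
```

The call in the sample would then become workbook.Protect(password_from_env("WORKBOOK_PASSWORD")).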

Protect a Worksheet with a Specific Protection Type in Python

If you want to authorize others to view your Excel document while limiting the types of changes they can make to a worksheet, you can protect the worksheet with a specific protection type. The table below lists a variety of pre-defined protection types under the SheetProtectionType enumeration.

Protection Type       Allow users to
Content               Modify or insert content.
DeletingColumns       Delete columns.
DeletingRows          Delete rows.
Filtering             Set filters.
FormattingCells       Format cells.
FormattingColumns     Format columns.
FormattingRows        Format rows.
InsertingColumns      Insert columns.
InsertingRows         Insert rows.
InsertingHyperlinks   Insert hyperlinks.
LockedCells           Select locked cells.
UnlockedCells         Select unlocked cells.
Objects               Modify drawing objects.
Scenarios             Modify saved scenarios.
Sorting               Sort data.
UsingPivotTables      Use pivot tables and pivot charts.
All                   Do any operations listed above on the protected worksheet.
none                  Do nothing on the protected worksheet.

The following steps show you how to protect a worksheet with a specific protection type using Spire.XLS for Python.

  • Create a Workbook object.
  • Load an Excel file using Workbook.LoadFromFile() method.
  • Get a specific worksheet through Workbook.Worksheets[index] property.
  • Protect the worksheet with a password and a specific protection type using Worksheet.Protect(password:str, options:SheetProtectionType) method.
  • Save the workbook to another Excel file using Workbook.SaveToFile() method.
  • Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
workbook = Workbook()

# Load an Excel workbook
workbook.LoadFromFile("Sample.xlsx")

# Get the first worksheet
worksheet = workbook.Worksheets[0]

# Protect the worksheet with a password and a specific protection type
worksheet.Protect("psd-permission", SheetProtectionType.none)

# Save the workbook to another Excel file
workbook.SaveToFile("ProtectWorksheet.xlsx", ExcelVersion.Version2016)
workbook.Dispose()

Allow Users to Edit Ranges in a Protected Worksheet in Python

In certain cases, you may want to let users edit specific ranges of a worksheet while preserving the integrity of the other data. The following steps demonstrate how to do this using Spire.XLS for Python.

  • Create a Workbook object.
  • Load an Excel file using Workbook.LoadFromFile() method.
  • Get a specific worksheet through Workbook.Worksheets[index] property.
  • Specify editable cell ranges using Worksheet.AddAllowEditRange() method.
  • Protect the worksheet with a password and a specific protection type using Worksheet.Protect(password:str, options:SheetProtectionType) method.
  • Save the workbook to another Excel file using Workbook.SaveToFile() method.
  • Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
workbook = Workbook()

# Load an Excel workbook
workbook.LoadFromFile("Sample.xlsx")

# Get the first worksheet
sheet = workbook.Worksheets[0]

# Add ranges that allow editing
sheet.AddAllowEditRange("Range One", sheet.Range["A5:A6"])
sheet.AddAllowEditRange("Range Two", sheet.Range["A8:B11"])

# Protect the worksheet with a password and a protection type
sheet.Protect("psd-permission", SheetProtectionType.All)

# Save the workbook to another Excel file
workbook.SaveToFile("AllowEditRange.xlsx", ExcelVersion.Version2016)
workbook.Dispose()
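The editable ranges above are given as A1-style address strings such as "A5:A6". If those strings come from user input or a config file, a quick syntax check before passing them to sheet.Range[...] can catch typos early. The following is a minimal sketch; the helper name looks_like_a1_range is hypothetical, and the pattern checks syntax only, not whether the address fits within a worksheet's actual limits.

```python
import re

# One to three column letters followed by a row number that does not start with 0,
# optionally followed by ":" and a second cell reference of the same form
_A1_RANGE = re.compile(r"[A-Z]{1,3}[1-9][0-9]*(:[A-Z]{1,3}[1-9][0-9]*)?")


def looks_like_a1_range(address: str) -> bool:
    """Return True if the string is shaped like an A1-style cell or range address."""
    return _A1_RANGE.fullmatch(address) is not None
```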

Unprotect a Password Protected Worksheet in Python

To remove the protection of a password-protected worksheet, you need to invoke the Worksheet.Unprotect() method and pass the original password to the method as a parameter. The detailed steps are as follows.

  • Create a Workbook object.
  • Load an Excel file using Workbook.LoadFromFile() method.
  • Get a specific worksheet through Workbook.Worksheets[index] property.
  • Remove the password protection using Worksheet.Unprotect(password:str) method.
  • Save the workbook to another Excel file using Workbook.SaveToFile() method.
  • Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
workbook = Workbook()

# Load an Excel workbook containing protected worksheet
workbook.LoadFromFile("ProtectWorksheet.xlsx")

# Get the first worksheet
sheet = workbook.Worksheets[0]

# Unprotect the worksheet using the specified password
sheet.Unprotect("psd-permission")

# Save the workbook to another Excel file
workbook.SaveToFile("UnprotectWorksheet.xlsx", ExcelVersion.Version2016)
workbook.Dispose()

Remove or Reset the Password of an Encrypted Workbook in Python

To remove or reset the password of an encrypted workbook, you can use the Workbook.UnProtect() or the Workbook.Protect() method. The following steps show you how to load an encrypted Excel document and remove or change its password.

  • Create a Workbook object.
  • Specify the open password through Workbook.OpenPassword property.
  • Load the encrypted Excel file using Workbook.LoadFromFile() method.
  • Remove the encryption using Workbook.UnProtect() method, or change the password using Workbook.Protect() method.
  • Save the workbook to another Excel file using Workbook.SaveToFile() method.
  • Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
workbook = Workbook()

# Specify the open password
workbook.OpenPassword = "psd-123"

# Load an encrypted Excel workbook
workbook.LoadFromFile("Encrypted.xlsx")

# Unprotect the workbook
workbook.UnProtect()

# Reset password
# workbook.Protect("newpassword")

# Save the workbook to another Excel file
workbook.SaveToFile("UnprotectWorkbook.xlsx", ExcelVersion.Version2016)
workbook.Dispose()

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

How to Add or Remove Auto Filters in Excel in Python

Excel’s AutoFilter feature is a powerful tool that allows you to quickly filter worksheet data based on specific criteria. When you apply an AutoFilter to a range of cells, only the rows that meet the given conditions are displayed and the rest of the data is hidden.

Filters simplify workflows, but knowing how to remove them is equally important for keeping datasets accurate, accessible, and error-free. In this article, you will learn how to add or remove AutoFilters in Excel in Python using the Spire.XLS for Python library.

Installation Guide for Spire.XLS for Python

Spire.XLS for Python is a robust library that enables developers to automate AutoFilter operations in Excel, including adding or removing auto filters.

To install the Python library, open your terminal or command prompt and run the following:

pip install Spire.XLS

The pip tool will search for the latest version of the Spire.XLS library on the Python Package Index (PyPI) and then download and install it along with any necessary dependencies.

How to Use Excel Auto Filters in Python

Add AutoFilter in Excel in Python

Excel AutoFilter can be applied to entire columns or specified cell ranges. The following are the core properties used:

  • Worksheet.AutoFilters property: Gets the collection of auto filters in the worksheet and returns an AutoFiltersCollection object.
  • AutoFiltersCollection.Range property: Specifies the cell range to be filtered.

Code Example:

  • Python
from spire.xls import *
from spire.xls.common import *

inputFile = "Data.xlsx"
outputFile = "ExcelAutoFilter.xlsx"

# Create a Workbook instance
workbook = Workbook()

# Load an Excel file
workbook.LoadFromFile(inputFile)

# Get the first worksheet
sheet = workbook.Worksheets[0]

# Create an AutoFilter in the sheet and specify the range to be filtered
sheet.AutoFilters.Range = sheet.Range["A1:C1"]

# Save the result file
workbook.SaveToFile(outputFile, ExcelVersion.Version2016)
workbook.Dispose()

Result: Dropdown arrows appear in the header row for filtering.

Different Excel Filter Types in Spire.XLS

The AutoFiltersCollection class of the Spire.XLS for Python library offers various methods for you to filter data in Excel in different ways. Check below for the details:

Filters                          Details
Filter text data                 Use the AddFilter() method to filter cells that contain the specified text content.
Filter dates                     Use the AddDateFilter() method to filter dates associated with the specified year, month, date, etc.
Filter blank / non-blank cells   • Use the MatchBlanks() method to filter out non-blanks and display only the empty rows in a given range.
                                 • Use the MatchNonBlanks() method to filter out blanks and display only the rows that contain values in a given range.
Filter by color                  • Use the AddFillColorFilter() method to filter cells that are filled with the specified background color.
                                 • Use the AddFontColorFilter() method to filter by the specified font color.
                                 • Use the AddIconFilter() method to filter by the specified cell icon.
Custom filter                    Use the CustomFilter() method to filter by custom criteria.

Apply Custom Auto Filter in Excel in Python

After adding one of the above filters, you can use the AutoFiltersCollection.Filter() method to apply the filter within the given range. The following is a code example of applying a custom AutoFilter to filter data that is not equal to the specified text string.

  • Python
from spire.xls import *
from spire.xls.common import *

inputFile = "Data.xlsx"
outputFile = "CustomFilter.xlsx"

# Create a Workbook instance
workbook = Workbook()

# Load an Excel file
workbook.LoadFromFile(inputFile)

# Get the first worksheet
sheet = workbook.Worksheets[0]

# Create an auto filter in the sheet and specify the range to be filtered
sheet.AutoFilters.Range = sheet.Range["B1:B12"]

# Get the column to be filtered
filtercolumn = sheet.AutoFilters[0]

# Add a custom filter to keep only data that is not equal to the string "Drinks"
strCrt = String("Drinks")
sheet.AutoFilters.CustomFilter(filtercolumn, FilterOperatorType.NotEqual, strCrt)

# Apply the filter
sheet.AutoFilters.Filter()

# Save the result file
workbook.SaveToFile(outputFile, ExcelVersion.Version2016)
workbook.Dispose()

Result: Only cells that are not equal to the string “Drinks” are visible.
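To make the expected result concrete, the effect of a NotEqual filter can be modeled in plain Python. The sketch below does not call Spire.XLS; the helper name rows_kept_by_not_equal is hypothetical and the category values are made up for illustration. It assumes the header sits in row 1, so data starts in row 2, as in the B1:B12 range used above.

```python
def rows_kept_by_not_equal(column_values, excluded, first_data_row=2):
    """Return the worksheet row numbers that stay visible under a 'not equal' filter."""
    return [
        first_data_row + offset
        for offset, value in enumerate(column_values)
        if value != excluded
    ]


# Made-up values for cells B2:B5
categories = ["Drinks", "Snacks", "Drinks", "Fruit"]
print(rows_kept_by_not_equal(categories, "Drinks"))
```

Rows whose value equals "Drinks" are hidden by the filter rather than deleted, which is why clearing the filter later brings them back.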

How to Remove Auto Filters in Excel in Python

AutoFilters are great for focusing on specific data, but leaving them active can lead to critical issues. Removing auto filters ensures:

  • Full data disclosure: All rows/columns are visible.
  • Consistent formatting: Eliminates dropdown arrows for a cleaner look.
  • Less confusion: Recipients will not mistake filtered data for the complete dataset.

Spire.XLS for Python provides the AutoFiltersCollection.Clear() method to remove or delete all AutoFilters from an Excel worksheet. Here’s the complete code example:

  • Python
from spire.xls import *
from spire.xls.common import *

inputFile = "CustomFilter.xlsx"
outputFile = "RemoveAutoFilter.xlsx"

# Create a Workbook instance
workbook = Workbook()

# Load an Excel file
workbook.LoadFromFile(inputFile)

# Get the first worksheet
sheet = workbook.Worksheets[0]

# Delete AutoFilter from the sheet
sheet.AutoFilters.Clear()

# Save the result file
workbook.SaveToFile(outputFile, ExcelVersion.Version2016)
workbook.Dispose()

Result: All rows are visible, and AutoFilter dropdowns are removed.

Conclusion

With Spire.XLS for Python, adding or removing auto filters in Excel becomes a seamless, automated process. This guide covered installation, basic and custom filters, and code examples to help you streamline data tasks. For more advanced features, explore the Python Excel library’s full documentation.

Get a Free License

To fully experience the capabilities of Spire.XLS for Python without any evaluation limitations, you can request a free 30-day trial license.
