Efficiently managing Word documents often requires the task of splitting them into smaller sections. However, manually performing this task can be time-consuming and labor-intensive. Fortunately, Spire.Doc for Python provides a convenient and efficient way to programmatically segment Word documents, helping users to extract specific parts of a document, split lengthy documents into smaller chunks, and streamline data extraction. This article demonstrates how to use Spire.Doc for Python to split a Word document into multiple documents in Python.
The splitting of a Word document is typically done by page breaks and section breaks due to the dynamic nature of document content. Therefore, this article focuses on the following two parts:
- Split a Word Document by Page Breaks with Python
- Split a Word Document by Section Breaks with Python
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.
pip install Spire.Doc
If you are unsure how to install, please refer to: How to Install Spire.Doc for Python on Windows
Split a Word Document by Page Breaks with Python
Page breaks allow for the forced pagination of a document, thereby achieving a fixed division of content. By using page breaks as divisions, we can split a Word document into smaller content-related documents. The detailed steps for splitting a Word document by page breaks are as follows:
- Create an instance of Document class and load a Word document using Document.LoadFromFile() method.
- Create a new document, add a section to it using Document.AddSection() method.
- Iterate through all body child objects in each section in the original document and check if the child object is a paragraph or a table.
- If the child object is a table, add it to the section in the new document using Section.Body.ChildObjects.Add() method.
- If the child object is a paragraph, add the paragraph object to the section in the new document. Then, iterate through all child objects of the paragraph and check if a child object is a page break.
- If the child object in the paragraph is a page break, get its index using Paragraph.ChildObjects.IndexOf() method and remove it from the paragraph by its index.
- Save the new document using Document.SaveToFile() method and repeat the above process.
- Python
from spire.doc import *
from spire.doc.common import *
inputFile = "Sample.docx"
outputFolder = "output/SplitDocument/"
# Create an instance of Document
original = Document()
# Load a Word document
original.LoadFromFile(inputFile)
# Create a new word document and add a section to it
newWord = Document()
section = newWord.AddSection()
original.CloneDefaultStyleTo(newWord)
original.CloneThemesTo(newWord)
original.CloneCompatibilityTo(newWord)
index = 0
# Iterate through all sections of original document
for m in range(original.Sections.Count):
sec = original.Sections.get_Item(m)
# Iterate through all body child objects of each section
for k in range(sec.Body.ChildObjects.Count):
obj = sec.Body.ChildObjects.get_Item(k)
if isinstance(obj, Paragraph):
para = obj if isinstance(obj, Paragraph) else None
sec.CloneSectionPropertiesTo(section)
# Add paragraph object in original section into section of new document
section.Body.ChildObjects.Add(para.Clone())
for j in range(para.ChildObjects.Count):
parobj = para.ChildObjects.get_Item(j)
if isinstance(parobj, Break) and ( parobj if isinstance(parobj, Break) else None).BreakType == BreakType.PageBreak:
# Get the index of page break in paragraph
i = para.ChildObjects.IndexOf(parobj)
# Remove the page break from its paragraph
section.Body.LastParagraph.ChildObjects.RemoveAt(i)
# Save the new document
resultF = outputFolder
resultF += "SplitByPageBreak-{0}.docx".format(index)
newWord.SaveToFile(resultF, FileFormat.Docx)
index += 1
# Create a new document and add a section
newWord = Document()
section = newWord.AddSection()
original.CloneDefaultStyleTo(newWord)
original.CloneThemesTo(newWord)
original.CloneCompatibilityTo(newWord)
sec.CloneSectionPropertiesTo(section)
# Add paragraph object in original section into section of new document
section.Body.ChildObjects.Add(para.Clone())
if section.Paragraphs[0].ChildObjects.Count == 0:
# Remove the first blank paragraph
section.Body.ChildObjects.RemoveAt(0)
else:
# Remove the child objects before the page break
while i >= 0:
section.Paragraphs[0].ChildObjects.RemoveAt(i)
i -= 1
if isinstance(obj, Table):
# Add table object in original section into section of new document
section.Body.ChildObjects.Add(obj.Clone())
# Save the document
result = outputFolder+"SplitByPageBreak-{0}.docx".format(index)
newWord.SaveToFile(result, FileFormat.Docx2013)
newWord.Close()

Split a Word Document by Section Breaks with Python
Sections divide a Word document into different logical parts and allow for independent formatting for each section. By splitting a Word document into sections, we can obtain multiple documents with relatively independent content and formatting. The detailed steps for splitting a Word document by section breaks are as follows:
- Create an instance of Document class and load a Word document using Document.LoadFromFile() method.
- Iterate through each section in the document.
- Get a section using Document.Sections.get_Item() method.
- Create a new Word document and copy the section in the original document to the new document using Document.Sections.Add() method.
- Save the new document using Document.SaveToFile() method.
- Python
from spire.doc import *
from spire.doc.common import *
# Create an instance of Document class
document = Document()
# Load a Word document
document.LoadFromFile("Sample.docx")
# Iterate through all sections
for i in range(document.Sections.Count):
section = document.Sections.get_Item(i)
result = "output/SplitDocument/" + "SplitBySectionBreak_{0}.docx".format(i+1)
# Create a new Word document
newWord = Document()
# Add the section to the new document
newWord.Sections.Add(section.Clone())
#Save the new document
newWord.SaveToFile(result)
newWord.Close()

Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.