While Word is a powerful tool for creating and formatting documents, it is not optimized for advanced data management and analysis. In contrast, Excel excels at handling data in tabular form, allowing users to perform calculations, create charts, and conduct thorough data analysis.
Generally, converting complex Word documents into Excel spreadsheets is not advisable, as Excel may struggle to preserve the original layout. However, if your Word document primarily consists of tables, converting it to Excel can be highly beneficial. This transformation unlocks Excel's advanced functions, formulas, and visualization tools, enabling you to organize your data more effectively and improve your reporting and decision-making capabilities. In this article, we will focus specifically on how to convert this kind of Word documents to Excel in Python using Spire.Office for Python.
Install Spire.Office for Python
This scenario requires Spire.Office for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.
pip install Spire.Office
Convert Word to Excel in Python
This process uses two libraries in the Spire.Office for Python package. They're Spire.XLS for Python and Spire.Doc for Python. The former is used to read and extract content from a Word document, and the latter is used to create an Excel document and write data in specific cells. To make this code example easy to understand, we have defined the following three custom methods that handle specific tasks:
- ExportTableInExcel() - Export data from a Word table to specified Excel cells.
- CopyContentInTable() - Copy content from a table cell in Word to an Excel cell.
- CopyTextAndStyle() - Copy text with formatting from a Word paragraph to an Excel cell.
The following steps demonstrate how to export data from an entire Word document to an Excel worksheet using Spire.Office for Python.
- Create a Document object to load a Word file.
- Create a Worbbook object and add a worksheet named "WordToExcel" to it.
- Traverse through all the sections in the Word document and all the document objects under a certain section, and then determine if a document object is a paragraph or a table.
- If the document object is a paragraph, write the paragraph in a specified cell in Excel using CoypTextAndStyle() method.
- If the document object is a table, export the table data from Word to Excel cells using ExportTableInExcel() method.
- Auto fit the row height and column width in Excel so that the data within a cell will not exceed the bound of the cell.
- Save the workbook to an Excel file using Workbook.SaveToFile() method.
- Python
from spire.xls import *
from spire.doc import *
# Export data from Word table to Excel cells
def ExportTableInExcel(worksheet, row, table):
for rowIndex in range(len(table.Rows)):
tbRow = table.Rows[rowIndex]
column = 1
for cellIndex in range(len(tbRow.Cells)):
tbCell = tbRow.Cells[cellIndex]
cell = worksheet.Range[row, column]
cell.BorderAround()
CopyContentInTable(worksheet, tbCell, cell)
column += 1
row += 1
return row
# Copy content from a Word table cell to an Excel cell
def CopyContentInTable(worksheet, tbCell, cell):
newPara = Paragraph(tbCell.Document)
for i in range(len(tbCell.ChildObjects)):
documentObject = tbCell.ChildObjects[i]
if isinstance(documentObject, Paragraph):
paragraph = documentObject
for cObj in range(len(paragraph.ChildObjects)):
newPara.ChildObjects.Add(paragraph.ChildObjects[cObj].Clone())
if i < len(tbCell.ChildObjects) - 1:
newPara.AppendText("\n")
CopyTextAndStyle(worksheet, cell, newPara)
# Copy text and style of a paragraph to a cell
def CopyTextAndStyle(worksheet, cell, paragraph):
richText = cell.RichText
richText.Text = paragraph.Text
startIndex = 0
for documentObject in range(len(paragraph.ChildObjects)):
documentObject = paragraph.ChildObjects[documentObject]
if isinstance(documentObject, TextRange):
textRange = documentObject
fontName = textRange.CharacterFormat.FontName
isBold = textRange.CharacterFormat.Bold
textColor = textRange.CharacterFormat.TextColor
fontSize = textRange.CharacterFormat.FontSize
textRangeText = textRange.Text
strLength = len(textRangeText)
font = worksheet.Workbook.CreateFont()
font.Color = textColor
font.IsBold = isBold
font.Size = fontSize
font.FontName = fontName
endIndex = startIndex + strLength
richText.SetFont(startIndex, endIndex, font)
startIndex += strLength
if isinstance(documentObject, DocPicture):
picture = documentObject
worksheet.Pictures.Add(cell.Row, cell.Column, picture.Image)
worksheet.SetRowHeightInPixels(cell.Row, 1, picture.Image.Height)
if paragraph.Format.HorizontalAlignment == HorizontalAlignment.Left:
cell.Style.HorizontalAlignment = HorizontalAlignType.Left
elif paragraph.Format.HorizontalAlignment == HorizontalAlignment.Center:
cell.Style.HorizontalAlignment = HorizontalAlignType.Center
elif paragraph.Format.HorizontalAlignment == HorizontalAlignment.Right:
cell.Style.HorizontalAlignment = HorizontalAlignType.Right
# Create a Document object
doc = Document()
# Load a Word file
doc.LoadFromFile("C:/Users/Administrator/Desktop/Invoice.docx")
# Create a Workbook object
wb = Workbook()
# Remove the default worksheets
wb.Worksheets.Clear()
# Create a worksheet named "WordToExcel"
worksheet = wb.CreateEmptySheet("WordToExcel")
row = 1
column = 1
# Loop through the sections in the Word document
for sec_index in range(doc.Sections.Count):
section = doc.Sections[sec_index]
# Loop through the document object under a certain section
for obj_index in range(section.Body.ChildObjects.Count):
documentObject = section.Body.ChildObjects[obj_index]
# Determine if the object is a paragraph
if isinstance(documentObject, Paragraph):
cell = worksheet.Range[row, column]
paragraph = documentObject
# Copy paragraph from Word to a specific cell
CopyTextAndStyle(worksheet, cell, paragraph)
row += 1
# Determine if the object is a table
if isinstance(documentObject, Table):
table = documentObject
# Export table data from Word to Excel
currentRow = ExportTableInExcel(worksheet, row, table)
row = currentRow
# Auto fit row height and column width
worksheet.AllocatedRange.AutoFitRows()
worksheet.AllocatedRange.AutoFitColumns()
# Wrap text in cells
worksheet.AllocatedRange.IsWrapText = True
# Save the workbook to an Excel file
wb.SaveToFile("WordToExcel.xlsx", ExcelVersion.Version2013)
wb.Dispose()
doc.Dispose()

Get a Free License
To fully experience the capabilities of Spire.Doc for Python without any evaluation limitations, you can request a free 30-day trial license.
