Spire.Office Knowledgebase Page 48 | E-iceblue

Word documents often contain valuable data in the form of tables, which can be used for reporting, data analysis, and record-keeping. However, manually extracting and transferring these tables to other formats can be a time-consuming and error-prone task. By automating this process using Python, we can save time, ensure accuracy, and maintain consistency. Spire.Doc for Python provides a seamless solution for the table extraction task, making it effortless to create accessible and manageable files with data from Word document tables. This article will demonstrate how to leverage Spire.Doc for Python to extract tables from Word documents and write them into text files and Excel worksheets.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed on Windows with the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to: How to Install Spire.Doc for Python on Windows

Extract Tables from Word Documents to Text Files with Python

Spire.Doc for Python offers the Section.Tables property to retrieve a collection of tables within a section of a Word document. Then, developers can use the properties and methods under the ITable class to access the data in the tables and write it into a text file. This provides a convenient solution for converting Word document tables into text files.

The detailed steps for extracting tables from Word documents to text files are as follows:

  • Create an object of Document class and load a Word document using Document.LoadFromFile() method.
  • Iterate through the sections in the document and get the table collection of each section through Section.Tables property.
  • Iterate through the tables and create a string object for each table.
  • Iterate through the rows in each table and the cells in each row, get the text of each cell through TableCell.Paragraphs[].Text property, and add the cell text to the string.
  • Save each string to a text file.

Python
from spire.doc import *
from spire.doc.common import *

# Create an instance of Document
doc = Document()

# Load a Word document
doc.LoadFromFile("Sample.docx")

# Loop through the sections
for s in range(doc.Sections.Count):
    # Get a section
    section = doc.Sections.get_Item(s)
    # Get the tables in the section
    tables = section.Tables
    # Loop through the tables
    for i in range(0, tables.Count):
        # Get a table
        table = tables.get_Item(i)
        # Initialize a string to store the table data
        tableData = ''
        # Loop through the rows of the table
        for j in range(0, table.Rows.Count):
            # Loop through the cells of the row
            for k in range(0, table.Rows.get_Item(j).Cells.Count):
                # Get a cell
                cell = table.Rows.get_Item(j).Cells.get_Item(k)
                # Get the text in the cell
                cellText = ''
                for para in range(cell.Paragraphs.Count):
                    paragraphText = cell.Paragraphs.get_Item(para).Text
                    cellText += (paragraphText + ' ')
                # Add the text to the string
                tableData += cellText
                if k < table.Rows.get_Item(j).Cells.Count - 1:
                    tableData += '\t'
            # Add a new line
            tableData += '\n'
    
        # Save the table data to a text file
        with open(f'output/Tables/WordTable_{s+1}_{i+1}.txt', 'w', encoding='utf-8') as f:
            f.write(tableData)
doc.Close()
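
The string-building logic above can be tried without a Word document. The following is a minimal sketch, assuming the table has already been read into nested Python lists (rows, then cells, then paragraph texts). `table_to_tsv` and `save_table` are hypothetical helper names for illustration and are not part of Spire.Doc. The sketch joins paragraphs with a single space, which avoids the trailing space the loop above leaves in each cell, and it creates the output folder first, because Python's `open()` does not create missing directories.

```python
import os

def table_to_tsv(rows):
    """Join paragraph texts with spaces, cells with tabs, rows with newlines."""
    lines = []
    for row in rows:
        cells = [" ".join(paragraphs) for paragraphs in row]
        lines.append("\t".join(cells))
    return "".join(line + "\n" for line in lines)

def save_table(rows, path):
    """Write the table text to path, creating the parent folder if needed."""
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(table_to_tsv(rows))

# Hypothetical sample data standing in for a Word table
sample = [
    [["Name"], ["Notes"]],
    [["Alice"], ["First line", "second line"]],
]
```

Calling `save_table(sample, "output/Tables/WordTable_1_1.txt")` would write two tab-delimited lines, with both paragraphs of the last cell on the same line.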

Python: Extract Tables from Word Documents

Extract Tables from Word Documents to Excel Workbooks with Python

Developers can also utilize Spire.Doc for Python to retrieve table data and then use Spire.XLS for Python to write the table data into an Excel worksheet, thereby enabling the conversion of Word document tables into Excel workbooks.

Install Spire.XLS for Python via PyPI:

pip install Spire.XLS

The detailed steps for extracting tables from Word documents to Excel workbooks are as follows:

  • Create an object of Document class and load a Word document using Document.LoadFromFile() method.
  • Create an object of Workbook class and clear the default worksheets using Workbook.Worksheets.Clear() method.
  • Iterate through the sections in the document and get the table collection of each section through Section.Tables property.
  • Iterate through the tables and create a worksheet for each table using Workbook.Worksheets.Add() method.
  • Iterate through the rows in each table and the cells in each row, get the text of each cell through TableCell.Paragraphs[].Text property, and write the text to the worksheet using Worksheet.SetCellValue() method.
  • Save the workbook using Workbook.SaveToFile() method.

Python
from spire.doc import *
from spire.doc.common import *
from spire.xls import *
from spire.xls.common import *

# Create an instance of Document
doc = Document()

# Load a Word document
doc.LoadFromFile('Sample.docx')

# Create an instance of Workbook
wb = Workbook()
wb.Worksheets.Clear()

# Loop through sections in the document
for i in range(doc.Sections.Count):
    # Get a section
    section = doc.Sections.get_Item(i)
    # Loop through tables in the section
    for j in range(section.Tables.Count):
        # Get a table
        table = section.Tables.get_Item(j)
        # Create a worksheet
        ws = wb.Worksheets.Add(f'Table_{i+1}_{j+1}')
        # Write the table to the worksheet
        for row in range(table.Rows.Count):
            # Get a row
            tableRow = table.Rows.get_Item(row)
            # Loop through cells in the row
            for cell in range(tableRow.Cells.Count):
                # Get a cell
                tableCell = tableRow.Cells.get_Item(cell)
                # Get the text in the cell
                cellText = ''
                for p in range(tableCell.Paragraphs.Count):
                    paragraph = tableCell.Paragraphs.get_Item(p)
                    cellText = cellText + (paragraph.Text + ' ')
                # Write the cell text to the worksheet
                ws.SetCellValue(row + 1, cell + 1, cellText)

# Save the workbook
wb.SaveToFile('output/Tables/WordTableToExcel.xlsx', FileFormat.Version2016)
doc.Close()
wb.Dispose()
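
The example names its worksheets `Table_{i+1}_{j+1}`, which is always a valid Excel sheet name. If the names were instead derived from document text, such as a table caption (an assumption; the article does not do this), they would need to respect Excel's naming rules: at most 31 characters, none of the characters `: \ / ? * [ ]`, and no duplicates within a workbook. The sketch below shows one possible way to clean a name before passing it to Workbook.Worksheets.Add(). `safe_sheet_name` is a hypothetical helper, not part of Spire.XLS.

```python
import re

# Characters Excel does not allow in worksheet names
_FORBIDDEN = re.compile(r"[:\\/?*\[\]]")

def safe_sheet_name(name, used):
    """Return a cleaned, unique sheet name and record it in the 'used' set."""
    # Replace forbidden characters, trim whitespace, and cap at 31 characters
    base = _FORBIDDEN.sub("_", name).strip()[:31] or "Sheet"
    candidate = base
    n = 2
    # Sheet names are compared case-insensitively, so track lowercase forms
    while candidate.lower() in used:
        suffix = f"_{n}"
        candidate = base[:31 - len(suffix)] + suffix
        n += 1
    used.add(candidate.lower())
    return candidate
```

A single `used = set()` would be shared across all calls for one workbook so that repeated captions receive numbered suffixes.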


Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license.

Python: Reorder Columns or Rows in Excel

2024-05-29 01:05:22 Written by Koohji

Reordering columns or rows in Excel is a simple process that allows you to change the arrangement of data within your spreadsheet. This can be useful for better organizing your data or aligning it with other columns or rows. You can reorder by using drag-and-drop, cut and paste, or keyboard shortcuts depending on the version of Excel you are using.

This article focuses on how to programmatically reorder columns or rows in an Excel worksheet in Python using Spire.XLS for Python.

Install Spire.XLS for Python

This scenario requires Spire.XLS for Python and plum-dispatch v1.7.4. They can be easily installed in your system through the following pip command.

pip install Spire.XLS

If you are unsure how to install, please refer to this tutorial: How to Install Spire.XLS for Python on Windows

Reorder Columns in Excel in Python

Spire.XLS does not provide a straightforward way to reorganize the order of columns or rows within an Excel worksheet. The solution requires creating a duplicate of the target worksheet. Then, you can copy the columns or rows from the copied worksheet and paste them into the original worksheet in the new preferred column or row sequence.

The following are the steps to reorder columns in an Excel worksheet using Python.

  • Create a Workbook object.
  • Load an Excel document from the specified file path.
  • Get the target worksheet using Workbook.Worksheets[index] property.
  • Specify the new column order within a list.
  • Create a temporary sheet and copy the data from the target sheet into it.
  • Copy the columns from the temporary worksheet to the target worksheet in the desired order using Worksheet.Columns[index].Copy() method.
  • Remove the temporary sheet.
  • Save the workbook to a different Excel document.

Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
workbook = Workbook()

# Load the Excel document
workbook.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.xlsx")

# Get a specific worksheet
targetSheet = workbook.Worksheets[0]

# Specify the new column order in a list (the column index starts from 0)
newColumnOrder = [3, 0, 1, 2, 4, 5, 6, 7]

# Add a temporary worksheet
tempSheet = workbook.Worksheets.Add("temp")

# Copy data from the target worksheet to the temporary sheet
tempSheet.CopyFrom(targetSheet)

# Iterate through the newColumnOrder list
for i in range(len(newColumnOrder)):

    # Copy the column from the temporary sheet to the target sheet in the new order
    tempSheet.Columns[newColumnOrder[i]].Copy(targetSheet.Columns[i], True, True)

    # Reset the column width in the target sheet
    targetSheet.Columns[i].ColumnWidth = tempSheet.Columns[newColumnOrder[i]].ColumnWidth

# Remove the temporary sheet
workbook.Worksheets.Remove(tempSheet)

# Save the workbook to another Excel file
workbook.SaveToFile("output/ReorderColumns.xlsx", FileFormat.Version2016)

# Dispose resources
workbook.Dispose()
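
The meaning of the order list is easy to misread: the value at position i is the index of the source column that ends up at position i. The following sketch illustrates this on a plain Python list of header letters, with no Excel file involved. `reorder` is a hypothetical helper, and the check that the list names every index exactly once is a design choice of this sketch, not something Spire.XLS enforces.

```python
def reorder(items, new_order):
    """Return items rearranged so that result[i] == items[new_order[i]]."""
    # Guard against skipped or duplicated indexes, which would drop or repeat data
    if sorted(new_order) != list(range(len(items))):
        raise ValueError("new_order must list every index exactly once")
    return [items[src] for src in new_order]

headers = ["A", "B", "C", "D", "E", "F", "G", "H"]
new_column_order = [3, 0, 1, 2, 4, 5, 6, 7]

print(reorder(headers, new_column_order))
# ['D', 'A', 'B', 'C', 'E', 'F', 'G', 'H']
```

With the order list used in the example above, the fourth column (D) moves to the front and the first three columns shift one place to the right.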


Reorder Rows in Excel in Python

Rearranging the rows in an Excel spreadsheet follows a similar approach to reorganizing the columns. The steps to reorder the rows within an Excel worksheet are as outlined below.

  • Create a Workbook object.
  • Load an Excel document from the specified file path.
  • Get the target worksheet using Workbook.Worksheets[index] property.
  • Specify the new row order within a list.
  • Create a temporary sheet and copy the data from the target sheet into it.
  • Copy the rows from the temporary worksheet to the target worksheet in the desired order using Worksheet.Rows[index].Copy() method.
  • Remove the temporary sheet.
  • Save the workbook to a different Excel document.

Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
workbook = Workbook()

# Load the Excel document
workbook.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.xlsx")

# Get a specific worksheet
targetSheet = workbook.Worksheets[0]

# Specify the new row order in a list (the row index starts from 0)
newRowOrder = [0, 2, 3, 1, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# Add a temporary worksheet
tempSheet = workbook.Worksheets.Add("temp")

# Copy data from the first worksheet to the temporary sheet
tempSheet.CopyFrom(targetSheet)

# Iterate through the newRowOrder list
for i in range(len(newRowOrder)):

    # Copy the row from the temporary sheet to the target sheet in the new order
    tempSheet.Rows[newRowOrder[i]].Copy(targetSheet.Rows[i], True, True)

    # Reset the row height in the target sheet
    targetSheet.Rows[i].RowHeight = tempSheet.Rows[newRowOrder[i]].RowHeight

# Remove the temporary sheet
workbook.Worksheets.Remove(tempSheet)

# Save the workbook to another Excel file
workbook.SaveToFile("output/ReorderRows.xlsx", FileFormat.Version2016)

# Dispose resources
workbook.Dispose()
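
Writing a long order list by hand is error-prone when only one row or column needs to move. As a minimal sketch, the hypothetical helper `move_index` below (not part of Spire.XLS) builds the full list from the item count, the index to move, and its destination.

```python
def move_index(count, src, dst):
    """Build an order list of length 'count' that moves index src to position dst."""
    order = list(range(count))
    order.insert(dst, order.pop(src))
    return order

# Reproduces the row order used above: row 1 moves to position 3
print(move_index(13, 1, 3))
# [0, 2, 3, 1, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

The same helper reproduces the column order from the previous section with `move_index(8, 3, 0)`.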



Java: Convert Markdown to Word and PDF

2024-05-28 01:25:53 Written by Koohji

Markdown is a popular format among writers and developers for its simplicity and readability, allowing content to be formatted using easy-to-write plain text syntax. However, converting Markdown files to universally accessible formats like Word documents and PDF files is essential for sharing documents with readers, enabling complex formatting, and ensuring compatibility and consistency across devices and platforms. This article demonstrates how to convert Markdown files to Word and PDF files with Spire.Doc for Java.

Install Spire.Doc for Java

First of all, you're required to add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>14.4.0</version>
    </dependency>
</dependencies>

Convert a Markdown File to a Word Document with Java

Spire.Doc for Java provides a simple way to convert Markdown format to Word and PDF document formats by using the Document.loadFromFile(String: fileName, FileFormat.Markdown) method to load the Markdown file and the Document.saveToFile(String: fileName, FileFormat: fileFormat) method to save the file as a Word or PDF document.

It should be noted that since images are stored as links in Markdown files, they need to be further processed after conversion if they are to be retained.
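
Before converting, it can help to know which images a Markdown file references. The sketch below lists inline image links with a regular expression. It only reads the Markdown source, so it is independent of Spire.Doc, and it is written in Python to match the other examples on this page. The pattern is a simplification that covers the common `![alt](target "title")` form only; reference-style images and targets containing spaces or parentheses are not handled. `find_image_links` and the sample text are hypothetical.

```python
import re

# Inline image syntax: ![alt text](target "optional title")
IMAGE_LINK = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)(?:\s+"[^"]*")?\)')

def find_image_links(markdown_text):
    """Return (alt_text, target) pairs for inline images in the text."""
    return IMAGE_LINK.findall(markdown_text)

sample_md = (
    "# Title\n"
    "![logo](images/logo.png)\n"
    'Text ![chart](https://example.com/chart.png "Chart") more\n'
    "[not an image](page.html)\n"
)

for alt, target in find_image_links(sample_md):
    print(alt, "->", target)
```

Relative targets such as `images/logo.png` are resolved against the location of the Markdown file, so those files need to be available if the images are to be restored after conversion.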

The detailed steps for converting a Markdown file to a Word document are as follows:

  • Create an instance of Document class.
  • Load a Markdown file using Document.loadFromFile(String: fileName, FileFormat.Markdown) method.
  • Save the Markdown file as a Word document using Document.saveToFile(String: fileName, FileFormat.Docx) method.

Java
import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class MarkdownToWord {
    public static void main(String[] args) {
        // Create an instance of Document
        Document doc = new Document();

        // Load a Markdown file
        doc.loadFromFile("Sample.md", FileFormat.Markdown);

        // Save the Markdown file as Word document
        doc.saveToFile("output/MarkdownToWord.docx", FileFormat.Docx);
        doc.dispose();
    }
}


Convert a Markdown File to a PDF Document with Java

By passing the FileFormat.PDF enum value as the format parameter of the Document.saveToFile() method, the Markdown file can be converted directly to a PDF document.

The detailed steps for converting a Markdown file to a PDF document are as follows:

  • Create an instance of Document class.
  • Load a Markdown file using Document.loadFromFile(String: fileName, FileFormat.Markdown) method.
  • Save the Markdown file as a PDF document using Document.saveToFile(String: fileName, FileFormat.PDF) method.

Java
import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class MarkdownToPDF {
    public static void main(String[] args) {
        // Create an instance of the Document class
        Document doc = new Document();

        // Load a Markdown file
        doc.loadFromFile("Sample.md", FileFormat.Markdown);

        // Save the Markdown file as a PDF file
        doc.saveToFile("output/MarkdownToPDF.pdf", FileFormat.PDF);
        doc.dispose();
    }
}


Customizing Page Settings of the Result Document

Spire.Doc for Java also provides methods under the PageSetup class for configuring the page before conversion, giving control over settings such as the page size, orientation, and margins of the resulting document.

The following are the steps to customize the page settings of the resulting document:

  • Create an instance of Document class.
  • Load a Markdown file using Document.loadFromFile(String: fileName, FileFormat.Markdown) method.
  • Get the first section using Document.getSections().get() method.
  • Set the page size, page orientation, and page margins using methods under PageSetup class.
  • Save the Markdown file as a PDF document using Document.saveToFile(String: fileName, FileFormat.PDF) method.

Java
import com.spire.doc.Document;
import com.spire.doc.FileFormat;
import com.spire.doc.PageSetup;
import com.spire.doc.Section;
import com.spire.doc.documents.MarginsF;
import com.spire.doc.documents.PageOrientation;
import com.spire.doc.documents.PageSize;

public class PageSettingMarkdown {
    public static void main(String[] args) {
        // Create an instance of the Document class
        Document doc = new Document();

        // Load a Markdown file
        doc.loadFromFile("Sample.md", FileFormat.Markdown);

        // Get the first section
        Section section = doc.getSections().get(0);

        // Set the page size, orientation, and margins
        PageSetup pageSetup = section.getPageSetup();
        pageSetup.setPageSize(PageSize.Letter);
        pageSetup.setOrientation(PageOrientation.Landscape);
        pageSetup.setMargins(new MarginsF(100, 100, 100, 100));

        // Save the Markdown file as a PDF file
        doc.saveToFile("output/MarkdownToPDF.pdf", FileFormat.PDF);
        doc.dispose();
    }
}


