Conversion

Conversion (19)

Python Convert HTML to Word DOC or DOCX

Converting HTML files to Word documents in Python is an essential skill for developers building documentation systems, report generators, or applications that transform web-based content into offline editable formats. While HTML excels at displaying content on the web, Word documents provide a more versatile format for offline access, collaboration, and professional presentation.

This in-depth developer guide shows you how to automate the conversion from HTML files and HTML strings into Word DOCX/DOC documents in Python using Spire.Doc for Python—a powerful, standalone library that enables high-quality Word document generation and conversion without the need for Microsoft Word.

Table of Contents

Why Convert HTML to Word Format

HTML is ideal for online content delivery, but Word documents offer significant advantages for use cases that require formatting, annotation, printing, or offline access:

  • Offline Access: View and edit documents without an internet connection.
  • Advanced Editing: Enable features like tracked changes, comments, and section formatting.
  • Professional Presentation: Suitable for formal reports, business contracts, user manuals, and documentation.
  • Cross-Platform Compatibility: Open and edit using Microsoft Word, Google Docs, LibreOffice, and other word processors.

Install HTML to Word Converter in Python

Spire.Doc for Python is a feature-rich library designed to help developers create, read, convert, and manipulate Word documents directly within Python applications. It offers high-fidelity conversion of HTML content to Word format while preserving the original structure and styles.

Spire.Doc for Python

Key Benefits

  • Fully preserves original HTML structure, CSS styles, and layout
  • Accepts both HTML files and HTML strings as input sources
  • Supports conversion to .doc, .docx, and other formats
  • 100% standalone; no Office automation needed

Installation

You can install the library from PyPI using the following pip command:

pip install spire.doc

Export HTML Files to Word Documents in Python

If you already have an HTML file—such as a saved webpage or generated HTML report—you can save it to a Word document with just a few lines of code.

Code Example

from spire.doc import *
from spire.doc.common import *

# Specify the input and output file paths
inputFile = "Input.html"
outputFile = "HtmlToWord.docx"

# Create an object of the Document class
document = Document()
# Load an HTML file 
document.LoadFromFile(inputFile, FileFormat.Html, XHTMLValidationType.none)

# Save the HTML file to a .docx file
document.SaveToFile(outputFile, FileFormat.Docx2016)
document.Close()

Explanation:

This example demonstrates how to load an existing .html file and save it to a Word .docx document:

  • Document(): creates a new Word document object.
  • LoadFromFile(): loads the HTML file and parses it as an HTML document.
  • XHTMLValidationType.none: disables strict validation of the HTML content.
  • SaveToFile(): saves the result as a .docx file using the FileFormat.Docx2016 format.

To export as .doc, replace FileFormat.Docx2016 with FileFormat.Doc.

Output:

Here is the Word document generated from the HTML file:

HTML File to Word Output

Insert HTML Strings into Word Documents in Python

Sometimes, you may have HTML content as a string—perhaps scraped from the web or dynamically generated. Spire.Doc allows you to insert such HTML content into a Word document without saving it as a file first.

Code Example

from spire.doc import *
from spire.doc.common import *

# Specify the output file path
outputFile = "HtmlStringToWord.docx"

# Create an object of the Document class
document = Document()
# Add a section to the document
sec = document.AddSection()

# Add a paragraph to the section
paragraph = sec.AddParagraph()

# Specify the HTML string
htmlString = """
<html>
<head>
    <title>HTML to Word Example</title>
    <style>
        body {
            font-family: Arial, sans-serif;
        }
        h1 {
            color: #FF5733;
            font-size: 24px;
            margin-bottom: 20px;
        }
        p {
            color: #333333;
            font-size: 16px;
            margin-bottom: 10px;
        }
        ul {
            list-style-type: disc;
            margin-left: 20px;
            margin-bottom: 15px;
        }
        li {
            font-size: 14px;
            margin-bottom: 5px;
        }
        table {
            border-collapse: collapse;
            width: 100%;
            margin-bottom: 20px;
        }
        th, td {
            border: 1px solid #CCCCCC;
            padding: 8px;
            text-align: left;
        }
        th {
            background-color: #F2F2F2;
            font-weight: bold;
        }
        td {
            color: #0000FF;
        }
    </style>
</head>
<body>
    <h1>This is a Heading</h1>
    <p>This is a paragraph demonstrating the conversion of HTML to Word document.</p>
    <p>Here's an example of an unordered list:</p>
    <ul>
        <li>Item 1</li>
        <li>Item 2</li>
        <li>Item 3</li>
    </ul>
    <p>And here's a table:</p>
    <table>
        <tr>
            <th>Product</th>
            <th>Quantity</th>
            <th>Price</th>
        </tr>
        <tr>
            <td>Jacket</td>
            <td>30</td>
            <td>$150</td>
        </tr>
        <tr>
            <td>Sweater</td>
            <td>25</td>
            <td>$99</td>
        </tr>
    </table>
</body>
</html>
"""

# Append the HTML string to the paragraph
paragraph.AppendHTML(htmlString)

# Save the result document
document.SaveToFile(outputFile, FileFormat.Docx2016)
document.Close()

Explanation:

This code converts an HTML string directly into Word content:

  • Document(): creates a new document.
  • AddSection() and AddParagraph(): adds a section and paragraph to hold the content.
  • AppendHTML(): parses and inserts the HTML string into the paragraph, preserving styles and structure.
  • SaveToFile(): saves the document to a .docx file using the FileFormat.Docx2016 format.

This approach is ideal for use cases like email-to-Word, content pulled from CMS platforms, or HTML snippets generated dynamically at runtime.

Output:

Here is the Word document generated from the HTML string:

HTML String to Word Output

Supported Output Formats

With Spire.Doc for Python, you’re not limited to Word output. You can also convert HTML to various formats, including:

Conclusion

Spire.Doc for Python provides a powerful solution for developers looking to convert HTML to Word documents with precision and efficiency. Whether you’re working with HTML files or strings, the library simplifies the process while maintaining the integrity of your content.

Give Spire.Doc a try today and see how effortlessly you can add professional document generation to your Python projects!

FAQs

Q1: Can I convert HTML to Word without installing Microsoft Word?

A1: Yes. Spire.Doc is a standalone component and does not require Word or Office on the machine.

Q2: Are CSS styles and tables preserved?

A2: Yes. The library retains CSS styles, tables, images, lists, fonts, and layout formatting.

Q3: Can I batch-convert multiple HTML files to Word?

A3: Absolutely. You can loop through folders and apply the same conversion logic to each file.

Q4: What other formats can I export HTML to?

A4: HTML can be converted to .doc, .docx, .pdf, image formats, .rtf, .xml, and more.

Q5: Is there a trial license?

A5: Yes. you can request a 30-day trial license for full functionality.

Convert Word DOCX or DOC to HTML in Python

Converting Word documents (DOCX or DOC) to HTML format is essential when you want to display formatted content on web pages, import legacy documents into content management systems, or generate web previews for DOCX files. HTML’s universal browser compatibility makes it an ideal format for sharing content online.

This guide shows how to convert Word to HTML in Python using Spire.Doc for Python. It covers both basic and advanced conversion techniques with practical examples, helping you handle diverse conversion needs.

Table of Contents

Why Convert Word to HTML?

Here are some typical scenarios where converting Word to HTML is beneficial:

  • Web publishing: Display Word content in a browser without requiring users to download the document.
  • CMS integration: Import Word-based articles into a web-based content system.
  • Content preview: Generate HTML previews for Word attachments or document archives.
  • Email rendering: Convert DOCX content into HTML-friendly formats for email templates.

Install Word to HTML Converter in Python

Spire.Doc for Python is a professional library designed for Word document processing and conversion. It provides a reliable way to export Word documents to HTML while preserving accurate formatting and layout. Spire.Doc for Python

Benefits of Using Spire.Doc for Word-to-HTML Conversion

  • Accurate formatting: Preserves fonts, colors, styles, tables, and images.
  • No Office dependency: Does not require Microsoft Word or Office Interop.
  • Supports DOCX and DOC: Compatible with both modern and legacy Word formats.
  • Customizable output: Fine-tune HTML export settings, including image embedding and CSS styling.

Installation

Install the library from PyPI using the following command:

pip install spire.doc

Need help with the installation? Check this step-by-step guide: How to Install Spire.Doc for Python on Windows.

How to Convert Word to HTML Using Python

This section demonstrates how to convert Word documents to HTML using Spire.Doc for Python. First, you'll see a quick example using default settings for fast export. Then, you'll learn how to customize the HTML output with advanced options.

Quick Conversion with Default Settings

The following code snippet shows how to save a Word document to HTML format using the default export settings. It’s suitable for simple use cases where no customization is needed.

from spire.doc import *
from spire.doc.common import *
     
# Create a Document instance
document = Document()

# Load a doc or docx document 
document.LoadFromFile("Statement.docx")

# Save the document to HTML format
document.SaveToFile("WordToHtml.html", FileFormat.Html)
document.Close()

Output of exporting Word document to HTML with default settings

Advanced Conversion Options

You can customize the HTML export to suit your needs by configuring options such as including headers and footers, linking to an external CSS stylesheet, choosing whether to embed images or save them separately, and exporting form fields as plain text. The example below shows how to set these options.

from spire.doc import *
from spire.doc.common import *

# Create a Document instance
document = Document()

# Load a .docx or .doc document
document.LoadFromFile("Statement.docx")

# Control whether to include headers and footers in the exported HTML
document.HtmlExportOptions.HasHeadersFooters = False

# Specify the name of the CSS file to use for styling the exported HTML
document.HtmlExportOptions.CssStyleSheetFileName = "sample.css"

# Set the CSS stylesheet type to external, so the HTML file links to the specified CSS file instead of embedding styles inline
document.HtmlExportOptions.CssStyleSheetType = CssStyleSheetType.External

# Configure image export: do not embed images inside HTML, save them to a separate folder
document.HtmlExportOptions.ImageEmbedded = False
document.HtmlExportOptions.ImagesPath = "Images/"

# Export form fields as plain text instead of interactive form elements
document.HtmlExportOptions.IsTextInputFormFieldAsText = True

# Save the document as an HTML file
document.SaveToFile("ToHtmlExportOption.html", FileFormat.Html)
document.Close()

Conclusion

Spire.Doc for Python delivers high-fidelity Word-to-HTML conversions without requiring Microsoft Word. Whether for quick exports or customized HTML output, it provides a versatile, dependable solution.

Beyond HTML conversion, Spire.Doc supports a wide range of Word automation tasks such as document merging, text replacement, and PDF conversion, empowering developers to build robust document processing pipelines. To explore these capabilities further, check out the full Python Word programming guide and start enhancing your document workflows today.

FAQs

Q1: Can Spire.Doc convert both DOC and DOCX files to HTML?

A1: Yes, it supports exporting both legacy DOC and modern DOCX formats.

Q2: Is Microsoft Word required for conversion?

A2: No, Spire.Doc works independently without needing Microsoft Word or Office Interop.

Q3: Can images be embedded directly in the HTML instead of saved separately?

A3: Yes, you can embed images directly into the HTML output by setting ImageEmbedded to True. This ensures that all images are included within the HTML file itself, without creating separate image files or folders.

Get a Free License

To fully experience the capabilities of Spire.Doc for Python without any evaluation limitations, you can request a free 30-day trial license.

Python: Convert Word to Images

2023-09-14 00:55:04 Written by Koohji

Converting a Word document into images can be a useful and convenient option when you want to share or present the content without worrying about formatting issues or compatibility across devices. By converting a Word document into images, you can ensure that the text, images, and formatting remain intact, making it an ideal solution for sharing documents on social media, websites, or through email. In this article, you will learn how to convert Word to PNG, JPEG or SVG in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Convert Word to PNG or JPEG in Python

Spire.Doc for Python offers the Document.SaveImageToStream() method to convert a certain page into a bitmap image. Afterwards, you can save the bitmap image to a popular image format such as PNG, JPEG, or BMP. The detailed steps are as follows.

  • Create a Document object.
  • Load a Word file using Document.LoadFromFile() method.
  • Retrieve each page in the document, and convert a specific page into a bitmap image using Document.SaveImageToStreams() method.
  • Save the bitmap image into a PNG or JPEG file.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Loop through the pages in the document
for i in range(document.GetPageCount()):

    # Convert a specific page to bitmap image
    imageStream = document.SaveImageToStreams(i, ImageType.Bitmap)

    # Save the bitmap to a PNG file
    with open('Output/ToImage-{0}.png'.format(i),'wb') as imageFile:
        imageFile.write(imageStream.ToArray())

document.Close()

Python: Convert Word to Images

Convert Word to SVG in Python

To convert a Word document into multiple SVG files, you can simply use the Document.SaveToFile() method. Here are the steps.

  • Create a Document object.
  • Load a Word file using Document.LoadFromFile() method.
  • Convert it to individual SVG files using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Convert it to SVG files
document.SaveToFile("output/ToSVG.svg", FileFormat.SVG)

document.Close()

Python: Convert Word to Images

Get a Free License

To fully experience the capabilities of Spire.Doc for Python without any evaluation limitations, you can request a free 30-day trial license.

Text files are a common file type that contain only plain text without any formatting or styles. If you want to apply formatting or add images, charts, tables, and other media elements to text files, one of the recommended solutions is to convert them to Word files.

Conversely, if you want to efficiently extract content or reduce the file size of Word documents, you can convert them to text format. This article will demonstrate how to programmatically convert text files to Word format and convert Word files to text format using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Convert Text (TXT) to Word in Python

Conversion from TXT to Word is quite simple that requires only a few lines of code. The following are the detailed steps.

  • Create a Document object.
  • Load a text file using Document.LoadFromFile(string fileName) method.
  • Save the text file as a Word file using Document.SaveToFile(string fileName, FileFormat fileFormat) method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a TXT file
document.LoadFromFile("input.txt")

# Save the TXT file as Word
document.SaveToFile("TxtToWord.docx", FileFormat.Docx2016)
document.Close()

Python: Convert Text to Word or Word to Text

Convert Word to Text (TXT) in Python

The Document.SaveToFile(string fileName, FileFormat.Txt) method provided by Spire.Doc for Python allows you to export a Word file to text format. The following are the detailed steps.

  • Create a Document object.
  • Load a Word file using Document.LoadFromFile(string fileName) method.
  • Save the Word file in txt format using Document.SaveToFile(string fileName, FileFormat.Txt) method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word file from disk
document.LoadFromFile("Input.docx")

# Save the Word file in txt format
document.SaveToFile("WordToTxt.txt", FileFormat.Txt)
document.Close()

Python: Convert Text to Word or Word to Text

Get a Free License

To fully experience the capabilities of Spire.Doc for Python without any evaluation limitations, you can request a free 30-day trial license.

Python: Convert Word to PDF

2023-08-22 01:01:05 Written by Koohji

Nowadays, digital documents play a significant role in our daily lives, both in personal and professional settings. One such common format is Microsoft Word - used for creating and editing textual documents. However, there may come a time when you need to convert your Word files into a more universally accessible format, such as PDF. PDFs offer advantages like preserving formatting, ensuring compatibility, and maintaining document integrity across different devices and operating systems. In this article, you will learn how to convert Word to PDF in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Convert Doc or Docx to PDF in Python

Spire.Doc for Python offers the Document.SaveToFile(string fileName, FileFormat fileFormat) method that allows to save Word as PDF, XPS, HTML, RTF, etc. If you just want to save your Word documents as regular PDFs without additional settings, follow the steps below.

  • Create a Document object.
  • Load a sample Word document using Document.LoadFromFile() method.
  • Save the document to PDF using Doucment.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create word document
document = Document()

# Load a doc or docx file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

#Save the document to PDF
document.SaveToFile("output/ToPDF.pdf", FileFormat.PDF)
document.Close()

Python: Convert Word to PDF

Convert Word to Password-Protected PDF in Python

To convert Word to a Password-Protected PDF, you can utilize the Document.SaveToFile(string fileName, ToPdfParameterList paramList) method, where the ToPdfParameterList parameter allows you to control the conversion process of a Word document into a PDF format. This includes options such as encrypting the document during the conversion. Here are the specific steps to accomplish this task.

  • Create a Document object.
  • Load a sample Word document using Document.LoadFromFile() method.
  • Create a ToPdfParameterList object, which is used to set conversion options.
  • Specify the open password and permission password and then set both passwords for the generated PDF using ToPdfParameterList.PdfSecurity.Encrypt() method.
  • Save the Word document to PDF with password using Doucment.SaveToFile(string fileName, ToPdfParameterList paramList) method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Create a ToPdfParameterList object
parameter = ToPdfParameterList()

# Specify open password and permission password
openPsd = "abc-123"
permissionPsd = "permission"

# Protect the PDF to be generated with open password and permission password
parameter.PdfSecurity.Encrypt(openPsd, permissionPsd, PdfPermissionsFlags.Default, PdfEncryptionKeySize.Key128Bit)

# Save the Word document to PDF
document.SaveToFile("output/ToPdfWithPassword.pdf", parameter)
document.Close()

Python: Convert Word to PDF

Convert Word to PDF with Bookmarks in Python

Adding bookmarks to a document can improve its readability. When creating PDF from Word, you may want to keep the existing bookmarks or create new ones based on the headings. Here are the steps to convert Word to PDF while maintaining bookmarks.

  • Create a Document object.
  • Load a Word file using Document.LoadFromFile() method.
  • Create a ToPdfParameterList object, which is used to set conversion options.
  • Create bookmarks in PDF from the headings in Word by setting ToPdfParameterList.CreateWordBookmarksUsingHeadings to true.
  • Save the document to PDF with bookmarks using Doucment.SaveToFile(string fileName, ToPdfParameterList paramList) method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Create a ToPdfParameterList object
parames = ToPdfParameterList()

# Create bookmarks from Word headings
parames.CreateWordBookmarksUsingHeadings = True

# Create bookmarks in PDF from existing bookmarks in Word
# parames.CreateWordBookmarks = True

# Save the document to PDF
document.SaveToFile("output/ToPdfWithBookmarks.pdf", FileFormat.PDF)
document.Close()

Python: Convert Word to PDF

Convert Word to PDF with Fonts Embedded in Python

To ensure consistent appearance of a PDF document on any device, you probably need to embed fonts in the generated PDF document. The following are the steps to embed the fonts used in a Word document into the resulting PDF.

  • Create a Document object.
  • Load a sample Word file using Document.LoadFromFile() method.
  • Create a ToPdfParameterList object, which is used to set conversion options.
  • Embed fonts in generated PDF through ToPdfParameterList.IsEmbeddedAllFonts property.
  • Save the document to PDF using Doucment.SaveToFile(string fileName, ToPdfParameterList paramList) method.
  • Python
from spire.doc import *
from spire.doc.common import *
        
# Create a Document object
document = Document()

# Load a Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Create a ToPdfParameterList object
parameter = ToPdfParameterList()

# Embed fonts in PDF
parameter.IsEmbeddedAllFonts = True

# Save the Word document to PDF
document.SaveToFile("output/EmbedFonts.pdf", parameter)
document.Close()

Python: Convert Word to PDF

Set Image Quality When Converting Word to PDF in Python

When converting a Word document to PDF, it is important to consider the size of the resulting file, especially if it contains numerous high-quality images. You have the option to compress the image quality during the conversion process. To do this, follow the steps below.

  • Create a Document object.
  • Load a sample Word file using Document.LoadFromFile() method.
  • Set the image quality through Document.JPEGQuality property.
  • Save the document to PDF using Doucment.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *
        
# Create a Document object
document = Document()

# Load a Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Compress image to 40% of its original quality
document.JPEGQuality = 40

# Preserve original image quality
# document.JPEGQuality = 100

# Save the Word document to PDF
document.SaveToFile("output/SetImageQuality.pdf", FileFormat.PDF)
document.Close()

Get a Free License

To fully experience the capabilities of Spire.Doc for Python without any evaluation limitations, you can request a free 30-day trial license.

Page 2 of 2
page 2