Convert HTML to Word and Word to HTML using C# .NET

Microsoft Word and HTML (Hypertext Markup Language) are two of the most widely used formats worldwide. Microsoft Word is the go-to solution for crafting rich, feature-packed documents such as reports, proposals, and print-ready files, while HTML is the foundational language that powers content on the web. Understanding how to effectively convert between these formats can enhance document usability and accessibility.

This article provides a step-by-step guide to converting HTML to Word and Word to HTML in .NET using C#.

Why Convert Between Word and HTML?

Before diving into the technical details, let's understand why you might need to convert between Word and HTML:

  • Cross-Platform Accessibility: HTML is the backbone of web pages, while Word documents are industry-standard for creating, sharing and editing content. Converting between them enables content to be accessible and editable across different platforms.
  • Rich Formatting: Word documents support complex formatting and elements; converting HTML to Word lets users retain formatting when exporting web content.
  • Document Archiving and Data Exchange: Archive HTML content as Word or publish Word-based reports to the web.

.NET Word Library Installation

The .NET framework does not natively support HTML or Word conversions. To bridge this gap, Spire.Doc for .NET provides a powerful, developer-friendly API for document creation, manipulation, and conversion—without requiring Microsoft Office or Interop libraries.

Install Spire.Doc for .NET

Before getting started with the conversion, you need to install Spire.Doc for .NET through one of the following methods:

Method 1: Install via NuGet

Run the following command in the NuGet Package Manager Console:

Install-Package Spire.Doc

Method 2: Manually Add the DLLs

You can also download the Spire.Doc for .NET package, extract the files, and then reference Spire.Doc.dll manually in your Visual Studio project.

How to Convert HTML to Word Using C#

Spire.Doc enables you to load HTML files or HTML strings and save them as Word documents. Let’s see how to implement these conversions.

Convert HTML String to Word

To convert an HTML string to Word format, follow these steps:

  • Create a Document Object: Instantiate a new Document object.
  • Add a Section and Paragraph: Create a section in the document and add a paragraph.
  • Append HTML String: Use the Paragraph.AppendHTML() method to include the HTML content.
  • Save the Document: Save the document using Document.SaveToFile() with the desired format (e.g., Docx).

Example Code

using Spire.Doc;
using Spire.Doc.Documents;
using System.IO;

namespace ConvertHtmlStringToWord
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Document object
            Document document = new Document();

            // Add a section to the document
            Section section = document.AddSection();

            // Set the page margins
            section.PageSetup.Margins.All = 2;

            // Add a paragraph to the section
            Paragraph paragraph = section.AddParagraph();

            // Read HTML string from a file
            string htmlFilePath = @"C:\Users\Administrator\Desktop\Html.html";
            string htmlString = File.ReadAllText(htmlFilePath, System.Text.Encoding.UTF8);

            // Append the HTML string to the paragraph
            paragraph.AppendHTML(htmlString);

            // Save the document to a Word file
            document.SaveToFile("AddHtmlStringToWord.docx", FileFormat.Docx);

            // Dispose resources
            document.Dispose();
        }
    }
}
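
The example above reads its HTML from a file. When the markup is generated in code, the same steps apply to an in-memory string. The following is a minimal sketch under that assumption: the class name InlineHtmlExample, the sample markup, and the output file name are hypothetical, and only the Spire.Doc calls already shown in this section are used.

```csharp
using Spire.Doc;
using Spire.Doc.Documents;

public static class InlineHtmlExample
{
    // Converts an HTML fragment held in memory to a .docx file.
    public static void Convert(string html, string outputPath)
    {
        Document document = new Document();
        try
        {
            // A paragraph is needed as the insertion point for the HTML
            Section section = document.AddSection();
            Paragraph paragraph = section.AddParagraph();

            // Parse the HTML string and append the result to the paragraph
            paragraph.AppendHTML(html);

            document.SaveToFile(outputPath, FileFormat.Docx);
        }
        finally
        {
            // Release resources even if the conversion throws
            document.Dispose();
        }
    }

    public static void Main(string[] args)
    {
        // Hypothetical markup, for illustration only
        string html =
            "<h1>Quarterly Report</h1>" +
            "<p>Revenue grew <b>12%</b> over the previous quarter.</p>" +
            "<ul><li>First item</li><li>Second item</li></ul>";

        Convert(html, "InlineHtmlToWord.docx");
    }
}
```

Wrapping the work in try/finally is a design choice of this sketch, not something the library requires. It keeps the Dispose call from being skipped when the HTML cannot be parsed.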

Convert HTML File to Word

If you have existing HTML files, converting them to Word is straightforward. Here’s how to do that:

  • Create a Document Object: Instantiate a new Document object.
  • Load the HTML File: Use Document.LoadFromFile() to load the HTML file.
  • Save as Word Format: Save the document using Document.SaveToFile() with the desired format (e.g., Docx).

Example Code

using Spire.Doc;

namespace ConvertHtmlToWord
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Document object
            Document document = new Document();
            // Load the HTML file
            document.LoadFromFile(@"C:\Users\Administrator\Desktop\MyHtml.html", FileFormat.Html);

            // Save the file as a Word document
            document.SaveToFile("HtmlToWord.docx", FileFormat.Docx);

            // Dispose resources
            document.Dispose();
        }
    }
}

How to Convert Word to HTML Using C#

Spire.Doc also supports exporting Word documents (such as .docx and .doc) to HTML format. You can perform basic conversion with default behavior, or customize the output using advanced settings.

Basic Word to HTML Conversion

To convert a Word document to an HTML file using default settings, follow these steps:

  • Create a Document Object: Instantiate a new Document object.
  • Load the Word Document: Use Document.LoadFromFile() to load the Word document.
  • Save as HTML File: Save the document using Document.SaveToFile() with HTML as the format.

Example Code

using Spire.Doc;

namespace BasicWordToHtmlConversion
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Document object
            Document document = new Document();
            // Load the Word document
            document.LoadFromFile("input.docx");

            // Save the document as an HTML file
            document.SaveToFile("BasicWordToHtmlConversion.html", FileFormat.Html);

            // Dispose resources
            document.Dispose();
        }
    }
}

Advanced Word to HTML Conversion Settings

To tailor the conversion process, use the HtmlExportOptions class, which allows you to adjust a variety of settings, including:

  • Whether to export the document's styles.
  • Whether to embed images in the converted HTML.
  • Whether to export headers and footers.
  • Whether to export form fields as text.

Follow these steps to convert a Word document to HTML with customized options:

  • Create a Document Object: Instantiate a new Document object.
  • Load the Word Document: Use Document.LoadFromFile() to load the Word document.
  • Get HtmlExportOptions: Access the HtmlExportOptions through Document.HtmlExportOptions.
  • Customize Conversion Settings: Modify the properties of HtmlExportOptions to customize the conversion.
  • Save as HTML File: Save the document using Document.SaveToFile() with HTML as the format.

Example Code

using Spire.Doc;

namespace AdvancedWordToHtmlConversion
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a Document object
            Document doc = new Document();

            //Load a Word document
            doc.LoadFromFile("sample.docx");

            HtmlExportOptions htmlExportOptions = doc.HtmlExportOptions;
            // Set whether to export the document styles
            htmlExportOptions.IsExportDocumentStyles = true;
            // Set whether to embed the images in the HTML
            htmlExportOptions.ImageEmbedded = true;
            // Set the type of the CSS style sheet
            htmlExportOptions.CssStyleSheetType = CssStyleSheetType.Internal;
            // Set whether to export headers and footers
            htmlExportOptions.HasHeadersFooters = true;
            // Set whether to export form fields as text
            htmlExportOptions.IsTextInputFormFieldAsText = false;

            // Save the document as an HTML file
            doc.SaveToFile("AdvancedWordToHtmlConversion.html", FileFormat.Html);
            doc.Close();
        }
    }
}

Conclusion

Converting HTML to Word and Word to HTML using C# and the Spire.Doc library is a seamless process that enhances document management and accessibility. By following the detailed steps outlined in this tutorial, developers can easily implement these conversions in their applications, improving workflow and productivity.

FAQs

Q1: Is it possible to batch convert multiple Word files to HTML using C#?

A1: Yes, you can loop through a list of Word files and apply the conversion logic in your C# code.
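
A batch conversion along these lines might look like the sketch below. It is an illustration under stated assumptions, not an official sample: the class BatchWordToHtml, its method names, and the folder names are hypothetical, while the Spire.Doc calls (LoadFromFile, SaveToFile, FileFormat.Html, Dispose) are the ones used earlier in this article.

```csharp
using System.IO;
using Spire.Doc;

public static class BatchWordToHtml
{
    // Builds the output path for one input file:
    // same base name, .html extension, placed in outputFolder.
    public static string GetOutputPath(string inputFile, string outputFolder)
    {
        string name = Path.GetFileNameWithoutExtension(inputFile) + ".html";
        return Path.Combine(outputFolder, name);
    }

    // Converts every .docx file in inputFolder and returns how many were converted.
    public static int ConvertFolder(string inputFolder, string outputFolder)
    {
        // No-op if the folder already exists
        Directory.CreateDirectory(outputFolder);

        int converted = 0;
        foreach (string file in Directory.GetFiles(inputFolder, "*.docx"))
        {
            Document document = new Document();
            try
            {
                document.LoadFromFile(file);
                document.SaveToFile(GetOutputPath(file, outputFolder), FileFormat.Html);
                converted++;
            }
            finally
            {
                // Dispose each document before loading the next one
                document.Dispose();
            }
        }
        return converted;
    }

    public static void Main(string[] args)
    {
        // Hypothetical folder names
        int count = ConvertFolder("input_folder", "output_folder");
        System.Console.WriteLine(count + " file(s) converted.");
    }
}
```

Creating a new Document per file keeps each conversion independent, so content from one file cannot carry over into the next.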

Q2: What types of HTML elements are supported during conversion to Word?

A2: Spire.Doc supports a wide range of HTML elements, including text, tables, images, lists, and more. However, certain elements not supported by Microsoft Word may also not be rendered correctly in Spire.Doc.

Q3: Can I convert formats other than HTML and Word?

A3: Yes. Spire.Doc supports various file format conversions, such as Word to PDF, Markdown to Word, Word to Markdown, RTF to Word, RTF to PDF.

Q4: Is Spire.Doc free to use?

A4: Spire.Doc offers a free version for lightweight use, but for extensive features and commercial use, a licensed version is recommended.

Get a Free License

To fully experience the capabilities of Spire.Doc for .NET without any evaluation limitations, you can request a free 30-day trial license.

C#/VB.NET: Convert Word to HTML

2022-03-27 06:14:00 Written by Koohji

To publish a Word document on the web, convert it to HTML so that it can be viewed as a web page. This article demonstrates how to convert Word to HTML programmatically in C# and VB.NET using Spire.Doc for .NET.

Install Spire.Doc for .NET

To begin with, you need to add the DLL files included in the Spire.Doc for .NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.

PM> Install-Package Spire.Doc

Convert Word to HTML

The following steps show you how to convert Word to HTML using Spire.Doc for .NET.

  • Create a Document instance.
  • Load a Word sample document using Document.LoadFromFile() method.
  • Save the document as an HTML file using Document.SaveToFile() method.

Example Code

using Spire.Doc;

namespace WordToHTML
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a Document instance
            Document mydoc = new Document();

            //Load a Word document
            mydoc.LoadFromFile("sample.docx");

            //Save to HTML
            mydoc.SaveToFile("WordToHTML.html", FileFormat.Html);
        }
    }
}

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

C#/VB.NET: Convert Text to Word or Word to Text

Text files are simple and versatile, but they don't support formatting options or advanced features such as headers, footers, page numbers, and styles, and they cannot include images or tables. Spell-checking and grammar-checking features are also unavailable in plain text editors.

If you need to add formatting, multimedia content, or advanced features to a text document, you'll need to convert it to a more advanced format like Word. Similarly, if you need to simplify the formatting of a Word document, reduce its file size, or work with its content using basic tools, you might need to convert it to a plain text format. In this article, we will explain how to convert text files to Word format and convert Word files to text format in C# and VB.NET using Spire.Doc for .NET library.

Install Spire.Doc for .NET

To begin with, you need to add the DLL files included in the Spire.Doc for .NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.

PM> Install-Package Spire.Doc

Convert a Text File to Word Format in C# and VB.NET

Spire.Doc for .NET offers the Document.LoadText(string fileName) method which enables you to load a text file. After the text file is loaded, you can easily save it in Word format by using the Document.SaveToFile(string fileName, FileFormat fileFormat) method. The detailed steps are as follows:

  • Initialize an instance of the Document class.
  • Load a text file using the Document.LoadText(string fileName) method.
  • Save the text file in Word format using the Document.SaveToFile(string fileName, FileFormat fileFormat) method.

Example Code

using Spire.Doc;

namespace ConvertTextToWord
{
    internal class Program
    {
        static void Main(string[] args)
        {
            //Initialize an instance of the Document class
            Document doc = new Document();
            //Load a text file
            doc.LoadText("Sample.txt");

            //Save the text file in Word format
            doc.SaveToFile("TextToWord.docx", FileFormat.Docx2016);
            doc.Close();
        }
    }
}

Convert a Word File to Text Format in C# and VB.NET

To convert a Word file to text format, you just need to load the Word file using the Document.LoadFromFile(string fileName) method, and then call the Document.SaveToFile(string fileName, FileFormat fileFormat) method to save it in text format. The detailed steps are as follows:

  • Initialize an instance of the Document class.
  • Load a Word file using the Document.LoadFromFile(string fileName) method.
  • Save the Word file in text format using the Document.SaveToFile(string fileName, FileFormat fileFormat) method.

Example Code

using Spire.Doc;

namespace ConvertWordToText
{
    internal class Program
    {
        static void Main(string[] args)
        {
            //Initialize an instance of the Document class
            Document doc = new Document();
            //Load a Word file
            doc.LoadFromFile(@"Sample.docx");

            //Save the Word file in text format
            doc.SaveToFile("WordToText.txt", FileFormat.Txt);
            doc.Close();
        }
    }
}

C#/VB.NET: Convert XML to PDF

2022-03-04 06:05:00 Written by Koohji

An Extensible Markup Language (XML) file is a standard text file that uses custom tags to describe the structure and other features of a document. Converting XML to PDF makes the content easier to share, since PDF is a more widely used and easy-to-access file format. This article demonstrates how to convert XML to PDF in C# and VB.NET using Spire.Doc for .NET.

Install Spire.Doc for .NET

To begin with, you need to add the DLL files included in the Spire.Doc for .NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.

PM> Install-Package Spire.Doc

Convert XML to PDF

The following are steps to convert XML to PDF using Spire.Doc for .NET.

  • Create a Document instance.
  • Load an XML sample document using Document.LoadFromFile() method.
  • Save the document as a PDF file using Document.SaveToFile() method.

Example Code

using Spire.Doc;

namespace XMLToPDF
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a Document instance
            Document mydoc = new Document();
            //Load an XML sample document
            mydoc.LoadFromFile(@"XML Sample.xml", FileFormat.Xml);
            //Save it to PDF
            mydoc.SaveToFile("XMLToPDF.pdf", FileFormat.PDF);
        }
    }
}

Convert Word to PDF in C#

Converting Word documents to PDF is a common requirement in many C# applications, but relying on Microsoft Office Interop can be cumbersome and inefficient. Fortunately, third-party libraries like Spire.Doc for .NET provide a powerful and seamless alternative for high-quality conversions without Interop dependencies. Whether you need to preserve formatting, secure PDFs with passwords, or optimize file size, Spire.Doc offers a flexible solution with minimal code.

In this guide, we’ll explore how to convert Word to PDF in C# using Spire.Doc, covering basic conversions, advanced customization, and best practices for optimal results.

C# .NET Library for Converting Word to PDF

Spire.Doc for .NET is a robust API that enables developers to create, edit, and convert Word documents programmatically. It supports converting Word (DOC, DOCX) to PDF while preserving formatting, images, hyperlinks, and other elements.

With Spire.Doc, you can benefit from:

  • High-fidelity conversion with minimal formatting loss
  • Support for password-protected PDFs
  • Customizable PDF settings (PDF/A compliance, font embedding, etc.)
  • Batch conversion of multiple Word files

To get started, download Spire.Doc from the official website and reference the DLLs in your project. Or, you can install it via NuGet through the following command:

PM> Install-Package Spire.Doc

Basic DOCX to PDF Conversion Example

Converting Word documents to PDFs using Spire.Doc is a simple process that requires minimal code. The following example demonstrates how to load a DOCX file and save it as a PDF with default settings.

using Spire.Doc;

namespace ConvertWordToPdf
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Document object
            Document doc = new Document();

            // Load a Word document
            doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.docx");

            // Save the document to PDF
            doc.SaveToFile("ToPDF.pdf", FileFormat.PDF);

            // Dispose resources
            doc.Dispose();
        }
    }
}

In this example:

  • A Document object is instantiated to manage the Word file.
  • The LoadFromFile method loads the DOCX file from the specified path.
  • The SaveToFile method converts and saves the document in PDF format.
  • Finally, the Dispose method is called to release resources used by the Document object.

This straightforward approach allows for quick and efficient conversion of DOCX files into PDFs with just a few lines of code.

Result:

Comparison of the input Word file and the output PDF file generated from it.

Advanced Word to PDF Conversion Options

To gain greater control over the conversion process, Spire.Doc offers the ToPdfParameterList class. With this class, you can:

  • Convert to PDF/A (a standardized archival format)
  • Apply password protection and permission restrictions
  • Embed fonts to ensure consistent rendering
  • Preserve bookmarks for better navigation
  • Disable hyperlinks if necessary

Here’s a summary of available options:

Option                                 | Implemented by
Convert to PDF/A                       | PdfConformanceLevel
Protect PDF with a password            | PdfSecurity
Restrict permissions (e.g., printing)  | PdfSecurity
Embed all fonts                        | IsEmbeddedAllFonts
Embed specific fonts                   | EmbeddedFontNameList
Preserve bookmarks                     | CreateWordBookmarks
Create bookmarks from headings         | CreateWordBookmarksUsingHeadings
Disable hyperlinks                     | DisableLink
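
Several of these options can be set on one ToPdfParameterList before saving. The sketch below is illustrative only. The class PdfOptionsExample and the file names are hypothetical, and the property spellings and their Boolean types are assumptions based on the table above that should be checked against the API reference for the installed Spire.Doc version.

```csharp
using Spire.Doc;

public static class PdfOptionsExample
{
    // Builds a parameter list that combines several options from the table.
    public static ToPdfParameterList BuildParameters()
    {
        ToPdfParameterList parameters = new ToPdfParameterList();

        // Embed every font used by the document
        parameters.IsEmbeddedAllFonts = true;

        // Assumed spelling: enable PDF bookmarks and derive them from headings
        parameters.CreateWordBookmarks = true;
        parameters.CreateWordBookmarksUsingHeadings = true;

        // Turn hyperlinks off in the generated PDF
        parameters.DisableLink = true;

        return parameters;
    }

    // Converts one Word file to PDF using the combined options.
    public static void Convert(string inputPath, string outputPath)
    {
        Document doc = new Document();
        try
        {
            doc.LoadFromFile(inputPath);
            doc.SaveToFile(outputPath, BuildParameters());
        }
        finally
        {
            doc.Dispose();
        }
    }

    public static void Main(string[] args)
    {
        // Hypothetical file names
        Convert("Input.docx", "WithOptions.pdf");
    }
}
```

Building the parameter list in its own method makes it easy to reuse the same settings across many conversions.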

Example 1: Convert Word to Password-Protected PDF

When sharing confidential Word documents as PDFs, a plain conversion may not be enough. Spire.Doc lets you password-protect the output through the PdfSecurity.Encrypt method, which restricts access to people who know the password while the document's formatting is converted as usual.

The following code encrypts the generated PDF document with an open password:

using Spire.Doc;

namespace ConvertWordToPasswordProtectedPdf
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Document object
            Document doc = new Document();

            // Load a Word document
            doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.docx");

            // Create a ToPdfParameterList object
            ToPdfParameterList parameters = new ToPdfParameterList();

            // Set an open password
            parameters.PdfSecurity.Encrypt("openPsd");

            // Save the Word to PDF with options
            doc.SaveToFile("PasswordProtected.pdf", parameters);

            // Dispose resources
            doc.Dispose();
        }
    }
}

Advanced Control

For finer control, combine the open password with a permission password and document permissions:

parameters.PdfSecurity.Encrypt("openPsd", "permissionPsd", PdfPermissionsFlags.Print, PdfEncryptionKeySize.Key128Bit);
doc.SaveToFile("PasswordProtected.pdf", parameters);

Example 2: Ensure Consistent Text Rendering by Embedding Fonts in PDF

When converting Word to PDF, fonts may appear differently (or even as gibberish) if the viewer’s system lacks the original fonts used in your document. Spire.Doc solves this by embedding fonts directly into the PDF, guaranteeing that text displays exactly as intended—regardless of the device or software used to open the file.

The following code embeds all fonts when converting Word to PDF in C#:

using Spire.Doc;

namespace EmbedFonts
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Document object
            Document doc = new Document();

            // Load a Word document
            doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Sample.docx");

            // Create a ToPdfParameterList object
            ToPdfParameterList parameters = new ToPdfParameterList();

            // Embed all the fonts used in Word in the generated PDF
            parameters.IsEmbeddedAllFonts = true;

            // Save the document to PDF
            doc.SaveToFile("EmbedFonts.pdf", parameters);

            // Dispose resources
            doc.Dispose();
        }
    }
}

Advanced Control

To reduce file size, you can selectively embed fonts (e.g., only your custom font, not common ones like Arial):

// Requires: using System.Collections.Generic;
parameters.PrivateFontPaths = new List<PrivateFontPath>()
{
    new PrivateFontPath("YourCustomFont", "FontPath"),
    new PrivateFontPath("AnotherFont", "FontPath")
};
doc.SaveToFile("EmbedCustomFonts.pdf", parameters);

Adjust Word Documents for Optimal Conversion

To achieve the best PDF output, you may need to prepare your Word document before conversion. Consider the following adjustments:

Example: Reduce PDF Size by Compressing Images

Large image-heavy Word documents often create bloated PDFs that are difficult to share. With Spire.Doc, you can automatically optimize images during conversion, dramatically reducing file size while maintaining acceptable quality.

The following code reduces image quality to 50%, resulting in a smaller PDF:

using Spire.Doc;

namespace SetImageQualityWhenConverting
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Document object
            Document doc = new Document();

            // Load a Word document
            doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.docx");

            // Reduce image quality to 50%
            doc.JPEGQuality = 50;

            // Save the document to PDF
            doc.SaveToFile("CompressImage.pdf", FileFormat.PDF);

            // Dispose resources
            doc.Dispose();
        }
    }
}

Conclusion

Converting Word documents to PDF in C# doesn’t have to be complicated—Spire.Doc for .NET simplifies the process and offers extensive customization options, from basic conversions to advanced features like PDF encryption, font embedding, and image compression, all without Interop.

By following the techniques outlined in this guide, you can efficiently integrate Word-to-PDF functionality into your applications. For further assistance, explore Spire.Doc’s documentation or leverage its free trial to test its capabilities.

FAQs

Q1: How do I convert multiple Word files to PDFs in C#?

A: You can create a loop in your code to process multiple files at once. For example:

string[] files = Directory.GetFiles("input_folder", "*.docx");
foreach (string file in files)
{
    Document document = new Document();
    document.LoadFromFile(file);
    document.SaveToFile(Path.ChangeExtension(file, ".pdf"), FileFormat.PDF);
    document.Dispose();
}

Q2: How to merge multiple Word files into a single PDF?

A: You can merge Word files first (using Spire.Doc), and then convert the combined document to PDF. For example:

Document mergedDoc = new Document();
string[] filesToMerge = Directory.GetFiles("input_folder", "*.docx");
foreach (string file in filesToMerge)
{
    mergedDoc.InsertTextFromFile(file, FileFormat.Docx);
}
mergedDoc.SaveToFile("Merged.pdf", FileFormat.PDF);
mergedDoc.Dispose();

Q3: Why is my converted PDF missing text or formatting?

A: This issue may arise from missing custom fonts on your system. To resolve it, install the required fonts on the machine performing the conversion. Alternatively, you can embed the fonts directly into the PDF using Spire.Doc during the conversion process.

Q4: Is Spire.Doc free for Word-to-PDF conversion?

A: No, Spire.Doc is a paid library. However, a free version is available with limited functionality, allowing users to convert only the first three pages of a Word document to PDF. This option is ideal for small projects or personal use.

C#/VB.NET: Convert Word to XML

2022-05-31 07:20:00 Written by Administrator

XML is a markup language and file format designed mainly to store and transmit arbitrary content. XML files are simple, general, and easy to use, which makes them popular, especially among web servers. XML and HTML are both important markup languages on the web, but XML focuses on storing and transmitting data while HTML focuses on displaying web pages. This article demonstrates how to convert Word documents to XML files with the help of Spire.Doc for .NET.

Install Spire.Doc for .NET

To begin with, you need to add the DLL files included in the Spire.Doc for .NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.

PM> Install-Package Spire.Doc

Convert a Word Document to an XML File

The detailed steps are as follows:

  • Create an object of Document class.
  • Load the Word document from disk using Document.LoadFromFile().
  • Save the Word document as an XML file using Document.SaveToFile().

Example Code

using System;
using Spire.Doc;

namespace WordtoXML
{
    internal class Program
    {
        static void Main(string[] args)
        {
            //Create an object of Document class
            Document document = new Document();

            //Load a Word document from disk
            document.LoadFromFile(@"D:\testp\test.docx");

            //Save the Word document as an XML file
            document.SaveToFile("Sample.xml", FileFormat.Xml);
        }
    }
}
