Displaying items by tag: doc java Conversion

Friday, 04 July 2025 08:35

How to Convert TXT to Word or Word to TXT with Java Code

cover page of converting txt to word with java

Plain text (.txt) files are simple and widely used, but they lack formatting and structure. If you need to enhance a TXT file with headings, fonts, tables, or images, converting it to a Word (.docx) file is a great solution.

In this tutorial, you'll learn how to convert a .txt file to a .docx Word document in Java using Spire.Doc for Java — a powerful library for Word document processing.

Why choose Spire.Doc for Java:

The converted Word document preserves the line breaks and content from the TXT file.
You can further modify fonts, add styles, or insert images using Spire.Doc's rich formatting APIs.
Supported various output formats, including converting Word to PDF, Excel, TIFF, PostScript, etc.

Prerequisites

To convert TXT to Word with Spire.Doc for Java smoothly, you should download it from its official download page and add the Spire.Doc.jar file as a dependency in your Java program.

If you are using Maven, you can easily import the JAR file by adding the following code to your project's pom.xml file:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>13.6.2</version>
    </dependency>
</dependencies>

Steps to Convert TXT to Word in Java

Now let's take a look at how to implement it in code. With Spire.Doc for Java, the process is straightforward. You can complete the conversion with just a few lines — no need for manual formatting or additional dependencies.

To help you better understand the code:

Document is the core class that acts as an in-memory representation of a Word document.
loadFromFile() uses internal parsers to read .txt content and wrap it into a single Word section with default font and margins.
When saveToFile() is called, Spire.Doc automatically converts the plain text into a .docx file by generating a structured Word document in the OpenXML format.

Below is a step-by-step code example to help you get started quickly:

import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class ConvertTextToWord {

    public static void main(String[] args) {

        // Create a Text object
        Document txt = new Document();

        // Load a Word document
        txt.loadFromFile("C:\\Users\\Administrator\\Desktop\\Input.txt");

        // Save the document to Word
        txt.saveToFile("ToWord.docx", FileFormat.Docx);

        // Dispose resources
        doc.dispose();
    }
}

RESULT:

result of converting txt to word with spire doc for java

Tip:

After converting TXT files to DOC/DOCX, you can further customize the document's formatting as needed. To simplify this process, Spire.Doc for Java provides built-in support for editing text properties such as changing font color, inserting footnote, adding text and image watermark, etc.

How to Convert Word to TXT with Java

Except for TXT to Word conversion, Spire.Doc for Java also supports converting DOC/DOCX files to TXT format, making it easy to extract plain text from richly formatted Word documents. This functionality is especially useful when you need to strip out styling and layout to work with clean, raw content — such as for text analysis, search indexing, archiving, or importing into other systems that only support plain text.

Simply copy the code below and run the code to manage conversion:

import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class ConvertWordtoText {

    public static void main(String[] args) {

        // Create a Doc object
        Document doc = new Document();

        // Load a Word document
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\Input.doc");

        // Save the document to Word
        doc.saveToFile("ToText.txt", FileFormat.Txt);

        // Dispose resources
        doc.dispose();
    }
}

RESULT:

result of converting word to txt with spire doc for java

Get a Free License

To remove evaluation watermarks and unlock full features, you can request a free 30-day license.

Conclusion

With Spire.Doc for Java, converting TXT to Word is fast, accurate, and doesn't require Microsoft Word to be installed. This is especially useful for Java developers working on reporting, document generation, or file conversion tools. Don't hesitate and give it a try now.

Published in Conversion

Tuesday, 08 April 2025 00:57

How to Convert HTML to Word in Java (Complete Guide)

Java Guide to Convert HTML to Word while Preserving Formatting

Converting HTML to Word in Java is essential for developers building reporting tools, content management systems, and enterprise applications. While HTML powers web content, Word documents offer professional formatting, offline accessibility, and easy editing, making them ideal for reports, invoices, contracts, and formal submissions.

This comprehensive guide demonstrates how to use Java and Spire.Doc for Java to convert HTML to Word. It covers everything from converting HTML files and strings, batch processing multiple files, and preserving formatting and images.

Why Convert HTML to Word in Java
Set Up Spire.Doc for Java
Convert HTML File to Word in Java
Convert HTML String to Word in Java
Batch Conversion of Multiple HTML Files to Word in Java
Best Practices for HTML to Word Conversion
Conclusion
FAQs

Why Convert HTML to Word in Java?

Converting HTML to Word offers several advantages:

Flexible editing – Add comments, track changes, and review content easily.
Consistent formatting – Preserve layouts, fonts, and styles across documents.
Professional appearance – DOCX files look polished and ready to share.
Offline access – Word files can be opened without an internet connection.
Integration – Word is widely supported across tools and industries.

Common use cases: exporting HTML reports from web apps, archiving dynamic content in editable formats, and generating formal reports, invoices, or contracts.

Set Up Spire.Doc for Java

Spire.Doc for Java is a robust library that enables developers to create Word documents, edit existing Word documents, and read and convert Word documents in Java without requiring Microsoft Word to be installed.

Before you can convert HTML content into Word documents, it’s essential to properly install and configure Spire.Doc for Java in your development environment.

1. Java Version Requirement

Ensure that your development environment is running Java 6 (JDK 1.6) or a higher version.

2. Installation

Option 1: Using Maven

For projects managed with Maven, you can add the repository and dependency to your pom.xml:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>13.11.2</version>
    </dependency>
</dependencies>

For a step-by-step guide on Maven installation and configuration, refer to our article**:** How to Install Spire Series Products for Java from Maven Repository.

Option 2. Manual JAR Installation

For projects without Maven, you can manually add the library:

Download Spire.Doc.jar from the official website.
Add it to your project classpath.

Convert HTML File to Word in Java

If you already have an existing HTML file, converting it into a Word document is straightforward and efficient. This method is ideal for situations where HTML reports, templates, or web content need to be transformed into professionally formatted, editable Word files.

By using Spire.Doc for Java, you can preserve the original layout, text formatting, tables, lists, images, and hyperlinks, ensuring that the converted document remains faithful to the source. The process is simple, requiring only a few lines of code while giving you full control over page settings and document structure.

Conversion Steps:

Create a new Document object.
Load the HTML file with loadFromFile().
Adjust settings like page margins.
Save the output as a Word document with saveToFile().

Example:

import com.spire.doc.Document;
import com.spire.doc.FileFormat;
import com.spire.doc.Section;
import com.spire.doc.documents.XHTMLValidationType;

public class ConvertHtmlFileToWord {
    public static void main(String[] args) {
        // Create a Document object
        Document document = new Document();

        // Load an HTML file
        document.loadFromFile("C:\\Users\\Administrator\\Desktop\\sample.html",
                FileFormat.Html,
                XHTMLValidationType.None);

        // Adjust margins
        Section section = document.getSections().get(0);
        section.getPageSetup().getMargins().setAll(2);

        // Save as Word file
        document.saveToFile("output/FromHtmlFile.docx", FileFormat.Docx);

        // Release resources
        document.dispose();

        System.out.println("HTML file successfully converted to Word!");
    }
}

Convert HTML file to Word in Java using Spire.Doc for Java

You may also be interested in: Java: Convert Word to HTML

Convert HTML String to Word in Java

In many real-world applications, HTML content is generated dynamically - whether it comes from user input, database records, or template engines. Converting these HTML strings directly into Word documents allows developers to create professional, editable reports, invoices, or documents on the fly without relying on pre-existing HTML files.

Using Spire.Doc for Java, you can render rich HTML content, including headings, lists, tables, images, hyperlinks, and more, directly into a Word document while preserving formatting and layout.

Conversion Steps:

Create a new Document object.
Add a section and adjust settings like page margins.
Add a paragraph.
Add the HTML string to the paragraph using appendHTML().
Save the output as a Word document with saveToFile().

Example:

import com.spire.doc.Document;
import com.spire.doc.FileFormat;
import com.spire.doc.Section;
import com.spire.doc.documents.Paragraph;

public class ConvertHtmlStringToWord {
    public static void main(String[] args) {
        // Sample HTML string
        String htmlString = "<h1>Java HTML to Word Conversion</h1>" +
                "<p><b>Spire.Doc</b> allows you to convert HTML content into Word documents seamlessly. " +
                "This includes support for headings, paragraphs, lists, tables, links, and images.</p>" +
                "<h2>Features</h2>" +
                "<ul>" +
                "<li>Preserve text formatting such as <i>italic</i>, <u>underline</u>, and <b>bold</b></li>" +
                "<li>Support for ordered and unordered lists</li>" +
                "<li>Insert tables with multiple rows and columns</li>" +
                "<li>Add hyperlinks and bookmarks</li>" +
                "<li>Embed images from URLs or base64 strings</li>" +
                "</ul>" +
                "<h2>Example Table</h2>" +
                "<table border='1' style='border-collapse:collapse;'>" +
                "<tr><th>Item</th><th>Description</th><th>Quantity</th></tr>" +
                "<tr><td>Notebook</td><td>Spire.Doc Java Guide</td><td>10</td></tr>" +
                "<tr><td>Pen</td><td>Blue Ink</td><td>20</td></tr>" +
                "<tr><td>Marker</td><td>Permanent Marker</td><td>5</td></tr>" +
                "</table>" +
                "<h2>Links and Images</h2>" +
                "<p>Visit <a href='https://www.e-iceblue.com/'>E-iceblue Official Site</a> for more resources.</p>" +
                "<p>Sample Image:</p>" +
                "<img src='https://www.e-iceblue.com/images/intro_pic/Product_Logo/doc-j.png' alt='Product Logo' width='150' height='150'/>" +
                "<h2>Conclusion</h2>" +
                "<p>Using Spire.Doc, Java developers can easily generate Word documents from rich HTML content while preserving formatting and layout.</p>";

        // Create a Document
        Document document = new Document();

        // Add section and paragraph
        Section section = document.addSection();
        section.getPageSetup().getMargins().setAll(72);

        Paragraph paragraph = section.addParagraph();

        // Render HTML string
        paragraph.appendHTML(htmlString);

        // Save as Word
        document.saveToFile("output/FromHtmlString.docx", FileFormat.Docx);

        document.dispose();

        System.out.println("HTML string successfully converted to Word!");
    }
}

Convert HTML String to Word in Java using Spire.Doc for Java

Batch Conversion of Multiple HTML Files to Word in Java

Sometimes you may need to convert hundreds of HTML files into Word documents. Here’s how to batch process them in Java.

import com.spire.doc.Document;
import com.spire.doc.FileFormat;
import com.spire.doc.documents.XHTMLValidationType;
import java.io.File;

public class BatchConvertHtmlToWord {
    public static void main(String[] args) {
        File folder = new File("C:\\Users\\Administrator\\Desktop\\HtmlFiles");

        for (File file : folder.listFiles()) {
            if (file.getName().endsWith(".html") || file.getName().endsWith(".htm")) {
                Document document = new Document();
                document.loadFromFile(file.getAbsolutePath(), FileFormat.Html, XHTMLValidationType.None);

                String outputPath = "output/" + file.getName().replace(".html", ".docx");
                document.saveToFile(outputPath, FileFormat.Docx);
                document.dispose();

                System.out.println(file.getName() + " converted to Word!");
            }
        }
    }
}

This approach is great for reporting systems where multiple HTML reports are generated daily.

Best Practices for HTML to Word Conversion

Use Inline CSS for Reliable Styling
Inline CSS ensures that fonts, colors, and spacing are preserved during conversion. External stylesheets may not always render correctly, especially if they are not accessible at runtime.
Validate HTML Structure
Well-formed HTML with proper nesting and closed tags helps render tables, lists, and headings accurately.
Optimize Images
Use absolute URLs or embed images as base64. Resize large images to fit Word layouts and reduce file size.
Manage Resources in Batch Conversion
When processing multiple files, convert them one by one and call dispose() after each document to prevent memory issues.
Preserve Page Layouts
Set page margins, orientation, and paper size to ensure the Word document looks professional, especially for reports and formal documents.

Conclusion

Converting HTML to Word in Java is an essential feature for many enterprise applications. Using Spire.Doc for Java, you can:

Convert HTML files into Word documents.
Render HTML strings directly into DOCX.
Handle batch processing for multiple files.
Preserve images, tables, and styles with ease.

By following the examples and best practices above, you can integrate HTML to Word conversion seamlessly into your Java applications.

FAQs (Frequently Asked Questions)

Q1. Can Java convert multiple HTML files into one Word document?

A1: Yes. Instead of saving each file separately, you can load multiple HTML contents into the same Document and then save it once.

Q2. How to preserve CSS styles during HTML to Word conversion?

A2: Inline CSS will be preserved; external stylesheets can also be applied if they’re accessible at run time.

Q3. Can I generate a Word document directly from a web page?

A3: Yes. You can fetch the HTML using an HTTP client in Java, then pass it into the conversion method.

Q4. What Word formats are supported for saving the converted document?

A4: You can save as DOCX, DOC, or other Word-compatible formats supported by Spire.Doc. DOCX is recommended for modern applications due to its compatibility and smaller file size.

Published in Conversion

Friday, 22 November 2024 01:48

Java: Convert RTF to HTML, Image

Converting RTF to HTML helps improve accessibility as HTML documents can be easily displayed in web browsers, making them accessible to a global audience. While converting RTF to images can help preserve document layout as images can accurately represent the original document, including fonts, colors, and graphics. In this article, you will learn how to convert RTF to HTML or images in Java using Spire.Doc for Java.

Convert RTF to HTML in Java
Convert RTF to Image in Java

Install Spire.Doc for Java

First of all, you're required to add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.

Package Manager

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>13.11.2</version>
    </dependency>
</dependencies>

Convert RTF to HTML in Java

Converting RTF to HTML ensures that the document can be easily viewed and edited in any modern web browser without requiring any additional software.

With Spire.Doc for Java, you can achieve RTF to HTML conversion through the Document.saveToFile(String fileName, FileFormat.Html) method. The following are the detailed steps.

Create a Document instance.
Load an RTF document using Document.loadFromFile() method.
Save the RTF document in HTML format using Document.saveToFile(String fileName, FileFormat.Html) method.

Java

import com.spire.doc.*;

public class RTFToHTML {
    public static void main(String[] args) {
        // Create a Document instance
        Document document = new Document();

        // Load an RTF document
        document.loadFromFile("input.rtf", FileFormat.Rtf);

        // Save as HTML format
        document.saveToFile("RtfToHtml.html", FileFormat.Html);
        document.dispose();
    }
}

Convert a Word document to an HTML file

Convert RTF to Image in Java

To convert RTF to images, you can use the Document.saveToImages() method to convert an RTF file into individual Bitmap or Metafile images. Then, the Bitmap or Metafile images can be saved as a BMP, EMF, JPEG, PNG, GIF, or WMF format files. The following are the detailed steps.

Create a Document object.
Load an RTF document using Document.loadFromFile() method.
Convert the document to images using Document.saveToImages() method.
Iterate through the converted image, and then save each as a PNG file.

Java

import com.spire.doc.*;
import com.spire.doc.documents.*;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;

public class RTFtoImage {
    public static void main(String[] args) throws Exception{
        // Create a Document instance
        Document document = new Document();

        // Load an RTF document
        document.loadFromFile("input.rtf", FileFormat.Rtf);

        // Convert the RTF document to images
        BufferedImage[] images = document.saveToImages(ImageType.Bitmap);

        // Iterate through the image collection
        for (int i = 0; i < images.length; i++) {

            // Get the specific image
            BufferedImage image = images[i];

            // Save the image as png format
            File file = new File("Images\\" + String.format(("Image-%d.png"), i));
            ImageIO.write(image, "PNG", file);
        }
    }
}

Convert all pages in a Word document to multiple PNG images

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Conversion

Tuesday, 28 May 2024 01:25

Java: Convert Markdown to Word and PDF

Markdown is a popular format among writers and developers for its simplicity and readability, allowing content to be formatted using easy-to-write plain text syntax. However, converting Markdown files to universally accessible formats like Word documents and PDF files is essential for sharing documents with readers, enabling complex formatting, and ensuring capability and consistency across devices and platforms. This article demonstrates how to convert Markdown files to Word and PDF files with the powerful library Spire.Doc for Java, enhancing the versatility and distribution potential of your written content.

Convert a Markdown File to a Word Document with Java
Convert a Markdown File to a PDF Document with Java
Customizing Page Settings of the Result Document

Install Spire.Doc for Java

Package Manager

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>13.11.2</version>
    </dependency>
</dependencies>

Convert a Markdown File to a Word Document with Java

Spire.Doc for Java provides a simple way to convert Markdown format to Word and PDF document formats by using the Document.loadFromFile(String: fileName, FileFormat.Markdown) method to load the Markdown file and the Document.saveToFile(String: fileName, FileFormat: fileFormat) method to save the file as a Word or PDF document.

It should be noted that since images are stored as links in Markdown files, they need to be further processed after conversion if they are to be retained.

The detailed steps for converting a Markdown file to a Word document are as follows:

Create an instance of Document class.
Load a Markdown file using Document.loadFromFile(String: fileName, FileFormat.Markdown) method.
Save the Markdown file as Word document using Document.saveToFile(String: fileName, FileFormat.Docx) method.

Java

import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class MarkdownToWord {
    public static void main(String[] args) {
        // Create an instance of Document
        Document doc = new Document();

        // Load a Markdown file
        doc.loadFromFile("Sample.md", FileFormat.Markdown);

        // Save the Markdown file as Word document
        doc.saveToFile("output/MarkdownToWord.docx", FileFormat.Docx);
        doc.dispose();
    }
}

Java: Convert Markdown to Word and PDF

Convert a Markdown File to a PDF Document with Java

By using the FileFormat.PDF Enum as the format parameter of the Document.saveToFile() method, the Markdown file can be directly converted to a PDF document.

The detailed steps for converting a Markdown file to a PDF document are as follows:

Create an instance of Document class.
Load a Markdown file using Document.loadFromFile(String: fileName, FileFormat.Markdown) method.
Save the Markdown file as PDF document using Document.saveToFile(String: fileName, FileFormat.PDF) method.

Java

import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class MarkdownToPDF {
    public static void main(String[] args) {
        // Create an instance of the Document class
        Document doc = new Document();

        // Load a Markdown file
        doc.loadFromFile("Sample.md");

        // Save the Markdown file as a PDF file
        doc.saveToFile("output/MarkdownToPDF.pdf", FileFormat.PDF);
        doc.dispose();
    }
}

Java: Convert Markdown to Word and PDF

Customizing Page Settings of the Result Document

Spire.Doc for Java also provides methods under PageSetup class to do page setup before the conversion, allowing control over page settings such as page margins and page size of the resulting document.

The following are the steps to customize the page settings of the resulting document:

Create an instance of Document class.
Load a Markdown file using Document.loadFromFile(String: fileName, FileFormat.Markdown) method.
Get the first section using Document.getSections().get() method.
Set the page size, page orientation, and page margins using methods under PageSetup class.
Save the Markdown file as PDF document using Document.saveToFile(String: fileName, FileFormat.PDF) method.

Java

import com.spire.doc.Document;
import com.spire.doc.FileFormat;
import com.spire.doc.PageSetup;
import com.spire.doc.Section;
import com.spire.doc.documents.MarginsF;
import com.spire.doc.documents.PageOrientation;
import com.spire.doc.documents.PageSize;

public class PageSettingMarkdown {
    public static void main(String[] args) {
        // Create an instance of the Document class
        Document doc = new Document();

        // Load a Markdown file
        doc.loadFromFile("Sample.md");

        // Get the first section
        Section section = doc.getSections().get(0);

        // Set the page size, orientation, and margins
        PageSetup pageSetup = section.getPageSetup();
        pageSetup.setPageSize(PageSize.Letter);
        pageSetup.setOrientation(PageOrientation.Landscape);
        pageSetup.setMargins(new MarginsF(100, 100, 100, 100));

        // Save the Markdown file as a PDF file
        doc.saveToFile("output/MarkdownToPDF.pdf", FileFormat.PDF);
        doc.dispose();
    }
}

Java: Convert Markdown to Word and PDF

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Conversion

Tuesday, 29 August 2023 01:53

Java: Extract Text from HTML

HTML (Hypertext Markup Language) has become one of the most commonly used text markup languages on the Internet, and nearly all web pages are created using HTML. While HTML contains numerous tags and formatting information, the most valuable content is typically the visible text. It is important to know how to extract the text content from an HTML file when users intend to utilize it for tasks such as editing, AI training, or storing in databases. This article will demonstrate how to extract text from HTML using Spire.Doc for Java within Java programs.

Extract Text from HTML File
Extract Text from URL

Install Spire.Doc for Java

Package Manager

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>13.11.2</version>
    </dependency>
</dependencies>

Extract Text from HTML File

Spire.Doc for Java supports loading HTML files using the Document.loadFromFile(fileName, FileFormat.Html) method. Then, users can use Document.getText() method to get the text that is visible in browsers and write it to a TXT file. The detailed steps are as follows:

Create an object of Document class.
Load an HTML file using Document.loadFromFile(fileName, FileFormat.Html) method.
Get the text of the HTML file using Document.getText() method.
Write the text to a TXT file.

Java

import com.spire.doc.Document;
import com.spire.doc.FileFormat;

import java.io.FileWriter;
import java.io.IOException;

public class ExtractTextFromHTML {
    public static void main(String[] args) throws IOException {

        //Create an object of Document class
        Document doc = new Document();

        //Load an HTML file
        doc.loadFromFile("Sample.html", FileFormat.Html);

        //Get text from the HTML file
        String text = doc.getText();

        //Write the text to a TXT file
        FileWriter fileWriter = new FileWriter("HTMLText.txt");
        fileWriter.write(text);
        fileWriter.close();
    }
}

HTML Web Page:

Java: Extract Text from HTML

Extracted Text:

Java: Extract Text from HTML

Extract Text from URL

To extract text from a URL, users need to create a custom method to retrieve the HTML file from the URL and then extract the text from it. The detailed steps are as follows:

Create an object of Document class.
Use the custom method readHTML() to get the HTML file from a URL and return the file path.
Load the HTML file using Document.loadFromFile(filename, FileFormat.Html) method.
Get the text from the HTML file using Document.getText() method.
Write the text to a TXT file.

Java

import com.spire.doc.Document;
import com.spire.doc.FileFormat;

import java.io.*;
import java.net.URL;
import java.net.URLConnection;

public class ExtractTextFromURL {
    public static void main(String[] args) throws IOException {
        //Create an object of Document
        Document doc = new Document();

        //Call the custom method to load the HTML file from a URL
        doc.loadFromFile(readHTML("https://aeon.co/essays/how-to-face-the-climate-crisis-with-spinoza-and-self-knowledge", "output.html"), FileFormat.Html);

        //Get the text from the HTML file
        String urlText = doc.getText();

        //Write the text to a TXT file
        FileWriter fileWriter = new FileWriter("URLText.txt");
        fileWriter.write(urlText);
    }

    public static String readHTML(String urlString, String saveHtmlFilePath) throws IOException {

        //Create an object of URL class
        URL url = new URL(urlString);

        //Open the URL
        URLConnection connection = url.openConnection();

        //Save the url as an HTML file
        BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF-8"));
        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(saveHtmlFilePath), "UTF-8"));
        String line;
        while ((line = reader.readLine()) != null) {
            writer.write(line);
            writer.newLine();
        }

        reader.close();
        writer.close();

        //Return the file path of the saved HTML file
        return saveHtmlFilePath;
    }
}

URL Web Page:

Java: Extract Text from HTML

Extracted Text:

Java: Extract Text from HTML

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Conversion

Monday, 19 December 2022 00:59

Java: Convert Word to Excel

Word and Excel are different from each other in terms of their uses and functioning. Word is used primarily for text documents such as essays, emails, letters, books, resumes, or academic papers where text formatting is essential. Excel is used to save data, make tables and charts and make complex calculations.

It is not recommended to convert a complex Word file to an Excel spreadsheet, because Excel can hardly render contents in the same way as Word. However, if your Word document is mainly composed of tables and you want to analyze the table data in Excel, you can use Spire.Office for Java to convert Word to Excel while maintaining good readability.

Install Spire.Office for Java

First of all, you're required to add the Spire.Office.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.

Package Manager

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.office</artifactId>
        <version>10.10.0</version>
    </dependency>
</dependencies>

Convert Word to Excel in Java

This scenario actually uses two libraries in the Spire.Office package. They're Spire.Doc for Java and Spire.XLS for Java. The former is used to read and extract content from a Word document, and the latter is used to create an Excel document and write data in the specific cells. To make this code example easy to understand, we created the following three custom methods that preform specific functions.

exportTableInExcel() - Export data from a Word table to specified Excel cells.
copyContentInTable() - Copy content from a table cell in Word to an Excel cell.
copyTextAndStyle() - Copy text with formatting from a Word paragraph to an Excel cell.

The following steps demonstrate how to export data from a Word document to a worksheet using Spire.Office for Java.

Create a Document object to load a Word file.
Create a Workbook object and add a worksheet named "WordToExcel" to it.
Traverse though all the sections in the Word document, traverse through all the document objects under a certain section, and then determine if a document object is a paragraph or a table.
If the document object is a paragraph, write the paragraph in a specified cell in Excel using coypTextAndStyle() method.
If the document object is a table, export the table data from Word to Excel cells using exportTableInExcel() method.
Auto fit the row height and column width in Excel so that the data within a cell will not exceed the bound of the cell.
Save the workbook to an Excel file using Workbook.saveToFile() method.

Java

import com.spire.doc.*;
import com.spire.doc.documents.Paragraph;
import com.spire.doc.fields.DocPicture;
import com.spire.doc.fields.TextRange;
import com.spire.xls.*;

import java.awt.*;

public class ConvertWordToExcel {

    public static void main(String[] args) {

        //Create a Document object
        Document doc = new Document();

        //Load a Word file
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\Invoice.docx");

        //Create a Workbook object
        Workbook wb = new Workbook();

        //Remove the default worksheets
        wb.getWorksheets().clear();

        //Create a worksheet named "WordToExcel"
        Worksheet worksheet = wb.createEmptySheet("WordToExcel");
        int row = 1;
        int column = 1;

        //Loop through the sections in the Word document
        for (int i = 0; i < doc.getSections().getCount(); i++) {
            //Get a specific section
            Section section = doc.getSections().get(i);

            //Loop through the document object under a certain section
            for (int j = 0; j < section.getBody().getChildObjects().getCount(); j++) {
                //Get a specific document object
                DocumentObject documentObject = section.getBody().getChildObjects().get(j);

                //Determine if the object is a paragraph
                if (documentObject instanceof Paragraph) {
                    CellRange cell = worksheet.getCellRange(row, column);
                    Paragraph paragraph = (Paragraph) documentObject;
                    //Copy paragraph from Word to a specific cell
                    copyTextAndStyle(cell, paragraph);
                    row++;
                }

                //Determine if the object is a table
                if (documentObject instanceof Table) {
                    Table table = (Table) documentObject;
                    //Export table data from Word to Excel
                    int currentRow = exportTableInExcel(worksheet, row, table);
                    row = currentRow;
                }
            }
        }

        //Wrap text in cells
        worksheet.getAllocatedRange().isWrapText(true);

        //Auto fit row height and column width
        worksheet.getAllocatedRange().autoFitRows();
        worksheet.getAllocatedRange().autoFitColumns();
        
        //Save the workbook to an Excel file
        wb.saveToFile("output/WordToExcel.xlsx", ExcelVersion.Version2013);
    }

    //Export data from Word table to Excel cells
    private static int exportTableInExcel(Worksheet worksheet, int row, Table table) {
        CellRange cell;
        int column;
        for (int i = 0; i < table.getRows().getCount(); i++) {
            column = 1;
            TableRow tbRow = table.getRows().get(i);
            for (int j = 0; j < tbRow.getCells().getCount(); j++) {
                TableCell tbCell = tbRow.getCells().get(j);
                cell = worksheet.getCellRange(row, column);
                cell.borderAround(LineStyleType.Thin, Color.BLACK);
                copyContentInTable(tbCell, cell);
                column++;
            }
            row++;
        }
        return row;
    }

    //Copy content from a Word table cell to an Excel cell
    private static void copyContentInTable(TableCell tbCell, CellRange cell) {
        Paragraph newPara = new Paragraph(tbCell.getDocument());
        for (int i = 0; i < tbCell.getChildObjects().getCount(); i++) {
            DocumentObject documentObject = tbCell.getChildObjects().get(i);
            if (documentObject instanceof Paragraph) {
                Paragraph paragraph = (Paragraph) documentObject;
                for (int j = 0; j < paragraph.getChildObjects().getCount(); j++) {
                    DocumentObject cObj = paragraph.getChildObjects().get(j);
                    newPara.getChildObjects().add(cObj.deepClone());
                }
                if (i < tbCell.getChildObjects().getCount() - 1) {
                    newPara.appendText("\n");
                }
            }
        }
        copyTextAndStyle(cell, newPara);
    }

    //Copy text and style of a paragraph to a cell
    private static void copyTextAndStyle(CellRange cell, Paragraph paragraph) {

        RichText richText = cell.getRichText();
        richText.setText(paragraph.getText());
        int startIndex = 0;
        for (int i = 0; i < paragraph.getChildObjects().getCount(); i++) {
            DocumentObject documentObject = paragraph.getChildObjects().get(i);
            if (documentObject instanceof TextRange) {
                TextRange textRange = (TextRange) documentObject;
                String fontName = textRange.getCharacterFormat().getFontName();
                boolean isBold = textRange.getCharacterFormat().getBold();
                Color textColor = textRange.getCharacterFormat().getTextColor();
                float fontSize = textRange.getCharacterFormat().getFontSize();
                String textRangeText = textRange.getText();
                int strLength = textRangeText.length();
                ExcelFont font = new ExcelFont(cell.getWorksheet().getWorkbook().createFont());
                font.setColor(textColor);
                font.isBold(isBold);
                font.setSize(fontSize);
                font.setFontName(fontName);
                int endIndex = startIndex + strLength;
                richText.setFont(startIndex, endIndex, font);
                startIndex += strLength;
            }
            if (documentObject instanceof DocPicture) {
                DocPicture picture = (DocPicture) documentObject;
                cell.getWorksheet().getPictures().add(cell.getRow(), cell.getColumn(), picture.getImage());
                cell.getWorksheet().setRowHeightInPixels(cell.getRow(), 1, picture.getImage().getHeight());
            }
        }
        switch (paragraph.getFormat().getHorizontalAlignment()) {
            case Left:
                cell.setHorizontalAlignment(HorizontalAlignType.Left);
                break;
            case Center:
                cell.setHorizontalAlignment(HorizontalAlignType.Center);
                break;
            case Right:
                cell.setHorizontalAlignment(HorizontalAlignType.Right);
                break;
        }
    }
}

Java: Convert Word to Excel

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Conversion

Wednesday, 21 September 2022 01:29

Java: Convert ODT to PDF

Before you email or share an ODT file with others, you may want to convert the file to PDF in order to make it accessible to anyone across multiple operating systems. In this article, you will learn how to convert ODT to PDF in Java using Spire.Doc for Java.

Install Spire.Doc for Java

Package Manager

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>13.11.2</version>
    </dependency>
</dependencies>

Convert ODT to PDF using Java

The following are the steps to convert an ODT file to PDF:

Create an instance of Document class.
Load an ODT file using Document.loadFromFile() method.
Convert the ODT file to PDF using Document.saveToFile(String fileName, FileFormat fileFormat) method.

Java

import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class ConvertOdtToPdf {
    public static void main(String[] args){
        //Create a Document instance
        Document doc = new Document();
        //Load an ODT file
        doc.loadFromFile("Sample.odt");

        //Save the ODT file to PDF
        doc.saveToFile("OdtToPDF.pdf", FileFormat.PDF);

    }
}

Java: Convert ODT to PDF

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Conversion

Thursday, 07 April 2022 06:04

Java: Convert XML to Word

An XML file is a plain text file that uses custom tags to display a document's structure and other features. In daily work, you sometimes need to convert Word to XML for storing and organizing data, or convert XML to Word for working on them more easily and efficiently. This article will demonstrate how to programmatically convert XML to Word using Spire.Doc for Java.

Install Spire.Doc for Java

Package Manager

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>13.11.2</version>
    </dependency>
</dependencies>

Convert XML to Word

The following are steps to convert XML to Word using Spire.Doc for Java.

Create a Document instance.
Load an XML sample document using Document.loadFromFile() method.
Save the document as a Word file using Document.saveToFile() method.

Java

import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class XMLToWord {
    public static void main(String[] args) {
        //Create a Document instance
        Document document = new Document();

        //Load an XML sample document
        document.loadFromFile(sample.xml");

        //Save the document to Word
        document.saveToFile("output/XMLToWord.docx", FileFormat.Docx );
    }
}

Java: Convert XML to Word

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Conversion

Thursday, 24 March 2022 09:01

Java: Convert RTF to PDF

RTF (Rich Text Format) is a proprietary file format developed by Microsoft for cross-platform document interchange. RTF files have good compatibility and they can be opened by most word processors on any operating system such as Unix, Macintosh, and Windows. In some cases, you may need to convert RTF to other file formats to meet different requirements. In this article, you will learn how to convert RTF to PDF programmatically using Spire.Doc for Java.

Install Spire.Doc for Java

Package Manager

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>13.11.2</version>
    </dependency>
</dependencies>

Convert RTF to PDF in Java

The detailed steps are as follows.

Create a Document instance.
Load a sample RTF document using Document.loadFromFile() method.
Save the RTF to PDF using Document.saveToFile() method.

Java

import com.spire.doc.*;

public class RTFToPDF {
    public static void main(String[] args) {

        //Create Document instance.
        Document document = new Document();

        //Load a sample RTF document
        document.loadFromFile("sample.rtf", FileFormat.Rtf);

        //Save the document to PDF
        document.saveToFile("rtfToPdf.pdf", FileFormat.PDF);
    }
}

Java: Convert RTF to PDF

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Conversion

Wednesday, 02 March 2022 08:19

Java: Convert XML to PDF

An Extensible Markup Language (XML) file is a plain text file that uses custom tags to describe the structure and other features of a document. In some cases, you may need to convert XML to PDF because the latter one is easier for others to access. This article will show you how to programmatically convert XML to PDF using Spire.Doc for Java.

Install Spire.Doc for Java

Package Manager

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>13.11.2</version>
    </dependency>
</dependencies>

Convert XML to PDF

Spire.Doc for Java supports converting XML to PDF using the Document.saveToFile() method. The following are detailed steps.

Create a Document instance.
Load an XML sample document using Document.loadFromFile() method.
Save the document as a PDF file using Document.saveToFile() method.

Java

import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class XMLToPDF {
    public static void main(String[] args) {
        //Create a Document instance
        Document document = new Document();
        //Load a XML sample document
        document.loadFromFile("toXML.xml");
        //Save  the document to PDF
        document.saveToFile("output/XMLToPDF.pdf", FileFormat.PDF );
    }
}

Java: Convert XML to PDF

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Conversion

12 3 »End

Page 1 of 3

Prerequisites

Steps to Convert TXT to Word in Java

How to Convert Word to TXT with Java

Get a Free License

Conclusion

Table of Contents

Why Convert HTML to Word in Java?

Set Up Spire.Doc for Java

1. Java Version Requirement

2. Installation

Convert HTML File to Word in Java

Convert HTML String to Word in Java

Batch Conversion of Multiple HTML Files to Word in Java

Best Practices for HTML to Word Conversion

Conclusion

FAQs (Frequently Asked Questions)

Q1. Can Java convert multiple HTML files into one Word document?

Q2. How to preserve CSS styles during HTML to Word conversion?

Q3. Can I generate a Word document directly from a web page?

Q4. What Word formats are supported for saving the converted document?

Install Spire.Doc for Java

Convert RTF to HTML in Java

Convert RTF to Image in Java

Apply for a Temporary License

Install Spire.Doc for Java

Convert a Markdown File to a Word Document with Java

Convert a Markdown File to a PDF Document with Java

Customizing Page Settings of the Result Document

Apply for a Temporary License

Install Spire.Doc for Java

Extract Text from HTML File

Extract Text from URL

Apply for a Temporary License

Install Spire.Office for Java

Convert Word to Excel in Java

Apply for a Temporary License

Install Spire.Doc for Java

Convert ODT to PDF using Java

Apply for a Temporary License

Install Spire.Doc for Java

Convert XML to Word

Apply for a Temporary License

Install Spire.Doc for Java

Convert RTF to PDF in Java

Apply for a Temporary License

Install Spire.Doc for Java

Convert XML to PDF

Apply for a Temporary License