page 318

Extracting images from a Word document programmatically can be useful for automating document processing tasks. In this article, we’ll demonstrate how to extract images from a Word file using C# and the Spire.Doc for .NET library. Spire.Doc is a powerful .NET library that enables developers to manipulate Word documents efficiently.

Getting Started: Installing Spire.Doc

Before you can start extracting images, you need to install Spire.Doc for .NET. Here's how:

  • Using NuGet Package Manager:
    • Open your Visual Studio project.
    • Right-click on the project in the Solution Explorer and select "Manage NuGet Packages."
    • Search for "Spire.Doc" and install the latest version.
  • Manual Installation:
    • Download the Spire.Doc package from the official website.
    • Extract the files and reference the DLLs in your project.

Once installed, you're ready to begin.

Steps for Extracting Images from Word

  • Import the Spire.Doc namespaces.
  • Load the Word document.
  • Iterate through sections, paragraphs, and child objects.
  • Identify images and save them to a specified location.

Using the Code

The following C# code demonstrates how to extract images from a Word document:

using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;

namespace ExtractImages
{
    class Program
    {
        static void Main(string[] args)
        {
            // Initialize a Document object
            Document document = new Document();

            // Load the Word file
            document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx");

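            // Make sure the output folder exists; Image.Save does not create missing directories
            System.IO.Directory.CreateDirectory("output");

            // Counter for image files
            int index = 0;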

            // Loop through each section in the document
            foreach (Section section in document.Sections)
            {
                // Loop through paragraphs in the section
                foreach (Paragraph paragraph in section.Paragraphs)
                {
                    // Loop through objects in the paragraph
                    foreach (DocumentObject docObject in paragraph.ChildObjects)
                    {
                        // Check if the object is an image
                        if (docObject.DocumentObjectType == DocumentObjectType.Picture)
                        {
                            // Save the image as a PNG file
                            DocPicture picture = docObject as DocPicture;
                            picture.Image.Save(string.Format("output/image_{0}.png", index), System.Drawing.Imaging.ImageFormat.Png);
                            index++;
                        }
                    }
                }
            }

            // Dispose resources
            document.Dispose();
        }
    }
}

The extracted images are saved in the "output" folder (relative to the program's working directory) with file names such as image_0.png, image_1.png, and so on.

[Image: Extract images from Word]
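The loop above re-encodes every picture as PNG, whatever its original format. If you would rather keep each image as it was stored in the document, one option is to write the picture's raw bytes and pick the file extension from the byte signature ("magic number") at the start of the data. The helper below is a minimal sketch of that idea using only the .NET base library; the ImageSignature class and GuessExtension method are hypothetical names, and it assumes you can obtain the raw bytes from the picture (recent Spire.Doc versions expose them as DocPicture.ImageBytes, but confirm this in the API reference for your version).

```csharp
using System;

// Hypothetical helper: guesses a file extension from an image's leading signature bytes.
public static class ImageSignature
{
    // Returns true if data begins with the given prefix bytes.
    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static string GuessExtension(byte[] data)
    {
        if (data == null) return "bin";

        // PNG: 89 50 4E 47 0D 0A 1A 0A
        if (StartsWith(data, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) return "png";
        // JPEG: FF D8 FF
        if (StartsWith(data, 0xFF, 0xD8, 0xFF)) return "jpg";
        // GIF: "GIF8" (covers GIF87a and GIF89a)
        if (StartsWith(data, 0x47, 0x49, 0x46, 0x38)) return "gif";
        // BMP: "BM"
        if (StartsWith(data, 0x42, 0x4D)) return "bmp";
        // TIFF: "II*\0" (little-endian) or "MM\0*" (big-endian)
        if (StartsWith(data, 0x49, 0x49, 0x2A, 0x00) || StartsWith(data, 0x4D, 0x4D, 0x00, 0x2A)) return "tif";

        // Unknown signature (for example a metafile): fall back to a neutral extension
        return "bin";
    }
}
```

Inside the extraction loop you could then write the bytes unchanged with File.WriteAllBytes, using a name such as image_{index}.{extension}. Checking a few leading bytes is cheap and avoids decoding or re-compressing the image.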

Additional Tips & Best Practices

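  • Handling Different Image Formats:
    • Save to another format (JPEG, BMP, and so on) by replacing ImageFormat.Png with the corresponding ImageFormat value and changing the file extension to match.
    • ImageFormat.Jpeg usually produces smaller files for photographs, at the cost of lossy compression and the loss of transparency.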
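  • Error Handling:
    • Wrap the extraction code in a try/catch block so that problems such as a missing input file or an unwritable output folder are reported instead of crashing the program (this requires using System;):
    try {
        // extraction code
    }
    catch (Exception ex) {
        Console.WriteLine($"Error: {ex.Message}");
    }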
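  • Performance Optimization:
    • For many documents, process separate files in parallel and give each worker its own Document instance; do not assume a single Document can be shared safely across threads.
    • Report progress (for example, the number of files or images processed) so users get feedback during long runs.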
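  • Advanced Extraction Scenarios:
    • Images in headers and footers are not part of section.Paragraphs; check Section.HeadersFooters (for example, HeadersFooters.Header and HeadersFooters.Footer) as well.
    • Images inside tables or text boxes are also missed by the loop above, which only visits body-level paragraphs; a full traversal of the document tree reaches them.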
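The parallel-processing and progress-reporting tips can be combined when you have a batch of Word files. The sketch below is one possible shape for that, using only the .NET base library: each file is handled by its own worker, a shared counter is updated with Interlocked, and completed files are reported through IProgress<int>. The BatchExtractor class, the ProcessFiles method, and the extractImages delegate are hypothetical; extractImages stands in for your own per-file code (for example, the extraction loop above wrapped in a method that loads one file, saves its images, disposes the Document, and returns the number of images saved).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BatchExtractor
{
    // Hypothetical batch runner: calls extractImages once per file, in parallel,
    // and returns the total number of images the callbacks report as saved.
    // Each call to extractImages should create and dispose its own Document.
    public static int ProcessFiles(
        IReadOnlyList<string> files,
        Func<string, int> extractImages,
        IProgress<int> progress = null,
        int maxDegreeOfParallelism = 4)
    {
        int filesDone = 0;
        int imagesSaved = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };

        Parallel.ForEach(files, options, file =>
        {
            int count = extractImages(file);
            Interlocked.Add(ref imagesSaved, count);

            // Report how many files have finished so far; reports may arrive out of order.
            int done = Interlocked.Increment(ref filesDone);
            progress?.Report(done);
        });

        return imagesSaved;
    }
}
```

If any call to extractImages throws, Parallel.ForEach surfaces the failures as an AggregateException once the loop ends. Note that Progress<int> posts its callbacks to the captured synchronization context (or the thread pool in a console app), so a progress handler may still run briefly after ProcessFiles returns.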

Conclusion

Using Spire.Doc in C# simplifies the process of extracting images from Word documents. This approach is efficient and can be integrated into larger document-processing workflows.

Beyond images, Spire.Doc also supports extracting other elements from Word documents.

Whether you're building a document management system or automating report generation, Spire.Doc provides a reliable way to handle Word documents programmatically.

Get a Free License

To fully experience the capabilities of Spire.Doc for .NET without any evaluation limitations, you can request a free 30-day trial license.

How to Traverse a Document Tree

2010-12-28 06:45:38 Written by Administrator
Spire.Doc represents a document as a tree in which every document element is a node. Some nodes, such as sections, paragraphs, and tables, can have many child nodes: for example, a section node contains paragraph nodes, a paragraph node contains text nodes, and each row is a child node of a table node. Other nodes, such as text ranges, images, and form fields, have no child nodes.
Every node that can have child nodes implements the Spire.Doc.Interface.ICompositeObject interface, which exposes those children through its ChildObjects property.
To process every node, you can navigate the tree and visit each node in turn.

Document Tree Traversal

The following example traverses the document tree to collect all nodes and then outputs the text of every text-range node.
[C#]
using System;
using System.Collections.Generic;
using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;
using Spire.Doc.Interface;
using Spire.Doc.Collections;

namespace ExtractText
{
    class Program
    {
        static void Main(string[] args)
        {
            //Open a word document.
            Document document = new Document("Sample.doc");
            IList<IDocumentObject> nodes = GetAllObjects(document);
            foreach (IDocumentObject node in nodes)
            {
                // Check whether the node is a text range.
                if (node.DocumentObjectType == DocumentObjectType.TextRange)
                {
                    TextRange textNode = node as TextRange;
                    Console.WriteLine(textNode.Text);
                }
            }
        }

        private static IList<IDocumentObject> GetAllObjects(Document document)
        {
        
            //Create a list.
            List<IDocumentObject> nodes = new List<IDocumentObject>();
            
            //Create a new queue.
            Queue<ICompositeObject> containers = new Queue<ICompositeObject>();
            
            // Start from the document root.
            containers.Enqueue(document);
            while (containers.Count > 0)
            {
                ICompositeObject container = containers.Dequeue();
                DocumentObjectCollection docObjects = container.ChildObjects;
                foreach (DocumentObject docObject in docObjects)
                { 
                    nodes.Add(docObject);
                    
                    // If the object can have children, queue it so its children are visited too.
                    if (docObject is ICompositeObject)
                    {
                        containers.Enqueue(docObject as ICompositeObject);
                    }
                }
            }

            return nodes;
        }
    }
}
          
[VB.NET]
Imports System
Imports System.Collections.Generic
Imports Spire.Doc
Imports Spire.Doc.Documents
Imports Spire.Doc.Fields
Imports Spire.Doc.Interface
Imports Spire.Doc.Collections

Module Module1

    Sub Main()
        'Open a word document.
        Dim document As New Document("Sample.doc")
        Dim nodes As IList(Of IDocumentObject) = GetAllObjects(document)

        For Each node As IDocumentObject In nodes
        
            'Check whether the node is a text range.
            If node.DocumentObjectType = DocumentObjectType.TextRange Then
                Dim textNode As TextRange = DirectCast(node, TextRange)
                Console.WriteLine(textNode.Text)

            End If
        Next
    End Sub
    Function GetAllObjects(ByVal document As Document) As IList(Of IDocumentObject)
        
        'Create a list.
        Dim nodes As New List(Of IDocumentObject)()
        
        'Create a new queue.
        Dim containers As New Queue(Of ICompositeObject)()
        
        'Start from the document root.
        containers.Enqueue(document)
        While (containers.Count > 0)
            Dim container As ICompositeObject = containers.Dequeue()
            Dim docObjects As DocumentObjectCollection = container.ChildObjects
            For Each docObject As DocumentObject In docObjects
                nodes.Add(docObject)
                
                'If the object can have children, queue it so its children are visited too.
                If TypeOf docObject Is ICompositeObject Then
                    containers.Enqueue(TryCast(docObject, ICompositeObject))
                End If
            Next
        End While

        Return nodes
    End Function
End Module
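The queue-based GetAllObjects above visits nodes breadth-first: all children of the document, then all of their children, and so on. Because of that, text ranges are not necessarily printed in reading order; text inside a table cell sits deeper in the tree than body paragraphs, so it is printed after the text of paragraphs that follow the table. If reading order matters, a depth-first (pre-order) walk finishes each node's subtree before moving to its next sibling. The sketch below shows the technique with a minimal, hypothetical Node class standing in for Spire.Doc's document objects, so it runs without the library; with Spire.Doc you would push the entries of ICompositeObject.ChildObjects in the same reverse order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a document node: composite nodes (section, paragraph, table)
// have children; leaf nodes (text range, picture) have none.
public class Node
{
    public string Kind { get; }
    public string Text { get; }
    public List<Node> Children { get; } = new List<Node>();

    public Node(string kind, string text = null, params Node[] children)
    {
        Kind = kind;
        Text = text;
        Children.AddRange(children);
    }
}

public static class TreeWalker
{
    // Depth-first, pre-order traversal with an explicit stack: a node is recorded
    // before its children, and each child's subtree is finished before the next sibling.
    public static List<Node> GetAllNodesInDocumentOrder(Node root)
    {
        var result = new List<Node>();
        var stack = new Stack<Node>();
        stack.Push(root);

        while (stack.Count > 0)
        {
            Node current = stack.Pop();
            result.Add(current);

            // Push children in reverse so the first child is popped (visited) first.
            for (int i = current.Children.Count - 1; i >= 0; i--)
            {
                stack.Push(current.Children[i]);
            }
        }

        return result;
    }
}
```

An explicit stack is used instead of recursion so that very deeply nested documents cannot overflow the call stack. Unlike GetAllObjects, this version also includes the root node itself in the result.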
          