Sometimes we need to extract the OLE objects that are embedded in a Word document. With Spire.Doc, this takes only a few lines of code. This article explains how to extract an embedded PDF document and an embedded Excel workbook from a Word document using Spire.Doc and C#.

Below is a screenshot of the Word document:

How to Extract OLE Objects from a Word Document

Detailed steps:

Step 1: Instantiate a Document object and load the Word document.

Document doc = new Document();
doc.LoadFromFile("OleObject.docx");

Step 2: Traverse the Word document and find the OLE objects. Check the object type of each OLE object to determine whether it is a PDF document or an Excel workbook, then write its native data to a new PDF file or Excel workbook. Note that the loop below only inspects paragraphs that sit directly in the section body, so OLE objects nested inside tables are not visited.

//Traverse all sections of the Word document
foreach (Section sec in doc.Sections)
{
    //Traverse all child objects in the body of each section
    foreach (DocumentObject obj in sec.Body.ChildObjects)
    {
        if (obj is Paragraph)
        {
            Paragraph par = obj as Paragraph;
            //Traverse all child objects in the paragraph
            foreach (DocumentObject o in par.ChildObjects)
            {
                //Find the OLE objects and extract their native data
                if (o.DocumentObjectType == DocumentObjectType.OleObject)
                {
                    DocOleObject ole = o as DocOleObject;
                    string s = ole.ObjectType;
                    //"AcroExch.Document.11" identifies an embedded PDF document
                    if (s == "AcroExch.Document.11")
                    {
                        File.WriteAllBytes("Result.pdf", ole.NativeData);
                    }
                    //"Excel.Sheet.12" identifies an embedded Excel workbook
                    else if (s == "Excel.Sheet.12")
                    {
                        File.WriteAllBytes("Result.xlsx", ole.NativeData);
                    }
                }
            }
        }
    }
}
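
The exact-match comparison above ties the extraction to one Acrobat version string and writes every match to the same file name, so a second embedded PDF would overwrite the first. Below is a minimal sketch of one way around both, assuming that other Acrobat versions register type strings sharing the "AcroExch.Document" prefix (for example "AcroExch.Document.DC"). The OleNaming class and ExtensionFor helper are hypothetical names and are not part of Spire.Doc:

```csharp
using System;

public static class OleNaming
{
    // Hypothetical helper: maps an OLE type string (the ObjectType value) to a
    // file extension. Returns null for types this sketch does not recognize.
    public static string ExtensionFor(string progId)
    {
        if (string.IsNullOrEmpty(progId))
            return null;

        // Assumption: every Acrobat type string starts with "AcroExch.Document"
        if (progId.StartsWith("AcroExch.Document", StringComparison.OrdinalIgnoreCase))
            return ".pdf";

        // "Excel.Sheet.12" is the type string the article uses for Excel workbooks
        if (progId.Equals("Excel.Sheet.12", StringComparison.OrdinalIgnoreCase))
            return ".xlsx";

        return null;
    }

    public static void Main()
    {
        // Sample type strings; "Package" stands in for an unrecognized type
        string[] sampleTypes = { "AcroExch.Document.11", "AcroExch.Document.DC", "Excel.Sheet.12", "Package" };
        int index = 0;

        foreach (string type in sampleTypes)
        {
            string ext = ExtensionFor(type);
            if (ext == null)
            {
                Console.WriteLine(type + ": skipped");
                continue;
            }

            // A running counter keeps a second PDF from overwriting the first
            index++;
            Console.WriteLine(type + ": Result" + index + ext);
        }
    }
}
```

In the extraction loop, the returned extension and the running counter could be combined into the file name passed to File.WriteAllBytes.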

Below are screenshots of the extracted PDF file and Excel workbook after running the code:

How to Extract OLE Objects from a Word Document


Full code:

using System.IO;
using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;

namespace Extract_OLEObjects_from_Word
{
    class Program
    {
        static void Main(string[] args)
        {
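            //Load the Word document that contains the embedded objects
            Document doc = new Document();
            doc.LoadFromFile("OleObject.docx");

            //Traverse all sections of the Word document
            foreach (Section sec in doc.Sections)
            {
                //Traverse all child objects in the body of each section
                foreach (DocumentObject obj in sec.Body.ChildObjects)
                {
                    if (obj is Paragraph)
                    {
                        Paragraph par = obj as Paragraph;
                        //Traverse all child objects in the paragraph
                        foreach (DocumentObject o in par.ChildObjects)
                        {
                            //Find the OLE objects and extract their native data
                            if (o.DocumentObjectType == DocumentObjectType.OleObject)
                            {
                                DocOleObject ole = o as DocOleObject;
                                string s = ole.ObjectType;
                                //"AcroExch.Document.11" identifies an embedded PDF document
                                if (s == "AcroExch.Document.11")
                                {
                                    File.WriteAllBytes("Result.pdf", ole.NativeData);
                                }
                                //"Excel.Sheet.12" identifies an embedded Excel workbook
                                else if (s == "Excel.Sheet.12")
                                {
                                    File.WriteAllBytes("Result.xlsx", ole.NativeData);
                                }
                            }
                        }
                    }
                }
            }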
        }
    }
}

Form fields are often used in documents like surveys, registration forms, or feedback forms to collect data from users. Extracting form field values allows you to gather and consolidate the submitted data for further analysis or processing. In this article, we will demonstrate how to extract form field values from PDF documents in C# and VB.NET using Spire.PDF for .NET.

Install Spire.PDF for .NET

To begin with, you need to add the DLL files included in the Spire.PDF for .NET package as references in your .NET project. The DLL files can be either downloaded manually or installed via NuGet.

PM> Install-Package Spire.PDF

Extract Form Field Values from PDF in C# and VB.NET

A PDF document may contain various types of form fields, such as text boxes, check boxes, radio buttons, list boxes, and combo boxes (drop-down lists). Each field type exposes its value through different properties, so you need to identify the type of a field before reading its value. The detailed steps are as follows:

  • Initialize an instance of the PdfDocument class.
  • Load a PDF document using the PdfDocument.LoadFromFile() method.
  • Initialize an instance of the StringBuilder class for storing the extracted form field values.
  • Get the form from the document using the PdfDocument.Form property.
  • Iterate through all form fields in the form.
  • Determine the type of each form field, then get its name and value using the corresponding properties and append them to the StringBuilder instance.
  • Write the content of the StringBuilder instance into a text file.
using Spire.Pdf;
using Spire.Pdf.Fields;
using Spire.Pdf.Widget;
using System.IO;
using System.Text;

namespace ExtractFormFieldValues
{
    internal class Program
    {
        static void Main(string[] args)
        {            
            //Initialize an instance of the PdfDocument class
            PdfDocument doc = new PdfDocument();
            //Load a PDF document
            doc.LoadFromFile(@"Forms.pdf");

            //Initialize an instance of the StringBuilder class
            StringBuilder sb = new StringBuilder();

            //Get the form from the document
            PdfFormWidget formWidget = doc.Form as PdfFormWidget;

            //Iterate through all fields in the form
            for (int i = 0; i < formWidget.FieldsWidget.List.Count; i++)
            {
                PdfField field = formWidget.FieldsWidget.List[i] as PdfField;

                //Get the name and value of textbox field
                if (field is PdfTextBoxFieldWidget)
                {
                    PdfTextBoxFieldWidget textBoxField = field as PdfTextBoxFieldWidget;                    
                    string name = textBoxField.Name;
                    string value = textBoxField.Text;
                    sb.Append("Textbox Name: " + name + "\r\n");
                    sb.Append("Textbox Value: " + value + "\r\n");
                }

                //Get the name, items and selected item of list box field
                if (field is PdfListBoxWidgetFieldWidget)
                {
                    PdfListBoxWidgetFieldWidget listBoxField = field as PdfListBoxWidgetFieldWidget;
                    string name = listBoxField.Name;
                    sb.Append("Listbox Name: " + name + "\r\n");
                    sb.Append("Listbox Items: \r\n");

                    PdfListWidgetItemCollection items = listBoxField.Values;

                    foreach (PdfListWidgetItem item in items)
                    {
                        sb.Append(item.Value + "\r\n");
                    }
                    string selectedValue = listBoxField.SelectedValue;
                    sb.Append("Listbox Selected Value: " + selectedValue + "\r\n");
                }

                //Get the name, items and selected item of combo box field
                if (field is PdfComboBoxWidgetFieldWidget)
                {
                    PdfComboBoxWidgetFieldWidget comBoxField = field as PdfComboBoxWidgetFieldWidget;
                    string name = comBoxField.Name;
                    sb.Append("Combobox Name: " + name + "\r\n");
                    sb.Append("Combobox Items: \r\n");
                    PdfListWidgetItemCollection items = comBoxField.Values;

                    foreach (PdfListWidgetItem item in items)
                    {
                        sb.Append(item.Value + "\r\n");
                    }
                    string selectedValue = comBoxField.SelectedValue;
                    sb.Append("Combobox Selected Value: " + selectedValue + "\r\n");

                }

                //Get the name and selected item of radio button field
                if (field is PdfRadioButtonListFieldWidget)
                {
                    PdfRadioButtonListFieldWidget radioBtnField = field as PdfRadioButtonListFieldWidget;
                    string name = radioBtnField.Name;
                    sb.Append("Radio Button Name: " + name + "\r\n");
                    string selectedValue = radioBtnField.SelectedValue;
                    sb.Append("Radio Button Selected Value: " + selectedValue + "\r\n");
                }

                //Get the name and status of checkbox field
                if (field is PdfCheckBoxWidgetFieldWidget)
                {
                    PdfCheckBoxWidgetFieldWidget checkBoxField = field as PdfCheckBoxWidgetFieldWidget;
                    string name = checkBoxField.Name;
                    sb.Append("Checkbox Name: " + name + "\r\n");
                    bool status = checkBoxField.Checked;
                    if (status)
                    {
                        sb.Append("Checkbox Status: Checked \r\n");
                    }
                    else
                    {
                        sb.Append("Checkbox Status: Unchecked \r\n");
                    }
                }

                //Separate fields with a blank line, using the same line ending as above
                sb.Append("\r\n");
            }

            //Write the content of the StringBuilder into a text file
            File.WriteAllText("GetAllValues.txt", sb.ToString());
            doc.Dispose();
        }
    }
}

C#/VB.NET: Extract Form Field Values from PDF

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

An image watermark is usually a logo or sign that appears on the background of a digital document and indicates the copyright owner of the content. Watermarking your PDF document with an image can discourage others from reusing or modifying your content. This article demonstrates how to add an image watermark to PDF in C# and VB.NET using Spire.PDF for .NET.

Install Spire.PDF for .NET

To begin with, you need to add the DLL files included in the Spire.PDF for .NET package as references in your .NET project. The DLL files can be either downloaded manually or installed via NuGet.

PM> Install-Package Spire.PDF 

Add an Image Watermark to PDF

The following are the main steps to add an image watermark to a PDF document.

  • Create a PdfDocument object, and load a sample PDF file using the PdfDocument.LoadFromFile() method.
  • Load an image file using the Image.FromFile() method.
  • Loop through the pages in the document, and get each page through the PdfDocument.Pages[] property.
  • Set the background opacity of the current page through the BackgroudOpacity property (the property name is spelled this way in the sample code).
  • Set the image as the background/watermark image of the current page through the PdfPageBase.BackgroundImage property. Set the image position and size through the PdfPageBase.BackgroundRegion property.
  • Save the document to a different PDF file using the PdfDocument.SaveToFile() method.

C#:

using Spire.Pdf;
using System.Drawing;

namespace AddImageWatermark
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a PdfDocument object
            PdfDocument document = new PdfDocument();

            //Load a sample PDF document
            document.LoadFromFile(@"C:\Users\Administrator\Desktop\sample.pdf");

            //Load an image
            Image image = Image.FromFile(@"C:\Users\Administrator\Desktop\logo.png");

            //Get the image width and height
            int imgWidth = image.Width;
            int imgHeight = image.Height;
         
            //Loop through the pages
            for (int i = 0; i < document.Pages.Count; i++)
            {
                //Get the page width and height
                float pageWidth = document.Pages[i].ActualSize.Width;
                float pageHeight = document.Pages[i].ActualSize.Height;

                //Set the background opacity
                document.Pages[i].BackgroudOpacity = 0.3f;

                //Set the background image of current page
                document.Pages[i].BackgroundImage = image;

                //Position the background image at the center of the page
                Rectangle rect = new Rectangle((int)(pageWidth - imgWidth) / 2, (int)(pageHeight - imgHeight) / 2, imgWidth, imgHeight);
                document.Pages[i].BackgroundRegion = rect;
            }

            //Save the document to file
            document.SaveToFile("AddImageWatermark.pdf");
            document.Close();
        }
    }
}

VB.NET:

Imports Spire.Pdf
Imports System.Drawing

Namespace AddImageWatermark
    Class Program
        Shared Sub Main(ByVal args() As String)
            'Create a PdfDocument object
            Dim document As PdfDocument = New PdfDocument()

            'Load a sample PDF document
            document.LoadFromFile("C:\Users\Administrator\Desktop\sample.pdf")

            'Load an image
            Dim image As Image = Image.FromFile("C:\Users\Administrator\Desktop\logo.png")

            'Get the image width and height
            Dim imgWidth As Integer = image.Width
            Dim imgHeight As Integer = image.Height

            'Loop through the pages
            For i As Integer = 0 To document.Pages.Count - 1
                'Get the page width and height
                Dim pageWidth As Single = document.Pages(i).ActualSize.Width
                Dim pageHeight As Single = document.Pages(i).ActualSize.Height

                'Set the background opacity
                document.Pages(i).BackgroudOpacity = 0.3F

                'Set the background image of current page
                document.Pages(i).BackgroundImage = image

                'Position the background image at the center of the page
                Dim rect As Rectangle = New Rectangle(CInt((pageWidth - imgWidth) / 2), CInt((pageHeight - imgHeight) / 2), imgWidth, imgHeight)
                document.Pages(i).BackgroundRegion = rect
            Next

            'Save the document to file
            document.SaveToFile("AddImageWatermark.pdf")
            document.Close()
        End Sub
    End Class
End Namespace
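
The centering arithmetic used for the background region can be checked in isolation. Below is a minimal sketch, assuming a page of roughly A4 size (595 x 842 points) and a 200 x 100 image. CenterDemo and CenterOffset are hypothetical names, and nothing here calls Spire.PDF:

```csharp
using System;

public static class CenterDemo
{
    // Returns the top-left corner that centers an image of the given size
    // on a page of the given size. Fractions are truncated toward zero.
    public static (int X, int Y) CenterOffset(float pageWidth, float pageHeight, int imgWidth, int imgHeight)
    {
        int x = (int)((pageWidth - imgWidth) / 2);
        int y = (int)((pageHeight - imgHeight) / 2);
        return (x, y);
    }

    public static void Main()
    {
        // Assumed sample values: a 595 x 842 page and a 200 x 100 image
        var (x, y) = CenterOffset(595f, 842f, 200, 100);
        Console.WriteLine("x=" + x + ", y=" + y); // prints "x=197, y=371"
    }
}
```

If the image is larger than the page, the offsets come out negative, so it may be worth scaling the image down before using it as a background.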

C#/VB.NET: Add Image Watermarks to PDF

