Knowledgebase (2311)

How to get text from word document in C#

2016-11-08 08:05:46 Written by Koohji

When a Word document holds a large amount of information, we sometimes need only its text for use elsewhere. We have already demonstrated how to extract text with Spire.Doc by traversing every paragraph in the document and appending each paragraph's text. This article shows how to use the doc.GetText() method to extract the text directly from a Word document that contains text, images and tables, which is a more convenient way to get the text from code.

First, view the sample Word document from which the text will be extracted:

[Image: the sample Word document]

Step 1: Create a Document instance and load the source Word document from a file.

Document doc = new Document();
doc.LoadFromFile("Sample.docx");

Step 2: Call the doc.GetText() method to get all the text from the Word document.

string s = doc.GetText();

Step 3: Create a new text file and save the extracted text to it.

File.WriteAllText("Extract.txt", s);

Screenshot of the result after getting all the text from the Word document:

[Image: the extracted text]

Full code:

using Spire.Doc;
using System.IO;

namespace GetText
{
    class WordText
    {
        public void GetText()
        {
            // Load the source Word document
            Document doc = new Document();
            doc.LoadFromFile("Sample.docx");

            // Extract all the text from the document
            string s = doc.GetText();

            // Save the extracted text to a text file
            File.WriteAllText("Extract.txt", s);
        }
    }
}
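
Once the text is in a string, it is often worth tidying it before further use. The following is a minimal sketch that uses only the .NET base class library; the ExtractedText class and its NonEmptyLines method are hypothetical names introduced here for illustration and are not part of Spire.Doc. It splits a string, such as the one returned by doc.GetText(), into trimmed, non-empty lines.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of Spire.Doc) for tidying extracted text.
public static class ExtractedText
{
    // Splits the text on Windows, Unix or old Mac line breaks,
    // trims each line and drops the lines that end up empty.
    public static string[] NonEmptyLines(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return Array.Empty<string>();
        }

        return text
            .Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .ToArray();
    }
}
```

For example, ExtractedText.NonEmptyLines(s) could be called on the string from Step 2, and the resulting lines written out with File.WriteAllLines() instead of File.WriteAllText().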

Published in Text

C#: Get Coordinates of Text or an Image in PDF

2023-11-28 06:30:00 Written by Administrator

Getting the coordinates of text or an image in a PDF is a useful task that allows precise referencing and manipulation of specific elements within the document. By extracting the coordinates, one can accurately identify the position of text or images on each page. This information proves valuable for tasks like data extraction, text recognition, or highlighting specific areas. This article introduces how to get the coordinate information of text or an image in a PDF document in C# using Spire.PDF for .NET.

Get Coordinates of Text in PDF in C#
Get Coordinates of an Image in PDF in C#

Install Spire.PDF for .NET

To begin with, you need to add the DLL files included in the Spire.PDF for .NET package as references in your .NET project. You can download Spire.PDF for .NET from our website or install it directly through NuGet.

Package Manager

PM> Install-Package Spire.PDF

Get Coordinates of Text in PDF in C#

The PdfTextFinder.Find() method provided by Spire.PDF finds all instances of a search string in a searchable PDF document. The coordinate information of a specific instance can be obtained through the PdfTextFragment.Positions property. The following are the steps to get the (X, Y) coordinates of the specified text in a PDF using Spire.PDF for .NET.

Create a PdfDocument object.
Load a PDF file using the PdfDocument.LoadFromFile() method.
Loop through the pages in the document.
Create a PdfTextFinder object, and get all instances of the specified text from a page using the PdfTextFinder.Find() method.
Loop through the find results and get the coordinate information of each result through the PdfTextFragment.Positions property.

using Spire.Pdf;
using Spire.Pdf.Texts;
using System.Collections.Generic;
using System;
using System.Drawing;

namespace GetCoordinatesOfText
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a PdfDocument object
            PdfDocument doc = new PdfDocument();

            //Load a PDF file
            doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.pdf");

            //Loop through all pages
            foreach (PdfPageBase page in doc.Pages)
            {
                //Create a PdfTextFinder object
                PdfTextFinder finder = new PdfTextFinder(page);

                //Set the find options
                PdfTextFindOptions options = new PdfTextFindOptions();
                options.Parameter = TextFindParameter.IgnoreCase;
                finder.Options = options;

                //Find all instances of a specific text
                List<PdfTextFragment> fragments = finder.Find("target audience");

                //Loop through the instances
                foreach (PdfTextFragment fragment in fragments)
                {
                    //Get the position of a specific instance
                    PointF found = fragment.Positions[0];
                    Console.WriteLine(found);
                }
            }
        }
    }
}
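
The coordinates printed by the code above are plain numbers. Assuming they are expressed in PDF points (1 point = 1/72 inch), which is an assumption about the measurement unit that should be checked against the Spire.PDF documentation, they can be converted to millimeters with simple arithmetic. The sketch below is not part of Spire.PDF; PdfUnits and PointsToMillimeters are hypothetical names used only for illustration.

```csharp
// Hypothetical helper (not part of Spire.PDF) for unit conversion.
// Assumes the input is measured in points, where 1 point = 1/72 inch.
public static class PdfUnits
{
    private const double PointsPerInch = 72.0;
    private const double MillimetersPerInch = 25.4;

    // Converts a length in points to millimeters.
    public static double PointsToMillimeters(double points)
    {
        return points / PointsPerInch * MillimetersPerInch;
    }
}
```

With this helper, a call such as PdfUnits.PointsToMillimeters(found.X) would give the horizontal position of a found fragment in millimeters, provided the point-unit assumption holds.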

Get Coordinates of an Image in PDF in C#

Spire.PDF provides the PdfImageHelper.GetImagesInfo() method to help us get all image information on a specific page. The coordinate information of a specific image can be obtained through the PdfImageInfo.Bounds property. The following are the steps to get the coordinates of an image in a PDF document using Spire.PDF for .NET.

Create a PdfDocument object.
Load a PDF file using the PdfDocument.LoadFromFile() method.
Get a specific page through the PdfDocument.Pages[index] property.
Create a PdfImageHelper object, and get all image information from the page using the PdfImageHelper.GetImagesInfo() method.
Get the coordinate information of a specific image through the PdfImageInfo.Bounds property.

using Spire.Pdf;
using Spire.Pdf.Utilities;
using System;

namespace GetCoordinatesOfImage
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a PdfDocument object
            PdfDocument doc = new PdfDocument();

            //Load a PDF file
            doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.pdf");

            //Get a specific page
            PdfPageBase page = doc.Pages[0];

            //Create a PdfImageHelper object
            PdfImageHelper helper = new PdfImageHelper();

            //Get image information from the page
            PdfImageInfo[] images = helper.GetImagesInfo(page);

            //Get X,Y coordinates of a specific image
            float xPos = images[0].Bounds.X;
            float yPos = images[0].Bounds.Y;
            Console.WriteLine("The image is located at ({0}, {1})", xPos, yPos);
        }
    }
}

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Text

Tagged under

pdf net Text

C#: Get the Font Information in PDF

2025-01-02 08:10:00 Written by Administrator

Getting PDF font information means extracting details about the fonts used in a PDF document. This information typically includes the font name, size, type, color, and other attributes. Knowing these details can help ensure the consistency, copyright compliance, and aesthetics of the PDF document. In this article, you will learn how to get the font information in a PDF in C# using Spire.PDF for .NET.

Get the Fonts of Specified Text in PDF in C#
Get the Used Fonts in PDF in C#

Install Spire.PDF for .NET

To begin with, you need to add the DLL files included in the Spire.PDF for .NET package as references in your .NET project. The DLL files can be either downloaded from our website or installed via NuGet.

Package Manager

PM> Install-Package Spire.PDF

Get the Fonts of Specified Text in PDF in C#

With Spire.PDF for .NET, you can find specified text and get its font formatting such as font name, size, style and color through the corresponding properties of the PdfTextFragment class. The following are the detailed steps.

Create a PdfDocument instance.
Load a PDF file using the PdfDocument.LoadFromFile() method.
Get a specified page using the PdfDocument.Pages[] property.
Create a PdfTextFinder instance.
Find the specified text using the PdfTextFinder.Find() method, which returns a list of PdfTextFragment objects.
Create a StringBuilder instance to store the information.
Iterate through all the found text.
- Get the found text using the PdfTextFragment.Text property.
- Get the font name of the found text using the PdfTextFragment.TextStates[0].FontName property.
- Get the font size of the found text using the PdfTextFragment.TextStates[0].FontSize property.
- Get the font family of the found text using the PdfTextFragment.TextStates[0].FontFamily property.
- Check whether the font is bold, faux bold (font style set to fill and stroke) or italic using the PdfTextFragment.TextStates[0].IsBold, PdfTextFragment.TextStates[0].IsSimulateBold and PdfTextFragment.TextStates[0].IsItalic properties.
- Get the font color of the found text using the PdfTextFragment.TextStates[0].ForegroundColor property.
- Append the font information to the StringBuilder instance using the StringBuilder.AppendLine() method.
Write the result to a txt file.

using Spire.Pdf;
using Spire.Pdf.Texts;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using System.Text;

namespace GetTextFont
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a PdfDocument instance
            PdfDocument pdf = new PdfDocument();

            // Load a PDF file
            pdf.LoadFromFile("NET Framework.pdf");

            // Get the first page
            PdfPageBase page = pdf.Pages[0];

            // Create a PdfTextFinder instance
            PdfTextFinder finds = new PdfTextFinder(page);

            // Find specified text on the page
            finds.Options.Parameter = TextFindParameter.None;
            List<PdfTextFragment> result = finds.Find(".NET Framework");

            // Create a StringBuilder instance
            StringBuilder str = new StringBuilder();

            // Iterate through all found text
            foreach (PdfTextFragment find in result)
            {
                // Get the found text
                string text = find.Text;
                // Get the font name 
                string FontName = find.TextStates[0].FontName;
                // Get the font size
                float FontSize = find.TextStates[0].FontSize;
                // Get the font family
                string FontFamily = find.TextStates[0].FontFamily;
                // Indicate whether the font is bold or italic
                bool IsBold = find.TextStates[0].IsBold;
                bool IsSimulateBold = find.TextStates[0].IsSimulateBold;
                bool IsItalic = find.TextStates[0].IsItalic;
                // Get the font color
                Color color = find.TextStates[0].ForegroundColor;

                // Append the font information to the StringBuilder instance
                str.AppendLine(text);
                str.AppendLine("FontName: " + FontName);
                str.AppendLine("FontSize: " + FontSize);
                str.AppendLine("FontFamily: " + FontFamily);
                str.AppendLine("IsBold: " + IsBold);
                str.AppendLine("IsSimulateBold: " + IsSimulateBold);
                str.AppendLine("IsItalic: " + IsItalic);
                str.AppendLine("color: " + color);
                str.AppendLine(" ");
            }
            // Write to a txt file
            File.WriteAllText("PdfFont.txt", str.ToString());
        }
    }
}

Get the font name, size, color and style of the specified text in PDF
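
The code above appends the System.Drawing.Color value directly, so the text file contains the default ToString() form of the color. If a more compact form is preferred, the color can be formatted as a hex string first. This is a minimal sketch that relies only on System.Drawing.Color; ColorFormat and ToHex are hypothetical names that are not part of Spire.PDF, and the alpha channel is deliberately ignored.

```csharp
using System.Drawing;

// Hypothetical helper (not part of Spire.PDF) for printing colors compactly.
public static class ColorFormat
{
    // Formats the red, green and blue channels as "#RRGGBB"; alpha is ignored.
    public static string ToHex(Color color)
    {
        return $"#{color.R:X2}{color.G:X2}{color.B:X2}";
    }
}
```

In the loop above, the color line could then be written as str.AppendLine("color: " + ColorFormat.ToHex(color)).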

Get the Used Fonts in PDF in C#

Spire.PDF for .NET also provides the PdfUsedFont class to represent the fonts used in a PDF document. To get the formatting of all used fonts, you can iterate through each font and retrieve its font name, size, type and style using the corresponding properties of the PdfUsedFont class. The following are the detailed steps.

Create a PdfDocument instance.
Load a PDF file using the PdfDocument.LoadFromFile() method.
Get all the fonts used in the PDF file using the PdfDocument.UsedFonts property.
Create a StringBuilder instance to store the information.
Iterate through the used fonts.
- Get the font name using the PdfUsedFont.Name property.
- Get the font size using the PdfUsedFont.Size property.
- Get the font type using the PdfUsedFont.Type property.
- Get the font style using the PdfUsedFont.Style property.
- Append the font information to the StringBuilder instance using the StringBuilder.AppendLine() method.
Write the result to a txt file.

using Spire.Pdf;
using Spire.Pdf.Graphics;
using Spire.Pdf.Graphics.Fonts;
using System.IO;
using System.Text;

namespace GetPdfFont
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a PdfDocument instance
            PdfDocument pdf = new PdfDocument();

            // Load a PDF file
            pdf.LoadFromFile("NET Framework.pdf");

            // Get the used fonts in the PDF file
            PdfUsedFont[] fonts = pdf.UsedFonts;

            // Create a StringBuilder instance
            StringBuilder str = new StringBuilder();

            // Iterate through the used fonts
            foreach (PdfUsedFont font in fonts)
            {
                // Get the font name
                string name = font.Name;

                // Get the font size
                float size = font.Size;

                // Get the font type
                PdfFontType type = font.Type;

                // Get the font style
                PdfFontStyle style = font.Style;

                // Append the font information to the StringBuilder instance
                str.AppendLine("FontName: " + name + " FontSize: " + size + " FontType: " + type + " FontStyle: " + style);

            }

            // Write to a txt file
            File.WriteAllText("PdfFontInfo.txt", str.ToString());
        }
    }
}

Get the font name, size, style and type of all used fonts in PDF

Published in Text

Tagged under

pdf net Text
