Sometimes we only need the text of a Word document for use elsewhere, especially when the document contains a large amount of information. With Spire.Doc, we have already demonstrated how to extract text by traversing every paragraph in the document and appending its text. This article shows how to use the doc.GetText() method to extract the text directly from a Word document that contains text, images and tables, which is more convenient when working from code.
First, view the sample Word document from which the text will be extracted:

Step 1: Create a Document instance and load the source Word document from file.
Document doc = new Document();
doc.LoadFromFile("Sample.docx");
Step 2: Call the doc.GetText() method to get all the text from the Word document.
string s = doc.GetText();
Step 3: Create a new text file to save the extracted text.
File.WriteAllText("Extract.txt", s);
Screenshot of the result after extracting all the text from the Word document:

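Full code:
using Spire.Doc;
using System.IO;

namespace GetText
{
    class WordText
    {
        public void GetText()
        {
            // Create a Document instance and load the source Word document
            Document doc = new Document();
            doc.LoadFromFile("Sample.docx");

            // Get all the text from the document
            string s = doc.GetText();

            // Save the extracted text to a new text file
            File.WriteAllText("Extract.txt", s);
        }
    }
}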
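If the extracted string is going to be processed further, it may help to tidy it before saving. The following is a minimal sketch, not part of Spire.Doc: the ExtractedText class and its RemoveBlankLines method are hypothetical names introduced here for illustration, and the sketch uses only standard .NET string handling.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not a Spire.Doc API) for tidying extracted text.
public static class ExtractedText
{
    // Trims every line and drops lines that are empty or whitespace-only.
    public static string RemoveBlankLines(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        var lines = text
            .Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0);

        return string.Join(Environment.NewLine, lines);
    }
}
```

Under these assumptions, the save step could become File.WriteAllText("Extract.txt", ExtractedText.RemoveBlankLines(doc.GetText()));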
Getting the coordinates of text or an image in a PDF is a useful task that allows precise referencing and manipulation of specific elements within the document. By extracting the coordinates, one can accurately identify the position of text or images on each page. This information proves valuable for tasks like data extraction, text recognition, or highlighting specific areas. This article introduces how to get the coordinate information of text or an image in a PDF document in C# using Spire.PDF for .NET.
Install Spire.PDF for .NET
To begin with, you need to add the DLL files included in the Spire.PDF for .NET package as references in your .NET project. You can download Spire.PDF for .NET from our website or install it directly through NuGet.
PM> Install-Package Spire.PDF
Get Coordinates of Text in PDF in C#
The PdfTextFinder.Find() method provided by Spire.PDF finds all instances of a search string in a searchable PDF document. The coordinate information of a specific instance can be obtained through the PdfTextFragment.Positions property. The following are the steps to get the (X, Y) coordinates of the specified text in a PDF using Spire.PDF for .NET.
- Create a PdfDocument object.
- Load a PDF file using PdfDocument.LoadFromFile() method.
- Loop through the pages in the document.
- Create a PdfTextFinder object, and get all instances of the specified text from a page using PdfTextFinder.Find() method.
- Loop through the find results and get the coordinate information of a specific result through PdfTextFragment.Positions property.
- C#
using Spire.Pdf;
using Spire.Pdf.Texts;
using System.Collections.Generic;
using System;
using System.Drawing;

namespace GetCoordinatesOfText
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a PdfDocument object
            PdfDocument doc = new PdfDocument();

            //Load a PDF file
            doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.pdf");

            //Loop through all pages
            foreach (PdfPageBase page in doc.Pages)
            {
                //Create a PdfTextFinder object
                PdfTextFinder finder = new PdfTextFinder(page);

                //Set the find options
                PdfTextFindOptions options = new PdfTextFindOptions();
                options.Parameter = TextFindParameter.IgnoreCase;
                finder.Options = options;

                //Find all instances of a specific text
                List<PdfTextFragment> fragments = finder.Find("target audience");

                //Loop through the instances
                foreach (PdfTextFragment fragment in fragments)
                {
                    //Get the position of a specific instance
                    PointF found = fragment.Positions[0];
                    Console.WriteLine(found);
                }
            }
        }
    }
}
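The positions printed above are plain PointF values. As a minimal sketch, assuming the coordinates are expressed in PDF points (1/72 inch), they can be converted to other units for reporting. The PdfUnits class and its methods are hypothetical helpers introduced here, not part of Spire.PDF.

```csharp
using System.Drawing;

// Hypothetical helper (not a Spire.PDF API).
// Assumption: the input coordinates are in PDF points, where 72 points = 1 inch.
public static class PdfUnits
{
    private const float PointsPerInch = 72f;
    private const float MillimetersPerInch = 25.4f;

    // Converts a position in points to inches.
    public static PointF PointsToInches(PointF p)
    {
        return new PointF(p.X / PointsPerInch, p.Y / PointsPerInch);
    }

    // Converts a position in points to millimeters.
    public static PointF PointsToMillimeters(PointF p)
    {
        return new PointF(
            p.X / PointsPerInch * MillimetersPerInch,
            p.Y / PointsPerInch * MillimetersPerInch);
    }
}
```

For example, inside the inner loop one could call PdfUnits.PointsToMillimeters(found) before printing the position.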

Get Coordinates of an Image in PDF in C#
Spire.PDF provides the PdfImageHelper.GetImagesInfo() method to help us get all image information on a specific page. The coordinate information of a specific image can be obtained through the PdfImageInfo.Bounds property. The following are the steps to get the coordinates of an image in a PDF document using Spire.PDF for .NET.
- Create a PdfDocument object.
- Load a PDF file using PdfDocument.LoadFromFile() method.
- Get a specific page through PdfDocument.Pages[index] property.
- Create a PdfImageHelper object, and get all image information from the page using PdfImageHelper.GetImagesInfo() method.
- Get the coordinate information of a specific image through PdfImageInfo.Bounds property.
- C#
using Spire.Pdf;
using Spire.Pdf.Utilities;
using System;

namespace GetCoordinatesOfImage
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a PdfDocument object
            PdfDocument doc = new PdfDocument();

            //Load a PDF file
            doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.pdf");

            //Get a specific page
            PdfPageBase page = doc.Pages[0];

            //Create a PdfImageHelper object
            PdfImageHelper helper = new PdfImageHelper();

            //Get image information from the page
            PdfImageInfo[] images = helper.GetImagesInfo(page);

            //Get X,Y coordinates of a specific image (the first image on the page)
            float xPos = images[0].Bounds.X;
            float yPos = images[0].Bounds.Y;
            Console.WriteLine("The image is located at ({0},{1})", xPos, yPos);
        }
    }
}
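Once image bounds are available, a common follow-up is to check which image, if any, covers a given position on the page. The sketch below assumes the bounds are System.Drawing.RectangleF values (the code above reads X and Y from them as floats) and that the point is expressed in the same page coordinate space. BoundsLookup and IndexesContaining are hypothetical names introduced here, not part of Spire.PDF.

```csharp
using System.Collections.Generic;
using System.Drawing;

// Hypothetical helper (not a Spire.PDF API).
// Assumption: all rectangles and the point share one page coordinate space.
public static class BoundsLookup
{
    // Returns the indexes of the rectangles that contain the given point.
    public static List<int> IndexesContaining(IReadOnlyList<RectangleF> bounds, PointF point)
    {
        var hits = new List<int>();
        for (int i = 0; i < bounds.Count; i++)
        {
            if (bounds[i].Contains(point))
            {
                hits.Add(i);
            }
        }
        return hits;
    }
}
```

To use it with the example above, the Bounds value of each entry in images could be copied into a list and passed in together with the position of interest.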

Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Getting PDF font information is the process of extracting details about the fonts used in a PDF document. This information typically includes the font name, size, type, color, and other attributes. Knowing these details can help ensure the consistency, copyright compliance, and aesthetics of the PDF document. In this article, you will learn how to get font information from a PDF in C# using Spire.PDF for .NET.
Install Spire.PDF for .NET
To begin with, you need to add the DLL files included in the Spire.PDF for .NET package as references in your .NET project. The DLL files can be either downloaded from our website or installed via NuGet.
PM> Install-Package Spire.PDF
Get the Fonts of Specified Text in PDF in C#
With Spire.PDF for .NET, you can find specified text and get its font formatting such as font name, size, style and color through the corresponding properties of the PdfTextFragment class. The following are the detailed steps.
- Create a PdfDocument instance.
- Load a PDF file using PdfDocument.LoadFromFile() method.
- Get a specified page using PdfDocument.Pages[] property.
- Create a PdfTextFinder instance.
- Find the specified text using PdfTextFinder.Find() method, which returns a list of PdfTextFragment objects.
- Create a StringBuilder instance to store information.
- Iterate through all found text.
- Get the found text using PdfTextFragment.Text property.
- Get the font name of the found text using PdfTextFragment.TextStates[0].FontName property.
- Get the font size of the found text using PdfTextFragment.TextStates[0].FontSize property.
- Get the font family of the found text using PdfTextFragment.TextStates[0].FontFamily property.
- Check whether the font is bold, faux bold (font style set to fill and stroke), or italic using PdfTextFragment.TextStates[0].IsBold, PdfTextFragment.TextStates[0].IsSimulateBold and PdfTextFragment.TextStates[0].IsItalic properties.
- Get the font color of the found text using PdfTextFragment.TextStates[0].ForegroundColor property.
- Append the font information to the StringBuilder instance using StringBuilder.AppendLine() method.
- Write to a txt file.
- C#
using Spire.Pdf;
using Spire.Pdf.Texts;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using System.Text;

namespace GetTextFont
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a PdfDocument instance
            PdfDocument pdf = new PdfDocument();

            // Load a PDF file
            pdf.LoadFromFile("NET Framework.pdf");

            // Get the first page
            PdfPageBase page = pdf.Pages[0];

            // Create a PdfTextFinder instance
            PdfTextFinder finds = new PdfTextFinder(page);

            // Find specified text on the page
            finds.Options.Parameter = TextFindParameter.None;
            List<PdfTextFragment> result = finds.Find(".NET Framework");

            // Create a StringBuilder instance
            StringBuilder str = new StringBuilder();

            // Iterate through all found text
            foreach (PdfTextFragment find in result)
            {
                // Get the found text
                string text = find.Text;

                // Get the font name
                string fontName = find.TextStates[0].FontName;

                // Get the font size
                float fontSize = find.TextStates[0].FontSize;

                // Get the font family
                string fontFamily = find.TextStates[0].FontFamily;

                // Check whether the font is bold, faux bold or italic
                bool isBold = find.TextStates[0].IsBold;
                bool isSimulateBold = find.TextStates[0].IsSimulateBold;
                bool isItalic = find.TextStates[0].IsItalic;

                // Get the font color
                Color color = find.TextStates[0].ForegroundColor;

                // Append the font information to the StringBuilder instance
                str.AppendLine(text);
                str.AppendLine("FontName: " + fontName);
                str.AppendLine("FontSize: " + fontSize);
                str.AppendLine("FontFamily: " + fontFamily);
                str.AppendLine("IsBold: " + isBold);
                str.AppendLine("IsSimulateBold: " + isSimulateBold);
                str.AppendLine("IsItalic: " + isItalic);
                str.AppendLine("color: " + color);
                str.AppendLine(" ");
            }

            // Write to a txt file
            File.WriteAllText("PdfFont.txt", str.ToString());
        }
    }
}
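In the code above, "color: " + color relies on Color.ToString(), which produces a verbose description of the color rather than a compact value. If a hex code is preferred in the report, a small formatter along the following lines could be used. This is only a sketch using System.Drawing.Color; ColorFormat and ToHex are hypothetical names introduced here, not part of Spire.PDF.

```csharp
using System.Drawing;

// Hypothetical helper (not a Spire.PDF API).
public static class ColorFormat
{
    // Formats a color as #RRGGBB, ignoring the alpha channel.
    public static string ToHex(Color c)
    {
        return $"#{c.R:X2}{c.G:X2}{c.B:X2}";
    }
}
```

With this helper, the color line could be written as str.AppendLine("color: " + ColorFormat.ToHex(color));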

Get the Used Fonts in PDF in C#
Spire.PDF for .NET also provides the PdfUsedFont class to represent the fonts used in a PDF document. To get the formatting of all used fonts, you can iterate through each font and retrieve its font name, size, type and style using the corresponding properties of the PdfUsedFont class. The following are the detailed steps.
- Create a PdfDocument instance.
- Load a PDF file using PdfDocument.LoadFromFile() method.
- Get all the fonts used in the PDF file using PdfDocument.UsedFonts property.
- Create a StringBuilder instance to store information.
- Iterate through the used fonts.
- Get the font name using PdfUsedFont.Name property.
- Get the font size using PdfUsedFont.Size property.
- Get the font type using PdfUsedFont.Type property.
- Get the font style using PdfUsedFont.Style property.
- Append the font information to the StringBuilder instance using StringBuilder.AppendLine() method.
- Write to a txt file.
- C#
using Spire.Pdf;
using Spire.Pdf.Graphics;
using Spire.Pdf.Graphics.Fonts;
using System.IO;
using System.Text;

namespace GetPdfFont
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a PdfDocument instance
            PdfDocument pdf = new PdfDocument();

            // Load a PDF file
            pdf.LoadFromFile("NET Framework.pdf");

            // Get the used fonts in the PDF file
            PdfUsedFont[] fonts = pdf.UsedFonts;

            // Create a StringBuilder instance
            StringBuilder str = new StringBuilder();

            // Iterate through the used fonts
            foreach (PdfUsedFont font in fonts)
            {
                // Get the font name
                string name = font.Name;

                // Get the font size
                float size = font.Size;

                // Get the font type
                PdfFontType type = font.Type;

                // Get the font style
                PdfFontStyle style = font.Style;

                // Append the font information to the StringBuilder instance
                str.AppendLine("FontName: " + name + " FontSize: " + size + " FontType: " + type + " FontStyle: " + style);
            }

            // Write to a txt file
            File.WriteAllText("PdfFontInfo.txt", str.ToString());
        }
    }
}
