Knowledgebase (2311)
Children categories

Optical Character Recognition (OCR) technology bridges the physical and digital worlds by converting text within images into machine-readable data. For .NET developers, the ability to extract text from images in C# is essential for building intelligent document processing, automated data entry, and accessibility solutions.
In this article, we’ll explore how to implement OCR in C# using the Spire.OCR for .NET library, covering basic extraction, advanced features like coordinate tracking, and best practices to ensure accuracy and efficiency.
Table of Contents:
- Understanding OCR and Spire.OCR
- Setting Up Your OCR Environment
- Basic Recognition: Extract Text from Images in C#
- Advanced Extraction: Extract Text with Coordinates in C#
- Tips to Optimize OCR Accuracy
- FAQs (Supported Languages and Image Formats)
- Conclusion & Free License
Understanding OCR and Spire.OCR
What is OCR?
OCR technology analyzes images of text - such as scanned documents, screenshots, or photos - and converts them into text strings that can be edited, searched, or processed programmatically.
Why Spire.OCR Stands Out?
Spire.OCR for .NET is a powerful, developer-friendly library that enables highly accurate text recognition from images in C# applications. Key features include:
- Support for multiple languages (English, Chinese, Japanese, etc.).
- High accuracy recognition algorithms optimized for various fonts and styles.
- Text coordinate extraction for precise positioning.
- Batch processing capabilities.
- Compatibility with .NET Framework and .NET Core.
Setting Up Your OCR Environment
Before diving into the C# code for image to text OCR operations, configure your development environment first:
1. Install via NuGet:
Open the NuGet Package Manager in Visual Studio. Search for "Spire.OCR" and install the latest version in your project. Alternatively, use the Package Manager Console:
Install-Package Spire.OCR
2. Download OCR Models:
Spire.OCR relies on pre-trained models to recognize image text. Download the model files for your operating system:
After downloading, extract to a directory (e.g., F:\OCR Model\win-x64)
Important Note: Remember to change the platform target of your solution to x64 as Spire.OCR only supports 64-bit platforms.

Basic Recognition: Extract Text from Images in C#
Let’s start with a simple example that demonstrates how to read text from an image using Spire.OCR.
C# code to get text from an image:
using Spire.OCR;
using System.IO;
namespace OCRTextFromImage
{
internal class Program
{
static void Main(string[] args)
{
// Create an instance of the OcrScanner class
OcrScanner scanner = new OcrScanner();
// Create an instance of the ConfigureOptions class
ConfigureOptions configureOptions = new ConfigureOptions();
// Set the path to the OCR model
configureOptions.ModelPath = "F:\\OCR Model\\win-x64";
// Set the language for text recognition. (The default is English.)
configureOptions.Language = "English";
// Apply the configuration options to the scanner
scanner.ConfigureDependencies(configureOptions);
// Scan image and extract text
scanner.Scan("sample.png");
// Save the extracted text to a txt file
string text = scanner.Text.ToString();
File.WriteAllText("output.txt", text);
}
}
}
Code Explanation:
- OcrScanner: Core class for text recognition.
- ConfigureOptions: Sets OCR parameters:
- ModelPath: Specifies the path to the OCR model files.
- Language: Defines the recognition language (e.g., "English", "Chinese").
- Scan(): Processes image and extracts text using the configured settings.
Output:
This C# code processes an image file (sample.png) and saves the extracted text to a text file (output.txt) using File.WriteAllText().

Advanced Extraction: Extract Text with Coordinates in C#
In many cases, knowing the position of extracted text within an image is as important as the text itself - for example, when processing invoices, forms, or structured documents. Spire.OCR allows you to extract not just text but also the coordinates of the text blocks, enabling precise analysis.
C# code to extract text with coordinates from an Image:
using Spire.OCR;
using System.Collections.Generic;
using System.IO;
namespace OCRWithCoordinates
{
internal class Program
{
static void Main(string[] args)
{
// Create an instance of the OcrScanner class
OcrScanner scanner = new OcrScanner();
// Create an instance of the ConfigureOptions class
ConfigureOptions configureOptions = new ConfigureOptions();
// Set the path to the OCR model
configureOptions.ModelPath = "F:\\OCR Model\\win-x64";
// Set the language for text recognition. (The default is English.)
configureOptions.Language = "English";
// Apply the configuration options to the scanner
scanner.ConfigureDependencies(configureOptions);
// Extract text from an image
scanner.Scan("invoice.png");
// Get the OCR result text
IOCRText text = scanner.Text;
// Create a list to store information
List<string> results = new List<string>();
// Iterate through each block of the OCR result text
foreach (IOCRTextBlock block in text.Blocks)
{
// Add the text of each block and its location information to the list
results.Add($"Block Text: {block.Text}");
results.Add($"Coordinates: {block.Box}");
results.Add("---------");
}
// Save the extracted text with coordinates to a txt file
File.WriteAllLines("ExtractWithCoordinates.txt", results);
}
}
}
Critical Details
- IOCRText: Represents the entire OCR result.
- IOCRTextBlock: Represents a block of contiguous text (e.g., a paragraph, line, or word).
- IOCRTextBlock.Box: Contains the rectangular coordinates of the text block:
- X (horizontal position)
- Y (vertical position)
- Width
- Height
Output:
This C# code performs OCR on an image file (invoice.png), extracting both the recognized text and its position coordinates in the image, then saves this information to a text file (ExtractWithCoordinates.txt).

Tips to Optimize OCR Accuracy
To ensure reliable results when using C# to recognize text from images, consider these best practices:
- Use high-resolution images (300 DPI or higher).
- Preprocess images (e.g., resize, deskew) for better results.
- Ensure correct language settings correspond to the text in image.
- Store OCR models in a secure, accessible location.
FAQs (Supported Languages and Image Formats)
Q1: What image formats does Spire.OCR support?
A: Spire.OCR supports all common formats:
- PNG
- JPEG/JPG
- BMP
- TIFF
- GIF
Q2: What languages does Spire.OCR support?
A: Multiple languages are supported:
- English (default)
- Chinese (Simplified and Traditional)
- Japanese
- Korean
- German
- French
Q3: Can I use Spire.OCR in ASP.NET Core applications?
A: Yes. Supported environments:
- .NET Framework 2.0+
- .NET Standard 2.0+
- .NET Core 2.0+
- .NET 5
Q4: Can Spire.OCR extract text from scanned PDFs in C#?
A: The task requires the Spire.PDF integration to convert PDFs to images or extract images from scanned PDFs first, and then use the above C# examples to get text from the images.
Conclusion & Free License
Spire.OCR for .NET provides a powerful yet straightforward solution for extracting text from images in C# applications. Whether you’re building a simple tool to convert images to text or a complex system for processing thousands of invoices, by following the techniques and best practices outlined in this guide, you can integrate OCR functionality into your C# applications with ease.
Request a 30-day trial license here to get unlimited OCR capabilities and unlock valuable information trapped in visual format.
In daily work, extracting text from PDF files is a common task. For standard digital documents—such as those exported from Word to PDF—this process is usually straightforward. However, things get tricky when dealing with scanned PDFs, which are essentially images of printed documents. In such cases, traditional text extraction methods fail, and OCR (Optical Character Recognition) becomes necessary to recognize and convert the text within images into editable content.
In this article, we’ll walk through how to perform PDF OCR using Python to automate this workflow and significantly reduce manual effort.
- Why OCR is Needed for PDF Text Extraction
- Best Python OCR Libraries for PDF Processing
- Convert PDF Pages to Images Using Python
- Scan and Extract Text from Images Using Spire.OCR
- Conclusion
Why OCR is Needed for PDF Text Extraction
When it comes to extracting text from PDF files, one important factor that determines your approach is the type of PDF. Generally, PDFs fall into two categories: scanned (image-based) PDFs and searchable PDFs. Each requires a different strategy for text extraction.
-
Scanned PDFs are typically created by digitizing physical documents such as books, invoices, contracts, or magazines. While the text appears readable to the human eye, it's actually embedded as an image—making it inaccessible to traditional text extraction tools. Older digital files or password-protected PDFs may also lack an actual text layer.
-
Searchable PDFs, on the other hand, contain a hidden text layer that allows computers to search, copy, or parse the content. These files are usually generated directly from applications like Microsoft Word or PDF editors and are much easier to process programmatically.
This distinction highlights the importance of OCR (Optical Character Recognition) when working with scanned PDFs. With tools like Python PDF OCR, we can convert these image-based PDFs into images, run OCR to recognize the text, and extract it for further use—all in an automated way.
Best Python OCR Libraries for PDF Processing
Before diving into the implementation, let’s take a quick look at the tools we’ll be using in this tutorial. To simplify the process, we’ll use Spire.PDF for Python and Spire.OCR for Python to perform PDF OCR in Python.
- Spire.PDF will handle the conversion from PDF to images.
- Spire.OCR, a powerful OCR tool for PDF files, will recognize the text in those images and extract it as editable content.
You can install Spire.PDF using the following pip command:
pip install spire.pdf
and install Spire.OCR with:
pip install spire.ocr
Alternatively, you can download and install them manually by visiting the official Spire.PDF and Spire.OCR pages.
Convert PDF Pages to Images Using Python
Before we dive into Python PDF OCR, it's crucial to understand a foundational step: OCR technology doesn't directly process PDF files. Especially with image-based PDFs (like those created from scanned documents), we first need to convert them into individual image files.
Converting PDFs to images using the Spire.PDF library is straightforward. You simply load your target PDF document and then iterate through each page. For every page, call the PdfDocument.SaveAsImage() method to save it as a separate image file. Once this step is complete, your images are ready for the subsequent OCR process.
Here's a code example showing how to convert PDF to PNG:
from spire.pdf import *
# Load the PDF file
pdf = PdfDocument()
pdf.LoadFromFile("/AI-Generated Art.pdf")
# Loop through pages and save as images
for i in range(pdf.Pages.Count):
# Convert each page to image
with pdf.SaveAsImage(i) as image:
# Save in different formats as needed
image.Save(f"/output/pdftoimage/ToImage_{i}.png")
# image.Save(f"Output/ToImage_{i}.jpg")
# image.Save(f"Output/ToImage_{i}.bmp")
# Close the PDF document
pdf.Close()
Conversion result preview: 
Scan and Extract Text from Images Using Spire.OCR
After converting the scanned PDF into images, we can now move on to OCR PDF with Python and to extract text from the PDF. With OcrScanner.Scan() from Spire.OCR, recognizing text in images becomes straightforward. It supports multiple languages such as English, Chinese, French, and German. Once the text is extracted, you can easily save it to a .txt file or generate a Word document.
The code example below shows how to OCR the first PDF page and export to text in Python:
from spire.ocr import *
# Create OCR scanner instance
scanner = OcrScanner()
# Configure OCR model path and language
configureOptions = ConfigureOptions()
configureOptions.ModelPath = r'E:/DownloadsNew/win-x64/'
configureOptions.Language = 'English'
scanner.ConfigureDependencies(configureOptions)
# Perform OCR on the image
scanner.Scan(r'/output/pdftoimage/ToImage_0.png')
# Save extracted text to file
text = scanner.Text.ToString()
with open('/output/scannedpdfoutput.txt', 'a', encoding='utf-8') as file:
file.write(text + '\n')
Result preview: 
The Conclusion
In this article, we covered how to perform PDF OCR with Python—from converting PDFs to images, to recognizing text with OCR, and finally saving the extracted content as a plain text file. With this streamlined approach, extracting text from scanned PDFs becomes effortless. If you're looking to automate your PDF processing workflows, feel free to reach out and request a 30-day free trial. It’s time to simplify your document management.
How to Convert Markdown to PDF and Excel in C# .NET: A Complete Guide
2025-07-18 06:09:23 Written by zaki zou
Markdown is a lightweight markup language widely used for writing formatted text using simple plain syntax. Favored by developers, writers, and technical content creators for its readability and ease of use, Markdown is perfect for drafting documents, notes, and technical content. However, Markdown files (.md) often need to be converted into other formats such as PDF for official distribution or Excel for data analysis and reporting.
In this comprehensive guide, you will learn how to convert Markdown files to PDF and Excel using C# and Spire.XLS for .NET — a powerful and easy-to-use library that supports direct Markdown loading and exporting to multiple formats. Whether you want to generate polished PDF documents or structured Excel spreadsheets, this tutorial covers everything you need.
Table of Contents
- Why Convert Markdown to PDF and Excel?
- Prerequisites (Library & Environment Setup)
- How to Convert Markdown to PDF in C# (Step-by-Step with Code)
- How to Convert Markdown to Excel in C# (Step-by-Step with Code)
- Complete C# Code Example: Convert Markdown to PDF and Excel in One Go
- Best Practices for Markdown Conversion
- Conclusion
- FAQs
Why Convert Markdown to PDF and Excel?
Markdown is great for writing, but it has limitations when sharing or processing documents:
- PDF files provide a fixed-layout, platform-independent format ideal for sharing polished reports, manuals, or official documentation. PDFs maintain the original style and layout regardless of device or software.
- Excel files are essential when Markdown contains tabular data that you want to analyze, manipulate, or integrate into business processes. Converting Markdown tables to Excel spreadsheets lets you utilize formulas, filters, charts, and data tools effectively.
By converting Markdown to these formats programmatically in C#, you can automate documentation workflows, batch-process files, and integrate with other .NET applications.
Prerequisites (Library & Environment Setup)
Before you start converting Markdown files, ensure your development environment meets the following requirements:
- .NET Framework or .NET Core installed.
- Spire.XLS for .NET: A comprehensive Excel library that supports loading Markdown and exporting to PDF and Excel formats.
Install Spire.XLS via NuGet
You can easily install Spire.XLS in your C# project via NuGet by opening the NuGet Package Manager and executing the following command:
Install-Package Spire.XLS
This package provides all the necessary classes and methods to load Markdown and export documents without relying on Microsoft Office automation.
How to Convert Markdown to PDF in C# (Step-by-Step with Code)
To convert a Markdown file to PDF in C#, follow the steps below:
Step 1: Load the Markdown File
Create a new instance of the Workbook class and load your Markdown file:
Workbook workbook = new Workbook();
workbook.LoadFromMarkdown("test.md");
This method parses the Markdown content, including text and tables, into an Excel workbook structure that Spire.XLS can manipulate.
Step 2: Customize Conversion Settings (Optional)
After loading the Markdown file, you can apply conversion settings to ensure the PDF output retains a clean and readable layout. For example, enabling the SheetFitToPage option ensures that the entire Markdown content fits within a single PDF page.
workbook.ConverterSetting.SheetFitToPage = true;
Step 3: Export as PDF
Save the workbook as a PDF file:
workbook.SaveToFile("output.pdf", FileFormat.PDF);
This generates a well-formatted PDF document preserving the Markdown layout and styling suitable for printing or sharing.

How to Convert Markdown to Excel in C# (Step-by-Step with Code)
After loading the Markdown file into the workbook, you can also export it to an Excel spreadsheet format:
workbook.SaveToFile("output.xls", ExcelVersion.Version97to2003);
You may also choose other Excel versions depending on your target compatibility:
workbook.SaveToFile("output.xlsx", ExcelVersion.Version2016);
The exported Excel file retains tables and structured data from your Markdown, allowing further analysis or manipulation.

Complete C# Code Example: Convert Markdown to PDF and Excel in One Go
Here’s the full example combining both PDF and Excel exports in a single run:
using Spire.Xls;
namespace MarkdownToPdfAndExcel
{
internal class Program
{
static void Main(string[] args)
{
// Initialize the workbook
Workbook workbook = new Workbook();
// Load Markdown content
workbook.LoadFromMarkdown("test.md");
// Fit the sheet data to one page (optional)
workbook.ConverterSetting.SheetFitToPage = true;
// Export to PDF
workbook.SaveToFile("output.pdf", FileFormat.PDF);
// Export to xls (Excel 97-2003 format)
workbook.SaveToFile("output.xls", ExcelVersion.Version97to2003);
// Export to xlsx (Excel 2016 format)
workbook.SaveToFile("output.xlsx", ExcelVersion.Version2016);
workbook.Dispose();
}
}
}
Best Practices for Markdown Conversion
- Use UTF-8 Encoding: Ensure your Markdown files use UTF-8 encoding, especially if they contain special or non-English characters. This ensures proper character display and avoids encoding errors during conversion.
- Maintain Clean and Well-Formatted Markdown Tables: To achieve accurate and reliable Excel conversions, structure your Markdown tables carefully. Use consistent pipe (|) delimiters and avoid malformed syntax to preserve table integrity in the output spreadsheet.
- Load Once, Export Multiple Formats: For optimal performance, load your Markdown content into the Workbook object a single time, then export it to various formats such as PDF and Excel. This reduces processing overhead and speeds up batch conversions.
- Assess Markdown Complexity: Spire.XLS effectively supports basic Markdown syntax, such as headings and tables. However, advanced features like embedded images or code blocks with syntax highlighting might require pre-processing or conversion through intermediate formats like HTML.
- Choose Appropriate Excel Export Versions: To maximize compatibility with your users’ software, select the Excel file format based on their environment. For example, use the .xls format for legacy Excel 97–2003 users, and .xlsx for Excel 2007 and later versions to ensure broad accessibility and full feature support.
Conclusion
Converting Markdown to PDF and Excel using C# and Spire.XLS is a fast, flexible, and reliable approach to modern document workflows. With minimal code, developers can automate the transformation of lightweight Markdown into professional PDFs for distribution and Excel spreadsheets for business analytics.
This method streamlines technical writing, reporting, and data handling tasks within .NET applications and enables seamless integration with other business processes.
FAQs
Q1: Can I batch convert multiple Markdown files using Spire.XLS in C#?
A1: Yes, you can loop through multiple Markdown files, load each one using Spire.XLS, and export them individually to PDF or Excel formats within the same C# project.
Q2: Is Microsoft Office required to use Spire.XLS for Markdown conversion?
A2: No, Spire.XLS is a standalone library and does not rely on Microsoft Office or Excel being installed on the machine.
Q3: Is Spire.XLS free to use?
A3: Spire.XLS offers a free version with some limitations. A commercial license is available for full features.
Get a Free License
To fully experience the capabilities of Spire.XLS for .NET without any evaluation limitations, you can request a free 30-day trial license.