Spire.Office Knowledgebase Page 16 | E-iceblue

When working with Excel, you may sometimes need to protect critical data while allowing users to edit other parts of the worksheet. This is especially important for scenarios where certain formulas, headers, or reference values must remain unchanged to ensure data integrity. By locking specific areas, you can prevent accidental modifications, maintain consistency, and control access to key information within the spreadsheet. In this article, you will learn how to lock cells, rows, and columns in Excel in React using JavaScript and the Spire.XLS for JavaScript library.

Install Spire.XLS for JavaScript

To get started with locking cells, rows, and columns in Excel files within a React application, you can either download Spire.XLS for JavaScript from our website or install it via npm with the following command:

npm i spire.xls

After that, copy the "Spire.Xls.Base.js" and "Spire.Xls.Base.wasm" files to the public folder of your project. Additionally, include the required font files to ensure accurate and consistent text rendering.

For more details, refer to the documentation: How to Integrate Spire.XLS for JavaScript in a React Project

Lock Cells in Excel

Spire.XLS for JavaScript offers the Worksheet.Range.get().Style.Locked property, allowing you to protect critical data cells while enabling edits to the rest of the worksheet. The detailed steps are as follows.

  • Create a Workbook object using the wasmModule.Workbook.Create() method.
  • Load a sample Excel file using the Workbook.LoadFromFile() method.
  • Get the first worksheet using the Workbook.Worksheets.get() method.
  • Unlock all cells in the used range of the worksheet by setting the Worksheet.Range.Style.Locked property to "false".
  • Set text for specific cells using the Worksheet.Range.get().Text property and then lock them by setting the Worksheet.Range.get().Style.Locked property to "true".
  • Protect the worksheet with a password using the Worksheet.Protect() method.
  • Save the result file using the Workbook.SaveToFile() method.
  • JavaScript
import React, { useState, useEffect } from 'react';

function App() {

  // State to hold the loaded WASM module
  const [wasmModule, setWasmModule] = useState(null);

  // useEffect hook to load the WASM module when the component mounts
  useEffect(() => {
    const loadWasm = async () => {
      try {

        // Access the Module and spirexls from the global window object
        const { Module, spirexls } = window;

        // Set the wasmModule state when the runtime is initialized
        Module.onRuntimeInitialized = () => {
          setWasmModule(spirexls);
        };
      } catch (err) {

        // Log any errors that occur during loading
        console.error('Failed to load WASM module:', err);
      }
    };

    // Create a script element to load the WASM JavaScript file

    const script = document.createElement('script');
    script.src = `${process.env.PUBLIC_URL}/Spire.Xls.Base.js`;
    script.onload = loadWasm;

    // Append the script to the document body
    document.body.appendChild(script);

    // Cleanup function to remove the script when the component unmounts
    return () => {
      document.body.removeChild(script);
    };
  }, []); 

  // Function to lock specific cells in Excel
  const LockExcelCells = async () => {
    if (wasmModule) {
      // Load the ARIALUNI.TTF font file into the virtual file system (VFS)
      await wasmModule.FetchFileToVFS('ARIALUNI.TTF', '/Library/Fonts/', `${process.env.PUBLIC_URL}/`);
      
      // Load the input Excel file into the virtual file system (VFS)
      const inputFileName = 'Sample.xlsx';
      await wasmModule.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/`);
      
      // Create a new workbook
      const workbook = wasmModule.Workbook.Create();
      // Load the Excel file from the virtual file system
      workbook.LoadFromFile({fileName: inputFileName});

      // Get the first worksheet
      let sheet = workbook.Worksheets.get(0);

      // Unlock all cells in the used range of the worksheet
      sheet.Range.Style.Locked = false;

      // Lock a specific cell in the worksheet
      sheet.Range.get("A1").Text = "Locked";
      sheet.Range.get("A1").Style.Locked = true;

      // Lock a specific cell range in the worksheet
      sheet.Range.get("C1:E3").Text = "Locked";
      sheet.Range.get("C1:E3").Style.Locked = true;

      // Protect the worksheet with a password
      sheet.Protect({password: "123", options: wasmModule.SheetProtectionType.All});

      // Save the resulting file
      const outputFileName = "LockCells.xlsx";
      workbook.SaveToFile({ fileName: outputFileName, version: wasmModule.ExcelVersion.Version2013 });
      
      // Read the saved file and convert it to a Blob object
      const modifiedFileArray = wasmModule.FS.readFile(outputFileName);
      const modifiedFile = new Blob([modifiedFileArray], { type: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' });
      
      // Create a URL for the Blob and initiate the download
      const url = URL.createObjectURL(modifiedFile);
      const a = document.createElement('a');
      a.href = url;
      a.download = outputFileName;
      document.body.appendChild(a);
      a.click(); 
      document.body.removeChild(a); 
      URL.revokeObjectURL(url); 

      // Clean up resources used by the workbook
      workbook.Dispose();
    }
  };

  return (
    <div style={{ textAlign: 'center', height: '300px' }}>
      <h1>Lock Specific Cells in Excel Using JavaScript in React</h1>
      <button onClick={LockExcelCells} disabled={!wasmModule}>
        Lock
      </button>
    </div>
  );
}

export default App;

Run the code to launch the React app at localhost:3000. Once it's running, click on the "Lock" button to lock specific cells in the Excel file:

Run the code to launch the React app

Upon opening the output Excel sheet and attempting to edit the protected cells, a dialog box will appear, notifying you that the cell you're trying to change is on a protected sheet:

Lock Cells in Excel
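The steps above unlock the whole used range before locking individual cells because Excel marks every cell as locked by default, and the Locked flag only takes effect once the sheet is protected. The following is a toy model of that rule in plain JavaScript, not Spire.XLS code; createToySheet and canEdit are hypothetical names introduced only for illustration.

```javascript
// Toy model only: these names are illustrative and are not Spire.XLS APIs.
function createToySheet(addresses) {
  const cells = new Map();
  // Excel marks every cell as locked by default.
  for (const address of addresses) {
    cells.set(address, { locked: true });
  }
  return { cells, isProtected: false };
}

// A cell rejects edits only when it is locked AND the sheet is protected.
function canEdit(sheet, address) {
  const cell = sheet.cells.get(address);
  return !(sheet.isProtected && cell.locked);
}

const sheet = createToySheet(['A1', 'B1', 'C1']);

// Step 1: unlock everything, mirroring sheet.Range.Style.Locked = false.
for (const cell of sheet.cells.values()) {
  cell.locked = false;
}

// Step 2: lock only the cells that must stay fixed.
sheet.cells.get('A1').locked = true;

// Before protection, the locked flag has no effect.
const editableBeforeProtect = canEdit(sheet, 'A1');

// Step 3: protect the sheet, mirroring sheet.Protect(...).
sheet.isProtected = true;

console.log(editableBeforeProtect); // true
console.log(canEdit(sheet, 'A1')); // false
console.log(canEdit(sheet, 'B1')); // true
```

Skipping the unlock step in this model would leave every cell locked, so protecting the sheet would block edits everywhere rather than only in the chosen cells.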

Lock Rows in Excel

If you need to preserve row-based data, such as headers or summaries, you can lock entire rows using the Worksheet.Rows.get().Style.Locked property in Spire.XLS for JavaScript. The detailed steps are as follows.

  • Create a Workbook object using the wasmModule.Workbook.Create() method.
  • Load a sample Excel file using the Workbook.LoadFromFile() method.
  • Get the first worksheet using the Workbook.Worksheets.get() method.
  • Unlock all cells in the used range of the worksheet by setting the Worksheet.Range.Style.Locked property to "false".
  • Set text for a specific row using the Worksheet.Rows.get().Text property and then lock it by setting the Worksheet.Rows.get().Style.Locked property to "true".
  • Protect the worksheet with a password using the Worksheet.Protect() method.
  • Save the result file using the Workbook.SaveToFile() method.
  • JavaScript
import React, { useState, useEffect } from 'react';

function App() {

  // State to hold the loaded WASM module
  const [wasmModule, setWasmModule] = useState(null);

  // useEffect hook to load the WASM module when the component mounts
  useEffect(() => {
    const loadWasm = async () => {
      try {

        // Access the Module and spirexls from the global window object
        const { Module, spirexls } = window;

        // Set the wasmModule state when the runtime is initialized
        Module.onRuntimeInitialized = () => {
          setWasmModule(spirexls);
        };
      } catch (err) {

        // Log any errors that occur during loading
        console.error('Failed to load WASM module:', err);
      }
    };

    // Create a script element to load the WASM JavaScript file

    const script = document.createElement('script');
    script.src = `${process.env.PUBLIC_URL}/Spire.Xls.Base.js`;
    script.onload = loadWasm;

    // Append the script to the document body
    document.body.appendChild(script);

    // Cleanup function to remove the script when the component unmounts
    return () => {
      document.body.removeChild(script);
    };
  }, []); 

  // Function to lock specific rows in Excel
  const LockExcelRows = async () => {
    if (wasmModule) {
      // Load the ARIALUNI.TTF font file into the virtual file system (VFS)
      await wasmModule.FetchFileToVFS('ARIALUNI.TTF', '/Library/Fonts/', `${process.env.PUBLIC_URL}/`);
      
      // Load the input Excel file into the virtual file system (VFS)
      const inputFileName = 'Sample.xlsx';
      await wasmModule.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/`);
      
      // Create a new workbook
      const workbook = wasmModule.Workbook.Create();
      // Load the Excel file from the virtual file system
      workbook.LoadFromFile({fileName: inputFileName});

      // Get the first worksheet
      let sheet = workbook.Worksheets.get(0);

      // Unlock all cells in the used range of the worksheet
      sheet.Range.Style.Locked = false;

      // Lock the third row in the worksheet
      sheet.Rows.get(2).Text = "Locked";
      sheet.Rows.get(2).Style.Locked = true;

      // Protect the worksheet with a password
      sheet.Protect({password: "123", options: wasmModule.SheetProtectionType.All});

      // Save the resulting file
      const outputFileName = "LockRows.xlsx";
      workbook.SaveToFile({ fileName: outputFileName, version: wasmModule.ExcelVersion.Version2013 });
      
      // Read the saved file and convert it to a Blob object
      const modifiedFileArray = wasmModule.FS.readFile(outputFileName);
      const modifiedFile = new Blob([modifiedFileArray], { type: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' });
      
      // Create a URL for the Blob and initiate the download
      const url = URL.createObjectURL(modifiedFile);
      const a = document.createElement('a');
      a.href = url;
      a.download = outputFileName;
      document.body.appendChild(a);
      a.click(); 
      document.body.removeChild(a); 
      URL.revokeObjectURL(url); 

      // Clean up resources used by the workbook
      workbook.Dispose();
    }
  };

  return (
    <div style={{ textAlign: 'center', height: '300px' }}>
      <h1>Lock Specific Rows in Excel Using JavaScript in React</h1>
      <button onClick={LockExcelRows} disabled={!wasmModule}>
        Lock
      </button>
    </div>
  );
}

export default App;

Lock Rows in Excel

Lock Columns in Excel

To maintain the integrity of key vertical data, such as fixed identifiers or category labels, you can lock entire columns using the Worksheet.Columns.get().Style.Locked property in Spire.XLS for JavaScript. The detailed steps are as follows.

  • Create a Workbook object using the wasmModule.Workbook.Create() method.
  • Load a sample Excel file using the Workbook.LoadFromFile() method.
  • Get the first worksheet using the Workbook.Worksheets.get() method.
  • Unlock all cells in the used range of the worksheet by setting the Worksheet.Range.Style.Locked property to "false".
  • Set text for a specific column using the Worksheet.Columns.get().Text property and then lock it by setting the Worksheet.Columns.get().Style.Locked property to "true".
  • Protect the worksheet with a password using the Worksheet.Protect() method.
  • Save the result file using the Workbook.SaveToFile() method.
  • JavaScript
import React, { useState, useEffect } from 'react';

function App() {

  // State to hold the loaded WASM module
  const [wasmModule, setWasmModule] = useState(null);

  // useEffect hook to load the WASM module when the component mounts
  useEffect(() => {
    const loadWasm = async () => {
      try {

        // Access the Module and spirexls from the global window object
        const { Module, spirexls } = window;

        // Set the wasmModule state when the runtime is initialized
        Module.onRuntimeInitialized = () => {
          setWasmModule(spirexls);
        };
      } catch (err) {

        // Log any errors that occur during loading
        console.error('Failed to load WASM module:', err);
      }
    };

    // Create a script element to load the WASM JavaScript file

    const script = document.createElement('script');
    script.src = `${process.env.PUBLIC_URL}/Spire.Xls.Base.js`;
    script.onload = loadWasm;

    // Append the script to the document body
    document.body.appendChild(script);

    // Cleanup function to remove the script when the component unmounts
    return () => {
      document.body.removeChild(script);
    };
  }, []); 

  // Function to lock specific columns in Excel
  const LockExcelColumns = async () => {
    if (wasmModule) {
      // Load the ARIALUNI.TTF font file into the virtual file system (VFS)
      await wasmModule.FetchFileToVFS('ARIALUNI.TTF', '/Library/Fonts/', `${process.env.PUBLIC_URL}/`);
      
      // Load the input Excel file into the virtual file system (VFS)
      const inputFileName = 'Sample.xlsx';
      await wasmModule.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/`);
      
      // Create a new workbook
      const workbook = wasmModule.Workbook.Create();
      // Load the Excel file from the virtual file system
      workbook.LoadFromFile({fileName: inputFileName});

      // Get the first worksheet
      let sheet = workbook.Worksheets.get(0);

      // Unlock all cells in the used range of the worksheet
      sheet.Range.Style.Locked = false;

      // Lock the fourth column in the worksheet
      sheet.Columns.get(3).Text = "Locked";
      sheet.Columns.get(3).Style.Locked = true;

      // Protect the worksheet with a password
      sheet.Protect({password: "123", options: wasmModule.SheetProtectionType.All});

      // Save the resulting file
      const outputFileName = "LockColumns.xlsx";
      workbook.SaveToFile({ fileName: outputFileName, version: wasmModule.ExcelVersion.Version2013 });
      
      // Read the saved file and convert it to a Blob object
      const modifiedFileArray = wasmModule.FS.readFile(outputFileName);
      const modifiedFile = new Blob([modifiedFileArray], { type: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' });
      
      // Create a URL for the Blob and initiate the download
      const url = URL.createObjectURL(modifiedFile);
      const a = document.createElement('a');
      a.href = url;
      a.download = outputFileName;
      document.body.appendChild(a);
      a.click(); 
      document.body.removeChild(a); 
      URL.revokeObjectURL(url); 

      // Clean up resources used by the workbook
      workbook.Dispose();
    }
  };

  return (
    <div style={{ textAlign: 'center', height: '300px' }}>
      <h1>Lock Specific Columns in Excel Using JavaScript in React</h1>
      <button onClick={LockExcelColumns} disabled={!wasmModule}>
        Lock
      </button>
    </div>
  );
}

export default App;

Lock Columns in Excel
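In the row and column examples, Rows.get(2) targets the third row and Columns.get(3) targets the fourth column, which suggests zero-based indices. The sketch below shows one way to convert the row numbers and column letters displayed in Excel into such indices. The helper names are hypothetical and not part of Spire.XLS, and the mapping assumes the sheet's data starts at cell A1 so that index 0 corresponds to row 1 and column A.

```javascript
// Hypothetical helpers; they are not part of Spire.XLS.

// Convert a 1-based row number (as displayed in Excel) to a zero-based index.
function rowNumberToIndex(rowNumber) {
  if (!Number.isInteger(rowNumber) || rowNumber < 1) {
    throw new RangeError(`Invalid row number: ${rowNumber}`);
  }
  return rowNumber - 1;
}

// Convert column letters such as "D" or "AA" to a zero-based index.
function columnLetterToIndex(letters) {
  if (!/^[A-Za-z]+$/.test(letters)) {
    throw new RangeError(`Invalid column letters: ${letters}`);
  }
  let index = 0;
  for (const ch of letters.toUpperCase()) {
    // Treat the letters as a base-26 number where A = 1 ... Z = 26.
    index = index * 26 + (ch.charCodeAt(0) - 64);
  }
  return index - 1;
}

console.log(rowNumberToIndex(3)); // 2
console.log(columnLetterToIndex('D')); // 3
console.log(columnLetterToIndex('AA')); // 26
```

With these helpers, a call such as sheet.Columns.get(columnLetterToIndex('D')) would read closer to what a user sees in Excel, under the assumption stated above.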

Get a Free License

To fully experience the capabilities of Spire.XLS for JavaScript without any evaluation limitations, you can request a free 30-day trial license.

Fonts play a crucial role in defining the visual appeal and readability of Word documents, influencing everything from professional reports to creative projects. Whether you're looking to refresh the design of your document by replacing outdated fonts or troubleshooting missing fonts that disrupt formatting, understanding how to retrieve and replace fonts in Microsoft Word is an essential skill.

In this article, you will learn how to get and replace fonts in a Word document using C# and Spire.Doc for .NET.

Install Spire.Doc for .NET

To begin with, you need to add the DLL files included in the Spire.Doc for .NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.

PM> Install-Package Spire.Doc

Get Fonts Used in a Word Document in C#

To extract font information from a Word document, you must traverse its sections and paragraphs, examining each child object within the paragraphs. If a child object is identified as a TextRange, you can retrieve the font details—such as the font name, size, and color—using the properties of the TextRange class.

The following are the steps to get fonts used in a Word document in C#:

  • Create a Document object.
  • Load a Word document using the Document.LoadFromFile() method.
  • Iterate through each section, paragraph, and child object.
  • For each child object, check whether it is an instance of the TextRange class.
  • If it is, retrieve the font name, size, and text color using the TextRange.CharacterFormat.FontName, TextRange.CharacterFormat.FontSize, and TextRange.CharacterFormat.TextColor properties.
  • Write the font information to a text file.
  • C#
using System;
using System.Collections.Generic;
using System.IO;
using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;

namespace RetrieveFonts
{
    // Custom FontInfo class that stores a font's name and size
    class FontInfo
    {
        public string Name { get; set; }
        public float? Size { get; set; }

        public FontInfo()
        {
            Name = "";
            Size = null;
        }

        public override bool Equals(object obj)
        {
            if (this == obj) return true;
            if (!(obj is FontInfo other)) return false;
            return Name.Equals(other.Name) && Size.Equals(other.Size);
        }

        public override int GetHashCode()
        {
            return HashCode.Combine(Name, Size);
        }
    }
    class Program
    {
        // Write a list of strings to a text file, one string per line
        static void WriteAllText(string filename, List<string> text)
        {
            try
            {
                using (StreamWriter writer = new StreamWriter(filename))
                {
                    foreach (var line in text)
                    {
                        writer.WriteLine(line);
                    }
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }
        static void Main(string[] args)
        {
            List<FontInfo> fontInfos = new List<FontInfo>();
            List<string> fontInformations = new List<string>();

            // Create a Document instance
            Document document = new Document();

            // Load a Word document
            document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx");

            // Iterate through the sections
            foreach (Section section in document.Sections)
            {
                // Iterate through the paragraphs
                foreach (Paragraph paragraph in section.Body.Paragraphs)
                {
                    // Iterate through the child objects
                    foreach (DocumentObject obj in paragraph.ChildObjects)
                    {
                        if (obj is TextRange txtRange)
                        {
                            // Get the font name, size and text color
                            string fontName = txtRange.CharacterFormat.FontName;
                            float fontSize = txtRange.CharacterFormat.FontSize;
                            string textColor = txtRange.CharacterFormat.TextColor.ToString();

                            // Store the font information
                            FontInfo fontInfo = new FontInfo { Name = fontName, Size = fontSize };

                            if (!fontInfos.Contains(fontInfo))
                            {
                                fontInfos.Add(fontInfo);
                                string str = $"Font Name: {fontInfo.Name}, Size: {fontInfo.Size:F2}, Color: {textColor}";
                                fontInformations.Add(str);
                            }
                        }
                    }
                }
            }

            // Write font information to a txt file
            WriteAllText("GetFonts.txt", fontInformations);

            // Dispose resources
            document.Dispose();
        }
    }
}

Get fonts used in a Word document in C#

Replace a Specific Font in a Word Document in C#

After retrieving the font name from a specific TextRange, you can easily replace it with a new font using the TextRange.CharacterFormat.FontName property. Additionally, you can modify the font size and text color by accessing the corresponding properties in the TextRange class. This allows for comprehensive customization of the text formatting within the document.

The following are the steps to replace a specific font in a Word document in C#:

  • Create a Document object.
  • Load a Word document using the Document.LoadFromFile() method.
  • Iterate through each section and its paragraphs.
  • For each paragraph, check each child object to see if it is an instance of the TextRange class.
  • If it is a TextRange, retrieve the font name using the TextRange.CharacterFormat.FontName property.
  • Compare the font name to the specified font.
  • If they match, set a new font name using the TextRange.CharacterFormat.FontName property.
  • Save the modified document to a new Word file using the Document.SaveToFile() method.
  • C#
using System;
using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;

namespace ReplaceFont
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Document instance
            Document document = new Document();

            // Load a Word document
            document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx");

            // Iterate through the sections
            foreach (Section section in document.Sections)
            {
                // Iterate through the paragraphs
                foreach (Paragraph paragraph in section.Body.Paragraphs)
                {
                    // Iterate through the child objects
                    foreach (DocumentObject obj in paragraph.ChildObjects)
                    {
                        // Determine if a child object is a TextRange
                        if (obj is TextRange txtRange)
                        {
                            // Get the font name
                            string fontName = txtRange.CharacterFormat.FontName;

                            // Determine if the font name is Calibri
                            if (fontName.Equals("Calibri", StringComparison.OrdinalIgnoreCase))
                            {
                                // Replace the font with another font
                                txtRange.CharacterFormat.FontName = "Segoe Print";
                            }
                        }
                    }
                }
            }

            // Save the document to a different file
            document.SaveToFile("ReplaceFont.docx", FileFormat.Docx);

            // Dispose resources
            document.Dispose();
        }
    }
}

Replace fonts in a Word document in C#

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents or lift the feature limitations, you can request a 30-day trial license.

Extracting text from PDF documents directly within a React application using JavaScript provides a streamlined, self-contained solution for handling dynamic content. Given that PDFs remain a ubiquitous format for reports, forms, and data sharing, parsing their contents on the client side enables developers to build efficient applications without relying on external services. By integrating Spire.PDF for JavaScript into React, development teams gain full control over data processing, reduce latency by eliminating server-side dependencies, and deliver real-time user experiences—all while ensuring that sensitive information remains secure within the browser.

In this article, we explore how to use Spire.PDF for JavaScript to extract text from PDF documents in React applications, simplifying the integration of robust PDF content extraction features.

Install Spire.PDF for JavaScript

To get started with extracting text from PDF documents with JavaScript in a React application, you can either download Spire.PDF for JavaScript from our website or install it via npm with the following command:

npm i spire.pdf

After that, copy the "Spire.Pdf.Base.js" and "Spire.Pdf.Base.wasm" files to the public folder of your project.

For more details, refer to the documentation: How to Integrate Spire.PDF for JavaScript in a React Project

General Steps for Extracting PDF Text Using JavaScript

Spire.PDF for JavaScript provides a WebAssembly module that enables PDF document processing using simple JavaScript code in React applications. Developers can utilize the PdfTextExtractor class to handle text extraction tasks efficiently. The general steps for extracting text from PDF documents using Spire.PDF for JavaScript in React are as follows:

  • Load the Spire.Pdf.Base.js file to initialize the WebAssembly module.
  • Fetch the PDF files into the Virtual File System (VFS) using the wasmModule.FetchFileToVFS() method.
  • Create an instance of the PdfDocument class using the wasmModule.PdfDocument.Create() method.
  • Load the PDF document from the VFS into the PdfDocument instance using the PdfDocument.LoadFromFile() method.
  • Create an instance of the PdfTextExtractOptions class using the wasmModule.PdfTextExtractOptions.Create() method and configure the text extraction options.
  • Retrieve a PDF page using the PdfDocument.Pages.get_Item() method or iterate through the document's pages.
  • Create an instance of the PdfTextExtractor class with the page object using the wasmModule.PdfTextExtractor.Create() method.
  • Extract text from the page using the PdfTextExtractor.ExtractText() method.
  • Download the extracted text or process it as needed.

The PdfTextExtractOptions class allows customization of extraction settings, supporting features such as simple extraction, extracting specific page areas, and retrieving hidden text. The following table outlines the properties of the PdfTextExtractOptions class and their functions:

Property            Description
IsSimpleExtraction  Specifies whether to perform simple text extraction.
IsExtractAllText    Specifies whether to extract all text.
ExtractArea         Defines the extraction area.
IsShowHiddenText    Specifies whether to extract hidden text.

Extract PDF Text with Layout Preservation

Using the PdfTextExtractor.ExtractText() method with default options enables text extraction while preserving the original text layout of the PDF pages. Below is a code example and the corresponding extraction result:

  • JavaScript
import React, { useState, useEffect } from 'react';

function App() {

  // State to store the loaded WASM module
  const [wasmModule, setWasmModule] = useState(null);

  // useEffect hook to load the WASM module when the component mounts
  useEffect(() => {
    const loadWasm = async () => {
      try {
        // Access the Module and spirepdf from the global window object
        const { Module, spirepdf } = window;

        // Set the wasmModule state when the runtime is initialized
        Module.onRuntimeInitialized = () => {
          setWasmModule(spirepdf);
        };
      } catch (err) {
        // Log any errors that occur during module loading
        console.error('Failed to load the WASM module:', err);
      }
    };

    // Create a script element to load the WASM JavaScript file
    const script = document.createElement('script');
    script.src = `${process.env.PUBLIC_URL}/Spire.Pdf.Base.js`;
    script.onload = loadWasm;

    // Append the script to the document body
    document.body.appendChild(script);

    // Cleanup function to remove the script when the component unmounts
    return () => {
      document.body.removeChild(script);
    };
  }, []);

  // Function to extract all text from a PDF document
  const ExtractPDFText = async () => {
    if (wasmModule) {
      // Specify the input and output file names
      const inputFileName = 'Sample.pdf';
      const outputFileName = 'PDFTextWithLayout.txt';

      // Fetch the input file and add it to the VFS
      await wasmModule.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/`);

      // Create an instance of the PdfDocument class
      const pdf = wasmModule.PdfDocument.Create();

      // Load the PDF document from the VFS
      pdf.LoadFromFile(inputFileName);

      // Create a string object to store the extracted text
      let text = '';

      // Create an instance of the PdfTextExtractOptions class
      const extractOptions = wasmModule.PdfTextExtractOptions.Create();

      // Iterate through each page of the PDF document
      for (let i = 0; i < pdf.Pages.Count; i++) {
        // Get the current page
        const page = pdf.Pages.get_Item(i);
        // Create an instance of the PdfTextExtractor class
        const textExtractor = wasmModule.PdfTextExtractor.Create(page);
        // Extract the text from the current page and add it to the text string
        text += textExtractor.ExtractText(extractOptions);
      }

      // Create a Blob object from the text string and download it
      const blob = new Blob([text], { type: 'text/plain' });
      const url = URL.createObjectURL(blob);
      const a = document.createElement('a');
      a.href = url;
      a.download = outputFileName;
      document.body.appendChild(a);
      a.click();
      document.body.removeChild(a);
      URL.revokeObjectURL(url);
    }
  };

  return (
      <div style={{ textAlign: 'center', height: '300px' }}>
        <h1>Extract Text from PDF Using JavaScript in React</h1>
        <button onClick={ExtractPDFText} disabled={!wasmModule}>
          Extract and Download
        </button>
      </div>
  );
}

export default App;

Text Extracted from PDF with Layout Using Spire.PDF for JavaScript

Extract PDF Text without Layout Preservation

Setting the PdfTextExtractOptions.IsSimpleExtraction property to true enables a simple text extraction strategy, allowing text extraction from PDF pages without preserving the layout. In this approach, blank spaces are not retained. Instead, the program tracks the Y position of each text string and inserts line breaks whenever the Y position changes.
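To make the Y-position rule concrete, here is a small standalone sketch of the idea in plain JavaScript. It is only an illustration, not Spire.PDF's actual implementation; the joinFragmentsByLine function and the { text, y } fragment shape are hypothetical.

```javascript
// Illustrative sketch only; this is not how Spire.PDF is implemented.
// Each fragment is a hypothetical { text, y } record in reading order.
function joinFragmentsByLine(fragments) {
  let result = '';
  let previousY = null;
  for (const { text, y } of fragments) {
    // Start a new line whenever the vertical position changes.
    if (previousY !== null && y !== previousY) {
      result += '\n';
    }
    result += text;
    previousY = y;
  }
  return result;
}

const fragments = [
  { text: 'Invoice ', y: 700 },
  { text: 'No. 42', y: 700 },
  { text: 'Total: ', y: 680 },
  { text: '100', y: 680 },
];

console.log(joinFragmentsByLine(fragments));
// Invoice No. 42
// Total: 100
```

Fragments that share a Y position end up on one output line with no extra padding, which is why horizontal layout such as column alignment is lost in this mode.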

Below is a code example demonstrating text extraction without layout preservation using Spire.PDF for JavaScript, along with the extraction result:

  • JavaScript
import React, { useState, useEffect } from 'react';

function App() {

  // State to store the loaded WASM module
  const [wasmModule, setWasmModule] = useState(null);

  // useEffect hook to load the WASM module when the component mounts
  useEffect(() => {
    const loadWasm = async () => {
      try {
        // Access the Module and spirepdf from the global window object
        const { Module, spirepdf } = window;

        // Set the wasmModule state when the runtime is initialized
        Module.onRuntimeInitialized = () => {
          setWasmModule(spirepdf);
        };
      } catch (err) {
        // Log any errors that occur during module loading
        console.error('Failed to load the WASM module:', err);
      }
    };

    // Create a script element to load the WASM JavaScript file
    const script = document.createElement('script');
    script.src = `${process.env.PUBLIC_URL}/Spire.Pdf.Base.js`;
    script.onload = loadWasm;

    // Append the script to the document body
    document.body.appendChild(script);

    // Cleanup function to remove the script when the component unmounts
    return () => {
      document.body.removeChild(script);
    };
  }, []);

  // Function to extract all text from a PDF document without layout preservation
  const ExtractPDFText = async () => {
    if (wasmModule) {
      // Specify the input and output file names
      const inputFileName = 'Sample.pdf';
      const outputFileName = 'PDFTextWithoutLayout.txt';

      // Fetch the input file and add it to the VFS
      await wasmModule.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/`);

      // Create an instance of the PdfDocument class
      const pdf = wasmModule.PdfDocument.Create();

      // Load the PDF document from the VFS
      pdf.LoadFromFile(inputFileName);

      // Create a string object to store the extracted text
      let text = '';

      // Create an instance of the PdfTextExtractOptions class
      const extractOptions = wasmModule.PdfTextExtractOptions.Create();

      // Enable simple text extraction to extract text without preserving layout
      extractOptions.IsSimpleExtraction = true;

      // Iterate through each page of the PDF document
      for (let i = 0; i < pdf.Pages.Count; i++) {
        // Get the current page
        const page = pdf.Pages.get_Item(i);
        // Create an instance of the PdfTextExtractor class
        const textExtractor = wasmModule.PdfTextExtractor.Create(page);
        // Extract the text from the current page and add it to the text string
        text += textExtractor.ExtractText(extractOptions);
      }

      // Create a Blob object from the text string and download it
      const blob = new Blob([text], { type: 'text/plain' });
      const url = URL.createObjectURL(blob);
      const a = document.createElement('a');
      a.href = url;
      a.download = outputFileName;
      document.body.appendChild(a);
      a.click();
      document.body.removeChild(a);
      URL.revokeObjectURL(url);
    }
  };

  return (
      <div style={{ textAlign: 'center', height: '300px' }}>
        <h1>Extract Text from PDF Without Layout Preservation Using JavaScript in React</h1>
        <button onClick={ExtractPDFText} disabled={!wasmModule}>
          Extract and Download
        </button>
      </div>
  );
}

export default App;

Text Extracted from PDF Without Layout Using JavaScript in React

Extract PDF Text from Specific Page Areas

The PdfTextExtractOptions.ExtractArea property allows users to define a specific area using a RectangleF object to extract only the text within that area from a PDF page. This method helps exclude unwanted fixed content from the extraction process. The following code example and extraction result illustrate this functionality:

  • JavaScript
import React, { useState, useEffect } from 'react';

function App() {

  // State to store the loaded WASM module
  const [wasmModule, setWasmModule] = useState(null);

  // useEffect hook to load the WASM module when the component mounts
  useEffect(() => {
    const loadWasm = async () => {
      try {
        // Access the Module and spirepdf from the global window object
        const { Module, spirepdf } = window;

        // Set the wasmModule state when the runtime is initialized
        Module.onRuntimeInitialized = () => {
          setWasmModule(spirepdf);
        };
      } catch (err) {
        // Log any errors that occur during module loading
        console.error('Failed to load the WASM module:', err);
      }
    };

    // Create a script element to load the WASM JavaScript file
    const script = document.createElement('script');
    script.src = `${process.env.PUBLIC_URL}/Spire.Pdf.Base.js`;
    script.onload = loadWasm;

    // Append the script to the document body
    document.body.appendChild(script);

    // Cleanup function to remove the script when the component unmounts
    return () => {
      document.body.removeChild(script);
    };
  }, []);

  // Function to extract text from a specific area of a PDF page
  const ExtractPDFText = async () => {
    if (wasmModule) {
      // Specify the input and output file names
      const inputFileName = 'Sample.pdf';
      const outputFileName = 'PDFTextPageArea.txt';

      // Fetch the input file and add it to the VFS
      await wasmModule.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/`);

      // Create an instance of the PdfDocument class
      const pdf = wasmModule.PdfDocument.Create();

      // Load the PDF document from the VFS
      pdf.LoadFromFile(inputFileName);

      // Create a string object to store the extracted text
      let text = '';

      // Get a page from the PDF document
      const page = pdf.Pages.get_Item(0);

      // Create an instance of the PdfTextExtractOptions class
      const extractOptions = wasmModule.PdfTextExtractOptions.Create();

      // Set the page area to extract text from using a RectangleF object
      extractOptions.ExtractArea = wasmModule.RectangleF.Create({ x: 0, y: 500, width: page.Size.Width, height: 200});

      // Create an instance of the PdfTextExtractor class
      const textExtractor = wasmModule.PdfTextExtractor.Create(page);

      // Extract the text from specified area of the page
      text = textExtractor.ExtractText(extractOptions);

      // Create a Blob object from the text string and download it
      const blob = new Blob([text], { type: 'text/plain' });
      const url = URL.createObjectURL(blob);
      const a = document.createElement('a');
      a.href = url;
      a.download = outputFileName;
      document.body.appendChild(a);
      a.click();
      document.body.removeChild(a);
      URL.revokeObjectURL(url);
    }
  };

  return (
      <div style={{ textAlign: 'center', height: '300px' }}>
        <h1>Extract Text from a PDF Page Area Using JavaScript in React</h1>
        <button onClick={ExtractPDFText} disabled={!wasmModule}>
          Extract and Download
        </button>
      </div>
  );
}

export default App;

PDF Text Extracted from Page Areas Using JavaScript
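
The example above hard-codes the extraction rectangle. When the goal is to skip a fixed header and footer, the rectangle can instead be derived from the page size. The helper below is a hypothetical sketch, not part of Spire.PDF: bodyArea and its parameters are illustrative names, and it assumes the origin is the top-left corner of the page with y increasing downward.

```javascript
// Hypothetical helper: compute the page area between a header and a footer.
// Assumes a top-left origin with y growing downward; all values are in points.
function bodyArea(pageWidth, pageHeight, headerHeight, footerHeight) {
  const height = pageHeight - headerHeight - footerHeight;
  if (height <= 0) {
    throw new RangeError('Header and footer leave no room for body text');
  }
  // Same { x, y, width, height } shape the article passes to RectangleF.Create()
  return { x: 0, y: headerHeight, width: pageWidth, height };
}

// Example: an A4 page (595 x 842 points) with a 60 pt header and a 50 pt footer
const area = bodyArea(595, 842, 60, 50);
console.log(area); // { x: 0, y: 60, width: 595, height: 732 }
```

The returned object could then be passed to wasmModule.RectangleF.Create() in place of the literal used in the example, with page.Size.Width supplying the width as before.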

Extract Highlighted Text from PDF

Text highlighting in PDF documents is achieved using annotation features. With Spire.PDF for JavaScript, we can retrieve all annotations on a PDF page via the PdfPageBase.Annotations property. By checking whether each annotation is an instance of the PdfTextMarkupAnnotationWidget class, we can identify highlight annotations. Once identified, we can read each annotation's Bounds property to obtain its bounding rectangle and assign that rectangle to the PdfTextExtractOptions.ExtractArea property, thereby extracting only the highlighted text.

The following code example demonstrates this process along with the extracted result:

  • JavaScript
import React, { useState, useEffect } from 'react';

function App() {

  // State to store the loaded WASM module
  const [wasmModule, setWasmModule] = useState(null);

  // useEffect hook to load the WASM module when the component mounts
  useEffect(() => {
    const loadWasm = async () => {
      try {
        // Access the Module and spirepdf from the global window object
        const { Module, spirepdf } = window;

        // Set the wasmModule state when the runtime is initialized
        Module.onRuntimeInitialized = () => {
          setWasmModule(spirepdf);
        };
      } catch (err) {
        // Log any errors that occur during module loading
        console.error('Failed to load the WASM module:', err);
      }
    };

    // Create a script element to load the WASM JavaScript file
    const script = document.createElement('script');
    script.src = `${process.env.PUBLIC_URL}/Spire.Pdf.Base.js`;
    script.onload = loadWasm;

    // Append the script to the document body
    document.body.appendChild(script);

    // Cleanup function to remove the script when the component unmounts
    return () => {
      document.body.removeChild(script);
    };
  }, []);

  // Function to extract highlighted text from PDF
  const ExtractPDFText = async () => {
    if (wasmModule) {
      // Specify the input and output file names
      const inputFileName = 'Sample.pdf';
      const outputFileName = 'PDFTextHighlighted.txt';

      // Fetch the input file and add it to the VFS
      await wasmModule.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/`);

      // Create an instance of the PdfDocument class
      const pdf = wasmModule.PdfDocument.Create();

      // Load the PDF document from the VFS
      pdf.LoadFromFile(inputFileName);

      // Create a string object to store the extracted text
      let text = '';

      // Iterate through each page of the PDF document
      for (let p = 0; p < pdf.Pages.Count; p++) {
        // Get the current page
        const page = pdf.Pages.get_Item(p);
        // Iterate through each annotation on the page
        for (let i = 0; i < page.Annotations.Count; i++) {
          // Get the current annotation
          const annotation = page.Annotations.get_Item(i);
          // Check if the annotation is an instance of PdfTextMarkupAnnotationWidget
          if (annotation instanceof wasmModule.PdfTextMarkupAnnotationWidget) {
            // Get the bounds of the annotation
            const bounds = annotation.Bounds;
            // Create an instance of PdfTextExtractOptions
            const extractOptions = wasmModule.PdfTextExtractOptions.Create();
            // Set the bounds of the highlight annotation as the extraction area
            extractOptions.ExtractArea = bounds;
            // Create an instance of the PdfTextExtractor class for the current page
            const textExtractor = wasmModule.PdfTextExtractor.Create(page);
            // Extract the highlighted text and append it to the text string
            text += textExtractor.ExtractText(extractOptions);
          }
        }
      }

      // Create a Blob object from the text string and download it
      const blob = new Blob([text], { type: 'text/plain' });
      const url = URL.createObjectURL(blob);
      const a = document.createElement('a');
      a.href = url;
      a.download = outputFileName;
      document.body.appendChild(a);
      a.click();
      document.body.removeChild(a);
      URL.revokeObjectURL(url);
    }
  };

  return (
      <div style={{ textAlign: 'center', height: '300px' }}>
        <h1>Extract Highlighted Text from PDF Using JavaScript in React</h1>
        <button onClick={ExtractPDFText} disabled={!wasmModule}>
          Extract and Download
        </button>
      </div>
  );
}

export default App;

Highlighted Text Extracted from PDF in React
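
The example above appends each highlight's text directly to one string, so adjacent passages may run together in the output file. The sketch below shows the same filter-then-extract loop on plain data, with each passage trimmed and placed on its own line. The names here are hypothetical stand-ins rather than Spire.PDF objects: pages, annotations, kind, bounds, and the extractTextInArea callback are all invented for illustration.

```javascript
// Sketch on plain data: the page and annotation shapes are hypothetical
// stand-ins, and extractTextInArea is a caller-supplied callback.
function collectHighlightedText(pages, extractTextInArea) {
  const snippets = [];
  for (const page of pages) {
    for (const annotation of page.annotations) {
      // Skip anything that is not a highlight
      if (annotation.kind !== 'highlight') continue;
      // Extract only the text inside the highlight's bounding rectangle
      const snippet = extractTextInArea(page, annotation.bounds).trim();
      // Ignore highlights that cover no text
      if (snippet) snippets.push(snippet);
    }
  }
  // One highlighted passage per line
  return snippets.join('\n');
}

// Stand-in data and extractor: text is looked up by the rectangle's y value
const demoPages = [
  {
    textByY: { 100: 'First highlight ', 300: 'Second highlight' },
    annotations: [
      { kind: 'highlight', bounds: { x: 50, y: 100, width: 200, height: 14 } },
      { kind: 'link', bounds: { x: 50, y: 200, width: 80, height: 14 } },
      { kind: 'highlight', bounds: { x: 50, y: 300, width: 220, height: 14 } },
    ],
  },
];
const demoExtract = (page, bounds) => page.textByY[bounds.y] || '';

// Prints two lines: "First highlight" and "Second highlight"
console.log(collectHighlightedText(demoPages, demoExtract));
```

In the React example, the callback would correspond to creating a PdfTextExtractor for the page and calling ExtractText with ExtractArea set to the annotation's bounds.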

Get a Free License

To fully experience the capabilities of Spire.PDF for JavaScript without any evaluation limitations, you can request a free 30-day trial license.
