Administrator

Monday, 18 September 2023 01:16

Convert PDF to Images in Python (PNG, JPG, BMP, SVG, TIFF)

Python examples to render PDF to PNG, JPG, BMP, SVG, and TIFF images

Converting PDF files to images in Python is a common need for developers and professionals working with digital documents. Whether you want to generate thumbnails, create previews, extract specific content areas, or prepare files for printing, transforming a PDF into image formats gives you flexibility and compatibility across platforms.

This comprehensive guide demonstrates how to convert PDF files into popular image formats—such as PNG, JPG, BMP, SVG, and TIFF—in Python, using practical, easy-to-follow code examples.

Why Convert PDF to Image?

Converting PDF to image formats offers several benefits:

Cross-platform compatibility: Images are easier to embed in web pages, mobile apps, or presentations.
Preview and thumbnail generation: Quickly create page snapshots without rendering the full PDF.
Selective content extraction: Save specific areas of a PDF as images for focused analysis or reuse.
Simplified sharing: Images can be easily emailed, uploaded, or displayed without special PDF readers.

Python PDF-to-Image Converter Library

Spire.PDF for Python is a powerful and easy-to-use library designed for handling PDF files. It enables developers to convert PDF pages into multiple image formats like PNG, JPG, BMP, SVG, and TIFF with excellent quality and performance.

Installation

You can easily install the library using pip. Simply open your terminal and run the following command:

pip install Spire.PDF

Simple PDF to PNG, JPG, and BMP Conversion

The SaveAsImage method of the PdfDocument class allows you to render each page of a PDF into an image format of your choice.

The code example below demonstrates how to load a PDF file, iterate through its pages, and save each one as a PNG image. You can easily adjust the file format to JPG or BMP by changing the file extension.

from spire.pdf import *

# Load the PDF file
pdf = PdfDocument()
pdf.LoadFromFile("template.pdf")

# Loop through pages and save as images
for i in range(pdf.Pages.Count):
    # Convert each page to image
    with pdf.SaveAsImage(i) as image:
        
        # Save in different formats as needed
        image.Save(f"Output/ToImage_{i}.png")
        # image.Save(f"Output/ToImage_{i}.jpg")
        # image.Save(f"Output/ToImage_{i}.bmp")

# Close the PDF document
pdf.Close()
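
The example above writes into an Output folder that is assumed to exist already. As a minimal sketch, the helper below creates the folder on demand and builds a zero-padded file name; page_image_path is a hypothetical name used for illustration and is not part of Spire.PDF.

```python
from pathlib import Path


def page_image_path(output_dir: str, page_index: int, ext: str = "png") -> Path:
    """Return output_dir/ToImage_<index>.<ext>, creating output_dir if needed."""
    folder = Path(output_dir)
    # Create the folder (and any missing parents) so saving does not fail
    folder.mkdir(parents=True, exist_ok=True)
    # Zero-pad the index so files sort in page order
    return folder / f"ToImage_{page_index:03d}.{ext.lstrip('.')}"
```

Inside the loop, the result could be passed to the save call as image.Save(str(page_image_path("Output", i))).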

Advanced Conversion Options

Enable Transparent Image Background

Transparent backgrounds help integrate images seamlessly into designs, avoiding unwanted borders or background colors.

To enable a transparent background during PDF-to-image conversion in Python, use the SetPdfToImageOptions() method with an alpha value of 0. This setting ensures that the background of the output image is fully transparent.

The following example demonstrates how to export each PDF page as a transparent PNG image.

from spire.pdf import *

# Load PDF document from file
pdf = PdfDocument()
pdf.LoadFromFile("template.pdf")

# Set the transparent value of the image's background to 0
pdf.ConvertOptions.SetPdfToImageOptions(0)

# Loop through all pages and save each as an image
for i in range(pdf.Pages.Count):
    # Convert each page to an image
    with pdf.SaveAsImage(i) as image:
        # Save the image to the output directory
        image.Save(f"Output/ToImage_{i}_transparent.png")

# Close the PDF document
pdf.Close()

Note: Transparency is supported in PNG but not in JPG or BMP formats.

Crop Specific PDF Areas to Image

In some cases, you may only need to export a specific area of a PDF page—such as a chart, table, or block of text. This can be done by adjusting the page’s CropBox before rendering.

The CropBox property defines the visible region of the page used for display and printing. By setting it to a specific RectangleF(x, y, width, height) value, you can isolate and export only the desired portion of the content.

The example below demonstrates how to crop a rectangular area on the first page of a PDF and save that section as a PNG image.

from spire.pdf import *

# Load the PDF document from file
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")

# Access the first page of the PDF
page = pdf.Pages.get_Item(0)

# Define the crop area of the page using a rectangle (x, y, width, height)
page.CropBox = RectangleF(0.0, 300.0, 600.0, 260.0)

# Convert the cropped page to an image
with pdf.SaveAsImage(0) as image:
    # Save the image to a PNG file
    image.Save("Output/CropPDFSaveAsImage.png")
    
# Close the PDF document
pdf.Close()

Note: You need to adjust the coordinates based on the location of your target content. Coordinates start from the top-left corner of the page.
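
If you measure the target area in millimetres, you need to convert it before building the rectangle. The sketch below assumes the crop coordinates are expressed in PDF points (1/72 inch), which is the default PDF unit but is not stated in this article; mm_to_points and crop_rect_mm are hypothetical helper names.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def mm_to_points(mm: float) -> float:
    """Convert millimetres to PDF points (1 point = 1/72 inch)."""
    return mm * POINTS_PER_INCH / MM_PER_INCH


def crop_rect_mm(x_mm, y_mm, width_mm, height_mm, page_width_pt, page_height_pt):
    """Return (x, y, width, height) in points, clipped to the page bounds."""
    x = min(max(mm_to_points(x_mm), 0.0), page_width_pt)
    y = min(max(mm_to_points(y_mm), 0.0), page_height_pt)
    # Shrink the rectangle if it would extend past the right or bottom edge
    width = min(mm_to_points(width_mm), page_width_pt - x)
    height = min(mm_to_points(height_mm), page_height_pt - y)
    return (x, y, width, height)
```

The four values can then be passed to RectangleF in the same order as in the example above.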

Generate Multi-Page TIFF from PDF

The TIFF format supports multi-page documents, making it a popular choice for archival and printing purposes. Although Spire.PDF for Python doesn't natively create multi-page TIFFs, you can render individual pages as images and then use the Pillow library to merge them into one .tiff file.

Before proceeding, ensure Pillow is installed by running:

pip install Pillow

The following example illustrates how to:

Load a PDF
Convert each page to an image
Combine all images into a single multi-page TIFF

from spire.pdf import *

from PIL import Image
from io import BytesIO

# Load the PDF document from file
pdf = PdfDocument()
pdf.LoadFromFile("Input.pdf")

# Create an empty list to store PIL Images
images = []

# Iterate through all pages in the document
for i in range(pdf.Pages.Count):

    # Convert a specific page to an image stream
    with pdf.SaveAsImage(i) as imageData:

        # Open the image stream as a PIL image
        img = Image.open(BytesIO(imageData.ToArray())) 

        # Append the PIL image to list
        images.append(img)

# Save the PIL Images as a multi-page TIFF file
images[0].save("Output/ToTIFF.tiff", save_all=True, append_images=images[1:])

# Dispose resources
pdf.Dispose()

It’s also possible to convert TIFF files back to PDF. For detailed instructions, please refer to the tutorial: Python: Convert PDF to TIFF and TIFF to PDF.

Export PDF as SVG

SVG (Scalable Vector Graphics) is an ideal format for content that requires scaling without quality loss, such as charts, vector illustrations, and technical diagrams.

By using the SaveToFile() method with the FileFormat.SVG option, you can export PDF pages as SVG files. This conversion preserves the vector characteristics of the content, making it well-suited for web embedding, responsive design, and further editing in vector graphic tools.

The following example demonstrates how to export an entire PDF document to SVG format.

from spire.pdf import *

# Load the PDF document from file
pdf = PdfDocument()
pdf.LoadFromFile("Example.pdf")

# Save each page of the file to a separate SVG file
pdf.SaveToFile("PdfToSVG/ToSVG.svg", FileFormat.SVG)

# Close the PdfDocument object
pdf.Close()

Note: Each page in the PDF will be saved as a separate SVG file named ToSVG_i.svg, where i is the page number (1-based).

To export specific pages or customize the SVG output size, please refer to our detailed guide: Python: Convert PDF to SVG.

Conclusion

Converting PDF to images in formats like PNG, JPG, BMP, SVG, and TIFF provides flexibility for sharing, displaying, and processing digital documents. With Spire.PDF for Python, you can:

Export high-quality images from PDFs in various formats
Crop specific regions for focused content extraction
Generate multi-page TIFF files for archival purposes
Create scalable SVG vector graphics for diagrams and charts

By automating PDF to image conversion in Python, you can seamlessly integrate image export into your applications and workflows.

FAQs

Q1: Can I convert a range of pages from a PDF to images?

A1: Yes. You can convert specific pages by specifying their indices in a loop. For example, to export pages 1 to 3:

# Convert only pages 1-3
for i in range(0, 3):  # 0-based index
    with pdf.SaveAsImage(i) as img:
        img.Save(f"page_{i}.png")
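
Readers usually think in 1-based page numbers, while the loop uses 0-based indices. The following sketch maps an inclusive page range to indices and clips it to the document length; page_indices is a hypothetical helper, not a library function.

```python
def page_indices(first_page: int, last_page: int, page_count: int) -> range:
    """Map an inclusive 1-based page range to 0-based indices, clipped to the document."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    # Page N has index N - 1; the range end is exclusive, so last_page works as-is
    return range(first_page - 1, min(last_page, page_count))
```

With it, the loop header could become for i in page_indices(1, 3, pdf.Pages.Count).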

Q2: Can I batch convert multiple PDF files to images?

A2: Yes, batch conversion is supported. You can iterate through a list of PDF file paths and convert each one within a loop.

pdf_files = ["a.pdf", "b.pdf", "c.pdf"]
for file in pdf_files:
    pdf = PdfDocument()
    pdf.LoadFromFile(file)
    for i in range(pdf.Pages.Count):
        with pdf.SaveAsImage(i) as img:
            img.Save(f"{file}_page_{i}.png")
    # Close each document before loading the next one
    pdf.Close()
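
Note that the loop above produces names such as a.pdf_page_0.png, because the full file name is reused. A small sketch of one way to drop the .pdf extension and any folder part from the output name; batch_image_name is a hypothetical helper.

```python
from pathlib import Path


def batch_image_name(pdf_path: str, page_index: int, ext: str = "png") -> str:
    """Build '<pdf name without extension>_page_<index>.<ext>'."""
    # Path.stem strips both the directory and the final extension
    stem = Path(pdf_path).stem
    return f"{stem}_page_{page_index}.{ext}"
```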

Q3: Is it possible to convert password-protected PDFs to images?

A3: Yes, you can convert secured PDFs to images as long as you provide the correct password when loading the PDF document.

pdf = PdfDocument()
pdf.LoadFromFile("protected.pdf", "password")

Q4: Is it possible to extract embedded images from a PDF instead of rendering pages?

A4: Yes. Aside from rendering entire pages, the library also supports extracting images directly from the PDF.

Get a Free License

To fully experience the capabilities of Spire.PDF for Python without any evaluation limitations, you can request a free 30-day trial license.

Published in Conversion

Tagged under

pdf Python Conversion

Friday, 15 September 2023 01:01

Python: Add or Delete Table Rows and Columns in Word

Adding or removing rows and columns in a Word table allows you to adjust the table's structure to accommodate your data effectively. By adding rows and columns, you can effortlessly expand the table as your data grows, ensuring that all relevant information is captured and displayed in a comprehensive manner. On the other hand, removing unnecessary rows and columns allows you to streamline the table, eliminating any redundant or extraneous data that may clutter the document. In this article, we will demonstrate how to add or delete table rows and columns in Word in Python using Spire.Doc for Python.

Add or Insert a Row into a Word Table in Python
Add or Insert a Column into a Word Table in Python
Delete a Row from a Word Table in Python
Delete a Column from a Word Table in Python

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.

Package Manager

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Add or Insert a Row into a Word Table in Python

You can add a row to the end of a Word table using the Table.AddRow() method, or insert a row at a specific location using the Table.Rows.Insert() method. The following are the detailed steps:

Create a Document object.
Load a Word document using Document.LoadFromFile() method.
Get the first section of the document using Document.Sections[] property.
Get the first table of the section using Section.Tables[] property.
Insert a row at a specific location of the table using Table.Rows.Insert() method.
Add data to the newly inserted row.
Add a row to the end of the table using Table.AddRow() method.
Add data to the newly added row.
Save the resulting document using Document.SaveToFile() method.

Python

from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()
# Load a Word document
document.LoadFromFile("Table1.docx")

# Get the first section of the document
section = document.Sections.get_Item(0)

# Get the first table of the first section
table = section.Tables.get_Item(0) if isinstance(section.Tables.get_Item(0), Table) else None

# Insert a row into the table as the third row
table.Rows.Insert(2, table.AddRow())
# Get the inserted row
insertedRow = table.Rows[2]
# Add data to the row
for i in range(insertedRow.Cells.Count):
    cell = insertedRow.Cells[i]
    paragraph = cell.AddParagraph()
    paragraph.AppendText("Inserted Row")
    paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
    cell.CellFormat.VerticalAlignment = VerticalAlignment.Middle

# Add a row at the end of the table
addedRow = table.AddRow()
# Add data to the row
for i in range(addedRow.Cells.Count):
    cell = addedRow.Cells[i]
    paragraph = cell.AddParagraph()
    paragraph.AppendText("End Row")
    paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
    cell.CellFormat.VerticalAlignment = VerticalAlignment.Middle

# Save the resulting document
document.SaveToFile("AddRows.docx", FileFormat.Docx2016)
document.Close()
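
To make the index arithmetic easier to follow, here is a plain-Python model of the same two operations. It uses nested lists as a stand-in for the table and does not call Spire.Doc; the sample cell values are made up for illustration.

```python
# Each inner list stands in for one table row; each string for one cell
table = [
    ["Name", "Qty"],
    ["Apples", "3"],
    ["Pears", "5"],
]
column_count = len(table[0])

# Insert a new third row (index 2); the old third row moves down by one
table.insert(2, ["Inserted Row"] * column_count)

# Append a row at the end of the table
table.append(["End Row"] * column_count)
```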

Add or Insert a Column into a Word Table in Python

Spire.Doc for Python doesn't offer a direct method to add or insert a column into a Word table. But you can achieve this by adding or inserting cells at a specific location of each table row using TableRow.Cells.Add() or TableRow.Cells.Insert() method. The detailed steps are as follows:

Create a Document object.
Load a Word document using Document.LoadFromFile() method.
Get the first section of the document using Document.Sections[] property.
Get the first table of the section using Section.Tables[] property.
Loop through each row of the table.
Create a TableCell object, then insert it at a specific location of each row using TableRow.Cells.Insert() method and set cell width.
Add data to the cell and set text alignment.
Add a cell to the end of each row using TableRow.AddCell() method and set cell width.
Add data to the cell and set text alignment.
Save the resulting document using Document.SaveToFile() method.

Python

from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()
# Load a Word document
document.LoadFromFile("Table1.docx")

# Get the first section of the document
section = document.Sections.get_Item(0)

# Get the first table of the first section
table = section.Tables.get_Item(0) if isinstance(section.Tables.get_Item(0), Table) else None

# Loop through the rows of the table
for i in range(table.Rows.Count):
    row = table.Rows.get_Item(i)
    # Create a TableCell object
    cell = TableCell(document)
    # Insert the cell as the third cell of the row and set cell width
    row.Cells.Insert(2, cell)
    cell.Width = row.Cells[0].Width
    # Add data to the cell
    paragraph = cell.AddParagraph()
    paragraph.AppendText("Inserted Column")
    # Set text alignment
    paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
    cell.CellFormat.VerticalAlignment = VerticalAlignment.Middle

    # Add a cell to the end of the row and set cell width
    cell = row.AddCell()
    cell.Width = row.Cells[1].Width
    # Add data to the cell
    paragraph = cell.AddParagraph()
    paragraph.AppendText("End Column")
    # Set text alignment
    paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
    cell.CellFormat.VerticalAlignment = VerticalAlignment.Middle

# Save the resulting document
document.SaveToFile("AddColumns.docx", FileFormat.Docx2016)
document.Close()

Delete a Row from a Word Table in Python

To delete a specific row from a Word table, you can use the Table.Rows.RemoveAt() method. The detailed steps are as follows:

Create a Document object.
Load a Word document using Document.LoadFromFile() method.
Get the first section of the document using Document.Sections[] property.
Get the first table of the section using Section.Tables[] property.
Remove a specific row from the table using Table.Rows.RemoveAt() method.
Save the resulting document using Document.SaveToFile() method.

Python

from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()
# Load a Word document
document.LoadFromFile("AddRows.docx")

# Get the first section of the document
section = document.Sections.get_Item(0)

# Get the first table of the first section
table = section.Tables.get_Item(0) if isinstance(section.Tables.get_Item(0), Table) else None

# Remove the third row
table.Rows.RemoveAt(2)
# Remove the last row
table.Rows.RemoveAt(table.Rows.Count - 1)

# Save the resulting document
document.SaveToFile("RemoveRows.docx", FileFormat.Docx2016)
document.Close()

Delete a Column from a Word Table in Python

To delete a specific column from a Word table, you need to remove the corresponding cell from each table row using the TableRow.Cells.RemoveAt() method. The detailed steps are as follows:

Create a Document object.
Load a Word document using Document.LoadFromFile() method.
Get the first section of the document using Document.Sections[] property.
Get the first table of the section using Section.Tables[] property.
Loop through each row of the table.
Remove a specific cell from each row using TableRow.Cells.RemoveAt() method.
Save the resulting document using Document.SaveToFile() method.

Python

from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()
# Load a Word document
document.LoadFromFile("AddColumns.docx")

# Get the first section of the document
section = document.Sections.get_Item(0)

# Get the first table of the first section
table = section.Tables.get_Item(0) if isinstance(section.Tables.get_Item(0), Table) else None

# Loop through the rows of the table
for i in range(table.Rows.Count):
    row = table.Rows.get_Item(i)
    # Remove the third cell from the row
    row.Cells.RemoveAt(2)
    # Remove the last cell from the row
    row.Cells.RemoveAt(row.Cells.Count - 1)

# Save the resulting document
document.SaveToFile("RemoveColumns.docx", FileFormat.Docx2016)
document.Close()
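
The order of the two removals matters, because removing the third cell shifts every later cell one position to the left. A plain-Python model of the loop, using nested lists with made-up cell values instead of Spire.Doc objects:

```python
table = [
    ["A1", "B1", "C1", "D1", "E1"],
    ["A2", "B2", "C2", "D2", "E2"],
]

for row in table:
    # Remove the third cell first; every later cell shifts left by one
    del row[2]
    # Recompute the last index after the first removal, then remove that cell
    del row[len(row) - 1]
```

After the loop, each row keeps its first, second, and fourth original cells.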

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Table

Friday, 08 September 2023 01:11

Python: Extract Text and Images from Word Documents

By extracting text from Word documents, you can effortlessly obtain the written information contained within them. This allows for easier manipulation, analysis, and organization of textual content, enabling tasks such as text mining, sentiment analysis, and natural language processing. Extracting images, on the other hand, provides access to visual elements embedded within Word documents, which can be crucial for tasks like image recognition, content extraction, or creating image databases. In this article, you will learn how to extract text and images from a Word document in Python using Spire.Doc for Python.

Extract Text from a Specific Paragraph in Python
Extract Text from an Entire Word Document in Python
Extract Images from an Entire Word Document in Python

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.

Package Manager

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Extract Text from a Specific Paragraph in Python

To get a certain paragraph from a section, use Section.Paragraphs[index] property. Then, you can get the text of the paragraph through Paragraph.Text property. The detailed steps are as follows.

Create a Document object.
Load a Word file using Document.LoadFromFile() method.
Get a specific section through Document.Sections[index] property.
Get a specific paragraph through Section.Paragraphs[index] property.
Get text from the paragraph through Paragraph.Text property.

Python

from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a Word document
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Get a specific section
section = doc.Sections.get_Item(0)

# Get a specific paragraph
paragraph = section.Paragraphs.get_Item(2)

# Get text from the paragraph
# Use a name other than "str" to avoid shadowing the built-in type
text = paragraph.Text

# Print result
print(text)

Extract Text from an Entire Word Document in Python

If you want to get text from a whole document, you can simply use Document.GetText() method. Below are the steps.

Create a Document object.
Load a Word file using Document.LoadFromFile() method.
Get text from the document using Document.GetText() method.

Python

from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a Word file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Get text from the entire document
# Use a name other than "str" to avoid shadowing the built-in type
text = doc.GetText()

# Print result
print(text)
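
Instead of printing, you may want to keep the extracted text on disk. A minimal sketch using only the standard library; save_text is a hypothetical helper, and UTF-8 is an assumed encoding choice.

```python
from pathlib import Path


def save_text(text: str, path: str) -> int:
    """Write text as UTF-8 and return the number of characters written."""
    target = Path(path)
    # Create the destination folder if it does not exist yet
    target.parent.mkdir(parents=True, exist_ok=True)
    return target.write_text(text, encoding="utf-8")
```

The value returned by Document.GetText() could be passed in as the first argument.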

Extract Images from an Entire Word Document in Python

Spire.Doc for Python does not provide a straightforward method to get images from a Word document. You need to iterate through the child objects in the document and determine whether each child object is a DocPicture. If it is, you get the image data using the DocPicture.ImageBytes property and then save it as a file in a popular image format. The main steps are as follows.

Create a Document object.
Load a Word file using Document.LoadFromFile() method.
Loop through the child objects in the document.
Determine if a specific child object is a DocPicture. If yes, get the image data through DocPicture.ImageBytes property.
Write the image data as a PNG file.
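
The traversal described in these steps is a breadth-first walk over a tree of objects. The sketch below shows the same pattern on a tiny stand-in tree so it can be run without a Word file; Node and collect are hypothetical names and are not Spire.Doc classes.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a document object that may have children."""
    kind: str
    children: list = field(default_factory=list)


def collect(root: Node, wanted: str) -> list:
    """Breadth-first search returning every descendant whose kind matches."""
    found = []
    pending = deque([root])
    while pending:
        node = pending.popleft()
        for child in node.children:
            if child.kind == wanted:
                found.append(child)
            elif child.children:
                # Only containers are queued for a later visit
                pending.append(child)
    return found
```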

Python

import queue
from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a Word file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Create a Queue object
nodes = queue.Queue()
nodes.put(doc)

# Create a list
images = []

while nodes.qsize() > 0:
    node = nodes.get()

    # Loop through the child objects in the document
    for i in range(node.ChildObjects.Count):
        child = node.ChildObjects.get_Item(i)

        # Determine if a child object is a picture
        if child.DocumentObjectType == DocumentObjectType.Picture:
            picture = child if isinstance(child, DocPicture) else None
            dataBytes = picture.ImageBytes

            # Add the image data to the list 
            images.append(dataBytes)
         
        elif isinstance(child, ICompositeObject):
            nodes.put(child if isinstance(child, ICompositeObject) else None)

# Loop through the images in the list
for i, item in enumerate(images):
    fileName = "Image-{}.png".format(i)
    with open("ExtractedImages/"+fileName,'wb') as imageFile:

        # Write the image to a specified path
        imageFile.write(item)
doc.Close()
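
The example above saves every picture with a .png extension, whatever its actual encoding. Assuming the extracted bytes are the picture in its original encoded form (an assumption, not something this article confirms), the extension could be guessed from the well-known file signatures; guess_image_extension is a hypothetical helper.

```python
def guess_image_extension(data: bytes) -> str:
    """Guess a file extension from the leading signature bytes of an image."""
    data = bytes(data)
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if data.startswith(b"BM"):
        return "bmp"
    # Unknown signature: fall back to a neutral extension
    return "bin"
```

The file name in the saving loop could then be built as "Image-{}.{}".format(i, guess_image_extension(item)).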

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Text

Tagged under

doc Python Text

Tuesday, 05 September 2023 01:40

Python: Create, Read, or Update a Word Document

Creating, reading, and updating Word documents is a common need for many developers working with the Python programming language. Whether it's generating reports, manipulating existing documents, or automating document creation processes, having the ability to work with Word documents programmatically can greatly enhance productivity and efficiency. In this article, you will learn how to create, read, or update Word documents in Python using Spire.Doc for Python.

Create a Word Document from Scratch in Python
Read Text of a Word Document in Python
Update a Word Document in Python

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.

Package Manager

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Create a Word Document from Scratch in Python

Spire.Doc for Python offers the Document class to represent a Word document model. A document must contain at least one section (represented by the Section class) and each section is a container for various elements such as paragraphs, tables, charts, and images. This example shows you how to create a simple Word document containing several paragraphs using Spire.Doc for Python.

Create a Document object.
Add a section using Document.AddSection() method.
Set the page margins through Section.PageSetup.Margins property.
Add several paragraphs to the section using Section.AddParagraph() method.
Add text to the paragraphs using Paragraph.AppendText() method.
Create a ParagraphStyle object, and apply it to a specific paragraph using Paragraph.ApplyStyle() method.
Save the document to a Word file using Document.SaveToFile() method.

Python

from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Add a section
section = doc.AddSection()

# Set the page margins
section.PageSetup.Margins.All = 40

# Add a title
titleParagraph = section.AddParagraph()
titleParagraph.AppendText("Introduction of Spire.Doc for Python")

# Add two paragraphs
bodyParagraph_1 = section.AddParagraph()
bodyParagraph_1.AppendText("Spire.Doc for Python is a professional Python library designed for developers to " +
                           "create, read, write, convert, compare and print Word documents in any Python application " +
                           "with fast and high-quality performance.")

bodyParagraph_2 = section.AddParagraph()
bodyParagraph_2.AppendText("As an independent Word Python API, Spire.Doc for Python doesn't need Microsoft Word to " +
                           "be installed on either the development or target systems. However, it can incorporate Microsoft Word " +
                           "document creation capabilities into any developers' Python applications.")

# Apply heading1 to the title
titleParagraph.ApplyStyle(BuiltinStyle.Heading1)

# Create a style for the paragraphs
style2 = ParagraphStyle(doc)
style2.Name = "paraStyle"
style2.CharacterFormat.FontName = "Arial"
style2.CharacterFormat.FontSize = 13
doc.Styles.Add(style2)
bodyParagraph_1.ApplyStyle("paraStyle")
bodyParagraph_2.ApplyStyle("paraStyle")

# Set the horizontal alignment of the paragraphs
titleParagraph.Format.HorizontalAlignment = HorizontalAlignment.Center
bodyParagraph_1.Format.HorizontalAlignment = HorizontalAlignment.Left
bodyParagraph_2.Format.HorizontalAlignment = HorizontalAlignment.Left

# Set the after spacing
titleParagraph.Format.AfterSpacing = 10
bodyParagraph_1.Format.AfterSpacing = 10

# Save to file
doc.SaveToFile("output/WordDocument.docx", FileFormat.Docx2019)

Read Text of a Word Document in Python

To get the text of an entire Word document, you could simply use Document.GetText() method. The following are the detailed steps.

Create a Document object.
Load a Word document using Document.LoadFromFile() method.
Get text from the entire document using Document.GetText() method.

Python

from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a Word file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\WordDocument.docx")

# Get text from the entire document
text = doc.GetText()

# Print text
print(text)

Update a Word Document in Python

To access a specific paragraph, you can use the Section.Paragraphs[index] property. If you want to modify the text of the paragraph, you can reassign text to the paragraph through the Paragraph.Text property. The following are the detailed steps.

Create a Document object.
Load a Word document using Document.LoadFromFile() method.
Get a specific section through Document.Sections[index] property.
Get a specific paragraph through Section.Paragraphs[index] property.
Change the text of the paragraph through Paragraph.Text property.
Save the document to another Word file using Document.SaveToFile() method.

Python

from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a Word file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\WordDocument.docx")

# Get a specific section
section = doc.Sections.get_Item(0)

# Get a specific paragraph
paragraph = section.Paragraphs.get_Item(1)

# Change the text of the paragraph
paragraph.Text = "The title has been changed"

# Save to file
doc.SaveToFile("output/Updated.docx", FileFormat.Docx2019)

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Document Operation

Tagged under

doc Python Document Operation

Tuesday, 11 July 2023 00:39

C#/VB.NET: Change or Delete Hyperlinks in PDF

Hyperlinks in PDF documents let readers jump to pages or open other documents, making PDF files more interactive and easier to use. However, if the target of a link has changed or the link points to the wrong page, it can confuse or mislead readers. It is therefore important to change or remove wrong or invalid hyperlinks so that the links stay accurate and usable. This article introduces how to change or remove hyperlinks in PDF documents through .NET programs using Spire.PDF for .NET.

Change the URL of a Hyperlink in PDF in C#/VB.NET
Remove Hyperlinks from PDF in C#/VB.NET

Install Spire.PDF for .NET

To begin with, you need to add the DLL files included in the Spire.PDF for .NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.

Package Manager

PM> Install-Package Spire.PDF

Change the URL of a Hyperlink in PDF

To change the URL of a hyperlink on a PDF page, it is necessary to get the hyperlink annotation widget and use the PdfUriAnnotationWidget.Uri property to reset the URL. The detailed steps are as follows:

Create an object of PdfDocument class.
Load a PDF file using PdfDocument.LoadFromFile() method.
Get the first page of the document using PdfDocument.Pages[] property.
Get the first hyperlink widget on the page using PdfPageBase.AnnotationsWidget[] property.
Reset the URL of the hyperlink using PdfUriAnnotationWidget.Uri property.
Save the document using PdfDocument.SaveToFile() method.

C#
VB.NET

using Spire.Pdf;
using Spire.Pdf.Annotations;
using System;

namespace ChangeHyperlink
{
    internal class Program
    {
        static void Main(string[] args)
        {
            //Create an object of PdfDocument
            PdfDocument pdf = new PdfDocument();

            //Load a PDF file
            pdf.LoadFromFile("Sample.pdf");

            //Get the first page
            PdfPageBase page = pdf.Pages[0];

            //Get the first hyperlink
            PdfUriAnnotationWidget url = (PdfUriAnnotationWidget)page.Annotations[0];

            //Reset the url of the hyperlink
            url.Uri = "https://en.wikipedia.org/wiki/Climate_change";

            //Save the PDF file
            pdf.SaveToFile("ChangeHyperlink.pdf");
            pdf.Dispose();
        }
    }
}

Remove Hyperlinks from PDF

Spire.PDF for .NET provides the PdfPageBase.AnnotationsWidget.RemoveAt() method to remove a hyperlink on a PDF page by its index. Eliminating all hyperlinks from a PDF document requires iterating through the pages, obtaining the annotation widgets of each page, verifying whether an annotation is an instance of the PdfUriAnnotationWidget class, and deleting the annotation if it is. The following are the detailed steps:

Create an object of PdfDocument class.
Load a PDF document using PdfDocument.LoadFromFile() method.
To remove a specific hyperlink, get the page containing the hyperlink and remove the hyperlink by its index using PdfPageBase.AnnotationsWidget.RemoveAt() method.
To remove all hyperlinks, loop through the pages in the document to get the annotation collection of each page using PdfPageBase.AnnotationsWidget property.
Check if an annotation widget is an instance of PdfUriAnnotationWidget class and remove the annotation widget using PdfAnnotationCollection.Remove(PdfUriAnnotationWidget) method if it is.
Save the document using PdfDocument.SaveToFile() method.

C#

using Spire.Pdf;
using Spire.Pdf.Annotations;
using System;

namespace DeleteHyperlink
{
    internal class Program
    {
        static void Main(string[] args)
        {
            //Create an object of PdfDocument
            PdfDocument pdf = new PdfDocument();

            //Load a PDF file
            pdf.LoadFromFile("Sample.pdf");

            //Remove the second hyperlink on the first page
            //PdfPageBase page = pdf.Pages[0];
            //page.AnnotationsWidget.RemoveAt(1);

            //Remove all hyperlinks in the document
            //Loop through pages in the document
            foreach (PdfPageBase page in pdf.Pages)
            {
                //Get the annotation collection of a page
                PdfAnnotationCollection collection = page.Annotations;
                for (int i = collection.Count - 1; i >= 0; i--)
                {
                    PdfAnnotation annotation = collection[i];
                    //Check if an annotation is an instance of PdfUriAnnotationWidget
                    if (annotation is PdfUriAnnotationWidget)
                    {
                        PdfUriAnnotationWidget url = (PdfUriAnnotationWidget)annotation;
                        //Remove the hyperlink
                        collection.Remove(url);
                    }
                }
            }

            //Save the document
            pdf.SaveToFile("DeleteHyperlink.pdf");
            pdf.Dispose();
        }
    }
}
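
The loop above walks the annotation collection from the last index down to the first. The Java sketch below uses a plain list of strings as a stand-in for the annotation collection (these are not Spire.PDF types, and the "link:" prefix is a made-up marker) to illustrate why: removing an element while counting down never shifts the indices that are still to be visited.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReverseRemoval {
    // Removes every entry that starts with "link:" while iterating from the end,
    // so a removal never changes the index of an element not yet examined.
    static List<String> removeLinks(List<String> annotations) {
        List<String> result = new ArrayList<>(annotations);
        for (int i = result.size() - 1; i >= 0; i--) {
            if (result.get(i).startsWith("link:")) {
                result.remove(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> annotations = Arrays.asList("link:a", "link:b", "note:c", "link:d");
        System.out.println(removeLinks(annotations));
    }
}
```

A forward loop that removes by index would skip the element that slides into the freed slot, which is why adjacent hyperlinks could survive.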

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Link


Friday, 23 May 2025 01:36

How to Print Word Documents in C#: The Ultimate Guide

Printing Word documents programmatically in C# can streamline business workflows, automate reporting, and enhance document management systems. This comprehensive guide explores how to print Word documents in C# using Spire.Doc for .NET, covering everything from basic printing to advanced customization techniques. We'll walk through practical code examples for each scenario, ensuring you can implement these solutions in real-world applications.

.NET Library for Printing Word Documents
Print Word Documents in C#
Customize Printing Options
Silently Print Word Documents
Print Multiple Pages on One Sheet
Conclusion
FAQs

.NET Library for Printing Word Documents

Spire.Doc for .NET is a standalone library that supports comprehensive Word document processing without requiring Microsoft Office to be installed. It provides APIs for loading, editing, and printing Word files (DOC/DOCX) while preserving the document's formatting.

To get started, install the library via NuGet Package Manager:

Install-Package Spire.Doc

Alternatively, you can download Spire.Doc for .NET from our official website and reference the DLL file manually.

Print Word Documents in C#

The foundation of Word document printing in C# involves three key steps demonstrated in the following code. First, we create a Document object to represent our Word file, then load the actual document, and finally access the printing functionality through the PrintDocument class.

C#

using Spire.Doc;
using System.Drawing.Printing;

namespace PrintWordDocument
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Initialize a new Document instance
            Document doc = new Document();

            // Load the Word file from specified path
            doc.LoadFromFile("Input.docx");

            // Access the PrintDocument object for printing operations
            PrintDocument printDoc = doc.PrintDocument;

            // Send document to default printer
            printDoc.Print();
        }
    }
}

This basic implementation handles the entire printing process, from document loading to physical printing, with just a few lines of code. The PrintDocument object abstracts all the underlying printing operations, making the process straightforward for developers.

Customize Printing Options

Beyond basic printing, Spire.Doc offers extensive customization via the PrinterSettings class, providing developers with granular control over the printing process. These settings allow you to tailor the output to specific needs, such as selecting particular pages or configuring advanced printer features.

To obtain the PrinterSettings object associated with the current document, use the following line of code:

C#

PrinterSettings settings = printDoc.PrinterSettings;

Now, let’s explore the specific settings.

1. Specify the Printer Name

C#

settings.PrinterName = "Your Printer Name";

This code snippet demonstrates how to target a specific printer in environments with multiple installed printers. The PrinterName property accepts the exact name of the printer as it appears in the system's printer list.

2. Specify Pages to Print

C#

settings.FromPage = 1;
settings.ToPage = 5;

These settings are particularly useful when dealing with large documents, allowing you to print only the relevant sections and conserve resources.

3. Specify Number of Copies to Print

C#

settings.Copies = 2;

The Copies property controls how many duplicates of the document will be printed, with the printer handling the duplication process efficiently.

4. Enable Duplex Printing

C#

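if (settings.CanDuplex)
{
    settings.Duplex = Duplex.Vertical;
}

This example first checks for duplex printing support before enabling two-sided printing, ensuring compatibility across different printer hardware. Duplex.Vertical requests two-sided printing flipped on the long edge and Duplex.Horizontal flips on the short edge, whereas Duplex.Default only applies whatever the printer's own default is.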

5. Print on a Custom Paper Size

C#

settings.DefaultPageSettings.PaperSize = new PaperSize("custom", 800, 500);

Here we create a custom paper size of 800 x 500, measured in hundredths of an inch (8 x 5 inches), for specialized printing requirements such as non-standard document formats.
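
The width and height passed to the PaperSize constructor are expressed in hundredths of an inch, so sizes given in millimetres need converting first. The Java sketch below shows that arithmetic only; the helper name mmToHundredthsOfInch is hypothetical and the A5 dimensions are used purely as sample input.

```java
class PaperSizeUnits {
    // Converts millimetres to hundredths of an inch, rounded to the nearest integer.
    static int mmToHundredthsOfInch(double millimetres) {
        return (int) Math.round(millimetres / 25.4 * 100);
    }

    public static void main(String[] args) {
        // A5 paper measures 148 mm x 210 mm
        int width = mmToHundredthsOfInch(148);
        int height = mmToHundredthsOfInch(210);
        System.out.println(width + " x " + height);
    }
}
```

The two resulting integers are the values that would be passed as the width and height of the custom paper size.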

6. Print Word to File

C#

settings.PrintToFile = true;
settings.PrinterName = "Microsoft Print to PDF";
settings.PrintFileName = @"C:\Output.pdf";

This configuration uses the system's PDF virtual printer to create a PDF file instead of physical printing, showcasing how Spire.Doc can be used for document conversion as well.

Silently Print Word Documents

In automated environments, you may need to print documents without any user interaction or visible dialogs. The following implementation achieves silent printing by using the StandardPrintController.

C#

using Spire.Doc;
using System.Drawing.Printing;

namespace SilentlyPrintWord
{
    class Program
    {
        static void Main(string[] args)
        {
            // Initialize a new Document instance
            Document doc = new Document();

            // Load the Word file from specified path
            doc.LoadFromFile("Input.docx");

            // Access the PrintDocument object for printing operations
            PrintDocument printDoc = doc.PrintDocument;

            // Disable the print dialog
            printDoc.PrintController = new StandardPrintController();

            // Execute printing
            printDoc.Print();
        }
    }
}

The key to silent printing lies in assigning the StandardPrintController to the PrintController property, which suppresses all printing-related dialogs and progress indicators. This approach is ideal for server-side applications or batch processing scenarios where user interaction is not possible or desired.

Print Multiple Pages on One Sheet

For economizing paper usage or creating compact document versions, Spire.Doc supports printing multiple document pages on a single physical sheet. The PrintMultipageToOneSheet method simplifies this process with predefined layout options.

C#

using Spire.Doc;
using Spire.Doc.Printing;
using System.Drawing.Printing;

namespace PrintMultiplePagesOnOneSheet
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Initialize a new Document instance
            Document doc = new Document();

            // Load the Word file from specified path
            doc.LoadFromFile("Input.docx");

            // Configure 2-page-per-sheet printing and execute printing
            doc.PrintMultipageToOneSheet(PagesPerSheet.TwoPages, false);
        }
    }
}

The PagesPerSheet enumeration offers several layout options (OnePage, TwoPages, FourPages, etc.), while the boolean parameter determines whether to include a page border on the printed sheet. This feature is particularly valuable for creating booklet layouts or draft versions of documents.

Note: This method works only on .NET Framework, that is, on versions earlier than .NET 5.0.
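
To estimate how much paper a multi-page layout saves, divide the page count by the pages per sheet and round up. The Java sketch below shows that calculation; the class and method names are hypothetical and it does not call any printing API.

```java
class SheetCount {
    // Returns the number of physical sheets needed, rounding up for a partly filled last sheet.
    static int sheetsNeeded(int pageCount, int pagesPerSheet) {
        if (pageCount < 0 || pagesPerSheet <= 0) {
            throw new IllegalArgumentException("pageCount must be >= 0 and pagesPerSheet must be > 0");
        }
        return (pageCount + pagesPerSheet - 1) / pagesPerSheet;
    }

    public static void main(String[] args) {
        System.out.println(sheetsNeeded(10, 2)); // two pages per sheet
        System.out.println(sheetsNeeded(10, 4)); // four pages per sheet
    }
}
```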

Conclusion

This guide has demonstrated how Spire.Doc for .NET provides a comprehensive solution for Word document printing in C#. It simplifies the process with features such as:

Basic & silent printing.
Customizable print settings (printer selection, duplex, copies).
Multi-page per sheet printing to reduce paper usage.

By integrating these techniques, developers can efficiently automate document printing in enterprise applications, enhancing productivity and reducing manual effort. Overall, Spire.Doc empowers developers to create robust printing solutions that meet diverse business requirements.

FAQs

Q1. Can I print encrypted or password-protected Word files?

A: Yes, Spire.Doc supports printing password-protected documents after loading them with the correct password:

C#

doc.LoadFromFile("Protected.docx", FileFormat.Docx, "password");

After successful loading, you can print it like any other document, with all the same customization options available.

Q2. How can I print only selected text from a Word document?

A: One approach is to copy the content you want into a new document and print that document. The example below clones the first section:

C#

// Get the section that holds the content to print
Section section = doc.Sections[0];
// Create a new document that contains only that section
Document newDoc = new Document();
newDoc.Sections.Add(section.Clone());
// Print the new document
newDoc.PrintDocument.Print();

This approach lets you control which portions of the document get printed.

Q3. Can I print documents in landscape mode or adjust margins programmatically?

A: Yes! Modify the DefaultPageSettings properties:

C#

printDoc.DefaultPageSettings.Landscape = true;
printDoc.DefaultPageSettings.Margins = new Margins(50, 50, 50, 50);

Q4. Can I print other file formats (e.g., PDF, Excel) using Spire.Doc?

A: Spire.Doc is designed for Word files (DOC/DOCX). For PDFs, use Spire.PDF; for Excel, use Spire.XLS.

Get a Free License

To fully experience the capabilities of Spire.Doc for .NET without any evaluation limitations, you can request a free 30-day trial license.

Published in Print


Thursday, 29 June 2023 01:32

Java: Insert Repeating Watermarks into Word Documents

Repeating watermarks, also called multi-line watermarks, are a type of watermark that appears multiple times on a page of a Word document at regular intervals. Compared with single watermarks, repeating watermarks are more difficult to remove or obscure, thus offering a better deterrent to unauthorized copying and distribution. This article is going to show how to insert repeating text and image watermarks into Word documents programmatically using Spire.Doc for Java.

Add Repeating Text Watermarks to Word Documents in Java
Add Repeating Picture Watermarks to Word Documents in Java

Install Spire.Doc for Java

First of all, you're required to add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.

Package Manager

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>14.1.0</version>
    </dependency>
</dependencies>

Add Repeating Text Watermarks to Word Documents in Java

We can insert repeating text watermarks to Word documents by adding repeating WordArt to the headers of a document at specified intervals. The detailed steps are as follows:

Create an object of Document class.
Load a Word document using Document.loadFromFile() method.
Create an object of ShapeObject class and set the WordArt text using ShapeObject.getWordArt().setText() method.
Specify the rotation angle and the number of vertical repetitions and horizontal repetitions.
Set the format of the shape using methods under ShapeObject class.
Loop through the sections in the document to insert repeating watermarks to each section by adding the WordArt shape to the header of each section multiple times at specified intervals using Paragraph.getChildObjects().add(ShapeObject) method.
Save the document using Document.saveToFile() method.

Java

import com.spire.doc.Document;
import com.spire.doc.HeaderFooter;
import com.spire.doc.Section;
import com.spire.doc.documents.Paragraph;
import com.spire.doc.documents.ShapeLineStyle;
import com.spire.doc.documents.ShapeType;
import com.spire.doc.fields.ShapeObject;

import java.awt.*;

public class insertRepeatingTextWatermark {
    public static void main(String[] args) {

        //Create an object of Document class
        Document doc = new Document();

        //Load a Word document
        doc.loadFromFile("Sample.docx");

        //Create an object of ShapeObject class and set the WordArt text
        ShapeObject shape = new ShapeObject(doc, ShapeType.Text_Plain_Text);
        shape.getWordArt().setText("DRAFT");

        //Specify the watermark rotation angle and the number of vertical and horizontal repetitions
        double rotation = 315;
        int ver = 5;
        int hor = 3;

        //Set the format of the WordArt shape
        shape.setWidth(60);
        shape.setHeight(20);
        shape.setVerticalPosition(30);
        shape.setHorizontalPosition(20);
        shape.setRotation(rotation);
        shape.setFillColor(Color.BLUE);
        shape.setLineStyle(ShapeLineStyle.Single);
        shape.setStrokeColor(Color.CYAN);
        shape.setStrokeWeight(1);

        //Loop through the sections in the document
        for (Section section : (Iterable<Section>) doc.getSections()) {
            //Get the header of a section
            HeaderFooter header = section.getHeadersFooters().getHeader();
            //Add a paragraph to the header
            Paragraph paragraph = header.addParagraph();
            for (int i = 0; i < ver; i++) {
                for (int j = 0; j < hor; j++) {
                    //Add the WordArt shape to the header
                    shape = (ShapeObject) shape.deepClone();
                    //Note: Math.sin() expects radians; wrap rotation in Math.toRadians() if a degree-based offset is intended
                    shape.setVerticalPosition((float) (section.getPageSetup().getPageSize().getHeight()/ver * i + Math.sin(rotation) * shape.getWidth()/2));
                    shape.setHorizontalPosition((float) ((section.getPageSetup().getPageSize().getWidth()/hor - shape.getWidth()/2) * j));
                    paragraph.getChildObjects().add(shape);
                }
            }
        }

        //Save the document
        doc.saveToFile("RepeatingTextWatermark.docx");
        doc.dispose();
    }
}
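
The placement arithmetic in the nested loop can be looked at on its own, without the library. The sketch below recomputes the vertical and horizontal offsets for every cell of the grid using plain numbers; the class name WatermarkGrid and the sample page dimensions are hypothetical. Unlike the listing above, which passes the angle straight to Math.sin(), this sketch converts degrees to radians with Math.toRadians(); that is an assumption about the intended offset, not a statement about how Spire.Doc behaves.

```java
class WatermarkGrid {
    // Returns one {vertical, horizontal} pair per grid cell, row by row.
    static float[][] positions(float pageWidth, float pageHeight, float shapeWidth,
                               int ver, int hor, double rotationDegrees) {
        float[][] result = new float[ver * hor][2];
        // Vertical shift derived from the rotation angle (converted to radians)
        double shift = Math.sin(Math.toRadians(rotationDegrees)) * shapeWidth / 2;
        for (int i = 0; i < ver; i++) {
            for (int j = 0; j < hor; j++) {
                result[i * hor + j][0] = (float) (pageHeight / ver * i + shift);
                result[i * hor + j][1] = (pageWidth / hor - shapeWidth / 2) * j;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical page of 600 x 800 points with a 60-point-wide shape
        for (float[] p : positions(600f, 800f, 60f, 5, 3, 315)) {
            System.out.println("vertical=" + p[0] + ", horizontal=" + p[1]);
        }
    }
}
```

Printing the offsets like this is a quick way to see whether the first row or last column would land outside the page before generating a document.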

Add Repeating Picture Watermarks to Word Documents in Java

Similarly, we can insert repeating image watermarks into Word documents by adding repeating pictures to headers at regular intervals. The detailed steps are as follows:

Create an object of Document class.
Load a Word document using Document.loadFromFile() method.
Load a picture using DocPicture.loadImage() method.
Set the text wrapping style of the picture as Behind using DocPicture.setTextWrappingStyle(TextWrappingStyle.Behind) method.
Specify the number of vertical repetitions and horizontal repetitions.
Loop through the sections in the document to insert repeating picture watermarks to the document by adding a picture to the header of each section at specified intervals using Paragraph.getChildObjects().add(DocPicture) method.
Save the document using Document.saveToFile() method.

Java

import com.spire.doc.Document;
import com.spire.doc.FileFormat;
import com.spire.doc.HeaderFooter;
import com.spire.doc.Section;
import com.spire.doc.documents.Paragraph;
import com.spire.doc.documents.TextWrappingStyle;
import com.spire.doc.fields.DocPicture;

public class insertRepeatingPictureWatermark {
    public static void main(String[] args) {

        //Create an object of Document class
        Document doc = new Document();

        //Load a Word document
        doc.loadFromFile("Sample.docx");

        //Load a picture
        DocPicture pic = new DocPicture(doc);
        pic.loadImage("watermark.png");

        //Set the text wrapping style of the picture as Behind
        pic.setTextWrappingStyle(TextWrappingStyle.Behind);

        //Specify the number of vertical repetitions and horizontal repetitions
        int ver = 4;
        int hor = 3;

        //Loop through the sections in the document
        for (Section section : (Iterable<Section>) doc.getSections()) {
            //Get the header of a section
            HeaderFooter header = section.getHeadersFooters().getHeader();
            //Add a paragraph to the header
            Paragraph paragraph = header.addParagraph();
            for (int i = 0; i < ver; i++) {
                for (int j = 0; j < hor; j++) {
                    //Add the picture to the header
                    pic = (DocPicture) pic.deepClone();
                    pic.setVerticalPosition((float) ((section.getPageSetup().getPageSize().getHeight()/ver) * i));
                    pic.setHorizontalPosition((float) (section.getPageSetup().getPageSize().getWidth()/hor - pic.getWidth()/2) * j);
                    paragraph.getChildObjects().add(pic);
                }
            }
        }

        //Save the document
        doc.saveToFile("RepeatingPictureWatermark.docx", FileFormat.Auto);
        doc.dispose();
    }
}

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Watermark


Friday, 02 June 2023 01:18

C++: Split a PDF File into Multiple PDFs

When dealing with large PDF files, splitting them into multiple separate files is a useful operation to streamline your work. By doing this, you can get the specific parts you need, or get smaller PDF files that are easy to upload to a website, send via email, etc. In this article, you will learn how to split a PDF into multiple files in C++ using Spire.PDF for C++.

Split a PDF File into Multiple Single-Page PDFs
Split a PDF File by Page Ranges

Install Spire.PDF for C++

There are two ways to integrate Spire.PDF for C++ into your application. One way is to install it through NuGet, and the other way is to download the package from our website and copy the libraries into your program. Installation via NuGet is simpler and is the recommended approach. You can find more details at the following link.

Integrate Spire.PDF for C++ in a C++ Application

Split a PDF File into Multiple Single-Page PDFs in C++

Spire.PDF for C++ offers the PdfDocument->Split() method to divide a multipage PDF document into multiple single-page files. The following are the detailed steps.

Create a PdfDocument instance.
Load a sample PDF document using PdfDocument->LoadFromFile() method.
Split the document into one-page PDFs using PdfDocument->Split() method.

C++

#include <string>
#include "Spire.Pdf.o.h"

using namespace Spire::Pdf;


int main()
{
	//Specify the input and output files
	std::wstring inputFile = L"Data\\template.pdf";
	std::wstring outputFile = L"SplitDocument/";
	std::wstring pattern = outputFile + L"SplitDocument-{0}.pdf";

	//Create a PdfDocument instance
	intrusive_ptr<PdfDocument> pdf = new PdfDocument();

	//Load a sample PDF file
	pdf->LoadFromFile(inputFile.c_str());

	//Split the PDF to one-page PDFs
	pdf->Split(pattern.c_str());
	pdf->Close();

}
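
The {0} in the destination pattern is a placeholder that is replaced by a number for each output file. The Java sketch below only illustrates that expansion with plain string replacement; the helper name expand and the startNumber parameter are hypothetical, and the number the library actually assigns to the first file is not stated in this article, so check the generated files rather than relying on this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class SplitNames {
    // Expands a "{0}" pattern into one file name per page, counting up from startNumber.
    static List<String> expand(String pattern, int pageCount, int startNumber) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) {
            names.add(pattern.replace("{0}", String.valueOf(startNumber + i)));
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumes a three-page input and numbering that starts at 0
        for (String name : expand("SplitDocument/SplitDocument-{0}.pdf", 3, 0)) {
            System.out.println(name);
        }
    }
}
```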

Split a PDF File by Page Ranges in C++

There's no straightforward way to split PDF documents by page ranges. To do so, you can create two or more new PDF documents and then use the PdfPageBase->CreateTemplate()->Draw() method to draw the contents of the specified pages in the input PDF file onto the pages of the new PDFs. The following are the detailed steps.

Create a PdfDocument instance and load a sample PDF file.
Create a new PDF document, and then declare a PdfPageBase pointer to hold each new page.
Iterate through the first several pages in the sample PDF file.
Create a new page with specified size and margins in the new PDF document.
Get the specified page in the sample PDF using PdfDocument->GetPages()->GetItem() method, and then draw the contents of the specified page onto the new page using PdfPageBase->CreateTemplate()->Draw() method.
Save the first new PDF document using PdfDocument->SaveToFile() method.
Create another new PDF document and then draw the remaining pages of the sample PDF file into it.
Save the second new PDF document.

C++

#include "Spire.Pdf.o.h"

using namespace Spire::Pdf;

int main()
{
	//Create a PdfDocument instance and load a sample PDF file
	intrusive_ptr<PdfDocument> oldPdf = new PdfDocument();
	oldPdf->LoadFromFile(L"Data\\template.pdf");

	//Create a new PDF document
	intrusive_ptr<PdfDocument> newPdf1 = new PdfDocument();

	//Declare a pointer that will hold each newly added page
	intrusive_ptr<PdfPageBase> page;

	//Draw the first three pages of the sample file into the new PDF document
	for (int i = 0; i < 3; i++)
	{
		//Create a new page with specified size and margin in the new PDF document
		intrusive_ptr<PdfMargins> tempVar = new PdfMargins(0);
		page = newPdf1->GetPages()->Add(oldPdf->GetPages()->GetItem(i)->GetSize(), tempVar);

		//Draw the contents of a specified page in the sample file onto the new page
		oldPdf->GetPages()->GetItem(i)->CreateTemplate()->Draw(page, new PointF(0, 0));
	}

	//Save the first PDF document
	newPdf1->SaveToFile(L"SplitByRange1.pdf");
	newPdf1->Close();

	//Create another new PDF document
	intrusive_ptr<PdfDocument> newPdf2 = new PdfDocument();

	//Draw the remaining pages of the sample file into the new PDF document
	for (int i = 3; i < oldPdf->GetPages()->GetCount(); i++)
	{
		//Create a new page with specified size and margin in the new PDF document
		intrusive_ptr<PdfMargins> tempVar = new PdfMargins(0);
		page = newPdf2->GetPages()->Add(oldPdf->GetPages()->GetItem(i)->GetSize(), tempVar);

		// Draw the contents of a specified page in the sample file onto the new page
		oldPdf->GetPages()->GetItem(i)->CreateTemplate()->Draw(page, new PointF(0, 0));
	}

	//Save the second PDF document
	newPdf2->SaveToFile(L"SplitByRange2.pdf");
	newPdf2->Close();
}
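
The listing above hard-codes two ranges (pages 0 to 2, then the rest). To split into more than two files, the page ranges can be computed first and each range drawn into its own document. The Java sketch below covers only the range calculation; the class name PageRanges and the fixed pagesPerFile parameter are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PageRanges {
    // Returns {startInclusive, endExclusive} index pairs that cover pages 0..pageCount-1.
    static List<int[]> chunk(int pageCount, int pagesPerFile) {
        if (pagesPerFile <= 0) {
            throw new IllegalArgumentException("pagesPerFile must be positive");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 0; start < pageCount; start += pagesPerFile) {
            ranges.add(new int[] { start, Math.min(start + pagesPerFile, pageCount) });
        }
        return ranges;
    }

    public static void main(String[] args) {
        // An eight-page document split into files of at most three pages
        for (int[] r : chunk(8, 3)) {
            System.out.println("pages " + r[0] + " to " + (r[1] - 1));
        }
    }
}
```

Each pair corresponds to one loop like those in the listing, with the pair's two values taking the place of the fixed loop bounds.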

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Document Operation


Wednesday, 31 May 2023 00:53

C++: Create a PDF Document from Scratch

PDF documents generated through code are consistent in terms of formatting, layout, and content, ensuring a professional look. Automating the creation of PDF documents reduces the time and effort required to produce them manually. Nowadays, most invoices, receipts and other financial documents are generated programmatically. In this article, you will learn how to create PDF documents from scratch in C++ using Spire.PDF for C++.

Install Spire.PDF for C++

There are two ways to integrate Spire.PDF for C++ into your application. One way is to install it through NuGet, and the other way is to download the package from our website and copy the libraries into your program. Installation via NuGet is simpler and is the recommended approach. You can find more details at the following link.

Integrate Spire.PDF for C++ in a C++ Application

Background Knowledge

A page in Spire.PDF for C++ (represented by the PdfPageBase class) consists of a client area surrounded by margins. The client area is where content is drawn, and the margins are usually left blank.

As shown in the figure below, the origin of the coordinate system on the page is located at the top left corner of the client area, with the x-axis extending horizontally to the right and the y-axis extending vertically down. All elements added to the client area must be based on the specified coordinates.

C++: Create a PDF Document from Scratch
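
The relationship between page size, margins and client area can be checked with simple arithmetic. The Java sketch below assumes an A4 page of 595 x 842 points (a commonly used rounded value, taken here as an assumption) and a uniform margin; the class and method names are hypothetical. It also computes the x-coordinate at which centre-aligned text would be anchored.

```java
class ClientArea {
    // Returns {width, height} of the client area for a page with a uniform margin.
    static float[] clientSize(float pageWidth, float pageHeight, float margin) {
        return new float[] { pageWidth - 2 * margin, pageHeight - 2 * margin };
    }

    public static void main(String[] args) {
        // Assumed A4 size in points, with a 35-point margin on every side
        float[] client = clientSize(595f, 842f, 35f);
        float centerX = client[0] / 2; // anchor x for centre-aligned text
        System.out.println("client area: " + client[0] + " x " + client[1]);
        System.out.println("horizontal centre: " + centerX);
    }
}
```

Because the origin sits at the top left corner of the client area, coordinates computed this way are already relative to the margins and need no further offset.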

In addition, the following table lists the important classes and methods, which can help you easily understand the code snippet provided in the following section.

Member	Description
PdfDocument class	Represents a PDF document model.
PdfPageBase class	Represents a page in a PDF document.
PdfSolidBrush class	Represents a brush that fills any object with a solid color.
PdfTrueTypeFont class	Represents a true type font.
PdfStringFormat class	Represents text format information, such as alignment, characters spacing and indent.
PdfTextWidget class	Represents the text area with the ability to span several pages.
PdfTextLayout class	Represents the text layout information.
PdfDocument->GetPages()->Add() method	Adds a page to a PDF document.
PdfPageBase->GetCanvas()->DrawString() method	Draws string on a page at the specified location with specified font and brush objects.
PdfLayoutWidget->Draw() method	Draws widget on a page at the specified location.
PdfDocument->SaveToFile() method	Saves the document to a PDF file.

Create a PDF Document from Scratch in C++

Spire.PDF for C++ lets you add many kinds of elements to a PDF document, but this article keeps things simple and creates a document that contains only plain text. The following are the detailed steps.

Create a PdfDocument object.
Add a page using PdfDocument->GetPages()->Add() method.
Create brush and font objects.
Draw string on the page at a specified coordinate using PdfPageBase->GetCanvas()->DrawString() method.
Create a PdfTextWidget object to hold a chunk of text.
Convert the text widget to an object of PdfLayoutWidget class and draw it on the page using PdfLayoutWidget->Draw() method.
Save the document to a PDF file using PdfDocument->SaveToFile() method.

C++

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include "Spire.Pdf.o.h"

using namespace Spire::Pdf;
using namespace std;

wstring readFileIntoWstring(const string& path) {

    ifstream input_file(path);
    if (!input_file.is_open()) {
        cerr << "Could not open the file - '"
            << path << "'" << endl;
        exit(EXIT_FAILURE);
    }
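    //Read the whole file into a narrow string
    string s1 = string((std::istreambuf_iterator<char>(input_file)), std::istreambuf_iterator<char>());
    //Widen each byte to a wide character; this is only correct for ASCII text
    wstring ws(s1.begin(), s1.end());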
    return ws;
}

int main() {

    //Create a PdfDocument object
    intrusive_ptr<PdfDocument> doc = new PdfDocument();

    //Add a page
    intrusive_ptr<PdfPageBase> page = doc->GetPages()->Add(PdfPageSize::A4(), new PdfMargins(35));

    //Specify title text
    wstring titleText = L"What is MySQL";

    //Create solid brushes
    intrusive_ptr<PdfSolidBrush> titleBrush = new PdfSolidBrush(new PdfRGBColor(Color::GetPurple()));
    intrusive_ptr<PdfSolidBrush> paraBrush = new PdfSolidBrush(new PdfRGBColor(Color::GetBlack()));

    //Create true type fonts
    intrusive_ptr<PdfTrueTypeFont> titleFont = new PdfTrueTypeFont(L"Times New Roman", 18, PdfFontStyle::Bold, true);
    intrusive_ptr<PdfTrueTypeFont> paraFont = new PdfTrueTypeFont(L"Times New Roman", 12, PdfFontStyle::Regular, true);

    //Set the text alignment via PdfStringFormat class
    intrusive_ptr<PdfStringFormat> format = new PdfStringFormat();
    format->SetAlignment(PdfTextAlignment::Center);

    //Draw title on the page
    page->GetCanvas()->DrawString(titleText.c_str(), titleFont, titleBrush, page->GetClientSize()->GetWidth() / 2, 20, format);  

    //Get paragraph text from a .txt file
    wstring paraText = readFileIntoWstring("C:\\Users\\Administrator\\Desktop\\content.txt");

    //Create a PdfTextWidget object to hold the paragraph content
    intrusive_ptr<PdfTextWidget> widget = new PdfTextWidget(paraText.c_str(), paraFont, paraBrush);

    //Create a rectangle where the paragraph content will be placed
    intrusive_ptr<RectangleF> rect = new RectangleF(0, 50, (float)page->GetClientSize()->GetWidth(), (float)page->GetClientSize()->GetHeight());

    //Set the PdfLayoutType to Paginate to make the content paginated automatically
    intrusive_ptr<PdfTextLayout> layout = new PdfTextLayout();
    layout->SetLayout(PdfLayoutType::Paginate);

    //Draw paragraph text on the page
    Object::Convert<PdfLayoutWidget>(widget)->Draw(page, rect, layout);

    //Save to file
    doc->SaveToFile(L"output/CreatePdfDocument.pdf");
    doc->Dispose();
}

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Document Operation


Tuesday, 16 May 2023 01:22

C++: Insert, Replace or Delete Images in PDF

Inserting images into a document is a great way to enhance its visual appearance and make it more understandable for readers. For example, if you are creating a product guide for a piece of furniture that has complex assembly steps, including images of each step can help users understand how to assemble the product quickly. In this article, you will learn how to insert images into PDF documents along with how to replace and delete images in PDF documents in C++ using Spire.PDF for C++.

Insert an Image into a PDF Document
Replace an Image with Another Image in a PDF Document
Delete an Image from a PDF Document

Install Spire.PDF for C++

There are two ways to integrate Spire.PDF for C++ into your application. One way is to install it through NuGet, and the other way is to download the package from our website and copy the libraries into your program. Installation via NuGet is simpler and is the recommended approach. You can find more details at the following link.

Integrate Spire.PDF for C++ in a C++ Application

Insert an Image into a PDF Document in C++

Spire.PDF for C++ offers the PdfPageBase->GetCanvas()->DrawImage(intrusive_ptr<PdfImage> image, float x, float y, float width, float height) method to add an image to a specific page in a PDF document. The detailed steps are as follows:

Initialize an instance of the PdfDocument class.
Load a PDF document using the PdfDocument->LoadFromFile(LPCWSTR_S filename) method.
Get a specific page of the PDF document using the PdfDocument->GetPages()->GetItem(int index) method.
Initialize an instance of the PdfImage class.
Load an image using the PdfImage->FromFile(LPCWSTR_S filename) method.
Draw the image to a specific location on the page using the PdfPageBase->GetCanvas()->DrawImage(intrusive_ptr<PdfImage> image, float x, float y, float width, float height) method.
Save the result document using the PdfDocument->SaveToFile(LPCWSTR_S filename) method.

C++

#include "Spire.Pdf.o.h"

using namespace Spire::Pdf;

int main()
{
    //Initialize an instance of the PdfDocument class
    intrusive_ptr<PdfDocument> pdf = new PdfDocument();
    //Load a PDF file
    pdf->LoadFromFile(L"Input.pdf");

    //Get the first page of the PDF file
    intrusive_ptr<PdfPageBase> page = pdf->GetPages()->GetItem(0);

    //Initialize an instance of the PdfImage class
    intrusive_ptr<PdfImage> image = new PdfImage();
    //Load an image
    image = image->FromFile(L"PDF-CPP.png");
    float width = image->GetWidth() * 0.5;
    float height = image->GetHeight() * 0.5;
    float x = (page->GetCanvas()->GetClientSize()->GetWidth() - width) / 2;

    //Draw the image to a specific location on the page
    page->GetCanvas()->DrawImage(image, x, 120, width, height);

    //Save the result file
    pdf->SaveToFile(L"AddImage.pdf");
    pdf->Close();
}
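
The sample above halves the image size and centers it horizontally by subtracting the scaled width from the canvas width and dividing by two. As a minimal sketch of that arithmetic on its own, the following uses a hypothetical Placement struct and CenterHorizontally function (neither is part of Spire.PDF), so the calculation can be checked without loading a document:

```cpp
#include <cassert>

// Hypothetical helper type for illustration; not part of Spire.PDF.
struct Placement {
    float x;       // left edge of the image on the page
    float width;   // scaled image width
    float height;  // scaled image height
};

// Scales an image by `scale` and centers it horizontally within an area
// that is `pageWidth` units wide. All values share the same unit.
Placement CenterHorizontally(float pageWidth, float imageWidth,
                             float imageHeight, float scale)
{
    assert(scale > 0.0f);  // a zero or negative scale makes no sense here
    const float width = imageWidth * scale;
    const float height = imageHeight * scale;
    const float x = (pageWidth - width) / 2.0f;
    return Placement{ x, width, height };
}
```

For a canvas 500 units wide and a 200 x 100 image scaled by 0.5, this gives a 100 x 50 image placed at x = 200. If the scaled image is wider than the canvas, x becomes negative, so you may want to reduce the scale or clamp x to zero in that case.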


Replace an Image with Another Image in a PDF Document in C++

You can use the PdfImageHelper->ReplaceImage(intrusive_ptr<Utilities_PdfImageInfo> imageInfo, intrusive_ptr<PdfImage> image) method to replace an existing image on a PDF page with another image. The detailed steps are as follows:

Initialize an instance of the PdfDocument class.
Load a PDF document using the PdfDocument->LoadFromFile(LPCWSTR_S filename) method.
Get a specific page of the PDF document using the PdfDocument->GetPages()->GetItem(int index) method.
Create an instance of the PdfImageHelper class.
Get images of the page using the PdfImageHelper->GetImagesInfo(intrusive_ptr<PdfPageBase> page) method.
Initialize an instance of the PdfImage class.
Load an image using the PdfImage->FromFile(LPCWSTR_S filename) method.
Replace a specific image on the page with the loaded image using the PdfImageHelper->ReplaceImage(intrusive_ptr<Utilities_PdfImageInfo> imageInfo, intrusive_ptr<PdfImage> image) method.
Save the result document using the PdfDocument->SaveToFile(LPCWSTR_S filename) method.

C++

#include "Spire.Pdf.o.h"

using namespace Spire::Pdf;

int main()
{
	//Initialize an instance of the PdfDocument class
	intrusive_ptr<PdfDocument> pdf = new PdfDocument();

	//Load a PDF file
	pdf->LoadFromFile(L"AddImage.pdf");

	//Get the first page of the PDF file
	intrusive_ptr<PdfPageBase> page = pdf->GetPages()->GetItem(0);

	//Create a PdfImageHelper object
	intrusive_ptr<PdfImageHelper> imagehelper = new PdfImageHelper();

	//Get image information from the page
	std::vector<intrusive_ptr<Utilities_PdfImageInfo>> imageInfo = imagehelper->GetImagesInfo(page);

	//Initialize an instance of the PdfImage class
	intrusive_ptr<PdfImage> image = new PdfImage();

	//Load an image
	image = image->FromFile(L"PDF-java.png");

	//Replace the first image on the first page
	imagehelper->ReplaceImage(imageInfo[0], image);

	//Save the result file
	pdf->SaveToFile(L"ReplaceImage.pdf");
	pdf->Close();
}
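
The sample above indexes imageInfo[0] directly, which assumes the page contains at least one image. Indexing an empty std::vector is undefined behavior, so it may be worth checking the index first. A minimal sketch of such a guard follows; TryGetAt is a hypothetical helper written for illustration and is not part of Spire.PDF:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Spire.PDF: returns a pointer to the
// element at `index`, or nullptr when the index is out of range.
template <typename T>
const T* TryGetAt(const std::vector<T>& items, std::size_t index)
{
    return index < items.size() ? &items[index] : nullptr;
}
```

Assuming GetImagesInfo returns a std::vector as in the sample, the call could then be guarded along the lines of `if (const auto* info = TryGetAt(imageInfo, 0)) imagehelper->ReplaceImage(*info, image);`, skipping the replacement when the page has no images.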


Delete an Image from a PDF Document in C++

You can use the PdfImageHelper->DeleteImage(intrusive_ptr<Utilities_PdfImageInfo> imageInfo) method to delete a specific image from a PDF page. The detailed steps are as follows:

Initialize an instance of the PdfDocument class.
Load a PDF document using the PdfDocument->LoadFromFile(LPCWSTR_S filename) method.
Get a specific page of the PDF document using the PdfDocument->GetPages()->GetItem(int index) method.
Create an instance of the PdfImageHelper class.
Get images of the page using the PdfImageHelper->GetImagesInfo(intrusive_ptr<PdfPageBase> page) method.
Delete a specific image on the page using the PdfImageHelper->DeleteImage(intrusive_ptr<Utilities_PdfImageInfo> imageInfo) method.
Save the result document using the PdfDocument->SaveToFile(LPCWSTR_S filename) method.

C++

#include "Spire.Pdf.o.h"

using namespace Spire::Pdf;

int main()
{
	//Initialize an instance of the PdfDocument class
	intrusive_ptr<PdfDocument> pdf = new PdfDocument();

	//Load a PDF file
	pdf->LoadFromFile(L"AddImage.pdf");

	//Get the first page of the PDF file
	intrusive_ptr<PdfPageBase> page = pdf->GetPages()->GetItem(0);

	//Create a PdfImageHelper object
	intrusive_ptr<PdfImageHelper> imagehelper = new PdfImageHelper();

	//Get image information from the page
	std::vector<intrusive_ptr<Utilities_PdfImageInfo>> imageInfo = imagehelper->GetImagesInfo(page);

	//Delete the first image on the first page
	imagehelper->DeleteImage(imageInfo[0]);

	//Save the result file
	pdf->SaveToFile(L"DeleteImage.pdf");
	pdf->Close();
}


Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to lift the function limitations, please request a 30-day trial license.

Published in Image
