Spire.Office Knowledgebase Page 28

Knowledgebase (2311)

Children categories

Spire.OfficeJs (3)

View items...

Python: Convert XML to Excel and XML to PDF

2024-11-22 02:14:35 Written by Koohji

XML is widely used for storing and exchanging data due to its flexibility and self-descriptive nature. However, many users find it challenging to work with XML files directly. Excel, on the other hand, is a familiar and user-friendly tool for data analysis. Converting XML to Excel not only makes data more accessible but also enhances its usability for various applications.

In this article, you will learn how to convert XML to Excel as well as XML to PDF in Python using Spire.XLS for Python.

Convert XML to Excel in Python
Convert XML to PDF in Python

Install Spire.XLS for Python

This scenario requires Spire.XLS for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

Package Manager

pip install Spire.XLS

If you are unsure how to install, please refer to this tutorial: How to Install Spire.XLS for Python on Windows

Understanding XML Structure: Elements, Attributes, and Data

Before converting XML to Excel, it's crucial to understand the structure of XML files. XML is a markup language that uses tags to define elements, attributes, and data. Here’s a breakdown of these components:

Elements: These are the building blocks of XML. They are defined by start and end tags and can contain data or other elements.

Package Manager

<person>
    <name>John Doe</name>
    <age>30</age>
</person>

Attributes: These provide additional information about elements. They are specified within the start tag of an element.

Package Manager

<person id="1">
    <name>John Doe</name>
    <age>30</age>
</person>

Data: This is the content enclosed within the start and end tags of an element.

Understanding these components will help you map XML data to Excel effectively.

Convert XML to Excel in Python

To load an XML file into Python, you can use the xml.etree.ElementTree library, which is included in Python's standard library. This library provides methods to navigate and manipulate the XML tree. Here is an example:

Package Manager

import xml.etree.ElementTree as ET

# Load the XML file
tree = ET.parse('data.xml')
root = tree.getroot()

# Iterate through elements
for person in root.findall('person'):
    name = person.find('name').text
    age = person.find('age').text

After parsing the XML data, the next step is to map it to an Excel worksheet. You can utilize Spire.XLS for Python to create a new workbook, input data into specific cells, and apply styles and formatting to the worksheet. These formatting options include auto-fitting column widths, adjusting text alignment and making the header bold.

To convert XML to Excel in Python, follow these steps:

Use the xml.etree.ElementTree library to retrieve data from an XML file.
Create a Workbook object.
Add a worksheet using the Workbook.Worksheets.Add() method.
Write data extracted from the XML file into the cells of the worksheet using the Worksheet.SetValue() method.
Apply styles and formatting to enhance the worksheet appearance.
Save the workbook to an Excel file.

The following code provides a more intelligent and advanced way to read data from XML and import it into an Excel file.

Python

from spire.xls import *
from spire.xls.common import *
import xml.etree.ElementTree as ET

# Create a Workbook object
workbook = Workbook()

# Remove default worksheets
workbook.Worksheets.Clear()

# Add a worksheet and name it
worksheet = workbook.Worksheets.Add("Books")

# Load an XML file
xml_tree = ET.parse("C:\\Users\\Administrator\\Desktop\\Books.xml")

# Get the root element of the XML tree
xml_root = xml_tree.getroot()

# Get the first the "book" element
first_book = xml_root.find("book")

# Extract header information and convert it into a list
header = list(first_book.iter())[1:]  

# Write header to Excel
for col_index, header_node in enumerate(header, start=1):
    header_text = header_node.tag
    worksheet.SetValue(1, col_index, header_text)

# Write other data to Excel by iterating over each book element and each data node within it
row_index = 2
for book in xml_root.iter("book"):
    for col_index, data_node in enumerate(list(book.iter())[1:], start=1):  
        value = data_node.text
        header_text = list(header[col_index - 1].iter())[0].tag
        worksheet.SetValue(row_index, col_index, value)
    row_index += 1

# Set column width
worksheet.AllocatedRange.AutoFitColumns()

# Set alignment
worksheet.AllocatedRange.HorizontalAlignment = HorizontalAlignType.Left

# Set font style
worksheet.Range["A1:F1"].Style.Font.IsBold = True

# Save the workbook to an Excel file
workbook.SaveToFile("output/XmlToExcel.xlsx")

# Dispose resources
workbook.Dispose()

Convert XML to Excel in Python

Convert XML to PDF in Python

The previous example successfully imports data from an XML file into an Excel worksheet. You can convert this worksheet to a PDF using the Worksheet.SaveToPdf() method. To create a well-structured PDF, consider adjusting page layout settings, such as margins and gridline preservation, during the conversion process.

The steps to convert XML to PDF using Python are as follows:

Use the xml.etree.ElementTree library to retrieve data from an XML file.
Create a Workbook object.
Add a worksheet using the Workbook.Worksheets.Add() method.
Write data extracted from the XML file into the cells of the worksheet using the Worksheet.SetValue() method.
Apply styles and formatting to enhance the worksheet appearance.
Configure page settings using the properties under the PageSetup object, which is returned by the Worksheet.PageSetup property.
Save the worksheet as a PDF file using the Worksheet.SaveToPdf() method.

The following code snippet demonstrates how to import data from XML into a worksheet and save that worksheet as a PDF file.

Python

from spire.xls import *
from spire.xls.common import *
import xml.etree.ElementTree as ET

# Create a Workbook object
workbook = Workbook()

# Remove default worksheets
workbook.Worksheets.Clear()

# Add a worksheet and name it
worksheet = workbook.Worksheets.Add("Books")

# Load an XML file
xml_tree = ET.parse("C:\\Users\\Administrator\\Desktop\\Books.xml")

# Get the root element of the XML tree
xml_root = xml_tree.getroot()

# Get the first the "book" element
first_book = xml_root.find("book")

# Extract header information and convert it into a list
header = list(first_book.iter())[1:]  

# Write header to Excel
for col_index, header_node in enumerate(header, start=1):
    header_text = header_node.tag
    worksheet.SetValue(1, col_index, header_text)

# Write other data to Excel by iterating over each book element and each data node within it
row_index = 2
for book in xml_root.iter("book"):
    for col_index, data_node in enumerate(list(book.iter())[1:], start=1):  
        value = data_node.text
        header_text = list(header[col_index - 1].iter())[0].tag
        worksheet.SetValue(row_index, col_index, value)
    row_index += 1

# Set column width
worksheet.AllocatedRange.AutoFitColumns()

# Set alignment
worksheet.AllocatedRange.HorizontalAlignment = HorizontalAlignType.Left

# Set font style
worksheet.Range["A1:F1"].Style.Font.IsBold = True

# Fit worksheet on one page
workbook.ConverterSetting.SheetFitToPage = True

# Get the PageSetup object
pageSetup = worksheet.PageSetup

# Set page margins
pageSetup.TopMargin = 0.3
pageSetup.BottomMargin = 0.3
pageSetup.LeftMargin = 0.3
pageSetup.RightMargin = 0.3

# Preserve gridlines 
pageSetup.IsPrintGridlines = True

# Save the worksheet to a PDF file
worksheet.SaveToPdf("output/XmlToPdf.pdf")

# Dispose resources
workbook.Dispose()

Convert XML to PDF in Python

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Conversion

Tagged under

xls Python Conversion

Java: Convert RTF to HTML, Image

2024-11-22 01:48:04 Written by Koohji

Converting RTF to HTML helps improve accessibility as HTML documents can be easily displayed in web browsers, making them accessible to a global audience. While converting RTF to images can help preserve document layout as images can accurately represent the original document, including fonts, colors, and graphics. In this article, you will learn how to convert RTF to HTML or images in Java using Spire.Doc for Java.

Convert RTF to HTML in Java
Convert RTF to Image in Java

Install Spire.Doc for Java

First of all, you're required to add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.

Package Manager

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>14.1.3</version>
    </dependency>
</dependencies>

Convert RTF to HTML in Java

Converting RTF to HTML ensures that the document can be easily viewed and edited in any modern web browser without requiring any additional software.

With Spire.Doc for Java, you can achieve RTF to HTML conversion through the Document.saveToFile(String fileName, FileFormat.Html) method. The following are the detailed steps.

Create a Document instance.
Load an RTF document using Document.loadFromFile() method.
Save the RTF document in HTML format using Document.saveToFile(String fileName, FileFormat.Html) method.

Java

import com.spire.doc.*;

public class RTFToHTML {
    public static void main(String[] args) {
        // Create a Document instance
        Document document = new Document();

        // Load an RTF document
        document.loadFromFile("input.rtf", FileFormat.Rtf);

        // Save as HTML format
        document.saveToFile("RtfToHtml.html", FileFormat.Html);
        document.dispose();
    }
}

Convert a Word document to an HTML file

Convert RTF to Image in Java

To convert RTF to images, you can use the Document.saveToImages() method to convert an RTF file into individual Bitmap or Metafile images. Then, the Bitmap or Metafile images can be saved as a BMP, EMF, JPEG, PNG, GIF, or WMF format files. The following are the detailed steps.

Create a Document object.
Load an RTF document using Document.loadFromFile() method.
Convert the document to images using Document.saveToImages() method.
Iterate through the converted image, and then save each as a PNG file.

Java

import com.spire.doc.*;
import com.spire.doc.documents.*;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;

public class RTFtoImage {
    public static void main(String[] args) throws Exception{
        // Create a Document instance
        Document document = new Document();

        // Load an RTF document
        document.loadFromFile("input.rtf", FileFormat.Rtf);

        // Convert the RTF document to images
        BufferedImage[] images = document.saveToImages(ImageType.Bitmap);

        // Iterate through the image collection
        for (int i = 0; i < images.length; i++) {

            // Get the specific image
            BufferedImage image = images[i];

            // Save the image as png format
            File file = new File("Images\\" + String.format(("Image-%d.png"), i));
            ImageIO.write(image, "PNG", file);
        }
    }
}

Convert all pages in a Word document to multiple PNG images

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Conversion

Tagged under

doc java Conversion

Python: Split PowerPoint Presentations

2024-11-21 01:03:03 Written by Administrator

Splitting a PowerPoint presentation into smaller files or individual sections can be useful in various situations. For instance, when collaborating with a team, each member may only need a specific section of the presentation to work on. Additionally, breaking a large presentation into smaller parts can simplify sharing over email or uploading to platforms with file size restrictions. In this article, we'll show you how to split PowerPoint presentations by slides, slide ranges, and sections in Python using Spire.Presentation for Python.

Split PowerPoint Presentations by Slides in Python
Split PowerPoint Presentations by Slide Ranges in Python
Split PowerPoint Presentations by Sections in Python

Install Spire.Presentation for Python

This scenario requires Spire.Presentation for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

Package Manager

pip install Spire.Presentation

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Presentation for Python on Windows

Split PowerPoint Presentations by Slides in Python

Developers can use Spire.Presentation for Python to split a PowerPoint presentation into individual slides by iterating through the slides in the presentation and adding each slide to a new presentation. The detailed steps are as follows.

Create an instance of the Presentation class.
Load a PowerPoint presentation using the Presentation.LoadFromFile() method.
Iterate through all slides in the presentation:
- Access the current slide through the Presentation.Slides[index] property.
- Create a new PowerPoint presentation using the Presentation class and remove its default slide using the Presentation.Slides.RemoveAt(0) method.
- Append the current slide to the new presentation using the Presentation.Slides.AppendBySlide() method.
- Save the new presentation as a file using the ISlide.SaveToFile() method.

Python

from spire.presentation.common import *
from spire.presentation import *

# Create an instance of the Presentation class
presentation = Presentation()
# Load a PowerPoint presentation
presentation.LoadFromFile("Sample.pptx")

# Iterate through all slides in the presentation
for i in range(presentation.Slides.Count):
    # Get the current slide
    slide = presentation.Slides[i]

    # Create a new PowerPoint presentation and remove its default slide
    newPresentation = Presentation()
    newPresentation.Slides.RemoveAt(0)

    # Append the current slide to the new presentation
    newPresentation.Slides.AppendBySlide(slide)

    # Save the new presentation as a file
    newPresentation.SaveToFile(f"output/Presentations/Slide-{i + 1}.pptx", FileFormat.Pptx2013)
    newPresentation.Dispose()

presentation.Dispose()

Split PowerPoint Presentations by Slides in Python

Split PowerPoint Presentations by Slide Ranges in Python

Apart from splitting a PowerPoint presentation into individual slides, developers can also divide it into specific ranges of slides by adding the desired slides to new presentations. The detailed steps are as follows.

Create an instance of the Presentation class.
Load a PowerPoint presentation using the Presentation.LoadFromFile() method.
Create new PowerPoint presentations using the Presentation class and remove the default slides within them using the Presentation.Slides.RemoveAt(0) method.
Append specified ranges of slides to the new presentations using the Presentation.Slides.AppendBySlide() method.
Save the new presentations as files using the Presentation.SaveToFile() method.

Python

from spire.presentation.common import *
from spire.presentation import *

# Create an instance of the Presentation class
presentation = Presentation()
# Load a PowerPoint presentation
presentation.LoadFromFile("Sample.pptx")

# Create two new PowerPoint presentations and remove their default slides
presentation1 = Presentation()
presentation2 = Presentation()
presentation1.Slides.RemoveAt(0)
presentation2.Slides.RemoveAt(0)

# Append slides 1-3 to the first new presentation
for i in range(3):
    presentation1.Slides.AppendBySlide(presentation.Slides[i])
# Append the remaining slides to the second new presentation
for i in range(3, presentation.Slides.Count):
    presentation2.Slides.AppendBySlide(presentation.Slides[i])

# Save the new presentations as files
presentation1.SaveToFile("output/Presentations/SlideRange1.pptx", FileFormat.Pptx2013)
presentation2.SaveToFile("output/Presentations/SlideRange2.pptx", FileFormat.Pptx2013)

presentation1.Dispose()
presentation2.Dispose()
presentation.Dispose()

Split PowerPoint Presentations by Slide Ranges in Python

Split PowerPoint Presentations by Sections in Python

Sections in PowerPoint are often used to organize slides into manageable groups. With Spire.Presentation for Python, developers can split a PowerPoint presentation into sections by iterating through the sections in the presentation and adding the slides within each section to a new presentation. The detailed steps are as follows.

Create an instance of the Presentation class.
Load a PowerPoint presentation using the Presentation.LoadFromFile() method.
Iterate through all sections in the presentation:
- Access the current section through the Presentation.SectionList[] property.
- Create a new PowerPoint presentation using the Presentation class and remove its default slide using the Presentation.Slides.RemoveAt(0) method.
- Add a section to the new presentation with the same name using the Presentation.SectionList.Append() method.
- Retrieve the slides of the current section using the Section.GetSlides() method.
- Iterate through the retrieved slides and add them to the section of the new presentation using the Section.Insert() method.
- Save the new presentation as a file using the Presentation.SaveToFile() method.

Python

from spire.presentation.common import *
from spire.presentation import *

# Create an instance of the Presentation class
presentation = Presentation()
# Load a PowerPoint presentation
presentation.LoadFromFile("Sample.pptx")

# Iterate through all sections
for i in range(presentation.SectionList.Count):
    # Get the current section
    section = presentation.SectionList.get_Item(0)

    # Create a new PowerPoint presentation and remove its default slide
    newPresentation = Presentation()
    newPresentation.Slides.RemoveAt(0)
    # Add a section to the new presentation
    newSection = newPresentation.SectionList.Append(section.Name)

    # Retrieve the slides of the current section
    slides = section.GetSlides()

    # Insert each retrieved slide into the section of the new presentation
    for slide_index, slide in enumerate(slides):
        newSection.Insert(slide_index, slide)

    # Save the new presentation as a file
    newPresentation.SaveToFile(f"output/Presentations/Section-{i + 1}.pptx", FileFormat.Pptx2019)
    newPresentation.Dispose()

presentation.Dispose()

Split PowerPoint Presentations by Sections in Python

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Document Operation

Tagged under

ppt Python Document Operation

News Category

Knowledgebase (2311)

Children categories

Purchase (7)

Licensing (7)

Benchmark (1)

Java (481)

.NET (1317)

Cloud (13)

CPP (71)

Python (355)

AI (4)

JavaScript (51)

Spire.OfficeJs (3)

Python: Convert XML to Excel and XML to PDF

Install Spire.XLS for Python

Understanding XML Structure: Elements, Attributes, and Data

Convert XML to Excel in Python

Convert XML to PDF in Python

Apply for a Temporary License

Java: Convert RTF to HTML, Image

Install Spire.Doc for Java

Convert RTF to HTML in Java

Convert RTF to Image in Java

Apply for a Temporary License

Python: Split PowerPoint Presentations

Install Spire.Presentation for Python

Split PowerPoint Presentations by Slides in Python

Split PowerPoint Presentations by Slide Ranges in Python

Split PowerPoint Presentations by Sections in Python

Apply for a Temporary License

More...

C#: Modify or Extract Comments in PowerPoint

Python: Convert Excel Data to Word Table with Formatting

Python: Extract Hyperlinks from Word Documents

C#: Insert, Retrieve, Reorder and Remove Slides in PowerPoint Sections