Comments
Comments

Comments (2)

Python: Extract Comments from Word

2024-06-07 07:59:40 Written by Koohji

Comments in Word documents are often used for collaborative review and feedback purposes. They may contain text and images that provide valuable information to guide document improvements. Extracting the text and images from comments allows you to analyze and evaluate the feedback provided by reviewers, helping you gain a comprehensive understanding of the strengths, weaknesses, and suggestions related to the document. In this article, we will demonstrate how to extract text and images from Word comments in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Extract Text from Word Comments in Python

You can easily retrieve the author and text of a Word comment using the Comment.Format.Author and Comment.Body.Paragraphs[index].Text properties provided by Spire.Doc for Python. The detailed steps are as follows.

  • Create an object of the Document class.
  • Load a Word document using the Document.LoadFromFile() method.
  • Create a list to store the extracted comment data.
  • Iterate through the comments in the document.
  • For each comment, iterate through the paragraphs of the comment body.
  • For each paragraph, get the text using the Comment.Body.Paragraphs[index].Text property.
  • Get the author of the comment using the Comment.Format.Author property.
  • Add the text and author of the comment to the list.
  • Save the content of the list to a text file.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of the Document class
document = Document()
# Load a Word document containing comments
document.LoadFromFile("Comments.docx")

# Create a list to store the extracted comment data
comments = []

# Iterate through the comments in the document
for i in range(document.Comments.Count):
    comment = document.Comments[i]
    comment_text = ""

    # Iterate through the paragraphs in the comment body
    for j in range(comment.Body.Paragraphs.Count):
        paragraph = comment.Body.Paragraphs[j]
        comment_text += paragraph.Text + "\n"

    # Get the comment author
    comment_author = comment.Format.Author

    # Append the comment data to the list
    comments.append({
        "author": comment_author,
        "text": comment_text
    })

# Write the comment data to a file
with open("comment_data.txt", "w", encoding="utf-8") as file:
    for comment in comments:
        file.write(f"Author: {comment['author']}\nText: {comment['text']}\n\n")

Python: Extract Comments from Word

Extract Images from Word Comments in Python

To extract images from Word comments, you need to iterate through the child objects in the paragraphs of the comments to find the DocPicture objects, then get the image data using DocPicture.ImageBytes property, finally save the image data to image files.

  • Create an object of the Document class.
  • Load a Word document using the Document.LoadFromFile() method.
  • Create a list to store the extracted image data.
  • Iterate through the comments in the document.
  • For each comment, iterate through the paragraphs of the comment body.
  • For each paragraph, iterate through the child objects of the paragraph.
  • Check if the object is a DocPicture object.
  • If the object is a DocPicture, get the image data using the DocPicture.ImageBytes property and add it to the list.
  • Save the image data in the list to individual image files.
  • Python
from spire.doc import *
from spire.doc.common import *
 
# Create an object of the Document class
document = Document()
# Load a Word document containing comments
document.LoadFromFile("Comments.docx")
 
# Create a list to store the extracted image data
images = []
 
# Iterate through the comments in the document
for i in range(document.Comments.Count):
    comment = document.Comments[i]
    # Iterate through the paragraphs in the comment body
    for j in range(comment.Body.Paragraphs.Count):
        paragraph = comment.Body.Paragraphs[j]
        # Iterate through the child objects in the paragraph
        for o in range(paragraph.ChildObjects.Count):
            obj = paragraph.ChildObjects[o]
            # Find the images
            if isinstance(obj, DocPicture):
                picture = obj
                # Get the image data and add it to the list
                data_bytes = picture.ImageBytes
                images.append(data_bytes)
 
# Save the image data to image files
for i, image_data in enumerate(images):
    file_name = f"CommentImage-{i}.png"
    with open(os.path.join("CommentImages/", file_name), 'wb') as image_file:
        image_file.write(image_data)

Python: Extract Comments from Word

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Comments are invaluable tools for collaboration and feedback in Word documents, allowing users to provide insights, suggestions, clarification, and discussions. Whether the goal is to share ideas, respond to existing comments, or remove outdated feedback, mastering the efficient handling of Word document comments can significantly enhance document workflow. This article aims to show how to add, delete, or reply to comments in Word documents using Spire.Doc for Python in Python programs.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Add Comments to a Paragraph in Word Documents using Python

Spire.Doc for Python provides the Paragraph.AppendComment() method to add a comment to a specified paragraph. And the range of text corresponding to the comment needs to be controlled using the comment start marks and end marks. The detailed steps to add comments on paragraphs are as follows:

  • Create an object of Document class and load a Word document using Document.LoadFromFile() method.
  • Get the first section using Document.Sections.get_Item() method.
  • Get the first paragraph in the section using Section.Paragraphs.get_Item() method.
  • Add a comment to the paragraph using Paragraph.AppendComment() method.
  • Set the author the comment through Comment.Format.Author property.
  • Create a comment start mark and an end mark and set them as the start and end marks of the created comment though CommentMark.CommentId property.
  • Insert the comment start mark and end mark at the beginning and end of the paragraph respectively using Paragraph.ChildObjects.Insert() method.
  • Save the document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of Document class and load a Word document
doc = Document()
doc.LoadFromFile("Sample.docx")

# Get the first section
section = doc.Sections.get_Item(0)

# Get the forth paragraph
paragraph = section.Paragraphs.get_Item(4)

# Add a comment to the paragraph
comment = paragraph.AppendComment("Lack of detail on the history of the currency.")

# Set the author of the comment
comment.Format.Author = "Joe Butler"

# Create a comment start mark and an end mark and set them as the start and end marks of the created comment
commentStart = CommentMark(doc, CommentMarkType.CommentStart)
commentEnd = CommentMark(doc, CommentMarkType.CommentEnd)
commentStart.CommentId = comment.Format.CommentId
commentEnd.CommentId = comment.Format.CommentId

# Insert the comment start mark and end mark at the beginning and end of the paragraph respectively
paragraph.ChildObjects.Insert(0, commentStart)
paragraph.ChildObjects.Add(commentEnd)

# Save the document
doc.SaveToFile("output/CommentOnParagraph.docx")
doc.Close()

Python: Add, Delete, or Reply to Comments in Word Documents

Add Comments to Text in Word Documents using Python

Spire.Doc for Python also supports finding specified text and adding comments to it. The detailed steps are as follows:

  • Create an object of Document class and load a Word document using Document.LoadFromFile() method.
  • Find the text to comment on using Document.FindString() method.
  • Create an object of Comment class, set the comment content through Comment.Body.AddParagraph().Text property, and set the author of the comment through Comment.Format.Author property.
  • Get the text as one text range using TextSelection.GetAsOneRange() method and get the paragraph the text belongs to through TextRange.OwnerParagraph property.
  • Insert the comment after the found text using Paragraph.ChildObjects.Insert() method.
  • Create a comment start mark and an end mark and set them as the start and end marks of the created comment though CommentMark.CommentId property.
  • Insert the comment start mark and end mark before and after found text respectively using Paragraph.ChildObjects.Insert() method.
  • Save the document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of Document class and load a Word document
doc = Document()
doc.LoadFromFile("Sample.docx")

# Find the text to comment on
text = doc.FindString("medium of exchange", True, True)

# Create a comment and set the content and author of the comment
comment = Comment(doc)
comment.Body.AddParagraph().Text = "Fiat currency is the only legal means of payment within a country or region."
comment.Format.Author = "Linda Taylor"

# Get the found text as a text range and get the paragraph it belongs to
range = text.GetAsOneRange()
paragraph =  range.OwnerParagraph

# Add the comment to the paragraph
paragraph.ChildObjects.Insert(paragraph.ChildObjects.IndexOf(range) + 1, comment)

# Create a comment start mark and an end mark and set them as the start and end marks of the created comment
commentStart = CommentMark(doc, CommentMarkType.CommentStart)
commentEnd = CommentMark(doc, CommentMarkType.CommentEnd)
commentStart.CommentId = comment.Format.CommentId
commentEnd.CommentId = comment.Format.CommentId

# Insert the created comment start and end tags before and after the found text respectively
paragraph.ChildObjects.Insert(paragraph.ChildObjects.IndexOf(range), commentStart)
paragraph.ChildObjects.Insert(paragraph.ChildObjects.IndexOf(range) + 1, commentEnd)

# Save the document
doc.SaveToFile ("output/CommentOnText.docx")
doc.Close()

Python: Add, Delete, or Reply to Comments in Word Documents

Remove Comments from Word Documents using Python

Spire.Doc for Python provides the Document.Comments.RemoveAt() method that can be used to remove a specified comment and the Document.Clear() method that can remove all comments. The detailed steps for removing comments are as follows:

  • Create an object of Document class and load a Word document using Document.LoadFromFile() method.
  • Delete a specific comment using Document.Comments.RemoveAt() method or delete all the comments using Document.Comments.Clear() method.
  • Save the document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of Document class and load a Word document
doc = Document()
doc.LoadFromFile("Sample1.docx")

# Remove the second comment
doc.Comments.RemoveAt(1)

# Remove all comments
#doc.Comments.Clear()

# Save the document
doc.SaveToFile("output/RemoveComments.docx")
doc.Close()

Python: Add, Delete, or Reply to Comments in Word Documents

Reply to Comments in Word Documents using Python

Spire.Doc for Python allows users to reply to a comment by setting a comment as a reply to another comment using Comment.ReplyToComment(Comment) method. The detailed steps are as follows:

  • Create an object of Document class and load a Word document using Document.LoadFromFile() method.
  • Get a comment using Document.Comments.get_Item() method.
  • Create a comment and set its content and author through Comment.Body.AddParagraph().Text property and Comment.Format.Author property.
  • Set the created comment as a reply to the obtained comment using Comment.ReplyToComment() method.
  • Save the document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of Document class and load a Word document
doc = Document()
doc.LoadFromFile("output/CommentOnParagraph.docx")

# Get a comment
comment = doc.Comments.get_Item(0)

# Create a reply comment and set the content and author of it
reply = Comment(doc)
reply.Body.AddParagraph().Text = "We will give more details about the history of the currency."
reply.Format.Author = "Moris Peter"

# Set the created comment as a reply to obtained comment
comment.ReplyToComment(reply)

# Save the document
doc.SaveToFile("output/ReplyToComments.docx")
doc.Close()

Python: Add, Delete, or Reply to Comments in Word Documents

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

page