PDF forms are commonly used to collect user information, and extracting form values programmatically lets you process submitted data automatically instead of copying it by hand. Once the values are extracted, you can generate reports from them or migrate them to other systems or databases. In this article, you will learn how to extract form field values from PDF with Python using Spire.PDF for Python.
Install Spire.PDF for Python
This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be installed on Windows with the following pip command.
pip install Spire.PDF
If you are unsure how to install it, please refer to this tutorial: How to Install Spire.PDF for Python on Windows
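Because the article depends on a specific plum-dispatch version, it can help to confirm what pip actually installed before running the example. The following is a minimal sketch using only the standard library's importlib.metadata; the helper name installed_version() is hypothetical, and the distribution names are simply the ones written in this article.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as used in this article; 1.7.4 is the plum-dispatch
    # version the article states is required. None means "any version".
    for name, wanted in (("Spire.PDF", None), ("plum-dispatch", "1.7.4")):
        found = installed_version(name)
        if found is None:
            print(f"{name}: not installed")
        elif wanted and found != wanted:
            print(f"{name}: {found} installed, {wanted} expected")
        else:
            print(f"{name}: {found}")
```

If plum-dispatch reports a different version, reinstalling it with `pip install plum-dispatch==1.7.4` is one option to try.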
Extract Form Field Values from PDF with Python
Spire.PDF for Python supports various types of PDF form fields, including:
- Text box field (represented by the PdfTextBoxFieldWidget class)
- Check box field (represented by the PdfCheckBoxWidgetFieldWidget class)
- Radio button field (represented by the PdfRadioButtonListFieldWidget class)
- List box field (represented by the PdfListBoxWidgetFieldWidget class)
- Combo box field (represented by the PdfComboBoxWidgetFieldWidget class)
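The five class names above can also serve as keys in a lookup table, which may be easier to extend than a chain of isinstance checks. This is a minimal sketch and not part of the Spire.PDF API: FIELD_LABELS and field_label() are hypothetical helpers, and the sketch assumes the field objects you iterate over are instances of these classes (or their subclasses), the same assumption the isinstance checks in this article's example rely on. It matches on class names along the method resolution order, so subclasses are recognized without importing the Spire.PDF classes into the table.

```python
# Hypothetical mapping from the Spire.PDF widget class names listed above
# to human-readable labels.
FIELD_LABELS = {
    "PdfTextBoxFieldWidget": "Textbox",
    "PdfCheckBoxWidgetFieldWidget": "Checkbox",
    "PdfRadioButtonListFieldWidget": "Radio Button",
    "PdfListBoxWidgetFieldWidget": "Listbox",
    "PdfComboBoxWidgetFieldWidget": "Combobox",
}


def field_label(field):
    """Return the label of the first known class in the field's MRO, or None."""
    for cls in type(field).__mro__:
        label = FIELD_LABELS.get(cls.__name__)
        if label is not None:
            return label
    return None


if __name__ == "__main__":
    # Hypothetical stand-in classes so the sketch runs without Spire.PDF installed.
    class PdfTextBoxFieldWidget:
        pass

    class PdfComboBoxWidgetFieldWidget:
        pass

    print(field_label(PdfTextBoxFieldWidget()))         # Textbox
    print(field_label(PdfComboBoxWidgetFieldWidget()))  # Combobox
    print(field_label(object()))                        # None
```

The trade-off is that any unrelated class sharing one of these names would also match, so the isinstance approach remains the stricter option.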
Each field type exposes its value through different properties (for example, Text for a text box, SelectedValue for a list box, and Checked for a check box). You therefore need to determine the type of each form field first, and then read its value through the properties of the corresponding class. The detailed steps are as follows.
- Initialize an instance of the PdfDocument class.
- Load a PDF document using the PdfDocument.LoadFromFile() method.
- Get the form in the PDF document using the PdfDocument.Form property, and wrap it in a PdfFormWidget object to access its fields.
- Create a list to store the extracted form field values.
- Iterate through all fields in the PDF form.
- Determine the type of each form field, then get its name and value through the corresponding properties.
- Write the results to a text file.
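Keeping the extracted data separate from its text formatting makes the last step easy to swap out later, for example for a CSV file or a database. The following is a minimal sketch that assumes each field has already been reduced to a hypothetical (label, name, value) tuple; write_field_report() is an illustrative helper, not part of Spire.PDF.

```python
def write_field_report(records, path):
    """Write (label, name, value) tuples to a UTF-8 text file, one block per field."""
    lines = []
    for label, name, value in records:
        lines.append(f"{label} Name: {name}")
        lines.append(f"{label} Value: {value}")
        lines.append("")  # blank line separates consecutive fields
    # newline="\n" writes the same line endings on every platform
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("\n".join(lines))
```

For example, write_field_report([("Textbox", "FirstName", "Jane")], "GetFormFieldValues.txt") writes the lines "Textbox Name: FirstName" and "Textbox Value: Jane"; the sample values here are made up.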
from spire.pdf.common import *
from spire.pdf import *

inputFile = "Forms.pdf"
outputFile = "GetFormFieldValues.txt"

# Create a PdfDocument instance
pdf = PdfDocument()
# Load a PDF document
pdf.LoadFromFile(inputFile)

# Get the form and wrap it so that its field widgets can be accessed
pdfform = pdf.Form
formWidget = PdfFormWidget(pdfform)

# Collects the output lines before they are written to the file
sb = []

# Iterate through all fields in the form (the loop is skipped if there are none)
for i in range(formWidget.FieldsWidget.Count):
    field = formWidget.FieldsWidget.get_Item(i)

    # Get the name and value of the text box field
    if isinstance(field, PdfTextBoxFieldWidget):
        sb.append(f"Textbox Name: {field.Name}\n")
        sb.append(f"Textbox Value: {field.Text}\n\n")

    # Get the name, items and selected value of the list box field
    elif isinstance(field, PdfListBoxWidgetFieldWidget):
        sb.append(f"Listbox Name: {field.Name}\n")
        sb.append("Listbox Items:\n")
        items = field.Values
        for j in range(items.Count):
            sb.append(f"{items.get_Item(j).Value}\n")
        sb.append(f"Listbox Selected Value: {field.SelectedValue}\n\n")

    # Get the name, items and selected value of the combo box field
    elif isinstance(field, PdfComboBoxWidgetFieldWidget):
        sb.append(f"Combobox Name: {field.Name}\n")
        sb.append("Combobox Items:\n")
        items = field.Values
        for j in range(items.Count):
            sb.append(f"{items.get_Item(j).Value}\n")
        sb.append(f"Combobox Selected Value: {field.SelectedValue}\n\n")

    # Get the name and selected value of the radio button field
    elif isinstance(field, PdfRadioButtonListFieldWidget):
        sb.append(f"Radio Button Name: {field.Name}\n")
        sb.append(f"Radio Button Selected Value: {field.SelectedValue}\n\n")

    # Get the name and checked state of the check box field
    elif isinstance(field, PdfCheckBoxWidgetFieldWidget):
        stateValue = "Yes" if field.Checked else "No"
        sb.append(f"Checkbox Name: {field.Name}\n")
        sb.append(f"Checkbox Checked: {stateValue}\n\n")

# Write the results to a text file (closed automatically by the with block)
with open(outputFile, "w", encoding="utf-8") as f:
    f.writelines(sb)

pdf.Close()
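To migrate the extracted values into a database instead of a text file, one option (independent of Spire.PDF) is Python's built-in sqlite3 module. The following is a minimal sketch that assumes you collect each field as a (source_file, field_name, field_type, value) tuple while iterating over the form; save_field_values() and the form_values table with its columns are hypothetical names chosen for illustration.

```python
import sqlite3


def save_field_values(conn, rows):
    """Insert (source_file, field_name, field_type, value) rows into form_values."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS form_values (
               source_file TEXT NOT NULL,
               field_name  TEXT NOT NULL,
               field_type  TEXT NOT NULL,
               value       TEXT
           )"""
    )
    # Parameterized inserts avoid quoting problems with user-entered text.
    conn.executemany("INSERT INTO form_values VALUES (?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # pass a file path for a persistent database
    # Hypothetical sample rows standing in for values read from Forms.pdf
    save_field_values(conn, [
        ("Forms.pdf", "FirstName", "Textbox", "Jane"),
        ("Forms.pdf", "Subscribe", "Checkbox", "Yes"),
    ])
    for row in conn.execute("SELECT field_name, value FROM form_values ORDER BY field_name"):
        print(row)
    conn.close()
```

Storing one row per field keeps the schema independent of any particular form layout, at the cost of needing a pivot query if you later want one column per field.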

Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents or get rid of the feature limitations, please request a 30-day trial license for yourself.
