Parse and Manipulate XML with the Python xml.etree.ElementTree Module


XML, or Extensible Markup Language, is a widely used markup language designed for encoding documents in a format that is both human-readable and machine-readable. It allows structured data to be shared and manipulated across platforms and applications. In this tutorial, we will explore the xml.etree.ElementTree module in Python, which provides a lightweight and efficient way to work with XML data.

The xml.etree.ElementTree module, often referred to as ElementTree, is a part of Python’s standard library and offers a convenient way to parse, manipulate, and generate XML documents. This tutorial is designed for users who have a basic understanding of Python and are looking to expand their skills by working with XML data.

  1. How To Install and Import the xml.etree.ElementTree Module
  2. How To Read and Parse an XML File Using ElementTree
  3. How To Create an XML Tree From Scratch
  4. How To Access and Retrieve XML Elements and Attributes
  5. How To Modify XML Elements and Attributes
  6. How To Add and Remove XML Elements
  7. How To Iterate Through XML Elements Using Loops
  8. How To Search for Specific Elements in XML
  9. How To Convert an XML Tree to a String
  10. How To Write and Save XML Data to a File

We will cover a range of topics, starting with installing and importing the xml.etree.ElementTree module, followed by parsing and creating XML trees. Additionally, we will explore how to access, retrieve, modify, add, and remove XML elements and attributes, as well as iterate through elements using loops. Furthermore, we will discuss searching for specific elements, converting XML trees to strings, and writing XML data to files.

By the end of this tutorial, you will have a solid foundation in using the Python xml.etree.ElementTree module to work with XML data effectively and efficiently.

How To Install and Import the xml.etree.ElementTree Module

The xml.etree.ElementTree module is a part of Python’s standard library, which means it comes pre-installed with Python. Therefore, you don’t need to install it separately. You can start working with the xml.etree.ElementTree module by importing it into your Python script or interactive shell. There are three common ways to import the module:

  1. Import the entire module:
import xml.etree.ElementTree

With this approach, you will need to use the full module name when calling its functions, like this:

tree = xml.etree.ElementTree.parse('example.xml')
  2. Import the module with an alias:
import xml.etree.ElementTree as ET

By using an alias, you can shorten the module name and make your code more readable. In this case, we used the alias ‘ET’, but you can choose any name that suits you. To call a function from the module, you would now use the alias:

tree = ET.parse('example.xml')
  3. Import specific functions or classes directly:
from xml.etree.ElementTree import ElementTree, Element, SubElement

This method allows you to directly use the imported functions or classes without referring to the module name or alias. For example:

tree = ElementTree()

Now that you know how to import the xml.etree.ElementTree module, you’re ready to start working with XML data in Python. In the next sections, we’ll cover various operations, such as parsing, creating, and modifying XML documents.

How To Read and Parse an XML File Using ElementTree

Parsing an XML file using the xml.etree.ElementTree module is straightforward. The module provides two primary functions for parsing XML data: parse() and fromstring(). The parse() function reads XML from a file (given as a file name or a file object), while the fromstring() function parses XML held in a string.

Here’s a step-by-step guide on how to read and parse an XML file using the ElementTree module:

  1. Import the xml.etree.ElementTree module:
import xml.etree.ElementTree as ET
  2. Use the parse() function to read and parse the XML file:
tree = ET.parse('example.xml')

Replace ‘example.xml’ with the path to your XML file. The parse() function returns an ElementTree object, which represents the entire XML tree.

  3. Get the root element of the XML tree:
root = tree.getroot()

The getroot() method returns the root Element of the XML tree, which allows you to start navigating and manipulating the XML data.
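The three steps above can be sketched as one runnable script. Because 'example.xml' may not exist on your machine, this sketch first writes a small sample document (the SAMPLE_XML content is made up for illustration) to a temporary file and then parses it. It also shows that malformed XML raises ET.ParseError.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample document, written to a temporary file so the
# example does not depend on an existing 'example.xml'.
SAMPLE_XML = """<catalog>
    <book id="001">
        <title>Python for Beginners</title>
    </book>
</catalog>
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    xml_path = os.path.join(tmp_dir, "example.xml")
    with open(xml_path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE_XML)

    tree = ET.parse(xml_path)  # ElementTree object for the whole document
    root = tree.getroot()      # Element object for <catalog>

print(root.tag)           # catalog
print(root[0].get("id"))  # 001

# Malformed XML raises ET.ParseError (shown here with fromstring();
# parse() reports malformed files the same way).
try:
    ET.fromstring("<catalog><book></catalog>")
    parse_failed = False
except ET.ParseError:
    parse_failed = True

print(parse_failed)  # True
```

Wrapping the parse call in a try/except block like this is a reasonable habit whenever the XML comes from a source you do not control.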

Now let’s see how to parse an XML string using the fromstring() function:

  1. Import the xml.etree.ElementTree module:
import xml.etree.ElementTree as ET
  2. Define an XML string:
xml_string = '''
<catalog>
    <book id="001">
        <title>Python for Beginners</title>
        <author>John Doe</author>
        <price>29.99</price>
    </book>
    <book id="002">
        <title>Advanced Python</title>
        <author>Jane Smith</author>
        <price>39.99</price>
    </book>
</catalog>
'''
  3. Use the fromstring() function to parse the XML string:
root = ET.fromstring(xml_string)

The fromstring() function returns the root Element of the parsed XML tree.
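As a small sketch of what that return value means in practice: fromstring() gives you an Element, not an ElementTree, so you wrap it yourself if you need tree-level methods later. The example also shows one common pitfall, assuming your string begins with an XML declaration: the declaration must be the very first thing in the string, so leading whitespace has to be stripped. The sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

xml_string = (
    "<catalog>"
    "<book id='001'><title>Python for Beginners</title></book>"
    "</catalog>"
)

root = ET.fromstring(xml_string)
print(isinstance(root, ET.Element))  # True
print(root.tag, len(root))           # catalog 1

# Wrap the root in an ElementTree when tree-level methods are needed.
tree = ET.ElementTree(root)
print(tree.getroot() is root)        # True

# An XML declaration must come first; a leading newline breaks parsing.
with_declaration = "\n<?xml version='1.0'?><catalog/>"
try:
    ET.fromstring(with_declaration)
    declaration_error = False
except ET.ParseError:
    declaration_error = True

# Stripping the surrounding whitespace fixes the problem.
cleaned_root = ET.fromstring(with_declaration.strip())
print(declaration_error, cleaned_root.tag)  # True catalog
```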

Now you know how to read and parse an XML file and an XML string using the xml.etree.ElementTree module. In the next sections, we’ll discuss how to access, retrieve, and manipulate XML elements and attributes.

How To Create an XML Tree From Scratch

The xml.etree.ElementTree module allows you to create an XML tree from scratch using Python. You can create elements, add attributes, and nest elements to build the XML tree structure. Here’s a step-by-step guide on how to create an XML tree from scratch:

  1. Import the xml.etree.ElementTree module:
import xml.etree.ElementTree as ET
  2. Create the root element using the Element() function:
root = ET.Element('catalog')

This creates a new XML element with the tag ‘catalog’. Element is a class; calling Element() with the tag name as its argument returns a new Element object.

  3. Create child elements using the SubElement() function:
book1 = ET.SubElement(root, 'book')
book2 = ET.SubElement(root, 'book')

The SubElement() function takes two arguments: the parent element and the tag name of the new element. It creates a new child element under the specified parent element and returns the new Element object.

  4. Add attributes to elements using the set() method:
book1.set('id', '001')
book2.set('id', '002')

The set() method takes two arguments: the attribute name and the attribute value. It adds the attribute to the specified element.

  5. Add more child elements and set their text content:
title1 = ET.SubElement(book1, 'title')
title1.text = 'Python for Beginners'

author1 = ET.SubElement(book1, 'author')
author1.text = 'John Doe'

price1 = ET.SubElement(book1, 'price')
price1.text = '29.99'

title2 = ET.SubElement(book2, 'title')
title2.text = 'Advanced Python'

author2 = ET.SubElement(book2, 'author')
author2.text = 'Jane Smith'

price2 = ET.SubElement(book2, 'price')
price2.text = '39.99'

You can set the text content of an element by assigning a string to its text attribute.

  6. Create an ElementTree object from the root element:
tree = ET.ElementTree(root)

ElementTree is a class: calling ElementTree() with the root element as its argument returns an ElementTree object, which represents the entire XML tree.
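The step-by-step code above repeats the same pattern for each book. One way to shorten it, sketched here, is a small helper function. Note that add_book is a hypothetical name introduced for this example and is not part of the module. The sketch also passes the attribute as a keyword argument to SubElement(), which is an alternative to calling set() afterwards.

```python
import xml.etree.ElementTree as ET


def add_book(catalog, book_id, title, author, price):
    """Hypothetical helper: append one <book> element to the catalog."""
    # Keyword arguments to SubElement() become attributes of the new element.
    book = ET.SubElement(catalog, "book", id=book_id)
    ET.SubElement(book, "title").text = title
    ET.SubElement(book, "author").text = author
    ET.SubElement(book, "price").text = price
    return book


catalog = ET.Element("catalog")
add_book(catalog, "001", "Python for Beginners", "John Doe", "29.99")
add_book(catalog, "002", "Advanced Python", "Jane Smith", "39.99")

# Wrap the root so the whole document can be written out later.
tree = ET.ElementTree(catalog)

# Serialize to a str to check the result.
xml_text = ET.tostring(catalog, encoding="unicode")
print(xml_text)
```

The printed XML is on a single line because ElementTree does not add indentation unless you ask for it.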

How To Access and Retrieve XML Elements and Attributes

After parsing an XML document using the xml.etree.ElementTree module, you can access and retrieve elements and their attributes using various techniques. In this section, we’ll cover how to access elements using indexing, loops, and XPath-like expressions, as well as how to retrieve element attributes.

  1. Access elements using indexing and loops:

Each XML element behaves like a list containing its child elements. You can use indexing and loops to access child elements.

import xml.etree.ElementTree as ET

xml_string = '''
<catalog>
    <book id="001">
        <title>Python for Beginners</title>
        <author>John Doe</author>
        <price>29.99</price>
    </book>
    <book id="002">
        <title>Advanced Python</title>
        <author>Jane Smith</author>
        <price>39.99</price>
    </book>
</catalog>
'''

root = ET.fromstring(xml_string)

# Access the first book element using indexing
first_book = root[0]

# Iterate through all book elements using a loop
for book in root:
    print(book.tag, book.attrib)
  2. Access elements using the find(), findall(), and findtext() methods:

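The find() method returns the first matching element, or None if nothing matches, while the findall() method returns a list of all matching elements (an empty list if there are none). The findtext() method returns the text content of the first matching element. A bare tag name matches only direct children; a path such as ‘book/title’ reaches deeper into the tree.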

# Find the first 'title' element
first_title = root.find('book/title')

# Find all 'author' elements
all_authors = root.findall('book/author')

# Get the text content of the first 'price' element
first_price_text = root.findtext('book/price')
  3. Retrieve element attributes using the get() method or dictionary-like access:
first_book_id = first_book.get('id')  # Using the get() method
first_book_id = first_book.attrib['id']  # Using dictionary-like access
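Real documents often have missing elements or attributes, so it helps to know what each lookup does in that case. The following is a minimal sketch, using a made-up catalog in which the second book has no id and neither book has a price. get() and findtext() accept a default value, find() returns None, and attrib[...] raises KeyError.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<catalog>"
    "<book id='001'><title>Python for Beginners</title></book>"
    "<book><title>Advanced Python</title></book>"
    "</catalog>"
)

rows = []
for book in root.findall("book"):
    book_id = book.get("id", "unknown")              # default if attribute is absent
    title = book.findtext("title", default="(untitled)")
    price = book.findtext("price", default="n/a")    # default if element is absent
    rows.append((book_id, title, price))

print(rows)

# find() returns None when nothing matches, so check before using the result.
publisher = root.find("book/publisher")
print(publisher is None)  # True

# Dictionary-style access raises KeyError for a missing attribute.
try:
    root[1].attrib["id"]
    key_error = False
except KeyError:
    key_error = True
print(key_error)  # True
```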

How To Modify XML Elements and Attributes

Once you have accessed XML elements and attributes using the xml.etree.ElementTree module, you can modify their content and properties. In this section, we’ll cover how to change the text content of elements and how to add, modify, and remove attributes.

  1. Change the text content of an element:

To modify the text content of an element, simply assign a new string value to its text attribute.

import xml.etree.ElementTree as ET

xml_string = '''
<book>
    <title>Python for Beginners</title>
</book>
'''

root = ET.fromstring(xml_string)
title_element = root.find('title')

# Change the text content of the 'title' element
title_element.text = 'Python for Intermediate Learners'
  2. Add or modify attributes of an element:

To add a new attribute to an element or modify an existing one, you can use the set() method or dictionary-like access.

book_element = root

# Add a new attribute 'id' with the value '001'
book_element.set('id', '001')
# or
book_element.attrib['id'] = '001'

# Modify the value of the 'id' attribute
book_element.set('id', '002')
# or
book_element.attrib['id'] = '002'
  3. Remove an attribute from an element:

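To remove an attribute from an element, use the del statement with dictionary-like access. This raises KeyError if the attribute does not exist; attrib.pop(‘id’, None) removes the attribute without failing when it is absent.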

# Remove the 'id' attribute from the 'book' element
del book_element.attrib['id']
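To see these operations work together, here is a short sketch that applies a discount to every price in a made-up catalog. The 10% discount and the 'discounted' and 'legacy' attribute names are assumptions chosen only for illustration. The key point is that element text and attribute values are always strings, so numbers must be converted before and after any arithmetic.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<catalog>"
    "<book id='001' legacy='yes'><price>29.99</price></book>"
    "<book id='002'><price>39.99</price></book>"
    "</catalog>"
)

for book in root.findall("book"):
    price = book.find("price")
    # Element text is a string: convert, compute, then store a string again.
    discounted = float(price.text) * 0.9
    price.text = f"{discounted:.2f}"

    # Add a new attribute (attribute values must be strings too).
    book.set("discounted", "true")

    # Remove an attribute that may or may not be present.
    book.attrib.pop("legacy", None)

prices = [book.findtext("price") for book in root.findall("book")]
print(prices)          # ['26.99', '35.99']
print(root[0].attrib)  # {'id': '001', 'discounted': 'true'}
```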

Now you know how to modify XML elements and attributes using the xml.etree.ElementTree module. In the following sections, you will learn how to add and remove elements, iterate through elements using loops, search for specific elements, and convert XML trees to strings or save them to files.

How To Add and Remove XML Elements

Using the xml.etree.ElementTree module, you can easily add and remove elements in an XML tree. In this section, we’ll cover how to add child elements and remove elements from an XML tree.

  1. Add child elements:

To add a child element to an existing element, use the SubElement() function, as demonstrated below:

import xml.etree.ElementTree as ET

xml_string = '''
<book>
    <title>Python for Beginners</title>
</book>
'''

root = ET.fromstring(xml_string)
book_element = root

# Add an 'author' child element with text content
author_element = ET.SubElement(book_element, 'author')
author_element.text = 'John Doe'

# Add a 'price' child element with text content
price_element = ET.SubElement(book_element, 'price')
price_element.text = '29.99'
  2. Remove elements:

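To remove an element, call the remove() method of its parent and pass the element object itself. Keep in mind that remove() matches by object identity rather than by tag or content, works only on direct children, and raises ValueError if the element is not a child of that parent.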

# Remove the 'price' element from the 'book' element
book_element.remove(price_element)

If you need to remove all elements with a specific tag, you can use a loop:

# Remove all 'author' elements from the 'book' element
for elem in book_element.findall('author'):
    book_element.remove(elem)
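Since remove() needs the direct parent and elements do not store a reference to their parent, removing nested elements takes one extra step. A common approach, sketched below on a made-up catalog, is to build a child-to-parent dictionary first (the name parent_map is just a convention used here).

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<catalog>"
    "<book id='001'><title>Python for Beginners</title><price>29.99</price></book>"
    "<book id='002'><title>Advanced Python</title><price>39.99</price></book>"
    "</catalog>"
)

# Map every element to its parent by walking the whole tree once.
parent_map = {child: parent for parent in root.iter() for child in parent}

# list() takes a snapshot so the tree is not modified while iterating over it.
for price in list(root.iter("price")):
    parent_map[price].remove(price)

remaining_prices = root.findall("book/price")
print(len(remaining_prices))  # 0

# Calling remove() on the wrong parent raises ValueError.
title = root.find("book/title")
try:
    root.remove(title)  # <title> is a grandchild of <catalog>, not a child
    wrong_parent_error = False
except ValueError:
    wrong_parent_error = True
print(wrong_parent_error)  # True
```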

How To Iterate Through XML Elements Using Loops

When working with XML data using the xml.etree.ElementTree module, you might need to iterate through XML elements to access, modify, or analyze their content. There are several ways to iterate through XML elements using loops, such as iterating over child elements, using the iter() method, or using the iterfind() method. In this section, we’ll cover these methods.

  1. Iterate over child elements:

You can directly use a loop to iterate over the child elements of an XML element, as shown below:

import xml.etree.ElementTree as ET

xml_string = '''
<catalog>
    <book>
        <title>Python for Beginners</title>
        <author>John Doe</author>
    </book>
    <book>
        <title>Advanced Python</title>
        <author>Jane Smith</author>
    </book>
</catalog>
'''

root = ET.fromstring(xml_string)

# Iterate over child elements (book elements) of the root element (catalog)
for book in root:
    print(book.tag)
  2. Iterate using the iter() method:

The iter() method walks the element itself and all of its descendants in document order, optionally yielding only the elements that match a given tag. This method is useful when you need to access elements nested at different levels within the XML tree.

# Iterate over all 'title' elements in the XML tree
for title in root.iter('title'):
    print(title.text)
  3. Iterate using the iterfind() method:

The iterfind() method is similar to the findall() method but returns an iterator instead of a list, which can be more memory-efficient for large XML trees. The iterfind() method accepts an XPath-like expression to search for matching elements.

# Iterate over all 'author' elements in the XML tree
for author in root.iterfind('book/author'):
    print(author.text)
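The difference between the three approaches is easiest to see when the same tag appears at more than one depth. The following sketch uses a made-up document in which a title appears both directly under a book and inside a chapter.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<catalog>"
    "<book>"
    "<title>Python for Beginners</title>"
    "<chapter><title>Introduction</title></chapter>"
    "</book>"
    "</catalog>"
)

# A plain loop sees direct children only.
direct_children = [child.tag for child in root]

# iter() with no argument visits the element itself and every descendant.
all_tags = [elem.tag for elem in root.iter()]

# iter('title') finds titles at any depth.
titles_any_depth = [t.text for t in root.iter("title")]

# iterfind() follows the given path exactly.
titles_on_path = [t.text for t in root.iterfind("book/title")]

print(direct_children)   # ['book']
print(all_tags)          # ['catalog', 'book', 'title', 'chapter', 'title']
print(titles_any_depth)  # ['Python for Beginners', 'Introduction']
print(titles_on_path)    # ['Python for Beginners']
```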

How To Search for Specific Elements in XML

When working with XML data using the xml.etree.ElementTree module, you might need to search for specific elements based on their tag names or attribute values. You can use XPath-like expressions with methods like find(), findall(), findtext(), and iterfind() to search for elements in an XML tree. In this section, we’ll cover these methods and demonstrate how to search for elements using XPath-like expressions.

  1. Search for elements using the find() method:

The find() method searches for the first element that matches the provided XPath-like expression and returns the element.

import xml.etree.ElementTree as ET

xml_string = '''
<catalog>
    <book id="001">
        <title>Python for Beginners</title>
        <author>John Doe</author>
    </book>
    <book id="002">
        <title>Advanced Python</title>
        <author>Jane Smith</author>
    </book>
</catalog>
'''

root = ET.fromstring(xml_string)

# Find the first 'book' element with the 'id' attribute set to '001'
first_book = root.find(".//book[@id='001']")
  2. Search for elements using the findall() method:

The findall() method searches for all elements that match the provided XPath-like expression and returns a list of elements.

# Find all 'title' elements under 'book' elements
titles = root.findall('book/title')
  3. Search for elements using the findtext() method:

The findtext() method searches for the first element that matches the provided XPath-like expression and returns the element’s text content.

# Get the text content of the first 'author' element under a 'book' element
first_author_text = root.findtext('book/author')
  4. Search for elements using the iterfind() method:

The iterfind() method is similar to the findall() method but returns an iterator instead of a list, which can be more memory-efficient for large XML trees.

# Iterate over all 'author' elements under 'book' elements
for author in root.iterfind('book/author'):
    print(author.text)
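ElementTree supports only a subset of XPath, but that subset includes several useful predicates. The sketch below, on a made-up catalog, shows filtering by attribute value, by the presence of an attribute, by the text of a child element, and by position. Positions start at 1, and last() refers to the final match.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<catalog>"
    "<book id='001'><title>Python for Beginners</title><author>John Doe</author></book>"
    "<book id='002'><title>Advanced Python</title><author>Jane Smith</author></book>"
    "<book><title>Untracked Draft</title><author>John Doe</author></book>"
    "</catalog>"
)

# [@attr='value']: match on an attribute value.
by_id = root.find("book[@id='002']")

# [@attr]: match elements that have the attribute at all.
with_id = root.findall("book[@id]")

# [tag='text']: match on the text of a child element.
by_author = root.findall("book[author='John Doe']")

# [position]: 1-based index, or last() for the final match.
second = root.find("book[2]")
final = root.find("book[last()]")

print(by_id.findtext("title"))                   # Advanced Python
print(len(with_id))                              # 2
print([b.findtext("title") for b in by_author])  # ['Python for Beginners', 'Untracked Draft']
print(second is by_id)                           # True
print(final.findtext("title"))                   # Untracked Draft
```

For queries beyond this subset, a third-party library with full XPath support would be needed.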

How To Convert an XML Tree to a String

When working with XML data using the xml.etree.ElementTree module, you may need to convert an XML tree or an element back to an XML string representation. To do this, you can use the tostring() function provided by the module. In this section, we’ll demonstrate how to convert an entire XML tree and individual elements to strings.

  1. Convert an entire XML tree to a string:

To convert an entire XML tree to a string, you can use the tostring() function with the root element as its argument.

import xml.etree.ElementTree as ET

xml_string = '''
<catalog>
    <book id="001">
        <title>Python for Beginners</title>
        <author>John Doe</author>
    </book>
    <book id="002">
        <title>Advanced Python</title>
        <author>Jane Smith</author>
    </book>
</catalog>
'''

root = ET.fromstring(xml_string)

# Convert the entire XML tree to a string
xml_tree_string = ET.tostring(root, encoding='utf-8', method='xml').decode('utf-8')
print(xml_tree_string)
  2. Convert an individual element to a string:

You can also convert an individual XML element to a string using the tostring() function.

# Find the first 'book' element
first_book = root.find('book')

# Convert the 'book' element to a string
book_element_string = ET.tostring(first_book, encoding='utf-8', method='xml').decode('utf-8')
print(book_element_string)

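The tostring() function accepts optional arguments like encoding and method. The encoding argument sets the output encoding (default is ‘us-ascii’). With a byte encoding such as ‘utf-8’ the function returns a bytes object, which is why the examples above call decode(); passing encoding=‘unicode’ returns a str directly. The method argument specifies the output format, which can be ‘xml’, ‘html’, or ‘text’ (default is ‘xml’).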
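The effect of the encoding and method arguments can be sketched with a small made-up element that contains a non-ASCII character. With the default encoding the result is bytes and the character is written as a numeric character reference; with encoding='unicode' the result is a str that keeps the character as-is.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<book id='001'><title>Caf\u00e9 Python</title></book>")

# Default encoding ('us-ascii'): returns bytes, non-ASCII becomes a reference.
as_bytes = ET.tostring(root)

# encoding='unicode': returns a str, no decode() call needed.
as_text = ET.tostring(root, encoding="unicode")

# method='text': only the text content, without any tags.
text_only = ET.tostring(root, encoding="unicode", method="text")

print(as_bytes)   # b'<book id="001"><title>Caf&#233; Python</title></book>'
print(as_text)    # <book id="001"><title>Café Python</title></book>
print(text_only)  # Café Python
```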

How To Write and Save XML Data to a File

After parsing, modifying, or creating an XML tree using the xml.etree.ElementTree module, you might need to save the resulting XML data to a file. In this section, we’ll demonstrate how to write and save XML data to a file using the ElementTree.write() method and the tostring() function.

  1. Save an XML tree to a file using the ElementTree.write() method:

To save an XML tree directly to a file, create an ElementTree object with the root element as its argument and call the write() method.

import xml.etree.ElementTree as ET

xml_string = '''
<catalog>
    <book id="001">
        <title>Python for Beginners</title>
        <author>John Doe</author>
    </book>
    <book id="002">
        <title>Advanced Python</title>
        <author>Jane Smith</author>
    </book>
</catalog>
'''

root = ET.fromstring(xml_string)
tree = ET.ElementTree(root)

# Save the XML tree to a file
with open('output.xml', 'wb') as file:
    tree.write(file, encoding='utf-8', xml_declaration=True)
  2. Save an XML tree to a file using the tostring() function:

Alternatively, you can use the tostring() function to convert the XML tree or an individual element to a string and then save the string to a file.

# Convert the entire XML tree to a string
xml_tree_string = ET.tostring(root, encoding='utf-8', method='xml')

# Save the XML string to a file
with open('output.xml', 'wb') as file:
    file.write(xml_tree_string)

The write() method and tostring() function accept optional arguments, such as encoding, xml_declaration, and method (tostring() accepts xml_declaration since Python 3.8). The encoding argument specifies the output character encoding (default is ‘us-ascii’). The xml_declaration argument controls the XML declaration: True always includes it, False never does, and the default, None, includes it only when the encoding is not US-ASCII, UTF-8, or Unicode. The method argument specifies the output format, which can be ‘xml’, ‘html’, or ‘text’ (default is ‘xml’).
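As a final sketch, the script below writes a small made-up tree to a temporary directory and reads it back to confirm the round trip. It passes a file name to write() instead of an open file object, which write() also accepts. The call to ET.indent() adds line breaks and indentation for readability; it exists only in Python 3.9 and later, so the sketch checks for it first.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.Element("catalog")
book = ET.SubElement(root, "book", id="001")
ET.SubElement(book, "title").text = "Python for Beginners"
tree = ET.ElementTree(root)

# Pretty-print in place where available (ET.indent was added in Python 3.9).
if hasattr(ET, "indent"):
    ET.indent(tree)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "output.xml")

    # write() accepts a file name as well as a file object.
    tree.write(out_path, encoding="utf-8", xml_declaration=True)

    # Read the raw bytes to inspect the declaration line.
    with open(out_path, "rb") as handle:
        raw = handle.read()

    # Parse the file again to confirm the data survived the round trip.
    reloaded = ET.parse(out_path).getroot()

print(raw.splitlines()[0])
print(reloaded.findtext("book/title"))  # Python for Beginners
```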
