Of course! The findall() method is a core and powerful feature of Python's xml.etree.ElementTree module. It allows you to search an XML tree and retrieve all elements that match a specific criteria.

Here's a comprehensive guide covering everything from the basics to more advanced usage.
The Basics: What is findall()?
findall() searches the entire subtree of an element (starting from that element and going down all its children, grandchildren, etc.) and returns a list of all matching elements.
The key to using findall() is understanding the path language it uses, which is a simplified subset of XPath.
Prerequisites: Importing and Parsing XML
First, you need to have an XML document and parse it into an ElementTree object. We'll use this sample XML for all our examples.

Sample XML (library.xml):
<library>
<book category="FICTION">
<title lang="en">The Great Gatsby</title>
<author>F. Scott Fitzgerald</author>
<year>1925</year>
<price>10.99</price>
</book>
<book category="SCIENCE">
<title lang="en">A Brief History of Time</title>
<author>Stephen Hawking</author>
<year>1988</year>
<price>15.50</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J.K. Rowling</author>
<year>1997</year>
<price>12.99</price>
</book>
<magazine>
<title>National Geographic</title>
<month>October</month>
<year>2025</year>
</magazine>
</library>
Python Code to Parse:
import xml.etree.ElementTree as ET
# Parse the XML file
tree = ET.parse('library.xml')
# Get the root element of the tree
root = tree.getroot()
# You can also parse from a string
# xml_string = """..."""
# root = ET.fromstring(xml_string)
findall() with Simple Paths
The simplest path is just a tag name. This will find all elements with that tag name anywhere under the current element.
Example: Find all <title> elements

# Find all <title> elements anywhere in the documents = root.findall('title')
in all_titles:
print(title.text)
Output:
The Great Gatsby
A Brief History of Time
Harry Potter
National Geographic
Navigating the Tree with Paths
You can specify a path to narrow down your search.
Syntax: 'parent/child'
Example: Find the <title> of the first <book>
# Find the <title> element that is a direct child of a <book> element
first_book_title = root.find('book/title')
print(f"Title of the first book: {first_book_title.text}")
Output:
Example: Find all <author> elements inside <book> elements
# Find all <author> elements that are children of a <book> element
book_authors = root.findall('book/author')
for author in book_authors:
print(f"Author: {author.text}")
Output:
Author: F. Scott Fitzgerald
Author: Stephen Hawking
Author: J.K. Rowling
Handling Namespaces (A Very Common Gotcha!)
If your XML uses namespaces (e.g., <ns1:book>), a simple findall('book') will not work. You must include the namespace in your path.
Namespaced XML Example:
<library xmlns:bk="http://example.com/books">
<bk:book category="FICTION">
<bk:title>The Great Gatsby</bk:title>
<bk:author>F. Scott Fitzgerald</bk:author>
</bk:book>
</library>
How to Handle It: You need to extract the namespace from the root element and use it in your queries.
import xml.etree.ElementTree as ET
namespaced_xml = """<library xmlns:bk="http://example.com/books">
<bk:book category="FICTION">
<bk:title>The Great Gatsby</bk:title>
<bk:author>F. Scott Fitzgerald</bk:author>
</bk:book>
</library>"""
root = ET.fromstring(namespaced_xml)
# 1. Get the namespace dictionary from the root element's tag
# The tag looks like: '{http://example.com/books}library'
ns = {'bk': root.tag.split('}')[0][1:]}
# 2. Use the dictionary prefix in your findall call
# The path becomes: 'bk:book/bk:title's = root.findall('bk:book/bk:title', ns)
in all_titles:
print(title.text)
Output:
The Great Gatsby
Advanced Searching with XPath Predicates
You can add conditions to your paths using predicates in square brackets []. This is extremely powerful for filtering.
Syntax: 'path[condition]'
Common Predicates:
[@attribute='value']: Filter by an attribute.[text()='value']: Filter by the text content of an element.
Example: Find the <book> with category="SCIENCE"
# Find the book element that has an attribute 'category' equal to 'SCIENCE'
science_book = root.find('book[@category="SCIENCE"]')
# Now find its title= science_book.find('title')
print(f"Title of the science book: {title.text}")
Output:
Example: Find the <book> where the author is "Stephen Hawking"
# Find the book whose author text is "Stephen Hawking"
hawking_book = root.find('book[author="Stephen Hawking"]')
year = hawking_book.find('year')
print(f"Publication year: {year.text}")
Output:
Publication year: 1988
Important Distinction: findall() vs. find()
| Method | What it Returns | Use Case |
|---|---|---|
findall(path) |
A list of all matching elements. | Use when you expect multiple matches and need to iterate over them. |
find(path) |
The first matching element, or None if nothing is found. |
Use when you expect only one match (e.g., a unique root child) or just want the first one. |
Example:
# findall returns a list
all_books = root.findall('book')
print(f"Found {len(all_books)} books using findall().") # Output: Found 3 books
# find returns a single element
first_book = root.find('book')
print(f"Found book using find(): {first_book.get('category')}") # Output: Found book using find(): FICTION
# If no match is found, find() returns None
magazine = root.find('newspaper')
print(f"Found newspaper: {magazine}") # Output: Found newspaper: None
findtext() - A Convenient Shortcut
If you only care about the text content of an element and not the element itself, findtext() is a convenient shortcut. It's like find() but returns the .text property directly.
# Find the text of the first <year> element
first_year_text = root.findtext('book/year')
print(f"First year found: {first_year_text}") # Output: First year found: 1925
# It also supports predicates and namespaces
hawking_year_text = root.findtext('book[author="Stephen Hawking"]/year')
print(f"Hawking's book year: {hawking_year_text}") # Output: Hawking's book year: 1988
Summary and Best Practices
- Start Simple: Use
findall('tag')to get a feel for it. - Use Paths for Hierarchy: Use
parent/childto navigate. - Handle Namespaces: If your XML has them, create a namespace dictionary and use prefixes like
ns:tag. - Filter with Predicates: Use
[@attr='val']and[text()='val']to get exactly what you need. - Choose
find()orfindall(): Usefindall()for multiple results andfind()for a single result. - Use
findtext()for Convenience: When you only need the text content of a single element.
