Of course! The findall() method in Python's xml.etree.ElementTree module is a fundamental and powerful tool for searching an XML document. It allows you to find all elements in the tree that match a specific path expression.

Let's break it down with a clear, step-by-step guide.
The Basics: What findall() Does
findall() searches, relative to the current element, for all elements that match a given path. It always returns a list of matching Element objects, and that list is empty when nothing matches.
The path language used by findall() is a simplified subset of XPath, which is a standard for querying XML documents.
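As a minimal sketch of that return-value behavior (the inline XML string and the variable names here are illustrative assumptions, separate from the library catalog used in the rest of this guide):

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item document, used only for this illustration
root = ET.fromstring("<shelf><item>a</item><item>b</item></shelf>")

matches = root.findall("item")        # two direct children match
no_matches = root.findall("missing")  # no element has this tag

print(type(matches).__name__, len(matches))  # list 2
print(no_matches)                            # []
```

Because the result is always a list, it is safe to loop over it or call len() on it without checking for None first.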
Prerequisites: Setting up the XML
First, let's have some sample XML data to work with. We'll use a simple library catalog.

<!-- library.xml -->
<library>
  <book category="FICTION">
    <title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <year>1925</year>
    <price>12.99</price>
  </book>
  <book category="SCIENCE">
    <title lang="en">A Brief History of Time</title>
    <author>Stephen Hawking</author>
    <year>1988</year>
    <price>15.50</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter and the Philosopher's Stone</title>
    <author>J.K. Rowling</author>
    <year>1997</year>
    <price>8.99</price>
  </book>
  <magazine>
    <title>National Geographic</title>
    <issue>December 2025</issue>
  </magazine>
</library>
Step-by-Step Examples
Step 1: Parsing the XML File
You must first parse the XML file to get the root element of the tree. All subsequent searches will start from this root.
import xml.etree.ElementTree as ET

try:
    tree = ET.parse('library.xml')
    root = tree.getroot()
    print(f"Root element: {root.tag}")
except FileNotFoundError:
    print("Error: library.xml not found. Please create it.")
    # Fall back to an inline copy of the catalog so the examples run without the file
    root = ET.fromstring("""
<library>
  <book category="FICTION">
    <title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <year>1925</year>
    <price>12.99</price>
  </book>
  <book category="SCIENCE">
    <title lang="en">A Brief History of Time</title>
    <author>Stephen Hawking</author>
    <year>1988</year>
    <price>15.50</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter and the Philosopher's Stone</title>
    <author>J.K. Rowling</author>
    <year>1997</year>
    <price>8.99</price>
  </book>
  <magazine>
    <title>National Geographic</title>
    <issue>December 2025</issue>
  </magazine>
</library>
""")
Step 2: Finding All Elements of a Specific Tag
The simplest path is just a tag name. This finds all direct children of the current element with that tag.
Goal: Find all <book> elements.
# Find all 'book' elements directly under the root
all_books = root.findall('book')
print(f"\nFound {len(all_books)} 'book' elements.")
for book in all_books:
    print(f"- Found a book with category: {book.get('category')}")
Output:
Found 3 'book' elements.
- Found a book with category: FICTION
- Found a book with category: SCIENCE
- Found a book with category: CHILDREN
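The search is relative to whichever element you call it on, so findall() behaves the same way on a nested element as on the root. A small sketch, assuming a trimmed-down copy of the catalog embedded as a string so that it runs on its own (the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

# Trimmed-down, hypothetical copy of the catalog for this illustration
catalog = ET.fromstring("""
<library>
  <book category="FICTION"><author>F. Scott Fitzgerald</author></book>
  <book category="SCIENCE"><author>Stephen Hawking</author></book>
</library>
""")

# <author> is not a direct child of <library>, so nothing matches here
authors_from_root = catalog.findall("author")

# Called on each <book>, the same path matches that book's own <author>
authors_per_book = [
    book.findall("author")[0].text for book in catalog.findall("book")
]

print(authors_from_root)  # []
print(authors_per_book)   # ['F. Scott Fitzgerald', 'Stephen Hawking']
```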
Step 3: Finding Elements with a Path (Parent-Child)
You can use a slash to specify a parent-child relationship. This is a very common use case.
Goal: Find the <title> of every book.
# Find all 'title' elements that are children of a 'book'
book_titles = root.findall('book/title')
print("\nTitles of all books:")
for title_element in book_titles:
    # .text gets the text content of the element
    print(f"- {title_element.text}")
# You can also get attributes from the found element
print("\nTitles with their language attribute:")
for title_element in book_titles:
    lang = title_element.get('lang')  # Use .get() for attributes
    print(f"- {title_element.text} (lang: {lang})")
Output:
Titles of all books:
- The Great Gatsby
- A Brief History of Time
- Harry Potter and the Philosopher's Stone

Titles with their language attribute:
- The Great Gatsby (lang: en)
- A Brief History of Time (lang: en)
- Harry Potter and the Philosopher's Stone (lang: en)
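The same parent/child path works for any child tag, and the matched elements can feed ordinary Python processing. A short sketch, assuming an inline copy of the catalog reduced to titles and prices; note that .text is a string, so it needs converting before arithmetic:

```python
import xml.etree.ElementTree as ET

# Reduced, hypothetical copy of the catalog for this illustration
catalog = ET.fromstring("""
<library>
  <book><title>The Great Gatsby</title><price>12.99</price></book>
  <book><title>A Brief History of Time</title><price>15.50</price></book>
  <book><title>Harry Potter and the Philosopher's Stone</title><price>8.99</price></book>
</library>
""")

# .text is a string (or None for an empty element), so convert before summing
prices = [float(p.text) for p in catalog.findall("book/price")]
total = sum(prices)

print(f"{len(prices)} prices, total {total:.2f}")  # 3 prices, total 37.48
```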
Step 4: Finding Elements at Any Level (Descendants)
What if you want to find all <title> elements, no matter how deep they are in the tree? A bare tag name such as 'title' matches only direct children of the current element. To reach deeper elements you can either prefix the path with './/' (for example, root.findall('.//title')), which matches descendants at any depth, or loop over the children yourself. The loop shown here covers one extra level of nesting, which is enough for this catalog.
Goal: Find the title of the magazine as well.
# A bare tag name only matches direct children, so this returns an empty list
# magazine_titles = root.findall('title')
# Manual approach: iterate over the root's children and call findall() on each
all_titles = []
for child in root:
    # Find all 'title' elements directly under each child
    titles_in_child = child.findall('title')
    all_titles.extend(titles_in_child)
print("\nAll titles in the library:")
for title_element in all_titles:
    print(f"- {title_element.text}")
Output:
All titles in the library:
- The Great Gatsby
- A Brief History of Time
- Harry Potter and the Philosopher's Stone
- National Geographic
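ElementTree's path syntax also accepts a './/' prefix that matches at any depth, and Element.iter() walks the whole subtree. A short sketch comparing the three, assuming a two-entry inline copy of the catalog (the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

# Two-entry, hypothetical copy of the catalog for this illustration
catalog = ET.fromstring("""
<library>
  <book><title>The Great Gatsby</title></book>
  <magazine><title>National Geographic</title></magazine>
</library>
""")

direct = catalog.findall("title")       # bare tag: direct children only
anywhere = catalog.findall(".//title")  # './/' matches descendants at any depth
via_iter = list(catalog.iter("title"))  # iter() walks the subtree in document order

print(len(direct))                      # 0
print([t.text for t in anywhere])       # ['The Great Gatsby', 'National Geographic']
print([t.text for t in via_iter])       # ['The Great Gatsby', 'National Geographic']
```

Unlike a hand-written loop over the root's children, both './/title' and iter('title') keep working if titles are nested more deeply.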
Step 5: Using Predicates to Filter by Attribute
You can filter elements based on their attributes using square brackets []. This is one of the most powerful features.
Goal: Find only the books in the "FICTION" category.
# Find 'book' elements that have an attribute 'category' with the value 'FICTION'
fiction_books = root.findall("book[@category='FICTION']")
print("\nFiction books found with predicate:")
for book in fiction_books:
    title = book.find('title').text  # .find() returns the first match
    author = book.find('author').text
    print(f"- {title} by {author}")
Output:
Fiction books found with predicate:
- The Great Gatsby by F. Scott Fitzgerald
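Square-bracket predicates are not limited to attribute values. The sketch below tries three other forms that ElementTree's path syntax supports: attribute presence, child-text equality, and a 1-based position. It assumes a shortened inline copy of the catalog; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical copy of the catalog for this illustration
catalog = ET.fromstring("""
<library>
  <book category="FICTION"><title>The Great Gatsby</title><year>1925</year></book>
  <book category="SCIENCE"><title>A Brief History of Time</title><year>1988</year></book>
  <book><title>Untagged Book</title><year>2000</year></book>
</library>
""")

with_category = catalog.findall("book[@category]")  # attribute exists, any value
from_1988 = catalog.findall("book[year='1988']")    # child <year> has text '1988'
first_book = catalog.findall("book[1]")             # positions start at 1

print(len(with_category))                           # 2
print([b.find("title").text for b in from_1988])    # ['A Brief History of Time']
print([b.find("title").text for b in first_book])   # ['The Great Gatsby']
```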
Key Differences: findall() vs. find()
It's crucial to understand the difference between findall() and find().
| Method | What it Does | Return Value |
|---|---|---|
| findall(path) | Finds all matching elements. | A list of Element objects. Returns an empty list [] if nothing is found. |
| find(path) | Finds the first matching element. | A single Element object. Returns None if nothing is found. |
Example of find():
# Find the first 'book' element
first_book = root.find('book')
if first_book is not None:
    print(f"\nFirst book found: {first_book.find('title').text}")
else:
    print("\nNo book found.")
# This will return None because there is no <magazine> with category 'XYZ'
non_existent_magazine = root.find("magazine[@category='XYZ']")
print(f"Result of finding a non-existent element: {non_existent_magazine}")
Output:
First book found: The Great Gatsby
Result of finding a non-existent element: None
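The different return values change how a missing element is handled. A minimal sketch, assuming a small inline copy of the catalog; it also uses Element.findtext(), a related ElementTree method that returns the text of the first match or a default:

```python
import xml.etree.ElementTree as ET

# Small, hypothetical copy of the catalog for this illustration
catalog = ET.fromstring("""
<library>
  <book><title>The Great Gatsby</title></book>
  <magazine><title>National Geographic</title><issue>December 2025</issue></magazine>
</library>
""")

book = catalog.find("book")

# findall() on a missing tag gives an empty list, so this loop runs zero times
for issue in book.findall("issue"):
    print(issue.text)

# find() gives None for a missing tag, so guard before touching .text
issue = book.find("issue")
issue_text = issue.text if issue is not None else "n/a"

# findtext() combines the lookup and the fallback in one call
same_text = book.findtext("issue", default="n/a")

print(issue_text, same_text)  # n/a n/a
```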
Best Practices and Common Pitfalls
- Namespaces are Tricky: If your XML uses namespaces (e.g., <ns:library>), you must include them in your path. The easiest way is to define a dictionary and use it in your search.
  <ns:library xmlns:ns="http://example.com/library">
    <ns:book>...</ns:book>
  </ns:library>
  # Define the namespace
  ns = {'ns': 'http://example.com/library'}
  # Parse the XML
  root = ET.fromstring(xml_string_with_namespace)
  # Search using the namespace prefix
  books = root.findall('ns:book', ns)
- Use .get() for Attributes: Always use .get('attribute_name') to safely get attribute values. It returns None if the attribute doesn't exist, preventing a KeyError. Avoid using .attrib['attribute_name'] unless you are certain the attribute exists.
- Check for None: When using find(), always check if the result is None before trying to access its properties (like .text or
