How to Efficiently Parse XML Nodes with Python etree findall? - 杰瑞科技汇

The findall() method is a core feature of Python's xml.etree.ElementTree module. It searches an XML tree and returns every element that matches a given path expression.

Here's a comprehensive guide covering everything from the basics to more advanced usage.

The Basics: What is `findall()`?

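findall() matches elements relative to the element it is called on and returns a list of all matches in document order. A bare tag name matches only that element's direct children; to search the whole subtree (children, grandchildren, and so on), prefix the path with .// as in './/title'.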

The key to using findall() is understanding the path language it uses, which is a simplified subset of XPath.

Prerequisites: Importing and Parsing XML

First, you need to have an XML document and parse it into an ElementTree object. We'll use this sample XML for all our examples.

Sample XML (library.xml):

<library>
    <book category="FICTION">
        <title lang="en">The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <year>1925</year>
        <price>10.99</price>
    </book>
    <book category="SCIENCE">
        <title lang="en">A Brief History of Time</title>
        <author>Stephen Hawking</author>
        <year>1988</year>
        <price>15.50</price>
    </book>
    <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <author>J.K. Rowling</author>
        <year>1997</year>
        <price>12.99</price>
    </book>
    <magazine>
        <title>National Geographic</title>
        <month>October</month>
        <year>2025</year>
    </magazine>
</library>

Python Code to Parse:

import xml.etree.ElementTree as ET
# Parse the XML file
tree = ET.parse('library.xml')
# Get the root element of the tree
root = tree.getroot()
# You can also parse from a string
# xml_string = """..."""
# root = ET.fromstring(xml_string)
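
The string-based route mentioned in the comments can be sketched as follows. The inline XML is a trimmed copy of the sample, assumed here only so that the snippet runs without library.xml on disk. Note that ET.fromstring() returns the root element directly, so there is no getroot() step.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (an assumption, to keep the snippet self-contained)
xml_string = """<library>
    <book category="FICTION"><title lang="en">The Great Gatsby</title></book>
    <magazine><title>National Geographic</title></magazine>
</library>"""

# fromstring() parses the text and hands back the root Element itself
root = ET.fromstring(xml_string)

# Iterating over an element yields its direct children
child_tags = [child.tag for child in root]

print(root.tag)        # tag name of the root element
print(child_tags)      # tags of the direct children, in document order
print(root[0].attrib)  # attributes of the first child, as a dict
```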

`findall()` with Simple Paths

The simplest path is just a tag name, which matches direct children of the current element. To find a tag at any depth under the current element, prefix it with .// instead.

Example: Find all <title> elements

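# Find all <title> elements anywhere in the document
# (a bare 'title' would return [] here: no <title> is a direct child of <library>)
all_titles = root.findall('.//title')
for title in all_titles:
    print(title.text)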

Output:

The Great Gatsby
A Brief History of Time
Harry Potter
National Geographic
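
As a minimal sketch of how the path controls search depth, the snippet below compares a bare tag name with the .// form. The inline XML is a trimmed copy of the sample, assumed here so the code is self-contained.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (an assumption, to keep the snippet self-contained)
xml_text = """<library>
    <book category="FICTION"><title>The Great Gatsby</title></book>
    <magazine><title>National Geographic</title></magazine>
</library>"""
root = ET.fromstring(xml_text)

direct = root.findall('title')          # direct children of <library> only
descendants = root.findall('.//title')  # any depth below <library>
books = root.findall('book')            # <book> is a direct child, so a bare name is enough

print(len(direct))
print([t.text for t in descendants])
print(len(books))
```

If a bare-name search unexpectedly returns an empty list, the depth of the target element is usually the first thing to check.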

Navigating the Tree with Paths

You can specify a path to narrow down your search.

Syntax: 'parent/child'

Example: Find the <title> of the first <book>

# Find the <title> element that is a direct child of a <book> element
first_book_title = root.find('book/title')
print(f"Title of the first book: {first_book_title.text}")

Output:

Title of the first book: The Great Gatsby

Example: Find all <author> elements inside <book> elements

# Find all <author> elements that are children of a <book> element
book_authors = root.findall('book/author')
for author in book_authors:
    print(f"Author: {author.text}")

Output:

Author: F. Scott Fitzgerald
Author: Stephen Hawking
Author: J.K. Rowling
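
A common follow-up is to loop over each <book> and read several children relative to it. The sketch below builds one dictionary per book; the record layout and the name records are illustrative assumptions, and the inline XML is a trimmed copy of the sample.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (an assumption, to keep the snippet self-contained)
xml_text = """<library>
    <book category="FICTION">
        <title>The Great Gatsby</title><author>F. Scott Fitzgerald</author><price>10.99</price>
    </book>
    <book category="SCIENCE">
        <title>A Brief History of Time</title><author>Stephen Hawking</author><price>15.50</price>
    </book>
</library>"""
root = ET.fromstring(xml_text)

records = []
for book in root.findall('book'):
    # Paths passed to book.find() are resolved relative to this <book>
    records.append({
        'category': book.get('category'),
        'title': book.find('title').text,
        'author': book.find('author').text,
        'price': float(book.find('price').text),
    })

for record in records:
    print(record)
```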

Handling Namespaces (A Very Common Gotcha!)

If your XML uses namespaces (e.g., <ns1:book>), a simple findall('book') will not work. You must include the namespace in your path.

Namespaced XML Example:

<library xmlns:bk="http://example.com/books">
    <bk:book category="FICTION">
        <bk:title>The Great Gatsby</bk:title>
        <bk:author>F. Scott Fitzgerald</bk:author>
    </bk:book>
</library>

How to Handle It: Map a prefix of your choice to the namespace URI in a dictionary and pass that dictionary to findall(). ElementTree does not keep the document's prefix declarations, and in this example the root <library> element is itself un-namespaced (its tag is plain 'library'), so the URI cannot be read from root.tag and has to be written out.

import xml.etree.ElementTree as ET
namespaced_xml = """<library xmlns:bk="http://example.com/books">
    <bk:book category="FICTION">
        <bk:title>The Great Gatsby</bk:title>
        <bk:author>F. Scott Fitzgerald</bk:author>
    </bk:book>
</library>"""
root = ET.fromstring(namespaced_xml)
# 1. Map a prefix to the namespace URI declared in the document
# Internally, the <bk:book> tag is stored as '{http://example.com/books}book'
ns = {'bk': 'http://example.com/books'}
# 2. Use the dictionary prefix in your findall call
# The path becomes: 'bk:book/bk:title'
all_titles = root.findall('bk:book/bk:title', ns)
for title in all_titles:
    print(title.text)

Output:

The Great Gatsby
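
The parser stores namespaced tags in "{uri}local" form, so the same query can also be written without a prefix dictionary. The sketch below shows both spellings side by side; the constant name BOOKS_NS is an assumption introduced for illustration.

```python
import xml.etree.ElementTree as ET

namespaced_xml = """<library xmlns:bk="http://example.com/books">
    <bk:book category="FICTION">
        <bk:title>The Great Gatsby</bk:title>
    </bk:book>
</library>"""
root = ET.fromstring(namespaced_xml)

BOOKS_NS = 'http://example.com/books'  # hypothetical constant name

# How the parser actually stores the <bk:book> tag
first_child_tag = root[0].tag

# Option 1: prefix plus a prefix-to-URI dictionary
via_prefix = root.findall('bk:book/bk:title', {'bk': BOOKS_NS})

# Option 2: write the URI in braces directly in the path
via_braces = root.findall(f'{{{BOOKS_NS}}}book/{{{BOOKS_NS}}}title')

# An unqualified name does not match a namespaced element
unqualified = root.findall('book')

print(first_child_tag)
print([e.text for e in via_prefix], [e.text for e in via_braces])
print(unqualified)
```

The prefix in the dictionary does not have to match the one used in the document; only the URI matters.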

Advanced Searching with XPath Predicates

You can add conditions to your paths using predicates in square brackets []. This is extremely powerful for filtering.

Syntax: 'path[condition]'

Common Predicates:

[@attribute='value']: Filter by an attribute.
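[tag='value']: Filter by the text content of a child element.
[.='value']: Filter by the element's own text content (Python 3.7+). The XPath form [text()='value'] is not supported by ElementTree.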

Example: Find the <book> with category="SCIENCE"

# Find the book element that has an attribute 'category' equal to 'SCIENCE'
science_book = root.find('book[@category="SCIENCE"]')
# Now find its title
title = science_book.find('title')
print(f"Title of the science book: {title.text}")

Output:

Title of the science book: A Brief History of Time

Example: Find the <book> where the author is "Stephen Hawking"

# Find the book whose author text is "Stephen Hawking"
hawking_book = root.find('book[author="Stephen Hawking"]')
year = hawking_book.find('year')
print(f"Publication year: {year.text}")

Output:

Publication year: 1988
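
ElementTree supports a few more predicate forms than the two used above. The sketch below tries attribute presence, position, last(), and child text; the inline XML is a trimmed copy of the sample, assumed here so the code is self-contained.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (an assumption, to keep the snippet self-contained)
xml_text = """<library>
    <book category="FICTION"><title lang="en">The Great Gatsby</title><year>1925</year></book>
    <book category="SCIENCE"><title lang="en">A Brief History of Time</title><year>1988</year></book>
    <book category="CHILDREN"><title lang="en">Harry Potter</title><year>1997</year></book>
    <magazine><title>National Geographic</title><year>2025</year></magazine>
</library>"""
root = ET.fromstring(xml_text)

# [@attr]: the attribute only has to be present, whatever its value
titles_with_lang = root.findall('.//title[@lang]')

# [n]: 1-based position among siblings with the same tag
second_book = root.find('book[2]')

# [last()]: the last sibling with that tag
last_book = root.find('book[last()]')

# [tag='text']: a child element whose complete text equals the value
book_from_1997 = root.find("book[year='1997']")

print(len(titles_with_lang))
print(second_book.get('category'), last_book.get('category'))
print(book_from_1997.find('title').text)
```

Position predicates must be preceded by a tag name, so 'book[2]' is valid while a bare '[2]' is not.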

Important Distinction: `findall()` vs. `find()`

Method	What it Returns	Use Case
`findall(path)`	A list of all matching elements.	Use when you expect multiple matches and need to iterate over them.
`find(path)`	The first matching element, or `None` if nothing is found.	Use when you expect only one match (e.g., a unique root child) or just want the first one.

Example:

# findall returns a list
all_books = root.findall('book')
print(f"Found {len(all_books)} books using findall().") # Output: Found 3 books
# find returns a single element
first_book = root.find('book')
print(f"Found book using find(): {first_book.get('category')}") # Output: Found book using find(): FICTION
# If no match is found, find() returns None
newspaper = root.find('newspaper')
print(f"Found newspaper: {newspaper}") # Output: Found newspaper: None
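
Because find() can return None, chaining .text or .find() onto its result will raise AttributeError when nothing matches. A minimal sketch of a guard follows; the helper name title_of is hypothetical and not part of ElementTree.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<library><book category="FICTION"><title>The Great Gatsby</title></book></library>'
)

def title_of(parent, tag):
    """Hypothetical helper: <title> text of the first `tag` child, or None if absent."""
    element = parent.find(tag)
    # Compare with None explicitly: an element without children is falsy,
    # so "if not element" would also reject elements that were found.
    if element is None:
        return None
    title = element.find('title')
    return title.text if title is not None else None

print(title_of(root, 'book'))
print(title_of(root, 'newspaper'))
print(root.findall('newspaper'))  # findall() gives an empty list rather than None
```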

`findtext()` - A Convenient Shortcut

If you only care about the text content of an element and not the element itself, findtext() is a convenient shortcut. It's like find() but returns the .text property directly.

# Find the text of the first <year> element
first_year_text = root.findtext('book/year')
print(f"First year found: {first_year_text}") # Output: First year found: 1925
# It also supports predicates and namespaces
hawking_year_text = root.findtext('book[author="Stephen Hawking"]/year')
print(f"Hawking's book year: {hawking_year_text}") # Output: Hawking's book year: 1988
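
findtext() also takes a default argument that is returned when no element matches, which avoids a separate None check. The sketch below uses a small inline document with a made-up empty <note/> element to show the difference between "missing" and "present but empty".

```python
import xml.etree.ElementTree as ET

# Small inline document; the empty <note/> element is an assumption for illustration
root = ET.fromstring(
    '<library>'
    '<book><title>The Great Gatsby</title><price>10.99</price></book>'
    '<magazine><title>National Geographic</title><note/></magazine>'
    '</library>'
)

missing = root.findtext('magazine/price')                  # no match, no default
fallback = root.findtext('magazine/price', default='n/a')  # no match, default used
empty = root.findtext('magazine/note')                     # match exists but has no text
price = float(root.findtext('book/price', default='0'))    # convert the text yourself

print(repr(missing), repr(fallback), repr(empty), price)
```

A matched element with no text yields an empty string rather than the default, so the default only covers the "no match" case.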

Summary and Best Practices

Start Simple: Use findall('tag') for direct children and findall('.//tag') for any depth to get a feel for it.
Use Paths for Hierarchy: Use parent/child to navigate.
Handle Namespaces: If your XML has them, create a namespace dictionary and use prefixes like ns:tag.
Filter with Predicates: Use [@attr='val'] and [tag='val'] to get exactly what you need.
Choose find() or findall(): Use findall() for multiple results and find() for a single result.
Use findtext() for Convenience: When you only need the text content of a single element.
