Python如何正确导入并使用xml模块？-杰瑞科技汇

Of course! Here's a comprehensive guide on how to import and work with XML data in Python, covering the built-in xml.etree.ElementTree module and the popular third-party library lxml.

The Standard Library: `xml.etree.ElementTree`

Python comes with a built-in module called xml.etree.ElementTree (often imported as ET). It's a powerful, easy-to-use, and memory-efficient way to parse and create XML data. It's perfect for most common tasks.

Key Concepts:

Element: The basic building block of an XML tree. An element has a tag, attributes, and text content.
Tree: The entire XML document is represented as a tree of elements.
Parsing: The process of reading an XML string or file and converting it into an ElementTree object.
Finding: Navigating the tree to find specific elements using tags, paths, or expressions.

Example 1: Parsing an XML File

Let's say you have an XML file named library.xml:

library.xml

<?xml version="1.0"?>
<library>
  <book category="FICTION">lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <year>1925</year>
  </book>
  <book category="SCIENCE">lang="en">A Brief History of Time</title>
    <author>Stephen Hawking</author>
    <year>1988</year>
  </book>
  <book category="CHILDREN">lang="en">The Hobbit</title>
    <author>J.R.R. Tolkien</author>
    <year>1937</year>
  </book>
</library>

Here's how to parse it and extract information:

import xml.etree.ElementTree as ET
try:
    # Parse the XML file
    tree = ET.parse('library.xml')
    # Get the root element of the tree
    root = tree.getroot()
    print(f"Root tag: {root.tag}\n")
    # --- 1. Iterating over all 'book' elements ---
    print("--- Iterating over all books ---")
    for book in root.findall('book'):
        # find() searches for a child element with the given tag
        title = book.find('title').text
        author = book.find('author').text
        year = book.find('year').text
        category = book.get('category')  # .get() retrieves an attribute
        print(f"Title: {title}, Author: {author}, Year: {year}, Category: {category}")
    print("\n")
    # --- 2. Finding a specific element using XPath-like paths ---
    print("--- Finding a specific book by author ---")
    # findall with a path searches the entire tree
    hobbit_book = root.find(".//book[author='J.R.R. Tolkien']")
    if hobbit_book is not None:
        print(f"Found book: {hobbit_book.find('title').text}")
    print("\n")
    # --- 3. Accessing attributes and text ---
    print("--- Accessing attributes and text ---")
    first_book_title = root.find('book/title')
    # Get the 'lang' attribute
    lang_attr = first_book_title.get('lang')
    # Get the text contenttext = first_book_title.text
    print(f"First book title language: {lang_attr}")
    print(f"First book title text: {title_text}")
except FileNotFoundError:
    print("Error: library.xml not found.")
except ET.ParseError:
    print("Error: Could not parse the XML file.")

Output:

Root tag: library
--- Iterating over all books --- The Great Gatsby, Author: F. Scott Fitzgerald, Year: 1925, Category: FICTION A Brief History of Time, Author: Stephen Hawking, Year: 1988, Category: SCIENCE The Hobbit, Author: J.R.R. Tolkien, Year: 1937, Category: CHILDREN
--- Finding a specific book by author ---
Found book: The Hobbit
--- Accessing attributes and text ---
First book title language: en
First book title text: The Great Gatsby

Example 2: Creating XML from Scratch

You can also build an XML tree programmatically.

import xml.etree.ElementTree as ET
# Create the root element
root = ET.Element("inventory")
# Create a new product element
product1 = ET.SubElement(root, "product", id="P1001")
ET.SubElement(product1, "name").text = "Laptop"
ET.SubElement(product1, "quantity").text = "10"
# Create another product
product2 = ET.SubElement(root, "product", id="P1002")
ET.SubElement(product2, "name").text = "Mouse"
ET.SubElement(product2, "quantity").text = "50"
# Create a new XML tree from the root element
tree = ET.ElementTree(root)
# Write the tree to a file
# 'encoding="utf-8"' and 'xml_declaration=True' are best practices
tree.write("inventory.xml", encoding="utf-8", xml_declaration=True)
print("inventory.xml has been created.")

Generated `inventory.xml`:

<?xml version='1.0' encoding='utf-8'?>
<inventory>
  <product id="P1001">
    <name>Laptop</name>
    <quantity>10</quantity>
  </product>
  <product id="P1002">
    <name>Mouse</name>
    <quantity>50</quantity>
  </product>
</inventory>

The Powerful Third-Party Library: `lxml`

For more complex tasks, performance-critical applications, or when you need full XPath 1.0 support, the lxml library is the best choice. It's faster than the standard library and has more features.

First, you need to install it:

pip install lxml

lxml has a very similar API to xml.etree.ElementTree, so the code is often a drop-in replacement.

Example 1: Parsing with `lxml`

Let's use the same library.xml.

from lxml import etree
# Parse the XML file
tree = etree.parse('library.xml')
root = tree.getroot()
print(f"Root tag: {root.tag}\n")
# --- 1. Iterating over all 'book' elements ---
print("--- Iterating over all books ---")
for book in root.xpath('//book'): # lxml uses the full power of XPath= book.xpath('title/text()')[0]
    author = book.xpath('author/text()')[0]
    year = book.xpath('year/text()')[0]
    category = book.get('category')
    print(f"Title: {title}, Author: {author}, Year: {year}, Category: {category}")
print("\n")
# --- 2. Using more complex XPath expressions ---
print("--- Using complex XPath ---")
# Find all titles of books in the 'FICTION' category
fiction_titles = root.xpath('//book[@category="FICTION"]/title/text()')
print(f"Fiction titles: {fiction_titles}")
# Find the author of the book with the title 'The Hobbit'
hobbit_author = root.xpath('//book[title="The Hobbit"]/author/text()')
print(f"Author of 'The Hobbit': {hobbit_author[0]}")
print("\n")
# --- 3. Pretty Printing ---
# A nice feature of lxml is easy pretty printing
print("--- Pretty Printing ---")
print(etree.tostring(root, pretty_print=True, encoding='unicode'))

Key `lxml` Advantages:

Full XPath 1.0 Support: root.xpath() is incredibly powerful and can select nodes based on complex conditions, attributes, and text content.
Performance: lxml is significantly faster for both parsing and querying.
HTML Parsing: lxml.html is excellent for parsing broken or messy HTML, which xml.etree.ElementTree cannot handle.
Validation: It can validate XML against a DTD or Schema.

Summary: `ElementTree` vs. `lxml`

Feature	`xml.etree.ElementTree` (Standard Library)	`lxml` (Third-Party)
Installation	Built-in, no installation needed.	`pip install lxml`
Performance	Good, but slower for large/complex files.	Excellent, much faster.
XPath Support	Limited (basic `find()` and `findall()`).	Full XPath 1.0 support (`xpath()` method).
HTML Parsing	Not supported.	Excellent support (`lxml.html`).
Validation	No built-in validation.	Can validate against DTD and XML Schema.
Ease of Use	Very easy for simple tasks.	API is similar, slightly more complex for advanced features.
Best For	Simple scripts, quick parsing, environments where you can't install packages.	Production applications, large XML files, complex queries, web scraping.

Conclusion

For most day-to-day XML handling, start with the built-in xml.etree.ElementTree. It's simple, reliable, and gets the job done.

If you find yourself needing more power, better performance, or the ability to write complex XPath queries, switch to lxml. It's the industry standard for advanced XML processing in Python.

Python如何正确导入并使用xml模块？

The Standard Library: `xml.etree.ElementTree`

Key Concepts:

Example 1: Parsing an XML File

Output:

Example 2: Creating XML from Scratch

Generated `inventory.xml`:

The Powerful Third-Party Library: `lxml`

Example 1: Parsing with `lxml`

Key `lxml` Advantages:

Summary: `ElementTree` vs. `lxml`

Conclusion

99ANYc3cd6

foreach 数组 java

java调用sqlserver

王佩丰Excel教程如何高效学习？

hashmap java 实现

python ftplib cwd

Java Socket编程实例具体怎么实现？

Python Pandas 如何高效处理 log 数据？

如何快速掌握HTML5响应式布局？

Java如何DES解密C语言加密的数据？

Python schedule函数如何定时执行任务？

python redis hget

佳能eos1200d教程

哪里能免费下载Java视频教程？

Java中int转float会丢失精度吗？

Python str rindex 方法如何使用？

realflow安装教程

Python如何正确导入并使用xml模块？

The Standard Library: xml.etree.ElementTree

Key Concepts:

Example 1: Parsing an XML File

Output:

Example 2: Creating XML from Scratch

Generated inventory.xml:

The Powerful Third-Party Library: lxml

Example 1: Parsing with lxml

Key lxml Advantages:

Summary: ElementTree vs. lxml

Conclusion

相关推荐

Java Socket编程实例具体怎么实现？

The Standard Library: `xml.etree.ElementTree`

Generated `inventory.xml`:

The Powerful Third-Party Library: `lxml`

Example 1: Parsing with `lxml`

Key `lxml` Advantages:

Summary: `ElementTree` vs. `lxml`