杰瑞科技汇

Python XML 如何转 Excel?

Of course! Converting XML to Excel in Python is a common task, and there are several excellent libraries to choose from. The best method depends on the complexity of your XML file.

Python XML 如何转 Excel?-图1
(图片来源网络,侵删)

I'll guide you through the most popular and effective methods, starting with the simplest for well-structured XML and moving to more robust solutions for complex files.

Scenario 1: Simple, Flat XML (Best for pandas)

This is the easiest and most common scenario. Your XML has a repeating structure where each <record> or <item> can be directly mapped to a row in an Excel sheet.

Example XML File (data_simple.xml):

<?xml version="1.0"?>
<root>
  <record>
    <id>1</id>
    <product>Laptop</product>
    <price>1200</price>
  </record>
  <record>
    <id>2</id>
    <product>Mouse</product>
    <price>25</price>
  </record>
  <record>
    <id>3</id>
    <product>Keyboard</product>
    <price>75</price>
  </record>
</root>

Method A: Using pandas (Recommended for Simplicity)

The pandas library is fantastic for this. It can read XML directly if it's structured like our example.

Python XML 如何转 Excel?-图2
(图片来源网络,侵删)

Install pandas and lxml: lxml is a high-performance XML library that pandas uses under the hood for parsing.

pip install pandas lxml openpyxl

Python Script:

import pandas as pd
# The file path to your XML
xml_file = 'data_simple.xml'
# Read the XML file into a pandas DataFrame
# The 'record' tag is the repeating element that becomes each row.
df = pd.read_xml(xml_file, parser='lxml')
# Display the DataFrame (optional)
print(df)
# Write the DataFrame to an Excel file
# 'index=False' prevents writing the DataFrame index as a column
excel_file = 'output_simple.xlsx'
df.to_excel(excel_file, index=False, engine='openpyxl')
print(f"Successfully converted '{xml_file}' to '{excel_file}'")

Output (output_simple.xlsx):

id product price
1 Laptop 1200
2 Mouse 25
3 Keyboard 75

Scenario 2: Nested or Complex XML (Best for xml.etree.ElementTree)

When your XML has nested elements or attributes, pandas.read_xml() might not work out of the box. In this case, the standard library xml.etree.ElementTree gives you full control.

Python XML 如何转 Excel?-图3
(图片来源网络,侵删)

Example XML File (data_complex.xml):

<?xml version="1.0"?>
<orders>
  <order id="101">
    <customer>
      <name>Alice</name>
      <email>alice@example.com</email>
    </customer>
    <items>
      <item sku="A-123">Laptop</item>
      <item sku="B-456">Mouse</item>
    </items>
  </order>
  <order id="102">
    <customer>
      <name>Bob</name>
      <email>bob@example.com</email>
    </customer>
    <items>
      <item sku="C-789">Keyboard</item>
    </items>
  </order>
</orders>

Method B: Using xml.etree.ElementTree and pandas

We will parse the XML manually, flatten the nested structure, and then use pandas to create the Excel file.

Install libraries:

pip install pandas openpyxl

(Note: lxml is not strictly needed here as we use the built-in ElementTree).

Python Script:

import xml.etree.ElementTree as ET
import pandas as pd
def parse_xml_to_dataframe(xml_file):
    """
    Parses a complex XML file and returns a pandas DataFrame.
    Handles nested elements and attributes.
    """
    tree = ET.parse(xml_file)
    root = tree.getroot()
    all_records = []
    # Iterate over each 'order' tag
    for order in root.findall('order'):
        record = {}
        # Get attributes from the 'order' tag itself
        record['order_id'] = order.get('id')
        # Find child elements
        customer = order.find('customer')
        items = order.find('items')
        if customer is not None:
            record['customer_name'] = customer.find('name').text
            record['customer_email'] = customer.find('email').text
        if items is not None:
            # For multiple items, we'll join them with a separator
            item_texts = [item.text for item in items.findall('item')]
            record['items'] = ", ".join(item_texts)
            # We can also capture attributes if needed
            item_skus = [item.get('sku') for item in items.findall('item')]
            record['item_skus'] = ", ".join(item_skus)
        all_records.append(record)
    return pd.DataFrame(all_records)
# --- Main Execution ---
xml_file = 'data_complex.xml'
df = parse_xml_to_dataframe(xml_file)
print(df)
# Write the DataFrame to an Excel file
excel_file = 'output_complex.xlsx'
df.to_excel(excel_file, index=False, engine='openpyxl')
print(f"Successfully converted '{xml_file}' to '{output_complex.xlsx}'")

Output (output_complex.xlsx):

order_id customer_name customer_email items item_skus
101 Alice alice@example.com Laptop, Mouse A-123, B-456
102 Bob bob@example.com Keyboard C-789

Summary: Which Method Should I Use?

Method Best For Pros Cons
pandas.read_xml() Simple, flat XML with a clear repeating element (e.g., <record>). Extremely simple, one line of code to read. Fails on complex or deeply nested XML. Requires lxml.
xml.etree.ElementTree Complex, nested XML with attributes and varying structures. Full control over parsing logic, part of Python's standard library. More code to write, requires manual iteration and data structuring.
lxml Very large XML files or when you need XPath 2.0 support. High performance, powerful features. Not in the standard library, can have a steeper learning curve.

For most day-to-day tasks, start with pandas.read_xml(). If it fails or gives you unwanted results, switch to the xml.etree.ElementTree method for a more robust solution.

分享:
扫描分享到社交APP
上一篇
下一篇