杰瑞科技汇

openpyxl与xlrd如何选?Python读写Excel用哪个?

Of course! This is a very common point of confusion for Python users working with Excel files. Here’s a detailed breakdown of openpyxl and xlrd, their differences, and when to use each.

openpyxl与xlrd如何选?Python读写Excel用哪个?-图1
(图片来源网络,侵删)

The Short Answer (TL;DR)

Feature openpyxl xlrd
Primary Purpose Read, Write, and Manipulate .xlsx and .xlsm files. Read .xls and .xlsx files.
File Format Modern Excel format (.xlsx, since Excel 2007). Legacy Excel format (.xls) and modern (.xlsx).
Writing Yes, it's a full-featured library for creating and modifying files. No, it is read-only.
Development Actively maintained. The recommended choice for new projects. Largely unmaintained. Its .xlsx support is a wrapper and can be buggy.
Recommendation Use openpyxl for almost everything. Only use xlrd if you have a legacy .xls file you must read.

Detailed Comparison

openpyxl

openpyxl is a comprehensive library for working with Excel 2010+ files (.xlsx and .xlsm). It's the de-facto standard in the Python community for modern Excel files.

Key Strengths:

  • Full Read/Write Support: You can not only read data from existing spreadsheets but also create new ones, modify cell values, add charts, insert images, change formatting, and much more.
  • Actively Maintained: It's a robust and well-supported library.
  • Modern Format: It's designed specifically for the .xlsx format, which is the current standard.

Common Use Cases for openpyxl:

  • Generating a report from a database and saving it as an Excel file.
  • Reading data from an Excel file, processing it in Python, and saving the results back to the same or a new file.
  • Automating the creation of formatted invoices or templates.

Installation:

openpyxl与xlrd如何选?Python读写Excel用哪个?-图2
(图片来源网络,侵删)
pip install openpyxl

Simple openpyxl Example (Reading and Writing):

import openpyxl
# --- READING from an .xlsx file ---
# Load the workbook
try:
    # The read_only=True is much faster for reading large files
    workbook = openpyxl.load_workbook('sample.xlsx', read_only=True)
    sheet = workbook.active
    print("--- Reading Data ---")
    # Read data from cell A1
    value_a1 = sheet['A1'].value
    print(f"Value in A1: {value_a1}")
    # Iterate through rows
    for row in sheet.iter_rows(min_row=2, values_only=True): # Start from row 2
        print(f"Row data: {row}")
    workbook.close() # Important to close when using read_only=True
except FileNotFoundError:
    print("sample.xlsx not found. Creating a new one...")
# --- WRITING to an .xlsx file ---
# Create a new workbook
workbook_to_save = openpyxl.Workbook()
sheet_to_save = workbook_to_save.active
sheet_to_save.title = "New Data"
# Write data to cells
sheet_to_save['A1'] = 'Product'
sheet_to_save['B1'] = 'Quantity'
sheet_to_save['A2'] = 'Apple'
sheet_to_save['B2'] = 150
sheet_to_save['A3'] = 'Banana'
sheet_to_save['B3'] = 200
# Save the workbook
workbook_to_save.save('new_report.xlsx')
print("\n--- Writing Data ---")
print("Successfully created new_report.xlsx")

xlrd

xlrd is an older library that was the go-to for reading .xls files (the Excel format used before 2007). It also has some support for reading .xlsx files, but this support is limited and relies on an external library, xlrd.xlsx.

Key Strengths:

  • Legacy .xls Support: It is one of the best and most reliable libraries for reading old .xls files.
  • Speed: For reading very large .xls files, it can be very fast.

Key Weaknesses:

openpyxl与xlrd如何选?Python读写Excel用哪个?-图3
(图片来源网络,侵删)
  • Read-Only: You cannot write or modify files with xlrd.
  • Largely Unmaintained: The project has seen little development in recent years.
  • Poor .xlsx Support: Its .xlsx capabilities are a basic wrapper and do not support many modern features. It often fails on complex .xlsx files. The official documentation even warns against using it for .xlsx files.

Common Use Cases for xlrd:

  • You have a large archive of old .xls files that you need to read data from.
  • You are maintaining an old codebase that already uses xlrd.

Installation:

pip install xlrd

Simple xlrd Example (Reading only):

import xlrd
# --- READING from an .xls file ---
try:
    # Open the workbook
    workbook = xlrd.open_workbook('legacy_data.xls')
    # Get the first sheet by index
    sheet = workbook.sheet_by_index(0)
    print("--- Reading from legacy_data.xls ---")
    # Get value from cell A1 (note: 0-indexed)
    value_a1 = sheet.cell_value(0, 0)
    print(f"Value in A1: {value_a1}")
    # Iterate through rows
    print("\n--- Row Data ---")
    for row_idx in range(1, sheet.nrows): # Start from row 1 (skip header)
        # Get a list of cell values for the current row
        row_data = sheet.row_values(row_idx)
        print(f"Row {row_idx + 1}: {row_data}")
except FileNotFoundError:
    print("legacy_data.xls not found.")

When to Use Which? A Decision Guide

Your Goal Recommended Library Why?
Read or write a .xlsx file. openpyxl It's the modern, actively maintained, and full-featured standard.
Read a legacy .xls file. xlrd It's the most reliable library for the old .xls format.
Read an unknown file type (.xls or .xlsx). Try openpyxl first. If it fails with an error about the format, then switch to xlrd.
Create a new Excel file from scratch. openpyxl xlrd cannot write files.
Modify an existing Excel file. openpyxl xlrd cannot modify files.

The Modern Alternative: pandas

For most data analysis tasks, you won't use openpyxl or xlrd directly. Instead, you'll use the pandas library, which uses them "under the hood" as engines.

pandas provides a much simpler, high-level interface for reading data from Excel files directly into a DataFrame.

Installation:

pip install pandas openpyxl xlrd

(You need openpyxl and xlrd installed as dependencies for pandas to read .xlsx and .xls files respectively.)

Simple pandas Example:

import pandas as pd
# Read an .xlsx file into a DataFrame
# pandas will automatically use the openpyxl engine
try:
    df_xlsx = pd.read_excel('sample.xlsx', engine='openpyxl')
    print("--- DataFrame from sample.xlsx ---")
    print(df_xlsx)
except FileNotFoundError:
    print("\nsample.xlsx not found.")
# Read an .xls file into a DataFrame
# pandas will automatically use the xlrd engine
try:
    df_xls = pd.read_excel('legacy_data.xls', engine='xlrd')
    print("\n--- DataFrame from legacy_data.xls ---")
    print(df_xls)
except FileNotFoundError:
    print("legacy_data.xls not found.")
# Write a DataFrame to an .xlsx file
new_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [95, 88, 76]
})
new_df.to_excel('pandas_output.xlsx', index=False)
print("\n--- Writing DataFrame with pandas ---")
print("Successfully created pandas_output.xlsx")

Final Recommendation

  1. For any new project, start with pandas if your goal is data analysis and manipulation. It's the easiest and most common way to work with Excel data in Python.
  2. If you need fine-grained control over the Excel file (like setting specific cell formats, adding charts, or manipulating worksheets in a way pandas doesn't support), use openpyxl directly.
  3. Only pull in xlrd if you have a specific
分享:
扫描分享到社交APP
上一篇
下一篇