杰瑞科技汇

Python loadworkbook如何高效读取Excel数据?

Of course! The load_workbook function is a cornerstone of the openpyxl library in Python, which is used for reading and writing Excel (.xlsx) files.

Python loadworkbook如何高效读取Excel数据?-图1
(图片来源网络,侵删)

Here's a comprehensive guide covering everything from the basics to more advanced usage.


The Basics: What is load_workbook?

load_workbook() is a function from the openpyxl library that opens an existing Excel file from your disk and loads it into memory, allowing you to read its data, modify it, and then save it back.

Key things to know:

  • File Format: It works with modern .xlsx files, not the older .xls format.
  • Read-Only Mode: By default, it loads the file in a read-write mode. For better performance when you only need to read data, you can use read_only=True.
  • Data Only Mode: If you only need the cell values and don't care about formatting (like colors, fonts, borders), you can use data_only=True. This can significantly speed up loading.

Prerequisites: Installation

If you don't have openpyxl installed, you can install it using pip:

Python loadworkbook如何高效读取Excel数据?-图2
(图片来源网络,侵删)
pip install openpyxl

Loading a Workbook: The Core Function

Let's start with the simplest example.

Assume you have an Excel file named sample.xlsx with the following content:

A B C
1 Product Quantity Price
2 Apple 50 20
3 Orange 75 50
4 Banana 120 80

Example 1: Basic Loading

from openpyxl import load_workbook
# Specify the path to your Excel file
file_path = 'sample.xlsx'
# Load the workbook
# The 'data_only=True' argument is useful for reading calculated cell values
wb = load_workbook(filename=file_path, data_only=True)
print(f"Successfully loaded workbook: {wb}")
print(f"Workbook sheet names: {wb.sheetnames}")
# Don't forget to close the workbook when you're done
# This is good practice to free up resources.
wb.close()

Output:

Successfully loaded workbook: <openpyxl.workbook.workbook.Workbook object at 0x...>
Workbook sheet names: ['Sheet1']

Working with the Loaded Workbook

Once you've loaded the workbook, you'll typically want to access its sheets and then the cells within those sheets.

Python loadworkbook如何高效读取Excel数据?-图3
(图片来源网络,侵删)

Example 2: Accessing Sheets and Reading Data

This example reads all the data from the first sheet.

from openpyxl import load_workbook
file_path = 'sample.xlsx'
wb = load_workbook(filename=file_path, data_only=True)
# --- Accessing Sheets ---
# Method 1: Get the sheet by its name (most common)
sheet = wb['Sheet1']
# Method 2: Get the first sheet (if you know it's the one you want)
# sheet = wb.active
# --- Reading Data ---
# Get the value of a single cell
product_a2 = sheet['A2']
print(f"Value of cell A2: {product_a2.value}")
# You can also use .cell() method, which is often more robust
# Note: .cell() uses 1-based indexing for rows and columns
quantity_a3 = sheet.cell(row=3, column=2)
print(f"Value of cell B3: {quantity_a3.value}")
# Reading all data from the sheet
print("\n--- Reading all data from the sheet ---")
# We'll iterate through rows. 'sheet.iter_rows()' is efficient.
for row in sheet.iter_rows(values_only=True):
    # values_only=True gives you only the cell values, not the cell objects
    print(row)
wb.close()

Output:

Value of cell A2: Apple
Value of cell B3: 75
--- Reading all data from the sheet ---
('Product', 'Quantity', 'Price')
('Apple', 50, 1.2)
('Orange', 75, 1.5)
('Banana', 120, 0.8)

Advanced Loading Options

Using read_only=True and data_only=True can make your script much faster and more memory-efficient, especially with large files.

Example 3: Using read_only=True

This mode is for reading only. You cannot modify or save the workbook in this mode. It streams the data, which is great for huge files that wouldn't fit in memory otherwise.

from openpyxl import load_workbook
file_path = 'sample.xlsx'
# Load the workbook in read-only mode
wb_read_only = load_workbook(filename=file_path, read_only=True)
sheet = wb_read_only.active
print("Reading from a read-only workbook:")
for row in sheet.iter_rows(values_only=True):
    print(row)
# IMPORTANT: You MUST close a read-only workbook.
# It cannot be closed automatically with a 'with' statement.
wb_read_only.close()

Example 4: Using data_only=True

This mode loads the last calculated value of a cell. If a cell contains a formula (e.g., =SUM(A1:A10)), data_only=True will give you the result of that sum. If you omit it, you'll get the formula string itself (=SUM(A1:A10)).

from openpyxl import load_workbook
# Assume 'formula.xlsx' has a cell C5 with the formula =A2*B2
# where A2 is 10 and B2 is 5. The value in C5 should be 50.
file_path = 'formula.xlsx'
# Without data_only, you get the formula string
wb_formula = load_workbook(filename=file_path)
sheet = wb_formula.active
print(f"Cell C5 contains the formula: {sheet['C5'].value}")
wb_formula.close()
# With data_only, you get the calculated value
wb_value = load_workbook(filename=file_path, data_only=True)
sheet = wb_value.active
print(f"Cell C5 contains the calculated value: {sheet['C5'].value}")
wb_value.close()

Output:

Cell C5 contains the formula: =A2*B2
Cell C5 contains the calculated value: 50

Best Practice: The with Statement

Using a with statement is the recommended way to handle files in Python. It automatically ensures that your workbook is closed, even if errors occur during processing.

This works for standard loading but not for read_only=True mode.

from openpyxl import load_workbook
file_path = 'sample.xlsx'
try:
    with load_workbook(filename=file_path, data_only=True) as wb:
        sheet = wb.active
        print(f"Sheet title: {sheet.title}")
        # Do your work here...
        # For example, print the value of cell C4
        print(f"Value of C4: {sheet['C4'].value}")
        # The workbook will be automatically closed when the 'with' block exits
        print("Workbook processing complete.")
except FileNotFoundError:
    print(f"Error: The file '{file_path}' was not found.")
except Exception as e:
    print(f"An error occurred: {e}")

Summary of Key Parameters

Parameter Description Example
filename (Required) The path to the .xlsx file to load. load_workbook('data.xlsx')
read_only If True, opens the file in read-only mode. Faster and uses less memory. Cannot be saved. load_workbook('data.xlsx', read_only=True)
data_only If True, loads the calculated value of cells. If False (default), loads the formula string. load_workbook('data.xlsx', data_only=True)
keep_vba If True, any VBA (macros) present in the workbook will be preserved. load_workbook('macros.xlsx', keep_vba=True)
keep_links If True, external links will be preserved. load_workbook('linked.xlsx', keep_links=True)
分享:
扫描分享到社交APP
上一篇
下一篇