杰瑞科技汇

Python如何高效查找Excel数据?

Of course! Finding data in an Excel file is a very common task in Python. The best library for this is pandas, which is powerful and easy to use. For more complex tasks, you can use openpyxl.

Here’s a complete guide covering the most common scenarios, from simple lookups to more complex "find" operations.


The Setup: Installing Libraries

First, you need to install the necessary libraries. pandas is the core, and openpyxl is the engine it uses to read .xlsx files.

pip install pandas openpyxl

Let's assume you have an Excel file named sales_data.xlsx with the following data:

Date Product Sales Rep Amount
2025-10-01 Laptop Alice 1200
2025-10-02 Mouse Bob 25
2025-10-03 Keyboard Charlie 150
2025-10-04 Laptop Alice 1300
2025-10-05 Monitor Bob 300
2025-10-06 Keyboard David 160

Using pandas (Recommended)

pandas loads your Excel sheet into a DataFrame, which is like a powerful, in-memory spreadsheet.

import pandas as pd
# Load the Excel file into a DataFrame
# The 'sheet_name' argument is optional if you only have one sheet
df = pd.read_excel('sales_data.xlsx', sheet_name='Sheet1')
print("Original DataFrame:")
print(df)

Scenario A: Find a Value in a Specific Column

This is the most common task: "Find all rows where the 'Sales Rep' is 'Alice'".

# Find all rows where the 'Sales Rep' column is 'Alice'
alice_sales = df[df['Sales Rep'] == 'Alice']
print("\nSales by Alice:")
print(alice_sales)

Output:

Sales by Alice:
         Date Product Sales Rep  Amount
0 2025-10-01  Laptop     Alice    1200
3 2025-10-04  Laptop     Alice    1300

Scenario B: Find a Value Based on Multiple Conditions

Let's find all sales of 'Laptops' made by 'Alice'.

# Find rows where 'Product' is 'Laptop' AND 'Sales Rep' is 'Alice'
laptop_by_alice = df[(df['Product'] == 'Laptop') & (df['Sales Rep'] == 'Alice')]
print("\nLaptops sold by Alice:")
print(laptop_by_alice)

Note: The & operator is used for "AND". You must wrap each condition in parentheses .

Scenario C: Find a Value Using str.contains() (Partial Match)

What if you want to find all products that contain the word "key"? This is useful for partial matches.

# Find rows where the 'Product' column contains the string 'key' (case-insensitive)
keyboard_sales = df[df['Product'].str.contains('key', case=False, na=False)]
print("\nSales of products containing 'key':")
print(keyboard_sales)
  • case=False: Makes the search case-insensitive (so 'Key' or 'KEY' would also match).
  • na=False: Handles any potential missing/NaN values by treating them as not matching.

Scenario D: Find a Value and Get a Specific Cell's Information

Let's find the amount for the first sale of a 'Mouse'.

# 1. Find the row where 'Product' is 'Mouse'
mouse_row = df[df['Product'] == 'Mouse']
# 2. Check if any rows were found
if not mouse_row.empty:
    # 3. Get the 'Amount' from the first matching row (iloc[0] gets the first row)
    mouse_amount = mouse_row.iloc[0]['Amount']
    print(f"\nThe amount for the first mouse sale was: {mouse_amount}")
else:
    print("\nNo mouse sales found.")

Using openpyxl (More Granular Control)

openpyxl gives you access to the Excel file at a cell-by-cell level. It's useful for things like formatting or iterating through cells in a specific way.

from openpyxl import load_workbook
# Load the workbook and specify the data_only=True to get values, not formulas
wb = load_workbook(filename='sales_data.xlsx', data_only=True)
ws = wb.active  # Gets the first active sheet
print("\n--- Using openpyxl ---")
# Find the cell with the value 'Bob'
found_cell = None
for row in ws.iter_rows(values_only=True):
    if 'Bob' in row:
        # 'row' is a tuple of cell values for that row
        # The index of 'Bob' in the tuple tells us the column
        bob_index = row.index('Bob')
        # Get the cell object using its coordinates
        found_cell = ws.cell(row=ws.max_row, column=bob_index + 1) # A bit complex to get the exact cell
        break
if found_cell:
    print(f"Found 'Bob' at cell address: {found_cell.coordinate}")
    print(f"Row number: {found_cell.row}, Column number: {found_cell.column}")
else:
    print("Could not find 'Bob'.")

Why pandas is usually better for "finding":

  • Readability: df[df['Column'] == 'Value'] is much clearer than iterating through rows.
  • Power: pandas is built for data manipulation and can handle millions of rows efficiently.
  • Simplicity: It abstracts away the complexity of cell coordinates.

Use openpyxl when you need to:

  • Read/write cell formatting (fonts, colors, borders).
  • Manipulate Excel objects like charts, images, or formulas.
  • Iterate through every single cell with fine-grained control.

Advanced: Find a Value and Return the Row Number

Sometimes you need the row number to update the file or for other logic. pandas makes this easy with .index.

import pandas as pd
df = pd.read_excel('sales_data.xlsx')
# Find all rows where 'Sales Rep' is 'Charlie'
charlie_sales = df[df['Sales Rep'] == 'Charlie']
# Get the index (row number) of these matches
# Note: Pandas index is 0-based, but Excel rows are 1-based
charlie_row_indices = charlie_sales.index.tolist()
print(f"\nFound Charlie's sales at pandas indices (0-based): {charlie_row_indices}")
# If you need 1-based Excel-style row numbers:
excel_row_numbers = [idx + 1 for idx in charlie_row_indices]
print(f"Found Charlie's sales at Excel row numbers (1-based): {excel_row_numbers}")
# Example: Get the full row for the first match using the index
if charlie_row_indices:
    first_match_full_row = df.loc[charlie_row_indices[0]]
    print("\nFull details of the first match:")
    print(first_match_full_row)

Summary: Which Method to Use?

Your Goal Recommended Library Why?
"Find all rows where X is Y" pandas Simple, readable, and highly efficient for this task.
"Find rows where X is Y and Z is W" pandas Easy to combine conditions with &, , .
"Find a partial text match" pandas The .str.contains() method is perfect for this.
"Find a value and get its cell address/row" pandas Use .index to get row numbers.
"Find a value and change its formatting" openpyxl pandas doesn't handle cell formatting. openpyxl gives you direct access.
"Iterate through every single cell in a sheet" openpyxl ws.iter_rows() is designed for this low-level task.
分享:
扫描分享到社交APP
上一篇
下一篇