Python如何高效查找Excel数据？-杰瑞科技汇

Of course! Finding data in an Excel file is a very common task in Python. The best library for this is pandas, which is powerful and easy to use. For more complex tasks, you can use openpyxl.

Here’s a complete guide covering the most common scenarios, from simple lookups to more complex "find" operations.

The Setup: Installing Libraries

First, you need to install the necessary libraries. pandas is the core, and openpyxl is the engine it uses to read .xlsx files.

pip install pandas openpyxl

Let's assume you have an Excel file named sales_data.xlsx with the following data:

Date	Product	Sales Rep	Amount
2025-10-01	Laptop	Alice	1200
2025-10-02	Mouse	Bob	25
2025-10-03	Keyboard	Charlie	150
2025-10-04	Laptop	Alice	1300
2025-10-05	Monitor	Bob	300
2025-10-06	Keyboard	David	160

Using `pandas` (Recommended)

pandas loads your Excel sheet into a DataFrame, which is like a powerful, in-memory spreadsheet.

import pandas as pd
# Load the Excel file into a DataFrame
# The 'sheet_name' argument is optional if you only have one sheet
df = pd.read_excel('sales_data.xlsx', sheet_name='Sheet1')
print("Original DataFrame:")
print(df)

Scenario A: Find a Value in a Specific Column

This is the most common task: "Find all rows where the 'Sales Rep' is 'Alice'".

# Find all rows where the 'Sales Rep' column is 'Alice'
alice_sales = df[df['Sales Rep'] == 'Alice']
print("\nSales by Alice:")
print(alice_sales)

Output:

Sales by Alice:
         Date Product Sales Rep  Amount
0 2025-10-01  Laptop     Alice    1200
3 2025-10-04  Laptop     Alice    1300

Scenario B: Find a Value Based on Multiple Conditions

Let's find all sales of 'Laptops' made by 'Alice'.

# Find rows where 'Product' is 'Laptop' AND 'Sales Rep' is 'Alice'
laptop_by_alice = df[(df['Product'] == 'Laptop') & (df['Sales Rep'] == 'Alice')]
print("\nLaptops sold by Alice:")
print(laptop_by_alice)

Note: The & operator is used for "AND". You must wrap each condition in parentheses .

Scenario C: Find a Value Using `str.contains()` (Partial Match)

What if you want to find all products that contain the word "key"? This is useful for partial matches.

# Find rows where the 'Product' column contains the string 'key' (case-insensitive)
keyboard_sales = df[df['Product'].str.contains('key', case=False, na=False)]
print("\nSales of products containing 'key':")
print(keyboard_sales)

case=False: Makes the search case-insensitive (so 'Key' or 'KEY' would also match).
na=False: Handles any potential missing/NaN values by treating them as not matching.

Scenario D: Find a Value and Get a Specific Cell's Information

Let's find the amount for the first sale of a 'Mouse'.

# 1. Find the row where 'Product' is 'Mouse'
mouse_row = df[df['Product'] == 'Mouse']
# 2. Check if any rows were found
if not mouse_row.empty:
    # 3. Get the 'Amount' from the first matching row (iloc[0] gets the first row)
    mouse_amount = mouse_row.iloc[0]['Amount']
    print(f"\nThe amount for the first mouse sale was: {mouse_amount}")
else:
    print("\nNo mouse sales found.")

Using `openpyxl` (More Granular Control)

openpyxl gives you access to the Excel file at a cell-by-cell level. It's useful for things like formatting or iterating through cells in a specific way.

from openpyxl import load_workbook
# Load the workbook and specify the data_only=True to get values, not formulas
wb = load_workbook(filename='sales_data.xlsx', data_only=True)
ws = wb.active  # Gets the first active sheet
print("\n--- Using openpyxl ---")
# Find the cell with the value 'Bob'
found_cell = None
for row in ws.iter_rows(values_only=True):
    if 'Bob' in row:
        # 'row' is a tuple of cell values for that row
        # The index of 'Bob' in the tuple tells us the column
        bob_index = row.index('Bob')
        # Get the cell object using its coordinates
        found_cell = ws.cell(row=ws.max_row, column=bob_index + 1) # A bit complex to get the exact cell
        break
if found_cell:
    print(f"Found 'Bob' at cell address: {found_cell.coordinate}")
    print(f"Row number: {found_cell.row}, Column number: {found_cell.column}")
else:
    print("Could not find 'Bob'.")

Why pandas is usually better for "finding":

Readability: df[df['Column'] == 'Value'] is much clearer than iterating through rows.
Power: pandas is built for data manipulation and can handle millions of rows efficiently.
Simplicity: It abstracts away the complexity of cell coordinates.

Use openpyxl when you need to:

Read/write cell formatting (fonts, colors, borders).
Manipulate Excel objects like charts, images, or formulas.
Iterate through every single cell with fine-grained control.

Advanced: Find a Value and Return the Row Number

Sometimes you need the row number to update the file or for other logic. pandas makes this easy with .index.

import pandas as pd
df = pd.read_excel('sales_data.xlsx')
# Find all rows where 'Sales Rep' is 'Charlie'
charlie_sales = df[df['Sales Rep'] == 'Charlie']
# Get the index (row number) of these matches
# Note: Pandas index is 0-based, but Excel rows are 1-based
charlie_row_indices = charlie_sales.index.tolist()
print(f"\nFound Charlie's sales at pandas indices (0-based): {charlie_row_indices}")
# If you need 1-based Excel-style row numbers:
excel_row_numbers = [idx + 1 for idx in charlie_row_indices]
print(f"Found Charlie's sales at Excel row numbers (1-based): {excel_row_numbers}")
# Example: Get the full row for the first match using the index
if charlie_row_indices:
    first_match_full_row = df.loc[charlie_row_indices[0]]
    print("\nFull details of the first match:")
    print(first_match_full_row)

Summary: Which Method to Use?

Your Goal	Recommended Library	Why?
"Find all rows where X is Y"	`pandas`	Simple, readable, and highly efficient for this task.
"Find rows where X is Y and Z is W"	`pandas`	Easy to combine conditions with `&`, , .
"Find a partial text match"	`pandas`	The `.str.contains()` method is perfect for this.
"Find a value and get its cell address/row"	`pandas`	Use `.index` to get row numbers.
"Find a value and change its formatting"	`openpyxl`	`pandas` doesn't handle cell formatting. `openpyxl` gives you direct access.
"Iterate through every single cell in a sheet"	`openpyxl`	`ws.iter_rows()` is designed for this low-level task.

Python如何高效查找Excel数据？

The Setup: Installing Libraries

Using `pandas` (Recommended)

Scenario A: Find a Value in a Specific Column

Scenario B: Find a Value Based on Multiple Conditions

Scenario C: Find a Value Using `str.contains()` (Partial Match)

Scenario D: Find a Value and Get a Specific Cell's Information

Using `openpyxl` (More Granular Control)

Advanced: Find a Value and Return the Row Number

Summary: Which Method to Use?

99ANYc3cd6

Networkx怎么用？基础操作指南？

Java中equals重写要注意哪些关键点？

mapinfo视频教程

Java如何读取XML文件内容？

ANSYS 13.0教程如何快速上手？

Java Socket编程实例具体怎么实现？

Python imresize函数怎么用？

Java Date如何获取当前日期？

仓库ERP系统怎么入门？

Redis Java 序列化怎么选？

mapinfo视频教程

米6线刷开发版教程视频怎么做？

access数据库连接java

Java字符串如何转为char数组？

Python如何计算特征值？

Java 如何获取 request 对象？

Python如何高效查找Excel数据？

The Setup: Installing Libraries

Using pandas (Recommended)

Scenario A: Find a Value in a Specific Column

Scenario B: Find a Value Based on Multiple Conditions

Scenario C: Find a Value Using str.contains() (Partial Match)

Scenario D: Find a Value and Get a Specific Cell's Information

Using openpyxl (More Granular Control)

Advanced: Find a Value and Return the Row Number

Summary: Which Method to Use?

相关推荐

Java Socket编程实例具体怎么实现？

Using `pandas` (Recommended)

Scenario C: Find a Value Using `str.contains()` (Partial Match)

Using `openpyxl` (More Granular Control)