pandas python xls-杰瑞科技汇

Of course! Working with Excel files (.xls and .xlsx) is a very common task in Python, and the pandas library makes it incredibly easy.

Here's a comprehensive guide covering everything from basic reading and writing to more advanced scenarios.

The Essential Tools: Pandas and OpenPyXL

First, you need to make sure you have the necessary libraries installed.

pandas: The core data manipulation library.
openpyxl: A required engine for reading and writing modern .xlsx files.
xlrd: A required engine for reading legacy .xls files.

You can install them all with a single command:

pip install pandas openpyxl xlrd

.xls: The older Excel format (pre-2007). You must have xlrd installed to read these.
.xlsx: The modern Excel format (2007 and later). You must have openpyxl installed to read/write these.

Reading Excel Files into a DataFrame

The primary function for reading an Excel file is pandas.read_excel().

Basic Reading

This is the simplest way to read the first worksheet of an Excel file.

import pandas as pd
# Specify the path to your Excel file
file_path = 'data.xlsx' 
# Read the first sheet into a DataFrame
df = pd.read_excel(file_path)
print(df.head())

Reading a Specific Sheet

An Excel file can contain multiple sheets. You can read a specific sheet by its name or index (0-based).

# By sheet name (recommended)
df_sheet_name = pd.read_excel(file_path, sheet_name='Sheet2')
# By sheet index (0 for the first sheet, 1 for the second, etc.)
df_sheet_index = pd.read_excel(file_path, sheet_name=1)

Reading Multiple Sheets

You can read multiple sheets at once. The result will be a dictionary of DataFrames.

# Read multiple sheets by name
sheets_dict = pd.read_excel(file_path, sheet_name=['Sheet1', 'Sheet3'])
# Access a specific DataFrame from the dictionary
df_sheet1 = sheets_dict['Sheet1']
# Read all sheets
all_sheets = pd.read_excel(file_path, sheet_name=None)

Specifying a Specific Range of Cells

Use the usecols and skiprows/nrows parameters to read only a portion of the sheet.

# Read columns 'A', 'C', and 'E' only
df_specific_cols = pd.read_excel(file_path, usecols=['A', 'C', 'E'])
# Read only the first 100 rows, skipping the first 5
df_specific_rows = pd.read_excel(file_path, skiprows=5, nrows=100)

Writing DataFrames to Excel Files

The primary function for writing a DataFrame to an Excel file is pandas.DataFrame.to_excel().

Basic Writing

This will create a new Excel file with the DataFrame. Warning: This will overwrite any existing file with the same name.

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Paris']}
df_to_write = pd.DataFrame(data)
# Write the DataFrame to an Excel file
# The 'index=False' argument prevents writing the DataFrame index as a column
df_to_write.to_excel('output.xlsx', index=False)

Writing to a Specific Sheet

If the output file already exists, you can write to a new sheet without overwriting the entire file.

# Create a second DataFrame
data2 = {'Product': ['A', 'B', 'C'],
         'Sales': [100, 150, 200]}
df_sales = pd.DataFrame(data2)
# Write the first DataFrame to 'Sheet1'
df_to_write.to_excel('sales_report.xlsx', sheet_name='Summary', index=False)
# Write the second DataFrame to 'Sheet2' without overwriting the file
# Use 'mode='a'' to append and 'engine='openpyxl'' to handle existing files
with pd.ExcelWriter('sales_report.xlsx', engine='openpyxl', mode='a') as writer:
    df_sales.to_excel(writer, sheet_name='SalesData', index=False)

Note: The ExcelWriter context manager is the recommended way to handle multiple sheets.

Handling Different Excel File Types (`.xls` vs. `.xlsx`)

As mentioned, the key is the engine.

For .xlsx files (Modern): openpyxl is the default and recommended engine.

# No engine argument needed, openpyxl is used by default
df = pd.read_excel('data.xlsx')

For .xls files (Legacy): You must explicitly specify the xlrd engine.

# You MUST specify the engine for .xls files
df_old = pd.read_excel('data.xls', engine='xlrd')

Advanced Options and Common Issues

Handling the Header Row

Sometimes your data doesn't start on the first row, or you have multiple header rows.

# The first row is data, so use it as the header
df = pd.read_excel('data.xlsx', header=0) # Default
# The first row is a title, the second is the header. Skip the first row.
df = pd.read_excel('data.xlsx', header=1) 
# Use the first two rows as a multi-level index (hierarchical columns)
df = pd.read_excel('data.xlsx', header=[0, 1])

Setting the Index Column

You can tell pandas to use a specific column from the Excel file as the DataFrame's index.

# Use the 'ID' column as the index
df = pd.read_excel('data.xlsx', index_col='ID')

Handling Data Types

You can specify data types for columns when reading, which can prevent issues with mixed-type columns.

df = pd.read_excel('data.xlsx', dtype={'Age': 'int32', 'Name': 'str'})

Dealing with Merged Cells

Merged cells are a common source of problems. pandas will fill the merged cells with the value from the top-left cell, which might not be what you want. There isn't a one-size-fits-all solution, but a common approach is to use the openpyxl engine directly to unmerge cells before reading with pandas.

Handling Missing Values

You can specify what values in the Excel file should be treated as NaN (Not a Number).

# Treat 'NA', 'N/A', 'null', and empty cells as NaN
df = pd.read_excel('data.xlsx', na_values=['NA', 'N/A', 'null'])

Complete Example: A Typical Workflow

Here is a script that reads data from an Excel file, performs a simple calculation, and writes the result to a new sheet in the same file.

Scenario:

financial_data.xlsx has a sheet named Transactions.
We want to calculate the total profit for each product.
We want to append a new sheet named Profit Summary to the same file.

import pandas as pd
# --- 1. READING DATA ---
try:
    # Read the 'Transactions' sheet from the Excel file
    # We assume the file has columns: 'Product', 'Revenue', 'Cost'
    df_transactions = pd.read_excel('financial_data.xlsx', sheet_name='Transactions')
    print("--- Original Data ---")
    print(df_transactions)
    # --- 2. DATA MANIPULATION ---
    # Calculate profit
    df_transactions['Profit'] = df_transactions['Revenue'] - df_transactions['Cost']
    # Group by product and sum the profits
    profit_summary = df_transactions.groupby('Product')['Profit'].sum().reset_index()
    print("\n--- Profit Summary ---")
    print(profit_summary)
    # --- 3. WRITING DATA BACK TO EXCEL ---
    # Use ExcelWriter to append the new sheet without overwriting the file
    # The 'mode='a'' means append, and 'engine='openpyxl'' is crucial for .xlsx files
    with pd.ExcelWriter('financial_data.xlsx', engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
        # We use 'replace' because we might be re-running the script and don't want duplicate sheets
        profit_summary.to_excel(writer, sheet_name='Profit Summary', index=False)
    print("\nSuccessfully appended 'Profit Summary' sheet to 'financial_data.xlsx'.")
except FileNotFoundError:
    print("Error: The file 'financial_data.xlsx' was not found.")
except ValueError as e:
    print(f"Error: {e}. Please check if the sheet name 'Transactions' exists.")

This example covers the most common use cases and provides a solid foundation for working with Excel files using pandas.

pandas python xls

The Essential Tools: Pandas and OpenPyXL

Reading Excel Files into a DataFrame

Basic Reading

Reading a Specific Sheet

Reading Multiple Sheets

Specifying a Specific Range of Cells

Writing DataFrames to Excel Files

Basic Writing

Writing to a Specific Sheet

Handling Different Excel File Types (`.xls` vs. `.xlsx`)

Advanced Options and Common Issues

Handling the Header Row

Setting the Index Column

Handling Data Types

Dealing with Merged Cells

Handling Missing Values

Complete Example: A Typical Workflow

99ANYc3cd6

aecc2025安装教程

Python中如何使用axis equal实现坐标轴等比例？

iPad越狱教程靠谱吗？

Java Map如何存储Object对象？

Wacom数位板怎么安装？新手必看教程来了！

Java Socket编程实例具体怎么实现？

MyEclipse如何配置Java环境？

Protel99se怎么汉化？汉化步骤是什么？

Java如何读取property文件？

Python insertbatch如何高效批量插入数据？

SolidWorks高级应用教程，如何高效掌握核心技巧？

Java Web与Java EE有何本质区别？

Java如何正确打开二进制文件？

Oracle基础教程PDF哪里找？新手如何学？

秋凉Lightroom教程，如何快速调出氛围感？

C程序设计实践教程答案哪里找？

pandas python xls

The Essential Tools: Pandas and OpenPyXL

Reading Excel Files into a DataFrame

Basic Reading

Reading a Specific Sheet

Reading Multiple Sheets

Specifying a Specific Range of Cells

Writing DataFrames to Excel Files

Basic Writing

Writing to a Specific Sheet

Handling Different Excel File Types (.xls vs. .xlsx)

Advanced Options and Common Issues

Handling the Header Row

Setting the Index Column

Handling Data Types

Dealing with Merged Cells

Handling Missing Values

Complete Example: A Typical Workflow

相关推荐

Java Socket编程实例具体怎么实现？

Handling Different Excel File Types (`.xls` vs. `.xlsx`)