杰瑞科技汇

python list excel

Of course! Working with Python and Excel is a very common task, especially for data analysis, reporting, and automation. Here’s a comprehensive guide covering the most popular libraries, from simple reading/writing to more advanced operations.

We'll focus on two main libraries:

  1. openpyxl: Excellent for reading, writing, and modifying .xlsx files. It gives you fine-grained control over worksheets, cells, styles, and charts.
  2. pandas: The go-to library for data analysis. It's incredibly fast and powerful for loading an entire Excel sheet (or multiple sheets) directly into a DataFrame, performing operations, and then saving it back.

The Scenario: A Sample Excel File

Let's assume you have an Excel file named sales_data.xlsx with two sheets: January and February.

sales_data.xlsx Structure:

January Sheet
Product Sales Region Date
Laptop 1200 North 2025-01-15
Mouse 25 South 2025-01-20
Keyboard 75 East 2025-01-25
Monitor 300 West 2025-01-30
February Sheet
Product Sales Region Date
Laptop 1500 North 2025-02-10
Mouse 30 South 2025-02-12
Keyboard 80 East 2025-02-18
Webcam 50 West 2025-02-28

Method 1: openpyxl - For Detailed Workbook Manipulation

openpyxl is like a remote control for your Excel file. You can select specific cells, change formatting, insert rows/columns, and more.

Installation

pip install openpyxl

Example: Reading and Writing with openpyxl

from openpyxl import load_workbook
from openpyxl.styles import Font
from datetime import datetime
# --- READING DATA ---
# Load the existing workbook
wb = load_workbook('sales_data.xlsx')
# Select a specific sheet by name
sheet_jan = wb['January']
# Get the value of a specific cell (e.g., B2, which is '1200')
sales_value = sheet_jan['B2'].value
print(f"Sales value from B2: {sales_value}")
# Read all data from a sheet (ignoring the header)
data = []
for row in sheet_jan.iter_rows(min_row=2, values_only=True):
    data.append(row)
print("\nData from January sheet (without header):")
print(data)
# --- WRITING DATA ---
# Select the February sheet
sheet_feb = wb['February']
# Modify a specific cell value
sheet_feb['B2'] = 1550  # Update Laptop sales for February
# Add a new row at the end
new_row_data = ['Headphones', 60, 'Central', datetime(2025, 2, 25).date()]
sheet_feb.append(new_row_data)
# Add a header to a new sheet
if 'Summary' not in wb.sheetnames:
    wb.create_sheet('Summary')
sheet_summary = wb['Summary']
sheet_summary.append(['Total Sales', 'Total Products'])
sheet_summary['A2'] = '=SUM(January!B:B)' # You can even write Excel formulas!
# Apply styling (e.g., make the header bold)
for cell in sheet_jan[1]:  # Iterate through the first row
    cell.font = Font(bold=True)
# Save the workbook (this will overwrite the original file)
wb.save('sales_data_modified.xlsx')
print("\n'openpyxl' script finished. 'sales_data_modified.xlsx' has been created.")

Method 2: pandas - For Data Analysis

pandas is designed for data. It reads an entire sheet into a DataFrame, a powerful 2D table-like data structure, making it easy to perform calculations, filter, and group data.

Installation

pip install pandas openpyxl # pandas needs a library like openpyxl to read .xlsx files

Example: Reading and Analyzing with pandas

import pandas as pd
# --- READING DATA ---
# Read the 'January' sheet into a DataFrame
df_jan = pd.read_excel('sales_data.xlsx', sheet_name='January')
print("--- January Data ---")
print(df_jan)
print("\nData types:")
print(df_jan.dtypes)
# Read all sheets into a dictionary of DataFrames
all_sheets = pd.read_excel('sales_data.xlsx', sheet_name=None)
print("\n--- Available Sheets ---")
print(all_sheets.keys())
df_feb = all_sheets['February']
print("\n--- February Data ---")
print(df_feb)
# --- DATA ANALYSIS ---
# Calculate total sales for January
total_sales_jan = df_jan['Sales'].sum()
print(f"\nTotal Sales in January: {total_sales_jan}")
# Find the product with the highest sales in February
max_sales_feb = df_feb.loc[df_feb['Sales'].idxmax()]
print("\nProduct with the highest sales in February:")
print(max_sales_feb)
# Filter data: Get all sales from the 'North' region
north_sales = df_jan[df_jan['Region'] == 'North']
print("\nSales from the North region:")
print(north_sales)
# --- WRITING DATA ---
# Create a new summary DataFrame
summary_data = {
    'Month': ['January', 'February'],
    'Total Sales': [df_jan['Sales'].sum(), df_feb['Sales'].sum()]
}
df_summary = pd.DataFrame(summary_data)
# Write the new DataFrame to a new Excel file
# index=False prevents pandas from writing the DataFrame index as a column
df_summary.to_excel('sales_summary.xlsx', index=False)
print("\n'pandas' script finished. 'sales_summary.xlsx' has been created.")

Comparison: openpyxl vs. pandas

Feature openpyxl pandas
Primary Use Reading, writing, and modifying existing Excel workbooks. Data analysis, manipulation, and statistical operations.
Data Handling Works with individual cells, rows, and columns. Loads data row-by-row or cell-by-cell. Loads entire sheets into DataFrames, optimized for numerical data.
Performance Slower for large datasets as it's not a data analysis library. Extremely fast and efficient for large datasets.
Key Strength Fine-grained control: Styles, charts, images, formulas, complex layouts. Ease of analysis: Built-in functions for sum(), mean(), groupby(), merge(), etc.
Best For - Automating reports with specific formatting.
- Modifying templates.
- Adding charts or images.
- Cleaning and transforming data.
- Performing calculations and aggregations.
- Creating simple data exports.

Which One Should I Use?

  • Use pandas if:

    • Your main goal is to analyze the data in Excel.
    • You need to perform calculations, filter, or group the data.
    • You are exporting data from a program to Excel for others to view.
    • You are working with large datasets.
  • Use openpyxl if:

    • You need to preserve or modify the exact layout of an existing Excel file.
    • You need to apply specific formatting (colors, fonts, borders).
    • You need to insert charts, images, or formulas into the spreadsheet.
    • You are building a complex, template-based report.

Bonus: Combining Both (The Best of Both Worlds)

You can use pandas for the heavy data lifting and openpyxl for final formatting.

import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import Font, Alignment
# 1. Use pandas to create the data and write a draft file
df_summary = pd.DataFrame({
    'Report': ['Q1 Sales Summary'],
    'Total Sales': [15000],
    'Generated By': ['Python Automation']
})
df_summary.to_excel('final_report_draft.xlsx', index=False)
# 2. Use openpyxl to load the draft and apply professional formatting
wb = load_workbook('final_report_draft.xlsx')
ws = wb.active
# Style the header row
for cell in ws[1]:
    cell.font = Font(bold=True, color="FFFFFF")
    cell.fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid")
    cell.alignment = Alignment(horizontal="center")
# Style the main title
ws['A1'].font = Font(size=16, bold=True, name="Calibri")
ws['A1'].alignment = Alignment(horizontal="center")
# Save the final, beautifully formatted report
wb.save('final_report_styled.xlsx')
print("\nCombined script finished. 'final_report_styled.xlsx' has been created.")
分享:
扫描分享到社交APP
上一篇
下一篇