杰瑞科技汇

Python groupby level如何按层级分组聚合?

Of course! The term "groupby level" in Python almost exclusively refers to grouping by the levels of a MultiIndex (also known as a hierarchical index) using the powerful Pandas library.

Python groupby level如何按层级分组聚合?-图1
(图片来源网络,侵删)

Let's break it down from the basics to advanced usage.

What is a MultiIndex?

A MultiIndex is a way of having multiple index levels on a single axis (either rows or columns). It's like having a spreadsheet with grouped rows or columns.

Imagine you have sales data for different products across several cities and years. A MultiIndex can organize this beautifully:

Sales Data
├── Product: 'A'
│   ├── City: 'New York'
│   │   └── Year: 2025 -> 100
│   │   └── Year: 2025 -> 120
│   └── City: 'London'
│       └── Year: 2025 -> 80
│       └── Year: 2025 -> 90
└── Product: 'B'
    ├── City: 'New York'
    │   └── Year: 2025 -> 150
    │   └── Year: 2025 -> 170
    └── City: 'London'
        └── Year: 2025 -> 110
        └── Year: 2025 -> 125

In Pandas, this would look like:

Python groupby level如何按层级分组聚合?-图2
(图片来源网络,侵删)
import pandas as pd
import numpy as np
# Create a MultiIndex
index = pd.MultiIndex.from_product([
    ['Product A', 'Product B'],
    ['New York', 'London'],
    [2025, 2025]
], names=['Product', 'City', 'Year'])
# Create a DataFrame with the MultiIndex
data = np.random.randint(50, 200, size=8)
df = pd.DataFrame(data, index=index, columns=['Sales'])
print(df)

Output:

                     Sales
Product   City    Year     
Product A New York 2025   152
                 2025   178
          London   2025   101
                 2025   190
Product B New York 2025   119
                 2025   164
          London   2025   188
                 2025   134

The level Parameter in groupby()

When you call .groupby() on a DataFrame or Series with a MultiIndex, you need to tell Pandas which level(s) of the index to group by. This is what the level parameter is for.


Key 1: Grouping by a Single Level

You can group by a specific level using its name or its integer position (0 for the first level, 1 for the second, etc.).

Example 1: Group by Level Name ('Product')

Let's group by the 'Product' level to get the total sales for each product across all cities and years.

Python groupby level如何按层级分组聚合?-图3
(图片来源网络,侵删)
# Group by the 'Product' level and sum the sales
sales_by_product = df.groupby(level='Product').sum()
print(sales_by_product)

Output:

          Sales
Product         
Product A   621
Product B   605

Example 2: Group by Level Position (level=1)

Now, let's group by the second level, which is 'City' (index position 1).

# Group by the second level (index=1) and sum the sales
sales_by_city = df.groupby(level=1).sum()
print(sales_by_city)

Output:

             Sales
City              
London         513
New York       713

Key 2: Grouping by Multiple Levels

You can also group by more than one level at a time. To do this, pass a list of level names or positions to the level parameter.

Example: Group by 'Product' and 'City'

Let's get the total sales for each product in each city.

# Group by 'Product' and 'City' and sum the sales
sales_by_product_city = df.groupby(level=['Product', 'City']).sum()
print(sales_by_product_city)

Output:

                      Sales
Product   City             
Product A London         291
          New York       330
Product B London         322
          New York       283

Notice that the result also has a MultiIndex, reflecting the levels we grouped by.


Key 3: Grouping by All Levels

If you want to group by every level in the index, you can pass the special argument level=df.index.names. This effectively groups each unique combination of index values together, which is equivalent to not grouping at all in terms of the groups formed, but it's useful if you want to apply an aggregation function to each "cell" defined by the MultiIndex.

# Group by all levels in the index
# This will create a group for each unique row
grouped_by_all = df.groupby(level=df.index.names).sum()
print(grouped_by_all)

Output:

                      Sales
Product   City    Year     
Product A London   2025   101
                 2025   190
          New York 2025   152
                 2025   178
Product B London   2025   188
                 2025   134
          New York 2025   119
                 2025   164

This is functionally the same as the original df because sum() on a single number returns the number itself. The real power comes when you use this in combination with .agg().


Key 4: Grouping by Level with .agg() for Multiple Aggregations

This is where groupby(level) becomes incredibly powerful. You can perform different aggregation functions on different columns.

Let's add a 'Profit' column to our DataFrame to demonstrate.

df['Profit'] = np.random.randint(10, 50, size=8)
print(df)

Output:

                     Sales  Profit
Product   City    Year             
Product A New York 2025   152      23
                 2025   178      18
          London   2025   101      42
                 2025   190      14
Product B New York 2025   119      49
                 2025   164      11
          London   2025   188      46
                 2025   134      32

Now, let's group by 'City' and calculate the total 'Sales' and the average 'Profit' for each city.

# Group by 'City' and aggregate Sales with sum and Profit with mean
city_summary = df.groupby(level='City').agg({
    'Sales': 'sum',
    'Profit': 'mean'
})
print(city_summary)

Output:

             Sales  Profit
City                      
London         513  33.500
New York       713  25.250

Key 5: Grouping by Level on Columns

Everything we've done so far applies to column levels as well. You just need to use the axis=1 parameter.

Let's create a DataFrame with a MultiIndex for columns.

# Create a DataFrame with a MultiIndex for columns
columns = pd.MultiIndex.from_product([
    ['Q1', 'Q2'],
    ['Revenue', 'Expenses']
], names=['Quarter', 'Metric'])
data_col = {
    'Product A': [100, 50, 120, 60],
    'Product B': [150, 70, 170, 80]
}
df_col = pd.DataFrame(data_col, index=columns)
print(df_col)

Output:

Metric      Expenses  Revenue
Quarter                        
Q1            Product A     50      100
                      Product B     70      150
Q2            Product A     60      120
                      Product B     80      170

Now, let's group by the 'Metric' level of the columns and calculate the total for each metric across all products and quarters.

# Group by the 'Metric' level of the columns (axis=1)
# and sum the values
total_by_metric = df_col.groupby(axis=1, level='Metric').sum()
print(total_by_metric)

Output:

Metric   Expenses  Revenue
Product A      110      220
Product B      150      320

Summary

Task Code Example
Group by single level (name) df.groupby(level='Product').sum()
Group by single level (position) df.groupby(level=1).sum()
Group by multiple levels df.groupby(level=['Product', 'City']).sum()
Group by all levels df.groupby(level=df.index.names).sum()
Aggregate multiple columns by level `df.groupby(level='City').agg({'Sales':'
分享:
扫描分享到社交APP
上一篇
下一篇