杰瑞科技汇

How exactly should you use the axis parameter of apply in Python?

The axis parameter is one of the most fundamental, and most often confusing, concepts in data manipulation with Python, especially in libraries such as NumPy and Pandas.

Let's break it down with clear explanations, analogies, and code examples.

The Core Idea: Rows vs. Columns

The simplest way to understand axis is to think of it as specifying the direction of operation.

  • axis=0: Operate along the rows (vertically). This means the function will be applied down each column.
  • axis=1: Operate along the columns (horizontally). This means the function will be applied across each row.
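One way to check this rule is to look at shapes: the axis you pass is the dimension that disappears from the result. A minimal sketch, assuming a small hypothetical array named grid:

```python
import numpy as np

# Hypothetical 2x3 array: 2 rows (axis 0), 3 columns (axis 1)
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

col_totals = grid.sum(axis=0)  # collapses the 2 rows: one value per column
row_totals = grid.sum(axis=1)  # collapses the 3 columns: one value per row

print(grid.shape)        # (2, 3)
print(col_totals.shape)  # (3,)
print(row_totals.shape)  # (2,)
```

The axis that was passed is the one missing from the output shape, which is often easier to remember than "rows" versus "columns".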

NumPy's axis Parameter

In NumPy, axis is accepted by reduction functions such as sum(), mean(), std(), min(), and max(), and by helper functions such as np.apply_along_axis().
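Since np.apply_along_axis() is the closest NumPy counterpart to "apply", here is a small sketch of how it treats axis. The helper value_range and the array samples are hypothetical names chosen for illustration:

```python
import numpy as np

def value_range(v):
    # Hypothetical helper: receives one 1D slice and returns max minus min
    return v.max() - v.min()

samples = np.array([[1, 2, 3],
                    [4, 5, 6]])

# axis=0: the helper receives each column as a 1D slice ([1, 4], [2, 5], [3, 6])
per_column = np.apply_along_axis(value_range, 0, samples)
# axis=1: the helper receives each row as a 1D slice ([1, 2, 3], [4, 5, 6])
per_row = np.apply_along_axis(value_range, 1, samples)

print(per_column)  # [3 3 3]
print(per_row)     # [2 2]
```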

The Analogy: Spreadsheet

Imagine a 2D NumPy array as a spreadsheet.

import numpy as np
# A 2x3 array (2 rows, 3 columns)
data = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
print("Original Array:")
print(data)
# [[1 2 3]
#  [4 5 6]]

Example 1: axis=0 (Operation along rows)

When you use axis=0, you are telling NumPy to move down the columns.

Let's use np.sum(data, axis=0):

# Sum along axis=0 (down each column)
sum_axis_0 = np.sum(data, axis=0)
print("\nSum along axis=0:")
print(sum_axis_0)
# Output: [5 7 9]

What happened?

  • It took the first element of each row and summed them: 1 + 4 = 5
  • It took the second element of each row and summed them: 2 + 5 = 7
  • It took the third element of each row and summed them: 3 + 6 = 9
  • The result is a 1D array with the sum of each column.

Example 2: axis=1 (Operation along columns)

When you use axis=1, you are telling NumPy to move across the rows.

Let's use np.sum(data, axis=1):

# Sum along axis=1 (across each row)
sum_axis_1 = np.sum(data, axis=1)
print("\nSum along axis=1:")
print(sum_axis_1)
# Output: [6 15]

What happened?

  • It took the first row and summed its elements: 1 + 2 + 3 = 6
  • It took the second row and summed its elements: 4 + 5 + 6 = 15
  • The result is a 1D array with the sum of each row.

Visual Summary for NumPy

Function | axis=0 (Down Columns) | axis=1 (Across Rows)
np.sum() | Sums the values in each column. | Sums the values in each row.
np.mean() | Calculates the mean of each column. | Calculates the mean of each row.
np.max() | Finds the maximum value in each column. | Finds the maximum value in each row.
np.min() | Finds the minimum value in each column. | Finds the minimum value in each row.
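The same pattern can be checked for the other reductions. A short sketch, using a hypothetical array named scores with the same values as the example above:

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(np.mean(scores, axis=0))  # [2.5 3.5 4.5]  mean of each column
print(np.mean(scores, axis=1))  # [2. 5.]        mean of each row
print(np.max(scores, axis=0))   # [4 5 6]        maximum of each column
print(np.max(scores, axis=1))   # [3 6]          maximum of each row
print(np.min(scores, axis=0))   # [1 2 3]        minimum of each column
print(np.min(scores, axis=1))   # [1 4]          minimum of each row
```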

Pandas' axis Parameter

Pandas uses the same axis logic, but its context is a DataFrame (a labeled 2D table). The confusion often arises because the result is a labeled Series rather than a plain array, and which labels it carries depends on the axis.

The Analogy: DataFrame

Imagine a Pandas DataFrame. The labels are important.

import pandas as pd
# A DataFrame with 2 rows and 3 columns
df = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4],
    'C': [5, 6]
}, index=['row1', 'row2'])
print("Original DataFrame:")
print(df)
#       A  B  C
# row1  1  3  5
# row2  2  4  6

Example 1: axis=0 (Operation along rows)

In Pandas, axis=0 refers to the row index. For aggregations this means the function works down each column, collapsing the rows. It is the default for most aggregation methods.

Let's use df.sum(axis=0):

# Sum along axis=0 (default)
sum_axis_0 = df.sum(axis=0)
print("\nSum along axis=0:")
print(sum_axis_0)
# Output:
# A     3
# B     7
# C    11
# dtype: int64

What happened?

  • It summed the values in each column (A, B, C).
  • The result is a Pandas Series, where the index of the Series is the column name of the original DataFrame.

Example 2: axis=1 (Operation along columns)

axis=1 refers to the column labels. For aggregations this means the function works across each row, collapsing the columns.

Let's use df.sum(axis=1):

# Sum along axis=1
sum_axis_1 = df.sum(axis=1)
print("\nSum along axis=1:")
print(sum_axis_1)
# Output:
# row1     9
# row2    12
# dtype: int64

What happened?

  • It summed the values in each row (row1, row2).
  • The result is a Pandas Series, where the index of the Series is the index (row label) of the original DataFrame.
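DataFrame.apply() follows the same rule as the built-in aggregations. The sketch below assumes a hypothetical helper called spread and a frame with the same values as the example above: with axis=0 the helper receives each column as a Series, and with axis=1 it receives each row as a Series.

```python
import pandas as pd

frame = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]},
                     index=['row1', 'row2'])

def spread(s):
    # Hypothetical helper: s is one column (axis=0) or one row (axis=1)
    return s.max() - s.min()

per_column = frame.apply(spread, axis=0)  # result indexed by A, B, C
per_row = frame.apply(spread, axis=1)     # result indexed by row1, row2

print(per_column.to_dict())  # {'A': 1, 'B': 1, 'C': 1}
print(per_row.to_dict())     # {'row1': 4, 'row2': 4}
```

Row-wise apply calls the Python function once per row, so on large frames a vectorized expression is usually worth trying first.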

Common Pandas axis Use Cases

Method | axis=0 (Default) | axis=1
df.sum() | Sums each column. Returns a Series of column sums. | Sums each row. Returns a Series of row sums.
df.mean() | Calculates the mean of each column. | Calculates the mean of each row.
df.drop() | Drops the specified row label(s). | Drops the specified column label(s).
df.sort_values() | Sorts the rows by the values in one or more columns. | Sorts the columns by the values in one or more rows.
df.dropna() | Drops rows that contain any missing values. | Drops columns that contain any missing values.
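A quick way to confirm which axis a label-based method targets is to run it on a tiny frame. This is a sketch using a hypothetical two-row frame named table that contains one missing value:

```python
import numpy as np
import pandas as pd

table = pd.DataFrame({'A': [2.0, 1.0], 'B': [np.nan, 4.0]},
                     index=['row1', 'row2'])

# drop: axis=0 removes a row label, axis=1 removes a column label
print(table.drop('row1', axis=0).index.tolist())  # ['row2']
print(table.drop('A', axis=1).columns.tolist())   # ['B']

# dropna: axis=0 removes rows with NaN, axis=1 removes columns with NaN
print(table.dropna(axis=0).index.tolist())        # ['row2']
print(table.dropna(axis=1).columns.tolist())      # ['A']

# sort_values: axis=0 reorders the rows by the values in column 'A'
print(table.sort_values(by='A', axis=0).index.tolist())  # ['row2', 'row1']
```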

The "Axis of Evil" Trick

If you're ever stuck, remember this simple trick for Pandas:

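axis=1 is for dropping a column. The digit 1 stands upright, like a column.

This is a helpful mnemonic because df.drop('column_name', axis=1) is a very common operation. It applies to methods that target labels, such as drop(); for aggregations such as sum(), axis=1 yields one result per row.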
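If the mnemonic does not stick, drop() also accepts a columns keyword that avoids axis altogether. A minimal sketch, assuming a hypothetical frame named people:

```python
import pandas as pd

people = pd.DataFrame({'name': ['Ann', 'Bob'],
                       'age': [30, 25],
                       'city': ['Oslo', 'Rome']})

by_axis = people.drop('age', axis=1)       # axis=1 targets column labels
by_keyword = people.drop(columns=['age'])  # same result, no axis needed

print(by_axis.columns.tolist())    # ['name', 'city']
print(by_axis.equals(by_keyword))  # True
```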


Summary Table

Library | axis=0 | axis=1
NumPy | Operates down the columns. Collapses the rows. | Operates across the rows. Collapses the columns.
Pandas | Refers to the row index. Aggregations work down each column and return one value per column. Default for most aggregations. | Refers to the column labels. Aggregations work across each row and return one value per row.

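The key takeaway is that axis names the dimension an operation works along. For aggregations and apply, axis=0 moves vertically down the rows and yields one result per column, while axis=1 moves horizontally across the columns and yields one result per row. For label-based methods such as drop(), axis=0 targets row labels and axis=1 targets column labels.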
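Pandas also accepts the strings 'index' and 'columns' in place of 0 and 1, which some readers find easier to keep straight. A short sketch on a hypothetical frame named totals_demo:

```python
import pandas as pd

totals_demo = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]},
                           index=['row1', 'row2'])

# axis='index' is the same as axis=0: collapse the rows, one value per column
same_as_0 = totals_demo.sum(axis='index').equals(totals_demo.sum(axis=0))
# axis='columns' is the same as axis=1: collapse the columns, one value per row
same_as_1 = totals_demo.sum(axis='columns').equals(totals_demo.sum(axis=1))

print(same_as_0)  # True
print(same_as_1)  # True
```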
