Of course! In Python, especially with libraries like NumPy and Pandas, "sum axis 1" is a very common operation. On a 2D array or DataFrame, it adds up the values across each row, producing one total per row.
Let's break it down with clear examples for both libraries.
The Core Concept: Axis 0 vs. Axis 1
Imagine a 2D table (like a spreadsheet or a matrix):
| | Column 0 | Column 1 | Column 2 |
|---|---|---|---|
| Row 0 | 1 | 2 | 3 |
| Row 1 | 4 | 5 | 6 |
| Row 2 | 7 | 8 | 9 |
- axis=0 (Sum along the columns): You move down the columns.
  - Sum of Column 0: 1 + 4 + 7 = 12
  - Sum of Column 1: 2 + 5 + 8 = 15
  - Sum of Column 2: 3 + 6 + 9 = 18
  - The result is a 1D array: [12, 15, 18]
- axis=1 (Sum along the rows): You move across the rows.
  - Sum of Row 0: 1 + 2 + 3 = 6
  - Sum of Row 1: 4 + 5 + 6 = 15
  - Sum of Row 2: 7 + 8 + 9 = 24
  - The result is a 1D array: [6, 15, 24]
Mnemonic: axis=1 is the second dimension (the columns). Summing along it collapses the columns, leaving you with one sum for each row.
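One way to check this rule is to look at shapes: summing along an axis removes that axis from the result. The sketch below uses NumPy (covered in the next section) and an illustrative non-square array named `table`, which is an assumption for this example and not part of the table above.

```python
import numpy as np

# Illustrative 2x3 array (2 rows, 3 columns); not the 3x3 table above.
table = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

col_sums = table.sum(axis=0)  # collapses axis 0 (the rows): one value per column
row_sums = table.sum(axis=1)  # collapses axis 1 (the columns): one value per row

print(table.shape)     # (2, 3)
print(col_sums.shape)  # (3,)
print(row_sums.shape)  # (2,)
print(col_sums)        # [5 7 9]
print(row_sums)        # [ 6 15]
```

Using a non-square array makes it easy to see which dimension disappeared, which a 3x3 example cannot show.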
Using NumPy
NumPy is the fundamental library for numerical operations in Python. Its axis parameter is consistent and powerful.
Example:
import numpy as np
# Create a 2D NumPy array
data = np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
])
print("Original Array:")
print(data)
# Original Array:
# [[1 2 3]
# [4 5 6]
# [7 8 9]]
# Sum along axis 1 (sum each row)
row_sums = np.sum(data, axis=1)
print("\nSum along axis 1 (row sums):")
print(row_sums)
# Sum along axis 1 (row sums):
# [ 6 15 24]
Common NumPy Functions with axis=1
Most reduction functions in NumPy work the same way:
# Sum of squares for each row
row_sums_of_squares = np.sum(data**2, axis=1)
print("Sum of squares for each row:", row_sums_of_squares)
# Sum of squares for each row: [ 14 77 194]
# Mean of each row
row_means = np.mean(data, axis=1)
print("Mean of each row:", row_means)
# Mean of each row: [2. 5. 8.]
# Maximum value in each row
row_maxes = np.max(data, axis=1)
print("Max value in each row:", row_maxes)
# Max value in each row: [3 6 9]
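A related option that may be useful here is `keepdims=True`, which keeps the reduced axis as a length-1 dimension so the result can be combined with the original array. The following is a minimal sketch, assuming you want each row expressed as fractions of its own total; the names `row_totals` and `row_fractions` are illustrative.

```python
import numpy as np

data = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# keepdims=True keeps the summed axis with length 1, so the shape is (3, 1)
row_totals = data.sum(axis=1, keepdims=True)

# Broadcasting divides every element by the total of its own row
row_fractions = data / row_totals

print(row_totals.shape)           # (3, 1)
print(row_fractions.sum(axis=1))  # every row now sums to 1 (up to rounding)
```

Without `keepdims=True`, the row totals would have shape `(3,)` and would broadcast against the columns instead of the rows.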
Using Pandas
Pandas is built on top of NumPy and is designed for data manipulation, typically using DataFrames. The concept is identical, but the syntax is slightly different.
Example:
import pandas as pd
# Create a Pandas DataFrame
df = pd.DataFrame({
'A': [1, 4, 7],
'B': [2, 5, 8],
'C': [3, 6, 9]
})
print("Original DataFrame:")
print(df)
# Original DataFrame:
# A B C
# 0 1 2 3
# 1 4 5 6
# 2 7 8 9
# Sum along axis 1 (sum each row)
# In Pandas, DataFrame.sum() defaults to axis=0 (one sum per column),
# so axis=1 must be passed explicitly to get row sums.
row_sums = df.sum(axis=1)
print("\nSum along axis 1 (row sums):")
print(row_sums)
# Sum along axis 1 (row sums):
# 0 6
# 1 15
# 2 24
# dtype: int64
Notice that the result is a Pandas Series, with the original DataFrame's index preserved.
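To make the preserved index easier to see, here is a small sketch with a non-default index. The DataFrame `scores` and the labels `x`, `y`, `z` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical row labels, used only to make the index visible
scores = pd.DataFrame(
    {"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]},
    index=["x", "y", "z"],
)

totals = scores.sum(axis=1)

print(type(totals).__name__)  # Series
print(list(totals.index))     # ['x', 'y', 'z']
print(totals["y"])            # 15
```

Because the row labels carry over, the result lines up with the original DataFrame when you assign it back as a new column.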
Common Pandas Methods with axis=1
# Add the row sums as a new column to the DataFrame
df['Row_Sum'] = df.sum(axis=1)
print("\nDataFrame with new 'Row_Sum' column:")
print(df)
# DataFrame with new 'Row_Sum' column:
# A B C Row_Sum
# 0 1 2 3 6
# 1 4 5 6 15
# 2 7 8 9 24
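# Calculate the mean for each row, using only the original columns
# (df now also contains 'Row_Sum', which would otherwise be averaged in)
row_means = df[['A', 'B', 'C']].mean(axis=1)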
print("\nMean of each row:")
print(row_means)
# Mean of each row:
# 0 2.0
# 1 5.0
# 2 8.0
# dtype: float64
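Because `Row_Sum` is stored as an ordinary column, any later row-wise aggregation over the whole DataFrame will include it. A minimal sketch of the difference, using a separate illustrative DataFrame `df_demo` and a hypothetical `value_cols` list to name the original columns:

```python
import pandas as pd

df_demo = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})
value_cols = ["A", "B", "C"]  # hypothetical helper list of the original columns

df_demo["Row_Sum"] = df_demo[value_cols].sum(axis=1)

# Averaging every column now includes Row_Sum: (1 + 2 + 3 + 6) / 4 = 3.0
mean_all = df_demo.mean(axis=1)

# Restricting to the original columns gives the intended mean: (1 + 2 + 3) / 3 = 2.0
mean_values = df_demo[value_cols].mean(axis=1)

print(mean_all.tolist())     # [3.0, 7.5, 12.0]
print(mean_values.tolist())  # [2.0, 5.0, 8.0]
```

Selecting the columns explicitly keeps the result stable no matter which derived columns are added later.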
Important: The axis=1 Trap for Beginners
A very common point of confusion is that axis=1 in Pandas often refers to columns, not rows.
- df.drop(columns_to_drop, axis=1): This drops columns. You are acting on the columns.
- df.rename(new_column_names, axis=1): This renames columns.
This seems contradictory, but it makes sense if you think of axis as the axis along which to apply the operation. When you drop a column, you are applying the drop operation along the column axis (axis=1).
Aggregation functions like sum(), mean(), and std() follow the same rule, which is also the NumPy convention: the axis you name is the one that gets collapsed.
- axis=0: Aggregate down the rows (result is one value per column).
- axis=1: Aggregate across the columns (result is one value per row).
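The two uses can be put side by side. In this sketch, `df_axis` is an illustrative DataFrame; in both calls `axis=1` names the column axis, and only what happens along it differs.

```python
import pandas as pd

df_axis = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# axis=1 names the column axis: the label "B" is looked up among the columns
without_b = df_axis.drop("B", axis=1)

# axis=1 again names the column axis: the sum runs over the columns,
# which collapses them and leaves one value per row
row_sums = df_axis.sum(axis=1)

print(list(without_b.columns))  # ['A', 'C']
print(row_sums.tolist())        # [6, 15, 24]
```

For `drop`, passing `columns="B"` instead of `axis=1` is an equivalent spelling that some readers find less ambiguous.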
Summary Table
| Operation | NumPy (np.array) | Pandas (pd.DataFrame) | Result |
|---|---|---|---|
| Sum along rows | np.sum(data, axis=1) | df.sum(axis=1) | A 1D array / Series with sum for each row |
| Sum along columns | np.sum(data, axis=0) | df.sum(axis=0) | A 1D array / Series with sum for each column |
Key Takeaway: For aggregation (sum, mean, etc.), axis=1 means sum across the columns for each row. This is the standard in both NumPy and Pandas.
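As a quick sanity check of that agreement, the sketch below runs the same sums through both libraries and compares them. The names `frame` and `values` are illustrative, and the comparison is only a spot check on this small example.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})
values = frame.to_numpy()  # the same data as a plain NumPy array

for axis in (0, 1):
    numpy_result = np.sum(values, axis=axis)
    pandas_result = frame.sum(axis=axis).to_numpy()
    # Prints the axis, the NumPy sums, and whether Pandas produced the same values
    print(axis, numpy_result, np.array_equal(numpy_result, pandas_result))

# 0 [12 15 18] True
# 1 [ 6 15 24] True
```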
