杰瑞科技汇

How does the pandas shift function implement data shifting in Python?

The pandas.DataFrame.shift() and pandas.Series.shift() methods are useful for time-series analysis and general data manipulation. They shift data by a specified number of periods, producing a "lagged" or "leading" version of your data.


Let's break it down with clear explanations and examples.

What is shift()?

At its core, shift() moves data up or down along the index (which is often a time index). The key point is that, unless freq is passed, the index does not move: the data is re-aligned against the existing index, and the positions left empty are filled with NaN (or with fill_value).
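A minimal sketch of this re-alignment, using a made-up three-element Series (the labels and values are illustrative only): the index before and after the shift is identical, and only the values move.

```python
import pandas as pd

# Hypothetical data: three values labelled a, b, c
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
shifted = s.shift(1)

# The index labels are untouched by a plain (freq-less) shift
same_index = shifted.index.equals(s.index)

# The value that sat at "a" is now aligned with "b"
moved_value = shifted["b"]

print(same_index)
print(moved_value)
```

The slot at "a" has nothing to receive, so it holds NaN after the shift.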


Basic Syntax

The method is available on both DataFrame and Series objects.

# For a Series
Series.shift(periods=1, freq=None, axis=0, fill_value=None)
# For a DataFrame
DataFrame.shift(periods=1, freq=None, axis=0, fill_value=None)

Key Parameters:

  • periods (default: 1): The number of periods to shift. Can be positive (shift forward/down) or negative (shift backward/up).
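  • freq (default: None): A DateOffset string or object (e.g., 'D' for calendar day; 'M' for month end, spelled 'ME' from pandas 2.2 onward). Used only with a datetime-like index; when given, the index values are shifted by that offset and the data stays attached to its labels.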
  • axis (default: 0): The axis to shift along.
    • axis=0 or 'index': Shifts rows (default behavior).
    • axis=1 or 'columns': Shifts columns.
  • fill_value (default: None): The value to use to fill the new NaN values created by the shift.

Core Examples with a Series

Let's start with a simple Series.

import pandas as pd
import numpy as np
# Create a simple Series
s = pd.Series([10, 20, 30, 40, 50])
print("Original Series:")
print(s)

Output:

Original Series:
0    10
1    20
2    30
3    40
4    50
dtype: int64

Shifting Forward (Default)

Shifting forward by 1 period (periods=1) moves the data down. The last element is lost, and a NaN is introduced at the beginning. Because NaN is a floating-point value, the result is upcast from int64 to float64.

# Shift forward by 1 period (default)
shifted_forward = s.shift(1)
print("\nShifted forward by 1:")
print(shifted_forward)

Output:

Shifted forward by 1:
0     NaN
1    10.0
2    20.0
3    30.0
4    40.0
dtype: float64
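If the integer dtype needs to survive the shift, one option is pandas' nullable integer dtype. The sketch below assumes a reasonably recent pandas version that supports the "Int64" extension dtype; the data is illustrative.

```python
import pandas as pd

# Nullable integer dtype: missing values are stored as <NA>, not NaN
s_int = pd.Series([10, 20, 30], dtype="Int64")
shifted_int = s_int.shift(1)

print(shifted_int.dtype)
print(shifted_int)
```

The vacated first position holds <NA>, and the remaining values stay integers instead of becoming floats.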

Shifting Backward

Shifting backward by 1 period (periods=-1) moves the data up. The first element is lost, and a NaN is introduced at the end.

# Shift backward by 1 period
shifted_backward = s.shift(-1)
print("\nShifted backward by 1:")
print(shifted_backward)

Output:

Shifted backward by 1:
0    20.0
1    30.0
2    40.0
3    50.0
4     NaN
dtype: float64
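To see both directions at once, the following sketch places the lagged and leading versions next to the original values. The column names "previous" and "next" are chosen here for illustration and are not part of the pandas API.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# Put the original, the lag, and the lead side by side
frame = pd.DataFrame({
    "value": s,
    "previous": s.shift(1),   # lag: value from the row above
    "next": s.shift(-1),      # lead: value from the row below
})

print(frame)
```

Each row now carries its neighbours, which is the usual starting point for row-to-row comparisons.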

Using fill_value

The new NaN values can be problematic for calculations. The fill_value parameter solves this.

# Shift forward and fill new NaNs with 0
shifted_forward_filled = s.shift(1, fill_value=0)
print("\nShifted forward with fill_value=0:")
print(shifted_forward_filled)

Output:

Shifted forward with fill_value=0:
0     0
1    10
2    20
3    30
4    40
dtype: int64
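Note that the result above keeps dtype int64. As a rough comparison, the sketch below contrasts fill_value with filling after the shift using fillna(); both give the same values, but the second route passes through NaN and ends up as float64.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# Fill at shift time: no NaN is ever created, so the dtype is preserved
filled_in_shift = s.shift(1, fill_value=0)

# Fill afterwards: the intermediate NaN forces an upcast to float64
filled_after = s.shift(1).fillna(0)

print(filled_in_shift.dtype)
print(filled_after.dtype)
```

Whether 0 is a sensible fill depends on the calculation; for ratios or returns, leaving NaN in place is often the safer choice.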

Practical Use Case: Calculating Daily Returns

This is a common use case for shift() in finance. To calculate the percentage change from one day to the next, you need the previous day's price aligned with the current day's row.

# Create a time-series DataFrame
data = {'price': [100.0, 101.5, 101.2, 103.8, 105.0]}
dates = pd.to_datetime(['2025-01-01', '2025-01-02', '2025-01-03', '2025-01-04', '2025-01-05'])
df = pd.DataFrame(data, index=dates)
print("Original DataFrame:")
print(df)
# Calculate daily return: (today_price / yesterday_price) - 1
# We need yesterday's price, which is the current price shifted forward by 1.
df['daily_return'] = (df['price'] / df['price'].shift(1)) - 1
print("\nDataFrame with Daily Returns:")
print(df)

Output:

Original DataFrame:
            price
2025-01-01  100.0
2025-01-02  101.5
2025-01-03  101.2
2025-01-04  103.8
2025-01-05  105.0
DataFrame with Daily Returns:
            price  daily_return
2025-01-01  100.0           NaN
2025-01-02  101.5      0.015000
2025-01-03  101.2     -0.002956
2025-01-04  103.8      0.025692
2025-01-05  105.0      0.011561

Notice the NaN for the first row. This is correct because there's no prior day's data.
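For this particular calculation pandas also provides pct_change(), which should give the same numbers as the manual shift() formula. The sketch below rebuilds the price data as a standalone Series (the variable names are illustrative) and compares the two.

```python
import numpy as np
import pandas as pd

prices = pd.Series(
    [100.0, 101.5, 101.2, 103.8, 105.0],
    index=pd.date_range("2025-01-01", periods=5, freq="D"),
)

# Manual return: today's price divided by yesterday's, minus 1
manual = prices / prices.shift(1) - 1

# Built-in equivalent
builtin = prices.pct_change()

# equal_nan=True treats the leading NaN in both results as matching
matches = np.allclose(manual, builtin, equal_nan=True)
print(matches)
```

The explicit shift() form is still worth knowing, since it generalizes to formulas that pct_change() does not cover, such as log returns.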


shift() with a DataFrame

shift() works on DataFrames too. By default, it shifts along axis=0 (rows), moving every column by the same number of rows.

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [10, 20, 30, 40],
    'C': [100, 200, 300, 400]
})
print("Original DataFrame:")
print(df)
# Shift all columns down by 1
print("\nDataFrame shifted down by 1 (axis=0):")
print(df.shift(1))

Output:

Original DataFrame:
   A   B    C
0  1  10  100
1  2  20  200
2  3  30  300
3  4  40  400
DataFrame shifted down by 1 (axis=0):
     A     B      C
0  NaN   NaN    NaN
1  1.0  10.0  100.0
2  2.0  20.0  200.0
3  3.0  30.0  300.0
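Row shifts are often used to build several lagged columns from one source column. The sketch below is one way to do this; the frame name df_lags and the column names A_lag1 and A_lag2 are made up for the example.

```python
import pandas as pd

df_lags = pd.DataFrame({"A": [1, 2, 3, 4]})

# Add one lagged copy of column A per lag distance
for k in (1, 2):
    df_lags[f"A_lag{k}"] = df_lags["A"].shift(k)

# Rows without a full history contain NaN and can be dropped
complete = df_lags.dropna()

print(df_lags)
print(complete)
```

Only the rows with a complete two-step history remain after dropna().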

Shifting Along axis=1 (Columns)

You can also shift across columns. This is less common but can be useful for certain data manipulations.

# Shift columns to the right by 1
print("\nDataFrame shifted right by 1 (axis=1):")
print(df.shift(1, axis=1))

Output:

DataFrame shifted right by 1 (axis=1):
     A     B     C
0  NaN   1.0  10.0
1  NaN   2.0  20.0
2  NaN   3.0  30.0
3  NaN   4.0  40.0
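One situation where a column shift can help is "wide" data, where each column is a period and each row is an entity. The sketch below uses hypothetical monthly figures (floats, to keep dtypes simple) and computes the column-over-column change.

```python
import pandas as pd

# Hypothetical wide table: one row per item, one column per month
wide = pd.DataFrame(
    {"jan": [1.0, 2.0], "feb": [1.5, 2.5], "mar": [3.0, 2.0]},
    index=["x", "y"],
)

# Subtract the previous column from each column
change = wide - wide.shift(1, axis=1)

print(change)
```

The first column has no predecessor, so it is all NaN. For a plain difference like this, wide.diff(axis=1) is a shorter spelling.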

shift() with Time Series (freq parameter)

When your index is a DatetimeIndex, you can use the freq parameter to shift by actual time units (days, months, etc.) instead of just by row number.

# Create a time-series with a daily frequency
ts = pd.Series(
    [1, 2, 3, 4],
    index=pd.to_datetime(['2025-01-01', '2025-01-02', '2025-01-03', '2025-01-04'])
)
print("Original Time Series:")
print(ts)
# Shift by 1 calendar day
# Note: The index itself changes!
shifted_by_day = ts.shift(freq='1D')
print("\nShifted by 1 calendar day:")
print(shifted_by_day)
# Shift by 2 calendar days
shifted_by_2days = ts.shift(freq='2D')
print("\nShifted by 2 calendar days:")
print(shifted_by_2days)

Output:

Original Time Series:
2025-01-01    1
2025-01-02    2
2025-01-03    3
2025-01-04    4
dtype: int64
Shifted by 1 calendar day:
2025-01-02    1
2025-01-03    2
2025-01-04    3
2025-01-05    4
dtype: int64
Shifted by 2 calendar days:
2025-01-03    1
2025-01-04    2
2025-01-05    3
2025-01-06    4
dtype: int64

Key Difference with freq: When freq is used, the index values are shifted, not just the data. This is essential for aligning data with real-world time intervals, like shifting a monthly report to the end of the following month.
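The difference matters most when the dates have gaps. In the sketch below (illustrative data with 2025-01-03 missing), shift(1) returns the previous row regardless of its date, while shifting by freq and reindexing back onto the original dates returns the value from exactly one calendar day earlier, or NaN when that day is absent.

```python
import pandas as pd

# Irregular series: 2025-01-03 is missing
ts_gap = pd.Series(
    [1.0, 2.0, 4.0],
    index=pd.to_datetime(["2025-01-01", "2025-01-02", "2025-01-04"]),
)

# Previous row, whatever its date
prev_row = ts_gap.shift(1)

# Value from exactly one day earlier, aligned to the original dates
prev_day = ts_gap.shift(freq="D").reindex(ts_gap.index)

print(prev_row)
print(prev_day)
```

On 2025-01-04, prev_row holds the value from 2025-01-02, whereas prev_day is NaN because there is no observation for 2025-01-03.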


Summary Table

Goal | Method | Key Point
Create a lag (previous value) | df['col'].shift(1) | Most common use case. Introduces NaN at the start.
Create a lead (next value) | df['col'].shift(-1) | Introduces NaN at the end.
Fill new NaNs | df['col'].shift(1, fill_value=0) | Avoids NaN in the vacated positions and preserves the dtype.
Calculate difference (delta) | df['col'].diff() | An alternative to shift() for simple differences. diff() = col - col.shift(1)
Shift rows in a DataFrame | df.shift(1) | Shifts all columns down by one row.
Shift columns in a DataFrame | df.shift(1, axis=1) | Shifts all columns to the right by one.
Shift by calendar time | ts.shift(freq='1D') | Changes the index itself. Requires a datetime-like index such as a DatetimeIndex.

Final Tip: shift() vs. diff()

For calculating the difference between consecutive rows, df.diff() is a more direct and often more readable alternative to df - df.shift(1).

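# Assumes df is the price DataFrame from the daily-returns example (df was reassigned in the A/B/C example)
df['price_change'] = df['price'].diff(1)  # Equivalent to df['price'] - df['price'].shift(1)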
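The same equivalence should hold for other lag distances. The following self-contained sketch checks it for a two-period difference on illustrative price data.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 101.5, 101.2, 103.8, 105.0])

# Two-period difference, written both ways
via_diff = prices.diff(2)
via_shift = prices - prices.shift(2)

# equal_nan=True treats the two leading NaNs as matching
same = np.allclose(via_diff, via_shift, equal_nan=True)
print(same)
```

Both versions leave NaN in the first two positions, where no value exists two rows back.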