杰瑞科技汇

Python rolling ma如何实现与优化?

Of course! Calculating a rolling moving average (MA) is a common task in data analysis, especially in time series data like stock prices, sensor readings, or sales figures.

Python rolling ma如何实现与优化?-图1
(图片来源网络,侵删)

In Python, the most efficient and standard way to do this is by using the Pandas library. I'll show you the best methods, from the simplest to more advanced scenarios.

The Best & Most Common Way: Using Pandas

Pandas is the go-to library for data manipulation in Python. Its rolling() method is specifically designed for this kind of calculation.

Setup: First, install Pandas if you haven't already

pip install pandas

Example: Basic Rolling Mean

Let's create a simple Pandas Series and calculate a 3-day rolling average.

import pandas as pd
import numpy as np
# 1. Create a sample time series DataFrame
# Let's imagine daily sales data for a week
data = {'sales': [10, 12, 15, 13, 18, 20, 22]}
dates = pd.date_range(start='2025-01-01', periods=7)
df = pd.DataFrame(data, index=dates)
print("Original Data:")
print(df)
print("-" * 30)
# 2. Calculate the rolling mean
# We use .rolling(window=3) to specify the window size
# .mean() calculates the average for that window
df['rolling_3day_mean'] = df['sales'].rolling(window=3).mean()
print("Data with 3-day Rolling Mean:")
print(df)

Output:

Python rolling ma如何实现与优化?-图2
(图片来源网络,侵删)
Original Data:
            sales
2025-01-01     10
2025-01-02     12
2025-01-03     15
2025-01-04     13
2025-01-05     18
2025-01-06     20
2025-01-07     22
------------------------------
Data with 3-day Rolling Mean:
            sales  rolling_3day_mean
2025-01-01     10                NaN  # Not enough data points (less than 3)
2025-01-02     12                NaN  # Not enough data points
2025-01-03     15           12.333333  # (10+12+15)/3
2025-01-04     13           13.333333  # (12+15+13)/3
2025-01-05     18           15.333333  # (15+13+18)/3
2025-01-06     20           17.000000  # (13+18+20)/3
2025-01-07     22           20.000000  # (18+20+22)/3

Key Parameters of .rolling()

  • window: The size of the moving window (e.g., window=7 for a weekly average on daily data).
  • min_periods: The minimum number of observations required to have a non-NaN result. This is very useful.
    • The default is min_periods = window. This's why the first two values were NaN.
    • If you set min_periods=1, the calculation will start as soon as there's one value in the window.
  • center: If True, the labels are set at the center of the window. This is often useful for plotting.
    • Default is False, so the window is right-aligned (uses past values).

Example with min_periods=1 and center=True:

# Calculate a 3-day rolling mean, but start calculating as soon as we have 1 data point
# and center the window.
df['rolling_centered'] = df['sales'].rolling(window=3, min_periods=1, center=True).mean()
print("\nData with Centered Rolling Mean (min_periods=1):")
print(df)

Output:

Data with Centered Rolling Mean (min_periods=1):
            sales  rolling_3day_mean  rolling_centered
2025-01-01     10                NaN          11.000000  # (10+12)/2 (centered)
2025-01-02     12                NaN          12.333333  # (10+12+15)/3 (centered)
2025-01-03     15           12.333333          13.333333  # (12+15+13)/3 (centered)
2025-01-04     13           13.333333          15.333333  # (15+13+18)/3 (centered)
2025-01-05     18           15.333333          17.000000  # (13+18+20)/3 (centered)
2025-01-06     20           17.000000          20.000000  # (18+20+22)/3 (centered)
2025-01-07     22           20.000000          21.000000  # (20+22)/2 (centered)

Different Types of Moving Averages

Pandas' rolling() object can calculate more than just the simple mean.

  • Simple Moving Average (SMA): The average we just did. df['col'].rolling(w).mean()
  • Weighted Moving Average (WMA): Gives more weight to recent observations.
  • Exponentially Weighted Moving Average (EWMA): Gives exponentially decreasing weights to past observations. This is very common in finance (e.g., for calculating volatility).

Example: Exponentially Weighted Moving Average (EWMA)

# The `span` parameter controls the decay.
# A span of n is equivalent to an window of size n for a simple moving average in terms of smoothing.
df['ewma_3span'] = df['sales'].ewm(span=3, adjust=False).mean()
print("\nData with EWMA (span=3):")
print(df)

Output:

Python rolling ma如何实现与优化?-图3
(图片来源网络,侵删)
Data with EWMA (span=3):
            sales  rolling_3day_mean  rolling_centered    ewma_3span
2025-01-01     10                NaN          11.000000    10.000000
2025-01-02     12                NaN          12.333333    11.000000
2025-01-03     15           12.333333          13.333333    12.333333
2025-01-04     13           13.333333          15.333333    12.777778
2025-01-05     18           15.333333          17.000000    14.444444
2025-01-06     20           17.000000          20.000000    16.370370
2025-01-07     22           20.000000          21.000000    18.370370

Alternative (Less Common) Method: Using NumPy

For very large datasets or when you can't use Pandas, you can use NumPy. It's generally less convenient because it requires you to handle the NaN values at the beginning of the array manually.

import numpy as np
# Our data as a NumPy array
data = np.array([10, 12, 15, 13, 18, 20, 22])
window_size = 3
# Calculate the rolling mean using NumPy
# We use a list comprehension to create the sliding windows
rolling_means = np.convolve(data, np.ones(window_size)/window_size, mode='valid')
# The 'valid' mode returns only the values where the window fully overlaps the data.
# This results in a shorter array.
# To match the original data length, we prepend NaNs.
nan_padding = np.full(window_size - 1, np.nan)
rolling_means_full = np.concatenate([nan_padding, rolling_means])
print("NumPy Rolling Mean:")
print(rolling_means_full)

Output:

NumPy Rolling Mean:
[        nan         nan 12.33333333 13.33333333 15.33333333 17.
        nan 20.        ]

Note: The np.convolve method is a clever trick but can be less intuitive than the Pandas approach, especially for more complex calculations.


Summary: Which Method Should You Use?

Scenario Recommended Method Why?
Almost all cases pandas.Series.rolling() It's the most readable, flexible, and efficient. Handles time series data, dates, and edge cases (like min_periods) seamlessly.
Performance-critical, no Pandas numpy.convolve Can be faster in pure NumPy environments, but you lose the convenience of Pandas' data structures and time series handling.
Learning/Algorithmic challenge Manual Python Loop Good for understanding the mechanics, but never use in production code due to poor performance on large datasets.

Final Recommendation: Stick with Pandas. It's the industry standard for a reason and will save you a lot of time and effort.

分享:
扫描分享到社交APP
上一篇
下一篇