What Are the Common Methods for Time Series Analysis in Python? - 杰瑞科技汇

Time series analysis in Python is a broad topic and a fundamental skill in data science, finance, and many scientific fields. This guide covers the essential concepts, the main libraries, and a practical workflow.

What is a Time Series?

A time series is a sequence of data points recorded or indexed in time order. A familiar example is a stock's price over time, but it can be anything measured at successive points in time:

Daily, hourly, or minute-by-minute weather temperature
Monthly sales figures
Yearly population counts
Sensor readings from an IoT device
Audio signals

The key characteristic is that the order of the data points matters.

Core Concepts in Time Series Analysis

Before diving into code, it's crucial to understand these concepts:

Time-Based Indexing: The data should have a time-based index (e.g., a `DatetimeIndex` in Pandas). This allows for powerful time-specific operations like resampling and slicing.
Trend: The long-term progression of the series (e.g., increasing sales over several years).
Seasonality: Periodic patterns that repeat at fixed, known intervals (e.g., ice cream sales peaking every summer). Cycles without a fixed period, such as business cycles, are usually treated separately.
Stationarity: A time series is stationary if its statistical properties (mean, variance, autocorrelation) are constant over time. Most classic time series models (like ARIMA) assume the data is stationary. Many real-world series are not, so they must be transformed (e.g., by differencing or taking logarithms).
Autocorrelation: The correlation of a time series with a lagged (time-shifted) copy of itself. For example, today's stock price is likely correlated with yesterday's price.
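
To make the indexing and autocorrelation ideas concrete, here is a minimal sketch on synthetic data. The series, the seed, and the variable names are invented for illustration and are not part of the airline dataset used later. It shows partial-string slicing, resampling to monthly means, and lag autocorrelation with `Series.autocorr`.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: upward trend + weekly cycle + noise (illustrative only)
rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=365, freq="D")
days = np.arange(365)
values = 0.05 * days + 2 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 0.5, 365)
ts = pd.Series(values, index=idx, name="reading")

# Time-based slicing: a partial date string selects the whole period
march = ts.loc["2023-03"]
print(len(march))  # one value per day in March

# Resampling: aggregate daily values into monthly means (labelled by month start)
monthly_mean = ts.resample("MS").mean()
print(monthly_mean.head(3))

# Autocorrelation: correlation of the series with itself shifted by k days
print(ts.autocorr(lag=1), ts.autocorr(lag=7))
```

Because the synthetic series has a weekly cycle, the lag-7 autocorrelation should come out higher than at lags that fall between cycle peaks, such as lag 3.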

Key Python Libraries for Time Series

Here are the main tools you'll use, with a brief explanation of each.

Library	Purpose	Key Features
Pandas	Core Data Manipulation	The foundation. Provides `DatetimeIndex`, powerful time-series slicing, resampling, rolling windows, and basic plotting.
NumPy	Numerical Computing	The engine under Pandas. Handles the n-dimensional arrays used to store time series data efficiently.
Matplotlib & Seaborn	Data Visualization	Matplotlib is the foundational plotting library. Seaborn provides high-level, aesthetically pleasing statistical plots.
Statsmodels	Statistical Modeling	The go-to library for classical time series analysis. Includes ARIMA, SARIMAX, seasonal decomposition, and statistical tests for stationarity.
Scikit-learn	Machine Learning	Used for applying machine learning models (like Random Forests or Gradient Boosting) to time series data, often after feature engineering.
Prophet	Forecasting (by Meta/FB)	A high-level library designed for business time series with strong seasonality and holiday effects. It's very easy to use and robust.
Darts	Advanced Forecasting	A modern library that provides a unified API for multiple forecasting models (including deep learning like N-BEATS, TFT, and Transformer models).

A Practical Time Series Workflow: From Data to Forecast

Let's walk through a complete example using Pandas, Matplotlib, and statsmodels. We'll analyze and forecast a classic dataset: the number of international airline passengers per month.

Step 1: Setup and Data Loading

First, make sure you have the necessary libraries installed:

pip install pandas numpy matplotlib statsmodels

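Now, let's load the data. statsmodels can fetch it from the Rdatasets collection with get_rdataset, which downloads the file on first use and therefore needs an internet connection.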

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
# Load the dataset
# The dataset is a classic example from Box & Jenkins (1976)
# It contains monthly totals of international airline passengers from 1949 to 1960.
df = sm.datasets.get_rdataset('AirPassengers').data
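# Convert to a proper time series object
# The 'time' column holds fractional years (e.g., 1949.0, 1949.0833), not date strings,
# so build the monthly dates directly instead of parsing that column.
df['date'] = pd.date_range(start='1949-01-01', periods=len(df), freq='MS')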
df.set_index('date', inplace=True)
df.drop('time', axis=1, inplace=True)
# Rename the value column for clarity
df.rename(columns={'value': 'passengers'}, inplace=True)
print(df.head())
print("\nData Info:")
df.info()

Output:

            passengers
date
1949-01-01         112
1949-02-01         118
1949-03-01         132
1949-04-01         129
1949-05-01         121

Data Info:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 144 entries, 1949-01-01 to 1960-12-01
Data columns (total 1 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   passengers  144 non-null    int64
dtypes: int64(1)
memory usage: 2.2 KB

Notice the DatetimeIndex. This is the key to all time series operations in Pandas.

Step 2: Visualization and Exploration

Always plot your data first to get a visual understanding.

# Plot the time series
plt.figure(figsize=(12, 6))
plt.plot(df.index, df['passengers'], label='Monthly Passengers')
plt.title('International Airline Passengers (1949-1960)')
plt.xlabel('Date')
plt.ylabel('Number of Passengers')
plt.legend()
plt.grid(True)
plt.show()

You will immediately see an upward trend and a clear yearly seasonality (the peaks repeat every 12 months).
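
One way to back up the visual impression is to compute rolling statistics. The sketch below uses a synthetic monthly series (invented here so that it runs without a download) and a 12-month window; the names `series`, `rolling_mean`, and `rolling_std` are illustrative. On the airline data, the same calls could be applied to `df['passengers']`.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series with a growing level and growing seasonal swings (illustrative)
idx = pd.date_range("2000-01-01", periods=72, freq="MS")
t = np.arange(72)
level = 100 + 2.0 * t
series = pd.Series(level * (1 + 0.2 * np.sin(2 * np.pi * t / 12)), index=idx)

# A 12-month window averages over one full seasonal cycle, leaving mostly the trend
rolling_mean = series.rolling(window=12).mean()
# The rolling standard deviation shows whether the spread grows over time
rolling_std = series.rolling(window=12).std()

summary = pd.DataFrame(
    {"value": series, "rolling_mean": rolling_mean, "rolling_std": rolling_std}
)
# Compare the first and last complete windows
print(summary.dropna().iloc[[0, -1]])
```

A rising rolling mean points to a trend, and a rising rolling standard deviation suggests that the swings scale with the level, which is one reason to prefer a multiplicative model or a log transform.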

Step 3: Decomposing the Time Series

We can use statsmodels to break the series into its constituent parts: trend, seasonality, and residuals (noise).

# Decompose the time series
# The period is 12 for yearly seasonality
decomposition = sm.tsa.seasonal_decompose(df['passengers'], model='multiplicative', period=12)
# Plot the decomposition
fig = decomposition.plot()
fig.set_size_inches(12, 8)
plt.show()

The plot will show four subplots:

Observed: The original data.
Trend: The long-term progression, showing a clear upward trend.
Seasonal: The repeating yearly pattern.
Residual: The "noise" left over after removing the trend and seasonality.
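
To show roughly what a multiplicative decomposition does, here is a hand-rolled sketch of the classical approach using only Pandas and NumPy. It is a simplified illustration on synthetic data, not a reproduction of the statsmodels implementation, and all variable names are invented for this example.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend times a fixed seasonal pattern (illustrative)
idx = pd.date_range("2000-01-01", periods=96, freq="MS")
t = np.arange(96)
true_seasonal = 1 + 0.3 * np.sin(2 * np.pi * t / 12)
observed = pd.Series((50 + 1.5 * t) * true_seasonal, index=idx)

# 1. Trend: centered 2x12 moving average (12-term mean, then 2-term mean, then re-center)
trend = observed.rolling(window=12).mean().rolling(window=2).mean().shift(-6)

# 2. Detrend by division, since the model is observed = trend * seasonal * residual
detrended = observed / trend

# 3. Seasonal factors: average the detrended values per calendar month, rescale to mean 1
monthly_factor = detrended.groupby(detrended.index.month).mean()
monthly_factor = monthly_factor / monthly_factor.mean()
seasonal = pd.Series(monthly_factor.loc[observed.index.month].to_numpy(), index=idx)

# 4. Residual: what trend and seasonality do not explain (near 1 when the fit is good)
residual = observed / (trend * seasonal)

print(monthly_factor.round(3))
print(residual.dropna().describe())
```

The first and last six months have no trend estimate because the centered window runs off the ends of the series; the decomposition plot in this step shows the same kind of gaps.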

Step 4: Checking for Stationarity

Many models require stationary data. We can use the Augmented Dickey-Fuller (ADF) test from statsmodels.

Null Hypothesis (H0): The time series is non-stationary.
Alternative Hypothesis (H1): The time series is stationary.

If the p-value is less than a significance level (e.g., 0.05), we reject the null hypothesis and conclude the series is stationary.

from statsmodels.tsa.stattools import adfuller
def check_stationarity(timeseries):
    result = adfuller(timeseries, autolag='AIC')
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    print('Critical Values:')
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))
print("Stationarity Check on Original Data:")
check_stationarity(df['passengers'])

Output:

ADF Statistic: 0.815369
p-value: 0.991880
Critical Values:
    1%: -3.481
    5%: -2.886
    10%: -2.579

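The p-value is very high (0.99), so we fail to reject the null hypothesis: the test gives no evidence of stationarity, which matches the visible trend and seasonality. We therefore treat the series as non-stationary.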

Step 5: Making the Data Stationary (Differencing)

A common technique to make a series stationary is differencing—subtracting the previous observation from the current one.

# First-order differencing (the first value is NaN because it has no predecessor)
df['passengers_diff'] = df['passengers'].diff()
# Plot the differenced data
plt.figure(figsize=(12, 6))
plt.plot(df.index, df['passengers_diff'], label='Differenced Passengers')
plt.title('Differenced Monthly Passengers')
plt.xlabel('Date')
plt.ylabel('Difference')
plt.legend()
plt.grid(True)
plt.show()
# Check stationarity again
print("\nStationarity Check on Differenced Data:")
check_stationarity(df['passengers_diff'].dropna())

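The plot of the differenced data no longer shows an upward trend, although the seasonal swings remain and grow over time. For this dataset the ADF p-value after a single difference typically lands close to the 0.05 threshold, so the result is borderline rather than a clear confirmation of stationarity. A log transform (to stabilize the variance) and seasonal differencing at lag 12 are commonly applied in addition.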
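
The transformations mentioned in this step can be sketched on a synthetic series with multiplicative trend and seasonality. The data and names below are invented for illustration; on the airline data the same calls could be applied to `df['passengers']`.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series with exponential growth and proportional seasonality (illustrative)
idx = pd.date_range("2000-01-01", periods=60, freq="MS")
t = np.arange(60)
y = pd.Series(100 * np.exp(0.01 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12)), index=idx)

log_y = np.log(y)                  # stabilizes variance that grows with the level
seasonal_diff = log_y.diff(12)     # removes the repeating 12-month pattern
both_diff = seasonal_diff.diff(1)  # removes what is left of the trend

# Near zero here, because this toy series contains no noise
print(both_diff.dropna().abs().max())

# Differencing is reversible: undo a first difference with a cumulative sum
first_diff = log_y.diff(1)
restored = np.exp(first_diff.fillna(log_y.iloc[0]).cumsum())
print(np.allclose(restored, y))
```

The reversibility matters in practice: a model fitted on differenced data produces forecasts of differences, which have to be summed back (and exponentiated, if logs were taken) to return to the original scale. ARIMA implementations do this internally when differencing is specified through `d` and `D`.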

Step 6: Forecasting with a Model (ARIMA Example)

ARIMA (AutoRegressive Integrated Moving Average) is a classic model. The "I" stands for "Integrated," which refers to the differencing step we just performed.

We'll use auto_arima from the pmdarima library, which automatically finds the best parameters (p, d, q) for the model.

pip install pmdarima

from pmdarima import auto_arima
# Use auto_arima to search for a good seasonal ARIMA model
# Pass the original (undifferenced) series: d=1 tells the model to difference it internally,
# so we do not difference by hand first. auto_arima searches over p, q, P, and Q.
# We also account for yearly seasonality with m=12
model = auto_arima(df['passengers'],
                   seasonal=True,
                   m=12,
                   d=1,           # One non-seasonal difference, applied inside the model
                   D=1,           # One seasonal difference (lag 12)
                   trace=True,
                   error_action='ignore',
                   suppress_warnings=True)
print(model.summary())

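auto_arima tests various combinations and reports the one with the lowest information criterion (AIC by default). For this data the result is a seasonal ARIMA model; the exact orders can vary with the pmdarima version and search settings. The classic "airline model", SARIMAX(0,1,1)(0,1,1,12), is a well-known fit for this series.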

Step 7: Evaluating the Forecast

Now, let's re-fit the model on a training split, make predictions, and compare them with held-out data.

# Split data into training and testing sets
# Use the last 12 months for testing
train = df['passengers'].iloc[:-12]
test = df['passengers'].iloc[-12:]
# Re-fit the model on the training data
final_model = auto_arima(train, seasonal=True, m=12, trace=False, suppress_warnings=True)
# Make predictions
forecast = final_model.predict(n_periods=12)
# Create a DataFrame for the forecast
# np.asarray drops any index on the prediction so the values align with test.index by position
forecast_df = pd.DataFrame({'Forecast': np.asarray(forecast)}, index=test.index)
# Plot the results
plt.figure(figsize=(12, 6))
plt.plot(train.index, train, label='Training Data')
plt.plot(test.index, test, label='Actual Test Data')
plt.plot(forecast_df.index, forecast_df['Forecast'], label='Forecast', color='red')
plt.title('Airline Passengers Forecast vs Actual')
plt.xlabel('Date')
plt.ylabel('Number of Passengers')
plt.legend()
plt.grid(True)
plt.show()

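You will see a plot where the red forecast line closely follows the actual test data line, which suggests the model captures both the trend and the seasonality. An error metric such as MAE or RMSE on the test set gives a more objective check than the plot alone.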
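
A visual check can be complemented with numeric error metrics. The helper below is a minimal sketch; `forecast_errors` is a hypothetical function written for this example, and the numbers passed to it are toy values rather than results from the airline model.

```python
import numpy as np

def forecast_errors(actual, predicted):
    """Return MAE, RMSE, and MAPE (in percent) for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        # MAPE divides by the actual values, so it assumes none of them are zero
        "MAPE": float(np.mean(np.abs(err / actual)) * 100),
    }

# Toy numbers, invented for illustration
actual = [100, 110, 120, 130]
predicted = [98, 113, 118, 135]
metrics = forecast_errors(actual, predicted)
print(metrics)
```

With the variables from this step, the call would be `forecast_errors(test, forecast_df['Forecast'])`. RMSE penalizes large misses more heavily than MAE, while MAPE is scale-free and easier to compare across series.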

Advanced Topics

Deep Learning for Time Series: For complex patterns, models like LSTMs (Long Short-Term Memory networks) and Transformers are very powerful. Libraries like Darts, TensorFlow, and PyTorch are used for this.
Feature Engineering: Create new features from the time index, such as:
- df['month'] = df.index.month
- df['year'] = df.index.year
- df['day_of_week'] = df.index.dayofweek
These can be fed into machine learning models like scikit-learn's RandomForestRegressor.
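
Calendar features are often combined with lagged values of the target. The sketch below builds such a feature table on a synthetic monthly series; the column names (`lag_1`, `lag_12`, `roll_mean_3`) and the data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly target (illustrative); with the airline data this would be df['passengers']
idx = pd.date_range("2000-01-01", periods=36, freq="MS")
data = pd.DataFrame({"y": np.arange(36, dtype=float)}, index=idx)

# Calendar features derived from the DatetimeIndex
data["month"] = data.index.month
data["year"] = data.index.year

# Lag features: past values of the target, shifted so each row only sees the past
data["lag_1"] = data["y"].shift(1)
data["lag_12"] = data["y"].shift(12)
# Mean of the previous 3 months (shift first so the current value does not leak in)
data["roll_mean_3"] = data["y"].shift(1).rolling(window=3).mean()

# Rows without a full history contain NaN and are dropped before modeling
features = data.dropna()
X, y = features.drop(columns="y"), features["y"]
print(X.head(3))
```

When such a table is passed to a regressor, the train/test split should stay chronological (no shuffling), otherwise the model is evaluated on periods that precede some of its training data.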

Prophet for Business Forecasts: If you have business data with holidays and known events, Prophet is incredibly simple and effective.

# from prophet import Prophet
# df_prophet = df.reset_index().rename(columns={'date': 'ds', 'passengers': 'y'})
# model = Prophet(seasonality_mode='multiplicative')
# model.fit(df_prophet)
# future = model.make_future_dataframe(periods=12, freq='MS')  # 'MS' = month start, matching the data
# forecast = model.predict(future)
# model.plot(forecast)
