How to Resample OHLC Data in Python - 杰瑞科技汇

Resampling OHLC (Open, High, Low, Close) data is a fundamental task in financial analysis with Python. The primary library for this is pandas, whose built-in .resample() method handles it directly.


Here’s a complete guide covering:

Why Resample OHLC Data? (Common use cases)
The Core Challenge: Why you can't just use a standard aggregation function like mean().
The Solution: Using .resample() with custom aggregation functions.
Complete Code Examples: From creating sample data to performing various resampling tasks.
Alternative (More Flexible) Method: Using pd.Grouper.

Why Resample OHLC Data?

Traders and analysts often work with data at different frequencies. Resampling allows you to convert data from one time frame to another.

From Lower to Higher Frequency (Upsampling):
- Convert 1-minute data to 10-second data. The new, finer intervals have no observations of their own, so values are usually forward-filled or interpolated. This adds rows but no new market information.
From Higher to Lower Frequency (Downsampling):
- Convert 1-minute data to 5-minute data. This is the most common use case. You need to calculate the Open, High, Low, and Close for each new 5-minute interval based on the original 1-minute data within it.
- Convert hourly data to daily data.
- Convert daily data to weekly or monthly data.
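Upsampling is not covered by the examples further down, so here is a minimal sketch of the two usual fill strategies. The three 1-minute closing prices are hypothetical. Note that the resulting 10-second values are filled-in estimates, not real 10-second bars.

```python
import pandas as pd

# Hypothetical 1-minute closing prices (three bars), for illustration only
close_1min = pd.Series(
    [150.0, 151.2, 150.6],
    index=pd.date_range("2025-10-27 09:30", periods=3, freq="1min"),
    name="close",
)

# Upsample to 10-second steps. The new rows have no data of their own,
# so either carry the last known price forward ...
close_10s_ffill = close_1min.resample("10s").ffill()

# ... or insert empty rows with asfreq() and interpolate linearly between known prices
close_10s_interp = close_1min.resample("10s").asfreq().interpolate()

print(close_10s_ffill.head(7))
print(close_10s_interp.head(7))
```

Forward-filling is the safer default for prices because it never invents a value that was not observed; interpolation produces smoother series but implies trades at prices that may never have printed.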

The Core Challenge: Aggregation is Not Simple

If you have a list of numbers and want to find the average, you just sum them and divide by the count. OHLC data is different.


Open: The price of the first trade in the new period.
Close: The price of the last trade in the new period.
High: The maximum price reached during the new period.
Low: The minimum price reached during the new period.

A simple mean() or sum() doesn't make sense for these columns. You must apply specific aggregation functions to each column.
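To make this concrete, the sketch below aggregates three hand-written (hypothetical) 1-minute bars into one 3-minute bar, first with mean() and then with one rule per column.

```python
import pandas as pd

# Three hypothetical 1-minute bars (values chosen by hand for illustration)
bars = pd.DataFrame(
    {
        "open":  [100.0, 102.0, 101.0],
        "high":  [103.0, 104.0, 102.5],
        "low":   [99.0, 101.0, 100.0],
        "close": [102.0, 101.0, 102.0],
    },
    index=pd.date_range("2025-10-27 09:30", periods=3, freq="1min"),
)

# Naive approach: average every column into one 3-minute bar
naive = bars.resample("3min").mean()

# Correct approach: each column gets the rule that matches its meaning
correct = bars.resample("3min").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last"}
)

print(naive)
print(correct)
```

The averaged high (about 103.17) understates the true high of 104, and the averaged low (100.0) overstates the true low of 99, so the naive bar hides the actual trading range.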

The Solution: `resample().agg()`

The pandas solution is a two-step process:

.resample(): Returns a Resampler object that groups your time series data into bins (e.g., 5-minute bins) based on its DatetimeIndex.
.agg(): This method applies one or more aggregation functions to each column of the grouped data.

You provide a dictionary to .agg() where the keys are the column names and the values are the aggregation functions to use.
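A small sketch of the two steps in isolation, using hypothetical prices. It also shows the "one or more functions" case: passing a list for a column produces one output column per function, under a two-level column index.

```python
import pandas as pd

# Hypothetical 1-minute prices (two columns only, to keep the output small)
prices = pd.DataFrame(
    {"open": [10.0, 11.0, 12.0, 13.0], "close": [11.0, 12.0, 13.0, 14.0]},
    index=pd.date_range("2025-10-27 09:30", periods=4, freq="1min"),
)

# Step 1: resample() only defines 2-minute bins; nothing is aggregated yet
resampler = prices.resample("2min")
print(type(resampler).__name__)

# Step 2: agg() applies the function named for each column
bars_2min = resampler.agg({"open": "first", "close": "last"})

# A list of functions for one column yields one output column per function
bars_multi = resampler.agg({"open": "first", "close": ["last", "mean"]})

print(bars_2min)
print(bars_multi)
```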

Complete Code Examples

Let's walk through a full example.


Step 1: Setup and Create Sample Data

First, let's install pandas if you haven't already and create some sample 1-minute OHLC data.

pip install pandas

```python
import pandas as pd
import numpy as np

# Create a date range for our sample data:
# 1-minute timestamps for one trading session on a weekday (Monday 2025-10-27).
# A weekday matters: with the 'B' frequency used later, weekend data would be
# rolled into the previous Friday's bin.
date_rng = pd.date_range(start='2025-10-27 09:30:00', end='2025-10-27 16:00:00', freq='1min')

# Create a DataFrame with random OHLC data
# In a real scenario, you would load this from a CSV or API
np.random.seed(42)  # for reproducibility
n = len(date_rng)
ohlc_data = pd.DataFrame({
    'open': np.random.uniform(150, 155, n),
    'high': np.random.uniform(155, 160, n),
    'low': np.random.uniform(148, 153, n),
    'close': np.random.uniform(151, 157, n),
    'volume': np.random.randint(1000, 10000, n)
}, index=date_rng)

# Enforce OHLC consistency: high >= open and close, low <= open and close
ohlc_data['high'] = ohlc_data[['open', 'high', 'close']].max(axis=1)
ohlc_data['low'] = ohlc_data[['open', 'low', 'close']].min(axis=1)

print("--- Original 1-Minute Data ---")
print(ohlc_data.head())
```
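The sample above uses random numbers. With real data, the main requirement is a DatetimeIndex, because .resample() bins on the index. A minimal sketch, assuming a CSV whose timestamp column is named timestamp (a hypothetical layout; adjust the names to your source), with io.StringIO standing in for a file:

```python
import io
import pandas as pd

# Stand-in for a real file; the column names and timestamp format are assumptions
csv_text = """timestamp,open,high,low,close,volume
2025-10-27 09:30:00,150.10,150.40,149.90,150.20,1200
2025-10-27 09:31:00,150.20,150.60,150.00,150.50,900
2025-10-27 09:32:00,150.50,150.70,150.30,150.35,1500
"""

ohlc_from_csv = pd.read_csv(
    io.StringIO(csv_text),      # replace with a file path, e.g. "bars.csv" (hypothetical)
    parse_dates=["timestamp"],  # convert the text timestamps to datetime64
    index_col="timestamp",      # resample() needs a DatetimeIndex
)

print(ohlc_from_csv.index.dtype)
print(ohlc_from_csv.head())
```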

Step 2: Resample to 5-Minute Bars (Downsampling)

This is the most common and important operation. We want to create 5-minute OHLC bars from our 1-minute data.

```python
# Define the aggregation rules for each column
agg_rules = {
    'open': 'first',      # The first 'open' in the 5-min period
    'high': 'max',        # The highest 'high' in the 5-min period
    'low': 'min',         # The lowest 'low' in the 5-min period
    'close': 'last',      # The last 'close' in the 5-min period
    'volume': 'sum'       # The total volume in the 5-min period
}
# Resample the data to 5-minute intervals
# ('5min'; the older '5T' alias is deprecated as of pandas 2.2)
five_min_bars = ohlc_data.resample('5min').agg(agg_rules)
print("\n--- Resampled 5-Minute Data ---")
print(five_min_bars.head())
```

Explanation of Aggregation Functions:

'first' for open: Gets the first value of the open column within each 5-minute bin.
'last' for close: Gets the last value of the close column.
'max' for high: Gets the maximum value.
'min' for low: Gets the minimum value.
'sum' for volume: Adds up the volume of all the bars in the period.
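If you start from trade-by-trade (tick) prices instead of existing bars, there is only one price column, and pandas' Resampler.ohlc() derives all four columns from it at once. A sketch with hypothetical ticks; volume still needs its own 'sum' rule:

```python
import pandas as pd

# Hypothetical tick data: one price and trade size per trade
ticks = pd.DataFrame(
    {
        "price": [100.0, 100.5, 99.8, 100.2, 100.9, 100.4],
        "size":  [10, 5, 20, 15, 8, 12],
    },
    index=pd.to_datetime([
        "2025-10-27 09:30:05", "2025-10-27 09:30:40", "2025-10-27 09:31:10",
        "2025-10-27 09:31:55", "2025-10-27 09:32:20", "2025-10-27 09:32:59",
    ]),
)

# ohlc() returns open/high/low/close columns built from the single price series
tick_bars = ticks["price"].resample("1min").ohlc()

# Volume is aggregated separately and aligned on the same bin labels
tick_bars["volume"] = ticks["size"].resample("1min").sum()
print(tick_bars)
```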

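Common Time Aliases for Resampling:

min for minutes (the older T is deprecated as of pandas 2.2)
h for hours (the older H is deprecated as of pandas 2.2)
D for calendar days
B for business days (Mon-Fri)
W for weekly, with weeks ending on Sunday (W-FRI ends them on Friday)
ME for month-end (M before pandas 2.2)
YE for year-end (Y before pandas 2.2)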
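Besides the frequency, two .resample() parameters decide which edge of a bin is included (closed) and which timestamp labels it (label). Data vendors differ on whether a bar's timestamp marks its start or its end, so check which convention your source uses. A sketch with hypothetical 1-minute closes stamped 09:31 through 09:40:

```python
import pandas as pd

# Hypothetical 1-minute closes, one per minute from 09:31 to 09:40
closes = pd.Series(
    range(1, 11),
    index=pd.date_range("2025-10-27 09:31", periods=10, freq="1min"),
    dtype="float64",
)

# Default for minute bins: closed='left', label='left',
# so 09:31-09:34 fall in [09:30, 09:35), labelled 09:30
left_bars = closes.resample("5min").last()

# closed='right', label='right': 09:31-09:35 fall in (09:30, 09:35], labelled 09:35
right_bars = closes.resample("5min", closed="right", label="right").last()

print(left_bars)
print(right_bars)
```

With the defaults the same ten minutes produce three bins (the 09:40 row starts a new one), while the right-closed version produces two bins labelled by their end times.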

Step 3: Resample to Daily Bars

The process is identical; only the resampling frequency changes.

```python
# Resample the data to daily (business day) intervals
daily_bars = ohlc_data.resample('B').agg(agg_rules)
print("\n--- Resampled Daily Data ---")
print(daily_bars.head())
```
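With data spanning more than one day, the choice between 'D' and 'B' matters. A minimal sketch with hypothetical bars on Friday 2025-10-24 and Monday 2025-10-27: 'D' creates empty Saturday and Sunday rows (NaN prices, zero volume), while 'B' skips them.

```python
import pandas as pd

# Hypothetical intraday bars on a Friday and the following Monday (no weekend data)
idx = pd.to_datetime(["2025-10-24 10:00", "2025-10-24 15:00",
                      "2025-10-27 10:00", "2025-10-27 15:00"])
two_days = pd.DataFrame(
    {"open": [100.0, 101.0, 99.0, 98.5],
     "high": [101.5, 102.0, 99.5, 99.0],
     "low": [99.5, 100.5, 98.0, 97.5],
     "close": [101.0, 101.8, 98.7, 98.2],
     "volume": [500, 700, 650, 400]},
    index=idx,
)
rules = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}

calendar_daily = two_days.resample("D").agg(rules)  # Fri, Sat, Sun, Mon
business_daily = two_days.resample("B").agg(rules)  # Fri, Mon only

print(calendar_daily)
print(business_daily)
```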

Step 4: Resample to Hourly Bars

```python
# Resample the data to hourly intervals
# ('h'; the older 'H' alias is deprecated as of pandas 2.2)
hourly_bars = ohlc_data.resample('h').agg(agg_rules)
print("\n--- Resampled Hourly Data ---")
print(hourly_bars.head())
```
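Hourly bins are aligned to the top of the hour by default, so with a 09:30 open the first hourly bar (labelled 09:00) covers only 30 minutes. The offset parameter of .resample() shifts the bin edges. A sketch using a hypothetical series with one row per minute, so each bin's sum equals the number of minutes it covers:

```python
import pandas as pd

# Hypothetical session 09:30-15:59, one row per minute (390 rows)
minutes = pd.date_range("2025-10-27 09:30", "2025-10-27 15:59", freq="1min")
vol = pd.Series(1, index=minutes)

# Default: bins start on the hour, so the first bar (09:00) holds only 30 minutes
on_the_hour = vol.resample("h").sum()

# offset shifts the bin edges by 30 minutes: bins start at 09:30, 10:30, ...
from_open = vol.resample("h", offset="30min").sum()

print(on_the_hour)
print(from_open)
```

A 6.5-hour session cannot split into whole hours either way; the offset simply moves the 30-minute remainder from the first bar to the last.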

Alternative Method: Using `pd.Grouper`

The .resample() method is concise and ideal for a single regular time series. pd.Grouper expresses the same time-based binning inside a groupby() call, which makes it more flexible: it can bin on a datetime column instead of the index, and it can be combined with other grouping keys, such as a symbol column.

The syntax is slightly different but achieves the same result.

```python
# Define the aggregation rules
agg_rules = {
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum'
}
# Use pd.Grouper to group by 5-minute intervals.
# With no key=..., Grouper bins on the DataFrame's DatetimeIndex;
# pass key='column_name' to bin on a datetime column instead.
# freq='...' specifies the bin size
five_min_bars_grouper = ohlc_data.groupby(pd.Grouper(freq='5min')).agg(agg_rules)
print("\n--- 5-Minute Data using pd.Grouper ---")
print(five_min_bars_grouper.head())
```
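When the timestamps live in an ordinary column, pass that column's name as key. A minimal sketch with a hypothetical frame whose time column is named ts; for this single-key case, .resample('5min', on='ts') would produce the same bins.

```python
import pandas as pd

# Hypothetical bars where the timestamp is a column, not the index
bars_with_col = pd.DataFrame({
    "ts": pd.date_range("2025-10-27 09:30", periods=10, freq="1min"),
    "open":  [float(i) for i in range(100, 110)],
    "high":  [float(i) + 0.5 for i in range(100, 110)],
    "low":   [float(i) - 0.5 for i in range(100, 110)],
    "close": [float(i) + 0.2 for i in range(100, 110)],
    "volume": [100] * 10,
})

rules = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}

# key="ts" tells Grouper to bin on that column instead of the index
by_column = bars_with_col.groupby(pd.Grouper(key="ts", freq="5min")).agg(rules)
print(by_column)
```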

When to use which?

Use .resample() for most time-based downsampling. It's idiomatic pandas and very readable.
Use pd.Grouper() when the time bins are one part of a larger grouping, such as grouping by a symbol column and by time together, or when the timestamps are stored in a column rather than the index.
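The multi-key case can be sketched with hypothetical long-format data holding two tickers ("AAA" and "BBB" are made-up symbols) in one table. Each symbol gets its own set of 2-minute bars, so prices from different tickers are never mixed in one bar.

```python
import pandas as pd

# Hypothetical long-format data: two tickers sharing one timestamp column
times = pd.date_range("2025-10-27 09:30", periods=4, freq="1min")
multi = pd.DataFrame({
    "symbol": ["AAA"] * 4 + ["BBB"] * 4,
    "ts": list(times) * 2,
    "open":  [10.0, 11.0, 12.0, 13.0, 50.0, 49.0, 48.0, 47.0],
    "high":  [11.0, 12.0, 13.0, 14.0, 51.0, 50.0, 49.0, 48.0],
    "low":   [9.0, 10.0, 11.0, 12.0, 48.0, 47.0, 46.0, 45.0],
    "close": [11.0, 12.0, 13.0, 14.0, 49.0, 48.0, 47.0, 46.0],
    "volume": [100, 200, 300, 400, 10, 20, 30, 40],
})

rules = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}

# Group by symbol, then into 2-minute bins on the ts column
per_symbol = multi.groupby(["symbol", pd.Grouper(key="ts", freq="2min")]).agg(rules)
print(per_symbol)
```

The result has a two-level index (symbol, ts), so per_symbol.loc["AAA"] returns the bars for one ticker.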

Summary and Best Practices

Always use .agg() with a dictionary when resampling OHLC data. This is the correct way to apply different functions to different columns.
Standard Aggregations: first, last, max, min, sum are your primary tools.
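Handling Missing Data: If your data has gaps (e.g., no trades over a weekend), resample() still creates a row for every interval; the price columns of those rows are NaN and the summed volume is 0. You can drop them with .dropna() or fill them if needed. For example, to forward-fill the 'close' price: five_min_bars['close'] = five_min_bars['close'].ffill() (the older .fillna(method='ffill') form is deprecated as of pandas 2.1).
Data Integrity: After resampling, it's good practice to check that high >= max(open, close) and low <= min(open, close) for each bar. Aggregating with first, max, min and last preserves these relationships whenever the input bars satisfy them, so a violation usually points to errors in the source data or in a fill step rather than to the resampling itself.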
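Both gap-handling choices and the integrity check can be sketched together. The data is hypothetical (the 09:32 minute is missing), and filling an empty interval as a flat bar at the previous close is one common convention, not the only one.

```python
import pandas as pd

# Hypothetical 1-minute bars with a missing minute (no row at 09:32)
idx = pd.to_datetime(["2025-10-27 09:30", "2025-10-27 09:31", "2025-10-27 09:33"])
raw = pd.DataFrame(
    {"open": [10.0, 10.4, 10.1], "high": [10.5, 10.6, 10.3],
     "low": [9.9, 10.2, 9.8], "close": [10.4, 10.3, 10.0], "volume": [300, 250, 400]},
    index=idx,
)
rules = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
bars = raw.resample("1min").agg(rules)  # 09:32 row: NaN prices, volume 0

# Option A: drop intervals with no trades
traded_only = bars.dropna(subset=["close"])

# Option B: keep every interval, filling empty ones as flat bars at the previous close
filled = bars.copy()
filled["close"] = filled["close"].ffill()
for col in ["open", "high", "low"]:
    filled[col] = filled[col].fillna(filled["close"])

# Integrity check: high must cover open/close and low must not exceed them
bad = (filled["high"] < filled[["open", "close"]].max(axis=1)) | (
    filled["low"] > filled[["open", "close"]].min(axis=1)
)
print(filled)
print("bars violating OHLC invariants:", int(bad.sum()))
```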
