How do you resample OHLC data in Python?

Resampling OHLC (Open, High, Low, Close) data is a fundamental task in financial analysis with Python. The primary library for this is pandas, whose built-in .resample() method handles time-based regrouping.

Here’s a complete guide covering:

  1. Why Resample OHLC Data? (Common use cases)
  2. The Core Challenge: Why you can't just use a standard aggregation function like mean().
  3. The Solution: Using .resample() with custom aggregation functions.
  4. Complete Code Examples: From creating sample data to performing various resampling tasks.
  5. Alternative (More Flexible) Method: Using pd.Grouper.

Why Resample OHLC Data?

Traders and analysts often work with data at different frequencies. Resampling allows you to convert data from one time frame to another.

  • From Lower to Higher Frequency (Upsampling):
    • Convert 1-minute data to 10-second data. This usually involves forward-filling or interpolating values, because the new 10-second buckets have no source data of their own.


  • From Higher to Lower Frequency (Downsampling):
    • Convert 1-minute data to 5-minute data. This is the most common use case. You need to calculate the Open, High, Low, and Close for each new 5-minute interval based on the original 1-minute data within it.
    • Convert hourly data to daily data.
    • Convert daily data to weekly or monthly data.
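
As a minimal sketch of the upsampling case, the snippet below forward-fills two made-up 1-minute bars onto a 10-second grid. The prices and the names bars_1min and bars_10s are assumptions for illustration only. Note that forward-filled bars carry no new information, and a volume column should not be filled this way.

```python
import pandas as pd

# Two hypothetical 1-minute bars (values are made up for illustration)
idx = pd.date_range("2025-10-27 09:30:00", periods=2, freq="1min")
bars_1min = pd.DataFrame(
    {
        "open": [100.0, 101.0],
        "high": [102.0, 103.0],
        "low": [99.0, 100.5],
        "close": [101.0, 102.5],
    },
    index=idx,
)

# Upsample to a 10-second grid. The new timestamps have no source data,
# so each one is forward-filled from the most recent 1-minute bar.
bars_10s = bars_1min.resample("10s").ffill()
print(bars_10s)
```

The result runs from 09:30:00 to 09:31:00 in 10-second steps; every row before 09:31:00 repeats the 09:30 bar.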

The Core Challenge: Aggregation is Not Simple

If you have a list of numbers and want to find the average, you just sum them and divide by the count. OHLC data is different.

  • Open: The price of the first trade in the new period.
  • Close: The price of the last trade in the new period.
  • High: The maximum price reached during the new period.
  • Low: The minimum price reached during the new period.

A simple mean() or sum() doesn't make sense for these columns. You must apply specific aggregation functions to each column.
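
To make the difference concrete, here is a small sketch that compares a naive mean() with per-column aggregation on five made-up 1-minute bars. The prices are hypothetical; only the comparison matters.

```python
import pandas as pd

# Five hypothetical 1-minute bars that all fall into one 5-minute bin
idx = pd.date_range("2025-10-27 09:30:00", periods=5, freq="1min")
bars = pd.DataFrame(
    {
        "open": [100.0, 101.0, 103.0, 102.0, 101.0],
        "high": [101.5, 103.5, 104.0, 102.5, 101.5],
        "low": [99.5, 100.5, 101.5, 100.0, 99.0],
        "close": [101.0, 103.0, 102.0, 101.0, 100.0],
    },
    index=idx,
)

# Naive: average every column. The "high" is no longer the period's high.
naive = bars.resample("5min").mean()

# Correct: a specific aggregation function per column
correct = bars.resample("5min").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last"}
)

print(naive)
print(correct)
```

The averaged high comes out below 104.0, the price the market actually reached in that window, while the per-column version keeps the true extremes and the first and last prices.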


The Solution: resample().agg()

The pandas solution is a two-step process:

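  1. .resample(): This method groups your time series data into bins (e.g., 5-minute bins) and returns a Resampler object. It requires a DatetimeIndex (or a datetime column passed via on=), and nothing is computed until you aggregate.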
  2. .agg(): This method applies one or more aggregation functions to each column of the grouped data.

You provide a dictionary to .agg() where the keys are the column names and the values are the aggregation functions to use.

Complete Code Examples

Let's walk through a full example.

Step 1: Setup and Create Sample Data

First, install pandas if you haven't already (this also installs NumPy, which the sample data uses). Run this in a shell, not inside Python:

pip install pandas

Then create some sample 1-minute OHLC data:

import pandas as pd
import numpy as np
# Create a date range for our sample data
# Let's create 1-minute data for one trading session
# (2025-10-27 is a Monday; 2025-10-26 falls on a Sunday, which is not a business day)
date_rng = pd.date_range(start='2025-10-27 09:30:00', end='2025-10-27 16:00:00', freq='1min')
# Create a DataFrame with random OHLC data
# In a real scenario, you would load this from a CSV or API
np.random.seed(42) # for reproducibility
n = len(date_rng)
ohlc_data = pd.DataFrame({
    'open': np.random.uniform(150, 155, n),
    'high': np.random.uniform(155, 160, n),
    'low': np.random.uniform(148, 153, n),
    'close': np.random.uniform(151, 157, n),
    'volume': np.random.randint(1000, 10000, n)
}, index=date_rng)
# Make each bar internally consistent: high >= open and close, low <= open and close
ohlc_data['high'] = ohlc_data[['open', 'high', 'close']].max(axis=1)
ohlc_data['low'] = ohlc_data[['open', 'low', 'close']].min(axis=1)
print("--- Original 1-Minute Data ---")
print(ohlc_data.head())

Step 2: Resample to 5-Minute Bars (Downsampling)

This is the most common and important operation. We want to create 5-minute OHLC bars from our 1-minute data.

# Define the aggregation rules for each column
agg_rules = {
    'open': 'first',      # The first 'open' in the 5-min period
    'high': 'max',        # The highest 'high' in the 5-min period
    'low': 'min',         # The lowest 'low' in the 5-min period
    'close': 'last',      # The last 'close' in the 5-min period
    'volume': 'sum'       # The sum of all volumes in the 5-min period
}
# Resample the data to 5-minute intervals
# ('5min' is the current alias; '5T' is deprecated as of pandas 2.2)
five_min_bars = ohlc_data.resample('5min').agg(agg_rules)
print("\n--- Resampled 5-Minute Data ---")
print(five_min_bars.head())

Explanation of Aggregation Functions:

  • 'first' for open: Gets the first value of the open column within each 5-minute bin.
  • 'last' for close: Gets the last value of the close column.
  • 'max' for high: Gets the maximum value.
  • 'min' for low: Gets the minimum value.
  • 'sum' for volume: Adds up the volume traded in the period.
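
One way to convince yourself that these rules do what the list above says is to rebuild the first 5-minute bar by hand. The sketch below uses a small synthetic frame (the random-walk prices and the names bars, five_min and manual are assumptions for illustration) and relies on pandas' default binning for minute frequencies, where the bin labeled 09:30 covers 09:30 through 09:34.

```python
import numpy as np
import pandas as pd

# Ten hypothetical 1-minute bars built from a small random walk
rng = np.random.default_rng(0)
idx = pd.date_range("2025-10-27 09:30:00", periods=10, freq="1min")
close = 150 + rng.normal(0, 0.5, len(idx)).cumsum()
open_ = np.r_[150.0, close[:-1]]  # each bar opens at the previous close
bars = pd.DataFrame(
    {
        "open": open_,
        "high": np.maximum(open_, close) + 0.1,
        "low": np.minimum(open_, close) - 0.1,
        "close": close,
        "volume": rng.integers(1000, 10000, len(idx)),
    },
    index=idx,
)

agg_rules = {"open": "first", "high": "max", "low": "min",
             "close": "last", "volume": "sum"}
five_min = bars.resample("5min").agg(agg_rules)

# Rebuild the first bar manually from the rows 09:30 .. 09:34
window = bars.iloc[:5]
manual = {
    "open": window["open"].iloc[0],
    "high": window["high"].max(),
    "low": window["low"].min(),
    "close": window["close"].iloc[-1],
    "volume": window["volume"].sum(),
}

print(five_min.iloc[0])
print(manual)
```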

Common Time Aliases for Resampling:

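  • min for minutes (T is a deprecated alias as of pandas 2.2)
  • h for hours (H is deprecated as of pandas 2.2)
  • D for calendar days
  • B for business days (Mon-Fri)
  • W for weekly (weeks ending on Sunday by default)
  • ME for month-end (M before pandas 2.2)
  • YE for year-end (Y before pandas 2.2)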
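
The same aggregation rules carry over to weekly and monthly bars. The sketch below uses made-up daily data and, as one possible way to stay independent of the month-end alias spelling across pandas versions, passes a pd.offsets.MonthEnd() object to resample() instead of a string. The frame name daily and its values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily bars for two months of business days
idx = pd.bdate_range("2025-10-01", "2025-11-28")
close = 100.0 + np.arange(len(idx), dtype=float)  # steadily rising close
daily = pd.DataFrame(
    {
        "open": close - 0.5,
        "high": close + 1.0,
        "low": close - 1.0,
        "close": close,
        "volume": 1000,
    },
    index=idx,
)

agg_rules = {"open": "first", "high": "max", "low": "min",
             "close": "last", "volume": "sum"}

# Weekly bars: "W" means weeks ending on Sunday, labeled by that Sunday
weekly = daily.resample("W").agg(agg_rules)

# Monthly bars: an offset object avoids the 'M' vs 'ME' alias difference
monthly = daily.resample(pd.offsets.MonthEnd()).agg(agg_rules)

print(weekly.head())
print(monthly)
```

Weekly and month-end bins are labeled with the end of the period, unlike minute or hourly bins, which are labeled with the start.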

Step 3: Resample to Daily Bars

The process is identical; you just change the resampling frequency. Because the sample data covers a single session, the result is a single row.

# Resample the data to daily (business day) intervals
daily_bars = ohlc_data.resample('B').agg(agg_rules)
print("\n--- Resampled Daily Data ---")
print(daily_bars.head())

Step 4: Resample to Hourly Bars

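# Resample the data to hourly intervals ('h' replaces the deprecated 'H' alias)
# Bins are aligned to clock hours, so the first bar (labeled 09:00) only covers 09:30-09:59
hourly_bars = ohlc_data.resample('h').agg(agg_rules)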
print("\n--- Resampled Hourly Data ---")
print(hourly_bars.head())

Alternative Method: Using pd.Grouper

The .resample() method is concise and perfect for regular time series. pd.Grouper expresses the same time-based binning as a groupby key, which makes it the more flexible option when the timestamps live in a regular column rather than the index, or when you need to combine the time bins with other grouping keys (e.g., a symbol column).

The syntax is slightly different but achieves the same result.

# Define the aggregation rules
agg_rules = {
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum'
}
# Use pd.Grouper to group by 5-minute intervals
# freq='...' specifies the bin frequency
# key='...' would name a datetime column to group on; it is omitted here, so the index is used
five_min_bars_grouper = ohlc_data.groupby(pd.Grouper(freq='5min')).agg(agg_rules)
print("\n--- 5-Minute Data using pd.Grouper ---")
print(five_min_bars_grouper.head())

When to use which?

  • Use .resample() for most time-based downsampling. It's idiomatic pandas and very readable.
  • Use pd.Grouper() when you need more flexibility, such as grouping by multiple criteria (e.g., date and a symbol column) or binning on a datetime column instead of the index.
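
As a sketch of the multi-criteria case, the example below builds a long-format frame with a symbol column and a timestamp column, then groups by both. The symbols AAA and BBB, the helper make_bars, and all prices are hypothetical.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2025-10-27 09:30:00", periods=10, freq="1min")
steps = np.arange(len(idx), dtype=float)

def make_bars(symbol, base):
    """Hypothetical helper: deterministic 1-minute bars for one symbol."""
    return pd.DataFrame(
        {
            "timestamp": idx,
            "symbol": symbol,
            "open": base + steps,
            "high": base + steps + 1.0,
            "low": base + steps - 1.0,
            "close": base + steps + 0.5,
            "volume": 100,
        }
    )

# Long format: one row per (symbol, minute); timestamps are a column, not the index
long_df = pd.concat(
    [make_bars("AAA", 100.0), make_bars("BBB", 50.0)], ignore_index=True
)

agg_rules = {"open": "first", "high": "max", "low": "min",
             "close": "last", "volume": "sum"}

# Group by symbol AND by 5-minute bins of the timestamp column
bars_5min = (
    long_df
    .groupby(["symbol", pd.Grouper(key="timestamp", freq="5min")])
    .agg(agg_rules)
)
print(bars_5min)
```

The result has a (symbol, timestamp) MultiIndex with one 5-minute bar per symbol per bin, so bars from different symbols are never mixed.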

Summary and Best Practices

  1. Always use .agg() with a dictionary when resampling OHLC data. This is the correct way to apply different functions to different columns.
  2. Standard Aggregations: first, last, max, min, sum are your primary tools.
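  3. Handling Missing Data: If your data has gaps (e.g., no trades over a weekend), resample still emits a row for every bin in the range. In empty bins the price columns are NaN and the summed volume is 0. Drop those rows with .dropna(), or fill them if you need a continuous series. For example, to forward-fill the 'close' price: five_min_bars['close'] = five_min_bars['close'].ffill() (the older fillna(method='ffill') form is deprecated in recent pandas versions).
  4. Data Integrity: After resampling, it's good practice to check that high >= max(open, close) and low <= min(open, close) for each bar. The first/max/min/last rules preserve these relationships when the input bars satisfy them, so a violation usually points to errors in the source data.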
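
The last two points can be sketched together. The example below resamples made-up 1-minute bars that contain a gap, fills the empty bin with a flat bar at the previous close (one common convention, assumed here rather than prescribed by pandas), and then runs the integrity check. All values and the names bars, five_min, filled and valid are hypothetical.

```python
import pandas as pd

# Hypothetical 1-minute bars with a gap: nothing between 09:32 and 09:40
idx = pd.to_datetime([
    "2025-10-27 09:30", "2025-10-27 09:31", "2025-10-27 09:32",
    "2025-10-27 09:40", "2025-10-27 09:41",
])
bars = pd.DataFrame(
    {
        "open": [100.0, 100.5, 101.0, 102.0, 102.5],
        "high": [100.8, 101.2, 101.5, 102.8, 103.0],
        "low": [99.8, 100.2, 100.6, 101.7, 102.1],
        "close": [100.5, 101.0, 101.2, 102.5, 102.9],
        "volume": [500, 300, 400, 600, 200],
    },
    index=idx,
)

agg_rules = {"open": "first", "high": "max", "low": "min",
             "close": "last", "volume": "sum"}
five_min = bars.resample("5min").agg(agg_rules)
# The 09:35 bin is empty: prices are NaN there and volume is 0

# Fill the gap: carry the last close forward, then use it for open/high/low
filled = five_min.copy()
filled["close"] = filled["close"].ffill()
for col in ["open", "high", "low"]:
    filled[col] = filled[col].fillna(filled["close"])

# Integrity check: high/low must bracket open and close in every bar
valid = (
    (filled["high"] >= filled[["open", "close"]].max(axis=1))
    & (filled["low"] <= filled[["open", "close"]].min(axis=1))
)
print(filled)
print("All bars valid:", valid.all())
```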