How Does Python's combine_first Merge Data? - 杰瑞科技汇

The combine_first() method in pandas fills missing values in a DataFrame or Series using data from another DataFrame or Series.

Let's break it down with clear explanations and examples.

What is `combine_first()`?

At its core, combine_first() is used to "patch" missing values (NaN) in the calling object (let's call it A) with values from another object (B).

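The key rule is: if a value in A is missing (NaN), it is replaced with the value at the same position in B. If the value in A is not missing, it is kept as is, even when B also has a value there.

The operation is performed element-wise after aligning the two objects on their index (for Series) or on their index and columns (for DataFrames). The result's labels are the union of both objects' labels, so rows or columns that exist only in B are added to the result with B's values. Neither A nor B is modified; combine_first() returns a new object.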
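As a rough check of the rule above, the following minimal sketch compares combine_first() with a hand-rolled equivalent: align the two Series on the union of their labels (Series.align with join="outer"), then keep the caller's value wherever it is not missing and fall back to the other Series elsewhere. The Series a and b are hypothetical; for same-dtype inputs like these the two results are expected to match, although combine_first() may take a different internal path for mixed dtypes.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: a is the primary object, b the filler
a = pd.Series([1.0, np.nan, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0, 30.0, 40.0], index=["x", "y", "z", "w"])

# Built-in method
result = a.combine_first(b)

# Hand-rolled equivalent: outer-align on the union of labels, then keep a
# where it is non-missing and take b everywhere else
a_aligned, b_aligned = a.align(b, join="outer")
manual = a_aligned.where(a_aligned.notna(), b_aligned)

print(result)
print(result.equals(manual))
```

Note that label w, which exists only in b, appears in both results: nothing from b is dropped.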

`combine_first()` with Series

This is the simplest case. We have two Series, and we want to fill missing values in the first one using the second one.

How it Works:

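Alignment: The two Series are aligned by their index; the result's index is the union of both indexes.
Filling: For each index label, if the value in the first Series is NaN, it's replaced by the value from the second Series at that same label. If the value in the first Series is not NaN, it remains unchanged. Labels that exist only in the second Series are added to the result with their values.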

Example:

Let's create two Series. s1 has some missing values, and s2 has the values we want to use to fill them.

import pandas as pd
import numpy as np
# Series with missing values
s1 = pd.Series([10, 20, np.nan, 40, np.nan], index=['a', 'b', 'c', 'd', 'e'])
print("Series s1 (original):")
print(s1)
print("-" * 30)
# Series to fill the missing values from
s2 = pd.Series([100, 200, 300, 400, 500], index=['a', 'c', 'e', 'f', 'g'])
print("Series s2 (filler):")
print(s2)
print("-" * 30)
# Use combine_first to fill missing values in s1 with values from s2
s_filled = s1.combine_first(s2)
print("Series s1 after combine_first(s2):")
print(s_filled)

Output:

Series s1 (original):
a    10.0
b    20.0
c     NaN
d    40.0
e     NaN
dtype: float64
------------------------------
Series s2 (filler):
a    100
c    200
e    300
f    400
g    500
dtype: int64
------------------------------
Series s1 after combine_first(s2):
a     10.0  # Not NaN in s1, kept as 10
b     20.0  # Not NaN in s1, kept as 20
c    200.0  # Was NaN in s1, filled with 200 from s2
d     40.0  # Not NaN in s1, kept as 40
e    300.0  # Was NaN in s1, filled with 300 from s2
f    400.0  # Only in s2, added with 400
g    500.0  # Only in s2, added with 500
dtype: float64

Explanation:

s1['a'] is 10, so it's kept.
s1['b'] is 20, so it's kept.
s1['c'] is NaN, so it's replaced with s2['c'], which is 200.
s1['d'] is 40, so it's kept.
s1['e'] is NaN, so it's replaced with s2['e'], which is 300.

Notice that the values in s2 for indices f and g are not ignored: the result's index is the union of both indexes, so labels that exist only in s2 are carried over with s2's values.
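If you only want to fill the gaps in s1 without adding new labels, one option, sketched here under the assumption that dropping the s2-only labels is what you need, is to reindex the combined result back to s1's index. The name s_restricted is illustrative.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([10, 20, np.nan, 40, np.nan], index=["a", "b", "c", "d", "e"])
s2 = pd.Series([100, 200, 300, 400, 500], index=["a", "c", "e", "f", "g"])

# Fill the gaps as usual, then keep only the labels s1 already had
s_restricted = s1.combine_first(s2).reindex(s1.index)
print(s_restricted)
```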

`combine_first()` with DataFrames

With DataFrames, combine_first() is especially handy for merging a primary dataset with a secondary source that shares some of the same row and column labels, such as two tables recorded on the same dates.

How it Works:

Alignment: The two DataFrames are aligned by both their index and columns; the result covers the union of the rows and the union of the columns.
Filling: For each cell at (row, col):
- If the value in the first DataFrame (A) is NaN, it's replaced by the value from the second DataFrame (B) at the same (row, col).
- If the value in A is not NaN, it is kept.
- If a column or row exists only in B, it is added to the result with B's values.
- If a column or row exists only in A, it is kept as is, and any NaNs in it remain NaN because B has nothing to fill them with.
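A minimal sketch of these rules, using two small hypothetical frames named left and right: right has an extra row (r3) and an extra column (y), so both should appear in the result, while left's non-missing value at (r1, x) is kept.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: "right" has one extra row (r3) and one extra column (y)
left = pd.DataFrame({"x": [1.0, np.nan]}, index=["r1", "r2"])
right = pd.DataFrame(
    {"x": [10.0, 20.0, 30.0], "y": [4.0, 5.0, 6.0]},
    index=["r1", "r2", "r3"],
)

out = left.combine_first(right)
# (r1, x) keeps left's 1.0, (r2, x) is filled with 20.0,
# row r3 and column y come entirely from right
print(out)
```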

Example:

Imagine df1 is our primary DataFrame with some missing data, and df2 is a secondary source with data we can use to fill the gaps.

# DataFrame with missing values
df1 = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12]
}, index=['row1', 'row2', 'row3', 'row4'])
print("DataFrame df1 (original):")
print(df1)
print("-" * 40)
# DataFrame to fill missing values from
df2 = pd.DataFrame({
    'A': [100, 200, 300, 400],
    'B': [500, 600, 700, 800],
    'D': [13, 14, 15, 16] # Column 'D' is only in df2
}, index=['row1', 'row2', 'row3', 'row4'])
print("DataFrame df2 (filler):")
print(df2)
print("-" * 40)
# Use combine_first to fill missing values in df1 with values from df2
df_filled = df1.combine_first(df2)
print("DataFrame df1 after combine_first(df2):")
print(df_filled)

Output:

DataFrame df1 (original):
        A    B   C
row1  1.0  5.0   9
row2  2.0  NaN  10
row3  NaN  NaN  11
row4  4.0  8.0  12
----------------------------------------
DataFrame df2 (filler):
        A    B   D
row1  100  500  13
row2  200  600  14
row3  300  700  15
row4  400  800  16
----------------------------------------
DataFrame df1 after combine_first(df2):
          A      B   C   D
row1    1.0    5.0   9  13
row2    2.0  600.0  10  14
row3  300.0  700.0  11  15
row4    4.0    8.0  12  16

Explanation:

Column 'A': df1.loc['row3', 'A'] was NaN, so it was filled with df2.loc['row3', 'A'], which is 300. The other values in df1['A'] were kept.
Column 'B': df1.loc['row2', 'B'] and df1.loc['row3', 'B'] were NaN, so they were filled with 600 and 700 from df2.
Column 'C': No values were NaN in df1, and df2 has no column 'C', so the column is unchanged.
Column 'D': This column existed only in df2, so it was added to the result with df2's values. The result therefore has the union of both DataFrames' columns (A, B, C, D).
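Because the calling DataFrame always takes precedence, swapping the two objects can change the result. The sketch below rebuilds df1 and df2 from the example above and calls df2.combine_first(df1); the name flipped is hypothetical. Since df2 has no missing values, columns A, B, and D should come entirely from df2, and only column 'C' is contributed by df1.

```python
import numpy as np
import pandas as pd

idx = ["row1", "row2", "row3", "row4"]
df1 = pd.DataFrame(
    {"A": [1, 2, np.nan, 4], "B": [5, np.nan, np.nan, 8], "C": [9, 10, 11, 12]},
    index=idx,
)
df2 = pd.DataFrame(
    {"A": [100, 200, 300, 400], "B": [500, 600, 700, 800], "D": [13, 14, 15, 16]},
    index=idx,
)

# df2 is now the primary object, so its values win wherever both have data
flipped = df2.combine_first(df1)
print(flipped)
```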

Common Use Case: Stitching Together Time Series

A classic use case for combine_first is to build a complete time series from two partial ones.

Let's say we have stock prices recorded by two different systems, and we want to merge them into a single, continuous series.

import numpy as np
import pandas as pd
# Create two time series with some overlapping and non-overlapping data
dates = pd.to_datetime(['2025-01-01', '2025-01-02', '2025-01-03', '2025-01-04', '2025-01-05'])
# System A has data for the beginning
prices_a = pd.Series([100, 102, np.nan, np.nan, np.nan], index=dates)
print("System A Prices:")
print(prices_a)
print("-" * 30)
# System B has data for the end
prices_b = pd.Series([np.nan, np.nan, 105, 107, 110], index=dates)
print("System B Prices:")
print(prices_b)
print("-" * 30)
# Combine them to get a continuous price series
combined_prices = prices_a.combine_first(prices_b)
print("Combined Price Series:")
print(combined_prices)

Output:

System A Prices:
2025-01-01    100.0
2025-01-02    102.0
2025-01-03      NaN
2025-01-04      NaN
2025-01-05      NaN
dtype: float64
------------------------------
System B Prices:
2025-01-01      NaN
2025-01-02      NaN
2025-01-03    105.0
2025-01-04    107.0
2025-01-05    110.0
dtype: float64
------------------------------
Combined Price Series:
2025-01-01    100.0
2025-01-02    102.0
2025-01-03    105.0
2025-01-04    107.0
2025-01-05    110.0
dtype: float64

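The two partial series are stitched into one continuous series. Because the calling Series takes precedence, if both systems had reported a price for the same date, the value from prices_a would be kept.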
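The example above uses identical date indexes for both systems. Since the result takes the union of both indexes, the same call should also work when each system covers only its own date range. The sketch below uses made-up prices (hypothetical data, not taken from the example above) in which both feeds report a price for 2025-01-03, to show which value survives the overlap.

```python
import pandas as pd

# Hypothetical feeds: each system covers its own range; both report 2025-01-03
prices_a = pd.Series(
    [100.0, 102.0, 104.0],
    index=pd.to_datetime(["2025-01-01", "2025-01-02", "2025-01-03"]),
)
prices_b = pd.Series(
    [105.0, 107.0, 110.0],
    index=pd.to_datetime(["2025-01-03", "2025-01-04", "2025-01-05"]),
)

# Union of the two date ranges; on the overlapping date prices_a wins
combined = prices_a.combine_first(prices_b)
print(combined)
```

To prefer System B on overlapping dates instead, call prices_b.combine_first(prices_a).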

`combine_first()` vs. `fillna()`

It's important not to confuse combine_first() with fillna().

| Feature | `combine_first(other)` | `fillna(value)` |
|---|---|---|
| Purpose | Fill missing values using data from another pandas object (Series or DataFrame). | Fill missing values using a scalar, a dict, a Series, or (for DataFrames) another DataFrame. |
| Data source | A Series or DataFrame. | A single scalar, a dict, a Series, or a DataFrame. |
| Alignment | Aligns by index (and columns for DataFrames) with the `other` object. | When filling a Series, a Series `value` is aligned on the index. When filling a DataFrame, a scalar fills every `NaN`, a dict or Series supplies per-column values matched by column label, and a DataFrame is aligned on index and columns. |
| Result shape | Union of both objects' index (and columns); labels found only in `other` are added. | Same index and columns as the caller; no labels are added. |
| Flexibility | Patches gaps from a complete secondary dataset. | Best suited to simple, predefined fill values. |

In short: use fillna() when you have a simple rule like "fill all NaNs with 0" or "fill column 'A' with 99 and column 'B' with -1", or when you want to fill from another object without adding any rows or columns. Use combine_first() when you have another dataset and want its values to patch the missing spots in your primary dataset, while also bringing in any rows and columns that exist only in that other dataset.
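A minimal sketch of the shape difference, using two small hypothetical frames named primary and backup: fillna(backup) should keep primary's two rows and two columns, combine_first(backup) should grow to the union of rows and columns, and fillna(0) illustrates the simple scalar rule.

```python
import numpy as np
import pandas as pd

# Hypothetical data: backup has an extra row (r3) and an extra column (D)
primary = pd.DataFrame({"A": [1.0, np.nan], "B": [np.nan, 4.0]}, index=["r1", "r2"])
backup = pd.DataFrame(
    {"A": [10.0, 20.0, 30.0], "B": [40.0, 50.0, 60.0], "D": [7.0, 8.0, 9.0]},
    index=["r1", "r2", "r3"],
)

patched = primary.fillna(backup)        # fills gaps, keeps primary's 2 x 2 shape
merged = primary.combine_first(backup)  # fills gaps, expands to the 3 x 3 union
zeros = primary.fillna(0)               # simple rule: every NaN becomes 0

print(patched, merged, zeros, sep="\n\n")
```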
