杰瑞科技汇

How Do You Combine Indexes in Python?

The phrase "combine index" in Python can mean several different things depending on the context. It usually refers to combining data based on some kind of index or key.

Let's break down the most common scenarios, from the most fundamental to more specialized cases.

Combining Lists (or other iterables) by Index

This is the most literal interpretation: you have multiple lists, and you want to pair up elements that share the same index.

Example: You have a list of names and a list of ages, and you want to create a list of (name, age) tuples.

names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]
# The goal: [('Alice', 25), ('Bob', 30), ('Charlie', 35)]

Method 1: Using zip() (Most Pythonic and Recommended)

The zip() function is designed for exactly this. It takes multiple iterables and returns an iterator that aggregates elements from each iterable. It stops when the shortest iterable is exhausted.

combined = list(zip(names, ages))
print(combined)
# Output: [('Alice', 25), ('Bob', 30), ('Charlie', 35)]

You can also use a list comprehension with enumerate if you need the index itself.

# If you also need the index (e.g., 0, 1, 2)
combined_with_index = [(i, name, age) for i, (name, age) in enumerate(zip(names, ages))]
print(combined_with_index)
# Output: [(0, 'Alice', 25), (1, 'Bob', 30), (2, 'Charlie', 35)]

Method 2: Using a Manual for Loop

This is more verbose, but it makes the logic explicit.

combined_manual = []
for i in range(len(names)): # Assumes lists are of the same length
    combined_manual.append((names[i], ages[i]))
print(combined_manual)
# Output: [('Alice', 25), ('Bob', 30), ('Charlie', 35)]
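Both methods above assume the lists line up. When they do not, zip() silently drops the extra elements, while the range(len(names)) loop raises IndexError if ages is the shorter list. A minimal sketch of the difference, using itertools.zip_longest from the standard library to pad instead of truncate. The extra name 'Dana' and the fill value None are illustrative assumptions, not part of the original example:

```python
from itertools import zip_longest

names = ['Alice', 'Bob', 'Charlie', 'Dana']  # 'Dana' is a hypothetical extra entry
ages = [25, 30, 35]

# zip() stops at the shortest input, so 'Dana' is dropped without any warning
truncated = list(zip(names, ages))

# zip_longest() keeps going and fills the missing positions with fillvalue
padded = list(zip_longest(names, ages, fillvalue=None))

print(truncated)
print(padded)
```

If a length mismatch should be treated as an error rather than padded, Python 3.10 and later accept zip(names, ages, strict=True), which raises ValueError.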

Combining Pandas DataFrames on an Index

This is a very common task in data analysis. You have two DataFrames and you want to join them based on their index values.

Example: You have one DataFrame with user info and another with their scores. The index is the user_id.

import pandas as pd
# DataFrame 1: User Info
df_info = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'city': ['New York', 'London', 'Paris']
}, index=[101, 102, 103])
# DataFrame 2: Scores
df_scores = pd.DataFrame({
    'score': [88, 92, 95],
    'attempts': [1, 3, 2]
}, index=[101, 103, 104]) # Note: user 102 is missing, user 104 is new

Method 1: DataFrame.join() (Most Common for Indexes)

join() is a DataFrame method (there is no top-level pd.join() function). By default it joins another DataFrame on the indexes of both.

# how='inner' keeps only matching indexes (101, 103)
# how='outer' keeps all indexes (101, 102, 103, 104), with NaN where one side has no match
# how='left' (the default for join) keeps all indexes from the left DataFrame (101, 102, 103)
# how='right' keeps all indexes from the right DataFrame (101, 103, 104)
combined_df = df_info.join(df_scores, how='outer')
print(combined_df)

Output:

        name      city  score  attempts
101    Alice  New York   88.0       1.0
102      Bob    London    NaN       NaN
103  Charlie     Paris   92.0       3.0
104      NaN       NaN   95.0       2.0
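To see what each how option keeps, it can help to compare only the resulting index labels. The sketch below rebuilds the two DataFrames from the example so it runs on its own; the dictionary name index_by_how is an illustrative choice:

```python
import pandas as pd

df_info = pd.DataFrame(
    {'name': ['Alice', 'Bob', 'Charlie'], 'city': ['New York', 'London', 'Paris']},
    index=[101, 102, 103],
)
df_scores = pd.DataFrame(
    {'score': [88, 92, 95], 'attempts': [1, 3, 2]},
    index=[101, 103, 104],
)

# Run the same join with each strategy and record which index labels survive
index_by_how = {
    how: df_info.join(df_scores, how=how).index.tolist()
    for how in ('inner', 'outer', 'left', 'right')
}

for how, labels in index_by_how.items():
    print(how, labels)
```

Only the set of surviving labels differs between the four calls; the columns are the same in every result.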

Method 2: pd.merge() (More General Purpose)

merge is the more general joining function in pandas. It can join on columns, on indexes, or on a mix of both.

# To merge on index, use left_index=True and right_index=True
combined_df_merge = pd.merge(df_info, df_scores, left_index=True, right_index=True, how='outer')
print(combined_df_merge)

This produces the same result as join. join is essentially a convenient, specialized version of merge.
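One way to check that claim for this particular data is to build both results and compare them with DataFrame.equals(). This is a sketch for the example DataFrames only, not a general proof that the two calls always agree:

```python
import pandas as pd

df_info = pd.DataFrame(
    {'name': ['Alice', 'Bob', 'Charlie'], 'city': ['New York', 'London', 'Paris']},
    index=[101, 102, 103],
)
df_scores = pd.DataFrame(
    {'score': [88, 92, 95], 'attempts': [1, 3, 2]},
    index=[101, 103, 104],
)

joined = df_info.join(df_scores, how='outer')
merged = pd.merge(df_info, df_scores, left_index=True, right_index=True, how='outer')

# equals() compares values, index, and dtypes; NaN in the same position counts as equal
print(joined.equals(merged))
```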


Combining DataFrames on a Column (Index-like Key)

Often, you don't join on the DataFrame's index but on a specific column that acts as a key. The idea is the same as an index join: rows are matched by key value, not by position.

Example: Now, the user_id is a column, not the index.

import pandas as pd
# DataFrame 1: User Info (user_id is a column)
df_info_col = pd.DataFrame({
    'user_id': [101, 102, 103],
    'name': ['Alice', 'Bob', 'Charlie'],
    'city': ['New York', 'London', 'Paris']
})
# DataFrame 2: Scores (user_id is a column)
df_scores_col = pd.DataFrame({
    'user_id': [101, 103, 104],
    'score': [88, 95, 76],
    'attempts': [1, 2, 4]
})

Method: pd.merge() (The Standard for Column Joins)

You specify the key column using the on parameter.

# Inner join (default)
inner_joined = pd.merge(df_info_col, df_scores_col, on='user_id')
print("Inner Join:")
print(inner_joined)

Output (Inner Join):

   user_id     name      city  score  attempts
0      101    Alice  New York     88         1
1      103  Charlie     Paris     95         2

# Outer join
outer_joined = pd.merge(df_info_col, df_scores_col, on='user_id', how='outer')
print("\nOuter Join:")
print(outer_joined)

Output (Outer Join):

   user_id     name      city  score  attempts
0      101    Alice  New York   88.0       1.0
1      102      Bob    London    NaN       NaN
2      103  Charlie     Paris   95.0       2.0
3      104      NaN       NaN   76.0       4.0
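With an outer join it is often useful to know which side each row came from. pd.merge() accepts indicator=True, which adds a _merge column for that purpose. A short sketch on the same data; the origin dictionary is only an illustrative way to display the result:

```python
import pandas as pd

df_info_col = pd.DataFrame({
    'user_id': [101, 102, 103],
    'name': ['Alice', 'Bob', 'Charlie'],
    'city': ['New York', 'London', 'Paris'],
})
df_scores_col = pd.DataFrame({
    'user_id': [101, 103, 104],
    'score': [88, 95, 76],
    'attempts': [1, 2, 4],
})

# indicator=True adds a '_merge' column: 'both', 'left_only', or 'right_only'
outer = pd.merge(df_info_col, df_scores_col, on='user_id', how='outer', indicator=True)

# Map each user_id to the side(s) it was found on
origin = dict(zip(outer['user_id'].tolist(), outer['_merge'].astype(str).tolist()))
print(origin)
```

Rows marked left_only or right_only are exactly the ones that show NaN in the columns from the other DataFrame.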

Combining Indexes in a MultiIndex DataFrame

This is a more advanced topic. A "MultiIndex" (or hierarchical index) allows you to have multiple levels of indexing.

Example: You have data for sales over several years, and you want a hierarchical index of Year and Quarter.

import pandas as pd
# Data
sales_data = [100, 120, 90, 110, 130, 125]
quarters = ['Q1', 'Q2', 'Q3', 'Q4', 'Q1', 'Q2']
years = [2024, 2024, 2024, 2024, 2025, 2025]
# Create a MultiIndex from the lists
index = pd.MultiIndex.from_arrays([years, quarters], names=['Year', 'Quarter'])
# Create a Series with the MultiIndex
sales_series = pd.Series(sales_data, index=index)
print(sales_series)

Output:

Year  Quarter
2024  Q1        100
      Q2        120
      Q3         90
      Q4        110
2025  Q1        130
      Q2        125
dtype: int64

Here, you've "combined" the years and quarters lists to create a more powerful, structured index. You can then use .loc to select data from this hierarchical index.

# Select all data for 2025
print(sales_series.loc[2025])
# Select Q1 for all years
print(sales_series.loc[:, 'Q1'])
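The selections above can also be written with Series.xs(), which names the level explicitly. The sketch below is self-contained and assumes two distinct years (2024 and 2025) so that each level has more than one value; the variable names are illustrative:

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [[2024, 2024, 2024, 2024, 2025, 2025],
     ['Q1', 'Q2', 'Q3', 'Q4', 'Q1', 'Q2']],
    names=['Year', 'Quarter'],
)
sales = pd.Series([100, 120, 90, 110, 130, 125], index=index)

# Selecting on the outer level drops it; the result is indexed by Quarter
sales_2025 = sales.loc[2025]

# xs() selects on a named inner level; the result is indexed by Year
q1_all_years = sales.xs('Q1', level='Quarter')

print(sales_2025)
print(q1_all_years)
```

Naming the level in xs() avoids relying on positional slicing, which can be easier to read once an index has more than two levels.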

Summary: Which one to use?

| Your Goal | Best Python Tool | Key Idea |
| --- | --- | --- |
| Pair elements from multiple lists by position | zip(list1, list2) | Creates an iterator of tuples. |
| Join two DataFrames using their index | df1.join(df2) or pd.merge(df1, df2, left_index=True, right_index=True) | Aligns rows based on matching index values. |
| Join two DataFrames using a common column | pd.merge(df1, df2, on='key_column') | Aligns rows based on matching values in a specified column. |
| Create a hierarchical index from lists | pd.MultiIndex.from_arrays([list1, list2]) | Combines lists to form a multi-level index for complex data structures. |