杰瑞科技汇

Python merge left 如何实现?

Of course! The term "merge left" in Python is most commonly associated with combining data from multiple sources, typically pandas DataFrames, by aligning them based on their columns (keys) and stacking them vertically.

Let's break down the concept, starting with the most common scenario and then covering other possibilities.


The Most Common Meaning: Vertical Stacking of DataFrames (Pandas)

When data professionals say "merge left," they are often referring to a specific type of database-style join. However, the phrase can also be used more informally to mean "stacking DataFrames on top of each other."

Let's clarify both.

Scenario A: The Informal Meaning - Stacking Vertically

This is the simplest interpretation. You have two or more DataFrames with the same (or similar) columns, and you want to combine them into one larger DataFrame. The pandas function for this is pd.concat().

Imagine you have two CSV files: sales_jan.csv and sales_feb.csv. You want to merge them into a single sales_data DataFrame.

Key Requirement: The DataFrames should have the same column names and data types for a clean merge.

Example:

import pandas as pd
# --- Sample Data ---
# DataFrame 1: January Sales
data_jan = {
    'OrderID': [101, 102, 103],
    'Product': ['Laptop', 'Mouse', 'Keyboard'],
    'Sales': [1200, 25, 75]
}
df_jan = pd.DataFrame(data_jan)
# DataFrame 2: February Sales
data_feb = {
    'OrderID': [104, 105, 106],
    'Product': ['Monitor', 'Laptop', 'Desk'],
    'Sales': [300, 1100, 200]
}
df_feb = pd.DataFrame(data_feb)
print("--- January Sales ---")
print(df_jan)
print("\n--- February Sales ---")
print(df_feb)
# --- The "Merge Left" (Stacking) Operation ---
# We use pd.concat with axis=0 to stack vertically (axis=1 would stack side-by-side)
merged_df = pd.concat([df_jan, df_feb], ignore_index=True)
print("\n--- Merged Sales Data (Stacked) ---")
print(merged_df)

Output:

--- January Sales ---
   OrderID   Product  Sales
0      101    Laptop   1200
1      102     Mouse     25
2      103  Keyboard     75
--- February Sales ---
   OrderID   Product  Sales
0      104   Monitor    300
1      105    Laptop   1100
2      106      Desk    200
--- Merged Sales Data (Stacked) ---
   OrderID   Product  Sales
0      101    Laptop   1200
1      102     Mouse     25
2      103  Keyboard     75
3      104   Monitor    300
4      105    Laptop   1100
5      106      Desk    200

Explanation:

  • pd.concat(): The pandas function for concatenating objects.
  • [df_jan, df_feb]: A list of the DataFrames to merge.
  • axis=0: This is the default, meaning to stack along the rows (vertically). You could omit it.
  • ignore_index=True: This is crucial. It resets the index of the new DataFrame, preventing duplicate index values (e.g., both original DataFrames start with index 0). Without it, the merged DataFrame would have a non-unique index.

Scenario B: The Formal Meaning - A Left Join

In database terminology, a "left join" is a specific type of merge. It combines two DataFrames based on a common key and keeps all rows from the left DataFrame, adding matching data from the right DataFrame where available.

The pandas function for this is pd.merge().

Key Requirement: You need to specify a common column (or key) to join on.

Example:

Let's say you have a products DataFrame and a sales DataFrame. You want to see all products and their sales data, but you also want to keep products that haven't been sold yet.

import pandas as pd
# --- Sample Data ---
# DataFrame 1: Products (the "left" DataFrame)
products_data = {
    'ProductID': ['P01', 'P02', 'P03', 'P04'],
    'ProductName': ['Laptop', 'Mouse', 'Keyboard', 'Webcam'],
    'Price': [1200, 25, 75, 150]
}
products_df = pd.DataFrame(products_data)
# DataFrame 2: Sales (the "right" DataFrame)
sales_data = {
    'SaleID': [1, 2, 3],
    'ProductID': ['P01', 'P02', 'P01'], # Note: P03 and P04 are not here
    'Quantity': [1, 2, 1]
}
sales_df = pd.DataFrame(sales_data)
print("--- Products (Left) ---")
print(products_df)
print("\n--- Sales (Right) ---")
print(sales_df)
# --- The "Merge Left" (Left Join) Operation ---
# We want all products, and their sales if they exist.
merged_left_df = pd.merge(
    left=products_df,
    right=sales_df,
    on='ProductID',  # The key to join on
    how='left'       # The type of join: 'left', 'right', 'inner', 'outer'
)
print("\n--- Merged Result (Left Join) ---")
print(merged_left_df)

Output:

--- Products (Left) ---
  ProductID ProductName  Price
0       P01      Laptop   1200
1       P02       Mouse     25
2       P03    Keyboard     75
3       P04      Webcam    150
--- Sales (Right) ---
   SaleID ProductID  Quantity
0       1       P01         1
1       2       P02         2
2       3       P01         1
--- Merged Result (Left Join) ---
  ProductID ProductName  Price  SaleID  Quantity
0       P01      Laptop   1200       1         1
1       P01      Laptop   1200       3         1
2       P02       Mouse     25       2         2
3       P03    Keyboard     75     NaN       NaN
4       P04      Webcam    150     NaN       NaN

Explanation:

  • pd.merge(): The function for database-style joins.
  • left=products_df: The "left" DataFrame. All its rows will be preserved.
  • right=sales_df: The "right" DataFrame. Only rows with matching keys will be added.
  • on='ProductID': The column name(s) to use as the key for the join. Pandas will look for matching values in this column in both DataFrames.
  • how='left': This is the critical parameter. It specifies that we want a left join.
    • Notice that Keyboard (P03) and Webcam (P04) from the left DataFrame are kept in the result.
    • Where there was no matching sale, the new columns (SaleID, Quantity) are filled with NaN (Not a Number), which is pandas' way of representing missing data.

Other Meanings of "Merge Left"

While pandas is the most common context, "merge left" could refer to other things:

Merging Dictionaries

You can merge dictionaries in Python. The term "left" isn't standard here, but you can think of it as updating a "left" dictionary with a "right" one.

  • dict.update(): Modifies the dictionary in-place.
  • {**d1, **d2} (Unpacking): Creates a new merged dictionary.
left_dict = {'name': 'Alice', 'city': 'New York'}
right_dict = {'city': 'Boston', 'country': 'USA'}
# Using the merge/update operator (Python 3.5+)
merged_dict = {**left_dict, **right_dict}
print(merged_dict)
# Output: {'name': 'Alice', 'city': 'Boston', 'country': 'USA'}
# The 'city' from the right_dict overwrote the one from the left_dict.

Merging Lists

For lists, "merge left" usually means concatenating them.

list1 = [1, 2, 3]
list2 = [4, 5, 6]
merged_list = list1 + list2
print(merged_list)
# Output: [1, 2, 3, 4, 5, 6]

Summary: Which one should you use?

Your Goal Python Tool Key Parameters
Stack two tables/DataFrames with the same columns on top of each other. pd.concat() [df1, df2], ignore_index=True
Combine two tables/DataFrames based on a common key, keeping all rows from the first (left) table. pd.merge() left=..., right=..., on='key', how='left'
Combine two dictionaries, with values from the second overwriting the first. {**d1, **d2} or d1.update(d2) N/A
Combine two lists into a single list. list1 + list2 N/A

In almost any data analysis context, if someone says "merge left," they are most likely referring to pd.merge(..., how='left'). If they just have two simple tables of the same shape, they might mean pd.concat().

分享:
扫描分享到社交APP
上一篇
下一篇