
How do I perform efficient bulk inserts with Python's executemany()?

Let's take a deep dive into executemany(), a method available on database cursors in Python libraries such as sqlite3, psycopg2 (PostgreSQL), and mysql-connector-python.


What is executemany()?

executemany() is a method available on database cursor objects. Its primary purpose is to execute a single SQL statement multiple times, with different sets of parameters for each execution. This is the standard and most efficient way to perform bulk inserts or updates into a database.

Why Use executemany()? (The Benefits)

Imagine you need to insert 1,000 new users into your database.

The Inefficient Way (Using a Loop with execute()):

import sqlite3
conn = sqlite3.connect('my_database.db')
cursor = conn.cursor()
users = [
    ('Alice', 30),
    ('Bob', 25),
    ('Charlie', 35),
    # ... 997 more users
]
for user in users:
    cursor.execute("INSERT INTO users (name, age) VALUES (?, ?)", user)
conn.commit()
conn.close()

Problems with this approach:

  1. High Overhead: Each cursor.execute() call involves a full round-trip to the database. For 1,000 users, that's 1,000 separate network/database calls. This is extremely slow.
  2. Transaction Overhead: Each statement is typically run in its own transaction (or a small transaction group), which is less efficient than a single large transaction.

The Efficient Way (Using executemany()):

import sqlite3
conn = sqlite3.connect('my_database.db')
cursor = conn.cursor()
users = [
    ('Alice', 30),
    ('Bob', 25),
    ('Charlie', 35),
    # ... 997 more users
]
# Execute the statement once for each tuple in the list
cursor.executemany("INSERT INTO users (name, age) VALUES (?, ?)", users)
conn.commit()
conn.close()

Benefits of this approach:

  1. Reduced Overhead: The statement is parsed once, and the driver avoids repeated per-call overhead; some drivers can also bundle the operations into fewer network round trips.
  2. Optimized Transactions: The entire batch of operations is typically wrapped in a single transaction, which is much faster for bulk operations.
  3. Cleaner Code: The code is more concise and clearly expresses the intent of performing a bulk operation.
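The gap is easy to measure yourself. Below is a minimal, self-contained timing sketch (in-memory SQLite, made-up table and row count); with a local in-memory database the difference mostly reflects per-call Python overhead, and it grows much larger over a real network connection, so no specific speedup is promised here.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE users (name TEXT, age INTEGER)")
users = [(f"user_{i}", 20 + i % 50) for i in range(10_000)]

# One execute() call per row
t0 = time.perf_counter()
for user in users:
    cursor.execute("INSERT INTO users (name, age) VALUES (?, ?)", user)
loop_time = time.perf_counter() - t0
cursor.execute("DELETE FROM users")

# A single executemany() call for the whole batch
t0 = time.perf_counter()
cursor.executemany("INSERT INTO users (name, age) VALUES (?, ?)", users)
many_time = time.perf_counter() - t0

cursor.execute("SELECT COUNT(*) FROM users")
n_rows = cursor.fetchone()[0]
print(f"execute() loop: {loop_time:.4f}s, executemany(): {many_time:.4f}s, rows: {n_rows}")
conn.close()
```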

How executemany() Works: The Parameter Sequence

The key to using executemany() is understanding its second argument, the seq_of_parameters. This argument must be a sequence (like a list or tuple) of sequences.

  • Outer Sequence: The list/tuple containing all the parameter sets.
  • Inner Sequences: Each individual item in the outer list must be a sequence (tuple or list) that contains the parameters for one execution of the SQL statement.

The SQL statement should contain placeholders (? for SQLite, %s for Psycopg2/MySQL, etc.). The number of placeholders must match the number of items in each inner sequence.
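As a concrete illustration, sqlite3 also accepts named placeholders (the :name style), in which case each inner item is a dict whose keys match the placeholder names rather than a positional tuple. A small sketch with a hypothetical users table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE users (name TEXT, age INTEGER)")

# Named placeholders: each parameter set is a dict keyed by placeholder name
rows = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
]
cursor.executemany("INSERT INTO users (name, age) VALUES (:name, :age)", rows)
conn.commit()

cursor.execute("SELECT name, age FROM users ORDER BY name")
result = cursor.fetchall()
print(result)  # [('Alice', 30), ('Bob', 25)]
```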


Example with SQLite (? placeholders)

import sqlite3
# Setup
conn = sqlite3.connect(':memory:') # Use an in-memory DB for this example
cursor = conn.cursor()
cursor.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
# Data to insert
products_to_add = [
    (1, 'Laptop', 1200.50),
    (2, 'Mouse', 25.99),
    (3, 'Keyboard', 75.00),
    (4, 'Monitor', 300.75)
]
# The SQL statement has 3 placeholders. Each inner tuple has 3 values.
cursor.executemany("INSERT INTO products (id, name, price) VALUES (?, ?, ?)", products_to_add)
conn.commit()
# Verify the data
cursor.execute("SELECT * FROM products")
print(cursor.fetchall())
# Output: [(1, 'Laptop', 1200.5), (2, 'Mouse', 25.99), (3, 'Keyboard', 75.0), (4, 'Monitor', 300.75)]
conn.close()
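executemany() is not limited to INSERT: the same pattern works for bulk UPDATEs (and DELETEs). Each inner tuple supplies the parameters for one execution, in the order the placeholders appear in the statement. A short sketch reusing the products table idea (prices and IDs are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
cursor.executemany(
    "INSERT INTO products (id, name, price) VALUES (?, ?, ?)",
    [(1, 'Laptop', 1200.50), (2, 'Mouse', 25.99)],
)

# Each inner tuple is (new_price, id), matching the placeholder order below
price_updates = [
    (999.99, 1),  # discount the laptop
    (19.99, 2),   # discount the mouse
]
cursor.executemany("UPDATE products SET price = ? WHERE id = ?", price_updates)
conn.commit()

cursor.execute("SELECT id, price FROM products ORDER BY id")
prices = cursor.fetchall()
print(prices)  # [(1, 999.99), (2, 19.99)]
conn.close()
```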

Example with Psycopg2 (PostgreSQL, %s placeholders)

# pip install psycopg2-binary
import psycopg2
# Setup (replace with your actual DB connection details)
conn = psycopg2.connect(
    dbname="mydb",
    user="myuser",
    password="mypassword",
    host="localhost"
)
cursor = conn.cursor()
# Data to insert
orders_to_add = [
    ('ORD-001', 'Alice', 150.75),
    ('ORD-002', 'Bob', 89.99),
    ('ORD-003', 'Charlie', 210.0)
]
# The SQL statement uses %s placeholders.
cursor.executemany("INSERT INTO orders (order_id, customer_name, total_amount) VALUES (%s, %s, %s)", orders_to_add)
conn.commit()
conn.close()

Advanced Usage: executemany() with a Custom Function

The executemany() method is not limited to simple lists of tuples. Its second argument can be any iterable. This means you can use a generator function, which is incredibly memory-efficient for processing very large datasets that don't fit in memory.

Let's create a generator that yields data from a large file.

import sqlite3
def generate_users_from_file(file_path):
    """A generator that yields user data from a file line by line."""
    with open(file_path, 'r') as f:
        for line in f:
            # Assuming each line is "name,age"
            name, age_str = line.strip().split(',')
            yield (name, int(age_str))
# --- Main Execution ---
conn = sqlite3.connect('my_database.db')
cursor = conn.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS users (name TEXT, age INTEGER)")
# Create a dummy file for demonstration
with open('users.txt', 'w') as f:
    f.write("David,40\n")
    f.write("Eve,28\n")
    f.write("Frank,33\n")
    # ... imagine millions of lines here
# Use the generator with executemany. No need to load the whole file into memory!
cursor.executemany("INSERT INTO users (name, age) VALUES (?, ?)", generate_users_from_file('users.txt'))
conn.commit()
conn.close()
print("Data inserted successfully.")
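One caveat with a single giant executemany() call: it runs as one transaction, so nothing is committed until the whole stream has been consumed. If you would rather commit periodically, you can slice any iterable into chunks yourself. Here chunked is a small hand-rolled helper (not part of any library), and the chunk size of 10,000 is arbitrary:

```python
import sqlite3
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE users (name TEXT, age INTEGER)")

# Stand-in for a huge generator (e.g. generate_users_from_file above)
big_stream = ((f"user_{i}", 20 + i % 50) for i in range(25_000))

total = 0
for chunk in chunked(big_stream, 10_000):
    cursor.executemany("INSERT INTO users (name, age) VALUES (?, ?)", chunk)
    conn.commit()  # one transaction per chunk
    total += len(chunk)

print(f"Inserted {total} rows in chunks.")
```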

Error Handling in executemany()

What happens if one of the operations in the batch fails?

  • SQLite: If one parameter set fails, executemany() stops at that point and raises the error. Rows processed before the failure are part of the still-open transaction, so nothing is permanent until you call conn.commit(); a conn.rollback() discards the partial batch.
  • Psycopg2 (PostgreSQL): The first failure likewise raises an exception and leaves the current transaction in an aborted state; every subsequent statement will fail until you call conn.rollback().

Best Practice: For robust applications, wrap your database operations in a try...except...finally block.

conn = None
try:
    conn = sqlite3.connect('my_database.db')
    cursor = conn.cursor()
    # ... setup table ...
    data = [ ... ] # Your large list of data
    cursor.executemany("INSERT INTO mytable (col1, col2) VALUES (?, ?)", data)
    conn.commit()
    print(f"Successfully inserted {cursor.rowcount} rows.")
except sqlite3.Error as e:
    print(f"Database error: {e}")
    # Roll back the partial batch in case of error
    if conn:
        conn.rollback()
finally:
    # Ensure the connection is always closed
    if conn:
        conn.close()

executemany() vs. Batching Helpers (execute_batch())

One important driver difference: in classic psycopg2, executemany() is not specially optimized. Under the hood it still performs one round trip per parameter set, much like a Python loop. For genuinely large batches, psycopg2 provides helper functions in psycopg2.extras:

  • execute_batch(cursor, sql, argslist, page_size=100): Groups the statements into "pages" and sends each page in a single round trip, drastically reducing network overhead.
  • execute_values(cursor, sql, argslist): Rewrites a single INSERT ... VALUES %s statement into one multi-row VALUES clause, usually the fastest option for bulk inserts.

(In the newer psycopg 3 driver, cursor.executemany() is already optimized internally, so these helpers are unnecessary there.)

Example with psycopg2.extras.execute_batch:

# pip install psycopg2-binary
import psycopg2
from psycopg2.extras import execute_batch

conn = psycopg2.connect("dbname=test user=postgres")
cursor = conn.cursor()
data = [(i, f"item_{i}") for i in range(1, 100001)]  # 100,000 items
# Send the statements in pages of 1000 per network round trip
execute_batch(
    cursor,
    "INSERT INTO items (id, name) VALUES (%s, %s)",
    data,
    page_size=1000
)
conn.commit()
conn.close()

Summary

  • Purpose: Efficiently execute one SQL statement many times with different parameters.
  • Best for: Bulk inserts, updates, and deletes.
  • Key argument: A sequence (list/tuple) of sequences (tuples/lists of parameters).
  • Placeholders: ? for SQLite, %s for Psycopg2/MySQL, etc.
  • Performance: Far superior to looping with execute() due to lower per-statement overhead and a single transaction.
  • Flexibility: Works with any iterable, including memory-efficient generators.
  • Error handling: Use try...except...finally blocks to manage rollback and connection closing.
  • Very large datasets: Consider psycopg2's execute_batch()/execute_values() helpers, or chunk the data yourself.