杰瑞科技汇

Python executemany如何高效批量执行SQL?

Of course! The term "execute many" in Python almost always refers to the concept of executing multiple SQL statements efficiently, typically against a database. The goal is to avoid the overhead of sending each query to the database one by one, which can be very slow.

Python executemany如何高效批量执行SQL?-图1
(图片来源网络,侵删)

The primary and most recommended way to do this in Python is by using the executemany() method provided by database cursor objects, which is part of the Python Database API Specification (DB-API 2.0). This is supported by most database drivers like psycopg2 (PostgreSQL), mysql-connector-python (MySQL), and sqlite3 (SQLite).

Let's break it down with clear examples.


The Problem: Why Not Just Loop?

Imagine you want to insert 1,000 new users into a database. The naive approach would be:

# BAD - DO NOT DO THIS
import psycopg2 # Example for PostgreSQL
conn = psycopg2.connect("your_connection_string")
cursor = conn.cursor()
users = [('user1', 'pass1'), ('user2', 'pass2'), ...] # 1000 tuples
for user in users:
    sql = "INSERT INTO users (username, password) VALUES (%s, %s);"
    cursor.execute(sql, user) # Execute one statement at a time
conn.commit()

This is inefficient because for each cursor.execute(), your Python application has to:

Python executemany如何高效批量执行SQL?-图2
(图片来源网络,侵删)
  1. Serialize the query and data.
  2. Send it over the network to the database.
  3. The database parses the query, creates a plan, and executes it.
  4. Send a response back.

For 1,000 users, this is 1,000 separate network round-trips. This is very slow.


The Solution: cursor.executemany()

The executemany() method is designed for exactly this scenario. It takes a single SQL statement with placeholders and a sequence (like a list) of parameter sequences (like a list of tuples). The database driver then sends the data to the database server in a single, optimized operation.

The syntax is: cursor.executemany(sql, seq_of_parameters)

Key Features:

  • Efficiency: Dramatically reduces network overhead.
  • Transactions: By default, executemany() is performed as a single transaction. If any single insert fails, the entire operation is rolled back, ensuring your data remains consistent.
  • Flexibility: It works for INSERT, UPDATE, and DELETE statements.

Concrete Examples

Let's look at how to use executemany() with different databases.

Python executemany如何高效批量执行SQL?-图3
(图片来源网络,侵删)

Example 1: SQLite (Built-in, no installation needed)

This is the simplest example to get started.

import sqlite3
# Connect to an in-memory database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
# Create a table
cursor.execute('''
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        quantity INTEGER
    )
''')
# --- executemany() in action ---
# 1. Define the SQL statement with placeholders (? for SQLite)
insert_sql = "INSERT INTO products (name, quantity) VALUES (?, ?)"
# 2. Define the data as a list of tuples
products_to_insert = [
    ('Laptop', 10),
    ('Mouse', 50),
    ('Keyboard', 30),
    ('Monitor', 25)
]
# 3. Execute the many statements
cursor.executemany(insert_sql, products_to_insert)
# 4. Commit the transaction
conn.commit()
# Verify the data
cursor.execute("SELECT * FROM products;")
print("Data after executemany:")
for row in cursor.fetchall():
    print(row)
# Clean up
conn.close()

Example 2: PostgreSQL (using psycopg2)

You'll need to install it first: pip install psycopg2-binary

import psycopg2
# --- IMPORTANT: Replace with your actual connection details ---
DB_NAME = "your_db"
DB_USER = "your_user"
DB_PASS = "your_password"
DB_HOST = "localhost"
DB_PORT = "5432"
try:
    conn = psycopg2.connect(
        dbname=DB_NAME,
        user=DB_USER,
        password=DB_PASS,
        host=DB_HOST,
        port=DB_PORT
    )
    cursor = conn.cursor()
    # Create a table (if it doesn't exist)
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS employees (
            id SERIAL PRIMARY KEY,
            name VARCHAR(100) NOT NULL,
            department VARCHAR(50),
            salary NUMERIC(10, 2)
        );
    ''')
    # Data to insert (list of tuples)
    employees_to_insert = [
        ('Alice', 'Engineering', 95000.00),
        ('Bob', 'Sales', 60000.00),
        ('Charlie', 'Engineering', 110000.00),
        ('Diana', 'HR', 65000.00)
    ]
    # SQL with placeholders (%s for psycopg2)
    insert_sql = "INSERT INTO employees (name, department, salary) VALUES (%s, %s, %s);"
    # Execute many
    cursor.executemany(insert_sql, employees_to_insert)
    # Commit the transaction
    conn.commit()
    print(f"{cursor.rowcount} records inserted successfully.")
    # Verify
    cursor.execute("SELECT name, department FROM employees;")
    print("\nEmployees added:")
    for row in cursor.fetchall():
        print(f"  - {row[0]} in {row[1]}")
except (Exception, psycopg2.Error) as error:
    print(f"Error while connecting to PostgreSQL or executing query: {error}")
    if conn:
        conn.rollback() # Roll back in case of error
finally:
    # Closing database connection.
    if conn:
        cursor.close()
        conn.close()
        print("\nPostgreSQL connection is closed.")

Advanced Use Case: executemany() with UPDATE

The same principle applies to updating multiple rows.

import sqlite3
# (Assuming the 'products' table from the first example exists)
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
# ... (code to create table and insert initial data from Example 1) ...
# --- Using executemany() for UPDATE ---
# 1. Define the SQL statement
update_sql = "UPDATE products SET quantity = ? WHERE name = ?"
# 2. Define the data as a list of tuples
# Note the order: (new_value, condition_value)
updates_to_make = [
    (15, 'Laptop'),   # Set quantity of 'Laptop' to 15
    (55, 'Mouse'),    # Set quantity of 'Mouse' to 55
    (100, 'Keyboard') # Set quantity of 'Keyboard' to 100
]
# 3. Execute the many update statements
cursor.executemany(update_sql, updates_to_make)
# 4. Commit the transaction
conn.commit()
# Verify the data
cursor.execute("SELECT name, quantity FROM products;")
print("\nData after executemany update:")
for row in cursor.fetchall():
    print(f"  - {row[0]}: {row[1]}")
conn.close()

Performance Comparison

Let's see the dramatic difference in performance.

import sqlite3
import time
# Setup
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("CREATE TABLE test_data (id INTEGER, value TEXT)")
# Generate 10,000 records to insert
data_to_insert = [(i, f'value_{i}') for i in range(10000)]
# --- Method 1: Loop with cursor.execute() ---
start_time = time.time()
for record in data_to_insert:
    cursor.execute("INSERT INTO test_data (id, value) VALUES (?, ?)", record)
conn.commit()
loop_time = time.time() - start_time
print(f"Loop with execute(): {loop_time:.4f} seconds")
# --- Method 2: cursor.executemany() ---
start_time = time.time()
cursor.executemany("INSERT INTO test_data (id, value) VALUES (?, ?)", data_to_insert)
conn.commit()
executemany_time = time.time() - start_time
print(f"executemany(): {executemany_time:.4f} seconds")
# --- Clean up ---
conn.close()
# Result will be something like:
# Loop with execute(): 1.2345 seconds
# executemany(): 0.0123 seconds
# executemany() is ~100x faster!

Summary: When to Use executemany()

Scenario Recommended Method Why?
Inserting/Updating a list of records cursor.executemany() Highly Recommended. Massive performance gain due to fewer network trips and efficient batching.
Executing the same query with different parameters one at a time cursor.executemany() Good choice if you have the data already in a list.
Executing a sequence of different SQL statements for loop with cursor.execute() executemany() is for one SQL statement with varying parameters. If each query is unique, you must loop.
Running a single SQL statement cursor.execute() The standard method for one-off queries.
分享:
扫描分享到社交APP
上一篇
下一篇