杰瑞科技汇

Python bulkcreate: How to Create Objects in Bulk Efficiently?

In Python, "bulk create" refers to creating a large number of objects, records, or entries in one efficient operation, typically to minimize database queries, network requests, or computational overhead.


The best method for "bulk create" depends entirely on what you are creating:

  1. Bulk Database Creation: Inserting many rows into a database.
  2. Bulk Object Creation: Creating many instances of a Python class in memory.
  3. Bulk File Creation: Creating many files.

Let's break down the most common scenarios.


Bulk Database Creation (Most Common Use Case)

When you need to insert hundreds or thousands of records into a database, doing it one by one is extremely slow: each INSERT is a separate round trip to the database server, and under autocommit each one is also committed as its own transaction.

The solution is to use a bulk insert operation provided by your database library.
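The difference is visible even without an ORM. The sketch below uses Python's built-in sqlite3 module with an in-memory database (chosen only to keep the example self-contained); insert_one_by_one commits after every row, while insert_bulk sends all rows with one executemany() call inside a single transaction. The table schema and function names are hypothetical.

```python
import sqlite3
import time


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a simple products table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    return conn


def insert_one_by_one(conn: sqlite3.Connection, rows) -> None:
    # One INSERT and one COMMIT per row: the slow pattern described above.
    for name, price in rows:
        conn.execute("INSERT INTO products (name, price) VALUES (?, ?)", (name, price))
        conn.commit()


def insert_bulk(conn: sqlite3.Connection, rows) -> None:
    # One executemany() call inside one transaction; `with conn` commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        conn.executemany("INSERT INTO products (name, price) VALUES (?, ?)", rows)


if __name__ == "__main__":
    rows = [(f"item_{i}", i * 1.5) for i in range(10_000)]
    for fn in (insert_one_by_one, insert_bulk):
        conn = make_db()
        start = time.perf_counter()
        fn(conn, rows)
        elapsed = time.perf_counter() - start
        count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
        print(f"{fn.__name__}: {count} rows in {elapsed:.3f}s")
        conn.close()
```

An in-memory SQLite database has no network round trips and very cheap commits, so the gap you measure here will usually be smaller than against a file-based database or a networked server such as PostgreSQL; actual numbers depend on your setup.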


A. Django ORM (Databases like PostgreSQL, MySQL, SQLite)

Django's ORM has a bulk_create() method that is highly optimized for this. It takes a list of model instances and translates them into a single INSERT statement (or a small number of them), drastically reducing the number of database queries.

How it works:

  1. Create a list of unsaved model instances.
  2. Pass the list to MyModel.objects.bulk_create().

Example:

import os
from decimal import Decimal
import django
from django.db import models
# --- Setup for a minimal Django environment (for demonstration) ---
# In a real Django project, you wouldn't need this.
# 'myproject.settings' is a placeholder for your own settings module.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')
django.setup()
# --- Your Django Model ---
# Normally this lives in an app's models.py. When defined elsewhere, Django needs an
# explicit app_label; 'shop' is a hypothetical app that must be in INSTALLED_APPS,
# and its table must already exist (e.g. created by migrations).
class Product(models.Model):
    name = models.CharField(max_length=100)
    price = models.DecimalField(max_digits=10, decimal_places=2)
    stock = models.IntegerField()

    class Meta:
        app_label = 'shop'

    def __str__(self):
        return self.name
# --- Bulk Creation ---
if __name__ == '__main__':
    print("Creating products...")
    # 1. Create a list of unsaved model instances.
    # Decimal (not float) matches DecimalField exactly and avoids rounding surprises.
    products_to_create = [
        Product(name="Laptop", price=Decimal("1200.00"), stock=50),
        Product(name="Mouse", price=Decimal("25.50"), stock=200),
        Product(name="Keyboard", price=Decimal("75.00"), stock=150),
        # ... imagine thousands more
    ]
    # 2. Perform the bulk create operation.
    # For a short list like this, Django typically sends a single INSERT for all 3 products.
    created_objects = Product.objects.bulk_create(products_to_create)
    print(f"Successfully created {len(created_objects)} products.")
    # Note: bulk_create() returns the same instances you passed in. Their `id`
    # (auto-increment primary key) is filled in only on backends that can return it:
    # PostgreSQL, MariaDB 10.5+, and SQLite 3.35+. On other databases it stays None.
    # For very large lists, use `batch_size` to split the insert into chunks
    # (e.g., 1000 objects per query):
    # Product.objects.bulk_create(large_list, batch_size=1000)
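Conceptually, batch_size splits the list into fixed-size slices and issues one INSERT per slice. The following is a minimal sketch of that idea using sqlite3 and a hypothetical chunked() helper; it is not Django's implementation, only an illustration of how batching bounds the size of each statement.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def chunked(items: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_insert_in_batches(
    conn: sqlite3.Connection, rows: Iterable[Tuple[str, float]], batch_size: int
) -> int:
    """Insert rows as one multi-row INSERT per batch; return the number of statements issued."""
    statements = 0
    # Keep batch_size * columns below the database's bound-parameter limit
    # (999 on SQLite builds older than 3.32.0).
    with conn:  # all batches share one transaction
        for batch in chunked(rows, batch_size):
            placeholders = ", ".join(["(?, ?)"] * len(batch))
            params = [value for row in batch for value in row]
            conn.execute(f"INSERT INTO products (name, price) VALUES {placeholders}", params)
            statements += 1
    return statements


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    rows = [(f"item_{i}", float(i)) for i in range(1000)]
    print(bulk_insert_in_batches(conn, rows, batch_size=400))  # 3 (400 + 400 + 200 rows)
```

Smaller batches mean more statements but smaller individual queries, which matters when the database limits statement size or the number of bound parameters.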

Why bulk_create is better:

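  • Performance: It is typically much faster than calling obj.save() in a loop, because it sends far fewer queries to the database.
  • Atomicity: All batches are inserted inside a single transaction, so if any batch fails, none of the rows are saved. bulk_create() has no parameter to change this; if you want batches committed separately, call bulk_create() once per chunk yourself.
  • Caveats: bulk_create() does not call each model's save() method, does not send pre_save/post_save signals, and does not work with many-to-many relationships.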
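The all-or-nothing behavior is what you get from wrapping the inserts in one database transaction. A minimal sketch with sqlite3 (the schema and the duplicate row are invented for illustration): a UNIQUE violation partway through rolls back every row, leaving the table empty.

```python
import sqlite3
from typing import Sequence


def insert_all_or_nothing(conn: sqlite3.Connection, names: Sequence[str]) -> bool:
    """Insert all names in one transaction; return False (and insert nothing) if any row fails."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.executemany("INSERT INTO users (name) VALUES (?)", [(n,) for n in names])
    except sqlite3.IntegrityError:
        return False
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT UNIQUE)")
    ok = insert_all_or_nothing(conn, ["alice", "bob", "alice"])  # duplicate "alice"
    count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    print(ok, count)  # False 0
```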

B. SQLAlchemy (SQLAlchemy Core and ORM)

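SQLAlchemy provides several ways to insert in bulk. insert() is the Core construct that builds an INSERT statement directly, while Session.bulk_save_objects() works with ORM objects. (In SQLAlchemy 2.0, bulk_save_objects() is marked as a legacy feature; the recommended ORM approach there is session.execute(insert(Model), list_of_dicts).)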

Using Session.bulk_save_objects() (ORM):

from sqlalchemy import create_engine, Column, Integer, String, Float
# declarative_base lives in sqlalchemy.orm since 1.4; the old sqlalchemy.ext.declarative path is deprecated
from sqlalchemy.orm import declarative_base, sessionmaker
Base = declarative_base()
class Product(Base):
    __tablename__ = 'products'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    price = Column(Float)
# Setup
engine = create_engine('sqlite:///products.db') # Using SQLite for example
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
# --- Bulk Creation ---
products_to_create = [
    Product(name="Laptop", price=1200.00),
    Product(name="Mouse", price=25.50),
    Product(name="Keyboard", price=75.00),
]
# bulk_save_objects() skips most of the ORM's unit-of-work bookkeeping and batches the INSERTs.
# Passing return_defaults=True would fetch the new primary keys, but it can significantly
# reduce the speed benefit, so omit it unless you need the IDs.
session.bulk_save_objects(products_to_create)
session.commit()
print("Bulk products saved using SQLAlchemy ORM.")
session.close()

Using insert() (Core, more direct):

This is often faster still, because it skips creating ORM objects altogether: you pass plain dictionaries and SQLAlchemy emits a multi-row INSERT statement.

from sqlalchemy import insert
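# ... (setup from the previous example: Base, Product, engine, Session) ...
session = Session()  # open a fresh session; the previous example closed its session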
# Create a list of dictionaries
data_to_insert = [
    {'name': 'Monitor', 'price': 300.00},
    {'name': 'Webcam', 'price': 80.00},
]
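# Execute a single, multi-row INSERT ... VALUES (...), (...) statement
stmt = insert(Product).values(data_to_insert)
session.execute(stmt)
# Alternative (SQLAlchemy 2.0 style): pass the list as parameters instead,
# which lets SQLAlchemy batch the rows via the driver's executemany support:
# session.execute(insert(Product), data_to_insert)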
session.commit()
print("Bulk products saved using SQLAlchemy Core.")
session.close()
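At the driver level, a list of dictionaries like data_to_insert maps naturally onto a DB-API executemany() call with named parameters. The sketch below shows that with sqlite3; it is an illustration of the idea, not the exact SQL SQLAlchemy emits, which depends on the dialect and version.

```python
import sqlite3

# Same shape as data_to_insert above: one dict per row, keys matching column names.
data_to_insert = [
    {"name": "Monitor", "price": 300.00},
    {"name": "Webcam", "price": 80.00},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
with conn:
    # Named placeholders (:name, :price) are filled from each dictionary in turn.
    conn.executemany("INSERT INTO products (name, price) VALUES (:name, :price)", data_to_insert)

rows = conn.execute("SELECT id, name, price FROM products ORDER BY id").fetchall()
print(rows)  # [(1, 'Monitor', 300.0), (2, 'Webcam', 80.0)]
```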

Bulk Object Creation (In Memory)

If you only need many Python objects in memory and don't need to save them to a database, the main concerns are construction speed and memory use.

A. List Comprehension (Fast & Pythonic)

This is the standard, readable, and fast way to create a list of objects.

class User:
    def __init__(self, username, email):
        self.username = username
        self.email = email
# Create 1000 User objects
users = [User(f"user_{i}", f"user{i}@example.com") for i in range(1000)]
print(f"Created {len(users)} user objects in memory.")
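When you hold many instances at once and memory matters, one common option is to declare __slots__, which stores attributes in fixed slots instead of a per-instance __dict__. A minimal sketch; SlottedUser is a hypothetical name mirroring the User class above.

```python
class SlottedUser:
    # __slots__ replaces the per-instance __dict__ with fixed attribute slots,
    # which typically reduces memory per object when you create many of them.
    __slots__ = ("username", "email")

    def __init__(self, username: str, email: str) -> None:
        self.username = username
        self.email = email


users = [SlottedUser(f"user_{i}", f"user{i}@example.com") for i in range(1000)]
print(len(users), users[0].username)  # 1000 user_0
print(hasattr(users[0], "__dict__"))  # False
```

The trade-off is flexibility: assigning an attribute not listed in __slots__ raises AttributeError.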

B. Generator Function (Memory Efficient for Iteration)

If you don't need all the objects at once but only need to iterate over them, a generator is a good fit: it creates one object at a time, so memory use stays roughly constant instead of growing with the number of objects.

def generate_users(count):
    for i in range(count):
        yield User(f"user_{i}", f"user{i}@example.com")
# Iterate over the users one by one without storing them all
for user in generate_users(1000000): # Can handle a very large number
    # do something with user
    if user.username == "user_42":
        print("Found user 42!")
        break # Stop iteration when done
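The same lazy behavior is available as a one-line generator expression, which looks like a list comprehension with parentheses. A minimal sketch that also uses next() with a default to stop at the first match (the User class is repeated so the block is self-contained):

```python
class User:
    def __init__(self, username, email):
        self.username = username
        self.email = email


# A generator expression: objects are created lazily, one per iteration step.
users = (User(f"user_{i}", f"user{i}@example.com") for i in range(1_000_000))

# next() with a default stops at the first match, so only 43 objects are ever built here.
found = next((u for u in users if u.username == "user_42"), None)
print(found.username if found else "not found")  # user_42
```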

Bulk File Creation

If you need to create many files, the key is to keep the work per file small: open each file once, write its contents, and close it promptly.

A. Using a with statement and a Loop

This is a clean and safe way to create multiple files. The with statement ensures each file is properly closed.

import os
# Create a directory to hold the files; exist_ok=True makes this a no-op if it already exists
output_dir = "logs"
os.makedirs(output_dir, exist_ok=True)
# Create 10 log files
num_files = 10
for i in range(num_files):
    filename = os.path.join(output_dir, f"log_{i}.txt")
    # The 'with' statement handles opening and closing the file automatically
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(f"This is log file number {i}.\n")
        f.write("It contains some sample data.\n")
print(f"Created {num_files} files in the '{output_dir}' directory.")
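The same loop can be written with pathlib, where Path.write_text() opens, writes, and closes each file in a single call. A minimal sketch; create_log_files is a hypothetical helper name, and the file contents match the example above.

```python
from pathlib import Path
from typing import List


def create_log_files(output_dir: str, count: int) -> List[Path]:
    """Create `count` log files in `output_dir` and return their paths (hypothetical helper)."""
    directory = Path(output_dir)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        path = directory / f"log_{i}.txt"
        # write_text opens, writes, and closes the file in one call.
        path.write_text(
            f"This is log file number {i}.\nIt contains some sample data.\n",
            encoding="utf-8",
        )
        paths.append(path)
    return paths


if __name__ == "__main__":
    created = create_log_files("logs", 10)
    print(f"Created {len(created)} files in the 'logs' directory.")
```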

Summary: Which one should you use?

| Your Goal | Recommended Method | Why? |
|---|---|---|
| Insert many records into a database | Django ORM: Model.objects.bulk_create(); SQLAlchemy: session.bulk_save_objects() or session.execute(insert(Model), ...) | Performance. It drastically reduces the number of database queries, which typically makes it much faster than saving records one at a time in a loop. |
| Create many Python objects in memory | List comprehension: [MyClass(...) for i in range(N)] | Readability and speed. It is the standard, Pythonic way to build a list of objects and is generally at least as fast as an explicit loop. |
| Create many files on disk | Loop with with open() | Clarity and safety. The with statement is the standard way to handle files in Python and ensures each file is closed even if an error occurs. |