Of course! "Python concurrent" refers to the set of tools and techniques in Python for writing programs that can perform multiple operations simultaneously. This is crucial for improving performance, especially for I/O-bound tasks (like network requests or file operations) and can also help with CPU-bound tasks (like complex calculations).

Let's break it down into the key concepts and modules.
The Core Concept: The Global Interpreter Lock (GIL)
Before diving into the tools, you must understand the GIL.
- What it is: The GIL is a mutex (a lock) that protects access to Python objects, preventing multiple native threads from executing Python bytecode at the same time within a single process.
- The Implication: For CPU-bound tasks (tasks that are heavy on computation), Python's threading module might not provide a performance boost because only one thread can execute Python code at a time. The GIL acts as a bottleneck.
- The Exception: The GIL is released during I/O operations (like waiting for a network response or reading a file). This makes threading very effective for I/O-bound tasks.
Because of the GIL, Python uses different tools for different types of concurrency problems:
- For I/O-Bound Tasks: Use Threading.
- For CPU-Bound Tasks: Use Multiprocessing.
- For Simpler, High-Level Concurrency: Use asyncio (with `async`/`await` syntax).
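You can see the GIL bottleneck directly with a quick, unscientific benchmark (assumes a standard CPython build; results will differ on free-threaded builds). Two threads doing pure-Python computation take about as long as running the same work sequentially:

```python
import threading
import time

def count_down(n):
    # Pure-Python CPU work; the running thread holds the GIL the whole time
    while n > 0:
        n -= 1

N = 5_000_000

# Run the work twice, back to back
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Run the same work in two threads; the GIL serializes the bytecode,
# so this is usually no faster (often slightly slower, due to contention)
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s  threaded: {threaded:.2f}s")
```

On most machines the two timings come out roughly equal, which is exactly the GIL at work.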
Threading (for I/O-Bound Tasks)
Threading is used when your program spends most of its time waiting. For example, a web scraper that needs to make many network requests. While one thread is waiting for a response, another thread can make a new request.

Key Idea: Run multiple threads within a single process. They share memory, which is great for data sharing but requires careful synchronization (using Lock, Queue, etc.).
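As a minimal sketch of that synchronization point: a shared counter incremented from several threads needs a Lock, because `counter += 1` is a read-modify-write that can interleave between threads and lose updates:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # Without the lock, concurrent read-modify-write
        # operations can interleave and drop increments
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 -- deterministic because of the lock
```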
Example: Web Scraping with concurrent.futures
The concurrent.futures module provides a high-level interface for asynchronously executing callables. ThreadPoolExecutor is the perfect tool for I/O-bound tasks.
```python
import requests  # third-party: pip install requests
import concurrent.futures
import time

def fetch_url(url):
    """Fetches the content of a URL and returns the URL and status code."""
    try:
        response = requests.get(url, timeout=5)
        return url, response.status_code
    except requests.RequestException as e:
        return url, str(e)

urls = [
    "https://www.python.org",
    "https://www.google.com",
    "https://www.github.com",
    "https://www.nonexistent-website-12345.com",
    "https://www.stackoverflow.com",
]

# Using a ThreadPoolExecutor to fetch URLs concurrently
start_time = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # map() returns results in the same order as the inputs
    results = list(executor.map(fetch_url, urls))
end_time = time.time()

print("--- Results ---")
for url, status in results:
    print(f"{url}: {status}")
print(f"\nTotal time taken: {end_time - start_time:.2f} seconds")
```
Why is this faster? If you ran these requests sequentially, you'd have to wait for each one to complete before starting the next. With threading, while one request is "in flight" (waiting for the server), the other threads are working on other requests.
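If you'd rather handle each result as soon as it arrives instead of in input order, `concurrent.futures` also offers `submit()` plus `as_completed()`. A small network-free sketch (the `slow_square` function and its sleep times are made up to simulate variable I/O latency):

```python
import concurrent.futures
import time

def slow_square(x):
    # Stand-in for an I/O call with variable latency
    time.sleep(0.1 * x)
    return x * x

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # submit() returns a Future immediately; keep a mapping back to the input
    futures = {executor.submit(slow_square, x): x for x in [3, 1, 2]}
    results = []
    # as_completed() yields futures in the order they finish, not submission order
    for fut in concurrent.futures.as_completed(futures):
        results.append(fut.result())

print(results)  # fastest tasks first, e.g. [1, 4, 9]
```

This pattern is handy for scrapers: you can start processing the fast responses while the slow ones are still in flight.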
Multiprocessing (for CPU-Bound Tasks)
Multiprocessing gets around the GIL by creating separate processes, each with its own Python interpreter and memory space. This allows for true parallel execution on multi-core CPUs.

Key Idea: Run multiple processes. Each process has its own memory, so data sharing is more complex (requires Queue, Pipe, or Manager). This is the go-to for heavy calculations.
Example: Parallel Image Processing
Let's say we have a list of image files and we want to apply a filter to each one. This is a CPU-bound task.
```python
import os
import time
from concurrent.futures import ProcessPoolExecutor
from PIL import Image  # Requires Pillow: pip install Pillow

# A dummy CPU-intensive function
def apply_grayscale(image_path):
    """Applies a grayscale filter to an image."""
    try:
        with Image.open(image_path) as img:
            img_gray = img.convert("L")
            output_path = f"gray_{os.path.basename(image_path)}"
            img_gray.save(output_path)
        return f"Processed {image_path} -> {output_path}"
    except Exception as e:
        return f"Error processing {image_path}: {e}"

# The guard is required: worker processes re-import this module,
# and without it they would recursively spawn more workers
if __name__ == "__main__":
    # Create some dummy image files for the example
    os.makedirs("images", exist_ok=True)
    for i in range(5):
        Image.new("RGB", (100, 100), color="red").save(f"images/image_{i}.png")
    image_files = [f"images/image_{i}.png" for i in range(5)]

    start_time = time.time()
    # Using a ProcessPoolExecutor
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(apply_grayscale, image_files))
    end_time = time.time()

    print("--- Results ---")
    for result in results:
        print(result)
    print(f"\nTotal time taken: {end_time - start_time:.2f} seconds")

    # Clean up dummy files
    for f in image_files:
        gray = f"gray_{os.path.basename(f)}"
        if os.path.exists(gray):
            os.remove(gray)
```
Why is this faster?
The apply_grayscale function is CPU-intensive. By using separate processes, the work can be distributed across multiple CPU cores, and each core can work on a different image simultaneously. Threading would be ineffective here due to the GIL.
Asyncio (for I/O-Bound Tasks with High Concurrency)
Asyncio is a different paradigm. Instead of using threads or processes, it uses a single thread and an event loop to manage many "tasks." When a task performs an I/O operation (like await an_http_request()), it yields control back to the event loop, allowing other tasks to run.
Key Idea: Cooperative multitasking. Tasks must explicitly yield control using await. This is extremely efficient for handling thousands of concurrent I/O connections (e.g., a web server, chat app).
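Before bringing in an HTTP library, the cooperative model is easiest to see with nothing but the standard library: three tasks "sleep" concurrently on one thread, so the total time is the longest single delay, not the sum:

```python
import asyncio
import time

async def wait_for(name, delay):
    # await hands control back to the event loop while this task "waits"
    await asyncio.sleep(delay)
    return name

async def main():
    start = time.perf_counter()
    # All three sleeps overlap on a single thread
    results = await asyncio.gather(
        wait_for("a", 0.6), wait_for("b", 0.4), wait_for("c", 0.2)
    )
    elapsed = time.perf_counter() - start
    print(results, f"{elapsed:.2f}s")  # ~0.6s total, not 1.2s
    return results, elapsed

results, elapsed = asyncio.run(main())
```

Swap `asyncio.sleep` for a real awaitable I/O call (as in the aiohttp example below... er, any async HTTP call) and the same overlap applies.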
Example: Fetching URLs with asyncio and aiohttp
This is the modern, high-performance way to do I/O concurrency in Python.
```python
import asyncio
import aiohttp  # third-party: pip install aiohttp
import time

async def fetch_url_async(session, url):
    """Asynchronously fetches a URL."""
    try:
        timeout = aiohttp.ClientTimeout(total=5)
        async with session.get(url, timeout=timeout) as response:
            return url, response.status
    except Exception as e:
        return url, str(e)

async def main():
    urls = [
        "https://www.python.org",
        "https://www.google.com",
        "https://www.github.com",
        "https://www.nonexistent-website-12345.com",
        "https://www.stackoverflow.com",
    ]

    start_time = time.time()
    # aiohttp requires a ClientSession
    async with aiohttp.ClientSession() as session:
        # Create a list of coroutines to run concurrently
        tasks = [fetch_url_async(session, url) for url in urls]
        # asyncio.gather runs them all concurrently and waits for them to finish
        results = await asyncio.gather(*tasks)
    end_time = time.time()

    print("--- Results ---")
    for url, status in results:
        print(f"{url}: {status}")
    print(f"\nTotal time taken: {end_time - start_time:.2f} seconds")

# Run the main async function
asyncio.run(main())
```
Summary: When to Use What
| Method | Best For | How it Works | Memory Overhead | Data Sharing |
|---|---|---|---|---|
| Threading | I/O-Bound (network, disk) | Multiple threads in one process. GIL is released on I/O. | Low (shares memory) | Easy (but needs locks) |
| Multiprocessing | CPU-Bound (math, video processing) | Multiple processes, each with its own interpreter and memory. | High (each process has its own memory) | Harder (requires queues/manager) |
| Asyncio | High-Concurrency I/O (web servers, 1000s of connections) | Single thread, event loop, cooperative tasks (await). | Very Low | Shared within the same thread |
A Note on asyncio vs. Threading
For simple I/O tasks like the example above, both ThreadPoolExecutor and asyncio can be very fast. The choice often comes down to:
- Simplicity: `ThreadPoolExecutor` can be simpler if you're not already using `async`/`await`.
- Scale: `asyncio` can handle a much larger number of concurrent connections (e.g., 10,000) because it doesn't have the overhead of creating and managing thousands of OS threads.
- Ecosystem: Modern web frameworks like FastAPI and Django Channels are built on `asyncio`. If you're using them, you should use `async` functions for your I/O.
