杰瑞科技汇

How does a Python async loop run efficiently, and how can it be optimized?

Of course! Let's break down async loops in Python. This is a fundamental concept for writing efficient, non-blocking code, especially for I/O-bound tasks like web requests, database calls, or reading files.


The core idea is to run many operations concurrently, not in parallel. This is crucial for I/O-bound tasks, where most of the time is spent waiting for an external resource (like a network or disk).

The Problem with a Standard for Loop

Imagine you want to fetch the content of 10 different websites. A standard synchronous loop would look like this:

import requests
import time
def fetch_url(url):
    """A synchronous function that fetches a URL."""
    print(f"Fetching {url}...")
    response = requests.get(url, timeout=5)
    print(f"Finished fetching {url}. Status: {response.status_code}")
    return response.status_code
urls = [
    "https://httpbin.org/delay/1",  # This will take 1 second
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
]
start_time = time.time()
for url in urls:
    fetch_url(url)
end_time = time.time()
print(f"Total time taken: {end_time - start_time:.2f} seconds")

What's happening here?

  1. The loop starts.
  2. It calls fetch_url("url_1").
  3. requests.get() sends a request and waits for the response. This is a blocking operation. The entire program is frozen until the response arrives (1 second).
  4. Once the first response is received, it prints the message and moves to the next URL.
  5. This repeats for all 10 URLs.

Result: The total time will be roughly 10 * 1 second = 10 seconds. We are doing everything sequentially, one after the other.
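The example above needs network access. The same sequential-blocking behaviour can be reproduced offline with time.sleep standing in for the network wait (a simulation, with illustrative timings):

```python
import time

def slow_task(i: int) -> int:
    time.sleep(0.1)  # stands in for the 1-second network wait
    return i

start = time.perf_counter()
results = [slow_task(i) for i in range(5)]
elapsed = time.perf_counter() - start
# Roughly 0.5 seconds: five waits, one after another.
print(f"results={results}, took {elapsed:.2f}s")
```

Just as with the HTTP version, the total time is the sum of every individual wait.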


The Solution: async and await

To fix this, we need to make our code non-blocking. We'll use Python's asyncio library.

  1. async def: We define a function as async def. This marks it as a "coroutine." It's a special function that can be paused and resumed.
  2. await: When we call another coroutine from within an async def function, we use await. This is the magic word. It tells Python: "This coroutine is waiting for an I/O operation. Don't block the whole program; pause this coroutine and run something else that's ready."
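Before the full rewrite, here is the smallest possible illustration of these two keywords, with asyncio.sleep standing in for real I/O (names are illustrative):

```python
import asyncio

# 'async def' makes this a coroutine; 'await' pauses it at the sleep,
# handing control back to the event loop until the wait is over.
async def say_after(delay: float, message: str) -> str:
    await asyncio.sleep(delay)  # stands in for a network or disk wait
    return message

# Calling a coroutine does not run it -- it returns a coroutine object
# that the event loop must drive:
result = asyncio.run(say_after(0.01, "hello"))
print(result)  # hello
```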

Let's rewrite our example using asyncio and aiohttp (an asynchronous HTTP client library, similar to requests).

Step 1: Install aiohttp

pip install aiohttp

Step 2: Write the Async Code

import asyncio
import aiohttp
import time
# This is an async function (a coroutine)
async def fetch_url_async(session, url):
    """An asynchronous function that fetches a URL."""
    print(f"Fetching {url}...")
    # aiohttp's session.get() is a coroutine, so we must 'await' it.
    # This is the non-blocking part.
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=5)) as response:
        # We can also 'await' reading the content
        # data = await response.text()
        print(f"Finished fetching {url}. Status: {response.status}")
        return response.status
async def main():
    """The main async function that will run our loop."""
    urls = [
        "https://httpbin.org/delay/1",
        # ... (same 10 URLs as before)
        "https://httpbin.org/delay/1",
        "https://httpbin.org/delay/1",
        "https://httpbin.org/delay/1",
        "https://httpbin.org/delay/1",
        "https://httpbin.org/delay/1",
        "https://httpbin.org/delay/1",
        "https://httpbin.org/delay/1",
        "https://httpbin.org/delay/1",
        "https://httpbin.org/delay/1",
    ]
    # Create a single aiohttp session to manage connections efficiently
    async with aiohttp.ClientSession() as session:
        # This is the async equivalent of a for loop.
        # It creates a task for each URL and schedules them to run concurrently.
        tasks = []
        for url in urls:
            # Create a task for each fetch operation
            task = asyncio.create_task(fetch_url_async(session, url))
            tasks.append(task)
        # Wait for all tasks to complete
        await asyncio.gather(*tasks)
# This is the standard way to run the top-level async function
if __name__ == "__main__":
    start_time = time.time()
    asyncio.run(main())
    end_time = time.time()
    print(f"Total time taken: {end_time - start_time:.2f} seconds")

What's happening here?

  1. asyncio.run(main()) starts the event loop and runs our main coroutine.
  2. Inside main, we create an aiohttp.ClientSession.
  3. The for loop begins. For each URL, it calls asyncio.create_task(). This doesn't run the function immediately. It just schedules the fetch_url_async coroutine to run as a "task" on the event loop.
  4. The loop finishes very quickly, having scheduled all 10 tasks.
  5. await asyncio.gather(*tasks) is the key. It tells the event loop: "Please wait for all of these tasks to complete." The event loop now runs the tasks.
  6. When fetch_url_async hits await session.get(), it yields control back to the event loop. The loop sees that the first task is waiting and starts the second task. When the second task hits an await, it starts the third, and so on.
  7. The requests are all "in the air" at the same time. The event loop manages them. When the first response comes back (after 1 second), it resumes the first task. When the second response comes back, it resumes the second task, etc.

Result: The total time will be roughly 1 second, not 10 seconds! All the 1-second delays happen concurrently.
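You can verify this concurrency claim offline by swapping the HTTP call for asyncio.sleep (a simulation, not aiohttp; timings are illustrative):

```python
import asyncio
import time

async def fake_fetch(i: int) -> int:
    await asyncio.sleep(0.1)  # stands in for the 1-second HTTP delay
    return i

async def run_fetches() -> list[int]:
    # All ten coroutines are scheduled together; their waits overlap.
    return await asyncio.gather(*(fake_fetch(i) for i in range(10)))

start = time.perf_counter()
results = asyncio.run(run_fetches())
elapsed = time.perf_counter() - start
# ~0.1s total, not 1.0s: ten overlapping waits cost one wait.
print(f"results={results}, took {elapsed:.2f}s")
```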


Cleaner Syntax: async for

For some objects, like async generators, you can iterate directly with an async for loop. However, for our common use case of running many independent tasks concurrently, the asyncio.create_task pattern above is the standard and most explicit approach.
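For completeness, here is what async for looks like with a small async generator (the names are illustrative):

```python
import asyncio

# An async generator: it awaits between items, so a consumer must
# iterate it with 'async for' rather than a plain 'for'.
async def ticker(count: int):
    for i in range(count):
        await asyncio.sleep(0.01)  # simulated wait before each item
        yield i

async def consume() -> list[int]:
    seen = []
    async for value in ticker(3):  # awaits each item as it arrives
        seen.append(value)
    return seen

print(asyncio.run(consume()))  # [0, 1, 2]
```

Note that async for iterates items one at a time as they become available; it does not by itself run anything concurrently.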

A more modern and often cleaner way to write the concurrent loop is by using asyncio.gather with a list comprehension.

Cleaner Version with List Comprehension

import asyncio
import aiohttp
import time
async def fetch_url_async(session, url):
    print(f"Fetching {url}...")
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=5)) as response:
        print(f"Finished fetching {url}. Status: {response.status}")
        return response.status
async def main_cleaner():
    urls = ["https://httpbin.org/delay/1"] * 10 # A shorter way to write the list
    async with aiohttp.ClientSession() as session:
        # Create all tasks at once and pass them to gather
        results = await asyncio.gather(
            *(fetch_url_async(session, url) for url in urls)
        )
        print(f"All tasks completed. Results: {results}")
if __name__ == "__main__":
    start_time = time.time()
    asyncio.run(main_cleaner())
    end_time = time.time()
    print(f"Total time taken: {end_time - start_time:.2f} seconds")

This version is more concise and achieves the same result. The * operator unpacks the generator expression into separate positional arguments for asyncio.gather, and gather returns the results in the same order as the inputs.
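One caveat worth knowing (not covered above): by default, the first exception raised in any task propagates out of asyncio.gather. Passing return_exceptions=True collects failures alongside results instead. A sketch with a deliberately failing task:

```python
import asyncio

async def maybe_fail(i: int) -> int:
    await asyncio.sleep(0.01)
    if i == 2:
        raise ValueError(f"task {i} failed")
    return i

async def run_all() -> list:
    # return_exceptions=True: exceptions are returned in the results
    # list instead of aborting the whole gather on the first failure.
    return await asyncio.gather(
        *(maybe_fail(i) for i in range(4)), return_exceptions=True
    )

results = asyncio.run(run_all())
print(results)  # [0, 1, ValueError('task 2 failed'), 3]
```

Without return_exceptions=True, the ValueError would propagate out of the await and the other results would be lost.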


Key Concepts Summary

| Concept | Description | Example |
| --- | --- | --- |
| async def | Defines a coroutine: a special function that can be paused and resumed. | async def my_function(): |
| await | Pauses the current coroutine and lets the event loop run other tasks. Only valid inside an async def function. | data = await some_coroutine() |
| asyncio.run() | The entry point for the top-level async function. It creates and manages the event loop. | asyncio.run(main()) |
| asyncio.create_task() | Schedules a coroutine to run concurrently as a "Task". | task = asyncio.create_task(my_coroutine()) |
| asyncio.gather() | Runs multiple awaitables concurrently, waits for all of them to finish, and collects their return values. | results = await asyncio.gather(task1, task2) |
| async with | The async version of with, used for managing asynchronous resources like sessions. | async with aiohttp.ClientSession() as session: |

When to Use Async Loops

  • I/O-Bound Tasks: This is the sweet spot. Network calls, database queries, reading/writing files, making API calls.
  • Many Small Operations: When you have hundreds or thousands of small tasks that involve waiting.
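With thousands of tasks, launching everything at once can overwhelm the remote service or exhaust local sockets. A common optimization (not shown in the examples above) is to cap concurrency with asyncio.Semaphore; here is a sketch with simulated I/O and an illustrative limit:

```python
import asyncio

MAX_CONCURRENCY = 3  # illustrative limit; tune to what the server tolerates

async def fetch_one(sem: asyncio.Semaphore, i: int) -> int:
    async with sem:                # at most MAX_CONCURRENCY tasks inside at once
        await asyncio.sleep(0.05)  # stands in for the real network call
        return i

async def fetch_all(n: int) -> list[int]:
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    # All n tasks are scheduled, but the semaphore admits them 3 at a time.
    return await asyncio.gather(*(fetch_one(sem, i) for i in range(n)))

print(asyncio.run(fetch_all(10)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

gather still returns results in input order, so the cap changes only the pacing, not the output.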

When NOT to use Async Loops:

  • CPU-Bound Tasks: If your task is heavy on computation (e.g., complex math, image processing, video encoding), async won't help. A coroutine only yields control at an await, so a long-running computation blocks the entire event loop. For CPU-bound tasks, use the multiprocessing module (or a ProcessPoolExecutor) to run tasks in parallel across different CPU cores.
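As a sketch of that advice (function names are illustrative), asyncio can still coordinate CPU-bound work by handing it to a process pool via run_in_executor, keeping the event loop free while worker processes compute:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def crunch(n: int) -> int:
    # Pure computation: 'await' cannot help here, nothing waits on I/O.
    return sum(i * i for i in range(n))

async def crunch_all(sizes) -> list[int]:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # Each call runs in a separate worker process, spreading the work
        # across CPU cores while the event loop stays responsive.
        futures = [loop.run_in_executor(pool, crunch, n) for n in sizes]
        return await asyncio.gather(*futures)

if __name__ == "__main__":
    print(asyncio.run(crunch_all([10, 100])))
```

The __main__ guard matters here: process pools may re-import the module in worker processes, and the guard keeps the pool from being created recursively.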