
Thread Run Python: How Do You Implement It Efficiently?

Let's break down how to run a thread in Python. The primary and most modern way to do this is with the concurrent.futures module, specifically the ThreadPoolExecutor. For most use cases it is easier and more "Pythonic" than the older threading module.


I'll cover:

  1. The Easy Way: concurrent.futures.ThreadPoolExecutor (Recommended)
  2. The Classic Way: threading.Thread (For more control)
  3. When to Use Threads (and when NOT to)
  4. A Complete Example with a Real-world Task

The Easy Way: concurrent.futures.ThreadPoolExecutor

This is the modern, high-level interface for managing a pool of threads. You just tell it what function to run and what arguments to pass, and it handles all the complexity for you.

Key Concepts:

  • Executor: An object that manages a pool of threads.
  • with statement: This ensures that the threads are properly cleaned up after they finish.
  • submit(): This method schedules a function to be executed and returns a Future object. A Future is a placeholder for the result of the function that hasn't finished running yet.
  • as_completed(): An iterator that yields Future objects as they complete. This is great for processing results as soon as they are ready.
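Before the full example, here is a minimal sketch of how submit() and as_completed() fit together (the square function and the sleep are just stand-ins for real work):

```python
import concurrent.futures
import time

def square(n):
    """A trivial stand-in for a slow I/O-bound task."""
    time.sleep(0.1)  # simulate waiting on a network call
    return n * n

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # submit() returns a Future immediately; the work runs in the pool
    futures = [executor.submit(square, n) for n in range(5)]
    # as_completed() yields each Future as soon as its result is ready,
    # so completion order may differ from submission order
    results = sorted(f.result() for f in concurrent.futures.as_completed(futures))

print(results)  # [0, 1, 4, 9, 16]
```

Note that because as_completed() yields in completion order, the results are sorted here before printing.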

Example: Downloading Multiple Files

Let's imagine you want to download several images from the internet. Using threads allows you to start all the downloads concurrently, which is much faster than downloading them one by one.

import concurrent.futures
import requests
import time
# The function we want to run in a thread
def download_image(url):
    """Downloads an image from a URL and saves it."""
    try:
        print(f"Starting download for: {url}")
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
        # Get the filename from the URL
        filename = url.split("/")[-1]
        with open(filename, "wb") as f:
            f.write(response.content)
        print(f"Finished downloading: {filename}")
        return filename
    except requests.exceptions.RequestException as e:
        print(f"Error downloading {url}: {e}")
        return None
if __name__ == "__main__":
    image_urls = [
        "https://www.python.org/static/community_logos/python-logo-master-v3-TM.png",
        "https://www.djangoproject.com/img/logos/django-logo-negative.png",
        "https://www.postgresql.org/media/img/about/press/elephant.png",
        "https://invalid-url-that-will-fail.com/image.jpg" # This one will fail
    ]
    # Using a ThreadPoolExecutor
    # max_workers is the number of threads to use. You can also omit it.
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        # Submit all tasks to the executor
        future_to_url = {executor.submit(download_image, url): url for url in image_urls}
        # Process results as they are completed
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                # result() will raise any exception that occurred in the thread
                filename = future.result()
                if filename:
                    print(f"Successfully processed result for {url}: {filename}")
            except Exception as e:
                print(f"An exception occurred for {url}: {e}")
    print("\nAll downloads have finished.")

To run this code:

  1. Save it as a file (e.g., thread_downloader.py).
  2. Run it from your terminal: python thread_downloader.py

You will see the output showing that the downloads start in quick succession, not one after the other.


The Classic Way: threading.Thread

This gives you more direct control over individual threads. You create a Thread object, pass it a target function, and then call .start() to run it.

Key Concepts:

  • threading.Thread(target=your_function, args=(arg1, arg2)): Creates a new thread object.
  • .start(): Begins the execution of the thread.
  • .join(): Waits for the thread to complete its execution. This is crucial if the main thread needs to wait for the worker threads to finish before continuing.
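One detail the Thread constructor does not handle for you: the target function's return value is discarded. A common workaround, sketched here with a hypothetical worker function, is to have each thread push its result into a shared queue.Queue, which is thread-safe:

```python
import threading
import queue

def worker(n, out_q):
    # Thread targets can't return values directly, so push into a queue
    out_q.put(n * 2)

out_q = queue.Queue()
threads = [threading.Thread(target=worker, args=(n, out_q)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every worker before reading results

results = sorted(out_q.get() for _ in range(4))
print(results)  # [0, 2, 4, 6]
```

If you find yourself writing this pattern often, that's a sign ThreadPoolExecutor (which returns results via Futures) is the better fit.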

Example: Running a Task in the Background

Let's create a simple example where one thread runs a long task while the main program continues doing other things.

import threading
import time
def long_running_task(seconds):
    """A function that simulates a long task."""
    print(f"Task started. Will run for {seconds} seconds.")
    time.sleep(seconds)
    print("Task finished.")
if __name__ == "__main__":
    # Create a thread object
    # The 'target' is the function to run
    # The 'args' are the arguments to pass to the function (as a tuple)
    thread = threading.Thread(target=long_running_task, args=(5,))
    print("Starting the thread...")
    thread.start()  # Start the thread's execution
    # The main program continues to run while the thread is in the background
    print("Main program is doing other work...")
    for i in range(3):
        print(f"Main program working... {i}")
        time.sleep(1)
    # Wait for the thread to complete before the main program exits
    # This is important to ensure the program doesn't terminate while threads are still running
    print("Main program is waiting for the thread to finish...")
    thread.join() 
    print("Main program has finished.")

To run this code:

  1. Save it as classic_thread.py.
  2. Run it: python classic_thread.py

You'll see that "Main program is doing other work..." prints out while the 5-second task is happening in the background.


When to Use Threads (and when NOT to)

Use Threads for:

  • I/O-Bound Tasks: This is the most important use case. If your program is waiting for external resources, threads are perfect.
    • Examples: Network requests (API calls, downloading files), reading/writing files to disk, database queries.
    • Why? While one thread is waiting for the network, the Python interpreter can switch to another thread to do more work. This dramatically improves performance.
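You can see this effect with simulated I/O: time.sleep releases the GIL just like a real network wait, so four 0.2-second "requests" finish in roughly 0.2 seconds instead of 0.8 (the fake_request function is just a stand-in):

```python
import concurrent.futures
import time

def fake_request(i):
    time.sleep(0.2)  # stands in for waiting on the network; releases the GIL
    return i

start = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(fake_request, range(4)))
elapsed = time.time() - start

print(f"{elapsed:.2f}s")  # roughly 0.2s, not 0.8s
assert elapsed < 0.6  # the four waits overlapped
```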

AVOID Threads for:

  • CPU-Bound Tasks: If your task is doing heavy calculations (math, data processing, image manipulation), threads in Python will likely slow you down.
    • Why? Due to the Global Interpreter Lock (GIL), only one thread can execute Python bytecode at a time. For CPU-bound tasks, threads don't get true parallelism; they just take turns, which adds overhead.
    • Solution for CPU-Bound Tasks: Use the multiprocessing module. It creates separate processes, each with its own Python interpreter and memory, allowing you to bypass the GIL and achieve true parallelism on multi-core CPUs.

A Complete Example: Web Scraper with Threads

Here's a practical example that scrapes titles from multiple Wikipedia pages concurrently.

import concurrent.futures
import time
import requests
from bs4 import BeautifulSoup
def get_wikipedia_title(url):
    """Fetches a Wikipedia page and returns its title."""
    try:
        response = requests.get(url, timeout=5)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        # The title is in the <title> tag
        title = soup.find('title').text
        return title
    except Exception as e:
        return f"Error fetching {url}: {e}"
if __name__ == "__main__":
    urls = [
        "https://en.wikipedia.org/wiki/Python_(programming_language)",
        "https://en.wikipedia.org/wiki/Concurrent_computing",
        "https://en.wikipedia.org/wiki/Thread_(computing)",
        "https://en.wikipedia.org/wiki/Global_interpreter_lock"
    ]
    print("Starting to fetch titles concurrently...")
    start_time = time.time()
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        # map() is a simpler alternative to submit() if you want to get results in order
        # It returns an iterator of results in the same order as the input urls
        results = list(executor.map(get_wikipedia_title, urls))
    end_time = time.time()
    print("\n--- Results ---")
    for url, title in zip(urls, results):
        print(f"URL: {url}")
        print(f"Title: {title}\n")
    print(f"Total time taken: {end_time - start_time:.2f} seconds")

This example is clean, efficient, and demonstrates the power of threads for I/O-bound scraping.
