杰瑞科技汇

How can you download files efficiently in Python?

Of course! Downloading files is a common task in Python. The best method depends on the source of the file (e.g., a simple URL, a password-protected server) and the size of the file.


Here’s a comprehensive guide covering the most popular and effective methods, from simple to more advanced.


Method 1: The Simple & Standard Way (requests Library)

For most use cases, the requests library is the go-to choice. It's powerful, easy to use, and handles many things (like session cookies, headers) automatically.

First, install the library:

pip install requests
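As a small illustration of that flexibility (the URL and header values below are placeholders, not part of any real API), you can pass custom headers and a timeout directly to requests.get:

```python
import requests

# Placeholder URL and header values -- adjust for your own server
url = "https://example.com/report.pdf"
headers = {"User-Agent": "my-downloader/1.0"}

def fetch(url, headers):
    # timeout= avoids hanging forever on an unresponsive server;
    # raise_for_status() turns 4xx/5xx responses into exceptions
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.content
```

With urllib, the same thing requires building a Request object by hand, which is part of why requests is usually preferred.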

Example 1: Downloading a Small File

This works well for small files like images, JSON data, or small CSV files. With stream=True, requests fetches the body in chunks as you iterate over it, rather than holding the entire response in memory at once.

import requests
url = "https://www.python.org/static/community_logos/python-logo-master-v3-TM.png"
save_path = "python_logo.png"
try:
    # Send a GET request to the URL
    response = requests.get(url, stream=True) # stream=True is good practice
    # Raise an exception for bad status codes (4xx or 5xx)
    response.raise_for_status()
    # Get the total file size from headers (optional, for progress bar)
    total_size = int(response.headers.get('content-length', 0))
    # Write the content to a file in binary mode
    with open(save_path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192): # 8KB chunks
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)
    print(f"File downloaded successfully to {save_path}")
except requests.exceptions.RequestException as e:
    print(f"Error downloading the file: {e}")

Example 2: Downloading a Large File with a Progress Bar

For large files, downloading to memory can cause issues. It's better to stream the file directly to disk and show a progress bar.

We'll use the popular tqdm library for the progress bar.

First, install the libraries:

pip install requests tqdm

import requests
from tqdm import tqdm
url = "https://www.python.org/ftp/python/3.11.4/Python-3.11.4.tgz"
save_path = "Python-3.11.4.tgz"
try:
    # Get the file size
    response = requests.get(url, stream=True)
    # Fail fast on 4xx/5xx status codes
    response.raise_for_status()
    total_size = int(response.headers.get('content-length', 0))
    # Initialize the progress bar
    progress_bar = tqdm(total=total_size, unit='iB', unit_scale=True, desc=save_path)
    # Download and write the file
    with open(save_path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=1024):
            f.write(chunk)
            progress_bar.update(len(chunk))
    progress_bar.close()
    if total_size != 0 and progress_bar.n != total_size:
        print("ERROR: downloaded size does not match the expected size")
    else:
        print(f"\nFile downloaded successfully to {save_path}")
except requests.exceptions.RequestException as e:
    print(f"Error downloading the file: {e}")

Method 2: The Built-in Way (urllib)

Python's standard library has urllib, which doesn't require any installation. It's less user-friendly than requests but gets the job done for simple downloads.

import urllib.request
url = "https://www.python.org/static/community_logos/python-logo-master-v3-TM.png"
save_path = "python_logo_urllib.png"
try:
    # Download the file and save it
    urllib.request.urlretrieve(url, save_path)
    print(f"File downloaded successfully to {save_path}")
except urllib.error.URLError as e:
    print(f"Error downloading the file: {e}")

Pros:

  • No external libraries needed.
  • Very simple for one-off downloads.

Cons:

  • Less flexible (e.g., adding headers or handling authentication is more complex).
  • Lacks features like streaming and progress bars out of the box.
  • urlretrieve is documented as a legacy interface that may become deprecated in a future release.
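To illustrate that extra ceremony (the User-Agent value here is just an example), sending a custom header with urllib means constructing a Request object first:

```python
import urllib.request

url = "https://www.python.org/static/community_logos/python-logo-master-v3-TM.png"

# urllib needs an explicit Request object to carry custom headers
req = urllib.request.Request(url, headers={"User-Agent": "my-downloader/1.0"})

def download(req, save_path):
    # urlopen accepts a Request object in place of a plain URL string
    with urllib.request.urlopen(req) as response, open(save_path, "wb") as f:
        f.write(response.read())
```

Compare this with requests, where the same header is a single keyword argument.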

Method 3: For Very Large Files & Resumable Downloads

When downloading huge files (like multi-GB datasets), you want to be able to resume a download if it's interrupted. The requests library makes this easy: send a Range header asking the server for only the bytes you are missing.

import requests
import os
url = "https://example.com/very_large_file.zip"
save_path = "very_large_file.zip"
def download_file_with_resume(url, save_path):
    # Size of any partial download already on disk (0 if none)
    first_byte = os.path.getsize(save_path) if os.path.exists(save_path) else 0
    headers = {}
    if first_byte:
        headers = {'Range': f'bytes={first_byte}-'} # Request only the remaining bytes
    response = requests.get(url, headers=headers, stream=True)
    response.raise_for_status()
    # A 206 response means the server honored the Range header;
    # a 200 means it sent the whole file, so start over from scratch
    resuming = response.status_code == 206
    total_size = int(response.headers.get('content-length', 0)) + (first_byte if resuming else 0)
    mode = 'ab' if resuming else 'wb' # append to the partial file, or rewrite it
    with open(save_path, mode) as f:
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:
                f.write(chunk)
                # You could add a progress bar here too
    print(f"File download complete. Saved to {save_path}")
download_file_with_resume(url, save_path)
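Not every server honors Range requests. One way to check up front (a sketch, assuming the server answers HEAD requests) is to look for the Accept-Ranges response header:

```python
import requests

def supports_resume(url):
    # Servers that allow partial downloads advertise "Accept-Ranges: bytes"
    response = requests.head(url, allow_redirects=True, timeout=30)
    return response.headers.get("Accept-Ranges", "none").lower() == "bytes"
```

If this returns False, fall back to a plain full download rather than appending to a partial file.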

Method 4: For FTP Servers

If you need to download from an FTP server, use Python's built-in ftplib.

from ftplib import FTP
ftp_server = "ftp.example.com"
ftp_user = "username"
ftp_pass = "password"
remote_file_path = "/path/on/server/file.txt"
local_file_path = "file.txt"
try:
    # FTP supports the context-manager protocol, so the connection
    # is closed (quit) even if an error occurs mid-transfer
    with FTP(ftp_server) as ftp:
        # Login
        ftp.login(ftp_user, ftp_pass)
        # Download the file in binary mode
        with open(local_file_path, 'wb') as local_file:
            ftp.retrbinary(f"RETR {remote_file_path}", local_file.write)
    print(f"File downloaded successfully to {local_file_path}")
except Exception as e:
    print(f"An FTP error occurred: {e}")
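If the server supports it, prefer an encrypted connection. The standard library's FTP_TLS class is a drop-in replacement for FTP (the server and credentials below are placeholders):

```python
from ftplib import FTP_TLS

def download_secure(server, user, password, remote_path, local_path):
    # FTP_TLS wraps the control connection in TLS; prot_p() also
    # encrypts the data connection used for the actual file transfer
    with FTP_TLS(server) as ftps:
        ftps.login(user, password)
        ftps.prot_p()
        with open(local_path, "wb") as f:
            ftps.retrbinary(f"RETR {remote_path}", f.write)
```

Plain FTP sends credentials and file contents in cleartext, so use it only on trusted networks.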

Summary & Recommendation

| Method | Best For | Pros | Cons |
|---|---|---|---|
| requests | General purpose, most use cases. | Easy to use, powerful, handles sessions/headers, supports streaming. | Requires external library (pip install requests). |
| urllib | Quick, simple scripts where you can't install libraries. | Built-in, no dependencies. | Clunky API, less flexible, legacy interface. |
| Resumable download | Very large files or unstable connections. | Can resume interrupted downloads, saves bandwidth. | More complex code. |
| ftplib | Downloading from FTP servers. | Built-in, standard for FTP. | Only for FTP, not for HTTP/HTTPS. |

Final Recommendation:

For 99% of file downloading tasks in Python, use the requests library. It's the modern standard, is incredibly versatile, and makes your code clean and readable. Use the streaming approach (stream=True) for anything larger than a few megabytes.
