杰瑞科技汇

Python: What Is the Difference Between opener and open?

In Python, the term "opener" most often refers to the object that the urllib.request module uses to open URLs in a configurable way. It is easily confused with the built-in open() function, which opens files.


Let's break it down into three contexts:

  1. The Classic: urllib.request.OpenerDirector (for URLs)
  2. The Modern: requests library (for HTTP requests)
  3. The Fundamental: open() (for files)

The Classic: urllib.request.OpenerDirector (for URLs)

This is what "opener" usually means in Python. The urllib.request module has a powerful, if somewhat complex, system for opening URLs, and its central piece is the OpenerDirector.

Why use an Opener?

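By default, urllib.request.urlopen() is simple but limited. It uses a shared default opener, which already follows redirects and honors proxy settings from the environment, but it gives you no direct way to configure things like:

  • Storing and resending HTTP cookies
  • HTTP authentication credentials (like Basic Auth)
  • Explicit proxy settings
  • Default headers sent with every request (per-request headers can be set on a urllib.request.Request object)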

An OpenerDirector acts like a "manager" for a set of handlers. You build an opener by adding handlers that know how to perform these specific tasks. When you call opener.open(), the opener passes the request through its handlers, and each handler deals with the part it is responsible for (a protocol, cookies, authentication, redirects).
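As a rough illustration of the manager-and-handlers idea, the sketch below builds an opener with one extra handler and lists what it contains. It reads the opener's handlers attribute, which is an implementation detail of CPython rather than a documented API, and it opens a data: URL, chosen here only so that the example runs without network access.

```python
import http.cookiejar
import urllib.request

# build_opener() adds the default handlers plus the ones passed in.
cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookie_jar)
)

# Assumption: `handlers` is a CPython implementation attribute, used
# here only to peek at what the opener manages.
handler_names = sorted(type(h).__name__ for h in opener.handlers)
print(handler_names)

# A data: URL is served by the built-in DataHandler, so no network is needed.
with opener.open("data:text/plain,hello%20opener") as response:
    body = response.read()
print(body)
```

The same opener.open() call works for http:// and https:// URLs; only the handler that ends up serving the request changes.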


Key Components:

  • urllib.request.OpenerDirector: The main opener object.
  • urllib.request.build_opener(): A convenience function to create an opener with a set of default handlers, plus any you add.
  • urllib.request.install_opener(): Installs your custom opener as the default for the module.
  • Handler Classes: These do the work.
    • ProxyHandler: Handles proxies.
    • HTTPHandler, HTTPSHandler: Handle HTTP/HTTPS connections.
    • HTTPCookieProcessor: Handles cookies (requires http.cookiejar).
    • HTTPBasicAuthHandler: Handles HTTP Basic Authentication.
    • HTTPRedirectHandler: Follows HTTP redirects (3xx status codes).
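To show how these pieces fit together, here is a minimal sketch that sets a default header on an opener and installs it, so that plain urlopen() calls go through it. The User-Agent value and the Accept header are made-up placeholders, and the data: URL is only there to keep the example offline; with an http:// URL, the entries in addheaders would be sent with the request.

```python
import urllib.request

# Start from the default handlers and set a header for every request.
opener = urllib.request.build_opener()
opener.addheaders = [("User-Agent", "MyCoolApp/1.0")]  # hypothetical value

# After this call, urllib.request.urlopen() uses our opener.
urllib.request.install_opener(opener)

with urllib.request.urlopen("data:text/plain,installed") as response:
    body = response.read()
print(body)

# Headers for a single request go on a Request object instead.
request = urllib.request.Request(
    "http://example.com/", headers={"Accept": "text/html"}
)
accept = request.get_header("Accept")
print(accept)
```

Because install_opener() changes module-wide state, calling opener.open() directly is usually the safer choice inside library code.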

Example: Making a Request with Cookies and Authentication

Let's build an opener that can handle cookies and Basic Authentication.

import urllib.request
import http.cookiejar
from urllib.error import URLError, HTTPError
# --- Step 1: Create a CookieJar to store cookies ---
cookie_jar = http.cookiejar.CookieJar()
# --- Step 2: Create handlers ---
# We need a handler for cookies and one for authentication
cookie_handler = urllib.request.HTTPCookieProcessor(cookie_jar)
# For this example, we'll use a password manager for Basic Auth
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# Add the username and password for a specific URL
# The realm can be None if you don't know it
top_level_url = "http://example.com/"
password_mgr.add_password(None, top_level_url, "my_username", "my_secret_password")
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
# --- Step 3: Build the opener ---
# We can add our custom handlers to the default ones
opener = urllib.request.build_opener(auth_handler, cookie_handler)
# --- Step 4: Install the opener (optional) ---
# This makes urlopen() use our opener by default
# urllib.request.install_opener(opener)
# --- Step 5: Use the opener to open a URL ---
try:
    # The opener will now handle auth and cookies automatically.
    # Open a page that requires authentication; the with block closes
    # the connection, and the timeout avoids waiting forever.
    with opener.open("http://example.com/protected_page.html", timeout=10) as response:
        # Read the response data
        data = response.read()
        print(f"Response status: {response.status}")
    print(f"Cookies received: {cookie_jar}")
    # errors='replace' avoids a crash if the slice cuts a multi-byte character
    print(f"Response data (first 100 chars): {data[:100].decode('utf-8', errors='replace')}")
except HTTPError as e:
    print(f"Error code: {e.code}")
    print(f"Error reason: {e.reason}")
except URLError as e:
    print(f"Reason: {e.reason}")
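The password manager in the example above matches credentials by URL prefix, which is why registering "http://example.com/" is enough to cover the protected page. The short sketch below checks that lookup directly, without any network access; the realm name and the second host are made-up values for illustration.

```python
import urllib.request

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# Realm None means "use these credentials whatever realm the server reports".
password_mgr.add_password(
    None, "http://example.com/", "my_username", "my_secret_password"
)

# A URL below the registered prefix finds the credentials.
match = password_mgr.find_user_password(
    "Any Realm", "http://example.com/protected_page.html"
)

# A different host does not.
no_match = password_mgr.find_user_password(
    "Any Realm", "http://other.example.org/"
)

print(match)
print(no_match)
```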

The Modern: requests library (for HTTP requests)

While urllib.request is built-in, the requests library is the de-facto standard for making HTTP requests in Python today. It is much simpler and more intuitive. It handles all the "opener" complexity (cookies, auth, headers, etc.) behind the scenes with a clean, high-level API.

If you're doing anything more than a simple GET or POST, you should almost certainly use requests.

Installation

First, you need to install it:

pip install requests

Example: The same task as above, but with requests

Notice how much cleaner and simpler this is. The requests.Session object is the modern equivalent of a custom OpenerDirector.

import requests
# --- Step 1: Create a Session object ---
# A session object persists parameters across requests
# and handles cookies for you automatically.
session = requests.Session()
# --- Step 2: Set authentication ---
# requests handles Basic Auth easily via the `auth` tuple
session.auth = ('my_username', 'my_secret_password')
# You can also set headers, proxies, etc. on the session
# session.headers.update({'User-Agent': 'MyCoolApp/1.0'})
# --- Step 3: Make the request ---
# The session will automatically handle authentication and cookies
try:
    # requests has no default timeout, so set one explicitly
    response = session.get("http://example.com/protected_page.html", timeout=10)
    # Raise an exception for bad status codes (4xx or 5xx)
    response.raise_for_status()
    print(f"Response status: {response.status_code}")
    # The cookies are stored in the session object
    print(f"Cookies in session: {session.cookies.get_dict()}")
    print(f"Response text (first 100 chars): {response.text[:100]}")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

As you can see, requests abstracts away the entire "opener" building process, making it the recommended approach for most HTTP tasks.
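To see what the session actually contributes to each request, the sketch below prepares a request without sending it and inspects the resulting headers. It assumes requests is installed; the credentials and the User-Agent string are the same placeholders used earlier.

```python
import base64

import requests

session = requests.Session()
session.auth = ("my_username", "my_secret_password")
session.headers.update({"User-Agent": "MyCoolApp/1.0"})

# prepare_request() merges session-level settings into the request
# but does not send anything over the network.
request = requests.Request("GET", "http://example.com/protected_page.html")
prepared = session.prepare_request(request)

# Basic Auth is the base64 encoding of "username:password".
expected_token = base64.b64encode(b"my_username:my_secret_password").decode("ascii")

print(prepared.headers["User-Agent"])
print(prepared.headers["Authorization"] == f"Basic {expected_token}")
```

This is roughly the work that the cookie and authentication handlers do for an OpenerDirector, done here by the session.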


The Fundamental: open() (for files)

Finally, it's important not to confuse the URL opener with Python's built-in open() function, which opens files on your local filesystem. The built-in function does not reach cloud storage such as S3; libraries like fsspec provide their own open() for that.

This is the most common "open" operation in Python.

Example: Reading a local text file

# The 'with' statement ensures the file is automatically closed
try:
    # An explicit encoding avoids depending on the platform default
    with open('my_document.txt', 'r', encoding='utf-8') as f:
        content = f.read()
        print(content)
except FileNotFoundError:
    print("Error: The file was not found.")
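A self-contained variant of the example above is sketched below: it first writes its own file into a temporary directory, so it can run anywhere, and then reads it back with readline() and read(). The file names and contents are made up for illustration.

```python
import tempfile
from pathlib import Path

missing_handled = False

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "my_document.txt"

    # Write two lines, then read them back.
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")

    with open(path, "r", encoding="utf-8") as f:
        first = f.readline()  # reads up to and including the first newline
        rest = f.read()       # reads everything that is left

    # Opening a file that does not exist raises FileNotFoundError.
    try:
        with open(Path(tmp_dir) / "missing.txt", "r", encoding="utf-8") as f:
            f.read()
    except FileNotFoundError:
        missing_handled = True

print(repr(first))
print(repr(rest))
print(missing_handled)
```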

Key differences from URL openers:

| Feature | urllib.request.OpenerDirector / requests | open() (built-in) |
| --- | --- | --- |
| Purpose | Fetch data from a network resource. | Read from/write to a local file. |
| Protocol | HTTP and HTTPS (urllib.request also handles FTP, file, and data URLs). | Local filesystem path. |
| Return type | An http.client.HTTPResponse (for HTTP/HTTPS with urllib.request) or a requests.Response object. | A file object (with read(), readline(), write() methods). |
| Dependencies | urllib (built-in), requests (external). | None (built-in). |

Summary: Which one should I use?

  • To open a file on your computer: Use the built-in open().
  • To make a simple HTTP/HTTPS request: Use requests.get(), requests.post(), etc. It's the easiest and most Pythonic way.
  • To make a complex HTTP request with advanced features (cookies, auth, custom headers) without installing an external library: Use urllib.request.build_opener() with custom handlers. This is the "old school" but powerful way.
  • If you are working with legacy code or have a specific reason to avoid requests: Use the urllib.request.OpenerDirector approach.