
How do I make a GET request with urllib2 in Python?

Let's break down how to make HTTP GET requests in Python, covering the modern, recommended way with the requests library first and then the older urllib2 (and its Python 3 successor, urllib.request).


The Modern and Recommended Way: requests Library

First and foremost, for any new project, you should use the requests library. It's designed to be more user-friendly, powerful, and Pythonic than the built-in modules.

Installation

If you don't have it installed, open your terminal or command prompt and run:

pip install requests

Simple GET Request Example


This is the most common use case: fetching the content of a webpage.

import requests
# The URL you want to get data from
url = 'https://www.example.com'
try:
    # Send a GET request to the URL
    response = requests.get(url)
    # Raise an exception if the request was unsuccessful (e.g., 404 Not Found, 500 Server Error)
    response.raise_for_status()
    # The content of the response, decoded automatically from bytes to a string
    # .text is for text-based content (HTML, JSON, etc.)
    html_content = response.text
    # Print the first 500 characters of the HTML
    print("Successfully fetched the page!")
    print(f"Status Code: {response.status_code}")
    print(f"Content-Type: {response.headers['Content-Type']}")
    print("-" * 20)
    print(html_content[:500])
except requests.exceptions.RequestException as e:
    # Handle any errors that occur during the request (e.g., connection error, timeout)
    print(f"An error occurred: {e}")

Key requests Features:

  • Simplicity: requests.get(url) is all you need for a basic GET request.
  • Automatic Decoding: It automatically decodes the response content (e.g., from gzip) into a string using the correct encoding.
  • JSON Handling: response.json() is a convenient method to parse JSON responses directly into a Python dictionary.
  • Easy Access to Headers & Status: response.headers, response.status_code, response.url are easy to access.
  • Session Objects: requests.Session() lets you persist cookies, headers, and authentication across requests (see the sketch just below).
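
To illustrate that last point, here is a minimal Session sketch. The URLs are hypothetical placeholders, not real endpoints: the idea is simply that cookies and default headers set on the session are reused by every request made through it.

import requests
# A minimal sketch of requests.Session(); the URLs below are hypothetical placeholders
with requests.Session() as session:
    # Headers set here are sent with every request made through the session
    session.headers.update({'User-Agent': 'MyApp/1.0'})
    # Suppose this request sets a session cookie...
    session.get('https://www.example.com/login')
    # ...later requests automatically send that cookie and the custom header
    response = session.get('https://www.example.com/profile')
    print(response.status_code)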

The Older Way: urllib2 (and urllib in Python 3)

urllib2 was a standard library module in Python 2. It no longer exists in Python 3, where its functionality was split across urllib.request and urllib.error. You should generally avoid it for new code, but it's worth knowing for maintaining older Python 2 scripts or for environments where you cannot install external libraries.

Python 3: urllib.request

Here is the equivalent GET request using Python 3's built-in urllib.request.

import urllib.request
import urllib.error
# The URL you want to get data from
url = 'https://www.example.com'
try:
    # Create a request object
    # You can add headers here, e.g., req = urllib.request.Request(url, headers={'User-Agent': 'MyApp/1.0'})
    req = urllib.request.Request(url)
    # Open the URL and get a response object
    # This is a context manager (with statement) which is good practice
    with urllib.request.urlopen(req) as response:
        # The response data is in bytes, so you need to decode it to a string
        # response.read() returns the entire content as bytes
        html_bytes = response.read()
        # Get the encoding from the headers, default to 'utf-8' if not found
        encoding = response.headers.get_content_charset()
        if encoding is None:
            encoding = 'utf-8'
        html_content = html_bytes.decode(encoding)
        # Print the first 500 characters of the HTML
        print("Successfully fetched the page!")
        print(f"Status Code: {response.status}") # Note: it's .status, not .status_code
        print(f"Content-Type: {response.headers.get('Content-Type')}")
        print("-" * 20)
        print(html_content[:500])
except urllib.error.URLError as e:
    # Handle URL errors (e.g., 404, 500)
    print(f"An error occurred: {e.reason}")
except Exception as e:
    # Handle other potential errors
    print(f"An unexpected error occurred: {e}")

Python 2: urllib2

If you are working with a legacy Python 2 codebase, the syntax is slightly different.

# Python 2
import urllib2
import urllib  # provides urlencode for building query strings (used in the sketch below)
url = 'https://www.example.com'
try:
    # Create a request object
    req = urllib2.Request(url)
    # Open the URL and get a response object
    response = urllib2.urlopen(req)
    # The response data is in bytes
    html_bytes = response.read()
    # Decode to a string
    html_content = html_bytes.decode('utf-8') # Or the correct encoding
    # Print the first 500 characters
    print("Successfully fetched the page!")
    print(f"Status Code: {response.code}") # Note: it's .code
    print(f"Content-Type: {response.headers.getheader('Content-Type')}")
    print("-" * 20)
    print(html_content[:500]
except urllib2.URLError as e:
    print("An error occurred: %s" % e.reason)
except Exception as e:
    print("An unexpected error occurred: %s" % e)

Comparison: requests vs. urllib.request

  • Ease of Use: requests is excellent, intuitive, and Pythonic; urllib.request is more verbose and less intuitive.
  • Installation: requests requires pip install requests; urllib.request is built in, no installation needed.
  • JSON Handling: requests parses JSON directly with response.json(); urllib.request requires manual parsing with json.loads().
  • URL Encoding: requests handles it automatically; urllib.request requires manual handling (urllib.parse.urlencode / urllib.parse.quote).
  • Sessions/Cookies: easy with requests.Session(); possible with urllib.request but more complex.
  • Response Object: requests gives response.text (string) and response.content (bytes); urllib.request gives response.read() (bytes) that you must decode yourself.
  • Error Handling: requests offers response.raise_for_status() and specific exception classes; urllib.request relies on try...except urllib.error.URLError.

Conclusion

  • requests: Almost always. For any new project, personal script, or professional development, it saves time and reduces boilerplate code.
  • urllib.request: When you cannot install third-party libraries (e.g., in some restricted environments or coding challenges that forbid them).

For learning and modern Python development, start with requests. It will make your life much easier. Use urllib.request only when you have a specific reason to avoid external dependencies.
