
How does Python's urllib2 handle HTTP requests?

Let's take a detailed look at how urllib2 handles HTTP requests in Python.


Important: urllib2 vs. urllib (Python 2) vs. urllib (Python 3)

This is a common point of confusion. You mentioned urllib2, which is the standard library name in Python 2.

  • Python 2: The library was split into two modules: urllib2 and urllib.

    • urllib2: For opening URLs (HTTP, HTTPS, FTP). This is what you're asking about.
    • urllib: For URL encoding/decoding (urllib.urlencode, urllib.quote, urllib.unquote).
  • Python 3: The modules were merged and reorganized into urllib.request and urllib.parse (a compatibility import sketch follows this list).

    • urllib.request: The direct successor to urllib2. It handles opening URLs.
    • urllib.parse: The successor to the old urllib. It handles URL parsing and encoding.
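
To make the mapping concrete, here is a minimal compatibility sketch showing the same names imported on both versions (only urlopen, Request, the error classes, and urlencode are shown; other names follow the same pattern):

# Minimal sketch: the Python 2 names and their Python 3 homes.
# urllib2.urlopen / urllib2.Request  ->  urllib.request
# urllib2.URLError / HTTPError       ->  urllib.error
# urllib.urlencode / quote / unquote ->  urllib.parse
try:
    # Python 3
    from urllib.request import urlopen, Request
    from urllib.error import URLError, HTTPError
    from urllib.parse import urlencode
except ImportError:
    # Python 2
    from urllib2 import urlopen, Request, URLError, HTTPError
    from urllib import urlencode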

Recommendation: For any new project, you should use the requests library. It's far more intuitive, powerful, and easier to use than the built-in urllib. However, understanding urllib2 is essential for reading legacy Python 2 code.


urllib2 in Python 2: Core Concepts

The main entry point for urllib2 is the urlopen() function. It can handle both simple requests and more complex ones that require custom headers, authentication, or cookies.

Making a Simple GET Request

This is the most basic use case: fetching the content of a webpage.

import urllib2
# The URL we want to fetch
url = "http://httpbin.org/get"
try:
    # urlopen opens the URL and returns a file-like object
    response = urllib2.urlopen(url)
    # We can read the content of the response
    html = response.read()
    # The response object also has useful headers
    print "Response Code:", response.getcode()
    print "Headers:"
    print response.info()
    print "\n--- Content ---"
    print html
except urllib2.URLError as e:
    print "Failed to open URL:", e.reason

Explanation:

  • urllib2.urlopen(url): Opens the URL and returns an addinfourl object (a file-like object).
  • response.read(): Reads the entire content of the response as a string (a JSON-parsing sketch follows this list).
  • response.getcode(): Returns the HTTP status code (e.g., 200 for OK, 404 for Not Found).
  • response.info(): Returns an httplib.HTTPMessage object (a mimetools.Message subclass) containing the response headers.
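
Since httpbin.org responds with JSON, a typical next step with urllib2 is to parse the body yourself with the standard json module; a minimal sketch (the field names are just what httpbin happens to return):

import json
import urllib2

response = urllib2.urlopen("http://httpbin.org/get")
# urllib2 has no built-in JSON support, so decode the raw body manually
data = json.loads(response.read())
print "URL echoed back:", data["url"]
print "Headers httpbin saw:", data["headers"]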

Adding Custom Headers (e.g., User-Agent)

Many websites block default Python user-agents. To avoid this, you can add a User-Agent header. For this, you need to create a Request object.

import urllib2
url = "http://httpbin.org/user-agent"
# Create a dictionary of headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Accept': 'application/json'
}
# Create a Request object with the URL and headers
request = urllib2.Request(url, headers=headers)
try:
    # Pass the Request object to urlopen
    response = urllib2.urlopen(request)
    response_data = response.read()
    print response_data
except urllib2.URLError as e:
    print "Failed to open URL:", e.reason

Explanation:

  • urllib2.Request(url, headers=...): Instead of passing a string directly, we create a Request object. This allows us to specify headers, data (for POST), and other request-level parameters (an add_header() alternative is sketched after this list).
  • urllib2.urlopen(request): We pass the Request object to urlopen.
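
Headers can also be attached one at a time with the Request object's add_header() method; a minimal sketch of the same request built that way:

import urllib2

request = urllib2.Request("http://httpbin.org/user-agent")
# add_header() sets a single header on an existing Request object
request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)')
request.add_header('Accept', 'application/json')
response = urllib2.urlopen(request)
print response.read()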

Making a POST Request

To send data to a server, you use a POST request. You pass the data as the second argument to the Request object.

import urllib2
import urllib # Note: We need the old urllib for urlencode
url = "http://httpbin.org/post"
# The data to send. This must be in a specific format.
# We use urllib.urlencode to convert a dictionary to a query string.
post_data = urllib.urlencode({
    'username': 'test_user',
    'password': 'secure_password123'
})
# In Python 2, urlencode() already returns a byte string (str),
# so it can be passed to Request() directly.
# Create a Request object, passing the URL and the data
request = urllib2.Request(url, data=post_data)
try:
    response = urllib2.urlopen(request)
    response_data = response.read()
    print "Response from POST request:"
    print response_data
except urllib2.URLError as e:
    print "Failed to open URL:", e.reason

Explanation:

  • urllib.urlencode(): This function (from the old urllib module) is crucial. It takes a dictionary and converts it into a format suitable for an HTTP request body (key1=value1&key2=value2). A sketch of using it for GET query strings follows this list.
  • Request(url, data=post_data): By providing the data argument, urllib2 automatically changes the request method from GET to POST.
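
urlencode() is just as useful for GET requests, where the encoded pairs go into the URL as a query string instead of the request body; a minimal sketch (the parameter names are just examples):

import urllib
import urllib2

params = urllib.urlencode({'q': 'python', 'page': 2})
# For GET, append the encoded parameters to the URL itself
url = "http://httpbin.org/get?" + params
response = urllib2.urlopen(url)
print response.read()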

Handling HTTP Errors (e.g., 404 Not Found)

If a URL returns an error status code (4xx or 5xx), urlopen raises an HTTPError.

import urllib2
url = "http://httpbin.org/status/404"
try:
    response = urllib2.urlopen(url)
    print "Success! Code:", response.getcode()
except urllib2.HTTPError as e:
    # This block catches HTTP errors like 404, 500, etc.
    print "HTTP Error occurred!"
    print "Error Code:", e.code
    print "Error Reason:", e.reason
    print "Error Headers:", e.headers
except urllib2.URLError as e:
    # This block catches other URL-related errors (e.g., no network)
    print "URL Error occurred:", e.reason

Handling Cookies

urllib2 has a built-in HTTPCookieProcessor to handle cookies automatically. You need to build an "opener" to use it.

import urllib2
import cookielib # The old cookie library
# Create a cookie jar to store cookies
cookie_jar = cookielib.CookieJar()
# Create an opener that will handle cookies
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))
# A URL that sets a cookie
url_set_cookie = "http://httpbin.org/cookies/set?test_cookie=12345"
# A URL that expects the cookie to be sent back
url_check_cookie = "http://httpbin.org/cookies"
# Install the opener. Now all calls to urlopen will use it.
urllib2.install_opener(opener)
print "--- Visiting URL to set cookie ---"
response = urllib2.urlopen(url_set_cookie)
print response.read()
print "\n--- Visiting URL to check cookie ---"
response = urllib2.urlopen(url_check_cookie)
print response.read()
# You can inspect the cookies in the jar
print "\n--- Cookies in the jar ---"
for cookie in cookie_jar:
    print cookie
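
If the cookies need to survive between runs of the script, cookielib also offers file-backed jars; a minimal sketch using MozillaCookieJar (the cookies.txt filename is just an example):

import urllib2
import cookielib

# A file-backed jar: cookies can be saved to disk and reloaded later
cookie_jar = cookielib.MozillaCookieJar("cookies.txt")
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))
opener.open("http://httpbin.org/cookies/set?persisted=1")
# ignore_discard=True also keeps session cookies that lack an expiry
cookie_jar.save(ignore_discard=True)
# In a later run, load the saved cookies back before making requests
cookie_jar.load(ignore_discard=True)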

The Modern Alternative: The requests Library

As promised, here's how you'd do the same tasks with the requests library (it works on Python 3, and on Python 2 after pip install requests). The syntax is much cleaner.

Installation

pip install requests

requests Examples

import requests
# 1. Simple GET Request
response = requests.get('http://httpbin.org/get')
print("Status Code:", response.status_code)
print("JSON Response:", response.json()) # .json() parses the JSON response
# 2. Adding Headers
headers = {'User-Agent': 'MyCoolAgent/1.0'}
response = requests.get('http://httpbin.org/user-agent', headers=headers)
print(response.json())
# 3. Making a POST Request
payload = {'username': 'test_user', 'password': 'secure_password123'}
response = requests.post('http://httpbin.org/post', data=payload)
print("POST Response JSON:", response.json())
# 4. Handling Errors (Requests does this for you)
try:
    response = requests.get('http://httpbin.org/status/404', timeout=5)
    response.raise_for_status() # Raises an HTTPError for bad responses (4xx or 5xx)
except requests.exceptions.HTTPError as err:
    print("HTTP Error:", err)
except requests.exceptions.RequestException as err:
    print("Other Error:", err)
# 5. Handling Cookies (Automatically!)
# Session objects persist cookies across requests
with requests.Session() as session:
    # Get a cookie
    session.get('http://httpbin.org/cookies/set/test_cookie/12345')
    # The cookie is now attached to the session
    response = session.get('http://httpbin.org/cookies')
    print("Cookies from session:", response.json())

Summary: urllib2 vs. requests

| Feature | urllib2 (Python 2) | requests (Modern Python) |
| --- | --- | --- |
| Ease of Use | Verbose; requires manual handling of many cases. | Simple, intuitive, "Pythonic" API. |
| JSON Handling | Manual: json.loads(response.read()) | Automatic: response.json() |
| POST Data | Requires urllib.urlencode(). | Pass a dictionary directly. |
| Sessions/Cookies | Requires building an opener with HTTPCookieProcessor. | Built-in Session object handles cookies automatically. |
| Error Handling | Requires catching HTTPError and URLError separately. | raise_for_status() plus a single requests.exceptions.RequestException base class. |
| Recommendation | Only for maintaining legacy Python 2 code. | The standard for all new Python projects. |