杰瑞科技汇

python urllib2 302

The question "python urllib2 302" is a classic one for developers working with older Python code. It touches on a fundamental concept of web interaction: HTTP redirection.

Let's break it down.

What is a 302 Redirect?

First, it's important to understand what a 302 status code means.

  • Status Code 302: "Found" or "Moved Temporarily".
  • Meaning: When your client (like a Python script using urllib2) asks a server for a resource at URL A, the server responds with a 302 status code and a Location header in the response. The Location header tells the client: "The resource you're looking for isn't here permanently. Please go get it from this new URL, B."

The key point is that the move is temporary: the client should keep using the original URL for future requests. This suits things like load balancing or A/B testing, where the destination might change again. For permanent moves, 301 (Moved Permanently) is the appropriate status code, but 302 is still very common.
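To make the exchange concrete, here is a minimal sketch of a 302 response as a client sees it before any redirect is followed. It assumes Python 3 and uses only the standard library; the local demo server, its handler class, and the /old and /new paths are hypothetical and exist only for illustration.

```python
# Python 3 sketch: observe a raw 302 response without following it.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectDemoHandler(BaseHTTPRequestHandler):
    """Hypothetical demo server: /old answers 302, anything else answers 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final content"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), RedirectDemoHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# http.client is low-level and never follows redirects on its own,
# so the 302 and its Location header are visible directly.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/old")
resp = conn.getresponse()
status = resp.status
location = resp.getheader("Location")
resp.read()
conn.close()

server.shutdown()
server.server_close()

print(status, location)
```

The status line and the Location header are all the server sends to signal the redirect; deciding whether to request the new URL is left entirely to the client.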

The urllib2 Behavior (Python 2)

In Python 2, the urllib2 library handles 302 redirects automatically by default. This is often the source of confusion because you might not realize a redirect has happened.

When you call urllib2.urlopen():

  1. It sends a request to the original URL.
  2. If it receives a 301, 302, 303, or 307 response, it automatically looks at the Location header.
  3. It then sends a new request to the URL specified in the Location header.
  4. It returns the response from this second request (repeating the process if that response is itself a redirect).

You, the programmer, only see the final content from the final URL. You don't see the initial 302 response or the second request that urllib2 made for you.
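The steps above can be sketched by hand with a low-level client that does not follow redirects itself. This is an illustration under stated assumptions, not urllib2's actual implementation: it is written for Python 3, handles only plain-http GET requests, and the helper name fetch_following_redirects, the demo server, and its /start, /middle, and /final paths are all hypothetical. It treats 301, 302, 303, and 307 as redirects.

```python
# Python 3 sketch: follow redirects by hand, mirroring the steps above.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307)


class ChainHandler(BaseHTTPRequestHandler):
    """Hypothetical demo server: /start -> /middle -> /final."""

    ROUTES = {"/start": "/middle", "/middle": "/final"}

    def do_GET(self):
        target = self.ROUTES.get(self.path)
        body = b"" if target else b"final content"
        self.send_response(302 if target else 200)
        if target:
            self.send_header("Location", target)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_following_redirects(url, max_redirects=5):
    """Return (final_url, status, body, hops); plain http and GET only."""
    hops = []
    for _ in range(max_redirects + 1):
        parts = urlsplit(url)
        conn = http.client.HTTPConnection(parts.hostname, parts.port)
        conn.request("GET", parts.path or "/")   # step 1: send the request
        resp = conn.getresponse()
        body = resp.read()
        location = resp.getheader("Location")    # step 2: read Location
        conn.close()
        if resp.status in REDIRECT_CODES and location:
            hops.append((resp.status, url))
            url = urljoin(url, location)         # step 3: request the new URL
            continue
        return url, resp.status, body, hops      # step 4: the final response
    raise RuntimeError("too many redirects")


server = HTTPServer(("127.0.0.1", 0), ChainHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

start_url = "http://127.0.0.1:%d/start" % server.server_port
final_url, status, body, hops = fetch_following_redirects(start_url)

server.shutdown()
server.server_close()

print("Final URL:", final_url)
print("Status:", status)
print("Redirects followed:", len(hops))
```

The hops list is exactly the information that automatic redirect handling hides: each intermediate status code and the URL that produced it. The max_redirects cap guards against redirect loops.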


Example: Seeing the Automatic Redirect

Let's write a simple script that demonstrates this automatic behavior.

# This script is for Python 2
import urllib2

# A URL that is well-known for redirecting (302) to its final destination.
# The HTTPBin service is perfect for this.
redirect_url = 'http://httpbin.org/redirect/1'  # This will redirect to http://httpbin.org/get

print "--- Using urllib2.urlopen() ---"
try:
    # This single line handles the redirect automatically.
    response = urllib2.urlopen(redirect_url)
    # The URL of the response object is the FINAL URL, not the one we requested.
    print "Requested URL:  " + redirect_url
    print "Final URL:      " + response.geturl()
    print "Status Code:    " + str(response.getcode())
    print "Response Body (first 100 chars):"
    print response.read(100)
except urllib2.URLError as e:
    print "An error occurred:", e.reason
    if hasattr(e, 'code'):
        print "Error code:", e.code

print "\n" + "="*40 + "\n"

# --- How to PREVENT the redirect ---
print "--- Preventing the redirect with HTTPRedirectHandler ---"

# The stock HTTPRedirectHandler FOLLOWS redirects. To see the 302 response
# itself, subclass it so that it declines to build the follow-up request.
class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "do not follow"; urllib2 then falls through
        # to its default error handler, which raises HTTPError.
        return None

# Create an opener that does not follow redirects
opener = urllib2.build_opener(NoRedirectHandler)

# The open() method of this opener will raise an exception on a redirect
try:
    response = opener.open(redirect_url)
except urllib2.HTTPError as e:
    # We expect an HTTPError because the server returned a 302,
    # and our handler doesn't follow it.
    print "An HTTPError occurred as expected."
    print "Error code:    " + str(e.code)
    print "Final URL:     " + e.geturl()  # This is the original URL we tried
    print "Response Headers:"
    print e.headers
    print "\nThis shows the server's 'Location' header:"
    # The headers are an instance of httplib.HTTPMessage, so we access it like a dict
    if 'Location' in e.headers:
        print "  Location:", e.headers['Location']

How to Run This (Python 2)

Save the code as urllib2_example.py and run it with python urllib2_example.py.

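Expected Output (header values will vary; the response body is shown in full here, although the script prints only its first 100 bytes):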

--- Using urllib2.urlopen() ---
Requested URL:  http://httpbin.org/redirect/1
Final URL:      http://httpbin.org/get
Status Code:    200
Response Body (first 100 chars):
{
  "args": {}, 
  "headers": {
    "Accept-Encoding": "identity", 
    "Host": "httpbin.org", 
    "User-Agent": "Python-urllib/2.7", 
    "X-Forwarded-For": "...", 
    "X-Forwarded-Proto": "http"
  }, 
  "origin": "...", 
  "url": "http://httpbin.org/get"
}
========================================
--- Preventing the redirect with HTTPRedirectHandler ---
An HTTPError occurred as expected.
Error code:    302
Final URL:     http://httpbin.org/redirect/1
Response Headers:
Date: ..., ...
Content-Type: text/html; charset=utf-8
Content-Length: 236
Connection: keep-alive
Location: http://httpbin.org/get
Server: gunicorn/19.7.1
This shows the server's 'Location' header:
  Location: http://httpbin.org/get

Key Takeaways from the Example

  1. Automatic Following: urllib2.urlopen() hides the redirect process from you. It's convenient but can be surprising if you're debugging or need to know the final URL.
  2. response.geturl(): This is the most important method to use when you suspect a redirect might have occurred. It always returns the URL of the resource you ultimately received, not the one you originally requested.
  3. Preventing Redirects: If you need to handle the 302 response yourself (e.g., to get the Location header or to build custom redirect logic), build a custom opener with urllib2.build_opener() and a subclass of HTTPRedirectHandler whose redirect_request() method returns None. The stock HTTPRedirectHandler follows redirects; the subclass declines to, so urllib2 raises an HTTPError with code 302, which you can then catch and inspect.
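For readers on Python 3, the same takeaways can be sketched with urllib.request, where urllib2's functionality now lives. This is a minimal sketch under stated assumptions: the NoRedirectHandler name, the local demo server, and the /old and /new paths are hypothetical, and the empty ProxyHandler is there only so that proxy environment variables do not interfere with the local demo.

```python
# Python 3 sketch: default redirect following vs. a non-following opener.
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectDemoHandler(BaseHTTPRequestHandler):
    """Hypothetical demo server: /old answers 302, anything else answers 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final content"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Decline every redirect so the 3xx response surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # None means "do not build a follow-up request"


server = HTTPServer(("127.0.0.1", 0), RedirectDemoHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

# Empty ProxyHandler: ignore any proxy environment variables for this demo.
no_proxy = urllib.request.ProxyHandler({})

# 1. The default redirect handler follows the 302; geturl() reveals the final URL.
follow_opener = urllib.request.build_opener(no_proxy)
with follow_opener.open(base + "/old") as response:
    followed_url = response.geturl()
    followed_status = response.getcode()

# 2. The subclass replaces the default handler, so the 302 raises HTTPError.
no_follow_opener = urllib.request.build_opener(no_proxy, NoRedirectHandler)
blocked_status = blocked_location = None
try:
    no_follow_opener.open(base + "/old")
except urllib.error.HTTPError as err:
    blocked_status = err.code
    blocked_location = err.headers["Location"]
    err.close()

server.shutdown()
server.server_close()

print("Followed:", followed_status, followed_url)
print("Blocked: ", blocked_status, blocked_location)
```

Passing a subclass of HTTPRedirectHandler to build_opener() replaces the default redirect handler rather than adding a second one, which is why the override takes effect.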

The Modern Alternative: requests (Python 2 & 3)

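urllib2 (and its successor, urllib.request, in Python 3) can be clunky. The requests library is the de facto standard for HTTP in Python today because it's much more intuitive and powerful. Note that requests dropped Python 2 support in version 2.28, so a Python 2.7 project needs an older release.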

Here's how you would handle the same scenario with requests.

Installation

pip install requests

Example with requests

# This script works in both Python 2 and 3
from __future__ import print_function  # makes print() behave the same on Python 2

import requests

redirect_url = 'http://httpbin.org/redirect/1'

print("--- Using requests.get() (Default: Follows redirects) ---")
try:
    # requests follows redirects by default, just like urllib2
    response = requests.get(redirect_url)
    # The 'url' attribute of the response object is the FINAL URL
    print("Requested URL:  " + redirect_url)
    print("Final URL:      " + response.url)
    print("Status Code:    " + str(response.status_code))
    print("Response Body (first 100 chars):")
    print(response.text[:100])
except requests.exceptions.RequestException as e:
    print("An error occurred:", e)

print("\n" + "="*40 + "\n")
print("--- Preventing the redirect with allow_redirects=False ---")

# To prevent redirects, you simply pass a parameter.
# This is much cleaner than building a custom opener!
try:
    # allow_redirects=False tells requests to stop and return the 302 response
    response = requests.get(redirect_url, allow_redirects=False)
    print("Status Code:    " + str(response.status_code))
    print("Final URL:      " + response.url)  # This is the original URL
    print("Response Headers:")
    # Print one header per line instead of the dict-like repr
    for name, value in response.headers.items():
        print(name + ": " + value)
    print("\nThis shows the server's 'Location' header:")
    print("  Location:", response.headers['Location'])
except requests.exceptions.RequestException as e:
    print("An error occurred:", e)

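requests Output (header values will vary; the response body is shown in full here, although the script prints only its first 100 characters)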

--- Using requests.get() (Default: Follows redirects) ---
Requested URL:  http://httpbin.org/redirect/1
Final URL:      http://httpbin.org/get
Status Code:    200
Response Body (first 100 chars):
{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.25.1", 
    "X-Amzn-Trace-Id": "...", 
    "X-Forwarded-For": "...", 
    "X-Forwarded-Proto": "http"
  }, 
  "origin": "...", 
  "url": "http://httpbin.org/get"
}
========================================
--- Preventing the redirect with allow_redirects=False ---
Status Code:    302
Final URL:      http://httpbin.org/redirect/1
Response Headers:
Date: ..., ...
Content-Type: text/html; charset=utf-8
Content-Length: 236
Connection: keep-alive
Location: http://httpbin.org/get
Server: gunicorn/19.9.0
This shows the server's 'Location' header:
  Location: http://httpbin.org/get

Summary

Feature | urllib2 (Python 2) | requests (Modern)
--------|--------------------|------------------
Default Behavior | Follows 302 redirects automatically. | Follows redirects automatically.
How to see final URL | response.geturl() | response.url
How to prevent redirect | Build a custom opener with an HTTPRedirectHandler subclass whose redirect_request() returns None. | Pass allow_redirects=False to the request method.
Handling 302 response | Results in an HTTPError exception that you can catch and inspect. | Results in a successful response with status_code == 302 when allow_redirects=False.
Ease of Use | Verbose, requires manual opener building for advanced cases. | Simple, intuitive, and powerful API.

Recommendation: If you are writing new code, or can update old code, use the requests library. It makes handling HTTP, including redirects, cookies, authentication, and timeouts, significantly easier. Only stick with urllib2 if you are working on a legacy Python 2 project where you cannot add new dependencies.
