杰瑞科技汇

Python urllib status

In Python's urllib library, "status" almost always refers to the HTTP status code: a three-digit number that a web server returns to indicate the result of an HTTP request (e.g., 200 for OK, 404 for Not Found).


Here's a complete guide on how to get and use HTTP status codes with urllib, broken down by the library's modules.

The Core Concept: The status Attribute

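When you make a request with urllib, the response object you get back has a status attribute that holds the integer HTTP status code. The older response.getcode() method returns the same value but has been deprecated in favor of status since Python 3.9.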
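As a small illustration, the integer held in status can be mapped to its standard name with the standard library's http.HTTPStatus enum. The helper name describe_status is hypothetical, chosen only for this sketch; it is not part of urllib.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a label such as '404 Not Found' for an integer status code."""
    try:
        # HTTPStatus(code) looks up the enum member for a registered code.
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        # Servers may send codes that HTTPStatus does not list.
        return f"{code} Unknown"


print(describe_status(200))
print(describe_status(404))
```

The fallback branch matters because a server is free to send a non-standard code, and HTTPStatus raises ValueError for any value it does not know.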


Using urllib.request (The Modern, Recommended Approach)

This is the standard-library way to make HTTP requests in Python 3. Call urllib.request.urlopen() to get a response object, then read its .status attribute.

Basic Example: Getting the Status Code

This example demonstrates the most common use case: fetching a URL and checking if the request was successful.

import urllib.request
import urllib.error
# The URL we want to check
url = "https://www.example.com"
try:
    # urlopen returns a response object
    with urllib.request.urlopen(url) as response:
        # The status code is an attribute of the response object
        status_code = response.status
        print(f"URL: {url}")
        print(f"Status Code: {status_code}")
        print(f"Reason: {response.reason}") # e.g., 'OK'
        print(f"Headers: {response.headers}")
except urllib.error.HTTPError as e:
    # This block catches HTTP errors (like 404, 500, etc.)
    print("Error: The server could not fulfill the request.")
    print(f"Error Code: {e.code}")
    print(f"Error Reason: {e.reason}")
    # You can also get headers from the error response
    print(f"Error Headers: {e.headers}")
except urllib.error.URLError as e:
    # This block catches other URL-related errors (like no internet, DNS failure)
    print("Error: Failed to reach a server.")
    print(f"Reason: {e.reason}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

Output for https://www.example.com:

URL: https://www.example.com
Status Code: 200
Reason: OK
Headers: Content-Type: text/html; charset=UTF-8
Server: ECS (dcb/7F83)
...

Output for a non-existent URL (e.g., https://www.example.com/nonexistent-page):

Error: The server could not fulfill the request.
Error Code: 404
Error Reason: Not Found
Error Headers: ...

Key Components Explained:

  • urllib.request.urlopen(url): Opens the URL and returns a file-like response object. For HTTP and HTTPS URLs this is an http.client.HTTPResponse.
  • response.status: The integer status code (e.g., 200).
  • response.reason: The human-readable reason phrase for the status code (e.g., 'OK', 'Not Found').
  • response.headers: A dictionary-like object containing the response headers. Header names are matched case-insensitively.
  • try...except block: This is crucial for robust code. If the server returns an error status code (like 404 or 500), urlopen raises an HTTPError. If there's a network problem (like no connection), it raises a URLError. Because HTTPError is a subclass of URLError, the HTTPError clause must come first; otherwise the URLError clause would catch both.
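The components above can be combined into one helper that returns the status code whether the request succeeds or the server answers with an error. This is a minimal sketch under stated assumptions: the function name get_status and the 10-second default timeout are illustrative choices, not part of urllib.

```python
import urllib.error
import urllib.request


def get_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if the server could not be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Success path: the code lives on the response object.
            return response.status
    except urllib.error.HTTPError as e:
        # Error path (4xx/5xx): the code lives on the exception.
        # This clause comes first because HTTPError subclasses URLError.
        return e.code
    except urllib.error.URLError:
        # No HTTP response at all, e.g. DNS failure or refused connection.
        return None
```

Calling get_status("https://www.example.com") would be expected to return 200 when the site is reachable, and a URL for a missing page on the same host would be expected to return 404 rather than raise.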

Using urllib.error

You don't use urllib.error directly to make requests, but you use it to handle the exceptions that urllib.request can raise.

  • urllib.error.HTTPError: Raised when the server returns an HTTP error code (4xx or 5xx). It is a subclass of URLError. As shown above, this exception object has its own .code, .reason, and .headers attributes, which are very useful.
  • urllib.error.URLError: Raised for more general, non-HTTP errors, such as a DNS failure, a refused connection, or an unsupported URL scheme. Note that a URL with no scheme at all (e.g., "www.example.com") raises ValueError instead.
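The relationship between the two exception classes can be checked without any network access. The sketch below builds both exceptions by hand purely for demonstration (in real code urlopen raises them for you); the helper name classify and the error messages are hypothetical.

```python
import io
import urllib.error
from email.message import Message


def classify(exc):
    """Describe a urllib exception; HTTPError is tested first because it subclasses URLError."""
    if isinstance(exc, urllib.error.HTTPError):
        return f"HTTP error {exc.code}: {exc.reason}"
    if isinstance(exc, urllib.error.URLError):
        return f"Connection problem: {exc.reason}"
    return f"Other error: {exc}"


# Hand-built exceptions, for demonstration only.
http_err = urllib.error.HTTPError(
    "https://www.example.com/nonexistent-page", 404, "Not Found", Message(), io.BytesIO()
)
url_err = urllib.error.URLError("connection refused")

print(issubclass(urllib.error.HTTPError, urllib.error.URLError))
print(classify(http_err))
print(classify(url_err))
```

Swapping the two isinstance checks would make every HTTPError fall into the "Connection problem" branch, which is the same mistake as listing except URLError before except HTTPError.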

Using urllib.parse (For Building URLs)

While not directly related to the status of a response, urllib.parse is often used alongside urllib.request to construct valid URLs, especially when adding query parameters.

from urllib.parse import urlencode
from urllib.request import urlopen
base_url = "https://httpbin.org/get"
params = {'key1': 'value1', 'key2': 'value2'}
# Encode the parameters into a query string
# 'key1=value1&key2=value2'
query_string = urlencode(params)
# Combine the base URL and the query string
full_url = f"{base_url}?{query_string}"
print(f"Requesting URL: {full_url}")
with urlopen(full_url) as response:
    print(f"Status Code: {response.status}")
    # The response body will show the URL and params the server received
    print(response.read().decode('utf-8'))
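String concatenation works for a base URL with no query string of its own. For the more general case, one possible approach is to split the URL, merge the parameters, and reassemble it. This is a sketch only: build_url is a hypothetical helper name, and the example parameters are made up.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_url(base_url, params):
    """Append params to base_url, keeping any query string it already has."""
    parts = urlsplit(base_url)
    # doseq=True turns a list value into repeated key=value pairs.
    query = urlencode(params, doseq=True)
    if parts.query:
        query = f"{parts.query}&{query}"
    return urlunsplit(parts._replace(query=query))


print(build_url("https://httpbin.org/get", {"q": "python urllib", "page": 2}))
```

urlencode escapes characters that are not safe in a query string, so the space in "python urllib" is sent as "+" rather than breaking the URL.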

Common HTTP Status Codes

Here's a quick reference for the status codes you'll encounter most often:

Code  Name                   Meaning
200   OK                     The request was successful. This is what you want to see.
301   Moved Permanently      The resource has a new, permanent URL. For GET requests, urlopen follows the redirect automatically, so response.status normally reports the code of the final response.
302   Found                  The resource is temporarily available at a different URL. urlopen follows it in the same way as 301.
401   Unauthorized           Authentication is required, or the credentials you sent were rejected.
403   Forbidden              You do not have permission to access the resource, even if you are authenticated.
404   Not Found              The resource you requested does not exist on the server.
500   Internal Server Error  A generic server error. Something went wrong on the server side.
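The first digit of a status code identifies its class, which is often all a script needs to branch on. A minimal sketch, using the hypothetical helper name status_category:

```python
def status_category(code):
    """Return the class of an HTTP status code based on its first digit."""
    categories = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # Integer division by 100 isolates the leading digit of a three-digit code.
    return categories.get(code // 100, "unknown")


for code in (200, 301, 404, 500):
    print(code, status_category(code))
```

Branching on the class rather than on individual codes keeps a script from breaking when a server returns a less common code such as 204 or 503.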

Summary and Best Practices

  1. Use urllib.request for making requests.
  2. Always use a try...except block to handle HTTPError and URLError, listing HTTPError first because it is a subclass of URLError.
  3. Check response.status inside the try block for successful requests (typically status 200).
  4. Check e.code inside the except urllib.error.HTTPError as e: block for error codes.
  5. For complex applications, consider using the more powerful and user-friendly requests library, which simplifies many of these tasks. However, urllib is great for simple scripts and is part of Python's standard library.