How Does Python's urlopen Handle HTTPS Requests? - 杰瑞科技汇

This guide shows how to use Python's urllib.request.urlopen to make HTTPS requests, covering the basics, best practices, and important security considerations.

The Short Answer: Basic Usage

Here is the simplest way to open an HTTPS URL using urlopen. This works for both HTTP and HTTPS.

from urllib.request import urlopen
try:
    # The 'with' statement ensures the connection is properly closed
    with urlopen('https://www.python.org') as response:
        # Read the response data (returns bytes)
        html = response.read()
        # Decode the bytes to a string (e.g., using UTF-8)
        html_string = html.decode('utf-8')
        print(f"Successfully fetched {len(html_string)} characters.")
        # print(html_string[:200]) # Print the first 200 characters
except Exception as e:
    print(f"An error occurred: {e}")

Key Components Explained

from urllib.request import urlopen: This imports the specific function we need from the standard library.
with urlopen(...) as response:: This is the recommended way to use urlopen.
- It opens the connection to the URL.
- The as response part assigns the returned object to the response variable. This object is like a file object.
- The with statement guarantees that response.close() is called automatically when the block is exited, even if an error occurs. This is crucial for managing network resources.
response.read(): This method reads the entire content of the response from the server. For a webpage, this will be the HTML content. It returns the data as bytes.
.decode('utf-8'): Since response.read() returns bytes, you usually need to decode it into a string. UTF-8 is a common encoding for web pages.
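
UTF-8 is a sensible default, but a server may declare a different charset in its Content-Type header. The following is a minimal sketch that reads the declared charset from the response headers and falls back to UTF-8. The helper name decode_body is an illustration only and is not part of urllib.

```python
from email.message import Message


def decode_body(body: bytes, headers: Message, default: str = "utf-8") -> str:
    """Decode a response body using the charset declared in its headers."""
    # get_content_charset() returns the charset parameter of Content-Type
    # (lower-cased), or None when the header does not declare one.
    charset = headers.get_content_charset() or default
    # errors="replace" substitutes a marker for invalid bytes instead of raising.
    return body.decode(charset, errors="replace")


# response.headers from urlopen is a Message subclass, so usage would look like:
#     with urlopen('https://www.python.org') as response:
#         html_string = decode_body(response.read(), response.headers)
```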

Handling Different Scenarios (Advanced Usage)

In a real-world application, you'll need to handle more than just a simple GET request. You might need to add headers, send data (POST request), or handle errors gracefully.

Adding Headers (e.g., User-Agent)

Some websites block requests that don't look like they're coming from a real browser. You can add a User-Agent header to mimic a browser.

from urllib.request import Request, urlopen
url = 'https://httpbin.org/user-agent' # A site that echoes back your headers
# Create a Request object to add headers
request = Request(url, 
                  headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'})
try:
    with urlopen(request) as response:
        data = response.read().decode('utf-8')
        print(data)
        # Output will be something like:
        # {
        #   "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
        # }
except Exception as e:
    print(f"An error occurred: {e}")
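
A Request object can be inspected before it is sent, which helps when debugging headers. One detail worth knowing: urllib stores header names in capitalized form, so 'User-Agent' is looked up as 'User-agent'. The sketch below assumes a hypothetical helper build_request and a made-up agent string, and it does not touch the network.

```python
from urllib.request import Request


def build_request(url: str, user_agent: str) -> Request:
    """Return a GET request carrying a custom User-Agent header."""
    request = Request(url)
    # add_header() can be called any time before the request is sent.
    request.add_header("User-Agent", user_agent)
    return request


request = build_request("https://httpbin.org/user-agent", "MyCoolApp/1.0")
# Header names are stored capitalized: 'User-agent', not 'User-Agent'.
print(request.header_items())
print(request.get_header("User-agent"))
```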

Sending Data (POST Request)

To send data (like from a form), you need to encode it into bytes and pass it as the data argument of the Request object (or directly to urlopen).

from urllib.request import Request, urlopen
from urllib.parse import urlencode
url = 'https://httpbin.org/post' # A site that echoes back the POST data
# Data to send (a dict or a sequence of two-element tuples)
post_data = {'username': 'testuser', 'message': 'Hello from Python!'}
# Encode the data into bytes
data_to_send = urlencode(post_data).encode('utf-8')
# Create a request object
request = Request(url, data=data_to_send, method='POST')
try:
    with urlopen(request) as response:
        response_data = response.read().decode('utf-8')
        print("Successfully sent POST request.")
        # print(response_data) # You will see the data you sent echoed back
except Exception as e:
    print(f"An error occurred: {e}")
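
To see what actually goes over the wire, it can help to look at the encoded body before sending it. The sketch below reuses the same form fields and only builds the request, so it runs without network access. It also shows that a request with a data argument defaults to POST even when method is not given.

```python
from urllib.parse import urlencode
from urllib.request import Request

post_data = {'username': 'testuser', 'message': 'Hello from Python!'}

# urlencode() produces a form-encoded string: spaces become '+', '!' becomes '%21'.
encoded = urlencode(post_data)
print(encoded)  # username=testuser&message=Hello+from+Python%21

request = Request('https://httpbin.org/post', data=encoded.encode('utf-8'))
# With data supplied and no explicit method, get_method() reports POST.
print(request.get_method())  # POST
```

If no Content-Type header is set explicitly, urllib sends the body as application/x-www-form-urlencoded, which matches what urlencode produces.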

Handling HTTP Errors (Status Codes)

If the server returns an error (like 404 Not Found or 500 Internal Server Error), urlopen raises an HTTPError. You should catch this specific error.

from urllib.request import urlopen
from urllib.error import HTTPError, URLError
url = 'https://www.python.org/non-existent-page'
try:
    with urlopen(url) as response:
        print(response.read().decode('utf-8'))
except HTTPError as e:
    # This block catches HTTP errors (e.g., 404, 500)
    print(f"HTTP Error Occurred: {e.code} {e.reason}")
    # You can still read the error page content if available
    # error_page = e.read().decode('utf-8')
    # print(error_page)
except URLError as e:
    # This catches other URL-related errors (e.g., DNS failure)
    print(f"URL Error Occurred: {e.reason}")
except Exception as e:
    # A catch-all for any other unexpected errors
    print(f"An unexpected error occurred: {e}")
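
The order of the except clauses above matters because HTTPError is a subclass of URLError. A minimal sketch of the same classification as a reusable function follows. The name describe_error is hypothetical, and the exceptions are built by hand so the example runs without network access.

```python
import io
from email.message import Message
from urllib.error import HTTPError, URLError


def describe_error(exc: Exception) -> str:
    """Turn an exception raised by urlopen into a short message."""
    # HTTPError must be tested first because it is a subclass of URLError.
    if isinstance(exc, HTTPError):
        return f"HTTP error {exc.code}: {exc.reason}"
    if isinstance(exc, URLError):
        return f"URL error: {exc.reason}"
    return f"Unexpected error: {exc}"


# Hand-built exceptions standing in for what urlopen would raise.
not_found = HTTPError("https://www.python.org/missing", 404, "Not Found",
                      Message(), io.BytesIO(b""))
print(describe_error(not_found))
print(describe_error(URLError("name resolution failed")))
```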

Security and Best Practices: SSL/TLS Verification

This is the most important part when dealing with HTTPS.

By default, urlopen verifies the SSL certificate of the website. This means it checks:

Is the certificate valid? (issued by a trusted certificate authority and not expired; note that revocation lists are not checked by default)
Does the hostname in the URL match the hostname in the certificate? (prevents man-in-the-middle attacks)

This is good and secure! However, there are common situations where you might need to handle this differently.
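
These two checks correspond to two settings on an SSL context. As a rough illustration, assuming the context returned by ssl.create_default_context() reflects the defaults urlopen relies on, they can be inspected like this:

```python
import ssl

# A default client-side context, configured for server authentication.
context = ssl.create_default_context()

# CERT_REQUIRED: the server must present a certificate that validates.
print(context.verify_mode == ssl.CERT_REQUIRED)
# check_hostname: the certificate must match the hostname being contacted.
print(context.check_hostname)
```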

The Problem: Self-Signed Certificates

If you are connecting to a server with a self-signed certificate (common in development, corporate intranets, or IoT devices), the default verification will fail: urlopen raises a URLError whose reason is an ssl.SSLCertVerificationError (a subclass of ssl.SSLError).

How to handle it (with caution!)

You can tell urlopen to ignore SSL verification. This makes your connection insecure and should only be done in trusted, controlled environments like a development server.

import ssl
from urllib.request import urlopen
# WARNING: This is insecure. Only use for development/testing.
url = 'https://localhost:8443' # Example with a self-signed cert
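# Create an SSL context with all verification switched off
# check_hostname must be disabled before verify_mode can be set to CERT_NONE
unverified_context = ssl.create_default_context()
unverified_context.check_hostname = False
unverified_context.verify_mode = ssl.CERT_NONE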
try:
    # Pass the context to urlopen
    with urlopen(url, context=unverified_context) as response:
        print("Successfully connected with SSL verification disabled.")
        print(response.read().decode('utf-8'))
except Exception as e:
    print(f"An error occurred: {e}")
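
A safer option than disabling verification is to keep it on and tell Python to trust the specific self-signed certificate. The sketch below assumes the certificate has been exported to a PEM file. The helper name context_trusting and the path dev-server-cert.pem are hypothetical.

```python
import ssl
from typing import Optional


def context_trusting(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a verifying context that trusts the certificates in ca_file."""
    # With cafile=None the system's default CA certificates are loaded instead.
    return ssl.create_default_context(cafile=ca_file)


# Hypothetical usage against the development server from the example above:
#     context = context_trusting('dev-server-cert.pem')
#     with urlopen('https://localhost:8443', context=context) as response:
#         print(response.read().decode('utf-8'))
```

Hostname checking stays enabled with this approach, so the certificate still has to list the hostname used in the URL.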

Modern Alternatives: `requests` Library

While urllib is built-in, the third-party requests library is vastly more popular and user-friendly. It simplifies all the tasks shown above into a clean, intuitive API.

Installation:

pip install requests

Example using requests:

import requests
# GET request with headers
url = 'https://httpbin.org/user-agent'
headers = {'User-Agent': 'MyCoolApp/1.0'}
try:
    response = requests.get(url, headers=headers)
    # Raises an exception for bad status codes (4xx or 5xx)
    response.raise_for_status()
    # .json() parses the JSON response body directly
    print(response.json())
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
# POST request with data
post_url = 'https://httpbin.org/post'
data_to_send = {'key': 'value'}
try:
    response = requests.post(post_url, json=data_to_send) # Use json= for JSON data
    response.raise_for_status()
    print("POST request successful.")
    print(response.json())
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

Summary: `urllib` vs. `requests`

Feature	`urllib.request` (Standard Library)	`requests` (Third-Party)
Ease of Use	More verbose, less intuitive	Very simple, Pythonic API
Installation	Built-in, no installation needed	`pip install requests` required
JSON Handling	Requires manual parsing (`json.loads()`)	Automatic with `response.json()`
Sessions	Manual cookie handling	`requests.Session()` for persistence
SSL Verification	On by default; customizing it requires an `ssl.SSLContext`	On by default; simple `verify=False` option to disable (use with caution)
Documentation	Good, but can be dense	Excellent, widely praised
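
For the JSON Handling row, the manual step on the urllib side is small. A minimal sketch, assuming the response body is UTF-8 encoded JSON. The helper name parse_json_body and the sample bytes are made up for illustration.

```python
import json


def parse_json_body(body: bytes, encoding: str = "utf-8"):
    """Decode raw response bytes and parse them as JSON."""
    return json.loads(body.decode(encoding))


# Sample bytes standing in for the result of response.read().
sample = b'{"user-agent": "MyCoolApp/1.0"}'
print(parse_json_body(sample)["user-agent"])
```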

Recommendation:

For quick scripts, small projects, or environments where you can't install packages, urllib is perfectly fine.
For almost any other development, the requests library is the recommended choice.
