Of course. Let's dive into urllib in Python 3.2.
First, and most importantly, a critical piece of context:
Python 3.2 is a very old and unsupported version, released in 2011. It does not receive security updates, and its standard library, including urllib, is missing many features, bug fixes, and security patches present in modern Python.
The information below is for understanding legacy code or running on very old systems. For any new project, use a modern version of Python (3.6+) with its up-to-date urllib, or better yet the requests library covered at the end.
Overview of urllib in Python 3.2
In Python 3, urllib was reorganized into several submodules to provide a cleaner separation of concerns. In Python 3.2, you will primarily interact with these four:
- `urllib.request`: For opening and reading URLs (like `http:`, `ftp:`, `file:`).
- `urllib.error`: Contains exception classes raised by `urllib.request`.
- `urllib.parse`: For parsing URLs into components (scheme, netloc, path, etc.).
- `urllib.robotparser`: For parsing `robots.txt` files.
urllib.request - Making HTTP Requests
This is the core module for fetching data from the web.
Basic GET Request
The most common task is to download the content of a webpage.
```python
import urllib.request
import urllib.error

# The URL you want to fetch
url = 'http://example.com'

try:
    # urlopen() returns a file-like object
    with urllib.request.urlopen(url) as response:
        # Read the response content
        html = response.read()
        # The content is returned as bytes, so we decode it to a string
        html_string = html.decode('utf-8')
        # Note: f-strings require Python 3.6+, so Python 3.2 code uses str.format()
        print("Successfully fetched {0} characters from {1}".format(len(html_string), url))
        # print(html_string)  # Uncomment to see the HTML
except urllib.error.URLError as e:
    print("Failed to open URL: {0}".format(e.reason))
except Exception as e:
    print("An unexpected error occurred: {0}".format(e))
```
Key points:
- `urllib.request.urlopen(url)` opens the URL.
- It's best practice to use a `with` statement, as it automatically handles closing the connection.
- `response.read()` returns the entire content as a `bytes` object.
- You must explicitly decode the bytes into a string (e.g., using `.decode('utf-8')`).
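The bytes-vs-str distinction trips up many people coming from Python 2, and it can be demonstrated without a network call. The byte string below is just a stand-in for what `response.read()` might return:

```python
# A stand-in for the bytes that response.read() would return
raw = b'<html>caf\xc3\xa9</html>'

# Decoding converts the UTF-8 bytes into a str
text = raw.decode('utf-8')
print(text)  # <html>café</html>
```

Calling string methods like `.split('<')` on `raw` without decoding first would require bytes arguments (`b'<'`), which is a common source of `TypeError`s in ported code.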
Adding Headers (e.g., User-Agent)
Many websites block requests that don't have a proper User-Agent header. In Python 3.2, you do this by creating a Request object.
```python
import urllib.request
import urllib.error

url = 'http://httpbin.org/user-agent'  # A site that echoes back your headers

# Create a dictionary of headers
headers = {
    'User-Agent': 'MyCoolPythonScript/1.0 (http://mywebsite.com)',
    'Accept': 'text/html'
}

# Create a Request object with the URL and headers
req = urllib.request.Request(url, headers=headers)

try:
    with urllib.request.urlopen(req) as response:
        html = response.read().decode('utf-8')
        print(html)
        # Expected output: {"user-agent": "MyCoolPythonScript/1.0 (http://mywebsite.com)"}
except urllib.error.URLError as e:
    print("Failed to open URL: {0}".format(e.reason))
```
Making a POST Request
To send data via a POST request, you need to encode your data into bytes and pass it to the Request object.
```python
import urllib.request
import urllib.parse
import urllib.error

url = 'http://httpbin.org/post'  # A site that echoes back POST data

# Data to be sent in the POST request
# It must be a dictionary of string keys and string values
post_data = {
    'username': 'test_user',
    'message': 'Hello from Python 3.2!'
}

# Encode the data into bytes
# The 'utf-8' encoding is standard
encoded_data = urllib.parse.urlencode(post_data).encode('utf-8')

# Create a Request object, passing the encoded data.
# Note: the method='POST' keyword argument was only added in Python 3.3;
# in Python 3.2, supplying a data argument is what makes the request a POST.
req = urllib.request.Request(url, data=encoded_data)

try:
    with urllib.request.urlopen(req) as response:
        response_body = response.read().decode('utf-8')
        print("POST request successful!")
        # print(response_body)  # To see the server's response
except urllib.error.URLError as e:
    print("Failed to make POST request: {0}".format(e.reason))
```
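The same pattern covers non-form bodies. As a sketch (the httpbin.org URL and payload are illustrative), a JSON body is built by hand with the `json` module, since Python 3.2's urllib has no JSON support of its own; the request is only constructed here, not sent:

```python
import json
import urllib.request

payload = {'username': 'test_user', 'score': 42}

# Serialize to JSON and encode to bytes, as Request requires bytes data
body = json.dumps(payload).encode('utf-8')

req = urllib.request.Request(
    'http://httpbin.org/post',
    data=body,
    headers={'Content-Type': 'application/json'},
)

# Supplying data makes this a POST
print(req.get_method())  # POST
```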
urllib.error - Handling Errors
This module defines exceptions that urllib.request can raise.
- `urllib.error.URLError`: A general error. It has a `.reason` attribute that tells you what went wrong (e.g., "connection timed out", "not found").
- `urllib.error.HTTPError`: A more specific error for HTTP status codes like 404 (Not Found) or 500 (Server Error). It's a subclass of `URLError` and has additional attributes like `.code` (the status code) and `.headers` (the response headers).
```python
import urllib.request
import urllib.error

url = 'http://example.com/nonexistent-page'

try:
    with urllib.request.urlopen(url) as response:
        print(response.read())
except urllib.error.HTTPError as e:
    # Catch HTTPError first, since it is a subclass of URLError
    print("HTTP Error occurred: {0} {1}".format(e.code, e.reason))
    # You can access headers like this:
    # print(e.headers)
except urllib.error.URLError as e:
    print("URL Error occurred: {0}".format(e.reason))
```
urllib.parse - Parsing URLs
This module is for breaking down URLs into their components or building them from parts.
```python
import urllib.parse

url = 'http://www.example.com:80/path/to/page;params?query=name#fragment'

# Parse a URL into a 6-tuple (scheme, netloc, path, params, query, fragment)
parsed_url = urllib.parse.urlparse(url)

print("Scheme: {0}".format(parsed_url.scheme))
print("Netloc (domain + port): {0}".format(parsed_url.netloc))
print("Path: {0}".format(parsed_url.path))
print("Query (after ?): {0}".format(parsed_url.query))
print("Fragment (after #): {0}".format(parsed_url.fragment))

# --- Building a URL from components ---
# urlunparse() takes a 6-tuple
new_parts = ('https', 'newsite.com', '/search', '', 'q=python', '')
new_url = urllib.parse.urlunparse(new_parts)
print("\nNew URL: {0}".format(new_url))

# --- Encoding data for URLs ---
# Use urlencode for query parameters
query_params = {'q': 'python tutorial', 'page': '2'}
encoded_query = urllib.parse.urlencode(query_params)
print("\nEncoded Query String: {0}".format(encoded_query))
```
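Two more `urllib.parse` helpers that were already present in Python 3.2 are worth knowing: `quote()` percent-encodes unsafe characters for use in a URL, and `parse_qs()` reverses `urlencode()` by turning a query string back into a dictionary:

```python
import urllib.parse

# Percent-encode a path (by default '/' is treated as safe)
path = urllib.parse.quote('/search results/café')
print(path)  # /search%20results/caf%C3%A9

# Parse a query string back into a dict mapping names to lists of values
params = urllib.parse.parse_qs('q=python+tutorial&page=2')
print(params['q'])  # ['python tutorial']
```

Note that `parse_qs()` maps each name to a *list* of values, since a query string may repeat the same parameter.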
urllib.robotparser - Parsing robots.txt
This module helps you check if you are allowed to crawl a specific URL on a website.
```python
import urllib.robotparser

# Create a RobotFileParser object
rp = urllib.robotparser.RobotFileParser()

# Set the URL for the website's robots.txt file
rp.set_url('http://example.com/robots.txt')

# Read and parse the robots.txt file
rp.read()

# Now you can check if you can access a URL
user_agent = 'MyCoolCrawler'
url_to_check = 'http://example.com/some-page/'

can_fetch = rp.can_fetch(user_agent, url_to_check)

if can_fetch:
    print("'{0}' is allowed to fetch '{1}'".format(user_agent, url_to_check))
else:
    print("'{0}' is NOT allowed to fetch '{1}'".format(user_agent, url_to_check))

# Note: rp.crawl_delay(user_agent) is NOT available in Python 3.2;
# it was only added to urllib.robotparser in Python 3.6.
```
Critical Recommendation: Use Modern Python and requests
For any serious development, you should use a modern Python version (3.6+) and the requests library. It is vastly superior to urllib in terms of simplicity, readability, and features.
Example of the same tasks using requests:
```python
# First, install requests: pip install requests
import requests

# --- Basic GET Request ---
try:
    response = requests.get('http://example.com')
    response.raise_for_status()  # Raises an exception for bad status codes (4xx or 5xx)
    html = response.text  # Text is automatically decoded
    print(f"Successfully fetched {len(html)} characters from http://example.com")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

# --- POST Request with Headers ---
url = 'http://httpbin.org/post'
data = {'username': 'test_user', 'message': 'Hello from requests!'}
headers = {'User-Agent': 'MyCoolPythonScript/1.0'}

try:
    response = requests.post(url, data=data, headers=headers)
    response.raise_for_status()
    print("\nPOST request successful!")
    print(response.json())  # .json() automatically parses the JSON response
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
```
As you can see, requests handles encoding, headers, and JSON parsing automatically, making the code much cleaner and easier to write.
