Python urllib2 API

Of course! Here is a comprehensive guide to using the urllib2 API in Python.


Important Note: urllib2 vs. urllib in Python 3

First, a crucial point for modern Python users:

  • In Python 2: The library was called urllib2.
  • In Python 3: The urllib2 module was merged and reorganized into urllib. The functionality is now split across three modules:
    • urllib.request: For opening and reading URLs (the direct replacement for urllib2).
    • urllib.error: Contains exception classes for errors raised by urllib.request.
    • urllib.parse: For parsing URLs (in Python 2 this functionality was split between the urlparse and urllib modules).

This guide will focus on the Python 2 urllib2 API, as that's what your question specifically asked for. I will include a "Python 3 Equivalent" section for each major concept.
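
If you need a script that runs under both versions, a common pattern is to try the Python 3 import locations first and fall back to the Python 2 modules. This is only a minimal sketch; adapt the imported names to the pieces you actually use:

# Minimal compatibility sketch: try the Python 3 locations first,
# then fall back to the Python 2 modules.
try:
    from urllib.request import urlopen, Request          # Python 3
    from urllib.error import URLError, HTTPError
    from urllib.parse import urlencode
except ImportError:
    from urllib2 import urlopen, Request, URLError, HTTPError  # Python 2
    from urllib import urlencode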


Core Concepts of urllib2

The urllib2 library is a standard library for fetching URLs. Its main components are:

  1. urllib2.urlopen(url[, data[, timeout]]): The primary function for opening a URL. It returns a file-like object.
  2. urllib2.Request: A class that represents a request to a URL. It allows you to add headers, set data, and use different HTTP methods.
  3. urllib2.build_opener([handler, ...]): A function to create an "opener" object, which is more flexible than urlopen. It allows you to install handlers for things such as cookies, HTTP authentication, and proxies.
  4. urllib2.install_opener(opener): Installs the opener you built as the default opener for the module.
  5. urllib2.HTTPError & urllib2.URLError: Exception classes for handling network and HTTP errors.
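
As a small illustration of how build_opener and install_opener fit together, you can build an opener whose HTTPHandler has debuglevel set to 1, which dumps the raw request and response headers to stdout. This is a minimal sketch (the exact debug output format depends on your Python build):

import urllib2

# Build an opener whose HTTP handler prints request/response headers to stdout
debug_handler = urllib2.HTTPHandler(debuglevel=1)
opener = urllib2.build_opener(debug_handler)

# Install it so every subsequent urllib2.urlopen() call uses it
urllib2.install_opener(opener)

response = urllib2.urlopen("http://www.example.com")
print "Fetched", len(response.read()), "bytes"
response.close()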

Basic URL Fetching (urlopen)

This is the simplest way to get the content of a webpage. urlopen returns a file-like object, so you can read from it just like a file.

import urllib2
try:
    # The URL you want to open
    url = "http://www.example.com"
    # Open the URL
    response = urllib2.urlopen(url)
    # Get the HTTP response status code (e.g., 200 for OK)
    print "Response Status Code:", response.getcode()
    # Read the content of the response
    # .read() returns the entire content as a string
    html_content = response.read()
    # Print the first 200 characters of the HTML
    print "Content (first 200 chars):", html_content[:200]
    # It's good practice to close the response
    response.close()
except urllib2.HTTPError as e:
    # HTTPError is a subclass of URLError, so it must be caught first
    print "The server couldn't fulfill the request."
    print "Error code:", e.code
    print "Page:", e.read() # You can read the error page's content
except urllib2.URLError as e:
    print "Failed to reach a server."
    print "Reason:", e.reason

Python 3 Equivalent:

# In Python 3, you use urllib.request
from urllib.request import urlopen
from urllib.error import URLError, HTTPError
try:
    url = "http://www.example.com"
    with urlopen(url) as response: # The 'with' statement handles closing
        html_content = response.read()
        print("Response Status Code:", response.status)
        print("Content (first 200 chars):", html_content[:200].decode('utf-8')) # Decode bytes to string
except HTTPError as e:
    # HTTPError is a subclass of URLError, so it must be caught first
    print("The server couldn't fulfill the request.")
    print("Error code:", e.code)
    print("Page:", e.read().decode('utf-8'))
except URLError as e:
    print("Failed to reach a server.")
    print("Reason:", e.reason)

Adding Headers to a Request

Many websites require specific headers, like a User-Agent, to allow requests from scripts.

You must create a Request object to add headers.

import urllib2
url = "http://www.example.com"
# Create a Request object
req = urllib2.Request(url)
# Add a User-Agent header
# This makes the request look like it's coming from a standard browser
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36')
# Add another header, e.g., for Accept-Language
req.add_header('Accept-Language', 'en-US,en;q=0.9')
# Now open the Request object, not the URL directly
response = urllib2.urlopen(req)
html_content = response.read()
print "User-Agent header added successfully."
print "Content (first 200 chars):", html_content[:200]
response.close()

Python 3 Equivalent:

from urllib.request import Request, urlopen
url = "http://www.example.com"
req = Request(url)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36')
with urlopen(req) as response:
    html_content = response.read()
    print("User-Agent header added successfully.")
    print("Content (first 200 chars):", html_content[:200].decode('utf-8'))

Sending Data (POST Request)

To send data to a server (like a form submission), you pass a data argument, either to urlopen directly or to a Request object. The data must be encoded as application/x-www-form-urlencoded, and the urllib.urlencode function handles that for you.

import urllib2
import urllib # Note: we need the 'urllib' module for urlencode
url = "http://www.example.com/submit-form"
# The data to be sent, as a dictionary
form_data = {
    'username': 'john_doe',
    'password': 'secret123'
}
# Encode the data into the correct format
# e.g., "username=john_doe&password=secret123"
encoded_data = urllib.urlencode(form_data)
# Note: When a data argument is supplied, the request is sent as a POST
# (with no data, urllib2 issues a GET)
req = urllib2.Request(url, data=encoded_data)
# You can still add headers
req.add_header('User-Agent', 'MyCoolScript/1.0')
try:
    response = urllib2.urlopen(req)
    response_content = response.read()
    print "POST request successful."
    print "Response:", response_content[:200]
    response.close()
except urllib2.HTTPError as e:
    print "Error during POST:", e.code, e.reason

Python 3 Equivalent:

from urllib.request import Request, urlopen
from urllib.error import HTTPError # needed for the except clause below
from urllib.parse import urlencode # urlencode is in urllib.parse in Python 3
url = "http://www.example.com/submit-form"
form_data = {
    'username': 'john_doe',
    'password': 'secret123'
}
encoded_data = urlencode(form_data).encode('utf-8') # Data must be bytes in Python 3
req = Request(url, data=encoded_data, method='POST') # Explicitly setting method is good practice
req.add_header('User-Agent', 'MyCoolScript/1.0')
try:
    with urlopen(req) as response:
        response_content = response.read()
        print("POST request successful.")
        print("Response:", response_content[:200].decode('utf-8'))
except HTTPError as e:
    print("Error during POST:", e.code, e.reason)

Handling Cookies

urllib2 has built-in support for cookies. The HTTPCookieProcessor handler manages them.

import urllib2
import cookielib # The module for cookie handling
# Create a cookie jar to store cookies
cookie_jar = cookielib.CookieJar()
# Create an opener that will handle cookies for us
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))
# Install this opener as the default opener for the module
urllib2.install_opener(opener)
# Now, any request made with urlopen will automatically handle cookies
url = "http://www.example.com/login-page" # A hypothetical login page
response = urllib2.urlopen(url)
print "Visited login page. Cookies received:", cookie_jar
# Now visit another page that requires a session
url2 = "http://www.example.com/protected-page"
response2 = urllib2.urlopen(url2)
print "Visited protected page. Cookies are now in the jar."
print "Content:", response2.read()[:200]
response.close()
response2.close()

Python 3 Equivalent:

from urllib.request import build_opener, install_opener, urlopen, HTTPCookieProcessor
from urllib.error import HTTPError
from http.cookiejar import CookieJar # CookieJar is in http.cookiejar
cookie_jar = CookieJar()
opener = build_opener(HTTPCookieProcessor(cookie_jar))
install_opener(opener)
url = "http://www.example.com/login-page"
try:
    with urlopen(url) as response:
        print("Visited login page. Cookies received:", cookie_jar)
    url2 = "http://www.example.com/protected-page"
    with urlopen(url2) as response2:
        print("Visited protected page. Cookies are now in the jar.")
        print("Content:", response2.read()[:200].decode('utf-8'))
except HTTPError as e:
    print("Error accessing protected page:", e.code, e.reason)

Handling HTTP Authentication (Basic Auth)

For websites that require username/password authentication, you can use HTTPBasicAuthHandler.

import urllib2
# The handler below base64-encodes the username:password pair for us
# The URL that requires authentication
url = "http://www.example.com/protected"
# Create a password manager
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# Add the username and password for the URL
# The realm can be None if you don't know it
top_level_url = "http://www.example.com/"
username = "user"
password = "pass"
password_mgr.add_password(None, top_level_url, username, password)
# Create the handler for basic authentication
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
# Build an opener with this handler
opener = urllib2.build_opener(auth_handler)
# Install the opener
urllib2.install_opener(opener)
# Now, the request will include the authentication headers
try:
    response = urllib2.urlopen(url)
    print "Authentication successful."
    print "Content:", response.read()[:200]
    response.close()
except urllib2.HTTPError as e:
    print "Authentication failed. Error:", e.code, e.reason

Python 3 Equivalent:

from urllib.request import (build_opener, install_opener, urlopen,
                            HTTPPasswordMgrWithDefaultRealm, HTTPBasicAuthHandler)
from urllib.error import HTTPError
url = "http://www.example.com/protected"
top_level_url = "http://www.example.com/"
username = "user"
password = "pass"
password_mgr = HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, top_level_url, username, password)
auth_handler = HTTPBasicAuthHandler(password_mgr)
opener = build_opener(auth_handler)
install_opener(opener)
try:
    with urlopen(url) as response:
        print("Authentication successful.")
        print("Content:", response.read()[:200].decode('utf-8'))
except HTTPError as e:
    print("Authentication failed. Error:", e.code, e.reason)

Summary Table: urllib2 vs. urllib.request (Python 3)

| Task | Python 2 (urllib2) | Python 3 (urllib.request) |
| --- | --- | --- |
| Open URL | urllib2.urlopen(url) | urllib.request.urlopen(url) |
| Create Request | urllib2.Request(url) | urllib.request.Request(url) |
| Add Header | req.add_header('Key', 'Val') | req.add_header('Key', 'Val') |
| Encode Data | urllib.urlencode(dict) | urllib.parse.urlencode(dict).encode() |
| Handle Cookies | cookielib.CookieJar, urllib2.HTTPCookieProcessor | http.cookiejar.CookieJar, urllib.request.HTTPCookieProcessor |
| Handle Auth | urllib2.HTTPBasicAuthHandler | urllib.request.HTTPBasicAuthHandler |
| Build Opener | urllib2.build_opener(handler) | urllib.request.build_opener(handler) |
| Install Opener | urllib2.install_opener(opener) | urllib.request.install_opener(opener) |
| HTTP Error | urllib2.HTTPError | urllib.error.HTTPError |
| URL Error | urllib2.URLError | urllib.error.URLError |