杰瑞科技汇

How to efficiently fetch webpage text with Python requests?

The requests library is the de facto standard for making HTTP requests in Python, and fetching the text of a URL is one of its most common uses.


Here’s a complete guide, from the basics to more advanced and practical examples.


The Basic Request: requests.get()

The core of getting text is the requests.get() function, which sends an HTTP GET request to a specified URL. The server responds, and requests gives you an object that contains the server's response.

To get the text content, you use the .text attribute on the response object.

Step 1: Install the requests library

If you don't have it installed, open your terminal or command prompt and run:

pip install requests

Step 2: Simple Example

This is the most basic way to fetch and print the text of a webpage.

import requests
# The URL you want to get text from
url = 'https://www.python.org'
try:
    # Send a GET request to the URL
    response = requests.get(url)
    # This will raise an exception for bad status codes (4xx or 5xx)
    response.raise_for_status()
    # Get the text content from the response
    # The .text attribute returns the content as a string
    page_text = response.text
    # Print the first 500 characters of the text
    print(page_text[:500])
except requests.exceptions.RequestException as e:
    # Handle any errors that occur during the request
    print(f"An error occurred: {e}")

What's happening here?

  1. import requests: Imports the library.
  2. requests.get(url): Sends the HTTP GET request. The server sends back a response, which is stored in the response object.
  3. response.raise_for_status(): This is a good practice. It checks if the request was successful (status code 200-299). If not (e.g., 404 Not Found, 500 Server Error), it raises an HTTPError.
  4. response.text: This is the key part. It decodes the response body (which is in bytes) into a string using the encoding specified in the response headers (e.g., Content-Type: text/html; charset=utf-8).
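Conceptually, `.text` is just `.content` decoded with the detected charset. The following self-contained sketch (the string and charset are stand-ins, not real response data) shows the equivalent operation; requests' real logic also falls back to `response.apparent_encoding` when no charset is declared:

```python
# Simplified illustration of what response.text does internally:
# decode the raw response bytes using the charset from the headers.
raw_bytes = "<p>héllo wörld</p>".encode("utf-8")  # stand-in for response.content
charset = "utf-8"                                  # stand-in for response.encoding

decoded = raw_bytes.decode(charset)
print(decoded)  # <p>héllo wörld</p>
```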

Important Attributes of the Response Object

When you get a response, it's not just text. The Response object contains a lot of useful information.

import requests
url = 'https://httpbin.org/get' # A great testing URL
response = requests.get(url)
# --- Status Code ---
# Indicates whether the request was successful (e.g., 200), not found (404), etc.
print(f"Status Code: {response.status_code}")
# --- Response Headers ---
# The headers the server sent back.
print("\nServer Headers:")
print(response.headers)
# --- Request Headers ---
# The headers your request sent. Note: requests adds its own
# defaults (like 'User-Agent') unless you override them.
print("\nRequest Headers (sent by us):")
print(response.request.headers)
# --- Encoding ---
# The encoding used to decode the response content.
# requests tries to guess this from the headers.
print(f"\nEncoding: {response.encoding}")
# --- Raw Content (in bytes) ---
# The raw content of the response, as bytes.
# This is useful if you're dealing with non-text data or want to control the decoding.
print(f"\nRaw Content (first 50 bytes): {response.content[:50]}")

Handling Real-World Complications

In a real application, you'll need to handle more than just a simple request.


a) Handling Errors

Networks are unreliable. The server might be down, the URL might be wrong, or you might lose your connection. Always wrap your requests in a try...except block.

import requests
from requests.exceptions import RequestException, Timeout, HTTPError
url = 'https://this-domain-does-not-exist.com'
try:
    # timeout=5 limits both connecting and waiting for data to 5 seconds each
    response = requests.get(url, timeout=5)
    # If the request was successful, raise_for_status() does nothing.
    # If not, it raises an HTTPError.
    response.raise_for_status()
    print("Success! The page loaded.")
    print(f"Text length: {len(response.text)}")
except HTTPError as http_err:
    print(f"HTTP error occurred: {http_err}") # e.g., 404, 500
except Timeout as err:
    print(f"Request timed out: {err}")
except RequestException as err:
    # This is a catch-all for any requests-related errors
    print(f"An error occurred during the request: {err}")
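For transient failures (503s, rate limits, dropped connections), you can also configure automatic retries using urllib3's Retry class mounted on a Session. The parameter values below are illustrative, not recommendations:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 3 times on common transient status codes,
# with exponential backoff between attempts.
retry = Retry(total=3, backoff_factor=0.5,
              status_forcelist=[429, 500, 502, 503, 504])
adapter = HTTPAdapter(max_retries=retry)

session = requests.Session()
# Apply the retry policy to all http:// and https:// URLs
session.mount('https://', adapter)
session.mount('http://', adapter)

# session.get(url, timeout=5) now retries those errors automatically
```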

b) Handling Different Encodings

Sometimes the server doesn't specify the encoding correctly, and response.text might look like gibberish. You can force requests to use a specific encoding.

import requests
# Example URL; substitute any page whose declared charset is wrong or missing
url = 'https://www.nytimes.com/2025/10/27/us/politics/biden-polling.html'
try:
    response = requests.get(url)
    response.raise_for_status()
    # Let's see what encoding requests guessed
    print(f"Guessed Encoding: {response.encoding}") # Often 'ISO-8859-1' for problematic pages
    # The text might be corrupted
    # print(response.text) 
    # You can manually set the encoding. 'utf-8' is a common and safe choice.
    response.encoding = 'utf-8'
    # Now get the text with the correct encoding
    page_text = response.text
    print("\nSuccessfully decoded text with UTF-8:")
    print(page_text[:500])
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
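To see why forcing the encoding matters, here is a self-contained illustration (no network needed): the same bytes decoded with the wrong versus the right charset. This is exactly the difference that setting `response.encoding` makes before reading `response.text`:

```python
# A UTF-8 encoded page body, as requests would see it in response.content
body = "高效获取网页内容".encode("utf-8")

# If the server mislabels the charset (e.g. as ISO-8859-1), decoding
# produces mojibake instead of the original text:
wrong = body.decode("iso-8859-1")
print(wrong)   # unreadable characters

# Decoding with the correct charset recovers the text:
right = body.decode("utf-8")
print(right)   # 高效获取网页内容
```

If you are unsure of the real charset, `response.apparent_encoding` asks requests to guess it from the body itself.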

c) Adding Headers (e.g., User-Agent)

Some websites block the default requests User-Agent (python-requests/x.y) because it identifies your request as a script. To make your request look like it's coming from a real browser, you can add custom headers.

import requests
url = 'https://httpbin.org/user-agent' # This URL tells you what User-Agent it sees
# A common Chrome User-Agent string
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
response = requests.get(url, headers=headers)
print(response.json()) # The server will echo back the User-Agent it received
# Output: {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}
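If you make many requests with the same headers, a requests.Session lets you set them once; a Session also reuses the underlying TCP connection across requests. A minimal sketch (the header values are examples):

```python
import requests

session = requests.Session()
# Headers set on the session are sent with every request it makes
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept-Language': 'en-US,en;q=0.9',
})

# Every call now carries those headers automatically, e.g.:
# response = session.get('https://httpbin.org/user-agent')
```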

Advanced: Streaming Large Responses

If you are downloading a very large text file (or a large file in general), you don't want to load it all into memory at once. You can "stream" the response, processing it chunk by chunk.

import requests
url = 'https://www.gutenberg.org/files/11/11-0.txt' # Alice in Wonderland (large text file)
try:
    # Use stream=True to download the content in chunks
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        line_count = 0
        # Iterate over the response content line by line.
        # iter_lines() yields bytes by default; decode_unicode=True
        # makes it yield str using the response's encoding.
        for line in response.iter_lines(decode_unicode=True):
            if line: # filter out keep-alive new lines
                # Process each line here
                line_count += 1
                # print(line) # Uncomment to see the lines
                # Stop after 100 lines for this example
                if line_count >= 100:
                    break
        print(f"\nSuccessfully read {line_count} lines.")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

Summary

| Task | Code Snippet |
| --- | --- |
| Basic GET request | `response = requests.get('https://example.com')` |
| Get text content | `text = response.text` |
| Check for success | `response.raise_for_status()` |
| Handle errors | wrap in `try...except requests.exceptions.RequestException` |
| Set a timeout | `requests.get(url, timeout=10)` |
| Set custom headers | `requests.get(url, headers={'User-Agent': '...'})` |
| Handle encoding | `response.encoding = 'utf-8'` |
| Stream large files | `with requests.get(url, stream=True) as r:` then `for chunk in r.iter_content(): ...` |
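Putting the pieces together, a small helper like this (an illustrative sketch, not a library API; the function name and parameters are our own) covers the common cases in one place: timeout, custom User-Agent, status check, and an explicit encoding override:

```python
import requests

def fetch_text(url, timeout=10, encoding=None, user_agent=None):
    """Fetch a URL and return its body as text, or None on failure."""
    headers = {'User-Agent': user_agent} if user_agent else {}
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        response.raise_for_status()
        if encoding:                      # override a mislabeled charset
            response.encoding = encoding
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None

# Example usage (uncomment to run against a live site):
# text = fetch_text('https://www.python.org', encoding='utf-8')
```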