杰瑞科技汇

python httplib read

This guide covers reading response data with Python's httplib module.


First, a very important point: httplib was renamed to http.client in Python 3. The API is largely the same, but the name changed to follow a more consistent naming convention. One visible difference is that read() returns bytes in Python 3, where Python 2 returned a str. The examples below use http.client; on Python 2 you can substitute httplib, although the f-strings in the examples would also need rewriting.
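If one script has to run on both interpreters, a guarded import is a common pattern. The sketch below is illustrative only; the alias http_client is a name chosen here, not something the standard library defines.

```python
# Import the module under a single alias regardless of Python version.
try:
    import http.client as http_client  # Python 3
except ImportError:
    import httplib as http_client  # Python 2 fallback

# The connection classes keep the same names under either module.
print(http_client.HTTPConnection.__name__)
print(http_client.HTTPSConnection.__name__)
```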

The core idea of reading data from an HTTP response involves a two-step process:

  1. Send a Request: You create a connection to a server and send an HTTP request (e.g., a GET request for a webpage).
  2. Read the Response: The server sends back an HTTP response. This response has two main parts:
    • Status line and headers: The status code, plus metadata about the response (e.g., content type, content length).
    • Body: The actual data you requested (e.g., the HTML of a page, JSON data, an image file).

The body arrives as a stream of bytes, which you read from the response object.
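The two steps can be sketched end to end without touching the internet. The example below assumes a throwaway server on 127.0.0.1 built with the standard library's http.server; HelloHandler and fetch are hypothetical names used only for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny local handler so the example does not depend on a remote host."""

    def do_GET(self):
        payload = b"hello from a local test server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def fetch(host, port, path):
    """Step 1: send a GET request. Step 2: read status, headers and body."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, dict(response.getheaders()), response.read()
    finally:
        conn.close()


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    status, headers, body = fetch("127.0.0.1", server.server_port, "/")
    print(status, headers["Content-Type"], body)
finally:
    server.shutdown()
    server.server_close()
```

The status and headers are available as soon as getresponse() returns; the body is only consumed when read() is called.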


Key Methods for Reading the Response Body

Once you have a response object from a call like conn.getresponse(), you can use several methods to read its body:

Method | Description | When to Use
response.read() | Reads the entire response body into memory and returns it as a single bytes object. | Simple, small responses like JSON or a short HTML page. Be careful with large files!
response.read(size) | Reads and returns up to the next size bytes from the response body. | For streaming large files (images, videos, large logs) to avoid loading everything into RAM at once.
response.readline() | Reads one line from the response body (up to and including the newline character \n). | Useful when the body is structured as lines of text, like in some streaming APIs or log files.
response.readlines() | Reads all remaining lines from the response body and returns them as a list of bytes objects. | Similar to read(), but for line-based data. Use with caution for large bodies.
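To see how these methods interact on one response, the sketch below parses a canned HTTP reply held in memory. It is a testing trick rather than normal usage: FakeSocket and open_response are hypothetical helpers, and begin() is the header-parsing step that conn.getresponse() normally performs for you.

```python
import http.client
import io

# A complete HTTP response: status line, headers, blank line, 23-byte body.
RAW = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Length: 23\r\n"
    b"\r\n"
    b"alpha\nbeta\ngamma\ndelta\n"
)


class FakeSocket:
    """Hypothetical stand-in exposing only makefile(), which HTTPResponse uses."""

    def __init__(self, payload):
        self._payload = payload

    def makefile(self, mode, *args, **kwargs):
        return io.BytesIO(self._payload)


def open_response():
    response = http.client.HTTPResponse(FakeSocket(RAW), method="GET")
    response.begin()  # parse the status line and headers
    return response


first = open_response()
chunk = first.read(3)           # b"alp": at most 3 bytes
line = first.readline()         # b"ha\n": the rest of the current line
rest_lines = first.readlines()  # the three remaining lines as a list
tail = first.read()             # b"": the body is already consumed

whole = open_response().read()  # the full 23-byte body in one call
print(chunk, line, rest_lines, tail, len(whole))
```

Each call continues where the previous one stopped, so the methods can be mixed, but a body can only be read once per response.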

Example 1: Simple GET Request (Reading HTML)

This is the most common scenario. We'll fetch the HTML from httpbin.org, a fantastic service for testing HTTP requests.

import http.client
import ssl
# The host and the path of the resource you want to access
host = 'httpbin.org'
path = '/html'
# --- Step 1: Create a connection and send a request ---
# The try/finally block ensures the connection is closed even if an error occurs
try:
    # For modern HTTPS connections (recommended)
    # context = ssl.create_default_context()
    # conn = http.client.HTTPSConnection(host, context=context)
    # For this specific example, httpbin.org supports plain HTTP too
    conn = http.client.HTTPConnection(host)
    # Send a GET request for the specified path
    conn.request("GET", path)
    # --- Step 2: Get the response from the server ---
    response = conn.getresponse()
    # Check if the request was successful (status code 200)
    if response.status == 200:
        print(f"Status: {response.status} {response.reason}")
        print("Headers:")
        for header, value in response.getheaders():
            print(f"  {header}: {value}")
        print("-" * 20)
        # --- Step 3: Read the response body ---
        # response.read() returns the entire body as bytes
        body_bytes = response.read()
        # Since this is HTML, decode it to a string.
        # The encoding is usually given by the charset parameter of the
        # 'Content-Type' header; fall back to UTF-8 if the header or
        # parameter is missing.
        content_type = response.getheader('Content-Type', '')
        if 'charset=' in content_type:
            encoding = content_type.split('charset=')[-1].split(';')[0].strip(' "')
        else:
            encoding = 'utf-8' # A common default
        body_str = body_bytes.decode(encoding, errors='replace')
        print("First 200 characters of the body:")
        print(body_str[:200])
    else:
        print(f"Error: {response.status} {response.reason}")
except http.client.HTTPException as e:
    print(f"HTTP Error: {e}")
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # --- Step 4: Close the connection ---
    if 'conn' in locals() and conn:
        conn.close()

Sample output (headers and body will vary):

Status: 200 OK
Headers:
  Date: Sat, 27 Sep 2025 10:30:00 GMT
  Content-Type: text/html; charset=utf-8
  Content-Length: 1365
  Connection: keep-alive
  Server: gunicorn/19.9.0
  Access-Control-Allow-Origin: *
  Access-Control-Allow-Credentials: true
--------------------
First 200 characters of the body:
<!DOCTYPE html>
<html>
<head><title>httpbin.org</title>
<link href="http://twitter.github.io/bootstrap/assets/css/bootstrap.css" rel="stylesheet"
type="text/css">
...
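The manual charset= split in Example 1 can also be delegated to the header object itself. With a real response, response.headers is an email.message.Message subclass, so get_content_charset() is available. The sketch below uses a hand-built Message in place of a live response; decode_body is a hypothetical helper name.

```python
from email.message import Message


def decode_body(body_bytes, headers, default="utf-8"):
    """Decode bytes using the charset from Content-Type, else a default."""
    charset = headers.get_content_charset() or default
    try:
        return body_bytes.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown encoding name or mislabeled bytes: fall back to the
        # default and substitute undecodable bytes instead of raising.
        return body_bytes.decode(default, errors="replace")


# Stand-in for response.headers (an assumption made for this illustration).
headers = Message()
headers["Content-Type"] = 'text/html; charset="ISO-8859-1"'

text = decode_body("caf\u00e9".encode("iso-8859-1"), headers)
no_header_text = decode_body(b"plain ascii", Message())
print(text, no_header_text)
```

With a live connection the call would look like decode_body(response.read(), response.headers).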

Example 2: Reading a Large File (Streaming with read(size))

Reading a large file (e.g., a 100MB image) with response.read() loads the whole body into memory at once, which can exhaust RAM on constrained systems. Instead, you should read it in chunks.

import http.client
import ssl
host = 'httpbin.org'
path = '/image/jpeg' # This endpoint returns a sample JPEG image
CHUNK_SIZE = 4096  # Read 4KB at a time
try:
    # Using HTTPS for a secure connection
    context = ssl.create_default_context()
    conn = http.client.HTTPSConnection(host, context=context)
    conn.request("GET", path)
    response = conn.getresponse()
    if response.status == 200:
        print(f"Downloading image from {host}{path}...")
        # Get the filename from the Content-Disposition header if available
        content_disposition = response.getheader('Content-Disposition')
        filename = 'downloaded_image.jpg'
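        if content_disposition and 'filename=' in content_disposition:
            name = content_disposition.split('filename=')[-1].split(';')[0].strip(' "')
            # Keep only the last path component so a server-supplied name
            # cannot point outside the current directory
            filename = name.replace('\\', '/').split('/')[-1] or filename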
        print(f"Saving to file: {filename}")
        # Open a file in binary write mode
        with open(filename, 'wb') as f:
            # Read the response in chunks
            while True:
                chunk = response.read(CHUNK_SIZE)
                if not chunk: # An empty chunk means the end of the stream
                    break
                f.write(chunk) # Write the chunk to the file
        print("Download complete!")
    else:
        print(f"Error: {response.status} {response.reason}")
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    if 'conn' in locals() and conn:
        conn.close()

This example is much more memory-efficient because it never holds the entire file in memory at once. It reads a small piece, writes it to disk, and repeats until the file is complete.
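The read-and-write loop can be factored into a small reusable function. The sketch below is one possible shape, with copy_in_chunks as a hypothetical name; it uses io.BytesIO objects in place of a real response and file so it runs offline. The standard library's shutil.copyfileobj performs the same kind of chunked copy.

```python
import io
from functools import partial


def copy_in_chunks(source, destination, chunk_size=4096):
    """Copy from any object with read(size) to any object with write(bytes).

    Returns the number of bytes copied. Only one chunk is held in memory
    at a time.
    """
    total = 0
    # iter(callable, sentinel) keeps calling read(chunk_size) until it
    # returns b"", which signals the end of the stream.
    for chunk in iter(partial(source.read, chunk_size), b""):
        destination.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-ins for an HTTPResponse and an open binary file.
source = io.BytesIO(b"x" * 10_000)
destination = io.BytesIO()
copied = copy_in_chunks(source, destination)
print(copied)
```

With a live download, source would be the response object and destination a file opened with open(filename, 'wb').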


Important Considerations

  1. Character Encodings: As seen in Example 1, response.read() returns bytes. You must decode it into a string (str) using the correct character encoding (e.g., utf-8, iso-8859-1). Check the Content-Type header for the charset parameter.
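  2. Connection Management: Always close your connection when you're done. Note that http.client.HTTPConnection is not a context manager, so it cannot be used directly in a with statement. Call conn.close() in a finally block, as the examples above do, or wrap the connection in contextlib.closing(). The response object returned by getresponse() does support the with statement.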
  3. HTTPS: For any real-world application, you should be using HTTPS. ssl.create_default_context() returns a context with certificate and hostname verification enabled, and HTTPSConnection builds a verifying default context on its own when none is passed.
  4. Modern Alternatives: While http.client is powerful and built-in, for most applications, the higher-level requests library is strongly recommended. It abstracts away many of these low-level details (connection management, encoding, chunked transfer) and provides a much more user-friendly API.

Example using the requests library (for comparison):

import requests
url = 'https://httpbin.org/html'
try:
    response = requests.get(url)
    response.raise_for_status()  # Raises an exception for bad status codes (4xx or 5xx)
    # `response.text` automatically handles decoding to a string
    print(f"Status: {response.status_code}")
    print("First 200 characters of the body:")
    print(response.text[:200])
    # For small binary payloads, `response.content` returns the bytes directly.
    # For large files, pass stream=True and write the body in chunks instead.
    image_url = 'https://httpbin.org/image/jpeg'
    img_response = requests.get(image_url, stream=True) # stream=True is key for large files
    img_response.raise_for_status()
    with open('requests_image.jpg', 'wb') as f:
        for chunk in img_response.iter_content(chunk_size=8192):
            f.write(chunk)
    print("Downloaded image using requests library.")
except requests.exceptions.RequestException as e:
    print(f"An error occurred with the requests library: {e}")
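For code that stays on http.client, the connection-management point above can be expressed with contextlib.closing, which calls close() on exit even when an exception is raised. This is a minimal sketch; get_and_close is a hypothetical helper name, and it accepts an already constructed connection so the caller decides between HTTP and HTTPS.

```python
import contextlib
import http.client


def get_and_close(conn, path):
    """Send a GET on conn, return (status, body bytes), always close conn."""
    # closing() turns any object with a close() method into a context manager.
    with contextlib.closing(conn):
        conn.request("GET", path)
        # The response object supports 'with' and is closed on exit.
        with conn.getresponse() as response:
            return response.status, response.read()


# Constructing a connection object does not open a socket; that happens
# on the first request. Example call (needs network access):
#   conn = http.client.HTTPSConnection("httpbin.org", timeout=10)
#   status, body = get_and_close(conn, "/html")
```

Passing a timeout when creating the connection is worth considering, since a request without one can block indefinitely on an unresponsive server.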