Of course! The requests library is the de facto standard for making HTTP requests in Python, and getting text from a URL is one of its most common uses.

Here’s a complete guide, from the basics to more advanced and practical examples.
The Basic Request: requests.get()
The core of getting text is the requests.get() function, which sends an HTTP GET request to a specified URL. The server responds, and requests gives you an object that contains the server's response.
To get the text content, you use the .text attribute on the response object.
Step 1: Install the requests library
If you don't have it installed, open your terminal or command prompt and run:

pip install requests
Step 2: Simple Example
This is the most basic way to fetch and print the text of a webpage.
```python
import requests

# The URL you want to get text from
url = 'https://www.python.org'

try:
    # Send a GET request to the URL
    response = requests.get(url)

    # This will raise an exception for bad status codes (4xx or 5xx)
    response.raise_for_status()

    # Get the text content from the response
    # The .text attribute returns the content as a string
    page_text = response.text

    # Print the first 500 characters of the text
    print(page_text[:500])

except requests.exceptions.RequestException as e:
    # Handle any errors that occur during the request
    print(f"An error occurred: {e}")
```
What's happening here?
- `import requests`: Imports the library.
- `requests.get(url)`: Sends the HTTP GET request. The server sends back a response, which is stored in the `response` object.
- `response.raise_for_status()`: A good practice. It checks whether the request was successful (status code 200-299). If not (e.g., 404 Not Found, 500 Server Error), it raises an `HTTPError`.
- `response.text`: This is the key part. It decodes the response body (which arrives as bytes) into a string, using the encoding specified in the response headers (e.g., `Content-Type: text/html; charset=utf-8`).
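The 4xx/5xx check that raise_for_status() performs can be sketched in a few lines. This is a simplified illustration of the idea, not the library's actual implementation:

```python
class HTTPError(Exception):
    """Stand-in for requests.exceptions.HTTPError."""

def raise_for_status(status_code):
    # requests raises HTTPError for client errors (4xx) and server errors (5xx);
    # 2xx and 3xx codes pass through silently
    if 400 <= status_code < 600:
        raise HTTPError(f"{status_code} Error")

raise_for_status(200)  # succeeds silently
try:
    raise_for_status(404)
except HTTPError as e:
    print(e)  # 404 Error
```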
Important Attributes of the Response Object
When you get a response, it's not just text. The Response object contains a lot of useful information.
```python
import requests

url = 'https://httpbin.org/get'  # A great testing URL
response = requests.get(url)

# --- Status Code ---
# Indicates whether the request was successful (e.g., 200), not found (404), etc.
print(f"Status Code: {response.status_code}")

# --- Headers ---
# The headers sent by the server.
print("\nServer Headers:")
print(response.headers)

# --- Request Headers ---
# The headers that your request sent.
# Note: requests adds its own defaults (like 'User-Agent') unless you override them.
print("\nRequest Headers (sent by us):")
print(response.request.headers)

# --- Encoding ---
# The encoding used to decode the response content.
# requests tries to guess this from the headers.
print(f"\nEncoding: {response.encoding}")

# --- Raw Content (in bytes) ---
# The raw content of the response, as bytes.
# Useful if you're dealing with non-text data or want to control the decoding yourself.
print(f"\nRaw Content (first 50 bytes): {response.content[:50]}")
```
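One detail worth knowing: `response.headers` is not a plain dict but a case-insensitive mapping, so header names can be looked up in any casing. A quick offline demonstration using the same class requests uses internally:

```python
from requests.structures import CaseInsensitiveDict

# response.headers is a CaseInsensitiveDict, so casing doesn't matter on lookup
headers = CaseInsensitiveDict({'Content-Type': 'text/html; charset=utf-8'})

print(headers['content-type'])       # text/html; charset=utf-8
print(headers['Content-Type'])       # text/html; charset=utf-8
print(headers.get('CONTENT-TYPE'))   # text/html; charset=utf-8
```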
Handling Real-World Complications
In a real application, you'll need to handle more than just a simple request.

a) Handling Errors
Networks are unreliable. The server might be down, the URL might be wrong, or you might lose your connection. Always wrap your requests in a try...except block.
```python
import requests
from requests.exceptions import RequestException, Timeout, HTTPError

url = 'https://this-domain-does-not-exist.com'

try:
    # Set a timeout (in seconds) that applies to connecting and to reading
    response = requests.get(url, timeout=5)

    # If the request was successful, raise_for_status() does nothing.
    # If not, it raises an HTTPError.
    response.raise_for_status()

    print("Success! The page loaded.")
    print(f"Text length: {len(response.text)}")

except HTTPError as http_err:
    print(f"HTTP error occurred: {http_err}")  # e.g., 404, 500
except Timeout as err:
    print(f"Request timed out: {err}")
except RequestException as err:
    # Catch-all for any other requests-related error
    print(f"An error occurred during the request: {err}")
```
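The order of those `except` clauses matters: `Timeout` and `HTTPError` are both subclasses of `RequestException`, so the catch-all must come last or it would swallow the more specific errors. You can verify the hierarchy directly:

```python
from requests.exceptions import RequestException, Timeout, HTTPError, ConnectionError

# Every requests error derives from RequestException,
# so catching it last acts as a safety net.
print(issubclass(Timeout, RequestException))          # True
print(issubclass(HTTPError, RequestException))        # True
print(issubclass(ConnectionError, RequestException))  # True
```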
b) Handling Different Encodings
Sometimes the server doesn't specify the encoding correctly, and response.text might look like gibberish. You can force requests to use a specific encoding.
```python
import requests
from requests.exceptions import RequestException

# Example URL; some servers don't declare their encoding correctly
url = 'https://www.nytimes.com/2025/10/27/us/politics/biden-polling.html'

try:
    response = requests.get(url)
    response.raise_for_status()

    # See what encoding requests guessed
    # (requests falls back to 'ISO-8859-1' when a text/* response omits the charset)
    print(f"Guessed Encoding: {response.encoding}")

    # If the text looks garbled, set the encoding manually.
    # 'utf-8' is a common and safe choice:
    response.encoding = 'utf-8'

    # Alternatively, let requests detect it from the content itself:
    # response.encoding = response.apparent_encoding

    # Now get the text with the correct encoding
    page_text = response.text
    print("\nDecoded text with UTF-8:")
    print(page_text[:500])

except RequestException as e:
    print(f"An error occurred: {e}")
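To see what that "gibberish" actually looks like, you can reproduce it with plain bytes: decoding UTF-8 bytes as ISO-8859-1 (the fallback requests uses when no charset is declared) produces mojibake.

```python
raw = 'café'.encode('utf-8')  # the bytes a server might send: b'caf\xc3\xa9'

# Misinterpreted: each UTF-8 byte becomes its own Latin-1 character
wrong = raw.decode('iso-8859-1')
# Decoded correctly
right = raw.decode('utf-8')

print(wrong)  # cafÃ©
print(right)  # café
```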
c) Adding Headers (e.g., User-Agent)
Some websites block the default requests User-Agent string (`python-requests/x.y.z`) because it identifies the client as a script. To make your request look like it's coming from a real browser, you can add custom headers.
```python
import requests

url = 'https://httpbin.org/user-agent'  # This endpoint echoes back the User-Agent it sees

# A common Chrome User-Agent string
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

response = requests.get(url, headers=headers)
print(response.json())  # The server echoes back the User-Agent it received
# Output: {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}
```
Advanced: Streaming Large Responses
If you are downloading a very large text file (or a large file in general), you don't want to load it all into memory at once. You can "stream" the response, processing it chunk by chunk.
```python
import requests
from requests.exceptions import RequestException

url = 'https://www.gutenberg.org/files/11/11-0.txt'  # Alice in Wonderland (large text file)

try:
    # Use stream=True to download the content in chunks
    with requests.get(url, stream=True) as response:
        response.raise_for_status()

        line_count = 0
        # Iterate over the response line by line;
        # decode_unicode=True decodes each line to str using the response encoding
        for line in response.iter_lines(decode_unicode=True):
            if line:  # filter out keep-alive newlines
                # Process each line here
                line_count += 1
                # print(line)  # Uncomment to see the lines

                # Stop after 100 lines for this example
                if line_count >= 100:
                    break

        print(f"\nSuccessfully read {line_count} lines.")

except RequestException as e:
    print(f"An error occurred: {e}")
```
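Under the hood, iter_lines() has to reassemble lines that are split across chunk boundaries. Here is a simplified sketch of that buffering logic (my own illustration, not the library's actual code), using an in-memory stream so it runs offline:

```python
import io

def iter_lines(stream, chunk_size=8):
    """Sketch of how a line iterator reassembles lines from fixed-size chunks."""
    pending = b''
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        lines = pending.split(b'\n')
        pending = lines.pop()  # the last piece may be an incomplete line
        for line in lines:
            yield line
    if pending:  # emit a final line that had no trailing newline
        yield pending

data = io.BytesIO(b"alpha\nbeta\ngamma-longer-line\n")
print([l.decode() for l in iter_lines(data)])  # ['alpha', 'beta', 'gamma-longer-line']
```

Even though `chunk_size` is smaller than some lines, complete lines come out because partial data is buffered in `pending` until its newline arrives.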
Summary
| Task | Code Snippet |
|---|---|
| Basic GET Request | response = requests.get('https://example.com') |
| Get Text Content | text = response.text |
| Check for Success | response.raise_for_status() |
| Handle Errors | Wrap in try...except requests.exceptions.RequestException |
| Set a Timeout | requests.get(url, timeout=10) |
| Set Custom Headers | requests.get(url, headers={'User-Agent': '...'}) |
| Handle Encoding | response.encoding = 'utf-8' |
| Stream Large Files | with requests.get(url, stream=True) as r: for chunk in r.iter_content(): ... |
