Of course! Working with S3 storage using Python is a very common task, especially for data science, machine learning, and web applications. The primary library for this is boto3, which is the official AWS SDK for Python.

This guide will walk you through everything you need to know, from setup to common operations.
Prerequisites: Setting Up Your Environment
Before you write any Python code, you need to ensure you have the necessary credentials and permissions.
a. AWS Credentials
The easiest way to provide credentials to boto3 is through environment variables. This is more secure than hardcoding them in your script.
Set these two environment variables in your terminal:

AWS_ACCESS_KEY_ID: Your AWS Access Key ID.
AWS_SECRET_ACCESS_KEY: Your AWS Secret Access Key.
On Linux/macOS:
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
On Windows (Command Prompt):
set AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
set AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
(Note: with the Command Prompt's set command, don't wrap the value in quotes; they would become part of the value.)
For Jupyter Notebooks / Google Colab: You can set them directly in a notebook cell (though for production, IAM Roles are preferred).
import os

os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_ACCESS_KEY"
os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_SECRET_KEY"
Security Best Practice: For production applications (like EC2 instances or Lambda functions), use IAM Roles instead of access keys. The SDK will automatically use the credentials assigned to the role.
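For local development, boto3 also reads credentials from the ~/.aws/credentials file. If that file defines more than one profile, you can pick one explicitly through a Session. A minimal sketch, assuming a profile named "dev" exists in your credentials file:

import boto3

# Create a session bound to a named profile from ~/.aws/credentials
session = boto3.Session(profile_name='dev')

# Resources and clients created from this session use that profile's credentials
s3 = session.resource('s3')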
b. Install the boto3 Library
If you don't have it installed, open your terminal or command prompt and run:
pip install boto3
Connecting to S3 with boto3
The first step in any script is to create a "session" and then a "resource" or "client" object.
boto3.resource: A high-level, object-oriented interface. It's often easier to use for common tasks like listing buckets or objects.
boto3.client: A low-level, service-oriented interface. It's more verbose but gives you more control and access to the full API. You use it when the resource interface doesn't have a method for what you need.
Example: Creating an S3 Resource
import boto3
# The resource will automatically find credentials from environment variables,
# IAM roles, or your ~/.aws/credentials file.
s3 = boto3.resource('s3')
# You can also specify a region
# s3 = boto3.resource('s3', region_name='us-east-1')
# To use a client instead:
# s3_client = boto3.client('s3')
Common S3 Operations with Python
Let's assume you have an S3 bucket named my-awesome-data-bucket-123.
a. List All Your Buckets
This is a great way to test your connection.
# Using the resource
s3 = boto3.resource('s3')
print("My S3 Buckets:")
for bucket in s3.buckets.all():
    print(f"- {bucket.name}")

# Using the client
s3_client = boto3.client('s3')
response = s3_client.list_buckets()

print("\nMy S3 Buckets (using client):")
for bucket in response['Buckets']:
    print(f"- {bucket['Name']}")
b. List Objects (Files) in a Bucket
To list the files in a specific bucket, you get the bucket object and then iterate through its objects.
bucket_name = 'my-awesome-data-bucket-123'
bucket = s3.Bucket(bucket_name)
print(f"Objects in '{bucket_name}':")
for obj in bucket.objects.all():
    # The key is the "path" to the file in the bucket
    print(f"- {obj.key} (Size: {obj.size} bytes)")
c. Upload a File to S3
This is one of the most frequent operations. You use the Bucket.upload_file() method.
import os
# Local file path
local_file_path = 'my_local_data.csv'
# S3 object name (the name you want it to have in the bucket)
s3_object_name = 'data/raw/my_data.csv'
# Upload the file
s3.Bucket(bucket_name).upload_file(local_file_path, s3_object_name)
print(f"Successfully uploaded {local_file_path} to s3://{bucket_name}/{s3_object_name}")
d. Download a File from S3
The reverse of uploading. Use Bucket.download_file().
# S3 object key to download
s3_object_key = 'data/raw/my_data.csv'
# Local file path to save the downloaded file
local_download_path = 'downloaded_data.csv'
# Download the file
s3.Bucket(bucket_name).download_file(s3_object_key, local_download_path)
print(f"Successfully downloaded s3://{bucket_name}/{s3_object_key} to {local_download_path}")
e. Delete a File from S3
You can delete a file by getting the object and calling its delete() method.
s3_object_to_delete = 'data/raw/my_data.csv'
# Get the object and delete it
s3.Object(bucket_name, s3_object_to_delete).delete()
print(f"Successfully deleted s3://{bucket_name}/{s3_object_to_delete}")
f. Delete an Entire Bucket
Warning: This is a destructive operation. The bucket must be empty before it can be deleted.
bucket_to_delete = 'my-old-empty-bucket-456'
# First, delete all objects in the bucket
bucket = s3.Bucket(bucket_to_delete)
bucket.objects.all().delete()
# Then, delete the bucket itself
s3.Bucket(bucket_to_delete).delete()
print(f"Successfully deleted bucket '{bucket_to_delete}'")
Advanced: Working with Large Files (Streaming)
Uploading or downloading very large files (e.g., >100MB) can be inefficient and memory-intensive if you load the entire file into memory. boto3 supports streaming, which is much better.
a. Streaming an Upload
You can open a file in binary read mode ('rb') and pass the file object directly. boto3 will read and upload it in chunks.
local_large_file = 'large_video.mp4'
s3_large_object_name = 'videos/large_video.mp4'
print("Starting large file upload...")
with open(local_large_file, 'rb') as data:
    s3.Bucket(bucket_name).upload_fileobj(data, s3_large_object_name)
print("Large file upload complete.")
b. Streaming a Download
Similarly, you can open a destination file in binary write mode ('wb') and pass the file object to download_fileobj.
s3_large_key = 'videos/large_video.mp4'
local_download_large_file = 'downloaded_large_video.mp4'
print("Starting large file download...")
with open(local_download_large_file, 'wb') as data:
    s3.Bucket(bucket_name).download_fileobj(s3_large_key, data)
print("Large file download complete.")
Putting It All Together: A Complete Example Script
Here is a script that demonstrates several of these operations.
import boto3
import os
# --- Configuration ---
# It's better to use environment variables for credentials in production
# For this example, we'll assume they are set.
BUCKET_NAME = 'my-awesome-data-bucket-123'
LOCAL_FILE_TO_UPLOAD = 'sample.txt'
S3_KEY_NAME = 'documents/sample.txt'
def main():
    """Demonstrates basic S3 operations with boto3."""
    print("Initializing S3 resource...")
    s3 = boto3.resource('s3')

    # 0. Create a small dummy file if it doesn't exist so the demo can run
    if not os.path.exists(LOCAL_FILE_TO_UPLOAD):
        with open(LOCAL_FILE_TO_UPLOAD, 'w') as f:
            f.write("Hello from S3 Python script!")
        print(f"Created a dummy file '{LOCAL_FILE_TO_UPLOAD}'.")

    # 1. Upload the file
    print(f"\nUploading '{LOCAL_FILE_TO_UPLOAD}' to 's3://{BUCKET_NAME}/{S3_KEY_NAME}'...")
    s3.Bucket(BUCKET_NAME).upload_file(LOCAL_FILE_TO_UPLOAD, S3_KEY_NAME)
    print("Upload successful.")

    # 2. List objects in the bucket
    print(f"\nListing objects in bucket '{BUCKET_NAME}':")
    try:
        bucket = s3.Bucket(BUCKET_NAME)
        for obj in bucket.objects.all():
            print(f" - {obj.key} (Size: {obj.size} bytes)")
    except Exception as e:
        print(f" Error listing bucket: {e}")

    # 3. Download the file
    print(f"\nDownloading 's3://{BUCKET_NAME}/{S3_KEY_NAME}' to 'downloaded_{LOCAL_FILE_TO_UPLOAD}'...")
    s3.Bucket(BUCKET_NAME).download_file(S3_KEY_NAME, f'downloaded_{LOCAL_FILE_TO_UPLOAD}')
    print("Download successful.")

    # 4. Delete the file from S3
    print(f"\nDeleting 's3://{BUCKET_NAME}/{S3_KEY_NAME}' from S3...")
    s3.Object(BUCKET_NAME, S3_KEY_NAME).delete()
    print("Delete successful.")

    # 5. Verify deletion by listing again
    print(f"\nVerifying deletion. Listing objects in bucket '{BUCKET_NAME}' again:")
    try:
        remaining = list(s3.Bucket(BUCKET_NAME).objects.all())
        if not remaining:
            print(" Bucket is empty.")
        else:
            for obj in remaining:
                print(f" - {obj.key}")
    except Exception as e:
        print(f" Error listing bucket: {e}")


if __name__ == "__main__":
    main()
Useful Resources
- boto3 Documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/index.html (the ultimate source of truth)
- AWS S3 Service Documentation: https://docs.aws.amazon.com/s3/ (explains the S3 service concepts)
- IAM Policy Generator: Use this to create the least-privilege IAM policy needed for your script. For example, a policy that only allows s3:GetObject, s3:PutObject, and s3:ListBucket on a specific bucket (a sketch follows below).
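For reference, such a least-privilege policy might look like the sketch below, written here as a Python dictionary; the bucket name is the example bucket from this guide and should be replaced with your own:

least_privilege_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Listing the bucket applies to the bucket ARN itself
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::my-awesome-data-bucket-123"
        },
        {
            # Reading and writing objects applies to the keys inside the bucket
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::my-awesome-data-bucket-123/*"
        }
    ]
}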

