
SAE Storage in Python: How to Operate and Manage It Efficiently?

Working with S3 storage from Python is a very common task, especially in data science, machine learning, and web applications. The primary library for this is boto3, the official AWS SDK for Python.


This guide will walk you through everything you need to know, from setup to common operations.


Prerequisites: Setting Up Your Environment

Before you write any Python code, you need to ensure you have the necessary credentials and permissions.

a. AWS Credentials

The easiest way to provide credentials to boto3 is through environment variables. This is more secure than hardcoding them in your script.

Set these two environment variables in your terminal:

  • AWS_ACCESS_KEY_ID: Your AWS Access Key ID.
  • AWS_SECRET_ACCESS_KEY: Your AWS Secret Access Key.

On Linux/macOS:

export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"

On Windows (Command Prompt):

set AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
set AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY

(Note: in Command Prompt, do not put quotes around the value; set would include them in the variable.)
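
On Windows (PowerShell), the equivalent assignments use the $env: syntax:

$env:AWS_ACCESS_KEY_ID = "YOUR_ACCESS_KEY"
$env:AWS_SECRET_ACCESS_KEY = "YOUR_SECRET_KEY"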

For Jupyter Notebooks / Google Colab: You can set them directly in a notebook cell (though for production, IAM Roles are preferred).

import os
os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_ACCESS_KEY"
os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_SECRET_KEY"

Security Best Practice: For production applications (like EC2 instances or Lambda functions), use IAM Roles instead of access keys. The SDK will automatically use the credentials assigned to the role.
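
For local development with more than one account, the shared credentials file (~/.aws/credentials) with named profiles is another common option that keeps keys out of your code. A minimal sketch, where the dev profile name is just an example:

import boto3

# Example ~/.aws/credentials entry (the "dev" profile name is illustrative):
# [dev]
# aws_access_key_id = YOUR_ACCESS_KEY
# aws_secret_access_key = YOUR_SECRET_KEY

session = boto3.Session(profile_name='dev')
s3 = session.resource('s3')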


b. Install the boto3 Library

If you don't have it installed, open your terminal or command prompt and run:

pip install boto3

Connecting to S3 with boto3

The first step in any script is to create a "resource" or "client" object (boto3 sets up a default session behind the scenes when you do).

  • boto3.resource: A high-level, object-oriented interface. It's often easier to use for common tasks like listing buckets or objects.
  • boto3.client: A low-level, service-oriented interface. It's more verbose but gives you more control and access to the full API. You use it when the resource interface doesn't have a method for what you need.

Example: Creating an S3 Resource

import boto3
# The resource will automatically find credentials from environment variables,
# IAM roles, or your ~/.aws/credentials file.
s3 = boto3.resource('s3')
# You can also specify a region
# s3 = boto3.resource('s3', region_name='us-east-1')
# To use a client instead:
# s3_client = boto3.client('s3')
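
As a concrete example of a client-only operation, generating a presigned download URL is exposed on the client rather than the resource. A minimal sketch, where the bucket and key names are placeholders:

import boto3

s3_client = boto3.client('s3')

# Generate a temporary (1-hour) download link for an object.
# 'my-awesome-data-bucket-123' and 'data/raw/my_data.csv' are example names.
url = s3_client.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-awesome-data-bucket-123', 'Key': 'data/raw/my_data.csv'},
    ExpiresIn=3600,
)
print(url)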

Common S3 Operations with Python

Let's assume you have an S3 bucket named my-awesome-data-bucket-123.

a. List All Your Buckets

This is a great way to test your connection.

# Using the resource
s3 = boto3.resource('s3')
print("My S3 Buckets:")
for bucket in s3.buckets.all():
    print(f"- {bucket.name}")
# Using the client
s3_client = boto3.client('s3')
response = s3_client.list_buckets()
print("\nMy S3 Buckets (using client):")
for bucket in response['Buckets']:
    print(f"- {bucket['Name']}")

b. List Objects (Files) in a Bucket

To list the files in a specific bucket, you get the bucket object and then iterate through its objects.

bucket_name = 'my-awesome-data-bucket-123'
bucket = s3.Bucket(bucket_name)
print(f"Objects in '{bucket_name}':")
for obj in bucket.objects.all():
    # The key is the "path" to the file in the bucket
    print(f"- {obj.key} (Size: {obj.size} bytes)")

c. Upload a File to S3

This is one of the most frequent operations. You use the Bucket.upload_file() method.

import os
# Local file path
local_file_path = 'my_local_data.csv'
# S3 object name (the name you want it to have in the bucket)
s3_object_name = 'data/raw/my_data.csv'
# Upload the file
s3.Bucket(bucket_name).upload_file(local_file_path, s3_object_name)
print(f"Successfully uploaded {local_file_path} to s3://{bucket_name}/{s3_object_name}")

d. Download a File from S3

The reverse of uploading. Use Bucket.download_file().

# S3 object key to download
s3_object_key = 'data/raw/my_data.csv'
# Local file path to save the downloaded file
local_download_path = 'downloaded_data.csv'
# Download the file
s3.Bucket(bucket_name).download_file(s3_object_key, local_download_path)
print(f"Successfully downloaded s3://{bucket_name}/{s3_object_key} to {local_download_path}")

e. Delete a File from S3

You can delete a file by getting the object and calling its delete() method.

s3_object_to_delete = 'data/raw/my_data.csv'
# Get the object and delete it
s3.Object(bucket_name, s3_object_to_delete).delete()
print(f"Successfully deleted s3://{bucket_name}/{s3_object_to_delete}")

f. Delete an Entire Bucket

Warning: This is a destructive operation. The bucket must be empty before it can be deleted.

bucket_to_delete = 'my-old-empty-bucket-456'
# First, delete all objects in the bucket
bucket = s3.Bucket(bucket_to_delete)
bucket.objects.all().delete()
# Then, delete the bucket itself
s3.Bucket(bucket_to_delete).delete()
print(f"Successfully deleted bucket '{bucket_to_delete}'")

Advanced: Working with Large Files (Streaming)

Loading a very large file (e.g., >100 MB) entirely into memory before transferring it is inefficient and memory-intensive. boto3's upload_fileobj and download_fileobj work with file-like objects and stream the data in chunks, which keeps memory usage low even for multi-gigabyte files.

a. Streaming an Upload

You can open a file in binary read mode ('rb') and pass the file object directly. boto3 will read and upload it in chunks.

local_large_file = 'large_video.mp4'
s3_large_object_name = 'videos/large_video.mp4'
print("Starting large file upload...")
with open(local_large_file, 'rb') as data:
    s3.Bucket(bucket_name).upload_fileobj(data, s3_large_object_name)
print("Large file upload complete.")

b. Streaming a Download

Similarly, you can open a destination file in binary write mode ('wb') and pass the file object to download_fileobj.

s3_large_key = 'videos/large_video.mp4'
local_download_large_file = 'downloaded_large_video.mp4'
print("Starting large file download...")
with open(local_download_large_file, 'wb') as data:
    s3.Bucket(bucket_name).download_fileobj(s3_large_key, data)
print("Large file download complete.")

Putting It All Together: A Complete Example Script

Here is a script that demonstrates several of these operations.

import boto3
import os
# --- Configuration ---
# It's better to use environment variables for credentials in production
# For this example, we'll assume they are set.
BUCKET_NAME = 'my-awesome-data-bucket-123'
LOCAL_FILE_TO_UPLOAD = 'sample.txt'
S3_KEY_NAME = 'documents/sample.txt'
def main():
    """Demonstrates basic S3 operations with boto3."""
    print("Initializing S3 resource...")
    s3 = boto3.resource('s3')
    # 1. Upload a file (create a small sample file first if it doesn't exist)
    if not os.path.exists(LOCAL_FILE_TO_UPLOAD):
        print(f"\nLocal file '{LOCAL_FILE_TO_UPLOAD}' not found. Creating a sample file for the demo.")
        with open(LOCAL_FILE_TO_UPLOAD, 'w') as f:
            f.write("Hello from S3 Python script!")
    print(f"\nUploading '{LOCAL_FILE_TO_UPLOAD}' to 's3://{BUCKET_NAME}/{S3_KEY_NAME}'...")
    s3.Bucket(BUCKET_NAME).upload_file(LOCAL_FILE_TO_UPLOAD, S3_KEY_NAME)
    print("Upload successful.")
    # 2. List objects in the bucket
    print(f"\nListing objects in bucket '{BUCKET_NAME}':")
    try:
        bucket = s3.Bucket(BUCKET_NAME)
        for obj in bucket.objects.all():
            print(f"  - {obj.key} (Size: {obj.size} bytes)")
    except Exception as e:
        print(f"  Error listing bucket: {e}")
    # 3. Download the file
    print(f"\nDownloading 's3://{BUCKET_NAME}/{S3_KEY_NAME}' to 'downloaded_{LOCAL_FILE_TO_UPLOAD}'...")
    s3.Bucket(BUCKET_NAME).download_file(S3_KEY_NAME, f'downloaded_{LOCAL_FILE_TO_UPLOAD}')
    print("Download successful.")
    # 4. Delete the file from S3
    print(f"\nDeleting 's3://{BUCKET_NAME}/{S3_KEY_NAME}' from S3...")
    s3.Object(BUCKET_NAME, S3_KEY_NAME).delete()
    print("Delete successful.")
    # 5. Verify deletion by listing again
    print(f"\nVerifying deletion. Listing objects in bucket '{BUCKET_NAME}' again:")
    try:
        bucket = s3.Bucket(BUCKET_NAME)
        if not list(bucket.objects.all()):
            print("  Bucket is empty.")
        else:
            for obj in bucket.objects.all():
                print(f"  - {obj.key}")
    except Exception as e:
        print(f"  Error listing bucket: {e}")
if __name__ == "__main__":
    main()

