
SAE Storage in Python: How to Operate and Manage It Efficiently?

Working with S3 storage from Python is a very common task, especially in data science, machine learning, and web applications. The primary library for this is boto3, the official AWS SDK for Python.


This guide will walk you through everything you need to know, from setup to common operations.


Prerequisites: Setting Up Your Environment

Before you write any Python code, you need to ensure you have the necessary credentials and permissions.

a. AWS Credentials

The easiest way to provide credentials to boto3 is through environment variables. This is more secure than hardcoding them in your script.

Set these two environment variables in your terminal:

  • AWS_ACCESS_KEY_ID: Your AWS Access Key ID.
  • AWS_SECRET_ACCESS_KEY: Your AWS Secret Access Key.

On Linux/macOS:

export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"

On Windows (Command Prompt):

set AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
set AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY

(Note: in Command Prompt, do not put quotes around the value; set would include them in the variable.)
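
On Windows (PowerShell), the equivalent assignments use the $env: syntax:

$env:AWS_ACCESS_KEY_ID = "YOUR_ACCESS_KEY"
$env:AWS_SECRET_ACCESS_KEY = "YOUR_SECRET_KEY"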

For Jupyter Notebooks / Google Colab: You can set them directly in a notebook cell (though for production, IAM Roles are preferred).

import os
os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_ACCESS_KEY"
os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_SECRET_KEY"

Security Best Practice: For production applications (like EC2 instances or Lambda functions), use IAM Roles instead of access keys. The SDK will automatically use the credentials assigned to the role.
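
For local development with more than one account, the shared credentials file (~/.aws/credentials) with named profiles is another common option that keeps keys out of your code. A minimal sketch, where the dev profile name is just an example:

import boto3

# Example ~/.aws/credentials entry (the "dev" profile name is illustrative):
# [dev]
# aws_access_key_id = YOUR_ACCESS_KEY
# aws_secret_access_key = YOUR_SECRET_KEY

session = boto3.Session(profile_name='dev')
s3 = session.resource('s3')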


b. Install the boto3 Library

If you don't have it installed, open your terminal or command prompt and run:

pip install boto3

Connecting to S3 with boto3

The first step in any script is to create a "resource" or "client" object (boto3 sets up a default session behind the scenes when you do).

  • boto3.resource: A high-level, object-oriented interface. It's often easier to use for common tasks like listing buckets or objects.
  • boto3.client: A low-level, service-oriented interface. It's more verbose but gives you more control and access to the full API. You use it when the resource interface doesn't have a method for what you need.

Example: Creating an S3 Resource

import boto3
# The resource will automatically find credentials from environment variables,
# IAM roles, or your ~/.aws/credentials file.
s3 = boto3.resource('s3')
# You can also specify a region
# s3 = boto3.resource('s3', region_name='us-east-1')
# To use a client instead:
# s3_client = boto3.client('s3')
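
As a concrete example of a client-only operation, generating a presigned download URL is exposed on the client rather than the resource. A minimal sketch, where the bucket and key names are placeholders:

import boto3

s3_client = boto3.client('s3')

# Generate a temporary (1-hour) download link for an object.
# 'my-awesome-data-bucket-123' and 'data/raw/my_data.csv' are example names.
url = s3_client.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-awesome-data-bucket-123', 'Key': 'data/raw/my_data.csv'},
    ExpiresIn=3600,
)
print(url)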

Common S3 Operations with Python

Let's assume you have an S3 bucket named my-awesome-data-bucket-123.

a. List All Your Buckets

This is a great way to test your connection.

# Using the resource
s3 = boto3.resource('s3')
print("My S3 Buckets:")
for bucket in s3.buckets.all():
    print(f"- {bucket.name}")
# Using the client
s3_client = boto3.client('s3')
response = s3_client.list_buckets()
print("\nMy S3 Buckets (using client):")
for bucket in response['Buckets']:
    print(f"- {bucket['Name']}")

b. List Objects (Files) in a Bucket

To list the files in a specific bucket, you get the bucket object and then iterate through its objects.

bucket_name = 'my-awesome-data-bucket-123'
bucket = s3.Bucket(bucket_name)
print(f"Objects in '{bucket_name}':")
for obj in bucket.objects.all():
    # The key is the "path" to the file in the bucket
    print(f"- {obj.key} (Size: {obj.size} bytes)")

c. Upload a File to S3

This is one of the most frequent operations. You use the Bucket.upload_file() method.

import os
# Local file path
local_file_path = 'my_local_data.csv'
# S3 object name (the name you want it to have in the bucket)
s3_object_name = 'data/raw/my_data.csv'
# Upload the file
s3.Bucket(bucket_name).upload_file(local_file_path, s3_object_name)
print(f"Successfully uploaded {local_file_path} to s3://{bucket_name}/{s3_object_name}")

d. Download a File from S3

The reverse of uploading. Use Bucket.download_file().

# S3 object key to download
s3_object_key = 'data/raw/my_data.csv'
# Local file path to save the downloaded file
local_download_path = 'downloaded_data.csv'
# Download the file
s3.Bucket(bucket_name).download_file(s3_object_key, local_download_path)
print(f"Successfully downloaded s3://{bucket_name}/{s3_object_key} to {local_download_path}")

e. Delete a File from S3

You can delete a file by getting the object and calling its delete() method.

s3_object_to_delete = 'data/raw/my_data.csv'
# Get the object and delete it
s3.Object(bucket_name, s3_object_to_delete).delete()
print(f"Successfully deleted s3://{bucket_name}/{s3_object_to_delete}")

f. Delete an Entire Bucket

Warning: This is a destructive operation. The bucket must be empty before it can be deleted.

bucket_to_delete = 'my-old-empty-bucket-456'
# First, delete all objects in the bucket
bucket = s3.Bucket(bucket_to_delete)
bucket.objects.all().delete()
# Then, delete the bucket itself
s3.Bucket(bucket_to_delete).delete()
print(f"Successfully deleted bucket '{bucket_to_delete}'")

Advanced: Working with Large Files (Streaming)

Loading a very large file (e.g., >100 MB) entirely into memory before transferring it is inefficient and memory-intensive. boto3's upload_fileobj and download_fileobj work with file-like objects and stream the data in chunks, which keeps memory usage low even for multi-gigabyte files.

a. Streaming an Upload

You can open a file in binary read mode ('rb') and pass the file object directly. boto3 will read and upload it in chunks.

local_large_file = 'large_video.mp4'
s3_large_object_name = 'videos/large_video.mp4'
print("Starting large file upload...")
with open(local_large_file, 'rb') as data:
    s3.Bucket(bucket_name).upload_fileobj(data, s3_large_object_name)
print("Large file upload complete.")

b. Streaming a Download

Similarly, you can open a destination file in binary write mode ('wb') and pass the file object to download_fileobj.

s3_large_key = 'videos/large_video.mp4'
local_download_large_file = 'downloaded_large_video.mp4'
print("Starting large file download...")
with open(local_download_large_file, 'wb') as data:
    s3.Bucket(bucket_name).download_fileobj(s3_large_key, data)
print("Large file download complete.")

Putting It All Together: A Complete Example Script

Here is a script that demonstrates several of these operations.

import boto3
import os
# --- Configuration ---
# It's better to use environment variables for credentials in production
# For this example, we'll assume they are set.
BUCKET_NAME = 'my-awesome-data-bucket-123'
LOCAL_FILE_TO_UPLOAD = 'sample.txt'
S3_KEY_NAME = 'documents/sample.txt'
def main():
    """Demonstrates basic S3 operations with boto3."""
    print("Initializing S3 resource...")
    s3 = boto3.resource('s3')
    # 1. Upload a file (create a small sample file first if it doesn't exist)
    if not os.path.exists(LOCAL_FILE_TO_UPLOAD):
        print(f"\nLocal file '{LOCAL_FILE_TO_UPLOAD}' not found. Creating a sample file for the demo.")
        with open(LOCAL_FILE_TO_UPLOAD, 'w') as f:
            f.write("Hello from S3 Python script!")
    print(f"\nUploading '{LOCAL_FILE_TO_UPLOAD}' to 's3://{BUCKET_NAME}/{S3_KEY_NAME}'...")
    s3.Bucket(BUCKET_NAME).upload_file(LOCAL_FILE_TO_UPLOAD, S3_KEY_NAME)
    print("Upload successful.")
    # 2. List objects in the bucket
    print(f"\nListing objects in bucket '{BUCKET_NAME}':")
    try:
        bucket = s3.Bucket(BUCKET_NAME)
        for obj in bucket.objects.all():
            print(f"  - {obj.key} (Size: {obj.size} bytes)")
    except Exception as e:
        print(f"  Error listing bucket: {e}")
    # 3. Download the file
    print(f"\nDownloading 's3://{BUCKET_NAME}/{S3_KEY_NAME}' to 'downloaded_{LOCAL_FILE_TO_UPLOAD}'...")
    s3.Bucket(BUCKET_NAME).download_file(S3_KEY_NAME, f'downloaded_{LOCAL_FILE_TO_UPLOAD}')
    print("Download successful.")
    # 4. Delete the file from S3
    print(f"\nDeleting 's3://{BUCKET_NAME}/{S3_KEY_NAME}' from S3...")
    s3.Object(BUCKET_NAME, S3_KEY_NAME).delete()
    print("Delete successful.")
    # 5. Verify deletion by listing again
    print(f"\nVerifying deletion. Listing objects in bucket '{BUCKET_NAME}' again:")
    try:
        bucket = s3.Bucket(BUCKET_NAME)
        if not list(bucket.objects.all()):
            print("  Bucket is empty.")
        else:
            for obj in bucket.objects.all():
                print(f"  - {obj.key}")
    except Exception as e:
        print(f"  Error listing bucket: {e}")
if __name__ == "__main__":
    main()

