杰瑞科技汇

How do you package files into a tar.gz archive with Python?

Creating a .tar.gz file is a common task in Python, and the standard way to do it is with the built-in tarfile module.

Here’s a complete guide, from the simplest case to more advanced examples.

The Short Answer (The Core Code)

This is the most direct way to create a .tar.gz archive from a single file or a directory.

import tarfile
import os
# Define the source file/directory and the output archive name
source = 'my_directory'  # or 'my_file.txt'
output_filename = 'archive.tar.gz'
# Create the archive
with tarfile.open(output_filename, "w:gz") as tar:
    tar.add(source, arcname=os.path.basename(source))
print(f"Archive '{output_filename}' created successfully!")

Detailed Explanation and Examples

Let's break down the code and explore more options.

Understanding the tarfile.open() Function

The key to this process is tarfile.open().

  • output_filename: The name of the archive file you want to create.
  • "w:gz": This is the mode, and it's the most important part.
    • w: Write mode. You are creating a new archive. If a file with this name already exists, it will be overwritten.
    • gz: This tells tarfile to compress the archive with gzip. A tar archive compressed with gzip is what a .tar.gz file is; the extension itself comes from the file name you choose.

Other common modes include:

  • "w:": Create an uncompressed .tar file.
  • "r:gz": Open an existing .tar.gz file for reading.
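  • "a": Append to an existing uncompressed .tar file (adds new files without recreating the whole archive). Appending to a compressed archive is not supported, so "a:gz" raises an error; to add files to a .tar.gz you have to recreate it.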
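
To see these modes side by side, here is a small sketch that works in a temporary directory. The file names (notes.txt, demo.tar.gz, demo.tar) and the helper demo_modes are made up for illustration. It writes a gzip-compressed archive, reads it back, and then checks the two append cases: "a" on an uncompressed archive, and "a:gz", which tarfile rejects.

```python
import os
import tarfile
import tempfile

def demo_modes(workdir):
    """Exercise a few tarfile modes inside workdir (hypothetical file names)."""
    src = os.path.join(workdir, "notes.txt")
    with open(src, "w") as f:
        f.write("hello\n")

    # "w:gz": create a new gzip-compressed archive.
    gz_path = os.path.join(workdir, "demo.tar.gz")
    with tarfile.open(gz_path, "w:gz") as tar:
        tar.add(src, arcname="notes.txt")

    # "r:gz": open the archive for reading and list its members.
    with tarfile.open(gz_path, "r:gz") as tar:
        names = tar.getnames()

    # "a:gz" is not supported: appending only works on uncompressed archives.
    try:
        tarfile.open(gz_path, "a:gz")
        append_gz_ok = True
    except ValueError:
        append_gz_ok = False

    # "a" on a plain .tar file appends to it.
    tar_path = os.path.join(workdir, "demo.tar")
    with tarfile.open(tar_path, "w") as tar:
        tar.add(src, arcname="first.txt")
    with tarfile.open(tar_path, "a") as tar:
        tar.add(src, arcname="second.txt")
    with tarfile.open(tar_path, "r") as tar:
        appended = tar.getnames()

    return names, append_gz_ok, appended

with tempfile.TemporaryDirectory() as tmp:
    names, append_gz_ok, appended = demo_modes(tmp)
    print(names, append_gz_ok, appended)
```

If you need to add files to an existing .tar.gz, the usual workaround is to write a new archive with "w:gz" and add everything again.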

Example 1: Compressing a Single File

Let's say you have a file named report.txt and you want to compress it.

import tarfile
import os
file_to_compress = 'report.txt'
archive_name = 'report.tar.gz'
# Check if the source file exists before creating the archive
if not os.path.exists(file_to_compress):
    print(f"Error: The file '{file_to_compress}' does not exist.")
else:
    with tarfile.open(archive_name, "w:gz") as tar:
        # Add the file to the archive.
        # The arcname argument is the name the file will have *inside* the archive.
        tar.add(file_to_compress, arcname='report.txt')
    print(f"'{file_to_compress}' has been compressed into '{archive_name}'.")

Example 2: Compressing a Directory (Most Common Use Case)

This is the most frequent scenario. You want to bundle an entire directory and all its contents (subdirectories, files, etc.) into one archive.

import tarfile
import os
directory_to_compress = 'my_project_folder'
archive_name = 'project_backup.tar.gz'
if not os.path.isdir(directory_to_compress):
    print(f"Error: The directory '{directory_to_compress}' does not exist.")
else:
    with tarfile.open(archive_name, "w:gz") as tar:
        # The `arcname` argument controls the name of the root folder inside the archive.
        # If you omit it, the path you passed in (minus any leading slash) is used for the entries.
        # It's often cleaner to just use the base name of the directory.
        tar.add(directory_to_compress, arcname=os.path.basename(directory_to_compress))
    print(f"Directory '{directory_to_compress}' has been compressed into '{archive_name}'.")
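
To make the effect of arcname concrete, the sketch below archives the same directory twice, once without arcname and once with the base name, and compares the member names. The directory layout and the helper member_names are assumptions made for this illustration, not part of the tarfile API.

```python
import os
import tarfile
import tempfile

def member_names(source, archive_path, arcname=None):
    """Archive source and return the member names stored in the archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source, arcname=arcname)
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: <tmp>/work/my_project_folder/main.py
    project = os.path.join(tmp, "work", "my_project_folder")
    os.makedirs(project)
    with open(os.path.join(project, "main.py"), "w") as f:
        f.write("print('hi')\n")

    archive = os.path.join(tmp, "out.tar.gz")
    without_arcname = member_names(project, archive)
    with_arcname = member_names(project, archive, arcname=os.path.basename(project))

    print(without_arcname)  # entries carry the whole temp path, minus the leading slash
    print(with_arcname)     # ['my_project_folder', 'my_project_folder/main.py']
```

The second form is usually what you want for a backup that someone else will unpack, because it extracts into a single, predictably named folder.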

What if you want to add multiple items? You can simply call tar.add() multiple times inside the with block.

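import tarfile
with tarfile.open('multi_content.tar.gz', "w:gz") as tar:
    tar.add('my_project_folder', arcname='project')    # stored under 'project/'
    tar.add('report.txt', arcname='docs/report.txt')   # stored inside a 'docs' folder
    tar.add('another_file.py')                         # stored under its own name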

Advanced Options

Excluding Specific Files or Directories

Sometimes you don't want to include everything, like temporary files (.DS_Store, __pycache__) or log files. You can do this by providing a filter function to tar.add().

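tar.add() calls this function with the TarInfo object of each member being added. Return None to exclude the member, or return the TarInfo object to include it. Excluding a directory also skips everything inside it.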

import tarfile
import os
def exclude_filter(tarinfo):
    """A filter function to exclude specific files and directories."""
    # Exclude all .pyc files and __pycache__ directories
    if tarinfo.name.endswith('.pyc') or '__pycache__' in tarinfo.name:
        return None  # Exclude this file
    return tarinfo # Include all other files
source = 'my_project_folder'
archive_name = 'project_filtered.tar.gz'
with tarfile.open(archive_name, "w:gz") as tar:
    tar.add(source, arcname=os.path.basename(source), filter=exclude_filter)
print(f"Filtered archive '{archive_name}' created.")
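
One caveat with a substring test such as '__pycache__' in tarinfo.name is that it also matches unrelated names that merely contain that text. A slightly stricter variant, sketched below under the assumption that you want to match whole path components, splits the member name on "/" first. The EXCLUDED_DIRS and EXCLUDED_SUFFIXES constants, the archive_filtered helper, and the sample files are all hypothetical.

```python
import os
import tarfile
import tempfile

EXCLUDED_DIRS = {"__pycache__"}   # assumed exclusion list for this sketch
EXCLUDED_SUFFIXES = (".pyc",)

def exclude_filter(tarinfo):
    """Return None to drop a member, or the TarInfo to keep it."""
    parts = tarinfo.name.split("/")  # member names use forward slashes
    if any(part in EXCLUDED_DIRS for part in parts):
        return None
    if tarinfo.name.endswith(EXCLUDED_SUFFIXES):
        return None
    return tarinfo

def archive_filtered(source, archive_path):
    """Archive source with the filter applied and return the stored names."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source, arcname=os.path.basename(source), filter=exclude_filter)
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())

with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, "my_project_folder")
    os.makedirs(os.path.join(project, "__pycache__"))
    sample_files = ("app.py", "stale.pyc", os.path.join("__pycache__", "app.cpython-312.pyc"))
    for rel in sample_files:
        with open(os.path.join(project, rel), "w") as f:
            f.write("")

    kept = archive_filtered(project, os.path.join(tmp, "filtered.tar.gz"))
    print(kept)  # ['my_project_folder', 'my_project_folder/app.py']
```

Note that the archive is written outside the directory being archived; writing it inside the source folder risks the archive trying to include itself.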

Setting File Permissions in the Archive

When you create an archive, each entry records the permission bits of the original file or directory. You can override them with the filter function as well, by modifying the tarinfo object before returning it.

import tarfile
import os
def set_permissions_filter(tarinfo):
    """Set specific permissions for files and directories."""
    # Make all directories traversable (e.g., so you can 'cd' into them)
    if tarinfo.isdir():
        tarinfo.mode = 0o755  # rwxr-xr-x
    # Make all regular files writable by the owner and readable by everyone
    elif tarinfo.isfile():
        tarinfo.mode = 0o644  # rw-r--r--
    return tarinfo
source = 'my_project_folder'
archive_name = 'project_permissions.tar.gz'
with tarfile.open(archive_name, "w:gz") as tar:
    tar.add(source, arcname=os.path.basename(source), filter=set_permissions_filter)
print(f"Archive with custom permissions '{archive_name}' created.")
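
To confirm that the filter really changes what is stored, you can read the modes back from the finished archive. The following sketch does that for a temporary project folder; the stored_modes helper and the run.sh file are invented for the example, and the file is deliberately given a different mode on disk.

```python
import os
import tarfile
import tempfile

def set_permissions_filter(tarinfo):
    """Give directories 0o755 and regular files 0o644."""
    if tarinfo.isdir():
        tarinfo.mode = 0o755
    elif tarinfo.isfile():
        tarinfo.mode = 0o644
    return tarinfo

def stored_modes(source, archive_path):
    """Archive source with the filter and return {member name: mode}."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source, arcname=os.path.basename(source), filter=set_permissions_filter)
    with tarfile.open(archive_path, "r:gz") as tar:
        return {member.name: member.mode for member in tar.getmembers()}

with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, "my_project_folder")
    os.makedirs(project)
    script = os.path.join(project, "run.sh")
    with open(script, "w") as f:
        f.write("echo hi\n")
    os.chmod(script, 0o700)  # the mode on disk differs from the one we store

    modes = stored_modes(project, os.path.join(tmp, "perms.tar.gz"))
    for name, mode in sorted(modes.items()):
        print(name, oct(mode))
```

The files on disk are left untouched; only the metadata written into the archive changes.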

Adding Files from a List

If you have a list of files you want to add, you can loop through them and add each one individually.

import tarfile
import os
files_to_add = ['file1.txt', 'file2.csv', 'image.png']
archive_name = 'selected_files.tar.gz'
# Create the archive
with tarfile.open(archive_name, "w:gz") as tar:
    for filename in files_to_add:
        if os.path.exists(filename):
            # Add the file, using its base name for the archive entry
            tar.add(filename, arcname=os.path.basename(filename))
        else:
            print(f"Warning: '{filename}' not found. Skipping.")
print(f"Archive '{archive_name}' created from a list of files.")
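
If you do this in more than one place, the loop can be wrapped in a small helper that reports what it skipped instead of printing. This is only a sketch: the function name archive_files and its return value are assumptions, not something tarfile provides.

```python
import os
import tarfile
import tempfile

def archive_files(paths, archive_path):
    """Add every existing path to a new .tar.gz; return (added, skipped)."""
    added, skipped = [], []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            if os.path.exists(path):
                # Store each file under its base name only.
                tar.add(path, arcname=os.path.basename(path))
                added.append(path)
            else:
                skipped.append(path)
    return added, skipped

with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "file1.txt")
    missing = os.path.join(tmp, "file2.csv")  # deliberately never created
    with open(present, "w") as f:
        f.write("data\n")

    archive = os.path.join(tmp, "selected_files.tar.gz")
    added, skipped = archive_files([present, missing], archive)
    print(len(added), "added;", len(skipped), "skipped")
```

Because only the base name is stored, two files with the same name from different folders would end up as duplicate entries; keep part of the path in arcname if that can happen.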

Summary: Key Takeaways

Task | Code Snippet
Basic Archive Creation | with tarfile.open("archive.tar.gz", "w:gz") as tar: tar.add('my_folder')
Add a Single File | tar.add('report.txt', arcname='report.txt')
Control Archive Root Name | tar.add('long/path/to/my_folder', arcname='my_folder')
Exclude Files (Filter) | tar.add('src', filter=lambda t: None if t.name.endswith('.log') else t)
Set File Permissions (Filter) | tar.add('src', filter=lambda t: setattr(t, 'mode', 0o755 if t.isdir() else 0o644) or t)
Add Multiple Files | Loop through a list and call tar.add() for each one.