Of course! Creating a .tar.gz file in Python is a common task, and the standard way to do it is with the built-in tarfile module.

Here’s a complete guide, from the simplest case to more advanced examples.
The Short Answer (The Core Code)
This is the most direct way to create a .tar.gz archive from a single file or a directory.
import tarfile
import os

# Define the source file/directory and the output archive name
source = 'my_directory'  # or 'my_file.txt'
output_filename = 'archive.tar.gz'

# Create the archive
with tarfile.open(output_filename, "w:gz") as tar:
    tar.add(source, arcname=os.path.basename(source))

print(f"Archive '{output_filename}' created successfully!")
Detailed Explanation and Examples
Let's break down the code and explore more options.
Understanding the tarfile.open() Function
The key to this process is tarfile.open().

- output_filename: The name of the archive file you want to create.
- "w:gz": The mode, and the most important part.
  - w: Write mode. You are creating a new archive. If a file with this name already exists, it is overwritten.
  - gz: Tells tarfile to compress the archive with gzip. This is what makes the result a .tar.gz file; the extension itself comes from the filename you choose.
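To make the "w:gz" behavior concrete, here is a minimal sketch that writes the same archive name twice and then inspects the result. The scratch directory and the file names (old.txt, new.txt) are made up for the demo.

```python
import os
import tarfile
import tempfile

# Hypothetical scratch files, created only for this demonstration.
workdir = tempfile.mkdtemp()
old_file = os.path.join(workdir, "old.txt")
new_file = os.path.join(workdir, "new.txt")
for path in (old_file, new_file):
    with open(path, "w") as f:
        f.write("demo\n")

archive = os.path.join(workdir, "archive.tar.gz")

# First write: the archive contains only old.txt.
with tarfile.open(archive, "w:gz") as tar:
    tar.add(old_file, arcname="old.txt")

# Second write to the same name: "w" replaces the archive, it does not append.
with tarfile.open(archive, "w:gz") as tar:
    tar.add(new_file, arcname="new.txt")

# Read the archive back and list its entries.
with tarfile.open(archive, "r:gz") as tar:
    names = tar.getnames()

# gzip data starts with the two magic bytes 0x1f 0x8b.
with open(archive, "rb") as f:
    magic = f.read(2)

print(names)
print(magic == b"\x1f\x8b")
```

After the second write only new.txt is left in the archive, which is why it is worth double-checking the output name before opening in write mode.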
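Other common modes include:
- "w:": Create an uncompressed .tar file.
- "r:gz": Open an existing .tar.gz file for reading.
- "a": Append to an existing uncompressed .tar file. tarfile cannot append to compressed archives, so "a:gz" raises an error; to add files to a .tar.gz you have to rewrite the whole archive.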
Example 1: Compressing a Single File
Let's say you have a file named report.txt and you want to compress it.
import os
import tarfile

file_to_compress = 'report.txt'
archive_name = 'report.tar.gz'

# Check if the source file exists before creating the archive
if not os.path.exists(file_to_compress):
    print(f"Error: The file '{file_to_compress}' does not exist.")
else:
    with tarfile.open(archive_name, "w:gz") as tar:
        # Add the file to the archive.
        # The arcname argument is the name it will have *inside* the archive.
        tar.add(file_to_compress, arcname='report.txt')
    print(f"'{file_to_compress}' has been compressed into '{archive_name}'.")
Example 2: Compressing a Directory (Most Common Use Case)
This is the most frequent scenario. You want to bundle an entire directory and all its contents (subdirectories, files, etc.) into one archive.
import tarfile
import os

directory_to_compress = 'my_project_folder'
archive_name = 'project_backup.tar.gz'

if not os.path.isdir(directory_to_compress):
    print(f"Error: The directory '{directory_to_compress}' does not exist.")
else:
    with tarfile.open(archive_name, "w:gz") as tar:
        # The `arcname` argument controls the name of the root folder inside the archive.
        # If you omit it, the path you passed in (minus any leading slash) is used instead.
        # It's often cleaner to just use the base name of the directory.
        tar.add(directory_to_compress, arcname=os.path.basename(directory_to_compress))
    print(f"Directory '{directory_to_compress}' has been compressed into '{archive_name}'.")
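If you want to confirm what arcname did, one option is to reopen the archive and list its entries. The sketch below builds a throwaway directory tree (the work/ parent, main.py, and data/values.csv are invented for the demo) and shows that only the base name ends up as the archive root.

```python
import os
import tarfile
import tempfile

# Hypothetical layout: <tmp>/work/my_project_folder/{main.py, data/values.csv}
base = tempfile.mkdtemp()
project = os.path.join(base, "work", "my_project_folder")
os.makedirs(os.path.join(project, "data"))
for rel in ("main.py", os.path.join("data", "values.csv")):
    with open(os.path.join(project, rel), "w") as f:
        f.write("demo\n")

archive = os.path.join(base, "project_backup.tar.gz")

# Store the tree under its base name instead of the full temporary path.
with tarfile.open(archive, "w:gz") as tar:
    tar.add(project, arcname=os.path.basename(project))

# Reopen the archive and collect the stored entry names.
with tarfile.open(archive, "r:gz") as tar:
    names = sorted(tar.getnames())

for name in names:
    print(name)
```

Every entry starts with my_project_folder, and none of the temporary parent directories leak into the archive.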
What if you want to add multiple items?
You can simply call tar.add() multiple times inside the with block.

import tarfile

with tarfile.open('multi_content.tar.gz', "w:gz") as tar:
    tar.add('my_project_folder', arcname='project')   # stored as project/...
    tar.add('report.txt', arcname='docs/report.txt')  # stored under docs/
    tar.add('another_file.py')                        # stored under its own name
Advanced Options
Excluding Specific Files or Directories
Sometimes you don't want to include everything, like temporary files (.DS_Store, __pycache__) or log files. You can do this by providing a filter function to tar.add().
This function is called for each file being added, and you can return None to exclude it.
import tarfile
import os

def exclude_filter(tarinfo):
    """A filter function to exclude specific files and directories."""
    # Exclude all .pyc files and __pycache__ directories
    if tarinfo.name.endswith('.pyc') or '__pycache__' in tarinfo.name:
        return None  # Exclude this file
    return tarinfo  # Include all other files

source = 'my_project_folder'
archive_name = 'project_filtered.tar.gz'

with tarfile.open(archive_name, "w:gz") as tar:
    tar.add(source, arcname=os.path.basename(source), filter=exclude_filter)

print(f"Filtered archive '{archive_name}' created.")
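Here is a self-contained sketch of the same idea that you can run to see the filter in action. It is a variant, not the only way to do it: the function name exclude_bytecode and the sample files are made up, and it matches __pycache__ as a whole path component rather than as a substring, so a file that merely has "__pycache__" inside its name would not be dropped by accident.

```python
import os
import tarfile
import tempfile

def exclude_bytecode(tarinfo):
    """Return None to skip .pyc files and __pycache__ directories."""
    parts = tarinfo.name.split("/")  # names inside a tar always use "/"
    if "__pycache__" in parts or tarinfo.name.endswith(".pyc"):
        return None
    return tarinfo

# Hypothetical project tree with one source file and some bytecode debris.
base = tempfile.mkdtemp()
project = os.path.join(base, "my_project_folder")
os.makedirs(os.path.join(project, "__pycache__"))
sample_files = (
    "app.py",
    "stale.pyc",
    os.path.join("__pycache__", "app.cpython-312.pyc"),
)
for rel in sample_files:
    with open(os.path.join(project, rel), "w") as f:
        f.write("demo\n")

archive = os.path.join(base, "project_filtered.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(project, arcname="my_project_folder", filter=exclude_bytecode)

# List what actually made it into the archive.
with tarfile.open(archive, "r:gz") as tar:
    kept = sorted(tar.getnames())

print(kept)
```

Returning None for a directory skips everything underneath it, so the __pycache__ contents are never visited.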
Setting File Permissions in the Archive
Each entry in the archive records the permission bits of the file it came from. You can override them with the same filter mechanism by modifying the tarinfo object before it is written.
import tarfile
import os

def set_permissions_filter(tarinfo):
    """Set specific permissions for files and directories."""
    # Directories need the execute bit so they can be entered and listed
    if tarinfo.isdir():
        tarinfo.mode = 0o755  # rwxr-xr-x
    # Regular files: read/write for the owner, read-only for everyone else
    elif tarinfo.isfile():
        tarinfo.mode = 0o644  # rw-r--r--
    return tarinfo

source = 'my_project_folder'
archive_name = 'project_permissions.tar.gz'

with tarfile.open(archive_name, "w:gz") as tar:
    tar.add(source, arcname=os.path.basename(source), filter=set_permissions_filter)

print(f"Archive with custom permissions '{archive_name}' created.")
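To check that the stored permissions really changed, you can read the mode of each member back from the archive. This is a rough sketch under a few assumptions: the filter name normalize_modes and the run.sh file are invented for the demo, and the file is deliberately given a different mode on disk first.

```python
import os
import tarfile
import tempfile

def normalize_modes(tarinfo):
    """Store directories as 0o755 and regular files as 0o644."""
    if tarinfo.isdir():
        tarinfo.mode = 0o755
    elif tarinfo.isfile():
        tarinfo.mode = 0o644
    return tarinfo

# Hypothetical project with one file whose on-disk mode differs from 0o644.
base = tempfile.mkdtemp()
project = os.path.join(base, "my_project_folder")
os.makedirs(project)
script = os.path.join(project, "run.sh")
with open(script, "w") as f:
    f.write("echo demo\n")
os.chmod(script, 0o700)

archive = os.path.join(base, "project_permissions.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(project, arcname="my_project_folder", filter=normalize_modes)

# Read the stored permission bits back from the archive headers.
with tarfile.open(archive, "r:gz") as tar:
    modes = {member.name: oct(member.mode) for member in tar.getmembers()}

print(modes)
```

The file on disk is left untouched; only the metadata written into the archive is changed.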
Adding Files from a List
If you have a list of files you want to add, you can loop through them and add each one individually.
import tarfile
import os

files_to_add = ['file1.txt', 'file2.csv', 'image.png']
archive_name = 'selected_files.tar.gz'

# Create the archive
with tarfile.open(archive_name, "w:gz") as tar:
    for filename in files_to_add:
        if os.path.exists(filename):
            # Add the file, using its base name for the archive entry
            tar.add(filename, arcname=os.path.basename(filename))
        else:
            print(f"Warning: '{filename}' not found. Skipping.")

print(f"Archive '{archive_name}' created from a list of files.")
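One thing to watch for when storing files by base name: two inputs from different directories can end up with the same name inside the archive, and tarfile will happily write both entries. The sketch below is one possible way to guard against that. The helper add_unique is hypothetical (it is not part of tarfile), and the sample paths are made up.

```python
import os
import tarfile
import tempfile

def add_unique(tar, paths):
    """Add existing files by base name; return (added, skipped) lists."""
    added, skipped, seen = [], [], set()
    for path in paths:
        name = os.path.basename(path)
        # Skip missing files and base names that are already in the archive.
        if not os.path.isfile(path) or name in seen:
            skipped.append(path)
            continue
        tar.add(path, arcname=name)
        seen.add(name)
        added.append(name)
    return added, skipped

# Hypothetical inputs: two files that share a base name, plus a missing one.
base = tempfile.mkdtemp()
paths = []
for sub in ("a", "b"):
    os.makedirs(os.path.join(base, sub))
    path = os.path.join(base, sub, "file1.txt")
    with open(path, "w") as f:
        f.write(sub)
    paths.append(path)
paths.append(os.path.join(base, "missing.csv"))

archive = os.path.join(base, "selected_files.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    added, skipped = add_unique(tar, paths)

print(added)
print(skipped)
```

If you need to keep both files, an alternative is to pass an arcname that preserves part of the original path instead of skipping the duplicate.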
Summary: Key Takeaways
| Task | Code Snippet |
|---|---|
| Basic Archive Creation | with tarfile.open("archive.tar.gz", "w:gz") as tar: tar.add('my_folder') |
| Add a Single File | tar.add('report.txt', arcname='report.txt') |
| Control Archive Root Name | tar.add('long/path/to/my_folder', arcname='my_folder') |
| Exclude Files (Filter) | tar.add('src', filter=lambda t: None if t.name.endswith('.log') else t) |
| Set File Permissions (Filter) | tar.add('src', filter=lambda t: setattr(t, 'mode', 0o755 if t.isdir() else 0o644) or t) |
| Add Multiple Files | Loop through a list and call tar.add() for each one. |
