Of course! Here is a comprehensive guide to using Tokyo Cabinet with Python.

What is Tokyo Cabinet?
Tokyo Cabinet (TC) is a lightweight, high-performance key-value store. Think of it as a very fast on-disk dictionary or hash map. It was written by Mikio Hirabayashi, who also wrote Tokyo Tyrant and QDBM.
Key characteristics:
- Speed: Extremely fast read and write operations, often faster than traditional SQL databases for simple key-value tasks.
- Formats: It supports four on-disk storage formats, each with different strengths:
  - Hash Table: Like a Python dictionary. Stores key-value pairs. Best for fast lookups by key.
  - B+ Tree: Stores key-value pairs sorted by key. Best for range queries over keys (e.g., with scores as keys, "find all entries with a score between 100 and 200").
  - Fixed-length: An array of fixed-length records addressed by a positive integer key. Good for storing logs or sensor data.
  - Table: Records made of named columns with no fixed schema, with optional column indexes and queries.
- Pure C: It's written in C, making it very fast and memory-efficient.
- Simple API: The API is straightforward, focusing on core database operations.
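The difference between the hash and B+ tree formats can be illustrated with a pure-Python analogy. This sketch does not use Tokyo Cabinet at all: a dict stands in for the hash table, and a sorted key list searched with bisect stands in for the B+ tree. The sample data and the key_range helper are made up for illustration.

```python
import bisect

# Hypothetical data: zero-padded scores as keys, user names as values.
records = {b"0850": b"carol", b"0950": b"dave", b"1100": b"bob", b"1200": b"alice"}

# Hash-style access: one exact key, no ordering involved.
exact = records[b"1100"]

# B+ tree-style access: keys are kept sorted, so a range is a contiguous slice.
sorted_keys = sorted(records)


def key_range(keys, low, high):
    """Return the keys k with low <= k <= high from a sorted list."""
    start = bisect.bisect_left(keys, low)
    stop = bisect.bisect_right(keys, high)
    return keys[start:stop]


in_range = key_range(sorted_keys, b"1000", b"1150")
print(exact)
print(in_range)
```

The point of the analogy is that a range lookup only works on whatever the store sorts by, which is the key.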
Why Use It with Python?
You'd typically use Tokyo Cabinet in Python when you need:
- Speed: Blazing-fast data access for an application bottleneck.
- Simplicity: A simple "no-SQL" database without the overhead of a full RDBMS.
- Embedded Database: A database that runs in the same process as your Python application, without a separate server.
- Large Datasets: The ability to handle datasets larger than your available RAM, as it manages memory and disk caching efficiently.
Step 1: Installation
You need to install both the Tokyo Cabinet C library and the Python bindings.

Install the Tokyo Cabinet C Library
This is the most crucial step. If you skip this, the Python bindings won't work.
On Debian/Ubuntu:
sudo apt-get update
sudo apt-get install libtokyocabinet-dev
On macOS (using Homebrew):
brew install tokyo-cabinet
On Windows: This is more complex. You'll need to download the source from the official Tokyo Cabinet site and compile it using MinGW or Cygwin. Alternatively, you can use a pre-compiled binary if you can find one.

Install the Python Bindings
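A commonly used Python binding for Tokyo Cabinet is pytc. It has not seen a release in years and was written against Python 2, so confirm that it builds on your interpreter. The examples below assume a build that runs on Python 3 and takes bytes for keys and values.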
pip install pytc
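After installing, a quick check that the binding can be found saves confusion later. This is a minimal sketch using only the standard library; binding_available is a hypothetical helper name, and it only tests that the module is discoverable, not that the C library loads correctly.

```python
import importlib.util


def binding_available(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


if binding_available("pytc"):
    print("pytc is importable")
else:
    print("pytc not found; check that the C library and the binding both installed")
```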
Step 2: Basic Usage (Hash Table Example)
Let's start with the most common format: the Hash Table. It's the most direct equivalent to a Python dictionary.
import pytc

# The filename for our database
db_file = 'my_database.tch'

# --- WRITING DATA (Creating a new database) ---
# pytc.HDB is the hash database class.
# pytc.HDBOWRITER opens the file for writing.
# pytc.HDBOCREAT creates the file if it doesn't exist.
# pytc.HDBOTRUNC empties an existing file, so every run starts fresh.
hdb = pytc.HDB()
try:
    # Open the database file
    hdb.open(db_file, pytc.HDBOWRITER | pytc.HDBOCREAT | pytc.HDBOTRUNC)
    # Put some key-value pairs into the database.
    # Keys and values are bytes, so numbers are stored as text.
    hdb.put(b'name', b'Alice')
    hdb.put(b'age', b'30')
    hdb.put(b'city', b'Tokyo')
    hdb.put(b'hobbies', b'reading, hiking, coding')
    print("Data written successfully.")
finally:
    # Always close the database!
    hdb.close()

# --- READING DATA ---
print("\n--- READING DATA ---")
hdb = pytc.HDB()
try:
    hdb.open(db_file, pytc.HDBOREADER)  # Open with read-only permissions
    # Get a value by key
    name = hdb.get(b'name')
    print(f"Name: {name.decode('utf-8')}")  # Decode bytes to string
    age = hdb.get(b'age')
    print(f"Age: {age.decode('utf-8')}")
    # Get a value that doesn't exist. pytc is expected to raise KeyError here;
    # the fallback also covers bindings that return None instead.
    try:
        country = hdb.get(b'country')
    except KeyError:
        country = None
    print(f"Country: {country}")  # This will print None
finally:
    hdb.close()

# --- UPDATING AND DELETING DATA ---
print("\n--- UPDATING AND DELETING ---")
hdb = pytc.HDB()
try:
    hdb.open(db_file, pytc.HDBOWRITER)
    # Update a value (just put a new value with the same key)
    hdb.put(b'age', b'31')
    print("Updated age.")
    # Delete a key-value pair
    hdb.out(b'city')
    print("Deleted city.")
finally:
    hdb.close()

# --- VERIFYING UPDATE AND DELETE ---
print("\n--- VERIFYING CHANGES ---")
hdb = pytc.HDB()
try:
    hdb.open(db_file, pytc.HDBOREADER)
    age = hdb.get(b'age')
    print(f"New Age: {age.decode('utf-8')}")
    try:
        city = hdb.get(b'city')
    except KeyError:
        city = None
    print(f"City: {city}")  # Should print None
finally:
    hdb.close()
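The repeated open/try/finally/close pattern above can be folded into a small context manager. This is a sketch, not part of pytc: opened is a hypothetical helper, and FakeDB is an in-memory stand-in that only exists so the example runs without the binding installed.

```python
from contextlib import contextmanager


@contextmanager
def opened(db, path, flags):
    """Open db at path with flags and always close it afterwards."""
    db.open(path, flags)
    try:
        yield db
    finally:
        db.close()


class FakeDB:
    """Hypothetical in-memory stand-in so the sketch runs without pytc."""

    def __init__(self):
        self.data = {}
        self.is_open = False

    def open(self, path, flags):
        self.is_open = True

    def close(self):
        self.is_open = False

    def put(self, key, value):
        self.data[key] = value


db = FakeDB()
with opened(db, "my_database.tch", 0) as handle:
    handle.put(b"name", b"Alice")
    was_open = handle.is_open
print(was_open, db.is_open)
```

With the real binding, pass a pytc.HDB() instance and the open flags in place of FakeDB() and 0. The database is opened before the try block on purpose, so a failed open does not trigger a close on a handle that was never opened.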
Step 3: Advanced Features
Tokyo Cabinet offers more than just simple get/put operations.
Iterating Over the Database
You can loop through all keys or all key-value pairs.
import pytc

db_file = 'my_database.tch'
hdb = pytc.HDB()
hdb.open(db_file, pytc.HDBOREADER)
print("\n--- ITERATING OVER KEYS ---")
# iterkeys() returns an iterator of all keys
for key in hdb.iterkeys():
    print(f"Key: {key.decode('utf-8')}, Value: {hdb.get(key).decode('utf-8')}")
hdb.close()
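Since keys and values come back as bytes, it can be convenient to decode them in one place while iterating. The sketch below assumes a database object with iterkeys() and get(), as used in the loop above; decoded_items is a hypothetical helper and FakeDB is an in-memory stand-in so the example runs without the binding.

```python
def decoded_items(db, encoding="utf-8"):
    """Yield (key, value) pairs as str from a database that stores bytes."""
    for key in db.iterkeys():
        yield key.decode(encoding), db.get(key).decode(encoding)


class FakeDB:
    """Hypothetical stand-in exposing only iterkeys() and get()."""

    def __init__(self, data):
        self._data = dict(data)

    def iterkeys(self):
        return iter(self._data)

    def get(self, key):
        return self._data[key]


db = FakeDB({b"name": b"Alice", b"age": b"31"})
snapshot = dict(decoded_items(db))
print(snapshot)
```

Building a dict from the generator loads everything into memory, so for a large database it is better to consume the pairs one at a time.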
Transactions
Tokyo Cabinet supports simple transactions to ensure a group of operations are atomic (all succeed or all fail).
import pytc

db_file = 'my_database.tch'
hdb = pytc.HDB()
hdb.open(db_file, pytc.HDBOWRITER)
print("\n--- TRANSACTION EXAMPLE ---")
try:
    # Begin a transaction. The method names here mirror the C API
    # (tchdbtranbegin, tchdbtranabort); check that your binding exposes them.
    hdb.tranbegin()
    # A series of operations
    hdb.put(b'user:1:balance', b'1000')
    hdb.put(b'user:2:balance', b'500')
    hdb.put(b'total_balance', b'1500')  # Suppose this turns out to be a mistake
    # Undo everything written since tranbegin()
    hdb.tranabort()
    print("Transaction aborted. No changes were made.")
except pytc.Error as e:
    print(f"An error occurred: {e}")
    hdb.tranabort()  # Ensure transaction is aborted on error
finally:
    hdb.close()

# Verify that no changes were made
hdb = pytc.HDB()
hdb.open(db_file, pytc.HDBOREADER)
try:
    total = hdb.get(b'total_balance')
except KeyError:
    total = None
print(f"Total balance after aborted transaction: {total}")  # Should be None
hdb.close()
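The commit-or-abort decision can also be wrapped in a context manager so that an exception always rolls the transaction back. This is a sketch under assumptions: it expects the database object to expose tranbegin(), trancommit(), and tranabort() (names modeled on the C API's tchdbtranbegin family; substitute whatever your binding calls them). FakeDB is a hypothetical stand-in that imitates rollback with a dict copy.

```python
from contextlib import contextmanager


@contextmanager
def transaction(db):
    """Commit if the block succeeds, abort if it raises."""
    db.tranbegin()
    try:
        yield db
    except BaseException:
        db.tranabort()
        raise
    else:
        db.trancommit()


class FakeDB:
    """Hypothetical stand-in that mimics commit/abort with a dict copy."""

    def __init__(self):
        self.data = {}
        self._backup = None

    def tranbegin(self):
        self._backup = dict(self.data)

    def trancommit(self):
        self._backup = None

    def tranabort(self):
        self.data = self._backup
        self._backup = None

    def put(self, key, value):
        self.data[key] = value


db = FakeDB()
with transaction(db):
    db.put(b"user:1:balance", b"1000")  # committed

try:
    with transaction(db):
        db.put(b"total_balance", b"1500")
        raise ValueError("totals do not add up")  # triggers the abort
except ValueError:
    pass

print(db.data)
```

The first block commits, the second is rolled back, so only the first write survives.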
Step 4: Using the B+ Tree Format
The B+ Tree format is ideal when you need your data to be sorted, allowing for efficient range queries.
The API is very similar, but you use pytc.BDB instead of pytc.HDB, and the open flags carry a BDBO prefix (for example pytc.BDBOWRITER).
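One caveat about "sorted": by default the B+ tree compares keys as byte strings, not as numbers, so 'user_10' sorts before 'user_2'. The pure-Python sketch below shows the effect and one common workaround, zero-padding the numeric part. padded_key and its width of 6 are illustrative choices, not part of any library.

```python
def padded_key(user_id, width=6):
    """Build a key whose byte order matches the numeric order of user_id."""
    return "user_{:0{w}d}".format(user_id, w=width).encode("ascii")


# Plain keys sort lexically, which puts 10 ahead of 2.
plain = sorted(b"user_%d" % n for n in (2, 10, 1))

# Zero-padded keys sort in numeric order.
padded = sorted(padded_key(n) for n in (2, 10, 1))

print(plain)
print(padded)
```

The keys in the example that follows only go up to a single digit, so the issue does not show up there.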
import pytc

db_file = 'user_scores.bdb'
bdb = pytc.BDB()
# Open a B+ Tree database
bdb.open(db_file, pytc.BDBOWRITER | pytc.BDBOCREAT | pytc.BDBOTRUNC)
# Insert data. The B+ Tree will keep it sorted by key.
bdb.put(b'user_3', b'850')
bdb.put(b'user_1', b'1200')
bdb.put(b'user_4', b'950')
bdb.put(b'user_2', b'1100')
print("\n--- B+ TREE DATA (sorted by key) ---")
# Keys are automatically sorted
for key in bdb.iterkeys():
    print(f"User: {key.decode('utf-8')}, Score: {bdb.get(key).decode('utf-8')}")
print("\n--- RANGE QUERY (scores between 1000 and 1150) ---")
# The B+ Tree orders records by key, so it can only answer range queries on keys.
# fwmkeys finds keys that start with a prefix, which is not a range query on values.
# Scores are stored as values here, so we walk the records with a cursor
# and check each one.
print("Using cursor for true range query:")
# curnew() creates a cursor (name assumed to mirror the C API's tcbdbcurnew).
cur = bdb.curnew()
# A new cursor has no position until it is moved,
# so jump to the starting point first.
cur.jump(b'user_1')  # Jump to the first key that is >= 'user_1'
found = False 