杰瑞科技汇

How Can Tokyo Cabinet Be Used Efficiently with Python?

This guide walks through using Tokyo Cabinet with Python, from installation to hash and B+ tree databases.


What is Tokyo Cabinet?

Tokyo Cabinet (TC) is a lightweight, high-performance, key-value store. Think of it as a super-fast, on-disk dictionary or hash map. It was created by the same author as Tokyo Tyrant and QDBM.

Key characteristics:

  • Speed: Extremely fast read and write operations, often faster than traditional SQL databases for simple key-value tasks.
  • Formats: It supports several data storage formats on disk, each with different strengths:
    • Hash Table: Like a Python dictionary. Stores key-value pairs. Best for fast lookups by key.
    • B+ Tree: Stores key-value pairs in sorted order. Best for range queries (e.g., "find all users with a score between 100 and 200").
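    • Fixed-length: An array of fixed-length records addressed by integer ID. Good for storing logs or sensor data.
    • Table: Stores each record as a set of named columns, with optional indexes for queries on column values.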
  • Pure C: It's written in C, making it very fast and memory-efficient.
  • Simple API: The API is straightforward, focusing on core database operations.
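
Each format is conventionally stored under its own file suffix, and Tokyo Cabinet's abstract database API picks the format from that suffix. The sketch below maps suffixes to formats; the helper name `guess_format` is hypothetical and only illustrates the convention. When you open a file through a format-specific class, the suffix is not enforced.

```python
import os

# Suffix conventions used by Tokyo Cabinet's abstract database API.
TC_SUFFIXES = {
    ".tch": "hash",
    ".tcb": "b+tree",
    ".tcf": "fixed-length",
    ".tct": "table",
}

def guess_format(path):
    """Return the storage format implied by a file suffix, or None if unknown."""
    _, ext = os.path.splitext(path)
    return TC_SUFFIXES.get(ext.lower())

print(guess_format("my_database.tch"))
```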

Why Use It with Python?

You'd typically use Tokyo Cabinet in Python when you need:

  • Speed: Blazing-fast data access for an application bottleneck.
  • Simplicity: A simple "no-SQL" database without the overhead of a full RDBMS.
  • Embedded Database: A database that runs in the same process as your Python application, without a separate server.
  • Large Datasets: The ability to handle datasets larger than your available RAM, as it manages memory and disk caching efficiently.

Step 1: Installation

You need to install both the Tokyo Cabinet C library and the Python bindings.


Install the Tokyo Cabinet C Library

This is the most crucial step. If you skip this, the Python bindings won't work.

On Debian/Ubuntu:

sudo apt-get update
sudo apt-get install libtokyocabinet-dev

On macOS (using Homebrew):

brew install tokyo-cabinet

On Windows: This is more complex. You'll need to download the source from the official Tokyo Cabinet site and compile it using MinGW or Cygwin. Alternatively, you can use a pre-compiled binary if you can find one.


Install the Python Bindings

A commonly used Python binding for Tokyo Cabinet is pytc. It is an older package, so check that it builds against your Python version before relying on it.

pip install pytc
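
Before going further, it can help to confirm that both pieces are visible to Python. The following is a minimal sketch using only the standard library; the function name `check_tokyo_cabinet` is hypothetical, and a `False` result only means the component was not found on the default search paths.

```python
import importlib.util
from ctypes.util import find_library

def check_tokyo_cabinet():
    """Report whether the C library and the pytc module can be located."""
    return {
        # Looks for a shared library named libtokyocabinet
        "c_library": find_library("tokyocabinet") is not None,
        # Looks for an importable top-level module named pytc
        "pytc": importlib.util.find_spec("pytc") is not None,
    }

print(check_tokyo_cabinet())
```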

Step 2: Basic Usage (Hash Table Example)

Let's start with the most common format: the Hash Table. It's the most direct equivalent to a Python dictionary.

import pytc
# The filename for our database
db_file = 'my_database.tch'
def get_or_none(db, key):
    # Depending on the binding version, a missing key either returns None
    # or raises KeyError. This helper normalizes both cases to None.
    try:
        return db.get(key)
    except KeyError:
        return None
# --- WRITING DATA (Creating a new database) ---
# pytc.HDB() creates a hash database object.
# pytc.HDBOWRITER opens the file with write permissions.
# pytc.HDBOCREAT creates the file if it doesn't exist.
# pytc.HDBOTRUNC empties an existing file so the example starts clean.
hdb = pytc.HDB()
try:
    # Open the database file
    hdb.open(db_file, pytc.HDBOWRITER | pytc.HDBOCREAT | pytc.HDBOTRUNC)
    # Put some key-value pairs into the database. Keys and values are raw bytes.
    hdb.put(b'name', b'Alice')
    hdb.put(b'age', b'30') # Numbers must be serialized, here as ASCII digits
    hdb.put(b'city', b'Tokyo')
    hdb.put(b'hobbies', b'reading, hiking, coding')
    print("Data written successfully.")
finally:
    # Always close the database!
    hdb.close()
# --- READING DATA ---
print("\n--- READING DATA ---")
hdb = pytc.HDB()
try:
    hdb.open(db_file, pytc.HDBOREADER) # Open with read-only permissions
    # Get a value by key
    name = hdb.get(b'name')
    print(f"Name: {name.decode('utf-8')}") # Decode bytes to string
    age = hdb.get(b'age')
    print(f"Age: {age.decode('utf-8')}")
    # Get a value that doesn't exist
    country = get_or_none(hdb, b'country')
    print(f"Country: {country}") # This will print None
finally:
    hdb.close()
# --- UPDATING AND DELETING DATA ---
print("\n--- UPDATING AND DELETING ---")
hdb = pytc.HDB()
try:
    hdb.open(db_file, pytc.HDBOWRITER)
    # Update a value (just put a new value with the same key)
    hdb.put(b'age', b'31')
    print("Updated age.")
    # Delete a key-value pair
    hdb.out(b'city')
    print("Deleted city.")
finally:
    hdb.close()
# --- VERIFYING UPDATE AND DELETE ---
print("\n--- VERIFYING CHANGES ---")
hdb = pytc.HDB()
try:
    hdb.open(db_file, pytc.HDBOREADER)
    age = hdb.get(b'age')
    print(f"New Age: {age.decode('utf-8')}")
    city = get_or_none(hdb, b'city')
    print(f"City: {city}") # Should print None
finally:
    hdb.close()
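
The open/try/finally/close pattern repeats in every section of the example above, so it may be worth wrapping once. The sketch below is one way to do that with `contextlib`; the helper name `opened` is hypothetical, and it assumes only that the database object exposes `open(path, flags)` and `close()` as used above.

```python
from contextlib import contextmanager

@contextmanager
def opened(db, path, flags):
    """Open db at path, yield it, and always close it afterwards."""
    db.open(path, flags)  # If open() itself fails, there is nothing to close
    try:
        yield db
    finally:
        db.close()

# Usage sketch, with `flags` standing in for the open-mode constants:
# with opened(pytc.HDB(), 'my_database.tch', flags) as hdb:
#     print(hdb.get(b'name'))
```

The database is closed even when the body of the `with` block raises, which is the same guarantee the explicit `finally` clauses provide.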

Step 3: Advanced Features

Tokyo Cabinet offers more than just simple get/put operations.

Iterating Over the Database

You can loop through all keys or all key-value pairs.

import pytc
db_file = 'my_database.tch'
hdb = pytc.HDB()
hdb.open(db_file, pytc.HDBOREADER)
print("\n--- ITERATING OVER KEYS ---")
# iterkeys() returns an iterator of all keys
for key in hdb.iterkeys():
    print(f"Key: {key.decode('utf-8')}, Value: {hdb.get(key).decode('utf-8')}")
hdb.close()
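
Because keys and values come back as bytes, the decoding step can be factored into a small generator. This is a sketch under the assumption that the database object offers `iterkeys()` and `get()` as in the loop above; the name `iter_decoded` is hypothetical.

```python
def iter_decoded(db, encoding="utf-8"):
    """Yield (key, value) pairs as str from a db exposing iterkeys() and get()."""
    for key in db.iterkeys():
        yield key.decode(encoding), db.get(key).decode(encoding)

# Usage sketch:
# for key, value in iter_decoded(hdb):
#     print(f"Key: {key}, Value: {value}")
```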

Transactions

Tokyo Cabinet supports simple transactions to ensure a group of operations are atomic (all succeed or all fail).

import pytc
db_file = 'my_database.tch'
hdb = pytc.HDB()
hdb.open(db_file, pytc.HDBOWRITER)
print("\n--- TRANSACTION EXAMPLE ---")
try:
    # Begin a transaction. tranbegin/tranabort follow the C API names
    # (tchdbtranbegin, tchdbtranabort); confirm them against your pytc version.
    hdb.tranbegin()
    # A series of operations
    hdb.put(b'user:1:balance', b'1000')
    hdb.put(b'user:2:balance', b'500')
    hdb.put(b'total_balance', b'1500') # Suppose this turns out to be a mistake
    # Undo everything written since tranbegin()
    hdb.tranabort()
    print("Transaction aborted. No changes were made.")
except pytc.Error as e:
    # Closing the database in the finally block discards any open transaction
    print(f"An error occurred: {e}")
finally:
    hdb.close()
# Verify that no changes were made
hdb = pytc.HDB()
hdb.open(db_file, pytc.HDBOREADER)
try:
    total = hdb.get(b'total_balance')
except KeyError: # Some binding versions raise KeyError for a missing key
    total = None
print(f"Total balance after aborted transaction: {total}") # Should be None
hdb.close()
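
The commit-or-abort decision can also be tied to whether a block of code raises. The sketch below is deliberately independent of any binding: it takes the begin, commit, and abort operations as callables, so you would pass whichever bound methods your database object exposes. The name `transaction` is hypothetical.

```python
from contextlib import contextmanager

@contextmanager
def transaction(begin, commit, abort):
    """Run a block inside a transaction: commit on success, abort on error."""
    begin()
    try:
        yield
    except BaseException:
        abort()   # Roll back, then let the original error propagate
        raise
    else:
        commit()  # Reached only when the block finished without raising
```

With this shape, an exception anywhere inside the `with` block rolls the whole group of writes back, which matches the all-or-nothing behavior described above.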

Step 4: Using the B+ Tree Format

The B+ Tree format is ideal when you need your data to be sorted, allowing for efficient range queries.

The API is very similar, but you use pytc.BDB instead of pytc.HDB and specify the pytc.BDBOWRITER flag.

import pytc
db_file = 'user_scores.bdb'
bdb = pytc.BDB()
# Open a B+ Tree database, creating it if needed and emptying old contents
bdb.open(db_file, pytc.BDBOWRITER | pytc.BDBOCREAT | pytc.BDBOTRUNC)
# Insert data as bytes. The B+ Tree will keep it sorted by key.
bdb.put(b'user_3', b'850')
bdb.put(b'user_1', b'1200')
bdb.put(b'user_4', b'950')
bdb.put(b'user_2', b'1100')
print("\n--- B+ TREE DATA (sorted by key) ---")
# Keys are automatically sorted
for key in bdb.iterkeys():
    print(f"User: {key.decode('utf-8')}, Score: {bdb.get(key).decode('utf-8')}")
print("\n--- RANGE QUERY (keys from 'user_1' onward) ---")
# The tree is ordered by key, so a range query selects a span of keys.
# Filtering by score (the value) would mean scanning every record
# or storing the score as part of the key.
# fwmkeys finds keys that start with a prefix, which is not a true range query.
# A true range query uses the cursor: jump to the lower bound, then step forward.
print("Using cursor for a key range query:")
# curnew() follows the C API name tcbdbcurnew; confirm it against your pytc version.
cur = bdb.curnew()
# Position the cursor at the first key that is >= 'user_1'
cur.jump(b'user_1')
found = False
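
The cursor pattern started above (jump to the first key at or after a lower bound, then step forward until a key passes the upper bound) can be imitated in plain Python to see what it returns. The sketch below is a stand-in built on `bisect` over an in-memory dict, not pytc code; the function name `key_range` is hypothetical.

```python
from bisect import bisect_left, bisect_right

def key_range(records, low, high):
    """Return (key, value) pairs with low <= key <= high, in key order."""
    keys = sorted(records)             # A B+ tree keeps keys in this order already
    start = bisect_left(keys, low)     # Equivalent of jumping to the first key >= low
    stop = bisect_right(keys, high)    # First position whose key is > high
    return [(k, records[k]) for k in keys[start:stop]]

scores = {
    b"user_3": b"850",
    b"user_1": b"1200",
    b"user_4": b"950",
    b"user_2": b"1100",
}
print(key_range(scores, b"user_2", b"user_3"))
```

Note that the range is over keys. Selecting by score would need the score encoded into the key, for example as a zero-padded prefix, so that byte order matches numeric order.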