Of course! Using HBase counters with Python is a common requirement for applications that need to maintain atomic, incrementing values, like tracking page views, user points, or inventory levels.

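The core concept is HBase's atomic increment operation. HBase has no special counter column type: any cell can act as a counter as long as it is empty or already holds an 8-byte long value, and HBase guarantees that increments on it are atomic and safe under concurrent access.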
Here’s a guide covering the concept, the HBase shell commands, and how to implement counters in Python with two libraries: happybase (which talks to the HBase Thrift server) and pyhbase (used here against the HBase REST server).
The Concept: HBase Counters
Instead of reading a value, modifying it in your client, and writing it back (a "read-modify-write" sequence), you ask HBase to perform the increment on the server.
- Atomicity: The increment operation is a single, atomic action. You don't have to worry about race conditions where two processes try to increment the same value at the same time.
- Efficiency: It's a single RPC call to the RegionServer, instead of a separate get followed by a put.
- Data Type: A counter cell stores a 64-bit signed integer (a Java long), encoded as 8 big-endian bytes.
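To make the atomicity point concrete, here is a small pure-Python sketch. It uses a hypothetical in-memory dict, not HBase, and replays an unlucky interleaving of two clients by hand: with read-modify-write one update is lost, while an increment that runs as one locked step (standing in for the server-side increment) keeps both.

```python
import threading

# Hypothetical in-memory "table" with one counter cell; no HBase involved.
store = {"page_views": 0}
_lock = threading.Lock()

def read(key):
    return store[key]

def write(key, value):
    store[key] = value

def atomic_increment(key, amount=1):
    # Stand-in for HBase's server-side increment: the read and the write
    # happen under one lock, so no other client can slip in between.
    with _lock:
        store[key] += amount
        return store[key]

# Read-modify-write by two clients, interleaved by hand:
a = read("page_views")      # client A reads 0
b = read("page_views")      # client B also reads 0
write("page_views", a + 1)  # client A writes 1
write("page_views", b + 1)  # client B writes 1, overwriting A's update
lost_update_result = store["page_views"]

# The same two clients, each issuing one atomic increment:
store["page_views"] = 0
atomic_increment("page_views")
atomic_increment("page_views")
atomic_result = store["page_views"]

print(lost_update_result, atomic_result)  # prints: 1 2
```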
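Key Rule: There is no special COUNTER column family type; counters can live in any column family. What matters is the cell itself: an increment only works on a cell that does not exist yet (it starts from 0) or that already holds an 8-byte long. If the cell holds anything else, such as the string "100" written with a normal put, the increment fails with an error.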
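As a sketch of that cell format, assuming the standard encoding HBase uses for longs (8 bytes, big-endian, signed, as produced by Java's Bytes.toBytes(long)), Python's struct module can encode and decode counter cells. The helper names encode_counter and decode_counter are hypothetical.

```python
import struct

def encode_counter(value: int) -> bytes:
    """Encode an int the way HBase stores a counter: 8-byte big-endian signed long."""
    return struct.pack(">q", value)

def decode_counter(cell: bytes) -> int:
    """Decode a counter cell, rejecting values that are not exactly 8 bytes wide."""
    if len(cell) != 8:
        raise ValueError(f"not a counter cell: {len(cell)} bytes wide, expected 8")
    return struct.unpack(">q", cell)[0]

print(encode_counter(11))                  # b'\x00\x00\x00\x00\x00\x00\x00\x0b'
print(decode_counter(encode_counter(-3)))  # -3

# A string such as b"100" is only 3 bytes wide, which is why HBase refuses
# to increment a cell that was written that way.
try:
    decode_counter(b"100")
except ValueError as exc:
    print(exc)
```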

HBase Shell Example (For Reference)
Before jumping into Python, let's see how it works in the HBase shell. This helps solidify the concept.
Create a table with an ordinary column family for the counters:
# 'counter_cf' is a normal column family; no special type is needed
create 'my_counters', 'counter_cf'
Increment a counter:
The incr command takes the table name, the row key, and the column to increment. You can also pass the amount to add (the default is 1).

# Increment the 'page_views' counter by 1 for row 'page_123'
incr 'my_counters', 'page_123', 'counter_cf:page_views'
# Increment by a specific value, e.g., 10
incr 'my_counters', 'page_123', 'counter_cf:page_views', 10
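Get the counter's value:
Use get_counter, which decodes the stored 8-byte value for you. A standard get also works, but it prints the raw bytes rather than a number (for 11, value=\x00\x00\x00\x00\x00\x00\x00\x0B).
# Get the value of the 'page_views' counter
get_counter 'my_counters', 'page_123', 'counter_cf:page_views'
Expected Output (after the two increments above on a new row):
COUNTER VALUE = 11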
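If you only have the output of a plain get or scan, the value column shows the raw bytes in the shell's escaped form. The sketch below assumes that escaping scheme (non-printable bytes written as \xHH; letters, digits, and most punctuation written as-is) and turns the text back into an integer. The function names are hypothetical.

```python
import re
import struct

# Matches either a \xHH escape or any single literal character.
_TOKEN = re.compile(r"\\x([0-9A-Fa-f]{2})|(.)", re.DOTALL)

def parse_shell_binary(text: str) -> bytes:
    """Turn the shell's escaped cell text back into raw bytes."""
    out = bytearray()
    for hex_pair, literal in _TOKEN.findall(text):
        if hex_pair:
            out.append(int(hex_pair, 16))  # an escaped byte such as \x0B
        else:
            out.append(ord(literal))       # a byte printed as a plain character
    return bytes(out)

def shell_counter_value(text: str) -> int:
    """Decode an escaped 8-byte counter cell; raises struct.error if it is not 8 bytes."""
    return struct.unpack(">q", parse_shell_binary(text))[0]

print(shell_counter_value(r"\x00\x00\x00\x00\x00\x00\x00\x0B"))  # 11
print(shell_counter_value(r"\x00\x00\x00\x00\x00\x00\x00d"))     # 100 ('d' is byte 0x64)
```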
Python Implementation
We'll look at two libraries. happybase is the more common choice for general-purpose HBase interaction and goes through the HBase Thrift server; pyhbase, as used here, goes through the HBase REST server instead.
Prerequisites
First, install the library you chose. happybase requires the HBase Thrift server to be running on your cluster; pyhbase requires the HBase REST server.
# For happybase (requires the HBase Thrift server)
pip install happybase
# For pyhbase (requires the HBase REST server)
pip install pyhbase
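Start the gateway you need with hbase-daemon.sh:
# On an HBase node: Thrift server for happybase (default port 9090)
./bin/hbase-daemon.sh start thrift
# REST server for pyhbase (default port 8080)
./bin/hbase-daemon.sh start rest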
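Many connection errors from either library simply mean the gateway is not listening. A minimal stdlib check, assuming only that the gateway accepts TCP connections on its port, is sketched below. The function name gateway_is_listening is hypothetical, and a successful connect only shows that something is listening, not that it speaks Thrift or REST.

```python
import socket

def gateway_is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # The connection is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, call gateway_is_listening('your-hbase-thrift-server', 9090) before creating the happybase connection, or check port 8080 for the REST server.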
Method 1: Using happybase
This is a very popular and straightforward library.
import happybase
# --- Configuration ---
# Ensure your HBase Thrift server is running on this host and port
connection = happybase.Connection(host='your-hbase-thrift-server', port=9090)
table_name = 'my_counters'
try:
    # 1. Connect to the table
    table = connection.table(table_name)
    # 2. Define the row key and the counter column
    row_key = 'user_a'
    counter_column = 'counter_cf:login_count'
    # 3. Increment the counter by 1
    # counter_inc() returns the new value of the counter
    new_value = table.counter_inc(row_key, counter_column)
    print(f"Counter incremented for '{row_key}'. New value: {new_value}")
    # 4. Increment by a specific amount (e.g., 5); the keyword argument is `value`
    new_value = table.counter_inc(row_key, counter_column, value=5)
    print(f"Counter incremented by 5 for '{row_key}'. New value: {new_value}")
    # 5. Get the current value of the counter
    # counter_get() decodes the 8-byte cell for you (and initializes a missing counter to 0).
    # A plain table.row() fetch returns the raw bytes, which int() cannot parse.
    current_value = table.counter_get(row_key, counter_column)
    print(f"Current value of '{counter_column}' for '{row_key}': {current_value}")
finally:
    # 6. Close the connection
    connection.close()
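For unit tests that should run without a cluster, a hypothetical FakeCounterTable can mimic the happybase counter methods used above. It is modeled on happybase's counter_inc, counter_dec, counter_get, and counter_set (counter_get increments by 0, counter_set writes the 8-byte encoding with a put), but it is only a sketch and not part of happybase.

```python
import struct
import threading

class FakeCounterTable:
    """Hypothetical in-memory stand-in for the counter methods of a happybase Table.

    Cells are stored as raw bytes, encoded like real HBase counters
    (8-byte big-endian signed longs).
    """

    def __init__(self):
        self._cells = {}               # (row, column) -> bytes
        self._lock = threading.Lock()

    def counter_inc(self, row, column, value=1):
        with self._lock:
            raw = self._cells.get((row, column), struct.pack(">q", 0))
            if len(raw) != 8:
                # A real cluster reports this as a server-side error instead.
                raise ValueError("cell is not an 8-byte counter")
            new_value = struct.unpack(">q", raw)[0] + value
            self._cells[(row, column)] = struct.pack(">q", new_value)
            return new_value

    def counter_dec(self, row, column, value=1):
        # Decrementing is just incrementing by a negative amount.
        return self.counter_inc(row, column, -value)

    def counter_get(self, row, column):
        # Reading is an increment by 0, which creates the cell if missing.
        return self.counter_inc(row, column, 0)

    def counter_set(self, row, column, value=0):
        # A plain overwrite; in real HBase this is a put, so it is not
        # atomic with respect to concurrent increments.
        with self._lock:
            self._cells[(row, column)] = struct.pack(">q", value)

table = FakeCounterTable()
table.counter_inc("user_a", "counter_cf:login_count")
table.counter_inc("user_a", "counter_cf:login_count", value=5)
print(table.counter_get("user_a", "counter_cf:login_count"))  # 6
```

Passing a fake like this into code that normally receives connection.table(...) keeps counter logic testable, although it will not reproduce network or server errors.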
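Method 2: Using pyhbase
pyhbase, as used here, talks to HBase's REST/JSON gateway instead of Thrift, so you need the HBase REST server running rather than the Thrift server. The REST server is still a separate gateway process. Client APIs vary between releases, so confirm the exact calls (Client, increment, get) against the documentation for the pyhbase version you install.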
import struct
import pyhbase
# --- Configuration ---
# HBase REST server is usually on port 8080
# Note: You might need to enable REST server on your HBase cluster.
# See: https://hbase.apache.org/book.html#_rest
client = pyhbase.Client(host='your-hbase-rest-server', port=8080)
table_name = b'my_counters'  # pyhbase uses bytes for table names
try:
    # 1. Increment the counter
    # The method is client.increment()
    # It takes the table name (bytes), row key (bytes), and a dictionary of {column: increment_value}
    row_key = b'user_b'
    column_to_increment = b'counter_cf:login_count'
    # Increment by 1
    response = client.increment(
        table=table_name,
        row=row_key,
        columns={column_to_increment: 1}
    )
    # The response contains the new value
    new_value = response.get(column_to_increment)
    print(f"Counter incremented for '{row_key.decode()}'. New value: {new_value}")
    # 2. Increment by a specific amount (e.g., 3)
    response = client.increment(
        table=table_name,
        row=row_key,
        columns={column_to_increment: 3}
    )
    new_value = response.get(column_to_increment)
    print(f"Counter incremented by 3 for '{row_key.decode()}'. New value: {new_value}")
    # 3. Get the current value of the counter
    # Use a standard get operation
    row_data = client.get(table=table_name, row=row_key)
    current_value = row_data.get(column_to_increment)
    if current_value is not None:
        # HBase stores counters as 8-byte big-endian longs; if the client returns
        # the raw cell bytes, decode them with struct (int() cannot parse them)
        if isinstance(current_value, bytes) and len(current_value) == 8:
            current_value = struct.unpack('>q', current_value)[0]
        print(f"Current value of '{column_to_increment.decode()}' for '{row_key.decode()}': {int(current_value)}")
    else:
        print(f"Counter for '{row_key.decode()}' not found.")
finally:
    # 4. pyhbase client doesn't have a persistent connection to close in the same way
    pass
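If you want to check a counter through the REST gateway without a client library, for example with curl -H "Accept: application/json" against http://your-hbase-rest-server:8080/my_counters/user_b/counter_cf:login_count, the JSON response base64-encodes row keys, column names, and cell values, with the value under the "$" key. The sketch below decodes such a response into an integer; the function name is hypothetical and the sample payload is hand-built in that shape, not captured from a real server.

```python
import base64
import json
import struct
from typing import Optional

def counter_from_rest_json(payload: str, column: str) -> Optional[int]:
    """Return the counter stored in `column` ("family:qualifier"), or None if absent."""
    doc = json.loads(payload)
    for row in doc.get("Row", []):
        for cell in row.get("Cell", []):
            # Column names and values are base64-encoded in the JSON form.
            if base64.b64decode(cell["column"]).decode("utf-8") == column:
                raw = base64.b64decode(cell["$"])
                return struct.unpack(">q", raw)[0]  # 8-byte big-endian long
    return None

def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")

# Hand-built sample in the REST JSON shape, holding a counter value of 4.
sample = json.dumps({"Row": [{"key": _b64(b"user_b"), "Cell": [{
    "column": _b64(b"counter_cf:login_count"),
    "timestamp": 1700000000000,
    "$": _b64(struct.pack(">q", 4)),
}]}]})

print(counter_from_rest_json(sample, "counter_cf:login_count"))  # 4
```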
Important Considerations & Best Practices
- Table Creation: You must create the table and its column family beforehand; an ordinary column family is enough, since HBase has no special counter family type. You can do this via the HBase shell or programmatically if your library supports DDL (happybase does, with connection.create_table()).
- Performance: An increment is a single atomic RPC, which makes counters the recommended way to maintain incrementing numbers in HBase. Counters only hold 64-bit integers, so don't use them for non-integer data. Decrementing is supported: increment by a negative amount (happybase wraps this as counter_dec()).
- Resetting a Counter: There is no reset command in the HBase shell. To set a counter to a specific value, either delete the cell and increment it back to the desired value, or put the 8-byte encoding of that value (happybase's counter_set() does this). Neither approach is atomic with respect to concurrent increments. Never put a plain number or string: the cell would no longer be 8 bytes wide and later increments would fail.
# Using happybase to "reset" a counter (NOT atomic with concurrent increments!)
# counter_set() puts the value encoded as an 8-byte big-endian long
table.counter_set(row_key, counter_column, 100)
- Data Type: A counter cell always holds an 8-byte big-endian signed long, not a decimal string, so calling int() on the raw bytes will not work. Use counter_get() in happybase, struct.unpack('>q', value)[0] in plain Python, or get_counter in the HBase shell.
