Of course! In Python, a "crosstable" is most commonly known as a contingency table or a cross-tabulation. It's a table that displays the frequency distribution of variables. It's used to see how different categorical variables interact with each other.

The most popular and powerful library for creating crosstabs in Python is Pandas. The pandas.crosstab() function is specifically designed for this.
Here's a comprehensive guide, from the basics to more advanced examples.
Setup: Sample Data
First, let's create some sample data to work with. We'll use a Pandas DataFrame.
import pandas as pd
import numpy as np
# Create a sample DataFrame
data = {
'Gender': ['Male', 'Female', 'Female', 'Male', 'Female', 'Male', 'Male', 'Female', 'Female', 'Male'],
'AgeGroup': ['18-25', '26-35', '18-25', '36-45', '26-35', '18-25', '36-45', '18-25', '26-35', '36-45'],
'City': ['New York', 'London', 'New York', 'Paris', 'London', 'New York', 'Paris', 'London', 'New York', 'Paris'],
'Purchased': ['Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Original DataFrame:

Gender AgeGroup City Purchased
0 Male 18-25 New York Yes
1 Female 26-35 London No
2 Female 18-25 New York Yes
3 Male 36-45 Paris Yes
4 Female 26-35 London No
5 Male 18-25 New York No
6 Male 36-45 Paris Yes
7 Female 18-25 London Yes
8 Female 26-35 London No
9 Male 36-45 Paris Yes
Basic Crosstab: Two Variables
The simplest crosstab shows the frequency of combinations between two categorical columns.
# Create a crosstab of Gender vs. Purchased
crosstab_basic = pd.crosstab(df['Gender'], df['Purchased'])
print("\n--- Basic Crosstab ---")
print(crosstab_basic)
Output:
--- Basic Crosstab ---
Purchased No Yes
Gender
Female 3 3
Male 2 4
This table shows that there are 3 females who did not purchase and 3 who did, and 2 males who did not purchase and 4 who did.
Adding Margins (Row and Column Totals)
To get a better overview, you can add row and column totals using the margins=True argument.
# Create a crosstab with margins
crosstab_margins = pd.crosstab(df['Gender'], df['Purchased'], margins=True, margins_name="Total")
print("\n--- Crosstab with Margins ---")
print(crosstab_margins)
Output:
--- Crosstab with Margins ---
Purchased No Yes Total
Gender
Female 3 3 6
Male 2 4 6
Total 5 7 11
The "Total" row and column give you the grand totals for each category.
Normalizing: Percentages Instead of Counts
Sometimes, you're more interested in proportions than raw counts. You can normalize the results.
a) Normalize by Index (Rows)
This shows the percentage distribution within each row.
# Normalize by rows (index)
crosstab_normalize_index = pd.crosstab(df['Gender'], df['Purchased'], normalize='index')
print("\n--- Normalized by Index (Rows) ---")
print(crosstab_normalize_index)
Output:
--- Normalized by Index (Rows) ---
Purchased No Yes
Gender
Female 0.500000 0.500000
Male 0.333333 0.666667
Interpretation: 50% of females did not purchase, while 66.7% of males did purchase.
b) Normalize by Columns
This shows the percentage distribution within each column.
# Normalize by columns
crosstab_normalize_cols = pd.crosstab(df['Gender'], df['Purchased'], normalize='columns')
print("\n--- Normalized by Columns ---")
print(crosstab_normalize_cols)
Output:
--- Normalized by Columns ---
Purchased No Yes
Gender
Female 0.600000 0.428571
Male 0.400000 0.571429
Interpretation: 60% of "No" purchases were made by females, while 57.1% of "Yes" purchases were made by males.
c) Normalize All Values (Grand Total)
This shows each cell's value as a percentage of the total number of observations.
# Normalize by all values (grand total)
crosstab_normalize_all = pd.crosstab(df['Gender'], df['Purchased'], normalize='all')
print("\n--- Normalized by All Values (Grand Total) ---")
print(crosstab_normalize_all)
Output:
--- Normalized by All Values (Grand Total) ---
Purchased No Yes
Gender
Female 0.272727 0.272727
Male 0.181818 0.363636
Interpretation: 27.3% of all people surveyed were females who did not purchase.
Multi-dimensional Crosstabs
You can easily extend crosstabs to more than two variables by passing lists to the index and columns parameters.
# Crosstab with Gender as index and AgeGroup/Purchased as columns
crosstab_multi = pd.crosstab(
index=df['Gender'],
columns=[df['AgeGroup'], df['Purchased']]
)
print("\n--- Multi-dimensional Crosstab ---")
print(crosstab_multi)
Output:
--- Multi-dimensional Crosstab ---
AgeGroup 18-25 26-35 36-45
Purchased No Yes No Yes No Yes
Gender
Female 1 2 2 0 0 1
Male 1 1 0 0 1 3
This gives a much more detailed breakdown. For example, we can see that there is 1 female in the 18-25 age group who did not purchase.
Adding Values (Aggregation)
You can also use crosstab to perform aggregations. For example, let's say we have a Spend column and we want to see the total amount spent by each gender/purchase combination.
# Add a 'Spend' column to our DataFrame
df['Spend'] = np.random.randint(10, 100, size=len(df))
print("\n--- DataFrame with Spend Column ---")
print(df)
# Create a crosstab that sums the 'Spend' for each combination
crosstab_agg = pd.crosstab(
index=df['Gender'],
columns=df['Purchased'],
values=df['Spend'], # The values to aggregate
aggfunc='sum' # The aggregation function (sum, mean, count, etc.)
)
print("\n--- Aggregated Crosstab (Sum of Spend) ---")
print(crosstab_agg)
Output (will vary due to random Spend data):
--- DataFrame with Spend Column ---
Gender AgeGroup City Purchased Spend
0 Male 18-25 New York Yes 45
1 Female 26-35 London No 12
2 Female 18-25 New York Yes 88
3 Male 36-45 Paris Yes 67
4 Female 26-35 London No 34
5 Male 18-25 New York No 91
6 Male 36-45 Paris Yes 23
7 Female 18-25 London Yes 56
8 Female 26-35 London No 78
9 Male 36-45 Paris Yes 50
--- Aggregated Crosstab (Sum of Spend) ---
Purchased No Yes
Gender
Female 124 144
Male 182 140
This table shows the total spending for males and females, broken down by whether they purchased or not.
Visualization: Heatmaps
A crosstab is perfect for visualizing with a heatmap. The seaborn library makes this incredibly easy.
import seaborn as sns
import matplotlib.pyplot as plt
# Use the basic crosstab from before
crosstab_vis = pd.crosstab(df['Gender'], df['Purchased'])
# Create a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(crosstab_vis, annot=True, cmap="YlGnBu", fmt="d") # annot=True adds numbers, fmt="d" formats as integer'Crosstab Heatmap: Gender vs. Purchased')
plt.xlabel('Purchased')
plt.ylabel('Gender')
plt.show()
This will generate a color-coded heatmap where darker colors represent higher counts, making it easy to spot patterns at a glance.
Summary: pd.crosstab() Arguments
| Argument | Description |
|---|---|
index |
Array-like or list of arrays. The values to group by in the rows. |
columns |
Array-like or list of arrays. The values to group by in the columns. |
values |
Array-like. Optional. An array of values to aggregate. If not provided, counts are used. |
rownames |
List. Optional. Names for the index levels. |
colnames |
List. Optional. Names for the column levels. |
aggfunc |
Function or list of functions. Optional. The function to use for aggregating values (e.g., 'sum', 'mean', 'count'). Default is count. |
margins |
Boolean. Default False. Add row/column margins (subtotals). |
margins_name |
String. Default 'All'. The name of the row/column that will contain the totals when margins is True. |
normalize |
{'all', 'index', 'columns'} or 0, 1. Optional. Normalize the crosstab over the entire table (all), the rows (index), or the columns (columns). |
