Of course! Here's a comprehensive guide to using the xlrd library in Python on a Linux system.

Overview
xlrd is a Python library for reading data and formatting information from Excel files (.xls and .xlsx). It's a fast and efficient library, but it's important to note its main limitation:
xlrd can only READ .xls files (Excel 97-2003) and .xlsx files (Excel 2007+) in "read-only" mode. It cannot create or modify Excel files.
For writing to Excel files, you would typically use a library like openpyxl or xlsxwriter.
Installation
First, you need to have Python and pip (Python's package installer) installed on your Linux system. If you don't, you can usually install them with your distribution's package manager (e.g., sudo apt-get install python3 python3-pip for Debian/Ubuntu).

Then, install xlrd using pip:
# It's best practice to use pip3 for Python 3 pip3 install xlrd
If you need to install it for a specific virtual environment, activate the environment first:
source /path/to/your/venv/bin/activate pip install xlrd
Basic Usage: Reading an Excel File
Let's assume you have an Excel file named data.xlsx with the following content in the first sheet:
| A | B | C | |
|---|---|---|---|
| 1 | Name | Age | City |
| 2 | Alice | 30 | New York |
| 3 | Bob | 25 | London |
| 4 | Charlie | 35 | Paris |
Here is the Python code to read this file.
import xlrd
# Define the path to your Excel file
file_path = 'data.xlsx'
try:
# Open the workbook
# The 'on_demand=True' parameter can be used to load only specific sheets,
# which is memory-efficient for large files.
workbook = xlrd.open_workbook(file_path, on_demand=False)
# Get the first sheet (by index or name)
# sheet = workbook.sheet_by_index(0)
sheet = workbook.sheet_by_name('Sheet1') # Make sure the sheet name is correct
print(f"Sheet Name: {sheet.name}")
print(f"Number of Rows: {sheet.nrows}")
print(f"Number of Columns: {sheet.ncols}")
print("-" * 20)
# --- Method 1: Iterate through all rows and cells ---
print("Iterating through all rows:")
for row_idx in range(sheet.nrows):
# Each row is a list of cell values
row = sheet.row_values(row_idx)
print(f"Row {row_idx}: {row}")
print("-" * 20)
# --- Method 2: Access specific cells by coordinates (row, col) ---
# Note: Row and column indices are 0-based.
print("Accessing specific cells:")
# Get the value from cell A2 (row 1, col 0)
name = sheet.cell_value(1, 0)
print(f"Name from A2: {name}")
# Get the value from cell B3 (row 2, col 1)
age = sheet.cell_value(2, 1)
print(f"Age from B3: {age}")
# --- Method 3: Get a full row or column as a list ---
print("\nGetting full rows and columns:")
header_row = sheet.row_values(0)
print(f"Header Row: {header_row}")
# Get all values from the 'City' column (column index 2)
city_column = sheet.col_values(2)
print(f"City Column: {city_column}")
except xlrd.XLRDError as e:
print(f"Error opening or reading the Excel file: {e}")
except FileNotFoundError:
print(f"Error: The file '{file_path}' was not found.")
Explanation of Key Methods:
xlrd.open_workbook(file_path): Opens the Excel file.workbook.sheet_by_index(0): Gets a sheet by its position (0 is the first sheet).workbook.sheet_by_name('Sheet1'): Gets a sheet by its name.sheet.nrows: The total number of rows in the sheet.sheet.ncols: The total number of columns in the sheet.sheet.row_values(row_idx): Returns all cell values in a specific row as a list.sheet.col_values(col_idx): Returns all cell values in a specific column as a list.sheet.cell_value(row_idx, col_idx): Returns the value of a single cell.
Handling Cell Types
Excel stores different types of data (text, numbers, dates, etc.). xlrd provides a way to check the type of a cell.
xlrd.XL_CELL_TEXT(0)xlrd.XL_CELL_NUMBER(1)xlrd.XL_CELL_DATE(3)xlrd.XL_CELL_BOOLEAN(4)xlrd.XL_CELL_BLANK(5)
Important: Dates are tricky. xlrd stores them as a floating-point number (the number of days since 1900-01-01). You need to convert it to a Python datetime object.
Here's an example of handling different cell types:
import xlrd
from datetime import datetime, date
file_path = 'data_with_types.xlsx' # Assume this file has mixed data types
workbook = xlrd.open_workbook(file_path)
sheet = workbook.sheet_by_index(0)
# Iterate through rows, skipping the header
for row_idx in range(1, sheet.nrows):
for col_idx in range(sheet.ncols):
cell_value = sheet.cell_value(row_idx, col_idx)
cell_type = sheet.cell_type(row_idx, col_idx)
if cell_type == xlrd.XL_CELL_TEXT:
print(f"Cell ({row_idx},{col_idx}): Text -> '{cell_value}'")
elif cell_type == xlrd.XL_CELL_NUMBER:
print(f"Cell ({row_idx},{col_idx}): Number -> {cell_value}")
elif cell_type == xlrd.XL_CELL_DATE:
# Convert Excel date number to a Python datetime object
# The `xlrd.xldate_as_datetime` function is very helpful
date_obj = xlrd.xldate_as_datetime(cell_value, workbook.datemode)
print(f"Cell ({row_idx},{col_idx}): Date -> {date_obj} (Formatted: {date_obj.strftime('%Y-%m-%d')})")
elif cell_type == xlrd.XL_CELL_BOOLEAN:
print(f"Cell ({row_idx},{col_idx}): Boolean -> {bool(cell_value)}")
elif cell_type == xlrd.XL_CELL_BLANK:
print(f"Cell ({row_idx},{col_idx}): Blank")
print("-")
Common Pitfalls and Best Practices
xlrdCannot Write to.xlsxFiles: This is the most common mistake. If you try to usexlrdto create or modify an.xlsxfile, it will fail silently or raise an error. Useopenpyxlfor writing.- Date Handling: As shown above, dates are not handled out-of-the-box. You must use
xlrd.xldate_as_datetime()to convert them correctly. - File Not Found: Always wrap your file operations in a
try...except FileNotFoundErrorblock to handle cases where the Excel file doesn't exist. - Sheet Name is Case-Sensitive:
sheet_by_name('Sheet1')is different fromsheet_by_name('sheet1'). Be precise with the name. - Large Files: For very large Excel files, loading the entire workbook into memory can be slow. Use the
on_demand=Trueparameter when opening the workbook. This allows you to load sheets one by one as needed, saving memory.
# Memory-efficient way for large files
workbook = xlrd.open_workbook('huge_file.xlsx', on_demand=True)
# You can't get all sheet names immediately with on_demand=True
# You have to access them by index first
sheet = workbook.sheet_by_index(0)
print(f"Loaded sheet: {sheet.name}")
# ... do work on the sheet ...
# When you're done with a sheet, you can release its memory
workbook.unload_sheet(0)
Alternatives to xlrd
While xlrd is great for reading, the modern Python ecosystem offers other powerful libraries.
| Library | Can Read .xls? |
Can Read .xlsx? |
Can Write? | Notes |
|---|---|---|---|---|
xlrd |
✅ | ✅ | ❌ | Fast and lightweight for reading. The classic choice. |
openpyxl |
❌ | ✅ | ✅ | The modern standard. Slower than xlrd for reading but excellent for both reading and writing .xlsx files. |
pandas |
✅ | ✅ | ✅ | The best choice for data analysis. Uses xlrd or openpyxl under the hood. df = pd.read_excel('file.xlsx') is all you need. |
Example using pandas:
import pandas as pd
# This is the easiest way to read an Excel file into a data structure
# pandas will automatically use xlrd or openpyxl as needed.
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
print("DataFrame from Excel:")
print(df)
# Accessing data is very intuitive
print("\nNames column:")
print(df['Name'])
Summary
- Use
xlrdwhen you only need to read data from Excel files and performance is a critical concern. - Use
openpyxlwhen you need to both read and write.xlsxfiles. - Use
pandaswhen your goal is data analysis, manipulation, or scientific computing. It's the most convenient and powerful tool for this purpose.
