Of course! Let's dive into pyexcel-io. This is a foundational library within the pyexcel ecosystem, and understanding it is key to understanding how the whole system works.

What is pyexcel-io?
At its core, pyexcel-io is a low-level library for reading and writing data from/to various sources in a uniform way. It acts as the engine that handles the actual I/O operations, abstracting away the complexities of different file formats.
Think of it this way:
- pyexcel (the main package): The high-level, user-friendly interface. You use this for 95% of your tasks. It's like the steering wheel and pedals of a car.
- pyexcel-io: The engine and transmission. It does the hard work of converting data into a standardized format and writing it to the disk, or reading from the disk and converting it back. You rarely interact with it directly unless you're creating a new source or format.
The main idea is separation of concerns:
- Data Representation: pyexcel represents all tabular data as a simple sequence of sequences (a list of lists), which it wraps in a pyexcel.Sheet object.
- I/O Handling: pyexcel-io is responsible for taking this data and serializing it to a specific format (like CSV, XLSX) or deserializing it from a format into the data representation.
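To make that split concrete, here is a minimal sketch using only the standard library. It is not pyexcel-io's code: the helper names rows_to_csv and csv_to_rows are hypothetical, and they only show that the in-memory representation (a list of lists) can stay fixed while a separate layer deals with the format.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of lists to CSV text (the I/O-handling side)."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text):
    """Deserialize CSV text back into a list of lists."""
    return [row for row in csv.reader(io.StringIO(text, newline=""))]


# The data representation: plain rows, independent of any file format.
table = [["Name", "Age"], ["Alice", "30"], ["Bob", "25"]]

text = rows_to_csv(table)
round_trip = csv_to_rows(text)
```

CSV carries no type information, so every value comes back as a string; that is why the ages in this sketch are strings to begin with.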
Key Concepts: Sources and Targets
The most important concepts in pyexcel-io are sources and targets.

- A Source: Anything you can read data from. This could be a file on your disk, a URL, a string in memory, or even a database connection.
- A Target: Anything you can write data to. This is typically a file or a stream in memory.
pyexcel-io provides a registry system where different source and target types are registered with specific "reader" and "writer" classes.
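The registry idea can be sketched in a few lines of plain Python. This is an illustration of the pattern only, under the assumption that a registry is essentially a lookup table from a type name to something that can read it; READERS, register, and get_rows are hypothetical names, not part of pyexcel-io.

```python
import csv
import io

# Hypothetical registry: maps a file type to a reader function.
READERS = {}


def register(file_type):
    """Decorator that records a reader function under a file type."""
    def decorator(func):
        READERS[file_type] = func
        return func
    return decorator


@register("csv")
def read_csv(text):
    return [row for row in csv.reader(io.StringIO(text))]


@register("tsv")
def read_tsv(text):
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t")]


def get_rows(file_type, text):
    """Look up the reader for file_type and use it to parse text."""
    try:
        reader = READERS[file_type]
    except KeyError:
        raise ValueError(f"No reader registered for {file_type!r}") from None
    return reader(text)


csv_rows = get_rows("csv", "Name,Age\nAlice,30\n")
tsv_rows = get_rows("tsv", "Name\tAge\nBob\t25\n")
```

The caller only names a type; which function actually parses the data is decided by the lookup, so new formats can be added without touching get_rows.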
The Relationship: pyexcel -> pyexcel-io
When you use the main pyexcel library, it uses pyexcel-io behind the scenes.
Example: Reading a file with pyexcel
import pyexcel as p

# 1. User calls a high-level pyexcel function
#    pyexcel looks at the file extension ".csv"
sheet = p.get_sheet(file_name="my_data.csv")

# 2. pyexcel-io is invoked
#    - It looks up the ".csv" extension in its registry.
#    - It finds the registered CSV source reader.
#    - It uses that reader to parse the file content.
#    - It returns the data to pyexcel in its standard format (a Sheet object).
print(sheet)
# Output:
# pyexcel sheet:
# name:my_data.csv
# +---------+---------+
# | Name    | Age     |
# +---------+---------+
# | Alice   | 30      |
# | Bob     | 25      |
# +---------+---------+
When Would You Use pyexcel-io Directly?
You would typically use pyexcel-io directly if you want to:

- Create a custom data source: For example, read data from an API response, a specific database table, or a log file that isn't a standard spreadsheet format.
- Create a custom data target: Write data to a specific database, a compressed stream, or a custom binary format.
- Understand the internals of how pyexcel works.
Practical Example: Creating a Custom Source
Let's create a simple custom source that reads data from a Python dictionary. We'll register it with pyexcel-io so that the main pyexcel library can use it.
Step 1: Define Your Custom Source Reader
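You need two classes: a reader that implements pyexcel-io's IReader interface and knows how to walk the dictionary, and a source class that hands out reader instances. The plugin API has changed between pyexcel-io releases, so check the class and method names below against the version you have installed.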
import pyexcel as p
import pyexcel_io
from pyexcel_io import manager
from pyexcel_io.plugin_api import IReader

# Our data source
data_from_dict = {
    "Sheet 1": [
        ["Name", "Age"],
        ["Alice", 30],
        ["Bob", 25],
    ]
}


# This class tells pyexcel-io how to read from our custom source
class DictReader(IReader):
    def __init__(self, file_content, **keywords):
        # file_content will be our dictionary
        self._file_content = file_content
        self._sheet_names = list(file_content.keys())

    def read_sheet(self, sheet_index):
        # This method is called for each sheet
        sheet_name = self._sheet_names[sheet_index]
        # The reader must return a generator of lists (rows)
        return (row for row in self._file_content[sheet_name])

    def get_sheet_stream(self):
        # This is the main entry point for the reader
        # It should return a generator of (sheet_name, sheet_reader)
        for index, name in enumerate(self._sheet_names):
            yield name, self.read_sheet(index)


# This class registers our reader with a specific "type"
class DictSource(manager.Source):
    def __init__(self, file_type, file_content, **keywords):
        super().__init__(file_type, file_content, **keywords)

    def get_reader(self):
        # This method is called by pyexcel-io to get an instance of our reader
        return DictReader(self._file_content, **self._keywords)


# The "type" can be anything, e.g., "dict", "my_api"
CUSTOM_SOURCE_TYPE = "dict"
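The shape of what the reader hands back (one name per sheet, paired with a lazy stream of rows) can be tried out without pyexcel-io at all. The sketch below is plain Python; iter_sheets and the sample book are hypothetical and only mirror the contract described in the comments above.

```python
def iter_sheets(book):
    """Yield (sheet_name, row_iterator) pairs from a dict of sheets."""
    for name, rows in book.items():
        # Copy each row so consumers cannot mutate the source data.
        yield name, (list(row) for row in rows)


book = {
    "Sheet 1": [["Name", "Age"], ["Alice", 30], ["Bob", 25]],
    "Sheet 2": [["City"], ["Oslo"]],
}

# Consume the stream: rows are only produced when each sheet is iterated.
collected = {name: list(rows) for name, rows in iter_sheets(book)}
```

Because rows are generated on demand, a consumer that only needs the first sheet never pays for reading the others.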
Step 2: Register Your Custom Source
Now, we tell pyexcel-io that when it sees a source of type "dict", it should use our DictSource class.
# Register the source
pyexcel_io.register_reader(CUSTOM_SOURCE_TYPE, DictSource)
Step 3: Use It with the Main pyexcel Library
Now, you can use pyexcel.get_sheet() as if it were a native file type!
# Use the custom source with the main pyexcel library
sheet = p.get_sheet(
file_type=CUSTOM_SOURCE_TYPE,
file_content=data_from_dict
)
print(sheet)
# Output:
# pyexcel sheet:
# name:Sheet 1
# +-------+-----+
# | Name  | Age |
# +-------+-----+
# | Alice | 30  |
# | Bob   | 25  |
# +-------+-----+
# You can now use all pyexcel features on this data
print("Alice's age:", sheet[1, 1])
# Output:
# Alice's age: 30
# You can even write it to a real file
# (.xlsx output needs a format plugin such as pyexcel-xlsx)
p.save_as(
dest_file_name="from_dict_output.xlsx",
dest_sheet_name="New Data",
array=sheet.to_array()
)
Summary: pyexcel vs. pyexcel-io
| Feature | pyexcel (Main Package) | pyexcel-io (Engine) |
|---|---|---|
| Purpose | High-level, easy-to-use API for data manipulation. | Low-level I/O handling engine. |
| Abstraction | Hides file formats and sources. You just pass a filename or URL. | Manages sources and targets. It knows about CSV, XLSX, etc. |
| Primary Users | End-users, data analysts, scientists. | Library authors, developers extending pyexcel. |
| Key Functions | get_sheet(), save_as(), get_book() | register_reader(), register_writer(), source/target classes. |
| Data Flow | Accepts/returns Sheet and Book objects. | Accepts/returns raw data (generators of lists) and streams. |
Installation
To install pyexcel-io, simply use pip:
pip install pyexcel-io
Note: You don't usually need to install pyexcel-io by itself. It is a dependency of pyexcel, so installing pyexcel pulls it in automatically.
# To get all supported formats
pip install pyexcel[all]
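After installing, a quick way to confirm that both packages are visible to your interpreter is to ask the import system directly. This sketch uses only the standard library; is_installed is a hypothetical helper name, and pyexcel_io is the import name that corresponds to the pyexcel-io distribution.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


for name in ("pyexcel", "pyexcel_io"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

This only checks that the modules can be located; it does not import them or verify which format plugins are available.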
