Here is a practical guide to using Python with Solr, focusing on the most widely used client library, pysolr.

What is a Solr Client?
A Solr client is a Python library that allows your Python application to communicate with an Apache Solr server. It handles the low-level details of:
- Constructing HTTP requests.
- Encoding your query, filters, sorting, and paging as HTTP request parameters, and serializing documents into update messages.
- Sending the request to the Solr server.
- Parsing the JSON response from Solr into Python data structures (like dictionaries and lists).
This lets you focus on your application logic instead of the intricacies of HTTP and Solr's API.
The Recommended Library: pysolr
The de-facto standard and most widely used library is pysolr. It's mature, widely used, and provides a simple, intuitive interface.
Installation
First, you need to install the library using pip:

```bash
pip install pysolr
```
Note: The package on PyPI and the module you import are both named pysolr.
Basic Setup
Before you can use the client, you need a running Solr instance with a "core" (or "collection" in SolrCloud mode) that you want to query.
Let's assume you have a Solr core named gettingstarted running at http://localhost:8983/solr/.
Here's how you initialize the client:

```python
import pysolr

# Define the Solr URL for your core/collection
solr_url = 'http://localhost:8983/solr/gettingstarted'

# Initialize the Solr client
solr = pysolr.Solr(solr_url, timeout=10)
```
- solr_url: The full URL to your Solr core.
- timeout: A timeout in seconds for requests to the Solr server. Setting one is good practice because it keeps your application from hanging if Solr stops responding.
Core Operations with pysolr
Here are the most common operations you'll perform.
Indexing (Adding/Updating Documents)
To add data to Solr, you provide a list of Python dictionaries. Each dictionary represents a single document. The keys of the dictionary become the field names in Solr.
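Important: The fields you use must exist in your Solr schema (schema.xml or the managed schema), either as explicit fields or through dynamic field patterns. Cores running in schemaless mode, such as those created from Solr's _default configset, can add unknown fields automatically by guessing their types.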
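If your core uses a managed schema that does not define these fields yet, one way to add them is Solr's Schema API (a POST to the core's /schema endpoint). The sketch below only builds the request body with the standard library; the field types (text_general, string, pfloat, boolean) are assumptions based on Solr's _default configset, and Solr rejects an add-field for a field that already exists.

```python
import json

# Hypothetical endpoint for the example core used throughout this guide.
SCHEMA_URL = "http://localhost:8983/solr/gettingstarted/schema"

# Field definitions for the example documents; types assumed from _default.
fields = [
    {"name": "name", "type": "text_general", "stored": True},
    {"name": "category", "type": "string", "stored": True},
    {"name": "author", "type": "text_general", "stored": True},
    {"name": "price", "type": "pfloat", "stored": True},
    {"name": "in_stock", "type": "boolean", "stored": True},
    {"name": "features", "type": "text_general", "stored": True, "multiValued": True},
]

# One "add-field" command carrying a list of field definitions.
body = json.dumps({"add-field": fields})

# To apply it, POST `body` to SCHEMA_URL with Content-Type: application/json.
print(body)
```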
```python
# A list of documents to add to Solr
docs_to_add = [
    {
        'id': 'doc_1',
        'name': 'Apple iPhone 13',
        'category': 'Electronics',
        'price': 799.99,
        'in_stock': True,
        'features': ['A15 Bionic chip', 'Dual-camera system', 'All-day battery life']
    },
    {
        'id': 'doc_2',
        'name': 'Samsung Galaxy S22',
        'category': 'Electronics',
        'price': 849.99,
        'in_stock': True,
        'features': ['Snapdragon 8 Gen 1', '108MP camera', 'Dynamic AMOLED 2X display']
    },
    {
        'id': 'doc_3',
        'name': 'The Great Gatsby',
        'category': 'Books',
        'author': 'F. Scott Fitzgerald',
        'price': 12.50,
        'in_stock': True,
        'features': ['Classic American novel', 'Jazz Age', 'Tragedy']
    }
]

# Add the documents to Solr
solr.add(docs_to_add)
print("Successfully added documents to Solr.")
```
- solr.add() sends the documents to Solr's update handler.
- If a document with the same id already exists, solr.add() replaces it entirely. This is an "upsert" operation: fields you leave out of the new version are not kept.
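A toy in-memory model (plain Python, not Solr) of that replace-on-same-id behavior: the second add keeps only the fields it sends. Solr also offers atomic updates for changing individual fields, which this sketch does not model.

```python
# Toy index: uniqueKey (id) -> stored document.
index = {}

def add(docs):
    """Mimic Solr's default add: a doc with an existing id replaces the old one."""
    for doc in docs:
        index[doc["id"]] = dict(doc)  # full replacement, no field merging

add([{"id": "doc_1", "name": "Apple iPhone 13", "price": 799.99}])
add([{"id": "doc_1", "price": 749.99}])  # same id, fewer fields

print(index["doc_1"])  # {'id': 'doc_1', 'price': 749.99}
```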
Committing Changes
When you add or update documents, the changes are not immediately searchable. You need to "commit" them to make them visible in the index.
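- Automatic commit: solrconfig.xml can define autoCommit (hard commit) and autoSoftCommit settings that commit after a time interval or a number of added documents, which is better for indexing throughput than committing on every request. In the configsets that ship with Solr, the hard autoCommit is typically configured with openSearcher=false, so on its own it does not make new documents searchable; a soft commit or an explicit commit does.
- Manual commit: For immediate results, especially in scripts or tests, commit explicitly with solr.commit(). pysolr also accepts commit=True on solr.add(), and recent versions let you create the client with always_commit=True.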
```python
# Manually commit the changes to make them searchable
solr.commit()
print("Changes committed and are now searchable.")
```
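Commits are controlled by parameters on the core's update handler. A standard-library sketch, assuming the gettingstarted core URL used above, that builds the three common variants as request URLs. Recent pysolr versions expose similar options, for example solr.commit(softCommit=True) and the commitWithin argument of solr.add().

```python
from urllib.parse import urlencode

# Update handler of the example core assumed in this guide.
UPDATE_URL = "http://localhost:8983/solr/gettingstarted/update"

def update_url(**params):
    """Append update parameters such as commit or commitWithin to the URL."""
    return f"{UPDATE_URL}?{urlencode(params)}" if params else UPDATE_URL

hard = update_url(commit="true")        # hard commit: flush and open a new searcher
soft = update_url(softCommit="true")    # soft commit: visibility without a full flush
within = update_url(commitWithin=1500)  # ask Solr to commit within 1500 ms

for url in (hard, soft, within):
    print(url)
```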
Searching (Querying)
This is the core of what Solr does. The search() method is used to find documents.
```python
# Perform a simple query
# The first argument is the query string. '*:*' matches every document.
results = solr.search('*:*')
print(f"Found {results.hits} documents in {results.qtime} ms.")

# Iterate over the returned page of results (Solr returns 10 rows by default)
for doc in results:
    print(f"ID: {doc['id']}, Name: {doc['name']}, Price: {doc['price']}")

# --- More Complex Queries ---
# Query for a specific term in the 'name' field
results = solr.search('name:iPhone')

# Query with a filter (fq)
# This finds all documents in the 'Electronics' category.
results = solr.search('*:*', fq='category:Electronics')

# Query with multiple filter conditions and sorting
# Finds Electronics in stock, sorted by price ascending.
results = solr.search(
    'features:camera',                            # The main query
    fq='category:Electronics AND in_stock:true',  # Filters
    sort='price asc',                             # Sorting
)
print(f"\nFound {results.hits} matching 'camera' in Electronics (in stock).")
for doc in results:
    print(f"ID: {doc['id']}, Name: {doc['name']}, Price: {doc['price']}")
```
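Keyword arguments to solr.search() end up as Solr request parameters. A standard-library sketch of how two separate filter queries plus paging could be encoded; page_params is a hypothetical helper. With pysolr, the filters would be passed as a list, e.g. fq=['category:Electronics', 'in_stock:true'], which should yield one fq parameter per filter. Solr intersects separate fq clauses and can cache each one individually.

```python
from urllib.parse import urlencode

def page_params(page, rows=10):
    """Hypothetical helper: 1-based page number -> Solr start/rows parameters."""
    return {"start": (page - 1) * rows, "rows": rows}

params = {
    "q": "features:camera",
    "fq": ["category:Electronics", "in_stock:true"],  # two independent filters
    "sort": "price asc",
    **page_params(2),  # second page of 10 results
}

# doseq=True turns the list into repeated fq=... parameters.
query_string = urlencode(params, doseq=True)
print(query_string)
```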
The solr.search() method returns a pysolr.Results object, which has useful attributes:
- results.hits: The total number of documents matching the query, not just the ones returned in this page.
- results.docs: A list of the returned documents (as dictionaries).
- results.qtime: The time Solr took to execute the query, in milliseconds.
- results.debug: Debug information, populated when you request it (for example with debugQuery='true'); useful for relevance and performance tuning.
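A hypothetical stand-in class (not pysolr's code) showing where these attributes come from in Solr's JSON response, built from a made-up sample body. Note that hits counts all matches, while iterating yields only the returned page.

```python
import json

class MiniResults:
    """Simplified model of a search result parsed from Solr's JSON response."""

    def __init__(self, body):
        data = json.loads(body)
        self.hits = data["response"]["numFound"]      # total number of matches
        self.docs = data["response"]["docs"]          # only the returned page
        self.qtime = data["responseHeader"]["QTime"]  # milliseconds
        self.debug = data.get("debug", {})            # present only with debugQuery

    def __len__(self):
        return len(self.docs)

    def __iter__(self):
        return iter(self.docs)

# Made-up response: 3 matches in total, but only 2 rows returned.
SAMPLE_BODY = json.dumps({
    "responseHeader": {"status": 0, "QTime": 4},
    "response": {"numFound": 3, "start": 0, "docs": [{"id": "doc_1"}, {"id": "doc_2"}]},
})

results = MiniResults(SAMPLE_BODY)
print(results.hits, len(results), results.qtime)  # 3 2 4
```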
Deleting Documents
You can delete documents by their unique ID or by a query.
```python
# --- Delete by ID ---
# Delete a single document
solr.delete(id='doc_1')  # Deletes the document with id='doc_1'
solr.commit()            # Commit the deletion

# Delete multiple documents by a list of IDs
solr.delete(id=['doc_2', 'doc_3'])
solr.commit()

# --- Delete by Query ---
# Delete all documents in the 'Books' category
solr.delete(q='category:Books')
solr.commit()
print("Documents deleted.")
```
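For reference, the same three deletions expressed as Solr JSON update commands, which could be POSTed to the core's /update handler with Content-Type: application/json. pysolr builds its own update messages, so its wire format may differ; this sketch only shows the JSON shapes.

```python
import json

commands = [
    {"delete": {"id": "doc_1"}},                               # delete by id
    {"delete": ["doc_2", "doc_3"]},                            # delete a list of ids
    {"delete": {"query": "category:Books"}, "commit": {}},     # delete by query, then commit
]

# Serialized request bodies, one per command.
bodies = [json.dumps(c) for c in commands]
for body in bodies:
    print(body)
```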
Advanced Usage
Handling Data Types
pysolr converts common Python types for you: booleans (True, False) become Solr's true/false, and numbers are sent as their string representations. For dates, provide ISO-8601 UTC strings in Solr's format (e.g., '2025-10-27T10:00:00Z'); pysolr can also serialize Python datetime objects into that format.
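A simplified sketch of that kind of conversion, written for illustration (it is not pysolr's implementation): booleans become true/false, datetimes become UTC strings with a trailing Z, and everything else falls back to str(). This version drops fractional seconds.

```python
from datetime import datetime, timedelta, timezone

def to_solr_value(value):
    """Convert a Python value into the string form Solr expects (sketch)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, datetime):
        if value.tzinfo is not None:  # normalize aware datetimes to UTC
            value = value.astimezone(timezone.utc).replace(tzinfo=None)
        # Naive datetimes are assumed to already be in UTC.
        return value.strftime("%Y-%m-%dT%H:%M:%SZ")
    return str(value)

print(to_solr_value(True))                                 # true
print(to_solr_value(799.99))                               # 799.99
print(to_solr_value(datetime(2025, 10, 27, 10, 0, 0)))     # 2025-10-27T10:00:00Z
cet = timezone(timedelta(hours=2))
print(to_solr_value(datetime(2025, 10, 27, 12, 0, tzinfo=cet)))  # 2025-10-27T10:00:00Z
```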
Error Handling
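Network issues and invalid Solr queries surface as exceptions. pysolr reports both connection failures (including timeouts) and Solr error responses as pysolr.SolrError, so it's good practice to wrap your Solr calls in a try...except block.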
```python
import pysolr

solr = pysolr.Solr('http://localhost:8983/solr/gettingstarted', timeout=10)

try:
    # Querying a field that isn't in the schema typically makes Solr return
    # an error response (HTTP 400), which pysolr raises as SolrError.
    results = solr.search('invalid_field:something')
    print(results.hits)
except pysolr.SolrError as e:
    # Connection failures and timeouts are also raised as SolrError.
    print(f"A Solr error occurred: {e}")
```
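One common source of query errors is user input containing query-syntax characters, such as an unbalanced parenthesis. A hypothetical helper, modelled on the character list that SolrJ's ClientUtils.escapeQueryChars uses, that backslash-escapes those characters so the input is matched literally:

```python
import re

# Characters with special meaning in the standard query syntax, plus whitespace.
_SPECIAL = re.compile(r'([\\+\-!():^\[\]"{}~*?|&;/\s])')

def escape_query_term(text):
    """Prefix each special character in user input with a backslash."""
    return _SPECIAL.sub(r"\\\1", text)

print(escape_query_term("C++ (2nd ed.)"))  # C\+\+\ \(2nd\ ed.\)
```

Usage would look like solr.search('name:' + escape_query_term(user_input)), keeping the field name and operators under your control and escaping only the user-supplied part.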
Using SolrCloud
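If you are using Solr in SolrCloud mode, the initialization is slightly different. Instead of a single core URL, you give pysolr the ZooKeeper connection string (wrapped in a pysolr.ZooKeeper object), and it uses ZooKeeper to find the Solr nodes serving the collection. pysolr's SolrCloud support requires the kazoo package.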
```python
import pysolr

# For SolrCloud, wrap the ZooKeeper hosts in a pysolr.ZooKeeper object
# (requires the kazoo package to be installed)
zookeeper = pysolr.ZooKeeper('localhost:9983,localhost:9984,localhost:9985')
solr_cloud = pysolr.SolrCloud(zookeeper, 'my_collection', timeout=10)

# Now you can use solr_cloud just like the regular Solr client
# solr_cloud.add(...)
# results = solr_cloud.search('*:*')
```
Alternative: The requests Library
For simple use cases, you can also call Solr's HTTP API directly with the requests library (a widely used third-party package, not part of Python's standard library). This gives you full control but requires you to construct the JSON payloads and parse the responses yourself.
Example of adding a document with requests:
```python
import requests
import json

solr_url = 'http://localhost:8983/solr/gettingstarted/update'
headers = {'Content-Type': 'application/json'}

# The update request, formatted as a JSON object containing an "add" command.
# The 'commitWithin' parameter tells Solr to commit automatically within 1500 ms.
data = {
    "add": {
        "doc": {
            "id": "doc_requests_1",
            "name": "Document added with requests",
            "category": "Example"
        },
        "commitWithin": 1500
    }
}

response = requests.post(solr_url, data=json.dumps(data), headers=headers, timeout=10)
if response.status_code == 200:
    print("Document added successfully.")
    print(response.text)
else:
    print(f"Error: {response.status_code}")
    print(response.text)
```
This approach is more verbose, but it is useful for understanding what happens under the hood or for environments where pysolr is not available.
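If even requests is unavailable, the standard library's urllib.request can build the same update. This sketch only constructs the request object; the commented-out urlopen call would send it, and urlopen raises urllib.error.HTTPError for non-2xx responses. The document id is a made-up example.

```python
import json
import urllib.request

SOLR_UPDATE_URL = "http://localhost:8983/solr/gettingstarted/update"

payload = {
    "add": {
        "doc": {"id": "doc_urllib_1", "name": "Document added with urllib", "category": "Example"},
        "commitWithin": 1500,  # ask Solr to commit within 1500 ms
    }
}

# Build (but do not send) a JSON POST to the update handler.
req = urllib.request.Request(
    SOLR_UPDATE_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# To send it (requires a running Solr):
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```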
Summary
| Task | pysolr Method | Notes |
|---|---|---|
| Initialize | pysolr.Solr(url) | Standard way for a single core. |
| Add/Update | solr.add(docs) | docs is a list of dicts; an existing id is replaced. |
| Commit | solr.commit() | Makes changes searchable. |
| Search | solr.search(query, **kwargs) | Returns a pysolr.Results object. |
| Delete by ID | solr.delete(id=...) | id can be a string or a list of strings. |
| Delete by Query | solr.delete(q='...') | Deletes all documents matching the query. |
| SolrCloud | pysolr.SolrCloud(pysolr.ZooKeeper(hosts), collection) | Use for distributed Solr setups; requires kazoo. |
For most projects, pysolr is the practical choice: it handles request building, response parsing, and error reporting for you, while raw HTTP calls remain useful for debugging and one-off scripts.
