杰瑞科技汇

How do you connect to and use a Solr client in Python?

This guide covers using Python with Apache Solr, focusing on pysolr, a popular and mature client library.


What is a Solr Client?

A Solr client is a Python library that allows your Python application to communicate with an Apache Solr server. It handles the low-level details of:

  • Constructing HTTP requests.
  • Formatting queries and parameters the way Solr's HTTP API expects (URL-encoded query parameters, plus JSON or XML request bodies for updates).
  • Sending the request to the Solr server.
  • Parsing the JSON response from Solr into Python data structures (like dictionaries and lists).

This lets you focus on your application logic instead of the intricacies of HTTP and Solr's API.
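To make those steps concrete, here is a minimal standard-library sketch of what a client does for a search: build the request URL, then parse a JSON response. The helper names `build_select_url` and `parse_select_response` are made up for illustration, and the sample response is hand-written in the shape Solr returns, not real server output.

```python
import json
from urllib.parse import urlencode


def build_select_url(core_url, query, **params):
    """Return the URL a client would request to search one core."""
    query_params = {"q": query, "wt": "json", **params}
    return f"{core_url.rstrip('/')}/select?{urlencode(query_params)}"


def parse_select_response(body):
    """Extract the hit count and the documents from a Solr JSON response body."""
    payload = json.loads(body)
    response = payload["response"]
    return response["numFound"], response["docs"]


# Hand-written sample in the shape of a Solr JSON response (illustrative only).
sample_body = (
    '{"responseHeader": {"status": 0, "QTime": 3}, '
    '"response": {"numFound": 1, "start": 0, "docs": [{"id": "doc_1"}]}}'
)

url = build_select_url("http://localhost:8983/solr/gettingstarted", "name:iPhone", rows=5)
hits, docs = parse_select_response(sample_body)
print(url)
print(hits, docs)
```

A client library wraps exactly this kind of plumbing, along with the HTTP call itself, behind methods such as search().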


The Recommended Library: pysolr

A widely used Python client for Solr is pysolr. It is mature, documented, and provides a simple, intuitive interface.

Installation

First, you need to install the library using pip:

pip install pysolr

Note: The package name on PyPI and the module you import are both pysolr.

Basic Setup

Before you can use the client, you need a running Solr instance with a "core" (or "collection" in SolrCloud mode) that you want to query.

Let's assume you have a Solr core named gettingstarted running at http://localhost:8983/solr/.

Here's how you initialize the client:

import pysolr
# Define the Solr URL for your core/collection
solr_url = 'http://localhost:8983/solr/gettingstarted'
# Initialize the Solr client
solr = pysolr.Solr(solr_url, timeout=10)
  • solr_url: The full URL to your Solr core.
  • timeout: A timeout in seconds for requests to the Solr server. This is good practice to prevent your application from hanging.
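If the Solr location differs between environments, one option is to assemble the core URL from configuration instead of hard-coding it. The sketch below assumes two hypothetical environment variables, `SOLR_BASE_URL` and `SOLR_CORE`, and a made-up helper `core_url`. None of these names come from pysolr.

```python
import os


def core_url(base_url, core_name):
    """Join a Solr base URL and a core name without doubling slashes."""
    return f"{base_url.rstrip('/')}/{core_name.strip('/')}"


# Hypothetical variable names; the defaults match the example in this article.
base = os.environ.get("SOLR_BASE_URL", "http://localhost:8983/solr")
core = os.environ.get("SOLR_CORE", "gettingstarted")

solr_url = core_url(base, core)
print(solr_url)
```

The resulting string can be passed to pysolr.Solr() exactly like the literal URL above.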

Core Operations with pysolr

Here are the most common operations you'll perform.

Indexing (Adding/Updating Documents)

To add data to Solr, you provide a list of Python dictionaries. Each dictionary represents a single document. The keys of the dictionary become the field names in Solr.

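Important: Unless your core runs in schemaless mode or has matching dynamic fields, the fields you use must be defined in the core's schema (managed-schema, or schema.xml in older setups).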

# A list of documents to add to Solr
docs_to_add = [
    {
        'id': 'doc_1',
        'name': 'Apple iPhone 13',
        'category': 'Electronics',
        'price': 799.99,
        'in_stock': True,
        'features': ['A15 Bionic chip', 'Dual-camera system', 'All-day battery life']
    },
    {
        'id': 'doc_2',
        'name': 'Samsung Galaxy S22',
        'category': 'Electronics',
        'price': 849.99,
        'in_stock': True,
        'features': ['Snapdragon 8 Gen 1', '108MP camera', 'Dynamic AMOLED 2X display']
    },
    {
        'id': 'doc_3',
        'name': 'The Great Gatsby',
        'category': 'Books',
        'author': 'F. Scott Fitzgerald',
        'price': 12.50,
        'in_stock': True,
        'features': ['Classic American novel', 'Jazz Age', 'Tragedy']
    }
]
# Add the documents to Solr
solr.add(docs_to_add)
print("Successfully added documents to Solr.")
  • solr.add() sends the documents to Solr.
  • If a document with the same id already exists, solr.add() replaces the whole stored document with the new one. In effect this is an "upsert" at the document level, not a field-by-field merge.
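For larger datasets it is common to send documents in batches instead of one huge list. The sketch below is one way to do that. `batched`, `index_in_batches`, and the batch size of 500 are illustrative choices, and `RecordingClient` is a stand-in used only so the example runs without a Solr server. Any object with an add(docs) method, such as a pysolr.Solr instance, could be passed instead.

```python
from itertools import islice


def batched(docs, size):
    """Yield successive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def index_in_batches(client, docs, size=500):
    """Send docs to client.add() batch by batch; return how many were sent."""
    total = 0
    for batch in batched(docs, size):
        client.add(batch)
        total += len(batch)
    return total


class RecordingClient:
    """Stand-in for a Solr client that only records what it was asked to add."""

    def __init__(self):
        self.calls = []

    def add(self, docs):
        self.calls.append(list(docs))


client = RecordingClient()
sent = index_in_batches(client, [{"id": f"doc_{i}"} for i in range(5)], size=2)
print(sent, [len(call) for call in client.calls])
```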

Committing Changes

When you add or update documents, the changes are not immediately searchable. You need to "commit" them to make them visible in the index.

  • Automatic Commit: Solr can be configured (autoCommit and autoSoftCommit in solrconfig.xml) to commit after a certain amount of time or after a certain number of documents have been added. Relying on this is usually better for performance than committing after every request.
  • Manual Commit: For immediate results, especially in scripts or tests, you should manually commit.
# Manually commit the changes to make them searchable
solr.commit()
print("Changes committed and are now searchable.")

Searching (Querying)

This is the core of what Solr does. The search() method is used to find documents.

# Perform a simple query
# The first argument is the query string. '*:*' matches all documents.
results = solr.search('*:*')
print(f"Found {results.hits} documents in {results.qtime} ms.")
# Iterate over the results
for doc in results:
    print(f"ID: {doc['id']}, Name: {doc['name']}, Price: {doc['price']}")
# --- More Complex Queries ---
# Query for a specific term in the 'name' field
results = solr.search('name:iPhone')
# Query with a filter (fq)
# This finds all documents in the 'Electronics' category.
results = solr.search('*:*', fq='category:Electronics')
# Query with multiple filters and sorting
# Finds Electronics in stock, sorted by price ascending.
results = solr.search(
    'features:camera',  # The main query
    fq='category:Electronics AND in_stock:true', # Filters
    sort='price asc'     # Sorting
)
print(f"\nFound {results.hits} matching 'camera' in Electronics (in stock).")
for doc in results:
    print(f"ID: {doc['id']}, Name: {doc['name']}, Price: {doc['price']}")
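When part of a query comes from user input, characters such as ':' or '(' have special meaning in Solr's standard query syntax and can break the query. A minimal sketch of escaping them with a backslash follows. `escape_term` is a hypothetical helper, not a pysolr function, and the character list is based on the documented special characters of the standard query parser.

```python
import re

# Special characters of the standard query parser, plus the && and || operators.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_term(text):
    """Prefix each special character (or operator) with a backslash."""
    return _SPECIAL.sub(r'\\\1', text)


user_input = "AC/DC"
query = f"name:{escape_term(user_input)}"
print(query)
```

The escaped string can then be passed as the query argument of search().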

The solr.search() method returns a Results object, which has useful attributes:

  • results.hits: The total number of documents matching the query.
  • results.docs: A list of the returned documents (as dictionaries).
  • results.qtime: The time Solr took to execute the query (in milliseconds).
  • results.debug: A dictionary of debug information, populated only when the query is sent with debugging enabled (for example debug='true'). It is useful for diagnosing scoring and performance.
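A single search returns only one page of documents, so walking a whole result set means paging with Solr's standard start and rows parameters, which can be passed to search() as keyword arguments. The sketch below assumes a client whose search() returns an object with the docs and hits attributes described above. `iter_documents` and `FakeClient` are illustrative names, and `FakeClient` exists only so the example runs without a server.

```python
from types import SimpleNamespace


def iter_documents(client, query, page_size=100, **params):
    """Yield every document matching `query`, fetching one page at a time."""
    start = 0
    while True:
        results = client.search(query, start=start, rows=page_size, **params)
        docs = list(results.docs)
        if not docs:
            return
        yield from docs
        start += len(docs)
        if start >= results.hits:
            return


class FakeClient:
    """Stand-in that pages over an in-memory list like a search would."""

    def __init__(self, docs):
        self._docs = docs

    def search(self, query, start=0, rows=10, **params):
        page = self._docs[start:start + rows]
        return SimpleNamespace(docs=page, hits=len(self._docs))


fake = FakeClient([{"id": i} for i in range(7)])
ids = [doc["id"] for doc in iter_documents(fake, "*:*", page_size=3)]
print(ids)
```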

Deleting Documents

You can delete documents by their unique ID or by a query.

# --- Delete by ID ---
# Delete a single document
solr.delete('doc_1') # Deletes the document with id='doc_1'
solr.commit() # Commit the deletion
# Delete multiple documents by a list of IDs
solr.delete(['doc_2', 'doc_3'])
solr.commit()
# --- Delete by Query ---
# Delete all documents in the 'Books' category
solr.delete(q='category:Books')
solr.commit()
print("Documents deleted.")

Advanced Usage

Handling Data Types

pysolr handles common Python types when it serializes documents, such as booleans (True, False) and numbers. For dates, the safest option is to provide strings in Solr's UTC date format (e.g., '2025-10-27T10:00:00Z').
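One way to produce date strings in that format is with the standard library. The helper name `to_solr_date` is made up for this sketch, and treating naive datetimes as UTC is an assumption you may need to change for your data.

```python
from datetime import datetime, timedelta, timezone


def to_solr_date(value):
    """Format a datetime as a Solr UTC timestamp such as 2025-10-27T10:00:00Z."""
    if value.tzinfo is None:
        # Assumption: naive datetimes are already in UTC.
        value = value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


naive = datetime(2025, 10, 27, 10, 0, 0)
aware = datetime(2025, 10, 27, 12, 0, 0, tzinfo=timezone(timedelta(hours=2)))
print(to_solr_date(naive))
print(to_solr_date(aware))
```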

Error Handling

Network issues or invalid Solr queries can raise exceptions. It's good practice to wrap your Solr calls in a try...except block.

import pysolr
solr = pysolr.Solr('http://localhost:8983/solr/gettingstarted', timeout=10)
try:
    # This fails if Solr is unreachable, the core doesn't exist, or the query is rejected
    results = solr.search('invalid_field:something')
    print(results.hits)
except pysolr.SolrError as e:
    # pysolr reports both connection problems and Solr error responses as SolrError
    print(f"A Solr error occurred: {e}")
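Transient failures such as a brief network outage are often worth retrying before giving up. Below is a small, generic retry helper with exponential backoff. `with_retries` and its parameters are hypothetical and not part of pysolr. With pysolr you would typically pass retry_on=(pysolr.SolrError,) and wrap the call in a lambda.

```python
import time


def with_retries(operation, retry_on=(Exception,), attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on a listed exception, wait and retry up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...


# Demonstration with a function that fails twice before succeeding.
state = {"calls": 0}
delays = []


def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


outcome = with_retries(flaky, retry_on=(ConnectionError,), sleep=delays.append)
print(outcome, state["calls"], delays)
```

Passing delays.append as the sleep function keeps the demonstration instant. In real use, leave the default so the helper actually waits between attempts.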

Using SolrCloud

If you are using Solr in SolrCloud mode, the initialization is slightly different. You create a ZooKeeper object from a comma-separated string of ZooKeeper hosts and pass it to SolrCloud together with the collection name. This mode also requires the kazoo package.

import pysolr
# For SolrCloud, describe the ZooKeeper ensemble as a comma-separated host string
zookeeper = pysolr.ZooKeeper('localhost:9983,localhost:9984,localhost:9985')
solr_cloud = pysolr.SolrCloud(zookeeper, 'my_collection', timeout=10)
# Now you can use solr_cloud just like the regular Solr client
# solr_cloud.add(...)
# results = solr_cloud.search('*:*')

Alternative: The requests Library

For simple use cases, or if you prefer not to depend on pysolr, you can call Solr's HTTP API directly with the requests library (a third-party package, which pysolr itself also depends on). This gives you full control but requires you to manually construct the JSON payloads and parse the responses.

Example of adding a document with requests:

import requests
import json
solr_url = 'http://localhost:8983/solr/gettingstarted/update'
headers = {'Content-Type': 'application/json'}
# The document to add, wrapped in a JSON "add" command
# The 'commitWithin' parameter tells Solr to commit automatically within 1500 ms.
data = {
    "add": {
        "doc": {
            "id": "doc_requests_1",
            "name": "Document added with requests",
            "category": "Example"
        },
        "commitWithin": 1500
    }
}
response = requests.post(solr_url, data=json.dumps(data), headers=headers)
if response.status_code == 200:
    print("Document added successfully.")
    print(response.text)
else:
    print(f"Error: {response.status_code}")
    print(response.text)

This approach is more verbose but can be useful for understanding what's happening under the hood or for integrating with systems where you can't add new libraries.
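The same /update endpoint accepts other JSON commands as well. As a sketch, the helpers below build the payloads for deleting by ID, deleting by query, and committing. The function names are made up for illustration, and the actual HTTP call is left out so the example runs without a server.

```python
import json


def delete_by_id_command(doc_id):
    """JSON update command that deletes the document with this id."""
    return {"delete": {"id": doc_id}}


def delete_by_query_command(query):
    """JSON update command that deletes every document matching the query."""
    return {"delete": {"query": query}}


def commit_command():
    """JSON update command that asks Solr to commit pending changes."""
    return {"commit": {}}


for command in (
    delete_by_id_command("doc_requests_1"),
    delete_by_query_command("category:Example"),
    commit_command(),
):
    # Each string would be sent as the body of a POST to .../update
    print(json.dumps(command))
```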

Summary

Task            | pysolr Method                         | Notes
Initialize      | pysolr.Solr(url)                      | Standard way for a single core.
Add/Update      | solr.add(docs)                        | docs is a list of dicts.
Commit          | solr.commit()                         | Makes changes searchable.
Search          | solr.search(query, **kwargs)          | Returns a Results object.
Delete by ID    | solr.delete(id)                       | id can be a string or list of strings.
Delete by Query | solr.delete(q='...')                  | Deletes all documents matching the query.
SolrCloud       | pysolr.SolrCloud(zookeeper, collection) | Use for distributed Solr setups.

For any serious project, pysolr is the recommended choice due to its simplicity, robustness, and feature set.
