
How do you connect to and query data in Elasticsearch efficiently with Python?

Here is a comprehensive guide to using Elasticsearch with Python, covering everything from setup and connection to indexing, searching, and best practices.


We will focus on the official Elasticsearch Python client (the elasticsearch package on PyPI), which is the recommended way to interact with Elasticsearch from Python.


Prerequisites

Before you start, make sure you have:

  1. Python 3.6+ installed on your system.
  2. An Elasticsearch instance running.
    • The easiest way is via Docker:
      docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    • Or, you can download and run it directly from the Elasticsearch website.
  3. Verify your cluster is running by visiting http://localhost:9200 in your browser. You should see a JSON response with cluster information.

Installation

First, you need to install the official Elasticsearch Python client. It's recommended to use a virtual environment.

# Create and activate a virtual environment (optional but good practice)
python -m venv es-env
source es-env/bin/activate  # On Windows: es-env\Scripts\activate
# Install the elasticsearch client library
pip install elasticsearch

Connecting to Elasticsearch

The first step in any Python script is to establish a connection to your Elasticsearch cluster.

from elasticsearch import Elasticsearch
# The 8.x client requires an explicit URL; connect to the local, unsecured cluster
es = Elasticsearch("http://localhost:9200")
# You can also pass a list of nodes
# es = Elasticsearch(["http://localhost:9200"])
# To verify the connection, you can ping the cluster
if es.ping():
    print("Successfully connected to Elasticsearch!")
else:
    print("Could not connect to Elasticsearch!")
# To see the cluster's information
# print(es.info())

For production, you should use environment variables for configuration (e.g., ELASTICSEARCH_URL).
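
A minimal sketch of that pattern (ELASTICSEARCH_URL is a variable name you export yourself; the client does not read it automatically):

import os
from elasticsearch import Elasticsearch

# Fall back to the local development cluster if the variable is not set
es_url = os.environ.get("ELASTICSEARCH_URL", "http://localhost:9200")
es = Elasticsearch(es_url)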


Indexing Data (Creating Documents)

In Elasticsearch, you store data in indices (similar to databases in SQL). Within an index, you store documents (similar to rows/records).

There are two main ways to index data:

a) Indexing a Single Document

You use the index() method. If the document ID is not provided, Elasticsearch will generate one automatically.

# Define the document data
doc = {
    "author": "John Doe",
    "text": "Elasticsearch is a powerful search and analytics engine.",
    "timestamp": "2025-10-27T10:00:00",
    "tags": ["search", "database", "nosql"]
}
# Index the document into the 'articles' index with ID 1
# The 'refresh' parameter makes the document searchable immediately (good for testing)
response = es.index(index="articles", id=1, document=doc, refresh="wait_for")
print(f"Document indexed with ID: {response['_id']}")
print(f"Version: {response['_version']}")

b) Indexing Multiple Documents (Bulk Indexing)

For better performance, it's highly recommended to use the bulk() helper function when indexing many documents.

from elasticsearch.helpers import bulk
# Define a list of documents to index
docs = [
    {
        "_index": "articles",
        "_id": 2,
        "_source": {
            "author": "Jane Smith",
            "text": "Python is a versatile programming language.",
            "timestamp": "2025-10-27T11:00:00",
            "tags": ["python", "programming"]
        }
    },
    {
        "_index": "articles",
        "_id": 3,
        "_source": {
            "author": "John Doe",
            "text": "Data analysis is made easy with Python libraries like Pandas.",
            "timestamp": "2025-10-27T12:00:00",
            "tags": ["python", "data", "analysis"]
        }
    }
]
# Use the bulk helper to index all documents at once
# (by default, bulk() raises BulkIndexError if any action fails; 'errors' is a list of per-item errors)
success, errors = bulk(es, docs)
print(f"Successfully indexed {success} documents.")
print(f"Errors: {len(errors)}")

Searching Data

This is where Elasticsearch shines. You can search using a simple query string or a powerful JSON-based query language (Query DSL).

a) Simple Query String Search

Good for quick, simple searches.

# Search for the term 'python' in all fields
# (in the 8.x client, pass the query directly via the 'query' parameter; 'body' is deprecated)
query = {
    "query_string": {
        "query": "python"
    }
}
# Execute the search
response = es.search(index="articles", query=query)
# Print the results
print(f"Found {response['hits']['total']['value']} documents.")
for hit in response['hits']['hits']:
    print(f"ID: {hit['_id']}, Author: {hit['_source']['author']}, Text: {hit['_source']['text']}")

b) Using the Query DSL (More Powerful & Recommended)

This gives you full control over your search. Let's search for documents where the author is "John Doe" AND the text contains "search".

# Define a more complex query
query = {
    "bool": {
        "must": [  # All clauses must match
            { "match": { "author": "John Doe" } },
            { "match": { "text": "search" } }
        ]
    }
}
response = es.search(index="articles", query=query)
print(f"Found {response['hits']['total']['value']} documents matching the query.")
for hit in response['hits']['hits']:
    print(f"Score: {hit['_score']} -> ID: {hit['_id']}, Text: {hit['_source']['text']}")

Common Operations

a) Getting a Document by ID

from elasticsearch import NotFoundError

# Get the document with ID '1'
# (es.get raises NotFoundError for a missing document rather than returning found=False)
try:
    response = es.get(index="articles", id=1)
    print(f"Found document: {response['_source']}")
except NotFoundError:
    print("Document not found.")

b) Updating a Document

You can update a document entirely or use scripts for partial updates.

# Update the entire document with ID '1'
updated_doc = {
    "author": "John Doe (Updated)",
    "text": "Elasticsearch is a powerful search and analytics engine. It scales well!",
    "timestamp": "2025-10-27T10:00:00",
    "tags": ["search", "database", "nosql", "updated"]
}
es.index(index="articles", id=1, document=updated_doc, refresh="wait_for")
# Partial update using a script (e.g., increment a counter), assuming a 'views' field exists
# script = {
#     "source": "ctx._source.views += 1",
#     "lang": "painless"
# }
# es.update(index="articles", id=1, script=script)

c) Deleting a Document

# Delete the document with ID '2'
response = es.delete(index="articles", id=2)
if response['result'] == 'deleted':
    print("Document deleted successfully.")

d) Deleting an Index

Warning: This is a destructive operation and will delete all data in the index.

# Delete the entire 'articles' index
if es.indices.exists(index="articles"):
    es.indices.delete(index="articles")
    print("Index 'articles' deleted.")
else:
    print("Index 'articles' does not exist.")

Working with Mappings (Data Types)

Mappings define the schema of your index, including the data type of each field. It's good practice to define mappings beforehand to ensure correct data handling and enable powerful features like full-text search.

# Define the mapping for the 'articles' index
mapping = {
    "mappings": {
        "properties": {
            "author": {
                "type": "text"  # Full-text search field
            },
            "text": {
                "type": "text",
                "analyzer": "english" # Use the English analyzer for better stemming
            },
            "timestamp": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_millis"
            },
            "tags": {
                "type": "keyword"  # Exact value field, good for filtering and aggregations
            }
        }
    }
}
# Create the index with the mapping
if not es.indices.exists(index="articles"):
    es.indices.create(index="articles", body=mapping)
    print("Index 'articles' created with mapping.")
else:
    print("Index 'articles' already exists.")

Best Practices

  1. Use Bulk Operations: Always use elasticsearch.helpers.bulk for indexing, updating, or deleting large numbers of documents. It's significantly faster than making individual requests.
  2. Manage Connections: For long-running applications (like web servers), create a single Elasticsearch client instance and reuse it. Don't create a new client for every request.
  3. Handle Timeouts: Network issues can cause requests to hang. Bound client-side waits with request_timeout, either on the client (Elasticsearch("http://localhost:9200", request_timeout=30)) or per call via es.options(request_timeout=30).search(...); see the sketch after this list.
  4. Error Handling: Elasticsearch operations can raise exceptions (e.g., connection errors, or NotFoundError for a missing document). Use try...except blocks to handle them gracefully, as shown below.
  5. Use Async for High-Performance Apps: If you're building a high-performance application (e.g., an API), use the AsyncElasticsearch client that ships with the official library (installed via pip install "elasticsearch[async]") for non-blocking, asynchronous operations; see the async sketch below.
  6. Security: In production, always enable security features (TLS/SSL, authentication). The client supports this by passing an https:// URL together with api_key or basic_auth credentials (and ca_certs for your certificate authority).
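
A minimal sketch of points 3 and 4, reusing the es client and the 'articles' index from earlier (ID 999 is a deliberately missing document):

from elasticsearch import ApiError, NotFoundError

try:
    # Bound the client-side wait for this single call to 5 seconds
    response = es.options(request_timeout=5).get(index="articles", id=999)
    print(response["_source"])
except NotFoundError:
    # The document (or the index) does not exist
    print("Document 999 was not found.")
except ApiError as err:
    # Any other error reported by the cluster (bad request, auth failure, ...)
    print(f"Elasticsearch returned an error: {err}")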
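
And a minimal async sketch for point 5 (requires pip install "elasticsearch[async]"; the URL assumes the local development cluster used throughout this guide):

import asyncio
from elasticsearch import AsyncElasticsearch

async def main():
    async_es = AsyncElasticsearch("http://localhost:9200")
    try:
        # Same API surface as the synchronous client, but every call is awaitable
        response = await async_es.search(index="articles", query={"match_all": {}})
        print(f"Async search found {response['hits']['total']['value']} documents.")
    finally:
        await async_es.close()

asyncio.run(main())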