Of course! Python and ElastAlert are a powerful combination for monitoring and alerting on data in Elasticsearch. Here's a comprehensive guide covering what ElastAlert is, how to install it, and how to write rules.
What is ElastAlert?
ElastAlert is a simple but powerful framework for alerting on anomalies, spikes, or other patterns in data stored in Elasticsearch. It works by:
- Querying Elasticsearch: It runs periodic queries (e.g., "find all errors in the last 5 minutes") against your Elasticsearch indices.
- Applying Logic: It applies rules you define to the results of the query. This logic can be as simple as "if the count is greater than 10" or as complex as "if a new user logs in from 3 different countries in 1 minute."
- Triggering Alerts: If the rule's conditions are met, ElastAlert sends an alert through a specified channel (e.g., email, Slack, PagerDuty, custom webhook).
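The query → logic → alert cycle above can be sketched in a few lines of Python. This is an illustrative mock of the flow, not ElastAlert's actual implementation; the field names (`@timestamp`, `log_level`) are assumptions about your log schema.

```python
# Minimal sketch of ElastAlert's query -> logic -> alert cycle.
# The Elasticsearch call itself is omitted so the logic is visible on its own.

def build_query(lucene_query, minutes):
    """Build the kind of time-bounded query body ElastAlert issues."""
    return {
        "bool": {
            "must": [{"query_string": {"query": lucene_query}}],
            "filter": [{"range": {"@timestamp": {"gte": f"now-{minutes}m"}}}],
        }
    }

def should_alert(hit_count, num_events):
    """The 'logic' step: trigger when the window holds at least num_events matches."""
    return hit_count >= num_events

query = build_query("log_level: ERROR", 5)
print(should_alert(hit_count=7, num_events=5))  # True: 7 errors >= threshold of 5
```

In real operation, ElastAlert runs the query on a schedule, feeds the hit count (or the hits themselves) into the rule's logic, and dispatches alerts for matches.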
Key Concepts:
- Rule: A YAML file that defines what to search for and what to do when a match is found. This is the core of ElastAlert.
- Alert Type: The method of notification. ElastAlert supports many out-of-the-box (email, Slack, HipChat, PagerDuty, etc.) and allows for custom Python scripts.
- Rule Type: The detection logic to apply. Common types include `frequency` (at least N events in a timeframe), `any` (every matching document), `spike` (a sudden change in event rate), `flatline` (too few events), `new_term` (a never-before-seen value in a field), and `metric_aggregation` (thresholds on sums, averages, etc.).
- ElastAlert-Server: An optional community component that runs ElastAlert as a long-running web service, exposing a REST API (and, paired with a UI such as Praeco, a web interface) for managing rules and viewing alert history.
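To make the aggregation-based types concrete, here is a hypothetical `metric_aggregation` rule; the field name and threshold are invented for illustration, and exact required options vary between ElastAlert versions:

```yaml
# Alert when the average response time over each 5-minute window exceeds 500 ms
name: High Average Latency
type: metric_aggregation
index: "logs-*"
metric_agg_key: "response_time_ms"   # hypothetical numeric field in your documents
metric_agg_type: "avg"
max_threshold: 500
buffer_time:
  minutes: 5
alert:
- "email"
email:
- "your-email@example.com"
```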
Installation and Setup
Prerequisites
- Python 3.6+
- Elasticsearch (a running instance)
- `elasticsearch` Python library: `pip install elasticsearch`
- `elastalert` Python library: `pip install elastalert` (for the maintained fork, ElastAlert 2, use `pip install elastalert2`)
Directory Structure
It's good practice to create a dedicated directory for ElastAlert.
```shell
mkdir elastalert
cd elastalert
```
Inside this directory, you'll create:
- `config.yaml`: The main configuration file.
- `rules/`: A directory to store all your rule YAML files.

(You don't need to write your own test script; ElastAlert ships with an `elastalert-test-rule` command for testing rules.)
Configuration File (config.yaml)
This file tells ElastAlert how to connect to your Elasticsearch instance.
```yaml
# config.yaml

# Global Elasticsearch configuration
es_host: "localhost"   # or your Elasticsearch host
es_port: 9200

# Optional: if you have security enabled
# es_username: "your_username"
# es_password: "your_password"
# use_ssl: True
# verify_certs: True
# ca_certs: /path/to/your/ca.crt

# Where ElastAlert looks for rule files
rules_folder: rules

# The index ElastAlert writes its own status to.
# This is important for tracking rule runs and avoiding duplicate alerts.
writeback_index: elastalert_status

# How often to run queries
run_every:
  seconds: 60

# How far back each query looks for data
# (if a rule uses a 5-minute timeframe, this should be at least 5 minutes)
buffer_time:
  minutes: 15
```
Creating Your First Rule
Let's create a simple rule: Alert if there are more than 5 ERROR logs in the last 5 minutes.
Define the Rule (rules/my_first_rule.yaml)
Create a file named my_first_rule.yaml inside the rules directory.
```yaml
# rules/my_first_rule.yaml

# Rule name and description
name: High Error Rate Alert
type: frequency
index: "logs-*"   # The Elasticsearch index pattern to search
num_events: 5     # The number of events that must match the filter
timeframe:
  minutes: 5      # The timeframe in which the events must occur

# Filter: the Elasticsearch query to find the documents
filter:
- query:
    query_string:
      query: "log_level: ERROR"

# Alert: what to do when the rule is triggered
alert:
- "email"
email:
- "your-email@example.com"
```
Let's break this down:
- `name: High Error Rate Alert`: A descriptive name for the rule.
- `type: frequency`: Triggers when at least `num_events` matching documents occur within `timeframe`. (The `any` type, by contrast, alerts on every matching document and takes no threshold.)
- `index: "logs-*"`: The index pattern to search in.
- `num_events: 5`: The threshold.
- `timeframe: { minutes: 5 }`: The time window in which the threshold must be reached.
- `filter`: The core Elasticsearch query. We are looking for documents where the `log_level` field is `ERROR`.
- `alert`: Specifies the alert type. Here, we're using the built-in `email` type.
- `email`: A list of email addresses to send the alert to.
Test the Rule
ElastAlert ships with a command, `elastalert-test-rule`, to test your rules before you schedule them.

```shell
# Navigate to your elastalert directory
cd /path/to/elastalert

# Run the rule against recent data without sending real alerts
elastalert-test-rule --config config.yaml rules/my_first_rule.yaml
```
If your Elasticsearch instance has at least 5 ERROR logs in the last 5 minutes, you will see output in your console simulating the alert. If not, it will report that no matches were found.
Run ElastAlert
Once your rule is tested, you can run ElastAlert in the foreground to start monitoring.
```shell
python -m elastalert.elastalert --verbose --config config.yaml
```
You will see logs as ElastAlert runs its queries. If a rule is triggered, you'll receive an email. To stop it, press Ctrl+C.
Advanced Rule Examples
Example 1: Spike Detection (Rate of Change)
This rule alerts if the rate of errors suddenly increases.
```yaml
# rules/spike_detection_rule.yaml
name: Error Rate Spike
type: spike
index: "logs-*"

# The query to find the documents to count
filter:
- query:
    query_string:
      query: "log_level: ERROR"

# The comparison windows: the current window's count is compared
# against the count from the preceding window of the same length
timeframe:
  minutes: 5

spike_height: 2    # current count must exceed 2x the reference count
spike_type: "up"   # only trigger on an increase ("down" and "both" also exist)

# Optional: require a minimum current count before a spike can trigger
# threshold_cur: 10

alert:
- "slack"
slack_webhook_url: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
```
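The spike comparison boils down to a ratio check between the current window and the reference window. This sketch is illustrative logic only, not ElastAlert's actual implementation:

```python
# Illustrative spike check: compare the current window's event count
# against the preceding (reference) window's count.

def is_spike(current, reference, spike_height, spike_type="up"):
    """Return True when current deviates from reference by a factor of spike_height."""
    if reference == 0:
        # No baseline: any activity counts as an upward spike
        return current > 0 and spike_type in ("up", "both")
    ratio = current / reference
    if spike_type in ("up", "both") and ratio > spike_height:
        return True
    if spike_type in ("down", "both") and ratio < 1 / spike_height:
        return True
    return False

print(is_spike(current=50, reference=20, spike_height=2))  # True: 50 > 2 * 20
print(is_spike(current=30, reference=20, spike_height=2))  # False: 30 <= 2 * 20
```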
Example 2: new_term Rule (New Value)
This rule alerts if a new value appears in a field for the first time. Useful for detecting new security events or application errors.
```yaml
# rules/new_error_rule.yaml
name: New Error Type Detected
type: new_term
index: "logs-*"

# The field(s) to watch for never-before-seen values
fields:
- "error_message"

# The query to find the documents
filter:
- query:
    query_string:
      query: "log_level: ERROR"

alert:
- "email"
email:
- "your-email@example.com"
```
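Conceptually, a `new_term` rule seeds a set of known values from a historical window (via a terms aggregation), then alerts on any value it has not seen before. A rough, illustrative sketch of that behavior:

```python
# Illustrative new-term tracking: alert only on values never seen before.

def find_new_terms(seen, incoming):
    """Return the values not in `seen`, and record them as seen."""
    new = [value for value in incoming if value not in seen]
    seen.update(new)
    return new

# Values observed during the historical warm-up window
seen = {"TimeoutError", "ConnectionReset"}

print(find_new_terms(seen, ["TimeoutError", "NullPointerException"]))
# ['NullPointerException'] - only the unseen value would trigger an alert
```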
Example 3: Custom Python Alert
Sometimes the built-in alerts aren't enough. You can write your own Python script.
Create the alert script (my_custom_alert.py)
```python
# my_custom_alert.py
import requests

from elastalert.alerts import Alerter
from elastalert.util import EAException


class CustomAlert(Alerter):
    # Options that must be present in the rule YAML for this alerter
    required_options = frozenset(["webhook_url"])

    def alert(self, matches):
        # matches is a list of dictionaries; each dict is a matching document
        for match in matches:
            try:
                # Your custom logic here
                print(f"Custom Alert Triggered! Match: {match}")
                # Example: send the match to the webhook defined in the rule
                requests.post(self.rule["webhook_url"], json=match)
            except requests.RequestException as e:
                raise EAException(f"Error sending custom alert: {e}")

    def get_info(self):
        return {"type": "Custom Alert"}
```
Update your rule to use the custom alert
```yaml
# rules/custom_alert_rule.yaml
name: Custom Webhook Alert
type: any
index: "logs-*"

filter:
- query:
    query_string:
      query: "log_level: CRITICAL"

# Reference the alerter as module.ClassName;
# the module must be importable (e.g., on PYTHONPATH)
alert:
- "my_custom_alert.CustomAlert"

# Option consumed by the alerter (see required_options)
webhook_url: "https://api.your-service.com/webhook"
```
When you run this rule, it will execute the alert method in my_custom_alert.py.
Production Deployment: elastalert-server
For production, running ElastAlert in a foreground terminal is not ideal. One common approach is to use elastalert-server.
- Install elastalert-server: It is a community Node.js application (not a pip package), usually run from its Git repository or as a Docker container. By default it exposes a REST API on `http://localhost:3030`.
- Use a Web UI: The Praeco project builds a web interface on top of elastalert-server, letting you create, test, and manage rules through a browser, with a dashboard showing rule status and a history of fired alerts.
- Alternatively, run plain ElastAlert as a background service (e.g., using `systemd` or `supervisord`).
This setup is more robust, easier to manage, and provides better visibility into your alerting system.
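If you prefer running plain ElastAlert under `systemd`, a minimal unit file might look like the following; the user, install path, and virtualenv location are assumptions for your environment:

```ini
# /etc/systemd/system/elastalert.service  (hypothetical paths)
[Unit]
Description=ElastAlert
After=network.target elasticsearch.service

[Service]
Type=simple
User=elastalert
WorkingDirectory=/opt/elastalert
ExecStart=/opt/elastalert/venv/bin/python -m elastalert.elastalert --verbose --config /opt/elastalert/config.yaml
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable --now elastalert` so it starts on boot and restarts after failures.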
