杰瑞科技汇

How Do You Implement Alert Rules with Python ElastAlert?

Python and ElastAlert are a practical combination for monitoring and alerting on data in Elasticsearch. This guide covers what ElastAlert is, how to install it, and how to write rules.


What is ElastAlert?

ElastAlert is a simple but powerful framework for alerting on anomalies, spikes, and other patterns of interest in data stored in Elasticsearch. It works by:

  1. Querying Elasticsearch: It runs periodic queries (e.g., "find all errors in the last 5 minutes") against your Elasticsearch indices.
  2. Applying Logic: It applies rules you define to the results of the query. This logic can be as simple as "if the count is greater than 10" or as complex as "if a new user logs in from 3 different countries in 1 minute."
  3. Triggering Alerts: If the rule's conditions are met, ElastAlert sends an alert through a specified channel (e.g., email, Slack, PagerDuty, custom webhook).
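
The three steps above can be sketched in plain Python. This is only an illustration of the query, logic, alert cycle, not ElastAlert's implementation: run_rule_once, fake_search, and the in-memory sent list are hypothetical stand-ins for Elasticsearch and a notification channel.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]


def run_rule_once(
    search: Callable[[], List[Document]],
    threshold: int,
    send_alert: Callable[[str], None],
) -> bool:
    """Run one query -> logic -> alert cycle; return True if an alert fired."""
    hits = search()                      # 1. query the data store
    if len(hits) <= threshold:           # 2. apply the rule logic ("count greater than N")
        return False
    # 3. trigger the alert through the configured channel
    send_alert(f"{len(hits)} matching documents (threshold {threshold})")
    return True


# Hypothetical stand-in for an Elasticsearch query returning 12 error documents.
def fake_search() -> List[Document]:
    return [{"log_level": "ERROR", "message": f"failure {i}"} for i in range(12)]


# Hypothetical stand-in for email, Slack, or another channel.
sent: List[str] = []
fired = run_rule_once(fake_search, threshold=10, send_alert=sent.append)
print(fired, sent)
```

In ElastAlert itself the search, the logic, and the channel are all chosen declaratively in the rule file rather than passed in as functions.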

Key Concepts:

  • Rule: A YAML file that defines what to search for and what to do when a match is found. This is the core of ElastAlert.
  • Alert Type: The method of notification. ElastAlert supports many out-of-the-box (email, Slack, HipChat, PagerDuty, etc.) and allows for custom Python scripts.
  • Rule Type: The logic applied to the query results. Common types include any (every matching document produces a match), frequency (at least a given number of matching documents within a timeframe), spike, new_term, and metric_aggregation (thresholds on sums, averages, and similar metrics).
  • ElastAlert Server: An optional, separately maintained community project that wraps ElastAlert in a long-running service with a REST API for managing rules. It is not part of the elastalert package itself.

Installation and Setup

Prerequisites

  • Python 3.6+
  • Elasticsearch (a running instance)
  • elasticsearch Python library: pip install elasticsearch
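  • elastalert Python library: pip install elastalert (the original Yelp project is no longer maintained; its actively maintained fork, ElastAlert 2, is installed with pip install elastalert2)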

Directory Structure

It's good practice to create a dedicated directory for ElastAlert.

mkdir elastalert
cd elastalert

Inside this directory, you'll create:

  • config.yaml: The main configuration file.
  • rules/: A directory to store all your rule YAML files.

You don't need to create a test script: installing the elastalert package provides an elastalert-test-rule command for testing rules.

Configuration File (config.yaml)

This file tells ElastAlert how to connect to your Elasticsearch instance.

# config.yaml
# Directory that ElastAlert scans for rule files
rules_folder: rules
# Elasticsearch connection
es_host: "localhost"  # or your Elasticsearch host
es_port: 9200
# Optional: if security is enabled
# es_username: "your_username"
# es_password: "your_password"
# use_ssl: True
# verify_certs: True
# ca_certs: /path/to/your/ca.crt
# Index where ElastAlert stores its own state (query runs, sent alerts, errors).
# This is what lets it track rule runs and avoid duplicate alerts.
writeback_index: elastalert_status
# How often ElastAlert queries Elasticsearch
run_every:
  seconds: 60
# Size of the window each query covers, reaching back from the time the query runs.
# Late-arriving documents are still picked up as long as they land inside this window.
buffer_time:
  minutes: 15
# How long ElastAlert keeps retrying an alert that failed to send
alert_time_limit:
  days: 2
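
Durations such as run_every and buffer_time are written as mappings of a time unit to an amount. A minimal sketch of how such mappings translate into Python timedelta objects, and of the time range a single run covers when each query reaches back buffer_time from the current time. The config dict mirrors the values above; query_window and the fixed timestamp are illustrative names, not part of ElastAlert.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Values mirrored from config.yaml, written as plain dicts.
config = {
    "run_every": {"seconds": 60},
    "buffer_time": {"minutes": 15},
}

# A unit -> amount mapping maps directly onto timedelta keyword arguments.
run_every = timedelta(**config["run_every"])
buffer_time = timedelta(**config["buffer_time"])


def query_window(now: datetime) -> Tuple[datetime, datetime]:
    """Return the (start, end) range covered by a run that starts at `now`."""
    return now - buffer_time, now


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
start, end = query_window(now)
print(start.isoformat(), "->", end.isoformat())
# Consecutive runs overlap by this much, which is what allows late documents to be seen.
print("overlap between consecutive runs:", buffer_time - run_every)
```

With these values each run looks at the previous 15 minutes, and consecutive runs share 14 minutes of data.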

Creating Your First Rule

Let's create a simple rule: Alert if there are at least 5 ERROR logs within a 5-minute window.

Define the Rule (rules/my_first_rule.yaml)

Create a file named my_first_rule.yaml inside the rules directory.

# rules/my_first_rule.yaml
# Rule name (must be unique across all rules)
name: High Error Rate Alert
# frequency: match when at least num_events documents occur within timeframe
type: frequency
index: "logs-*" # The Elasticsearch index pattern to search
num_events: 5   # The number of events that must match the filter
timeframe:
  minutes: 5    # The timeframe in which the events must occur
# Filter: The Elasticsearch query to find the documents
filter:
  - query:
      query_string:
        query: "log_level: ERROR"
# Alert: What to do when the rule is triggered
alert:
  - "email"
email:
  - "your-email@example.com"

Let's break this down:

  • name: High Error Rate Alert: A descriptive name for the rule. It must be unique.
  • type: frequency: This rule type triggers when at least num_events matching documents occur within timeframe. (The any type, by contrast, matches on every single document returned by the filter and ignores num_events.)
  • index: "logs-*": The index pattern to search in.
  • num_events: 5: The threshold. The rule matches once the count reaches this number.
  • timeframe: { minutes: 5 }: The time window within which the threshold must be reached.
  • filter: This is the core Elasticsearch query. We are looking for documents where the log_level field is ERROR.
  • alert: Specifies the alert type. Here, we're using the built-in email type.
  • email: A list of email addresses to send the alert to. Delivery uses the SMTP server set by smtp_host (localhost by default).
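
The threshold check described by num_events and timeframe can be sketched as a sliding window over event timestamps. This is a simplified illustration under the assumption that timestamps are available as datetime objects; first_trigger is a hypothetical helper, not an ElastAlert function.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Iterable, Optional


def first_trigger(
    timestamps: Iterable[datetime],
    num_events: int,
    timeframe: timedelta,
) -> Optional[datetime]:
    """Return the time at which num_events events first fit inside timeframe, else None."""
    window: Deque[datetime] = deque()
    for ts in sorted(timestamps):
        window.append(ts)
        # Drop events that are older than the timeframe relative to the newest event.
        while ts - window[0] > timeframe:
            window.popleft()
        if len(window) >= num_events:
            return ts
    return None


base = datetime(2024, 1, 1, 12, 0, 0)
# Five errors one minute apart: all five fit inside a 5-minute window.
burst = [base + timedelta(minutes=i) for i in range(5)]
# Five errors two minutes apart: at most three fit inside a 5-minute window.
sparse = [base + timedelta(minutes=2 * i) for i in range(5)]

print(first_trigger(burst, num_events=5, timeframe=timedelta(minutes=5)))
print(first_trigger(sparse, num_events=5, timeframe=timedelta(minutes=5)))
```

The same five errors therefore trigger or stay silent depending only on how tightly they cluster in time.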

Test the Rule

ElastAlert installs a helper command, elastalert-test-rule, for testing your rules before you schedule them.

# Navigate to your elastalert directory
cd /path/to/elastalert
# Run the rule in test mode (alerts are printed, not sent)
elastalert-test-rule --config config.yaml rules/my_first_rule.yaml

By default the test runs the rule against the last 24 hours of data (adjustable with --days). If at least 5 ERROR logs fall within some 5-minute window in that period, the console shows the alert that would have been sent. If not, it reports that no matches were found.

Run ElastAlert

Once your rule is tested, create the writeback index (a one-time step) and then run ElastAlert in the foreground to start monitoring.

elastalert-create-index
python -m elastalert.elastalert --verbose --config config.yaml

With --verbose, ElastAlert logs each query it runs. If a rule is triggered, you'll receive an email. To stop it, press Ctrl+C.


Advanced Rule Examples

Example 1: Spike Detection (Rate of Change)

This rule alerts if the number of errors suddenly increases compared with the preceding window.

# rules/spike_detection_rule.yaml
name: Error Rate Spike
# spike compares document counts; spike_aggregation is the variant for metric values
type: spike
index: "logs-*"
# The filter selecting the documents to count
filter:
  - query:
      query_string:
        query: "log_level: ERROR"
# Length of both the current window and the reference window before it
timeframe:
  minutes: 5
# Ratio that triggers an alert: current count must be at least 2x the reference count
spike_height: 2
# Only trigger on an increase ("down" and "both" are also accepted)
spike_type: "up"
# Optional: minimum counts required before a spike is considered
# threshold_ref: 10
# threshold_cur: 20
alert:
  - "slack"
slack_webhook_url: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
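
The comparison behind spike_height and spike_type can be sketched as follows. This is a simplified illustration that compares the counts of two adjacent windows of equal length; detect_spike is a hypothetical helper, and treating an empty reference window as "no spike" is a simplification made for this sketch.

```python
def detect_spike(
    reference_count: int,
    current_count: int,
    spike_height: float,
    spike_type: str = "up",
) -> bool:
    """Return True if current_count is a spike relative to reference_count."""
    if spike_type not in ("up", "down", "both"):
        raise ValueError(f"unknown spike_type: {spike_type!r}")
    if reference_count <= 0:
        # Nothing to compare against (assumption of this sketch).
        return False
    spiked_up = current_count >= reference_count * spike_height
    spiked_down = current_count <= reference_count / spike_height
    if spike_type == "up":
        return spiked_up
    if spike_type == "down":
        return spiked_down
    return spiked_up or spiked_down


# 10 errors in the previous 5 minutes, 25 in the current 5 minutes.
print(detect_spike(10, 25, spike_height=2))
# 15 errors is an increase, but not by a factor of 2.
print(detect_spike(10, 15, spike_height=2))
# A drop from 10 to 4 only counts when watching for "down" or "both".
print(detect_spike(10, 4, spike_height=2, spike_type="down"))
```

Because the check is a ratio, a quiet index can produce large ratios from tiny counts, which is why minimum-count thresholds are worth setting on low-volume data.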

Example 2: new_term Rule (New Value)

This rule alerts when a value appears in a field for the first time. Useful for detecting new security events or application errors.

# rules/new_error_rule.yaml
name: New Error Type Detected
# new_term matches when a field takes a value that has not been seen before.
# (The change type is different: it compares a field against its previous value per query_key.)
type: new_term
index: "logs-*"
# The fields to watch for new values
fields:
  - "error_message"
# The filter selecting the documents to inspect
filter:
  - query:
      query_string:
        query: "log_level: ERROR"
# Optional: how far back to look when building the initial list of known values
# terms_window_size:
#   days: 30
alert:
  - "email"
email:
  - "your-email@example.com"
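
Detecting a value that appears for the first time amounts to keeping a set of known values and reporting anything outside it. A minimal sketch, assuming documents arrive as dictionaries; find_new_terms and the sample data are hypothetical and only illustrate the idea.

```python
from typing import Dict, Iterable, List, Set


def find_new_terms(
    seen: Set[str],
    documents: Iterable[Dict[str, str]],
    field: str,
) -> List[str]:
    """Return values of `field` not seen before, and remember them in `seen`."""
    new_terms: List[str] = []
    for doc in documents:
        value = doc.get(field)
        # Skip documents without the field and values that are already known.
        if value is None or value in seen:
            continue
        seen.add(value)
        new_terms.append(value)
    return new_terms


# Known values, as if collected from historical data.
seen_messages = {"timeout", "disk full"}
batch = [
    {"log_level": "ERROR", "error_message": "timeout"},
    {"log_level": "ERROR", "error_message": "null pointer"},
    {"log_level": "ERROR", "error_message": "null pointer"},
    {"log_level": "ERROR"},  # field missing
]

print(find_new_terms(seen_messages, batch, "error_message"))
```

Each new value is reported once; later occurrences of the same value are treated as known.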

Example 3: Custom Python Alert

Sometimes the built-in alerts aren't enough. You can write your own alerter as a Python class.

Create the alert script (elastalert_modules/my_custom_alert.py)

Place the file in an importable package, for example an elastalert_modules/ directory that also contains an empty __init__.py.

# elastalert_modules/my_custom_alert.py
from elastalert.alerts import Alerter
from elastalert.util import EAException


class CustomAlert(Alerter):
    # Rule options that must be present for this alerter to load
    required_options = frozenset(["webhook_url"])

    def alert(self, matches):
        # matches is a list of dictionaries, where each dict is a matching document
        for match in matches:
            try:
                # Your custom logic here
                print(f"Custom Alert Triggered! Match: {match}")
                # Example: send the match to the webhook named in the rule file
                # import requests
                # requests.post(self.rule["webhook_url"], json=match)
            except Exception as e:
                raise EAException(f"Error sending custom alert: {e}")

    def get_info(self):
        # Stored in the writeback index alongside each sent alert
        return {"type": "Custom Alert"}
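
The commented-out webhook call above relies on the third-party requests package. As an alternative sketch using only the standard library, the request can be built with urllib. build_webhook_request is a hypothetical helper; the example constructs the request without sending it, and the URL is the placeholder from this guide.

```python
import json
from urllib.request import Request


def build_webhook_request(webhook_url: str, match: dict) -> Request:
    """Build (but do not send) a JSON POST request carrying one match."""
    # default=str keeps non-JSON values such as datetimes from raising an error.
    body = json.dumps(match, default=str).encode("utf-8")
    return Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


match = {"log_level": "CRITICAL", "@timestamp": "2024-01-01T12:00:00Z"}
req = build_webhook_request("https://api.your-service.com/webhook", match)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To actually send it, pass the request to urllib.request.urlopen with a timeout, inside the try block so that failures surface as an alerting error.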

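Update your rule to use the custom alert

# rules/custom_alert_rule.yaml
name: Custom Webhook Alert
type: any
index: "logs-*"
filter:
  - query:
      query_string:
        query: "log_level: CRITICAL"
alert:
  # Full import path of the class: package.module.ClassName
  # (assumes my_custom_alert.py lives in an importable elastalert_modules package)
  - "elastalert_modules.my_custom_alert.CustomAlert"
# Options for the alerter are plain top-level keys, read in the class via self.rule
webhook_url: "https://api.your-service.com/webhook"

When this rule matches, ElastAlert imports the class named under alert and calls its alert method.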


Production Deployment: elastalert-server

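For production, running ElastAlert in a foreground terminal is not ideal. One option is ElastAlert Server, a separate community project that wraps ElastAlert in a long-running service.

  1. Install ElastAlert Server:

    It is typically run from its Docker image rather than installed with pip. Follow that project's README for the current steps.
  2. Run the server:

    Start the container with your config.yaml and rules/ directory mounted into it.

    This starts a REST API, usually on http://localhost:3030.

  3. Manage rules through the API:

    • You can create, edit, and test rules through the REST API, or through the companion Kibana plugin that provides a web interface on top of it.
    • The history of fired alerts is stored in the writeback index (elastalert_status in this guide), which you can inspect in Kibana.
    • Alternatively, you can run plain ElastAlert as a background service (e.g., using systemd or supervisord) pointing to the command python -m elastalert.elastalert --config config.yaml.

Either setup is more robust than a terminal session and easier to manage.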
