MDC Logging with Camel, Spring boot & ELK (part 3)

-

This blog is part 3 of a 3-part blog series about Apache Camel, ELK, and (MDC) logging.

Part 1 described how you can centralize the logging from Spring Boot / Camel apps into Elasticsearch using MDC and filebeat. In part 2 we aggregated the logging from part 1 with help of logstash into a separate elasticsearch index, grouping messages and making it a bit more readable for everyone.
This 3rd part will use the aggregated logging from part 2 to create watches/alerts to notify you about errors or warnings (use case 1) and when some entry is missing in the log you expected to be there (use case 2)

Watcher

Now that we got a nicely aggregated index, it is also very easy to create alerts on it. For example “email me when an error occurs in the logging”, or “email me when the ECB exchange rate didn’t come in on time”. For this, we can use Watcher. As the Elastic website says: Watcher is an Elasticsearch feature that you can use to create actions based on conditions, which are periodically evaluated using queries on your data. Note that this a paid feature, available starting from the Gold subscription: see https://www.elastic.co/subscriptions. You can however request a trial to try it out for 30 days.

Watch definition

A watch consists of a trigger, input, condition< and actions. The actions define what needs to be done once the condition is met. In addition, you can define conditions and transforms to process and prepare the watch payload before executing the actions.

  • Trigger: Determines when the watch is checked. A watch must have a trigger.
  • Input: Loads data into the watch payload. If no input is specified, an empty payload is loaded.
  • Condition: Controls whether the watch actions are executed. If no condition is specified, the condition defaults to always.
  • Transform: Processes the watch payload to prepare it for the watch actions. You can define transforms at the watch level or define action-specific transforms. Optional.
  • Actions: Specify what happens when the watch condition is met, e.g. mail, send to pager, slack, jira, etc

You can create or edit a watch from the Management / Watcher (under Elasticsearch) menu in Kibana or create one via the Dev tools menu calling the Watcher API’s.
For example: create or update a watch with id ‘acme-watch-banks-ecb’ in the dev console:

PUT _watcher/watch/acme-watch-banks-ecb
{
  "trigger": {...},
  "input": {...},
  "condition": {...},
  "transform": {...},
  "actions": {...}
}

One powerful feature of the watch API is that you can test your watch right away, without waiting for the specified trigger to occur. For example, to test the above watch just do this:

POST _watcher/watch/acme-watch-banks-ecb/_execute

Use case 1: Log and mail on WARNING / ERROR in the last hour

We can search the aggregated index for the Status field to indicate an error or warning. Full example:

{
  "trigger": {
    "schedule": {
      "interval": "5m"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": ["aggr*"],
        "rest_total_hits_as_int": true,
        "body": {
          "query": {
            "bool": {
              "should": [{
                  "match": {"Status.keyword": "UNKNOWN"}
                },{
                  "match": {"Status.keyword": "ERROR"}
              }],
              "minimum_should_match": 1,
              "filter": [{
                  "range": {
                    "@timestamp": {"from": "now-1h","to": "now"}
                    }
              }]
  }}}}}},
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 0
      }
    }
  },
  "actions": {
    "log": {
      "logging": {
        "level": "warn",
        "text": "Warning or errors detected: {{ctx.payload.hits.total}}"
      }
    },
    "email_it": {
      "email": {
        "profile": "standard",
        "priority": "high",
        "to": ["alerts@acme.com"],
        "subject": "There are {{ctx.payload.hits.total}}  UNKNOWNs / ERRORs detected in last 1 hour!",
       }   
    },
   "throttle_period_in_millis": 3600000
  }
}

Explanation:

  • Line 2-5: Trigger: we look every minutes to see if an error or warning appeared in the logging
  • Line 7-27: Search for ERROR or WARNING status in the last hour
  • Line 28-34: Perform the action if we find any matches (> 0)
  • Line 36-41: log the total number of errors & warnings in the log file
  • Line 42-49: mail it also
  • Line 50: don’t do it more than once an hour (to prevent mail flood)

Use case 2: Mail on missing log entry

We like to check if our ECB rate did come in every weekday. Complete example:

PUT _watcher/watch/acme-watch-banks-ecb
{
  "trigger": {
    "schedule": {
      "cron": "0 15 14 ? * MON-FRI"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": ["aggregated*"],
        "rest_total_hits_as_int": true,
        "body": {
          "query": {
            "bool": {
              must": [{
                 "match": {"Status.keyword": "COMPLETED"}
                }, {
                  "match": {"Scenario.keyword": "acme-banks-ecb"}
              }],
              "filter": [{
                  "range": {
                    "@timestamp": {
                      "from": "now-30m", "to": "now"
                    }
                  }
              }]
     } } } } }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {"not_eq": 1}
    }
  },
  "actions": {
    "email_it": {
      "email": {
        "to": ["alerts@acme.com"],
        "subject": "ERROR: Acme @ TEST: did not receive ECB rates in time",
        "body": {
          "text": "ECB rates should be received by 16:05 CET / 14:05 UTC every day"
        }
  } } }
}

Explanation

  • Line 3-6: the watch will run Monday to Friday at 14:15. Note: the rates are triggered to be fetched at 14:00 or so, so by 14:15 the rates should be there in the aggregated logging.
  • Line 8-30: We’re searching in the aggregated index for STATUS=COMPLETED of the scenario ‘acme-banks-ecb’. We’re looking back 30 minutes just to be sure we don’t miss it.
  • Line 31-35: if we don’t find anything (#hits not equal to 1) then a mail will be send
  • Line 36-44: mail the error to alerts@acme.com

For the mail action to work, you need to specify the SMTP credentials in /etc/elasticsearch/elasticsearch.yml
We’re using Outlook, so it looks something like this:

xpack.notification.email.account:
    exchange_account:
        profile: outlook
        email_defaults:
           from: 'The Watcher <alerts@acme.com>'
        smtp:
            auth: true
            starttls.enable: false
            host: smtp.acme.dummy.com
            user: acme-mail-user

Now, when the ECB rate did not came in on time, we will get an email of it.

There are a lot more things you can do with the watcher API; like aggregating on the Status field creating a count per different status, using other actions like sending a SMS or sending a text to a Slack channel.

That was it! Hope you find it usefull. Keep on searchin !

Be sure to also checkout my blog post about Health dashboard with Kibana Canvas!

Further reading