From sentRy to Sentry

blog-image

This post was originally posted on Interesting Retrospectives.

      ___           ___           ___                         ___
     /  /\         /  /\         /  /\          ___          /  /\          __
    /  /::\       /  /::\       /  /::|        /__/\        /  /::\        |  |\
   /__/:/\:\     /  /:/\:\     /  /:|:|        \  \:\      /  /:/\:\       |  |:|
  _\_ \:\ \:\   /  /::\ \:\   /  /:/|:|__       \__\:\    /  /::\ \:\      |  |:|
 /__/\ \:\ \:\ /__/:/\:\ \:\ /__/:/ |:| /\      /  /::\  /__/:/\:\_\:\     |__|:|__
 \  \:\ \:\_\/ \  \:\ \:\_\/ \__\/  |:|/:/     /  /:/\:\ \__\/~|::\/:/     /  /::::\
  \  \:\_\:\    \  \:\ \:\       |  |:/:/     /  /:/__\/    |  |:|::/     /  /:/~~~~
   \  \:\/:/     \  \:\_\/       |__|::/     /__/:/         |  |:|\/     /__/:/
    \  \::/       \  \:\         /__/:/      \__\/          |__|:|~      \__\/
     \__\/         \__\/         \__\/                       \__\|

tl;dr

image

Implemented a alerting (kinda hard to call it “monitoring”) system for Django with R Server and a Telegram bot.

A typical instance of its report is like this:

Some errors occurred in the last 15 minutes on the server, a copy of log will be processed by sentRy. It will identify those new errors which have not been reported yet and save them to local storage. Meanwhile, the incremental part will be parsed to a telegram bot, sending error summaries and a recent 12-hour bar chart to a channel. At the same time, an updated copy of notifications (of course, errors) will be synced to Shinyapps.io and the Shiny app should have the latest info displayed.

image

It was designed to fulfill a particular job and I guess it got things done to some extent. But recently, our dev team deployed a fully functional monitoring system called Sentry. I mean, what a coincidence. I had no idea about this and only named it after the sentry gun in Team Fortress 2.

Dependencies

  • tidyverse
  • telegram.bot
  • cronR
  • shinyFiles
  • aws.s3

How to use it (anyway)?

  1. In Telegram, create a new bot under the permission of @BotFather. Follow the order and make sure you have a valid API Token.
  2. You can test the basic message functionality with bot_script.r. But it won’t be needed in the main script.
  3. You can also test run the actual process with log_process.r.
  4. If there are problems no more, click Addin in RStudio and select “Schedule R scripts on Linux/Unix”.

image

  • bot_script.r The script for testing Telegram bot creation. Will be dropped later.
  • dashboard A shiny app displaying all the errors.
  • error.log An error log copied from Django.
  • global.r The script dedicated to setting up S3 connection.
  • log_process.r The main script which needs to run periodically.
  • notification.csv The formatted backup of errors captured by this monitoring script. It is de facto very similar to error.log.
  • settings.csv A file to save some setting parameters including the latest reported error time.
  • log_process.log The R console log for running log_process.r. Potentially useful when debugging.

Issues

  1. Even the files published to shinyapps.io do not involve any unchecked files (when publishing), they will be verified any ways, thus leading to some filename and path related error returning. A better idea is to create a separate folder and zip it before uploading. Since I claimed all the paths absolute in my code due to the limitation of cronR, I have to scp another copy to the directory of shiny after writing to notification.
  2. You can always refer to the “Log” tab in shiny app to debug. Really helpful.
  3. Very hard to read a csv file without any column names in S3 bucket. The function aws.s3::s3read_using(FUN, ..., object, bucket, opts = NULL) is problematic, as FUN cannot insert any extra parameters. It is a shame that readr::read_delim cannot be used as well. At last a blog post on Medium saved my life.
  4. Something weird happens when meta1 and meta2 extracted have different length. Specifically, meta1 with “ERROR|WARNING|CRITICAL” has fewer than the number of rows. That is to say, some lines are not starting with “ERROR…” Turns out my regex should start with a ^ otherwise things like " self._log(ERROR, msg, args, **kwargs)” could be matched as well. In short,
^
comments powered by Disqus