Merge pull request #10 from TwinProduction/notify-on-resolved

Support sending an alert when an unhealthy service becomes healthy again
Minor fix
2020-09-04 22:23:47 -04:00 · 2020-09-04 22:15:22 -04:00 · 2020-09-04 21:57:31 -04:00 · 2020-09-04 21:31:28 -04:00 · 2020-09-04 18:53:55 -04:00 · 2020-09-04 18:23:56 -04:00
28 changed files with 850 additions and 154 deletions
--- a/.github/assets/example.png
+++ b/.github/assets/example.png
--- a/.github/assets/slack-alerts.png
+++ b/.github/assets/slack-alerts.png
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,2 @@
 bin
-.idea
-test.db
-daily.db
+.idea
--- a/README.md
+++ b/README.md
@@ -1,6 +1,7 @@
-# gatus
+![Gatus](static/logo-with-name.png)

 ![build](https://github.com/TwinProduction/gatus/workflows/build/badge.svg?branch=master)
+[![Go Report Card](https://goreportcard.com/badge/github.com/TwinProduction/gatus)](https://goreportcard.com/report/github.com/TwinProduction/gatus)
 [![Docker pulls](https://img.shields.io/docker/pulls/twinproduction/gatus.svg)](https://cloud.docker.com/repository/docker/twinproduction/gatus)

 A service health dashboard in Go that is meant to be used as a docker 
@@ -10,6 +11,33 @@ I personally deploy it in my Kubernetes cluster and have it monitor the status o
 core applications: https://status.twinnation.org/


+## Table of Contents
+
+- [Features](#features)
+- [Usage](#usage)
+  - [Configuration](#configuration)
+  - [Conditions](#conditions)
+- [Docker](#docker)
+- [Running the tests](#running-the-tests)
+- [Using in Production](#using-in-production)
+- [FAQ](#faq)
+  - [Sending a GraphQL request](#sending-a-graphql-request)
+  - [Configuring Slack alerts](#configuring-slack-alerts)
+  - [Configuring Twilio alerts](#configuring-twilio-alerts)
+  - [Configuring custom alerts](#configuring-custom-alerts)
+
+
+## Features
+
+The main features of Gatus are:
+- **Highly flexible health check conditions**: While checking the response status may be enough for some use cases, Gatus goes much further and allows you to add conditions on the response time, the response body and even the IP address.
+- **Ability to use Gatus for user acceptance tests**: Thanks to the point above, you can leverage this application to create automated user acceptance tests.
+- **Very easy to configure**: Not only is the configuration designed to be as readable as possible, it's also extremely easy to add a new service or a new endpoint to monitor.
+- **Alerting**: While having a pretty visual dashboard is useful to keep track of the state of your application(s), you probably don't want to stare at it all day. Thus, notifications via Slack are supported out of the box with the ability to configure a custom alerting provider for any needs you might have, whether it be a different provider like PagerDuty or a custom application that manages automated rollbacks. 
+- **Metrics**
+- **Low resource consumption**: As with most Go applications, the resource footprint that this application requires is negligibly small.
+
+
 ## Usage

 By default, the configuration file is expected to be at `config/config.yaml`.
@@ -22,57 +50,80 @@ Here's a simple example:
 metrics: true         # Whether to expose metrics at /metrics
 services:
  - name: twinnation  # Name of your service, can be anything
-    url: https://twinnation.org/health
-    interval: 15s     # Duration to wait between every status check (default: 10s)
+    url: "https://twinnation.org/health"
+    interval: 30s     # Duration to wait between every status check (default: 60s)
    conditions:
      - "[STATUS] == 200"         # Status must be 200
      - "[BODY].status == UP"     # The json path "$.status" must be equal to UP
      - "[RESPONSE_TIME] < 300"   # Response time must be under 300ms
  - name: example
-    url: https://example.org/
+    url: "https://example.org/"
    interval: 30s
    conditions:
      - "[STATUS] == 200"
 ```

-Note that you can also add environment variables in the your configuration file (i.e. `$DOMAIN`, `${DOMAIN}`)
+This example would look like this:
+
+![Simple example](.github/assets/example.png)
+
+Note that you can also use environment variables in the configuration file (e.g. `$DOMAIN`, `${DOMAIN}`).


 ### Configuration

-| Parameter               | Description                                            | Default        |
-| ----------------------- | ------------------------------------------------------ | -------------- |
-| `metrics`               | Whether to expose metrics at /metrics                  | `false`        |
-| `services[].name`       | Name of the service. Can be anything.                  | Required `""`  |
-| `services[].url`        | URL to send the request to                             | Required `""`  |
-| `services[].conditions` | Conditions used to determine the health of the service | `[]`           |
-| `services[].interval`   | Duration to wait between every status check            | `10s`          |
-| `services[].method`     | Request method                                         | `GET`          |
-| `services[].body`       | Request body                                           | `""`           |
-| `services[].headers`    | Request headers                                        | `{}`           |
+| Parameter                              | Description                                                     | Default        |
+| -------------------------------------- | --------------------------------------------------------------- | -------------- |
+| `debug`                                | Whether to enable debug logs                                    | `false`        |
+| `metrics`                              | Whether to expose metrics at /metrics                           | `false`        |
+| `services`                             | List of services to monitor                                     | Required `[]`  |
+| `services[].name`                      | Name of the service. Can be anything.                           | Required `""`  |
+| `services[].url`                       | URL to send the request to                                      | Required `""`  |
+| `services[].conditions`                | Conditions used to determine the health of the service          | `[]`           |
+| `services[].interval`                  | Duration to wait between every status check                     | `60s`          |
+| `services[].method`                    | Request method                                                  | `GET`          |
+| `services[].graphql`                   | Whether to wrap the body in a query param (`{"query":"$body"}`) | `false`        |
+| `services[].body`                      | Request body                                                    | `""`           |
+| `services[].headers`                   | Request headers                                                 | `{}`           |
+| `services[].alerts[].type`             | Type of alert. Valid types: `slack`, `twilio`, `custom`         | Required `""`  |
+| `services[].alerts[].enabled`          | Whether to enable the alert                                     | `false`        |
+| `services[].alerts[].threshold`        | Number of failures in a row needed before triggering the alert  | `3`            |
+| `services[].alerts[].description`      | Description of the alert. Will be included in the alert sent    | `""`           |
+| `services[].alerts[].send-on-resolved` | Whether to send a notification once a triggered alert subsides  | `false`        |
+| `alerting`                             | Configuration for alerting                                      | `{}`           |
+| `alerting.slack`                       | Webhook to use for alerts of type `slack`                       | `""`           |
+| `alerting.twilio`                      | Settings for alerts of type `twilio`                            | `{}`           |
+| `alerting.twilio.sid`                  | Twilio account SID                                              | Required `""`  |
+| `alerting.twilio.token`                | Twilio auth token                                               | Required `""`  |
+| `alerting.twilio.from`                 | Number to send Twilio alerts from                               | Required `""`  |
+| `alerting.twilio.to`                   | Number to send Twilio alerts to                                 | Required `""`  |
+| `alerting.custom`                      | Configuration for custom actions on failure or alerts           | `{}`           |
+| `alerting.custom.url`                  | Custom alerting request url                                     | `""`           |
+| `alerting.custom.body`                 | Custom alerting request body.                                   | `""`           |
+| `alerting.custom.headers`              | Custom alerting request headers                                 | `{}`           |


 ### Conditions

 Here are some examples of conditions you can use:

-| Condition                             | Description                               | Passing values           | Failing values          |
-| ------------------------------------- | ----------------------------------------- | ------------------------ | ----------------------- |
-| `[STATUS] == 200`                     | Status must be equal to 200               | 200                      | 201, 404, 500           |
-| `[STATUS] < 300`                      | Status must lower than 300                | 200, 201, 299            | 301, 302, 400, 500      |
-| `[STATUS] <= 299`                     | Status must be less than or equal to 299  | 200, 201, 299            | 301, 302, 400, 500      |
-| `[STATUS] > 400`                      | Status must be greater than 400           | 401, 402, 403, 404       | 200, 201, 300, 400      |
-| `[RESPONSE_TIME] < 500`               | Response time must be below 500ms         | 100ms, 200ms, 300ms      | 500ms, 1500ms           |
-| `[BODY] == 1`                         | The body must be equal to 1               | 1                        | literally anything else |
-| `[BODY].data.id == 1`                 | The jsonpath `$.data.id` is equal to 1    | `{"data":{"id":1}}`      | literally anything else |
-| `[BODY].data[0].id == 1`              | The jsonpath `$.data[0].id` is equal to 1 | `{"data":[{"id":1}]}`    | literally anything else |
-
-**NOTE**: `[BODY]` with JSON path (i.e. `[BODY].id == 1`) is currently in BETA. For the most part, the only thing that doesn't work is arrays.
+| Condition                    | Description                                             | Passing values           | Failing values |
+| -----------------------------| ------------------------------------------------------- | ------------------------ | -------------- |
+| `[STATUS] == 200`            | Status must be equal to 200                             | 200                      | 201, 404, ...  |
+| `[STATUS] < 300`             | Status must be lower than 300                           | 200, 201, 299            | 301, 302, ...  |
+| `[STATUS] <= 299`            | Status must be less than or equal to 299                | 200, 201, 299            | 301, 302, ...  |
+| `[STATUS] > 400`             | Status must be greater than 400                         | 401, 402, 403, 404       | 400, 200, ...  |
+| `[RESPONSE_TIME] < 500`      | Response time must be below 500ms                       | 100ms, 200ms, 300ms      | 500ms, 501ms   |
+| `[BODY] == 1`                | The body must be equal to 1                             | 1                        | Anything else  |
+| `[BODY].data.id == 1`        | The jsonpath `$.data.id` is equal to 1                  | `{"data":{"id":1}}`      |  |
+| `[BODY].data[0].id == 1`     | The jsonpath `$.data[0].id` is equal to 1               | `{"data":[{"id":1}]}`    |  |
+| `len([BODY].data) > 0`       | Array at jsonpath `$.data` has at least one element     | `{"data":[{"id":1}]}`    |  |
+| `len([BODY].name) == 8`      | String at jsonpath `$.name` has a length of 8           | `{"name":"john.doe"}`    | `{"name":"bob"}` |


 ## Docker

-Building the Docker image is done as following:
+Building the Docker image is done as follows:

 ```
 docker build . -t gatus
@@ -95,3 +146,136 @@ go test ./... -mod vendor
 ## Using in Production

 See the [example](example) folder.
+
+
+## FAQ
+
+### Sending a GraphQL request
+
+By setting `services[].graphql` to true, the body will automatically be wrapped by the standard GraphQL `query` parameter.
+
+For instance, the following configuration:
+```yaml
+services:
+  - name: filter users by gender
+    url: http://localhost:8080/playground
+    method: POST
+    graphql: true
+    body: |
+      {
+        user(gender: "female") {
+          id
+          name
+          gender
+          avatar
+        }
+      }
+    headers:
+      Content-Type: application/json
+    conditions:
+      - "[STATUS] == 200"
+      - "[BODY].data.user[0].gender == female"
+```
+
+will send a `POST` request to `http://localhost:8080/playground` with the following body:
+```json
+{"query":"      {\n        user(gender: \"female\") {\n          id\n          name\n          gender\n          avatar\n        }\n      }"}
+```
+
+
+### Configuring Slack alerts
+
+```yaml
+alerting:
+  slack: "https://hooks.slack.com/services/**********/**********/**********"
+services:
+  - name: twinnation
+    interval: 30s
+    url: "https://twinnation.org/health"
+    alerts:
+      - type: slack
+        enabled: true
+        description: "healthcheck failed 3 times in a row"
+        send-on-resolved: true
+      - type: slack
+        enabled: true
+        threshold: 5
+        description: "healthcheck failed 5 times in a row"
+        send-on-resolved: true
+    conditions:
+      - "[STATUS] == 200"
+      - "[BODY].status == UP"
+      - "[RESPONSE_TIME] < 300"
+```
+
+Here's an example of what the notifications look like:
+
+![Slack notifications](.github/assets/slack-alerts.png)
+
+
+### Configuring Twilio alerts
+
+```yaml
+alerting:
+  twilio:
+    sid: "..."
+    token: "..."
+    from: "+1-234-567-8901"
+    to: "+1-234-567-8901"
+services:
+  - name: twinnation
+    interval: 30s
+    url: "https://twinnation.org/health"
+    alerts:
+      - type: twilio
+        enabled: true
+        threshold: 5
+        description: "healthcheck failed 5 times in a row"
+    conditions:
+      - "[STATUS] == 200"
+      - "[BODY].status == UP"
+      - "[RESPONSE_TIME] < 300"
+```
+
+
+### Configuring custom alerts
+
+While they're called alerts, you can use this feature to call anything. 
+
+For instance, you could automate rollbacks by having an application that keeps track of new deployments, and by
+leveraging Gatus, you could have Gatus call that application endpoint when a service starts failing. Your application
+would then check if the service that started failing was recently deployed, and if it was, then automatically 
+roll it back.
+
+The placeholders `[ALERT_DESCRIPTION]` and `[SERVICE_NAME]` are automatically replaced with the alert description and the
+service name respectively in the body (`alerting.custom.body`) as well as the url (`alerting.custom.url`).
+
+If you have `send-on-resolved` set to `true`, you may want to use `[ALERT_TRIGGERED_OR_RESOLVED]` to differentiate
+the notifications. It will be replaced with either `TRIGGERED` or `RESOLVED`, based on the situation.
+
+For the purposes of this example, we'll configure the custom alert with a Slack webhook, but you can call anything you want.
+
+```yaml
+alerting:
+  custom:
+    url: "https://hooks.slack.com/services/**********/**********/**********"
+    method: "POST"
+    body: |
+      {
+        "text": "[ALERT_TRIGGERED_OR_RESOLVED]: [SERVICE_NAME] - [ALERT_DESCRIPTION]"
+      }
+services:
+  - name: twinnation
+    interval: 30s
+    url: "https://twinnation.org/health"
+    alerts:
+      - type: custom
+        enabled: true
+        threshold: 10
+        send-on-resolved: true
+        description: "healthcheck failed 10 times in a row"
+    conditions:
+      - "[STATUS] == 200"
+      - "[BODY].status == UP"
+      - "[RESPONSE_TIME] < 300"
+```
--- a/config.yaml
+++ b/config.yaml
@@ -1,21 +1,17 @@
 metrics: true
 services:
  - name: twinnation
-    interval: 10s
+    interval: 30s
    url: https://twinnation.org/health
    conditions:
      - "[STATUS] == 200"
      - "[BODY].status == UP"
      - "[RESPONSE_TIME] < 1000"
  - name: twinnation-articles-api
-    interval: 10s
-    url: https://twinnation.org/api/v1/articles/24
+    interval: 30s
+    url: "https://twinnation.org/api/v1/articles/24"
    conditions:
      - "[STATUS] == 200"
      - "[BODY].id == 24"
      - "[BODY].tags[0] == spring"
-  - name: example
-    url: https://example.org/
-    interval: 30s
-    conditions:
-      - "[STATUS] == 200"
+      - "len([BODY].tags) > 0"
--- a/config/config.go
+++ b/config/config.go
@@ -21,8 +21,10 @@ var (
 )

 type Config struct {
-	Metrics  bool            `yaml:"metrics"`
-	Services []*core.Service `yaml:"services"`
+	Metrics  bool                 `yaml:"metrics"`
+	Debug    bool                 `yaml:"debug"`
+	Alerting *core.AlertingConfig `yaml:"alerting"`
+	Services []*core.Service      `yaml:"services"`
 }

 func Get() *Config {
--- a/config/config_test.go
+++ b/config/config_test.go
@@ -2,6 +2,7 @@ package config

 import (
 	"fmt"
+	"github.com/TwinProduction/gatus/core"
 	"testing"
 	"time"
 )
@@ -23,6 +24,9 @@ services:
 	if err != nil {
 		t.Error("No error should've been returned")
 	}
+	if config == nil {
+		t.Fatal("Config shouldn't have been nil")
+	}
 	if len(config.Services) != 2 {
 		t.Error("Should have returned two services")
 	}
@@ -36,8 +40,8 @@ services:
 	if config.Services[0].Interval != 15*time.Second {
 		t.Errorf("Interval should have been %s", 15*time.Second)
 	}
-	if config.Services[1].Interval != 10*time.Second {
-		t.Errorf("Interval should have been %s, because it is the default value", 10*time.Second)
+	if config.Services[1].Interval != 60*time.Second {
+		t.Errorf("Interval should have been %s, because it is the default value", 60*time.Second)
 	}
 	if len(config.Services[0].Conditions) != 1 {
 		t.Errorf("There should have been %d conditions", 1)
@@ -58,14 +62,17 @@ services:
 	if err != nil {
 		t.Error("No error should've been returned")
 	}
+	if config == nil {
+		t.Fatal("Config shouldn't have been nil")
+	}
 	if config.Metrics {
 		t.Error("Metrics should've been false by default")
 	}
 	if config.Services[0].Url != "https://twinnation.org/actuator/health" {
 		t.Errorf("URL should have been %s", "https://twinnation.org/actuator/health")
 	}
-	if config.Services[0].Interval != 10*time.Second {
-		t.Errorf("Interval should have been %s, because it is the default value", 10*time.Second)
+	if config.Services[0].Interval != 60*time.Second {
+		t.Errorf("Interval should have been %s, because it is the default value", 60*time.Second)
 	}
 }

@@ -81,14 +88,17 @@ services:
 	if err != nil {
 		t.Error("No error should've been returned")
 	}
+	if config == nil {
+		t.Fatal("Config shouldn't have been nil")
+	}
 	if !config.Metrics {
 		t.Error("Metrics should have been true")
 	}
 	if config.Services[0].Url != "https://twinnation.org/actuator/health" {
 		t.Errorf("URL should have been %s", "https://twinnation.org/actuator/health")
 	}
-	if config.Services[0].Interval != 10*time.Second {
-		t.Errorf("Interval should have been %s, because it is the default value", 10*time.Second)
+	if config.Services[0].Interval != 60*time.Second {
+		t.Errorf("Interval should have been %s, because it is the default value", 60*time.Second)
 	}
 }

@@ -107,3 +117,62 @@ badconfig:
 		t.Error("The error returned should have been of type ErrNoServiceInConfig")
 	}
 }
+
+func TestParseAndValidateConfigBytesWithAlerting(t *testing.T) {
+	config, err := parseAndValidateConfigBytes([]byte(`
+alerting:
+  slack: "http://example.com"
+services:
+  - name: twinnation
+    url: https://twinnation.org/actuator/health
+    alerts:
+      - type: slack
+        enabled: true
+        threshold: 7
+        description: "Healthcheck failed 7 times in a row"
+    conditions:
+      - "[STATUS] == 200"
+`))
+	if err != nil {
+		t.Error("No error should've been returned")
+	}
+	if config == nil {
+		t.Fatal("Config shouldn't have been nil")
+	}
+	if config.Metrics {
+		t.Error("Metrics should've been false by default")
+	}
+	if config.Alerting == nil {
+		t.Fatal("config.Alerting shouldn't have been nil")
+	}
+	if config.Alerting.Slack != "http://example.com" {
+		t.Errorf("Slack webhook should've been %s, but was %s", "http://example.com", config.Alerting.Slack)
+	}
+	if len(config.Services) != 1 {
+		t.Error("There should've been 1 service")
+	}
+	if config.Services[0].Url != "https://twinnation.org/actuator/health" {
+		t.Errorf("URL should have been %s", "https://twinnation.org/actuator/health")
+	}
+	if config.Services[0].Interval != 60*time.Second {
+		t.Errorf("Interval should have been %s, because it is the default value", 60*time.Second)
+	}
+	if config.Services[0].Alerts == nil {
+		t.Fatal("The service alerts shouldn't have been nil")
+	}
+	if len(config.Services[0].Alerts) != 1 {
+		t.Fatal("There should've been 1 alert configured")
+	}
+	if !config.Services[0].Alerts[0].Enabled {
+		t.Error("The alert should've been enabled")
+	}
+	if config.Services[0].Alerts[0].Threshold != 7 {
+		t.Errorf("The threshold of the alert should've been %d, but it was %d", 7, config.Services[0].Alerts[0].Threshold)
+	}
+	if config.Services[0].Alerts[0].Type != core.SlackAlert {
+		t.Errorf("The type of the alert should've been %s, but it was %s", core.SlackAlert, config.Services[0].Alerts[0].Type)
+	}
+	if config.Services[0].Alerts[0].Description != "Healthcheck failed 7 times in a row" {
+		t.Errorf("The description of the alert should've been %s, but it was %s", "Healthcheck failed 7 times in a row", config.Services[0].Alerts[0].Description)
+	}
+}
--- a/core/alert.go
+++ b/core/alert.go
@@ -0,0 +1,27 @@
+package core
+
+// Alert is the service's alert configuration
+type Alert struct {
+	// Type of alert
+	Type AlertType `yaml:"type"`
+
+	// Enabled defines whether or not the alert is enabled
+	Enabled bool `yaml:"enabled"`
+
+	// Threshold is the number of failures in a row needed before triggering the alert
+	Threshold int `yaml:"threshold"`
+
+	// Description of the alert. Will be included in the alert sent.
+	Description string `yaml:"description"`
+
+	// SendOnResolved defines whether to send a second notification when the issue has been resolved
+	SendOnResolved bool `yaml:"send-on-resolved"`
+}
+
+type AlertType string
+
+const (
+	SlackAlert  AlertType = "slack"
+	TwilioAlert AlertType = "twilio"
+	CustomAlert AlertType = "custom"
+)
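
Review note: the `Threshold` and `SendOnResolved` fields are easier to reason about with a small standalone sketch. The `triggered` function below follows the equality check used by `GetAlertsTriggered` in `core/service.go`. The `resolved` function is purely hypothetical: the code that sends resolved notifications is not part of the hunks shown here, so the rule it applies (the failure streak had reached the threshold) is an assumption. The lowercase `alert` type is a stand-in for `core.Alert`, not the real type.

```go
package main

import "fmt"

// alert is a hypothetical stand-in for core.Alert, reduced to the
// fields that matter for deciding when a notification goes out.
type alert struct {
	Enabled        bool
	Threshold      int
	SendOnResolved bool
}

// triggered returns the enabled alerts whose threshold is exactly equal
// to the current failure streak, so each alert fires once per streak.
func triggered(alerts []alert, failuresInARow int) []alert {
	var out []alert
	if failuresInARow == 0 {
		return out
	}
	for _, a := range alerts {
		if a.Enabled && a.Threshold == failuresInARow {
			out = append(out, a)
		}
	}
	return out
}

// resolved returns the alerts that would notify on recovery, ASSUMING
// a resolved notification is only sent for alerts that had fired,
// i.e. the streak that just ended had reached their threshold.
func resolved(alerts []alert, previousFailuresInARow int) []alert {
	var out []alert
	for _, a := range alerts {
		if a.Enabled && a.SendOnResolved && previousFailuresInARow >= a.Threshold {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	alerts := []alert{
		{Enabled: true, Threshold: 3, SendOnResolved: true},
		{Enabled: true, Threshold: 5},
	}
	for streak := 1; streak <= 5; streak++ {
		fmt.Printf("streak=%d triggered=%d\n", streak, len(triggered(alerts, streak)))
	}
	fmt.Printf("resolved after 5 failures: %d\n", len(resolved(alerts, 5)))
}
```

Because the comparison is `==` rather than `>=`, an alert is not re-sent on every failed check after the threshold has been passed.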
--- a/core/alerting.go
+++ b/core/alerting.go
@@ -0,0 +1,149 @@
+package core
+
+import (
+	"bytes"
+	"encoding/base64"
+	"fmt"
+	"github.com/TwinProduction/gatus/client"
+	"net/http"
+	"net/url"
+	"strings"
+)
+
+type AlertingConfig struct {
+	Slack  string               `yaml:"slack"`
+	Twilio *TwilioAlertProvider `yaml:"twilio"`
+	Custom *CustomAlertProvider `yaml:"custom"`
+}
+
+type TwilioAlertProvider struct {
+	SID   string `yaml:"sid"`
+	Token string `yaml:"token"`
+	From  string `yaml:"from"`
+	To    string `yaml:"to"`
+}
+
+func (provider *TwilioAlertProvider) IsValid() bool {
+	return len(provider.Token) > 0 && len(provider.SID) > 0 && len(provider.From) > 0 && len(provider.To) > 0
+}
+
+type CustomAlertProvider struct {
+	Url     string            `yaml:"url"`
+	Method  string            `yaml:"method,omitempty"`
+	Body    string            `yaml:"body,omitempty"`
+	Headers map[string]string `yaml:"headers,omitempty"`
+}
+
+func (provider *CustomAlertProvider) IsValid() bool {
+	return len(provider.Url) > 0
+}
+
+func (provider *CustomAlertProvider) buildRequest(serviceName, alertDescription string, resolved bool) *http.Request {
+	body := provider.Body
+	providerUrl := provider.Url
+	if strings.Contains(body, "[ALERT_DESCRIPTION]") {
+		body = strings.ReplaceAll(body, "[ALERT_DESCRIPTION]", alertDescription)
+	}
+	if strings.Contains(body, "[SERVICE_NAME]") {
+		body = strings.ReplaceAll(body, "[SERVICE_NAME]", serviceName)
+	}
+	if strings.Contains(body, "[ALERT_TRIGGERED_OR_RESOLVED]") {
+		if resolved {
+			body = strings.ReplaceAll(body, "[ALERT_TRIGGERED_OR_RESOLVED]", "RESOLVED")
+		} else {
+			body = strings.ReplaceAll(body, "[ALERT_TRIGGERED_OR_RESOLVED]", "TRIGGERED")
+		}
+	}
+	if strings.Contains(providerUrl, "[ALERT_DESCRIPTION]") {
+		providerUrl = strings.ReplaceAll(providerUrl, "[ALERT_DESCRIPTION]", alertDescription)
+	}
+	if strings.Contains(providerUrl, "[SERVICE_NAME]") {
+		providerUrl = strings.ReplaceAll(providerUrl, "[SERVICE_NAME]", serviceName)
+	}
+	if strings.Contains(providerUrl, "[ALERT_TRIGGERED_OR_RESOLVED]") {
+		if resolved {
+			providerUrl = strings.ReplaceAll(providerUrl, "[ALERT_TRIGGERED_OR_RESOLVED]", "RESOLVED")
+		} else {
+			providerUrl = strings.ReplaceAll(providerUrl, "[ALERT_TRIGGERED_OR_RESOLVED]", "TRIGGERED")
+		}
+	}
+	bodyBuffer := bytes.NewBuffer([]byte(body))
+	request, _ := http.NewRequest(provider.Method, providerUrl, bodyBuffer)
+	for k, v := range provider.Headers {
+		request.Header.Set(k, v)
+	}
+	return request
+}
+
+func (provider *CustomAlertProvider) Send(serviceName, alertDescription string, resolved bool) error {
+	request := provider.buildRequest(serviceName, alertDescription, resolved)
+	response, err := client.GetHttpClient().Do(request)
+	if err != nil {
+		return err
+	}
+	if response.StatusCode > 399 {
+		return fmt.Errorf("call to provider alert returned status code %d", response.StatusCode)
+	}
+	return nil
+}
+
+func CreateSlackCustomAlertProvider(slackWebHookUrl string, service *Service, alert *Alert, result *Result, resolved bool) *CustomAlertProvider {
+	var message string
+	var color string
+	if resolved {
+		message = fmt.Sprintf("An alert for *%s* has been resolved after %d failures in a row", service.Name, service.NumberOfFailuresInARow)
+		color = "#36A64F"
+	} else {
+		message = fmt.Sprintf("An alert for *%s* has been triggered", service.Name)
+		color = "#DD0000"
+	}
+	var results string
+	for _, conditionResult := range result.ConditionResults {
+		var prefix string
+		if conditionResult.Success {
+			prefix = ":heavy_check_mark:"
+		} else {
+			prefix = ":x:"
+		}
+		results += fmt.Sprintf("%s - `%s`\\n", prefix, conditionResult.Condition)
+	}
+	return &CustomAlertProvider{
+		Url:    slackWebHookUrl,
+		Method: "POST",
+		Body: fmt.Sprintf(`{
+  "text": "",
+  "attachments": [
+    {
+      "title": ":helmet_with_white_cross: Gatus",
+      "text": "%s:\n> %s",
+      "short": false,
+      "color": "%s",
+      "fields": [
+        {
+          "title": "Condition results",
+          "value": "%s",
+          "short": false
+        }
+      ]
+    }
+  ]
+}`, message, alert.Description, color, results),
+		Headers: map[string]string{"Content-Type": "application/json"},
+	}
+}
+
+func CreateTwilioCustomAlertProvider(provider *TwilioAlertProvider, message string) *CustomAlertProvider {
+	return &CustomAlertProvider{
+		Url:    fmt.Sprintf("https://api.twilio.com/2010-04-01/Accounts/%s/Messages.json", provider.SID),
+		Method: "POST",
+		Body: url.Values{
+			"To":   {provider.To},
+			"From": {provider.From},
+			"Body": {message},
+		}.Encode(),
+		Headers: map[string]string{
+			"Content-Type":  "application/x-www-form-urlencoded",
+			"Authorization": fmt.Sprintf("Basic %s", base64.StdEncoding.EncodeToString([]byte(fmt.Sprintf("%s:%s", provider.SID, provider.Token)))),
+		},
+	}
+}
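
Review note: the placeholder handling in `buildRequest` can be summarized with a shorter sketch. The version below is not the code in this PR; it swaps the chain of `strings.Contains` / `strings.ReplaceAll` calls for a single `strings.NewReplacer`, and the function name `substitute` is hypothetical. It uses the same three placeholders and the same `TRIGGERED` / `RESOLVED` values.

```go
package main

import (
	"fmt"
	"strings"
)

// substitute replaces the alert placeholders in a body or URL template.
// It is a sketch of an alternative to the ReplaceAll chain, not the
// implementation used by CustomAlertProvider.
func substitute(template, serviceName, alertDescription string, resolved bool) string {
	state := "TRIGGERED"
	if resolved {
		state = "RESOLVED"
	}
	replacer := strings.NewReplacer(
		"[ALERT_DESCRIPTION]", alertDescription,
		"[SERVICE_NAME]", serviceName,
		"[ALERT_TRIGGERED_OR_RESOLVED]", state,
	)
	return replacer.Replace(template)
}

func main() {
	body := `{"text": "[ALERT_TRIGGERED_OR_RESOLVED]: [SERVICE_NAME] - [ALERT_DESCRIPTION]"}`
	fmt.Println(substitute(body, "twinnation", "healthcheck failed 10 times in a row", false))
	fmt.Println(substitute(body, "twinnation", "healthcheck failed 10 times in a row", true))
}
```

One behavioral difference worth knowing: the replacer works in a single pass, so a description that itself contains `[SERVICE_NAME]` is left untouched, whereas sequential `ReplaceAll` calls would expand it. Also, `strings.ReplaceAll` already returns its input unchanged when the placeholder is absent, so the `strings.Contains` guards are not strictly needed in either approach.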
--- a/core/condition.go
+++ b/core/condition.go
@@ -41,10 +41,10 @@ func (c *Condition) evaluate(result *Result) bool {
 		return false
 	}
 	conditionToDisplay := condition
-	// If the condition isn't a success, return the resolved condition
+	// If the condition isn't a success, return what the resolved condition was too
 	if !success {
 		log.Printf("[Condition][evaluate] Condition '%s' did not succeed because '%s' is false", condition, resolvedCondition)
-		conditionToDisplay = resolvedCondition
+		conditionToDisplay = fmt.Sprintf("%s (%s)", condition, resolvedCondition)
 	}
 	result.ConditionResults = append(result.ConditionResults, &ConditionResult{Condition: conditionToDisplay, Success: success})
 	return success
--- a/core/condition_test.go
+++ b/core/condition_test.go
@@ -166,3 +166,21 @@ func TestCondition_evaluateWithBodyJsonPathComplexIntFailureUsingLessThan(t *tes
 		t.Errorf("Condition '%s' should have been a failure", condition)
 	}
 }
+
+func TestCondition_evaluateWithBodySliceLength(t *testing.T) {
+	condition := Condition("len([BODY].data) == 3")
+	result := &Result{Body: []byte("{\"data\": [{\"id\": 1}, {\"id\": 2}, {\"id\": 3}]}")}
+	condition.evaluate(result)
+	if !result.ConditionResults[0].Success {
+		t.Errorf("Condition '%s' should have been a success", condition)
+	}
+}
+
+func TestCondition_evaluateWithBodyStringLength(t *testing.T) {
+	condition := Condition("len([BODY].name) == 8")
+	result := &Result{Body: []byte("{\"name\": \"john.doe\"}")}
+	condition.evaluate(result)
+	if !result.ConditionResults[0].Success {
+		t.Errorf("Condition '%s' should have been a success", condition)
+	}
+}
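
Review note: the two tests above exercise the new `len(...)` wrapper. As a minimal sketch of the unwrapping step only, the helper below strips the `len(` prefix and `)` suffix and reports whether a length was requested. The name `unwrapLength` and the lowercase constants are hypothetical; they are modeled on `LengthFunctionPrefix` and `FunctionSuffix` from `core/util.go`, and the JSON path evaluation itself is left out.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	lengthFunctionPrefix = "len("
	functionSuffix       = ")"
)

// unwrapLength returns the element without its len(...) wrapper and
// whether the caller asked for a length instead of the value itself.
func unwrapLength(element string) (inner string, wantLength bool) {
	if strings.HasPrefix(element, lengthFunctionPrefix) && strings.HasSuffix(element, functionSuffix) {
		inner = strings.TrimSuffix(strings.TrimPrefix(element, lengthFunctionPrefix), functionSuffix)
		return inner, true
	}
	return element, false
}

func main() {
	for _, element := range []string{"len([BODY].data)", "[BODY].name"} {
		inner, wantLength := unwrapLength(element)
		fmt.Printf("%q -> inner=%q wantLength=%t\n", element, inner, wantLength)
	}
}
```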
--- a/core/service.go
+++ b/core/service.go
@@ -2,6 +2,7 @@ package core

 import (
 	"bytes"
+	"encoding/json"
 	"errors"
 	"github.com/TwinProduction/gatus/client"
 	"io/ioutil"
@@ -16,20 +17,42 @@ var (
 	ErrNoUrl       = errors.New("you must specify an url for each service")
 )

+// Service is the configuration of a monitored endpoint
 type Service struct {
-	Name       string            `yaml:"name"`
-	Url        string            `yaml:"url"`
-	Method     string            `yaml:"method,omitempty"`
-	Body       string            `yaml:"body,omitempty"`
-	Headers    map[string]string `yaml:"headers,omitempty"`
-	Interval   time.Duration     `yaml:"interval,omitempty"`
-	Conditions []*Condition      `yaml:"conditions"`
+	// Name of the service. Can be anything.
+	Name string `yaml:"name"`
+
+	// URL to send the request to
+	Url string `yaml:"url"`
+
+	// Method of the request made to the url of the service
+	Method string `yaml:"method,omitempty"`
+
+	// Body of the request
+	Body string `yaml:"body,omitempty"`
+
+	// GraphQL is whether to wrap the body in a query param ({"query":"$body"})
+	GraphQL bool `yaml:"graphql,omitempty"`
+
+	// Headers of the request
+	Headers map[string]string `yaml:"headers,omitempty"`
+
+	// Interval is the duration to wait between every status check
+	Interval time.Duration `yaml:"interval,omitempty"`
+
+	// Conditions used to determine the health of the service
+	Conditions []*Condition `yaml:"conditions"`
+
+	// Alerts is the alerting configuration for the service in case of failure
+	Alerts []*Alert `yaml:"alerts"`
+
+	NumberOfFailuresInARow int
 }

 func (service *Service) Validate() {
 	// Set default values
 	if service.Interval == 0 {
-		service.Interval = 10 * time.Second
+		service.Interval = 1 * time.Minute
 	}
 	if len(service.Method) == 0 {
 		service.Method = http.MethodGet
@@ -37,6 +60,11 @@ func (service *Service) Validate() {
 	if len(service.Headers) == 0 {
 		service.Headers = make(map[string]string)
 	}
+	for _, alert := range service.Alerts {
+		if alert.Threshold <= 0 {
+			alert.Threshold = 3
+		}
+	}
 	if len(service.Url) == 0 {
 		panic(ErrNoUrl)
 	}
@@ -69,6 +97,20 @@ func (service *Service) EvaluateConditions() *Result {
 	return result
 }

+func (service *Service) GetAlertsTriggered() []Alert {
+	var alerts []Alert
+	if service.NumberOfFailuresInARow == 0 {
+		return alerts
+	}
+	for _, alert := range service.Alerts {
+		if alert.Enabled && alert.Threshold == service.NumberOfFailuresInARow {
+			alerts = append(alerts, *alert)
+			continue
+		}
+	}
+	return alerts
+}
+
 func (service *Service) getIp(result *Result) {
 	urlObject, err := url.Parse(service.Url)
 	if err != nil {
@@ -102,7 +144,17 @@ func (service *Service) call(result *Result) {
 }

 func (service *Service) buildRequest() *http.Request {
-	request, _ := http.NewRequest(service.Method, service.Url, bytes.NewBuffer([]byte(service.Body)))
+	var bodyBuffer *bytes.Buffer
+	if service.GraphQL {
+		graphQlBody := map[string]string{
+			"query": service.Body,
+		}
+		body, _ := json.Marshal(graphQlBody)
+		bodyBuffer = bytes.NewBuffer(body)
+	} else {
+		bodyBuffer = bytes.NewBuffer([]byte(service.Body))
+	}
+	request, _ := http.NewRequest(service.Method, service.Url, bodyBuffer)
 	for k, v := range service.Headers {
 		request.Header.Set(k, v)
 	}
--- a/core/util.go
+++ b/core/util.go
@@ -13,6 +13,9 @@ const (
 	ResponseTimePlaceHolder = "[RESPONSE_TIME]"
 	BodyPlaceHolder         = "[BODY]"

+	LengthFunctionPrefix = "len("
+	FunctionSuffix       = ")"
+
 	InvalidConditionElementSuffix = "(INVALID)"
 )

@@ -32,13 +35,22 @@ func sanitizeAndResolve(list []string, result *Result) []string {
 			element = body
 		default:
 			// if starts with BodyPlaceHolder, then evaluate json path
-			if strings.HasPrefix(element, BodyPlaceHolder) {
-				resolvedElement, err := jsonpath.Eval(strings.Replace(element, fmt.Sprintf("%s.", BodyPlaceHolder), "", 1), result.Body)
+			if strings.Contains(element, BodyPlaceHolder) {
+				wantLength := false
+				if strings.HasPrefix(element, LengthFunctionPrefix) && strings.HasSuffix(element, FunctionSuffix) {
+					wantLength = true
+					element = strings.TrimSuffix(strings.TrimPrefix(element, LengthFunctionPrefix), FunctionSuffix)
+				}
+				resolvedElement, resolvedElementLength, err := jsonpath.Eval(strings.Replace(element, fmt.Sprintf("%s.", BodyPlaceHolder), "", 1), result.Body)
 				if err != nil {
 					result.Errors = append(result.Errors, err.Error())
 					element = fmt.Sprintf("%s %s", element, InvalidConditionElementSuffix)
 				} else {
-					element = resolvedElement
+					if wantLength {
+						element = fmt.Sprintf("%d", resolvedElementLength)
+					} else {
+						element = resolvedElement
+					}
 				}
 			}
 		}
--- a/example/docker-compose-grafana-prometheus/config.yaml
+++ b/example/docker-compose-grafana-prometheus/config.yaml
@@ -2,11 +2,12 @@ metrics: true
 services:
  - name: TwiNNatioN
    url: https://twinnation.org/health
-    interval: 10s
+    interval: 30s
    conditions:
      - "[STATUS] == 200"
  - name: GitHub
    url: https://api.github.com/healthz
+    interval: 5m
    conditions:
      - "[STATUS] == 200"
  - name: Example
--- a/example/kubernetes/gatus.yaml
+++ b/example/kubernetes/gatus.yaml
@@ -10,6 +10,7 @@ data:
          - "[STATUS] == 200"
      - name: GitHub
        url: https://api.github.com/healthz
+        interval: 5m
        conditions:
          - "[STATUS] == 200"
      - name: Example
--- a/go.mod
+++ b/go.mod
@@ -1,6 +1,6 @@
 module github.com/TwinProduction/gatus

-go 1.14
+go 1.15

 require (
 	github.com/prometheus/client_golang v1.2.1
--- a/go.sum
+++ b/go.sum
@@ -18,6 +18,7 @@ github.com/go-stack/stack v1.8.0/go.mod h1:v0f6uXyyMGvRgIKkXu+yp6POWl0qKG85gN/me
 github.com/gogo/protobuf v1.1.1/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ=
 github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
 github.com/golang/protobuf v1.3.1/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
+github.com/golang/protobuf v1.3.2 h1:6nsPYzhq5kReh6QImI3k5qWzO4PEbvbIW2cwSfR/6xs=
 github.com/golang/protobuf v1.3.2/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
 github.com/google/go-cmp v0.3.0/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU=
 github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
--- a/jsonpath/jsonpath.go
+++ b/jsonpath/jsonpath.go
@@ -8,26 +8,30 @@ import (
 )

 // Eval is a half-baked json path implementation that needs some love
-func Eval(path string, b []byte) (string, error) {
+func Eval(path string, b []byte) (string, int, error) {
 	var object interface{}
 	err := json.Unmarshal(b, &object)
 	if err != nil {
 		// Try to unmarshal it into an array instead
-		return "", err
+		return "", 0, err
 	}
 	return walk(path, object)
 }

-func walk(path string, object interface{}) (string, error) {
+func walk(path string, object interface{}) (string, int, error) {
 	keys := strings.Split(path, ".")
 	currentKey := keys[0]
 	switch value := extractValue(currentKey, object).(type) {
 	case map[string]interface{}:
 		return walk(strings.Replace(path, fmt.Sprintf("%s.", currentKey), "", 1), value)
+	case string:
+		return value, len(value), nil
+	case []interface{}:
+		return fmt.Sprintf("%v", value), len(value), nil
 	case interface{}:
-		return fmt.Sprintf("%v", value), nil
+		return fmt.Sprintf("%v", value), 1, nil
 	default:
-		return "", fmt.Errorf("couldn't walk through '%s' because type was '%T', but expected 'map[string]interface{}'", currentKey, value)
+		return "", 0, fmt.Errorf("couldn't walk through '%s' because type was '%T', but expected 'map[string]interface{}'", currentKey, value)
 	}
 }

--- a/jsonpath/jsonpath_test.go
+++ b/jsonpath/jsonpath_test.go
@@ -8,10 +8,13 @@ func TestEval(t *testing.T) {

 	expectedOutput := "value"

-	output, err := Eval(path, []byte(data))
+	output, outputLength, err := Eval(path, []byte(data))
 	if err != nil {
 		t.Error("Didn't expect any error, but got", err)
 	}
+	if outputLength != len(expectedOutput) {
+		t.Errorf("Expected output length to be %v, but was %v", len(expectedOutput), outputLength)
+	}
 	if output != expectedOutput {
 		t.Errorf("Expected output to be %v, but was %v", expectedOutput, output)
 	}
@@ -23,7 +26,7 @@ func TestEvalWithLongSimpleWalk(t *testing.T) {

 	expectedOutput := "value"

-	output, err := Eval(path, []byte(data))
+	output, _, err := Eval(path, []byte(data))
 	if err != nil {
 		t.Error("Didn't expect any error, but got", err)
 	}
@@ -38,10 +41,11 @@ func TestEvalWithArrayOfMaps(t *testing.T) {

 	expectedOutput := "2"

-	output, err := Eval(path, []byte(data))
+	output, _, err := Eval(path, []byte(data))
 	if err != nil {
 		t.Error("Didn't expect any error, but got", err)
 	}
+
 	if output != expectedOutput {
 		t.Errorf("Expected output to be %v, but was %v", expectedOutput, output)
 	}
@@ -53,7 +57,7 @@ func TestEvalWithArrayOfValues(t *testing.T) {

 	expectedOutput := "1"

-	output, err := Eval(path, []byte(data))
+	output, _, err := Eval(path, []byte(data))
 	if err != nil {
 		t.Error("Didn't expect any error, but got", err)
 	}
@@ -68,7 +72,7 @@ func TestEvalWithRootArrayOfValues(t *testing.T) {

 	expectedOutput := "2"

-	output, err := Eval(path, []byte(data))
+	output, _, err := Eval(path, []byte(data))
 	if err != nil {
 		t.Error("Didn't expect any error, but got", err)
 	}
@@ -83,7 +87,7 @@ func TestEvalWithRootArrayOfMaps(t *testing.T) {

 	expectedOutput := "1"

-	output, err := Eval(path, []byte(data))
+	output, _, err := Eval(path, []byte(data))
 	if err != nil {
 		t.Error("Didn't expect any error, but got", err)
 	}
@@ -96,7 +100,7 @@ func TestEvalWithRootArrayOfMapsUsingInvalidArrayIndex(t *testing.T) {
 	path := "[5].id"
 	data := `[{"id": 1}, {"id": 2}]`

-	_, err := Eval(path, []byte(data))
+	_, _, err := Eval(path, []byte(data))
 	if err == nil {
 		t.Error("Should've returned an error, but didn't")
 	}
@@ -108,7 +112,7 @@ func TestEvalWithLongWalkAndArray(t *testing.T) {

 	expectedOutput := "1"

-	output, err := Eval(path, []byte(data))
+	output, _, err := Eval(path, []byte(data))
 	if err != nil {
 		t.Error("Didn't expect any error, but got", err)
 	}
@@ -123,7 +127,7 @@ func TestEvalWithNestedArray(t *testing.T) {

 	expectedOutput := "7"

-	output, err := Eval(path, []byte(data))
+	output, _, err := Eval(path, []byte(data))
 	if err != nil {
 		t.Error("Didn't expect any error, but got", err)
 	}
@@ -138,7 +142,7 @@ func TestEvalWithMapOfNestedArray(t *testing.T) {

 	expectedOutput := "e"

-	output, err := Eval(path, []byte(data))
+	output, _, err := Eval(path, []byte(data))
 	if err != nil {
 		t.Error("Didn't expect any error, but got", err)
 	}
--- a/main.go
+++ b/main.go
@@ -1,13 +1,24 @@
 package main

 import (
-	"encoding/json"
+	"bytes"
+	"compress/gzip"
 	"github.com/TwinProduction/gatus/config"
 	"github.com/TwinProduction/gatus/watchdog"
 	"github.com/prometheus/client_golang/prometheus/promhttp"
 	"log"
 	"net/http"
 	"os"
+	"strings"
+	"time"
+)
+
+const CacheTTL = 10 * time.Second
+
+var (
+	cachedServiceResults          []byte
+	cachedServiceResultsGzipped   []byte
+	cachedServiceResultsTimestamp time.Time
 )

 func main() {
@@ -37,14 +48,29 @@ func loadConfiguration() *config.Config {
 	return config.Get()
 }

-func serviceResultsHandler(writer http.ResponseWriter, _ *http.Request) {
-	serviceResults := watchdog.GetServiceResults()
-	data, err := json.Marshal(serviceResults)
-	if err != nil {
-		log.Printf("[main][serviceResultsHandler] Unable to marshall object to JSON: %s", err.Error())
-		writer.WriteHeader(http.StatusInternalServerError)
-		_, _ = writer.Write([]byte("Unable to marshall object to JSON"))
-		return
+func serviceResultsHandler(writer http.ResponseWriter, r *http.Request) {
+	if isExpired := cachedServiceResultsTimestamp.IsZero() || time.Since(cachedServiceResultsTimestamp) > CacheTTL; isExpired {
+		buffer := &bytes.Buffer{}
+		gzipWriter := gzip.NewWriter(buffer)
+		data, err := watchdog.GetJsonEncodedServiceResults()
+		if err != nil {
+			log.Printf("[main][serviceResultsHandler] Unable to marshal object to JSON: %s", err.Error())
+			writer.WriteHeader(http.StatusInternalServerError)
+			_, _ = writer.Write([]byte("Unable to marshal object to JSON"))
+			return
+		}
+		_, _ = gzipWriter.Write(data)
+		_ = gzipWriter.Close()
+		cachedServiceResults = data
+		cachedServiceResultsGzipped = buffer.Bytes()
+		cachedServiceResultsTimestamp = time.Now()
+	}
+	var data []byte
+	if strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
+		writer.Header().Set("Content-Encoding", "gzip")
+		data = cachedServiceResultsGzipped
+	} else {
+		data = cachedServiceResults
 	}
 	writer.Header().Add("Content-type", "application/json")
 	writer.WriteHeader(http.StatusOK)
--- a/static/favicon.ico
+++ b/static/favicon.ico
--- a/static/index.html
+++ b/static/index.html
@@ -31,11 +31,18 @@
 		}
 		.status {
 			cursor: pointer;
-			transition: opacity 500ms ease-in-out;
+			transition: all 500ms ease-in-out;
+			overflow-x: hidden;
+			padding: .25em 0;
+			color: white;
+		}
+		.title {
+			font-size: 2.5rem;
 		}
 		.status:hover {
 			opacity: 0.7;
 			transition: opacity 100ms ease-in-out;
+			color: black;
 		}
 		.status-over-time {
 			overflow: auto;
@@ -48,6 +55,9 @@
 			opacity: 0.5;
 			margin-top: 5px;
 		}
+		.status-min-max-ms {
+			overflow-x: hidden;
+		}
 		#tooltip {
 			position: fixed;
 			top: 0;
@@ -76,9 +86,16 @@
 	</style>
 </head>
 <body>
-	<div class="container my-3 rounded p-4 border shadow">
-		<div class="mb-3">
-			<div class="display-4">Health Status</div>
+	<div class="container my-3 rounded p-3 border shadow">
+		<div class="mb-2">
+			<div class="row">
+				<div class="col-8 text-left my-auto">
+					<div class="title display-4">Health Status</div>
+				</div>
+				<div class="col-4 text-right">
+					<img src="logo.png" alt="GaTuS" style="position: relative; min-width: 50px; max-width: 200px; width: 20%;"/>
+				</div>
+			</div>
 		</div>
 		<div id="results"></div>
 	</div>
@@ -136,9 +153,15 @@
 			let tooltipBoundingClientRect = document.querySelector('#tooltip').getBoundingClientRect();
 			if (targetLeftPosition + window.scrollX + tooltipBoundingClientRect.width + 50 > document.body.getBoundingClientRect().width) {
 				targetLeftPosition = element.getBoundingClientRect().x - tooltipBoundingClientRect.width + element.getBoundingClientRect().width;
+				if (targetLeftPosition < 0) {
+					targetLeftPosition = 0;
+				}
 			}
-			if (targetTopPosition + window.scrollY + tooltipBoundingClientRect.height + 50 > document.body.getBoundingClientRect().height) {
-				targetTopPosition = element.getBoundingClientRect().y - (tooltipBoundingClientRect.height + 10)
+			if (targetTopPosition + window.scrollY + tooltipBoundingClientRect.height + 50 > document.body.getBoundingClientRect().height && targetTopPosition >= 0) {
+				targetTopPosition = element.getBoundingClientRect().y - (tooltipBoundingClientRect.height + 10);
+				if (targetTopPosition < 0) {
+					targetTopPosition = element.getBoundingClientRect().y + 30;
+				}
 			}
 			$("#tooltip").css({top: targetTopPosition + "px", left: targetLeftPosition + "px"});
 		}
@@ -160,62 +183,69 @@

 		function refreshResults() {
 			$.getJSON("/api/v1/results", function (data) {
-				serviceStatuses = data;
-				let output = "";
-				for (let serviceName in data) {
-					let serviceStatusOverTime = "";
-					let hostname = data[serviceName][data[serviceName].length-1].hostname
-					let minResponseTime = null;
-					let maxResponseTime = null;
-					let newestTimestamp = null;
-					let oldestTimestamp = null;
-					for (let key in data[serviceName]) {
-						let serviceResult = data[serviceName][key];
-						serviceStatusOverTime = createStatusBadge(serviceName, key, serviceResult.success) + serviceStatusOverTime;
-						const responseTime = parseInt(serviceResult.duration/1000000);
-						if (minResponseTime == null || minResponseTime > responseTime) {
-							minResponseTime = responseTime;
-						}
-						if (maxResponseTime == null || maxResponseTime < responseTime) {
-							maxResponseTime = responseTime;
-						}
-						const timestamp = new Date(serviceResult.timestamp);
-						if (newestTimestamp == null || newestTimestamp > timestamp) {
-							newestTimestamp = timestamp;
-						}
-						if (oldestTimestamp == null || oldestTimestamp < timestamp) {
-							oldestTimestamp = timestamp;
-						}
-					}
-					output += ""
-						+ "<div class='container py-3 border-left border-right border-top border-black'>"
-						+ "  <div class='row mb-2'>"
-						+ "    <div class='col-8'>"
-						+ "      <span class='font-weight-bold'>" + serviceName + "</span> <span class='text-secondary font-weight-lighter'>- " + hostname + "</span>"
-						+ "    </div>"
-						+ "    <div class='col-4 text-right'>"
-						+ "      <span class='font-weight-lighter'>" + (minResponseTime === maxResponseTime ? minResponseTime : (minResponseTime + "-" + maxResponseTime)) + "ms</span>"
-						+ "    </div>"
-						+ "  </div>"
-						+ "  <div class='row'>"
-						+ "    <div class='col-12 d-flex flex-row-reverse status-over-time'>"
-						+ "      " + serviceStatusOverTime
-						+ "    </div>"
-						+ "  </div>"
-						+ "  <div class='row status-time-ago'>"
-						+ "    <div class='col-6'>"
-						+ "      " + generatePrettyTimeAgo(newestTimestamp)
-						+ "    </div>"
-						+ "    <div class='col-6 text-right'>"
-						+ "      " + generatePrettyTimeAgo(oldestTimestamp)
-						+ "    </div>"
-						+ "  </div>"
-						+ "</div>";
+				// Update the table only if there's a change
+				if (JSON.stringify(serviceStatuses) !== JSON.stringify(data)) {
+					serviceStatuses = data;
+					buildTable();
 				}
-				$("#results").html(output);
 			});
 		}

+		function buildTable() {
+			let output = "";
+			for (let serviceName in serviceStatuses) {
+				let serviceStatusOverTime = "";
+				let hostname = serviceStatuses[serviceName][serviceStatuses[serviceName].length-1].hostname
+				let minResponseTime = null;
+				let maxResponseTime = null;
+				let newestTimestamp = null;
+				let oldestTimestamp = null;
+				for (let key in serviceStatuses[serviceName]) {
+					let serviceResult = serviceStatuses[serviceName][key];
+					serviceStatusOverTime = createStatusBadge(serviceName, key, serviceResult.success) + serviceStatusOverTime;
+					const responseTime = parseInt(serviceResult.duration/1000000);
+					if (minResponseTime == null || minResponseTime > responseTime) {
+						minResponseTime = responseTime;
+					}
+					if (maxResponseTime == null || maxResponseTime < responseTime) {
+						maxResponseTime = responseTime;
+					}
+					const timestamp = new Date(serviceResult.timestamp);
+					if (newestTimestamp == null || newestTimestamp < timestamp) {
+						newestTimestamp = timestamp;
+					}
+					if (oldestTimestamp == null || oldestTimestamp > timestamp) {
+						oldestTimestamp = timestamp;
+					}
+				}
+				output += ""
+					+ "<div class='container py-3 border-left border-right border-top border-black'>"
+					+ "  <div class='row mb-2'>"
+					+ "    <div class='col-md-10'>"
+					+ "      <span class='font-weight-bold'>" + serviceName + "</span> <span class='text-secondary font-weight-lighter'>- " + hostname + "</span>"
+					+ "    </div>"
+					+ "    <div class='col-md-2 text-right'>"
+					+ "      <span class='font-weight-lighter status-min-max-ms'>" + (minResponseTime === maxResponseTime ? minResponseTime : (minResponseTime + "-" + maxResponseTime)) + "ms</span>"
+					+ "    </div>"
+					+ "  </div>"
+					+ "  <div class='row'>"
+					+ "    <div class='col-12 d-flex flex-row-reverse status-over-time'>"
+					+ "      " + serviceStatusOverTime
+					+ "    </div>"
+					+ "  </div>"
+					+ "  <div class='row status-time-ago'>"
+					+ "    <div class='col-6'>"
+					+ "      " + generatePrettyTimeAgo(oldestTimestamp)
+					+ "    </div>"
+					+ "    <div class='col-6 text-right'>"
+					+ "      " + generatePrettyTimeAgo(newestTimestamp)
+					+ "    </div>"
+					+ "  </div>"
+					+ "</div>";
+			}
+			$("#results").html(output);
+		}
+
 		function prettifyTimestamp(timestamp) {
 			let date = new Date(timestamp);
 			let YYYY = date.getFullYear();
@@ -224,7 +254,7 @@
 			let hh = ((date.getHours())<10?"0":"")+""+(date.getHours());
 			let mm = ((date.getMinutes())<10?"0":"")+""+(date.getMinutes());
 			let ss = ((date.getSeconds())<10?"0":"")+""+(date.getSeconds());
-			return YYYY+"-"+MM+"-"+DD+" "+hh+":"+mm+":"+ss;
+			return YYYY + "-" + MM + "-" + DD + " " + hh + ":" + mm + ":" + ss;
 		}

 		function generatePrettyTimeAgo(t) {
--- a/static/logo-256px.png
+++ b/static/logo-256px.png
--- a/static/logo-candidate.png
+++ b/static/logo-candidate.png
--- a/static/logo-small-padding.png
+++ b/static/logo-small-padding.png
--- a/static/logo-with-name.png
+++ b/static/logo-with-name.png
--- a/static/logo.png
+++ b/static/logo.png
--- a/watchdog/watchdog.go
+++ b/watchdog/watchdog.go
@@ -1,6 +1,8 @@
 package watchdog

 import (
+	"encoding/json"
+	"fmt"
 	"github.com/TwinProduction/gatus/config"
 	"github.com/TwinProduction/gatus/core"
 	"github.com/TwinProduction/gatus/metric"
@@ -11,13 +13,25 @@ import (

 var (
 	serviceResults = make(map[string][]*core.Result)
-	rwLock         sync.RWMutex
+
+	// serviceResultsMutex is used to prevent concurrent map access
+	serviceResultsMutex sync.RWMutex
+
+	// monitoringMutex is used to prevent multiple services from being evaluated at the same time.
+	// Without this, conditions using response time may become inaccurate.
+	monitoringMutex sync.Mutex
 )

-func GetServiceResults() *map[string][]*core.Result {
-	return &serviceResults
+// GetJsonEncodedServiceResults returns the last 20 results of each service, encoded using json.Marshal.
+// The encoding is done here because the map must be read under a mutex to prevent concurrent map access.
+func GetJsonEncodedServiceResults() ([]byte, error) {
+	serviceResultsMutex.RLock()
+	data, err := json.Marshal(serviceResults)
+	serviceResultsMutex.RUnlock()
+	return data, err
 }

+// Monitor loops over each service and starts a goroutine to monitor each service separately
 func Monitor(cfg *config.Config) {
 	for _, service := range cfg.Services {
 		go monitor(service)
@@ -26,26 +40,134 @@ func Monitor(cfg *config.Config) {
 	}
 }

+// monitor monitors a single service in a loop
 func monitor(service *core.Service) {
+	cfg := config.Get()
 	for {
 		// By placing the lock here, we prevent multiple services from being monitored at the exact same time, which
 		// could cause performance issues and return inaccurate results
-		rwLock.Lock()
-		log.Printf("[watchdog][Monitor] Monitoring serviceName=%s", service.Name)
+		monitoringMutex.Lock()
+		if cfg.Debug {
+			log.Printf("[watchdog][monitor] Monitoring serviceName=%s", service.Name)
+		}
 		result := service.EvaluateConditions()
 		metric.PublishMetricsForService(service, result)
+		serviceResultsMutex.Lock()
 		serviceResults[service.Name] = append(serviceResults[service.Name], result)
 		if len(serviceResults[service.Name]) > 20 {
 			serviceResults[service.Name] = serviceResults[service.Name][1:]
 		}
-		rwLock.Unlock()
+		serviceResultsMutex.Unlock()
+		var extra string
+		if !result.Success {
+			extra = fmt.Sprintf("responseBody=%s", result.Body)
+		}
 		log.Printf(
-			"[watchdog][Monitor] Finished monitoring serviceName=%s; errors=%d; requestDuration=%s",
+			"[watchdog][monitor] Monitored serviceName=%s; success=%v; errors=%d; requestDuration=%s; %s",
 			service.Name,
+			result.Success,
 			len(result.Errors),
 			result.Duration.Round(time.Millisecond),
+			extra,
 		)
-		log.Printf("[watchdog][Monitor] Waiting interval=%s before monitoring serviceName=%s", service.Interval, service.Name)
+		handleAlerting(service, result)
+		if cfg.Debug {
+			log.Printf("[watchdog][monitor] Waiting for interval=%s before monitoring serviceName=%s again", service.Interval, service.Name)
+		}
+		monitoringMutex.Unlock()
 		time.Sleep(service.Interval)
 	}
 }
+
+func handleAlerting(service *core.Service, result *core.Result) {
+	cfg := config.Get()
+	if cfg.Alerting == nil {
+		return
+	}
+	if result.Success {
+		if service.NumberOfFailuresInARow > 0 {
+			for _, alert := range service.Alerts {
+				if !alert.Enabled || !alert.SendOnResolved || alert.Threshold > service.NumberOfFailuresInARow {
+					continue
+				}
+				var alertProvider *core.CustomAlertProvider
+				if alert.Type == core.SlackAlert {
+					if len(cfg.Alerting.Slack) > 0 {
+						log.Printf("[watchdog][handleAlerting] Sending Slack alert because alert with description=%s has been resolved", alert.Description)
+						alertProvider = core.CreateSlackCustomAlertProvider(cfg.Alerting.Slack, service, alert, result, true)
+					} else {
+						log.Printf("[watchdog][handleAlerting] Not sending Slack alert despite being resolved, because there is no Slack webhook configured")
+					}
+				} else if alert.Type == core.TwilioAlert {
+					if cfg.Alerting.Twilio != nil && cfg.Alerting.Twilio.IsValid() {
+						log.Printf("[watchdog][handleAlerting] Sending Twilio alert because alert with description=%s has been resolved", alert.Description)
+						alertProvider = core.CreateTwilioCustomAlertProvider(cfg.Alerting.Twilio, fmt.Sprintf("RESOLVED: %s - %s", service.Name, alert.Description))
+					} else {
+						log.Printf("[watchdog][handleAlerting] Not sending Twilio alert despite being resolved, because Twilio isn't configured properly")
+					}
+				} else if alert.Type == core.CustomAlert {
+					if cfg.Alerting.Custom != nil && cfg.Alerting.Custom.IsValid() {
+						log.Printf("[watchdog][handleAlerting] Sending custom alert because alert with description=%s has been resolved", alert.Description)
+						alertProvider = &core.CustomAlertProvider{
+							Url:     cfg.Alerting.Custom.Url,
+							Method:  cfg.Alerting.Custom.Method,
+							Body:    cfg.Alerting.Custom.Body,
+							Headers: cfg.Alerting.Custom.Headers,
+						}
+					} else {
+						log.Printf("[watchdog][handleAlerting] Not sending custom alert despite being resolved, because the custom provider isn't configured properly")
+					}
+				}
+				if alertProvider != nil {
+					err := alertProvider.Send(service.Name, alert.Description, true)
+					if err != nil {
+						log.Printf("[watchdog][handleAlerting] Ran into error sending an alert: %s", err.Error())
+					}
+				}
+			}
+		}
+		service.NumberOfFailuresInARow = 0
+	} else {
+		service.NumberOfFailuresInARow++
+		for _, alert := range service.Alerts {
+			// If the alert hasn't been triggered, move to the next one
+			if !alert.Enabled || alert.Threshold != service.NumberOfFailuresInARow {
+				continue
+			}
+			var alertProvider *core.CustomAlertProvider
+			if alert.Type == core.SlackAlert {
+				if len(cfg.Alerting.Slack) > 0 {
+					log.Printf("[watchdog][handleAlerting] Sending Slack alert because alert with description=%s has been triggered", alert.Description)
+					alertProvider = core.CreateSlackCustomAlertProvider(cfg.Alerting.Slack, service, alert, result, false)
+				} else {
+					log.Printf("[watchdog][handleAlerting] Not sending Slack alert despite being triggered, because there is no Slack webhook configured")
+				}
+			} else if alert.Type == core.TwilioAlert {
+				if cfg.Alerting.Twilio != nil && cfg.Alerting.Twilio.IsValid() {
+					log.Printf("[watchdog][handleAlerting] Sending Twilio alert because alert with description=%s has been triggered", alert.Description)
+					alertProvider = core.CreateTwilioCustomAlertProvider(cfg.Alerting.Twilio, fmt.Sprintf("TRIGGERED: %s - %s", service.Name, alert.Description))
+				} else {
+					log.Printf("[watchdog][handleAlerting] Not sending Twilio alert despite being triggered, because Twilio isn't configured properly")
+				}
+			} else if alert.Type == core.CustomAlert {
+				if cfg.Alerting.Custom != nil && cfg.Alerting.Custom.IsValid() {
+					log.Printf("[watchdog][handleAlerting] Sending custom alert because alert with description=%s has been triggered", alert.Description)
+					alertProvider = &core.CustomAlertProvider{
+						Url:     cfg.Alerting.Custom.Url,
+						Method:  cfg.Alerting.Custom.Method,
+						Body:    cfg.Alerting.Custom.Body,
+						Headers: cfg.Alerting.Custom.Headers,
+					}
+				} else {
+					log.Printf("[watchdog][handleAlerting] Not sending custom alert despite being triggered, because there is no custom url configured")
+				}
+			}
+			if alertProvider != nil {
+				err := alertProvider.Send(service.Name, alert.Description, false)
+				if err != nil {
+					log.Printf("[watchdog][handleAlerting] Ran into error sending an alert: %s", err.Error())
+				}
+			}
+		}
+	}
+}
Author	SHA1	Message	Date
Christian C	4df1baf432	Merge pull request #10 from TwinProduction/notify-on-resolved Support sending an alert when an unhealthy service becomes healthy again	2020-09-04 22:23:47 -04:00
TwinProduction	5a7164b17d	Minor fix	2020-09-04 22:15:22 -04:00
TwinProduction	d4623f5c61	Add [ALERT_TRIGGERED_OR_RESOLVED] placeholder for custom alert provider Fix placeholder bug in CustomAlertProvider	2020-09-04 21:57:31 -04:00
TwinProduction	139e186ac2	Support sending notifications when alert is resolved Add debug parameter for those wishing to filter some noise from the logs	2020-09-04 21:31:28 -04:00
TwinProduction	8a0a2ef51f	Fix typo	2020-09-04 18:53:55 -04:00
TwinProduction	51ea912cf9	Start working on notifications when service is back to healthy (#9)	2020-09-04 18:23:56 -04:00
Greg Holmes	db7c516819	Add support for Twilio alerts (#7)	2020-09-04 17:43:14 -04:00
TwinProduction	f893c0ee7f	Fix failing tests due to new default interval (from 10s to 60s)	2020-09-01 12:46:23 -04:00
TwinProduction	0454854f04	Improve documentation	2020-09-01 00:29:17 -04:00
TwinProduction	42dd6a1e88	Remove unnecessarily ignored rules	2020-09-01 00:28:49 -04:00
TwinProduction	64a160923b	Update default interval to 60s	2020-09-01 00:25:57 -04:00
TwinProduction	dad09e780e	Add features section and cute image	2020-08-29 13:23:03 -04:00
TwinProduction	2cb1600f94	Fix typo	2020-08-28 01:03:52 -04:00
TwinProduction	37c4715453	Support custom alert provider	2020-08-27 22:23:21 -04:00
TwinProduction	4b57654592	Fix issue with tooltip overflowing at the top	2020-08-25 14:27:13 -04:00
TwinProduction	af6298de05	Add documentation for alerts	2020-08-22 14:15:44 -04:00
TwinProduction	22fef4e9aa	Add tests for alert configuration	2020-08-22 14:15:21 -04:00
TwinProduction	9a3c9e4d61	Set default alert threshold to 3	2020-08-22 14:15:08 -04:00
TwinProduction	62f7bdbd63	Add favicon.ico and logo-small-padding.png	2020-08-21 22:17:53 -04:00
TwinProduction	04d6c8bb82	Improve mobile-friendliness and add logo	2020-08-21 22:07:46 -04:00
TwinProduction	e1721fa237	Update Go to 1.15	2020-08-21 21:57:23 -04:00
TwinProduction	6f4cf69c4e	Implement Slack alerting (#2)	2020-08-20 21:11:22 -04:00
TwinProduction	6596d253aa	Continue working on #2: Slack alerts	2020-08-19 19:41:01 -04:00
TwinProduction	857fe5eb8c	Rename SendMessage to SendSlackMessage	2020-08-19 19:40:00 -04:00
TwinProduction	8abcab6a8f	Start working on #2: Slack alerts	2020-08-18 22:24:00 -04:00
TwinProduction	0fd8bf4198	Add Go report card badge	2020-08-17 22:21:20 -04:00
TwinProduction	946101e995	Add documentation in watchdog.go	2020-08-17 20:25:29 -04:00
TwinProduction	f930687b4a	Clean up code for len() function	2020-08-16 15:19:53 -04:00
TwinProduction	43aa31be58	Add missing yaml identifier to enable code highlighting	2020-08-15 18:34:05 -04:00
TwinProduction	adfee25a22	Update interval in config.yaml	2020-08-15 16:59:05 -04:00
TwinProduction	1f241ecdb3	Support Gzip and cache result to prevent wasting CPU	2020-08-15 16:44:28 -04:00
TwinProduction	7849cc6dd4	Regenerate the table only if there's a change	2020-08-15 16:42:47 -04:00
TwinProduction	a62eab58ef	Update examples	2020-08-14 20:05:10 -04:00
TwinProduction	da92907873	Add support for getting the length of the string or the slice of a json path	2020-08-12 21:42:13 -04:00
TwinProduction	937b136e60	Update README.md	2020-07-24 18:38:35 -04:00
TwinProduction	12db0d7c40	Allocate more space for service name and host	2020-07-24 18:36:16 -04:00
TwinProduction	f50589e3c4	Add support for simple GraphQL requests	2020-07-24 16:45:51 -04:00