Compare commits

...

20 Commits

Author SHA1 Message Date
TwinProduction
9f8a74f5b1 Add subtle Github link in bottom right corner 2020-09-08 18:28:19 -04:00
azerty
2888254bda Add Docker Compose example (#11)
* Add Docker Compose example

* Revert README.md

* Move docker-compose sample to examples folder

* docker-compose folder has its own config.yaml
2020-09-06 12:37:31 -04:00
TwinProduction
e31c017a00 Stop using CDNs + Remove bootstrap.min.js because it wasn't necessary
+ Gzip http.FileServer
2020-09-06 00:55:01 -04:00
TwinProduction
ed4ed520b7 Update Docker instructions 2020-09-06 00:27:26 -04:00
TwinProduction
6e1f888e39 Improve test coverage 2020-09-05 00:01:12 -04:00
Christian C
4df1baf432 Merge pull request #10 from TwinProduction/notify-on-resolved
Support sending an alert when an unhealthy service becomes healthy again
2020-09-04 22:23:47 -04:00
TwinProduction
5a7164b17d Minor fix 2020-09-04 22:15:22 -04:00
TwinProduction
d4623f5c61 Add [ALERT_TRIGGERED_OR_RESOLVED] placeholder for custom alert provider
Fix placeholder bug in CustomAlertProvider
2020-09-04 21:57:31 -04:00
TwinProduction
139e186ac2 Support sending notifications when alert is resolved
Add debug parameter for those wishing to filter some noise from the logs
2020-09-04 21:31:28 -04:00
TwinProduction
8a0a2ef51f Fix typo 2020-09-04 18:53:55 -04:00
TwinProduction
51ea912cf9 Start working on notifications when service is back to healthy (#9) 2020-09-04 18:23:56 -04:00
Greg Holmes
db7c516819 Add support for Twilio alerts (#7) 2020-09-04 17:43:14 -04:00
TwinProduction
f893c0ee7f Fix failing tests due to new default interval (from 10s to 60s) 2020-09-01 12:46:23 -04:00
TwinProduction
0454854f04 Improve documentation 2020-09-01 00:29:17 -04:00
TwinProduction
42dd6a1e88 Remove unnecessarily ignored rules 2020-09-01 00:28:49 -04:00
TwinProduction
64a160923b Update default interval to 60s 2020-09-01 00:25:57 -04:00
TwinProduction
dad09e780e Add features section and cute image 2020-08-29 13:23:03 -04:00
TwinProduction
2cb1600f94 Fix typo 2020-08-28 01:03:52 -04:00
TwinProduction
37c4715453 Support custom alert provider 2020-08-27 22:23:21 -04:00
TwinProduction
4b57654592 Fix issue with tooltip overflowing at the top 2020-08-25 14:27:13 -04:00
23 changed files with 617 additions and 153 deletions

BIN
.github/assets/example.png vendored Normal file (binary file added, 25 KiB)

BIN
.github/assets/slack-alerts.png vendored Normal file (binary file added, 34 KiB)

4
.gitignore vendored
View File

@@ -1,4 +1,2 @@
bin
.idea
test.db
daily.db
.idea

200
README.md
View File

@@ -11,6 +11,33 @@ I personally deploy it in my Kubernetes cluster and have it monitor the status o
core applications: https://status.twinnation.org/
## Table of Contents
- [Features](#features)
- [Usage](#usage)
- [Configuration](#configuration)
- [Conditions](#conditions)
- [Docker](#docker)
- [Running the tests](#running-the-tests)
- [Using in Production](#using-in-production)
- [FAQ](#faq)
- [Sending a GraphQL request](#sending-a-graphql-request)
- [Configuring Slack alerts](#configuring-slack-alerts)
- [Configuring Twilio alerts](#configuring-twilio-alerts)
- [Configuring custom alerts](#configuring-custom-alerts)
## Features
The main features of Gatus are:
- **Highly flexible health check conditions**: While checking the response status may be enough for some use cases, Gatus goes much further and allows you to add conditions on the response time, the response body and even the IP address.
- **Ability to use Gatus for user acceptance tests**: Thanks to the point above, you can leverage this application to create automated user acceptance tests.
- **Very easy to configure**: Not only is the configuration designed to be as readable as possible, it's also extremely easy to add a new service or a new endpoint to monitor.
- **Alerting**: While having a pretty visual dashboard is useful to keep track of the state of your application(s), you probably don't want to stare at it all day. Thus, notifications via Slack are supported out of the box with the ability to configure a custom alerting provider for any needs you might have, whether it be a different provider like PagerDuty or a custom application that manages automated rollbacks.
- **Metrics**
- **Low resource consumption**: As with most Go applications, the resource footprint that this application requires is negligibly small.
## Usage
By default, the configuration file is expected to be at `config/config.yaml`.
@@ -23,74 +50,81 @@ Here's a simple example:
metrics: true # Whether to expose metrics at /metrics
services:
- name: twinnation # Name of your service, can be anything
url: https://twinnation.org/health
interval: 15s # Duration to wait between every status check (default: 10s)
url: "https://twinnation.org/health"
interval: 30s # Duration to wait between every status check (default: 60s)
conditions:
- "[STATUS] == 200" # Status must be 200
- "[BODY].status == UP" # The json path "$.status" must be equal to UP
- "[RESPONSE_TIME] < 300" # Response time must be under 300ms
- name: example
url: https://example.org/
url: "https://example.org/"
interval: 30s
conditions:
- "[STATUS] == 200"
```
Note that you can also add environment variables in the your configuration file (i.e. `$DOMAIN`, `${DOMAIN}`)
This example would look like this:
![Simple example](.github/assets/example.png)
Note that you can also add environment variables in the configuration file (e.g. `$DOMAIN`, `${DOMAIN}`)
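For instance, here is a minimal sketch assuming a `DOMAIN` environment variable is set in Gatus' environment:
```yaml
services:
  - name: example
    # $DOMAIN (or ${DOMAIN}) is replaced with the value of the DOMAIN environment variable
    url: "https://$DOMAIN/health"
    conditions:
      - "[STATUS] == 200"
```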
### Configuration
| Parameter | Description | Default |
| --------------------------------- | --------------------------------------------------------------- | -------------- |
| `metrics` | Whether to expose metrics at /metrics | `false` |
| `alerting.slack` | Webhook to use for alerts of type `slack` | `""` |
| `services[].name` | Name of the service. Can be anything. | Required `""` |
| `services[].url` | URL to send the request to | Required `""` |
| `services[].conditions` | Conditions used to determine the health of the service | `[]` |
| `services[].interval` | Duration to wait between every status check | `10s` |
| `services[].method` | Request method | `GET` |
| `services[].graphql` | Whether to wrap the body in a query param (`{"query":"$body"}`) | `false` |
| `services[].body` | Request body | `""` |
| `services[].headers` | Request headers | `{}` |
| `services[].alerts[].type` | Type of alert. Currently, only `slack` is supported | Required `""` |
| `services[].alerts[].enabled` | Whether to enable the alert | `false` |
| `services[].alerts[].threshold` | Number of failures in a row needed before triggering the alert | `3` |
| `services[].alerts[].description` | Description of the alert. Will be included in the alert sent | `""` |
| Parameter | Description | Default |
| -------------------------------------- | --------------------------------------------------------------- | -------------- |
| `debug` | Whether to enable debug logs | `false` |
| `metrics` | Whether to expose metrics at /metrics | `false` |
| `services` | List of services to monitor | Required `[]` |
| `services[].name` | Name of the service. Can be anything. | Required `""` |
| `services[].url` | URL to send the request to | Required `""` |
| `services[].conditions` | Conditions used to determine the health of the service | `[]` |
| `services[].interval` | Duration to wait between every status check | `60s` |
| `services[].method` | Request method | `GET` |
| `services[].graphql` | Whether to wrap the body in a query param (`{"query":"$body"}`) | `false` |
| `services[].body` | Request body | `""` |
| `services[].headers` | Request headers | `{}` |
| `services[].alerts[].type` | Type of alert. Valid types: `slack`, `twilio`, `custom` | Required `""` |
| `services[].alerts[].enabled` | Whether to enable the alert | `false` |
| `services[].alerts[].threshold` | Number of failures in a row needed before triggering the alert | `3` |
| `services[].alerts[].description` | Description of the alert. Will be included in the alert sent | `""` |
| `services[].alerts[].send-on-resolved` | Whether to send a notification once a triggered alert subsides | `false` |
| `alerting` | Configuration for alerting | `{}` |
| `alerting.slack` | Webhook to use for alerts of type `slack` | `""` |
| `alerting.twilio` | Settings for alerts of type `twilio` | `""` |
| `alerting.twilio.sid` | Twilio account SID | Required `""` |
| `alerting.twilio.token` | Twilio auth token | Required `""` |
| `alerting.twilio.from` | Number to send Twilio alerts from | Required `""` |
| `alerting.twilio.to`                    | Number to send Twilio alerts to                                  | Required `""`  |
| `alerting.custom`                       | Configuration for custom actions on failure or alerts            | `""`           |
| `alerting.custom.url`                   | Custom alerting request URL                                      | `""`           |
| `alerting.custom.body`                  | Custom alerting request body                                     | `""`           |
| `alerting.custom.headers` | Custom alerting request headers | `{}` |
### Conditions
Here are some examples of conditions you can use:
| Condition | Description | Passing values | Failing values |
| -----------------------------| ------------------------------------------------------- | ------------------------ | ----------------------- |
| `[STATUS] == 200` | Status must be equal to 200 | 200 | 201, 404, 500 |
| `[STATUS] < 300` | Status must lower than 300 | 200, 201, 299 | 301, 302, 400, 500 |
| `[STATUS] <= 299` | Status must be less than or equal to 299 | 200, 201, 299 | 301, 302, 400, 500 |
| `[STATUS] > 400` | Status must be greater than 400 | 401, 402, 403, 404 | 200, 201, 300, 400 |
| `[RESPONSE_TIME] < 500` | Response time must be below 500ms | 100ms, 200ms, 300ms | 500ms, 1500ms |
| `[BODY] == 1` | The body must be equal to 1 | 1 | literally anything else |
| `[BODY].data.id == 1` | The jsonpath `$.data.id` is equal to 1 | `{"data":{"id":1}}` | literally anything else |
| `[BODY].data[0].id == 1` | The jsonpath `$.data[0].id` is equal to 1 | `{"data":[{"id":1}]}` | literally anything else |
| `len([BODY].data) > 0` | Array at jsonpath `$.data` has less than 5 elements | `{"data":[{"id":1}]}` | `{"data":[{"id":1}]}` |
| `len([BODY].name) == 8` | String at jsonpath `$.name` has a length of 8 | `{"name":"john.doe"}` | `{"name":"bob"}` |
**NOTE**: `[BODY]` with JSON path (i.e. `[BODY].id == 1`) is currently in BETA. For the most part, the only thing that doesn't work is arrays.
| Condition | Description | Passing values | Failing values |
| -----------------------------| ------------------------------------------------------- | ------------------------ | -------------- |
| `[STATUS] == 200` | Status must be equal to 200 | 200 | 201, 404, ... |
| `[STATUS] < 300`             | Status must be lower than 300                            | 200, 201, 299            | 301, 302, ...  |
| `[STATUS] <= 299` | Status must be less than or equal to 299 | 200, 201, 299 | 301, 302, ... |
| `[STATUS] > 400` | Status must be greater than 400 | 401, 402, 403, 404 | 400, 200, ... |
| `[RESPONSE_TIME] < 500` | Response time must be below 500ms | 100ms, 200ms, 300ms | 500ms, 501ms |
| `[BODY] == 1` | The body must be equal to 1 | 1 | Anything else |
| `[BODY].data.id == 1` | The jsonpath `$.data.id` is equal to 1 | `{"data":{"id":1}}` | |
| `[BODY].data[0].id == 1` | The jsonpath `$.data[0].id` is equal to 1 | `{"data":[{"id":1}]}` | |
| `len([BODY].data) > 0`       | Array at jsonpath `$.data` is not empty                  | `{"data":[{"id":1}]}`    | `{"data":[]}`  |
| `len([BODY].name) == 8` | String at jsonpath `$.name` has a length of 8 | `{"name":"john.doe"}` | `{"name":"bob"}` |
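As a sketch, here is what a service combining several of the conditions above could look like (the name, URL and thresholds below are placeholders):
```yaml
services:
  - name: example-api                # hypothetical service
    url: "https://example.org/api/v1/data"
    interval: 30s
    conditions:
      - "[STATUS] == 200"            # response status must be 200
      - "[RESPONSE_TIME] < 500"      # response time must be under 500ms
      - "len([BODY].data) > 0"       # the array at $.data must not be empty
      - "[BODY].data[0].id == 1"     # jsonpath into the first element of $.data
```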
## Docker
Building the Docker image is done as follows:
```
docker build . -t gatus
```
You can then run the container with the following command:
```
docker run -p 8080:8080 --name gatus gatus
docker run -p 8080:8080 --name gatus twinproduction/gatus
```
@@ -145,21 +179,95 @@ will send a `POST` request to `http://localhost:8080/playground` with the follow
```yaml
alerting:
slack: https://hooks.slack.com/services/**********/**********/**********
slack: "https://hooks.slack.com/services/**********/**********/**********"
services:
- name: twinnation
interval: 30s
url: https://twinnation.org/health
url: "https://twinnation.org/health"
alerts:
- type: slack
enabled: true
description: "healthcheck failed 3 times in a row"
send-on-resolved: true
- type: slack
enabled: true
threshold: 5
description: "healthcheck failed 5 times in a row"
send-on-resolved: true
conditions:
- "[STATUS] == 200"
- "[BODY].status == UP"
- "[RESPONSE_TIME] < 300"
```
Here's an example of what the notifications look like:
![Slack notifications](.github/assets/slack-alerts.png)
### Configuring Twilio alerts
```yaml
alerting:
twilio:
sid: "..."
token: "..."
from: "+1-234-567-8901"
to: "+1-234-567-8901"
services:
- name: twinnation
interval: 30s
url: "https://twinnation.org/health"
alerts:
- type: twilio
enabled: true
threshold: 5
description: "healthcheck failed 5 times in a row"
conditions:
- "[STATUS] == 200"
- "[BODY].status == UP"
- "[RESPONSE_TIME] < 300"
```
### Configuring custom alerts
While they're called alerts, you can use this feature to call anything.
For instance, you could automate rollbacks by having an application that keeps track of new deployments and
having Gatus call that application's endpoint when a service starts failing. Your application would then check
whether the service that started failing was recently deployed and, if it was, automatically roll it back.
The placeholders `[ALERT_DESCRIPTION]` and `[SERVICE_NAME]` are automatically replaced with the alert description and the
service name respectively, in both the body (`alerting.custom.body`) and the url (`alerting.custom.url`).
If you have `send-on-resolved` set to `true`, you may want to use `[ALERT_TRIGGERED_OR_RESOLVED]` to differentiate
the notifications. It will be replaced by either `TRIGGERED` or `RESOLVED`, depending on the situation.
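As a rough sketch of the rollback idea above, a custom alert could call a hypothetical in-house deployment service (the `deployments.internal` host and path below are made up for illustration):
```yaml
alerting:
  custom:
    # Hypothetical endpoint of an internal deployment tracker (not a real service)
    url: "https://deployments.internal/api/v1/services/[SERVICE_NAME]/rollback"
    method: "POST"
    body: |
      {
        "reason": "[ALERT_DESCRIPTION]",
        "status": "[ALERT_TRIGGERED_OR_RESOLVED]"
      }
```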
For all intents and purposes, the example below configures the custom alert with a Slack webhook, but you can call anything you want.
```yaml
alerting:
custom:
url: "https://hooks.slack.com/services/**********/**********/**********"
method: "POST"
body: |
{
"text": "[ALERT_TRIGGERED_OR_RESOLVED]: [SERVICE_NAME] - [ALERT_DESCRIPTION]"
}
services:
- name: twinnation
interval: 30s
url: "https://twinnation.org/health"
alerts:
- type: custom
enabled: true
threshold: 10
send-on-resolved: true
description: "healthcheck failed 10 times in a row"
conditions:
- "[STATUS] == 200"
- "[BODY].status == UP"
- "[RESPONSE_TIME] < 300"
```

View File

@@ -1,31 +0,0 @@
package alerting
import (
"bytes"
"encoding/json"
"fmt"
"github.com/TwinProduction/gatus/client"
"io/ioutil"
)
type requestBody struct {
Text string `json:"text"`
}
// SendSlackMessage sends a message to the given Slack webhook
func SendSlackMessage(webhookUrl, service, description string) error {
body, _ := json.Marshal(requestBody{Text: fmt.Sprintf("*[Gatus]*\n*service:* %s\n*description:* %s", service, description)})
response, err := client.GetHttpClient().Post(webhookUrl, "application/json", bytes.NewBuffer(body))
if err != nil {
return err
}
defer response.Body.Close()
output, err := ioutil.ReadAll(response.Body)
if err != nil {
return fmt.Errorf("unable to read response body: %v", err.Error())
}
if string(output) != "ok" {
return fmt.Errorf("error: %s", string(output))
}
return nil
}

View File

@@ -9,7 +9,7 @@ services:
- "[RESPONSE_TIME] < 1000"
- name: twinnation-articles-api
interval: 30s
url: https://twinnation.org/api/v1/articles/24
url: "https://twinnation.org/api/v1/articles/24"
conditions:
- "[STATUS] == 200"
- "[BODY].id == 24"

View File

@@ -21,9 +21,10 @@ var (
)
type Config struct {
Metrics bool `yaml:"metrics"`
Alerting *core.Alerting `yaml:"alerting"`
Services []*core.Service `yaml:"services"`
Metrics bool `yaml:"metrics"`
Debug bool `yaml:"debug"`
Alerting *core.AlertingConfig `yaml:"alerting"`
Services []*core.Service `yaml:"services"`
}
func Get() *Config {

View File

@@ -40,8 +40,8 @@ services:
if config.Services[0].Interval != 15*time.Second {
t.Errorf("Interval should have been %s", 15*time.Second)
}
if config.Services[1].Interval != 10*time.Second {
t.Errorf("Interval should have been %s, because it is the default value", 10*time.Second)
if config.Services[1].Interval != 60*time.Second {
t.Errorf("Interval should have been %s, because it is the default value", 60*time.Second)
}
if len(config.Services[0].Conditions) != 1 {
t.Errorf("There should have been %d conditions", 1)
@@ -71,8 +71,8 @@ services:
if config.Services[0].Url != "https://twinnation.org/actuator/health" {
t.Errorf("URL should have been %s", "https://twinnation.org/actuator/health")
}
if config.Services[0].Interval != 10*time.Second {
t.Errorf("Interval should have been %s, because it is the default value", 10*time.Second)
if config.Services[0].Interval != 60*time.Second {
t.Errorf("Interval should have been %s, because it is the default value", 60*time.Second)
}
}
@@ -97,8 +97,8 @@ services:
if config.Services[0].Url != "https://twinnation.org/actuator/health" {
t.Errorf("URL should have been %s", "https://twinnation.org/actuator/health")
}
if config.Services[0].Interval != 10*time.Second {
t.Errorf("Interval should have been %s, because it is the default value", 10*time.Second)
if config.Services[0].Interval != 60*time.Second {
t.Errorf("Interval should have been %s, because it is the default value", 60*time.Second)
}
}
@@ -143,7 +143,7 @@ services:
t.Error("Metrics should've been false by default")
}
if config.Alerting == nil {
t.Fatal("config.Alerting shouldn't have been nil")
t.Fatal("config.AlertingConfig shouldn't have been nil")
}
if config.Alerting.Slack != "http://example.com" {
t.Errorf("Slack webhook should've been %s, but was %s", "http://example.com", config.Alerting.Slack)
@@ -154,8 +154,8 @@ services:
if config.Services[0].Url != "https://twinnation.org/actuator/health" {
t.Errorf("URL should have been %s", "https://twinnation.org/actuator/health")
}
if config.Services[0].Interval != 10*time.Second {
t.Errorf("Interval should have been %s, because it is the default value", 10*time.Second)
if config.Services[0].Interval != 60*time.Second {
t.Errorf("Interval should have been %s, because it is the default value", 60*time.Second)
}
if config.Services[0].Alerts == nil {
t.Fatal("The service alerts shouldn't have been nil")

View File

@@ -13,10 +13,15 @@ type Alert struct {
// Description of the alert. Will be included in the alert sent.
Description string `yaml:"description"`
// SendOnResolved defines whether to send a second notification when the issue has been resolved
SendOnResolved bool `yaml:"send-on-resolved"`
}
type AlertType string
const (
SlackAlert AlertType = "slack"
SlackAlert AlertType = "slack"
TwilioAlert AlertType = "twilio"
CustomAlert AlertType = "custom"
)

View File

@@ -1,5 +1,149 @@
package core
type Alerting struct {
Slack string `yaml:"slack"`
import (
"bytes"
"encoding/base64"
"fmt"
"github.com/TwinProduction/gatus/client"
"net/http"
"net/url"
"strings"
)
type AlertingConfig struct {
Slack string `yaml:"slack"`
Twilio *TwilioAlertProvider `yaml:"twilio"`
Custom *CustomAlertProvider `yaml:"custom"`
}
type TwilioAlertProvider struct {
SID string `yaml:"sid"`
Token string `yaml:"token"`
From string `yaml:"from"`
To string `yaml:"to"`
}
func (provider *TwilioAlertProvider) IsValid() bool {
return len(provider.Token) > 0 && len(provider.SID) > 0 && len(provider.From) > 0 && len(provider.To) > 0
}
type CustomAlertProvider struct {
Url string `yaml:"url"`
Method string `yaml:"method,omitempty"`
Body string `yaml:"body,omitempty"`
Headers map[string]string `yaml:"headers,omitempty"`
}
func (provider *CustomAlertProvider) IsValid() bool {
return len(provider.Url) > 0
}
func (provider *CustomAlertProvider) buildRequest(serviceName, alertDescription string, resolved bool) *http.Request {
body := provider.Body
providerUrl := provider.Url
if strings.Contains(body, "[ALERT_DESCRIPTION]") {
body = strings.ReplaceAll(body, "[ALERT_DESCRIPTION]", alertDescription)
}
if strings.Contains(body, "[SERVICE_NAME]") {
body = strings.ReplaceAll(body, "[SERVICE_NAME]", serviceName)
}
if strings.Contains(body, "[ALERT_TRIGGERED_OR_RESOLVED]") {
if resolved {
body = strings.ReplaceAll(body, "[ALERT_TRIGGERED_OR_RESOLVED]", "RESOLVED")
} else {
body = strings.ReplaceAll(body, "[ALERT_TRIGGERED_OR_RESOLVED]", "TRIGGERED")
}
}
if strings.Contains(providerUrl, "[ALERT_DESCRIPTION]") {
providerUrl = strings.ReplaceAll(providerUrl, "[ALERT_DESCRIPTION]", alertDescription)
}
if strings.Contains(providerUrl, "[SERVICE_NAME]") {
providerUrl = strings.ReplaceAll(providerUrl, "[SERVICE_NAME]", serviceName)
}
if strings.Contains(providerUrl, "[ALERT_TRIGGERED_OR_RESOLVED]") {
if resolved {
providerUrl = strings.ReplaceAll(providerUrl, "[ALERT_TRIGGERED_OR_RESOLVED]", "RESOLVED")
} else {
providerUrl = strings.ReplaceAll(providerUrl, "[ALERT_TRIGGERED_OR_RESOLVED]", "TRIGGERED")
}
}
bodyBuffer := bytes.NewBuffer([]byte(body))
request, _ := http.NewRequest(provider.Method, providerUrl, bodyBuffer)
for k, v := range provider.Headers {
request.Header.Set(k, v)
}
return request
}
func (provider *CustomAlertProvider) Send(serviceName, alertDescription string, resolved bool) error {
request := provider.buildRequest(serviceName, alertDescription, resolved)
response, err := client.GetHttpClient().Do(request)
if err != nil {
return err
}
if response.StatusCode > 399 {
return fmt.Errorf("call to provider alert returned status code %d", response.StatusCode)
}
return nil
}
func CreateSlackCustomAlertProvider(slackWebHookUrl string, service *Service, alert *Alert, result *Result, resolved bool) *CustomAlertProvider {
var message string
var color string
if resolved {
message = fmt.Sprintf("An alert for *%s* has been resolved after %d failures in a row", service.Name, service.NumberOfFailuresInARow)
color = "#36A64F"
} else {
message = fmt.Sprintf("An alert for *%s* has been triggered", service.Name)
color = "#DD0000"
}
var results string
for _, conditionResult := range result.ConditionResults {
var prefix string
if conditionResult.Success {
prefix = ":heavy_check_mark:"
} else {
prefix = ":x:"
}
results += fmt.Sprintf("%s - `%s`\\n", prefix, conditionResult.Condition) // escape the newline so the JSON body stays valid
}
return &CustomAlertProvider{
Url: slackWebHookUrl,
Method: "POST",
Body: fmt.Sprintf(`{
"text": "",
"attachments": [
{
"title": ":helmet_with_white_cross: Gatus",
"text": "%s:\n> %s",
"short": false,
"color": "%s",
"fields": [
{
"title": "Condition results",
"value": "%s",
"short": false
}
]
}
]
}`, message, alert.Description, color, results),
Headers: map[string]string{"Content-Type": "application/json"},
}
}
func CreateTwilioCustomAlertProvider(provider *TwilioAlertProvider, message string) *CustomAlertProvider {
return &CustomAlertProvider{
Url: fmt.Sprintf("https://api.twilio.com/2010-04-01/Accounts/%s/Messages.json", provider.SID),
Method: "POST",
Body: url.Values{
"To": {provider.To},
"From": {provider.From},
"Body": {message},
}.Encode(),
Headers: map[string]string{
"Content-Type": "application/x-www-form-urlencoded",
"Authorization": fmt.Sprintf("Basic %s", base64.StdEncoding.EncodeToString([]byte(fmt.Sprintf("%s:%s", provider.SID, provider.Token)))),
},
}
}

48
core/alerting_test.go Normal file
View File

@@ -0,0 +1,48 @@
package core
import (
"io/ioutil"
"testing"
)
func TestCustomAlertProvider_buildRequestWhenResolved(t *testing.T) {
const (
ExpectedUrl = "http://example.com/service-name"
ExpectedBody = "service-name,alert-description,RESOLVED"
)
customAlertProvider := &CustomAlertProvider{
Url: "http://example.com/[SERVICE_NAME]",
Method: "GET",
Body: "[SERVICE_NAME],[ALERT_DESCRIPTION],[ALERT_TRIGGERED_OR_RESOLVED]",
Headers: nil,
}
request := customAlertProvider.buildRequest("service-name", "alert-description", true)
if request.URL.String() != ExpectedUrl {
t.Error("expected URL to be", ExpectedUrl, "was", request.URL.String())
}
body, _ := ioutil.ReadAll(request.Body)
if string(body) != ExpectedBody {
t.Error("expected body to be", ExpectedBody, "was", string(body))
}
}
func TestCustomAlertProvider_buildRequestWhenTriggered(t *testing.T) {
const (
ExpectedUrl = "http://example.com/service-name"
ExpectedBody = "service-name,alert-description,TRIGGERED"
)
customAlertProvider := &CustomAlertProvider{
Url: "http://example.com/[SERVICE_NAME]",
Method: "GET",
Body: "[SERVICE_NAME],[ALERT_DESCRIPTION],[ALERT_TRIGGERED_OR_RESOLVED]",
Headers: nil,
}
request := customAlertProvider.buildRequest("service-name", "alert-description", false)
if request.URL.String() != ExpectedUrl {
t.Error("expected URL to be", ExpectedUrl, "was", request.URL.String())
}
body, _ := ioutil.ReadAll(request.Body)
if string(body) != ExpectedBody {
t.Error("expected body to be", ExpectedBody, "was", string(body))
}
}

View File

@@ -17,24 +17,42 @@ var (
ErrNoUrl = errors.New("you must specify an url for each service")
)
// Service is the configuration of a monitored endpoint
type Service struct {
Name string `yaml:"name"`
Url string `yaml:"url"`
Method string `yaml:"method,omitempty"`
Body string `yaml:"body,omitempty"`
GraphQL bool `yaml:"graphql,omitempty"`
Headers map[string]string `yaml:"headers,omitempty"`
Interval time.Duration `yaml:"interval,omitempty"`
Conditions []*Condition `yaml:"conditions"`
Alerts []*Alert `yaml:"alerts"`
// Name of the service. Can be anything.
Name string `yaml:"name"`
numberOfFailuresInARow int
// URL to send the request to
Url string `yaml:"url"`
// Method of the request made to the url of the service
Method string `yaml:"method,omitempty"`
// Body of the request
Body string `yaml:"body,omitempty"`
// GraphQL is whether to wrap the body in a query param ({"query":"$body"})
GraphQL bool `yaml:"graphql,omitempty"`
// Headers of the request
Headers map[string]string `yaml:"headers,omitempty"`
// Interval is the duration to wait between every status check
Interval time.Duration `yaml:"interval,omitempty"`
// Conditions used to determine the health of the service
Conditions []*Condition `yaml:"conditions"`
// Alerts is the alerting configuration for the service in case of failure
Alerts []*Alert `yaml:"alerts"`
NumberOfFailuresInARow int
}
func (service *Service) Validate() {
// Set default values
if service.Interval == 0 {
service.Interval = 10 * time.Second
service.Interval = 1 * time.Minute
}
if len(service.Method) == 0 {
service.Method = http.MethodGet
@@ -76,22 +94,16 @@ func (service *Service) EvaluateConditions() *Result {
}
}
result.Timestamp = time.Now()
if result.Success {
service.numberOfFailuresInARow = 0
// TODO: Send notification that alert has been resolved?
} else {
service.numberOfFailuresInARow++
}
return result
}
func (service *Service) GetAlertsTriggered() []Alert {
var alerts []Alert
if service.numberOfFailuresInARow == 0 {
if service.NumberOfFailuresInARow == 0 {
return alerts
}
for _, alert := range service.Alerts {
if alert.Enabled && alert.Threshold == service.numberOfFailuresInARow {
if alert.Enabled && alert.Threshold == service.NumberOfFailuresInARow {
alerts = append(alerts, *alert)
continue
}

View File

@@ -0,0 +1,6 @@
services:
- name: example
url: http://example.org
interval: 30s
conditions:
- "[STATUS] == 200"

View File

@@ -0,0 +1,8 @@
version: "3.8"
services:
gatus:
image: twinproduction/gatus:latest
ports:
- 8080:8080
volumes:
- ./config.yaml:/config/config.yaml

1
go.sum
View File

@@ -18,6 +18,7 @@ github.com/go-stack/stack v1.8.0/go.mod h1:v0f6uXyyMGvRgIKkXu+yp6POWl0qKG85gN/me
github.com/gogo/protobuf v1.1.1/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ=
github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
github.com/golang/protobuf v1.3.1/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
github.com/golang/protobuf v1.3.2 h1:6nsPYzhq5kReh6QImI3k5qWzO4PEbvbIW2cwSfR/6xs=
github.com/golang/protobuf v1.3.2/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
github.com/google/go-cmp v0.3.0/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU=
github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=

46
gzip.go Normal file
View File

@@ -0,0 +1,46 @@
package main
import (
"compress/gzip"
"io"
"io/ioutil"
"net/http"
"strings"
"sync"
)
var gzPool = sync.Pool{
New: func() interface{} {
return gzip.NewWriter(ioutil.Discard)
},
}
type gzipResponseWriter struct {
io.Writer
http.ResponseWriter
}
func (w *gzipResponseWriter) WriteHeader(status int) {
w.Header().Del("Content-Length")
w.ResponseWriter.WriteHeader(status)
}
func (w *gzipResponseWriter) Write(b []byte) (int, error) {
return w.Writer.Write(b)
}
func GzipHandler(next http.Handler) http.Handler {
return http.HandlerFunc(func(writer http.ResponseWriter, r *http.Request) {
// If the request doesn't specify that it supports gzip, then don't compress it
if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
next.ServeHTTP(writer, r)
return
}
writer.Header().Set("Content-Encoding", "gzip")
gz := gzPool.Get().(*gzip.Writer)
defer gzPool.Put(gz)
gz.Reset(writer)
defer gz.Close()
next.ServeHTTP(&gzipResponseWriter{ResponseWriter: writer, Writer: gz}, r)
})
}

10
main.go
View File

@@ -3,7 +3,6 @@ package main
import (
"bytes"
"compress/gzip"
"encoding/json"
"github.com/TwinProduction/gatus/config"
"github.com/TwinProduction/gatus/watchdog"
"github.com/prometheus/client_golang/prometheus/promhttp"
@@ -26,7 +25,7 @@ func main() {
cfg := loadConfiguration()
http.HandleFunc("/api/v1/results", serviceResultsHandler)
http.HandleFunc("/health", healthHandler)
http.Handle("/", http.FileServer(http.Dir("./static")))
http.Handle("/", GzipHandler(http.FileServer(http.Dir("./static"))))
if cfg.Metrics {
http.Handle("/metrics", promhttp.Handler())
}
@@ -53,12 +52,11 @@ func serviceResultsHandler(writer http.ResponseWriter, r *http.Request) {
if isExpired := cachedServiceResultsTimestamp.IsZero() || time.Now().Sub(cachedServiceResultsTimestamp) > CacheTTL; isExpired {
buffer := &bytes.Buffer{}
gzipWriter := gzip.NewWriter(buffer)
serviceResults := watchdog.GetServiceResults()
data, err := json.Marshal(serviceResults)
data, err := watchdog.GetJsonEncodedServiceResults()
if err != nil {
log.Printf("[main][serviceResultsHandler] Unable to marshall object to JSON: %s", err.Error())
log.Printf("[main][serviceResultsHandler] Unable to marshal object to JSON: %s", err.Error())
writer.WriteHeader(http.StatusInternalServerError)
_, _ = writer.Write([]byte("Unable to marshall object to JSON"))
_, _ = writer.Write([]byte("Unable to marshal object to JSON"))
return
}
gzipWriter.Write(data)

6
static/bootstrap.min.css vendored Normal file

File diff suppressed because one or more lines are too long

BIN
static/github.png Normal file (binary file added, 4.2 KiB)

View File

@@ -3,7 +3,7 @@
<head>
<title>Health Dashboard</title>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">
<link rel="stylesheet" href="/bootstrap.min.css" />
<style>
html, body {
background-color: #f7f9fb;
@@ -22,13 +22,6 @@
border-color: #dee2e6;
border-style: solid;
}
.status-ok {
display: inline-block;
width: 1%;
height: 20px;
margin-right: 4px;
background-color: #28a745;
}
.status {
cursor: pointer;
transition: all 500ms ease-in-out;
@@ -83,6 +76,20 @@
#tooltip>.tooltip-title:first-child {
margin-top: 0;
}
#social {
position: fixed;
right: 5px;
bottom: 5px;
padding: 5px;
margin: 0;
z-index: 100;
}
#social img {
opacity: 0.3;
}
#social img:hover {
opacity: 1;
}
</style>
</head>
<body>
@@ -93,7 +100,7 @@
<div class="title display-4">Health Status</div>
</div>
<div class="col-4 text-right">
<img src="logo.png" alt="GaTuS" style="position: relative; min-width: 50px; max-width: 200px; width: 20%;"/>
<img src="logo.png" alt="Gatus" style="position: relative; min-width: 50px; max-width: 200px; width: 20%;"/>
</div>
</div>
</div>
@@ -113,8 +120,13 @@
</div>
</div>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js" integrity="sha384-JjSmVgyd0p3pXB1rRibZUAYoIIy6OrQ6VrjIEaFf/nJGzIxFDsf4x0xIM+B07jRM" crossorigin="anonymous"></script>
<script src="/jquery.min.js"></script>
<div id="social">
<a href="https://github.com/TwinProduction/gatus" target="_blank" title="Gatus on GitHub">
<img src="/github.png" alt="GitHub" width="32" height="auto" />
</a>
</div>
<script>
let serviceStatuses = {};
@@ -157,8 +169,11 @@
targetLeftPosition += -targetLeftPosition;
}
}
if (targetTopPosition + window.scrollY + tooltipBoundingClientRect.height + 50 > document.body.getBoundingClientRect().height) {
targetTopPosition = element.getBoundingClientRect().y - (tooltipBoundingClientRect.height + 10)
if (targetTopPosition + window.scrollY + tooltipBoundingClientRect.height + 50 > document.body.getBoundingClientRect().height && targetTopPosition >= 0) {
targetTopPosition = element.getBoundingClientRect().y - (tooltipBoundingClientRect.height + 10);
if (targetTopPosition < 0) {
targetTopPosition = element.getBoundingClientRect().y + 30;
}
}
$("#tooltip").css({top: targetTopPosition + "px", left: targetLeftPosition + "px"});
}

2
static/jquery.min.js vendored Normal file

File diff suppressed because one or more lines are too long

Binary file not shown (before: 59 KiB)

View File

@@ -1,8 +1,8 @@
package watchdog
import (
"encoding/json"
"fmt"
"github.com/TwinProduction/gatus/alerting"
"github.com/TwinProduction/gatus/config"
"github.com/TwinProduction/gatus/core"
"github.com/TwinProduction/gatus/metric"
@@ -13,12 +13,22 @@ import (
var (
serviceResults = make(map[string][]*core.Result)
rwLock sync.RWMutex
// serviceResultsMutex is used to prevent concurrent map access
serviceResultsMutex sync.RWMutex
// monitoringMutex is used to prevent multiple services from being evaluated at the same time.
// Without this, conditions using response time may become inaccurate.
monitoringMutex sync.Mutex
)
// GetServiceResults returns a list of the last 20 results for each services
func GetServiceResults() *map[string][]*core.Result {
return &serviceResults
// GetJsonEncodedServiceResults returns a list of the last 20 results for each service, encoded using json.Marshal.
// The encoding is done here because we use a mutex to prevent concurrent map access.
func GetJsonEncodedServiceResults() ([]byte, error) {
serviceResultsMutex.RLock()
data, err := json.Marshal(serviceResults)
serviceResultsMutex.RUnlock()
return data, err
}
// Monitor loops over each services and starts a goroutine to monitor each services separately
@@ -32,45 +42,132 @@ func Monitor(cfg *config.Config) {
// monitor monitors a single service in a loop
func monitor(service *core.Service) {
cfg := config.Get()
for {
// By placing the lock here, we prevent multiple services from being monitored at the exact same time, which
// could cause performance issues and return inaccurate results
rwLock.Lock()
log.Printf("[watchdog][monitor] Monitoring serviceName=%s", service.Name)
monitoringMutex.Lock()
if cfg.Debug {
log.Printf("[watchdog][monitor] Monitoring serviceName=%s", service.Name)
}
result := service.EvaluateConditions()
metric.PublishMetricsForService(service, result)
serviceResultsMutex.Lock()
serviceResults[service.Name] = append(serviceResults[service.Name], result)
if len(serviceResults[service.Name]) > 20 {
serviceResults[service.Name] = serviceResults[service.Name][1:]
}
rwLock.Unlock()
serviceResultsMutex.Unlock()
var extra string
if !result.Success {
extra = fmt.Sprintf("responseBody=%s", result.Body)
}
log.Printf(
"[watchdog][monitor] Finished monitoring serviceName=%s; errors=%d; requestDuration=%s; %s",
"[watchdog][monitor] Monitored serviceName=%s; success=%v; errors=%d; requestDuration=%s; %s",
service.Name,
result.Success,
len(result.Errors),
result.Duration.Round(time.Millisecond),
extra,
)
handleAlerting(service, result)
if cfg.Debug {
log.Printf("[watchdog][monitor] Waiting for interval=%s before monitoring serviceName=%s again", service.Interval, service.Name)
}
monitoringMutex.Unlock()
time.Sleep(service.Interval)
}
}
cfg := config.Get()
if cfg.Alerting != nil {
for _, alertTriggered := range service.GetAlertsTriggered() {
if alertTriggered.Type == core.SlackAlert {
func handleAlerting(service *core.Service, result *core.Result) {
cfg := config.Get()
if cfg.Alerting == nil {
return
}
if result.Success {
if service.NumberOfFailuresInARow > 0 {
for _, alert := range service.Alerts {
if !alert.Enabled || !alert.SendOnResolved || alert.Threshold > service.NumberOfFailuresInARow {
continue
}
var alertProvider *core.CustomAlertProvider
if alert.Type == core.SlackAlert {
if len(cfg.Alerting.Slack) > 0 {
log.Printf("[watchdog][monitor] Sending Slack alert because alert with description=%s has been triggered", alertTriggered.Description)
alerting.SendSlackMessage(cfg.Alerting.Slack, service.Name, alertTriggered.Description)
log.Printf("[watchdog][handleAlerting] Sending Slack alert because alert with description=%s has been resolved", alert.Description)
alertProvider = core.CreateSlackCustomAlertProvider(cfg.Alerting.Slack, service, alert, result, true)
} else {
log.Printf("[watchdog][monitor] Not sending Slack alert despite being triggered, because there is no Slack webhook configured")
log.Printf("[watchdog][handleAlerting] Not sending Slack alert despite being triggered, because there is no Slack webhook configured")
}
} else if alert.Type == core.TwilioAlert {
if cfg.Alerting.Twilio != nil && cfg.Alerting.Twilio.IsValid() {
log.Printf("[watchdog][handleAlerting] Sending Twilio alert because alert with description=%s has been resolved", alert.Description)
alertProvider = core.CreateTwilioCustomAlertProvider(cfg.Alerting.Twilio, fmt.Sprintf("RESOLVED: %s - %s", service.Name, alert.Description))
} else {
log.Printf("[watchdog][handleAlerting] Not sending Twilio alert despite being resolved, because Twilio isn't configured properly")
}
} else if alert.Type == core.CustomAlert {
if cfg.Alerting.Custom != nil && cfg.Alerting.Custom.IsValid() {
log.Printf("[watchdog][handleAlerting] Sending custom alert because alert with description=%s has been resolved", alert.Description)
alertProvider = &core.CustomAlertProvider{
Url: cfg.Alerting.Custom.Url,
Method: cfg.Alerting.Custom.Method,
Body: cfg.Alerting.Custom.Body,
Headers: cfg.Alerting.Custom.Headers,
}
} else {
log.Printf("[watchdog][handleAlerting] Not sending custom alert despite being resolved, because the custom provider isn't configured properly")
}
}
if alertProvider != nil {
err := alertProvider.Send(service.Name, alert.Description, true)
if err != nil {
log.Printf("[watchdog][handleAlerting] Ran into error sending an alert: %s", err.Error())
}
}
}
}
log.Printf("[watchdog][monitor] Waiting for interval=%s before monitoring serviceName=%s", service.Interval, service.Name)
time.Sleep(service.Interval)
service.NumberOfFailuresInARow = 0
} else {
service.NumberOfFailuresInARow++
for _, alert := range service.Alerts {
// If the alert hasn't been triggered, move to the next one
if !alert.Enabled || alert.Threshold != service.NumberOfFailuresInARow {
continue
}
var alertProvider *core.CustomAlertProvider
if alert.Type == core.SlackAlert {
if len(cfg.Alerting.Slack) > 0 {
log.Printf("[watchdog][handleAlerting] Sending Slack alert because alert with description=%s has been triggered", alert.Description)
alertProvider = core.CreateSlackCustomAlertProvider(cfg.Alerting.Slack, service, alert, result, false)
} else {
log.Printf("[watchdog][handleAlerting] Not sending Slack alert despite being triggered, because there is no Slack webhook configured")
}
} else if alert.Type == core.TwilioAlert {
if cfg.Alerting.Twilio != nil && cfg.Alerting.Twilio.IsValid() {
log.Printf("[watchdog][handleAlerting] Sending Twilio alert because alert with description=%s has been triggered", alert.Description)
alertProvider = core.CreateTwilioCustomAlertProvider(cfg.Alerting.Twilio, fmt.Sprintf("TRIGGERED: %s - %s", service.Name, alert.Description))
} else {
log.Printf("[watchdog][handleAlerting] Not sending Twilio alert despite being triggered, because Twilio config settings missing")
}
} else if alert.Type == core.CustomAlert {
if cfg.Alerting.Custom != nil && cfg.Alerting.Custom.IsValid() {
log.Printf("[watchdog][handleAlerting] Sending custom alert because alert with description=%s has been triggered", alert.Description)
alertProvider = &core.CustomAlertProvider{
Url: cfg.Alerting.Custom.Url,
Method: cfg.Alerting.Custom.Method,
Body: cfg.Alerting.Custom.Body,
Headers: cfg.Alerting.Custom.Headers,
}
} else {
log.Printf("[watchdog][handleAlerting] Not sending custom alert despite being triggered, because there is no custom url configured")
}
}
if alertProvider != nil {
err := alertProvider.Send(service.Name, alert.Description, false)
if err != nil {
log.Printf("[watchdog][handleAlerting] Ran into error sending an alert: %s", err.Error())
}
}
}
}
}