Compare commits
13 Commits
| Author | SHA1 | Date |
|---|---|---|
| | 4df1baf432 | |
| | 5a7164b17d | |
| | d4623f5c61 | |
| | 139e186ac2 | |
| | 8a0a2ef51f | |
| | 51ea912cf9 | |
| | db7c516819 | |
| | f893c0ee7f | |
| | 0454854f04 | |
| | 42dd6a1e88 | |
| | 64a160923b | |
| | dad09e780e | |
| | 2cb1600f94 | |
BIN .github/assets/example.png (vendored, new file, 25 KiB; binary file not shown)
BIN .github/assets/slack-alerts.png (vendored, new file, 34 KiB; binary file not shown)
.gitignore (vendored, 4 changes)

@@ -1,4 +1,2 @@
 bin
 .idea
-test.db
-daily.db
README.md (143 changes)

@@ -13,6 +13,7 @@ core applications: https://status.twinnation.org/
 
 ## Table of Contents
 
+- [Features](#features)
 - [Usage](#usage)
 - [Configuration](#configuration)
 - [Conditions](#conditions)
@@ -22,7 +23,19 @@ core applications: https://status.twinnation.org/
 - [FAQ](#faq)
 - [Sending a GraphQL request](#sending-a-graphql-request)
 - [Configuring Slack alerts](#configuring-slack-alerts)
-- [Configuring custom alert](#configuring-custom-alerts)
+- [Configuring Twilio alerts](#configuring-twilio-alerts)
+- [Configuring custom alerts](#configuring-custom-alerts)
+
+## Features
+
+The main features of Gatus are:
+- **Highly flexible health check conditions**: While checking the response status may be enough for some use cases, Gatus goes much further and allows you to add conditions on the response time, the response body and even the IP address.
+- **Ability to use Gatus for user acceptance tests**: Thanks to the point above, you can leverage this application to create automated user acceptance tests.
+- **Very easy to configure**: Not only is the configuration designed to be as readable as possible, it's also extremely easy to add a new service or a new endpoint to monitor.
+- **Alerting**: While having a pretty visual dashboard is useful to keep track of the state of your application(s), you probably don't want to stare at it all day. Thus, notifications via Slack are supported out of the box with the ability to configure a custom alerting provider for any needs you might have, whether it be a different provider like PagerDuty or a custom application that manages automated rollbacks.
+- **Metrics**
+- **Low resource consumption**: As with most Go applications, the resource footprint that this application requires is negligibly small.
+
 
 
 ## Usage
@@ -38,7 +51,7 @@ metrics: true # Whether to expose metrics at /metrics
 services:
   - name: twinnation # Name of your service, can be anything
     url: "https://twinnation.org/health"
-    interval: 30s # Duration to wait between every status check (default: 10s)
+    interval: 30s # Duration to wait between every status check (default: 60s)
     conditions:
       - "[STATUS] == 200" # Status must be 200
      - "[BODY].status == UP" # The json path "$.status" must be equal to UP
@@ -50,58 +63,67 @@ services:
       - "[STATUS] == 200"
 ```
 
-Note that you can also add environment variables in the your configuration file (i.e. `$DOMAIN`, `${DOMAIN}`)
+This example would look like this:
+
+![example](.github/assets/example.png)
+
+Note that you can also add environment variables in the configuration file (i.e. `$DOMAIN`, `${DOMAIN}`)
 
 
 ### Configuration
 
 | Parameter | Description | Default |
-| --------------------------------- | --------------------------------------------------------------- | -------------- |
-| `metrics` | Whether to expose metrics at /metrics | `false` |
-| `services` | List of services to monitor | Required `[]` |
-| `services[].name` | Name of the service. Can be anything. | Required `""` |
-| `services[].url` | URL to send the request to | Required `""` |
-| `services[].conditions` | Conditions used to determine the health of the service | `[]` |
-| `services[].interval` | Duration to wait between every status check | `10s` |
-| `services[].method` | Request method | `GET` |
-| `services[].graphql` | Whether to wrap the body in a query param (`{"query":"$body"}`) | `false` |
-| `services[].body` | Request body | `""` |
-| `services[].headers` | Request headers | `{}` |
-| `services[].alerts[].type` | Type of alert. Valid types: `slack`, `custom` | Required `""` |
-| `services[].alerts[].enabled` | Whether to enable the alert | `false` |
-| `services[].alerts[].threshold` | Number of failures in a row needed before triggering the alert | `3` |
-| `services[].alerts[].description` | Description of the alert. Will be included in the alert sent | `""` |
-| `alerting` | Configuration for alerting | `{}` |
-| `alerting.slack` | Webhook to use for alerts of type `slack` | `""` |
-| `alerting.custom` | Configuration for custom actions on failure or alerts | `""` |
-| `alerting.custom.url` | Custom alerting request url | `""` |
-| `alerting.custom.body` | Custom alerting request body. | `""` |
-| `alerting.custom.headers` | Custom alerting request headers | `{}` |
+| -------------------------------------- | --------------------------------------------------------------- | -------------- |
+| `debug` | Whether to enable debug logs | `false` |
+| `metrics` | Whether to expose metrics at /metrics | `false` |
+| `services` | List of services to monitor | Required `[]` |
+| `services[].name` | Name of the service. Can be anything. | Required `""` |
+| `services[].url` | URL to send the request to | Required `""` |
+| `services[].conditions` | Conditions used to determine the health of the service | `[]` |
+| `services[].interval` | Duration to wait between every status check | `60s` |
+| `services[].method` | Request method | `GET` |
+| `services[].graphql` | Whether to wrap the body in a query param (`{"query":"$body"}`) | `false` |
+| `services[].body` | Request body | `""` |
+| `services[].headers` | Request headers | `{}` |
+| `services[].alerts[].type` | Type of alert. Valid types: `slack`, `twilio`, `custom` | Required `""` |
+| `services[].alerts[].enabled` | Whether to enable the alert | `false` |
+| `services[].alerts[].threshold` | Number of failures in a row needed before triggering the alert | `3` |
+| `services[].alerts[].description` | Description of the alert. Will be included in the alert sent | `""` |
+| `services[].alerts[].send-on-resolved` | Whether to send a notification once a triggered alert subsides | `false` |
+| `alerting` | Configuration for alerting | `{}` |
+| `alerting.slack` | Webhook to use for alerts of type `slack` | `""` |
+| `alerting.twilio` | Settings for alerts of type `twilio` | `""` |
+| `alerting.twilio.sid` | Twilio account SID | Required `""` |
+| `alerting.twilio.token` | Twilio auth token | Required `""` |
+| `alerting.twilio.from` | Number to send Twilio alerts from | Required `""` |
+| `alerting.twilio.to` | Number to send twilio alerts to | Required `""` |
+| `alerting.custom` | Configuration for custom actions on failure or alerts | `""` |
+| `alerting.custom.url` | Custom alerting request url | `""` |
+| `alerting.custom.body` | Custom alerting request body. | `""` |
+| `alerting.custom.headers` | Custom alerting request headers | `{}` |
 
 
 ### Conditions
 
 Here are some examples of conditions you can use:
 
 | Condition | Description | Passing values | Failing values |
-| -----------------------------| ------------------------------------------------------- | ------------------------ | ----------------------- |
-| `[STATUS] == 200` | Status must be equal to 200 | 200 | 201, 404, 500 |
-| `[STATUS] < 300` | Status must lower than 300 | 200, 201, 299 | 301, 302, 400, 500 |
-| `[STATUS] <= 299` | Status must be less than or equal to 299 | 200, 201, 299 | 301, 302, 400, 500 |
-| `[STATUS] > 400` | Status must be greater than 400 | 401, 402, 403, 404 | 200, 201, 300, 400 |
-| `[RESPONSE_TIME] < 500` | Response time must be below 500ms | 100ms, 200ms, 300ms | 500ms, 1500ms |
-| `[BODY] == 1` | The body must be equal to 1 | 1 | literally anything else |
-| `[BODY].data.id == 1` | The jsonpath `$.data.id` is equal to 1 | `{"data":{"id":1}}` | literally anything else |
-| `[BODY].data[0].id == 1` | The jsonpath `$.data[0].id` is equal to 1 | `{"data":[{"id":1}]}` | literally anything else |
-| `len([BODY].data) > 0` | Array at jsonpath `$.data` has less than 5 elements | `{"data":[{"id":1}]}` | `{"data":[{"id":1}]}` |
+| -----------------------------| ------------------------------------------------------- | ------------------------ | -------------- |
+| `[STATUS] == 200` | Status must be equal to 200 | 200 | 201, 404, ... |
+| `[STATUS] < 300` | Status must lower than 300 | 200, 201, 299 | 301, 302, ... |
+| `[STATUS] <= 299` | Status must be less than or equal to 299 | 200, 201, 299 | 301, 302, ... |
+| `[STATUS] > 400` | Status must be greater than 400 | 401, 402, 403, 404 | 400, 200, ... |
+| `[RESPONSE_TIME] < 500` | Response time must be below 500ms | 100ms, 200ms, 300ms | 500ms, 501ms |
+| `[BODY] == 1` | The body must be equal to 1 | 1 | Anything else |
+| `[BODY].data.id == 1` | The jsonpath `$.data.id` is equal to 1 | `{"data":{"id":1}}` | |
+| `[BODY].data[0].id == 1` | The jsonpath `$.data[0].id` is equal to 1 | `{"data":[{"id":1}]}` | |
+| `len([BODY].data) > 0` | Array at jsonpath `$.data` has less than 5 elements | `{"data":[{"id":1}]}` | |
 | `len([BODY].name) == 8` | String at jsonpath `$.name` has a length of 8 | `{"name":"john.doe"}` | `{"name":"bob"}` |
 
-**NOTE**: `[BODY]` with JSON path (i.e. `[BODY].id == 1`) is currently in BETA. For the most part, the only thing that doesn't work is arrays.
 
 
 ## Docker
 
-Building the Docker image is done as following:
+Building the Docker image is done as follows:
 
 ```
 docker build . -t gatus
@@ -174,10 +196,41 @@ services:
       - type: slack
         enabled: true
         description: "healthcheck failed 3 times in a row"
+        send-on-resolved: true
       - type: slack
         enabled: true
         threshold: 5
         description: "healthcheck failed 5 times in a row"
+        send-on-resolved: true
+    conditions:
+      - "[STATUS] == 200"
+      - "[BODY].status == UP"
+      - "[RESPONSE_TIME] < 300"
+```
+
+Here's an example of what the notifications look like:
+
+![slack-alerts](.github/assets/slack-alerts.png)
+
+
+### Configuring Twilio alerts
+
+```yaml
+alerting:
+  twilio:
+    sid: "..."
+    token: "..."
+    from: "+1-234-567-8901"
+    to: "+1-234-567-8901"
+services:
+  - name: twinnation
+    interval: 30s
+    url: "https://twinnation.org/health"
+    alerts:
+      - type: twilio
+        enabled: true
+        threshold: 5
+        description: "healthcheck failed 5 times in a row"
     conditions:
       - "[STATUS] == 200"
       - "[BODY].status == UP"
@@ -195,7 +248,10 @@ would then check if the service that started failing was recently deployed, and
 roll it back.
 
 The values `[ALERT_DESCRIPTION]` and `[SERVICE_NAME]` are automatically substituted for the alert description and the
-service name accordingly in the body (`alerting.custom.body`) and the url (`alerting.custom.url`).
+service name respectively in the body (`alerting.custom.body`) as well as the url (`alerting.custom.url`).
+
+If you have `send-on-resolved` set to `true`, you may want to use `[ALERT_TRIGGERED_OR_RESOLVED]` to differentiate
+the notifications. It will be replaced for either `TRIGGERED` or `RESOLVED`, based on the situation.
 
 For all intents and purpose, we'll configure the custom alert with a Slack webhook, but you can call anything you want.
 
@@ -206,7 +262,7 @@ alerting:
     method: "POST"
     body: |
       {
-        "text": "[SERVICE_NAME] - [ALERT_DESCRIPTION]"
+        "text": "[ALERT_TRIGGERED_OR_RESOLVED]: [SERVICE_NAME] - [ALERT_DESCRIPTION]"
       }
 services:
   - name: twinnation
@@ -216,9 +272,10 @@ services:
       - type: custom
         enabled: true
         threshold: 10
+        send-on-resolved: true
         description: "healthcheck failed 10 times in a row"
     conditions:
       - "[STATUS] == 200"
       - "[BODY].status == UP"
       - "[RESPONSE_TIME] < 300"
 ```
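To make the placeholder behaviour documented above concrete, here is a minimal standalone sketch (not part of this changeset) of how `[ALERT_TRIGGERED_OR_RESOLVED]`, `[SERVICE_NAME]` and `[ALERT_DESCRIPTION]` end up substituted; `interpolate` is a hypothetical helper, while the real substitution happens in `CustomAlertProvider.buildRequest` further down in this diff.

```go
package main

import (
	"fmt"
	"strings"
)

// interpolate mimics the substitution described in the README: the body of a custom
// alert has its three placeholders replaced before the request is sent out.
func interpolate(template, serviceName, description string, resolved bool) string {
	state := "TRIGGERED"
	if resolved {
		state = "RESOLVED"
	}
	out := strings.ReplaceAll(template, "[ALERT_TRIGGERED_OR_RESOLVED]", state)
	out = strings.ReplaceAll(out, "[SERVICE_NAME]", serviceName)
	out = strings.ReplaceAll(out, "[ALERT_DESCRIPTION]", description)
	return out
}

func main() {
	body := `{"text": "[ALERT_TRIGGERED_OR_RESOLVED]: [SERVICE_NAME] - [ALERT_DESCRIPTION]"}`
	fmt.Println(interpolate(body, "twinnation", "healthcheck failed 10 times in a row", false))
	// Output: {"text": "TRIGGERED: twinnation - healthcheck failed 10 times in a row"}
}
```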
@@ -22,6 +22,7 @@ var (
 
 type Config struct {
     Metrics bool `yaml:"metrics"`
+    Debug bool `yaml:"debug"`
     Alerting *core.AlertingConfig `yaml:"alerting"`
     Services []*core.Service `yaml:"services"`
 }
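For context, a minimal sketch of how the new `debug` option maps onto the `Debug` field through its `yaml` tag. It uses a trimmed-down stand-in for the real `Config` struct and assumes `gopkg.in/yaml.v2`; the struct tags suggest a YAML library of that kind, but the actual parsing code is not shown in this diff.

```go
package main

import (
	"fmt"
	"log"

	"gopkg.in/yaml.v2" // assumed library, not confirmed by this diff
)

// Config is a simplified stand-in for the struct above, limited to the two
// boolean options relevant to the new `debug` flag.
type Config struct {
	Metrics bool `yaml:"metrics"`
	Debug   bool `yaml:"debug"`
}

func main() {
	raw := []byte("metrics: true\ndebug: true\n")
	var cfg Config
	if err := yaml.Unmarshal(raw, &cfg); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("metrics=%v debug=%v\n", cfg.Metrics, cfg.Debug) // metrics=true debug=true
}
```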
@@ -40,8 +40,8 @@ services:
     if config.Services[0].Interval != 15*time.Second {
         t.Errorf("Interval should have been %s", 15*time.Second)
     }
-    if config.Services[1].Interval != 10*time.Second {
-        t.Errorf("Interval should have been %s, because it is the default value", 10*time.Second)
+    if config.Services[1].Interval != 60*time.Second {
+        t.Errorf("Interval should have been %s, because it is the default value", 60*time.Second)
     }
     if len(config.Services[0].Conditions) != 1 {
         t.Errorf("There should have been %d conditions", 1)

@@ -71,8 +71,8 @@ services:
     if config.Services[0].Url != "https://twinnation.org/actuator/health" {
         t.Errorf("URL should have been %s", "https://twinnation.org/actuator/health")
     }
-    if config.Services[0].Interval != 10*time.Second {
-        t.Errorf("Interval should have been %s, because it is the default value", 10*time.Second)
+    if config.Services[0].Interval != 60*time.Second {
+        t.Errorf("Interval should have been %s, because it is the default value", 60*time.Second)
     }
 }
 

@@ -97,8 +97,8 @@ services:
     if config.Services[0].Url != "https://twinnation.org/actuator/health" {
         t.Errorf("URL should have been %s", "https://twinnation.org/actuator/health")
     }
-    if config.Services[0].Interval != 10*time.Second {
-        t.Errorf("Interval should have been %s, because it is the default value", 10*time.Second)
+    if config.Services[0].Interval != 60*time.Second {
+        t.Errorf("Interval should have been %s, because it is the default value", 60*time.Second)
     }
 }
 

@@ -154,8 +154,8 @@ services:
     if config.Services[0].Url != "https://twinnation.org/actuator/health" {
         t.Errorf("URL should have been %s", "https://twinnation.org/actuator/health")
     }
-    if config.Services[0].Interval != 10*time.Second {
-        t.Errorf("Interval should have been %s, because it is the default value", 10*time.Second)
+    if config.Services[0].Interval != 60*time.Second {
+        t.Errorf("Interval should have been %s, because it is the default value", 60*time.Second)
     }
     if config.Services[0].Alerts == nil {
         t.Fatal("The service alerts shouldn't have been nil")
@@ -13,11 +13,15 @@ type Alert struct {
 
     // Description of the alert. Will be included in the alert sent.
     Description string `yaml:"description"`
+
+    // SendOnResolved defines whether to send a second notification when the issue has been resolved
+    SendOnResolved bool `yaml:"send-on-resolved"`
 }
 
 type AlertType string
 
 const (
     SlackAlert AlertType = "slack"
+    TwilioAlert AlertType = "twilio"
     CustomAlert AlertType = "custom"
 )
core/alerting.go (119 changes)

@@ -2,17 +2,31 @@ package core
 
 import (
     "bytes"
+    "encoding/base64"
     "fmt"
     "github.com/TwinProduction/gatus/client"
     "net/http"
+    "net/url"
     "strings"
 )
 
 type AlertingConfig struct {
     Slack string `yaml:"slack"`
+    Twilio *TwilioAlertProvider `yaml:"twilio"`
     Custom *CustomAlertProvider `yaml:"custom"`
 }
 
+type TwilioAlertProvider struct {
+    SID string `yaml:"sid"`
+    Token string `yaml:"token"`
+    From string `yaml:"from"`
+    To string `yaml:"to"`
+}
+
+func (provider *TwilioAlertProvider) IsValid() bool {
+    return len(provider.Token) > 0 && len(provider.SID) > 0 && len(provider.From) > 0 && len(provider.To) > 0
+}
+
 type CustomAlertProvider struct {
     Url string `yaml:"url"`
     Method string `yaml:"method,omitempty"`

@@ -20,31 +34,49 @@ type CustomAlertProvider struct {
     Headers map[string]string `yaml:"headers,omitempty"`
 }
 
-func (provider *CustomAlertProvider) buildRequest(serviceName, alertDescription string) *http.Request {
+func (provider *CustomAlertProvider) IsValid() bool {
+    return len(provider.Url) > 0
+}
+
+func (provider *CustomAlertProvider) buildRequest(serviceName, alertDescription string, resolved bool) *http.Request {
     body := provider.Body
-    url := provider.Url
-    if strings.Contains(provider.Body, "[ALERT_DESCRIPTION]") {
-        body = strings.ReplaceAll(provider.Body, "[ALERT_DESCRIPTION]", alertDescription)
+    providerUrl := provider.Url
+    if strings.Contains(body, "[ALERT_DESCRIPTION]") {
+        body = strings.ReplaceAll(body, "[ALERT_DESCRIPTION]", alertDescription)
     }
-    if strings.Contains(provider.Body, "[SERVICE_NAME]") {
-        body = strings.ReplaceAll(provider.Body, "[SERVICE_NAME]", serviceName)
+    if strings.Contains(body, "[SERVICE_NAME]") {
+        body = strings.ReplaceAll(body, "[SERVICE_NAME]", serviceName)
     }
-    if strings.Contains(provider.Url, "[ALERT_DESCRIPTION]") {
-        url = strings.ReplaceAll(provider.Url, "[ALERT_DESCRIPTION]", alertDescription)
+    if strings.Contains(body, "[ALERT_TRIGGERED_OR_RESOLVED]") {
+        if resolved {
+            body = strings.ReplaceAll(body, "[ALERT_TRIGGERED_OR_RESOLVED]", "RESOLVED")
+        } else {
+            body = strings.ReplaceAll(body, "[ALERT_TRIGGERED_OR_RESOLVED]", "TRIGGERED")
+        }
     }
-    if strings.Contains(provider.Url, "[SERVICE_NAME]") {
-        url = strings.ReplaceAll(provider.Url, "[SERVICE_NAME]", serviceName)
+    if strings.Contains(providerUrl, "[ALERT_DESCRIPTION]") {
+        providerUrl = strings.ReplaceAll(providerUrl, "[ALERT_DESCRIPTION]", alertDescription)
+    }
+    if strings.Contains(providerUrl, "[SERVICE_NAME]") {
+        providerUrl = strings.ReplaceAll(providerUrl, "[SERVICE_NAME]", serviceName)
+    }
+    if strings.Contains(providerUrl, "[ALERT_TRIGGERED_OR_RESOLVED]") {
+        if resolved {
+            providerUrl = strings.ReplaceAll(providerUrl, "[ALERT_TRIGGERED_OR_RESOLVED]", "RESOLVED")
+        } else {
+            providerUrl = strings.ReplaceAll(providerUrl, "[ALERT_TRIGGERED_OR_RESOLVED]", "TRIGGERED")
+        }
     }
     bodyBuffer := bytes.NewBuffer([]byte(body))
-    request, _ := http.NewRequest(provider.Method, url, bodyBuffer)
+    request, _ := http.NewRequest(provider.Method, providerUrl, bodyBuffer)
     for k, v := range provider.Headers {
         request.Header.Set(k, v)
     }
     return request
 }
 
-func (provider *CustomAlertProvider) Send(serviceName, alertDescription string) error {
-    request := provider.buildRequest(serviceName, alertDescription)
+func (provider *CustomAlertProvider) Send(serviceName, alertDescription string, resolved bool) error {
+    request := provider.buildRequest(serviceName, alertDescription, resolved)
     response, err := client.GetHttpClient().Do(request)
     if err != nil {
         return err

@@ -54,3 +86,64 @@ func (provider *CustomAlertProvider) Send(serviceName, alertDescription string)
     }
     return nil
 }
+
+func CreateSlackCustomAlertProvider(slackWebHookUrl string, service *Service, alert *Alert, result *Result, resolved bool) *CustomAlertProvider {
+    var message string
+    var color string
+    if resolved {
+        message = fmt.Sprintf("An alert for *%s* has been resolved after %d failures in a row", service.Name, service.NumberOfFailuresInARow)
+        color = "#36A64F"
+    } else {
+        message = fmt.Sprintf("An alert for *%s* has been triggered", service.Name)
+        color = "#DD0000"
+    }
+    var results string
+    for _, conditionResult := range result.ConditionResults {
+        var prefix string
+        if conditionResult.Success {
+            prefix = ":heavy_check_mark:"
+        } else {
+            prefix = ":x:"
+        }
+        results += fmt.Sprintf("%s - `%s`\n", prefix, conditionResult.Condition)
+    }
+    return &CustomAlertProvider{
+        Url: slackWebHookUrl,
+        Method: "POST",
+        Body: fmt.Sprintf(`{
+  "text": "",
+  "attachments": [
+    {
+      "title": ":helmet_with_white_cross: Gatus",
+      "text": "%s:\n> %s",
+      "short": false,
+      "color": "%s",
+      "fields": [
+        {
+          "title": "Condition results",
+          "value": "%s",
+          "short": false
+        }
+      ]
+    },
+  ]
+}`, message, alert.Description, color, results),
+        Headers: map[string]string{"Content-Type": "application/json"},
+    }
+}
+
+func CreateTwilioCustomAlertProvider(provider *TwilioAlertProvider, message string) *CustomAlertProvider {
+    return &CustomAlertProvider{
+        Url: fmt.Sprintf("https://api.twilio.com/2010-04-01/Accounts/%s/Messages.json", provider.SID),
+        Method: "POST",
+        Body: url.Values{
+            "To": {provider.To},
+            "From": {provider.From},
+            "Body": {message},
+        }.Encode(),
+        Headers: map[string]string{
+            "Content-Type": "application/x-www-form-urlencoded",
+            "Authorization": fmt.Sprintf("Basic %s", base64.StdEncoding.EncodeToString([]byte(fmt.Sprintf("%s:%s", provider.SID, provider.Token)))),
+        },
+    }
+}
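To illustrate what `CreateTwilioCustomAlertProvider` assembles, here is a small standalone sketch (not part of the changeset) of the form-encoded body and the Basic auth header derived from the account SID and token; the SID, token and phone numbers below are placeholders, not real credentials.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/url"
)

func main() {
	// Placeholder credentials and numbers, purely for illustration.
	sid, token := "AC00000000", "example-token"
	body := url.Values{
		"To":   {"+1-234-567-8901"},
		"From": {"+1-234-567-8901"},
		"Body": {"TRIGGERED: twinnation - healthcheck failed 5 times in a row"},
	}.Encode() // keys are sorted: Body, From, To
	auth := base64.StdEncoding.EncodeToString([]byte(fmt.Sprintf("%s:%s", sid, token)))

	fmt.Println("POST https://api.twilio.com/2010-04-01/Accounts/" + sid + "/Messages.json")
	fmt.Println("Authorization: Basic " + auth)
	fmt.Println(body)
}
```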
@@ -17,24 +17,42 @@ var (
     ErrNoUrl = errors.New("you must specify an url for each service")
 )
 
+// Service is the configuration of a monitored endpoint
 type Service struct {
-    Name string `yaml:"name"`
-    Url string `yaml:"url"`
-    Method string `yaml:"method,omitempty"`
-    Body string `yaml:"body,omitempty"`
-    GraphQL bool `yaml:"graphql,omitempty"`
-    Headers map[string]string `yaml:"headers,omitempty"`
-    Interval time.Duration `yaml:"interval,omitempty"`
-    Conditions []*Condition `yaml:"conditions"`
-    Alerts []*Alert `yaml:"alerts"`
-
-    numberOfFailuresInARow int
+    // Name of the service. Can be anything.
+    Name string `yaml:"name"`
+
+    // URL to send the request to
+    Url string `yaml:"url"`
+
+    // Method of the request made to the url of the service
+    Method string `yaml:"method,omitempty"`
+
+    // Body of the request
+    Body string `yaml:"body,omitempty"`
+
+    // GraphQL is whether to wrap the body in a query param ({"query":"$body"})
+    GraphQL bool `yaml:"graphql,omitempty"`
+
+    // Headers of the request
+    Headers map[string]string `yaml:"headers,omitempty"`
+
+    // Interval is the duration to wait between every status check
+    Interval time.Duration `yaml:"interval,omitempty"`
+
+    // Conditions used to determine the health of the service
+    Conditions []*Condition `yaml:"conditions"`
+
+    // Alerts is the alerting configuration for the service in case of failure
+    Alerts []*Alert `yaml:"alerts"`
+
+    NumberOfFailuresInARow int
 }
 
 func (service *Service) Validate() {
     // Set default values
     if service.Interval == 0 {
-        service.Interval = 10 * time.Second
+        service.Interval = 1 * time.Minute
     }
     if len(service.Method) == 0 {
         service.Method = http.MethodGet

@@ -76,22 +94,16 @@ func (service *Service) EvaluateConditions() *Result {
         }
     }
     result.Timestamp = time.Now()
-    if result.Success {
-        service.numberOfFailuresInARow = 0
-        // TODO: Send notification that alert has been resolved?
-    } else {
-        service.numberOfFailuresInARow++
-    }
     return result
 }
 
 func (service *Service) GetAlertsTriggered() []Alert {
     var alerts []Alert
-    if service.numberOfFailuresInARow == 0 {
+    if service.NumberOfFailuresInARow == 0 {
         return alerts
     }
     for _, alert := range service.Alerts {
-        if alert.Enabled && alert.Threshold == service.numberOfFailuresInARow {
+        if alert.Enabled && alert.Threshold == service.NumberOfFailuresInARow {
             alerts = append(alerts, *alert)
             continue
         }
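A small standalone sketch (using simplified stand-ins for `core.Service` and `core.Alert`, not the real types) of the threshold matching that `GetAlertsTriggered` performs after the rename to the exported `NumberOfFailuresInARow`: an alert is returned only when the failure streak equals its threshold exactly.

```go
package main

import "fmt"

// Alert and Service are trimmed-down stand-ins for the core package types above.
type Alert struct {
	Description string
	Enabled     bool
	Threshold   int
}

type Service struct {
	Alerts                 []*Alert
	NumberOfFailuresInARow int
}

// GetAlertsTriggered mirrors the logic shown in the diff: no failures means no
// alerts, otherwise only alerts whose threshold matches the streak are returned.
func (service *Service) GetAlertsTriggered() []Alert {
	var alerts []Alert
	if service.NumberOfFailuresInARow == 0 {
		return alerts
	}
	for _, alert := range service.Alerts {
		if alert.Enabled && alert.Threshold == service.NumberOfFailuresInARow {
			alerts = append(alerts, *alert)
		}
	}
	return alerts
}

func main() {
	service := &Service{
		Alerts: []*Alert{
			{Description: "failed 3 times in a row", Enabled: true, Threshold: 3},
			{Description: "failed 5 times in a row", Enabled: true, Threshold: 5},
		},
		NumberOfFailuresInARow: 3,
	}
	for _, alert := range service.GetAlertsTriggered() {
		fmt.Println(alert.Description) // only "failed 3 times in a row" is printed
	}
}
```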
go.sum (1 change)

@@ -18,6 +18,7 @@ github.com/go-stack/stack v1.8.0/go.mod h1:v0f6uXyyMGvRgIKkXu+yp6POWl0qKG85gN/me
 github.com/gogo/protobuf v1.1.1/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ=
 github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
 github.com/golang/protobuf v1.3.1/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
+github.com/golang/protobuf v1.3.2 h1:6nsPYzhq5kReh6QImI3k5qWzO4PEbvbIW2cwSfR/6xs=
 github.com/golang/protobuf v1.3.2/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
 github.com/google/go-cmp v0.3.0/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU=
 github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
main.go (8 changes)

@@ -3,7 +3,6 @@ package main
 import (
     "bytes"
     "compress/gzip"
-    "encoding/json"
     "github.com/TwinProduction/gatus/config"
     "github.com/TwinProduction/gatus/watchdog"
     "github.com/prometheus/client_golang/prometheus/promhttp"

@@ -53,12 +52,11 @@ func serviceResultsHandler(writer http.ResponseWriter, r *http.Request) {
     if isExpired := cachedServiceResultsTimestamp.IsZero() || time.Now().Sub(cachedServiceResultsTimestamp) > CacheTTL; isExpired {
         buffer := &bytes.Buffer{}
         gzipWriter := gzip.NewWriter(buffer)
-        serviceResults := watchdog.GetServiceResults()
-        data, err := json.Marshal(serviceResults)
+        data, err := watchdog.GetJsonEncodedServiceResults()
         if err != nil {
-            log.Printf("[main][serviceResultsHandler] Unable to marshall object to JSON: %s", err.Error())
+            log.Printf("[main][serviceResultsHandler] Unable to marshal object to JSON: %s", err.Error())
             writer.WriteHeader(http.StatusInternalServerError)
-            _, _ = writer.Write([]byte("Unable to marshall object to JSON"))
+            _, _ = writer.Write([]byte("Unable to marshal object to JSON"))
             return
         }
         gzipWriter.Write(data)
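For reference, the buffer-plus-gzip pattern that `serviceResultsHandler` keeps using around the new `GetJsonEncodedServiceResults` call, reduced to a minimal standalone sketch; the caching and response-writing parts of the handler are omitted here.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"log"
)

func main() {
	// Stand-in for the JSON-encoded service results returned by the watchdog package.
	data := []byte(`{"twinnation":[{"success":true}]}`)

	// Compress into an in-memory buffer, which a handler can then cache and write
	// back to clients together with a "Content-Encoding: gzip" header.
	buffer := &bytes.Buffer{}
	gzipWriter := gzip.NewWriter(buffer)
	if _, err := gzipWriter.Write(data); err != nil {
		log.Fatal(err)
	}
	if err := gzipWriter.Close(); err != nil { // flush the gzip stream
		log.Fatal(err)
	}
	fmt.Printf("compressed %d bytes down to %d bytes\n", len(data), buffer.Len())
}
```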
@@ -1,6 +1,7 @@
 package watchdog
 
 import (
+    "encoding/json"
     "fmt"
     "github.com/TwinProduction/gatus/config"
     "github.com/TwinProduction/gatus/core"

@@ -12,12 +13,22 @@ import (
 
 var (
     serviceResults = make(map[string][]*core.Result)
-    rwLock sync.RWMutex
+
+    // serviceResultsMutex is used to prevent concurrent map access
+    serviceResultsMutex sync.RWMutex
+
+    // monitoringMutex is used to prevent multiple services from being evaluated at the same time.
+    // Without this, conditions using response time may become inaccurate.
+    monitoringMutex sync.Mutex
 )
 
-// GetServiceResults returns a list of the last 20 results for each services
-func GetServiceResults() *map[string][]*core.Result {
-    return &serviceResults
+// GetJsonEncodedServiceResults returns a list of the last 20 results for each services encoded using json.Marshal.
+// The reason why the encoding is done here is because we use a mutex to prevent concurrent map access.
+func GetJsonEncodedServiceResults() ([]byte, error) {
+    serviceResultsMutex.RLock()
+    data, err := json.Marshal(serviceResults)
+    serviceResultsMutex.RUnlock()
+    return data, err
 }
 
 // Monitor loops over each services and starts a goroutine to monitor each services separately
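The comment above explains why the JSON encoding moved into the watchdog package: the map is written by the monitoring loop, so it must only be read while holding `serviceResultsMutex`. Below is a minimal standalone sketch of that pattern, with a toy map and a writer goroutine standing in for `monitor`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
	"time"
)

var (
	// results is a toy stand-in for the serviceResults map in the watchdog package.
	results      = make(map[string][]int)
	resultsMutex sync.RWMutex
)

func main() {
	// Writer goroutine, standing in for the monitoring loop appending results.
	go func() {
		for i := 0; i < 100; i++ {
			resultsMutex.Lock()
			results["twinnation"] = append(results["twinnation"], i)
			resultsMutex.Unlock()
			time.Sleep(time.Millisecond)
		}
	}()

	time.Sleep(10 * time.Millisecond)

	// Reader, standing in for GetJsonEncodedServiceResults: marshalling while the
	// read lock is held avoids a concurrent map read/write with the goroutine above.
	resultsMutex.RLock()
	data, err := json.Marshal(results)
	resultsMutex.RUnlock()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```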
@@ -31,49 +42,72 @@ func Monitor(cfg *config.Config) {
 
 // monitor monitors a single service in a loop
 func monitor(service *core.Service) {
+    cfg := config.Get()
     for {
         // By placing the lock here, we prevent multiple services from being monitored at the exact same time, which
         // could cause performance issues and return inaccurate results
-        rwLock.Lock()
-        log.Printf("[watchdog][monitor] Monitoring serviceName=%s", service.Name)
+        monitoringMutex.Lock()
+        if cfg.Debug {
+            log.Printf("[watchdog][monitor] Monitoring serviceName=%s", service.Name)
+        }
         result := service.EvaluateConditions()
         metric.PublishMetricsForService(service, result)
+        serviceResultsMutex.Lock()
         serviceResults[service.Name] = append(serviceResults[service.Name], result)
         if len(serviceResults[service.Name]) > 20 {
             serviceResults[service.Name] = serviceResults[service.Name][1:]
         }
-        rwLock.Unlock()
+        serviceResultsMutex.Unlock()
         var extra string
         if !result.Success {
             extra = fmt.Sprintf("responseBody=%s", result.Body)
         }
         log.Printf(
-            "[watchdog][monitor] Finished monitoring serviceName=%s; errors=%d; requestDuration=%s; %s",
+            "[watchdog][monitor] Monitored serviceName=%s; success=%v; errors=%d; requestDuration=%s; %s",
             service.Name,
+            result.Success,
             len(result.Errors),
             result.Duration.Round(time.Millisecond),
             extra,
         )
-        cfg := config.Get()
-        if cfg.Alerting != nil {
-            for _, alertTriggered := range service.GetAlertsTriggered() {
+        handleAlerting(service, result)
+        if cfg.Debug {
+            log.Printf("[watchdog][monitor] Waiting for interval=%s before monitoring serviceName=%s again", service.Interval, service.Name)
+        }
+        monitoringMutex.Unlock()
+        time.Sleep(service.Interval)
+    }
+}
+
+func handleAlerting(service *core.Service, result *core.Result) {
+    cfg := config.Get()
+    if cfg.Alerting == nil {
+        return
+    }
+    if result.Success {
+        if service.NumberOfFailuresInARow > 0 {
+            for _, alert := range service.Alerts {
+                if !alert.Enabled || !alert.SendOnResolved || alert.Threshold > service.NumberOfFailuresInARow {
+                    continue
+                }
                 var alertProvider *core.CustomAlertProvider
-                if alertTriggered.Type == core.SlackAlert {
+                if alert.Type == core.SlackAlert {
                     if len(cfg.Alerting.Slack) > 0 {
-                        log.Printf("[watchdog][monitor] Sending Slack alert because alert with description=%s has been triggered", alertTriggered.Description)
-                        alertProvider = &core.CustomAlertProvider{
-                            Url: cfg.Alerting.Slack,
-                            Method: "POST",
-                            Body: fmt.Sprintf(`{"text":"*[Gatus]*\n*service:* %s\n*description:* %s"}`, service.Name, alertTriggered.Description),
-                            Headers: map[string]string{"Content-Type": "application/json"},
-                        }
+                        log.Printf("[watchdog][handleAlerting] Sending Slack alert because alert with description=%s has been resolved", alert.Description)
+                        alertProvider = core.CreateSlackCustomAlertProvider(cfg.Alerting.Slack, service, alert, result, true)
                     } else {
-                        log.Printf("[watchdog][monitor] Not sending Slack alert despite being triggered, because there is no Slack webhook configured")
+                        log.Printf("[watchdog][handleAlerting] Not sending Slack alert despite being triggered, because there is no Slack webhook configured")
                     }
-                } else if alertTriggered.Type == core.CustomAlert {
-                    if cfg.Alerting.Custom != nil && len(cfg.Alerting.Custom.Url) > 0 {
-                        log.Printf("[watchdog][monitor] Sending custom alert because alert with description=%s has been triggered", alertTriggered.Description)
+                } else if alert.Type == core.TwilioAlert {
+                    if cfg.Alerting.Twilio != nil && cfg.Alerting.Twilio.IsValid() {
+                        log.Printf("[watchdog][handleAlerting] Sending Twilio alert because alert with description=%s has been resolved", alert.Description)
+                        alertProvider = core.CreateTwilioCustomAlertProvider(cfg.Alerting.Twilio, fmt.Sprintf("RESOLVED: %s - %s", service.Name, alert.Description))
+                    } else {
+                        log.Printf("[watchdog][handleAlerting] Not sending Twilio alert despite being resolved, because Twilio isn't configured properly")
+                    }
+                } else if alert.Type == core.CustomAlert {
+                    if cfg.Alerting.Custom != nil && cfg.Alerting.Custom.IsValid() {
+                        log.Printf("[watchdog][handleAlerting] Sending custom alert because alert with description=%s has been resolved", alert.Description)
                         alertProvider = &core.CustomAlertProvider{
                             Url: cfg.Alerting.Custom.Url,
                             Method: cfg.Alerting.Custom.Method,

@@ -81,19 +115,59 @@ func monitor(service *core.Service) {
                             Headers: cfg.Alerting.Custom.Headers,
                         }
                     } else {
-                        log.Printf("[watchdog][monitor] Not sending custom alert despite being triggered, because there is no custom url configured")
+                        log.Printf("[watchdog][handleAlerting] Not sending custom alert despite being resolved, because the custom provider isn't configured properly")
                     }
                 }
                 if alertProvider != nil {
-                    err := alertProvider.Send(service.Name, alertTriggered.Description)
+                    err := alertProvider.Send(service.Name, alert.Description, true)
                     if err != nil {
-                        log.Printf("[watchdog][monitor] Ran into error sending an alert: %s", err.Error())
+                        log.Printf("[watchdog][handleAlerting] Ran into error sending an alert: %s", err.Error())
                     }
                 }
             }
         }
-        log.Printf("[watchdog][monitor] Waiting for interval=%s before monitoring serviceName=%s", service.Interval, service.Name)
-        time.Sleep(service.Interval)
+        service.NumberOfFailuresInARow = 0
+    } else {
+        service.NumberOfFailuresInARow++
+        for _, alert := range service.Alerts {
+            // If the alert hasn't been triggered, move to the next one
+            if !alert.Enabled || alert.Threshold != service.NumberOfFailuresInARow {
+                continue
+            }
+            var alertProvider *core.CustomAlertProvider
+            if alert.Type == core.SlackAlert {
+                if len(cfg.Alerting.Slack) > 0 {
+                    log.Printf("[watchdog][handleAlerting] Sending Slack alert because alert with description=%s has been triggered", alert.Description)
+                    alertProvider = core.CreateSlackCustomAlertProvider(cfg.Alerting.Slack, service, alert, result, false)
+                } else {
+                    log.Printf("[watchdog][handleAlerting] Not sending Slack alert despite being triggered, because there is no Slack webhook configured")
+                }
+            } else if alert.Type == core.TwilioAlert {
+                if cfg.Alerting.Twilio != nil && cfg.Alerting.Twilio.IsValid() {
+                    log.Printf("[watchdog][handleAlerting] Sending Twilio alert because alert with description=%s has been triggered", alert.Description)
+                    alertProvider = core.CreateTwilioCustomAlertProvider(cfg.Alerting.Twilio, fmt.Sprintf("TRIGGERED: %s - %s", service.Name, alert.Description))
+                } else {
+                    log.Printf("[watchdog][handleAlerting] Not sending Twilio alert despite being triggered, because Twilio config settings missing")
+                }
+            } else if alert.Type == core.CustomAlert {
+                if cfg.Alerting.Custom != nil && cfg.Alerting.Custom.IsValid() {
+                    log.Printf("[watchdog][handleAlerting] Sending custom alert because alert with description=%s has been triggered", alert.Description)
+                    alertProvider = &core.CustomAlertProvider{
+                        Url: cfg.Alerting.Custom.Url,
+                        Method: cfg.Alerting.Custom.Method,
+                        Body: cfg.Alerting.Custom.Body,
+                        Headers: cfg.Alerting.Custom.Headers,
+                    }
+                } else {
+                    log.Printf("[watchdog][handleAlerting] Not sending custom alert despite being triggered, because there is no custom url configured")
+                }
+            }
+            if alertProvider != nil {
+                err := alertProvider.Send(service.Name, alert.Description, false)
+                if err != nil {
+                    log.Printf("[watchdog][handleAlerting] Ran into error sending an alert: %s", err.Error())
+                }
+            }
+        }
     }
 }
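Distilling the gating logic of `handleAlerting` into a standalone sketch (provider selection and request sending stripped out): a triggered notification goes out when the failure streak reaches exactly the alert's threshold, and a resolved notification goes out on the next success only if `send-on-resolved` is enabled and the threshold had been reached.

```go
package main

import "fmt"

// alert is a trimmed-down stand-in for core.Alert, keeping only the fields that
// drive the trigger/resolve decisions shown in the diff above.
type alert struct {
	Enabled        bool
	SendOnResolved bool
	Threshold      int
}

// shouldTrigger mirrors the failing branch: the alert fires once, when the
// failure streak equals its threshold.
func shouldTrigger(a alert, failuresInARow int) bool {
	return a.Enabled && a.Threshold == failuresInARow
}

// shouldResolve mirrors the success branch: a resolved notification is sent only
// if the alert had actually fired (threshold reached) and send-on-resolved is on.
func shouldResolve(a alert, success bool, failuresInARow int) bool {
	return success && failuresInARow > 0 && a.Enabled && a.SendOnResolved && a.Threshold <= failuresInARow
}

func main() {
	a := alert{Enabled: true, SendOnResolved: true, Threshold: 3}
	fmt.Println(shouldTrigger(a, 3))       // true: third failure in a row
	fmt.Println(shouldTrigger(a, 4))       // false: only fires at the threshold
	fmt.Println(shouldResolve(a, true, 5)) // true: service recovered after the alert fired
	fmt.Println(shouldResolve(a, true, 2)) // false: the alert never reached its threshold
}
```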