Compare commits

..

4 Commits

Author SHA1 Message Date
TwinProduction
230c8821ec Merge util.go into condition.go 2020-10-04 20:00:24 -04:00
TwinProduction
ecef3d9538 Add result.Connected in integration tests 2020-10-04 19:55:19 -04:00
TwinProduction
3ecfe4d322 Close #18: Support monitoring TCP services 2020-10-04 19:49:02 -04:00
TwinProduction
6a3f65db7f Close #14: Support skipping certificate verification (services[].insecure) 2020-10-04 17:01:10 -04:00
10 changed files with 276 additions and 115 deletions

View File

@@ -29,6 +29,9 @@ core applications: https://status.twinnation.org/
- [Using in Production](#using-in-production)
- [FAQ](#faq)
- [Sending a GraphQL request](#sending-a-graphql-request)
- [Recommended interval](#recommended-interval)
- [Default timeouts](#default-timeouts)
- [Monitoring a TCP service](#monitoring-a-tcp-service)
## Features
@@ -84,6 +87,7 @@ Note that you can also add environment variables in the configuration file (i.e.
| `services[].name` | Name of the service. Can be anything. | Required `""` |
| `services[].url` | URL to send the request to | Required `""` |
| `services[].conditions` | Conditions used to determine the health of the service | `[]` |
| `services[].insecure` | Whether to skip verifying the server's certificate chain and host name | `false` |
| `services[].interval` | Duration to wait between every status check | `60s` |
| `services[].method` | Request method | `GET` |
| `services[].graphql` | Whether to wrap the body in a query param (`{"query":"$body"}`) | `false` |
@@ -121,6 +125,7 @@ Here are some examples of conditions you can use:
| `[STATUS] < 300` | Status must lower than 300 | 200, 201, 299 | 301, 302, ... |
| `[STATUS] <= 299` | Status must be less than or equal to 299 | 200, 201, 299 | 301, 302, ... |
| `[STATUS] > 400` | Status must be greater than 400 | 401, 402, 403, 404 | 400, 200, ... |
| `[CONNECTED] == true` | Connection to host must've been successful | true, false | |
| `[RESPONSE_TIME] < 500` | Response time must be below 500ms | 100ms, 200ms, 300ms | 500ms, 501ms |
| `[IP] == 127.0.0.1` | Target IP must be 127.0.0.1 | 127.0.0.1 | 0.0.0.0 |
| `[BODY] == 1` | The body must be equal to 1 | 1 | `{}`, `2`, ... |
@@ -134,12 +139,13 @@ Here are some examples of conditions you can use:
#### Placeholders
| Placeholder | Description | Example of resolved value |
|:---------------------------- |:------------------------------------------------------- |:-------------------------- |
| `[STATUS]` | Resolves into the HTTP status of the request | 404
| `[RESPONSE_TIME]` | Resolves into the response time the request took, in ms | 10
| `[IP]` | Resolves into the IP of the target host | 192.168.0.232
| `[BODY]` | Resolves into the response body. Supports JSONPath. | `{"name":"john.doe"}`
| Placeholder | Description | Example of resolved value |
|:----------------------- |:------------------------------------------------------- |:------------------------- |
| `[STATUS]` | Resolves into the HTTP status of the request | 404
| `[RESPONSE_TIME]` | Resolves into the response time the request took, in ms | 10
| `[IP]` | Resolves into the IP of the target host | 192.168.0.232
| `[BODY]` | Resolves into the response body. Supports JSONPath. | `{"name":"john.doe"}`
| `[CONNECTED]` | Resolves into whether a connection could be established | `true`
#### Functions
@@ -345,3 +351,60 @@ will send a `POST` request to `http://localhost:8080/playground` with the follow
```json
{"query":" {\n user(gender: \"female\") {\n id\n name\n gender\n avatar\n }\n }"}
```
### Recommended interval
To ensure that Gatus provides reliable and accurate results (i.e. response time), Gatus only evaluates one service at a time.
In other words, even if you have multiple services with the exact same interval, they will not execute at the same time.
You can test this yourself by running Gatus with several services configured with a very short, unrealistic interval,
such as 1ms. You'll notice that the response time does not fluctuate - that is because while services are evaluated on
different goroutines, there's a global lock that prevents multiple services from running at the same time.
Unfortunately, there is a drawback. If you have a lot of services, including some that are very slow or prone to time out (the default
time out is 10s for HTTP and 5s for TCP), then it means that for the entire duration of the request, no other services can be evaluated.
**This does mean that Gatus will be unable to evaluate the health of other services**.
The interval does not include the duration of the request itself, which means that if a service has an interval of 30s
and the request takes 2s to complete, the timestamp between two evaluations will be 32s, not 30s.
While this does not prevent Gatus' from performing health checks on all other services, it may cause Gatus to be unable
to respect the configured interval, for instance:
- Service A has an interval of 5s, and times out after 10s to complete
- Service B has an interval of 5s, and takes 1ms to complete
- Service B will be unable to run every 5s, because service A's health evaluation takes longer than its interval
To sum it up, while Gatus can really handle any interval you throw at it, you're better off having slow requests with
higher interval.
As a rule of the thumb, I personally set interval for more complex health checks to `5m` (5 minutes) and
simple health checks used for alerting (PagerDuty/Twilio) to `30s`.
### Default timeouts
| Protocol | Timeout |
|:-------- |:------- |
| HTTP | 10s
| TCP | 5s
### Monitoring a TCP service
By prefixing `services[].url` with `tcp:\\`, you can monitor TCP services at a very basic level:
```yaml
- name: redis
url: "tcp://127.0.0.1:6379"
interval: 30s
conditions:
- "[CONNECTED] == true"
```
Placeholders `[STATUS]` and `[BODY]` as well as the fields `services[].body`, `services[].insecure`,
`services[].headers`, `services[].method` and `services[].graphql` are not supported for TCP services.
**NOTE**: `[CONNECTED] == true` does not guarantee that the service itself is healthy - it only guarantees that there's
something at the given address listening to the given port, and that a connection to that address was successfully
established.

View File

@@ -73,7 +73,7 @@ func (provider *AlertProvider) buildRequest(serviceName, alertDescription string
// Send a request to the alert provider and return the body
func (provider *AlertProvider) Send(serviceName, alertDescription string, resolved bool) ([]byte, error) {
request := provider.buildRequest(serviceName, alertDescription, resolved)
response, err := client.GetHttpClient().Do(request)
response, err := client.GetHttpClient(false).Do(request)
if err != nil {
return nil, err
}

View File

@@ -1,19 +1,45 @@
package client
import (
"crypto/tls"
"net"
"net/http"
"time"
)
var (
client *http.Client
secureHttpClient *http.Client
insecureHttpClient *http.Client
)
func GetHttpClient() *http.Client {
if client == nil {
client = &http.Client{
Timeout: time.Second * 10,
func GetHttpClient(insecure bool) *http.Client {
if insecure {
if insecureHttpClient == nil {
insecureHttpClient = &http.Client{
Timeout: time.Second * 10,
Transport: &http.Transport{
TLSClientConfig: &tls.Config{
InsecureSkipVerify: true,
},
},
}
}
return insecureHttpClient
} else {
if secureHttpClient == nil {
secureHttpClient = &http.Client{
Timeout: time.Second * 10,
}
}
return secureHttpClient
}
return client
}
func CanCreateConnectionToTcpService(address string) bool {
conn, err := net.DialTimeout("tcp", address, 5*time.Second)
if err != nil {
return false
}
_ = conn.Close()
return true
}

View File

@@ -1,13 +1,28 @@
package client
import "testing"
import (
"testing"
)
func TestGetHttpClient(t *testing.T) {
if client != nil {
t.Error("client should've been nil since it hasn't been called a single time yet")
if secureHttpClient != nil {
t.Error("secureHttpClient should've been nil since it hasn't been called a single time yet")
}
_ = GetHttpClient()
if client == nil {
t.Error("client shouldn't have been nil, since it has been called once")
if insecureHttpClient != nil {
t.Error("insecureHttpClient should've been nil since it hasn't been called a single time yet")
}
_ = GetHttpClient(false)
if secureHttpClient == nil {
t.Error("secureHttpClient shouldn't have been nil, since it has been called once")
}
if insecureHttpClient != nil {
t.Error("insecureHttpClient should've been nil since it hasn't been called a single time yet")
}
_ = GetHttpClient(true)
if secureHttpClient == nil {
t.Error("secureHttpClient shouldn't have been nil, since it has been called once")
}
if insecureHttpClient == nil {
t.Error("insecureHttpClient shouldn't have been nil, since it has been called once")
}
}

View File

@@ -2,11 +2,27 @@ package core
import (
"fmt"
"github.com/TwinProduction/gatus/jsonpath"
"github.com/TwinProduction/gatus/pattern"
"log"
"strconv"
"strings"
)
const (
StatusPlaceholder = "[STATUS]"
IPPlaceHolder = "[IP]"
ResponseTimePlaceHolder = "[RESPONSE_TIME]"
BodyPlaceHolder = "[BODY]"
ConnectedPlaceHolder = "[CONNECTED]"
LengthFunctionPrefix = "len("
PatternFunctionPrefix = "pat("
FunctionSuffix = ")"
InvalidConditionElementSuffix = "(INVALID)"
)
type Condition string
// evaluate the Condition with the Result of the health check
@@ -74,3 +90,60 @@ func isEqual(first, second string) bool {
return first == second
}
}
// sanitizeAndResolve sanitizes and resolves a list of element and returns the list of resolved elements
func sanitizeAndResolve(list []string, result *Result) []string {
var sanitizedList []string
body := strings.TrimSpace(string(result.Body))
for _, element := range list {
element = strings.TrimSpace(element)
switch strings.ToUpper(element) {
case StatusPlaceholder:
element = strconv.Itoa(result.HttpStatus)
case IPPlaceHolder:
element = result.Ip
case ResponseTimePlaceHolder:
element = strconv.Itoa(int(result.Duration.Milliseconds()))
case BodyPlaceHolder:
element = body
case ConnectedPlaceHolder:
element = strconv.FormatBool(result.Connected)
default:
// if contains the BodyPlaceHolder, then evaluate json path
if strings.Contains(element, BodyPlaceHolder) {
wantLength := false
if strings.HasPrefix(element, LengthFunctionPrefix) && strings.HasSuffix(element, FunctionSuffix) {
wantLength = true
element = strings.TrimSuffix(strings.TrimPrefix(element, LengthFunctionPrefix), FunctionSuffix)
}
resolvedElement, resolvedElementLength, err := jsonpath.Eval(strings.Replace(element, fmt.Sprintf("%s.", BodyPlaceHolder), "", 1), result.Body)
if err != nil {
result.Errors = append(result.Errors, err.Error())
element = fmt.Sprintf("%s %s", element, InvalidConditionElementSuffix)
} else {
if wantLength {
element = fmt.Sprintf("%d", resolvedElementLength)
} else {
element = resolvedElement
}
}
}
}
sanitizedList = append(sanitizedList, element)
}
return sanitizedList
}
func sanitizeAndResolveNumerical(list []string, result *Result) []int {
var sanitizedNumbers []int
sanitizedList := sanitizeAndResolve(list, result)
for _, element := range sanitizedList {
if number, err := strconv.Atoi(element); err != nil {
// Default to 0 if the string couldn't be converted to an integer
sanitizedNumbers = append(sanitizedNumbers, 0)
} else {
sanitizedNumbers = append(sanitizedNumbers, number)
}
}
return sanitizedNumbers
}

View File

@@ -265,3 +265,21 @@ func TestCondition_evaluateWithStatusPatternFailure(t *testing.T) {
t.Errorf("Condition '%s' should have been a failure", condition)
}
}
func TestCondition_evaluateWithConnected(t *testing.T) {
condition := Condition("[CONNECTED] == true")
result := &Result{Connected: true}
condition.evaluate(result)
if !result.ConditionResults[0].Success {
t.Errorf("Condition '%s' should have been a success", condition)
}
}
func TestCondition_evaluateWithConnectedFailure(t *testing.T) {
condition := Condition("[CONNECTED] == true")
result := &Result{Connected: false}
condition.evaluate(result)
if result.ConditionResults[0].Success {
t.Errorf("Condition '%s' should have been a failure", condition)
}
}

View File

@@ -9,6 +9,7 @@ import (
"net"
"net/http"
"net/url"
"strings"
"time"
)
@@ -46,6 +47,9 @@ type Service struct {
// Alerts is the alerting configuration for the service in case of failure
Alerts []*Alert `yaml:"alerts"`
// Insecure is whether to skip verifying the server's certificate chain and host name
Insecure bool `yaml:"insecure,omitempty"`
NumberOfFailuresInARow int
NumberOfSuccessesInARow int
}
@@ -133,19 +137,30 @@ func (service *Service) getIp(result *Result) {
}
func (service *Service) call(result *Result) {
request := service.buildRequest()
startTime := time.Now()
response, err := client.GetHttpClient().Do(request)
if err != nil {
result.Duration = time.Since(startTime)
result.Errors = append(result.Errors, err.Error())
return
isServiceTcp := strings.HasPrefix(service.Url, "tcp://")
var request *http.Request
var response *http.Response
var err error
if !isServiceTcp {
request = service.buildRequest()
}
result.Duration = time.Since(startTime)
result.HttpStatus = response.StatusCode
result.Body, err = ioutil.ReadAll(response.Body)
if err != nil {
result.Errors = append(result.Errors, err.Error())
startTime := time.Now()
if isServiceTcp {
result.Connected = client.CanCreateConnectionToTcpService(strings.TrimPrefix(service.Url, "tcp://"))
result.Duration = time.Since(startTime)
} else {
response, err = client.GetHttpClient(service.Insecure).Do(request)
result.Duration = time.Since(startTime)
if err != nil {
result.Errors = append(result.Errors, err.Error())
return
}
result.HttpStatus = response.StatusCode
result.Connected = response.StatusCode > 0
result.Body, err = ioutil.ReadAll(response.Body)
if err != nil {
result.Errors = append(result.Errors, err.Error())
}
}
}

View File

@@ -15,6 +15,9 @@ func TestIntegrationEvaluateHealth(t *testing.T) {
if !result.ConditionResults[0].Success {
t.Errorf("Condition '%s' should have been a success", condition)
}
if !result.Connected {
t.Error("Because the connection has been established, result.Connected should've been true")
}
if !result.Success {
t.Error("Because all conditions passed, this should have been a success")
}
@@ -31,6 +34,9 @@ func TestIntegrationEvaluateHealthWithFailure(t *testing.T) {
if result.ConditionResults[0].Success {
t.Errorf("Condition '%s' should have been a failure", condition)
}
if !result.Connected {
t.Error("Because the connection has been established, result.Connected should've been true")
}
if result.Success {
t.Error("Because one of the conditions failed, success should have been false")
}

View File

@@ -9,16 +9,37 @@ type HealthStatus struct {
Message string `json:"message,omitempty"`
}
// Result of the evaluation of a Service
type Result struct {
HttpStatus int `json:"status"`
Body []byte `json:"-"`
Hostname string `json:"hostname"`
Ip string `json:"-"`
Duration time.Duration `json:"duration"`
Errors []string `json:"errors"`
// HttpStatus is the HTTP response status code
HttpStatus int `json:"status"`
// Body is the response body
Body []byte `json:"-"`
// Hostname extracted from the Service Url
Hostname string `json:"hostname"`
// Ip resolved from the Service Url
Ip string `json:"-"`
// Connected whether a connection to the host was established successfully
Connected bool `json:"-"`
// Duration time that the request took
Duration time.Duration `json:"duration"`
// Errors encountered during the evaluation of the service's health
Errors []string `json:"errors"`
// ConditionResults results of the service's conditions
ConditionResults []*ConditionResult `json:"condition-results"`
Success bool `json:"success"`
Timestamp time.Time `json:"timestamp"`
// Success whether the result signifies a success or not
Success bool `json:"success"`
// Timestamp when the request was sent
Timestamp time.Time `json:"timestamp"`
}
type ConditionResult struct {

View File

@@ -1,76 +0,0 @@
package core
import (
"fmt"
"github.com/TwinProduction/gatus/jsonpath"
"strconv"
"strings"
)
const (
StatusPlaceholder = "[STATUS]"
IPPlaceHolder = "[IP]"
ResponseTimePlaceHolder = "[RESPONSE_TIME]"
BodyPlaceHolder = "[BODY]"
LengthFunctionPrefix = "len("
PatternFunctionPrefix = "pat("
FunctionSuffix = ")"
InvalidConditionElementSuffix = "(INVALID)"
)
// sanitizeAndResolve sanitizes and resolves a list of element and returns the list of resolved elements
func sanitizeAndResolve(list []string, result *Result) []string {
var sanitizedList []string
body := strings.TrimSpace(string(result.Body))
for _, element := range list {
element = strings.TrimSpace(element)
switch strings.ToUpper(element) {
case StatusPlaceholder:
element = strconv.Itoa(result.HttpStatus)
case IPPlaceHolder:
element = result.Ip
case ResponseTimePlaceHolder:
element = strconv.Itoa(int(result.Duration.Milliseconds()))
case BodyPlaceHolder:
element = body
default:
// if contains the BodyPlaceHolder, then evaluate json path
if strings.Contains(element, BodyPlaceHolder) {
wantLength := false
if strings.HasPrefix(element, LengthFunctionPrefix) && strings.HasSuffix(element, FunctionSuffix) {
wantLength = true
element = strings.TrimSuffix(strings.TrimPrefix(element, LengthFunctionPrefix), FunctionSuffix)
}
resolvedElement, resolvedElementLength, err := jsonpath.Eval(strings.Replace(element, fmt.Sprintf("%s.", BodyPlaceHolder), "", 1), result.Body)
if err != nil {
result.Errors = append(result.Errors, err.Error())
element = fmt.Sprintf("%s %s", element, InvalidConditionElementSuffix)
} else {
if wantLength {
element = fmt.Sprintf("%d", resolvedElementLength)
} else {
element = resolvedElement
}
}
}
}
sanitizedList = append(sanitizedList, element)
}
return sanitizedList
}
func sanitizeAndResolveNumerical(list []string, result *Result) []int {
var sanitizedNumbers []int
sanitizedList := sanitizeAndResolve(list, result)
for _, element := range sanitizedList {
if number, err := strconv.Atoi(element); err != nil {
// Default to 0 if the string couldn't be converted to an integer
sanitizedNumbers = append(sanitizedNumbers, 0)
} else {
sanitizedNumbers = append(sanitizedNumbers, number)
}
}
return sanitizedNumbers
}