Working with Argos


Composite

Use a composite to combine several components into one aggregated element. For example, suppose you already have a dashboard whose components represent the nodes of an Elasticsearch cluster, each component showing the state of one node:

Several components

To get a quick overview of the entire cluster without looking at the individual nodes, you can add a composite that represents the combined state of all nodes, that is, the state of the entire Elasticsearch cluster.

Proceed as follows:

  1. Add a new page “Overview” and save it.

  2. Edit the page and add a new component:

    Add composite
    • Enter a unique id and a name

    • Under Link target, select the page of the Elasticsearch cluster from the list

    • Save component configuration

  3. Save page and leave editor

You now have a new page containing a composite that represents the entire cluster state:

ES cluster composite
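
How a composite derives its combined state is not described here. As a rough mental model only, the following Python sketch shows one plausible policy in which the most severe component status wins. The status names, their ordering, and the function `combined_status` are assumptions made for illustration and are not taken from Argos.

```python
# Hypothetical severity ordering, lowest to highest; not taken from Argos.
SEVERITY = ["ok", "warning", "error"]


def combined_status(node_statuses):
    """Return the most severe status among the given component statuses."""
    if not node_statuses:
        raise ValueError("a composite needs at least one component")
    # max() with SEVERITY.index as key picks the status ranked highest.
    return max(node_statuses, key=SEVERITY.index)


# Three Elasticsearch nodes, one of them in a warning state.
print(combined_status(["ok", "warning", "ok"]))  # prints "warning"
```

With such a policy, a single degraded node is enough to make the composite stand out on the overview page.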

When you hover over the composite element, you can click on several icons:

Composite icons
  1. Jump to linked page of the composite

  2. See statistics graph, if available for the composite

  3. Mute alerts of the composite

Stats graph

Depending on the component, you can inspect the statistics graph by clicking on the charts icon:

Select stats

In this example you can see various performance statistics of a Linux machine:

Stats graph

Zoom into a specific section of the chart by clicking and dragging the mouse:

Zoom into a section

After releasing the mouse button, the chart now represents the selected section:

Charts of section

Hover over the chart to get a tooltip with more details about that point in time. Activate Synchronize tooltips to see the details of all charts for the same point in time. For example, this allows you to examine the memory consumption and the CPU usage side by side:

Synchronize details

Rules

For each component, you can define rules that trigger actions when their condition is met.

Component configuration - Rules

For example, you may want to trigger an alert if the CPU usage of a machine is higher than 75%. The corresponding rule may look as follows:

if usage_user > 75 then create error alert with message "The CPU usage on host [host] is at [usage_user]%"
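
To make the reading of this rule concrete, here is a minimal Python sketch that mimics it. It assumes that bracketed names such as [host] are placeholders filled from the data received for the component, which is how the example message reads. The functions `render_message` and `evaluate` and the sample data are hypothetical and not part of Argos.

```python
import re


def render_message(template, data):
    """Replace each [name] placeholder with the matching value from data."""
    def substitute(match):
        key = match.group(1)
        # Leaving unknown placeholders untouched is an assumption of this sketch.
        return str(data[key]) if key in data else match.group(0)

    return re.sub(r"\[(\w+)\]", substitute, template)


def evaluate(data):
    """Mimic: if usage_user > 75 then create error alert with message ..."""
    if data["usage_user"] > 75:
        message = render_message(
            "The CPU usage on host [host] is at [usage_user]%", data
        )
        return ("error", message)
    return None  # condition not met, no action


print(evaluate({"host": "vmlinux01", "usage_user": 80}))
# prints ('error', 'The CPU usage on host vmlinux01 is at 80%')
```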

Rules follow a grammar, which is described below.

Rule definition:
Rule definition

A Rule expression refers to data received from a component.

With Rule missing, the rule is triggered if no data is received from a specific component.

General structure:

Grammar:

Grammar

Nested if … then … else:

If then else

Example:

if usage_user > 75 then create error alert with message "The CPU usage on host [host] is at [usage_user]%"
else if usage_user > 50 then create warning alert with message "The CPU usage on host [host] is at [usage_user]%"

Comparators:

Building conditions and chaining them:

Comparators

Numeric operators are: ==, =, >=, >, <=, <, !=

Textual operators are: ==, =
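
The operator lists above can be pictured with a small Python sketch. It assumes that `=` and `==` are synonyms for equality (both are listed, and the text does not distinguish them) and that chaining with `and` behaves like a logical AND. The names `NUMERIC_OPERATORS`, `TEXTUAL_OPERATORS`, and `compare` are hypothetical and not part of Argos.

```python
import operator

# "=" and "==" are listed side by side; this sketch assumes they are synonyms.
NUMERIC_OPERATORS = {
    "==": operator.eq, "=": operator.eq, "!=": operator.ne,
    ">=": operator.ge, ">": operator.gt,
    "<=": operator.le, "<": operator.lt,
}
TEXTUAL_OPERATORS = {"==": operator.eq, "=": operator.eq}


def compare(left, op, right):
    """Apply one comparator, rejecting operators not listed for the value type."""
    table = TEXTUAL_OPERATORS if isinstance(left, str) else NUMERIC_OPERATORS
    if op not in table:
        kind = type(left).__name__
        raise ValueError(f"operator {op!r} is not listed for {kind} values")
    return table[op](left, right)


data = {"usage_user": 60, "host": "vmlinux01"}
# Two conditions chained with "and".
matched = compare(data["usage_user"], ">", 50) and compare(data["host"], "==", "vmlinux01")
print(matched)  # prints True
```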

Example:

if usage_user > 50 and host == "vmlinux01" then create warning alert with message "The CPU usage on host [host] is at [usage_user]%"

Actions:

Simple actions:

Simple actions

Simple actions:

  • create alert

  • set (component) status

  • execute (a script)

Throttling:

  • by time

  • by occurrences

Examples:

if usage_user > 50 then create warning alert with message "The CPU usage on host [host] is at [usage_user]%" every 5 minutes

create alert (always add an else branch that clears the alert!):

… then create warning alert with message "A warning message" else clear alert

set component status:

… then set status to disconnected

execute a script:

… then execute "/opt/myscript.sh"

chaining actions:

… then execute "/opt/myscript.sh" and then set status to disconnected and then create warning alert with message "A warning message"

Throttling:

Simple and complex throttling:

Throttling

Examples:

execute a script only every 5 minutes:

… then execute "/opt/myscript.sh" every 5 minutes
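
The effect of a time-based throttle can be sketched in a few lines of Python. The sketch assumes that "every 5 minutes" means "at most once per 5 minutes while the condition keeps matching", as the wording "only every 5 minutes" suggests. The class `TimeThrottle` is hypothetical and not part of Argos.

```python
class TimeThrottle:
    """Allow an action at most once per interval (time-based throttling)."""

    def __init__(self, interval_seconds):
        self.interval_seconds = interval_seconds
        self.last_run = None  # timestamp of the last allowed execution

    def allow(self, now):
        """Return True if the action may run at time `now` (in seconds)."""
        if self.last_run is None or now - self.last_run >= self.interval_seconds:
            self.last_run = now
            return True
        return False


throttle = TimeThrottle(interval_seconds=5 * 60)
# The rule condition matches at 0 s, 60 s, 299 s, 300 s and 400 s.
print([throttle.allow(t) for t in (0, 60, 299, 300, 400)])
# prints [True, False, False, True, False]
```

In this model the script runs on the first match and then again only once the interval has elapsed, regardless of how often the condition matches in between.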

execute a script only every 5 minutes, if at least 5 events have occurred:

… then execute "/opt/myscript.sh" every 5 minutes if at least 5 events occured

Rule-wide setting:

Rule-wide variable:

Rule-wide variable

Examples:

if usage_user > 75 then create error alert with message "The CPU usage on host [host] is at [usage_user]%"
else if usage_user > 50 then create warning alert with message "The CPU usage on host [host] is at [usage_user]%"
else if usage_user > 25 then create info alert with message "The CPU usage on host [host] is at [usage_user]%"
else clear alert

The same rule with the message defined once as a rule-wide text variable:

set text alert_message to "The CPU usage on host [host] is at [usage_user]%"

if usage_user > 75 then create error alert with message alert_message
else if usage_user > 50 then create warning alert with message alert_message
else if usage_user > 25 then create info alert with message alert_message
else clear alert

Rule-wide throttling:

Rule-wide throttling

Examples:

if usage_user > 75
    then create error alert with message "The CPU usage on host [host] is at [usage_user]%"
    every 5 minutes
else clear alert

The same rule with the throttling and the message defined as rule-wide settings:

set throttling throttle_5 to every 5 minutes
set text alert_message to "The CPU usage on host [host] is at [usage_user]%"

if usage_user > 75
    then create error alert with message alert_message
    using throttling throttle_5
else clear alert