====== monitoring ======
* https://livemap.pingdom.com/
* https://downdetector.ru/
* https://downdetector.com/
* https://sensu.io/ - to check out
===== Monitoring metrics =====
* VictoriaMetrics, Mimir, Thanos - https://habr.com/ru/companies/slurm/news/741948/
====== CheckMK ======
Monitoring software, a successor of Nagios. Start it in Docker: https://gist.github.com/tttest25/eedede9aab0d33e17b3f8623dec25b49 \\
Info ''http://localhost:8080/ cmkadmin -e.................9''
===== CheckMK.links =====
* debugging CheckMK checks - https://www.simon-meggle.de/en/tutorial-debugging-checkmk-checks/
* example of agent spool metrics - https://docs.checkmk.com/latest/en/agent_linux.html#mrpe
* creating a local check in CheckMK - https://www.ctl.io/developers/blog/post/local-check-mk
* piggyback service example - https://forum.checkmk.com/t/help-understanding-piggyback-services/29964
* allow spooling plugin outputs via files - https://checkmk.com/werk/16
* sending email notifications (Docker) - https://docs.checkmk.com/latest/en/managing_docker.html#_sending_notifications
===== CheckMK Information =====
* checkmk - uvicorn application
* WATO - the GUI configuration interface; its Python code is in ''~/lib/python3/cmk/gui''
* OMD (Open Monitoring Distribution): the ''omd'' console utility manages sites, e.g. ''OMD[cmk]:~$ omd status''
# console utility, run as root
$ omd status
$ omd start
$ omd stop
===== CheckMK.Monitoring Server =====
==== Livestatus ====
Get the current Unix time (for the ''time'' filters below): ''date +%s''
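The ''time'' filters in the Livestatus queries below expect Unix timestamps (seconds since 1970-01-01 00:00 UTC). A minimal Python sketch for converting a date-time; the helper name ''to_unixtime'' is hypothetical, and a date-time without a time zone is interpreted in the local time zone of the machine running it:

```python
from datetime import datetime, timezone


def to_unixtime(dt: datetime) -> int:
    """Return whole seconds since the Unix epoch.

    A naive datetime (no tzinfo) is treated as local time,
    which is what datetime.timestamp() does.
    """
    return int(dt.timestamp())


# Current time, comparable to `date +%s` in a shell
now = to_unixtime(datetime.now(timezone.utc))

# Explicit UTC date-time: 2022-08-22 00:00:00 UTC gives 1661126400
start_of_day = to_unixtime(datetime(2022, 8, 22, tzinfo=timezone.utc))
```

Passing an explicit ''tzinfo'' avoids surprises when the script runs on a server with a different time zone than the one you think in.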
* Livestatus: list the time-related columns of the ''statehist'' table with their descriptions
''OMD[cmk]:~$ lq "GET columns\nOutputFormat: json"| jq -r '.[]| select(.[2]=="statehist")|join("|")' | grep time''
* get the state history entries of a service in a time window (host and service columns only)
''OMD[cmk]:~$ lq "GET statehist\nColumns: host_name service_description\nFilter: time >= 1661123454\nFilter: time < 1661127106\nFilter: host_name = docker01_host\nFilter: service_description = Memory\nOutputFormat: json"''
* get the state history of a service in a time window, with ''state'', ''from'' and ''until''
''OMD[cmk]:~$ lq "GET statehist\nColumns: host_name service_description state from until\nFilter: time >= 1661123454\nFilter: time < 1661127106\nFilter: host_name = docker01_host\nFilter: service_description = Memory\nOutputFormat: json"''
* get data for an access matrix (all services of the host)
''OMD[cmk]:~$ lq "GET statehist\nColumns: host_name service_description state from until\nFilter: time >= 1661123454\nFilter: time < 1661127106\nFilter: host_name = docker01_host\nOutputFormat: json"''
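The same queries can be sent from a script. A minimal Python sketch, assuming the site is named ''cmk'' so that its Livestatus UNIX socket is ''/omd/sites/cmk/tmp/run/live''; the function names are hypothetical, and ''seconds_per_state'' is only one possible way to aggregate ''statehist'' rows into an access matrix:

```python
import json
import socket


def statehist_query(host, start, end, service=None,
                    columns=("host_name", "service_description",
                             "state", "from", "until")):
    """Build a GET statehist query like the lq examples above."""
    lines = [
        "GET statehist",
        "Columns: " + " ".join(columns),
        f"Filter: time >= {start}",
        f"Filter: time < {end}",
        f"Filter: host_name = {host}",
    ]
    if service is not None:
        lines.append(f"Filter: service_description = {service}")
    lines.append("OutputFormat: json")
    # An empty line terminates a Livestatus request
    return "\n".join(lines) + "\n\n"


def livestatus_query(query, path="/omd/sites/cmk/tmp/run/live"):
    """Send a query to the site's Livestatus socket and parse the JSON reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        s.sendall(query.encode())
        s.shutdown(socket.SHUT_WR)  # nothing more to send
        chunks = []
        while chunk := s.recv(4096):  # read until Livestatus closes
            chunks.append(chunk)
    return json.loads(b"".join(chunks))


def seconds_per_state(rows):
    """Sum (until - from) per (host, service, state) over statehist rows."""
    totals = {}
    for host, service, state, start, until in rows:
        key = (host, service, state)
        totals[key] = totals.get(key, 0) + (until - start)
    return totals
```

Usage, as the site user: ''seconds_per_state(livestatus_query(statehist_query("docker01_host", 1661123454, 1661127106)))''. With an explicit ''Columns:'' header Livestatus returns rows without a header row, so each row unpacks directly into the five columns.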
==== CheckMK.commands ====
# get the password (secret) of the automation user
# on the host
$ cd /omd/sites/cmk
$ cat var/check_mk/web/automation/automation.secret
09c823XX-e4b0-4e0d-aeXX-53946ccdcfc8
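The automation secret is what API clients use to log in as ''automation''. A minimal Python sketch, assuming the site path above and the REST API bearer format ''Authorization: Bearer USER SECRET''; both function names are hypothetical:

```python
from pathlib import Path


def read_automation_secret(site_root="/omd/sites/cmk"):
    """Read the automation user's secret (same file as the cat above)."""
    path = Path(site_root) / "var/check_mk/web/automation/automation.secret"
    return path.read_text().strip()


def rest_api_headers(user, secret):
    """HTTP headers for the REST API: "Bearer <user> <secret>" (assumed format)."""
    return {
        "Authorization": f"Bearer {user} {secret}",
        "Accept": "application/json",
    }
```

For example, pass ''rest_api_headers("automation", read_automation_secret())'' to ''urllib.request.Request'' for a URL such as ''http://localhost:8080/cmk/check_mk/api/1.0/version''; that URL is an assumption based on the Docker setup with site ''cmk'' on port 8080.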
* switch to the site user ''cmk''; ''cmk'' is the CheckMK command-line utility
su -s /bin/bash - cmk
* show all modules (check plugins)
cmk -L # all modules
* run service discovery on the agent host ''cmk_docker''
cmk -vv --debug -I cmk_docker
* get the raw agent data
cmk -vv -d cmk_docker
* debug the checks of the host (''-n'': do not submit results)
cmk -nvv --debug cmk_docker
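''cmk -d'' prints the raw agent data as plain text split into sections by ''<<<name>>>'' header lines, with ''<<<<host>>>>'' lines marking piggyback data for other hosts. A minimal Python sketch for splitting such a dump (for example saved with ''cmk -d cmk_docker > dump.txt''); the function name is hypothetical and only these header forms are handled:

```python
import re

# "<<<name>>>" or "<<<name:options>>>" starts a section
SECTION_RE = re.compile(r"^<<<([^<>]+)>>>$")
# "<<<<host>>>>" starts piggyback data, "<<<<>>>>" ends it
PIGGY_RE = re.compile(r"^<<<<([^<>]*)>>>>$")


def parse_agent_output(text):
    """Split raw agent output into {(piggyback_host, section): [lines]}.

    piggyback_host is None for data about the queried host itself.
    """
    sections = {}
    host = None
    current = None
    for line in text.splitlines():
        m = PIGGY_RE.match(line)
        if m:
            host = m.group(1) or None  # empty name returns to own host
            current = None
            continue
        m = SECTION_RE.match(line)
        if m:
            name = m.group(1).split(":", 1)[0]  # drop options such as ":sep(0)"
            current = sections.setdefault((host, name), [])
            continue
        if current is not None:
            current.append(line)
    return sections
```

Lines that appear before the first section header are ignored, and a section that occurs twice for the same host is merged into one list.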
===== CheckMK.Monitoring Agent =====
* put external metrics into the spool directory - https://checkmk.com/werk/16
==== CMK agent: registration example ====
* register the agent with the CheckMK site: ''sudo cmk-agent-ctl register -v -H wiki -s 10.59.0.64:8008 -i cmk -U automation -P 16d25a2c-96ea-4222-a238-87a09e63cbec''
==== Test agent ====
# switch to the cmk-agent user
$ su -s /bin/bash - cmk-agent
# test the agent over the network (agent port 6556)
$ telnet foohost 6556
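The ''telnet'' test can also be scripted. A minimal Python sketch, assuming the agent answers on port 6556 in plain text as it does for ''telnet''; an agent registered with ''cmk-agent-ctl'' is expected to switch to TLS, so this plain read may not show readable data. The function names are hypothetical:

```python
import socket


def read_all(sock, bufsize=4096):
    """Read from a socket until the peer closes the connection."""
    chunks = []
    while chunk := sock.recv(bufsize):
        chunks.append(chunk)
    return b"".join(chunks)


def fetch_agent_output(host, port=6556, timeout=10.0):
    """Connect to the agent port and return everything the agent sends."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        return read_all(s).decode("utf-8", errors="replace")
```

Usage: ''print(fetch_agent_output("foohost")[:500])''. The agent closes the connection after sending its output, which is why reading until EOF is enough.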
==== Agent spool example / piggyback ====
# switch to the cmk-agent user
$ su - cmk-agent -s /bin/bash
# find the agent's spool directory
$ /usr/bin/check_mk_agent |head | grep spool
# working example
OMD[cmk]:~$ cat /var/lib/check_mk_agent/spool/599_foobar.txt
<<<local>>>
0 omega_record_state - record_state=ON
<<<<>>>>
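A spool file can be produced by any script. A minimal Python sketch that writes one atomically (temporary file plus rename), so the agent never reads a half-written file; the numeric prefix (''599_'' above) is the maximum age in seconds after which the agent ignores the file. The helper name is hypothetical and the directory is the one found with ''grep spool'':

```python
import os
import tempfile
from pathlib import Path

SPOOL_DIR = "/var/lib/check_mk_agent/spool"


def write_spool(name, lines, max_age=None, spool_dir=SPOOL_DIR):
    """Atomically write a spool file and return its path.

    max_age=599 produces "599_<name>", which the agent ignores once
    the file is older than 599 seconds.
    """
    filename = f"{max_age}_{name}" if max_age is not None else name
    target = Path(spool_dir) / filename
    # Temporary file in the same directory (same filesystem, so the
    # rename is atomic); the leading dot assumes the agent skips
    # hidden files, as a shell glob does
    fd, tmp = tempfile.mkstemp(dir=spool_dir, prefix=".tmp_")
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(lines) + "\n")
    os.replace(tmp, target)
    return target
```

Usage: ''write_spool("foobar.txt", ["<<<local>>>", "0 omega_record_state - record_state=ON"], max_age=599)''.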
# piggyback example: data for another host (HOSTNAME is a placeholder)
<<<<HOSTNAME>>>>
<<<local>>>
0 omega_record_state - record_state=ON
<<<<>>>>
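The piggyback wrapper is plain text. A minimal Python sketch that builds such a block; the function name is hypothetical, and the host name must match a host configured in CheckMK for the data to be assigned to it:

```python
def piggyback_block(host, section_lines, section="local"):
    """Wrap section lines so the CheckMK server assigns them to `host`.

    <<<<host>>>> starts piggyback data, <<<<>>>> returns to the own host.
    """
    lines = [f"<<<<{host}>>>>", f"<<<{section}>>>", *section_lines, "<<<<>>>>"]
    return "\n".join(lines) + "\n"
```

Usage: ''piggyback_block("foohost", ["0 omega_record_state - record_state=ON"])''. Closing with ''<<<<>>>>'' keeps any sections written after the block attached to the agent's own host.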
==== Migrating Nagios metrics to piggyback ====
---------------------------
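# Python: split Nagios plugin output (pd) into text and perfdata
import re

def split_perfdata(pd):
    b = pd.split('|', 1)                 # b[0] = text, b[1] = perfdata
    if len(b) > 1:
        b[0] = b[0].strip()
        b[1] = re.sub(r'\s+', '|', b[1].strip())  # metrics joined by '|'
    else:
        b[0] = pd.strip()
        b.append('-')                    # '-' = no perfdata
    return b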
----------------------------
# Nagios passive check result (sample input)
#f:fed-hw;
#n:SHERB-24-ATS_10.59.170.2;
#ec:0;
#pd:PING OK - Packet loss = 0%, RTA = 1.72 ms|rta=1.722000ms;100.000000;500.000000;0.000000 pl=0%;40;60;0
# Nagios metric example converted to piggyback (HOSTNAME is a placeholder)
<<<<HOSTNAME>>>>
<<<local>>>
0 omega_record_state - record_state=ON - 07:29 22.08.2022
<<<local>>>
0 "test Call-center-router_188.17.152.47" rta=24.062000ms;200.000000;500.000000;0.000000|pl=20%;40;60;0 PING OK - Packet loss = 0%, RTA = 8.06 ms
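The conversion can be wrapped into two small functions. A minimal Python sketch with hypothetical names; the field meanings (''n'' = name, ''ec'' = exit code, ''pd'' = plugin output) are assumptions read off the sample, ''f'' is kept but unused, and using ''n'' as the service name is also an assumption (the example line above uses a different name). Nagios exit codes 0-3 map directly to the local-check states OK/WARN/CRIT/UNKNOWN:

```python
import re


def parse_passive(lines):
    """Parse '#key:value;' lines of a Nagios passive result into a dict."""
    rec = {}
    for line in lines:
        key, _, value = line.lstrip("#").partition(":")
        value = value.strip()
        if key != "pd":
            # Assumption from the sample: only the short fields end with ';'
            value = value.removesuffix(";")
        rec[key] = value
    return rec


def to_local_line(status, service, plugin_output):
    """Build a local-check line: <status> "<service>" <perfdata> <text>."""
    parts = plugin_output.split("|", 1)
    text = parts[0].strip()
    if len(parts) > 1 and parts[1].strip():
        perf = re.sub(r"\s+", "|", parts[1].strip())  # metrics joined by '|'
    else:
        perf = "-"  # no performance data
    return f'{status} "{service}" {perf} {text}'
```

Usage: ''rec = parse_passive(lines)'' followed by ''to_local_line(rec["ec"], rec["n"], rec["pd"])''; the resulting line goes under a ''<<<local>>>'' header inside the piggyback block of the target host. ''str.removesuffix'' needs Python 3.9 or later.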