monitoring
https://sensu.io/ - to look at
Monitoring metrics
- Victoria Metrics, Mimir, Thanos https://habr.com/ru/companies/slurm/news/741948/
CheckMK
Monitoring software, a successor to Nagios - start in Docker: https://gist.github.com/tttest25/eedede9aab0d33e17b3f8623dec25b49
Info ''http://localhost:8080/ cmkadmin -e.................9''
CheckMK.links
- debugging Checkmk checks - https://www.simon-meggle.de/en/tutorial-debugging-checkmk-checks/
- example of agent spool metrics - https://docs.checkmk.com/latest/en/agent_linux.html#mrpe
- create local check in checkmk - https://www.ctl.io/developers/blog/post/local-check-mk
- piggyback service example - https://forum.checkmk.com/t/help-understanding-piggyback-services/29964
- allow spooling plugin outputs via files - https://checkmk.com/werk/16
CheckMK Information
- checkmk - uvicorn application
- WATO - GUI, Python sources located in '~/lib/python3/cmk/gui'
- Open Monitoring Distribution (OMD) - check the site with cmk$ omd status
# console utility for root
$ omd status
$ omd start
$ omd stop
CheckMk.Monitoring Server
LiveStatus
Get unixtime - <https://www.unixtimestamp.com/>
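The unix timestamps used in the Livestatus filters can also be computed locally instead of on the website. A minimal Python sketch, assuming all times are UTC; the helper names `to_unix` and `from_unix` are made up for these notes.

```python
from datetime import datetime, timezone


def to_unix(year, month, day, hour=0, minute=0, second=0):
    """Return the unix timestamp (whole seconds) for a UTC wall-clock time."""
    dt = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return int(dt.timestamp())


def from_unix(ts):
    """Return an ISO 8601 string in UTC for a unix timestamp."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    # Start and end of the period used in the statehist queries.
    print(to_unix(2022, 8, 21, 23, 10, 54))  # 1661123454
    print(from_unix(1661127106))             # 2022-08-22T00:11:46+00:00
```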
- Livestatus: list the statehist columns and their descriptions
OMD[cmk]:~$ lq "GET columns\nOutputFormat: json" | jq -r '.[] | select(.[2]=="statehist") | join("|")' | grep time
- get current state history
OMD[cmk]:~$ lq "GET statehist\nColumns: host_name service_description\nFilter: time >= 1661123454\nFilter: time < 1661127106\nFilter: host_name = docker01_host\nFilter: service_description = Memory\nOutputFormat: json"
- get the state history of a service over a period
OMD[cmk]:~$ lq "GET statehist\nColumns: host_name service_description state from until\nFilter: time >= 1661123454\nFilter: time < 1661127106\nFilter: host_name = docker01_host\nFilter: service_description = Memory\nOutputFormat: json"
- get data for access matrix
lq "GET statehist\nColumns: host_name service_description state from until\nFilter: time >= 1661123454\nFilter: time < 1661127106\nFilter: host_name = docker01_host\nOutputFormat: json"
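The same statehist queries can be assembled and sent from Python instead of through `lq`. This is only a sketch: `build_query` and `run_query` are hypothetical helpers, and the socket path in the comment assumes a site named `cmk` with the Livestatus socket in its default OMD location.

```python
import json
import socket


def build_query(table, columns, filters, output_format="json"):
    """Build a Livestatus query string; each header goes on its own line."""
    lines = [f"GET {table}"]
    if columns:
        lines.append("Columns: " + " ".join(columns))
    lines.extend(f"Filter: {flt}" for flt in filters)
    lines.append(f"OutputFormat: {output_format}")
    # An empty line terminates the query.
    return "\n".join(lines) + "\n\n"


def run_query(query, socket_path):
    """Send the query to the Livestatus unix socket and decode the JSON reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # tell the server the request is complete
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


if __name__ == "__main__":
    query = build_query(
        "statehist",
        ["host_name", "service_description", "state", "from", "until"],
        [
            "time >= 1661123454",
            "time < 1661127106",
            "host_name = docker01_host",
            "service_description = Memory",
        ],
    )
    print(query)
    # Assumed socket path, run as the site user:
    # rows = run_query(query, "/omd/sites/cmk/tmp/run/live")
```

Dropping the `service_description` filter from the list gives the per-host query used for the access matrix.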
CheckMK.commands
# -- get the password of the automation user
# on host
$ cd /omd/sites/cmk
$ cat var/check_mk/web/automation/automation.secret
09c823XX-e4b0-4e0d-aeXX-53946ccdcfc8
# switch to the cmk site user; cmk is the Checkmk command-line utility
$ su -s /bin/bash - cmk
# show all modules
$ cmk -L
# run discovery on the agent
$ cmk -vv --debug -I cmk_docker
# get raw data
$ cmk -vv -d cmk_docker
# debug the agent check process
$ cmk -nvv --debug cmk_docker
CheckMk.Monitoring Agent
- put external metrics to spool https://checkmk.com/werk/16
CMD-agent example register agent
- register agent to cmk
sudo cmk-agent-ctl register -v -H wiki -s 10.59.0.64:8008 -i cmk -U automation -P 16d25a2c-96ea-4222-a238-87a09e63cbec
Test agent
# cmk-agent
$ su -s /bin/bash
# test agent on host - network
$ telnet foohost 6556
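The telnet test can be scripted as well. A rough Python sketch, assuming the agent still answers in plain text on port 6556 (an agent registered for TLS will not return readable output this way); `fetch_agent_output` and `section_names` are hypothetical helpers, and `foohost` is the placeholder host from the note above.

```python
import re
import socket

# Matches a normal section header such as <<<local>>> or <<<local:sep(0)>>>,
# but not piggyback markers such as <<<<hostB>>>> or <<<<>>>>.
SECTION_RE = re.compile(r"^<<<([^<>]+)>>>$")


def fetch_agent_output(host, port=6556, timeout=5.0):
    """Read everything the agent sends until it closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def section_names(agent_output):
    """Return the section names found in the agent output, in order."""
    names = []
    for line in agent_output.splitlines():
        match = SECTION_RE.match(line.strip())
        if match:
            # Drop options such as ":sep(0)" after the section name.
            names.append(match.group(1).split(":")[0])
    return names


if __name__ == "__main__":
    # Made-up sample; replace with fetch_agent_output("foohost") on a real network.
    sample = "<<<check_mk>>>\nVersion: x\n<<<local:sep(0)>>>\n0 demo - ok\n"
    print(section_names(sample))
```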
Agent spool example/ piggyback
# switch to the cmk-agent user
$ su - cmk-agent -s /bin/bash
# find the path of the agent spool directory
$ /usr/bin/check_mk_agent | head | grep spool
# working variant
OMD[cmk]:~$ cat /var/lib/check_mk_agent/spool/599_foobar.txt
<<<local>>>
0 omega_record_state - record_state=ON
<<<<>>>>
# piggyback example - for another host
<<<<hostB>>>>
<<<local>>>
0 omega_record_state - record_state=ON
<<<<>>>>
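Spool files like the one above can be generated by a script. A small Python sketch of the text layout only; `local_line` and `spool_content` are hypothetical helpers, and writing the result to the spool directory is left to the caller. As far as I understand the Checkmk docs, a numeric file name prefix such as `599_` is the maximum age of the file in seconds.

```python
def local_line(state, service, text, metrics="-"):
    """Format one local check line: <state> <service> <metrics> <text>."""
    # Service names containing spaces have to be quoted.
    name = f'"{service}"' if " " in service else service
    return f"{state} {name} {metrics} {text}"


def spool_content(lines, piggyback_host=None):
    """Wrap local check lines in a <<<local>>> section.

    With piggyback_host set, the section is enclosed in
    <<<<host>>>> ... <<<<>>>> so it is assigned to that host.
    """
    body = ["<<<local>>>", *lines]
    if piggyback_host:
        body = [f"<<<<{piggyback_host}>>>>", *body, "<<<<>>>>"]
    return "\n".join(body) + "\n"


if __name__ == "__main__":
    line = local_line(0, "omega_record_state", "record_state=ON")
    print(spool_content([line]), end="")
    print(spool_content([line], piggyback_host="hostB"), end="")
```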
Nagios migration metrics to piggyback
---------------------------
# Python: migrate Nagios performance data
# pd holds the Nagios plugin output in the form "text|perfdata"
import re
b = pd.split('|')
if len(b) > 1:
    # perfdata present: separate the metrics with '|' instead of whitespace
    b[1] = re.sub(r'\s+', '|', b[1].strip())
else:
    # no perfdata: keep the text and use '-' as the metrics placeholder
    b[0] = pd.strip()
    b.append('-')
----------------------------
# Nagios passive
#f:fed-hw; #n:SHERB-24-ATS_10.59.170.2; #ec:0; #pd:PING OK - Packet loss = 0%, RTA = 1.72 ms|rta=1.722000ms;100.000000;500.000000;0.000000 pl=0%;40;60;0
# Nagios metric example converted to piggyback
<<<<fed_serv>>>>
<<<local>>>
0 omega_record_state - record_state=ON - 07:29 22.08.2022
<<<local>>>
0 "test Call-center-router_188.17.152.47" rta=24.062000ms;200.000000;500.000000;0.000000|pl=20%;40;60;0 PING OK - Packet loss = 0%, RTA = 8.06 ms
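The conversion above can be wrapped into a function that turns one Nagios passive result into a local check line. This is a sketch following the format of the example output in these notes; `split_plugin_output` and `to_local_check` are hypothetical names, and perfdata labels that contain spaces are not handled.

```python
import re


def split_plugin_output(pd):
    """Split Nagios plugin output "text|perfdata" into (text, metrics)."""
    text, _, perf = pd.partition("|")
    perf = perf.strip()
    # Nagios separates metrics with whitespace, the local check format with "|".
    metrics = re.sub(r"\s+", "|", perf) if perf else "-"
    return text.strip(), metrics


def to_local_check(state, service, pd):
    """Build a local check line: <state> "<service>" <metrics> <text>."""
    text, metrics = split_plugin_output(pd)
    return f'{state} "{service}" {metrics} {text}'


if __name__ == "__main__":
    pd = ("PING OK - Packet loss = 0%, RTA = 1.72 ms"
          "|rta=1.722000ms;100.000000;500.000000;0.000000 pl=0%;40;60;0")
    print("<<<<fed_serv>>>>")
    print("<<<local>>>")
    print(to_local_check(0, "SHERB-24-ATS_10.59.170.2", pd))
    print("<<<<>>>>")
```

Here the state is taken from the `#ec` field and the service name from the `#n` field of the passive result.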