====== Monitoring ======
  * https://livemap.pingdom.com/
  * https://downdetector.ru/
  * https://downdetector.com/
  * https://sensu.io/ - to look into

===== Monitoring metrics =====
  * Victoria Metrics, Mimir, Thanos: https://habr.com/ru/companies/slurm/news/741948/

====== CheckMK ======
Monitoring software, a successor of Nagios. Running it in Docker: https://gist.github.com/tttest25/eedede9aab0d33e17b3f8623dec25b49 \\
Info: ''http://localhost:8080/ cmkadmin -e.................9''

===== CheckMK.links =====
  * Debugging Checkmk checks - https://www.simon-meggle.de/en/tutorial-debugging-checkmk-checks/
  * Example of agent spool metrics - https://docs.checkmk.com/latest/en/agent_linux.html#mrpe
  * Creating a local check in Checkmk - https://www.ctl.io/developers/blog/post/local-check-mk
  * Piggyback service example - https://forum.checkmk.com/t/help-understanding-piggyback-services/29964
  * Allow spooling plugin outputs via files - https://checkmk.com/werk/16
  * Sending emails - https://docs.checkmk.com/latest/en/managing_docker.html#_sending_notifications

===== CheckMK Information =====
  * Checkmk - uvicorn application
  * WATO - the GUI; its Python code is located in ''~/lib/python3/cmk/gui''
  * Open Monitoring Distribution → OMD → site ''cmk''

<code bash>
# omd - console utility (as root)
$ omd status
$ omd start
$ omd stop
</code>

===== CheckMK.Monitoring Server =====
==== LiveStatus ====
Get unixtime -
  * LiveStatus: get the columns of the ''statehist'' table with their descriptions ''OMD[cmk]:~$ lq "GET columns\nOutputFormat: json"| jq -r '.[]| select(.[2]=="statehist")|join("|")' | grep time''
  * Get the state history entries of a service in a time range ''OMD[cmk]:~$ lq "GET statehist\nColumns: host_name service_description\nFilter: time >= 1661123454\nFilter: time < 1661127106\nFilter: host_name = docker01_host\nFilter: service_description = Memory\nOutputFormat: json"''
  * Get the state history of a service in a period (state, from, until) ''OMD[cmk]:~$ lq "GET statehist\nColumns: host_name service_description state from until\nFilter: time >= 1661123454\nFilter: time < 1661127106\nFilter: host_name = docker01_host\nFilter: service_description = Memory\nOutputFormat: json"''
  * Get data for the access matrix (all services of the host) ''lq "GET statehist\nColumns: host_name service_description state from until\nFilter: time >= 1661123454\nFilter: time < 1661127106\nFilter: host_name = docker01_host\nOutputFormat: json"''

==== CheckMK.commands ====
<code bash>
# -- get the password (secret) of the "automation" user
# on the host
$ cd /omd/sites/cmk
$ cat var/check_mk/web/automation/automation.secret
09c823XX-e4b0-4e0d-aeXX-53946ccdcfc8
</code>
  * Switch to the ''cmk'' site user (''cmk'' is the Checkmk command-line utility): ''su -s /bin/bash - cmk''
  * Show all modules: ''cmk -L''
  * Run service discovery on the agent host: ''cmk -vv --debug -I cmk_docker''
  * Get the raw agent data: ''cmk -vv -d cmk_docker''
  * Debug the check process for the host (''-n'': do not submit results): ''cmk -nvv --debug cmk_docker''

===== CheckMK.Monitoring Agent =====
  * Put external metrics into the spool directory - https://checkmk.com/werk/16

==== CMK-agent: register agent example ====
  * Register the agent with Checkmk: ''sudo cmk-agent-ctl register -v -H wiki -s 10.59.0.64:8008 -i cmk -U automation -P 16d25a2c-96ea-4222-a238-87a09e63cbec''

==== Test agent ====
<code bash>
# switch to the cmk-agent user
$ su -s /bin/bash - cmk-agent
# test the agent on the host over the network (agent port 6556)
$ telnet foohost 6556
</code>

==== Agent spool example / piggyback ====
<code>
# switch to the cmk-agent user
$ su - cmk-agent -s /bin/bash
# find the spool path of the agent
$ /usr/bin/check_mk_agent | head | grep spool
# working variant
OMD[cmk]:~$ cat /var/lib/check_mk_agent/spool/599_foobar.txt
<<<local>>>
0 omega_record_state - record_state=ON
<<<<>>>>
# piggyback example - for another host (HOSTNAME = name of that host)
<<<<HOSTNAME>>>>
<<<local>>>
0 omega_record_state - record_state=ON
<<<<>>>>
</code>

==== Migrating Nagios metrics to piggyback ====
Python: converting the performance data of a Nagios result:
<code python>
import re
# pd: plugin output of a Nagios passive result (the "#pd:" field below)
pd = "PING OK - Packet loss = 0%, RTA = 1.72 ms|rta=1.722000ms;100.000000;500.000000;0.000000 pl=0%;40;60;0"
b = pd.split('|')  # b[0]: status text, b[1]: perfdata (if any)
if len(b) > 1:
    # Nagios separates metrics with spaces, Checkmk local checks with '|'
    b[1] = re.sub(r'\s+', '|', b[1].strip())
else:
    b[0] = pd.strip()
    b.append('-')  # '-' = no metrics
</code>
Nagios passive check result:
<code>
#f:fed-hw; #n:SHERB-24-ATS_10.59.170.2; #ec:0; #pd:PING OK - Packet loss = 0%, RTA = 1.72 ms|rta=1.722000ms;100.000000;500.000000;0.000000 pl=0%;40;60;0
</code>
Nagios metric example converted to piggyback:
<code>
<<<<HOSTNAME>>>>
<<<local>>>
0 omega_record_state - record_state=ON - 07:29 22.08.2022
<<<local>>>
0 "test Call-center-router_188.17.152.47" rta=24.062000ms;200.000000;500.000000;0.000000|pl=20%;40;60;0 PING OK - Packet loss = 0%, RTA = 8.06 ms
</code>
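The perfdata conversion and the piggyback layout above can be combined into one script. This is a minimal sketch, assuming every passive result line has the ''#f:...; #n:...; #ec:...; #pd:...'' shape shown above, that ''#n'' becomes the quoted service name and ''#ec'' the state (as in the converted example), and that ''#f'' is not needed. The function names (''parse_passive'', ''to_local_line'', ''to_piggyback'', ''write_spool'') and the ''host'' argument are hypothetical. ''write_spool'' writes a dot-prefixed temporary file in the spool directory and renames it with ''os.replace'', so the agent never reads a half-written file; this assumes the agent skips hidden files for the brief moment the temporary file exists.

```python
import os
import re
import tempfile
from typing import Iterable

# Hypothetical helpers (names are illustrative, not a Checkmk API).
# Assumed input shape, taken from the Nagios passive example:
#   #f:<...>; #n:<service>; #ec:<state>; #pd:<text>|<perfdata>
# A value ends only at the next "; #key:", so the ';' inside perfdata
# such as "rta=1.722000ms;100.000000" stays part of the value.
FIELD_RE = re.compile(r"#(\w+):(.*?)(?=;\s*#\w+:|$)")


def parse_passive(line: str) -> dict:
    """Split one passive result line into a dict like {'n': ..., 'ec': ..., 'pd': ...}."""
    return {key: value.strip() for key, value in FIELD_RE.findall(line.strip())}


def to_local_line(fields: dict) -> str:
    """Build a local-check line: <state> "<service>" <metrics> <text>."""
    text, _, perf = fields["pd"].partition("|")
    # Nagios separates metrics with spaces, local checks with '|'; '-' means none.
    metrics = re.sub(r"\s+", "|", perf.strip()) if perf.strip() else "-"
    return f'{int(fields["ec"])} "{fields["n"]}" {metrics} {text.strip()}'


def to_piggyback(lines: Iterable[str], host: str) -> str:
    """Wrap the converted lines in a piggyback section for `host`."""
    body = "\n".join(to_local_line(parse_passive(line)) for line in lines)
    return f"<<<<{host}>>>>\n<<<local>>>\n{body}\n<<<<>>>>\n"


def write_spool(spool_dir: str, name: str, content: str) -> str:
    """Write to a hidden temp file, then rename it into place atomically."""
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, prefix=".tmp_")
    with os.fdopen(fd, "w") as fh:
        fh.write(content)
    target = os.path.join(spool_dir, name)
    os.replace(tmp_path, target)
    return target


if __name__ == "__main__":
    sample = ("#f:fed-hw; #n:SHERB-24-ATS_10.59.170.2; #ec:0; "
              "#pd:PING OK - Packet loss = 0%, RTA = 1.72 ms|"
              "rta=1.722000ms;100.000000;500.000000;0.000000 pl=0%;40;60;0")
    print(to_piggyback([sample], "HOSTNAME"), end="")
```

On the agent host the result could then be delivered with e.g. ''write_spool("/var/lib/check_mk_agent/spool", "599_nagios.txt", to_piggyback(lines, "HOSTNAME"))'' run as root; ''HOSTNAME'' must match the name of the piggybacked host in Checkmk.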