Monitoring LSI RAID Cards with Zabbix 4

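Linux side

Prerequisites: a working Zabbix platform, with MegaCli installed at /opt/MegaRAID/MegaCli/MegaCli64

Environment: CentOS 6.10

Zabbix 4

Features

  • Automatic disk discovery (a newly attached disk is added to monitoring automatically)
  • Monitor media errors (physical bad sectors) on each disk
  • Monitor other errors (logical bad sectors) on each disk
  • Monitor predictive failures on each disk (the most important indicator Dell servers use to decide whether a disk is failing)
  • Monitor the state of each disk
  • Monitor the RAID level state and alert as soon as an array is degraded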

Thresholds

  • Media Error Count on Every Disk <=30
  • Other Error Count on Every Disk <=1000
  • Predictive Failure Count On Every Disk <=2
  • Firmware State on Every Disk !=Unconfigured(bad),Failed
  • Raid Level State != Degraded
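
In this setup the limits are enforced by Zabbix triggers, but writing them out in shell makes the comparisons explicit. This is only a sketch: `disk_alerts` and the labels it prints are hypothetical names, while the numeric limits and the two firmware states are taken from the list above.

```shell
#!/bin/bash
# disk_alerts MEC OEC PFC STATE: print one label per violated threshold.
# Hypothetical helper for illustration; not part of the monitoring script.
disk_alerts() {
    mec=$1 oec=$2 pfc=$3 state=$4
    [ "$mec" -gt 30 ]   && echo "media_error"
    [ "$oec" -gt 1000 ] && echo "other_error"
    [ "$pfc" -gt 2 ]    && echo "predictive_failure"
    # Pattern match so that surrounding whitespace in STATE does not matter.
    case "$state" in
        *"Unconfigured(bad)"*|*Failed*) echo "firmware_state" ;;
    esac
    return 0
}

# A healthy disk prints nothing; a bad one prints every violated check.
disk_alerts 0 0 0 "Online, Spun Up"
disk_alerts 31 0 3 "Failed"
```

Values equal to a limit (for example a Media Error Count of exactly 30) are still treated as normal, matching the `<=` in the list.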

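Create the data-collection script

For safety and convenience, create a zabbix directory under /home to hold the cache files, and make the zabbix user its owner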

mkdir /home/zabbix
chown zabbix:zabbix /home/zabbix
mkdir /home/zabbix/tmp
chown zabbix:zabbix /home/zabbix/tmp

The script is placed at /home/zabbix/diskcheck_megacli.sh here; any location you prefer will do

#!/bin/bash
#Zabbix disk information monitoring script
#By xiangjunyu 20151101


TEMP_DIR="/home/zabbix"



#Collect disk information
sudo /opt/MegaRAID/MegaCli/MegaCli64 -Pdlist -a0|grep -Ei '(Slot Number|Media Error Count|Other Error Count|Predictive Failure Count|Raw Size|Firmware state)'|sed -e "s:\[0x.*Sectors\]::g" >${TEMP_DIR}/tmp/pdinfo.txt

#Split the output into one file per disk so each disk can be analyzed separately
split -l 6 -d ${TEMP_DIR}/tmp/pdinfo.txt ${TEMP_DIR}/tmp/pdinfo

#Get the number of physical disks (the split files are numbered 0 to PDNUM-1)
PDNUM=`sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDGetNum -aAll|grep Physical|awk '{ print $8 }'`

#Normalize the names of the files produced by split
for((i=0;i<${PDNUM};i++))
do
mv ${TEMP_DIR}/tmp/pdinfo0${i} ${TEMP_DIR}/tmp/pdinfo${i} >/dev/null 2>&1
#ls /tmp/pdinfo${i}
done
SLOT_NUM=$2
DATAFORMATE()
{
while read LINE
    do
        if [[ ${LINE} == Slot* ]];
            then
            SLOTNUMNAME=`echo ${LINE}|awk -F: '{ print $1 }'`
            SLOTNUM=`echo ${LINE}|awk -F: '{ print $2 }'`
        elif [[ ${LINE} == Media* ]];
            then
            MECNAME=`echo ${LINE}|awk -F: '{ print $1 }'`
            MEC=`echo ${LINE}|awk -F: '{ print $2 }'`
        elif  [[ ${LINE} == Other* ]];
            then
            OECNAME=`echo ${LINE}|awk -F: '{ print $1 }'`
            OEC=`echo ${LINE}|awk -F: '{ print $2 }'`
        elif  [[ ${LINE} == Predictive* ]];
            then
            PFCNAME=`echo ${LINE}|awk -F: '{ print $1 }'`
            PFC=`echo ${LINE}|awk -F: '{ print $2 }'`
        elif [[ ${LINE} == Raw* ]];
            then
            RAWNAME=`echo ${LINE}|awk -F: '{ print $1 }'`
            SIZE=`echo ${LINE}|awk -F: '{ print $2 }'`
        elif [[ ${LINE} == Firmware* ]];
            then
            FIRMWARENAME=`echo ${LINE}|awk -F: '{ print $1 }'`
            FIRMWARESTATUS=`echo ${LINE}|awk -F: '{ print $2 }'`
        fi
    done <${TEMP_DIR}/tmp/pdinfo${SLOT_NUM}
}

#Check the RAID level state
CHECKRAIDLEVEL()
{
#-q keeps grep from printing the matching line, so only -1 or 0 reaches Zabbix
sudo /opt/MegaRAID/MegaCli/MegaCli64  -LDInfo -Lall -aALL|grep -q Degraded
if [ $? = 0 ]
then
echo -1
else
echo 0
fi
}
OPTION=$1
case $OPTION in
    mec) DATAFORMATE
         echo ${MEC}
         ;;
    oec) DATAFORMATE
         echo ${OEC}
         ;;
    pfc) DATAFORMATE
         echo ${PFC}
         ;;
    firm)
         DATAFORMATE
         #awk leaves a leading space in the value, so match by pattern instead of exact equality
         if [[ "${FIRMWARESTATUS}" == *"Unconfigured(bad)"* || "${FIRMWARESTATUS}" == *"Failed"* ]]
         then
             echo -1
         else
             echo 0
         fi
         ;;
    rdlevel)
         CHECKRAIDLEVEL
         ;;
    *) echo 'Please select option: mec $slot_num ;oec $slot_num;pfc $slot_num;firm $slot_num;rdlevel'
esac

rm -rf ${TEMP_DIR}/tmp/pdinfo*
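
Every call of the script rewrites and then deletes the shared pdinfo files, so two items polled at the same moment may step on each other. One possible alternative, sketched here under assumptions, is to pick the value straight out of the MegaCli output with awk and skip the temporary files. `pd_field` is a hypothetical helper, and the input format (`Name: value` lines, one `Slot Number` line per disk) is assumed from the fields the script greps for.

```shell
#!/bin/bash
# pd_field SLOT FIELD: read "Name: value" lines on stdin and print FIELD for the given slot.
# Hypothetical helper; not part of the original script.
pd_field() {
    awk -F': *' -v slot="$1" -v field="$2" '
        BEGIN { current = -1 }                      # no slot seen yet
        $1 == "Slot Number" { current = $2 + 0 }    # remember which disk we are in
        $1 == field && current == slot + 0 { print $2; exit }
    '
}

# Illustrative sample, not captured MegaCli output.
sample='Slot Number: 0
Media Error Count: 0
Firmware state: Online, Spun Up
Slot Number: 1
Media Error Count: 7
Firmware state: Failed'

printf '%s\n' "$sample" | pd_field 1 "Media Error Count"
```

In the real script the input would come from the same `MegaCli64 -Pdlist -a0` call used above, piped into `pd_field "$2" "Media Error Count"`. Check it against the output of your own controller before relying on it.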

Make the script executable

chmod +x /home/zabbix/diskcheck_megacli.sh

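Zabbix agent configuration

Allow the zabbix user to run MegaCli as root
visudo
Add the line: zabbix ALL=(root) NOPASSWD: /opt/MegaRAID/MegaCli/MegaCli64
Press Esc
Type :wq to save
Edit the Zabbix agent configuration file
vim /etc/zabbix/zabbix_agentd.conf
#Add this line
UnsafeUserParameters=1
#This line should already be present by default: Include=/etc/zabbix/zabbix_agentd.conf.d/
Create the configuration file for the custom items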

Write a configuration file at /etc/zabbix/zabbix_agentd.conf.d/disk.conf

#Disk auto-discovery

UserParameter=raid.pd.discovery,sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDlist -aAll -NoLog | grep Slot | awk 'BEGIN{printf "{\"data\":["} {if(NR>1) printf ","; printf "\n{\"{#SLOT_NUM}\":\"%s\"}", $NF} END{printf "\n]}\n"}' | sed '/^,$/d'



#Collect Media Error Count

UserParameter=raid.phy.mec[*],/home/zabbix/diskcheck_megacli.sh mec $1

#Collect Other Error Count

UserParameter=raid.phy.oec[*],/home/zabbix/diskcheck_megacli.sh oec $1

#Collect Predictive Failure Count

UserParameter=raid.phy.pfc[*],/home/zabbix/diskcheck_megacli.sh pfc $1

#Check the disk state; returns -1 if the disk is faulty

UserParameter=raid.phy.firms[*],/home/zabbix/diskcheck_megacli.sh firm $1

#Check the RAID level; returns -1 if an array is degraded

UserParameter=raid.level.state,/home/zabbix/diskcheck_megacli.sh rdlevel
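
The discovery key has to return JSON in the low-level discovery format, and it is easy to check the awk part on its own without a RAID controller. The sketch below wraps the same awk program from raid.pd.discovery in a function and feeds it hand-written `Slot Number` lines. `slots_to_lld` is a hypothetical name, and the input lines are made up for the test.

```shell
#!/bin/bash
# slots_to_lld: turn "Slot Number: N" lines on stdin into low-level discovery JSON,
# using the same awk program as the raid.pd.discovery UserParameter.
slots_to_lld() {
    awk 'BEGIN{printf "{\"data\":["} {if(NR>1) printf ","; printf "\n{\"{#SLOT_NUM}\":\"%s\"}", $NF} END{printf "\n]}\n"}'
}

# Two made-up slots produce two {#SLOT_NUM} entries.
printf 'Slot Number: 0\nSlot Number: 1\n' | slots_to_lld
```

With no input lines at all the function still prints a valid but empty `data` array, which is what the agent returns when the MegaCli call produces nothing.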

Restart the Zabbix agent

service zabbix-agent restart

Test the values

 zabbix_agentd -t raid.phy.mec[0]
 zabbix_agentd -t raid.pd.discovery
 

raid.phy.mec[3] [t|0]

[root@localhost zabbix_agentd.d]# zabbix_agentd -t raid.pd.discovery
raid.pd.discovery [t|{"data":[
{"{#SLOT_NUM}":"0"},
{"{#SLOT_NUM}":"1"},
{"{#SLOT_NUM}":"2"},
{"{#SLOT_NUM}":"3"}
]}]

A value of 0 means everything is normal; if an error is returned instead, look at the specific message

Zabbix server configuration

Create the template

Name it Check Raid By MegaCli

Create the discovery rule

Create a new Discovery rule in the template

Name:Physical disk discovery

Type:Zabbix agent(active)

Key:raid.pd.discovery

Update interval (in sec):3600

Keep lost resources period (in days):30

Description:Find physical disk

Enabled: ✔

Also add a filter entry for {#SLOT_NUM}, the macro returned by the discovery key defined in disk.conf

Create item prototypes

Create the item prototypes under the Physical disk discovery rule

  • Media Error Count On Slot {#SLOT_NUM}

    Name:Media Error Count On Slot $1

    Type:Zabbix agent(active)

    Key:raid.phy.mec[{#SLOT_NUM}] #make sure this key matches the one in disk.conf

    Applications:MegaRaid #create this Application yourself

    Enabled: ✔

  • Other Error Count On Slot {#SLOT_NUM}

  • Predictive Failure Count On Slot {#SLOT_NUM}

  • Firmware State On Slot {#SLOT_NUM}

  • Raid Level State

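Create trigger prototypes

Create the trigger prototypes under the Physical disk discovery rule

Name:{HOST.NAME} RAID array SLOT {#SLOT_NUM} Firmware State error

Expression:{Check Raid By MegaCli:raid.phy.firms[{#SLOT_NUM}].last()}<>0

Severity:High

Create the remaining triggers in the same way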
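
Applying the thresholds listed at the top of the article to the other per-slot keys gives expressions along the following lines. The operators and limits are my reading of that threshold list rather than values confirmed by the screenshots, and `print_trigger_exprs` is a hypothetical helper that only prints text to paste into the frontend.

```shell
#!/bin/bash
# print_trigger_exprs: print one Zabbix 4 trigger prototype expression per per-slot key.
# The key/operator/limit table below is derived from the threshold list (an assumption).
TEMPLATE="Check Raid By MegaCli"

print_trigger_exprs() {
    while read -r key op limit; do
        printf '{%s:%s[{#SLOT_NUM}].last()}%s%s\n' "$TEMPLATE" "$key" "$op" "$limit"
    done <<'EOF'
raid.phy.mec > 30
raid.phy.oec > 1000
raid.phy.pfc > 2
raid.phy.firms <> 0
EOF
}

print_trigger_exprs
```

The last line printed is the Firmware State expression shown above. raid.level.state takes no slot parameter, so its trigger would compare the plain key with `<>0`.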

Troubleshooting

Discovery rule errors

Cannot send request

Incorrect discovery rule type. The request cannot be sent.

Value should be a JSON object.

Format error

On the server, check what the agent returns with zabbix_get -s <agent IP> -k <key>

zabbix_get -s xxx -k raid.pd.discovery

[root@localhost ~]# zabbix_get -s 192 -k raid.pd.discovery
{"data":[
]}
[root@localhost ~]# zabbix_get -s 192 -k raid.phy.mec[0]
/home/zabbix/diskcheck_megacli.sh: line 5: /var/lib/zabbix/.bash_profile: No such file or directory
/home/zabbix/diskcheck_megacli.sh: line 21: ((: i<: syntax error: operand expected (error token is "<")
/home/zabbix/diskcheck_megacli.sh: line 28: /home/zabbix/tmp/pdinfo0: No such file or directory

No data comes back, most likely because the zabbix user lacks sufficient privileges

On the agent machine, try running the script with sudo -u zabbix

MegaCli only runs as root; adding the entry to sudoers fixes it
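
The `((: i<: syntax error` line in the output above is what bash prints when PDNUM is empty, that is, when the MegaCli call returned nothing. A small guard would make that failure easier to read. This is a sketch only: `is_uint` and `check_pdnum` are hypothetical helpers and the error text is my own wording.

```shell
#!/bin/bash
# is_uint VALUE: succeed only if VALUE is a non-empty string of digits.
# Hypothetical helper; not part of the original script.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# check_pdnum VALUE: report an unusable disk count instead of letting the for loop fail.
check_pdnum() {
    if ! is_uint "$1"; then
        echo "MegaCli returned no disk count; check the sudo rule for the zabbix user" >&2
        return 1
    fi
}

check_pdnum "4" && echo "disk count ok"
```

In the script, `check_pdnum "${PDNUM}" || exit 1` placed right after PDNUM is assigned would stop before the loop runs.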

Windows side
