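I previously wrote an article on replacing the disks of a production server; in that case every disk in the machine was replaced. Recently some of the disks in another machine failed. That machine uses RAID 10, and inspection showed that only the failed disks needed to be replaced. This post supplements the earlier one.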
Installing MegaCLI
Installation package: MegaCli8.07.10.tar.gz.
Installation steps
    # First download the installation package
    # Extract it
    $ tar -zxf MegaCli8.07.10.tar.gz
    $ cd MegaCli8.07.10/Linux/
    $ rpm -ivh Lib_Utils-1.00-09.noarch.rpm MegaCli-8.02.21-1.noarch.rpm
    # Make the binary available on the system path
    $ ln -s /opt/MegaRAID/MegaCli/MegaCli64 /usr/local/bin/MegaCli
    $ MegaCli -v
    MegaCLI SAS RAID Management Tool Ver 8.02.21 Oct 21, 2011
    (c)Copyright 2011, LSI Corporation, All Rights Reserved.
    Exit Code: 0x00
    # Installation complete!
Handling a package conflict:
    $ rpm -ivh Lib_Utils-1.00-09.noarch.rpm MegaCli-8.02.21-1.noarch.rpm
    Preparing... ################################# [100%]
    file /opt/lsi/3rdpartylibs/x86_64/libsysfs.so.2.0.2 from install of Lib_Utils-1.00-09.noarch conflicts with file from package srvadmin-storelib-sysfs-9.1.0-2757.12163.el7.x86_64
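Cause: Lib_Utils conflicts with the srvadmin package that ships with Dell servers. Uninstall that package, then run the installation again. Note that --nodeps skips rpm's dependency checks, so other srvadmin components that rely on this package may stop working.
    $ rpm -e srvadmin-storelib-sysfs-9.1.0-2757.12163.el7.x86_64 --nodeps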
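The conflicting package name can be read from the rpm error instead of being retyped by hand. The following is a minimal sketch, assuming the error keeps the "conflicts with file from package NAME" wording shown in the log above; `conflicting_package` is a hypothetical helper name, not part of rpm.

```shell
# Hypothetical helper: print the package name(s) that rpm reports as
# conflicting. Reads rpm's error text on stdin and assumes the
# "conflicts with file from package NAME" wording from the log above.
conflicting_package() {
    sed -n 's/.*conflicts with file from package \([^ ]*\).*/\1/p' | sort -u
}

# Example usage (dry run: prints the removal command instead of running it):
#   rpm -ivh Lib_Utils-1.00-09.noarch.rpm MegaCli-8.02.21-1.noarch.rpm 2>&1 \
#     | conflicting_package \
#     | while read -r pkg; do echo "rpm -e $pkg --nodeps"; done
```

Printing the command first leaves room to review which package is about to be removed before anything is uninstalled.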
Usage guide
Basic usage
    # Show the RAID level
    $ MegaCli -LDInfo -Lall -aALL
    # Show RAID controller information
    $ MegaCli -AdpAllInfo -aALL
    # Show disk information
    $ MegaCli -PDList -aALL
    # Show battery information
    $ MegaCli -AdpBbuCmd -aAll
    # Show the RAID controller log
    $ MegaCli -FwTermLog -Dsply -aALL
    # Show the number of adapters
    $ MegaCli -adpCount
    # Show the adapter time
    $ MegaCli -AdpGetTime -aALL
    # Show all adapter information
    $ MegaCli -AdpAllInfo -aAll
    # Show all logical drive information
    $ MegaCli -LDInfo -LALL -aAll
    # Show all physical drive information
    $ MegaCli -PDList -aAll
    # Show the charger status
    $ MegaCli -AdpBbuCmd -GetBbuStatus -aALL | grep 'Charger Status'
    # Show BBU status
    $ MegaCli -AdpBbuCmd -GetBbuStatus -aALL
    # Show BBU capacity
    $ MegaCli -AdpBbuCmd -GetBbuCapacityInfo -aALL
    # Show BBU design parameters
    $ MegaCli -AdpBbuCmd -GetBbuDesignInfo -aALL
    # Show current BBU properties
    $ MegaCli -AdpBbuCmd -GetBbuProperties -aALL
    # Show the controller model, RAID settings, and disk information
    $ MegaCli -cfgdsply -aALL

    ## Disk state changes, from pulling a disk to inserting its replacement
    Device         | Normal  | Damage              | Rebuild  | Normal
    Virtual Drive  | Optimal | Degraded            | Degraded | Optimal
    Physical Drive | Online  | Failed Unconfigured | Rebuild  | Online

    # Show the state of a physical disk:
    $ MegaCli -PDRbld -ShowProg -PhysDrv [Enclosure Device ID:Slot Number] -a0
    ## A disk that is rebuilding reports: "Firmware state: Rebuild"

    # Query rebuild progress:
    $ MegaCli -pdrbld -showprog -physdrv[E:S] -aALL
    ## The output looks like this:
    Rebuild Progress on Device at Enclosure 32, Slot 5 Completed 77% in 101 Minutes.

    # Show rebuild progress as a text progress bar:
    $ MegaCli -pdrbld -progdsply -physdrv[E:S] -aALL
    ## The screen shows something like this:
    Rebuild progress of physical drives...
    Enclosure:Slot Percent Complete Time Elps
    032 :05 #######################87 %################******* 01:59:07
    Press key to quit...

    # Show the RAID controller's rebuild parameters:
    $ MegaCli -AdpAllinfo -aALL | grep -i rebuild
    ## The output looks like this:
    Rebuild Rate : 30%
    Auto Rebuild : Enabled
    Rebuild Rate : Yes
    Force Rebuild : Yes

    # Set the RAID controller's rebuild rate to 60%:
    $ MegaCli -AdpSetProp { RebuildRate -60} -aALL
    ## On success it returns:
    Adapter 0: Set rebuild rate to 60% success.
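For scripting, the rebuild progress line can be reduced to plain numbers. The sketch below assumes the "Rebuild Progress on Device at Enclosure E, Slot S Completed P% in M Minutes." format shown in the sample output; `parse_rebuild_progress` is a hypothetical helper name, and other firmware versions may word the line differently.

```shell
# Hypothetical helper: read rebuild progress lines on stdin and print
# "enclosure slot percent minutes" for each one that matches the
# sample format. Lines in any other format are ignored.
parse_rebuild_progress() {
    sed -n 's/.*Enclosure \([0-9][0-9]*\), Slot \([0-9][0-9]*\) Completed \([0-9][0-9]*\)% in \([0-9][0-9]*\) Minutes.*/\1 \2 \3 \4/p'
}

# Example usage (quoting the argument keeps the shell from treating
# the square brackets as a glob pattern):
#   MegaCli -PDRbld -ShowProg "-PhysDrv[32:5]" -aALL | parse_rebuild_progress
```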
MegaCLI usage:
Key parameters
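Parameter | Meaning |
---|---|
Firmware state | Disk state |
Firmware state: Online, Spun Up | Disk is healthy |
Firmware state: Unconfigured(good), Spun Up | Disk is installed but not in use |
Firmware state: Unconfigured(bad) | Faulty; corresponds to Non-Critical in hwcheck |
Firmware state: Failed | Faulty; corresponds to Critical in hwcheck |
Firmware state: Rebuild | Rebuilding; usually shown while a disk is being replaced |
Enclosure Device ID: 32 | Enclosure (device) ID |
Slot Number: 1 | Slot the disk occupies in the server |
Adapter #0 | Adapter number; corresponds to the -a option |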
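The fields in this table are enough to condense a long -PDList listing into one line per disk. A minimal sketch, assuming the field labels appear at the start of a line exactly as in the table; `pd_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: condense `MegaCli -PDList` output (on stdin) into
# one "enclosure:slot<TAB>firmware state" line per physical drive.
pd_summary() {
    awk -F': *' '
        /^Enclosure Device ID:/ { enc = $2 }
        /^Slot Number:/         { slot = $2 }
        /^Firmware state:/      { printf "%s:%s\t%s\n", enc, slot, $2 }
    '
}

# Example usage:
#   MegaCli -PDList -aAll -NoLog | pd_summary
```

The enclosure:slot pair in the first column is the same value that the -PhysDrv[E:S] option expects.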
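Hands-on: replacing a disk in a RAID 10 array
Replacing a disk under RAID 10 is straightforward: the drives are hot-swappable, so a failed disk can simply be pulled and replaced. The steps are as follows.
Environment
Server: R720
OS: CentOS 7
RAID level: RAID 10
Viewing disk information
To show the procedure as clearly as possible, the output below has not been trimmed.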
    $ MegaCli -PDList -aAll -NoLog

    Adapter #0

    Enclosure Device ID: 32
    Slot Number: 0
    Drive's postion: DiskGroup: 0, Span: 0, Arm: 0
    Enclosure position: 0
    Device Id: 0
    WWN: 5000C50076CD09B4
    Sequence Number: 1
    Media Error Count: 0
    Other Error Count: 0
    Predictive Failure Count: 28
    Last Predictive Failure Event Seq Number: 4378
    PD Type: SAS
    Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
    Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
    Coerced Size: 558.375 GB [0x45cc0000 Sectors]
    Firmware state: Unconfigured(good), Spun Up
    Device Firmware Level: ES66
    Shield Counter: 0
    Successful diagnostics completion on : N/A
    SAS Address(0): 0x5000c50076cd09b5
    SAS Address(1): 0x0
    Connected Port Number: 5(path0)
    Inquiry Data: SEAGATE ST3600057SS ES666SL8SASQ
    FDE Enable: Disable
    Secured: Unsecured
    Locked: Unlocked
    Needs EKM Attention: No
    Foreign State: Foreign
    Foreign Secure: Drive is not secured by a foreign lock key
    Device Speed: 6.0Gb/s
    Link Speed: 6.0Gb/s
    Media Type: Hard Disk Device
    Drive Temperature :40C (104.00 F)
    PI Eligibility: No
    Drive is formatted for PI information: No
    PI: No PI
    Drive's write cache : Disabled
    Port-0 :
    Port status: Active
    Port's Linkspeed: 6.0Gb/s
    Port-1 :
    Port status: Active
    Port's Linkspeed: Unknown
    Drive has flagged a S.M.A.R.T alert : Yes

    Enclosure Device ID: 32
    Slot Number: 2
    Enclosure position: 0
    Device Id: 2
    WWN: 5000C50076CD05BC
    Sequence Number: 2
    Media Error Count: 0
    Other Error Count: 0
    Predictive Failure Count: 0
    Last Predictive Failure Event Seq Number: 0
    PD Type: SAS
    Raw Size: 0 KB [0x0 Sectors]
    Non Coerced Size: 0 KB [0x0 Sectors]
    Coerced Size: 0 KB [0x0 Sectors]
    Firmware state: Unconfigured(bad)
    Device Firmware Level: ES66
    Shield Counter: 0
    Successful diagnostics completion on : N/A
    SAS Address(0): 0x5000c50076cd05bd
    SAS Address(1): 0x0
    Connected Port Number: 1(path0)
    Inquiry Data: SEAGATE ST3600057SS ES666SL8SAVC
    FDE Enable: Disable
    Secured: Unsecured
    Locked: Unlocked
    Needs EKM Attention: No
    Foreign State: None
    Device Speed: Unknown
    Link Speed: Unknown
    Media Type: Hard Disk Device
    Drive: Not Supported
    Drive Temperature :0C (32.00 F)
    PI Eligibility: No
    Drive is formatted for PI information: No
    PI: No PI
    Drive's write cache : Disabled
    Port-0 :
    Port status: Active
    Port's Linkspeed: Unknown
    Port-1 :
    Port status: Active
    Port's Linkspeed: Unknown
    Drive has flagged a S.M.A.R.T alert : No

    Enclosure Device ID: 32
    Slot Number: 1
    Drive's postion: DiskGroup: 0, Span: 0, Arm: 1
    Enclosure position: 0
    Device Id: 1
    WWN: 5000C500983873BC
    Sequence Number: 2
    Media Error Count: 0
    Other Error Count: 0
    Predictive Failure Count: 0
    Last Predictive Failure Event Seq Number: 0
    PD Type: SAS
    Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
    Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
    Coerced Size: 558.375 GB [0x45cc0000 Sectors]
    Firmware state: Online, Spun Up
    Device Firmware Level: VT31
    Shield Counter: 0
    Successful diagnostics completion on : N/A
    SAS Address(0): 0x5000c500983873bd
    SAS Address(1): 0x0
    Connected Port Number: 3(path0)
    Inquiry Data: SEAGATE ST600MP0005 VT31S7M1CSLT
    FDE Enable: Disable
    Secured: Unsecured
    Locked: Unlocked
    Needs EKM Attention: No
    Foreign State: None
    Device Speed: Unknown
    Link Speed: 6.0Gb/s
    Media Type: Hard Disk Device
    Drive Temperature :41C (105.80 F)
    PI Eligibility: No
    Drive is formatted for PI information: No
    PI: No PI
    Drive's write cache : Disabled
    Port-0 :
    Port status: Active
    Port's Linkspeed: 6.0Gb/s
    Port-1 :
    Port status: Active
    Port's Linkspeed: Unknown
    Drive has flagged a S.M.A.R.T alert : No

    Enclosure Device ID: 32
    Slot Number: 3
    Drive's postion: DiskGroup: 0, Span: 1, Arm: 1
    Enclosure position: 0
    Device Id: 3
    WWN: 5000C50076CE2F30
    Sequence Number: 2
    Media Error Count: 5
    Other Error Count: 71
    Predictive Failure Count: 15
    Last Predictive Failure Event Seq Number: 4379
    PD Type: SAS
    Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
    Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
    Coerced Size: 558.375 GB [0x45cc0000 Sectors]
    Firmware state: Online, Spun Up
    Device Firmware Level: ES66
    Shield Counter: 0
    Successful diagnostics completion on : N/A
    SAS Address(0): 0x5000c50076ce2f31
    SAS Address(1): 0x0
    Connected Port Number: 2(path0)
    Inquiry Data: SEAGATE ST3600057SS ES666SL8SAKA
    FDE Enable: Disable
    Secured: Unsecured
    Locked: Unlocked
    Needs EKM Attention: No
    Foreign State: None
    Device Speed: 6.0Gb/s
    Link Speed: 6.0Gb/s
    Media Type: Hard Disk Device
    Drive Temperature :48C (118.40 F)
    PI Eligibility: No
    Drive is formatted for PI information: No
    PI: No PI
    Drive's write cache : Disabled
    Port-0 :
    Port status: Active
    Port's Linkspeed: 6.0Gb/s
    Port-1 :
    Port status: Active
    Port's Linkspeed: Unknown
    Drive has flagged a S.M.A.R.T alert : Yes

    Enclosure Device ID: 32
    Slot Number: 4
    Drive's postion: DiskGroup: 1, Span: 0, Arm: 0
    Enclosure position: 0
    Device Id: 4
    WWN: 5000C5007E70F0F8
    Sequence Number: 2
    Media Error Count: 0
    Other Error Count: 0
    Predictive Failure Count: 0
    Last Predictive Failure Event Seq Number: 0
    PD Type: SAS
    Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
    Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
    Coerced Size: 558.375 GB [0x45cc0000 Sectors]
    Firmware state: Online, Spun Up
    Device Firmware Level: ES66
    Shield Counter: 0
    Successful diagnostics completion on : N/A
    SAS Address(0): 0x5000c5007e70f0f9
    SAS Address(1): 0x0
    Connected Port Number: 0(path0)
    Inquiry Data: SEAGATE ST3600057SS ES666SL9F1JB
    FDE Enable: Disable
    Secured: Unsecured
    Locked: Unlocked
    Needs EKM Attention: No
    Foreign State: None
    Device Speed: 6.0Gb/s
    Link Speed: 6.0Gb/s
    Media Type: Hard Disk Device
    Drive Temperature :46C (114.80 F)
    PI Eligibility: No
    Drive is formatted for PI information: No
    PI: No PI
    Drive's write cache : Disabled
    Port-0 :
    Port status: Active
    Port's Linkspeed: 6.0Gb/s
    Port-1 :
    Port status: Active
    Port's Linkspeed: Unknown
    Drive has flagged a S.M.A.R.T alert : No

    Enclosure Device ID: 32
    Slot Number: 5
    Drive's postion: DiskGroup: 1, Span: 0, Arm: 1
    Enclosure position: 0
    Device Id: 5
    WWN: 5000C5007E708E3C
    Sequence Number: 2
    Media Error Count: 0
    Other Error Count: 0
    Predictive Failure Count: 0
    Last Predictive Failure Event Seq Number: 0
    PD Type: SAS
    Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
    Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
    Coerced Size: 558.375 GB [0x45cc0000 Sectors]
    Firmware state: Online, Spun Up
    Device Firmware Level: ES66
    Shield Counter: 0
    Successful diagnostics completion on : N/A
    SAS Address(0): 0x5000c5007e708e3d
    SAS Address(1): 0x0
    Connected Port Number: 4(path0)
    Inquiry Data: SEAGATE ST3600057SS ES666SL9F2RB
    FDE Enable: Disable
    Secured: Unsecured
    Locked: Unlocked
    Needs EKM Attention: No
    Foreign State: None
    Device Speed: 6.0Gb/s
    Link Speed: 6.0Gb/s
    Media Type: Hard Disk Device
    Drive Temperature :45C (113.00 F)
    PI Eligibility: No
    Drive is formatted for PI information: No
    PI: No PI
    Drive's write cache : Disabled
    Port-0 :
    Port status: Active
    Port's Linkspeed: 6.0Gb/s
    Port-1 :
    Port status: Active
    Port's Linkspeed: Unknown
    Drive has flagged a S.M.A.R.T alert : No

    Exit Code: 0x00
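The output above shows that the server has 6 disks (Device Id 0 through 5). Two of them are not healthy members of the array: slot 2 reports Unconfigured(bad), and slot 0 reports Unconfigured(good) with a flagged S.M.A.R.T alert and a Predictive Failure Count of 28.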
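Scanning that much output by eye is error-prone, so the suspect disks can be filtered out with awk. The sketch below is only one possible heuristic, not a rule from MegaCLI: it flags a disk when its firmware state is not Online, its Predictive Failure Count is above zero, or it has flagged a S.M.A.R.T alert. `suspect_drives` is a hypothetical helper name.

```shell
# Hypothetical helper: read `MegaCli -PDList` output on stdin and print
# one tab-separated line for every drive that looks suspect.
# Heuristic (an assumption, adjust to taste): state is not Online,
# or Predictive Failure Count > 0, or a S.M.A.R.T alert is flagged.
suspect_drives() {
    awk -F': *' '
        function report() {
            if (slot != "" && (state !~ /^Online/ || pfc > 0 || smart == "Yes"))
                printf "%s:%s\t%s\tpredictive=%d\tsmart=%s\n", enc, slot, state, pfc, smart
        }
        # Each drive record starts with its Enclosure Device ID line.
        /^Enclosure Device ID:/      { report(); enc = $2; slot = ""; state = ""; pfc = 0; smart = "" }
        /^Slot Number:/              { slot = $2 }
        /^Predictive Failure Count:/ { pfc = $2 + 0 }
        /^Firmware state:/           { state = $2 }
        /S\.M\.A\.R\.T alert/        { smart = $2 }
        END                          { report() }
    '
}

# Example usage:
#   MegaCli -PDList -aAll -NoLog | suspect_drives
```

Applied to the listing above, these criteria would flag slots 0, 2, and 3; slot 3 is still Online but has a Predictive Failure Count of 15 and a S.M.A.R.T alert.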
Taking the failed disks offline
    $ MegaCli -PDOffline -PhysDrv[32:2] -a0
    $ MegaCli -PDOffline -PhysDrv[32:0] -a0
The values 32, 2, and -a0 in the first command map to the following fields of the -PDList output:
    Adapter #0
    Enclosure Device ID: 32
    Slot Number: 2
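Because a wrong enclosure or slot number takes a healthy disk offline, it can help to build the command from the three values and review it before running it. A minimal sketch; `pd_offline_cmd` is a hypothetical helper that only prints the command and never calls MegaCli itself.

```shell
# Hypothetical helper: print the -PDOffline command for one drive.
# Arguments: enclosure device ID, slot number, adapter number.
# All three must be non-negative integers; nothing is executed.
pd_offline_cmd() {
    for n in "$1" "$2" "$3"; do
        case $n in
            ''|*[!0-9]*)
                echo "usage: pd_offline_cmd ENCLOSURE SLOT ADAPTER" >&2
                return 1
                ;;
        esac
    done
    printf 'MegaCli -PDOffline -PhysDrv[%s:%s] -a%s\n' "$1" "$2" "$3"
}

# Example usage:
#   pd_offline_cmd 32 2 0
```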
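Replacing the failed disk
At this point the failed disk is OFFLINE. On site, the failed disk blinks amber while healthy disks show green. Pull the failed disk and insert a good one; the new disk's LED blinks green and the disk spins up quickly, which indicates that it is rebuilding. Check its state as follows:
    $ MegaCli -PDList -aAll -NoLog
    ...
    Enclosure Device ID: 32
    Slot Number: 3
    ...
    Firmware state: Rebuild
    ...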
Checking rebuild progress
    $ MegaCli -PDRbld -ShowProg -PhysDrv[32:3] -aAll
    Rebuild Progress on Device at Enclosure 32, Slot 3 Completed 16% in 94 Minutes.
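The percentage and elapsed time give a rough idea of how long the rebuild still has to run. The sketch below assumes the rebuild proceeds at a constant rate, which may not hold under changing I/O load; `rebuild_eta_minutes` is a hypothetical helper name. For the 16% in 94 minutes reported above it yields 493 minutes, a little over 8 hours.

```shell
# Hypothetical helper: estimate the remaining rebuild time in minutes.
# Arguments: percent complete (1-100), elapsed minutes.
# Assumes a constant rebuild rate; uses integer arithmetic, so the
# result is rounded down.
rebuild_eta_minutes() {
    pct=$1
    elapsed=$2
    if [ "$pct" -le 0 ] || [ "$pct" -gt 100 ]; then
        echo "percent must be between 1 and 100" >&2
        return 1
    fi
    echo $(( elapsed * (100 - pct) / pct ))
}

# Example usage:
#   rebuild_eta_minutes 16 94
```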
Disk replacement complete
    $ MegaCli -PDList -aAll -NoLog | grep 'Firmware state'
    Firmware state: Online, Spun Up
    Firmware state: Online, Spun Up
    Firmware state: Online, Spun Up
    Firmware state: Online, Spun Up
    Firmware state: Online, Spun Up
    Firmware state: Online, Spun Up
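The same check can be turned into an exit status for use in a monitoring script. A minimal sketch, assuming every drive contributes one "Firmware state:" line as in the output above; `all_drives_online` is a hypothetical helper name.

```shell
# Hypothetical helper: read `MegaCli -PDList` output on stdin and succeed
# only if at least one "Firmware state" line was seen and every one of
# them starts with "Online".
all_drives_online() {
    awk '
        /^Firmware state:/ {
            total++
            if ($0 ~ /^Firmware state: *Online/) ok++
        }
        END { exit !(total > 0 && ok == total) }
    '
}

# Example usage:
#   if MegaCli -PDList -aAll -NoLog | all_drives_online; then
#       echo "all drives online"
#   else
#       echo "at least one drive is not online"
#   fi
```

Treating empty input as a failure keeps a missing or broken MegaCli call from being reported as a healthy array.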