当前位置:   article > 正文

linux smartctl 命令,使用smartctl命令检查磁盘

smartctl点亮磁盘故障灯

使用smartctl命令检查磁盘

在TS过程中会碰到很多磁盘异常出现影响系统数据或者生产数据的情况,但是有时候无法判断磁盘出现问题是由于磁盘物理损坏还是磁盘SATA口接触不良导致,这个时候smartctl命令就可以很容易判断出磁盘是否存在物理损坏问题,并且及时作出预警。

使用smartctl进行SMART测试

所有现代硬盘都可通过SMART属性监视其当前状态。这些值提供有关硬盘各种参数的信息,并可提供有关磁盘剩余寿命或任何可能的错误的信息。此外,可以执行各种SMART测试,以确定磁盘上的任何硬件问题。本文介绍如何使用smartctl(Smartmontools)对Linux进行此类测试。

安装Smartmontools

Smartmontools可以使用yum直接安装:

# yum install -y smartmontools

要确保硬盘支持SMART并启用,请使用以下命令(本例中为硬盘/ dev / sda):

# smartctl -i /dev/sda

smartctl 6.4 2015-06-04 r4109 [x86_64-linux-3.10.0-514.el7.x86_64] (local build)

Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===

Device Model: HGST HUS726030ALE610

Serial Number: N8GH1H5Y

LU WWN Device Id: 5 000cca 244c6d728

Firmware Version: APBDT7JN

User Capacity: 3,000,592,982,016 bytes [3.00 TB]

Sector Sizes: 512 bytes logical, 4096 bytes physical

Rotation Rate: 7200 rpm

Form Factor: 3.5 inches

Device is: Not in smartctl database [for details use: -P showall]

ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4

SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)

Local Time is: Wed Oct 11 07:09:17 2017 CST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

上面的参数中,SMART support is: Available - device has SMART capability.显示磁盘SMART是支持的,SMART support is: Enabled是可用的

可用测试

SMART根据规格类型和SCSI设备提供两种不同的测试。这些测试中的每一个都可以以两种模式进行:

前景模式

背景模式

在背景模式下,测试的优先级低,这意味着正常指令继续由硬盘处理。如果硬盘驱动器正忙,则测试暂停,然后以较低的加载速度继续运行,因此不会中断操作。

在前景模式下,所有命令将在测试期间以“检查条件”状态进行应答。因此,仅当不使用硬盘时,才建议使用此模式。

原则上,背景模式是首选模式。

ATA / SCSI测试

短期测试

短期测试的目标是快速识别有缺陷的硬盘驱动器。因此,短测试的最大运行时间为2分钟。测试通过将磁盘划分成三个不同的段来检查磁盘。测试以下几个方面:

电气特性:控制器测试自己的电子元件,由于这是每个制造商特有的,因此无法准确解释正在测试的内容。例如,可以想到测试内部RAM,读/写电路或头电子元件。

机械特性:要测试的伺服系统和定位机制的确切顺序也是每个制造商特有的。

读/校验:它将读取磁盘的某个区域并验证某些数据,读取的区域的大小和位置也是每个制造商特有的。

长期测试

长期测试被设计为生产中的最终测试,与短暂测试相同,有两个差异。

第一个:没有时间限制,并且在读/校验中检查整个磁盘,而不仅仅是一个部分。例如,长期测试可以用于确认短测试的结果。

ATA指定测试

此处列出的所有测试仅适用于ATA硬盘驱动器。

运输测试

可以执行该测试以确定在几分钟内传输硬盘时的损坏。

选择测试

在选定的测试期间,检查指定的逻辑块范围。要扫描的逻辑块以以下格式指定:

# smartctl -t select,10-20 /dev/sda ## LBA 10 to LBA 20(incl.)

# smartctl -t select,10+11 /dev/sda ## LBA 10 to LBA 20(incl.)

smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-358.el6.x86_64] (local build)

Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===

Sending command: "Execute SMART Selective self-test routine immediately in off-line mode".

SPAN STARTING_LBA ENDING_LBA

0 10 20

Drive command "Execute SMART Selective self-test routine immediately in off-line mode" successful.

Testing has begun.

也可以有多个范围(最多5个)进行扫描:

# smartctl -t select,0-10 -t select,10-20 /dev/sda

smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-358.el6.x86_64] (local build)

Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===

Sending command: "Execute SMART Selective self-test routine immediately in off-line mode".

SPAN STARTING_LBA ENDING_LBA

0 0 10

1 10 20

Drive command "Execute SMART Selective self-test routine immediately in off-line mode" successful.

Testing has begun.

smartctl的测试程序

在执行测试之前,使用以下命令显示各种测试的持续时间的近似值:

# smartctl -c /dev/sda

smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-358.el6.x86_64] (local build)

Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===

General SMART Values:

Offline data collection status: (0x02) Offline data collection activity

was completed without error.

Auto Offline Data Collection: Disabled.

Self-test execution status: ( 0) The previous self-test routine completed

without error or no self-test has ever

been run.

Total time to complete Offline

data collection: ( 0) seconds.

Offline data collection

capabilities: (0x7d) SMART execute Offline immediate.

No Auto Offline data collection support.

Abort Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities: (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability: (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: ( 1) minutes.

Extended self-test routine

recommended polling time: ( 48) minutes.

Conveyance self-test routine

recommended polling time: ( 2) minutes.

SCT capabilities: (0x0025) SCT Status supported.

SCT Data Table supported.

以下命令启动所需的测试(在后台模式下):

# smartctl -t /dev/sda

也可以执行“离线”测试。但是,仅进行标准自检(Short Test)。

示例输出:

# smartctl -t short /dev/sda

smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-358.el6.x86_64] (local build)

Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===

Sending command: "Execute SMART Short self-test routine immediately in off-line mode".

Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.

Testing has begun.

Please wait 1 minutes for test to complete.

Test will complete after Wed Oct 11 15:40:28 2017

Use smartctl -X to abort test.

要在前台模式下执行测试,必须在命令中添加“-C”。

# smartctl -t -C /dev/sda

查看测试结果

通常,测试结果包含在以下命令的输出中:

# smartctl -a /dev/sda

smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-358.el6.x86_64] (local build)

Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===

Device Model: KINGSTON SV300S37A60G

Serial Number: 50026B724A01E182

LU WWN Device Id: 5 0026b7 24a01e182

Firmware Version: 580ABBF0

User Capacity: 60,022,480,896 bytes [60.0 GB]

Sector Size: 512 bytes logical/physical

Device is: Not in smartctl database [for details use: -P showall]

ATA Version is: 8

ATA Standard is: ACS-2 revision 3

Local Time is: Wed Oct 11 15:41:49 2017 CST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status: (0x02) Offline data collection activity

was completed without error.

Auto Offline Data Collection: Disabled.

Self-test execution status: ( 0) The previous self-test routine completed

without error or no self-test has ever

been run.

Total time to complete Offline

data collection: ( 0) seconds.

Offline data collection

capabilities: (0x7d) SMART execute Offline immediate.

No Auto Offline data collection support.

Abort Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities: (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability: (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: ( 1) minutes.

Extended self-test routine

recommended polling time: ( 48) minutes.

Conveyance self-test routine

recommended polling time: ( 2) minutes.

SCT capabilities: (0x0025) SCT Status supported.

SCT Data Table supported.

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

1 Raw_Read_Error_Rate 0x0032 095 095 050 Old_age Always - 8601004262

5 Reallocated_Sector_Ct 0x0033 099 099 003 Pre-fail Always - 0

9 Power_On_Hours 0x0032 085 085 000 Old_age Always - 145066815403100

12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 120

171 Unknown_Attribute 0x000a 100 100 000 Old_age Always - 0

172 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0

174 Unknown_Attribute 0x0030 000 000 000 Old_age Offline - 97

177 Wear_Leveling_Count 0x0000 000 000 000 Old_age Offline - 96

181 Program_Fail_Cnt_Total 0x000a 100 100 000 Old_age Always - 0

182 Erase_Fail_Count_Total 0x0032 100 100 000 Old_age Always - 0

187 Reported_Uncorrect 0x0012 100 100 000 Old_age Always - 0

189 High_Fly_Writes 0x0000 029 041 000 Old_age Offline - 77312098333

194 Temperature_Celsius 0x0022 029 041 000 Old_age Always - 29 (Min/Max 18/41)

195 Hardware_ECC_Recovered 0x001c 102 102 000 Old_age Offline - 8601004262

196 Reallocated_Event_Count 0x0033 099 099 003 Pre-fail Always - 0

201 Soft_Read_Error_Rate 0x001c 102 102 000 Old_age Offline - 8601004262

204 Soft_ECC_Correction 0x001c 102 102 000 Old_age Offline - 8601004262

230 Head_Amplitude 0x0013 100 100 000 Pre-fail Always - 100

231 Temperature_Celsius 0x0013 091 091 010 Pre-fail Always - 0

233 Media_Wearout_Indicator 0x0032 000 000 000 Old_age Always - 9261

234 Unknown_Attribute 0x0032 000 000 000 Old_age Always - 14820

241 Total_LBAs_Written 0x0032 000 000 000 Old_age Always - 14820

242 Total_LBAs_Read 0x0032 000 000 000 Old_age Always - 6033

SMART Error Log not supported

SMART Self-test log structure revision number 1

Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error

# 1 Short offline Completed without error 00% 13404 -

# 2 Selective offline Completed without error 00% 13404 -

# 3 Selective offline Completed without error 00% 13404 -

# 4 Selective offline Completed without error 00% 13404 -

# 5 Short offline Completed without error 00% 13403 -

SMART Selective self-test log data structure revision number 1

SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS

1 0 10 Not_testing

2 10 20 Not_testing

3 0 0 Not_testing

4 0 0 Not_testing

5 0 0 Not_testing

Selective self-test flags (0x0):

After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

如果只显示测试结果,也可以使用以下命令:

# smartctl -l selftest /dev/sda

smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-358.el6.x86_64] (local build)

Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===

SMART Self-test log structure revision number 1

Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error

# 1 Short offline Completed without error 00% 13404 -

# 2 Selective offline Completed without error 00% 13404 -

# 3 Selective offline Completed without error 00% 13404 -

# 4 Selective offline Completed without error 00% 13404 -

# 5 Short offline Completed without error 00% 13403 -

显示磁盘全部的健康状态

# smartctl -H /dev/sda

smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-358.el6.x86_64] (local build)

Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED ##说明自检通过,磁盘健康

查看磁盘错误日志

# smartctl -l error /dev/sdb

Sample Output

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-32-generic] (local build)

Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===

SMART Error Log Version: 1

ATA Error Count: 1

CR = Command Register [HEX]

FR = Features Register [HEX]

SC = Sector Count Register [HEX]

SN = Sector Number Register [HEX]

CL = Cylinder Low Register [HEX]

CH = Cylinder High Register [HEX]

DH = Device/Head Register [HEX]

DC = Device Command Register [HEX]

ER = Error register [HEX]

ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 10042 hours (418 days + 10 hours)

When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:

ER ST SC SN CL CH DH

-- -- -- -- -- -- --

84 51 00 00 00 00 a0

Commands leading to the command that caused the error were:

CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

-- -- -- -- -- -- -- -- ---------------- --------------------

ec 00 00 00 00 00 a0 08 35d+16:09:47.709 IDENTIFY DEVICE

ec 00 00 00 00 00 a0 08 35d+16:09:42.437 IDENTIFY DEVICE

ec 00 00 00 00 00 a0 08 35d+16:09:42.159 IDENTIFY DEVICE

ef 03 46 00 00 00 a0 08 35d+16:09:42.159 SET FEATURES [Set transfer mode]

ec 00 00 00 00 00 a0 08 35d+16:09:42.159 IDENTIFY DEVICE

处理过程

首先通过smartctl -H /dev/sda检查磁盘健康状态,然后smartctl -a /dev/sda查看磁盘详细情况,再对磁盘进行短期测试smartctl -t short /dev/sda,最后查看磁盘测试结果smartctl -l selftest /dev/sda,基本磁盘健康状态就可以定位出来,最后检查磁盘错误日志smartctl -l error /dev/sdb

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/秋刀鱼在做梦/article/detail/917385
推荐阅读
相关标签
  

闽ICP备14008679号