[smartmontools-support] PERC H730 Mini: spun down hot spare: FailedReadSmartSelfTestLog
Lukas Pirl
2017-01-31 11:30:40 UTC
Dear all,

first, thanks for this invaluable piece of software that you folks
created and maintain.

I have a Dell PowerEdge with a PERC H730 Mini (details below),
running Debian 8.7 with smartd 6.4 (details below).

One of the disks is configured as a hot spare.
Hot spares are allowed to spin down when idle.

I regularly see the following log message:

smartd[…]: Device: /dev/bus/0 [megaraid_disk_04], failed to read Temperature
Device: /dev/bus/0 [megaraid_disk_04], Read SMART Self-Test Log Failed
…, 300 GB

However, sometimes I receive the message "Read SMART Self-Test Log
worked again" (or similar).
I suspect this happens when the controller does a patrol read and the
disk is spun up.
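
One way to check this suspicion would be to tally the failure and recovery messages per disk from the system log and compare them with the controller's patrol read schedule. The following is only a minimal sketch: the function names are illustrative, and the wording of the recovery line in the sample is an assumption, since the message is quoted above only as "or similar".

```python
import re
from collections import Counter

# Device tag as it appears in the log lines quoted above, e.g.
# "Device: /dev/bus/0 [megaraid_disk_04], failed to read Temperature".
DEVICE_RE = re.compile(r"Device: (\S+) \[(megaraid_disk_\d+)\], (.*)")

# The first two lines are taken from the post; the third is a hypothetical
# recovery message, its exact wording is an assumption.
SAMPLE_LINES = [
    "smartd[…]: Device: /dev/bus/0 [megaraid_disk_04], failed to read Temperature",
    "smartd[…]: Device: /dev/bus/0 [megaraid_disk_04], Read SMART Self-Test Log Failed",
    "smartd[…]: Device: /dev/bus/0 [megaraid_disk_04], Read SMART Self-Test Log worked again",
]


def classify(message):
    """Map a smartd message to 'failed', 'recovered' or None (anything else)."""
    text = message.lower()
    if "worked again" in text:  # assumed wording of the recovery message
        return "recovered"
    if "failed" in text:
        return "failed"
    return None


def tally(lines):
    """Count failure and recovery messages per megaraid disk."""
    counts = Counter()
    for line in lines:
        match = DEVICE_RE.search(line)
        if not match:
            continue
        kind = classify(match.group(3))
        if kind:
            counts[(match.group(2), kind)] += 1
    return counts


if __name__ == "__main__":
    for (disk, kind), n in sorted(tally(SAMPLE_LINES).items()):
        print(disk, kind, n)
```

If recoveries cluster around the times the controller spins the hot spare up, that would support the patrol read explanation.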

What is the recommended way of circumventing those – depending on the
definition – "false positives"? Is this kind of setup supported?

Please address me directly since I am not a member of the list.

Thanks a lot in advance and best regards,

Lukas

################# <smartd info> #################
#
smartd 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke,
www.smartmontools.org

smartd comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
the terms of the GNU General Public License; either
version 2, or (at your option) any later version.
See http://www.gnu.org for further details.

smartmontools release 6.4 dated 2014-07-26 at 09:49:11 UTC
smartmontools SVN rev 4002 dated 2014-10-07 at 11:11:49
smartmontools build host: x86_64-unknown-linux-gnu
smartmontools build with: GCC 4.9.2
smartmontools configure arguments: '--prefix=/usr'
'--sysconfdir=/etc' '--mandir=/usr/share/man'
'--with-initscriptdir=no'
'--with-docdir=/usr/share/doc/smartmontools' '--enable-drivedb'
'--enable-savestates' '--enable-attributelog'
'--with-savestates=/var/lib/smartmontools/smartd.'
'--with-attributelog=/var/lib/smartmontools/attrlog.'
'--with-exampledir=/usr/share/doc/smartmontools/examples/'
'--with-drivedbdir=/var/lib/smartmontools/drivedb'
'--with-systemdsystemunitdir=/lib/systemd/system'
'--with-smartdscriptdir=/usr/share/smartmontools'
'--with-smartdplugindir=/etc/smartmontools/smartd_warning.d'
'--with-systemdenvfile=/etc/default/smartmontools' '--with-selinux'
'CXXFLAGS=-g -O2 -fPIE -fstack-protector-strong -Wformat
-Werror=format-security -fsigned-char -Wall -O2' 'LDFLAGS=-fPIE -pie
-Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-D_FORTIFY_SOURCE=2' 'CFLAGS=-g
-O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security
-fsigned-char -Wall -O2'
#
################# </smartd info> #################

################# <adapter info> #################
#
Versions
================
Product Name : PERC H730 Mini
Serial No : …
FW Package Build: 25.4.1.0004

Mfg. Data
================
Mfg. Date : 08/24/16
Rework Date : 08/24/16
Revision No : A04
Battery FRU : N/A

Image Versions in Flash:
================
BIOS Version : 6.29.00.0_4.16.07.00_0x06120100
Ctrl-R Version : 5.14-0500
FW Version : 4.260.00-7126
NVDATA Version : 3.1511.00-0006
Boot Block Version : 3.07.00.00-0003

Pending Images in Flash
================
None

PCI Info
================
Controller Id : 0000
Vendor Id : 1000
Device Id : 005d
SubVendorId : 1028
SubDeviceId : 1f49

Host Interface : PCIE

ChipRevision : C0

Link Speed : 3
Number of Frontend Port: 0
Device Interface : PCIE

Number of Backend Port: 8
Port : Address
0 500003973800e63e
1 500003973800e646
2 500003973800cc9e
3 500003973800db46
4 500003973800cdce
5 0000000000000000
6 0000000000000000
7 0000000000000000

HW Configuration
================
SAS Address : 51866da074696900
BBU : Present
Alarm : Absent
NVRAM : Present
Serial Debugger : Present
Memory : Present
Flash : Present
Memory Size : 1024MB
TPM : Absent
On board Expander: Absent
Upgrade Key : Absent
Temperature sensor for ROC : Present
Temperature sensor for controller : Present

ROC temperature : 82 degree Celsius
Controller temperature : 82 degree Celcius

Settings
================
Current Time : 10:52:18 1/31, 2017
Predictive Fail Poll Interval : 300sec
Interrupt Throttle Active Count : 16
Interrupt Throttle Completion : 50us
Rebuild Rate : 30%
PR Rate : 30%
BGI Rate : 30%
Check Consistency Rate : 30%
Reconstruction Rate : 30%
Cache Flush Interval : 4s
Max Drives to Spinup at One Time : 4
Delay Among Spinup Groups : 12s
Physical Drive Coercion Mode : 128MB
Cluster Mode : Disabled
Alarm : Disabled
Auto Rebuild : Enabled
Battery Warning : Enabled
Ecc Bucket Size : 255
Ecc Bucket Leak Rate : 240 Minutes
Restore HotSpare on Insertion : Disabled
Expose Enclosure Devices : Disabled
Maintain PD Fail History : Disabled
Host Request Reordering : Enabled
Auto Detect BackPlane Enabled : SGPIO/i2c SEP
Load Balance Mode : Auto
Use FDE Only : Yes
Security Key Assigned : No
Security Key Failed : No
Security Key Not Backedup : No
Default LD PowerSave Policy : Controller Defined
Maximum number of direct attached drives to spin up in 1 min : 0
Auto Enhanced Import : No
Any Offline VD Cache Preserved : No
Allow Boot with Preserved Cache : No
Disable Online Controller Reset : No
PFK in NVRAM : No
Use disk activity for locate : No
POST delay : 90 seconds
BIOS Error Handling : Pause on Errors
Current Boot Mode :Normal

Capabilities
================
RAID Level Supported : RAID0, RAID1, RAID5, RAID6,
RAID10, RAID50, RAID60, PRL 11, PRL 11 with spanning, PRL11-RLQ0 DDF
layout with no span, PRL11-RLQ0 DDF layout with span
Supported Drives : SAS, SATA

Allowed Mixing:

Mix in Enclosure Allowed

Status
================
ECC Bucket Count : 0

Limitations
================
Max Arms Per VD : 32
Max Spans Per VD : 8
Max Arrays : 128
Max Number of VDs : 64
Max Parallel Commands : 928
Max SGE Count : 60
Max Data Transfer Size : 8192 sectors
Max Strips PerIO : 42
Max LD per array : 16
Min Strip Size : 64 KB
Max Strip Size : 1.0 MB
Max Configurable CacheCade Size: 0 GB
Current Size of CacheCade : 0 GB
Current Size of FW Cache : 939 MB

Device Present
================
Virtual Drives : 1
Degraded : 0
Offline : 0
Physical Devices : 6
Disks : 5
Critical Disks : 0
Failed Disks : 0

Supported Adapter Operations
================
Rebuild Rate : Yes
CC Rate : Yes
BGI Rate : Yes
Reconstruct Rate : Yes
Patrol Read Rate : Yes
Alarm Control : Yes
Cluster Support : No
BBU : Yes
Spanning : Yes
Dedicated Hot Spare : Yes
Revertible Hot Spares : Yes
Foreign Config Import : Yes
Self Diagnostic : Yes
Allow Mixed Redundancy on Array : No
Global Hot Spares : Yes
Deny SCSI Passthrough : No
Deny SMP Passthrough : No
Deny STP Passthrough : No
Support Security : Yes
Snapshot Enabled : No
Support the OCE without adding drives : Yes
Support PFK : No
Support PI : Yes
Support Boot Time PFK Change : No
Disable Online PFK Change : No
Support LDPI Type1 : No
Support LDPI Type2 : Yes
Support LDPI Type3 : No
Support Shield State : Yes
Block SSD Write Disk Cache Change: No
Support Online FW Update : Yes

Supported VD Operations
================
Read Policy : Yes
Write Policy : Yes
IO Policy : Yes
Access Policy : Yes
Disk Cache Policy : Yes
Reconstruction : Yes
Deny Locate : No
Deny CC : No
Allow Ctrl Encryption: No
Enable LDBBM : Yes
Support Breakmirror : Yes
Power Savings : Yes

Supported PD Operations
================
Force Online : Yes
Force Offline : Yes
Force Rebuild : Yes
Deny Force Failed : No
Deny Force Good/Bad : No
Deny Missing Replace : No
Deny Clear : No
Deny Locate : No
Support Temperature : Yes
NCQ : No
Disable Copyback : No
Enable JBOD : Yes
Enable Copyback on SMART : No
Enable Copyback to SSD on SMART Error : No
Enable SSD Patrol Read : No
PR Correct Unconfigured Areas : Yes
Enable Spin Down of UnConfigured Drives : Yes
Disable Spin Down of hot spares : No
Spin Down time : 30
T10 Power State : Yes

Error Counters
================
Memory Correctable Errors : 0
Memory Uncorrectable Errors : 0

Cluster Information
================
Cluster Permitted : No
Cluster Active : No

Default Settings
================
Phy Polarity : 0
Phy PolaritySplit : 0
Background Rate : 30
Strip Size : 64kB
Flush Time : 4 seconds
Write Policy : WB
Read Policy : Adaptive
Cache When BBU Bad : Disabled
Cached IO : No
SMART Mode : Mode 6
Alarm Disable : No
Coercion Mode : 128MB
ZCR Config : Unknown
Dirty LED Shows Drive Activity : No
BIOS Continue on Error : 1
Spin Down Mode : None
Allowed Device Type : SAS/SATA Mix
Allow Mix in Enclosure : Yes
Allow HDD SAS/SATA Mix in VD : No
Allow SSD SAS/SATA Mix in VD : No
Allow HDD/SSD Mix in VD : No
Allow SATA in Cluster : No
Max Chained Enclosures : 4
Disable Ctrl-R : No
Enable Web BIOS : No
Direct PD Mapping : Yes
BIOS Enumerate VDs : Yes
Restore Hot Spare on Insertion : No
Expose Enclosure Devices : No
Maintain PD Fail History : No
Disable Puncturing : No
Zero Based Enclosure Enumeration : Yes
PreBoot CLI Enabled : No
LED Show Drive Activity : Yes
Cluster Disable : Yes
SAS Disable : No
Auto Detect BackPlane Enable : SGPIO/i2c SEP
Use FDE Only : Yes
Enable Led Header : No
Delay during POST : 0
EnableCrashDump : No
Disable Online Controller Reset : No
EnableLDBBM : Yes
Un-Certified Hard Disk Drives : Allow
Treat Single span R1E as R10 : Yes
Max LD per array : 16
Power Saving option : Don't spin down unconfigured drives
Don't spin down Hot spares
Don't Auto spin down Configured Drives
Power settings apply to all drives - individual PD/LD power settings
cannot be set
Max power savings option is not allowed for LDs. Only T10 power
conditions are to be used.
Cached writes are not used for spun down VDs
Can schedule disable power savings at controller level
Default spin down time in minutes: 30
Enable JBOD : Yes
TTY Log In Flash : Yes
Auto Enhanced Import : No
BreakMirror RAID Support : Yes
Disable Join Mirror : Yes
Enable Shield State : No
Time taken to detect CME : 60s
#
################# </adapter info> ################

################# <disk info> ################
#
Enclosure Device ID: 32
Slot Number: 0
Drive's position: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: 1
Device Id: 0
WWN: 500003973800CC9C
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Sector Size: 512
Logical Sector Size: 512
Physical Sector Size: 512
Firmware state: Online, Spun Up
Device Firmware Level: DM05
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x500003973800cc9e
SAS Address(1): 0x0
Connected Port Number: 2(path0)
Inquiry Data: TOSHIBA AL14SEB030N …
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 12.0Gb/s
Link Speed: 12.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :38C (100.40 F)
PI Eligibility: No
Drive is formatted for PI information: Yes
PI: PI with type 2
Port-0 :
Port status: Active
Port's Linkspeed: 12.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: 12.0Gb/s
Drive has flagged a S.M.A.R.T alert : No



Enclosure Device ID: 32
Slot Number: 1
Drive's position: DiskGroup: 0, Span: 0, Arm: 1
Enclosure position: 1
Device Id: 1
WWN: 500003973800E63C
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Sector Size: 512
Logical Sector Size: 512
Physical Sector Size: 512
Firmware state: Online, Spun Up
Device Firmware Level: DM05
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x500003973800e63e
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: TOSHIBA AL14SEB030N …
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 12.0Gb/s
Link Speed: 12.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :37C (98.60 F)
PI Eligibility: No
Drive is formatted for PI information: Yes
PI: PI with type 2
Port-0 :
Port status: Active
Port's Linkspeed: 12.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: 12.0Gb/s
Drive has flagged a S.M.A.R.T alert : No



Enclosure Device ID: 32
Slot Number: 2
Drive's position: DiskGroup: 0, Span: 0, Arm: 2
Enclosure position: 1
Device Id: 2
WWN: 500003973800CDCC
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Sector Size: 512
Logical Sector Size: 512
Physical Sector Size: 512
Firmware state: Online, Spun Up
Device Firmware Level: DM05
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x500003973800cdce
SAS Address(1): 0x0
Connected Port Number: 4(path0)
Inquiry Data: TOSHIBA AL14SEB030N …
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 12.0Gb/s
Link Speed: 12.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :33C (91.40 F)
PI Eligibility: No
Drive is formatted for PI information: Yes
PI: PI with type 2
Port-0 :
Port status: Active
Port's Linkspeed: 12.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: 12.0Gb/s
Drive has flagged a S.M.A.R.T alert : No



Enclosure Device ID: 32
Slot Number: 3
Drive's position: DiskGroup: 0, Span: 0, Arm: 3
Enclosure position: 1
Device Id: 3
WWN: 500003973800E644
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Sector Size: 512
Logical Sector Size: 512
Physical Sector Size: 512
Firmware state: Online, Spun Up
Device Firmware Level: DM05
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x500003973800e646
SAS Address(1): 0x0
Connected Port Number: 1(path0)
Inquiry Data: TOSHIBA AL14SEB030N …
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 12.0Gb/s
Link Speed: 12.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :34C (93.20 F)
PI Eligibility: No
Drive is formatted for PI information: Yes
PI: PI with type 2
Port-0 :
Port status: Active
Port's Linkspeed: 12.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: 12.0Gb/s
Drive has flagged a S.M.A.R.T alert : No



Enclosure Device ID: 32
Slot Number: 4
Enclosure position: 1
Device Id: 4
WWN: 500003973800DB44
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Hotspare Information:
Type: Global, is revertible

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Sector Size: 512
Logical Sector Size: 512
Physical Sector Size: 512
Firmware state: Hotspare, Spun down
Device Firmware Level: DM05
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x500003973800db46
SAS Address(1): 0x0
Connected Port Number: 3(path0)
Inquiry Data: TOSHIBA AL14SEB030N …
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 12.0Gb/s
Link Speed: 12.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :35C (95.00 F)
PI Eligibility: No
Drive is formatted for PI information: Yes
PI: PI with type 2
Port-0 :
Port status: Active
Port's Linkspeed: 12.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: 12.0Gb/s
Drive has flagged a S.M.A.R.T alert : No

#
################# </disk info> ################
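
In the disk listing above, the only drive reported as spun down is Device Id 4, the global hot spare, which appears to correspond to megaraid_disk_04 in the smartd messages. A minimal sketch of how such a listing could be scanned for spun-down drives, assuming it is available as text in the same "Key: value" layout; the function names are illustrative and the sample is a shortened excerpt of the listing:

```python
SAMPLE = """\
Enclosure Device ID: 32
Slot Number: 3
Device Id: 3
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 4
Device Id: 4
Firmware state: Hotspare, Spun down
"""


def parse_drives(text):
    """Split a 'Key: value' physical-drive listing into one dict per drive."""
    drives, current = [], None
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank lines and separators carry no field
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":  # first field of each drive block
            current = {}
            drives.append(current)
        if current is not None:
            current[key] = value
    return drives


def spun_down_ids(drives):
    """Return the Device Ids whose firmware state mentions 'Spun down'."""
    return [
        int(drive["Device Id"])
        for drive in drives
        if "spun down" in drive.get("Firmware state", "").lower()
    ]


if __name__ == "__main__":
    for device_id in spun_down_ids(parse_drives(SAMPLE)):
        # Naming assumed to follow the smartd messages quoted in the post.
        print(f"megaraid_disk_{device_id:02d}")
```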
Christian Franke
2017-02-17 06:43:11 UTC
Post by Lukas Pirl
> Dear all,
> first, thanks for this invaluable piece of software that you folks
> created and maintain.

You're welcome.

Post by Lukas Pirl
> I have a Dell PowerEdge, housing a PERC H730 Mini (details, see
> below), running on Debian 8.7 with smartd 6.4 (details, see below).
> One of the disks is configured as a hot spare.
> Hot spares are allowed to spin down when idle.
> smartd[…]: Device: /dev/bus/0 [megaraid_disk_04], failed to read Temperature
> Device: /dev/bus/0 [megaraid_disk_04], Read SMART Self-Test Log Failed
> …, 300 GB
> However, sometimes I receive the message "Read SMART Self-Test Log
> worked again" (or similar).
> I suspect this happens when the controller does a patrol read and the
> disk is spun up.

The command timeout values for the pass-through I/O controls used for
this controller may be too small. The related code uses a 'timeout'
value of 0 (some default?) for MEGASAS_IOC_FIRMWARE[1] and of 2 (two
seconds?) for MEGAIOCCMD[2].

Could you (or someone else with access to such hardware) possibly change
these values, rebuild smartmontools from source and check whether the
problem persists?
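
Before and after such a rebuild, it may help to measure how long a single read of the self-test log takes while the hot spare is spun down. The following is only a rough sketch and not part of smartmontools: the helper names are illustrative, and it assumes that smartctl is installed and that the hot spare is reachable as `-d megaraid,4` on `/dev/bus/0`, as the device name in the smartd messages suggests.

```python
import subprocess
import time


def smartctl_cmd(disk_id, bus="/dev/bus/0"):
    """Build a smartctl call that reads the self-test log of one megaraid disk."""
    return ["smartctl", "-l", "selftest", "-d", f"megaraid,{disk_id}", bus]


def timed_run(cmd):
    """Run cmd, discard its output and return (exit status, elapsed seconds)."""
    start = time.monotonic()
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode, time.monotonic() - start
```

Calling `timed_run(smartctl_cmd(4))` as root while the disk is spun down, and again while it is spun up, would show whether the failing reads return quickly or only after a delay of a few seconds, which would hint at whether a timeout is involved at all.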

Possibly related:
https://www.smartmontools.org/ticket/793

Thanks,
Christian

[1] https://www.smartmontools.org/browser/trunk/smartmontools/os_linux.cpp?rev=4389#L1335
[2] https://www.smartmontools.org/browser/trunk/smartmontools/os_linux.cpp?rev=4389#L1406