Discussion:
[smartmontools-support] Trouble using smartctl with LSI megaraid controller
a***@gmail.com
2017-02-11 20:48:28 UTC
Hello,
I am able to run the commands below successfully on the command line but I
am not able to get them to work in the /etc/smartd.conf file.

CentOS 6.8
smartmontools.x86_64 1:5.43-1.el6

#from the shell
Success! Results at end of email.
smartctl -s on -d sat+megaraid,30 /dev/sda
smartctl -a -d sat+megaraid,30 /dev/sda

A few questions....
1. Does anyone know the correct syntax to get it working in the smartd.conf
file?
2. There are two partitions in this system /dev/sda and /dev/sdb. How do I
know which drive Id (DID) belongs to which partition and does it matter if
I assign a physical drive to the incorrect partition?
3. Is there a way to check all the drives on a single partition at once
instead of having to check them individually with the sat+megaraid,N
syntax? There are 20 physical drives on this system.
4. Once I enable smartctl on the command line for a drive is it on
permanently or does it need to be enabled through the smartd.conf file?

Thanks for the help!

#smartd.conf file
Failure!
/dev/sda -H -d sat+megaraid,30 -s S/../../2/03

#/var/log/messages after restart of smartd service
Feb 11 12:23:16 db011 smartd[3751]: Opened configuration file
/etc/smartd.conf
Feb 11 12:23:16 db011 smartd[3751]: Configuration file /etc/smartd.conf
parsed.
Feb 11 12:23:16 db011 smartd[3751]: Device: /dev/sda [megaraid_disk_30]
[SAT], opened
Feb 11 12:23:16 db011 smartd[3751]: Device: /dev/sda [megaraid_disk_30]
[SAT], WDC WD2500BHTZ-04JCPV0, S/N:WD-WX11E23LZ256, WWN:5-0014ee-6ae7ef044,
FW:04.06A00, 250 GB
Feb 11 12:23:16 db011 smartd[3751]: Device: /dev/sda [megaraid_disk_30]
[SAT], not found in smartd database.
Feb 11 12:23:16 db011 smartd[3751]: Device: /dev/sda [megaraid_disk_30]
[SAT], not capable of SMART Health Status check
Feb 11 12:23:16 db011 smartd[3751]: Unable to register ATA device /dev/sda
[megaraid_disk_30] [SAT] at line 28 of file /etc/smartd.conf
Feb 11 12:23:16 db011 smartd[3751]: Device /dev/sda [megaraid_disk_30]
[SAT] not available
Feb 11 12:23:16 db011 smartd[3751]: Monitoring 0 ATA and 0 SCSI devices
Feb 11 12:23:16 db011 smartd[3753]: smartd has fork()ed into background
mode. New PID=3753.

Drive list here.
storcli64 /c0 /eall /sall show
-------------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model                  Sp
-------------------------------------------------------------------------------
29:0     30 Onln   0 232.375 GB SATA HDD N   N  512B WDC WD2500BHTZ-04JCPV0 U
29:1     31 Onln   0 232.375 GB SATA HDD N   N  512B WDC WD2500BHTZ-04JCPV0 U
29:2     35 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:3     33 GHS    - 372.093 GB SATA SSD N   N  512B SDLFODAM-400G-1HA1     U
29:4     39 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:5     34 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:6     37 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:7     32 Onln   1 372.093 GB SATA SSD N   N  512B SDLFODAM-400G-1HA1     U
29:8     42 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:9     38 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:10    36 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:11    41 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:12    45 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:13    43 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:14    40 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:15    44 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:16    47 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:17    48 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:18    49 GHS    - 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:19    46 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
-------------------------------------------------------------------------------
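
For scripting, the DID column can be pulled out of a listing like the one above. This is only a minimal sketch: it assumes the column layout shown here (an EID:Slt token first, the DID second), and `list_dids` is a hypothetical helper name, not part of storcli. Other storcli versions may lay the table out differently.

```shell
#!/bin/sh
# Hypothetical helper: print the DID (second column) of every drive row
# read from standard input.
# Assumption: drive rows start with an "EID:Slt" token such as "29:0";
# header lines, separator lines and wrapped leftovers are skipped.
list_dids() {
    awk '$1 ~ /^[0-9]+:[0-9]+$/ { print $2 }'
}

# Usage (storcli64 invocation taken from the post above):
#   storcli64 /c0 /eall /sall show | list_dids
```

The resulting numbers are the N values for `-d sat+megaraid,N`.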


#Results of smartctl -a -d sat+megaraid,30 /dev/sda
=====================================================================
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.3.1.el6.x86_64]
(local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model: WDC WD2500BHTZ-04JCPV0
Serial Number: WD-WX11E23LZ256
LU WWN Device Id: 5 0014ee 6ae7ef044
Firmware Version: 04.06A00
User Capacity: 250,059,350,016 bytes [250 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sat Feb 11 12:32:31 2017 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine
completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 2400) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 31) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x30bd) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f 200   200   051    Pre-fail Always  -           0
  3 Spin_Up_Time            0x0027 177   177   021    Pre-fail Always  -           2108
  4 Start_Stop_Count        0x0032 100   100   000    Old_age  Always  -           17
  5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x002e 100   253   000    Old_age  Always  -           0
  9 Power_On_Hours          0x0032 071   071   000    Old_age  Always  -           21799
 10 Spin_Retry_Count        0x0032 100   253   000    Old_age  Always  -           0
 11 Calibration_Retry_Count 0x0032 100   253   000    Old_age  Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           17
192 Power-Off_Retract_Count 0x0032 200   200   000    Old_age  Always  -           16
193 Load_Cycle_Count        0x0032 200   200   000    Old_age  Always  -           0
194 Temperature_Celsius     0x0022 119   107   000    Old_age  Always  -           28
196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0032 200   200   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0030 200   200   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x0032 200   200   000    Old_age  Always  -           0
200 Multi_Zone_Error_Rate   0x0008 200   200   000    Old_age  Offline -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


#Results of smartctl -s on -d sat+megaraid,30 /dev/sda
======================================================================
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.3.1.el6.x86_64]
(local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
mathog
2017-02-13 17:59:20 UTC
Post by a***@gmail.com
A few questions....
1. Does anyone know the correct syntax to get it working in the smartd.conf
file?
Sorry, no.
Post by a***@gmail.com
2. There are two partitions in this system /dev/sda and /dev/sdb. How do I
know which drive Id (DID) belongs to which partition and does it matter if
I assign a physical drive to the incorrect partition?
Get the "megacli" program and install it. Then you can run things like
this:

# Collect controller, battery, virtual disk, physical drive and patrol read
# details in one file. tr strips the carriage returns in megacli's output.
SNAME=$(hostname -s)
NOW=$(date)
OFILE="/root/$SNAME.megaraid.info"
SEP="=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-="

echo "Megaraid information for $SNAME collected $NOW" > "$OFILE"
echo "$SEP" >> "$OFILE"
echo "General Information" >> "$OFILE"
megacli -AdpAllInfo -aAll | tr -d '\r' >> "$OFILE"
echo "$SEP" >> "$OFILE"
echo "Battery backup" >> "$OFILE"
megacli -AdpBbuCmd -aAll | tr -d '\r' >> "$OFILE"
echo "$SEP" >> "$OFILE"
echo "Virtual disks" >> "$OFILE"
megacli -LDInfo -Lall -aALL | tr -d '\r' >> "$OFILE"
echo "$SEP" >> "$OFILE"
echo "Physical drives" >> "$OFILE"
megacli -PDList -aALL | tr -d '\r' >> "$OFILE"
echo "$SEP" >> "$OFILE"
echo "Patrol read" >> "$OFILE"
megacli -AdpPR -Info -aALL | tr -d '\r' >> "$OFILE"
echo "$SEP" >> "$OFILE"
echo "DONE" >> "$OFILE"
Post by a***@gmail.com
3. Is there a way to check all the drives on a single partition at once
instead of having to check them individually with the sat+megaraid,N
syntax? There are 20 physical drives on this system.
You mean with one "test all disks" smartctl command? No, I don't think
that is possible.
You can certainly start all of the tests one after another with no delay,
so that they run at the same time, which is pretty much the same thing.
Then wait however long is necessary for that disk type and read back all
the results. In theory. For some reason my test script runs them one at a
time, waiting for each to finish - I may have been worried about what the
RAID would do if all the disks were busy self testing at the same time.
Doing them one at a time in a script (with or without delays) isn't a big
deal, something along the lines of:

/usr/sbin/smartctl -t long /dev/sda -d sat+megaraid,0
sleep 10300
logger "Done smartctl -t long /dev/sda -d sat+megaraid,0"
/usr/sbin/smartctl -t long /dev/sda -d sat+megaraid,1
sleep 10300
logger "Done smartctl -t long /dev/sda -d sat+megaraid,1"
etc.

then read all the results. Use a loop if you don't want to write them
all out. It would also be fine to read the result for a disk immediately
after each long test completes.
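
The loop suggested above might look like the following. It is a sketch rather than a tested production script: `run_long_tests` and the SMARTCTL, LOGGER, DEVICE and WAIT_SECONDS variables are names made up for this example, and the 10300 second default is simply the wait used in the commands above. The "Extended self-test routine recommended polling time" reported by `smartctl -a` is a better guide for a given drive (31 minutes for the WD drive earlier in the thread).

```shell
#!/bin/sh
# Sketch: run a long self-test on each MegaRAID drive ID, one drive at a time.
# All four variables are illustrative knobs, not smartmontools settings.
SMARTCTL="${SMARTCTL:-/usr/sbin/smartctl}"
LOGGER="${LOGGER:-logger}"
DEVICE="${DEVICE:-/dev/sda}"
WAIT_SECONDS="${WAIT_SECONDS:-10300}"   # wait taken from the commands above

run_long_tests() {
    for did in "$@"; do
        # Start the extended self-test on this physical drive.
        "$SMARTCTL" -t long -d "sat+megaraid,$did" "$DEVICE"
        # Give the test time to finish before moving on.
        sleep "$WAIT_SECONDS"
        "$LOGGER" "Done smartctl -t long $DEVICE -d sat+megaraid,$did"
        # Read back the self-test log for the same drive.
        "$SMARTCTL" -l selftest -d "sat+megaraid,$did" "$DEVICE"
    done
}

# Usage: run_long_tests 30 31
```

Because each drive is tested and read back before the next one starts, only one disk in the array is busy with a self-test at any time.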
Post by a***@gmail.com
4. Once I enable smartctl on the command line for a drive is it on
permanently or does it need to be enabled through the smartd.conf file?
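Once enabled it should stay on. That command changes a state which is
stored on the drive, and according to the smartctl man page the SMART
feature settings are, in principle, preserved over power cycling. The man
page still suggests putting the enable command in a start-up script to be
sure.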

Regards,

David Mathog
***@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
Håkon Alstadheim
2017-02-13 19:34:24 UTC
Post by mathog
Post by a***@gmail.com
A few questions....
1. Does anyone know the correct syntax to get it working in the smartd.conf
file?
Post by mathog
Sorry, no.
Post by a***@gmail.com
2. There are two partitions in this system /dev/sda and /dev/sdb. How do I
know which drive Id (DID) belongs to which partition and does it matter if
I assign a physical drive to the incorrect partition?
Get the "megacli" program and install it. Then you can run things like
Second that, get megacli. Interface is atrocious, but there is help in
the program. Wrap the program in a script to reduce typing. The commands
are almost, but not completely, grouped in a logical way, with almost,
but not completely, consistent terminology and abbreviations.


I use:
-----/usr/local/bin/megacli:--------
#!/bin/sh
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/opt/megacli/lib"
if test "$1" = "help" || test "$1" = "-help" ; then
    echo /opt/megacli/megacli "$@"
    exec /opt/megacli/megacli "$@"
else
    echo /opt/megacli/megacli "$@" a0 nolog
    exec /opt/megacli/megacli "$@" a0 nolog
fi
-----------------------

Before I got my head around the options, I hacked up a search facility
for the help texts (see megahelp attached). The most useful way to run
megahelp is like:
$ megahelp <substring>
Where <substring> is something like "ld" or "pd". This will output a
(short) help-text on all commands containing that substring. If you run
it like:
$ megahelp -desc <substring>
it will return hits in the descriptions, rather than the commands
themselves. If you run it like:
$ megahelp -all
You get all possible sub-commands, with no help. Page through that and
look at specific commands with "megahelp <command>".


Everything after __DATA__ in the script attached is output from the
command "megacli help". You should replace it with output from YOUR
version. I also noticed that you get different output from "megacli
help" than you get by querying help for each individual command, so if
you search megahelp for a specific command, it returns the output of
"megacli help <command>" rather than just an excerpt from the output of
"megacli help".

If you don't have a wrapper like this, there is a built-in pager in the
megacli program. Clues to how that works are in the header or footer of
the "megacli help" output. Never used it myself.

Once you know your way around you can find your way from logical disk
("ld"), to pci-id ("megacli adpgetpciinfo a0 nolog"), to
/dev/disk/by-path/pci-<pci-id> .

Perl-snippet from an awful spaghetti mess I use:
------snippet:----
# $megacli is presumably set elsewhere in the script to the megacli command.
local $pci_id = undef;
{
    local ($bus_number, $device_number, $function_number);
    open(PCIID, "$megacli adpgetpciinfo a0 nolog|")
        or die "Could not find pci-info";
    while ($_ = <PCIID>) {
        if (m(^Bus Number[ :]*([0-9]+)))     { $bus_number = $1; }
        if (m(Device Number[ :]*([0-9]+)))   { $device_number = $1; }
        if (m(Function Number[ :]*([0-9]+))) { $function_number = $1; }
    }
    close(PCIID);
    die "Could not find pci-id"
        unless defined($bus_number) && defined($device_number)
            && defined($function_number);
    # Build the bus:device.function string used in /dev/disk/by-path names.
    $pci_id = sprintf("%02d:%02d.%d", $bus_number, $device_number, $function_number);
}
-----snippet ends.----

a***@gmail.com
2017-02-14 00:44:08 UTC
Thanks for the info David. I'll run the tests separately on each drive
then. I thought it could be done on a partition. I can execute most
smartctl commands from the command line but would like the drives
monitored by the smartd daemon instead of running a script. Going by the
documentation at https://linux.die.net/man/5/smartd.conf, it seems the
lines below work in smartd.conf after a restart, once the -H is removed
and replaced with -a. The output from /var/log/messages is below.

/etc/smartd.conf

/dev/sda -d sat+megaraid,30 -a -s S/../.././01
/dev/sda -d sat+megaraid,31 -a -s S/../.././02

/var/log/messages
Feb 13 16:00:53 db011 smartd[32430]: Device: /dev/sda
[megaraid_disk_30] [SAT], not found in smartd database.
Feb 13 16:00:53 db011 smartd[32430]: Device: /dev/sda
[megaraid_disk_30] [SAT], not capable of SMART Health Status check
Feb 13 16:00:53 db011 smartd[32430]: Device: /dev/sda
[megaraid_disk_30] [SAT], is SMART capable. Adding to "monitor" list.
Feb 13 16:00:53 db011 smartd[32430]: Device: /dev/sda
[megaraid_disk_31] [SAT], opened
Feb 13 16:00:53 db011 smartd[32430]: Device: /dev/sda
[megaraid_disk_31] [SAT], WDC WD2500BHTZ-04JCPV0, S/N:WD-WX11E23LW514,
WWN:5-0014ee-65929e323, FW:04.06A00, 250 GB
Feb 13 16:00:53 db011 smartd[32430]: Device: /dev/sda
[megaraid_disk_31] [SAT], not found in smartd database.
Feb 13 16:00:53 db011 smartd[32430]: Device: /dev/sda
[megaraid_disk_31] [SAT], not capable of SMART Health Status check
Feb 13 16:00:53 db011 smartd[32430]: Device: /dev/sda
[megaraid_disk_31] [SAT], is SMART capable. Adding to "monitor" list.
Feb 13 16:00:53 db011 smartd[32430]: Monitoring 2 ATA and 0 SCSI devices
Feb 13 16:00:53 db011 smartd[32444]: smartd has fork()ed into
background mode. New PID=32444.
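
A likely reading of the two logs: with only -H, the single requested check was the SMART Health Status check, which smartd reports as unavailable through this controller, so nothing was left to monitor and the device could not be registered. With -a the other checks (attributes, error and self-test logs) remain, and the drive is added to the monitor list. To cover all 20 drives, lines in the same format can be generated rather than typed. The sketch below only copies the format of the two working entries above. `smartd_lines` is a hypothetical helper, and the staggered hour (01, 02, ... wrapping after 23) is an assumption made here so that the short self-tests do not all start together.

```shell
#!/bin/sh
# Hypothetical generator for smartd.conf entries, one per MegaRAID drive ID.
# Line format copied from the two working entries above; the self-test hour
# starts at 01 and advances by one per drive, wrapping after 23.
smartd_lines() {
    device=$1
    shift
    hour=1
    for did in "$@"; do
        printf '%s -d sat+megaraid,%s -a -s S/../.././%02d\n' \
            "$device" "$did" "$hour"
        hour=$(( ($hour + 1) % 24 ))
    done
}

# Usage: smartd_lines /dev/sda 30 31
```

Called with `/dev/sda 30 31`, this prints the same two lines shown in /etc/smartd.conf above. Review the output before appending it to the real file and restarting smartd.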
