[smartmontools-support] raw values ? help required?

Discussion:

Franc Zabkar

2010-04-16 02:24:05 UTC

Some attributes are best viewed in hexadecimal format. To this end
there is a "hex48" (48-bit hex) switch in smartctl, eg ...

$ smartctl -v 5,hex48 -v 196,hex48 -v 203,hex48 \
    -v 200,Write_Error_Rate -v 240,Transfer_Error_Rate -A /dev/ice

You can also change the names of the attributes, as in the previous example.

This article explains the attributes:
http://en.wikipedia.org/wiki/S.M.A.R.T.

I believe there may also be a switch to display smartctl results in
"Fujitsu format".

Here are three attributes that look better in hex:

Reallocated_Sector_Ct = 9019431321600 = 0x083400000000
Reallocated_Event_Count = 826015744 = 0x0000313c0000
Run_Out_Cancel = 3728044065286 = 0x036400be0206

AIUI, the Reallocated_Sector_Ct consists of three 16-bit words as follows:

(#spare sectors remaining) (#reallocated sectors) (#reallocation events)

So, AISI, the drive has 2100 (=0x0834) spares with no reallocations.

I believe the lower 16 bits of the Reallocated_Event_Count store the
number of reallocation events.

I have no idea what the Run_Out_Cancel data mean, except to say that
the raw value appears to consist of three 16-bit words.
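
As an illustration of the word layout discussed above, here is a minimal Python sketch that splits a 48-bit raw value into three 16-bit words. The function name split_raw48 is made up for this example, and the meaning of each word remains the guess given above, not a documented Fujitsu format.

```python
def split_raw48(raw):
    """Split a 48-bit SMART raw value into three 16-bit words (high, middle, low)."""
    if not 0 <= raw < 1 << 48:
        raise ValueError("raw value does not fit in 48 bits")
    high = (raw >> 32) & 0xFFFF
    middle = (raw >> 16) & 0xFFFF
    low = raw & 0xFFFF
    return high, middle, low


# Raw values taken from the smartctl report in this thread.
for name, raw in [
    ("Reallocated_Sector_Ct", 9019431321600),
    ("Reallocated_Event_Count", 826015744),
    ("Run_Out_Cancel", 3728044065286),
]:
    words = split_raw48(raw)
    print(f"{name:24s} 0x{raw:012x} ->", " ".join(f"0x{w:04x}" for w in words))
```

For Reallocated_Sector_Ct the high word comes out as 0x0834 (= 2100) and the other two words are zero.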

As for the Raw_Read_Error_Rate, Seek_Error_Rate, and Write_Error_Rate
(attribute 200), I understand the raw numbers to be a sector count,
not an error count.

Fujitsu (and Seagate) drives compute the error rates for each block
of sectors accessed. Seagate appears to count up to 250 million
sectors before the number rolls over to zero, whereas Fujitsu appears
to count up to a much smaller number, probably 0x3FFFF (= 262,143).

In short, I don't see anything that would worry me.

Regards,
Franc

Date: Thu, 15 Apr 2010 21:21:27 +0200
Subject: [smartmontools-support] raw values ? help required?
### DISK 1 - BEGIN
Device Model: FUJITSU MJA2500BH G2
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   046    Pre-fail  Always       -       168559
  2 Throughput_Performance  0x0005   100   100   030    Pre-fail  Offline      -       69861376
  5 Reallocated_Sector_Ct   0x0033   100   100   024    Pre-fail  Always       -       9019431321600
  7 Seek_Error_Rate         0x000f   100   100   047    Pre-fail  Always       -       1559
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       18071
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       826015744
200 Multi_Zone_Error_Rate   0x000f   100   100   060    Pre-fail  Always       -       4007
203 Run_Out_Cancel          0x0002   100   100   000    Old_age   Always       -       3728044065286
240 Head_Flying_Hours       0x003e   200   200   000    Old_age   Always       -       0

Franc Zabkar

2010-04-16 03:12:47 UTC

The Raw_Read_Error_Rate for the Hitachi HTS545050B9A300 is 0x00010001
(= 65537), which tallies with the normalised value of 99, ie a loss of
one point. The "read error rate" therefore appears to be 1.

The Power_On_Hours figure of 3710 has resulted in a loss of 8 points.
So, according to SMART, the drive's rated life is between ...

3710 / 9 * 100 / 365 / 24 = 4.7 years

... and ...

3710 / 8 * 100 / 365 / 24 = 5.3 years

The SAMSUNG HM500JI hasn't lost a point after 3712 hours, so it
appears that there is a bug in that attribute.

The WDC WD5000BEVT has lost 2 points after 1485 hours, so its rated
life is between ...

1485 / 3 * 100 / 365 / 24 = 5.65 years

... and ...

1485 / 2 * 100 / 365 / 24 = 8.5 years
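
The rated-life arithmetic for the Hitachi and the WDC can be sketched in Python as follows. This assumes, as the figures above do, that the normalised value falls linearly from 100 and is reported as a whole number, so a loss of N points means between N and N+1 percent of the rated life has been used. The function name is made up for this example.

```python
HOURS_PER_YEAR = 365 * 24


def rated_life_years(power_on_hours, points_lost, scale=100):
    """Return (low, high) estimates of rated life in years.

    Assumption: the normalised value drops linearly, so 'points_lost'
    points correspond to somewhere between points_lost and
    points_lost + 1 parts in 'scale' of the rated life.
    """
    low = power_on_hours / (points_lost + 1) * scale / HOURS_PER_YEAR
    high = power_on_hours / points_lost * scale / HOURS_PER_YEAR
    return low, high


# Hitachi: 3710 hours, 8 points lost; WDC: 1485 hours, 2 points lost.
for drive, poh, lost in [("Hitachi", 3710, 8), ("WDC", 1485, 2)]:
    low, high = rated_life_years(poh, lost)
    print(f"{drive}: between {low:.2f} and {high:.2f} years")
```

A drive that has lost no points, like the Samsung, cannot be estimated this way because the upper bound would divide by zero.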

Your Load_Cycle_Count figures are extremely worrying.

Drive      LCC       POH    Interval between load cycles
----------------------------------------------------------
Hitachi     903893   3710   14.8 s
WDC         263874   1485   20.3 s
SAMSUNG    3480080   3712    3.8 s  <--- is this realistic ???
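
The last column is simply the power-on time in seconds divided by the load cycle count. A minimal Python sketch of that calculation, with a made-up function name:

```python
def load_cycle_interval_s(power_on_hours, load_cycle_count):
    """Average number of seconds between head load/unload cycles."""
    return power_on_hours * 3600 / load_cycle_count


# (drive, Load_Cycle_Count, Power_On_Hours) from the reports in this thread.
drives = [
    ("Hitachi", 903893, 3710),
    ("WDC", 263874, 1485),
    ("SAMSUNG", 3480080, 3712),
]
for drive, lcc, poh in drives:
    print(f"{drive:8s} one load cycle every {load_cycle_interval_s(poh, lcc):.1f} s")
```

This is an average over the whole power-on time, so it says nothing about whether the cycles come in bursts.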

The Hitachi's normalised Load_Cycle_Count is 010, ie it has lost 90
points, so its rated number of load cycles appears to be between ...

903893 / 91 x 100 = 993,289

... and ...

903893 / 90 x 100 = 1,004,325

Assuming the maximum normalised value for the WD is 200, it has lost
87 points (200 - 113), so its rated number of load cycles appears to
be between ...

263874 / 88 x 200 = 599,713

... and ...

263874 / 87 x 200 = 606,606
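
The same estimate can be written as a small Python function that takes the normalised value and its assumed maximum. The function name is made up, and the maximum of 200 for the WD is the assumption stated above, not something the drive reports.

```python
def rated_load_cycles(load_cycle_count, normalised, max_value):
    """Return (low, high) estimates of the rated number of load cycles.

    Assumption: the normalised value falls linearly from max_value to 0
    over the rated number of cycles and is reported as a whole number.
    """
    points_lost = max_value - normalised
    low = load_cycle_count / (points_lost + 1) * max_value
    high = load_cycle_count / points_lost * max_value
    return low, high


# Hitachi: normalised 010 out of 100; WD: normalised 113 out of an assumed 200.
for drive, lcc, value, top in [("Hitachi", 903893, 10, 100), ("WDC", 263874, 113, 200)]:
    low, high = rated_load_cycles(lcc, value, top)
    print(f"{drive}: between {int(low):,} and {int(high):,} load cycles")
```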

The Samsung has already exceeded its expected lifetime.

Regards,
Franc

Date: Thu, 15 Apr 2010 21:21:27 +0200
Subject: [smartmontools-support] raw values ? help required?
One other interesting thing is that all disks (except the replaced one) where
Power_On_Hours Disk
3710 DISK 2: Hitachi Travelstar 5K500.B HTS545050B9A300
1485 DISK 3: Western Digital Scorpio Blue WD5000BEVT
3712 DISK 4: SAMSUNG SpinPoint M7 HM500JI
As you can see, the Western Digital drive differs a lot from the value it's
supposed to be.
=== START OF INFORMATION SECTION ===
Device Model: Hitachi HTS545050B9A300
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   099   099   062    Pre-fail  Always       -       65537
  9 Power_On_Hours          0x0012   092   092   000    Old_age   Always       -       3710
193 Load_Cycle_Count        0x0012   010   010   000    Old_age   Always       -       903893
### DISK 3 - BEGIN
Device Model: WDC WD5000BEVT-00A0RT0
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1485
193 Load_Cycle_Count        0x0032   113   113   000    Old_age   Always       -       263874
### DISK 4 - BEGIN
Device Model: SAMSUNG HM500JI
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       3712
225 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       3480080

Tim Small

2010-04-16 10:51:10 UTC

Post by Franc Zabkar
Your Load_Cycle_Count figures are extremely worrying.

FYI:

https://ata.wiki.kernel.org/index.php/Known_issues#Drives_which_perform_frequent_head_unloads_under_Linux

Try adjusting the APM values and see whether the load cycle counts stop
increasing (this doesn't work for WD drives; you'll need to use their
DOS tool).

Tim.

--
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53 http://seoss.co.uk/ +44-(0)1273-808309

Maciej Żenczykowski

2010-04-16 16:47:56 UTC

I have a system with two 2.5" Seagate Momentus 500GB drives, and the
BIOS appears to default to hdparm -B 128.
Since I have these drives in a raid configuration, and this setting
(and indeed any setting < 254) results in these drives being very slow
(it appears every time raid sends a request to the other drive, the
previous drive 'unloads/parks/stops' or does something equivalently
stupid), for me the solution is to run "hdparm -B 254" as early as
possible during boot (as early as possible, because the drives are
_ridiculously_ slow otherwise and thus the boot takes ...ages...).
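
For anyone wanting to script this, here is a hedged Python sketch that wraps the hdparm call. The function names and the /dev/sda device path are made up for the example; only the -B option itself comes from hdparm, whose man page gives the valid range as 1 to 255, with 255 disabling APM altogether. How to hook it into early boot depends on the distribution and is not shown.

```python
import subprocess


def hdparm_apm_command(device, level=254):
    """Build the argument list for 'hdparm -B LEVEL DEVICE'."""
    if not 1 <= level <= 255:
        raise ValueError("APM level must be between 1 and 255")
    return ["hdparm", "-B", str(level), device]


def set_apm(device, level=254):
    """Run hdparm to set the APM level; needs root and a drive that supports APM."""
    subprocess.run(hdparm_apm_command(device, level), check=True)


# Example: show the command that would be run for a hypothetical /dev/sda.
print(" ".join(hdparm_apm_command("/dev/sda")))
```

Calling set_apm("/dev/sda") would actually issue the command, so it is left out of the example above.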

- Maciej

Franc Zabkar

2010-04-16 21:46:37 UTC

I would examine the SMART reports a day later, and compare the POH
counts to see if they have advanced by the same amount.

If the WDC lags behind, then it may be that the POH count is not
incremented during power saving mode. Or could there be a bug in the
firmware that causes the real time clock to advance by only 2 ticks
in every 5 ???

ie 3712 x (2/5) = 1484.8
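
The day-later comparison could be done along these lines. This is only a sketch: the function name is made up, and the "after" readings are HYPOTHETICAL numbers invented to show what a 2-in-5 counter would look like, not measurements from these drives.

```python
def poh_advance(before, after):
    """Return how many hours each drive's Power_On_Hours advanced between two reports."""
    return {drive: after[drive] - before[drive] for drive in before}


# POH readings from the reports in this thread.
before = {"Hitachi": 3710, "WDC": 1485, "SAMSUNG": 3712}
# HYPOTHETICAL readings a day later, made up purely to illustrate the comparison.
after = {"Hitachi": 3734, "WDC": 1495, "SAMSUNG": 3736}

advance = poh_advance(before, after)
reference = max(advance.values())
for drive, hours in advance.items():
    print(f"{drive:8s} advanced {hours:3d} h ({hours / reference:.2f} of the fastest counter)")
```

If the WDC really does count 2 hours in every 5, its share would stay close to 0.4 however long the interval between the two reports.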

Regards,
Franc

Franc Zabkar

2010-04-16 21:59:09 UTC

There is a Dell firmware upgrade that appears to solve the frequent
parking issue.

Here is the upgrade matrix which I extracted from the update package:
http://www.users.on.net/~fzabkar/dell_fw_cfg.txt

The format appears to be ...

(model #) (existing F/W modules) (F/W update image) (updated F/W modules)

There is a long thread at Seagate's forums that explains how to
forcibly flash a retail drive with Dell's F/W distribution.

http://forums.seagate.com/t5/Internal-ATA-and-Serial-ATA/ST9500420ASG-Momentus-7200-4-clicking-noise/m-p/47457#M18798

If you want to try it, and you need help, let me know, or join the thread.

Regards,
Franc

Post by Maciej Żenczykowski
I have a system with two 2.5" Seagate Momentus 500GB drives, and the
BIOS appears to default to hdparm -B 128.
Since I have these drives in a raid configuration, and this setting
(and indeed any setting < 254) results in these drives being very slow
(it appears every time raid sends a request to the other drive, the
previous drive 'unloads/parks/stops' or does something equivalently
stupid), for me the solution is to run "hdparm -B 254" as early as
possible during boot (as early as possible, because the drives are
_ridiculously_ slow otherwise and thus the boot takes ...ages...).
- Maciej

Tim Small

2010-04-17 10:34:03 UTC

Post by Franc Zabkar
There is a Dell firmware upgrade that appears to solve the frequent
parking issue.
http://www.users.on.net/~fzabkar/dell_fw_cfg.txt
The format appears to be ...
(model #) (existing F/W modules) (F/W update image) (updated F/W modules)
There is a long thread at Seagate's forums that explains how to
forcibly flash a retail drive with Dell's F/W distribution.
http://forums.seagate.com/t5/Internal-ATA-and-Serial-ATA/ST9500420ASG-Momentus-7200-4-clicking-noise/m-p/47457#M18798
If you want to try it, and you need help, let me know, or join the thread.

FYI, in the past couple of weeks, I've successfully used hdparm 9.27 to
carry out a firmware upgrade on a few Seagate drives (two different
Enterprise 3.5" models) under Linux.

https://ata.wiki.kernel.org/index.php/Talk:Known_issues

However, you have to be pretty sure that Linux isn't going to access
the drive, or it will issue a bus reset during the firmware update
process. For example, if you need to carry out the update on a boot
drive, you could prepare the files, then reboot into single user mode
with all filesystems remounted read-only.

Tim.

Franc Zabkar

2010-04-17 22:44:58 UTC

Thanks very much for that information. I had just recently been asked
if this was possible. I expected that the Dell header needed to be
stripped, but wasn't sure whether the firmware was compatible.

Be aware that certain upgrades will render your drive inoperable. For
example, Seagate warns that updating a CCxx 7200.11 drive with SDxx
firmware, or vice versa, will turn it into a paperweight. I have seen
the results of such an attempt at HDD Guru.

Furthermore, as seen in the following update matrix, sometimes the
firmware revisions are not intuitive.

http://www.users.on.net/~fzabkar/dell_fw_cfg.txt

For example, 0003SDM1 firmware is updated with 0004SDM1.lod, and
002SDM1 with 0005SDM1.lod.

-Franc

Post by Tim Small
FYI, in the past couple of weeks, I've successfully used hdparm 9.27
to carry out a firmware upgrade on a few Seagate drives (two
different Enterprise 3.5" models) under Linux.
https://ata.wiki.kernel.org/index.php/Talk:Known_issues
.. however you have to be pretty sure that Linux isn't going to
access the drive, or it will issue a bus reset during the firmware
update process. e.g. if you need to carry out the update on a boot
drive, then you could prepare the files, then reboot into single
user mode, with all filesystems remounted read-only.
Tim.