[smartmontools-support] smartmon email notifications are not always very 'smart' (need stateful information)
Anshuman Aggarwal
2017-02-07 03:39:06 UTC
Hi,
the email notifications generated by smartmontools are not very
smart, which limits their usefulness. Here is an
example:

I have a drive that is supposedly failing but will likely last me another
6-8 months (it is part of a RAID 6 cluster, so I don't feel compelled to
replace it right away; it also allowed a full reconstruction of the 2 TB
cluster without any issues). For the last 3 months it has generated the
following error notifications by email every day.


Device: /dev/sdg [SAT], 11 Currently unreadable (pending) sectors

Device: /dev/sdg [SAT], 11 Offline uncorrectable sectors


However, the number of reallocated and unreadable sectors is NOT going
up. It has stayed the same, at 11 and 5, for the last 3-4
months.

What I *would like* to see is a notification only when the number
*changes* from the last time the error was reported, so that I can react
to the drive 'worsening'.

Right now I get the same information repeated every day, which I have
to ignore myself, and I will probably end up ignoring it when the
drive actually fails.

Anybody else feel this issue? Have I missed something in the settings etc?

Cheers,
Anshuman
Ray Andrews
2017-02-07 03:50:05 UTC
Post by Anshuman Aggarwal
Anybody else feel this issue? Have I missed something in the settings etc?
This is exactly my issue too. But I can say that after looking at '77
Currently unreadable (pending) sectors' every day for over a year, I
finally realized that an "$ e2fsck -ccvy ..." style check on the
entire disk will clear the error. But yes, some way of seeing only
further deterioration of the disk would be very nice.
Christian Franke
2017-02-07 06:24:34 UTC
Post by Anshuman Aggarwal
...
I have a drive that is supposedly failing but will likely last me 6-8
months more (as part of a raid 6 cluster so I don't feel compelled to
replace right away, it also allowed a full reconstruction of the 2 TB
cluster without any issues) It gives the following error notifications
everyday for the last 3 months via emails.
Device: /dev/sdg [SAT], 11 Currently unreadable (pending) sectors
Device: /dev/sdg [SAT], 11 Offline uncorrectable sectors
However the number of reallocated and unreadable sectors is NOT going
up. It stays the same at 11 and 5 and has been for the last 3-4
months.
What I *would like* to see is a notification only when the number
*changes* from the last time the error failed so that I can react to
that change that the drive is 'worsening'.
Try the '-C 197+ -U 198+' directives; see the smartd.conf man page.
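
A minimal sketch of how these directives might look in smartd.conf,
assuming the drive from the original post (/dev/sdg) and a hypothetical
mail recipient (root). Per the smartd.conf man page, the '+' suffix asks
smartd to report only when the raw count has increased between two check
cycles, rather than whenever it is non-zero:

```
# /etc/smartd.conf (location is an assumption; it varies by distribution)
# -a       enable the default set of checks (includes '-C 197 -U 198')
# -C 197+  report pending sectors only when the count increases
# -U 198+  report offline uncorrectable sectors only when the count increases
# -m root  hypothetical mail recipient; substitute your own address
/dev/sdg -a -C 197+ -U 198+ -m root
```

smartd needs to be restarted, or told to reload its configuration, before
the change takes effect.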


Regards,
Christian
Anshuman Aggarwal
2017-02-08 07:20:13 UTC
Doh! You always think you're not going to be the noob who hasn't RTFM
but then it happens to you.
Ray Andrews
2017-02-08 17:17:36 UTC
Post by Anshuman Aggarwal
Doh! You always think you're not going to be the noob who hasn't RTFM
but then it happens to you.
I've been trying to RTFM but keep bouncing off the jargon and complexity
of it ... but I'm making progress. As I said previously, I had hoped
for some quick and simple answers but there are none.
Christian Franke
2017-02-09 06:21:36 UTC
Post by Anshuman Aggarwal
Doh! You always think you're not going to be the noob who hasn't RTFM
but then it happens to you.
On 7 February 2017 at 11:54, Christian Franke
PCYMTNQREAIYR[1] !

Thanks,
Christian

[1] https://cygwin.com/acronyms/#PCYMTNQREAIYR