Discussion:
[smartmontools-support] Using SMARTd to monitor drives
Thane Sherrington
2017-01-24 00:05:57 UTC
Hi there,

I've read through the manpage on SMARTD, but clearly I'm too dense
to grasp it.

I understand that I can do a devicescan in the smartd.conf file to
scan for all devices, so I have the following in the smartd.conf

DEVICESCAN -l error -l xerror -s L/../.././(00|06|12|18)

Now as I understand it, this should log errors and extended errors, and
run a long test at 12AM, 6AM, 12PM and 6PM.

Am I doing this correctly?

T
Christian Franke
2017-01-30 20:55:25 UTC
Post by Thane Sherrington
I've read through the manpage on SMARTD, but clearly I'm too dense
to grasp it.
I understand that I can do a devicescan in the smartd.conf file to
scan for all devices, so I have the following in the smartd.conf
DEVICESCAN -l error -l xerror -s L/../.././(00|06|12|18)
Now as I understand it, this should log errors and extended errors, and
run a long test at 12AM, 6AM, 12PM and 6PM.
Yes, but this does not actually 'log errors'. It issues LOG_CRIT
messages if the maximum of both error counts has increased. Pending and
offline uncorrectable sectors (attribute 197 and 198) are also checked
by default unless -C 0 -U 0 is specified.

Note that you could print the test schedules with 'smartd -q showtests'
(see smartd man page).
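
As a rough way to reason about a '-s' expression without waiting for smartd, the matching can be sketched in shell. This is only an illustration under assumptions: it assumes smartd compares the regex against a string of the form T/MM/DD/d/HH (d: 1=Monday .. 7=Sunday) and requires the whole string to match, which is emulated here with 'grep -Ex'. The helper names schedule_matches and count_daily_runs are hypothetical and not part of smartmontools; 'smartd -q showtests' remains the authoritative check.

```shell
#!/bin/sh
# Sketch: would a smartd "-s" schedule regex fire at a given time?
# Assumption: the regex must match the entire string T/MM/DD/d/HH.

# schedule_matches REGEX TYPE MONTH DAY WEEKDAY HOUR   (hypothetical helper)
schedule_matches() {
    printf '%s/%s/%s/%s/%s\n' "$2" "$3" "$4" "$5" "$6" | grep -Eqx -- "$1"
}

# count_daily_runs REGEX TYPE MONTH DAY WEEKDAY
# Counts how many of the 24 hours of that day match the regex.
count_daily_runs() {
    runs=0
    for hour in 00 01 02 03 04 05 06 07 08 09 10 11 \
                12 13 14 15 16 17 18 19 20 21 22 23; do
        if schedule_matches "$1" "$2" "$3" "$4" "$5" "$hour"; then
            runs=$((runs + 1))
        fi
    done
    echo "$runs"
}

# Schedule from the question, checked for Tuesday 2017-01-24: prints 4
count_daily_runs 'L/../.././(00|06|12|18)' L 01 24 2

# Weekly schedule (long test on Saturday at 17:00), checked for
# Saturday 2017-01-28: prints 1
count_daily_runs '(S/../../[1-57]/17|L/../../6/17)' L 01 28 6
```

The first call shows the four long tests per day; the second shows a single long test on Saturday, with the short-test alternative skipping weekday 6.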
Post by Thane Sherrington
Am I doing this correctly?
Yes and no. It is probably a bad idea to run 4 long(!) tests a day.

Here is the default setting I use on one machine ('console' and ',ns'
are Windows specific). One long test on Saturday, short tests each other
day.

DEFAULT \
-n standby \
-a -l xerror -l selfteststs,ns -l offlinests,ns \
-R 5! -r 5! \
-R 177! -r 177! \
-W 3,40,45 -I 190 -I 194 \
-s (S/../../[1-57]/17|L/../../6/17) \
-m console -M daily -M test

Thanks,
Christian
Ray Andrews
2017-01-30 21:35:42 UTC
[Post content is hidden in the archive.]
Christian Franke
2017-01-31 09:21:43 UTC
Post by Ray Andrews
...
It really would be nice if one could control the timing of reports
Christian.
The current possible settings -M once (default if state persistence is
disabled) and -M daily (default if state persistence is enabled) are
IMO sufficient. The warning emails (or scripts) are only intended to
alert the admin about serious problems immediately after detection -
with optional daily reminders if the problem persists.

Feel free to request an enhancement here:
https://www.smartmontools.org/newticket
Post by Ray Andrews
As above, I've found that it doesn't matter how often I
test, the running of smartd-notifier remains random.
No, if smartd is running 24x7, a reminder email is sent every ~24 hours
for each device and each type of disk problem detected. If smartd is not
running 24x7, it depends on the state persistence setting (enabled by
default on Debian/Ubuntu).

It is logged to syslog as: "Sending warning via MAILER to ADDRESS ..."
immediately after the actual warning message.
Post by Ray Andrews
2017-01-26_Thu_19:56:28:IDENTICAL
2017-01-26_Thu_19:56:28:Device: /dev/sdb [SAT], Self-Test Log error
count increased from 15 to 16
This means that another SMART self test has failed. Did you check the
self-test log(s) in output of 'smartctl -x /dev/sdb' ? A failed
self-test typically reports the LBA of the first bad sector.
Post by Ray Andrews
2017-01-28_Sat_08:33:09:Device: /dev/sdb [SAT], 77 Currently unreadable
(pending) sectors
...
BTW, I can't clear that error no matter what I do :-/
In a previous mail from 2017-01-19, I already explained how to disable
the pending sectors alert with the -C directive.

Regards,
Christian
Ray Andrews
2017-02-03 19:55:13 UTC
Post by Christian Franke
It is logged to syslog as: "Sending warning via MAILER to ADDRESS ..."
immediately after the actual warning message.
That's interesting, how do I enable that?
Post by Christian Franke
This means that another SMART self test has failed. Did you check the
self-test log(s) in output of 'smartctl -x /dev/sdb' ? A failed
self-test typically reports the LBA of the first bad sector.
It would sure be nice if there was a way of just getting the errors or
important changes. The above is so hard to use as far as seeing
important things.
Post by Christian Franke
Post by Ray Andrews
BTW, I can't clear that error no matter what I do :-/
In a previous mail from 2017-01-19, I already explained how to disable
the pending sectors alert with the -C directive.
Yes, BUT, I want to see if the count ever increases. That is, I don't
want to see '77' over and over again, but I do want to see if it goes to 78.

You know Christian, you have a very powerful program, but for a
non-expert it can be difficult to figure out how to make it 'just work'
in a simple, easy and normal way. I had not expected to devote dozens
of hours to figuring this out, I just want timely messages of *new*
problems. The complexity gives one great power, but one is lost trying
to do simple things. A basic tutorial or howto would be very nice!
Gregory Sloop
2017-02-03 20:11:54 UTC
[Post content is hidden in the archive.]
Gabriele Pohl
2017-02-03 21:28:49 UTC
On Fri, 3 Feb 2017 12:11:54 -0800
Post by Gregory Sloop
Post by Ray Andrews
A basic tutorial or howto would be very nice!
And here's where you can "pay-it-forward" so to speak. Christian has done all this work for you [and me] for no charge. How about you spend some time giving back to the community and do a well-written FAQ? Again, this isn't meant as an attack, just a gentle reminder that this is a community, and a community's success is measured by how well the whole community steps up to take on their responsibilities and offer their time/money/resources.
Thanks for your impassioned plea!

I would be delighted to see tutorials on recent smartmontools versions,
especially ones covering the handling of notifications,
and will link to them on our homepage:

https://www.smartmontools.org/wiki/TocDoc#Tutorials

where you can already find several covering older versions,
which are still helpful because Christian pays close attention
to backward compatibility ~

fyi and cheers!

Gabriele
Ray Andrews
2017-02-04 01:12:26 UTC
Post by Gabriele Pohl
https://www.smartmontools.org/wiki/TocDoc#Tutorials
where you can already find several covering older versions,
which are still helpful because Christian pays close attention
to backward compatibility ~
fyi and cheers!
Gabriele
Many thanks!
Ray Andrews
2017-02-04 01:10:04 UTC
Post by Gregory Sloop
This certainly isn't an attempt at snark - but smartmontools is, I
think, intended at sysadmin type people. There are tools for end-users
that are less technical and more idiot-light style.
I'm not an idiot but at the same time there are folks who want utility
without devoting years of study to get it.
Post by Gregory Sloop
I think, in general, that I'd prefer smartmontools to keep the
detailed technical focus and not have the more friendly interface, if
doing so meant that the technical prowess of smt had to be diminished.
Does it really have to be one or the other?
Post by Gregory Sloop
[All projects are resource limited - and spending more time on one
thing usually means less somewhere else.]
And note that smt is provided to users at no cost. Those "easy"
programs aren't free and, IMO, provide less - so the "cost" for them
is dual; you get less technical detail and programmatic control, and a
closed eco-system, with no access to source code, etc along with the
license fees.
Post by Gregory Sloop
Post by Ray Andrews
A basic tutorial or howto would be very nice!
And here's where you can "pay-it-forward" so to speak. Christian has
done all this work for you [and me] for no charge. How about you spend
some time giving back to the community and do a well-written FAQ?
Again, this isn't meant as an attack, just a gentle reminder that this
is a community, and a community's success is measured by how well the
whole community steps up to take on their responsibilities and offer
their time/money/resources.
No, that's fine, a very valid request. You know, I believe that the
only ones who can write a useful how-to are the people who have just now
learned how to do something because once you are an expert, you forget
what it was like to not be an expert. We 'forget what we know', that
is, we know things without being aware of the knowledge and so we forget
to write them down for the one who knows nothing -- things become too
obvious to mention. For Christian it is now second nature how SMT works
so he has no empathy for the newbie, that's understandable. If I myself
could become competent I'd do just as you say! Alas, I'm still trying
to figure it out myself.
Gregory Sloop
2017-02-04 02:25:58 UTC
Post by Ray Andrews
Post by Gregory Sloop
This certainly isn't an attempt at snark - but smartmontools is, I think, intended at sysadmin type people. There are tools for end-users that are less technical and more idiot-light style.
I'm not an idiot but at the same time there are folks who want utility without devoting years of study to get it.
I think you're putting us on a bit. Years of study? :)
But yes, it is rather voluminous, the options and such. *nix tools are like that, often. Go read the man page for find sometime. But to do so much in a command-line tool, that's usually the way of it.

I wish it were different, when I'm trying to do something complex [for example in find] and I have to spend 45 minutes trying to make it work, and can't. I wish it were easy. I wish a google search would give me something I could mostly cut/paste in. [As long as I'm wishing, I'd like a pony too!]

But alas, sometimes it's just not that easy.
Post by Ray Andrews
Post by Gregory Sloop
I think, in general, that I'd prefer smartmontools to keep the detailed technical focus and not have the more friendly interface, if doing so meant that the technical prowess of smt had to be diminished.
Does it really have to be one or the other?
I'm sure it wouldn't, if resources were unlimited. But Christian only has so much time he can work on this. Heck, I wish he'd write a native Windows port with a built in mailer and GUI. But to do that, he'd have to either get someone to pay for it, or cut somewhere else. Since I'm sure I can't pay for his dev time, I make do with what I get: A great tool that does a vast amount of stuff, and supports a vast number of devices. I've written my own code to handle routine-checks, emailing when non success events occur etc. I did it in power-shell. No, it's not at all perfect - but it works and Christian didn't have to forgo some other development work that's probably more important and certainly demands more technical skill that he has and I don't.
Post by Ray Andrews
If I myself could become competent I'd do just as you say! Alas, I'm still trying to figure it out myself.
I hope you'll pause and take the time, once you've become more competent, to write up what you can. At worst, no one will use it. At best, it could be a distinct help to many.

Cheers!
-Greg
Ray Andrews
2017-02-04 02:44:18 UTC
Post by Gregory Sloop
I think you're putting us on a bit. Years of study? :)
Well ok, months then ;-)
Post by Gregory Sloop
But yes, it is rather voluminous, the options and such. *nix tools are
like that, often. Go read the man page for find sometime. But to do so
much in a command-line tool, that's usually the way of it.
God, I know. I sometimes make cut down man pages that get rid of the
settings for the Coptic calendar and base 7 time reporting -- one will
only ever use 10% of what's available in some commands. And the text is
usually written by experts for experts, that is, you have to already be
an expert to even understand what's being said.
Post by Gregory Sloop
But alas, sometimes it's just not that easy.
Sure, that's the culture. But I rebel! I like docs that tell real
people what they need to know for real situations in language that they
can understand.
Post by Gregory Sloop
Post by Ray Andrews
Post by Gregory Sloop
I think, in general, that I'd prefer smartmontools to keep the
detailed technical focus and not have the more friendly interface, if
doing so meant that the technical prowess of smt had to be diminished.
Does it really have to be one or the other?
I'm sure it wouldn't, if resources were unlimited. But Christian only
has so much time he can work on this. Heck, I wish he'd write a native
Windows port with a built in mailer and GUI. But to do that, he'd have
to either get someone to pay for it, or cut somewhere else. Since I'm
sure I can't pay for his dev time, I make do with what I get: A great
tool that does a vast amount of stuff, and supports a vast number of
devices. I've written my own code to handle routine-checks, emailing
when non success events occur etc. I did it in power-shell. No, it's
not at all perfect - but it works and Christian didn't have to forgo
some other development work that's probably more important and
certainly demands more technical skill that he has and I don't.
Sure! We don't forget this is a volunteer effort.
Post by Gregory Sloop
Post by Ray Andrews
If I myself could become competent I'd do just as you say! Alas,
I'm still trying to figure it out myself.
I hope you'll pause and take the time, once you've become more
competent, to write up what you can. At worst, no one will use it. At
best, it could be a distinct help to many.
Maybe! I'm a pretty famous document writer in my other life.
Post by Gregory Sloop
Cheers!
-Greg
Christian Franke
2017-02-04 11:24:41 UTC
Post by Ray Andrews
Post by Christian Franke
It is logged to syslog as: "Sending warning via MAILER to ADDRESS ..."
immediately after the actual warning message.
That's interesting, how do I enable that?
You cannot enable it. It is enabled by default and cannot be disabled.

Your question suggests that you never checked the SYSLOG output of
smartd. Please do this first before asking further questions here.

On Debian, try:
grep -w smartd /var/log/daemon.log
or for older logs:
zgrep -w smartd /var/log/daemon.log.*
Post by Ray Andrews
Post by Christian Franke
In a previous mail from 2017-01-19, I already explained how to disable
the pending sectors alert with the -C directive.
Yes, BUT, I want to see if the count ever increases. That is, I don't
want to see '77' over and over again, but I do want to see if it goes to 78.
This is exactly the effect of the example from the mentioned mail.

Regards,
Christian
Ray Andrews
2017-02-04 15:15:10 UTC
Post by Christian Franke
Your question suggests that you never checked the SYSLOG output of
smartd. Please do this first before asking further questions here.
grep -w smartd /var/log/daemon.log
zgrep -w smartd /var/log/daemon.log.*
That is very useful thanks. Honestly I thought you could only examine
syslog under systemd using the official tools like 'journalctl' , they
are always saying that it's binary now. I had no idea you could just
grep. The entire subject of access to logs is another thing that has
become very difficult to understand. Until now, I've been using
"journalctl -p err -b -1", and not seeing very much :(
Post by Christian Franke
Post by Ray Andrews
Yes, BUT, I want to see if the count ever increases. That is, I don't
want to see '77' over and over again, but I do want to see if it goes to 78.
This is exactly the effect of the example from the mentioned mail.
I didn't realize that. The docs seem to say it kills the entire test.
As I replied last night, all of this is obvious to you, but not to
others. But sorry to test your patience.
Christian Franke
2017-02-05 15:30:38 UTC
Post by Ray Andrews
Post by Christian Franke
Your question suggests that you never checked the SYSLOG output of
smartd. Please do this first before asking further questions here.
grep -w smartd /var/log/daemon.log
zgrep -w smartd /var/log/daemon.log.*
That is very useful thanks. Honestly I thought you could only examine
syslog under systemd using the official tools like 'journalctl' , they
are always saying that it's binary now. I had no idea you could just
grep.
Fortunately, the traditional log files created by some variant of the
syslog daemon (on Debian: rsyslogd) still exist even on systems that
have moved to systemd.
Post by Ray Andrews
The entire subject of access to logs is another thing that has
become very difficult to understand. Until now, I've been using
"journalctl -p err -b -1", and not seeing very much :(
Of course, as you restricted the messages to LOG_ERR or worse (-p err)
and to previous boot (-b -1). Most of smartd's messages are invisible
then because these use LOG_INFO level.

Note that systemd journals are often not persistent by default (at least
on Debian). Using 'journalctl -b -1' always fails then.

Regards,
Christian
Ray Andrews
2017-02-05 16:53:24 UTC
Post by Christian Franke
Of course, as you restricted the messages to LOG_ERR or worse (-p err)
and to previous boot (-b -1). Most of smartd's messages are invisible
then because these use LOG_INFO level.
Live and learn. Smartd is the first program that has required me to
look any further into these logs, so before now I've only learned to
list the errors and I thought all these messages would be considered
errors. There is so much to know, and very little help for the
beginner. Things can become so complex that even the masters forget
how they work. IMHO there should always be a simple introduction to any
program and a simple way to use it for simple things. Anyway, after
what you showed me I wrote an alias:

# cut down log of smartd activity fits within 80 character terminal:
alias smart='grep -w smartd /var/log/daemon.log | sed -r "s|^(.{16})(.{43})(.{3})(.{6})|\1\3|" | sed "s|SMART Usage Attribute||"'

... so now I can see activity whenever I want:

...
Feb 3 18:57:22 sdc, 77 Currently unreadable (pending) sectors
Feb 3 19:27:17 sdb, : 190 Airflow_Temperature_Cel changed from 68 to 69
Feb 3 19:27:17 sdb, : 194 Temperature_Celsius changed from 111 to 112
Feb 3 19:27:21 sdc, 77 Currently unreadable (pending) sectors
Feb 3 19:57:17 sdb, : 190 Airflow_Temperature_Cel changed from 69 to 67
Feb 3 19:57:18 sdb, : 194 Temperature_Celsius changed from 112 to 110
Feb 3 19:57:21 sdc, 77 Currently unreadable (pending) sectors
Binary file /var/log/daemon.log matches

... and now I can try to figure out how to filter out the pointless
messages. BTW interesting that grep considers the file binary, but you
can do text searches.
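
One possible way to drop the repeated pending-sector lines is to print such a line only when the count for that device differs from the previous one. The following is a sketch under assumptions, not a smartmontools feature: the function name pending_changes is hypothetical, and it assumes the unshortened syslog format "Device: /dev/sdX [SAT], N Currently unreadable (pending) sectors" quoted earlier in this thread. Note that the smartd.conf man page also documents a '+' suffix for the -C and -U directives (for example -C 197+) that restricts smartd's own reports to increases; this filter only post-processes existing log output.

```shell
#!/bin/sh
# Sketch: read smartd syslog lines on stdin and print a pending-sector
# line only if its count differs from the last one seen for that device.
# pending_changes is a hypothetical helper name.
pending_changes() {
    awk '
    /Currently unreadable \(pending\) sectors/ {
        dev = ""; count = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "Device:")   dev = $(i + 1)    # e.g. /dev/sdc
            if ($i == "Currently") count = $(i - 1)  # e.g. 77
        }
        if (!(dev in last) || last[dev] != count) print
        last[dev] = count
        next
    }
    { print }   # all other messages pass through unchanged
    '
}
```

It could be used as 'grep -a -w smartd /var/log/daemon.log | pending_changes'; the -a option makes grep treat the log as text even when it is detected as binary.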
Post by Christian Franke
Note that systemd journals are often not persistent by default (at least
on Debian). Using 'journalctl -b -1' always fails then.
Yes, I've learned to make it persistent already.
Carlos E. R.
2017-02-06 11:41:58 UTC
Post by Ray Andrews
Post by Christian Franke
Your question suggests that you never checked the SYSLOG output of
smartd. Please do this first before asking further questions here.
grep -w smartd /var/log/daemon.log
zgrep -w smartd /var/log/daemon.log.*
That is very useful thanks. Honestly I thought you could only examine
syslog under systemd using the official tools like 'journalctl' , they
are always saying that it's binary now. I had no idea you could just
grep. The entire subject of access to logs is another thing that has
become very difficult to understand. Until now, I've been using
"journalctl -p err -b -1", and not seeing very much :(
With the systemd journal you could do:

journalctl | grep -w smartd

with the same effect as the commands above. Internally, the journal is
binary, but you can inspect the text and work on it as always - albeit
more slowly.
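
The two approaches from this thread can be combined in a small wrapper, sketched here under assumptions: the function name smartd_messages and the SMARTD_LOG override variable are hypothetical, and /var/log/daemon.log is the Debian location mentioned above (other distributions may use a different file or none at all).

```shell
#!/bin/sh
# Sketch: print smartd messages from a classic syslog file if one is
# readable, otherwise fall back to the systemd journal.
# SMARTD_LOG is a hypothetical override for the log file path.
smartd_messages() {
    log_file="${SMARTD_LOG:-/var/log/daemon.log}"
    if [ -r "$log_file" ]; then
        # -a: treat the file as text even if grep detects binary data
        grep -a -w smartd "$log_file"
    else
        # No syslog file available: search the journal output instead
        journalctl --no-pager | grep -w smartd
    fi
}
```

Unlike 'journalctl -p err', neither branch filters by priority, so smartd's LOG_INFO messages stay visible.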
--
Cheers / Saludos,

Carlos E. R.
(from 42.2 x86_64 "Malachite" at Telcontar)
Ray Andrews
2017-02-06 18:49:56 UTC
Post by Carlos E. R.
journalctl | grep -w smartd
Thanks, that's better. The previous thing stopped giving me any
information. One day I'm going to have to study the entire logging
mechanism.
Carlos E. R.
2017-02-06 19:16:27 UTC
Post by Ray Andrews
Post by Carlos E. R.
journalctl | grep -w smartd
Thanks, that's better. The previous thing stopped giving me any
information. One day I'm going to have to study the entire logging
mechanism.
I don't know about your distro, but in openSUSE both journal and syslog
can coexist, although the journal is the boss (and the only one by
default). A syslog daemon had to be installed, I used rsyslog. I don't
have permanent journal files, and I limited its size.
--
Cheers / Saludos,

Carlos E. R.
(from 42.2 x86_64 "Malachite" at Telcontar)
Ray Andrews
2017-02-06 19:46:43 UTC
Post by Carlos E. R.
Post by Ray Andrews
Post by Carlos E. R.
journalctl | grep -w smartd
Thanks, that's better. The previous thing stopped giving me any
information. One day I'm going to have to study the entire logging
mechanism.
I don't know about your distro, but in openSUSE both journal and syslog
can coexist, although the journal is the boss (and the only one by
default). A syslog daemon had to be installed, I used rsyslog. I don't
have permanent journal files, and I limited its size.
God knows, I've just not given it much study. As I said, before smartd,
all I ever cared about was errors.

BTW, just now I'm wondering what the relationship is between


# default is every 30 minutes:
#smartd_opts="--interval=1800"

in '/etc/default/smartmontools'

and the '-i' switch used in '/etc/smartd.conf'. The latter seems to
have a different unit but in 'man smartd.conf' the switch is mentioned
only in the text, it does not have a section devoted to it. It seems
legal to use but I'm not sure what's going on there and the value can't
be above 255, it seems.

And:

I've tried '-n standby' but it seems to be ignored, all my disks spin up
after '$ smartd -q onecheck' even if I had just spun them down:

$ hdparm -S60y /dev/sda; hdparm -S60y /dev/sdb; hdparm -S60y /dev/sdc

/dev/sda:
setting standby to 60 (5 minutes)
issuing standby command

/dev/sdb:
setting standby to 60 (5 minutes)
issuing standby command

/dev/sdc:
setting standby to 60 (5 minutes)
issuing standby command

$ smartd -q onecheck
...

$ hdparm -C /dev/sda; hdparm -C /dev/sdb; hdparm -C /dev/sdc

/dev/sda:
drive state is: active/idle

/dev/sdb:
drive state is: active/idle

/dev/sdc:
drive state is: active/idle
Carlos E. R.
2017-02-06 22:01:22 UTC
Post by Ray Andrews
BTW, just now I'm wondering what the relationship is between
#smartd_opts="--interval=1800"
in '/etc/default/smartmontools'
and the '-i' switch used in '/etc/smartd.conf'.
Well, I don't even have '/etc/default/smartmontools', it appears to be
an addition of your distribution. On a previous email, Christian Franke
said (9 Dec 2016 07:44:44 +0100, Re: '#start_smartd=yes' in
/etc/default/smartmontools is ignored):

«Please note that /etc/default/smartmontools and its evaluation in the
/etc/init.d/smartmontools is specific to the Ubuntu (and Debian)
packages. It is not part of upstream smartmontools code.»

«The settings in this file may no longer be effective for distributions
using systemd.»


Thus I don't know what those settings do; my documentation does not apply
to your setup.
Post by Ray Andrews
I've tried '-n standby' but it seems to be ignored, all my disks spin up
If you trigger a test I expect the disk to spin up immediately.
--
Cheers / Saludos,

Carlos E. R.
(from 42.2 x86_64 "Malachite" at Telcontar)
Christian Franke
2017-02-07 06:40:27 UTC
Post by Ray Andrews
...
BTW, just now I'm wondering what the relationship is between
#smartd_opts="--interval=1800"
in '/etc/default/smartmontools'
and the '-i' switch used in '/etc/smartd.conf'. The latter seems to
have a different unit but in 'man smartd.conf' the switch is mentioned
only in the text, it does not have a section devoted to it. It seems
legal to use but I'm not sure what's going on there and the value can't
be above 255, it seems.
Besides the fact that '-i' is explained on smartd.conf man page, I
already explained the effects of both '-i' in my first answer from
2017-01-19. Please read & understand previous answers before asking same
question again.
Post by Ray Andrews
I've tried '-n standby' but it seems to be ignored, all my disks spin up
$ hdparm -S60y /dev/sda; hdparm -S60y /dev/sdb; hdparm -S60y /dev/sdc
setting standby to 60 (5 minutes)
issuing standby command
setting standby to 60 (5 minutes)
issuing standby command
setting standby to 60 (5 minutes)
issuing standby command
BTW, you could also do this with 'smartctl -s standby,N ...'
Post by Ray Andrews
$ smartd -q onecheck
...
$ hdparm -C /dev/sda; hdparm -C /dev/sdb; hdparm -C /dev/sdc
drive state is: active/idle
This is as expected. The initial drive registration (SMART enable,
capability checks) is always done by smartd. This may or may not spin up
drives, depending on drive firmware.

Quote from smartd.conf man page
"This 'nocheck' Directive is used to prevent a disk from being spun-up
when it is *periodically polled* by smartd."


Regards,
Christian
Ray Andrews
2017-02-07 19:03:58 UTC
Post by Christian Franke
Besides the fact that '-i' is explained on smartd.conf man page,
Yes, my mistake, pardon. I forget which of several man pages I'm
looking at. Above question was ad hoc, I should have given it more thought.
Post by Christian Franke
I already explained the effects of both '-i' in my first answer from
2017-01-19. Please read & understand previous answers before asking same
question again.
Alas, answers drown in the sea of confusion sometimes.
Post by Christian Franke
BTW, you could also do this with 'smartctl -s standby,N ...'
Thanks. But finally I'm beginning to understand the man page. As I've
confessed, really I have just wanted quick, easy answers without deep
understanding; I had not expected to devote much study to SMART, but it
seems that deep study cannot be avoided. For the beginner, it is
overwhelming.
