[smartmontools-support] how to control how often smartd-runner fires.

Post by Ray Andrews
Also, I get the same message over and over again about my 77 pending
sectors, but it's been 77 pending sectors for years, and I'd really
rather not see any further notifications about it until something else goes
wrong.

Maybe you should try to "repair" them.

--
Cheers / Saludos,

Carlos E. R.
(from 42.2 x86_64 "Malachite" at Telcontar)

Ray Andrews

2017-01-18 23:27:36 UTC

Maybe you should try to "repair" them.

How? I've presumed that bad sectors are bad sectors forever.

Post by Carlos E. R.
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Smartmontools-support mailing list
https://lists.sourceforge.net/lists/listinfo/smartmontools-support

Carlos E. R.

2017-01-19 01:45:02 UTC

Maybe you should try to "repair" them.

How? I've presumed that bad sectors are bad sectors forever.

No, they get relocated.

Hard disks have an area dedicated since manufacture to relocate bad
sectors, automatically when trying to write to a bad sector. The pending
sector value means just that: they are marked for reallocation the
instant you try to write them, and there is a count of them.

How to do that?

One method is to back up the affected partition, fill it with zeroes
using dd, format, and restore.

However, just running "badblocks" on it may do the trick.

--
Cheers / Saludos,

Carlos E. R.

(from 42.2 x86_64 "Malachite" (Minas Tirith))

Ray Andrews

2017-01-19 03:11:32 UTC

Post by Ray Andrews
How? I've presumed that bad sectors are bad sectors forever.
No, they get relocated.

Yeah I know, I meant the actual physical sectors.

Post by Ray Andrews
Hard disks have an area dedicated since manufacture to relocate bad
sectors, automatically when trying to write to a bad sector. The pending
sector value means just that: they are marked for reallocation the
instant you try to write them, and there is a count of them.

Thanks, I wondered exactly how that worked. Strange the disk doesn't
just get it over with tho.

Post by Ray Andrews
How to do that?
One method is to back up the affected partition, fill it with zeroes
using dd, format, and restore.
However, just running "badblocks" on it may do the trick.

Nuts, 14 partitions on the disk :-( Oh, well, that's still a good idea.
But any ideas as to my original issue? I'd still like to be able to
control how often
smartd-runner runs.


Carlos E. R.

2017-01-19 03:33:42 UTC

Post by Carlos E. R.
However, just running "badblocks" on it may do the trick.

Nuts, 14 partitions on the disk :-( Oh, well, that's still a good idea.

Well, if you know the LBA address of some of the bad sectors, then you
can find the partition affected and run the procedure on that one only.
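
A minimal sketch of that lookup, assuming the LBA is already known (for example from a self-test log) and that the partition boundaries are given in the same 512-byte sector units. The helper name partition_for_lba and its input layout are made up for illustration; they are not part of smartmontools.

```shell
#!/bin/sh
# Hypothetical helper: report which partition contains a given LBA.
# Reads lines of "name start_sector sector_count" on stdin and prints
# the name of the first partition whose range covers the LBA.
# Exits non-zero when no partition matches.
partition_for_lba() {
    lba=$1
    awk -v lba="$lba" '
        lba >= $2 && lba < $2 + $3 { print $1; found = 1; exit }
        END { if (!found) exit 1 }
    '
}

# Possible use on Linux, where sysfs exposes start and size per partition
# (the LBA 1234567 is only a placeholder):
#   for p in /sys/block/sdb/sdb*; do
#       echo "${p##*/} $(cat "$p/start") $(cat "$p/size")"
#   done | partition_for_lba 1234567
```

An LBA that falls in a gap between partitions (or in the partition table itself) gives no output and a non-zero exit status.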

I recently ran badblocks trying to find the location of some bad blocks
I had, and they cleared out. Another chap on this list had the same
thing happen to him recently, so maybe you will get lucky too. :-)

Post by Ray Andrews
But any ideas as to my original issue? I'd still like to be able to
control how often
smartd-runner runs.

No, sorry... the concept is news to me.

But clearing the pending sector list would keep it silent, right? Or I hope.

--
Cheers / Saludos,

Carlos E. R.
(from 42.2 x86_64 "Malachite" at Telcontar)

Nathan Stratton Treadway

2017-01-19 02:43:38 UTC

Maybe you should try to "repair" them.

How? I've presumed that bad sectors are bad sectors forever.

It hasn't been updated in a while, but you may find some useful
discussion at:

https://www.smartmontools.org/browser/trunk/www/badblockhowto.xml

Nathan
----------------------------------------------------------------------------
Nathan Stratton Treadway - ***@ontko.com - Mid-Atlantic region
Ray Ontko & Co. - Software consulting services - http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239

Ray Andrews

2017-01-19 03:30:40 UTC

Post by Nathan Stratton Treadway
It hasn't been updated in a while, but you may find some useful
https://www.smartmontools.org/browser/trunk/www/badblockhowto.xml

Thanks, that's a well written doc.

mathog

2017-01-19 17:51:40 UTC

Post by Nathan Stratton Treadway

Maybe you should try to "repair" them.

How? I've presumed that bad sectors are bad sectors forever.

It hasn't been updated in a while, but you may find some useful
https://www.smartmontools.org/browser/trunk/www/badblockhowto.xml

For some disks the only way to clear these is to use "badblocks" in the
nondestructive read then write mode. One must boot the machine with a
rescue CD, USB drive, or over the network, because the disk in question
cannot be in use at the time. This will not tell you where those blocks
are (unless an error occurs, which it probably will not) but it will
clear the pending sectors count.
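
One way to check whether such a pass actually changed anything is to compare the raw value of attribute 197 before and after. A small sketch, assuming the usual ten-column "smartctl -A" attribute table with the raw value in the last column; pending_sectors is a hypothetical helper name, not a smartmontools command.

```shell
#!/bin/sh
# Hypothetical helper: print the raw value of SMART attribute 197
# (Current_Pending_Sector) from "smartctl -A" output given on stdin.
# Exits non-zero if the attribute is not present in the table.
pending_sectors() {
    awk '$1 == 197 { print $10; found = 1 } END { if (!found) exit 1 }'
}

# Intended use from a rescue system, with the disk not mounted:
#   before=$(smartctl -A /dev/sdb | pending_sectors)
#   badblocks -nsv /dev/sdb     # -n: non-destructive read-write mode
#   after=$(smartctl -A /dev/sdb | pending_sectors)
#   echo "pending sectors: $before -> $after"
```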

It is dumb, dumb, dumb that one must scan an entire disk to do this when
the disk already knows exactly where those blocks are - it just will not
divulge the information! The result is that it takes a very long time
and is very inconvenient to fix this issue when it should take about 100
milliseconds and be straightforward.

If the disk in question is part of a RAID set, well, good luck. I have
not had to deal with that yet, but suspect the easiest path might in
that case be to remove the disk from the machine and plug it into a
nonRAID machine for the repair. smartctl can talk "through" a RAID
controller to operate on the disks behind it, as with:

/usr/sbin/smartctl -t long /dev/sda -d sat+megaraid,5

but I don't think that badblocks has that ability. Nor does smartctl
have a nondestructive read then write mode. Somebody please correct me
if I'm wrong on that point. On the other hand, if you see pending
sectors on a RAID disk, you're probably going to want to replace it
right away in any case.

Regards,

David Mathog
***@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

Ray Andrews

2017-01-19 18:25:31 UTC

Post by mathog
It is dumb, dumb, dumb that one must scan an entire disk to do this when
the disk already knows exactly where those blocks are - it just will not
divulge the information! The result is that it takes a very long time
and is very inconvenient to fix this issue when it should take about 100
milliseconds and be straightforward.

Thanks for being honest about that, sometimes guys try to convince you
that it's all for the best when really it is dumb -- let's just admit it
to ourselves and try to cope.

Post by mathog
but I don't think that badblocks has that ability. Nor does smartctl
have a nondestructive read then write mode. Somebody please correct me
if I'm wrong on that point. On the other hand, if you see pending
sectors on a RAID disk, you're probably going to want to replace it
right away in any case.

But you'd think that some whipper-snapper ace would write something.
Just from what I read at that link posted yesterday it sounded like it
might even be scriptable. And you'd also think that there'd be a
utility that let you know which files were affected so that you could do
something about it. At that point, you'd just have to read the file to
clear the error, no? Then make your backups and all is right with the
world.

In my case, I think the disk got bashed when running so it trashed some
part of the disk, but it's otherwise fine, cuz it's been 77 pending
sectors for literally years otherwise of course I'd just replace the thing.

Carlos E. R.

2017-01-19 19:08:24 UTC

But you'd think that some whipper-snapper ace would write something.
Just from what I read at that link posted yesterday it sounded like it
might even be scriptable.

Not that simple. Remember that smartctl doesn't do any testing; it is
the hard disk itself, via its local cpu and "bios" and memory which does
the real thing. smartctl /simply/ triggers it, and later reads the logs
to find out what was the result.

In this context, we can guess that the hard disk keeps somewhere a list
of the current bad blocks. That somewhere is not standardized (this is
an educated guess of mine); apparently we only have access via SMART to
read the number of bad sectors pending reallocation.

Post by Ray Andrews
And you'd also think that there'd be a
utility that let you know which files were affected so that you could do
something about it.

No, this is also difficult. If the functionality exists at all it
depends on the filesystem type. Consider that the filesystem keeps an
easy to search for filename (and path), somehow, very fast. Once you
have the entry for the filename, you get also some sort of table or list
of the sectors or records where that file is stored. This is a very fast
and optimized operation.

If you want the reverse, you have to scan all directory entries and all
tables/lists of record locations, one by one, till you hit the one you
search for. This is very disk intensive and slow. Sometimes the function
does not exist at all.

I read somewhere a procedure to find the affected file on some partition
types. It must be linked somewhere at the smartmontools web page, if I
recall correctly.

So it is easier to try to back up the partition, file by file. If one
contains a bad block, you will get an error, maybe a crash of the tool.
If no error, then overwrite the entire partition with zeroes, format and
restore.

Post by Ray Andrews
At that point, you'd just have to read the file to
clear the error, no? Then make your backups and all is right with the
world.

No, a read does not clear the error. A write with failure triggers the
relocation, automatically by the hard disk. The operating system may
know nothing except the delay.
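
Since it is the write that matters, the single-sector version of the "fill with zeroes" approach can be sketched as below. zero_sector is a hypothetical helper; it assumes 512-byte logical sectors and destroys whatever the sector held, so it is worth trying on a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper: overwrite one 512-byte sector of TARGET with zeroes.
# WARNING: the previous contents of that sector are lost.
#   $1  target file or block device
#   $2  sector number (LBA), counted from 0
zero_sector() {
    target=$1
    lba=$2
    # conv=notrunc keeps a regular file from being truncated after the
    # written sector; it is harmless on a block device.
    dd if=/dev/zero of="$target" bs=512 seek="$lba" count=1 conv=notrunc 2>/dev/null
}

# On a real disk this would look like "zero_sector /dev/sdb <LBA>",
# only with the filesystems unmounted and only after working out
# what data lives at that LBA.
```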

Post by Ray Andrews
In my case, I think the disk got bashed when running so it trashed some
part of the disk, but it's otherwise fine, cuz it's been 77 pending
sectors for literally years otherwise of course I'd just replace the thing.

Run badblocks (read mode) on the entire disk.

--
Cheers / Saludos,

Carlos E. R.
(from 42.2 x86_64 "Malachite" at Telcontar)

Ray Andrews

2017-01-23 04:17:30 UTC

Post by Ray Andrews
And you'd also think that there'd be a
utility that let you know which files were affected so that you could do
something about it.

If you want the reverse, you have to scan all directory entries and all
tables/lists of record locations, one by one, till you hit the one you
search for. This is very disk intensive and slow. Sometimes the function
does not exist at all.

Right, I can see that the system only really works one way. But if one
was doing a full 'badblocks' scan anyway then surely the information
would be available at that time? Nuts, even a raw dump of the block
would give you some idea what's there. That's how it used to work under
DOS.

As to what Christian said:

"The '-M exec' script is only run on error conditions which also result
in a LOG_CRIT syslog message. For LOG_INFO messages issued by smartd,
see the configured syslog. On Debian/Ubuntu, this is usually
'/var/log/daemon.log'."

It seems counterintuitive. Why run smartd four times a day (or whatever) when there is no way to make the error visible more than (it seems) once or twice per day? I did this experiment:

In smartd_warning.sh:

# Export message with trailing newline
export SMARTD_FULLMESSAGE="$fullmessage
"
export SMARTD_ERROR="${SMARTD_MESSAGE-[SMARTD_MESSAGE]}"

In smartd_runner:

#!/bin/zsh

# Save the message smartd pipes to us so run-parts can hand it on:
tmp=$(tempfile)
cat > $tmp

# Show the time and date of the test, no newline:
echo -n "$( date ):" >>! /var/lib/smartmontools/smartd-log

# Retrieve previous error message (empty if none has been saved yet):
SMARTD_PREVIOUS=$( cat /var/lib/smartmontools/smartd-previous 2>/dev/null )
# SMARTD_ERROR set in smartd_warning.sh:
if [ "$SMARTD_PREVIOUS" = "$SMARTD_ERROR" ]; then
    echo "IDENTICAL" >>! /var/lib/smartmontools/smartd-log
    # Nothing has changed, so clean up and skip the notifier:
    rm -f $tmp
    exit 0
fi
# Message is not identical so echo it to the log:
echo "$SMARTD_ERROR" >>! /var/lib/smartmontools/smartd-log
# And save the new message (not the old one) for the next comparison:
echo "$SMARTD_ERROR" >! /var/lib/smartmontools/smartd-previous

# runs '/usr/bin/smart-notifier -> /usr/share/smart-notifier/smart-notifier'
# via '/etc/smartmontools/run.d/60smart-notifier'
run-parts --report --lsbsysinit --arg=$tmp --arg="$1" \
--arg="$2" --arg="$3" -- /etc/smartmontools/run.d

rm -f $tmp

And I get this:

Thu Jan 19 12:02:27 PST 2017 Device: /dev/sdb [SAT], 77 Currently unreadable (pending) sectors
Thu Jan 19 12:26:55 PST 2017:IDENTICAL
Fri Jan 20 12:05:27 PST 2017:IDENTICAL
Fri Jan 20 12:26:55 PST 2017:IDENTICAL
Sat Jan 21 12:57:38 PST 2017:IDENTICAL
Sun Jan 22 16:57:35 PST 2017:IDENTICAL

... So it seems I get one or two messages per day, sometimes about 20 minutes apart, and there is no way to control how many or when they show up? Anyway, the code above does at least filter out duplicate messages, but I'd expect to have that functionality available stock. If I didn't use smart-notifier, could I receive timely messages some other way? That is, a message if needed every time smartd runs?
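
The duplicate filter itself can be isolated into a few lines of portable sh. This is only a sketch: message_is_new is a hypothetical helper, and the state file path in the usage note is the one from the experiment above.

```shell
#!/bin/sh
# Hypothetical duplicate filter: succeed (return 0) only when MESSAGE
# differs from the one stored in STATE_FILE, and remember the new one.
#   $1  state file holding the last message seen
#   $2  current message
message_is_new() {
    state_file=$1
    message=$2
    if [ -f "$state_file" ] && [ "$(cat "$state_file")" = "$message" ]; then
        return 1    # same as last time: the caller should stay quiet
    fi
    printf '%s\n' "$message" > "$state_file"
    return 0
}
```

In a '-M exec' script this could be used as `message_is_new /var/lib/smartmontools/smartd-previous "$SMARTD_MESSAGE" || exit 0`, placed before the notifier is called.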

Carlos E. R.

2017-01-23 04:51:06 UTC

Post by Ray Andrews
And you'd also think that there'd be a
utility that let you know which files were affected so that you could do
something about it.

The badblocks utility doesn't really need to know what that
particular sector is for. It just tests sector by sector. I understand it
does not even care what filesystem it is. It tests one sector, then the
next, then the next... etc. Maybe it would mind if the sector contains
data or not.

Yes, a raw dump of a sector might give some idea of what it is. If it is
text and you recognize it, bingo! But suppose the bad sector is really
bad, that you can not read it... That's the worst case.

But even in MS-DOS locating the file that owned a certain sector took time;
maybe less than now because disks were smaller, and the basic read speed
was about the same as now.

The procedure is as I described: first scan the root directory. One
entry for each file contains the address of the first sector of a file,
and the FAT table has a linked list of sorts to the next sectors. The
FAT table can be stored in memory (in the old times, it was limited to a
"segment", 64KB), so finding all the sectors of that file is fast.
Compare the list with the sector of interest, then try next file in the
root directory. Repeat for all directories (a recursive search using
findfirst/findnext functions), till the sector of interest is located.
It is simple, only intensive.
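
The search described above can be sketched on a toy model. Everything here is simplified for illustration: the FAT is a text file of "cluster next" pairs with EOC marking the end of a chain, the directory is a flat list of "name first_cluster" entries (no subdirectories), and owner_of_cluster is a made-up name.

```shell
#!/bin/sh
# Toy reverse lookup: which file owns a given cluster?
#   $1  cluster number to look for
#   $2  FAT file, one "cluster next_cluster" pair per line (EOC = end)
#   $3  directory file, one "name first_cluster" entry per line
# Prints the owning file name; exits non-zero if no file uses the cluster.
owner_of_cluster() {
    target=$1 fat=$2 dir=$3
    awk -v target="$target" '
        # First input file: load the whole FAT into memory.
        NR == FNR { next_of[$1] = $2; fat_entries++; next }
        # Second input file: follow the chain of every directory entry.
        {
            c = $2
            n = 0
            # The step limit guards against a corrupted, cyclic chain.
            while (c != "EOC" && c != "" && n++ < fat_entries) {
                if (c == target) { print $1; found = 1; exit }
                c = next_of[c]
            }
        }
        END { if (!found) exit 1 }
    ' "$fat" "$dir"
}
```

The cost is the same as in the prose: every chain may have to be walked before the cluster is found, which is why the lookup is simple but slow.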

I have the fuzzy idea that a procedure for finding the file that uses a
certain sector was discussed somewhere, perhaps on the smartctl howtos.
Not for all filesystems.


--
Cheers / Saludos,

Carlos E. R.
(from 42.2 x86_64 "Malachite" at Telcontar)

Ray Andrews

2017-01-23 05:24:38 UTC

Post by Carlos E. R.
Yes, a raw dump of a sector might give some idea of what it is. If it is
text and you recognize it, bingo! But suppose the bad sector is really
bad, that you can not read it... That's the worst case.

Sure, but at least you have a fighting chance.

Post by Carlos E. R.
I have the fuzzy idea that a procedure for finding the file that uses a
certain sector was discussed somewhere, perhaps on the smartctl howtos.
Not for all filesystems.

It does seem strange that with all the robust tools available in *NIX,
when you lose a sector it's an ordeal to figure out what's
been corrupted. However slow, one might expect that there'd be a
utility to figure it out even if it had to read the entirety of whatever
it is that is the analogue of the old FAT until it had found all the
files. There must be situations where one HAS to know, and never mind
how long it takes.

Carlos E. R.

2017-01-23 20:06:44 UTC

Post by Carlos E. R.
I have the fuzzy idea that a procedure for finding the file that uses a
certain sector was discussed somewhere, perhaps on the smartctl howtos.
Not for all filesystems.

I asked google "find file at certain sector" and found some answers, for
Windows, ntfs.

http://superuser.com/questions/97823/how-do-i-determine-what-file-occupies-a-given-sector

apparently "nfi.exe" does it, but there are more answers.

http://www.tomshardware.co.uk/forum/272990-32-identify-file-sectors

can't read it, goes blank, after displaying some content. Mentions seatools.

Google "find file at certain sector linux"

https://wiki.archlinux.org/index.php/Identify_damaged_files

This is a good article. I think the one I was thinking of previously.

Another article mentions "https://sourceforge.net/projects/ddrutility/"
for ntfs. http://www.ubuntu-rescue-remix.org/Version12-04

--
Cheers / Saludos,

Carlos E. R.
(from 42.2 x86_64 "Malachite" at Telcontar)

Ray Andrews

2017-01-23 21:29:26 UTC

Post by Carlos E. R.
http://www.tomshardware.co.uk/forum/272990-32-identify-file-sectors
can't read it, goes blank, after displaying some content. Mentions seatools.

Seems the appropriate linux tool is 'diskdigger'.

Post by Carlos E. R.
Google "find file at certain sector linux"
https://wiki.archlinux.org/index.php/Identify_damaged_files
This is a good article. I think the one I was thinking of previously.

Best damn documents in linux. Gotta switch to Arch one of these days.

Christian Franke

2017-01-24 06:32:07 UTC

Post by Ray Andrews
It does seem strange that with all the robust tools available in *NIX,
when you lose a sector it's an ordeal to figure out what's
been corrupted. However slow, one might expect that there'd be a
utility to figure it out even if it had to read the entirety of whatever
it is that is the analogue of the old FAT until it had found all the
files.

I use GNU ddrescue to check for bad blocks, do disk images and try to
recover bad blocks with read retries:

https://www.gnu.org/software/ddrescue/
(On Debian/Ubuntu, it is in the package gddrescue)

To check which files are affected by bad blocks, I typically use
ifind/istat/ffind from TSK:

https://www.sleuthkit.org/sleuthkit/
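
These tools work in filesystem blocks rather than disk LBAs, so the failing LBA usually has to be converted first. A minimal sketch of that arithmetic, assuming 512-byte sectors; fs_block is a hypothetical helper and the numbers in the usage note are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: convert a disk LBA to a filesystem block number.
#   $1  failing LBA, in 512-byte sectors from the start of the disk
#   $2  first sector of the partition holding the filesystem
#   $3  filesystem block size in bytes
fs_block() {
    echo $(( ($1 - $2) * 512 / $3 ))
}
```

For example, `fs_block 1000000 2048 4096` gives the block to pass to `ifind -d <block> /dev/sdb1`; the inode that ifind reports can then be handed to `ffind /dev/sdb1 <inode>` to get a file name. Check the TSK man pages for the exact options your version supports.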

Ddrutility provides a script which automates this (I didn't test this yet):

https://sourceforge.net/projects/ddrutility/

Regards,
Christian

Ray Andrews

2017-01-24 14:56:08 UTC

Post by Christian Franke
I use GNU ddrescue to check for bad blocks, do disk images and try to
https://www.gnu.org/software/ddrescue/
(On Debian/Ubuntu, it is in the package gddrescue)
To check which files are affected by bad blocks, I typically use
https://www.sleuthkit.org/sleuthkit/
https://sourceforge.net/projects/ddrutility/

These look like powerful tools, thanks.

Ray Andrews

2017-01-27 15:38:50 UTC

Post by Christian Franke

I use GNU ddrescue to check for bad blocks, do disk images and try to
https://www.gnu.org/software/ddrescue/
(On Debian/Ubuntu, it is in the package gddrescue)
To check which files are affected by bad blocks, I typically use
https://www.sleuthkit.org/sleuthkit/
https://sourceforge.net/projects/ddrutility/
Regards,
Christian

BTW, as to my original question, it seems that "$ smartd -q onecheck"
will give me exactly the 'right now' check that I want, and if I'm
DEVICESCAN -a -H -l error -l selftest -f -s (S/../.././12|L/../../6/12) -m root -M exec /usr/share/smartmontools/smartd-runner
... does the trick, and I don't need One further question: what
modifications to the above might be recommended? The documents, as
with most documents in Linux, presume that you are already an expert.
DEVICESCAN -a -m root -M exec /usr/share/smartmontools/smartd-runner
... What would I miss?

Christian Franke

2017-01-19 09:55:40 UTC

Post by Ray Andrews
I'm trying to control how often 'smart-notifier'/'smartd-runner' sends
DEVICESCAN -a -i 30 -m root -M test -M exec
/usr/share/smartmontools/smartd-runner

The '-i 30' smartd.conf directive means "ignore Attribute number 30 when
checking for failure of Usage Attributes" (see smartd.conf man page).
This is probably not what you want.

The '-i 30' smartd command line(!) option "sets the interval between
disk checks to 30 seconds" (see smartd man page). Using such short
intervals is not recommended, except for testing purposes.

The usage of 'smartd-runner' suggests that you are using Debian or
Ubuntu. If yes, the smartd command line options could be configured in
'/etc/default/smartmontools'. Note that both are Debian/Ubuntu specific
and not part of the upstream source code.
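
As a sketch of what that might look like: the file is read as shell, and the variable name smartd_opts is an assumption based on the stock Debian file, so check the comments in your own copy before relying on it. The 14400-second value is only an example; smartd's default interval is 1800 seconds.

```shell
# /etc/default/smartmontools (Debian/Ubuntu specific)
# Assumed variable name; passes extra command line options to smartd.
# Check the disks every 4 hours instead of every 30 minutes:
smartd_opts="--interval=14400"
```

This only changes how often smartd polls the disks; it does not by itself change how often the '-M exec' script is run.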

Post by Ray Andrews
$ smartctl -t short /dev/sdb
$ killall -HUP smartd
$ /etc/init.d/smartmontools restart
... and sure enough the test messages popup, but they contain no
actual information.

This is as expected. The '-M test' directive is only intended to test
the functionality of the '-m ... -M exec ...' directives.

Post by Ray Andrews
But it seems that setting '-i xxxx' to anything does not change how
often smartd-runner works in practice, I get about one message per day.

The '-M exec' script is only run on error conditions which also result
in a LOG_CRIT syslog message. For LOG_INFO messages issued by smartd,
see the configured syslog. On Debian/Ubuntu, this is usually
'/var/log/daemon.log'.

By default, pending sectors are considered critical and therefore daily
reminder LOG_CRIT messages are issued. Try '-C 197+' directive to change
this.

See '-m', '-M' and '-C' sections on smartd.conf man page for further details.
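
A possible smartd.conf line along those lines, printed by a small helper so it can be reviewed before anything is edited. candidate_directive is a made-up name, the smartd-runner path is the Debian/Ubuntu one used earlier in this thread, and the sketch assumes that a '-C 197+' placed after '-a' overrides the plain '-C 197' that '-a' implies.

```shell
#!/bin/sh
# Hypothetical helper: print a candidate smartd.conf directive.
# With '-C 197+' smartd should report pending sectors only when the
# raw count has increased, not while it sits unchanged at 77.
candidate_directive() {
    cat <<'EOF'
DEVICESCAN -a -C 197+ -m root -M exec /usr/share/smartmontools/smartd-runner
EOF
}

candidate_directive
```

After changing /etc/smartd.conf, smartd has to re-read it (for example via the restart command shown earlier) before the new directive takes effect.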

Thanks,
Christian

Ray Andrews

2017-01-19 15:20:33 UTC