Unknown reporter writes
I have experienced repeated system crashes while using recollindex. The computer freezes completely and I have to hold the power button to shut it down. I am working on Linux (a customised Fedora). I believe this behaviour started with the upgrade to kernel 4.7.2.
I have downgraded my kernel to 4.7.1 and indexing seems to work again. This is for your information. I am not knowledgeable enough to find out what exactly goes wrong, but in case you get other reports, downgrading might be a suggestion.
medoc writes
Hi, and thanks for reporting this.
This is a very serious issue, but it is not a recoll one. A user-mode, non-privileged application crashing or freezing the system is the very definition of a kernel bug (all the more so because you can trigger or clear the problem just by changing the kernel version).
I think that the best approach would be to file a Fedora bug:
https://fedoraproject.org/wiki/Bugs_and_feature_requests
https://fedoraproject.org/wiki/How_to_file_a_bug_report
An issue like this should get the attention of a Fedora developer.
I would be quite willing to help, but I’m not sure that I can do much. I’ll be available if the Fedora guy needs more info about the indexer.
One thing you could try would be to disable multithreading, to see if it changes anything. Set the following in the recoll configuration file (~/.recoll/recoll.conf): thrQSizes = -1 -1 -1
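For illustration only (assuming the default personal configuration directory), the relevant entry in the file would look like this:

# ~/.recoll/recoll.conf
# -1 -1 -1 disables the internal indexing thread queues (single-threaded indexing)
thrQSizes = -1 -1 -1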
One way the indexer could "simulate" a kernel freeze would be by eating all available memory, getting the system to swap and become so slow as to appear frozen. You could check this by running top and monitoring the indexer size while it is running.
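As a sketch (assuming the standard procps tools are available), something like the following would sample the indexer's memory figures every few seconds:

# Watch recollindex memory use (VSZ/RSS in kB) and CPU, refreshed every 5 seconds
watch -n 5 'ps -o pid,vsz,rss,pcpu,comm -C recollindex'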
In any case, it's a good thing that this is now recorded and searchable; it may help the next guy…
jf
r_mano writes
Hi --- I do not know if it’s related, but I have a similar problem in Ubuntu 16.04, kernel 4.4.0.
The system is not really dead: with Ctrl-Alt-F1 and a bit of waiting (60-70 seconds, on an Intel i5 with 16 GB RAM) I get a shell, and killing recollindex with -9 slowly brings the system back to life. I think it is memory related, but top does not seem to show it (VIRT 700M, 15-16% of CPU). Sometimes, when running recollindex with ionice -c3, I can just wait a bit and work at the shell, and the system slowly comes back to life without killing anything, though it stays sluggish. The rest of the data here is from this second case.
The log is suspicious: near the end I have
:3:index/fsindexer.cpp:239:FsIndexer::index missing helper program(s):
python:epub (application/epub+zip)
python:midi (audio/x-karaoke)
python:pylzma (application/x-7z-compressed)
python:rarfile (application/x-rar)
:3:index/fsindexer.cpp:243:fsindexer index time: 6270546 mS
:3:./utils/workqueue.h:202:DbUpd: tasks 776322 nowakes 911091 wsleeps 7824 csleeps 612427
:2:rcldb/rcldb.cpp:1989:Db::purge: document #123376: Expected block 53355 to be level 1, not 0
…and then the last line, with slightly different numbers, is repeated thousands of times. It has been 15 minutes since the strange lines started to appear, and recollindex is still running and affecting system responsiveness. I am sure that if I leave the system idle it will recreate the near-freeze condition.
PS I run recollindex with
RCLCRON_RCLINDEX= RECOLL_CONFDIR="/home/romano/.recoll/" ionice -c3 recollindex
Could it be database corruption?
EDIT: it finished after another 15 minutes, with
:2:./utils/workqueue.h:148:WorkQueue::waitIdle:DbUpd: not ok or can't lock
:3:rcldb/rcldb.cpp:1714:Db::waitUpdIdle: total xapian work 5540043 mS
:3:rcldb/rcldb.cpp:1714:Db::waitUpdIdle: total xapian work 473 mS
:3:./utils/workqueue.h:202:DbUpd: tasks 0 nowakes 0 wsleeps 1 csleeps 0
:3:./utils/workqueue.h:202:Internfile: tasks 131873 nowakes 132142 wsleeps 24260 csleeps 80120
:3:./utils/workqueue.h:202:Split: tasks 776322 nowakes 787069 wsleeps 4995 csleeps 752563
The full log is available if you need it; it’s quite big…
[romano:~/.local/tmp] % wc recollindex-RRyS-Thu
788739 4106137 106964930 recollindex-RRyS-Thu
medoc writes
Hi,
Yes, I think that the index is corrupted. This is very bad news: we were hoping that the problem described at the top of http://www.lesbonscomptes.com/recoll/BUGS.html had been solved by Xapian 1.2.21, but apparently it is not the case, as 16.04 has Xapian 1.2.22.
About the ionice command: this should have no effect actually, because recollindex does it internally. I need to check if this does what it is supposed to do…
Your index is damaged; unfortunately, I see no other possibility than to delete and recreate it. Maybe you can do it at night?
It’s weird that the command is hurting the system while using so little memory and CPU, and while ioniced very low.
I think that it is a bit early to alert the Xapian devs; let's monitor this and see if it happens again.
r_mano writes
Ok, I will rebuild it. Note that the index is really old (three years, maybe), so it can have remnants of data from previous releases. I have an almost mirrored machine at work (just fewer multimedia files there) and there is no problem with recollindex --- the load is noticeable there too, but it runs every other night at 1am and just updates things in less than 10 minutes.
Can I just delete the Xapian database directory and wait for the next recollindex run? Or how can I reset and restart the indexing without having to re-enter all the preferences (skipped dirs, extensions, etc.)?
medoc writes
You just need to delete the xapiandb directory. If the db is old, maybe the problem was initiated by an older Xapian version, which would be good news. But we have no indication that the index db problem and the machine freeze are actually related, so we'll see.
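For example, assuming the default configuration directory, something like this should be enough (the configuration files themselves are left in place):

# Remove only the Xapian index; ~/.recoll/recoll.conf and friends are untouched
rm -rf ~/.recoll/xapiandb
recollindex     # or simply wait for the next scheduled run to recreate the index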
r_mano writes
Ok, I have rebuilt the index, and the good news is that the error seems to be gone and the indexing time is back to normal, as on my other PC.
The bad news is that recollindex still slows down the machine a lot. I suspect something with the window manager, but I am not sure (I use gnome-shell). I was running recollindex with a system monitor open, and the system seemed idle. As soon as I do anything, the load jumps through the roof… and the system responds after minutes. This is the graphical idea:
[system monitor screenshot]
It took 80 seconds to make the screenshot… back to running recollindex at 1am ;-). This is so strange that I think it's a problem with this machine only. During this "waiting" time I get these messages:
SYS: Sep 2 20:51:52 RRyS gnome-session[22196]: Gjs-Message: JS LOG: Received error from DBus search provider org.gnome.Contacts.desktop: Gio.IOErrorEnum: Timeout was reached
SYS: Sep 2 20:51:52 RRyS gnome-session[22196]: Gjs-Message: JS LOG: Received error from DBus search provider org.gnome.Documents.desktop: Gio.IOErrorEnum: Timeout was reached
SYS: Sep 2 20:51:52 RRyS gnome-session[22196]: Gjs-Message: JS LOG: Received error from DBus search provider seahorse.desktop: Gio.IOErrorEnum: Timeout was reached
SYS: Sep 2 20:51:52 RRyS gnome-session[22196]: Gjs-Message: JS LOG: Received error from DBus search provider org.gnome.Photos.desktop: Gio.IOErrorEnum: Timeout was reached
SYS: Sep 2 20:51:52 RRyS gnome-session[22196]: Gjs-Message: JS LOG: Received error from DBus search provider org.gnome.Software.desktop: Gio.IOErrorEnum: Timeout was reached
…which may or may not be related.
If I have other freezes or problems with index errors I'll chime back in. Thanks for your support…
medoc writes
Hi,
Your machine is using quite a significant amount of swap. This is not typical. I am no system performance analysis expert, but it probably indicates that the machine was seriously memory-starved at some point.
If elements of the desktop get swapped out, this may explain why everything seems to work normally until you actually try to use the desktop, at which point the system has to bring all the needed pieces back into memory, which generates a lot of activity and can take some time.
It can’t be excluded that the indexer has a memory usage peak at some point and evicts the rest of the system. This used to occur more often, but most cases have been eliminated by various means (splitting big text files, setting size thresholds, etc.). Maybe (quite probably) some cases have been missed.
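As a quick check (just a sketch, using standard tools), current swap usage and paging activity can be watched while the indexer runs:

# Overall memory/swap snapshot, then ongoing paging activity (watch the si/so columns)
free -h
vmstat 5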
The way to confirm this would be to monitor the indexer process during its whole life, possibly with something like the following:
while true; do ps aux | grep recoll | grep -v grep; sleep 1; done > /tmp/some-trace-name
If we do see a big peak in memory usage, we’ll have to find a way to correlate it with the actual indexer activity. It should be possible to mix the indexer log and ps trace to do this.
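One possible variant (a sketch only) is to timestamp each sample, which makes it easier to line the trace up with the indexer's activity afterwards:

while true; do date '+%F %T'; ps aux | grep recoll | grep -v grep; sleep 1; done > /tmp/some-trace-name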
Using recoll as the filter string should also catch the input handlers in /usr/share/recoll/filters in addition to recollindex itself. It will not help if the culprit is actually an external helper like pdftotext or antiword, but it is a start…
medoc writes
No feedback, no evidence of a recoll issue.