bharata_ishaya writes

The category filters, whether used as panel buttons or as the toolbar select-box, do not seem to work properly. The categories "all", "media", "message", "other", "presentation", "spreadsheet" and "text" seem fairly straightforward on their own, but the number of results for "all" rarely equals the total of the results in the other categories. In fact, "all" usually displays "no results found" even though several other categories have plenty of results.

I’m new to Recoll, so perhaps this is expected behavior, but I do find it counterintuitive. At this point it is just an annoyance that I’d like to correct if possible. I am not sure whether it is a sign of deeper problems.

medoc writes

Hi,

Could you please use the "show query" link on a couple of queries where "all" shows no results but some category does have results, and post the output? This is really weird, but I guess we’ll arrive at a simple explanation. Recoll does not usually do this…

What system are you running this on? (That’s mostly so that I can install the same package.)

bharata_ishaya writes

Show query for "all" =

#!php

Result count (est.): -1
Query details: ((headache:(wqf=11) OR headaches OR headache's OR headach OR headaching))

Show query for "presentation" =

#!php

Result count (est.): 2
Query details: (((headache:(wqf=11) OR headaches OR headache's OR headach OR headaching) AND ( FILTER (Tapplication/vnd.ms-powerpoint OR Tapplication/vnd.oasis.opendocument.presentation OR Tapplication/vnd.openxmlformats-officedocument.presentationml.presentation OR Tapplication/vnd.openxmlformats-officedocument.presentationml.template OR Tapplication/vnd.sun.xml.impress OR Tapplication/vnd.sun.xml.impress.template))))

Show query for "spreadsheet" =

#!php

Result count (est.): 6
Query details: (((headache:(wqf=11) OR headaches OR headache's OR headach OR headaching) AND ( FILTER (Tapplication/vnd.ms-excel OR Tapplication/vnd.oasis.opendocument.spreadsheet OR Tapplication/vnd.openxmlformats-officedocument.spreadsheetml.sheet OR Tapplication/vnd.openxmlformats-officedocument.spreadsheetml.template OR Tapplication/vnd.sun.xml.calc OR Tapplication/vnd.sun.xml.calc.template OR Tapplication/x-gnumeric))))

After clicking on "sort by date", the show query for "spreadsheet" =

#!php

Result count (est.): -1
Query details: (((headache:(wqf=11) OR headaches OR headache's OR headach OR headaching) AND ( FILTER (Tapplication/vnd.ms-excel OR Tapplication/vnd.oasis.opendocument.spreadsheet OR Tapplication/vnd.openxmlformats-officedocument.spreadsheetml.sheet OR Tapplication/vnd.openxmlformats-officedocument.spreadsheetml.template OR Tapplication/vnd.sun.xml.calc OR Tapplication/vnd.sun.xml.calc.template OR Tapplication/x-gnumeric))))
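
For comparing several of these outputs side by side, here is a minimal Python sketch that pulls the estimated count and the MIME type filter out of pasted "show query" text. The function name `parse_show_query` is made up for illustration. It assumes that terms starting with `T` inside `FILTER (...)` are MIME type terms, as the output above suggests, and it treats a negative count as "no estimate available", which is only a guess at what -1 means.

```python
import re


def parse_show_query(text):
    """Extract the estimated count and MIME filter terms from "show query" text."""
    count_match = re.search(r"Result count \(est\.\):\s*(-?\d+)", text)
    count = int(count_match.group(1)) if count_match else None

    # Assumption: inside FILTER (...), each term is "T" followed by a MIME type.
    mime_types = []
    filter_match = re.search(r"FILTER \((.*?)\)", text)
    if filter_match:
        for term in filter_match.group(1).split(" OR "):
            term = term.strip()
            if term.startswith("T"):
                mime_types.append(term[1:])

    return {
        "count": count,
        # Assumption: a negative value means no usable estimate was reported.
        "count_known": count is not None and count >= 0,
        "mime_types": mime_types,
    }
```

Feeding it the "all" output above would give a count of -1 with an empty MIME list, while the "spreadsheet" output gives a count of 6 and seven MIME types.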

bharata_ishaya writes

I am running a Linux flavor called Xubuntu. I added the launchpad repository from here:

medoc writes

OK. I’m running Xubuntu too, and this is my PPA, so I’m running exactly the same thing as you are.

This is so weird that I think the first thing to do is to reset the index, before we get to looking at the debug log. Either delete ~/.recoll/xapiandb before reindexing or use recollindex -z. Please use the command line, not the menu entry, so we can see more clearly if the indexer crashes (trying to think of everything here…).
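
If it helps to make a crash obvious, here is a minimal Python sketch that runs the reindexing command and spells out how the process ended. Only `recollindex -z` comes from this thread; the function names `describe_exit` and `rebuild_index` are hypothetical, and the interpretation of a negative return code as "killed by a signal" is how Python's `subprocess` reports it on POSIX systems.

```python
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "finished normally"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N (e.g. -11 for SIGSEGV).
        return f"killed by signal {-returncode}"
    return f"exited with error code {returncode}"


def rebuild_index(command=("recollindex", "-z")):
    """Run the indexer in the foreground and report how it ended."""
    completed = subprocess.run(list(command))
    return describe_exit(completed.returncode)
```

Calling `print(rebuild_index())` from a terminal keeps the indexer output visible and adds one final status line.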

bharata_ishaya writes

OK. I first ran recollindex -z from the command line, but I could see that it completely ignored the indexing settings for the parts of my system to avoid ("skipped paths" and "skipped names", set in the GUI). Instead, I deleted the xapiandb folder and restarted the GUI from the command line (so that both the terminal window and the bottom edge of the GUI window display the indexing progress). In the GUI I selected "rebuild index". I think it will take a few hours to complete.

Thanks for your help with this.

medoc writes

Do you set RECOLL_CONFIG in the environment or use the -c option? Otherwise, recollindex and recoll pull the index configuration (topdirs) from exactly the same place. Actually, the GUI option just does an "exec recollindex". I abandoned running the indexing code inside the GUI a very long time ago. So it is hard to see how they could use different parameters, unless they have a different environment or different command line arguments!
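
To make the "different environment" case concrete, here is a small Python sketch of how the configuration directory could be resolved. It is an illustration under assumptions, not Recoll's actual code: the function name `effective_confdir` is made up, the precedence (the -c value first, then the environment variable, then `.recoll` under HOME) is assumed, and the variable name is a parameter because this thread writes RECOLL_CONFIG while the Recoll manual, as far as I know, documents RECOLL_CONFDIR.

```python
import os


def effective_confdir(cli_confdir=None, environ=None, env_var="RECOLL_CONFDIR"):
    """Guess which configuration directory a recoll/recollindex process would use."""
    environ = os.environ if environ is None else environ

    # Assumed precedence: explicit -c value, then the environment variable.
    if cli_confdir:
        return os.path.abspath(cli_confdir)
    if environ.get(env_var):
        return os.path.abspath(environ[env_var])

    # Default: ".recoll" under the home directory of whoever runs the command.
    home = environ.get("HOME", os.path.expanduser("~"))
    return os.path.join(home, ".recoll")
```

With HOME set to /home/bharata this returns /home/bharata/.recoll, and with HOME set to /root it returns /root/.recoll, so the same command run under two accounts reads two separate configurations.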

bharata_ishaya writes

Very strange indeed. I have not done anything with RECOLL_CONFIG or used the -c option. When I ran recollindex -z from the command line, I immediately saw that it was indexing hidden dot-files, including .trash. After manually deleting the xapiandb folder and restarting the GUI from the command line with the command recoll, the output clearly showed that it was no longer indexing hidden files. What I find interesting is that the folder I deleted, /root/.recoll/xapiandb, has not been recreated yet. Shouldn’t it be there now, an hour into indexing? Perhaps I do somehow have multiple environments involved, because I had a previous Recoll installation from the regular Xubuntu repositories before updating from your PPA? I had never indexed anything until after updating to this latest version, so it seems like a long shot.

medoc writes

It seems that you have two configs. Were you executing one of the commands as root (su) and the other as your regular user? I would advise against executing recoll as root.

xapiandb should be recreated almost as soon as recollindex starts. If it’s not where you are looking for it, it’s probably elsewhere… You may check $HOME/.recoll/xapiandb and /.recoll/xapiandb if root is involved (bad idea) and depending on the local config.
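
A quick way to see which of these index directories actually exist is a few lines of Python. This is a minimal sketch: the candidate paths are the ones mentioned in this thread, the function name `find_index_dirs` is hypothetical, and a directory that the current user cannot read (such as /root for a regular user) is simply reported as absent.

```python
import os
import time


def find_index_dirs(candidates):
    """Return (path, owner uid, last modification time) for each existing xapiandb."""
    found = []
    for confdir in candidates:
        dbdir = os.path.join(confdir, "xapiandb")
        # isdir() is False for missing paths and for paths we may not access.
        if not os.path.isdir(dbdir):
            continue
        st = os.stat(dbdir)
        found.append((dbdir, st.st_uid, time.ctime(st.st_mtime)))
    return found


# Candidate configuration directories taken from the discussion above.
DEFAULT_CANDIDATES = [os.path.expanduser("~/.recoll"), "/root/.recoll", "/.recoll"]
```

Running `find_index_dirs(DEFAULT_CANDIDATES)` once as the regular user shows where the live index is; a recent modification time points at the directory the running indexer is writing to.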

bharata_ishaya writes

I do have a .recoll/ folder under both home and root. In the steps detailed above I deleted the xapiandb folder under root. It has not been recreated, so at least we have proof that I did not run Recoll as root. The xapiandb folder still exists under /home/bharata/.recoll/, but I think selecting "rebuild index" is the same as recollindex -z and truly starts from scratch. Is that correct? Indexing has finished. After the last reference to indexing a file, the final lines visible in the terminal are:

#!python

:3:../index/fsindexer.cpp:244:FsIndexer::index missing helper program(s):
djvutxt (image/vnd.djvu)
lyx (application/x-lyx)
python:epub (application/epub+zip)
python:mutagen (audio/mpeg)
python:rarfile (application/x-rar)
wpd2html (application/vnd.wordperfect)

:3:../index/fsindexer.cpp:248:fsindexer index time:  7611369 mS
:3:../utils/workqueue.h:212:DbUpd: tasks 168533 nowakes 198248 wsleeps 56978 csleeps 77093
:2:../utils/workqueue.h:158:WorkQueue::waitIdle:DbUpd: not ok or can't lock
:3:../rcldb/rcldb.cpp:1651:Db::waitUpdIdle: total xapian work 3669390 mS
:3:../rcldb/rcldb.cpp:1651:Db::waitUpdIdle: total xapian work 634 mS
:3:../utils/workqueue.h:212:DbUpd: tasks 0 nowakes 0 wsleeps 1 csleeps 0
:3:../utils/workqueue.h:212:Internfile: tasks 124094 nowakes 126416 wsleeps 5369 csleeps 106621
:3:../utils/workqueue.h:212:Split: tasks 167769 nowakes 178700 wsleeps 79907 csleeps 72057
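
For reading a log tail like this one, a short Python sketch can list the missing helpers and convert the millisecond timings into something readable. The function names are made up for illustration, and the regular expressions assume the exact line shapes shown above ("name (mime/type)" for helpers, "fsindexer index time: N mS" for the total).

```python
import re


def format_ms(ms):
    """Format a millisecond duration as hours, minutes and seconds."""
    seconds = ms // 1000
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


def missing_helpers(log_text):
    """Map each missing helper program to the MIME type it would handle."""
    helpers = {}
    for line in log_text.splitlines():
        match = re.fullmatch(r"(\S+) \(([^)]+)\)", line.strip())
        if match:
            helpers[match.group(1)] = match.group(2)
    return helpers


def index_time_ms(log_text):
    """Return the total indexing time in milliseconds, or None if not found."""
    match = re.search(r"fsindexer index time:\s*(\d+) mS", log_text)
    return int(match.group(1)) if match else None
```

Applied to the figures above, `format_ms(7611369)` gives "2h 06m 51s" for the whole run and `format_ms(3669390)` gives "1h 01m 09s" for the Xapian work.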

medoc writes

Yes, "Rebuild index" just runs "recollindex -z".

The last lines I see look like a normal end of indexing, with a few missing filters for some data formats, nothing special. So I guess that we are back to the queries now!

bharata_ishaya writes

Rebuilding the index has fixed this issue. Now I can enjoy the awesomeness of full-text search!

Thanks for your help with this!!

medoc writes

Thanks for your patience. This should not happen, and I do not know how the Xapian database got corrupted in this case, but there have been a number of reports lately of the indexer crashing when running multithreaded. This remains a very rare event, and I have not been able to find the bug so far. You can disable multithreading by adding the following to ~/.recoll/recoll.conf:

#!shell

thrQSizes = -1 -1 -1

This will worsen indexing times of course, but may be worth doing if you experience repeated problems.

jf