koppel writes

A GUI search for common words such as "arm" or "space" turns up zero matches. Searching for those words from the command line succeeds, and the GUI term explorer can also find them. The advanced search can find the words in an abstract, but not elsewhere in the document.

I’m using Recoll 1.19.4 and Xapian 1.2.7.

medoc writes

Hi,

Could you please post the query that recoll performs in both cases? This is the first line printed by the command-line interface, and in the GUI you can get it by clicking the (show query) link (you can copy/paste the resulting text).

koppel writes

Here are the queries:

Result count (est.): -1 Query details: (ARM:(wqf=11))

Result count (est.): -1 Query details: space:(wqf=11) OR spacing OR Space OR spaces OR SPACE OR Spacings OR SPACES OR spaced OR Spacing OR Spaces OR spacings OR SPACING OR Spaced OR space’s OR Space’s OR SPACINGS OR SPACEs OR spacees OR spaceÍ’

medoc writes

I guess that these are two different queries, both from the GUI? Or from the command line? What I need is to compare the query performed by the GUI with the query performed by the command line.

Also, are you using multiple indexes by any chance?

koppel writes

The queries that I posted above were both from the GUI.

Here is the result of a command-line query, cut to just show the first result. Six were actually displayed, though from my understanding of the help text it should have displayed 200 results. Even a query such as "recoll -t -n 100 -q ARM" shows just 6 results.

[sky.ece.lsu.edu] % recoll -t -q 'ARM'
Recoll query: (ARM:(wqf=11))
568 results
text/html   [file:///home/faculty/koppel/pub/sroot-off/info/as.info]        [as.info / Machine Dependencies / ARM-Dependent / ARM Syntax / ARM-Relocations] 1201    bytes

As far as I know I’m using one index.

medoc writes

Hi,

I would need to see the debug log for the GUI case.

Setting up the log (at level 6) is described here: https://bitbucket.org/medoc/recoll/wiki/ProblemSolvingData in the "Obtaining information from the log file" paragraph.
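
As a rough sketch of that setup, assuming the default configuration directory ($HOME/.recoll) and the loglevel and logfilename variables in recoll.conf. The helper name and the /tmp/recolltrace path are only examples, and the wiki page above remains the reference:

```shell
# Sketch only: append debug-log settings to a Recoll configuration file.
# enable_recoll_debug_log is a hypothetical helper, not a Recoll command.
enable_recoll_debug_log() {
    confdir="${1:-$HOME/.recoll}"      # assumed default configuration directory
    logfile="${2:-/tmp/recolltrace}"   # example log path, pick any writable file
    mkdir -p "$confdir" || return 1
    # loglevel 6 is the level requested above; logfilename is where the
    # messages are written.
    printf 'loglevel = 6\nlogfilename = %s\n' "$logfile" >> "$confdir/recoll.conf"
}
```

If recoll.conf already sets these variables, it is cleaner to edit the existing lines than to append new ones. Remove or lower the setting once the log has been captured, since level 6 is verbose.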

Please either send the log through email (jf@dockes.org) or attach it here. Don’t try to include it in a comment, as that usually does not work well. Actually, email might be the best approach, as there may be a need for a few more exchanges, if you bear with me…

Thanks.

koppel writes

I’ll attach the log file here; it doesn’t contain anything more personal than the path to my home directory.

koppel writes

Log file at loglevel 6 of GUI query "ARM" (without quotes). GUI showed zero matches.

koppel writes

FWIW, I’m attaching a log file for a query that does work correctly. Given what is shown in the first log file I’m tempted to rebuild my index, but I won’t until the cause of the flaw is found.

medoc writes

I have looked at the logs, and I have no idea what causes the Xapian get_mset() exceptions. This might be a bug, but I think that the chances of finding it are slight, and the most probable cause is a corrupted index.

I think that the best thing to do would be to just delete the Xapian index directory ($HOME/.recoll/xapiandb by default) and reindex.
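
A cautious variant of that step, as a sketch: it moves the index directory aside instead of deleting it, so the suspect index can still be examined later. The function name and the .bak suffix are made up for illustration, and the paths assume the defaults mentioned above.

```shell
# Sketch only: move a suspect Xapian index aside, then reindex from scratch.
# reset_recoll_index is a hypothetical helper, not a Recoll command.
reset_recoll_index() {
    confdir="${1:-$HOME/.recoll}"   # assumed default configuration directory
    dbdir="$confdir/xapiandb"       # default index location
    if [ -e "$dbdir.bak" ]; then
        echo "backup $dbdir.bak already exists, not touching anything" >&2
        return 1
    fi
    if [ -d "$dbdir" ]; then
        mv "$dbdir" "$dbdir.bak" || return 1
    fi
    # Rebuild the index for this configuration if the indexer is available.
    if command -v recollindex >/dev/null 2>&1; then
        recollindex -c "$confdir"
    else
        echo "recollindex not found, run it manually to rebuild the index" >&2
    fi
}
```

Once the new index is confirmed to work, the .bak directory can be deleted to reclaim the disk space.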

Hopefully the problem will just go away, else we’ll know that there is something to debug.

This seems to be a case-sensitive index. Did you rebuild it from scratch recently?

koppel writes

> This seems to be a case-sensitive index. Did you rebuild it from scratch recently?

Yes, with a fresh Xapian directory. While it was indexing, I stopped and restarted the indexer to play with the parallelization parameters.

I’ll rebuild the index and post back whether it works. That might be on Monday, depending on how fast it reindexes.

koppel writes

Re-indexing is complete and now the search works properly.

It would be nice if this sort of problem could at least be mentioned in the release notes.

medoc writes

I’m glad that it now works.

The problem is going to be mentioned in the release notes, but as you are the first to report it, I’d have had to be quite a medium to do it earlier.

Recoll is not Firefox or LibreOffice. I do as much testing as I can, but it does not go through as much beta-testing as these widely distributed packages, so the probability that a given user discovers a new problem is higher, which you just experienced.

Also, I still have no idea what happened. From what you wrote, I’d guess that a multithreaded indexer can sometimes get into trouble when interrupted, but this is certainly not part of the design, and it is unexpected: I never saw it happen before this occurrence.

Did you notice an indexing speed improvement while experimenting with the multithreading?

koppel writes

> The problem is going to be mentioned in the release notes, but as you
> are the first to report it, I'd have had to be quite a medium to do it
> earlier.

I didn’t intend any criticism! I meant that it should be, not that it should have been!

> Recoll is not Firefox or LibreOffice. I do as much testing as I can,
> but it does not go through as much beta-testing as these widely
> distributed packages, so the probability that a given user discovers
> a new problem is higher, which you just experienced.

I appreciate the effort!

> Also, I still have no idea what happened. From what you wrote, I'd
> guess that a multithreaded indexer can sometimes get into trouble
> when interrupted, but this is certainly not part of the design, and
> it is unexpected: I never saw it happen before this occurrence.

I’ll be alert to similar problems in the future. If I find one I’ll open a new bug on it.

> Did you notice an indexing speed improvement while experimenting with
> the multithreading?

It seemed to go faster, but I didn’t do any actual measurement. There certainly was higher CPU utilization.

medoc writes

On hold until we can reproduce the issue