joellathrop writes

If recoll is given a query consisting of a single term of 40 characters or more, it reports that the query "resolved to null query" and then lists every document in the index.


$ recoll -c /mnt/bulk/vis-res-index/ -t  ffffffffffffffffffffffffffffffffffffffff
:2:../rcldb/searchdata.cpp:995:SearchDataClauseSimple: resolved to null query
Recoll query: (<alldocuments >)
1361102 results (printing  2000 max):
inode/directory     [file:///mnt/work/Data] [Data]  0       bytes
inode/directory     [file:///mnt/work/Data/Tests]   [Tests] 0       bytes
... etc., etc. ...

As far as I can tell, Xapian has maximum term length limits, so it’s understandable that there’s a limit somewhere. The problem is that the user is given no indication that this is the underlying issue, and recoll then needlessly lists every document in the index.

It would be preferable for recoll to simply give the user an error message explaining the term length limit and identifying the offending term, and then exit with an error code.

medoc writes

Recoll discards all terms longer than 40 characters. This is an arbitrary choice which ought to be documented somewhere if it’s not already.

The consequence is indeed that a query for a term longer than 40 characters will resolve to a null query (<alldocuments >).

Unfortunately, given the relatively complex processing which occurs between a user entry and the final Xapian Query, handling the condition in a satisfactory manner is not trivial.

There does not seem to be an easy way to ask Xapian if the final query is (<alldocuments >), other than testing the query description string, which can just as well be done in the calling script.
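As a rough illustration of the calling-script approach, the following Python sketch inspects the text printed by `recoll -t` and reports whether the query description is `(<alldocuments >)`. The `Recoll query:` line format is taken from the transcript in the report and may differ between versions; the function name `resolved_to_all_documents` is hypothetical and not part of Recoll.

```python
import re

# Matches the query description line printed by `recoll -t`.
# The format is an assumption based on the transcript in the report.
_QUERY_LINE = re.compile(r"^Recoll query:\s*(.*)$", re.MULTILINE)

# Matches a description that is exactly "(<alldocuments >)",
# tolerating variations in whitespace.
_ALL_DOCS = re.compile(r"\(\s*<alldocuments\s*>\s*\)")


def resolved_to_all_documents(output: str) -> bool:
    """Return True if the recoll output shows a match-everything query."""
    match = _QUERY_LINE.search(output)
    if match is None:
        # No description line found: we cannot tell, so report False.
        return False
    return _ALL_DOCS.fullmatch(match.group(1).strip()) is not None
```

A wrapper script could capture the output of the `recoll -t` command (for example with `subprocess.run`) and pass the text to this function, refusing to print results when it returns `True`.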

I will document the situation and may add at least a warning flag to the text splitter to indicate that a long word discard occurred. This may in turn make its way up the call chain until we can display a user warning, but I am making no promises here, as this is a relatively complex fix to a rare and relatively benign condition.
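To make the warning-flag idea concrete, here is a minimal Python sketch of a splitter that records which terms it discards, so that a caller higher up could warn the user. It is not Recoll's actual splitter (which is written in C++); `SplitResult`, `split_terms`, and `MAX_TERM_LEN` are hypothetical names, splitting on whitespace is a simplification, and the exact boundary (whether a term of exactly 40 characters is kept) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List

# Limit as stated in the thread; the exact boundary is an assumption.
MAX_TERM_LEN = 40


@dataclass
class SplitResult:
    terms: List[str] = field(default_factory=list)      # terms kept for the query
    discarded: List[str] = field(default_factory=list)  # terms dropped as too long

    @property
    def long_term_discarded(self) -> bool:
        """Warning flag: True if at least one term was dropped."""
        return bool(self.discarded)


def split_terms(text: str, max_len: int = MAX_TERM_LEN) -> SplitResult:
    """Split on whitespace, setting aside terms longer than max_len."""
    result = SplitResult()
    for word in text.split():
        if len(word) > max_len:
            result.discarded.append(word)
        else:
            result.terms.append(word)
    return result
```

With a result object like this, the caller can distinguish two cases: `terms` is empty (the query would be null, so an error is appropriate), or `terms` is non-empty but `long_term_discarded` is set (the query runs, but a warning naming the dropped terms could be shown).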

medoc writes

This now generates an error when the final query is null, but there is still no warning for a discarded term when other terms in the query keep it from being null.