biomimetics writes


I am currently trying to improve the results of Find similar (which so far has given me only rather awkward results), so I have some questions:

  • Is there more documentation on how Recoll/Xapian picks the most relevant terms (the ones that show up in the search bar after I click Find similar, which is very helpful)?

  • Are these all the relevant terms used to find similar documents, or are only the first few (e.g. 10) shown?

  • Is it possible to interact with "Find similar" in more detail (e.g. leave terms out or edit the weighting scheme)?

In the manual I only found:

The Find similar entry will select a number of relevant term from the current document and enter them into the simple search field. You can then start a simple search, with a good chance of finding documents related to the current result.

medoc writes

There is some Xapian documentation here: Search for get_eset()

There are probably other pieces of Xapian documentation about this, for example a sample Python program here:

I’m quite sure that there is some more in-depth Xapian doc on the subject. Otherwise, ask the Xapian mailing list; they are generally quite helpful.
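To give a rough idea of what an "expand set" does, here is a minimal sketch in plain Python (no Xapian needed). It ranks the terms of the documents marked as relevant using a textbook relevance-feedback score (the Robertson/Sparck Jones weight multiplied by the number of relevant documents containing the term). This is an assumption for illustration only: it is not necessarily the formula Xapian or Recoll actually uses, and the function name expand_terms and its max_terms and exclude parameters are hypothetical.

```python
import math


def expand_terms(relevant_docs, all_docs, max_terms=10, exclude=()):
    """Rank candidate expansion terms taken from relevant_docs.

    Each document is a list of terms, and relevant_docs must be a subset
    of all_docs. Illustrative score, not Xapian's actual code:
        score(t) = r * log(((r + 0.5) * (N - n - R + r + 0.5))
                           / ((n - r + 0.5) * (R - r + 0.5)))
    N = number of documents, R = number of relevant documents,
    n = documents containing t, r = relevant documents containing t.
    """
    N = len(all_docs)
    R = len(relevant_docs)

    def doc_freq(docs):
        # Count, for each term, how many documents contain it.
        freq = {}
        for doc in docs:
            for term in set(doc):
                freq[term] = freq.get(term, 0) + 1
        return freq

    df = doc_freq(all_docs)
    rel_df = doc_freq(relevant_docs)

    scored = []
    for term, r in rel_df.items():
        if term in exclude:
            continue  # hypothetical way to "leave terms out"
        n = df[term]
        weight = math.log(((r + 0.5) * (N - n - R + r + 0.5))
                          / ((n - r + 0.5) * (R - r + 0.5)))
        scored.append((term, r * weight))

    # Highest score first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:max_terms]


docs = [
    ["gecko", "adhesion", "surface", "the"],
    ["gecko", "foot", "the"],
    ["surface", "paint", "wall", "the"],
    ["wall", "paint", "color", "the"],
    ["paint", "the"],
]
relevant = [docs[0]]  # the document "Find similar" was clicked on

for term, score in expand_terms(relevant, docs, max_terms=3):
    print(term, round(score, 3))
```

In this toy collection "adhesion" comes first because it occurs only in the relevant document, while "the" occurs everywhere and gets a negative score. The max_terms cutoff shows why only the top few terms would end up in the search field.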

The relevant Recoll code is in rcldb/rclquery.cpp, Query::expand()

I’ve never worked much on this, so there is a good chance that it could be improved. Maybe I’m actually doing it completely wrong :)