Installing over an older version and other notes

On Ubuntu, commands installed as snap packages can’t create arbitrary files under /tmp. This is the case by default for pdftk, for example, which Recoll uses to extract PDF attachments. For best results, set TMPDIR to a location which belongs to you (e.g. inside your home directory, with something like export TMPDIR=~/tmp in your shell startup script). Recoll could conceivably work around the problem all by itself, but I find it in bad taste to create temporary files in an arbitrary location inside your home.
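A minimal sketch of that setup, assuming a POSIX shell. The directory name ~/tmp is only the example used above, and the chmod step is an extra precaution, not something Recoll requires:

```shell
# Create a private temporary directory inside the home directory.
# "$HOME/tmp" is just an example location; any directory you own will do.
mkdir -p "$HOME/tmp"
# Optional: keep other users out of it.
chmod 700 "$HOME/tmp"
# Programs which honor TMPDIR will now create their temporary files there.
export TMPDIR="$HOME/tmp"
```

Put these lines in your shell startup script so that the setting also applies to programs started from later sessions.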

1.20-26 indexes are fully backward compatible. Installing 1.26 over a 1.19 index is possible, but there have been small changes in the way compound words (e.g. email addresses) are indexed, so it is best to reset the index. Still, in a pinch, 1.26 search can mostly use a 1.19 index.

New index format with Xapian 1.4: the default on-disk format of Xapian 1.4 (glass) has changed to improve the performance of phrase searches. This had the unfortunate consequence of making the Recoll snippet generation method excessively slow except for very small indexes. As a consequence, new indexes created by Recoll 1.24/26 using Xapian 1.4 have a different format and store the document texts inside the index. No specific action is required from the user, unless you have an old index and want to use the new format (nicer snippets, faster phrase searches), in which case you should delete the old index (see below).

Always reset the index if you do not know by which version it was created (e.g.: you’re not sure it’s at least 1.18). The best method is to quit all Recoll programs and delete the index directory (rm -rf ~/.recoll/xapiandb), then start recoll or recollindex.

recollindex -z will do the same in most, but not all, cases. It’s better to use the rm method, which will also ensure that no debris from older releases remains (e.g. old stemming files which are no longer used).
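The rm method can be wrapped in a small shell function. This is only a sketch: reset_recoll_index is a hypothetical helper name, and the default path is the Unix location given above. Quit the GUI and any running indexer before calling it.

```shell
# Sketch: delete the Xapian index directory of a Recoll configuration.
# reset_recoll_index is a hypothetical helper, not a Recoll command.
reset_recoll_index() {
    # First argument: configuration directory (defaults to ~/.recoll).
    confdir="${1:-$HOME/.recoll}"
    # Remove the whole index directory, including leftovers from
    # older releases.
    rm -rf "$confdir/xapiandb"
}

# Typical use, once all Recoll programs have been closed:
#   reset_recoll_index && recollindex
```

Starting recoll or recollindex afterwards rebuilds the index from scratch.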

On Windows, the index is located by default in C:/Users/[me]/AppData/Local/Recoll/xapiandb

Case/diacritics sensitivity is off by default. It can only be turned on by editing recoll.conf (see the manual). If you do so, you must then reset the index.
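As a hedged sketch of that edit: the parameter name indexStripChars is an assumption on my part (this page does not name it), so check it against the manual before relying on it. The example appends the setting to the configuration file, which is only appropriate if recoll.conf contains no per-directory sections. Otherwise add the line by hand near the top of the file.

```shell
# Sketch: ask for a case- and diacritics-sensitive index.
# Assumption: the parameter is called indexStripChars (verify in the manual).
# RECOLL_CONFDIR, if set, names the configuration directory.
confdir="${RECOLL_CONFDIR:-$HOME/.recoll}"
mkdir -p "$confdir"
# Appending is only safe when the file has no [section] lines, because
# an appended line would end up inside the last section.
printf 'indexStripChars = 0\n' >> "$confdir/recoll.conf"
# The index must now be reset (delete the xapiandb directory and reindex).
```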

Changes in Recoll 1.27

  • Partial support for external term generators. In practice this allows using language-sensitive text analysers (Mecab-ko directly and others through konlpy) for indexing Korean text. See the page about Korean text in English and in Korean.

  • Index MS-Word .docx endnotes and footnotes.

  • GUI: Fix having to type CR twice for multi-word searches.

  • Do not ignore white space while splitting CJK text with the n-gram splitter.

  • Windows: the Recoll Python extension is now available. This makes it possible to run the Recoll WEBUI under Windows.

  • Windows: MSVC build. This allows building the Python extension module (the official Python distribution for Windows is built with MSVC, so I did not find a way for the MinGW-built Recoll extension to work).

  • Windows: fix "special indexing" feature which had trouble passing arguments to recollindex.

  • Windows: fix bug which prevented Recoll from initializing when the configuration directory was inside the user’s home and the user login was not ASCII.

  • Windows: improve processing of non-ASCII paths in a few other places.

Minor releases at a glance

  • 1.27.12

    • Repair Python source file indexing.

    • Repair spelling replacements in GUI.

  • 1.27.11

    • Forgotten files in distribution broke the pythonx-recoll packages.

  • 1.27.10

    • Fix real-time indexer not working when topdirs was /

    • rclpython: was not working at all. Fixed, renamed. Use rclexecm. Only beautify for preview, not indexing.

    • Fix indexer crashing on ARM because of dubious iterator increment.

    • PDF: guard against pdftk noise on stdout. Annotations: guard against possible exception while formatting results. Index PDF annotations separately under field name annotation. Add annot, pdfannot and pa aliases.

    • GUI dark mode: improve visibility of checkable actions

    • mimemap: add .xlsm

    • Python module: merge pyrecoll and pyrclextract C extensions into one _pyrecoll and create two Python modules to maintain compatibility.

  • 1.27.9

    • Fix bug in result table caching introduced in 1.27.7

  • 1.27.8

    • A bug in UTF-8 truncation could lead to indexing failure by trying to add an excessively long file name term. Randomly found the day after releasing 1.27.7 and unrelated to the changes in there.

  • 1.27.7

    • Change result table row height adjustment to a method which does not force fetching the whole result set. Should improve performance when there are many results.

    • Add underscoreasletter configuration parameter to treat an underscore as a letter instead of a separator.

    • Add menu entry to switch to dark display mode.

    • Show the configuration GUI tooltips for the whole entries, not only the label.

    • Use the result list HTML styling in the result table detail area.

  • 1.27.6

    • Index PDF annotations (needs poppler-glib and its GObject introspection data)

    • Fix syntax error now detected by gcc 9-10 (bad range for loop)

    • recollq: add option -p <maxlen> to be added to -A to show snippets instead of abstract

    • recollindex: the lock file name could differ depending on how the indexer was started. This had no consequence for the index (as far as I could see), because of Xapian locking. The most visible consequences were unexpected exits of the real time indexer, and inconsistent menu states in the GUI.

    • Add the web extension .rclwe to skippedNames. These files were sometimes indexed twice, by the regular indexer and the WEB one.

    • Fix "nonumbers" configuration variable function, which did not work any more.

  • 1.27.5

    • Fix recollindex -r not working with relative paths.

    • Fix PDF attachment processing.

    • Fix duplicate detection in some cases.

    • Process text/plain subdocuments (e.g. zip archive members) in the same way as top level files (esp: perform paging for big texts).

    • MS-Word .docx: avoid extracting some non-text data.

    • Index Visio documents (.vsdx format only).

    • Epub: index the metadata "subjects" fields.

    • GUI: separate popup menu entries for "open parent" and "open folder".

    • GUI: doc icons were not shown with newer webengine versions.

    • GUI file name search: sort directories first.

  • 1.27.3

    • Fix the "remember show temp file warning" thing. Again…

    • If XDG_RUNTIME_DIR is set, locate in it. Avoids spinning up the disk in some configurations.

    • When splitting to generate abstract from text, take care to generate all terms. Seems to solve issues with the snippet generator not finding a match when the query term is a partial span.

    • Updated Russian translation provided by Olesya Gerasimenko.

    • GUI: fix crash on exit caused by statically declared shortcut

    • GUI, Webengine version (Windows, mostly): the computed doc number for the right-click menu was wrong on any page but the first, resulting in bizarre behaviour.

    • Windows: use a patched xapian-core to ensure that the index can be located in an arbitrary Unicode path (useful for people with non-ASCII user names).

    • Windows: wrong date terms were generated and resulted in date field search failures.

    • Windows: ensure that the tmp environment (e.g. RECOLL_TMPDIR) is used properly.

  • 1.27.2

    • GUI: saving/restoring simple searches: the list of active external indexes is now restored with the query. Just open the external indexes preferences page to reset it when you are done.