Recoll features


General features

Supported systems

Recoll has been compiled and tested on Linux, MS-Windows 7-10, MacOS X and Solaris (initial versions Redhat 7, Fedora Core 5, Suse 10, Gentoo, Debian 3.1, Solaris 8). It should compile and run on all subsequent releases of these systems and probably a few others too.

Qt versions 4.7 and later are supported.

Document types

Recoll can index many document types (along with their compressed versions). Some types are handled internally (no external application needed). Other types need a separate application to be installed to extract the text. Types that only need very common utilities (awk/sed/groff/iconv, Python etc.) are listed in the native section.

The MS-Windows installer includes the supporting applications. The only additional package you will need is a Python installation.

Many formats are processed by Python scripts. The Python dependency will not always be mentioned. In general, Recoll up to 1.24 expects Python 2.x to be available. Recoll 1.25 and later rely on Python3 (most scripts are actually compatible with both versions). Formats which are processed using Python and its standard library only are listed in the native section.

Some Python scripts need the Python2 'future' module (smoothing the transition to Python3). This is the case, e.g. for the Excel sheet handler.

File types indexed natively

File types indexed with external helpers

The XML ones

The following types need xsltproc from the libxslt package for Recoll versions before 1.22, or python-libxslt1 and python-libxml2 for 1.22 to 1.24, or python3-lxml for 1.25 and newer. Quite a few also need the unzip command:

Other formats

The following need miscellaneous helper programs to decode the internal formats.

Other features

Desktop and web integration

The Recoll GUI has many features that help to specify an efficient search and to manage the results. However, it may sometimes be preferable to use a simpler tool that integrates better with your desktop interface. Several solutions exist:

Recoll also has Python and PHP modules which allow easy integration with web or other applications.


Stemming

Stemming is a process which transforms inflected words into their most basic form. For example, flooring, floors, floored would probably all be transformed to floor by a stemmer for the English language.

In many search engines, the stemming process occurs during indexing. The index will only contain the stemmed form of words, with exceptions for terms which are detected as being probably proper nouns (i.e. capitalized). At query time, the terms entered by the user are stemmed, then matched against the index.

This process results in a smaller index, but it has the serious drawback of irrevocably losing information during indexing.

Recoll works in a different way. No stemming is performed at indexing time, so that all information gets into the index. The resulting index is bigger, but most people probably don't care much about this nowadays, because they have a 100 GB disk 95% full of binary data which does not get indexed.

At the end of an indexing pass, Recoll builds one or several stemming dictionaries, where all word stems are listed in correspondence to the list of their derivatives.
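The idea of a stemming dictionary can be illustrated with a minimal Python sketch. This is not Recoll's code: `toy_stem` is a deliberately crude, hypothetical suffix stripper standing in for a real stemmer, and the function names and data layout are assumptions made for the example. It only shows the principle of keeping unstemmed terms in the index and mapping each stem to its derivatives separately.

```python
from collections import defaultdict


def toy_stem(word):
    """Crude English suffix stripper (hypothetical stand-in for a real stemmer)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stem_dictionary(index_terms):
    """Map each stem to the set of unstemmed index terms that reduce to it."""
    stems = defaultdict(set)
    for term in index_terms:
        stems[toy_stem(term)].add(term)
    return stems


def expand_query_term(term, stem_dict, use_stemming=True):
    """Return the set of index terms to search for one user-entered term."""
    if not use_stemming:
        # Expansion is switched off: search for the exact term only.
        return {term}
    return stem_dict.get(toy_stem(term), set()) | {term}


# The index keeps every word form as it appeared in the documents.
index_terms = ["floor", "floors", "flooring", "floored", "door"]
stem_dict = build_stem_dictionary(index_terms)

print(sorted(expand_query_term("floors", stem_dict)))
# ['floor', 'floored', 'flooring', 'floors']
print(sorted(expand_query_term("floors", stem_dict, use_stemming=False)))
# ['floors']
```

Because the index itself is never stemmed, turning expansion off for a given query is only a matter of skipping the dictionary lookup.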

At query time, by default, user-entered terms are stemmed, then matched against the stem database, and the query is expanded to include all derivatives. This will yield search results analogous to those obtained by a classical engine. The benefit of this approach is that stem expansion can be controlled instantly at query time in several ways: