Unix-like systems: indexing visited WEB pages

With the help of a Firefox extension, Recoll can index the Internet pages that you visit. The extension has a long history: it was initially designed for the Beagle indexer, then adapted to Recoll and the Firefox XUL API. A new version of the add-on was written for the WebExtensions API, which is the only one supported since Firefox 57.

The extension works by copying visited WEB pages to an indexing queue directory, which Recoll then processes, indexing the data, storing it into a local cache, then removing the file from the queue.

Because the WebExtensions API places more constraints on what extensions can do, the new version works with one more step: the files are first created in the browser's default downloads location (typically $HOME/Downloads), then moved by a script to the old queue location. The script is run automatically by Recoll indexer versions 1.23.5 and newer. It could conceivably be executed independently to make the new browser extension compatible with an older Recoll version (the script is named recoll-we-move-files.py).

For the WebExtensions-based version to work, you must set the webdownloadsdir configuration variable if the downloads location was changed from the default $HOME/Downloads in the browser preferences.
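For example, if the browser saves downloads to a non-default directory, the matching setting in the index configuration file might look like this (the path shown is just an illustration):

```
# Tell the indexer where the browser writes the temporary page files
webdownloadsdir = /home/me/MyDownloads
```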

The visited WEB pages indexing feature can be enabled on the Recoll side from the GUI Index configuration panel, or by editing the configuration file (set processwebqueue to 1).
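As an alternative to the GUI panel, the following line in the configuration file enables processing of the web queue:

```
# Enable indexing of the visited WEB pages queue
processwebqueue = 1
```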

A current pointer to the extension can be found, along with up-to-date instructions, on the Recoll wiki.

A copy of the indexed WEB pages is retained by Recoll in a local cache (from which previews can be fetched). The cache size can be adjusted from the Index configuration / Web history panel. Once the maximum size is reached, old pages are purged, both from the cache and the index, to make room for new ones, so pages that you want to keep indefinitely must be explicitly archived elsewhere.
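If editing the configuration file directly is preferred, the cache size limit is controlled by the webcachemaxmbs variable (value in megabytes; the figure below is just an example):

```
# Maximum size of the visited WEB pages cache, in megabytes
webcachemaxmbs = 100
```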