There are two Firefox extensions which work with Recoll to index the Web pages which you visit:

  • For classic Firefox versions supporting XUL overlays: the Recoll Firefox extension

  • For Firefox versions 57 and later, the recoll-we extension, which is based on the WebExtensions API

The WebExtensions version (recoll-we) works with Recoll 1.23.5 or newer.

If your browser's default downloads directory is not ~/Downloads, you will need to set webdownloadsdir to the appropriate value in recoll.conf.

Both extensions work together with Recoll to index the Web pages that you visit. The old extension is itself based on an older one which was initially written for the Beagle indexer. The new extension is largely based on code stolen from the save-page-we extension.

The extensions work by copying the data for the visited pages to a queue directory (~/.recollweb/ToIndex by default). Recoll indexes the queued files, removes them from the queue, and stores them in a local cache. With the WebExtensions version, the files are first saved to the downloads directory, and a script executed by recollindex moves them to the queue directory.
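As a rough illustration of the move step used with the WebExtensions version, here is a minimal Python sketch. It is not the script that recollindex actually runs: the function name move_to_queue and the "recoll-we-" file name prefix are assumptions made for the example, and the directories are passed in as parameters.

```python
import shutil
from pathlib import Path


def move_to_queue(downloads_dir, queue_dir, prefix="recoll-we-"):
    """Move files whose names start with `prefix` from the downloads
    directory to the queue directory.

    The prefix is a hypothetical naming convention, used here only to
    tell extension-generated files apart from ordinary downloads.
    Returns the sorted list of moved file names.
    """
    downloads = Path(downloads_dir).expanduser()
    queue = Path(queue_dir).expanduser()
    # Create the queue directory if it does not exist yet.
    queue.mkdir(parents=True, exist_ok=True)

    moved = []
    for entry in sorted(downloads.iterdir()):
        if entry.is_file() and entry.name.startswith(prefix):
            shutil.move(str(entry), str(queue / entry.name))
            moved.append(entry.name)
    return moved
```

With the default locations mentioned above, a call would look like move_to_queue("~/Downloads", "~/.recollweb/ToIndex"). Other files in the downloads directory are left untouched.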

The feature can be enabled in the Recoll GUI index configuration panel (Web history section), or by editing the configuration file (set processwebqueue to 1).
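To show what the relevant recoll.conf lines can look like, the sketch below embeds a sample fragment and reads it back with a deliberately simplified parser. The processwebqueue and webdownloadsdir names come from the text above; the ~/MyDownloads path, the read_settings helper, and the parsing rules (plain "name = value" lines only) are assumptions for illustration, not Recoll's own configuration code.

```python
def read_settings(conf_text):
    """Parse simple 'name = value' lines.

    Blank lines, '#' comments and '[section]' lines are skipped.
    This is a simplification, not a full recoll.conf parser.
    """
    settings = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("["):
            continue
        name, sep, value = line.partition("=")
        if sep:
            settings[name.strip()] = value.strip()
    return settings


# Sample fragment; the downloads path is a made-up example value.
SAMPLE_CONF = """\
# Enable indexing of the Web queue
processwebqueue = 1
# Only needed when the browser does not download to ~/Downloads
webdownloadsdir = ~/MyDownloads
"""

settings = read_settings(SAMPLE_CONF)
web_enabled = settings.get("processwebqueue") == "1"
```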

Recoll only stores a limited amount of cached Web data (the limit is adjustable from the GUI Index Configuration section), and old pages are purged from the index. Pages that you want to archive permanently need to be saved elsewhere, as they will otherwise eventually disappear from the Recoll results. Recoll can index .maff files, which may be a better choice for archival usage; the Save Page WE extension is another option.

Both versions of the Recoll extension are hosted on the Mozilla add-ons site, so you can install them very simply in Firefox.

Operation and configuration

By default, after installation, the extension will store all the HTML and PDF pages which you visit. You can change this behaviour through the options page or through the context menu.

Configuration options (preferences page)

Automatically index pages

If this is not set, a page will only be saved if requested by clicking the toolbox button or selecting the submenu action. If this is set, pages will be automatically saved, subject to the rules below.

Also do it for pages with secure content (https)

Enable/disable the same behaviour for https URLs.

The options which follow only have any effect if automatic indexing has been activated (see above).

Save by default (when no rules set matches)

This is set initially. It is automatically unset the first time you add an inclusion rule.

Save when both rules sets match

If set, index the page when both the inclusion and exclusion rule sets match.

URL include rules

Rules to select the URLs which will be automatically indexed.

URL exclude rules

Rules to select URLs which will not be indexed.

Rules for both sets can be of three types: domain (just select by host name), wildcard, or regular expression. Rules added through the context menu are of the domain type.
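The way these options could combine can be sketched in Python as follows. This is one plausible reading of the descriptions above, not the extension's actual code: the function names, the rule representation as (type, pattern) pairs, the exact-host comparison for domain rules, and the order in which the options are checked are all assumptions.

```python
import re
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def rule_matches(rule_type, pattern, url):
    """Return True if a single rule matches the URL."""
    if rule_type == "domain":
        # Assumption: a domain rule compares the host name exactly.
        host = urlsplit(url).hostname or ""
        return host == pattern.lower()
    if rule_type == "wildcard":
        # Shell-style wildcard applied to the whole URL.
        return fnmatchcase(url, pattern)
    if rule_type == "regexp":
        return re.search(pattern, url) is not None
    raise ValueError("unknown rule type: %s" % rule_type)


def should_save(url, include, exclude, *, auto_index=True, index_https=True,
                save_by_default=True, save_when_both=False):
    """Decide whether a visited page would be saved automatically."""
    if not auto_index:
        return False
    if urlsplit(url).scheme == "https" and not index_https:
        return False
    included = any(rule_matches(t, p, url) for t, p in include)
    excluded = any(rule_matches(t, p, url) for t, p in exclude)
    if included and excluded:
        return save_when_both
    if included:
        return True
    if excluded:
        return False
    # No rule in either set matched.
    return save_by_default
```

For example, with include = [("domain", "example.org")], exclude = [("wildcard", "*/private/*")] and save_by_default=False, a page under https://example.org/ is saved unless its URL contains /private/, in which case the result depends on save_when_both.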