Query data access for external indexers (1.23)

Recoll has built-in methods to access document data for its internal (filesystem) indexer. An external indexer must provide its own data access methods if it needs to integrate with the GUI (e.g. the preview function) or to support the rclextract module.

The index data and the access method are linked by the rclbes (recoll backend storage) Doc field. Set this field to a short string identifying your indexer. For example, the filesystem indexer uses either "FS" or an empty value, and the Web history indexer uses "BGL".

The actual link is made in a backends configuration file, stored in the configuration directory, which defines the commands to execute to access data from the specified indexer. For example, the mbox indexing sample found in the Recoll source (which sets rclbes="MBOX") uses:

fetch = /path/to/recoll/src/python/samples/rclmbox.py fetch
makesig = /path/to/recoll/src/python/samples/rclmbox.py makesig

fetch and makesig define the commands to execute to retrieve the document text and to compute the document signature, respectively. The example implementation uses the same script for both operations, selected by its first parameter.

The scripts are called with three additional arguments, udi, url and ipath, which are the values stored with the document when it was indexed. A script may use any or all of them to perform the requested operation. The caller expects the result data on stdout.
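
The calling convention described above can be illustrated with a minimal sketch of such a script. This is not the rclmbox.py sample from the Recoll source: the function names, the assumption that url is a plain file:// URL whose whole file is the document, and the size-plus-mtime signature format are all hypothetical choices made for the illustration. A real indexer would use ipath to locate a subdocument (e.g. one message inside an mbox file).

```python
#!/usr/bin/env python3
"""Hypothetical data access script for an external indexer (illustration only)."""
import os
import sys
from urllib.parse import unquote, urlparse


def url_to_path(url):
    # Assumption: the indexer stored plain file:// URLs for its documents.
    return unquote(urlparse(url).path)


def fetch(udi, url, ipath):
    # Assumption: the whole file is the document, so udi and ipath are unused.
    with open(url_to_path(url), "rb") as f:
        return f.read()


def makesig(udi, url, ipath):
    # Assumption: file size followed by modification time is an adequate
    # signature for this (hypothetical) indexer.
    st = os.stat(url_to_path(url))
    return f"{st.st_size}{int(st.st_mtime)}".encode()


OPERATIONS = {"fetch": fetch, "makesig": makesig}


def main(argv):
    # argv[1] is the operation named in the backends file entry; it is
    # followed by the three additional arguments: udi, url, ipath.
    if len(argv) < 2 or argv[1] not in OPERATIONS:
        sys.stderr.write("usage: script.py fetch|makesig udi url ipath\n")
        return 2
    # Pad missing trailing arguments with empty strings (ipath may be empty).
    args = (list(argv[2:5]) + ["", "", ""])[:3]
    data = OPERATIONS[argv[1]](*args)
    # The caller expects the result data on stdout.
    sys.stdout.buffer.write(data)
    return 0


# Only act when invoked with arguments, so the module can also be imported.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

With entries like the ones shown earlier, the fetch command line would end up invoking such a script with fetch as its first parameter, followed by the udi, url and ipath values of the document being accessed.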