"Multiple" handlers

If you can program and want to write an execm handler, it should not be too difficult to make sense of one of the existing modules. There is a sample one with many comments, not actually used by Recoll, which would index a text file as one document per line. Look for rcltxtlines.py in the src/filters directory in the Recoll BitBucket repository (the sample not in the distributed release at the moment).

You can also have a look at the slightly more complex rclzip which uses Zip file paths as identifiers (ipath).

execm handlers sometimes need to make a choice for the nature of the ipath elements that they use in communication with the indexer. Here are a few guidelines:

  • Use ASCII or UTF-8 (if the identifier is an integer print it, for example, like printf %d would do).

  • If at all possible, the data should make some kind of sense when printed to a log file to help with debugging.

  • Recoll uses a colon (:) as a separator to store a complex path internally (for deeper embedding). Colons inside the ipath elements output by a handler will be escaped, but would be a bad choice as a handler-specific separator (mostly, again, for debugging issues).

In any case, the main goal is that it should be easy for the handler to extract the target document, given the file name and the ipath element.

execm handlers will also produce a document with a null ipath element. Depending on the type of document, this may have some associated data (e.g. the body of an email message), or none (typical for an archive file). If it is empty, this document will be useful anyway for some operations, as the parent of the actual data documents.