The Recoll source tree has two samples of external indexers in the src/python/samples directory.
rclmbox.py indexes a directory containing mbox folder files. It is not really useful in practice because Recoll can do this by itself, but it exercises most features of the update interface and also has a data access interface. It also generates compound documents with actual ipath values.
rcljoplin.py indexes the main notes SQL table of a Joplin application. Joplin sets an update date attribute for each record in the table, so each note record can be processed as a standalone document (no ipath necessary). The sample has full preview and open support (the latter using a Joplin callback URL, which allows displaying the result note inside the native app), so it could actually be useful for performing a unified search of the Joplin data and the regular Recoll data.
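To illustrate why a per-record update date makes standalone documents easy to handle, here is a minimal sketch that is not taken from rcljoplin.py. It assumes a notes table with id, title, body and updated_time columns, builds a hypothetical unique identifier of the form "joplin:<id>", and uses the update date as a signature to decide which notes need reindexing. The function names and the known_sigs dictionary (standing in for signatures already stored in the index) are illustrative only, and no Recoll API is called.

```python
import sqlite3


def changed_notes(conn, known_sigs):
    """Yield (udi, sig, title, body) for notes whose signature changed.

    known_sigs maps a document identifier to the signature stored at the
    previous indexing pass (a hypothetical stand-in for the index state).
    """
    # Assumed column names; check them against the actual Joplin schema.
    cur = conn.execute(
        "SELECT id, title, body, updated_time FROM notes ORDER BY id")
    for note_id, title, body, updated_time in cur:
        udi = "joplin:" + note_id   # hypothetical identifier format
        sig = str(updated_time)     # the update date acts as the signature
        if known_sigs.get(udi) != sig:
            yield udi, sig, title, body


def demo_db():
    """Build a small in-memory table so the sketch runs without Joplin."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (id TEXT PRIMARY KEY, title TEXT, "
                 "body TEXT, updated_time INT)")
    conn.executemany("INSERT INTO notes VALUES (?, ?, ?, ?)", [
        ("a1", "Groceries", "milk, eggs", 1700000000000),
        ("b2", "Ideas", "index Joplin with Recoll", 1700000500000),
    ])
    return conn
```

Because each note is one record with its own signature, there is no container to unpack, which is why no ipath is needed in this case.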
See the comments inside the scripts for more information.