Introduction

The Recoll Python programming interface can be used both for searching and for creating or updating an index. Bindings exist for Python 2 and Python 3 (as of January 2021, Python 2 support is due to be dropped soon).

The search interface is used in a number of active projects: the Recoll Gnome Shell Search Provider, the Recoll Web UI, and the upmpdcli UPnP Media Server, in addition to many small scripts.

The index update section of the API may be used to create and update Recoll indexes on specific configurations (separate from the ones created by recollindex). The resulting databases can be queried alone, or in conjunction with regular ones, through the GUI or any of the query interfaces.

The search API is modeled on the Python database API version 2.0 specification (PEP 249). Early versions used the version 1.0 specification.
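
Because the search API follows the database API style, a query object can be consumed like a cursor. The sketch below shows one way to walk through results in batches. It assumes that fetchmany() returns an empty sequence once the results are exhausted, as the database API 2.0 specification describes; this should be checked against the Recoll version in use. The iter_batches helper and the FakeCursor class are hypothetical names introduced here for illustration and are not part of Recoll.

```python
def iter_batches(cursor, batch_size=20):
    """Yield rows one by one, fetching batch_size rows at a time."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            # Assumption: an empty sequence signals the end of the results.
            break
        for row in batch:
            yield row


class FakeCursor:
    """Hypothetical stand-in for a query object, for demonstration only."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    def fetchmany(self, size):
        # Return up to `size` rows and advance the position.
        chunk = self._rows[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


rows = list(iter_batches(FakeCursor(["a", "b", "c", "d", "e"]), batch_size=2))
print(rows)
```

With a real index, the object returned by db.query() would presumably take the place of FakeCursor, after a call to its execute() method.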

The recoll package contains two modules:

  • The recoll module contains functions and classes used to query (or update) the index.

  • The rclextract module contains functions and classes used at query time to access document data. The recoll module must be imported before rclextract.

There is a good chance that your system repository has packages for the Recoll Python API, sometimes in a package separate from the main one (maybe named something like python-recoll). Otherwise, refer to the Building from source chapter.

As an introduction, the following small sample runs a query and prints the URL and title of each result. The python/samples source directory contains several examples of Python programming with Recoll, which exercise the extension more completely, especially its data extraction features.

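#!/usr/bin/python3

from recoll import recoll

# Connect to the index of the default configuration.
db = recoll.connect()
query = db.query()
# nres receives the value returned by execute(), the result count.
nres = query.execute("some query")
# Fetch at most the first 20 results.
results = query.fetchmany(20)
for doc in results:
    print("%s %s" % (doc.url, doc.title))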
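
Results do not always carry a title, so a script may want to format them defensively. The following is a minimal sketch under that assumption, not code from the Recoll distribution: describe() and search() are hypothetical helper names, and the SimpleNamespace objects are stand-ins for real result documents so that the formatting can be tried without an index. Only the recoll calls already used in the sample above appear in search().

```python
from types import SimpleNamespace


def describe(doc):
    """Format one result as 'title <url>', tolerating a missing or empty title."""
    title = getattr(doc, "title", None) or "(untitled)"
    url = getattr(doc, "url", None) or ""
    return "%s <%s>" % (title, url)


def search(query_string, limit=20):
    """Run a query against the default index and return formatted lines."""
    # Imported here so that describe() can be used without Recoll installed.
    from recoll import recoll

    db = recoll.connect()
    query = db.query()
    query.execute(query_string)
    return [describe(doc) for doc in query.fetchmany(limit)]


# Stand-in objects (not real Recoll documents), used only to show the format.
samples = [
    SimpleNamespace(url="file:///tmp/a.txt", title="Notes"),
    SimpleNamespace(url="file:///tmp/b.pdf", title=""),
]
lines = [describe(d) for d in samples]
print(lines)
```

On a system with the Recoll Python package installed and an existing index, search("some query") would return the same kind of lines for real results.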

You can also take a look at the source for the Recoll Web UI, the upmpdcli UPnP Media Server, or the Recoll Gnome Shell Search Provider.