Introduction

Recoll versions after 1.11 define a Python programming interface, both for searching and indexing.

The search interface is used by the Recoll Ubuntu Unity Lens and by Recoll WebUI.

The indexing part of the API has seen little use and is more of a proof of concept. In truth, it is still waiting for its killer app.

The search API is modeled on the Python database API specification. It changed in two major ways across Recoll versions:

  • The basis for the Recoll API changed from Python database API version 1.0 (Recoll versions up to 1.18.1) to version 2.0 (Recoll 1.18.2 and later).

  • The recoll module became a package (with an internal recoll module) as of Recoll version 1.19, in order to add more functions. For existing code, this only changes the way the interface must be imported.

We will mostly describe the new API and package structure here. A paragraph at the end of this section will explain a few differences and ways to write code compatible with both versions.
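As a sketch of one way to cope with the package change described in the list above, the hypothetical helper below tries the Recoll 1.19 and later layout first (a recoll package with an internal recoll module), then falls back to the older top-level recoll module. It assumes that both layouts expose connect(), as used in this section's sample, and returns None when neither can be imported by the running interpreter.

```python
import importlib


def import_recoll():
    """Return the Recoll search module, or None if it cannot be imported.

    Hypothetical helper: tries the Recoll 1.19+ package layout first,
    then the older layout where 'recoll' is itself the module.
    """
    try:
        # Recoll 1.19 and later: package 'recoll' with internal module 'recoll'
        from recoll import recoll as module
    except ImportError:
        try:
            # Recoll versions before 1.19: top-level 'recoll' module
            module = importlib.import_module("recoll")
        except ImportError:
            # Not installed for this Python interpreter
            return None
    # Guard against getting a package whose internal module failed to load
    return module if hasattr(module, "connect") else None


rcl = import_recoll()
if rcl is None:
    print("Recoll Python API not available for this interpreter")
```

Calling rcl.connect() on the returned module then works the same way with either layout.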

The Python interface can be found in the source package, under python/recoll.

The python/recoll/ directory contains the usual setup.py script. After configuring the main Recoll code, you can use it to build and install the Python module:

            cd recoll-xxx/python/recoll
            python setup.py build
            python setup.py install
          

As of Recoll 1.19, the module can be compiled for Python 3.

The normal Recoll installer installs the Python 2 API along with the main code. The Python 3 version must be explicitly built and installed, for example by running the setup.py commands above with a Python 3 interpreter.

When installing from a repository, and depending on the distribution, the Python API can sometimes be found in a separate package.

The following small sample runs a query and lists the title and URL of each result. It works with Recoll 1.19 and later. The python/samples source directory contains several more complete examples of Python programming with Recoll, exercising more of the extension and especially its data extraction features.

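          # Makes print() behave the same under Python 2 and Python 3
          from __future__ import print_function
          from recoll import recoll

          # Connect to the index defined by the default configuration
          db = recoll.connect()
          query = db.query()
          # execute() returns the total number of matching documents
          nres = query.execute("some query")
          print("%d results" % nres)
          # Fetch at most the first 20 results
          results = query.fetchmany(20)
          for doc in results:
              print(doc.url, doc.title)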
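The sample stops after the first 20 results. One way to walk through all of them is to call fetchmany() repeatedly until it returns an empty sequence, which is what the Python DB API 2.0 specification prescribes once no more rows are available. The generator below is a minimal sketch under the assumption that Recoll's fetchmany() behaves that way; iter_results and its page_size parameter are hypothetical names. Because it relies only on fetchmany(), it works with any object providing that method.

```python
def iter_results(query, page_size=20):
    """Yield every result of an already executed query, page by page.

    Hypothetical helper. Assumes query.fetchmany(n) returns at most n
    documents and an empty sequence once all results have been consumed.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    while True:
        page = query.fetchmany(page_size)
        if not page:
            # No more results: stop iterating
            break
        for doc in page:
            yield doc
```

With the db and query objects from the sample, after query.execute() has run, the loop `for doc in iter_results(query): print(doc.url, doc.title)` would list every result instead of only the first 20.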