Python update interface

The update methods are part of the recoll module described above. Call connect() with the writable=True parameter to obtain a writable Db object. The following Db object methods are then available.

addOrUpdate(udi, doc, parent_udi=None)

Add or update index data for a given document. The udi string must define a unique id for the document. It is an opaque interface element and is not interpreted inside Recoll. doc is a Doc object, created from the data to be indexed (the main text should be in doc.text). If parent_udi is set, it is the unique identifier of the top-level container (e.g. for the filesystem indexer, this would be the one which is an actual file).
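As an illustration of the parent_udi parameter, here is a minimal sketch which indexes a container and its sub-documents. The function name, the new_doc factory parameter and the udi values are hypothetical. db is assumed to be a writable Db object obtained from connect(), and new_doc is assumed to return an empty Doc object.

```python
def add_container_with_members(db, new_doc, container_udi, container_text, members):
    """Index a top-level container, then its sub-documents.

    db            -- writable Db object (assumed to come from connect())
    new_doc       -- hypothetical factory returning an empty Doc object
    container_udi -- unique id chosen by the caller for the container
    members       -- iterable of (member_udi, member_text) pairs
    """
    # The container itself is a top-level document: no parent_udi.
    top = new_doc()
    top.text = container_text
    db.addOrUpdate(container_udi, top)

    # Each sub-document names the container as its parent, so that a later
    # delete(container_udi) would also remove it.
    for member_udi, member_text in members:
        sub = new_doc()
        sub.text = member_text
        db.addOrUpdate(member_udi, sub, container_udi)
```

The udi values only need to be unique and stable across indexing passes. How they are built is up to the indexer.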

delete(udi)

Purge from the index all data for udi, and for all documents (if any) which have a matching parent_udi.

needUpdate(udi, sig)

Test if the index needs to be updated for the document identified by udi. If this call is to be used, the doc.sig field should contain a signature value when calling addOrUpdate(). The needUpdate() call then compares its parameter value with the stored sig for udi. sig is an opaque value, compared as a string.

The filesystem indexer uses a concatenation of the decimal string values for file size and update time, but a hash of the contents could also be used.
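The two kinds of signature mentioned above could be computed as in the following sketch. The function names are hypothetical, and the exact format (modification time truncated to whole seconds, no separator) is an assumption, not necessarily what the filesystem indexer produces. Since sig is opaque, any string that changes when the document changes will do.

```python
import hashlib
import os


def file_signature(path):
    """Size and modification time, concatenated as decimal strings."""
    st = os.stat(path)
    # Assumption: mtime in whole seconds, no separator between the values.
    return str(st.st_size) + str(int(st.st_mtime))


def content_signature(path):
    """Alternative: a hash of the file contents (SHA-256 chosen arbitrarily)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so that big files are not loaded in memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The first form is cheap because it only needs a stat() call. The second one needs to read the whole file, but it does not depend on file times.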

As a side effect, if the return value is false (the index is up to date), the call will set the existence flag for the document (and for any subdocument defined by its parent_udi), so that a later purge() call will preserve them.

The use of needUpdate() and purge() is optional, and the indexer may use another method for checking the need to reindex or to delete stale entries.

purge()

Delete all documents that were not touched during the just finished indexing pass (since open-for-write). These are the documents for which the needUpdate() call was not performed, indicating that they no longer exist in the primary storage system.
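A complete indexing pass combining needUpdate(), addOrUpdate() and purge() could look like the following sketch. The function name, the new_doc factory and the shape of the items parameter are hypothetical. db is assumed to be a writable Db object obtained from connect(), and new_doc is assumed to return an empty Doc object.

```python
def run_index_pass(db, new_doc, items):
    """Run one indexing pass and return the list of (re)indexed udis.

    db      -- writable Db object (assumed to come from connect())
    new_doc -- hypothetical factory returning an empty Doc object
    items   -- iterable of (udi, sig, read_text) tuples, one for each document
               currently present in the primary storage; read_text is a
               callable returning the text to index
    """
    updated = []
    for udi, sig, read_text in items:
        if not db.needUpdate(udi, sig):
            # Up to date. The call has set the existence flag, so purge()
            # will preserve this document.
            continue
        doc = new_doc()
        doc.text = read_text()  # only read the data when it is needed
        doc.sig = sig           # stored for the next needUpdate() test
        db.addOrUpdate(udi, doc)
        updated.append(udi)

    # Anything in the index that was not seen above is considered stale.
    db.purge()
    return updated
```

Note that purge() should only be called after a full pass. Calling it after a partial one would delete the index data for all documents which were not visited.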