Interface elements

A few elements in the interface are specific and need an explanation.

ipath

This data value (set as a field in the Doc object) is stored along with the URL, but not indexed by Recoll. Its contents are not interpreted by the index layer, and its use is up to the application. For example, the Recoll file system indexer uses the ipath to store the part of the document access path internal to (possibly nested) container documents. In this case, ipath is a vector of access elements: for example, the first element could be a path inside a zip file to an archive member which happens to be an mbox file, and the second element would be the sequential number of the message inside the mbox. url and ipath are returned in every search result and together define the access to the original document. ipath is empty for top-level documents/files (e.g. a PDF document which is a filesystem file). The Recoll GUI knows about the structure of the ipath values used by the filesystem indexer, and uses it for such functions as opening the parent of a given document.
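As an illustration of how a url and ipath pair can locate a document and its parent, here is a minimal sketch. The DocRef class and the representation of the ipath as a tuple of elements are assumptions made for the example; they are not part of the Recoll API, and the actual encoding used by the filesystem indexer is not described here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DocRef:
    """Hypothetical locator: container URL plus internal access path."""
    url: str
    ipath: Tuple[str, ...] = ()  # empty for a top-level file

    def is_top_level(self) -> bool:
        return len(self.ipath) == 0

    def parent(self) -> Optional["DocRef"]:
        """Return the enclosing document, or None for a top-level file."""
        if self.is_top_level():
            return None
        # Dropping the last access element yields the enclosing container.
        return DocRef(self.url, self.ipath[:-1])


# A message inside an mbox which is itself a member of a zip archive.
msg = DocRef("file:///home/me/mail.zip", ("archive/inbox.mbox", "3"))
mbox = msg.parent()   # the mbox member inside the zip
top = mbox.parent()   # the zip file itself, with an empty ipath
```

In this model, the same url is shared by all documents inside a container file, and only the ipath distinguishes them.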

udi

A udi (unique document identifier) identifies a document. Because of limitations inside the index engine, it is restricted in length (to 200 bytes), which is why a regular URI cannot be used. The structure and contents of the udi are defined by the application and are opaque to the index engine. For example, the internal file system indexer uses the complete document path (file path + internal path), truncated to the maximum length, with the suppressed part replaced by a hash value. The udi is not explicit in the query interface (it is used "under the hood" by the rclextract module), but it is an explicit element of the update interface.
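The truncate-and-hash approach described above could be sketched as follows for an external indexer that needs to build its own identifiers. This is not Recoll's actual algorithm: the make_udi name, the "|" separator and the choice of a truncated SHA-256 digest are all assumptions made for the example. Only the 200-byte limit comes from the text.

```python
import hashlib

UDI_MAX_BYTES = 200  # length limit stated in the text
HASH_CHARS = 32      # assumed size of the hash suffix, in hex characters


def make_udi(file_path: str, ipath: str = "",
             max_bytes: int = UDI_MAX_BYTES) -> str:
    """Build a bounded-length identifier from a file path and internal path."""
    full = f"{file_path}|{ipath}".encode("utf-8")
    if len(full) <= max_bytes:
        return full.decode("utf-8")
    # Keep a prefix, taking care not to cut a multi-byte UTF-8 character.
    kept = full[: max_bytes - HASH_CHARS]
    kept = kept.decode("utf-8", errors="ignore").encode("utf-8")
    # Replace the suppressed part with a hash so distinct paths stay distinct.
    suppressed = full[len(kept):]
    digest = hashlib.sha256(suppressed).hexdigest()[:HASH_CHARS]
    return kept.decode("utf-8") + digest


short_udi = make_udi("/home/me/doc.pdf")
long_a = make_udi("/" + "a" * 300, "1")
long_b = make_udi("/" + "a" * 300, "2")
```

Two long paths that differ only in their suppressed part still produce different identifiers, because the hash covers exactly the bytes that were cut off.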

parent_udi

If this attribute is set on a document when entering it in the index, it designates its physical container document. In a multilevel hierarchy, this may not be the immediate parent. parent_udi is optional, but its use by an indexer may simplify index maintenance, as Recoll will automatically delete all children defined by parent_udi == udi when the document designated by udi is destroyed. For example, if a Zip archive contains entries which are themselves containers, like mbox files, all the subdocuments inside the Zip file (mbox, messages, message attachments, etc.) would have the same parent_udi, matching the udi for the Zip file, and all would be destroyed when the Zip file (identified by its udi) is removed from the index. The standard filesystem indexer uses parent_udi.
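The deletion rule can be modelled with a small in-memory stand-in for the index. The ToyIndex class below is purely hypothetical and only mimics the described behavior (deleting a document also deletes every document whose parent_udi equals its udi); it does not use or reproduce the Recoll update interface.

```python
from typing import Dict, Optional, Set


class ToyIndex:
    """Hypothetical in-memory model of udi / parent_udi bookkeeping."""

    def __init__(self) -> None:
        self._parent_of: Dict[str, Optional[str]] = {}

    def add(self, udi: str, parent_udi: Optional[str] = None) -> None:
        self._parent_of[udi] = parent_udi

    def delete(self, udi: str) -> None:
        """Remove a document and all documents naming it as parent_udi."""
        self._parent_of.pop(udi, None)
        # One pass is enough: every subdocument points at the physical
        # container, not at its immediate logical parent.
        children = [u for u, p in self._parent_of.items() if p == udi]
        for child in children:
            del self._parent_of[child]

    def udis(self) -> Set[str]:
        return set(self._parent_of)


index = ToyIndex()
index.add("/data/a.zip|")
index.add("/data/a.zip|inbox.mbox", parent_udi="/data/a.zip|")
index.add("/data/a.zip|inbox.mbox|3", parent_udi="/data/a.zip|")
index.add("/data/b.pdf|")
index.delete("/data/a.zip|")  # also removes the mbox and its message
```

Because the mbox and the message both carry the Zip file's udi as their parent_udi, no recursive walk of the hierarchy is needed when the archive disappears.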

Stored and indexed fields

The fields file inside the Recoll configuration defines which document fields are either indexed (searchable), stored (retrievable with search results), or both. Apart from a few standard/internal fields, only the stored fields are retrievable through the Python search interface.
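The difference between indexed and stored fields can be pictured with a toy model. The field names and the two sets below are invented for the example and do not reflect the contents of any actual fields file; the functions are not part of the Recoll Python interface.

```python
from typing import Dict, Set

# Hypothetical configuration: which fields are searchable, which are kept.
INDEXED: Set[str] = {"title", "author", "keywords"}
STORED: Set[str] = {"title", "author", "dept"}


def searchable_fields(doc: Dict[str, str]) -> Set[str]:
    """Fields of this document that a query could match on."""
    return set(doc) & INDEXED


def result_fields(doc: Dict[str, str]) -> Dict[str, str]:
    """Field values that would come back with a search result."""
    return {name: value for name, value in doc.items() if name in STORED}


doc = {"title": "Budget", "author": "Ann", "keywords": "q3", "dept": "Finance"}
matchable = searchable_fields(doc)  # title, author, keywords
returned = result_fields(doc)       # title, author, dept
```

Here keywords can be searched but is not returned with results, while dept is returned but cannot be searched.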