A few elements in the interface are specific and need an explanation.
- ipath
This data value (set as a field in the Doc object) is stored, along with the URL, but not indexed by Recoll. Its contents are not interpreted by the index layer, and its use is up to the application. For example, the Recoll file system indexer uses the ipath to store the part of the document access path that is internal to (possibly nested) container documents. In this case, ipath is a vector of access elements (e.g., the first element could be a path inside a zip file to an archive member which happens to be an mbox file, the second element would be the message sequence number inside the mbox, etc.). The url and ipath values are returned in every search result and together define the access to the original document. The ipath is empty for top-level documents/files (e.g. a PDF document which is a filesystem file). The Recoll GUI knows about the structure of the ipath values used by the filesystem indexer, and uses it for such functions as opening the parent of a given document.
- udi
A udi (unique document identifier) identifies a document. Because of limitations inside the index engine, it is restricted in length (to 200 bytes), which is why a regular URI cannot be used. The structure and contents of the udi are defined by the application and opaque to the index engine. For example, the internal file system indexer uses the complete document path (file path + internal path), truncated to length, with the suppressed part replaced by a hash value. The udi is not explicit in the query interface (it is used "under the hood" by the rclextract module), but it is an explicit element of the update interface.
- parent_udi
If this attribute is set on a document when entering it in the index, it designates its physical container document. In a multilevel hierarchy, this may not be the immediate parent. If the indexer uses the purge() method, then the use of parent_udi is mandatory for subdocuments. Otherwise it is optional, but its use by an indexer may simplify index maintenance, as Recoll will automatically delete all children defined by parent_udi == udi when the document designated by udi is destroyed. For example, if a Zip archive contains entries which are themselves containers, like mbox files, all the subdocuments inside the Zip file (mbox, messages, message attachments, etc.) would have the same parent_udi, matching the udi for the Zip file, and all would be destroyed when the Zip file (identified by its udi) is removed from the index.
- Stored and indexed fields
The fields file inside the Recoll configuration defines which document fields are either indexed (searchable), stored (retrievable with search results), or both. Apart from a few standard/internal fields, only the stored fields are retrievable through the Python search interface.
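The ipath entry above describes the filesystem indexer's ipath as a vector of access elements, and notes that the GUI relies on that structure to open the parent of a document. The Python sketch below models only that idea. The DocRef class, its parent() method and the example paths are hypothetical; in the real Python interface ipath is a field of the Doc object whose string encoding is chosen by the indexer, and that encoding is not modelled here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DocRef:
    """Hypothetical result reference: a url plus an ipath modelled as a
    tuple of access elements (empty for a top-level file)."""
    url: str
    ipath: Tuple[str, ...] = ()

    def is_top_level(self) -> bool:
        # A filesystem file that is not inside any container.
        return not self.ipath

    def parent(self) -> Optional["DocRef"]:
        """Drop the last access element; None for a top-level file."""
        if self.is_top_level():
            return None
        return DocRef(self.url, self.ipath[:-1])
```

Walking parent() from a message stored in an mbox inside a Zip archive first yields the mbox member, then the archive file itself (empty ipath), then None.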
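The udi entry above says the file system indexer builds its udi from the complete document path, truncated to the 200-byte limit, with the suppressed part replaced by a hash value. A minimal Python sketch of that scheme follows. The make_udi name, the "|" separator between file path and internal path, and the choice of MD5 over the removed tail are all assumptions for illustration, not Recoll's actual implementation.

```python
import hashlib

MAX_UDI_BYTES = 200  # length limit stated for the udi
DIGEST_CHARS = 32    # an MD5 hex digest is 32 ASCII characters


def make_udi(path: str, ipath: str = "", max_len: int = MAX_UDI_BYTES) -> str:
    """Hypothetical udi builder: file path + internal path, truncated to
    max_len bytes, with the suppressed tail replaced by its MD5 hash."""
    # Assumed separator between the file path and the internal path.
    full = (f"{path}|{ipath}" if ipath else path).encode("utf-8")
    if len(full) <= max_len:
        return full.decode("utf-8")
    cut = max_len - DIGEST_CHARS
    # Step back so the cut does not fall inside a multibyte UTF-8 character.
    while cut > 0 and (full[cut] & 0xC0) == 0x80:
        cut -= 1
    head, tail = full[:cut], full[cut:]
    return head.decode("utf-8") + hashlib.md5(tail).hexdigest()
```

Hashing only the removed tail keeps a readable prefix of the path, while still distinguishing long paths that share the same first bytes.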
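To make the parent_udi deletion rule concrete, the following Python sketch uses a hypothetical in-memory ToyIndex in place of a real Recoll index. Its add() and delete() methods are invented for illustration and are not the Recoll update interface; the "|"-joined udi strings are also only illustrative. As in the Zip example, every subdocument, however deeply nested, records the archive's udi as its parent_udi.

```python
from typing import Dict, Optional


class ToyIndex:
    """Hypothetical in-memory stand-in for an index, used only to show
    how parent_udi drives the deletion of subdocuments."""

    def __init__(self) -> None:
        # Maps each udi to its parent_udi (None for top-level documents).
        self.parent_of: Dict[str, Optional[str]] = {}

    def add(self, udi: str, parent_udi: Optional[str] = None) -> None:
        self.parent_of[udi] = parent_udi

    def delete(self, udi: str) -> None:
        # Remove the document itself, then every entry whose
        # parent_udi == udi (all levels point at the top container).
        self.parent_of.pop(udi, None)
        children = [u for u, p in self.parent_of.items() if p == udi]
        for child in children:
            del self.parent_of[child]
```

A single flat scan is enough here precisely because parent_udi designates the physical container rather than the immediate parent.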
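The Python sketch below reads a small fields-style fragment with the standard configparser module and reports, for each field, whether it is indexed, stored, or both. It assumes the [prefixes] section (indexed fields and their term prefixes) and [stored] section names used by Recoll's default fields file; check your own configuration. The fields myfield and mycomment and the XYMYF prefix are hypothetical, and real fields files contain other sections and constructs that this simple reader does not handle.

```python
import configparser
from typing import Dict, Set

# Hypothetical fragment modelled on a Recoll "fields" file.
FIELDS_TEXT = """
[prefixes]
author = A
myfield = XYMYF

[stored]
author =
mycomment =
"""


def field_roles(text: str) -> Dict[str, Set[str]]:
    """Map each field name to {"indexed"}, {"stored"} or both."""
    cp = configparser.ConfigParser()
    cp.read_string(text)
    roles: Dict[str, Set[str]] = {}
    for section, role in (("prefixes", "indexed"), ("stored", "stored")):
        if cp.has_section(section):
            for name in cp.options(section):
                roles.setdefault(name, set()).add(role)
    return roles
```

With this fragment, mycomment would be returned with search results but not be searchable, while myfield would be searchable but, not being a standard field, not retrievable through the Python search interface.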