Indexing is the process by which the set of documents is
analyzed and the data entered into the database. Recoll
indexing is normally incremental: documents will only be
processed if they have been modified since the last run. On
the first execution, all documents will need processing. A
full index build can be forced later by specifying an option
to the indexing command (recollindex -z or -Z).
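As an illustration, the alternatives above look like this on the command line. This is a minimal sketch: the comments on how -z and -Z differ summarize the recollindex manual page and should be checked against the installed version.

```shell
# Run one of these, not all three in sequence.

# Default: incremental pass, only modified documents are processed.
recollindex

# Full rebuild: reset the index data, then index everything again.
recollindex -z

# Full rebuild in place: keep the existing index, but treat every
# document as changed.
recollindex -Z
```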
recollindex skips files that caused an error during a previous
pass. This is a performance optimization: use the -k command line
option to retry the failed files, for example after updating an
input handler.
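For example, a retry pass might be run as follows. This assumes the cause of the earlier failures, such as a missing or outdated helper program used by an input handler, has already been fixed.

```shell
# Incremental pass that also retries files which failed previously.
recollindex -k
```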
The following sections give an overview of different aspects of the indexing process and its configuration, with links to detailed sections.
Depending on your data, temporary files may be needed during
indexing, some of them possibly quite big. You can use the
RECOLL_TMPDIR or TMPDIR environment variables to determine where
they are created (the default is to use /tmp). Using TMPDIR has
the nice property that it may also be taken into account by
auxiliary commands executed by recollindex.
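A minimal sketch of redirecting the temporary files before an indexing run. The directory name and the RECOLL_SCRATCH variable are hypothetical, chosen only for illustration. RECOLL_TMPDIR could be exported instead of TMPDIR if only Recoll itself should be affected.

```shell
#!/bin/sh
# Hypothetical scratch directory on a volume with enough free space.
# RECOLL_SCRATCH is an illustrative name, not a Recoll setting.
scratch="${RECOLL_SCRATCH:-$HOME/recoll-tmp}"
mkdir -p "$scratch"

# TMPDIR may also be taken into account by auxiliary commands
# started by recollindex.
export TMPDIR="$scratch"

# Run the indexer only if it is installed on this machine.
if command -v recollindex >/dev/null 2>&1; then
    recollindex
fi
```

The setting lasts only for the shell session or script in which it is exported, so a scheduled indexing job would need the same export in its own environment.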