Parameters affecting what documents we index

topdirs

Space-separated list of files or directories to recursively index. Defaults to ~ (indexes $HOME). You can use symbolic links in the list; they will be followed independently of the value of the followLinks variable.

monitordirs

(1.25) Space-separated list of files or directories to monitor for updates. When running the real-time indexer, this allows monitoring only a subset of the whole indexed area. The elements must be included in the tree defined by the 'topdirs' members.
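
As an illustration, a recoll.conf fragment along the following lines would index two directory trees and one single file, while letting the real-time indexer watch only one subtree. All paths are hypothetical.

```
# Hypothetical paths; adjust to your own layout.
topdirs = ~/Documents ~/projects /srv/shared/notes.txt
# Must lie inside the tree defined by topdirs.
monitordirs = ~/Documents/current
```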

skippedNames

Files and directories which should be ignored. Whitespace-separated list of simple wildcard patterns (not paths: they must not contain '/'), which are tested against file and directory names. The list in the default configuration does not exclude hidden directories (names beginning with a dot), which means that it may index quite a few things that you do not want. On the other hand, email user agents like Thunderbird usually store messages in hidden directories, and you probably want these indexed. One possible solution is to have ".*" in "skippedNames", and add things like "~/.thunderbird" "~/.evolution" to "topdirs". Not even the file names are indexed for patterns in this list; see the "noContentSuffixes" variable for an alternative approach which indexes the file names. Can be redefined for any subtree.

skippedNames-

List of name patterns to remove from the default skippedNames list.

skippedNames+

List of name patterns to add to the default skippedNames list.
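
A minimal sketch of the hidden-directory approach described for skippedNames, combined with the two list-adjusting variants. It assumes that "tmp" is present in the default skippedNames list, which may not hold for your version.

```
# Skip hidden files and directories everywhere.
skippedNames+ = .*
# Still index mail stored in hidden directories, through explicit topdirs entries.
topdirs = ~ ~/.thunderbird ~/.evolution
# Stop skipping "tmp" (assumed here to be in the default list).
skippedNames- = tmp
```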

noContentSuffixes

List of name endings (not necessarily dot-separated suffixes) for which we don't try MIME type identification, and don't uncompress or index content. Only the names will be indexed. This complements the now obsolete recoll_noindex list from the mimemap file, which will go away in a future release (the move from mimemap to recoll.conf allows editing the list through the GUI). This differs from skippedNames in that these are name ending matches only (not wildcard patterns), and the file name itself gets indexed normally. Can be redefined for subdirectories.

noContentSuffixes-

List of name endings to remove from the default noContentSuffixes list.

noContentSuffixes+

List of name endings to add to the default noContentSuffixes list.
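
For example, the following fragment would adjust the default list in both directions. The endings are hypothetical, and ".log" is only assumed to be part of the default noContentSuffixes list.

```
# Index only the names of files with these endings (no dot required).
noContentSuffixes+ = .iso .img -backup
# Process ".log" files normally again (assumed to be in the default list).
noContentSuffixes- = .log
```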

skippedPaths

Paths we should not go into. Space-separated list of wildcard expressions for filesystem paths. Can contain files and directories. The database and configuration directories will automatically be added. The expressions are matched using 'fnmatch(3)' with the FNM_PATHNAME flag set by default. This means that '/' characters must be matched explicitly. You can set 'skippedPathsFnmPathname' to 0 to disable the use of FNM_PATHNAME (meaning that '/*/dir3' will match '/dir1/dir2/dir3'). The default value contains the usual mount point for removable media to remind you that it is a bad idea to have Recoll work on these (esp. with the monitor: media gets indexed on mount, all data gets erased on unmount). Explicitly adding '/media/xxx' to the topdirs will override this.

skippedPathsFnmPathname

Set to 0 to override use of FNM_PATHNAME for matching skipped paths.
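
The effect of FNM_PATHNAME can be sketched with a hypothetical pattern. The paths are invented for illustration, and /media is listed on the assumption that it is the removable media mount point on your system.

```
# With FNM_PATHNAME in effect (the default), '*' does not match '/':
# this skips /home/alice/scratch but not /home/alice/work/scratch.
skippedPaths = /media /home/*/scratch

# Uncommenting the next line would let '*' match across '/',
# so that /home/*/scratch would also cover /home/alice/work/scratch.
#skippedPathsFnmPathname = 0
```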

daemSkippedPaths

Equivalent of skippedPaths specific to real-time indexing. This enables having parts of the tree which are initially indexed but not monitored. If daemSkippedPaths is not set, the daemon uses skippedPaths.
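
As a sketch, assuming a hypothetical /home/alice/archive directory that rarely changes, the following would have it indexed initially but left out of monitoring:

```
# Used for normal indexing.
skippedPaths = /media
# Used by the real-time indexer: the archive tree is not monitored.
daemSkippedPaths = /media /home/alice/archive
```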

zipSkippedNames

Space-separated list of wildcard expressions for names that should be ignored inside zip archives. This is used directly by the zip handler, and has a function similar to skippedNames, but works independently. Can be redefined for subdirectories. Supported by recoll 1.20 and newer. See https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html
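
A possible use, with hypothetical patterns and a hypothetical subdirectory, is to ignore compiled files inside all zip archives and additionally ignore text members below one directory:

```
# Applies everywhere.
zipSkippedNames = *.class *.o

# Redefinition for one subtree (hypothetical path).
[/home/alice/downloads]
zipSkippedNames = *.class *.o *.txt
```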

followLinks

Follow symbolic links during indexing. The default is to ignore symbolic links to avoid multiple indexing of linked files. No effort is made to avoid duplication when this option is set to true. This option can be set individually for each of the 'topdirs' members by using sections. It cannot be changed below the 'topdirs' level. Links in the 'topdirs' list itself are always followed.
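
A minimal sketch of a per-topdirs setting, assuming that boolean values are written as 0 and 1 and that /usr/share/man is one of the indexed trees:

```
topdirs = ~ /usr/share/man
# Global setting: ignore symbolic links.
followLinks = 0

# Follow links only inside this topdirs member.
[/usr/share/man]
followLinks = 1
```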

indexedmimetypes

Restrictive list of indexed MIME types. Normally not set (in which case all supported types are indexed). If it is set, only the types from the list will have their contents indexed. The names will be indexed anyway if indexallfilenames is set (default). MIME type names should be taken from the mimemap file (the values may be different from xdg-mime or file -i output in some cases). Can be redefined for subtrees.

excludedmimetypes

List of excluded MIME types. Lets you exclude some types from indexing. MIME type names should be taken from the mimemap file (the values may be different from xdg-mime or file -i output in some cases). Can be redefined for subtrees.
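
The two lists express opposite strategies, which the fragment below presents as alternatives. The type names are examples only and should be checked against your mimemap file.

```
# Alternative 1: restrict content indexing to a few types.
indexedmimetypes = text/plain text/html application/pdf

# Alternative 2 (instead of the above): index all supported types except these.
#excludedmimetypes = image/jpeg application/x-tar
```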

nomd5mimetypes

Don't compute md5 for these types. md5 checksums are used only for deduplicating results, and can be very expensive to compute on multimedia or other big files. This list lets you turn off md5 computation for selected types. It is global (no redefinition for subtrees). At the moment, it only has an effect for external handlers (exec and execm). The file types can be specified by listing either MIME types (e.g. audio/mpeg) or handler names (e.g. rclaudio).

compressedfilemaxkbs

Size limit for compressed files. We need to decompress these in a temporary directory for identification, which can be wasteful in some cases; this parameter limits the waste. A negative value means no limit. 0 results in no processing of any compressed file. Default 50 MB.

textfilemaxmbs

Size limit for text files. Mostly for skipping monster logs. Default 20 MB.
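
For illustration, the two size limits could be tightened as follows. The values are arbitrary, and the units are inferred from the parameter names (kilobytes for compressedfilemaxkbs, megabytes for textfilemaxmbs).

```
# Do not process compressed files larger than about 20 MB.
compressedfilemaxkbs = 20000
# Do not index text files larger than 10 MB.
textfilemaxmbs = 10
```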

indexallfilenames

Index the file names of unprocessed files. Index the names of files the contents of which we don't index because of an excluded or unsupported MIME type.

usesystemfilecommand

Use a system command for file MIME type guessing as a final step in file type identification. This is generally useful, but will usually cause the indexing of many bogus 'text' files. See 'systemfilecommand' for the command used.

systemfilecommand

Command used to guess MIME types if the internal methods fail. This should be a "file -i" workalike. The file path will be added as a last parameter to the command line. 'xdg-mime' works better than the traditional 'file' command, and is now the configured default (with a hard-coded fallback to 'file').
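
As a sketch, switching back to the traditional 'file' command might look like this, assuming boolean values are written as 0 and 1:

```
usesystemfilecommand = 1
# The path of the file being identified is appended as the last argument.
systemfilecommand = file -i
```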

processwebqueue

Decide if we process the Web queue. The queue is a directory where the Recoll Web browser plugins store copies of visited pages.

textfilepagekbs

Page size for text files. If this is set, text/plain files will be divided into documents of approximately this size. This reduces memory usage at index time and helps with loading data in the preview window at query time. Particularly useful with very big files, such as application or system logs. Also see textfilemaxmbs and compressedfilemaxkbs.

membermaxkbs

Size limit for archive members. This is passed to the filters in the environment as RECOLL_FILTER_MAXMEMBERKB.
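
A hypothetical combination of textfilepagekbs and membermaxkbs is shown below. The values are arbitrary, and the kilobyte units are inferred from the parameter names.

```
# Split text/plain files into documents of roughly 1000 KB.
textfilepagekbs = 1000
# Limit archive members to about 50 MB; handed to the filters
# in the environment as RECOLL_FILTER_MAXMEMBERKB.
membermaxkbs = 50000
```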