Parameters affecting what documents we index:

topdirs

Specifies the list of directories or files to index (directories are indexed recursively). Elements of this list may be symbolic links. Symbolic links found below the top elements are not followed by default; see the followLinks option.

skippedNames

A space-separated list of wildcard patterns for names of files or directories that should be completely ignored. The list defined in the default file is:

skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
               *~ .beagle .git .hg .bzr loop.ps .xsession-errors \
               .recoll* xapiandb recollrc recoll.conf

The list can be redefined at any sub-directory in the indexed area.
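Subdirectory redefinition works through sections of the configuration file named after a directory, for example a [/home/me/src] section. The Python sketch below models the lookup rule as we understand it: the closest enclosing section that sets the parameter wins, otherwise the top-level value applies. The dictionary layout and the value_for_path() helper are hypothetical illustrations, not Recoll's own configuration code.

```python
import os.path

def value_for_path(param, path, config):
    """Return the value of param that applies to the directory path.

    config is a hypothetical in-memory model of the configuration file:
    key "" holds the top-level values, every other key is the directory
    path of a section.
    """
    path = os.path.normpath(path)
    while True:
        section = config.get(path)
        if section is not None and param in section:
            return section[param]  # closest enclosing section wins
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return config.get("", {}).get(param)  # fall back to the top-level value
```

With a top-level skippedNames and a different value in a /home/me/src section, every directory below /home/me/src sees the section's value and everything else sees the top-level one.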

The top-level directories are not affected by this list: a directory listed in topdirs is indexed even if its name matches one of the patterns.

The list in the default configuration does not exclude hidden directories (names beginning with a dot), which means that it may index quite a few things that you do not want. On the other hand, email user agents like Thunderbird usually store messages in hidden directories, and you probably want these indexed. One possible solution is to add .* to skippedNames, and to add directories such as ~/.thunderbird or ~/.evolution to topdirs.
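A minimal Python sketch of how skippedNames filtering combines with topdirs, including the hidden-directory setup just described. It assumes case-sensitive, fnmatch-style matching on bare names (not full paths); walk_indexable() is a hypothetical helper, not part of Recoll.

```python
import fnmatch
import os

def walk_indexable(topdirs, skipped_names):
    """Yield the file paths that survive skippedNames filtering.

    Each pattern is matched against a bare file or directory name. The
    topdirs elements themselves are never tested, so a top directory such
    as ~/.thunderbird is walked even when ".*" is in the list.
    """
    patterns = skipped_names.split()

    def skipped(name):
        return any(fnmatch.fnmatchcase(name, p) for p in patterns)

    for top in topdirs:
        top = os.path.expanduser(top)
        if os.path.isfile(top):  # topdirs may also list single files
            yield top
            continue
        for dirpath, dirnames, filenames in os.walk(top):
            # Prune in place: os.walk will not descend into removed entries,
            # so nothing below a skipped directory is even looked at.
            dirnames[:] = [d for d in dirnames if not skipped(d)]
            for name in filenames:
                if not skipped(name):
                    yield os.path.join(dirpath, name)
```

With skippedNames set to ".* *~" and topdirs set to "~ ~/.thunderbird", hidden directories such as ~/.cache and editor backups are dropped, while the contents of ~/.thunderbird are still reached through its own topdirs entry.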

Files and directories matching this list are not indexed at all, not even their names. See the noContentSuffixes variable for an alternative approach which indexes the file names.

noContentSuffixes

This is a list of file name endings (not wildcard expressions, nor dot-delimited suffixes). Only the names of matching files will be indexed (no attempt at MIME type identification, no decompression, no content indexing). This can be redefined for subdirectories, and edited from the GUI. The default value is:

noContentSuffixes = .md5 .map \
       .o .lib .dll .a .sys .exe .com \
       .mpp .mpt .vsd \
       .img .img.gz .img.bz2 .img.xz .image .image.gz .image.bz2 .image.xz \
       .dat .bak .rdf .log.gz .log .db .msf .pid \
       ,v ~ #
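The difference between plain endings and wildcard patterns can be shown with a simple string-suffix test. This Python sketch assumes a case-sensitive comparison; is_name_only() is a hypothetical name, not a Recoll function.

```python
DEFAULT_NO_CONTENT_SUFFIXES = (
    ".md5 .map .o .lib .dll .a .sys .exe .com .mpp .mpt .vsd "
    ".img .img.gz .img.bz2 .img.xz .image .image.gz .image.bz2 .image.xz "
    ".dat .bak .rdf .log.gz .log .db .msf .pid ,v ~ #"
)

def is_name_only(filename, no_content_suffixes=DEFAULT_NO_CONTENT_SUFFIXES):
    """True if only the file name should be indexed.

    Entries are literal endings, not wildcards and not necessarily
    dot-delimited suffixes: ",v" matches RCS files such as "foo.c,v",
    and "~" matches editor backups such as "notes~".
    """
    return filename.endswith(tuple(no_content_suffixes.split()))
```

Because entries are literal endings, "disk.img.gz" matches through ".img.gz", while "report.tar.gz" does not match anything in the default list.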

skippedPaths and daemSkippedPaths

A space-separated list of patterns for paths of files or directories that should be skipped. There is no default in the sample configuration file, but the code always adds the configuration and database directories to it.

skippedPaths is used by both batch and real-time indexing. daemSkippedPaths can be used to specify things that should be indexed at startup, but not monitored.

Example of use for skipping text files only in a specific directory:

skippedPaths = ~/somedir/*.txt
              
skippedPathsFnmPathname

The values in the *skippedPaths variables are matched by default with fnmatch(3), with the FNM_PATHNAME flag. This means that '/' characters must be matched explicitly. You can set skippedPathsFnmPathname to 0 to disable the use of FNM_PATHNAME (meaning that /*/dir3 will then match /dir1/dir2/dir3).
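Python's fnmatch module has no FNM_PATHNAME flag (its * also matches /), so this sketch emulates the flag by matching the path one component at a time. path_matches() is a hypothetical helper; the emulation ignores bracket expressions that contain '/', and tilde expansion of the pattern is assumed from the ~/somedir example above.

```python
import fnmatch
import os.path

def path_matches(path, pattern, fnm_pathname=True):
    """Match a full path against one skippedPaths pattern.

    With fnm_pathname=True, wildcards can never cross a '/': the pattern
    and the path must have the same number of components, and each
    component is matched separately.
    """
    pattern = os.path.expanduser(pattern)
    if not fnm_pathname:
        return fnmatch.fnmatchcase(path, pattern)  # '*' may span '/'
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    return len(pattern_parts) == len(path_parts) and all(
        fnmatch.fnmatchcase(part, pat)
        for part, pat in zip(path_parts, pattern_parts))
```

With the default setting, /*/dir3 does not match /dir1/dir2/dir3, and ~/somedir/*.txt skips ~/somedir/a.txt but not ~/somedir/sub/a.txt; with skippedPathsFnmPathname set to 0, both of those deeper paths match.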

zipSkippedNames

A space-separated list of patterns for names of files or directories that should be ignored inside zip archives. This is used directly by the zip handler, and has a function similar to skippedNames, but works independently. Can be redefined for filesystem subdirectories. For versions up to 1.19, you will need to update the Zip handler and install a supplementary Python module. The details are described on the Recoll wiki.
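A rough sketch of applying zipSkippedNames to archive members with Python's zipfile module. It assumes that a member is dropped when its own name, or any directory name on its path inside the archive, matches a pattern; zip_members_to_index() is hypothetical and is not the code of the Recoll zip handler.

```python
import fnmatch
import zipfile

def zip_members_to_index(zip_file, zip_skipped_names):
    """List the zip members that survive zipSkippedNames filtering.

    zip_file may be a path or a binary file object. Patterns are
    space-separated, as in the configuration file.
    """
    patterns = zip_skipped_names.split()
    kept = []
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no content of their own
            parts = info.filename.split("/")
            if not any(fnmatch.fnmatchcase(part, pat)
                       for part in parts for pat in patterns):
                kept.append(info.filename)
    return kept
```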

followLinks

Specifies whether the indexer should follow symbolic links while walking the file tree. The default is to ignore symbolic links to avoid multiple indexing of linked files. No effort is made to avoid duplication when this option is set to true. This option can be set individually for each of the topdirs members by using sections. It cannot be changed below the topdirs level.
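The behaviour maps closely onto Python's os.walk(), which opens its starting path even when that path is a symbolic link, but only descends into linked subdirectories when followlinks is true. files_under() is a hypothetical sketch for one topdirs element, not Recoll's tree walker.

```python
import os

def files_under(top, follow_links=False):
    """Collect the files below one topdirs element.

    The element itself may be a symbolic link: os.walk opens it like any
    directory. Below it, symbolic links are ignored unless follow_links
    is True, in which case linked directories are descended into and
    linked files are kept. Nothing prevents the same file from being
    reached twice through different links.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(top, followlinks=follow_links):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if follow_links or not os.path.islink(path):
                found.append(path)
    return sorted(found)
```

If follow_links is true and a link target is also under another topdirs element, the same file is collected twice, which is the duplication described above.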

indexedmimetypes

Recoll normally indexes any file which it knows how to read. This list lets you restrict the indexed MIME types to what you specify. If the variable is unspecified or the list empty (the default), all supported types are processed. Can be redefined for subdirectories.

excludedmimetypes

This list lets you exclude some MIME types from indexing. Can be redefined for subdirectories.
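One plausible way the two MIME type lists combine, written as a small Python predicate. The precedence shown (excludedmimetypes applied after indexedmimetypes) is an assumption, and mime_type_wanted() is a hypothetical name. Both lists are space-separated in the configuration file, for example indexedmimetypes = text/plain text/html.

```python
def mime_type_wanted(mime, indexedmimetypes="", excludedmimetypes=""):
    """Decide whether a document of the given MIME type gets indexed.

    Both arguments are space-separated lists, as in the configuration file.
    An empty indexedmimetypes means "all supported types".
    """
    indexed = indexedmimetypes.split()
    if indexed and mime not in indexed:
        return False  # a restriction list exists and this type is not on it
    return mime not in excludedmimetypes.split()
```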

compressedfilemaxkbs

Size limit, in kilobytes, for compressed (.gz or .bz2) files. These need to be decompressed in a temporary directory for identification, which can be very wasteful if 'uninteresting' big compressed files are present. A negative value means no limit, and 0 means that no compressed file is processed at all. Defaults to -1.
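The three cases of compressedfilemaxkbs, as a Python sketch. It assumes the limit is compared with the size of the compressed file itself and that a kilobyte is 1024 bytes; should_decompress() is hypothetical.

```python
def should_decompress(compressed_size_bytes, compressedfilemaxkbs=-1):
    """Apply the compressedfilemaxkbs rule to one compressed file."""
    if compressedfilemaxkbs < 0:
        return True   # negative: no limit
    if compressedfilemaxkbs == 0:
        return False  # zero: never process compressed files
    return compressed_size_bytes <= compressedfilemaxkbs * 1024
```

The explicit zero branch matters: without it, an empty compressed file would pass a "size <= 0" test even though 0 is meant to disable compressed file processing entirely.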

textfilemaxmbs

Maximum size, in megabytes, for text files. Very big text files are often uninteresting logs. Set to -1 to disable the limit (default: 20MB).

textfilepagekbs

If set to a value other than -1, text files will be indexed as multiple documents of the given page size, in kilobytes. This may be useful if you do want to index very big text files, as it both reduces memory usage at index time and helps with loading data into the preview window. A size of a few megabytes would seem reasonable (default: 1MB).
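How textfilemaxmbs and textfilepagekbs would act on the bytes of one text file, sketched in Python. text_pages() is hypothetical; it assumes binary units (1024) and cuts at raw byte offsets, which could split a multi-byte character, so it only illustrates the size arithmetic.

```python
def text_pages(data, page_kbs, max_mbs):
    """Apply textfilemaxmbs and textfilepagekbs to the bytes of one text file.

    Returns [] when the file is over max_mbs megabytes (-1 disables the
    check), [data] when page_kbs is -1, and otherwise consecutive pages of
    at most page_kbs kilobytes, each of which would become one document.
    """
    if max_mbs != -1 and len(data) > max_mbs * 1024 * 1024:
        return []  # too big: skipped entirely
    if page_kbs == -1:
        return [data]  # no paging: the whole file is a single document
    size = page_kbs * 1024
    pages = [data[i:i + size] for i in range(0, len(data), size)]
    return pages or [data]  # an empty file still yields one (empty) page
```

To index a very large log in pieces, textfilemaxmbs would be set to -1 so that the size check does not discard the file before paging.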

membermaxkbs

This defines the maximum size in kilobytes for an archive member (zip, tar or rar at the moment). Bigger entries will be skipped.
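A sketch of the membermaxkbs check for zip archives using Python's zipfile module. It assumes the uncompressed member size is what gets compared and that a kilobyte is 1024 bytes; oversized_members() is hypothetical, and tar and rar archives are not covered.

```python
import zipfile

def oversized_members(zip_file, membermaxkbs):
    """Return the names of zip members larger than membermaxkbs kilobytes.

    zip_file may be a path or a binary file object. ZipInfo.file_size is
    the uncompressed size, i.e. what would have to be extracted.
    """
    limit = membermaxkbs * 1024
    with zipfile.ZipFile(zip_file) as zf:
        return [info.filename for info in zf.infolist()
                if not info.is_dir() and info.file_size > limit]
```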

indexallfilenames

Recoll indexes file names in a special section of the database to allow specific file name searches using wildcards. This parameter decides whether file name indexing is performed only for files with MIME types that would qualify them for full-text indexing, or for all files inside the selected subtrees, independently of MIME type.

usesystemfilecommand

Decides whether to execute a system command (file -i by default) as a final step in determining the MIME type of a file (the main procedure uses suffix associations as defined in the mimemap file). This can be useful for files with suffix-less names, but it will also cause the indexing of many bogus "text" files.

systemfilecommand

Command to use for MIME type determination if usesystemfilecommand is set. Recent versions of xdg-mime sometimes work better than file.
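The two-stage MIME type determination, sketched in Python. The standard mimetypes module stands in for Recoll's mimemap suffix table, and guess_mime() is a hypothetical helper. The fallback runs the configured command with the file path appended and keeps the text after the last ': ' up to any ';', which is the shape of file -i output.

```python
import mimetypes
import shlex
import subprocess

def guess_mime(path, use_system_command=False, system_command="file -i"):
    """Return a MIME type for path, or None if nothing could be determined.

    Stage 1: suffix lookup (mimetypes stands in for the mimemap file).
    Stage 2, only if enabled and stage 1 failed: run the external command
    with the path appended and parse "name: type/subtype; params".
    """
    mime, _encoding = mimetypes.guess_type(path)
    if mime is not None or not use_system_command:
        return mime
    result = subprocess.run(shlex.split(system_command) + [path],
                            capture_output=True, text=True, check=False)
    answer = result.stdout.rsplit(": ", 1)[-1].split(";")[0].strip()
    return answer or None
```

The same parsing should also accept a bare MIME type on its own line, which is the kind of answer xdg-mime query filetype gives.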

processwebqueue

If this is set, process the directory where Web browser plugins copy visited pages for indexing.

webqueuedir

The path to the web indexing queue. This is hard-coded in the Firefox plugin as ~/.recollweb/ToIndex, so there should be no need to change it.