Parameters affecting how we generate terms

indexStripChars

Decides whether character case and diacritics are stored in the index. If they are, searches sensitive to case and diacritics can be performed, but the index will be bigger, and some marginal oddities may occasionally occur. The default is a stripped index. When several indexes are used for a search, this parameter must have the same value for all of them. Changing the value requires an index reset.
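
As a minimal sketch, assuming that 0 is the value which keeps case and diacritics (the default being a stripped index), a configuration asking for an unstripped index could look like this:

```ini
# Keep character case and diacritics in the index.
# Assumption: 0 disables stripping. The index will be bigger,
# and it must be reset after changing this value.
indexStripChars = 0
```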

nonumbers

Decides if terms will be generated for numbers. For example, "123", "1.5e6" and "192.168.1.4" would not be indexed if nonumbers is set ("value123" would still be). Numbers are often quite interesting to search for, and this should probably not be set except in special situations, e.g. scientific documents containing huge amounts of numbers, where setting nonumbers will reduce the index size. This can only be set for a whole index, not for a subtree.

dehyphenate

Determines if we also index 'coworker' when the input is 'co-worker'. This is new in version 1.22, and on by default. Setting the variable to off restores the previous behaviour.

nocjk

Decides if the specific processing of East Asian (Chinese, Korean, Japanese) characters and word splitting is turned off. This will save a small amount of CPU if you have no CJK documents. If your document base does include such text but you are not interested in searching it, setting nocjk may be a significant time and space saver.

cjkngramlen

This lets you adjust the size of n-grams used for indexing CJK text. The default value of 2 is probably appropriate in most cases. A value of 3 would allow more precision and efficiency on longer words, but the index will be approximately twice as large.
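
For illustration only, a configuration which trades index size for precision on longer CJK words might read as follows. Whether this is worthwhile depends on the document base.

```ini
# Use 3-character n-grams for CJK text instead of the default of 2.
# Expect the index to be approximately twice as large.
cjkngramlen = 3
```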

indexstemminglanguages

Languages for which to create stemming expansion data. Stemmer names can be found by executing 'recollindex -l', or this can also be set from a list in the GUI.
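
A possible setting is sketched below. It assumes that the value is a space-separated list and that both names appear in the output of 'recollindex -l' on your system.

```ini
# Create stemming expansion data for two languages.
# The names are examples: check them against 'recollindex -l'.
indexstemminglanguages = english french
```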

defaultcharset

Default character set. This is used for files which do not contain a character set definition (e.g.: text/plain). Values found inside files, e.g. a 'charset' tag in HTML documents, will override it. If this is not set, the default character set is the one defined by the NLS environment ($LC_ALL, $LC_CTYPE, $LANG), or ultimately iso-8859-1 (cp-1252 in fact). If for some reason you want a general default which does not match your LANG and is not 8859-1, use this variable. This can be redefined for any sub-directory.
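
The following sketch shows a general default together with a redefinition for one sub-directory. The path and the character set names are hypothetical and only serve the example.

```ini
# General default for files which do not declare a character set.
defaultcharset = utf-8

# Redefinition for one sub-directory (hypothetical path).
[/home/me/olddocs]
defaultcharset = iso-8859-2
```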

unac_except_trans

A list of characters, encoded in UTF-8, which should be handled specially when converting text to unaccented lowercase. For example, in Swedish, the letter a with diaeresis has full alphabet citizenship and should not be turned into an a. Each element in the space-separated list has the special character as its first character, followed by the translation. The handling of both the lowercase and uppercase versions of a character should be specified, as membership in the list turns off both standard accent and case processing. The value is global and affects both indexing and querying.

Examples:

Swedish:
    unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl åå Åå

German:
    unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

French (you probably want to decompose oe and ae, and nobody would type a German ß):
    unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

The default for all, until someone protests, follows. These decompositions are not performed by unac, but it is unlikely that someone would type the composed forms in a search.
    unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

maildefcharset

Overrides the default character set for email messages which don't specify one. This is mainly useful for readpst (libpst) dumps, which are utf-8 but do not say so.

localfields

Sets fields on all files (usually those of a specific file system area). The syntax is the usual one: name = value ; attr1 = val1 ; [...]. The value is empty here, so the setting needs an initial semi-colon. This is useful, e.g., for setting the rclaptg field for application selection inside mimeview.
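
As a sketch, the fragment below sets the rclaptg field for all files under one directory. The path and the field value are hypothetical.

```ini
# Applies only to files under this (hypothetical) directory.
[/home/me/mail/gnus]
# Note the initial semi-colon: the main value is empty.
localfields = ; rclaptg = gnus
```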

testmodifusemtime

Use mtime instead of ctime to test if a file has been modified. The time is used in addition to the size, which is always used. Setting this can reduce re-indexing on systems where extended attributes are used (by some other application), but not indexed, because changing extended attributes only affects ctime.

Notes:

- This may prevent detection of change in some marginal file rename cases (the target would need to have the same size and mtime).
- You should probably also set noxattrfields to 1 in this case, except if you still prefer to perform xattr indexing, for example if the local file update pattern makes it of value (in general, there is a risk that pure extended attributes updates, without file modification, go undetected).

Perform a full index reset after changing this.

noxattrfields

Disable extended attributes conversion to metadata fields. This probably needs to be set if testmodifusemtime is set.
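
The combination suggested for testmodifusemtime can be sketched as follows, assuming that extended attributes are not needed as metadata fields:

```ini
# Test for file changes with mtime (plus size) instead of ctime.
testmodifusemtime = 1
# Do not convert extended attributes to metadata fields.
noxattrfields = 1
```

A full index reset is needed after changing testmodifusemtime.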

metadatacmds

Define commands to gather external metadata, e.g. tmsu tags. There can be several entries, separated by semi-colons, each defining which field name the data goes into and the command to use. Don't forget the initial semi-colon. All the field names must be different. You can use aliases in the "fields" file if necessary. As a not too pretty hack conceded to convenience, any field name beginning with "rclmulti" will be taken as an indication that the command returns multiple field values inside a text blob formatted as a recoll configuration file ("fieldname = fieldvalue" lines). The rclmultixx name will be ignored, and field names and values will be parsed from the data.

Example:
    metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f