Parameters affecting indexing performance and resource usage

idxflushmb

Threshold (megabytes of new data) at which we flush from memory to the disk index. Setting this allows some control over memory usage by the indexer process. A value of 0 means no explicit flushing, which lets Xapian apply its own policy: flushing every $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted. As memory usage depends on average document size, not only on document count, the Xapian approach is not very useful, and you should let Recoll manage the flushes. The compiled-in default is 0. The configured default value (from this file) is 10 MB, which will be too low in many cases (it is chosen to conserve memory). If you are looking for maximum speed, you may want to experiment with values between 20 and 200. In my experience, values beyond this are always counterproductive. If you find otherwise, please drop me a note.
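
As a rough illustration of the tuning advice above, a configuration file entry could look like the following sketch. The value 50 is only an assumed starting point inside the suggested 20 to 200 range, not a recommendation; the right value depends on available memory and on the documents being indexed.

```
# Flush to the disk index after roughly 50 MB of new data
# (illustrative value, to be adjusted by experiment)
idxflushmb = 50
```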

filtermaxseconds

Maximum external filter execution time in seconds. Default 1200 (20 min). Set to 0 for no limit. This is mainly to avoid infinite loops in PostScript files (loop.ps).

filtermaxmbytes

Maximum virtual memory space for filter processes (setrlimit(RLIMIT_AS)), in megabytes. Note that this includes any mapped libraries (there is no reliable way on Linux to limit the data space only), so we need to be a bit generous here. Values over 2000 are ignored on 32-bit machines.
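
The two filter limits can be set together. The fragment below is a sketch only: 1200 is the default execution time stated above, while the 2000 MB memory value is an assumption chosen for illustration (it is the highest value honoured on 32-bit machines), not a documented default.

```
# Stop external filters which run for more than 20 minutes
filtermaxseconds = 1200
# Limit each filter process to about 2000 MB of virtual memory
# (illustrative value; mapped libraries count against this limit)
filtermaxmbytes = 2000
```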

thrQSizes

Stage input queues configuration. There are three internal queues in the indexing pipeline, one for each stage (file data extraction, terms generation, index update). This parameter defines the queue depth for each stage (three integer values). If a value of -1 is given for a stage, no queue is used, and the thread goes on to perform the next stage itself. In practice, deep queues have not been shown to increase performance. Default: a value of 0 for the first queue tells Recoll to perform autoconfiguration based on the detected number of CPUs (the two other values are not needed in this case). Use thrQSizes = -1 -1 -1 to disable multithreading entirely.

thrTCounts

Number of threads used for each indexing stage. The three stages are: file data extraction, terms generation, index update. The use of the counts is also controlled by some special values in thrQSizes: if the first queue depth is 0, all counts are ignored (autoconfigured); if a value of -1 is used for a queue depth, the corresponding thread count is ignored. It makes no sense to use a value other than 1 for the last stage because updating the Xapian index is necessarily single-threaded (and protected by a mutex).
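
To show how thrQSizes and thrTCounts combine, here is a sketch of a manual threading setup. The specific numbers are assumptions for illustration and should be adapted to the machine. They follow the rules above: shallow queues, and a single thread for the index update stage.

```
# Queue depth for each stage: extraction, terms generation, index update
thrQSizes = 2 2 2
# Thread count for each stage; the last one stays at 1 because
# updating the Xapian index is single-threaded
thrTCounts = 4 2 1
```

Setting thrQSizes = 0 instead returns to autoconfiguration, in which case thrTCounts is ignored, and thrQSizes = -1 -1 -1 disables multithreading entirely.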