Unix-like systems: real time indexing

Real time monitoring/indexing is performed by starting the recollindex -m command. With this option, recollindex will detach from the terminal and become a daemon, permanently monitoring file changes and updating the index.

In this situation, the recoll GUI File menu makes two operations available: Stop and Trigger incremental pass.

Trigger incremental pass has the same effect as restarting the indexer: it causes a complete walk of the indexed area, processing the changed files, then switches back to monitoring. This is only marginally useful, for example when the indexer is configured to delay updates, or to force an immediate rebuild of the stemming and phonetic data, which the real time indexer only processes at intervals.

While it is convenient that data is indexed in real time, repeated indexing can generate a significant load on the system when files such as email folders change. Also, monitoring large file trees by itself significantly taxes system resources. You probably do not want to enable it if your system is short on resources. Periodic indexing is adequate in most cases.

As of Recoll 1.24, you can set the monitordirs configuration variable to specify that only a subset of your indexed files will be monitored for instant indexing. In this situation, an incremental pass on the full tree can be triggered by either restarting the indexer, or just running recollindex, which will notify the running process. The recoll GUI also has a menu entry for this.
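As an illustration, a configuration fragment restricting monitoring to two subtrees might look like the following. This is a minimal sketch: the directory names are hypothetical, and it assumes that monitordirs takes a space-separated list of paths located inside the topdirs tree.

```text
# Index the whole home directory during full or incremental passes
topdirs = ~
# Only watch these (hypothetical) subtrees for instant indexing
monitordirs = ~/Documents ~/projects
```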

Automatic daemon start with systemd

The installation contains two example files (in share/recoll/examples) for starting the indexing daemon with systemd.

recollindex.service would be used for starting recollindex as a user service. The indexer will start when the user logs in and run while there is a session open for them.

recollindex@.service is a template service which would be used for starting the indexer at boot time, running as a specific user. It can be useful when running the text search as a shared service (e.g. when users access it through the Web UI).

If the installation was configured to do so, the unit files should have been installed in your system's default systemd paths (usually /usr/lib/systemd/system/ and /usr/lib/systemd/user/). If not, you may need to copy the files there before starting the service.
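If the unit files have to be copied by hand, the step could be sketched as the small helper below. The install_unit function name is hypothetical, and the example paths in the comments are assumptions: the examples directory prefix depends on your installation, and ~/.config/systemd/user is the standard per-user systemd unit directory, which may be used instead of the system-wide paths.

```shell
# Copy an example unit file into a systemd unit directory,
# creating the destination directory if it does not exist yet.
# (install_unit is a hypothetical helper, not part of Recoll.)
install_unit() {
    src=$1
    dest_dir=$2
    mkdir -p "$dest_dir" && cp "$src" "$dest_dir/"
}

# Possible usage (paths are assumptions, adjust to your installation):
#   install_unit /usr/share/recoll/examples/recollindex.service \
#       "$HOME/.config/systemd/user"
```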

With the unit files installed in the proper location, the user unit can be started with the following commands:

systemctl --user daemon-reload
systemctl --user enable --now recollindex.service

The system unit file can be enabled for a particular user by running, as root:

systemctl daemon-reload
systemctl enable --now recollindex@username.service

(A valid user name should be substituted for username, of course.)

Automatic daemon start from the desktop session

Under KDE, Gnome and some other desktop environments, the daemon can be started automatically when you log in, by creating a desktop file inside the ~/.config/autostart directory. This can be done for you by the Recoll GUI: use the Preferences->Indexing Schedule menu.
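If you prefer to create the autostart entry by hand, something like the following could be used. This is only a sketch: the write_autostart helper, the recollindex.desktop file name and the entry contents are assumptions, and the file written by the Recoll GUI may differ.

```shell
# Write a minimal autostart entry for the indexing daemon into the
# given directory. (write_autostart is a hypothetical helper; the
# file name and entry fields are assumptions.)
write_autostart() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/recollindex.desktop" <<'EOF'
[Desktop Entry]
Type=Application
Name=Recoll real time indexer
Exec=recollindex -m
EOF
}

# Possible usage:
#   write_autostart "$HOME/.config/autostart"
```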

With older X11 setups, starting the daemon is normally performed as part of the user session script.

The rclmon.sh script can be used to easily start and stop the daemon. It can be found in the examples directory (typically /usr/local/[share/]recoll/examples).

For example, a good old xdm-based session could have a .xsession script with the following lines at the end:

recollconf=$HOME/.recoll-home
recolldata=/usr/local/share/recoll
RECOLL_CONFDIR=$recollconf $recolldata/examples/rclmon.sh start

fvwm 

The indexing daemon gets started, then the window manager, for which the session waits.

By default, the indexing daemon monitors the state of the X11 session and exits when it finishes, so it is not necessary to kill it explicitly. (The X11 session monitoring can be disabled with the -x option to recollindex.)

If you use the daemon entirely outside of an X11 session, you need to add the -x option to disable X11 session monitoring (otherwise the daemon will not start).
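A start script that has to work both inside and outside X11 could choose the options as sketched below. The monitor_args helper is hypothetical, and testing whether DISPLAY is set is only a rough heuristic for "running inside an X11 session", not something Recoll itself prescribes.

```shell
# Print the recollindex options for real time indexing.
# Add -x (disable X11 session monitoring) when DISPLAY is empty or
# unset, which is used here as a heuristic for "no X11 session".
# (monitor_args is a hypothetical helper, not part of Recoll.)
monitor_args() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "-m"
    else
        echo "-m -x"
    fi
}

# Possible usage (unquoted on purpose, so the options are split):
#   recollindex $(monitor_args)
```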

Miscellaneous details

By default, the messages from the indexing daemon are sent to the same file as those from the interactive commands (logfilename). You may want to change this by setting the daemlogfilename and daemloglevel configuration parameters. Also, the log file is only truncated when the daemon starts: if the daemon runs permanently, the log file may grow quite big, depending on the log level.
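For example, the daemon messages could be sent to a separate file at a low verbosity with a fragment like the one below. The path is hypothetical, and the level value is an assumption chosen to keep the file small; check the configuration section for the meaning of each level.

```text
# Separate log file for the real time indexer (hypothetical path)
daemlogfilename = /home/me/.recoll/daemon.log
# Low verbosity, so that a permanently running daemon logs little
daemloglevel = 2
```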

Increasing resources for inotify. On Linux systems, monitoring a big tree may require increasing the resources available to inotify, which are normally defined in /etc/sysctl.conf.

### inotify
#
# cat  /proc/sys/fs/inotify/max_queued_events   - 16384
# cat  /proc/sys/fs/inotify/max_user_instances  - 128
# cat  /proc/sys/fs/inotify/max_user_watches    - 16384
#
# -- Change to:
#
fs.inotify.max_queued_events=32768
fs.inotify.max_user_instances=256
fs.inotify.max_user_watches=32768
        

In particular, you will need to trim your tree or adjust the max_user_watches value if indexing exits with a message about errno ENOSPC (28) from inotify_add_watch.
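To estimate in advance whether the current limit is likely to be sufficient, the number of directories in the tree can be compared with max_user_watches. The sketch below assumes that roughly one inotify watch is needed per monitored directory; count_dirs and watches_sufficient are hypothetical helper names.

```shell
# Count the directories under a tree (including the top one).
# Assumption: about one inotify watch is needed per directory.
count_dirs() {
    find "$1" -type d 2>/dev/null | wc -l | tr -d ' '
}

# Succeed if the directory count of tree $1 fits within limit $2.
watches_sufficient() {
    [ "$(count_dirs "$1")" -le "$2" ]
}

# Possible usage on Linux:
#   limit=$(cat /proc/sys/fs/inotify/max_user_watches)
#   watches_sufficient "$HOME" "$limit" ||
#       echo "consider raising fs.inotify.max_user_watches"
```

After editing /etc/sysctl.conf, the new values can be loaded without rebooting by running sysctl -p as root.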

Slowing down the reindexing rate for fast changing files. When using the real time monitor, some files may need to be indexed but change so often that reindexing them imposes an excessive load on the system. Recoll provides a configuration option to set a minimum delay between successive reindexings of files matching a wildcard pattern. See the mondelaypatterns parameter in the configuration section.
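As a sketch, and assuming that each entry has the form pattern:seconds as described in the configuration section, the following fragment would limit reindexing of matching files. The patterns and delays are examples only.

```text
# Files matching *.log: wait at least 20 seconds between reindexings.
# Patterns containing spaces are quoted.
mondelaypatterns = *.log:20 "*with spaces.*:30"
```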