biomimetics writes

Is it possible for Recoll to count the number of documents that will eventually be indexed, and to show how many have already been indexed?

I have one index built from data on a remote server, and the indexer is set to index in real time. Even though this setup hasn't changed for a while, the first search for a single term returned 20 results, then 600, and now, half an hour later, 625. This is a bad situation when someone asks how many hits there are, because I do not know whether all documents have been indexed yet.

Br,

Robin

medoc writes

When the indexer is started in real-time mode, its first step is to perform a normal incremental indexing pass. Once this pass is done, the indexer restarts itself in modification-monitoring mode, which is signaled by a -n option on its command line; you can check for it with "ps". Until the initial pass is done, the index is not complete.
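A minimal Python sketch of the "ps" check, assuming the real-time indexer shows up as a process named recollindex and that -n appears as a separate argument once the initial pass is over (combined short flags such as -mn would not be detected). The function names are hypothetical, not part of Recoll.

```python
import subprocess


def indexer_state(cmdlines):
    """Classify recollindex processes from a list of command lines.

    Returns "monitoring" if a recollindex process carries -n (initial pass
    done, per the explanation above), "initial-pass" if recollindex runs
    without -n, and "not-running" otherwise.
    Assumption: -n is passed as its own argument, not combined with others.
    """
    found = False
    for line in cmdlines:
        argv = line.split()
        if not argv:
            continue
        # Compare the program's basename, so /usr/bin/recollindex matches too
        if argv[0].rsplit("/", 1)[-1] != "recollindex":
            continue
        found = True
        if "-n" in argv[1:]:
            return "monitoring"
    return "initial-pass" if found else "not-running"


def current_cmdlines():
    # "ps -eo args=" prints each process's full command line with no header
    out = subprocess.run(["ps", "-eo", "args="],
                         capture_output=True, text=True, check=True)
    return out.stdout.splitlines()
```

Usage: `print(indexer_state(current_cmdlines()))` prints "initial-pass" while the index is still incomplete. Keeping the classification separate from the ps call makes it easy to test against sample command lines.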

As for counting the documents to be indexed, this would require an extra, empty pass over the document set, which is not a completely trivial task (in terms of disk load, etc.).

The indexing status file (RECOLL_CONFIG/idxstatus.txt) keeps a running count of processed files in its filesdone field. You can compare this to the total number of files (from df -i on the filesystem, or from find on the indexed tree) to get a rough estimate of how far indexing has progressed.
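A rough sketch of that comparison in Python, assuming idxstatus.txt holds one "name = value" pair per line (only the filesdone field is confirmed above) and counting regular files under the indexed tree the way "find DIR -type f" would. The function names are hypothetical. Files that Recoll skips are still counted in the total, so the percentage is only an estimate.

```python
import os


def read_status(path):
    """Parse 'name = value' lines from the status file into a dict.

    Assumption: the file uses simple 'name = value' lines; lines without
    an '=' are ignored.
    """
    fields = {}
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            name, sep, value = line.partition("=")
            if sep:
                fields[name.strip()] = value.strip()
    return fields


def count_files(topdir):
    """Count regular files under topdir, like 'find topdir -type f'.

    os.walk does not descend into symlinked directories by default, and
    symlinks to files are skipped, matching find without -L.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(topdir):
        for name in filenames:
            p = os.path.join(dirpath, name)
            if os.path.isfile(p) and not os.path.islink(p):
                total += 1
    return total


def progress(status_path, topdir):
    """Return (filesdone, total files, percent done) as a rough estimate."""
    done = int(read_status(status_path).get("filesdone", "0"))
    total = count_files(topdir)
    return done, total, (100.0 * done / total if total else 0.0)
```

Usage: `progress("/path/to/RECOLL_CONFIG/idxstatus.txt", "/path/to/indexed/data")`, with both paths replaced by your own configuration directory and indexed tree. Walking a remote tree costs the same disk load as the empty pass mentioned above, so it is best run occasionally rather than in a loop.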

biomimetics writes

Many thanks for the explanation and workarounds!