rdav00 writes

Is it possible to get statistics on the files indexed by recollindex, e.g. the number of documents, emails, audio files, PDFs, CHM files, spreadsheets, etc.?

medoc writes

This is not very easy currently. The counts provided by Xapian for Recoll queries are not accurate, probably because of Xapian optimizations. Otherwise, you could use a query like "mime:application/pdf", which is accepted by recent Recoll releases (at least 1.17, maybe 1.16.2 too, I can’t remember).

If you really need to do this, I think that the best approach would be to build the "xadump" tool from the Recoll source. You can create it after a normal build by typing "make" inside the "query" directory. MIME type terms are stored in the index as T followed by the MIME type (for example Tapplication/pdf), so something like the following would give the count of PDF documents:

{{{
xadump -d ~/.recoll/xapiandb/ -t Tapplication/pdf -F
}}}

xadump has a number of other useful options but unfortunately no manual page. Email me if you need more info about it, or about other special terms in the index (jfd at recoll org).

If you are thinking of something more general, the problem is that we’d either need to come up with a fixed set of queries or have some sort of language to express them, and I’m not sure that this is worth it. A script based on xadump might be enough, or one based on Xapian’s "delve", which does basically the same thing but is probably installed with some Xapian packages (no compilation needed).

jf
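
As a rough illustration of the script idea above, here is a minimal sketch that uses the Xapian Python bindings instead of xadump or delve. It is not part of Recoll. It assumes the bindings are installed (the `xapian` module), that the index lives in ~/.recoll/xapiandb as in the command above, and that MIME type terms carry a plain "T" prefix as described in the answer. The function names `tally_mime_terms`, `read_mime_terms` and `report` are made up for this example, and skipping terms without a "/" is only a heuristic to ignore anything that does not look like a MIME type.

```python
from collections import Counter

# From the answer above: MIME type terms are stored as "T" + mime type.
MIME_PREFIX = "T"


def tally_mime_terms(term_freqs, prefix=MIME_PREFIX):
    """Build a MIME type -> document count map from (term, termfreq) pairs."""
    counts = Counter()
    for term, freq in term_freqs:
        # The Python 3 Xapian bindings return terms as bytes.
        if isinstance(term, bytes):
            term = term.decode("utf-8", errors="replace")
        # Heuristic (assumption): keep only prefixed terms that look like "type/subtype".
        if term.startswith(prefix) and "/" in term:
            counts[term[len(prefix):]] += freq
    return counts


def read_mime_terms(db_path, prefix=MIME_PREFIX):
    """Yield (term, termfreq) for every index term starting with the prefix."""
    import xapian  # imported here so the tallying code works without the bindings

    db = xapian.Database(db_path)
    for item in db.allterms(prefix):
        # termfreq is the number of documents indexed by this term.
        yield item.term, item.termfreq


def report(db_path):
    """Print one line per MIME type, most frequent first."""
    for mime, count in tally_mime_terms(read_mime_terms(db_path)).most_common():
        print(f"{count:8d}  {mime}")
```

Calling `report(os.path.expanduser("~/.recoll/xapiandb"))` (after `import os`) should print one count per MIME type. The numbers are term frequencies read straight from the index, the same kind of figure the xadump command reports for a single term.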