cmdtstof writes

openSUSE Tumbleweed; compiled Recoll 1.20.6 + Xapian 1.2.20

After years I came back to Recoll. There are some MS-format files on the machine (which I do not need indexed). Reindexing without antiword drives the CPU to 100% with never-ending memory growth. The indexer is hanging on the MS files!

With the MS MIME types excluded in the index configuration, Recoll indexes 200000 files and searches great!

I suggest implementing an emergency brake for such cases.
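A minimal sketch of the kind of safeguard suggested above, assuming each input handler runs as an external command. This is not how Recoll is implemented; `run_with_time_limit` and `max_seconds` are hypothetical names used only for illustration.

```python
import subprocess


def run_with_time_limit(cmd, max_seconds):
    """Run cmd and return its stdout as bytes, or None if the time limit is hit.

    Hypothetical helper: a wall-clock limit around one external command.
    """
    try:
        # subprocess.run() kills the child and waits for it when the timeout expires.
        done = subprocess.run(cmd, stdout=subprocess.PIPE, timeout=max_seconds)
    except subprocess.TimeoutExpired:
        # The caller can log the offending file and move on to the next one.
        return None
    return done.stdout
```

A caller would treat a `None` result as "skip this file and continue". Note that this only bounds wall-clock time; it does nothing about memory growth.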

thanks

medoc writes

The main MS-related change in recent Recoll versions compared to the ones from years ago is that the Excel and PowerPoint filters, which were very fast but failed most of the time, have been replaced by Python filters which are much better at actually doing something.

It is difficult to determine what happened on your system without more information. Maybe the filters were just doing their job, in which case nothing needs fixing. The other possibility is that some were actually looping. To make any progress, I would need to know which process was actually using the CPU (top is your friend), and which file triggered the issue (you need to set up the debug log: https://bitbucket.org/medoc/recoll/wiki/ProblemSolvingData). If said file is non-confidential, a sample would obviously be of immense help.
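If a scriptable snapshot is more convenient than watching top, the first of these checks could be sketched as below. It assumes a Linux procps `ps` that accepts `-eo pid,pcpu,args`; the function names are hypothetical. Note that `ps` reports CPU usage averaged over each process's lifetime, so top remains the better live view.

```python
import os
import subprocess


def parse_ps(text):
    """Turn 'PID %CPU COMMAND' lines (header first) into (pid, cpu, command) tuples."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) == 3:
            rows.append((int(parts[0]), float(parts[1]), parts[2]))
    return rows


def busiest(rows, count=5):
    """Return the `count` rows with the highest CPU percentage, highest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:count]


def snapshot():
    """Ask ps for the pid, CPU percentage and command line of every process."""
    env = dict(os.environ, LC_ALL="C")  # force '.' as the decimal separator
    out = subprocess.run(["ps", "-eo", "pid,pcpu,args"], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True, env=env).stdout
    return parse_ps(out)
```

Running `for pid, cpu, cmd in busiest(snapshot()): print(pid, cpu, cmd)` while the indexer is busy lists the heaviest processes with their command lines.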

I really hope that you can find some time (maybe not now, just when you have a minute) to perform the diagnostics, so that I can improve the software.

Until then, and please don’t take it badly, I am closing this report, because I can’t do anything about it.

Cheers,

jf