Timo_Lee writes

My indexing run has been going for more than a week on a large number of files. It now looks like the command has gone through all the files, but it seems to be stuck because my Ubuntu system does not have some helper applications installed:

#!bash

...
/recover_by_RStudioUbuntu13sep2012/Extra Found Files/Text Document/150992.txt|]
:3:../rcldb/rcldb.cpp:1278:Db::add: docid 1353223 added [/media/My Passport/T400/recover_by_RStudioUbuntu13sep2012/Extra Found Files/Text Document/150993.txt|]
:3:../index/fsindexer.cpp:133:FsIndexer::index missing helper program(s):
Perl::Image::ExifTool (image/jpeg)
antiword (application/msword)
catppt (application/vnd.ms-powerpoint)
pstotext (application/postscript)
python:chm (application/x-chm)
python:mutagen (audio/mpeg)
python:rarfile (application/x-rar)
unrtf (text/rtf)
xls2csv (application/vnd.ms-excel)

It doesn’t continue. What should I do? I don’t want to rerun the indexing, because it takes more than a week.

Thanks!

Timo_Lee writes

After a few more hours, the indexing program exited. So it seems there is no need to worry about the problem?

medoc writes

The "pause" is normal: it occurs while recoll is creating the auxiliary data structures (stemming and spelling). It has to walk the just-created index for this. I’ve never heard of it taking hours though. Just out of curiosity, what kind of system are you using and how big is the index?

Timo_Lee writes

Thanks! The directory xapiandb is 21.1GB. My OS is Ubuntu 12.04.

Should I install those missing applications, even though I care much more about plain text, PDF and DjVu files than about other formats?

If I install them, roughly how long would a rerun take, given that this run took more than a week?
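
To decide which helpers are worth installing, the "missing helper program(s)" list from the log above can be turned into a small table of helper names and the MIME types they handle. The following is only a minimal sketch, not part of recoll: the function names (`parse_helpers`, `helper_kind`, `report`) are hypothetical, and it assumes that entries prefixed with `python:` are Python modules and entries starting with `Perl::` are Perl modules, so only the remaining names are looked up as executables on the PATH.

```python
import re
import shutil

# Helper list copied verbatim from the log output earlier in this thread.
MISSING_HELPERS = """\
Perl::Image::ExifTool (image/jpeg)
antiword (application/msword)
catppt (application/vnd.ms-powerpoint)
pstotext (application/postscript)
python:chm (application/x-chm)
python:mutagen (audio/mpeg)
python:rarfile (application/x-rar)
unrtf (text/rtf)
xls2csv (application/vnd.ms-excel)
"""

# One entry per line: "<helper name> (<mime type>)".
LINE_RE = re.compile(r"^(?P<helper>\S+) \((?P<mime>[^)]+)\)$")


def parse_helpers(text):
    """Return (helper, mime_type) pairs; lines that do not match are skipped."""
    pairs = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            pairs.append((match.group("helper"), match.group("mime")))
    return pairs


def helper_kind(helper):
    """Classify a helper by its prefix (an assumption, see the text above)."""
    if helper.startswith("python:"):
        return "python module"
    if helper.startswith("Perl::"):
        return "perl module"
    return "executable"


def report(text):
    """Return (helper, mime, kind, on_path) rows; on_path is None for modules."""
    rows = []
    for helper, mime in parse_helpers(text):
        kind = helper_kind(helper)
        # Only plain executables can be looked up on PATH; modules are not checked.
        on_path = shutil.which(helper) is not None if kind == "executable" else None
        rows.append((helper, mime, kind, on_path))
    return rows


if __name__ == "__main__":
    labels = {True: "found on PATH", False: "not on PATH", None: "not checked"}
    for helper, mime, kind, on_path in report(MISSING_HELPERS):
        print(f"{helper:24} {mime:32} {kind:14} {labels[on_path]}")
```

Note that none of the MIME types in this particular list is plain text, PDF or DjVu; the list covers JPEG, MS Office, PostScript, CHM, MP3, RAR and RTF files.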