rdzidlic writes

[rz@rz books1]$ ll attachments.zip
-rw-r--r--. 1 rz 500 346989 Jul 28  2006 attachments.zip

:4:../internfile/internfile.cpp:161:FileInterner::init fn [/home/rz/books1/attachments.zip] mime [(null)] preview 0
:4:../internfile/mimehandler.cpp:249:getMimeHandler: mtype [application/zip] filtertypes 1
:4:../internfile/mimehandler.cpp:64:getMimeHandlerFromCache: 03240d3cf80bf44d582457ccd88fd507 cache size 2
:4:../internfile/mimehandler.cpp:80:getMimeHandlerFromCache: 03240d3cf80bf44d582457ccd88fd507 not found
:4:../internfile/internfile.cpp:248:FileInterner:: init ok application/zip [/home/rz/books1/attachments.zip]
:4:../internfile/internfile.cpp:737:FileInterner::internfile. ipath []
:4:../internfile/mh_execm.cpp:152:MimeHandlerExecMultiple::next_document(): [/home/rz/books1/attachments.zip]
:4:../internfile/mh_execm.cpp:39:MimeHandlerExecMultiple::startCmd
:4:../utils/execmd.cpp:330:ExecCmd::startExec: (1|1) /usr/share/recoll/filters/rclzip
:4:../internfile/mh_execm.cpp:217:MHExecMultiple: got FILEERROR
:4:../internfile/mh_execm.cpp:90:MHExecMultiple: Got empty line
:2:../internfile/internfile.cpp:732:FileInterner::internfile: next_document error [/home/rz/books1/attachments.zip] application/zip
:4:../internfile/internfile.cpp:852:FileInterner::internfile: conversion ended with no doc
:4:../rcldb/rcldb.cpp:1249:Db::add: udi [/home/rz/books1/attachments.zip|] parent []
:3:../rcldb/rcldb.cpp:604:Db::add: docid 5854 updated [/home/rz/books1/attachments.zip|]

$ unzip -t attachments.zip
Archive: attachments.zip
End-of-central-directory signature not found. Either this file is not a
zipfile, or it constitutes one disk of a multi-part archive. In the latter
case the central directory and zipfile comment will be found on the last
disk(s) of this archive.
unzip: cannot find zipfile directory in one of attachments.zip or
attachments.zip.zip, and cannot find attachments.zip.ZIP, period.
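The `unzip -t` integrity check above can be approximated in Python with the standard `zipfile` module, which is roughly what a filter like rclzip has available. A minimal sketch (the function name `zip_is_sound` is mine, not Recoll's):

```python
import zipfile

def zip_is_sound(path):
    """Return True if path is a readable zip archive with intact members,
    roughly what 'unzip -t' verifies."""
    # is_zipfile() looks for the end-of-central-directory signature,
    # which is exactly what the transcript above reports as missing.
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() CRC-checks every member; None means all passed.
            return zf.testzip() is None
    except zipfile.BadZipFile:
        return False
```

A filter that fails this check can report FILEERROR immediately instead of attempting extraction, which keeps the per-retry cost low for truncated archives like this one.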

— a broken file, and the error won't go away as long as the datestamp stays at 2006.

medoc writes

Recoll retries files in error at each indexing run; this is a known issue. Work on a solution has begun, but it is rather complicated, and I need to make sure that it does not create more problems than it solves. Remember that even if the file does not change, the filter or other pieces of software can change, and retrying could then succeed.

I don't consider this a major issue, unless I'm missing something?

rdzidlic writes

It is tricky, and I believe it is a major issue in some cases. Retrying damaged zip files is fairly cheap.

Some cases where I could imagine it getting extremely expensive:

* a huge .xxx.bz2 file is unpacked and .xxx cannot be processed for some reason
* unpacking fails because of a lack of temporary space
* in "realtime" indexing mode, it would appear that those error files are constantly retried in a loop?
* some file helpers may be much heavier than others (openoffice?)

Overall, baloo seems many times faster at reindexing, so for now I can only afford to run Recoll's reindexing manually.

medoc writes

Changed from bug to enhancement. There is a reason why Recoll always retries indexing: a helper program may have been installed since the last run, in which case indexing would now succeed. Still, things could be improved.

medoc writes

Issue #234 was marked as a duplicate of this issue.

medoc writes

Issue #233 was marked as a duplicate of this issue.

medoc writes

I have made a number of changes for the future 1.21. recollindex will no longer retry failed files by default. This can be changed with a command-line option (-k). The indexer also executes a script at startup which can decide to set the retry option; the script currently checks for changes to the bin directories, which is meant to catch helper installs, although it will of course also fire for other reasons. This is the best I can think of for now; it may be refined if I get better ideas before the release.
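A minimal sketch of how such a "did the helper directories change?" check might work, comparing the modification times of the bin directories against a stamp file recording the previous run. The paths, stamp location, and function name here are all hypothetical; the actual script shipped with Recoll may work quite differently:

```python
import os

# Hypothetical defaults; the real script and stamp location are Recoll's own.
BIN_DIRS = ["/usr/bin", "/usr/local/bin"]
STAMP = os.path.expanduser("~/.recoll/helpers_stamp")

def should_retry_failed(bin_dirs=BIN_DIRS, stamp=STAMP):
    """Return True if any binary directory changed since the last run,
    suggesting a filter/helper may have been installed or updated."""
    last = os.path.getmtime(stamp) if os.path.exists(stamp) else 0.0
    # A directory's mtime is bumped whenever an entry is added or removed,
    # so a new helper install in a bin dir makes it newer than the stamp.
    changed = any(
        os.path.isdir(d) and os.path.getmtime(d) > last for d in bin_dirs
    )
    # Record this run for the next comparison.
    os.makedirs(os.path.dirname(stamp), exist_ok=True)
    with open(stamp, "w"):
        pass
    return changed
```

As the comment in the thread notes, this is deliberately over-approximate: any package install touching a bin directory triggers a retry, not just helper installs, trading some wasted retries for never missing a newly available filter.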

I am now convinced that not retrying failures is actually the better default.