So far, it looks like Recoll has failed to index my 4500 MacWord 5 and WriteNow legacy documents… I guess antiword and wvWare don’t read such old documents.

It appears that LibreOffice 4.1 now converts almost all of these pre-OS X Mac text formats, using libmwaw (http://sourceforge.net/p/libmwaw/wiki/Home/), one of the libraries in the libwpd family.

Would it be possible to integrate it into Recoll too?


I tried opening one of the sample documents from your fridrich link in LibreOffice 4.1, and I can indeed save it as odt, which can then be processed by the regular Recoll input filter.

So it should be possible to build an input filter for these files, using either the command-line version of LibreOffice or the unoconv script. This will be very slow, though, because neither command runs fast.
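
As a rough sketch of the conversion step such a filter would wrap, the snippet below builds and runs a headless LibreOffice command. It assumes a `soffice` binary on the PATH; the function names `build_convert_command` and `convert_to_odt` are made up for illustration, and the way Recoll would actually call the filter is not shown here.

```python
import subprocess
from pathlib import Path


def build_convert_command(src, outdir, soffice="soffice"):
    """Return the LibreOffice command line that converts src to ODT in outdir."""
    return [
        soffice,                  # assumption: LibreOffice binary name on PATH
        "--headless",             # run without a GUI
        "--convert-to", "odt",    # target format
        "--outdir", str(outdir),  # directory where the .odt is written
        str(src),
    ]


def convert_to_odt(src, outdir, timeout=120):
    """Convert one legacy document and return the expected path of the .odt.

    Every call starts a LibreOffice process, which is why this is slow.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(src, outdir), check=True, timeout=timeout)
    return outdir / (Path(src).stem + ".odt")
```

The resulting odt file could then be handed to the existing odt handling. The timeout is an arbitrary guess, there only so that a document LibreOffice chokes on does not block everything else.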

One of the difficulties for Recoll will be identifying the document types in order to choose the right filter. The "file" command on Mac OS X does not seem to know about these types (it just says "data"). Do your documents have recognizable name extensions?
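
A quick way to answer the extension question is to tally the suffixes that actually occur in the collection. This is only a sketch; `count_extensions` is a hypothetical helper, and it looks at file names only, not at file contents.

```python
from collections import Counter
from pathlib import Path


def count_extensions(paths):
    """Count paths by lowercased name extension ('' means no extension)."""
    counts = Counter()
    for p in paths:
        counts[Path(p).suffix.lower()] += 1
    return counts


def scan_tree(root):
    """Tally the extensions of all regular files under root."""
    return count_extensions(p for p in Path(root).rglob("*") if p.is_file())
```

For example, `scan_tree("/path/to/legacy/docs").most_common()` lists the suffixes by frequency. A large count for the empty suffix would mean the names alone cannot be used to choose a filter.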

Another approach would be to batch-convert the whole bunch to odt, and index the result instead. This would not take more time than the Recoll indexing (same conversion), and it would only have to be done once. Once it’s done, re-indexing and preview will no longer need unoconv/LibreOffice and will be fast.
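
The "done once" part of the batch approach can be sketched as a planning step that skips documents whose odt already exists, so an interrupted run can be resumed. `plan_conversions` is a hypothetical name, and writing everything into a single output directory is an assumption that only holds if the file names are unique.

```python
from pathlib import Path


def plan_conversions(sources, outdir):
    """Pair each source with its target .odt, skipping targets that already exist."""
    outdir = Path(outdir)
    plan = []
    for src in map(Path, sources):
        target = outdir / (src.stem + ".odt")
        if not target.exists():  # already converted in an earlier run
            plan.append((src, target))
    return plan
```

Each `(source, target)` pair would then be passed to whichever converter is used (headless LibreOffice or unoconv), and Recoll would be pointed at the output directory instead of the originals.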


