mschwipps writes

The first lines of the patch fix the HTML header.

If a Word file contains only images, there is no text to index, and the file should simply be ignored.

Empty or image-only Word files should not cause an error.
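To illustrate what "should not cause an error" can mean for a filter that emits HTML, here is a minimal sketch. It assumes the filter is a shell script that wraps the converter's text in an HTML page. `emit_text` is a hypothetical helper, not the actual rcldoc code. When the text is empty (for example, from an image-only document), it still prints a well-formed page with an empty body and returns status 0, so the caller sees success with nothing to index rather than a failure.

```bash
#!/bin/sh
# Hypothetical sketch, not the actual rcldoc code.
# emit_text: wrap the converter's text in a minimal HTML page.
# Empty text (e.g. an image-only Word file) still yields a valid page
# with an empty body, and the function returns 0 instead of failing.
emit_text() {
    text=$1
    printf '<html><head>\n'
    printf '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">\n'
    printf '</head><body><pre>\n'
    if [ -n "$text" ]; then
        # Escape the characters that are special in HTML.
        printf '%s\n' "$text" |
            sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
    fi
    printf '</pre></body></html>\n'
    return 0
}
```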

Use

```bash
patch rcldoc < rcldoc.patch
```

to patch the file.

medoc writes

Thanks for providing this correction. The header thing... wow. That’s why we need code reviews :)

I’m committing a slightly different patch that preserves antiword’s actual exit status, because we previously relied on empty output to detect bad files (RTF, plain text, or text too small for antiword to handle).
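The distinction can be sketched like this: judge failure by the converter's exit status, and treat a successful run with empty output as a document that simply has no text. `run_converter` is a hypothetical name and this is not the committed patch; it only assumes the converter (antiword here) signals failure with a non-zero exit status.

```bash
#!/bin/sh
# Hypothetical sketch, not the committed rcldoc patch.
# run_converter: run the given command, pass its output through, and
# preserve its exact exit status on failure. Empty output with status 0
# is treated as "no indexable text", not as a bad file.
run_converter() {
    out=$("$@")
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "converter failed with status $status" >&2
        return "$status"
    fi
    printf '%s' "$out"
    return 0
}
```

For example, `run_converter antiword "$infile"` would return antiword's own status when it rejects a file, and, assuming antiword exits 0 on an image-only document, return 0 with empty output in that case.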