Recoll uses external applications to index some file types. You need to install them for the file types that you wish to have indexed. These are optional run-time dependencies: none is needed to build or run Recoll, only to index the specific file type that each one handles.
After an indexing pass, the commands that were found missing can be displayed from the recoll File menu. The list is stored in the missing text file inside the configuration directory.
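As an alternative to the File menu, the list can be read directly from the file. The following is a minimal Python sketch, not part of Recoll. It assumes the default configuration directory is ~/.recoll unless the RECOLL_CONFDIR environment variable points elsewhere, and it makes no assumption about the format of each line beyond printing it as stored. The function name read_missing_helpers is made up for this example.

```python
import os
from pathlib import Path


def read_missing_helpers(confdir=None):
    """Return the non-empty lines of the 'missing' file, or [] if it is absent."""
    if confdir is None:
        # Assumption: RECOLL_CONFDIR overrides the default ~/.recoll location.
        confdir = os.environ.get("RECOLL_CONFDIR", str(Path.home() / ".recoll"))
    missing = Path(confdir) / "missing"
    if not missing.is_file():
        # No indexing pass has run yet, or nothing was recorded.
        return []
    text = missing.read_text(encoding="utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    for entry in read_missing_helpers():
        print(entry)
```

Passing an explicit directory, as in read_missing_helpers("/path/to/confdir"), is useful when several configurations are in use.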
A list of common file types which need external commands follows. Many of the filters need the iconv command, which is not always listed as a dependency.
Please note that, because this information changes fairly often, the most up-to-date version is now kept on the Recoll helper applications page, along with links to the home pages or best source/patches pages, and miscellaneous tips. The list below is not updated often and may be quite stale.
For many Linux distributions, most of the commands listed can be installed from the package repositories. However, the packages are sometimes outdated, or not the best version for Recoll, so you should take a look at the Recoll helper applications page if a file type is important to you.
As of Recoll release 1.14, a number of XML-based formats that were handled by ad hoc filter code now use the xsltproc command, which usually comes with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
Now for the list:
Openoffice files need unzip and xsltproc.
PDF files need pdftotext which is part of the Xpdf or Poppler packages.
PostScript files need pstotext. The original version has an issue with shell characters in file names, which is corrected in recent packages. See the Recoll helper applications page for more detail.
MS Word needs antiword. It is also useful to have wvWare installed, as it may be used as a fallback for some files which antiword does not handle.
MS Excel and PowerPoint need catdoc.
MS Open XML (docx) needs xsltproc.
WordPerfect files need wpd2html from the libwpd (or libwpd-tools on Ubuntu) package.
RTF files need unrtf, which, in its standard version, has much trouble with non-western character sets. Check the Recoll helper applications page.
TeX files need untex or detex. Check the Recoll helper applications page for sources if it's not packaged for your distribution.
dvi files need dvips.
djvu files need djvutxt and djvused from the DjVuLibre package.
Audio files: Recoll releases before 1.13 used the id3info command from the id3lib package to extract mp3 tag information, metaflac (standard flac tools) for flac files, and ogginfo (vorbis tools) for ogg files. Releases 1.14 and later use a single Python filter based on mutagen for all audio file types.
Pictures: Recoll uses the Exiftool Perl package to extract tag information. Most image file formats are supported. Note that there may not be much interest in indexing the technical tags (image size, aperture, etc.). This is only of interest if you store personal tags or textual descriptions inside the image files.
chm: files in Microsoft help format need Python and the pychm module (which needs chmlib).
ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar module. icalendar is not needed for newer versions, which use internal code.
Zip archives need Python (and the standard zipfile module).
Rar archives need Python, the rarfile Python module and the unrar utility.
Midi karaoke files need Python and the Midi module.
Konqueror webarchive files need Python (the filter uses the tarfile module).
mimehtml web archive files are supported through the email filter, which introduces some mild weirdness, but the result is still usable.
Text, HTML, email folders, and Scribus files are processed internally. Lyx is used to index Lyx files. Many filters need iconv and the standard sed and awk.
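To check ahead of an indexing pass which of the commands above are installed, something like the following Python sketch can help. It is not part of Recoll: the HELPERS table is simply transcribed from the list above, the executable name exiftool is an assumption (the list only names the Exiftool package), and Python modules such as mutagen or pychm are not checked. A "|" separates alternatives where either command is enough.

```python
import shutil

# File type -> commands named in the list above.
# "a|b" means that either command satisfies the requirement.
HELPERS = {
    "OpenOffice": ["unzip", "xsltproc"],
    "PDF": ["pdftotext"],
    "PostScript": ["pstotext"],
    "MS Word": ["antiword"],
    "MS Excel/PowerPoint": ["catdoc"],
    "MS Open XML": ["xsltproc"],
    "WordPerfect": ["wpd2html"],
    "RTF": ["unrtf"],
    "TeX": ["untex|detex"],
    "DVI": ["dvips"],
    "DjVu": ["djvutxt", "djvused"],
    "Pictures": ["exiftool"],  # assumed executable name
    "Rar": ["unrar"],
    "Common": ["iconv", "sed", "awk"],
}


def missing_helpers(helpers=HELPERS, which=shutil.which):
    """Return {file type: [commands not found on PATH]} for incomplete types."""
    result = {}
    for ftype, commands in helpers.items():
        absent = [
            cmd for cmd in commands
            if not any(which(alt) for alt in cmd.split("|"))
        ]
        if absent:
            result[ftype] = absent
    return result


if __name__ == "__main__":
    for ftype, absent in missing_helpers().items():
        print(f"{ftype}: missing {', '.join(absent)}")
```

The lookup function is a parameter only so that the check can be exercised without touching the real PATH; in normal use the default shutil.which is sufficient.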