Recoll is a desktop full-text search tool.
Recoll finds keywords inside documents as well as file names.
- Versions are available for Linux and MS Windows.
- A Web front-end with preview and download features can replace or supplement the GUI for remote use.
- It can search most document formats. You may need external applications for text extraction.
- It can reach any storage place: files, archive members, email attachments, transparently handling decompression.
- One click will open the document inside a native editor or display an even quicker text preview.
- The software is free, open source, and licensed under the GPL.
- Detailed features and application requirements for supported document types.
Recoll is based on the very capable Xapian search engine library, for which it provides a powerful text extraction layer and a complete, yet easy to use, Qt graphical interface.
Recoll will index an MS-Word document stored as an attachment to an e-mail message inside a Thunderbird folder archived in a Zip file (and more...). It will also help you search for it with a friendly and powerful interface, and let you open a copy of a PDF at the right page with two clicks. There is little that will remain hidden on your disk.
Recoll user? Maybe there are still a few useful search tricks that you don't know about. A quick look at the search tips might prove useful! See also the FAQs and Howtos section, and some contributed result list formats.
Recoll borrows a lot of code from other packages, and welcomes code and ideas from contributors; see some of the Credits.
- Release 1.23.3 has some minuscule changes and fixes.
- Finalizing the move to the new site, I am closing the old BitBucket project. The existing BitBucket issues have been archived.
- The source code repository and issue tracker are moving to a
- Release 1.23.2 has gotten much better at processing PDF XMP data.
- Release 1.23.2. This fixes a couple of quite serious bugs. See the Release notes
- Release 1.23.1. See the Release notes
- Release 1.22.4 is available and fixes an annoying Qt5 glitch (advanced search 'start search' button doing nothing). Release notes
- Release 1.22.3 is available. This is going to replace 1.21 as the main release. See the release notes. Some input handler dependencies have changed.
- Release 1.21.7 fixes an annoying but benign GUI crash-on-exit bug reported on Fedora 23 (Qt5).
- I experimented with installing the Recoll Web UI with Apache, and found out that this is really easy, actually both easier to set up and more useful than running it standalone. Recently added: instructions for running with Nginx instead of Apache.
- Found a GUI crash bug with a reasonably easy workaround.
- Release 1.22.0 is now available from the download area. The binary packages should wait until enough brave souls have tested it. See the release notes.
- Release 1.21.6 adds KDE5 compatibility for the KIO slave.
- Release 1.21.5 is out. It fixes a relatively nasty bug affecting all previous 1.21 versions: the query language parser incorrectly processed multiple MIME type or category specifications, with missing results as a consequence.
- It seems that we currently have a relatively frequent problem resulting in damaged indexes. If you are experiencing heavy reindexing (incremental indexing takes longer than it should), or missing search results, please take a look at the top of the known bugs page.
- MS-Windows. Still a few things missing (like real-time monitoring), but it does work, and it has a proper installer, so you can easily get rid of it if you don't like it. Have a look. This is an almost-native port, based on Qt and the Windows API, no need for Cygwin. Thanks to Christian Motz for helping with the filter interface (and the rest). I would love some feedback!
- A bug in the verification of configuration file path variables generates spurious warnings from recollindex when the skippedPaths variable contains elements with wildcards. This has no consequence except for the spurious error message.
- Release 1.21.2 is out, and replaces 1.20 as production release.
- A new rclpdf filter, with improved compatibility with recent poppler pdftotext versions. See rclpdf filter.
- Recoll 1.21.0 is out. This has a new query parser and should be considered an unstable release, so please do not package it (1.20.6 is the one you want for stability). It also changes the way filters are executed for better performance. See the release notes for more detail about the few other changes.
- Recoll 1.20.6 is out, with mostly small fixes to compressed file handling, which may make a big difference in some cases. See the release notes. Of course it also incorporates the Qt 5 compatibility from 1.20.5 (Qt 5.3.2 ok, 5.2 does not work).
- Recoll 1.20.4 released. This fixes real time indexing of the web history (when using the Firefox plugin).
- Unrtf 21.8 has been released. This fixes many issues in unrtf, some with possible security implications. You really want to use this version.
- Recoll 1.20.1 is out and replaces 1.19 as the main version. I have been using 1.20 for months (along with a number of fearless builders-from-source), and it's as stable as 1.19, with nice small new features. Packages will follow shortly. It is recommended (but not strictly required, see the notes) to run an index reset when upgrading.
- The aspell command used for orthographic suggestions is broken on Debian Jessie (because of an aspell packaging issue), and this will not be fixed for the Debian release. See the simple workaround here.
- If you are still running anything older than 1.19.14p2, YOU SHOULD UPGRADE. In particular, older versions have an index corruption issue leading to repeated reindexing of documents, and possibly query problems too. Download and install 1.19.14p2 or 1.20. Reset your index after upgrading (rm -rf ~/.recoll/xapiandb).
- A nice new application to complement Recoll: recollfs implements a Fuse filesystem where Recoll queries are represented as directories, the contents of which are links to the result documents.
- Recoll version 1.19.14p2 fixes more resource management issues in the Python module (only the Python package needs upgrading for this), and the processing of Bengali characters (no more diacritics stripping).
- An updated filter for Open/LibreOffice documents. The previous version merged words which were tab-separated in the input.
- The source tarball for version 1.20.0 has been released. This version has a number of improvements over 1.19, but also some incompatibilities. The first minor releases for 1.20 may contain some functional changes in addition to bug fixes, so they may be slightly less stable than 1.19, and 1.19 packages remain the "safe Recoll" for now. Still, if you build from source, there are a few nice things in 1.20...
- Version 1.19.14 is out and fixes a handful of minor-to-annoying indexing glitches (see the Release notes).
- Version 1.19.13 is out and hopefully fixes the remaining (rare) crashes of multithreaded indexing.
- I have separated the code for the Recoll Unity Scope from the main body of code, in hope that it may interest someone to work on it. It's Python and simple, mostly depending on the Unity API. The Ubuntu Unity API is apparently going to change *again* for the next version, and I think I've seen enough of it.
- 1.19.12 is out. It's mostly identical to 1.19.11 apart from a new parameter to change the max size of stored attributes. No need to update in general.
- I hear from time to time about recollindex crashes. These appear to be quite rare, but they do happen, and I think that they are linked to an as yet unfound bug in multithreaded indexing. If you experience such crashes or stalls, you can disable multithreading by adding the following to your recoll.conf:
thrQSizes = -1 -1 -1
- While working on a Recoll-Mutt interface I discovered incidentally that the Recoll Webui Web interface works quite well with the links web browser inside a terminal window. This appears to be an interesting solution for people looking for a search interface usable in a non-GUI environment.
- A new filter for PowerPoint files. The previous one was based on the ancient catppt from the catdoc utilities and usually extracted nothing from more recent PowerPoint files (this is about .ppt: .pptx is handled by a native Recoll filter).
- Sometimes things just work...
- Thanks to some of its users, Recoll now has filters to index and retrieve Lotus Notes messages (some implementation notes from an early user), and there is also now a Web browser interface for querying your Recoll indexes.
- A problem with a simple workaround has caused several reported recollindex crashes recently (for 1.17). If you store and index Mozilla/Thunderbird email outside the standard location (~/.thunderbird), you should add the following at the end of your configuration file:
[/path/to/my/mozilla/mail]
mhmboxquirks = tbird
Adjust the path to your local value, of course. Without this hint, recollindex has trouble finding the message delimiters inside the folder files, and will possibly use all the computer's memory and crash. Apart from crashes, which only occur for very big folders, this also causes incorrect mail indexing.
- A new user-contributed script for those who use real-time indexing on laptops: stop or start indexing according to AC power status. See the details on the Wiki.
- We now have a Chinese user manual (Recoll中文手册), available as HTML.
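
Two of the news items above ask you to edit recoll.conf by hand: the thrQSizes line that disables multithreaded indexing, and the mhmboxquirks section for Thunderbird mail kept outside ~/.thunderbird. For anyone who prefers to script this, here is a minimal Python sketch of both edits. It is not part of Recoll: the function names add_global_parameter and append_section are made up for illustration, and the code assumes that parameters following a [/some/dir] header apply to that directory only, so that a global parameter has to be placed before the first such header. It also ignores backslash line continuations. It works on the text of the file only; reading and writing the file is left to you.

```python
def add_global_parameter(conf_text: str, name: str, value: str) -> str:
    """Return conf_text with `name = value` set in the global (top) part.

    Assumption: everything before the first "[...]" header is global.
    """
    lines = conf_text.splitlines()
    # Index of the first section header, or end of file if there is none.
    first_section = next(
        (i for i, line in enumerate(lines) if line.lstrip().startswith("[")),
        len(lines),
    )
    for i in range(first_section):
        stripped = lines[i].strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue  # skip comments and lines that are not assignments
        if stripped.partition("=")[0].strip() == name:
            lines[i] = f"{name} = {value}"  # replace the existing global value
            break
    else:
        # Not set globally yet: insert just before the first section header.
        lines.insert(first_section, f"{name} = {value}")
    return "\n".join(lines) + "\n"


def append_section(conf_text: str, directory: str, name: str, value: str) -> str:
    """Return conf_text with a `[directory]` section holding `name = value`
    appended at the end of the file."""
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return f"{conf_text}[{directory}]\n{name} = {value}\n"


if __name__ == "__main__":
    # Hypothetical starting configuration, for demonstration only.
    sample = "topdirs = ~/docs\n[/some/dir]\nskippedNames = *.tmp\n"
    sample = add_global_parameter(sample, "thrQSizes", "-1 -1 -1")
    sample = append_section(
        sample, "/path/to/my/mozilla/mail", "mhmboxquirks", "tbird"
    )
    print(sample, end="")
```

Keeping the functions string-in, string-out makes it easy to check the result before overwriting anything; back up your real recoll.conf first, and adjust the mail path to your local value.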