Recoll journal of user-visible changes
1.13.02
This version has a single fix to work around a problem in the Qt 4.6.1 uic utility. If you are not using Qt 4.6.1 and are currently running Recoll 1.13.01, you do not need to upgrade.
1.13.01
- Recoll has a new class of persistent external filters with the capability to process several documents, or multi-document files, in the same instance. Benefits: much faster image tag indexing, and new file formats. Except for the Perl image tag filter (because of ExifTool), the new filters are written in Python.
- New file formats: chm (Microsoft help), zip archives, .ics calendar files. Individual pages in chm files are indexed and can be previewed. Zip is quite convenient for maildir archives (for example).
- Recoll can now use the output of the Beagle Firefox plugin to index visited web pages and bookmarks. This is only usable if Beagle itself is not running, else Recoll and Beagle will be fighting for the same queue.
- Big text files (like application logs) can now be paged for indexing, avoiding excess memory usage during indexing and improving the usability at query time. They can also be altogether skipped by setting a maximum size configuration parameter. These parameters have default values (1 MB and 20 MB) which change Recoll behaviour compared to previous versions. You can set textfilepagekbs and textfilemaxmbs to -1 in the configuration to restore the old behaviour.
- A cache was implemented for mbox message header offsets. This speeds up message previews for big mbox files.
- Miscellaneous usability improvements:
- Allow using page-up/down and shift-home to scroll the result list while the focus is in the search entry.
- Make 'Use desktop preferences' the default for new Recoll installations, and make this choice more prominent in the external viewer dialog.
- ^P starts the print dialog on a preview window.
- If a search has no result, alternate spellings are suggested. This feature is still a bit raw and will be improved.
- If the text of a document is empty, preview will switch to displaying the document fields.
- New entry in the result list contextual menu for opening the parent document of a result list hit with its native application. Useful, for example, for pages inside chm files.
- Indentation is now preserved when displaying text documents inside the preview window. This is particularly welcome for program source files.
- Allow substituting arbitrary fields in the result paragraph, using a %(fieldname) syntax.
- The real-time indexing monitor will now accumulate modifications for 30 seconds before indexing.
- The indexer can now split camelCase words, allowing search on component terms. This is not enabled by default as it can confuse phrase searches (e.g. "MySQL manual" is matched by phrase queries for "my sql manual" and "MySQL manual" but not "mysql manual"). Use "configure --enable-camelcase" to activate it.
- The ipath is now printed by default after the url in the default result list format.
- recoll_noindex and skippedNames can now be changed at any point in the tree (only for topdirs previously).
- Allow using location/application sensitivity in external viewer choice. This uses several new functions:
- Allow the substitution of arbitrary document fields inside external viewer command line arguments.
- Allow field values to be set on all documents in a file system subtree. For example, you can set an application tag (e.g. rclaptg = gnus) on all mailbox files under a specific directory.
- New syntax in mimeview for including the rclaptg field in viewer choice (mimetype|tagvalue = ...).
- Allow specifying a default character set for mail messages. This is mainly useful for readpst dumps. All reasonable non-ascii messages specify their character set.
- Added a --without-gui configure option. Removes all X11 and Qt dependencies and only compiles the command-line interface.
- Improved the kio_recoll build. There is no need to run configure manually in the main directory any more. Ubuntu packages for kio_recoll are now built on the recoll-backports PPA on launchpad.net.
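The camelCase caveat in the 1.13.01 notes above can be illustrated with a minimal Python sketch. This is not Recoll's indexer code: the regular expression, the function names (split_camel_case, index_terms, phrase_matches) and the assumption that only the lowercased components are indexed are all hypothetical, chosen to reproduce the behaviour described for "MySQL manual".

```python
import re

# Hypothetical splitter (an assumption, not Recoll's actual rule):
# break at lower-to-upper transitions, keep runs of capitals together.
_CAMEL_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_camel_case(word):
    """Return the camelCase components of a word, e.g. MySQL -> My, SQL."""
    parts = _CAMEL_RE.findall(word)
    return parts if parts else [word]


def index_terms(text):
    """Split text on white space, then on camelCase, and lowercase the terms."""
    terms = []
    for word in text.split():
        terms.extend(part.lower() for part in split_camel_case(word))
    return terms


def phrase_matches(document, query):
    """True if the query terms appear as a contiguous run in the document."""
    doc = index_terms(document)
    wanted = index_terms(query)
    n = len(wanted)
    return any(doc[i:i + n] == wanted for i in range(len(doc) - n + 1))


print(phrase_matches("MySQL manual", "my sql manual"))
print(phrase_matches("MySQL manual", "mysql manual"))
```

Under these assumptions the document yields the terms my, sql, manual, so a query typed as "mysql manual" (terms mysql, manual) has no contiguous match, which is why the feature is left off by default.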
1.12.4
Bugs fixed:
- Qt4 version only: the search inside the preview window could become unbearably slow for big documents (quadratically so), and could not be interrupted (Qt bug). The Qt3 version of the code was included in the preview tool to restore good performance. This bug is the main reason for this release.
Build system improvements:
- Perform minimal base package configuration inside the kio cmake code to permit building it from scratch (without a build of the main code). Mainly useful for builds on the Ubuntu PPA.
- Implement a --without-gui option to build a pure command-line version with no Qt or X11 dependencies.
- Ensure that the user's PATH settings determine where we look first for qmake in all cases.
1.12.3
This is a bug fix release.
- Fix the sort tool which had been broken since 1.11 with some (or all?) qt3 versions.
- Catch two Xapian exceptions which could crash the GUI when a query was run while the index was being updated.
- Ensure that the result list right-click pop up menu will appear even when the click is inside a table.
- Fix the way we retrieve the Xapian library version to avoid GUI compilation problems.
- Inside the real-time indexer: only use the main thread to test that the X11 server is still alive. Multithreaded calls to x11IsAlive() would sometimes crash the process because of an X11 error.
- Define a filter timeout so that a looping filter (e.g. rclps trying to index loop.ps) will not completely stop the indexing. Default value: 20 minutes. Add loop.ps to skippedNames.
- Improve filter subprocesses management. Some could previously be left around after recollindex was killed. Improve cancellation request acknowledgment by recollindex (two ^C were sometimes necessary to make it terminate).
- Signals SIGUSR1 and SIGUSR2 are now blocked in addition to INTR/TERM/QUIT.
- Extended attributes indexing now works for all file types.
- Ensure that queries started from the command line are handled as normal ones (they previously could not be sorted).
- Improve man page indexing: do not index section header terms.
1.12.1
This is a very minor release, mainly to fix compilation issues and a few very minor bugs. No need to upgrade if you don't experience these.
- Fixed compilation errors for new gcc and gnu libc.
- Use groff html output in rclman to get rid of control characters in output (improve manual pages indexing). Fix 8bit character issues in file names in rcllyx.
- Fixed command line arguments processing problem with "recoll -q"
1.12.0
- Recoll now implements a KIO slave to allow searching directly from KDE applications. This does not affect the main application and is not enabled by default (go to the kde/kio/recoll source directory for build instructions).
- Recoll now computes md5 checksums for all indexed documents and optionally collapses duplicate entries inside the result list. This needs a full reindex to become effective for older documents already in the index. The option to activate collapsing is in the Query Configuration.
- Typing F1 anywhere in the GUI should bring up the appropriate section of the manual in the application configured for viewing HTML documents.
- The result list right-click menu now has an entry to save the document to a file. This is only enabled for documents contained inside another file (e.g. messages inside an mbox folder, or attachments), and is especially useful for extracting an attachment with no associated external editor.
- The preview window now has a right-click menu, with an entry to toggle between viewing the main text or all the metadata for the document. This is most useful in the case where the search match actually occurred in a field not visible in the main text (e.g. author or HTML title).
- Words glued by an underscore character like compound_word are now split during indexing, and will be found when queried either as themselves or in a search for the components.
- There is now a size limit over which no attempt will be made to uncompress/identify/index compressed files. Not active by default, to be set in the Indexing Configuration.
- Added support for fetching field values from extended file attributes. This is not enabled by default, use configure --enable-xattr. You'll also need to set up a map from the attribute names to the Recoll field names (see comment at the end of the fields configuration file).
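The duplicate collapsing mentioned in the 1.12.0 notes above can be sketched in a few lines of Python. This only illustrates the idea of keeping one result per md5 checksum; the function names and the (url, content) result shape are assumptions made for the example, and Recoll itself stores the checksums at indexing time rather than hashing at query time.

```python
import hashlib


def md5_of(data):
    """Hex md5 checksum of a document's bytes."""
    return hashlib.md5(data).hexdigest()


def collapse_duplicates(results):
    """Keep the first result for each distinct checksum.

    results is a list of (url, content_bytes) pairs (a hypothetical shape
    chosen for this sketch). Returns the urls that survive collapsing.
    """
    seen = set()
    kept = []
    for url, content in results:
        digest = md5_of(content)
        if digest in seen:
            continue  # same content already listed under another url
        seen.add(digest)
        kept.append(url)
    return kept


hits = [
    ("file:///home/me/report.txt", b"quarterly report"),
    ("file:///backup/report.txt", b"quarterly report"),
    ("file:///home/me/notes.txt", b"meeting notes"),
]
print(collapse_duplicates(hits))
```

Because the checksum has to exist for every document before collapsing can work, documents indexed by an older version need the full reindex mentioned above.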
1.11.4
- Bugs fixed: check the list.
- The right-click menu "Copy" commands inside the result list now copy to the clipboard in addition to the main selection, enabling subsequent ^v commands.
1.11.0
Recoll release 1.11 has relatively extensive changes that have necessitated a modification of the index format. Hence installing this release implies a full re-indexing, which is enforced by the software.
- Filtering on category (message/text/media etc.) as a function of the main window for quick access.
- Use html for preview when available (ex: html files or "colorized" python) instead of converting to text. This can be turned off in the preferences.
- New Python query and index interfaces. The Python query interface will be used for building a Xesam adapter for Recoll when the specification is stabilized, and could be useful for other things, such as indexing contents from an RDBMS (see the manual for details). Restructured and cleaned up internal Recoll interfaces.
- Improved filter framework. Can now process either html or text output from the filters, and more easily execute "raw" commands instead of Recoll scripts. Avoided wasteful repeated execution of filters for which the helper application is missing.
- Query language now closer to Xesam specification (but still far from a complete implementation). See the Recoll manual and http://www.xesam.org/main/XesamUserSearchLanguage
- Much improved configuration for fields. Fields like "author" can now be specified as storable (displayable in results) and/or indexed (searchable). Added alias facility for translating from user-level names to internal.
- Added "recipient" as an indexed/searchable field for emails.
- rcltext filter for processing text such as C code for which no specific processing is needed when indexing but a specific viewer is desired.
1.10.6
- Fix a simple and mildly nasty bug that would cause the indexer to stop indexing an mbox on encountering a specific but not exceptional error condition (like a few dozen errors while indexing attachments for which no filter was installed).
1.10.5
- Ensure that file names indexed as terms don't overflow the maximum term size.
- Handle non-standard date format in mbox separator lines sometimes generated by thunderbird.
- Use attachment file names to help identify a better mime type for parts only described as application/octet-stream
- For Phrase/Near searches, highlight all term groups in preview, not just the first
- Added Open XML filters
1.10.2
- Fixed openSuse 11 compile issues.
- Fixed bug in interpreting email mime structure, which resulted in base-64 decoding errors.
- Fixed "Prev" button in preview window. Would actually go forward when walking the search terms.
- Allow setting the highlight color for search terms in result list and preview (yes: feature change, should have waited for major release...)
- Added svg filter
1.10.1
- Ensure that in case the data of a file can't be indexed because of some error, at least the file name is indexed.
- Improve query language to support OR queries of terms with field specifications (e.g. title:someterm OR author:someauthor).
- Fix filename search to split patterns on white space, so that a "*.jpg *.jpeg" search does what's expected. Means you now need to use double-quotes if there is actual embedded white space.
- Jump directly to the external editor choice dialog instead of opening preferences when an external viewer is not found.
- Allow stopping indexing through menu action (only works with qt4 for now).
- Create an "indexedmimetypes" configuration variable to allow explicitly restricting the file types which do get indexed.
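The filename search change in the 1.10.1 notes above (patterns split on white space, double quotes protecting embedded white space) can be approximated with the Python standard library. This is a sketch of the described behaviour, not Recoll's code: the function names are hypothetical, matching is case-sensitive for simplicity, and shlex also interprets single quotes and backslashes, which the real search may not.

```python
import fnmatch
import shlex


def split_patterns(query):
    """Split a filename query on white space, honouring double quotes."""
    return shlex.split(query)


def filename_matches(filename, query):
    """True if the file name matches any of the patterns in the query."""
    return any(
        fnmatch.fnmatchcase(filename, pattern)
        for pattern in split_patterns(query)
    )


print(split_patterns("*.jpg *.jpeg"))
print(filename_matches("photo.jpeg", "*.jpg *.jpeg"))
print(filename_matches("holiday pics 01.jpg", '"holiday pics*.jpg"'))
```

Without the quotes, the last query would be split into the two patterns holiday and pics*.jpg, neither of which matches the whole file name.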
1.10.0
- Added a GUI dialog to configure the indexing parameters.
- Added better support for indexing CJK text (Chinese, Japanese, Korean). Please note that: - You will need a full reindex to take good advantage of this. (You *don't* need to reindex if you don't need to search CJK, even if there is some in your index). - When entering CJK search terms, words (single or multiple characters) should be separated with white space. - The specific CJK processing can be turned off by setting the nocjk variable to true in the configuration file (this may make sense if you have a mixed cjk/other document base and don't want to index the cjk part, as it will save some disk space and a minuscule amount of cpu).
- Changed the way Recoll handles searches including composite words (like an email address). The new approach looks saner, but could have side effects; please report any problems in this area.
- The query language got a new "dir:" specifier to filter results on location.
- New rclimg perl filter for better indexing of picture tags, thanks to Cedric Scott. This depends on Exiftool.
- New rcltex filter.
- Changed and improved how the preview window local search finds the query terms; this does not involve weird characters any more. The display is cleaner and cut and paste works better.
- Fixed the fact that a newline-separated word list in simple search would wrongly trigger a phrase search.
- Fixed the way we input text to the preview textedit (the old way would sometimes confuse the window into displaying tags instead of acting on them).
- Fixed transcoding to utf-8 for text/plain email attachments
- Improved mbox From_ line detection
- Added indexedmimetypes variables to allow restricting the list of indexed mime types.
- KDE kicker applet: start a recoll search from the