biomimetics writes

Hi,

Is there any way to remove complete phrases from a document before it is indexed into the Xapian database? In our case we send out many batch letters, where 90% of the content is of no relevance for searching later on, so it is very important for us to remove this text beforehand.

Best regards

Robin

medoc writes

Hi,

If these documents are all in the same format, I think the simplest approach would be to pre-process the text inside the appropriate filter (e.g. rcldoc if these are MS Word files).

You can set an alternate filter in $HOME/.recoll/mimeconf; this is documented in the manual. If you have trouble with this, contact me by email.
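As a rough illustration of the pre-processing step described above, here is a minimal Python sketch of a phrase-stripping function that a customized filter could apply to the extracted text before outputting it. It is not part of Recoll: the names strip_boilerplate, compile_phrases and BOILERPLATE_PHRASES, and the example phrases, are all hypothetical. How the modified filter is registered in mimeconf is covered in the manual and is not shown here.

```python
import re
import sys

# Hypothetical boilerplate phrases: replace with the fixed text blocks
# that appear in every batch letter.
BOILERPLATE_PHRASES = [
    "Thank you for your continued interest in our products.",
    "Please do not hesitate to contact us if you have any questions.",
]


def compile_phrases(phrases):
    """Build one case-insensitive regex per phrase. Any run of whitespace
    (spaces, tabs, line breaks) may separate the words, because document
    converters often re-wrap lines."""
    patterns = []
    for phrase in phrases:
        words = phrase.split()
        if not words:  # skip empty or whitespace-only entries
            continue
        pattern = r"\s+".join(re.escape(w) for w in words)
        patterns.append(re.compile(pattern, re.IGNORECASE))
    return patterns


def strip_boilerplate(text, patterns):
    """Remove every occurrence of each phrase, then tidy the whitespace
    left behind."""
    for pattern in patterns:
        text = pattern.sub(" ", text)
    text = re.sub(r"[ \t]+", " ", text)    # collapse runs of spaces/tabs
    text = re.sub(r" *\n *", "\n", text)   # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text) # at most one blank line in a row
    return text.strip()


if __name__ == "__main__":
    # For trying it out: clean each plain-text file named on the command line.
    patterns = compile_phrases(BOILERPLATE_PHRASES)
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            sys.stdout.write(strip_boilerplate(f.read(), patterns) + "\n")
```

Matching on whitespace-separated words rather than on exact lines keeps the removal working even when the converter breaks a phrase across lines differently from the original letter.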

I think that a feature like this makes more sense as a local customization than as a recoll function.

medoc writes

This would be too complicated to solve in a generic manner. It is much easier to do as a local change to a specific filter.