naufraghi writes

I think it could be useful to support some sort of local config.

For example, there could be an option to parse .hgignore and apply its rules locally. Another option would be a local .recoll config file specifying options that override the global ones locally.

medoc writes

There is already some form of local configuration in the recoll.conf file: for example, the list of ignored file name patterns (skippedNames) can be redefined for any subtree. While I agree that local ".recoll" files might be nice (e.g. because they move along when you rename things), they would also only be usable in areas of the file system where you have write access, so the trade-off is balanced, and I tend to prefer staying with what we have currently.
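As a rough illustration of the per-subtree override mentioned above, a recoll.conf fragment could look like the following. The directory path and the patterns are made up for the example, and the sketch assumes that a value set under a bracketed directory section takes the place of the global one for that subtree.

```
# Global value, used everywhere unless a subtree section overrides it
skippedNames = *~ CVS .hg

# Values below this header apply only under this (hypothetical) directory
[/home/me/projects/myrepo]
# Repeats the global entries and adds a few more patterns
skippedNames = *~ CVS .hg *.o *.pyc
```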

Please have a look at what you can currently change locally (in the GUI indexing configuration "local" section), I’m quite interested to hear what would be missing there.

About interpreting .hgignore: I agree that this would be nice, but it seems a bit far-fetched, because then what’s to stop us from doing the same thing for git, svn and CVS?

I am thinking of another approach, which is not enabled currently but would be really easy to implement (a few lines of code): the recollindex command line program can index individual files with the "-i" option. It will read file names on stdin if no parameters are given (find /my/top/dir | recollindex -i). Currently, this respects the skippedNames and skippedPaths from the configuration, but it would be really easy to add a command-line option to ignore these values. In this case, if you have a very special directory, like a Mercurial repository, you could add it to the skippedPaths and index it using find or any kind of script to generate the file list. recollindex still performs its normal up-to-date checks when it works this way: everything is as usual except for the file list generation.

What do you think?
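The stdin mode described above can be sketched as a small shell wrapper. This is only an illustration: the function names list_files and index_tree are made up for the example, and restricting find to regular files is an extra choice that the original command does not make.

```shell
#!/bin/sh
# Hypothetical helper: print every regular file under a directory, one per line.
list_files() {
    find "$1" -type f
}

# Hypothetical wrapper: pass the generated list to recollindex on stdin.
# The skippedNames and skippedPaths settings still apply in this mode.
index_tree() {
    list_files "$1" | recollindex -i
}
```

Calling index_tree /my/top/dir would then be equivalent to the find pipeline quoted above, limited to regular files.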

naufraghi writes

OK, I agree that moving away from the global config is a burden, but I see a big plus. As with .hgignore, when you get a package you get the author's config too. The same can apply to read-only folders: the manager can define the indexing rules and all users will inherit the work. Moreover, the config sits close to the data. Perhaps desktop users will prefer to continue using the GUI, but console users would perhaps prefer a local config.

Considering .hgignore, you are right, but I imagine that a user-configured filter can do the dirty work, as is done for content in mimeconf, and convert the .??ignore syntax to recoll skipped?? lines. In the meantime, this script can be used to generate the local config sections: https://gist.github.com/973260 Obviously some advanced syntax will not work, so there is room for improvement.

Regarding "recollindex -i", I think it is a good idea: this way one can let recoll do the hard work and move the indexing logic out if needed.
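To make the idea of such a conversion concrete, here is a minimal sketch. It is not the linked gist, and the function name hgignore_to_skippednames is invented for the example. It assumes that only the glob sections of .hgignore are worth converting, that patterns containing a slash or a space can be dropped, and that regexp sections are ignored entirely.

```shell
#!/bin/sh
# Hypothetical helper: read an .hgignore file on stdin and print a
# skippedNames line built from its simple glob patterns.
hgignore_to_skippednames() {
    awk '
        /^[ \t]*#/ { next }                  # skip comment lines
        /^[ \t]*$/ { next }                  # skip blank lines
        /^syntax:/ { mode = $2; next }       # remember the active syntax
        # keep glob patterns that are plain names (no slash, no space)
        mode == "glob" && $0 !~ /\// && NF == 1 { pats = pats " " $0 }
        END { print "skippedNames =" pats }
    '
}
```

Usage would be something like hgignore_to_skippednames < .hgignore, with the output pasted under the relevant directory section of recoll.conf. Everything written in regexp syntax is silently lost, which is the main limitation of a static conversion like this.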

medoc writes

> Perhaps desktop users will prefer to continue using the GUI, but console users would perhaps prefer a local config.

Just so that there is no confusion: the current indexing configuration was primarily intended to be edited by hand; the GUI is just a help. Command line users (of which I am one) should have no trouble with this.

About the hgignore->recoll.conf script: I get the idea but, as you rightly say, I’m afraid that fully converting the rather different logic of hgignore would need a little more code… In particular, it seems probable that a static approach can’t work.

Which is why I really prefer the recollindex -i approach. This way, you can use whatever rules you like for selecting files, using any tool, and just pass the list to recollindex. In the case of Mercurial, I think that a simple variation on "hg status" would nicely do the job, letting mercurial deal with its own complexities :)

In practice, something like the following will be possible in the next release:

In recoll.conf, add the Mercurial repository to the skippedPaths:

skippedPaths = … /path/to/hg/repository …

Then have a script like the following in cron (not tried, may need some small adjustments):

cd /path/to/hg/repository
hg status -mac -n | recollindex -i -f

The -f option says to ignore skippedPaths. It is missing in 1.15.8; see commit 2287 (changeset 3739490d88d8), which should work OK as a patch to 1.15.8.
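One of the "small adjustments" a cron script might need is turning the repository-relative paths printed by hg status into absolute ones. The sketch below shows one way to do that. The function names absolutize and index_hg_repo are invented for the example, and whether recollindex actually requires absolute paths here is an assumption that has not been checked.

```shell
#!/bin/sh
# Hypothetical helper: prefix each path read on stdin with a root directory.
absolutize() {
    awk -v root="$1" '{ print root "/" $0 }'
}

# Hypothetical wrapper: index the modified, added and clean files of one
# Mercurial repository. The -f option is the one described in the thread,
# so this needs a recollindex that includes that change.
index_hg_repo() {
    repo=$1
    # The subshell keeps the cd from affecting the caller.
    ( cd "$repo" && hg status -mac -n ) | absolutize "$repo" | recollindex -i -f
}
```

A cron job would then call index_hg_repo /path/to/hg/repository after loading these definitions.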

I’m marking this as resolved because I’m pleased with my solution :) but don’t hesitate to bug me again if you have more ideas: your suggestion led to a really useful new function in recoll. Thanks. jf