rdzidlic writes

It appears that recoll reindexes even messages it has already seen?

Skipping them would certainly make a good optimization, because some of the messages contain attachments like big tarballs.

Should be doable: the messages have a date and a msg-id, and if really paranoid, the size and md5 could be checked too.
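A rough sketch of that check, in Python rather than recoll's own code. The names `message_key` and `filter_new` are made up for illustration, and the in-memory `seen` set stands in for whatever the index would actually store:

```python
import hashlib
from email import message_from_bytes


def message_key(raw: bytes, paranoid: bool = False) -> tuple:
    """Build an identity key for one raw message (hypothetical helper).

    The key is (msg-id, date); in paranoid mode the size and md5 of the
    raw bytes are appended as well.
    """
    msg = message_from_bytes(raw)
    key = (str(msg.get("Message-ID", "")), str(msg.get("Date", "")))
    if paranoid:
        key += (len(raw), hashlib.md5(raw).hexdigest())
    return key


def filter_new(messages, seen: set, paranoid: bool = False) -> list:
    """Return only the messages whose key is not yet in `seen`.

    `seen` is updated in place, so a second pass over the same
    messages returns an empty list.
    """
    new = []
    for raw in messages:
        key = message_key(raw, paranoid)
        if key not in seen:
            seen.add(key)
            new.append(raw)
    return new
```

Messages without a msg-id would all share the same empty id here, so a real implementation would probably need the paranoid fields at least for those.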

medoc writes

Thanks for this report and the others, this testing and detailed analysis are very useful. I’m a bit swamped at the moment, but I’ll come back to the reports as soon as I can.

medoc writes

Changed from bug to enhancement. As you write, this is doable, but certainly not simple, as something needs to be stored in the index to identify the already indexed messages (probably a hash of the msgid inside a Xapian value, but this really needs a detailed study).

rdzidlic writes

Another idea to get something implemented quickly: assuming that messages are always appended and old messages are rarely deleted or changed, this would work:

Fingerprint something like the first 95% of an MBOX and, assuming the fingerprint of this part has remained the same, reindex only the rest.
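Roughly like this, as a Python sketch only. The function names, the SHA-256 choice and the idea of keeping `(length, digest)` somewhere between runs are all assumptions, not anything recoll does today:

```python
import hashlib
import os


def prefix_fingerprint(path: str, length: int, chunk_size: int = 1 << 20) -> str:
    """Hash the first `length` bytes of the file, reading in chunks."""
    h = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # file is shorter than expected
            h.update(chunk)
            remaining -= len(chunk)
    return h.hexdigest()


def save_state(path: str, fraction: float = 0.95):
    """After indexing, remember the prefix length and its fingerprint."""
    length = int(os.path.getsize(path) * fraction)
    return length, prefix_fingerprint(path, length)


def reindex_offset(path: str, saved_length: int, saved_digest: str) -> int:
    """Return the byte offset to reindex from; 0 means reindex everything."""
    if saved_length <= 0 or os.path.getsize(path) < saved_length:
        return 0  # no usable state, or the file shrank
    if prefix_fingerprint(path, saved_length) != saved_digest:
        return 0  # something in the old part changed
    return saved_length
```

The returned offset will usually land in the middle of a message, so the indexer would still have to back up to the preceding "From " separator line before it starts reading.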