Unknown reporter writes

I have indexed the same page twice via the Firefox extension, and according to

$ strings webcache/circache.crch   | grep -i url

that page is now stored twice in the web cache. The page has changed, but not content-wise: only script ids and similar inconsequential details differ. However, only one copy is indexed, as shown by the query dir:/ (a single result for the duplicated URL), and the index references only the latest content, as verified by saving a copy via the recoll GUI (the copy matches the file last saved by the Firefox extension in .recollweb/ToIndex).

medoc writes

The cache is a circular buffer. The older entry stays in the file until the buffer wraps around and new data is written over it.
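To illustrate why the same URL can show up twice in the file while only the newest copy matters, here is a toy model of an append-only circular cache. It is a hypothetical simplification: fixed-size slots instead of variable-length records, and the names `CircularCache`, `put`, `count` and `latest` are invented for illustration; this is not Recoll's actual circache format or code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    url: str
    content: str


class CircularCache:
    """Toy fixed-slot circular cache (hypothetical model, not Recoll's circache)."""

    def __init__(self, capacity: int) -> None:
        self.slots: list[Optional[Entry]] = [None] * capacity
        self.head = 0  # index of the next slot to write

    def put(self, url: str, content: str) -> None:
        # Append-only: a new version of a URL does NOT replace the old one in place.
        # The oldest slot is only overwritten when the write head wraps around to it.
        self.slots[self.head] = Entry(url, content)
        self.head = (self.head + 1) % len(self.slots)

    def count(self, url: str) -> int:
        # Number of stored copies for this URL (roughly what `strings | grep` reveals).
        return sum(1 for e in self.slots if e is not None and e.url == url)

    def latest(self, url: str) -> Optional[str]:
        # Walk backwards from the most recent write to find the newest copy.
        n = len(self.slots)
        for i in range(1, n + 1):
            e = self.slots[(self.head - i) % n]
            if e is not None and e.url == url:
                return e.content
        return None


cache = CircularCache(capacity=3)
cache.put("http://example.com/page", "v1")
cache.put("http://example.com/page", "v2")
print(cache.count("http://example.com/page"), cache.latest("http://example.com/page"))  # 2 v2
cache.put("http://example.com/other", "x")
cache.put("http://example.com/more", "y")  # wraps around and overwrites the "v1" slot
print(cache.count("http://example.com/page"))  # 1
```

In this model the stale copy is harmless (lookups return the newest one) but it occupies space until the head wraps around, which matches the behaviour described above.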

I now think that designing the cache this way was a very bad idea (and the implementation was vastly more complicated than expected), but this is not a bug: I’d need to redesign and recode the web cache to change this behaviour.

There is a utility somewhere that can be used to compact the file; ask me for details if you need it.
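Assuming that "compacting" means dropping superseded copies so that each URL keeps only its newest entry, the core logic might look like the sketch below. The function name `compact` and the list-of-tuples input are hypothetical; this is not the utility mentioned above, whose interface and behaviour are not described here.

```python
def compact(entries: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only the newest (url, content) entry per URL, preserving write order.

    `entries` is assumed to be ordered oldest-first (hypothetical model).
    """
    last_index: dict[str, int] = {}
    for i, (url, _) in enumerate(entries):
        last_index[url] = i  # later writes win
    # Retain an entry only if it is the last write for its URL.
    return [e for i, e in enumerate(entries) if last_index[e[0]] == i]


log = [("u1", "v1"), ("u2", "a"), ("u1", "v2")]
print(compact(log))  # [('u2', 'a'), ('u1', 'v2')]
```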