837183 writes

I’m sorry, this is a question, not a defect, enhancement request or task.

My ebook library is getting very big: it takes weeks to index (1 TB of PDFs), and once indexed, a simple search takes minutes.

This is understandable, because that’s a lot of data.

But I’m wondering if there’s a way to help Recoll be faster. Will buying better hardware (a faster CPU?) make Recoll index and search faster?

medoc writes

Weeks and minutes seem like a lot, even for 1 TB of source documents. I have a number of questions first:

  • Which Recoll version are you using?

  • What is the size of the index itself (du $HOME/.recoll/)?

  • What hardware are you running this on at the moment?

  • By simple query, do you mean a single-term query, or something a bit more complicated?
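
For the index-size question, where du is not available, the total can be approximated with a short script. This is only a sketch: the helper names dir_size_bytes and human are made up for illustration, the path assumes the configuration directory is the $HOME/.recoll mentioned above, and the script sums apparent file sizes, so the result may differ somewhat from the disk usage that du reports.

```python
import os


def dir_size_bytes(path):
    """Sum the apparent sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # do not follow or count symbolic links
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def human(n):
    """Format a byte count with binary units (hypothetical helper)."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} TiB"


if __name__ == "__main__":
    # Assumption: the index lives under the directory named in the question.
    index_dir = os.path.expanduser("~/.recoll")
    print(index_dir, human(dir_size_bytes(index_dir)))
```

Comparing this figure with the 1 TB of source PDFs gives a rough idea of how large the index is relative to the documents.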

I have little experience with datasets of this size, but I know that others do, and the Xapian people have quite a lot, so I’m reasonably certain that there are things that can be done.

medoc writes

Hi, I do have ideas about what to do to speed things up, but I need feedback. Please re-open if you want.