837183 writes

It seems "Term A Term B"o3 would give a result where both terms appear at a distance of at most 3 words from each other, in either order…

I’m wondering if there’s a way to specify the order, meaning Recoll would match only documents where Term A appears first, within 3 words of Term B.

medoc writes

"TermA TermB"o3 is a phrase search (the order of terms must be respected), with other terms possibly interspersed.

"TermA TermB"po3 is a proximity search: termB can come first.

When in doubt, use the "Show Query" link: it will show a PHRASE Xapian operator for phrase searches, and "NEAR" for proximity.
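To make the difference concrete, here is a minimal Python sketch of the two matching rules. It is a simplified model and not Xapian's or Recoll's actual implementation: the function names, the toy tokenizer, and the reading of the slack value as "number of other terms allowed in between" are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase and split on non-alphanumerics (a rough stand-in for an indexer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def positions(tokens, term):
    """Return every index at which `term` occurs in the token list."""
    return [i for i, t in enumerate(tokens) if t == term]

def phrase_match(text, a, b, slack=0):
    """Ordered match: b must come after a, with at most `slack` terms in between."""
    toks = tokenize(text)
    return any(0 < pb - pa <= slack + 1
               for pa in positions(toks, a)
               for pb in positions(toks, b))

def near_match(text, a, b, slack=0):
    """Unordered match: a and b at most `slack` terms apart, in either order."""
    toks = tokenize(text)
    return any(0 < abs(pb - pa) <= slack + 1
               for pa in positions(toks, a)
               for pb in positions(toks, b))

doc = "Hurricane Katrina made landfall in 2005"
print(phrase_match(doc, "katrina", "hurricane", slack=3))  # False: wrong order
print(near_match(doc, "katrina", "hurricane", slack=3))    # True: order ignored
```

Under this model, an ordered search with slack 3 should reject a document that only contains the terms in the reverse order, while the proximity variant accepts it.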

I suppose that you have seen the manual page to get to this point, but just in case:

837183 writes

That’s weird then. The query "potter harry"o6 returns PDFs containing only "harry potter".

Since results are ordered by the terms’ frequency of appearance, I went to the last ones and manually searched for potter, checking whether I would find harry within 6 words of it. I did not. I did, however, find harry potter. Also, the snippet window shows only hits for harry potter for this book.

medoc writes

Did you also check the metadata (the PDF properties)? There might be a Potter,Harry hiding in there. You can check the first lines of output of the "rclpdf" filter. Any culprit should be hiding in the "meta" elements.

If this is not the reason, I would like to take a look at a sample document, if this is possible.

837183 writes

I’ll check the metadata, and if the string doesn’t appear there, of course I’ll send you a sample. However, all of the PDFs I have are copyrighted. Can I send the file to jfd at recoll org?

medoc writes

Sure, I’ll make sure I delete it after testing.

837183 writes

Ah, it’s not about that. I’m just explaining why I won’t post it here: it’s probably forbidden. Will check the metadata now.

medoc writes

Ok, this was more or less tongue in cheek :)

I thought of another possibility for the phrase search issue:

Harry Potter ! Harry Potter !

Recoll discards most punctuation, so this would be a match for both orders.
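A small Python sketch of why that happens. The tokenizer below is only an assumed approximation of "discard punctuation and lowercase", not Recoll's real text splitter, and `has_adjacent` is a hypothetical helper written for this example.

```python
import re

def tokenize(text):
    """Drop punctuation and lowercase: a rough approximation, not Recoll's splitter."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has_adjacent(text, first, second):
    """True if `second` immediately follows `first` in the token stream."""
    toks = tokenize(text)
    return any(x == first and y == second for x, y in zip(toks, toks[1:]))

title = "Harry Potter ! Harry Potter !"
print(tokenize(title))                         # ['harry', 'potter', 'harry', 'potter']
print(has_adjacent(title, "harry", "potter"))  # True
print(has_adjacent(title, "potter", "harry"))  # True: the repeat creates the reversed pair
```

Once the exclamation marks are gone, the second "Harry" directly follows the first "Potter", so an ordered search for potter followed by harry has something legitimate to match.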

837183 writes

Oh, well, I’m your neighborhood asperger so I have an excuse for not getting it :)

I’ve opened the PDF and searched for every occurrence of "potter", then looked for a harry afterwards. Recoll may discard it but my eyes won’t!

I have a hit for "Katrina Hurricane"Co3 in a document that only contains Hurricane Katrina. I’m uploading it to send to you.

medoc writes

Congrats, you found a Xapian bug !

837183 writes

Take some credit too, sheesh :)