Unknown reporter writes
First of all let me express my gratitude for providing such a powerful desktop search engine for Linux. Coming from Windows, I have had my fair share of experience with Adobe’s sluggish pdf indexing options and was relieved to find that recoll offered a far better experience.
There’s one feature, however, that I feel recoll is lacking and which, to my mind, would greatly add to an already very good user experience.
That feature is displaying the document page on which the search term was found. As of right now, searching through an index of long pdf documents such as textbooks or papers can be fairly tedious, because in order to get to your search results you have to separately search for the excerpt within the document itself.
It would be far more convenient and a major perk to recoll if there was an option to display the page number on which the quoted excerpts have been found. I have no idea if such an implementation is even possible but it would be great if it was.
I have added a screenshot explaining the structure of the Adobe search engine results page for reference.
Thank you, again, for your hard work and also for taking the time to read this.
medoc writes
Thanks for suggesting this. I am not completely sure whether, or how, this function would be feasible with recoll and poppler (the pdf to text extraction tool).
I am leaving for a few days now, but I’ll give the question more thought when I come back, and communicate my findings :)
Unknown User writes
Thanks for your reply!
I don’t know how much this will help but at least as far as output is concerned you can use evince to draw a specific page of a document by simply using the following parameters:
{{{
#!bash
evince "FILENAME" --page-index=NUMBER
}}}
Acrobat Reader offers a similar functionality with
{{{
#!bash
acroread /a "page=NUM & highlight=lt,rt,top,btm" PDF_FILE.pdf
}}}
where lt,rt,top,btm are the coordinates of the highlighted rectangle starting from the top left corner.
These findings might be trivial but I figured posting them could not hurt, either. Finding the page number is probably a far more difficult task and I sincerely hope that there is some way of getting that information from libpoppler.
Unknown User writes
Hi again, I have recently stumbled upon this nifty command-line utility that does exactly what I would love to have in recoll: http://pdfgrep.sourceforge.net/. Manpage: http://pdfgrep.sourceforge.net/manpage.html
This command displays all instances of a search term within a document, including their respective page numbers:
{{{
#!bash
pdfgrep -n SearchTerm File(s)
}}}
Documents can then be easily opened through evince or acroread as described above.
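For what it's worth, here is a minimal sketch of how the two steps could be chained by hand. It assumes that `pdfgrep -n` run on a single file prints lines of the form `PAGE:matching text`, and that the page number it reports can be passed unchanged to `evince --page-index`. The function names `first_match_page` and `open_at_first_match` are made up for this illustration and are not part of recoll, pdfgrep or evince.

```shell
# Read `pdfgrep -n` output on stdin and print the page number of the
# first match. Assumes single-file output: "PAGE:matching text".
first_match_page() {
    sed -n 's/^\([0-9][0-9]*\):.*/\1/p' | head -n 1
}

# Hypothetical helper: open FILE in evince at the first page that
# contains TERM. Prints a message and returns 1 if there is no match.
open_at_first_match() {
    term=$1
    file=$2
    page=$(pdfgrep -n "$term" "$file" | first_match_page)
    if [ -z "$page" ]; then
        echo "no match for '$term' in $file" >&2
        return 1
    fi
    evince "$file" --page-index="$page"
}
```

Usage would be something like `open_at_first_match SearchTerm FILENAME`. This only jumps to the first hit; it does not give the per-excerpt page display requested above.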
Is there any way you could implement this in recoll?
medoc writes
Hi, sorry to be so slow to respond and thanks for the new input. I hope to be able to get back to this during the week.
medoc writes
Hi,
I have added code to the current tip version (91ecb1568d6f) on bitbucket for displaying page numbers inside abstracts and for opening pdf documents at the page where a term match occurs (for example, you could use {{{evince --page-label=%p %f}}} inside mimeview for this to work).
It would be interesting if you could try this version when you have a moment, so that we can then refine the functions (maybe add something to directly open the doc at a position matching a given snippet, for example).
I can help with the details of installing from source, or maybe produce a binary package if you tell me what system you are running. This would be easier through email: jf at dockes dot org.
Unknown User writes
Hi,
I have dispatched an email to the address you provided with further details. Can’t wait to try it out!
yotatoy writes
I too was looking for this feature. I am running Ubuntu 12.04. I initially installed the version in the software center (1.16x), then added the PPA and was able to install Recoll 1.17.3 + Xapian 1.2.8.
I would love for this feature to be integrated into the stable release, although I might also try to compile it?
I will shoot you an email as well.
Thanks!
medoc writes
Closing the issue as this is mostly implemented in the version currently available on the experimental PPA, with complements present in the current source trunk. This feature will be in the next release.