orbisvicis writes

The most accurate preview would be generated within firefox, but I'd rather not modify the browser extension. There is an html file on disk; what steps need to be taken for recoll to display an image preview?

medoc writes

The firefox extension will need changing anyway, because it will not survive the firefox switch to multiprocess rendering (e10s, electrolysis).

The recoll preview is text-oriented, and I really don’t believe it’s the right tool for rendering an accurate likeness of the original page (it might be possible to do something like what is done for jpegs: default display of image with right-click option to see extracted text, but I really think that this is a dead-end).

Actually, I wonder if the best approach for what you seem to be wanting to do would not be to use another extension: https://addons.mozilla.org/fr/firefox/addon/mozilla-archive-format/

recoll already properly indexes maff files (they are just zip files), and the only thing lacking as far as I can see on a vanilla ubuntu install would be to teach the system to open .maff files with firefox (you then just use the Open Parent popup entry from a search result).

It seems to me that this is better suited to managing an archive than the recoll web page indexer, which is meant more as an automatic memory helper for recently visited pages.

What would be the problem with doing this, from your POV?

orbisvicis writes

I agree that maff is better at saving web-pages than recoll’s extension, so it is something that should be looked into when WebExtensions and e10s become mandatory:

  • longer term storage: includes more linked content

  • better management: filesystem vs trcircache

  • more metadata: firefox-specific metadata (scroll position, zoom-level)

However, it complicates things:

  • content extraction: how to extract html content to text when the page is stored in the maff format?

Without solving the actual problem:

  • image preview: the maff format doesn’t store a preview of the saved page. Even if it did (probably not a good idea), or if the preview were saved alongside the archive, the problem remains: how to get recoll to show a preview of the archived web-page within the frontend?

I believe recoll’s firefox extension suits my needs: automatic memory helper (database) for selected pages. It isn’t just for recently visited pages, but for all pages I’d like to save, old or new. A text search of recoll’s index can present a sorted list of possible matches, but I find that image recall is more effective than text-based recognition (via the text-based preview) for quickly identifying the intended page.

The only way to generate accurate previews is within firefox, as firefox renders the page, including the effect of any addons. For example, [this screenshot plugin](https://addons.mozilla.org/en-US/firefox/addon/abduction/?src=search) seems to do everything we need:

  • screenshot of entire page, even off-screen

  • appropriate width/zoom maximization to minimize borders

  • square composition of long web-pages - cuts and reassembles long images into rectangles of near-square ratios.
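To make the last point concrete, here is a minimal sketch of the arithmetic behind a near-square composition. It is only an illustration under my own assumptions, not how the screenshot plugin actually does it: the function name `square_layout` is hypothetical, and it assumes the page is cut into equal-height strips placed side by side.

```python
import math


def square_layout(width, height):
    """Return (columns, strip_height) for cutting a tall image into
    side-by-side strips so the result is as close to square as possible."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # n strips side by side give an image of (n * width) x ceil(height / n).
    # Equal sides means n * width == height / n, i.e. n == sqrt(height / width).
    ideal = math.sqrt(height / width)
    candidates = {max(1, math.floor(ideal)), max(1, math.ceil(ideal))}

    def aspect_error(n):
        # Distance of the aspect ratio from 1:1, symmetric in log space.
        w, h = n * width, math.ceil(height / n)
        return abs(math.log(w / h))

    columns = min(sorted(candidates), key=aspect_error)
    return columns, math.ceil(height / columns)
```

For a page rendered at 1000 x 9000 pixels this gives 3 columns of 3000 pixels each, i.e. a 3000 x 3000 image; a page that is already wider than tall is left as a single strip.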

medoc writes

maff is already supported by the indexer: these are actually zip files, so they are indexed like any zip archive. If you associate the maff file type on the desktop with a browser which understands it, an Open Parent from the hit on the html page inside the result list will properly open the maff file.

Also, a variation on the zip handler could be used for maff files so that the html is cleaned-up, but I don’t know how this would play with internal links.
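As a rough idea of what such a cleanup could look like, here is a standalone sketch using only the Python standard library. It is not the recoll zip handler and does not use recoll's handler interface; the names `TextExtractor`, `html_to_text` and `maff_texts` are made up for the example, and it assumes the html members are UTF-8 encoded.

```python
import zipfile
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def maff_texts(source):
    """Map each html member of a maff (zip) archive to its plain text.

    `source` is a file path or a binary file object.
    """
    texts = {}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".html", ".htm")):
                raw = archive.read(name)
                # Assumption: UTF-8; real pages may declare another charset.
                texts[name] = html_to_text(raw.decode("utf-8", errors="replace"))
    return texts
```

This only drops markup, scripts and styles; it does nothing about internal links, which is the open question above.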

About the image preview: I could see this making sense as part of the result list, but using the preview window is not much simpler than just opening the doc in Firefox, so I don’t see much point in doing this.

Displaying thumbnails as part of the result list would be possible. It’s done for images, and we could probably reuse and extend the mechanism for other docs, but it’s a significant development. It’s also not obvious how to combine operations performed by several firefox extensions (save to maff, generate screenshot) so that it’s basically one click and everything goes where we want it.

medoc writes

Status on this: if someone develops something around the current extension, so that it is able to save an image preview somewhere in the queue directory, I’ll do the rest of the work to arrange for the image to be displayed in the result list. Developing Firefox extensions is like pulling teeth for me, I already have a lot of trouble with the basic one. FYI it appears that the image-capturing extension to use is now https://addons.mozilla.org/en-US/firefox/addon/abduction-fixed/?src=search