mschwipps writes
I added OCR with tesseract for PDF files in rclpdf.
medoc writes
Thanks for doing this. The program has a few issues (for example it fails if tesseract is not installed, and it runs pdftotext twice, which is costly for text pdfs), but I’ll use it as a basis for adding a tesseract option to the current filter.
medoc writes
I just pushed a new rclpdf to the repository. The main code is from your version, with the following differences:
- The filter works even if tesseract is not installed
- There are a number of ways to configure the OCR language
- pdftotext is run only once
The OCR function is disabled by default. If you want to use the Recoll version of the filter at some point, see the comments at the top of the file for the simple way to configure it.
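For anyone who wants the control flow at a glance, here is a rough Python sketch of the behaviour described above. It is not the actual rclpdf code: the function names (`pdftotext_once`, `extract_text`) and the `ocr` callback are made up for illustration, and I am assuming OCR is only worth trying when pdftotext returns no text. The OCR step itself is left as a callback on purpose.

```python
import shutil
import subprocess


def pdftotext_once(pdf_path):
    """Run pdftotext a single time and return its output as a string."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" sends the extracted text to stdout
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def extract_text(pdf_path, ocr_enabled=False, text_extractor=pdftotext_once,
                 ocr=None, which=shutil.which):
    """Return the text of a PDF, falling back to OCR only when allowed.

    `ocr` is a hypothetical callback taking the PDF path and returning text.
    `which` is injectable so the tesseract check can be exercised in tests.
    """
    # pdftotext runs exactly once; its output is reused for the decision below.
    text = text_extractor(pdf_path)
    if text.strip():
        # A text PDF: no need to pay for OCR.
        return text
    # OCR is off by default, and the filter must still work without tesseract.
    if not ocr_enabled or ocr is None or which("tesseract") is None:
        return text
    return ocr(pdf_path)
```

With this shape, a missing tesseract binary simply means the (possibly empty) pdftotext output is returned instead of the filter failing.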
Thanks again! Would you like your name in the file?