r0bis writes

Hi, thanks for the great tool!

Unfortunately, ABRT keeps reporting crashes (Fedora 14 Laughlin):

{{{ #!shell

Package: recoll-1.14.2-2.fc14
Latest Crash: Wed 15 Dec 2010 09:28:27
Command:
Reason: HTMLParser.py:115:error:HTMLParseError: EOF in middle of construct, at line 273, column 13
}}}

When I try to index a chm file separately via recollindex, it gets processed alright (as far as I can tell). Any ideas why this daily error keeps popping up?

backtrace from ABRT says:

{{{ #!shell

HTMLParser.py:115:error:HTMLParseError: EOF in middle of construct, at line 273, column 13

Traceback (most recent call last):
  File "/usr/share/recoll/filters/rclchm", line 150, in <module>
    e.mainloop(rclCHM(e))
  File "/usr/share/recoll/filters/rclexecm.py", line 117, in mainloop
    self.processmessage(processor, params)
  File "/usr/share/recoll/filters/rclexecm.py", line 85, in processmessage
    if not processor.openfile(params):
  File "/usr/share/recoll/filters/rclchm", line 134, in openfile
    self.tp.close()
  File "/usr/lib64/python2.7/HTMLParser.py", line 112, in close
    self.goahead(1)
  File "/usr/lib64/python2.7/HTMLParser.py", line 164, in goahead
    self.error("EOF in middle of construct")
  File "/usr/lib64/python2.7/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParseError: EOF in middle of construct, at line 273, column 13

Local variables in innermost frame:
message: EOF in middle of construct
self: <__main__.ChmTopicsParser instance at 0x7fc6b8b62878>
}}}


Many thanks for looking into this.


medoc writes

Hello and thanks for reporting this,

My feeling is that this is a python/libchm issue, but I would like to confirm it (a similar issue has been reported where python actually crashed with SIGSEGV instead of throwing an exception, ouch…).

In order to further investigate, I would really like to have a copy of the chm file which is causing the problem. Unfortunately, its name is not in the stack trace, but it should be relatively easy to determine by looking at the recoll log file. If the chm file is non-confidential data, please send it to me (jf at recoll dot org), or attach it here.

You may have to configure the recoll log, because it goes to stderr by default. This is done in the indexing preferences section of the GUI configuration menu. I think that you should keep it at a fairly low level (i.e. 2), so that the error won’t be swamped by debug info.

I plan to catch all python exceptions in a future version of the common python filter code. This won’t change the actual error, but at least there won’t be system reports about such a benign issue (errors will still go to the recoll log for those interested).
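
To illustrate the idea, here is a minimal sketch of such a catch-all wrapper. It is not the actual recoll filter code: `safe_openfile`, `TruncatedDocError`, `FailingProcessor` and `WorkingProcessor` are hypothetical names made up for this example, and only the `openfile(params)` call shape is taken from the traceback above.

```python
import sys
import traceback


class TruncatedDocError(Exception):
    """Hypothetical stand-in for a parser error such as HTMLParseError."""


class FailingProcessor(object):
    """Hypothetical type-specific handler whose openfile() always fails."""

    def openfile(self, params):
        raise TruncatedDocError("EOF in middle of construct")


class WorkingProcessor(object):
    """Hypothetical handler that opens its file without trouble."""

    def openfile(self, params):
        return True


def safe_openfile(processor, params, log=sys.stderr):
    """Call processor.openfile(); report any exception as a plain failure."""
    try:
        return bool(processor.openfile(params))
    except Exception:
        # Write the traceback to the log instead of letting it escape and
        # terminate the filter process (which is what triggers a crash report).
        log.write("openfile failed for %r\n" % (params,))
        log.write(traceback.format_exc())
        return False
```

With something along these lines, a bad document makes `safe_openfile(FailingProcessor(), params)` return `False` and leaves a traceback in the log, so the caller can skip the file and carry on.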

medoc writes

Closing this for lack of data. The python common code now catches all exceptions from the type-specific parts.