ddio writes
Hi,
I’m getting this error for many files, but I can’t see the reason for it.
The files I get this error for, I’m only seeing this:
and then a few lines later:
All the other lines near these refer to other files.
recollq does not find this file when I search for words that I know are inside that xls. Obviously I didn’t test every word, so I don’t know whether anything was indexed for that file.
What could this error mean?
Thanks!
medoc writes
My best guess would be that the xls text extractor does not like the format of these files. The extractor uses Python code written by the LibreOffice people for testing their Microsoft importer/exporter.
Any chance that you could share one of these files, so that I can verify that this is a file format issue, and maybe talk to the Python extractor authors about it?
Otherwise, maybe you could try to run the command line extractor and see what it has to say:
/usr/share/recoll/filters/xls-dump.py --dump-mode=canonical-xml --utf-8 --catch /path/to/some/file.xls
ddio writes
Hi,
Thanks for your answer.
After updating to recoll 1.21.7 (I had 1.19.5 before) and fixing xls2csv (the command was named xls2csv-catdoc, so I symlinked it to xls2csv), it is now indexing this xls file.
But as I understand it, after 1.19.12 it no longer needs catdoc and uses the internal Python scripts, which doesn’t seem to be the case here.
Is this perhaps a compile-time option or something?
The command you suggested does work:
#!xml
<xls-dump >
<workbook encrypted="false" >
<workbook-global/ >
<worksheet first-defined-cell="$A$2" first-free-cell="$P$49" name="Tabelle3" version="BIFF8" visible="true" >
<row id="1" >
<label-cell col="8" value="DATUM:"/ >
<label-cell col="1" value="ANFRAGE [__] oder AUFTRAG [__]"/ >
<label-cell col="4" value="KOMMISSION:"/ >
</row >
<row id="3" >
<label-cell col="0" value="Kühlkorpus"/ >
<label-cell col="8" value="Abdeckung"/ >
<label-cell col="4" value="Einbauten"/ >
</row >
<row id="4" >
<label-cell col="0" value="Ausführung"/ >
<label-cell col="1" value="Novis"/ >
<label-cell col="4" value="Züge/Einteilung"/ >
<label-cell col="5" value="B4/B5 360/365mm"/ >
<label-cell col="8" value="Links"/ >
<label-cell col="9" value="Wulstrand"/ >
</row >
<row id="5" >
<label-cell col="1" value="Exclusiv"/ >
<label-cell col="4" value="Flaschenhöhe"/ >
<label-cell col="5" value="C4/C5 305/430mm"/ >
<label-cell col="9" value="Aufkantung 27 mm"/ >
</row >
<row id="6" >
<label-cell col="1" value="Royal"/ >
<label-cell col="5" value="D4/D5 185/265/265mm"/ >
<label-cell col="9" value="Aufkantung ….. mm"/ >
</row >
<row id="7" >
<label-cell col="1" value="Novis Brillant"/ >
<label-cell col="5" value="M4/M5 245/245/245mm"/ >
<label-cell col="9" value="Kasten 200 mm"/ >
</row >
<row id="8" >
<label-cell col="1" value="Exclusiv Brillant"/ >
</row >
<row id="9" >
<label-cell col="8" value="Rechts "/ >
<label-cell col="9" value="Wulstrand"/ >
<label-cell col="4" value="Fassabteil"/ >
<label-cell col="5" value="Tür"/ >
</row >
<row id="10" >
<label-cell col="0" value="Beschlag"/ >
<label-cell col="1" value="Stangenbeschlag schwarz/chrom"/ >
<label-cell col="5" value="Tür für Fass"/ >
<label-cell col="9" value="Aufkantung 27 mm"/ >
</row >
<row id="11" >
<label-cell col="1" value="Stangenbeschlag chrom"/ >
<label-cell col="5" value="Rost "/ >
<label-cell col="9" value="Aufkantung ….. mm"/ >
</row >
<row id="12" >
<label-cell col="1" value="Griffleisten ohne Schloß"/ >
<label-cell col="5" value="Beleuchtung nur bei Rost"/ >
<label-cell col="9" value="Kasten 200 mm"/ >
</row >
<row id="13" >
<label-cell col="1" value="Griffleisten mit Schloß"/ >
</row >
<row id="14" >
<label-cell col="8" value="Vorne"/ >
<label-cell col="9" value="Wulstrand"/ >
<label-cell col="4" value="Beckenabteil"/ >
<label-cell col="5" value="Blende links"/ >
</row >
<row id="15" >
<label-cell col="0" value="Sockel"/ >
<label-cell col="1" value="Standard Alu 40mm = 960 mm"/ >
<label-cell col="5" value="Blende rechts"/ >
<label-cell col="9" value="Aufkantung 27 mm"/ >
</row >
<row id="16" >
<label-cell col="1" value="Standard CNS 30mm"/ >
<label-cell col="5" value="Abfallkipper"/ >
<label-cell col="9" value="Aufkantung ….. mm"/ >
</row >
<row id="17" >
<label-cell col="1" value="Sonderhöhe Alu"/ >
<label-cell col="4" value="Maschinenfach"/ >
<label-cell col="5" value="Lüftungsgitter"/ >
<label-cell col="9" value="Kasten 200 mm"/ >
</row >
<row id="18" >
<label-cell col="1" value="Sonderhöhe CNS"/ >
<label-cell col="5" value="Maschine links"/ >
</row >
<row id="19" >
<label-cell col="8" value="Bohrung SS"/ >
<label-cell col="9" value="nein"/ >
<label-cell col="5" value="Maschine rechts"/ >
</row >
<row id="20" >
<label-cell col="0" value="Korpus"/ >
<label-cell col="1" value="Höhe 870 + 40 + 50 = 960 mm"/ >
<label-cell col="9" value="Durchmesser"/ >
</row >
<row id="21" >
<label-cell col="1" value="Höhe 810 mm"/ >
<label-cell col="5" value="Maschine auf Auszug"/ >
<label-cell col="9" value="hinter Verdampfer oder"/ >
</row >
<row id="22" >
<label-cell col="1" value="Sonderhöhe"/ >
<label-cell col="9" value="seitlich versetzt"/ >
</row >
<row id="23" >
<label-cell col="1" value="Tiefe 600mm"/ >
<label-cell col="4" value="Kälte"/ >
</row >
<row id="24" >
<label-cell col="8" value="Armatur"/ >
<label-cell col="1" value="Tiefe 700mm für 2x Bierkasten"/ >
<label-cell col="4" value="intern"/ >
<label-cell col="5" value="eingebaut"/ >
<label-cell col="9" value="MB-HD-1ltg."/ >
</row >
<row id="25" >
<label-cell col="1" value="Sondertiefe"/ >
<label-cell col="5" value="Kasten 200 mm"/ >
<label-cell col="9" value="MB-HD-2ltg."/ >
</row >
<row id="26" >
<label-cell col="9" value="MB-ND-1ltg."/ >
</row >
<row id="27" >
<label-cell col="0" value="Auszüge"/ >
<label-cell col="1" value="Einfachauszüge"/ >
<label-cell col="4" value="extern"/ >
<label-cell col="5" value="welche Leistung erforderlich ?"/ >
<label-cell col="9" value="MB-ND-2ltg."/ >
</row >
<row id="28" >
<label-cell col="1" value="Vollauszüge 60 kg"/ >
<label-cell col="5" value="Kältemittel ??"/ >
<label-cell col="9" value="als Drehknopf"/ >
</row >
<row id="29" >
<label-cell col="1" value="Vollauszüge 100 kg"/ >
<label-cell col="5" value="wohin Tauwasser"/ >
<label-cell col="9" value="als Einhebelmischbatterie"/ >
</row >
<row id="30" >
<label-cell col="1" value="Verkürzte Züge wegen Python"/ >
<label-cell col="5" value="ohne = wo Leitungen hinlegen ?"/ >
<label-cell col="9" value="Kaltwasserhahn"/ >
</row >
<row id="31" >
<label-cell col="9" value="niedriger Auslauf (Barbrett)"/ >
</row >
<row id="32" >
<label-cell col="0" value="Logo"/ >
<label-cell col="9" value="ohne Unterspülrohr"/ >
<label-cell col="4" value="Digitalthermostat"/ >
<label-cell col="5" value="in der Blende"/ >
</row >
<row id="33" >
<label-cell col="0" value="Logo"/ >
<label-cell col="1" value="links auf Abdeckung"/ >
<label-cell col="5" value="im Steg"/ >
</row >
<row id="34" >
<label-cell col="8" value="Tropfmulde"/ >
<label-cell col="1" value="rechts auf Abdeckung"/ >
<label-cell col="5" value="Kabellänge"/ >
<label-cell col="9" value="Gläserdusche links"/ >
</row >
<row id="35" >
<label-cell col="1" value="Tür bzw. Blende Installation"/ >
<label-cell col="9" value="Gläserdusche rechts"/ >
</row >
<row id="36" >
<label-cell col="9" value="Druckminderer immer"/ >
<label-cell col="4" value="Tauwasser"/ >
<label-cell col="5" value="1 Verdampfer auf Maschine"/ >
</row >
<row id="37" >
<label-cell col="5" value="1 Verdampfer in Bodenablauf"/ >
</row >
<row id="38" >
<label-cell col="8" value="Nähte"/ >
<label-cell col="1" value="bei Doppelbecken kein Steg mittig"/ >
<label-cell col="5" value="2 Verdampfer auf Verdunster"/ >
<label-cell col="9" value="stumpf gestossen"/ >
</row >
<row id="39" >
<label-cell col="1" value="Einhebelmischer nach Möglichkeit rechts"/ >
<label-cell col="5" value="2 Verdampfer in Bodenablauf"/ >
<label-cell col="9" value="Schweißnaht"/ >
</row >
<row id="40" >
<label-cell col="1" value="wenn nicht möglich, dann Drehknopf"/ >
<label-cell col="9" value="Stecknaht"/ >
</row >
<row id="41" >
<label-cell col="1" value="Bohrung nicht über Verdampfer"/ >
<label-cell col="4" value="MB-HD"/ >
<label-cell col="5" value="auch für Durchlauferhitzer"/ >
</row >
<row id="42" >
<label-cell col="8" value="Bleche"/ >
<label-cell col="1" value="Tropfschale mit Luft unter Verdampfer"/ >
<label-cell col="4" value="MB-ND"/ >
<label-cell col="5" value="3-leitig für Boiler"/ >
<label-cell col="9" value="bis 800 mm Länge 1 Stück"/ >
</row >
<row id="43" >
<label-cell col="8" value="gelb = Standard"/ >
<label-cell col="9" value="bis 800 mm Länge 2 Stück"/ >
</row >
<row id="44" >
<label-cell col="8" value="wegen GS "/ >
<label-cell col="1" value="Montageblech auf dem Sockel für Aufnahme der bauseitigen Verkleidung - Front bündig mit CNS"/ >
<label-cell col="9" value="ab 800 mm Länge 2 Stück"/ >
</row >
<row id="45" >
<label-cell col="8" value="wegen Optik"/ >
<label-cell col="9" value="ab 800 mm Länge 1 Stück"/ >
</row >
<row id="47" >
<label-cell col="9" value="vor dem Drucken Auswahl markieren"/ >
</row >
<row-heights >
<range height="499" span="2:2"/ >
<range height="315" span="4:4"/ >
<range height="255" span="5:46"/ >
<range height="255" span="48:48"/ >
</row-heights >
<shapes >
<shape offset-begin="(dx=759,dy=184)" offset-end="(dx=180,dy=221)" range="(col=10,row=3)-(col=13,row=5)"/ >
</shapes >
</worksheet >
<worksheet first-defined-cell="$A$1" first-free-cell="$A$1" name="Tabelle1" version="BIFF8" visible="true"/ >
<worksheet first-defined-cell="$A$1" first-free-cell="$A$1" name="Tabelle2" version="BIFF8" visible="true"/ >
</workbook >
</xls-dump >
Sure, I can send you the file, but I don’t think it’s a problem with the file anymore; I think it’s a configuration issue on my side.
Thanks for your help
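To see which text in such a dump is available for searching, the cell values can be pulled out with a few lines of Python. This is a minimal sketch, not part of recoll: `cell_texts` is a hypothetical helper, it relies only on the element and attribute names visible in the dump above (`worksheet`, `row`, `label-cell`, `name`, `id`, `col`, `value`), and it ignores any other cell types the dumper may emit. The sample is written as well-formed XML, without the extra spaces before `>` seen in the paste.

```python
import xml.etree.ElementTree as ET


def cell_texts(dump_xml):
    """Return (sheet name, row id, column, text) for every label-cell in a dump."""
    root = ET.fromstring(dump_xml)
    cells = []
    for sheet in root.iter("worksheet"):
        for row in sheet.iter("row"):
            for cell in row.iter("label-cell"):
                cells.append((sheet.get("name"), int(row.get("id")),
                              int(cell.get("col")), cell.get("value")))
    return cells


# Shortened excerpt in the shape of the dump above (assumed well-formed).
sample = """<xls-dump>
<workbook encrypted="false">
<worksheet name="Tabelle3" version="BIFF8" visible="true">
<row id="1">
<label-cell col="8" value="DATUM:"/>
<label-cell col="4" value="KOMMISSION:"/>
</row>
</worksheet>
</workbook>
</xls-dump>"""

for sheet, row, col, text in cell_texts(sample):
    print(sheet, row, col, text)
```

If a word shows up in this list but recollq still does not find it, the problem is more likely in the indexing configuration than in the extractor.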
medoc writes
Yes, you are right. I should have considered that you might be running an older version (catdoc mostly did not work at all). This is not a format issue, but possibly a configuration one.
xls2csv should not be used at all in recent versions, so it’s strange that linking it should change anything.
You can check the two following things:
- Check that /usr/share/recoll/filters/rclxls is using xls-dump.py (look for a line with XLSDUMP in it).
- Check ~/.recoll/mimeconf for a setting which would override the standard extractor set in /usr/share/recoll/examples/mimeconf.
When this is done, you should reset the index to make sure that the xls files are processed by the new extractor.
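Both checks amount to looking for a string in a text file, so they can be scripted. The sketch below is only an illustration: `find_settings` is a hypothetical helper, and the sample mimeconf content is invented, assuming the usual `mimetype = handler` line layout with `#` comments.

```python
def find_settings(text, needle):
    """Return (line number, line) for every non-comment line mentioning needle."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comments, which cannot override anything.
        if not stripped or stripped.startswith("#"):
            continue
        if needle in stripped:
            hits.append((number, stripped))
    return hits


# Hypothetical personal mimeconf with a leftover override (assumed layout).
sample_mimeconf = """# old personal settings
[index]
application/vnd.ms-excel = exec xls2csv
text/plain = internal
"""

print(find_settings(sample_mimeconf, "ms-excel"))
```

To run it on the real files, read `~/.recoll/mimeconf` and search for `ms-excel`, or read `/usr/share/recoll/filters/rclxls` and search for `XLSDUMP`. Any hit in the personal mimeconf is a candidate override to remove before resetting the index.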
ddio writes
Thanks, that was indeed the case (~/.recoll/mimeconf was overriding the handler for ms-excel).
While I have your attention, one more question that is unrelated to this issue (which is resolved for me):
Is there any way right now to index xlsx? If not I would skip those files entirely.
medoc writes
xlsx should be indexed "out of the box" by the rclopxml extractor in recoll 1.21.
This needs xsltproc as a helper, and it does a direct extraction; there is no need for the complicated code that handles the old Microsoft format.
I am not 100% sure that it extracts all text though, but it should be easy to add more templates if it is missing something.
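To illustrate why direct extraction is simple for this format: an xlsx file is a ZIP archive of XML parts, and most cell text lives in `xl/sharedStrings.xml`. The sketch below is not how rclopxml works (that uses xsltproc templates, as said above); `shared_strings` is a hypothetical helper that reads only that one part, so text stored inline in the sheet XML would be missed.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the SpreadsheetML main schema used by xlsx parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def shared_strings(xlsx_file):
    """Return the text runs stored in xl/sharedStrings.xml of an xlsx file."""
    with zipfile.ZipFile(xlsx_file) as archive:
        if "xl/sharedStrings.xml" not in archive.namelist():
            return []
        root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    return [t.text or "" for t in root.iter(NS + "t")]


# Build a tiny in-memory archive containing only the shared strings part.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "xl/sharedStrings.xml",
        '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
        "<si><t>Korpus</t></si><si><t>Sockel</t></si></sst>",
    )
buffer.seek(0)

print(shared_strings(buffer))
```

Comparing such a list with what a search actually finds is one way to spot text that the templates are missing.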
ddio writes
Wow indeed it does :) Great! Thanks!
ddio writes
resolved by configuration change