en_nu writes

The Kiwix offline Wikipedias and dictionaries use a Xapian index. It’s easy to import the Xapian index of a Kiwix dictionary via Recoll’s "add external index" dialog. But then you can’t open the articles, with or without a kiwix-server. Perhaps there is an easy solution to this problem?

The site for the Kiwix software: http://www.kiwix.org
The specification for the ZIM file format: http://www.openzim.org

medoc writes

Nope, sorry, no easy solution for this. "Xapian index" is a very vague definition: there is a lot of application-specific structure inside. For example, Recoll indexes are incompatible with Xapian Omega ones, so I’m not surprised that the Kiwix ones are unusable. I can see no easy solution to this issue.

en_nu writes

Not sure if I am wrong, but the Kiwix indexes themselves (at least the entry keys) are readable (the unnamed entries in the screenshot appear only with the added Kiwix Xapian index); only the ZIM storage format for the offline Wikipedia articles is not accessible by Recoll.

medoc writes

The problem is that the format of the data records is completely different. Here is what a Recoll doc data record looks like:

Data for record #1:
url=file:///home/dockes/tmp
mtype=inode/directory
fmtime=01432013881
origcharset=
fbytes=4096
pcbytes=4096
dbytes=0
sig=40961432013881
filename=tmp

In the Kiwix index, things are simpler:

Data for record #24346:
A/Zygmunt_Krasiński.html

I’m quite certain that the term list prefixes etc. are quite different too.
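To make the difference between the two data records concrete, here is a minimal Python sketch. It assumes the record data has already been read out of the index as text; the function names and the sample Kiwix path are made up for illustration and are not part of Recoll or Kiwix.

```python
def parse_recoll_record(data):
    """Split a Recoll-style data record (key=value lines) into a dict."""
    fields = {}
    for line in data.splitlines():
        if "=" not in line:
            continue  # not a key=value line, e.g. a bare Kiwix path
        # Split on the first '=' only, so values may contain '=' themselves
        key, _, value = line.partition("=")
        fields[key.strip()] = value
    return fields


def looks_like_recoll_record(data):
    """Heuristic (an assumption): a Recoll record carries a url= field."""
    return "url" in parse_recoll_record(data)


# Shortened copy of the Recoll record shown above
RECOLL_SAMPLE = """url=file:///home/dockes/tmp
mtype=inode/directory
origcharset=
fbytes=4096
filename=tmp
"""

# Illustrative Kiwix-style record: just a path, no key=value fields
KIWIX_SAMPLE = "A/Example_Article.html"

if __name__ == "__main__":
    print(parse_recoll_record(RECOLL_SAMPLE))
    print(looks_like_recoll_record(KIWIX_SAMPLE))
```

A parser written for the first layout finds no fields at all in the second, which is why the imported entries show up unnamed.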

I’m not saying that it would be impossible to modify Recoll to do something with a Kiwix index, but this would be very far from trivial. The first reasonable step would be to define an internal API and pluggable index modules, because the code which queries the Kiwix index would have very little in common with the code which queries the native Recoll one.
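As a rough idea of what "an internal API and pluggable index modules" could mean, here is a sketch in Python. Nothing like this exists in Recoll: every class and function name below is hypothetical. It only covers decoding a data record into a common result type, not querying the term lists.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Hit:
    """Backend-neutral search result (hypothetical type)."""
    location: str   # URL, or a path inside an archive
    mime_type: str  # empty when the backend does not store it


class IndexBackend(ABC):
    """Hypothetical interface that each index module would implement."""

    @abstractmethod
    def decode_record(self, data):
        """Turn a raw data record into a Hit."""


class RecollBackend(IndexBackend):
    def decode_record(self, data):
        # Recoll records are key=value lines
        fields = dict(
            line.split("=", 1) for line in data.splitlines() if "=" in line
        )
        return Hit(fields.get("url", ""), fields.get("mtype", ""))


class KiwixBackend(IndexBackend):
    def decode_record(self, data):
        # Assumption from the sample record: the data is a single article path
        return Hit(location=data.strip(), mime_type="")


# Registry of available modules, selected per index
BACKENDS = {"recoll": RecollBackend(), "kiwix": KiwixBackend()}


def decode(kind, data):
    """Dispatch to the module registered for this kind of index."""
    return BACKENDS[kind].decode_record(data)
```

The calling code would only see `Hit` objects and would not need to know which kind of index they came from. Opening the article behind a Kiwix hit is a separate problem that this sketch does not touch.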

Then there is the question of actually accessing the data, which would be still more work.