The connect() function connects to one or several Recoll indexes and returns a Db object.
This call initializes the recoll module, and it should always be performed before any other call or object creation.
- confdir may specify a configuration directory. The usual defaults apply.
- extra_dbs is a list of additional indexes (Xapian directories).
- writable decides if we can index new data through this connection.
Examples:
    from recoll import recoll

    # Opening the default db
    db = recoll.connect()

    # Opening the default db and a pair of additional indexes
    db = recoll.connect(extra_dbs=["/home/me/.someconfdir/xapiandb", "/data/otherconf/xapiandb"])

A Db object is created by a connect() call and holds a connection to a Recoll index.
- Db.query(), Db.cursor()
These synonymous methods return a blank Query object for this index.
- Db.termMatch(match_type, expr, field='', maxlen=-1, casesens=False, diacsens=False, lang='english')
Expand an expression against the index term list. Performs the basic function from the GUI term explorer tool. match_type can be one of wildcard, regexp or stem. Returns a list of terms expanded from the input expression.
- Db.setAbstractParams(maxchars, contextwords)
Set the parameters used to build snippets (sets of keywords in context text fragments). maxchars defines the maximum total size of the abstract. contextwords defines how many terms are shown around the keyword.
- Db.close()
Closes the connection. You can't do anything with the Db object after this.
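
As a rough illustration of the Db methods above, the following sketch expands a wildcard expression against the term list and makes sure the connection is closed afterwards. The helper name expand_terms and the sample values 300 and 6 are invented for this example, and nothing is assumed about the exact form of the returned terms. The recoll import is deferred into main() so that the helper can be tried against any object offering the same methods.

```python
from contextlib import closing

MATCH_TYPES = ("wildcard", "regexp", "stem")


def expand_terms(db, expr, match_type="wildcard"):
    """Return the index terms that `expr` expands to (hypothetical helper)."""
    if match_type not in MATCH_TYPES:
        raise ValueError("match_type must be one of %s" % (MATCH_TYPES,))
    return db.termMatch(match_type, expr)


def main():
    # Deferred import: connect() must come before any other recoll call.
    from recoll import recoll

    # closing() calls db.close() on exit, even if an exception is raised.
    with closing(recoll.connect()) as db:
        # Arbitrary sample values: abstracts of at most 300 characters,
        # with 6 context words around each keyword.
        db.setAbstractParams(300, 6)
        for term in expand_terms(db, "recol*"):
            print(term)
```

Calling main() on a machine with a Recoll index prints whatever terms the index holds for the pattern. Validating match_type in Python is only a convenience here; the module may report an invalid value differently.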
A Query object (equivalent to a cursor in the Python DB API) is created by a Db.query() call. It is used to execute index searches.
- Query.sortby(fieldname, ascending=True)
Sets the sorting order for future searches to use fieldname, in ascending or descending order. Must be called before executing the search.
- Query.execute(query_string, stemming=1, stemlang="english", fetchtext=False, collapseduplicates=False)
Starts a search for query_string, a Recoll search language string. If the index stores the document texts and fetchtext is True, stores the extracted document text in doc.text.
- Query.executesd(SearchData, fetchtext=False, collapseduplicates=False)
Starts a search for the query defined by the SearchData object. If the index stores the document texts and fetchtext is True, stores the extracted document text in doc.text.
- Query.fetchmany(size=query.arraysize)
Fetches the next Doc objects in the current search results, and returns them as an array of the required size, which is by default the value of the arraysize data member.
- Query.fetchone()
Fetches the next Doc object from the current search results. Generates a StopIteration exception if there are no results left.
- Query.__iter__() and Query.next()
So that things like for doc in query: will work. Example:

    from recoll import recoll

    db = recoll.connect()
    q = db.query()
    nres = q.execute("some query")
    for doc in q:
        print("%s" % doc.title)

- Query.close()
Closes the query. The object is unusable after the call.
- Query.scroll(value, mode='relative')
Adjusts the position in the current result set. mode can be relative or absolute.
- Query.getgroups()
Retrieves the expanded query terms as a list of pairs. Meaningful only after executexx (execute() or executesd()). In each pair, the first entry is a list of user terms (of size one for simple terms, or more for group and phrase clauses), and the second is a list of query terms as derived from the user terms and used in the Xapian Query.
- Query.getxquery()
Return the Xapian query description as a Unicode string. Meaningful only after executexx.
- Query.highlight(text, ishtml = 0, methods = object)
Will insert <span class="rclmatch">, </span> tags around the match areas in the input text and return the modified text. ishtml can be set to indicate that the input text is HTML and that HTML special characters should not be escaped. methods, if set, should be an object with methods startMatch(i) and endMatch(), which will be called for each match and should return a begin and end tag.
- Query.makedocabstract(doc, methods = object)
Create a snippets abstract for doc (a Doc object) by selecting text around the match terms. If methods is set, will also perform highlighting. See the highlight method.
- Query.getsnippets(doc, maxoccs = -1, ctxwords = -1, sortbypage=False, methods = object)
Will return a list of extracts from the result document by selecting text around the match terms. Each entry in the result list is a triple: page number, term, text. By default, the most relevant snippets appear first in the list. Set sortbypage to sort by page number instead. If methods is set, the fragments will be highlighted (see the highlight method). If maxoccs is set, it defines the maximum result list length. ctxwords allows adjusting the individual snippet context size.
- Query.arraysize
Default number of records processed by fetchmany (r/w).
- Query.rowcount
Number of records returned by the last execute.
- Query.rownumber
Next index to be fetched from results. Normally increments after each fetchone() call, but can be set/reset before the call to effect seeking (equivalent to using scroll()). Starts at 0.
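
The Query methods above can be combined roughly as follows: sort, execute, read the results a page at a time with fetchmany(), and build highlighted snippets for each document. This is only a sketch. The names fetch_in_pages, BoldMarker and format_snippets are invented for the example, "mtime" is an assumed sort field name, and the keyword arguments are taken from the signatures listed above. The recoll import is deferred into main() so the helpers can be exercised without an index.

```python
def fetch_in_pages(query, page_size=10):
    """Yield lists of Doc objects until rowcount results have been fetched."""
    remaining = query.rowcount
    while remaining > 0:
        docs = query.fetchmany(min(page_size, remaining))
        if not docs:  # defensive: stop if the result set ends early
            break
        remaining -= len(docs)
        yield docs


class BoldMarker:
    """A `methods` object: returns <b> tags as the begin and end tags."""

    def startMatch(self, idx):
        return "<b>"

    def endMatch(self):
        return "</b>"


def format_snippets(snippets):
    """Turn (page number, term, text) triples into printable lines."""
    return ["p.%s [%s] %s" % (page, term, text) for page, term, text in snippets]


def main():
    from recoll import recoll  # deferred so the helpers work standalone

    db = recoll.connect()
    q = db.query()
    # sortby() must precede execute(); "mtime" is an assumed field name.
    q.sortby("mtime", ascending=False)
    q.execute("some query")
    for page in fetch_in_pages(q, page_size=5):
        for doc in page:
            print(doc.url)
            snippets = q.getsnippets(doc, maxoccs=3, sortbypage=True,
                                     methods=BoldMarker())
            for line in format_snippets(snippets):
                print("  " + line)
    q.close()
    db.close()
```

Bounding the loop with rowcount avoids relying on what fetchmany() returns once the results are exhausted, which the description above does not specify.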
A Doc object contains index data for a given document. The data is extracted from the index when searching, or set by the indexer program when updating. The Doc object has many attributes to be read or set by its user. It mostly matches the Rcl::Doc C++ object. Some of the attributes are predefined, but, especially when indexing, others can be set, and their names will be processed as field names by the indexing configuration. Inputs can be specified as Unicode or strings. Outputs are Unicode objects. All dates are specified as Unix timestamps, printed as strings. Please refer to the rcldb/rcldoc.cpp C++ file for a full description of the predefined attributes. Here follows a short list.
- url
the document URL, but see also getbinurl().
- ipath
the document ipath for embedded documents.
- fbytes, dbytes
the document file and text sizes.
- fmtime, dmtime
the document file and document times.
- xdocid
the document Xapian document ID. This is useful if you want to access the document through a direct Xapian operation.
- mtype
the document MIME type.
Fields stored by default: author, filename, keywords, recipient
At query time, only the fields that are defined as stored, either by default or in the fields configuration file, will be meaningful in the Doc object. The processed document text may or may not be present, depending on whether the index stores the text at all and, if it does, on the fetchtext query execute option. See also the rclextract module for accessing document contents.
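
Because dates come back as Unix timestamps printed as strings, a small conversion step is usually needed before displaying them. The sketch below shows one way to do it and to collect a few of the predefined attributes listed above. The helper names timestamp_to_datetime and describe are invented for this example, and falling back to an empty string for an absent attribute is an assumption about how callers want to handle missing data, not documented module behavior.

```python
from datetime import datetime, timezone


def timestamp_to_datetime(value):
    """Convert a Doc time attribute (Unix timestamp string) to a UTC datetime.

    Returns None for an empty or missing value.
    """
    if not value:
        return None
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


def describe(doc):
    """Collect a few predefined attributes of a Doc-like object into a dict."""
    return {
        "url": getattr(doc, "url", ""),
        "mtype": getattr(doc, "mtype", ""),
        "ipath": getattr(doc, "ipath", ""),
        "modified": timestamp_to_datetime(getattr(doc, "fmtime", "")),
    }
```

With a real search result, describe(doc) can be called on each Doc yielded by iterating over the Query.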
- get(key), [] operator
Retrieve the named document attribute. You can also use getattr(doc, key) or doc.key.
- doc.key = value
Set the named document attribute. You can also use setattr(doc, key, value).
- getbinurl()
Retrieve the URL in byte array format (no transcoding), for use as parameter to a system call.
- setbinurl(url)
Set the URL in byte array format (no transcoding).
- items()
Return a dictionary of doc object keys/values.
- keys()
Return a list of doc object keys (attribute names).
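
One detail of the methods above is easy to miss: items() is described as returning a dictionary, unlike the items() method of a Python dict, which returns key/value pairs. The sketch below, with the invented helper names non_empty_fields and has_field, relies only on that description.

```python
def non_empty_fields(doc):
    """Return {key: value} for the attributes of `doc` that have a value."""
    # Doc.items() is described as returning a dictionary, so a second
    # .items() call is needed to iterate over its key/value pairs.
    return {key: value for key, value in doc.items().items() if value}


def has_field(doc, key):
    """True if `key` is among the attribute names reported by Doc.keys()."""
    return key in doc.keys()
```

A typical use is dumping every populated field of a result, for example by printing each pair in sorted(non_empty_fields(doc).items()).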
A SearchData object allows building a query by combining clauses, for execution by Query.executesd(). It can be used as a replacement for the query language approach. The interface is going to change a little, so no detailed doc for now...
- addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub', qstring=string, slack=0, field='', stemming=1, subSearch=SearchData)
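
Since the interface is announced as unstable, the following is only a tentative sketch of how clauses might be added and the resulting query executed. It takes an already constructed SearchData object, because the constructor is not documented here, and it assumes that addclause() accepts the keyword names shown in the signature above. The helper names add_clauses and run are invented, and the 'sub' clause type is left out because it needs a subSearch object.

```python
SIMPLE_CLAUSE_TYPES = ("and", "or", "excl", "phrase", "near")


def add_clauses(sd, clauses):
    """Add (type, qstring, field) clauses to a SearchData-like object."""
    for ctype, qstring, field in clauses:
        if ctype not in SIMPLE_CLAUSE_TYPES:
            raise ValueError("unsupported clause type: %r" % (ctype,))
        # Keyword names are taken from the addclause() signature above.
        sd.addclause(type=ctype, qstring=qstring, field=field)
    return sd


def run(query, sd):
    """Execute the query described by `sd` and return the matching docs."""
    query.executesd(sd)
    return list(query)
```

Given a Query q and a SearchData sd, a call such as run(q, add_clauses(sd, [("and", "recoll", ""), ("excl", "draft", "")])) would search for documents matching the first clause while excluding the second, assuming the clause types behave as their names suggest.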