Variables set inside the Recoll configuration files control which areas of the file system are indexed, and how files are processed. These variables can be set either by editing the text files or by using the dialogs in the recoll GUI.
The first time you start recoll, you will be asked whether or not you would like it to build the index. If you want to adjust the configuration before indexing, just click Cancel at this point, which will get you into the configuration interface. If you exit at this point, recoll will have created a ~/.recoll directory containing empty configuration files, which you can edit by hand.
The configuration is documented inside the installation chapter of this document, or in the recoll.conf(5) man page, but the most current information will most likely be the comments inside the sample file. The most immediately useful variable you may be interested in is probably topdirs, which determines which subtrees get indexed.
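As an illustration of the file format, here is a minimal Python sketch that reads topdirs from the text of a configuration file. It assumes the simple `name = value` syntax with `#` comments, stops at the first `[section]` header (assignments after a header apply to a subdirectory only), and uses shlex to approximate Recoll's handling of double-quoted list elements. The fallback value of `~` (the home directory) is an assumption about the default, and the helper names are hypothetical.

```python
import shlex


def read_top_level(conf_text: str) -> dict[str, str]:
    """Collect top-level 'name = value' assignments, skipping comments and blank lines."""
    variables = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            break  # later assignments belong to a subdirectory section
        name, sep, value = line.partition("=")
        if sep:
            variables[name.strip()] = value.strip()
    return variables


def topdirs(conf_text: str) -> list[str]:
    # topdirs is a space-separated list; shlex keeps double-quoted paths
    # containing spaces together (an approximation of Recoll's own parser).
    value = read_top_level(conf_text).get("topdirs", "~")  # assumed default
    return shlex.split(value)


sample = """
# Index my documents and a shared folder
topdirs = ~/Documents "/srv/shared docs"
"""
print(topdirs(sample))  # ['~/Documents', '/srv/shared docs']
```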
The applications needed to index file types other than text, HTML or email (e.g. PDF, PostScript, MS Word...) are described in the external packages section.
As of Recoll 1.18 there are two incompatible types of Recoll indexes, depending on the treatment of character case and diacritics. The next section describes the two types in more detail.
Multiple Recoll indexes can be created by using several configuration directories which are usually set to index different areas of the file system. A specific index can be selected for updating or searching, using the RECOLL_CONFDIR environment variable or the -c option to recoll and recollindex.
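The two selection methods can be compared in a small Python sketch that only builds the command line and environment, so it can run without Recoll installed. The configuration directory ~/.recoll-shared is a hypothetical example, and the helper name is made up for illustration.

```python
import os


def recoll_command(program: str, confdir: str, use_env: bool = False):
    """Return (argv, env) that point a Recoll program at the index in confdir."""
    env = dict(os.environ)  # copy, so the caller's environment is untouched
    if use_env:
        # Select the configuration through the environment variable.
        env["RECOLL_CONFDIR"] = confdir
        return [program], env
    # Select it with -c; drop any inherited RECOLL_CONFDIR so that only
    # the command line names the configuration.
    env.pop("RECOLL_CONFDIR", None)
    return [program, "-c", confdir], env


shared = os.path.expanduser("~/.recoll-shared")  # hypothetical directory
argv, env = recoll_command("recollindex", shared)
print(argv)
# To run it: subprocess.run(argv, env=env, check=True)
```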
A typical usage scenario for the multiple index feature would be for a system administrator to set up a central index for shared data, that you choose to search or not in addition to your personal data. Of course, there are other possibilities. There are many cases where you know the subset of files that should be searched, and where narrowing the search can improve the results. You can achieve approximately the same effect with the directory filter in advanced search, but multiple indexes will have much better performance and may be worth the trouble.
A recollindex program instance can only update one specific index.
The main index (defined by RECOLL_CONFDIR or -c) is always active. If this is undesirable, you can set up your base configuration to index an empty directory.
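Because one recollindex instance only updates one index, keeping several indexes current means one invocation per configuration directory. A minimal Python sketch, assuming two hypothetical directories, ~/.recoll and ~/.recoll-shared; the actual execution is left commented out.

```python
import os


def update_commands(confdirs: list[str]) -> list[list[str]]:
    # One recollindex invocation per configuration directory (index).
    return [["recollindex", "-c", os.path.expanduser(d)] for d in confdirs]


for argv in update_commands(["~/.recoll", "~/.recoll-shared"]):
    print(" ".join(argv))
    # subprocess.run(argv, check=True)  # uncomment to actually run the indexer
```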
The different search interfaces (GUI, command line, ...) have different methods to define the set of indexes to be used, see the appropriate section.
If a set of multiple indexes is to be used together for searches, some configuration parameters must be consistent across the set. These are parameters which need to be the same when indexing and searching. As the parameters come from the main configuration when searching, they need to be compatible with what was set when creating the other indexes (which came from their respective configuration directories).
Most importantly, all indexes to be queried concurrently must have the same option concerning character case and diacritics stripping, but there are other constraints. Most of the relevant parameters are described in the linked section.
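Since Recoll does not check this consistency itself, a small script can compare the relevant values across configuration files before searching them together. The Python sketch below checks only indexStripChars (the one parameter named in this section), treats an unset value as the default stripped index ("1"), and compares raw strings, so spellings such as "1" and "true" would be reported as different. The CHECKED table and helper names are hypothetical; extend the table with the other parameters from the linked section.

```python
def top_level_vars(conf_text: str) -> dict[str, str]:
    out = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            break  # later assignments belong to a subdirectory section
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            out[name.strip()] = value.strip()
    return out


# Parameters to compare, with the value assumed when they are unset.
CHECKED = {"indexStripChars": "1"}


def mismatches(configs: dict[str, str]) -> dict[str, dict[str, str]]:
    """Map each checked parameter to its per-index values when they differ."""
    result = {}
    for name, default in CHECKED.items():
        values = {label: top_level_vars(text).get(name, default)
                  for label, text in configs.items()}
        if len(set(values.values())) > 1:
            result[name] = values
    return result


configs = {
    "personal": "topdirs = ~\nindexStripChars = 0\n",
    "shared": "topdirs = /srv/shared\n",  # unset: stripped index by default
}
print(mismatches(configs))  # {'indexStripChars': {'personal': '0', 'shared': '1'}}
```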
As of Recoll version 1.18 you have a choice of building an index with terms stripped of character case and diacritics, or one with raw terms. For a source term of Résumé, the former will store resume, the latter Résumé.
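The stripping itself can be approximated in Python with the standard unicodedata module: decompose accented characters, drop the combining marks, then fold case. This is an illustration of the idea, not Recoll's own code, which uses its own tables and may treat some characters (ligatures, for instance) differently.

```python
import unicodedata


def strip_term(term: str) -> str:
    # NFD splits "é" into "e" plus a combining acute accent; the combining
    # marks are dropped and the remaining text is case-folded.
    decomposed = unicodedata.normalize("NFD", term)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return no_marks.casefold()


print(strip_term("Résumé"))  # resume
```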
Each type of index allows performing searches insensitive to case and diacritics: with a raw index, the user entry will be expanded to match all case and diacritics variations present in the index. With a stripped index, the search term will be stripped before searching.
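The two strategies can be contrasted in a Python sketch using in-memory sets as stand-ins for an index. With the raw index, the user entry is expanded to every stored variant that strips to the same form; with the stripped index, the entry is stripped once and looked up directly. The linear scan and the sample terms are only for illustration; this is not how Recoll stores or expands terms internally.

```python
import unicodedata


def strip_term(term: str) -> str:
    """Approximate stripping: remove diacritics, then fold case."""
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def search_raw(user_term: str, raw_index: set[str]) -> set[str]:
    # Raw index: expand the entry to all stored variants with the same stripped form.
    key = strip_term(user_term)
    return {term for term in raw_index if strip_term(term) == key}


def search_stripped(user_term: str, stripped_index: set[str]) -> set[str]:
    # Stripped index: strip the entry once and look it up directly.
    key = strip_term(user_term)
    return {key} if key in stripped_index else set()


document_terms = ["Résumé", "resume", "RESUME", "result"]
raw_index = set(document_terms)
stripped_index = {strip_term(t) for t in document_terms}

print(sorted(search_raw("résumé", raw_index)))
print(search_stripped("Résumé", stripped_index))  # {'resume'}
```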
A raw index allows for another possibility which a stripped index cannot offer: using case and diacritics to discriminate between terms, returning different results when searching for US and us or resume and résumé.
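The extra possibility can be shown with a Python sketch over a hypothetical raw index held in a set: a sensitive lookup requires the exact stored form, while an insensitive one matches every variant with the same stripped form. How Recoll decides when a search is sensitive is covered in the section referenced below; here it is simply a flag.

```python
import unicodedata


def strip_term(term: str) -> str:
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


RAW_INDEX = {"US", "us", "resume", "résumé"}  # hypothetical indexed terms


def lookup(user_term: str, sensitive: bool) -> set[str]:
    if sensitive:
        # Only possible with a raw index: the exact form must be present.
        return {user_term} & RAW_INDEX
    key = strip_term(user_term)
    return {t for t in RAW_INDEX if strip_term(t) == key}


print(lookup("US", sensitive=True))            # {'US'}
print(sorted(lookup("US", sensitive=False)))   # ['US', 'us']
print(lookup("résumé", sensitive=True))        # {'résumé'}
```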
Read the section about search case and diacritics sensitivity for more details.
The type of index to be created is controlled by the indexStripChars configuration variable, which can only be changed by editing the configuration file. Any change implies an index reset (not automated by Recoll), and all indexes in a search must be set in the same way (again, not checked by Recoll).
If indexStripChars is not set, Recoll 1.18 creates a stripped index by default, for compatibility with previous versions.
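To see which kind of index a configuration will produce, a script can read indexStripChars from the top level of the file and fall back to the stripped default when it is absent. In this Python sketch, `indexStripChars = 0` requests a raw index; the set of spellings accepted as "true" is an assumption about how boolean values are written, and the helper name is hypothetical.

```python
def index_type(conf_text: str) -> str:
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            break  # indexStripChars is a global setting: stop at subdirectory sections
        if not line or line.startswith("#"):
            continue
        name, sep, val = line.partition("=")
        if sep and name.strip() == "indexStripChars":
            value = val.strip()
    if value is None:
        return "stripped"  # unset: stripped index by default
    # Assumed boolean spellings; anything else is treated as false.
    return "stripped" if value.lower() in ("1", "true", "yes", "on") else "raw"


print(index_type("topdirs = ~\nindexStripChars = 0\n"))  # raw
```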
As a cost for the added capability, a raw index will be slightly bigger than a stripped one (around 10%). Also, searches will be more complex, and so probably slightly slower, and the feature is still young, so a certain amount of weirdness cannot be excluded.
One of the most adverse consequences of using a raw index is that some phrase and proximity searches may become impossible: because each term needs to be expanded, and all combinations searched for, the multiplicative expansion may become unmanageable.
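The arithmetic behind the expansion is a product: the number of phrases to search is the number of variants of the first word times that of the second, and so on. The Python sketch below uses made-up variant lists for a three-word phrase; the counts, not the words, are the point.

```python
from itertools import product
from math import prod

# Hypothetical case/diacritics variants present in a raw index for each word.
variants = {
    "resume": ["resume", "Resume", "résumé", "Résumé"],
    "english": ["english", "English"],
    "us": ["us", "US"],
}
phrase = ["english", "resume", "us"]

# Every combination of variants is a distinct phrase that has to be searched.
combinations = list(product(*(variants[word] for word in phrase)))
print(len(combinations))                       # 16
print(prod(len(variants[w]) for w in phrase))  # 16, same count without enumerating

# Growth is multiplicative: six words with four variants each.
print(4 ** 6)  # 4096
```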
Most parameters for a given index configuration can be set from a recoll GUI running on this configuration (either as default, or by setting RECOLL_CONFDIR or the -c option).
The interface is started from the → menu entry. It is divided into four tabs: Global parameters, Local parameters, Web history (which is explained in the next section) and Search parameters.
The Global parameters tab allows setting global variables, like the lists of top directories, skipped paths, or stemming languages.
The Local parameters tab allows setting variables that can be redefined for subdirectories. This second tab has an initially empty list of customisation directories, to which you can add. The variables are then set for the currently selected directory (or at the top level if the empty line is selected).
The Search parameters tab defines parameters which are used at query time, but are global to an index and affect all search tools, not only the GUI.
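In the configuration file, the per-directory values set through the Local parameters tab appear under `[directory]` section headers. A Python sketch of how such a lookup could resolve a value for a file: walk up from the file's directory and use the closest section that defines the variable, else the top-level value. The closest-ancestor rule is an assumption about Recoll's behaviour, the paths are hypothetical, and skippedNames is assumed to be one of the variables that can be redefined for subdirectories.

```python
import posixpath


def parse_sections(conf_text: str) -> dict[str, dict[str, str]]:
    sections: dict[str, dict[str, str]] = {"": {}}  # "" holds top-level values
    current = ""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = posixpath.normpath(line[1:-1].strip())
            sections.setdefault(current, {})
            continue
        name, sep, value = line.partition("=")
        if sep:
            sections[current][name.strip()] = value.strip()
    return sections


def lookup(sections, name: str, path: str):
    # Walk from the absolute path up to the root; the closest section that
    # defines the variable wins, otherwise use the top-level value.
    path = posixpath.normpath(path)
    while True:
        if name in sections.get(path, {}):
            return sections[path][name]
        parent = posixpath.dirname(path)
        if parent == path:
            break
        path = parent
    return sections[""].get(name)


sample = """\
skippedNames = *.tmp
[/home/me/projects]
skippedNames = *.tmp *.o
"""
conf = parse_sections(sample)
print(lookup(conf, "skippedNames", "/home/me/projects/src/main.c"))  # *.tmp *.o
print(lookup(conf, "skippedNames", "/home/me/notes.txt"))            # *.tmp
```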
The meaning of most entries in the interface is self-evident and documented by a tooltip popup on the text label. For more detail, you will need to refer to the configuration section of this guide.
The configuration tool normally respects the comments and most of the formatting inside the configuration file, so it is quite possible to use it on hand-edited files, which you might nevertheless want to back up first.
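If you change hand-edited files with a script instead of the GUI, the same precautions apply. A minimal Python sketch with a hypothetical helper: it copies the file to a `.bak` sibling, then replaces the first top-level assignment of the variable, or adds one before the first section header (or at the end), leaving comments and all other lines untouched. It does not handle values that continue over several lines.

```python
import shutil
from pathlib import Path


def set_variable(conf_path: str, name: str, value: str) -> None:
    path = Path(conf_path)
    shutil.copy2(path, path.with_name(path.name + ".bak"))  # back up first
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    if lines and not lines[-1].endswith("\n"):
        lines[-1] += "\n"
    new_line = f"{name} = {value}\n"
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("["):
            lines.insert(i, new_line)  # keep the setting at top level
            break
        key, sep, _ = stripped.partition("=")
        if sep and not stripped.startswith("#") and key.strip() == name:
            lines[i] = new_line  # replace the assignment, keep everything else
            break
    else:
        lines.append(new_line)
    path.write_text("".join(lines), encoding="utf-8")
```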