Variables set inside the Recoll configuration files control which areas of the file system are indexed, and how files are processed. These variables can be set either by editing the text files or by using the dialogs in the recoll GUI.
The first time you start recoll, you will be asked whether or not you would like it to build the index. If you want to adjust the configuration before indexing, just click Cancel at this point, which will get you into the configuration interface. If you exit at this point, recoll will have created a ~/.recoll directory containing empty configuration files, which you can edit by hand.
The configuration is documented inside the installation chapter of this document, or in the recoll.conf(5) man page, but the most current information will most likely be the comments inside the sample file. The most immediately useful variable you may be interested in is probably topdirs, which determines what subtrees get indexed.
The applications needed to index file types other than text, HTML or email (e.g. PDF, PostScript, MS Word...) are described in the external packages section.
As of Recoll 1.18 there are two incompatible types of Recoll indexes, depending on the treatment of character case and diacritics. The next section describes the two types in more detail.
Multiple Recoll indexes can be created by using several configuration directories, which are usually set to index different areas of the file system. A specific index can be selected for updating or searching, using the RECOLL_CONFDIR environment variable or the -c option to recoll and recollindex.
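The two selection methods can be sketched in Python as follows. This is only an illustration: the helper name recoll_command and the directory paths are hypothetical, and the only Recoll facts relied on are the RECOLL_CONFDIR variable and the -c option mentioned above. The function builds the command line and environment without running anything.

```python
import os


def recoll_command(program, confdir=None, use_env=False, base_env=None):
    """Build (argv, env) for running recoll or recollindex on one index.

    confdir is the configuration directory to select. With use_env=True
    it is passed through RECOLL_CONFDIR, otherwise through the -c option.
    """
    if program not in ("recoll", "recollindex"):
        raise ValueError("program must be 'recoll' or 'recollindex'")
    # Copy the environment so the caller's mapping is never modified.
    env = dict(os.environ if base_env is None else base_env)
    argv = [program]
    if confdir is not None:
        if use_env:
            env["RECOLL_CONFDIR"] = confdir
        else:
            argv += ["-c", confdir]
    return argv, env


# Hypothetical directory for a shared index.
argv, env = recoll_command("recollindex", "/srv/recoll-shared")
print(argv)
```

The resulting pair could then be handed to subprocess.run(argv, env=env) to update or search that specific index.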
A typical usage scenario for the multiple index feature would be for a system administrator to set up a central index for shared data, which you can choose to search, or not, in addition to your personal data. Of course, there are other possibilities. There are many cases where you know the subset of files that should be searched, and where narrowing the search can improve the results. You can achieve approximately the same effect with the directory filter in advanced search, but multiple indexes will have much better performance and may be worth the trouble.
A recollindex program instance can only update one specific index.
The main index (defined by RECOLL_CONFDIR or -c) is always active. If this is undesirable, you can set up your base configuration to index an empty directory.
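As a minimal sketch of such a base configuration, the following Python helper creates a configuration directory whose recoll.conf points topdirs at an empty directory. The function name and the paths are hypothetical, and the sketch assumes that neither path contains white space (which would need quoting in the configuration file).

```python
from pathlib import Path


def make_empty_base_config(confdir, emptydir):
    """Create a configuration directory that indexes an empty directory.

    Refuses to overwrite an existing recoll.conf.
    """
    confdir = Path(confdir)
    emptydir = Path(emptydir)
    conf = confdir / "recoll.conf"
    if conf.exists():
        raise FileExistsError(f"{conf} already exists, not overwriting")
    emptydir.mkdir(parents=True, exist_ok=True)
    confdir.mkdir(parents=True, exist_ok=True)
    # topdirs determines which subtrees get indexed; here, an empty one.
    conf.write_text(
        "# Base configuration: the main index stays empty\n"
        f"topdirs = {emptydir}\n",
        encoding="utf-8",
    )
    return conf
```

Running recoll with this directory as its configuration would leave the main index empty, so that only the additional indexes contribute results.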
The different search interfaces (GUI, command line, ...) have different methods to define the set of indexes to be used; see the appropriate section.
If a set of multiple indexes is to be used together for searches, some configuration parameters must be consistent among the set. These are parameters which need to be the same when indexing and searching. As the parameters come from the main configuration when searching, they need to be compatible with what was set when creating the other indexes (which came from their respective configuration directories).
Most importantly, all indexes to be queried concurrently must have the same option concerning character case and diacritics stripping, but there are other constraints. Most of the relevant parameters are described in the linked section.
As of Recoll version 1.18 you have a choice of building an index with terms stripped of character case and diacritics, or one with raw terms. For a source term of Résumé, the former will store resume, the latter Résumé.
Each type of index allows performing searches insensitive to case and diacritics: with a raw index, the user entry will be expanded to match all case and diacritics variations present in the index. With a stripped index, the search term will be stripped before searching.
A raw index allows for another possibility which a stripped index cannot offer: using case and diacritics to discriminate between terms, returning different results when searching for US and us or resume and résumé. Read the section about search case and diacritics sensitivity for more details.
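The difference between the two index types can be illustrated with a small Python model. This is not Recoll's implementation: strip_term only approximates the stripping with Unicode decomposition, and the term sets are made-up sample data.

```python
import unicodedata


def strip_term(term):
    """Approximate stripping: remove diacritics, then fold case."""
    decomposed = unicodedata.normalize("NFD", term)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return no_marks.lower()


def expand_in_raw_index(query, raw_terms):
    """Insensitive search on a raw index: match every stored variant."""
    key = strip_term(query)
    return {t for t in raw_terms if strip_term(t) == key}


# Terms as a raw index would store them (sample data).
raw_terms = {"Résumé", "résumé", "resume", "US", "us"}

# A stripped index stores only the stripped forms.
stripped_terms = {strip_term(t) for t in raw_terms}

print(strip_term("Résumé"))
print(sorted(stripped_terms))
print(sorted(expand_in_raw_index("resume", raw_terms)))
# Sensitive search is only possible on the raw index: exact match.
print("US" in raw_terms, "US" in stripped_terms)
```

In this model the stripped index can no longer tell US from us, while the raw index keeps both and can serve either kind of search.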
The type of index to be created is controlled by the indexStripChars configuration variable, which can only be changed by editing the configuration file. Any change implies an index reset (not automated by Recoll), and all indexes in a search must be set in the same way (again, not checked by Recoll).
If indexStripChars is not set, Recoll 1.18 creates a stripped index by default, for compatibility with previous versions.
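Since Recoll does not check that all indexes in a search share the same setting, a small script can do it. The sketch below is a simplified reader written for illustration, not Recoll's own parser. It assumes that the variable is written as "indexStripChars = 0" or "indexStripChars = 1" in the top-level part of recoll.conf (before any bracketed section), that 1 means a stripped index, and that an unset variable means stripped, as stated above. The function names are hypothetical.

```python
from pathlib import Path


def is_stripped_index(confdir):
    """Return True if the configuration asks for a stripped index."""
    conf = Path(confdir) / "recoll.conf"
    value = "1"  # assumed default when the variable is not set
    if conf.exists():
        for line in conf.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            if line.startswith("["):
                break  # simplification: only read the top-level part
            name, sep, val = line.partition("=")
            if sep and name.strip() == "indexStripChars":
                value = val.strip()
    if value not in ("0", "1"):
        raise ValueError(f"unexpected indexStripChars value: {value!r}")
    return value == "1"


def same_index_type(confdirs):
    """Return (consistent, per-directory settings) for a set of indexes."""
    kinds = {str(d): is_stripped_index(d) for d in confdirs}
    return len(set(kinds.values())) <= 1, kinds
```

Calling same_index_type on the main configuration directory and on each additional one, before adding the indexes to a search, would show whether a raw and a stripped index are about to be mixed.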
As a cost for added capability, a raw index will be slightly bigger than a stripped one (around 10%). Also, searches will be more complex, so probably slightly slower, and the feature is still young, so that a certain amount of weirdness cannot be excluded.
Most parameters for a given index configuration can be set from a recoll GUI running on this configuration (either as default, or by setting RECOLL_CONFDIR or the -c option).
The interface is started from the Preferences->Index Configuration menu entry. It is divided into four tabs: Global parameters, Local parameters, Web history (which is explained in the next section) and Search parameters.
The Global parameters tab allows setting global variables, like the lists of top directories, skipped paths, or stemming languages.
The Local parameters tab allows setting variables that can be redefined for subdirectories. This second tab has an initially empty list of customisation directories, to which you can add. The variables are then set for the currently selected directory (or at the top level if the empty line is selected).
The Search parameters section defines parameters which are used at query time, but are global to an index and affect all search tools, not only the GUI.
The meaning for most entries in the interface is self-evident and documented by a ToolTip popup on the text label. For more detail, you will need to refer to the configuration section of this guide.
The configuration tool normally respects the comments and most of the formatting inside the configuration file, so it is quite possible to use it on hand-edited files, which you might nevertheless want to back up first.