Installing and Configuring
PyTerrier is a declarative platform for information retrieval experiments in Python. It uses the Java-based Terrier information retrieval platform internally to support indexing and retrieval operations.
PyTerrier requires Python 3.6 or newer, and Java 11 or newer.
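The two version requirements above can be checked before installing. The following is a minimal sketch using only the standard library; the helper names (`parse_java_major`, `installed_java_major`) are hypothetical and not part of PyTerrier. It inspects the `java` executable found on PATH, which is an assumption: PyTerrier may locate a different Java installation.

```python
import re
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 6)   # from the requirement stated above
MIN_JAVA = 11         # from the requirement stated above


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Releases before Java 9 report themselves as 1.x (1.8.0_292 is Java 8).
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major():
    """Return the major version of the `java` found on PATH, or None."""
    java = shutil.which("java")
    if java is None:
        return None
    try:
        # The version banner is usually written to stderr, so merge both streams.
        result = subprocess.run(
            [java, "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return parse_java_major(result.stdout)


python_ok = sys.version_info[:2] >= MIN_PYTHON
java_major = installed_java_major()
java_ok = java_major is not None and java_major >= MIN_JAVA
print(f"Python OK: {python_ok}; Java major version: {java_major}; Java OK: {java_ok}")
```

A result of `Java OK: False` only means that no suitable `java` was found on PATH; a JDK referenced through JAVA_HOME may still work.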
PyTerrier is natively supported on Linux and Mac OS X. It uses pytrec_eval for evaluation, which does not install automatically on Windows.
PyTerrier can be installed from the command line in the usual way, using pip:
pip install python-terrier
If you want the latest version of PyTerrier, you can install it directly from the GitHub repository:
pip install --upgrade git+https://github.com/terrier-org/pyterrier.git#egg=python-terrier
NB: There is no need to have a local installation of the Java component, Terrier. PyTerrier will download the latest release on startup.
You must always start by importing PyTerrier and running init():
import pyterrier as pt
pt.init()
PyTerrier uses PyJnius as a “glue” layer in order to call Terrier’s Java classes. PyJnius will search the usual places on your machine for a Java installation. If you have problems, set the JAVA_HOME environment variable:
import os
os.environ["JAVA_HOME"] = "/path/to/my/jdk"
import pyterrier as pt
pt.init()
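If the JDK location is not known in advance, it can sometimes be derived from the `java` executable on PATH. The sketch below is a heuristic under stated assumptions, not PyTerrier or PyJnius behaviour: `guess_java_home` is a hypothetical helper, and the check for a `release` file relies on most JDK distributions shipping one in their root directory.

```python
import os
import shutil


def guess_java_home():
    """Return a plausible JDK root directory, or None if none is found."""
    # Respect an explicit setting if it points at an existing directory.
    configured = os.environ.get("JAVA_HOME")
    if configured and os.path.isdir(configured):
        return configured
    java = shutil.which("java")
    if java is None:
        return None
    # Resolve symlinks such as /usr/bin/java, then step up from <jdk>/bin/java.
    candidate = os.path.dirname(os.path.dirname(os.path.realpath(java)))
    # Assumption: a JDK root contains a `release` file; reject anything else
    # (for example /usr when `java` is only a launcher stub).
    if not os.path.isfile(os.path.join(candidate, "release")):
        return None
    return candidate


java_home = guess_java_home()
if java_home is not None:
    # Set the variable before importing PyTerrier and calling pt.init(),
    # following the order used in the example above.
    os.environ["JAVA_HOME"] = java_home
print("JAVA_HOME:", os.environ.get("JAVA_HOME", "<not set>"))
```

When the helper returns None, the variable is left untouched and the path has to be supplied by hand.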
pt.init() has many options, for instance to make PyTerrier more notebook-friendly or to change the underlying version of Terrier, as described below.
All usages of PyTerrier start by importing PyTerrier and starting it using the init() method:
import pyterrier as pt
pt.init()
PyTerrier uses parts of the Java-based Terrier IR platform for indexing and retrieval. Calling pt.init() downloads the Terrier jar file if necessary and starts the Java Virtual Machine (JVM). It also configures Terrier so that it is easier to use from Python, for example by redirecting the stdout and stderr streams and setting the logging level.
The documentation below covers the methods related to starting Terrier from PyTerrier and the ways to change its configuration.
Methods to change PyTerrier configuration
Adds packages to Terrier’s classpath after the JVM has started.
Set the logging level. Equivalent to setting the logging= parameter to init(). The following string values are allowed, corresponding to Java logging levels:
‘ERROR’: only show error messages
‘WARN’: only show warnings and error messages (default)
‘INFO’: show information, warnings and error messages
‘DEBUG’: show debugging, information, warnings and error messages
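The four levels form a simple hierarchy in which each level shows everything the previous one does plus one more kind of message. The sketch below illustrates this and validates a level name before it is used; `normalise_level` and `messages_shown` are hypothetical helpers, not PyTerrier functions, and accepting lower-case input is a convenience of this sketch only.

```python
# Ordered from least to most verbose, as in the list above.
ALLOWED_LEVELS = ("ERROR", "WARN", "INFO", "DEBUG")


def normalise_level(level):
    """Return the upper-cased level name, or raise ValueError if not allowed."""
    candidate = str(level).strip().upper()
    if candidate not in ALLOWED_LEVELS:
        raise ValueError(
            f"Unknown logging level {level!r}; expected one of {ALLOWED_LEVELS}"
        )
    return candidate


def messages_shown(level):
    """List the kinds of message visible at a level, most severe first."""
    kinds = ["error", "warning", "information", "debugging"]
    return kinds[: ALLOWED_LEVELS.index(normalise_level(level)) + 1]


level = normalise_level("info")
print(level, "shows:", messages_shown(level))
# The validated value could then be passed on, e.g. via the logging=
# parameter of init() mentioned above.
```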
Ensure that stdout and stderr have been redirected. Equivalent to setting the redirect_io parameter to init() as True.
Sets a property in Terrier’s global properties configuration. Example:
While Terrier has a variety of properties (as discussed in its indexing and retrieval configuration guides), PyTerrier aims to expose Terrier configuration through appropriate methods or arguments. This method should therefore be seen as a safety valve: a way to override Terrier configuration that PyTerrier does not explicitly expose.
Sets several properties in Terrier’s global properties configuration at once.
Sets the type of tqdm progress bar that PyTerrier uses internally. Many PyTerrier transformations can be expensive to apply in some settings; users can view progress by passing the verbose=True kwarg to many classes, such as BatchRetrieve.
The tqdm progress bar can be made prettier when using appropriately configured Jupyter notebook setups. We use this automatically when Google Colab is detected.
Allowable options for type are:
‘tqdm’: corresponds to the standard text progress bar, as in from tqdm import tqdm.
‘notebook’: corresponds to a notebook progress bar, as in from tqdm.notebook import tqdm.
‘auto’: allows tqdm to decide on the progress bar type, as in from tqdm.auto import tqdm. Note that this works fine on Google Colab, but not on Jupyter unless ipywidgets has been installed.
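The caveat about ‘auto’ suggests choosing the type based on the environment. The following is a rough sketch of one way to do so with the standard library only; `choose_progress_bar_type` is a hypothetical helper, and testing for `google.colab` among the loaded modules is a common convention rather than something PyTerrier documents here.

```python
import importlib.util
import sys


def choose_progress_bar_type(loaded_modules=None, has_ipywidgets=None):
    """Pick 'auto' when it is expected to render well, otherwise 'tqdm'."""
    if loaded_modules is None:
        loaded_modules = sys.modules
    if has_ipywidgets is None:
        has_ipywidgets = importlib.util.find_spec("ipywidgets") is not None
    # Assumption: Colab kernels have the google.colab package loaded.
    in_colab = "google.colab" in loaded_modules
    # 'auto' is described above as working on Colab, and on Jupyter only
    # once ipywidgets is installed; fall back to the text bar otherwise.
    if in_colab or has_ipywidgets:
        return "auto"
    return "tqdm"


bar_type = choose_progress_bar_type()
print("Suggested progress bar type:", bar_type)
# The value would then be handed to the progress bar setting described
# above (assumed here to be exposed as pt.set_tqdm).
```

The two optional parameters exist only so that the decision can be exercised without a notebook; calling the function with no arguments inspects the running interpreter.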