Installing and Configuring

PyTerrier is a declarative platform for information retrieval experiments in Python. It uses the Java-based Terrier information retrieval platform internally to support indexing and retrieval operations.

Pre-requisites

PyTerrier requires Python 3.8 or newer, and Java 11 or newer. PyTerrier is natively supported on Linux, macOS and Windows.
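Before installing, you may want to confirm that both prerequisites are met. The following is a minimal, stdlib-only sketch (not part of PyTerrier) that checks the running Python version and parses the output of java -version. The helper names python_ok, parse_java_major and java_major_on_path are hypothetical.

```python
import re
import shutil
import subprocess
import sys


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    Handles both the legacy "1.8.0_281" scheme (major version 8) and the
    modern "11.0.2" / "17" schemes. Returns None if no version is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        # Legacy scheme: "1.8" means Java 8
        major = int(match.group(2))
    return major


def java_major_on_path():
    """Run `java -version` if java is on PATH; return its major version or None."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` prints its version banner to stderr, not stdout
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr)


if __name__ == "__main__":
    print("Python >= 3.8:", python_ok())
    print("Java major version:", java_major_on_path(), "(need 11 or newer)")
```

If java_major_on_path() returns None, Java was not found on the PATH; in that case the JAVA_HOME advice in the Configuration section applies.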

Installation

Installing PyTerrier is easy: it can be installed from the command line in the normal way using pip:

pip install python-terrier

If you want the latest version of PyTerrier, you can install it directly from the GitHub repository:

pip install --upgrade git+https://github.com/terrier-org/pyterrier.git#egg=python-terrier

NB: There is no need to have a local installation of the Java component, Terrier. PyTerrier will download the latest release on startup.

Installation Troubleshooting

We aim to ensure that there are pre-compiled binaries available for any dependencies with native components, for all supported Python versions and for all major platforms (Linux, macOS, Windows). One notable exception is Apple Silicon Macs (M1 and later), as there are no freely available GitHub Actions runners for these machines. Installs on such Macs may therefore need to compile some dependencies from source.

If the installation fails because building pyautocorpus did not run successfully, you may need to install the PCRE library on your machine.

macOS:

brew install pcre

Linux:

apt-get update -y
apt-get install libpcre3-dev -y

Configuration

You must always start by importing PyTerrier and running init():

import pyterrier as pt
pt.init()

PyTerrier uses PyJnius as a “glue” layer in order to call Terrier’s Java classes. PyJnius will search the usual places on your machine for a Java installation. If you have problems, set the JAVA_HOME environment variable:

import os
os.environ["JAVA_HOME"] = "/path/to/my/jdk"
import pyterrier as pt
pt.init()
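If you set JAVA_HOME programmatically, it can help to check the path first, so that a typo fails early with a clear message rather than later inside the JVM startup. This is a hedged sketch using only the standard library; set_java_home is a hypothetical helper, not a PyTerrier function, and it only checks that a bin/java (or bin/java.exe) executable exists under the given directory.

```python
import os
from pathlib import Path


def set_java_home(jdk_dir):
    """Point JAVA_HOME at jdk_dir after checking that it looks like a JDK.

    Hypothetical helper. Call it before importing pyterrier, as in the
    example above, since PyJnius consults JAVA_HOME when locating Java.
    """
    home = Path(jdk_dir)
    candidates = [home / "bin" / "java", home / "bin" / "java.exe"]
    if not any(c.is_file() for c in candidates):
        # Leave JAVA_HOME untouched if the directory does not look like a JDK
        raise FileNotFoundError(f"no bin/java found under {home}")
    os.environ["JAVA_HOME"] = str(home)
    return str(home)
```

Usage would be set_java_home("/path/to/my/jdk") followed by import pyterrier as pt and pt.init(), mirroring the example above.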

pt.init() accepts many options, for instance to make PyTerrier more notebook-friendly or to change the underlying version of Terrier, as described below.

On M1 Macs and later models, it is necessary to install the SSL certificates to avoid certificate errors. To do this, locate the Install Certificates.command file in the Applications/Python [version] directory, then double-click it to run the installation.

API Reference

All usages of PyTerrier start by importing PyTerrier and starting it using the init() method:

import pyterrier as pt
pt.init()

PyTerrier uses some of the functionality of the Java-based Terrier IR platform for indexing and retrieval. Calling pt.init() downloads the Terrier jar file (if necessary) and starts the Java Virtual Machine (JVM). It also configures Terrier so that it is easier to use from Python, for instance by redirecting the stdout and stderr streams and setting the logging level.

Below is further documentation on the methods related to starting Terrier using PyTerrier, and on ways to change its configuration.

Methods to change PyTerrier configuration

pyterrier.extend_classpath()

Add packages to Terrier’s classpath after the JVM has started.

pyterrier.logging()

Set the logging level. Equivalent to setting the logging= parameter to init(). The following string values are allowed, corresponding to Java logging levels:

  • ‘ERROR’: only show error messages

  • ‘WARN’: only show warnings and error messages (default)

  • ‘INFO’: show information, warnings and error messages

  • ‘DEBUG’: show debugging, information, warnings and error messages
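The levels above are cumulative: each one shows its own messages plus those of every less verbose level. The stdlib-only sketch below models that ordering and rejects unsupported strings; LEVELS and shown_categories are hypothetical names, not part of PyTerrier. A check like this could be used to validate a value before passing it on, e.g. pt.logging("INFO").

```python
# Java logging levels accepted by pt.logging(), ordered from least to most verbose
LEVELS = ["ERROR", "WARN", "INFO", "DEBUG"]


def shown_categories(level):
    """Return the message categories displayed at the given logging level.

    Each level includes the messages of all less verbose levels, mirroring
    the list above. Raises ValueError for unsupported values.
    """
    if level not in LEVELS:
        raise ValueError(f"logging level must be one of {LEVELS}, got {level!r}")
    return LEVELS[: LEVELS.index(level) + 1]
```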

pyterrier.redirect_stdouterr()

Ensure that stdout and stderr have been redirected. Equivalent to setting the redirect_io parameter of init() to True.

pyterrier.set_property()

Set a property in Terrier’s global properties configuration. Example:

pt.set_property("termpipelines", "")

Terrier has a variety of properties, as discussed in its indexing and retrieval configuration guides. In PyTerrier, however, we aim to expose Terrier configuration through appropriate methods or arguments, so this method should be seen as a safety valve: a way to override Terrier configuration that is not explicitly supported by PyTerrier.

pyterrier.set_properties()

Set many properties in Terrier’s global properties configuration at once.
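Terrier’s properties are Java-style string key/value pairs, so Python values such as booleans and integers need converting to strings. The sketch below is a hypothetical helper (to_terrier_properties is not a PyTerrier function) that builds such a dict; it assumes pt.set_properties() accepts a dict mapping property names to values. "termpipelines" comes from the set_property() example; "block.indexing" is used here as an illustrative Terrier property name.

```python
def to_terrier_properties(settings):
    """Convert a dict of Python values into string key/value pairs.

    Booleans become lower-case "true"/"false", the form used in Terrier
    property files; all other values are converted with str().
    """
    props = {}
    for key, value in settings.items():
        if isinstance(value, bool):
            props[str(key)] = "true" if value else "false"
        else:
            props[str(key)] = str(value)
    return props


# Illustrative settings: disable the term pipeline and turn on block indexing
props = to_terrier_properties({"termpipelines": "", "block.indexing": True})
# With PyTerrier started, the dict could then be applied in one call:
#   pt.set_properties(props)
```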

pyterrier.set_tqdm()

Set the tqdm progress bar type that PyTerrier will use internally. Many PyTerrier transformations can be expensive to apply in some settings; users can view their progress by passing the verbose=True kwarg to many classes, such as BatchRetrieve.

The tqdm progress bar can be made prettier when using appropriately configured Jupyter notebook setups. We use this automatically when Google Colab is detected.

Allowable options for type are:

  • ‘tqdm’: corresponds to the standard text progress bar, as in from tqdm import tqdm.

  • ‘notebook’: corresponds to a notebook progress bar, as in from tqdm.notebook import tqdm.

  • ‘auto’: allows tqdm to decide on the progress bar type, as in from tqdm.auto import tqdm. Note that this works fine on Google Colab, but not on Jupyter unless the ipywidgets package has been installed.
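The guidance above can be turned into a small selection function. This is a hedged sketch, not PyTerrier’s own detection logic: it assumes Colab can be recognised by the google.colab module having been imported, and it treats an imported IPython module as a rough proxy for running in a notebook. choose_tqdm_type is a hypothetical name.

```python
import importlib.util
import sys


def choose_tqdm_type():
    """Pick a value for pt.set_tqdm() using simple, assumed heuristics."""
    if "google.colab" in sys.modules:
        # 'auto' is described above as working fine on Google Colab
        return "auto"
    in_ipython = "IPython" in sys.modules
    has_widgets = importlib.util.find_spec("ipywidgets") is not None
    if in_ipython and has_widgets:
        # Notebook bars need ipywidgets on Jupyter
        return "notebook"
    # Safe default: the plain text progress bar
    return "tqdm"
```

After pt.init(), the result could be passed on as pt.set_tqdm(choose_tqdm_type()). Falling back to ‘tqdm’ keeps progress bars working in plain terminals and in Jupyter setups without ipywidgets.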