Learning to Rank


PyTerrier makes it easy to formulate learning to rank pipelines. Conceptually, learning to rank consists of three phases:

  1. identifying a candidate set of documents for each query

  2. computing extra features on these documents

  3. using a learned model to re-rank the candidate documents to obtain a more effective ranking

PyTerrier allows each of these phases to be expressed as transformers, and for them to be composed into a full pipeline.
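The three phases can be illustrated without any PyTerrier machinery. The following is a minimal sketch in plain Python, not PyTerrier's API: the toy corpus, the term-overlap scorer, the two features and the hand-set weights standing in for a learned model are all assumptions made for illustration.

```python
# Toy corpus: docno -> text (hypothetical data for illustration only)
CORPUS = {
    "d1": "terrier is a search engine",
    "d2": "learning to rank with terrier",
    "d3": "a recipe for apple pie",
}

def retrieve(query, k=2):
    """Phase 1: pick the k documents sharing the most terms with the query."""
    terms = set(query.split())
    scored = [(docno, len(terms & set(text.split()))) for docno, text in CORPUS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [{"docno": docno, "score": float(score)} for docno, score in scored[:k]]

def add_features(query, candidates):
    """Phase 2: attach a 'features' list to every candidate document."""
    terms = set(query.split())
    for doc in candidates:
        words = CORPUS[doc["docno"]].split()
        overlap = len(terms & set(words))
        doc["features"] = [overlap / len(terms), 1.0 / len(words)]
    return candidates

def rerank(candidates, weights):
    """Phase 3: score each candidate with a (here: hand-set) linear model and re-sort."""
    for doc in candidates:
        doc["score"] = sum(w * f for w, f in zip(weights, doc["features"]))
    return sorted(candidates, key=lambda doc: doc["score"], reverse=True)

query = "learning to rank"
ranking = rerank(add_features(query, retrieve(query)), weights=[1.0, 0.5])
```

In a real pipeline each of these functions would be a transformer, and the weights would come from a trained model rather than being fixed by hand.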

In particular, conventional retrieval transformers (such as pt.BatchRetrieve) can be used for the first phase. To permit the second phase, PyTerrier’s data model allows a “features” column to be associated with each retrieved document. Such features can be generated using specialised transformers, or by combining other re-ranking transformers using the ** feature-union operator. Lastly, to facilitate the final phase, we provide easy ways to integrate PyTerrier pipelines with standard learning libraries such as sklearn, XGBoost and LightGBM.

In the following, we focus on the second and third phases, as well as describe ways to assist in conducting learning to rank experiments.

Calculating Features

Feature Union (**)

PyTerrier’s main way to facilitate calculating extra features is through the ** operator. Consider an example where the candidate set should be identified using the BM25 weighting model, and then additional features computed using the Tf and PL2 models:

bm25 = pt.BatchRetrieve(index, wmodel="BM25")
tf = pt.BatchRetrieve(index, wmodel="Tf")
pl2 = pt.BatchRetrieve(index, wmodel="PL2")
pipeline = bm25 >> (tf ** pl2)

The output of the bm25 ranker would look like:

qid           docno            score
(query id)    (document id)    (bm25 score)

Application of the feature-union operator (**) ensures that tf and pl2 operate as re-rankers, i.e. they are applied only on the documents retrieved by bm25. For each document, the scores calculated by tf and pl2 are combined into the “features” column, as follows:

qid           docno            score           features
(query id)    (document id)    (bm25 score)    [tf score, pl2 score]
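The effect of the feature-union operator on the result rows can be mimicked in a few lines of plain Python. This is only a sketch of the data flow, not how PyTerrier implements **: the candidate rows and the two scoring functions (toy_tf and toy_len, hypothetical stand-ins for the tf and pl2 re-rankers) are assumptions.

```python
def toy_tf(row):
    """Hypothetical re-ranker: count query-term occurrences in the document text."""
    words = row["text"].split()
    return float(sum(words.count(term) for term in row["query"].split()))

def toy_len(row):
    """Hypothetical re-ranker: reward shorter documents."""
    return 1.0 / len(row["text"].split())

def feature_union(candidates, rerankers):
    """Score every candidate with every re-ranker and collect the scores, in
    order, in a 'features' list. The candidate's own 'score' is left untouched."""
    out = []
    for row in candidates:
        new_row = dict(row)
        new_row["features"] = [reranker(row) for reranker in rerankers]
        out.append(new_row)
    return out

# Candidate set, as a first-stage ranker might return it (toy values)
candidates = [
    {"qid": "q1", "query": "rank", "docno": "d5", "text": "rank rank documents", "score": 2.5},
    {"qid": "q1", "query": "rank", "docno": "d7", "text": "rank", "score": 1.5},
]
with_features = feature_union(candidates, [toy_tf, toy_len])
```

Note that no new documents are introduced: the re-rankers only see the rows already in the candidate set, and the first-stage score survives alongside the new features.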


When executing the pipeline above, re-ranking the documents can be slow, as each separate BatchRetrieve object has to re-access the inverted index. For this reason, PyTerrier provides a class called FeaturesBatchRetrieve, which allows multiple query-dependent features to be calculated at once, by virtue of Terrier’s Fat framework.

class pyterrier.FeaturesBatchRetrieve(index_location, features, controls=None, properties=None, threads=1, **kwargs)

Use this class for retrieval with multiple features

Init method

  • index_location – An index-like object - An Index, an IndexRef, or a String that can be resolved to an IndexRef

  • features (list) – List of features to use

  • controls (dict) – A dictionary with the control names and values

  • properties (dict) – A dictionary with the property keys and values

  • verbose (bool) – If True transform method will display progress

  • num_results (int) – Number of results to retrieve.


transform(queries)

Performs the retrieval with multiple features.

Parameters: queries – A string for a single query, a list of queries, or a pandas.DataFrame with columns=[‘qid’, ‘query’]. For re-ranking, the DataFrame may also have a ‘docid’ and/or ‘docno’ column.

Returns: a pandas.DataFrame with columns=[‘qid’, ‘docno’, ‘score’, ‘features’]

An equivalent pipeline to the example above would be:

#pipeline = bm25 >> (tf ** pl2)
pipeline = pt.FeaturesBatchRetrieve(index, wmodel="BM25", features=["WMODEL:Tf", "WMODEL:PL2"])

Apply Functions

If you have a way to calculate one or multiple ranking features at once, you can use pt.apply functions to create your feature sets. See the pyterrier.apply - Custom Transformers documentation for examples. Transformers created by pt.apply can be combined using the ** operator.



pt.ltr.apply_learned_model() results in a transformer that takes documents with a “features” column and passes those features to the specified learner via its predict() function, to obtain the documents’ “score” column. Learners should follow sklearn’s general pattern, with a fit() method (cf. an sklearn Estimator) and a predict() method.

XGBoost and LightGBM are also supported through the use of the form=’ltr’ kwarg.

  • learner – an sklearn-compatible estimator

  • form (str) – either ‘regression’ or ‘ltr’

The resulting transformer implements EstimatorBase; in other words, it has a fit() method, which trains the learner using training topics and qrels, as well as (optionally) validation topics and qrels. See also EstimatorBase.
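To make the contract between the pipeline and the learner concrete, here is a sketch in plain Python. MeanFeatureLearner is a hypothetical toy learner invented for this example, and apply_model is an illustrative helper rather than PyTerrier code; the only point is that anything with fit() and predict() can turn a “features” column into a “score” column.

```python
class MeanFeatureLearner:
    """Hypothetical learner following the fit()/predict() pattern.
    It learns one weight per feature: the mean label-weighted feature value."""

    def fit(self, X, y):
        num_features = len(X[0])
        self.weights_ = [
            sum(x[i] * label for x, label in zip(X, y)) / len(X)
            for i in range(num_features)
        ]
        return self

    def predict(self, X):
        return [sum(w * v for w, v in zip(self.weights_, x)) for x in X]

def apply_model(rows, learner):
    """Replace each row's 'score' with the learner's prediction on its features,
    then sort by descending score."""
    scores = learner.predict([row["features"] for row in rows])
    rescored = [dict(row, score=score) for row, score in zip(rows, scores)]
    return sorted(rescored, key=lambda row: row["score"], reverse=True)

# Training data: feature vectors with relevance labels (toy values)
learner = MeanFeatureLearner().fit([[1.0, 0.0], [0.0, 1.0]], [1, 0])

rows = [
    {"docno": "d1", "features": [0.2, 0.9], "score": 3.0},
    {"docno": "d2", "features": [0.8, 0.1], "score": 1.0},
]
reranked = apply_model(rows, learner)
```

The original first-stage scores (3.0 and 1.0) play no role in the final ordering here; only the features do.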


A sklearn regressor can be passed directly to pt.ltr.apply_learned_model():

from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators=400)
rf_pipe = pipeline >> pt.ltr.apply_learned_model(rf)
rf_pipe.fit(train_topics, qrels)
pt.Experiment([bm25, rf_pipe], test_topics, qrels, ["map"], names=["BM25 Baseline", "LTR"])

Note that if the feature definitions in the pipeline change, you will need to create a new instance of rf.

For analysis purposes, the feature importances identified by RandomForestRegressor can be accessed through rf.feature_importances_; see the relevant sklearn documentation for more information.

Gradient Boosted Trees & LambdaMART

Both XGBoost and LightGBM provide gradient boosted regression tree and LambdaMART implementations. These offer an sklearn-like interface, which PyTerrier supports when the form="ltr" kwarg is supplied to pt.ltr.apply_learned_model():

import xgboost as xgb
# this configures XGBoost as LambdaMART
lmart_x = xgb.sklearn.XGBRanker(objective='rank:ndcg')

lmart_x_pipe = pipeline >> pt.ltr.apply_learned_model(lmart_x, form="ltr")
lmart_x_pipe.fit(train_topics, train_qrels, validation_topics, validation_qrels)

import lightgbm as lgb
# this configures LightGBM as LambdaMART
lmart_l = lgb.LGBMRanker(task="train",
    ndcg_eval_at=[1, 3, 5, 10],
    learning_rate=0.1)
lmart_l_pipe = pipeline >> pt.ltr.apply_learned_model(lmart_l, form="ltr")
lmart_l_pipe.fit(train_topics, train_qrels, validation_topics, validation_qrels)

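# test_topics and test_qrels are assumed to hold the held-out test set
pt.Experiment(
    [bm25, lmart_x_pipe, lmart_l_pipe],
    test_topics,
    test_qrels,
    ["map"],
    names=["BM25 Baseline", "LambdaMART (xgBoost)", "LambdaMART (LightGBM)"]
)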

Note that if the feature definitions in the pipeline change, you will need to create a new instance of XGBRanker (or LGBMRanker, as appropriate).
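One practical difference between a plain regressor and a ranker such as XGBRanker or LGBMRanker is that the ranker's fit() needs to know which rows belong to the same query, which both libraries accept as a list of group sizes. The sketch below shows how such group sizes can be derived from a result list sorted by qid. Here group_sizes is a hypothetical helper written for this illustration; how PyTerrier prepares this information internally is not described in this document.

```python
from itertools import groupby

def group_sizes(rows):
    """Return the number of consecutive rows per query id.
    Assumes rows are already sorted so that each qid forms one contiguous run."""
    return [len(list(group)) for _, group in groupby(rows, key=lambda row: row["qid"])]

rows = [
    {"qid": "q1", "docno": "d1"},
    {"qid": "q1", "docno": "d4"},
    {"qid": "q1", "docno": "d9"},
    {"qid": "q2", "docno": "d2"},
    {"qid": "q2", "docno": "d3"},
]
sizes = group_sizes(rows)  # one entry per query, in order of appearance
```

The group sizes always sum to the number of rows, which makes for a cheap sanity check before training.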

In our experience, LightGBM tends to be more effective than xgBoost.

Similar to sklearn, both XGBoost and LightGBM provide feature importances via lmart_x.feature_importances_ and lmart_l.feature_importances_.

FastRank: Coordinate Ascent

We now support FastRank for learning models:

!pip install fastrank
import fastrank
train_request = fastrank.TrainRequest.coordinate_ascent()
params = train_request.params
params.init_random = True
params.normalize = True
params.seed = 1234567

ca_pipe = pipeline >> pt.ltr.apply_learned_model(train_request, form="fastrank")
ca_pipe.fit(train_topics, train_qrels)

FastRank provides two learners: a random forest implementation (fastrank.TrainRequest.random_forest()) and coordinate ascent (fastrank.TrainRequest.coordinate_ascent()), a linear model.
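For readers unfamiliar with coordinate ascent, the idea is to improve a linear model's weights one at a time, keeping a change only if it improves a ranking measure on the training data. The following is a simplified sketch of that idea in plain Python, not FastRank's implementation: the pairwise-accuracy objective, the fixed step sizes and the toy training data are all assumptions chosen to keep the example short.

```python
def pairwise_accuracy(weights, queries):
    """Fraction of (relevant, non-relevant) document pairs within each query
    that the linear model orders correctly."""
    correct = total = 0
    for docs in queries:  # docs: list of (feature_vector, label)
        scored = [(sum(w * f for w, f in zip(weights, feats)), label) for feats, label in docs]
        for s_pos, l_pos in scored:
            for s_neg, l_neg in scored:
                if l_pos > l_neg:
                    total += 1
                    correct += s_pos > s_neg
    return correct / total if total else 0.0

def coordinate_ascent(queries, num_features, steps=(-1.0, -0.5, 0.5, 1.0), sweeps=5):
    """Greedily adjust one weight at a time, accepting only strict improvements."""
    weights = [1.0] * num_features
    best = pairwise_accuracy(weights, queries)
    for _ in range(sweeps):
        for i in range(num_features):
            for step in steps:
                trial = list(weights)
                trial[i] += step
                score = pairwise_accuracy(trial, queries)
                if score > best:
                    weights, best = trial, score
    return weights, best

# Toy training data: feature 0 is misleading, feature 1 tracks relevance
queries = [
    [([3.0, 1.0], 0), ([1.0, 2.0], 1)],
    [([2.0, 0.5], 0), ([0.5, 1.5], 1)],
]
initial = pairwise_accuracy([1.0, 1.0], queries)
weights, best = coordinate_ascent(queries, num_features=2)
```

Because the objective is evaluated directly on rankings rather than through gradients, this style of optimisation works with non-differentiable measures, at the cost of many evaluations of the training set.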

Working with Features

We provide additional transformation functions to aid the analysis of learned models, for instance, removing (ablating) features from a complex ranking pipeline.


pt.ltr.ablate_features(fids)

Ablates features (sets the feature value to 0) from a pipeline. This is useful for performing feature ablation studies, whereby a feature is removed from the pipeline before learning.


fids – one or a list of integers corresponding to the feature indices to be removed


from sklearn.ensemble import RandomForestRegressor

# assume pipeline is a retrieval pipeline that produces four ranking features
numf = 4
rankers = []
names = []
# learn a model for all four features
full = pipeline >> pt.ltr.apply_learned_model(RandomForestRegressor(n_estimators=400))
full.fit(trainTopics, trainQrels, validTopics, validQrels)
rankers.append(full)
names.append("Full Model")

# learn a model for 3 features, removing one each time
for fid in range(numf):
    ablated = pipeline >> pt.ltr.ablate_features(fid) >> pt.ltr.apply_learned_model(RandomForestRegressor(n_estimators=400))
    ablated.fit(trainTopics, trainQrels, validTopics, validQrels)
    rankers.append(ablated)
    names.append("Full Minus %d" % fid)

# evaluate the full (4 features) model, as well as each model containing only 3 features
# (testTopics and testQrels are assumed to hold the held-out test topics and qrels)
pt.Experiment(rankers, testTopics, testQrels, ["map"], names=names)

pt.ltr.keep_features(fids)

Reduces the features in a pipeline to only those mentioned. This is useful for performing feature ablation studies, whereby only some features are kept (and others removed) from a pipeline before learning occurs.


fids – one or a list of integers corresponding to the feature indices to be kept
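The difference between ablating and keeping features is easiest to see on a single feature vector. The two functions below are plain-Python illustrations that mirror the behaviour described above; they are hypothetical helpers operating on lists, not the PyTerrier transformers themselves.

```python
def ablate_features(features, fids):
    """Set the features at the given indices to 0, keeping the vector length."""
    fids = [fids] if isinstance(fids, int) else list(fids)
    return [0.0 if i in fids else value for i, value in enumerate(features)]

def keep_features(features, fids):
    """Return only the features at the given indices, in the order requested."""
    fids = [fids] if isinstance(fids, int) else list(fids)
    return [features[i] for i in fids]

features = [0.3, 1.2, 5.0, 0.7]
ablated = ablate_features(features, 1)   # same length, index 1 zeroed
kept = keep_features(features, [0, 2])   # shorter vector with two features
```

Ablation preserves the positions of the remaining features, so feature indices stay comparable across runs, whereas keeping a subset renumbers the features.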


pt.ltr.feature_to_score(fid)

Applies a specified feature for ranking. Useful for evaluating which of a number of pre-computed features are useful for ranking.


fid – a single feature id that should be kept
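Ranking by a single feature amounts to copying that feature into the score and re-sorting. The following plain-Python sketch shows the idea on two toy rows; the function is an illustrative stand-in written for this example, not the PyTerrier transformer.

```python
def feature_to_score(rows, fid):
    """Use feature number fid as each row's score and sort by it, descending."""
    rescored = [dict(row, score=row["features"][fid]) for row in rows]
    return sorted(rescored, key=lambda row: row["score"], reverse=True)

rows = [
    {"docno": "d1", "score": 9.0, "features": [0.1, 4.0]},
    {"docno": "d2", "score": 7.0, "features": [0.6, 2.0]},
]
by_first = feature_to_score(rows, 0)    # rank by feature 0
by_second = feature_to_score(rows, 1)   # rank by feature 1
```

Evaluating each such single-feature ranking separately gives a quick view of how useful every feature is on its own.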


pt.ltr.score_to_feature()

Takes the document’s “score” from the score attribute, and uses it as a single feature. In particular, a feature-union operator does not use any score of the documents in the candidate set as a ranking feature. Using the resulting transformer within a feature-union means that an additional ranking feature is added to the “features” column.


cands = pt.BatchRetrieve(index, wmodel="BM25")
bm25f = pt.BatchRetrieve(index, wmodel="BM25F")
pl2f = pt.BatchRetrieve(index, wmodel="PL2F")

two_features = cands >> (bm25f  **  pl2f)
three_features = cands >> (bm25f  **  pl2f ** pt.ltr.score_to_feature())
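What the extra transformer contributes can be sketched in plain Python: the candidate's own score becomes a one-element feature list, which the union then appends to the other features. Both functions below are hypothetical helpers for illustration, and the scores and feature values are made up.

```python
def score_to_feature(rows):
    """Return a one-element feature list per row holding the row's current score."""
    return [[row["score"]] for row in rows]

def union(*feature_lists):
    """Concatenate per-row feature lists from several feature sources."""
    return [sum(per_row, []) for per_row in zip(*feature_lists)]

candidates = [{"docno": "d1", "score": 12.5}, {"docno": "d2", "score": 9.0}]
other_features = [[0.4, 0.1], [0.2, 0.3]]   # e.g. from two other re-rankers

three = union(other_features, score_to_feature(candidates))
```

Without the score_to_feature step, the first-stage score would not be visible to the learner at all.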