Learning to Rank

Introduction

PyTerrier makes it easy to formulate learning to rank pipelines. Conceptually, learning to rank consists of three phases:

  1. identifying a candidate set of documents for each query

  2. computing extra features on these documents

  3. using a learned model to re-rank the candidate documents to obtain a more effective ranking

PyTerrier allows each of these phases to be expressed as transformers, and for them to be composed into a full pipeline.

In particular, conventional retrieval transformers (such as pt.BatchRetrieve) can be used for the first phase. To permit the second phase, PyTerrier's data model allows a “features” column to be associated with each retrieved document. Such features can be generated using specialised transformers, or by combining other re-ranking transformers using the ** feature-union operator. Lastly, to facilitate the final phase, we provide easy ways to integrate PyTerrier pipelines with standard learning libraries such as sklearn, XGBoost and LightGBM.

In the following, we focus on the second and third phases, as well as describe ways to assist in conducting learning to rank experiments.
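
For instance, the three phases can be composed into a single pipeline. The following is a minimal sketch, assuming that index, train_topics and train_qrels already exist; each component is described in the remainder of this page:

import pyterrier as pt
from sklearn.ensemble import RandomForestRegressor

bm25 = pt.BatchRetrieve(index, wmodel="BM25")   # phase 1: identify the candidate set
tf = pt.BatchRetrieve(index, wmodel="Tf")       # phase 2: extra features...
pl2 = pt.BatchRetrieve(index, wmodel="PL2")     # ...computed by re-rankers
ltr_pipe = (bm25 >> (tf ** pl2)
            >> pt.ltr.apply_learned_model(RandomForestRegressor(n_estimators=400)))  # phase 3: learned re-ranker
ltr_pipe.fit(train_topics, train_qrels)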

Calculating Features

Feature Union (**)

PyTerrier’s main way to facilitate calculating and integrating extra features is through the ** operator. Consider an example where the candidate set should be identified using the BM25 weighting model, and then additional features computed using the Tf and PL2 models:

bm25 = pt.BatchRetrieve(index, wmodel="BM25")
tf = pt.BatchRetrieve(index, wmodel="Tf")
pl2 = pt.BatchRetrieve(index, wmodel="PL2")
pipeline = bm25 >> (tf ** pl2)

The output of the bm25 ranker would look like:

qid   docno   score
q1    d5      (bm25 score)

Application of the feature-union operator (**) ensures that tf and pl2 operate as re-rankers, i.e. they are applied only on the documents retrieved by bm25. For each document, the scores calculated by tf and pl2 are combined into the “features” column, as follows:

qid   docno   score          features
q1    d5      (bm25 score)   [tf score, pl2 score]

FeaturesBatchRetrieve

When executing the pipeline above, re-ranking the documents can be slow, as each separate BatchRetrieve object has to access the inverted index again. For this reason, PyTerrier provides a class called FeaturesBatchRetrieve, which allows multiple query-dependent features to be calculated at once, by virtue of Terrier’s Fat framework.

class pyterrier.FeaturesBatchRetrieve(index_location, features, controls=None, properties=None, threads=1, **kwargs)[source]

Use this class for retrieval with multiple features

Init method

Parameters:
  • index_location – An index-like object - An Index, an IndexRef, or a String that can be resolved to an IndexRef

  • features (list) – List of features to use

  • controls (dict) – A dictionary with the control names and values

  • properties (dict) – A dictionary with the property keys and values

  • verbose (bool) – If True transform method will display progress

  • num_results (int) – Number of results to retrieve.

transform(queries)[source]

Performs the retrieval with multiple features

Parameters:

queries – String for a single query, list of queries, or a pandas.DataFrame with columns=[‘qid’, ‘query’]. For re-ranking, the DataFrame may also have a ‘docid’ and/or ‘docno’ column.

Returns:

pandas.DataFrame with columns=[‘qid’, ‘docno’, ‘score’, ‘features’]

An equivalent pipeline to the example above would be:

#pipeline = bm25 >> (tf ** pl2)
pipeline = pt.FeaturesBatchRetrieve(index, wmodel="BM25", features=["WMODEL:Tf", "WMODEL:PL2"])
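
As a minimal usage sketch (assuming a hand-made topics DataFrame; the query text here is purely illustrative), the pipeline can then be applied to queries via its transform() method:

import pandas as pd

topics = pd.DataFrame([["q1", "chemical reactions"]], columns=["qid", "query"])
res = pipeline.transform(topics)
# res has columns ["qid", "docno", "score", "features"]; each row's
# "features" entry holds the Tf and PL2 scores for that document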

Apply Functions for Custom Features

If you have a way to calculate one or multiple ranking features at once, you can use pt.apply functions to create your feature sets. See pyterrier.apply - Custom Transformers for more examples. In particular, use pt.apply.doc_score for calculating a single feature based on a function. Transformers created by pt.apply can be combined using the ** operator.

For instance, consider that you have two functions, each returning one score to be used as a feature. We can instantiate these functions as transformers using pt.apply.doc_score() invocations; both custom features can then be combined into a LTR pipeline using the ** operator:

featureA = pt.apply.doc_score(lambda row: 5)
featureB = pt.apply.doc_score(lambda row: 2)
pipeline2f = bm25 >> (featureA ** featureB)

The output of pipeline2f would be as follows:

qid   docno   score          features
q1    d5      (bm25 score)   [5, 2]

Of course, our example lambda functions return static scores for each document rather than computing meaningful features; a real feature function would instead, for instance, make a lookup based on row["docid"] or other attributes of each row.
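
For illustration, a more realistic feature might be computed from a pre-computed lookup table. In the following sketch, doc_prior is a hypothetical dictionary mapping docnos to pre-computed document prior scores:

doc_prior = {"d5": 0.7}   # hypothetical pre-computed priors, keyed by docno
prior_feature = pt.apply.doc_score(lambda row: doc_prior.get(row["docno"], 0.0))
pipeline_prior = bm25 >> (featureA ** prior_feature)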

If we want to calculate more features at once, then we can go faster by using pt.apply.doc_features:

import numpy as np

two_features = pt.apply.doc_features(lambda row: np.array([0, 1])) # use doc_features when calculating multiple features
one_feature = pt.apply.doc_score(lambda row: 5)                    # use doc_score when calculating a single feature
pipeline3f = bm25 >> (two_features ** one_feature)

The output of pipeline3f would be as follows:

qid   docno   score          features
q1    d5      (bm25 score)   [0, 1, 5]

Learning

pyterrier.ltr.apply_learned_model(learner, form='regression', **kwargs)[source]

Results in a transformer that takes in documents that have a “features” column, and passes those features to the specified learner to obtain the documents’ “score” column. Learners should follow sklearn’s general pattern, with a fit() method (c.f. an sklearn Estimator) and a predict() method.

XGBoost and LightGBM are also supported through the use of the form='ltr' kwarg.

Return type:

Transformer

Parameters:
  • learner – an sklearn-compatible estimator

  • form (str) – either ‘regression’, ‘ltr’ or ‘fastrank’

The resulting transformer implements Estimator; in other words, it has a fit() method and can be trained using training topics and qrels, as well as (optionally) validation topics and qrels. See also Estimator.

SKLearn

An sklearn regressor can be passed directly to pt.ltr.apply_learned_model():

from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators=400)
rf_pipe = pipeline >> pt.ltr.apply_learned_model(rf)
rf_pipe.fit(train_topics, qrels)
pt.Experiment([bm25, rf_pipe], test_topics, qrels, ["map"], names=["BM25 Baseline", "LTR"])

Note that if the feature definitions in the pipeline change, you will need to create a new instance of rf.

For analysis purposes, the feature importances identified by RandomForestRegressor can be accessed through rf.feature_importances_ - see the relevant sklearn documentation for more information.
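
For instance, the importances can be listed against their feature indices (a minimal sketch, assuming rf has been fitted as above):

for fid, importance in enumerate(rf.feature_importances_):
    print("feature %d: importance %.3f" % (fid, importance))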

Gradient Boosted Trees & LambdaMART

Both XGBoost and LightGBM provide gradient boosted regression tree and LambdaMART implementations. Both expose an sklearn-like interface, which PyTerrier supports when the form="ltr" kwarg is supplied to pt.ltr.apply_learned_model():

import xgboost as xgb
# this configures XGBoost as LambdaMART
lmart_x = xgb.sklearn.XGBRanker(objective='rank:ndcg',
      learning_rate=0.1,
      gamma=1.0,
      min_child_weight=0.1,
      max_depth=6,
      verbose=2,
      random_state=42)

lmart_x_pipe = pipeline >> pt.ltr.apply_learned_model(lmart_x, form="ltr")
lmart_x_pipe.fit(train_topics, train_qrels, validation_topics, validation_qrels)

import lightgbm as lgb
# this configures LightGBM as LambdaMART
lmart_l = lgb.LGBMRanker(task="train",
    min_data_in_leaf=1,
    min_sum_hessian_in_leaf=100,
    max_bin=255,
    num_leaves=7,
    objective="lambdarank",
    metric="ndcg",
    ndcg_eval_at=[1, 3, 5, 10],
    learning_rate= .1,
    importance_type="gain",
    num_iterations=10)
lmart_l_pipe = pipeline >> pt.ltr.apply_learned_model(lmart_l, form="ltr")
lmart_l_pipe.fit(train_topics, train_qrels, validation_topics, validation_qrels)

pt.Experiment(
    [bm25, lmart_x_pipe, lmart_l_pipe],
    test_topics,
    test_qrels,
    ["map"],
    names=["BM25 Baseline", "LambdaMART (xgBoost)", "LambdaMART (LightGBM)" ]
)

Note that if the feature definitions in the pipeline change, you will need to create a new instance of XGBRanker (or LGBMRanker, as appropriate) and the pt.ltr.apply_learned_model() transformer. If you attempt to reuse an XGBRanker/LGBMRanker within different pipelines, the pt.ltr.apply_learned_model() transformer will try to warn you about this by raising a ValueError along the lines of “Expected X features, but found Y features”.

In our experience, LightGBM tends to be more effective than XGBoost.

Similar to sklearn, both XGBoost and LightGBM provide feature importances via lmart_x.feature_importances_ and lmart_l.feature_importances_.

FastRank: Coordinate Ascent

We now support FastRank for learning models:

!pip install fastrank
import fastrank
train_request = fastrank.TrainRequest.coordinate_ascent()
params = train_request.params
params.init_random = True
params.normalize = True
params.seed = 1234567

ca_pipe = pipeline >> pt.ltr.apply_learned_model(train_request, form="fastrank")
ca_pipe.fit(train_topics, train_qrels)

FastRank provides two learners: a random forest implementation (fastrank.TrainRequest.random_forest()) and coordinate ascent (fastrank.TrainRequest.coordinate_ascent()), a linear model.
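
A random forest can be trained analogously (a minimal sketch, following the same pattern as the coordinate ascent example above):

rf_request = fastrank.TrainRequest.random_forest()
rf_fr_pipe = pipeline >> pt.ltr.apply_learned_model(rf_request, form="fastrank")
rf_fr_pipe.fit(train_topics, train_qrels)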

Working with Features

We provide additional transformer functions to aid the analysis of learned models, for instance by removing (ablating) features from a complex ranking pipeline.

pyterrier.ltr.ablate_features(fids)[source]

Ablates features (sets feature value to 0) from a pipeline. This is useful for performing feature ablation studies, whereby a feature is removed from the pipeline before learning.

Return type:

Transformer

Parameters:

fids – one or a list of integers corresponding to features indices to be removed

Example:

# assume pipeline is a retrieval pipeline that produces four ranking features
numf=4
rankers = []
names = []
# learn a model for all four features
full = pipeline >> pt.ltr.apply_learned_model(RandomForestRegressor(n_estimators=400))
full.fit(trainTopics, trainQrels, validTopics, validQrels)
rankers.append(full)

# learn a model for 3 features, removing one each time
for fid in range(numf):
    ablated = pipeline >> pt.ltr.ablate_features(fid) >> pt.ltr.apply_learned_model(RandomForestRegressor(n_estimators=400))
    ablated.fit(trainTopics, trainQrels, validTopics, validQrels)
    rankers.append(ablated)

# evaluate the full (4 features) model, as well as each model containing only 3 features
pt.Experiment(
    rankers,
    test_topics,
    test_qrels,
    ["map"],
    names=["Full Model"] + ["Full Minus %d" % fid for fid in range(numf)]
)

pyterrier.ltr.keep_features(fids)[source]

Reduces the features in a pipeline to only those mentioned. This is useful for performing feature ablation studies, whereby only some features are kept (and others removed) from a pipeline before learning occurs.

Return type:

Transformer

Parameters:

fids – one or a list of integers corresponding to the feature indices to be kept
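
Example (a sketch mirroring the ablate_features example above; it keeps only the first and third of the four features before learning):

# assume pipeline produces four ranking features, as in the ablate_features example
kept = pipeline >> pt.ltr.keep_features([0, 2]) >> pt.ltr.apply_learned_model(RandomForestRegressor(n_estimators=400))
kept.fit(trainTopics, trainQrels, validTopics, validQrels)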

pyterrier.ltr.feature_to_score(fid)[source]

Applies a specified feature for ranking. Useful for evaluating which of a number of pre-computed features are useful for ranking.

Return type:

Transformer

Parameters:

fid – a single feature id that should be kept
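
Example (a sketch that ranks candidates by a single pre-computed feature, with no learning involved; feature index 1 is illustrative):

# rank candidates using only the feature with index 1
single_feature = pipeline >> pt.ltr.feature_to_score(1)
pt.Experiment(
    [bm25, single_feature],
    test_topics,
    test_qrels,
    ["map"],
    names=["BM25 Baseline", "Feature 1 only"]
)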

pyterrier.ltr.score_to_feature()[source]

Takes the document’s “score” from the score attribute, and uses it as a single feature. In particular, a feature-union operator does not use the scores of the documents in the candidate set as a ranking feature. Using the resulting transformer within a feature-union means that an additional ranking feature is added to the “features” column.

Return type:

Transformer

Example:

cands = pt.BatchRetrieve(index, wmodel="BM25")
bm25f = pt.BatchRetrieve(index, wmodel="BM25F")
pl2f = pt.BatchRetrieve(index, wmodel="PL2F")

two_features = cands >> (bm25f  **  pl2f)
three_features = cands >> (bm25f  **  pl2f ** pt.ltr.score_to_feature())