Operators on Transformers

Part of the power of PyTerrier comes from the ease with which researchers can formulate complex retrieval pipelines. This is made possible by the operators available on PyTerrier's transformer objects. The following table summarises the available operators:

Operator   Meaning
--------   -------
>>         Then - chaining pipes
+          Linear combination of scores
*          Scalar factoring of scores
&          Document set intersection
|          Document set union
%          Apply rank cutoff
^          Concatenate run with another
**         Feature union
~          Cache transformer result

NB: These operators retain their default Python operator precedence, which may not be aligned with your expectations in a PyTerrier context (e.g. % binds more tightly than >>).

Then (>>)

Apply one transformation followed by another:

#rewrites topics to include #1 etc
sdm = pt.rewrite.SDM()
br = BatchRetrieve(index, "DPH")

res = br.transform(sdm.transform(topics))

We use >> as a shorthand for then (also called compose):

res = (sdm >> br).transform(topics)

Example:

Consider a topics dataframe as follows:

qid   query
q1    test query

Then the application of SDM() would produce:

qid   query
q1    test query #1(test query) #uw8(test query)

NB: In practice the query reformulation generated by SDM() is more complex, due to the presence of weights etc in the resulting query.

Then the final res dataframe would contain the results of applying BatchRetrieve on the rewritten queries, as follows:

qid   query                                        docno   score   rank
q1    test query #1(test query) #uw8(test query)   d10     4       0
q1    test query #1(test query) #uw8(test query)   d04     3.8     1

NB: Then can also be used for retrieval and re-ranking pipelines, such as:

pipeline = BatchRetrieve(index, "DPH") >> BatchRetrieve(index, "BM25")
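The chaining behaviour of >> can be sketched as plain function composition over lists of result rows. This is a simplified, hypothetical model (real PyTerrier transformers are objects operating on pandas DataFrames), but it shows the order of application:

```python
# A minimal sketch of the "then" (>>) operator as function composition.
# Each "transformer" is modelled as a function from a list of row dicts
# to a list of row dicts; real PyTerrier transformers operate on pandas
# DataFrames and are composed with the >> operator instead.

def then(first, second):
    """Compose two transformers: apply `first`, feed its output to `second`."""
    def composed(rows):
        return second(first(rows))
    return composed

# Hypothetical query rewriter, standing in for pt.rewrite.SDM()
def rewrite(topics):
    return [{"qid": t["qid"], "query": t["query"] + " #1(" + t["query"] + ")"}
            for t in topics]

# Hypothetical retriever, standing in for BatchRetrieve
def retrieve(topics):
    return [{"qid": t["qid"], "docno": "d10", "score": 4.0, "rank": 0}
            for t in topics]

pipeline = then(rewrite, retrieve)
res = pipeline([{"qid": "q1", "query": "test query"}])
```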

Linear Combine and Scalar Factor (+, *)

The linear combine (+) and scalar factor (*) operators allow the scores of different retrieval systems to be linearly combined (with weights).

Instead of the following Python:

br_DPH = BatchRetrieve(index, "DPH")
br_BM25 = BatchRetrieve(index, "BM25")

res1 = br_DPH.transform(topics)
res2 = br_BM25.transform(topics)
# outer merge, so that documents retrieved by only one system are kept,
# contributing a score of 0 on the missing side
res = res1.merge(res2, on=["qid", "docno"], how="outer").fillna(0)
res["score"] = 2 * res["score_x"] + res["score_y"]

We use the binary + and * operators. This is natural, as it is intuitive to combine weighted retrieval functions using + and *:

br_DPH = BatchRetrieve(index, "DPH")
br_BM25 = BatchRetrieve(index, "BM25")
res = (2* br_DPH + br_BM25).transform(topics)

If the DPH and BM25 transformers respectively return:

qid   docno   score   rank
q1    d10     2       0
q1    d12     1       1

and:

qid   docno   score   rank
q1    d10     4       0
q1    d01     3       1

then the application of the transformer represented by the expression (2* br_DPH + br_BM25) would be:

qid   docno   score   rank
q1    d10     8       0
q1    d01     3       1
q1    d12     2       2

NB: Any documents not present in one of the constituent rankings will contribute a score of 0 to the final score of that document.
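The combination above can be reproduced with a pandas sketch, illustrating the weighted outer merge that the + and * operators perform. This is an illustrative model of the semantics, not PyTerrier's actual implementation:

```python
import pandas as pd

# Results of the two hypothetical systems from the tables above
dph = pd.DataFrame([("q1", "d10", 2.0), ("q1", "d12", 1.0)],
                   columns=["qid", "docno", "score"])
bm25 = pd.DataFrame([("q1", "d10", 4.0), ("q1", "d01", 3.0)],
                    columns=["qid", "docno", "score"])

# Outer merge: documents missing from one ranking contribute a score of 0
merged = dph.merge(bm25, on=["qid", "docno"], how="outer").fillna(0)
merged["score"] = 2 * merged["score_x"] + merged["score_y"]

# Re-rank by the combined score, as (2 * br_DPH + br_BM25) would
merged = merged.sort_values("score", ascending=False).reset_index(drop=True)
merged["rank"] = merged.index
```

The resulting ranking matches the combined table above: d10 (8), d01 (3), d12 (2).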

Precedence and Associativity

The + and * operators retain their classical precedence among Python's operators. This means that the intended semantics of an expression of linear combinations and scalar factors are clear: * binds more tightly than +, so 2 * br_DPH + br_BM25 is interpreted as (2 * br_DPH) + br_BM25.

Set Intersection and Union (&, |)

The intersection (&) operator retains only documents that occur in both retrieval sets, while the union (|) operator retains documents that occur in either. Scores and ranks are not returned - hence, the resulting documents would normally be re-scored:

BM25_br = BatchRetrieve(index, "BM25")
PL2_br = BatchRetrieve(index, "PL2")

res_intersection = (BM25_br & PL2_br).transform(topics)
res_union = (BM25_br | PL2_br).transform(topics)

Examples:

If the BM25 and PL2 pipelines respectively return:

qid   docno   score   rank
q1    d10     4.3     0
q1    d12     4.1     1

and:

qid   docno   score   rank
q1    d10     4.3     0
q1    d01     3.9     1

then the application of the set intersection operator (&) would result in a ranking containing only documents that appear in the results of both transformers:

qid   docno
q1    d10

and the application of the set union operator (|) would return documents retrieved by either transformer:

qid   docno
q1    d10
q1    d12
q1    d01

Note that, as these are set operators, there are no ranks and scores returned in the output.
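The set semantics can be sketched with plain Python sets over (qid, docno) pairs. This is an illustrative model; real PyTerrier transformers return DataFrames:

```python
# A sketch of the set semantics of & and |, over (qid, docno) pairs.
# Only qid/docno survive the set operators, which is why the resulting
# documents usually need to be re-scored afterwards.

bm25_res = {("q1", "d10"), ("q1", "d12")}
pl2_res = {("q1", "d10"), ("q1", "d01")}

intersection = bm25_res & pl2_res   # documents retrieved by both
union = bm25_res | pl2_res          # documents retrieved by either
```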

Rank Cutoff (%)

The % operator is called rank cutoff, and limits the number of results for each query:

pipe1 = pt.BatchRetrieve(index, "BM25") % 2

Example:

If a retrieval pipeline returns:

qid   docno   score   rank
q1    d10     4.3     0
q1    d12     4.1     1
q1    d05     3.9     2
q1    d03     3.5     3
q1    d01     2.5     4

then the application of the rank cutoff operator would be:

qid   docno   score   rank
q1    d10     4.3     0
q1    d12     4.1     1
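A per-query rank cutoff is essentially a group-wise head over the ranking. The following pandas sketch reproduces the example; it is an illustrative model, not PyTerrier's implementation:

```python
import pandas as pd

# Hypothetical ranking from the example above, already rank-ordered
res = pd.DataFrame(
    [("q1", "d10", 4.3, 0), ("q1", "d12", 4.1, 1), ("q1", "d05", 3.9, 2),
     ("q1", "d03", 3.5, 3), ("q1", "d01", 2.5, 4)],
    columns=["qid", "docno", "score", "rank"])

# % 2 keeps only the top 2 results for each query
cut = res.groupby("qid").head(2)
```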

Concatenate (^)

Sometimes, we may only want to apply an expensive re-ranking process to a few top-ranked documents, and fill the rest of the ranking with the remaining documents (removing duplicates). We can do that using the concatenate operator. Concretely, in the example below, alldocs is our candidate set of, say, 1000 documents per query. We re-rank the top 3 documents for each query using ExpensiveReranker(), in a pipeline called topdocs. We then use the concatenate operator (^) to append the remaining documents from alldocs, with their scores and ranks adjusted so that they appear just after the documents obtained from the topdocs pipeline:

alldocs = BatchRetrieve(index, "BM25")
topdocs = alldocs % 3 >> ExpensiveReranker()
finaldocs = topdocs ^ alldocs

Example:

If alldocs returns:

qid   docno   score   rank
q1    d10     4.3     0
q1    d12     4.1     1
q1    d05     3.9     2
q1    d03     3.5     3
q1    d01     2.5     4

Then topdocs would re-score the top 3 ranked documents (d10, d12, d05). After applying ExpensiveReranker() to score and re-rank these 3 documents, topdocs could be as follows:

qid   docno   score   rank
q1    d05     1.0     0
q1    d10     0.9     1
q1    d12     0.8     2

Then finaldocs would be:

qid   docno   score     rank
q1    d05     1.0       0
q1    d10     0.9       1
q1    d12     0.8       2
q1    d03     0.7999    3
q1    d01     -0.2001   4

Note that the score of d03 is adjusted to appear just below the last-ranked document from topdocs: a small value (epsilon=0.0001) is used as the minimum difference between the lowest-ranked document from topdocs and the highest remaining document from alldocs. The relative ordering of documents from alldocs is unchanged and the gaps between their scores are maintained, so the difference between d03 and d01 is a score delta of 1 in both alldocs and finaldocs.
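The score adjustment described above can be sketched in plain Python for a single query. The EPSILON constant and helper logic here are illustrative; only the resulting scores follow the documented behaviour:

```python
# A sketch of the score adjustment performed by the ^ operator for one
# query. Documents already in `topdocs` keep their scores; the remaining
# documents from `alldocs` are appended with scores shifted so they sit
# just below the last topdocs score, preserving their relative gaps.

EPSILON = 0.0001

topdocs = [("d05", 1.0), ("d10", 0.9), ("d12", 0.8)]
alldocs = [("d10", 4.3), ("d12", 4.1), ("d05", 3.9), ("d03", 3.5), ("d01", 2.5)]

seen = {docno for docno, _ in topdocs}
remaining = [(d, s) for d, s in alldocs if d not in seen]

last_score = topdocs[-1][1]
first_remaining_score = remaining[0][1]
# shift so that the best remaining document scores last_score - EPSILON
offset = last_score - EPSILON - first_remaining_score

finaldocs = topdocs + [(d, round(s + offset, 4)) for d, s in remaining]
```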

Feature Union (**)

Here we take one system, e.g. DPH, to obtain an initial candidate set, then compute additional systems as features on those candidate documents.

The equivalent Python would have looked like:

sample_br = BatchRetrieve(index, "DPH")
BM25F_br = BatchRetrieve(index, "BM25F")
PL2F_br = BatchRetrieve(index, "PL2F")

sampleRes = sample_br.transform(topics)
# assumes sampleRes contains the queries
BM25F_res = BM25F_br.transform(sampleRes)
PL2F_res = PL2F_br.transform(sampleRes)

# merge (not join) to align on the qid and docno columns
final_res = BM25F_res.merge(PL2F_res, on=["qid", "docno"])
# stack the two scores into a single per-row features array
final_res["features"] = final_res.apply(
    lambda row: np.stack([row["score_x"], row["score_y"]]), axis=1)

Instead, we use ** to denote feature union:

sample_br = BatchRetrieve(index, "DPH")
BM25F_br = BatchRetrieve(index, "BM25F")
PL2F_br = BatchRetrieve(index, "PL2F")

# ** is the feature union operator. It requires a candidate document set as input
(BM25F_br ** PL2F_br).transform(sample_br.transform(topics))
# or, combined with the then operator, >>
(sample_br >> (BM25F_br ** PL2F_br)).transform(topics)

NB: Feature union expects the document sets returned by each side of the union to be identical, and will produce a warning if they are not. Documents not returned on one side obtain a score of 0 for that feature.

Example:

For example, consider that sample_br returns a ranking as follows:

qid   docno   score   rank
q1    d10     4.3     0

Further, for document d10, BM25F and PL2F return scores respectively of 4.9 and 13.0. The application of the feature union operator above would be a ranking with features as follows:

qid   docno   score   rank   features
q1    d10     4.3     0      [4.9, 13.0]

More examples of feature union can be found in the learning-to-rank documentation (Learning to Rank).
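The per-document feature stacking shown above can be sketched with pandas. This is an illustrative model of the ** semantics (the feature dictionaries stand in for the BM25F and PL2F transformers), not PyTerrier's implementation:

```python
import pandas as pd

# Candidate set from the hypothetical first-stage retriever
sample = pd.DataFrame([("q1", "d10", 4.3, 0)],
                      columns=["qid", "docno", "score", "rank"])

# Hypothetical scores computed by each feature system for the candidates
bm25f = {("q1", "d10"): 4.9}
pl2f = {("q1", "d10"): 13.0}

# Feature union keeps the candidate ranking and attaches one feature per
# system, in order, defaulting to 0 for documents a system did not return
sample["features"] = [
    [bm25f.get(key, 0.0), pl2f.get(key, 0.0)]
    for key in zip(sample["qid"], sample["docno"])
]
```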

Precedence and Associativity

Feature union is associative, so in the following examples, x1, x2 and x3 have identical semantics:

x1 = sample_br >> ( BM25F_br ** PL2F_br ** urllen)
x2 =  sample_br >> ( (BM25F_br ** PL2F_br) ** urllen)
x3 =  sample_br >> ( BM25F_br ** (PL2F_br ** urllen))

Pipelines x1, x2 and x3 all create identical document rankings with three features, in the precise order BM25F, PL2F, urllen.

Note that ** has higher operator precedence in Python than >>, so pipeline a below is parsed in the same way as c, which is almost always the desired outcome. Nevertheless, we recommend writing the parentheses explicitly, as in c, to make the intent clear. Pipeline b has different semantics, as its feature union applies to the output of the composed (sample_br >> BM25F_br) pipeline:

# a is parsed in the same way as c
a = sample_br >> BM25F_br ** PL2F_br
b = (sample_br >> BM25F_br) ** PL2F_br
c = sample_br >> ( BM25F_br ** PL2F_br)

Caching (~)

Some transformers are expensive to apply. For instance, we might find ourselves repeatedly running our BM25 baseline. We can ask PyTerrier to cache the outcome of a transformer for a given qid by using the unary ~ operator.

Consider the following example:

from pyterrier import BatchRetrieve, Experiment
firstpass = BatchRetrieve(index, "BM25")
reranker = ~firstpass >> BatchRetrieve(index, "BM25F")
Experiment([~firstpass, ~reranker], topics, qrels)

In this example, firstpass is cached when it is used in the Experiment evaluation, as well as when it is used within reranker. We also cache the outcome of the whole reranker pipeline, so that re-running the evaluation will be faster.

By default, PyTerrier caches results to ~/.pyterrier/transformer_cache/.
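The effect of ~ can be sketched as per-query memoisation. This in-memory model is illustrative only (all names here are hypothetical); PyTerrier's real cache persists results on disk:

```python
# A sketch of the effect of the unary ~ operator: memoise a transformer's
# output per qid, so that repeated runs reuse the cached result instead of
# recomputing it.

calls = []

def expensive_transform(qid, query):
    calls.append(qid)                 # track how often the real work runs
    return [(qid, "d10", 4.0)]        # hypothetical result rows

cache = {}

def cached_transform(qid, query):
    if qid not in cache:
        cache[qid] = expensive_transform(qid, query)
    return cache[qid]

first = cached_transform("q1", "test query")
second = cached_transform("q1", "test query")   # served from the cache
```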