Adaptive Retrieval¶
pyterrier-adaptive provides PyTerrier functionality to support Adaptive Retrieval.
Adaptive Retrieval is a family of techniques that help overcome the recall limitation of re-ranking approaches by identifying relevant documents that were missed by earlier stages.
API Documentation¶
- class pyterrier_adaptive.GAR(scorer, corpus_graph, num_results=1000, batch_size=None, backfill=True, enabled=True, verbose=False)[source]¶
A
Transformer
that implements Graph-based Adaptive Re-ranking (GAR).Required input columns:
['qid', 'query', 'docno', 'score', 'rank']
Output columns:
['qid', 'query', 'docno', 'score', 'rank', 'iteration']
Note
The iteration column defines the batch number that first identified the document in the results. Due to the alternating nature of the algorithm,
even=initial retrieval
,odd=corpus graph
, and-1=backfilled
.Citation
MacAvaney et al. Adaptive Re-Ranking with a Corpus Graph. CIKM 2022. [link]
@inproceedings{DBLP:conf/cikm/MacAvaneyTM22, author = {Sean MacAvaney and Nicola Tonellotto and Craig Macdonald}, editor = {Mohammad Al Hasan and Li Xiong}, title = {Adaptive Re-Ranking with a Corpus Graph}, booktitle = {Proceedings of the 31st {ACM} International Conference on Information {\&} Knowledge Management, Atlanta, GA, USA, October 17-21, 2022}, pages = {1491--1500}, publisher = {{ACM}}, year = {2022}, url = {https://doi.org/10.1145/3511808.3557231}, doi = {10.1145/3511808.3557231}, timestamp = {Wed, 19 Oct 2022 17:09:02 +0200}, biburl = {https://dblp.org/rec/conf/cikm/MacAvaneyTM22.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
- Parameters:
scorer (
Transformer
) – A transformer that scores query-document pairs. It will only be provided with [‘qid, ‘query’, ‘docno’, ‘score’].corpus_graph (
CorpusGraph
) – A graph of the corpus, enabling quick lookups of nearest neighboursnum_results (int) – The maximum number of documents to score (called “budget” and $c$ in the paper)
batch_size (int) – The number of documents to score at once (called $b$ in the paper). If not provided, will attempt to use the batch size from the scorer
backfill (bool) – If True, always include all documents from the initial stage, even if they were not re-scored
enabled (bool) – If False, perform re-ranking without using the corpus graph
verbose (bool) – If True, print progress information
- class pyterrier_adaptive.NpTopKCorpusGraph(path, k=None)[source]¶
- class pyterrier_adaptive.Laff(model='macavaney/laff', *, device=None, batch_size=128, max_length=512, verbose=False)[source]¶
A transformer that computes a learned affinity score between two document texts using a transformer model.
Citation
Rathee et al. Quam: Adaptive Retrieval through Query Affinity Modelling. arXiv 2024. [link]
@article{DBLP:journals/corr/abs-2410-20286, author = {Mandeep Rathee and Sean MacAvaney and Avishek Anand}, title = {Quam: Adaptive Retrieval through Query Affinity Modelling}, journal = {CoRR}, volume = {abs/2410.20286}, year = {2024}, url = {https://doi.org/10.48550/arXiv.2410.20286}, doi = {10.48550/ARXIV.2410.20286}, eprinttype = {arXiv}, eprint = {2410.20286}, timestamp = {Thu, 28 Nov 2024 21:32:46 +0100}, biburl = {https://dblp.org/rec/journals/corr/abs-2410-20286.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
- Parameters:
model – the name of the transformer model to use.
device – the device to use for the transformer model.
batch_size – the batch size to use for processing.
max_length – the maximum length of the input text.
verbose – whether to display progress bars.
- compute_affinity(texts_left, texts_right)[source]¶
Compute the affinity scores between two lists of texts.
- Return type:
List
[float
]- Parameters:
texts_left – the left-hand side texts.
texts_right – the right-hand side texts.
If either the left or right text is a string (or length-1 list), it is projected to the length of the other input (akin to numpy or torch projection).
A higher affinity score indicates the documents are more similar to one another.
>>> from pyterrier_adaptive import Laff >>> model = Laff() >>> model.compute_affinity('the cat sat on the mat', ['cats like to sit in the sun', 'dogs like to play fetch']) [5.46875, -3.140625]
- Returns:
A list of affinity scores.
- transform(inp)[source]¶
Compute the affinity scores between two columns of texts in the input.
Expects columns
text
(for the left-side text) andother_text
(for the right-side text).Results are sorted by the left-side text and the affinity score.
- Return type:
DataFrame
- wrap_graph(graph, text_loader)[source]¶
Wrap a corpus graph with the LAFF transformer for on-the-fly LAFF scomre computation.
- Return type:
OnTheFlyLaffGraph
- Parameters:
graph – the input corpus graph.
text_loader – a transformer that loads the text for a given document.
- Returns:
A corpus graph that computes LAFF scores on-the-fly.
- apply_to_graph(graph, text_loader, out_path=None, *, verbose=None)[source]¶
Apply the LAFF transformer to a corpus graph to construct a new one.
- Return type:
- Parameters:
graph – the input corpus graph.
text_loader – a transformer that loads the text for a given document.
out_path – the path to save the output corpus graph. If not provided, the input graph’s path is used with a ‘.laff’ extension.
verbose – whether to display progress bars.
Bibliography¶
For more information on adaptive retrieval, see:
Citation
MacAvaney et al. Adaptive Re-Ranking with a Corpus Graph. CIKM 2022. [link]
@inproceedings{DBLP:conf/cikm/MacAvaneyTM22, author = {Sean MacAvaney and Nicola Tonellotto and Craig Macdonald}, editor = {Mohammad Al Hasan and Li Xiong}, title = {Adaptive Re-Ranking with a Corpus Graph}, booktitle = {Proceedings of the 31st {ACM} International Conference on Information {\&} Knowledge Management, Atlanta, GA, USA, October 17-21, 2022}, pages = {1491--1500}, publisher = {{ACM}}, year = {2022}, url = {https://doi.org/10.1145/3511808.3557231}, doi = {10.1145/3511808.3557231}, timestamp = {Wed, 19 Oct 2022 17:09:02 +0200}, biburl = {https://dblp.org/rec/conf/cikm/MacAvaneyTM22.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
Citation
MacAvaney et al. Adaptive Re-Ranking as an Information-Seeking Agent. CIKM Workshops 2022. [link]
@inproceedings{DBLP:conf/cikm/MacAvaneyTM22a, author = {Sean MacAvaney and Nicola Tonellotto and Craig Macdonald}, editor = {Georgios Drakopoulos and Eleanna Kafeza}, title = {Adaptive Re-Ranking as an Information-Seeking Agent}, booktitle = {Proceedings of the {CIKM} 2022 Workshops co-located with 31st {ACM} International Conference on Information and Knowledge Management {(CIKM} 2022), Atlanta, USA, October 17-21, 2022}, series = {{CEUR} Workshop Proceedings}, volume = {3318}, publisher = {CEUR-WS.org}, year = {2022}, url = {https://ceur-ws.org/Vol-3318/paper9.pdf}, timestamp = {Fri, 10 Mar 2023 16:22:32 +0100}, biburl = {https://dblp.org/rec/conf/cikm/MacAvaneyTM22a.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
Citation
Kulkarni et al. Lexically-Accelerated Dense Retrieval. SIGIR 2023. [link]
@inproceedings{DBLP:conf/sigir/KulkarniMGF23, author = {Hrishikesh Kulkarni and Sean MacAvaney and Nazli Goharian and Ophir Frieder}, editor = {Hsin{-}Hsi Chen and Wei{-}Jou (Edward) Duh and Hen{-}Hsen Huang and Makoto P. Kato and Josiane Mothe and Barbara Poblete}, title = {Lexically-Accelerated Dense Retrieval}, booktitle = {Proceedings of the 46th International {ACM} {SIGIR} Conference on Research and Development in Information Retrieval, {SIGIR} 2023, Taipei, Taiwan, July 23-27, 2023}, pages = {152--162}, publisher = {{ACM}}, year = {2023}, url = {https://doi.org/10.1145/3539618.3591715}, doi = {10.1145/3539618.3591715}, timestamp = {Fri, 21 Jul 2023 22:25:19 +0200}, biburl = {https://dblp.org/rec/conf/sigir/KulkarniMGF23.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
Citation
Frayling et al. Effective Adhoc Retrieval Through Traversal of a Query-Document Graph. ECIR (3) 2024. [link]
@inproceedings{DBLP:conf/ecir/FraylingMMO24, author = {Erlend Frayling and Sean MacAvaney and Craig Macdonald and Iadh Ounis}, editor = {Nazli Goharian and Nicola Tonellotto and Yulan He and Aldo Lipani and Graham McDonald and Craig Macdonald and Iadh Ounis}, title = {Effective Adhoc Retrieval Through Traversal of a Query-Document Graph}, booktitle = {Advances in Information Retrieval - 46th European Conference on Information Retrieval, {ECIR} 2024, Glasgow, UK, March 24-28, 2024, Proceedings, Part {III}}, series = {Lecture Notes in Computer Science}, volume = {14610}, pages = {89--104}, publisher = {Springer}, year = {2024}, url = {https://doi.org/10.1007/978-3-031-56063-7\_6}, doi = {10.1007/978-3-031-56063-7\_6}, timestamp = {Fri, 29 Mar 2024 23:01:35 +0100}, biburl = {https://dblp.org/rec/conf/ecir/FraylingMMO24.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
Citation
MacAvaney and Tonellotto. A Reproducibility Study of PLAID. SIGIR 2024. [link]
@inproceedings{DBLP:conf/sigir/MacAvaneyT24, author = {Sean MacAvaney and Nicola Tonellotto}, editor = {Grace Hui Yang and Hongning Wang and Sam Han and Claudia Hauff and Guido Zuccon and Yi Zhang}, title = {A Reproducibility Study of {PLAID}}, booktitle = {Proceedings of the 47th International {ACM} {SIGIR} Conference on Research and Development in Information Retrieval, {SIGIR} 2024, Washington DC, USA, July 14-18, 2024}, pages = {1411--1419}, publisher = {{ACM}}, year = {2024}, url = {https://doi.org/10.1145/3626772.3657856}, doi = {10.1145/3626772.3657856}, timestamp = {Sun, 06 Oct 2024 21:14:16 +0200}, biburl = {https://dblp.org/rec/conf/sigir/MacAvaneyT24.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
Citation
Rathee et al. Quam: Adaptive Retrieval through Query Affinity Modelling. arXiv 2024. [link]
@article{DBLP:journals/corr/abs-2410-20286, author = {Mandeep Rathee and Sean MacAvaney and Avishek Anand}, title = {Quam: Adaptive Retrieval through Query Affinity Modelling}, journal = {CoRR}, volume = {abs/2410.20286}, year = {2024}, url = {https://doi.org/10.48550/arXiv.2410.20286}, doi = {10.48550/ARXIV.2410.20286}, eprinttype = {arXiv}, eprint = {2410.20286}, timestamp = {Thu, 28 Nov 2024 21:32:46 +0100}, biburl = {https://dblp.org/rec/journals/corr/abs-2410-20286.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }