Adaptive Retrieval¶
pyterrier-adaptive provides PyTerrier functionality to support Adaptive Retrieval.
Adaptive Retrieval is a family of techniques that help overcome the recall limitation of re-ranking approaches by identifying relevant documents that were missed by earlier stages.
API Documentation¶
- class pyterrier_adaptive.GAR(scorer, corpus_graph, num_results=1000, batch_size=None, backfill=True, enabled=True, verbose=False)[source]¶
A
Transformerthat implements Graph-based Adaptive Re-ranking (GAR).Required input columns:
['qid', 'query', 'docno', 'score', 'rank']Output columns:
['qid', 'query', 'docno', 'score', 'rank', 'iteration']Note
The iteration column defines the batch number that first identified the document in the results. Due to the alternating nature of the algorithm,
even=initial retrieval,odd=corpus graph, and-1=backfilled.Citation
MacAvaney et al. Adaptive Re-Ranking with a Corpus Graph. CIKM 2022. [link]
@inproceedings{DBLP:conf/cikm/MacAvaneyTM22, author = {Sean MacAvaney and Nicola Tonellotto and Craig Macdonald}, editor = {Mohammad Al Hasan and Li Xiong}, title = {Adaptive Re-Ranking with a Corpus Graph}, booktitle = {Proceedings of the 31st {ACM} International Conference on Information {\&} Knowledge Management, Atlanta, GA, USA, October 17-21, 2022}, pages = {1491--1500}, publisher = {{ACM}}, year = {2022}, url = {https://doi.org/10.1145/3511808.3557231}, doi = {10.1145/3511808.3557231}, timestamp = {Wed, 19 Oct 2022 17:09:02 +0200}, biburl = {https://dblp.org/rec/conf/cikm/MacAvaneyTM22.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }- Parameters:
scorer (
Transformer) – A transformer that scores query-document pairs. It will only be provided with [‘qid, ‘query’, ‘docno’, ‘score’].corpus_graph (
CorpusGraph) – A graph of the corpus, enabling quick lookups of nearest neighboursnum_results (int) – The maximum number of documents to score (called “budget” and $c$ in the paper)
batch_size (int) – The number of documents to score at once (called $b$ in the paper). If not provided, will attempt to use the batch size from the scorer
backfill (bool) – If True, always include all documents from the initial stage, even if they were not re-scored
enabled (bool) – If False, perform re-ranking without using the corpus graph
verbose (bool) – If True, print progress information
- class pyterrier_adaptive.CorpusGraph(path)[source]¶
Initialize the artifact at the provided URL.
- Parameters:
path (Path | str | _NoPath)
- class pyterrier_adaptive.NpTopKCorpusGraph(path, k=None)[source]¶
- class pyterrier_adaptive.Laff(model='macavaney/laff', *, device=None, batch_size=128, max_length=512, verbose=False)[source]¶
A transformer that computes a learned affinity score between two document texts using a transformer model.
Citation
Rathee et al. Quam: Adaptive Retrieval through Query Affinity Modelling. arXiv 2024. [link]
@article{DBLP:journals/corr/abs-2410-20286, author = {Mandeep Rathee and Sean MacAvaney and Avishek Anand}, title = {Quam: Adaptive Retrieval through Query Affinity Modelling}, journal = {CoRR}, volume = {abs/2410.20286}, year = {2024}, url = {https://doi.org/10.48550/arXiv.2410.20286}, doi = {10.48550/ARXIV.2410.20286}, eprinttype = {arXiv}, eprint = {2410.20286}, timestamp = {Thu, 28 Nov 2024 21:32:46 +0100}, biburl = {https://dblp.org/rec/journals/corr/abs-2410-20286.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }- Parameters:
model (str) – the name of the transformer model to use.
device (str | device | None) – the device to use for the transformer model.
batch_size (int) – the batch size to use for processing.
max_length (int) – the maximum length of the input text.
verbose (bool) – whether to display progress bars.
- compute_affinity(texts_left, texts_right)[source]¶
Compute the affinity scores between two lists of texts.
- Return type:
List[float]- Parameters:
texts_left (str | List[str]) – the left-hand side texts.
texts_right (str | List[str]) – the right-hand side texts.
If either the left or right text is a string (or length-1 list), it is projected to the length of the other input (akin to numpy or torch projection).
A higher affinity score indicates the documents are more similar to one another.
Compute the Learned Affinity (LAFF) score between documents.¶>>> from pyterrier_adaptive import Laff >>> model = Laff() >>> model.compute_affinity('the cat sat on the mat', ['cats like to sit in the sun', 'dogs like to play fetch']) [5.46875, -3.140625]
- Returns:
A list of affinity scores.
- Parameters:
texts_left (str | List[str])
texts_right (str | List[str])
- Return type:
List[float]
- transform(inp)[source]¶
Compute the affinity scores between two columns of texts in the input.
Expects columns
text(for the left-side text) andother_text(for the right-side text).Results are sorted by the left-side text and the affinity score.
- Return type:
DataFrame- Parameters:
inp (DataFrame)
- wrap_graph(graph, text_loader)[source]¶
Wrap a corpus graph with the LAFF transformer for on-the-fly LAFF scomre computation.
- Return type:
OnTheFlyLaffGraph- Parameters:
graph (CorpusGraph) – the input corpus graph.
text_loader (Transformer) – a transformer that loads the text for a given document.
- Returns:
A corpus graph that computes LAFF scores on-the-fly.
- apply_to_graph(graph, text_loader, out_path=None, *, verbose=None)[source]¶
Apply the LAFF transformer to a corpus graph to construct a new one.
- Return type:
- Parameters:
graph (NpTopKCorpusGraph) – the input corpus graph.
text_loader (Transformer) – a transformer that loads the text for a given document.
out_path (str | None) – the path to save the output corpus graph. If not provided, the input graph’s path is used with a ‘.laff’ extension.
verbose (bool | None) – whether to display progress bars.
Bibliography¶
For more information on adaptive retrieval, see:
Citation
MacAvaney et al. Adaptive Re-Ranking with a Corpus Graph. CIKM 2022. [link]
@inproceedings{DBLP:conf/cikm/MacAvaneyTM22,
author = {Sean MacAvaney and
Nicola Tonellotto and
Craig Macdonald},
editor = {Mohammad Al Hasan and
Li Xiong},
title = {Adaptive Re-Ranking with a Corpus Graph},
booktitle = {Proceedings of the 31st {ACM} International Conference on Information
{\&} Knowledge Management, Atlanta, GA, USA, October 17-21, 2022},
pages = {1491--1500},
publisher = {{ACM}},
year = {2022},
url = {https://doi.org/10.1145/3511808.3557231},
doi = {10.1145/3511808.3557231},
timestamp = {Wed, 19 Oct 2022 17:09:02 +0200},
biburl = {https://dblp.org/rec/conf/cikm/MacAvaneyTM22.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Citation
MacAvaney et al. Adaptive Re-Ranking as an Information-Seeking Agent. CIKM Workshops 2022. [link]
@inproceedings{DBLP:conf/cikm/MacAvaneyTM22a,
author = {Sean MacAvaney and
Nicola Tonellotto and
Craig Macdonald},
editor = {Georgios Drakopoulos and
Eleanna Kafeza},
title = {Adaptive Re-Ranking as an Information-Seeking Agent},
booktitle = {Proceedings of the {CIKM} 2022 Workshops co-located with 31st {ACM}
International Conference on Information and Knowledge Management {(CIKM}
2022), Atlanta, USA, October 17-21, 2022},
series = {{CEUR} Workshop Proceedings},
volume = {3318},
publisher = {CEUR-WS.org},
year = {2022},
url = {https://ceur-ws.org/Vol-3318/paper9.pdf},
timestamp = {Fri, 10 Mar 2023 16:22:32 +0100},
biburl = {https://dblp.org/rec/conf/cikm/MacAvaneyTM22a.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Citation
Kulkarni et al. Lexically-Accelerated Dense Retrieval. SIGIR 2023. [link]
@inproceedings{DBLP:conf/sigir/KulkarniMGF23,
author = {Hrishikesh Kulkarni and
Sean MacAvaney and
Nazli Goharian and
Ophir Frieder},
editor = {Hsin{-}Hsi Chen and
Wei{-}Jou (Edward) Duh and
Hen{-}Hsen Huang and
Makoto P. Kato and
Josiane Mothe and
Barbara Poblete},
title = {Lexically-Accelerated Dense Retrieval},
booktitle = {Proceedings of the 46th International {ACM} {SIGIR} Conference on
Research and Development in Information Retrieval, {SIGIR} 2023, Taipei,
Taiwan, July 23-27, 2023},
pages = {152--162},
publisher = {{ACM}},
year = {2023},
url = {https://doi.org/10.1145/3539618.3591715},
doi = {10.1145/3539618.3591715},
timestamp = {Fri, 21 Jul 2023 22:25:19 +0200},
biburl = {https://dblp.org/rec/conf/sigir/KulkarniMGF23.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Citation
Frayling et al. Effective Adhoc Retrieval Through Traversal of a Query-Document Graph. ECIR (3) 2024. [link]
@inproceedings{DBLP:conf/ecir/FraylingMMO24,
author = {Erlend Frayling and
Sean MacAvaney and
Craig Macdonald and
Iadh Ounis},
editor = {Nazli Goharian and
Nicola Tonellotto and
Yulan He and
Aldo Lipani and
Graham McDonald and
Craig Macdonald and
Iadh Ounis},
title = {Effective Adhoc Retrieval Through Traversal of a Query-Document Graph},
booktitle = {Advances in Information Retrieval - 46th European Conference on Information
Retrieval, {ECIR} 2024, Glasgow, UK, March 24-28, 2024, Proceedings,
Part {III}},
series = {Lecture Notes in Computer Science},
volume = {14610},
pages = {89--104},
publisher = {Springer},
year = {2024},
url = {https://doi.org/10.1007/978-3-031-56063-7\_6},
doi = {10.1007/978-3-031-56063-7\_6},
timestamp = {Fri, 29 Mar 2024 23:01:35 +0100},
biburl = {https://dblp.org/rec/conf/ecir/FraylingMMO24.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Citation
MacAvaney and Tonellotto. A Reproducibility Study of PLAID. SIGIR 2024. [link]
@inproceedings{DBLP:conf/sigir/MacAvaneyT24,
author = {Sean MacAvaney and
Nicola Tonellotto},
editor = {Grace Hui Yang and
Hongning Wang and
Sam Han and
Claudia Hauff and
Guido Zuccon and
Yi Zhang},
title = {A Reproducibility Study of {PLAID}},
booktitle = {Proceedings of the 47th International {ACM} {SIGIR} Conference on
Research and Development in Information Retrieval, {SIGIR} 2024, Washington
DC, USA, July 14-18, 2024},
pages = {1411--1419},
publisher = {{ACM}},
year = {2024},
url = {https://doi.org/10.1145/3626772.3657856},
doi = {10.1145/3626772.3657856},
timestamp = {Sun, 06 Oct 2024 21:14:16 +0200},
biburl = {https://dblp.org/rec/conf/sigir/MacAvaneyT24.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Citation
Rathee et al. Quam: Adaptive Retrieval through Query Affinity Modelling. arXiv 2024. [link]
@article{DBLP:journals/corr/abs-2410-20286,
author = {Mandeep Rathee and
Sean MacAvaney and
Avishek Anand},
title = {Quam: Adaptive Retrieval through Query Affinity Modelling},
journal = {CoRR},
volume = {abs/2410.20286},
year = {2024},
url = {https://doi.org/10.48550/arXiv.2410.20286},
doi = {10.48550/ARXIV.2410.20286},
eprinttype = {arXiv},
eprint = {2410.20286},
timestamp = {Thu, 28 Nov 2024 21:32:46 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2410-20286.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}