Anserini + PyTerrier
Anserini is a retrieval toolkit built on top of Lucene. pyterrier-anserini provides a PyTerrier-compatible interface to Anserini, so you can run experiments with it and combine Anserini retrievers with other PyTerrier components.
Quick Start
You can install pyterrier-anserini with pip:
$ pip install pyterrier-anserini
AnseriniIndex is the main class for working with Anserini. For instance, the following snippet downloads a pre-built index from Hugging Face and retrieves with BM25:
>>> from pyterrier_anserini import AnseriniIndex
>>> index = AnseriniIndex.from_hf('macavaney/msmarco-passage.anserini')
>>> bm25 = index.bm25(include_fields=['contents'])
>>> bm25.search('terrier breeds')
   qid           query    docno    score  rank                                      contents
0    1  terrier breeds  5785957  11.9588     0  The Jack Russell Terrier and the Russell ...
1    1  terrier breeds  7455374  11.9343     1  FCI, ANKC, and IKC recognize the shorts a...
2    1  terrier breeds  1406578  11.8640     2  Norfolk terrier (English breed of small t...
3    1  terrier breeds  3984886  11.7518     3  Terrier Group is the name of a breed Grou...
4    1  terrier breeds  7728131  11.5660     4  The Yorkshire Terrier didn't begin as the...
...
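To give a rough sense of what the score column above represents, here is a minimal, self-contained sketch of a textbook BM25 scorer in plain Python. It is an illustration only, not the code that Anserini or Lucene actually runs: the function names `bm25_scores` and `rank` are hypothetical, the tokenizer is a plain whitespace split with no stemming or stopword removal, and the `k1=0.9`, `b=0.4` defaults are assumed here because they are the values commonly associated with Anserini. Scores from this sketch will not match the numbers in the output above.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=0.9, b=0.4):
    """Score every document in `docs` against `query` with a textbook BM25.

    Hypothetical helper for illustration; k1 and b are assumed defaults.
    """
    if not docs:
        return []
    # Naive tokenization: lowercase + whitespace split (no stemming).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(query, docs, **params):
    """Return (doc_index, score) pairs sorted by descending score."""
    scores = bm25_scores(query, docs, **params)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


toy_docs = [
    "the jack russell terrier is a small terrier",
    "norfolk terrier is an english breed",
    "cats are not dogs",
]
for doc_index, score in rank("terrier breeds", toy_docs):
    print(doc_index, round(score, 4), toy_docs[doc_index])
```

Because this sketch does no stemming, the query term "breeds" does not match "breed" in the second toy document; a real analyzer chain would typically handle that, which is one reason to rely on the index's own retriever rather than a hand-rolled scorer.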
Acknowledgements
This extension uses the Anserini package. If you use it, please be sure to cite Anserini:
Citation
Yang et al. Anserini: Enabling the Use of Lucene for Information Retrieval Research. SIGIR 2017. https://doi.org/10.1145/3077136.3080721
@inproceedings{DBLP:conf/sigir/Yang0L17,
  author    = {Peilin Yang and Hui Fang and Jimmy Lin},
  editor    = {Noriko Kando and Tetsuya Sakai and Hideo Joho and Hang Li and Arjen P. de Vries and Ryen W. White},
  title     = {Anserini: Enabling the Use of Lucene for Information Retrieval Research},
  booktitle = {Proceedings of the 40th International {ACM} {SIGIR} Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7-11, 2017},
  pages     = {1253--1256},
  publisher = {{ACM}},
  year      = {2017},
  url       = {https://doi.org/10.1145/3077136.3080721},
  doi       = {10.1145/3077136.3080721},
  timestamp = {Sun, 12 Nov 2023 02:10:03 +0100},
  biburl    = {https://dblp.org/rec/conf/sigir/Yang0L17.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
This extension was built as part of the PyTerrier project:
Citation
Macdonald et al. PyTerrier: Declarative Experimentation in Python from BM25 to Dense Retrieval. CIKM 2021. https://doi.org/10.1145/3459637.3482013
@inproceedings{DBLP:conf/cikm/MacdonaldTMO21,
  author    = {Craig Macdonald and Nicola Tonellotto and Sean MacAvaney and Iadh Ounis},
  editor    = {Gianluca Demartini and Guido Zuccon and J. Shane Culpepper and Zi Huang and Hanghang Tong},
  title     = {PyTerrier: Declarative Experimentation in Python from {BM25} to Dense Retrieval},
  booktitle = {{CIKM} '21: The 30th {ACM} International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1 - 5, 2021},
  pages     = {4526--4533},
  publisher = {{ACM}},
  year      = {2021},
  url       = {https://doi.org/10.1145/3459637.3482013},
  doi       = {10.1145/3459637.3482013},
  timestamp = {Tue, 16 Aug 2022 23:04:38 +0200},
  biburl    = {https://dblp.org/rec/conf/cikm/MacdonaldTMO21.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
This extension was written by Sean MacAvaney at the University of Glasgow. It is based on an original implementation, written by Craig Macdonald, that was part of PyTerrier. See the project's GitHub repository for a full list of contributors.