Anserini + PyTerrier

Anserini is a retrieval toolkit built on top of Lucene. pyterrier-anserini provides a PyTerrier-compatible interface to Anserini, so you can run retrieval experiments with it and combine it with other systems in PyTerrier.

Quick Start

You can install pyterrier-anserini with pip:

Install pyterrier-anserini
$ pip install pyterrier-anserini

AnseriniIndex is the main class for working with Anserini. For instance, the following snippet downloads a pre-built index from HuggingFace and retrieves with BM25. Passing include_fields=['contents'] returns each passage's text in the contents column of the results:

Load an Anserini index from HuggingFace and retrieve using BM25
>>> from pyterrier_anserini import AnseriniIndex
>>> index = AnseriniIndex.from_hf('macavaney/msmarco-passage.anserini')
>>> bm25 = index.bm25(include_fields=['contents'])
>>> bm25.search('terrier breeds')
  qid           query    docno    score  rank                                      contents
0   1  terrier breeds  5785957  11.9588     0  The Jack Russell Terrier and the Russell ...
1   1  terrier breeds  7455374  11.9343     1  FCI, ANKC, and IKC recognize the shorts a...
2   1  terrier breeds  1406578  11.8640     2  Norfolk terrier (English breed of small t...
3   1  terrier breeds  3984886  11.7518     3  Terrier Group is the name of a breed Grou...
4   1  terrier breeds  7728131  11.5660     4  The Yorkshire Terrier didn't begin as the...
...
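To make the score and rank columns above more concrete, here is a minimal, self-contained sketch of BM25 ranking in pure Python. It is not Anserini's or Lucene's implementation: the corpus, the whitespace tokenizer, the function names (tokenize, bm25_search), and the k1 and b values are all assumptions chosen for illustration, and real scores depend on Lucene's text analysis and its exact scoring variant.

```python
import math
from collections import Counter

# Hypothetical toy corpus; docnos and texts are made up for illustration.
DOCS = {
    "d1": "the jack russell terrier is a small terrier breed",
    "d2": "norfolk terrier is an english breed",
    "d3": "lucene is a search library written in java",
}


def tokenize(text):
    # Naive lowercase/whitespace tokenizer (assumption); a real engine
    # would also apply stemming and stopword removal.
    return text.lower().split()


def bm25_search(query, docs=DOCS, k1=0.9, b=0.4, num_results=10):
    """Score every document against the query with a textbook BM25 formula.

    k1 and b are illustrative values, not settings read from any index.
    Returns rows shaped like the result table above: qid, query, docno,
    score, rank (rank starts at 0).
    """
    tokenized = {docno: tokenize(text) for docno, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized.values():
        df.update(set(toks))

    scored = []
    for docno, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scored.append((docno, score))

    # Highest score first; ties broken by docno for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [
        {"qid": "1", "query": query, "docno": docno, "score": score, "rank": rank}
        for rank, (docno, score) in enumerate(scored[:num_results])
    ]


if __name__ == "__main__":
    for row in bm25_search("terrier breed"):
        print(row)
```

In this toy corpus, d1 ranks above d2 for the query 'terrier breed' because it mentions 'terrier' twice, and d3 is not returned because it matches no query term. The sketch only mirrors the shape of the output; for actual retrieval, use the index.bm25 retriever shown above.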

Acknowledgements

This extension uses the Anserini package. If you use it, please be sure to cite Anserini:

Citation

Yang et al. Anserini: Enabling the Use of Lucene for Information Retrieval Research. SIGIR 2017. https://doi.org/10.1145/3077136.3080721
@inproceedings{DBLP:conf/sigir/Yang0L17,
  author       = {Peilin Yang and
                  Hui Fang and
                  Jimmy Lin},
  editor       = {Noriko Kando and
                  Tetsuya Sakai and
                  Hideo Joho and
                  Hang Li and
                  Arjen P. de Vries and
                  Ryen W. White},
  title        = {Anserini: Enabling the Use of Lucene for Information Retrieval Research},
  booktitle    = {Proceedings of the 40th International {ACM} {SIGIR} Conference on
                  Research and Development in Information Retrieval, Shinjuku, Tokyo,
                  Japan, August 7-11, 2017},
  pages        = {1253--1256},
  publisher    = {{ACM}},
  year         = {2017},
  url          = {https://doi.org/10.1145/3077136.3080721},
  doi          = {10.1145/3077136.3080721},
  timestamp    = {Sun, 12 Nov 2023 02:10:03 +0100},
  biburl       = {https://dblp.org/rec/conf/sigir/Yang0L17.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

This extension was built as part of the PyTerrier project:

Citation

Macdonald et al. PyTerrier: Declarative Experimentation in Python from BM25 to Dense Retrieval. CIKM 2021. https://doi.org/10.1145/3459637.3482013
@inproceedings{DBLP:conf/cikm/MacdonaldTMO21,
  author       = {Craig Macdonald and
                  Nicola Tonellotto and
                  Sean MacAvaney and
                  Iadh Ounis},
  editor       = {Gianluca Demartini and
                  Guido Zuccon and
                  J. Shane Culpepper and
                  Zi Huang and
                  Hanghang Tong},
  title        = {PyTerrier: Declarative Experimentation in Python from {BM25} to Dense
                  Retrieval},
  booktitle    = {{CIKM} '21: The 30th {ACM} International Conference on Information
                  and Knowledge Management, Virtual Event, Queensland, Australia, November
                  1 - 5, 2021},
  pages        = {4526--4533},
  publisher    = {{ACM}},
  year         = {2021},
  url          = {https://doi.org/10.1145/3459637.3482013},
  doi          = {10.1145/3459637.3482013},
  timestamp    = {Tue, 16 Aug 2022 23:04:38 +0200},
  biburl       = {https://dblp.org/rec/conf/cikm/MacdonaldTMO21.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

This extension was written by Sean MacAvaney at the University of Glasgow and is based on an original implementation, written by Craig Macdonald, that was part of PyTerrier. See the GitHub repository for a full list of contributors.