SuiteEval Suites¶
BEIR¶
BEIR is a heterogeneous benchmark for zero-shot evaluation of information retrieval (IR) models across diverse tasks.
Citation
Thakur et al. BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models. arXiv 2021. [link]
@article{DBLP:journals/corr/abs-2104-08663,
author = {Nandan Thakur and
Nils Reimers and
Andreas R{\"{u}}ckl{\'{e}} and
Abhishek Srivastava and
Iryna Gurevych},
title = {{BEIR:} {A} Heterogenous Benchmark for Zero-shot Evaluation of Information
Retrieval Models},
journal = {CoRR},
volume = {abs/2104.08663},
year = {2021},
url = {https://arxiv.org/abs/2104.08663},
eprinttype = {arXiv},
eprint = {2104.08663},
timestamp = {Thu, 14 Oct 2021 09:14:46 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2104-08663.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Usage¶
from suiteeval.suite import BEIR
# `pipelines` is assumed to be defined beforehand (the systems under evaluation).
results = BEIR(pipelines)
NanoBEIR¶
NanoBEIR is a compact subset of BEIR for faster iteration.
Usage¶
from suiteeval.suite import NanoBEIR
results = NanoBEIR(pipelines)
LoTTE¶
LoTTE (Long-Tail Topic-stratified Evaluation) is a set of test collections focused on out-of-domain evaluation.
Citation
Santhanam et al. ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction. NAACL-HLT 2022. [link]
@inproceedings{DBLP:conf/naacl/SanthanamKSPZ22,
author = {Keshav Santhanam and
Omar Khattab and
Jon Saad{-}Falcon and
Christopher Potts and
Matei Zaharia},
editor = {Marine Carpuat and
Marie{-}Catherine de Marneffe and
Iv{\'{a}}n Vladimir Meza Ru{\'{\i}}z},
title = {ColBERTv2: Effective and Efficient Retrieval via Lightweight Late
Interaction},
booktitle = {Proceedings of the 2022 Conference of the North American Chapter of
the Association for Computational Linguistics: Human Language Technologies,
{NAACL} 2022, Seattle, WA, United States, July 10-15, 2022},
pages = {3715--3734},
publisher = {Association for Computational Linguistics},
year = {2022},
url = {https://doi.org/10.18653/v1/2022.naacl-main.272},
doi = {10.18653/V1/2022.NAACL-MAIN.272},
timestamp = {Mon, 01 Aug 2022 16:28:04 +0200},
biburl = {https://dblp.org/rec/conf/naacl/SanthanamKSPZ22.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Usage¶
from suiteeval.suite import Lotte
results = Lotte(pipelines)
BRIGHT¶
BRIGHT comprises 12 diverse datasets spanning biology, economics, robotics, math, code, and more. Queries can be long StackExchange posts or math or code questions.
Citation
Su et al. BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval. ICLR 2025. [link]
@inproceedings{DBLP:conf/iclr/SuYXSMWLSST0YA025,
author = {Hongjin Su and
Howard Yen and
Mengzhou Xia and
Weijia Shi and
Niklas Muennighoff and
Han{-}yu Wang and
Haisu Liu and
Quan Shi and
Zachary S. Siegel and
Michael Tang and
Ruoxi Sun and
Jinsung Yoon and
Sercan {\"{O}}. Arik and
Danqi Chen and
Tao Yu},
title = {{BRIGHT:} {A} Realistic and Challenging Benchmark for Reasoning-Intensive
Retrieval},
booktitle = {The Thirteenth International Conference on Learning Representations,
{ICLR} 2025, Singapore, April 24-28, 2025},
publisher = {OpenReview.net},
year = {2025},
url = {https://openreview.net/forum?id=ykuc5q381b},
timestamp = {Thu, 15 May 2025 17:19:05 +0200},
biburl = {https://dblp.org/rec/conf/iclr/SuYXSMWLSST0YA025.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Usage¶
from suiteeval.suite import BRIGHT
results = BRIGHT(pipelines)
MS MARCO (Document & Passage)¶
MS MARCO is a large-scale dataset for training and evaluating information retrieval models. These suites contain the TREC Deep Learning queries and relevance judgments for the document and passage retrieval tasks.
Citation
Craswell et al. MS MARCO: Benchmarking Ranking Models in the Large-Data Regime. SIGIR 2021. [link]
@inproceedings{DBLP:conf/sigir/CraswellMYCL21,
author = {Nick Craswell and
Bhaskar Mitra and
Emine Yilmaz and
Daniel Campos and
Jimmy Lin},
editor = {Fernando Diaz and
Chirag Shah and
Torsten Suel and
Pablo Castells and
Rosie Jones and
Tetsuya Sakai},
title = {{MS} {MARCO:} Benchmarking Ranking Models in the Large-Data Regime},
booktitle = {{SIGIR} '21: The 44th International {ACM} {SIGIR} Conference on Research
and Development in Information Retrieval, Virtual Event, Canada, July
11-15, 2021},
pages = {1566--1576},
publisher = {{ACM}},
year = {2021},
url = {https://doi.org/10.1145/3404835.3462804},
doi = {10.1145/3404835.3462804},
timestamp = {Sun, 02 Nov 2025 21:27:20 +0100},
biburl = {https://dblp.org/rec/conf/sigir/CraswellMYCL21.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Usage¶
from suiteeval.suite import MSMARCODocument, MSMARCOPassage
doc_results = MSMARCODocument(pipelines)
pas_results = MSMARCOPassage(pipelines)
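Every suite on this page is invoked the same way, as `Suite(pipelines)`, so several suites can be run over the same pipelines in a loop. The following is a minimal sketch of that pattern, not part of suiteeval: `run_suites` is a hypothetical helper, and because the type of the returned results is not documented here, each result is stored as an opaque value.

```python
from typing import Any, Callable, Dict


def run_suites(
    suites: Dict[str, Callable[[Any], Any]],
    pipelines: Any,
) -> Dict[str, Any]:
    """Call each suite on the same pipelines and collect the results by name.

    Hypothetical helper: it only assumes that a suite is callable as
    `suite(pipelines)`, which is how every suite on this page is used.
    """
    results: Dict[str, Any] = {}
    for name, suite in suites.items():
        # The result is kept as-is; its structure is not assumed here.
        results[name] = suite(pipelines)
    return results


# Intended usage (assumes suiteeval is installed and `pipelines` is defined):
# from suiteeval.suite import NanoBEIR, Lotte
# all_results = run_suites({"NanoBEIR": NanoBEIR, "LoTTE": Lotte}, pipelines)
```

Keying the results by a user-chosen name keeps the helper independent of how each suite labels its own output.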