BMP + PyTerrier
BMP (Block-Max Pruning) is a retrieval approach and software package that provides fast exact and approximate sparse search functionality. It was introduced in the following article:
Citation
Mallia et al. Faster Learned Sparse Retrieval with Block-Max Pruning. SIGIR 2024. https://doi.org/10.1145/3626772.3657906
@inproceedings{DBLP:conf/sigir/MalliaST24,
author = {Antonio Mallia and
Torsten Suel and
Nicola Tonellotto},
editor = {Grace Hui Yang and
Hongning Wang and
Sam Han and
Claudia Hauff and
Guido Zuccon and
Yi Zhang},
title = {Faster Learned Sparse Retrieval with Block-Max Pruning},
booktitle = {Proceedings of the 47th International {ACM} {SIGIR} Conference on
Research and Development in Information Retrieval, {SIGIR} 2024, Washington
DC, USA, July 14-18, 2024},
pages = {2411--2415},
publisher = {{ACM}},
year = {2024},
url = {https://doi.org/10.1145/3626772.3657906},
doi = {10.1145/3626772.3657906},
timestamp = {Sun, 19 Jan 2025 13:11:15 +0100},
biburl = {https://dblp.org/rec/conf/sigir/MalliaST24.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
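For a rough sense of the idea behind the name, the following is a minimal sketch of block-max pruning in plain Python. It is not BMP's implementation: the function names (`build_blocks`, `dot`, `search`), the block size, and the toy term weights are all assumptions made for illustration, and the sketch assumes non-negative weights. Documents are grouped into blocks, each block stores the maximum weight of every term it contains, and at query time blocks are visited in descending order of their score upper bound until no remaining block can improve the top-k.

```python
import heapq
from collections import defaultdict


def build_blocks(docs, block_size):
    """Group documents into fixed-size blocks and record per-block term maxima."""
    blocks = []
    for start in range(0, len(docs), block_size):
        members = docs[start:start + block_size]
        block_max = defaultdict(float)
        for _, vec in members:
            for term, weight in vec.items():
                block_max[term] = max(block_max[term], weight)
        blocks.append((dict(block_max), members))
    return blocks


def dot(query, vec):
    """Sparse dot product between two {term: weight} dictionaries."""
    return sum(weight * vec.get(term, 0.0) for term, weight in query.items())


def search(blocks, query, k):
    """Exact top-k search that skips blocks whose upper bound cannot help."""
    # With non-negative weights, the dot product against the block maxima
    # is an upper bound on the score of every document in that block.
    bounds = sorted(
        ((dot(query, block_max), i) for i, (block_max, _) in enumerate(blocks)),
        reverse=True,
    )
    heap = []  # min-heap of (score, docno); heap[0] is the current threshold
    visited = 0
    for bound, i in bounds:
        if len(heap) == k and bound <= heap[0][0]:
            break  # no remaining block can beat the current top-k
        visited += 1
        for docno, vec in blocks[i][1]:
            score = dot(query, vec)
            if len(heap) < k:
                heapq.heappush(heap, (score, docno))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, docno))
    return sorted(heap, reverse=True), visited


# Toy data: made-up weights, not the output of any real model.
docs = [
    ('d1', {'a': 1.0, 'b': 0.5}),
    ('d2', {'a': 0.2}),
    ('d3', {'c': 0.1}),
    ('d4', {'c': 0.2}),
]
blocks = build_blocks(docs, block_size=2)
top, visited = search(blocks, {'a': 1.0, 'b': 2.0}, k=1)
print(top, visited)  # only the first block is scored for this query
```

In this toy run the second block has an upper bound of zero for the query, so its documents are never scored. The paper cited above describes the actual data structures and the approximate variants.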
Overview
BMP provides a PyTerrier-compatible interface, which is covered in this documentation. You can install it with pip:
pip install bmp[pyterrier]
bmp.pyterrier.BmpIndex is an artifact that provides indexing and retrieval functionality. Most of the time, you will use BmpIndex in conjunction with a learned sparse retrieval (LSR) model, such as SPLADE.
from bmp.pyterrier import BmpIndex
from pyt_splade import Splade
index = BmpIndex('my_index.bmp') # [1]
model = Splade() # [2]
indexing_pipeline = model >> index.indexer() # [3]
indexing_pipeline.index([
    {'docno': '1', 'text': 'My document'},
    {'docno': '2', 'text': 'Another document'},
])
[1] Specify the path that you want to index to. The .bmp extension is optional.
[2] Load a learned sparse retrieval model. Here we use SPLADE, but you can use any LSR model that you wish.
[3] The indexing pipeline first encodes documents with SPLADE, then adds them to the BMP index.
from bmp.pyterrier import BmpIndex
from pyt_splade import Splade
index = BmpIndex('my_index.bmp') # [1]
model = Splade() # [2]
retrieval_pipeline = model >> index.retriever() # [3]
retrieval_pipeline.search('my query')
[1] Specify the path to a BMP index that you built.
[2] Load the learned sparse model that you used to build your index.
[3] The retrieval pipeline first encodes queries with SPLADE, then retrieves over the BMP index.
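Both pipelines follow the same pattern: encode text into weighted terms, then either store those terms or score against them. The sketch below mimics that flow in plain Python so the data shapes are visible without installing anything. Everything in it is an assumption for illustration: `toy_encode` is a hypothetical stand-in that uses raw term counts rather than SPLADE, and the dictionary-based inverted index is not BMP's on-disk format.

```python
from collections import Counter, defaultdict


def toy_encode(text):
    """Hypothetical stand-in for an LSR encoder: text -> {term: weight}."""
    counts = Counter(text.lower().split())
    return {term: float(count) for term, count in counts.items()}


def build_index(docs):
    """Encode each document, then add its weighted terms to an inverted index."""
    postings = defaultdict(list)  # term -> [(docno, weight), ...]
    for doc in docs:
        for term, weight in toy_encode(doc['text']).items():
            postings[term].append((doc['docno'], weight))
    return postings


def retrieve(postings, query, k=10):
    """Encode the query with the same encoder, then rank by sparse dot product."""
    scores = defaultdict(float)
    for term, q_weight in toy_encode(query).items():
        for docno, d_weight in postings.get(term, []):
            scores[docno] += q_weight * d_weight
    # Highest score first; ties broken by docno for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


docs = [
    {'docno': '1', 'text': 'My document'},
    {'docno': '2', 'text': 'Another document'},
]
postings = build_index(docs)
results = retrieve(postings, 'my query')
print(results)
```

Because scores are dot products over shared terms, the query must be encoded with the same model that produced the indexed document weights, which is why the retrieval example loads the same model used at indexing time.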