BMP + PyTerrier

BMP (Block-Max Pruning) is a retrieval approach and software package that provides fast exact and approximate sparse search functionality. It was introduced in the following article:

Citation

Mallia et al. Faster Learned Sparse Retrieval with Block-Max Pruning. SIGIR 2024.
@inproceedings{DBLP:conf/sigir/MalliaST24,
  author       = {Antonio Mallia and
                  Torsten Suel and
                  Nicola Tonellotto},
  editor       = {Grace Hui Yang and
                  Hongning Wang and
                  Sam Han and
                  Claudia Hauff and
                  Guido Zuccon and
                  Yi Zhang},
  title        = {Faster Learned Sparse Retrieval with Block-Max Pruning},
  booktitle    = {Proceedings of the 47th International {ACM} {SIGIR} Conference on
                  Research and Development in Information Retrieval, {SIGIR} 2024, Washington
                  DC, USA, July 14-18, 2024},
  pages        = {2411--2415},
  publisher    = {{ACM}},
  year         = {2024},
  url          = {https://doi.org/10.1145/3626772.3657906},
  doi          = {10.1145/3626772.3657906},
  timestamp    = {Sun, 19 Jan 2025 13:11:15 +0100},
  biburl       = {https://dblp.org/rec/conf/sigir/MalliaST24.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
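
For intuition, the block-max pruning idea can be sketched in a few lines of plain Python. This is a toy illustration of the exact-search variant only, not the BMP package's implementation: the function names (`build_block_max`, `search`), the in-memory data layout, and the tiny collection are all assumptions made for the example, and term weights are assumed to be non-negative. Documents are grouped into fixed-size blocks, each block stores the maximum weight per term, and at query time blocks are visited in decreasing order of their score upper bound until no remaining block can beat the current top-k.

```python
import heapq
from collections import defaultdict


def build_block_max(docs, block_size):
    """Return ({term: [max weight in each block]}, number of blocks)."""
    n_blocks = (len(docs) + block_size - 1) // block_size
    block_max = defaultdict(lambda: [0.0] * n_blocks)
    for doc_id, doc in enumerate(docs):
        block = doc_id // block_size
        for term, weight in doc.items():
            if weight > block_max[term][block]:
                block_max[term][block] = weight
    return dict(block_max), n_blocks


def score(query, doc):
    """Sparse dot product between a query and a document."""
    return sum(q_weight * doc.get(term, 0.0) for term, q_weight in query.items())


def search(docs, block_max, n_blocks, block_size, query, k):
    """Exact top-k search; returns ([(score, doc_id), ...], blocks visited)."""
    # Upper bound on the score of any document inside each block.
    upper = [0.0] * n_blocks
    for term, q_weight in query.items():
        for block, max_weight in enumerate(block_max.get(term, ())):
            upper[block] += q_weight * max_weight

    # Visit the most promising blocks first.
    order = sorted(range(n_blocks), key=lambda b: upper[b], reverse=True)

    top = []  # min-heap of (score, doc_id); top[0] holds the current threshold
    visited = 0
    for block in order:
        # No remaining block can contain a document that beats the threshold.
        if len(top) == k and upper[block] <= top[0][0]:
            break
        visited += 1
        start = block * block_size
        end = min(start + block_size, len(docs))
        for doc_id in range(start, end):
            s = score(query, docs[doc_id])
            if len(top) < k:
                heapq.heappush(top, (s, doc_id))
            elif s > top[0][0]:
                heapq.heapreplace(top, (s, doc_id))
    return sorted(top, key=lambda item: (-item[0], item[1])), visited


# Toy collection: each document is a {term: weight} mapping.
toy_docs = [
    {'cat': 2.0, 'pet': 1.0},
    {'dog': 1.5, 'pet': 1.0},
    {'cat': 0.5},
    {'fish': 1.0},
    {'car': 2.0},
    {'car': 1.0, 'road': 1.0},
]
toy_block_max, toy_n_blocks = build_block_max(toy_docs, block_size=2)
toy_results, toy_visited = search(
    toy_docs, toy_block_max, toy_n_blocks, 2, {'cat': 1.0, 'pet': 0.5}, k=2
)
print(toy_results, toy_visited)
```

In this toy run only the first of the three blocks needs to be scored, because the upper bounds of the other two cannot exceed the score of the second-ranked document.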

Overview

BMP provides a PyTerrier-compatible interface, which is covered in this documentation. You can install it with pip:

pip install bmp[pyterrier]

bmp.pyterrier.BmpIndex is an artifact that provides indexing and retrieval functionality. Most of the time, you will use BmpIndex in conjunction with a learned sparse retrieval (LSR) model, such as SPLADE.

Indexing with BMP and SPLADE
from bmp.pyterrier import BmpIndex
from pyt_splade import Splade
index = BmpIndex('my_index.bmp') # [1]
model = Splade() # [2]
indexing_pipeline = model >> index.indexer() # [3]
indexing_pipeline.index([
    {'docno': '1', 'text': 'My document'},
    {'docno': '2', 'text': 'Another document'},
])
  1. Specify the path that you want to index to. The .bmp extension is optional.

  2. Load a learned sparse retrieval model. Here we use SPLADE, but you can use any LSR model that you wish.

  3. The indexing pipeline first encodes documents with SPLADE, then adds them to the BMP index.
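
Each document passed to the indexing pipeline is a dict with docno and text fields, and a docno is normally expected to identify a single document. When starting from a plain list of texts, a small generator can assign the identifiers. The helper below, `iter_docs`, is a hypothetical convenience function written for this example and is not part of BMP or PyTerrier:

```python
def iter_docs(texts, start=1):
    """Yield PyTerrier-style document dicts with unique string docnos."""
    for number, text in enumerate(texts, start=start):
        yield {'docno': str(number), 'text': text}


docs = list(iter_docs(['My document', 'Another document']))

# Sanity check: every docno appears exactly once.
assert len({doc['docno'] for doc in docs}) == len(docs)

# The records could then be passed to the pipeline built in the example above:
# indexing_pipeline.index(docs)
```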

Retrieval with BMP and SPLADE
from bmp.pyterrier import BmpIndex
from pyt_splade import Splade
index = BmpIndex('my_index.bmp') # [1]
model = Splade() # [2]
retrieval_pipeline = model >> index.retriever() # [3]
retrieval_pipeline.search('my query')
  1. Specify the path to a BMP index that you built.

  2. Load the learned sparse model that you used to build your index.

  3. The retrieval pipeline first encodes queries with SPLADE, then retrieves over the BMP index.

Additional Materials