BMP API Reference

class bmp.pyterrier.BmpIndex(path)[source]

Represents a Block-Max Pruning Index stored on disk.

Citation

Mallia et al. Faster Learned Sparse Retrieval with Block-Max Pruning. SIGIR 2024.
@inproceedings{DBLP:conf/sigir/MalliaST24,
  author       = {Antonio Mallia and
                  Torsten Suel and
                  Nicola Tonellotto},
  editor       = {Grace Hui Yang and
                  Hongning Wang and
                  Sam Han and
                  Claudia Hauff and
                  Guido Zuccon and
                  Yi Zhang},
  title        = {Faster Learned Sparse Retrieval with Block-Max Pruning},
  booktitle    = {Proceedings of the 47th International {ACM} {SIGIR} Conference on
                  Research and Development in Information Retrieval, {SIGIR} 2024, Washington
                  DC, USA, July 14-18, 2024},
  pages        = {2411--2415},
  publisher    = {{ACM}},
  year         = {2024},
  url          = {https://doi.org/10.1145/3626772.3657906},
  doi          = {10.1145/3626772.3657906},
  timestamp    = {Sun, 19 Jan 2025 13:11:15 +0100},
  biburl       = {https://dblp.org/rec/conf/sigir/MalliaST24.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
Parameters:

path (str) – Path to the index directory.

built()[source]

Checks whether the index has been built.

Returns:

True if the index exists on disk, False otherwise.

Return type:

bool

indexer(*, bsize=32, compress_range=False, scale_float=100.0)[source]

Creates a bmp.pyterrier.BmpIndexer for indexing documents.

Return type:

Indexer

Parameters:
  • bsize (int) – Block size for block-max pruning.

  • compress_range (bool) – Whether to compress the index.

  • scale_float (float) – Scaling factor for float token values into integers.

Returns:

The indexer instance.

Return type:

BmpIndexer
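
To make bsize and scale_float more concrete, here is a minimal standalone sketch. It is not BMP's actual implementation. It assumes that float weights are quantized by multiplying by scale_float and rounding, and that a block maximum is the largest quantized score among bsize consecutive documents. The helper names quantize and block_maxes are hypothetical.

```python
from typing import Dict, List


def quantize(toks: Dict[str, float], scale_float: float = 100.0) -> Dict[str, int]:
    """Scale float token weights to integers (assumed: multiply, then round)."""
    return {tok: int(round(w * scale_float)) for tok, w in toks.items()}


def block_maxes(scores: List[int], bsize: int = 32) -> List[int]:
    """Largest score within each run of `bsize` consecutive documents."""
    return [max(scores[i:i + bsize]) for i in range(0, len(scores), bsize)]


# One term's weights over six documents, quantized and grouped into blocks of 4.
weights = [0.25, 1.5, 0.0, 0.75, 2.0, 0.5]
scores = [int(round(w * 100.0)) for w in weights]
maxes = block_maxes(scores, bsize=4)
```

Under these assumptions, a smaller bsize gives tighter per-block bounds at the cost of storing more block maxima, and a larger scale_float preserves more precision of the original weights.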

index(inp)[source]

Index the documents with default settings.

Return type:

Artifact

Parameters:

inp (Iterable[Dict[str, Any]]) – An iterable of documents (dicts containing docno and toks keys) to index.
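
The input format can be sketched as follows. The example assumes that toks maps each token string to a float weight, as is common for learned sparse representations; the reference itself only states that the docno and toks keys are required. The check_docs helper is hypothetical and not part of bmp.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical documents: `toks` is assumed to map token -> weight.
docs = [
    {"docno": "d1", "toks": {"block": 1.2, "max": 0.8}},
    {"docno": "d2", "toks": {"pruning": 2.1, "block": 0.4}},
]


def check_docs(inp: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield documents unchanged, raising early if a required key is missing."""
    for i, doc in enumerate(inp):
        missing = {"docno", "toks"} - set(doc)
        if missing:
            raise KeyError(f"document {i} is missing keys: {sorted(missing)}")
        yield doc


checked = list(check_docs(docs))
```

Because check_docs is a generator, it could wrap a streaming corpus without holding every document in memory.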

retriever(*, num_results=1000, alpha=1.0, beta=1.0)[source]

Creates a bmp.pyterrier.BmpRetriever for this index.

Return type:

Transformer

Parameters:
  • num_results (int) – the number of results per query to retrieve.

  • alpha (float) – block termination threshold (terminate retrieval when the maximum block score is less than alpha of the threshold). Decreasing this value increases the chance that documents are missed, but speeds up retrieval by pruning more blocks. For exact retrieval, use alpha=1.0 (default).

  • beta (float) – query term pruning factor (keeps the top beta weight of query terms). Decreasing this value introduces score approximation error, but reduces computational cost. For exact scoring, use beta=1.0 (default).

Returns:

The retriever instance.
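
A rough sketch of how alpha might trade accuracy for speed, under stated assumptions rather than BMP's actual code: blocks are visited in descending order of their upper-bound score, and processing stops once alpha * upper_bound no longer exceeds the top-k threshold. This particular inequality is an assumption chosen to match the documented behaviour (a lower alpha prunes more blocks). The threshold is held fixed here for simplicity, and blocks_visited is a hypothetical helper.

```python
from typing import List


def blocks_visited(upper_bounds: List[float], threshold: float, alpha: float = 1.0) -> int:
    """Count blocks scored before the (assumed) termination test fires."""
    visited = 0
    for ub in sorted(upper_bounds, reverse=True):
        # Assumed test: stop when the scaled upper bound cannot beat the threshold.
        if alpha * ub <= threshold:
            break
        visited += 1
    return visited


bounds = [9.0, 7.5, 6.0, 4.0, 2.0]
exact = blocks_visited(bounds, threshold=5.0, alpha=1.0)   # visits 3 blocks
approx = blocks_visited(bounds, threshold=5.0, alpha=0.7)  # visits 2 blocks
```

In this toy setting the block with upper bound 6.0 is skipped when alpha=0.7 even though it could contain a document scoring above the threshold, which is the kind of miss the parameter description warns about.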

transform(inp)[source]

Retrieve documents from the index for the given queries using default settings (exact retrieval).

Return type:

DataFrame

Parameters:

inp (DataFrame) – A DataFrame containing queries with a query_toks column.

Returns:

DataFrame containing retrieved documents with docno, score, and rank columns.
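
The shape of the input and output can be illustrated with plain dictionaries standing in for DataFrame rows. This is a sketch only: the qid column and the 0-based rank follow common PyTerrier conventions and are assumptions here, as is the token-to-weight layout of query_toks. The helper to_result_rows is hypothetical.

```python
from typing import Dict, List

# Hypothetical input row: `qid` is assumed per PyTerrier convention;
# `query_toks` is assumed to map token -> weight.
query = {"qid": "q1", "query_toks": {"block": 1.0, "pruning": 0.5}}

# Hypothetical raw scores for three documents.
scores = {"d1": 1.2, "d2": 2.5, "d3": 0.7}


def to_result_rows(qid: str, doc_scores: Dict[str, float]) -> List[dict]:
    """Sort by descending score and attach a rank (assumed to start at 0)."""
    ordered = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [
        {"qid": qid, "docno": docno, "score": score, "rank": rank}
        for rank, (docno, score) in enumerate(ordered)
    ]


rows = to_result_rows(query["qid"], scores)
```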

load_into_memory()[source]

Loads the index into memory and returns a Searcher instance.

If the searcher is already loaded, it returns the existing instance.

Returns:

The in-memory searcher instance.

Return type:

Searcher

close()[source]

Closes the in-memory searcher if it exists.
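
The load-once behaviour of load_into_memory() and its pairing with close() can be pictured with the generic pattern below. LazySearcher is a hypothetical stand-in, not the BmpIndex implementation, and the loader callable is a placeholder for whatever actually reads the index from disk.

```python
from typing import Callable, Optional


class LazySearcher:
    """Hypothetical stand-in showing load-once caching and close()."""

    def __init__(self, loader: Callable[[], object]) -> None:
        self._loader = loader
        self._searcher: Optional[object] = None

    def load_into_memory(self) -> object:
        # Load only on first use; later calls return the cached instance.
        if self._searcher is None:
            self._searcher = self._loader()
        return self._searcher

    def close(self) -> None:
        # Drop the cached searcher so its memory can be reclaimed.
        self._searcher = None


loads = []
index = LazySearcher(lambda: loads.append("load") or object())
first = index.load_into_memory()
second = index.load_into_memory()  # reuses the cached instance
index.close()
third = index.load_into_memory()   # triggers a fresh load
```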

class bmp.pyterrier.BmpIndexer(bmp_index, bsize=32, compress_range=False, scale_float=100.0)[source]

An indexer for a BMP index.

Parameters:
  • bmp_index (BmpIndex) – BMP index object to create.

  • bsize (int) – Block size for block-max pruning.

  • compress_range (bool) – Whether to compress the index.

  • scale_float (float) – Scaling factor for float token values into integers.

Returns:

The indexer instance.

Return type:

BmpIndexer

index(inp)[source]

Index the documents with default settings.

Return type:

Artifact

Parameters:

inp (Iterable[Dict[str, Any]]) – An iterable of documents (dicts containing docno and toks keys) to index.

class bmp.pyterrier.BmpRetriever(bmp_index, *, num_results=1000, alpha=1.0, beta=1.0)[source]

A transformer that retrieves over a BMP index.

Parameters:
  • bmp_index (BmpIndex) – BMP index object to retrieve over.

  • num_results (int) – the number of results per query to retrieve.

  • alpha (float) – block termination threshold (terminate retrieval when the maximum block score is less than alpha of the threshold). Decreasing this value increases the chance that documents are missed, but speeds up retrieval by pruning more blocks. For exact retrieval, use alpha=1.0 (default).

  • beta (float) – query term pruning factor (keeps the top beta weight of query terms). Decreasing this value introduces score approximation error, but reduces computational cost. For exact scoring, use beta=1.0 (default).
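
The effect of beta can be sketched as query-term pruning. The exact rule is not spelled out in this reference; the example below assumes one possible reading, in which the ceil(beta * n) highest-weighted of the n query terms are kept. The prune_query helper is hypothetical and is not BMP's implementation.

```python
import math
from typing import Dict


def prune_query(query_toks: Dict[str, float], beta: float = 1.0) -> Dict[str, float]:
    """Keep the highest-weighted terms (assumed: ceil(beta * n) of them)."""
    keep = math.ceil(beta * len(query_toks))
    ranked = sorted(query_toks.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:keep])


query = {"block": 2.0, "max": 1.5, "pruning": 1.0, "the": 0.1}
full = prune_query(query, beta=1.0)  # all four terms survive
half = prune_query(query, beta=0.5)  # only the two heaviest terms survive
```

Dropping low-weight terms means their contributions never reach the document scores, which is the source of the approximation error mentioned above.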

transform(inp)[source]

Retrieve documents from the index for the given queries.

Return type:

DataFrame

Parameters:

inp (DataFrame) – A DataFrame containing queries with a query_toks column.

Returns:

DataFrame containing retrieved documents with docno, score, and rank columns.
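
Putting the pieces together, an end-to-end workflow might look like the sketch below. It uses only the calls documented in this reference; the import path is inferred from the class name bmp.pyterrier.BmpIndex, and the sample data, the qid column, and the token-to-weight layout of toks and query_toks are assumptions. The build_and_search function is hypothetical and has not been run against a real index.

```python
from typing import Any, Dict, List

# Hypothetical data; `toks`/`query_toks` are assumed to map token -> weight.
DOCS: List[Dict[str, Any]] = [
    {"docno": "d1", "toks": {"block": 1.2, "max": 0.8}},
    {"docno": "d2", "toks": {"pruning": 2.1, "block": 0.4}},
]
QUERIES: List[Dict[str, Any]] = [
    {"qid": "q1", "query_toks": {"block": 1.0, "pruning": 0.5}},
]


def build_and_search(path: str):
    """Index DOCS at `path` if needed, then retrieve for QUERIES."""
    # Imported lazily so this sketch can be loaded without bmp installed.
    import pandas as pd
    from bmp.pyterrier import BmpIndex

    index = BmpIndex(path)
    if not index.built():
        # Non-default indexing settings go through indexer(...).
        index.indexer(bsize=32, scale_float=100.0).index(DOCS)
    # Approximate retrieval: alpha below 1.0 prunes more blocks.
    retriever = index.retriever(num_results=10, alpha=0.9, beta=1.0)
    return retriever.transform(pd.DataFrame(QUERIES))
```

For exact retrieval with default settings, calling index.transform(...) directly on the BmpIndex would avoid constructing a retriever explicitly.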