pyterrier.anserini - Anserini/Lucene Support

Through its integration with pyserini, PyTerrier can incorporate results from the Lucene-based Anserini platform into retrieval pipelines.

class pyterrier.anserini.AnseriniBatchRetrieve(index_location, k=1000, wmodel='BM25', **kwargs)[source]

Allows retrieval from an Anserini index. To use this class, PyTerrier should have been started using pt.init(boot_packages=["io.anserini:anserini:0.22.0:fatjar"]).

Constructs an AnseriniBatchRetrieve retriever using pyserini.search.lucene.LuceneSearcher.

Parameters:
  • index_location (str) – The location of the Anserini index.

  • wmodel (str) –

    Weighting models supported by Anserini. There are three options:

    • "BM25" - the BM25 weighting model

    • "QLD" - Dirichlet language modelling

    • "TFIDF" - Lucene's ClassicSimilarity.

  • k (int) – number of results to return. Default is 1000.
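
The parameter constraints above can be checked before an index is opened. The following is a minimal sketch, not part of PyTerrier: SUPPORTED_WMODELS and check_retrieve_args are hypothetical names introduced here for illustration, and the requirement that k be positive is an assumption.

```python
# Hypothetical helper, not part of PyTerrier: validates arguments
# against the three weighting models listed above.
SUPPORTED_WMODELS = ("BM25", "QLD", "TFIDF")


def check_retrieve_args(wmodel="BM25", k=1000):
    """Return (wmodel, k) unchanged if valid, else raise ValueError."""
    if wmodel not in SUPPORTED_WMODELS:
        raise ValueError(
            f"wmodel must be one of {SUPPORTED_WMODELS}, got {wmodel!r}"
        )
    # Assumption: a result cut-off only makes sense as a positive integer.
    if not isinstance(k, int) or k <= 0:
        raise ValueError(f"k must be a positive integer, got {k!r}")
    return wmodel, k
```

Failing early in this way gives a readable Python error instead of one raised later from the Java side.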

transform(queries)[source]

Performs the retrieval.

Parameters:

queries – A string for a single query, a list of query strings, or a pandas.DataFrame with columns=['qid', 'query']

Returns:

pandas.DataFrame with columns=['qid', 'docno', 'rank', 'score']
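
The three accepted input shapes can be pictured as all reducing to rows of qid and query. The sketch below illustrates this with plain dicts rather than a pandas.DataFrame. normalize_queries is a hypothetical name, and the qid numbering ("1", "2", ...) used for string and list input is an assumption rather than documented behaviour.

```python
# Illustrative only: mimics the accepted input shapes with plain dicts
# instead of a pandas.DataFrame. The qid numbering ("1", "2", ...) for
# string and list input is an assumption, not taken from the docs above.
def normalize_queries(queries):
    """Return a list of {'qid': str, 'query': str} rows."""
    if isinstance(queries, str):
        # A single query string becomes one row.
        return [{"qid": "1", "query": queries}]
    rows = []
    for i, q in enumerate(queries, start=1):
        if isinstance(q, dict):
            # The row already carries its own qid and query.
            rows.append({"qid": str(q["qid"]), "query": q["query"]})
        else:
            # A bare string in a list is numbered by its position.
            rows.append({"qid": str(i), "query": q})
    return rows


rows = normalize_queries(["chemical reactions", "black holes"])
```

Each returned row corresponds to one row of the ['qid', 'query'] frame that the retriever would operate on.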

Examples

Comparative retrieval from Anserini and Terrier:

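import pyterrier as pt
# Assumes pt.init(...) has already been called with the Anserini boot package, as described above.

trIndex = "/path/to/data.properties"
luceneIndex = "/path/to/lucene-index-dir"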

BM25_tr = pt.BatchRetrieve(trIndex, wmodel="BM25")
BM25_ai = pt.anserini.AnseriniBatchRetrieve(luceneIndex, wmodel="BM25")

pt.Experiment([BM25_tr, BM25_ai], topics, qrels, eval_metrics=["map"])

AnseriniBatchRetrieve can also be used as a re-ranker:

BM25_tr = pt.BatchRetrieve(trIndex, wmodel="BM25")
QLD_ai = pt.anserini.AnseriniBatchRetrieve(luceneIndex, wmodel="QLD")

pipe = BM25_tr >> QLD_ai
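
In the pipeline above, the second stage re-scores only the documents returned by the first. The idea can be sketched in plain Python, independent of PyTerrier: rerank and toy_score are hypothetical names, the scores are made-up toy values, and ranks starting at 0 is an assumption.

```python
def rerank(candidates, score_fn):
    """Re-score first-stage candidates and re-rank them within each query.

    candidates: list of dicts with 'qid', 'query' and 'docno' keys.
    score_fn:   hypothetical callable (query, docno) -> float standing in
                for the second-stage weighting model.
    """
    by_qid = {}
    for row in candidates:
        # Replace the first-stage score with the second-stage score.
        rescored = dict(row, score=score_fn(row["query"], row["docno"]))
        by_qid.setdefault(row["qid"], []).append(rescored)

    results = []
    for qid, rows in by_qid.items():
        rows.sort(key=lambda r: r["score"], reverse=True)
        for rank, r in enumerate(rows):  # ranks start at 0 (assumption)
            results.append(dict(r, rank=rank))
    return results


# Toy first-stage output: d1 is ranked above d2.
first_stage = [
    {"qid": "1", "query": "q", "docno": "d1", "score": 3.0},
    {"qid": "1", "query": "q", "docno": "d2", "score": 2.0},
]
# Toy second-stage scorer that prefers d2.
toy_score = lambda query, docno: {"d1": 0.1, "d2": 0.9}[docno]
reranked = rerank(first_stage, toy_score)
```

The candidate set is fixed by the first stage. The second stage changes only the scores and the order, so no document outside the first-stage results can appear in the final ranking.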