pyterrier.anserini - Anserini/Lucene Support
Through its integration with pyserini, PyTerrier can incorporate results from the Lucene-based Anserini platform into retrieval pipelines.
- class pyterrier.anserini.AnseriniBatchRetrieve(index_location, k=1000, wmodel='BM25', **kwargs)
Allows retrieval from an Anserini index. To use this class, PyTerrier must have been started using pt.init(boot_packages=["io.anserini:anserini:0.9.2:fatjar"]).
Construct an AnseriniBatchRetrieve retriever.
- Parameters
index_location (str) – The location of the Anserini index.
wmodel (str) – The weighting model to use. Default is "BM25". Anserini supports three options:
"BM25" - the BM25 weighting model
"QLD" - query likelihood with Dirichlet language modelling
"TFIDF" - Lucene's ClassicSimilarity
k (int) – Number of results to return. Default is 1000.
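To give a feel for how the three wmodel options differ, here is a minimal sketch of textbook versions of each scoring function for a single term in a single document. It is an illustration only: the function names, the parameter values (k1, b, mu), and the exact formulas are assumptions made for this sketch, and Lucene's and Anserini's actual implementations differ in detail.

```python
import math

# Textbook scoring functions for one query term in one document.
# All names and default parameter values here are illustrative assumptions,
# not Anserini's or Lucene's actual code.

def bm25(tf, df, doc_len, avg_doc_len, n_docs, k1=0.9, b=0.4):
    """BM25: saturating term frequency with document-length normalisation."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)

def qld(tf, cf, doc_len, collection_len, mu=1000.0):
    """Query likelihood with Dirichlet smoothing: log p(term | document)."""
    p_collection = cf / collection_len  # background probability of the term
    return math.log((tf + mu * p_collection) / (doc_len + mu))

def tfidf(tf, df, n_docs):
    """A simple TF-IDF variant: square-root term frequency times smoothed IDF."""
    idf = 1 + math.log((n_docs + 1) / (df + 1))
    return math.sqrt(tf) * idf

# Score a term occurring twice in a 100-token document.
print(bm25(tf=2, df=10, doc_len=100, avg_doc_len=100, n_docs=1000))
print(qld(tf=2, cf=50, doc_len=100, collection_len=100000))
print(tfidf(tf=2, df=10, n_docs=1000))
```

Note that the QLD score is a log-probability and is therefore negative, so scores from different weighting models are not directly comparable; only the rankings they induce are.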
Examples
Comparative retrieval from Anserini and Terrier:
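import pyterrier as pt
# Assumes PyTerrier has been started and that topics and qrels are already loaded.
trIndex = "/path/to/data.properties"
luceneIndex = "/path/to/lucene-index-dir"
# Build one BM25 retriever per platform.
BM25_tr = pt.BatchRetrieve(trIndex, wmodel="BM25")
BM25_ai = pt.anserini.AnseriniBatchRetrieve(luceneIndex, wmodel="BM25")
# Evaluate both systems on the same topics and qrels.
pt.Experiment([BM25_tr, BM25_ai], topics, qrels, eval_metrics=["map"])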
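The "map" metric requested above is mean average precision. The following minimal sketch shows the standard definition of that measure on toy data; it is not how PyTerrier computes it internally, and the function names, the run and qrels dictionaries, and the document identifiers are all hypothetical.

```python
def average_precision(ranked_docnos, relevant):
    """Average of precision@rank over the ranks where a relevant document appears."""
    hits = 0
    precision_sum = 0.0
    for rank, docno in enumerate(ranked_docnos, start=1):
        if docno in relevant:
            hits += 1
            precision_sum += hits / rank
    # Divide by the total number of relevant documents, retrieved or not.
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(run, qrels):
    """Mean of average precision over all queries that have relevance judgements."""
    aps = [average_precision(run.get(qid, []), rel) for qid, rel in qrels.items()]
    return sum(aps) / len(aps) if aps else 0.0

# Hypothetical toy data: ranked results per query and the relevant set per query.
run = {"q1": ["d1", "d2", "d3"], "q2": ["d5", "d4"]}
qrels = {"q1": {"d1", "d3"}, "q2": {"d4"}}

print(mean_average_precision(run, qrels))
```

Here q1 has relevant documents at ranks 1 and 3, giving (1/1 + 2/3) / 2, and q2 has its single relevant document at rank 2, giving 1/2; the mean of the two is about 0.667.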
AnseriniBatchRetrieve can also be used as a re-ranker:
BM25_tr = pt.BatchRetrieve(trIndex, wmodel="BM25")
QLD_ai = pt.anserini.AnseriniBatchRetrieve(luceneIndex, wmodel="QLD")
pipe = BM25_tr >> QLD_ai
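Conceptually, a re-ranking pipeline like the one above has a first stage that selects candidates and a second stage that rescores only those candidates. The sketch below illustrates that idea in plain Python on a toy collection; the DOCS dictionary, the two scoring rules, and the function names are hypothetical stand-ins and do not reflect PyTerrier's or Anserini's internals.

```python
# Toy collection; document ids and texts are made up for illustration.
DOCS = {
    "d1": "lucene index search",
    "d2": "terrier index retrieval",
    "d3": "cooking pasta at home",
    "d4": "search search search engine",
}

def retrieve(query, k):
    """First stage: score by number of distinct query terms matched, keep top k."""
    terms = set(query.split())
    scored = [(docno, len(terms & set(text.split()))) for docno, text in DOCS.items()]
    scored = [(docno, score) for docno, score in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # ties broken by docno
    return scored[:k]

def rerank(query, candidates):
    """Second stage: rescore only the given candidates by raw term frequency."""
    terms = query.split()
    rescored = [
        (docno, sum(DOCS[docno].split().count(t) for t in terms))
        for docno, _ in candidates
    ]
    rescored.sort(key=lambda pair: (-pair[1], pair[0]))
    return rescored

def pipeline(query, k):
    """Compose the two stages, analogous in spirit to first_stage >> second_stage."""
    return rerank(query, retrieve(query, k))

print(pipeline("search index", k=3))
print(pipeline("search index", k=2))
```

With k=3 the second stage promotes d4 to the top, but with k=2 the first stage never passes d4 on, so it cannot appear in the final ranking. This is the usual trade-off of re-ranking: the second stage can only reorder what the first stage retrieved.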