Dense Retrieval How-To Guides
How do I retrieve over a dense index?
import pyterrier as pt
import pyterrier_dr
index = pyterrier_dr.FlexIndex('my_index.flex') # [1]
model = pyterrier_dr.SBertBiEncoder('sentence-transformers/all-MiniLM-L6-v2') # [2]
retr = model.query_encoder() >> index.retriever() # [3]
results = retr.search('a single query')
# or
results = retr([
    {'qid': '1', 'query': 'multiple queries'},
    {'qid': '2', 'query': 'can be passed as a list of dicts'},
])
[1] Specify the path of the dense index you want to search.
[2] Specify the model that was used to create the index.
[3] Create a retrieval pipeline by chaining the query encoder and the retriever.
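Conceptually, the pipeline above encodes each query into a vector and returns the documents whose stored vectors score highest against it. The following is a minimal, library-free sketch of that idea, not the pyterrier_dr implementation: `toy_encode` and `VOCAB` are hypothetical stand-ins for a neural encoder, and the in-memory `doc_vecs` dict stands in for the index.

```python
import math

VOCAB = ['dense', 'index', 'query', 'sparse', 'vector']  # toy vocabulary (assumption)

def toy_encode(text):
    """Hypothetical encoder: unit-normalized term counts over VOCAB."""
    tokens = text.lower().split()
    vec = [float(tokens.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, doc_vecs, k=2):
    """Score every stored document vector against the query and keep the top k."""
    q = toy_encode(query)
    scored = [{'docno': docno, 'score': dot(q, vec)} for docno, vec in doc_vecs.items()]
    scored.sort(key=lambda r: r['score'], reverse=True)
    return [dict(r, rank=i) for i, r in enumerate(scored[:k])]

# "Indexing": encode each document once and keep the vector.
docs = {
    'doc1': 'dense vector index',
    'doc2': 'sparse index',
    'doc3': 'query vector',
}
doc_vecs = {docno: toy_encode(text) for docno, text in docs.items()}

results = retrieve('dense vector', doc_vecs)
for r in results:
    print(r['rank'], r['docno'], round(r['score'], 3))
```

The sketch scores every document exhaustively, which is only practical for small collections.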
How do I index documents into a dense index?
import pyterrier as pt
import pyterrier_dr
index = pyterrier_dr.FlexIndex('my_index.flex') # [1]
model = pyterrier_dr.SBertBiEncoder('sentence-transformers/all-MiniLM-L6-v2') # [2]
indexer = model.doc_encoder() >> index.indexer() # [3]
docs = [ # [4]
    {'docno': 'doc1', 'text': 'This is the first document.'},
    {'docno': 'doc2', 'text': 'This is the second document.'},
    # Add more documents as needed
]
indexer.index(docs)
[1] Specify the path where you want the index to be stored.
[2] Specify the model used to create the index.
[3] Create an indexing pipeline by chaining the document encoder and the indexer.
[4] docs can be any iterable of documents, including generators. This allows you to index collections that are too large to fit in memory at once.
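As an illustration of indexing from a generator, here is a minimal sketch that streams documents from a JSON Lines file one at a time. The file layout (`id` and `contents` keys) and the helper name `iter_docs` are assumptions made for this example, and the output keys follow PyTerrier's usual `docno`/`text` convention.

```python
import json
import os
import tempfile

def iter_docs(path):
    """Lazily yield one document dict per line of a JSON Lines file."""
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # 'id' and 'contents' are assumed input keys for this sketch.
            yield {'docno': str(record['id']), 'text': record['contents']}

# Write a tiny sample file so the sketch runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), 'collection.jsonl')
with open(sample_path, 'w', encoding='utf-8') as f:
    f.write(json.dumps({'id': 1, 'contents': 'This is the first document.'}) + '\n')
    f.write(json.dumps({'id': 2, 'contents': 'This is the second document.'}) + '\n')

docs = iter_docs(sample_path)  # nothing is read until the generator is consumed
# With the pipeline from the guide, this would be passed as: indexer.index(docs)
first = next(docs)
print(first)
```

A generator can be consumed only once, so call `iter_docs` again if you need a second pass over the collection.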
How do I perform re-ranking using a dense index?
This example assumes that you already built a dense index for your collection. If you want to perform re-ranking “on-the-fly” for a dense model, check out the next guide.
import pyterrier as pt
import pyterrier_dr
sparse_index = pt.terrier.TerrierIndex('my_index.terrier') # [1]
dense_index = pyterrier_dr.FlexIndex('my_index.flex') # [2]
model = pyterrier_dr.SBertBiEncoder('sentence-transformers/all-MiniLM-L6-v2') # [3]
retr = sparse_index.bm25() >> model.query_encoder() >> dense_index.scorer() # [4]
retr.search('my query')
[1] In this example, we use a sample sparse index with BM25 for initial retrieval.
[2] Specify the path of the dense index you built for your collection.
[3] Specify the model used to create the index.
[4] Create a re-ranking pipeline by chaining an initial retriever, a query encoder, and a scorer.
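Unlike retrieval, re-ranking scores only the candidates returned by the first stage, looking up their precomputed vectors by document identifier. The following library-free sketch illustrates that idea under toy assumptions; `rerank_with_index`, the two-dimensional vectors, and the first-stage scores are all made up for illustration and are not part of pyterrier_dr.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rerank_with_index(query_vec, candidates, doc_vecs):
    """Re-score first-stage candidates using vectors already stored in an index."""
    rescored = [
        {'docno': c['docno'], 'score': dot(query_vec, doc_vecs[c['docno']])}
        for c in candidates
    ]
    rescored.sort(key=lambda r: r['score'], reverse=True)
    return [dict(r, rank=i) for i, r in enumerate(rescored)]

# Toy precomputed document vectors (stand-in for the dense index).
doc_vecs = {
    'doc1': [0.9, 0.1],
    'doc2': [0.2, 0.8],
    'doc3': [0.6, 0.4],
}
# Toy first-stage results; doc3 was not retrieved, so it is never scored.
candidates = [
    {'docno': 'doc2', 'score': 12.3},
    {'docno': 'doc1', 'score': 9.7},
]
query_vec = [1.0, 0.0]  # toy query vector

reranked = rerank_with_index(query_vec, candidates, doc_vecs)
print([(r['docno'], r['score']) for r in reranked])
```

Because only the candidate set is scored, a relevant document that the first stage misses cannot be recovered by the re-ranker.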
How do I perform re-ranking using a dense model?
This example performs re-ranking “on-the-fly” using a dense model without requiring a dense index. If you want to perform re-ranking using a dense index, check out the previous guide.
import pyterrier as pt
import pyterrier_dr
sparse_index = pt.terrier.TerrierIndex('my_index.terrier') # [1]
model = pyterrier_dr.SBertBiEncoder('sentence-transformers/all-MiniLM-L6-v2') # [2]
retr = sparse_index.bm25(include_fields=['text']) >> model.text_scorer() # [3]
retr.search('my query')
[1] In this example, we use a sample sparse index with BM25 for initial retrieval.
[2] Specify the model you want to use as a re-ranker.
[3] Create a re-ranking pipeline by chaining an initial retriever and a text scorer.
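With no dense index to consult, the re-ranker has to encode each candidate's text at query time, which is presumably why the first stage above is asked to include the `text` field. A minimal sketch of that flow follows; `toy_encode`, `VOCAB`, and `rerank_on_the_fly` are hypothetical stand-ins for illustration, not the library's implementation.

```python
import math

VOCAB = ['dense', 'model', 'rerank', 'sparse']  # toy vocabulary (assumption)

def toy_encode(text):
    """Hypothetical encoder: unit-normalized term counts over VOCAB."""
    tokens = text.lower().split()
    vec = [float(tokens.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def rerank_on_the_fly(query, candidates):
    """Encode the query and each candidate's text at query time, then re-score."""
    q = toy_encode(query)
    rescored = []
    for c in candidates:
        d = toy_encode(c['text'])  # requires the first stage to supply the text
        score = sum(x * y for x, y in zip(q, d))
        rescored.append({'docno': c['docno'], 'score': score})
    rescored.sort(key=lambda r: r['score'], reverse=True)
    return rescored

# Toy first-stage results that carry the document text.
candidates = [
    {'docno': 'doc1', 'text': 'sparse model'},
    {'docno': 'doc2', 'text': 'dense model rerank'},
]
reranked = rerank_on_the_fly('dense rerank', candidates)
print([r['docno'] for r in reranked])
```

The trade-off is that every candidate is encoded again for each query, so this approach saves index storage at the cost of extra work at query time.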