Dense Retrieval How-To Guides


How do I retrieve over a dense index?

Retrieving over a dense index
import pyterrier as pt
import pyterrier_dr

index = pyterrier_dr.FlexIndex('my_index.flex') # [1]
model = pyterrier_dr.SBertBiEncoder('sentence-transformers/all-MiniLM-L6-v2') # [2]
retr = model.query_encoder() >> index.retriever() # [3]

results = retr.search('a single query')
# or
results = retr([
    {'qid': '1', 'query': 'multiple queries'},
    {'qid': '2', 'query': 'can be passed as a list of dicts'},
])
  1. Specify the path where the existing index is stored.

  2. Specify the model used to create the index.

  3. Create a retrieval pipeline by chaining the query encoder and the retriever.
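
To make the pipeline above more concrete, here is a minimal, library-free sketch of what a query encoder chained to a dense retriever does conceptually: the query becomes a vector, each indexed document vector is scored against it, and the best-scoring documents are returned. The function `top_k_dense`, the toy vectors, and the choice of dot-product similarity with exhaustive search are illustrative assumptions, not the pyterrier_dr implementation.

```python
import heapq

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def top_k_dense(query_vec, doc_vecs, k=2):
    """Score every document vector against the query and keep the k best.

    doc_vecs maps a document id to its (hypothetical) dense vector.
    Returns a list of (doc_id, score) pairs, best first.
    """
    scored = ((doc_id, dot(query_vec, vec)) for doc_id, vec in doc_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Toy 3-dimensional vectors standing in for real encoder output.
doc_vecs = {
    'doc1': [1.0, 0.0, 0.0],
    'doc2': [0.0, 1.0, 0.0],
    'doc3': [0.5, 0.5, 0.0],
}
results = top_k_dense([1.0, 0.25, 0.0], doc_vecs, k=2)
```

In a real index the vectors come from the encoder model and typically have hundreds of dimensions, so the scoring is done with vectorised array operations rather than Python loops.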


How do I index documents into a dense index?

Indexing documents into a FlexIndex
import pyterrier as pt
import pyterrier_dr

index = pyterrier_dr.FlexIndex('my_index.flex') # [1]
model = pyterrier_dr.SBertBiEncoder('sentence-transformers/all-MiniLM-L6-v2') # [2]
indexer = model.doc_encoder() >> index.indexer() # [3]

docs = [ # [4]
    {'docno': 'doc1', 'text': 'This is the first document.'},
    {'docno': 'doc2', 'text': 'This is the second document.'},
    # Add more documents as needed
]

indexer.index(docs)
  1. Specify the path where you want the index to be stored.

  2. Specify the model used to create the index.

  3. Create an indexing pipeline by chaining the document encoder and the indexer.

  4. docs can be any iterable of documents, including generators. This allows you to index collections that are too large to fit in memory at once.
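
As a sketch of the generator approach mentioned in the last note, the helper below lazily yields one document per line of a JSON Lines source. The name `iter_jsonl_docs` and the file name in the usage comment are hypothetical, and each line is assumed to hold a JSON object that already has the id and text fields your indexing pipeline expects.

```python
import json

def iter_jsonl_docs(lines):
    """Lazily yield one document dict per non-empty JSON line.

    `lines` can be any iterable of strings, such as an open file handle,
    so only one line needs to be held in memory at a time.
    """
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)

# Usage sketch (hypothetical file name):
# with open('collection.jsonl') as f:
#     indexer.index(iter_jsonl_docs(f))
```

Because the generator is consumed as the indexer pulls from it, the file must stay open until indexing has finished, which is why the call sits inside the `with` block.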


How do I perform re-ranking using a dense index?

This example assumes that you have already built a dense index for your collection. If you want to perform re-ranking “on-the-fly” with a dense model instead, check out the next guide.

Re-ranking BM25 results using a FlexIndex
import pyterrier as pt
import pyterrier_dr

sparse_index = pt.terrier.TerrierIndex('my_index.terrier') # [1]
dense_index = pyterrier_dr.FlexIndex('my_index.flex') # [2]
model = pyterrier_dr.SBertBiEncoder('sentence-transformers/all-MiniLM-L6-v2') # [3]
retr = sparse_index.bm25() >> model.query_encoder() >> dense_index.scorer() # [4]

retr.search('my query')
  1. In this example, we use a sample sparse index for initial retrieval.

  2. Specify the path where the dense index is stored.

  3. Specify the model used to create the index.

  4. Create a re-ranking pipeline by chaining an initial retriever, a query encoder, and a scorer.
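
The re-ranking pipeline can be pictured with a small, library-free sketch: the first stage supplies candidate documents, and the scorer replaces their scores with a similarity between the query vector and each candidate's stored vector. The `rerank` function, the toy vectors, and the use of dot-product similarity are assumptions made for illustration, not the pyterrier_dr implementation.

```python
def rerank(query_vec, candidates, doc_vecs):
    """Re-score first-stage candidates by dot product with the query vector.

    candidates: list of doc ids from the first stage (e.g. BM25), any order.
    doc_vecs: mapping from doc id to its stored dense vector (toy values here).
    Returns (doc_id, score) pairs sorted by descending dense score.
    """
    scored = [
        (doc_id, sum(q * d for q, d in zip(query_vec, doc_vecs[doc_id])))
        for doc_id in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

doc_vecs = {'doc1': [0.0, 1.0], 'doc2': [1.0, 0.0], 'doc3': [0.5, 0.5]}
bm25_candidates = ['doc1', 'doc2']  # doc3 was not retrieved, so it is never scored
reranked = rerank([1.0, 0.5], bm25_candidates, doc_vecs)
```

Only the candidates are scored, so a document the first stage missed (`doc3` in this toy data) cannot appear in the final ranking, however well it matches the query vector.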


How do I perform re-ranking using a dense model?

This example performs re-ranking “on-the-fly” using a dense model without requiring a dense index. If you want to perform re-ranking using a dense index, check out the previous guide.

Re-ranking BM25 results using a dense model
import pyterrier as pt
import pyterrier_dr

sparse_index = pt.terrier.TerrierIndex('my_index.terrier') # [1]
model = pyterrier_dr.SBertBiEncoder('sentence-transformers/all-MiniLM-L6-v2') # [2]
retr = sparse_index.bm25(include_fields=['text']) >> model.text_scorer() # [3]

retr.search('my query')
  1. In this example, we use a sample sparse index for initial retrieval.

  2. Specify the model you want to use as a re-ranker.

  3. Create a re-ranking pipeline by chaining an initial retriever and a text scorer.