Dense Retrieval How-To Guides ============================================================ .. how-to:: How do I retrieve over a dense index? .. code-block:: python :caption: Retrieving over a dense index import pyterrier as pt import pyterrier_dr index = pyterrier_dr.FlexIndex('my_index.flex') # :footnote: Specify the path where you want the index to be stored. model = pyterrier_dr.SBertBiEncoder('sentence-transformers/all-MiniLM-L6-v2') # :footnote: Specify the model used to create the index. retr = model.query_encoder() >> index.retriever() # :footnote: Create a retrieval pipeline by chaining the query encoder and the retriever. results = retr.search('a single query') # or results = retr([ {'qid': '1', 'query': 'multiple queries'}, {'qid': '2', 'query': 'can be passed as a list of dicts'}, ]) .. how-to:: How do I index documents into a dense index? .. code-block:: python :caption: Indexing documents into a FlexIndex import pyterrier as pt import pyterrier_dr index = pyterrier_dr.FlexIndex('my_index.flex') # :footnote: Specify the path where you want the index to be stored. model = pyterrier_dr.SBertBiEncoder('sentence-transformers/all-MiniLM-L6-v2') # :footnote: Specify the model used to create the index. indexer = model.doc_encoder() >> index.indexer() # :footnote: Create an indexing pipeline by chaining the document encoder and the indexer. docs = [ # :footnote: ``docs`` can be any *iterable* of documents, including generators. This allows you to index collections that are too large to fit in memory at once. {'docid': 'doc1', 'text': 'This is the first document.'}, {'docid': 'doc2', 'text': 'This is the second document.'}, # Add more documents as needed ] indexer.index(docs) .. how-to:: How do I perform re-ranking using a dense index? .. _pyterrier-dr:how-to:dense-index-reranking: This example assumes that you already built a dense index for your collection. If you want to perform re-ranking "on-the-fly" for a dense model, check out :ref:`the next guide `. .. code-block:: python :caption: Re-ranking BM25 results using a FlexIndex import pyterrier as pt import pyterrier_dr sparse_index = pt.terrier.TerrierIndex('my_index.terrier') # :footnote: In this example, we use a sample sparse index with for initial retrieval. dense_index = pyterrier_dr.FlexIndex('my_index.flex') # :footnote: Specify the path where you want the index to be stored. model = pyterrier_dr.SBertBiEncoder('sentence-transformers/all-MiniLM-L6-v2') # :footnote: Specify the model used to create the index. retr = sparse_index.bm25() >> model.query_encoder() >> dense_index.scorer() # :footnote: Create a re-ranking pipeline by chaining an initial retriever, a query encoder, and a scorer. retr.search('my query') .. how-to:: How do I perform re-ranking using a dense model? .. _pyterrier-dr:how-to:dense-model-reranking: This example performs re-ranking "on-the-fly" using a dense model without requiring a dense index. If you want to perform re-ranking using a dense index, check out :ref:`the previous guide `. .. code-block:: python :caption: Re-ranking BM25 results using a dense model import pyterrier as pt import pyterrier_dr sparse_index = pt.terrier.TerrierIndex('my_index.terrier') # :footnote: In this example, we use a sample sparse index with for initial retrieval. model = pyterrier_dr.SBertBiEncoder('sentence-transformers/all-MiniLM-L6-v2') # :footnote: Specify the model you want to use as a re-ranker. retr = sparse_index.bm25(include_fields=['text']) >> model.text_scorer() # :footnote: Create a re-ranking pipeline by chaining an initial retriever and a text scorer. retr.search('my query')