Neural Rankers and Rerankers¶
PyTerrier is designed for ease of integration with neural ranking models, such as BERT. In short, a neural re-ranker that takes the text of the query and the text of a document can be expressed easily as a pyterrier.apply transformer (see pyterrier.apply - Custom Transformers).
More complex rankers (for instance, those that can be trained within PyTerrier, or that take advantage of batching to speed up GPU operations) typically require more complex integrations. We maintain separate repositories with integrations of well-known neural re-ranking platforms (CEDR, ColBERT).
Indexing, Retrieval and Scoring of Text using Terrier¶
If you are using a Terrier index for your first-stage ranking, you will want to record the text of the documents in the MetaIndex. More of PyTerrier’s support for operating on text is documented in Working with Document Texts.
Available Neural Dense Retrieval and Re-ranking Integrations¶
OpenNIR integrates with PyTerrier; see its notebook examples.
PyTerrier_ColBERT contains a ColBERT integration, including both a text scorer and an end-to-end dense retrieval component.
PyTerrier_ANCE contains an ANCE integration for end-to-end dense retrieval.
PyTerrier_T5 contains a monoT5 integration.
PyTerrier_GenRank contains RankVicuna and RankZephyr integrations.
PyTerrier_doc2query contains a docT5query integration.
PyTerrier_DeepCT contains a DeepCT integration.
The separate PyTerrier_BERT repository includes a CEDR integration (including “vanilla” BERT models), as well as an earlier ColBERTPipeline integration.
An initial BERT-QE integration is available.
The following example ranking pipeline uses ColBERT to re-rank documents in PyTerrier. Long documents are broken up into passages using a sliding-window operation, and the final score for each document is the maximum score over its constituent passages:
import pyterrier as pt
from pyterrier_bert.colbert import ColBERTPipeline

# DPH_br_body is assumed to be a first-stage retriever, defined elsewhere, that also returns the document text.
pipeline = DPH_br_body >> \
    pt.text.sliding() >> \
    ColBERTPipeline("/path/to/checkpoint") >> \
    pt.text.max_passage()
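To make the passage logic of this pipeline concrete, here is a library-free sketch of the two steps around the scorer: splitting a document into overlapping windows, then taking the maximum passage score as the document score. It is not PyTerrier's implementation. The function names, whitespace tokenisation, and the window length and stride values are assumptions chosen for illustration.

```python
def sliding_passages(text, length=150, stride=75):
    """Split whitespace-tokenised text into overlapping passages.

    Each passage holds at most `length` tokens and consecutive passages
    start `stride` tokens apart. Short texts yield a single passage.
    """
    tokens = text.split()
    passages = []
    for start in range(0, max(len(tokens), 1), stride):
        passages.append(" ".join(tokens[start:start + length]))
        if start + length >= len(tokens):
            break  # this window already reaches the end of the text
    return passages


def max_passage_score(text, score_passage, length=150, stride=75):
    """Score a document as the maximum score over its passages."""
    return max(score_passage(p) for p in sliding_passages(text, length, stride))


# Toy usage with small windows and a hypothetical scorer that counts
# occurrences of the token "g" in a passage.
doc = "a b c d e f g"
passages = sliding_passages(doc, length=4, stride=2)
doc_score = max_passage_score(doc, lambda p: p.split().count("g"), length=4, stride=2)
print(passages, doc_score)
```

Taking the maximum means a long document is ranked by its single best-matching passage, so relevant content is not diluted by unrelated text elsewhere in the document.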
Outlook¶
We continue to work on improving the integration of neural rankers and re-rankers within PyTerrier.