Pseudo-Relevance Feedback
Dense Pseudo-Relevance Feedback (PRF) is a technique for improving the effectiveness of a retrieval system by combining the original query vector with the vectors of the top-ranked documents. The idea is that the top-ranked documents are likely to be relevant, so their vectors can move the query vector closer to relevant content.
PyTerrier-DR provides two dense PRF implementations: AveragePrf and VectorPrf.
API Documentation
- class pyterrier_dr.AveragePrf(*, k=3)
Performs Average PRF (as described by Li et al.) by averaging the query_vec column with the doc_vec column of the top k documents.
- Parameters:
k – number of pseudo-relevant feedback documents (default: 3)
Expected Input Columns:
['qid', 'query_vec', 'docno', 'doc_vec']
Output Columns:
['qid', 'query_vec']
(Any other query columns from the input are also included in the output.)
Example:
prf_pipe = model >> index >> index.vec_loader() >> pyterrier_dr.AveragePrf() >> index
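To make the averaging concrete, the following is a minimal sketch of what Average PRF computes for a single query. It assumes that the query vector and each of the top k document vectors receive equal weight, which is one natural reading of the description above. It uses plain Python lists rather than PyTerrier-DR itself; the function name average_prf_sketch and the toy vectors are illustrative and not part of the library.

```python
def average_prf_sketch(query_vec, doc_vecs, k=3):
    """Average the query vector with the top-k document vectors.

    Assumption: doc_vecs is ordered by rank (best first) and every
    vector, including the query, gets equal weight.
    """
    # Pool the original query vector with the k feedback vectors.
    vectors = [query_vec] + list(doc_vecs[:k])
    dim = len(query_vec)
    # Element-wise mean over the pooled vectors.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy 2-dimensional vectors; the fourth document is ignored when k=3.
query_vec = [1.0, 0.0]
doc_vecs = [[0.0, 1.0], [1.0, 1.0], [2.0, 2.0], [9.0, 9.0]]

new_query_vec = average_prf_sketch(query_vec, doc_vecs, k=3)
print(new_query_vec)  # [1.0, 1.0]
```

In the pipeline above, this step would run once per qid over the rows produced by index.vec_loader(), and the updated query_vec would then be used by the final index stage.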
Citation
Li et al. Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls. ACM Trans. Inf. Syst. 2023. https://doi.org/10.1145/3570724
@article{DBLP:journals/tois/0009MZKZ23,
  author    = {Hang Li and Ahmed Mourad and Shengyao Zhuang and Bevan Koopman and Guido Zuccon},
  title     = {Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls},
  journal   = {{ACM} Trans. Inf. Syst.},
  volume    = {41},
  number    = {3},
  pages     = {62:1--62:40},
  year      = {2023},
  url       = {https://doi.org/10.1145/3570724},
  doi       = {10.1145/3570724},
  timestamp = {Fri, 21 Jul 2023 22:26:51 +0200},
  biburl    = {https://dblp.org/rec/journals/tois/0009MZKZ23.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
- class pyterrier_dr.VectorPrf(*, alpha=1, beta=0.2, k=3)
Performs a Rocchio-esque PRF by linearly combining the query_vec column with the doc_vec column of the top k documents.
- Parameters:
alpha – weight of the original query_vec (default: 1)
beta – weight of doc_vec (default: 0.2)
k – number of pseudo-relevant feedback documents (default: 3)
Expected Input Columns:
['qid', 'query_vec', 'docno', 'doc_vec']
Output Columns:
['qid', 'query_vec']
(Any other query columns from the input are also included in the output.)
Example:
prf_pipe = model >> index >> index.vec_loader() >> pyterrier_dr.VectorPrf() >> index
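The linear combination can be illustrated with a small sketch for a single query. It assumes the feedback signal is the mean (centroid) of the top k document vectors, scaled by beta and added to the query vector scaled by alpha; whether the library uses the mean or the sum of the document vectors is not stated above, so treat that choice as an assumption. The function name vector_prf_sketch and the toy vectors are illustrative and not part of PyTerrier-DR.

```python
def vector_prf_sketch(query_vec, doc_vecs, alpha=1.0, beta=0.2, k=3):
    """Rocchio-style update: alpha * query + beta * centroid of top-k docs.

    Assumptions: doc_vecs is ordered by rank (best first), contains at
    least one vector, and the feedback term is the mean of the top-k.
    """
    top = list(doc_vecs[:k])
    dim = len(query_vec)
    # Centroid of the k pseudo-relevant document vectors.
    centroid = [sum(d[i] for d in top) / len(top) for i in range(dim)]
    # Weighted linear combination of the query and the centroid.
    return [alpha * q + beta * c for q, c in zip(query_vec, centroid)]


# Toy 2-dimensional vectors; the fourth document is ignored when k=3.
query_vec = [1.0, 0.0]
doc_vecs = [[0.0, 1.0], [1.0, 1.0], [2.0, 4.0], [9.0, 9.0]]

# beta=0.5 is chosen here only to keep the arithmetic easy to follow.
new_query_vec = vector_prf_sketch(query_vec, doc_vecs, alpha=1.0, beta=0.5, k=3)
print(new_query_vec)  # [1.5, 1.0]
```

Unlike the equal-weight average, alpha and beta control how far the query is allowed to drift toward the feedback documents: a small beta keeps the updated vector close to the original query.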
Citation
Li et al. Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls. ACM Trans. Inf. Syst. 2023. https://doi.org/10.1145/3570724