Pseudo-Relevance Feedback

Dense pseudo-relevance feedback (PRF) is a technique for improving a retrieval system by expanding the original query vector with the vectors of the top-ranked documents. The idea is that the top-ranked documents are likely to be relevant, so their vectors carry information that can be used to refine the query representation.

PyTerrier-DR provides two dense PRF implementations: AveragePrf and VectorPrf.

API Documentation

class pyterrier_dr.AveragePrf(*, k=3)

Performs Average PRF (as described by Li et al.) by averaging the query_vec column with the doc_vec column of the top k documents.
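Read literally, this averaging treats the query vector as one more vector alongside the feedback documents. Writing q for query_vec and d_1, ..., d_k for the doc_vec values of the top k documents, the updated query vector would be as follows. This is our reading of the description above, not a formula taken from the library source:

```latex
\vec{q}\,' = \frac{1}{k+1}\left(\vec{q} + \sum_{i=1}^{k} \vec{d}_i\right)
```

Under this reading, the original query receives weight 1/(k+1), the same as each feedback document. Increasing k therefore dilutes the original query.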

Parameters:

k – number of pseudo-relevance feedback documents (default: 3)

Expected Input Columns: ['qid', 'query_vec', 'docno', 'doc_vec']

Output Columns: ['qid', 'query_vec'] (Any other query columns from the input are also included in the output.)

Example:

prf_pipe = model >> index >> index.vec_loader() >> pyterrier_dr.AveragePrf() >> index

Citation

Li et al. Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls. ACM Trans. Inf. Syst. 2023. https://doi.org/10.1145/3570724
@article{DBLP:journals/tois/0009MZKZ23,
  author       = {Hang Li and
                  Ahmed Mourad and
                  Shengyao Zhuang and
                  Bevan Koopman and
                  Guido Zuccon},
  title        = {Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers:
                  Successes and Pitfalls},
  journal      = {{ACM} Trans. Inf. Syst.},
  volume       = {41},
  number       = {3},
  pages        = {62:1--62:40},
  year         = {2023},
  url          = {https://doi.org/10.1145/3570724},
  doi          = {10.1145/3570724},
  timestamp    = {Fri, 21 Jul 2023 22:26:51 +0200},
  biburl       = {https://dblp.org/rec/journals/tois/0009MZKZ23.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
transform(inp)

Performs Average PRF on the input dataframe.

Return type:

DataFrame
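To make the averaging concrete, here is a minimal Python sketch of what such a transform computes on a result frame. It rests on several assumptions. It uses numpy and pandas. It expects rows ordered by rank within each qid, or a rank column to sort by. It expects one query_vec per qid. The function name average_prf_sketch is hypothetical and not part of PyTerrier-DR. Unlike the real transformer, it keeps only the qid and query_vec columns.

```python
import numpy as np
import pandas as pd


def average_prf_sketch(inp: pd.DataFrame, k: int = 3) -> pd.DataFrame:
    """Hypothetical Average PRF over a result frame (not the PyTerrier-DR code).

    Expects columns qid, query_vec, docno, doc_vec, with rows ordered by rank
    within each qid (or a 'rank' column to sort by).
    """
    if 'rank' in inp.columns:
        inp = inp.sort_values(['qid', 'rank'])
    qids, new_vecs = [], []
    for qid, group in inp.groupby('qid', sort=False):
        query_vec = group['query_vec'].iloc[0]       # one query vector per qid
        top_docs = list(group['doc_vec'].iloc[:k])   # up to k feedback vectors
        # Mean of the query vector together with the feedback vectors
        new_vecs.append(np.mean(np.stack([query_vec] + top_docs), axis=0))
        qids.append(qid)
    # Hypothetical simplification: only qid and the new query_vec are returned
    return pd.DataFrame({'qid': qids, 'query_vec': new_vecs})


# Toy input: one query, three retrieved documents (2-d vectors for readability)
df = pd.DataFrame({
    'qid': ['q1', 'q1', 'q1'],
    'query_vec': [np.array([1.0, 0.0])] * 3,
    'docno': ['d1', 'd2', 'd3'],
    'doc_vec': [np.array([0.0, 1.0]), np.array([1.0, 1.0]), np.array([5.0, 5.0])],
})
out = average_prf_sketch(df, k=2)
print(out['query_vec'].iloc[0])  # mean of [1, 0], [0, 1] and [1, 1]
```

With k=2, the third document's vector is ignored, so the outlier [5, 5] has no influence on the new query vector.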

class pyterrier_dr.VectorPrf(*, alpha=1, beta=0.2, k=3)

Performs a Rocchio-esque PRF by linearly combining the query_vec column with the doc_vec column of the top k documents.

Parameters:
  • alpha – weight of the original query_vec (default: 1)

  • beta – weight of the doc_vec contribution (default: 0.2)

  • k – number of pseudo-relevance feedback documents (default: 3)
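The parameters combine as in a Rocchio update. This assumes the k feedback vectors are averaged before being weighted, which is the usual Rocchio form but is our assumption rather than something stated above. Writing q for query_vec and d_1, ..., d_k for the doc_vec values of the top k documents, the new query vector would be:

```latex
\vec{q}\,' = \alpha\,\vec{q} + \beta \cdot \frac{1}{k}\sum_{i=1}^{k} \vec{d}_i
```

With the defaults alpha = 1 and beta = 0.2, the original query dominates and the feedback documents act as a small correction.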

Expected Input Columns: ['qid', 'query_vec', 'docno', 'doc_vec']

Output Columns: ['qid', 'query_vec'] (Any other query columns from the input are also included in the output.)

Example:

prf_pipe = model >> index >> index.vec_loader() >> pyterrier_dr.VectorPrf() >> index

Citation

Li et al. Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls. ACM Trans. Inf. Syst. 2023. https://doi.org/10.1145/3570724
transform(inp)

Performs Vector PRF on the input dataframe.

Return type:

DataFrame
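The Rocchio-style combination can be sketched in the same way. This is a minimal illustration under the same assumptions as a plain Average PRF sketch: numpy and pandas, rows ordered by rank within each qid (or a rank column to sort by), and one query_vec per qid. It also assumes the top-k document vectors are averaged before being weighted by beta. The function name vector_prf_sketch is hypothetical and not part of PyTerrier-DR.

```python
import numpy as np
import pandas as pd


def vector_prf_sketch(inp: pd.DataFrame, alpha: float = 1.0, beta: float = 0.2,
                      k: int = 3) -> pd.DataFrame:
    """Hypothetical Rocchio-style PRF over a result frame (not the PyTerrier-DR code)."""
    if 'rank' in inp.columns:
        inp = inp.sort_values(['qid', 'rank'])
    qids, new_vecs = [], []
    for qid, group in inp.groupby('qid', sort=False):
        query_vec = group['query_vec'].iloc[0]
        # Centroid of the (up to) k top-ranked document vectors (assumed averaging)
        centroid = np.mean(np.stack(list(group['doc_vec'].iloc[:k])), axis=0)
        # Linear combination: alpha * query + beta * feedback centroid
        new_vecs.append(alpha * query_vec + beta * centroid)
        qids.append(qid)
    return pd.DataFrame({'qid': qids, 'query_vec': new_vecs})


# Toy input: two queries with 2-d vectors; q2 has only one retrieved document
df = pd.DataFrame({
    'qid': ['q1', 'q1', 'q1', 'q2'],
    'query_vec': [np.array([1.0, 0.0])] * 3 + [np.array([0.0, 1.0])],
    'docno': ['d1', 'd2', 'd3', 'd4'],
    'doc_vec': [np.array([0.0, 1.0]), np.array([1.0, 1.0]),
                np.array([5.0, 5.0]), np.array([2.0, 0.0])],
})
out = vector_prf_sketch(df, k=2)
print(out)
```

Because beta is small relative to alpha, each updated vector stays close to its original query. Raising beta pulls it further towards the centroid of the feedback documents.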