Readers

Generic Reader

class pyterrier_rag.readers.Reader(backend, prompt='Use the context information to answer the Question: \\n Context: {{ qcontext }} \\n Question: {{ query }} \\n Answer:', output_field='qanswer')[source]

Bases: Transformer

Transformer that generates answers from context and queries using an LLM backend.

Combines a PromptTransformer with a Backend to produce text or logprobs, then applies answer extraction to return final responses.

Parameters:
  • backend (Backend or str) – A Backend instance or model identifier string.

  • prompt (PromptTransformer or str) – Prompt template or raw instruction.

  • output_field (str) – Field name in the output DataFrame for answers.

Raises:
  • ValueError – If the prompt expects logprobs but the backend does not support them.
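
To make the role of the prompt parameter concrete, the sketch below fills the default template from the signature above with one query and its context. It is only an illustration: render_prompt is a hypothetical helper that uses plain string replacement as a stand-in for whatever template engine PromptTransformer actually uses, and treating the `\\n` in the rendered signature as a newline is an assumption.

```python
# Default template copied from the Reader signature above.
# Assumption: the "\\n" shown in the rendered docs stands for a newline.
DEFAULT_PROMPT = (
    "Use the context information to answer the Question: \n"
    " Context: {{ qcontext }} \n"
    " Question: {{ query }} \n"
    " Answer:"
)


def render_prompt(template: str, fields: dict) -> str:
    """Fill {{ name }} placeholders by plain string replacement.

    Hypothetical stand-in for a real template engine, not the library's code.
    """
    rendered = template
    for name, value in fields.items():
        rendered = rendered.replace("{{ " + name + " }}", str(value))
    return rendered


# One row as it might look after retrieval and context concatenation.
row = {
    "query": "What is the capital of France?",
    "qcontext": "Paris is the capital of France.",
}

prompt_text = render_prompt(DEFAULT_PROMPT, row)
print(prompt_text)
```

The rendered string is what a backend would receive as input; the generated answer would then be stored under the column named by output_field ('qanswer' by default).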

Example using a local LLM:

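from pyterrier_rag.backend import Seq2SeqLMBackend
from pyterrier_rag.prompt import Concatenator
from pyterrier_rag.readers import Reader

# bm25_ret is assumed to be an existing PyTerrier BM25 retriever defined elsewhere.
flant5 = Reader(Seq2SeqLMBackend('google/flan-t5-base'))
# Keep the top 10 results, concatenate them into one context, then generate an answer.
bm25_flant5 = bm25_ret % 10 >> Concatenator() >> flant5
bm25_flant5.search("What is the capital of France?")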

Example using a remote LLM:

from pyterrier_rag.backend import OpenAIBackend
from pyterrier_rag.prompt import Concatenator
from pyterrier_rag.readers import Reader

# bm25_ret is assumed to be an existing PyTerrier BM25 retriever defined elsewhere.
llama = Reader(OpenAIBackend("llama-3-8b-instruct", api_key="your_api_key", base_url="your_base_url"))
bm25_llama = bm25_ret % 10 >> Concatenator() >> llama
bm25_llama.search("What is the capital of Italy?")

transform(inp)[source]

Parameters:
  • inp (DataFrame)

Return type:
  DataFrame

Specific Readers

class pyterrier_rag.readers.T5FiD(model_name_or_path, tokenizer_name_or_path=None, batch_size=4, text_field='text', text_max_length=256, num_context='auto', max_new_tokens=32, generation_config=None, verbose=False, device=None, **kwargs)[source]

T5 Fusion-in-Decoder (FiD) reader for PyTerrier-RAG.

Citation: Izacard and Grave, EACL 2021 (DBLP ID conf/eacl/IzacardG21).

Parameters:
  • model_name_or_path (str)

  • tokenizer_name_or_path (str)

  • batch_size (int)

  • text_field (str)

  • text_max_length (int)

  • num_context (int | str)

  • max_new_tokens (int)

  • generation_config (GenerationConfig)

  • verbose (bool)

  • device (str | device)

class pyterrier_rag.readers.BARTFiD(model_name_or_path, tokenizer_name_or_path=None, batch_size=4, text_field='text', text_max_length=256, num_context='auto', max_new_tokens=32, generation_config=None, verbose=False, device=None, **kwargs)[source]

BART Fusion-in-Decoder (FiD) reader for PyTerrier-RAG.

Citation: Izacard and Grave, EACL 2021 (DBLP ID conf/eacl/IzacardG21).

Parameters:
  • model_name_or_path (str)

  • tokenizer_name_or_path (str)

  • batch_size (int)

  • text_field (str)

  • text_max_length (int)

  • num_context (int | str)

  • max_new_tokens (int)

  • generation_config (GenerationConfig)

  • verbose (bool)

  • device (str | device)