Prompt Construction

This module provides classes for constructing prompts in a Retrieval-Augmented Generation (RAG) system. It includes functionality for aggregating context from multiple documents, and constructing prompts with system messages etc.

class pyterrier_rag.prompt.Concatenator(in_fields=['text'], out_field='qcontext', text_loader=None, intermediate_format=None, tokenizer=None, max_length=-1, max_elements=-1, max_per_context=-1, truncation_rate=50, aggregate_func=None, ordering_func=<function score_sort>)[source]

Transformer that concatenates specified fields from document records into a context string.

At query time, orders, loads text (if needed), and aggregates records into a single context.

Parameters:
  • in_fields (List[str]) – Fields to extract from each record. Defaults to [“text”].

  • out_field (str) – Name of the output context field. Defaults to “qcontext”.

  • text_loader (Callable, optional) – Function to load document text by doc ID.

  • intermediate_format (Callable, optional) – Formatter for individual records.

  • tokenizer (Any, optional) – Tokenizer used for length-based truncation.

  • max_length (int) – Max total token length of the context. Defaults to -1 (no limit).

  • max_elements (int) – Max number of records to include. Defaults to -1 (no limit).

  • max_per_context (int) – Max tokens per record.

  • truncation_rate (int) – Token drop rate during truncation. Defaults to 50.

  • aggregate_func (Callable, optional) – Custom aggregation function.

  • ordering_func (Callable) – Record ordering function before aggregation. Defaults to score_sort, which sorts by “score” descending.

Raises:

ValueError – If ‘text’ is in in_fields but no text_loader is set.

class pyterrier_rag.prompt.PromptTransformer(instruction=None, model_name_or_path=None, system_message=None, conversation_template=None, api_type=None, output_field='prompt', input_fields=['query', 'qcontext'], expects_logprobs=False, answer_extraction=None, raw_instruction=False)[source]

Transformer that constructs and formats prompts for conversational LLMs.

Parameters:
  • instruction (callable|str) – Template or function returning the instruction segment.

  • model_name_or_path (str, optional) – Model identifier for selecting conversation template.

  • system_message (str, optional) – System context message for the conversation.

  • conversation_template (Any, optional) – Preconfigured conversation template.

  • api_type (str, optional) – API format: ‘openai’,’gemini’,’vertex’,’reka’.

  • output_field (str) – Field name to store the generated prompt.

  • input_fields (List[str]) – Input record fields required to build the prompt.

  • expects_logprobs (bool) – Indicator for logprob-based backends.

  • answer_extraction (callable, optional) – Function to parse model outputs.

  • raw_instruction (bool) – If True, returns raw instruction without template.