Prompt Construction¶

This module provides classes for constructing prompts in a Retrieval-Augmented Generation (RAG) system. It includes functionality for aggregating context from multiple documents, and constructing prompts with system messages etc.

class pyterrier_rag.prompt.Concatenator(in_fields=['text'], out_field='qcontext', text_loader=None, intermediate_format=None, tokenizer=None, max_length=-1, max_elements=-1, max_per_context=-1, truncation_rate=50, aggregate_func=None, ordering_func=<function score_sort>)[source]¶

Transformer that concatenates specified fields from document records into a context string.

At query time, orders, loads text (if needed), and aggregates records into a single context.

Parameters:

in_fields (List[str]) – Fields to extract from each record. Defaults to [“text”].
out_field (str) – Name of the output context field. Defaults to “qcontext”.
text_loader (Callable, optional) – Function to load document text by doc ID.
intermediate_format (Callable, optional) – Formatter for individual records.
tokenizer (Any, optional) – Tokenizer used for length-based truncation.
max_length (int) – Max total token length of the context. Defaults to -1 (no limit).
max_elements (int) – Max number of records to include. Defaults to -1 (no limit).
max_per_context (int) – Max tokens per record.
truncation_rate (int) – Token drop rate during truncation. Defaults to 50.
aggregate_func (Callable, optional) – Custom aggregation function.
ordering_func (Callable) – Record ordering function before aggregation. Defaults to score_sort, which sorts by “score” descending.

Raises:

ValueError – If ‘text’ is in in_fields but no text_loader is set.

class pyterrier_rag.prompt.PromptTransformer(instruction=None, model_name_or_path=None, system_message=None, conversation_template=None, api_type=None, output_field='prompt', input_fields=['query', 'qcontext'], expects_logprobs=False, answer_extraction=None, raw_instruction=False)[source]¶

Transformer that constructs and formats prompts for conversational LLMs.

Parameters:

instruction (callable|str) – Template or function returning the instruction segment.
model_name_or_path (str, optional) – Model identifier for selecting conversation template.
system_message (str, optional) – System context message for the conversation.
conversation_template (Any, optional) – Preconfigured conversation template.
api_type (str, optional) – API format: ‘openai’,’gemini’,’vertex’,’reka’.
output_field (str) – Field name to store the generated prompt.
input_fields (List[str]) – Input record fields required to build the prompt.
expects_logprobs (bool) – Indicator for logprob-based backends.
answer_extraction (callable, optional) – Function to parse model outputs.
raw_instruction (bool) – If True, returns raw instruction without template.