LLM Backends ================= PyTerrier RAG supports a variety of LLM backends for generating responses. This functionality is facilitated by the :class:`~pyterrier_rag.Backend` interface, which currently has three implementations: :class:`~pyterrier_rag.backends.HuggingFaceBackend`, :class:`~pyterrier_rag.backends.VllmBackend`, and :class:`~pyterrier_rag.backends.OpenAIBackend`. This architecture also allows different components to share the same backend, which is particularly useful for multi-stage RAG pipelines. Basics ------------------------ Start by creating an instance of a backend. For example, using the :class:`~pyterrier_rag.backends.OpenAIBackend`: .. code-block:: python :caption: Create an instance of a :class:`~pyterrier_rag.backends.OpenAIBackend` import pyterrier_rag as ptr backend = ptr.OpenAIBackend('gpt-4o-mini', api_key="your_openai_api_key") # or loaded from OPENAI_API_KEY environment variable, if available The backend can be used to generate responses to prompts. For example, using the `generate` method: .. code-block:: python :caption: Generate a response using the backend backend.generate(["What is the capital of France?"]) # Outputs: [BackendOutput(text='The capital of France is Paris.', logprobs=None)] Backends also function as PyTerrier Transformers. By default, they take input from the ``prompt`` column and output to the ``qanswer`` column: .. code-block:: python :caption: Generate a response using the backend inp = pd.DataFrame([ {'prompt': 'What is the capital of France?'}, {'prompt': 'What is the capital of Germany?'}, ]) backend(inp) # prompt qanswer # 0 What is the capital of France? The capital of France is Paris. # 1 What is the capital of Germany? The capital of Germany is Berlin. Usually you won't use a backend directly though -- they are instead typically used by other components, such as Prompts and Frameworks. To set the global default backend, you can call :func:`pyterrier_rag.default_backend.set`. Note that this must be called before any components that use the default backend are used (i.e., if you are using ``default_backend``, we recommended setting it at the top of your script/notebook). .. code-block:: python :caption: Set the default backend import pyterrier_rag as ptr ptr.default_backend.set(ptr.OpenAIBackend('gpt-4o-mini')) ptr.default_backend.generate(['What is the capital of France?']) # -> uses the OpenAIBackend from above The default backend is automatically loaded via the ``PYTERRIER_RAG_DEFAULT_BACKEND`` (using :meth:`pyterrier_rag.Backend.from_dsn`) if it is set when PyTerrier RAG is first loaded. Token Probabilities ------------------------ Some components need the log probabilities of the generated tokens (and alternative tokens). This is included as part of the :class:`~pyterrier_rag.backend.BackendOutput` object when using `return_logprobs=True` in :meth:`~pyterrier_rag.Backend.generate` or by using :meth:`~pyterrier_rag.Backend.logprobs_generator`. For example: .. code-block:: python :caption: Include log probabilities for response backend.generate(["What is the capital of France?"], return_logprobs=True) # [BackendOutput(text='The capital of France is Paris.', logprobs=[ # {'The': -0.04, 'That': -0.31, ...}, # ..., # {'Paris': -0.01, 'Berlin': -2.12, ...}, # ..., # ])] inp = pd.DataFrame([ {'prompt': 'What is the capital of France?'}, {'prompt': 'What is the capital of Germany?'}, ]) generator = backend.logprobs_generator() generator(inp) # prompt qanswer qanswer_logprobs # 0 What is the capital of France? The capital of France is Paris. [{'The': -0.04, 'That': -0.31, ...}, ...] # 1 What is the capital of Germany? The capital of Germany is Berlin. [{'The': -0.02, 'That': -0.29, ...}, ...] This feature is typically most useful when a you have a single-token response. You can force the backend to generate a single token using ``max_new_tokens=1`` and a suitable prompt: .. code-block:: python :caption: Force a single token response backend.generate(["What is the capital of France? Answer in a single word only."], max_new_tokens=1, return_logprobs=True) # [BackendOutput(text='Paris', logprobs=[{'Paris': -0.01, 'Berlin': -2.12, ...}])] inp = pd.DataFrame([ {'prompt': 'What is the capital of France? Answer in a single word only.'}, {'prompt': 'What is the capital of Germany? Answer in a single word only.'}, ]) generator = backend.logprobs_generator(max_new_tokens=1) generator(inp) # prompt qanswer qanswer_logprobs # 0 What is the capital of France? Paris [{'Paris': -0.01, 'Berlin': -2.12, ...}] # 1 What is the capital of Germany? Berlin [{'Berlin': -0.02, 'Paris': -2.29, ...}] Reasoning ------------------------ Some models output reasoning steps (contained within a ```` tag) before the final answer. If you want to extract these reasoning steps, you can use the ``ReasoningExtractor`` transformer in your pipeline. .. code-block:: python :caption: Extract reasoning steps from a response from pyterrier_rag import OpenAIBackend, ReasoningExtractor # An example of a model that outputs reasoning steps in tags: backend = OpenAIBackend('deepseek-llama-3-8b-instruct', api_key="your_api_key", base_url="http://localhost:8000/v1") pipeline = backend >> ReasoningExtractor() # extract reasoning after running the backend inp = pd.DataFrame([ {'prompt': 'What is the capital of France?'}, {'prompt': 'What is the capital of Germany?'}, ]) reasoning_extractor(inp) # prompt qanswer reasoning # 0 What is the capital of France? Paris Ok, let me think about this. The capital of France is Paris. # 1 What is the capital of Germany? Berlin Ok, let me think about this. The capital of Germany is Berlin. API Documentation ------------------------ General ~~~~~~~~~~~~~~~~~~~~~~~~ .. autoclass:: pyterrier_rag.Backend :members: .. autoclass:: pyterrier_rag.backend.TextGenerator :members: .. autoclass:: pyterrier_rag.backend.BackendOutput :members: Implementations ~~~~~~~~~~~~~~~~~~~~~~~~ .. autoclass:: pyterrier_rag.HuggingFaceBackend :members: .. autoclass:: pyterrier_rag.VLLMBackend :members: .. autoclass:: pyterrier_rag.OpenAIBackend :members: