API Documentation

Splade is the primary way to interact with this package:

class pyterrier_splade.Splade(model='naver/splade-cocondenser-ensembledistil', tokenizer=None, agg='max', max_length=256, device=None)

A SPLADE model, which provides transformers for sparsely encoding documents and queries, and for scoring documents against queries.

Initializes the SPLADE model.

Parameters:

model (Module | str) – the SPLADE model to use, either a PyTorch model or a string to load from HuggingFace
tokenizer – the tokenizer to use, if not included in the model
agg – the aggregation function to use for the SPLADE model
max_length – the maximum length of the input sequences
device – the device to use, e.g. ‘cuda’ or ‘cpu’
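As a minimal sketch of constructing the model with the parameters listed above: the helper name `load_splade` is hypothetical, and the import is deferred into the function so that defining it does not require the package to be installed. The argument values simply mirror the defaults shown in the signature.

```python
def load_splade(device=None):
    """Build a Splade instance using the documented defaults (hypothetical helper)."""
    # Deferred import: pyterrier_splade is only needed when the helper is called.
    import pyterrier_splade

    return pyterrier_splade.Splade(
        model="naver/splade-cocondenser-ensembledistil",  # documented default model
        agg="max",        # documented default aggregation
        max_length=256,   # documented default maximum input length
        device=device,    # e.g. "cuda" or "cpu"; None is the documented default
    )
```

Calling `load_splade("cpu")` would load the model from HuggingFace, so it needs network access or a local cache.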

doc_encoder(text_field='text', batch_size=100, sparse=True, verbose=False, scale=100)

Returns a transformer that encodes a text field into a document representation.

Return type:

Transformer

Parameters:

text_field – the text field to encode
batch_size – the batch size to use when encoding
sparse – if True, the output will be a dict of term frequencies, otherwise a dense vector
verbose – if True, show a progress bar
scale – the scale to apply to the term frequencies
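To illustrate the difference between the two output forms selected by `sparse`, here is a small pure-Python sketch. It does not call the library: the toy vocabulary and the helper names `to_dense` and `to_sparse` are assumptions made for illustration, whereas a real representation is defined over the model's tokenizer vocabulary.

```python
from typing import Dict, List

# Toy vocabulary standing in for the model's tokenizer vocabulary (assumption).
VOCAB: List[str] = ["cat", "dog", "fish", "pet"]


def to_dense(sparse: Dict[str, float], vocab: List[str]) -> List[float]:
    """Expand a term -> weight dict into one weight per vocabulary entry."""
    return [sparse.get(term, 0.0) for term in vocab]


def to_sparse(dense: List[float], vocab: List[str]) -> Dict[str, float]:
    """Keep only the non-zero entries of a dense vector, keyed by term."""
    return {term: w for term, w in zip(vocab, dense) if w != 0.0}


doc_sparse = {"cat": 1.5, "pet": 0.5}
doc_dense = to_dense(doc_sparse, VOCAB)
```

The sparse form stores only the terms that received a weight, which is why it suits inverted-index style storage; the dense form has one slot per vocabulary entry.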

indexing(text_field='text', batch_size=100, sparse=True, verbose=False, scale=100)

Returns a transformer that encodes a text field into a document representation. The signature and description are the same as for doc_encoder.

Return type:

Transformer

Parameters:

text_field – the text field to encode
batch_size – the batch size to use when encoding
sparse – if True, the output will be a dict of term frequencies, otherwise a dense vector
verbose – if True, show a progress bar
scale – the scale to apply to the term frequencies

query_encoder(batch_size=100, sparse=True, verbose=False, scale=100)

Returns a transformer that encodes a query field into a query representation.

Return type:

Transformer

Parameters:

batch_size – the batch size to use when encoding
sparse – if True, the output will be a dict of term frequencies, otherwise a dense vector
verbose – if True, show a progress bar
scale – the scale to apply to the term frequencies

query(batch_size=100, sparse=True, verbose=False, scale=100)

Returns a transformer that encodes a query field into a query representation. The signature and description are the same as for query_encoder.

Return type:

Transformer

Parameters:

batch_size – the batch size to use when encoding
sparse – if True, the output will be a dict of term frequencies, otherwise a dense vector
verbose – if True, show a progress bar
scale – the scale to apply to the term frequencies

scorer(text_field='text', batch_size=100, verbose=False)

Returns a transformer that scores documents against queries.

Return type:

Transformer

Parameters:

text_field – the text field to score
batch_size – the batch size to use when scoring
verbose – if True, show a progress bar
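SPLADE-style relevance is commonly computed as a dot product between the sparse query and document representations. The following plain-Python sketch shows that idea and how it yields a re-ranking; it is not the library's implementation, and `sparse_dot` and the example weights are made up for illustration.

```python
from typing import Dict


def sparse_dot(query: Dict[str, float], doc: Dict[str, float]) -> float:
    """Sum of weight products over the terms shared by query and document."""
    # Iterate over the smaller dict and look terms up in the larger one.
    small, large = (query, doc) if len(query) <= len(doc) else (doc, query)
    return sum(w * large[t] for t, w in small.items() if t in large)


query = {"cat": 2.0, "food": 1.0}
docs = {
    "d1": {"cat": 1.5, "pet": 0.5},
    "d2": {"dog": 2.0, "food": 0.5},
}

# Re-rank the candidate documents: highest score first.
ranking = sorted(docs, key=lambda d: sparse_dot(query, docs[d]), reverse=True)
```

Only terms present in both representations contribute to the score, so documents sharing no weighted term with the query score zero.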

encode(texts, rep='d', format='dict', scale=1.0)

Encodes a batch of texts into their SPLADE representations.

Return type:

List[Dict[str, float]] | List[ndarray] | Tensor

Parameters:

texts (List[str]) – the list of texts to encode
rep (Literal['d', 'q']) – ‘q’ for query, ‘d’ for document
format (Literal['dict', 'np', 'torch']) – ‘dict’ for a dict of term frequencies, ‘np’ for a list of numpy arrays, ‘torch’ for a torch tensor
scale (float) – the scale to apply to the term frequencies
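A sketch of calling `encode` directly, using only the parameters documented above. The function `encode_pair` is a hypothetical helper; it accepts any object exposing an `encode` method with this signature, so the example exercises it with a stand-in class (`FakeSplade`, also hypothetical) rather than a loaded model.

```python
from typing import Dict, List, Tuple


def encode_pair(splade, query: str, doc: str) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Encode one query and one document as term -> weight dicts."""
    # rep="q" selects the query representation, rep="d" the document one.
    q_rep = splade.encode([query], rep="q", format="dict")[0]
    d_rep = splade.encode([doc], rep="d", format="dict")[0]
    return q_rep, d_rep


class FakeSplade:
    """Stand-in for demonstration only: weights each whitespace token by its count."""

    def encode(self, texts: List[str], rep: str = "d", format: str = "dict", scale: float = 1.0):
        out = []
        for text in texts:
            counts: Dict[str, float] = {}
            for tok in text.lower().split():
                counts[tok] = counts.get(tok, 0.0) + scale
            out.append(counts)
        return out


q_rep, d_rep = encode_pair(FakeSplade(), "cat food", "the cat ate the food")
```

With a real `Splade` instance in place of `FakeSplade`, the returned dicts would hold model-assigned weights over the tokenizer vocabulary instead of raw counts.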

Utils / Internals

class pyterrier_splade.Toks2Doc(mult=100.0)

Converts a toks field into a text field by scaling each token's weight by mult and repeating the token accordingly.

Initializes the transformer.

Parameters: mult (float) – the multiplier to apply to the term frequencies

transform(inp)

Converts the toks field into a text field.

Return type: DataFrame
Parameters: inp (DataFrame)
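The conversion can be pictured with the following sketch. It is an assumption based on the description (scale each weight by `mult`, then repeat the token that many times), not the library's code; in particular, truncating the scaled weight to an integer is a choice made here for illustration, and `toks_to_text` is a hypothetical name.

```python
from typing import Dict


def toks_to_text(toks: Dict[str, float], mult: float = 100.0) -> str:
    """Repeat each token int(weight * mult) times and join with spaces."""
    parts = []
    for token, weight in toks.items():
        count = int(weight * mult)  # truncation is an assumption of this sketch
        parts.extend([token] * count)
    return " ".join(parts)


text = toks_to_text({"cat": 1.5, "pet": 0.5}, mult=2.0)
```

Turning weights into repeated tokens lets an indexer that only counts term occurrences recover the weights as term frequencies.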

class pyterrier_splade.SpladeEncoder(splade, text_field, out_field, rep, sparse=True, batch_size=100, verbose=False, scale=1.0)

Encodes a text field using a SPLADE model. The output is a dense or sparse representation of the text field.

Initializes the SPLADE encoder.

Parameters:

splade (Splade) – pyterrier_splade.Splade instance
text_field (str) – the input text field to encode
out_field (str) – the output field to store the encoded representation
rep (Literal['q', 'd']) – ‘q’ for query, ‘d’ for document
sparse (bool) – if True, the output will be a dict of term frequencies, otherwise a dense vector
batch_size (int) – the batch size to use when encoding
verbose (bool) – if True, show a progress bar
scale (float) – the scale to apply to the term frequencies

transform(df)

Encodes the text field in the input DataFrame.

Return type: DataFrame
Parameters: df (DataFrame)
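The roles of `text_field`, `out_field`, and `batch_size` can be sketched without pandas as follows. This is an illustration under stated assumptions, not the library's implementation: rows are plain dicts rather than a DataFrame, `encode_rows` and `toy_encode` are hypothetical names, and the field name `toks` is used only as an example value for `out_field`.

```python
from typing import Callable, Dict, List

batch_sizes: List[int] = []  # records how many texts each encode call received


def toy_encode(texts: List[str]) -> List[Dict[str, float]]:
    """Stand-in encoder: gives every whitespace token a weight of 1.0."""
    batch_sizes.append(len(texts))
    return [{tok: 1.0 for tok in text.split()} for text in texts]


def encode_rows(
    rows: List[dict],
    encode_fn: Callable[[List[str]], List[Dict[str, float]]],
    text_field: str,
    out_field: str,
    batch_size: int = 100,
) -> List[dict]:
    """Return copies of rows with out_field filled from text_field, batch by batch."""
    out = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        encoded = encode_fn([row[text_field] for row in batch])
        for row, rep in zip(batch, encoded):
            out.append({**row, out_field: rep})  # input rows are left unmodified
    return out


rows = [
    {"docno": "d1", "text": "cat pet"},
    {"docno": "d2", "text": "dog"},
    {"docno": "d3", "text": "fish"},
]
encoded_rows = encode_rows(rows, toy_encode, text_field="text", out_field="toks", batch_size=2)
```

Three rows with a batch size of 2 result in two encoder calls, which is the trade-off `batch_size` controls: fewer, larger calls against lower peak memory.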

class pyterrier_splade.SpladeScorer(*args, **kwargs)

Scores (re-ranks) documents against queries using a SPLADE model.

Initializes the SPLADE scorer.

Parameters:

splade – pyterrier_splade.Splade instance
text_field – the text field to score
batch_size – the batch size to use when scoring
verbose – if True, show a progress bar

transform(df)

Scores (re-ranks) the documents against the queries in the input DataFrame.

Return type: DataFrame
Parameters: df (DataFrame)