API Documentation

Splade is the primary way to interact with this package:

class pyterrier_splade.Splade(model='naver/splade-cocondenser-ensembledistil', tokenizer=None, agg='max', max_length=256, device=None)

A SPLADE model that provides transformers for sparsely encoding documents and queries and for scoring documents.

Initializes the SPLADE model.

Parameters:
  • model (Module | str) – the SPLADE model to use, either a PyTorch model or a string to load from HuggingFace

  • tokenizer – the tokenizer to use, if not included in the model

  • agg – the aggregation function to use for the SPLADE model

  • max_length – the maximum length of the input sequences

  • device – the device to use, e.g. ‘cuda’ or ‘cpu’
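For intuition about the agg parameter, the sketch below shows max aggregation as it is usually described for SPLADE: the weight of each vocabulary term is the maximum, over input positions, of log(1 + ReLU(logit)). This is a plain-Python illustration under that assumption, not this package's code; the function name splade_max_agg and the toy logits are hypothetical.

```python
import math

def splade_max_agg(logits):
    """Aggregate per-position logits into one weight per vocabulary term.

    logits: a list with one row per input position; each row holds one
    score per vocabulary term.
    """
    vocab_size = len(logits[0])
    return [
        # ReLU, then log(1 + x), then the maximum over all positions.
        max(math.log1p(max(row[j], 0.0)) for row in logits)
        for j in range(vocab_size)
    ]

# Two input positions, three vocabulary terms (toy values).
toy_logits = [
    [2.0, -1.0, 0.0],
    [0.5, -3.0, 1.0],
]
weights = splade_max_agg(toy_logits)
```

A term whose logits are never positive ends up with weight 0, which is what makes the resulting representation sparse.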

doc_encoder(text_field='text', batch_size=100, sparse=True, verbose=False, scale=100)

Returns a transformer that encodes a text field into a document representation.

Return type:

Transformer

Parameters:
  • text_field – the text field to encode

  • batch_size – the batch size to use when encoding

  • sparse – if True, the output will be a dict of term frequencies, otherwise a dense vector

  • verbose – if True, show a progress bar

  • scale – the scale to apply to the term frequencies
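As a rough illustration of the sparse and scale parameters: assuming scale is a plain multiplier applied to each term weight (the parameter description does not say whether the result is also rounded), its effect on a sparse representation would look like the sketch below. The function name apply_scale and the toy weights are made up for the example.

```python
def apply_scale(term_weights, scale=100):
    """Multiply every weight in a {term: weight} dict by `scale`."""
    return {term: weight * scale for term, weight in term_weights.items()}

# A toy sparse representation: only terms with non-zero weight appear.
raw = {"chemical": 1.5, "reaction": 0.25}
scaled = apply_scale(raw)
```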

indexing(text_field='text', batch_size=100, sparse=True, verbose=False, scale=100)

Returns a transformer that encodes a text field into a document representation. The signature and parameters are identical to those of doc_encoder.

Return type:

Transformer

Parameters:
  • text_field – the text field to encode

  • batch_size – the batch size to use when encoding

  • sparse – if True, the output will be a dict of term frequencies, otherwise a dense vector

  • verbose – if True, show a progress bar

  • scale – the scale to apply to the term frequencies

query_encoder(batch_size=100, sparse=True, verbose=False, scale=100)

Returns a transformer that encodes a query field into a query representation.

Return type:

Transformer

Parameters:
  • batch_size – the batch size to use when encoding

  • sparse – if True, the output will be a dict of term frequencies, otherwise a dense vector

  • verbose – if True, show a progress bar

  • scale – the scale to apply to the term frequencies

query(batch_size=100, sparse=True, verbose=False, scale=100)

Returns a transformer that encodes a query field into a query representation. The signature and parameters are identical to those of query_encoder.

Return type:

Transformer

Parameters:
  • batch_size – the batch size to use when encoding

  • sparse – if True, the output will be a dict of term frequencies, otherwise a dense vector

  • verbose – if True, show a progress bar

  • scale – the scale to apply to the term frequencies

scorer(text_field='text', batch_size=100, verbose=False)

Returns a transformer that scores documents against queries.

Return type:

Transformer

Parameters:
  • text_field – the text field to score

  • batch_size – the batch size to use when scoring

  • verbose – if True, show a progress bar
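SPLADE relevance scores are conventionally the dot product between the query and document term-weight vectors. The sketch below shows that computation on sparse dicts and uses it to order two documents. It assumes this is what the scorer computes; the helper name sparse_dot and the example weights are hypothetical.

```python
def sparse_dot(query_rep, doc_rep):
    """Dot product of two sparse {term: weight} representations."""
    # Iterate over the smaller dict; terms missing from the other contribute 0.
    if len(query_rep) > len(doc_rep):
        query_rep, doc_rep = doc_rep, query_rep
    return sum(w * doc_rep.get(term, 0.0) for term, w in query_rep.items())

query = {"splade": 2.0, "model": 0.5}
docs = {
    "d1": {"splade": 1.0, "sparse": 3.0},
    "d2": {"model": 3.0, "splade": 0.5, "dense": 1.0},
}

# Re-rank document ids by descending score.
ranking = sorted(docs, key=lambda d: sparse_dot(query, docs[d]), reverse=True)
```

Only terms present in both representations contribute, so the cost depends on the number of non-zero terms rather than on the vocabulary size.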

encode(texts, rep='d', format='dict', scale=1.0)

Encodes a batch of texts into their SPLADE representations.

Return type:

List[Dict[str, float]] | List[ndarray] | Tensor

Parameters:
  • texts (List[str]) – the list of texts to encode

  • rep (Literal['d', 'q']) – ‘q’ for query, ‘d’ for document

  • format (Literal['dict', 'np', 'torch']) – ‘dict’ for a dict of term frequencies, ‘np’ for a list of numpy arrays, ‘torch’ for a torch tensor

  • scale (float) – the scale to apply to the term frequencies
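A hedged usage sketch of encode, relying only on the parameters documented above. It takes an already constructed Splade instance as an argument, so nothing is loaded when the function is defined; the function name encode_examples and the example strings are illustrative.

```python
def encode_examples(splade):
    """`splade` is assumed to be a pyterrier_splade.Splade instance."""
    docs = [
        "SPLADE learns sparse lexical representations.",
        "PyTerrier composes retrieval pipelines.",
    ]
    # Document side: one {term: weight} dict per input text.
    doc_reps = splade.encode(docs, rep='d', format='dict')
    # Query side: the same call, requesting a torch tensor instead of dicts.
    query_reps = splade.encode(["what is splade"], rep='q', format='torch')
    return doc_reps, query_reps
```

With the package installed, this could be called as encode_examples(pyterrier_splade.Splade()), which loads the default model named in the constructor signature.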

Utils / Internals

class pyterrier_splade.Toks2Doc(mult=100.0)

Converts a toks field into a text field by scaling each term's weight by mult and repeating the term that many times.

Initializes the transformer.

Parameters:

mult (float) – the multiplier to apply to the term frequencies

transform(inp)

Converts the toks field into a text field.

Return type:

DataFrame

Parameters:

inp (DataFrame)
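The conversion described for Toks2Doc can be sketched in plain Python. The sketch assumes each weight is multiplied by mult, truncated to an integer, and the term repeated that many times. The truncation and the space-joined output are assumptions, and toks_to_text is a hypothetical helper rather than the class's actual code.

```python
def toks_to_text(toks, mult=100.0):
    """Turn a {term: weight} dict into text by repeating each term."""
    words = []
    for term, weight in toks.items():
        # Repeat the term int(weight * mult) times (assumed truncation).
        words.extend([term] * int(weight * mult))
    return " ".join(words)

text = toks_to_text({"splade": 1.5, "sparse": 0.5}, mult=2.0)
```

Repeating terms in this way lets an indexer that only counts term occurrences recover the learned weights as term frequencies.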

class pyterrier_splade.SpladeEncoder(splade, text_field, out_field, rep, sparse=True, batch_size=100, verbose=False, scale=1.0)

Encodes a text field using a SPLADE model. The output is a dense or sparse representation of the text field.

Initializes the SPLADE encoder.

Parameters:
  • splade (Splade) – pyterrier_splade.Splade instance

  • text_field (str) – the input text field to encode

  • out_field (str) – the output field to store the encoded representation

  • rep (Literal['q', 'd']) – ‘q’ for query, ‘d’ for document

  • sparse (bool) – if True, the output will be a dict of term frequencies, otherwise a dense vector

  • batch_size (int) – the batch size to use when encoding

  • verbose (bool) – if True, show a progress bar

  • scale (float) – the scale to apply to the term frequencies

transform(df)

Encodes the text field in the input DataFrame.

Return type:

DataFrame

Parameters:

df (DataFrame)

class pyterrier_splade.SpladeScorer(*args, **kwargs)

Scores (re-ranks) documents against queries using a SPLADE model.

Initializes the SPLADE scorer.

Parameters:
  • splade – pyterrier_splade.Splade instance

  • text_field – the text field to score

  • batch_size – the batch size to use when scoring

  • verbose – if True, show a progress bar

transform(df)

Scores (re-ranks) the documents against the queries in the input DataFrame.

Return type:

DataFrame

Parameters:

df (DataFrame)