API Documentation
Splade is the primary way to interact with this package:
- class pyterrier_splade.Splade(model='naver/splade-cocondenser-ensembledistil', tokenizer=None, agg='max', max_length=256, device=None)
A SPLADE model, which provides transformers for sparsely encoding documents and queries and for scoring documents.
Initializes the SPLADE model.
- Parameters:
model (Module | str) – the SPLADE model to use, either a PyTorch model or a string to load from HuggingFace
tokenizer – the tokenizer to use, if not included in the model
agg – the aggregation function to use for the SPLADE model
max_length – the maximum length of the input sequences
device – the device to use, e.g. ‘cuda’ or ‘cpu’
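A minimal construction sketch, assuming the package is installed and importable as pyterrier_splade (the module path shown in the class signature above). The helper name build_splade is hypothetical; the keyword arguments and their values are the documented defaults. The import sits inside the function so that nothing is loaded until the helper is called.

```python
def build_splade(device="cpu"):
    """Hypothetical helper: construct a Splade model with the documented defaults."""
    # Imported lazily so that defining this helper has no side effects;
    # passing a model name as a string loads the model from HuggingFace.
    import pyterrier_splade

    return pyterrier_splade.Splade(
        model="naver/splade-cocondenser-ensembledistil",  # documented default
        agg="max",        # documented default aggregation
        max_length=256,   # documented default maximum input length
        device=device,    # e.g. "cuda" or "cpu"
    )
```

Calling build_splade("cuda") would place the model on a GPU, if one is available.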
- doc_encoder(text_field='text', batch_size=100, sparse=True, verbose=False, scale=100)
Returns a transformer that encodes a text field into a document representation.
- Return type:
Transformer
- Parameters:
text_field – the text field to encode
batch_size – the batch size to use when encoding
sparse – if True, the output will be a dict of term frequencies, otherwise a dense vector
verbose – if True, show a progress bar
scale – the scale to apply to the term frequencies
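To illustrate the scale parameter, here is a sketch that assumes scale acts as a plain multiplier on each term weight in the sparse dict output. The function apply_scale and the sample weights are hypothetical, and any rounding the library may perform is not modeled.

```python
from typing import Dict


def apply_scale(weights: Dict[str, float], scale: float = 100.0) -> Dict[str, float]:
    """Multiply every term weight by scale (assumed behaviour, not library code)."""
    return {term: weight * scale for term, weight in weights.items()}


# Hypothetical SPLADE-style term weights for one document.
doc_weights = {"sparse": 1.5, "retrieval": 0.25}
scaled = apply_scale(doc_weights, scale=100)
print(scaled)
```

A larger scale preserves more of the distinction between small weights if they are later stored as integer term frequencies.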
- indexing(text_field='text', batch_size=100, sparse=True, verbose=False, scale=100)
Returns a transformer that encodes a text field into a document representation.
- Return type:
Transformer
- Parameters:
text_field – the text field to encode
batch_size – the batch size to use when encoding
sparse – if True, the output will be a dict of term frequencies, otherwise a dense vector
verbose – if True, show a progress bar
scale – the scale to apply to the term frequencies
- query_encoder(batch_size=100, sparse=True, verbose=False, scale=100)
Returns a transformer that encodes a query field into a query representation.
- Return type:
Transformer
- Parameters:
batch_size – the batch size to use when encoding
sparse – if True, the output will be a dict of term frequencies, otherwise a dense vector
verbose – if True, show a progress bar
scale – the scale to apply to the term frequencies
- query(batch_size=100, sparse=True, verbose=False, scale=100)
Returns a transformer that encodes a query field into a query representation.
- Return type:
Transformer
- Parameters:
batch_size – the batch size to use when encoding
sparse – if True, the output will be a dict of term frequencies, otherwise a dense vector
verbose – if True, show a progress bar
scale – the scale to apply to the term frequencies
- scorer(text_field='text', batch_size=100, verbose=False)
Returns a transformer that scores documents against queries.
- Return type:
Transformer
- Parameters:
text_field – the text field to score
batch_size – the batch size to use when scoring
verbose – if True, show a progress bar
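SPLADE relevance is conventionally the dot product between the sparse query and document vectors. The sketch below shows that computation on plain dicts, assuming this is what the scorer computes; the function sparse_dot and the sample data are hypothetical and are not the library's implementation.

```python
from typing import Dict


def sparse_dot(query: Dict[str, float], doc: Dict[str, float]) -> float:
    """Dot product of two sparse term-weight dicts over their shared terms."""
    # Iterate over the smaller dict to keep the loop short.
    small, large = (query, doc) if len(query) <= len(doc) else (doc, query)
    return sum(w * large[t] for t, w in small.items() if t in large)


query = {"neural": 2.0, "ranking": 1.0}
docs = {
    "d1": {"neural": 1.0, "ranking": 0.5, "model": 3.0},
    "d2": {"ranking": 2.0, "bm25": 1.5},
}
scores = {docno: sparse_dot(query, rep) for docno, rep in docs.items()}
# Re-rank: highest score first.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Terms that appear in only one of the two representations contribute nothing to the score.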
- encode(texts, rep='d', format='dict', scale=1.0)
Encodes a batch of texts into their SPLADE representations.
- Return type:
List[Dict[str, float]] | List[ndarray] | Tensor
- Parameters:
texts (List[str]) – the list of texts to encode
rep (Literal['d', 'q']) – ‘q’ for query, ‘d’ for document
format (Literal['dict', 'np', 'torch']) – ‘dict’ for a dict of term frequencies, ‘np’ for a list of numpy arrays, ‘torch’ for a torch tensor
scale (float) – the scale to apply to the term frequencies
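The relationship between the 'dict' format and the array formats can be sketched as follows, assuming that a dense representation holds one weight per vocabulary term and that absent terms have weight zero. The function dict_to_dense and the four-term vocabulary are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def dict_to_dense(weights: Dict[str, float], vocab: List[str]) -> List[float]:
    """Expand a sparse term-weight dict into a dense vector ordered by vocab."""
    return [weights.get(term, 0.0) for term in vocab]


vocab = ["bm25", "neural", "ranking", "sparse"]  # hypothetical tiny vocabulary
sparse_rep = {"neural": 1.2, "sparse": 0.7}
dense_rep = dict_to_dense(sparse_rep, vocab)
print(dense_rep)
```

The dict form stores only the non-zero entries, which is why it suits inverted-index style processing.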
Utils / Internals
- class pyterrier_splade.Toks2Doc(mult=100.0)
Converts a toks field into a text field by scaling the weights by mult and repeating the terms accordingly.
Initializes the transformer.
- Parameters:
mult (float) – the multiplier to apply to the term frequencies
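The conversion described above can be sketched in plain Python. This assumes each term is repeated int(weight * mult) times and the repetitions are joined with spaces; the truncation rule and the function name toks_to_text are assumptions of this sketch, not taken from the library.

```python
from typing import Dict


def toks_to_text(toks: Dict[str, float], mult: float = 100.0) -> str:
    """Repeat each term int(weight * mult) times and join with spaces."""
    parts = []
    for term, weight in toks.items():
        count = int(weight * mult)  # truncation is an assumption of this sketch
        parts.extend([term] * count)
    return " ".join(parts)


# Small mult keeps the output readable; the documented default is 100.0.
text = toks_to_text({"cat": 0.5, "dog": 0.25}, mult=4)
print(text)
```

The resulting pseudo-document lets a standard term-frequency indexer recover the weights from repeated terms.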
- class pyterrier_splade.SpladeEncoder(splade, text_field, out_field, rep, sparse=True, batch_size=100, verbose=False, scale=1.0)
Encodes a text field using a SPLADE model. The output is a dense or sparse representation of the text field.
Initializes the SPLADE encoder.
- Parameters:
splade (Splade) – pyterrier_splade.Splade instance
text_field (str) – the input text field to encode
out_field (str) – the output field to store the encoded representation
rep (Literal['q', 'd']) – ‘q’ for query, ‘d’ for document
sparse (bool) – if True, the output will be a dict of term frequencies, otherwise a dense vector
batch_size (int) – the batch size to use when encoding
verbose (bool) – if True, show a progress bar
scale (float) – the scale to apply to the term frequencies
- class pyterrier_splade.SpladeScorer(*args, **kwargs)
Scores (re-ranks) documents against queries using a SPLADE model.
Initializes the SPLADE scorer.
- Parameters:
splade – pyterrier_splade.Splade instance
text_field – the text field to score
batch_size – the batch size to use when scoring
verbose – if True, show a progress bar