Input Validation
DataFrame Validation
When writing a transformer, it’s a good idea to check that its input has the expected columns
before using it. The pt.validate module provides functions for this.
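```python
import pandas as pd
import pyterrier as pt

class MyTransformer(pt.Transformer):
    def transform(self, inp: pd.DataFrame) -> pd.DataFrame:
        # e.g., expects a query frame with a query_vec column
        pt.validate.query_frame(inp, extra_columns=['query_vec'])
        # raises an error if the specification doesn't match
        ...  # use inp['query_vec'] and return the transformed frame
```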
Validation also underlies inspection: pyterrier.inspect.transformer_inputs() calls a transformer with an
empty DataFrame to determine which input columns it expects.
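| Function | Must have column(s) | Must NOT have column(s) |
|---|---|---|
| pt.validate.query_frame(inp, extra_columns) | qid + extra_columns | docno |
| pt.validate.document_frame(inp, extra_columns) | docno + extra_columns | qid |
| pt.validate.result_frame(inp, extra_columns) | qid + docno + extra_columns | |
| pt.validate.columns(inp, includes, excludes) | includes | excludes |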
Note
Besides providing helpful error messages to users, these methods also support the inspection of pipelines, e.g., for drawing schematic representations of pipelines and for checking that transformers are compatible before running them.
Iterable Validation
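For indexing pipelines that accept iterators of dicts rather than DataFrames, pt.validate provides iterator
variants of these checks (columns_iter, query_iter, document_iter, result_iter) that inspect the fields of
the first element. For this to work, you first need to wrap inp in pt.utils.peekable(), so that the first
element can be examined without being consumed.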
```python
import pyterrier as pt

my_iterator = [{'docno': 'doc1'}, {'docno': 'doc2'}, {'docno': 'doc3'}]
# wrap the iterable so that its first element can be peeked at
my_iterator = pt.utils.peekable(my_iterator)
pt.validate.columns_iter(my_iterator, includes=['docno'])  # passes
pt.validate.columns_iter(my_iterator, includes=['docno', 'toks'])  # raises an InputValidationError
```
Advanced Usage
Sometimes a transformer has multiple acceptable input specifications, e.g., if
it can act either as a retriever (taking a query frame as input) or as a re-ranker (taking a result frame as input).
In this case, you can specify each possible configuration inside a with pt.validate.any(inp) as v: block:
```python
import pandas as pd
import pyterrier as pt

class MyTransformer(pt.Transformer):
    def transform(self, inp: pd.DataFrame) -> pd.DataFrame:
        # accepts either a query frame (retrieval) or a result frame (re-ranking)
        with pt.validate.any(inp) as v:
            v.query_frame(extra_columns=['query'], mode='retrieve')
            v.result_frame(extra_columns=['query', 'text'], mode='rerank')
        # raises an error if none of the specifications match
        # v.mode is set to the FIRST specification that matches
        if v.mode == 'retrieve':
            ...  # retrieve documents for each query
        if v.mode == 'rerank':
            ...  # re-score the existing results
```
API Documentation
- pyterrier.validate.columns(inp, *, includes=None, excludes=None, warn=False, context=None)
  Check that the input frame has the expected columns.
  - Parameters:
    - inp (DataFrame | List[str]) – Input DataFrame or columns to validate
    - includes (List[str] | None) – List of required columns
    - excludes (List[str] | None) – List of forbidden columns
    - warn (bool) – If True, raise warnings instead of exceptions for validation errors
    - context (Transformer | None) – The transformer context for error messages
  - Return type: None
  - Raises:
    - InputValidationError – If warn=False and validation fails
    - InputValidationWarning – If warn=True and validation fails
  Changed in version 0.15.0: Accept inp as a List[str] of columns.
- pyterrier.validate.query_frame(inp, extra_columns=None, warn=False, context=None)
  Check that the input frame is a valid query frame.
  - Parameters:
    - inp (DataFrame | List[str]) – Input DataFrame or columns to validate
    - extra_columns (List[str] | None) – Additional required columns
    - warn (bool) – If True, raise warnings instead of exceptions for validation errors
    - context (Transformer | None) – The transformer context for error messages
  - Return type: None
  - Raises:
    - InputValidationError – If warn=False and validation fails
    - InputValidationWarning – If warn=True and validation fails
  Changed in version 0.15.0: Accept inp as a List[str] of columns.
- pyterrier.validate.result_frame(inp, extra_columns=None, warn=False, context=None)
  Check that the input frame is a valid result frame.
  - Parameters:
    - inp (DataFrame | List[str]) – Input DataFrame or columns to validate
    - extra_columns (List[str] | None) – Additional required columns
    - warn (bool) – If True, raise warnings instead of exceptions for validation errors
    - context (Transformer | None) – The transformer context for error messages
  - Return type: None
  - Raises:
    - InputValidationError – If warn=False and validation fails
    - InputValidationWarning – If warn=True and validation fails
  Changed in version 0.15.0: Accept inp as a List[str] of columns.
- pyterrier.validate.document_frame(inp, extra_columns=None, warn=False, context=None)
  Check that the input frame is a valid document frame.
  - Parameters:
    - inp (DataFrame | List[str]) – Input DataFrame or columns to validate
    - extra_columns (List[str] | None) – Additional required columns
    - warn (bool) – If True, raise warnings instead of exceptions for validation errors
    - context (Transformer | None) – The transformer context for error messages
  - Return type: None
  - Raises:
    - InputValidationError – If warn=False and validation fails
    - InputValidationWarning – If warn=True and validation fails
  Changed in version 0.15.0: Accept inp as a List[str] of columns.
- pyterrier.validate.any(inp, warn=False, context=None)
  Create a validation context manager for a DataFrame or list of columns to test multiple possible modes.
  - Parameters:
    - inp (DataFrame | List[str]) – Input DataFrame or list of columns to validate
    - warn (bool) – If True, raise warnings instead of exceptions for validation errors
    - context (Transformer | None) – The transformer for which to validate input, used for more informative error messages. This is optional, but recommended when validating inside a transformer.
  - Return type: _ValidationContextManager
- pyterrier.validate.columns_iter(inp, *, includes=None, excludes=None, warn=False, context=None)
  Check that the input iterator has the expected columns.
  - Parameters:
    - inp (PeekableIter) – Input iterator to validate
    - includes (List[str] | None) – List of required columns
    - excludes (List[str] | None) – List of forbidden columns
    - warn (bool) – If True, raise warnings instead of exceptions for validation errors
    - context (Transformer | None) – The transformer context for error messages
  - Return type: None
  - Raises:
    - InputValidationError – If warn=False and validation fails
    - InputValidationWarning – If warn=True and validation fails
- pyterrier.validate.query_iter(inp, extra_columns=None, warn=False, context=None)
  Check that the input iterator is a valid query iterator.
  - Parameters:
    - inp (PeekableIter) – Input iterator to validate
    - extra_columns (List[str] | None) – Additional required columns
    - warn (bool) – If True, raise warnings instead of exceptions for validation errors
    - context (Transformer | None) – The transformer context for error messages
  - Return type: None
  - Raises:
    - InputValidationError – If warn=False and validation fails
    - InputValidationWarning – If warn=True and validation fails
- pyterrier.validate.document_iter(inp, extra_columns=None, warn=False, context=None)
  Check that the input iterator is a valid document iterator.
  - Parameters:
    - inp (PeekableIter) – Input iterator to validate
    - extra_columns (List[str] | None) – Additional required columns
    - warn (bool) – If True, raise warnings instead of exceptions for validation errors
    - context (Transformer | None) – The transformer context for error messages
  - Return type: None
  - Raises:
    - InputValidationError – If warn=False and validation fails
    - InputValidationWarning – If warn=True and validation fails
- pyterrier.validate.result_iter(inp, extra_columns=None, warn=False, context=None)
  Check that the input iterator is a valid result iterator.
  - Parameters:
    - inp (PeekableIter) – Input iterator to validate
    - extra_columns (List[str] | None) – Additional required columns
    - warn (bool) – If True, raise warnings instead of exceptions for validation errors
    - context (Transformer | None) – The transformer context for error messages
  - Return type: None
  - Raises:
    - InputValidationError – If warn=False and validation fails
    - InputValidationWarning – If warn=True and validation fails
- pyterrier.validate.any_iter(inp, warn=False, context=None)
  Create a validation context manager for an iterator to test multiple possible modes.
  - Parameters:
    - inp (PeekableIter) – Input iterator to validate
    - warn (bool) – If True, raise warnings instead of exceptions for validation errors
    - context (Transformer | None) – The transformer for which to validate input, used for more informative error messages. This is optional, but recommended when validating inside a transformer.
  - Return type: _IterValidationContextManager