Input Validation

DataFrame Validation

When writing a transformer, it is good practice to check that its input is compatible before processing it. pt.validate provides functions for this.

DataFrame input validation in a Transformer
import pandas as pd
import pyterrier as pt

class MyTransformer(pt.Transformer):
    def transform(self, inp: pd.DataFrame) -> pd.DataFrame:
        # e.g., expects a query frame with query_vec
        pt.validate.query_frame(inp, extra_columns=['query_vec'])
        # raises an error if the specification doesn't match
        ...  # transformation logic goes here

Validation also underlies inspection: pyterrier.inspect.transformer_inputs() will call a transformer with an empty DataFrame to see what it expects.

Function                                              Must have column(s)          Must NOT have column(s)
----------------------------------------------------  ---------------------------  -----------------------
pt.validate.query_frame(inp, extra_columns=...)       qid + extra_columns          docno
pt.validate.document_frame(inp, extra_columns=...)    docno + extra_columns        qid
pt.validate.result_frame(inp, extra_columns=...)      qid + docno + extra_columns  (none)
pt.validate.columns(inp, includes=..., excludes=...)  includes                     excludes

Note

Besides providing helpful error messages to users, these methods also support pipeline inspection, e.g., for drawing schematic representations of pipelines and for checking that transformers are compatible before running them.
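The rules in the table above can be sketched in plain Python. This is only an illustration of the include/exclude logic, not PyTerrier's implementation: the names ColumnCheckError, check_columns, check_query_frame, check_document_frame and check_result_frame are hypothetical, and the sketch works on a list of column names rather than a DataFrame.

```python
from typing import Iterable, List, Optional


class ColumnCheckError(ValueError):
    """Hypothetical stand-in for the library's validation error."""


def check_columns(columns: Iterable[str],
                  includes: Optional[List[str]] = None,
                  excludes: Optional[List[str]] = None) -> None:
    """Raise if a required column is missing or a forbidden column is present."""
    present = set(columns)
    missing = [c for c in (includes or []) if c not in present]
    forbidden = [c for c in (excludes or []) if c in present]
    if missing or forbidden:
        raise ColumnCheckError(f"missing={missing}, forbidden={forbidden}")


def check_query_frame(columns, extra_columns=None):
    # Query frame: needs qid (plus extras) and must not carry docno.
    check_columns(columns, includes=['qid', *(extra_columns or [])], excludes=['docno'])


def check_document_frame(columns, extra_columns=None):
    # Document frame: needs docno (plus extras) and must not carry qid.
    check_columns(columns, includes=['docno', *(extra_columns or [])], excludes=['qid'])


def check_result_frame(columns, extra_columns=None):
    # Result frame: needs both qid and docno (plus extras); nothing is forbidden.
    check_columns(columns, includes=['qid', 'docno', *(extra_columns or [])])


check_query_frame(['qid', 'query'])              # passes
check_result_frame(['qid', 'docno', 'score'])    # passes
```

A frame with both qid and docno is therefore a result frame but not a query frame or a document frame, which is what lets the three checks tell the input types apart.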

Iterable validation

For indexing pipelines that accept iterators, the *_iter validation functions (e.g., pt.validate.columns_iter) check the fields of the first element. For this to work, you must first wrap inp in pt.utils.peekable().

Iterable input validation in a Transformer
import pyterrier as pt

my_iterator = [{'docno': 'doc1'}, {'docno': 'doc2'}, {'docno': 'doc3'}]
my_iterator = pt.utils.peekable(my_iterator)
pt.validate.columns_iter(my_iterator, includes=['docno']) # passes
pt.validate.columns_iter(my_iterator, includes=['docno', 'toks']) # raises an error: 'toks' is missing
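Why a peekable wrapper is needed can be shown with a small standard-library sketch. PeekableSketch and missing_fields are hypothetical names used for illustration only; how pt.utils.peekable is actually implemented is not covered in this section. The idea is that the first element is read for inspection and then pushed back, so downstream consumers still see every element.

```python
import itertools
from typing import Any, Dict, Iterable, Iterator, List


class PeekableSketch:
    """Hypothetical stand-in for a peekable iterator (not pt.utils.peekable)."""

    def __init__(self, source: Iterable[Dict[str, Any]]):
        self._it: Iterator[Dict[str, Any]] = iter(source)

    def peek(self) -> Dict[str, Any]:
        # Read the first element, then chain it back in front of the rest.
        first = next(self._it)  # raises StopIteration on empty input
        self._it = itertools.chain([first], self._it)
        return first

    def __iter__(self) -> "PeekableSketch":
        return self

    def __next__(self) -> Dict[str, Any]:
        return next(self._it)


def missing_fields(it: PeekableSketch, includes: List[str]) -> List[str]:
    """Return the required fields that are absent from the first element."""
    first = it.peek()
    return [f for f in includes if f not in first]


docs = PeekableSketch(iter([{'docno': 'doc1'}, {'docno': 'doc2'}]))
print(missing_fields(docs, ['docno']))          # prints []
print(missing_fields(docs, ['docno', 'toks']))  # prints ['toks']
print(len(list(docs)))                          # prints 2: nothing was consumed
```

Without the push-back step, checking a plain generator would silently drop its first document from the indexing run.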

Advanced Usage

Sometimes a transformer has multiple acceptable input specifications, e.g., if it can act as either a retriever (with a query input) or a re-ranker (with a result input). In this case, you can list the acceptable specifications inside a "with pt.validate.any(inp) as v:" block:

Validation with multiple acceptable input specifications
import pandas as pd
import pyterrier as pt

class MyTransformer(pt.Transformer):
    def transform(self, inp: pd.DataFrame):
        # e.g., accepts either a query frame (retrieve) or a result frame (rerank)
        with pt.validate.any(inp) as v:
            v.query_frame(extra_columns=['query'], mode='retrieve')
            v.result_frame(extra_columns=['query', 'text'], mode='rerank')
        # raises an error if NONE of the specifications match
        # v.mode is set to the FIRST specification that matches
        if v.mode == 'retrieve':
            ...
        if v.mode == 'rerank':
            ...
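The "first match wins, error if nothing matches" behaviour described above can be sketched with an ordinary context manager. AnySpecSketch and pick_mode are hypothetical names; this is an assumption-laden illustration over a list of column names, not the library's _ValidationContextManager.

```python
from typing import List, Optional, Sequence


class AnySpecSketch:
    """Hypothetical stand-in that records the first specification that matches."""

    def __init__(self, columns: List[str]):
        self.columns = set(columns)
        self.mode: Optional[str] = None
        self._matched = False
        self._attempts = 0

    def __enter__(self) -> "AnySpecSketch":
        return self

    def _try(self, includes: Sequence[str], excludes: Sequence[str], mode: str) -> None:
        self._attempts += 1
        ok = (all(c in self.columns for c in includes)
              and not any(c in self.columns for c in excludes))
        if ok and not self._matched:
            # Only the FIRST matching specification sets the mode.
            self._matched = True
            self.mode = mode

    def query_frame(self, extra_columns: Sequence[str] = (), mode: str = 'query') -> None:
        self._try(['qid', *extra_columns], ['docno'], mode)

    def result_frame(self, extra_columns: Sequence[str] = (), mode: str = 'result') -> None:
        self._try(['qid', 'docno', *extra_columns], [], mode)

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Fail only on leaving the block, once every specification has been tried.
        if exc_type is None and not self._matched:
            raise ValueError(f"none of the {self._attempts} specifications matched")
        return False


def pick_mode(columns: List[str]) -> Optional[str]:
    with AnySpecSketch(columns) as v:
        v.query_frame(extra_columns=['query'], mode='retrieve')
        v.result_frame(extra_columns=['query', 'text'], mode='rerank')
    return v.mode


print(pick_mode(['qid', 'query']))                   # prints retrieve
print(pick_mode(['qid', 'query', 'docno', 'text']))  # prints rerank
```

Deferring the failure to the end of the block is what allows several specifications to be declared one after another without the first mismatch aborting the check.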

API Documentation

pyterrier.validate.columns(inp, *, includes=None, excludes=None, warn=False, context=None)

Check that the input frame has the expected columns.

Return type:

None

Parameters:
  • inp (DataFrame | List[str]) – Input DataFrame or columns to validate

  • includes (List[str] | None) – List of required columns

  • excludes (List[str] | None) – List of forbidden columns

  • warn (bool) – If True, raise warnings instead of exceptions for validation errors

  • context (Transformer | None) – The transformer context for error messages

Raises:
  • InputValidationError – If warn=False and validation fails

  • InputValidationWarning – If warn=True and validation fails

Changed in version 0.15.0: Accept List[str] inp columns
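The effect of the warn parameter can be illustrated with the standard warnings module. The classes SketchValidationError and SketchValidationWarning and the function check are hypothetical stand-ins for InputValidationError, InputValidationWarning and the validation functions; the message format is also an assumption.

```python
import warnings
from typing import Iterable, List, Optional


class SketchValidationError(Exception):
    """Hypothetical stand-in for InputValidationError."""


class SketchValidationWarning(UserWarning):
    """Hypothetical stand-in for InputValidationWarning."""


def check(columns: Iterable[str],
          includes: Optional[List[str]] = None,
          excludes: Optional[List[str]] = None,
          warn: bool = False) -> None:
    """Validate column names; warn instead of raising when warn=True."""
    present = set(columns)
    problems = [f"missing column {c!r}" for c in (includes or []) if c not in present]
    problems += [f"unexpected column {c!r}" for c in (excludes or []) if c in present]
    if not problems:
        return
    message = "; ".join(problems)
    if warn:
        # Execution continues after the warning is issued.
        warnings.warn(message, SketchValidationWarning, stacklevel=2)
    else:
        raise SketchValidationError(message)


check(['qid', 'query'], includes=['qid'])               # passes silently
check(['qid'], includes=['qid', 'query'], warn=True)    # issues a warning and continues
```

With warn=True the caller keeps running on input that failed the check, so this mode suits diagnostics rather than guarding code that depends on the columns being present.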

pyterrier.validate.query_frame(inp, extra_columns=None, warn=False, context=None)

Check that the input frame is a valid query frame.

Return type:

None

Parameters:
  • inp (DataFrame | List[str]) – Input DataFrame or columns to validate

  • extra_columns (List[str] | None) – Additional required columns

  • warn (bool) – If True, raise warnings instead of exceptions for validation errors

  • context (Transformer | None) – The transformer context for error messages

Raises:
  • InputValidationError – If warn=False and validation fails

  • InputValidationWarning – If warn=True and validation fails

Changed in version 0.15.0: Accept List[str] inp columns

pyterrier.validate.result_frame(inp, extra_columns=None, warn=False, context=None)

Check that the input frame is a valid result frame.

Return type:

None

Parameters:
  • inp (DataFrame | List[str]) – Input DataFrame or columns to validate

  • extra_columns (List[str] | None) – Additional required columns

  • warn (bool) – If True, raise warnings instead of exceptions for validation errors

  • context (Transformer | None) – The transformer context for error messages

Raises:
  • InputValidationError – If warn=False and validation fails

  • InputValidationWarning – If warn=True and validation fails

Changed in version 0.15.0: Accept List[str] inp columns

pyterrier.validate.document_frame(inp, extra_columns=None, warn=False, context=None)

Check that the input frame is a valid document frame.

Return type:

None

Parameters:
  • inp (DataFrame | List[str]) – Input DataFrame or columns to validate

  • extra_columns (List[str] | None) – Additional required columns

  • warn (bool) – If True, raise warnings instead of exceptions for validation errors

  • context (Transformer | None) – The transformer context for error messages

Raises:
  • InputValidationError – If warn=False and validation fails

  • InputValidationWarning – If warn=True and validation fails

Changed in version 0.15.0: Accept List[str] inp columns

pyterrier.validate.any(inp, warn=False, context=None)

Create a validation context manager for a DataFrame or list of columns to test multiple possible modes.

Return type:

_ValidationContextManager

Parameters:
  • inp (DataFrame | List[str]) – Input DataFrame or list of columns to validate

  • warn (bool) – If True, raise warnings instead of exceptions for validation errors

  • context (Transformer | None) – The transformer for which to validate input, used for more informative error messages. This is optional, but recommended when validating inside a transformer.

pyterrier.validate.columns_iter(inp, *, includes=None, excludes=None, warn=False, context=None)

Check that the input iterator has the expected columns.

Return type:

None

Parameters:
  • inp (PeekableIter) – Input iterator to validate

  • includes (List[str] | None) – List of required columns

  • excludes (List[str] | None) – List of forbidden columns

  • warn (bool) – If True, raise warnings instead of exceptions for validation errors

  • context (Transformer | None) – The transformer context for error messages

Raises:
  • InputValidationError – If warn=False and validation fails

  • InputValidationWarning – If warn=True and validation fails

pyterrier.validate.query_iter(inp, extra_columns=None, warn=False, context=None)

Check that the input iterator is a valid query iterator.

Return type:

None

Parameters:
  • inp (PeekableIter) – Input iterator to validate

  • extra_columns (List[str] | None) – Additional required columns

  • warn (bool) – If True, raise warnings instead of exceptions for validation errors

  • context (Transformer | None) – The transformer context for error messages

Raises:
  • InputValidationError – If warn=False and validation fails

  • InputValidationWarning – If warn=True and validation fails

pyterrier.validate.document_iter(inp, extra_columns=None, warn=False, context=None)

Check that the input iterator is a valid document iterator.

Return type:

None

Parameters:
  • inp (PeekableIter) – Input iterator to validate

  • extra_columns (List[str] | None) – Additional required columns

  • warn (bool) – If True, raise warnings instead of exceptions for validation errors

  • context (Transformer | None) – The transformer context for error messages

Raises:
  • InputValidationError – If warn=False and validation fails

  • InputValidationWarning – If warn=True and validation fails

pyterrier.validate.result_iter(inp, extra_columns=None, warn=False, context=None)

Check that the input iterator is a valid result iterator.

Return type:

None

Parameters:
  • inp (PeekableIter) – Input iterator to validate

  • extra_columns (List[str] | None) – Additional required columns

  • warn (bool) – If True, raise warnings instead of exceptions for validation errors

  • context (Transformer | None) – The transformer context for error messages

Raises:
  • InputValidationError – If warn=False and validation fails

  • InputValidationWarning – If warn=True and validation fails

pyterrier.validate.any_iter(inp, warn=False, context=None)

Create a validation context manager for an iterator to test multiple possible modes.

Return type:

_IterValidationContextManager

Parameters:
  • inp (PeekableIter) – Input iterator to validate

  • warn (bool) – If True, raise warnings instead of exceptions for validation errors

  • context (Transformer | None) – The transformer for which to validate input, used for more informative error messages. This is optional, but recommended when validating inside a transformer.