Input Validation

DataFrame Validation

It’s a good idea to check that the input to a transformer is compatible before you start using it. pta.validate provides functions for this.

DataFrame input validation in a Transformer

import pandas as pd
import pyterrier as pt
import pyterrier_alpha as pta

class MyTransformer(pt.Transformer):
    def transform(self, inp: pd.DataFrame):
        # e.g., expects a query frame with query_vec
        pta.validate.query_frame(inp, extra_columns=['query_vec'])
        # raises an error if the specification doesn't match

Function	Must have column(s)	Must NOT have column(s)
`pta.validate.query_frame(inp, extra_columns=...)`	qid + `extra_columns`	docno
`pta.validate.document_frame(inp, extra_columns=...)`	docno + `extra_columns`	qid
`pta.validate.result_frame(inp, extra_columns=...)`	qid + docno + `extra_columns`
`pta.validate.columns(inp, includes=..., excludes=...)`	`includes`	`excludes`
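The rules in the table can be read as plain set checks on column names. The following is a minimal sketch of that reading, not the library's implementation: the helper names `column_problems` and `query_frame_problems` are hypothetical, and the sketch works on a list of column names (for a DataFrame you would pass `list(df.columns)`).

```python
from typing import List, Optional, Sequence


def column_problems(columns: Sequence[str],
                    includes: Optional[List[str]] = None,
                    excludes: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: list every way `columns` violates the spec.

    An empty list means the check passes.
    """
    present = set(columns)
    problems = []
    for col in includes or []:
        if col not in present:
            problems.append(f"missing required column: {col}")
    for col in excludes or []:
        if col in present:
            problems.append(f"unexpected column: {col}")
    return problems


def query_frame_problems(columns: Sequence[str],
                         extra_columns: Optional[List[str]] = None) -> List[str]:
    # Query frame row of the table: requires qid plus the extras, forbids docno.
    required = ['qid'] + list(extra_columns or [])
    return column_problems(columns, includes=required, excludes=['docno'])


ok = query_frame_problems(['qid', 'query_vec'], extra_columns=['query_vec'])
bad = query_frame_problems(['qid', 'docno'], extra_columns=['query_vec'])
print(ok)   # no problems reported
print(bad)  # reports the missing query_vec and the forbidden docno
```

The document and result frame rows follow the same pattern with different required and forbidden columns.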

Iterable validation

For indexing pipelines that accept iterators, pta.validate.columns_iter checks the fields of the first element. For this to work, you must first wrap inp in pta.utils.peekable().

Iterable input validation in a Transformer

import pyterrier_alpha as pta
my_iterator = [{'docno': 'doc1'}, {'docno': 'doc2'}, {'docno': 'doc3'}]
my_iterator = pta.utils.peekable(my_iterator)
pta.validate.columns_iter(my_iterator, includes=['docno']) # passes
pta.validate.columns_iter(my_iterator, includes=['docno', 'toks']) # raises InputValidationError
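Wrapping the input matters because inspecting the first element of a plain iterator would otherwise consume it. The sketch below illustrates the general peeking idea with the standard library only. It is an assumption about the technique, not how pta.utils.peekable() is implemented, and `peek_first` is a hypothetical helper.

```python
import itertools
from typing import Any, Iterable, Iterator, Tuple


def peek_first(items: Iterable[Any]) -> Tuple[Any, Iterator[Any]]:
    """Hypothetical helper: return the first element and an iterator
    that still yields every element, including the first."""
    it = iter(items)
    first = next(it)  # raises StopIteration if the input is empty
    return first, itertools.chain([first], it)


docs = iter([{'docno': 'doc1'}, {'docno': 'doc2'}, {'docno': 'doc3'}])
first, docs = peek_first(docs)

# Check the fields of the first element only, as described above.
missing = [field for field in ['docno', 'toks'] if field not in first]

# Nothing was lost: the indexing pipeline still sees all three documents.
remaining = list(docs)
```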

Advanced Usage

Sometimes a transformer has multiple acceptable input specifications, e.g., if it can act as either a retriever (with a query input) or re-ranker (with a result input). In this case, you can specify multiple possible configurations in a with pta.validate.any(inp) as v: block:

Validation with multiple acceptable input specifications

import pandas as pd
import pyterrier as pt
import pyterrier_alpha as pta

class MyTransformer(pt.Transformer):
    def transform(self, inp: pd.DataFrame):
        # e.g., accepts either a query frame (retrieve) or a result frame (rerank)
        with pta.validate.any(inp) as v:
            v.query_frame(extra_columns=['query'], mode='retrieve')
            v.result_frame(extra_columns=['query', 'text'], mode='rerank')
        # raises an error if NONE of the specifications match
        # v.mode is set to the FIRST specification that matches
        if v.mode == 'retrieve':
            ...
        if v.mode == 'rerank':
            ...
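The "first specification that matches" rule means the order of the specifications matters. Here is a small stand-alone sketch of that selection logic, assuming each specification reduces to required and forbidden column names. `first_matching_mode` and the `specs` layout are hypothetical and only mirror the behaviour described above.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: (mode, required columns, forbidden columns)
Spec = Tuple[str, List[str], List[str]]


def first_matching_mode(columns: Sequence[str], specs: List[Spec]) -> Optional[str]:
    """Return the mode of the first spec that `columns` satisfies, or None."""
    present = set(columns)
    for mode, includes, excludes in specs:
        has_required = all(col in present for col in includes)
        has_forbidden = any(col in present for col in excludes)
        if has_required and not has_forbidden:
            return mode
    return None  # the real validator raises an error in this case


specs = [
    ('retrieve', ['qid', 'query'], ['docno']),              # query frame + query
    ('rerank', ['qid', 'docno', 'query', 'text'], []),      # result frame + query, text
]

query_mode = first_matching_mode(['qid', 'query'], specs)
result_mode = first_matching_mode(['qid', 'docno', 'query', 'text'], specs)
no_mode = first_matching_mode(['docno', 'text'], specs)
```

With these inputs, a query frame selects the retrieve branch, a result frame selects the rerank branch, and a bare document frame matches neither.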

API Documentation

pyterrier_alpha.validate.columns(inp, *, includes=None, excludes=None, warn=False)

Check that the input frame has the expected columns.

Return type:

None

Parameters:

inp (DataFrame) – Input DataFrame to validate
includes (List[str] | None) – List of required columns
excludes (List[str] | None) – List of forbidden columns
warn (bool) – If True, raise warnings instead of exceptions for validation errors

Raises:

InputValidationError – If warn=False and validation fails
InputValidationWarning – If warn=True and validation fails

Changed in version 0.15.0: inp may also be a List[str] of column names.
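The warn flag switches between stopping on a failed check and reporting it while execution continues. The sketch below shows that difference using Python's standard warnings module. It assumes the library issues its warning through that mechanism, and `DemoValidationError`, `DemoValidationWarning`, and `report` are hypothetical stand-ins rather than the library's own classes.

```python
import warnings
from typing import List


class DemoValidationError(Exception):
    """Hypothetical stand-in for a validation exception."""


class DemoValidationWarning(Warning):
    """Hypothetical stand-in for a validation warning."""


def report(problems: List[str], warn: bool = False) -> None:
    """Raise or warn about validation problems, depending on `warn`."""
    if not problems:
        return
    message = '; '.join(problems)
    if warn:
        warnings.warn(message, DemoValidationWarning)
    else:
        raise DemoValidationError(message)


# warn=True: the problem is recorded and execution continues.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    report(['missing required column: qid'], warn=True)
warned = [str(w.message) for w in caught]

# warn=False (the default): the problem stops execution with an exception.
try:
    report(['missing required column: qid'])
    raised = False
except DemoValidationError:
    raised = True
```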

pyterrier_alpha.validate.query_frame(inp, extra_columns=None, warn=False)

Check that the input frame is a valid query frame.

Return type:

None

Parameters:

inp (DataFrame) – Input DataFrame to validate
extra_columns (List[str] | None) – Additional required columns
warn (bool) – If True, raise warnings instead of exceptions for validation errors

Raises:

InputValidationError – If warn=False and validation fails
InputValidationWarning – If warn=True and validation fails

Changed in version 0.15.0: inp may also be a List[str] of column names.

pyterrier_alpha.validate.result_frame(inp, extra_columns=None, warn=False)

Check that the input frame is a valid result frame.

Return type:

None

Parameters:

inp (DataFrame) – Input DataFrame to validate
extra_columns (List[str] | None) – Additional required columns
warn (bool) – If True, raise warnings instead of exceptions for validation errors

Raises:

InputValidationError – If warn=False and validation fails
InputValidationWarning – If warn=True and validation fails

Changed in version 0.15.0: inp may also be a List[str] of column names.

pyterrier_alpha.validate.document_frame(inp, extra_columns=None, warn=False)

Check that the input frame is a valid document frame.

Return type:

None

Parameters:

inp (DataFrame) – Input DataFrame to validate
extra_columns (List[str] | None) – Additional required columns
warn (bool) – If True, raise warnings instead of exceptions for validation errors

Raises:

InputValidationError – If warn=False and validation fails
InputValidationWarning – If warn=True and validation fails

Changed in version 0.15.0: inp may also be a List[str] of column names.

pyterrier_alpha.validate.columns_iter(inp, *, includes=None, excludes=None, warn=False)

Check that the input iterator has the expected columns.

Return type:

None

Parameters:

inp (PeekableIter) – Input iterator to validate
includes (List[str] | None) – List of required columns
excludes (List[str] | None) – List of forbidden columns
warn (bool) – If True, raise warnings instead of exceptions for validation errors

Raises:

InputValidationError – If warn=False and validation fails
InputValidationWarning – If warn=True and validation fails

pyterrier_alpha.validate.any(inp, warn=False)

Create a validation context manager for a DataFrame.

Return type:

_ValidationContextManager

Parameters:

inp (DataFrame | List[str])
warn (bool)

pyterrier_alpha.validate.any_iter(inp, warn=False)

Create a validation context manager for an iterator.

Return type:

_IterValidationContextManager

Parameters:

inp (PeekableIter)
warn (bool)