Input Validation¶
DataFrame Validation¶
It’s a good idea to check the input to a transformer to make sure its compatible before you start using it. pta.validate provides functions for this.
def MyTransformer(pt.Transformer):
def transform(self, inp: pd.DataFrame):
# e.g., expects a query frame with query_vec
pta.validate.query_frame(inp, extra_columns=['query_vec'])
# raises an error if the specification doesn't match
Function |
Must have column(s) |
Must NOT have column(s) |
---|---|---|
|
qid + |
docno |
|
docno + |
qid |
|
qid + docno + |
|
|
|
|
Iterable validation¶
For indexing pipelines that accept iterators, it checks the fields of the first element. You need to first wrap inp in pta.utils.peekable() for this to work.
import pyterrier_alpha as pta
my_iterator = [{'docno': 'doc1'}, {'docno': 'doc2'}, {'docno': 'doc3'}]
my_iterator = pta.utils.peekable(my_iterator)
pta.validate.columns_iter(my_iterator, includes=['docno']) # passes
pta.validate.columns_iter(my_iterator, includes=['docno', 'toks']) # raises errors
Advanced Usage¶
Sometimes a transformer has multiple acceptable input specifications, e.g., if it can act as either a retriever (with a query input) or re-ranker (with a result input). In this case, you can specify multiple possible configurations in a with pta.validate.any(inpt) as v: block:
def MyTransformer(pt.Transformer):
def transform(self, inp: pd.DataFrame):
# e.g., expects a query frame with query_vec
with pta.validate.any(inp) as v:
v.query_frame(extra_columns=['query'], mode='retrieve')
v.result_frame(extra_columns=['query', 'text'], mode='rerank')
# raises an error if ALL specifications do not match
# v.mode is set to the FIRST specification that matches
if v.mode == 'retrieve':
...
if v.mode == 'rerank':
...
API Documentation¶
- pyterrier_alpha.validate.columns(inp, *, includes=None, excludes=None)[source]¶
Check that the input frame has the expected columns.
- Return type:
None
- Raises:
InputValidationError – If is missing any of the includes columns, or has any of the excludes columns.
- pyterrier_alpha.validate.query_frame(inp, extra_columns=None)[source]¶
Check that the input frame is a valid query frame.
- Return type:
None
- Raises:
InputValidationError – If the input frame is not a valid query frame.
- pyterrier_alpha.validate.result_frame(inp, extra_columns=None)[source]¶
Check that the input frame is a valid result frame.
- Return type:
None
- Raises:
InputValidationError – If the input frame is not a valid result frame.
- pyterrier_alpha.validate.document_frame(inp, extra_columns=None)[source]¶
Check that the input frame is a valid document frame.
- Return type:
None
- Raises:
InputValidationError – If the input frame is not a valid document frame.
- pyterrier_alpha.validate.columns_iter(inp, *, includes=None, excludes=None)[source]¶
Check that the input frame has the expected columns.
- Return type:
None
- Raises:
InputValidationError – If is missing any of the includes columns, or has any of the excludes columns.