Inspection
The pyterrier_alpha.inspect module provides a way to inspect PyTerrier pyterrier.Transformer
and pyterrier_alpha.Artifact objects.
- pyterrier_alpha.inspect.transformer_outputs(transformer, input_columns, *, strict=True)
Infers the output columns for a transformer based on the inputs.
The method first checks if the transformer provides a transform_outputs method. If it does, this method is called and the result is returned. If the transformer does not provide this method, the method tries to infer the outputs by calling the transformer with an empty DataFrame.
- Return type:
List[str] | None
- Parameters:
transformer (Transformer) – An instance of the transformer to inspect.
input_columns (List[str]) – A list of the columns present in the input frame.
strict (bool) – If True, raises an error if the outputs cannot be inferred or the inputs are not accepted. If False, returns None in these cases.
- Returns:
A list of the columns present in the output for transformer given input_columns.
- Raises:
InspectError – If the output columns could not be inferred and strict==True.
pta.validate.InputValidationError – If input validation fails in the transformer and strict==True.
Added in version 0.11.0.
Changed in version 0.15.0: Direct passthrough of
pta.validate.InputValidationError
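The lookup order and the effect of strict described above can be sketched in plain Python. This is an illustration only, not the library's implementation: infer_outputs, SketchInspectError, SketchValidationError, and both toy transformer classes are hypothetical names, and a dict mapping column names to empty lists stands in for an empty DataFrame so that the sketch runs without pandas or PyTerrier.

```python
from typing import Dict, List, Optional


class SketchInspectError(Exception):
    """Hypothetical stand-in for InspectError."""


class SketchValidationError(Exception):
    """Hypothetical stand-in for pta.validate.InputValidationError."""


def infer_outputs(transformer, input_columns: List[str], *, strict: bool = True) -> Optional[List[str]]:
    """Sketch of the lookup order: transform_outputs first, then an empty-input call."""
    try:
        if hasattr(transformer, "transform_outputs"):
            # Preferred path: ask the transformer directly, without running it.
            return list(transformer.transform_outputs(input_columns))
        # Fallback path: run the transformer on an empty input and read its columns.
        # Assumption: a dict of empty lists stands in for an empty DataFrame.
        empty_frame: Dict[str, list] = {col: [] for col in input_columns}
        return list(transformer.transform(empty_frame).keys())
    except SketchValidationError:
        # Validation errors pass through unchanged when strict is set.
        if strict:
            raise
        return None
    except Exception as exc:
        # Any other failure means the outputs could not be inferred.
        if strict:
            raise SketchInspectError(f"could not infer outputs: {exc}") from exc
        return None


class DeclaredRetriever:
    """Toy transformer that declares its outputs."""

    def transform_outputs(self, input_columns: List[str]) -> List[str]:
        if "query" not in input_columns:
            raise SketchValidationError("missing column: query")
        return input_columns + ["docno", "score", "rank"]


class UndeclaredRewriter:
    """Toy transformer without transform_outputs; it adds a query_0 column."""

    def transform(self, frame: Dict[str, list]) -> Dict[str, list]:
        out = dict(frame)
        out["query_0"] = list(frame["query"])
        return out


declared = infer_outputs(DeclaredRetriever(), ["qid", "query"])
fallback = infer_outputs(UndeclaredRewriter(), ["qid", "query"])
lenient = infer_outputs(UndeclaredRewriter(), ["qid"], strict=False)
```

In this sketch, declared comes from the transform_outputs path, fallback comes from the empty-input path, and lenient is None because the toy rewriter fails without a query column and strict is False.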
- class pyterrier_alpha.inspect.ProvidesTransformerOutputs(*args, **kwargs)
Protocol for transformers that provide a transform_outputs method.
transform_outputs allows for inspection of the outputs of a transformer without needing to run it.
When this method is present in a Transformer object, it must return a list of the output columns present given the provided input columns, or raise an InputValidationError if the inputs are not accepted by the transformer.
This method need not be present in a Transformer; it is an optional extension. The alternative is that the output columns are determined by calling the transformer with an empty DataFrame.
Due to the risks and maintenance burden of ensuring that transform and transform_outputs behave identically, it is recommended to implement transform_outputs only when calling the transformer with an empty DataFrame to inspect its behavior is undesirable, e.g., if calling the transformer is expensive.
Example transform_outputs function, implementing ProvidesTransformerOutputs.
    from typing import List

    import pandas as pd
    import pyterrier as pt
    import pyterrier_alpha as pta

    class MyRetriever(pt.Transformer):
        def transform(self, inp: pd.DataFrame) -> pd.DataFrame:
            pta.validate.query_frame(inp, ['query'])
            # ... perform retrieval ...
            # return the same columns as inp plus docno, score, and rank. E.g., using DataFrameBuilder.

        def transform_outputs(self, input_columns: List[str]) -> List[str]:
            pta.validate.query_frame(input_columns, ['query'])
            return input_columns + ['docno', 'score', 'rank']
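When a transformer implements both methods, a small test can help guard against the two drifting apart. The following is a minimal sketch of such a check, not part of the library: ToyRetriever and outputs_agree are hypothetical names, and a dict of empty lists again stands in for an empty DataFrame so the example runs with the standard library alone.

```python
from typing import Dict, List


class ToyRetriever:
    """Hypothetical transformer implementing both transform and transform_outputs."""

    def transform(self, frame: Dict[str, list]) -> Dict[str, list]:
        out = dict(frame)
        # A real retriever would fill these columns with retrieval results.
        out.update({"docno": [], "score": [], "rank": []})
        return out

    def transform_outputs(self, input_columns: List[str]) -> List[str]:
        return input_columns + ["docno", "score", "rank"]


def outputs_agree(transformer, input_columns: List[str]) -> bool:
    """Compare the declared columns with those produced from an empty input."""
    # Assumption: a dict of empty lists stands in for an empty DataFrame.
    empty_frame = {col: [] for col in input_columns}
    observed = list(transformer.transform(empty_frame).keys())
    declared = list(transformer.transform_outputs(input_columns))
    return observed == declared


agrees = outputs_agree(ToyRetriever(), ["qid", "query"])
```

With a real PyTerrier transformer, the same comparison would presumably use an empty pandas DataFrame with the given columns in place of the dict. It only makes sense where running the transformer on empty input is cheap enough to do in a test.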