Inspection

The pyterrier_alpha.inspect module provides a way to inspect PyTerrier pyterrier.Transformer and pyterrier_alpha.Artifact objects.

pyterrier_alpha.inspect.transformer_outputs(transformer, input_columns, *, strict=True)

Infers the output columns for a transformer based on the inputs.

The function first checks whether the transformer provides a transform_outputs method. If it does, that method is called and its result is returned. Otherwise, the function tries to infer the outputs by calling the transformer with an empty DataFrame that has the given input columns.
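The two-step lookup described above can be sketched roughly as follows. This is an illustration only, not the pyterrier_alpha implementation: the function name sketch_transformer_outputs and the two toy classes are hypothetical, pandas is assumed to be installed, and the sketch simply re-raises the original exception in strict mode rather than raising InspectError.

```python
from typing import List, Optional

import pandas as pd


def sketch_transformer_outputs(transformer, input_columns: List[str], *,
                               strict: bool = True) -> Optional[List[str]]:
    """Hypothetical stand-in that mirrors the lookup order described above."""
    try:
        # Step 1: prefer an explicit transform_outputs method when one exists.
        if hasattr(transformer, "transform_outputs"):
            return list(transformer.transform_outputs(input_columns))
        # Step 2: otherwise probe the transformer with an empty frame.
        empty = pd.DataFrame(columns=input_columns)
        return list(transformer.transform(empty).columns)
    except Exception:
        # Simplification: the real function raises specific error types.
        if strict:
            raise
        return None


class AddScore:
    """Hypothetical transformer that relies on the empty-frame probe."""

    def transform(self, inp: pd.DataFrame) -> pd.DataFrame:
        return inp.assign(score=0.0)


class DeclaredOutputs:
    """Hypothetical transformer that declares its outputs explicitly."""

    def transform(self, inp: pd.DataFrame) -> pd.DataFrame:
        raise RuntimeError("expensive; not called during inspection")

    def transform_outputs(self, input_columns: List[str]) -> List[str]:
        return input_columns + ["docno", "score", "rank"]


probed = sketch_transformer_outputs(AddScore(), ["qid", "query"])
declared = sketch_transformer_outputs(DeclaredOutputs(), ["qid", "query"])
```

In this sketch, DeclaredOutputs.transform is never invoked, which is the situation in which an explicit transform_outputs method is useful.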

Return type:

List[str] | None

Parameters:
  • transformer (Transformer) – An instance of the transformer to inspect.

  • input_columns (List[str]) – A list of the columns present in the input frame.

  • strict (bool) – If True, raises an error if the transformer's outputs cannot be inferred or the inputs are not accepted. If False, returns None in these cases.

Returns:

A list of the columns present in the output for transformer given input_columns.

Raises:
  • InspectError – If the transformer's output columns could not be determined and strict==True.

  • pta.validate.InputValidationError – If input validation fails in the transformer and strict==True.

Added in version 0.11.0.

Changed in version 0.15.0: Direct passthrough of pta.validate.InputValidationError

class pyterrier_alpha.inspect.ProvidesTransformerOutputs(*args, **kwargs)

Protocol for transformers that provide a transform_outputs method.

transform_outputs allows the outputs of a transformer to be inspected without needing to run it.

When this method is present in a Transformer object, it must return a list of the output columns present given the provided input columns or raise an InputValidationError if the inputs are not accepted by the transformer.

Transformers need not implement this method; it is an optional extension. When it is absent, the output columns are determined by calling the transformer with an empty DataFrame.
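As a rough illustration of what such an optional protocol looks like in plain Python, the sketch below defines a structural type with typing.Protocol. The name HasTransformOutputs and both toy classes are hypothetical; whether ProvidesTransformerOutputs itself supports isinstance checks is not stated here, so the sketch defines its own protocol instead of relying on that.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class HasTransformOutputs(Protocol):
    """Hypothetical protocol: any object with a transform_outputs method."""

    def transform_outputs(self, input_columns: List[str]) -> List[str]:
        ...


class WithOutputs:
    """Toy class that opts in to the extension."""

    def transform_outputs(self, input_columns: List[str]) -> List[str]:
        return input_columns + ["score"]


class WithoutOutputs:
    """Toy class that does not opt in; its outputs would have to be probed."""


# runtime_checkable protocols only check that the method exists,
# not that its signature or return type match.
opted_in = isinstance(WithOutputs(), HasTransformOutputs)
not_opted_in = isinstance(WithoutOutputs(), HasTransformOutputs)
```

Because the check is structural, a class does not need to inherit from the protocol in order to satisfy it.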

Due to the risks and maintenance burden involved in ensuring that transform and transform_outputs behave identically, it is recommended to implement transform_outputs only when calling the transformer with an empty DataFrame to inspect its behavior is undesirable, e.g., if calling the transformer is expensive.

Example transform_outputs method, implementing ProvidesTransformerOutputs:

from typing import List

import pandas as pd
import pyterrier as pt
import pyterrier_alpha as pta

class MyRetriever(pt.Transformer):

    def transform(self, inp: pd.DataFrame) -> pd.DataFrame:
        pta.validate.query_frame(inp, ['query'])
        # ... perform retrieval ...
        # return the same columns as inp plus docno, score, and rank. E.g., using DataFrameBuilder.

    def transform_outputs(self, input_columns: List[str]) -> List[str]:
        pta.validate.query_frame(input_columns, ['query'])
        return input_columns + ['docno', 'score', 'rank']
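
One way to guard against transform and transform_outputs drifting apart is a small consistency check in a test suite. The sketch below is an assumption-laden illustration: the helper outputs_agree and both toy classes are hypothetical, pandas is assumed to be installed, and the toy classes do not subclass pt.Transformer so that the example runs without PyTerrier.

```python
from typing import List

import pandas as pd


def outputs_agree(transformer, input_columns: List[str]) -> bool:
    """Hypothetical helper: compare declared columns with observed ones."""
    declared = list(transformer.transform_outputs(input_columns))
    # Probe transform with an empty frame that has the given columns.
    observed = list(transformer.transform(pd.DataFrame(columns=input_columns)).columns)
    return declared == observed


class ConsistentRetriever:
    """Toy retriever whose declaration matches its behavior."""

    def transform(self, inp: pd.DataFrame) -> pd.DataFrame:
        return inp.assign(docno="", score=0.0, rank=0)

    def transform_outputs(self, input_columns: List[str]) -> List[str]:
        return input_columns + ["docno", "score", "rank"]


class DriftedRetriever(ConsistentRetriever):
    """Toy retriever whose declaration forgot the rank column."""

    def transform_outputs(self, input_columns: List[str]) -> List[str]:
        return input_columns + ["docno", "score"]


consistent = outputs_agree(ConsistentRetriever(), ["qid", "query"])
drifted = outputs_agree(DriftedRetriever(), ["qid", "query"])
```

A check like this only works when transform is cheap enough to run on an empty frame in tests, even if it is too expensive to run during inspection in production.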

exception pyterrier_alpha.inspect.InspectError

Base exception for inspection errors.