pt.inspect - Inspecting Live Objects

This module provides utility functions for getting information about PyTerrier objects.

Note

This is an advanced module that is not typically used by end users.

pyterrier.inspect.artifact_type_format(artifact, *, strict=True)[source]

Returns the type and format of the specified artifact.

These values are sourced from either the ARTIFACT_TYPE and ARTIFACT_FORMAT constants of the artifact or, if these are not available, by matching on the entry points.

Return type:

Tuple[str, str] | None

Parameters:
  • artifact (Type | Artifact) – The artifact to inspect.

  • strict (bool) – If True, raises an error if the artifact’s type or format could not be determined.

Returns:

A tuple containing the artifact’s type and format, or None if the type and format could not be determined and strict==False.

Raises:

InspectError – If the artifact’s type or format could not be determined and strict==True
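As a rough illustration of the lookup order described above, the sketch below checks the two constants and then falls back to None or an error depending on strict. It is not PyTerrier's implementation: InspectError here is a local stand-in, DemoIndex and its constant values are hypothetical, and the entry-point matching step is omitted.

```python
from typing import Optional, Tuple


class InspectError(Exception):
    """Local stand-in for pyterrier.inspect.InspectError."""


def type_format_sketch(artifact, *, strict: bool = True) -> Optional[Tuple[str, str]]:
    # getattr works for both a class and an instance, mirroring the
    # "Type | Artifact" parameter described above.
    art_type = getattr(artifact, "ARTIFACT_TYPE", None)
    art_format = getattr(artifact, "ARTIFACT_FORMAT", None)
    if art_type is not None and art_format is not None:
        return art_type, art_format
    # The documented function would try matching on entry points here;
    # that step is left out of this sketch.
    if strict:
        raise InspectError(f"could not determine type/format of {artifact!r}")
    return None


class DemoIndex:
    # Hypothetical values, for illustration only.
    ARTIFACT_TYPE = "sparse_index"
    ARTIFACT_FORMAT = "demo"


from_class = type_format_sketch(DemoIndex)
from_instance = type_format_sketch(DemoIndex())
unknown = type_format_sketch(object(), strict=False)
```

With strict=False, an object that carries neither constant yields None instead of raising.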

pyterrier.inspect.transformer_inputs(transformer, *, strict=True)[source]

Infers supported input column configurations (a List[List[str]]) for a transformer.

The method tries to infer the input columns that the transformer accepts by calling it with an empty DataFrame and inspecting a resulting pt.validate.InputValidationError. If the transformer does not raise an error, it tries to infer the input columns by calling it with a pre-defined set of input columns.

To handle edge cases, you can implement the HasTransformInputs protocol, which allows you to define a custom transform_inputs method that returns a list of input column configurations accepted by the transformer. transform_inputs can also be an attribute instead of a method. In this case, it can be a list of lists of input columns (i.e., a list of valid input column configurations). Note that transform_inputs is allowed to return a List[str]. If this is the case, it is converted to a List[List[str]] automatically.

The list of input specifications is assumed to be prioritized. For instance, schematics will show the first valid specification when multiple are valid for the pipeline.
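The handling of a declared transform_inputs (method or attribute, List[str] or List[List[str]]) can be sketched as follows. This is a simplified stand-in written for illustration, not the library's code; the three demo classes are hypothetical, and the fallback of calling the transformer with an empty DataFrame is not shown.

```python
from typing import List, Optional


def normalize_transform_inputs(transformer) -> Optional[List[List[str]]]:
    spec = getattr(transformer, "transform_inputs", None)
    if callable(spec):
        # transform_inputs was defined as a method
        spec = spec()
    if spec is None:
        # nothing declared; a real implementation would fall back to inference
        return None
    if len(spec) > 0 and all(isinstance(col, str) for col in spec):
        # a single configuration (List[str]) is wrapped into List[List[str]]
        return [list(spec)]
    return [list(config) for config in spec]


class AsAttribute:
    transform_inputs = ["qid", "query"]


class AsMethod:
    def transform_inputs(self):
        # listed in priority order: the first configuration is preferred
        return [["qid", "query"], ["qid", "query", "docno"]]


class NoDeclaration:
    pass


from_attribute = normalize_transform_inputs(AsAttribute())
from_method = normalize_transform_inputs(AsMethod())
undeclared = normalize_transform_inputs(NoDeclaration())
```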

Return type:

List[List[str]] | None

Parameters:
  • transformer (Transformer) – An instance of the transformer to inspect.

  • strict (bool) – If True, raises an error if the transformer’s inputs cannot be inferred. If False, returns None in this case.

Returns:

A list of input column configurations (List[List[str]]) accepted by this transformer.

Raises:

InspectError – If the transformer cannot be inspected and strict==True.

pyterrier.inspect.transformer_outputs(transformer, input_columns, *, strict=True)[source]

Infers the output columns for a transformer based on the provided input columns.

If the transformer implements the HasTransformOutputs protocol, the method calls its transform_outputs method to determine the output columns. If the transformer does not implement the protocol, it attempts to infer the output columns by calling the transformer with an empty DataFrame.
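The two-step dispatch described above might look roughly like the sketch below. To keep it runnable without pandas, a plain dict mapping column names to value lists stands in for a DataFrame; that substitution, the function name outputs_sketch, and both demo classes are assumptions made for this illustration.

```python
from typing import Dict, List

Frame = Dict[str, list]  # stand-in for a DataFrame: column name -> values


def outputs_sketch(transformer, input_columns: List[str]) -> List[str]:
    declared = getattr(transformer, "transform_outputs", None)
    if callable(declared):
        # protocol implemented: ask the transformer directly
        return list(declared(input_columns))
    # otherwise, run the transformer on an empty frame with those columns
    empty: Frame = {col: [] for col in input_columns}
    return list(transformer.transform(empty).keys())


class AddScore:
    """Does not declare its outputs, so they are found by running it."""

    def transform(self, frame: Frame) -> Frame:
        out = dict(frame)
        out["score"] = [0.0 for _ in frame.get("qid", [])]
        return out


class Declared:
    """Declares its outputs, so transform is never called during inspection."""

    def transform_outputs(self, input_columns: List[str]) -> List[str]:
        return input_columns + ["docno", "score", "rank"]


inferred = outputs_sketch(AddScore(), ["qid", "query"])
declared = outputs_sketch(Declared(), ["qid", "query"])
```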

Return type:

List[str] | None

Parameters:
  • transformer (Transformer) – An instance of the transformer to inspect.

  • input_columns (List[str]) – A list of the columns present in the input frame.

  • strict (bool) – If True, raises an error if the transformer’s outputs cannot be inferred or the input columns are not accepted. If False, returns None in these cases.

Returns:

A list of the columns present in the output for transformer given input_columns.

Raises:
  • InspectError – If the transformer’s outputs could not be determined and strict==True.

  • pt.validate.InputValidationError – If input validation fails in the transformer and strict==True.

pyterrier.inspect.transformer_attributes(transformer, *, strict=True)[source]

Infers a list of attributes of the transformer.

Here, an attribute is defined as any attribute of the transformer that is explicitly set by the __init__ method, either under the same name (e.g., self.foo = foo) or as a private attribute (e.g., self._foo = foo).

This definition allows for a set of attributes that should describe the state of a transformer. These attributes can be used to reconstruct the transformer from its attributes, e.g., by calling transformer_apply_attributes().

To handle edge cases (e.g., where the __init__ parameters do not match the attribute names), you can implement the HasAttributes protocol.
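The matching rule above (an __init__ parameter stored as self.foo or self._foo) can be approximated with the standard library's inspect.signature. The sketch below is an assumption about how such matching could work, not the library's code; AttrSketch, MISSING, and DemoRetriever are hypothetical stand-ins.

```python
import inspect
from dataclasses import dataclass
from typing import Any, List

MISSING = object()  # stand-in for TransformerAttribute.MISSING


@dataclass
class AttrSketch:
    name: str
    value: Any
    init_default_value: Any


def attributes_sketch(obj) -> List[AttrSketch]:
    found = []
    params = inspect.signature(type(obj).__init__).parameters
    for name, param in params.items():
        if name == "self":
            continue
        if hasattr(obj, name):            # stored as self.foo
            value = getattr(obj, name)
        elif hasattr(obj, "_" + name):    # stored as self._foo
            value = getattr(obj, "_" + name)
        else:                             # not stored under either name
            value = MISSING
        found.append(AttrSketch(name, value, param.default))
    return found


class DemoRetriever:
    def __init__(self, index, num_results=1000, verbose=False):
        self.index = index
        self._num_results = num_results
        # verbose is deliberately not stored, so it cannot be recovered


attrs = attributes_sketch(DemoRetriever("idx", num_results=10))
```

Here verbose resolves to MISSING, which corresponds to the non-strict behavior; a strict variant would raise instead.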

Return type:

List[TransformerAttribute]

Parameters:
  • transformer (Transformer) – The transformer to inspect.

  • strict (bool) – If True, raises an error if an attribute cannot be identified from the transformer. If False, the attribute’s value is set to TransformerAttribute.MISSING in these cases.

Returns:

A list of TransformerAttribute objects representing the attributes of the transformer.

Raises:

InspectError – If the attributes cannot be identified from the transformer and strict==True.

pyterrier.inspect.transformer_apply_attributes(transformer, **kwargs)[source]

Returns a new transformer instance from the provided transformer and updated attributes (as keyword arguments).

This method is useful for constructing a new transformer with some attributes replaced. For instance, when implementing methods like fuse_rank_cutoff(), you frequently need to replace the num_results attribute of a transformer with a new value while keeping the remainder of the attributes the same.

This method uses transformer_attributes() to identify the attributes of the transformer and then applies the provided keyword arguments to the transformer attributes. The method then reconstructs the transformer by calling its __init__ method with the updated attributes.

To handle edge cases (e.g., where the __init__ parameters do not match the attribute names), you can implement the HasApplyAttributes protocol.
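A minimal sketch of the reconstruct-through-__init__ idea follows, assuming every constructor parameter is stored under the same name or with a leading underscore. The helper apply_attributes_sketch, the local InspectError, and DemoRetriever are hypothetical and only illustrate the behavior described above.

```python
import inspect


class InspectError(Exception):
    """Local stand-in for pyterrier.inspect.InspectError."""


def apply_attributes_sketch(obj, **kwargs):
    # Collect the current value of every __init__ parameter.
    current = {}
    for name in inspect.signature(type(obj).__init__).parameters:
        if name == "self":
            continue
        if hasattr(obj, name):
            current[name] = getattr(obj, name)
        elif hasattr(obj, "_" + name):
            current[name] = getattr(obj, "_" + name)
        else:
            raise InspectError(f"cannot identify attribute {name!r}")
    # Reject updates that do not correspond to a known attribute.
    unknown = sorted(set(kwargs) - set(current))
    if unknown:
        raise InspectError(f"unknown attributes: {unknown}")
    current.update(kwargs)
    # Build a fresh instance; the original object is left untouched.
    return type(obj)(**current)


class DemoRetriever:
    def __init__(self, index, num_results=1000):
        self.index = index
        self.num_results = num_results


original = DemoRetriever("idx")
updated = apply_attributes_sketch(original, num_results=10)
```

Because a new instance is returned, a rank-cutoff style optimization can swap num_results without mutating a transformer that other pipelines may still reference.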

Return type:

Transformer

Parameters:
  • transformer (Transformer) – The transformer to apply the attributes to.

  • **kwargs (Any) – Keyword arguments representing the attributes to set on the transformer.

Returns:

A new instance of the transformer with the provided attributes applied.

Raises:

InspectError – If an attribute is not found in the transformer or if attributes cannot be identified from the transformer.

pyterrier.inspect.subtransformers(transformer)[source]

Infers a dictionary of subtransformers for the given transformer.

A subtransformer is a transformer that is used by another transformer to complete its task. Examples include those used by caches (e.g., scorer in pyterrier_caching.ScorerCache) and the list of transformers that are used by a pyterrier_alpha.fusion.RRFusion transformer.

If the transformer implements the HasSubtransformers protocol, the method calls its subtransformers method to retrieve the subtransformers. If the transformer does not implement the protocol, the method inspects the transformer to identify any of its attributes that are instances of pt.Transformer (or a list/tuple of Transformer), returning a dictionary where the keys are the names of the subtransformers and the values are the subtransformers themselves. If the transformer does not have any subtransformers, an empty dictionary is returned.

Return type:

Dict[str, Transformer | List[Transformer]]

Parameters:

transformer (Transformer) – The transformer to inspect.

Returns:

A dictionary of the provided transformer’s subtransformers.

Raises:

InspectError – If the subtransformers cannot be identified from the transformer.
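The default attribute scan described for subtransformers() can be pictured with the sketch below. BaseT is a local stand-in for pt.Transformer, and the demo classes are hypothetical; the real function may differ in details such as how it treats empty or mixed lists.

```python
from typing import Dict, List, Union


class BaseT:
    """Local stand-in for pt.Transformer."""


def subtransformers_sketch(obj) -> Dict[str, Union[BaseT, List[BaseT]]]:
    found = {}
    for name, value in vars(obj).items():
        if isinstance(value, BaseT):
            # a single subtransformer stored as an attribute
            found[name] = value
        elif (
            isinstance(value, (list, tuple))
            and len(value) > 0
            and all(isinstance(item, BaseT) for item in value)
        ):
            # a list/tuple made up entirely of transformers
            found[name] = list(value)
    return found


class DemoScorer(BaseT):
    pass


class DemoCache(BaseT):
    def __init__(self, scorer, path):
        self.scorer = scorer
        self.path = path  # not a transformer, so it is skipped


class DemoFusion(BaseT):
    def __init__(self, *transformers):
        self.transformers = list(transformers)


scorer_a, scorer_b = DemoScorer(), DemoScorer()
cache_subs = subtransformers_sketch(DemoCache(scorer_a, "cache.db"))
fusion_subs = subtransformers_sketch(DemoFusion(scorer_a, scorer_b))
leaf_subs = subtransformers_sketch(scorer_a)
```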

exception pyterrier.inspect.InspectError[source]

Base exception for inspection errors.

class pyterrier.inspect.TransformerAttribute(name, value, init_default_value, init_parameter_kind=None)[source]

A dataclass representing an attribute of a transformer.

Parameters:
  • name (str)

  • value (Any)

  • init_default_value (Any)

  • init_parameter_kind (_ParameterKind | None)

name

The name of the attribute.

value

The value of the attribute.

init_default_value

The default value of the attribute for the __init__ method (if available) or inspect.Parameter.empty if not available.

init_parameter_kind

The kind of the parameter in the __init__ method (if available) or None if not available.

class pyterrier.inspect.HasTransformInputs(*args, **kwargs)[source]

Protocol for transformers that provide a transform_inputs method.

transform_inputs allows for inspection of the inputs accepted by a transformer without needing to run it.

When this method is present in a Transformer object, it must return either:

  • A list of lists of input columns (i.e., a list of valid input column configurations)

  • A list of input columns (i.e., a single valid input column configuration)

If the input columns of the transformer do not depend on the instance, transform_inputs can also be an attribute with a value of type List[str] or List[List[str]].

If transform_inputs is None, it is ignored.

This method need not be present in a Transformer class - it is an optional extension; an alternative is that the input columns are determined by calling the transformer with an empty DataFrame.

Example transform_inputs function, implementing HasTransformInputs:

from typing import List, Union

import pandas as pd
import pyterrier as pt

class MyRetriever(pt.Transformer):

    def transform(self, inp: pd.DataFrame) -> pd.DataFrame:
        pt.validate.query_frame(inp, ['query'])
        # ... perform retrieval ...
        # return the same columns as inp plus docno, score, and rank. E.g., using DataFrameBuilder.

    def transform_inputs(self) -> Union[List[str], List[List[str]]]:
        # NOTE: This method isn't required in this case, since inspect will be able to infer required
        # columns from pt.validate. It's just a demonstration.
        return ['qid', 'query']

transform_inputs()[source]

Returns a list of input columns accepted by the transformer.

Return type:

List[List[str]] | List[str]

Returns:

Input column configuration(s) accepted by this transformer.

class pyterrier.inspect.HasTransformOutputs(*args, **kwargs)[source]

Protocol for transformers that provide a transform_outputs method.

transform_outputs allows for inspection of the outputs of a transformer without needing to run it.

When this method is present in a Transformer object, it must return a list of the output columns present given the provided input columns or raise an InputValidationError if the inputs are not accepted by the transformer.

This method need not be present in a Transformer class - it is an optional extension; an alternative is that the output columns are determined by calling the transformer with an empty DataFrame.

Due to the risks and maintenance burden of ensuring that transform and transform_outputs behave identically, it is recommended to implement transform_outputs only when calling the transformer with an empty DataFrame to inspect its behavior is undesirable, e.g., if calling the transformer is expensive.

Example transform_outputs function, implementing HasTransformOutputs:

from typing import List

import pandas as pd
import pyterrier as pt

class MyRetriever(pt.Transformer):

    def transform(self, inp: pd.DataFrame) -> pd.DataFrame:
        pt.validate.query_frame(inp, ['query'])
        # ... perform retrieval ...
        # return the same columns as inp plus docno, score, and rank. E.g., using DataFrameBuilder.

    def transform_outputs(self, input_columns: List[str]) -> List[str]:
        # Validate the column names the same way transform does, so both stay in sync.
        pt.validate.query_frame(input_columns, ['query'])
        return input_columns + ['docno', 'score', 'rank']

transform_outputs(input_columns)[source]

Returns a list of the output columns present given the input_columns.

The method must return exactly the same output columns as transform would given the provided input columns. If the input columns are not accepted by the transformer, the method should raise an InputValidationError (e.g., through pt.validate).

Return type:

List[str]

Parameters:

input_columns (List[str]) – A list of the columns present in the input frame.

Returns:

A list of the columns present in the output for this transformer given input_columns.

Raises:
  • pt.validate.InputValidationError – If the input columns are not accepted by the transformer.

  • pt.inspect.InspectError – If the transformer is uninspectable.

class pyterrier.inspect.HasAttributes(*args, **kwargs)[source]

Protocol for transformers that provide an attributes method.

attributes allows for identifying the attributes of a transformer without needing to traverse its attributes manually.

When this method is present in a Transformer object, it must return a list of TransformerAttribute objects, where each object represents an attribute of the transformer and corresponding metadata about how the attribute is assigned.

This method need not be present in a Transformer class - it is an optional extension.

attributes()[source]

Returns a list of attributes of the transformer.

Return type:

List[TransformerAttribute]

class pyterrier.inspect.HasApplyAttributes(*args, **kwargs)[source]

Protocol for transformers that provide an apply_attributes method.

apply_attributes returns a new transformer with updated attributes (as keyword arguments).

This method need not be present in a Transformer class - it is an optional extension.

apply_attributes(**kwargs)[source]

Returns a new transformer instance from the provided transformer and updated attributes (as keyword arguments).

Return type:

Transformer

Parameters:

kwargs (Any)

class pyterrier.inspect.HasSubtransformers(*args, **kwargs)[source]

Protocol for transformers that provide a subtransformers method.

subtransformers allows for identifying subtransformers of a transformer without needing to traverse its attributes manually.

When this method is present in a Transformer object, it must return a dict where the keys are the names of the subtransformers and the values are the subtransformers (or list of subtransformers) themselves.

This method need not be present in a Transformer class - it is an optional extension. See pyterrier.inspect.subtransformers() for the default implementation.

subtransformers()[source]

Returns a dictionary of subtransformers for the transformer.

The method must return a dictionary where the keys are the names of the subtransformers and the values are the subtransformers themselves. If the transformer does not have any subtransformers, an empty dictionary should be returned.

Return type:

Dict[str, Transformer | List[Transformer]]