Writing Custom Transformers

Note

This page is a work in progress.

Pipeline Optimization

Pipelines can be optimized using pyterrier.Transformer.compile(). You can implement your own optimizations by overriding this method. For instance, a pseudo-relevance feedback method that only uses the top fb_docs documents per query can rewrite itself as a pipeline that starts with a RankCutoff transformer, as follows:

Optimizing a pseudo-relevance feedback transformer by implementing compile().
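import pyterrier as pt
class MyPrf(pt.Transformer):
    ...  # other methods elided; assumes self.fb_docs holds the number of feedback documents per query
    def compile(self) -> pt.Transformer:
        # Make the implicit cutoff explicit, so that the transformer before this one
        # can fuse with the RankCutoff and produce only fb_docs documents per query.
        return pt.RankCutoff(self.fb_docs) >> self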

Why is this helpful? RankCutoff knows how to combine (“fuse”) itself with any preceding transformer that can reduce its computation when told how many documents the next step requires. For instance, most retrievers can reduce their computational cost by retrieving fewer documents (a smaller top k) per query.
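To make this concrete, the following is a toy, pure-Python model of the rewrite. It does not use PyTerrier: ToyRetriever, ToyRankCutoff, ToyPrf and compile_pipeline are hypothetical stand-ins, and pipelines are modelled as plain lists of stages. ToyPrf.compile() mirrors MyPrf.compile() by prepending a cutoff, and the retriever exposes a fuse_rank_cutoff() hook like the one described in this section, so the cutoff collapses into a retriever that only returns fb_docs documents.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

Results = List[Tuple[str, float]]  # (docno, score) pairs for a single query


@dataclass(frozen=True)
class ToyRetriever:
    # Hypothetical retriever over a fixed corpus of (docno, score) pairs.
    corpus: Tuple[Tuple[str, float], ...]
    num_results: int = 1000

    def transform(self, results: Results) -> Results:
        # Ignores its input and "retrieves" the top num_results documents.
        ranked = sorted(self.corpus, key=lambda d: d[1], reverse=True)
        return ranked[: self.num_results]

    def fuse_rank_cutoff(self, k: int) -> Optional["ToyRetriever"]:
        # Only fuse when it reduces the number of results.
        if self.num_results > k:
            return replace(self, num_results=k)
        return None


@dataclass(frozen=True)
class ToyRankCutoff:
    k: int

    def transform(self, results: Results) -> Results:
        return results[: self.k]


@dataclass(frozen=True)
class ToyPrf:
    fb_docs: int

    def transform(self, results: Results) -> Results:
        # Stand-in for PRF: a real model would expand the query from these
        # documents; here we only show that just the top fb_docs are used.
        return results[: self.fb_docs]

    def compile(self) -> list:
        # Mirrors MyPrf.compile(): make the implicit cutoff explicit.
        return [ToyRankCutoff(self.fb_docs), self]


def compile_pipeline(stages: list) -> list:
    # 1) Let each stage rewrite itself (possibly into several stages).
    expanded: list = []
    for stage in stages:
        expanded.extend(stage.compile() if hasattr(stage, "compile") else [stage])
    # 2) Fuse each cutoff into the stage just before it, when that stage supports it.
    out: list = []
    for stage in expanded:
        prev = out[-1] if out else None
        if isinstance(stage, ToyRankCutoff) and hasattr(prev, "fuse_rank_cutoff"):
            fused = prev.fuse_rank_cutoff(stage.k)
            if fused is not None:
                out[-1] = fused
                continue
        out.append(stage)
    return out


def run(stages: list) -> Results:
    results: Results = []
    for stage in stages:
        results = stage.transform(results)
    return results


corpus = tuple((f"d{i}", 1.0 / i) for i in range(1, 21))
pipeline = [ToyRetriever(corpus), ToyPrf(fb_docs=3)]
compiled = compile_pipeline(pipeline)
print([type(s).__name__ for s in compiled], compiled[0].num_results)  # ['ToyRetriever', 'ToyPrf'] 3
assert run(compiled) == run(pipeline)  # same output, but only 3 documents retrieved
```

The explicit cutoff adds no work by itself; its value is that it gives the preceding stage a place to learn how many documents are actually needed.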

This functionality is facilitated by the SupportsFuseRankCutoff protocol, which defines the fuse_rank_cutoff() method. Implement this method if your transformer can benefit from being combined with a RankCutoff transformer that follows it.
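Because a protocol describes structure rather than inheritance, a transformer satisfies it simply by defining a method with the right name. The sketch below is a standalone mirror written with the standard library's typing.Protocol; SupportsFuseRankCutoffSketch, FusableRetriever and PlainReranker are hypothetical names, not imported from PyTerrier.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class SupportsFuseRankCutoffSketch(Protocol):
    """Hypothetical standalone mirror of the protocol described above."""

    def fuse_rank_cutoff(self, k: int) -> Optional[object]:
        """Return an equivalent transformer producing at most k results per query,
        or None if fusion is not possible or not beneficial."""
        ...


class FusableRetriever:
    # No base class needed: having fuse_rank_cutoff is enough.
    def __init__(self, num_results: int = 1000):
        self.num_results = num_results

    def fuse_rank_cutoff(self, k: int) -> Optional["FusableRetriever"]:
        if self.num_results > k:
            return FusableRetriever(num_results=k)
        return None


class PlainReranker:
    # No fuse_rank_cutoff method, so it does not satisfy the protocol.
    pass


print(isinstance(FusableRetriever(), SupportsFuseRankCutoffSketch))  # True
print(isinstance(PlainReranker(), SupportsFuseRankCutoffSketch))     # False
```

Note that a runtime_checkable isinstance() check only tests that the method exists; it does not verify its signature or return type.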

Implementing fuse_rank_cutoff to allow combining with RankCutoff.
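from typing import Optional
import pyterrier as pt
class MyRetriever(pt.Transformer):
    ...  # other methods elided; assumes self.num_results holds the number of results per query
    def fuse_rank_cutoff(self, k: int) -> Optional[pt.Transformer]:
        # Only fuse when it reduces the number of results; otherwise the output would change.
        if self.num_results > k:
            return pt.inspect.transformer_apply_attributes(self, num_results=k)
        return None  # no fusion: the RankCutoff stays in the pipeline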

Hint

transformer_apply_attributes() lets you easily construct a new transformer with some attributes replaced (here, num_results). This is especially handy when your transformer has many attributes.
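For intuition only: for a plain Python dataclass, the standard library offers the same "copy with some attributes replaced" idea through dataclasses.replace(). This is an analogy, not how transformer_apply_attributes() is implemented; HypotheticalRetriever and its fields are made up for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HypotheticalRetriever:
    # Made-up attributes standing in for a transformer's configuration.
    index_path: str
    wmodel: str = "BM25"
    num_results: int = 1000
    verbose: bool = False


original = HypotheticalRetriever("./index", wmodel="DPH", verbose=True)
# Copy every attribute, overriding only num_results.
smaller = replace(original, num_results=10)

print(smaller)
# HypotheticalRetriever(index_path='./index', wmodel='DPH', num_results=10, verbose=True)
print(original.num_results)  # 1000: the original object is left unchanged
```

Leaving the original object untouched is convenient, since the unfused pipeline it belongs to may still be in use elsewhere.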

Caution

The result of a fusion method should be functionally equivalent to the original transformer(s) it replaces. Without the “if self.num_results > k:” check above, a retriever with num_results < k would be rewritten to return k results per query, whereas the original pipeline returns only num_results.
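A simple way to check this requirement is to compare fused and unfused outputs with num_results below, at, and above the cutoff k. The toy model below (ToyRetriever, fused and unfused are hypothetical, pure-Python stand-ins, not PyTerrier code) shows that the guarded version always matches, while dropping the guard changes the output when num_results < k.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ToyRetriever:
    # Hypothetical retriever over a fixed list of (docno, score) pairs.
    corpus: Tuple[Tuple[str, float], ...]
    num_results: int

    def transform(self) -> List[Tuple[str, float]]:
        ranked = sorted(self.corpus, key=lambda d: d[1], reverse=True)
        return ranked[: self.num_results]

    def fuse_guarded(self, k: int) -> Optional["ToyRetriever"]:
        # Correct: only fuse when the cutoff actually shrinks the result list.
        return replace(self, num_results=k) if self.num_results > k else None

    def fuse_unguarded(self, k: int) -> Optional["ToyRetriever"]:
        # Buggy: may *raise* num_results, returning more documents than before.
        return replace(self, num_results=k)


def unfused(retriever: ToyRetriever, k: int) -> list:
    # Reference behavior: the retriever followed by a rank cutoff of k.
    return retriever.transform()[:k]


def fused(retriever: ToyRetriever, k: int, fuse) -> list:
    candidate = fuse(retriever, k)
    # None means "no fusion", so the original retriever-then-cutoff pipeline is kept.
    return unfused(retriever, k) if candidate is None else candidate.transform()


corpus = tuple((f"d{i}", 1.0 / i) for i in range(1, 11))
k = 5
for num_results in (3, 5, 8):  # below, at, and above the cutoff
    retriever = ToyRetriever(corpus, num_results)
    assert fused(retriever, k, ToyRetriever.fuse_guarded) == unfused(retriever, k)

small = ToyRetriever(corpus, 3)
print(len(unfused(small, k)), len(fused(small, k, ToyRetriever.fuse_unguarded)))  # 3 5
```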

Several transformers implement compile() so that they can be combined (“fused”) with other transformers. When writing your own transformer, consider implementing the following protocols so that it can be fused as well:

If your transformer benefits from…                                    Consider implementing…
--------------------------------------------------------------------  ----------------------
Returning fewer results per query                                     fuse_rank_cutoff
Combining with a known transformer before it in a pipeline            fuse_left
Combining with a known transformer after it in a pipeline             fuse_right
Computing multiple scores/features at once (instead of individually)  fuse_feature_union
Other arbitrary optimizations                                         compile
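As a rough illustration of the fuse_left and fuse_right rows in the table above, the toy model below fuses adjacent score-rescaling stages. It assumes that fuse_right(right) is called on the earlier transformer and fuse_left(left) on the later one, each returning a single equivalent transformer or None. This matches the table's description but is not PyTerrier's actual compile logic; Stage, Scale, Shift and compile_stages are hypothetical.

```python
from typing import List, Optional


class Stage:
    """Hypothetical minimal transformer: maps a list of scores to a new list."""

    def transform(self, scores: List[float]) -> List[float]:
        raise NotImplementedError

    def fuse_left(self, left: "Stage") -> Optional["Stage"]:
        return None  # by default, cannot absorb the stage before it

    def fuse_right(self, right: "Stage") -> Optional["Stage"]:
        return None  # by default, cannot absorb the stage after it


class Scale(Stage):
    def __init__(self, factor: float):
        self.factor = factor

    def transform(self, scores):
        return [s * self.factor for s in scores]

    def fuse_right(self, right):
        # Scale(a) followed by Scale(b) is equivalent to Scale(a * b).
        return Scale(self.factor * right.factor) if isinstance(right, Scale) else None


class Shift(Stage):
    def __init__(self, offset: float):
        self.offset = offset

    def transform(self, scores):
        return [s + self.offset for s in scores]

    def fuse_left(self, left):
        # Shift(a) followed by Shift(b) is equivalent to Shift(a + b).
        return Shift(left.offset + self.offset) if isinstance(left, Shift) else None


def compile_stages(stages: List[Stage]) -> List[Stage]:
    out: List[Stage] = []
    for stage in stages:
        while out:
            # Ask the earlier stage first, then the later one.
            fused = out[-1].fuse_right(stage)
            if fused is None:
                fused = stage.fuse_left(out[-1])
            if fused is None:
                break
            out.pop()      # the previous stage is absorbed into `fused`,
            stage = fused  # which may in turn fuse with the stage before it
        out.append(stage)
    return out


def run(stages: List[Stage], scores: List[float]) -> List[float]:
    for stage in stages:
        scores = stage.transform(scores)
    return scores


pipeline = [Scale(2.0), Scale(3.0), Shift(1.0), Shift(0.5)]
compiled = compile_stages(pipeline)
print([type(s).__name__ for s in compiled])  # ['Scale', 'Shift']
print(run(compiled, [1.0, 2.0]))             # [7.5, 13.5]
```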

Supporting Inspection

pt.inspect allows users to gather information about live transformer objects, such as their input/output specifications. This is useful for tasks like pipeline validation or drawing schematic diagrams of pipelines. The default implementations of these inspection methods usually work well, but you may occasionally need to override them to handle idiosyncratic cases.
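To show why input/output specifications make pipeline validation possible, the toy model below propagates column names through a pipeline and reports the first stage whose required columns are missing. ToyStage, validate and the column names are hypothetical; this is not the pt.inspect API.

```python
from typing import List, Optional


class ToyStage:
    """Hypothetical transformer description: required input columns and added columns."""

    def __init__(self, name: str, requires: set, adds: set):
        self.name = name
        self.requires = frozenset(requires)
        self.adds = frozenset(adds)


def validate(stages: List[ToyStage], initial_columns: set) -> Optional[str]:
    """Return an error message for the first stage with missing inputs, else None."""
    columns = frozenset(initial_columns)
    for stage in stages:
        missing = stage.requires - columns
        if missing:
            return f"{stage.name} is missing columns: {sorted(missing)}"
        columns = columns | stage.adds  # columns available to later stages
    return None


retriever = ToyStage("retriever", {"qid", "query"}, {"docno", "score", "rank"})
text_loader = ToyStage("text_loader", {"docno"}, {"text"})
reranker = ToyStage("reranker", {"qid", "query", "docno", "text"}, {"score"})

print(validate([retriever, text_loader, reranker], {"qid", "query"}))  # None
print(validate([retriever, reranker], {"qid", "query"}))
# reranker is missing columns: ['text']
```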

You can override the behavior of the following methods by implementing Python protocols (in these cases, this simply means adding a method with a specific signature that implements the same functionality for your transformer).