pt.schematic - Visualizing Pipelines

Schematics let you visualize Transformer objects. They are especially useful for understanding the structure of complex pipelines and for checking whether the input/output specifications of individual transformers are compatible with one another.

For example, here is a schematic of a complex pipeline that uses multiple retrieval methods and query rewrites:

[Schematic: pipeline combining multiple retrieval methods and query rewrites]

In notebooks (Jupyter, Colab, etc.) schematics are rendered automatically when the output of a cell is a Transformer. You can also pass a transformer to pyterrier.schematic.draw() to get a self-contained HTML version of the schematic for rendering elsewhere.

Note

If you just want to use schematics to view the structure of a transformer or pipeline, this is all you need to know! The rest of this page provides more technical detail on how schematics are constructed and rendered.

Schematics are generated by first converting a transformer into a simple intermediate object format (SchematicDict). A SchematicDict can always be generated automatically for a transformer (using pt.inspect). Transformers can also override and extend this default behavior, customizing the appearance of the schematic, by implementing the HasSchematic protocol. SchematicDict representations are then rendered into HTML by draw().

SchematicDict

See below for the structure of the SchematicDict representation.

SchematicDict structure
SCHEMATIC = PIPELINES | PIPELINE | TRANSFORMER

PIPELINES = [PIPELINE | TRANSFORMER]

PIPELINE = {
    "type": "pipeline",
    "label": str | None,           # Short label for presentation on schematic
    "input_columns": [str],        # Overall input columns of entire pipeline
    "output_columns": [str],       # Overall output columns of entire pipeline
    "transformers": [TRANSFORMER], # List of transformers in this pipeline
}

TRANSFORMER = {
    "type": "transformer" | "indexer",
    "label": str,                        # Short label for presentation on schematic (default from .__class__.__name__)
    "name": str,                         # Full name of the transformer class for the title of the tooltip (default from .__class__.__name__)
    "input_columns": [str],              # (default from pt.inspect.transformer_inputs)
    "output_columns": [str],             # (default from pt.inspect.transformer_outputs)
    "input_validation_error": IVL | None, # Input validation error, if any (type: pt.validate.InputValidationError)
    "help_url": str | None,              # URL of documentation page (default from pt.documentation.url_for_class)
    "settings": Dict[str, Any],          # Transformer configuration to show in body of tooltip (default from pt.inspect.transformer_attributes)
    "inner_pipelines": PIPELINES | None, # Pipelines to show within this block (default from pt.inspect.subtransformers)
    "inner_pipelines_mode": "unlinked" | "linked" | "combine" | None, # How to display the inner pipelines
    "inner_pipelines_labels": [str],     # When inner_pipelines_mode="unlinked", the names to show beside each inner pipeline
}
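
The grammar above can be checked mechanically. The following is a minimal sketch of such a check in plain Python. Note that check_schematic is a hypothetical helper written for illustration and is not part of PyTerrier. It assumes that optional fields may be omitted, and it only verifies the "type" values, the nesting rules, the allowed inner_pipelines_mode values, and that indexers carry no output columns.

```python
from typing import Any, List

TRANSFORMER_TYPES = ("transformer", "indexer")
INNER_MODES = ("unlinked", "linked", "combine", None)


def check_schematic(schematic: Any) -> List[str]:
    """Return a list of problems found in a SchematicDict-like object (empty if none).

    Hypothetical helper for illustration; not a PyTerrier function.
    """
    problems: List[str] = []
    _check(schematic, "schematic", problems)
    return problems


def _check(node: Any, path: str, problems: List[str]) -> None:
    if isinstance(node, list):
        # PIPELINES: a list of pipeline and/or transformer entries
        for i, item in enumerate(node):
            _check(item, f"{path}[{i}]", problems)
        return
    if not isinstance(node, dict):
        problems.append(f"{path}: expected dict or list, got {type(node).__name__}")
        return

    node_type = node.get("type")
    if node_type == "pipeline":
        # A pipeline holds a flat list of transformer/indexer entries
        for i, item in enumerate(node.get("transformers", [])):
            item_path = f"{path}.transformers[{i}]"
            if isinstance(item, dict) and item.get("type") in TRANSFORMER_TYPES:
                _check(item, item_path, problems)
            else:
                problems.append(f"{item_path}: expected a transformer or indexer entry")
    elif node_type in TRANSFORMER_TYPES:
        if node.get("inner_pipelines_mode") not in INNER_MODES:
            problems.append(f"{path}: unknown inner_pipelines_mode")
        if node_type == "indexer" and node.get("output_columns"):
            problems.append(f"{path}: indexers should not specify output_columns")
        inner = node.get("inner_pipelines")
        if inner is not None:
            _check(inner, f"{path}.inner_pipelines", problems)
    else:
        problems.append(f"{path}: unknown type {node_type!r}")
```

An empty result means no structural problem was detected by these checks; it does not guarantee that draw() will accept the object.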

Transformers

A type="transformer" value in a SchematicDict represents a typical Transformer object. A transformer block shows a short label ("label") and its input columns ("input_columns") and output columns ("output_columns") on the schematic. A tooltip shows the class name of the transformer ("name") and its attributes ("settings"). Many of these values are obtained using pt.inspect by default. The values can be overridden by implementing the HasSchematic protocol.

Here is an example BM25 retrieval transformer schematic:

[Schematic: BM25 retrieval transformer]

Its underlying SchematicDict representation looks like this:

BM25 SchematicDict representation
{
    'type': 'transformer',
    'label': 'BM25',
    'name': 'pt.terrier.retriever.Retriever',
    'help_url': 'https://pyterrier.readthedocs.io/en/latest/terrier-retrieval.html#pyterrier.terrier.Retriever',
    'input_columns': ['qid', 'query'],
    'output_columns': ['qid', 'docid', 'docno', 'rank', 'score', 'query'],
    'settings': {
        'applypipeline': 'on',
        'bm25.b': 0.75,
        'bm25.k_1': 1.2,
        ...
    }
}

Indexers

A type="indexer" value in a SchematicDict represents an Indexer object. An indexer block shows a short label ("label") and its input columns ("input_columns") on the schematic. A tooltip shows the class name of the indexer ("name") and its attributes ("settings"). Many of these values are obtained using pt.inspect by default. The values can be overridden by implementing the HasSchematic protocol. Indexers should not have output_columns specified and should only appear on their own or as the final transformer of a pipeline.

When an indexer also implements transform() or transform_iter(), it is treated as a transformer instead of an indexer by default.
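
The rule above can be illustrated with a small sketch. The classes below are stand-ins defined only for this example (they are not the PyTerrier base classes), schematic_type is a hypothetical function, and the iterator variant of transform is spelled transform_iter here. How PyTerrier detects the override internally is not described on this page, so comparing class attributes is just one plausible approach.

```python
class Transformer:
    """Stand-in base class for this sketch (not pyterrier.Transformer)."""

    def transform(self, inp):
        raise NotImplementedError

    def transform_iter(self, inp):
        raise NotImplementedError


class Indexer(Transformer):
    """Stand-in indexer base class for this sketch (not pyterrier.Indexer)."""

    def index(self, iterable):
        raise NotImplementedError


def _overrides(obj, method_name: str) -> bool:
    # True when the object's class provides its own version of the method
    return getattr(type(obj), method_name) is not getattr(Transformer, method_name)


def schematic_type(obj) -> str:
    """Pick the "type" value for an object, following the rule described above."""
    if isinstance(obj, Indexer):
        if _overrides(obj, "transform") or _overrides(obj, "transform_iter"):
            return "transformer"  # an indexer that can also transform
        return "indexer"
    return "transformer"


class PlainIndexer(Indexer):
    def index(self, iterable):
        return "index-ref"


class IndexingTransformer(Indexer):
    def index(self, iterable):
        return "index-ref"

    def transform(self, inp):
        return inp
```

With these definitions, schematic_type(PlainIndexer()) gives "indexer", while schematic_type(IndexingTransformer()) gives "transformer".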

Here is an example indexer schematic:

[Schematic: example indexer]

Its underlying SchematicDict representation looks like this:

Indexer SchematicDict representation
{
    'type': 'indexer',
    'label': 'TerrierIndexer',
    'name': 'pt.terrier.index.IterDictIndexer',
    'help_url': None,
    'input_columns': ['docno', 'text'],
    'output_columns': None,
    'settings': {}
}

Inner Pipelines

Some transformers can contain other transformers (i.e., subtransformers). There are a few ways to display these inner pipelines in schematics, depending on how the inner pipelines are used. These are configured with the inner_pipelines_mode setting.

unlinked (default). This mode shows each inner pipeline as a separate block without linking them together. This is useful when the transformer has logic that controls how it applies its subtransformers. Each inner pipeline is labeled with the name of the subtransformer. An example is RetrieverCache, which conditionally applies its retriever based on whether the query is in the cache or not:

[Schematic: RetrieverCache with an unlinked inner pipeline]

This format is valid in all cases where a transformer has subtransformers (which is why it is the default). However, it is not always the most visually descriptive option, which is why the "linked" and "combine" modes are also available.

linked. This mode shows the inputs and outputs of the inner pipelines linked together, with the values contained in the transformer block itself. This signifies that all the pipelines are always run with the same inputs (potentially modified by the transformer first) and that the outputs of the inner pipelines are merged together. An example of this kind of pipeline is FeatureUnion:

[Schematic: FeatureUnion with linked inner pipelines]

combine. This is a special case of linked mode where the transformer runs all of its inner pipelines with the original input and then combines the outputs into a single output. An example is RRFusion, which runs multiple retrieval methods and combines their outputs into a single result set:

[Schematic: RRFusion combining the outputs of its inner pipelines]
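
In terms of the SchematicDict, the modes above differ only in a few fields of the containing transformer. The sketch below builds two hand-written examples as plain Python dicts. The labels, columns, and the block helper are illustrative assumptions, not the values PyTerrier generates for RetrieverCache or RRFusion.

```python
import json


def block(label, **extra):
    """Build a transformer entry with illustrative retrieval columns (hypothetical helper)."""
    entry = {
        "type": "transformer",
        "label": label,
        "input_columns": ["qid", "query"],
        "output_columns": ["qid", "query", "docno", "score", "rank"],
    }
    entry.update(extra)
    return entry


# "combine": every inner pipeline receives the original input and the
# outputs are combined into a single output.
fusion = block(
    "Fusion",
    inner_pipelines=[block("BM25"), block("Dense")],
    inner_pipelines_mode="combine",
)

# "unlinked": inner pipelines are shown as separate blocks, each with a
# name taken from inner_pipelines_labels.
cache = block(
    "Cache",
    inner_pipelines=[block("BM25")],
    inner_pipelines_mode="unlinked",
    inner_pipelines_labels=["retriever"],
)

print(json.dumps(fusion, indent=2))
```

Since draw() accepts a dict in SchematicDict format, objects like these can be used to preview how each mode is presented without constructing real transformers.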

Rendering in Notebooks

The pyterrier.Transformer base class implements the _repr_html_ method, which enables automatic rendering of schematics in Jupyter notebooks, Google Colab, and other notebook environments. This means that if the output of a cell is a transformer (including pipelines of transformers), its schematic will be rendered automatically as the output of the cell.

If you want to disable this behavior, you can set the PYTERRIER_DISABLE_NOTEBOOK_SCHEMATIC=1 environment variable. (This works even if PyTerrier is already imported.)
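
From within a running notebook, one way to set this variable is through os.environ, as in the minimal sketch below. Only the variable name and the value 1 come from this page.

```python
import os

# Turn off automatic schematic rendering for the current Python process.
os.environ["PYTERRIER_DISABLE_NOTEBOOK_SCHEMATIC"] = "1"
```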

Rendering in Documentation

You can render schematics directly in PyTerrier documentation using the custom .. schematic:: directive. The body of the directive should be either a Python code block that creates a transformer to render or a SchematicDict object to render. The former is useful for documenting individual transformers, while the latter is useful for demonstrative/abstract purposes, or cases where running the code to construct the transformer is too costly for documentation (e.g., if it involves loading a large neural network).

PyTerrier is imported by default, so you can use the pt shorthand.

Rendering a BM25 transformer schematic in RST-formatted documentation.
.. schematic::
    pt.terrier.TerrierIndex.example().bm25()
Rendering a simple SchematicDict in RST-formatted documentation.
.. schematic::
    {
        "type": "transformer",
        "label": "Retriever",
        "input_columns": ["qid", "query"],
        "output_columns": ["qid", "query", "docno", "score", "rank"]
    }

API Documentation

pyterrier.schematic.draw(transformer, *, outer_class=None, input_columns=None)[source]

Draws a transformer as an HTML schematic.

If the transformer is already a SchematicDict, it is drawn directly. Otherwise, the transformer is first converted to a structured schematic using transformer_schematic(), and the result is drawn.

Return type:

str

Parameters:
  • transformer (Transformer | dict) – The transformer to draw, or a dict in SchematicDict format.

  • input_columns (List[str] | None) – Optional input columns to assume for the transformer (or pipeline).

  • outer_class (str | None) – An optional CSS class to apply to the outer container of the schematic.

Returns:

An HTML string representing the schematic of the transformer.
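
Because draw() returns an HTML string, the result can be written to a file and opened in a browser. The sketch below wraps that step in a hypothetical save_schematic helper (not part of PyTerrier). The draw argument is an assumption added so the helper can be exercised without PyTerrier installed; when it is omitted, pyterrier.schematic.draw is imported lazily and used.

```python
from pathlib import Path
from typing import Any, Callable, Optional


def save_schematic(
    schematic: Any,
    path: str,
    draw: Optional[Callable[..., str]] = None,
    **draw_kwargs: Any,
) -> Path:
    """Render a transformer or SchematicDict to HTML and write it to `path`.

    Hypothetical helper for illustration. Extra keyword arguments
    (e.g. outer_class, input_columns) are forwarded to the draw function.
    """
    if draw is None:
        # Imported lazily so that this module loads without PyTerrier installed
        import pyterrier as pt
        draw = pt.schematic.draw
    html = draw(schematic, **draw_kwargs)
    out = Path(path)
    out.write_text(html, encoding="utf-8")
    return out
```

For example, with PyTerrier installed, save_schematic({"type": "transformer", "label": "Retriever"}, "schematic.html", outer_class="docs") would write the rendered schematic to schematic.html; the class name "docs" is arbitrary.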

class pyterrier.schematic.HasSchematic[source]

Protocol for transformers to override details about their schematic representation.

This is an optional extension interface to pyterrier.Transformer that allows transformers to provide customizations to their schematics.

schematic(*, input_columns)[source]

Returns a structured schematic representation of the transformer.

The schematic should be a dictionary that follows the structure defined in pt.schematic.

For ease of use, the method can optionally return only some of the fields of the schematic; any missing fields will be filled in with default values.

schematic can also be defined as an instance or class attribute when the values do not need to be computed on the fly (e.g., when only overriding the schematic label). When schematic is not callable, its dict value is used directly as the schematic.

Return type:

Dict[str, Any]

Parameters:

input_columns (List[str] | None) – The input columns of the transformer, used to determine schematic fields such as the output columns.

Returns:

A dictionary representing the schematic of the transformer, which will be used to draw the schematic diagram.
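
The two forms described above (a schematic method and a non-callable schematic attribute) can be sketched as follows. The classes are stand-ins that do not subclass pyterrier.Transformer so that the example runs on its own, and resolve_schematic is a hypothetical function that only approximates how partial schematics might be merged with defaults. The actual default values come from pt.inspect and are not reproduced here.

```python
from typing import Any, Dict, List, Optional


class LabelOnly:
    """Stand-in transformer that overrides only the label, via a class attribute."""

    schematic = {"label": "MyRanker"}


class TopK:
    """Stand-in transformer that computes part of its schematic on the fly."""

    def __init__(self, num_results: int = 10):
        self.num_results = num_results

    def schematic(self, *, input_columns: Optional[List[str]]) -> Dict[str, Any]:
        # Only some fields are returned; the rest are left to the defaults
        return {
            "label": f"Top{self.num_results}",
            "settings": {"num_results": self.num_results},
        }


def resolve_schematic(obj: Any, input_columns: Optional[List[str]] = None) -> Dict[str, Any]:
    """Merge an object's custom schematic fields over simple defaults (illustrative only)."""
    defaults: Dict[str, Any] = {
        "type": "transformer",
        "label": type(obj).__name__,
        "name": type(obj).__name__,
        "input_columns": input_columns,
        "settings": {},
    }
    custom = getattr(obj, "schematic", None)
    if callable(custom):
        # Method form: call it with the known input columns
        custom = custom(input_columns=input_columns)
    # Attribute form: the dict is used as-is
    return {**defaults, **(custom or {})}
```

In this sketch, resolve_schematic(LabelOnly()) keeps every default except the label, while resolve_schematic(TopK(5), input_columns=["qid", "query"]) additionally replaces the settings.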