.. _pyterrier.schematic:

pt.schematic - Visualizing Pipelines
-----------------------------------------------------

Schematics let you visualize :class:`~pyterrier.Transformer` objects. They are especially useful for
understanding the structure of complex pipelines and checking whether the input/output specifications of
individual transformers are compatible with one another.

For example, here is a schematic of a complex pipeline that uses multiple retrieval methods and query rewrites:

.. schematic::
    import pyterrier_alpha as pta
    index = pt.terrier.TerrierIndex.example()
    dataset = pt.get_dataset('irds:vaswani')
    pta.fusion.RRFusion(
        index.bm25(),
        pt.rewrite.SDM() >> index.bm25(),
        index.bm25() >> pt.rewrite.RM3(index.index_ref()) >> index.bm25(),
    ) >> dataset.text_loader()

In notebooks (Jupyter, Colab, etc.) schematics are rendered automatically when the output of a cell is a
:class:`~pyterrier.Transformer`. You can also pass a transformer to :func:`pyterrier.schematic.draw`
to get a self-contained HTML version of the schematic for rendering elsewhere.

.. note::
    If you just want to use schematics to view the structure of a transformer or pipeline, this is all you need to know!
    The rest of this page provides more technical detail on how schematics are constructed and rendered.

Schematics are generated by first converting a transformer into an intermediate simple object format (``SchematicDict``).
Transformers can always have their corresponding ``SchematicDict`` representation generated automatically (using :ref:`pt.inspect <pyterrier.inspect>`).
They can also override and extend the default behavior to customize the appearance of the schematic by implementing the
:class:`~pyterrier.schematic.HasSchematic` protocol. ``SchematicDict`` representations are then rendered into HTML by
:func:`~pyterrier.schematic.draw`.

``SchematicDict``
============================================================

See below for the structure of the ``SchematicDict`` representation.

.. code-block:: text
    :caption: SchematicDict structure

    SCHEMATIC = PIPELINES | PIPELINE | TRANSFORMER

    PIPELINES = [PIPELINE | TRANSFORMER]

    PIPELINE = {
        "type": "pipeline",
        "label": str | None,           # Short label for presentation on schematic
        "input_columns": [str],        # Overall input columns of entire pipeline
        "output_columns": [str],       # Overall output columns of entire pipeline
        "transformers": [TRANSFORMER], # List of transformers in this pipeline
    }

    TRANSFORMER = {
        "type": "transformer" | "indexer",
        "label": str,                        # Short label for presentation on schematic (default from .__class__.__name__)
        "name": str,                         # Full name of the transformer class for the title of the tooltip (default from .__class__.__name__)
        "input_columns": [str],              # (default from pt.inspect.transformer_inputs)
        "output_columns": [str],             # (default from pt.inspect.transformer_outputs)
        "input_validation_error": IVL | None, # Input validation error, if any (type: pt.validate.InputValidationError)
        "help_url": str | None,              # URL of documentation page (default from pt.documentation.url_for_class)
        "settings": Dict[str, Any],          # Transformer configuration to show in body of tooltip (default from pt.inspect.transformer_attributes)
        "inner_pipelines": PIPELINES | None, # Pipelines to show within this block (default from pt.inspect.subtransformers)
        "inner_pipelines_mode": "unlinked" | "linked" | "combine" | None, # How to display the inner pipelines
        "inner_pipelines_labels": [str],     # When inner_pipelines_mode="unlinked", the names to show beside each inner pipeline
    }

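Because a ``SchematicDict`` is built only from plain lists and dictionaries, it can be inspected without importing
PyTerrier. The following is a minimal sketch that decides which of the three top-level forms a value takes. It assumes
only the structure shown above; ``schematic_kind`` is a hypothetical helper for illustration and is not part of PyTerrier.

.. code-block:: python
    :caption: Sketch: classifying a ``SchematicDict`` value (hypothetical helper)

    def schematic_kind(schematic):
        """Return "PIPELINES", "PIPELINE" or "TRANSFORMER" for a SchematicDict value."""
        if isinstance(schematic, list):
            # PIPELINES = [PIPELINE | TRANSFORMER]: lists may not nest directly
            for item in schematic:
                if schematic_kind(item) == "PIPELINES":
                    raise ValueError("PIPELINES cannot be nested directly inside PIPELINES")
            return "PIPELINES"
        if not isinstance(schematic, dict):
            raise ValueError(f"expected a list or dict, got {type(schematic).__name__}")
        block_type = schematic.get("type")
        if block_type == "pipeline":
            # A PIPELINE holds a flat list of TRANSFORMER blocks
            for block in schematic.get("transformers", []):
                if schematic_kind(block) != "TRANSFORMER":
                    raise ValueError("a PIPELINE may only contain TRANSFORMER blocks")
            return "PIPELINE"
        if block_type in ("transformer", "indexer"):
            # Indexers use the same TRANSFORMER structure with a different "type"
            return "TRANSFORMER"
        raise ValueError(f"unknown schematic type: {block_type!r}")

    # Hand-written example values (not produced by PyTerrier)
    retriever = {
        "type": "transformer",
        "label": "Retriever",
        "input_columns": ["qid", "query"],
        "output_columns": ["qid", "query", "docno", "score", "rank"],
    }
    pipeline = {"type": "pipeline", "label": None, "transformers": [retriever]}

    print(schematic_kind(retriever))
    print(schematic_kind(pipeline))
    print(schematic_kind([pipeline, retriever]))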

Transformers
============================================================

A ``type="transformer"`` value in a ``SchematicDict`` represents a typical :class:`~pyterrier.Transformer` object. A transformer block
shows a short label (``"label"``) and its input columns (``"input_columns"``) and output columns (``"output_columns"``) on the schematic.
A tooltip shows the class name of the transformer (``"name"``) and its attributes (``"settings"``). Many of these values are obtained using
:ref:`pt.inspect <pyterrier.inspect>` by default. The values can be overridden by implementing the :class:`~pyterrier.schematic.HasSchematic`
protocol.

Here is an example BM25 retrieval transformer schematic:

.. schematic::
    pt.terrier.TerrierIndex.example().bm25()

Its underlying ``SchematicDict`` representation looks like this:

.. code-block:: python
    :caption: BM25 ``SchematicDict`` representation

    {
        'type': 'transformer',
        'label': 'BM25',
        'name': 'pt.terrier.retriever.Retriever',
        'help_url': 'https://pyterrier.readthedocs.io/en/latest/terrier-retrieval.html#pyterrier.terrier.Retriever',
        'input_columns': ['qid', 'query'],
        'output_columns': ['qid', 'docid', 'docno', 'rank', 'score', 'query'],
        'settings': {
            'applypipeline': 'on',
            'bm25.b': 0.75,
            'bm25.k_1': 1.2,
            ...
        }
    }

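Since ``"input_columns"`` and ``"output_columns"`` are plain lists, the compatibility of adjacent blocks can be
checked directly on their ``SchematicDict`` representations. The sketch below is a simplified illustration, not the
validation logic PyTerrier itself uses; ``missing_inputs`` and the ``Reranker`` block are hypothetical.

.. code-block:: python
    :caption: Sketch: finding input columns that the previous block does not provide

    def missing_inputs(transformers):
        """For each adjacent pair, report input columns absent from the previous block's outputs."""
        problems = []
        for prev, cur in zip(transformers, transformers[1:]):
            available = set(prev.get("output_columns") or [])
            missing = [c for c in (cur.get("input_columns") or []) if c not in available]
            if missing:
                problems.append((cur["label"], missing))
        return problems

    # Columns taken from the BM25 example above
    bm25 = {
        "type": "transformer",
        "label": "BM25",
        "input_columns": ["qid", "query"],
        "output_columns": ["qid", "docid", "docno", "rank", "score", "query"],
    }
    # Hypothetical block that needs document text, which BM25 does not output here
    reranker = {
        "type": "transformer",
        "label": "Reranker",
        "input_columns": ["qid", "query", "docno", "text"],
        "output_columns": ["qid", "query", "docno", "score", "rank"],
    }

    print(missing_inputs([bm25, reranker]))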

Indexers
============================================================

A ``type="indexer"`` value in a ``SchematicDict`` represents a :class:`~pyterrier.Indexer` object. An indexer block
shows a short label (``"label"``) and its input columns (``"input_columns"``) on the schematic.
A tooltip shows the class name of the indexer (``"name"``) and its attributes (``"settings"``). Many of these values are obtained using
:ref:`pt.inspect <pyterrier.inspect>` by default. The values can be overridden by implementing the :class:`~pyterrier.schematic.HasSchematic`
protocol. Indexers should not have ``output_columns`` specified and should only appear on their own or as the final transformer of a pipeline.

When an indexer also implements ``transform()`` or ``transform_iter()``, it is treated as a transformer rather than an indexer by default.

Here is an example indexer schematic:

.. schematic::
    pt.terrier.TerrierIndex('./test').indexer()

Its underlying ``SchematicDict`` representation looks like this:

.. code-block:: python
    :caption: Indexer ``SchematicDict`` representation

    {
        'type': 'indexer',
        'label': 'TerrierIndexer',
        'name': 'pt.terrier.index.IterDictIndexer',
        'help_url': None,
        'input_columns': ['docno', 'text'],
        'output_columns': None,
        'settings': {}
    }

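The two constraints on indexers described above (no ``output_columns``, and placement only at the end of a pipeline)
can be expressed as a small check over a ``PIPELINE`` dictionary. This is a sketch for illustration only;
``indexer_problems`` and the ``TextLoader`` block are hypothetical and not part of PyTerrier.

.. code-block:: python
    :caption: Sketch: checking indexer placement in a pipeline

    def indexer_problems(pipeline):
        """Return a list of messages describing misplaced or malformed indexer blocks."""
        problems = []
        blocks = pipeline["transformers"]
        for i, block in enumerate(blocks):
            if block.get("type") != "indexer":
                continue
            if block.get("output_columns"):
                problems.append(f"{block['label']}: indexers should not specify output_columns")
            if i != len(blocks) - 1:
                problems.append(f"{block['label']}: indexer is not the final block")
        return problems

    # Values taken from the indexer example above
    indexer = {
        "type": "indexer",
        "label": "TerrierIndexer",
        "input_columns": ["docno", "text"],
        "output_columns": None,
    }
    # Hypothetical transformer that adds a text column
    loader = {
        "type": "transformer",
        "label": "TextLoader",
        "input_columns": ["docno"],
        "output_columns": ["docno", "text"],
    }

    good = {"type": "pipeline", "label": None, "transformers": [loader, indexer]}
    bad = {"type": "pipeline", "label": None, "transformers": [indexer, loader]}

    print(indexer_problems(good))
    print(indexer_problems(bad))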

Inner Pipelines
============================================================

Some transformers can contain other transformers (i.e., subtransformers). There are a few ways to display these
inner pipelines in schematics, depending on how the inner pipeline is used. These are configured with the
``inner_pipelines_mode`` setting.

**unlinked** (default). This mode shows each inner pipeline as a separate block without linking
them together. This is useful when the transformer has logic that controls how it applies its subtransformers. Each
inner pipeline is labeled with the name of the subtransformer. An example is :class:`~pyterrier_caching.RetrieverCache`,
which conditionally applies its ``retriever`` based on whether the query is in the cache or not:

.. schematic::
    import pyterrier_caching
    retr = pt.terrier.TerrierIndex.example().bm25()
    pyterrier_caching.RetrieverCache("/tmp/cache", retriever=retr)

This format is valid in all cases where a transformer has subtransformers (which is why it is the default). However, it may not
be the most visually descriptive for all cases, which is why ``"linked"`` and ``"combine"`` modes are also available.

**linked**. This mode shows the inputs and outputs of the inner pipelines linked together, with
the values contained in the transformer block itself. This signifies that all the pipelines are always run with the same
inputs (potentially modified by the transformer first) and that the outputs of the inner pipelines are merged together.
An example of this kind of pipeline is :class:`~pyterrier._ops.FeatureUnion`:

.. schematic::
    index = pt.terrier.TerrierIndex.example()
    index.bm25() ** index.dph()

**combine**. This is a special case of ``linked`` mode where the transformer runs all of its inner
pipelines with the original input and then combines the outputs into a single output. An example is :class:`~pyterrier_alpha.fusion.RRFusion`,
which runs multiple retrieval methods and combines their outputs into a single result set:

.. schematic::
    import pyterrier_alpha as pta
    index = pt.terrier.TerrierIndex.example()
    dataset = pt.get_dataset('irds:vaswani')
    pta.fusion.RRFusion(
        index.bm25(),
        pt.rewrite.SDM() >> index.bm25(),
        index.bm25() >> pt.rewrite.RM3(index.index_ref()) >> index.bm25(),
    )

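Whatever the display mode, inner pipelines are stored as a nested ``PIPELINES`` list under ``"inner_pipelines"``,
so a ``SchematicDict`` forms a tree. The sketch below walks that tree and reports each block's label together with
its nesting depth. The ``fusion`` dictionary is hand-written for illustration (it is not the exact output PyTerrier
produces for ``RRFusion``), and ``iter_blocks`` is a hypothetical helper.

.. code-block:: python
    :caption: Sketch: walking a ``SchematicDict`` including inner pipelines

    def iter_blocks(schematic, depth=0):
        """Yield (depth, label) for every transformer block, descending into inner pipelines."""
        if isinstance(schematic, list):
            # PIPELINES: visit each alternative at the same depth
            for item in schematic:
                yield from iter_blocks(item, depth)
        elif schematic.get("type") == "pipeline":
            for block in schematic.get("transformers", []):
                yield from iter_blocks(block, depth)
        else:
            yield depth, schematic.get("label")
            inner = schematic.get("inner_pipelines")
            if inner:
                yield from iter_blocks(inner, depth + 1)

    bm25 = {"type": "transformer", "label": "BM25"}
    sdm = {"type": "transformer", "label": "SDM"}
    fusion = {
        "type": "transformer",
        "label": "RRFusion",
        "inner_pipelines_mode": "combine",
        "inner_pipelines": [
            bm25,
            {"type": "pipeline", "label": None, "transformers": [sdm, bm25]},
        ],
    }

    for depth, label in iter_blocks(fusion):
        print("  " * depth + label)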

Rendering in Notebooks
=================================================================

The :class:`pyterrier.Transformer` base class implements the ``_repr_html_`` method, which
enables automatic rendering of schematics in Jupyter notebooks, Google Colab, and other notebook environments.
This means that if the output of a cell is a transformer (including pipelines of transformers), its schematic
will be rendered automatically as the output of the cell.

If you want to disable this behavior, you can set the ``PYTERRIER_DISABLE_NOTEBOOK_SCHEMATIC=1`` environment
variable. (This works even if PyTerrier is already imported.)

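As a small sketch, the variable can also be set from within Python, for instance in the first cell of a notebook.
Only the variable name and the value ``1`` come from the description above; setting it through ``os.environ`` is
simply one convenient way to do so.

.. code-block:: python
    :caption: Sketch: disabling automatic schematic rendering from Python

    import os

    # Turn off automatic schematic rendering for transformers shown as cell output
    os.environ["PYTERRIER_DISABLE_NOTEBOOK_SCHEMATIC"] = "1"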

Rendering in Documentation
=================================================================

You can render schematics directly in PyTerrier documentation using the custom ``.. schematic::`` directive. The body
of the directive should be either a Python code block that creates a transformer to render or a ``SchematicDict`` object
to render. The former is useful for documenting individual transformers, while the latter is useful for demonstrative/abstract
purposes, or cases where running the code to construct the transformer is too costly for documentation (e.g., if it involves
loading a large neural network).

PyTerrier is imported by default, so you can use the ``pt`` shorthand.

.. code-block:: text
    :caption: Rendering a BM25 transformer schematic in RST-formatted documentation.

    .. schematic::
        pt.terrier.TerrierIndex.example().bm25()


.. schematic::
    pt.terrier.TerrierIndex.example().bm25()


.. code-block:: text
    :caption: Rendering a simple SchematicDict in RST-formatted documentation.

    .. schematic::
        {
            "type": "transformer",
            "label": "Retriever",
            "input_columns": ["qid", "query"],
            "output_columns": ["qid", "query", "docno", "score", "rank"]
        }

.. schematic::
    {
        "type": "transformer",
        "label": "Retriever",
        "input_columns": ["qid", "query"],
        "output_columns": ["qid", "query", "docno", "score", "rank"]
    }


API Documentation
============================================================

.. autofunction:: pyterrier.schematic.draw

.. autoclass:: pyterrier.schematic.HasSchematic()
    :members: