pt.schematic - Visualizing Pipelines¶

Schematics let you visualize Transformer objects. They are especially useful for understanding the structure of complex pipelines and checking whether the input/output specifications of individual transformers are compatible with one another.

For example, here is a schematic of a complex pipeline that uses multiple retrieval methods and query rewrites:

[Interactive schematic. Input columns: qid, query. Blocks, in order: BM25 (pt.terrier.retriever.Retriever); SDM (pt.terrier.rewrite.SDM), BM25; BM25, RM3 (pt.terrier.rewrite.RM3), BM25; RRF (pyterrier_alpha.fusion.RRFusion, k=60, num_results=1000); TextLoader (pt.datasets._irds.IRDSTextLoader, dataset vaswani, fields ['text']). The final output adds a text column to the fused ranking.]

In notebooks (Jupyter, Colab, etc.) schematics are rendered automatically when the output of a cell is a Transformer. You can also pass a transformer to pyterrier.schematic.draw() to get a self-contained HTML version of the schematic for rendering elsewhere.
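As a minimal sketch of rendering elsewhere, the helper below writes the HTML string returned by `pyterrier.schematic.draw()` to a file. The helper name `save_schematic` and the file path are illustrative assumptions, not part of PyTerrier, and the sketch assumes an installed PyTerrier version that provides `pt.schematic`.

```python
from pathlib import Path


def save_schematic(schematic, path, *, outer_class=None):
    """Write the HTML schematic of a transformer (or SchematicDict) to a file.

    `save_schematic` is a hypothetical helper, not a PyTerrier function.
    """
    # Imported lazily so this module can be loaded without PyTerrier installed.
    import pyterrier as pt

    # draw() returns the schematic as an HTML string.
    html = pt.schematic.draw(schematic, outer_class=outer_class)
    Path(path).write_text(html, encoding="utf-8")
    return html
```

Calling `save_schematic(pipeline, "pipeline.html")` with any transformer or pipeline would then produce a file that can be opened in a browser or embedded in another page.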

Note

If you just want to use schematics to view the structure of a transformer or pipeline, this is all you need to know! The rest of this page provides more technical detail on how schematics are constructed and rendered.

Schematics are generated by first converting a transformer into an intermediate simple object format (SchematicDict). Transformers can always have their corresponding SchematicDict representation generated automatically (using pt.inspect). They can also override and extend the default behavior to customize the appearance of the schematic by implementing the HasSchematic protocol. SchematicDict representations are then rendered into HTML by draw().

`SchematicDict`¶

See below for the structure of the SchematicDict representation.

SchematicDict structure¶

SCHEMATIC = PIPELINES | PIPELINE | TRANSFORMER

PIPELINES = [PIPELINE | TRANSFORMER]

PIPELINE = {
    "type": "pipeline",
    "label": str | None,           # Short label for presentation on schematic
    "input_columns": [str],        # Overall input columns of entire pipeline
    "output_columns": [str],       # Overall output columns of entire pipeline
    "transformers": [TRANSFORMER], # List of transformers in this pipeline
}

TRANSFORMER = {
    "type": "transformer" | "indexer",
    "label": str,                        # Short label for presentation on schematic (default from .__class__.__name__)
    "name": str,                         # Full name of the transformer class for the title of the tooltip (default from .__class__.__name__)
    "input_columns": [str],              # (default from pt.inspect.transformer_inputs)
    "output_columns": [str],             # (default from pt.inspect.transformer_outputs)
    "input_validation_error": IVL | None, # Input validation error, if any (type: pt.validate.InputValidationError)
    "help_url": str | None,              # URL of documentation page (default from pt.documentation.url_for_class)
    "settings": Dict[str, Any],          # Transformer configuration to show in body of tooltip (default from pt.inspect.transformer_attributes)
    "inner_pipelines": PIPELINES | None, # Pipelines to show within this block (default from pt.inspect.subtransformers)
    "inner_pipelines_mode": "unlinked" | "linked" | "combine" | None, # How to display the inner pipelines
    "inner_pipelines_labels": [str],     # When inner_pipelines_mode="unlinked", the names to show beside each inner pipeline
}
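
The grammar above can be read as a small recursive data structure. The following sketch, which is not part of PyTerrier, classifies a value according to the three top-level rules and walks every TRANSFORMER block, including those nested in inner pipelines. The function names `schematic_kind` and `iter_blocks` are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Union

Schematic = Union[Dict[str, Any], List[Dict[str, Any]]]


def schematic_kind(schematic: Schematic) -> str:
    """Return which top-level grammar rule a SchematicDict value falls under."""
    if isinstance(schematic, list):
        return "pipelines"      # PIPELINES = [PIPELINE | TRANSFORMER]
    kind = schematic.get("type")
    if kind == "pipeline":
        return "pipeline"
    if kind in ("transformer", "indexer"):
        return "transformer"    # TRANSFORMER covers both block types
    raise ValueError(f"unknown schematic type: {kind!r}")


def iter_blocks(schematic: Schematic) -> Iterator[Dict[str, Any]]:
    """Yield every TRANSFORMER block, descending into (inner) pipelines."""
    kind = schematic_kind(schematic)
    if kind == "pipelines":
        for item in schematic:
            yield from iter_blocks(item)
    elif kind == "pipeline":
        for block in schematic["transformers"]:
            yield from iter_blocks(block)
    else:
        yield schematic
        inner = schematic.get("inner_pipelines")
        if inner:
            yield from iter_blocks(inner)


example = {
    "type": "pipeline",
    "label": None,
    "input_columns": ["qid", "query"],
    "output_columns": ["qid", "query", "docno", "score", "rank"],
    "transformers": [
        {"type": "transformer", "label": "BM25",
         "input_columns": ["qid", "query"],
         "output_columns": ["qid", "query", "docno", "score", "rank"]},
    ],
}

labels = [block["label"] for block in iter_blocks(example)]
```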

Transformers¶

A type="transformer" value in a SchematicDict represents a typical Transformer object. On the schematic, a transformer block shows a short label ("label") together with its input columns ("input_columns") and output columns ("output_columns"). A tooltip shows the class name of the transformer ("name") and its attributes ("settings"). Many of these values are obtained using pt.inspect by default. The values can be overridden by implementing the HasSchematic protocol.

Here is an example BM25 retrieval transformer schematic:

[Interactive schematic: a single BM25 block (pt.terrier.retriever.Retriever). Input columns: qid, query. Output columns: qid, query, docid, docno, rank, score. The tooltip lists settings such as wmodel=BM25, num_results=1000, bm25.k_1=1.2 and bm25.b=0.75.]

Its underlying SchematicDict representation looks like this:

BM25 SchematicDict representation¶

{
    'type': 'transformer',
    'label': 'BM25',
    'name': 'pt.terrier.retriever.Retriever',
    'help_url': 'https://pyterrier.readthedocs.io/en/latest/terrier-retrieval.html#pyterrier.terrier.Retriever',
    'input_columns': ['qid', 'query'],
    'output_columns': ['qid', 'docid', 'docno', 'rank', 'score', 'query'],
    'settings': {
        'applypipeline': 'on',
        'bm25.b': 0.75,
        'bm25.k_1': 1.2,
        ...
    }
}

Indexers¶

A type="indexer" value in a SchematicDict represents an Indexer object. On the schematic, an indexer block shows a short label ("label") and its input columns ("input_columns"). A tooltip shows the class name of the indexer ("name") and its attributes ("settings"). Many of these values are obtained using pt.inspect by default. The values can be overridden by implementing the HasSchematic protocol. Indexers should not have output_columns specified and should only appear on their own or as the final transformer of a pipeline.

When an indexer also implements transform() or transform_iter(), it is treated as a transformer rather than an indexer by default.
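
The placement rule for indexers can be expressed as a simple check over a PIPELINE dict. This is a sketch only: `indexer_problems` is a hypothetical function, not something PyTerrier provides, and the "Chunker" block is an invented example.

```python
def indexer_problems(pipeline):
    """Return human-readable problems with indexer blocks in a PIPELINE dict."""
    problems = []
    blocks = pipeline["transformers"]
    for position, block in enumerate(blocks):
        if block.get("type") != "indexer":
            continue
        # Indexers should not declare any output columns.
        if block.get("output_columns"):
            problems.append(f"{block['label']}: indexers should not specify output_columns")
        # Indexers should only appear as the final block of a pipeline.
        if position != len(blocks) - 1:
            problems.append(f"{block['label']}: indexers should be the final block of a pipeline")
    return problems


good = {
    "type": "pipeline",
    "label": None,
    "input_columns": ["docno", "text"],
    "output_columns": [],
    "transformers": [
        {"type": "transformer", "label": "Chunker",
         "input_columns": ["docno", "text"], "output_columns": ["docno", "text"]},
        {"type": "indexer", "label": "TerrierIndexer",
         "input_columns": ["docno", "text"], "output_columns": None},
    ],
}

# Same blocks in the wrong order: the indexer is no longer last.
bad = dict(good, transformers=list(reversed(good["transformers"])))
```

Here `indexer_problems(good)` returns an empty list, while `indexer_problems(bad)` reports that the indexer is not the final block.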

Here is an example indexer schematic:

[Schematic: an indexer block with input columns docno (str) and text (str).]

Its underlying SchematicDict representation looks like this:

Indexer SchematicDict representation¶

{
    'type': 'indexer',
    'label': 'TerrierIndexer',
    'name': 'pt.terrier.index.IterDictIndexer',
    'help_url': None,
    'input_columns': ['docno', 'text'],
    'output_columns': None,
    'settings': {}
}

Inner Pipelines¶

Some transformers can contain other transformers (i.e., subtransformers). There are a few ways to display these inner pipelines in schematics, depending on how the inner pipeline is used. These are configured with the inner_pipelines_mode setting.

unlinked (default). This mode shows each inner pipeline as a separate block without linking them together. This is useful when the transformer has logic that controls how it applies its subtransformers. Each inner pipeline is labeled with the name of the subtransformer. An example is RetrieverCache, which conditionally applies its retriever based on whether the query is in the cache or not:

[Interactive schematic: a DbmRetrieverCache block (pyterrier_caching.retriever_cache.DbmRetrieverCache, path=/tmp/cache) containing one unlinked inner pipeline labeled "retriever", which holds a BM25 block. Input columns: qid, query. Output columns: qid, query, docid, docno, rank, score.]

This format is applicable in all cases where a transformer has subtransformers (which is why it is the default). However, it is not always the most visually descriptive, which is why the "linked" and "combine" modes are also available.

linked. This mode shows the inputs and outputs of the inner pipelines linked together, with the values contained in the transformer block itself. This signifies that all the pipelines are always run with the same inputs (potentially modified by the transformer first) and that the outputs of the inner pipelines are merged together. An example of this kind of pipeline is FeatureUnion:

[Interactive schematic: a FeatureUnion block (pt._ops.FeatureUnion) with two linked inner pipelines, BM25 and DPH. Input columns: qid, docno, query. Output columns: qid, docno, query, docid, rank, features.]

combine. This is a special case of linked mode where the transformer runs all of its inner pipelines with the original input and then combines the outputs into a single output. An example is RRFusion, which runs multiple retrieval methods and combines their outputs into a single result set:

[Interactive schematic: an RRF block (pyterrier_alpha.fusion.RRFusion, k=60, num_results=1000) combining the inner pipelines shown before it. Blocks, in order: BM25; SDM, BM25; BM25, RM3, BM25; RRF. Input columns: qid, query. Output columns include qid, query, docno, docid, score, rank.]
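
To make the configuration concrete, here is a hand-written SchematicDict for an abstract fusion transformer that uses the "combine" mode. It is only a sketch that follows the structure described on this page: the helper `retriever_block`, the labels, and the column lists are illustrative assumptions rather than output produced by PyTerrier.

```python
def retriever_block(label):
    """Build a minimal TRANSFORMER block for a retriever (illustrative columns)."""
    return {
        "type": "transformer",
        "label": label,
        "input_columns": ["qid", "query"],
        "output_columns": ["qid", "query", "docno", "score", "rank"],
    }


fusion = {
    "type": "transformer",
    "label": "RRF",
    "input_columns": ["qid", "query"],
    "output_columns": ["qid", "query", "docno", "score", "rank"],
    # Each inner pipeline is a single TRANSFORMER here; PIPELINE dicts are also allowed.
    "inner_pipelines": [retriever_block("BM25"), retriever_block("DPH")],
    # One of "unlinked" (the default), "linked", or "combine".
    "inner_pipelines_mode": "combine",
}

inner_labels = [inner["label"] for inner in fusion["inner_pipelines"]]
```

Switching `inner_pipelines_mode` to "unlinked" would additionally call for `inner_pipelines_labels`, which names each inner pipeline in that mode.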

Rendering in Notebooks¶

The pyterrier.Transformer base class implements the _repr_html_ method, which enables automatic rendering of schematics in Jupyter notebooks, Google Colab, and other notebook environments. This means that if the output of a cell is a transformer (including pipelines of transformers), its schematic will be rendered automatically as the output of the cell.

If you want to disable this behavior, you can set the PYTERRIER_DISABLE_NOTEBOOK_SCHEMATIC=1 environment variable. (This works even if PyTerrier is already imported.)
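
For instance, the variable can be set from within a notebook using Python's standard `os.environ`:

```python
import os

# Disable automatic schematic rendering for transformers shown as cell output.
os.environ["PYTERRIER_DISABLE_NOTEBOOK_SCHEMATIC"] = "1"
```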

Rendering in Documentation¶

You can render schematics directly in PyTerrier documentation using the custom .. schematic:: directive. The body of the directive should be either a Python code block that creates a transformer to render or a SchematicDict object to render. The former is useful for documenting individual transformers, while the latter is useful for demonstrative/abstract purposes, or cases where running the code to construct the transformer is too costly for documentation (e.g., if it involves loading a large neural network).

PyTerrier is imported by default, so you can use the pt shorthand.

Rendering a BM25 transformer schematic in RST-formatted documentation.¶

.. schematic::
    pt.terrier.TerrierIndex.example().bm25()

[Rendered output: the interactive BM25 schematic, with input columns qid, query and output columns qid, query, docid, docno, rank, score.]

Rendering a simple SchematicDict in RST-formatted documentation.¶

.. schematic::
    {
        "type": "transformer",
        "label": "Retriever",
        "input_columns": ["qid", "query"],
        "output_columns": ["qid", "query", "docno", "score", "rank"]
    }


API Documentation¶

pyterrier.schematic.draw(transformer, *, outer_class=None, input_columns=None)[source]¶

Draws a transformer as an HTML schematic.

If the transformer is already a SchematicDict, it will be drawn directly. Otherwise, it will first convert the transformer to a structured schematic using transformer_schematic(), and draw that.

Return type:

str

Parameters:

transformer (Transformer | dict) – The transformer to draw, or a dict in SchematicDict format.
input_columns (List[str] | None) – Optional input columns to assume for the transformer (or pipeline).
outer_class (str | None) – An optional CSS class to apply to the outer container of the schematic.

Returns:

An HTML string representing the schematic of the transformer.

class pyterrier.schematic.HasSchematic[source]¶

Protocol for transformers that override details about their schematic representation.

This is an optional extension interface to pyterrier.Transformer that allows transformers to provide customizations to their schematics.

schematic(*, input_columns)[source]¶

Returns a structured schematic representation of the transformer.

The schematic should be a dictionary that follows the structure defined in pt.schematic.

For ease of use, the method can optionally return only some of the fields of the schematic; any missing fields will be filled in with default values.

It can also be implemented as an instance or class member when the values do not need to be computed on-the-fly (e.g., overriding the schematic label). When schematic is not callable, it uses its dict value directly as the schematic.

Return type: Dict[str, Any]
Parameters: input_columns (List[str] | None) – The input columns of the transformer, used to determine schematic fields such as the output columns.
Returns: A dictionary representing the schematic of the transformer, which will be used to draw the schematic diagram.
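
The two forms described above (a plain dict member and a callable method) might look as follows. This is a sketch under stated assumptions: the classes are toy stand-ins that do not subclass pyterrier.Transformer, and `resolve_schematic` is a hypothetical function that imitates the described filling-in of missing fields. It is not PyTerrier's implementation.

```python
class UppercaseQuery:
    """Toy transformer-like class; a real one would subclass pt.Transformer."""

    # Non-callable form: a partial SchematicDict that only overrides the label.
    schematic = {"label": "Upper"}


class TopK:
    """Toy class whose schematic depends on its configuration."""

    def __init__(self, k):
        self.k = k

    # Callable form: computed on the fly from the instance and the input columns.
    def schematic(self, *, input_columns):
        return {"label": f"Top {self.k}", "output_columns": input_columns}


def resolve_schematic(obj, input_columns=None):
    """Hypothetical stand-in for how a partial schematic could be completed."""
    custom = getattr(obj, "schematic", {})
    if callable(custom):
        custom = custom(input_columns=input_columns)
    # Defaults mirror the documented fallbacks (label and name from the class name).
    defaults = {
        "type": "transformer",
        "label": type(obj).__name__,
        "name": type(obj).__name__,
        "input_columns": input_columns,
    }
    # Fields supplied by the object take precedence over the defaults.
    return {**defaults, **custom}


upper = resolve_schematic(UppercaseQuery(), ["qid", "query"])
top10 = resolve_schematic(TopK(10), ["qid", "docno", "score"])
```

In this sketch, `upper` keeps the default name "UppercaseQuery" but shows the label "Upper", while `top10` gets both its label and its output columns from the method.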
