pt.debug - Tools for Debugging

It's very easy to write complex pipelines with PyTerrier, and sometimes you need to inspect the dataframes in the middle of a pipeline. The pt.debug transformers display the columns or the data of the dataframe at a given stage, and can be inserted into pipelines during development.

Debug Methods

pyterrier.debug.print_columns(by_query=False, message=None)

Returns a transformer that can be inserted into a pipeline to print the column names of the dataframe at that stage of the pipeline.

Parameters:

by_query (bool) – whether to display for each query. Defaults to False.
message (str | None) – a message to display before printing. Defaults to None, which means no message. This is useful when print_columns() is used multiple times within a pipeline.

Return type:

Transformer

Example:

pipe = (
    bm25
    >> pt.debug.print_columns()
    >> pt.rewrite.RM3()
    >> pt.debug.print_columns()
    >> bm25)

When the above pipeline is executed, two sets of columns will be displayed:

["qid", "query", "docno", "rank", "score"] - the output of BM25, a ranking of documents
["qid", "query", "query_0"] - the output of RM3, a reformulated query
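To illustrate the idea behind such a stage, the following is a minimal sketch in plain Python, not PyTerrier's implementation. It assumes a list of dicts stands in for the dataframe, and the names print_columns_stage, fake_bm25, fake_rm3 and run_pipeline are hypothetical stand-ins, so the example runs without PyTerrier or an index.

```python
from typing import Callable, Dict, List, Optional

# Assumption: a list of dicts stands in for the dataframe passed between stages.
Rows = List[Dict[str, object]]
Stage = Callable[[Rows], Rows]


def print_columns_stage(message: Optional[str] = None) -> Stage:
    """Hypothetical debug stage: print the column names, return the rows unchanged."""
    def stage(rows: Rows) -> Rows:
        if message is not None:
            print(message)
        print(list(rows[0].keys()) if rows else [])
        return rows
    return stage


def fake_bm25(rows: Rows) -> Rows:
    """Hypothetical retriever: attaches one made-up document to each query."""
    return [{**r, "docno": "d1", "rank": 0, "score": 1.0} for r in rows]


def fake_rm3(rows: Rows) -> Rows:
    """Hypothetical query rewriter: keeps the original query in query_0."""
    return [
        {"qid": r["qid"], "query": str(r["query"]) + " expanded", "query_0": r["query"]}
        for r in rows
    ]


def run_pipeline(stages: List[Stage], rows: Rows) -> Rows:
    """Apply each stage in turn, similar in spirit to chaining with >>."""
    for stage in stages:
        rows = stage(rows)
    return rows


topics = [{"qid": "1", "query": "chemical reactions"}]
rewritten = run_pipeline(
    [fake_bm25, print_columns_stage("after bm25"),
     fake_rm3, print_columns_stage("after rm3")],
    topics,
)
```

Because each debug stage in this sketch returns its input unchanged, removing it does not alter the result of the pipeline; only the printed output disappears.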

pyterrier.debug.print_num_rows(by_query=True, msg='num_rows')

Returns a transformer that can be inserted into a pipeline to print the number of rows of the dataframe at that stage of the pipeline.

Parameters:

by_query (bool) – whether to display for each query. Defaults to True.
msg (str) – a message to display before printing. Defaults to "num_rows". This is useful when print_num_rows() is used multiple times within a pipeline.

Return type:

Transformer

Example:

pipe = (
    bm25
    >> pt.debug.print_num_rows()
    >> pt.rewrite.RM3()
    >> pt.debug.print_num_rows()
    >> bm25)

When the above pipeline is executed, the following output will be displayed:

num_rows 1: 1000 - the output of BM25, a ranking of documents
num_rows 1: 1 - the output of RM3, the reformulated query
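The effect of by_query can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: rows are modelled as a list of dicts, count_rows is a hypothetical helper, and the "all" label used when by_query is False is invented for this sketch. The per-query line format simply mirrors the example output above.

```python
from collections import Counter
from typing import Dict, List


def count_rows(rows: List[Dict[str, object]],
               by_query: bool = True,
               msg: str = "num_rows") -> Dict[str, int]:
    """Hypothetical helper: print and return row counts, per qid or overall."""
    if by_query:
        # One count per query id, in order of first appearance.
        counts = dict(Counter(str(r["qid"]) for r in rows))
    else:
        # Assumption: a single overall count, labelled "all" for illustration.
        counts = {"all": len(rows)}
    for key, n in counts.items():
        print(f"{msg} {key}: {n}")
    return counts


# Made-up ranking: three documents for query 1, two for query 2.
ranking = (
    [{"qid": "1", "docno": f"d{i}"} for i in range(3)]
    + [{"qid": "2", "docno": f"d{i}"} for i in range(2)]
)
per_query = count_rows(ranking)
overall = count_rows(ranking, by_query=False, msg="total")
```

Passing a distinct msg at each call, as in the second call here, makes it easier to tell which stage produced which line when several counts are printed in one run.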

pyterrier.debug.print_rows(by_query=True, jupyter=True, head=2, message=None, columns=None)

Returns a transformer that can be inserted into a pipeline to print some of the rows of the dataframe at that stage of the pipeline.

Parameters:

by_query (bool) – whether to display for each query. Defaults to True.
jupyter (bool) – whether to use IPython's display function to display the dataframe. Defaults to True.
head (int) – the number of rows to display. Defaults to 2. None means all rows.
columns (List[str] | None) – limit the columns for which data is displayed. The default of None displays all columns.
message (str | None) – a message to display before printing. Defaults to None, which means no message. This is useful when print_rows() is used multiple times within a pipeline.

Return type:

Transformer

Example:

pipe = (
    bm25
    >> pt.debug.print_rows()
    >> pt.rewrite.RM3()
    >> pt.debug.print_rows()
    >> bm25)
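How head, columns and by_query might interact can be sketched in plain Python. This is only an illustration under assumptions: rows are a list of dicts in place of a dataframe, preview_rows is a hypothetical helper, and applying head separately to each query when by_query is True is an assumption of this sketch.

```python
from typing import Dict, List, Optional

Rows = List[Dict[str, object]]


def preview_rows(rows: Rows,
                 by_query: bool = True,
                 head: Optional[int] = 2,
                 columns: Optional[List[str]] = None) -> Rows:
    """Hypothetical helper: select the rows and columns a preview would show."""
    # Group rows by qid, or put everything in one group.
    groups: Dict[object, Rows] = {}
    for r in rows:
        key = r["qid"] if by_query else None
        groups.setdefault(key, []).append(r)

    preview: Rows = []
    for group in groups.values():
        shown = group if head is None else group[:head]  # None means all rows
        for r in shown:
            # Keep only the requested columns; None keeps every column.
            preview.append(dict(r) if columns is None else {c: r[c] for c in columns})
    return preview


# Made-up ranking: three documents for query 1, one for query 2.
ranking = [
    {"qid": "1", "docno": "d1", "score": 3.0},
    {"qid": "1", "docno": "d2", "score": 2.0},
    {"qid": "1", "docno": "d3", "score": 1.0},
    {"qid": "2", "docno": "d7", "score": 5.0},
]
shown = preview_rows(ranking, head=2, columns=["qid", "docno"])
for row in shown:
    print(row)
```

With these made-up rows, the preview holds the first two rows of query 1 and the single row of query 2, each restricted to the qid and docno columns.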

class pyterrier.debug.pdb(*args, **kwargs)

Returns a transformer that starts an interactive pdb debugger session. The interactive session can be used to inspect the dataframe at this stage in the pipeline.

Example:

pipe = (
    bm25
    >> pt.debug.pdb()
    >> pt.rewrite.RM3()
    >> pt.debug.pdb()
    >> bm25)