pyterrier.debug - Transformers for Debugging

It's very easy to write complex pipelines with PyTerrier. Sometimes you need to inspect the dataframes passed between stages in the middle of a pipeline. The pt.debug transformers display the columns or the data at that point, and can be inserted into pipelines during development.
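These transformers can be dropped in anywhere because a debug stage passes its input through unchanged and only prints as a side effect. Below is a minimal sketch of that idea in plain pandas, assuming each stage is modelled as a function from DataFrame to DataFrame. The helpers `fake_retrieve`, `peek_columns` and `run_pipeline` are hypothetical illustrations, not part of PyTerrier.

```python
import pandas as pd


def fake_retrieve(queries: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for a retriever: two ranked documents per query.
    rows = []
    for qid, query in zip(queries["qid"], queries["query"]):
        for rank, docno in enumerate(["d1", "d2"]):
            rows.append({"qid": qid, "query": query, "docno": docno,
                         "rank": rank, "score": 2.0 - rank})
    return pd.DataFrame(rows)


def peek_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Debug stage: print something, then return the input untouched.
    print(list(df.columns))
    return df


def run_pipeline(stages, df: pd.DataFrame) -> pd.DataFrame:
    # Feed the output of each stage into the next, similar in spirit to `>>`.
    for stage in stages:
        df = stage(df)
    return df


topics = pd.DataFrame({"qid": ["1"], "query": ["chemical reactions"]})
plain = run_pipeline([fake_retrieve], topics)
debugged = run_pipeline([fake_retrieve, peek_columns], topics)
```

Because the debug stage is an identity on the data, `plain` and `debugged` hold the same results; only the printed output differs.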

Debug Methods

pyterrier.debug.print_columns(by_query=False, message=None)

Returns a transformer that can be inserted into a pipeline to print the column names of the dataframe at that stage of the pipeline.

Return type:

Transformer

Parameters:
  • by_query – whether to display the output for each query. Defaults to False.

  • message – a message to display before printing. Defaults to None, which means no message. This is useful when print_columns() is used multiple times within a pipeline.

Example:

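import pyterrier as pt

# bm25 and index are assumed to exist: a BM25 retriever and the index it searches
pipe = (
    bm25
    >> pt.debug.print_columns()
    >> pt.rewrite.RM3(index)
    >> pt.debug.print_columns()
    >> bm25
)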
When the above pipeline is executed, two sets of columns will be displayed:
  • ["qid", "query", "docno", "rank", "score"] - the output of BM25, a ranking of documents

  • ["qid", "query", "query_0"] - the output of RM3, a reformulated query
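To make the behaviour concrete, here is a minimal pandas sketch of what a column-printing stage does with the by_query and message parameters. It is an illustrative re-implementation under stated assumptions, not PyTerrier's code; `column_report` and `print_columns_sketch` are hypothetical names, and the per-query line format is assumed.

```python
import pandas as pd


def column_report(df: pd.DataFrame, by_query: bool = False, message=None) -> list:
    # Build the lines a column-printing debug stage would emit.
    lines = [] if message is None else [message]
    if by_query:
        # Repeat the report once per query id (assumed "<qid>: [...]" format).
        for qid, group in df.groupby("qid", sort=False):
            lines.append(f"{qid}: {list(group.columns)}")
    else:
        lines.append(str(list(df.columns)))
    return lines


def print_columns_sketch(df: pd.DataFrame, by_query: bool = False, message=None) -> pd.DataFrame:
    # Print the report, then pass the DataFrame through unchanged.
    for line in column_report(df, by_query, message):
        print(line)
    return df


ranking = pd.DataFrame({
    "qid": ["1", "1", "2"], "query": ["a", "a", "b"],
    "docno": ["d1", "d2", "d1"], "rank": [0, 1, 0], "score": [2.0, 1.0, 3.0],
})
out = print_columns_sketch(ranking, message="after BM25")
```

Separating report construction from printing keeps the pass-through stage trivial and makes the report easy to check.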

pyterrier.debug.print_num_rows(by_query=True, msg='num_rows')

Returns a transformer that can be inserted into a pipeline to print the number of rows of the dataframe at that stage of the pipeline.

Return type:

Transformer

Parameters:
  • by_query – whether to display the number of rows for each query. Defaults to True.

  • msg – a message to display before the row count. Defaults to "num_rows". This is useful when print_num_rows() is used multiple times within a pipeline.

Example:

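import pyterrier as pt

# bm25 and index are assumed to exist: a BM25 retriever and the index it searches
pipe = (
    bm25
    >> pt.debug.print_num_rows()
    >> pt.rewrite.RM3(index)
    >> pt.debug.print_num_rows()
    >> bm25
)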
When the above pipeline is executed for a single query with qid 1, the following output will be displayed:
  • num_rows 1: 1000 - the output of BM25, a ranking of documents

  • num_rows 1: 1 - the output of RM3, the reformulated query
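The per-query counts above amount to a group-by on qid. A minimal pandas sketch follows, assuming the "<msg> <qid>: <count>" line format shown above; the format used when by_query=False is a guess. `num_rows_report` and `print_num_rows_sketch` are hypothetical names, not PyTerrier functions.

```python
import pandas as pd


def num_rows_report(df: pd.DataFrame, by_query: bool = True, msg: str = "num_rows") -> list:
    if by_query:
        # Count rows per query id, keeping queries in order of first appearance.
        counts = df.groupby("qid", sort=False).size()
        return [f"{msg} {qid}: {n}" for qid, n in counts.items()]
    # Assumed format when counting the whole dataframe at once.
    return [f"{msg}: {len(df)}"]


def print_num_rows_sketch(df: pd.DataFrame, by_query: bool = True, msg: str = "num_rows") -> pd.DataFrame:
    # Print the counts, then pass the DataFrame through unchanged.
    for line in num_rows_report(df, by_query, msg):
        print(line)
    return df


ranking = pd.DataFrame({"qid": ["1"] * 1000, "docno": [f"d{i}" for i in range(1000)]})
reformulated = pd.DataFrame({"qid": ["1"], "query": ["expanded query"]})
print_num_rows_sketch(ranking)       # num_rows 1: 1000
print_num_rows_sketch(reformulated)  # num_rows 1: 1
```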

pyterrier.debug.print_rows(by_query=True, jupyter=True, head=2, message=None, columns=None)

Returns a transformer that can be inserted into a pipeline to print some rows of the dataframe at that stage of the pipeline.

Return type:

Transformer

Parameters:
  • by_query – whether to display the rows for each query. Defaults to True.

  • jupyter – whether to use IPython's display function to display the dataframe. Defaults to True.

  • head – the number of rows to display. Defaults to 2; None means all rows.

  • columns – limits the columns for which data is displayed. The default of None displays all columns.

  • message – a message to display before printing. Defaults to None, which means no message. This is useful when print_rows() is used multiple times within a pipeline.

Example:

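import pyterrier as pt

# bm25 and index are assumed to exist: a BM25 retriever and the index it searches
pipe = (
    bm25
    >> pt.debug.print_rows()
    >> pt.rewrite.RM3(index)
    >> pt.debug.print_rows()
    >> bm25
)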
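The head, by_query and columns parameters map naturally onto pandas operations. The sketch below, which is an assumption rather than PyTerrier's implementation, keeps at most head rows per query with GroupBy.head, then restricts the columns. It prints with plain print() instead of IPython's display, so it also runs outside Jupyter. Unlike the real transformer, it shows all queries in one combined preview. `rows_preview` and `print_rows_sketch` are hypothetical names.

```python
import pandas as pd


def rows_preview(df: pd.DataFrame, by_query: bool = True, head=2, columns=None) -> pd.DataFrame:
    if head is None:
        # None means every row is shown.
        rows = df
    elif by_query:
        # First `head` rows of each query, in original row order.
        rows = df.groupby("qid", sort=False).head(head)
    else:
        rows = df.head(head)
    # Restrict the displayed columns after selecting rows, so "qid" can be dropped.
    return rows if columns is None else rows[columns]


def print_rows_sketch(df: pd.DataFrame, by_query: bool = True, head=2, message=None, columns=None) -> pd.DataFrame:
    if message is not None:
        print(message)
    print(rows_preview(df, by_query, head, columns).to_string())
    # Pass the full, unmodified DataFrame on to the next stage.
    return df


ranking = pd.DataFrame({
    "qid": ["1", "1", "1", "2", "2", "2"],
    "docno": ["d1", "d2", "d3", "d4", "d5", "d6"],
    "score": [3.0, 2.0, 1.0, 3.0, 2.0, 1.0],
})
preview = rows_preview(ranking, columns=["qid", "docno"])
```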