pyterrier.debug - Transformers for Debugging
It's very easy to write complex pipelines with PyTerrier. Sometimes you need to inspect the dataframes that pass between stages in the middle of a pipeline. The pt.debug transformers display the columns or the data of the dataframe they receive, and can be inserted into pipelines during development.
Debug Methods
- pyterrier.debug.print_columns(by_query=False, message=None)
Returns a transformer that can be inserted into a pipeline to print the column names of the dataframe at that stage of the pipeline.
- Return type:
- Parameters:
by_query – whether to display for each query. Defaults to False.
message – a message to display before printing. Defaults to None, which means no message. This is useful when print_columns() is being used multiple times within a pipeline.
Example:
pipe = (
    bm25
    >> pt.debug.print_columns()
    >> pt.rewrite.RM3()
    >> pt.debug.print_columns()
    >> bm25
)
- When the above pipeline is executed, two sets of columns will be displayed:
["qid", "query", "docno", "rank", "score"] - the output of BM25, a ranking of documents
["qid", "query", "query_0"] - the output of RM3, a reformulated query
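To make the pass-through behaviour concrete, the sketch below mimics a column-printing stage in plain Python, without PyTerrier or pandas. It assumes each stage is a function over a list of row dictionaries; `make_print_columns`, `fake_bm25` and `run_pipeline` are hypothetical names used only for illustration and are not part of the PyTerrier API. The property it illustrates is that a debug stage prints what it sees and hands the data on unchanged.

```python
# Illustrative stand-in only: rows are plain dicts rather than a pandas
# DataFrame, and none of these helpers exist in PyTerrier.

def make_print_columns(message=None):
    """Build a pass-through stage that prints the column names it sees."""
    def stage(rows):
        if message is not None:
            print(message)
        # Collect column names in first-seen order.
        columns = list(dict.fromkeys(key for row in rows for key in row))
        print(columns)
        return rows  # the data is handed on untouched
    return stage

def fake_bm25(rows):
    """Hypothetical retriever: adds one result row per query."""
    return [dict(row, docno="d1", rank=0, score=1.0) for row in rows]

def run_pipeline(stages, rows):
    """Apply each stage in turn, similar in spirit to chaining with >>."""
    for stage in stages:
        rows = stage(rows)
    return rows

queries = [{"qid": "1", "query": "chemical reactions"}]
results = run_pipeline(
    [make_print_columns("input"), fake_bm25, make_print_columns("after retrieval")],
    queries,
)
```

Because the debug stages only observe the rows, removing them from the list of stages would leave `results` identical, which is why such stages can be added and removed freely during development.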
- pyterrier.debug.print_num_rows(by_query=True, msg='num_rows')
Returns a transformer that can be inserted into a pipeline to print the number of rows of the dataframe at that stage of the pipeline.
- Return type:
- Parameters:
by_query – whether to display for each query. Defaults to True.
msg – a message to display before printing. Defaults to "num_rows". This is useful when print_num_rows() is being used multiple times within a pipeline.
Example:
pipe = (
    bm25
    >> pt.debug.print_num_rows()
    >> pt.rewrite.RM3()
    >> pt.debug.print_num_rows()
    >> bm25
)
- When the above pipeline is executed, the following output will be displayed:
num_rows 1: 1000 - the output of BM25, a ranking of documents
num_rows 1: 1 - the output of RM3, the reformulated query
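The per-query counting can be sketched in plain Python as follows. This is a stand-in under stated assumptions, not the library's implementation: rows are dictionaries with a `qid` key, `count_rows_by_query` and `make_print_num_rows` are hypothetical helpers, and the printed format is modelled on the `num_rows 1: 1000` lines above. The format used when `by_query` is False is a guess made for the sketch.

```python
from collections import Counter

# Illustrative stand-in only: rows are plain dicts rather than a pandas
# DataFrame, and these helpers are hypothetical, not PyTerrier functions.

def count_rows_by_query(rows):
    """Map each qid to the number of rows that carry it."""
    return dict(Counter(row["qid"] for row in rows))

def make_print_num_rows(by_query=True, msg="num_rows"):
    """Build a pass-through stage that prints row counts."""
    def stage(rows):
        if by_query:
            for qid, count in count_rows_by_query(rows).items():
                print(f"{msg} {qid}: {count}")
        else:
            # Assumed format for the overall count.
            print(f"{msg}: {len(rows)}")
        return rows  # the data is handed on untouched
    return stage

ranking = [
    {"qid": "1", "docno": "d1"},
    {"qid": "1", "docno": "d2"},
    {"qid": "2", "docno": "d7"},
]
out = make_print_num_rows(msg="after BM25")(ranking)
```

Giving each stage a distinct message, as in `msg="after BM25"`, makes it easier to tell which point in the pipeline produced a given line of output.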
- pyterrier.debug.print_rows(by_query=True, jupyter=True, head=2, message=None, columns=None)
Returns a transformer that can be inserted into a pipeline to print some of the rows of the dataframe at that stage of the pipeline.
- Return type:
- Parameters:
by_query – whether to display for each query. Defaults to True.
jupyter – whether to use IPython's display function to display the dataframe. Defaults to True.
head – the number of rows to display. Defaults to 2. None means all rows.
columns – limit the columns for which data is displayed. The default of None displays all columns.
message – a message to display before printing. Defaults to None, which means no message. This is useful when print_rows() is being used multiple times within a pipeline.
Example:
pipe = (
    bm25
    >> pt.debug.print_rows()
    >> pt.rewrite.RM3()
    >> pt.debug.print_rows()
    >> bm25
)
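How `head` and `columns` interact when displaying per query can be sketched in plain Python. As before, this is only a stand-in: rows are dictionaries with a `qid` key, `preview_rows` and `make_print_rows` are hypothetical helpers rather than PyTerrier functions, and the sketch prints with `print` instead of IPython's display function.

```python
# Illustrative stand-in only: rows are plain dicts rather than a pandas
# DataFrame, and these helpers are hypothetical, not PyTerrier functions.

def preview_rows(rows, head=2, columns=None):
    """Return the rows that would be shown, grouped by qid."""
    groups = {}
    for row in rows:
        groups.setdefault(row["qid"], []).append(row)
    shown = {}
    for qid, group in groups.items():
        # head=None keeps every row of the query.
        subset = group if head is None else group[:head]
        if columns is not None:
            subset = [{col: row[col] for col in columns} for row in subset]
        shown[qid] = subset
    return shown

def make_print_rows(head=2, message=None, columns=None):
    """Build a pass-through stage that prints a preview of each query."""
    def stage(rows):
        if message is not None:
            print(message)
        for qid, subset in preview_rows(rows, head, columns).items():
            for row in subset:
                print(row)
        return rows  # the data is handed on untouched
    return stage

ranking = [
    {"qid": "1", "docno": "d1", "score": 3.2},
    {"qid": "1", "docno": "d2", "score": 2.9},
    {"qid": "1", "docno": "d3", "score": 1.4},
    {"qid": "2", "docno": "d7", "score": 5.0},
]
out = make_print_rows(head=2, columns=["docno", "score"], message="after BM25")(ranking)
```

Restricting `columns` is mostly useful for wide dataframes, where the columns of interest would otherwise be hard to pick out of the printed preview.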