pyterrier.debug - Transformers for Debugging
It's very easy to write complex pipelines with PyTerrier. Sometimes you need to inspect dataframes in the middle of a pipeline. The pt.debug transformers display the columns or the data, and can be inserted into pipelines during development.
Debug Methods
- pyterrier.debug.print_columns(by_query=False, message=None)
Returns a transformer that can be inserted into a pipeline to print the column names of the dataframe at that stage in the pipeline.
- Parameters:
  - by_query (bool) – whether to display for each query. Defaults to False.
  - message (str | None) – a message to display before printing. Defaults to None, which means no message. This is useful when print_columns() is being used multiple times within a pipeline.
- Return type:
Example:
    pipe = (
        bm25
        >> pt.debug.print_columns()
        >> pt.rewrite.RM3()
        >> pt.debug.print_columns()
        >> bm25
    )
- When the above pipeline is executed, two sets of columns will be displayed:
  - ["qid", "query", "docno", "rank", "score"] - the output of BM25, a ranking of documents
  - ["qid", "query", "query_0"] - the output of RM3, a reformulated query
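The idea of a pass-through debug stage can be sketched without PyTerrier at all. The following is a minimal illustration, not PyTerrier's implementation: rows are plain dicts rather than a pandas dataframe, and `print_columns_stage` and `fake_bm25` are hypothetical names invented for this example.

```python
# Illustrative stand-in: rows are plain dicts, not a pandas dataframe.
def print_columns_stage(rows, message=None):
    """Print the column names of `rows`, then return `rows` unchanged."""
    if message is not None:
        print(message)
    columns = list(rows[0].keys()) if rows else []
    print(columns)
    return rows  # pass-through: the next stage sees the same data


def fake_bm25(queries):
    # Hypothetical retriever that returns a single document per query.
    return [
        {"qid": q["qid"], "query": q["query"], "docno": "d1", "rank": 0, "score": 1.0}
        for q in queries
    ]


queries = [{"qid": "1", "query": "chemical reactions"}]
ranked = print_columns_stage(fake_bm25(queries), message="after retrieval")
```

Because the stage returns its input untouched, it can be added or removed during development without changing what the rest of the pipeline computes.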
- pyterrier.debug.print_num_rows(by_query=True, msg='num_rows')
Returns a transformer that can be inserted into a pipeline to print the number of rows of the dataframe at that stage in the pipeline.
- Parameters:
  - by_query (bool) – whether to display for each query. Defaults to True.
  - msg (str) – a message to display before printing. Defaults to "num_rows". This is useful when print_num_rows() is being used multiple times within a pipeline.
- Return type:
Example:
    pipe = (
        bm25
        >> pt.debug.print_num_rows()
        >> pt.rewrite.RM3()
        >> pt.debug.print_num_rows()
        >> bm25
    )
- When the above pipeline is executed, the following output will be displayed:
num_rows 1: 1000 - the output of BM25, a ranking of documents
num_rows 1: 1 - the output of RM3, the reformulated query
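The per-query counts shown above can be reproduced with a small stand-alone sketch. This is an assumption-laden illustration rather than the library's code: `count_rows` is a hypothetical helper, rows are plain dicts, and the line format used when `by_query=False` is a guess, since the source only shows the per-query format.

```python
from collections import Counter


def count_rows(rows, by_query=True, msg="num_rows"):
    """Return the lines a row-counting debug stage might print."""
    if not by_query:
        # Assumed format; the source only documents the per-query form.
        return [f"{msg}: {len(rows)}"]
    # Count rows per qid, keeping qids in order of first appearance.
    counts = Counter(row["qid"] for row in rows)
    return [f"{msg} {qid}: {n}" for qid, n in counts.items()]


# A ranking of 1000 documents for query "1", as a retriever might return.
rows = [{"qid": "1", "docno": f"d{i}"} for i in range(1000)]
lines = count_rows(rows)
for line in lines:
    print(line)
```

A different `msg` for each stage makes it easier to tell the printed counts apart when the stage appears more than once.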
- pyterrier.debug.print_rows(by_query=True, jupyter=True, head=2, message=None, columns=None)
Returns a transformer that can be inserted into a pipeline to print some of the rows of the dataframe at that stage in the pipeline.
- Parameters:
  - by_query (bool) – whether to display for each query. Defaults to True.
  - jupyter (bool) – whether to use IPython's display function to display the dataframe. Defaults to True.
  - head (int) – the number of rows to display. Defaults to 2. None means all rows.
  - columns (List[str] | None) – limit the columns for which data is displayed. The default of None displays all columns.
  - message (str | None) – a message to display before printing. Defaults to None, which means no message. This is useful when print_rows() is being used multiple times within a pipeline.
- Return type:
Example:
    pipe = (
        bm25
        >> pt.debug.print_rows()
        >> pt.rewrite.RM3()
        >> pt.debug.print_rows()
        >> bm25
    )
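How head and columns interact with by_query can be illustrated with a short sketch. It is only an approximation under stated assumptions: `preview_rows` is a hypothetical helper, rows are plain dicts instead of a dataframe, and rows for the same qid are assumed to be contiguous.

```python
from itertools import groupby


def preview_rows(rows, head=2, columns=None):
    """Return up to `head` rows per qid, restricted to `columns`."""
    preview = []
    # Assumes rows sharing a qid are adjacent, as in a ranked result list.
    for _qid, group in groupby(rows, key=lambda r: r["qid"]):
        group = list(group)
        selected = group if head is None else group[:head]
        for row in selected:
            if columns is None:
                preview.append(dict(row))  # keep every column
            else:
                preview.append({c: row[c] for c in columns})
    return preview


rows = [
    {"qid": "1", "docno": "d1", "score": 3.2},
    {"qid": "1", "docno": "d2", "score": 2.9},
    {"qid": "1", "docno": "d3", "score": 1.1},
    {"qid": "2", "docno": "d7", "score": 4.0},
]
preview = preview_rows(rows, head=2, columns=["qid", "docno"])
for row in preview:
    print(row)
```

With head=2, the third document for query "1" is omitted while the single document for query "2" is still shown, which is the per-query behaviour the by_query parameter describes.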