pyterrier.new - Creating new dataframes

This module provides useful utility methods for creating example dataframes for queries and ranked documents.

pyterrier.new.empty_Q()

Returns an empty dataframe with columns [“qid”, “query”].

Return type:

DataFrame

pyterrier.new.queries(queries, qid=None, **others)

Creates a new queries dataframe. Will return a dataframe with the columns [“qid”, “query”]. Any further lists in others will also be added.

Return type:

DataFrame

Parameters:
  • queries – The search queries. Either a string, for a single query, or a sequence (e.g. list of strings)

  • qid – Corresponding query ids. Either a string, for a single query, or a sequence (e.g. list of strings). Must have the same length as queries.

  • others – Additional keyword arguments. Each one names a further attribute to add to the query dataframe as a column, e.g. categories=["catA", "catB"].
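
The parameter rules above can be sketched in plain Python. This is a hypothetical stand-in, not PyTerrier's implementation: the name sketch_query_rows is invented for illustration, rows are plain dicts rather than dataframe rows, and the default ids counting up from "1" for a list of queries is an assumption (the source only states the default for a single query).

```python
# Hypothetical illustration only; this is not PyTerrier's code.
def sketch_query_rows(queries, qid=None, **others):
    """Return rows (plain dicts) shaped like a ["qid", "query"] dataframe."""
    # A single string is treated as one query with one optional id.
    if isinstance(queries, str):
        queries = [queries]
        qid = None if qid is None else [qid]
    queries = list(queries)

    if qid is None:
        # Assumption: default ids are strings counting up from "1".
        qid = [str(i) for i in range(1, len(queries) + 1)]
    qid = list(qid)

    if len(qid) != len(queries):
        raise ValueError("qid must have the same length as queries")

    rows = []
    for i, (q_id, text) in enumerate(zip(qid, queries)):
        row = {"qid": q_id, "query": text}
        # Each extra keyword argument contributes one value per query.
        for column, values in others.items():
            row[column] = values[i]
        rows.append(row)
    return rows


rows = sketch_query_rows(
    ["query text A", "query text B"],
    ["1", "2"],
    categories=["catA", "catB"],
)
for row in rows:
    print(row)
```

In this sketch every extra keyword argument must be a sequence with one entry per query, mirroring the length rule stated for qid.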

Examples:

# create a dataframe with one query, qid "1"
one_query = pt.new.queries("what the noise was was the question")

# create a dataframe with one query, qid "5"
one_query = pt.new.queries("what the noise was was the question", "5")

# create a dataframe with two queries
two_queries = pt.new.queries(["query text A", "query text B"], ["1", "2"])

# create a dataframe with two queries and an extra "categories" column
two_queries = pt.new.queries(["query text A", "query text B"], ["1", "2"], categories=["catA", "catB"])

pyterrier.new.empty_R()

Returns an empty dataframe with columns [“qid”, “query”, “docno”, “rank”, “score”].

Return type:

DataFrame

pyterrier.new.ranked_documents(scores, qid=None, docno=None, **others)

Creates a new ranked documents dataframe. Will return a dataframe with the columns [“qid”, “docno”, “score”, “rank”]. Any further lists in others will also be added.

Return type:

DataFrame

Parameters:
  • scores – The scores of the retrieved documents. Must be a list of lists, with one inner list of scores per query.

  • qid – Corresponding query ids. Must have the same length as the first dimension of scores. If omitted, qids are computed as strings starting from “1”.

  • docno – Corresponding docnos. Must be a list of lists with the same shape as scores: its first dimension must have the same length as the first dimension of scores, and each inner list must have one docno per retrieved document. If omitted, docnos are computed as strings starting from “d1” for each query.

  • others – Additional keyword arguments naming other attributes to add to the ranked documents dataframe.
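
The shape rules above can be sketched in plain Python. This is a hypothetical stand-in, not PyTerrier's implementation: the name sketch_ranked_rows is invented for illustration and rows are plain dicts rather than dataframe rows. The “rank” column is left out because the text above does not say how ranks are assigned.

```python
# Hypothetical illustration only; this is not PyTerrier's code.
def sketch_ranked_rows(scores, qid=None, docno=None):
    """Flatten a list of per-query score lists into (qid, docno, score) rows."""
    if qid is None:
        # Default query ids: "1", "2", ... (one per inner list of scores).
        qid = [str(i) for i in range(1, len(scores) + 1)]
    if docno is None:
        # Default docnos restart at "d1" for every query.
        docno = [
            ["d%d" % (j + 1) for j in range(len(per_query))]
            for per_query in scores
        ]

    # First dimension: one qid and one docno list per query.
    if len(qid) != len(scores) or len(docno) != len(scores):
        raise ValueError("qid and docno must match the first dimension of scores")

    rows = []
    for q_id, doc_ids, per_query in zip(qid, docno, scores):
        # Second dimension: one docno per retrieved document.
        if len(doc_ids) != len(per_query):
            raise ValueError("each docno list must match its list of scores")
        for doc_id, score in zip(doc_ids, per_query):
            rows.append({"qid": q_id, "docno": doc_id, "score": score})
    return rows


# Two queries, one document each, all ids defaulted.
for row in sketch_ranked_rows([[1], [2]]):
    print(row)
```

Under these assumptions, each inner list of scores expands to one row per document, so the number of rows equals the total number of scores supplied.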

Examples:

# one query, one document
R1 = pt.new.ranked_documents([[1]])

# one query, two documents
R2 = pt.new.ranked_documents([[1, 2]])

# two queries, one document each
R3 = pt.new.ranked_documents([[1], [2]])

# one query, one document, qid specified
R4 = pt.new.ranked_documents([[1]], qid=["q100"])

# one query, one document, qid and docno specified
R5 = pt.new.ranked_documents([[1]], qid=["q100"], docno=[["d20"]])