Semantic Scholar

Semantic Scholar is a search engine over academic papers provided by the Allen Institute for AI.

pyterrier-services provides access to the Semantic Scholar search API through SemanticScholarRetriever.

Example:

Retrieve from the Semantic Scholar API
>>> from pyterrier_services import SemanticScholarApi
>>> s2 = SemanticScholarApi()
>>> retr = s2.retriever(num_results=5)
>>> retr.search('pyterrier')
# qid      query                                     docno  score  rank                                              title                                           abstract
#   1  pyterrier  7fa92ed08eee68a945884b8744e7db9887aed9d3      0     0  PyTerrier: Declarative Experimentation in Pyth...  PyTerrier is a Python-based retrieval framewor...
#   1  pyterrier  a6b1126e058262c57d36012d0fdedc2417ad04e1     -1     1  Declarative Experimentation in Information Ret...  The advent of deep machine learning platforms ...
#   1  pyterrier  833b453c621099bccca028752aaa74262123706a     -2     2  PyTerrier-based Research Data Recommendations ...  Research data is of high importance in scienti...
#   1  pyterrier  73feb5cfe491342d52d47e8817d113c072067306     -3     3      The Information Retrieval Experiment Platform  We integrate irdatasets, ir_measures, and PyTe...
#   1  pyterrier  90b8a1adae2761e48c87fdeb68a595dc11161970     -4     4  QPPTK@TIREx: Simplified Query Performance Pred...  We describe our software submission to the ECI...
class pyterrier_services.SemanticScholarApi

Represents a reference to the Semantic Scholar search API.

retriever(*, num_results=100, fields=['title', 'abstract'], verbose=True)

Returns a Transformer that retrieves articles from Semantic Scholar.

Return type:

Transformer

Parameters:
  • num_results – The number of results to retrieve per query. Defaults to 100.

  • fields – The fields to include in the retrieved results. Defaults to ['title', 'abstract'].

  • verbose – Whether to log the progress. Defaults to True.

search(query, *, offset=0, limit=100, fields=['title', 'abstract'], return_next=False, return_total=False)

Searches for papers on Semantic Scholar with the provided query.

Return type:

Union[DataFrame, Tuple[DataFrame, int], Tuple[DataFrame, int, int]]

Parameters:
  • query – The search query.

  • offset – The offset of the first result to retrieve. Defaults to 0.

  • limit – The maximum number of results to retrieve. Defaults to 100.

  • fields – The fields to include in the retrieved results. Defaults to ['title', 'abstract'].

  • return_next – Whether to also return the offset (an int) of the next page of results. Defaults to False.

  • return_total – Whether to return the total number of results. Defaults to False.
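Because search() exposes offset, limit and return_next, results beyond a single request can be fetched page by page. The following is a minimal sketch, not part of pyterrier-services: the helper name iter_pages is hypothetical, and it assumes that search(..., return_next=True) returns a (DataFrame, next_offset) pair whose next_offset is None once no further results are available. It only relies on an object with a search method, so the loop itself can be exercised offline.

```python
from typing import Any, Iterator, Optional, Sequence


def iter_pages(api: Any, query: str, *, page_size: int = 100, max_results: int = 1000,
               fields: Sequence[str] = ('title', 'abstract')) -> Iterator[Any]:
    """Yield successive result pages for `query`, following the next-page offset.

    ASSUMPTION: api.search(..., return_next=True) returns (page, next_offset),
    where next_offset is None when there are no further results.
    """
    offset: Optional[int] = 0
    fetched = 0
    while offset is not None and fetched < max_results:
        # Never request more than the remaining result budget.
        limit = min(page_size, max_results - fetched)
        page, offset = api.search(query, offset=offset, limit=limit,
                                  fields=list(fields), return_next=True)
        if len(page) == 0:
            break  # guard against an empty page that still reports a next offset
        fetched += len(page)
        yield page
```

For example, pd.concat(list(iter_pages(SemanticScholarApi(), 'pyterrier', max_results=300)), ignore_index=True) would collect up to 300 results into a single DataFrame using at most three requests of 100 results each.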

class pyterrier_services.SemanticScholarRetriever(*, api=None, num_results=100, fields=['title', 'abstract'], verbose=True)

A Transformer retriever that queries the Semantic Scholar search API.

Parameters:
  • api – The Semantic Scholar API service to query. Defaults to a new instance of SemanticScholarApi.

  • num_results – The number of results to retrieve per query. Defaults to 100.

  • fields – The fields to include in the retrieved results. Defaults to ['title', 'abstract'].

  • verbose – Whether to log the progress. Defaults to True.
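The retriever's output in the example at the top of this page has one row per (query, paper) pair, with score equal to the negated rank. The sketch below mimics that shape in plain Python to show how a ranked list from the API can become PyTerrier-style result rows. It is an illustration, not the SemanticScholarRetriever source: retrieve and search_fn are hypothetical names, and it assumes each paper arrives as a dict keyed by its Semantic Scholar paperId and that the API supplies only an ordering, not a score.

```python
from typing import Callable, Dict, Iterable, List, Sequence

# HYPOTHETICAL stand-in for one search call: takes a query string and a result
# limit, and returns papers as dicts with a 'paperId' key plus requested fields.
SearchFn = Callable[[str, int], List[Dict[str, str]]]


def retrieve(queries: Iterable[Dict[str, str]], search_fn: SearchFn, *,
             num_results: int = 100,
             fields: Sequence[str] = ('title', 'abstract')) -> List[Dict[str, object]]:
    """Turn per-query ranked paper lists into PyTerrier-style result rows."""
    rows: List[Dict[str, object]] = []
    for q in queries:
        papers = search_fn(q['query'], num_results)
        for rank, paper in enumerate(papers):
            row: Dict[str, object] = {
                'qid': q['qid'],
                'query': q['query'],
                'docno': paper['paperId'],
                # Only an ordering is assumed, so the score is derived from the
                # rank (0, -1, -2, ...), matching the example output.
                'score': -rank,
                'rank': rank,
            }
            # Copy the requested fields; missing ones become None.
            for field in fields:
                row[field] = paper.get(field)
            rows.append(row)
    return rows
```

Passing the returned rows to pandas.DataFrame gives a frame with the same columns as the example output (qid, query, docno, score, rank, title, abstract).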