Pinecone¶
Pinecone provides a Hosted Inference API for various embedding and reranking models.
pyterrier-services provides access to these APIs through PineconeApi.
Note
To use this API, you need the pinecone package installed (pip install pinecone) and a Pinecone API key.
You can provide your API key through the environment variable PINECONE_API_KEY (preferred), or pass it to the constructor of PineconeApi.
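The key lookup described in the note can be sketched in plain Python. This is only an illustration of the "explicit argument first, environment variable second" order; `resolve_api_key` is a hypothetical helper, not part of pyterrier-services, and the library's own lookup may differ in detail.

```python
import os


def resolve_api_key(api_key=None, env_var="PINECONE_API_KEY"):
    """Hypothetical helper: return an explicit key if given, else the env var.

    Raises RuntimeError early with a readable message when no key is found,
    instead of failing later on the first API request.
    """
    if api_key:
        return api_key
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"No Pinecone API key found: set {env_var} or pass api_key explicitly"
        )
    return key
```

The result could then be passed on as `PineconeApi(api_key=resolve_api_key())`, which uses the `api_key` constructor argument documented below.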
Examples¶
Learned Sparse¶
# Setup
>>> from pyterrier_services import PineconeApi
>>> from pyterrier_pisa import PisaIndex
>>> pinecone = PineconeApi()
>>> model = pinecone.sparse_model()
>>> index = PisaIndex('my_index.pisa', stemmer='none')
# Indexing
>>> pipeline = model >> index
>>> pipeline.index([
... {'docno': 'doc1', 'text': 'PyTerrier: Declarative Experimentation in Python from BM25 to Dense Retrieval'},
... {'docno': 'doc2', 'text': 'QPPTK@TIREx: Simplified Query Performance Prediction for Ad-Hoc Retrieval Experiments'},
... ])
# Retrieval
>>> pipeline = model >> index.quantized()
>>> pipeline.search('Retrieval')
   qid      query          query_toks  docno    score  rank
0    1  Retrieval  {'retrieval': 1.0}   doc2  30900.0     0
1    1  Retrieval  {'retrieval': 1.0}   doc1  29400.0     1
Dense¶
# Setup
>>> from pyterrier_services import PineconeApi
>>> from pyterrier_dr import FlexIndex
>>> pinecone = PineconeApi()
>>> model = pinecone.dense_model()
>>> index = FlexIndex('my_index.flex')
# Indexing
>>> pipeline = model >> index
>>> pipeline.index([
... {'docno': 'doc1', 'text': 'PyTerrier: Declarative Experimentation in Python from BM25 to Dense Retrieval'},
... {'docno': 'doc2', 'text': 'QPPTK@TIREx: Simplified Query Performance Prediction for Ad-Hoc Retrieval Experiments'},
... ])
# Retrieval
>>> pipeline = model >> index.retriever()
>>> pipeline.search('pyterrier')
   qid      query                                          query_vec  docno  docid     score  rank
0    1  pyterrier  [0.00923919677734375, -0.0171356201171875, -0....   doc1      0  0.814679     0
1    1  pyterrier  [0.00923919677734375, -0.0171356201171875, -0....   doc2      1  0.722664     1
Re-Ranking¶
>>> import pandas as pd
>>> from pyterrier_services import PineconeApi
>>> pinecone = PineconeApi()
>>> model = pinecone.reranker()
>>> model(pd.DataFrame([
... {'qid': '1', 'query': 'retrieval', 'docno': 'doc1', 'text': 'PyTerrier: Declarative Experimentation in Python from BM25 to Dense Retrieval'},
... {'qid': '1', 'query': 'retrieval', 'docno': 'doc2', 'text': 'QPPTK@TIREx: Simplified Query Performance Prediction for Ad-Hoc Retrieval Experiments'},
... ]))
   qid      query  docno                                               text     score  rank
0    1  retrieval   doc2  QPPTK@TIREx: Simplified Query Performance Pred...  0.004811     0
1    1  retrieval   doc1  PyTerrier: Declarative Experimentation in Pyth...  0.001598     1
API Documentation¶
- class pyterrier_services.PineconeApi(api_key=None)[source]¶
Represents a reference to the Pinecone API.
This class wraps pinecone.Pinecone.
- Parameters:
api_key (str, optional) – The Pinecone API key. Defaults to the value of the PINECONE_API_KEY environment variable.
- dense_model(model_name='multilingual-e5-large')[source]¶
Creates a PineconeDenseModel instance.
- Return type:
PineconeDenseModel
- Parameters:
model_name (str) – The name of the model. See the list of supported models.
- sparse_model(model_name='pinecone-sparse-english-v0')[source]¶
Creates a PineconeSparseModel instance.
- Return type:
PineconeSparseModel
- Parameters:
model_name (str) – The name of the model. See the list of supported models.
- reranker(model_name='pinecone-rerank-v0')[source]¶
Creates a PineconeReranker instance.
- Return type:
PineconeReranker
- Parameters:
model_name (str) – The name of the model. See the list of supported models.
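The three factory methods can share a single API object. The following is a minimal sketch of that pattern; `build_models` is a hypothetical helper name, and the model names are simply the defaults listed in the signatures above, spelled out so they are easy to swap.

```python
def build_models(api):
    """Hypothetical helper: create one model of each kind from a shared API object.

    `api` is expected to be a PineconeApi instance (or anything exposing the
    same dense_model / sparse_model / reranker factory methods).
    """
    return {
        # Model names below are the documented defaults, passed explicitly.
        "dense": api.dense_model(model_name="multilingual-e5-large"),
        "sparse": api.sparse_model(model_name="pinecone-sparse-english-v0"),
        "reranker": api.reranker(model_name="pinecone-rerank-v0"),
    }
```

With the package installed and an API key configured, this would be called as `build_models(PineconeApi())`.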
- class pyterrier_services.PineconeSparseModel(model_name='pinecone-sparse-english-v0', *, api=None)[source]¶
A PyTerrier transformer that provides access to a Pinecone sparse model.
- Parameters:
model_name (str) – The name of the model. See the list of supported models.
api (PineconeApi, optional) – The Pinecone API object. Defaults to a new instance.
- transform(inp)[source]¶
Encodes either queries or documents using this model, depending on the columns present in the input.
- Return type:
DataFrame
- query_encoder()[source]¶
Creates a transformer that encodes queries using this model.
- Return type:
PineconeSparseEncoder
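Since query_encoder() returns a transformer that only encodes queries, it can plausibly stand in for the full model at retrieval time. The sketch below assumes the query encoder composes with `index.quantized()` using `>>` in the same way the model does in the Learned Sparse example; this page does not confirm that, and `build_query_pipeline` is a hypothetical helper name.

```python
def build_query_pipeline(model, index):
    """Hypothetical helper: build a query-time pipeline from a sparse model.

    `model` is expected to be a PineconeSparseModel and `index` a
    pyterrier_pisa PisaIndex, as in the Learned Sparse example.
    Assumption: the query encoder can be chained with `>>` like the model.
    """
    query_encoder = model.query_encoder()  # encodes queries only
    return query_encoder >> index.quantized()
```

The returned pipeline would then be used like the one in the example, for instance `build_query_pipeline(model, index).search('pyterrier')`.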
- class pyterrier_services.PineconeDenseModel(model_name='multilingual-e5-large', *, api=None)[source]¶
A PyTerrier transformer that provides access to a Pinecone dense model.
- Parameters:
model_name (str) – The name of the model. See the list of supported models.
api (PineconeApi, optional) – The Pinecone API object. Defaults to a new instance.
- transform(inp)[source]¶
Encodes either queries or documents using this model, depending on the columns present in the input.
- Return type:
DataFrame
- query_encoder()[source]¶
Creates a transformer that encodes queries using this model.
- Return type:
PineconeDenseEncoder
- class pyterrier_services.PineconeReranker(model_name='pinecone-rerank-v0', *, api=None)[source]¶
A PyTerrier transformer that provides access to a Pinecone reranker model.
- Parameters:
model_name (str) – The name of the model. See the list of supported models.
api (PineconeApi, optional) – The Pinecone API object. Defaults to a new instance.