Artifacts
An artifact is a component stored on disk, such as an index.
Artifacts usually act as factories for transformers that use them. For example, an index artifact may provide a .retriever() method that returns a transformer that searches the index.
You can use pta.Artifact.load('path/to/artifact') to load an artifact. The function automatically identifies the artifact’s type and initializes it:
import pyterrier_alpha as pta
index = pta.Artifact.load('path/to/msmarco-passage.pisa')
# PisaIndex('path/to/msmarco-passage.pisa')
index.bm25() # returns a BM25 PisaRetriever for the index
You can also upload artifacts to, and load them from, Hugging Face Hub:
# uploads the artifact to Hugging Face Hub
index.to_hf('username/repo')
# loads an artifact from Hugging Face Hub
pta.Artifact.from_hf('username/repo')
API Documentation
- class pyterrier_alpha.Artifact(path)
Base class for PyTerrier artifacts.
An artifact is a component stored on disk, such as an index.
Artifacts usually act as factories for transformers that use them. For example, an index artifact may provide a .retriever() method that returns a transformer that searches the index.
Initialize the artifact at the provided path.
- classmethod load(path)
Load the artifact from the specified path.
If invoked on the base class, this method will attempt to find a supporting Artifact implementation that can load the artifact at the specified path. If invoked on a subclass, it will attempt to load the artifact using the specific implementation.
- Return type:
Artifact
- Parameters:
path – The path of the artifact on disk.
- Returns:
The loaded artifact.
- Raises:
FileNotFoundError – If the specified path does not exist.
ValueError – If no implementation is found that supports the artifact at the specified path.
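The dispatch behaviour described for load() can be illustrated with a small, self-contained sketch. This is not the pyterrier_alpha implementation: the registry, the supports() hook, the ToyIndex class, and the toy_index.json marker file are all hypothetical names invented for illustration. It only shows one plausible way a base-class load() could pick an implementation while a subclass load() uses its own.

```python
import tempfile
from pathlib import Path


class Artifact:
    """Toy stand-in for an artifact base class (NOT the pyterrier_alpha code)."""

    _registry = []  # hypothetical: every subclass registers itself here

    def __init__(self, path):
        self.path = Path(path)

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Artifact._registry.append(cls)

    @classmethod
    def supports(cls, path):
        """Hypothetical hook: can this implementation load the artifact at `path`?"""
        return False

    @classmethod
    def load(cls, path):
        path = Path(path)
        if not path.exists():
            raise FileNotFoundError(f"{path} does not exist")
        if cls is not Artifact:
            # Invoked on a subclass: use that specific implementation.
            return cls(path)
        # Invoked on the base class: search for a supporting implementation.
        for impl in Artifact._registry:
            if impl.supports(path):
                return impl(path)
        raise ValueError(f"no implementation found that supports {path}")


class ToyIndex(Artifact):
    """Hypothetical artifact type, recognised by a marker file."""

    @classmethod
    def supports(cls, path):
        return (Path(path) / "toy_index.json").exists()

    def retriever(self):
        # Factory method: returns a callable that "searches" this index.
        return lambda query: f"results for {query!r} from {self.path.name}"


with tempfile.TemporaryDirectory() as tmp:
    index_dir = Path(tmp) / "my-index"
    index_dir.mkdir()
    (index_dir / "toy_index.json").write_text("{}")
    loaded = Artifact.load(index_dir)  # base class picks ToyIndex
    loaded_type = type(loaded).__name__
    search_output = loaded.retriever()("hello")
```

In this sketch the factory role of an artifact is played by retriever(), which hands back an object that uses the on-disk data rather than doing the search itself.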
- classmethod from_url(url, *, expected_sha256=None)
Load the artifact from the specified URL.
The artifact at the specified URL will be downloaded and stored in PyTerrier’s artifact cache.
- Return type:
Artifact
- Parameters:
url – The URL or file path of the artifact.
expected_sha256 – The expected SHA-256 hash of the artifact. If provided, the downloaded artifact will be verified against this hash and an error will be raised if the hash does not match.
- Returns:
The loaded artifact.
- Raises:
ValueError – If no implementation is found that supports the artifact at the specified URL.
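To make the expected_sha256 check concrete, here is a minimal sketch of verifying a downloaded file against an expected SHA-256 hash using only the standard library. The helper name verify_sha256 and the choice of ValueError on mismatch are assumptions for illustration; the documentation above only states that an error is raised, not which one or how the library computes the hash.

```python
import hashlib
import tempfile
from pathlib import Path


def verify_sha256(path, expected_sha256=None, chunk_size=1 << 20):
    """Return the file's SHA-256 hex digest; raise ValueError on a mismatch.

    Hypothetical helper for illustration; not part of pyterrier_alpha.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large artifacts need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    if expected_sha256 is not None and actual != expected_sha256.lower():
        raise ValueError(f"hash mismatch: expected {expected_sha256}, got {actual}")
    return actual


with tempfile.TemporaryDirectory() as tmp:
    artifact_file = Path(tmp) / "artifact.bin"
    artifact_file.write_bytes(b"example artifact contents")
    good_hash = hashlib.sha256(b"example artifact contents").hexdigest()
    verified = verify_sha256(artifact_file, expected_sha256=good_hash)
    try:
        verify_sha256(artifact_file, expected_sha256="0" * 64)
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True
```

Pinning a hash this way guards against both corrupted downloads and a remote file that has changed since the hash was recorded.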
- classmethod from_hf(repo, branch=None, *, expected_sha256=None)
Load an artifact from Hugging Face Hub.
- Return type:
Artifact
- Parameters:
repo – The Hugging Face repository name.
branch – The branch or tag of the repository to load. (Default: main). A branch can also be provided directly in the repository name using owner/repo@branch.
expected_sha256 – The expected SHA-256 hash of the artifact. If provided, the downloaded artifact will be verified against this hash and an error will be raised if the hash does not match.
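The owner/repo@branch shorthand can be sketched as a small parsing step. The helper below is hypothetical and is not taken from pyterrier_alpha; in particular, letting an explicit branch argument take precedence over an inline @branch suffix is an assumption, since the documentation does not say how a conflict is resolved.

```python
def split_repo_branch(repo, branch=None, default_branch="main"):
    """Split 'owner/repo@branch' into a (repo, branch) pair.

    Hypothetical helper for illustration. Assumption: an explicit `branch`
    argument wins over a branch embedded in the repository name.
    """
    if "@" in repo:
        repo, _, inline_branch = repo.partition("@")
        if branch is None:
            branch = inline_branch
    return repo, branch or default_branch


no_branch = split_repo_branch("username/repo")
inline = split_repo_branch("username/repo@v1")
explicit = split_repo_branch("username/repo", branch="dev")
```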
- to_hf(repo, *, branch=None, pretty_name=None)
Upload this artifact to Hugging Face Hub.
- Return type:
None
- Parameters:
repo – The Hugging Face repository name.
branch – The branch or tag of the repository to upload to. (Default: main). A branch can also be provided directly in the repository name using owner/repo@branch.
pretty_name – The human-readable name of the artifact. (Default: the repository name)
- classmethod from_dataset(dataset, variant, *, expected_sha256=None)
Load an artifact from a PyTerrier dataset.
- Return type:
Artifact
- Parameters:
dataset – The name of the dataset.
variant – The variant of the dataset.
expected_sha256 – The expected SHA-256 hash of the artifact. If provided, the downloaded artifact will be verified against this hash and an error will be raised if the hash does not match.
- build_package(package_path=None, *, max_file_size=None, metadata_out=None, verbose=True)
Builds a package for this artifact.
Packaged artifacts are useful for distributing an artifact as a single file, for example so that it can be loaded with Artifact.from_url(). A separate metadata file is also generated, which gives information about the package’s contents, including file sizes and an expected hash for the package.
- Return type:
str
- Parameters:
package_path – The path of the package to create. Defaults to the artifact path with a .tar.lz4 extension.
max_file_size – The (approximate) maximum size of each file.
metadata_out – A dictionary that is updated with the metadata of the artifact (if provided).
verbose – Whether to display a progress bar when packaging.
- Returns:
The path of the package created.
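The general idea of packaging (one archive plus a metadata record with file sizes and a hash) can be sketched with the standard library. This is only an illustration under stated assumptions: it writes a plain .tar rather than .tar.lz4 because lz4 is not in the standard library, the metadata layout and the .json side file name are invented, and build_toy_package is a hypothetical function, not the library's build_package().

```python
import hashlib
import json
import tarfile
import tempfile
from pathlib import Path


def build_toy_package(artifact_dir, package_path=None, metadata_out=None):
    """Pack a directory into a single .tar file and return the package path.

    Hypothetical sketch: plain tar instead of .tar.lz4, invented metadata layout.
    """
    artifact_dir = Path(artifact_dir)
    package_path = Path(package_path or f"{artifact_dir}.tar")
    files = sorted(p for p in artifact_dir.rglob("*") if p.is_file())
    with tarfile.open(package_path, "w") as tar:
        for p in files:
            tar.add(p, arcname=p.relative_to(artifact_dir).as_posix())
    metadata = {
        # Size in bytes of each packaged file, keyed by its path in the archive.
        "files": {p.relative_to(artifact_dir).as_posix(): p.stat().st_size for p in files},
        # Hash of the finished package, usable later as an expected hash.
        "sha256": hashlib.sha256(package_path.read_bytes()).hexdigest(),
    }
    # Separate metadata file written next to the package (assumed naming).
    Path(f"{package_path}.json").write_text(json.dumps(metadata, indent=2))
    if metadata_out is not None:
        metadata_out.update(metadata)
    return str(package_path)


with tempfile.TemporaryDirectory() as tmp:
    artifact_dir = Path(tmp) / "my-index"
    artifact_dir.mkdir()
    (artifact_dir / "data.bin").write_bytes(b"\x00" * 10)
    (artifact_dir / "meta.json").write_text("{}")
    metadata = {}
    package = build_toy_package(artifact_dir, metadata_out=metadata)
    package_name = Path(package).name
    with tarfile.open(package) as tar:
        packaged_names = sorted(tar.getnames())
```

The hash recorded in the metadata is the kind of value a consumer could later pass as expected_sha256 when downloading the package.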