Artifacts

An artifact is a component stored on disk, such as an index.

Artifacts usually act as factories for transformers that use them. For example, an index artifact may provide a .retriever() method that returns a transformer that searches the index.
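
As a rough illustration of the factory pattern described above, the following sketch uses a hypothetical ToyIndex class and ToyRetriever transformer. Neither name is part of PyTerrier; they only stand in for a real index artifact and the transformer it would return.

```python
from pathlib import Path


class ToyRetriever:
    """Hypothetical transformer that searches a ToyIndex (not a PyTerrier class)."""

    def __init__(self, index):
        self.index = index

    def search(self, query):
        # Return the ids of documents that contain every query term.
        terms = query.lower().split()
        return [doc_id for doc_id, text in self.index.docs.items()
                if all(term in text.lower().split() for term in terms)]


class ToyIndex:
    """Hypothetical artifact: a component identified by a path on disk."""

    def __init__(self, path, docs=None):
        self.path = Path(path)
        # A real index would read its contents from self.path; here the
        # documents are kept in memory so the sketch runs on its own.
        self.docs = dict(docs or {})

    def retriever(self):
        # The artifact acts as a factory for a transformer that uses it.
        return ToyRetriever(self)


index = ToyIndex('path/to/toy.index', {
    'd1': 'neural ranking models',
    'd2': 'inverted index compression',
})
retriever = index.retriever()
print(retriever.search('index compression'))
```

The point of the design is that the artifact owns the on-disk state, while each transformer it hands out holds a reference back to it.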

You can use pta.Artifact.load('path/to/artifact') to load an artifact. The method automatically identifies the artifact’s type and initializes it:

Loading an Artifact
import pyterrier_alpha as pta

index = pta.Artifact.load('path/to/msmarco-passage.pisa')
# index is now PisaIndex('path/to/msmarco-passage.pisa')
index.bm25()  # returns a BM25 PisaRetriever for the index

You can also upload artifacts to Hugging Face Hub and load them from it:

Save and Load an Artifact from Hugging Face Hub
# uploads the artifact to Hugging Face Hub
index.to_hf('username/repo')

# loads an artifact from Hugging Face Hub
index = pta.Artifact.from_hf('username/repo')

API Documentation

class pyterrier_alpha.Artifact(path)

Base class for PyTerrier artifacts.

Initialize the artifact at the provided path.

classmethod load(path)

Load the artifact from the specified path.

If invoked on the base class, this method will attempt to find a supporting Artifact implementation that can load the artifact at the specified path. If invoked on a subclass, it will attempt to load the artifact using the specific implementation.

Return type:

Artifact

Parameters:

path – The path of the artifact on disk.

Returns:

The loaded artifact.

Raises:
  • FileNotFoundError – If the specified path does not exist.

  • ValueError – If no implementation is found that supports the artifact at the specified path.
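
The dispatch behaviour described above can be sketched as follows. This is a simplified illustration, not the library's implementation: the ToyArtifact base class, the supports() hook, and the marker-file convention are all assumptions made for this example.

```python
import os
import tempfile


class ToyArtifact:
    """Hypothetical base class; not pyterrier_alpha.Artifact."""

    def __init__(self, path):
        self.path = path

    @classmethod
    def supports(cls, path):
        # Assumed hook: subclasses report whether they can load the path.
        return False

    @classmethod
    def load(cls, path):
        if not os.path.exists(path):
            raise FileNotFoundError(f'{path} does not exist')
        if cls is ToyArtifact:
            # Invoked on the base class: look for a supporting implementation.
            for impl in cls.__subclasses__():
                if impl.supports(path):
                    return impl(path)
            raise ValueError(f'no implementation found that supports {path}')
        # Invoked on a subclass: use that specific implementation.
        return cls(path)


class ToyPisaIndex(ToyArtifact):
    @classmethod
    def supports(cls, path):
        # Assumed convention: a marker file identifies the artifact type.
        return os.path.exists(os.path.join(path, 'toy_pisa.marker'))


with tempfile.TemporaryDirectory() as tmp:
    index_dir = os.path.join(tmp, 'index')
    os.makedirs(index_dir)
    open(os.path.join(index_dir, 'toy_pisa.marker'), 'w').close()
    loaded = ToyArtifact.load(index_dir)
    loaded_type = type(loaded).__name__  # name of the implementation chosen
```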

classmethod from_url(url, *, expected_sha256=None)

Load the artifact from the specified URL.

The artifact at the specified URL will be downloaded and stored in PyTerrier’s artifact cache.

Return type:

Artifact

Parameters:
  • url – The URL or file path of the artifact.

  • expected_sha256 – The expected SHA-256 hash of the artifact. If provided, the downloaded artifact will be verified against this hash and an error will be raised if the hash does not match.

Returns:

The loaded artifact.

Raises:

ValueError – If no implementation is found that supports the artifact at the specified URL.
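
To make the expected_sha256 check concrete, here is a minimal sketch of hash verification using Python's standard hashlib module. The helper name verify_sha256 is hypothetical, and the library may perform the check differently (for example, while streaming the download).

```python
import hashlib
import os
import tempfile


def verify_sha256(path, expected_sha256=None, chunk_size=1 << 20):
    """Hypothetical helper: hash a file and compare against an expected digest."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # Read in chunks so large artifacts need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    actual = digest.hexdigest()
    if expected_sha256 is not None and actual != expected_sha256.lower():
        raise ValueError(f'hash mismatch: expected {expected_sha256}, got {actual}')
    return actual


with tempfile.TemporaryDirectory() as tmp:
    package = os.path.join(tmp, 'artifact.bin')
    with open(package, 'wb') as f:
        f.write(b'example artifact contents')
    expected = hashlib.sha256(b'example artifact contents').hexdigest()
    actual = verify_sha256(package, expected)  # raises ValueError on a mismatch
```

Pinning a hash in this way guards against both corrupted downloads and a remote file that has changed since it was published.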

classmethod from_hf(repo, branch=None, *, expected_sha256=None)

Load an artifact from Hugging Face Hub.

Return type:

Artifact

Parameters:
  • repo – The Hugging Face repository name.

  • branch – The branch or tag of the repository to load (default: main). A branch can also be provided directly in the repository name using owner/repo@branch.

  • expected_sha256 – The expected SHA-256 hash of the artifact. If provided, the downloaded artifact will be verified against this hash and an error will be raised if the hash does not match.
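
The owner/repo@branch shorthand can be illustrated with a small parsing sketch. The function name split_repo_branch is hypothetical, and letting an explicit branch argument take precedence over the inline form is an assumption for this example, not documented behaviour.

```python
def split_repo_branch(repo, branch=None, default='main'):
    """Hypothetical helper: resolve a repository name and a branch or tag."""
    if '@' in repo:
        # Split on the first '@' only: 'owner/repo@branch' -> name, branch.
        repo, inline_branch = repo.split('@', 1)
    else:
        inline_branch = None
    # Assumption: an explicit branch argument wins over the inline form,
    # and 'main' is used when neither is given.
    return repo, branch or inline_branch or default


print(split_repo_branch('username/repo'))
print(split_repo_branch('username/repo@v1'))
print(split_repo_branch('username/repo', branch='dev'))
```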

to_hf(repo, *, branch=None, pretty_name=None)

Upload this artifact to Hugging Face Hub.

Return type:

None

Parameters:
  • repo – The Hugging Face repository name.

  • branch – The branch or tag of the repository to upload to (default: main). A branch can also be provided directly in the repository name using owner/repo@branch.

  • pretty_name – The human-readable name of the artifact. (Default: the repository name)

classmethod from_dataset(dataset, variant, *, expected_sha256=None)

Load an artifact from a PyTerrier dataset.

Return type:

Artifact

Parameters:
  • dataset – The name of the dataset.

  • variant – The variant of the dataset.

  • expected_sha256 – The expected SHA-256 hash of the artifact. If provided, the downloaded artifact will be verified against this hash and an error will be raised if the hash does not match.

build_package(package_path=None, *, max_file_size=None, metadata_out=None, verbose=True)

Builds a package for this artifact.

Packaged artifacts are useful for distributing an artifact as a single file, for example one that is later loaded with Artifact.from_url(). A separate metadata file is also generated, which describes the package’s contents, including file sizes and an expected hash for the package.

Return type:

str

Parameters:
  • package_path – The path of the package to create. Defaults to the artifact path with a .tar.lz4 extension.

  • max_file_size – The (approximate) maximum size of each file.

  • metadata_out – A dictionary that is updated with the metadata of the artifact (if provided).

  • verbose – Whether to display a progress bar when packaging.

Returns:

The path of the package created.
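
As a rough picture of what packaging involves, the sketch below bundles a directory into a single tar file and records file sizes and a SHA-256 hash as metadata. It uses only the standard library, so it writes an uncompressed .tar rather than .tar.lz4, and the function name toy_build_package and the metadata keys are assumptions rather than the library's actual format.

```python
import hashlib
import os
import tarfile
import tempfile


def toy_build_package(artifact_path, package_path=None, metadata_out=None):
    """Hypothetical packager: tar a directory and describe its contents."""
    if package_path is None:
        # Default to the artifact path plus an extension (here '.tar').
        package_path = artifact_path.rstrip('/\\') + '.tar'
    files = {}
    with tarfile.open(package_path, 'w') as tar:
        for root, _dirs, names in os.walk(artifact_path):
            for name in sorted(names):
                full = os.path.join(root, name)
                rel = os.path.relpath(full, artifact_path)
                tar.add(full, arcname=rel)
                files[rel] = os.path.getsize(full)  # size in bytes
    with open(package_path, 'rb') as f:
        # Fine for a small sketch; large packages should be hashed in chunks.
        sha256 = hashlib.sha256(f.read()).hexdigest()
    if metadata_out is not None:
        # Update the caller's dictionary in place, as metadata_out describes.
        metadata_out.update({'files': files, 'sha256': sha256})
    return package_path


with tempfile.TemporaryDirectory() as tmp:
    artifact_dir = os.path.join(tmp, 'toy.index')
    os.makedirs(artifact_dir)
    with open(os.path.join(artifact_dir, 'data.bin'), 'wb') as f:
        f.write(b'0123456789')
    metadata = {}
    package = toy_build_package(artifact_dir, metadata_out=metadata)
    package_name = os.path.basename(package)
```

The hash recorded here is the kind of value a consumer could later pass as expected_sha256 when loading the package.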