Artifacts

An artifact is a component stored on disk, such as an index.

Artifacts usually act as factories for transformers that use them. For example, an index artifact may provide a .retriever() method that returns a transformer that searches the index.

You can use pta.Artifact.load('path/to/artifact') to load an artifact. The function automatically identifies the artifact’s type and initializes it:

Loading an Artifact
import pyterrier_alpha as pta
index = pta.Artifact.load('path/to/msmarco-passage.pisa')
# PisaIndex('path/to/msmarco-passage.pisa')
index.bm25() # returns a BM25 PisaRetriever for the index

You can also save artifacts to Hugging Face Hub and load them from it:

Saving and Loading an Artifact from Hugging Face Hub
# uploads the artifact to Hugging Face Hub
index.to_hf('username/repo')

# loads an artifact from Hugging Face Hub
pta.Artifact.from_hf('username/repo')

API Documentation

class pyterrier_alpha.Artifact(path)

Base class for PyTerrier artifacts.

Initialize the artifact at the provided path.

classmethod load(path, **kwargs)

Load the artifact from the specified path.

If invoked on the base class, this method will attempt to find a supporting Artifact implementation that can load the artifact at the specified path. If invoked on a subclass, it will attempt to load the artifact using the specific implementation.

Return type:

Artifact

Parameters:
  • path – The path of the artifact on disk.

  • **kwargs – Additional arguments passed to the constructor of the artifact class.

Returns:

The loaded artifact.

Raises:
  • FileNotFoundError – If the specified path does not exist.

  • ValueError – If no implementation is found that supports the artifact at the specified path.
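The base-class versus subclass behaviour described above can be illustrated with a small stand-alone sketch. It does not use pyterrier_alpha and is not the library's implementation: ToyArtifact, ToyIndex, and the supports() hook are hypothetical names invented here, and the real lookup mechanism may differ.

```python
import os
import tempfile


class ToyArtifact:
    """Hypothetical stand-in for an artifact base class (not pyterrier_alpha code)."""

    def __init__(self, path):
        self.path = path

    @classmethod
    def supports(cls, path):
        # Hypothetical hook: subclasses report whether they can load `path`.
        return False

    @classmethod
    def load(cls, path, **kwargs):
        if not os.path.exists(path):
            raise FileNotFoundError(path)
        if cls is ToyArtifact:
            # Invoked on the base class: search for a supporting implementation.
            for impl in cls.__subclasses__():
                if impl.supports(path):
                    return impl(path, **kwargs)
            raise ValueError(f'no implementation supports {path}')
        # Invoked on a subclass: use that specific implementation.
        return cls(path, **kwargs)


class ToyIndex(ToyArtifact):
    @classmethod
    def supports(cls, path):
        # Assumption for this sketch: the type is recognised by its extension.
        return path.endswith('.toyindex')


with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, 'demo.toyindex')
    other_path = os.path.join(tmp, 'demo.unknown')
    os.mkdir(index_path)
    os.mkdir(other_path)

    loaded = ToyArtifact.load(index_path)  # base class dispatches to ToyIndex
    direct = ToyIndex.load(other_path)     # subclass is used without a search
    try:
        ToyArtifact.load(other_path)
        unsupported_error = None
    except ValueError as err:
        unsupported_error = err            # no implementation matched
```

In this sketch, calling load on the subclass skips the search entirely, which mirrors the documented difference between the two call styles.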

classmethod from_url(url, *, expected_sha256=None, **kwargs)

Load the artifact from the specified URL.

The artifact at the specified URL will be downloaded and stored in PyTerrier’s artifact cache.

Return type:

Artifact

Parameters:
  • url – The URL or file path of the artifact.

  • expected_sha256 – The expected SHA-256 hash of the artifact. If provided, the downloaded artifact will be verified against this hash and an error will be raised if the hash does not match.

  • **kwargs – Additional arguments passed to the constructor of the artifact class.

Returns:

The loaded artifact.

Raises:

ValueError – If no implementation is found that supports the artifact at the specified URL.
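The expected_sha256 check can be pictured with the following sketch, which uses only the Python standard library. The helper names sha256_of_file and verify_sha256 are hypothetical, and raising ValueError on a mismatch is an assumption: the documentation states only that an error is raised.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_sha256=None):
    """Hypothetical helper: raise if the file does not match the expected hash."""
    if expected_sha256 is None:
        return  # no hash supplied, so there is nothing to verify
    actual = sha256_of_file(path)
    if actual != expected_sha256.lower():
        # Assumption: the error type used by the library is not documented.
        raise ValueError(f'SHA-256 mismatch: expected {expected_sha256}, got {actual}')


with tempfile.TemporaryDirectory() as tmp:
    downloaded = os.path.join(tmp, 'artifact.bin')
    with open(downloaded, 'wb') as f:
        f.write(b'hello')

    good_hash = hashlib.sha256(b'hello').hexdigest()
    verify_sha256(downloaded, good_hash)  # matching hash: returns silently
    try:
        verify_sha256(downloaded, '0' * 64)
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True          # wrong hash: error raised
```

Pinning a hash this way guards against a corrupted download or a file that was replaced at the same URL.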

build_package(package_path=None, *, max_file_size=None, metadata_out=None, verbose=True)

Builds a package for this artifact.

Packaged artifacts are useful for distributing an artifact as a single file, for example one that recipients load with Artifact.from_url(). A separate metadata file is also generated, which describes the package’s contents, including file sizes and an expected hash for the package.

Return type:

str

Parameters:
  • package_path – The path of the package to create. Defaults to the artifact path with a .tar.lz4 extension.

  • max_file_size – The (approximate) maximum size of each file.

  • metadata_out – A dictionary that is updated with the metadata of the artifact (if provided).

  • verbose – Whether to display a progress bar when packaging.

Returns:

The path of the package created.
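To make the idea of a single-file package with accompanying metadata concrete, here is a rough standard-library sketch. It is not how build_package is implemented: build_toy_package is a hypothetical function, it writes a plain .tar rather than .tar.lz4, it ignores max_file_size, and the metadata keys 'files' and 'expected_sha256' are invented for illustration.

```python
import hashlib
import os
import tarfile
import tempfile


def build_toy_package(artifact_path, package_path=None, metadata_out=None):
    """Hypothetical sketch: pack a directory into one tar file and describe it."""
    if package_path is None:
        # Assumption: plain tar here, whereas the real default is .tar.lz4.
        package_path = artifact_path.rstrip(os.sep) + '.tar'
    sizes = {}
    with tarfile.open(package_path, 'w') as tar:
        for root, _dirs, files in os.walk(artifact_path):
            for name in sorted(files):
                full = os.path.join(root, name)
                rel = os.path.relpath(full, artifact_path)
                tar.add(full, arcname=rel)
                sizes[rel] = os.path.getsize(full)
    # Hash the finished package so recipients can verify their download.
    with open(package_path, 'rb') as f:
        package_hash = hashlib.sha256(f.read()).hexdigest()
    if metadata_out is not None:
        # Update the caller's dictionary in place, as metadata_out is described.
        metadata_out.update({'files': sizes, 'expected_sha256': package_hash})
    return package_path


with tempfile.TemporaryDirectory() as tmp:
    artifact_dir = os.path.join(tmp, 'demo.index')
    os.mkdir(artifact_dir)
    with open(os.path.join(artifact_dir, 'data.bin'), 'wb') as f:
        f.write(b'\x00' * 10)

    meta = {}
    package = build_toy_package(artifact_dir, metadata_out=meta)
    package_name = os.path.basename(package)
    with tarfile.open(package) as tar:
        members = tar.getnames()
```

The hash recorded in the metadata is the kind of value a recipient could pass as expected_sha256 when loading the package.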

classmethod from_dataset(dataset, variant, *, expected_sha256=None)

Load an artifact from a PyTerrier dataset.

Return type:

Artifact

Parameters:
  • dataset – The name of the dataset.

  • variant – The variant of the dataset.

  • expected_sha256 – The expected SHA-256 hash of the artifact. If provided, the downloaded artifact will be verified against this hash and an error will be raised if the hash does not match.

classmethod from_hf(repo, branch=None, *, expected_sha256=None, **kwargs)

Load an artifact from Hugging Face Hub.

Return type:

Artifact

Parameters:
  • repo – The Hugging Face repository name.

  • branch – The branch or tag of the repository to load. (Default: main). A branch can also be provided directly in the repository name using owner/repo@branch.

  • expected_sha256 – The expected SHA-256 hash of the artifact. If provided, the downloaded artifact will be verified against this hash and an error will be raised if the hash does not match.

  • **kwargs – Additional arguments passed to the constructor of the artifact class.
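The owner/repo@branch convention for the repo parameter can be sketched as a small parsing helper. split_repo_and_branch is a hypothetical name, not part of pyterrier_alpha, and rejecting a conflict between an inline branch and the branch argument is an assumption: the documentation does not say how the library resolves that case.

```python
def split_repo_and_branch(repo, branch=None, default_branch='main'):
    """Hypothetical helper: resolve the branch for an owner/repo[@branch] string."""
    if '@' in repo:
        repo, inline_branch = repo.split('@', 1)
        if branch is not None and branch != inline_branch:
            # Assumption: conflicting branches are treated as an error here.
            raise ValueError('branch given both inline and as an argument')
        branch = inline_branch
    # Fall back to the documented default branch when none was given.
    return repo, branch or default_branch


plain = split_repo_and_branch('username/repo')
inline = split_repo_and_branch('username/repo@v1')
explicit = split_repo_and_branch('username/repo', branch='dev')
```

Under these assumptions, 'username/repo@v1' and the pair ('username/repo', branch='v1') resolve to the same repository and branch.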

to_hf(repo, *, branch=None, pretty_name=None, private=None)

Upload this artifact to Hugging Face Hub.

Return type:

None

Parameters:
  • repo – The Hugging Face repository name.

  • branch – The branch or tag of the repository to upload to. (Default: main). A branch can also be provided directly in the repository name using owner/repo@branch.

  • pretty_name – The human-readable name of the artifact. (Default: the repository name)

  • private – Whether to make the repository private. New repositories default to public unless the organization’s default is private. No change to the repository’s visibility will be made if private=None (default).

classmethod from_zenodo(zenodo_id, *, expected_sha256=None, **kwargs)

Load an artifact from Zenodo.

Return type:

Artifact

Parameters:
  • zenodo_id – The Zenodo record ID of the artifact.

  • expected_sha256 – The expected SHA-256 hash of the artifact. If provided, the downloaded artifact will be verified against this hash and an error will be raised if the hash does not match.

  • **kwargs – Additional arguments passed to the constructor of the artifact class.

to_zenodo(*, pretty_name=None, sandbox=False)

Upload this artifact to Zenodo.

Return type:

None

Parameters:
  • pretty_name – The human-readable name of the artifact.

  • sandbox – Whether to perform a test upload to the Zenodo sandbox.

classmethod from_p2p(code, path, *, expected_sha256=None, **kwargs)

Load an artifact from a peer using a Magic Wormhole code.

Return type:

Artifact

Parameters:
  • code – The Magic Wormhole code.

  • path – The path to save the artifact to.

  • expected_sha256 – The expected SHA-256 hash of the artifact. If provided, the downloaded artifact will be verified against this hash and an error will be raised if the hash does not match.

  • **kwargs – Additional arguments passed to the constructor of the artifact class.

to_p2p()

Send this artifact directly to a peer using Magic Wormhole.

The recipient can use the provided code to download the artifact.