Artifact API Reference

class pyterrier.Artifact(path)[source]

Base class for PyTerrier artifacts.

An artifact is a component stored on disk, such as an index.

Artifacts usually act as factories for transformers that use them. For example, an index artifact may provide a .retriever() method that returns a transformer that searches the index.
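The factory relationship described above can be sketched with a toy class. This is an illustrative sketch only: ToyIndexArtifact, its one-text-file-per-document layout, and the plain callable it returns are all hypothetical and stand in for a real artifact and a real PyTerrier transformer.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


class ToyIndexArtifact:
    """Hypothetical artifact: a directory holding one .txt file per document."""

    def __init__(self, path):
        self.path = Path(path)

    def retriever(self) -> Callable[[str], List[str]]:
        """Return a callable that searches the documents stored on disk."""
        def search(query: str) -> List[str]:
            hits = []
            for doc in sorted(self.path.glob("*.txt")):
                text = doc.read_text(encoding="utf-8")
                if query.lower() in text.lower():
                    hits.append(doc.stem)
            return hits
        return search


# Build a tiny on-disk "index", then obtain a searcher from the artifact.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "d1.txt").write_text("PyTerrier artifacts live on disk", encoding="utf-8")
    (Path(tmp) / "d2.txt").write_text("An unrelated document", encoding="utf-8")
    search = ToyIndexArtifact(tmp).retriever()
    hits = search("artifacts")
    print(hits)
```

The artifact object only knows where the data lives; the object returned by retriever() carries the behaviour that uses it.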

Initialize the artifact at the provided path.

Parameters:

path (Path | str | _NoPath)

classmethod load(path, **kwargs)[source]

Load the artifact from the specified path.

If invoked on the base class, this method will attempt to find a supporting Artifact implementation that can load the artifact at the specified path. If invoked on a subclass, it will attempt to load the artifact using the specific implementation.

Return type:

Artifact

Parameters:
  • path (str) – The path of the artifact on disk.

  • **kwargs (Any) – Arguments that are passed to the constructor of the artifact class.

Returns:

The loaded artifact.

Raises:
  • FileNotFoundError – If the specified path does not exist.

  • ValueError – If no implementation is found that supports the artifact at the specified path.
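The dispatch behaviour of load() can be sketched as follows. This is a minimal sketch under stated assumptions, not PyTerrier's implementation: ToyArtifact, ToyIndex, the supports() hook, and the index.marker file are hypothetical names chosen for illustration. Only the two error conditions and the base-class versus subclass behaviour come from the description above.

```python
import tempfile
from pathlib import Path


class ToyArtifact:
    """Hypothetical base class used only to illustrate the dispatch logic."""

    def __init__(self, path):
        self.path = Path(path)

    @classmethod
    def supports(cls, path: Path) -> bool:
        # Subclasses override this to recognise their own on-disk layout.
        return False

    @classmethod
    def load(cls, path, **kwargs):
        path = Path(path)
        if not path.exists():
            raise FileNotFoundError(f"No artifact found at {path}")
        if cls is ToyArtifact:
            # Invoked on the base class: look for a supporting implementation.
            for impl in ToyArtifact.__subclasses__():
                if impl.supports(path):
                    return impl(path, **kwargs)
            raise ValueError(f"No implementation supports the artifact at {path}")
        # Invoked on a subclass: use that specific implementation.
        return cls(path, **kwargs)


class ToyIndex(ToyArtifact):
    @classmethod
    def supports(cls, path: Path) -> bool:
        # Assumed marker file; a real implementation would inspect metadata.
        return (path / "index.marker").exists()


with tempfile.TemporaryDirectory() as tmp:
    index_dir = Path(tmp) / "my_index"
    index_dir.mkdir()
    (index_dir / "index.marker").touch()
    loaded = ToyArtifact.load(index_dir)   # base class finds ToyIndex
    print(type(loaded).__name__)
```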

classmethod from_url(url, *, expected_sha256=None, **kwargs)[source]

Load the artifact from the specified URL.

The artifact at the specified URL will be downloaded and stored in PyTerrier’s artifact cache.

Return type:

Artifact

Parameters:
  • url (str) – The URL or file path of the artifact.

  • expected_sha256 (str | None) – The expected SHA-256 hash of the artifact. If provided, the downloaded artifact will be verified against this hash and an error will be raised if the hash does not match.

  • **kwargs (Any) – Arguments that are passed to the constructor of the artifact class.

Returns:

The loaded artifact.

Raises:

ValueError – If no implementation is found that supports the artifact at the specified URL.
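The expected_sha256 check can be reproduced independently with the standard library, for example to compute the hash of a package before publishing it. The helpers sha256_of_file and verify_sha256 below are hypothetical, and raising ValueError on a mismatch is an assumption of this sketch; the reference above only states that an error is raised.

```python
import hashlib
import os
import tempfile
from typing import Optional


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large packages need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_sha256: Optional[str]) -> None:
    """Raise if the file's hash differs from the expected one (skip if None)."""
    if expected_sha256 is None:
        return
    actual = sha256_of_file(path)
    if actual.lower() != expected_sha256.lower():
        # Assumption: the error type is chosen for this sketch only.
        raise ValueError(f"SHA-256 mismatch: expected {expected_sha256}, got {actual}")


with tempfile.TemporaryDirectory() as tmp:
    package = os.path.join(tmp, "artifact.bin")
    with open(package, "wb") as f:
        f.write(b"example artifact bytes")

    good = hashlib.sha256(b"example artifact bytes").hexdigest()
    actual = sha256_of_file(package)
    verify_sha256(package, good)          # passes silently

    try:
        verify_sha256(package, "0" * 64)  # deliberately wrong hash
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True
```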

build_package(package_path=None, *, max_file_size=None, metadata_out=None, verbose=True)[source]

Builds a package for this artifact.

Packaged artifacts are useful for distributing an artifact as a single file, for example so that it can be loaded with Artifact.from_url(). A separate metadata file is also generated, which describes the package’s contents, including file sizes and an expected hash for the package.

Return type:

str

Parameters:
  • package_path (str | None) – The path of the package to create. Defaults to the artifact path with a .tar.lz4 extension.

  • max_file_size (float | None) – The (approximate) maximum size of each file.

  • metadata_out (Dict[str, Any] | None) – A dictionary that is updated with the metadata of the artifact (if provided).

  • verbose (bool) – Whether to display a progress bar when packaging.

Returns:

The path of the package created.
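The idea of a single-file package accompanied by a metadata file can be sketched with the standard library. This is not the format build_package() produces: the sketch writes an uncompressed .tar rather than .tar.lz4, and build_toy_package, the .json sidecar name, and the metadata keys are all assumptions made for illustration.

```python
import hashlib
import json
import tarfile
import tempfile
from pathlib import Path


def build_toy_package(artifact_dir, package_path=None) -> str:
    """Pack a directory into one tar file and write a JSON metadata sidecar."""
    artifact_dir = Path(artifact_dir)
    if package_path is None:
        # Mirrors the "artifact path plus extension" default described above.
        package_path = str(artifact_dir) + ".tar"
    files = sorted(p for p in artifact_dir.rglob("*") if p.is_file())
    with tarfile.open(package_path, "w") as tar:
        for p in files:
            tar.add(p, arcname=p.relative_to(artifact_dir).as_posix())
    metadata = {
        # File sizes of the package's contents, keyed by relative path.
        "files": {p.relative_to(artifact_dir).as_posix(): p.stat().st_size for p in files},
        # Hash of the package itself, usable as an expected hash on download.
        "expected_sha256": hashlib.sha256(Path(package_path).read_bytes()).hexdigest(),
    }
    Path(package_path + ".json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return package_path


with tempfile.TemporaryDirectory() as tmp:
    artifact_dir = Path(tmp) / "my_index"
    artifact_dir.mkdir()
    (artifact_dir / "doc.txt").write_bytes(b"hello")
    package = build_toy_package(artifact_dir)
    package_name = Path(package).name
    meta = json.loads(Path(package + ".json").read_text(encoding="utf-8"))
    print(package_name, meta["files"])
```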

classmethod from_dataset(dataset, variant, *, expected_sha256=None)[source]

Load an artifact from a PyTerrier dataset.

Return type:

Artifact

Parameters:
  • dataset (str) – The name of the dataset.

  • variant (str) – The variant of the dataset.

  • expected_sha256 (str | None) – The expected SHA-256 hash of the artifact. If provided, the downloaded artifact will be verified against this hash and an error will be raised if the hash does not match.

classmethod from_hf(repo, branch=None, *, expected_sha256=None, **kwargs)[source]

Load an artifact from Hugging Face Hub.

Return type:

Artifact

Parameters:
  • repo (str) – The Hugging Face repository name.

  • branch (str | None) – The branch or tag of the repository to load. (Default: main). A branch can also be provided directly in the repository name using owner/repo@branch.

  • expected_sha256 (str | None) – The expected SHA-256 hash of the artifact. If provided, the downloaded artifact will be verified against this hash and an error will be raised if the hash does not match.

  • **kwargs (Any) – Arguments that are passed to the constructor of the artifact class.
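The two ways of naming a branch (the branch argument, or owner/repo@branch inside the repository name) can be reconciled as in the sketch below. The helper split_repo_and_branch is hypothetical, and rejecting conflicting branches with ValueError is an assumption of this sketch rather than documented behaviour.

```python
from typing import Optional, Tuple


def split_repo_and_branch(repo: str, branch: Optional[str] = None) -> Tuple[str, str]:
    """Resolve a repository name and branch, defaulting the branch to "main"."""
    if "@" in repo:
        name, _, inline_branch = repo.partition("@")
        if branch is not None and branch != inline_branch:
            # Assumption: treat two different branch names as a caller error.
            raise ValueError(
                f"Conflicting branches: {inline_branch!r} in repo name, {branch!r} as argument"
            )
        return name, inline_branch
    return repo, (branch if branch is not None else "main")


print(split_repo_and_branch("owner/repo"))
print(split_repo_and_branch("owner/repo@v1"))
print(split_repo_and_branch("owner/repo", branch="dev"))
```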

to_hf(repo, *, branch=None, pretty_name=None, private=None)[source]

Upload this artifact to Hugging Face Hub.

Return type:

None

Parameters:
  • repo (str) – The Hugging Face repository name.

  • branch (str | None) – The branch or tag of the repository to upload to. (Default: main). A branch can also be provided directly in the repository name using owner/repo@branch.

  • pretty_name (str | None) – The human-readable name of the artifact. (Default: the repository name)

  • private (bool | None) – Whether to make the repository private. New repositories default to public unless the organization’s default is private. No change to the repository’s visibility will be made if private=None (default).

classmethod from_zenodo(zenodo_id, *, expected_sha256=None, **kwargs)[source]

Load an artifact from Zenodo.

Return type:

Artifact

Parameters:
  • zenodo_id (str) – The Zenodo record ID of the artifact.

  • expected_sha256 (str | None) – The expected SHA-256 hash of the artifact. If provided, the downloaded artifact will be verified against this hash and an error will be raised if the hash does not match.

  • **kwargs (Any) – Arguments that are passed to the constructor of the artifact class.

to_zenodo(*, pretty_name=None, sandbox=False)[source]

Upload this artifact to Zenodo.

Return type:

None

Parameters:
  • pretty_name (str | None) – The human-readable name of the artifact.

  • sandbox (bool) – Whether to perform a test upload to the Zenodo sandbox.

classmethod from_p2p(code, path, *, expected_sha256=None, **kwargs)[source]

Load an artifact from a peer using a Magic Wormhole code.

Return type:

Artifact

Parameters:
  • code (str) – The Magic Wormhole code.

  • path (str) – The path to save the artifact to.

  • expected_sha256 (str | None) – The expected SHA-256 hash of the artifact. If provided, the downloaded artifact will be verified against this hash and an error will be raised if the hash does not match.

  • **kwargs (Any) – Arguments that are passed to the constructor of the artifact class.

to_p2p()[source]

Send this artifact directly to a peer using Magic Wormhole.

The recipient can use the provided code to download the artifact.