Caching Indexing Pipeline Results
====================================

:class:`~pyterrier_caching.IndexerCache` saves the sequence of documents encountered in an indexing pipeline.
It allows you to repeat that sequence without needing to re-execute the computations up to that point.

**Example use case:** I want to test how different retrieval engines perform over learned sparse
representations, but I don't want to re-compute the representations each time.

You use an ``IndexerCache`` the same way you would use an indexer: as the last component of a pipeline.
Rather than building an index of the data, the ``IndexerCache`` will save your results to a file on disk.
This file can be re-read by iterating over the cache object with ``iter(cache)``.

Example:

.. code-block:: python
    :caption: Caching the results of an expensive transformer using :class:`~pyterrier_caching.IndexerCache`

    import pyterrier as pt
    from pyterrier_caching import IndexerCache

    # Setup
    cache = IndexerCache('path/to/cache')
    dataset = pt.get_dataset('irds:msmarco-passage')

    # Use the IndexerCache cache object just as you would an indexer
    cache_pipeline = MyExpensiveTransformer() >> cache

    # The following line will save the results of MyExpensiveTransformer() to path/to/cache
    cache_pipeline.index(dataset.get_corpus_iter())

    # Now you can build multiple indexes over the results of MyExpensiveTransformer without
    # needing to re-run it each time
    indexer1 = ... # e.g., pt.IterDictIndexer('./path/to/index.terrier')
    indexer1.index(iter(cache))
    indexer2 = ... # e.g., pyterrier_pisa.PisaIndex('./path/to/index.pisa')
    indexer2.index(iter(cache))

``IndexerCache`` provides a variety of other functionality over the cached results. See the API
documentation below for more details.

API Documentation
--------------------------

.. autoclass:: pyterrier_caching.IndexerCache
   :members:

.. autoclass:: pyterrier_caching.Lz4PickleIndexerCache
   :members:
   :special-members: __len__, __iter__, __getitem__