# Experiments on TREC Robust 2004

This document gives a flavour of indexing and obtaining retrieval baselines on the TREC Robust04 test collection. You can run these experiments yourself using the [associated notebook](https://github.com/terrier-org/pyterrier/blob/master/examples/experiments/Robust04.ipynb).

You need to obtain the TREC Disks 4 & 5 corpora [from NIST](https://trec.nist.gov/data/cd45/index.html). Topics and qrels are provided through the `"trec-robust-2004"` PyTerrier dataset.

## Indexing

Indexing is fairly simple. We apply a filter to remove files that shouldn't be indexed, including the Congressional Record. Indexing on a reasonable machine using a single thread takes around 7 minutes.

```python
import pyterrier as pt

DISK45_PATH = "/path/to/disk45"
INDEX_DIR = "/path/to/create/the/index"

files = pt.io.find_files(DISK45_PATH)

# no-one indexes the congressional record in directory /CR/
# indeed, recent copies from NIST don't contain it
# we also remove some of the other unneeded files
bad = ['/CR/', '/AUX/', 'READCHG', 'READFRCG']
for b in bad:
    files = list(filter(lambda f: b not in f, files))

indexer = pt.TRECCollectionIndexer(INDEX_DIR, verbose=True)
indexref = indexer.index(files)
```

## Retrieval - Simple Weighting Models

Here we define and evaluate standard weighting models.

```python
# load the index built in the Indexing step, so that `index` is defined
index = pt.IndexFactory.of(indexref)

BM25 = pt.terrier.Retriever(index, wmodel="BM25")
DPH = pt.terrier.Retriever(index, wmodel="DPH")
PL2 = pt.terrier.Retriever(index, wmodel="PL2")
DLM = pt.terrier.Retriever(index, wmodel="DirichletLM")

pt.Experiment(
    [BM25, DPH, PL2, DLM],
    pt.get_dataset("trec-robust-2004").get_topics(),
    pt.get_dataset("trec-robust-2004").get_qrels(),
    eval_metrics=["map", "P_10", "P_20", "ndcg_cut_20"],
    names=["BM25", "DPH", "PL2", "Dirichlet QL"]
)
```

Results are as follows:

|    | name         |      map |     P_10 |     P_20 |   ndcg_cut_20 |
|---:|:-------------|---------:|---------:|---------:|--------------:|
|  0 | BM25         | 0.241763 | 0.426104 | 0.349398 |      0.408061 |
|  1 | DPH          | 0.251307 | 0.447390 | 0.361446 |      0.422524 |
|  2 | PL2          | 0.229386 | 0.420884 | 0.343775 |      0.402179 |
|  3 | Dirichlet QL | 0.236826 | 0.407631 | 0.337952 |      0.396870 |

## Retrieval - Query Expansion

Here we define and evaluate standard query expansion models on top of BM25 and DPH, respectively. We use the default Terrier parameters for query expansion, namely:
 - 10 expansion terms
 - 3 documents
 - For RM3, a lambda value of 0.5

```python
Bo1 = pt.rewrite.Bo1QueryExpansion(index)
KL = pt.rewrite.KLQueryExpansion(index)
RM3 = pt.rewrite.RM3(index)

# BM25 with each query expansion model
pt.Experiment(
    [
        BM25,
        BM25 >> Bo1 >> BM25,
        BM25 >> KL >> BM25,
        BM25 >> RM3 >> BM25,
    ],
    pt.get_dataset("trec-robust-2004").get_topics(),
    pt.get_dataset("trec-robust-2004").get_qrels(),
    eval_metrics=["map", "P_10", "P_20", "ndcg_cut_20"],
    names=["BM25", "+Bo1", "+KL", "+RM3"]
)

# DPH with each query expansion model
pt.Experiment(
    [
        DPH,
        DPH >> Bo1 >> DPH,
        DPH >> KL >> DPH,
        DPH >> RM3 >> DPH,
    ],
    pt.get_dataset("trec-robust-2004").get_topics(),
    pt.get_dataset("trec-robust-2004").get_qrels(),
    eval_metrics=["map", "P_10", "P_20", "ndcg_cut_20"],
    names=["DPH", "+Bo1", "+KL", "+RM3"]
)
```

Results are as follows. In each table, the best value per column is in italics.

|    | name   |      map |     P_10 |     P_20 |   ndcg_cut_20 |
|---:|:-------|---------:|---------:|---------:|--------------:|
|  0 | BM25   | 0.241763 | 0.426104 | 0.349398 |      0.408061 |
|  1 | +Bo1   |*0.279458*| 0.448996 | 0.378916 |    *0.436533* |
|  2 | +KL    | 0.279401 | 0.444177 | 0.378313 |      0.435196 |
|  3 | +RM3   | 0.276544 |*0.453815*|*0.379518*|      0.430367 |

|    | name   |      map |     P_10 |     P_20 |   ndcg_cut_20 |
|---:|:-------|---------:|---------:|---------:|--------------:|
|  0 | DPH    | 0.251307 | 0.447390 | 0.361446 |      0.422524 |
|  1 | +Bo1   | 0.285334 | 0.458635 | 0.387952 |    *0.444528* |
|  2 | +KL    |*0.285720*| 0.458635 | 0.386948 |      0.442636 |
|  3 | +RM3   | 0.281796 |*0.461044*|*0.389960*|      0.441863 |
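
To put the query expansion results in perspective, the relative MAP gain over each baseline can be computed directly from the reported numbers. The following is a minimal sketch in plain Python that does not need PyTerrier or the index; `reported_map` and `relative_gain` are hypothetical names introduced here for illustration, and the values are copied from the results tables above.

```python
# MAP values copied from the query expansion results tables above.
# "baseline" is the unexpanded weighting model (BM25 or DPH).
reported_map = {
    "BM25": {"baseline": 0.241763, "+Bo1": 0.279458, "+KL": 0.279401, "+RM3": 0.276544},
    "DPH": {"baseline": 0.251307, "+Bo1": 0.285334, "+KL": 0.285720, "+RM3": 0.281796},
}


def relative_gain(baseline: float, expanded: float) -> float:
    """Return the improvement of `expanded` over `baseline`, in percent."""
    return 100.0 * (expanded - baseline) / baseline


# For each weighting model, compute the gain of every expansion variant.
gains = {
    model: {
        name: relative_gain(scores["baseline"], value)
        for name, value in scores.items()
        if name != "baseline"
    }
    for model, scores in reported_map.items()
}

for model, per_variant in gains.items():
    for name, gain in per_variant.items():
        print(f"{model} {name}: {gain:+.1f}% MAP")
```

These are simple ratios of the averaged scores. They say nothing about statistical significance, which would need per-topic results rather than the aggregate values shown here.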