Purpose (The Canterbury Corpus)



Purpose of the Canterbury Corpus

These pages report results from the new Canterbury Corpus, a replacement for the Calgary Corpus. They also include results for a number of other files, including those of the Calgary Corpus itself.

The Canterbury Corpus file set has been developed specifically for testing new compression algorithms. The files were selected for their ability to provide representative performance results.

This set of files is designed to replace the Calgary Corpus, which is now over ten years old. The Calgary Corpus was first presented in the book Text Compression by Bell, Cleary, and Witten, published in 1990. Results on files from the corpus have been reported by many researchers since then.

Several sets of results are available on this web site. As well as the new Canterbury Corpus, a corpus of large files has been tested, and results for the original Calgary Corpus are also available.

We would like to add results from your favorite compression algorithm to this page. You can either supply us with a copy of your algorithm, or get the Canterbury Corpus by FTP, run your algorithm on it yourself, and send us your results.
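If you run the files yourself, the measurement could look something like the minimal sketch below. It is only an illustration, under several assumptions that do not come from this page: that the corpus files sit unpacked in a local directory, that results are expressed as compressed bits per input byte, and that Python's standard zlib module stands in for the algorithm being tested. The names bits_per_byte and measure_directory are hypothetical.

```python
import zlib
from pathlib import Path


def bits_per_byte(original: bytes, compressed: bytes) -> float:
    """Compressed size in bits divided by the original size in bytes."""
    if not original:
        raise ValueError("cannot compute a ratio for an empty input")
    return 8 * len(compressed) / len(original)


def measure_directory(corpus_dir, compress=zlib.compress):
    """Return {file name: bits per byte} for each non-empty file in corpus_dir.

    `compress` is any function mapping bytes to bytes; zlib.compress is
    only a stand-in (an assumption) for the algorithm under test.
    """
    results = {}
    for path in sorted(Path(corpus_dir).iterdir()):
        if path.is_file():
            data = path.read_bytes()
            if data:  # skip empty files, which have no meaningful ratio
                results[path.name] = bits_per_byte(data, compress(data))
    return results
```

Calling measure_directory("corpus", my_compress), where "corpus" is a placeholder directory name and my_compress is your own bytes-to-bytes function, yields one figure per file; lower values mean better compression.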

This corpus is primarily intended for testing new algorithms, rather than for comparing the many production systems already available (see the Archive Compression Test for the latter). It is also intended for lossless algorithms only; see the Waterloo BragZone for image compression comparisons.


This page last updated Monday, December 11, 2000 by Matt Powell