( The Canterbury Corpus )
Frequently Asked Questions

Q: Why don't you have results for popular compression methods like JPEG and MP3?

The Canterbury Corpus focuses on lossless (text) compression methods, where decompression reproduces the original data exactly. Many popular compression methods, such as JPEG and MP3, are lossy: they achieve high compression ratios at the expense of output fidelity.
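The distinction can be illustrated with a minimal sketch. It uses Python's standard zlib module as a stand-in lossless method (zlib is not necessarily one of the methods evaluated on this site), and the `toy_lossy` function is a hypothetical illustration only, not how JPEG or MP3 actually work.

```python
import zlib


def round_trips_losslessly(data: bytes) -> bool:
    """Return True if compress-then-decompress reproduces the input exactly."""
    compressed = zlib.compress(data, 9)  # 9 = highest compression level
    return zlib.decompress(compressed) == data


def toy_lossy(data: bytes) -> bytes:
    """Hypothetical 'lossy' step: discard the low 4 bits of every byte.

    The discarded bits cannot be recovered afterwards, which is the
    defining property of a lossy method.
    """
    return bytes(b & 0xF0 for b in data)


sample = b"the quick brown fox jumps over the lazy dog " * 100

# Lossless: the output is smaller, and the original comes back bit for bit.
compressed = zlib.compress(sample, 9)
print(len(sample), "->", len(compressed), "bytes")
print("lossless round trip:", round_trips_losslessly(sample))

# Lossy: information is thrown away, so the original cannot be rebuilt.
print("lossy output equals original:", toy_lossy(sample) == sample)
```

A lossless method must pass the round-trip check for every possible input, which is why its compression ratios on text are more modest than those of lossy image or audio coders.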

The related sites page contains links to other sites with similar purposes which examine lossy methods. For more information about lossless compression, see Managing Gigabytes by Witten, Moffat, and Bell, or Text Compression by Bell, Cleary, and Witten.

That's not to say that the methods presented here aren't used in popular software. A notable example is LZW compression, the driving force behind GIF and TIFF image compression.

Q: My favourite compression method isn't listed here. What's up with that?

We focus on compression algorithms, rather than individual implementations. Chances are that your favourite compression software uses one (or more) of the compression methods investigated here.

Q: Can I get the source code for (some compression method)?

This depends on the method. Some methods are "standard" in the Unix world, and as such, source code is readily available. Source code for other methods may be obtained by e-mailing the author; contact information (where available) is provided on the methods description pages.

Naturally, the source code for certain compression methods cannot be released for reasons of commercial sensitivity.

Q: Can I get the results in (some format)?

The results from the compression experiments are currently available in HTML, LaTeX and comma-delimited formats, which should be suitable for most purposes. However, if several people request a new format, and that format is easily generated as plain text, then we may consider adding it. Note that Microsoft Word format is an unlikely candidate.

Q: I love your FAQ list, but there aren't enough questions in it. Why so few?

Not many people ask questions, I guess.

This page last updated Friday, December 15, 2000 by Matt Powell, Department of Computer Science, University of Canterbury