CorpusCatcher is a corpus collection toolset. It can help you build language- or topic-specific corpora from publicly available web resources. Such corpora are useful for many purposes, especially as data for building spell checkers.
If you are interested in CorpusCatcher or are working on spell checkers, you might also be interested in Spelt.
Subversion: https://translate.svn.sourceforge.net/svnroot/translate/src/trunk/corpuscatcher
See the Advanced Topics section in the README for some notes on the code.