CorpusCatcher

Introduction

CorpusCatcher is a corpus collection toolset. It helps you build language- or topic-specific corpora from publicly available web resources. Such corpora are useful for many purposes, particularly as data for building spell checkers.

If you are interested in CorpusCatcher, or are working on spell checkers, you might also be interested in Spelt.

Download

Releases can be downloaded from here and sources from here.

Documentation

Development

Subversion: https://translate.svn.sourceforge.net/svnroot/translate/src/trunk/corpuscatcher

See the Advanced Topics section in the README for some notes on the code.

TODO