Translate Toolkit & Pootle

Tools to help you make your software local




CorpusCatcher is a corpus collection toolset. It can help you build language-specific or topic-specific corpora from publicly available web resources. Such corpora are useful for many purposes, especially as data for building spell checkers.

If you are interested in CorpusCatcher, or are working on spell checkers, you might also be interested in Spelt.


Releases can be downloaded from here and sources from here.


  • README - Contains all you need to know about using the CorpusCatcher tools.
  • API - Not yet done.


Subversion: https://translate.svn.sourceforge.net/svnroot/translate/src/trunk/corpuscatcher

See the Advanced Topics section in the README for some notes on the code.


  • Download page in the encoding given in the page's HTTP header
  • Use the MIME type from the HTTP header to handle the file appropriately
    • Support .doc and .pdf
  • Improve filtering (filter out numbers, repeated punctuation marks)
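
The items above could be approached along the following lines. This is only a minimal sketch using the Python standard library, not code taken from CorpusCatcher: the function names (`charset_from_content_type`, `media_type`, `decode_page`, `clean_tokens`) and the two regular expressions are hypothetical and chosen purely for illustration.

```python
import re
from email.message import Message


def _parse_content_type(content_type):
    # Reuse the standard library's header parser instead of splitting by hand.
    msg = Message()
    msg["Content-Type"] = content_type
    return msg


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, or a default."""
    return _parse_content_type(content_type).get_content_charset() or default


def media_type(content_type):
    """Return the bare MIME type, e.g. 'application/pdf', to pick a handler."""
    return _parse_content_type(content_type).get_content_type()


def decode_page(body, content_type):
    """Decode raw page bytes using the encoding declared in the HTTP header."""
    charset = charset_from_content_type(content_type)
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # The server named an encoding Python does not know (assumed fallback).
        return body.decode("utf-8", errors="replace")


# Hypothetical filters: a token made up of digits and numeric separators,
# and any punctuation character repeated two or more times in a row.
NUMBER_RE = re.compile(r"^\W*\d[\d.,:/-]*\W*$")
REPEATED_PUNCT_RE = re.compile(r"([^\w\s])\1+")


def clean_tokens(text):
    """Collapse repeated punctuation, then drop purely numeric tokens."""
    text = REPEATED_PUNCT_RE.sub(r"\1", text)
    return [tok for tok in text.split() if not NUMBER_RE.match(tok)]
```

For example, `decode_page(b"caf\xe9", "text/html; charset=ISO-8859-1")` decodes the body as Latin-1, and `media_type("application/pdf")` could be used to route a download to a .pdf-specific extractor. What counts as a "number" for filtering is a design choice; the pattern here also drops dates and times such as `12:30`, which may or may not suit a given corpus.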