Please feel free to contribute by suggesting new tools or by pointing out mistakes in the data.
Tool | Description | Categories | Platform | Pricing |
---|---|---|---|---|
jTokenizer | Tokenizing natural language | tokenizer | | Free |
Natural Language Toolkit | Platform for building Python programs to work with human language data | tokenizer, tagger | Unix, Mac, Windows (+Python 3.4) | Free |
Tweet NLP | Tweet tokenizer, POS tagger, hierarchical word clusters, and a dependency parser for tweets, along with annotated corpora and web-based annotation tools. Clusters: http://www.cs.cmu.edu/~ark/TweetNLP/cluster_viewer.html | pos tagger, tokenizer, parser | | Free |
TXM | XML- and TEI-compatible text analysis software based on TreeTagger, the CQP search engine, and the R statistical environment. | text analysis, concordancer, r, statistics, search tool, tokenizer, xml | Windows, Mac, Linux, Tomcat | Free |
Unitok | Tool that splits texts into tokens | tokenizer | | Free |
YACSI Chinese Tokeniser / PoS Tagger | A Chinese tokenizer and PoS tagger | chinese, tokenizer, pos tagger | Windows | Free |
SoMaJo | A tokenizer and sentence splitter for German and English web and social media texts. | tokenizer, sentence boundary detector | Linux, Mac, Windows | Free, Open Source |