Useful analysis data: word lists
Feb. 23rd, 2010 05:32 pm
Every now and again I get the urge to do something with the large amount of post data I'm accumulating, so I thought I'd share a useful resource for text analysis: word lists.
http://wordlist.sourceforge.net/
I'm currently considering using the 2+2lemma.txt from 12dicts for something, to reduce the number of distinct words. I'll probably have to add common slang words, though, and names for services like Twitter/Facebook/MySpace/GMail/etc.
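For anyone curious what that reduction might look like, here is a minimal Python sketch. It assumes the lemma file puts each headword on an unindented line, followed by an indented line of comma-separated inflected forms, and that cross-references look like `-> [word]`; check that against the 12dicts documentation before relying on it. The function names, the `SAMPLE` lines, and the `EXTRA_WORDS` entries are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical additions for slang and service names that a dictionary-based
# lemma list will not contain; extend as needed.
EXTRA_WORDS = {"twitter": "twitter", "facebook": "facebook", "lol": "lol"}

# Tiny stand-in for the real file, written in the ASSUMED layout.
SAMPLE = [
    "abandon\n",
    "    abandoned, abandoning, abandons\n",
    "post\n",
    "    posted, posting, posts\n",
]


def clean(token):
    """Lowercase a token and drop cross-references and marker characters."""
    token = re.sub(r"->.*", "", token)  # assumed cross-reference syntax
    return re.sub(r"[^a-z'\-]", "", token.lower())


def parse_lemma_lines(lines):
    """Return {inflected form: headword} from lines in the assumed layout."""
    form_to_lemma = dict(EXTRA_WORDS)
    headword = None
    for raw in lines:
        if not raw.strip():
            continue
        if raw[0].isspace():
            # Indented line: inflected forms of the current headword.
            if headword is None:
                continue
            for form in raw.split(","):
                form = clean(form)
                if form:
                    form_to_lemma.setdefault(form, headword)
        else:
            # Unindented line: a new headword, which maps to itself.
            headword = clean(raw) or None
            if headword:
                form_to_lemma.setdefault(headword, headword)
    return form_to_lemma


def count_lemmas(text, form_to_lemma):
    """Count words in text, folding known inflected forms into headwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(form_to_lemma.get(tok, tok) for tok in tokens)
```

Calling `count_lemmas("Posting posts I posted on Twitter", parse_lemma_lines(SAMPLE))` folds the three forms of "post" into a single entry, which is the whole point of the exercise. Words missing from the map are counted as-is, so unknown slang still shows up instead of disappearing.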
ETA: Social Media, Data Mining & Machine Learning might be an interesting blog to read along these lines. And if you have university access, you may be able to get hold of Modeling and Data Mining in Blogosphere.