Feb. 23rd, 2010

[personal profile] foxfirefey
Every now and again I get the urge to do something with the mass of post data I'm accumulating, so I thought I'd share a useful resource for text analysis: lists of words.

http://wordlist.sourceforge.net/

I'm currently considering using 2+2lemma.txt from 12dicts for something: mapping inflected forms back to their base words to reduce the number of distinct words. I'll probably have to add common slang words, though, and names for services like Twitter/Facebook/MySpace/GMail/etc.
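Here's a rough sketch of what I have in mind, in Python. It assumes the lemma file is laid out as an unindented headword line followed by an indented, comma-separated line of its inflected forms, possibly with "->" cross-reference notes; check that against the 12dicts readme before trusting it. The function names and the extra_words parameter are made up for illustration.

```python
import re


def _clean(token):
    """Drop any '-> ...' cross-reference note and stray marker characters."""
    token = token.split("->")[0]
    return re.sub(r"[^A-Za-z'\-]", "", token).lower()


def parse_lemma_lines(lines, extra_words=()):
    """Build a dict mapping each inflected form to its headword.

    Assumed layout (not verified against the real file):
        headword
            form1, form2, form3
    """
    form_to_lemma = {}
    headword = None
    for raw in lines:
        if not raw.strip():
            continue
        if raw[0].isspace():
            # Indented line: inflected forms of the current headword.
            if headword is None:
                continue
            for form in raw.split(","):
                form = _clean(form)
                if form:
                    # Keep the first mapping seen for an ambiguous form.
                    form_to_lemma.setdefault(form, headword)
        else:
            headword = _clean(raw) or None
            if headword:
                form_to_lemma[headword] = headword
    # Slang and service names that the word list will not contain.
    for word in extra_words:
        form_to_lemma.setdefault(word.lower(), word.lower())
    return form_to_lemma


def lemmatize(text, form_to_lemma):
    """Lowercase, split into words, and replace known forms with headwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [form_to_lemma.get(tok, tok) for tok in tokens]
```

To use it on the real file, something like parse_lemma_lines(open("2+2lemma.txt", encoding="utf-8"), extra_words=["twitter", "facebook"]) should do, and len(set(lemmatize(post, mapping))) then gives the reduced distinct-word count for a post. Unknown words pass through unchanged, so slang that isn't in the list still gets counted, just not merged.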

ETA: Social Media, Data Mining & Machine Learning might be an interesting blog read along these lines. And if you have university access, you may be able to get ahold of Modeling and Data Mining in Blogosphere.
