exor674: Computer Science is my girlfriend (Default)
[personal profile] exor674 2011-04-24 11:14 pm

six-degrees-esque tool

http://edges.andreanall.com/ -- Enter a username in box A and a username in box B, and the site will give you a path between the two users. For this to work, both accounts must have existed since mid-March. You can choose which edges the tool considers: "Mutual Trust", "Mutual Watch", "Mutual Trust ( with membership )", "Any Edge", or "Using Interests". Please note that "Any Edge" and "Using Interests" have quite strict limits on how many accounts will be considered.
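For the curious, a path lookup like this can be sketched as a breadth-first search over whichever edge type you pick. This is only a minimal sketch, not the site's actual implementation (which isn't described here); the find_path function and the example graph are made up for illustration.

```python
from collections import deque


def find_path(graph, start, goal):
    """Return the shortest list of usernames linking start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    if start == goal:
        return [start]
    previous = {start: None}  # visited user -> the user we reached them from
    queue = deque([start])
    while queue:
        user = queue.popleft()
        for neighbor in graph.get(user, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = user
            if neighbor == goal:
                # Walk the back-pointers to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical "Mutual Watch" graph: each user maps to the users
# they share a mutual edge with.
example_graph = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob", "dave"},
    "dave": {"carol"},
    "erin": set(),
}

print(find_path(example_graph, "alice", "dave"))
```

Breadth-first search returns a shortest path when every edge counts the same, which seems like the natural fit for a six-degrees-style tool.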

If anyone is interested in the underlying data, feel free to comment and I'll give you a link. (Note, however, that it will either be in a format you can't easily read or in a format that is likely to cause massive issues when unpacked on pretty much any filesystem.)
foxfirefey: Fox stealing an egg. (Default)
[personal profile] foxfirefey 2011-04-14 05:27 pm

Recent DW stats

So, we know that when LJ is on the fritz it means a spike in DW activity until LJ smooths out and most people go back. You can see some pretty serious spikes on the newbyday stat:

newbyday 2011-04-02 402
newbyday 2011-04-03 243
newbyday 2011-04-04 2506
newbyday 2011-04-05 3658
newbyday 2011-04-06 16226
newbyday 2011-04-07 22136
newbyday 2011-04-08 10839
newbyday 2011-04-09 9735
newbyday 2011-04-10 4214
newbyday 2011-04-11 468
newbyday 2011-04-12 6897
newbyday 2011-04-13 8317

Of course, newbyday includes ALL accounts, including OpenID accounts that are created during imports through comments or access giving. That's why 4/11 is so low compared to the others--the importer had been shut down.

So in order to gauge activity spikes, it's easier to look at other metrics like active/posting, which would include people coming back to fallow accounts they haven't touched in a month as well as new accounts that have been signing in and posting content. Here are the numbers comparing 3/31/2011 to 4/13/2011 (just after 12AM, two weeks apart to the day, so it's a better comparison than different days of the week would be, since activity varies through the week):

Stat                   3/31/2011  4/13/2011  # increase  % increase
Posted last 30 days       18,448     22,633       4,185       22.6%
Posted last 7 days        12,341     16,311       3,970       32.2%
Posted last 24 hours       5,690      6,847       1,157       20.3%
foxfirefey: A series of interconnected dots in the shape of an M. (memewidth)
[personal profile] foxfirefey 2010-05-04 05:07 pm

Danah Boyd on data collection

A critique of Facebook's approach to privacy: Privacy and publicity in the context of "Big Data"

This article has some very good considerations when it comes to doing data analysis on Big Data.
foxfirefey: Dreamwidth: social content with dimension. (dreamwidth)
[personal profile] foxfirefey 2010-04-30 03:29 am

User to user mutual relationship graphs

So, this is my offering to [community profile] three_weeks_for_dw: I am making graphs of participant circles for posts and comments to communities! (And I hope to continue making more kinds of fun graphs you can get by certain activities.) They can be explained thusly:

  • The graph consists of nodes (the circles with the journals' names on them) and edges (the lines between the nodes).
  • A line between two nodes means that those two users have a mutual relationship. That means they subscribe to each other, give each other access, or both.
  • The size of the node indicates how many mutual relationships that user has in total--the more, the bigger the node.
  • The color of the node indicates how many mutual relationships that user shares with you, relative to the others--the darker the red, the more connections you two share.
  • Note that you don't show up in the graphs, since you are connected to everybody.
  • If you have a relationship with a user but they don't have any relationships with anybody else you're connected to, they don't show up either (I call them "singlets").
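The rules above boil down to something like the following. This is a minimal sketch under a simplifying assumption: there is a single relationship type (say, subscriptions) stored as a dictionary of sets, whereas the real graphs treat mutual subscription, mutual access, or both as an edge. The names is_mutual and circle_graph and the sample data are made up for illustration.

```python
def is_mutual(relations, a, b):
    """True when a and b each list the other."""
    return b in relations.get(a, set()) and a in relations.get(b, set())


def circle_graph(relations, owner):
    """Return (edges, singlets) for the owner's circle, leaving the owner out."""
    circle = sorted(
        user for user in relations.get(owner, set())
        if is_mutual(relations, owner, user)
    )
    # An edge joins two circle members who are mutual with each other.
    edges = [
        (a, b)
        for i, a in enumerate(circle)
        for b in circle[i + 1:]
        if is_mutual(relations, a, b)
    ]
    connected = {user for edge in edges for user in edge}
    # Singlets are mutual with the owner but with nobody else in the circle.
    singlets = [user for user in circle if user not in connected]
    return edges, singlets


# Hypothetical data: who each account subscribes to.
relations = {
    "you": {"ann", "ben", "cat", "dan"},
    "ann": {"you", "ben"},
    "ben": {"you", "ann"},
    "cat": {"you"},
    "dan": {"ann"},  # does not subscribe back to "you", so not in the circle
}

edges, singlets = circle_graph(relations, "you")
print(edges, singlets)
```

Node size would then come from counting each user's mutual relationships in total, and color from how many of those fall inside your circle.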

( Example time! )

You can get a network graph for an account if, with that account, you have made either of the following on or after April 26th:

  • Three posts to three communities.
  • Comments to six different posts in at least three different communities.

For each additional post or 3 comments you make to a community, you can choose one of the bonus add-ons:

  • A list of all of the singlets--people who don't have any mutual relationships with other people in your circle
  • A graph that highlights connections between people in your biggest clique and a list of your biggest clique(s)
  • For you geeks, a dot file you can play around with in GraphViz.
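If you haven't seen a dot file before, here is roughly what goes into one. This is a minimal sketch, not the script used to make the graphs; the to_dot function and the usernames are made up for illustration, and it only emits plain undirected edges with none of the sizing or coloring.

```python
def to_dot(edges, graph_name="circle"):
    """Build the text of an undirected GraphViz graph from (user, user) pairs."""
    lines = [f"graph {graph_name} {{"]
    for a, b in sorted(edges):
        # Quote the names so any username is a valid DOT identifier.
        lines.append(f'    "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical mutual relationships between three journals.
print(to_dot([("ben", "cat"), ("ann", "ben")]))
```

Saving that output as circle.dot and running something like `dot -Tpng circle.dot -o circle.png` should render it.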

Put the links to the posts or comments in your request, and follow the rules of the communities you are posting to!
Unfortunately, the posts and comments will need to be public ones, so I can see them and verify. By default, I will put your graph(s) up on my web hosting and link you to it, so you can point to it in your journal easily (although it really is best if you host it for yourself, just in case), but if you'd prefer me to email it to you, just let me know.

If you don't know how to find communities you're interested in, feel free to make a comment here asking me for possible recommendations. I or others passing by will try to find some for you--though I'll note that I'm not in fandom, so my experience will be limited in that area.

foxfirefey: A series of interconnected dots in the shape of an M. (memewidth)
[personal profile] foxfirefey 2010-03-20 01:42 pm

Word frequencies in posts

The following is a list of word frequencies from the latest page, which displays a truncated/cut version of the latest public posts to Dreamwidth. A word only counts once per post. This data wasn't cleaned for Unicode entities (although I should do that), so it's not as perfect as it should be, but I thought some people might have a passing curiosity like I did. The following table only shows words that show up in 500 or more posts between the beginning of the year and yesterday. All the words have been lowercased.
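The counting rule (lowercase everything, count a word once per post, keep only words over a threshold) can be sketched like this. It's a minimal sketch, not the script that produced the table; the tokenizing regex, the post_frequencies name, and the sample posts are assumptions for illustration.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of ASCII letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def post_frequencies(posts, minimum=1):
    """Count how many posts each word appears in, keeping words seen in >= minimum posts."""
    counts = Counter()
    for text in posts:
        # set() makes a word count once per post, however often it repeats.
        counts.update(set(WORD_RE.findall(text.lower())))
    return {word: n for word, n in counts.items() if n >= minimum}


# Hypothetical posts; the real table used a threshold of 500.
sample_posts = ["The cat and the hat", "The dog", "A cat"]
print(post_frequencies(sample_posts, minimum=2))
```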

( Table of word frequency doom! )
foxfirefey: A series of interconnected dots in the shape of an M. (memewidth)
[personal profile] foxfirefey 2010-02-23 05:32 pm

Useful analysis data: word lists

Every now and again I get the urge to do something with the mass of post data I'm accumulating, so I thought I'd share a useful resource for text analysis, namely lists of words:


I'm currently considering using the 2+2lemma.txt from 12dicts for something, to reduce the number of distinct words. I'll probably have to add common slang words, though, and names for services like Twitter/Facebook/MySpace/GMail/etc.

ETA: Social Media, Data Mining & Machine Learning might be an interesting blog read along these lines. And if you have University access, you may be able to get ahold of Modeling and Data Mining in Blogosphere.
foxfirefey: A series of interconnected dots in the shape of an M. (memewidth)
[personal profile] foxfirefey 2010-01-08 12:56 pm

Update to DWminion

It still pretty much only spiders edges files, but I've fixed it to work now that they're only available at username.dreamwidth.org/data/edges

You can find the Mercurial repository for my Python library in progress here: http://bitbucket.org/foxfirefey/dwminion/
foxfirefey: A series of interconnected dots in the shape of an M. (memewidth)
[personal profile] foxfirefey 2009-10-27 07:10 am

Some changes made to the edges file

* Now, the username and account number are available, as "name" and "account_id" values.
* OpenID accounts now use their username forms (ext_12345) to simplify spidering (some OpenID URLs are tricky to fetch as a URL parameter), and have "display_name" values for their edges.
lastdance: (Default)
[personal profile] lastdance 2009-09-15 05:49 pm

S2 Client Protocol

If this isn't the right place, I'd be happy to be directed elsewhere. I'm looking for any documentation or examples around Dreamwidth's S2 style interface.

I know LJ has a few pages about it in their S2 manual. I can see that http://www.dreamwidth.org/interface/s2/layerid correctly retrieves S2 layers, so it's available on Dreamwidth as well.

Are there any existing clients that use this interface for retrieving/uploading layers? Does Dreamwidth's implementation differ from LJ's (for example, is it still required to send "application/x-danga-s2-layer" as Content-Type)?

I'd like to be able to use this interface to dynamically change my style layer, so I thought I'd check if any guidelines/best practices/documentation existed before I resorted to the trial-and-error method.
cesy: "Cesy" - An old-fashioned quill and ink (Default)
[personal profile] cesy 2009-09-15 04:15 pm

Styles stats

I just had an idea. Is it possible to collect statistics on how many users have chosen each of the system styles? Obviously it would need to exclude / account for custom styles in some way, and feed accounts which can't change their style. It would be interesting to see actual statistics on which styles and colour themes are most commonly used.
foxfirefey: A firework bursts over the Las Vegas night skyline. (yay)
[personal profile] foxfirefey 2009-09-13 02:25 pm

New data sources wiki page

Edge data is live from last night's code push, folks! In celebration, I've made wiki pages on some Dreamwidth data sources:

foxfirefey: A series of interconnected dots in the shape of an M. (memewidth)
[personal profile] foxfirefey 2009-09-12 01:37 pm

Upcoming bug fix for ?auth=digest

There's currently a bug with an attachment for fixing ?auth=digest. (See this post for more info on the problem on LJ.) The fix should make wget work again--curl currently still works, though.
foxfirefey: Fox stealing an egg. (Default)
[personal profile] foxfirefey 2009-09-09 05:19 pm

Edges patch is committed!

After some further refinements by [staff profile] mark, edge patches have been committed! That means they're going out in the next code push! Exciting!

You can see the format here (I recommend JSONView for viewing):


Names have been replaced with userids, which are then defined with username/type in a different struct.
foxfirefey: Fox stealing an egg. (Default)
[personal profile] foxfirefey 2009-08-22 12:13 am

Interesting book on Natural Language Processing

For people who want to learn how to do natural language processing, the NLTK library has a CC'd book, Natural Language Processing with Python. Might be a useful resource for people wanting to make things based off of people's post content.
foxfirefey: Fox stealing an egg. (Default)
[personal profile] foxfirefey 2009-08-20 10:10 am

Turn data/interests into JSON?

Now that fdata.bml is in JSON, would it be expedient to make interests.bml into JSON?

Pros: would be easier for JavaScript mashups to load, is a standard format for parsing
Cons: would break compatibility with older LJ-based tools, would take up a bit more space than the current file.
foxfirefey: A series of interconnected dots in the shape of an M. (memewidth)
[personal profile] foxfirefey 2009-08-18 06:09 pm

Let's talk about FOAF

All users have a FOAF file, like so:


There's a lot of useful data here, but I think it could be expanded to include more. ( Onward and out! )
foxfirefey: A series of interconnected dots in the shape of an M. (memewidth)
[personal profile] foxfirefey 2009-08-17 06:56 pm

Upcoming fdata replacement: edges

So far Dreamwidth has been sorely lacking an equivalent to LJ's fdata, the data file that told you the relationships of a user with other users. Our new equivalent to that will be called edges. Because of the new complexity of DW's relationship structure, and due to a desire to use more standards, this data file will be in the JSON format. The bug for it is Bug 857.

I currently have my proposed patch in. It's not committed yet, or live on the site, but I think it probably will be within the next month. If you want a chance to prepare, I have my patch applied on my development server. It's possible that details will change with feedback, though, so be forewarned! My server has open registration, so you can feel free to create accounts and play around, but I've made a small set of test accounts such as this one. You can view an example edge data file here:


Here is an example of a community's edges file:


The data is in one big hash. There is an "account_type" variable, set to the type of the account, which makes it easy to determine what kind of account you are fetching data for. The other base variables are relationship types, such as "watched", "watched_by", or "member_of". Each of these is a hash of arrays: all the accounts with a given relationship are split up by account type.
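To give a feel for working with that shape, here is a minimal sketch of reading such a file in Python. The sample document is hypothetical and built only from the description above (the account type labels and usernames are placeholders, and the details may change with feedback on the patch); flatten_edges is a made-up helper name.

```python
import json

# Hypothetical edges data following the description: one big hash with
# "account_type" plus relationship types, each a hash of arrays by account type.
SAMPLE_EDGES = """
{
  "account_type": "personal",
  "watched": {"personal": ["ann", "ben"], "community": ["example_comm"]},
  "watched_by": {"personal": ["ann"]},
  "member_of": {"community": ["example_comm"]}
}
"""


def flatten_edges(raw):
    """Return (account_type, {relationship: sorted list of all account names})."""
    data = json.loads(raw)
    relationships = {}
    for key, by_type in data.items():
        # Skip "account_type" and any other value that isn't a hash of arrays.
        if not isinstance(by_type, dict):
            continue
        relationships[key] = sorted(
            name for names in by_type.values() for name in names
        )
    return data["account_type"], relationships


account_type, relationships = flatten_edges(SAMPLE_EDGES)
print(account_type, relationships["watched"])
```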

Any commentary on this planned data file? Problems you want to fix now, before it's implemented? Any plans for it you want to share?
tobyaw: (Default)
[personal profile] tobyaw 2009-05-28 07:27 pm

Which of your LJ friends are on DW?

I’ve written a web page that looks at your LJ friends list, and tells you which of the usernames exist as accounts on Dreamwidth. Of course having the same username doesn’t necessarily mean they are the same person, but it might give you a head start in populating your Dreamwidth friends list.


The implementation is naïve; there may be a more efficient way to see if a DW account exists. Source (and a command-line version) at http://github.com/filmgold/dw-tools
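The general shape of the idea can be sketched as below. This is not the code from the repository above, just a minimal sketch under two assumptions: that the LJ friends data arrives as text where outgoing friends are lines starting with "> " (and friend-ofs with "< "), and that you supply your own exists_on_dw check, since how to test for a DW account is left open here. The function names and sample data are made up for illustration.

```python
def parse_friends(fdata_text):
    """Pull outgoing friends from fdata-style text (assumed '> username' lines)."""
    friends = []
    for line in fdata_text.splitlines():
        if line.startswith("> "):
            friends.append(line[2:].strip())
    return friends


def friends_on_dw(fdata_text, exists_on_dw):
    """Return the LJ friends whose usernames also exist on Dreamwidth.

    exists_on_dw is a caller-supplied function (hypothetical): it takes a
    username and returns True if a DW account with that name exists.
    """
    return [name for name in parse_friends(fdata_text) if exists_on_dw(name)]


# Hypothetical sample data and a stand-in existence check.
sample_fdata = "# comment line\n> ann\n> ben\n< cat\n"
known_dw_accounts = {"ben", "cat"}
print(friends_on_dw(sample_fdata, lambda name: name in known_dw_accounts))
```

Passing the existence check in as a function keeps the parsing separate from however you decide to look accounts up.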

(Apologies if this isn’t an appropriate community to post this to — not absolutely clear from the community description.)
foxfirefey: Dreamwidth: social content with dimension. (dreamwidth)
[personal profile] foxfirefey 2009-04-15 06:43 pm

This is how our site grows

Hello, memers! Boy, there sure are a lot more of you than there were two weeks ago, because of how much DW is growing. With the help of some scripts I set up a while ago and [personal profile] sophie's charting prowess, ( we've made a graph of exactly how much DW has grown lately. )