Stack Overflow: Voting Patterns in Detail

Up, Down, all around. Offensive? Close? Spam! Inform Moderator…

Continuing to investigate user voting patterns on Stack Overflow has become a hobby (obsession?) of mine.  Thanks in part to my curiosity and in part to nobody_ (now known mysteriously as ‘Kyle Cronin’; the administrator of the Unofficial Stack Overflow Meta Discussion Forum) egging me on, I quickly whipped up a graph showing the propensity to “Up Vote versus Reputation”.

Up votes as a percentage (% = up / (up + down)) for five reputation tiers of users with at least one up vote, at least one down vote and a reputation of at least 100 (the point at which one is allowed to down vote). The x-axis represents the five user tiers.
The first three tiers contain ~5,000 users each, the fourth ~1,500 and the fifth ~125.
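
As a rough illustration of the calculation behind the graph, here is a minimal Python sketch. The record layout (`reputation`, `up`, `down`) and the tier boundaries passed in as `tier_bounds` are assumptions made for the example; the actual tier cut-offs are not stated here.

```python
from bisect import bisect_right
from statistics import mean


def up_vote_pct(up, down):
    """Up votes as a percentage of all up and down votes cast by a user."""
    return 100.0 * up / (up + down)


def eligible(user):
    """Filters from the caption: >= 1 up vote, >= 1 down vote, reputation >= 100."""
    return user["up"] >= 1 and user["down"] >= 1 and user["reputation"] >= 100


def average_pct_by_tier(users, tier_bounds):
    """Average up-vote percentage per reputation tier.

    tier_bounds is an ascending list of lower reputation bounds, one per tier
    (hypothetical values; the real cut-offs are an assumption).
    """
    tiers = {}
    for user in users:
        if not eligible(user):
            continue
        # Index of the highest tier whose lower bound the user reaches.
        idx = bisect_right(tier_bounds, user["reputation"]) - 1
        if idx < 0:
            continue
        tiers.setdefault(idx, []).append(up_vote_pct(user["up"], user["down"]))
    return {idx: mean(pcts) for idx, pcts in sorted(tiers.items())}


sample_users = [
    {"reputation": 150, "up": 9, "down": 1},
    {"reputation": 50, "up": 5, "down": 5},    # below 100 reputation, skipped
    {"reputation": 5000, "up": 3, "down": 1},
    {"reputation": 200, "up": 4, "down": 0},   # no down votes, skipped
]
tier_averages = average_pct_by_tier(sample_users, [100, 1000])
```

With the sample data, the two tiers each end up with a single eligible user, so the averages are simply those users' own percentages.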

It is very clear that users with higher reputations are more likely to down vote.  But this led to other questions, such as:

  • Do the users with older accounts, especially beta users, make up the negative voting club?
  • Did the down votes shift downwards because of new features introduced to Stack Overflow, specifically new voting options like ‘Spam’, ‘Offensive’, ‘Inform Moderator’ and ‘Close’?

To try to answer the first question, I queried the database and tinkered with the results in Excel; the resulting graph is below.  The blue line is “Average % Up Votes of All Votes by User Join Date”.  The red series is the “Average Reputation by Join Date”.


Notes on the above graph:

  • The percentage is only for users with
    • at least one up vote
    • at least one down vote
    • a reputation of at least 100
  • The yellow data point on the Average Reputation series represents the day Stack Overflow sign-ups were open to the general public.  Our own little Eternal September, if you will.  (Not that bad, actually.)
  • The purple spike in the %-Up Votes is caused by Neil Butterworth, who has both many votes (~1,700 in the data dump) and a 50/50 Up Vote versus Down Vote ratio.
  • The leveling of the average reputation curve a few weeks after Stack Overflow went public (end of September/early October) is interesting.  It seems, to no surprise, that beta users and the initial public users are much more into SO than the users who followed.
  • The far left data point represents seven users who got accounts on 31 July 2008.  Among them are the movers-n-shakers of Stack Overflow (Jeff Atwood, Jarrod Dixon, Joel Spolsky, and Jon Galloway), who understandably have very high reputation scores.
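
The two series described in these notes could be computed along the following lines. This is only a sketch: the actual work was done with a database query and Excel, the record fields (`joined`, `reputation`, `up`, `down`) are hypothetical, and it assumes that the average reputation covers all users who joined on a given day while the percentage covers only the filtered users.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def series_by_join_date(users):
    """Return {join_date: (avg_up_vote_pct or None, avg_reputation)}."""
    pcts = defaultdict(list)
    reps = defaultdict(list)
    for user in users:
        # Assumption: reputation is averaged over every user who joined that day.
        reps[user["joined"]].append(user["reputation"])
        # The percentage only counts users who pass the three filters in the notes.
        if user["up"] >= 1 and user["down"] >= 1 and user["reputation"] >= 100:
            pct = 100.0 * user["up"] / (user["up"] + user["down"])
            pcts[user["joined"]].append(pct)
    return {
        day: (mean(pcts[day]) if day in pcts else None, mean(reps[day]))
        for day in sorted(reps)
    }


sample_users = [
    {"joined": date(2008, 7, 31), "reputation": 10000, "up": 8, "down": 2},
    {"joined": date(2008, 7, 31), "reputation": 20000, "up": 6, "down": 4},
    {"joined": date(2008, 9, 15), "reputation": 50, "up": 1, "down": 0},
]
series = series_by_join_date(sample_users)
```

A day with no eligible voters gets `None` for the percentage, so it can be left as a gap in a chart rather than plotted as zero.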

To try to answer the second question above (“Did down votes shift to other types of voting options as they came on line?”), I queried and graphed the data again to produce a view of “Vote Type as a Percentage of All Votes Cast”.  For example, where SOpedians in the past would down vote a question or answer they found to be spam, later on they could mark the post as spam or offensive instead.  While the two options are not mutually exclusive, the down vote costs the voter a reputation point.  Since roughly 90% of votes are up votes, the graph zooms in on the top 10%.
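
A view like “Vote Type as a Percentage of All Votes Cast” can be sketched in Python as follows. The input format (pairs of date and vote type label), the label strings and the weekly bucketing are all assumptions for illustration; the data dump's real schema and the graph's actual time resolution may differ.

```python
from collections import Counter, defaultdict
from datetime import date


def vote_type_share_by_week(votes):
    """votes: iterable of (date, vote_type) pairs.

    Returns {(iso_year, iso_week): {vote_type: percentage of that week's votes}}.
    """
    counts = defaultdict(Counter)
    for day, vote_type in votes:
        iso = day.isocalendar()
        counts[(iso[0], iso[1])][vote_type] += 1

    shares = {}
    for week, by_type in counts.items():
        total = sum(by_type.values())
        shares[week] = {t: 100.0 * n / total for t, n in by_type.items()}
    return shares


# Ten votes in the same ISO week: nine up votes and one down vote.
sample_votes = [(date(2009, 6, 1), "up")] * 9 + [(date(2009, 6, 3), "down")]
shares = vote_type_share_by_week(sample_votes)
```

Because every vote type is divided by the same weekly total, the shares within a week always sum to 100%, which is what makes zooming in on the top 10% meaningful.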


It seems that the new voting options do not impact up/down voting patterns significantly.  Note the sudden growth of ‘close’ votes in the trailing week or two of the graph.  It seems to me that this is a change in the raw data rather than a sudden burst of close votes, but I am not sure, because I myself did not have the power to vote to close a question until around that time.  Also, ‘close’ votes are only valid for questions and not answers, unlike up and down votes.

The last few weeks of the data dump look interesting, so I zoomed in there and produced the graph below.


The burst and subsequent tapering off of Spam, Offensive and Inform Moderator votes seem very suspicious to me.  Was this actual activity?  Or was there a data collection issue?  Or was that when these voting features were created?  I’m guessing it was a data collection issue.  Future data dumps will show whether this is true or not, I hope.

Whatever the cause, the number of votes here is still too small to impact the percentage of up votes over time to any degree.  My conclusion is that:

  1. SOpedians with high reputations are more likely to vote down questions and answers.
  2. As Stack Overflow gains more and more users with lower reputations, these users are less likely to vote down, and so they bring up the overall percentage of up votes among all votes over time.