StatBot: Analysing Digg’s Frontpage Part I - Growth
September 28, 2007 | 2:51 pm
Update: Digg this here. Thanks (again) to Rob for submitting!
Digg. The perfect time killer, as well as a server killer if you are a webmaster. A Google search for the word “Digg” gives us 349,000,000 hits, while a search for the actual word “Dig” gives us just 122,000,000 hits. It’s that effin popular. Also, it was the first site which I tried to analyze with StatBot, an experiment that went completely bonkers due to my underpowered machine. But this time, Rob La Gesse, total cool-head that he is, ran my code on his monster servers and is the man who made this dream 5 part analysis of Digg possible. Despite many stupid mistakes on my part, he kept his cool and made it possible for me to get an archive of the Comments and Metadata of all the Digg Stories that made it to the front page by June. Thanks yo!
So, here’s the first part of the 5 part analysis, dealing with how Digg’s front page grew. Note that I’m dealing only with stories that made it to the front page: I could do it for the whole of Digg, but I just don’t have that much power in place.
Size
Just the size of Digg is huge, intimidating even. An XML file containing all the front page stories and their comments was 2.5 gigs, and that was only till June. Here, I’m not considering comments for now, and I did update the dataset to include every front page story till September 24, 2007. Starting on December 1, 2004 and running till September 24, 2007, that gives the dataset 1027 days (dangerously close to 2^10, no?) or 2.8 years of stories. Those 1027 days contained a total of 61,614 stories, at an average of 60 stories a day. Here’s a graph showing the average number of stories reaching the front page per day:
It started small, took off around Feb 2005, had a stable period from July 2005 (when they launched Digg v2) to June 2006, and then took off again. In fact, the average number of stories reaching the front page before June 2006 was 35, and now it is 60. This means that, on a typical day, 25 more stories get on the front page today than in June 2006! Investigating the reasons, we arrive at this graph, showing just the number of stories reaching the front page per day:
See that huge first spike during July 2006? That was when Digg v3 was launched. There is a similar, smaller spike during July 2005, when Digg v2 was launched. On the Digg v3 launch day, a total of 212 stories made it to the front page. I also think that they made some changes to the front-page story-picking algorithm when they released Digg v3, as the number keeps fluctuating for the next few months: up at around 100 for a month or two, then back to normal for about 2 months, then up again during Sep-Nov 2006 before gradually coming down. Then comes a spike in May 2007: the infamous (is it correct in this context?) censorship-of-the-weirdo-digits day (aka the AACS encryption key fiasco), when an all-time high of 215 stories made it, with at least 90% of them about the magic number. And ever since, it has been cruising along at an average of 60 stories on the front page a day…
Also, note that the “girth” of the graph above is rather thin before Digg v3, and rather thick after it, meaning that before v3, there was not a lot of day-to-day variation in the number of stories that made it to the front page every day. That changes a lot with v3. Methinks yet another algorithmic change…
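The “girth” idea can be put into a number: the standard deviation of the daily story counts. Here is a minimal Python sketch of that, assuming nothing about how the graphs were actually made. The two lists are invented purely for illustration and are not the real Digg data, and `spread` is a hypothetical helper name.

```python
from statistics import mean, pstdev

# Hypothetical daily front-page counts, made up for illustration only.
before_v3 = [33, 35, 36, 34, 35, 37, 35]
after_v3 = [48, 95, 60, 110, 52, 80, 45]


def spread(counts):
    """Return (mean, population standard deviation) of daily story counts."""
    return mean(counts), pstdev(counts)


mean_before, sd_before = spread(before_v3)
mean_after, sd_after = spread(after_v3)

# A thin graph means a small standard deviation, a thick one a large one.
print(f"before v3: mean={mean_before:.1f}, sd={sd_before:.1f}")
print(f"after v3:  mean={mean_after:.1f}, sd={sd_after:.1f}")
```

Running the same function over a sliding window of real daily counts would show where the variation changes, if the data were at hand.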
Now, comparing this with traffic data from Alexa (the unreliable-yet-only-traffic-source-I-have-access-to):
That steep climb was around the same time as the launch of v3, so the increase in the number of stories on the front page can be attributed at least partly to more visitors and consequently more users. However, the peak of traffic was during Nov-Dec 2006, yet the number of stories “dugg” during that time was lower than in the Sept-Oct 2006 period, which had considerably less traffic. So, my hypothesis is that new user registrations had peaked during Sept-Oct, causing a flood of diggs at that time. Unfortunately, I can’t test this hypothesis, since I don’t have access to the user registrations data, and Alexa is known to be damn inaccurate. It’s my best guess with the available data though.
Update: Indeed, it turns out that Alexa was highly inaccurate. Nathan Waters, an entrepreneur, comments about it:
I think the massive spike in traffic according to Alexa during Nov-Dec 2006 was because many people installed the Alexa toolbar. Because if you look at Slashdot and a couple of other popular tech sites they all had that same spike.
So the hypothesis is that a couple of “OMG Digg is in the top 100 sites on Alexa” stories made it to the front page, people checked it out, installed the Alexa toolbar and so when they also visited Slashdot and others it caused the spike.
So, in order to check it, I compared the Alexa data for Digg and Slashdot,

So, no sudden growth, no major spikes. Move along, people.
The traffic data shows that Digg traffic is tapering off. I wonder why that is…
So, summarizing:
- An average of 60 stories make it to the front page every day
- This is up from 35 during June 2006, and it jumped up when Digg v3 was launched
- They almost definitely made changes to the promoting algorithm during the Digg v3 launch.
- The day on which they started (and ended) censoring those HD-DVD Encryption keys had the highest number of stories promoted to the front page. And, most of those were about the keys.
- It got a huge spike of traffic just around the same time they launched Digg v3. Traffic peaked during Dec 2006, and has gradually been declining since.
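The headline numbers in this section are simple arithmetic, and they can be rechecked with a few lines of Python. This is only a sketch: the two dates and the story count are the ones quoted in the post, while the variable names are my own.

```python
from datetime import date

# Figures quoted in the post.
first_day = date(2004, 12, 1)
last_day = date(2007, 9, 24)
total_stories = 61_614

# Number of whole days between the first and last day of the dataset.
days = (last_day - first_day).days
years = days / 365
stories_per_day = total_stories / days

print(days)                    # length of the dataset in days
print(round(years, 1))         # the same span in years
print(round(stories_per_day))  # average stories promoted per day
```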
Diggs
Diggs are the very basis on which Digg runs. In total, those 1027 days of front page stories were dugg 47,053,590 times (yes, 47 million times), at an average of 763 diggs a story or a whopping 45,816 diggs a day. That’s a lot of finger taps and mouse clicks to build the Digg front page. Here’s how the Average Number of Diggs per story that made it to the front page grew:
Very stable, under 100 till around July 2005, and then it takes off with the launch of Digg v2. Another plateau around Sept-Dec 2006, and then it starts crawling up again. So, till July of 2005, with v1 of Digg, you could get your Story to the front page with way less than 100 diggs, and it would go off the front page with less than 100 diggs, but not so now! The average number of Diggs per story now is 763, but the Median is 587, meaning that there are more stories with fewer than 763 diggs than with more. In fact, 58% of all stories on the front page, or 35,905 stories, have fewer than 763 Diggs. There are outliers at both ends: only 16 stories have more than 10,000 Diggs, but 9,863 had fewer than 100. So, here’s another, more accurate graph:
Yay, better one. So, yes, till July 2005, they were getting 100 or so Diggs on the front page at most. Digg v2 launched on July 1st, and the real growth started then. It grows steadily, going up and down, sometimes even reaching 1200 diggs a story on the front page. There’s a small dip during Oct 2006, which is when Digg had the largest volume of stories hitting the front page per day, and it rose to an all-time high on the day of the AACS Encryption Key fiasco. That whole month reverberated with energy, with a lot of stories getting Dugg high. It is still quite a bit higher than average though, with an average story reaching the front page today getting about 1000 Diggs.
Also, the Digg v3 launch had almost no effect on the Diggs per story. In fact, if anything, the V3 launch induced a small dip in it, but besides the initial growth during July 2005, the dip during Sept-Dec 2006 and the spike during the Key Censorship period, there seems to be no excitement around, just organic growth. And, to confirm that it is just organic growth…
Yeah, so besides the big spike during the AACS Encryption Key Censorship fiasco, the No. of Diggs that stories on the front page get is growing organically.
Distribution of Diggs
Now, looking at the distribution of Diggs,
We see that 75% of all stories on the front page have fewer than 1000 diggs, with 43% having fewer than 500. Only 2% have more than 3000. So, if your story does make it to the front page, there’s only a 2% chance it’ll get more than 3000 diggs, and a big 43% chance that it won’t cross 500 diggs. If your story’s crossed the 1000 Diggs mark, consider yourself well above the rest.
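A distribution like this comes from dropping each story into a bucket and counting. Below is a rough Python sketch of that bucketing, not the code actually used for the chart. The bucket edges follow the cut-offs mentioned above (500, 1000, 3000), the function names are hypothetical, and the sample list is invented for illustration.

```python
from collections import Counter


def bucket(diggs):
    """Map a digg count to a bucket label (edges assumed: 500, 1000, 3000)."""
    if diggs < 500:
        return "under 500"
    if diggs < 1000:
        return "500-999"
    if diggs <= 3000:
        return "1000-3000"
    return "over 3000"


def distribution(digg_counts):
    """Return the percentage of stories that falls into each bucket."""
    counts = Counter(bucket(d) for d in digg_counts)
    total = len(digg_counts)
    return {label: 100 * n / total for label, n in counts.items()}


# Hypothetical digg counts, not real front page data.
sample = [120, 480, 650, 900, 1500, 2200, 3500, 75, 300, 1100]
shares = distribution(sample)

for label, share in shares.items():
    print(f"{label}: {share:.0f}%")
```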
I’ll post the specials (Most diggs, least Diggs, etc) on another part.
So, summarizing,
- Each story on the front page gets an average of 763 Diggs.
- Each day, stories on the front page get 45,816 diggs.
- The median no. of Diggs per Story, however, is 587, meaning that there are more stories (58%) with fewer than 763 diggs.
- Till July 2005, when Digg v2 launched (that is, for the first 7 months after starting up), there weren’t many people to Digg stories: The average number of Diggs per story that reached the front page was just around 75.
- The number of Diggs per story is growing, but organically.
- The day with the most Diggs ever was May 1st, 2007, when the HD-DVD Encryption Key censorship Saga took place. A whopping 367,385 diggs were dugg just on the front page stories of that day alone. Beats the closest competitor (Digg v3 Launch) by about a 3x margin.
- 75% of the stories on the front page do not cross the 1000 Diggs mark. 43% do not cross the 500 diggs mark. Only 2% have crossed the 3000 diggs mark.
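The per-story and per-day averages follow directly from the totals, and the gap between mean and median is what you get from a few very large stories pulling the mean up. A small Python sketch of both points follows. The totals are the ones quoted in the post, while the short `sample` list is invented just to show the skew effect.

```python
from statistics import mean, median

# Totals quoted in the post.
total_diggs = 47_053_590
total_stories = 61_614
days = 1027

diggs_per_story = total_diggs / total_stories
diggs_per_day = total_diggs / days
print(int(diggs_per_story), int(diggs_per_day))

# Hypothetical right-skewed sample: one big story drags the mean
# above the median, so most stories end up below the mean.
sample = [90, 300, 450, 587, 700, 900, 2500]
below_mean = sum(1 for d in sample if d < mean(sample))
print(mean(sample) > median(sample), below_mean, len(sample))
```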
Comments
Comments, the place where the Digg community lives. It takes just a click to Digg something, but a longer attention span to actually type out the comment. The 61,614 stories that made it to the front page had 4,553,052 (yes, 4.5 million) comments, at an average of just 73.8 comments per story or 4,433 comments per day. Since we are ignoring those Average graphs anyway, I’ll skip that and show you the more-useful Comments per Story graph:
As usual, it kicked off after July 2005, when Digg v2 went live: They had close to 0 comments per post for quite some time till then. Unlike the number of Diggs, the number of comments per post does not show many hiccups where it goes way up or way down. There is an upward trend starting in Nov 2006 and accelerating heavily in May 2007 (the AACS Encryption Key Censorship Fiasco again (I hate having to type that all over again!)).
Splitting the comments up,
We see that the bulk of the stories (50% of them) get 25-100 comments, with only 254 having ever gotten more than 500 (the percentage is so small that Excel rounded it off to 0). Also, there were 338 stories that had absolutely no comments but still made it to the front page. However, only 20 stories with 0 comments made it to the front page this year, so I’d guess it’s a thing of the past.
Customarily summarizing,
- Not too many commenting binges around.
- 50% of all stories on the front page get 25-100 comments. The number of stories with greater than 500 comments is so low, the percentage is 0.
- 338 stories made it to the front page and out of it with absolutely 0 comments, though only 20 of them were this year.
Weekends or Weekdays?
When do the most number of stories make it to the frontpage?
Saturday has the lowest number of stories reaching the front page, followed by Friday. I kind of expected Sunday to follow Saturday, but it turns out more Diggers are active on Sundays than I thought. Monday, Tuesday & Wednesday are neck and neck, with Thursday falling slightly behind (and, in case you are wondering, July 1st, the Day with the Most No. of Stories on the front page, was a Sunday). Friday is a slow news day, something which I observed in my analysis of Engadget as well.
And, the number of Diggs each story is getting?
Saturday and Sunday might have fewer stories getting to the front page, but those that do get there receive more diggs than the average. Friday has fewer stories and fewer Diggs per story: It truly is a slow news day. Monday too, while having the highest number of stories on the front page, fares a bit poorly on the number of Diggs each story gets. However, note that the average number of diggs each story on the front page gets as a whole is 763, so the variations, while they do exist, are not that big.
So, if you want to get dugg, the easiest days to get dugg are Monday to Wednesday, while the best time to get dugg (in terms of traffic, that is) is during a weekend. However, the previous sentence is extremely shabby and breakable: The easiest and best way to get dugg is to write great content :)
Now, moving on to comments,
Friday’s weird here as well, with a lower number of comments than the other days. Not by much though, as the overall average is 73, and Friday’s average is just about 5 less than that. The other days are all almost equal, with a small dip on Monday.
Customarily summarizing:
- Saturday has the least number of stories on to the front page, with Friday closely following
- Saturday, however, has the most number of diggs per story on the front page.
- Friday has the least number of diggs per story on the front page.
- In general, Digg activity is more during weekdays than on weekends. Methinks this is due to more people surfing Digg at work, though I could be wrong here.
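The weekday breakdowns above boil down to grouping stories by the day of the week they were promoted on, then counting stories and averaging diggs per group. Here is a hedged Python sketch of that grouping. The records are invented for illustration, the field layout (promotion date, diggs) is an assumption about the data, and `by_weekday` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical (promotion date, diggs) records, not real Digg data.
stories = [
    (date(2007, 9, 17), 600),  # a Monday
    (date(2007, 9, 17), 700),  # same Monday
    (date(2007, 9, 21), 500),  # a Friday
    (date(2007, 9, 22), 900),  # a Saturday
    (date(2007, 9, 23), 850),  # a Sunday
]


def by_weekday(records):
    """Return {weekday name: (story count, average diggs per story)}."""
    grouped = defaultdict(list)
    for day, diggs in records:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        grouped[WEEKDAYS[day.weekday()]].append(diggs)
    return {name: (len(d), mean(d)) for name, d in grouped.items()}


summary = by_weekday(stories)
for name in WEEKDAYS:
    if name in summary:
        count, avg = summary[name]
        print(f"{name}: {count} stories, {avg:.0f} diggs per story")
```

The same grouping works for comments per story by swapping the second field.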
What’s next?
So, that concludes Part I of my Digg analysis. I still have a lot more data to analyze (users, topics, the actual sites to which they are linking, etc.). Part II will be link analysis, containing analysis of which sites are getting dugg the most, how many of them are actually alive right now, how many are blogs, how they are distributed, whether many stories are coming from a small number of sites, etc. Part III would be about Topics, Part IV about the users who got those stories on to the front page, Part V would probably be about the words used in the Title and Description, and then I’ll end with Part VI containing a long list of Trivia that nobody needs to know. Except for the waiting part (some reports take about 10 minutes to run, which sucks), writing this is Fun!
I would kill to have the data about when the story was submitted to Digg as well, but sadly, I don’t have it. Also, my data’s dates have no time data, which eliminates another area of interesting analysis as well. If only Digg provided data dumps the way Wikipedia does…. (Oh please, Kevin Rose or any of the other Digg staff, if you read this, can you do something like that? It shouldn’t be a problem, with the data being licensed under creative commons public domain, no?)
Teasers for Part II: Sites-Linked-To Analysis
Here’s some of the interesting data I found while researching for Part II:
- Nearly 72% of all sites on Digg get on to the front page only once. So, if this story gets on Digg, I’ve broken a good barrier!
- The top ten most submitted sites make up around 11% of all front page stories.
- There are two blogs in the top two, and they’re what I’ll call usual suspects: Engadget and Gizmodo. Engadget is higher up the order than Gizmodo, and I’ll reveal the order in Part II
- Youtube’s up in the top ten, while Flickr’s up in the top 15.
- Torrentfreak has about the same number of stories on the frontpage as all of the ZDNet blogs combined. Go figure!
- The Unofficial Apple Weblog had more stories than Apple.com itself.
- Digg.com itself has been dugg 25 times.
This is just a preview: More to come!
Found this useful? Want a custom analysis?
Found this useful? You can help me write more stuff like this. You see, the main technical bottleneck for me, besides school, is my computer. She’s dying piece by piece now. First her graphics card died, and then the AGP slot itself (I tried 3 graphics cards, they all black out intermittently). And the on-board graphics I am now on is slowly dying as well. The noisy Pentium 4 2.4 GHz single core machine with 1 gig of RAM is not enough for me. I have a lot of experiments in mind, but they all require more powerful hardware.
So, dear people-who-read-this, please consider helping me buy a newer, faster computer. The specs are up here and I estimate it’ll cost $1500. There are several ways you can do this:
· Donate to this at my ChipIn page here. Or use the widget on the sidebar or at the bottom. I’ll name a part of my computer after everyone who donates!
· Get your questions about Digg answered. Want to know something more specific about the Digg front page (like, how many stories in the Apple category had Microsoft in the description, or vice versa)? Ask! A simple question about Digg costs $5-$20, depending on the complexity, while I can also do a more complete analysis for a reasonable price (hey, I’m just trying to work my way to a new computer, okay?). For example, asking about the total number of links to sites other than apple.com in the category Apple would cost you $10, while an analysis of the relative popularity of your favorite Linux distros would cost about $35. It’s easily negotiable. Contact me via email (yuvipanda@gmail.com)
· Get me to do custom analysis for your blog. I’ll accept most, and for a very good, double digit price, I’ll do an analysis of your blog as well as give you the data in a machine readable form if the inner-geek in you wants to do something more with it.
· Paypal me directly. My Paypal email is yuvipanda@gmail.com.
Offer open till I get $1,400 (yes, $1,400, as Rob had already sent off $100 towards this. Thanks yo!) to buy a new computer.

STUFFLEUFAGUS » Blog Archive » Analyzing Digg | September 28, 2007 | 3:28 pm
[…] read more | digg story *** Random Post *** […]
Nathan Waters | September 29, 2007 | 7:02 am
Great work mate.
One point… I think the massive spike in traffic according to Alexa during Nov-Dec 2006 was because many people installed the Alexa toolbar. Because if you look at Slashdot and a couple of other popular tech sites they all had that same spike.
So the hypothesis is that a couple of “OMG Digg is in the top 100 sites on Alexa” stories made it to the front page, people checked it out, installed the Alexa toolbar and so when they also visited Slashdot and others it caused the spike.
cheers
yuvipanda | September 29, 2007 | 7:05 am
@NathanWaters: Yep, you’re damn right. Updating the article.
Thanks mate!
Mini StatBot: Leaderboard vs Technorati Top 100 (and Digg Top 100) | YuviSense: Codin Kid | October 1, 2007 | 7:45 pm
[…] finished, I could even post it up today if I go through it. Part I, in case you are interested, is here. Oh, and, yeah, I did reveal what the most Dugg site on Digg is in this post, but I don’t […]
StatBot: Analysing Digg Part II - Sites on the Frontpage | YuviSense: Codin Kid | October 2, 2007 | 6:36 pm
[…] This is Part II of the Analysis of Digg, where I’m going to analyze the sites that are dugg the most. Note that when I say ‘most dugg sites’, I mean the number of times the site has been to Digg’s frontpage. Part I is here. […]
blogpost » Digg Analysis | October 3, 2007 | 8:41 pm
[…] front page, from over 14,338 sites. Lot of interesting patterns uncovered here. Also check out part 1 of the Digg analysis. permalink | […]
StatBot: Top 100 Sites on Digg.com by Nett Diggs | YuviSense: Codin Kid | October 4, 2007 | 4:20 pm
[…] lying around in my hard disk (for the analysis of Digg which I am doing right now. Part I on growth here, and Part II on sites here), and it took me about 10 minutes to write 5 lines of VB9 code to […]
unlimitedsq » Blog Archive » StatBot: Analysing Digg’s Frontpage Part I - Growth | October 28, 2007 | 8:33 am
[…] full story here […]
Mark SoftWare Top » Analyzing Digg | November 7, 2007 | 12:09 pm
[…] read more | digg story […]
Digging made easier with Smart Digg button | November 19, 2007 | 8:50 am
[…] according to days, but stay very high. The number of stories that make the front page is ~60/day( source ). If you observe the top Diggers, you will understand that they have submitted hundreds of posts […]