YuviSense: Codin Kid

Yuvi, a 17 year old wannabe geek from India.

Code: Number of Times a Term occurs in Google

October 28, 2007 | 5:49 pm

Just some simple code to grab the number of times a term occurs on the web, according to Google. I found myself using this, and figured this’d be useful for others as well. And, I did this in C# rather than my usual darlin VB9 just for a change.


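                // Needs: using System; using System.Net; using System.Text.RegularExpressions;
                string q = "your search term"; // the term to count
                // Matches the "... of about <b>N</b> for ..." line on the results page
                Regex r = new Regex(@"</b> of about <b>(?<count>.*?)</b> for");
                WebClient wc = new WebClient();
                string url = "http://google.com/search?q=" + Uri.EscapeDataString(q);
                string page = wc.DownloadString(url);
                Match m = r.Match(page);
                if (m.Success) { Console.WriteLine(m.Groups["count"].Value); }
                else { Console.WriteLine(0); } // No matches found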

Just set q to whatever term you want, and you’re ready to go. Since the Google SOAP API was killed, scraping is perhaps the fastest way to get data out of Google, and they have some of the most uglily structured code lying around. If I had first tried scraping Google instead of Wordpress when I started, I would’ve very certainly given up. It’s that bad.

And, btw, the last else is there to support searches for which there are no results. Most of the rest is self explanatory.
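The count comes back as text with thousands separators (something like 1,230,000), so if you want to do maths on it you need to parse it first. Here is a minimal sketch of that step. ParseCount is a hypothetical helper, not part of the snippet above, and it assumes the captured text is a plain comma-grouped integer:

```csharp
using System;
using System.Globalization;

class CountParser
{
    // Hypothetical helper: turns a scraped count such as "1,230,000" into a number.
    // Returns 0 when the text is not a plain comma-grouped integer.
    static long ParseCount(string raw)
    {
        long value;
        bool ok = long.TryParse(raw.Trim(),
                                NumberStyles.AllowThousands,
                                CultureInfo.InvariantCulture,
                                out value);
        return ok ? value : 0;
    }

    static void Main()
    {
        Console.WriteLine(ParseCount("1,230,000")); // 1230000
        Console.WriteLine(ParseCount("n/a"));       // 0
    }
}
```

Falling back to 0 mirrors what the else branch above does when nothing matches.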

Comments
11 Comments »
Categories
Tech
Comments rss Comments rss
Trackback Trackback

Back from Vacation!

| 5:10 pm

[Warning: Personal matters ahead. Feel free to skip]

There! I said it! I’ve been trying to say it since I came back from Ooty last Monday, but today, I said it! It’s been one friggin awesome tour. I throw the adjective “friggin awesome” around a lot, but I really mean it here. When the teachers kept referring to this as an “educational” tour, there were a lot of giggles amongst us. But now, I see why it was “educational”: I really learnt something there. I learnt to actually live with a group, to compromise, to actually solve problems, and one more thing which I can’t discuss right now. That last thing will probably be more important to me in my life than anything else I’ve studied, so this tour was friggin awesome.

Pictures? I took 3,000 of them :) I’ll be posting up the good ones soon :)

And right on the day after we returned from the tour, we went to St. Bede’s for an interschool cultural competition, where I won the PowerPoint competition. The best thing about the whole event was the girls <gasp!>. Our presentation was supposed to be about a Girl’s School. However, since I’m from an all-boys school, girls are more alien to me than those Neptunians who landed in my backyard 3 weeks back. Thankfully, I had two more guys (Swapnil & Yusuf) with me for whom they were a lot less alien, and with their help I was able to win the thing. Girls turned out to be much more similar to Homo Sapiens than I thought ;)

And, thanks a lot to my cousin (and great friend (and the guy who inspired me to code (and photograph as well))) Sudar for lending his iPod to me.

Oh, and, I’m releasing the code for my blog analyser, and that too after (yet another) complete rewrite! More info soon…

Comments
No Comments »
Categories
Personal

On Vacation: To Ooty!

October 18, 2007 | 2:21 pm

My first trip outside of a very limited area with my Camera, and I’ll be back in 4 days with tons of pictures! Sad thing being, I couldn’t put up the TechMeme/Memeorandum analysis for which I already have the data. And, my cousin Sudar gave me his iPod for the trip (Thanks a lot!), but after loving it for a Day, I’ve started hating it because it lost all the songs in it because of a simple write error and made me recopy all the songs even though they are already on the disk. As I am speaking, I’ve been delayed because of the damn iPod’s closed proprietary db. Sigh.

And, my Chipin has reached its goal! I can buy a new computer! Yippes! I’ll come back and give you details, but it was Chuck again who did it. Thanks a lot Chuck! More details after I come back…

Comments
1 Comment »
Categories
Personal

Imposing

October 15, 2007 | 7:04 am

Imposing, by YuviPanda.

I love perspective :D

This is Yusuf whose mother changed his name to Abdul Quadir since she thought having your name at the last of the Roll is not good :P Fantastic guy to hang out with, and draws like a the-guy-who-is-great-at-drawing-stuff.

Comments
1 Comment »
Categories
Pictures

Worried

October 14, 2007 | 7:01 am


Worried, by YuviPanda.

Unbelievably tall (seriously), Manoj manages to be an absolute basketball freak without becoming a jock. Cool guy. Here, he was just waiting for the ball (he plays mid-field), and I added the extra lighting.

Comments
No Comments »
Categories
Uncategorizable

Zoro’s Mask and Hordes of Horses

October 11, 2007 | 5:16 pm


Zoro’s Mask and Hordes of Horses, by YuviPanda.

Unintentional shots look cool folks. The Zoro-mask effect was from the tail lights of two bikes, while the white-horses effect was from the Shop Lights. I love long-exposure motion photography! :D

More of my Roadside abstracts here: Roadside Abstracts on Flickr

Comments
3 Comments »
Categories
Uncategorizable

StatBot: TechCrunch Data Analysis

| 5:10 pm

One of the things I’ve re-learnt in the past few weeks is to not write like an ad copy drone. Being precise when it’s needed helps. My last two Digg analyses went over 3000 words. Not good. So, I’m writing this analysis of TechCrunch, topper of the TechMeme leaderboard and 4th most linked-to blog according to Technorati. Let’s see if I can keep this under 1.5k words.

[private service advertisement]

This ad here is to fund my hope for a new computer. My old one is crumbling, and slow. I can’t run as much code as I want to on this machine. So, you can help me buy a new computer, by either buying data about Digg or your blog (or your favorite blog (like, say, TechCrunch?)) for a damn small price, or by donating at the ChipIn page. I’ll name a part of the computer after everyone who donates :) Thanks to Deanna McNeil, Rob LaGesse and ‘Chuck’ Rector for donating!

Size

Starting from 11 June 2005 till 11 October 2007, TechCrunch has been around for 852 days (2.3 years), putting up 4042 posts at 4.7 posts a day. A total of 1,071,423 words have been written, by 28 different people at an average of 265 words per post/1,257 words per day. A total of 159,734 comments, at an average of almost 40 comments per post. TechCrunch is not exactly the biggest of blogs.
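If you want to check these averages yourself, the arithmetic is short. This is only a sketch using the figures quoted above, and it assumes the dates are day/month/year (which is what gives 852 days):

```csharp
using System;

class TechCrunchSize
{
    static void Main()
    {
        // Figures quoted in the post; dates read as day/month/year (an assumption)
        DateTime first = new DateTime(2005, 6, 11);
        DateTime last = new DateTime(2007, 10, 11);
        int posts = 4042;
        int words = 1071423;
        int comments = 159734;

        int days = (last - first).Days; // 852
        Console.WriteLine("Days: {0}", days);
        Console.WriteLine("Posts per day: {0:F1}", (double)posts / days);
        Console.WriteLine("Words per post: {0:F0}", (double)words / posts);
        Console.WriteLine("Words per day: {0:F0}", (double)words / days);
        Console.WriteLine("Comments per post: {0:F1}", (double)comments / posts);
    }
}
```

Rounding gives 1,258 words per day where the post says 1,257, so the quoted figure looks truncated rather than rounded.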

I’m skipping the parts about Posting Frequency (it doubled after May 2007) and Posting Frequency by Weekday (highest on Monday, lowest on Tuesday, low on Friday). If anybody’s interested, leave me a comment.

Links

There were a total of 25,103 outbound links, at the rate of 6 links a post. Not too many, not too few.

Here’re the top ten most linked to sites from TechCrunch:

  1. techcrunch.com    6365
  2. crunchbase.com    1146
  3. technorati.com    722
  4. google.com    432
  5. yahoo.com    356
  6. blogspot.com    342
  7. crunchgear.com    306
  8. flickr.com    193
  9. typepad.com    182
  10. gigaom.com    173

Self-linking

Expected: TechCrunch links the most to itself. At least 14 times more than the nearest content contributor, which is google.com. The links to all the TechCrunch group of sites (TechCrunch, CrunchBase, CrunchGear, TalkCrunch, CrunchNotes) make up 32% of all outbound links. Almost 1/3 of all links are to TechCrunch’s own sites. That’s the data. Interpret it your way.
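For anyone who wants to redo the sum, here is a small sketch that adds up the five named sites using their counts from the Top 100 list in this post. The underlying dataset may group the sites a little differently, so treat the result as approximate:

```csharp
using System;
using System.Linq;

class SelfLinkShare
{
    static void Main()
    {
        int totalLinks = 25103;

        // techcrunch.com, crunchbase.com, crunchgear.com, talkcrunch.com, crunchnotes.com
        int[] groupLinks = { 6365, 1146, 306, 84, 57 };

        int groupTotal = groupLinks.Sum(); // 7958
        double share = 100.0 * groupTotal / totalLinks;

        Console.WriteLine("{0} of {1} links ({2:F1}%)", groupTotal, totalLinks, share);
    }
}
```

That comes to a little under 32%, in line with the figure above.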

Here’s a chart thrown in for a good measure:

image

As I said, interpret it your own way. For me, I’d rather have them link directly to the site, and put a link to the CrunchBase entry in parentheses. But that’s just me.

Update: Fredrick over at the Last Podcast blog thinks that since CrunchBase launched relatively recently, it’ll be linked to more often than the average cited above. Turns out he’s right: Since June 2007, a full 47% of all links are to TC sites. Here’s a chart, and thanks for asking this, Fred:

image

Almost a full half of links from TechCrunch today are links to TC sites. That means if you click any link at random, there’s roughly a 50% chance you’ll end up on another TC site. Ouch.

Diversity

The 25,103 links are spread out over 4,048 high-level domains, making each domain get an average of 6.2 links. However, excluding the TechCrunch sites from the equation gives us 17,017 links spread out over 4,043 high-level domains, making each domain get an average of 4.2 links. Pretty fair I’d guess.

Also, 1,941 domains got just 1 link. Means that 48% of the sites got mentioned only once. Startups die, mate.
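The three diversity numbers above are plain ratios. A quick sketch of the arithmetic, using only the figures quoted in this section:

```csharp
using System;

class LinkDiversity
{
    static void Main()
    {
        // Figures quoted in the post
        int allLinks = 25103, allDomains = 4048;
        int otherLinks = 17017, otherDomains = 4043; // TechCrunch sites excluded
        int singleLinkDomains = 1941;

        Console.WriteLine("Links per domain (all): {0:F1}",
            (double)allLinks / allDomains);
        Console.WriteLine("Links per domain (excluding TC): {0:F1}",
            (double)otherLinks / otherDomains);
        Console.WriteLine("Domains linked only once: {0:F0}%",
            100.0 * singleLinkDomains / allDomains);
    }
}
```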

Here’s the chart to go along with it:

image

Only 13% of sites get more than 5 links from TechCrunch. Many are linked to once and never heard from again. As I said, Startups die, mate.

Popularity

Before I end, let’s measure the True Count of Popularity of A Blog ™, comments. A total of 159,734 comments, at an average of almost 40 comments per post. Here’s the graph:

image

It is steadily growing, and right now, TechCrunch gets an average of almost 300 comments a day, occasionally reaching even 600 a day! That big spike that pushed it over 1,300 was when they gave away free Oomas. The same thing happened at Engadget when they gave away Zunes.

List of Top 100 Linked To Sites From TechCrunch

Now that I have managed to finish this within 1.5k words, here’s the list of the top 100 most linked to sites from TechCrunch along with the number of links to each of them.

  1. techcrunch.com - 6,365
  2. crunchbase.com - 1,146
  3. technorati.com - 722
  4. google.com - 432
  5. yahoo.com - 356
  6. blogspot.com - 342
  7. crunchgear.com - 306
  8. flickr.com - 193
  9. typepad.com - 182
  10. gigaom.com - 173
  11. wikipedia.org - 156
  12. facebook.com - 113
  13. digg.com - 110
  14. youtube.com - 108
  15. blogs.com - 99
  16. live.com - 98
  17. wordpress.com - 95
  18. nytimes.com - 90
  19. zdnet.com - 84
  20. talkcrunch.com - 84
  21. myspace.com - 83
  22. readwriteweb.com - 81
  23. scripting.com - 76
  24. amazon.com - 70
  25. microsoft.com - 69
  26. techmeme.com - 69
  27. wsj.com - 65
  28. icio.us - 63
  29. msn.com - 63
  30. mobilecrunch.com - 63
  31. businessweek.com - 62
  32. crunchboard.com - 62
  33. com.au - 60
  34. zoho.com - 59
  35. memeorandum.com - 58
  36. crunchnotes.com - 57
  37. feedburner.com - 54
  38. techcrunch20.com - 54
  39. netvibes.com - 53
  40. skype.com - 50
  41. micropersuasion.com - 49
  42. com.com - 48
  43. aol.com - 48
  44. weblogs.com - 45
  45. edgeio.com - 45
  46. text-link-ads.com - 44
  47. softtechvc.com - 42
  48. wired.com - 42
  49. alexaholic.com - 42
  50. venturebeat.com - 42
  51. weblogsinc.com - 41
  52. ebay.com - 40
  53. siliconbeat.com - 38
  54. cnn.com - 38
  55. flock.com - 37
  56. apple.com - 36
  57. solutionwatch.com - 36
  58. podtech.net - 35
  59. wikia.com - 35
  60. reuters.com - 35
  61. pluck.com - 34
  62. compete.com - 33
  63. ning.com - 31
  64. alexa.com - 30
  65. valleywag.com - 30
  66. battellemedia.com - 29
  67. feedster.com - 29
  68. simplyhired.com - 29
  69. adobe.com - 29
  70. scobleizer.com - 29
  71. meebo.com - 28
  72. zooomr.com - 28
  73. secondlife.com - 28
  74. twitter.com - 28
  75. bloglines.com - 27
  76. rojo.com - 27
  77. podshow.com - 27
  78. paidcontent.org - 27
  79. nwsource.com - 26
  80. riya.com - 26
  81. webreakstuff.com - 25
  82. blogherald.com - 25
  83. jot.com - 25
  84. ysearchblog.com - 25
  85. videoegg.com - 25
  86. allpeers.com - 25
  87. calacanis.com - 25
  88. comscore.com - 25
  89. hitwise.com - 25
  90. newsgator.com - 24
  91. oreilly.com - 24
  92. photobucket.com - 24
  93. stumbleupon.com - 24
  94. techcrunch40.com - 24
  95. sixapart.com - 23
  96. pandora.com - 23
  97. bubbleshare.com - 23
  98. pubsub.com - 22
  99. searchenginewatch.com - 22
  100. engadget.com - 22

Once you get past the top 15, the number of links to each site decreases rapidly. You either get linked to a lot, or not. Black, White, and very, very few shades of grey.

That’s the end of it folks! Want any more data about TechCrunch? Just leave a comment, and I’ll pull the data up for ye!

[private service advertisement]

Yeah, the ad guy again. This ad here is to fund my hope for a new computer. My old one is crumbling, and slow. I can’t run as much code as I want to on this machine. So, you can help me buy a new computer, by either buying data about Digg or your blog (or your favorite blog (like, say, TechCrunch? Engadget?)) for a damn small price, or by donating at the ChipIn page. I’ll name a part of the computer after everyone who donates :) Thanks to Deanna McNeil, Rob LaGesse and ‘Chuck’ Rector for donating!

Comments
27 Comments »
Categories
StatBot

StatBot: Top 100 Sites on Digg By Frontpage Story Count

October 7, 2007 | 9:39 am

A few days ago, I published the list of Top 100 sites on Digg.com by the cumulative No. of Diggs. While that is pretty useful, the list of Top 100 sites on Digg.com by Frontpage Story count would be useful as well, so here it is:

  1. blogspot.com - 1,224
  2. yahoo.com - 1,217
  3. arstechnica.com - 1,155
  4. engadget.com - 897
  5. cnn.com - 871
  6. news.com.com - 811
  7. bbc.co.uk - 748
  8. nytimes.com - 731
  9. wired.com - 646
  10. youtube.com - 616
  11. google.com - 522
  12. msn.com - 519
  13. gizmodo.com - 512
  14. com.au - 466
  15. reuters.com - 457
  16. flickr.com - 447
  17. washingtonpost.com - 421
  18. physorg.com - 381
  19. thinkprogress.org - 345
  20. rawstory.com - 335
  21. techcrunch.com - 305
  22. kotaku.com - 284
  23. zdnet.com - 283
  24. destructoid.com - 270
  25. businessweek.com - 264
  26. apple.com - 259
  27. joystiq.com - 259
  28. ign.com - 249
  29. guardian.co.uk - 244
  30. appleinsider.com - 242
  31. crooksandliars.com - 235
  32. treehugger.com - 231
  33. consumerist.com - 229
  34. theinquirer.net - 228
  35. livescience.com - 227
  36. lifehacker.com - 214
  37. abcnews.go.com - 199
  38. espn.go.com - 195
  39. usatoday.com - 195
  40. nasa.gov - 193
  41. torrentfreak.com - 192
  42. theregister.co.uk - 191
  43. newscientist.com - 190
  44. timesonline.co.uk - 184
  45. wordpress.com - 184
  46. wikipedia.org - 175
  47. macrumors.com - 173
  48. tuaw.com - 173
  49. breitbart.com - 170
  50. forbes.com - 167
  51. gamespot.com - 160
  52. autoblog.com - 159
  53. lewrockwell.com - 152
  54. extremetech.com - 150
  55. boingboing.net - 141
  56. eweek.com - 141
  57. pcworld.com - 139
  58. sfgate.com - 139
  59. nwsource.com - 137
  60. dailymail.co.uk - 133
  61. typepad.com - 132
  62. latimes.com - 131
  63. time.com - 128
  64. 1up.com - 127
  65. techeblog.com - 127
  66. informationweek.com - 126
  67. linux.com - 125
  68. sourceforge.net - 125
  69. microsoft.com - 122
  70. michellemalkin.com - 118
  71. ebay.com - 116
  72. nwfdailynews.com - 116
  73. slate.com - 116
  74. pcmag.com - 112
  75. scifi.com - 110
  76. computerworld.com - 104
  77. technologyreview.com - 104
  78. downloadsquad.com - 102
  79. readwriteweb.com - 101
  80. cbsnews.com - 98
  81. telegraph.co.uk - 98
  82. eurekalert.org - 97
  83. newsforge.com - 97
  84. betanews.com - 96
  85. imageshack.us - 94
  86. wsj.com - 93
  87. space.com - 91
  88. cnet.com - 90
  89. sciam.com - 90
  90. hosted.ap.org - 86
  91. sciencedaily.com - 85
  92. mashable.com - 84
  93. foxnews.com - 83
  94. independent.co.uk - 83
  95. howtoforge.com - 82
  96. ibm.com - 82
  97. instructables.com - 81
  98. macworld.com - 81
  99. nationalgeographic.com - 81
  100. bit-tech.net - 80

One thing to note here is that this list is pretty sparse: The 100th Site has only 80 stories on the frontpage, and only 79 sites have more than 100 stories on the frontpage. I made a more detailed analysis of this in my Digg Analysis Part II here.

12 sites were present in the Top 100 Most Dugg sites but are absent from this list. The number next to each one is its position in the Most Dugg list, counting from zero:

  1. digg.com - 53
  2. cracked.com - 80
  3. doubleviking.com - 81
  4. wikimedia.org - 82
  5. techdirt.com - 84
  6. valleywag.com - 86
  7. salon.com - 89
  8. jalopnik.com - 92
  9. popsci.com - 93
  10. deviantart.com - 94
  11. photobucket.com - 97
  12. somethingawful.com - 99

Digg.com reached the Top 100 Most Dugg sites only because Diggers submitted other Digg stories and got them dugg up high during the AACS Encryption Key quasi-Rebellion. That is why it has been knocked out of this list, since most of those Diggs went to a relatively small number of stories.

The 12 sites which are new in this list are listed below, each with its position in this list (again counting from zero):

  1. michellemalkin.com - 69
  2. nwfdailynews.com - 71
  3. technologyreview.com - 76
  4. readwriteweb.com - 78
  5. eurekalert.org - 81
  6. newsforge.com - 82
  7. betanews.com - 83
  8. space.com - 86
  9. sciencedaily.com - 90
  10. independent.co.uk - 93
  11. howtoforge.com - 94
  12. ibm.com - 95
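Comparing the two Top 100 lists like this is just a pair of set differences. Here is a rough sketch with a handful of real entries standing in for the full lists. The variable names are made up for illustration, and this is not necessarily how the lists above were produced:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ListDiff
{
    static void Main()
    {
        // Tiny stand-ins for the two Top 100 lists (a few real entries each)
        var mostDugg = new List<string>
            { "blogspot.com", "arstechnica.com", "digg.com", "cracked.com" };
        var byStoryCount = new List<string>
            { "blogspot.com", "arstechnica.com", "michellemalkin.com", "ibm.com" };

        // In the Most Dugg list but missing from the story-count list
        var dropped = mostDugg.Except(byStoryCount).ToList();

        // In the story-count list but not in the Most Dugg list
        var added = byStoryCount.Except(mostDugg).ToList();

        Console.WriteLine("Dropped: " + string.Join(", ", dropped)); // Dropped: digg.com, cracked.com
        Console.WriteLine("New: " + string.Join(", ", added));       // New: michellemalkin.com, ibm.com
    }
}
```

Because both lists have 100 entries, the two differences always come out the same size, which is why there are 12 sites on each side.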

My Observation? Of the 12 new sites, 10 are Tech/Science related. The bread and butter of Digg is still Tech/Science/General-Nerdery. While sensational stories do get more than average Diggs, they are not that frequent.

[private service advertisement]Want some data about Digg? Or your own Blog? Buy it from me for a small price, and help me buy a better computer to practice these skills on. Head over to the sidebar for details :) (Or, just drop in a donation towards buying that computer: I’ll name a part of it after you!)

Thanks to Danny Sullivan of SearchEngineLand for a link (which, strangely, Technorati didn’t pick up at all), and Patrick Altoft of Blogstorm for the original idea.

Comments
10 Comments »
Categories
StatBot

StatBot: Top 100 Sites on Digg.com by Nett Diggs

October 4, 2007 | 4:20 pm

Patrick Altoft at BlogStorm has compiled a list of some of the top sites on Digg by using the site: Google command. And, in the comments, Danny Sullivan says that a list of those sites ordered by the Total Number of Diggs would be much better.

Of course, I have the Digg data lying around on my hard disk (for the analysis of Digg which I am doing right now. Part I on growth here, and Part II on sites here), and it took me about 10 minutes to write 5 lines of VB9 code to produce the output he wanted. Here’s the Top 100 Sites on Digg.com ordered by Cumulative No. of Diggs, starting from 1st December 2004 right up till 24th September 2007 (Sorry for the poor formatting, folks. I pushed this out in a hurry, got a Maths test tomorrow. Update: I wrote a small script to fix the formatting for me, and I think it’s alright now. Thanks to Gamermk for poking me to get this done.):

  1. blogspot.com - 1,132,811
  2. arstechnica.com - 813,396
  3. yahoo.com - 783,399
  4. engadget.com - 670,608
  5. flickr.com - 645,450
  6. youtube.com - 614,522
  7. cnn.com - 596,024
  8. bbc.co.uk - 511,970
  9. nytimes.com - 509,872
  10. com.com - 490,131
  11. gizmodo.com - 488,527
  12. wired.com - 477,920
  13. msn.com - 431,513
  14. google.com - 410,007
  15. com.au - 376,332
  16. rawstory.com - 321,846
  17. reuters.com - 317,296
  18. consumerist.com - 308,470
  19. washingtonpost.com - 290,175
  20. apple.com - 282,238
  21. thinkprogress.org - 263,323
  22. crooksandliars.com - 230,617
  23. physorg.com - 220,040
  24. kotaku.com - 205,614
  25. torrentfreak.com - 204,999
  26. lifehacker.com - 204,198
  27. techcrunch.com - 197,470
  28. guardian.co.uk - 197,278
  29. zdnet.com - 196,734
  30. wikipedia.org - 194,053
  31. nasa.gov - 189,698
  32. joystiq.com - 188,153
  33. wordpress.com - 185,072
  34. appleinsider.com - 180,348
  35. abcnews.go.com - 175,968
  36. ign.com - 163,175
  37. destructoid.com - 158,697
  38. treehugger.com - 151,674
  39. livescience.com - 149,776
  40. businessweek.com - 149,486
  41. dailymail.co.uk - 148,382
  42. theinquirer.net - 139,299
  43. macrumors.com - 138,388
  44. timesonline.co.uk - 137,089
  45. tuaw.com - 126,303
  46. techeblog.com - 126,239
  47. usatoday.com - 121,769
  48. boingboing.net - 121,104
  49. breitbart.com - 118,566
  50. imageshack.us - 118,436
  51. theregister.co.uk - 115,992
  52. newscientist.com - 113,583
  53. time.com - 108,982
  54. digg.com - 107,151
  55. microsoft.com - 104,842
  56. autoblog.com - 104,553
  57. latimes.com - 104,263
  58. sfgate.com - 103,786
  59. forbes.com - 102,074
  60. gamespot.com - 98,700
  61. scifi.com - 98,324
  62. typepad.com - 97,705
  63. nwsource.com - 97,284
  64. slate.com - 93,360
  65. extremetech.com - 91,470
  66. 1up.com - 90,685
  67. pcworld.com - 89,223
  68. informationweek.com - 88,164
  69. espn.go.com - 86,186
  70. computerworld.com - 82,265
  71. downloadsquad.com - 81,380
  72. linux.com - 80,230
  73. cbsnews.com - 78,508
  74. telegraph.co.uk - 76,071
  75. eweek.com - 74,990
  76. pcmag.com - 73,234
  77. lewrockwell.com - 73,057
  78. mashable.com - 72,899
  79. instructables.com - 72,328
  80. cnet.com - 71,830
  81. cracked.com - 70,713
  82. doubleviking.com - 70,620
  83. wikimedia.org - 69,974
  84. sourceforge.net - 69,315
  85. techdirt.com - 67,957
  86. hosted.ap.org - 67,671
  87. valleywag.com - 66,668
  88. wsj.com - 66,591
  89. nationalgeographic.com - 64,630
  90. salon.com - 61,675
  91. ebay.com - 61,541
  92. foxnews.com - 61,346
  93. jalopnik.com - 59,837
  94. popsci.com - 59,751
  95. deviantart.com - 58,819
  96. sciam.com - 58,730
  97. macworld.com - 57,631
  98. photobucket.com - 56,983
  99. bit-tech.net - 56,248
  100. somethingawful.com - 55,937

There, real hard data. And, I betcha this will be waay less popular than the BlogStorm post ;) I would’ve put up the Top 100 sites sorted by No. of Stories on the frontpage as well, but I’ve left that out since most of it is covered in Digg Analysis Part II already. Oh, and if you want to know where your site Ranks, drop a comment and I’ll tell you.

Oh, and these numbers are from stories that have reached the frontpage, which is a good thing since you can’t submit a site 56k times and get it to show up here, and a bad thing as well since we won’t be able to know which site is spamming by getting the ratio of articles on frontpage to total articles from that site. Even if I wanted to, I cannot get the data about all the articles submitted to Digg simply because I do not have that much processing power. Sigh.
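The original VB9 lines aren’t shown in this post. As a rough idea of what such a ranking involves, here is a C# sketch of a group-and-sum over frontpage stories. The Story class and the sample data are made up for illustration, and it assumes each story has already been reduced to its high-level domain:

```csharp
using System;
using System.Linq;

class DiggsBySite
{
    // Hypothetical record: one frontpage story and the Diggs it received
    class Story
    {
        public string Site;
        public int Diggs;
    }

    static void Main()
    {
        // Made-up sample data, only to show the shape of the query
        var stories = new[]
        {
            new Story { Site = "example.com", Diggs = 900 },
            new Story { Site = "example.org", Diggs = 1200 },
            new Story { Site = "example.com", Diggs = 400 },
        };

        // Sum the Diggs per site, then sort with the biggest total first
        var ranking = stories
            .GroupBy(s => s.Site)
            .Select(g => new { Site = g.Key, Diggs = g.Sum(s => s.Diggs) })
            .OrderByDescending(x => x.Diggs);

        int rank = 1;
        foreach (var row in ranking)
        {
            Console.WriteLine("{0}. {1} - {2:N0}", rank++, row.Site, row.Diggs);
        }
    }
}
```

With the sample data, example.com (1,300 Diggs over two stories) ranks above example.org (1,200 over one), which is the same cumulative ordering the list uses.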

[private service advertisement]Want some data about Digg? Or your own Blog? Buy it from me for a small price, and help me buy a better computer to practice these skills on. Head over to the sidebar for details :) (Or, just drop in a donation towards buying that computer: I’ll name a part of it after you!)

Comments
14 Comments »
Categories
StatBot

StatBot: Analysing Digg Part II - Sites on the Frontpage

October 2, 2007 | 5:43 pm

Update: Digg this here.

This is Part II of the Analysis of Digg, where I’m going to analyze the sites that are dugg the most. Note that when I say ‘most dugg sites’, I mean the number of times the site has been to Digg’s frontpage. Part I is here.

But before you start

I’m seriously short of processing power, and am not able to run many linguistic analyses because of that. You can help me earn a new computer by buying little tidbits of analysis for your blog (or Digg) from me. Very reasonable, small fees. See the bottom of the post for more information. Thanks folks!

Diversity

The 61,608 stories on the Digg frontpage are spread out around 14,338 sites (or ‘high level domains’, as Jeff Clark (from whom I stole this method) likes to call them) at an average of just 4.29 stories per site. That first looked like incredible diversity to me. And, make no mistake, it is incredible diversity. But, it’s not as diverse as it seems. Here, look at this chart:

image001

The Top 100 sites (a ridiculous 0.7% of all sites) make up 41% of the stories! So, a relatively small number of sites make up a large number of frontpage stories, forming some sort of a “core”, while a very large number of sites (the other 14,238, or 99.3%) contribute the rest. A good balance, I’d say.

Here’s the chart splitting up sites by the number of stories they have contributed to the front page:

image003

As you can see, 71% of sites that make it to Digg’s frontpage make it there only once, while 25% make it 2-10 times. Only 4% ever managed to get to the frontpage more than 25 times, and 1% over 100 times. Excel is being optimistic here: It’s just 79 sites that got more than 100 stories on the frontpage. If your site does make it to the Digg frontpage once, there’s a 71% chance that it won’t go there again :)

Also, here’s another chart, showing how much the sites with just 1 story contribute vs. the bigger ones:

image005

So, those 1% of sites (just 79 sites, actually: Excel is bad at math) which had more than 100 stories on the frontpage make up 39% of the frontpage, while the 71% of the sites that contributed just one story make up 16% of the frontpage. On the whole, this is a fairly divided pie, and things are pretty balanced at Digg. The frontpage is not monopolized by a few sites, nor scattered across the web.

  • A total of 14,338 sites have been on Digg’s frontpage at least once.
  • The top 100 sites (just 0.7%) make up 41% of Digg’s frontpage stories.
  • 71% of the sites that make it to the frontpage never get another story on the frontpage.
  • There are a total of 79 sites with more than 100 stories hitting the frontpage
  • Those 79 “mainstream” sites contribute 39% of all of Digg’s frontpage stories.
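The summary figures above all fall out of one table of story counts per site. Here is a sketch of how they could be computed. The six sample sites are made up, and only the last line uses real figures from this post:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class FrontpageSpread
{
    static void Main()
    {
        // Made-up story counts per site; the real data has 14,338 sites
        var storiesPerSite = new Dictionary<string, int>
        {
            { "a.example", 120 }, { "b.example", 30 }, { "c.example", 4 },
            { "d.example", 1 }, { "e.example", 1 }, { "f.example", 1 }
        };

        List<int> counts = storiesPerSite.Values.OrderByDescending(c => c).ToList();
        int totalStories = counts.Sum(); // 157 for the sample
        int topN = 2;                    // the post uses the top 100

        double topShare = 100.0 * counts.Take(topN).Sum() / totalStories;
        double singleShare = 100.0 * counts.Count(c => c == 1) / counts.Count;
        int over100 = counts.Count(c => c > 100);

        Console.WriteLine("Sites: {0}", counts.Count);
        Console.WriteLine("Top {0} sites hold {1:F0}% of stories", topN, topShare);
        Console.WriteLine("{0:F0}% of sites appear only once", singleShare);
        Console.WriteLine("Sites with more than 100 stories: {0}", over100);

        // The same kind of share on the post's real figures: 100 sites out of 14,338
        Console.WriteLine("Top 100 as a share of all sites: {0:F1}%", 100.0 * 100 / 14338);
    }
}
```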

Top Ten Sites on the frontpage

Here is the list of the top ten sites that made it to the frontpage:

Rank  Site             Stories
1     blogspot.com     1224
2     yahoo.com        1217
3     arstechnica.com  1155
4     engadget.com     897
5     cnn.com          871
6     com.com          811
7     bbc.co.uk        748
8     nytimes.com      731
9     wired.com        646
10    youtube.com      616

Note that this is not exactly absolute: For example, all the BlogSpot blogs are glued together! Let’s examine each one in detail:

Hosted Blogs (BlogSpot, Wordpress, Typepad)

As you should’ve guessed by now, it’s ALL the BlogSpot blogs combined together that take the first spot. A total of 789 BlogSpot blogs made it to the frontpage. The top 3 most dugg BlogSpot blogs are the Google Blog, the Google Operating System Blog, and the old Digg Blog. However, most of the BlogSpot stories which are Dugg come from those almost unknown blogs. Here’s the chart showing that:

image007

As you can see, the Top 10 ranked blogspot blogs (Top 13 actually, as 10th place was split between Labnol, GoNext & TopMac (but, with just 9 stories)) contribute only 22% of the total. Diversity here, but it also means that if your BlogSpot blog gets to the frontpage for the first time, then there’s a 78% chance that it has gotten there for the last time as well. This is just a tad higher than the overall just-a-single-story percentage of 71%. Also, each story gets an average of 925 Diggs, which is about 160 Diggs higher than the overall average of 763 Diggs per story. So, even if you get there only once, you get more than the average number of Diggs!

Here’s a chart comparing Blogger to the other free hosted blogging sites, Wordpress and Typepad. Windows Live Spaces isn’t included because there are only 8 stories ever from MSN Spaces, while MySpace has 11, most of which are to services and announcements about the site rather than actual MySpace pages (I don’t really consider them comparable to Blogger or Wordpress though)

image009

As you can see, BlogSpot is several times as big as Wordpress and Typepad. I think this is primarily because Blogger is older than the other two, while the fact that both Wordpress & MovableType can be self hosted more easily can also contribute to this. However, Wordpress.com blogs get an average of 1005 diggs a post, which is higher than that of BlogSpot, while TypePad gets a much lower 740 Diggs per post. Here’s that comparison chart:

image011

Both Blogger and Wordpress are well above average, while Typepad is just a tad below it. Heck even Seth Godin didn’t make it to the frontpage once!

Yahoo.com

Mostly News, which makes up 1015 (or 83%) of the 1217 stories from Yahoo. The rest are just spread out among Yahoo Business, Yahoo Sports, Yahoo Finance, and a bunch of stuff that Yahoo gave a burst of life to and then promptly forgot (like Yahoo Pipes, which has 3 stories to it). I can’t really think of any graph to put up here.

Yahoo has a dismal 643 Diggs per story, well below the average.

ArsTechnica

Ars Technica, with 1,155 stories on the frontpage, is actually the individual site with the most stories on Digg’s frontpage! Here’s the chart showing where on ArsTechnica the stories are coming from:

image013

The majority of them come from the News section, while a good part comes from the Journals section as well. Only a small number of featured articles are present here, though that is probably because featured articles aren’t really that frequent. However, it gets a lower-than-average 704 diggs per story. Breaking up the Diggs per Story by section:

image015

Features and Guides get noticeably fewer Diggs per Story than average, while News and Journals are just about average.

Engadget

Engadget, the most linked to blog in the world, comes in at No 4 with 897 stories with a slightly-lesser-than-average 747 diggs per story. It’s just a tad lower than the average though, I guess just normal variation accounts for it. Diggers do love Gadgets!

Comparison to Gizmodo

Gizmodo, the third most linked to blog, is also pretty high on the list, at No. 13 with 512 stories with a much higher than average 954 Diggs per story. That means it gets to the frontpage fewer times than Engadget, but when it does, it gets more Diggs. Here’s a chart showing this visually:

image017

So, yep, while Gizmodo does get to the frontpage fewer times than Engadget, it gets a comparatively larger number of Diggs per story.
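Diggs per story is just total Diggs divided by frontpage stories. A small sketch using the totals for the two sites from the Top 100 lists elsewhere on this page, assuming both lists cover the same set of stories:

```csharp
using System;

class DiggsPerStory
{
    static double PerStory(int totalDiggs, int stories)
    {
        return (double)totalDiggs / stories;
    }

    static void Main()
    {
        // Cumulative Diggs and frontpage story counts from the two Top 100 lists
        Console.WriteLine("engadget.com: {0:F0}", PerStory(670608, 897));
        Console.WriteLine("gizmodo.com:  {0:F0}", PerStory(488527, 512));
    }
}
```

This works out to roughly 748 and 954, in line with the 747 and 954 quoted above.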

CNN, News.com, BBC, NYTimes, Wired

Mainstream Media. Comparing the large number of stories that these sites actually churn out, the number of them that made it to the Digg frontpage is relatively small. Here’s a chart showing their relative popularity:

image019

CNN.com has the most stories, while Wired has the highest number of Diggs per Story. This is expected, since Wired is quite a lot geekier than the other ones. However, even Wired has fewer Diggs per Story than the average, and quite a bit fewer than Gizmodo and some of the other high ranking sites (YouTube, for example). Note that the difference is quite negligible in the case of Wired, but pretty big in the others, with News.com a good 160 Diggs below the average. I think it’s just that Digg is more biased towards technology, something which I will analyze in Part III.

YouTube

YouTube made it to the top ten even though I didn’t include the Videos section. It’s got 616 stories at a well-above-average 997 diggs a story! However, most of those are older ones: Excluding the video section, only 54 YouTube videos made it to the frontpage this year. Just 6 made it in September. So, yeah, this is mostly a leftover, as most videos are now posted (correctly) to the videos section.

Still, it beats the competition handsomely. Google Video is the closest, with 200 stories, but MSN Soapbox & Revver have only 1 each! This is just nitpicking though: A better comparison would seek these numbers from the Video section, data for which I unfortunately do not have. If there’s enough interest, I’ll do Digg’s video section separately, okay?

Other Geeky Sites in the Top Fifty 

Rank  Site               Stories
11    google.com         522
12    msn.com            519
13    gizmodo.com        512
16    flickr.com         447
21    techcrunch.com     305
22    kotaku.com         284
23    zdnet.com          283
24    destructoid.com    270
26    apple.com          259
27    joystiq.com        259
28    ign.com            249
30    appleinsider.com   242
33    consumerist.com    229
35    livescience.com    227
36    lifehacker.com     214
40    nasa.gov           193
41    torrentfreak.com   192
44    wordpress.com      184
46    wikipedia.org      175
47    macrumors.com      173
48    tuaw.com           173

Google & MSN

Most of the stories pointing to Google.com come from Google Video, though there are 14 stories linking to just the frontpage, most of which promptly ask the reader to ignore the story link. There’s even one about a “new search engine called Google” and one about how “all Google servers have crashed”. Comments about Digg itself also seem to have Google in the story link. And, in what I might call “funny”, a guy offered $100 to a “Random Digger” who diggs the story at a pre-selected position. I don’t really know if he kept up the offer though :) There is also a long list of things that Google once started and then abandoned (Google Pages has 2 stories, Google Pack has 3, etc). Surprisingly (at least for me), Google Groups has only 6 stories to it. Even Google Code has more (7 stories). Google.com gets an about average 785 diggs per story, though I don’t think that really says anything, because the Google.com ‘brand’ is so diluted, at least here.

MSN is more like Yahoo than Google here: most of its links come from MSNBC, its news network. 434 of those 519 stories, or 83%, are from MSNBC. The rest are scattered around content like MSN Health, Slate, Encarta, MSN Movies, etc. MSN has an above average 831 diggs per story, which means absolutely nothing.
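The MSNBC share is simple arithmetic on the two counts quoted above. A quick sketch, where the helper name `share` is just something picked for illustration:

```python
def share(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole

msn_total = 519    # all MSN stories on the frontpage
msnbc_count = 434  # of which from MSNBC

msnbc_share = share(msnbc_count, msn_total)
other_count = msn_total - msnbc_count

print(f"MSNBC: {msnbc_share:.1f}% of MSN stories")  # 83.6%
print(f"Everything else: {other_count} stories")    # 85
```

So the figure is a little under 84%, and only 85 stories are left for MSN Health, Slate, Encarta and the rest.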

Flickr

Photos. Of the 447 links to Flickr, 2 are to the Flickr Blog and one is to the announcement about the improved uploading feature. The rest are links to pictures, though I cannot determine whom they belong to, because most of the links point directly to the image rather than to the Flickr page. Duh.

I’ll do an analysis of just the pictures posted to Digg separately.

Gaming (Kotaku, Joystiq, IGN & Destructoid)

Four gaming sites in there. Diggers are gaming freaks :) Here’s a chart comparing Kotaku, Joystiq, IGN and Destructoid:

image021

Kotaku has the most stories, followed by Destructoid and Joystiq. IGN is a bit behind, but still, note that the difference between IGN and Kotaku is just 35 stories (249 versus 284). However, looking at the Diggs per Story:

image023

Here, Kotaku and Joystiq take the lead, while the not-a-blog IGN is left behind. Harsher blog Destructoid takes a good hit as well. In fact, while all of them have a less-than-average number of Diggs per story, Destructoid seems to be quite a bit less popular than the others, with almost 200 fewer diggs than the average.

Apple (Apple.com, AppleInsider, MacRumors, TUAW)

Four Apple related sites, including Apple.com. Here’s the comparison chart:

image025

So, the amount of ‘official’ stuff from Apple is pretty small when compared to the combined number of stories coming from the Apple-related blogs. AppleInsider seems to be the most popular among them. They do have a lot of people doing PR for them for free, don’t they? :)

image027

So, while there might not be many stories from Apple.com, those that do get posted get dugg heavily. In fact, at 1090, it has the second highest number of Diggs per Story, just slightly behind another very popular site (see below).

Wordpress.com

All the Wordpress.com blogs are bunched together here. There are actually 140 blogs from Wordpress.com that made it to the frontpage, with Biosingularity, Ubuntu Blog & Robert Scoble sharing the top spot with six stories each. However, most of the links come from single blogs: 115 of those 140 blogs were on the frontpage just once. Note that this might under-represent many popular Wordpress blogs which are on their own domain, and hence are not counted. I covered Wordpress up there with Blogger, go have a look again if you want to.
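Here is a rough sketch of how a per-blog breakdown like this can be done: group the story links by hostname and count. It assumes you have the story URLs as plain strings. The function name `blog_counts` and the sample URLs are invented for illustration, and this is not necessarily how the numbers above were produced.

```python
from collections import Counter
from urllib.parse import urlparse

def blog_counts(urls, suffix=".wordpress.com"):
    """Count stories per blog hostname, keeping only hosts under `suffix`."""
    hosts = (urlparse(url).hostname or "" for url in urls)
    return Counter(host for host in hosts if host.endswith(suffix))

# Invented sample links, not real frontpage stories.
sample_urls = [
    "http://alpha.wordpress.com/2007/01/01/post-one/",
    "http://alpha.wordpress.com/2007/02/01/post-two/",
    "http://beta.wordpress.com/2007/03/01/post/",
    "http://example.com/not-a-wordpress-blog/",
]

counts = blog_counts(sample_urls)
one_hit_blogs = sum(1 for n in counts.values() if n == 1)

print(len(counts), "blogs,", one_hit_blogs, "with just one story")
print(counts.most_common(1))
```

Blogs hosted on their own domain never match the suffix, which is exactly why they end up under-represented in a count like this.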

nasa.gov

The most dugg part of NASA was, of course, the Astronomy Picture of the Day, being on the frontpage 32 times, followed by press releases from various parts of NASA. It’s spread out all around NASA, really. Also, NASA stories get more diggs than average, with 982 diggs per story.

Wikipedia.org

The Wikipedia article on Digg made it to the frontpage 5 times, while the Diggnation article and the article about made-up words in The Simpsons made it two times each. A lot of awesome articles are listed here, like the ones about unusual deaths, songs deemed ‘inappropriate’ after Sep 11 & community currencies in the United States. Heck, they even have a list of ‘unusual’ articles! Also, Wikipedia has the highest number of diggs per story in the top 100 sites, with 1108 diggs per story, just 18 above Apple.com’s 1090 diggs per story. This, I think, is because Wikipedia is itself so big and varied that anything that is interesting enough to make it to the frontpage is interesting enough to get a lot of Diggs as well.

Others

Techcrunch is pretty high up in the list, but at 647 diggs per story it is way below average. More Techcrunch stories are dugg than all of ZDNet’s blogs and content combined, though ZDNet has a slightly higher 695 diggs per story. That is still below the average. Also, don’t forget the awesomeness that is Lifehacker: it’s pretty high in the list too, with 214 stories at a well above average 954 diggs per story. By comparison, lifehack.org, which is similar to Lifehacker, gets only 36 stories, but a higher 1476 diggs per story. I think this is because if something from lifehack is interesting enough to make it to the frontpage, it is interesting enough to get a lot more Diggs as well!

Other Non-Geek Sites in the Top Fifty

Rank

Site

Stories

4

news.com.au

466

5

reuters.com