StatBot: TechCrunch Data Analysis
October 11, 2007 | 5:10 pm
One of the things I’ve re-learnt in the past few weeks is to not write like an ad copy drone. Being precise where it matters helps. My last two Digg analyses went over 3,000 words. Not good. So, I’m writing this analysis of TechCrunch, topper of the TechMeme leaderboard and 4th most linked-to blog according to Technorati. Let’s see if I can keep this under 1.5k words.
[private service advertisement]
This ad here is to fund my hope for a new computer. My old one is crumbling and slow. I can’t run as much code as I want to on this machine. So, you can help me buy a new computer, by either buying data about Digg or your blog (or your favorite blog (like, say, TechCrunch?)) for a damn small price, or by donating at the ChipIn page. I’ll name a part of the computer after everyone who donates.
Thanks to Deanna McNeil, Rob LaGesse and ‘Chuck’ Rector for donating!
Size
From 11 June 2005 to 11 October 2007, TechCrunch has been around for 852 days (2.3 years), putting up 4,042 posts at 4.7 posts a day. A total of 1,071,423 words have been written by 28 different people, at an average of 265 words per post, or 1,257 words per day. There have been 159,734 comments in total, at an average of almost 40 comments per post. TechCrunch is not exactly the biggest of blogs.
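For anyone who wants to check the arithmetic, here is a minimal sketch that re-derives the averages from the totals quoted above. The variable names are made up for illustration, and the dates assume 11/6/2005 and 11/10/2007 are written day/month/year.

```python
from datetime import date

# Totals quoted in the post; names are illustrative only.
first_post = date(2005, 6, 11)   # 11/6/2005, read as day/month/year (assumption)
last_post = date(2007, 10, 11)   # 11/10/2007
posts = 4042
words = 1_071_423
comments = 159_734

days = (last_post - first_post).days

stats = {
    "days": days,
    "posts_per_day": round(posts / days, 1),
    # Whole words, truncated rather than rounded, which matches the quoted 1,257.
    "words_per_post": words // posts,
    "words_per_day": words // days,
    "comments_per_post": round(comments / posts, 1),
}

for name, value in stats.items():
    print(f"{name}: {value}")
```

The comments-per-post figure comes out at 39.5, which is where "almost 40" comes from.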
I’m skipping the parts about Posting Frequency (it doubled after May 2007) and Posting Frequency by Weekday (highest on Monday, lowest on Tuesday, low on Friday). If anybody’s interested, leave me a comment.
Links
There were a total of 25,103 outbound links, at a rate of about 6 links a post. Not too many, not too few.
Here are the top ten most linked-to sites from TechCrunch:
- techcrunch.com - 6,365
- crunchbase.com - 1,146
- technorati.com - 722
- google.com - 432
- yahoo.com - 356
- blogspot.com - 342
- crunchgear.com - 306
- flickr.com - 193
- typepad.com - 182
- gigaom.com - 173
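A rough sketch of how a tally like this could be produced from a list of outbound URLs. This is not the actual StatBot code: the function names and sample URLs are made up, and the "keep the last two host labels" rule is an assumption. It is at least consistent with entries such as icio.us, com.com and com.au that show up in the full list further down.

```python
from collections import Counter
from urllib.parse import urlparse


def high_level_domain(url: str) -> str:
    """Reduce a URL to its last two host labels (a naive, assumed rule)."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def top_domains(urls, n=10):
    """Count links per high-level domain and return the n most common."""
    counts = Counter(high_level_domain(u) for u in urls)
    counts.pop("", None)  # drop URLs that had no usable host
    return counts.most_common(n)


# Made-up sample links, for illustration only.
sample_links = [
    "http://www.techcrunch.com/2007/10/01/some-post/",
    "http://www.techcrunch.com/about/",
    "http://del.icio.us/popular",
    "http://news.com.com/some-story",
    "http://www.example.com.au/tech",
]

print(top_domains(sample_links))
```

The naive rule lumps every `.com.au` site into one bucket. A more careful version would consult the Public Suffix List, at the cost of an extra dependency.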
Self-linking
Expected: TechCrunch links the most to itself, at least 14 times more than to the nearest content contributor, which is google.com. The links to all the TechCrunch group of sites (TechCrunch, CrunchBase, CrunchGear, TalkCrunch, CrunchNotes) make up 32% of all outbound links. Almost 1/3 of all links are to TechCrunch’s own sites. That’s the data. Interpret it your way.
Here’s a chart thrown in for good measure:
As I said, interpret it your own way. For me, I’d rather have them link directly to the site, and put a link to the CrunchBase entry in parentheses. But that’s just me.
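The self-linking share can be re-derived from the per-site counts published in this post. A minimal sketch, with made-up variable names; the TalkCrunch and CrunchNotes counts are taken from the top 100 list at the end of the post.

```python
# Link counts as published in this post; names are illustrative only.
total_links = 25_103
google_links = 432

tc_sites = {
    "techcrunch.com": 6365,
    "crunchbase.com": 1146,
    "crunchgear.com": 306,
    "talkcrunch.com": 84,
    "crunchnotes.com": 57,
}

tc_links = sum(tc_sites.values())
share = tc_links / total_links
ratio_to_google = tc_sites["techcrunch.com"] / google_links

print(f"{tc_links} of {total_links} links go to TC sites ({share:.1%})")
print(f"techcrunch.com gets {ratio_to_google:.1f}x the links of google.com")
```

The five sites add up to 7,958 links, or just under 32% of the total.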
Update: Fredrick over at the Last Podcast blog thinks that since CrunchBase launched relatively recently, it’s now being linked to more often than the average cited above suggests. Turns out he’s right: since June 2007, a full 47% of all links are to TC sites. Here’s a chart, and thanks for asking this, Fred:
Almost half of the links from TechCrunch today go to TC sites. That means if you click a link at random, there’s nearly a 50% chance you’ll end up on another TC site. Ouch.
Diversity
The 25,103 links are spread out over 4,048 high-level domains, so each domain gets an average of 6.2 links. However, excluding the TechCrunch sites from the equation gives us 17,017 links spread out over 4,043 high-level domains, an average of 4.2 links per domain. Pretty fair, I’d guess.
Also, 1,941 domains got just 1 link. That means 48% of the sites got mentioned only once. Startups die, mate.
Here’s the chart to go along with it:
Only 13% of sites get more than 5 links from TechCrunch. Many are linked to once and never heard from again. As I said, Startups die, mate.
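The diversity numbers are plain ratios of the totals quoted above. Here is a small sketch that re-derives them, plus a helper for the "more than 5 links" kind of question. The helper name and the toy counts are made up, since the full per-domain data isn't published here.

```python
# Totals quoted in the post; names are illustrative only.
total_links, total_domains = 25_103, 4_048
external_links, external_domains = 17_017, 4_043   # TechCrunch sites excluded
single_link_domains = 1_941

avg_all = total_links / total_domains
avg_ext = external_links / external_domains
single_share = single_link_domains / total_domains

print(f"avg links per domain (all): {avg_all:.1f}")
print(f"avg links per domain (excluding TC sites): {avg_ext:.1f}")
print(f"domains linked exactly once: {single_share:.0%}")


def share_above(counts, threshold):
    """Fraction of domains that received more than `threshold` links."""
    return sum(1 for c in counts.values() if c > threshold) / len(counts)


# Toy per-domain counts, invented for illustration.
toy_counts = {"a.example": 12, "b.example": 1, "c.example": 1, "d.example": 6}
print(f"toy share with more than 5 links: {share_above(toy_counts, 5):.0%}")
```

Dropping the five TC domains removes 8,086 links, which is why the average falls from 6.2 to 4.2.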
Popularity
Before I end, let’s measure the True Count of Popularity of A Blog ™: comments. A total of 159,734 comments, at an average of almost 40 comments per post. Here’s the graph:
It is growing steadily, and right now TechCrunch gets an average of almost 300 comments a day, occasionally reaching even 600 a day! The big spike that pushed it over 1,300 was when they gave away free Oomas. The same thing happened at Engadget when they gave away Zunes.
List of Top 100 Linked To Sites From TechCrunch
Now that I have managed to finish this within 1.5k words, here’s the list of the top 100 most linked to sites from TechCrunch along with the number of links to each of them.
- techcrunch.com - 6,365
- crunchbase.com - 1,146
- technorati.com - 722
- google.com - 432
- yahoo.com - 356
- blogspot.com - 342
- crunchgear.com - 306
- flickr.com - 193
- typepad.com - 182
- gigaom.com - 173
- wikipedia.org - 156
- facebook.com - 113
- digg.com - 110
- youtube.com - 108
- blogs.com - 99
- live.com - 98
- wordpress.com - 95
- nytimes.com - 90
- zdnet.com - 84
- talkcrunch.com - 84
- myspace.com - 83
- readwriteweb.com - 81
- scripting.com - 76
- amazon.com - 70
- microsoft.com - 69
- techmeme.com - 69
- wsj.com - 65
- icio.us - 63
- msn.com - 63
- mobilecrunch.com - 63
- businessweek.com - 62
- crunchboard.com - 62
- com.au - 60
- zoho.com - 59
- memeorandum.com - 58
- crunchnotes.com - 57
- feedburner.com - 54
- techcrunch20.com - 54
- netvibes.com - 53
- skype.com - 50
- micropersuasion.com - 49
- com.com - 48
- aol.com - 48
- weblogs.com - 45
- edgeio.com - 45
- text-link-ads.com - 44
- softtechvc.com - 42
- wired.com - 42
- alexaholic.com - 42
- venturebeat.com - 42
- weblogsinc.com - 41
- ebay.com - 40
- siliconbeat.com - 38
- cnn.com - 38
- flock.com - 37
- apple.com - 36
- solutionwatch.com - 36
- podtech.net - 35
- wikia.com - 35
- reuters.com - 35
- pluck.com - 34
- compete.com - 33
- ning.com - 31
- alexa.com - 30
- valleywag.com - 30
- battellemedia.com - 29
- feedster.com - 29
- simplyhired.com - 29
- adobe.com - 29
- scobleizer.com - 29
- meebo.com - 28
- zooomr.com - 28
- secondlife.com - 28
- twitter.com - 28
- bloglines.com - 27
- rojo.com - 27
- podshow.com - 27
- paidcontent.org - 27
- nwsource.com - 26
- riya.com - 26
- webreakstuff.com - 25
- blogherald.com - 25
- jot.com - 25
- ysearchblog.com - 25
- videoegg.com - 25
- allpeers.com - 25
- calacanis.com - 25
- comscore.com - 25
- hitwise.com - 25
- newsgator.com - 24
- oreilly.com - 24
- photobucket.com - 24
- stumbleupon.com - 24
- techcrunch40.com - 24
- sixapart.com - 23
- pandora.com - 23
- bubbleshare.com - 23
- pubsub.com - 22
- searchenginewatch.com - 22
- engadget.com - 22
Once you get past the top 15, the number of links to each site decreases rapidly. You either get linked to a lot, or not. Black, White, and very, very few shades of grey.
That’s the end of it folks! Want any more data about TechCrunch? Just leave a comment, and I’ll pull the data up for ye!
Comments
TechCrunch link analysis « Scobleizer | October 11, 2007 | 5:38 pm
[…] link analysis Yuvi does another one of his great analysis of linking patterns on blogs. This time he’s looked into TechCrunch. Found that 1/3rd of all links on TechCrunch are back to TechCrunch itself! Heheh, I do the same […]
TechCrunch link analysis » Ecommerce Blog | October 11, 2007 | 5:46 pm
[…] does another one of his great analysis of linking patterns on blogs. This time he’s looked into TechCrunch. Found that 1/3rd of all links on TechCrunch are back to TechCrunch itself! Heheh, I do the same […]
Sam Sethi | October 11, 2007 | 5:49 pm
Hi
Good number analysis. I suspect that the conversational index is becoming less important due to people’s lack of time and attention to make comments on sites when they prefer to twitter/jaiku and link, which is what Scoble just did to make me read this. It’s interesting to see how TC supports its own with links and top sites only. The next metric we will all start to use is number of views and time on page per user, i.e. attention.
TechCrunch Data : The Last Podcast | October 11, 2007 | 5:55 pm
[…] (via Scoble) Haven’t had much time to parse this yet, but Yuvi did a great analysis of TechCrunch. […]
-gary | October 11, 2007 | 6:03 pm
When I clicked on the ChipIn! button at http://yuvipanda.chipin.com/computer I was redirected to a 404 at PayPal.
Oh well, maybe next time.
carnet | October 11, 2007 | 8:44 pm
It was a Paypal issue as they seem to be doing a major upgrade. We are seeing intermittent outages on *some* of our widgets. I just checked on this event and it seems to be working for me as of 1:45 pm PDT.
wlai | October 11, 2007 | 9:18 pm
yuvi: great analysis. do you mind sharing how you were able to gather the data for the link analysis? custom crawler or was it a tool that you used?
Skype Cracks » TechCrunch link analysis | October 12, 2007 | 1:09 am
[…] does another one of his great analysis of linking patterns on blogs. This time he’s looked into TechCrunch. Found that 1/3rd of all links on TechCrunch are back to TechCrunch itself! Heheh, I do the same […]
Frederic | October 12, 2007 | 6:20 am
Thanks for crunching those numbers again Yuvi. That’s pretty astonishing that TC now pretty much links to its own properties more than to anybody else.
Alexander van Elsas | October 12, 2007 | 8:03 am
Hi Yuvi,
nice work. Provides good insight into the picture no one ever tells
It seems to me that most of the mathematics in these rankings is about “who has the biggest…”. This, of course, being an important measure for advertisement revenues and ego. but as an unwanted side effect, it also helps the blogging sphere to become a bounded sterile surrounding, like a vacuum. Once a few sites have become sufficiently big enough it will lead to 2 unwanted things:
1. New blogs, no matter how hard they try will not succeed in becoming read by others, as their importance is diminished by the bigger ones.
2. Everyone links to the same sites, bringing the same “scoops” making the blogosphere more and more predictable and immune to new or creative thoughts (same blah blah right?)
So the question becomes, isn’t there a way to come up with a metric that will benefit every blogger, as well as preventing the blogosphere from becoming inert and boring?
I wrote a small proposal to change the metrics to something I would like better and called it after the scientist I borrowed the idea from. It’s called Newton’s Universal Law of Blog Attraction. Care to implement this and see what happens? Keep up the good work!
http://vanelsas.wordpress.com/2007/10/09/newtons-universal-law-of-blog-attraction-better-than-a-techmeme-leaderboard/
yuvipanda | October 12, 2007 | 2:45 pm
@Sam: I’d agree that the rise of Twitter and Facebook is reducing the value of comments, but ultimately, when it comes to attention data, comments demand the most attention from the user and so are the most valuable. It’s a pity that twitter doesn’t issue trackbacks.
@Gary: Heh. Talk about lost business value.
@Carnet: Thanks for clarifying. This isn’t the first time this issue has been reported…
@Wlai: I have built a generic tool that can take in XML files and crunch out stats like these (I call it Chinki after my friend). For each blog I want to analyse, I just write a small piece of scraper code that converts the entire blog to a single XML file. I’ll put up a post about this in a few days.
@Alexander: That’s a rather damn interesting theory you have in there! I was thinking along the same lines some time back, with the whole “universe” centering around “you”. I’ll certainly look into this when school permits.
Thanks yo, folks!
Why WordPress 2.3 tags are wrong | WinExtra | October 12, 2007 | 5:43 pm
[…] In amongst his well written overview was the reference to Yuvi the StatBot coder who did an analysis of TechCrunch which pointed out that it is one of the biggest tech blogs for linking back to itself rather than […]
iTablet.mobi » TechCrunch link analysis | October 16, 2007 | 1:25 am
[…] does another one of his great analysis of linking patterns on blogs. This time he’s looked into TechCrunch. Found that 1/3rd of all links on TechCrunch are back to TechCrunch itself! Heheh, I do the same […]
amazon » StatBot: TechCrunch Data Analysis | October 18, 2007 | 10:15 pm
[…] Read the rest of this great post here […]
cebeci » Blog Arşivi » A Lesson from Valleywag - Good Linking EtiquettesIndia Inc. » feature | October 19, 2007 | 9:05 am
[…] According to Yuvi, if you are to randomly click any link on TechCrunch, there’s a 50% chance that you’ll end up […]
Internal linking still sleazy : The Last Podcast | October 20, 2007 | 6:44 pm
[…] greatest offender, by the way, is TechCrunch. According to Yuvi’s numbers, they link to themselves almost 50% of the time since they started the CrunchBase, even though a […]
Internal Linking Explained | October 21, 2007 | 12:54 am
[…] was to do it even more than us; so you end up with a situation where some blogs begin to make 47% of their links internal, and frequently the only link in the post is to an internal page, not to the site being discussed. […]
Internal Linking Explained « ShortNet | October 21, 2007 | 1:05 am
[…] was to do it even more than us; so you end up with a situation where some blogs begin to make 47% of their links internal, and frequently the only link in the post is to an internal page, not to the site being discussed. […]
mark | October 21, 2007 | 2:23 am
Good review. Very insightful.
Internal Linking Explained | moraaz.org - feed all tech! | October 21, 2007 | 4:11 am
[…] was to do it even more than us; so you end up with a situation where some blogs begin to make 47% of their links internal, and frequently the only link in the post is to an internal page, not to the site being discussed. […]
» Internal Linking Explained Tech Web Daily: Just another Tech News Blog | October 21, 2007 | 4:44 am
[…] was to do it even more than us; so you end up with a situation where some blogs begin to make 47% of their links internal, and frequently the only link in the post is to an internal page, not to the site being discussed. […]
daily digital blog » A Lesson from Valleywag - Good Linking Etiquettes | October 24, 2007 | 11:30 am
[…] According to Yuvi, if you are to randomly click any link on TechCrunch, there’s a 50% chance that you’ll end up […]
Google Changing the PageRank Algorithm? | Crenk | October 24, 2007 | 5:33 pm
[…] all comes a week after the linking characteristics of Techcrunch was analysed. Where it was reported that 1/3 of all Techcrunch outgoing links where to related Techcrunch sites. Hence, link farms do […]
photobucket » StatBot: TechCrunch Data Analysis | October 27, 2007 | 4:42 am
[…] Read the rest of this great post here […]
My pitch for the Web 2.0 EXPO 2008 « Alexander van Elsas’s Weblog on new media & technologies and their effect on social behavior | October 31, 2007 | 11:42 am
[…] the advertiser, backed up by the ADHD “scoop seeking” tech blogger publishing about it (and other bloggers demystifying the popularity of that). To get out of this web 2.0 advertisement trap we need some lateral thinking and entrepreneurs […]
Night Dreaming (by Sudar) » Year End Stats for 2007 (Graphs) | January 2, 2008 | 6:22 pm
[…] I have really become addicted to stats now and will try to dig more deep into my blog to unearth other valuable information when I get some time. This explains why everybody was soo fascinated by Yuvi’s cool graphs. […]
Playstation Gams | October 9, 2008 | 11:44 pm
Amazing post, i love Techcrunch, great site isnt it?