StatBot: Analysis of Scobleizer.com – Part 0 General Statistics
December 30, 2006 | 4:49 am
After reading Sriram’s old post on building a search engine, I wanted to teach myself screen scraping. So I fired up Visual Basic 2005 Express Edition and, for no apparent reason, named the project PaleRash. Then, after learning the hard way that most sites aren’t XHTML compliant (heck, even the old digg!), I looked around for a .NETty forgiving HTML parser, and thankfully found the awesome HTMLAgilityPack by Simon Mourier.
Now, with the technical side settled, I needed a target: a simple blog with a semantically meaningful structure. And the first name that popped into my head was Scoble.
I’ve been reading Scoble since I arrived in the blogosphere, mostly as a news source: a human filter who screens the news so that it doesn’t shake my faith in the Microsoft Religion, adds some opinions of his own, and boldly debunks some of the facts that the heretics could use to lure away the faithful.
Anyway, so after a weekend of coding, a week of debugging, and another weekend of rewriting, I finally had a base which could be modified to run link and word statistics on any blog. The results were interesting, to say the least.
Now, since writing a single post about all that would be a crime against humanity, I’m going to split it up. So, here’s part zero, just the lame General Statistics. Stay tuned, for I have more in the works.
General Posting Statistics
I used all the posts from scobleizer.com, from the time he switched to Wordpress.com (1/10/05), including the “Hello World” post, till 22/12/06, the start of a two-day stretch in which he didn’t post. That gives us 447 days of blogging and 2405 posts, an average of 5.3 posts a day.
What’s a statistics post without graphs? I’ll forewarn you that today’s charts aren’t very interesting. Here’s our first:
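For anyone who wants to check the arithmetic, here is a minimal Python sketch (not StatBot's actual code, which was written in Visual Basic). It assumes the dates are written day/month/year, so 1/10/05 is 1 October 2005; under that reading the span comes out to the 447 days quoted above. The exact ratio is closer to 5.38, so the 5.3 figure looks truncated rather than rounded.

```python
from datetime import date

# Figures quoted in the post; dates are assumed to be day/month/year.
first_post = date(2005, 10, 1)
last_post = date(2006, 12, 22)
total_posts = 2405

# Number of days between the first and last post.
days = (last_post - first_post).days

# Average posts per day over that span.
posts_per_day = total_posts / days

print(days, round(posts_per_day, 2))
```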

That worm’s pretty straight. So straight, that after September, I could barely see the straight trend line I drew! So, though a bit rough in the early days, it has pretty much stabilized now.
General Links Statistics
In those 2405 posts, he’s linked to 4992 pages, 4321 of them unique, spread out across 1858 domains (including subdomains). That gives you an average of 11.16 links a day and 2 links a post, with 86% of links being unique.
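The link counts above can be reproduced in spirit with a small sketch like the one below. It is an illustration under assumptions, not the original tool: it uses Python's standard-library `html.parser` (which, like HTMLAgilityPack, tolerates malformed markup) instead of the .NET library, the names `LinkCollector` and `link_stats` are made up for this example, and the sample posts are invented. Hosts are compared in full, so subdomains count as separate domains, as in the figures above.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, tolerating sloppy HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def link_stats(posts_html):
    """Return (total links, unique links, unique hosts) for a list of post bodies."""
    links = []
    for html in posts_html:
        collector = LinkCollector()  # fresh parser per post
        collector.feed(html)
        collector.close()
        links.extend(collector.links)
    hosts = {urlparse(link).netloc.lower() for link in links}
    hosts.discard("")  # relative links have no host
    return len(links), len(set(links)), len(hosts)


# Invented sample posts: an unclosed <p> and a stray </div>.
sample = [
    '<p>See <a href="http://example.com/a">this</a> and '
    '<a href="http://blog.example.com/b">that</a>',
    '<p>Again <a href="http://example.com/a">this</a></div>',
]
total, unique, hosts = link_stats(sample)
print(total, unique, hosts)
```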
I love graphs!

Dang, another straight line! There’s a slight drop in linking frequency from May-06 to Sep-06, but then it’s back to normal proportions.
General Word Usage Statistics
He used 464,078 words in all, at an average of 192 words a post and 1038 words a day. However, he used only 14,898 unique words, including derivatives, meaning that each word was repeated, on average, 31 times. In contrast, Shakespeare used 884,647 words in all, of which 31,534 were unique, and repeated each word 28 times on average[source]. Translation: Shakespeare knew twice as many words, but Scoble wrote as much as half of all Shakespeare in a year and a half! I guess that maybe if I had considered his full blogging career, he might’ve surpassed Shakespeare in quantity, but…
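A rough sketch of how word totals like these can be computed is shown below. The definition of a "word" is an assumption (a run of letters or apostrophes, compared case-insensitively); the post does not say how StatBot tokenized text, and `word_stats` is a hypothetical name.

```python
import re
from collections import Counter


def word_stats(text):
    """Return (total words, unique words, average repeats per unique word)."""
    # Assumption: a word is a run of ASCII letters or apostrophes, case-insensitive.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    unique = len(counts)
    average_repeats = total / unique if unique else 0.0
    return total, unique, average_repeats


# Invented sample text.
sample = "I think I like blogs. You like blogs too."
total_words, unique_words, repeats = word_stats(sample)
print(total_words, unique_words, repeats)
```

The same ratio applied to the figures above is 464,078 / 14,898 for Scoble and 884,647 / 31,534 for Shakespeare.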
One more Graph!

Another Straight line! So, the number of words he’s using is pretty constant. There’s a small cliff in September, where he started writing bigger posts, before getting back to his normal routine of smaller posts…
General Character Usage Statistics
He used 2,103,816 characters (or, more precisely, Unicode code points), including spaces, full stops, and all sorts of punctuation, but excluding HTML markup. That makes an average of 874 characters a post and 4706 characters a day. Poor keyboard. Poor fingers. :) He used an average of 4 characters a word, which seems okay to me.
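Counting code points while ignoring markup can be sketched as follows. This is an assumption-laden illustration rather than the original method: it strips tags with Python's standard-library `html.parser`, and the names `TextExtractor` and `visible_text` are invented here. In Python, `len()` on a `str` counts Unicode code points, which matches the unit used above.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps only text nodes, dropping tags and attributes."""

    def __init__(self):
        # convert_charrefs defaults to True, so &amp; is delivered as "&".
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return the text of an HTML fragment with the markup removed."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "".join(extractor.chunks)


# Invented sample: one accented letter and one character reference.
sample = "<p>Caf\u00e9 &amp; blogs :)</p>"
text = visible_text(sample)
print(text, len(text))
```

Note that this simple extractor would also keep the contents of script and style elements, so a real run would need to skip those.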
Unicode Code Points

Yippee!! Another Straight Line!
That ends it for today. As I said, today’s post is pretty tame. As my Math teacher once said, Data is not interesting: Information is. Today’s post is data. Coming posts are information:D
Lot more charts to come!
You know those teasers they air at the end of every episode of most anime? Here’s the equivalent:
Teasers
- Scoble linked more to Ze Frank than to Scoble Show (The domain, that is. It redirects to podtech)
- Before joining PodTech, he’d linked only 7 times to PodTech. Today, PodTech is at #3 in the total number of links, and is actually #2 if you count in links to scobleshow.com, which redirect to PodTech.
- In a move that confounds me, he had more closing parentheses than opening ones!
- The top ten sites he links to make up about one fifth of his total links.
- He has linked more to typepad.com hosted blogs than any other. Livejournal blogs are so low on the list you’d think they barely exist.
- Wordpress’s WYSIWYG editor saved him and his keyboard quite a lot of keystrokes. A lot.
- He used the word Microsoft more than the word Google:D
- He used the word I about twice as much as the word You. That egoistical bastard:D
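The parenthesis teaser is easy to try out on any text. The sketch below is only a guess at the approach: the smiley pattern is an assumption (the post does not say how smileys were recognised), and `paren_balance` is a made-up name. The sample shows one plausible cause of a surplus of closing parentheses, namely numbered lists written as 1), 2), 3).

```python
import re

# Assumption: a smiley is ":" or ";", an optional "-", then "(" or ")".
SMILEY = re.compile(r"[:;]-?[()]")


def paren_balance(text):
    """Return (opening, closing) parenthesis counts after removing smileys."""
    cleaned = SMILEY.sub("", text)
    return cleaned.count("("), cleaned.count(")")


# Invented sample text with a numbered list and a smiley.
sample = "Three things (in order): 1) read 2) link 3) post :-)"
opening, closing = paren_balance(sample)
print(opening, closing)
```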
Part I coming soon!
Any guesses on the most linked to site?
P.S. In case you are wondering, I did respect the robots.txt file at scobleizer.com.
P.P.S: I have a whole mountain of data, and am pretty short on Ideas on what to do with it. Any suggestions?
P.P.P.S: Scoble’s now busy with a political campaign. Let’s see how long it takes till he finds this post by an almost unknown Z-lister (if he ever does!)
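On the robots.txt point from the first P.S.: a crawler can check its URLs against the rules before fetching anything. Here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules, the user-agent name "StatBot", and the post URL are all made up for illustration; the real robots.txt at scobleizer.com is not reproduced here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from memory rather than fetched.
rules = """\
User-agent: *
Disallow: /wp-admin/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Illustrative user-agent and URLs; only the second path is disallowed.
post_allowed = parser.can_fetch("StatBot", "http://scobleizer.com/2006/12/22/some-post/")
admin_allowed = parser.can_fetch("StatBot", "http://scobleizer.com/wp-admin/")
print(post_allowed, admin_allowed)
```

In a real crawl, `set_url()` followed by `read()` would fetch the live file instead of parsing a hard-coded string.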







Aswin Anand | December 30, 2006 | 10:31 am
This is amazing!! an image is really worth 1000 words
The Zeitgeist of Scoble « Scobleizer - Tech Geek Blogger | December 30, 2006 | 5:08 pm
[…] The Zeitgeist of Scoble Wow, a 15-year-old did a huge analysis of my blog. Some of the things he’s found: […]
Guy Pelletier | December 30, 2006 | 5:32 pm
Yuvi,
You are no longer a wanna-be geek, anybody that would dissect a blog as you have is definitely a geek!
I will have to subscribe just to find out everything anyone would ever need to know.
I would ask the following:
1. What post has the most response to it
2. What post did Robert comment the most on
3. How many posts have no comments (if any)
4. Will you upload and share your program
Krishna Kumar | December 30, 2006 | 6:21 pm
Great work, Yuvi. You should put your work as an application online. I would definitely like to use it to analyze my blog!
Geoff | December 30, 2006 | 6:42 pm
Brilliant analysis, well done. As #4 says, can we have the application?
Can you get some of your friends to analyse all of Hugh’s drawings at http://www.gapingvoid.com so we can do a search for the text he’s used?
Alfred Thompson | December 30, 2006 | 7:21 pm
Good stuff. I think that the information on his linking is the most interesting and useful BTW.
deannie | December 30, 2006 | 7:54 pm
I can answer the question on use of the close parens: often, when writing a list with numbers, one will do it this way:
1)
2)
3)
That is probably what produced the pattern you saw.
Michael Markman | December 30, 2006 | 7:58 pm
Awesome. Could the closing parens be smilies? :-)
What impresses me most (about Scoble, that is) is the consistency of his output. It’s almost as though he has cruise control.
Karoli | December 30, 2006 | 8:05 pm
Well done! I’m looking forward to the rest of your series.
Ron K Jeffries | December 30, 2006 | 8:19 pm
I don’t wish to be a bad influence, but I for one would pay a modest fee to have you analyze my blog site.
Yes, I’m serious.
Sharath | December 31, 2006 | 8:45 am
You got “Scobleized”. awesome!
yuvipanda | January 1, 2007 | 1:22 am
Thanks for all the comments guys…
@deannie: I think that’s it :D It didn’t occur to me…
@Michael: No, I excluded smileys.
@Aswin, Sharath: Thanks a lot my friends:) Thanks for the support…
sb | January 1, 2007 | 2:22 am
Kudos, Yuvi, on being Scobleized (the Olympic medal of blogging)!
Don’t be surprised if kiruba comes knocking at your door someday….
Andrew Ferguson | January 2, 2007 | 2:07 am
Are you considering releasing your program?
PeterP | January 8, 2007 | 6:21 am
That’s the most fascinating post I’ve ever come across on the internet….
I have to ask two questions:
1) How long did this all take?
2) Why?
Fascinating though, thanks!
Life - from inside my head » Blog Archive » Having a smashing time | January 8, 2007 | 6:03 pm
[…] http://blog.yuvisense.net/2006/12/30/statbot-analysis-of-scobleiz ercom-%e2%80%93-part-0-general-statistics/ […]
Bak şu Yuvi’nin yaptığına « Mustafa Ulu | February 10, 2008 | 12:19 pm
[…] a 15-year-old living [there]. A while back, a detailed analysis of the Scobleizer.com blog […]
Statbot visits Scoble at Twitter - The StatBot - Fun stats. Visualizations. Leaderboards. | May 1, 2008 | 1:53 pm
[…] per tweet. Still, just about half as many chars as found on his Wordpress blog when I last profiled it more than a year […]