YuviSense: Codin Kid

Yuvi, a 17-year-old wannabe geek from India.

StatBot: Analysis of Scobleizer.com – Part 0 General Statistics

December 30, 2006 | 4:49 am

After reading Sriram’s old post on building a search engine, I wanted to teach myself screen scraping. So I fired up Visual Basic 2005 Express Edition and, for no apparent reason, named the project PaleRash. After learning the hard way that most sites aren’t XHTML compliant (heck, not even the old Digg!), I looked around for a forgiving HTML parser for .NET and thankfully found the awesome HtmlAgilityPack by Simon Mourier.
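The post doesn’t include StatBot’s VB.NET source, so this is only a rough Python analogue of the scraping step, not the author’s code and not HtmlAgilityPack’s API. It assumes you already have a page’s HTML as a string; `LinkCollector` and `extract_links` are hypothetical names. Python’s standard `html.parser` is similarly forgiving: it doesn’t require well-formed XHTML, so unclosed and upper-case tags still come through.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return every link target on the page, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Deliberately sloppy, non-XHTML markup: upper-case tags, unclosed <p> and <a>.
page = '<P>First post <A HREF="http://example.com/a">link<p><a href="http://example.org/b">b'
print(extract_links(page))  # ['http://example.com/a', 'http://example.org/b']
```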

With the technical side settled, I needed a target: a simple blog with a semantically meaningful structure. The first name that popped into my head was Scoble.

I’ve been reading Scoble since I arrived in the blogosphere, mostly as a news source: a human filter who screens the news so that it won’t shake my faith in the Microsoft Religion, adds some opinions of his own, and boldly debunks the “facts” that heretics might use to lure away the faithful.

Anyway, after a weekend of coding, a week of debugging, and another weekend of rewriting, I finally had a base that could be modified to run link and word statistics on any blog. The results were interesting, to say the least.

Since cramming all of that into a single post would be a crime against humanity, I’m going to split it up. Here’s Part 0, just the lame general statistics. Stay tuned, for I have more in the works.

General Posting Statistics

I used all the posts on scobleizer.com from his switch to WordPress.com on 1 October 2005 (including the “Hello World” post) up to 22 December 2006, the point at which he went two days without posting. That gives us 447 days of blogging and 2405 posts, an average of 5.3 posts a day.
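The day count and posting rate are simple date arithmetic. A minimal Python check, reading the dates as day/month/year and taking the difference between the first and last dates (so the end date is not added as an extra day):

```python
from datetime import date

first = date(2005, 10, 1)   # 1 October 2005 (1/10/05, day/month/year)
last = date(2006, 12, 22)   # 22 December 2006 (22/12/06)
posts = 2405                # post count from the scrape

days = (last - first).days
print(days)                     # 447
print(round(posts / days, 2))   # 5.38
```

The exact rate is about 5.38 posts a day, so the 5.3 quoted above is the value cut off at one decimal place.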

What’s a statistics post without graphs? Fair warning: today’s charts aren’t very interesting. Here’s our first:


 

That worm’s pretty straight. So straight that after September I could barely see the trend line I drew! It was a bit rough in the early days, but it has pretty much stabilized now.

General Links Statistics

In those 2405 posts, he linked to 4992 pages, 4321 of them unique, spread across 1858 domains (including subdomains). That works out to an average of 11.16 links a day and about 2 links a post, with 86% of the links being unique.
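The link figures are counts and ratios over one flat list of outbound URLs. A minimal sketch, assuming each link occurrence has already been extracted as a URL string; `link_stats` is a hypothetical helper and the sample URLs are made up:

```python
from collections import Counter
from urllib.parse import urlsplit


def link_stats(links, posts, days):
    """Summarise outbound links; `links` holds one URL per link occurrence."""
    unique = set(links)
    # .hostname keeps subdomains distinct (blog.example.com vs example.com),
    # matching the "domains (including subdomains)" count.
    domains = Counter(urlsplit(url).hostname for url in links)
    return {
        "links": len(links),
        "unique_links": len(unique),
        "domains": len(domains),
        "links_per_day": len(links) / days,
        "links_per_post": len(links) / posts,
        "unique_share": len(unique) / len(links),
    }


# Hypothetical sample, not Scoble's data.
sample = [
    "http://a.example.com/1",
    "http://a.example.com/1",  # repeat: counts toward links, not unique_links
    "http://example.com/2",
    "http://b.example.com/3",
]
print(link_stats(sample, posts=2, days=1))
```

Plugging in the post’s totals (4992 links, 4321 unique, 2405 posts, 447 days) gives about 11.17 links a day, 2.08 links a post, and an 86.6% unique share, in line with the figures above.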

I love graphs!


Dang, another straight line! There’s a slight drop in linking frequency from May 2006 to September 2006, but it has since returned to normal proportions.

General Word Usage Statistics

He used 464,078 words in all, an average of 192 words a post and 1038 words a day. However, he used only 14,898 unique words, including derivatives, which means each word was repeated 31 times on average. In contrast, Shakespeare used 884,647 words in all, of which 31,534 were unique, repeating each word 28 times on average [source]. Translation: Shakespeare knew twice as many words, but Scoble wrote more than half as much as all of Shakespeare in under fifteen months! Had I considered his full blogging career, he might well have surpassed Shakespeare in quantity, but…
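Total words, unique words, and average repetitions all fall out of a single word-frequency count. A minimal sketch with an assumed tokenizer (runs of letters, allowing internal apostrophes, lowercased, no stemming, so “post” and “posts” count separately); `word_stats` is a hypothetical helper, not StatBot’s actual code:

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters, allowing internal apostrophes ("didn't").
WORD = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def word_stats(texts):
    """Return (total words, unique words, average repetitions per unique word)."""
    counts = Counter()
    for text in texts:
        counts.update(word.lower() for word in WORD.findall(text))
    total = sum(counts.values())
    unique = len(counts)
    return total, unique, total / unique


total, unique, repeats = word_stats([
    "I like Microsoft.",
    "You like Google; I like Microsoft too.",
])
print(total, unique, round(repeats, 2))  # 10 6 1.67
```

With the post’s totals, 464,078 / 14,898 ≈ 31.15 repetitions per word for Scoble and 884,647 / 31,534 ≈ 28.05 for Shakespeare.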

One more graph!


Another straight line! The number of words he uses is pretty constant. There’s a small jump in September, when he started writing longer posts, before he went back to his normal routine of shorter ones…

General Character Usage Statistics

He used 2,103,816 characters (or, more precisely, Unicode code points), including spaces, full stops, and all sorts of punctuation but excluding HTML markup. That makes an average of 874 characters a post and 4706 characters a day. Poor keyboard. Poor fingers. :) He used an average of 4 characters a word, which seems okay to me.
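Counting code points while skipping markup is straightforward in Python, where `len()` on a `str` counts code points rather than bytes. A minimal sketch, assuming each post’s HTML is available as a string; `TextOnly` and `visible_text` are hypothetical names:

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keeps text content and drops tags, so markup isn't counted."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &eacute; into real characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


text = visible_text("<p>Poor <b>keyboard</b>. Caf&eacute; :)</p>")
print(text)                       # Poor keyboard. Café :)
print(len(text))                  # 22 code points, spaces and punctuation included
print(len(text.encode("utf-8")))  # 23 bytes, because é takes two bytes in UTF-8
```

By the same arithmetic, the post’s totals give 2,103,816 / 464,078 ≈ 4.53 code points per word, spaces and punctuation included.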


Yippee!! Another straight line!

That’s it for today. As I said, today’s post is pretty tame. As my math teacher once said, “Data is not interesting; information is.” Today’s post is data. The coming posts are information. :D

Lots more charts to come!

You know those teasers they air at the end of almost every anime episode? Here’s the equivalent:

Teasers

  • Scoble linked more to Ze Frank than to the Scoble Show :D (the scobleshow.com domain, that is; it redirects to PodTech).
  • Before joining PodTech, he had linked to PodTech only 7 times. Today, PodTech is #3 by total number of links, and actually #2 if you count links to scobleshow.com, which redirects to PodTech.
  • In a move that confounds me, he used more closing parentheses than opening ones!
  • The top ten sites he links to make up about one fifth of his total links.
  • He has linked to typepad.com-hosted blogs more than to blogs on any other host. LiveJournal blogs are so low on the list you’d think they barely exist.
  • WordPress’s WYSIWYG editor saved him and his keyboard quite a lot of keystrokes. A lot.
  • He used the word “Microsoft” more than the word “Google” :D
  • He used the word “I” about twice as often as the word “you”. That egotistical bastard :D
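Several of these teasers boil down to counting. A hedged sketch of two of them on made-up data: the share of links going to the top N domains, and the balance of closing versus opening parentheses. The function names are hypothetical, and list markers like `1)` and smileys like `:)` both count as closing parentheses unless they are filtered out first.

```python
from collections import Counter


def top_share(domain_counts, n=10):
    """Fraction of all links that point at the n most-linked domains."""
    total = sum(domain_counts.values())
    top = sum(count for _, count in domain_counts.most_common(n))
    return top / total


def paren_balance(text):
    """Closing minus opening parentheses; positive means more ')' than '('."""
    return text.count(")") - text.count("(")


# Hypothetical counts, not Scoble's real numbers.
counts = Counter({"a.example": 5, "b.example": 3, "c.example": 2})
print(top_share(counts, n=2))                         # 0.8
print(paren_balance("1) one 2) two (see notes) :)"))  # 3
```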

Part I coming soon!

Any guesses on the most linked to site?

P.S. In case you’re wondering, I did respect the robots.txt file at scobleizer.com.
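Respecting robots.txt is easy from Python with the standard `urllib.robotparser` module. A minimal sketch using an inline, made-up robots.txt and example.com URLs rather than scobleizer.com’s actual file; the user-agent string is just a label:

```python
from urllib import robotparser

# Hypothetical rules; a real crawler would instead call
# rp.set_url("http://example.com/robots.txt") followed by rp.read().
rules = [
    "User-agent: *",
    "Disallow: /wp-admin/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# "StatBot" is the user-agent string the crawler identifies as.
print(rp.can_fetch("StatBot", "http://example.com/2006/12/22/some-post/"))  # True
print(rp.can_fetch("StatBot", "http://example.com/wp-admin/"))              # False
```

A crawler would check `can_fetch` before every request and skip any URL it returns `False` for.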

P.P.S. I have a whole mountain of data and am pretty short on ideas about what to do with it. Any suggestions?

P.P.P.S. Scoble’s now busy with a political campaign. Let’s see how long it takes him to find this post by an almost unknown Z-lister (if he ever does!)

Technorati tags: Scoble, Statistics, .NET, Screen Scraping, StatBot

 

Categories
StatBot, Tech

18 responses

Aswin Anand | December 30, 2006 | 10:31 am

This is amazing!! An image is really worth 1000 words :)

The Zeitgeist of Scoble « Scobleizer - Tech Geek Blogger | December 30, 2006 | 5:08 pm

[…] The Zeitgeist of Scoble Wow, a 15-year-old did a huge analysis of my blog. Some of the things he’s found: […]

Guy Pelletier | December 30, 2006 | 5:32 pm

Yuvi,
You are no longer a wanna-be geek, anybody that would dissect a blog as you have is definitely a geek!
I will have to subscribe just to find out everything anyone would ever need to know.
I would ask the following:
1. What post has the most response to it
2. What post did Robert comment the most on
3. How many posts have no comments (if any)
4. Will you upload and share your program

Krishna Kumar | December 30, 2006 | 6:21 pm

Great work, Yuvi. You should put your work as an application online. I would definitely like to use it to analyze my blog!

Geoff | December 30, 2006 | 6:42 pm

Brilliant analysis, well done! As #4 says, can we have the application?
Can you get some of your friends to analyse all of Hugh’s drawings at http://www.gapingvoid.com so we can search for the text he’s used?

Alfred Thompson | December 30, 2006 | 7:21 pm

Good stuff. I think that the information on his linking is the most interesting and useful BTW.

deannie | December 30, 2006 | 7:54 pm

I can answer the question on use of the close parens: often, when writing a list with numbers, one will do it this way:
1)
2)
3)

That is probably what produced the pattern you saw.

Michael Markman | December 30, 2006 | 7:58 pm

Awesome. Could the closing parens be smilies? :-)

What impresses me most (about Scoble, that is) is the consistency of his output. It’s almost as though he has cruise control.

Karoli | December 30, 2006 | 8:05 pm

Well done! I’m looking forward to the rest of your series.

Ron K Jeffries | December 30, 2006 | 8:19 pm

I don’t wish to be a bad influence, but I for one would pay a modest fee to have you analyze my blog site.

Yes, I’m serious.

Sharath | December 31, 2006 | 8:45 am

You got “Scobleized”. awesome!

yuvipanda | January 1, 2007 | 1:22 am

Thanks for all the comments guys…

@deannie: I think that’s it :D It didn’t occur to me…

@Michael: No, I excluded smileys.

@Aswin, Sharath: Thanks a lot my friends:) Thanks for the support…

sb | January 1, 2007 | 2:22 am

Kudos, Yuvi, on being Scobleized (the Olympic medal of blogging) :-)

Don’t be surprised if Kiruba comes knocking at your door someday….

Andrew Ferguson | January 2, 2007 | 2:07 am

Are you considering releasing your program?

PeterP | January 8, 2007 | 6:21 am

That’s the most fascinating post I’ve ever come across on the internet….

I have to ask two questions:

1) How long did this all take?

2) Why?

Fascinating though, thanks!

Life - from inside my head » Blog Archive » Having a smashing time | January 8, 2007 | 6:03 pm

[…] http://blog.yuvisense.net/2006/12/30/statbot-analysis-of-scobleiz ercom-%e2%80%93-part-0-general-statistics/ […]

Look at what Yuvi did « Mustafa Ulu | February 10, 2008 | 12:19 pm

[…] a 15-year-old living […]. Some time ago, a detailed analysis of the Scobleizer.com blog […]

Statbot visits Scoble at Twitter - The StatBot - Fun stats. Visualizations. Leaderboards. | May 1, 2008 | 1:53 pm

[…] per tweet. Still, just about half as many chars as found on his Wordpress blog when I last profiled it more than a year […]

Contact Me

Email: yuvipanda@gmail.com
IM: yuvipanda@msn.com
GTalk: yuvipanda
