StatBot visits The Old New Thing
February 15, 2007 | 1:29 pm
It’s been a long time since the last StatBot post, for a couple of reasons. First, I spent a long time writing a parser for the Wikipedia External Links dump, only to realize afterwards that it didn’t give me enough information to draw solid conclusions from. Then I started working on an Engadget vs Gizmodo StatBot post, and it has been a revelation. I violated Joel’s law (don’t rebuild anything that works) when I tried to do everything from scratch rather than just modifying the Scobleizer Scrappr. It didn’t turn out well: I had to fix bugs I had already fixed elsewhere, and I inevitably introduced more bugs than I fixed. Lesson learnt :) Not to mention that school kept getting in my way…
Anyway, after a few hours trying to get the Gizmodo Scrappr working (their pagination system truly, really sucks hardcore), I decided to give up for the day and do something else. That brought me to the first blog I read regularly: The Old New Thing by uber-programmer Raymond Chen.
So, what did I do? After an hour and a half of hacking around, I had modified the Scobleizer’s Scrappr (PaleRash) into a Raymond Chen Scrappr (RayRash), with one caveat: HTMLAgilityPack seemed to crap out whenever he used tables, so I had to skip four posts. And here’s my tribute to the man who’ll forget more about Windows and Win32 programming than I’ll ever know.
But before that…
Happy Birthday to me! I know this comes some 6 days late, but then, I’m lazy and have been somewhat sick, so sorry for the delay. This year has been fantastic for me: I moved to my own hosting space, interviewed Sriram Krishnan, finally decided my goal in life is to work at Microsoft, saw this blog get famous, made a lot of friends, bought a cam, and went to a camp that’s changed my life. Thanks, life, and thanks to all the people who’ve made this possible.
This year, I got my first ever Birthday Greeting, from my friend Abu. Thanks Abu!
My friends celebrated my birthday at school the day before yesterday, along with a friend’s birthday. My first cake at school, and I have a short clip of it here. Wanna see what after-school life is like here? :D [I’m the ugliest guy you could see in the video]
[Warning: If you do not like shaky, stupid, non-professional, amateur-looking videos made to be shared with friends and then made public, please skip right ahead]
[Warning: Non-English Language Ahead]
Wishlist? An MSDN Subscription is the perfect gift for a wannabe .NET geek!
Also, happy Birthday to David!
General Statistics
I analyzed all of Raymond’s blog posts from July 21, 2003 to February 9, 2007, a total of 1,298 days, or more than three and a half years. The corpus consisted of 1,490 posts, containing 2,963,695 characters and 908,498 words. He averaged 1.14 posts a day, with an average of 609 words and 1,989 characters a post. That works out to 2,283 characters and 700 words a day.
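The averages above are plain ratios of the totals. Here is a minimal sketch that recomputes them from the reported figures (the 2,963,695 characters are written 29,63,695 in Indian digit grouping). The `averages` helper is hypothetical, not part of the Scrappr. Note that 1,490 / 1,298 ≈ 1.148, so the quoted figures look truncated rather than rounded:

```python
# Totals as reported for July 21, 2003 to February 9, 2007.
POSTS = 1490
DAYS = 1298
CHARS = 2_963_695
WORDS = 908_498

def averages(posts, days, chars, words):
    """Return the per-post and per-day averages used in the summary (hypothetical helper)."""
    return {
        "posts_per_day": posts / days,
        "words_per_post": words / posts,
        "chars_per_post": chars / posts,
        "chars_per_day": chars / days,
        "words_per_day": words / days,
    }

if __name__ == "__main__":
    for name, value in averages(POSTS, DAYS, CHARS, WORDS).items():
        print(f"{name}: {value:.2f}")
```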
Posting Frequency
Our First Chart for the Day!
Not much of a difference throughout the years, though it peaked in May 2004. It has been flat and consistent since May 2005.
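A chart like this is just a per-month tally of publish dates. Here is a minimal Python sketch. It assumes the scraper hands back each post’s publish time as a `datetime`. The original Scrappr was written in VB 05, and the `posts_per_month` helper and sample timestamps are made up for illustration:

```python
from collections import Counter
from datetime import datetime

def posts_per_month(timestamps):
    """Count posts per calendar month; returns {"YYYY-MM": count} in chronological order."""
    counts = Counter(ts.strftime("%Y-%m") for ts in timestamps)
    # "YYYY-MM" strings sort chronologically, so a plain sort is enough.
    return dict(sorted(counts.items()))

if __name__ == "__main__":
    sample = [datetime(2004, 5, 3, 7, 0), datetime(2004, 5, 4, 7, 0), datetime(2004, 6, 1, 7, 0)]
    print(posts_per_month(sample))  # {'2004-05': 2, '2004-06': 1}
```

Months with no posts simply don’t appear in the result, so a plotting step would need to fill those gaps with zeros.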
Is Raymond a Bot?
Note: This section is a piece of poorly attempted humor.
The question seems natural, doesn’t it? (Yes, I agree it isn’t natural.) How the heck can one human brain possibly hold so much information and remember so much history while still hacking around complicated code? But what if Raymond Chen were actually a bot running somewhere, aggregating all those pieces of information and then posting them…
What could we do to detect such a bot? Regularity, of course!
The 7 o’clock syndrome! A whole 77% of all posts were posted at 7 AM. Whassup with 7? :D
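Checking for that kind of regularity boils down to bucketing publish times by hour. Here is a sketch that assumes publish times arrive as `datetime` objects in whatever time zone the blog displays. A different zone would shift the peak hour. The `share_by_hour` helper and the sample data are hypothetical:

```python
from collections import Counter
from datetime import datetime

def share_by_hour(timestamps):
    """Map each hour of the day (0-23) to the fraction of posts published in it."""
    counts = Counter(ts.hour for ts in timestamps)
    total = sum(counts.values())
    return {hour: n / total for hour, n in sorted(counts.items())}

if __name__ == "__main__":
    sample = [
        datetime(2005, 3, 1, 7, 0),
        datetime(2005, 3, 2, 7, 0),
        datetime(2005, 3, 3, 7, 0),
        datetime(2005, 3, 4, 13, 30),
    ]
    print(share_by_hour(sample))  # {7: 0.75, 13: 0.25}
```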
I kindly request Raymond Chen to furnish us all with a good photograph of himself. No, shaky Channel 9 vids or strange pics of you posing like a girl don’t count.
Links
Being an absolutely technical writer, he doesn’t link much. In fact, all those 1,490 posts contain only 640 links, which means that on average each post contains about 0.4 links (:)). So a whole lot of posts don’t have even a single link. To be exact, 1,136 of his posts, or 76%, have no links in them.
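Counting links per post mostly means counting `<a href>` tags. The original Scrappr used HTMLAgilityPack from VB. As a stand-in, here is a sketch using Python’s standard `html.parser` module. The `LinkCounter`, `links_in` and `link_stats` names are hypothetical. With the totals above, 640 / 1,490 ≈ 0.43 links per post, and 1,136 / 1,490 ≈ 76% of posts have no link:

```python
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Collects the href of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip named anchors such as <a name="top">
                self.hrefs.append(href)

def links_in(post_html):
    """Return the hrefs of all links in one post's HTML."""
    parser = LinkCounter()
    parser.feed(post_html)
    parser.close()
    return parser.hrefs

def link_stats(posts):
    """Return (total links, average links per post, share of posts with no links)."""
    per_post = [len(links_in(html)) for html in posts]
    total = sum(per_post)
    return total, total / len(posts), per_post.count(0) / len(posts)

if __name__ == "__main__":
    posts = [
        "<p>No links here.</p>",
        '<p>See <a href="http://www.npr.org/x">NPR</a> and <a href="http://msdn.com/y">MSDN</a>.</p>',
    ]
    print(link_stats(posts))  # (2, 1.0, 0.5)
```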
Here’s a graph.
Maybe fewer links = more content?
Anyway, here’s our graph of his linking frequency:
After a single link in his opening post, the next link came a month later. Linking sort of increased from November 2003 to November 2004, and has been pretty stable since.
Here are the top 10 sites he links to:
| Rank | Site | Links |
| 1 | Old New Thing | 75 |
| 2 | MSDN | 69 |
| 3 | NPR.org | 22 |
| 4 | Other MSDN Blogs | 20 |
| 5 | Microsoft.com | 13 |
| 6 | weblogs.asp.net | 13 |
| 7 | Wikipedia | 6 |
| 8 | blog.ryjones.org | 4 |
| 9 | support.microsoft.com | 4 |
| 10 | metafilter.com | 4 |
And, here’s our colorful chart:
No surprises here for me, except NPR.org (National Public Radio), which is also home to the single most-linked URL, occurring 5 times in 5 posts.
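A table like this comes from tallying the collected links by host name. Here is a minimal sketch using `urllib.parse.urlparse` and `collections.Counter`. The `top_hosts` helper and its `www.` stripping are illustrative choices, not the Scrappr’s documented behavior. Some rows, such as “Old New Thing” versus “Other MSDN Blogs”, look like they were split by URL path rather than by host. A pure host tally would need an extra mapping step to reproduce those rows:

```python
from collections import Counter
from urllib.parse import urlparse

def top_hosts(hrefs, n=10):
    """Tally absolute links by host and return the n most common (host, count) pairs."""
    hosts = []
    for href in hrefs:
        host = urlparse(href).netloc.lower()
        if host.startswith("www."):
            host = host[len("www."):]  # treat www.npr.org and npr.org as one site
        if host:  # relative links like "/archive" have no host
            hosts.append(host)
    return Counter(hosts).most_common(n)

if __name__ == "__main__":
    links = ["http://www.npr.org/a", "http://npr.org/b", "http://en.wikipedia.org/wiki/X", "/archive"]
    print(top_hosts(links))  # [('npr.org', 2), ('en.wikipedia.org', 1)]
```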
Code Blocks
Raymond writes code. He’s a programmer. How much code is there in his blog posts?
Of the 1,490 posts, only 444 contained at least one <code> block (which he uses for inline code, i.e. function names, etc.) and only 71 contained at least one <pre> block (which he uses for code samples). This means about 30% of his posts contain inline code (<code> blocks), while only 5% contain code samples (<pre> blocks).
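Detecting code is a matter of checking which tags appear in each post. A post counts as having inline code if any `<code>` tag appears in it, and as having a code sample if any `<pre>` tag does. Here is a sketch using Python’s `html.parser`. The `TagFinder` and `code_shares` names are hypothetical. The ratios check out: 444 / 1,490 ≈ 29.8% and 71 / 1,490 ≈ 4.8%:

```python
from html.parser import HTMLParser

class TagFinder(HTMLParser):
    """Records every tag name that opens anywhere in the HTML fed to it."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        self.seen.add(tag)

def code_shares(posts):
    """Return (share of posts with a <code> tag, share of posts with a <pre> tag)."""
    with_code = with_pre = 0
    for html in posts:
        finder = TagFinder()
        finder.feed(html)
        finder.close()
        with_code += "code" in finder.seen  # True/False count as 1/0
        with_pre += "pre" in finder.seen
    return with_code / len(posts), with_pre / len(posts)

if __name__ == "__main__":
    posts = [
        "<p>Call <code>CreateWindowEx</code> first.</p>",
        "<pre>int x = 0;</pre>",
        "<p>Just a story.</p>",
        "<p>Another story.</p>",
    ]
    print(code_shares(posts))  # (0.25, 0.25)
```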
Here are a couple of graphs showing the percentage of posts with and without code and with and without Code Sample Blocks.
Also, it looks like his posts contain fewer code samples as time goes by.
That doesn’t really matter to me, since that Win32 code is over my head (I was introduced to Windows programming with VB6 in 2003, and migrated to VB 05 in 2005). And maybe it’s also a good thing for people like me, since I usually scroll right past any code samples he has, and many like me read him more for his writing, the trivia, and the reasons why things are the way they are than for the code.
Linking affecting his Code Samples?
The number of links increased at about the same time that the number of code samples per post started to come down. Correlation?
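One way to put a number on that question is Pearson’s correlation coefficient between two monthly series, say links per post and code samples per post. The post doesn’t publish the monthly figures, so the `pearson` helper below is a generic sketch and the inputs in the usage lines are made up. A value near -1 would mean the two series moved in opposite directions. It says nothing about which one caused the other.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of the same length, at least 2 points each")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (spread_x * spread_y)

if __name__ == "__main__":
    links_per_post = [0.1, 0.3, 0.4, 0.5]        # made-up monthly values
    samples_per_post = [0.12, 0.08, 0.05, 0.03]  # made-up monthly values
    print(round(pearson(links_per_post, samples_per_post), 3))
```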
Technical Words
I could give you the list of the top ten words he used, but it would bore you to death. So instead, here are the most frequent technical words he’s used, along with each word’s rank among all words:
| Rank | Word | Occurrences |
| 30 | { | 1765 |
| 31 | } | 1733 |
| 43 | windows | 1367 |
| 51 | program | 1221 |
| 52 | window | 1208 |
| 61 | function | 967 |
| 72 | return | 866 |
| 75 | hwnd | 834 |
| 79 | file | 784 |
| 80 | message | 783 |
| 81 | code | 778 |
| 86 | dialog | 728 |
| 93 | memory | 659 |
| 99 | user | 610 |
| 103 | control | 572 |
| 106 | system | 532 |
| 107 | example | 530 |
| 114 | class | 496 |
| 116 | menu | 483 |
| 120 | null | 477 |
| 123 | case | 469 |
| 128 | call | 461 |
| 134 | programs | 445 |
| 136 | int | 441 |
| 139 | thread | 439 |
| 142 | text | 434 |
| 148 | data | 422 |
| 153 | void | 410 |
| 159 | address | 370 |
| 164 | handle | 355 |
| 181 | object | 326 |
| 185 | version | 321 |
It’s funny how the opening and closing braces are the most used technical words, outdoing ‘windows’ by around 400 occurrences.
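The post doesn’t say how the text was split into words. Still, braces ranking as “words” suggests a plain whitespace split, where a `{` or `}` standing on its own becomes a token of its own. Here is a sketch under that assumption. The `ranked_words` helper is hypothetical, and the step of picking out the technical words from the ranking isn’t shown:

```python
from collections import Counter

def ranked_words(texts):
    """Lowercase, split on whitespace, and rank tokens by frequency (rank 1 = most common).

    Punctuation stays attached to neighbouring characters, so "return;" and
    "return" count as different tokens, while a lone "{" is a token by itself.
    """
    counts = Counter(token.lower() for text in texts for token in text.split())
    return [(rank, word, n) for rank, (word, n) in enumerate(counts.most_common(), start=1)]

if __name__ == "__main__":
    sample = ["if (hwnd == NULL) {\n    return FALSE;\n}", "HWND hwnd = NULL; { }"]
    for rank, word, n in ranked_words(sample)[:3]:
        print(rank, word, n)
```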
Interesting occurrences
Some words can turn out to be funny for the nitpicker. So, let’s pick a few interesting words and see which words lie alongside them (i.e. words used about as often).
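The post doesn’t define “alongside” precisely. Chen and Apple both occur 5 times yet come with different neighbor lists, so one plausible reading is “the words sitting next to it in the frequency ranking” rather than “words with exactly the same count”. Here is a sketch under that assumption. The `neighbors` helper, its `window` parameter and the toy data are hypothetical:

```python
def neighbors(ranked, target, window=4):
    """Return the words within `window` ranks of `target` in a frequency-ranked list.

    `ranked` is a list of (rank, word, count) tuples sorted by rank.
    Returns an empty list if `target` never occurs.
    """
    position = {word: i for i, (_, word, _) in enumerate(ranked)}
    i = position.get(target)
    if i is None:
        return []
    lo, hi = max(0, i - window), min(len(ranked), i + window + 1)
    return [word for _, word, _ in ranked[lo:hi] if word != target]

if __name__ == "__main__":
    # Toy data, not the real counts from the corpus.
    ranked = [(1, "the", 900), (2, "window", 120), (3, "raymond", 92), (4, "global", 92), (5, "hook", 91)]
    print(neighbors(ranked, "raymond", window=1))  # ['window', 'global']
```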
Let’s start out with, well, Raymond? It occurs 92 times, along with ‘discussion’, ‘global’, ‘ptr’, ‘hook’, ‘hmenu’, ‘simply’, ‘solid’ and ‘cpu’, allowing me to claim fame by associating the phrase “Simply Solid CPU” with Raymond.
Vista? 56 times, along with ‘cost’, ‘seperate’, ‘slow’, ‘apparently’, ‘hardware’, ‘pay’, ‘magic’ and ‘overflow’. Coincidences are funny, ain’t they?
Chen? Just 5 times (I guess he doesn’t use his full name often), along with ‘blew’, ‘poking’, ‘rumors’, ‘movement’, ‘usual’, ‘severity’, ‘lacks’, ‘independently’, ‘circuit’, ‘volunteers’ and ‘hacks’, maybe suggesting that the rumor that Raymond Chen is actually a bot built by volunteers independently hacking together circuits is actually credible.
And Linux? It occurred 10 times, right in the neighborhood of ‘podcast’, ‘forms’, ‘tired’, ‘amazingly’, ‘neat’, ‘acting’, ‘suffer’, ‘excitement’, ‘hyperthreading’, ‘cd-rom’, ‘angeles’ (;-?), ‘steve’, ‘spare’, ‘plans’, ‘jenny’, ‘assist’, ‘marker’, ‘accepts’, ‘preliminary’, ‘confuse’, ‘illustration’, ‘backup’, ‘biggest’, ‘rundll32’, ‘tape’, ‘suddenly’, ‘latency’, and a host of others that’ll get me flamed.
And Apple? 5 times, in the neighborhood of ‘cheaper’ (ironic?), ‘purchase’, ‘lpparam’, ‘lpwindowname’, ‘legalcopyright’, ‘blocked’, ‘multi-processor’, ‘chemical’, ‘artist’, ‘adult’ (;)), ‘midnight’, ‘corrupts’, ‘distracted’, ‘fancier’, ‘ancient’, ‘cookbooks’, ‘pbit’, ‘lvm’, ‘critsec’, ‘destruct’, ‘mbstate’, ‘misnomer’, and a dozen others that would be a waste of your time, and of the tubes, to list.
Want some more? Leave a comment, and if it’s interesting enough, I’ll post more interesting occurrences for you…
Disclaimer
These numbers, while interesting, are just quantitative. Remember, people, and consequently the things they do, are too complex to be reduced to simple numbers. This “reduce to a simple number” concept is ruining education; don’t let it ruin you!
Just remember, there are three types of lies: Lies, School Books and Statistics. Raymond’s writing is priceless, and his writing style influenced me quite a bit. Thanks Raymond, and rock on! I’ll buy your book when I have the money.
And I’m thinking of doing one StatBot post a month, since that will give me enough time to learn more languages (I plan on doing Perl and Ruby) and build some apps (I’m thinking of an RSS reader in WPF…).
deannie | February 15, 2007 | 6:58 pm
Time to update your masthead! Fun video.
David Wilkinson | February 15, 2007 | 8:43 pm
Thanks for the birthday wishes Yuvi! I may be a little on the late side, but right back at you!
Love the squirty cream… Wish I had been there!
Nice one Yuvi!
Anando | February 15, 2007 | 8:56 pm
I once had the opportunity to talk to Raymond personally, and he told me that he writes all his blog posts in advance and has automated the task of posting and hence you see all his posts pretty much at the same time of the day. Also, his posts will appear when he is on vacation or else away from his computer. Hope that explains the 7AM syndrome.
Aaron Axvig | February 16, 2007 | 1:13 am
I think you should program something that will sync my Windows RSS feed-store between my computers. The programming should be interesting, but I think the real fun comes with deciding where to store the master data (unless you plan on just having one computer’s feed-store be the master?). Some ideas I have:
A shared directory on a computer or server.
A G-Mail account. I have no idea how this works, but evidently it has been extended to be a file storage “drive,” so maybe it’s feasible for files storing RSS data.
An FTP account.
A WebDAV folder.
So of course for this to be useful, this thing needs to install or whatever on my computer. Then it asks whether I already have a server location, or I would like to make a new one. Let’s say this time I’m making a new one. It figures out which feeds I have and what’s been read, and then pushes all that info out to the “sync location.” Whether it actually puts the entire RSS data out there or only records what’s been read, I don’t care. After this initial operation it will both download and upload changes (and add new feeds in either direction too).
Then I install it on my other computer, tell it that this time I already have a sync location set up, and it pulls down all the feed info (and maybe the actual feeds too) and loads them up in the Windows RSS store.
Anyways, that would be really cool. If you are not so ambitious, another useful thing might be a small background process to sync the read status of RSS feeds between Outlook 2007 and IE7. I haven’t tried it myself, but I’ve heard that it doesn’t work.
Monti Tredway | February 16, 2007 | 4:58 pm
I once met the man in Bldg 5 when I was a tester on the Win 95 team. He’s either a very believable android, or a very smart man, maybe he’s a cyborg…
STUFFLEUFAGUS - “A true friend stabs you in the front” - Oscar Wilde » Comments Galore | February 17, 2007 | 4:43 am
[…] OK, it’s this simple - I ask people for their opinion - at least three times more often than the other bloggers that I was talking to (from a brief glance at there blogs - no detailed analysis like Yuvi does). Could it really be so easy to get comments as to just ASK for them? […]
PeterP | February 18, 2007 | 2:14 am
I love it when you analyze sites!
This one’s great!
Peter
engtech | February 20, 2007 | 8:55 pm
Yuvi, a question.
What language do you use for your Scrappr, and do you manually post-process the data into charts/graphs or do you have another script that does that for you?
yuvipanda | February 21, 2007 | 3:52 am
@engtech: I used VB 05, simply because I’m more experienced and comfortable with it. And, yes I manually post process them, but I do have another script which extracts the Raw Data, i.e. The numbers. For example, I have a script which punches out the links to a specific host over time, but It just returns numbers which Excel turns into graphs…
Thanks for droppin by!
UmeshUnni | February 24, 2007 | 8:51 am
Sweet stuff…Amusing that you called { and } technical words
Maybe you should have counted ; too 
Alex | April 25, 2007 | 11:12 am
Thank You
YuviSense: Codin Kid » Analysing Engadget Part I - Posts, Words, Comments & Categories | May 29, 2007 | 2:03 pm
[…] data for the analysis of Robert Scoble’s Blog(and linkblog) along with that of several others (Raymond Chen, Matt Cutts, Kamla Bhatt and Rob La Gesse), I needed something big to experiment with. Really big. […]
Analyzing Engadget's Statistics | May 29, 2007 | 9:43 pm
[…] It’s not the first time he’s done such a great statistical analysis either. He’s also profiled Robert Scoble and his linkblog, Matt Cutts, and Raymond Chen. […]
Introducing The Statbot - The StatBot - Fun stats. Visualizations. Leaderboards. | May 1, 2008 | 1:52 pm
[…] accident who loves it now. I’ve previously published stats about Engadget, Scoble(Linkblog), Raymond Chen, Matt Cutts, Louis Gray, TechCrunch, Digg & Techmeme. My personal blog is here, and this place […]