Humor in the Making: Scraping Alexa
February 17, 2007 | 11:52 am
I was looking through Alexa’s HTML, trying to scrape out the Site Rank, and I found this:
<!--Did you know? Alexa offers this data programmatically. Visit http://aws.amazon.com/awis for more information about the Alexa Web Information Service.--><awT@Gf.X2><Email:><budf@opif.org><budf@opif.org>4</budf@opif.org></budf@opif.org></Email:></awT@Gf.X2><RO4><Reach per>3</Reach per></RO4><Reach><tqy><Traffic><DwE@Gg.aB>9</DwE@Gg.aB></Traffic></tqy></Reach><zxja><u5><u5><y3e5>,</y3e5></u5></u5></zxja><tprp>8</tprp><Page Views rank:><Reach per>4</Reach per></Page Views rank:><sss><Rank><RO4><pyp>1</pyp></RO4></Rank></sss></span>
Humor? The mangled, tangled spaghetti of tag soup here would confuse most HTML parsers and be rejected outright by every XML parser, but of course, for the determined scraper, regexes are always there to the rescue. :D
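To make the regex remark concrete, here is a minimal sketch in Python. It is not the exact expression I used at the time; it assumes the junk tags carry no meaning and that only the text between them matters. The function name `extract_rank` is made up for illustration, and the input is the markup quoted above.

```python
import re

# The markup quoted above, split across lines for readability.
snippet = (
    "<!--Did you know? Alexa offers this data programmatically. "
    "Visit http://aws.amazon.com/awis for more information about "
    "the Alexa Web Information Service.-->"
    "<awT@Gf.X2><Email:><budf@opif.org><budf@opif.org>4</budf@opif.org>"
    "</budf@opif.org></Email:></awT@Gf.X2><RO4><Reach per>3</Reach per></RO4>"
    "<Reach><tqy><Traffic><DwE@Gg.aB>9</DwE@Gg.aB></Traffic></tqy></Reach>"
    "<zxja><u5><u5><y3e5>,</y3e5></u5></u5></zxja><tprp>8</tprp>"
    "<Page Views rank:><Reach per>4</Reach per></Page Views rank:>"
    "<sss><Rank><RO4><pyp>1</pyp></RO4></Rank></sss></span>"
)


def extract_rank(html: str) -> int:
    """Hypothetical helper: recover the rank hidden in the tag soup."""
    # Assumption: anything between '<' and the next '>' is noise,
    # including the leading comment and the stray closing </span>.
    text = re.sub(r"<[^>]*>", "", html)
    # What remains is the rank with a thousands separator; keep digits only.
    digits = re.sub(r"\D", "", text)
    return int(digits)


print(extract_rank(snippet))
```

With the tags stripped, the leftover text is `439,841`, so the function returns 439841. No parser is involved, which is exactly why the bogus tag names do not matter here.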
P.S. I would have used the web service, but it costs money. Once I start making money, I’ll gladly pay for it, but till then…







Anand Sankaran | February 17, 2007 | 1:45 pm
:~). What can stop a determined geek? Nothing.
Aswin Anand | February 18, 2007 | 11:30 am
Happy birthday :D