I don't know much (read: anything) about this. LOL. I have heard of HTML, PHP, and CSS, but never got a chance to build a website.
Not so much buggy as quirky, i.e. it takes a fair amount of effort to learn their interface. Once learned, it handles a pretty good variety of the things you will find on individual web sites.
I'm using C# with Html Agility Pack => http://htmlagilitypack.codeplex.com/ in my desktop software to do this kind of stuff. On my server I've got some PHP & Perl scripts (+ cron). Some of my PHP scripts are here (they are old; I have since rewritten most of them in OO style with DOM parsing instead of regex, which is an ugly way to parse): http://213.227.70.223/public/php_code/YahooQuotes/quotes_update.phps http://213.227.70.223/public/php_code/
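To illustrate the DOM-style approach as opposed to regex, here is a minimal Python sketch using only the standard library's html.parser. It is not the code from the scripts linked above; the TableCellParser class, the parse_table helper, and the sample HTML (with made-up symbols AAA/BBB) are all hypothetical and only show the idea of walking tags instead of pattern-matching raw text.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(html):
    """Return the table rows found in an HTML string (hypothetical helper)."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page; a real quote page will have a different layout.
SAMPLE_HTML = """
<table>
  <tr><th>Symbol</th><th>Last</th></tr>
  <tr><td>AAA</td><td><b>12.50</b></td></tr>
  <tr><td>BBB</td><td>7.25</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in parse_table(SAMPLE_HTML):
        print(row)
```

Because the parser follows the tag structure, nested markup such as the <b> inside a cell does not break the extraction, which is where regex-based scraping tends to get fragile.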
Are you scraping news data for statistical significance? If you hit a site from the same IP a bunch of times you will get blocked. Most sites have a protocol for doing business with them, like RSS or some XML DTD.
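One common way to lower the risk of getting blocked is to space out your requests. Below is a minimal Python sketch of that idea; the Throttle class and its parameters are hypothetical names for illustration, and the right interval depends on the site (check its terms and robots.txt), so treat the numbers as assumptions.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests (illustrative)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock               # injectable so the class can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Sleep until min_interval has passed since the last call.

        Returns the number of seconds slept.
        """
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay
```

Usage would be to create one throttle per site, e.g. throttle = Throttle(2.0), and call throttle.wait() right before every page fetch, so the gap between requests never drops below the chosen interval.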
Sorry, didn't see your message before. Thanks for the tip!!! I am not going to scrape news feeds at the moment. My aim currently is to scrape some data on stocks from Yahoo/Google Finance, Finviz, and other interesting sites and see if I can make any sense of it.
Perl regular expressions and the Mechanize library. I am sure there are similar libraries in Ruby, Python ...
You might take a look at the Coursera course from Georgia Tech, "Computational Investing Part I". The first few weeks cover getting data from Yahoo/Google. The last part of the course was about using Python scripts to build a portfolio. There is also a lot of information on the class discussion forum: https://www.coursera.org/course/compinvesting1 Good luck!