How to do web scraping

Discussion in 'App Development' started by gmst, Feb 7, 2013.

  1. gmst

    Hey, thanks for the link. Looks promising, even though you say it is buggy. Will check it out.
     
    #11     Feb 7, 2013
  2. gmst

    I don't know much (read anything) about this. LOL.

    I have heard of HTML, PHP, and CSS, but never got a chance to build a website.
     
    #12     Feb 7, 2013


  3. Not so much buggy as quirky, i.e. it takes a fair amount of effort to learn their interface.
    Once learned, it handles a pretty good variety of things you will find on individual web sites.
     
    #13     Feb 7, 2013


  4. Have fun!
     
    #14     Feb 7, 2013
  5. guest2

    I'm using C# with Html Agility Pack (http://htmlagilitypack.codeplex.com/) in my desktop software for this kind of stuff.

    On my server I've got some PHP and Perl scripts (plus cron).

    Some of my PHP scripts are here. They are old; I have since rewritten most of them in OO style using DOM parsing instead of regex, which is an ugly way to parse HTML.

    http://213.227.70.223/public/php_code/YahooQuotes/quotes_update.phps
    http://213.227.70.223/public/php_code/
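
    To illustrate the point about using a real HTML parser instead of regex, here is a minimal sketch in Python. It is not the poster's PHP code. It uses the standard-library html.parser module, which is an event-based parser rather than a full DOM, as a stand-in for the same idea. The names TableCellParser, extract_rows, and SAMPLE_HTML, along with the sample markup and tickers, are made up for illustration.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, None outside <tr>
        self._cell = None   # text fragments of the current cell, None outside <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> still arrives here.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table rows found in an HTML string as lists of cell text."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup; a real quotes page will look different.
SAMPLE_HTML = """
<table>
  <tr><td>AAA</td><td class="price">10.50</td></tr>
  <tr><td>BBB</td><td class="price"><b>7.25</b></td></tr>
</table>
"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(row)
```

    The second row shows why this holds up better than a regex: the extra <b> tag and the class attribute do not change the extraction logic.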
     
    #15     Feb 7, 2013
  6. gmst

    #16     Feb 7, 2013
  7. Are you scraping news data for statistical significance? If you hit a site from the same IP a bunch of times, you will get blocked. Most sites have a protocol for doing business with them, like RSS or some XML DTD.
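
    Both suggestions above can be sketched in Python with only the standard library: read a site's RSS feed instead of its HTML, and space out requests so the same IP does not hammer the server. This is a rough sketch under assumptions. The names parse_rss_items, Throttle, and SAMPLE_RSS are hypothetical, the feed content is invented, and the two-second interval is an arbitrary example, not a value any site publishes.

```python
import time
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return (title, link) pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items


class Throttle:
    """Enforce a minimum gap, in seconds, between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None     # time of the previous call to wait(), if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Invented feed content for illustration only.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First headline</title><link>http://example.com/1</link></item>
    <item><title>Second headline</title><link>http://example.com/2</link></item>
  </channel>
</rss>
"""

if __name__ == "__main__":
    throttle = Throttle(min_interval=2.0)  # arbitrary example interval
    for title, link in parse_rss_items(SAMPLE_RSS):
        throttle.wait()  # a real script would fetch `link` after this call
        print(title, link)
```

    Calling throttle.wait() before every request keeps the request rate bounded no matter how fast the rest of the loop runs.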
     
    #17     Feb 8, 2013
  8. gmst

    Sorry, I didn't see your message before. Thanks for the tip!

    I am not going to scrape news feeds at the moment. My aim currently is to scrape some data on stocks from Yahoo/Google Finance, Finviz, and other interesting sites and see if I can make any sense of it.
     
    #18     Feb 8, 2013
  9. 2rosy

    Perl regular expressions and the Mechanize library.
    I am sure there are similar libraries in Ruby, Python ...
     
    #19     Feb 9, 2013
  10. slacker

    You might take a look at the Coursera course from Georgia Tech, "Computational Investing Part I". The first few weeks cover getting data from Yahoo/Google. The last part of the course uses Python scripts to build a portfolio. There is also a lot of information on the class discussion forum....

    https://www.coursera.org/course/compinvesting1

    Good luck
     
    #20     Feb 9, 2013