How to do web scraping

Discussion in 'Programming' started by gmst, Feb 7, 2013.

  1. gmst


    Can anyone advise which language and which software to use to write a basic web scraper? Also, how involved is this work? I have no prior experience with this.

    My goal is to copy and paste specific information from some websites into an Excel sheet every 1 to 5 minutes. Thanks.
  2. Eyez


    Have you tried "import data from websource" in Excel?
  3. Maybe try ubot.
  4. gmst


    Ah, I missed it. Thanks for pointing it out :)

    I will give it a try and report back on my experience.
  5. Eyez


    np, I use it to pull Treasury rates from Yahoo Finance to price out the curve.
  6. gmst


    Thanks, never heard of it before :)

    The Standard edition costs $245. I will see if an Excel web query can do the job.
  7. If you use C# on .NET, it takes only 3 lines to download the HTML from a URL into a string. Then you can parse it any way you want, and if you know anything about parsing, most of it is easy to code. If it's not HTML, then it's harder.

    Another advantage is that you can iterate through many URLs easily, in case not all the info is on a single page.

    Here's code for VB:

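    ' Download the HTML at a URL into a string
    Dim client As New System.Net.WebClient
    Dim html As String
    ' Put the address of the page to fetch between the quotes
    html = client.DownloadString("")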
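
    For the "parse it" and "iterate through many URLs" parts, a rough sketch in VB might look like the following. Everything specific in it is made up for illustration: the two example.com URLs, the regex pattern (a table cell with class "rate"), the rates.csv file name and the ExtractFirst helper. A regex is only the simplest thing that can work here, not a real HTML parser, and it will break whenever the page markup changes.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text.RegularExpressions

Module ScrapeSketch

    ' Hypothetical helper: returns the first capture group of the pattern,
    ' or Nothing if the page does not match.
    Function ExtractFirst(html As String, pattern As String) As String
        Dim m As Match = Regex.Match(html, pattern,
                                     RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then
            Return m.Groups(1).Value.Trim()
        End If
        Return Nothing
    End Function

    Sub Main()
        ' Placeholder URLs (assumption): replace with the pages you actually want.
        Dim urls As String() = {"http://example.com/page1", "http://example.com/page2"}

        ' Assumed markup: the value sits in <td class="rate">...</td>.
        Dim pattern As String = "<td[^>]*class=""rate""[^>]*>(.*?)</td>"

        Using client As New WebClient()
            For Each url As String In urls
                ' DownloadString throws a WebException if the request fails.
                Dim html As String = client.DownloadString(url)
                Dim value As String = ExtractFirst(html, pattern)

                ' Append one row per page: timestamp, URL, extracted value.
                ' Excel can open the resulting .csv file directly.
                File.AppendAllText("rates.csv",
                    String.Format("{0:s},{1},{2}{3}",
                                  DateTime.Now, url, value, Environment.NewLine))
            Next
        End Using
    End Sub

End Module
```

    This only does one pass over the URLs. Running it every few minutes would need a timer or a scheduled task, which the sketch leaves out, and a URL or value containing a comma would need quoting before it goes into the CSV.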

  8. I've done some work scraping specific data for people in the bond markets.
    One useful (though quirky) product is djuggler.

    After much tinkering with this and other products, I finally decided it's easier (for me) in the long run to simply use the C# APIs and write my own code (fewer things to learn and relearn).
  9. gmst


    Thanks, but I haven't used C#. I can do VB though. I have also never done any parsing before. I will see if I can find some resources on the net to learn it.

    But the message I am getting from various posters is that it is not something that will take too much time to learn and implement.

  10. Ahhhh... just wait until you encounter the vast number of different ways websites are implemented.
    #10     Feb 7, 2013