source for stock info (mktcap, sector, etc)

Discussion in 'Automated Trading' started by mr19, Mar 4, 2009.

  1. mr19

    I currently fetch my mktcap, sector, etc. info nightly using Perl scripts against Yahoo Finance (using Finance::YahooQuote and scraping). Recently their mktcap data has been questionable, which has caused me a bit of extra work. I tried the same techniques on Google, but they seem to discourage scraping and block you once they detect it (you have to type in a code to continue).

    Any other sources? Preferably free although I'm not against paying some cash.

    TIA.
     
  2. rwk

  3. mr19

    Thanks for the reply.

    If you're using Perl, you can bypass the screen-scraping and use the Finance::YahooQuote package. It has been working well for me for almost 2 yrs now, but as of late the mktcap field has been showing up with "N/A" for a large number of the stocks I track.

    I'll have to take a look at AAII and see if it will work for me.
     
  4. Yahoo or one of the data suppliers might be doing some sort of data update. Hopefully this isn't permanent. All the debt/equity ratios I'm getting back from Yahoo are "N/A". PM me if you want some horrid, horrid Java spaghetti code that somehow manages to work for Yahoo and MSN. Something like AAII might be a better way to go though.
     
  5. drp7804

    Sorry to dredge up an old thread, but I saw the bit above about screen scraping Yahoo. Just curious what exactly you mean by that? Do you mean finding and converting the visual (bitmap) data on your screen into usable data (thinking something like the memory scanning that online poker bots do, etc.)... or did you just mean pulling price data from Yahoo via HTTP calls? Or something else entirely?
     
  6. rwk

    I just download the HTML to my program and parse it to extract the data I need. In Windows, the WinInet API is really easy to use for that.
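
    For anyone who wants to see the download-and-parse approach spelled out, here is a rough sketch in Python rather than WinInet. It assumes the page puts each label in one table cell and the value in the next cell, which is a made-up layout for illustration; real pages will differ and the matching logic would need adjusting. The names CellCollector, extract_field and fetch_html are hypothetical helpers, not part of any library.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class CellCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Text nested inside a cell (e.g. within <b> or <a>) is appended too.
        if self._in_cell:
            self.cells[-1] += data


def extract_field(html, label):
    """Return the text of the cell following the cell named `label`, or None.

    Assumes a label-cell / value-cell layout (hypothetical).
    """
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    cells = [c.strip() for c in parser.cells]
    for i, text in enumerate(cells[:-1]):
        if text.rstrip(":") == label:
            return cells[i + 1]
    return None


def fetch_html(url):
    """Download a page and return it as text (plain HTTP GET, no retries)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

    Usage would be something like extract_field(fetch_html(url), "Market Cap"), with the URL and label depending on the site being parsed.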
     
  7. mr19

    I write Perl scripts (although it could be in Python, Java, etc.) that basically act like web browsers and download web pages with the data I care about. After I fetch the page, I parse it looking for the fields I need.

    Sometimes it's easy, sometimes it's a PITA.

    For example, I have a Perl script that downloads OHLC data using the URL below and then stuffs it into a database I maintain.

    http://ichart.finance.yahoo.com/table.csv?s=IBM&d=2&e=9&f=2011&g=d&a=0&b=2&c=1962&ignore=.csv

    Other times you have to download actual HTML content and figure out ways to identify the data you are interested in.

    Hope that helps.
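
    A minimal sketch of the "download CSV, stuff it into a database" step, written in Python with SQLite instead of the Perl script described above. It assumes the CSV has a header row of Date,Open,High,Low,Close,Volume,Adj Close; the table name ohlc, its schema, and the function load_ohlc are hypothetical choices for illustration. The function takes the CSV text as an argument, so the download itself is left out.

```python
import csv
import io
import sqlite3


def load_ohlc(conn, symbol, csv_text):
    """Parse daily OHLC rows from CSV text and upsert them into SQLite.

    Returns the number of rows parsed. The column names are assumed.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ohlc ("
        "symbol TEXT, date TEXT, open REAL, high REAL, low REAL, "
        "close REAL, volume INTEGER, PRIMARY KEY (symbol, date))"
    )
    rows = [
        (
            symbol,
            r["Date"],
            float(r["Open"]),
            float(r["High"]),
            float(r["Low"]),
            float(r["Close"]),
            int(r["Volume"]),
        )
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    # The primary key plus INSERT OR REPLACE makes a nightly re-run
    # overwrite existing days instead of duplicating them.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO ohlc VALUES (?, ?, ?, ?, ?, ?, ?)", rows
        )
    return len(rows)
```

    Keying on (symbol, date) is one way to keep repeated nightly runs from piling up duplicate rows; a different database would need its own upsert syntax.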
     
  8. HTML, not bitmap. I believe the SMF Addin for Excel operates the same way, HTML parsing.

    http://finance.dir.groups.yahoo.com/group/smf_addin/
     
  9. drp7804

    Thanks. I think the term "screen scraping" is what caught my curiosity, but it sounds like most folks are either parsing page HTML or pulling the CSV price data over HTTP. I've been running a Java program for the past couple of years which pulls data each night via HTTP calls. I think we've been lucky that Yahoo hasn't started limiting this activity like Google... not sure about you all, but I have a list of something like 28k symbols that I make a pass through each night, and I can only remember one morning when I woke up to an issue from the previous night's download.