stock database with sector

Discussion in 'Data Sets and Feeds' started by borsaci, Jul 8, 2008.

  1. borsaci


    I am looking for a relatively easy way to create and maintain a database of all Nasdaq and NYSE stocks from which I will build other tools for my trading.

    I am interested in last price, market cap, volume (avg or historical if possible), and most importantly SECTOR attributes. Also would like it to be updateable when new stocks are added.

    I want to push this to a SQL database to work with. Would rather avoid screen scraping methods unless there is a very good solution out there I could port to .NET.

    Is there a relatively easy way to do this? Are there any better, relatively low cost or free software solutions, or subscriptions to some service that would provide convenient database access to such info?

    I see a bunch of software and web pages with this info but not sure where their source is...

  2. I've done this by screen scraping using Java and something called JTidy, which can be used to clean up HTML pages and convert them to XML that is easier to deal with. It's not that hard. I scrape not just sector and industry composition but fundamentals too, which I'm using to build up historical fundamental data.

    I store all time series data in MySQL tables, but the instrument, fundamentals and sector/industry composition I keep in XML documents.

    If you want free, I think you will have to screen scrape.
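    For anyone doing the same on .NET, keeping sector/industry composition in an XML document as described above might look roughly like the sketch below. The post does not show its schema, so the element and attribute names (stocks, stock, symbol, sector, industry) and the class name SectorXml are assumptions made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SectorXml
{
    // Builds <stocks><stock symbol=".." sector=".." industry=".."/>...</stocks>.
    // The layout is hypothetical; the thread does not describe the real one.
    public static XDocument Build(
        IEnumerable<(string Symbol, string Sector, string Industry)> rows)
    {
        return new XDocument(
            new XElement("stocks",
                rows.Select(r => new XElement("stock",
                    new XAttribute("symbol", r.Symbol),
                    new XAttribute("sector", r.Sector),
                    new XAttribute("industry", r.Industry)))));
    }

    // Returns the symbols filed under one sector, in document order.
    public static List<string> SymbolsInSector(XDocument doc, string sector)
    {
        return doc.Root.Elements("stock")
            .Where(e => (string)e.Attribute("sector") == sector)
            .Select(e => (string)e.Attribute("symbol"))
            .ToList();
    }
}
```

    The document returned by Build can be written to disk with doc.Save(path) and read back with XDocument.Load(path), so adding a newly listed stock is just one more stock element.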
  3. Arnie


    I've used TC2000 for eons. One of the best EOD databases available imo. When I find an interesting stock I can click on a link that takes me to the industry sector or sub-sector. You can export the data for Symbol, H, L, O, C, V, so it would not be hard to build a database like that. You can also click on any industry group and get a list of the stocks in it.
  4. I screen scrape fundamentals and sector information from Yahoo Finance. Takes ~30 min to download info for all 6000 US stocks.

    string url = string.Format("{0}", symbol);
    di.handleFundies = dwnlManager.AddJob(url, 30000);

    url = string.Format("{0}", symbol);
    di.handleSector = dwnlManager.AddJob(url, 30000);

    And here is sector parser:

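    public void ParseSector(string html)
    {
        sector = "";
        industry = "";
        // Nothing downloaded: leave both fields empty.
        if (html == null || html.Length == 0)
            return;
        // u.RegexMatch is my own helper; it returns the captured groups.
        List<string> tokens = u.RegexMatch(html, "yfnc_tablehead1.>Sector:</td><td.*?html.>(.*?)</a></td.*?yfnc_tablehead1.>Industry:</td><td.*?html.>(.*?)</a></td></tr></table>");
        // Page layout not as expected: leave both fields empty.
        if (tokens.Count != 2)
            return;
        sector = tokens[0];
        industry = tokens[1];
    }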
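
    The parser above depends on helpers that are not shown (u.RegexMatch and the download manager). A self-contained sketch of the same idea, using only System.Text.RegularExpressions, could look like this. The pattern follows the one in the post, with the closing tags written without spaces; the class name SectorParser and the TryParse signature are invented for illustration, and the pattern would need adjusting to whatever markup the page actually serves.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SectorParser
{
    // Pattern modeled on the regex in the post above (an assumption about
    // the page markup, not a verified description of it).
    private static readonly Regex Pattern = new Regex(
        @"yfnc_tablehead1.>Sector:</td><td.*?html.>(.*?)</a></td>" +
        @".*?yfnc_tablehead1.>Industry:</td><td.*?html.>(.*?)</a></td>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns false, with both outputs empty, when the input is empty or
    // the expected markup is not found, instead of throwing.
    public static bool TryParse(string html, out string sector, out string industry)
    {
        sector = "";
        industry = "";
        if (string.IsNullOrEmpty(html))
            return false;

        Match m = Pattern.Match(html);
        if (!m.Success)
            return false;

        sector = m.Groups[1].Value.Trim();
        industry = m.Groups[2].Value.Trim();
        return true;
    }
}

// Usage (html holds the downloaded page text):
//   if (SectorParser.TryParse(html, out string sector, out string industry))
//       Console.WriteLine(sector + " / " + industry);
```

    Returning a bool makes it easy to log which symbols failed to parse, which is usually the first sign that the page layout has changed.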
  5. borsaci


    Thanks all for the replies.

    The second link was looking good but it clearly was missing stocks. It only returned ~1500 records. For example, IPI and POT are not included.
    I guess they are not in DJ indices?

    This may be a starting point but not a clean solution, as it does not contain all US equities. Similarly, I think ICBenchmark only covers stocks in the DJ indices...

    Arnie: I went to TC2000 and signed up but didn't see where the EOD data was. I guess you have to pay for that, at which point I guess screen scraping Yahoo is just as easy.

    I found a site that has all stock symbols and some data (no sector/industry data), that you can download.

    I think there is a tool similar to Tidy that will handle cleaning up HTML into XML for parsing as well. I'll post back when I remember what it was (my link is on my other machine).

    Thanks again for all your responses. Will post back to let you know which way I go.

  6. borsaci


  7. I looked into that too. I never found the exact answer... but I did find a bunch of articles about the adoption of ICB. It looks like all the big U.S. exchanges and financial publications use it now, and internationally too.

    I think the Dow Jones indexes only use a representative sample of a sector instead of every stock in the sector, but the actual ICB database is more thorough than the DJ US indexes...

    I'm looking into buying the database, but there are no buy/purchase links on any of the info... I sent them some emails, and might just call one of the phone numbers in the fine print if that doesn't work. I hope it isn't extremely expensive or institution only, et cetera.

    I'll keep you up to date. Let me know if you find something too, eh? :D
  8. rwk


    I have been using Telechart, but I would like to find something similar that works in a non-Windows environment (i.e. Linux).
  9. borsaci


    almostgotit... let me know what you find, I'll do the same.

    If ICB is pricey and we can share the data, maybe I can go halfsies on it with you. ;)

    The ICB website does say under 'product attributes':

    "(1) Covers all securities classified by Dow Jones Indexes and FTSE. Includes index components along with broader coverage of universe"

    Not sure if this still just means securities in DJ indices...
    #10     Jul 14, 2008