database selection......

Discussion in 'Programming' started by cdcaveman, Jul 29, 2012.

  1. Ok... I've been doing some research trying to figure out the best solution for my situation...
    Currently i'm trading with Interactive brokers.. I trade mainly options.. I have bought Hoadleys excel fiancial add in and started messing around with it.. I've got some experience with programming but less and less as you go down the abstraction level. Meaning i know web application languages... I've taking a class in visual basic and didn't complete it.. And i'm not scared to dig in and learn new things... That being said
    I've been thinking about scraping bid ask last and volume for options of stocks that i follow and insert them into a db.. Does anyone have experience with this?
     
  2. Just to add to what i'm trying to do .. I'm going to use web applications as well. . So i was figuring on mysql. . I found drivers for mysql to excel so i c an still use my spread sheets and yet still serve the data out to my web apps
     
  3. Seems to me like a good solution if you can get it working with minimum effort that way.

    You might take a look into other database types, but it might not be worth the hassle. To me the relational aspect of sql doesn't make much sense when working with tick data. It also makes databases both more complicated and slower to use than some alternatives.

    You can look into key-value stores. In essence they are simple and speedy, e.g. Redis.

    CouchDB is document store (which features you might not use), might be preferable over Mysql because it has a JSON interface. But that all depends on what web interfaces you use.
     
  4. relational tables just aim to reduce duplicates... IE if you have a binary imput like put or call.. you can turn that into a boolean value of 0 or 1 and have the definition in another table that references it.. once you start getting into huge amounts of data this stuff becomes critical.. i'm no database administrator.. but relational tables do start to make sense quickly when data starts to accumulate.. but realistically one would have to ask.. how much can some of this data can be compressed using relational tables? the quote of database fields below is from live vol website.. alot of these fields make sense to store.. but as you can see not alot of them could be compressed by relational tables.. so the need for it in this case is negligible.

    Field List:
    Option Quotes

    Time, Root, Expiration, Strike, OptionType, Open, High, Low, Close, TradeVolume, BidSize, BestBid, AskSize, BestAsk, Underlying Bid, Underlying Ask, {Regional Exchange: BidSize, Bid, AskSize Ask} x[# of exchanges]
    Option Calculations

    Time, Root, Expiration, Strike, OptionType, Open, High, Low,Close, TradeVolume, BidSize, BestBid, AskSize, BestAsk, Underlying Bid, Underlying Ask, Implied Underlying Price, Active Underlying Price, Implied Volatility, Delta, Gamma, Theta, Vega, Rho
    Option Trades

    Time, SequenceNumber, Root, Expiration, Strike, OptionType, Exchange ID, TradeSize, TradePrice, TradeConditionID, CanceledTradeConditionID,BestBid, BestAsk, UnderlyingBid,UnderlyingAsk, {Regional Exchange: BidSize, Bid, AskSize Ask} x[# of exchanges]


    PS one cool feature i've thought would be interesting to replicate of the live vol software is
    Trades and Quotes (TAQ)

    Livevol® also offers the complete recorded history of equity and options tick data including an API to simulate real-time playback. Ask the Livevol® team for additional information.
     
  5. 2rosy

    2rosy

    simplest is to use sqlite (no admin headache). :cool:
     
  6. ok .. there is some good to Mysql because any web application will already come prebuilt with a mysql db.. meaning i wouldn't have to add a datasource and connection to the application to query the inserted options data... its just automatically available.. what do you mean by admin headache? i wouldn't be hosting the db myself.. it would be on a hosting account.
     
  7. Jay_Ap

    Jay_Ap

    I highly recommend Mysql as it is easy to work with, stable, and available on just about all hosted environments. It even has a pretty decent query browser tool that allows you to query data and view and modify it in an Excel like grid. You can also dump the queried data out as csv files.

    One word of caution, you need to really think about table structure and the storage engine you will choose (i.e. innodb, myisam, etc). This will depend on how much data you are going to suck into the database.

    I use Mysql with my production trading system and it works remarkably well.
     
  8. thats cool.. i use innob with a ecommerce site i own... its funny how you learn things.. i didn't know what engine i was using untill my hosting account told me they don't support innob and that they don't back them up.. haha but you know you learn things when you have to actually fix things instead of reverting back to old instances i guess :) i have messed around with mysql quite a bit.. Zend framework php etc.. started out in coldfusion etc.. now is there anything in the data that could use the benefit of relational tables? i mean theoretcially a p and c for calls takes up more space then a boolean value.. did you consider all that? sort of doesn't make sense cause that seems to be the only place you could crunch things.. i was just thinking of increasing the specificity of what the db gets so that it can not just grow as quickly as it would in just one table.. i'm imagining you guys are just dumping it into one table right? its only when you get odd sales and errors on data that you need some cleansing which is what the bought data from like tick data is suppose to have done to it.. I just can't see buying data that my broker already gives to me.. its ALOT of money considering the amount of money i'm trading to buy a few years of a few symbols of options chains
     
  9. Jay_Ap

    Jay_Ap

    I would recommend splitting the data across a few relational tables just so it doesn't grow super huge. So, descriptive info like ticker should be put in a separate table with a autoincrementing primary key. This key number is then used as the foreign key ID in the table holding the price time series data.

    Sometimes it's a tradeoff between splitting the data too much (i.e. normalizing) such that it's a pain in the ass to query across tables, versus just dumping all the data in one table so it's easy to work with. If you just want to do ad-hoc analysis, you may just want to dump it all in one table. If you want to do something more systematic, I'd go with normalizing the data.
     
  10. i completely understand foreign keys and their related complexities in quering db's. ... its just one of those things i've learned.. you say. . "oh no big deal for now" and then later you find out you should have jumped the complexity hurdle early on :) i haven't even started doing this... i just was reading some of Augen's books and messing around with excel and the sheet twsdde from interactive brokers... and a hoadley finance add in. .. do you clean the data at all? for example i traded a strangle on weekly opex for aapl and then i reversed into a bull put credit spread after for a decent rip on both... and i noticed that the otm 515 put that expired worthless posted a. 92 cent offer at expiration when it had been trading less then half that for the last hour of trading... i'm sure you have found some not so realevant data that you trashed or at least extracted to examine for reasons later... either way i appreciate your feedback.. i know our doing this at a much higer level then i am... so your feed backc helps
     
    #10     Aug 3, 2012