Historical Options Data for statistical analysis

Discussion in 'Options' started by cdcaveman, Jul 19, 2012.

  1. TskTsk

    Are there any remote databases you can run analysis on? I know livevol has the ability to run live scans on their databases, but I'm not sure about historical data. It would be much more practical than keeping terabytes of local data. Does anyone know?
     
    #11     Jul 20, 2012
  2. Exactly, but I'm still learning how to use Excel well enough to make sense of any data at all! Haha.
     
    #12     Jul 20, 2012
  3. That's essentially what I do. I own and maintain the massive RAID array with the data, and then I rent out VMs that access the data directly via a 10G Ethernet LAN link. It removes the hardware from the end user's equation.

    Right now the biggest problem with going retail or individual with a service like that is trust. Every other day you see threads about people stealing ideas, etc. For us it takes a small team to support and host all the data, but it's small to medium-sized hedge funds accessing the data, so there are no paranoid individuals in the picture.

    The other nice thing about a database that large is you don't need to worry about theft... if you want to steal or copy the data... Good luck!
     
    #13     Jul 20, 2012
  4. I would imagine you could secure your data through a good web application. A lot of the major retailers like Amazon do it. If you had apps that generated the types of information clients are requesting, then you could serve the data through them securely, right? I would think a subscription to access that much data would be very valuable. But the problem is that people who really want to manipulate that data wouldn't want to use the more typical ways of doing it.
     
    #14     Jul 20, 2012
  5. newwurldmn

    From all my conversations with you through PM, I feel that trust should not be an issue when doing business with you.
     
    #15     Jul 20, 2012
  6. I think OHLC bar data would be relatively easy to serve via a web app... BUT
    I do like the idea of having a VM with access to your DBs so I can query the data directly. Ideally I'd like to be able to run scheduled jobs that copy specific symbol sets at LAN speeds to DBs on my VM, which I can then sync/replicate via VPN to my own LAN. Best to do all the heavy lifting near the source data.
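
    A minimal sketch of that kind of symbol-set copy job, assuming a SQLite source and target purely for illustration. The `quotes` table, its columns, and the `copy_symbols` helper are all hypothetical; nothing here reflects how the actual database in this thread is laid out.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = "CREATE TABLE IF NOT EXISTS quotes (symbol TEXT, ts TEXT, bid REAL, ask REAL)"


def copy_symbols(src, dst, symbols):
    """Copy quote rows for the given symbols from src to dst; return rows copied."""
    dst.execute(SCHEMA)
    # One placeholder per symbol keeps the query parameterized.
    placeholders = ",".join("?" for _ in symbols)
    rows = src.execute(
        f"SELECT symbol, ts, bid, ask FROM quotes WHERE symbol IN ({placeholders})",
        list(symbols),
    ).fetchall()
    dst.executemany("INSERT INTO quotes VALUES (?, ?, ?, ?)", rows)
    dst.commit()
    return len(rows)


# Demo with in-memory databases standing in for the big DB and the VM-local DB.
src = sqlite3.connect(":memory:")
src.execute(SCHEMA)
src.executemany(
    "INSERT INTO quotes VALUES (?, ?, ?, ?)",
    [
        ("SPY", "2012-07-20T09:30:00", 1.00, 1.10),
        ("QQQ", "2012-07-20T09:30:00", 2.00, 2.10),
        ("SPY", "2012-07-20T09:31:00", 1.05, 1.15),
    ],
)
src.commit()

dst = sqlite3.connect(":memory:")
copied = copy_symbols(src, dst, ["SPY"])
```

    Run on a schedule (cron or similar), a job like this would leave only the extracted subset to replicate over the VPN.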

     
    #16     Jul 20, 2012
  7. If you allow access to your database and the guy is a big trader, isn't there a liability of front running whatever trades he is modeling? Or am I crazy for thinking this?
     
    #17     Jul 20, 2012
  8. Well, if I'm able to query and extract the data I'm interested in from the big DBs on the VM, the heavy processing is done at 10Gb / 1Gb LAN speeds. Then I can sync just the consolidated tick files to my servers and not have any exposure.

    I seriously doubt any front running can happen. It's just processing and extracting data; the most that is exposed is the set of instruments I'm querying.

    It's just a numbers game: instead of pushing 100,000,000 messages over the internet, I'm extracting and sending just the 50,000 quote events. That's 1/2000 the data size.
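
    The arithmetic behind that ratio, as a quick sketch. The two counts are the ones quoted above; the `reduction_ratio` helper is just an illustrative name, and it counts messages rather than bytes, so it assumes raw messages and extracted events are roughly the same size.

```python
def reduction_ratio(raw_messages, extracted_events):
    """How many raw messages are processed per event actually sent."""
    return raw_messages / extracted_events


RAW_MESSAGES = 100_000_000   # messages processed next to the source database
EXTRACTED_EVENTS = 50_000    # quote events synced over the internet

ratio = reduction_ratio(RAW_MESSAGES, EXTRACTED_EVENTS)
print(f"1/{ratio:.0f} of the original message count")
```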

    livevol is really expensive: $6,000 just for 1 month of 1-minute quotes for all option symbols. That's not even tick data, and it doesn't include trade data. The alternative is $12 / month per symbol, and SPY option chains are approx. 160 symbols.
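
    A rough comparison of the two price points quoted above, as a sketch. It assumes both prices cover one month of data and that the per-symbol price scales linearly with no volume discount; the constant and function names are made up for illustration.

```python
ALL_SYMBOLS_MONTHLY = 6000   # USD, 1 month of 1-minute quotes, all option symbols
PER_SYMBOL_MONTHLY = 12      # USD per symbol per month
SPY_CHAIN_SYMBOLS = 160      # approximate size of the SPY option chains


def per_symbol_cost(n_symbols, price=PER_SYMBOL_MONTHLY):
    """Monthly cost when buying symbols one at a time."""
    return n_symbols * price


spy_cost = per_symbol_cost(SPY_CHAIN_SYMBOLS)
# Number of symbols at which per-symbol pricing matches the flat price.
break_even = ALL_SYMBOLS_MONTHLY // PER_SYMBOL_MONTHLY

print(f"SPY chains: ${spy_cost}/month; break-even at {break_even} symbols")
```

    Under those assumptions, per-symbol pricing is cheaper for anything below the break-even symbol count, and the flat price only pays off above it.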


     
    #18     Jul 20, 2012
  9. That's not pocket change!
     
    #19     Jul 20, 2012
  10. It's not about having a web app or serving the data to the client. I don't know about others, but we go with a central database located on the same LAN as the rented VMs, so that when the user queries the data it is populated locally on their machine almost instantly. (Yes, I would know what you queried, but only the tickers, nothing else.)

    Web apps work but dumb things down a lot. If you want simple bars, yes, those can be pushed to a web app just as they can be pushed to your VM, but either way I still know the tickers that you requested. By renting the end user a machine that is physically located next to the database, the file transfers are very fast and the additional queries are fast. Offering the raw data to the client (which is larger, and yet another reason to go with VMs vs. a web app) lets them manipulate it any way they like. Serving simple web data is very limiting in all but the most basic cases.


    And... here we go with the paranoid guy yet again... Everyone is out to get you... All pocketchange said was that he would push the end user's queried data to their VM. All that means is he would know what tickers and time frame you care about. Nothing was ever said about execution or algos. We are talking about HISTORICAL data here... You get the data and you decide what to do with it. How would I front-run your request for data on out of the money SPY options?

    If only you had any idea how hard it is to manage 100+ terabytes... and here we are again with the trust factor...
     
    #20     Jul 20, 2012