Database construction

Discussion in 'Data Sets and Feeds' started by sevenlaws, Jul 1, 2010.

  1. We need some advice. We are new to black box trading and need to know the best way and setup to store and use data for backtesting. The program currently being built is considered a grey box, but we have black box ideas for the future and want to have everything ready so we don't need to redo anything.

    Our program is being built in C#/.NET, so the data store needs to work with that for the grey box for now, and with C++ in the future.

    I've been reading some of the forum, and binary files, HDF5, and R are some of the things I'm interested in. If there are others better suited to this situation, please add them. Some insight on how to set up the hardware would also help: we are two guys working from an office, so it needs to fit that.

    I'm a trader, not a programmer, so I really need some help if someone would.

    Thanks. PS: we are also looking into working with APAMA. Does anyone have experience or input about that?
     
  2. you can create a black box trading program but you can't create a database?
     
  3. promagma


  4. you can create a black box trading program but you can't create a database?

    Well, we have two programmers working on the program, connecting the FIX and all of that, but we are trying to help them out with the database. The programmers work out of state, so we need to set up the hardware and database here. Also, the programmers are good, but they're not black box experts, so there is a lot of studying going on at the same time, by us and by them.
     
  5. Also had a question about Excel. We can get free tick and historical data from our Bloomberg as of right now, but we have no idea how to organize it and have it ready for the programmers. Are regular Excel files organized by date good enough for now? We can only pull a couple of days of tick data at a time.
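
    One possible way to organize such exports is one file per symbol per day. A minimal sketch in Python, assuming each sheet is first saved as rows with symbol, timestamp, price, and size columns. The column names, the function name split_ticks_by_day, and the folder layout are illustrative assumptions, not a Bloomberg export format:

```python
import csv
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def split_ticks_by_day(rows, out_root):
    """Write one CSV per symbol per calendar day under out_root.

    rows: iterable of dicts with keys 'symbol', 'timestamp' (ISO 8601),
          'price', and 'size'. These names are assumptions for the sketch.
    Returns the list of file paths written.
    """
    buckets = defaultdict(list)
    for row in rows:
        ts = datetime.fromisoformat(row["timestamp"])
        buckets[(row["symbol"], ts.date())].append((ts, row))

    written = []
    for symbol, day in sorted(buckets):
        # Keep ticks in chronological order within each day file.
        items = sorted(buckets[(symbol, day)], key=lambda pair: pair[0])
        path = Path(out_root) / symbol / f"{day.isoformat()}.csv"
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=["timestamp", "price", "size"])
            writer.writeheader()
            for ts, row in items:
                writer.writerow(
                    {"timestamp": ts.isoformat(), "price": row["price"], "size": row["size"]}
                )
        written.append(path)
    return written
```

    Called with rows for a hypothetical symbol "ESU0" spanning two days, this would produce ESU0/2010-07-01.csv and ESU0/2010-07-02.csv under the chosen root, so each partial pull from the terminal can be filed by date without touching earlier days.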
     
  6. I sent you a PM. This thing is open-source (the CAD file anyway). I have a shop in NY State that makes the cases much cheaper than the shop they recommend.

    http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

    Not sure what you are recording but you are going to need a lot of space. I changed out a bunch of things on the spec list - I'm using 3x 16-port backplanes that are much faster and allow for one extra HDD per row giving me 3x 16-2TB RAID6 arrays. I also upgraded the mobo and changed out the RAM, RAID cards and CPU.

    I needed to make a run of 10 cases so I'll be selling some of the others if you are interested.
     
  7. rosy2


    1) You need to get data: either buy it or capture it. I think it's idiotic to capture it, and it is not cheaper. Anyway, once you have the data, parse it into a common in-house format. I highly recommend HDF5.
    2) The data must be usable. Have a separate HDF5 file for each contract for each day. I keep three kinds of data (top of book, last trade, and order book) for each instrument/day I use.
    3) Language doesn't matter for this; it's almost all IO. I actually think if you work with static data files in anything other than a dynamic language you're an idiot.
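
    The one-file-per-contract-per-day layout can be sketched roughly as follows. To stay within the Python standard library, this sketch swaps HDF5 for plain fixed-width binary records via the struct module; with an HDF5 library each record kind would instead become a dataset inside the day file. The record fields, file names, and helper names (day_file, append_records, read_records) are hypothetical, and the order book is left out because its depth varies by feed:

```python
import struct
from datetime import date
from pathlib import Path

# Hypothetical little-endian record layouts; adjust the fields to your feed.
# Top of book: timestamp (ns since epoch), bid price, bid size, ask price, ask size.
TOP_OF_BOOK = struct.Struct("<qdIdI")
# Last trade: timestamp (ns since epoch), price, size.
LAST_TRADE = struct.Struct("<qdI")


def day_file(root, symbol, day, kind):
    """Path for one instrument, one day, one record kind."""
    return Path(root) / symbol / day.isoformat() / f"{kind}.bin"


def append_records(path, layout, records):
    """Append tuples to the file, packed with the given struct layout."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("ab") as fh:
        for rec in records:
            fh.write(layout.pack(*rec))


def read_records(path, layout):
    """Read the whole day file back as a list of tuples."""
    return list(layout.iter_unpack(path.read_bytes()))
```

    A backtest for one contract and one day then opens only the files it needs, for example day_file(root, "ESU0", date(2010, 7, 1), "trades") for a hypothetical symbol, and the same fixed-width layout is straightforward to read from C# or C++ later.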
     
  8. ticktock


    Are you using daily, intra-day, or tick? The larger the data the more storage becomes an issue.
     
  9. So more data requires more storage, right? Want to make sure I'm absorbing your insights.
     
  10. I love the frequency of the word idiot in your posts.

    #1 - I captured my data and if I were to bill out programming hours, hardware and software it would still be slightly cheaper - since programming wasn't an issue and I already pay for the software my only cost was hardware which makes my cost to capture over $100k worth of tick data almost $14k.

    #2 - There are many easy open-source ways to capture tick data and it does not need to be in HDF5

    #3 - You call people idiots more than you offer solutions/suggestions. Any language really will work - use whatever you are most comfortable with.
     
    #10     Jul 3, 2010