Tick Database, Now Want to Run SQL

Discussion in 'Data Sets and Feeds' started by bscully27, Jun 28, 2012.

  1. Hello,

    I just created a simple equity tick-database in MS Access and now would like to run some queries and mine the information! (note: running sql through 3rd party software, cant stand MS access gui). However, i'm running into a couple basic issues:
    1) When looking for a percent move over say 10 seconds, what is the best approach for this SQL?
    2) I couldnt find on the web any examples related to point 1, any good ref sites?
    3) Should I not pursue SQL queries but rather use software like amibroker or tradelink??

    I prefer writing my own SQL for what it's worth..

    Thanks!

    Ben
     
  2. jharmon

    jharmon

    Good luck getting MS Access to handle the massive quantity of tick data.

    Indeed most SQL solutions would struggle with such quanities of data (tens of gigabytes per day).

    Unless you really have enough time, resources and money to put into the infrastructure for such a project, stop looking at the tick data.

    This doesn't apply to highly limited sets of symbols though...

    Rather than developing your own, try an off-the-shelf product such as AmiBroker or Multicharts with an appropriate tick data feed (IQ Feed, eSignal etc.) that provides you with a small set symbols of streaming tick data.
     
  3. As you correctly pointed out, MS Access is going to struggle with large volumes so I am literally keeping this DB to one symbol. Really testing out some functionality.

    That being said, could you please re-address the questions I posted? I have some rough queries ready but am looking for some guidance.

    Thanks,

    Ben
     
  4. jharmon

    jharmon

    pretty hard to write sql for you if you don't post your schema.

    Note with tick data you won't always get ticks happening in every 10 second block so you'll have to think about that carefully.
     
  5. Occam

    Occam

    It's been quite a while since I worked with SQL, but I can imagine SQL queries for something like this would end up being very ugly, not to mention inefficient. (I only briefly attempted to use SQL for this myself, before giving up for reasons of efficiency.)

    Some firms are using "time series databases" of varying complexity and expense (there are threads here on ET on the topic). I've also seen people touting CEP (complex event processing), which also has SQL-like syntax. Finally, I think more than a few end up building everything from scratch.

    http://en.wikipedia.org/wiki/Complex_event_processing

    http://en.wikipedia.org/wiki/Time_series_database
     
  6. I second that. THe idea to run that on Access on top is only comical - even for one tick.

    Put in a proper SQL Server, two processors and a lot of either SSD or disc and MAYBE you have a chance, but access just is WAY out of the league.

    You basically say "I want to participate in a formula one race with a fiat panda" and then (Regarding your one symbol only) "but I only have to participate for 5 rounds or so". Does not matter - wrong car.

    Same with database. One symbol, unless it is lightly traded / far away month is WAY out of league for something like access.
     
  7. Occam, thank you very much for the information. By FAR the most helpful. You don't know of any CEP/TSDS threads or webpages that assist setting up personal DBs for data mining?


    jharmon, your point about random time interval ticks is exactly why i posted a message asking for assistance... no advice? I took a stab at some psuedo code but looking for suggestions.


    NetTecture, this is a basic data mining exercise. Speed is NOT an issue. This is NOT used for live trading... Do you have an opensource alternate recommendation? Would MySQL suffice or should I pursue CEP/TSDS architecture?

    Thanks,

    Ben
     
  8. jharmon

    jharmon

    Why would you perform a data mining exercise that you don't want to replicate on live data?

    Seriously, employ a programmer for a few hours to come up with your solution if that's all you seek since you seem unable to program what is effectively fairly simple programming on your own accord.
     
  9. jharmon

    jharmon

    post your schema Ben (who seems to delete his posts in between posting)

    Seriously, what you want to do is determine whether normalising your data to x second interval screws up your % change algoritms.

    This is far more important than worrying about a SQL statement, database platform or other concern.
     
  10. Famous last words. Normally followed by a blank stare and the look of a tilted brain when "not an issue" turns into "what do you mean, it takes months to execute that statement?".

    No, sorry. Data mining is a high end eandavour. There is a FREE non open sourrce version, though - Oracle ExaData is frequently used for that, comes free with the hardware. Starts around 300.000 USD for the smallest installation.

    Seriously, your problem is not only the software side, it is also the hardware. One reason Access is a sad joke is that it can not utilize a 12+ core setup, and that is what you will need.


    Given that no CEP / TSDS architecture I know of can do data MINING (which is not "test a script" but data mining as per definition - the computer finding certain elements, which requires special non open software and is FUNDAMENTALLY DIFFERENT FROM EXECUTING A SQL STATEMENT) this is a question you can only answer.

    A lot depends what you actually want, and your definition of items seems to be very different from anyone who works for example in data mining definition of something.
     
    #10     Jun 29, 2012