Tick Database, Now Want to Run SQL

Discussion in 'Data Sets and Feeds' started by bscully27, Jun 28, 2012.

  1. Ok let me back up SEVERAL STEPS. I mis-used the word data mining.

    My project is not the truest sense of the phrase but rather simply running some queries on tick data to find particular events like large spikes, slow drifting, and any other point of curiosity.

    Once I have a solid footing with the SQL, I will look to find stock consolidation, channels, etc.

    The next phase will be to overlay a news feed to determine what articles moved my one stock vs not.

    Another use of my database will be tweaking filter-criteria which is based on price move, volume size, and timeframe. Currently I have a logically constructed setup with live exchange data to find special situations in the market and make plays. But i'm looking to use my database OFFLINE to see how effective my filters are...

    So again, to re-emphasize, not data mining but rather me running many queries for various goals (stated above).
     
    #11     Jun 29, 2012
  2. Big problem already.

    > Once I have a solid footing with the SQL, I will look to find
    > stock consolidation, channels, etc

    Forget doing stuff like that in SQL. SQL is set based - the moment you have to run your stuff along in a loop, it is by far NOT the optimal solution.

    I would go with a CPE / custom programming model here. Move your tick data into a simple binary coded file (like 8 bytes per event) then you can run it along very efficiently. Use a CPE model - with a PROPER programming langauge. C# or something easy and fast to write.

    Becuase seriously, any type of time series analysis in SQL is an exercise in masochism. SQL simply is not done for that - it is set based.
     
    #12     Jun 29, 2012
  3. Take a look at implementing sqlite inmemory with a disk based historic consolidated archive. You can very quickly load historic record sets for near real time analysis.

    thecommon.net has some examples worth taking a look. ie. metatrader/northwind db loaded in memory feeding excel etc.
     
    #13     Jun 29, 2012
  4. Jaybee

    Jaybee

    Hi Scully,

    I'm not into trading (yet) but I am a SQL DBA looking to get into trading. If you want to send me your schema, anticipated data streams and what kind of queries you need to run against the data, I will help you (for free, as this is something I want to set up for myself).

    I would just ask in return you help me with a few very basic questions about trading Stateside (I'm a Brit).

    Cheers,


    Jaybee
     
    #14     Jun 29, 2012
  5. I might add, that to get started, the OP could use ... ahem ... VBA.

    Select * from myTicks

    Then loop through the recordset, picking out "features".

    Got to start somewhere...
     
    #15     Jun 29, 2012
  6. sle

    sle

    Why not use KDB? 32 bit version is free. That's what man grownup firms use :)
     
    #16     Jun 30, 2012
  7. I think you are blowing things out of proportion here. Help this dude if you have anything to contribute, I am not sure your negativism adds any value. Sorry, but I cannot stand those know-it-all who only say "cannot be done", "you are too naive",... but do not even add a single source of evidence why it cannot be done. Yes it CAN be done, and yes you need to have a willingness to get your hands dirty but it can be done. I programmed my own binary data store and stream pure tick fx data at a rate of almost 5 million quotes a second to my strategy container and OMS, PMS, Risk systems, all coded up myself. So, please do not tell others it cannot be done under 300,000 USD (was that you or someone else?)

    Ben, true MS Access does not get you anywhere. Here are couple keywords you want to google and familiarize yourself with before asking more questions:

    HDF5, KX+/KDB, BerkeleyDB, Column based Databases, MonetDB, Redis, HBase, InfiniDB, RavenDB, Cassandra, binary file data store, Esper/Nesper.

    Some of the above are very pricey solutions with steep learning curve, others are open source and simple to learn to use over couple hours or weekend.

     
    #17     Jun 30, 2012
  8. what a bollocks, several hedge funds who use .Net (F# or C#) very successfully use Linq queries to "data mine", "aggregate", "sort", ... tick based data. In fact sort algorithms in Linq/Plinq in .Net 4.0 are so incredibly efficient you will not be able to speed things up by a whole lot with a pure C++ merge/sort algo.

    Same with SQL. Have you ever run SQLite with SQL like syntax? Can you tell me how much faster it is to run a custom query on a hdf5 db over such sql query? Not much according to my own profiling exercise. Point is, everything is relative. Just debunking SQL does not show much practical experience. Same as saying KDB is the best and there is nothing faster out there (yes there is, I bet I can aggregate tick data into time based bins faster using my own self-written accessors to binary data store than KDB could ever do).

     
    #18     Jun 30, 2012
  9. KDB is not the best per se, its the best if you dont care about money nor men-hours. the 32 bit version is NOT free, nothing is free in life. It time-outs after 2hours and deactivates after a certain period of time.

     
    #19     Jun 30, 2012
  10. Yep .. KDB / q & VBA = man-grownup.

    * I think the free version might boot you out every 4 hours.

    I fiddled about talking to KDB from Java at one point, for algo analytics. Not the slowest part of the bank's infrastructure.
     
    #20     Jun 30, 2012