SQL for trading...?

Discussion in 'Trading Software' started by CloroxCowboy, Jul 1, 2009.

How do you feel about SQL?

  1. Good tool for trading.

    27 vote(s)
    47.4%
  2. Awful tool for trading.

    11 vote(s)
    19.3%
  3. Never tried it / no opinion.

    18 vote(s)
    31.6%
  4. Heard so many bad things I'd never want to try it.

    1 vote(s)
    1.8%
  1. I use Sql Server as part of my setup, but I see a lot of negative reactions on ET about using this or other SQL databases.

    My use is for stat-arb, and I analyze (intraday) several hundred thousand possible pairs, with thousands of datapoints per pair...but I don't trade on millisecond timescales, so I can't comment on that from personal experience. I'm happy overall with the performance for any basis longer than a few seconds though. I run my analysis in a continuous loop that takes about 3 minutes per cycle.

    So who else uses SQL? Who's tried it and hates it? And why?
     
  2. thanks for the invite. Could you elaborate a bit how you have set up a high performance data structure (including comments on the indexing) on MS SQL Server? Very curious and happy to learn. I, along my partners, coded a front end in C# which allows for access to SQL Server but have not found it of any advantage over how we currenty deal with our time series.

     
  3. auspiv

    auspiv

    you should include an "meh - its alright" option. i'll chime in with more later.
     
  4. thstart

    thstart

    I have been through this. Had to test a strategy and needed a lot of data (EOD) which is 20,212 trading days since 1928 for DJIA. MS SQL do not have regression, sliding windows, etc. so I coded in in VB.NET and used MS SQL for data storage. The algorithm generates ~1,000 columns so it turns out MS SQL 2008 can hold only 500 so I had to make two tables. To use this later you have to use a join which slows down the query a lot. The problem was the regression, sliding windows, etc. reached the limits of .NET - i cannot process more than 10,000 data points back. For 10,000 points it takes an hour of processing time.

    Then tried with Sybase Anywhere 11. It has regression, sliding windows, etc. But still all data are processed for a lot of time ~10 minutes on my server.

    Then I tested 500 stocks and while there not so much history back the processing time was prohibitive for any real use ~1 hour - e.g. at EOD I have to process the data fast enough to be able to trade next day. And there is no time for testing different strategies. My understanding is that to be able to test I have to be able to calculate all parameters as they used to be at the trading day and to have at least 10 days back and 10 days forward of calculated gain in order know when such and such factors were at effect, you have such and such gain up to 10 days after. These have to be calculated for every single trading day back in the history to be useful.

    If you consider the cost - to get the maximum performance you have to use the enterprise versions - it is a very substantial investment if you want for example to speed up the calculations using 2 CPU with Sybase - it costs two times more.

    So it turned out this is not possible to do with Conventional Off the Shelf databases (COTS).

    After these benchmarks it turns out parallel computing is the way to do it as well a specialized and highly tuned database. We are doing this for more than 2 years and our internal report shows 8-10 times speedup on benchmark load/store of 200MB generated database, load/store being the slowest operations. Computations show up to 100 times speedup.

    So basically this is our experience so far with using COTS.
     
  5. tommintj

    tommintj

    I am an Oracle guy but I havn't used SQL lately in trading analysis. SQL is best used on sets of data. I have used SQL to curve fit using least squares cyclic data.

    I could probably suggest a SQL query that would perform well if you told me:

    Table column names
    Record counts
    Table columns in your primary keys (or indexes)

    What do you want to derive.

    Suggested Tables
    Stocks(symbol,description) keyed on symbol
    Tick (symbol,timestamp,open,high,low,close,volume) keyed on symbol,timestamp
    ...
    etc
     
  6. True, regression isn't built in, but you can write custom functions to do that kind of math.

    Could you clarify what you mean by sliding windows?
     
  7. Nothing too fancy. Most of it is building indexes correctly and avoiding unnecessary duplication by using a larger number of simple tables rather than fewer complicated tables. For instance, store tick data by itself...the only columns in my price table are ticker symbol, timestamp, and price - with symbol and timestamp as the primary key (clustered index). I have a separate simple table for volume, a separate table for stock descriptions and categories, a separate table for my defined pairs, and if I used Level2 I'd have a separate table for that also. In most cases, these tables are joined using their primary keys, so it is already in time series order.

    Other things...I try to create a modular function whenever I notice myself performing the same calculation in multiple queries.

    I keep the calculations on the server-side and only fetch records and fields that are absolutely necessary to the output.

    When I set up a new function or procedure, I spend some time tuning the queries, experimenting with different types of joins, making sure that my indexes are still sufficient, etc...there are tools in Sql Server to help identify these types of bottlenecks. Once I'm happy, I store the query plan and use it whenever I call the procedure again so the server doesn't have to run the optimizer each time.

    There are also hardware issues that could be hurting performance or things to play with in the way Sql Server is configured.

    Is that useful at all? I'm not sure how much detail you were looking for.
     
  8. All depends on the design. For analytic work, de-normalize and stick with a dimensional model.
     
  9. tommintj

    tommintj

    A caution about Where clauses...if you are comparing by using a function of a key column, you toss away any index on that column which will make your query run real slow.
     
  10. thstart

    thstart

    If we work for example with EOD data,
    sliding window is when you want for example to calculate a regression on 360 days back. 360 days is your window.

    First you calculate the 360 days window regression from current day, store the result together with the current day, then slide the 360 days window back one day, calculate again the regression, get the last result, store it together with the data for one day back, etc. That way you slide the window back until you have 360 days back. Then use the calculated points to plot them, compare, etc. It turns out to be very time consuming if you decide to perform these calculations on different window sizes - 360, 240, etc. and to manage all these data. Much more difficult if you try to find relationships between different stocks. It is very very slow even in a low level assembler programming.
     
    #10     Jul 1, 2009