Tick Database, Now Want to Run SQL

Discussion in 'Data Sets and Feeds' started by bscully27, Jun 28, 2012.

  1. DevBrian

    After removing a bool check on the producer side (whether the background thread should keep processing), throughput is above 10 million items per second.

    Constructor:
    _actionBlock = new ActionBlock<long>(l => { _handled++; }, new ExecutionDataflowBlockOptions() { SingleProducerConstrained = true });

    On background producer thread:
    while (true)
    {
        if (_actionBlock.InputCount <= 1000000)
            _actionBlock.Post(1);
    }

    If I add the method call back in on the consumer side, speed drops to the 5 million items per second rate.

    Academically, it's interesting to see that difference. But if I were to use this code to feed tick data to a strategy, I would eventually need to call a private method. Implementing a strategy inside a lambda expression isn't an option for us.
     
    #61     Jul 26, 2012
  2. You do not need to deal with lambda expressions. You can pass a reference to an Action<T> directly into the constructor of the ActionBlock; the Action<T> works exactly like a private method. You can define the Action<T> in the same place as your private methods and hand it to some sort of config method in the class where you set up the ActionBlocks. Simple as that.

    By the way, I would look for ways to get around BoundedCapacity, because it does slow you down. There are ways to slow down your producer if the consumer is the bottleneck and cannot be sped up. Or you can implement your own custom blocking mechanism: instead of, in the worst case, blocking on almost every iteration because the queue fills up again right after each added element, the custom mechanism would notice when the queue is, for instance, half empty and signal the source to fill it up again. But that is more boilerplate code that TPL Dataflow otherwise takes care of.
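    A minimal sketch of that wiring, assuming a hypothetical Strategy class with an OnTick method (the names are illustrative, not from the posts above):

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Sketch: the consumer logic lives in an ordinary private method,
// which is passed to the ActionBlock constructor as an Action<long>.
public class Strategy
{
    private long _handled;
    private readonly ActionBlock<long> _block;

    public Strategy()
    {
        // No lambda needed: the method group OnTick converts to Action<long>.
        _block = new ActionBlock<long>(OnTick,
            new ExecutionDataflowBlockOptions { SingleProducerConstrained = true });
    }

    private void OnTick(long tick) => _handled++;

    public bool Feed(long tick) => _block.Post(tick);

    public long HandledCount => _handled;

    // Drain the block so HandledCount is final.
    public Task CompleteAsync()
    {
        _block.Complete();
        return _block.Completion;
    }
}
```

    Requires the System.Threading.Tasks.Dataflow package; the compiler converts the method group to an Action<long>, so there is no lambda anywhere in the hot path.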

    I recommend you now run the thing with your actual algorithms, because most likely your bottleneck is no longer the passing of messages.

     
    #62     Jul 26, 2012
  3. DevBrian

    DevBrian

    True, my bottleneck has always been the strategies themselves, and none of our strategies can process more than perhaps 500k messages a second. Is this your experience as well?

    I ask because it makes one question the usefulness of any data stream optimized to read faster than your fastest strategy.
     
    #63     Jul 26, 2012
  4. Same issues here, but I manage to process ticks as part of a full strategy a lot faster than 500k/second. You want to look at basic optimization techniques, but first of all I would run the code through a profiler, as another poster suggested.

     
    #64     Jul 27, 2012
  5. And then learn proper programming ;)

    * Get rid of all floating point (double, float); use integer arithmetic.

    * Having done that, NEVER EVER do a division; always use multiply/shift, which is many times faster.

    The result is a well-defined result and granularity on all operations (contrary to floats, where you would have to round) and a performance difference of up to a factor of 50. PLUS, on many processors the FPU is a much more limited resource than the integer unit ;)
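    A small sketch of the multiply/shift trick for one concrete case (the constants are illustrative; each divisor needs its own precomputed reciprocal and valid input range):

```csharp
public static class DivideFree
{
    // Replace x / 10 by a multiply with a scaled reciprocal and a right shift.
    // ceil(2^19 / 10) = 52429; with shift 19 this is exact for 0 <= x < 262144.
    private const long RecipTen = 52429;

    public static long DivBy10(long x) => (x * RecipTen) >> 19;
}
```

    For example, DivBy10(12340) yields 1234. Compilers already do this automatically for division by a compile-time constant; the manual form matters when the divisor is only known at startup, such as a tick size.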
     
    #65     Jul 27, 2012
  6. Getting rid of all double and float variable types may not apply to all trading strategies. You could go lower and lower level and arrive at coding assembler, but that would get into a very deep discussion about the pros and cons. Not everything has to run on integer values; the speedup sometimes warrants it, yes, but often it is negligible.

    It's hard to judge what bogs down code without looking at it; that's why I recommended a profiler. I would not go as far as you in postulating to get rid of all non-integer types. I run on floats and doubles and do more than fine. I guess your advice comes in handy for really tweaking things at the end, but I am afraid we are dealing with much more subtle, trivial programming inefficiencies here. Just polling the queue count on every pass of the while loop is, by itself, a no-go.

     
    #66     Jul 27, 2012
  7. Actually, it applies wonderfully to trading.

    Start with bars: open, high, low, close are TICKS, not arbitrary numbers. So store and aggregate them in ticks.

    Then go from there, say another 6 digits (a hard factor), and use integers from then on. That is a fixed factor from the original bar to the expanded-resolution one. But instead of 6 digits (like 1.000.000) use a power-of-2 number, so going down to the original tick resolution is a simple shift.

    You may not be able to avoid ALL divisions, but most of them.

    Obviously that does not help if some overly smart developers decided to use floats all over their framework.

    It is EXTREMELY expensive to deal with any price comparison after any mathematical operation in floats, because not only do you have to do the operation, you also have to round the result to the proper resolution afterwards, which means you are basically wasting a LOT of cycles.

    Even something like (A + B) * TickSize in float is NOT guaranteed to be equal to A * TickSize + B * TickSize - the two floats may be CLOSE, but not IDENTICAL.

    Do that in ints instead, and the comparison is exact; then I compare:

    ((A * TickSize) >> X) + ((B * TickSize) >> X).

    Funny thing is that >> X is a VERY cheap operation these days thanks to barrel shifters - most likely 1 cycle.
    The alternative is:

    Math.Round(A*TickSize, x) + Math.Round(B*TickSize,x)

    And you do NOT want to know the cost of the Math.Round operation. It is EXPENSIVE.

    That is NOT relevant when you do real-time processing, but when you do extensive backtests plus optimization it may cut processing times by a factor of 10 and more, depending on how heavy the operation is ;)

    When you use modern Opterons you get the additional benefit that the two cores of a Bulldozer module... have ONE SHARED FPU - but two separate integer units ;)
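    A sketch of the failure mode, with an illustrative tick size of 0.1 (which has no exact binary representation): comparing a computed price against a quoted double misfires, while the same comparison in integer ticks is exact.

```csharp
public static class TickMath
{
    public const double TickSize = 0.1; // illustrative; not exactly representable in binary

    // Price arithmetic in doubles: 1 tick + 2 ticks computed as doubles gives
    // 0.30000000000000004, which does not equal the quoted price 0.3.
    public static bool EqualAsDoubles(long a, long b, double quotedPrice)
        => a * TickSize + b * TickSize == quotedPrice;

    // The same comparison in integer ticks is exact; convert back to a display
    // price with a single multiply only at the very end.
    public static bool EqualAsTicks(long a, long b, long quotedTicks)
        => a + b == quotedTicks;
}
```

    EqualAsDoubles(1, 2, 0.3) is false, EqualAsTicks(1, 2, 3) is true - which is the whole point of staying in ticks until display time.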
     
    #67     Jul 27, 2012
  8. With all due respect, I think you did not read my post carefully. I never claimed it would not apply to trading; quite the contrary, I stated clearly that it has its applications and can speed things up. I claimed that the previous poster is apparently dealing with much more trivial issues in his code, issues that could hand him at least an order-of-magnitude speedup if he implemented more efficient code. One aspect is what you pointed him to, but it can be very complicated and time-consuming to implement.

    Please consider that most financial and math/quant libraries DO USE double or float variable types; you would have to rewrite ALL such libraries, or write your own from scratch, if you wanted to go all-integer. In most cases that is neither feasible nor advisable. We do not need to argue about your points: yes, they are correct, and I think most everyone will agree with your suggestions. But you need to balance the benefit against the time it takes to implement and against the specific libraries the project is using. Blanket statements are rarely the solution. It really depends on what computations the code within the strategy in question is performing.

    There are very important differences even between dividing a float by 2 and multiplying it by 0.5, in terms of precision and computational cost. Yes, all of that matters, but my claim is: first tackle the issues that can be adjusted and changed in a matter of minutes, rather than delving right into issues that require hours, if not days, to carefully plan, think through, and then implement.

     
    #68     Jul 27, 2012
  9. I agree. And there is ONE thing only that solves it efficiently - a profiler. Plus possibly some simple mathematical optimizations. For example, I know one library where a moving average is calculated by adding up all the elements in the window, then dividing by the number of elements.

    Every time.

    Instead of adding everything up each time, remember the sum for the next run, then subtract the element falling out and add the one coming in.

    For a longer moving average that is significant (50 additions versus 1 addition and 1 subtraction).
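    A sketch of that incremental form (class and member names are illustrative): one addition and one subtraction per update, with the single remaining division done once per result.

```csharp
// O(1)-per-update simple moving average: keep a running sum over a ring buffer,
// subtract the element falling out, add the one coming in.
public class RollingMean
{
    private readonly double[] _window;
    private int _next;
    private int _count;
    private double _sum;

    public RollingMean(int length) { _window = new double[length]; }

    public double Add(double x)
    {
        _sum += x - _window[_next];          // 1 addition, 1 subtraction
        _window[_next] = x;
        _next = (_next + 1) % _window.Length;
        if (_count < _window.Length) _count++;
        return _sum / _count;                // the one remaining division
    }
}
```

    One caveat: over very long runs the running sum can drift in floats; kept in integer ticks, as advocated above, it stays exact.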

    But a profiler will show you where the problem is straight away. The one from Visual Studio (mind you, not in the lower tiers) is VERY good at that and ALSO properly helps with debugging issues in TPL Dataflow.

    Now we just need Visual Studio 2012 released soonish... its profilers are a LOT better than the ones from 2010.
     
    #69     Jul 27, 2012
  10. I run the VS11 beta and the profiler is awesome, especially the one profiling concurrent operations. I love it.

    P.S.: By the way, if the compiler and processor do their job, the whole float operation is pipelined. So if you have a loop, for example, the entire loop will only take x cycles longer to run on the FPU, rather than each loop iteration paying the cost. The difference between running on ints versus floats then becomes negligible. Even running a decent loop of 1000 elements on floats versus ints, assuming about a 5-cycle overhead on each iteration, will only cost you an extra 10 or so microseconds, depending on how your CPU is clocked. And a 5-cycle overhead on the whole loop is so small that you would not even be able to measure it on a Windows machine.

     
    #70     Jul 27, 2012