Tick Database, Now Want to Run SQL

Discussion in 'Data Sets and Feeds' started by bscully27, Jun 28, 2012.

  1. The key here is CONCURRENCY. If you target .NET, let me know and I can give you a couple of pointers. I run on the new TPL Dataflow library, which is blazing fast and works very well despite still being in beta. It's an amazing new library and really picks up where Rx left off. Also, make sure you store data in binary format; most often your bottleneck is I/O. And do not read byte by byte or in small chunks of bytes. Read a significant chunk into a byte array and process it in memory, minimizing I/O overhead.
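
To illustrate the chunked-read advice, here is a minimal sketch. The record layout (an 8-byte timestamp followed by an 8-byte price), the `TickReader` class, and the `ReadTicks` method are all hypothetical and not from this thread; a real tick store will have its own format.

```csharp
using System;
using System.IO;

static class TickReader
{
    // Hypothetical fixed-size record: 8-byte timestamp + 8-byte price.
    const int RecordSize = 16;

    // Reads the file in large chunks and hands each decoded tick to onTick.
    // Returns the number of complete records processed.
    public static long ReadTicks(string path, Action<long, double> onTick,
                                 int recordsPerChunk = 65536)
    {
        // One big buffer, sized to a whole number of records.
        var buffer = new byte[RecordSize * recordsPerChunk];
        long count = 0;
        int filled = 0;

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 4096,
                                       FileOptions.SequentialScan))
        {
            int read;
            while ((read = fs.Read(buffer, filled, buffer.Length - filled)) > 0)
            {
                filled += read;
                int usable = filled - (filled % RecordSize);

                // Decode every complete record currently in the buffer.
                // BitConverter uses the machine's byte order
                // (little-endian on x86/x64).
                for (int offset = 0; offset < usable; offset += RecordSize)
                {
                    long timestamp = BitConverter.ToInt64(buffer, offset);
                    double price = BitConverter.ToDouble(buffer, offset + 8);
                    onTick(timestamp, price);
                    count++;
                }

                // Carry a partial trailing record over to the next read.
                int leftover = filled - usable;
                Buffer.BlockCopy(buffer, usable, buffer, 0, leftover);
                filled = leftover;
            }
        }
        return count;
    }
}
```

With the default chunk size this issues one read per megabyte of data rather than one per record; the right chunk size depends on your disk and is worth measuring.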

     
    #51     Jul 25, 2012
  2. It's a huge misperception that concurrency automatically means locks, and that spinning off tasks, threads, or dataflow blocks automatically adds overhead. Neither has to be true if it's done the right way. There are plenty of ways to use concurrent collections with minimal locking.
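
As one possible illustration of that last point, the sketch below has several producers fill a `ConcurrentQueue<long>` without any explicit `lock` statement in user code. The `QueueDemo` class and its `Run` method are made up for this example; the collection type is the standard one from `System.Collections.Concurrent`.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class QueueDemo
{
    // Several producers enqueue concurrently; the caller then drains the queue.
    // Returns the number of items drained.
    public static long Run(int producers, int itemsPerProducer)
    {
        var queue = new ConcurrentQueue<long>();

        // Each iteration acts as an independent producer.
        Parallel.For(0, producers, p =>
        {
            for (int i = 0; i < itemsPerProducer; i++)
                queue.Enqueue(i);
        });

        // Drain on the calling thread once all producers have finished.
        long handled = 0;
        long item;
        while (queue.TryDequeue(out item))
            handled++;

        return handled;
    }
}
```

Calling `QueueDemo.Run(4, 1000)` drains 4000 items. Whether this beats a hand-rolled queue in a given workload is something to measure rather than assume.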

     
    #52     Jul 25, 2012
  3. DevBrian


    I believe my bottleneck is just the call stack.

    I'm currently using my own implementation of a producer-consumer pattern. It succeeds in increasing the end-to-end throughput of the system, but it has limits as well. I've looked at the Dataflow library, and it's fast.

    But...

    Doing:

    while (true)
    {
        if (_bufferBlock.Count < 10000000)
            _bufferBlock.Post(1);
    }


    It handles about 1 million messages a second doing:

    ActionBlock<long> actionBlock = new ActionBlock<long>(l => { _handled++; });

    But below only handles 400-500k messages per second:

    ActionBlock<long> actionBlock = new ActionBlock<long>(
        l => { actionBlock_Handle(l); });

    private void actionBlock_Handle(long l)
    {
        _handled++;
    }

    The difference seems to be just the call stack. And any real-world application will have some private method to call. So even with TPL, I'm still looking at a 400-500k max bottleneck.
     
    #53     Jul 25, 2012
  4. Want to believe? Go to church.

    Want to fix the problem?

    Pull out a profiler and KNOW where you waste the performance. Simple as that.

    It is likely a little more complex than you think, but a profiler will just pinpoint the issue within a minute or two ;)
     
    #54     Jul 25, 2012
  5. You are wasting a lot of precious resources:

    a) You do not need a BufferBlock. Just Post, or better SendAsync, directly to an ActionBlock (if it is the endpoint) or a TransformBlock. But even that is not the intended use: only the producer should Post or SendAsync; all other dataflow blocks should be linked to each other through LinkTo.

    b) Calling Count each time is a waste. You can set the capacity of a dataflow block's internal queue directly and have the block throttle the producer once the capacity is reached. Before I comment on your "handle counter", how do you feed the ActionBlock with items? Did you link the BufferBlock to the ActionBlock? If so, see a): post directly to the ActionBlock. I just tried it myself and get 24 million messages per second on a one-year-old quad-core machine (just incrementing the counter).

    c) In general, do not configure the ActionBlock or any dataflow block with a method call but rather with an Action<T>.

    As mentioned, if you do things right you should get to at least 22 million messages per second on a high-spec machine, given that you only measure the passing and receipt of messages (no processing other than the counter increments).
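
Points a) and b) could look roughly like the sketch below. The class and method names, the pass-through TransformBlock, and the capacity of 100000 are illustrative assumptions, not the code behind the numbers quoted in this thread. It needs the System.Threading.Tasks.Dataflow library.

```csharp
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

static class DirectPostDemo
{
    // Only the producer calls SendAsync; the blocks are chained with LinkTo.
    // Returns the number of messages the final ActionBlock handled.
    public static async Task<long> RunAsync(long messages)
    {
        long handled = 0;

        // BoundedCapacity replaces the manual Count check: once the input
        // queue is full, SendAsync does not complete until there is room.
        var options = new ExecutionDataflowBlockOptions { BoundedCapacity = 100000 };

        // Stand-in for a real processing stage (assumption: identity transform).
        var stage = new TransformBlock<long, long>(l => l, options);

        // Endpoint. The default MaxDegreeOfParallelism is 1, so the plain
        // increment is not raced by other invocations of this delegate.
        var counter = new ActionBlock<long>(l => { handled++; }, options);

        stage.LinkTo(counter, new DataflowLinkOptions { PropagateCompletion = true });

        for (long i = 0; i < messages; i++)
            await stage.SendAsync(i);

        stage.Complete();          // no more input
        await counter.Completion;  // wait until everything has drained
        return handled;
    }
}
```

A bounded queue may itself cost some throughput, so it is worth timing this against an unbounded configuration before settling on a capacity.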

     
    #55     Jul 25, 2012
  6. good point, forgot to mention that.

     
    #56     Jul 25, 2012
  7. DevBrian


    Thanks! I was under the impression I needed a BufferBlock for async behavior. Throughput is about 5 million messages per second by cutting that out.

    Not sure why, but setting the BoundedCapacity of the ActionBlock significantly hurts the performance. Checking InputCount before calling Post is performing much better... Seems odd.
     
    #57     Jul 25, 2012
  8. What you observe with BoundedCapacity makes sense, because it most likely runs additional checks on the internal queue, which may add overhead. Your numbers still sound low. I ran the same test on my very old 2-core laptop and still get 11 million items per second. Have you followed my advice to set up an Action<T> instead of calling a method each time? Or do you observe the 5 million messages per second while incrementing the counter directly within the ActionBlock?


     
    #58     Jul 26, 2012
  9. Hadoop
     
    #59     Jul 26, 2012
  10. What about it? I fail to see the relationship with tick databases.

     
    #60     Jul 26, 2012