SQL for trading...?

Discussion in 'Trading Software' started by CloroxCowboy, Jul 1, 2009.

How do you feel about SQL?

  1. Good tool for trading.

    27 vote(s)
    47.4%
  2. Awful tool for trading.

    11 vote(s)
    19.3%
  3. Never tried it / no opinion.

    18 vote(s)
    31.6%
  4. Heard so many bad things I'd never want to try it.

    1 vote(s)
    1.8%
  1. I do not dispute that there is no SIMD. See, .NET also has no coffee maker.

    Sadly, both of those are irrelevant for TRADING EXECUTION, as opposed to data analysis. Execution is basically:

    * Orders come in.

    * Distribute them to one "subsystem" per instrument.

    * Either queue them (ordered by price - for limit/stop orders)
    * Or match them against the queued orders.

    From there, executions need to be logged (obviously) and communicated back to the "terminals" (in a technical sense: end points).
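
    A rough sketch of that distribution step (the class and field names here are mine, not anything from the actual system - each per-instrument thread then drains its own queue):

    [code]
    using System.Collections.Generic;

    // Illustrative order record; a realistic field layout comes up later in the thread.
    class Order { public string Symbol; public int Amount; public int PriceTicks; }

    // One queue per instrument; each queue is drained by exactly one matching thread.
    class Dispatcher
    {
        private readonly Dictionary<string, Queue<Order>> queues =
            new Dictionary<string, Queue<Order>>();

        public void Dispatch(Order order)
        {
            Queue<Order> q;
            if (!queues.TryGetValue(order.Symbol, out q))
            {
                q = new Queue<Order>();
                queues[order.Symbol] = q;
            }
            lock (q) { q.Enqueue(order); } // the per-instrument thread dequeues and matches
        }
    }
    [/code]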

    In none of this would SIMD be used to any degree. So, nice statement - still irrelevant. With the high amount of work going on, though, a multi-threaded architecture and the avoidance of locks are paramount. There is a reason .NET 4.0 finally introduces spinlocks - they are great for this scenario, and most developers are too stupid or ignorant to know the difference between a normal lock mechanism and a spinlock, so they cannot program one themselves ;)
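
    For illustration, this is roughly how the new .NET 4.0 System.Threading.SpinLock would guard such a hot queue (a sketch - the wrapper class is mine):

    [code]
    using System.Collections.Generic;
    using System.Threading;

    class SpinLockedQueue<T>
    {
        private readonly Queue<T> items = new Queue<T>();
        // SpinLock is a struct: no allocation, but never copy or reassign it.
        private SpinLock spin = new SpinLock(false); // no owner tracking

        public void Enqueue(T item)
        {
            bool taken = false;
            try
            {
                spin.Enter(ref taken); // spins briefly instead of a kernel-level wait
                items.Enqueue(item);
            }
            finally { if (taken) spin.Exit(); }
        }

        public bool TryDequeue(out T item)
        {
            bool taken = false;
            try
            {
                spin.Enter(ref taken);
                if (items.Count > 0) { item = items.Dequeue(); return true; }
                item = default(T);
                return false;
            }
            finally { if (taken) spin.Exit(); }
        }
    }
    [/code]

    The point of the spinlock: when the critical section is a handful of instructions, burning a few cycles spinning is cheaper than letting the OS put the thread to sleep and wake it up again.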

    That being said, the garbage collector can be avoided (a lesson long learned in Direct3D programming and embedded Java, which incidentally has no garbage collector) by not creating garbage in the first place (structs are not garbage collected, and objects can otherwise be reused). Most .NET developers sadly have no clue about high performance programming.
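
    To illustrate the "no garbage in the first place" point (a sketch with made-up names): value-type slots in a preallocated array live inline in the array's memory, so acquiring and releasing them never gives the collector anything to do.

    [code]
    // Sketch only - a real pool would keep a free list instead of scanning.
    struct OrderSlot
    {
        public long OrderId;
        public int PriceTicks;
        public int Amount;
        public bool InUse;
    }

    class OrderPool
    {
        private readonly OrderSlot[] slots;

        public OrderPool(int capacity)
        {
            slots = new OrderSlot[capacity]; // one allocation, up front
        }

        // Reuse a free slot instead of new-ing an object per order.
        public int Acquire(long id, int priceTicks, int amount)
        {
            for (int i = 0; i < slots.Length; i++)
            {
                if (!slots[i].InUse)
                {
                    slots[i].OrderId = id;
                    slots[i].PriceTicks = priceTicks;
                    slots[i].Amount = amount;
                    slots[i].InUse = true;
                    return i;
                }
            }
            throw new System.InvalidOperationException("pool exhausted");
        }

        public void Release(int index)
        {
            slots[index].InUse = false; // the slot is reused; nothing is collected
        }
    }
    [/code]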

    I stand by my words: this is most likely crap programming going on there.
     
    #51     Jul 4, 2009
  2. I am not so sure about THAT. I mean, seriously... This is how I would do it: SQL Server as a data dump, with updates going through a queueing mechanism to make sure short spikes do not bog things down, and all relevant data kept in memory during normal operations. On a restart you first need to drain the queue by processing it, then reload memory - but that startup time is acceptable, as this service should not require restarts under normal operations.
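
    A sketch of that queueing idea (the table, connection string and names are made up, and .NET 4's BlockingCollection stands in for whatever queue mechanism you would actually use) - the matching path only enqueues, and a background writer drains into SQL Server:

    [code]
    using System.Collections.Concurrent;
    using System.Data.SqlClient;
    using System.Threading;

    class ExecutionWriter
    {
        // Spikes pile up here, in memory - not in the matching path.
        private readonly BlockingCollection<string> queue = new BlockingCollection<string>();

        public ExecutionWriter()
        {
            new Thread(DrainLoop) { IsBackground = true }.Start();
        }

        // Called from the hot path: cheap, never touches the database.
        public void Persist(string executionRow)
        {
            queue.Add(executionRow);
        }

        private void DrainLoop()
        {
            using (var conn = new SqlConnection("Server=...;Database=Trading;..."))
            {
                conn.Open();
                foreach (string row in queue.GetConsumingEnumerable())
                {
                    using (var cmd = new SqlCommand(
                        "INSERT INTO Executions (Data) VALUES (@d)", conn))
                    {
                        cmd.Parameters.AddWithValue("@d", row);
                        cmd.ExecuteNonQuery();
                    }
                }
            }
        }
    }
    [/code]

    On restart, anything still sitting in a persistent queue gets processed first, then the in-memory state is reloaded - exactly the startup cost described above.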

    But some developers may actually use the SQL Server for handling the transactions and managing the order queues (as in: processing them there etc.), and that is slower than doing it in memory. I have seen the shit hit the fan with so-called "qualified developers" when I was doing SQL Server Level 3 support for Microsoft. The biggest morons actually moved the complete application INTO THE SQL SERVER (stored procedures containing .NET code, and COM objects), with the front end doing ONLY visualization - and I mean only: even simple validation like "is that a valid credit card number", which is a checksum calculation, was done in a stored procedure. And then they wondered why the db server and network were so heavily used.

    Sadly (so to say) this is not a common line-of-business application, so... and I bet the team in question were not day traders and had never worked in a brokerage, so they may simply lack any "idea" of what goes on. And I am not so sure how much one CAN blame MS for that... It looks like the contract partner was Accenture, and MS may have had strict guidelines on what they were to look at, recommend etc. Again, the amount of petty politics in those projects is ridiculous. "Do an architecture review, but don't complain that the architecture is bad" happens more often than one would believe.
     
    #52     Jul 4, 2009
  3. thstart

    A good point. ;)

    But you still have to store this data somewhere, right?

    They can be used. ;) I suppose the orders are first stored in memory. The data could be packed and compressed with SIMD to move faster through the pipe.

    It is possible, but not always. I read the application note from MS about Event Flow Processing where low latency is claimed, but the performance is still not enough.

    Definitely this is part of the problem, but it is still unbelievable that MS would allow this to happen if they knew how to do it. At the time this was presented as PR for how good .NET was at HPC computing.
     
    #53     Jul 4, 2009
  4. How? The contract was with Accenture - I have seen this shit happen from both sides (including the MS side, incidentally). Accenture was the contract partner; MS was "in the boat", but I bet they were not responsible.

    [Quote]But you still have to store this data somewhere, right?[/Quote]

    No. Well, not for processing. I can easily use SQL Server as an async "data store" for the results of orders, and handle all processing in memory. Naturally, if you manage order "lists" and matching in SQL Server... single point of performance locking ;)

    Don't get ridiculous. The amount of data in one order is so small you cannot sensibly compress it, especially not with SIMD. If an order is larger than 32 bytes something is REALLY wrong here - and that may already be way too much (note: the order itself, not management overhead like a linked-list header). No sense in compressing anything.

    What does an order have?

    * Account - coded, 4 byte. May include clearing firm.
    * Id / Reference. 8 byte, guid
    * Amount 4 byte
    * Stock - 2 byte
    * Price - 4 byte, stored as ticks
    * limit - 4 byte
    * order type plus duration (gtc etc.) - 1 byte

    Hm, count so far is 27

    Add a timestamp and a number and you may get to 48 bytes, 64 including management info for linked lists. And quite a lot of that could be optimized (price at 4 bytes - that is a lot; 2 may do).
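
    For illustration, that layout as a packed C# struct (the field names are mine; the sizes follow the list above):

    [code]
    using System.Runtime.InteropServices;

    // Packed to match the counts above: 4 + 8 + 4 + 2 + 4 + 4 + 1 = 27 bytes.
    // Marshal.SizeOf(typeof(ExchangeOrder)) confirms 27 with Pack = 1.
    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    struct ExchangeOrder
    {
        public int Account;       // 4 bytes, coded; may include clearing firm
        public long Reference;    // 8 bytes, id/reference
        public int Amount;        // 4 bytes
        public short Stock;       // 2 bytes
        public int PriceTicks;    // 4 bytes, price stored as ticks
        public int LimitTicks;    // 4 bytes
        public byte TypeDuration; // 1 byte, order type plus duration (GTC etc.)
    }
    [/code]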

    I doubt compressing it to move it faster through the pipe is worth anything here. And if you do compress (as I do with the tick streams I store), SIMD would again not be useful at all. SIMD is for very specific uses. This simply does not fit - no vectors to start with.

    Programming fuckup. Replaced with proper software that just by chance runs on Linux ;)
     
    #54     Jul 4, 2009
  5. thstart

    It seems to me MS or the others wanted to use their whole stack in this project: .NET, MS SQL, the OS. If this is just a pass-through problem - just passing the data along as fast as possible - they only had to use the OS plus some programming. But still, the data has to be stored somewhere, and they didn't have to use MS SQL for that.

    So you worked for MS?

    The problem I see is that all these stacks and stacks and stacks of application frameworks complicate things a lot. No matter how much effort you put into learning all these technologies, it is currently simply not possible to know all of them well.

    By knowing well I mean not just reading about them but building a real working application with benchmarks, tests, etc. Recently I was involved in a project to integrate external Data Quality web services into an SAP application. When I saw their "stack" it was simply impossible to understand quickly what they were doing. Their support looped for 3 weeks before producing a patch, and it is not sure it will work. Their frameworks are GBs and GBs of downloads, with different versions of one application working only with different versions of another application.

    In the end I proposed to our partners that we make the web service interface to SAP using REST with XML responses. They created their application in 3 hours with no proxy generation, just with Notepad and copy/paste, and don't want to hear about SOAP again. I didn't try to convince them to return to SOAP either ;)

    The point is - the simpler solution always wins, if it is done right.
     
    #55     Jul 4, 2009
  6. thstart

    So you process just 1 order at a time, serially?
     
    #56     Jul 4, 2009
  7. Yes, I was involved with MS - not as an employee, though. I also have about 10 years of .NET experience (which is possible - some people worked with it before it was announced).

    Now, yes, I would process one order after the next, serially.

    Here is the catch: that is how it HAS to be done, according to exchange rules. You can parallelize it, but not for the same instrument.

    Market orders must be processed in the order they arrive. Stop orders turn into market orders when the price is hit. This means they need to be matched one by one. Maybe you can use some peeking (match a market order against other incoming market orders), but you cannot do that sensibly with a high degree of parallelism.

    This means you need, in the end, a queue of "orders" and a queue of "watched orders" (stops, limits) for every instrument, and then a single-threaded processing system there. When an order arrives, you match it, then requeue whatever orders are to be requeued (stop orders that have just become market orders, for example).

    There is no other way than to go serially at this point - not under the constraints the regulations impose on the market.

    Architecturally, you can basically have a dispatcher that splits orders by instrument, but then you are in "one process, one thread" territory by definition.

    That said, this is not anything bad. Properly programmed, a single thread on a modern processor should not have a problem matching about 50,000 to 100,000 orders per second. Remember, it would do nothing else - once matched etc., the orders get written to an outgoing queue (in memory), then moved to a persistent queue / memory-mapped log file, THEN processed into the database and communicated to the terminals.

    The thread is doing nothing more than constantly matching orders.
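
    As a sketch (my names; the actual matching against the book is elided here - the later posts get into it), the per-instrument worker is nothing more than this loop:

    [code]
    using System.Collections.Concurrent;
    using System.Threading;

    class Order { public long Id; public int PriceTicks; public int Amount; }
    class Execution { public long OrderId; public int PriceTicks; public int Amount; }

    // One of these per instrument: a single thread that does nothing but match.
    class InstrumentMatcher
    {
        private readonly BlockingCollection<Order> incoming = new BlockingCollection<Order>();
        private readonly BlockingCollection<Execution> outgoing = new BlockingCollection<Execution>();

        public void Start()
        {
            new Thread(MatchLoop) { IsBackground = true }.Start();
        }

        public void Submit(Order o) { incoming.Add(o); } // called by the dispatcher

        private void MatchLoop()
        {
            foreach (Order o in incoming.GetConsumingEnumerable())
            {
                // Match against the in-memory book, requeue triggered stops, etc.
                // Executions are only handed off; persistence happens downstream.
                foreach (Execution e in MatchAgainstBook(o))
                    outgoing.Add(e);
            }
        }

        private Execution[] MatchAgainstBook(Order o)
        {
            return new Execution[0]; // book logic elided in this sketch
        }
    }
    [/code]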
     
    #57     Jul 4, 2009
  8. thstart

    Matching involves a search, right? Faster search means faster matching.
    Faster matching - faster execution.
    Do you see where this is going?

    The system in question does not reach more than 10,000 orders/sec.

    This part would be slow with MS SQL.
    What do they do at other exchanges?
     
    #58     Jul 4, 2009
  9. thstart

    I don't know if this new feature will help now, but 2 years ago .NET was obviously not ready for the task we are talking about.
     
    #59     Jul 4, 2009

  10. Matching does NOT involve searching, interestingly enough. I suggest you sit down and start thinking the problem over.

    There are a bid and an ask. Matching an order means nothing more than going to the best of the other side (first in list) and filling against the most appropriate order (first in list AGAIN). No search involved at all. If you run out of matching orders at a price, you just take the next price. Two linked lists of linked lists.

    When limit orders arrive, you put them at the end of the proper list. This is a hashtable lookup (list by price), so now we have linked lists of linked lists, and the linked lists are also in a hashtable. All of that can be prepared. No "search" involved that requires more logic.

    The point here is mostly that:
    * "Search" is only by price (nothing else matters)
    * Prices are not arbitrary but have a minimum step (in fact, this is why NxCore encodes them as integers in the stream)

    Ergo: no "search" involved that requires a database, indices etc. This is a simple special case, one that was solved ages ago with the introduction of hashes.

    Searches, interestingly enough, only come into play when you get cancels and/or changes to orders - cancels to remove, changes to remove and reinsert. But then again, there is an exchange ID, so this can AGAIN be done with a hashtable.
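
    Sketched out (my names; Dictionary plays the hashtable), one side of the book described here looks roughly like this:

    [code]
    using System.Collections.Generic;

    class RestingOrder { public long ExchangeId; public int Amount; public int PriceTicks; }

    // Linked lists of orders per price, the price levels in a hashtable,
    // plus an id hashtable so cancels/changes are lookups, not searches.
    class BookSide
    {
        private readonly Dictionary<int, LinkedList<RestingOrder>> levels =
            new Dictionary<int, LinkedList<RestingOrder>>();
        private readonly Dictionary<long, LinkedListNode<RestingOrder>> byId =
            new Dictionary<long, LinkedListNode<RestingOrder>>();

        // Limit order arrives: append to the proper price list (time priority).
        public void Add(long id, int amount, int priceTicks)
        {
            LinkedList<RestingOrder> level;
            if (!levels.TryGetValue(priceTicks, out level))
            {
                level = new LinkedList<RestingOrder>();
                levels[priceTicks] = level;
            }
            var node = level.AddLast(new RestingOrder
                { ExchangeId = id, Amount = amount, PriceTicks = priceTicks });
            byId[id] = node;
        }

        // "First in list" at a price - no search.
        public RestingOrder FirstAt(int priceTicks)
        {
            LinkedList<RestingOrder> level;
            if (levels.TryGetValue(priceTicks, out level) && level.First != null)
                return level.First.Value;
            return null;
        }

        // Cancel by exchange id: hashtable lookup plus O(1) unlink.
        public bool Cancel(long id)
        {
            LinkedListNode<RestingOrder> node;
            if (!byId.TryGetValue(id, out node)) return false;
            node.List.Remove(node);
            byId.Remove(id);
            return true;
        }
    }
    [/code]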

    Keeping all this in memory (and even 2 years ago a 64-bit system could hold quite a lot) gives you extreme speed here.

    Yes - to you not having thought this over.

    Which means I could handle that on - hm - one mid-size server with plenty of reserve. Heck, a quad-core end-user system should be able to handle that ;) The number of instruments, and thus the RAM requirements, would possibly make it sensible to distribute this over multiple machines ;)

    Sensible architecture.

    This part MAY be slower under peak load, but it does not have to happen in real time. One can write the transactions out to a memory-mapped file / message queue (MSMQ, anyone?) and process those with a small delay. Note that this would not mean people don't get their fills in time - just that the internal bookkeeping may run 50 or 100 ms, or a second, behind in stress situations. This is all a matter of scaling. The only thing one must be sure of is transactional integrity.
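
    As a sketch of that write-out (using .NET 4's memory-mapped files; the file name, record shape and recovery handling are made up):

    [code]
    using System.IO;
    using System.IO.MemoryMappedFiles;

    // Append fixed-size execution records to a memory-mapped journal.
    // A slower consumer replays the file into the database with a small delay;
    // the matcher itself never waits on the database.
    class ExecutionJournal
    {
        private const int RecordSize = 24; // id (8) + price (4) + amount (4) + timestamp (8)
        private readonly MemoryMappedViewAccessor view;
        private long position; // a real journal would persist and recover this offset

        public ExecutionJournal(string path, long capacityBytes)
        {
            MemoryMappedFile file = MemoryMappedFile.CreateFromFile(
                path, FileMode.OpenOrCreate, null, capacityBytes);
            view = file.CreateViewAccessor();
        }

        public void Append(long id, int priceTicks, int amount, long timestamp)
        {
            view.Write(position, id);
            view.Write(position + 8, priceTicks);
            view.Write(position + 12, amount);
            view.Write(position + 16, timestamp);
            position += RecordSize;
        }
    }
    [/code]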

    This runs, btw., into a non-MS issue here. You need a proper database for proper accounting. The choice is not that large here, and the candidates do not have large differences in performance. If that part was so slow, it was due to either badly set up / badly scaled servers (moron admin), or bad architecture (moron architects), or stupid programmers (moron programmers).

    What new feature?

    The CONCEPT of spinlocks is a little older than 2 years. That MS decided to put SpinLock into the core framework only now (including lightweight locks that are only synchronized per process, not per machine) does not mean this stuff was not usable - by a non-moron programmer - 5 years ago. He just had to have a LITTLE more knowledge of actual PROGRAMMING at a lower level (as opposed to just using framework classes).

    A spinlock takes not even a day to implement - either in C# or in managed C++, so it is easily usable from C#. Microsoft is not doing any "oh, magic" here. They just decided this stuff is used often enough to put it into the core.
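
    For the record, a from-scratch spinlock of the kind meant here really is that small (a sketch; a production version would add backoff and a fallback to sleeping):

    [code]
    using System.Threading;

    // Minimal hand-rolled spinlock - works on .NET 2.0, no framework magic needed.
    class SimpleSpinLock
    {
        private int held; // 0 = free, 1 = held

        public void Enter()
        {
            // Atomically swap 0 -> 1; if another thread holds it, spin briefly.
            while (Interlocked.CompareExchange(ref held, 1, 0) != 0)
            {
                Thread.SpinWait(20); // burn a few cycles instead of a kernel wait
            }
        }

        public void Exit()
        {
            Interlocked.Exchange(ref held, 0); // release with a full fence
        }
    }
    [/code]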

    Which means - and that is what I have been saying all along - that the programmers were not too smart.
     
    #60     Jul 4, 2009