Application of my fault-tolerant real-time technology

nbates · Jul 2, 2005

IMO, low-level scrubbing and monitoring of multiple real-time market data sources (with feed quality monitoring, steering and selection) is best done using an "intelligent peripheral" adapter (e.g. an embedded board with a bit of custom software/firmware and multiple network interfaces).

Not a big deal; it's just a matter of deciding what you want and putting it together.

An embedded kernel like VxWorks and the Green Hills or VX software development tools are a good combination when hosted on a good PCI single-board computer (SBC) with low-latency interrupts and support for block-mode DMA transfers.

Here's an example:

http://www.cyclone.com/products/index.php#pci
http://www.cyclone.com/products/tornado.php
http://www.cyclone.com/products/windows.php
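The feed quality monitoring, steering and selection described above could look roughly like the following Python sketch. It is only an illustration under assumptions, not nbates's design: each feed is scored by sequence gaps and by how recently it delivered a message, and the adapter routes from the healthiest live feed. `FeedStats`, `select_feed` and the `max_age` threshold are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FeedStats:
    """Hypothetical per-feed health counters kept by the adapter."""
    name: str
    last_seq: int = 0          # last sequence number seen (0 = nothing yet)
    last_update: float = 0.0   # local receive time of the last message, in seconds
    gaps: int = 0              # number of detected sequence gaps

    def on_message(self, seq: int, now: float) -> None:
        # A jump of more than one in the sequence number means lost messages.
        if self.last_seq and seq > self.last_seq + 1:
            self.gaps += 1
        self.last_seq = max(self.last_seq, seq)
        self.last_update = now


def select_feed(feeds, now, max_age=1.0):
    """Pick the feed with the fewest gaps among those updated within max_age seconds."""
    live = [f for f in feeds if now - f.last_update <= max_age]
    if not live:
        return None  # every feed is stale; the caller must decide what to do
    return min(live, key=lambda f: (f.gaps, now - f.last_update)).name
```

For example, if feed "A" delivers sequence numbers 1 and 3 while feed "B" delivers 1 and 2, `select_feed` prefers "B"; if neither has updated within `max_age`, it returns `None`. On a real embedded board this logic would live in the firmware, but the scoring idea is the same.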

Bsulli · Jul 2, 2005

For a moment I want to look at this from a business-model point of view. I'm going to peck out a portion of what one needs to ask oneself. I could spend the weekend typing out the additional questions in my head, but hey, it's a holiday weekend. lol

First, I admire your determination to provide solutions to problems; my hat's off to you! ET could use more folks with your attitude.

I'm going to try to consolidate several questions that you posed, or that others posed or suggested, earlier in this thread.

The first thing to identify is who these solutions are for: 1) retail, 2) brokers, 3) institutions, 4) some combination of the first three, or 5) all of the first three. Each one can and does have its own unique set of challenges.

Next set of items to identify:

Data: exchanges versus resellers (e.g. eSignal, QCharts, etc.). Next, are you as the customer the end point, mid point or starting point in the chain?

Brokers: Retail, Clearing Firm, Self Clearing Firm, others

1. Routing among multiple brokers from an end point.
The problem is that the instrument being traded is held in "street name" by the broker. So if I have an IB account, buy an e-mini contract, and IB goes down, I can't route the sell order to another broker to close the position. My platform would be telling the second broker that I want to sell to close a position. The second broker wouldn't recognize the order as closing a long and would ask whether I want to sell to open a position (go short), because its back-office risk management system has no record of the original long position; only the first broker's risk management system has that record.

2. Tick data scrubbing (for lack of a better, more encompassing description)

As far as data quality goes, it's only as good as what the exchange produces in house (true exchange data is single-source only). As for comparing data resellers on the quality of their tick data, you could do that as a service and have a product to sell (thinking of a report/newsletter format). However, measuring the resellers' latency is a whole other can of worms, because it depends on where your server farm is located geographically relative to the resellers' farms. And in addition to measuring the resellers' latency, you're forced to measure the latency added by the carriers.
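A reseller comparison of this kind might start from something like the Python sketch below. It assumes each tick is a tuple `(exchange_ts, price, size, recv_ts)` and that ticks can be matched on exchange timestamp, price and size; the function name and tuple layout are hypothetical. As the post points out, the measured latency mixes the reseller's processing with carrier transit, and it is only meaningful if the local clock is synchronized with the exchange.

```python
from statistics import median


def compare_feeds(reference, candidate):
    """Compare a candidate feed against a reference tick list.

    Each tick is (exchange_ts, price, size, recv_ts). Ticks are matched on
    (exchange_ts, price, size); identical duplicates collapse into one key,
    which is a simplification.
    """
    ref_keys = {t[:3] for t in reference}
    cand_recv = {t[:3]: t[3] for t in candidate}
    cand_keys = set(cand_recv)
    missing = len(ref_keys - cand_keys)  # ticks the candidate dropped
    extra = len(cand_keys - ref_keys)    # ticks the reference never had
    # Observed latency = local receive time minus exchange timestamp.
    latencies = [cand_recv[k] - k[0] for k in cand_keys & ref_keys]
    return {
        "missing": missing,
        "extra": extra,
        "median_latency": median(latencies) if latencies else None,
    }
```

Run against the same session from two resellers, the `missing`/`extra` counts speak to quality, while `median_latency` is only comparable between resellers measured from the same location.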

If this service were offered, there is a market segment that values quality over latency. Reselling the resellers' scrubbed data, on the other hand, would be a whole other matter from a contractual standpoint.

If you did a report on data resellers similar to what Gomez Advisors does for brokers, that could well push some of the data resellers to straighten up their acts! That alone would be doing the trading community a great service!

Again these are only a very few things to think about.

Keep pounding away at it! Look where it got Edison.

Good luck and good trading.

Bsulli

Joel Reymont · Jul 7, 2005

I decided to give it a go. I'm building a new ATS development platform on top of Erlang. I will focus on real-time trading systems and tick data. The core of the system will be open source but I'll make money on data feed handlers and add-ons.

This is a bit of a research project, so any suggestions and critique are welcome. I will also eat my own dog food and use the platform for trading.

There are several features in Erlang that make this the ideal platform for processing and analyzing high volumes of real-time data. Please take a look at the following links:

http://www.wallstreetandtech.com/showArticle.jhtml?articleID=164903661

http://db.riskwaters.com/public/showPage.html?page=printer_friendly&print=195098

http://www.dbta.com/in-depth/mar05/rugg-palmer.html

This is just the type of work that can be done cheaply with Erlang, and I'm surprised that no one has thought of it before.

Data feed adapters can be coded easily in Erlang, and it comes with a high-performance database that makes storing tick and instrument data easy. I will initially focus on tick data, as this should make my job easier: I won't have to deal with futures rollovers, etc.
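The database bundled with Erlang is Mnesia, and its API is not shown here. As a language-neutral illustration of the storage pattern, this Python sketch keeps ticks per symbol, sorted by timestamp, so that time-range queries are a pair of binary searches. `TickStore`, `insert` and `range` are hypothetical names.

```python
import bisect
from collections import defaultdict


class TickStore:
    """Hypothetical in-memory tick store: per-symbol lists kept sorted by timestamp."""

    def __init__(self):
        self._times = defaultdict(list)  # symbol -> sorted timestamps
        self._ticks = defaultdict(list)  # symbol -> (ts, price, size), same order

    def insert(self, symbol, ts, price, size):
        # Keep both lists aligned; late (out-of-order) ticks land in the right slot.
        i = bisect.bisect_right(self._times[symbol], ts)
        self._times[symbol].insert(i, ts)
        self._ticks[symbol].insert(i, (ts, price, size))

    def range(self, symbol, start, end):
        """Return the ticks for symbol with start <= ts < end, in time order."""
        times = self._times[symbol]
        lo = bisect.bisect_left(times, start)
        hi = bisect.bisect_left(times, end)
        return self._ticks[symbol][lo:hi]
```

Appending in timestamp order is the common case for a live feed, so inserts are usually at the end of the list; a production store would also need persistence and replication, which is what a database like Mnesia provides.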

Building grids and clusters is also simple, as Erlang is all about large numbers of very lightweight distributed processes running on a network of "nodes". Processes don't need to be aware of where on the network other processes are running; they just message each other.
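Python has no equivalent of Erlang's distributed processes, but the messaging style can be approximated on a single machine, as in the sketch below. Each "process" is a thread with its own mailbox (a `queue.Queue`), and senders address it by a registered name rather than by a reference to where it runs. `spawn`, `send` and `registry` are hypothetical names, and unlike Erlang this does not cross machine boundaries.

```python
import queue
import threading

registry = {}  # hypothetical name -> mailbox map, standing in for registered names


def spawn(name, handler):
    """Start a thread with its own mailbox; handler(msg) runs per message until None."""
    mailbox = queue.Queue()
    registry[name] = mailbox

    def loop():
        while True:
            msg = mailbox.get()
            if msg is None:  # sentinel: stop this "process"
                break
            handler(msg)

    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t


def send(name, msg):
    """Senders know only the name, not where or how the receiver runs."""
    registry[name].put(msg)


received = []
t = spawn("tick_logger", received.append)
send("tick_logger", ("ES", 1200.25))
send("tick_logger", None)
t.join()
print(received)  # [('ES', 1200.25)]
```

The design point is the indirection through the name: the receiver could be moved or restarted, and senders would not change.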

You should build yourself a server farm with real-time data feeds from various sources all going into the same database. I will enable this. And of course there's that 99.999% uptime.
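For a sense of scale, an availability figure translates directly into allowed downtime: downtime = (1 - availability) x time in the period. The short Python calculation below assumes an average year of 365.25 days.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, counting leap years


def downtime_minutes_per_year(availability):
    """Allowed downtime per year for a given availability fraction."""
    return (1 - availability) * MINUTES_PER_YEAR


print(round(downtime_minutes_per_year(0.99999), 2))  # 5.26 ("five nines")
```

So 99.999% uptime leaves roughly five minutes of outage per year, across every failure, upgrade and restart.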

The feed is at http://wagerlabs.com/uptick/atom.xml, or you can omit the atom.xml part to just read the blog.

I have basically waited for this for the past 10 years (see http://wagerlabs.com/resume.pdf) and now all the pieces of the puzzle have fallen into place. Please let me know what you think!

Joel Reymont · Jul 11, 2005

I'm looking for co-conspirators: traders who would tell me what they want and how they want it. Within limits that I set, of course. Currently these limits are tick-based trading of futures. Want to contribute? Let me know!

I am here to push the limits. I don't want to develop a me-too trading platform; I think there are large numbers of very good platforms out there, and I don't have to name them.

There's an explosion of tick data on Wall St, new instruments, etc. My technology shines when things can go parallel and concurrent, and when soft real-time response is required.

I want to maintain a laser-sharp focus on a small niche that I can attempt to dominate. I want a unique selling proposition and that is tick-by-tick trading of futures.

I fully intend to make the Uptick core open source, specifically so that people can iron out bugs collectively and have faith in the platform. This should help the platform spread and become a standard in the high-frequency trading niche. Then I can sell money-making add-ons built on top of the platform.

I see a nail and I've got just the right hammer. I want it to be the best hammer for this particular nail, and I absolutely do not want to develop a universal tool. I will let other people cover that angle.

What do you think?

Thanks, Joel

--
http://wagerlabs.com/uptick

tickdbguy · Jul 12, 2005

I have already done what you describe in Erlang. The Erlang grammar is ideal for expressing such things.

One challenge, however, is that Erlang does not have what I would describe as "Industrial Strength" debugging capabilities. So I suggest you use a test-driven approach to develop your system.

It turns out that Erlang, being a rather dynamic language, is well-suited for test-driven development.

Additionally, Erlang's bit syntax is ideal for low-level data feed parsing.
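As an illustration, in Erlang a fixed-layout binary message can be destructured in a single pattern match such as `<<Seq:32, Price:64/float, Size:32>> = Packet`. The closest Python analogue uses the standard `struct` module, shown below. The message layout (big-endian 32-bit sequence number, 64-bit float price, 32-bit size) is a hypothetical example, not any real exchange format.

```python
import struct

# Hypothetical fixed-layout tick message, big-endian, no padding:
# 32-bit unsigned sequence number, 64-bit IEEE float price, 32-bit unsigned size.
TICK = struct.Struct(">IdI")


def parse_tick(packet: bytes):
    """Decode one tick; raises struct.error if the packet length is wrong."""
    seq, price, size = TICK.unpack(packet)
    return {"seq": seq, "price": price, "size": size}


pkt = TICK.pack(42, 1201.5, 7)
print(parse_tick(pkt))  # {'seq': 42, 'price': 1201.5, 'size': 7}
```

Erlang's version has the advantage that the pattern match also fails cleanly on malformed input, which fits naturally with its "let it crash" error handling.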

Erlang use is growing. I'd be happy to field additional questions about the technology and its application in the financial trading systems space.

One caution, though: don't assume that because fault tolerance is easy to express in Erlang, it is also easy to achieve. Proving that you have attained a certain degree of robust fault tolerance is not easy, even with Erlang. Although I think Erlang is the best choice for distributed messaging on systems of fewer than 256 nodes, there is still quite a lot of work to do.

I also conjecture that Erlang may be a substitute for TIBCO and similar systems on networks of fewer than 256 nodes. It is quite flexible, and I've already built a subject-based order routing system with it in relatively few lines of code.
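Subject-based routing means publishers tag each message with a hierarchical subject and subscribers register interest with patterns. The Python sketch below is modeled loosely on TIBCO Rendezvous-style wildcards, where `*` matches exactly one dot-separated element and `>` matches all remaining elements. This is an assumption for illustration, not tickdbguy's system, and subjects like `ORDERS.CME.ES` are hypothetical.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """'*' matches one element; a trailing '>' matches one or more remaining elements."""
    p_parts, s_parts = pattern.split("."), subject.split(".")
    for i, p in enumerate(p_parts):
        if p == ">":
            return len(s_parts) > i  # at least one element must remain
        if i >= len(s_parts) or (p != "*" and p != s_parts[i]):
            return False
    return len(p_parts) == len(s_parts)


class Router:
    """Hypothetical subject-based router: delivers each message to every matching subscriber."""

    def __init__(self):
        self.subs = []  # list of (pattern, callback)

    def subscribe(self, pattern, callback):
        self.subs.append((pattern, callback))

    def publish(self, subject, msg):
        delivered = 0
        for pattern, callback in self.subs:
            if subject_matches(pattern, subject):
                callback(subject, msg)
                delivered += 1
        return delivered
```

For example, a subscriber on `ORDERS.CME.*` receives an order published on `ORDERS.CME.ES`, while one on `ORDERS.>` receives every order subject. Publishers never need to know who is listening.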

Erlang rocks in this application space. No doubt about it.

Good luck.

enewhuis@gmail.com

tickdbguy · Jul 12, 2005

The other thing I forgot to mention is Erlang's pure functional foundation. Other than the built-in ETS, which is like a tabular data store, and the process dictionary, which was a mistake in the design of Erlang, your data structures cannot be modified.

You must make full copies of your data structures to update them.

This poses an interesting code-optimization challenge, and the contradiction (how do I modify process state and still have stateless processes that can be restarted after a critical defect?) is at the heart of fault tolerance and distributed systems design.

Theoretically, in Erlang, the smallest unit of data state is the process (more like a thread in traditional programming).
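In Erlang, a process typically carries its state as an argument to its receive loop, and each message produces a new state value instead of mutating the old one. The Python sketch below imitates that style with an immutable `NamedTuple`; `BookState` and `handle_tick` are hypothetical names. Because old states are never modified, any earlier state remains a valid restart point, and unchanged fields are shared by reference rather than copied.

```python
from typing import NamedTuple, Tuple


class BookState(NamedTuple):
    """Hypothetical immutable per-process state."""
    symbol: str
    last_price: float
    tick_count: int
    history: Tuple[float, ...]  # recent prices, newest last


def handle_tick(state: BookState, price: float, keep: int = 3) -> BookState:
    """Return a new state for one tick; the old state is never modified."""
    return state._replace(
        last_price=price,
        tick_count=state.tick_count + 1,
        history=(state.history + (price,))[-keep:],
    )


s0 = BookState("ES", 0.0, 0, ())
s1 = handle_tick(s0, 1200.25)
s2 = handle_tick(s1, 1200.50)
print(s0.tick_count, s2.tick_count)  # 0 2
print(s2.history)                    # (1200.25, 1200.5)
```

The cost is allocation on every update, which is the optimization challenge mentioned above; the benefit is that a crashed handler can simply be restarted from the last good state value.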

Oh one more problem...

Erlang threads are "simulated" by the Erlang runtime system and, as such, cannot be scheduled to execute on parallel processors.

Of course, all the Erlang researchers know about these limitations and are always coming up with clever ways to deal with them, and I do not doubt that several Erlangers have proposed native-thread versions of the Erlang runtime.

Joel Reymont · Jul 12, 2005

Quote from tickdbguy:

You must make full copies of your data structures to update them.

This poses an interesting code optimization challenge and the contradiction--how do I modify process state and have stateless post-critical-defect restartable processes--is at the heart of fault tolerance and distributed systems design.

I solved this by using the OTP (Open Telecom Platform) behaviors when building my poker software. Processes can always save their state to Mnesia (the Erlang real-time distributed db) so that they can be restarted on another node and their state restored.
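The checkpoint-and-restore idea can be sketched in Python as follows. It uses a plain dictionary as a stand-in for a replicated table such as Mnesia, and ordinary classes rather than OTP's actual behaviours. `StateStore`, `TableWorker` and the `"hands_played"` field are hypothetical. Unlike Mnesia, this store lives in a single process, so it only illustrates the pattern, not cross-node recovery.

```python
import copy


class StateStore:
    """Hypothetical stand-in for a replicated table: key -> last saved state."""

    def __init__(self):
        self._data = {}

    def save(self, key, state):
        self._data[key] = copy.deepcopy(state)

    def load(self, key, default):
        return copy.deepcopy(self._data.get(key, default))


class TableWorker:
    """Hypothetical worker that checkpoints its state after every message."""

    def __init__(self, key, store):
        self.key, self.store = key, store
        self.state = store.load(key, {"hands_played": 0})  # restore on (re)start

    def handle(self, msg):
        if msg == "hand_done":
            self.state["hands_played"] += 1
        self.store.save(self.key, self.state)  # checkpoint after each message


store = StateStore()
w = TableWorker("table-7", store)
w.handle("hand_done")
w.handle("hand_done")
del w                               # simulate a crash
w2 = TableWorker("table-7", store)  # restarted, e.g. on another node
print(w2.state)                     # {'hands_played': 2}
```

Checkpointing after every message is the simplest policy; it trades write volume for never losing more than the message in flight.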

Quote from tickdbguy:

Erlang threads are "simulated" by the Erlang runtime system and, as such, cannot be scheduled to execute among parallel processors.

Of course all the Erlang researchers know all these limitations and are always coming up with clever ways to deal with these limitations and I do not doubt that several Erlangers have proposed native thread versions of the Erlang runtime.

Right. While I'm waiting for multiprocessor versions of the Erlang runtime, I can start as many nodes on the same machine as I have processors.

A node is basically a virtual machine that runs Erlang byte code or native code if compiled with HiPE. You can run one node per box or you can run many.

Joel

ktmexc20 · Jul 18, 2005

Joel, I posted this on the Erlang forum regarding the thread you started. Thanks for introducing Erlang.

Quote from me...

My first impression is that Erlang seems to be a dream for RT, DC-HPC, so I want to thank the developers for its utility and for making it open source.

The discussion here about efficiency for large data set retrieval and number crunching led me to want to share my favorite libraries. I'm a Python aficionado because of its remarkable scripting ease and its near transparency over "C" libraries (amongst other reasons)... Here are some excellent packages for efficiency and easy porting:

For DB, PyTables is a thin wrapper over NCSA's HDF5 library:
http://pytables.sourceforge.net/html/WelcomePage.html
http://hdf.ncsa.uiuc.edu/HDF5/

For number crunching I use STScI's Numarray package, as well as Numeric/SciPy.
http://www.stsci.edu/resources/software_hardware/numarray

PyTables is quite amazing and is also built on Numarray/Numeric, so it is nearly seamless for retrieval and then crunching.

The needs of all the situations mentioned in this thread, regarding DBs and number crunching, are met very well by these packages.

Joel Reymont · Jul 18, 2005

You are very welcome. I started an open source project to build a trading backend on top of Erlang. Take a look at http://wagerlabs.com/uptick

ktmexc20 · Jul 18, 2005

One nice thing to mention about PyTables/HDF5 is that the file itself is not loaded into memory; you only hold an active pointer to it.

The same goes for the hierarchical groups/tables/columns/rows, single arrays or metadata within the file. Components can then be efficiently sliced, diced and iterated in "C", and finally returned as an active object.