Distributed Computing Applications to ATSs

brannode · Oct 14, 2008

Hello,
I have been interning for nearly a year now at a large investment management shop while studying as a CS graduate student, and I have since caught the investment bug. I have become very interested in TA (technical analysis) and automated trading systems, and I am working on getting an independent study course approved in which I would build a complete system end to end.

My goal is to build a framework that allows for experimentation and learning. (I am aware of the many open source projects.) While I believe that I have a solid enough understanding to build a rudimentary system that allows for this, I am also interested in integrating my own work in distributed systems and making use of a cluster available to me.

From my inexperienced perspective, and from glancing through the academic and professional literature, some obvious applications are:

- Backtesting: Testing a single strategy over a very large set of historical data, or testing many strategies in parallel
- Strategy evolution: Using genetic algorithms (GAs) or similar methods to evolve strategies
- Real-time evolution of strategies: Neural networks (NNs) that are fed data in real time to make decisions (or are there other industry approaches?). I am not certain that this will yield anything profitable.
- Real-time analysis: Having a large set of machines allows more processing to be done in real time. Would having synchronized machines acting as one be useful in your setup? Or are you hampered at all by the overhead of setting up and maintaining such a thing?
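For the first item, backtesting many strategy variants is embarrassingly parallel: each (strategy, parameter set) run is independent of the others, so the runs can be farmed out to worker processes or, by extension, to cluster nodes. Below is a minimal sketch, assuming a toy moving-average crossover on a synthetic price series. `make_prices`, `backtest`, `run_grid`, and the parameter grid are hypothetical, and on a real cluster `ProcessPoolExecutor` would be replaced by whatever job scheduler the cluster provides.

```python
import math
from concurrent.futures import ProcessPoolExecutor
from itertools import product


def make_prices(n=2000):
    # Synthetic, deterministic price series (drift plus a sine wave) so the
    # example runs without market data; a real backtest would load history.
    return [100.0 + 0.01 * i + 5.0 * math.sin(i / 50.0) for i in range(n)]


def sma(prices, t, window):
    # Mean of the `window` prices up to and including index t (no look-ahead).
    return sum(prices[t - window + 1:t + 1]) / window


def backtest(job):
    # Hypothetical crossover rule: hold one unit from t to t+1 when the fast
    # SMA is above the slow SMA at t, otherwise stay flat.
    prices, fast, slow = job
    pnl = 0.0
    for t in range(slow, len(prices) - 1):
        if sma(prices, t, fast) > sma(prices, t, slow):
            pnl += prices[t + 1] - prices[t]
    return fast, slow, pnl


def run_grid(prices, fasts, slows, workers=4):
    # Each (fast, slow) pair is an independent job, so the jobs run in parallel.
    jobs = [(prices, f, s) for f, s in product(fasts, slows) if f < s]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(backtest, jobs))
    # Rank parameter sets by P&L, best first.
    return sorted(results, key=lambda r: r[2], reverse=True)
```

Call `run_grid(make_prices(), [5, 10, 20], [50, 100])` from under an `if __name__ == "__main__":` guard so worker processes can import the module safely; it returns `(fast, slow, pnl)` tuples sorted best first. Shipping the full price list with every job keeps the sketch simple but is wasteful; on a cluster you would more likely give each node its own slice of history or of the parameter grid.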

That said, I would like to gather feedback, ideas, and hypotheses on the integration and utility of clusters in automated trading systems. If an idea sucks, I'm glad to be informed.

altair_606 · Oct 18, 2008

Hi brannode,

it's quite interesting, because I have also been thinking about trading applications of distributed computing, with the aim of starting a project in this domain.

I'm pretty sure some traders here would be interested in increasing their computing power, so I hope we will get answers.

Anyway, don't hesitate to PM me if you want to work with me.

Altaïr

Tums · Oct 18, 2008

If you want fault tolerance, you need Tandem NonStop servers (now under the HP NonStop brand).

This is the server of choice for major exchanges around the world.

brannode · Oct 18, 2008

Yes, I am well aware of their reputation and capabilities.

The point of using multiple machines/processors in this context, however, is to be able to run computations that are unwieldy/too slow on a single commodity machine.

My question is more a matter of, if you had infinite processing power, how would it affect your design strategy for an ATS?

itmediaco · Oct 18, 2008

I am using JGroups for basic clustering. Each node works on a fixed number of contracts and puts its results into a Replicated Hash Map. Since every node has the same hash map in memory, there is no single point of failure. The only problem so far is a ~2-minute delay in reloading analysis tasks from a crashed node...

brannode · Oct 18, 2008

What processing is each node responsible for?

Is the 2 minute delay due to recalculation or because the task you have allocated each node is that time-intensive?

Do you know of others working with distributed systems towards this end?

Thanks, I really appreciate hearing about this.

itmediaco · Oct 19, 2008

Processing tasks are assigned to each node on a round-robin basis until the maximum number of tasks is reached. There is no head node; each node listens for changes to the cluster group. My objective was load balancing, not so much failover.
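One way to get this "no head node" behaviour, assuming every node receives the same member list whenever the group changes, is for each node to compute the whole assignment deterministically from that list: sort the members, deal the tasks round-robin up to a per-node cap, and recompute on every membership change. This is a hypothetical sketch, not itmediaco's code; `assign_tasks`, `my_tasks`, the node names, and the contract symbols are illustrative, and plain Python lists stand in for a JGroups membership view.

```python
def assign_tasks(members, tasks, max_per_node):
    """Deal tasks round-robin across members, at most max_per_node each.

    Every node runs this on the same member list and task list, so all nodes
    compute the same assignment without a head node handing out work.
    Returns (assignment dict, tasks left over once capacity is reached).
    """
    order = sorted(members)  # identical ordering on every node
    ordered_tasks = sorted(tasks)
    assignment = {m: [] for m in order}
    capacity = len(order) * max_per_node
    # Round-robin: task i goes to member i mod N until total capacity is used.
    for i, task in enumerate(ordered_tasks[:capacity]):
        assignment[order[i % len(order)]].append(task)
    return assignment, ordered_tasks[capacity:]


def my_tasks(self_id, members, tasks, max_per_node):
    # Each node calls this whenever it is notified of a membership change.
    assignment, _ = assign_tasks(members, tasks, max_per_node)
    return assignment.get(self_id, [])
```

With members `["node-a", "node-b", "node-c"]`, contracts `["ES", "NQ", "CL", "GC", "ZN"]` and a cap of 3, `node-c` gets `["GC"]`; if it crashes, the survivors rerun the function on `["node-a", "node-b"]` and `GC` is picked up. A plain round-robin reshuffle also moves `NQ` and `ZN` between the surviving nodes, and each of them must then rebuild state for its new contracts; a scheme such as consistent hashing limits that movement.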

The delay is because I'm using Esper for ESP (event stream processing), which can only keep the engine state in memory, so resuming analysis requires playing back past events from the database. It's a bit crude at this point. I am evaluating EsperHA ($8,000), which can persist state to disk.
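The recovery cost described here comes from rebuilding in-memory state by replaying history. The generic sketch below illustrates the trade-off without using Esper's API: a per-contract sliding-window average that can be rebuilt either by replaying every stored event or by loading a snapshot and replaying only the events after it. `WindowState`, the `(contract, price)` event tuples, and the JSON snapshot format are hypothetical stand-ins for a real engine's state and on-disk persistence.

```python
import json
from collections import deque


class WindowState:
    """Hypothetical per-contract state: the mean of the last `size` prices."""

    def __init__(self, size):
        self.size = size
        self.windows = {}  # contract -> deque of recent prices

    def on_event(self, contract, price):
        # deque(maxlen=...) discards the oldest price automatically.
        self.windows.setdefault(contract, deque(maxlen=self.size)).append(price)

    def mean(self, contract):
        window = self.windows[contract]
        return sum(window) / len(window)

    def snapshot(self):
        # Serialize the state so it could be written to disk.
        return json.dumps({c: list(w) for c, w in self.windows.items()})

    @classmethod
    def restore(cls, size, blob):
        state = cls(size)
        for contract, prices in json.loads(blob).items():
            state.windows[contract] = deque(prices, maxlen=size)
        return state


def recover_by_replay(size, events):
    # Full replay: cost grows with the entire stored event history.
    state = WindowState(size)
    for contract, price in events:
        state.on_event(contract, price)
    return state


def recover_from_snapshot(size, blob, events_since_snapshot):
    # Snapshot plus tail: only the events after the snapshot are replayed.
    state = WindowState.restore(size, blob)
    for contract, price in events_since_snapshot:
        state.on_event(contract, price)
    return state
```

Both paths end in the same state; the snapshot path just replays far fewer events. It only pays off if the events between the last snapshot and the crash are still available for replay, and the window size and snapshot interval are tuning choices, not values taken from this thread.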

altair_606 · Oct 20, 2008

Here is an interesting paper that uses MPI for parallel computing, "Parallel Algorithm for Real Time Decision System for Financial Markets": www.hipc.org/hipc2003/HiPC03Posters/prtd.doc