Automated Trading for a Living

Discussion in 'Automated Trading' started by Allistah, Apr 30, 2013.

  1. gmst

    gmst

    No worries mate - feel free to make this request a very low priority thing on your calendar. I was just curious because I run all my stuff out of retail software (MC) and am in the process of augmenting my infrastructure to use Excel + VBA + VB.NET much more extensively to tackle precisely such issues.
     
    #11     May 2, 2013
  2. Well, we are a lot more complicated. Serious build process, all done in house with partners. Multiple installers - nodes for processing, config file as setup (so we can replace them fast), SQL Server for data storage (and this is large - 1000 GB currently allocated for backtest data). Website for the system front end, command line for scheduling things, C# / Visual Studio as dev environment.

    The whole goal is very much reusability and replayability - a command line means you can start the same backtest on your own machine. Which rocks for debugging. For example, right now I see one task in an optimization failing.

    The cmd line for that one is:

    $R.InvokeBacktest("8d472ac5-1793-4229-8479-fc159d55d162","STRATNAMEREMOVED","20120610","20120614",4225,4352) quit

    So, I can copy/paste that into a debug build on my machine and look at why it blows up (incidentally, there is no error info). Before doing that I will obviously just try to restart it.
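    The pattern described here - every backtest task fully identified by its arguments, so the exact same command can be replayed in a local debug build - can be sketched roughly as follows. All names (invoke_backtest, the strategy name, the returned fields) are illustrative assumptions, not the poster's actual API:

```python
# Hypothetical sketch of a replayable backtest entry point: one task is
# fully determined by a task ID, strategy name, date range and a slice
# of the parameter grid, so the same call reproduces the same run.

def invoke_backtest(task_id: str, strategy: str,
                    start: str, end: str,
                    combo_from: int, combo_to: int) -> dict:
    """Run one backtest task; the run is fully determined by its arguments."""
    # ... load data for [start, end], run `strategy` over parameter
    # combinations combo_from..combo_to, return metrics ...
    return {
        "task_id": task_id,
        "strategy": strategy,
        "period": (start, end),
        "combos": (combo_from, combo_to),
    }

# The same call works on a grid node or in a local debug build:
result = invoke_backtest("8d472ac5-1793-4229-8479-fc159d55d162",
                         "EXAMPLE_STRAT", "20120610", "20120614",
                         4225, 4352)
print(result["combos"])
```

    Because every input is in the command line, any node (or a developer's local debug build) produces the same run from the same arguments.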
     
    #12     May 2, 2013
  3. gmst

    gmst

    Must say - nicely done.
     
    #13     May 2, 2013
  4. May I ask what level of performance you are currently achieving? I'm just trying to convince myself to get into it, although I'm not sure whether the effort/investment is worthwhile...
     
    #14     May 2, 2013
  5. Define performance?

    Backtesting?

    A 2-year strategy backtest in less than 2 minutes. That is full market replay with order book. No rollover - we trade the real contracts (i.e. on the switch date a second task starts trading that day).

    Optimizations? Well... The first job has just finished. 4550 parameter pairs. That is a 29-month optimization - around 1.5 hours. The good thing is that all the data is now in our data warehouse for analysis. Another 54 to go until Monday (tomorrow is a holiday here).

    And - we will soon get another 3 years of data and... triple our performance by adding more computers with more cores. Right now that backtest was done on only 54 computer cores.
     
    #15     May 2, 2013
  6. garachen

    garachen

    Seriously. Sounds like you have a pretty sweet setup going on. What I found funny was that no trading firm I'd ever worked at had a proper backtesting system. It's one of those things that takes competent developers. Just getting data from disk to memory efficiently takes some effort.

    Seems like software that should be for sale somewhere but it just isn't.

    The whole process taught me a lot about the evils of bit rot.
     
    #16     May 2, 2013
  7. Ok I see. What are your expectations in terms of returns, volatility etc.?
     
    #17     May 2, 2013
  8. That is confidential and not part of this thread. But we are working (not all is finished) on a lot of statistical validations that no retail software ever does - like a weekly backtest that then compares sim to real trading to make sure the executions match. It's bad if your assumptions on slippage are hogwash and you make a loss because of that.
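    The weekly sim-vs-real check described here could, in its simplest form, match fills by order and measure how far the assumed slippage is from reality. A minimal sketch, with made-up field names:

```python
# Illustrative sketch of a sim-vs-real execution check: match simulated
# and real fills by order ID and report the average price difference,
# i.e. how optimistic or pessimistic the simulated slippage model is.

def slippage_error(sim_fills, real_fills):
    """Average (real - sim) fill price difference over matched orders."""
    real_by_id = {f["order_id"]: f for f in real_fills}
    diffs = [real_by_id[f["order_id"]]["price"] - f["price"]
             for f in sim_fills if f["order_id"] in real_by_id]
    return sum(diffs) / len(diffs) if diffs else 0.0

sim = [{"order_id": 1, "price": 100.00}, {"order_id": 2, "price": 101.00}]
real = [{"order_id": 1, "price": 100.25}, {"order_id": 2, "price": 101.25}]
print(slippage_error(sim, real))  # 0.25: sim fills are too optimistic here
```

    A real system would aggregate this per instrument and per week and alert when the error drifts outside an expected band.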

    But seriously, strat metrics are not topic of this thread.
     
    #18     May 2, 2013
  9. How do you scale to use more computers/cores for backtesting? Is it like this: you run strategy one (housed in an executable) against a first set of symbols for the whole backtest period on a first core, you run the same strategy against a second set of symbols on a second core, you run strategy two against a first set of symbols on a third core, and so on? When you add more servers, you just decrease the number of symbols run against each strategy. Something like that? By a strategy I mean it can also be a task such as calculating correlation between symbols.
     
    #19     May 3, 2013
  10. Did you read my posts? It seems you have trouble understanding simple sentences. I did explain this.

    We split the backtest into tasks - a task is always
    * One week (Sunday to Saturday - we are always flat on weekends)
    * Max. x combinations (128 at the moment).
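    A rough sketch of the splitting rules above (one Sunday-to-Saturday week per task, at most 128 parameter combinations per task), assuming a simple (week-start, week-end, combo-range) tuple as the task format:

```python
from datetime import date, timedelta

def weekly_windows(start: date, end: date):
    """Yield (sunday, saturday) windows covering [start, end]."""
    # Move back to the Sunday on or before `start`.
    # In Python, Monday=0 ... Sunday=6, so shift by (weekday+1) % 7 days.
    cur = start - timedelta(days=(start.weekday() + 1) % 7)
    while cur <= end:
        yield cur, cur + timedelta(days=6)
        cur += timedelta(days=7)

def make_tasks(start: date, end: date, n_combos: int, max_combos: int = 128):
    """Split a backtest into independent (week, combo-range) tasks."""
    tasks = []
    for sunday, saturday in weekly_windows(start, end):
        for lo in range(0, n_combos, max_combos):
            hi = min(lo + max_combos, n_combos)
            tasks.append((sunday, saturday, lo, hi))
    return tasks

# One calendar week, 300 combos at 128 per task -> 3 tasks.
tasks = make_tasks(date(2012, 6, 10), date(2012, 6, 14), n_combos=300)
print(len(tasks))
```

    Every task is independent of the others, which is what makes the queue-based distribution below the list possible.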

    The tasks go into a central queue. Every node takes tasks from the queue. Add more computers and they just take tasks from the same queue. We are single threaded (WAY faster than multi threaded) in backtests and thus parallelize tasks - i.e. a node takes one task per core. This avoids dealing with thread synchronization in the backtest and still uses the CPU at 100%.

    Due to the queue system I can easily scale this to a thousand computers if I need to - or more. TOTALLY classical HPC (High Performance Computing).

    Every task writes the results and all relevant information into the central database for analysis.

    For example, right now we are doing a retest of a lot of stuff due to data issues (we had bad exports).

    One particular example:

    Optimization - XXXXX_SI, 4620 combos

    (That is a particular strategy in silver - the name is XXXed out - with 4620 parameter combinations.)

    It has not started yet, so:

    Scheduled:4620

    That is for a 29 month period. As you can see - that is 4620 tasks for the grid to take up and work on. So, in theory - I could have up to 4620 cores working on this particular optimization at the same time.

    What we cannot do yet is genetic optimization - but we are working on it with a more complex task structure and additional tasks (i.e. you get ONE task that generates the tasks for generation 1, then another one that generates the tasks for the next generation).
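    The "task that generates tasks" idea could be sketched like this: a generator step takes generation N's scored results and emits generation N+1's backtest tasks, which go back into the same queue. The selection/mutation logic below is a made-up stand-in, not the poster's actual scheme:

```python
# Sketch of a generator task for genetic optimization: keep the best
# parameter sets from the finished generation and breed the rest by
# mutating survivors. Each returned parameter set becomes one new
# backtest task for the grid.
import random

def generate_next_generation(results, pop_size=8, keep=4, seed=0):
    """From scored parameter sets, produce the next generation's tasks."""
    rng = random.Random(seed)
    # Keep the best performers (higher score = better).
    survivors = sorted(results, key=lambda r: r["score"], reverse=True)[:keep]
    next_gen = [dict(r["params"]) for r in survivors]
    # Fill the remaining slots by mutating random survivors.
    while len(next_gen) < pop_size:
        parent = rng.choice(survivors)["params"]
        child = {k: v + rng.uniform(-0.1, 0.1) for k, v in parent.items()}
        next_gen.append(child)
    return next_gen  # each entry becomes one backtest task

gen0 = [{"params": {"fast": i, "slow": i * 2}, "score": i} for i in range(8)]
gen1 = generate_next_generation(gen0)
print(len(gen1))  # pop_size tasks for the next generation
```

    The generator itself is just another task in the queue, so the grid never needs special handling for genetic runs.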

    Again, this is all totally standard - any supercomputer of the last generation works similarly, with an HPC setup working on work orders from a central queue.
     
    #20     May 3, 2013