Holy Grail Trading Software

Discussion in 'Trading Software' started by BullFighter, Apr 13, 2002.

  1. I've used Matlab intensively; it has a number of limitations for really big projects that aren't evident at first and take too long to explain. The biggest limitation is that there is no data management included. But it's very useful for "small" research projects. Don't bother using it in real time; it's not worth the trouble, even if it may look good in some small demos.

    On the other hand, 20,000 hours is a bit excessive. IMHO people waste a lot of time trying to come up with nice GUIs or interfaces with spreadsheets. Take t****station: it looks good too, but it's completely useless to me.

    Speaking of holy grail machines, it seems that pair trading, relative value strategies, etc. for equities are getting arbitraged away... see below... any opinions?

    when don bright started talking about daytraders doing pairs, i saw the writing on the wall...


     
    #11     Apr 14, 2002
  2. Not to be too disrespectful, but you have no idea what you are talking about.

    You could not link in 1 data feed in 6 months, let alone the rest of the problem.

    We have a graveyard out back for programmers who worked here, told us 2 weeks, and were still working on a subset of the problem 2 years later.

    What he described is not a hacked together implementation of other software packages; there is no edge there.

    Writing a specific system is the most minor part of the problem; we can implement a new idea in a matter of minutes to a couple of hours; the data feeds, databases, and execution links are where the work and the benefit are.
     
    #12     Apr 14, 2002
  3. 20,000 hours is actually an understatement; we have 5 programmers that have been doing that for 3 years; that is conservatively 30,000 man hours, not counting the programmers that have come and gone.

    No fancy GUI, no real charting, no fluff; just fast, accurate and it works.

    I am not talking about software that can only do one thing, in one market.

    There are over 200k instruments out there, each type with a different database structure (or several), all with lousy documentation. Spend a couple of months writing to that and then find out, "Oh, we don't use that database anymore."

    Now, compound that with multiple vendors and try to pick the fastest quote.

    Then figure out how to clean the data, after you hit a 100 lot on a 2-day-old stale quote.
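
    For illustration only, here is a rough sketch of those two points in C++: keep the freshest quote per symbol across several vendor feeds, and refuse to act on anything past a cutoff. The names, the structure and the 2-second cutoff are assumptions made up for the example, not anyone's actual code.

    ```cpp
    // Sketch only: freshest-quote selection across vendors plus a staleness check.
    #include <chrono>
    #include <map>
    #include <optional>
    #include <string>

    using Clock = std::chrono::steady_clock;

    struct Quote {
        std::string vendor;
        double bid = 0, ask = 0;
        Clock::time_point received;
    };

    class QuoteBook {
    public:
        // keep only the most recently received quote per symbol, whichever vendor sent it
        void onQuote(const std::string& symbol, const Quote& q) {
            auto& best = book_[symbol];
            if (q.received >= best.received) best = q;
        }

        // refuse to trade against anything older than the cutoff (2 seconds assumed here)
        std::optional<Quote> usable(const std::string& symbol,
                                    Clock::duration maxAge = std::chrono::seconds(2)) const {
            auto it = book_.find(symbol);
            if (it == book_.end()) return std::nullopt;
            if (Clock::now() - it->second.received > maxAge) return std::nullopt;
            return it->second;
        }

    private:
        std::map<std::string, Quote> book_;
    };
    ```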

    Very complicated and very expensive; but it does work when you do get there.
     
    #13     Apr 14, 2002
  4. metooxx: not all so-called "systems" are trivial. the data part and the execution are difficult problems, but any decent programmer can solve them. the problems are basically idiosyncrasies in the hardware/data provider/broker interfaces and lack of documentation, not because it's an inherently difficult programming problem.

    lately it's been easier than say 5 years ago though.

    however, some people are using very profitable systems that have taken a lot of time to figure out. of course, there are obvious "arbitrage" and "pattern" opportunities here and there; no need to waste your time if they make enough money for you and work long enough that you can keep your job.
     
    #14     Apr 14, 2002
  5. I am not saying they are trivial; I am saying that once you have the rest worked out you can quickly adapt to a new idea, and that is the only reason to develop a system like this.

    Using Tradestation, which does not work, as an example: let's say they have 100K programming hours in it, and you write a system in 40 hours that makes you millions. Your 40 hours are insignificant in relation to their 100K, given that you could not have made it work without them.

    And yes, any decent programmer can solve them; identifying a group of decent programmers is the hard part, along with committing the time, resources, and capital to complete the project.
     
    #15     Apr 14, 2002
  6. that's right, a lot of time is spent on idiosyncrasies.
    i guess 30,000 hours is an accurate assessment if you have to figure out all those things and have to trash several versions and start again from scratch.

    as a fun story, there is an open source C++ application out there (i won't mention its name here). you know what? it's a guy who has spent something like 2 years building analysis / data management / execution capabilities etc. i recently looked into the code and found out that it wouldn't be capable of doing certain things i am doing. not to mention there are still plenty of bugs, and the data type / memory management is awfully inefficient. no multithreading capabilities. if i were him i would stop trying to "fix it" and start rewriting from scratch, but as in trading, some people refuse to recognize they are wrong, cut their losses short and move on. too much ego.
     
    #16     Apr 14, 2002
  7. We started in this business with a couple of independent programmers telling me it would take 2 or 3 weeks, at most a month, to complete. The first working version, a DOS scanner, was not in production for 6 months. We gave about 5 independent programmers the job; they did not know that other people were doing identical work. One guy got it finished.

    That is the reality of the problem.

    Today, our programmers could whip that out from scratch in a week or two. Experience counts.
     
    #17     Apr 14, 2002
  8. nitro

    If this were its only problem, then it would be no big deal to get a GC (garbage collector) for C/C++. If memory allocation were inefficient, it would again be trivial to get a high speed allocator as a "drop in" replacement for malloc/new. There are even products that do both at once.
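
    As a rough illustration of the "drop in" point (a sketch, not one of the products mentioned): replacing the global allocator in C++ touches no user code, so swapping in a pooled or garbage-collected heap stays a localized change.

    ```cpp
    // Sketch of a drop-in replacement for the global allocator; user code is untouched.
    #include <cstdlib>
    #include <new>

    void* operator new(std::size_t size) {
        // a real replacement would hand this to a pooled or garbage-collected heap
        if (void* p = std::malloc(size))
            return p;
        throw std::bad_alloc();
    }

    void operator delete(void* p) noexcept {
        std::free(p);   // a collected heap could make this a no-op
    }
    ```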

    This is a bigger issue, and an extremely difficult one to get correct. For example, if I were building this system, I would write it so that it was not only multi-threaded, but distributable as well.

    Yes - this shows real understanding of programming technique, but IMHO it also shows a lack of experience in writing complex systems. In almost every software engineering project that I was involved in, we often wrote a system that we knew we were throwing away, just to learn what the difficulties were. In addition, something that took great pains to do in one language became "obsolete" as other languages, or tools, became available.

    The best engineers understand that all programs are special purpose languages for the domain they are "modeling." First they learn the "language" of the domain by building a prototype, then they build the special purpose language that would have let them write the prototype in 1/25 the number of lines of code. The special purpose language would be designed to also be reflective (in the way that Java or C# are) so that anything not handled by the language could easily be implemented (btw, this is why systems like TradeStation suck - they are not reflective).

    The ability to adapt to new instruments/datafeeds, etc., is also critical, as a six month advantage in coming up to speed to trade a brand new instrument can mean millions or tens of millions of dollars in profit. If you just wrote something that dealt with the domain as it exists now, it would take months to adapt. If you wrote a specialized language that you then used to write the program in, it would take a fraction of the time. The best programmers are the language designers/writers.
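
    To make the "special purpose language" idea concrete, here is a deliberately tiny C++ sketch with invented names: the primitives of the domain are registered once, and a strategy is just data handed to them at run time, so a new idea does not mean a recompile.

    ```cpp
    // Sketch of a tiny "special purpose language": a strategy is data interpreted at run time.
    #include <functional>
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    struct Quote { double bid, ask; };

    // each word of the (invented) mini-language is a predicate over a quote
    using Rule = std::function<bool(const Quote&)>;

    int main() {
        const std::map<std::string, Rule> vocabulary = {
            {"wide_spread", [](const Quote& q) { return q.ask - q.bid > 0.05; }},
            {"locked",      [](const Quote& q) { return q.ask <= q.bid; }},
        };

        // a "program" in the mini-language: names that could come from a file,
        // a GUI, or be generated on the fly by a reflective host language
        const std::vector<std::string> strategy = {"wide_spread"};

        Quote q{100.00, 100.10};
        for (const auto& name : strategy)
            if (vocabulary.at(name)(q))
                std::cout << "rule fired: " << name << '\n';
    }
    ```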

    Given today's languages (without the luxury or skill to implement the special purpose language), there is no question in my mind that the best compromise is to write these sorts of tools in a hybrid environment of programming languages. For example, I would use assembler for the most time-critical parts. I would use C++ for the low level stuff that could not be done in a higher level language and did not need the absolute speed of assembler, but could benefit from the portability of C++. Finally, for all non-time-critical and high level stuff, Java (or C#, as the compiler matures and if it only needed to run on Windows) would be my choice.

    The poster above who suggested Matlab as the originating language is not differentiating between the "Model" and the "View." Systems like Matlab are great for writing the view, but are not expressive enough to write the model (hence his suggestion that you interface with a C/C++ program). It is imperative that the "simulator" be written in such a way as to be able to accept multiple views, or my preferred term, metaphors. I know that there are many people who are very comfortable with Excel - why alienate them?
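
    A minimal sketch of that Model/View split, with invented interfaces: the simulator knows nothing about Excel, Matlab or a chart; it only pushes events to whatever views have attached themselves.

    ```cpp
    // Sketch of the Model/View split: the simulator only fans events out to attached views.
    #include <iostream>
    #include <memory>
    #include <string>
    #include <vector>

    struct Fill { std::string symbol; double price; int size; };

    // one "metaphor" = one implementation of this interface
    class View {
    public:
        virtual ~View() = default;
        virtual void onFill(const Fill& f) = 0;
    };

    class ConsoleView : public View {
    public:
        void onFill(const Fill& f) override {
            std::cout << f.symbol << " " << f.size << " @ " << f.price << '\n';
        }
    };

    // the model/simulator: no display code at all, just the fan-out
    class Simulator {
    public:
        void attach(std::shared_ptr<View> v) { views_.push_back(std::move(v)); }
        void fill(const Fill& f) { for (auto& v : views_) v->onFill(f); }
    private:
        std::vector<std::shared_ptr<View>> views_;
    };

    int main() {
        Simulator sim;
        sim.attach(std::make_shared<ConsoleView>());   // an ExcelView or MatlabView would attach the same way
        sim.fill({"MSFT", 54.25, 100});
    }
    ```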

    Finally, to the original poster - I would use your system if you designed it in one way: make it reflective. That rules out languages like C/C++ and leaves Java (and possibly C#) to implement it in. If that were not an option, then make it open source so that, if I needed to implement something, I could do it with enough sweat.

    nitro
     
    #18     Apr 14, 2002
  9. "when don bright started talking about daytraders doing pairs, i saw the writing on the wall... "

    LOL. I've been thinking the same thing. When everybody's doing it you better start looking for another strategy.
     
    #19     Apr 14, 2002
  10. dear nitro:

    i myself have thrown away lots of programming time because of poor design. now i am wiser and i spend more time in the design/prototyping stage. i sure have nothing against throwing away code, because you learn things along the way, but sometimes people have to throw away code because of bad choices in design/poor planning. That's an avoidable waste of time.

    my philosophy is: do whatever works for you. Our platform is a mix of C and C++. I find our C++ framework easy to (re)use and don't need C# or java. Assembler we've found not worth the trouble, save for small hacks here and there and for doing naughty things :D .

    i don't know if my framework is "complex" enough for your taste. but it's running already, it's easy to maintain/update/expand/reuse, it runs on windows 2000 and solaris, and i don't worry a lot about programming issues anymore...

    IMHO this shows a lack of experience in numerical/scientific and financial programming. malloc/new are low level issues. i am more concerned about the high level stuff: how are you going to represent financial instruments, how much memory for each, how much data you keep in memory and how much on hard disk, etc.

    not trivial when talking about thousands of instruments with thousands of tick by tick prices/quotes and years of historical data.
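
    As a back-of-the-envelope illustration of that sizing concern (the record layout and the volume numbers below are assumptions, not anyone's actual schema):

    ```cpp
    // Back-of-the-envelope sizing with assumed (not measured) volumes.
    #include <cstdint>
    #include <cstdio>

    #pragma pack(push, 1)
    struct Tick {
        std::uint32_t time;    // seconds since some epoch
        std::int32_t  price;   // price in 1/10000ths instead of a double
        std::uint32_t size;
        std::uint16_t flags;   // bid/ask/trade, exchange, condition codes
    };
    #pragma pack(pop)

    int main() {
        const double ticksPerDayPerInstrument = 50000;   // assumed average
        const double instruments = 5000;
        const double tradingDaysPerYear = 252;

        const double bytesPerYear = sizeof(Tick) * ticksPerDayPerInstrument
                                  * instruments * tradingDaysPerYear;

        std::printf("tick record: %zu bytes\n", sizeof(Tick));
        std::printf("one year of everything: ~%.0f GB\n", bytesPerYear / 1e9);
        // far more than fits in RAM, hence the memory vs. hard disk split above
    }
    ```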
     
    #20     Apr 14, 2002