Genetic programming

drm7 · Dec 28, 2012

There are two commercial GP engines out there that I have found:

Trading System Lab (TSL), which costs $60,000, and

Adaptrade, which costs about $1,200.

I have no idea if either one "works." There is extensive discussion out there on NuclearPhynance, Wilmott, etc., regarding TSL, without any definite conclusions.

However...TSL actually runs a GP engine called Discipulus, which costs about $200. Those with programming experience may be able to work with it.

(I have no affiliation with any of these vendors, and am very skeptical of GP, GA, neural networks, or any kind of machine learning driven trading.)

jack hershey · Dec 28, 2012

Quote from slacker:

That is a good application for GAs, called a 'classifier' role. Train bots to identify uptrend, downtrend and chop.

Rosy spoke of "state" and you think in terms of the above. Both sound like making money by extracting the full offer of the market. I'll answer your Q's below in that light.

Problems with GAs, what does the chromosome look like?

Trend is the chromosome. Most do not know what trends look like and what their parts are.

An array of bytes or a b-tree structure?

Neither. I think this is most people's stumbling block. They are not open to the correct realization.

How do you splice 'crossover' the b-tree at evolution time?

Markets are not evolving. BUT you, and others like you, think they are. Certainly, information flows in structures differently, BUT basically, and in detail, markets have two sides, operators and regulators. To participate, you need money or instruments. Wall Street makes up new instruments and plugs them into kinds of markets; those with capital get handled by the operators and the instrument makers.

How to avoid overfitting with a GA?

Overfitting is a myth and is exclusively associated with applying improper maths. By using the correct math and applying it to the real system where the information originates, you can get beyond the "state" and its characterizations that are too rough to have statistical significance.

If your chromosome is large enough to solve the problem it is large enough to 'remember' key positive or negative specific in your sample data....

Ah yes ....... the examination is of the trend forming and it has discrete and significant items. They are named and their class is called "events". There is no thing called "sample". There are only facts from the flow or process of information being outputted in a labelled order. It is, of course, filtered and weighed.

Where can you get enough sample data to completely test insample/outsample?

This is the simplest question. BUT, constraining the thinking to improper and unfounded regions has no significant yield. The stepping stones are uncomplex. As a result there are 10 price items and 11 volume items which make up the "continuation" portion of a trend. Trends have two ends and they collectively have 35 items.

All items are in one class: events.

How do you ensure that a system that is 'random number' based (evolution) is bug free?

Systems for the operation of markets are extremely limited. The cause is singular: variable granularity. You are not thinking in terms of the proper mathematics. Change this error of yours. Buy some Legos; they will teach you what Sister Montessori taught with rational numeration systems.

What is the 'fitness' value in a trading system you are going to evolve the system to find?

100% if you do "round" and do not use irrational numbers for variable prices.

Net profit? (too risky, large drawdowns but big profits)

Pragmatically, you take the full offer of the market less premium. Premium means the cost to participate, whatever you may have to pay. If you are human, you use 10 to 100 millisecond observations. If you extend the human limitations with tools, the full offer is always available in liquid markets.

Sharpe ratio?

On slower fractals the tested annualized result in unselective universes is 60 plus (third party testing).

Develop your system and then put in on cheap GPU cards for parallel processing in CUDA.

GAs are much more fun than NN as you can see them learn. NN are more for predicting,

This is an error in design. Predicting is a handicap in a rough system.

GA are more about responding to events.

Another error of design. It is never necessary to be this inefficient and ineffective since the Order Of Events is known before the event occurs.

Good trading...

Your Q's are typical of an outsider to SA. I cannot understand why and how you missed that variables have granularity.

Rev. Thomas got it wrong. Keynes and Carnap got it correct.

Were I you, I would read up on how computers got invented and then used. Reeves did not make the market (10-revolution rheostats) but TJ did (punched cards, finite maths with granularity).

Look at a penny... It is not made with probability. 2.4 cents are spent to make it.

dtrader98 · Dec 28, 2012

Quote from syswizard:

GA has never been proven to work for building trading systems that work well over time. They tend to "curve fit" the backtested data.
I don't think it's possible to program GA not to curve-fit.

Quote from 2rosy:

what about using GA to determine what state a market is in ie. sideways, uptrend, downtrend

There are several well-known methods available to avoid overfit; regularization and introducing complexity penalties into the fitness function come to mind.
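
For anyone who wants to see what a complexity penalty in the fitness function might look like, here is a minimal sketch in Python. It assumes a candidate rule is stored as a nested tuple tree and that its raw in-sample score has already been computed somewhere else. The names `tree_size` and `penalized_fitness`, the indicator names in the example rules, and the penalty weight of 0.01 are all made up for illustration.

```python
def tree_size(node):
    """Count nodes in an expression tree given as nested tuples."""
    if not isinstance(node, tuple):
        return 1  # leaf (a terminal such as an indicator name or a constant)
    # one for the operator plus the sizes of its children
    return 1 + sum(tree_size(child) for child in node[1:])


def penalized_fitness(raw_score, node, penalty=0.01):
    """Raw in-sample score minus a fixed charge per node (parsimony pressure)."""
    return raw_score - penalty * tree_size(node)


# Two hypothetical rules: the larger one scores slightly better in-sample.
small = ("gt", "close", "sma20")
large = ("and", ("gt", "close", "sma20"),
                ("or", ("lt", "rsi", 30), ("gt", "vol", "vol_avg")))

print(penalized_fitness(1.00, small))
print(penalized_fitness(1.05, large))
```

With this weight the larger rule has to earn its extra nodes: a small in-sample edge is not enough to outrank the simpler rule. Choosing the weight is itself a judgment call and would need its own validation.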

Regarding market states, nobody seems to discuss much how to even define the states rather than learn them. A much better discussion might be geared towards what qualifies as a state (side, up, down) and what factors are good attributes that contribute to that state identification.

All the learning should come after these features are identified, not before.

Lastly, for the thread; there are major differences between GP and GA. They seem to be thrown around interchangeably here.

2rosy · Dec 28, 2012

Quote from dtrader98:

Regarding market states, nobody seems to discuss much how to even define the states rather than learn them. A much better discussion might be geared towards what qualifies as a state (side, up, down) and what factors are good attributes that contribute to that state identification.

ok. this is off topic. How would I define a state? Can I use some type of clustering to find them? I have looked at hidden Markov chains (and everything else) but really don't know what I am looking at.

I just want to be able to change parameters given a market condition.

jack hershey · Dec 28, 2012

Quote from 2rosy:

ok. this is off topic. How would I define a state? Can I use some type of clustering to find them? I have looked at hidden Markov chains (and everything else) but really don't know what I am looking at.

I just want to be able to change parameters given a market condition.

How a market is examined is determined by the market condition as rosy says. Market condition allows an ATS to go to and magnify the smallest of nuances that guide the ATS in taking the full offer of the market.

Any modern platform roughs out the "state" in terms of a given fractal's trend. I have never seen non-trending, which here in ET may be referred to by an assortment of terms. All trends have two periods of overlap. The overlap is found at each end.

Most capital in the financial industry is "in the market" all of the time. For professionals, during this time it is making money. Making money is done through price change over a sequence of events.

dtrader98 · Dec 28, 2012

Quote from 2rosy:

ok. this is off topic. How would I define a state? Can I use some type of clustering to find them? I have looked at hidden Markov chains (and everything else) but really don't know what I am looking at.

I just want to be able to change parameters given a market condition.

The same problem persists when discussing things like Markov chains. Plenty of people have a keen interest in hidden Markov models, without understanding that something like a visible Markov model might be far better suited to solving their needs (usually because they want to feed garbage into a black box model, without having taken any time to understand what the model does or why it might be useful -- or not).

One easy way to think of a state is as a language symbol (A, B, C, etc.). We can make a lot of probability-based measures around these states. For example, the conditional probability of Z following K should intuitively be lower than that of H following T, but we can also quantitatively measure this through formal statistical measures (contingency tables, chi-squared tests, etc.). Once we have some idea of how to work with states, we need to define what the states represent.
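
To make the symbol idea concrete, here is a rough Python sketch that builds a transition contingency table from a sequence of state labels, estimates a conditional probability from it, and computes the Pearson chi-squared statistic for independence between the previous and the next state. The label sequence is made up, and the function names are mine, not from any library.

```python
from collections import Counter


def transition_counts(states):
    """Count how often each state is immediately followed by each state."""
    return Counter(zip(states, states[1:]))


def conditional_prob(counts, prev, nxt):
    """Estimate P(next = nxt | previous = prev) from a Counter of transitions."""
    total = sum(c for (p, _), c in counts.items() if p == prev)
    return counts[(prev, nxt)] / total if total else 0.0


def chi_squared(counts):
    """Pearson chi-squared statistic for independence of previous and next state."""
    rows = sorted({p for p, _ in counts})
    cols = sorted({n for _, n in counts})
    n = sum(counts.values())
    row_tot = {r: sum(counts[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(counts[(r, c)] for r in rows) for c in cols}
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            stat += (counts[(r, c)] - expected) ** 2 / expected
    return stat


# U = up, D = down, S = sideways (made-up labels, purely for illustration)
states = list("UUUDUUSSDDUUUDDSSUUU")
counts = transition_counts(states)
print(conditional_prob(counts, "U", "U"))
print(chi_squared(counts))
```

The statistic would be compared against a chi-squared distribution with (rows - 1) x (columns - 1) degrees of freedom. With a sequence this short the expected cell counts are far too small for the test to mean much, so treat it only as a demonstration of the mechanics.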

When discussing market states, we need to really quantify what a state represents and what variables and co-variates would help to identify the state.
Someone could use something like a Kaufman efficiency ratio or Hurst exponent as a measure of the state's properties. In the Hurst case, maybe an asset's time series Hurst exponent of greater than 0.8 could qualify as a trend, while less than 0.3 might qualify as a reversion market (something in-between could be classified as neutral). Once we nail down --quantitatively-- the measure of the states, we can move back into the statistical realm as described above and look for co-variates that might better assist in forecasting the next state (a useful attribute I've found, for example, could be the VIX level).
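
A small Python sketch of the two measures mentioned above. The efficiency ratio is computed directly (net price change over the window divided by the sum of absolute bar-to-bar changes). Estimating the Hurst exponent is not shown; the sketch assumes it has been computed elsewhere and only applies the 0.8 / 0.3 thresholds from the paragraph above. The function names and the sample price lists are illustrative.

```python
def efficiency_ratio(prices):
    """Kaufman efficiency ratio: net move divided by total absolute bar-to-bar movement."""
    path = sum(abs(b - a) for a, b in zip(prices, prices[1:]))
    if path == 0:
        return 0.0  # flat window: no movement to measure
    return abs(prices[-1] - prices[0]) / path


def classify_by_hurst(h, trend_above=0.8, revert_below=0.3):
    """Map a precomputed Hurst exponent to a state label using the thresholds above."""
    if h > trend_above:
        return "trend"
    if h < revert_below:
        return "reversion"
    return "neutral"


steady = [100, 101, 102, 103, 104, 105]   # moves in one direction only
choppy = [100, 102, 100, 102, 100, 101]   # lots of movement, little net progress

print(efficiency_ratio(steady))
print(efficiency_ratio(choppy))
print(classify_by_hurst(0.85), classify_by_hurst(0.5), classify_by_hurst(0.2))
```

Either measure turns a window of prices into a number, and the thresholds turn that number into a discrete state symbol that can be fed into the kind of contingency analysis described earlier.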

As you mentioned, another way to identify a state is via clustering; an example using candlesticks is here:
http://intelligenttradingtech.blogspot.com/2010/06/quantitative-candlestick-pattern.html

The idea is to ultimately define the states quantitatively, and then process them at a later stage. There are far too many qualitative (only) descriptions and approaches in the TA world, IMO.

A lot of people I've spoken with complain about the GA/GP blender approach to data mining, but I've found that feature selection (or narrowing down the ingredients with some heuristics) tends to work better to start the modelling process. And as pointed out earlier, there have been a lot of improvements with respect to curve-fitting problems in the machine/statistical learning space.

Hugin · Dec 30, 2012

Quote from syswizard:

GA has never been proven to work for building trading systems that work well over time. They tend to "curve fit" the backtested data.
I don't think it's possible to program GA not to curve-fit.

In my opinion this is valid for any selection/optimization process, even manual ones. To some degree this is what we are looking for - adapting a set of parameters to the data. Personally I do not like the term "curve-fitting" since in most cases (and definitely for the GA/GP) it is actually selection bias we are talking about.

The problem seems to be that people think that it is possible to pick a technology and then throw loads of data at it and it will find a good trading system for us. Any optimizer worth its name is able to find a good solution given enough freedom. Using dozens of input parameters will make it easy for the optimizer to find "patterns" even in random data. It also introduces the "curse of dimensionality" problem which makes it hard for a trading model to make sensible decisions on how to use the input data.
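
The selection-bias point can be illustrated with a toy experiment in Python. Everything here is an assumption made for the demo: returns are pure Gaussian noise, and each "rule" is just a random long/flat position for every day, so no rule contains any information. The sketch picks the rule with the best in-sample result and then checks the same rule on a second, independent noise series.

```python
import random


def best_random_rule(n_rules=200, n_days=250, seed=0):
    """Select the best of many information-free rules on noise, then test it out of sample."""
    rng = random.Random(seed)
    in_sample = [rng.gauss(0, 0.01) for _ in range(n_days)]
    out_sample = [rng.gauss(0, 0.01) for _ in range(n_days)]

    def pnl(rule, returns):
        # Sum the returns of the days on which the rule holds a position.
        return sum(r for hold, r in zip(rule, returns) if hold)

    # Each rule is a random in/out decision per day.
    rules = [[rng.random() < 0.5 for _ in range(n_days)] for _ in range(n_rules)]
    best = max(rules, key=lambda rule: pnl(rule, in_sample))
    avg_in = sum(pnl(rule, in_sample) for rule in rules) / n_rules
    return pnl(best, in_sample), avg_in, pnl(best, out_sample)


best_in, avg_in, best_out = best_random_rule()
print("best rule, in-sample:     ", best_in)
print("average rule, in-sample:  ", avg_in)
print("best rule, out-of-sample: ", best_out)
```

The in-sample result of the selected rule is the maximum over many noisy draws, so it is biased upward by construction. The out-of-sample figure has no such help, which is the gap people usually call "curve-fitting".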

If you want to use this type of technology you still have to come up with a model that has a chance of succeeding. Then parameterize it and run it through the optimizer.

Note: I have used a GA/GP optimizer for "explorative" work, i.e. to find new trading ideas. But in these cases I would never use what is found directly. It requires analysis, and this in turn requires that the trading model is possible to "dissect" in order to understand what has been found. I would not recommend this approach though.

Hugin · Dec 30, 2012

Quote from dtrader98:

When discussing market states, we need to really quantify what a state represents and what variables and co-variates would help to identify the state.
Someone could use something like a Kaufman efficiency ratio or Hurst exponent as a measure of the state's properties. In the Hurst case, maybe an asset's time series Hurst exponent of greater than 0.8 could qualify as a trend, while less than 0.3 might qualify as a reversion market (something in-between could be classified as neutral).

A lot of people I've spoken with complain about the GA/GP blender approach to data mining, but I've found that feature selection (or narrowing down the ingredients with some heuristics) tends to work better to start the modelling process. And as pointed out earlier, there have been a lot of improvements with respect to curve-fitting problems in the machine/statistical learning space.

Interesting idea using Hurst exponents.

I sometimes find the relationship between market state and the trading model a bit challenging. Should one create a number of models and let the market state enable/disable them, or should one try to include the variables describing the state in the trading model? For example, by adjusting price movements using volatility or adding trend measurements to the model.

One thing I like with GA/GP compared to other optimizers is the flexibility, since all you need is a goal function stating if one individual is better than another. This makes it possible to mix integer/real valued variables, enabling us to have one part that makes selection and another for decision making (signal or not).
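
A bare-bones Python sketch of that flexibility: a GA whose chromosome mixes an integer gene and a real-valued gene, and whose only feedback is a function saying whether one individual is better than another. The gene names (`lookback`, `threshold`), the ranges, the mutation sizes and the toy goal are all assumptions for the demo, not a recommended setup.

```python
import random


def evolve(better, generations=60, pop_size=30, seed=1):
    """Tiny GA over (lookback: int, threshold: float); better(a, b) is the only feedback."""
    rng = random.Random(seed)

    def random_individual():
        return (rng.randint(2, 100), rng.uniform(0.0, 1.0))

    def mutate(ind):
        lookback, threshold = ind
        lookback = min(100, max(2, lookback + rng.randint(-5, 5)))       # integer gene
        threshold = min(1.0, max(0.0, threshold + rng.gauss(0, 0.05)))   # real-valued gene
        return (lookback, threshold)

    def crossover(a, b):
        # Integer gene from one parent, real-valued gene from the other.
        return (a[0], b[1])

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if better(a, b) else b

    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop = [mutate(crossover(tournament(pop), tournament(pop)))
               for _ in range(pop_size)]

    # Pick the final champion using pairwise comparisons only.
    champion = pop[0]
    for ind in pop[1:]:
        if better(ind, champion):
            champion = ind
    return champion


def closer_to_target(a, b):
    """Toy goal: prefer individuals nearer to lookback=20, threshold=0.3."""
    def dist(ind):
        return abs(ind[0] - 20) / 100 + abs(ind[1] - 0.3)
    return dist(a) < dist(b)


print(evolve(closer_to_target))
```

Because selection is done by tournament, the GA never needs a numeric score, only the comparison. In a real application `better` would wrap a backtest, which is where all the difficulty discussed in this thread comes in.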

Hugin · Dec 30, 2012

Quote from slacker:

What is the 'fitness' value in a trading system you are going to evolve the system to find? Net profit? (too risky, large drawdowns but big profits) Sharpe ratio?

To me this is the most important question together with how to define the trading model you are going to optimize.

The optimizer will do anything to fulfill what the fitness function says and will use any hole in the logic. As you say, using net profit will probably not do it (at least in my experience). Other parameters that could be used are:

1. Signal frequency - how often do you want/need your system to signal?
2. Consistency over time, both in results and signal frequency.
3. How to measure results going forward after a signal, and what basic information are you going to use? You also need to decide what prices to use (end of day, VWAP, every hour).
4. How to treat outliers?

Combining these into a function value that determines if one solution is better than another is a rather complicated task.
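
One possible way to fold some of these into a single number, sketched in Python. It covers signal frequency (1), consistency over time (2) and outliers (4); how results are measured after a signal (3) is assumed to be already decided, and the input is simply a list of trade returns per period. The function name, the weights, the clip level and the target frequency are arbitrary placeholders.

```python
from statistics import mean, pstdev


def composite_fitness(trade_returns, signals_per_year, target_signals=50, clip=0.05):
    """Combine clipped average return, consistency between periods and signal frequency.

    trade_returns: list of per-period lists of trade returns (e.g. one list per year).
    """
    # Outliers: clip each trade so a single extreme result cannot dominate.
    clipped = [[max(-clip, min(clip, r)) for r in period] for period in trade_returns]
    period_means = [mean(p) if p else 0.0 for p in clipped]

    avg = mean(period_means)
    # Consistency: penalize dispersion of results between periods.
    dispersion = pstdev(period_means)
    # Signal frequency: penalize distance from the desired number of signals.
    freq_penalty = abs(signals_per_year - target_signals) / target_signals

    return avg - 0.5 * dispersion - 0.001 * freq_penalty


# Hypothetical systems: one steady, one living off a single big winner.
steady = [[0.01, 0.012, 0.009], [0.011, 0.01, 0.008], [0.009, 0.012, 0.01]]
lumpy = [[0.30, -0.02, -0.03], [-0.02, -0.01, 0.0], [0.001, 0.002, -0.001]]

print(composite_fitness(steady, 48))
print(composite_fitness(lumpy, 48))
```

On raw average return the lumpy system looks better because of its one large trade. After clipping and the consistency penalty the steady system ranks higher. Whether that ordering is the one you want is exactly the design decision the fitness function encodes.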

slacker · Dec 30, 2012

Quote from Hugin:

To me this is the most important question together with how to define the trading model you are going to optimize.

The fitness function is one of the key advantages of GA over NN or other hill climbing solvers.

You are not limited to conventional TA indicators with the fitness function. You can create a time series of uptrending prices and another time series of random data and then train the GA to identify trending line segments in the first series. This is the 'classifier' role that GAs do very well.

If your chromosome is a symbolic B-tree you can do a lot of interesting combinations such as:

Japan AND economics NOT politics NOT stock
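
A rough Python sketch of such a symbolic chromosome, reading "B-tree" as a binary expression tree (that reading is an assumption) and the query as Japan AND economics AND (NOT politics) AND (NOT stock). It also shows one common way to splice two trees at evolution time, subtree crossover: a random subtree of one parent is grafted onto a random point of the other. The helper names are made up.

```python
import random

# Nested tuples: (operator, child, ...) with plain strings as leaves.
QUERY = ("and", ("and", "japan", "economics"),
                ("and", ("not", "politics"), ("not", "stock")))


def evaluate(node, words):
    """Evaluate a boolean expression tree against a set of lowercase words."""
    if isinstance(node, str):
        return node in words
    op = node[0]
    if op == "not":
        return not evaluate(node[1], words)
    left, right = evaluate(node[1], words), evaluate(node[2], words)
    return (left and right) if op == "and" else (left or right)


def paths(node, prefix=()):
    """List the index path to every node in the tree (the root is the empty path)."""
    found = [prefix]
    if isinstance(node, tuple):
        for i, child in enumerate(node[1:], start=1):
            found += paths(child, prefix + (i,))
    return found


def get(node, path):
    """Follow an index path down to a subtree."""
    for i in path:
        node = node[i]
    return node


def replace(node, path, new):
    """Return a copy of the tree with the subtree at `path` swapped for `new`."""
    if not path:
        return new
    i = path[0]
    return node[:i] + (replace(node[i], path[1:], new),) + node[i + 1:]


def crossover(a, b, rng):
    """Subtree crossover: graft a random subtree of b onto a random point of a."""
    return replace(a, rng.choice(paths(a)), get(b, rng.choice(paths(b))))


print(evaluate(QUERY, {"japan", "economics", "growth"}))
print(evaluate(QUERY, {"japan", "economics", "stock"}))

other = ("or", "china", ("not", "bonds"))
print(crossover(QUERY, other, random.Random(0)))
```

Since whole subtrees are swapped, every operator keeps the number of children it expects, so the offspring is always a valid expression. That is the main reason tree chromosomes are spliced this way rather than cut at arbitrary byte positions.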

Good luck,