Regression trees for predicting trade success

Discussion in 'Strategy Building' started by jcl, Feb 22, 2012.

  1. jcl

    I'm currently experimenting with classification and regression trees for predicting whether a trade will be profitable. The results look quite promising so far: the profit almost doubles when I filter out trades with rules generated by a relatively simple regression tree. But more research is required.

    Does anyone already have experience with tree-based machine learning algorithms such as CART? And is there any interest in my sharing the results here?
     
  2. I'm sure there will be a great deal of interest. Wish I could contribute to the thread but I have zero experience with the subject.
     
  3. Humpy

    If it is any good there will be lots of interest.

    So speak on.
     
  4. Hugin

    We have done some work with CART-like classifiers in non-trading applications. They are easy to use and their results are easy to understand, which is not necessarily true of other classifiers.

    One problem is that they are sensitive to the nature of the dependencies between the input variables. The reason is that decision trees (at least the standard version) make decisions one variable at a time.
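
    To make "one variable at a time" concrete, here is a minimal sketch in C of the split search in a CART-style classification tree. It assumes binary classes and the Gini impurity criterion; the names `best_split` and `Split` are hypothetical, not taken from any library. Every candidate split compares a single variable with a threshold, so the resulting decision boundaries are always parallel to the axes.

    ```c
    #include <assert.h>

    /* Gini impurity of a node holding n0 samples of class 0 and n1 of class 1. */
    static double gini(int n0, int n1) {
        int n = n0 + n1;
        if (n == 0) return 0.0;
        double p = (double)n1 / n;
        return 2.0 * p * (1.0 - p);
    }

    /* A CART-style split: one variable, one threshold. */
    typedef struct {
        int var;          /* index of the chosen input variable */
        double threshold; /* rule: x[var] <= threshold goes to the left child */
        double impurity;  /* weighted Gini impurity of the two children */
    } Split;

    /* x is row-major (sample i, variable j is x[i * nvars + j]); y[i] is 0 or 1.
       Each variable is scanned separately and every observed value is tried
       as a threshold. O(nvars * n^2), which is fine for a sketch. */
    Split best_split(const double *x, const int *y, int n, int nvars) {
        assert(n > 0 && nvars > 0);
        Split best = { -1, 0.0, 1e300 };
        for (int j = 0; j < nvars; j++) {
            for (int t = 0; t < n; t++) {
                double thr = x[t * nvars + j];
                int l0 = 0, l1 = 0, r0 = 0, r1 = 0;
                for (int i = 0; i < n; i++) {
                    if (x[i * nvars + j] <= thr) { if (y[i]) l1++; else l0++; }
                    else                         { if (y[i]) r1++; else r0++; }
                }
                double w = ((l0 + l1) * gini(l0, l1) + (r0 + r1) * gini(r0, r1)) / n;
                if (w < best.impurity) {
                    best.var = j;
                    best.threshold = thr;
                    best.impurity = w;
                }
            }
        }
        return best;
    }
    ```

    With four samples {5,1}, {3,2}, {4,8}, {1,9} and classes 0, 0, 1, 1, the search picks variable 1 with threshold 2, which separates the classes perfectly.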

    Take, for instance, a problem with two classes of data and two input variables, where the line that separates the two classes runs along the diagonal. CART struggles to solve this efficiently. It can be solved by generating a large number of nodes, each handling one small part of the line, but for a real problem this adds to the risk of overfitting, since you add a lot of parameters (the additional nodes) and it is hard to guarantee that the training algorithm uses them as expected.
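
    The cost of the diagonal case can be put into numbers with a small sketch (hypothetical helper names; points on a grid in the unit square, class 1 where x > y). A staircase tree with k bands uses k - 1 splits on y plus one split on x per band, roughly 2k - 1 nodes. Within each band the boundary leaves two misclassified triangles of area 1/(8k²), so the total error is about 1/(4k): doubling the number of nodes only halves the error.

    ```c
    #include <assert.h>

    /* True class of point (x, y): 1 below-right of the diagonal x = y, else 0. */
    static int true_class(double x, double y) { return x > y; }

    /* Staircase tree with k bands: splits on y select a band, then one split
       on x per band, placed at the band's midpoint. */
    static int staircase_class(double x, double y, int k) {
        int band = (int)(y * k);
        if (band >= k) band = k - 1;   /* y == 1.0 belongs to the last band */
        return x > (band + 0.5) / k;
    }

    /* Fraction of an n x n grid of cell centres in the unit square that the
       staircase tree misclassifies. */
    double staircase_error(int k, int n) {
        assert(k > 0 && n > 0);
        long wrong = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double x = (i + 0.5) / n, y = (j + 0.5) / n;
                if (staircase_class(x, y, k) != true_class(x, y)) wrong++;
            }
        }
        return (double)wrong / ((double)n * n);
    }
    ```

    On a 1000 x 1000 grid, `staircase_error(k, 1000)` should come out close to 1/(4k), e.g. 0.25 for a single split and about 0.031 for eight bands. A single oblique split on x - y would classify the same data perfectly, but a standard CART tree cannot express it.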
     
  5. ssrrkk

    Nice answer. In my experience, you can throw all the classification models you want at the problem, but in the end what matters is whether there is a real signal buried under the noise, or whether you are just going to fit to noise. This goes for neural nets, SVMs, SOMs, logistic regression, Bayesian nets, genetic algorithms, PCA, discriminant analysis, etc. etc. etc.
     
  6. jcl

    Thanks for the link! At first I also tried using a tree to generate trade signals, but this didn't work very well. The trees became too complicated and could not be pruned effectively.

    CART is indeed not well suited to cases where the result can be expressed as a function of products of input variables. I think that for this and other reasons, using a decision tree to generate trade signals won't work. However, the tree method seems to work quite well as a trade filter. Depending on the input variables, the trees are quite simple and have an effectiveness in the 70% to 90% range.

    I'm just testing it with some simple algos. This is the original trade algo with the lowpass filter that I posted in the other thread:

    Code:
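    var *Price = series(price());             // price series, [0] = current bar
    var *Trend = series(LowPass(Price,1000)); // heavily smoothed price
    Stop = ATR(100);                          // stop loss distance
    
    if(valley(Trend)) {      // smoothed trend turned up:
    	sellShort();         // close short positions
    	buyLong();           // and go long
    } else if(peak(Trend)) { // smoothed trend turned down:
    	sellLong();          // close long positions
    	buyShort();          // and go short
    }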
    
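    For readers without Zorro, the reversal logic of the first script can be sketched in plain C. This is an approximation under stated assumptions: `ema` is a hypothetical stand-in for `LowPass` (Zorro's filter is a different design), arrays are in chronological order rather than Zorro's newest-first series order, a valley is taken to mean that the previous value is below both of its neighbours, and the ATR stop is ignored.

    ```c
    #include <assert.h>

    /* Hypothetical stand-in for LowPass(): an exponential moving average
       with smoothing factor alpha. Not Zorro's actual filter. */
    void ema(const double *price, double *out, int n, double alpha) {
        assert(n > 0 && alpha > 0.0 && alpha <= 1.0);
        out[0] = price[0];
        for (int i = 1; i < n; i++)
            out[i] = out[i - 1] + alpha * (price[i] - out[i - 1]);
    }

    /* Arrays are oldest-first. Assumed meaning of a valley at bar t:
       trend[t-1] is a local minimum (trend[t-2] > trend[t-1] < trend[t]). */
    static int is_valley(const double *trend, int t) {
        return t >= 2 && trend[t - 2] > trend[t - 1] && trend[t - 1] < trend[t];
    }

    /* A peak at bar t: trend[t-1] is a local maximum. */
    static int is_peak(const double *trend, int t) {
        return t >= 2 && trend[t - 2] < trend[t - 1] && trend[t - 1] > trend[t];
    }

    /* Position after each bar: +1 long, -1 short, 0 flat. A valley reverses
       to long, a peak reverses to short, otherwise the position is held. */
    void reversal_positions(const double *trend, int *pos, int n) {
        int current = 0;
        for (int t = 0; t < n; t++) {
            if (is_valley(trend, t)) current = +1;     /* close short, go long */
            else if (is_peak(trend, t)) current = -1;  /* close long, go short */
            pos[t] = current;
        }
    }
    ```

    For a smoothed series 3, 2, 1, 2, 3, 2 the positions are 0, 0, 0, +1, +1, -1: long after the turn at the low, short after the turn at the high.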
    And this is the version with the CART filter:

    Code:
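    var *Price = series(price());
    var *Trend = series(LowPass(Price,1000));
    Stop = ATR(100);
    	
    // input variables for the tree
    var AutoCorrel = Correlation(Price,Price+1,30); // lag-1 autocorrelation (Price+1 = series shifted by one bar)
    var Volatility = ATR(30)/ATR(1000);             // short-term vs. long-term volatility
    var DomPeriod = DominantPeriod(100);            // dominant cycle period
    var FD = FractalDimension(Price,30);            // fractal dimension
    	
    if(valley(Trend)) {
    	sellShort();
    	// enter only if the tree for long trades returns a positive score
    	if(adviseLong(0,Volatility,DomPeriod,AutoCorrel,FD) > 0)
    		buyLong();
    } else if(peak(Trend)) {
    	sellLong();
    	// enter only if the tree for short trades returns a positive score
    	if(adviseShort(0,Volatility,DomPeriod,AutoCorrel,FD) > 0)
    		buyShort();
    }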
    
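    Of the tree inputs, `Volatility` is the easiest to reproduce outside Zorro. A minimal sketch, assuming ATR is the simple average of the true range over the period (Zorro's exact smoothing may differ) and bars in chronological order; the function names are hypothetical:

    ```c
    #include <assert.h>

    /* True range of bar i (i >= 1): max(high-low, high-prevClose, prevClose-low),
       which equals the usual definition with absolute values. */
    static double true_range(const double *high, const double *low,
                             const double *close, int i) {
        double tr = high[i] - low[i];
        double up = high[i] - close[i - 1];
        double dn = close[i - 1] - low[i];
        if (up > tr) tr = up;
        if (dn > tr) tr = dn;
        return tr;
    }

    /* Assumption: simple average of the true range over `period` bars
       ending at bar `last`. */
    double atr(const double *high, const double *low, const double *close,
               int last, int period) {
        assert(period > 0 && last - period + 1 >= 1);
        double sum = 0.0;
        for (int i = last - period + 1; i <= last; i++)
            sum += true_range(high, low, close, i);
        return sum / period;
    }

    /* Short-term over long-term volatility, in the spirit of ATR(30)/ATR(1000).
       Values above 1 mean the recent market is more volatile than usual. */
    double volatility_ratio(const double *high, const double *low,
                            const double *close, int last, int fast, int slow) {
        double slow_atr = atr(high, low, close, last, slow);
        assert(slow_atr > 0.0);
        return atr(high, low, close, last, fast) / slow_atr;
    }
    ```

    For example, if the last 5 of 10 bars have three times the range of the 5 before them, `volatility_ratio(..., 5, 10)` returns 1.5.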
    So I'm using some random indicators, such as fractal dimension, dominant cycle and so on, as input to the tree. The generated rule looks like this:

    Code:
    int adviseEURUSD_S(var* signal)
    {
      if(signal[1] <= 12.938) {
        if(signal[3] <= 0.953) return -70;
        else {
          if(signal[2] <= 43) return 25;
          else {
            if(signal[3] <= 0.962) return -67;
            else return 15;
          }
        }
      }
      else {
        if(signal[3] <= 0.732) return -71;
        else {
          if(signal[1] > 30.61) return 27;
          else {
            if(signal[1] <= 19.315) return 8;
            else {
              if(signal[2] > 46) return -80;
              else {
                if(signal[0] <= -9.606) return -63;
                else return 62;
              }
            }
          }
        }
      }
    }
    
    This is for EUR/USD short trades; there are similar rules for other currencies and for long trades. The rules are generated by the optimizer and executed in trade and test mode. This is the implementation:

    http://zorro-trader.com/en/advisor.htm

    The original algo made about 75% annual profit; the version with the "advise" function made 163% profit on unseen data, even though the input variables above are more or less random. I plan to test this with more algos and different input variables in the near future.
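
    The filter step itself is simple: the tree is only consulted after the base algo has produced an entry signal, and the trade is skipped unless the score is positive. A minimal sketch of measuring that effect on a list of past trades; `Trade`, `Rule`, `filtered_profit` and `toy_rule` are hypothetical, and `toy_rule` is a one-split stand-in for a generated rule such as `adviseEURUSD_S`.

    ```c
    #include <assert.h>

    /* One historical trade: the tree inputs at entry and the trade result. */
    typedef struct {
        double signal[4];  /* same four inputs as passed to the advise call */
        double profit;     /* result of the trade */
    } Trade;

    /* A rule maps the inputs to a score; a score > 0 means "take the trade". */
    typedef int (*Rule)(const double *signal);

    /* Hypothetical one-split tree, used only for illustration. */
    static int toy_rule(const double *signal) {
        return signal[3] <= 0.9 ? -50 : 40;
    }

    /* Net profit of the trades the rule lets through.
       Pass rule = 0 to get the unfiltered result. */
    double filtered_profit(const Trade *trades, int n, Rule rule) {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            if (rule == 0 || rule(trades[i].signal) > 0)
                sum += trades[i].profit;
        return sum;
    }
    ```

    Comparing `filtered_profit(trades, n, 0)` with `filtered_profit(trades, n, toy_rule)` on trades the rule was not trained on is roughly the kind of unfiltered vs. filtered comparison behind the figures above.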
     
  7. Hugin

    Yes, defining the model is definitely the trickiest part. I wish it weren't so; then you could just throw technology at the problem. I have always found it hard to find the right balance between the number of parameters (implicit and free) in a model and the expected ratio of walk-forward performance to back-test/training performance, i.e. to answer the question "In general, if I add another parameter to the model, how much higher does the back-test performance need to be for the expected walk-forward performance to be higher?"
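
    One common way to measure the walk-forward side of that ratio is a rolling layout of training and test windows. A minimal sketch with hypothetical names, assuming fixed-length windows that advance by the test length so the test segments don't overlap:

    ```c
    #include <assert.h>

    /* One walk-forward cycle: train on [train_start, test_start),
       then test on [test_start, test_end). Values are bar indices. */
    typedef struct { int train_start, test_start, test_end; } WfCycle;

    /* Rolling layout: each cycle trains on `train` bars and tests on the next
       `test` bars, then moves forward by `test` bars. Writes at most
       max_cycles entries to out and returns the number of cycles. */
    int walk_forward(int nbars, int train, int test, WfCycle *out, int max_cycles) {
        assert(train > 0 && test > 0);
        int count = 0;
        for (int start = 0; start + train + test <= nbars && count < max_cycles;
             start += test) {
            out[count].train_start = start;
            out[count].test_start = start + train;
            out[count].test_end = start + train + test;
            count++;
        }
        return count;
    }
    ```

    For 1000 bars with 400 training and 200 test bars this yields three cycles, with test segments 400-600, 600-800 and 800-1000. Comparing the in-sample results of each cycle with its test segment gives one empirical handle on the ratio in question.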
     
  8. @Hugin. Always good to hear your comments. BTW, I've always meant to ask: did you get your handle from the Bayesian software tool?

    @jcl. Since you are a forex guy, and given the types of inputs you used, here is a series of slightly older but well-written articles you might find interesting. I did.
    http://forexautomaton.com/research

    On the off chance that either of you has an Amazon Prime membership, PM me.
    I came across something interesting fairly recently.
     

  9. Would be interested to see your results with comparison between different types of trees. Random forest is the one I'm most familiar with. You might also try boosted trees.

    How much time does your data span? I've been fooled before by algorithms that performed well in 1-2 months of testing, only to become random in the following months.
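
    For reference, the core of a random forest is small: train many trees on bootstrap samples, let each split consider only a random subset of the variables, and take a majority vote. The sketch below reduces each tree to a single split (a stump) on one randomly chosen variable; the names, the LCG random generator and the fixed `MAX_N` limit are assumptions made to keep it short, and a real implementation would grow deeper trees.

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define MAX_N 256

    typedef struct { int var; double thr; int left_class, right_class; } Stump;

    /* Small deterministic LCG so the sketch is reproducible; not for real use. */
    static uint32_t lcg_next(uint32_t *s) {
        *s = *s * 1664525u + 1013904223u;
        return *s >> 8;
    }

    static double gini2(int n0, int n1) {
        int n = n0 + n1;
        if (n == 0) return 0.0;
        double p = (double)n1 / n;
        return 2.0 * p * (1.0 - p);
    }

    /* Best threshold on variable `var` for the bootstrap rows idx[0..m-1]. */
    static Stump fit_stump(const double *x, const int *y, const int *idx, int m,
                           int nvars, int var) {
        Stump best = { var, 0.0, 0, 0 };
        double best_imp = 1e300;
        for (int t = 0; t < m; t++) {
            double thr = x[idx[t] * nvars + var];
            int l0 = 0, l1 = 0, r0 = 0, r1 = 0;
            for (int k = 0; k < m; k++) {
                int i = idx[k];
                if (x[i * nvars + var] <= thr) { if (y[i]) l1++; else l0++; }
                else                           { if (y[i]) r1++; else r0++; }
            }
            double imp = (l0 + l1) * gini2(l0, l1) + (r0 + r1) * gini2(r0, r1);
            if (imp < best_imp) {
                best_imp = imp;
                best.thr = thr;
                best.left_class = l1 > l0;   /* majority class of each child */
                best.right_class = r1 > r0;
            }
        }
        return best;
    }

    /* Train ntrees stumps, each on a bootstrap sample and a random variable. */
    void fit_forest(const double *x, const int *y, int n, int nvars,
                    Stump *forest, int ntrees, uint32_t seed) {
        assert(n > 0 && n <= MAX_N && nvars > 0);
        int idx[MAX_N];
        for (int b = 0; b < ntrees; b++) {
            for (int k = 0; k < n; k++) idx[k] = (int)(lcg_next(&seed) % (uint32_t)n);
            int var = (int)(lcg_next(&seed) % (uint32_t)nvars);
            forest[b] = fit_stump(x, y, idx, n, nvars, var);
        }
    }

    /* Majority vote of all stumps for one input vector v. */
    int predict_forest(const Stump *forest, int ntrees, const double *v) {
        int votes = 0;
        for (int b = 0; b < ntrees; b++) {
            const Stump *s = &forest[b];
            votes += (v[s->var] <= s->thr) ? s->left_class : s->right_class;
        }
        return 2 * votes > ntrees;
    }
    ```

    Boosted trees differ mainly in that the trees are trained one after another, each concentrating on the samples its predecessors got wrong, instead of independently on bootstrap samples.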
     
    #10     Feb 23, 2012