Developing a profitable system(infrastructure) on a (pseudo-)random data

Discussion in 'Data Sets and Feeds' started by TSGannGalt, Jul 7, 2010.

  1. MGJ


    Mike805, I programmed your idea (I think) and tried it out. For the price stream I used the most recent 1000 daily closes of the emini S&P futures contract ES. For the random numbers I generated UNIFORMLY DISTRIBUTED numbers between the observed min and the observed max of the price stream. The results were disappointing; no better than flipping coins; waaaay less than 66%:

    999 predictions
    501 correct
    498 wrong
    50.15 pct correct

    Here's my Perl code
    Code:
    use POSIX ;
    
    # named constants
    $PREDICT_LESS = 0;
    $PREDICT_MORE = 1;
    
    # read the historical price data (Format: Date,Price)
    $themax = -100000.0 ;
    $themin = 100000.0;
    $nprices = 0;
    while( $inputline = <STDIN> )
    {
        chomp( $inputline ) ;   # strip the trailing newline only
        ($ddate, $cclose) = split(',', $inputline );
        $closeprice[$nprices] = $cclose;
        $nprices++;
    
        if($cclose > $themax) { $themax = $cclose; }
        if($cclose < $themin) { $themin = $cclose; }
    }
    
    # generate random numbers and use them to predict whether
    # tomorrow's price is greater or less than today's price
    
    # make an arbitrary guess for the first day
    $prediction = $PREDICT_LESS ;
    
    $span = $themax - $themin ;
    $numright = 0;
    $numwrong = 0;
    for($i=1; $i<$nprices; $i++) {
    
        # evaluate the prediction made yesterday
        if($prediction == $PREDICT_LESS) {
            if($closeprice[$i] < $closeprice[$i-1]) { $numright++; }
            else { $numwrong++; }
        }
        else { # yesterday we predicted MORE
            if($closeprice[$i] > $closeprice[$i-1]) { $numright++; }
            else { $numwrong++; }
        }
    
        # now make a prediction for tomorrow
    
        # generate a UNIFORMLY DISTRIBUTED random number x
        $x = rand() ;
        $x = ($span * $x) + $themin ;
    
        # compare today's closing price versus the random number
        if($closeprice[$i] > $x) { $prediction = $PREDICT_LESS ; }
        else { $prediction = $PREDICT_MORE ; }
    
    }
    
    
    # print the results
    printf "%d predictions    %d correct    %d wrong    %7.4f pct correct\n",
      ($numright + $numwrong), $numright, $numwrong, (100.0 * $numright / ($numright + $numwrong)) ;
    
    # finished
    The code and the emini price data file are enclosed in the attached zip archive.
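
    As a rough sanity check on the tally above, the hit count can be compared against both 50% and 66% with a normal approximation to the binomial. This is only a sketch in Python (it is not part of the attached archive), and the helper name `z_score` is made up for illustration.

```python
import math

def z_score(num_right, num_total, p):
    """Standard deviations between the observed hit count and the
    count expected if each prediction were right with probability p
    (normal approximation to the binomial)."""
    expected = num_total * p
    std_dev = math.sqrt(num_total * p * (1.0 - p))
    return (num_right - expected) / std_dev

# figures from the run above: 501 right out of 999
z_vs_coin = z_score(501, 999, 0.50)   # distance from coin flipping
z_vs_66 = z_score(501, 999, 0.66)     # distance from the hoped-for 66%
print("z vs 50%%: %.2f   z vs 66%%: %.2f" % (z_vs_coin, z_vs_66))
```

    The first value comes out at roughly 0.09 and the second at roughly -10.6, so under this approximation the run is indistinguishable from coin flipping and a long way from 66%.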
     
    #21     Jul 8, 2010
  2. Cool - thanks for doing this!

    However, I think there might be some issues in the way you've chosen to structure the problem.

    Foremost, S&P closing prices are not a pure random process: there is an inherent upward drift, there are dramatic price-altering events, and above all, the distribution is not normal. There is also the "sampling" element: a closing price is a sample in time and does not reflect a pure pseudo-random stochastic process.

    The way around this is ... (I won't give the answer away just yet).

    However, attached is an Excel demo of this probability effect. "Button 1" runs 40 trials and outputs a "right" probability.

    Edit: ET won't let me upload a macro enabled workbook...

    Mike
     
    #22     Jul 8, 2010
  3. Dacamic

    Dacamic Guest

    I usually struggle with absolutes, yet do agree that optimized systems should be viewed with skepticism. It's also worth noting, however, that brute force searches are not limited to optimization. In fact, they might not perform optimization well depending upon the breadth of a search.
     
    #23     Jul 8, 2010
  4. For some of the newbs who keep on emphasizing that random = useless...

    I keep on mentioning (pseudo-) with random because, depending on how you generated the data, the distribution you generated it from will always be reflected in it, whether it be uniform or Gaussian...

    Brownian / Wiener takes a specific distribution and generates random data. From my limited knowledge of stochastic calculus, Ito calculus is somewhat along the same lines: taking a set of characteristics and generating a bunch of stuff....

    So... what Monte Carlo and all the other shuffling and random data generation do is take a sample distribution profile and generate more data and variance, to confirm that the model/trading system is compatible with the distribution profile under which it was developed.

    Let's say I take S&P data and develop a Sharpe = 3.0 model. I take the S&P data, regenerate data from it, and spin the model on it. If the vast majority of the runs do not reflect the original performance... that gives you feedback and insight: for example, the model's performance depends on a situational characteristic of the market, or you're generating the data based on the wrong characteristics of the market on which you based your distribution profile... So let's say, after the random data test, you find that the serial correlation of the S&P data was 0.70... you code that in, generate, and test. You get another set of results.

    Seriously... this is Monte Carlo 101. "Randomized data" has its value, if you know how to use it, just like any other tool like Tech. Analysis or AI stuff....
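
    For anyone who wants to see the regenerate-and-re-test loop in code, here is a minimal Python sketch. Everything in it is an assumption for illustration: the price list is made up (not real S&P data), `toy_model_return` is a hypothetical stand-in for whatever model you actually built, and the resampling is a plain bootstrap of log returns, which keeps the return distribution but deliberately throws away serial correlation.

```python
import math
import random

def resample_prices(prices, rng):
    """Build one synthetic price path by drawing the original log
    returns with replacement (a simple bootstrap). This keeps the
    return distribution but destroys any serial correlation."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    path = [prices[0]]
    for _ in log_returns:
        path.append(path[-1] * math.exp(rng.choice(log_returns)))
    return path

def toy_model_return(prices):
    """Hypothetical stand-in for a trading model: hold long for one day
    after an up close, stay flat otherwise; returns total log return."""
    total = 0.0
    for i in range(1, len(prices) - 1):
        if prices[i] > prices[i - 1]:
            total += math.log(prices[i + 1] / prices[i])
    return total

rng = random.Random(42)  # fixed seed so the run is repeatable
# made-up closes for illustration only, not real S&P data
original = [100.0, 101.0, 102.5, 101.8, 103.0, 104.2, 103.5, 105.0]
baseline = toy_model_return(original)
trials = [toy_model_return(resample_prices(original, rng)) for _ in range(1000)]
beat = sum(1 for t in trials if t >= baseline)
print("baseline %.4f; %d of %d resampled paths did at least as well"
      % (baseline, beat, len(trials)))
```

    If the original result sits far out in the tail of the resampled results, the model probably leaned on something the bootstrap removed (ordering, serial correlation), which is the cue to regenerate the data with that characteristic put back in.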
     
    #24     Jul 8, 2010
  5. Actually, to be pedantic: for Wiener processes and Brownian motion (geometric or not, i.e., Wiener with drift), the underlying Wiener process has increments X_t2 - X_t1 that are normally distributed with mean = 0 and variance = t2 - t1, by definition.

    Also note that in Brownian motion the variance (and drift) is assumed to be constant, and the increments of X_t over non-overlapping intervals are independent (no serial correlation).
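
    A quick way to see that definition in numbers is to draw many increments and compare the sample mean and variance with 0 and t2 - t1. This is a small Python sketch; the times, the seed and the helper name `wiener_increment` are arbitrary choices for illustration.

```python
import math
import random

def wiener_increment(t1, t2, rng):
    """Draw X_t2 - X_t1 for a standard Wiener process:
    normal with mean 0 and variance t2 - t1."""
    return rng.gauss(0.0, math.sqrt(t2 - t1))

rng = random.Random(0)   # fixed seed, chosen arbitrarily
t1, t2 = 1.0, 1.25       # arbitrary example times
draws = [wiener_increment(t1, t2, rng) for _ in range(100000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((x - sample_mean) ** 2 for x in draws) / len(draws)
print("mean %.4f (theory 0)   variance %.4f (theory %.2f)"
      % (sample_mean, sample_var, t2 - t1))
```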
     
    #25     Jul 8, 2010
  6. This stuff is way over my head and I wish I even knew where to start to learn but wouldn't developing a system that is profitable on random data essentially be the "holy grail"?
     
    #26     Jul 8, 2010
  7. That depends on the random process generating the data. I can assure you that geometric Brownian motion is not the process generating financial time series. It is used in texts/papers because it makes the maths simple(r), even though many of the results are not really applicable (e.g., Black-Scholes-Merton assumes constant vol, and vol is not constant).
     
    #27     Jul 8, 2010
  8. I get 50% if I increase the sample size and spin....

    Are you trying to get into some Bayesian stuff?

    Or is this some simplified... Ito Calculus' Integrator/Limit of Prob. stuff...

    ?????? Seriously, am I wasting time trying to figure what this quiz is about?????

     
    #28     Jul 8, 2010

  9. Yeah, seriously, don't waste your time... the idea isn't worth spending too much time on.

    Honestly, I just threw the gambling concept out there because it has a very neat property: the distribution used in making decisions (R) is *independent* from C. The idea was to see what ideas you had on the matter. It wasn't meant to be a quiz, just something to get some dialogue going.

    The issue comes down to discrete normally dist. data with nice discrete steps. The more similar the data is to that of a coin flip, the better this particular case will work.
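
    To make the contrast concrete, here is a Python sketch that applies the same threshold rule as the Perl code earlier in the thread (predict "lower" when today's value is above R, else "higher") to two synthetic series. Both series are assumptions for illustration and this is not the Excel demo: case 1 uses independent uniform draws, case 2 uses a random walk of levels with R uniform over the observed range. For independent continuous draws the rule is right in 4 of the 6 equally likely orderings of (today, tomorrow, R), i.e. 2/3 of the time; for a random walk the next step does not depend on today's level, so it should sit near 1/2.

```python
import random

def hit_rate(series, thresholds):
    """Fraction of correct calls when predicting 'lower tomorrow' if
    today's value is above the random threshold, else 'higher'.
    Ties count as wrong, as in the Perl version."""
    right = 0
    total = 0
    for today, tomorrow, r in zip(series, series[1:], thresholds):
        if today > r:
            right += tomorrow < today   # we predicted lower
        else:
            right += tomorrow > today   # we predicted higher
        total += 1
    return right / total

rng = random.Random(7)   # fixed seed, chosen arbitrarily
n = 100000

# Case 1: every value is an independent draw from the same distribution as R
iid_series = [rng.random() for _ in range(n)]
iid_r = [rng.random() for _ in range(n)]

# Case 2: a random walk of levels, with R uniform over its observed range
walk = [0.0]
for _ in range(n - 1):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))
lo, hi = min(walk), max(walk)
walk_r = [rng.uniform(lo, hi) for _ in range(n)]

iid_rate = hit_rate(iid_series, iid_r)
walk_rate = hit_rate(walk, walk_r)
print("iid draws:   %.3f" % iid_rate)
print("random walk: %.3f" % walk_rate)
```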

    To go the other way (i.e., real data), one has to create a distribution that most accurately reflects the data. In the case of S&P data, a normal distribution won't work and one will have to account for those non-linear and non-continuous steps. There's an art and some science to that... it essentially becomes an optimization problem where one has to "fit" a distribution to the data using any number of numerical methods.
     
    #29     Jul 8, 2010
  10. Mike805, your Excel sheet seems to work only because RANDBETWEEN(-1, 1) results in 2 cases (-1 and 0) where C is still the "High", and only 1 case where R is a High (for 1), i.e., 2/3 = 66.6%.

    If you change that to RANDBETWEEN(-2, 2), you get 60% because now, there are 3 cases (-2, -1, and 0) where C = "High".
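
    The counting argument can be checked by brute force. This Python sketch assumes, as described above, that C stays the "High" whenever the random integer is <= 0; the function name `high_fraction` is made up, and the actual spreadsheet logic may differ.

```python
from fractions import Fraction

def high_fraction(n):
    """Share of the equally likely integers in [-n, n] that are <= 0,
    i.e. the cases counted above as C still being the 'High'."""
    outcomes = range(-n, n + 1)   # what RANDBETWEEN(-n, n) can return
    high_cases = [r for r in outcomes if r <= 0]
    return Fraction(len(high_cases), len(outcomes))

for n in (1, 2, 3):
    print("RANDBETWEEN(%d, %d): %s" % (-n, n, high_fraction(n)))
```

    In general the share is (n + 1) / (2n + 1), which drifts down toward 1/2 as the range widens.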

    I'm not sure I see the "very neat property" here? Can you please state the exact property of random walk / distribution / gambling, which you intended to demonstrate? Maybe you can modify the Excel sheet?


     
    #30     Jul 9, 2010