Developing a profitable system (infrastructure) on (pseudo-)random data

MGJ · Jul 8, 2010

Mike805, I programmed your idea (I think) and tried it out. For the price stream I used the most recent 1000 daily closes of the emini S&P futures contract ES. For the random numbers I generated UNIFORMLY DISTRIBUTED numbers between the observed min and the observed max of the price stream. The results were disappointing; no better than flipping coins; waaaay less than 66%:

999 predictions
501 correct
498 wrong
50.15 pct correct
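A binomial test checks whether 501 right out of 999 really is "no better than flipping coins". The sketch below is a minimal Python version. The helper name `coin_flip_z_test` is ours, and it uses the normal approximation to the binomial, which is reasonable at n = 999. With p0 = 0.5 as the coin-flip null, z comes out below 0.1, far from the roughly 1.96 needed for 5% two-sided significance.

```python
import math

def coin_flip_z_test(n_correct: int, n_total: int, p0: float = 0.5):
    """Normal-approximation test of a hit rate against a null hit rate p0.

    Returns (hit_rate, z, two_sided_p).
    """
    hit_rate = n_correct / n_total
    expected = n_total * p0
    sd = math.sqrt(n_total * p0 * (1.0 - p0))
    z = (n_correct - expected) / sd
    # two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return hit_rate, z, p_value

rate, z, p = coin_flip_z_test(501, 999)
print(f"hit rate {rate:.4f}, z = {z:.3f}, two-sided p = {p:.3f}")
```

For comparison, a genuine 66% hit rate over the same 999 predictions would give z above 10.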

Here's my Perl code

Code:

use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# named constants
my $PREDICT_LESS = 0;
my $PREDICT_MORE = 1;

# read the historical price data from STDIN (Format: Date,Price)
my @closeprice;
my ($themax, $themin);
while (my $inputline = <STDIN>) {      # no spaces inside the angle brackets
    chomp($inputline);                  # chomp strips only the newline; chop would eat a digit on an unterminated last line
    $inputline =~ s/\r$//;              # tolerate DOS (CRLF) line endings
    my ($ddate, $cclose) = split(',', $inputline);
    next unless defined($cclose) && looks_like_number($cclose);   # skip blank or header lines
    push @closeprice, $cclose;

    $themax = $cclose if !defined($themax) || $cclose > $themax;
    $themin = $cclose if !defined($themin) || $cclose < $themin;
}
my $nprices = scalar(@closeprice);
die "need at least two prices\n" if $nprices < 2;

# generate random numbers and use them to predict whether
# tomorrow's price is greater or less than today's price

# make an arbitrary guess for the first day
my $prediction = $PREDICT_LESS;

my $span = $themax - $themin;
my $numright = 0;
my $numwrong = 0;
for (my $i = 1; $i < $nprices; $i++) {

    # evaluate the prediction made yesterday (an unchanged close counts as wrong)
    if ($prediction == $PREDICT_LESS) {
        if ($closeprice[$i] < $closeprice[$i-1]) { $numright++; }
        else { $numwrong++; }
    }
    else { # yesterday we predicted MORE
        if ($closeprice[$i] > $closeprice[$i-1]) { $numright++; }
        else { $numwrong++; }
    }

    # now make a prediction for tomorrow

    # generate a UNIFORMLY DISTRIBUTED random number x in [themin, themax)
    my $x = $themin + $span * rand();

    # compare today's closing price versus the random number:
    # above x -> predict a lower close tomorrow, otherwise predict a higher one
    if ($closeprice[$i] > $x) { $prediction = $PREDICT_LESS; }
    else { $prediction = $PREDICT_MORE; }
}

# print the results
my $ntotal = $numright + $numwrong;
printf "%d predictions    %d correct    %d wrong    %7.4f pct correct\n",
  $ntotal, $numright, $numwrong, 100.0 * $numright / $ntotal;

# finished

The code and the emini price data file are enclosed in the attached zip archive (predicto.zip).

Mike805 · Jul 8, 2010

Cool - thanks for doing this!

However, I think there might be some issues in the way you've chosen to structure the problem.

First and foremost, S&P closing prices are not a purely random process: there is an inherent upward drift, there are dramatic price-altering events, and, above all, the distribution is not normal. There is also the "sampling" element: a closing price is a sample in time, so it does not reflect a pure pseudo-random stochastic process.
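The drift and non-normality can be checked directly on the closes. This is a minimal Python sketch; the function name and the synthetic demo series are ours, not Mike805's. It assumes a plain list of positive closing prices and works on log returns. For normal returns both skewness and excess kurtosis sit near 0, while fat tails show up as clearly positive excess kurtosis.

```python
import math
import random

def return_moments(closes):
    """Mean (drift), standard deviation, skewness and excess kurtosis
    of the log returns of a list of positive closing prices."""
    rets = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    n = len(rets)
    mean = sum(rets) / n
    m2 = sum((r - mean) ** 2 for r in rets) / n
    m3 = sum((r - mean) ** 3 for r in rets) / n
    m4 = sum((r - mean) ** 4 for r in rets) / n
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Hypothetical demo series standing in for real S&P closes:
# mostly calm days (1% vol) plus occasional large moves (4% vol).
rng = random.Random(42)
closes = [1000.0]
for _ in range(5000):
    sigma = 0.04 if rng.random() < 0.1 else 0.01
    closes.append(closes[-1] * math.exp(rng.gauss(0.0003, sigma)))

mean, sd, skew, exkurt = return_moments(closes)
print(f"drift/day {mean:.5f}  sd {sd:.4f}  skew {skew:.2f}  excess kurtosis {exkurt:.2f}")
```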

The way around this is ... (I won't give the answer away just yet).

However, attached is an Excel demo of this probability effect. "Button 1" runs 40 trials and outputs a "right" probability.

Edit: ET won't let me upload a macro-enabled workbook...

Mike

Dacamic · Jul 8, 2010

Quote from promagma:

For the purposes of this thread, I think all will agree that randomly optimizing systems on random data (or even price data) is useless. But for my purposes - I am using calculated datasets which may actually contain an edge.

I usually struggle with absolutes, yet I do agree that optimized systems should be viewed with skepticism. It's also worth noting, however, that brute-force searches are not limited to optimization. In fact, depending on the breadth of the search, they may not perform optimization well.

TSGannGalt · Jul 8, 2010

For some of the newbs who keep insisting that random = useless...

I keep attaching "(pseudo-)" to "random" because, depending on how you generate the data, the generating distribution will always be reflected in it, whether it is uniform or Gaussian...

Brownian motion / a Wiener process takes a specific distribution and generates random data from it. From my limited knowledge of stochastic calculus, Itô calculus is somewhat along the same lines: take some characteristics and generate a bunch of stuff from them....

So... what Monte Carlo, all the other shuffling, and random data generation do is take a sample distribution profile and generate more data and variations from it, to confirm that the model/trading system is compatible with the distribution profile it was developed under.
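A common version of this is the iid bootstrap. You resample a strategy's daily returns with replacement and look at how the performance statistic spreads across the resampled paths. The sketch below is one simple variant, not TSGannGalt's own procedure. The function name, the 252-day annualization, the zero risk-free rate and the synthetic "strategy" returns are all our assumptions.

```python
import random
import statistics

def bootstrap_sharpe(daily_returns, n_paths=1000, seed=0):
    """iid bootstrap: resample daily returns with replacement, recompute the
    annualized Sharpe ratio on each resampled path, and return them sorted.
    Assumes 252 trading days per year and a zero risk-free rate."""
    rng = random.Random(seed)
    n = len(daily_returns)
    sharpes = []
    for _ in range(n_paths):
        path = rng.choices(daily_returns, k=n)
        sd = statistics.pstdev(path)
        if sd > 0:
            sharpes.append(statistics.fmean(path) / sd * 252 ** 0.5)
    return sorted(sharpes)

# Hypothetical strategy returns standing in for a real backtest.
gen = random.Random(3)
strategy = [gen.gauss(0.001, 0.01) for _ in range(1000)]
dist = bootstrap_sharpe(strategy)
print(f"5th pct {dist[len(dist) // 20]:.2f}, median {dist[len(dist) // 2]:.2f}")
```

Resampling with replacement scrambles the order of the days, so it deliberately destroys any serial correlation. A model whose edge depends on that correlation will look much worse under this test than in the original backtest.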

Let's say I take S&P data and develop a model with a Sharpe ratio of 3.0. I then take the S&P data, regenerate new data from it, and run the model on the regenerated series. If most of those runs do not reproduce the original performance, that gives you feedback and insight into the model's character: for example, its performance depends on a situational characteristic of the market, or you are generating the data from the wrong characteristics of the market, i.e., the wrong distribution profile. So let's say that after the random-data test you find that the serial correlation of the S&P data was 0.70... you code that into the generator, regenerate, and test again. You get another set of results.
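The last step, regenerating data that carries a measured serial correlation, can be sketched with an AR(1) model. This is one simple generator chosen for illustration; TSGannGalt doesn't say which one he uses. The 0.70 is his example figure, and he doesn't say whether it refers to prices or returns, so the sketch applies it to returns. The function names and the sigma value are hypothetical.

```python
import random

def ar1_returns(n, phi, sigma, seed=0):
    """Zero-mean AR(1) returns: r_t = phi * r_{t-1} + e_t, e_t ~ N(0, sigma^2).
    For |phi| < 1 the lag-1 autocorrelation of the series is phi."""
    rng = random.Random(seed)
    # start from the stationary distribution, variance sigma^2 / (1 - phi^2)
    r = rng.gauss(0.0, sigma / (1.0 - phi * phi) ** 0.5)
    out = []
    for _ in range(n):
        r = phi * r + rng.gauss(0.0, sigma)
        out.append(r)
    return out

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[i] - m) * (xs[i - 1] - m) for i in range(1, n))
    return num / sum((x - m) ** 2 for x in xs)

rets = ar1_returns(50_000, phi=0.70, sigma=0.01)
print(f"sample lag-1 autocorrelation: {lag1_autocorr(rets):.3f}")
```

Running the model on many such series, each with a different seed, gives the "another set of results" to compare against the original backtest.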

Seriously... this is Monte Carlo 101. "Randomized data" has its value if you know how to use it, just like any other tool, such as technical analysis or AI stuff....

Equalizer · Jul 8, 2010


Actually, to be pedantic: Brownian motion (geometric or not) is built on a Wiener process (Brownian motion being a Wiener process with drift), and the Wiener process component has increments X_t2 - X_t1 that are normally distributed with mean 0 and variance t2 - t1, by definition.

Also note that in Brownian motion the variance (and drift) is assumed to be constant, and the increments of X_t over equal, non-overlapping intervals are iid (no serial correlation); the levels X_t themselves are not iid.
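Both properties can be checked numerically. Over a step dt (dt = t2 - t1), the increments of a standard Wiener process have variance dt and no serial correlation, while the levels X_t are strongly autocorrelated. This is a minimal simulation sketch; the function names and parameter values are ours.

```python
import math
import random

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[i] - m) * (xs[i - 1] - m) for i in range(1, n))
    return num / sum((x - m) ** 2 for x in xs)

def brownian_path(n_steps, dt, mu=0.0, sigma=1.0, seed=0):
    """X_t = mu*t + sigma*W_t sampled every dt; each increment is an
    independent N(mu*dt, sigma^2*dt) draw."""
    rng = random.Random(seed)
    xs = [0.0]
    for _ in range(n_steps):
        xs.append(xs[-1] + mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return xs

dt = 0.01
path = brownian_path(100_000, dt)
incs = [b - a for a, b in zip(path, path[1:])]
mean_inc = sum(incs) / len(incs)
var_inc = sum((x - mean_inc) ** 2 for x in incs) / len(incs)
print(f"increment variance {var_inc:.5f} (theory: dt = {dt})")
print(f"lag-1 autocorr of increments {lag1_autocorr(incs):+.3f}")
print(f"lag-1 autocorr of levels     {lag1_autocorr(path):+.3f}")
```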

mindtrade · Jul 8, 2010

This stuff is way over my head, and I wish I even knew where to start learning, but wouldn't developing a system that is profitable on random data essentially be the "holy grail"?

Equalizer · Jul 8, 2010


That depends on the random process generating the data. I can assure you that geometric Brownian motion is not the process generating financial time series. It is used in texts/papers because it makes the maths simple(r), even though many of the results are not really applicable (e.g., Black-Scholes-Merton: volatility is not constant).
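Part of why GBM keeps the maths simple is that its log returns are iid normal with one constant volatility, so a path can be simulated exactly with S_{t+dt} = S_t * exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z). The sketch below does that and then measures realized volatility year by year. Under GBM, each yearly value should land close to the input sigma of 0.20. Real series show volatility that changes over time, which this model cannot produce. The function names and the parameter values (mu = 0.07, sigma = 0.20, 252 days per year) are our assumptions.

```python
import math
import random

def gbm_path(s0, mu, sigma, dt, n_steps, seed=0):
    """Exact GBM simulation with a single constant volatility sigma."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    s = [s0]
    for _ in range(n_steps):
        s.append(s[-1] * math.exp(drift + vol * rng.gauss(0.0, 1.0)))
    return s

def window_vols(prices, window, dt):
    """Annualized volatility of log returns in consecutive non-overlapping windows."""
    rets = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    vols = []
    for start in range(0, len(rets) - window + 1, window):
        chunk = rets[start:start + window]
        m = sum(chunk) / window
        var = sum((r - m) ** 2 for r in chunk) / (window - 1)
        vols.append(math.sqrt(var / dt))
    return vols

path = gbm_path(s0=1000.0, mu=0.07, sigma=0.20, dt=1 / 252, n_steps=252 * 20)
vols = window_vols(path, window=252, dt=1 / 252)
print([round(v, 3) for v in vols])
```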

TSGannGalt · Jul 8, 2010

I get 50% if I increase the sample size and spin it....

Are you trying to get into some Bayesian stuff?

Or is this some simplified... Itô calculus integrator / limit-of-probability stuff...

?????? Seriously, am I wasting time trying to figure out what this quiz is about?????


Mike805 · Jul 8, 2010


Yeah, seriously, don't waste your time... the idea isn't worth spending too much time on.

Honestly, I just threw the gambling concept out there because it has a very neat property: the distribution used in making decisions (R) is *independent* of C. The idea was to see what ideas you had on the matter. It wasn't meant to be a quiz, just something to get some dialogue going.

The issue comes down to discrete, normally distributed data with nice discrete steps. The more similar the data are to a coin flip, the better this particular case will work.
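Here is one way to make the "coin flip" condition concrete. This is our own illustration, and whether it matches Mike805's sheet exactly is an assumption. The random-threshold rule only helps when the next value tends to fall back toward the middle of the range. Suppose successive values are independent Uniform(0, 1) draws and R is also Uniform(0, 1). Given today's value c, the rule is right with probability c*c + (1 - c)*(1 - c). Averaged over c this gives 1/3 + 1/3 = 2/3. If instead the values form a random walk, as closing prices roughly do, each step is independent of the current level, so the hit rate drops to about 50%. That is consistent with MGJ's 50.15%. A simulation of both cases (function and variable names are ours):

```python
import random

def threshold_hit_rate(values, lo, hi, rng):
    """Fraction of correct up/down calls for the random-threshold rule:
    draw R ~ Uniform(lo, hi); if today's value C > R predict the next value
    is lower, otherwise predict it is higher."""
    right = 0
    for cur, nxt in zip(values, values[1:]):
        predict_lower = cur > rng.uniform(lo, hi)
        right += (nxt < cur) if predict_lower else (nxt > cur)
    return right / (len(values) - 1)

rng = random.Random(7)
n = 200_000

# Case 1: each value is an independent Uniform(0, 1) draw (no memory of the level).
iid = [rng.random() for _ in range(n)]

# Case 2: a random walk, i.e. the running sum of independent symmetric steps.
walk, level = [], 0.0
for _ in range(n):
    level += rng.random() - 0.5
    walk.append(level)

print(f"iid draws:   {threshold_hit_rate(iid, 0.0, 1.0, rng):.3f}")
print(f"random walk: {threshold_hit_rate(walk, min(walk), max(walk), rng):.3f}")
```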

To go the other way (i.e., real data), one has to create a distribution that most accurately reflects the data. In the case of S&P data, a normal distribution won't work, and one has to account for those nonlinear and discontinuous steps. There's an art and some science to that... it essentially becomes an optimization problem where one has to "fit" a distribution to the data using any number of numerical methods.
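As an illustration of "fitting" a distribution numerically, the sketch below compares a normal fit with a Student-t fit by log-likelihood. The Student-t is a common fat-tailed alternative and is our choice, not Mike805's. For simplicity it fixes the location at the sample median and uses a coarse grid search over the degrees of freedom nu and the scale instead of a proper optimizer. The input is assumed to be a list of returns, and the demo data are synthetic.

```python
import math
import random

def normal_loglik(xs):
    """Log-likelihood of xs under a normal fitted by maximum likelihood."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def t_loglik(xs, loc, scale, nu):
    """Log-likelihood of xs under a Student-t with location, scale and nu."""
    c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
         - 0.5 * math.log(nu * math.pi) - math.log(scale))
    return sum(c - (nu + 1) / 2 * math.log1p(((x - loc) / scale) ** 2 / nu) for x in xs)

def fit_student_t(xs):
    """Crude numerical fit: location fixed at the median, grid search over
    nu and scale. Returns (loglik, nu, scale)."""
    loc = sorted(xs)[len(xs) // 2]
    sd = (sum((x - loc) ** 2 for x in xs) / len(xs)) ** 0.5
    best = None
    for nu in (1.5, 2, 2.5, 3, 4, 5, 7, 10, 15, 30, 60):
        for k in range(1, 41):
            scale = sd * k / 40  # scales from 2.5% to 100% of the sample spread
            ll = t_loglik(xs, loc, scale, nu)
            if best is None or ll > best[0]:
                best = (ll, nu, scale)
    return best

def t3_draw(rng):
    """One Student-t(3) draw, built as Z / sqrt(chi2_3 / 3)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
    return z / math.sqrt(chi2 / 3)

# Hypothetical fat-tailed demo returns, scaled to roughly 1% per day.
rng = random.Random(11)
rets = [0.01 * t3_draw(rng) for _ in range(2000)]

ll_t, nu, scale = fit_student_t(rets)
print(f"normal loglik {normal_loglik(rets):.1f}")
print(f"student-t loglik {ll_t:.1f} (nu = {nu}, scale = {scale:.4f})")
```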

TigerBalm · Jul 9, 2010

Mike805, your Excel sheet seems to work only because RANDBETWEEN(-1, 1) produces 2 cases (-1 and 0) where C is still the "High" and only 1 case where R is the "High" (for 1), i.e., 2/3 = 66.7%.

If you change that to RANDBETWEEN(-2, 2), you get 60% because now there are 3 cases out of 5 (-2, -1, and 0) where C = "High".
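The counting argument generalizes. With R drawn uniformly from the integers -k..k and ties at 0 counted for C, C is the "High" in k + 1 of the 2k + 1 equally likely cases. That fraction shrinks toward 50% as the range widens. A short enumeration (the function name is ours) reproduces the 2/3 and 60% figures above:

```python
from fractions import Fraction

def c_is_high_fraction(k):
    """Fraction of RANDBETWEEN(-k, k) outcomes r with r <= 0, i.e. the cases
    counted as C being the "High" (ties at 0 go to C)."""
    outcomes = range(-k, k + 1)  # RANDBETWEEN is inclusive at both ends
    favorable = sum(1 for r in outcomes if r <= 0)
    return Fraction(favorable, len(outcomes))

for k in (1, 2, 5):
    f = c_is_high_fraction(k)
    print(k, f, f"{float(f):.1%}")
```

This prints 2/3 (66.7%) for k = 1, 3/5 (60.0%) for k = 2 and 6/11 (54.5%) for k = 5.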

I'm not sure I see the "very neat property" here. Can you please state the exact property of random walks / distributions / gambling that you intended to demonstrate? Maybe you could modify the Excel sheet?

