While developing a XL sheet to analyze the trades generated by a signal generator/backtester, I stumbled across the different ways to use for calculating the deviations: = STDEV() = STDEVP() = AVEDEV() Assuming there is a "best choice" between them, which one would it be? Maybe some kind soul can enlighten me on this. My knowledge about statistics is far too low to understand the textbooks - but I hope some simple rules of application can be given. Thanks in advance, Andreas
Ummm... Do you know what Standard Deviation is? The formula? Even if you get the numbers... do you know what it means??? I think that's the question you should ask yourself and the answer will come to you by itself. Don't you think?
LOL. It's like asking: "I'm developing a spreadsheet to analyze trades and I know how to add, subtract, multiply, and divide. Which one should I do?" The answer of course is: "it depends on whether you want to add, subtract, multiply, or divide". Or in your case it would be: "It depends on whether you want the standard deviation, the standard deviation of a population, or the average deviation." But in any case, thanks for telling me about the average deviation function, AVEDEV(). I didn't know it existed. I've always taken an extra column to compute the absolute values and then average them. Nice!
WDGANN: Yes to the first three questions. No to the third. Standard Deviation is a) assuming a normal distribution and b) quite heaviliy reliant on aberration or extreme values. These properties are not as strong in Average Deviation or even Deviation of Squares. So I insists on this not being a question of choosing + - * and /, but I am specifically asking if choosing one deviation over the other makes sense in specific situations. Say I want to know the range of returns with a 95% probability, is it wise to use standard dev if I know that extreme values lead to a (possibly) exagerated stdev? Or, along the same track, would it possibly be wiser to prefer average deviation over standard deviation for calculating historical volatility, no matter if "it has always been like this". So I see no reason why you don't enlighten me on the strengths and weaknesses of the different algorithms, should you know them. I guess I had made clear in my original post, that I am not an expert, so if you can help, please do so. If you can't or don't want to, then why bother trying to make a fool of me? Helps your self-esteem, or what? Which is quite different from what AVEDEV() does. Even results are different. But if you like it nevertheless, I am happy I could help. For your collection of unknown functions, there's also a =DEVSQ() Best, Andreas
a) is very wrong: the concept of standard deviation (= square root of variance) is independent of the shape of the distribution. b) is correct, use the median instead, which is more stable. To your original question: whether to use STDDEV or STDDEVP depends upon whether you are looking at a sample (STDDEV) or a population (STDDEVP). The difference lies in the factor 1/n used for the computation of the standard dev of a population and 1/(n-1) used for a sample. In case of a sample, you do not know the real mean of the population, you use the mean of the sample instead. For large n the difference usually becomes small. In your case you should use the formula for a sample: the trades of your backtest are all the trades your system would have made in the past, but you want to estimate deviations for the future. This requires your system to not change in the future(or the distribution of your trades to be the same as that in the test), usually a bold assumption. There is a formula saying in case the distribution is normal 95% of all "trades" will be within two standard deviations of the mean. If your distribution is not normal, this statement is not valid and you have to use the formula of Tschebyscheff, which is not that exact. So, you have to test whether the distribution of your trades is normal, I do not know how to do that in Excel, I'm sure it can be done. I doubt the distribution is normal. Regards Bernd Kuerbs
If your distribution is not normal, this statement is not valid and you have to use the formula of Tschebyscheff, which is not that exact. A long time since I have seen his name spelled this way. Regards, Bruce
STDEVP is the standard deviation for the (whole) POPULATION of N items whereas STDEV is the standard deviation of a SAMPLE of n items extracted from the POPULATION. The only difference in the formula between STDEV and STDEVP comes from the denominator which is n for the stdv of the population and (n-1) for the stdv of the sample because for the sample you already used what is called one degree liberty for estimating the mean so that there is only n-1 datas left for estimating the standard deviation. Dividing by n-1 instead of n assures that no (systematic) biais is introduced. Of course when n is big there is not much practical difference between dividing by n or n-1. For AVEDEV I must check before because I have the french version of Excel and don't know for the moment to what function it corresponds exactly.
Bruce, I think the distribution of the different versions of his name warrants a formula of its own. I found 9!! Tschebyscheff (Bernd's version) http://finanz.math.tu-graz.ac.at/~predota/history/mathematiker/tschebyscheff.html Tchebychev http://deming.eng.clemson.edu/pub/den/archive/97.08/msg00263.html Chebychev, Chebyshev http://www.nada.kth.se/kurser/kth/2D1441/lecturenotes/Chebyshev.pdf Csebyshev http://math.ucsd.edu/~vanvu/262-2/fall2003.pdf Tchebysheff http://math.cudenver.edu/~ssain/prob/lec13.pdf Tchebycheff http://www.uni-hohenheim.de/i3v/00068900/17610041.htm Cebysev http://jipam.vu.edu.au/v3n5/048_02.pdf Tchebyshev http://www-gat.univ-lille1.fr/~badea/cv.pdf
Bruce, I think the distribution of the different versions of his name warrants a formula of its own. I found 9!! "Chebyshev or Tchebycheff Pafnuty Lvovich Chebyshev was a notable Russian mathematician, who was born on 16 May 1821 and died on 8 December 1894. He wrote on number theory, analysis, probability, mechanics and maps. Being Russian, his name was written in the Cyrillic alphabet, causing problems of translitteration. Stijn van Dongen has compiled a list of 33 western European verions of his surname, of which Chebyshev seems to be the most popular in English, Tchebycheff in French, Cebysev in Spanish, and Tchebyscheff in German. Despited this variety, the final vowel always seems to be transcribed as "e" in that list when it is in fact pronounced "yo" in Russian. Even his given name and his patronimic have varying versions in roman script." from http://www.btinternet.com/~se16/hgb/cheb.htm I didn't think an explanation of the use for Chebyshev's inequality would be necessary Mr Subliminal, but if you happen, by chance, to run into Swing Zones you might pass along this bit of information: Let's say Swing has a series of trades numbering in the hundreds and knows what his average return per trade is and also knows the variance of the returns around that average. Using Chebyshev's Inequality he can then calculate a simple estimate of the chances that a return of at least + or - x will occur. So Mr. Zones swings an average $300 trade. His variance is equal to $36. What is the probability that any one of his trades will deviate from $300 by more than $30 in either direction? Remember x = $30; M(ean) = $300; and V(ariance) = $36. Chebyshev wrote the probability that the absolute value of the difference (x-M) is greater than or equal to an arbitrary number x>0 is less than or equal to the variance divided by x squared. Which is how a person would say the formula in English. It looks like this: P{|x-M|>=x}=< V/x squared or using Swing's hundreds of trades example P(|x-300|>=30)=<36/900 = 0.04: that is, the probability that Swing will have a deviation of greater than $30 in either direction from his average $300 trade is no more than 4%. So if Swing is curious about the frequency of a certain drawdown or exceptionally good trade occuring, this formula will give him a simple estimate. Bruce
.. So Mr. Zones swings an average $300 trade. His variance is equal to $36. ... It is an example, I know: But an average of $300 and a variance of $36? Even if you had written "a standard deviation of $36" I would suggest another name for that trader...... Have a nice day Bernd Kuerbs