As I understand it, a reduced standard deviation of returns is the way to go in terms of reducing risk. However, SD doesn't discriminate between wins and losses. For example, the following values give a low SD: 1000, 1000, 1100, 990, 1200 (SD = 91). However, if you have one really terrific trade, SD goes through the roof: 1000, 1000, 1100, 990, 8500 (SD = 3344). So this second set of returns appears riskier when looking only at SD and not at the trades themselves. Is there an SD measure that captures only the volatility of negative returns? Cheers, Adrian
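
To make the question concrete, here is a minimal Python sketch that reproduces the two SD figures quoted above and compares them with a downside deviation. The `downside_deviation` helper and the target of 1000 are assumptions chosen for illustration; the post does not name a threshold, and other choices (zero, a risk-free rate, the mean) are just as plausible.

```python
import math
import statistics


def downside_deviation(returns, target):
    """Root-mean-square shortfall below `target`.

    Values at or above the target contribute zero, and the sum is divided
    by the total number of observations.
    """
    shortfalls = [min(r - target, 0.0) for r in returns]
    return math.sqrt(sum(s * s for s in shortfalls) / len(returns))


steady = [1000, 1000, 1100, 990, 1200]
one_big_win = [1000, 1000, 1100, 990, 8500]
TARGET = 1000  # assumed threshold, not taken from the post

for name, data in (("steady", steady), ("one big win", one_big_win)):
    sd = statistics.stdev(data)  # sample SD (n - 1 in the denominator)
    dd = downside_deviation(data, TARGET)
    print(f"{name}: SD = {sd:.0f}, downside deviation = {dd:.2f}")
```

With this target, only the 990 trade falls short, so the downside deviation is the same for both sets even though the SD differs by a factor of more than thirty.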

The problem with measures like the Sortino Ratio, which use only downside standard deviation, is that when you throw away the upside returns, you reduce your sample size and therefore your confidence that the calculation actually reflects the underlying process. For example, if your strategy has roughly 50% winners, you're now using only 50% of your data to compute your volatility measure. Or worse, if your strategy has 70% winners, you are throwing away the majority of your available data points. This doesn't necessarily mean that the Sortino Ratio is "bad" and shouldn't be used. It just means that you need to keep this in mind and be aware of how your win percentage affects the sample size.

False; upside returns are not thrown away in either case. You have an equal number of samples for both Sharpe and Sortino; the number of observations does not change. The major purpose of Sortino is to avoid penalizing upside variance; it need only filter the distribution via some threshold, not reduce the sample size.

You may use an equal number of samples for computing both the Sharpe and Sortino ratios, but you don't use an equal number of samples for computing the denominator of both ratios, and that is extremely important. When you multiply or divide one statistic that is determined over a large number of samples by another statistic that is determined over a smaller number of samples, the confidence of the combined statistic is going to be largely determined by the statistic with the smaller number of samples. The Sharpe and Sortino both use the same number of samples for computing the numerator, but the number of samples for the denominator is almost always going to be smaller for the Sortino Ratio. Here's a fairly extreme example that should make the point obvious. Say you have a system that has 30 backtested trades, with 27 winners and 3 losers. When you calculate standard deviation for the Sortino Ratio, you are calculating a standard deviation with only 3 samples. If you calculate Sharpe, you are calculating a standard deviation using 30 samples. It should be intuitively obvious that the Sharpe ratio is more likely to be close to the true Sharpe ratio of the distribution than the Sortino Ratio is to the true Sortino ratio of the distribution. For this system, with only 1 loser in every 10 trades, you would need almost 300 trades to collect about 30 losers and so compute a Sortino Ratio as "good" as the Sharpe Ratio was with only 30 samples.
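
The 3-versus-30 point can be checked with a small simulation. This is only a sketch under stated assumptions: the draws come from a standard normal distribution (real trade returns need not), and `sd_estimate_spread`, the trial count, and the seed are all made up for the illustration. It measures how much the sample SD itself varies from one draw to the next at each sample size.

```python
import random
import statistics


def sd_estimate_spread(sample_size, trials=2000, seed=0):
    """Relative spread (SD divided by mean) of the sample SD across repeated draws.

    Each trial draws `sample_size` values from a standard normal (an assumed
    distribution, for illustration only) and records their sample SD.
    """
    rng = random.Random(seed)
    estimates = [
        statistics.stdev([rng.gauss(0.0, 1.0) for _ in range(sample_size)])
        for _ in range(trials)
    ]
    return statistics.stdev(estimates) / statistics.mean(estimates)


spread_3 = sd_estimate_spread(3)
spread_30 = sd_estimate_spread(30)
print(f"relative spread of SD estimate: n=3 -> {spread_3:.2f}, n=30 -> {spread_30:.2f}")
```

Under these assumptions the SD estimated from 3 values scatters several times more widely around its average than the SD estimated from 30 values, which is the gap in confidence being described.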

Your confidence in either case depends on the underlying returns distribution, which is the same for both. If you had 999,000 positive returns and one negative return, your confidence in the Sortino metric would be just as valid as your confidence in the Sharpe metric (which would hopefully be suspect in either case). You are misunderstanding statistics as it is applied here (there is no change in 'any' quantity of samples), but I won't argue the point any further.

You can't know what the underlying return distribution is going to be. The whole point of computing things like the Sharpe or Sortino is to try to estimate its properties from the returns that have occurred so far. Your confidence in this estimate depends in part on how many samples you use to make it.

You do realize that my mention of the underlying return distribution implicitly refers to the sample, not the population distribution. You cannot know the population distribution; that is correct, no argument there. However, the number of observations used in both sample distributions is known and identical (even though the positive volatility attributes of the Sortino sample distribution are filtered or forced to zero). If you truly want to argue that the Sortino and Sharpe ratios have different levels of reliability, you need to determine and compare the statistical properties of samples of 'those' data sets themselves. Peace, dt
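
Part of the disagreement in this thread seems to come down to which denominator is meant. The sketch below computes the downside deviation both ways for a 27-winner, 3-loser system like the one described earlier. The trade values and both function names are hypothetical, and a target of zero is assumed.

```python
import math


def downside_deviation_full(returns, target=0.0):
    """Returns above the target are forced to zero; divide by all N observations."""
    return math.sqrt(sum(min(r - target, 0.0) ** 2 for r in returns) / len(returns))


def downside_deviation_losers_only(returns, target=0.0):
    """Same squared shortfalls, divided by the count of returns below the target.

    Raises ZeroDivisionError if no return is below the target.
    """
    shortfalls = [r - target for r in returns if r < target]
    return math.sqrt(sum(s * s for s in shortfalls) / len(shortfalls))


# Hypothetical trade results: 27 winners and 3 losers.
returns = [100.0] * 27 + [-50.0, -80.0, -120.0]

nonzero_terms = sum(1 for r in returns if r < 0)
full = downside_deviation_full(returns)
losers_only = downside_deviation_losers_only(returns)
print(f"observations: {len(returns)}, nonzero shortfall terms: {nonzero_terms}")
print(f"full-sample: {full:.2f}, losers-only: {losers_only:.2f}")
```

Both versions sum the same three squared shortfalls. The full-sample form keeps all 30 observations in the denominator, as described in the post above, yet only 3 of its 30 terms are nonzero, which is the concern raised about how much information the estimate actually rests on.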

I'm not saying exactly that Sharpe and Sortino have different levels of reliability. I'm saying that for the same number of samples, you should generally have less confidence that the population Sortino looks like the sample Sortino than you do that the population Sharpe looks like the sample Sharpe. Hey, if you are going to argue that a standard deviation computed with 3 samples is just as good as a standard deviation computed over 30... then I don't really know what to say...