What is the correct way to analyse this?

abattia · Dec 30, 2010

Thank you to the "ET brain trust" members still with me on this one ...

To re-cap:

I am investigating a systematic strategy (stock swing trades, daily bars).

When backtested against individual NASDAQ 100 stocks, the strategy sometimes trades only a low (i.e. single digit) number of times each year, not enough for any performance analysis to be statistically significant.

However, backtested against the current NASDAQ 100 stocks as a group, the strategy trades on average 500+ times in a year (over the last 4 years).

I am trying to determine whether I have a statistically significant number of backtest trades to analyze the strategy vis-à-vis trading against all NASDAQ 100 stocks as a group (can I avoid being "fooled by randomness"?).

Quote from black diamond:
...You would like to know if your signals are clustered in time or evenly distributed...your tests are not independent when treat all the individual trades as a big sample and ignore whether they happened at the same time or not...

Following black diamond's suggestion, I have been analysing how individual trades are distributed in time.

METHOD
Over 4 years of backtest data, I analysed whether multiple trades of different NASDAQ 100 stocks were occurring together.

I assumed any two trades were not statistically independent (i.e. were "statistically dependent"?) if:
a) BOTH entered on day "A",
b) BOTH exited on day "B", and
c) BOTH were winners (or both were losers).

Otherwise I assumed they were statistically independent.

Then I averaged the number of "statistically dependent" trades that occurred each time the system traded.
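One possible reading of the method above, as a minimal Python sketch: trades that share an entry date, an exit date, and a win/loss outcome are grouped into one cluster, and the cluster sizes are then averaged. The trade records, field names (`entry`, `exit`, `pnl`), and symbols are hypothetical, and treating a break-even trade as a loser is an assumption made only for illustration.

```python
from collections import Counter
import statistics

def cluster_sizes(trades):
    """Count trades per (entry date, exit date, winner?) group.

    Trades in the same group are treated as "statistically dependent"
    under rules a) to c); every other pair is treated as independent.
    """
    groups = Counter((t["entry"], t["exit"], t["pnl"] > 0) for t in trades)
    return list(groups.values())

# Hypothetical trades, for illustration only.
trades = [
    {"symbol": "AAA", "entry": "2010-03-01", "exit": "2010-03-04", "pnl": 120.0},
    {"symbol": "BBB", "entry": "2010-03-01", "exit": "2010-03-04", "pnl": 45.0},
    {"symbol": "CCC", "entry": "2010-03-01", "exit": "2010-03-04", "pnl": -30.0},
    {"symbol": "DDD", "entry": "2010-03-02", "exit": "2010-03-05", "pnl": 10.0},
]

sizes = cluster_sizes(trades)
avg_cluster_size = statistics.mean(sizes)
print(sorted(sizes), avg_cluster_size)
```

Here AAA and BBB fall into one cluster, while CCC (same dates, opposite outcome) and DDD (different dates) each form their own.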

RESULTS
The results are below (and I have no idea why they are so much further down the page LOL!).

CONCLUSION
From the results, is it fair to conclude that a good guesstimate of the number of statistically independent "tests" of the strategy will be the total number of unique trades divided by 3 (i.e. approx 170+/yr)?

<table border="1">
<tr>
<th width=100 >Year</th>
<th width=150 >Total # of
unique trades</th>
<th width=180 >Avg # of
"statistically dependent" trades occurring each time strategy trades</th>
<th width=180>Std Dev of # of "statistically dependent" trades occurring each time strategy trades</th>
</tr>
<tr><td align="center">2010</td>
<td>521</td>
<td>2.8</td>
<td>3.7</td></tr>
<tr><td>2009</td>
<td>435</td>
<td>2.5</td>
<td>3.6</td></tr>
<tr><td>2008</td>
<td>666</td>
<td>2.9</td>
<td>3.6</td></tr>
<tr><td>2007</td>
<td>577</td>
<td>2.4</td>
<td>2.3</td></tr>
</table>
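
A rough sketch of the arithmetic behind that guesstimate, using only the figures in the table. It assumes the effective number of independent "tests" is simply total trades divided by the average cluster size, which is a simplification: it ignores trades whose holding periods overlap without matching exactly.

```python
# (year, total unique trades, avg # of "statistically dependent" trades)
# Figures copied from the results table above.
results = [
    (2010, 521, 2.8),
    (2009, 435, 2.5),
    (2008, 666, 2.9),
    (2007, 577, 2.4),
]

# Assumption: effective independent tests = trades / average cluster size.
effective = {year: trades / avg for year, trades, avg in results}

# The flat "divide by 3" rule of thumb, for comparison.
divide_by_three = {year: trades / 3 for year, trades, _ in results}

for year, trades, avg in results:
    print(f"{year}: {trades} trades, "
          f"{effective[year]:.0f} by cluster size, "
          f"{divide_by_three[year]:.0f} by dividing by 3")
```

Because every yearly average is below 3, dividing by 3 gives the more conservative count in each year.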

DeeDeeTwo · Jan 1, 2011

I'm sure any semi-competent 19 yo math type with a PC...
Could use what used to be called "data mining"...
And today is called "data dredging"...
To come up with endless similar results in 100 hours.

The problem is Detachment From Reality...
The market is not a bunch of data points...
It's 10,000 or 50,000 experts that impose very high costs on you...
That take no fucking prisoners...
And chisel you for $0.01... over and over up and down the Food Chain...
Something like this might have been marginally profitable in the 90s before decimalization.

The Correct Way To Analyze This

(1) Very specifically, why does this approach work?

(2) Why does this TAKE MONEY AWAY from the Top 10,000 traders?

(3) What is your Competitive Advantage...
That allows you to overcome transaction costs + spread + cheating...
And take money away from Market Makers, Fund Managers, and Insiders?

Unless you can answer these questions SPECIFICALLY... you have nothing.

Eventually you will have to give up on this...
And develop some actual expertise...
And an actual Competitive Advantage.

abattia · Jan 1, 2011

Quote from DeeDeeTwo:
... Unless you can answer these questions SPECIFICALLY... you have nothing.

Eventually you will have to give up on this...
And develop some actual expertise...
And an actual Competitive Advantage.

Many thanks!

Don't mean to seem as though I am ignoring you; I'm not!

If you'd like to start a separate thread with the title of "Backtesting is Pointless", or whatever most closely represents the opinions you express above, I'll be happy to contribute. It'd be an interesting thread, I'm sure ...

My own opinion is that it's not pointless, and I'm trying to solicit help here from others with a similar view ...

goodgoing · Jan 2, 2011

Quote from Stoxtrader:

Selection bias is the same as censored/truncated data. Search for Heckman algorithm, Heckman solution, Heckman selection model, Heckman two-step procedure, Heckman correction. One implementation is in the R package sampleSelection.

http://cran.r-project.org/web/packages/sampleSelection/vignettes/selection.pdf

Censoring involves partially known data and is beyond the control of the researcher.

Selection bias is related to the method of collecting samples by the researcher.

I cannot see why you equate the two although I don't claim to be a statistics expert. Maybe someone with a formal graduate degree in the subject could help in resolving this.

Stoxtrader · Jan 2, 2011

Quote from goodgoing:

Censoring involves partially known data and is beyond the control of the researcher.

Selection bias is related to the method of collecting samples by the researcher.

I cannot see why you equate the two although I don't claim to be a statistics expert. Maybe someone with a formal graduate degree in the subject could help in resolving this.

I am not an expert either. It may be that the mathematical definitions of censored data and data with selection bias are different.

This book sample seems relevant: it comments on different biases, including selection bias, in dealing with stock market indexes (indices).

http://books.google.com/books?id=de...page&q="selection bias" "stock index"&f=false

black diamond · Jan 5, 2011

Quote from abattia:

Following black diamond's suggestion, I have been analysing how individual trades are distributed in time.

METHOD
Over 4 years of backtest data, I analysed whether multiple trades of different NASDAQ 100 stocks were occurring together.

I assumed any two trades were not statistically independent (i.e. were "statistically dependent"?) if:
a) BOTH entered on day "A",
b) BOTH exited on day "B", and
c) BOTH were winners (or both were losers).

Otherwise I assumed they were statistically independent.


Cool! But if you want to be formal about it, this does not get you all the way to independent observations. For example, if two trades are entered on day A, one exits on day B, and the other exits on day B+1, the overlap in holding periods makes them artificially correlated. The usual way to handle this is to create an actual portfolio and measure the results in calendar time, not across individual trades. So assuming you are working on daily signals, each day you would average the returns over all your trades, weighted by the position in each. Then calculate your stats on the daily return series. The other way I know to handle this is to work trade-by-trade and adjust your t-stats for clustering, which is more complicated and I think less useful. But I think you are probably fine with what you are doing; it gets you in the right direction.
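
A minimal sketch of the calendar-time approach described above, assuming each trade carries a position weight and a per-day return for every day it is held. The data layout (`weight`, `daily_returns`), the function names, and the sample numbers are hypothetical. Days with no open position are simply absent from the series here, which is one possible convention and not something the post specifies.

```python
import math
import statistics
from collections import defaultdict

def daily_portfolio_returns(trades):
    """Collapse per-trade daily returns into one position-weighted
    average return per calendar day."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for trade in trades:
        for day, ret in trade["daily_returns"].items():
            weighted_sum[day] += trade["weight"] * ret
            weight_total[day] += trade["weight"]
    return {day: weighted_sum[day] / weight_total[day]
            for day in sorted(weighted_sum)}

def t_statistic(returns):
    """t-stat of the mean daily return against zero."""
    n = len(returns)
    return statistics.mean(returns) / (statistics.stdev(returns) / math.sqrt(n))

# Hypothetical trades: two overlap on the first two days, and the second
# overlaps with a third, larger position on the last day.
trades = [
    {"weight": 1.0, "daily_returns": {"2010-01-04": 0.010, "2010-01-05": -0.004}},
    {"weight": 1.0, "daily_returns": {"2010-01-04": 0.006, "2010-01-05": 0.002,
                                      "2010-01-06": 0.003}},
    {"weight": 2.0, "daily_returns": {"2010-01-06": -0.003}},
]

series = daily_portfolio_returns(trades)
t_stat = t_statistic(list(series.values()))
print(series)
print(t_stat)
```

The three trades collapse into three daily observations, so overlapping positions contribute once per calendar day and are not counted as separate samples.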

black diamond · Jan 5, 2011

Quote from abattia:

But if I decide to trade just the 76 "good" stocks (rather than the full 100) aren't I just "curve fitting"?

Or does the fact that I would be taking out whole sectors (*) (rather than individual companies) redeem me - at least to some extent - from this cardinal sin?


This is a tougher question. Some purists would consider most of this stuff data mining. I think you can learn all the stats rules you want, but in the end you need to combine test results with your beliefs about how the world works to analyze a system. Personally I am not crazy about the idea of picking individual stocks based on backtests alone without understanding why some stocks work and others don't, but I know someone who does this in a momentum system with good results. But I am comfortable with the idea that a technical system might work on illiquid stocks but not liquid stocks, or a system using accounting ratios would work differently in different industries, so I would run tests like these without worrying about data mining.

ronblack · Jan 5, 2011

Quote from black diamond:

This is a tougher question. Some purists would consider most of this stuff data mining. ... so I would run tests like these without worrying about data mining.

There is nothing wrong with data mining. It is used every day by thousands of firms around the world to drastically improve their operations. The issues found in the literature are philosophical rather than real.

abattia · Jan 5, 2011

Thank you, Black Diamond

Quote from black diamond:
...The usual way to handle this is to create an actual portfolio and measure the results in calendar time, not across individual trades. So assuming you are working on daily signals, each day you would average the returns over all your trades, weighted by the position in each. Then calculate your stats on the daily return series...

So is that to say that each day when trades occur becomes an "independent observation" in the statistical sense?

... if the strategy trades 200 times in a year across all instruments, but these trades are grouped into 50 days, then the number of independent observations that year is 50?

_ _ _ _ _ _ _ _ _ _ _
EDIT: actually, I have just re-read your response and noted you refer not just to entries and exits but also to days in which a position is held over the day. So restating ...

... if the strategy trades 200 times in a year across all instruments, but these entries and exits are grouped into 50 days, and there are also 10 days in which neither entries nor exits occur but open positions are held over, then the number of independent observations that year is 60?