What is the correct way to analyse this?

Discussion in 'Strategy Building' started by abattia, Dec 16, 2010.

  1. Thank you to the "ET brain trust" members still with me on this one ...

    To re-cap:
    • I am investigating a systematic strategy (stock swing trades, daily bars).
    • When backtested against individual NASDAQ 100 stocks, the strategy sometimes trades only a low (i.e. single digit) number of times each year, not enough for any performance analysis to be statistically significant.
    • However, backtested against the current NASDAQ 100 stocks as a group, the strategy trades on average 500+ times in a year (over the last 4 years).
    • I am trying to determine whether I have enough backtest trades for a statistically significant analysis of the strategy vis-à-vis trading against all NASDAQ 100 stocks as a group (can I avoid being “fooled by randomness”?).
    Following black diamond’s suggestion, I have been analysing how individual trades are distributed in time.

    METHOD
    Over 4 years of backtest data, I analysed whether multiple trades of different NASDAQ 100 stocks were occurring together.

    I assumed any two trades were not statistically independent (i.e. were “statistically dependent”?) if:
    a) BOTH entered on day “A”,
    b) BOTH exited on day “B”, and
    c) BOTH were winners (or both were losers).

    Otherwise I assumed they were statistically independent.

    Then I averaged the number of “statistically dependent” trades that occurred each time the system traded.
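
    A minimal sketch of the grouping step described above. It assumes each trade is a hypothetical (symbol, entry_day, exit_day, pnl) tuple, and that "averaging" means averaging the size of each group of trades that share entry day, exit day and win/loss outcome. The sample data and the name dependent_cluster_sizes are illustrative only, not taken from the backtest:

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical trade records: (symbol, entry_day, exit_day, pnl)
trades = [
    ("AAPL", "2010-03-01", "2010-03-04", 1.2),
    ("MSFT", "2010-03-01", "2010-03-04", 0.4),
    ("CSCO", "2010-03-01", "2010-03-04", -0.7),
    ("INTC", "2010-03-02", "2010-03-05", 0.9),
    ("ORCL", "2010-03-08", "2010-03-10", -0.3),
    ("QCOM", "2010-03-08", "2010-03-10", -1.1),
]


def dependent_cluster_sizes(trades):
    """Group trades by (entry day, exit day, winner?) and return the group sizes.

    Assumption: a trade with pnl of exactly zero is counted as a loser.
    """
    groups = Counter((entry, exit_, pnl > 0) for _, entry, exit_, pnl in trades)
    return sorted(groups.values(), reverse=True)


sizes = dependent_cluster_sizes(trades)
avg_size = mean(sizes)    # average number of "dependent" trades per group
std_size = pstdev(sizes)  # population standard deviation of the group sizes

print(sizes, avg_size, std_size)
```

    Note that CSCO shares its entry and exit days with AAPL and MSFT but lands in a separate group because it lost while they won, exactly as rule (c) requires.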

    RESULTS
    The results are below (and I have no idea why they are so much further down the page LOL!).

    CONCLUSION
    From the results, is it fair to conclude that a good guesstimate of the number of statistically independent "tests" of the strategy will be the total number of unique trades divided by 3 (i.e. approx 170+/yr)?

    <table border="1">
    <tr>
    <th width=100 >Year</th>
    <th width=150 >Total # of
    unique trades</th>
    <th width=180 >Avg # of
    "statistically dependent" trades occurring each time strategy trades</th>
    <th width=180>Std Dev of # of "statistically dependent" trades occurring each time strategy trades</th>
    </tr>
    <tr><td align=center>2010</td>
    <td>521</td>
    <td>2.8</td>
    <td>3.7</td></tr>
    <tr><td>2009</td>
    <td>435</td>
    <td>2.5</td>
    <td>3.6</td></tr>
    <tr><td>2008</td>
    <td>666</td>
    <td>2.9</td>
    <td>3.6</td></tr>
    <tr><td>2007</td>
    <td>577</td>
    <td>2.4</td>
    <td>2.3</td></tr>
    </table>
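
    As a rough check on the divide-by-3 figure in the CONCLUSION, the sketch below divides each year's trade count from the table by that year's average group size, and compares it with a flat division by 3. It assumes that dividing by the average is an acceptable heuristic for the number of independent "tests"; it is not a formal effective-sample-size calculation, and the function name effective_trades is made up for illustration:

```python
# Figures copied from the table above: year -> (total unique trades, avg group size)
table = {
    2010: (521, 2.8),
    2009: (435, 2.5),
    2008: (666, 2.9),
    2007: (577, 2.4),
}


def effective_trades(total, avg_group_size):
    """Heuristic count of independent 'tests': total trades / average group size."""
    return total / avg_group_size


per_year = {year: effective_trades(*row) for year, row in table.items()}
flat_divide_by_3 = {year: total / 3 for year, (total, _) in table.items()}

for year in sorted(per_year):
    print(year, round(per_year[year]), round(flat_divide_by_3[year]))
```

    Given the large standard deviations in the table, either number is best read as an order-of-magnitude estimate rather than a precise count.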
     
    #21     Dec 30, 2010
  2. I'm sure any semi-competent 19 yo math type with a PC...
    Could use what used to be called "data mining"...
    And today is called "data dredging"...
    To come up with endless similar results in 100 hours.

    The problem is Detachment From Reality...
    The market is not a bunch of data points...
    It's 10,000 or 50,000 experts that impose very high costs on you...
    That take no fucking prisoners...
    And chisel you for $0.01... over and over up and down the Food Chain...
    Something like this might have been marginally profitable in 90s before decimalization.

    The Correct Way To Analyze This

    (1) Very specifically, why does this approach work?

    (2) Why does this TAKE MONEY AWAY from the Top 10,000 traders?

    (3) What is your Competitive Advantage...
    That allows you to overcome transaction costs + spread + cheating...
    And take money away from Market Makers, Fund Managers, and Insiders?

    Unless you can answer these questions SPECIFICALLY... you have nothing.

    Eventually you will have to give up on this...
    And develop some actual expertise...
    And an actual Competitive Advantage.
     
    #22     Jan 1, 2011
  3. Many thanks!

    Don't mean to seem as though I am ignoring you; I'm not!

    If you’d like to start a separate thread with the title of "Backtesting is Pointless", or whatever most closely represents the opinions you express above, I'll be happy to contribute. It’d be an interesting thread, I’m sure ...

    My own opinion is that it’s not pointless, and I’m trying to solicit help here from others with a similar view ...
     
    #23     Jan 1, 2011
  4. Censoring involves partially known data and is beyond the control of the researcher.

    Selection bias is related to the method of collecting samples by the researcher.

    I cannot see why you equate the two although I don't claim to be a statistics expert. Maybe someone with a formal graduate degree in the subject could help in resolving this.
     
    #24     Jan 2, 2011

  5. I am not an expert either. It may be that the mathematical definitions of censored data and data with selection bias are different.

    This book sample seems relevant: it comments on different biases, including selection bias, in dealing with stock market indexes (indices).

    http://books.google.com/books?id=de...page&q="selection bias" "stock index"&f=false
     
    #25     Jan 2, 2011


  6. Cool! But if you want to be formal about it, this does not get you all the way to independent observations. For example, if two trades are entered on day A, one exits on day B, and the other exits on day B+1, the overlap in holding periods makes them artificially correlated. The usual way to handle this is to create an actual portfolio and measure the results in calendar time, not across individual trades. So assuming you are working on daily signals, each day you would average the returns over all your trades, weighted by the position in each. Then calculate your stats on the daily return series. The other way I know to handle this is to work trade-by-trade and adjust your t-stats for clustering, which is more complicated and I think less useful. But I think you are probably fine with what you are doing; it gets you in the right direction.
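
    The calendar-time approach can be sketched roughly as follows. This assumes hypothetical per-position daily records of the form (day, symbol, position_weight, daily_return), long-only positive weights, and a plain one-sample t-statistic on the daily series, which itself assumes the daily portfolio returns are roughly independent. The data and function names are made up for illustration:

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Hypothetical records: (day, symbol, position_weight, daily_return)
records = [
    ("2010-03-01", "AAPL", 1.0, 0.010),
    ("2010-03-01", "MSFT", 1.0, -0.004),
    ("2010-03-02", "AAPL", 1.0, 0.002),
    ("2010-03-02", "MSFT", 2.0, 0.008),
    ("2010-03-03", "MSFT", 1.0, -0.003),
    ("2010-03-04", "CSCO", 1.0, 0.005),
]


def daily_portfolio_returns(records):
    """Collapse all open positions into one position-weighted return per day."""
    sums = defaultdict(lambda: [0.0, 0.0])  # day -> [sum(w * r), sum(w)]
    for day, _symbol, weight, ret in records:
        sums[day][0] += weight * ret
        sums[day][1] += weight
    return {day: wr / w for day, (wr, w) in sorted(sums.items())}


def t_stat(returns):
    """One-sample t-statistic of the mean return against zero."""
    n = len(returns)
    return mean(returns) / (stdev(returns) / sqrt(n))


series = daily_portfolio_returns(records)
t = t_stat(list(series.values()))
print(series, t)
```

    The point of the design is that overlapping trades get merged into a single observation per day, so the sample size is the number of days in the series rather than the number of trades.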
     
    #26     Jan 5, 2011
  7. This is a tougher question. Some purists would consider most of this stuff data mining. I think you can learn all the stats rules you want, but in the end you need to combine test results with your beliefs about how the world works to analyze a system. Personally I am not crazy about the idea of picking individual stocks based on backtests alone without understanding why some stocks work and others don't, but I know someone who does this in a momentum system with good results. But I am comfortable with the idea that a technical system might work on illiquid stocks but not liquid stocks, or a system using accounting ratios would work differently in different industries, so I would run tests like these without worrying about data mining.
     
    #27     Jan 5, 2011
  8. ronblack

    There is nothing wrong with data mining. It is used every day by thousands of firms around the world to drastically improve their operations. The issues found in the literature are philosophical rather than real.
     
    #28     Jan 5, 2011
  9. Thank you, Black Diamond
    So is that to say that each day when trades occur becomes an "independent observation" in the statistical sense?

    ... if the strategy trades 200 times in a year across all instruments, but these trades are grouped into 50 days, then the number of independent observations that year is 50?

    _ _ _ _ _ _ _ _ _ _ _
    EDIT: actually, I have just re-read your response and noted you refer not just to entries and exits but also to days in which a position is held over the day. So restating ...

    ... if the strategy trades 200 times in a year across all instruments, but these entries and exits are grouped into 50 days, and there are also 10 days in which neither entries nor exits occur but open positions are held over, then the number of independent observations that year is 60?
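
    For what it is worth, the day count in the restated question could be computed along these lines. The sketch assumes each trade is a hypothetical (entry_day, exit_day) pair with both ends inclusive, skips weekends, and ignores exchange holidays. It only counts days with at least one open position; whether each such day really is an independent observation is the open question:

```python
from datetime import date, timedelta

# Hypothetical trades: (entry_day, exit_day), both inclusive
trades = [
    (date(2010, 3, 1), date(2010, 3, 3)),
    (date(2010, 3, 1), date(2010, 3, 2)),
    (date(2010, 3, 8), date(2010, 3, 8)),
]


def active_days(trades):
    """Return the set of weekdays on which at least one position is open."""
    days = set()
    for entry, exit_ in trades:
        day = entry
        while day <= exit_:
            if day.weekday() < 5:  # Monday-Friday only; holidays are ignored
                days.add(day)
            day += timedelta(days=1)
    return days


n_obs = len(active_days(trades))
print(n_obs)
```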
     
    #29     Jan 5, 2011