Arguments in favour of the use of Synthetic Data in Backtesting

Discussion in 'Strategy Building' started by AFJ Garner, Apr 30, 2013.

  1. There is precious little agreement in academia as to almost any aspect of markets. There is no need to list or re-iterate the many views out there.

    Much has been made of black swan events and their apparently increasing frequency in modern markets. Much has been made of the apparent change in markets over the decades, which makes it increasingly dangerous to rely on actual past market data beyond about the year 2000. We cannot predict the future, we cannot rely on the past. What should we do, therefore, to test the robustness or otherwise of our chosen algorithmic trading system?

    Many have noted that when presented with well constructed synthetic data series, few are able to tell the difference between these and real price series. Trends surface in synthetic data; even the crudest simulated data will show you this. Simulated data can ape changing volatility; changing correlations between simulated data streams will presumably manifest themselves without the need for design. Synthetic data can manifest long periods of mean reversion, price shocks and any other aspect of market reality.
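As a rough illustration of the point about crude simulated data, here is a minimal sketch of a driftless random walk whose volatility itself wanders. Everything in it (the function name `synthetic_prices`, the parameter names and their default values) is an assumption made for illustration, not anybody's actual generator.

```python
import math
import random


def synthetic_prices(n_days, start_price=100.0, base_vol=0.01,
                     vol_of_vol=0.05, vol_reversion=0.02, seed=None):
    """Return n_days + 1 prices from a driftless random walk in log price
    whose daily volatility changes over time (all defaults are illustrative)."""
    rng = random.Random(seed)
    target = math.log(base_vol)
    log_vol = target
    prices = [start_price]
    for _ in range(n_days):
        # Log-volatility takes a small random step and is pulled gently
        # back towards its base level, so it wanders without running away.
        log_vol += vol_reversion * (target - log_vol) + rng.gauss(0.0, vol_of_vol)
        daily_log_return = rng.gauss(0.0, math.exp(log_vol))
        prices.append(prices[-1] * math.exp(daily_log_return))
    return prices


# Roughly ten years of daily prices; the seed makes the run repeatable.
series = synthetic_prices(2500, seed=42)
```

Plotting a few of these with different seeds should show apparent trends and quiet and noisy spells, even though no trend was designed in.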

    So ask yourselves this question: if the future is unknowable and unpredictable, may there not be some value in pitting your favoured approach to markets against totally unseen and randomly generated data, which is more likely to represent the limitless possibilities which lie ahead of us than the (by definition) limited market conditions we have already experienced?

    And consider this: if your system is unable to profit from synthetic data, will it be any more capable of coping with what real markets throw at it tomorrow and ever after?

    It is just possible that the use of synthetic data in back testing might reveal some interesting “truths” which some of us may be unwilling to acknowledge.

    Anthony FJ Garner
     
  2. Are the markets random? ----[a]
    Or are they not random (but with a very low signal-to-noise ratio)? ----[b]

    IMHO, the answers to your questions will depend on whether the respondent subscribes to [a], or to [b], or even to something else!

    I'm a [b]. So running a strategy in pure white noise does not tell me how the strategy will do in a market where the edge is present.

    But I guess it does give me an idea of what can happen during a sustained period of white noise price action ... perhaps it's a test that can be factored into identifying and refining appropriate position size, etc? So, it could be useful in the way that a Monte Carlo can be useful, i.e. to remind you that "deterministic" is not an adjective that applies to very much in the field of [systematic] trading, and that you need to be correctly prepared for a bad run ...
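To make the white noise idea concrete, here is a minimal Monte Carlo sketch: a toy momentum rule is run many times on pure Gaussian noise, and the spread of maximum drawdowns is recorded for a given position size. The rule, the names `momentum_on_noise` and `max_drawdown`, and every number used are assumptions for illustration only.

```python
import random


def max_drawdown(equity):
    """Largest peak-to-trough fall of an equity curve, as a fraction of the peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def momentum_on_noise(n_days, lookback, risk_per_day, rng, daily_vol=0.01):
    """Trade a toy momentum rule on white-noise returns; return the equity curve."""
    returns = [rng.gauss(0.0, daily_vol) for _ in range(n_days)]
    equity = [1.0]
    for t in range(lookback, n_days):
        # Long if the trailing sum of returns is positive, otherwise short.
        signal = 1.0 if sum(returns[t - lookback:t]) > 0 else -1.0
        # Sized so a one-standard-deviation day moves equity by risk_per_day.
        pnl = signal * (returns[t] / daily_vol) * risk_per_day
        equity.append(equity[-1] * (1.0 + pnl))
    return equity


rng = random.Random(7)
drawdowns = sorted(
    max_drawdown(momentum_on_noise(1000, 20, 0.005, rng)) for _ in range(200)
)
median_dd = drawdowns[len(drawdowns) // 2]
worst_5pct_dd = drawdowns[int(0.95 * len(drawdowns))]
```

There is no edge here by construction, so the numbers say nothing about profitability. They only give a feel for how deep a bad run can get at a given `risk_per_day`, which is the position sizing use suggested above.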
     
  3. dom993

    What worsens matters is that markets, while not random, are (of course) only probabilistic (and I certainly agree w/ the low signal/noise ratio, which further aggravates things, both by "hiding" true signals in the noise and by "making up" false signals from noise).
     


  4. Excellent post.
     
  5. Having seen this question too many times, I will sum this up in no uncertain terms:

    Using Synthetic Data in Backtesting will never produce a profitable system. Period. We don't need to debate that, but for the rest of this thread, we can discuss whether it adds validity to our assumptions, and that is synthetic data's only purpose.

    To your point, there is no argument to support the idea that synthetic data produces profitable trading systems through backtesting. Whatever argument there might be has no reasonable basis, because there is no case where anybody has done this. Designing a system on synthetic data alone and having it magically turn out to be profitable is a myth, so don't pretend there is an argument for using synthetic data in backtesting and ending up with profitable expectancy in whichever market you're simulating. Synthetic data is for checking your system's execution algos and making sure everything's working. It should not be used to create trading systems or to evaluate the fitness of the real chart's parameters.

    I guess I'm not in favour of using synthetic data in backtesting because I haven't seen a convincing argument for its use in my trading. Again, it checks your algos' execution and possibly simulates some targets, but using that data as a way to model another market is where I'd disagree.
     
  6. TD80

    Cheers Anthony, I miss our old TB Forum days. I see you have started a public forum with Andreas, so I will skulk around there as well :cool:

    I suppose, in considering your provocative, pot-stirring question, I would say it really boils down to what the rationalization is for your anticipated alpha / non-random return.

    Certainly we can debate market efficiency on an academic level, and of course many a U. of Chicago acolyte will tell you it is all random / efficient. I believe these individuals have a rather too-kind view of humanity and specifically crowd behavior.

    My personal religion on this topic can be boiled down to: (reasonable, calculated) Risk takers in aggregate must be compensated over the long run, otherwise the entire capitalistic system fails. Let's hope the system doesn't fail in the near future :D
     
  7. I have continued to work on the question of changing volatility in the futures markets over time, for the purposes of attempting to build realistic synthetic data series.

    It is my intention to mimic each class or sector separately and to have many different series for each “asset class” which try to incorporate typical volatilities for the underlying instruments.

    I will program series for stock market indices, grains, metals (precious and industrial) and so on in an effort to introduce as much realism as possible.

    In that regard I am taking a closer look at the entire history of each of the instruments in my portfolio. I set out on my website a few charts for Comex Silver.

    What has happened over time? Has the frequency in any particular “bin” of daily return ranges changed over time? Has there been a trend up or down? If so, will this trend continue or will it revert to some sort of mean? Fruitless to speculate perhaps but interesting to see nonetheless what has occurred since volume built up by around 1967 in the newly created Comex Silver contract.
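For anyone wanting to repeat the "bin" exercise on their own data, the count could be sketched roughly as follows. The function name `return_bin_frequencies`, the bin edges and the toy input are all made up for illustration; they are not the Comex Silver figures.

```python
from collections import defaultdict


def return_bin_frequencies(dated_returns, bin_edges, years_per_period=10):
    """For each period, return the fraction of daily returns whose absolute
    size falls in each bin. bin_edges are ascending lower bounds of the second
    and later bins; the last bin is open-ended."""
    counts = defaultdict(lambda: [0] * (len(bin_edges) + 1))
    for year, ret in dated_returns:
        period = (year // years_per_period) * years_per_period
        size = abs(ret)
        # The bin index is the number of edges the move is at least as big as.
        index = sum(1 for edge in bin_edges if size >= edge)
        counts[period][index] += 1
    return {
        period: [c / sum(row) for c in row]
        for period, row in sorted(counts.items())
    }


# Toy input of (year, daily log return) pairs, NOT real market data.
toy = [(1979, 0.005), (1979, -0.03), (1985, 0.015), (1985, 0.004)]
freq = return_bin_frequencies(toy, [0.01, 0.02])
# freq == {1970: [0.5, 0.0, 0.5], 1980: [0.5, 0.5, 0.0]}
```

Comparing the rows from one decade to the next shows whether large daily moves have become more or less frequent.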

    Again, I used a CSI Perpetual contract and the natural log of the daily price change. If you look at a price chart, perhaps the most notable features are the huge spikes in 1979 (the Hunt cornering episode, I think?) and again the gigantic run-up in 2011.

    In terms of short term measures of volatility (3, 7 and 14 day rolling annualised standard deviation of daily returns), these two periods alone do not really stand out: there are many peaks in very short term measures of volatility, and these peaks seem to have been increasing in magnitude in recent years. The longer term picture is less clear: the 100 day rolling measure seems to have trended up over time, but the same cannot really be said for the 500 day. And in all cases the period from 1989 to 2002 shows a lowering of volatility over all time frames.
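A minimal sketch of the rolling measure described above, using only the standard library. The function names are illustrative, and the 252 trading days per year used for annualising is an assumption; the post does not say which convention was used.

```python
import math
import statistics


def log_returns(prices):
    """Daily log returns from a list of prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def rolling_annualised_vol(returns, window, trading_days=252):
    """Sample standard deviation of each trailing `window` of returns,
    scaled by sqrt(trading_days). Requires window >= 2."""
    scale = math.sqrt(trading_days)
    return [
        statistics.stdev(returns[i - window:i]) * scale
        for i in range(window, len(returns) + 1)
    ]


# Toy prices, NOT real market data.
prices = [100.0, 101.0, 100.0, 102.0, 101.0, 103.0]
rets = log_returns(prices)
vol_3d = rolling_annualised_vol(rets, 3)
```

Calling the same function with windows of 3, 7, 14, 100 and 500 on one return series gives the family of curves being compared.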

    What of the future? Who knows? It may be safest to assume that volatility over the very long term is mean reverting. No doubt greater volatility has followed increased volumes, but what of volume? If silver is less traded for any reason in the future we might reasonably expect volume and volatility to drop off.

    Where does this leave us? Back where we started: we must probably rely on a “close to random walk” in terms of volume, price and volatility if we want to cover all possibilities. But there again we should probably combine such series with different assumptions in other series for the same class: assumptions that volatility will trend upwards perhaps, as will volume.

    And what of price? Certainly in terms of stock indices it may be reasonable to include both scenarios: price series where a continuation of the Enlightenment drives economies, and hence stock markets, upwards; and price series which include no inherent return or “drift”, where stock indices remain stagnant and move sideways, perhaps over many, many years.
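The competing assumptions (drift or no drift, stable or rising volatility) could be expressed as parameters of one generator, roughly as sketched here. The function `scenario_prices`, its parameters and all the numbers are illustrative assumptions, not calibrated to any instrument.

```python
import math
import random


def scenario_prices(n_days, annual_drift=0.0, annual_vol=0.15,
                    annual_vol_growth=0.0, start_price=100.0,
                    trading_days=252, seed=None):
    """Random walk in log price with an optional drift and an optional
    steady growth (or decay) in volatility."""
    rng = random.Random(seed)
    prices = [start_price]
    daily_drift = annual_drift / trading_days  # drift applied to log price
    for day in range(n_days):
        years = day / trading_days
        # Volatility compounds at annual_vol_growth per year.
        vol_now = annual_vol * (1.0 + annual_vol_growth) ** years
        daily_vol = vol_now / math.sqrt(trading_days)
        prices.append(prices[-1] * math.exp(daily_drift + rng.gauss(0.0, daily_vol)))
    return prices


scenarios = {
    "drift, stable vol": scenario_prices(5000, annual_drift=0.05, seed=1),
    "no drift, stable vol": scenario_prices(5000, seed=2),
    "no drift, rising vol": scenario_prices(5000, annual_vol_growth=0.03, seed=3),
}
```

Running the same system over every entry in `scenarios`, with many seeds each, is one way to test it against several imagined futures rather than a single one.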

    The same with commodities. Oil and gas may be scarce resources but who says their price has to move up forever? Perhaps man will cease to rely on oil as nuclear fusion becomes a reality.

    My object in investigating synthetic data is to test trend following on as wide a range of possible future conditions as I can imagine.