Book recommendations on Algo trading and backtesting

Discussion in 'Automated Trading' started by cafeole, Nov 1, 2020.

  1. MarkBrown

    MarkBrown

    fail - lol
     
    #31     Nov 26, 2020
  2. cafeole

    cafeole

    I am reading 1. right now, have 2 and 3 on the way. I really like it so far. I don't have AmiBroker so I will have to wait a little bit to try his methods.
     
    #32     Dec 2, 2020
    ValeryN and MarkBrown like this.
  3. On stationarity, firstly what works is to transform your prices in some way to make them stationary. I risk adjust everything, which makes everything a lot more stationary, and means you can fit things across regimes and pool data across instruments. It also means that non-Gaussian price distributions become much closer to Gaussian.

    Secondly, for stationarity in the sense of 'does this model work across different regimes / time periods, or do I need to keep (over)fitting it to more recent data?': clearly it depends on the strategy. For slower, lower-SR systems like the ones I use, the source of profitability seems to be fairly consistent and it's appropriate to use as much data as possible. Some people have used hundreds of years of data to test trend following. An HFT firm will use a few months of data, tops. Everyone else is somewhere in the middle.

    GAT
    AKA Robert Carver
     
    #33     Dec 4, 2020
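    A minimal sketch of the kind of risk adjustment described in #33, assuming daily closes held in a pandas Series; the 25-day EWMA window and the helper name risk_adjust are illustrative choices, not GAT's actual parameters:
    Code:
    import pandas as pd

    def risk_adjust(prices, vol_window=25):
        # turn a price series into vol-normalised daily returns (illustrative)
        returns = prices.pct_change()
        # trailing vol estimate, shifted one day so today's return is not
        # scaled by an estimate that already contains it
        vol = returns.ewm(span=vol_window, min_periods=vol_window).std().shift(1)
        return returns / vol

    # pooling across instruments then becomes a matter of concatenating the
    # normalised series (prices_by_symbol is a hypothetical dict of Series)
    # pooled = pd.concat({s: risk_adjust(p) for s, p in prices_by_symbol.items()}, axis=1)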
  4. cafeole

    cafeole

    Thanks Robert.
     
    #34     Dec 4, 2020
  5. You can't change the distribution by just normalizing the data. Hence risk adjusting does not change the distribution of the data either. That is quite basic statistical knowledge.

    You can confirm the above in the papers below, but there are tons of others out there. You can shift the mean by transforming variables, but most statistical moments remain anchored to the underlying distribution.

    2016, “Improved Shape Parameter Estimation in K Clutter with Neural Networks and Deep Learning”. International Journal of Interactive Multimedia and Artificial Intelligence, Vol. 3, No. 7, pp. 3-13.


    2015, “A Neural Network Approach to Weibull Distributed Sea Clutter Parameter’s Estimation”. Revista Iberoamericana de Inteligencia Artificial, Vol. 18, No. 56, pp. 3-13.


    2015, “Estimation of the Relation between Weibull Distributed Sea Clutter and the CA-CFAR Scale Factor”. Journal of Tropical Engineering, Vol. 25, No. 2, pp. 19-28.

    PS: I am not the author of the above papers.

     
    #35     Dec 5, 2020
    Kust likes this.
  6. Hmmm. I don't agree. I expect I haven't explained myself properly.

    Consider Gaussian returns with mean zero over 10 years. In the first 5 years the standard deviation is 20%. Then it falls to 10%. The resulting total distribution will be fat tailed, with a standard deviation of 15%.

    Now divide all returns by a standard deviation estimate from the last 30 days. The resulting distribution will be almost perfectly Gaussian, apart from the 30 day adjustment period which will produce a few returns that are a little too small.

    This also works quite well with real financial data.

    GAT
     
    #36     Dec 5, 2020
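    A small simulation of the example in #36 (synthetic zero-mean Gaussian returns; 256 trading days per year is my assumption, and numpy/scipy are my choice of tools, not the poster's):
    Code:
    import numpy as np
    import pandas as pd
    from scipy.stats import kurtosis, normaltest

    rng = np.random.default_rng(0)
    daily = lambda ann_vol: ann_vol / np.sqrt(256)

    # 5 years of zero-mean returns at 20% annualised vol, then 5 years at 10%
    returns = pd.Series(np.concatenate([
        rng.normal(0, daily(0.20), 5 * 256),
        rng.normal(0, daily(0.10), 5 * 256),
    ]))

    # the raw mixture is fat-tailed (positive excess kurtosis)
    print('raw    kurtosis %.2f  normaltest p=%.3g'
          % (kurtosis(returns), normaltest(returns).pvalue))

    # divide by a trailing 30-day standard deviation estimate
    vol = returns.rolling(30).std().shift(1)
    adj = (returns / vol).dropna()

    # the rescaled series is close to Gaussian, apart from the warm-up window
    print('scaled kurtosis %.2f  normaltest p=%.3g'
          % (kurtosis(adj), normaltest(adj).pvalue))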
  7. Hmm, I have to disagree with you there. I think you are conflating the non-normal nature of the distribution with stochastic/uncertain volatility. You can rescale the returns by the expected volatility (e.g. I rescale returns by the ATM implied vol for some studies) and that does create a historically-consistent set of returns. However, the resulting distribution will still be fat-tailed and skewed.
     
    #37     Dec 5, 2020
    YuriWerewolf and DiceAreCast like this.
  8. Exactly. A transformation can shift some of the properties of a distribution, but that does not make the actual distribution of the underlying data standard normal. A perfect example is the change of measure that underlies the derivation of Black-Scholes. The change of measure shifts the mean in order to apply mathematics that require the mean and variance to match a standard normal distribution, but that does not mean a non-normal distribution has been transformed into a normal one. If that were the case we would not speak of a "risk neutral probability" or a "risk neutral world".

    Steven Shreve's book on stochastic calculus is an excellent reference. For the amateur, "Heard on the Street" by Timothy Falcon Crack explains it to the layman (or should I say MBA graduate; I am not saying this condescendingly, I think the prep book targets MBA grads exactly). Keywords: the Girsanov and Radon-Nikodym theorems. Though for details I would probably have to dust off some of my textbooks and look it up; I have not worked in this space in ages.

     
    Last edited: Dec 5, 2020
    #38     Dec 5, 2020
    YuriWerewolf likes this.
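    For reference, the change of measure alluded to in #38, in the standard form given in Shreve (notation mine):
    Code:
    % Girsanov / Radon-Nikodym: under P, let W(t) be Brownian motion and set
    Z(T) = \exp\left(-\int_0^T \theta(s)\,dW(s) - \tfrac{1}{2}\int_0^T \theta(s)^2\,ds\right),
    \qquad \frac{d\widetilde{\mathbb{P}}}{d\mathbb{P}} = Z(T).

    % Girsanov's theorem: under the new measure \widetilde{\mathbb{P}}, the process
    \widetilde{W}(t) = W(t) + \int_0^t \theta(s)\,ds
    % is a Brownian motion, i.e. the change of measure shifts the drift of W;
    % in risk-neutral pricing, \theta is the market price of risk, chosen so that
    % discounted asset prices are martingales under \widetilde{\mathbb{P}}.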
  9. Here is a quick example. Taking returns of the S&P 500 since 1990 and rescaling by VIX (which is a superior predictor of realized volatility) produces the following Q-Q plot: [Q-Q plot image]

    Here's the Python code for those who care:
    Code:
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    from statsmodels.graphics.gofplots import qqplot
    from scipy.stats import shapiro
    from scipy.stats import normaltest

    def dtidx(x):
        # parse date strings into a tz-aware US/Eastern DatetimeIndex
        idx = pd.DatetimeIndex(pd.to_datetime(x))
        return idx.tz_localize('UTC').tz_convert('US/Eastern')

    def read_yahoo(name):
        # load a Yahoo Finance CSV and compute daily log returns
        o = pd.read_csv(name + '.csv')
        o['Date'] = dtidx(o['Date'])
        o.set_index('Date', inplace=True)
        o['Return'] = np.log(o['Adj Close'] / o['Adj Close'].shift()).fillna(0)
        return o

    vix = read_yahoo('VIX')
    spx = read_yahoo('SPX')

    # prior day's VIX (shifted to avoid lookahead), scaled from annualised
    # percentage vol to roughly daily vol: 100 * sqrt(252) ~ 1600
    df = pd.DataFrame({'r': spx['Return'], 'v': vix['Close'].shift() / 1600})
    df['q'] = df['r'] / df['v']   # VIX-rescaled returns

    # Q-Q plot of the rescaled returns against a normal distribution
    qqplot(df['q'].dropna(), line='s')
    plt.show()

    # formal normality tests
    stat_s, p_s = shapiro(df['q'].dropna())
    print('Shapiro=%.3f, p=%.3f' % (stat_s, p_s))
    stat_d, p_d = normaltest(df['q'].dropna())
    print("D'Agostino K^2=%.3f, p=%.3f" % (stat_d, p_d))
    
     
    #39     Dec 5, 2020
    eternaldelight likes this.
  10. Ha, when I saw the graph I was about to say "we spotted us an R lover" but then I saw your python code :D

    By the way, since various topics are flying all over this thread already: what do you recommend as a basic data-holding structure for time-series data that needs to be manipulated (compressing, slicing, pruning, ...) in Python? pandas, NumPy, or both in combination? I usually get my data already preprocessed and cleansed straight out of kdb.

     
    Last edited: Dec 5, 2020
    #40     Dec 5, 2020
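    For illustration, a minimal sketch of the slicing/compressing/pruning mentioned in #40 on a pandas DatetimeIndex (the sample data, frequencies and trading-hours window are invented for illustration):
    Code:
    import numpy as np
    import pandas as pd

    # invented sample: a few weeks of 1-minute prices
    idx = pd.date_range('2020-01-02 09:30', periods=20000, freq='1min', tz='US/Eastern')
    ts = pd.Series(np.cumsum(np.random.randn(len(idx))), index=idx, name='price')

    # slicing by label on the DatetimeIndex
    first_week = ts['2020-01-02':'2020-01-08']

    # "compressing": resample 1-minute data into 5-minute OHLC bars
    bars = ts.resample('5min').ohlc().dropna()

    # "pruning": keep only regular trading hours
    rth = ts.between_time('09:30', '16:00')

    print(first_week.shape, bars.shape, rth.shape)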