The stats thread

Discussion in 'Options' started by TheBigShort, Dec 6, 2018.

  1. TheBigShort

    Hey everyone, I thought ET might do well with a stats/betting thread. Although Cross Validated is very helpful with answering my questions, sometimes non-financial data scientists have a hard time connecting the 2 fields. We are lucky enough to have some very smart people on this forum who I and many others would love to learn from. For obvious reasons, I will be hiding some of the variables used in my models going forward.

    A couple of days ago I came across an interesting variable that did a decent job of predicting the gap between the 1 month implied vol and what the market actually realized in the following 30 days (SPX), measured as log(IV t0/RV t1). The variable is the interest rate swap vol (SRVIX). Here is the first model we have. The data runs from 2012 to yesterday.

    iv_rv = log(IV t0/RV t1)
    TenYearVol = SRVIX.Index
    iv_rv ~ TenYearVol
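
    For anyone who wants to try this on their own data, here is a minimal sketch of the model above in plain Python. The function names (log_iv_rv, fit_ols) and the toy numbers are made up for illustration and are not the actual dataset; iv is assumed to be the 1 month implied vol at t0 and rv the vol realized over the following 30 days.

```python
import math

def log_iv_rv(iv, rv):
    """log(IV t0 / RV t1) for paired observations."""
    return [math.log(i / r) for i, r in zip(iv, rv)]

def fit_ols(x, y):
    """Closed-form simple linear regression y ~ a + b*x.

    Returns (intercept, slope, residuals).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, resid

# Toy numbers for illustration only, NOT real market data.
iv = [0.15, 0.18, 0.12, 0.22, 0.16]
rv = [0.11, 0.20, 0.10, 0.15, 0.14]
ten_year_vol = [70.0, 85.0, 65.0, 95.0, 75.0]

iv_rv = log_iv_rv(iv, rv)
intercept, slope, resid = fit_ols(ten_year_vol, iv_rv)
print(intercept, slope)
```

    The residuals returned by fit_ols are what a residual plot and a QQ plot would be drawn from.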

    Plot1 = regular graph
    Plot2 = residuals
    Plot3 = QQ plot
    Plot4 = summary

    [Attachments: four screenshots, Plot1 through Plot4]


    Looking at the residuals, we can see a lot of heteroskedasticity, and the QQ plot tells us that we have some heavy tails, so the distribution is not normal. So I tried 2 different fixes: the first was a Box-Cox transformation, the second was a generalized linear model with a gamma distribution. The gamma distribution was a better fit, so I ended up going with that. Here are the stats. We also have a quasi R^2 of .20 (1 - residual deviance/null deviance).
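
    A rough sketch of how a gamma GLM and the deviance-based quasi R^2 could be computed by hand, in plain Python. Several things here are assumptions and not taken from the model above: the log link (the link used is not stated), the single predictor, the function names, and the toy data. A gamma model also needs a strictly positive response, so the sketch assumes the response is something like the ratio IV/RV and not its log.

```python
import math

def _ols(x, y):
    """Intercept and slope of a simple least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def gamma_deviance(y, mu):
    """Gamma deviance: 2 * sum((y - mu)/mu - log(y/mu))."""
    return 2.0 * sum((yi - mi) / mi - math.log(yi / mi)
                     for yi, mi in zip(y, mu))

def fit_gamma_glm(x, y, n_iter=50):
    """Gamma GLM with log link (assumed), fitted by IRLS.

    With the log link the IRLS weights are all 1, so every
    step is an ordinary least-squares fit on the working response.
    """
    b0, b1 = _ols(x, [math.log(yi) for yi in y])  # starting values
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - mi) / mi for e, yi, mi in zip(eta, y, mu)]
        b0, b1 = _ols(x, z)
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    return b0, b1, mu

def quasi_r2(y, mu):
    """1 - residual deviance / null deviance."""
    null_mu = [sum(y) / len(y)] * len(y)  # intercept-only fit
    return 1.0 - gamma_deviance(y, mu) / gamma_deviance(y, null_mu)

# Toy data for illustration only, NOT real market data.
x = [float(i) for i in range(10)]
noise = [1.1, 0.9] * 5
y = [math.exp(0.1 + 0.2 * xi) * ni for xi, ni in zip(x, noise)]

b0, b1, mu = fit_gamma_glm(x, y)
print(b0, b1, quasi_r2(y, mu))
```

    On real data the same number should come out of any stats package as 1 minus the ratio of the reported residual deviance to the null deviance.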

    [Attachments: three screenshots of the model stats]

    What do you guys think? Is this trade-able? Maybe not enough data?

    For those interested, I also added a dummy variable (1 = SPX was above its SMA50, 0 = SPX was below its SMA50). It only marginally increased the R^2.
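
    A small sketch of how such a dummy could be built from a list of closes. The function name sma_dummy is hypothetical, and two details are assumptions: the SMA window includes the current close, and a close exactly equal to the SMA counts as 0.

```python
def sma_dummy(prices, window=50):
    """1 if the close is above its trailing SMA, else 0.

    Entries are None until a full window of closes is available.
    Assumes the window includes the current close.
    """
    out = []
    for i, p in enumerate(prices):
        if i + 1 < window:
            out.append(None)
            continue
        sma = sum(prices[i + 1 - window:i + 1]) / window
        out.append(1 if p > sma else 0)
    return out

# Toy closes with a short window so the output is easy to check.
closes = [100, 101, 102, 101, 99, 98, 100, 103]
print(sma_dummy(closes, window=3))
# -> [None, None, 1, 0, 0, 0, 1, 1]
```

    The resulting 0/1 series would then enter the model as a second regressor next to TenYearVol.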

    [Attachment: screenshot]
     
  2. Gonna have to dust off my "Math for Data Scientists" lecture from my post grad program, or you can just wait for sle, dest, TomM and a couple more quants who are so capable of answering this... :)

    One thing I would suggest initially though is to try this on a less liquid, less efficient ticker that you think can also be influenced by your variable and see if you get somewhat similar results.. then it gets interesting.
     
  3. While this is a bit beyond my pay-grade... I do like posts like this. (Thank you!)
    Curious whether some additional clarity could be extracted if the data were segregated into periods where the IV clearly missed the event and "more normal" periods where the log(IV/RV) was "relatively well behaved". Initially, ignore the periods where IV underestimated (perhaps only consider periods of contango as an approximation, or only times with a positive value of log(IV/RV)).
    My assumption with this separation of the data is that the IV cannot predict the unknown, so remove the large unknowns from the equation.
     
  4. TheBigShort

    Almost didn't recognize you with the new display!! That's the end goal: to find predictors for less liquid underlyings. However, I thought I would first get some statistical advice on a more liquid underlying, where the data is much cleaner.
    If I remove the outliers the data looks much cleaner, but I am not too sure it's right to remove them, as it significantly changes the slope of the line (expected value). I'll post a photo of it later this evening!!!
     
  5. Get a vacuum cleaner.
     
  6. TheBigShort

    I am trying to de-jump earnings in implied vol for backtesting purposes. Are there any easy ways to do this? I have the rolling 30 day implied vol and the rolling 60 day implied vol, plus all the earnings dates.