I don't know about long vol on those trades, but moves like those are why I like to buy wings. I agree that negative skewness in the return profile doesn't matter in the long run as long as the strategy has positive expectancy, but one loss like that could easily wipe out months of gains.

EVT is generally used from a risk management perspective, and it requires a lot of data to be useful. With earnings we only have a small sample for each stock, which makes it very difficult to statistically identify potential outliers. The best bet here, I think, is fundamentals, though that would require digging through troves of filings and news articles to identify outlier candidates.

I've noticed the same thing, which is why I initially thought of double calendars to play high-IV earnings, with the strikes placed close to the implied move.

This is something I've yet to find a good explanation for. Intuitively the best short vol candidates should be the stocks with the highest implied moves, but it turns out to be the other way round for the reason you mentioned.

I've always been skeptical of long vol earnings trades because of my backtest results. Maybe that's because I haven't found a good way to incorporate kurtosis into my analysis. I plan to work on that at some point, but for now I think I'll stick with short vol until I find something viable.
Three SD for ULTA would have been closer to 272 than 262. Be careful to calc the SDs in log-price rather than in price. The difference will be larger for upside moves -- what might look like a 5 SD up-jump will actually be much less.

Good idea, but it might be more difficult than you believe. Standard EVT and anomaly-detection methods honed on time-series data don't work that well on instantaneous cross-sectional event data, especially when the cross-sectional data is inhomogeneous.

BTW, the second term of Bennett's formula for the one-day expected return is just a complex way of writing eventVol * 2/sqrt(2*pi) -- i.e. it's just the unitless ATM straddle price. The first term shouldn't be in the equation -- for reasonable event vols it will be insignificantly different from one; for extreme event vols it gives absurd numbers (e.g. vol = 1000%, exp(vol**2/2) = 5.18e+21).
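For concreteness, a quick numeric sketch of both points -- spot and event vol below are made-up numbers, not ULTA's actual figures:

```python
import numpy as np

# Hypothetical numbers for illustration only (not ULTA's actual spot or vol).
spot = 250.0          # pre-earnings price
event_vol = 0.08      # one-event return SD (8%)

# SD bands: price space vs log-price space. The gap widens for up-moves.
print(spot * (1 + 3 * event_vol))      # naive +3 SD in price space: 310.0
print(spot * np.exp(3 * event_vol))    # +3 SD in log-price space: ~317.9

# A big up-move measured against price-space SDs overstates the "number of SDs":
move_to = 340.0
print((move_to - spot) / (spot * event_vol))   # ~4.5 "SDs" in price
print(np.log(move_to / spot) / event_vol)      # ~3.8 SDs in log-price

# Unitless ATM straddle approximation: sigma * 2/sqrt(2*pi) ~ 0.8 * sigma
print(event_vol * 2 / np.sqrt(2 * np.pi))      # ~0.0638

# The exp(vol**2/2) term: ~1 for reasonable vols, absurd for extreme ones.
print(np.exp(0.08 ** 2 / 2))    # ~1.0032
print(np.exp(10.0 ** 2 / 2))    # ~5.18e+21  (vol = 1000%)
```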
Yeah!! Was worried a bit about KMX pre-market, but the open was beautiful. Got out of AZO @ 64 when it started to move; the short was from 76.35. Sometimes I wonder how much of a tailwind declining vol / favorable circumstances in the index give our trades. Seems like when conditions in the market are benign these tend to do well. Perhaps that's why last season was a little rough, in part because of macro dynamics.
Thanks for getting back to me on this. I thought QStackExchange was more active. Do you have another recommendation that might help me model the tails? Assuming I have 40 earnings dates for each of 500 equities, maybe I could group by certain factors (market cap, liquidity, industry, growth, etc.) and see which groups have the largest kurtosis?
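Roughly what I have in mind, as a sketch -- assuming a flat file of normalized earnings jumps with hypothetical column names ('jump_z', 'mktcap', 'industry'):

```python
import pandas as pd
from scipy.stats import kurtosis

# Hypothetical frame: one row per (ticker, earnings date) with the normalized
# earnings-day jump and some grouping factors. File name is a placeholder.
df = pd.read_csv("earnings_jumps.csv")

# Bucket continuous factors into quantiles, then compare excess kurtosis per group.
df["mktcap_bucket"] = pd.qcut(df["mktcap"], 5, labels=False)

fat_tails = (
    df.groupby(["industry", "mktcap_bucket"])["jump_z"]
      .agg(n="size", excess_kurt=lambda x: kurtosis(x, fisher=True, bias=False))
      .query("n >= 50")                  # ignore tiny buckets
      .sort_values("excess_kurt", ascending=False)
)
print(fat_tails.head(10))
```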
Going off the concept of shrinkage: we look at analysts' Hi/Lo earnings estimates. We assume the Hi estimate is too high and the Lo estimate is too low. We then model the stock's reaction to different surprises within that range. This sounds like a pretty neat idea to me. So if analysts think the EPS (or Rev, whichever explains more variance) is between $5.00 and $10.00 with an average of $7.50, then using historical EPS surprises and historical jumps we build a distribution of how the stock will react to any of the numbers lying on the line $5:$10. What do you guys think??

From what I am seeing so far, if the numbers come in above/below the Hi/Lo range, the stock moves are huge! So something else we could look at is how tight the Hi/Lo band is vs what it usually is, or vs what the surprise distribution is. This could help us predict the cheapness of the wings.

PS. Companies usually surprise to the upside, so you would have to scale to zero mean and unit variance.

PPS. How should I go about doing this? Let's say I get all the EPS surprises for AAPL and their historical jumps. Should I just run a robust regression? I would like to incorporate some type of time-varying volatility. The way AAPL would have reacted to a 10% EPS miss 15 years ago is different from how it will react to a 10% EPS miss today. How might you guys approach this situation?

I will post a doc tomorrow evening with FANG Rev and EPS surprises vs Jump and Move so we can all be on the same page.
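Here's roughly the shape of what I'm picturing, as a minimal sketch -- column names are hypothetical, and the Huber regression is just a placeholder for whatever robust fit we end up using:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import HuberRegressor

# Hypothetical columns: 'eps_actual', 'eps_hi', 'eps_lo', 'jump' (earnings-day
# log return), 'pre_vol' (pre-earnings implied or trailing vol). Placeholder file.
df = pd.read_csv("surprises.csv")

# Where the print landed relative to the analyst Hi/Lo band: 0.5 = midpoint,
# values >1 or <0 mean the number came in outside the band.
band = (df["eps_hi"] - df["eps_lo"]).replace(0, np.nan)
df["surprise_in_band"] = (df["eps_actual"] - df["eps_lo"]) / band

# Scale the jump by pre-event vol so reactions from different regimes are
# comparable -- a crude handle on time-varying volatility.
df["jump_z"] = df["jump"] / df["pre_vol"]

df = df.dropna(subset=["surprise_in_band", "jump_z"])
X = df[["surprise_in_band"]].to_numpy()
y = df["jump_z"].to_numpy()

robust = HuberRegressor().fit(X, y)   # downweights the huge outside-the-band moves
print(robust.coef_, robust.intercept_)
```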
Yes, but include other names as well -- the last 20 quarters of AAPL earnings alone is not a large enough sample, even with robust/shrinkage methods. Obviously, normalize your inputs/outputs so that they are comparable across tickers. I suggest panel ridge regression. Diebold had a good blog post on this at https://fxdiebold.blogspot.com/2016/06/fixed-effects-without-panel-data.html . The math of ridge regression in a nutshell is covered at Cross Validated here https://stats.stackexchange.com/questions/69205/how-to-derive-the-ridge-regression-solution -- basically just tack a sqrt(lambda) * I block to the bottom of your predictor matrix (and zeros to the bottom of your response) and treat it as OLS. Add ticker-specific contemporaneous vols to your panel of regressors. In a panel ridge scenario the actual regressor would be the interaction term, e.g. aaplVol * aaplDummy.
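To make the augmentation trick and the interaction term concrete, a toy sketch on random data -- numpy only, with lambda and the normalization left as placeholders:

```python
import numpy as np

def ridge_via_augmented_ols(X, y, lam):
    """Ridge as plain OLS on an augmented system:
    stack sqrt(lam) * I under X and zeros under y, then solve least squares."""
    n, p = X.shape
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])
    beta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return beta

# Toy pooled panel: rows are (ticker, earnings date) observations across names.
rng = np.random.default_rng(0)
n_obs, n_tickers = 200, 10
D = np.eye(n_tickers)[rng.integers(0, n_tickers, n_obs)]   # one-hot ticker dummies
vol = rng.lognormal(mean=-1.0, sigma=0.3, size=n_obs)       # contemporaneous vol per row
surprise = rng.normal(size=n_obs)                           # normalized EPS surprise

# The vol regressors enter as interactions: vol_i * dummy_i (one column per ticker).
X = np.column_stack([surprise[:, None], D * vol[:, None]])
y = 0.5 * surprise + rng.normal(scale=0.5, size=n_obs)      # fake normalized jump

beta = ridge_via_augmented_ols(X, y, lam=1.0)
print(beta[:3])
```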
Yes, approach this as a classification problem with extreme events as a minority class. You may want to model extreme up (vs the total sample) and extreme down (vs the total sample) separately.

Step 1: first make sure that the extreme events constitute a valid minority class in the space of your candidate predictors. Take predictors 1, 2, and 3 at a time. Transform to uniform marginals using empirical or analytical univariate cdf's. Draw a minimal nearly-convex hull around the extreme-event points in your 1-, 2-, and 3-spaces. Do any of the hulls comprise less than 60% of the length/area/volume? If so, you likely have a minority class. If not, try iteratively dropping points on the hull (drop the point that most reduces the length/area/volume), and keep dropping points until you are retaining 85-90% of them. Does the space inside the hull now comprise less than 60%? If yes, you have a minority class. If not, either give up on this approach or add predictors and try again.

Step 2: assuming you actually have a minority class, check whether your class is a single class or an aggregation of smaller minority classes, which is quite common. Take predictors up to five at a time (after rotating, whitening, and transforming to uniform marginals) and see if there is any natural clustering. Use dbscan (R) or other density-based clustering. Judge cluster quality by within-cluster scatter vs between-cluster scatter.

Step 3: as a preliminary model(s), use iterative LDA (if you found minority sub-classes, use an ensemble of one-vs-all models rather than one all-in-one model). Use the CCA method of LDA, as you want to keep the zero-weight eigenvectors (canonical axes).

... This should be enough to get you started.

Edit: forgot to mention that you should treat this as pooled cross-sectional data -- that is, lump all names and earnings rows together in one large sample (with appropriate normalization of your independent/dependent variables, of course).
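A rough sketch of the flavor of steps 1 and 2 -- note it uses a plain convex hull rather than the trimmed nearly-convex hull described above, random data in place of real predictors, and placeholder thresholds:

```python
import numpy as np
from scipy.spatial import ConvexHull
from scipy.stats import rankdata
from sklearn.cluster import DBSCAN

def to_uniform(x):
    """Empirical-CDF transform of one predictor to (0, 1) marginals."""
    return rankdata(x) / (len(x) + 1)

# Hypothetical inputs: predictor matrix P (n_obs x n_pred) and a boolean mask
# flagging the extreme-move earnings events.
rng = np.random.default_rng(1)
P = rng.normal(size=(500, 3))
extreme = rng.random(500) < 0.07

U = np.column_stack([to_uniform(P[:, j]) for j in range(P.shape[1])])

# Step 1 flavor: hull of the extreme points in a 2-predictor subspace. If the
# hull covers much less than ~60% of the unit square, the extremes plausibly
# form a minority class in that subspace.
pts = U[extreme][:, [0, 1]]
hull_area = ConvexHull(pts).volume          # .volume is the area in 2-D
print("hull covers %.0f%% of the unit square" % (100 * hull_area))

# Step 2 flavor: density-based clustering of the extreme points to see whether
# the minority class is one cluster or an aggregation of several.
labels = DBSCAN(eps=0.15, min_samples=5).fit_predict(U[extreme])
print("clusters found:", set(labels) - {-1})
```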
Last week I listened to Diebold for the first time -- his lecture on "connectedness". It was a bit difficult for me to follow. His blog so far is much lighter! In ESL they do a fantastic job explaining ridge and lasso: https://web.stanford.edu/~hastie/Papers/ESLII.pdf , pages 61-71. Btw, do you know how I can keep up to date with Rob and Trevor's work?

I am currently tackling the ridge problem and the results are very good! I get an R^2 of .46 on the first run! I have 2 questions though before I post my current progress.

From what I understand, the dummy variable is the ticker. So instead of AAPL we have a 1, and instead of NFLX we have a 2, and these end up being one-hot encoded. That is what I am understanding from Diebold's blog.

Second, you mentioned one of the regressors should be (ticker-specific contemporaneous vol * dummy variable). Are you saying create a new variable VolDummy and delete the original vol variable? In which case our data frame would look like this* (rough sketch below):

*For the actual dataframe, Vol will be scaled by ticker (it will be a Z-score rather than a %).

Ps. I am using Trevor's glmnet package for this task.
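The sketch of the frame I have in mind -- toy values and hypothetical column names, and my (possibly wrong) reading of your advice is that the vol*dummy columns replace the raw vol column, so each ticker gets its own vol slope:

```python
import pandas as pd

# Toy frame: per-row ticker, vol z-score (scaled by ticker), normalized surprise,
# and the normalized jump as the response.
df = pd.DataFrame({
    "ticker":   ["AAPL", "AAPL", "NFLX", "NFLX"],
    "vol_z":    [0.4, -1.1, 2.0, 0.3],
    "surprise": [1.2, -0.5, 0.8, -2.0],
    "jump_z":   [0.9, -0.3, 1.5, -1.8],
})

# One-hot ticker dummies (rather than a single integer column of 1 = AAPL, 2 = NFLX).
dummies = pd.get_dummies(df["ticker"], prefix="d").astype(float)

# Interaction regressors: vol_z * each dummy, one column per ticker.
vol_dummy = dummies.mul(df["vol_z"], axis=0).add_prefix("vol_x_")

X = pd.concat([df[["surprise"]], dummies, vol_dummy], axis=1)
y = df["jump_z"]
print(X)
```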