I suppose if your focus is only to analyze intraday activity (with a specific window in mind), it might make sense to remove the gaps and thus any type of bias you're concerned about. Ultimately, it really depends on what the end game is. Once you've identified more specifically what you are looking for, it might make sense to observe both cases and see how one might unnecessarily bias or omit any useful results. Here's an example study with gap omission: "This paper compares various measures and forecasts of volatility in equity markets. In the absence of overnight trading it is shown that the daily volatility is best measured by the sum of intraday squared 5-min returns, excluding the overnight return. In the absence of overnight trading, the best daily forecast of volatility is produced by modeling overnight volatility differently from intraday volatility" http://www3.interscience.wiley.com/journal/92013893/abstract?CRETRY=1&SRETRY=0 Many of the papers that I've observed tend to sum all of the 5-min squared returns for one day and use that as a proxy for daily vol.
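For anyone who wants to play with that proxy, here's a minimal Python sketch. The column name, session times, and bar spacing are my assumptions for illustration, not anything from the paper:

```python
import pandas as pd

# Assumed: 'bars' is a DataFrame of 5-min bars with a DatetimeIndex
# and a 'close' column; session times are assumed, adjust to your market.
def daily_realized_vol(bars: pd.DataFrame,
                       session_start="09:30", session_end="16:00") -> pd.Series:
    """Daily vol proxy: sum of squared intraday 5-min returns,
    excluding the overnight return (per the abstract quoted above)."""
    intraday = bars.between_time(session_start, session_end)
    rets = intraday["close"].pct_change()
    # Drop the first return of each session so no return spans the overnight gap.
    first_bar = intraday.groupby(intraday.index.date).head(1).index
    rets = rets.drop(first_bar)
    return (rets ** 2).groupby(rets.index.date).sum()
```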
OK, so let's remove the effect of gaps. How should we code this up? We don't need anything special or tricky, just something brute force, i.e. simple-stupid at this point. We're still on point 1 here, i.e. understanding volatility clustering. I think after we get this gaps issue out of the way we can come up with any and all explanations as to where clustering generally occurs, what its frequency is, i.e. is it periodic (if at all), does our approach need normalization, is it related to the moon's phase, Jupiter's phase, yada yada yada, and, eventually, how to potentially trade those points (i.e. our hypothesis). So, how do we go about getting rid of those gaps? (all ideas welcome from everyone) Mike
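One brute-force starting point in Python, in the simple-stupid spirit asked for above. This is a sketch; it assumes 5-min closes on a DatetimeIndex and treats a calendar-day change as the session boundary, which may not fit every market:

```python
import pandas as pd

# Assumed: 'closes' is a Series of 5-min closes with a DatetimeIndex.
def intraday_returns_no_gaps(closes: pd.Series) -> pd.Series:
    """Brute-force gap removal: compute bar-to-bar returns, then throw away
    any return whose bar and previous bar fall on different calendar days."""
    rets = closes.pct_change()
    days = pd.Series(closes.index.normalize(), index=closes.index)
    overnight = days != days.shift(1)  # True on the first bar of each day
    return rets[~overnight].dropna()
```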
Mike- Thanks again for your efforts here. Any chance we could see that indicator in Excel or a pseudocode explanation? Hate to ask for more, but I'd like to contribute and have little more than a vague idea of what that TS code is doing...
If you can utilize time in your equations, simply take that time period and remove it; or, if you can't remove it, maybe substitute the prior sum or the average of the previous 12 periods so the gap doesn't have too much of an impact.
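A rough sketch of the substitution idea, implementing the "average of the previous 12 periods" variant; the data layout is assumed, and whether to patch with the prior sum instead is left open as in the post above:

```python
import pandas as pd

# Assumed: 'rets' is a Series of 5-min bar returns with a DatetimeIndex.
def patch_overnight(rets: pd.Series, lookback: int = 12) -> pd.Series:
    """If the gap period can't simply be dropped, overwrite the first return
    of each session with the trailing average so it doesn't dominate sums."""
    patched = rets.copy()
    days = pd.Series(rets.index.normalize(), index=rets.index)
    first_of_day = days != days.shift(1)
    # Trailing mean ends at the previous bar, i.e. the prior session's close.
    # The earliest values will be NaN until a full lookback window exists.
    trailing_mean = rets.shift(1).rolling(lookback).mean()
    patched[first_of_day] = trailing_mean[first_of_day]
    return patched
```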
Talon- Once these minimums are met, how do you compare the "quality" of two tendencies? For example, say we're analyzing two events, Event1 and Event2, but only want to include one in our system. The x-period detrended returns are:

Event1: mean = 1.1%, stdev = 2.0%
Event2: mean = 1.0%, stdev = 1.5%

If we traded these events as systems in and of themselves with equal position sizing, we'd expect to make more money with Event1 over the long run. However, expected drawdowns would also be larger. Alternatively, if Event2 were selected and position sizes scaled up so that the expected max drawdown equaled that of the Event1 system, profits would be considerably larger than with Event1. That said, I imagine combining either one of these events with others in a complete system would potentially dampen the volatility effects, but it's hard to say.

Do you consider any risk-adjusted or Sharpe-ratio-like measure when reviewing tendencies? Or at this point is it simply a matter of trying to maximize the detrended mean return after costs, and only later, once all the parts are put together, will you look at system vol, employing stops, managing drawdowns, etc.?
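To make that comparison concrete, here's the back-of-the-envelope arithmetic, using stdev as a stand-in for expected drawdown and assuming vol scales linearly with position size; both are my assumptions, not anything established in the thread:

```python
# Mean/stdev ratios (a Sharpe-like measure, ignoring costs):
e1_mean, e1_sd = 0.011, 0.020   # Event1
e2_mean, e2_sd = 0.010, 0.015   # Event2

print(e1_mean / e1_sd)          # 0.55
print(e2_mean / e2_sd)          # ~0.67 -> Event2 wins risk-adjusted

# Scale Event2's position until its vol matches Event1's:
scale = e1_sd / e2_sd           # ~1.33x size
print(scale * e2_mean)          # ~1.33% expected return vs Event1's 1.1%
```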
Just in case anyone is not familiar with volatility clustering (heteroskedasticity) signatures, here is a graph showing signed cl-cl 1-min returns on the left and squared returns on the right. It should be fairly obvious that the two series do not have identical signatures: one is randomly generated Gaussian data, the other is true S&P market data.
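For anyone who wants to reproduce that kind of comparison numerically rather than by eye, the usual signature is positive, persistent autocorrelation in squared returns for real data versus roughly zero for i.i.d. Gaussian noise. A minimal sketch below; the GARCH(1,1)-style simulation stands in for the S&P series, which is an assumption for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# i.i.d. Gaussian returns: no volatility clustering by construction.
gauss = rng.standard_normal(n)

# Toy GARCH(1,1)-style series: variance feeds on past shocks, so big moves cluster.
ret = np.zeros(n)
var = np.ones(n)
omega, alpha, beta = 0.05, 0.10, 0.85
for t in range(1, n):
    var[t] = omega + alpha * ret[t - 1] ** 2 + beta * var[t - 1]
    ret[t] = np.sqrt(var[t]) * rng.standard_normal()

def acf1_of_squares(x):
    """Lag-1 autocorrelation of the squared returns."""
    s = x ** 2
    s = s - s.mean()
    return (s[1:] * s[:-1]).sum() / (s * s).sum()

print(acf1_of_squares(gauss))  # ~0: no clustering
print(acf1_of_squares(ret))    # clearly positive: the clustering signature
```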
OOPS! Sorry all, I accidentally posted the incorrect (not debugged) code set and chart. It was the end of a long day when I posted that original. Please use the attached code as a template. As you can see, the difference is substantial. This code uses same-bar sqrt((High - Low)^2) calcs, not close-to-close values, hence any bar-to-bar non-linearities are removed. This is a safe assumption for a continuous intraday time series such as the ES. This is *not* a good assumption for non-liquid issues, but it does the job of removing overnight gaps. The pseudocode is the following:

1. Calculate sqrt((High - Low)^2) for each of the last 12 bars on a 5-min chart.
2. Sum the 12 values from step 1.

That's it...
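A literal Python translation of that pseudocode, with column names assumed. Note (as the next post points out) that the square root of a square is just an absolute value, so this reduces to a rolling sum of bar ranges:

```python
import numpy as np
import pandas as pd

# Assumed: 'bars' is a DataFrame of 5-min bars with 'high' and 'low' columns.
def rolling_range_vol(bars: pd.DataFrame, window: int = 12) -> pd.Series:
    """The pseudocode above, verbatim: sqrt((High - Low)^2) per bar,
    summed over the last 12 bars. Since High >= Low, the sqrt-of-square
    is simply the bar range itself."""
    bar_range = np.sqrt((bars["high"] - bars["low"]) ** 2)  # == high - low
    return bar_range.rolling(window).sum()
```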
Hi Mike, I don't think this is what you meant to do. Taking the square root of the square is just taking the absolute value. If you want to take the square root of the sum to reduce the impact of outliers, I think that would make sense, but there is no point in doing this before you sum them.

Also, here are a couple of observations/questions. I haven't given it much thought before, but I noticed this and similar measures have an undesirable property when you apply them to price changes that can take either sign. It's probably simplest to give an example: this will give the same vol if all 12 values are +2 pts, or if half are +2 and half are -2, even though you probably want your vol measure to distinguish between the two market states. You could get around this by demeaning the price changes (as in the basic std dev formula) or taking logs, but these have their own idiosyncrasies. Do you have a preference, or are you still figuring it out?

Also, I don't really think you are capturing vol clustering here. It looks like you are just measuring vol over a rolling window, and your measure will be clustered by definition because each bar's vol measure has 11 data points in common with the adjacent bar's vol measure. If you really want to measure clustering you need non-overlapping vol observations and some time series model. But I don't think you need to for this purpose; I am just mentioning it because you keep talking about clustering.

Edit: I just re-read your post that the goal is not trade filtering. I was trying to help with the calculations, but I guess I don't really get what the goal is here, so maybe I should back off.
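A quick numeric illustration of the sign-blindness point above, with toy numbers chosen to match the example rather than anything from real data:

```python
import numpy as np

trending = np.full(12, 2.0)         # all twelve changes are +2 pts
choppy = np.array([2.0, -2.0] * 6)  # half +2, half -2

# The abs-and-sum measure cannot tell the two states apart...
print(np.abs(trending).sum(), np.abs(choppy).sum())  # 24.0 24.0

# ...whereas demeaning first (as in the std dev formula) distinguishes them.
print(trending.std(), choppy.std())                  # 0.0 2.0
```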