Is the median "the median of the last n innovations"? If so, isn't the value of n going to heavily affect the result? Here are some sample runs with 1000 innovations, a median window of length 40, repeated 10x:
POS:24.75 NEG:25.75
POS:25.58 NEG:25.62
POS:26.37 NEG:24.77
Setting aside the possibility that I've coded this incorrectly, I'm not exactly sure what this proves.
I try to be clear, but so far I see I haven't been. Some of this should be readily obvious to people who have familiarity with time series and statistics. A couple of things:
1) The counts a and b and the N observations (time values) should all come out positive, since a and b are counts of outcomes meeting the rules shown. So if someone gets a negative value for the result, there is something wrong with your interpretation or my description.
2) I wasn't too clear on how the time series is generated. I never said it was a cumsum random walk. I simply said a random Gaussian value is generated at each time instant (the distribution parameters are fixed). You can think of each step as an innovation if you like, although in this case they are not innovations.
3) The series should be stationary by inspection. If not, something is wrong.
4) The median is the median of the entire time series you run; 1000 points, 100,000, 1 million, whatever; it doesn't affect the results. The more the better (central limit in action). That's the beauty.
Cheers.
Clue: what you are trying to verify is whether the frequency of occurrence of the outcomes generated by your rules is greater or less than 50%, by how much, and whether the result applies to any randomly generated sequence of arbitrarily large length as described in the last two posts.
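(For concreteness, here is a minimal C++ sketch of one reading of these rules. The RNG choice, seed, and series length are illustrative, not part of the description above; a counts below-median points followed by an up-move, b counts above-median points followed by a down-move.)

Code:
#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

int main() {
    std::mt19937 rng(42);
    std::normal_distribution<double> gauss(0.0, 1.0); // fixed parameters

    const int N = 100000; // the more the better
    std::vector<double> ts(N);
    for (double& x : ts) x = gauss(rng);

    // Median of the entire series (no windowing); the middle element
    // from nth_element serves as an approximate median.
    std::vector<double> sorted(ts);
    std::nth_element(sorted.begin(), sorted.begin() + N / 2, sorted.end());
    double median = sorted[N / 2];

    long a = 0, b = 0, trials = 0;
    for (int i = 0; i + 1 < N; ++i) {
        if (ts[i] < median && ts[i + 1] > ts[i]) ++a; // below median, next move up
        if (ts[i] > median && ts[i + 1] < ts[i]) ++b; // above median, next move down
        ++trials;
    }

    std::cout << "a: " << 100.0 * a / trials << "%  "
              << "b: " << 100.0 * b / trials << "%\n";
}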
Ok, that clears things up a bit. Here are some runs:
POS:37.41 NEG:37.84
POS:37.455 NEG:37.535
POS:37.3067 NEG:37.5267
POS:37.3 NEG:37.68
POS:37.336 NEG:37.602
POS:37.48 NEG:37.5167
POS:37.5314 NEG:37.4857
POS:37.5038 NEG:37.44
POS:37.5078 NEG:37.4411
POS:37.533 NEG:37.496
Also, attached is a single run; is this looking correct now?
No. Blue looks OK. I don't know what the red line is, but if it is the median, it should be flat. There is no windowing in my description.
My results are multiplied by -1 but confirm the bias. Is my logic in Excel correct? The sim runs for a minute or so on my PC.
Stupid question: the median is going to be pretty much zero for any stationary time series, right? So I replaced 'median' with 0, but I still get pretty much the same result... just re-checking the other stuff now.
Edit: Here is the code:

Code:
int main(int argc, char *argv[])
{
    int pos = 0;
    int neg = 0;
    int num_trials = 0;
    for (int z = 0; z < 10; ++z) {
        for (int j = 0; j < 10; ++j) {
            double y = 0;
            double last_y = 0;
            // Note: on the first pass last_y is the initial y = 0,
            // so neither condition fires but the trial is still counted.
            for (int i = 0; i < 1000; ++i) {
                last_y = y;
                y = Random::GetGaussian();
                // Below zero and the next move is up.
                if (last_y < 0 && SignOf(y - last_y) == 1) ++pos;
                // Above zero and the next move is down.
                if (last_y > 0 && SignOf(y - last_y) == -1) ++neg;
                ++num_trials;
            }
        }
        // Counts accumulate across outer iterations, so each printed
        // line is a running frequency over everything so far.
        double pos_chance = pos / (double)num_trials;
        double neg_chance = neg / (double)num_trials;
        std::cout << "POS:" << pos_chance * 100 << " NEG:" << neg_chance * 100 << std::endl;
    }
}
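(The snippet above relies on two project helpers that aren't shown. Something like the following would make it self-contained; these definitions are assumptions about what the helpers do, not the original code.)

Code:
#include <random>

namespace Random {
    // Assumed behaviour: one standard-normal draw per call.
    inline double GetGaussian() {
        static std::mt19937 rng(std::random_device{}());
        static std::normal_distribution<double> gauss(0.0, 1.0);
        return gauss(rng);
    }
}

// Assumed behaviour: returns -1, 0, or +1 according to the sign of x.
inline int SignOf(double x) { return (x > 0) - (x < 0); }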
I don't know what your logic is; it should be the rules I described. The results should converge to a stable value (much as you'd expect the percentage of heads to converge asymptotically to 50% after many fair coin tosses).
Not necessarily. It could be any value, but it should be constant. Anything other than zero is considered bias, but it doesn't affect the results. You are concerned with dispersion around the median (and remember, reversion). I think you can get it. But so far, the results are not at all what I expect.
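(One way to see that a constant bias doesn't matter: run the same counting rule on a shifted series and compare. A sketch; the function name, shift value, and series length here are illustrative.)

Code:
#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

// Frequency of "below median, next move up" for an iid N(mean, 1) series.
double BelowMedianThenUp(double mean, int n, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> gauss(mean, 1.0);
    std::vector<double> ts(n);
    for (double& x : ts) x = gauss(rng);

    std::vector<double> sorted(ts);
    std::nth_element(sorted.begin(), sorted.begin() + n / 2, sorted.end());
    double median = sorted[n / 2]; // constant, shifted along with the bias

    long a = 0;
    for (int i = 0; i + 1 < n; ++i)
        if (ts[i] < median && ts[i + 1] > ts[i]) ++a;
    return 100.0 * a / (n - 1);
}

int main() {
    // Same rules, same length; only the bias differs. The two
    // frequencies should agree (up to sampling noise).
    std::cout << "mean 0: " << BelowMedianThenUp(0.0, 100000, 7) << "%\n";
    std::cout << "mean 5: " << BelowMedianThenUp(5.0, 100000, 7) << "%\n";
}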
Ok, I haven't gone through all the posts from the top, so if someone else has already mentioned fluid-dynamics design with linear regression analysis, hats off to you. The Fed, ECB, and other central banks have designed a pond with cash as liquidity, so the interest rate is very much predictable. I think it is possible to predict the impact of liquidity flows from the stock, bond, or commodities markets. The difficult part is that you have to have accurate data for input; however, a lot of data is either manipulated or delayed, which will make your analysis flawed from the beginning. To reduce these problems you need manpower to collect your own data.
Ok, reworked it a bit...

Code:
int main(int argc, char *argv[])
{
    int pos = 0;
    int neg = 0;
    int num_trials = 0;
    for (int z = 0; z < 10; ++z) {
        for (int j = 0; j < 10; ++j) {
            // Generate the whole series first, then take its median.
            std::vector<double> run;
            for (int i = 0; i < 1000; ++i) {
                run.push_back(Random::GetGaussian());
            }
            double median = Utility::Median(run);
            for (int i = 0; i < 999; ++i) {
                // Below the median and the next move is up.
                if (run[i] < median && SignOf(run[i + 1] - run[i]) == 1) ++pos;
                // Above the median and the next move is down.
                if (run[i] > median && SignOf(run[i + 1] - run[i]) == -1) ++neg;
                ++num_trials;
            }
        }
        double pos_chance = pos / (double)num_trials;
        double neg_chance = neg / (double)num_trials;
        std::cout << "POS:" << pos_chance * 100 << " NEG:" << neg_chance * 100 << std::endl;
    }
}

Results (still the same):
POS:37.4975 NEG:37.6577
POS:37.4775 NEG:37.7477
POS:37.5909 NEG:37.5742
POS:37.6451 NEG:37.5501
POS:37.5275 NEG:37.4635
POS:37.5209 NEG:37.4241
POS:37.5189 NEG:37.4432
POS:37.495 NEG:37.4862
POS:37.4797 NEG:37.4497
POS:37.5095 NEG:37.4625
I'm obviously still missing something...
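(For what it's worth, the ~37.5% these runs keep converging to is not an accident under the iid reading: for iid standard normals X and Y, P(X < 0 and Y > X) is the integral of phi(x)*(1 - Phi(x)) over x < 0, which works out to exactly 3/8. A quick numerical check of that integral; the quadrature step and cutoff are arbitrary.)

Code:
#include <cmath>
#include <iostream>

int main() {
    const double PI = std::acos(-1.0);
    // Midpoint quadrature of integral_{-inf}^{0} phi(x) * (1 - Phi(x)) dx,
    // i.e. P(X < 0 and Y > X) for iid standard normals X, Y.
    double p = 0.0;
    const double dx = 1e-4;
    for (double x = -10.0; x < 0.0; x += dx) {
        double m = x + 0.5 * dx;
        double phi = std::exp(-0.5 * m * m) / std::sqrt(2.0 * PI); // normal pdf
        double Phi = 0.5 * std::erfc(-m / std::sqrt(2.0));         // normal cdf
        p += phi * (1.0 - Phi) * dx;
    }
    std::cout << p << "\n"; // prints ~0.375, i.e. 3/8 = 37.5%
}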