Joe Doaks' Data Analysis

Discussion in 'Data Sets and Feeds' started by Joe Doaks, Jan 20, 2008.

  1. I'm not sure which category this thread belongs in, but after you hear what I have to relate I think you will agree with me that it belongs here. Being a private sort, I have never revealed who I am or what I do. Let us simply say that I am an untenured assistant associate adjunct professor of electrical engineering at the Central University of North Texas. This is far enough from the truth that you cannot identify me, but puts me into an accurate professional context so you may appreciate my credentials as a data analyst.

    The most peculiar thing happened Friday. The departmental secretary rang me in the lab and said "Dr. Doaks, you have a call from Chicago, a Mr. Goldengeldfarberstein, or something like that. He says he has a consulting job for you at whatever the going rate is for undistinguished podunk-U EE professors." Never one to miss an opportunity to make $10 an hour, of course I took the call.

    To make a long and dull story shorter if even duller, Mr. GGFS, as I like to call him, sent me an Excel file with three columns and 390 rows and said, in effect, "You come lowly recommended from the junior clerk intern of mine who fetches my coffee. He frequents some sleazy website called EffeteTrader or something or other and said you are infamous there. I can't tell you what this is, but see if it means anything to you. Tell me if there is any relation between variables A and B." Given Mr. GGFS's name, and the fact that he called from Chicago, and that he has an intern, I am thinking that he may be a moneymaker type. I did not just fall off the turnip truck! I know a bit about the world of finance!

    So I took the gig. The data plot is attached. I'll share with you what I find, since a bigshot like Mr. GGFS is unlikely to be lurking here to see it. I find it very peculiar that there is a huge variation in B but very little in A over the sequence. Can anybody here give me an idea what it might be, if in fact it is financial data? I have some ideas for how to start, but it doesn't look promising to make anything of it.
  2. I cannot say that I am surprised at the lack of response to this thread. After all, lurkers typically outnumber members at any given time by at least 2:1, and judging from the quality of posts, lurkers must be smarter than members. Certainly they couldn't be any dumber. So on the chance that some intelligent lurker might enjoy this, I shall continue.

    The first step in the analysis is to hypothesize the meaning of the increments in the x-axis data, if only as a point of departure. Their equally sequenced nature suggests a scalar measure such as pumping stations in a pipeline or floor stops for an elevator. But given that my benefactor Guldengeld Farbstein, for that is his correct name, our departmental secretary having a North Texas accent and ears to go with it that would make a cow laugh, is most likely a financier, the x-axis probably represents increments of time, time being money to those types.

    So we shall assume that we are looking at two financial time series. And that Guldengelt suspects a causative relationship between them. Which if understood by him and not by others might lead to profit for him. Or perhaps he just has an inquiring mind. The first thing we note about the two time series is that series B has much more variability than does series A. I shall hypothesize that the process represented here is in the nature of a control mechanism, B representing an error correcting signal which is striving to maintain, however imperfectly, the constancy of the value of A. So we shall look next at the statistics of the two series.
  3. doli


    B is striving to maintain the constancy of A?
    Why is B so ragged and A so smooth?
  4. Doli, you are indeed a brave man. I would have expected no less of you. You are so smart you should be a lurker. A (red) is quite obviously the controlled process because it is relatively constant. B (blue) is obviously some function in the control loop which is regulating A. B might be the raw error signal measuring the difference between where A actually is and where it is desired to be. Or B might be the commanded position in a bang-bang servo, or it might be the rate command in a rate servo. B is less likely to be a higher order command like an acceleration, but we can't yet say. I am assuming that the big guy, for that is how I visualize him and his wallet, engaged me because he wants a cybernetic analysis of whatever this process is.

    (As an aside to those of you who know me well, bear with me, I AM headed somewhere whose landscape you will ultimately recognize.)
  5. The first thing we notice about the presumably controlled variable A's time series is that it is quantized with a minimum increment of 0.25. I say minimum because the incremental change in A can be 2 or more quanta. This suggests that the process controlling A is digital. It is less likely that it is a quantum mechanical process, although we cannot rule that possibility out.

    The distribution of values of A has a mean of 1860.89 and a standard deviation of 9.967. This yields a signal-to-noise ratio of 186.7, certainly not shabby, but suggesting that there is noise in the control system's measurements, or in the error correction actuation, or both. Or there may be an uncontrollable error source in the process, similar to an unmeasured flexure in the structure of a mechanical servo.

    The second thing we note about A's distribution (shown in the attachment) is that it is platykurtic with a kurtosis of -0.489. This is decidedly sub-gaussian, possibly with a cyclical element. The skewness is +0.383, strongly non-gaussian and suggestive of complex noise processes at work. One thinks of the Rayleigh distribution and quantum noise. And of course of Mandelbrot. In any event there are outliers. I did not bother to calculate the higher order moments, as it did not seem informative at this stage.
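
    The statistics quoted in this post can be sketched roughly as follows. This is only an illustration: the 390-row file is not available here, so the series `a` is a synthetic random walk on a 0.25 grid, and the helper names `describe` and `quantum` are invented for the example. It assumes the "signal-to-noise ratio" is simply mean divided by standard deviation and that the quoted kurtosis is excess kurtosis (zero for a Gaussian). Spreadsheet functions usually apply small-sample corrections, so their values may differ slightly from these plain moment estimates.

```python
import numpy as np

def describe(x):
    """Mean, sample std, mean/std ratio, skewness, and excess kurtosis."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std(ddof=1)                  # sample standard deviation
    z = (x - mean) / x.std()             # standardize with the population std
    return {
        "mean": mean,
        "std": std,
        "snr": mean / std,               # assumed definition: mean over std
        "skew": np.mean(z ** 3),
        "excess_kurtosis": np.mean(z ** 4) - 3.0,  # negative means platykurtic
    }

def quantum(x):
    """Smallest nonzero absolute step between consecutive samples."""
    steps = np.abs(np.diff(np.asarray(x, dtype=float)))
    return steps[steps > 0].min()

# Synthetic stand-in for series A (NOT the real data): steps of 0, 1, or 2 quanta.
rng = np.random.default_rng(0)
a = 1860.0 + 0.25 * np.cumsum(rng.integers(-2, 3, size=390))

stats = describe(a)
print(stats)
print("minimum increment:", quantum(a))
```

    A negative `excess_kurtosis` corresponds to the flat-topped shape described above, and a `quantum` of 0.25 is what one would expect if the real series moves only in multiples of that step.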
  6. Now we turn to the statistical analysis of B, the presumed control function, with some disturbing results. Consider the distribution of B's values shown in the attachment. Unlike the quantized variable A, presumably controlled, B takes apparently continuous values. It would be highly irregular for a control system to drive a quantized (digital or stepped) output process with analog signals.

    The minimum value of B is 227 (presumably just outside its dead band of control), and its maximum is 6716, a dynamic range of 29.6, actually quite low by servo standards. The mean is 1482.09, and the standard deviation is 962.97, so the control loop has an SNR of only 1.54, so extraordinarily noisy as to be impractical. The distribution is leptokurtic with a kurtosis of 5.89, so high as to suggest that the uncontrolled input is highly random. The skewness is 2.02, the outliers again. I shall report to my employer that if A and B represent a control system, there is much room for improvement. B may not be controlling A at all. So I shall also investigate the alternate possibility that A and B represent an uncontrolled weakly causative process. I shall use correlation analysis for this. More to come.
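
    As a quick arithmetic check, the ratios quoted for A and B follow directly from the summary figures given in the thread, assuming "dynamic range" means maximum over minimum and "SNR" means mean over standard deviation. The variable names are made up for this sketch.

```python
# Summary figures as quoted in the posts above.
a_mean, a_std = 1860.89, 9.967
b_min, b_max = 227.0, 6716.0
b_mean, b_std = 1482.09, 962.97

snr_a = a_mean / a_std            # assumed definition: mean / standard deviation
dynamic_range_b = b_max / b_min   # assumed definition: max / min
snr_b = b_mean / b_std

print(round(snr_a, 1), round(dynamic_range_b, 1), round(snr_b, 2))
# prints: 186.7 29.6 1.54
```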
  7. I cannot decide whether nobody gives a shit about what I am trying to show you, or if you are just too innumerate to understand. I suspect the latter. But for the sake of closure I shall continue.

    When you have two variables which you suspect may be related to each other by causation, or to a third unknown variable, a common approach is to perform a cross-correlation of the two. This consists of leading or lagging one of the variables and calculating the sum of the products of the overlapped series. If there is causation, and the processes have been sampled at the correct interval, lags or leads by only a few samples will suggest whether or not they might be related. Attached is the cross-correlation of A (the slowly varying variable) and B (the fast varying variable). Negative shifts reveal if variable B might lead variable A and therefore possibly cause its variation. Positive shifts investigate if A might lead B. The numerical values on the left are so low as to likely be random results, and it is unlikely that changes in B cause changes in A. Values on the right are higher, but not sufficiently so to suggest that A is a strong causative factor in changes in B. Thinking that the two series might be oversampled, I tried cross-correlation at intervals of five samples. I found moderate correlations on the order of +0.5 in a broad range around 200 shifts, but the broadness of the correlation suggests only that there is some cyclicality in the two series. So there is no evidence of correlation between variable A and variable B.
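
    A lagged cross-correlation of this kind might be sketched as below. Note one swap: instead of the raw sum of products described above, the sketch computes a normalized (Pearson) correlation at each shift, so values fall between -1 and +1. The function name `cross_correlation` and the synthetic series are assumptions for illustration, not the data from the attachment. The sign convention follows the post: a negative shift pairs earlier B with later A, so a peak there would mean B leads A.

```python
import numpy as np

def cross_correlation(a, b, max_lag):
    """Pearson correlation of a[t] with b[t + lag] for each lag in [-max_lag, max_lag]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a)                      # both series are assumed to have equal length
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            x, y = a[-lag:], b[:n + lag]   # earlier B against later A (B leads)
        else:
            x, y = a[:n - lag], b[lag:]    # earlier A against later B (A leads)
        result[lag] = np.corrcoef(x, y)[0, 1]
    return result

# Synthetic check (NOT the real data): A is B delayed by 3 samples plus noise.
rng = np.random.default_rng(1)
b = rng.normal(size=390)
a = np.roll(b, 3) + 0.1 * rng.normal(size=390)

cc = cross_correlation(a, b, max_lag=10)
best_lag = max(cc, key=cc.get)
print(best_lag, round(cc[best_lag], 3))
```

    On the synthetic pair the peak lands at a shift of -3, because B was constructed to lead A by three samples. Coarser sampling, as tried above at intervals of five, could be imitated by passing `a[::5]` and `b[::5]`.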
  8. So I dutifully reported my findings to Mr. Guldengeldfaberstein from Chicago, and was humiliated to receive the following reply:

    "Bloaks, you are a complete fucking idiot! You are dumber than the Wharton Financial Engineering PhD who fetches my coffee! Have you ever heard of Newton? No, not Fig or Wayne, I mean Sir Isaac. Did it ever enter your booze-addled brain to look at DERIVATIVES of A? And I don't mean futures or options! I hired you because a search on you shows you are an expert in state estimation of fast moving physical objects. And also because my inside man at the SEC says you have no citations against you, so you must not know shit about finance. I want you to be stupid and pretend that the data I gave you are PHYSICAL data! Analyze them using the physicality delusion! Every other fucking engineer I ever met does that naturally when he approaches the markets!" Back to the drawing board.
  9. doli


    Didn't I say that one was ragged and the other smooth?
  10. Sorry, to my eye the fast variable looks ragged, the slow variable smooth. The point my employer was making is that perhaps differentiating the slow variable will reveal something of interest. All will be revealed in undue time. And shown to be relevant to a certain trading system popular with ET. Sorry, this is a feeble attempt at a Platonic dialogue in math. It's been done before, but it's never been pretty.
    #10     Jan 22, 2008
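
    The differentiation suggested by the employer could be approximated with finite differences, treating the series as uniformly sampled physical data. This is a minimal sketch with a hypothetical helper `differences` and a toy input. It does not use the thread's data and makes no claim about what the differenced series would show.

```python
import numpy as np

def differences(x, dt=1.0):
    """First and second finite differences of a uniformly sampled series."""
    x = np.asarray(x, dtype=float)
    d1 = np.diff(x) / dt             # velocity-like term, length n - 1
    d2 = np.diff(x, n=2) / dt ** 2   # acceleration-like term, length n - 2
    return d1, d2

# Toy input: increments of 1, 2, 3 give a constant second difference of 1.
d1, d2 = differences([1.0, 2.0, 4.0, 7.0])
print(d1, d2)
```

    Since `d1[i]` is `x[i + 1] - x[i]`, any comparison of the differenced slow variable against the fast one would need the second series trimmed to match, for example `b[1:]`.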