Why use a database?

Discussion in 'Data Sets and Feeds' started by onelot, Oct 9, 2004.

  1. prophet

    Excellent design.
     
    #61     Oct 14, 2004
  2. Prophet, I'm not saying tick data is useless; I'm just saying it's harder to analyze than bar data: the algorithms and data structures involved are significantly more complex. That statement holds for every single algorithm you have described.

    Also, as far as I can tell, your algorithms essentially convert tick data to fixed intervals prior to numerical analysis. If, in fact, the easiest way to deal with tick data is to first bin it into fixed time intervals, then once again that pretty much proves my point.
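
    For what it's worth, a minimal sketch of the kind of binning I mean, in plain Python (the tick values and the ticks_to_bars helper are just made up for illustration, not anyone's actual code):

        from datetime import datetime

        # Hypothetical tick records: (timestamp, price) pairs from any feed.
        ticks = [
            (datetime(2004, 10, 14, 9, 30, 0), 1128.25),
            (datetime(2004, 10, 14, 9, 30, 42), 1128.50),
            (datetime(2004, 10, 14, 9, 31, 5), 1128.00),
        ]

        def ticks_to_bars(ticks, interval_secs=60):
            """Bin ticks into fixed intervals, keeping open/high/low/close per bin."""
            bars = {}
            for ts, price in ticks:
                # Truncate the timestamp down to the start of its interval.
                key = int(ts.timestamp()) // interval_secs * interval_secs
                if key not in bars:
                    bars[key] = [price, price, price, price]   # open, high, low, close
                else:
                    o, h, l, c = bars[key]
                    bars[key] = [o, max(h, price), min(l, price), price]
            return dict(sorted(bars.items()))

        for start, (o, h, l, c) in ticks_to_bars(ticks).items():
            print(datetime.fromtimestamp(start), o, h, l, c)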

    Martin
     
    #62     Oct 14, 2004
  3. If I have a nightly data analysis program that takes 10 minutes to run, I couldn't care less about a four-order-of-magnitude improvement. It works, it makes the deadline, and optimizing the code may not be an effective way to spend my time. Instead, I could be designing a new strategy and reusing my clean, well-designed, non-optimal code.

    Obviously there are other cases where performance matters very much. But the original poster in the thread was just starting to design their system. At that point it is madness to optimize against performance problems that may never arise.

    It is much easier to optimize clean code than to clean optimized code.

    Martin
     
    #63     Oct 14, 2004
  4. prophet

    Yes, there may be a somewhat steeper learning curve. More advanced analysis always has a steeper learning curve. Significantly more complex? Certainly not. A little more complex, yes. However, none of the extra complexity matters once the basic data manipulation infrastructure is in place, and you are spending most of your time testing hypotheses, optimizing, trading the systems, etc.

    My algorithms only convert tick data into fixed intervals for the purpose of generating meaningful covariances and regularizing performance statistics into per-hour or per-day intervals. The bulk of my analysis is based on per-hybrid-tick and per-N-hybrid-tick calculations and is never converted to fixed time until after the trading is done, when I have a performance statistic and need to calculate a covariance or Sharpe ratio.

    You asked about calculating covariances over tick data. Just because I describe a simple method to convert tick data to fixed-time does not prove anything about the algorithms I use. It does not prove that fixed time intervals are superior. It proves the opposite, that tick data is more versatile and trivially easy to convert to fixed-time intervals when necessary.
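
    To be concrete, the conversion I'm describing amounts to something like this (a rough sketch in Python rather than Matlab; the equity numbers and both helper functions are invented for illustration):

        from datetime import datetime
        import math

        # Hypothetical per-tick equity curve: (timestamp, account equity) pairs.
        equity_ticks = [
            (datetime(2004, 10, 14, 9, 30, 12), 100000.0),
            (datetime(2004, 10, 14, 10, 15, 3), 100350.0),
            (datetime(2004, 10, 14, 11, 2, 44), 100210.0),
            (datetime(2004, 10, 14, 12, 40, 9), 100680.0),
        ]

        def sample_last_per_interval(points, interval_secs=3600):
            """Convert tick-based values to fixed time: keep the last value in each interval."""
            sampled = {}
            for ts, value in points:
                key = int(ts.timestamp()) // interval_secs * interval_secs
                sampled[key] = value   # later ticks in the same interval overwrite earlier ones
            return [v for _, v in sorted(sampled.items())]

        def simple_sharpe(values):
            """Naive Sharpe ratio over per-interval returns (no risk-free rate, no annualization)."""
            rets = [(b - a) / a for a, b in zip(values, values[1:])]
            mean = sum(rets) / len(rets)
            var = sum((r - mean) ** 2 for r in rets) / len(rets)
            return mean / math.sqrt(var) if var > 0 else float("nan")

        hourly = sample_last_per_interval(equity_ticks)
        print(hourly, simple_sharpe(hourly))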

    And that 10 minutes will never add up, because you'll only ever be running 10 minutes of computation per day, right? Get real. Any serious optimization over a statistically significant amount of data will take a lot of time.

    You misrepresent the situation. No one is suggesting all code should be optimized against future problems. Proper optimization only targets the code that matters most.

    Regarding those just starting out... it is extremely foolish to design or code any program without a basic understanding of computational efficiency (algorithm complexity, caching intermediate results, memory locality, etc.). If those concepts are too complicated, one should at least avoid deeply nested loops, i.e. reduce the polynomial order of the algorithm's running time. It is also trivially easy to determine which parts of the code need optimization using a profiler, and repeated use of a profiler will train any programmer to write very efficient code without additional effort.

    My top-level code constitutes 90% of the written lines and is all interpreted Matlab, yet accounts for less than 10% of the running time. The other 90% of running time goes to a handful of highly vectorized Matlab lines and a few MEX functions coded in straight C. None of these C functions is more than 50 lines; many are around 25 lines of C, with only 10 lines doing the actual calculations. This isn't very difficult given the performance improvements.
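
    As an illustration of the profiler point (a sketch in Python rather than Matlab, since both ship with a profiler; the moving-average functions and the data are invented):

        import cProfile
        import pstats
        import random

        def slow_moving_average(prices, window):
            # Deliberately naive O(n * window) version: recomputes the window sum for every bar.
            return [sum(prices[i - window:i]) / window for i in range(window, len(prices))]

        def fast_moving_average(prices, window):
            # O(n) running-sum version: add the newest price, drop the oldest.
            out, running = [], sum(prices[:window])
            for i in range(window, len(prices)):
                out.append(running / window)
                running += prices[i] - prices[i - window]
            return out

        prices = [100 + random.random() for _ in range(50000)]

        # Profile both; the report makes it obvious which function deserves attention.
        profiler = cProfile.Profile()
        profiler.enable()
        slow_moving_average(prices, 200)
        fast_moving_average(prices, 200)
        profiler.disable()
        pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)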

    Optimized code does not imply dirty code. Code can easily be both clean and optimized. Besides, we are usually talking about the critical 10% of code anyway... not a big deal to optimize. Leave the rest unoptimized.
     
    #64     Oct 14, 2004
  5. As far as analyzing tick data versus bar data, I think we've said all that needs to be said. Let the coder be the judge. :)

    Jeez... kids these days. If 10 minutes on a modern processor isn't enough to do real optimization, then clearly everyone analyzing the markets a decade ago was losing money. Too bad they didn't have Pentium 4s.

    If you think the markets have changed since then, well, not for me. I'm making good money with a swing strategy that runs in 10 minutes a night on an Athlon 64. In Python by the way... 100% interpreted language. Every application is different.

    OK... if less maintainable, less concise, and less understandable code is still clean, what does it take to make code dirty?!? Seriously.

    Martin
     
    #65     Oct 14, 2004
  6. prophet

    Why do you make such silly arguments? Plenty of traders are making money without any computational optimization, both now and 10 years ago. Most of us system traders use our brains to analyze and trade, in combination with whatever computational tools are available, now as then. There were plenty of supercomputers 10 years ago being used efficiently for analysis, perhaps more efficiently than any of us program our computers today. Bloat is accepted today; it wasn't then. Markets were also very different 10 years ago: there was less automation and program trading to compete against, and more dumb money too.

    I don't know whether to congratulate you or feel sorry for you here. It's wonderful that you have found a system that requires little computational time. On the other hand, I feel sorry that you are content enough with the "good money" and "10 minutes a night" not to see the potential to ramp up the testing and perhaps turn "good money" into "great money". I hope your system continues to profit into the future, because if and when it stops working, you may blame yourself for resting on your laurels and not allocating more than 10 minutes a day to optimization, simulation, or at least exploring new markets. Are you proud of your efficiency, or of your efficiency plus the obvious laziness it affords you?

    What code do I consider dirty? Buggy code, poorly written or undocumented code that is hard to read, poor designs... code that doesn't achieve a useful balance of correctness, performance, and maintainability or readability. The primary purpose of code is to do a job correctly, sometimes as fast as possible. Its readability and maintainability are also important, but often secondary to correctness and performance, especially in the case of optimization, where the optimized code constitutes a small fraction of the total project and can easily be rewritten.
     
    #66     Oct 14, 2004
  7. marist89

    Give me a little credit.


    nope


    NUMBER(18)


    regular old b-tree


    composite.


    I could, but for this exercise, no.


    Who said anything about order? If I had to order this beast, I'd set a sort_area_size of about 256M and it would come out in less than 2 seconds.

    Sure, that would affect performance. But if I needed to get 850K rows out real quick, I wouldn't design it so that it carried a lot of DMLs, now would I?
     
    #67     Oct 14, 2004
  8. marist89

    Not Windoz!
     
    #68     Oct 14, 2004
  9. To refute your silly assertion, I guess. You said that one cannot do serious market analysis in 10 minutes a day of computer time. Fortunes have been made and Nobel prizes have been won using a lot less computation than 10 minutes on an Athlon 64. Not to say that I'm making a fortune or winning a Nobel prize, but it's not like I'm going to get there by optimizing my code some more.

    Trust me, 10 years ago, people were complaining about bloat and reminiscing about the old days when people wrote tight code, just like you are today. It was also at least 20 years ago, when computers were a thousand times less powerful, that Donald Knuth wrote that "premature optimization is the root of all evil." The more things change the more they stay the same.

    Boy, you are making a lot of assumptions here. I'll just say that your assumptions are not correct and leave it at that.

    Most programmers go through a phase where they think of programming as this samurai art of making their code leaner and meaner and faster than anyone else's. Then, if they spend enough time writing, debugging, maintaining, and reusing code, they usually figure out that the true lasting value of code is not how fast it runs or how clever it is, but how expressive it is: how well it bridges the gap between human understanding and machine execution.

    You're welcome to call that laziness, I really don't mind.

    Martin
     
    #69     Oct 14, 2004
  10. prophet

    Sure, the analysis might be valid, maybe even serious. Plenty of people can do serious analysis with a calculator, or entirely in their heads. However, in a comparative sense, ten minutes per day of quantitative analysis is not serious relative to what is possible with longer computational times. Analysis of 100 markets is more relevant than analysis of 1 market, right?

    So? Fortunes and Nobels have been won using zero computation too.

    Do you believe scores of software companies, quants, hedge funds and financial institutions optimize their code for the fun of it? Would they rather wait years for an analysis to finish that might take only a day with optimized code? Or why don't they just use recycled Pentium II PCs to cut costs?

    You initially asked how people made money in the markets quantitatively 10 or 20 years ago, given that I say 10 minutes/day today is not a "serious" amount of computation. I then mentioned custom supercomputer codes for quant analysis from 10 or 20 years ago, with no bloat, most of which is still advanced by today's standards. The point is that it has been done, and that a decent amount of computation is often helpful or essential for success. Yes, there are exceptions; plenty have done OK with just calculators, slide rules or hand-drawn charts. However, anyone who attempted serious (e.g. multi-market or tick-based) analysis with primitive tools like PCs 10 or 20 years ago probably had a slim chance of success unless they had a cluster, or brought novel skills or data to the table.

    What is the relevance of this? This statement seems to address the quality or form of optimization, not the quantity of computations involved.

    Lean, mean, and fast can't be expressive too? Here you go again, making unsubstantiated claims, ignoring all of my arguments, and portraying the issues as black or white. Why go through the trouble if the code isn't expressive and flexible? The whole point of using interpreted Matlab for 90% of my code is to achieve maximum expressiveness, and maximum efficiency thanks to the optimized 10%. You never answered the point about only needing to optimize a fraction of the code while the other 90% remains highly expressive. It's not a complete overhaul like you would have us believe; it's more like replacing a fraction of functions with optimized equivalents and avoiding deeply nested loops. How hard is that?
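
    To make that concrete, here is a toy before-and-after of the kind of spot replacement I mean (Python/NumPy standing in for Matlab; the returns matrix and both functions are invented for illustration):

        import numpy as np

        # Hypothetical returns: 100 markets x 2,000 observations of per-bar returns.
        rng = np.random.default_rng(0)
        returns = rng.normal(0.0, 0.01, size=(100, 2000))

        def cov_nested(r):
            """Deeply nested pure-Python covariance: O(m^2 * n) interpreted loop iterations."""
            m, n = r.shape
            means = r.mean(axis=1)
            cov = np.zeros((m, m))
            for i in range(m):
                for j in range(m):
                    s = 0.0
                    for k in range(n):
                        s += (r[i, k] - means[i]) * (r[j, k] - means[j])
                    cov[i, j] = s / (n - 1)
            return cov

        def cov_vectorized(r):
            """Same result from one optimized library call."""
            return np.cov(r)

        # Spot-check on a small slice that the optimized replacement agrees with the naive one.
        sub = returns[:5, :200]
        assert np.allclose(cov_nested(sub), cov_vectorized(sub))
        print(cov_vectorized(returns).shape)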

    And you are welcome to educate me on that point. I admit I could be wrong. I really don’t mind either way. It just seemed to me that anyone who says they do quant analysis, but is content with 10 minutes of computation per day and little optimization must not be terribly motivated to maintain or improve their profitability.
     
    #70     Oct 15, 2004