Could you explain why you are making this statement regarding SQL and time series? I'm interested to learn about the specific problems. I would definitely agree if we were talking about processing streaming data, but from R1234's question it sounds like he is just working on a static historical data set and asking about sorting data that could be stored in a single table... In my opinion a database would be more scalable than the Excel- or R-based solution he was talking about, and still pretty easy to set up. Definitely no more difficult than R
...definitely more time consuming and costly than R. The reason I said that SQL solutions do not lend themselves well to time-series data is the relational nature of SQL-like solutions. There is a myriad of information out on the net about this very topic. You should consider columnar databases or binary data stores to handle time series; SQL-based DB solutions were simply not designed to handle time-series data well. There are R packages that are highly proficient at storing, retrieving, and querying time-series data, as well as memory-mapped solutions and in-memory or file-based databases. I can assure you that for a proof of concept I will most likely design something around R in 1/10th the time it would take you with an SQL solution. Not because of our different skill sets, but because in R all it takes is importing a package and you're done. Just installing an SQL server takes a significant amount of time.
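To make that concrete, here is a minimal sketch of the one-package workflow I mean, using xts; the file and column names are hypothetical and it assumes daily bars sitting in a CSV:

```r
# Minimal sketch of the one-package time-series workflow with xts.
# File and column names are hypothetical.
library(xts)

px   <- read.csv("prices.csv", stringsAsFactors = FALSE)
bars <- xts(px[, c("open", "high", "low", "close")],
            order.by = as.Date(px$date))

bars["2014-01/2014-06"]          # date-range query in a single expression
apply.monthly(bars$close, last)  # monthly closes, no joins or indexes to manage
```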
So you don't know R right now, but are thinking about learning? My two cents' worth: I wouldn't use R for this kind of application, and I'd spend my time learning something else. Although in theory there is no issue, I've found there are problems with the way R allocates and then doesn't free up memory, particularly when running over loops. You need to write your code very carefully to avoid this kind of problem, and as a relative novice you'll potentially waste a lot of time learning how to do that.

Where I used to work, both Matlab and Python were used successfully to implement exactly the kind of system you have. I have a well-known preference for Python, so I won't trot out the pros and cons once again. Other languages are of course available. I'd personally use a simple database for storage rather than, say, flat files, although as others have said this is more for robustness than to get more memory back.

One more thing: backtesting this beast could be done very easily as a parallel process (since today's CS ranking has no bearing on tomorrow's). I guess in a world where most people have multiple cores on one machine some parallelisation would be done by the interpreter or compiler (I'm not an expert on this), but you still have to write your code in such a way as to make it possible (e.g. list comprehensions in Python, or the equivalent). Alternatively, we used to run stuff on a big cluster where we had to make the parallel stuff explicit (using something like http://www.parallelpython.com/). Another option could be using something like https://www.quantconnect.com/ (no connection, never used it, but it looks interesting) to get parallel computing power.

GAT
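GAT would presumably do this in Python, but to keep with the R examples elsewhere in the thread, here is a rough sketch of the embarrassingly parallel idea: each day's ranking is independent, so the days can simply be farmed out to cores. The per-day function is a hypothetical placeholder.

```r
# Rough sketch of an embarrassingly parallel backtest: each day's ranking is
# independent, so days can be distributed across cores.
# rank_one_day() is a hypothetical placeholder for the real per-day work.
library(parallel)

dates <- seq(as.Date("2014-01-02"), as.Date("2014-12-31"), by = "day")

rank_one_day <- function(d) {
  # compute the ranking metric for date d and return the top names
  d
}

# mclapply forks, so it is Unix-only; on Windows use parLapply with a cluster
results <- mclapply(dates, rank_one_day, mc.cores = detectCores())
```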
volpunter, sounds interesting, but it also sounds like your experience with database systems comes from large enterprise environments... Installing MySQL on my laptop took about as much time as installing R itself, and you don't have to find and install additional packages to create a simple flat table, load the data, and start querying it. You do have to give some thought to how to index it, though. So I'm not sure I agree with the statement that the SQL approach takes more time to set up. But I do get your point regarding the problems SQL has with time series in large, complex system environments, especially where you have to deal with streaming data in low-latency scenarios... I just don't think we are talking about anything that complicated for this particular example
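For what the "flat table, load it, index it, query it" route looks like in practice, here is a minimal sketch using SQLite through R's DBI, so there isn't even a server to install; the table and column names are hypothetical:

```r
# Minimal sketch of the flat-table route: one table, one index, simple queries.
# Uses SQLite via DBI so no server install is needed; names are hypothetical.
library(DBI)
library(RSQLite)

px  <- read.csv("prices.csv", stringsAsFactors = FALSE)   # hypothetical daily bars
con <- dbConnect(RSQLite::SQLite(), "prices.db")

dbWriteTable(con, "prices", px, overwrite = TRUE)
dbExecute(con, "CREATE INDEX idx_prices_date ON prices(date)")

top20 <- dbGetQuery(con, "
  SELECT symbol, close
  FROM   prices
  WHERE  date = '2014-06-30'
  ORDER  BY close DESC
  LIMIT  20")

dbDisconnect(con)
```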
I agree with many of your points, and it's precisely because the OP's problem does not seem overly complex that I recommended R. R lends itself well to quick ranking of metrics. Python can get the job done, and so can implementations in plenty of other languages. I recommended R because the packages are already there, and a one-line command gives you access to the right one.
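As a rough illustration of the "quick ranking of metrics" point: one package load and a couple of lines is all it takes. The ret_6m metric column is hypothetical.

```r
# Rough sketch of a quick cross-sectional ranking with data.table.
# ret_6m is a hypothetical momentum metric; px is the daily data.frame from before.
library(data.table)

dt <- as.data.table(px)
dt[, rk := frank(-ret_6m), by = date]     # rank every symbol within each day
top10 <- dt[rk <= 10][order(date, rk)]    # keep the daily top 10
```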
For what it's worth, I use both spacewiz's and i960's solutions:
- postgresql with heavily indexed tables
- SQL for simple, frequently used queries
- suck all the data into Perl when I need to perform computationally intensive calculations; s/Perl/R|Python|C|C#/ depending on your preferences

Use the solution that's best for the task. Multiple solutions for multiple tasks, if need be. Pragmatic people are not zealots... they prefer to get shit done instead of preaching that their gospel is the one true religion.
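A hedged sketch of that split, letting the database do the simple indexed filtering and pulling only the subset into R for the compute-heavy part; connection details, table and column names are all hypothetical:

```r
# Sketch of the hybrid workflow: simple filtering in the database,
# computationally intensive work in R. All names here are hypothetical.
library(DBI)

con <- dbConnect(RPostgres::Postgres(), dbname = "market", host = "localhost")
sub <- dbGetQuery(con, "SELECT date, symbol, close FROM prices
                        WHERE date >= '2014-01-01'")
dbDisconnect(con)

# compute-heavy step stays in R (placeholder: log returns per symbol)
sub$ret <- ave(log(sub$close), sub$symbol,
               FUN = function(x) c(NA, diff(x)))
```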
I just read your post and thought maybe you had read my mind! I hired an R programmer last week to write me the program. He will be using parallel processing in his code.
That was your most valuable takeaway from the past 3 pages? Confused (but happy you got the answer you were seeking)
Multithreaded code will potentially make things faster, but you should be telling said programmer to look at the algorithm itself; that's where the real optimization is. Then again, I don't fully know the entire problem space you're trying to solve. It could be other things unrelated to sorting.
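One concrete example of an algorithmic win versus simply throwing cores at it: if only the top 20 names are needed each day, a partial selection avoids fully sorting the whole universe. The score vector here is hypothetical.

```r
# Sketch of an algorithmic optimization: top-k selection instead of a full sort.
# 'score' is a hypothetical ranking metric over a large universe.
score <- rnorm(5e6)
k <- 20

# full ranking, then take the head: O(n log n)
slow <- head(order(score, decreasing = TRUE), k)

# partial sort just to locate the k-th largest value: roughly O(n)
cut  <- sort(score, partial = length(score) - k + 1)[length(score) - k + 1]
fast <- which(score >= cut)   # indices of the top k (up to ties), no full sort
```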