I agree, but I think that applies mostly to a corporate, team-oriented environment (unless, of course, you have a personal super-duper parallel system that receives multiple price feeds and needs to evaluate and execute ultra-short-term strategies in real time). If you're just a lone ranger, I think the most important consideration in deciding whether you need a db is your querying needs - that is, whether you need to selectively read in only chunks of your data at a time. That's usually due to one of two things:
1) Lack of main memory: e.g. you want to evaluate a strategy on intraday data that spans several years and your memory can only hold one week's worth of data at a time.
2) The nature of your strategy requires you to select subsets of the data at a time, e.g. your strategy involves choosing only the top 100 most active stocks from the Russell 3K each day (sketched below).
Otherwise, like Gringinho said, I think the easiest and fastest way to go is to store your data in flat files and then stuff everything into main memory when you need to work with it.
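A minimal sketch of the "flat files plus selection" idea from point 2, assuming one CSV file per trading day with symbol/OHLC/volume columns (the file layout, column names, and function name are made up for illustration, not anyone's actual setup):

```python
# Hypothetical layout: one file per trading day, e.g. "2003-11-06.csv",
# with rows of symbol,open,high,low,close,volume.
import csv
from pathlib import Path

def top_n_by_volume(day_file: Path, n: int = 100):
    # Load the whole day's file into memory, then keep the n most active names.
    with day_file.open(newline="") as f:
        rows = list(csv.DictReader(f))
    rows.sort(key=lambda r: int(r["volume"]), reverse=True)
    return rows[:n]

# Usage: the 100 most active Russell 3K names for one day.
# most_active = top_n_by_volume(Path("data/2003-11-06.csv"))
```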
I would rather use some form of DB access class like ADO.Net than write tons of functions to manipulate flat files. ACK! That would mean you gotta move back and forth in the file for backtesting purposes with dates etc. Geez. I have not used ADO before, but is it true to say that grabbing mm1/dd1/yyyy1 to mm2/dd2/yyyy2 would be trivial in ADO once you have stored your data in date/OHLC format?
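Yes, a date-range pull is a one-liner once the data sits in a table. Here is the same idea sketched with Python's sqlite3 rather than ADO.Net (the table name "daily_ohlc" and its columns are assumptions for illustration only):

```python
import sqlite3

def fetch_range(conn: sqlite3.Connection, start: str, end: str):
    # Grab all bars between two dates, inclusive, in chronological order.
    cur = conn.execute(
        "SELECT date, open, high, low, close FROM daily_ohlc "
        "WHERE date BETWEEN ? AND ? ORDER BY date",
        (start, end),
    )
    return cur.fetchall()

# conn = sqlite3.connect("quotes.db")
# bars = fetch_range(conn, "2002-01-01", "2003-12-31")
```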
I strongly believe you actually mean that you would hold any data you test or manipulate in memory - and NOT on file or in a database for individual record updates (yuck!!). So the manipulation part ends up fairly equal, while the storage part is what differs. Well, that is if you don't need transactional security; do you?!
Safety.... Well, yes, but.... It sure is easy to just run a script to import data into a database system and then, once it is there, declare who can access the data. Pretty simple to back up too. Having everything in one central and secure place is a lot easier, in my opinion, than manipulating many separate files... just my opinion. And by the way, lots of database reporting architectures exist to house seldom-accessed data, if the data itself and any enhancements made to it are of value. In my development I only work with data loaded into memory outside of a database system if there is no other choice due to performance issues. The preferred solution is always centralized data storage in a modern database system, and unless your app is just a hobby for one person, you won't find many people happy with a solution based on flat files - unless there is no other choice due to budget or technical issues.
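For what it's worth, the "run a script to import data" step really is short. A hedged sketch using sqlite3, bulk-loading a CSV of daily bars (table name, columns, and file format are invented for the example):

```python
import csv
import sqlite3

def load_csv(conn: sqlite3.Connection, csv_path: str) -> None:
    # Create the target table once, then bulk-insert every row from the CSV.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_ohlc "
        "(symbol TEXT, date TEXT, open REAL, high REAL, low REAL, "
        " close REAL, volume INTEGER)"
    )
    with open(csv_path, newline="") as f:
        rows = [
            (r["symbol"], r["date"], float(r["open"]), float(r["high"]),
             float(r["low"]), float(r["close"]), int(r["volume"]))
            for r in csv.DictReader(f)
        ]
    conn.executemany("INSERT INTO daily_ohlc VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
```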
Sure, IBM DB2 Universal Server and its data-mining add-ons are well suited, for example. Proper backups and storage are of course part of the same responsible attitude required for all types of IT work - regardless of StorageWorks libraries, tape robots, CD-RW, whatever. I think this is the preferred way for all institutional, corporate users. For a one-man-show operation, though, running and updating an advanced DB solution requires a lot of work and a lot of DBA knowledge. I used to admin a VoIP call-record Oracle 8i db on SunOS, as well as an Oracle-based backoffice system for a travel agency network on HP-UX. It was a lot of data and batch work, and not at all trivial. I spent a lot of time and effort getting the right performance out of those systems, using striped volumes and all kinds of platform tweaks and patches, not to mention all the SQL hints and index tuning needed as the data gets bloated. My experience tells me that I will not do the same for a personal trading system - not in a million years. Those who rely on professional consultants and need the other DB benefits could of course do it, but it's just too much overkill for simple backtesting and strategies for a single trader.
Well... perhaps.... Even for a lone trader who works with a lot of strategies and knows how to deal with SQL, it should not be a problem, even with hundreds of gigabytes of data. Most kids out of college can manage it with today's tools/systems. It just depends on what you want to do... and where you want to go. If all you ever want to do is work by yourself, then anything is OK. And I agree, working with a database can be a bottomless pit of time if you don't know what you are doing, or if the people who designed it didn't know what they were doing....
Tick data is hard to work with efficiently no matter what data structure you use. IMHO, the only reason to stop using a transactional database as your backing store is if the total size of the database gets out of hand. So it depends on time resolution, number of symbols, how much history you want, and how much you want to spend on disks and backup media. So I guess we're back to "it depends on the application." It's not a matter of computing power, it's a matter of statistical validity. The institutionals do not calculate the covariance matrix of the S&P 500 because it would not be statistically valid: there aren't enough degrees of freedom relative to the number of entries in a 500x500 covariance matrix. Using more history gives you better statistical validity but worse predictive value, since markets change over time. Roughly speaking, you would need 10,000 data points per stock for a valid covariance matrix of the S&P 500. That's almost 40 years of daily data. Martin
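A quick back-of-the-envelope check of those numbers (the observations-per-parameter framing is my own illustration of the degrees-of-freedom point, not the poster's):

```python
# Count the free parameters in a 500x500 covariance matrix and compare
# against the "10,000 data points per stock" figure quoted above.
n_stocks = 500
cov_params = n_stocks * (n_stocks + 1) // 2      # 125,250 distinct entries to estimate
days_per_stock = 10_000                          # figure quoted in the post
total_points = n_stocks * days_per_stock         # 5,000,000 observations in total

print(cov_params)                  # 125250
print(total_points / cov_params)   # ~40 observations per estimated parameter
print(days_per_stock / 252)        # ~39.7 years of daily bars ("almost 40 years")
```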
You're so right! I forgot to mention that I always think in terms of tick data and products like ES, EuroFX etc., but of course 1-min, 5-min or larger OHLC data will not be so much of a challenge. Good point. OTOH, if you're talking 20k symbols or something, you're back in the same territory... just like with tick data.
From my journal:
"------------------------------------------------------------------------
11 Nov 03
------------------------------------------------------------------------
Testing MySQL. Populated with 101M intraday records. Time 5.3 hours, disk space 4.3G, index 2.4G.
Query to select 22000 records - 300 sec. Query to select 1600 records - 3 sec.
Added 2 indexes ('open' and 'price') - 4 hours, 4.3G disk space.
"SELECT * FROM `rtquotes` where date='2003-11-06' and open>200 and open < 210" - 300 sec
Conclusion: not suitable (too slow)"
The same query would cost me only about a second with flat files.
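The flat-file side isn't shown in the post; here is a minimal sketch of why it can be near-instant, assuming the quotes are partitioned into one CSV file per trading day (file naming, columns, and the query_day helper are my assumptions): the date predicate selects a single small file, so only that day's records are ever scanned.

```python
import csv
from pathlib import Path

def query_day(data_dir: Path, date: str, lo: float, hi: float):
    # The date filter picks exactly one file; only that day's rows get scanned.
    day_file = data_dir / f"rtquotes-{date}.csv"
    with day_file.open(newline="") as f:
        return [r for r in csv.DictReader(f) if lo < float(r["open"]) < hi]

# rows = query_day(Path("quotes"), "2003-11-06", 200.0, 210.0)
```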
Effective caching in memory will ALWAYS help. Yes. See this post for information on effective algorithmic improvements: http://www.elitetrader.com/vb/showthread.php?s=&postid=594912#post594912 Are you talking about profiling? One should always profile their code. Here is a nice and inexpensive profiler for EXEs (with or without embedded debugging information): http://www.lw-tech.com/
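A small illustration of the "cache it in memory" point: memoize a per-day loader so repeated backtest passes only hit the disk once per file. The loader is a stand-in for illustration, not anyone's actual code.

```python
import csv
from functools import lru_cache

@lru_cache(maxsize=None)
def load_day(path: str) -> tuple:
    # First call reads the file; later calls with the same path return the
    # cached, immutable snapshot straight from memory.
    with open(path, newline="") as f:
        return tuple(tuple(row) for row in csv.reader(f))

# bars = load_day("quotes/2003-11-06.csv")   # disk read
# bars = load_day("quotes/2003-11-06.csv")   # served from the in-memory cache
```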