If you're storing your data in smaller-than-one-year segments, read in as many as it takes to get a year. If they're > 1 year, you've got some awfully big files there.
I store tick events in binary files, separated out by date. Each security gets its own directory, so for example all of my "SPY" tick data is in a directory called "STK_SPY". I load the files a few at a time on another thread as a back-test progresses. 99% of the time I am interested in looking at the data as a sequence of days, i.e. what happened between x and y dates, so it is very easy to implement these types of queries with this format.
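A minimal sketch of that layout, with assumed details not given in the post: a hypothetical fixed-width record of (timestamp, price, size), one file per date inside a per-security directory like "STK_SPY", read back in date order for a range query.

```python
import os
import struct
import tempfile

# Assumed record layout (not specified in the post):
# 8-byte float timestamp, 8-byte float price, 4-byte int size.
REC = struct.Struct("<ddi")

def write_day(path, ticks):
    """Write one trading day's ticks to its own binary file."""
    with open(path, "wb") as f:
        for t in ticks:
            f.write(REC.pack(*t))

def read_days(directory, dates):
    """Yield ticks for the requested dates, one file per date,
    already in time order because each file was written that way."""
    for d in dates:
        with open(os.path.join(directory, d + ".bin"), "rb") as f:
            data = f.read()
        for off in range(0, len(data), REC.size):
            yield REC.unpack_from(data, off)

# Demo with a temporary "STK_SPY" directory.
stk = os.path.join(tempfile.mkdtemp(), "STK_SPY")
os.makedirs(stk)
write_day(os.path.join(stk, "2010-01-04.bin"),
          [(34200.0, 113.33, 100), (34201.5, 113.35, 200)])
write_day(os.path.join(stk, "2010-01-05.bin"),
          [(34200.0, 113.50, 300)])

ticks = list(read_days(stk, ["2010-01-04", "2010-01-05"]))
print(len(ticks))  # 3
```

Because a "between x and y dates" query is just a list of filenames, there is no index or scan involved; the directory structure *is* the index.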
CSV and binary files look to me like they require some extra programming effort to achieve the flexibility of a database. Why not use SQL Server or MySQL to store data for backtesting purposes? Backtesting does not require real-time performance, so one of those relational databases would probably suffice. I've never used KDB, but it's built for storing data in real time, and that functionality is not needed for backtesting.
HDF5 and the file system both give you the ability to group your data (grouped by date and subgrouped by ticker, like others have said) and to store it pre-ordered by time. So if you only have one type of query, you can structure your data for that and completely avoid table scans or needing to sort anything. It is basically a straight shot from the disk. I have no experience with KDB, but I would guess it is more flexible if you need to query data in many different ways. Any of this beats relational databases, which have no concept of pre-ordered data, so you can't even do a quick binary-search lookup without a hefty index.
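The binary-search point above can be shown in a few lines. This is a toy illustration with made-up timestamps: because the on-disk data is already sorted by time, a time-range query is two O(log n) lookups rather than a table scan, and no separate index structure is needed.

```python
from bisect import bisect_left

# Tick timestamps (seconds since midnight), pre-sorted as stored on disk.
timestamps = [34200.0, 34200.5, 34201.0, 34205.0, 34210.0, 34215.5]

def time_slice(ts, start, end):
    """Index range [lo, hi) of ticks with start <= t < end.
    O(log n): no scan, no sort, because the data is pre-ordered."""
    return bisect_left(ts, start), bisect_left(ts, end)

lo, hi = time_slice(timestamps, 34201.0, 34210.0)
print(timestamps[lo:hi])  # [34201.0, 34205.0]
```

A relational database can only give you this access pattern by maintaining a B-tree index alongside the table, which is the "hefty index" cost the post refers to.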
They load differently. With a binary file you just open it and load it into memory; with SQL you have to go through each record. The difference in time and performance is huge. I remember in the old days, calculations that took an hour with binary files took 5-6 hours when I was using tick data stored in MS Access.
I generally backtest in R. I can load my data once, then backtest in a variety of different ways (or using different backtest variations, as your comment suggests) without having to read the data again. If I change my code, I just re-source it, but the structure containing my data stays present.
Let's take 6 months of intraday tick data (last trade only, no bid/offer and no market-depth data), and let's take QQQQ: can you store the whole data set in memory in R?
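A back-of-envelope answer, using assumed figures that are not from the thread (~126 trading days in 6 months, roughly 50,000 last-trade ticks per day for a liquid ETF like QQQQ, 16 bytes per tick for an 8-byte timestamp plus an 8-byte price):

```python
# All three inputs are rough assumptions, not measured values.
days = 126            # ~6 months of trading days
ticks_per_day = 50_000
bytes_per_tick = 16   # 8-byte timestamp + 8-byte price

total_mb = days * ticks_per_day * bytes_per_tick / 2**20
print(round(total_mb))  # 96
```

Under those assumptions the whole series is on the order of 100 MB, which comfortably fits in memory on an ordinary machine, so holding it resident in an R session is plausible.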