Just a little note on this. The level of programming here is going to be somewhat different from what ordinary programmers might expect, unless you're a systems or performance-tuning expert. The code for the "Bars" class currently satisfies two very critical requirements: (1) make the bars very simple to use in code, and (2) make them screamingly fast to fabricate from ticks (or from minute bars, as an optimization). I originally started all of this with a purist object-oriented approach, using interfaces, etc., because the conventional wisdom is to code smart first and then tune for performance afterward. Well, it required repeatedly refactoring those classes, and the whole object graphs that construct the bars from the ticks, to squeeze every last ounce of performance from the compiler. I'm certain most purists will have questions like "why did you do this?" or "why did you do that?" But I recommend, in general, that unless you have specific training and experience in that area, you primarily assess the code's worth by how well it meets those two requirements: is it fast, and is it easy to use when coding customer strategies? As an aside, ten years ago I interviewed for a job as an architect. After I successfully passed all the interviews, they took me to the CEO (relatively small company), who asked me only one question: "Which is more important, satisfying the customer requirements or making the best technology decision, when those are in conflict?" Like any good interviewee I tried to tease his own opinion out of him, but he was mute except to repeat his question. So I said, "Of course, the best technology solution." And he said, "We'll hire you, but not as an architect." And that was the end of the meeting. The business manager explained the story behind it: he and the CEO had formerly worked at the same company and held opposite philosophies.
So the (later-to-become) CEO stuck with the company, while the (later-to-become) business manager left out of disagreement with the company over design/technology decisions. The result was that the CEO became a millionaire and owner of his own firm, while the other now works for him as business manager. That educational experience nudged me firmly in the same direction, and it made me a much better architect too. So please, let's not dwell overly on the code design but focus on how well it works, and on the other things you mentioned like versioning and unit tests. There's no question the code can improve. But when it comes to the engine, which is on the "critical path" let's call it, I want to review any ideas for changes myself, because that's why this project is called "TickZOOM". Hey, maybe let's keep the last four capital letters for emphasis. Sincerely, Wayne
Big, By the way, no offense to db4o. It's a lot faster than the relational databases I tried. Even after spending a week optimizing MySQL so it did in-memory mapping, and using the higher-performance storage engine that doesn't offer transactional rollback, it still took MINUTES to load a few hundred thousand ticks. So db4o is certainly fast. The problem is, I have found that ticks are a VERY different beast from any other type of data I ever saw in the business world. We're usually impressed by a relational database that has a million rows of data. At one company where I worked they had THE largest Oracle installation in the world, and they still had only millions of rows in their tables. They finally had to switch to Teradata to upgrade their ability to handle large data sets. But with tick data, you're talking about hundreds of millions or billions of tuples or rows of data, and they're all very small (when compressed). It's been very challenging for me because I've worked on optimizing moving large data around, like images, at real-time speed; it's a different story entirely when each entity is only a few bytes. Also, it takes a long time of studying and experimenting to learn the CLR for .NET and its idiosyncrasies for performance. Many of its cool features are TOTALLY off limits because of all the instructions generated by the compiler and the horrible tick throughput if you use them. Just take delegates as one example. I thought they were cool. In C and C++ we used function pointers as a big performance boost; delegates perform horribly in C#. The list goes on and on. Optimizing code is very hard work. Sincerely, Wayne
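For contrast, the C/C++ function-pointer dispatch mentioned above looks something like this. This is a minimal sketch; the handler names and signature are made up for illustration and are not TickZoom's API:

```c
#include <assert.h>

/* Hypothetical tick-handler signature -- illustrative only. */
typedef double (*tick_handler)(double price, int size);

double volume_weighted(double price, int size) { return price * size; }
double price_only(double price, int size) { (void)size; return price; }

/* Dispatch through a function pointer: one indirect call, no heap
 * allocation or extra compiler-generated machinery per call. */
double process_tick(tick_handler h, double price, int size) {
    return h(price, size);
}
```

The point is that the indirect call is the entire cost; there is no object to allocate or invoke through, which matters when the call happens once per tick across hundreds of millions of ticks.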
I think you misunderstood me quite a bit. Load time is instantaneous. I was talking about "save" time, i.e., how long it took to save all those ticks in a db. The thing is, this save time is going to be moot since, for historical data, you only do it once, and not during your normal usage but beforehand. Once your historical db is done, you never save anything back into it; you just do queries. That's why I don't mind at all even if it took 10 hours, because once it is saved, that's it: you have your db, and you can now use it.
Cool, can you try 100+ million to get a more quantitative idea of the load time? If it can load 100+ million in less than (or even close to) 15 seconds like TickZoom, then I'll switch TickZoom to db4o RIGHT NOW!!!! I'll always do anything that speeds up load time. It is often the case in performance testing that you must greatly increase the size of the test to get a clear idea of the speed. I can't wait to get precise load-time numbers. It would be very nice to have the querying ability. Oh, what's the max db size? How do they handle multi-gigabyte databases? Sincerely, Wayne
Big, There's another issue with db4o. I called to speak to their sales; I'll try again tomorrow. They don't publish the price. Unfortunately, it's not exactly free. It's a dual license, just as I propose: they use GPL plus a commercial license. That means if my license plan works, we have to pay for it. If that's reasonable, fine. Trouble is, I think our need for a database is minimal, though you can convince me otherwise. What kind of querying do you think we will need? I can only think of date/time range and symbol list. I already have a way in mind to implement that, and it won't cost anything other than a day of my time. If db4o is high dollar, or they get $$$$$$ in their eyes at the idea of us incorporating it in a product, then we may need to consider a different db, or just do it ourselves since it's so easy for our needs. As far as querying goes, there's very little useful to query off of raw ticks, since they're not computed into bars or anything else useful yet. By the way, processing the ticks in TickZoom goes as fast as loading them, so I do that in parallel in a separate thread, which makes it appear that processing doesn't take any time at all. Anyway, I can't imagine db4o loading any faster than TickZoom, since TickZoom reads directly from the binary bytes in the file into the object. There's no way to improve on that, except that when loading multiple instruments or files, they can be done in separate threads. If it's just symbol and date range, I'd rather do it myself. What other kind of querying do you imagine we need? Anybody? Sincerely, Wayne
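To make that concrete, here is a rough sketch of the approach described above: fixed-size tick records copied straight from raw file bytes into a struct, with the date/time-range "query" as a simple scan (symbol selection would just pick which file to read). The field layout and names are hypothetical, not TickZoom's actual format:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical packed tick record -- layout is illustrative only. */
typedef struct {
    int64_t utc_time;  /* e.g. epoch microseconds */
    int32_t price;     /* fixed-point price */
    int32_t size;
} Tick;

/* Load one tick straight from raw file bytes into the struct: just a memcpy,
 * with no parsing or object mapping in between. */
Tick read_tick(const unsigned char *buf) {
    Tick t;
    memcpy(&t, buf, sizeof(Tick));
    return t;
}

/* Date/time-range query: copy the matching records into out, return the count. */
int filter_range(const Tick *in, int n, int64_t from, int64_t to, Tick *out) {
    int count = 0;
    for (int i = 0; i < n; i++)
        if (in[i].utc_time >= from && in[i].utc_time <= to)
            out[count++] = in[i];
    return count;
}
```

Since ticks arrive sorted by time, a real implementation could binary-search for the range endpoints instead of scanning, but even the scan shows why a general-purpose database adds little here.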
On the licensing issue... I'm not sure what the discussion is. What's the money for? I have no problems paying whatever fees, since a few thousand is just a rounding error at the end of an active trading month. I think that will be the case for anyone who deploys "real" applications out there. But again, what's the money for? Is anyone looking to get rich off supporting or developing the platform? I personally am just looking for a platform that lets me get rich by trading my strategy. From looking over the thread, that seems to be what motivates most people. So I say, punt the subject. Release it GPL so we all know we're contributing code to something we'll have source code access to. And if/when there are enough users out there that people will pay for support... I think support organizations will just spontaneously create themselves, and that's just fine for me. Linus didn't start by laying out the framework for Red Hat, he just wrote code.
I'd definitely like to help out with this project as you get it off the ground. As for db4o, as long as you release your code under the GPL you don't have to pay for a commercial license. Also, have you checked into using sourceforge.net for hosting your project?
Heech, thanks for that perspective. I was just coming to the conclusion that nobody cares about this issue, and you proved it. So I will start TickZoom out with GPLv3, as you say. And it'll be clear up front that the intention is to split into dual commercial and GPL licensing down the road, with a written commitment to always keep the code open source. You might think GPLv3 requires always staying open source, but that is up to the owner of the license. Most projects have committers surrender their rights to any patches, so while the code is GPLv3, it's owned by the company, and they can change the terms later. Unfortunately, some projects committed suicide by changing their license entirely to commercial, which angered everyone who had contributed. So we must protect both the community and the ability to pursue a commercial interest. It'll be quite some time before TickZoom is battle-tested enough to go commercial anyway; for example, I just read that db4o took three years to do so. Sincerely, Wayne
Wayne, Great, sounds good. And the tick database issue... again, that to me sounds like something the community of developers can work on in parallel. I don't see why the project wouldn't be able to support multiple databases... I assume you have some kind of database working now, and that's good enough for me. I'm biased... I'm only working on minute bars, so I really don't care about the tick bar issue... but I'm probably not the only one. And I'm also greedy because I "feel" like getting started on a new platform as soon as possible. So, here's hoping you'll plant that flag in the ground soon.
Heech, I'm going to do it the right way. I found a great (well-recommended) service for only $15/month. They host the wiki, bug tracker, and Subversion all on the same server. The wiki and bug tracker are a tool called Trac. Oh, it also includes a web-based code browser with syntax coloring. What's VERY cool about it is that all the features are integrated. In other words, the wiki pages (for documentation) are version controlled along with the code in Subversion. Also, when posting issues/defects you can use wiki markup to include links, etc. Finally, when committing code, if you simply paste the issue numbers in the commit comments, it'll automatically update the status in the issue database. ALSO, they have dual SVN instances running in a cluster, so if one fails it instantly fails over to the other one, and those repositories sit on separate RAID arrays. By the way, that's the configuration the SVN team recommends, because everything is always replicated (backed up) continuously, so there's never any chance of losing code. All in all, I'm amazed. And they have tons of customers. Don't you agree that's a killer cool development environment? I'm sure you have limited time to work on this, like me; who wants to deal with lost repository code and down servers? Anyway, I'll get that up soon. I have the forum set up already, but no point mentioning it here yet. Or is there? Wayne