Understood, I guess. However, can you divulge whether the data mining solution you use is open source or not? Thanks.
Some are open source; some are not. If you really want to start learning a very capable open source tool for data mining, I'd recommend R.
dtrader, just curious whether you feel R is better to delve into for data mining than RapidMiner? I assume you do, as you almost stated as much, but do the capabilities of R match up to RapidMiner's? What would be some of the advantages of R? Thanks.
Between R and RM (RapidMiner), R is much more flexible and, IMO, easier to use. RapidMiner also did not have much literature, last I checked. It takes a while to understand how to put things together in RM, and if you aren't already proficient in data mining concepts, you will have a hard time understanding how to put together an RM script. I can't think of anything RM can do that R can't. R, however, can do many things RM cannot. As an example, RM cannot estimate the parameters of an AR (autoregressive) model on a time series. R has so many capabilities RM lacks, it isn't even funny. R is first and foremost a statistical tool; RM is only a data mining tool with some minor statistical functions. And as I mentioned, RM's GUI is a resource hog; I experienced many crashes playing with RM, but never in R. Hope that convinces you. I gave an example of a NN for RM on the thread you mentioned. There are also many examples here and on Google for R. A good layman's intro to data mining is Data Mining Techniques by Berry and Linoff. Get to understand the basic concepts, and then you can move on to the many, many R books, like Data Analysis and Graphics Using R by Maindonald and Braun. Cheers. P.S. If you do not have much programming experience, RM might seem simpler to use, as it is a GUI. The problem is that it just is not that simple, unfortunately, and there are not many examples or much literature to help you understand implementation.
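For readers wondering what "estimating an AR model parameter" involves, here is a minimal sketch under stated assumptions. It is written in plain Python only so that it is self-contained; in R the same job is handled by the built-in `ar()` function. The function names `simulate_ar1` and `fit_ar1`, the coefficient 0.6, and the series length are all illustrative choices, not anything from the posts above. It simulates an AR(1) series x[t] = phi * x[t-1] + noise and recovers phi by least squares on the lagged values.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Simulate n points of x[t] = phi * x[t-1] + e[t], with e[t] ~ N(0, 1)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def fit_ar1(series):
    """Least-squares estimate of phi from a mean-centered series."""
    mean = sum(series) / len(series)
    centered = [v - mean for v in series]
    # Regress x[t] on x[t-1]: phi_hat = sum(x[t] * x[t-1]) / sum(x[t-1]^2)
    num = sum(cur * prev for cur, prev in zip(centered[1:], centered[:-1]))
    den = sum(prev * prev for prev in centered[:-1])
    return num / den


# Illustrative run: the true coefficient here is an assumed value of 0.6.
series = simulate_ar1(phi=0.6, n=5000)
phi_hat = fit_ar1(series)
print(round(phi_hat, 3))
```

The printed estimate should land near the assumed 0.6, with some sampling error. Higher-order AR models and GARCH need more machinery than this, which is the practical argument for using a package that already implements them.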
Thanks for the detailed answer. Yes, you have likely convinced me. As I am not really familiar with the workings of R except for what you have mentioned and some that I have read, my big question is: would I have access to some of the algorithms that are contained within RM without actually having to program them into R? The way I understand it, there are many "modules" (not sure if that is the right word) out there for R that others have programmed, so maybe that would be my solution? Thanks again for your help. Also, I will take a look at that book rec.
YES! Do not be afraid of R, even if you are not a programming guru. Take a little time to play with it; buy a simple book to try examples: http://www.amazon.com/Introductory-...=sr_1_1?ie=UTF8&s=books&qid=1252269370&sr=8-1 I mentioned the first book because it has many data mining function examples, but the above is a good intro. There are also many decent free intro sites on the web. R uses libraries and packages, which in turn contain functions already created for you. There are numerous examples available to play with. An example would be a time series package that contains models like AR, GARCH, etc.
Thanks for that rec as well. Funny, just a little bit ago I saved this one to my list at Amazon. Also found this one as well: http://www.amazon.com/Handbook-Stat...TF8&coliid=I20NUHUT4UMPT2&colid=2JJB5F5TAZBVC so I will get one of those. While I am at it, and as you are right that I am not a programming guru, what language would you recommend for programming ideas? Maybe R would do the trick for what I am looking for, not sure. I use VBA in Excel right now, but I have seen a lot of comments that it is antiquated, not useful, etc. I am not looking to create a full-blown backtesting engine, just to program out some of my analyses and ideas. Do you think learning Java, C++, or something like that would be useful? Thanks.
Just found this as well: http://rattle.togaware.com/ So, I believe your rec. is the way to go. Seems to be a lot out there for R.
VBA has some limitations and some advantages. I won't argue the philosophies here. But I can tell you that building a full-blown backtester is not impossible in VBA. I'll leave it there. If you have not done a lot of programming, I suggest skipping the lower-level languages like C#, Java, etc. They are good for production and efficiency (when millisecond latencies count), but they suck in terms of learning and implementing.