Most programs use only one processor because rewriting code for multiple processors is both difficult and expensive. Most sequential optimization and backtesting algorithms are not well suited to parallel processing. There is a neat trick you can use to circumvent that, and Michael Harris has done it with his APS software: you run multiple instances of the program concurrently, and each instance does a portion of the task. I was one of the first to request multi-CPU processing for APS, and this was the solution I got in response. It works nicely. Maybe you should ask the Tradestation people to do the same. Running the program for days to get results is not very productive.
Writing a multi-threaded application is not that hard, nor is it all that new; Windows NT has supported SMP since the mid-'90s: http://msdn.microsoft.com/en-us/library/ms810434.aspx He's making you buy multiple licenses? That's not a solution, that's a rip-off. BTW, you might want to look into the scalability of SMP on Windows with the Intel CPU architecture: additional CPUs do not give you a linear performance gain. Since you have to buy multiple licenses anyway, you'd be better off running APS on separate machines.
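The non-linear scaling mentioned above is usually explained by Amdahl's law: if only a fraction p of the work can run in parallel, n CPUs give a speedup of at most 1/((1-p) + p/n). A quick back-of-the-envelope sketch (the 90% figure is a made-up illustration, not a measurement of APS or any real backtester):

```python
def amdahl_speedup(p, n):
    """Maximum speedup with parallel fraction p on n CPUs (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

# Even if 90% of a backtest parallelizes, 4 CPUs give well under 4x:
print(round(amdahl_speedup(0.9, 4), 2))    # ~3.08
# ...and piling on CPUs caps out near 1/(1-p) = 10x:
print(round(amdahl_speedup(0.9, 100), 2))  # ~9.17
```

The serial fraction (data loading, result aggregation, lock contention) is what keeps the gain from being linear, whether you use threads on one box or separate machines.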
You have to buy multiple licenses to make use of multiple CPUs??!!! [edited: sorry, someone beat me to the punch above] "Most sequential optimization and backtesting algorithms are not well suited to parallel processing." Could you elaborate on that for me? I'd like to hear of some use cases that wouldn't be suitable.
I know a 2.66 GHz dual-core Pentium isn't a 3.0 GHz or better, or an i7, but can't my processor use up to 4GB of RAM with WinXP? I'm avoiding Vista right now. What exactly are you saying I need, a quad-core or what? Given that Tradestation doesn't utilize multithreading, how much of an advantage would there be with a quad-core, or going from a 2.66 to a 3.0 GHz dual-core?
While the total memory space of a 32-bit CPU is 4GB, the most you can use is about 3.5GB, because I/O (video, disk, network, etc.) is mapped into the remaining 512MB. You need to move over to a 64-bit CPU and OS in order to use more than 3.5GB of RAM.
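The arithmetic behind that limit, for anyone curious (the 512MB reserved for memory-mapped I/O is the figure from the post above; the exact reservation varies by chipset):

```python
total = 2**32        # a 32-bit pointer addresses 4 GiB
mmio = 512 * 2**20   # 512 MiB carved out for memory-mapped I/O
print((total - mmio) / 2**30)  # 3.5 GiB left for actual RAM
```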
That won't work, since the execution time for a given task will remain the same on each of the machines. Writing multi-threaded applications is not that hard, but it is not enough to improve execution time. You will pay either for the cost of the programming or for multiple instances of the program. Actually, I paid just an additional 20% to get the second license. Maybe you would like to explain to us how it is done. Consider the trivial nested-loop example below:

x = 0
y = 0
for i = 0 to 100
    x = x + i
    for j = 1 to 1000
        y = x + 2j
    end
end
print(x, y)

How do you make use of multiple CPUs to get the result faster? Maybe there is a way, I don't know; I am not a good programmer myself. On the other hand, for the following example it is much easier:

x = 0
y = 0
for i = 0 to 100
    x = x + i
end
for j = 1 to 100
    y = y + 2j
end
print(x, y)

You can do it as follows, I suppose:

Thread 1:
x = 0
for i = 0 to 100
    x = x + i
end

Thread 2:
y = 0
for j = 1 to 100
    y = y + 2j
end

print(x, y)

Now you have saved the execution time of thread 1, assuming thread 2 takes longer.
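The two-independent-loops case above can be sketched concretely in Python. One caveat worth flagging: CPython's GIL means plain threads won't actually speed up CPU-bound loops (you'd use processes for that), but the structure — start both workers, join both, then print — is the same either way. Also note the print must wait for both threads, not sit inside one of them:

```python
import threading

results = {}

def sum_i():
    x = 0
    for i in range(0, 101):   # i = 0 to 100
        x += i
    results["x"] = x

def sum_j():
    y = 0
    for j in range(1, 101):   # j = 1 to 100
        y += 2 * j
    results["y"] = y

t1 = threading.Thread(target=sum_i)
t2 = threading.Thread(target=sum_j)
t1.start()
t2.start()
t1.join()   # wait for BOTH threads before printing,
t2.join()   # otherwise print may see a missing result
print(results["x"], results["y"])  # 5050 10100
```

The nested-loop version is the hard case precisely because the inner loop reads x while the outer loop is still updating it, so the iterations can't be handed out independently without restructuring.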
Implicit in that is the ability to split the data set, or segment the problem space, across machines. If so, there will certainly be an improvement in the total clock time to complete the entire job. Uh, that's my point. An application that iterates multiple pattern matches across multiple symbols and costs $1,500 should already be multi-threaded. Charging you extra for what amounts to an amateurish hack shouldn't be tolerated in something that's advertised for trading professionals. Actually, there's a special term for this sort of problem, and it's treated fairly well here: http://en.wikipedia.org/wiki/Embarrassingly_parallel
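Scanning patterns across many symbols is the textbook embarrassingly parallel case: each symbol's work is independent, so you just split the symbol list across workers. A minimal sketch in Python using a process pool (scan_symbol and its stand-in computation are hypothetical placeholders, not APS's or Tradestation's actual API):

```python
from multiprocessing import Pool

def scan_symbol(symbol):
    # Placeholder for a per-symbol pattern scan. Any per-symbol work
    # that touches no shared state parallelizes this way.
    return symbol, sum(ord(c) for c in symbol)  # stand-in computation

symbols = ["AAPL", "MSFT", "GOOG", "IBM", "SPY", "QQQ"]

if __name__ == "__main__":
    with Pool(processes=4) as pool:          # roughly one worker per core
        results = pool.map(scan_symbol, symbols)
    print(dict(results))
```

This is the same split-the-instances trick described earlier in the thread, except the program does it internally instead of making the user buy and launch extra copies.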
The biggest issue/problem with the Tesla architecture is that you have to re-design your software around their APIs. For some that's easy, but for many it's probably not a good fit. If your software is already highly distributed and parallel, without lots of locking, CUDA might fit. Otherwise, it's a big project to support it. Personally, I'd rather see a board with 100 386-instruction-set CPUs, where normal software would have an easier time exploiting the parallelism. Edit: actually, 386 is not that important. What's key, imo, is a general-purpose CPU with good compilers, preferably a well-known architecture such as 386, MIPS, SPARC or the like, with a solid kernel on it.
I agree. NVidia labeling the Tesla as "general purpose" is misleading at best. It's not really general purpose, but here's a 64-core MPP you can get on a single board: http://www.tilera.com/products/processors.php And something like this, if it ever sees the light of day, would be much more economical power-wise and give you a 12-way cluster: http://www.theregister.co.uk/2009/05/15/dell_does_via_nano/