Tesla Personal Supercomputer by Nvidia

Discussion in 'Automated Trading' started by LAtoLV, Mar 5, 2009.

  1. Most programs use only 1 processor because the task of rewriting code for multiple processors is both difficult and expensive. Most sequential type of optimization and backtesting algorithms are not exactly suitable for multiple processing.

    There is a neat trick you can do to circumvent that, and Michael Harris has done it with his APS software. Basically, you run multiple instances of a program concurrently and each instance does a portion of the task. I was one of the first to request multiple CPU processing for APS and I got this solution in response. It works nicely. Maybe you should ask the Tradestation people to do the same. Running the program for days to get results is not very productive.
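A minimal sketch of the multiple-instance trick, in Python. How APS actually divides the work is not described in this thread, so `portion_for_instance` and the symbol list are hypothetical. The idea is only that instance k of n takes every n-th item, so the instances together cover the whole job with no overlap.

```python
def portion_for_instance(items, instance, total_instances):
    """Return the share of work for one instance (0-based index).

    Hypothetical helper: instance k of n takes items k, k+n, k+2n, ...
    """
    if not 0 <= instance < total_instances:
        raise ValueError("instance must be in range(total_instances)")
    return items[instance::total_instances]


# Illustrative symbol list; each separately launched copy of the program
# would be started with its own instance number.
symbols = ["AAPL", "MSFT", "IBM", "GE", "XOM", "T", "KO"]
portions = [portion_for_instance(symbols, k, 3) for k in range(3)]
```

Each running copy then processes only its own portion, and the results are merged by hand or by a small script afterwards.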
     
    #11     May 18, 2009
  2. jprad

    Writing a multi-threaded application is not that hard, nor is it all that new. Windows NT had SMP back in '95:

    http://msdn.microsoft.com/en-us/library/ms810434.aspx

    He's making you buy multiple licenses? That's not a solution, that's a rip-off.

    BTW, you might want to check into the scalability of SMP on Windows with the Intel CPU architecture. Additional CPUs do not give you a linear performance gain.
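One common way to reason about the non-linear gain is Amdahl's law. The post does not cite it, and it models only the serial fraction of a program, not hardware effects such as memory-bus contention. A minimal sketch in Python, with the 90% parallel fraction chosen purely for illustration:

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Upper bound on speedup when only part of the run time parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


# Assumed workload: 90% of the run time can be spread across CPUs.
speedups = {n: round(amdahl_speedup(0.9, n), 2) for n in (1, 2, 4, 8)}
print(speedups)  # {1: 1.0, 2: 1.82, 4: 3.08, 8: 4.71}
```

Under that assumption, eight CPUs give less than a 5x gain, because the serial 10% never gets any faster.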

    Since you have to buy multiple licenses, you'd be better off running APS on separate machines.
     
    #12     May 18, 2009
  3. byteme

    You have to buy multiple licences to make use of multiple CPUs??!!!

    [edited: sorry, someone beat me to the punch above]

    "Most sequential type of optimization and backtesting algorithms are not exactly suitable for multiple processing."

    Could you elaborate on that for me? I'd like to hear of some use cases that wouldn't be suitable.
     
    #13     May 18, 2009
  4. I know a 2.66 dual-core Pentium isn't a 3.0 or better, or an i7, but can't my processor use up to 4GB of RAM with WinXP? I'm avoiding Vista right now.

    What exactly are you saying, that I need a quad-core? Given that Tradestation doesn't utilize multithreading, how much of an advantage would there be with a quad, or in going from a 2.66 to a 3.0 dual core?
     
    #14     May 18, 2009
  5. jprad

    While the total address space of a 32-bit CPU is 4GB, the most RAM you can use is about 3.5GB, because I/O (video, disk, network, etc.) is mapped into the remaining 512MB or so. The exact figure depends on the hardware.

    You need to move over to a 64-bit CPU and OS in order to use more than that.
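The arithmetic behind those figures, as a short Python sketch. The 512MB reservation is the number quoted in the post and is treated here as an assumption, since the real amount differs between machines.

```python
GIB = 2**30
MIB = 2**20

address_space = 2**32          # bytes addressable with 32-bit addresses
io_reserved = 512 * MIB        # figure quoted in the post; varies by machine
usable_ram = address_space - io_reserved

print(address_space / GIB)  # 4.0
print(usable_ram / GIB)     # 3.5
```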
     
    #15     May 18, 2009
  6. Actually, some advanced techniques parallelize easily, like Monte Carlo and the bootstrap.
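A minimal sketch of why Monte Carlo splits so easily, in Python. Each chunk of trials uses its own seeded generator and never talks to the others, so the chunks can run in separate processes. The Gaussian "trial" is a placeholder, and `simulate_chunk` and `monte_carlo_mean` are hypothetical names, not part of any trading package.

```python
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def simulate_chunk(args):
    """Run one independent batch of trials and return the sum of outcomes."""
    seed, trials = args
    rng = random.Random(seed)      # separate generator per chunk
    # Placeholder trial: a Gaussian "return"; swap in a real model.
    return sum(rng.gauss(0.0, 1.0) for _ in range(trials))


def monte_carlo_mean(total_trials, chunks, executor_cls=ProcessPoolExecutor):
    """Split the trials into chunks, run them in a pool, average the results.

    Pass executor_cls=ThreadPoolExecutor to stay in-process, e.g. for testing.
    """
    per_chunk = total_trials // chunks
    jobs = [(seed, per_chunk) for seed in range(chunks)]
    with executor_cls(max_workers=chunks) as pool:
        partial_sums = list(pool.map(simulate_chunk, jobs))
    return sum(partial_sums) / (per_chunk * chunks)
```

On Windows, a call that uses the default process pool has to sit under an `if __name__ == "__main__":` guard, because worker processes re-import the main module.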
     
    #16     May 18, 2009
  7. That won't work since the execution time will remain the same for a certain task for each of the machines.

    Writing multi-threaded applications is not that hard, but it is not enough to improve execution time.

    You will either pay for the cost of the programming or for multiple instances of a program. Actually, I paid just an additional 20% to get the second license.

    Maybe you would like to explain to us how it is done. Consider the trivial nested loop example below:

    x = 0.0
    y = 0.0
    for i = 0 to 100
        x = x + i
        for j = 1 to 1000
            y = x + 2*j
        end
    end
    print(x, y)

    How do you make use of multiple CPUs to get the result faster? Maybe there is a way, I don't know. I am not a good programmer myself.
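One way to look at the nested-loop question, sketched in Python and reading the pseudocode's `2j` as `2*j`. Two observations do the work: `y` is overwritten rather than accumulated, so only its last assignment survives, and `x` after outer step i is the running sum 0 + 1 + ... + i = i(i+1)/2, which can be computed without running the earlier iterations. The function names are illustrative.

```python
def nested_sequential():
    """Direct transcription of the nested-loop pseudocode."""
    x = 0.0
    y = 0.0
    for i in range(0, 101):          # i = 0 to 100 inclusive
        x = x + i
        for j in range(1, 1001):     # j = 1 to 1000 inclusive
            y = x + 2 * j            # overwrites y on every pass
    return x, y


def x_after(i):
    """Closed form for x after outer step i; needs no earlier iterations."""
    return i * (i + 1) / 2


def nested_closed_form():
    """Same result without looping: only the last assignment to y survives."""
    x = x_after(100)
    y = x + 2 * 1000
    return x, y
```

Both functions return `(5050.0, 7050.0)`. Because `x_after(i)` removes the dependence between outer iterations, a variant that did real work in the inner loop could hand different ranges of `i` to different CPUs.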

    On the other hand, for the following example it is much easier:

    x = 0.0
    y = 0.0
    for i = 0 to 100
        x = x + i
    end
    for j = 1 to 1000
        y = y + 2*j
    end
    print(x, y)

    You can do it as follows I suppose:

    Thread 1:

    x = 0.0
    for i = 0 to 100
        x = x + i
    end

    Thread 2:

    y = 0.0
    for j = 1 to 1000
        y = y + 2*j
    end

    After both threads finish:

    print(x, y)

    Now you have saved the execution time of thread 1, assuming thread 2 takes longer.
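The two-thread split above can be sketched in Python with the standard `concurrent.futures` module. The function names are illustrative, and the loop bounds follow the threaded pseudocode (j runs to 1000).

```python
from concurrent.futures import ThreadPoolExecutor


def sum_i():
    x = 0.0
    for i in range(0, 101):      # i = 0 to 100
        x = x + i
    return x


def sum_2j():
    y = 0.0
    for j in range(1, 1001):     # j = 1 to 1000
        y = y + 2 * j
    return y


def run_both():
    with ThreadPoolExecutor(max_workers=2) as pool:
        fx = pool.submit(sum_i)
        fy = pool.submit(sum_2j)
        # result() blocks until each loop has finished, so both values
        # are complete before they are used.
        return fx.result(), fy.result()
```

In standard CPython the global interpreter lock keeps pure-Python loops like these from running at the same time, so this shows the structure rather than a real speedup. Swapping in `ProcessPoolExecutor` is the usual route for CPU-bound work.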
     
    #17     May 18, 2009
  8. jprad

    Implicit is the ability to split the data set or segment the problem space across machines. If so, there certainly will be an improvement in total clock time to complete the entire job.

    Uh, that's my point. An application that iterates multiple pattern matches across multiple symbols and costs $1,500 should already be multi-threaded.

    Charging you extra for what amounts to an amateurish hack shouldn't be tolerated in something that's advertised for trading professionals.

    Actually, there's a special term for this sort of problem and it's treated fairly well here:

    http://en.wikipedia.org/wiki/Embarrassingly_parallel
     
    #18     May 18, 2009
  9. The biggest issue/problem with the Tesla architecture is that you have to re-design your software around their APIs. For some that's easy, but for many, it's probably not a good fit.

    If your software is already highly distributed and parallel without lots of locking, CUDA might fit. Otherwise, it's a big project to support it.

    Personally, I'd rather see a board with 100 386-instruction-set CPUs, where normal software would have an easier time exploiting the parallelism.

    Edit: actually, 386 is not that important. What's key imo is a general-purpose CPU with good compilers, preferably a well-known architecture such as 386, MIPS, SPARC or the like, and a solid kernel on it.
     
    #19     May 18, 2009
  10. jprad

    I agree. Nvidia labeling the Tesla as "general purpose" is misleading at best.

    It's not really general purpose, but here's a 64-core MPP you can get on a single board:

    http://www.tilera.com/products/processors.php

    But something like this, if it sees the light of day, would be much more economical power-wise and give you a 12-way cluster:

    http://www.theregister.co.uk/2009/05/15/dell_does_via_nano/
     
    #20     May 18, 2009