Thousand+ core cluster of Raspberry Pi

Discussion in 'Hardware' started by nitro, Dec 12, 2015.

  1. nitro

    nitro

    I'd rather stick the O-ring in the ice water than philosophize about it.

     
    #11     Dec 14, 2015
  2. nitro

    nitro

    feature-intelligent-rack-pdus.jpg

    http://www.raritan.com/products/power-distribution/intelligent-rack-pdus

    One problem I see is how to design a PDU that can connect thousands of PiZeros to it. Having to deal with all those wall-wart transformers is a huge pain in the ass.

    You don't want to deal with thousands of these guys in the lower left-hand corner. You just want the cable in the lower right-hand corner, connecting directly from the PiZero to a PDU/USB hub.

    pizkit.jpg

    In essence, you want an intelligent, high-quality powered USB hub in a rack-PDU form factor that can handle large volumes of connections, but just for the power.

    If someone designed a PDU that took just the cable from the PiZero, with no transformer, then given how low-powered these computers are, it would be possible to fit, say, 128+ power receptacles on the PDU.

    The goal is a 1U chassis with a bunch of PiZeros in it that can be racked easily, cooled easily, and maintained easily, getting something like 128 to 256 cores per 1U for a fraction of the price/power of an equivalent cluster built from traditional computers doing the same thing.
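As a rough sanity check on the power side, here is the budget for one fully loaded PDU. The per-board figures are assumptions (a Pi Zero is commonly quoted at well under 1 W idle and around 1 W or so under load), not measurements:

```python
# Rough power budget for 128 Pi Zeros on one PDU.
# ASSUMED draw per board: ~1.2 W worst case at 5 V (not measured).
BOARDS = 128
WATTS_PER_BOARD = 1.2
VOLTS = 5.0

total_watts = BOARDS * WATTS_PER_BOARD     # worst-case wattage for the rack unit
total_amps = total_watts / VOLTS           # total current on the 5 V rail

print(f"{BOARDS} boards -> {total_watts:.0f} W, {total_amps:.1f} A at 5 V")
```

Even at the pessimistic end, the whole 1U draws on the order of a single desktop PC; the hard part is distributing ~30 A of 5 V cleanly, which is exactly what a purpose-built PDU/hub would have to solve.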

    Dreaming a little more, wireless power would RULE. Even something like the charging pads at Starbucks, where you power your phone by simply laying it on the pad, would be ideal in this case.
     
    Last edited: Dec 14, 2015
    #12     Dec 14, 2015
    tom2 likes this.
  3. nitro

    nitro

    linux.png
     
    #13     Dec 14, 2015
    Gambit and tom2 like this.
  4. nitro

    nitro

    So that people don't think this is just fun and games, consider one of my systems. I am able to analyse 3000 symbols (for now; the number could easily double).

    Say I want to analyse a year's worth of tick data. Say a day has an average of 10,000 Bid/Ask quotes per symbol. Say I am able to push about 10 B/A quotes a second through the entire back-testing framework. Here is the problem:

    3000 * 10000 = 30 million quotes a day
    30,000,000 * 250 trading days a year = 7,500,000,000 Bid/Ask quotes a year

    I can do 10 a second so

    7,500,000,000 / 10 = 750,000,000 seconds to finish
    750,000,000 / 60 = 12,500,000 minutes to finish
    12,500,000 / 60 = 208,333 hours to finish
    208,333 / 24 = 8680 days to finish
    8680 / 365 = about 24 years to finish

    Now imagine I had a one-thousand-node cluster. Since this is an embarrassingly parallel computation, I can divide 8680 / 1000 to get just about 9 days to push an entire year's worth of tick data on 3000 symbols through the system end to end. That I can live with.
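The back-of-the-envelope above can be scripted, so the numbers update if any input changes (symbol count, throughput, node count):

```python
# Back-testing runtime estimate, using the figures from the post above.
SYMBOLS = 3000
QUOTES_PER_DAY = 10_000      # average Bid/Ask quotes per symbol per day
TRADING_DAYS = 250
QUOTES_PER_SEC = 10          # end-to-end throughput of the framework
NODES = 1000                 # embarrassingly parallel -> near-linear split

total_quotes = SYMBOLS * QUOTES_PER_DAY * TRADING_DAYS   # 7.5 billion
serial_days = total_quotes / QUOTES_PER_SEC / 86_400     # seconds -> days

print(f"serial: {serial_days / 365:.1f} years")          # ~23.8 years
print(f"on {NODES} nodes: {serial_days / NODES:.1f} days")  # ~8.7 days
```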

    The program is already cluster aware. I just need the nodes. AWS/EC2 is very expensive.
     
    Last edited: Dec 29, 2015
    #15     Dec 29, 2015
    Gambit likes this.
  5. Gambit

    Gambit

    Could this be rigged up for cryptocoin (e.g. Litecoin) mining?
     
    #16     Dec 29, 2015
  6. vicirek

    vicirek

    10 per second is very slow. Are you using asynchronous/parallel coding practices? You should review the code and see whether any areas need algorithmic improvement and optimization. As suggested earlier, GPU computing is very often a better choice than cluster computing. Before spending time and money on hardware, use a smaller sample that is manageable on standard equipment. If your method does not show promising results on a small sample, then running the same thing on a massive data set is not going to make a difference. I would suggest focusing on algorithm and program optimization first rather than on hardware.
     
    #17     Dec 29, 2015
    Deuteronomy_24_7 and sysdevel99 like this.
  7. nitro

    nitro

    10 a second is slow, but trust me, it has to be this way. I might be able to get it to 30 a second, but what good would that do me? I need it to be 1000x faster, not 3x.

    As far as running on a GPU, it is possible, but so is going to Mars. The amount of work to port to a GPU is probably intense.

    The system is extremely promising. If it were merely promising, I wouldn't waste the effort trying to test it at the intensity described here.

    Take note, this is running the exact same algorithm on different data, so it is embarrassingly parallel.
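Since it is the same algorithm over disjoint symbol sets, the split is just a partition of the symbol list. A minimal sketch, with `multiprocessing` standing in for the cluster and `analyse()` as a hypothetical stand-in for the real framework:

```python
from multiprocessing import Pool

def analyse(symbol):
    # Stand-in for the real back-testing framework: each worker runs the
    # identical algorithm on its own symbol's tick data, with no shared state.
    return symbol, len(symbol)   # dummy result

# Hypothetical symbol universe of 3000 names.
symbols = [f"SYM{i:04d}" for i in range(3000)]

if __name__ == "__main__":
    with Pool() as pool:
        # No coordination between tasks -> embarrassingly parallel.
        # On a real cluster, each node would take a slice of `symbols`.
        results = dict(pool.map(analyse, symbols))
    print(len(results))   # 3000
```

Because no task ever talks to another, swapping the local `Pool` for 1000 networked nodes changes only the dispatch layer, not the algorithm.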
     
    #18     Dec 29, 2015
  8. vicirek

    vicirek

    Embarrassingly parallel sounds like a good candidate for a GPU, because that is what GPUs are all about. The only issue is moving 15 - 30 GB back and forth between CPU and GPU fast. Programming GPUs is much easier today with CUDA or C++ AMP, for example, since it is C/C++ code with a few extensions, but it can be done in many other languages through different GPU ports. Most of the time algorithms use hybrid CPU/GPU computation, since not all computations are best suited for the GPU. I just added these remarks for completeness. The theoretical single-GPU speedup is definitely not 1000x, but rather 100x, maybe more (theoretical! it all depends on the algorithm type), and a single computer can have multiple GPUs working in parallel. In your scenario cluster computing might be better suited to solve the problem. On a technological note, Intel produces the Xeon Phi with around 50 cores (real cores, as opposed to streamlined GPU cores) for parallel computing, but the cost might be prohibitive. A GPU is a much cheaper supercomputer than anything else.
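One way to sanity-check the transfer concern: moving the data over PCIe once is tiny next to the compute time. A sketch, assuming roughly 12 GB/s of practical PCIe 3.0 x16 bandwidth (an assumed figure) and the rough 100x speedup mentioned above:

```python
# Is host<->GPU transfer the bottleneck? Rough estimate.
DATA_GB = 30                  # upper end of the 15-30 GB figure above
PCIE_GB_PER_SEC = 12.0        # ASSUMED practical PCIe 3.0 x16 bandwidth
GPU_SPEEDUP = 100             # rough single-GPU speedup quoted above

transfer_sec = DATA_GB / PCIE_GB_PER_SEC          # one full pass of the data
serial_sec = 7_500_000_000 / 10                   # the ~24-year serial run
gpu_days = (serial_sec / GPU_SPEEDUP + transfer_sec) / 86_400

print(f"transfer: {transfer_sec:.1f} s, GPU run: {gpu_days:.0f} days")
```

Seconds of transfer versus months of compute: under these assumptions the bottleneck is the kernel throughput, not the PCIe bus, so the 15-30 GB only matters if it has to shuttle back and forth many times per pass.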
     
    #19     Dec 29, 2015
  9. nitro

    nitro

    I don't know anything about the Phi, but I am guessing developing for it can't be the same as developing for a multicore CPU. If it were, why not just put 50 cores into a Xeon?

    As far as cost, I estimate that a 1000-core Pi Zero cluster would cost around $15,000. I doubt a Phi costs that much. On the other hand, the cluster is a [distributed] 1000 cores.

    I have no idea where I would find 1000 outlets though. LOL. That is why the PDU has to be designed for this sort of thing first.
     
    #20     Dec 30, 2015
    vicirek likes this.