Matlab question - for guys who use it every day

Discussion in 'App Development' started by WinstonTJ, Jun 29, 2012.

  1. Quick question... A little off topic because this is HARDWARE based.

    I have only used R previously, so that's all the direct experience I have. As I understand it, I have a client with a license for up to 12 cores, and the license counts physical cores rather than hyperthreaded (logical) cores.

    Would I be better off giving this client two machines (8-12 cores each) or just a single 12 core machine?

    I have a feeling that the OS and other applications are going to eat more CPU than expected, so a plain 12-core machine won't actually give Matlab a full 100% of all 12 cores.

    I tried to call MathWorks but they won't talk to me without license info, and I don't know it and don't care to bother the client about it. I simply need to find out what is in their best interest and build it.

    Choice is two 8-core machines or one 12-core machine. Client seems fine going the distributed route - but will that slow things down tremendously?

    Also, in terms of RAM, I'm being told that 3-4GB per core would be ideal. Is that normal, or is it a crazy number? I'm thinking that would be another reason to go with two machines, because you can fit more DIMM slots and total memory density that way than in a single machine.

    Thx for feedback & suggestions.
     
  2. I'll first admit you're way beyond my usage of it if you have these kinds of needs. My guess is you're better off with the one machine, but that's just a guess.

    The only thing I was going to say is that if you go with the two machines, you will have to buy the add-on distributed computing toolbox to make them work together. Maybe that cost isn't an issue for this.

    Also you probably know this judging by your question, but not every MATLAB operation will be multi-threaded. A list:

    http://www.mathworks.com/support/solutions/en/data/1-4PG4AN/?solution=1-4PG4AN

    Anything beyond this list will require the Parallel Computing Toolbox to run in parallel. Maybe this paper will have some answers for you:

    http://www.mathworks.com/tagteam/42682_91467v00_NNR_Cleve_US.pdf
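
    To make the distinction concrete, here's a minimal sketch (mine, not from the linked docs) of implicit multithreading versus explicit parallelism; the matrix sizes and the worker count of 4 are arbitrary examples, and newer releases use parpool instead of matlabpool:

        % Implicit multithreading: element-wise math on a large array uses
        % multiple cores automatically, no extra toolbox needed.
        A = rand(5000);
        B = A .* A + sin(A);

        % Explicit parallelism: parfor needs the Parallel Computing Toolbox
        % and an open pool of workers.
        matlabpool open 4            % R2012-era syntax; later releases: parpool(4)
        results = zeros(1, 100);
        parfor k = 1:100
            results(k) = sum(svd(rand(500)));
        end
        matlabpool close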
     
    Thx!! Yes, I read those papers. I should clarify - the reason everything is assumed to be non-hyperthreaded is that the code is written that way.

    I know about the distributed and parallel toolboxes/packages. Or I should say the client knows about them, as well as their cost - they just want the optimal hardware configuration.

    Right now I am trying to decide over three different setups:

    First would be 6-8 (maybe 10??) ultra-small-form-factor desktops all lined up in a row. They're tiny, pancake-sized boxes, so I would add a dual-port 1Gb NIC to each, giving the parallel traffic a bonded 2Gb network interface while the onboard NIC handles management. The machines would be quad-core without hyperthreading and have 4 DIMM slots, so I assume 16GB of DDR2 RAM each.

    Second is 2-3 workstation/server-type boxes with dual quad-core Xeon CPUs, giving each server 8 physical cores. It would be a 2U form factor so I can bond multiple 1Gb NICs. This uses less power and is more maintenance, but also has a greater risk of downtime if there is a failure. Each box would carry 32-64GB of RAM, so 4+ GB of RAM per core.

    Third option is a single server with multiple sockets (probably AMD, but maybe Intel). The client wants 12 cores, so I'm thinking I would need 16-24 cores available just to offset the overhead of the RAID array I/O as well as the server's OS and the other junk on the machine.


    With regard to the distributed & parallel computing, it's a non-issue. If we package everything into one massive server it will run VMware's vCloud Director (or similar) so that they can spin up as many single/dual-core virtual instances as a given task needs. It looks like we are going to need the Parallel & Distributed toolboxes either way.
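
    For concreteness, the job-submission side of the distributed toolbox looks roughly like this (just a sketch based on the docs - 'MyCluster' is a placeholder for whatever cluster profile ends up configured, and the task is a throwaway example):

        c = parcluster('MyCluster');          % placeholder profile name
        job = createJob(c);                   % independent (task-parallel) job
        createTask(job, @rand, 1, {1000});    % one dummy task; real code adds many
        submit(job);
        wait(job);
        out = fetchOutputs(job);              % collect results from all tasks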

    Since this is mission critical it needs to be legit - but it isn't ultra-latency-sensitive. It's not going to be processing real-time market data and making decisions in real time. Because of that, I'm wondering if there will be a major difference between DDR2 and DDR3 RAM, or between Intel's/AMD's last-generation CPUs and brand-new (expensive) hardware.

    If they have a license for 12 cores, I need to deliver 12 cores that can run at 100% indefinitely without making the overall machine slow or sluggish.
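
    From what I can tell, one sanity check on the finished box would be something like this (leaving a core free is just an example policy, and feature('numcores') is undocumented):

        ncores = feature('numcores')     % physical core count as MATLAB sees it
        maxNumCompThreads(ncores - 1);   % cap compute threads; keep headroom for the OS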
     
  4. I suppose MATLAB can have some specific quirks and goofs, but, other than the licensing stuff, this kind of question doesn't seem too MATLAB-specific. I haven't used MATLAB since school, but I was doing software performance work for a while there.

    If this is parallel number crunching, then just look at the CPU number-crunching benchmarks for the products you're comparing. For the casual user those numbers aren't very useful, but it sounds like that's exactly your cup of tea.

    For RAM, I thought DDR2 was out the window anyways these days.

    My own knee-jerk impression is to go with the one machine unless reliability is a concern. With one machine it's one power supply, one case, one motherboard... etc. But on the other hand, if one of those fails, they're SOL until it gets repaired. In the distributed environment, a meteor could hit one of those boxes and you'd keep on trucking.

    To ensure full utilization, you'd definitely need more cores than 12. One of the cores would have to take care of any fumbling around they do on the box, or just the OS doing its business.
     
  5. Building DDR3 boxes like that is where things start to get expensive. DDR2 isn't cheap, but the boxes and CPUs are, so that's why the question of distributed horsepower comes into play.

    For the most part I'm looking at server-grade stuff - Intel Xeon or the AMD equivalent. The RAM is a concern to me - I know how to find general benchmarks, but I have no idea how MATLAB-specific the results will be.
     
  6. sma202

    I use matlab with 32GB and 8 cores, but it depends on what you're doing and the toolboxes involved. If it's for real-time work I'd go higher, but for backtesting this suffices. matlab is a hog though and not very efficient; there are a lot of functions that are just plain slow, and it's better to hook in a C/C++ function using MEX to speed things up.
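
    In case it helps, the workflow is roughly this (a sketch only - hotloop.c and slow_matlab_version are made-up names standing in for whatever function is the bottleneck):

        % One-time compile of a hypothetical C implementation of the hot loop;
        % hotloop.c would contain a standard mexFunction entry point.
        mex hotloop.c

        x = rand(1, 1e7);
        tic; y1 = slow_matlab_version(x); toc   % pure-MATLAB version (placeholder name)
        tic; y2 = hotloop(x); toc               % compiled MEX version of the same thing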
     
  7. I had failed to consider the higher cost of server-grade memory. I tend to forget about the ECC aspect of server memory, which throws everything out of whack. But weigh the hardware costs against the license and it might still be worth it. Ask yourself: at what point would it be cheaper to license more, slower cores?

    Generally you'll want to look at both the integer and floating-point benchmark scores. The integer scores matter for the interpreter overhead. I'm under the impression the I/O benchmarks really don't matter; you fetch that crap into RAM and that's that. 3D stuff relying on a graphics card doesn't matter either, of course, so you can go with built-in graphics if it's offered.
     
  8. I haven't used Matlab in a decade so I can't add too much. Four points.

    1. DDR3 memory is much faster than DDR2. The current top speeds are 2133 MHz vs. 1066 MHz.
    2. ECC memory is a little slower because error correction needs an extra cycle. For mission-critical work, I consider it a must-have at 2133 MHz.
    3. Exporting to compiled code. Don't know if this helps, but Scilab supports being called from C++ directly, without the interpreter overhead.
    4. Debug in MatLab, then code. A popular mode is to develop and debug in MatLab/Excel and then write C++ against the underlying tuned libraries. Depending on the functions called (see the previous poster), this can be valuable or an exercise in over-engineering.

    Hope this helps,
    Steven