GPU Computing (Possibly the Tesla or Quadro 4000)

WinstonTJ · Aug 20, 2012

I was asked today to spec and configure a GPU box and I've never messed with GPU computing at all. Outside the easy and obvious like buying the GPU and sticking it into a machine and then putting an OS on it I have a few questions that hopefully someone can comment on.

Happy to document or at least post pics of the build(s) and specs, etc.

The biggest thing I'm wondering is whether or not you can assign GPU cores to processes just as you can for CPU cores. For example, if I have a 12-core server (host) and I put in two GPU units (yes, I have two PCIe x16 slots), can I assign blocks of GPU cores to specific applications?

Is anyone using VMware and GPUs? I know you can pass through the whole device to a VM/VPS, but can you allow the hypervisor to control the GPU and then use something like vCloud Director to scale up 20-30 "on demand" VM/VPS machines that use the physical server CPU cores for the operating system but blocks of 50-100 GPU cores for computing and processing work? Or do I have to configure it to just throw the jobs/work at both of the GPU units and let them figure out the hierarchy and order?

Final question is that of memory (RAM). Normally on the big RAM boxes I have, it's optimal to divide up the RAM in blocks per CPU (if you have 2 CPUs and 64GB of RAM you give each CPU a 32GB RAM allocation vs. using the RAM as one giant block like a normal desktop). Should I be giving extra memory to the GPUs (usually through the OS, not in the BIOS)? Also, if I have two GPUs and two CPUs and I divide the RAM in half and then allocate an extra amount (1 MB per core? 10 MB per GPU core?), do I then need to specify or assign the GPUs each to their own core at the OS level?

Also regarding memory on these GPU units: can that be upgraded, or is it soldered onto the GPU board? It seems foolish to spend $2200 for a 448-core GPU that only has 6GB of GDDR5, or 2GB of GDDR5 for 256 cores on the alternate we are looking at.

OK this is actually the last question - I've been shopping for old/legacy PCI video cards (yes PCI) because I assumed that these things like to be dedicated GPUs vs. actually drive monitors. Does it make any difference in performance (this will be racked so there will never be a monitor physically attached to it) if the OS thinks it's also driving a monitor vs. being a dedicated GPU?

Looking forward to comments - thx.

vicirek · Aug 20, 2012

A GPU is a separate device from the CPU, and GPU cores are different from CPU cores. To access the GPU you need a card and a programming environment compatible with NVIDIA CUDA, OpenCL, or DirectCompute, which is downloaded and installed separately (I cannot address the VMware question, but probably not). Programs are usually written in C/C++ and require a special compiler or an external library that specifically addresses GPU functions. NVIDIA and AMD list compatible cards; they are usually higher-end, PCIe 2.0-compatible cards, because memory bandwidth is essential.

What you do is write data to GPU memory, then execute kernels (the code for each core, which is the same on those hundreds of cores) in parallel on the GPU, and read the data back to RAM. Some of the early cards are cheap now, but they have fewer cores and not much memory. GPU compute code and graphics can run on the same card, or cards can be dedicated to one or the other. A GPU can be dedicated to a specific CPU thread or threads. GPU memory management is a topic by itself, but the programmer manages the memory allocations, the assignment of threads into blocks and cores, and their execution.

Right now it is fairly low-level programming and you need a very good programmer to do it (not me). Luckily GPU computing is becoming mainstream, and there are libraries that let you use the GPU without going down to low-level hardware programming, but they solve specific problems (math, science, graphics, signal processing, and even finance). So this is not only about a card but, most importantly, about how to use it and integrate it into your software environment (Windows, Linux, etc.).

GPU memory is fixed and cannot be upgraded. The most efficient way is to go with a gaming graphics card with a little more memory, like 1.5-2 GB, and good memory bandwidth. They sell for $100 to $300 but must list compatibility with one of the programming environments listed above.
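On dedicating a GPU to specific threads or processes: with CUDA the usual unit of assignment is a whole device, not a block of cores. A minimal sketch of one way to pin worker processes to devices, assuming the workers are CUDA applications (the CUDA runtime reads the `CUDA_VISIBLE_DEVICES` environment variable at start-up). The helper name `gpu_env_for_worker`, the round-robin mapping, and the worker and GPU counts are all hypothetical, chosen only for illustration.

```python
import os


def gpu_env_for_worker(worker_index, gpu_count, base_env=None):
    """Return an environment dict that exposes exactly one GPU to a worker.

    Hypothetical helper: workers are mapped to devices round-robin.
    CUDA_VISIBLE_DEVICES must be set before the worker process starts,
    because the CUDA runtime reads it during initialization.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(worker_index % gpu_count)
    return env


if __name__ == "__main__":
    # Assumed setup: two compute GPUs shared by four worker processes.
    for worker in range(4):
        env = gpu_env_for_worker(worker, gpu_count=2)
        print("worker", worker, "-> GPU", env["CUDA_VISIBLE_DEVICES"])
        # A real launcher would pass this dict as env= to subprocess.Popen.
```

Inside each worker the chosen card then appears as device 0, so the compute code itself does not need to know which physical GPU it was given.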

WinstonTJ · Aug 20, 2012

I get all that on the compatibility and software side.

Just wondering about the strict hardware setups.

vicirek · Aug 21, 2012

Hardware setup is quite simple because the GPU is a separate (super)computing device, and any plain vanilla but current board and CPU with PCIe will do if most of the work will be offloaded to the GPU.

Cooling and power consumption are an issue because higher-end cards run hot. The power supply needs to be above 600 W for a single GPU, and you may need extra power connectors for the card. Most setups use 1 to 4 GPUs, but to avoid issues the GPU processors should be of the same class. So if this is a racked, co-located computer, then there might be an issue or cost for extra power or cooling.
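The power point can be checked with simple arithmetic before buying. A rough sketch, not a sizing rule: the 80% derating factor and every wattage in the example are placeholder assumptions, and `psu_headroom_watts` is a hypothetical helper. Substitute the figures the vendors publish for the actual parts.

```python
def psu_headroom_watts(psu_watts, component_watts, derate=0.8):
    """Return spare capacity after derating the PSU to a sustained load.

    derate=0.8 is an assumed rule of thumb, not a manufacturer figure.
    A negative result means the build exceeds the derated capacity.
    """
    usable = psu_watts * derate
    return usable - sum(component_watts)


if __name__ == "__main__":
    # Placeholder draws in watts for a single-GPU box (assumptions only).
    build = {"gpu": 250, "cpu": 95, "board_drives_fans": 75}
    spare = psu_headroom_watts(600, build.values())
    print(f"spare capacity on a 600 W supply: {spare:.0f} W")
```

Adding a second compute card to the dictionary shows quickly whether the same supply still has margin.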

Some GPUs are better at double precision and computing in general; on the NVIDIA side that would be the Fermi processor, which is the previous generation. For really intensive tasks the GTX 400 or 500 series are best. The newer Kepler GPUs might be good for single-precision or integer computing but not double precision, because NVIDIA scales down the compute side of its consumer graphics cards so that Tesla and Quadro will sell.

Do not know much about ATI because I focus on NVIDIA CUDA so I check only those.

My current setup is one GTX 580, Intel i7, 16 GB RAM, 64-bit Windows, and a 620 W power supply on an Asus Maximus (Z68 chipset), so in general it is a gaming rig (I do not game, though).

Another is an AMD Phenom II with 2 GT 210 cards. Each has 16 cores, while the GTX 580 has 512 Fermi cores. In GPU compute the GTX 580 is about 8 to 10 times faster.

It is best to use the same type of card in multiple slots. Two cards are probably sufficient, because program debugging needs a separate graphics output for the programmer. A single-card computer would do a Mars mission, no problem.

Everything depends on the special needs of the program that runs on it. Or, to turn it around, you ask the programmer what is needed, or tell him to manage with what he has got.

In general these are separate dedicated applications, and you cannot tell your OS, "Hey buddy, I just got you a nice GPU, please offload some compute-intensive work there."

Random.Capital · Aug 21, 2012

Quote from WinstonTJ:

...a 448-core GPU that only has 6GB of DDR5...

Ah, sounds like Tesla.

For your first time out, honestly, I'd just grab that and go. There's a huge learning curve to doing this effectively, it'll really help to go with something used by (relatively) lots of people. These things are all over fintech - believe me, CUDA/OpenCL are no cakewalk, you'll be happy to have someone to talk to.

Also, have a separate GPU for graphics, use the Tesla (or whatever) strictly for compute acceleration. KISS applies.

NetTecture · Aug 21, 2012

Yes. And if you have to program it yourself, look at C++ and Visual Studio 2012 - MS has a new library where you program using lambdas and it parallelizes the resulting code onto the GPU automatically (or uses the CPU when no GPUs are present).

Otherwise it is HARD to get good results out - this is a separate computer, so everything must be planned properly: optimize for latency, make sure the large number of cores is always busy, prepare data transfers ahead of time, etc. - and yes, transfer times CAN be non-trivial to optimize.
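To put rough numbers on the transfer-time point, here is a back-of-envelope sketch. It assumes a simple model (fixed latency plus size divided by bandwidth) and a result that is the same size as the input. The theoretical peak of PCIe 2.0 x16 is 8 GB/s in each direction; the 6 GB/s used in the example and the 10 microsecond latency are assumed placeholders, and both function names are hypothetical.

```python
def transfer_seconds(n_bytes, bandwidth_gb_per_s, latency_s=10e-6):
    """Estimate one host<->device copy: fixed latency plus size / bandwidth."""
    return latency_s + n_bytes / (bandwidth_gb_per_s * 1e9)


def worth_offloading(n_bytes, cpu_seconds, gpu_seconds, bandwidth_gb_per_s):
    """True if copy-in + GPU compute + copy-out beats the CPU-only time.

    Assumes the result copied back is the same size as the input.
    """
    round_trip = 2 * transfer_seconds(n_bytes, bandwidth_gb_per_s)
    return round_trip + gpu_seconds < cpu_seconds


if __name__ == "__main__":
    one_gib = 2 ** 30
    t = transfer_seconds(one_gib, bandwidth_gb_per_s=6.0)
    print(f"one-way copy of 1 GiB at an assumed 6 GB/s: {t:.3f} s")
    print("offload pays off:",
          worth_offloading(one_gib, cpu_seconds=5.0, gpu_seconds=0.5,
                           bandwidth_gb_per_s=6.0))
```

If the kernel only saves a fraction of a second per gigabyte moved, the copies can cancel out the gain, which is why keeping data resident on the card between kernels matters.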

WinstonTJ · Aug 21, 2012

Thank you for the responses, it's a big help.

I have a 1400 W PSU and will run 3x video cards: two GPU units and a junk video card for simple output to a monitor if needed.

I don't know that I want to spend the cash on a spare machine to test and play around with so I may have to test VMware ESXi and a few other configurations physically on this box before it's delivered.

I'm going to look into the NVIDIA industrial or professional lines as well because I'm not married to the Tesla or the Quadro, just didn't realize that there were other models that could be more appropriate. That said I do think it's a good point to use something that's used by (relatively) lots of people.

Thanks again!