Cuda questions

Discussion in 'App Development' started by 931, Sep 24, 2016.

  1. 931

    931

    Hi,
    I've noticed some posts in this forum mentioning CUDA programming and was wondering if someone with a bit of CUDA knowledge could give some general advice about using it for financial calculations.

    Overall, is it a good idea to learn CUDA and use floating point for financial data?

    How hard is it to move from multithreading experience on the CPU to the GPU?

    If code is very cache-intensive on the CPU but not maximally RAM-intensive, could a GPU do better?
    For example, what would happen to GPU performance with very long tasks on 3000 GPU cores instead of short, repeating tasks?

    For example, if 500 CUDA cores try to access the same memory location, could that slow down all the cores accessing it?

    If one core writes to memory locations that others try to read at the same time, do GPUs crash the way a CPU would?

    No need to answer everything, but some advice would be good.
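    To make the last two questions concrete, the pattern I mean looks roughly like this (just a sketch with Python and numba's CUDA API, which I have not tested; the numbers are made up):

    # Sketch (untested): many threads all updating one shared location.
    from numba import cuda
    import numpy as np

    @cuda.jit
    def hammer_one_location(counter):
        # Every thread adds to the same address. cuda.atomic.add serializes
        # the updates, so the result is correct but contended: the more
        # threads target one address, the slower each update gets.
        cuda.atomic.add(counter, 0, 1)
        # A plain counter[0] += 1 here would be a data race: the GPU would
        # not crash, but updates would silently be lost.

    counter = cuda.to_device(np.zeros(1, dtype=np.int64))
    hammer_one_location[512, 128](counter)   # 512 blocks x 128 threads
    print(counter.copy_to_host()[0])         # 65536 with atomics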
     
  2. 2rosy

    2rosy

    Threading and CUDA are apples and oranges. I use the Python numba package from Continuum. Basically all I need to do is mark a function to run on the GPU; no CUDA code required.
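    Something like this (from memory, so check the numba docs; the function is just an example I made up):

    # numba compiles this elementwise function for the GPU; no CUDA written.
    import math
    import numpy as np
    from numba import vectorize

    @vectorize(['float64(float64, float64)'], target='cuda')
    def log_return(price_now, price_prev):
        return math.log(price_now / price_prev)

    prices = np.cumprod(1.0 + 0.001 * np.random.randn(1000000))
    rets = log_return(prices[1:], prices[:-1])   # runs on the GPU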
     
  3. GPU programming, whether it's CUDA or OpenCL, is still something of a black art. The real problem with any kind of parallel programming is finding parallelism opportunities in algorithms. Some algos simply don't lend themselves to parallelism, while others are a slam-dunk (like Bitcoin mining).

    I'd say that rather than taking a GPU and throwing problems at it, look at what problems you're trying to solve and see if parallel processing is possible.

    Daniel Fernandez is doing this kind of thing at asirikuy.com and has written some about it on his blog at mechanicalforex.com.
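    To make that concrete, Monte Carlo simulation is the classic slam-dunk case in finance: every simulated path is independent, so no thread ever needs to talk to another. A rough sketch (untested, parameters invented) using Python with numba's CUDA support:

    import math
    import numpy as np
    from numba import cuda

    @cuda.jit
    def mc_terminal_price(z, s0, mu, sigma, t, out):
        # One independent geometric-Brownian-motion path per thread.
        i = cuda.grid(1)
        if i < out.size:
            out[i] = s0 * math.exp((mu - 0.5 * sigma * sigma) * t
                                   + sigma * math.sqrt(t) * z[i])

    n = 1000000
    z = np.random.randn(n)      # host-side randoms, for simplicity
    out = np.empty(n)
    threads = 256
    blocks = (n + threads - 1) // threads
    mc_terminal_price[blocks, threads](z, 100.0, 0.05, 0.2, 1.0, out)
    print(out.mean())           # roughly s0 * exp(mu * t)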
     
    Zzzz1 likes this.
  4. 931

    931

    Thanks, will read.
    The task does not need to spread subtasks across many threads or to sync threads while it runs.
    Each thread gets a long sequential task; the only thing common to all threads is writing to and reading locations in RAM. Other than that, the threads would not need to sync or wait on each other.
    But it is possible that the same location will sometimes be overwritten by a large portion of the threads while also being read.

    It is possible to create long tasks for each thread, so there is a lot of parallel opportunity. But do GPUs handle long, separate parallel tasks well if memory is common to all of them? Roughly the shape I mean is sketched below.
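    (Again just an untested Python/numba sketch with invented numbers: each thread grinds through a long private task, and the only common-memory traffic is one write at the end.)

    from numba import cuda
    import numpy as np

    @cuda.jit
    def long_private_task(seeds, total):
        i = cuda.grid(1)
        if i < seeds.size:
            # Long sequential work touching no shared data...
            x = seeds[i]
            acc = 0.0
            for _ in range(100000):
                x = (1103515245 * x + 12345) % 2147483648   # toy PRNG step
                acc += x % 7
            # ...then a single contended write to common memory. If every
            # loop iteration wrote to total[0] instead, the atomics would
            # serialize and throughput would collapse.
            cuda.atomic.add(total, 0, acc)

    seeds = cuda.to_device(np.arange(3000, dtype=np.int64))
    total = cuda.to_device(np.zeros(1, dtype=np.float64))
    long_private_task[12, 256](seeds, total)   # 3072 threads for 3000 tasks
    print(total.copy_to_host()[0])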
     
  5. GPU memory arrangement is not simple and varies widely among manufacturers, GPU models, etc.
    Yes, GPUs have common/global memory, but again it depends. It may be so slow that it's not worth accessing in parallel. You'll have to research it and do some of your own prototyping.
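    One way to prototype exactly that question (a sketch in Python with numba, untested, sizes arbitrary): time a kernel where every thread hammers one global address against one where each thread gets its own, and compare.

    import time
    import numpy as np
    from numba import cuda

    @cuda.jit
    def same_address(out):
        cuda.atomic.add(out, 0, 1.0)       # all threads fight over one word

    @cuda.jit
    def own_address(out):
        i = cuda.grid(1)
        if i < out.size:
            cuda.atomic.add(out, i, 1.0)   # each thread has its own word

    def bench(kernel, out):
        kernel[4096, 256](out)             # warm-up; triggers JIT compile
        cuda.synchronize()
        t0 = time.perf_counter()
        kernel[4096, 256](out)
        cuda.synchronize()
        return time.perf_counter() - t0

    out = cuda.to_device(np.zeros(4096 * 256, dtype=np.float64))
    print('same address:', bench(same_address, out))
    print('own address :', bench(own_address, out))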

    Choosing CUDA over OpenCL may be a little premature. You should probably do more research on that as well. I think OpenCL has a bigger user base than CUDA so you'll find more information and code examples on it. Even if CUDA is "better" you may find that OpenCL is still the easier choice.

    If you ever owned a VCR then you may remember Beta and VHS. Technically Beta was better, but it was proprietary to Sony. Guess who won? VHS, because it was backed by a multi-manufacturer consortium, so there were simply more VCRs playing VHS. CUDA is a single-company proprietary language, but OpenCL is a consortium standard.
     
    Last edited: Sep 25, 2016
    931 likes this.
  6. 931

    931

    Thanks for the info and for pointing to OpenCL.
    Looking further into CUDA/OpenCL, it appears that both require extensive research and probably lots of trial and error before anything useful comes out.
    OpenCL appears more promising since it's easier to find info, and it may even be possible to use it with other computing hardware.
    Accessing shared regions of RAM from multiple cores still appears to be a problem on the GPU.
     
    Last edited: Sep 30, 2016
  7. 931

    931

    What about using OpenACC?
    It appears to be available for Nvidia and AMD GPUs and is claimed to require very few code modifications.
     
  8. Zzzz1

    Zzzz1

    Beta was always inferior to VHS, and so was NTSC to PAL. And I think CUDA has pretty much won the AI battle against OpenCL and carved out a pretty neat niche for itself. I say that because I see almost every AI framework seamlessly supporting CUDA, but hardly any support OpenCL out of the box.

     
  9. I have this sneaking suspicion (with no evidence at all) that the CUDA AI niche is heavily dominated by academics. Nothing wrong with that, but academics tend to be off in the weeds doing their own thing. I'll stick with OpenCL for the reasons I stated above: wider user base, more online examples. While I'm at it, I'll throw in another reason: CUDA only supports Nvidia hardware. OpenCL is truly open and works with hardware from multiple manufacturers.
     
  10. Zzzz1

    Zzzz1

    Why do you say OpenCL has a wider user base? I hardly ever hear of anyone using OpenCL. What makes you make that claim? Just curious, because I truly only hear of CUDA-based projects, and not just in academia. Nvidia's top computing GPU cards are selling like hotcakes. Your last argument is not very convincing, given that Nvidia GPU performance beats most of its competition, especially when it comes to AI. Add to that the vast array of AI applications and projects they are currently building, and I would say that with OpenCL you are pretty much kept outside the door. Do you mind sharing which applications you use that rely on OpenCL? Thanks


     
    #10     Oct 23, 2016