To speed up strategy parameter optimization, I have been considering adding GPU support to my trading platform. I have this in mind mainly for strategy development and portfolio generation, not for live trading or backtesting, which are not as performance intensive. Are people using GPUs to accelerate parameter optimization? My parameter optimization is already about 100x more efficient than MT5 and runs multithreaded, but it would be nice to get some further performance improvements.
You could use cloud services here: the setup is already done for you, and you can tap the full computing power of Microsoft Azure, Google Cloud or Amazon AWS, with custom pricing depending on your needs. But for me, optimization time was never the issue, and neither was coding time. The real bottleneck was always the underlying idea creation and the strategy setup; that is what matters most. Everything else is done in no time (usually). You need to understand the markets and what you are doing. Having the right, sound strategy in mind is by far the most important and most time-consuming thing to achieve.
Distributing the work to cloud services is definitely a good step, but utilizing GPUs can dramatically cut down computation time and cost even when running in the cloud. It's also nice to be able to iterate locally on reasonably priced hardware before running in the cloud with a larger data set. Personally, I like to test algos across a good number of instruments over multiple years to see how different ideas or changes influence overall algo performance, so I find optimization to be an integral part of algo development.
I think for financial data it suffices to use multicore CPUs in multiple workstations, i.e. CPUs like the AMD Ryzen Threadripper. I think these are even faster than server CPUs. IMO, using a GPU for this kind of data is overkill.
At least for NinjaTrader's built-in strategy parameter optimization, the more RAM available, the faster it completes. For this reason, we offer dedicated servers with 32, 64, and 128 GB of RAM.
Interesting, the optimization shouldn't require much memory even at 5-second bar resolution, let alone 1-minute bars. One year of 5-second forex bar data is only about 200 MB, and only about 17 MB with 1-minute bars.
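For what it's worth, here is the back-of-the-envelope arithmetic behind those figures. The ~260 trading days per year, 24-hour sessions, and ~44 bytes per bar (timestamp + OHLC + volume) are my own assumptions, not numbers from the platform; they just roughly reproduce the sizes above.

```cpp
#include <cstdio>

int main() {
    // Back-of-the-envelope size of one year of forex bar data.
    // Assumptions (mine): ~260 trading days/year, 24h sessions, and
    // ~44 bytes per bar (8-byte timestamp + 4 x 8-byte OHLC + 4-byte volume).
    const double trading_days  = 260.0;
    const double bytes_per_bar = 44.0;

    auto megabytes_per_year = [&](double bar_seconds) {
        double bars = trading_days * (24.0 * 3600.0 / bar_seconds);
        return bars * bytes_per_bar / 1e6;
    };

    std::printf("5-sec bars: ~%.0f MB/year\n", megabytes_per_year(5.0));   // ~198 MB
    std::printf("1-min bars: ~%.0f MB/year\n", megabytes_per_year(60.0));  // ~16 MB
}
```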
Even for the simplest MT5 strategy, it takes about 100 minutes for a single forex pair to run a 10-year walk-forward analysis (4-year in-sample window, 1-year out-of-sample WF phase) on an 18-core PC.
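To make the walk-forward setup concrete, here is a minimal sketch (not MT5 code; the year range is hypothetical) that just enumerates the rolling windows such a run covers:

```cpp
#include <cstdio>

int main() {
    // Rolling walk-forward schedule: 4-year in-sample window,
    // 1-year out-of-sample phase, stepped forward one year at a time.
    // The years are illustrative; only the window arithmetic matters.
    const int first_year = 2014;
    const int last_year  = 2023;   // 10 years of data: 2014..2023
    const int is_years   = 4;
    const int oos_years  = 1;

    for (int y = first_year; y + is_years + oos_years - 1 <= last_year; y += oos_years) {
        std::printf("optimize %d-%d, test %d\n", y, y + is_years - 1, y + is_years);
    }
    // Each "optimize" step is a full parameter search over the in-sample
    // window, so a 10-year run repeats the whole optimization several times,
    // which is where most of the 100 minutes goes.
}
```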
What loss function are you minimizing? Most likely it is wrong. You probably just don't understand the parameters of your "system". Last year I thought I needed AWS and its compute to figure out something completely subjective within my model. I would have spent so much minimizing bullshit. You can't just RL 100 free parameters without finding out that your system is bullshit.
I have found that moving calculations from CPU (multithreaded C++) to GPU (OpenCL 1.2) can decrease execution times by a factor of 30-60. But porting code to make good use of a GPU is not always that easy. For example, instead of three nested loops with sizes M, N, and O, you might have one OpenCL kernel (function) with M * N * O work items (threads), where each work item computes what one iteration of the innermost loop body (and its surrounding loops) would have computed. This can quickly use a lot of memory. And it's not as simple as writing the OpenCL kernel, because you have to get the necessary data into the GPU's memory (e.g., precalculate all possible indicator values) and map data in and out of the GPU as needed.

Another example where using a GPU might be tricky: if your optimization needs randomness (e.g., genetic optimization), you may need to port a good pseudorandom number generator so it's available inside OpenCL kernels. And when things don't work as expected inside an OpenCL kernel, it can be a lot harder to debug (e.g., no debugger available, and I haven't found printf statements to be reliable inside OpenCL kernels).
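To illustrate the loop-flattening idea, here is a rough, untested sketch against the plain OpenCL 1.2 C API: M * N * O work items, each one decoding its global id back into three parameter indices, reading a price buffer that was copied to the GPU up front, and writing one score per parameter combination. The kernel body, buffer names, and the trivial "score" are placeholders of mine, not code from an actual backtester.

```cpp
// Minimal OpenCL 1.2 sketch: flatten three parameter loops (sizes M, N, O)
// into one NDRange of M*N*O work items. Untested outline, not production code.
// Build with e.g.:  g++ -std=c++11 flatten.cpp -lOpenCL
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <cstdio>
#include <vector>

static const char *kSrc = R"CLC(
__kernel void score_combo(__global const float *prices, int n_bars,
                          int N, int O, __global float *scores)
{
    int gid = get_global_id(0);   // 0 .. M*N*O-1
    int i = gid / (N * O);        // index into parameter set 1
    int j = (gid / O) % N;        // index into parameter set 2
    int k = gid % O;              // index into parameter set 3

    // Placeholder "backtest": a real kernel would walk precalculated
    // price/indicator buffers and simulate the strategy for (i, j, k).
    float s = 0.0f;
    for (int b = 1; b < n_bars; ++b)
        s += (prices[b] - prices[b - 1]) * (float)(i + j + k + 1);
    scores[gid] = s;
}
)CLC";

int main() {
    const int M = 32, N = 32, O = 32, n_bars = 10000;
    std::vector<float> prices(n_bars, 1.0f);      // stand-in market data
    std::vector<float> scores((size_t)M * N * O);

    cl_platform_id plat; cl_device_id dev; cl_int err;
    clGetPlatformIDs(1, &plat, nullptr);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, nullptr);
    cl_context ctx = clCreateContext(nullptr, 1, &dev, nullptr, nullptr, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, nullptr, &err);
    clBuildProgram(prog, 1, &dev, "", nullptr, nullptr);
    cl_kernel kern = clCreateKernel(prog, "score_combo", &err);

    // Copy the precalculated data to the GPU once; only the scores come back.
    cl_mem d_prices = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                     prices.size() * sizeof(float), prices.data(), &err);
    cl_mem d_scores = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY,
                                     scores.size() * sizeof(float), nullptr, &err);

    clSetKernelArg(kern, 0, sizeof(cl_mem), &d_prices);
    clSetKernelArg(kern, 1, sizeof(int), &n_bars);
    clSetKernelArg(kern, 2, sizeof(int), &N);
    clSetKernelArg(kern, 3, sizeof(int), &O);
    clSetKernelArg(kern, 4, sizeof(cl_mem), &d_scores);

    size_t global = (size_t)M * N * O;            // one work item per combination
    clEnqueueNDRangeKernel(q, kern, 1, nullptr, &global, nullptr, 0, nullptr, nullptr);
    clEnqueueReadBuffer(q, d_scores, CL_TRUE, 0,
                        scores.size() * sizeof(float), scores.data(), 0, nullptr, nullptr);

    std::printf("score[0] = %f\n", scores[0]);
    // (Error checks and clRelease* cleanup omitted to keep the sketch short.)
    return 0;
}
```

The main point the sketch shows is the data flow described above: everything the kernel needs has to be staged in GPU buffers before the NDRange is enqueued, and only the per-combination results are read back to the host.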