How VPIN works. FWIW, I think VPIN is a lagging indicator. Still, it's useful to understand how it is computed:
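For reference, here is a minimal sketch of the standard VPIN definition (Easley/López de Prado/O'Hara): trades are grouped into equal-volume buckets, each bucket's volume is split into buy-initiated and sell-initiated portions (via tick rule or bulk volume classification, not shown here), and VPIN is the average absolute order-flow imbalance over the last n buckets. Function and parameter names are illustrative.

```python
import numpy as np

def vpin(buy_vol, sell_vol, n=50):
    """VPIN over the most recent n equal-volume buckets.

    buy_vol, sell_vol: per-bucket buy/sell classified volumes;
    each bucket's total (buy + sell) is the fixed bucket size V.
    """
    buy = np.asarray(buy_vol[-n:], dtype=float)
    sell = np.asarray(sell_vol[-n:], dtype=float)
    # sum of |imbalance| over the window, normalized by total volume n*V
    return np.abs(buy - sell).sum() / (buy + sell).sum()
```

Perfectly balanced flow gives VPIN = 0; completely one-sided flow gives VPIN = 1. Since the window only updates as buckets fill, it necessarily trails the flow that produced it, which is why it reads as lagging.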
There are so many FPGA starter boards, I finally decided on this one. Ultimate goal is to price 100k options on a binomial tree with 700 steps every ~second. http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=205&No=941&PartNo=1
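For scale, here is what one unit of that workload looks like in plain Python: a 700-step Cox-Ross-Rubinstein binomial tree for a single European option (a standard textbook pricer, not the FPGA implementation discussed here; names and parameters are illustrative). Doing this 100k times per second is what motivates the hardware.

```python
import math

def crr_price(S, K, T, r, sigma, steps=700, call=True):
    """European option price on a Cox-Ross-Rubinstein binomial tree."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))   # up factor per step
    d = 1.0 / u                           # down factor per step
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-r * dt)              # one-step discount factor
    # payoffs at the terminal layer of the tree
    values = [
        max(S * u**j * d**(steps - j) - K, 0.0) if call
        else max(K - S * u**j * d**(steps - j), 0.0)
        for j in range(steps + 1)
    ]
    # backward induction to the root
    for _ in range(steps):
        values = [disc * (p * values[j + 1] + (1 - p) * values[j])
                  for j in range(len(values) - 1)]
    return values[0]
```

With 700 steps the result sits close to the Black-Scholes price, and put-call parity holds exactly on the same tree. The backward-induction layers are exactly the kind of regular, pipelineable arithmetic that maps well onto an FPGA.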
What do you use to program these FPGA devices, and how painful is it (compared to high level languages)?
RTL. It is extremely painful. Programming an FPGA looks incredibly foreign to a typical imperative-language programmer. If, on the other hand, you are comfortable with declarative languages, you are closer to feeling at home with the paradigm. I think the weirdest thing is having to think in terms of clocks and cycles when programming an FPGA, sort of the way an assembly language programmer is intimately aware of registers. It is not uncommon to see logic circuit diagrams of your program, something very foreign to a typical programmer. And of course, the development tools are not really mainstream. There is a learning curve, but if the goal is ELL (extreme low latency and throughput; ASICs are even faster, but one step at a time), there is no choice. There are lots of tutorials online. Once you get past that, you are halfway there. Just because you understand how to turn hardware/firmware into software (which is a reasonable definition of FPGA programming) doesn't mean you magically turn out ELL programs. The other half is understanding the data structures and algorithms that leverage this type of hardware.
Thanks, nitro. Isn't there a choice to use OpenCL to program FPGA? Additionally, have you considered GPU (as opposed to FPGA) as the hardware alternative for many-core computing?
Yes, you can use OpenCL to program FPGAs. Since I am still getting up to speed on this myself, I don't know whether OpenCL is the best way to take advantage of an FPGA. My feeling is that to get the ultimate advantage you have to drop down to an HDL; OpenCL is to HDL roughly what C# is to assembler. GPUs are fine for throughput, but they currently have high latency because they have to be driven by a standard CPU. AFAIK there are no standalone GPUs, so the data has to be fed to the GPU through the NIC of a standard computer. Nvidia is working on letting the GPU access main memory directly, without continuously copying data back and forth between GPU and RAM; since market data is constantly changing, GPUs have limited scope in this domain until then. I forget what it is called, DirectLink maybe? Nvidia said this technology would be available sometime in 2016, but I think it has been pushed back to 2017. There are some attempts at reducing GPU latency, like GPUDirect RDMA. None of these compete in ELL with an FPGA/ASIC receiving the packet directly, with the networking stack implemented on the FPGA itself. That said, not everything has to be ELL. It might be good enough to be very, very fast, especially in markets where there aren't penny spreads in the bid/ask.
A reasonable starting point: a comparison sheet of NN frameworks: http://deeplearning4j.org/compare-dl4j-torch7-pylearn.html