Lightweight threads

nitro · Feb 22, 2012

I have an arbitrage strategy that I am implementing that requires a humongous number of cores/threads to implement. The main problem is, that the languages that I am competent in, threads have all sorts of problems, and in .Net they are hardly lightweight.

This led me to this library:

http://blogs.msdn.com/b/daniwang/archive/2008/12/29/lightweight-threading-using-c-iterators.aspx

It is interesting that C# scales almost as well as Erlang on this test. Funny how C++/C# would crush Erlang on most computational tests, but when it comes to large number of threads, other languages have a hard time keeping pace with Erlang.

It is interesting, this pattern, because most arbitrage type trading requires you to scan large ammounts of data for opportunity. So say you are looking at an option chain, and need to compare that chain against other option chain, where any two options from each chain could be an opportunity. You can begin to see that even for a small number of strikes/months, the combinatorial explosion of computations would overwhelm most computers even with lots of cores.

I always look to simpler domains where it is both fun and educational, and mirrors the problem in the more complex domain. I realized that Conway's Game of Life has many features that, adding the constraint that it must be as fast as possible, that this problem shares many of the techniques that would be useful for this type of high frequency trading.

So, I pose a challenge. Post your best program that uses Erlang to compute the Game of Life. Note that a non-threaded version is not interesting!

nitro · Feb 22, 2012

BTW, this seems to be the way to do lightweight threads within a Microsoft solution:

http://msdn.microsoft.com/en-gb/magazine/cc163556.aspx
http://msdn.microsoft.com/en-us/library/bb648752.aspx

Kevin Schmit · Feb 22, 2012

Quote from nitro:

I have an arbitrage strategy that I am implementing that requires a humongous number of cores/threads to implement.
More...

Nitro, this statement is almost certainly not true.

Before you you run out and buy a machine with a "humongous number of cores" (or rent them on the cloud) and try to run 10,000+ simultaneous threads, I suggest you try redesigning your code. That has got to be an easier solution.

braincell · Feb 23, 2012

I'm not sure about the depth of the analysis you're talking about but did you consider an event driven system? Each will fire it's own thread when needed and terminate with the end of analysis. This can be on a per tick basis for example, or per bid/ask update. If the computation takes less time than each tick (and it should if i understand what you're doing) then you don't have to have threads running at all times. If the computation takes a bit too long, you might need to consider optimizing it by reverting to simpler data structures like 1d arrays etc which are faster than e.g. lists and so on.

Other than that, you can simply do distributed computing where you network multiple CPUs to compute data coming from the central server. Then you don't need a CPU with multiple cores, just any kind of solid data center with a bunch of them networked together. I did this for a similar task.

Finally, the fastest multithreading is done with Assembly language, so consider learning it. I could answer your challenge in ASM but i'm coding too much anyway.

promagma · Feb 23, 2012

I have 800 threads running in Java, maybe 400 working simultaneous. Utilizes all cores and works fine.

NetTecture · Feb 23, 2012

Quote from promagma:

I have 800 threads running in Java, maybe 400 working simultaneous. Utilizes all cores and works fine.
More...

Actually no, it works a lot less efficient than X threads would work, with X being - if you are not IO bound etc. - a little more than the number of cores or the number of cores depending how well you know how to program.

800 threads mean a ton of operating system overhead that you do not see.

For example memory - last time I checked, 2 mb minimum stack applies per thread, which makes 1.6 gigabyte totally wasted memory.

I prefer my architecture to have a number of worker processes and then work of queues of smaller work items. A lot less overhead.

byteme · Feb 23, 2012

Quote from braincell:

I'm not sure about the depth of the analysis you're talking about but did you consider an event driven system? Each will fire it's own thread when needed and terminate with the end of analysis. This can be on a per tick basis for example, or per bid/ask update. If the computation takes less time than each tick (and it should if i understand what you're doing) then you don't have to have threads running at all times. If the computation takes a bit too long, you might need to consider optimizing it by reverting to simpler data structures like 1d arrays etc which are faster than e.g. lists and so on.
More...

I believe the use case is dealing with options chains where multiple strikes at multiple expirations can literally all change simultaneously in response to the underlying. So, a lot of these ticks happen at the same time.

The good thing is that in the case of calculating fair values for options you can use shared-nothing lightweight threads since the value of one does not depend on the value of another.

GPUs are good at this kind of thing and I'm sure Nitro has considered CUDA or related solutions already? Of course there is the latency of transferring data to/from the GPU memory though.

NetTecture · Feb 23, 2012

Quote from byteme:

I believe the use case is dealing with options chains where multiple strikes at multiple expirations can literally all change simultaneously in response to the underlying. So, a lot of these ticks happen at the same time.

The good thing is that in the case of calculating fair values for options you can use shared-nothing lightweight threads since the value of one does not depend on the value of another.

GPUs are good at this kind of thing and I'm sure Nitro has considered CUDA or related solutions already? Of course there is the latency of transferring data to/from the GPU memory though.
More...

Actually no, you still waste. I would basically in C# use a custom TaskScheduler for X threads as above and schedule tasks. See, they do NOT happen at the same time - a CPU core can only execute one hardware thread at the same time ANYWAY. That is 1 thread per core on AMD, 2 on Intel with Hyperthreading. Called hardware. Having less threads in this case is less task switching, on top of a lot less wasted memory. Everyone using 800 threads for something like option chain calculation... better des not claim to be more than a junior developer. Waste of resources and time.

byteme · Feb 23, 2012

Quote from NetTecture:

Actually no, you still waste.
More...

Still waste what?

See, they do NOT happen at the same time
More...

Are you saying that option quotes for multiple strikes do not change at the same time? Even when you have multiple auto-quoters running for multiple market-makers? Otherwise, you're talking about something different to what I asserted.

- a CPU core can only execute one hardware thread at the same time ANYWAY.
More...

Yes, and what's your point?

That is 1 thread per core on AMD, 2 on Intel with Hyperthreading. Called hardware. Having less threads in this case is less task switching, on top of a lot less wasted memory.
More...

You seem to think that kernel space threads are the same thing as Java threads and Erlang processes and CUDA kernels.

Please tell me what context-switching happens with shared-nothing Erlang processes? Tell me what kind of context switching occurs with CUDA kernels?

Everyone using 800 threads for something like option chain calculation...
More...

Who is everyone?

To re-iterate Java threads != Kernel threads. Nonetheless, the combination of a thread pool with limited size and a blocking queue for task execution is more than adequate for many use cases in Java.

better des not claim to be more than a junior developer. Waste of resources and time.
More...

Superiority complex much?

Joel Reymont · Apr 29, 2012

Do not use Erlang for computationally intensive stuff.

Do use Erlang to implement network protocols, data feeds, execution managers, etc.

Erlang is an interpreter at its core, although you can compile certain bits to native code. Still, you will be sorely disappointed if you are looking to blaze through backtesting, etc. with Erlang.

N.B. I make a very good living programming in Erlang, see

http://www.linkedin.com/in/joelreymont