Global, One of the downsides of human comprehensibility is the required dumbing down of what is a very complex process. The human mind can deal efficiently with 7 to 10 components/factors in a situation requiring evaluation or judgement. This is probably because, for most of the last 5,000,000 years of our evolution, life was pretty simple: is that cave going to be damp in the winter? Is that creature good to eat, or will it try to eat me? Is that life form good to have sex with, or not? Not a lot of choices or complexity. So for the moment we are stuck with a brain that, when given a domain with hundreds or thousands of interrelated factors, like financial markets, can't do much until it groups, summarizes, ignores, or in some way whittles the whole thing down to 7 to 10 factors before proceeding. For more, read any good text on cognitive neuroscience.

However, we are not stuck with our brains. Software exists that can model 3,000, 10,000, or more variables at the same time without the summarizing, fuzzifying process a human brain would require. Overfitting is a problem for folks who play with powerful software without the required skill, knowledge, or training, like letting middle schoolers play with shoulder-fired surface-to-air missiles: they are as likely to shoot up the schoolhouse as take down an enemy aircraft. In very simple terms, overfitting is avoided by rigidly segregating the learning, training, and test data sets during model development.

Definitions:
- Learning set: the time series used to postulate and discover model components, interrelationships, functions, and factors.
- Training set: the time series used to fit the model by weighting component interrelationships.
- Test set: the application of the trained model to a previously unseen time series to determine likely real-world performance.
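To make the segregation concrete, here is a minimal C# sketch of a strict chronological split. The Bar record and the split fractions are illustrative assumptions, not a prescription.

using System;
using System.Collections.Generic;
using System.Linq;

public record Bar(DateTime Time, double Close);

public static class DataSplit
{
    // Split a time series chronologically so no future data leaks into
    // earlier stages of model development.
    public static (IReadOnlyList<Bar> Learning, IReadOnlyList<Bar> Training, IReadOnlyList<Bar> Test)
        Split(IReadOnlyList<Bar> series, double learnFrac = 0.4, double trainFrac = 0.3)
    {
        int learnEnd = (int)(series.Count * learnFrac);
        int trainEnd = learnEnd + (int)(series.Count * trainFrac);

        var learning = series.Take(learnEnd).ToList();                           // discover components/factors
        var training = series.Skip(learnEnd).Take(trainEnd - learnEnd).ToList(); // weight interrelationships
        var test     = series.Skip(trainEnd).ToList();                           // touched once, for a real-world estimate
        return (learning, training, test);
    }
}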
I like having Alpha.[ModelName] for namespace organization. All the alpha models exist as their own classes (with subclasses if needed). I like it because:
- I get "full" functionality to create a model. Whatever is possible in the language/platform/framework, I can use.
- Alpha models are isolated from each other. They re-use functionality from other areas (execution, transaction cost estimators, risk, etc.), or subclass/polymorph if they need something custom.
- Bringing alpha models online and offline is relatively trivial.
- Big alpha models and little ones can coexist peacefully.
- With the alpha models grouped into the same application, it's pretty easy to say "strat x isn't allowed to open positions on an instrument that strat y is already working with", etc.
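For what it's worth, a rough C# sketch of that layout, assuming a hypothetical AlphaModel base class and made-up signal/snapshot types; the shared services (execution, risk, etc.) are only hinted at.

using System.Collections.Generic;

namespace Alpha
{
    public record Signal(string Instrument, double TargetWeight);
    public record MarketSnapshot(IReadOnlyDictionary<string, double> LastPrices);

    // Shared contract; the engine only ever sees this base class.
    public abstract class AlphaModel
    {
        public abstract string Name { get; }
        public abstract IEnumerable<Signal> GenerateSignals(MarketSnapshot snapshot);
    }
}

namespace Alpha.Momentum
{
    // One self-contained model per namespace, isolated from its siblings;
    // it can reuse shared services or subclass if it needs something custom.
    public sealed class CrossSectionalMomentum : AlphaModel
    {
        public override string Name => "Momentum.CrossSectional";

        public override IEnumerable<Signal> GenerateSignals(MarketSnapshot snapshot)
        {
            // ...model-specific logic lives here...
            yield break;
        }
    }
}

Bringing a model online then amounts to registering one more AlphaModel instance with the engine, and taking it offline means removing it.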
You do not need to describe your strategy in script code. But you can write scripts in the language of your choice, store them as strings, and have them compiled and evaluated at run time. C#, like many other languages, offers a variety of options (Roslyn, ScriptCS, CSScript, ...).
You can store the whole core logic as a script in a string and have it compiled at run time via Roslyn or the other options I outlined above. The benefit is that it is human readable and can easily be swapped in and out; you can even change the core logic in a text editor, load it in as a string, and have it evaluated with the changes. Obviously, this would not be an option if you cared about code theft. In that case you would develop the core strategy inside a library and obfuscate it. I do not understand the original motivation of the OP, but the choices are endless either way.
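As a minimal sketch of that idea with the Roslyn scripting API (the Microsoft.CodeAnalysis.CSharp.Scripting package): the globals shape and the rule text below are made up for illustration.

using System;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis.CSharp.Scripting;

public class SignalGlobals
{
    public double Fast;   // fast moving average
    public double Slow;   // slow moving average
}

public static class ScriptedStrategy
{
    public static async Task Main()
    {
        // Human-readable rule; it could be loaded from a file or database
        // and swapped at run time without recompiling the host application.
        string rule = "Fast > Slow ? 1 : (Fast < Slow ? -1 : 0)";

        int signal = await CSharpScript.EvaluateAsync<int>(
            rule, globals: new SignalGlobals { Fast = 101.2, Slow = 100.7 });

        Console.WriteLine($"Signal: {signal}"); // prints: Signal: 1
    }
}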
In the interests of balance, and in praise of the simpler approach, I will share a few anecdotes from my professional experience of systematic trading with multi-billion-dollar portfolios.

The first story occurs in 2011 when, as you'll recall, there was an earthquake in Japan. On Friday there hadn't been much news, but by Sunday it was clear the damage was much deeper than expected. I was part of a group of PMs each covering a different asset class, and we had a conference call with the CEO and Chief Risk Officer to decide what to do next, depending on how the Japanese markets opened Monday. Some of the PMs had absolutely no idea how their models would react conditional on price changes, because the complex non-linear interactions were just way beyond what someone could work out intuitively or on the back of an envelope. When the markets opened, in some cases with very large moves, the non-linear models did some really strange things. We completely lost trust in the fancy models. After that a decision was taken to strip the models back to their simplest elements. We lost about 5% of the in-sample performance, but probably almost nothing out of sample.

The second story concerns a really smart recent PhD whom we interviewed for a job. After telling us about some fancy fitting technique he'd used on some data in his thesis, he was asked the killer question: how many degrees of freedom? He didn't know. When he eventually worked it out, it was obvious that the model was horribly overfitted. A relatively stupid second-year undergraduate using simple statistical techniques would have been able to avoid this mistake.

The final story is about a really smart PhD whom we did hire. He spent about six months with some really complex fitting tools and managed to come up with a six-parameter non-linear model. Nobody could understand it, or predict exactly what it would do given a particular price movement, and it was 99% correlated to the much simpler model we already had. I left the shop shortly afterwards, so I can't tell you how that story pans out.
Agreed with some of this. I use injection (visitor pattern), but I think simple is best: leave the logic in the code. I realize you can eval the JSON or XML, but then why stop there? You could also write a DSL, dink around with LLVM, or build a visual tool that links logic together.
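For contrast with the scripting approach, a small sketch of keeping the logic in compiled code and injecting it (plain constructor injection here, not a full visitor implementation); the interface and class names are illustrative.

using System;

public interface ISignalLogic
{
    int Decide(double fast, double slow);
}

public sealed class MovingAverageCross : ISignalLogic
{
    // The core rule stays compiled, testable and refactorable like any other code.
    public int Decide(double fast, double slow) =>
        fast > slow ? 1 : (fast < slow ? -1 : 0);
}

public sealed class Engine
{
    private readonly ISignalLogic _logic;
    public Engine(ISignalLogic logic) => _logic = logic;   // injected at construction

    public void OnBar(double fast, double slow) =>
        Console.WriteLine($"Signal: {_logic.Decide(fast, slow)}");
}

// Usage: new Engine(new MovingAverageCross()).OnBar(101.2, 100.7);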
So your models could not handle a Black Swan event? Why not just develop separate models for a large earthquake, a very large earthquake, a very very large earthquake with tsunami, an asteroid impact destroying 1/5 of the Earth's surface, an outbreak of Zombies, and so on? We have them. On slow days I love to play around with them by entering event conditionals like "tidal waves destroy London and Tokyo AND a Zombie outbreak kills everyone in New Jersey".
Not sure I follow what you are saying. I simply stated that the simplest way to swap core strategy code in and out at run time, while keeping it human readable, is via script. Maybe the following, as one of many choices, makes it clearer what I was trying to say:
http://www.csscript.net/help/script_hosting_guideline_.html
http://scottksmith.com/blog/2013/05/08/getting-started-with-scriptcs/
http://blogs.msdn.com/b/csharpfaq/archive/2011/12/02/introduction-to-the-roslyn-scripting-api.aspx