Just to explain further: the idea of programming is to find edges. We use R to collect data and test it to prove statistical edges. Once an edge is found, it can be traded any number of ways: manually, automated, or both. So say you have a theory that 30 NLs in currencies have the opposite effect they do in equities, i.e., a 30 NL pos confirm would indicate higher prices in, say, AAPL, but in the euro it might indicate a fade. You need to test this idea. A chart won't help you. So you write some code to get currency data, write some code for the signal, and test the signal over, let's say, the last five years. You can compare the signal to the noise in the data to make sure the relationship is not spurious. If the signal proves valid, you move on to optimizing it: that is, how you properly position size and execute the signal to maximize your MAE and minimize your MFE. Once this is done you have yourself a strategy. This is all statistical analysis. Programs like R or Python let you do this quickly and competently. The confidence you get from the data analysis is what gives you the conviction both to execute the trade and to trade maximum size given a certain risk constraint.
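A minimal sketch of the "compare the signal to the noise" step described above, in Python on synthetic data. The "30 NL" signal itself is not implemented here; the signal days and returns are made-up stand-ins, and the workflow shown is a generic permutation test, not necessarily the exact method the poster uses.

```python
# Sketch: test whether a signal's average return beats what pure noise
# would produce. Data and signal are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for ~5 years of daily currency returns (~1260 trading days).
returns = rng.normal(0.0, 0.006, 1260)

# Stand-in signal: True on days the (hypothetical) pattern fired.
signal = rng.random(1260) < 0.05

# Average return on signal days.
signal_ret = returns[signal].mean()

# Permutation test: shuffle the signal many times and count how often
# random day selection produces an edge at least this large. A small
# p-value suggests the relationship is not spurious.
n_perm = 5000
count = 0
for _ in range(n_perm):
    shuffled = rng.permutation(signal)
    if abs(returns[shuffled].mean()) >= abs(signal_ret):
        count += 1
p_value = count / n_perm
print(f"signal mean return: {signal_ret:.5f}, permutation p-value: {p_value:.3f}")
```

Here the synthetic signal has no real relationship to the returns, so the p-value should come out unimpressive; with a genuine edge it would be small.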
Not to nit-pick, and I might be exposing myself by saying this, but wouldn't you want to minimize your MAE and maximize your MFE?
Thank you for the in-depth answer. A follow-up question (which may be ignorant as well): how does a program account for the unknown? You are inputting the data you have available, but obviously not the data you don't have (i.e., "you don't know what you don't know"). What if it is this unknown data set that is actually causing the end effect, and how is that accounted for by a program? Or, asked another way, how do you confirm/isolate that the data you used was the only, or the key, meaningful variable in what is a multi-variable situation? In re-reading your original answer I guess this is where more in-depth statistical analysis comes in, but feel free to add any other comments. Thanks.
Yes, this is all the sort of thing you learn in statistics. The idea of taking a sample from a population is that, provided the data is Gaussian (normally distributed), a large enough sample should approximate the entire population. This is always the assumption you make, and there are many ways to test it to make sure it approximately holds. When you run a regression on data, you are parceling out the effects of the other variables: the interpretation of a given coefficient is the effect that variable has on the output, holding all other variables constant. So a simple linear regression is easy to interpret.
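A toy illustration of that "holding all other variables constant" point: in a multiple regression, each fitted coefficient recovers a variable's own effect even when the predictors are correlated with each other. All numbers below are made up for the demonstration.

```python
# Simulate y = 2*x1 - 1*x2 + noise, with x1 and x2 deliberately
# correlated, then fit by ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)   # correlated with x1 on purpose
y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta)  # roughly [0, 2, -1]: each variable's effect, the others held constant
```

Despite the correlation between x1 and x2, the regression parcels their effects apart; a simple pairwise correlation of y against x1 alone would not.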
Thanks again, last question (yes, undergrad stats was a while ago): where in this programming process is it determined that there is causation between two of the entities (such that an action/trade could be initiated), and not just a correlation (they move together, but one is not causing the other to move)? Thanks.
"...effects of the other variables." How does the programmer identify ALL the other variables? Back to the "you don't know what you don't know" issue. After the program has been run, a potential edge determined, and a strategy decided, aren't you still at the mercy of whether the original variables and parameters you entered were the significant ones? Thanks for your time; I'm done.
No, we never use the word causation. It can NEVER be known. The word we use is "association". For example: being overweight is associated with an increased risk of heart disease. Being overweight may or may not cause heart disease, but the data shows an association between the two.
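A minimal simulation of association without causation, under an assumed setup: a hidden confounder drives two variables, so they correlate strongly even though neither causes the other. The confounder and both series are purely synthetic.

```python
# Two variables that share a hidden driver correlate strongly,
# yet neither one causes the other.
import numpy as np

rng = np.random.default_rng(1)
n = 5000

confounder = rng.normal(size=n)           # unobserved common factor
a = confounder + 0.5 * rng.normal(size=n)
b = confounder + 0.5 * rng.normal(size=n)

r = np.corrcoef(a, b)[0, 1]
print(f"correlation: {r:.2f}")  # strong association, no causation either way
```

This is the "you don't know what you don't know" problem in miniature: if the confounder is not in your data set, all you can honestly report is the association between a and b.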
There is a lengthy process one undergoes to determine which variables to use in a model, which I'm sure you have long forgotten from your college stats class. I won't bore you by going through all the steps, but there is a step-by-step process for determining significance, omitted variables, and relevance to a given model. It's a very structured approach: no guessing involved, just math.
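One commonly taught piece of that structured process is significance-based variable selection. Below is a hedged sketch of backward elimination by t-statistic on synthetic data where only x1 truly matters; the full process the poster alludes to involves more than this (theory, diagnostics, omitted-variable checks), and the |t| > 2 cutoff here is just a conventional rule of thumb.

```python
# Backward elimination sketch: repeatedly drop the least significant
# variable until every remaining coefficient has |t| > 2.
import numpy as np

rng = np.random.default_rng(7)
n = 2000

x1 = rng.normal(size=n)          # the one relevant variable
x2 = rng.normal(size=n)          # pure noise
x3 = rng.normal(size=n)          # pure noise
y = 1.5 * x1 + rng.normal(size=n)

names = ["x1", "x2", "x3"]
cols = [x1, x2, x3]

while cols:
    X = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    t = beta / se
    weakest = int(np.argmin(np.abs(t[1:])))   # least significant, ignoring intercept
    if abs(t[1 + weakest]) > 2.0:
        break                                  # everything remaining is significant
    names.pop(weakest)
    cols.pop(weakest)

print("kept:", names)  # x1 should survive; the noise variables usually get dropped
```

Stepwise selection is itself debated in the statistics literature (it can overfit), which is why it is only one step among the significance, omitted-variable, and relevance checks mentioned above.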
Has anyone read this book yet about the Libor scandal? The Spider Network: The Wild Story of a Math Genius, a Gang of Backstabbing Bankers, and One of the Greatest Scams in Financial History, by David Enrich (March 2017).