In summary: Technical Analysis (TA) for building TA indicators; Fundamental Analysis (FA) plus Natural Language Processing (NLP) to convert fundamental economic news into FA indicators; Quantitative Analysis (QA) for risk management; and Machine Learning (ML) to build a predictive model from selected TA/FA indicators within the constraints of QA risk management. Together these components form my strategy discovery module. I also have an order execution module, implemented with a Java API, and a data collection module. The strategy discovery module uses R, Python, machine learning libraries, QuantLib, Scala, PostgreSQL and RabbitMQ, running on Hadoop, Spark and Ubuntu, either on a local virtual machine or on AWS. The order execution module is currently in Java; I am considering moving it to C++ for speed. It will be colocated for stability and backed up by a second order execution module on a different server, in case the first one fails. The data collection module is currently in Java and PostgreSQL, using RabbitMQ to communicate with the other modules. If I start monitoring stocks and the data grows, I will move it to Hadoop/Spark or Apache Ignite; for now I am only covering the E-mini S&P 500 and will add more futures soon, and PostgreSQL is enough for storing all the futures data. I do not have a risk management module yet. Current state: TA indicators are in SQL and will move to Scala in a few days; FA+NLP indicators do not exist yet, and a lot more work needs to be done; QA risk management does not exist yet; ML is in R and Python and is run manually. Any thoughts?
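Since the risk management module does not exist yet, here is a minimal sketch of what "ML within the constraints of QA risk management" could mean in practice: the model's trade signal only reaches the order execution module if it passes a few hard limits. The names `RiskLimits`, `AccountState` and `gate_signal`, and the limit values, are hypothetical placeholders, not part of the setup described above.

```python
from dataclasses import dataclass


@dataclass
class RiskLimits:
    # Hypothetical limits; real values would come from the (future) QA risk module.
    max_position: int = 2            # max contracts held at once, long or short
    max_daily_loss: float = 1000.0   # stop opening trades once the day's loss reaches this


@dataclass
class AccountState:
    position: int = 0        # current net position in contracts
    daily_pnl: float = 0.0   # realized + unrealized P&L for the day


def gate_signal(signal: int, state: AccountState, limits: RiskLimits) -> int:
    """Return the order allowed by the risk limits: +1 buy, -1 sell, 0 no order.

    `signal` is the ML model's call: +1 predicted up, -1 predicted down, 0 no trade.
    """
    if signal == 0:
        return 0
    if state.daily_pnl <= -limits.max_daily_loss:
        return 0  # daily loss limit hit: block every new order for the rest of the day
    if abs(state.position + signal) > limits.max_position:
        return 0  # this order would push the position past the cap
    return signal
```

Keeping the gate as a separate pure function means the same limits can be applied in backtests and in the live execution path.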
How will you validate anything coming out of this? What timeframe, or combination of timeframes, are you planning to look at? How much historical data do you have, and of what quality? How long do you estimate the entire ML process will take, and on how many compute engines? Maybe you could first try to build a profitable mechanical trading system manually, to gauge what it takes, before applying brute force... Also, there are probably lessons to be learned from other fields that use ML.
To illustrate the process, let's say I have two indicators: the moving average over the most recent 5 minutes > the moving average over the most recent 60 minutes, and a price increase over the most recent 10 minutes > 10%. The outcome to be predicted is whether the price will increase by $25 in the next 10 minutes, yes or no, so a binary outcome. I will have thousands of automatically created indicators like the example above and probably about 20 manually created popular TA indicators (e.g. MACD). Would anyone like to share a list of popular TA indicators? An indicator may not work alone but could work in combination with others. ML will select among these indicators and then build the predictive model from 18 months of 1-minute bar data. So the process is: 1. update all the thousands of indicators for the next prediction; 2. apply the predictive model, giving a yes/no prediction; 3. after 10 minutes, check the price to see whether the prediction was correct; 4. use the new actual price to refine the model (learn from the new data), then go back to step 1. I will monitor the ratio of correct to incorrect predictions in step 3. Validation: 6 months of data, with a prediction every 10 minutes. Timeframe: starting with 10 minutes; this may change later. Historical data: 18 months for training the model plus 6 months for validation. How long the whole cycle takes: it should be 10 minutes; I'm not sure yet how many nodes will run in parallel. Yes, I am doing it manually now and incrementally automating some parts. In this sort of problem, I spend 50% of my time understanding the trading domain (this understanding is paramount to success), 25% understanding and processing the data and creating indicators, 15% on engineering work (coding, testing, installing software, etc.), and 10% running machine learning experiments to find the right algorithms with the right hyperparameters.
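The two example indicators, the binary target and the four-step loop can be sketched on plain lists of 1-minute closes. This is a minimal sketch under several assumptions: "increase by $25" is read as the close 10 bars later being at least 25 price units above the current close, and `CountModel` is a hypothetical toy stand-in for whatever ML model is actually used (it just predicts the majority outcome seen so far for each combination of indicator values).

```python
def sma(prices, n):
    """Simple moving average of the last n prices, or None if there is not enough data."""
    if len(prices) < n:
        return None
    return sum(prices[-n:]) / n


def features(prices):
    """The two example indicators from the post, computed on 1-minute closes up to now."""
    ma5, ma60 = sma(prices, 5), sma(prices, 60)
    if ma60 is None or len(prices) < 11:
        return None
    return (
        int(ma5 > ma60),                           # MA(last 5 min) > MA(last 60 min)
        int(prices[-1] / prices[-11] - 1 > 0.10),  # price up more than 10% over the last 10 min
    )


def label(prices, t, horizon=10, target=25.0):
    """1 if the close `horizon` bars after bar t is at least `target` above the close at t, else 0."""
    return int(prices[t + horizon] - prices[t] >= target)


class CountModel:
    """Toy online classifier (hypothetical stand-in for a real ML model): for each
    combination of indicator values, predict the outcome seen more often so far."""

    def __init__(self):
        self.counts = {}  # feature tuple -> [number of 0 labels, number of 1 labels]

    def predict(self, x):
        no, yes = self.counts.get(x, (1, 0))  # unseen combination: predict "no"
        return int(yes > no)

    def update(self, x, y):
        self.counts.setdefault(x, [0, 0])[y] += 1


def walk_forward(prices, horizon=10, step=10):
    """Steps 1-4 from the post, replayed over historical bars. Returns (correct, total)."""
    model, correct, total = CountModel(), 0, 0
    for t in range(60, len(prices) - horizon, step):
        x = features(prices[: t + 1])   # step 1: update indicators with data up to bar t
        if x is None:
            continue
        pred = model.predict(x)         # step 2: apply the model
        y = label(prices, t, horizon)   # step 3: outcome, known only `horizon` bars later when live
        correct += int(pred == y)
        total += 1
        model.update(x, y)              # step 4: learn from the new observation
    return correct, total
```

The important property is ordering: each prediction uses only bars up to `t`, and the model is updated with the outcome only after the prediction has been scored, so the correct/incorrect ratio from step 3 is not contaminated by future data.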
IMHO, 6 months out of sample for 18 months of training is not a validation at all. Your process has to overcome a huge data-mining bias: of course some strategies will pass your 6-month out-of-sample (OOS) validation. I would suggest you do exactly what you described using historical data from 2010-2011, then, for your enlightenment, backtest the strategies that passed your validation criteria on 2012-2015.
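The data-mining bias point can be made concrete with a small simulation: generate many candidate "strategies" that have no edge at all, on a market with no predictable structure, and count how many beat a hit-rate threshold both in sample and out of sample. This is a hypothetical illustration, not the thread's data; the 1800:600 split only mirrors the 18:6 month ratio, and the threshold and candidate count are arbitrary.

```python
import random


def count_noise_passes(n_strategies=1000, n_in=1800, n_out=600, threshold=0.51, seed=0):
    """Count zero-edge strategies whose hit rate beats `threshold` both in sample
    and out of sample, i.e. strategies that pass validation purely by luck."""
    rng = random.Random(seed)
    n = n_in + n_out
    # A market with no predictable structure: each period's direction is a fair coin flip.
    moves = [rng.getrandbits(1) for _ in range(n)]
    passed = 0
    for _ in range(n_strategies):
        # Each candidate "strategy" calls up/down at random, so it has no real edge.
        calls = [rng.getrandbits(1) for _ in range(n)]
        hits = [c == m for c, m in zip(calls, moves)]
        hit_in = sum(hits[:n_in]) / n_in      # in-sample ("training") hit rate
        hit_out = sum(hits[n_in:]) / n_out    # out-of-sample ("validation") hit rate
        if hit_in > threshold and hit_out > threshold:
            passed += 1
    return passed
```

With these defaults, roughly 19% of candidates clear the in-sample bar and roughly 30% of those also clear the OOS bar, so on the order of 50-60 out of 1,000 pure-noise strategies are expected to "validate". With thousands of auto-generated indicators the number of lucky survivors grows accordingly, which is why a longer, untouched test period matters.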
I wish I could do that, but I don't have 5 years of data, only 24 months. Does anyone know where I can get ES 1-minute bar data going back to 2010? From IB, I can only get 2 years.
If your goal is to predict future prices, then it should be obvious that some technical indicators, like MA crossovers, are non-starters. Most indicators are designed to indicate future price direction, not future price levels.
I thought you'd already picked out "thousands" of them? Anyway, there are lists of the usual suspects all over the Web... have a party...
This sort of knowledge is not obvious to me; I need to learn a lot about trading. Thanks. My planned thousands of indicators are generic ones, auto-generated by applying combinations of functions: something like a moving average over the last 5 minutes, 10 minutes, 20 minutes, etc. In addition to those, I will code 20+ popular indicators manually. Thanks for the list; it would be better if someone with expertise in the field recommended them.
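Auto-generating generic indicators by combining functions with lookback windows can be sketched as a simple cross product. The post only mentions moving averages over 5, 10, 20... minutes; the other functions (`pstdev`, `max`, `min`), the exact window list, and the name `generate_indicators` are assumptions for illustration.

```python
from itertools import combinations, product
from statistics import mean, pstdev

# Hypothetical building blocks: only the moving average comes from the post;
# the other functions and the exact windows are assumptions for illustration.
FUNCTIONS = {"ma": mean, "std": pstdev, "max": max, "min": min}
WINDOWS = [5, 10, 20, 60]  # in bars, i.e. minutes on 1-minute data


def generate_indicators(closes):
    """Apply every (function, window) pair to the latest closes, then add binary
    'fast MA > slow MA' comparisons for every pair of moving-average windows."""
    feats = {}
    for (name, fn), w in product(FUNCTIONS.items(), WINDOWS):
        if len(closes) >= w:
            feats[f"{name}_{w}"] = fn(closes[-w:])
    ma_windows = [w for w in WINDOWS if f"ma_{w}" in feats]
    for fast, slow in combinations(ma_windows, 2):  # WINDOWS is sorted, so fast < slow
        feats[f"ma_{fast}_gt_ma_{slow}"] = int(feats[f"ma_{fast}"] > feats[f"ma_{slow}"])
    return feats
```

With 4 functions and 4 windows this already yields 16 raw values plus 6 crossover flags; the count grows multiplicatively as functions and windows are added, which is how "thousands" of indicators arise quickly.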
> better if someone with expertise in the field recommended them
The self-styled TA experts on the site tend to focus on Wyckoff / Hershey methods... have a look at the archives.