The JVM itself is great, I agree on that, but my point was about Java as a language (JVM aside). I am a huge fan of Clojure, which runs on the JVM; apart from being a functional language in the Lisp family, it lets you access Java libs if you need them.
For what you are looking for, I would treat it as a binary classification problem with two classes:

1: "Buy to Exit" if your initial position was short, or "Sell to Exit" if your initial position was long
0: do nothing

From here on I will assume your initial position was long; for a short position just do the reverse. At each instant in time t (after the position is established) your model will predict the probability that you should "Sell to Exit", given features that are available at time t-1.

Now, the task for you is to think about what features to compute. You could look at features that capture momentum or reversal signals. I'm sure you will have some technical indicator library already available for Java; in Python I use TA-Lib. Look at the list of indicators they have and make your selections.

The next step is to decide your unit of analysis. By that I mean you need to be clear on what a single row in your training dataset (for the ML model) represents: is it a time-based bar, like a 1-min or 15-min bar, or is it a volume bar? (In my case I almost always default to volume bars; as an intro to this area see)

Once you decide on the unit of analysis, you need to start assembling your training dataset, with features (predictors/explanatory variables) in columns and volume or time bars in rows.

The next step is to create a label or target variable. This is the dependent variable that you are hoping to predict using the predictors mentioned above. Note that it is a 0/1 binary variable: 1 means you are telling the model that you should exit your long position; 0 means do nothing, that is, let your profit run or, if you are drawing down, accept that the position goes against you (a questionable approach, as it assumes you are trading without a stop loss).
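The dataset assembly described above could be sketched roughly as follows, in plain Python. Everything named here is an assumption for illustration: the `Bar` class, the `volume_bars` and `build_dataset` helpers, the two toy features (momentum and bar range) and, above all, the labelling rule (label 1 if the close drops more than `drop` below the last known close within `horizon` bars). It is not TA-Lib and not anyone's production labelling scheme.

```python
from dataclasses import dataclass

@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float

def volume_bars(ticks, bar_volume):
    """Group (price, size) ticks into bars holding at least bar_volume each."""
    bars, prices, vol = [], [], 0.0
    for price, size in ticks:
        prices.append(price)
        vol += size
        if vol >= bar_volume:
            bars.append(Bar(prices[0], max(prices), min(prices), prices[-1], vol))
            prices, vol = [], 0.0
    return bars  # a trailing incomplete bar is dropped

def build_dataset(bars, lookback=3, horizon=2, drop=0.002):
    """One row per decision time t; features only use bars up to t-1."""
    closes = [b.close for b in bars]
    rows = []
    for t in range(lookback + 1, len(bars) - horizon + 1):
        prev = closes[t - 1]
        # Feature 1: momentum over the last `lookback` bars ending at t-1.
        momentum = prev / closes[t - 1 - lookback] - 1.0
        # Feature 2: high-low range of bar t-1 relative to its close.
        bar_range = (bars[t - 1].high - bars[t - 1].low) / prev
        # Hypothetical label for a long position: 1 = "Sell to Exit" would
        # have been right, because price fell by more than `drop` soon after.
        future_low = min(closes[t:t + horizon])
        label = 1 if future_low < prev * (1.0 - drop) else 0
        rows.append(([momentum, bar_range], label))
    return rows

# Toy ticks: (price, size). With bar_volume=100 every two ticks form a bar.
ticks = [(100.0, 50), (101.0, 50), (101.0, 50), (102.0, 50),
         (102.0, 50), (103.0, 50), (103.0, 50), (104.0, 50),
         (104.0, 50), (100.0, 50), (100.0, 50), (99.0, 50)]
bars = volume_bars(ticks, bar_volume=100)
rows = build_dataset(bars)
for features, label in rows:
    print(features, label)
```

The point to notice is the indexing: the features stop at bar t-1 while the label looks forward from t, so nothing from the future leaks into the predictors.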
Then you need to decide how you are going to split your dataset so you can use some of the data to validate/test your model. Then you can use a logistic regression model (one of the simplest techniques in ML) to predict the target variable of interest. It simply takes the input Xs you derived and produces a probability figure. Then you can use the ROC curve (the curve behind the Area Under the Curve metric) to work out the best cut-off for taking action. This cut-off is what allows you to map the probability figure between 0 and 1 to a 0/1 binary value.

I would recommend reading Advances in Financial Machine Learning: Lecture 10/10 (Presentation Slides) before you take any concrete steps. I have bought his book, but much of the information is already available via various blogs and presentations. As for me, there are two key takeaways from that book that I'm using to establish my dataset for ML. One is working with volume bars (it helps that I have access to DTN IQ Feed, where I can bring up volume bars for almost any instrument; I mainly work with futures, though). The other is his idea of the Triple Barrier Method for labelling instances for ML algorithms. I have a version of that labelling criterion implemented (not exactly what he wrote in the book, but it shares some characteristics with his approach). You can check out https://mlfinlab.readthedocs.io/en/latest/

All the best.
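To make the split / logistic regression / cut-off steps concrete, here is a minimal pure-Python sketch. The helper names (`fit_logistic`, `predict_proba`, `best_cutoff`), the synthetic one-feature data, the 70/30 chronological split and the use of Youden's J (TPR minus FPR) to pick the point on the ROC curve are all my assumptions for illustration; in practice you would more likely reach for scikit-learn's `LogisticRegression` and `roc_curve`.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Plain batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def best_cutoff(probs, y):
    """Walk the ROC curve and keep the threshold maximising TPR - FPR.
    Assumes y contains at least one 0 and at least one 1."""
    pos = sum(y)
    neg = len(y) - pos
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(probs)):
        tp = sum(1 for p, yi in zip(probs, y) if p >= t and yi == 1)
        fp = sum(1 for p, yi in zip(probs, y) if p >= t and yi == 0)
        j = tp / pos - fp / neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t

# Synthetic, time-ordered rows: one feature, label 1 when the feature is positive.
X = [[(-1) ** i * (0.5 + 0.1 * i)] for i in range(20)]
y = [1 if row[0] > 0 else 0 for row in X]

# Chronological split: never shuffle time-series rows before splitting.
split = int(len(X) * 0.7)
X_train, y_train = X[:split], y[:split]
X_val, y_val = X[split:], y[split:]

model = fit_logistic(X_train, y_train)
probs = predict_proba(model, X_val)
cutoff = best_cutoff(probs, y_val)
signals = [1 if p >= cutoff else 0 for p in probs]  # 1 = "Sell to Exit"
print(cutoff, signals)
```

Picking the cut-off on held-out rows rather than on the training rows is deliberate: a threshold tuned on the data the model was fitted to will look better than it really is.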
Girija - Thank you very much!
traider - Thanks for the input.
Traderesque - A great, great thank you!
Snuskpelle - Thank you
kmiklas - Thank you
zenlot - Thank you, very educational.
arbitech - Thank you
guowei58 - Thank you

As for choice of language, it really doesn't matter. This is not HFT.
Garbage data is actually gold at the beginning, for developing a strong data-filtering algo. I found plenty of "market inefficiencies" when using "garbage" data. Building those data filters is the tedious part. Without considering spread, probably...
Don't implement what gets taught at universities, what is hyped up, etc.; too many people have already tested those. If too many people do the same thing there is no advantage: it's a zero-sum game.