Data science question: best or simplest ML/DL/NN approach

zenlot · Jun 14, 2020

Snuskpelle said:
You misunderstood "starting out". When you have no existing codebase, the choice is easy: Pick Python (or maybe R if you're a statistician or data scientist).

However, he's already on a Java codebase. There's Weka (which he's using), deeplearning4j, etc. Unless he needs some particular functionality that is missing and that he can't implement himself, he can get by.

As for Java "being for porn", lol... The JVM is a very mature runtime system and regarding GC I would pick it instead of .NET for some types of demanding backend applications.

Anyway, @arbitech makes a good point. Usually, it comes down to having data (features) that actually have predictive power. You can then apply comparatively simple algorithms. It is simply not that likely that a novel algorithm written in Python will suddenly appear, revolutionize trading, and make the time lag to a Java implementation a problem.

The JVM itself is great, I agree on that, but my point was about Java as a language (JVM aside). I am a huge fan of Clojure, which runs on the JVM; apart from being a functional language in the Lisp family, it lets you access Java libs if you need them.

kmiklas · Jun 14, 2020

zenlot said:
Talking only about AI/ML/DL... Python has taken over. It's not just for starting out, it's for the complete solution. I am talking here about this particular space, not about the other areas where C# and Java shine.
AI/ML/DL... - Python
Trading systems, execution... - C++, C#
Kernel development - C
Distributed systems - Go
Porn - Java

Plenty of other use cases and languages. You have to pick a language which suits the task best.

Why Java for porn?

Traderesque · Jun 20, 2020

user1 said:
QUESTION: Given historical time-series data containing examples of successful trade executions for training, what is the best machine-learning/deep-learning/neural-network technique/algorithm/approach for adding a feature to an existing trading system, specifically to manage a trade after execution (once the position is opened) so as to close the position with the maximum profit or the minimum loss?

For what you are looking for I would treat it as a binary classification problem:
The two classes:
1: either "Buy to Exit" if your initial position was short or "Sell to Exit" if your initial position was long
0: Do nothing.

From here on I will assume your initial position was long - for a short position just do the reverse.

At each instant t (after the position is established) your model will predict the probability that you should "Sell to Exit", given some features that are available at time t-1.

Now, the task for you is to think about what features to compute - you could look at features that capture momentum, or reversal signals (I'm sure you will have some technical indicator library already available for Java - in Python I use TA-lib - look at the list of indicators they have and make your selections).
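A minimal sketch of the two kinds of feature mentioned above, in plain Python so it runs without TA-lib. The function names `momentum` and `simple_rsi` are hypothetical, and `simple_rsi` uses plain sums over the window rather than the smoothed averages a library indicator would typically use, so its values will not match a library's RSI exactly.

```python
from typing import List, Optional


def momentum(closes: List[float], lookback: int) -> List[Optional[float]]:
    """Rate of change: close[t] / close[t - lookback] - 1.

    Entries with insufficient history are None.
    """
    out: List[Optional[float]] = [None] * len(closes)
    for t in range(lookback, len(closes)):
        out[t] = closes[t] / closes[t - lookback] - 1.0
    return out


def simple_rsi(closes: List[float], period: int) -> List[Optional[float]]:
    """Unsmoothed RSI over the last `period` price changes.

    100 * gains / (gains + losses); high values hint at an overbought
    market (a possible reversal signal), low values at an oversold one.
    """
    out: List[Optional[float]] = [None] * len(closes)
    for t in range(period, len(closes)):
        gains = 0.0
        losses = 0.0
        for i in range(t - period + 1, t + 1):
            change = closes[i] - closes[i - 1]
            if change > 0:
                gains += change
            else:
                losses -= change
        total = gains + losses
        # A completely flat window has no direction; report the midpoint.
        out[t] = 50.0 if total == 0 else 100.0 * gains / total
    return out
```

Each function returns one value per bar, so the outputs can be used directly as columns of the training dataset.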

The next step is you need to decide your unit of analysis - by which I mean, you need to be clear on what a single row in your training dataset (for the ML model) represents - is it a time-based bar like 1-min bar or 15-min bar etc... or is it a volume-bar?

Once you decide on the unit of analysis (in my case I almost always default to using volume bars; as an intro to this area see) you need to start assembling your training dataset - with features (predictors/explanatory variables) in columns and volume or time bars in rows.
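To make the idea of a volume bar concrete, here is a rough sketch that groups raw ticks into bars which close once a fixed amount of volume has traded. The `Bar` class, the `volume_bars` function and the `(price, size)` tick format are all assumptions made for illustration; a data feed that serves volume bars directly would replace this step.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: int


def volume_bars(ticks: Iterable[Tuple[float, int]], bar_volume: int) -> List[Bar]:
    """Group (price, size) ticks into bars of roughly `bar_volume` each.

    Simplifications: a tick is never split across bars, so the tick that
    crosses the threshold closes the bar and a bar may slightly exceed
    `bar_volume`; an incomplete trailing bar is dropped.
    """
    bars: List[Bar] = []
    current: Optional[Bar] = None
    for price, size in ticks:
        if current is None:
            current = Bar(price, price, price, price, 0)
        current.high = max(current.high, price)
        current.low = min(current.low, price)
        current.close = price
        current.volume += size
        if current.volume >= bar_volume:
            bars.append(current)
            current = None
    return bars
```

Unlike time bars, each row produced this way represents about the same amount of trading activity, whether it took seconds or hours to form.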

The next step is to create a label or target variable (this is the dependent variable that you are hoping to predict using the predictors/explanatory variables mentioned above). Note that this is a 0/1 binary variable: 1 means you are telling the model that you should exit your long position, and 0 means do nothing - let your profit run or, if the position is drawing down, accept that it goes further against you (a questionable approach, as it assumes you are trading without a stop loss). Then you need to decide how you are going to split your dataset so you can use some of the data to validate/test your model.
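The dataset assembly and split can be sketched as follows. The labelling rule used here (label 1 when the close `horizon` bars ahead is below the current close, meaning that exiting the long now would have been better) is only a placeholder assumption, not a rule from this thread; `build_dataset` and `chronological_split` are hypothetical names. The two points the sketch tries to show are that the features for row t come from bar t-1, and that the split keeps time order instead of shuffling.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[List[float], int]


def build_dataset(
    closes: Sequence[float],
    features: Sequence[Sequence[Optional[float]]],
    horizon: int,
) -> List[Row]:
    """Build (features at t-1, label at t) rows for a long position.

    Placeholder label: 1 (exit) if close[t + horizon] < close[t], else 0.
    Rows whose features are not yet available (None) are skipped.
    """
    rows: List[Row] = []
    for t in range(1, len(closes) - horizon):
        x = features[t - 1]  # only information known before bar t
        if any(v is None for v in x):
            continue
        y = 1 if closes[t + horizon] < closes[t] else 0
        rows.append(([float(v) for v in x], y))
    return rows


def chronological_split(rows: List[Row], train_fraction: float) -> Tuple[List[Row], List[Row]]:
    """Split without shuffling: the earliest rows train, the latest rows test."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]
```

Shuffling before the split would let the model train on bars that come after the ones it is tested on, which makes the test result look better than it would be live.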

Then you can use a logistic regression model (one of the simplest techniques in ML) to predict the target variable of interest. It simply takes the input Xs you derived and produces a probability figure. Then you can use the ROC curve (the curve behind the Area Under the Curve metric) to work out the best cut-off to take action - this cut-off is what allows you to map the probability figure between 0 and 1 to a 0/1 binary value.
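For illustration, here is a small self-contained logistic regression fitted by batch gradient descent, followed by a cut-off search. The cut-off rule is an assumption: it picks the point on the ROC curve that maximizes TPR - FPR (Youden's J), which is one common choice among several. All function names and the default learning rate and epoch count are hypothetical; in practice a library implementation would normally be used instead.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X: Sequence[Sequence[float]], w: Sequence[float], b: float) -> List[float]:
    """Probability of class 1 (exit) for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def best_cutoff(probs: Sequence[float], y: Sequence[int]) -> float:
    """Threshold maximizing TPR - FPR over all candidate thresholds."""
    positives = sum(y)
    negatives = len(y) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one example of each class")
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(probs)):
        tp = sum(1 for p, yi in zip(probs, y) if p >= t and yi == 1)
        fp = sum(1 for p, yi in zip(probs, y) if p >= t and yi == 0)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_t, best_j = t, j
    return best_t
```

The cut-off should be chosen on validation data rather than on the rows used for fitting, otherwise it will be tuned to noise in the training set.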

I would recommend reading Advances in Financial Machine Learning: Lecture 10/10 (Presentation Slides) before you take any concrete steps - I have bought his book but much of the information is already available via various blogs and presentations. As for me, there are two key takeaways from that book that I'm using to establish my dataset for ML. One is working with Volume Bars (it helps that I have access to DTN IQ Feed, where I can bring up volume bars for almost any instrument - I mainly work with Futures though), and the other is his idea of the Triple Barrier Method for labelling instances for ML algorithms. I have a version of that labelling criteria implemented (not exactly what he wrote in the book, but it shares some characteristics with his approach) - you can check out https://mlfinlab.readthedocs.io/en/latest/
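As a rough illustration of the triple-barrier idea, the sketch below labels an entry by whichever of three barriers is touched first: a profit-taking level above, a stop-loss level below, or a time limit. This is a simplified version with fixed percentage barriers, written for this thread; it is not the book's procedure, not the poster's implementation, and not the mlfinlab API. The function name and parameters are hypothetical.

```python
from typing import Sequence


def triple_barrier_label(
    closes: Sequence[float],
    entry_index: int,
    profit_take: float,
    stop_loss: float,
    max_holding: int,
) -> int:
    """Label a long entry by the first barrier touched.

    Returns +1 if the upper (profit-taking) barrier is hit first,
    -1 if the lower (stop-loss) barrier is hit first, and 0 if neither
    is hit within `max_holding` bars (the vertical barrier).
    `profit_take` and `stop_loss` are fractions of the entry price.
    """
    entry = closes[entry_index]
    upper = entry * (1.0 + profit_take)
    lower = entry * (1.0 - stop_loss)
    last = min(entry_index + max_holding, len(closes) - 1)
    for t in range(entry_index + 1, last + 1):
        if closes[t] >= upper:
            return 1
        if closes[t] <= lower:
            return -1
    return 0
```

For the exit model discussed above, one possible mapping (again an assumption) is to treat -1 as "exit" (class 1) and the other outcomes as "do nothing" (class 0).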

All the best.

guowei58 · Jun 23, 2020

arbitech said:
GARBAGE IN GARBAGE OUT!

Dude, I am going to save you a lot of time...

If you are able to find good features (inputs), you won't even need ML!

I spent last year on ML... it’s just a piece of shit...
From DQN to LSTM...
The majority of quant articles are written by charlatans... except maybe those on microstructure alpha... but with the bid-ask spread, don’t even think about trying to capture that alpha.

All things being equal, on horizons of less than 2-3 minutes it’s possible to forecast price variation with around 70% accuracy and a bit more... just with linear statistics. Of course you can add your beloved ML, but it will just be the cherry on the cake.

I am a Java guy, but I switched to Python for ML... development is much, much faster despite Python being an awful language...

My biggest disappointment is maybe deep reinforcement learning... despite it looking promising...

That being said, if someone has found something that works with ML on timeframes above 30 minutes... please let me know...

Agreed. Linear regression is powerful and general enough to work well in most cases.

user1 · Jun 24, 2020

Girija - Thank you very much!
traider - Thanks for the input.
Traderesque - A great, great thank you!
Snuskpelle - Thank you
kmiklas - Thank you
zenlot - Thank you, very educational.
arbitech - Thank you
guowei58 - Thank you

As for choice of language, it's really agnostic. This is not HFT.

931 · Jul 15, 2020

arbitech said:
GARBAGE IN GARBAGE OUT!

Garbage is actually gold at the beginning, for developing a strong data-filtering algo.
I found plenty of "market inefficiencies" when using "garbage" data.
Building those data filters is the tedious part.

arbitech said:
on horizons of less than 2-3 minutes it’s possible to forecast price variation with around 70% accuracy and a bit more... just with linear statistics.

Without considering spread probably...
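A quick back-of-the-envelope check of why the spread matters for a directional hit rate. All numbers in the usage note are made up for illustration, and the function name is hypothetical; the sketch only shows the arithmetic of expected profit per trade once a round-trip cost is subtracted.

```python
def expected_pnl_per_trade(hit_rate: float, avg_win: float,
                           avg_loss: float, round_trip_cost: float) -> float:
    """Expected profit per trade, in the same units as the inputs (e.g. ticks).

    hit_rate * avg_win - (1 - hit_rate) * avg_loss - round_trip_cost
    """
    return hit_rate * avg_win - (1.0 - hit_rate) * avg_loss - round_trip_cost
```

With a 70% hit rate and wins and losses of 1 tick each, the edge before costs is 0.7 - 0.3 = 0.4 ticks per trade, so any round-trip cost above 0.4 ticks turns the expectation negative, and paying a full tick of spread gives 0.4 - 1.0 = -0.6 ticks per trade.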

931 · Jul 15, 2020

Don't implement what gets taught at universities, what is hyped up, etc.; too many people have already tested those...
If too many people do the same thing, there is no advantage: it's a zero-sum game.