Data science question: best or simplest ML/DL/NN approach

Discussion in 'App Development' started by user1, Jun 13, 2020.

  1. zenlot

    The JVM itself is great, I agree on that, but my point was about Java as a language (JVM aside). I am a huge fan of Clojure, which runs on the JVM; apart from being a functional language in the Lisp family, it lets you access Java libraries if you need them.
     
    #11     Jun 14, 2020
  2. kmiklas

    Why Java for porn?
     
    #12     Jun 14, 2020
  3. For what you are looking for I would treat it as a binary classification problem:
    The two classes:
    1: either "Buy to Exit" if your initial position was short or "Sell to Exit" if your initial position was long
    0: Do nothing.

    From here on I will assume your initial position was long; for a short position, just do the reverse.

    At each point in time t (after the position is established) your model will predict the probability that you should "Sell to Exit", given some features that are available at time t-1.

    Now, the task for you is to think about what features to compute. You could compute features that capture momentum or reversal signals. I'm sure you will have some technical indicator library already available for Java; in Python I use TA-Lib. Look at the list of indicators they have and make your selections.
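
    As a rough illustration of the kind of features meant here, the sketch below computes one momentum feature (rate of change) and one reversal-style feature (an RSI built from plain, unsmoothed averages, which is a simplification of the usual Wilder-smoothed RSI). It deliberately avoids any indicator library; the function names, periods, and sample prices are all made up for the example. Each feature is lagged by one bar so that row t only uses information available at t-1.

```python
from typing import List, Optional


def rate_of_change(closes: List[float], period: int) -> List[Optional[float]]:
    """Momentum feature: fractional price change over `period` bars."""
    out: List[Optional[float]] = [None] * len(closes)
    for i in range(period, len(closes)):
        out[i] = closes[i] / closes[i - period] - 1.0
    return out


def simple_rsi(closes: List[float], period: int) -> List[Optional[float]]:
    """Reversal-style feature: RSI from unsmoothed sums of gains and losses."""
    out: List[Optional[float]] = [None] * len(closes)
    for i in range(period, len(closes)):
        gains = losses = 0.0
        for j in range(i - period + 1, i + 1):
            change = closes[j] - closes[j - 1]
            if change > 0:
                gains += change
            else:
                losses -= change
        if gains + losses == 0:
            out[i] = 50.0   # flat window: treat as neutral (assumption)
        elif losses == 0:
            out[i] = 100.0  # only gains in the window
        else:
            out[i] = 100.0 - 100.0 / (1.0 + gains / losses)
    return out


def lag(values: list, k: int = 1) -> list:
    """Shift a feature by k bars so row t only sees values from t-k."""
    return [None] * k + values[:-k]


# Hypothetical closing prices, one per bar.
closes = [100.0, 101.0, 102.0, 101.0, 103.0, 104.0]
roc = rate_of_change(closes, 2)
rsi = simple_rsi(closes, 3)
features = list(zip(lag(roc), lag(rsi)))  # one (roc, rsi) row per bar
```

    Rows that still contain None (the warm-up bars) would be dropped before training.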

    The next step is to decide your unit of analysis. By that I mean you need to be clear on what a single row in your training dataset (for the ML model) represents: is it a time-based bar, like a 1-min or 15-min bar, or is it a volume bar?

    Once you decide on the unit of analysis (in my case I almost always default to volume bars), you need to start assembling your training dataset, with features (or predictors/explanatory variables) in columns and volume or time bars in rows.
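
    For anyone who has not met volume bars before, here is a minimal sketch of how ticks could be grouped into them. It assumes ticks arrive as (price, size) pairs and closes a bar as soon as the accumulated size reaches a threshold; a tick that overshoots the threshold is not split across bars, and a trailing partial bar is dropped. The function name and the sample ticks are hypothetical.

```python
from typing import Dict, List, Tuple


def volume_bars(ticks: List[Tuple[float, int]], bar_volume: int) -> List[Dict[str, float]]:
    """Group (price, size) ticks into bars holding roughly `bar_volume` units each."""
    bars: List[Dict[str, float]] = []
    prices: List[float] = []
    volume = 0
    for price, size in ticks:
        prices.append(price)
        volume += size
        if volume >= bar_volume:
            # Close the bar; an overshooting tick is kept whole (simplification).
            bars.append({
                "open": prices[0],
                "high": max(prices),
                "low": min(prices),
                "close": prices[-1],
                "volume": volume,
            })
            prices, volume = [], 0
    return bars  # any unfinished bar at the end is discarded


# Hypothetical tick data: (price, size).
ticks = [(100.0, 40), (100.5, 70), (99.5, 30), (101.0, 90), (100.0, 10)]
bars = volume_bars(ticks, bar_volume=100)
```

    Each resulting bar becomes one row of the training dataset, with the features computed on the bar closes.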


    The next step is to create a label or target variable (this is the dependent variable that you are hoping to predict using the predictors/explanatory variables mentioned above). Note this is a 0/1 binary variable: 1 means you are telling the model that you should exit your long position; 0 means do nothing, that is, let your profit run or, if the position is drawing down, accept that it goes against you (a questionable approach, as it assumes you are trading without a stop loss). Then you need to decide how you are going to split your dataset so you can use some of the data to validate/test your model.
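
    A small sketch of these two steps follows. The labelling rule is only an assumption made for illustration (label 1 when the close is lower a fixed number of bars later, so exiting the long would have been the better call); the post does not prescribe a specific rule. The split keeps time order, so the model is tested on bars that come after the ones it was trained on. All names and prices are hypothetical.

```python
from typing import List, Optional, Tuple


def exit_labels(closes: List[float], horizon: int) -> List[Optional[int]]:
    """1 = exit the long now (price is lower `horizon` bars later), 0 = do nothing."""
    labels: List[Optional[int]] = [None] * len(closes)
    for t in range(len(closes) - horizon):
        labels[t] = 1 if closes[t + horizon] < closes[t] else 0
    return labels  # the last `horizon` bars have no label yet


def chronological_split(rows: list, train_fraction: float = 0.7) -> Tuple[list, list]:
    """Keep time order: the earliest rows train the model, the latest test it."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical bar closes.
closes = [100.0, 102.0, 101.0, 99.0, 100.0, 103.0, 104.0, 102.0, 101.0, 105.0]
labels = exit_labels(closes, horizon=2)

# Pair each bar index with its label and drop the unlabelled tail.
rows = [(t, y) for t, y in enumerate(labels) if y is not None]
train, test = chronological_split(rows)
```

    A random shuffle would leak future information into training, which is why the split is done by position rather than at random.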

    Then you can use a logistic regression model (one of the simplest techniques in ML) to predict the target variable of interest. It simply takes the input Xs you derived and produces a probability figure. You can then use the ROC curve (the curve behind the Area Under the Curve metric) to work out the best cut-off for taking action; this cut-off is what allows you to map the probability figure between 0 and 1 to a 0/1 binary value.
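
    To make the mechanics concrete, here is a dependency-free sketch: a logistic regression fitted by plain batch gradient descent, followed by a cut-off search over the ROC points. Picking the threshold that maximises TPR minus FPR (Youden's J) is one common choice and an assumption here, not something the post specifies. The learning rate, epoch count, and toy data are arbitrary; in practice a library implementation would be the sensible route.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Batch gradient descent on the log loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X: List[List[float]], w: List[float], b: float) -> List[float]:
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def best_cutoff(probs: List[float], y: List[int]) -> float:
    """Scan the ROC points and return the threshold maximising TPR - FPR."""
    pos = sum(y)
    neg = len(y) - pos
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(probs)):
        tp = sum(1 for p, yi in zip(probs, y) if p >= t and yi == 1)
        fp = sum(1 for p, yi in zip(probs, y) if p >= t and yi == 0)
        j = tp / pos - fp / neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t


# Toy data: a single feature, label 1 when the feature is positive.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(X, y)
probs = predict_proba(X, w, b)
cutoff = best_cutoff(probs, y)
signals = [int(p >= cutoff) for p in probs]  # 1 = exit, 0 = do nothing
```

    The toy example picks the cut-off on the same data it trained on purely for brevity; with real data the cut-off would be chosen on the validation portion.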


    I would recommend reading Advances in Financial Machine Learning: Lecture 10/10 (Presentation Slides) before you take any concrete steps. I have bought the author's book, but much of the information is already available via various blogs and presentations. As for me, there are two key takeaways that I'm using from that book to establish my dataset for ML. One is working with volume bars (it helps that I have access to DTN IQ Feed, where I can bring up volume bars for almost any instrument; I mainly work with futures though). The other is his idea of the Triple Barrier Method for labelling instances for ML algorithms. I have a version of that labelling criteria implemented (not exactly what he wrote in the book, but it shares some characteristics with his approach); you can check out https://mlfinlab.readthedocs.io/en/latest/
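
    For readers unfamiliar with the idea, the sketch below shows the general shape of triple-barrier labelling: after an entry, watch for whichever comes first among an upper (profit-taking) barrier, a lower (stop-loss) barrier, and a time limit. This is a simplified illustration with fixed percentage barriers, which is an assumption; it is neither the book's exact procedure nor the poster's implementation, and it does not use the mlfinlab API. All names and numbers are hypothetical.

```python
from typing import List


def triple_barrier_label(closes: List[float], entry: int,
                         profit_take: float, stop_loss: float,
                         max_holding: int) -> int:
    """Return +1 if the upper barrier is hit first, -1 if the lower one is,
    and 0 if neither is touched within `max_holding` bars."""
    entry_price = closes[entry]
    upper = entry_price * (1.0 + profit_take)  # horizontal barrier above
    lower = entry_price * (1.0 - stop_loss)    # horizontal barrier below
    last = min(entry + max_holding, len(closes) - 1)  # vertical (time) barrier
    for t in range(entry + 1, last + 1):
        if closes[t] >= upper:
            return 1
        if closes[t] <= lower:
            return -1
    return 0


# Hypothetical bar closes.
closes = [100.0, 100.5, 101.0, 102.5, 99.0, 97.5, 98.0, 98.2]

up_first = triple_barrier_label(closes, entry=0, profit_take=0.02,
                                stop_loss=0.02, max_holding=5)
down_first = triple_barrier_label(closes, entry=3, profit_take=0.02,
                                  stop_loss=0.02, max_holding=3)
timed_out = triple_barrier_label(closes, entry=5, profit_take=0.02,
                                 stop_loss=0.02, max_holding=2)
```

    For the binary exit problem discussed in this thread, the three outcomes would still need to be mapped to 0/1, and how to do that mapping is a modelling decision.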

    All the best.
     
    #13     Jun 20, 2020
  4. guowei58

    agreed. linear regression is powerful and generalized enough to work well in most cases.
     
    #14     Jun 23, 2020
  5. user1

    Girija - Thank you very much!
    traider - Thanks for the input.
    Traderesque - A great, great thank you!
    Snuskpelle - Thank you
    kmiklas - Thank you
    zenlot - Thank you, very educational.
    arbitech - Thank you
    guowei58 - Thank you

    As for choice of language, it's really agnostic. This is not HFT.
     
    #15     Jun 24, 2020
  6. 931

    Garbage is actually gold at the beginning, because it lets you develop a strong data-filtering algo.
    I found plenty of "market inefficiencies" when using "garbage" data.
    Making those data filters is the tedious part.

    Without considering spread probably...
     
    Last edited: Jul 15, 2020
    #16     Jul 15, 2020
  7. 931

    Don't implement what gets taught at universities, what is hyped up, etc. Too many people have already tested those...
    If too many people do it, there is no advantage: it's a zero-sum game.
     
    #17     Jul 15, 2020