Machine Learning Algo for Trading

Discussion in 'Automated Trading' started by stepseazy, Jun 13, 2016.

  1. MATLAB is super easy for the simpler machine learning algos because it is point-and-click, but it is expensive, while R is free. You can literally run the 5 or 6 algos in a matter of minutes in MATLAB. Octave, the free MATLAB-like alternative, does not have the point-and-click features. Neither R nor MATLAB is super fast, since both are high-level languages. SAS is very powerful once you know how to use it, but again it is expensive.

    The main deep learning packages are Torch (Lua-based) and TensorFlow (written by Google), the latter with a Python API. Currently TensorFlow/Python is the avenue I am taking because it is optimized to run on Google Cloud. I'm also finding that Python has useful packages for numerical analysis, although I am just starting. If you are interested in neural networks, then Python is probably your best bet, because you will need to build something that requires considerable processing power.
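    If it helps, here is a minimal sketch of how little code a toy network takes in Keras/TensorFlow; the layer sizes and the random data are made up and have nothing to do with trading:

    ```python
    # Minimal sketch, not a trading model: a tiny feedforward net in
    # Keras/TensorFlow. Layer sizes and the random toy data are made up.
    import numpy as np
    from tensorflow import keras

    X = np.random.randn(1000, 10)                 # 10 fabricated features
    y = (X.sum(axis=1) > 0).astype(np.float32)    # toy binary target

    model = keras.Sequential([
        keras.Input(shape=(10,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit(X, y, epochs=5, verbose=0)
    ```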

    Good luck!

     
    Last edited: Aug 10, 2016
    #101     Aug 10, 2016
    eusdaiki and wolfcuring like this.
  2. Jerry030

    Orange is a great package... free, highly visual. Unless you really like writing thousands of lines of code.
     
    #102     Aug 10, 2016
  3. 931

    Multiple users here seem to suggest preprocessing or filtering the data before feeding it to machine learning algos.
    Has anyone experienced a significant difference between unfiltered and filtered data?
    Could a moving average to smooth the data do the job, or is it necessary to implement something more complex like a Kalman filter?
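    For concreteness, a rough sketch in Python/NumPy of the two options being asked about; the window length and noise constants are made-up values, not recommendations:

    ```python
    import numpy as np

    def moving_average(prices, window=5):
        """Simple moving average; note it lags by roughly window/2 bars."""
        kernel = np.ones(window) / window
        return np.convolve(prices, kernel, mode="valid")

    def kalman_1d(prices, q=1e-5, r=1e-2):
        """Minimal 1-D Kalman filter treating price as a noisy level."""
        x, p = float(prices[0]), 1.0
        out = []
        for z in prices:
            p += q                 # predict: level drifts with variance q
            k = p / (p + r)        # gain: how much to trust measurement z
            x += k * (z - x)       # update the level estimate
            p *= (1.0 - k)
            out.append(x)
        return np.array(out)
    ```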
     
    #103     Nov 8, 2016
  4. Simples

    In order to solve problems you need to be able to 1) prioritize, 2) frame the right questions and 3) scope the problem(s).

    Questions regarding this:
    1. What's your overall aim: Is it to make A) a machine learning implementation, or is it to B) make a consistently profitable trading system? If A, then that won't necessarily help to solve B in a supposed optimal way. Also you need to decide what "optimal" means to you in regards to: effort, time, resources, security, ownership, etc.
    2. What are the most important aspects you need solved first and what questions do you need answers to in order to make the currently most important progress?
    3. What can you build now that can help you test your current batch of hypotheses?
    General questions about filtering or not are meaningless without this kind of context established first. Answering such questions does not mean your answers can never change in the future. Especially in trading, everything needs to be revisited. Often, the answers may already be "obvious" to you, but they need to be challenged all the same. So the point here is to challenge oneself, not any framework (whatever works best now).
     
    Last edited: Nov 8, 2016
    #104     Nov 8, 2016
    userque, 931 and dartmus like this.
  5. 931

    I'm not using any external framework; it's all custom code, and that's why it would be easier to implement data filtering or preprocessing.
    The machine learning implementation and strategy tester are done, and I have been thinking of feeding filtered data to the algos while using unfiltered data for simulating executions.

    A problematic aspect is that the system makes too few trades.
    I don't know which aspect is most important; everything is interconnected, and knowing the weakest link would be good, but it is not easy to find conclusively.

    I would like a filter to try to reduce the chaos in the data, maybe by removing mostly unrepetitive noise (if that is possible), so there are fewer differences throughout the data and it is easier for the algos to classify data movements. Would that make any sense?
    Currently I am using only unfiltered price data; the only part that might count as a filter is the timeframe converter that aggregates small timeframes into bigger ones.

    The idea is to implement multiple filters, and at runtime enable/disable and configure them, to find what may work better compared to unfiltered data.
    For example, has anyone used a moving average to slightly smooth the data versus using unfiltered data?
    Or a Kalman filter to remove noise, or an equalizer to amplify/dampen some part of the spectrum?
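    To make the enable/disable idea concrete, a hedged sketch of how filters could hang off a runtime config; the names and config shape are hypothetical, just one way to wire it:

    ```python
    import numpy as np

    # Hypothetical registry: each filter takes (prices, params) -> prices.
    FILTERS = {
        "none": lambda x, p: x,
        "sma": lambda x, p: np.convolve(
            x, np.ones(p["window"]) / p["window"], mode="valid"),
    }

    def apply_filters(prices, config):
        """config example: [{"name": "sma", "window": 5}], loaded at runtime."""
        for step in config:
            prices = FILTERS[step["name"]](prices, step)
        return prices
    ```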


    I can only find out the value of the hypothesis by testing filtering, but it would be nice to know others' experiences with feeding algos unfiltered versus filtered price data (the same implementation might not be meaningful for a different algo, but it is still good to know).
     
    Last edited: Nov 8, 2016
    #105     Nov 8, 2016
  6. jcl366

    When you talk about a machine learning implementation, I assume it's short-term price prediction with a neural network, just to establish a situation where the question "filter or not" makes sense.

    Raw prices don't work well, and complex filters, like a Kalman filter, don't work well either. What works are simple filters, such as highpass and lowpass filters. At least in my experience so far with shallow and deep neural nets for price prediction.
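    As one possible reading of "simple highpass/lowpass", a sketch using an EMA as the lowpass and the residual as the highpass; the alpha here is arbitrary:

    ```python
    import numpy as np

    def lowpass_ema(prices, alpha=0.1):
        """Exponential moving average as a crude lowpass filter."""
        out = np.empty(len(prices))
        out[0] = prices[0]
        for i in range(1, len(prices)):
            out[i] = alpha * prices[i] + (1.0 - alpha) * out[i - 1]
        return out

    def highpass(prices, alpha=0.1):
        # price minus its smoothed version keeps the fast components
        return np.asarray(prices) - lowpass_ema(prices, alpha)
    ```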

    Machine learning is 90% empirical. No one can really tell you why a certain method works and another method fails, so you spend a lot of time testing methods. An interesting approach that could work with raw prices would be an input stage with a one-dimensional convolutional neural net. I always try to persuade our clients to contract us for such experiments... :).
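    A minimal sketch of what such a convolutional input stage might look like in Keras/TensorFlow; all sizes here are illustrative, not tuned on anything:

    ```python
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(64, 1)),                 # 64 raw prices per sample
        keras.layers.Conv1D(16, kernel_size=5, activation="relu"),
        keras.layers.GlobalAveragePooling1D(),
        keras.layers.Dense(1),                      # e.g. next-bar return
    ])
    model.compile(optimizer="adam", loss="mse")
    ```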
     
    #106     Nov 8, 2016
    userque and 931 like this.
  7. Simples

    I'm not using any external framework; it's all custom code, and that's why it would be easier to implement data filtering or preprocessing.
    I've recently gone the all-custom-code route myself. One codebase, one place, though with multiple scripts to separate data harvesting (daily data for now) from algo + simulation. Not at the real execution stage yet, and I plan to use a different broker than my current production system, since my current broker went bananas with poor usage of JavaScript (no APIs offered here :vomit:).

    The machine learning implementation and strategy tester are done, and I have been thinking of feeding filtered data to the algos while using unfiltered data for simulating executions.
    Ok, this brings some useful context. So, I guess you want to "filter", aka smooth, the data in order to help your algo. This choice will directly affect the underlying ML/algo being fed different data of course, which will be "per design".
    However, it then begs the question: if the ML doesn't extract everything it needs in order to make robust and stable decisions, do you actually need ML at all, what do you need it for, and how can you ensure it converges to optimality (however you define it)? How can you trust it?

    A problematic aspect is that the system makes too few trades.

    That's a strange problem indeed, but it is shared by my current production system. Either we are too impatient, or some foundational understanding of the markets is missing; would ML bridge that gap better than a human, and how would you prove that?

    I don't know which aspect is most important; everything is interconnected, and knowing the weakest link would be good, but it is not easy to find conclusively.
    Maybe make it simpler?

    I would like a filter to try to reduce the chaos in the data, maybe by removing mostly unrepetitive noise (if that is possible), so there are fewer differences throughout the data and it is easier for the algos to classify data movements. Would that make any sense?
    On the surface, it makes perfect sense. However, it also tells me that the ML algos must have limitations if reducing noise is necessary. Maybe the wrong algo is used, if it cannot pick movements out of noise or is too unstable? How do you solve that problem, and if you do solve it, will simple if-conditions be enough afterwards?

    Currently I am using only unfiltered price data; the only part that might count as a filter is the timeframe converter that aggregates small timeframes into bigger ones.
    Filtering irrevocably converts your data, for both good and evil :sneaky: The good is that the data may be more understandable, but it'll also contain more "lag" and less of the original information. Your timeframe converter may also create artifacts or bugs, and be very wary of stealing information from the future in backtests (you probably already know this).
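    A sketch of the lookahead-safe version of such a converter: a big bar is only emitted once all of its small bars are complete, and the unfinished tail is dropped. The tuple layout is just an assumption:

    ```python
    def resample(bars, n):
        """bars: list of (open, high, low, close) tuples; n small bars -> 1 big.
        The unfinished tail is dropped so no big bar uses future small bars."""
        out = []
        for i in range(0, len(bars) - len(bars) % n, n):
            chunk = bars[i:i + n]
            out.append((chunk[0][0],                  # open of first small bar
                        max(b[1] for b in chunk),     # highest high
                        min(b[2] for b in chunk),     # lowest low
                        chunk[-1][3]))                # close of last small bar
        return out
    ```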

    The idea is to implement multiple filters, and at runtime enable/disable and configure them.
    For example, has anyone used a moving average to slightly smooth the data versus using unfiltered data?

    If you configure at runtime in production, you're creating problems for yourself, as this invalidates your system performance and backtests. It's better to try to minimize configuration and not mess with the runtime once it's operational.

    MAs, EMAs, and all sorts of smoothing filters are used by everyone. However, it's faulty thinking that you just need "filter + ML + runtime config" and bang, that somehow solves your problems sometime in the future. Where's the proof and backtest, or even a believable hypothesis? Additionally, if you optimize parameters based on past performance, you risk finding meaningless statistical spikes, especially the more dimensions you explore (more parameters * more data sources).
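    One common guard against those spikes is a walk-forward scheme: fit on one window, score on the next, roll on. A rough sketch, where optimize and score stand in for whatever your own tester provides (hypothetical callables here):

    ```python
    def walk_forward(data, train_len, test_len, optimize, score):
        """optimize(train_slice) -> params; score(test_slice, params) -> metric."""
        results = []
        i = 0
        while i + train_len + test_len <= len(data):
            params = optimize(data[i:i + train_len])        # fit in-sample only
            test = data[i + train_len:i + train_len + test_len]
            results.append(score(test, params))             # judge out-of-sample
            i += test_len                                    # roll the window
        return results
    ```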

    I can only find out the value of the hypothesis by testing filtering, but it would be nice to know others' experiences with feeding algos unfiltered versus filtered price data (the same implementation might not be meaningful for a different algo, but it is still good to know).

    I'd say test it out. Make it cheap and easy to test everything. I don't structure my code too heavily, but make big procedures just for that purpose: flexibility to test stuff out instead of locking myself in. It's gotten better after my first attempt, which is running live and doing its break-even thing :D:rolleyes:, but still, I need to explore possibilities, and all structure and external dependencies may rein you in unnecessarily.

    In the end, it's not a magic system, a magic process or a magic teacher that'll make you land on your feet, send you running and later flying; it'll be you and your built-up experience, which can be broad and then 99% shaven off.
     
    #107     Nov 8, 2016
    water7 and 931 like this.
  8. 931

    I have multiple algos; some older ones are momentum-based and work best with only recent data, while others appear to work best if they gather more data from further back in history to form statistics about the potential future.
    High- and low-pass filters seem like a good idea to test; I'll try to implement that after coding the moving average smoother.

    Yes, the goal is to prepare the data for the algos.
    The strategy tester and the ML are interconnected. The strategy tester does not work purely as a brute-force parameter scanner but uses a genetic or ML algo.
    To answer about trust: I trust this method with algos that don't have many parameters. But I don't think it has ever found its final configuration with algos that have 8+ parameters, as it would take unimaginable computing power to go through the combinations in a reasonable time, unless a very long time is given for tests, or an algo with very few parameters or a very high timeframe is used.
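    Back-of-envelope for why 8+ parameters blow up, assuming (made-up numbers) ten candidate values per parameter and a few-thousand-evaluation search budget:

    ```python
    n_params, n_values = 8, 10
    grid_points = n_values ** n_params        # 100,000,000 combinations
    ga_budget = 5_000                         # made-up genetic-search budget
    print(f"full grid: {grid_points:,}  GA visits: {ga_budget:,} "
          f"({100 * ga_budget / grid_points:.4f}% of the space)")
    ```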

    I think an ML algo usually can't be better than what the developer can think of or direct, but it can perform specific tedious tasks faster.

    I always test multiple times for benefit when modifying or adding extra features.
    I can only understand how the pieces work individually. But how do I find out how it all interconnectedly affects the end result, and which links in the chain are weak or useless?

    I'd try to improve the overall features to get the best out of what I have.

    Yes, it initially created problems for me when new ticks came into the last bar and the algos had already used that uncompleted data bar.

    I mean configuring features at runtime without needing to modify sources to test another parameter.
    All those modifications get saved to config files and can be verified to work identically on multiple instances.

    Sure, there can't be easy solutions automatically turning chaos into some order.
    I have already broken countless hypotheses and deleted at least 5x more source code than is left.
    The ratio of info preserved in out-of-sample data, and the amount of trades, is what I try to increase with additional features.
     
    #108     Nov 8, 2016
    Simples likes this.
  9. Simples

    I always test multiple times for benefit when modifying or adding extra features.

    Agreed. You actually need to be a bit paranoid about changes to avoid continuous worsening :banghead:

    I can only understand how the pieces work individually. But how do I find out how it all interconnectedly affects the end result, and which links in the chain are weak or useless?

    Here design and clarity may help. If you can decouple functionality / processes, it may break up a big problem into smaller but simpler problems that you can more easily prove / test. The alternative, one big solution containing everything within it, often becomes a big ball of entangled dependencies and mess. Untangle and experiment with designs. Be prepared to refactor or throw away initial prototypes.

    I'd try to improve the overall features to get the best out of what I have.

    End-to-end testing is inherently complex, hides details and has longer feedback cycles. It's best to delay its usage.

    I mean configuring features at runtime without needing to modify sources to test another parameter.
    All those modifications get saved to config files and can be verified to work identically on multiple instances.

    Generally: another, perhaps more modern, way to look at it is to leave the configuration in code. You are changing code yourself anyway, and adapting the source code yields infinite possibilities, rather than the few dimensions provided by parameters. Only make the configurations you really require, and minimize their usage. Involving development in testing / optimizing shortens the feedback loop and brings more flexibility.

    Sure, there can't be easy solutions automatically turning chaos into some order.

    Averaging is easy for longer-term. However, if you want to predict short-term outcomes, that quickly gets messy and all sorts of biases creep in.

    I have already broken countless hypotheses and deleted at least 5x more source code than is left.

    This is good practice if the maturity of the code demands it. Coding new stuff is quick and easy (relatively). Maintaining old code and ideas carries hidden taxes and costs.

    The ratio of info preserved in out-of-sample data, and the amount of trades, is what I try to increase with additional features.

    Not sure why you would preserve info from out-of-sample data; for what exactly? I guess I don't understand what you meant, as I think out-of-sample data is data which hasn't yet been sampled/processed (ideally).

    I think the amount of trades isn't necessarily a good measurement. Aren't win/loss, costs, max drawdown, risk, etc. better measurements? I am actually working on minimizing the amount of trades to reduce costs, and my current system (which requires infinite patience) tries to ride out big waves to stay longer in the game. Not saying this is The Way, but you may need to revisit your assumptions about what performance means for you and what you're aiming at.
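    For reference, max drawdown is cheap to compute on an equity curve; a rough sketch:

    ```python
    def max_drawdown(equity):
        """Largest peak-to-trough drop on an equity curve, as a fraction."""
        peak, worst = equity[0], 0.0
        for v in equity:
            peak = max(peak, v)
            worst = max(worst, (peak - v) / peak)
        return worst

    # e.g. max_drawdown([100, 120, 90, 130]) == 0.25 (the 120 -> 90 drop)
    ```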

    Are you treating this as a business yet? Costs are real and can kill a theoretically performing system down into uselessness.
     
    Last edited: Nov 8, 2016
    #109     Nov 8, 2016
    931 likes this.
  10. 931

    I don't know the correct term for this, but by "info preserved" I mean: if the average growth angle over the in-sample period is some figure, and the out-of-sample period shows half or less of the in-sample growth, then 50-60% of the info or more was lost.
    Yes, and by having more, shorter-term trades I'd hope to reduce drawdown, risk, etc. Cost is not a big problem with forex, but with stocks it can be. Currently on EUR/USD I get only 1-3 trades per month, and with long testing periods (15 years) drawdown periods can last over 2-3 months even on in-sample data.
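    Under that definition, the measure reduces to a simple ratio of growth rates; a sketch of my reading (the function names are mine, not an established term):

    ```python
    def growth_per_bar(equity):
        """Average growth per bar, a stand-in for the 'growth angle'."""
        return (equity[-1] - equity[0]) / len(equity)

    def info_preserved(in_sample_equity, out_of_sample_equity):
        """~0.5 would match the 'half the in-sample growth' case above."""
        return growth_per_bar(out_of_sample_equity) / growth_per_bar(in_sample_equity)
    ```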

    I implemented bridges to terminal programs more than a year ago, but have not traded live for long yet.
    Using the bridges I can test the validity of the strategy tester against MetaTrader or NinjaTrader, but I never used MT4 for learning parameters because it is single-threaded and slow; the custom implementation runs much faster.
    That's also the way I found the problem of the algo already using the last unfinished bar.
    So far the solution has been to avoid the last bar's data, as that takes away a lot of the problem.
    But it also reduces test results and creates lag.
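    A sketch of that "skip the forming bar" guard, assuming bars carry their open time (the field layout is hypothetical):

    ```python
    def completed_bars(bars, now, bar_seconds):
        """bars: list of (open_time, open, high, low, close); drops any bar
        whose close time is not yet in the past, i.e. the forming bar."""
        return [b for b in bars if b[0] + bar_seconds <= now]
    ```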
     
    Last edited: Nov 9, 2016
    #110     Nov 9, 2016