Those of you doing ML as part of your trading, do you use a second (or third) algorithm to weed out false signals? I have the feeling this will help me.
I have done this before, and it did help. See Boosting (machine learning): https://en.wikipedia.org/wiki/Boosting_(machine_learning)
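A minimal sketch of the second-model idea in plain numpy, not boosting proper but the "weed out false signals" flavor of it. Everything here is invented for illustration: a naive threshold rule plays the primary signal, a second feature secretly controls whether that signal is reliable, and a hand-rolled logistic regression learns to filter on it. The 0.6 cutoff is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000

# Hypothetical setup: feature 0 drives a crude primary signal,
# feature 1 secretly determines whether that signal is reliable.
X = rng.normal(size=(n, 2))
primary_signal = X[:, 0] > 0.0                             # first model: naive long signal
signal_correct = (X[:, 1] > 0.0) | (rng.random(n) < 0.2)   # ground truth of "true" signals

# Second model: logistic regression (plain gradient descent) trained to
# predict whether the primary signal is correct, i.e. a false-signal filter.
mask = primary_signal
Xf, yf = X[mask], signal_correct[mask].astype(float)
w = np.zeros(2)
b = 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(Xf @ w + b)))
    g = p - yf                      # gradient of log-loss w.r.t. logits
    w -= 0.1 * (Xf.T @ g) / len(yf)
    b -= 0.1 * g.mean()

# Only act on signals the second model trusts.
p = 1.0 / (1.0 + np.exp(-(Xf @ w + b)))
keep = p > 0.6
base_precision = yf.mean()          # precision of the raw signal
filtered_precision = yf[keep].mean()  # precision after the filter
print(base_precision, filtered_precision)
```

The filtered precision comes out well above the raw signal's, which is the whole point of stacking a second model on top of the first.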
What's the difference between a false signal and an adverse draw from a stochastic process? I don't try to filter anything... just estimate the probability of future events. You should be fitting a distribution, not a point estimate. At least that's my method.
Can you describe more how you're fitting a distribution? I don't follow how you fit a distribution you don't know exists...
Neural networks are all about structure, and you can coax them to do different things by using different loss functions and input/output configurations. Mean squared error, which is what I suspect you're using, is the most basic, and it works great if you're simply trying to fit y = f(x), where your truth data comes from a reasonably high-SNR process. Most real-world processes (at least the interesting ones) are stochastic. The simplest example of how to fit a distribution using a neural net is to estimate the parameters of a distribution (e.g. mean and standard deviation for a Gaussian) instead of "y" itself, and then use the negative log of the probability density as your loss function. What if your distribution is poorly approximated by a Gaussian? Use a mixture model instead.
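A stripped-down version of that idea, with the network replaced by two free parameters so the loss itself is the focus. Minimizing the Gaussian negative log-likelihood by gradient descent recovers the mean and standard deviation of the data; a real net would just output mu and log_sigma per input. The synthetic data, learning rate, and iteration count are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stochastic "truth": y ~ N(2.0, 0.5), standing in for real targets.
y = rng.normal(loc=2.0, scale=0.5, size=5000)

# Parameters to learn: mu and log_sigma (the log keeps sigma positive).
mu, log_sigma = 0.0, 0.0
lr = 0.1

for _ in range(500):
    sigma = np.exp(log_sigma)
    resid = y - mu
    # Gaussian negative log-likelihood (constant term dropped):
    #   L = mean( log(sigma) + resid**2 / (2 * sigma**2) )
    grad_mu = np.mean(-resid / sigma**2)
    grad_log_sigma = np.mean(1.0 - resid**2 / sigma**2)
    mu -= lr * grad_mu
    log_sigma -= lr * grad_log_sigma

print(mu, np.exp(log_sigma))  # converges near the true 2.0 and 0.5
```

Swap the two scalars for the outputs of a network conditioned on x and you have a conditional density model instead of a point estimate; swap the single Gaussian for a weighted sum of them and you have a mixture density network.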
Right, but the question is how do you even know that there is a distribution? Do you plot some sample variables then try to get an algo to approximate that?
You don't just blindly apply algorithms to data you know nothing about... unless you like buzzwords. AI/ML bro! Also, every function/process, even a deterministic one, has a distribution.
I know the question sounds stupid. But I can either assume and be wrong or take the chance and know for sure. Hence the name
Great thread. I gave a presentation recently on how, as "Machine Learning" tools in Python become more popular, the ease of making drastic procedural mistakes (and thinking that your results are nifty!) increases dramatically. After going through lots of math and through lots of Python history (native Python, pandas, numpy/scipy, scikit-learn, library-du-jour...), I came to the conclusion that ultimately, every ML classifier is trying to get you back to a least-squares setup and the underlying Gauss-Markov assumptions (https://en.wikipedia.org/wiki/Gauss–Markov_theorem ), especially the IID conditions (https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables ) that time-based financial data specifically spits in the eye of. My advice was, mathematically, "Slow down!" I got actual applause.
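To make the IID point concrete, here's a toy demonstration (the random walk is a made-up stand-in for a price series, nothing from anyone's presentation): regress a random walk on time with ordinary least squares and the residuals come out massively autocorrelated, violating the uncorrelated-errors assumption behind Gauss-Markov.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# A random walk: a toy stand-in for a price series.
prices = np.cumsum(rng.normal(size=n))
t = np.arange(n)

# Naive OLS of price on time. Gauss-Markov wants uncorrelated errors,
# but the residuals of a detrended random walk are anything but.
slope, intercept = np.polyfit(t, prices, 1)
resid = prices - (slope * t + intercept)
lag1_autocorr = np.corrcoef(resid[:-1], resid[1:])[0, 1]
print(lag1_autocorr)  # close to 1, i.e. nowhere near IID
```

Any standard error, p-value, or cross-validation score computed as if those residuals were independent is fiction, which is exactly why shuffled train/test splits on time-series data flatter your backtest.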