From idea to trading system, a framework.

Discussion in 'Strategy Building' started by eusdaiki, Jun 4, 2014.

  1. -------------------------------------------------------------------------------------------
    This is a framework for going from any trading idea (or hypothesis) to a trading system. It is (by definition) a work in progress and open to feedback and constructive criticism, since this is the only way to make it stronger.

    This framework is built around the idea of building algos based on a hypothesis about the market and how it behaves under a given set of conditions; we will call this set of conditions an event.

    The scope of this framework covers the process from the conception of the idea to the point where it is trading live.

    The goal of each step in the framework is to prove the hypothesis wrong (falsify it). Once it is falsified, we go back to the top, adjust the hypothesis and start over. The idea behind this is that it is cheaper to discard a wrong idea early in the process.

    The goal of this process of iterations is to make the hypothesis stronger based on the observations in the data.

    The market hypothesis needs to have 2 components.

    An event, which is a series of market conditions that are described by the hypothesis.
    A set of rules to follow in order to profit from that event.

    Overview:
    - We will first formulate the hypothesis on paper
    - Build the system that detects when there is a high probability of occurrence of the event.
    - Build the rules to act in a manner that allows us to profit from the event if it were to occur, and to minimize risk if the event doesn't materialize.



    - Formulation
    =======================

    During this stage we formulate a hypothesis that describes an event in the market.

    - Formulate the hypothesis in written words
    ----------------------------------------------------------
    During this step, we put the hypothesis into words.
    Describe the market conditions, or theories, that provide a foundation to the idea.
    The hypothesis should be stated as a description of a specific market event.

    What are the conditions that describe the event?
    Before, during and after?
    Describe the event or the outcome of the event.

    - Formulate the hypothesis in terms of data
    ----------------------------------------------------------

    The goal of this step is to define the conditions in such a way that you can automatically label instances of the event in the data.

    In this step you can formulate calculated indicators that may help identify the event, such as statistical measures on the data, rolling-window calculations, etc.
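    As an illustration of a rolling-window indicator, here is a minimal sketch of a rolling z-score (how unusual the current value is compared with the previous `window` values). The function name and the 20-bar default are assumptions for illustration; the key design choice is that only past values enter the window, so the indicator cannot leak information from the bar it describes.

```python
from collections import deque
from math import sqrt


def rolling_zscore(values, window=20):
    """Z-score of each value against the previous `window` values (window > 1).

    Only past values are in the window when a value is scored, so there is no
    look-ahead. Returns None until enough history exists, or when the window
    has zero variance.
    """
    past = deque(maxlen=window)  # oldest values drop out automatically
    out = []
    for v in values:
        if len(past) == window:
            m = sum(past) / window
            var = sum((x - m) ** 2 for x in past) / (window - 1)  # sample variance
            out.append((v - m) / sqrt(var) if var > 0 else None)
        else:
            out.append(None)
        past.append(v)  # add the current value only after scoring it
    return out
```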

    During this stage we must also define which AI, if any, is a good fit for the problem.


    - Detecting the Event
    ========================
    The next step is to build a signal based on the hypothesis.
    For the purposes of this write-up, we will consider that the signal is not trivial and is generated by an AI.

    If signal generation is trivial, we can simply skip this step, generate the signals and move to the next phase.


    AI training
    -----------------------

    During this stage we will train an AI to identify the hypothesis on historical data.

    To prevent hindsight bias, the data must be separated by date so that no timestamp appears in more than one data group. This holds true for every data partition that we make during the experiments.


    - training data
    - quiz data
    - test data
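    A minimal sketch of the date-based partition into training, quiz and test groups, assuming each row is a tuple whose first element is a sortable timestamp. The function name and the two cutoff parameters are hypothetical; the point is that every row falls into exactly one contiguous date range, so no timestamp can appear in two groups.

```python
from datetime import date


def chronological_split(rows, train_end, quiz_end):
    """Split (timestamp, ...) rows into train / quiz / test by date.

    Rows on or before train_end go to training, rows after train_end up to
    and including quiz_end go to quiz, and everything later goes to test.
    """
    if not train_end < quiz_end:
        raise ValueError("train_end must come before quiz_end")
    rows = sorted(rows, key=lambda r: r[0])
    train = [r for r in rows if r[0] <= train_end]
    quiz = [r for r in rows if train_end < r[0] <= quiz_end]
    test = [r for r in rows if r[0] > quiz_end]
    return train, quiz, test


# Example: ten daily rows split 6 / 2 / 2.
rows = [(date(2010, 1, d), d) for d in range(1, 11)]
train, quiz, test = chronological_split(rows, date(2010, 1, 6), date(2010, 1, 8))
```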

    Label the data:
    Find instances of the event using hindsight.
    For example, if the event is a 10% market move, find the instances where such a move happened and label the point right before the move.
    We will use the labels to train the AI to identify the conditions prior to the event.
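    A minimal sketch of hindsight labeling for the 10% example, using only closing prices. The upward-only direction and the 3-bar horizon in the example are illustrative assumptions. A bar is labeled 1 if a move of at least `move` follows within `horizon` bars, so the labels sit on the bars just before the move. These labels look into the future by design and must only be used as training targets, never as inputs.

```python
def label_events(closes, move=0.10, horizon=20):
    """Label bar i with 1 if some close in the next `horizon` bars is at least
    (1 + move) times closes[i], else 0. Uses hindsight on purpose."""
    labels = []
    for i, price in enumerate(closes):
        future = closes[i + 1 : i + 1 + horizon]
        hit = any(p >= price * (1 + move) for p in future)
        labels.append(1 if hit else 0)
    return labels


# The jump to 112 is a 10%+ move from 100 and from 101, but not from 102.
labels = label_events([100, 101, 102, 112, 110, 109], move=0.10, horizon=3)
```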

    Training data:
    Have the AI look at the training data, make predictions, measure the accuracy of the prediction against the labels and adjust the parameters.
    Repeat.


    Quiz data.
    Every few hundred iterations on the training data, run the predictions on the quiz data and measure the accuracy. Do not adjust the parameters based on the quiz set!

    Test data.
    This data is used only once at the end of the training and the results are reported.
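    The train / quiz / test discipline can be sketched with a plain logistic regression trained by batch gradient descent. Logistic regression is a stand-in chosen for brevity, not the model the framework prescribes, and the function names and defaults are hypothetical. Only the training set updates the parameters; the quiz set is scored every `quiz_every` iterations and merely logged; the test set is scored once, after training.

```python
import math


def _predict(w, b, x):
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def _accuracy(w, b, data):
    hits = sum((_predict(w, b, x) >= 0.5) == bool(y) for x, y in data)
    return hits / len(data)


def train_with_quiz(train, quiz, iterations=1000, lr=0.1, quiz_every=200):
    """Fit on (features, label) pairs; quiz accuracy is recorded, never used to fit."""
    n_features = len(train[0][0])
    w, b = [0.0] * n_features, 0.0
    quiz_log = []
    for it in range(1, iterations + 1):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in train:
            err = _predict(w, b, x) - y  # gradient of the log loss w.r.t. z
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        w = [wj - lr * gj / len(train) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(train)
        if it % quiz_every == 0:
            quiz_log.append((it, _accuracy(w, b, quiz)))  # observe only
    return w, b, quiz_log


train_set = [([-2.0], 0), ([-1.0], 0), ([1.0], 1), ([2.0], 1)]
quiz_set = [([-1.5], 0), ([1.5], 1)]
test_set = [([-3.0], 0), ([3.0], 1)]
w, b, quiz_log = train_with_quiz(train_set, quiz_set)
test_accuracy = _accuracy(w, b, test_set)  # scored once, at the end
```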


    Acting on the event.
    ========================

    During this stage we observe the statistical data produced by the event, and we generate a trading plan around these observations.



    Profile the event
    -----------------------------------


    In the experiment reports, include event profiles of the events identified by the AI on each of the three datasets.
    The event profiler will allow us to make forecasts and predictions on what we can expect of the event in terms of risk/reward.
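    A minimal sketch of an event profiler, assuming entry at the close of the event bar and a fixed look-ahead `horizon` (both simplifying assumptions). For each event it records the return at the horizon, the maximum favorable excursion (MFE) and the maximum adverse excursion (MAE), then averages them. These are the raw risk/reward numbers a trade plan can be built from.

```python
from statistics import mean


def profile_events(closes, event_idx, horizon=5):
    """Summarize what happened in the `horizon` bars after each event."""
    rows = []
    for i in event_idx:
        if i + horizon >= len(closes):
            continue  # not enough bars left to observe the outcome
        entry = closes[i]
        path = [p / entry - 1 for p in closes[i + 1 : i + 1 + horizon]]
        rows.append({"ret": path[-1], "mfe": max(path), "mae": min(path)})
    if not rows:
        return {"count": 0}
    return {
        "count": len(rows),
        "win_rate": sum(r["ret"] > 0 for r in rows) / len(rows),
        "mean_ret": mean(r["ret"] for r in rows),
        "mean_mfe": mean(r["mfe"] for r in rows),
        "mean_mae": mean(r["mae"] for r in rows),
    }


profile = profile_events([100, 102, 104, 103, 101, 99, 98, 100], [0, 3, 6], horizon=2)
```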


    - Generate trading rules
    ----------------------------------------
    Up to this point we've worked only on the description of the data.
    From what we learned by describing the data we formulate a trading plan.
    What is the profit target?
    What is the maximum loss we will tolerate?
    What is the probability of the trade being a profit?
    Should the entry be made in 1 shot, or in steps?

    The trade plan is a set of rules that will be applied when an event signal is generated by the signal engine.
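    One way to pin down the answers to the questions above is a small trade plan object for a long trade. The class and field names are hypothetical. The expectancy formula assumes every trade ends at either the profit target or the maximum loss, with no time exits and no costs: expectancy = p * target - (1 - p) * max_loss.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradePlan:
    """Hypothetical container for a long-side trade plan."""
    profit_target: float  # e.g. 0.03 = take profit at +3%
    max_loss: float       # e.g. 0.015 = stop out at -1.5%
    win_prob: float       # estimated from the event profiles
    entry_steps: int = 1  # 1 = single shot, >1 = scale in

    def exit_prices(self, entry: float):
        """Return (target price, stop price) for a given entry price."""
        return entry * (1 + self.profit_target), entry * (1 - self.max_loss)

    def entry_sizes(self, total_qty: int):
        """Split total_qty into entry_steps near-equal tranches."""
        base, rem = divmod(total_qty, self.entry_steps)
        return [base + (1 if k < rem else 0) for k in range(self.entry_steps)]

    def expectancy(self):
        """Expected return per trade if every trade hits target or stop."""
        return self.win_prob * self.profit_target - (1 - self.win_prob) * self.max_loss


plan = TradePlan(profit_target=0.03, max_loss=0.015, win_prob=0.4, entry_steps=3)
```

    With a 40% win rate, a 2:1 reward/risk ratio still leaves a small positive expectancy (0.4 * 3% - 0.6 * 1.5% = 0.3% per trade), which is why the win probability and the exits have to be chosen together.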

    Trading test
    =========================


    Back test
    ------------------------------
    With the trade plan we go back to the lab.
    This time we simulate the execution of the trade plan on the backtesting data.
    For the backtest, we have the AI identify events on the backtesting data, and we simulate trades based on the trade plan rules.

    The backtest data must be divided in the same manner as the AI data.

    On the training data, we iterate, adjusting the trade rules to optimize the results. We run the quiz every few hundred iterations, and we run on the test data only once, when we consider that we have a strong set of rules.
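    A minimal backtest sketch that applies a profit target and a maximum loss to each event, holding one position at a time. The simplifications are assumptions, not part of the framework: fills at the bar's close, exits checked on closes only, no costs or slippage, and a time exit after `max_bars` bars. The same function would be run separately on the training, quiz and test slices.

```python
def backtest(closes, event_idx, profit_target, max_loss, max_bars=10):
    """Simulate one long position per event; return the per-trade returns."""
    returns = []
    busy_until = -1
    for i in sorted(event_idx):
        if i <= busy_until or i >= len(closes) - 1:
            continue  # already in a trade, or no bars left to trade
        entry = closes[i]
        last = min(i + max_bars, len(closes) - 1)
        exit_bar = last  # time exit if neither target nor stop is hit
        for j in range(i + 1, last + 1):
            r = closes[j] / entry - 1
            if r >= profit_target or r <= -max_loss:
                exit_bar = j
                break
        returns.append(closes[exit_bar] / entry - 1)
        busy_until = exit_bar
    return returns


# Event 1 is skipped because the trade opened at bar 0 is still open.
trades = backtest([100, 101, 104, 103, 99, 97, 98], [0, 1, 4], profit_target=0.03, max_loss=0.02)
```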

    - Walk Fwd test
    -----------------------------------------
    During this test we use the set of rules obtained during the back test and simulate executions against live market data.
    The AI identifies the trading event, and we enter/exit based on the rules.

    - Live test
    -----------------------------
    Test the algo in live markets and grow its volume slowly to measure the effect of slippage and control its risk.
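    Measuring slippage while growing volume can be sketched with two small helpers. Both are hypothetical: the basis-point budget, the 1.25 growth factor and the size cap are illustrative values, not recommendations. Slippage is signed so that a positive number always means a worse fill than expected.

```python
def slippage_bps(side, expected_price, fill_price):
    """Signed slippage in basis points; positive = worse than expected."""
    sign = 1 if side == "buy" else -1
    return sign * (fill_price - expected_price) / expected_price * 10_000


def next_size(current, recent_slippage_bps, budget_bps, growth=1.25, max_size=10_000):
    """Grow size slowly while average slippage stays within budget; shrink it otherwise."""
    if not recent_slippage_bps:
        return current  # nothing measured yet, keep the size
    avg = sum(recent_slippage_bps) / len(recent_slippage_bps)
    if avg > budget_bps:
        return max(1, int(current / growth))
    return min(max_size, int(current * growth))
```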
     
  2. At first glance it seems like you're using the word "event" to refer both to the concept of a trading setup and to the outcome of that setup, which makes it a little confusing because those are obviously two different things.
     
  3. That is a very good observation and a detail that I missed.
    I'll work to remove that confusion.

    Thanks.
    :)
     
  4. version 0.2 :)


    This is a framework for going from any trading idea (or hypothesis) to a trading system. It is (by definition) a work in progress and open to feedback and constructive criticism, since this is the only way to make it stronger.

    This framework is built around the idea of building algos based on a hypothesis about the market and how it behaves under a given set of conditions; I call this set of conditions an event.

    The scope of this framework covers the process from the conception of the idea to the point where it is trading live.

    The goal of each step in the framework is to prove the hypothesis wrong (falsify it). Once it is falsified, we go back to the top, adjust the hypothesis and start over. The idea behind this is that it is cheaper to discard a wrong idea early in the process.

    The goal of this process of iterations is to make the hypothesis stronger based on the observations in the data.

    The hypothesis is an attempt to describe a set of market conditions that provide a statistical edge; we will call this an event.
    In order to describe the event, we speak of it in terms of the conditions leading to the event, or in terms of the conditions following the event.


    1. Event defined in terms of its outcome.
    In a very limited sense, the outcome of the event is known, but the conditions leading to the event are unknown. The outcome is known in the sense that we are able to use this description to label our historical data for instances of the event (using hindsight), and to use these labels to train AI systems that will forecast the probability of the event given prior conditions that the AI will learn from the data.
    But our knowledge of the event's outcome is limited to our ability to forecast it. This means that once we obtain a strong signal from the AI, we must treat the outcome of the event as unknown and study it as an event described in terms of its prior conditions, the prior condition being the AI's forecast.

    2. Event defined in terms of its prior conditions.
    When we describe the event in terms of its prior conditions, we are able to identify the event as it happens, and we must study the outcomes that may come from it: the probability of the different possible outcomes, and the rules to follow in order to profit from the desired outcomes while controlling the risk presented by other probable outcomes.


    We will follow these steps:
    - We will first formulate the hypothesis on paper
    - Train a system to forecast the event.
    - Build the rules to handle the event's probable outcomes.


    - Formulation
    =======================

    During this stage we formulate a hypothesis that describes an event in the market.

    - Formulate the hypothesis in written words
    ----------------------------------------------------------
    During this step, we put the hypothesis into words.
    Describe the market conditions, or theories, that provide a foundation to the idea.
    The hypothesis should be stated as a description of a specific market event.
    What do we know about the event? What is unknown?
    Can the event be described in terms of prior conditions?

    - Formulate the hypothesis in terms of data
    ----------------------------------------------------------

    The goal of this step is to define the hypothesis in terms of the data that we will need in order to test the hypothesis.
    How much data will we use for the experiment?
    Which securities/assets?
    For what time range?

    - Forecasting the event.
    ========================
    If our event is defined in terms of its outcome, we must define labeling rules precise enough to label the historical data automatically.
    Then we need to analyze the data to explore different alternatives for generating the signal, and train several AIs to compare their results.

    AI training
    -----------------------

    During this stage we will train an AI to forecast occurrences of the event on historical data.

    To prevent hindsight bias, the data must be separated by date so that no timestamp appears in more than one data group. This holds true for every data partition that we make during the experiments.

    Training data:
    Have the AI look at the training data, make predictions, measure the accuracy of the prediction against the labels and adjust the parameters.
    Repeat.

    Quiz data.
    Every few hundred iterations on the training data, run the predictions on the quiz data and measure the accuracy. Do not adjust the parameters based on the quiz set!

    Test data.
    This data is used only once at the end of the training and the results are reported.


    Studying the event's outcome.
    ========================
    For this step we must have an event that is defined in terms of its prior conditions. If we built an AI, then the forecasts coming from the AI are the prior conditions, and during these tests we must let the AI identify the occurrences of the event without learning (to prevent overfitting).



    Profile the event
    -----------------------------------
    During this stage we will observe the outcome of the event and generate trading rules based on those observations.
    We will use an event profiler during this stage. This is a tool that allows us to compare the statistics generated from many instances of the event.
    The event profiler will allow us to make forecasts and predictions on what we can expect of the event's outcome in terms of risk/reward.


    Trading test
    =========================

    - Generate trading rules
    ----------------------------------------
    Up to this point we've worked only on the description of the data.
    From what we learned by describing the data we formulate a trading plan.
    What is the profit target?
    What is the maximum loss we will tolerate?
    What is the probability of the trade being a profit?
    Should the entry be made in 1 shot, or in steps?

    We will generate a set of rules to follow when the conditions prior to the event are met.


    Back test
    ------------------------------
    We simulate executions of the trading rules on historical data and make adjustments.
    In order to simulate market conditions, we "play the tape": the data is fed in chronological order to the signal engine and the backtester, in the same manner they will receive it during live trading, and we simulate executions based on the trading rules.
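    "Playing the tape" can be sketched as an event-driven loop. `signal_engine` and `on_signal` are hypothetical hooks standing in for the signal engine and the simulated executor, not a specific library. At bar t the engine receives a copy of the closes up to and including t, so it cannot peek at future bars or alter the tape.

```python
from typing import Callable, List, Sequence


def play_the_tape(
    closes: Sequence[float],
    signal_engine: Callable[[List[float]], bool],
    on_signal: Callable[[int, float], None],
) -> None:
    """Replay bars in chronological order, exactly once each."""
    history: List[float] = []
    for t, price in enumerate(closes):
        history.append(price)
        if signal_engine(list(history)):  # copy: the engine sees only the past
            on_signal(t, price)           # simulated execution at this bar


def above_3bar_mean(h: List[float]) -> bool:
    """Toy signal: the latest close is above the mean of the last 3 closes."""
    return len(h) >= 3 and h[-1] > sum(h[-3:]) / 3


fired: List[int] = []
play_the_tape([1, 2, 3, 2, 5], above_3bar_mean, lambda t, p: fired.append(t))
```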


    We may use an AI to fine-tune the trading rules.
    Genetic algorithms are a good fit for this type of job.
    Whether we adjust manually or through an AI, the backtest data must be divided in the same manner as the AI data to prevent overfitting.
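    A minimal genetic algorithm for tuning real-valued rule parameters (for example profit target and maximum loss) might look like the sketch below. It uses tournament selection, uniform crossover, Gaussian mutation and elitism; all names and defaults are assumptions. `fitness` should score a parameter tuple on the training slice only. In practice it would wrap a backtest, while the example uses a toy function with a known optimum.

```python
import random


def genetic_tune(fitness, bounds, pop_size=30, generations=40, mutation=0.1, seed=0):
    """Maximize fitness(params) over a box given as [(low, high), ...]."""
    rng = random.Random(seed)
    pop = [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(pop_size)]

    def pick():
        # Tournament of two: the fitter of two random individuals survives.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [max(pop, key=fitness)]  # elitism: keep the best as-is
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            child = []
            for (lo, hi), g1, g2 in zip(bounds, p1, p2):
                g = g1 if rng.random() < 0.5 else g2      # uniform crossover
                g += rng.gauss(0, mutation * (hi - lo))   # Gaussian mutation
                child.append(max(lo, min(hi, g)))         # clip to bounds
            children.append(tuple(child))
        pop = children
    return max(pop, key=fitness)


def toy_fitness(p):
    """Toy objective peaking at target 3%, max loss 1%."""
    return -((p[0] - 0.03) ** 2 + (p[1] - 0.01) ** 2)


best = genetic_tune(toy_fitness, [(0.0, 0.1), (0.0, 0.05)])
```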

    On the training data, we iterate, adjusting the trade rules to optimize the results. We run the quiz every few hundred iterations, and we run on the test data only once, when we consider that we have a strong set of rules.



    - Walk Fwd test
    -----------------------------------------
    During this test we use the set of rules obtained during the back test and simulate executions against live market data.
    The AI identifies the trading event, and we enter/exit based on the rules.

    - Live test
    -----------------------------
    Test the algo in live markets and grow its volume slowly to measure the effect of slippage and control its risk.
     
  5. Thanks for sharing your trading system development framework.
    Do you have some success in identifying price patterns/regularities with this framework? What works? What does not?
    Do you find things that keep working out of sample?
    How easy/hard is it to transform a 'prediction' into a trading system?
    Does Machine Learning help you better understand the financial markets? Do you find things (correlations, patterns, regularities) that are not obvious but once discovered, make sense and can be explained in simple words?
    I am currently not using Machine Learning for my trading so I try to understand how this approach can be helpful in analyzing the markets.
     
  6. Hi.
    I'm currently working on 1 trading idea and building this framework around this idea.
    Although the original idea was based on 4 simple patterns, using this framework I ended up splitting those 4 patterns into almost 50 patterns (by being very meticulous about where each of the 6 points in the pattern stands relative to the others), and I was able to analyze these independently in event profilers.

    Translating the event profiler results into trading signals or trading rules has proven to be a challenge. For now I'm keeping the trading rules fixed, so I only trade one particular expected outcome and not the others.
    I am focusing my efforts on interpretation of the signals through machine learning.
    The decay of the signal is something that I don't fully comprehend yet, but in some experiments I have seen that training on data from shortly before the test set, and in a similar context (the same set of stocks), delivers better results. For example, in one experiment I trained on 2014 data and then tested on 2010 data, with very poor results. I then repeated the test using 2010 data from a few days prior to the test set for training, and the results improved significantly.
    So there seem to be regularities in the data that hold for a short period of time and then go away.
    I'm currently preparing a large test with several years' worth of data, where I intend to accumulate the learning day after day and trade every day using what was learned so far. The intention of this test is to find at what point I need to start throwing out events that have already expired (for example, I intend to start by using the last 100 events as a "rolling sample").
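    The day-by-day test with a rolling sample of the last 100 events could be sketched as follows. The class, the function and the 0.55 win-rate threshold are hypothetical. Each day, the decision to trade uses only outcomes learned on previous days; that day's outcomes are then added, and the oldest ones expire automatically once the sample is full.

```python
from collections import deque


class RollingEventSample:
    """Keep only the most recent `size` event outcomes (returns)."""

    def __init__(self, size=100):
        self.outcomes = deque(maxlen=size)  # old events fall off the left end

    def add(self, outcome_return):
        self.outcomes.append(outcome_return)

    def win_rate(self):
        if not self.outcomes:
            return None
        return sum(r > 0 for r in self.outcomes) / len(self.outcomes)


def daily_replay(days, size=100, min_win_rate=0.55):
    """For each day (a list of event outcome returns), decide whether to trade
    using only earlier days, then learn from that day."""
    sample = RollingEventSample(size)
    decisions = []
    for outcomes in days:
        wr = sample.win_rate()
        decisions.append(wr is not None and wr >= min_win_rate)
        for r in outcomes:
            sample.add(r)
    return decisions
```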

    Although I'm not using any conventional machine learning techniques, I am applying what I learned from Andrew Ng's class when designing the system that interprets the signals, so in a way the machine is learning from the data in the manner you would expect of an online training system. But I haven't gone fully into the AI section of the framework, since I am working with an idea that is described in terms of its prior conditions.

    I have found several patterns that upon close inspection, make sense, and I can make a narrative around them. But this is not the case for all of them.
     
  7. Just wondering, what program or programs are you using for this project?
     
  8. PostgreSQL, Python, C, C++
     
  9. ronblack

    "AI training
    -----------------------

    During this stage we will train an AI to identify the hypothesis on historical data. "

    Do you do this only once, or many times on the same training data? Because if you do it many times, even if you keep data aside for out-of-sample validation, data-snooping is introduced each time you reject a hypothesis and look for another one.
     
  10. You are correct; it is rather difficult to keep look-ahead bias from slipping into the system in various ways (such as the one you describe).
    There are several precautions in the system meant to prevent it from making decisions based on data that was not available at the time of the decision, such as dripping the quotes chronologically so that the system doesn't have access to them before they would've been available. With enough data, one could prevent snooping by never using the same test data twice between hypothesis tests (sort of a semi-walk-fwd test?); however, I don't have enough data for this right now, so I must settle for less-than-perfect solutions that allow me to focus my energy during the back tests on preventing other biases, in preparation for the walk fwd test. Once in the walk fwd test, snooping bias is excluded by the nature of the test.
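    The idea of never using the same test data twice between hypothesis tests can be enforced mechanically. Below is a minimal sketch with a hypothetical `HoldoutBudget` helper: it holds an ordered list of reserved test windows, hands each one out at most once, and refuses to re-test a hypothesis. When the windows run out, the only clean option left is to wait for new data, which is effectively a walk forward test.

```python
class HoldoutBudget:
    """Hand out each reserved chronological test window at most once."""

    def __init__(self, windows):
        self._windows = list(windows)  # e.g. [(start, end), ...] in date order
        self._used = {}

    def draw(self, hypothesis_id):
        if hypothesis_id in self._used:
            raise RuntimeError(f"{hypothesis_id!r} already consumed a test window")
        if not self._windows:
            raise RuntimeError("test data exhausted; wait for new data (walk forward)")
        window = self._windows.pop(0)  # next unused window, oldest first
        self._used[hypothesis_id] = window
        return window
```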
     
    #10     Jul 16, 2014