How to determine "bins" for categorizing values in normal distribution

Discussion in 'Strategy Building' started by Sigue!, Aug 14, 2020.

  1. Sigue!

    Are there any statistical "tricks" for categorizing values in a normal frequency distribution other than using multiples of standard deviation?

    I have a histogram chart that plots price rate of change values that I would like to classify into "bins" to help compare current values with historical values, and to determine how to segregate "significant/anomalous" values (ones that might have predictive value) from the normal day-to-day changes.
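
    The standard-deviation baseline mentioned in the question can be sketched roughly as below. This is only an illustration: the `roc` values are made up, `sigma_bin` is a hypothetical helper name, and the clamp at 3 standard deviations is an arbitrary assumption.

```python
import statistics
from collections import Counter

def sigma_bin(value, mean, stdev, max_sigma=3):
    """Return a signed integer bin based on the z-score.

    0 means within 1 stdev of the mean, +1/-1 means between 1 and 2
    stdevs above/below, and so on. Values beyond max_sigma are clamped
    into the outermost bin.
    """
    z = (value - mean) / stdev
    k = min(int(abs(z)), max_sigma)
    return k if z >= 0 else -k

# Hypothetical daily rate-of-change values in percent (illustrative only).
roc = [0.1, -0.3, 0.5, -1.2, 0.2, 2.8, -0.1, 0.4, -0.6, 0.0, -3.5, 0.3]

mean = statistics.fmean(roc)
stdev = statistics.stdev(roc)  # sample standard deviation

bins = Counter(sigma_bin(x, mean, stdev) for x in roc)
for b in sorted(bins):
    print(f"bin {b:+d}: {bins[b]} values")
```

    Bins defined this way only have their usual probabilistic meaning if the data really is close to normal.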
     
  2. gaussian

    I assume you actually mean a normal distribution. In your post you use normal in two different ways.

    You bin data based on observation buckets. You need to first understand the data, and then bin it appropriately. If you were binning a day's worth of data you might split it into 30-minute buckets. If you're binning medical data you might do it by age interval. You can't beat your data into bins without losing predictive power. For general binning you can look into Sturges' rule, but be aware that most attempts to create a generalized binning method introduce some error, such as smoothing.
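
    As a rough illustration of the general-purpose rule named above, here is a minimal sketch of Sturges' rule with equal-width bins. The function names and the sample values are assumptions made up for the example, and the rule itself is only a starting point, not a substitute for understanding the data.

```python
import math

def sturges_bin_count(n):
    """Sturges' rule: k = ceil(log2(n)) + 1 bins for n observations."""
    return math.ceil(math.log2(n)) + 1

def equal_width_histogram(values):
    """Count values in equal-width bins, with the bin count from Sturges' rule.

    Assumes the values are not all identical (otherwise the width is zero).
    """
    k = sturges_bin_count(len(values))
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        # The maximum value would land at index k, so clamp it into the last bin.
        idx = min(int((v - lo) / width), k - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(k + 1)]
    return edges, counts

# Hypothetical rate-of-change values in percent (illustrative only).
sample = [0.1, -0.3, 0.5, -1.2, 0.2, 2.8, -0.1, 0.4, -0.6, 0.0, -3.5, 0.3]
edges, counts = equal_width_histogram(sample)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"[{left:+.2f}, {right:+.2f}): {count}")
```

    Because the bin count grows only with the logarithm of the sample size, large samples get relatively wide bins, which is one source of the smoothing mentioned above.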

    If your data is not normal you will not extract a normal histogram for the data. To determine normality you can use a number of checks that are more powerful than a histogram, for example QQ plots, the Jarque-Bera test, or the Shapiro-Wilk test. In your example I doubt the rate of change of a time series variable would fall into a normal distribution. Be aware that while many people simplify the process of determining an underlying distribution in trading by assuming a normal distribution, this is almost always wrong and can lead to some surprising and expensive results.
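
    To make one of those tests concrete, here is a minimal hand-rolled sketch of the Jarque-Bera statistic using only the standard library. It relies on the asymptotic chi-squared distribution with 2 degrees of freedom, so the p-value is unreliable for small samples. The two simulated samples are assumptions chosen for illustration, not real price data.

```python
import math
import random

def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) for a non-constant sample.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is the sample skewness and
    K the sample kurtosis, both from biased (divide-by-n) central moments.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # For chi-squared with 2 degrees of freedom, P(X > x) = exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value

rng = random.Random(42)
# Simulated data: one normal sample, one mixture with occasional large moves.
normal_like = [rng.gauss(0, 1) for _ in range(2000)]
fat_tailed = [rng.gauss(0, 5 if rng.random() < 0.1 else 1) for _ in range(2000)]

for name, data in (("normal-like", normal_like), ("fat-tailed", fat_tailed)):
    jb, p = jarque_bera(data)
    print(f"{name}: JB = {jb:.2f}, p = {p:.4f}")
```

    A small p-value is evidence against normality. A large one does not prove the data is normal, only that this test did not detect a departure.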
     
    Last edited: Aug 14, 2020
  3. Sigue!

    Very informative. Thanks. So far I have just used frequency distribution charts to look at the distributions of positive vs. negative rates of change and they looked like a normal distribution on the frequency histograms to this graduate of freshman statistics. Ha!

    So, assuming I do discover that the data is not normal, where might I go to get some ideas on how to properly do some deterministic testing? Back to university?
     
  4. gaussian

    You don't really need to go back to university. You will need a good background in calculus and at the very least an introductory university text in probability theory. In order to make good choices you have to know what choices there are to make.
     
  5. Sigue!

    Gotcha. Thanks for the education. I'm sure glad I decided to ask. Here's a look at part of a spreadsheet I did for 1-day/bar rates of change for the ETF VTI. To my naive eyes that looks like a normal distribution even without the histogram. But then again, it's just one run for one ETF.
    upload_2020-8-14_21-2-23.png
     
    Last edited: Aug 14, 2020
  6. ph1l

    Calculating excess kurtosis might be helpful.

    https://www.investopedia.com/terms/e/excesskurtosis.asp
    https://www.macroption.com/kurtosis-formula/
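
    A minimal sketch of the population excess kurtosis calculation, assuming a plain list of rate-of-change values. The function name and the sample numbers are made up for illustration, and the sample-adjusted variant described on the linked pages uses additional correction factors not shown here.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2^2 - 3.

    m2 and m4 are the second and fourth central moments (divide by n).
    A normal distribution has excess kurtosis 0; positive values point
    to fatter tails than normal. Assumes the values are not all equal.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

# Hypothetical daily rate-of-change values in percent (illustrative only).
roc = [0.1, -0.3, 0.5, -1.2, 0.2, 2.8, -0.1, 0.4, -0.6, 0.0, -3.5, 0.3]
print(f"excess kurtosis: {excess_kurtosis(roc):.3f}")
```

    With only a handful of observations the estimate is very noisy, so it is best read alongside a formal test and a larger sample.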
     
  7. In fact, doing the calculations is one of the most difficult but also most important parts of trading, because accurate statistics depend on it. Every trader should understand how this process is carried out and how to use it to find the most favorable moments to trade, or to determine what amounts can safely be put to work. The main thing is not to try to make your calculations or results perfect. First of all, stick to realistic figures and build on your real capabilities. This is what keeps you from deceiving yourself and lets you work as a real trader and professional.