K-Nearest Neighbor Algorithm on Astronomical Data for Trading Signals

Discussion in 'Strategy Building' started by ph1l, Jul 7, 2025 at 9:39 PM.

  1. ph1l

    This example shows how to use the k-nearest neighbor algorithm on data that one might assume has no relationship to markets. It uses daily price data of ETFs for evaluation and only generates entry and exit signals for long trades.

    Get Ephemeris Data

    Use https://ssd.jpl.nasa.gov/horizons.cgi to retrieve CSV ephemeris data for an observer at a fixed location, at a fixed time each day, viewing target celestial bodies. The attributes retrieved for each target body are:
    • Azimuth
      Direction of target body from observer measured clockwise from north with values 0 through 360 degrees.
    • Elevation
      Position of target body above or below the horizon with values -90 through 90 degrees.
    • SkyMotion
      Rate of movement of the target body in the plane-of-sky (vertical plane an observer would see). Values are in arcseconds per minute.
    • SkyMotionPA
      Angle of movement of the target body in the plane-of-sky measured counterclockwise from north with values 0 through 360 degrees.
    • RelVelAng
      Angle of movement of the target body relative to an observer's line-of-sight with values -90 through 90 degrees where positive values mean the target body is moving away from the observer, negative values mean the target body is moving toward the observer, and 0 means the target body is moving perpendicular to the observer's line-of-sight.
    For this example, I use New York City as the location, 20:00 UTC as the daily time, 1957 through 2057 for the observation period, and target bodies Mercury, Venus, Mars, Jupiter, Saturn, and the Moon. The ephemeris data shows the location of the target bodies in the sky and how they are moving.

    Get Price Data
    This example uses up to about 25 years (20000522 through 20250609) of daily price data (adjusted for splits and dividends) for 88 ETFs with different asset classes, market caps, regions, and sectors. The ETFs have at least 18 years of data available.

    Preprocess Data
    To make it easier to measure the distance between two sets of ephemeris data, each combination of target body and attribute is scaled from 0 to 100 percent of the range of that combination over the 101 years of readings. Azimuth and SkyMotionPA use the theoretical range of that combination (0 through 360) instead of the observed range.
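    The scaling described above could be sketched roughly as follows. This is only an illustration: the function and constant names are hypothetical, and mapping a constant series to 0 is an assumption the post does not address.

```python
# Hypothetical helper; the original post does not show its code.
ANGULAR = {"Azimuth", "SkyMotionPA"}  # attributes scaled over the fixed 0-360 range

def scale_percent(values, attribute):
    """Scale readings for one (target body, attribute) combination to 0-100 percent."""
    if attribute in ANGULAR:
        lo, hi = 0.0, 360.0                # theoretical range
    else:
        lo, hi = min(values), max(values)  # observed range over all readings
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]       # assumption: a constant series maps to 0
    return [100.0 * (v - lo) / span for v in values]

elevation = [-45.0, 0.0, 45.0]
azimuth = [0.0, 90.0, 270.0]
print(scale_percent(elevation, "Elevation"))  # [0.0, 50.0, 100.0]
print(scale_percent(azimuth, "Azimuth"))      # [0.0, 25.0, 75.0]
```

    Using the fixed 0 through 360 range for the angular attributes keeps the wrap-around point at exactly 100 percent, which matters when distances on those attributes are treated as circular.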

    For a given date of price data for a particular symbol, the price signal is 1 when the open price two trading days after that date is higher than the open price of the next trading day; otherwise it is 0.
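    A minimal sketch of that labeling rule, assuming the open prices for one symbol are held in a list in date order. The function name is hypothetical, and returning None for the last two days (where the later opens are not yet known) is an assumption.

```python
def price_signals(opens):
    """Return one label per day: 1 if open[t+2] > open[t+1], else 0.

    The last two days get None because their future opens are unknown.
    """
    labels = []
    for t in range(len(opens)):
        if t + 2 < len(opens):
            labels.append(1 if opens[t + 2] > opens[t + 1] else 0)
        else:
            labels.append(None)
    return labels

opens = [10.0, 10.5, 10.2, 10.8, 10.8]
print(price_signals(opens))  # [0, 1, 0, None, None]
```

    The label compares the two opens after the signal date, which matches entering at the next trading day's open and measuring the following open-to-open move.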

    Store the combined scaled ephemeris data and price signal data as training data.

    Run K-Nearest Neighbor Algorithm to Generate Trading Signals
    For each (date, symbol) pair in the evaluation period starting 20171205 (30% of data for ETFs with data starting 20000522; less for other ETFs), calculate scaled ephemeris data for evaluation.

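    For each (date, symbol) record in the training data whose date two trading days later is earlier than the evaluation record's date, calculate the Euclidean distance between the evaluation record and the training record. This restriction keeps the training record's price signal from using prices that were not yet known on the evaluation date. The distance for attributes Azimuth and SkyMotionPA is circular, so the distance between scaled values 1 and 99 is 2 for them.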
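    The distance calculation might look something like the sketch below. The names are hypothetical, records are assumed to be dicts keyed by (target body, attribute) holding scaled values from 0 to 100, and dates are assumed to be expressed as positions in a shared trading calendar.

```python
import math

# Hypothetical names; records map (body, attribute) -> scaled value in 0..100.
CIRCULAR = {"Azimuth", "SkyMotionPA"}

def attribute_gap(a, b, attribute):
    """Absolute gap between two scaled values, wrapping at 100 for circular attributes."""
    gap = abs(a - b)
    if attribute in CIRCULAR:
        gap = min(gap, 100.0 - gap)
    return gap

def euclidean_distance(rec_a, rec_b):
    """Euclidean distance over all (body, attribute) keys of two records."""
    return math.sqrt(sum(attribute_gap(rec_a[key], rec_b[key], key[1]) ** 2 for key in rec_a))

def is_eligible(train_day_index, eval_day_index):
    """True when the training date plus two trading days precedes the evaluation date."""
    return train_day_index + 2 < eval_day_index

evaluation = {("Mars", "Azimuth"): 1.0, ("Mars", "Elevation"): 40.0}
training = {("Mars", "Azimuth"): 99.0, ("Mars", "Elevation"): 40.0}
print(euclidean_distance(evaluation, training))  # 2.0
```

    For a non-circular attribute such as Elevation, the same pair of values 1 and 99 would contribute a gap of 98 instead of 2.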

    Then, for the 100 smallest Euclidean distances, calculate the mean price signal. A mean price signal >= 0.6 means enter long at the next trading day's open price, and a mean price signal <= 0.5 means exit an active long trade at the next trading day's open price.
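    As an illustration only, the neighbor vote and the entry/exit thresholds could be written along these lines. The function names and the "enter", "exit", and "hold" return values are hypothetical, and the input is assumed to be a list of (distance, price signal) pairs for the eligible training records.

```python
def knn_mean_signal(distances_and_labels, k=100):
    """Mean price signal of the k training records with the smallest distances."""
    nearest = sorted(distances_and_labels, key=lambda pair: pair[0])[:k]
    if not nearest:
        return None  # assumption: no eligible training records means no signal
    return sum(label for _, label in nearest) / len(nearest)

def next_action(mean_signal, in_trade, enter_at=0.6, exit_at=0.5):
    """Decide what to do at the next trading day's open."""
    if not in_trade and mean_signal >= enter_at:
        return "enter"
    if in_trade and mean_signal <= exit_at:
        return "exit"
    return "hold"

pairs = [(0.5, 1), (1.0, 1), (1.5, 0), (9.0, 0)]
signal = knn_mean_signal(pairs, k=3)       # two of the three nearest labels are 1
print(next_action(signal, in_trade=False))  # enter
```

    Because the exit threshold (0.5) is lower than the entry threshold (0.6), a mean signal between the two leaves an open trade in place rather than flipping in and out.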

    Results
    The evaluation data resulted in 2475 simulated trades with a mean profit of 2.7%, median profit of 1.8%, mean winning trade profit of 5.35%, mean losing trade loss of 3.16%, win rate of 69.78%, mean trade duration of 22.3 trading days, and median trade duration of 14 trading days. Trade duration includes the day of entry and the day of exit. The simulated trades do not account for slippage and commission. All 88 of the ETFs were profitable. See the attached semicolon-delimited astro_results_summary.csv and astro_trades.csv for details.

    To compare the results with similar-duration buy and hold trades, I calculated the simulated long trade result of each overlapping 22 trading day period (open price to open price) in the evaluation data. The mean profit for 164,296 instances was 0.66%, which is less per trade than the system's mean profit of 2.7%.
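    A rough sketch of that buy and hold comparison for one symbol is below. The function name is hypothetical, and it assumes a period of N trading days counts both the entry day and the exit day, so the exit open is N - 1 positions after the entry open.

```python
def overlapping_returns(opens, days=22):
    """Percent return of every overlapping open-to-open period of `days` trading days."""
    hold = days - 1  # assumption: the period counts both the entry day and the exit day
    return [
        100.0 * (opens[t + hold] - opens[t]) / opens[t]
        for t in range(len(opens) - hold)
    ]

# Small example with 3-day periods so it can be checked by hand.
opens = [100.0, 101.0, 102.0, 104.0]
returns = overlapping_returns(opens, days=3)
mean_return = sum(returns) / len(returns)
print(len(returns))  # 2
```

    Averaging these returns across all symbols in the evaluation data would give a per-period benchmark comparable to the mean profit per system trade.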

    Summary
    While one might think ephemeris data cannot be used for trading, this example makes it appear otherwise. This does not mean the positions and movements of celestial bodies like planets cause prices to change; it only tries to show that those positions and movements can be predictive.
     
  2. Sekiyo

    Profitable long only SPY lol
    My monkey brain can do it too.
    Does it even beat the SPY itself?

    Show me a profitable short only SPY and I'll be impressed.
     
    Last edited: Jul 8, 2025 at 7:46 AM
  3. Post hoc, ergo propter hoc
     
  4. Sekiyo

    Correlation does not imply causation.
     
  5. SunTrader

    Although it seems promising, using daily data is a non-starter for me. A lot happens during a trading session, and basing a strat on each individual daily bar makes assumptions about how and in what order those things happen. Tick by tick is ideal; otherwise minute by minute is the bare minimum requirement.
     
  6. ph1l

    Trading lower than daily timeframes never really appealed to me, because I think it would need more of my time watching markets.

    This example used the open price for entries and exits, which should be fairly close to filled prices (ETFs seem to usually have small bid-ask spreads). The trades averaged about a month in duration, so slippage at the opens is something one would have to live with.
     
  7. ph1l

    Causation is not necessary for prediction, and I mentioned that in the post.

    But speaking of correlation, using the QuantDare method, I calculated the three-year, daily return correlation among the 88 ETFs used. Since many of the correlations are low, this suggests the method is valid because it uses the same rules on each ETF. See the attached semicolon-delimited astro_backtest_corrmatrix.csv.
     
  8. SunTrader

    Disagree, but nonetheless good luck.