What are the best tools to analyze data programmatically?

Discussion in 'App Development' started by narafa, Apr 14, 2020.

  1. userque

    userque

    Wow... ok ... that also explains your RAM appetite.

    I'm (currently) using a few hundred MB per instrument (min./OHLC data) per back-test. I plan to expand to several instruments per back-test (but fewer than twenty) in a few months or so.

    Hope you get it sorted.
     
    #41     Jun 22, 2020
    d08 likes this.
  2. 931

    931

    If using a 4-byte float datatype, 4 bytes * 63 million = 252,000,000 bytes = 252 MB. Of course there is more overhead, but it might not be significant.
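    A quick sanity check of that arithmetic in NumPy (the 63 million figure comes from the estimate above; the rest is plain NumPy):

```python
import numpy as np

# 63 million values stored as 4-byte float32, per the estimate above.
n_values = 63_000_000
bytes_needed = n_values * np.dtype(np.float32).itemsize  # itemsize == 4
print(bytes_needed)  # 252000000 bytes, i.e. 252 MB per OHLC field
```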

    I remember testing daily data at some point, and I had about 10 to 20 GB loaded in RAM with ~80k stocks, but many did not have the full 10 years of data, and a few stocks went back to the 1960s.
    If I remember correctly, the data structure was OHLC plus 1 byte holding 7 booleans with info about data validity and potential problems.
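    That record layout can be sketched as a NumPy structured dtype. Only the OHLC-plus-one-flag-byte shape comes from the post; the flag names below are made up for illustration:

```python
import numpy as np

# OHLC as float32 plus one uint8 whose bits flag validity problems.
bar_dtype = np.dtype([
    ("open", "f4"), ("high", "f4"), ("low", "f4"), ("close", "f4"),
    ("flags", "u1"),   # room for up to 8 boolean flags in a single byte
])

MISSING_DAY = 1 << 0    # hypothetical flag bits
SPLIT_SUSPECT = 1 << 1

bars = np.zeros(3, dtype=bar_dtype)
bars[0]["flags"] |= MISSING_DAY
print(bar_dtype.itemsize)                    # 17 bytes per bar
print(bool(bars[0]["flags"] & MISSING_DAY))  # True
```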

    Many tasks could probably be accomplished by loading from disk and summing up, even with 4 to 16 GB of RAM on a random laptop, but a fast workstation will speed everything up, and you may not want to write the first implementation of some test with all the effort going into buffering to disk etc.

    Another way is creating a class that does the buffering etc. and reusing it across algos.

    Before getting a workstation with 64 or more threads, I would make a graph showing how the algos' throughput per thread drops as the thread count increases. It can differ between CPUs too, but knowing that, you can decide whether you benefit more from a CPU with fewer but faster cores, or from more cores, which tend to be slower in per-core performance.
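    A rough way to collect the data points for that graph, using a CPU-bound dummy task (the task and counts are placeholders; substitute one run of your actual algo). Processes are used rather than threads because the GIL serialises pure-Python work:

```python
import time
from concurrent.futures import ProcessPoolExecutor

def work(n):
    # Stand-in for one back-test run; replace with your real algo.
    s = 0
    for i in range(n):
        s += i * i
    return s

def throughput(workers, tasks=16, n=200_000):
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, [n] * tasks))
    return tasks / (time.perf_counter() - start)  # tasks per second

if __name__ == "__main__":
    for w in (1, 2, 4, 8):
        # Divide by w for per-worker throughput; plot this against w.
        print(w, round(throughput(w) / w, 2))
```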

    Testing algos that require fast jumping between stocks, to test for some sort of market interconnection etc., will probably consume a lot of RAM, and I don't know how to reduce usage; maybe in-memory compression, but that also reduces speed.
     
    Last edited: Jun 22, 2020
    #42     Jun 22, 2020
    userque and d08 like this.
  3. igr

    igr

    Another advantage of going with Python + Pandas is that its API gets partially adopted by other products. For example, when your Pandas program hogs all the memory and crashes, it's very easy to adapt it to run on Apache Spark :)
     
    #43     Mar 7, 2021