What is the best tool(s) to analyze data programmatically?

Discussion in 'App Development' started by narafa, Apr 14, 2020.

  1. reinghar

    I spent the first couple of decades of my career working in tech so I'm overly familiar with this dilemma. If I were making the decision, I would look at the intangible things you'll need to do later.

    If you're just doing analysis, R is probably better.

    If you need to do any general-purpose programming to bootstrap your framework, or think you ever may need to, you'll likely want to choose Python. The tooling is much better.

    I have built projects in both languages, but I primarily use Python because of the ecosystem. Want to build an API and deploy it on Kubernetes on GCP or AWS? No problem. Need to deploy a model with SageMaker or TensorFlow? Python is the best choice. Want to build a bot that uses the model you just built? Hard to beat Python.

    If that isn't confusing enough, now we have Julia (https://julialang.org/), which is more performant than either R or Python and is growing quickly.
     
    #11     May 11, 2020
    Atikon likes this.
  2. My opinion is to use Python with the libraries pandas (panel data), SciPy (scientific Python), and TA-Lib (technical analysis library).
    pandas is confusing to learn at first, but once you get to know its quirks, this combination gives you a great deal of flexibility.
    I have used these tools to analyze large price data sets over and over and produce results.
    One of my main problems with starting from pre-made software like MT4/5, NinjaTrader, etc. is that it does not offer that flexibility.
     
    #12     May 13, 2020
  3. Atikon

    What did you use pandas for? I've read that it's great for data integrity/cleansing.
     
    #13     May 14, 2020
  4. pandas is the library for holding tabular data in “dataframes”, and it can do all sorts of manipulation and analysis on that data. For example, date, time, open, close, high, and low columns together would form one DataFrame object.
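
    As a rough illustration of the description above, here is a minimal sketch of price bars held in a DataFrame. The column names and all the values are made up for the example; only the pandas calls themselves are real.

```python
import pandas as pd

# Hypothetical sample bars; the values are made up for illustration.
bars = pd.DataFrame(
    {
        "open": [100.0, 101.5, 101.0, 102.2],
        "high": [102.0, 102.5, 103.0, 103.1],
        "low": [99.5, 100.8, 100.5, 101.9],
        "close": [101.5, 101.0, 102.2, 102.9],
    },
    index=pd.to_datetime(
        ["2020-05-11", "2020-05-12", "2020-05-13", "2020-05-14"]
    ),
)
bars.index.name = "date"

# Column-wise arithmetic: per-bar range and close-to-close return.
bars["range"] = bars["high"] - bars["low"]
bars["return"] = bars["close"].pct_change()

# Boolean filtering: keep only the bars that closed above their open.
up_days = bars[bars["close"] > bars["open"]]

print(bars)
print(up_days.index.tolist())
```

    Each column is a Series sharing the same date index, which is why arithmetic between columns lines up row by row without any explicit loop.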
     
    #14     May 14, 2020
    Atikon likes this.
  5. 931

    Custom testing platform written in C++?
    How do you plan to gather the conditions/patterns?
     
    Last edited: May 24, 2020
    #15     May 24, 2020
  6. narafa

    Thanks a lot for the feedback. I will have a look at Julia as well.

    The objective is primarily analysis only, so I guess R should be sufficient. However, I won't rule out future expansions, so I will pass on R and look for a more general-purpose programming language instead.

    Thanks again.
     
    #16     May 25, 2020
  7. narafa

    Thanks a lot. I am looking to do statistical and pattern analysis across hundreds of combinations. Any idea of a Python library that can help with that for time series data?

    PS: It's nothing to do with technical analysis.
     
    #17     May 25, 2020
  8. narafa

    I guess that's probably the library I would need, depending on what statistical tools are available with it out of the box.
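
    For a sense of what pandas offers out of the box, here is a minimal sketch covering summary statistics, rolling statistics, and lag-1 autocorrelation on a time series. The series values and the window length are assumptions chosen for the example; anything beyond this (regressions, formal hypothesis tests) would likely need SciPy or a similar library.

```python
import pandas as pd

# Hypothetical daily series; the values are made up for illustration.
values = pd.Series(
    [10.0, 10.4, 10.1, 10.9, 11.3, 11.0, 11.8, 12.1, 11.7, 12.5],
    index=pd.date_range("2020-05-01", periods=10, freq="D"),
    name="value",
)

# Day-over-day changes; the first entry has no predecessor, so drop it.
changes = values.diff().dropna()

# Rolling statistics over an assumed 3-observation window.
window = 3
rolling_mean = values.rolling(window).mean()
rolling_std = values.rolling(window).std()

# Rolling z-score: how far each value sits from its own recent average.
zscore = (values - rolling_mean) / rolling_std

# Summary statistics (count, mean, std, quartiles) and lag-1 autocorrelation.
summary = changes.describe()
lag1_autocorr = changes.autocorr(lag=1)

print(summary)
print("lag-1 autocorrelation of changes:", lag1_autocorr)
```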
     
    #18     May 25, 2020
  9. narafa

    I guess C++ would be too complicated & time consuming for me to do the job.

    I plan to write/code the conditions myself (or with the help of a friend who runs a development company), run them and gather the results. Certain conditions/patterns are almost identical but with tiny differences.
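
    One possible way to handle conditions that are almost identical except for small differences is to write the condition once with parameters and sweep the parameter combinations. The sketch below is only an illustration using the standard library: the condition itself (a rise of at least `min_rise` over `lookback` bars), the parameter grids, and the data are all hypothetical.

```python
from itertools import product

# Hypothetical closing values; made up for illustration.
closes = [100.0, 101.0, 102.5, 101.8, 103.0, 104.2, 103.9, 105.0, 106.1, 105.5]


def make_condition(lookback, min_rise):
    """Build a condition: value rose by at least `min_rise` (a fraction) over `lookback` bars."""
    def condition(series, i):
        if i < lookback:
            return False  # not enough history yet
        return series[i] / series[i - lookback] - 1.0 >= min_rise
    return condition


def evaluate(series, condition):
    """Count matches and average the next-bar change that followed each match."""
    next_changes = [
        series[i + 1] / series[i] - 1.0
        for i in range(len(series) - 1)  # the last bar has no next bar
        if condition(series, i)
    ]
    matches = len(next_changes)
    mean_next = sum(next_changes) / matches if matches else None
    return matches, mean_next


# Hypothetical parameter grids; every combination becomes one variant.
lookbacks = [1, 2, 3]
min_rises = [0.005, 0.01, 0.02]

results = []
for lookback, min_rise in product(lookbacks, min_rises):
    matches, mean_next = evaluate(closes, make_condition(lookback, min_rise))
    results.append(
        {"lookback": lookback, "min_rise": min_rise,
         "matches": matches, "mean_next_change": mean_next}
    )

for row in results:
    print(row)
```

    Adding another parameter grid multiplies the number of variants, so hundreds of combinations need no extra condition code, only longer lists.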
     
    #19     May 25, 2020
  10. 931

    Over the years I have bodged together a C++ program that may solve many tasks you are likely to face.
    I could implement your specific idea with the pre-existing codebase, or clean it up and sell it.
    It offers high flexibility if a custom solution is needed, as almost all functions are written from scratch (reinventing the wheel), and good speed, as file loading, data filtering, testing, etc. are multithreaded.
    Communication with external software is over socket connections, the GUI is based on Qt, and graphing is done with OpenGL.
    I started the project on Windows 6-7 years ago; at the moment it is ported to Linux due to ICC compiler speed advantages and better CUDA support.
    Currently I am working on binary caching to disk using LZ4 compression, to support quickly swapping in thousands of low-timeframe stocks from random locations at higher than disk speed.
    There is still a lot to do, with more ideas for improvements than time.

    But I don't offer it for free, and a lot of the code is not very readable or documented.
    I think in systematic trading most code goes into the surrounding stuff that supports it; the core algo might easily be less than 1-5% of the codebase.
    But time-wise, you could be spending most of it on the core.
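
    The compressed binary cache mentioned above can be sketched roughly as follows. This is not the poster's C++ code: it is a Python illustration, and it swaps in `zlib` from the standard library as a stand-in for LZ4. The file name and the price values are hypothetical. Presumably the point of such a cache is that fewer bytes have to be read from disk, so with a fast decompressor the effective load rate can exceed raw disk throughput.

```python
import array
import os
import tempfile
import zlib


def write_cache(path, values):
    """Pack floats as raw 8-byte doubles, compress them, and write to disk."""
    raw = array.array("d", values).tobytes()
    with open(path, "wb") as fh:
        fh.write(zlib.compress(raw, 1))  # level 1 favors speed over ratio


def read_cache(path):
    """Read the compressed file back and unpack it into a list of floats."""
    with open(path, "rb") as fh:
        raw = zlib.decompress(fh.read())
    values = array.array("d")
    values.frombytes(raw)
    return values.tolist()


# Hypothetical file name and prices, made up for illustration.
prices = [100.0 + 0.25 * (i % 8) for i in range(10_000)]
cache_path = os.path.join(tempfile.mkdtemp(), "EXAMPLE.bin")

write_cache(cache_path, prices)
restored = read_cache(cache_path)

raw_size = len(prices) * 8
disk_size = os.path.getsize(cache_path)
print(f"raw {raw_size} bytes, on disk {disk_size} bytes")
```

    Real tick data compresses far less well than this repetitive sample, so the size ratio printed here says nothing about what to expect in practice.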
     
    Last edited: May 25, 2020
    #20     May 25, 2020
    d08 likes this.