Trilogy of MATLAB, R and Python in quantitative trading

Discussion in 'App Development' started by gmst, Apr 12, 2013.

My most preferred tool for quant strategy development and research?

  1. R: 18 vote(s), 29.5%
  2. MATLAB: 18 vote(s), 29.5%
  3. Python: 16 vote(s), 26.2%
  4. I am old school and I make my millions using the power of Excel+VBA: 9 vote(s), 14.8%

  1. gmst

    A lot of funds use MATLAB whereas a large group of funds are R based. Some big ones like AQR have started to make heavy use of Python for creation of research infrastructure and strategy development.

    I am creating this thread to hear from the ET residents active in quantitative trading - which of these tools is your choice and why? If you use multiple tools (among R, MATLAB and Python), all the more power to you! It will be very interesting to have a debate on which tool you find most appropriate for what kind of tasks and why.

    Specifically, let's discuss:
    Relative advantages/disadvantages of these 3 languages for building research infrastructure and developing strategies.

    Following the philosophy of Ray Dalio, let's have an open, argumentative, passionate "no holds barred" kind of debate and let's uncover the truth!
     
  2. misaki

    Can't wait for someone to hijack this thread and start talking about the merits of C# from nowhere.

    Weak aspects...

    Python: Fragmented packages - and hence, multithreading issues, documentation, lack of a robust IDE for finance.

    MATLAB: Compiler support on Linux. gfortran 4.3 omg. Bet I can find an automated trash can in Tokyo with a later gcc installed. String parsing. Memory management is non-existent. The language does not lend itself well to polyglotism, e.g. vectorization pattern, one-based array indexing (although IMO it interfaces quite well with Java and Perl; with C/C++ only through mex).
     
  3. MATLAB:

    Advantages: Easy to use; it is a standardized application with complete backwards compatibility of all of its functions. Extensive help files, documentation, support. Easy to compile libraries and extensions which should always work. You don't have to worry if a new version will no longer compile your old code. Contains many toolboxes, and not just for statistics. Usually hassle-free and quick development.

    Disadvantages: Not open source, so it's more difficult to change the very basic functions that are inside, or verify how they work. While this becomes a problem only for the most detailed solutions, extending/modifying existing functions becomes a wall more often than you'd think. Sharing code or libraries demands that the others in your team all have the same installation, though that's not such a big issue given the standardized library they have. Licensing terms, however, are often unclear on what you can share, even though they say you can share anything "royalty free". It's also not free, so obtaining multiple licenses can get quite expensive, especially if you only want to use a small part of what you are paying for.

    R:

    Advantages: Open source, free, lots of code for just about anything you can Google out. Fairly large community for this type of software. You can modify the code that you find, so that means that it's "open source" in a sense that there are possibilities to choose what source you look at and modify, and how. Very widely used by the quantitative finance community.

    Disadvantages: New versions can break old code, so it's really important to know your dependencies and what has changed. This can complicate things quite a lot on more complex projects, and often means that once you install a version of R, you don't update anything. Very few help files, no real books on learning to use R, so the learning experience can get tough, especially for non-programmers. Some odd things happen all the time due to either lack of documentation or lack of version control.

    My thoughts:
    R is far more widely used than MATLAB in this business, and it can get things done just as well as MATLAB but will often require more time to get it working as perfectly as you want. R is better for sharing code and/or libraries, so it's better if you work in a team and need full control of your code. MATLAB is better if you mostly work alone and don't need to modify the low level stuff. I personally use MATLAB and C++, and I use R when I can't find something I need for MATLAB, which happens sometimes. I prefer R overall, so will be switching to it more and more.

    Python:
    Well, it's a good language to hold everything together. I wouldn't do speed-critical data processing with it, but it's easy to understand and a lot of people aren't programmers so this is an easier language for them. If you need any kind of performance-critical code, I'd go for C++ since it has better compilers, and for larger projects better IDEs as well, imo. It really depends what you use it for, so there's no right answer.
     
  4. r and matlab rock, matlab being easier to use but less open.

    both can have functions compiled to dlls (matlab easier), so easy to roll into more comprehensive open source solutions yet retain r/matlab analysis benefits.
     
  5. Makis

    Plus one for R. Using it for quick prototyping, research and analysis on tick data. My only gripe is that it chokes with very large data sets but you can get around that with divide and conquer. Also the fact that it was designed without multithreading in mind limits its usage on certain tasks. There are several packages that attempt to remediate those two issues though.
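The "divide and conquer" workaround mentioned above is not tied to R. As a rough illustration in Python (the function names `chunked` and `combined_mean_var` are made up for this sketch, and the chunk size is an arbitrary assumption), summary statistics can be accumulated chunk by chunk so the full data set never has to sit in memory at once:

```python
from typing import Iterable, Iterator, List, Tuple


def chunked(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most `size` items from `values`."""
    chunk: List[float] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def combined_mean_var(values: Iterable[float], size: int = 100_000) -> Tuple[float, float]:
    """Mean and population variance, computed one chunk at a time."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for chunk in chunked(values, size):
        # Only per-chunk partial sums are kept between iterations.
        n += len(chunk)
        total += sum(chunk)
        total_sq += sum(x * x for x in chunk)
    if n == 0:
        raise ValueError("no data")
    mean = total / n
    return mean, total_sq / n - mean * mean


mean, var = combined_mean_var(range(1, 11), size=3)
print(mean, var)
```

The sum-of-squares formula is used here only because it is short; it can lose precision on large or badly scaled data, so a production version would likely prefer a numerically stabler update.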

    Big supply of packages and constant introduction of new packages is the biggest advantage over matlab.

    I wouldn't put Python and R under the same umbrella. As in other areas, you could do everything with a single language, but that is not the optimal solution.
    R is great for statistical analysis and research, prototyping of strategies and model validation, but when it comes down to asynchronous tasks, it is the worst solution anyone can choose. That is where Python starts picking up. Backtesting, simulation and anything that is event driven can be better implemented in Python than in R.
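To make "event driven" concrete, here is a toy sketch of an event loop for a backtest in Python. Everything in it is an assumption made for illustration: the `Tick`/`Order`/`Fill` event types, the `run_backtest` name, the threshold rule, and the simplification that an order fills immediately at the price of the tick that triggered it.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Tick:
    time: int
    price: float


@dataclass
class Order:
    time: int
    side: str  # "BUY" or "SELL"
    qty: int


@dataclass
class Fill:
    time: int
    side: str
    qty: int
    price: float


def run_backtest(prices, threshold):
    """Buy 1 unit when price < threshold and flat; sell when price >= threshold."""
    events = deque(Tick(t, p) for t, p in enumerate(prices))
    position, cash, last_price, fills = 0, 0.0, None, []
    while events:
        ev = events.popleft()
        if isinstance(ev, Tick):
            # Strategy reacts to market data by queueing an order event.
            last_price = ev.price
            if ev.price < threshold and position == 0:
                events.appendleft(Order(ev.time, "BUY", 1))
            elif ev.price >= threshold and position > 0:
                events.appendleft(Order(ev.time, "SELL", position))
        elif isinstance(ev, Order):
            # Toy execution model: fill at the last seen price.
            events.appendleft(Fill(ev.time, ev.side, ev.qty, last_price))
        elif isinstance(ev, Fill):
            # Portfolio bookkeeping happens only when a fill arrives.
            signed = ev.qty if ev.side == "BUY" else -ev.qty
            position += signed
            cash -= signed * ev.price
            fills.append(ev)
    return position, cash, fills


position, cash, fills = run_backtest([10, 9, 11, 8, 12], threshold=10)
print(position, cash, len(fills))
```

The point of the structure is that strategy, execution and bookkeeping only communicate through events, so the same handlers could in principle be driven by a live feed instead of a price list.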
     
  6. 2rosy

    Python:
    simple syntax and is a programming language; not only numerical crunching

    R:
    tons of packages. it lets me pretend to be a statistician

    Matlab:
    I used it in college and, more recently, at work. There's a nice GUI but it doesn't appear to have changed. OO is still the same. And do I have to have a separate file for every function I write? From what I see, MATLAB isn't programming but rather getting data, loading it into a matrix, and flipping it around via indexes in an unreadable way.
     
  7. misaki

    @2rosy:

    Yes, you reminded me of two other problems with MATLAB. (1) Weak OO model. (2) Since 2012b, the GUI has gotten worse with a toolstrip that I can't understand.

    I have to say a few good things about MATLAB though. 3D plotting is one of them. I see many comments about the beauty of Python from a developer's POV, but from a physics POV, I think MATLAB is more beautiful to the eye and closer to the language that I think in. tic/toc is not the right way to do it, but it's easy to throw in - if you are just looking for cheap bottleneck removal in a one-use analysis, why not? The dot notation is elegant and has no equivalent in Python as far as I remember. The documentation is written in intuitive language, e.g. blsprice mentions "volatility", "yield" as argument names etc. In the spirit of fragmented, Pythonic module documentation, everything seems to be a nondescript "n" or "k".
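For readers coming from the Python side, a tic/toc-style quick timer is easy to approximate. The sketch below is one possible way to do it; the `tictoc` helper and the `timings` dictionary are names invented for this example, not a standard API. It relies only on `time.perf_counter` and `contextlib.contextmanager` from the standard library.

```python
import time
from contextlib import contextmanager


@contextmanager
def tictoc(label, results=None):
    """Time the enclosed block, roughly like MATLAB's tic ... toc."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed  # keep the number for later comparison
        print(f"{label}: {elapsed:.6f} s")


timings = {}
with tictoc("sum of squares", timings):
    total = sum(i * i for i in range(1000))
```

As with tic/toc, this is wall-clock timing of a single run, so it is only good for spotting obvious bottlenecks, not for careful benchmarking.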

    Not one language feature is suitable for everyone. Some poor features of MATLAB improve its code clarity:

    % Global namespace
    help gridtop

    # Why do I have to do this to get help documentation?
    from django.contrib.gis.geos import GEOSGeometry

    % Array assignment
    myArray = 1:50;

    # I like zero-based array indexing but sometimes it gets ugly
    from numpy import *
    myArray = r_[1:51]
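The `1:50` versus `1:51` difference above comes from Python's half-open ranges combined with zero-based indexing. A small standard-library sketch of the convention follows (the variable names are only illustrative); in NumPy, `np.arange(1, 51)` follows the same half-open rule and avoids the star import.

```python
# Python ranges are half-open: the stop value is excluded,
# so 1..50 inclusive is written range(1, 51).
my_array = list(range(1, 51))

first = my_array[0]         # zero-based: index 0 holds the value 1
last = my_array[-1]         # negative indices count from the end
first_ten = my_array[0:10]  # slices are half-open too: indices 0..9

print(len(my_array), first, last, first_ten)
```

One consequence of the half-open rule is that the length of `my_array[a:b]` is simply `b - a`, which is the usual argument in its favor.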
     
  8. Butterball

    I am using Access databases to store a couple hundred instances of data and R to analyze and graph it. It's easy to use, very powerful and free. I am sure MATLAB is at least as good, but I never bothered to get into it, being as happy with R as I am.

    I trade longer time frames (weekly rather than hourly/daily), so performance was never a big argument for me.
     
  9. Murray Ruggiero

    Murray Ruggiero Sponsor

    In both R and Python we can write packages in C++, so that will deal with speed issues for heavy math.
     
  10. C# is too slow, for HFT only C will do...
     
    #10     Apr 25, 2013