R vs MATLAB

Discussion in 'App Development' started by a-greenwell, Jul 25, 2012.

  1. Craig66

    +1 for R & Postgres.
     
    #31     Jul 29, 2012
  2. Are you stupid, or do you have some psychological issue or fetish for "free" stuff? In 2012 everyone knows what "open source" entails. Are you done rambling about the cost structure of open source?

     
    #32     Jul 29, 2012
  3. I would go with neither R nor Matlab nor Python. I don't like interpreted languages; even when writing efficient vectorized code in R, you can still run circles around it with very well written C# code and the new .NET 4.0 libraries. I say that because you end up using bits and pieces of many different programs, and potentially programming languages, to achieve what you want. Profiling? Hmm, maybe a bit of R or Python? Backtesting? R? Distributed and/or parallel? Matlab? High performance? C/C++? Ease of integration? None!!! If you are a mature C/C++ coder, then stick to it. If you really want to step up from Excel and have to learn a new language anyway, I can only highly recommend C# on .NET 4.0. What gives you more flexibility than that? How much more can you tweak backtests than through completely customizable functions and code?

    My experience (I worked for many years as a quant in rates exotics and programmed in C++, and now for years in C#) is that most quant newbies in banks and hedge funds try to accomplish things in R and Matlab. Why? Because it looks fancy when you show it to your boss. A nice cointegration chart from R looks better than some ugly numbers coming out of a console C# app. It also gives the impression that you are analytically versed, while coding just gives away that you know how to program, not necessarily that you are able to analyze data. So much these days is about perception, ideology, and making impressions. If you can afford not to sell your soul to a bank or hedge fund, I highly recommend looking for a package/language that really lets you do what you want to do. Python is incredibly slow, and in R I have not seen a single library that could beat the processing power of my simple C# backtest platform. I process around 5 million ticks per second, which includes loading the data in binary format from files, deserializing the data, merging multi-symbol feeds, serializing, sending over a TCP stack, utilizing an open source library (ZeroMQ, for messaging purposes), deserializing again on the recipient side, and running a high frequency strategy in backtest mode, all that at 5 million ticks/second. Please show me a single package, distributed, parallel, whatever, in Python, Matlab, or R that can accomplish that. Any.
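
    As a rough illustration of the load, deserialize, and merge steps listed above, here is a minimal Python sketch. It is not the poster's C# platform and makes no claim about throughput. The record layout (`TICK_FORMAT`), the `Tick` type, and all function names are assumptions made up for this example, in-memory buffers stand in for the binary files, and the ZeroMQ messaging hop is left out entirely.

```python
import heapq
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple

# Hypothetical fixed-width record: int64 timestamp (ns), float64 price, int32 size.
TICK_FORMAT = struct.Struct("<qdi")


class Tick(NamedTuple):
    timestamp_ns: int
    symbol: str
    price: float
    size: int


def write_ticks(stream: BinaryIO, rows) -> None:
    """Serialize (timestamp_ns, price, size) rows as packed binary records."""
    for timestamp_ns, price, size in rows:
        stream.write(TICK_FORMAT.pack(timestamp_ns, price, size))


def read_ticks(stream: BinaryIO, symbol: str) -> Iterator[Tick]:
    """Lazily deserialize one symbol's records in the order they were stored."""
    while True:
        chunk = stream.read(TICK_FORMAT.size)
        if len(chunk) < TICK_FORMAT.size:
            return  # end of data (or a truncated trailing record)
        timestamp_ns, price, size = TICK_FORMAT.unpack(chunk)
        yield Tick(timestamp_ns, symbol, price, size)


def merge_feeds(*feeds: Iterator[Tick]) -> Iterator[Tick]:
    """Merge per-symbol feeds (each already time-sorted) into one time-ordered stream."""
    return heapq.merge(*feeds, key=lambda tick: tick.timestamp_ns)


def demo() -> list:
    # In-memory buffers stand in for per-symbol binary files on disk.
    eurusd, usdjpy = io.BytesIO(), io.BytesIO()
    write_ticks(eurusd, [(1, 1.2301, 100), (4, 1.2302, 50)])
    write_ticks(usdjpy, [(2, 78.41, 200), (3, 78.42, 10)])
    eurusd.seek(0)
    usdjpy.seek(0)
    merged = merge_feeds(read_ticks(eurusd, "EURUSD"), read_ticks(usdjpy, "USDJPY"))
    return [(tick.timestamp_ns, tick.symbol) for tick in merged]
```

    The merge is lazy, so only one pending record per symbol is held in memory at a time; whether that matters at the volumes discussed in this thread depends on the setup.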

    My point is that fancy graphics and an IDE with lots of libraries (which others wrote and most users don't really understand anyway, other than invoking a couple of functions they expose) are great to get one started, and it sounds like the way to get to the next level, but is it really worth the investment? Python shows off crafty libraries with access to HDF5 database files, yet my simple binary-file-based datastore runs an order of magnitude faster than that. There is a lot of marketing and hype by those who do not want to get their hands dirty coding too much, but in the end you still suffer from the same limitation as any platform that does not expose a pure programming language: lack of flexibility. Maybe you end up with a little more code in C/C++/C# than in R, Matlab, or Python for basic functions and algorithms, but once you require more flexibility you are stuck with the interpreted stuff. (Note that I am not implying that all the languages or packages I am not fond of are interpreted languages or based on interpreted languages.)

    My recommendation in one sentence: go with a real programming language if you want to be professional and take your business seriously. R is for toying around with ideas; please do not tell anyone you are running live trading strategies in R, Matlab, or Python, because people who know this specific business will laugh at you. And why would you want to completely segregate your testing platform from the implementation platform??? I flip a switch and the historical tick-based data feed passes control over to a live data aggregation and consolidation engine that feeds the very same trading strategies that I ran tests on just seconds ago. No code changes, no additional testing, no more unit testing, no nothing. Test, evaluate, run. Keep things efficient.
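
    The "same strategy code for backtest and live" idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not the poster's engine: the `Tick` tuple layout, `MomentumStrategy`, `historical_feed`, `live_feed`, and `run` are all hypothetical names, and the "live" source here is faked with a polling callable that returns `None` on shutdown.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Tick = Tuple[int, str, float]  # (timestamp, symbol, price); hypothetical layout
Signal = Tuple[int, str, str]  # (timestamp, symbol, side)


class MomentumStrategy:
    """Toy strategy: BUY when price rises versus the previous tick, SELL when it falls."""

    def __init__(self) -> None:
        self.last_price: Dict[str, float] = {}
        self.signals: List[Signal] = []

    def on_tick(self, tick: Tick) -> None:
        timestamp, symbol, price = tick
        previous = self.last_price.get(symbol)
        if previous is not None and price != previous:
            side = "BUY" if price > previous else "SELL"
            self.signals.append((timestamp, symbol, side))
        self.last_price[symbol] = price


def historical_feed(rows: Iterable[Tick]) -> Iterator[Tick]:
    """Replay stored ticks in order (stands in for a tick-file reader)."""
    yield from rows


def live_feed(poll: Callable[[], Optional[Tick]]) -> Iterator[Tick]:
    """Pull ticks from a live source until it signals shutdown by returning None."""
    while True:
        tick = poll()
        if tick is None:
            return
        yield tick


def run(strategy: MomentumStrategy, feed: Iterator[Tick]) -> List[Signal]:
    """The strategy only ever sees ticks; it does not know which feed produced them."""
    for tick in feed:
        strategy.on_tick(tick)
    return strategy.signals


rows: List[Tick] = [(1, "ES", 100.0), (2, "ES", 101.0), (3, "ES", 100.5)]

# Backtest mode: replay history.
backtest_signals = run(MomentumStrategy(), historical_feed(rows))

# "Live" mode: the same strategy class, fed by a polling source (faked here).
pending = iter(rows)
live_signals = run(MomentumStrategy(), live_feed(lambda: next(pending, None)))
```

    Because `run` depends only on an iterator of ticks, switching from backtest to live is a matter of swapping the feed, which is the property being argued for; order routing and fills are deliberately out of scope here.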

     
    #33     Jul 29, 2012
  4. Craig66

    All my heavy lifting is done in C++; R is good for certain types of rough prototyping and also for verifying numerical functions written in C++. I fully agree with the above: strategy code should be the same in both the backtesting and live environments.
     
    #34     Jul 29, 2012
  5. sle

    What exactly is "this" specific business? If you are looking at tick-level stuff, sure, go with C++. If you care about the dynamics of implied volatility or simple technical indicators in a non-latency-dependent way, why would I want to be f*cking around with memory, pointers and virtual functions? As I said, there are multi-billion dollar funds that are running fully in interpreted languages.
     
    #35     Jul 29, 2012
  6. Why? Because if you had efficiency in mind, you would not want to have to port your code from your development environment to a live trading environment. It's highly inefficient and prone to errors. By the way, there is not a single 1 billion+ AUM fund I know of that runs live trading strategies off Matlab, R, or Python. Please provide the name of such a fund, and proof that they trade off such platforms, to help everyone here expand their knowledge; at least I have never heard of one.

    P.S.: People "f*ck" around with memory, pointers, and virtual functions because they solve complex problems, problems that Python, R, or Matlab would have no answers for. Of course you can take three detours, wrap things here and there, and run C++ code within Matlab so you can declare to the world that you are running off Matlab, but it would defy all logic.

     
    #36     Jul 29, 2012
  7. sle

    Off the top of my head: in the vol-arb space, Laurion runs exclusively off R, including risk management scenario analysis and that sort of stuff; they had 2 yards last time I checked. Winstone capital runs on R (not a yard yet, but close). In fixed income/rates, Prologue (a rates fund, something along the lines of 1.75bn) runs on Matlab, and Pine River runs on Python. I am not sure what kind of "proof" you are looking for; I have personal relationships with people at these firms, and that's how I know what's going on there.

    You do realize that there is way more to quantitative trading than high frequency? A larger fraction of the alpha-generating capital in the world is deployed in areas where you do not need to (e.g. volatility arbitrage, equity long/short) or cannot (e.g. rates, credit) execute automatically. They do not need to worry about porting code, and they do not need to worry about memory management etc.

    I had to write a fair amount of C++ code to improve the efficiency of some of my models (e.g. generating historical volatility surfaces), but I still did not have to write any sort of framework around it.
     
    #37     Jul 30, 2012
  8. Well, I am glad you shifted back a gear (by clarifying that you were not referring to algorithmic execution). Yes, there may be some guys who feed live data into R for signal generation purposes, but I would ask you not to walk the slippery slope of suggesting to people unfamiliar with Matlab or R that you can easily feed either with even the S&P 500 constituents and that R, for instance, could handle it just fine. Nothing could be further from the truth. For starters, you need some heavy tweaking of memory management, because otherwise the first thing R would respond with before shutting down is "not sufficient memory". Even running 500 symbols in live mode, and only for signal generation purposes, is something R was never designed to do, and anyone doing so must have one of the worst IT department heads, project managers, and programmers on board. I can hardly think of a worse solution. I could whip up a Java or .NET app in a few hours that can display real-time data on a grid for 500 assets, something I am happy to challenge you to do in R or Matlab (heck, Matlab has a hard time even dealing with the Bloomberg adapter; it's far from being production stealth stable).

    By "proof" I meant some web links; there are numerous IT firms, hedge funds, and buy-side firms that publish papers or just inform about what can be done with the technology they are using. I guess someone who runs R for all their signal generation purposes, as you claimed, would be more than happy to at least mention so publicly. And yes, sorry, hearsay does not really count as "proof". I am happy to revise my statement that I know of no hedge funds in the billion USD+ league that use R as a front end in live trading, but unfortunately I am not convinced by you saying you have friends who work there. I have used and tried R (and still use it for specific purposes), but I can tell you with a high level of confidence that R is not a product that shines at running live data on 500 different symbols, computing indicators or metrics concurrently, and outputting such data in as close to real time as possible. But hey, challenge me; there must be one of those thousands of packages (of which nobody knows how bug-free they really are) in the R repository that does exactly that.

    Please do not get me wrong: R has its place and applicability, but unfortunately it's not in concurrency, asynchronicity, handling financial data feeds with hundreds of subscribed symbols (or do billion USD hedge funds these days only trade 20-30 stocks?), and outputting computed algorithms in a multi-threaded environment.

     
    #38     Jul 30, 2012
  9. SamGold

    I'm stupid and I have a lot of fetishes. You got me. You're a genius. A high frequency one. Also you are amazing.

    Did I already tell you that I get MATLAB, SCILAB, R, C++, C#, and many others for free? $0 cost. I don't pay for them.

    I also get my many women for free. Actually some of them pay for my software and a very select few for my hard, very hard, ware.
     
    #39     Jul 30, 2012
  10. This has turned into a pissing match. There's no argument that, for control of memory management and other reasons, C++ or some other low-level language is best. But given the original post (someone who is currently using Excel/VBA, doing on the scale of 100 trades a day, and wants to do more backtesting), I think diving into C++ is going to be a daunting project vs. using Octave, MATLAB, R, etc.
     
    #40     Jul 30, 2012