Python versus C++ Speed

Discussion in 'Trading Software' started by nononsense, Jan 29, 2006.

  1. DrChaos

    In my experience, what happens is usually that the first version gets run only a few times and there is a long cycle of iterative improvement.

    Here, an interpreted, easily programmed environment like Python is great.

    But then there comes a point when you want to do more sophisticated statistical analysis. For me, that may be large ensemble statistics, or checks based on randomization hypotheses. Lots of new statistical algorithms (e.g. Markov Chain Monte Carlo) are predicated upon repeated and rapid computations.

    "OK, that's great. Now let's see what happens over 10^6 bootstrap replications compared to the ensemble average on 10^6 synthesized time series from a GARCH model with Bayesian parameter estimates."

    And then you really need the speed. All of it, and more. The problem with switching languages is translation error: your new model isn't quite the same as the old one.
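    To make the cost concrete, here is a minimal sketch of a bootstrap replication loop in pure standard-library Python. It is not DrChaos's code; the function names (`bootstrap_means`, `percentile_interval`) and the simulated returns are assumptions for illustration only. Every replication runs through the interpreter, which is exactly the kind of loop that becomes painful as the count approaches 10^6.

```python
import random
import statistics


def bootstrap_means(returns, n_reps, seed=0):
    """Resample `returns` with replacement n_reps times.

    Returns a list holding the mean of each resample.
    (Hypothetical helper, for illustration only.)
    """
    rng = random.Random(seed)  # private generator so runs are repeatable
    n = len(returns)
    return [statistics.fmean(rng.choices(returns, k=n)) for _ in range(n_reps)]


def percentile_interval(samples, lower=0.025, upper=0.975):
    """Crude percentile interval read off the sorted bootstrap means."""
    ordered = sorted(samples)
    lo_idx = int(lower * (len(ordered) - 1))
    hi_idx = int(upper * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]


if __name__ == "__main__":
    # Simulated daily returns; made-up numbers, not market data.
    data_rng = random.Random(42)
    returns = [data_rng.gauss(0.0005, 0.01) for _ in range(250)]

    # 2,000 replications is quick; the post talks about 10^6.
    means = bootstrap_means(returns, n_reps=2000)
    print(percentile_interval(means))
```

    Run time grows linearly with `n_reps` and with the length of the series, so timing a small run gives a rough estimate of what the full-size job would cost before deciding whether a rewrite is worth the translation risk.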

    I don't think C++ is a very good numerical language for most uses, especially in early development. (I prefer Fortran 95, and yes, it's not your grandfather's Fortran.)

    C++ programs with significant use of C++ features and libraries tend to have fat tails in the difficulty of their bugs.
     
    #21     Jan 30, 2006
  2. Yes, and that is why, once you write libraries that work well and are dependable, you rarely switch. For prototyping numerical computations I still fall back on Mathematica as often as not. I can write most things in less than 50 lines of code, and that would be a big program. When things scale up, we move out to C/C++. I have not touched Fortran since about 1990. Most of the BLAS and similar libraries have been ported out to C/C++, even on vector supercomputers.
     
    #22     Jan 30, 2006
  3. DrChaos

    prt_systems: have you used Fortran 95, and I mean real 95 and not 77?

    Of course the libraries have been ported. The issue arises when you write your own code and have to debug it.

    From C++ to F95 is a move up in most areas, except objects. It's usually a win, I've found.

    I don't know about many C++ implementations now, but all modern Fortran compilers have built-in bounds checking and pointer dereferencing checks. Built-in arrays know their own size, like Matlab. And unlike in Java, you can turn the checks off.
    (A Java implementation legally cannot, because a program is permitted to trap an array out-of-bounds exception and resume execution. That also requires precise, in-order array mutation semantics, which kills performance as well.)

    There are C++ libraries which do this too, but you end up with a plethora of incompatible ones when computational library A uses matrix class MA and library B uses matrix class MB, etc.
     
    #23     Jan 30, 2006
  4. Do you find it a hassle that Fortran is stored/read as column major rather than row major?

    I have been dedicating myself entirely to C and got the impression that nearly all Fortran software development (particularly scientific) was being deprecated or translated in favor of C.
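    For readers unfamiliar with the distinction, the following is a small sketch of how the same 2-D array is laid out in memory under each convention. The helper names are hypothetical, and the indices are zero-based for simplicity (Fortran arrays are one-based by default).

```python
def row_major_offset(i, j, n_rows, n_cols):
    """Flat offset of element (i, j) in C-style (row-major) storage."""
    return i * n_cols + j


def col_major_offset(i, j, n_rows, n_cols):
    """Flat offset of element (i, j) in Fortran-style (column-major) storage."""
    return j * n_rows + i


def flatten(matrix, order):
    """Lay out a list-of-rows matrix in a flat list.

    order="C" gives row-major layout, order="F" gives column-major layout.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    offset = row_major_offset if order == "C" else col_major_offset
    flat = [None] * (n_rows * n_cols)
    for i in range(n_rows):
        for j in range(n_cols):
            flat[offset(i, j, n_rows, n_cols)] = matrix[i][j]
    return flat


matrix = [[1, 2, 3],
          [4, 5, 6]]

print(flatten(matrix, "C"))  # [1, 2, 3, 4, 5, 6]
print(flatten(matrix, "F"))  # [1, 4, 2, 5, 3, 6]
```

    The practical consequence is loop order: the innermost loop should walk the index that is contiguous in memory, which is the last index in C and the first index in Fortran.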
     
    #24     Jan 31, 2006
  5. Hi All,
    If you look at what scipy brings together, you will see that Fortran components play an important part in these math libraries. In fact these have been developed and exhaustively tested for numerical accuracy over many years.
    There are several installation options for scipy but, if I am not mistaken, you always need a Fortran compiler like gcc-fortran. Note that this is completely transparent for a Python user of scipy.

    In this sense, the heading I chose for this thread is misleading. I simply picked C++ because so much is written about it at ET.
    Running your GARCH example under Python/scipy/rpy should not penalize you significantly. (I have not tried this yet.)

     
    #25     Jan 31, 2006
  6. Hi nonon,

    I thought they removed any need for Fortran compilation. I was under the impression that fixing scipy's old, problematic build process (which required Fortran compilation) was one of the main purposes of the nearly complete rewrite.
     
    #26     Jan 31, 2006
  7. Here is an interesting comparison of benchmarks for various languages (or, more accurately, compilers/interpreters).

    http://shootout.alioth.debian.org/debian/benchmark.php?test=all&lang=python&lang2=gpp

    Unsurprisingly, code compiled with gcc is significantly faster than Python code.

    It is worth emphasising, for the benefit of non-programmers and beginners, that this may or may not have any bearing on one's choice of language.

    For a start, no program is an island. All programs use both libraries and operating system services. The libraries may or may not be written in the same language as the calling program. For many programs, the CPU time spent executing library code far outweighs the time spent in the program's own code. In that case the execution speed of the program code itself might not be relevant.
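    A small sketch of this point, using only the standard library: the same sum is computed once with an explicit Python loop and once by the built-in `sum`, which CPython implements in C. The function names and the input size are assumptions for illustration, and the timings will vary by machine, so none are quoted here.

```python
import timeit


def python_loop_sum(values):
    """Add the values one at a time in interpreted Python code."""
    total = 0
    for v in values:
        total += v
    return total


def library_sum(values):
    """Delegate the loop to the built-in sum(), implemented in C in CPython."""
    return sum(values)


values = list(range(100_000))

# Both versions must agree before the timings mean anything.
assert python_loop_sum(values) == library_sum(values)

loop_time = timeit.timeit(lambda: python_loop_sum(values), number=20)
lib_time = timeit.timeit(lambda: library_sum(values), number=20)

print(f"explicit loop: {loop_time:.4f} s")
print(f"built-in sum:  {lib_time:.4f} s")
```

    When most of a program's work looks like the second function, the speed of the calling language matters much less than the speed of the library underneath it.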

    Secondly, if a program is 'fast enough' then faster is not relevant. Modern hardware is fast, and in many, many cases choosing a programming language based on execution speed is not warranted. Much more important are speed and ease of development AND maintainability. In the real world, the ability to find staff with expertise in the chosen development environment will also be important.

    Finally, quality of design, the algorithms used, and familiarity or genuine expertise with the development environment can have an enormous impact on performance and swamp any difference that the choice of programming language may make.
     
    #27     Jan 31, 2006
  8. #28     Jan 31, 2006
  9. Yes and No.

    Yes on fast enough: I think I have said this here (over and over). The difference between a hack and a software engineer is that the engineer knows the limits of their design: they know under what conditions it breaks or saturates. Moreover, a good architect also knows how the breaking and saturation points factor into the needs of the organization in relation to its business plan and cost metrics.

    As far as staffing goes, most software development can be outsourced at very low prices. Most companies that have significant software work are simply establishing offices in locations where they can use these resources. For the small company that does not have in-house staff or resources to handle its own software needs, this is potentially a problem, since local resources will continue to become scarce. So before you venture down the road of custom software development, keep in mind how access to programmers will change over the next couple of years. If the only programmer is yourself, or you are working with a limited staff, then the easiest way to plan for the future is to use the lowest-cost tool that will get the job done, and in most cases this will be an open source solution.

    Speed and ease of development factor into the cost equation: you can take on costs in one place to get a net gain down the road.
     
    #29     Jan 31, 2006
  10. Yes. I have reviewed Fortran 95 and compatible implementations. People still use Fortran, and there are still specialized libraries that exist only in Fortran.

    However, even 15 years ago most general-purpose numerical libraries had been ported to C/C++. The situation today is that Fortran is used even less than it was. If you know it and it meets your needs, then that is great. Satellites and launch vehicles still go into space using Fortran-based computations in some cases, so it should be good enough to get your work done.

    You are correct regarding libraries that are nonstandard. Our libraries represent about 20 years of testing and work. We use them and they are not for sale. They are dependable and accurate, and we continue to improve them. Like anything else, once you have something that works you just keep using it.
     
    #30     Jan 31, 2006