Python - Read and split lines from text file into indexes.

Discussion in 'App Development' started by OTM-Options, Apr 28, 2015.

  1. This is one of the worst performance results I have ever seen. I think even VBA can do better than that. This is what happens when you let amateurs loose on a Linux environment. Goodness... well, I am happy Python did the job for you; after all, you can always brew some fresh coffee in between.

     
    #41     May 8, 2015
  2. i960


    The fact that you seriously think all of that has to do with Python cracks me up. I don't even like Python but you're just grinding a pointless axe here.
     
    #42     May 8, 2015
  3. So then where is a simple Python solution that reads text-based data and parses it into columnar arrays? Because that's exactly what the OP asked for. And it is a frequently needed operation: read in time series and parse them into columnar arrays. Let's compare performance... I am happy to whip up a quick C# solution and compare figures...

     
    Last edited: May 8, 2015
    #43     May 8, 2015
  4. i960


    WTF? If he uses the built-in csv library or the pandas library he'll have his array-of-arrays output to deal with. We're not dictating the actual lines for that because it's cookie-cutter crap that anyone who understands basic languages will know how to deal with.

    When you talk about trading do you tell people how to open and close orders as part of a trade? No. That's the same reason we don't tell people how to deal with lists or arrays - it's common knowledge and not worth pointing out.
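    For what it's worth, the "cookie-cutter" version being alluded to really is only a few lines. A minimal sketch using the standard csv module, with synthetic in-memory data standing in for a file on disk (the values are made up for illustration):

```python
import csv
import io

# Synthetic stand-in for a comma-delimited file of numeric data.
raw = "1.0,2.0\n3.0,4.0\n5.0,6.0\n"

# csv.reader yields one list of strings per row; convert each field
# to float, then zip(*rows) transposes the rows into columns.
rows = [[float(x) for x in row] for row in csv.reader(io.StringIO(raw))]
columns = [list(col) for col in zip(*rows)]
print(columns)  # [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]]
```

    Reading from an actual file is the same pattern with `open("data.csv", newline="")` in place of the StringIO.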
     
    #44     May 8, 2015
  5. C#
     Columns: 10
     Rows: 1,000,000
     Delimiter: ","
     Machine: i7-3930K (3.20 GHz), 64-bit Windows, 32 GB memory, SSD drive

    Reading in a comma-delimited csv file took an average (over 20 runs) of 5.01 seconds (1.33 seconds to read the data from disk, 3.68 seconds to parse it from string to double and arrange it in columnar arrays). Note that data parsing is involved here, so you end up with strongly typed data. I used some LINQ, which is slower than a more optimized version would be. The parsing can also be parallelized for large data sets, which I have not done here.

    P.S.: Reading in 10 million rows and parallelizing (5 threads) cuts the time to import, parse (strongly typed), and arrange into proper arrays down to 2.2 seconds total per 1 million rows. Memory consumption is extremely conservative and can be fine-tuned (which I have not done here).

    Let's compare numbers. Maybe Python will blow my mind and I will do all my text manipulations in Python going forward?
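    For anyone who wants to run the Python side of that comparison, a minimal pure-Python harness might look like the following. This is a sketch, not a tuned benchmark: the synthetic data, the 10,000-row size (scaled down from the 1M-row test above), and the column count are placeholders.

```python
import csv
import io
import time

def make_csv(n_rows, n_cols):
    # Generate synthetic comma-delimited numeric data in memory.
    return "\n".join(",".join(str(r * n_cols + c) for c in range(n_cols))
                     for r in range(n_rows))

def parse_columnar(text, n_cols):
    # Parse every field to float and arrange into one list per column,
    # i.e. the strongly typed columnar layout the C# test produced.
    columns = [[] for _ in range(n_cols)]
    for row in csv.reader(io.StringIO(text)):
        for i, field in enumerate(row):
            columns[i].append(float(field))
    return columns

data = make_csv(10_000, 10)          # small stand-in for 1M rows
start = time.perf_counter()
columns = parse_columnar(data, 10)
elapsed = time.perf_counter() - start
print(f"parsed {len(columns[0])} rows in {elapsed:.3f} s")
```

    Swap the StringIO for a real file handle and scale up the row count to reproduce the comparison at full size.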


     
    Last edited: May 8, 2015
    #45     May 8, 2015
  6. Despite your talk you would still get a failing grade. If your stats professor asked you to calculate a covariance matrix and you presented a correlation matrix, you would get a point for ink usage but not much more.

    You are the one dictating to the OP how to present HIS data. Do you do the same to your customers, internal or external?

    So, for comparison's sake, how fast is your Python implementation? Use pandas or whatever pleases you, but present it in the end in the form the OP asked for. I am curious.

    By the way, to my knowledge pandas makes very inefficient use of memory when reading a csv file. Imagine you have 10 million rows: as I understand it, pandas does not read the csv/text file line by line but all at once, and then processes it. That means the memory requirement will be twice as much as the data actually warrants. I am happy to stand corrected on this last point, but I believe that is how it works.
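    On the memory point: pandas does expose a way to stream a csv in pieces rather than loading it all at once — `read_csv` has a `chunksize` parameter that makes it return an iterator of DataFrames, so only one chunk is held in memory at a time. A minimal sketch with synthetic in-memory data (the numbers are made up for illustration):

```python
import io
import pandas as pd

# Synthetic stand-in for a large csv file: 1000 rows, 2 columns.
raw = "\n".join(f"{i},{i * 2}" for i in range(1000))

total = 0
n_chunks = 0
# With chunksize set, read_csv yields DataFrames of up to 250 rows
# each instead of materializing the whole file at once.
for chunk in pd.read_csv(io.StringIO(raw), header=None, chunksize=250):
    total += chunk[0].sum()
    n_chunks += 1

print(n_chunks, total)
```

    Whether this fully addresses the doubling concern for the default (non-chunked) path is a separate question, but streaming reads are available.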

     
    Last edited: May 8, 2015
    #46     May 8, 2015
  7. i960


    First off, if I were hell-bent on speed I'd simply write it in POSIX C, as I do with most of the stuff I'm concerned about speed-wise. Otherwise I'd write it in Perl. If I wrote it in Python I'd write it the straightforward way first, and then optimize if necessary.

    Stop being hard-headed. You're talking to someone who's been doing this shit for over 20 years.
     
    #47     May 8, 2015
  8. i960, come on, we are not talking about a few seconds' difference here and you know that. For large data sets Python will choke and get down on its knees. I did not post my results to fight over milliseconds or 1-2 seconds, but 120 seconds versus 2 seconds is a difference, no? And when using pandas you end up somewhere in the middle, but still multiple times slower than a quickly whipped-up C# version.

    Please try to get my point here: Python is incredibly slow for this kind of work and it should not be the tool of choice for it. Much less should it be the tool of choice for an algorithmic trading framework. It's simply utter nonsense. (I say this to some on this thread who vehemently attack me just because I, in a friendly way, pointed out the shortcomings in their thinking when they presented an algorithmic architecture, written in Python, on their blog.)

     
    #48     May 8, 2015
  9. jj1111


    volpunter = Troll()

    if volpunter.attribute() in [atticus, riskarb, convexx]:
        volpunter.set_ignore(True)
    else:
        volpunter.set_ignore_anyway()

    Written on my kid's LeapFrog, 30 watts, two 9V Duracells wired in parallel, 1.8 seconds. Even Python run on a Fisher-Price is fast.

    Highly efficient, elegant code that rapidly prototypes how most could save HOURS of their day by doing...
     
    #49     May 8, 2015
    i960, eusdaiki and volpunter like this.
  10. This is actually funny. Thanks for the laugh.

    What are you doing in the programming forum anyway? Don't you belong in the Excel and VBA thread? ;-)

    http://www.elitetrader.com/et/index...trade-based-stops.290372/page-20#post-4117769

     
    Last edited: May 8, 2015
    #50     May 8, 2015