High Performance/Speed CSV File Parser/Reader

Discussion in 'Automated Trading' started by CPTrader, Oct 8, 2010.

  1. There are so many CSV parsers/readers on the web that it is hard to know which ones are reliable.

    1. Can someone recommend a high-performance, fast CSV file reader/parser?

    2. How much time is required for a computer to read a CSV file containing, say, 1,000 - 2,000 records, convert the strings to floating-point or integer values, and load them into variables or an array for use in a program? I am sure this depends on the language, the computer specs, and the quality of the code; nonetheless, please give me a range of time in seconds or milliseconds. (A rough sketch of these steps appears at the end of this post.)

    3. What can be done on the hardware/OS side to increase the performance of the CSV parser code?

    Thank you.
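
    As a point of reference, a minimal sketch of those steps in C might look like the following. The file name, row limit, and column count are illustrative assumptions, and it assumes purely numeric, comma-delimited fields with no header row; for 1,000 - 2,000 records a loop like this typically finishes in well under a millisecond once the file is in the OS cache.

        /* Minimal sketch: read a small CSV of numeric fields, convert each
         * field with strtod(), and store the values in a fixed-size array.
         * File name, row limit, and column count are assumptions. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        #define MAX_ROWS   2000   /* assumption: at most ~2000 records     */
        #define MAX_FIELDS 8      /* assumption: at most 8 columns per row */

        static double values[MAX_ROWS][MAX_FIELDS];

        int main(void)
        {
            FILE *fp = fopen("data.csv", "r");   /* hypothetical file name */
            if (!fp) { perror("fopen"); return 1; }

            char line[1024];
            int rows = 0;

            while (rows < MAX_ROWS && fgets(line, sizeof line, fp)) {
                int col = 0;
                char *tok = strtok(line, ",\r\n");
                while (tok && col < MAX_FIELDS) {
                    values[rows][col++] = strtod(tok, NULL);  /* string -> double */
                    tok = strtok(NULL, ",\r\n");
                }
                rows++;
            }
            fclose(fp);

            printf("read %d rows\n", rows);
            return 0;
        }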
     
  2. Depending on the size of the file, you can execute a task like this in anywhere from a few hundred ms down to under 1 ms; it all comes down to the amount of data.

    Depending on who you execute through, there may be some good open-source stuff out there already.

    My morning CSV files are about 25 MB altogether, and they process/upload in less than 500 ms.
     

  3. Thanks, WinstonTJ, as always.

    Wow, that's fast! What are you using for parsing/reading? 25 MB in 500 ms?!

    My simple CSV file is under 100 KB at most, and I can cut it to half that size, i.e. 50 KB, if I want.

    Does your 500 ms estimate include converting the CSV values to the right data type, i.e. string to floating point or string to integer? Does it also include the time to assign the floating-point/integer values to variables for use in your ATS? If the 500 ms estimate does not include these two steps, what is the time for the full three-step process: (i) read/parse the CSV value, (ii) convert it from string to floating point or integer, (iii) assign the floating-point or integer value to variables?

    Thanks.
     

  5. I'm just loading a bunch of historical data + index weightings into a database and into memory for the application. It's a one-time daily event, not a real-time thing. It's VERY quick, well under a second, and 500 ms is a guess.
     
  6. Many thanks!
     
  7. I'm fairly certain you could write a straightforward function in C that could process this extremely quickly.

    Obviously the factors that would affect it are size (2,000 records is nothing), CPU speed, and where the file is stored (hard drive, SSD, memory, etc.). I wrote a simple program that parses tick data, reformats it, and outputs it to a file, and it does about 700k rows in a few seconds, I think (maybe 10?), and I didn't optimize it whatsoever.
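
    For what it's worth, the kind of line-by-line reformatter described above can be sketched roughly as follows; the input/output file names and the field layout (date, time, price, size) are assumptions, not the poster's actual format.

        /* Rough sketch of a line-by-line tick reformatter. The file names
         * and the "date,time,price,size" layout are illustrative only. */
        #include <stdio.h>

        int main(void)
        {
            FILE *in  = fopen("ticks.csv", "r");      /* hypothetical input  */
            FILE *out = fopen("ticks_out.csv", "w");  /* hypothetical output */
            if (!in || !out) { perror("fopen"); return 1; }

            char line[256], date[16], tm[16];
            double price;
            long size;

            while (fgets(line, sizeof line, in)) {
                /* parse four comma-separated fields; skip malformed lines */
                if (sscanf(line, "%15[^,],%15[^,],%lf,%ld",
                           date, tm, &price, &size) == 4) {
                    /* write the fields back out in a different format */
                    fprintf(out, "%s %s,%.4f,%ld\n", date, tm, price, size);
                }
            }
            fclose(in);
            fclose(out);
            return 0;
        }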
     
  8. Code it in .NET to utilize multiple cores. Use a quad-core system with at least 8 GB of RAM, the OS on one SSD, and the data on RAID 5 SSDs.
    Then you can expect to parse probably about 100k records per second or better.


     
  9. Gosh.

    First, a CSV parser is not something you can easily spread across multiple cores. Even assuming you can split things up into multiple threads (destroying sequence in the process - not always wanted, even when parsing a CSV), the overhead is possibly a lot more than what you gain.

    100k records per second is not THAT much when written efficiently. The bottleneck is the disk, and that is also something that can be handled (just use LARGE buffers to make I/O efficient; see the sketch below).

    The main question is what happens AFTER the parsing. That may slow things down significantly. Loading 100k records into a database is slow unless it is done with a bulk load (which usually starts from a file, not from items in memory).
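
    To illustrate the point about large buffers, one common approach (an illustrative sketch, not anything specific from this thread) is to read the whole file into memory in a single call and then parse the in-memory buffer; alternatively, setvbuf() can give stdio a multi-megabyte buffer. A 100 KB or even 25 MB file fits in RAM comfortably.

        /* Sketch: slurp the whole file in one read, then parse in memory.
         * The file name is a placeholder; parsing is left as a stub. */
        #include <stdio.h>
        #include <stdlib.h>

        static char *read_whole_file(const char *path, long *out_len)
        {
            FILE *fp = fopen(path, "rb");
            if (!fp) return NULL;

            fseek(fp, 0, SEEK_END);
            long len = ftell(fp);
            rewind(fp);

            char *buf = malloc(len + 1);
            if (buf && fread(buf, 1, len, fp) == (size_t)len) {
                buf[len] = '\0';   /* NUL-terminate so string parsing works */
                *out_len = len;
            } else {
                free(buf);
                buf = NULL;
            }
            fclose(fp);
            return buf;
        }

        int main(void)
        {
            long len = 0;
            char *data = read_whole_file("data.csv", &len);  /* placeholder name */
            if (!data) { perror("read_whole_file"); return 1; }

            /* ... tokenize and convert the in-memory buffer here ... */

            free(data);
            return 0;
        }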
     

  10. You can use MATLAB's csvread function. A piece of cake.
     