quick C++ STL questions..

Discussion in 'Trading Software' started by EliteInterest, Jan 26, 2006.

  1. Hi,

    Does anyone know how to use Standard C++ to efficiently convert a string to an int or float? The code below uses a template to convert an STL string to type int, but when processing a 50k-line CSV file of OHLC data (plus date, time, volume, etc.), it takes a huge amount of time. It is by far the biggest bottleneck. At first I thought it was the stream methods for loading/buffering the ASCII CSV file from the drive, or perhaps pushing back onto a nested vector, but it turns out those are rather fast, and this conversion is quite slow in comparison.
    Code:
    #include ...
    template < class T>
    bool from_string(T& t, const string& s, ios_base& (*f)(std::ios_base&))
    {
    	istringstream iss(s);
    	return !(iss >> f >> t).fail();
    }
    
    int main(...)
    {
    string field;
    int iconvert(0);
    ...
    from_string< int>(iconvert, field, std::dec);
    ...
    }
    
    I might try resorting to char[]'s and methods such as atoi() or atof() - or sscanf(), or strtod() - and compare the performance - but would much prefer to keep it Standard, modern and type safe (as the above template does).
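
A minimal sketch of one way to keep the call site type safe while avoiding a fresh istringstream for every field. The name `parse_number` and its overloads are hypothetical, not from the thread; they wrap the standard strtol() and strtod() and pick the routine from the destination type. Unlike from_string above, this version also rejects trailing junk such as "12abc".

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helpers: the overload chosen by the type of `out` decides
// which C conversion routine runs, so callers never name strtol/strtod.
inline bool parse_number(const std::string& s, long& out)
{
    const char* begin = s.c_str();
    char* end = 0;
    errno = 0;
    const long value = std::strtol(begin, &end, 10);
    // Fail on empty input, trailing junk, or overflow.
    if (end == begin || *end != '\0' || errno == ERANGE) return false;
    out = value;
    return true;
}

inline bool parse_number(const std::string& s, int& out)
{
    long wide = 0;
    // Reuse the long overload, then check that the value fits in an int.
    if (!parse_number(s, wide) || wide < INT_MIN || wide > INT_MAX) return false;
    out = static_cast<int>(wide);
    return true;
}

inline bool parse_number(const std::string& s, double& out)
{
    const char* begin = s.c_str();
    char* end = 0;
    errno = 0;
    const double value = std::strtod(begin, &end);
    if (end == begin || *end != '\0' || errno == ERANGE) return false;
    out = value;
    return true;
}
```

A call then reads `parse_number(field, iconvert)` in place of `from_string< int>(iconvert, field, std::dec)`. Whether it is fast enough for a given file is something to measure rather than assume.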

    Ideally the conversion from ASCII to numerical types, pushed into an STL container, would be a one-time process: read the data once, then store it in an additional file in some type of binary format (this is called serialization?). I am not sure where to start with this. I would like to store standard numerical types such as ints and floats in a binary file and be able to reload them later as fast as possible, since converting every string to an int on each load, with a template or any other method, is a big performance hit. I would like the program to read ints (or whatever numerical type is used) directly from a binary file, with no conversion. Perhaps someone knows of an online resource for such techniques, or can point me in the right direction? Maybe there is a great book that discusses modern techniques for this? That way the performance of the initial conversion from ASCII to numerical types becomes a non-issue (although it is still nice to understand and implement the 'right' or 'best' way to solve a problem).
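
A minimal sketch of the binary round trip described above, assuming one flat vector of doubles and a file that is read back on the same machine and compiler that wrote it. The function names and the layout (an element count followed by the raw bytes) are illustrative assumptions, not an established format.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <vector>

// Hypothetical helper: write the element count, then the raw bytes of the data.
inline bool save_doubles(const char* path, const std::vector<double>& values)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    const std::size_t count = values.size();
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    if (count != 0)
        out.write(reinterpret_cast<const char*>(&values[0]),
                  static_cast<std::streamsize>(count * sizeof(double)));
    return out.good();
}

// Hypothetical helper: read the count back, size the vector once, then fill it
// with a single read() call and no text-to-number conversion.
inline bool load_doubles(const char* path, std::vector<double>& values)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::size_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof count)) return false;
    std::vector<double> loaded(count);
    if (count != 0 &&
        !in.read(reinterpret_cast<char*>(&loaded[0]),
                 static_cast<std::streamsize>(count * sizeof(double))))
        return false;
    values.swap(loaded);
    return true;
}
```

The raw bytes depend on the platform's byte order and type sizes, so a file written this way is a local cache of the CSV rather than an exchange format. A nested vector such as the bar data would need a row length stored as well.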

    Thanks.
     
  2. Take a look at boost.org and lexical_cast,
    or just roll your own. It's pretty easy.
     
  3. You might find this a useful performance comparison. On an Athlon 2800 (Barton) with an ordinary disk it takes 1.9 seconds to do a Select * on a 90,000 row table of OHLCV data in a MySQL database. Linux 2.6.15 kernel.

    If you really want to get performance do it in C. For reading/writing binary files use memory mapped files for best performance.
     
  4. Cool, thanks. I'm getting there - trying to use strtok() and token separators to separate the data, and strtod() to convert to type double (as an example):

    Code:
    typedef vector< double> LINEOUT;
    vector< LINEOUT> bardata;
    ....
    char * npt;
    char * ept = 0;
    const char tokseps[] = " /,;:";  // stray backslash before ',' removed: it was an unrecognized escape sequence
    double convertit;
    while( fgets(linee,128,datafile) )
    {
        LINEOUT lnt;
        npt = strtok(linee,tokseps);
        //cout << npt << " ";
        //convertit = strtod(npt,&ept);
        //cout << convertit << " ";
        //lnt.push_back(convertit);
        while (npt)
        {
            npt = strtok(NULL,tokseps);
            //cout << npt << " ";
            //convertit = strtod(npt,&ept);
            //cout << convertit << " ";
            //lnt.push_back(convertit);
        }
        //cout << endl;
        //bardata.push_back(lnt);
    }
    
    I'm running into some issues with it crashing here at the end of the inner while loop, although the C book I am referring to (C Primer Plus, 5th edition) shows an example for strtok() in precisely the same way. If you see something I am missing (before I find the problem), I would appreciate the help.

    As far as your other suggestion of memory mapping - wow, lots to learn - google is so fascinating sometimes - thanks for that advice.
     
  5. strtod() is not happy after the string is empty - the inner while loop completes the last iteration before breaking, and strtod() does not want an empty string, perhaps? I'll find it. Sorry for thinking out loud.

    It was easier using C++ strings and streams, but I suspect, as you have suggested, that using C will be the key..

    Code:
    while( getline(fin,line) )
    {
        LINEOUT templine;
        while ( (pos = line.find_first_of(" .")) != string::npos)  // strip spaces and decimal points
        {
            line.erase(pos,1);
        }
        while ( (pos = line.find_first_of("/,;:")) != string::npos)  // split on the field separators
        {
            string field = line.substr(0,pos);
            line = line.substr(pos+1);
            from_string< int>(iconvert, field, std::dec);
            templine.push_back(iconvert);
        }
        from_string< int>(iconvert, line, std::dec);
        templine.push_back(iconvert);
        bardata.push_back(templine);
    }
    
    
    This only works because the CSV file has the OHLC data in a fixed XXXX.XX format, so it would store an ES value (for example) of 1300.00 as 130000. I like the other way better, but I just have to find the bug.
     
  6. A small tip. Never use strtok() for anything. Ever. It is one of the worst functions in the standard C library. Always use the reentrant strtok_r() instead.
     
  7. holy! so much faster (now that it's working) - reads the entire 50k line CSV file in 2 seconds!! it took about 58 seconds using strings and streams.

    This is probably a kludge, but it works:

    Code:
    while(npt)
    {
        npt = strtok(NULL,tokseps);
        if(npt)
        {
            convertit = strtod(npt,&ept);
        }
    }
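
The crash in the earlier version most likely came from using the pointer after strtok() had returned NULL for the final call; the extra if(npt) above avoids that. A sketch of the same loop written so the test happens before every use, including the first token, with a hypothetical helper name `parse_fields`:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical helper: split `line` in place on `seps` and convert each token.
// strtok() writes '\0' into the buffer, so `line` must be modifiable.
inline std::vector<double> parse_fields(char* line, const char* seps)
{
    std::vector<double> fields;
    // The loop condition checks the token before the body touches it.
    for (char* tok = std::strtok(line, seps); tok != NULL;
         tok = std::strtok(NULL, seps))
    {
        char* end = NULL;
        const double value = std::strtod(tok, &end);
        if (end != tok)              // skip tokens with no numeric prefix
            fields.push_back(value);
    }
    return fields;
}
```

Adding '\n' to the separator string keeps the newline that fgets() leaves at the end of each line from producing a stray token on blank lines.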
    
     
  8. Code:
    error C3861: 'strtok_r': identifier not found, even with argument-dependent lookup
    
    ..msft always has to spoil all the fun. hope i can get vc71 compiler to work with it.. and if not mistaken, it seems that vc71 is not c99 compliant (unless i am missing something).. argh, there are so many cool c99 functions i am reading about..
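
For what it's worth, strtok_r() comes from POSIX rather than C99, which is why a compiler can lack it regardless of its C99 support. A minimal sketch of a portable stand-in built only from the standard strspn() and strcspn(); the name `next_token` is hypothetical, and it is meant to mimic the strtok_r() calling convention rather than reproduce any particular library's implementation.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for strtok_r(). The resume position lives in *state
// instead of in hidden static storage, so two scans can be interleaved.
inline char* next_token(char* str, const char* seps, char** state)
{
    if (str == NULL) str = *state;               // continue a previous scan
    if (str == NULL) return NULL;                // nothing left to scan
    str += std::strspn(str, seps);               // skip leading separators
    if (*str == '\0') { *state = NULL; return NULL; }
    char* end = str + std::strcspn(str, seps);   // one past the token
    if (*end != '\0') { *end = '\0'; *state = end + 1; }
    else              { *state = NULL; }
    return str;
}
```

Usage follows the strtok() pattern with one extra argument: pass the buffer on the first call, NULL afterwards, and the address of a `char*` that belongs to that scan.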
     
  9. sorry, 5 seconds (after pushing onto vectors) - still much faster. 2 is without the pushes. 5 is still significantly better than 58..
     
  10. lnt.reserve(15);

    now it's 3 seconds.. yes, i'm pretty happy.

    (already had bardata.reserve(55000);)

    have to learn how to count the lines in the file and use that variable instead of 55000...
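
One possible way to get that count, sketched with a hypothetical helper `count_lines`: make a first pass over the file and count newline characters, then pass the result to reserve(). This assumes the cost of reading the file twice is acceptable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>

// Hypothetical helper: count '\n' characters so containers can be sized up front.
// Returns 0 if the file cannot be opened.
inline std::size_t count_lines(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return 0;
    std::istreambuf_iterator<char> first(in), last;
    return static_cast<std::size_t>(std::count(first, last, '\n'));
}
```

A last line with no trailing newline is not counted, so something like `bardata.reserve(count_lines(path) + 1)` leaves room for it.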
     
    #10     Jan 27, 2006