quick C++ STL questions..

EliteInterest · Feb 11, 2006

Quote from Spelunk:

There is no simple way of using iterators on a custom object like you are creating. A much more useful way to implement this is to organize your data into vectors of Open, High, Low, etc rather than by bars. This will be much faster, flexible and will allow you to use the standard vector functions.

Thanks Spelunk,

That's what I figured. I thought of doing it that way - but the problem is, with the custom Vector of a Vector, I can grow each internal Vector row dynamically with calculated data.. was hoping to use iterators to simplify/expedite the process of accessing/manipulating the OHLC _and_ newly calculated data.. but what you are saying makes sense - I'll try it out and see if I can make it work. It is just nice and secure having all data for a given point in time tied together in a single vector - but as you have pointed out, there are more efficient ways.

I am wondering if STL is even the answer.. eventually speed will be a concern - for now I don't have piles of data, but when tick data becomes part of the equation in the future, fast access and well-organized data structures become crucial.. of course I read on the other threads about some of the other guys using databases to hold their historical data.. still kicking around some ideas here - just want to do it right, and build on that... seems more practical than kludging it all together and having to redesign when limitations are met.. especially once interfaced with a GUI, and combined with massive amounts of tick data. Maybe using C is a better approach as some of the others here are doing - much (much!) to learn..

Spelunk · Feb 11, 2006

I recommend you stick with C++ and the STL; it will reduce the number of bugs you create in the long run. C is the land of non-dynamic arrays and objects, which is just asking for pain and trouble. And you won't see a speed difference unless you program unwisely.

STL vectors will not slow down your code. In the end, realize that accessing a vector is just like accessing any other array. It just gives you the ability to grow it with ease.

When you get a new bar of data just push each value on a separate vector, not really any different. The real beauty however comes when making calculations on your vectors. Take a simple calculation of a moving average, the input to that function can be any vector of floats with the output being another vector of floats. In your bar type case, you need specialized code for every component.
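As a rough illustration of the point above, here is a minimal sketch of a moving average that takes any vector of floats and returns another vector of floats. The function name `simple_moving_average` and the choice to emit one value per full window are assumptions made for this example, not code from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simple moving average over any series of floats: open, high, low, close,
// or the output of another calculation. Returns one value per full window,
// so the result has in.size() - period + 1 elements (empty if the input is
// shorter than the period).
std::vector<float> simple_moving_average(const std::vector<float>& in,
                                         std::size_t period)
{
    assert(period > 0);
    std::vector<float> out;
    if (in.size() < period) return out;
    out.reserve(in.size() - period + 1);

    float sum = 0.0f;
    for (std::size_t i = 0; i < in.size(); ++i) {
        sum += in[i];
        if (i >= period) sum -= in[i - period];  // drop the value leaving the window
        if (i + 1 >= period) out.push_back(sum / static_cast<float>(period));
    }
    return out;
}
```

The same function works unchanged on a vector of closes, highs, or a previously calculated series, which is the advantage of keeping each field in its own vector.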

Good Luck.

rufus_4000 · Feb 11, 2006

Quote from EliteInterest:

Maybe using C is a better approach as some of the others here are doing - much (much!) to learn..

If I were you, I would stick with C++. I chose C for more or less idiosyncratic reasons, and I ended up writing a lot of my base library in an OO design anyway (just using structures and function pointers).

While there is a speed difference between C and C++, heck, there is even a slight speed difference between just using memset and iterating through the array to initialize it (run it under something that reads the instruction counter, say WMI, sometime). Also, standard tricks like aligning data structures to pages, sizing structures to ensure cache hits, and register variables can all come into play in the land of almost "extreme" performance tuning (take a class in high performance graphics and rendering to see how far the envelope can be pushed).

But the speed difference won't matter that much on the scale of milliseconds, and anybody who absolutely requires that type of speed tuning *must* know precisely what they are doing.

EliteInterest · Feb 11, 2006

Thanks again. I do see the convenience of using vectors over C - which is why I am experimenting with them now - simple to implement, less prone to bugs, etc. Let's say I have four vectors - one each for OHLC.. they grow dynamically as the data grows - fine. Now where are you pushing your moving average to? The way I was doing it, I just had to push onto the vector of each row of data and have a numerical reference to the position in that row for further use.. the access/manipulation takes place not only on the OHLC data, but on the calculated data.. I understand that you can declare any desired number of vectors in advance, but what if the program uses some unknown, dynamic number of calculations? Maybe I want 50 moving averages, or whatever, at runtime, the number of which being accessible as a parameter from the GUI (when I get there).. Or maybe I am missing something obvious, where the program can declare a dynamic number of vectors at runtime.. Pardon my inexperience, still learning as I go.. I'll think about it some more, surely there's an easy solution. I am just trying to make it as flexible as possible, for expansion/reusability purposes. If I can code it right from the beginning, it will be much easier to build on later.
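One possible answer to the "dynamic number of vectors at runtime" question is to keep the vectors themselves inside a container, so that new calculated series can be added by name while the program runs. The sketch below assumes a `std::map` keyed by a string; the names `SeriesTable`, `trailing_mean`, `add_moving_averages` and the `"ma_N"` key scheme are invented for this example. Each calculated series is kept the same length as its input, so one bar index still refers to the same point in time in every series.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::vector<double> Series;

// All series for one instrument, keyed by name. New calculated series can be
// added at runtime without declaring a variable for each one in advance.
struct SeriesTable {
    std::map<std::string, Series> columns;

    // Returns the named series, creating an empty one if it does not exist.
    Series& column(const std::string& name) { return columns[name]; }
    bool has(const std::string& name) const { return columns.count(name) != 0; }
};

// Mean of the last `period` values, aligned bar-for-bar with the input.
// The first period-1 bars average over however many values exist so far.
Series trailing_mean(const Series& in, std::size_t period)
{
    assert(period > 0);
    Series out(in.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        sum += in[i];
        if (i >= period) sum -= in[i - period];
        const std::size_t n = (i + 1 < period) ? i + 1 : period;
        out[i] = sum / static_cast<double>(n);
    }
    return out;
}

// Adds one moving-average series per requested period. The list of periods
// can come from anywhere at runtime, for example a GUI setting.
void add_moving_averages(SeriesTable& table, const std::string& source,
                         const std::vector<std::size_t>& periods)
{
    const Series input = table.column(source);  // copy of the source series
    for (std::size_t k = 0; k < periods.size(); ++k) {
        table.column("ma_" + std::to_string(periods[k])) =
            trailing_mean(input, periods[k]);
    }
}
```

Asking for 50 moving averages is then just a matter of passing 50 periods; nothing has to be declared in advance.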

As far as your advice on C vs. C++ STL - certainly good advice. It would be interesting to know how much performance difference there is using STL containers vs. custom solution - maybe not much at all, as you suggest.

On a somewhat related note, I have rechecked my little code project on Linux again - I think I am underestimating exactly how much FASTER the Linux version is over the M$... I will have to learn more about coding on linux, in order to use precise timestamping to get an objective comparison, but it might be a full order of magnitude or more! There has to be something wrong with how I have VC71 set up, it cannot suck that much in comparison. (or maybe it can?) As Nono suggested earlier in the thread, it would be interesting to see how alternate compilers stack up on Windows, vs. the stock VC71 compiler. Maybe my speed issues are ultimately not a concern with C++ STL vectors, if I stick with Linux or some other compiler that is fast.. We'll see....

Spelunk · Feb 11, 2006

You need to envision a hierarchy of where things go. This is going to take you some time, and there is nothing like trial and error.

But imagine you have a chart object. On that chart object you can display indicator objects. Each indicator has a set of inputs which consist of vectors of data and fixed variable inputs. It also has outputs which can be plotted on your display.

What I'm getting at is that an indicator like your moving average has nothing to do with your security's OHLC bar data, except that the security may be used as input to the indicator. Once you see these as separate objects it will make it simpler to grow things.
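The separation described above could be sketched roughly as follows. This is only one way to lay it out: the class names `Indicator`, `Momentum` and `Chart` are hypothetical, and a simple momentum calculation stands in for whatever indicators the program actually needs. It uses C++11 features (`std::unique_ptr`, `override`) that postdate the compilers discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

typedef std::vector<double> Series;

// An indicator turns an input series into an output series. It knows nothing
// about where the input came from (a security's closes, another indicator).
class Indicator {
public:
    virtual ~Indicator() {}
    virtual Series compute(const Series& input) const = 0;
};

// Example indicator with one fixed parameter: change versus `lag` bars ago.
// Bars without enough history are left at 0.
class Momentum : public Indicator {
public:
    explicit Momentum(std::size_t lag) : lag_(lag) {}
    Series compute(const Series& input) const override {
        Series out(input.size(), 0.0);
        for (std::size_t i = lag_; i < input.size(); ++i)
            out[i] = input[i] - input[i - lag_];
        return out;
    }
private:
    std::size_t lag_;
};

// A chart owns any number of indicators and the outputs they produce.
// Drawing code would read `outputs`; it is deliberately not part of this sketch.
struct Chart {
    std::vector<std::unique_ptr<Indicator>> indicators;
    std::vector<Series> outputs;

    void recalculate(const Series& input) {
        outputs.clear();
        for (const auto& ind : indicators) {
            assert(ind != nullptr);
            outputs.push_back(ind->compute(input));
        }
    }
};
```

Because the chart only holds pointers to the base class, indicators can be added or removed at runtime without touching the bar data.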

I'm a professional programmer and have been doing Windows since '91. While I have no love for MS, they do create a standardized interface that works well. MS compilers and development systems are well known for generating quick code and for ease of use. I don't know where the speed difference you are seeing comes from, but it's usually due to the program design. In the end the OS doesn't make much difference; it is only used to display your app. Most of the code is generic calculations, which should be the same on any OS. From this you can also see that one of the principles is to separate your display code from the rest. Look up things like the MVC pattern. This may be overkill for your program and your level of programming, but it might make more sense once your program turns into a monster.

EliteInterest · Feb 11, 2006

Quote from Spelunk:

MS compilers and development systems are well known for generating quick code and for ease of use. I don't know where the speed difference came up that you are seeing but it's usually due to the program design. In the end the OS doesn't make much difference it is only used to display your app. Most of the code is generic calculations which should be the same on any OS.

You are probably right, but the code is exactly the same - in fact I took the code that was modified (very slightly) to work under KDevelop/g++, and put that back into VS2003 (compiled fine with not a single change, of course) - still much slower than the gnu version - that is what I based the last comparison of ~10x speed difference upon. I will have to investigate further to see why it is so much slower on VS - there has to be a way to speed it up under Windows.

I can appreciate the last bit of advice from Rufus regarding optimization, but I feel that improving code that originally took about a minute to simply read from a text file and push the data to the container, to 3 seconds by using a bit of C, and finally discovering that the improved code does its job in a small fraction of a second under Linux/g++, is a worthwhile pursuit. The program has gone from completely unacceptable, to useful, and now finally to being very fast, just from changing some code and trying a different platform... I'm not trying to overoptimize, but rather make it acceptable enough for now and the future. Now I will try to focus on growing the beast, since I am pretty stoked where it stands now on Linux.

I will think some more about your advice on structuring data, and run some experiments.

Thanks to all for all of your help so far.

ktmexc20 · Feb 11, 2006

Not to mention that nearly any software you might need is available for Linux. It is OpenSource and free as in beer. Stick with Linux. I haven't been disappointed running pure OpenSource for a year now.

Just to inform and not to flame,
kt

btw, I'd be interested to hear of any tests with the Intel compiler if that's your arch.

nitro · Feb 11, 2006

Quote from Spelunk:

There is no simple way of using iterators on a custom object like you are creating. A much more useful way to implement this is to organize your data into vectors of Open, High, Low, etc rather than by bars. This will be much faster, flexible and will allow you to use the standard vector functions.

You can do it.

To the original poster - look at

std::slice

It is not pretty, but it allows you to do what you want to do.
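For reference, `std::slice` lives in `<valarray>` and is used to index a `std::valarray`, not a `std::vector`. A minimal sketch, assuming the bars are stored back to back as open, high, low, close in one flat `std::valarray<double>` (that layout, the `Field` enum and the `field_of` helper are assumptions made for this example):

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

const std::size_t kFieldsPerBar = 4;  // open, high, low, close
enum Field { Open = 0, High = 1, Low = 2, Close = 3 };

// Pulls one field out of bar-by-bar storage: start at the field's offset,
// take one element per bar, and step over a whole bar each time.
std::valarray<double> field_of(const std::valarray<double>& bars, Field f)
{
    assert(bars.size() % kFieldsPerBar == 0);
    const std::size_t bar_count = bars.size() / kFieldsPerBar;
    return bars[std::slice(f, bar_count, kFieldsPerBar)];
}
```

This keeps each bar's values together in memory while still giving column-style access, at the cost of working with `valarray` rather than the usual STL containers and iterators.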

Imo, you should look into

http://www.oonumerics.org/blitz/

instead of trying to do it with STL using slice.

nitro

EliteInterest · Feb 12, 2006

Thanks, Nitro. I have added it to my list.

Installed Novell Linux Desktop 9 - has different quirks compared to RedHat - almost wish I could combine the best of RHEL and NLD..

The only source code difference for this test was that the Windows versions use ctime, clock(), CLOCKS_PER_SEC in my timelog function. The POSIX-style sys/time.h, gettimeofday() apparently will not work in Visual Studio without manually tracking down and adding sys/time.h to the include library - not inclined to do so for now... Anyway, as a matter of fact, I tried the Linux version with both styles of timelog function, and the time difference was negligible.
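The timelog function itself is not shown in the thread, so the following is only a guess at its general shape, built on the `clock()` / `CLOCKS_PER_SEC` approach named above. The names `format_timelog` and `ClockTimer` are hypothetical. One caveat when comparing platforms this way: the C standard defines `clock()` as processor time used by the program, while Microsoft's runtime documents it as elapsed wall-clock time, so the two are not strictly measuring the same quantity.

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>
#include <string>

// Formats one log line in the same shape as the results posted in this thread.
std::string format_timelog(double seconds, const std::string& label)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.3f", seconds);
    return "It took: " + std::string(buf) + " seconds for this part: " + label;
}

// Stopwatch on top of std::clock(). Each call to lap() returns the seconds
// counted since the previous lap (or since construction) and restarts.
class ClockTimer {
public:
    ClockTimer() : last_(std::clock()) {}

    double lap() {
        const std::clock_t now = std::clock();
        assert(now != static_cast<std::clock_t>(-1));  // -1 means clock() is unavailable
        const double seconds =
            static_cast<double>(now - last_) / CLOCKS_PER_SEC;
        last_ = now;
        return seconds;
    }

private:
    std::clock_t last_;
};
```

A typical use would be to construct one `ClockTimer` at program start and print `format_timelog(timer.lap(), "Data pushed to Vector, file closed")` after each stage.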

Forgot to write down the compiler version on Redhat. Anyway, here are the results:
Code:
RHEL WS v4 x86_64:
It took: 0.0000629 seconds for this part: Program has been started
It took: 0.2767849 seconds for this part: Data pushed to Vector, file closed
It took: 0.1068301 seconds for this part: Add'l data calc's completed..
It took: 1.4785469 seconds for this part: Trades finished..


NLD9 x86_64, Kdevelop 3.0.3-4.14, kde 3.2.1, gcc 3.3.3-43.41:

It took: 0.0000789 seconds for this part: Program has been started
It took: 0.2372341 seconds for this part: Data pushed to Vector, file closed
It took: 0.1067100 seconds for this part: Add'l data calc's completed..
It took: 1.2746360 seconds for this part: Trades finished..


Windows XP Pro, VS2003:

It took: 0.000 seconds for this part: Program has been started
It took: 1.765 seconds for this part: Data pushed to Vector, file closed
It took: 1.453 seconds for this part: Add'l data calc's completed..
It took: 15.641 seconds for this part: Trades finished..


Windows XP Pro x64, VS2005Express:

It took: 0.000 seconds for this part: Program has been started
It took: 3.828 seconds for this part: Data pushed to Vector, file closed
It took: 4.218 seconds for this part: Add'l data calc's completed..
It took: 33.641 seconds for this part: Trades finished..
Will try to investigate further. I find it hard to believe that the difference is supposed to be so drastic. I forgot to try VS2005Express on XP Pro... not too important anyway.. more interested in the Win vs. Linux issue.

EliteInterest · Feb 15, 2006

This is interesting.

Installed Bloodshed Dev-C++ 5.0 beta on WinXP x64.. the results leave both Visual Studio versions in the dust - very comparable to execution under Linux using native GCC. I tried SUSE and the performance is almost identical (although the Linux versions in general are noticeably faster for the file read/vector push section).. got the best results by executing the version compiled under RHEL WS v4 on SUSE - but these results on WinXP x64 / MinGW are pretty close (actually the big iterative Trade loop runs fastest here, by a small margin - was getting about 1.07-1.11 seconds under SUSE)..

Of course, this shows that:

1) I have installed/configured Visual 2003 and Visual 2005 Express incorrectly..

-or-

2) The Visual compilers stink, and I should consider trying another if I continue to work with the Visual IDE. Intel, MinGW, not sure what else to try..
___

I wouldn't doubt if there is something that can be done to *significantly* speed up the program execution using the Visual compilers.. maybe someone knows? They are installed 'stock' - no mods or options changed. Again - the source is exactly the same as the test using Dev-C++/MinGW.
Code:
Windows XP Pro x64, Dev-C++ 4.9.9.2 (beta) MinGW/GCC 3.4.2


It took: 0.000 seconds for this part: Program has been started
It took: 0.734 seconds for this part: Data pushed to Vector, file closed
It took: 0.110 seconds for this part: Add'l data calc's completed..
It took: 1.046 seconds for this part: Trades finished..
I am pretty sure the new version 4 of the Qt Open-Source Windows edition installs MinGW by default, so maybe that is the direction I will go in now.. this way I can go back and forth between SUSE and WinXP, as desired..