Contractors who suck

ScoobyStoo · Jun 23, 2009

Quote from lolatency:

This I semi-don't agree with.

If you're writing code to deal directly with the exchanges, you should use C/C++ and you should do it correctly. This is where there is real trading edge, because too many participants on Wall St. don't understand what makes this piece of the puzzle really work.

Even if your entire infrastructure is based on some high level language, the part that is the work-horse should be engineered the most carefully in the language that gives the most bang for the buck. The beauty of most languages is that they have bindings that make it easy to create interfaces to something written in C or C++, or you can just open a pipe/socket and use your carefully crafted engine.

OTOH, you may be right, as hiring people who know how to do this is very hard. The best people for this are in Silicon Valley, not Wall St.

But why? As someone who has written apps in C/C++/C#/Java, I simply don't see the benefit of writing your apps in a low-level language such as C/C++ unless you are exploiting HFT inefficiencies. At these timescales network latency issues become your biggest concern.

Given the greater productivity inherent in using a high-level language, I just don't see a valid business case for C/C++ in the vast majority of use cases.

lolatency · Jun 23, 2009

Even in the general business case, people underestimate the impact of a malloc() or a new in their code, or the cost of a constructor. In the IBank world I've seen people lose milliseconds because they assume 'new' is cheap. It isn't, and neither is converting floats to text with calls like sprintf(). These operations can add up to the point where network latency is no longer the biggest concern. Sometimes HLL programmers will abuse copy constructors, or go overboard with classes like std::string. People also tend to abuse threads, ignoring context switch overhead.

You may be lucky in that you are in an environment where programmers are reasonable, but I tend to believe the college grad coming out of school now has no idea what it takes to facilitate a 'new'. In Java, 'new' is so natural to programmers they don't even stop to think what's happening.

... And, if you look very, very closely at latency, you would be surprised how much can be knocked out in just code before network latency becomes an issue.

propseeker · Jun 23, 2009

lolatency wrote:
it makes zero sense to sit around writing a parser in C++ for tick data.

you changed your problem scope as a counter-argument to my reply. hard to have a discussion when the basic tenets change, but i think the underlying theme that you're consistently trying to get across, is that IF you work in c++, it must be by definition _low level_ or else what's the point. point taken, i just disagree.

you don't need to run getline's and searches and matches on an fstream to parse a tick file. you just don't. you don't need to regex or lex/yacc either. i don't know why you keep implying things have to be so complex. really, the bar is pretty low to match a simple few line python code parse... which _i thought_ you were using as the baseline comparison. also, in any modern vs ide debugging any weirdness in the tick file for the below is pretty trivial:

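#include <fstream>
using namespace std;

int main() {
    ifstream tickfile("ticks.txt");
    int date, time, tradesize;
    double tradeprice;

    // assumes whitespace-separated fields; >> skips spaces and newlines,
    // so each pass of the loop reads one complete record
    while (tickfile >> date >> time >> tradesize >> tradeprice) {
        // use the parsed fields here
    }
}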

to me, that's not that much more complicated than some simple python code and probably equal or better in speed. things are as simple or as complex as you make them and in my opinion, things are the most simple in your area of expertise. mine happens to be c++, it sounds like yours is primarily python. i wouldn't pay $70/hr to a python programmer to try and figure out how to run a simple parse in c++ and output to an MFC window, just like i wouldn't pay that same $ to a c++ programmer to try and run a python console and ditz around with a wxwidget. and if either of those guys, working in their area of expertise, churns out overly complex code for simple tasks, then neither should have a job for very long.

lolatency · Jun 23, 2009

My bad. To be fair, there was no way you could have read my mind. Still, I'd like to show you with a practical example that your approach doesn't scale. I'll use an example that isn't all that uncommon.

The code you have listed above is not really what I had in mind. Consider a simple alteration to your case, where the data consists of various FIX tags with their delimiters and such. E.g.:

8=FIX.4.2 | 9=67 | 35=8 | 49=PHLX | 56=PERS | 11=ATOMNOCCC9990900 | 52=20071123-05:30:00.000 | 20=3 | 150=E | 39=E | 55=MSFT | 167=CS | 54=1 | 38=15 | 40=2 | 44=15 | 58=PHLX EQUITY TESTING | 59=0 | 47=C | 32=0 | 31=0 | 151=15 | 14=0 | 6=0 | 10=102

(from wikipedia, not my own trading.)

In Python, for a given file, say 'test.txt', with lots of lines like the above, to extract the tag and value:
Code:
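with open('test.txt') as reader:
    for line in reader:
        for f in line.strip().split('|'):
            if '=' not in f:
                continue  # skip blank lines and malformed fields
            # split once, in case a value contains '='
            tag, value = f.split('=', 1)
            print("%s is %s" % (tag.strip(), value.strip()))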
Come back to me with something in C++ faster (to implement, not to execute) than that. It is not unreasonable to expect data in FIX like that. Your example starts to fail the simplicity aspect as soon as the slightest bit of complexity is added.
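
For anything beyond printing, the same split-based idea can collect each message into a dict keyed by tag. This is only a minimal sketch, assuming pipe-delimited fields with optional padding as in the sample above. The helper name parse_fix_line is made up for illustration, and it does no body-length or checksum validation:

```python
def parse_fix_line(line, field_sep="|", kv_sep="="):
    """Split one delimited FIX-style message into a {tag: value} dict.

    Assumes each tag appears at most once per message (repeating groups
    would need a list of pairs instead) and does not verify tags 9 or 10.
    """
    message = {}
    for field in line.strip().split(field_sep):
        if kv_sep not in field:
            continue  # skip empty or malformed fields
        tag, value = field.split(kv_sep, 1)  # split once: values may contain '='
        message[tag.strip()] = value.strip()
    return message


sample = "8=FIX.4.2 | 35=8 | 55=MSFT | 58=PHLX EQUITY TESTING | 10=102"
msg = parse_fix_line(sample)
print(msg["55"], msg["58"])
```

Looking fields up by tag (msg["55"] for the symbol) is usually what you want once the messages have to be matched against something else.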

propseeker · Jun 23, 2009

if i had to parse tick data that looked like that i'd be bitching too! i DO have to parse real-time msgs that look like that, but that's done with a honed parser and then saved to disk in simplified proprietary formats. the internal formats allow for as complex or as simple of a parser as needed later on. controlling the data formats internally, imo, is where parsing should start.

but, since it seems i'm getting suckered into a general python vs c++ parsing debate... i won't disagree with you, python is great out of the box. i have my own c-style parser that gets called for complex stuff which is simple and is very fast, but that probably doesn't count. if i did have to use something out of the box in c++ though, i'm pretty sure boost::split would do the trick:
Code:
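#include <boost/algorithm/string.hpp>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main() {
  string line;
  ifstream tickfile("ticks.txt");
  vector<string> fields, values;
  // getline keeps each message whole; >> would stop at the first space
  while (getline(tickfile, line)) {
    boost::split(fields, line, boost::is_any_of("|"));
    for (size_t i = 0; i < fields.size(); i++) {
      boost::split(values, fields[i], boost::is_any_of("="));
      if (values.size() < 2) continue;  // skip fields with no '='
      cout << values[0] << " is " << values[1] << "\n";
    }
  }
}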
i would say, implementation-wise for a c++ coder, this would be roughly as fast to implement as your python code for a python coder. not significant enough to spill any tears over. there would be puddles though if you ever allowed them to pull out their profilers.

lolatency · Jun 24, 2009

Boost is cheating, but I think you know what you are doing so we'll just silently acknowledge each other's points. ;-)

lolatency · Jun 24, 2009

Quote from propseeker:

if i had to parse tick data that looked like that i'd be bitching too! i DO have to parse real-time msgs that look like that, but that's done with a honed parser and then saved to disk in simplified proprietary formats. the internal formats allow for as complex or as simple of a parser as needed later on. controlling the data formats internally, imo, is where parsing should start.

This is why I started the thread -- I can't control the way the text files are. They're created for us by a third party, and they're wrong.

The vendor has some loose system where tick-type events are logged, and those ticks are more or less marked so you can track responses to them. So, in essence, I need to take the tick data, parse it, and then marry it together with the other elements of the system so we can follow and rewind market events and see what happened.
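
Roughly, the marrying step could look like the sketch below. It is a guess at the shape of the problem, not the vendor's format. The record layout (an id, a timestamp, a payload) and the name build_timeline are hypothetical, and it assumes every response carries the marker of the tick it answers:

```python
from collections import defaultdict


def build_timeline(ticks, responses):
    """Attach each response to the tick it refers to, ordered by tick time.

    ticks:     iterable of (tick_id, timestamp, payload)
    responses: iterable of (tick_id, timestamp, payload)
    Both record layouts are hypothetical stand-ins for the real log format.
    """
    by_tick = defaultdict(list)
    for tick_id, ts, payload in responses:
        by_tick[tick_id].append((ts, payload))

    timeline = []
    for tick_id, ts, payload in sorted(ticks, key=lambda t: t[1]):
        # Sort responses by their own timestamp so replay order is stable
        timeline.append((ts, payload, sorted(by_tick.get(tick_id, []))))
    return timeline


ticks = [("t2", 2.0, "MSFT 15.01"), ("t1", 1.0, "MSFT 15.00")]
responses = [("t1", 1.2, "order sent"), ("t1", 1.5, "fill"), ("t9", 3.0, "orphan")]
timeline = build_timeline(ticks, responses)
for ts, tick, reactions in timeline:
    print(ts, tick, reactions)
```

Responses whose marker matches no tick (the "t9" record here) are silently dropped in this sketch. With logs that are known to be wrong, those are probably the records worth reporting rather than discarding.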

This goes back to the point of contractors and the wealthy hedgies who hire them. I know what I'm talking about, damnit, and these guys suck.