Programming Journal

Discussion in 'Journals' started by honoruru, Dec 16, 2008.

  1. honoruru

    Ok, here I'll keep a record of some dabbling in code, using Qt/C++. The code will be sloppy and inefficient and the variable names will be random, but the goal is to create an easy environment to carry out tests, not to code a masterpiece.

    To start, there will be minimal compression to store tick data over a period of less than 10 years. The data starts in 2002 and runs to the present, so timestamps can be stored as seconds since Jan 1, 2002, 00:00:00 instead of since the same date in 1970.

    Let's store date/time data with 1 second resolution in an unsigned 32 bit int. Nice and easy. The price data is stored as integers scaled by 100, so the tick size is 100*0.25=25; I will divide each price by 25 so that it fits into an unsigned 16 bit int. I'll use an unsigned 16 bit int for quantity as well. For now, this will suffice. The data is in SQL with table names such as es0712 and we'll create files called, for example, es0712.dat.
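
    A minimal sketch of this record layout, written in plain C++ rather than Qt so it stands alone. The names TickRecord, kEpoch2002, kTickSize, packTick and unpackPriceTimes100 are made up for illustration, and the 2002 reference is assumed to be in UTC (an offset taken from a local-time QDateTime would differ by the timezone).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; not part of the journal's code.
// Seconds from Jan 1, 1970 to Jan 1, 2002, 00:00:00, assuming UTC.
constexpr std::uint32_t kEpoch2002 = 1009843200u;
// Prices are assumed to arrive as integers scaled by 100, so 0.25 -> 25.
constexpr std::uint32_t kTickSize = 25;

struct TickRecord {
    std::uint32_t seconds;   // seconds since Jan 1, 2002
    std::uint16_t price;     // price in units of one tick
    std::uint16_t quantity;  // contracts traded
};

static_assert(sizeof(TickRecord) == 8, "one tick should take 8 bytes");

// Pack one tick. unixSeconds must not be earlier than the 2002 reference.
inline TickRecord packTick(std::uint32_t unixSeconds,
                           std::uint32_t priceTimes100,
                           std::uint16_t quantity) {
    assert(unixSeconds >= kEpoch2002);
    assert(priceTimes100 / kTickSize <= 0xFFFFu);  // must fit in 16 bits
    return TickRecord{unixSeconds - kEpoch2002,
                      static_cast<std::uint16_t>(priceTimes100 / kTickSize),
                      quantity};
}

// Recover the price scaled by 100 from a stored record.
inline std::uint32_t unpackPriceTimes100(const TickRecord& r) {
    return static_cast<std::uint32_t>(r.price) * kTickSize;
}
```

    For example, a trade at 1500.25 (150025 in the scaled form) is stored as 6001 ticks, and multiplying by 25 gives back 150025.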

    Serialization will happen via QDataStream. I'll further look into QFile::map() for easy memory mapping. For now, I'll just stick to reading off the drive since it is easy to do.

    The data lives in SQL, so here's the code I am using to get a table with fields (datetime, askbid, price, quantity). I'll ignore ask/bid and create .dat files with the name of the contract and the delivery. Again, this is nothing fancy - just enough to get ES and NQ into data files for now.

    Code:
    QSqlDatabase dbTick = Database::connectTick();
    // Times are stored as seconds relative to Jan 1, 2002, 00:00:00.
    QDateTime reference = QDateTime(QDate(2002, 1, 1), QTime(0, 0, 0));
    quint32 datetimeoffset = reference.toTime_t();
    const quint16 ticksize = 25;   //es/nq tick = .25
    for (int esnq = 1; esnq < 3; ++esnq) {
    	QString esnqstring;
    	if (esnq == 1) esnqstring = QString("es"); else esnqstring = QString("nq");
    	for (int i = 1; i < 5; ++i) {   // delivery months 03, 06, 09, 12
    		for (int j = 2; j < 9; ++j) {   // years 02 through 08
    			QString month;
    			int ii = i * 3;
    			if (ii > 9) month = QString::number(ii);
    			else month = QString("0") + QString::number(ii);
    			// Table name, e.g. es0712; the output file is contracts/es0712.dat.
    			QString contract = esnqstring + QString("0") + QString::number(j) + month;
    			QString filenew = QString("contracts/") + contract + QString(".dat");
    			QFile file(filenew);
    			// Binary output, so no QIODevice::Text: text mode can
    			// translate end-of-line bytes on some platforms.
    			if (!file.open(QIODevice::WriteOnly))
    				return;
    			QDataStream out(&file);
    			QString abc = QString("SELECT datetime, askbid, price, quantity FROM ")
    				+ contract;
    			QSqlQuery query(abc);
    			while (query.next()) {
    				// Skip ask ("A") and bid ("B") rows; keep everything else.
    				QString askbid = query.value(1).toString();
    				if (askbid != QString("A") && askbid != QString("B")) {
    					QString thedatetime = query.value(0).toString();
    					QDateTime origdate = QDateTime(
    						QDate(thedatetime.mid(0,4).toInt(),
    							thedatetime.mid(5,2).toInt(),
    							thedatetime.mid(8,2).toInt()),
    						QTime(thedatetime.mid(11,2).toInt(),
    							thedatetime.mid(14,2).toInt(),
    							thedatetime.mid(17,2).toInt()));
    					// 8 bytes per tick: 32 bit time, 16 bit price, 16 bit quantity.
    					out << quint32(origdate.toTime_t() - datetimeoffset)
    						<< quint16(query.value(2).toInt() / ticksize)
    						<< quint16(query.value(3).toInt());
    				}
    			}
    		}
    	}
    }
    
    File indexing for the tick data will be based on the position in the file, since every record has the same length. I've stored about 150 million ES ticks in about a gig - not great but certainly not terrible.
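
    A rough sketch of position-based indexing over fixed 8-byte records, in plain C++ so it runs without Qt. The helper names (tickByteOffset, tickCount, readTick) are hypothetical. The decoder assumes the bytes were written by QDataStream with its default big-endian byte order and no changes to the stream settings.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper names for illustration.
constexpr std::uint64_t kTickBytes = 8;  // 4 (time) + 2 (price) + 2 (quantity)

struct Tick {
    std::uint32_t seconds;
    std::uint16_t price;
    std::uint16_t quantity;
};

// Byte position of the tick with the given zero-based index.
constexpr std::uint64_t tickByteOffset(std::uint64_t tickIndex) {
    return tickIndex * kTickBytes;
}

// Number of whole ticks in a file of the given size.
constexpr std::uint64_t tickCount(std::uint64_t fileBytes) {
    return fileBytes / kTickBytes;
}

// 150 million ticks at 8 bytes each come to 1.2e9 bytes.
static_assert(tickByteOffset(150000000ull) == 1200000000ull, "size check");

// Decode the tick at tickIndex from raw file bytes, assembling each
// field most significant byte first (big-endian).
inline Tick readTick(const std::vector<std::uint8_t>& bytes,
                     std::uint64_t tickIndex) {
    assert(tickIndex < tickCount(bytes.size()));
    const std::uint8_t* p = bytes.data() + tickByteOffset(tickIndex);
    Tick t;
    t.seconds = (static_cast<std::uint32_t>(p[0]) << 24) |
                (static_cast<std::uint32_t>(p[1]) << 16) |
                (static_cast<std::uint32_t>(p[2]) << 8) |
                static_cast<std::uint32_t>(p[3]);
    t.price = static_cast<std::uint16_t>((p[4] << 8) | p[5]);
    t.quantity = static_cast<std::uint16_t>((p[6] << 8) | p[7]);
    return t;
}
```

    The same offset arithmetic would apply to a memory-mapped file: tick n simply starts n*8 bytes from the beginning of the mapping.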

    Next will be time, tick, and volume bars, and we'll tie them to the indexing of the tick data with a simple map for time and volume bars (tick bars will correlate based on simple multipliers). I'll try to come up with an interesting list of intervals.

    After that, it will be time to create some simple classes to place orders and track performance. Nothing too fancy - just track the trades and equity and write them to another log file along with the source code.

    Then will come the fun part. Playing with the data and running various tests will be exhilarating. I have no idea what to expect, but even if finding trends and edges proves to be unsuccessful, it will have been a barrel of monkeys.

    I'll try to post the steps as I find time to do them.
     
  2. honoruru

    Ok,

    I've created a routine for tick bars.

    The tick data previously lived in SQL, and now it is in binary files as per previous post. I am using 8 bytes per tick for the original tick data. Total data so far for ES and NQ is around 1.5 gigs. The program is able to iterate through all of the tick data extremely fast - perhaps orders of magnitude faster than in SQL. So that settles it.. SQL is useless for this project.

    The tick bars are constructed with the same 32 bit datetime, and four 16 bit entries for OHLC. 96 bits per bar - this will be good enough since the file sizes are now really small. I am not sure if I really need quantity per bar, nor do I currently need to keep track of the number of ticks of the last created bar (not adding to the data later on).

    Code:
    	QStringList filters;
    	filters << "*.dat";
    	QDir directory("contracts");
    	directory.setFilter(QDir::Files | QDir::NoSymLinks);
    	directory.setNameFilters(filters);
    	directory.setSorting(QDir::Name);
    	QStringList rawfiles = directory.entryList();
    	QString fileprepend = QString("contracts/");
    	const int tickinterval = 100;   // ticks per bar, hard coded for now
    	for (int i = 0; i < rawfiles.size(); ++i) {
    		// Skip bar files left in the same directory by an earlier run.
    		if (rawfiles.at(i).contains(QString("tick_")))
    			continue;
    		QFile fTick(fileprepend + rawfiles.at(i));
    		if (!fTick.open(QFile::ReadOnly))
    			continue;
    		QDataStream in(&fTick);
    		// Output name, e.g. contracts/100tick_es0712.dat
    		QString currentoutfile = fileprepend
    			+ QString::number(tickinterval)
    			+ QString("tick_") + rawfiles.at(i);
    		QFile fileout(currentoutfile);
    		if (!fileout.open(QIODevice::WriteOnly))
    			continue;
    		QDataStream out(&fileout);
    		// Same unsigned types the tick files were written with.
    		quint32 a;       // seconds since the 2002 reference
    		quint16 b, c;    // price in ticks, quantity (unused here)
    		quint32 datenumber = 0;
    		quint16 hi = 0, low = 0, open = 0, close = 0;
    		while (!in.atEnd()) {
    			// The first tick of the bar sets the time and all four prices.
    			in >> a >> b >> c;
    			datenumber = a;
    			hi = low = open = close = b;
    			// NOTE: no end-of-file check here, so if the file ends
    			// mid-bar these reads run past the end of the data.
    			for (int ii = 0; ii < tickinterval - 1; ++ii) {
    				in >> a >> b >> c;
    				if (b > hi) hi = b;
    				if (b < low) low = b;
    			}
    			close = b;
    			out << datenumber << open << hi << low << close;
    		}
    		// Dump the current values as one more bar at the end of the file.
    		out << datenumber << open << hi << low << close;
    	}
    
    I might need to clean up the logic surrounding the end of file. Currently I am just dumping the data to a new bar at the end, but this will probably create inaccuracies in certain situations. The last bar isn't too important anyway, since the contract will no longer be front month at that point.
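
    One possible way to tidy up the end-of-file case, sketched in plain C++ over ticks already held in memory rather than a QDataStream. The Tick and Bar structs and the buildTickBars name are assumptions for illustration. Here a final bar that falls short of the interval is built from whatever ticks remain and written exactly once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types mirroring the 8-byte tick and 12-byte bar layouts.
struct Tick {
    std::uint32_t seconds;
    std::uint16_t price;
    std::uint16_t quantity;
};

struct Bar {
    std::uint32_t seconds;  // time of the first tick in the bar
    std::uint16_t open, high, low, close;
};

// Group ticks into bars of ticksPerBar ticks each. If the input ends
// mid-bar, the last bar covers only the remaining ticks.
inline std::vector<Bar> buildTickBars(const std::vector<Tick>& ticks,
                                      std::size_t ticksPerBar) {
    assert(ticksPerBar > 0);
    std::vector<Bar> bars;
    for (std::size_t start = 0; start < ticks.size(); start += ticksPerBar) {
        const std::size_t end = std::min(start + ticksPerBar, ticks.size());
        const std::uint16_t first = ticks[start].price;
        Bar bar{ticks[start].seconds, first, first, first, first};
        for (std::size_t k = start + 1; k < end; ++k) {
            bar.high = std::max(bar.high, ticks[k].price);
            bar.low = std::min(bar.low, ticks[k].price);
        }
        bar.close = ticks[end - 1].price;
        bars.push_back(bar);
    }
    return bars;
}
```

    Dropping the short final bar instead of keeping it would be an equally reasonable choice, and it would keep every bar in the file at exactly the same number of ticks.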

    Again, the goal is to keep things simple. Making tick bars from tick data is very simple, so here we go. No problem. Assuming I created the tick bars correctly: There are 96 bits per tick bar, and 64 bits per tick. That is a ratio of 1.5:1. Correlating the indexing between files is simple: bar k starts at tick k*tickinterval, so in bytes the offset into the tick file is the offset into the bar file times tickinterval/1.5. I suspect that using memory mapping will be so super fast, and easy too since it is built into Qt's library.. :)
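
    The correlation between the two files can be sketched as follows, assuming every bar in the file holds exactly tickinterval ticks. The function names are hypothetical and the code is plain C++.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kTickBytes = 8;   // 64 bits per tick
constexpr std::uint64_t kBarBytes = 12;   // 96 bits per tick bar

// Index of the first tick covered by the bar with the given index.
constexpr std::uint64_t firstTickOfBar(std::uint64_t barIndex,
                                       std::uint64_t ticksPerBar) {
    return barIndex * ticksPerBar;
}

// Byte offset into the tick file that corresponds to a byte offset
// into the bar file. The bar offset must point at the start of a bar.
inline std::uint64_t tickOffsetFromBarOffset(std::uint64_t barByteOffset,
                                             std::uint64_t ticksPerBar) {
    assert(barByteOffset % kBarBytes == 0);
    const std::uint64_t barIndex = barByteOffset / kBarBytes;
    return firstTickOfBar(barIndex, ticksPerBar) * kTickBytes;
}
```

    With 100-tick bars, bar 3 sits at byte 36 of the bar file and its first tick (tick 300) sits at byte 2400 of the tick file, which is 36 times 100/1.5.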

    Time bars should be next since they are also pretty easy. Volume bars are not too complicated either.

    I now have to decide how to best handle the chosen time/tick/volume bar intervals. In this example, I hard coded a 100-tick interval.

    More later.