c++ ohlc structs and ta-lib

stratman · Jun 1, 2012

I'm struggling with how best to code my OHLC series data for performance during backtesting. Typically I use hourly, 15-minute, 5-minute and 1-minute OHLC data, and I only need to keep the most recent 100 bars of each.

I have ideas in mind but I'm concerned that they're too inefficient. I will be backtesting years of tick data (10 GB), i.e. reading OHLC from a file/db. I thought I'd ask for your opinions before I get too far down the wrong(?) path.

Should I maintain only a collection of 1-minute OHLC structs for this purpose (the most recent 100 hours' worth), i.e. derive the other series from it? Or keep a separate collection for each timeframe?

For the indicator values, I use ta-lib which requires arrays of a single member of my struct e.g. array of close prices. What is an efficient way to create the array of close prices from a collection of ohlc structs?

What type of collection (map, list, vector, etc) would be best for holding the series of structs?

I read http://www.elitetrader.com/vb/showthread.php?s=&threadid=224217&perpage=6&pagenumber=1, which discusses what kind of collection (list, map etc) is best for OHLC data. Although interesting, it doesn't consider the need to get an array of close prices from the collection.

Appreciate any ideas you guys might have.

PocketChange · Jun 1, 2012

Skip OHLC trade prints and build out data using ask/bid price changes.

Keep last print as a reference but know that it will deviate from the market prices. A trade can be reported up to 3 minutes late and still update T&S.

Set your interval to the precision of your execution capabilities or data source, e.g. 1000 ms. Record HLC for ask and bid inside this interval.

Symbol Timestamp - Ask High, Ask Low, Ask Close, Ask Size, Bid High, Bid Low, Bid Close, Bid Size, Last High, Last Low, Last Close, Volume

Now you can build out tick accurate consolidations to a precision of 1000ms. Inside of the second you won't know the order of HLC... But this range is much tighter than 1 min bars.

A consolidation for SPY RTH for the entire month of April at 25ms precision is only 110,000 records (distinct changes in NBBO).

Now build out wider consolidations: 1 min, daily ... Calculate the high, low and close inside these intervals and use these as helper tables to drill down to locate specific tick records.

e.g. 2012-04-04 9:37:07.025 Long 1000 @ (Use Ask and check Size) 137.07.

Events: Stop 136.50, TP 137.50

Query the daily consolidation table to locate the day whose range covers the events. Then query the hour, minute and second consolidations to drill down and locate the specific tick.

These consolidation helper tables allow your queries to run very fast and are tick accurate up to your precision setting.
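
For anyone who wants the drill-down idea outside of SQL, it might look roughly like this in C++ with the consolidations held in memory. This is only a sketch under assumptions: `Range`, `first_touch`, `drill_down` and the fixed `fine_per_coarse` ratio are hypothetical names that are not part of the setup described above, only two levels are shown, and a long position checked against a single price series is assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// High/low of one consolidation record (e.g. one day, hour or minute).
struct Range {
    double high;
    double low;
};

// First index in [from, to) whose range touches the stop or the target of a
// long position; returns `to` when neither level is touched.
std::size_t first_touch(const std::vector<Range>& bars, std::size_t from,
                        std::size_t to, double stop, double target) {
    for (std::size_t i = from; i < to; ++i) {
        if (bars[i].low <= stop || bars[i].high >= target) return i;
    }
    return to;
}

// Two-level drill-down: scan the coarse table first, then only the fine
// records that belong to the coarse record that was hit.
// Assumes (hypothetically) that coarse record c covers fine records
// [c * fine_per_coarse, (c + 1) * fine_per_coarse).
// Returns fine.size() when nothing is touched.
std::size_t drill_down(const std::vector<Range>& coarse,
                       const std::vector<Range>& fine,
                       std::size_t fine_per_coarse,
                       double stop, double target) {
    const std::size_t c = first_touch(coarse, 0, coarse.size(), stop, target);
    if (c == coarse.size()) return fine.size();
    const std::size_t begin = std::min(c * fine_per_coarse, fine.size());
    const std::size_t end = std::min(begin + fine_per_coarse, fine.size());
    const std::size_t f = first_touch(fine, begin, end, stop, target);
    return f == end ? fine.size() : f;
}
```

The same pattern repeats for each extra level (day, hour, minute, second), so most of the fine records are never read.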

For most purposes a single hourly consolidation table for 10 years (20,000 records per symbol) and a daily tick db set to 25 ms precision can return accurate results incredibly fast. It's a simple, scalable SQL setup, e.g. SQLite with the consolidations running as in-memory DBs that attach specific daily tick files... You can easily move these structures to Hadoop HBase or stream-based DBs.

stratman · Jun 1, 2012

Thanks for your reply. I didn't know much about large dbs and your comments about scaling up to those got me reading about them. It's not quite what I'm after though, as the database part is already supplied. In fact the strategy engine is supplied too. What I'm writing is a DLL that is portable across multiple brokers' platforms. I write code in each broker platform to wrap my DLL. All the broker platform requires is to be able to connect to my shared library (Windows or Linux).

The broker supplies the price feed via my wrapper (written in their scripting language) into my shared library. My shared library creates the OHLC and uses ta-lib for producing indicator values. My trading algorithm uses that data to send back trading commands via the wrapper into the trading platform.

It works fine in real-time i.e. live forward trading but I'm concerned that my library is not using good practice for efficiency during back testing. This is why I'm asking about how best to store the OHLC in memory during the back test.

I don't want to have any connection from my dll to anything else (db, files etc) i.e. only to the broker platform via my wrapper.

slacker · Jun 1, 2012

Quote from PocketChange:

For most purposes a single hourly consolidation table for 10 years (20,000 records per symbol) and a daily tick db set to 25 ms precision can return accurate results incredibly fast. It's a simple, scalable SQL setup, e.g. SQLite with the consolidations running as in-memory DBs that attach specific daily tick files... You can easily move these structures to Hadoop HBase or stream-based DBs.

Thank you PocketChange.

Choosing between SQLite, MySQL and Postgres, which would you use?

How many inserts and updates can you do in a second using off the shelf hardware?

Thanks again

PocketChange · Jun 2, 2012

SQLite batch processing of daily historic message files (avg 2 GB):

PRAGMA synchronous=OFF
PRAGMA count_changes=OFF
PRAGMA journal_mode=OFF

Inserts within a Transaction (100K chunks) using prepared statements.

The db setup is capable of 100K+ inserts per second.

Use whichever db you're fluent in optimizing... Oracle has nice windowing and analytic functions (LAG, LEAD etc.). The simplicity and performance of a SQLite in-memory db is hard to beat.

In our case we store consolidated ticks in daily DBs by symbol set and have historic consolidated helper DBs at 1-minute and daily intervals. Ticks are captured at 25 ms precision during RTH only.

stratman · Jun 2, 2012

I think I didn't explain correctly what I am trying to do.

Good info on db's. I had a look at your suggestions as database work is something I'm into. I've been programming since 1981 ... started with cobol (punch cards) on a mainframe.

I have code I use for live trading on multiple trading platforms e.g. metatrader, JForex, etc. I write a wrapper for my library in the language of the trading platform. The trading platform provides the ticks and places trades. Like I say, all working well.

When backtesting within each platform, I'm concerned about the performance of my C++ code. As I bring the ticks in, I build OHLC structures for M1, M5, M15 and H1. I only require the last 100 bars of each for my algorithm.

I have 10 yrs of tick data for my dukascopy JForex platform on each instrument. The strategy tester (JForex) gets tick data into my c++ library via the wrapper I wrote (in this case using JNA). I do not wish to rewrite my c++ code in java. I just want to make sure that my part (c++) is performing optimally.

I use the c++ ta-lib indicator library inside my own c++ library. The ta-lib functions use arrays of prices e.g. open, close etc. Because my data is stored in a collection of OHLC structures, there's a cost to get the array of all closes from the array of structs.

I'm interested in how others may be holding this data in memory. An idea I had was to hold only M1 series and create the M5 etc 'on the fly' but there also would be a performance hit during back testing. I also considered holding all the ticks for the most recent 100 hrs. But how would one supply an array of M15 bar close prices to ta-lib functions in a high performance manner?

cheers
Rob
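
One way to avoid the struct-to-array copy raised in this post is to store each field in its own column and keep the last N values contiguous. The sketch below is only an illustration under assumptions: `SlidingColumn` and `BarSeries` are hypothetical names, the capacity must be greater than zero, and every value is written twice into a buffer of size 2N so that the most recent N values always form one contiguous run. TA-Lib functions such as `TA_SMA` take a `const double inReal[]` plus start and end indices, so a pointer like `close.data()` could be passed with `startIdx = 0` and `endIdx = size() - 1`; no TA-Lib call is made here.

```cpp
#include <cstddef>
#include <vector>

// One price column (e.g. closes) holding the most recent `capacity` values.
// Each value is stored at i and i + capacity, so the live window is always
// a single contiguous block that can be handed to a C array API.
class SlidingColumn {
public:
    explicit SlidingColumn(std::size_t capacity)  // capacity must be > 0
        : cap_(capacity), buf_(2 * capacity, 0.0) {}

    // Append a value for a newly opened bar, dropping the oldest if full.
    void push(double v) {
        buf_[pos_] = v;
        buf_[pos_ + cap_] = v;
        pos_ = (pos_ + 1) % cap_;
        if (count_ < cap_) ++count_;
    }

    // Overwrite the newest value (the bar that is still forming).
    void set_last(double v) {
        if (count_ == 0) return;
        const std::size_t last = (pos_ + cap_ - 1) % cap_;
        buf_[last] = v;
        buf_[last + cap_] = v;
    }

    std::size_t size() const { return count_; }

    // Oldest value first; data()[size() - 1] is the newest.
    const double* data() const {
        return count_ < cap_ ? buf_.data() : buf_.data() + pos_;
    }

private:
    std::size_t cap_;
    std::size_t pos_ = 0;    // next write slot in [0, cap_)
    std::size_t count_ = 0;  // number of valid values, at most cap_
    std::vector<double> buf_;
};

// One timeframe (M1, M5, M15 or H1) stored as four parallel columns.
struct BarSeries {
    SlidingColumn open, high, low, close;

    explicit BarSeries(std::size_t n) : open(n), high(n), low(n), close(n) {}

    void append(double o, double h, double l, double c) {
        open.push(o);
        high.push(h);
        low.push(l);
        close.push(c);
    }

    // Refresh the forming bar; its open does not change.
    void update_last(double h, double l, double c) {
        high.set_last(h);
        low.set_last(l);
        close.set_last(c);
    }
};
```

With 100 bars per timeframe each column is only 1600 bytes, so four timeframes of four columns fit comfortably in cache, and appending a bar costs two stores per column.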

jordanf · Jun 5, 2012

To be honest, most if not all of the functions in ta-lib are pretty trivial, and you probably only use a small subset of what is in ta-lib, so why not code the indicators yourself?
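
As an illustration of what coding an indicator yourself could look like, here is a minimal sketch of a simple moving average that is updated one bar at a time, so no price array has to be rebuilt. `RollingSma` is a hypothetical name, the period is assumed to be greater than zero, and the running sum can accumulate small floating-point error over very long runs.

```cpp
#include <cstddef>
#include <vector>

// Simple moving average over the last `period` values, O(1) per update.
class RollingSma {
public:
    explicit RollingSma(std::size_t period)  // period must be > 0
        : window_(period, 0.0) {}

    // Feed one value (e.g. a bar close). Returns true and writes the average
    // to `out` once `period` values have been seen; returns false before that.
    bool update(double value, double& out) {
        sum_ += value - window_[idx_];  // add newest, drop the value it replaces
        window_[idx_] = value;
        idx_ = (idx_ + 1) % window_.size();
        if (seen_ < window_.size()) ++seen_;
        if (seen_ < window_.size()) return false;
        out = sum_ / static_cast<double>(window_.size());
        return true;
    }

private:
    std::vector<double> window_;  // circular store of the last `period` values
    std::size_t idx_ = 0;         // slot that the next value overwrites
    std::size_t seen_ = 0;        // values received so far, capped at period
    double sum_ = 0.0;            // running sum of the window
};
```

During a backtest this would be called once per completed bar rather than once per tick.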

2rosy · Jun 7, 2012

Quote from stratman:

I'm interested in how others may be holding this data in memory. An idea I had was to hold only M1 series and create the M5 etc 'on the fly' but there also would be a performance hit during back testing. I also considered holding all the ticks for the most recent 100 hrs. But how would one supply an array of M15 bar close prices to ta-lib functions in a high performance manner?

cheers
Rob

Structs are fine. Maybe a 4-element array is faster:

bar[OPEN]
bar[HIGH]
bar[LOW]
bar[CLOSE]
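
A minimal sketch of the array idea, with enumerator and type names that are this sketch's own (`Field`, `BarArray` and `column` are hypothetical). Four doubles in an array occupy the same 32 bytes as a struct of four doubles, so a speed difference is not guaranteed; the practical benefit is that one helper can pull out any field by index.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Indices into a bar stored as a plain array of four prices.
enum Field { OPEN, HIGH, LOW, CLOSE, FIELD_COUNT };

using BarArray = std::array<double, FIELD_COUNT>;

// Copy one field (e.g. CLOSE) of every bar into a contiguous array, which is
// the layout that array-based indicator functions expect.
std::vector<double> column(const std::vector<BarArray>& bars, Field f) {
    std::vector<double> out;
    out.reserve(bars.size());
    for (const BarArray& b : bars) out.push_back(b[f]);
    return out;
}
```

For 100 bars this copy is 100 loads and stores, which is small next to the per-tick work of a backtest, but it does have to be repeated whenever the series changes.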

januson · Jun 7, 2012

Hi...

I've created a class for this:

public interface IBarData
{
    IList<Common.Entities.ITickData> TickData { get; set; }
    int BarNumber { get; set; }
    DateTime? BarStartTime { get; set; }
    double Close { get; set; }
    double High { get; set; }
    bool IsComplete { get; set; }
    double Low { get; set; }
    double Open { get; set; }
    int Volume { get; set; }
}

The TickData can be enabled for debugging, so don't think about that.

I'm able to compress around 2 million ticks per second into 15-minute OHLC bars.

I don't keep all the bars in memory, but rather let them stream through and do my calculations on the fly.
How many bars are needed depends on the nature of the indicator/calculation: it can be 200 bars, 21 bars etc.
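
In C++, the streaming approach described above could be sketched as one small builder per timeframe, all fed the same ticks. This is an assumption-laden illustration rather than the poster's code: `Bar` and `BarBuilder` are hypothetical names, timestamps are non-negative epoch seconds arriving in time order, and a single price per tick is used.

```cpp
#include <cstdint>

struct Bar {
    std::int64_t start;  // bar open time, seconds since epoch
    double open, high, low, close;
};

// Builds fixed-length bars from a tick stream, keeping only the forming bar.
class BarBuilder {
public:
    explicit BarBuilder(std::int64_t period_seconds) : period_(period_seconds) {}

    // Feed one tick. Returns true when this tick starts a new bar, in which
    // case the bar that just finished is copied into `done`.
    bool on_tick(std::int64_t epoch_seconds, double price, Bar& done) {
        const std::int64_t start = epoch_seconds - epoch_seconds % period_;
        bool completed = false;
        if (has_bar_ && start != cur_.start) {
            done = cur_;
            completed = true;
            has_bar_ = false;
        }
        if (!has_bar_) {
            cur_ = Bar{start, price, price, price, price};
            has_bar_ = true;
        } else {
            if (price > cur_.high) cur_.high = price;
            if (price < cur_.low) cur_.low = price;
            cur_.close = price;
        }
        return completed;
    }

    // The bar that is still forming.
    const Bar& current() const { return cur_; }

private:
    std::int64_t period_;
    Bar cur_{};
    bool has_bar_ = false;
};
```

Four instances with periods of 60, 300, 900 and 3600 seconds would cover M1, M5, M15 and H1 at a cost of a few comparisons per tick each, and indicators only need to be updated when `on_tick` returns true.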