TickZOOM Decision. Open Source and FREE!

Discussion in 'Trading Software' started by greaterreturn, Dec 15, 2008.

Thread Status:
Not open for further replies.
  1. Okay, it's uploaded to YouTube, but it says it's still "processing" the video. When it's done, I guess tomorrow, I can put a link on tickzoom.org.
     
    #141     Dec 23, 2008
  2. Sorry. YouTube doesn't work. It compresses the video until you can't read anything. The sound is good.

    Anyway, one visitor was nice enough to turn it into an SWF and I put that link on the page also. That looks better and plays easier but has poor sound quality.

    Hmmm didn't think it would be so much trouble to put a video up!

    Wayne
     
    #142     Dec 23, 2008
  3. janus007


    Hi Wayne

    I have been thinking about storing ticks as binary files. I haven't tested it, but it could be interesting.

    To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem
    http://research.microsoft.com/apps/pubs/default.aspx?id=64525
     
    #143     Dec 23, 2008
  4. jprad


    #144     Dec 23, 2008
  5. Graham1


    Yes, I would agree with the above.

    The main problem I see with flat files: if you are simultaneously collecting new data (appending to the end of files) and running simulations on past data, there are likely to be short periods during updates when the end of a file contains a truncated tick record. That's because the operating system and library code generally write integral multiples of 512 bytes at a time until a file is finally closed, unless totally unbuffered writes are used.

    Of course, I know nothing about the tickzoom code, so this may not be a problem.

    So feel free to ignore this post if this is unlikely to be an issue.

    My suggestion would be simple database tables: one column (also an index) holding a date/time value, and another column holding a blob containing the binary data for all the ticks within a fixed time interval (be it 15 minutes, an hour, a day, or whatever is appropriate for that symbol according to its tick frequency).

    Or some similar variation of this, such as a fixed number of ticks per blob.
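    The suggested schema can be sketched in a few lines (this uses Python's sqlite3 for illustration; the table and column names are my own, not anything from tickzoom):

    ```python
    import sqlite3

    # One row per fixed interval: an indexed (symbol, start_time) key
    # plus a blob holding the packed binary ticks for that interval.
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE tick_blobs (
            symbol     TEXT NOT NULL,
            start_time INTEGER NOT NULL,   -- epoch seconds of interval start
            ticks      BLOB NOT NULL       -- packed binary ticks
        )
    """)
    conn.execute("CREATE INDEX idx_time ON tick_blobs (symbol, start_time)")

    # Insert one hour's worth of (fake) packed ticks.
    conn.execute("INSERT INTO tick_blobs VALUES (?, ?, ?)",
                 ("EURUSD", 1229990400, b"\x00" * 1024))

    # A simulation then fetches every blob in a date range with one indexed scan.
    rows = conn.execute(
        "SELECT ticks FROM tick_blobs WHERE symbol=? AND start_time BETWEEN ? AND ?",
        ("EURUSD", 1229990400, 1229994000)).fetchall()
    ```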

    Unfortunately I can't volunteer to help, as I have no C# experience.
     
    #145     Dec 23, 2008
  6. Berkeley DB would be amazing for this.

    Isn't the obvious solution really to use both a DB and binary files?
    I don't see how you can get around the fact that no DB will be as fast as a binary file, but a binary file will not have the organization of a DB.
    What about separating the collection of data from backtesting completely, with some kind of preprocessing from a DB to a binary file before backtesting even starts?
     
    #146     Dec 23, 2008
  7. Excellent reference on BLOBS!!!

    Okay, here are some facts.

    The article recommends that objects less than 256K be stored in the database.

    When collecting full ticks with every change of the DOM (depth of market), that's currently about 100 megabytes per week on a single currency pair.

    (I'm actually planning to store a binary diff between consecutive ticks, since each tick changes very little from the previous one, to reduce the file size.)

    Still, it will be over 100 megabytes per week.
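    The diff idea can be sketched like this (a minimal delta-encoding example in Python; the field choice and fixed-point units are assumptions, not the actual TickZOOM format):

    ```python
    def encode_deltas(prices):
        """Store the first price in full, then only the change from the
        previous tick. Consecutive ticks differ very little, so the
        deltas stay small and pack/compress far better than absolutes."""
        out = [prices[0]]
        prev = prices[0]
        for p in prices[1:]:
            out.append(p - prev)
            prev = p
        return out

    def decode_deltas(deltas):
        """Rebuild the absolute prices by summing the deltas back up."""
        prices = [deltas[0]]
        for d in deltas[1:]:
            prices.append(prices[-1] + d)
        return prices

    # Prices in fixed-point units (e.g. tenths of a pip) so they stay integral.
    ticks = [132015, 132016, 132014, 132014, 132017]
    assert decode_deltas(encode_deltas(ticks)) == ticks
    ```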

    That 100 meg file takes about 2 seconds to load into memory without the engine running.

    I don't think making a separate BLOB per day makes sense. Too granular.

    Weekly blob is fine.

    Therefore that article recommends the file system be used for files that size. I agree.

    Folks, when sharing these files it will be a lot easier to just receive a file and plop it into the right folder where TickZOOM can find it than to have to load it into some database.

    Of course, if it's not already in TickZOOM format, you will need to run a conversion.

    Also, there's an API to select ticks if you need to for some purpose--the same API TickZOOM uses to load ticks. Just submit a symbol and date range.

    So, especially after reading that article, I plan to go with a data folder that has a sub-folder for every symbol used. Within each symbol folder there will be one folder per year.

    Each yearly folder will contain just the weekly tick files.

    The tick files will have a header with a handle for every 10,000 ticks in the file. Each handle gives a timestamp and file offset.

    A weekly file will have approximately 1 million ticks.

    At 10,000 ticks per "chunk" that makes around 100 handles.

    So for every new file the engine creates it will reserve a header with space for 2048 handles (for potential expansion).

    It will endeavor to store the entire week in that one file and mark the offset to every chunk in the process.
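    That header scheme can be sketched roughly as follows (the exact byte layout is my guess, not the real .tck format): reserve a fixed table of 2048 (timestamp, offset) slots up front, fill one slot per 10,000-tick chunk, and use the table to seek straight to a chunk by time.

    ```python
    import struct

    HANDLE_SLOTS = 2048                        # reserved up front for expansion
    HANDLE_FMT = "<qq"                         # int64 timestamp, int64 file offset
    HANDLE_SIZE = struct.calcsize(HANDLE_FMT)  # 16 bytes per handle
    HEADER_SIZE = HANDLE_SLOTS * HANDLE_SIZE   # 32 KB reserved header

    def pack_header(handles):
        """Pack (timestamp, offset) pairs into a fixed-size header,
        zero-filling the unused slots."""
        assert len(handles) <= HANDLE_SLOTS
        buf = bytearray(HEADER_SIZE)
        for i, (ts, off) in enumerate(handles):
            struct.pack_into(HANDLE_FMT, buf, i * HANDLE_SIZE, ts, off)
        return bytes(buf)

    def seek_offset(header, target_ts):
        """Return the file offset of the last chunk starting at or before
        target_ts (0 if none); a zeroed slot marks the end of used handles."""
        best = 0
        for i in range(HANDLE_SLOTS):
            ts, off = struct.unpack_from(HANDLE_FMT, header, i * HANDLE_SIZE)
            if ts == 0 and off == 0:
                break
            if ts <= target_ts:
                best = off
        return best
    ```

    With ~1 million ticks per week and 10,000 ticks per chunk, only about 100 of the 2048 slots are used, which leaves plenty of headroom.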

    NOTE: One feature I would like to see is for TickZOOM to list the available symbols and date ranges available.

    So on startup, TickZOOM can scan all the headers and make an index.

    It can automatically check file modification dates in the directories at each startup to see if there's a new file dropped in.

    Doesn't that solve it?
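    The startup scan over that folder layout could look roughly like this (the paths and the .tck extension follow the scheme described above; the index shape is my own):

    ```python
    import os

    def scan_data_folder(root):
        """Walk data/<symbol>/<year>/ and index every weekly tick file by
        symbol, recording its modification time so a later rescan can spot
        newly dropped-in files cheaply."""
        index = {}
        for symbol in sorted(os.listdir(root)):
            symbol_dir = os.path.join(root, symbol)
            if not os.path.isdir(symbol_dir):
                continue
            for year in sorted(os.listdir(symbol_dir)):
                year_dir = os.path.join(symbol_dir, year)
                if not os.path.isdir(year_dir):
                    continue
                for name in sorted(os.listdir(year_dir)):
                    if name.endswith(".tck"):
                        path = os.path.join(year_dir, name)
                        index.setdefault(symbol, []).append(
                            (path, os.path.getmtime(path)))
        return index
    ```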

    Databases only add value when objects or rows need to reference each other, as with joins or object references.

    In the case of ticks, there's none of that. Just raw time series data.

    Sincerely,
    Wayne
     
    #147     Dec 23, 2008


  8. Actually, that's not an issue.

    When the execution server/quote server gets a connection to trade live, it first loads history and sends that to the client.

    Meanwhile, it queues up the new ticks into memory.

    When the historical data has loaded, it flushes the new ticks to the file and sends them to the client.

    Finally, it starts writing ticks simultaneously to the file (on a separate thread) and to the client for real-time trading.
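    The handoff described above can be sketched with a minimal single-threaded mock (the class and member names are hypothetical; the real server does the final phase on a separate thread):

    ```python
    from collections import deque

    class QuoteSession:
        """Mock of the history-then-live handoff: queue live ticks while
        history loads, flush the queue once history is sent, then stream."""
        def __init__(self, history):
            self.history = history
            self.queue = deque()
            self.live = False
            self.sent = []      # stands in for the client connection
            self.file = []      # stands in for the weekly tick file

        def on_tick(self, tick):
            if not self.live:
                self.queue.append(tick)   # phase 1: buffer during history load
            else:
                self.file.append(tick)    # phase 3: write to file and client
                self.sent.append(tick)    #          simultaneously

        def history_loaded(self):
            self.sent.extend(self.history)  # history goes to the client first
            while self.queue:               # phase 2: flush the buffered ticks
                tick = self.queue.popleft()
                self.file.append(tick)
                self.sent.append(tick)
            self.live = True
    ```

    The key invariant is that the client always sees history first, then the buffered ticks, then live ticks, with nothing dropped in between.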

    Wayne
     
    #148     Dec 23, 2008
  9. This part is necessary and already happens. The .tck file must already be in TickZOOM binary format to load that fast each time for streaming technology.

    Did you notice the number of bytes per tick?

    At 159 megabytes for 11.7 million ticks, that's only about 14 bytes per tick.

    You can't achieve that with ASCII. Plus, parsing ASCII is a very slow process.

    So you are right that the pre-processing to binary has to be done up front.
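    For comparison, a fixed-width binary record in that size range might look like this (the field layout is purely illustrative, not the actual .tck format):

    ```python
    import struct

    # 14 bytes total: uint32 timestamp (seconds), uint32 price in fixed-point
    # units (e.g. tenths of a pip), uint16 bid size, uint16 ask size,
    # uint16 flags. "<" forces little-endian with no padding.
    TICK_FMT = "<IIHHH"
    TICK_SIZE = struct.calcsize(TICK_FMT)   # 14 bytes

    packed = struct.pack(TICK_FMT, 1229990400, 132015, 10, 12, 0)
    assert len(packed) == TICK_SIZE

    # The equivalent ASCII line is bigger before you even start parsing it.
    ascii_line = "1229990400,1.32015,10,12,0\n"
    assert len(ascii_line) > TICK_SIZE
    ```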

    Wayne
     
    #149     Dec 23, 2008
  10. That is obvious, at first, until you try it.

    Remember, ticks must load into memory in less than 1 microsecond each. How can a database speed that up? Any overhead, even a single extra method call, adds processing time.

    If you do it in large blobs, then what value does the DB add? DBs excel at handling references between objects, like joins in an RDBMS or object references in an OODBMS. Plus, they're good at indexing and quickly finding records that match some criteria.

    Storing blobs like images or videos in a database makes sense when you have many ways of querying them and you need to find 1 or a few at a time that fit different criteria.

    For example, you might ask, "Show me the video clips with a woman, a child, and a mountain." DBs excel at the indexing and cross-referencing needed to find the blobs that fit your criteria.

    (I engineered these types of systems.)

    However, with ticks, what selection do you need? What else besides symbol and date range?

    Please, if someone can come up with other realistic queries besides symbol and date range, then a DB would make more sense.

    Otherwise, it's easier all around to just use weekly blobs of about 100MB each in a file system structure.

    Wayne
     
    #150     Dec 23, 2008