TickZOOM Decision. Open Source and FREE!

Discussion in 'Trading Software' started by greaterreturn, Dec 15, 2008.

Thread Status:
Not open for further replies.
  1. Okay, it's uploaded to YouTube, but it says it's still "processing" the video. When it's done, I guess tomorrow, I can put a link on tickzoom.org.
     
    #141     Dec 23, 2008
  2. Sorry. YouTube doesn't work. It compresses the video until you can't read anything. The sound is good.

    Anyway, one visitor was nice enough to turn it into an SWF and I put that link on the page also. That looks better and plays easier but has poor sound quality.

    Hmmm didn't think it would be so much trouble to put a video up!

    Wayne
     
    #142     Dec 23, 2008
  3. janus007


    Hi Wayne

    I have been thinking about storing ticks as binary files. I haven't tested it, but it could be interesting.

    To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem
    http://research.microsoft.com/apps/pubs/default.aspx?id=64525
     
    #143     Dec 23, 2008
  4. jprad


    #144     Dec 23, 2008
  5. Graham1


    Yes, I would agree with the above.

    The main problem I see with flat files: if you are simultaneously collecting new data (appending to the end of files) and running simulations on past data, there are likely to be short periods during updates when the end of a file contains a truncated tick record. That's because the operating system and library code generally write integral multiples of 512 bytes at a time until a file is finally closed, unless totally unbuffered writes are used.

    Of course, I know nothing about the tickzoom code, so this may not be a problem.

    So feel free to ignore this post if this is unlikely to be an issue.

    My suggestion would be simple database tables: one column (also an index) holding a date/time value, and another column holding a blob containing the binary data for all the ticks within a fixed time interval (be it 15 minutes, an hour, a day, or whatever is appropriate for that symbol according to its tick frequency).

    Or some similar variation of this, such as a fixed number of ticks per blob.
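    The suggested schema can be sketched in a few lines (this uses Python's sqlite3 for illustration; the table and column names are my own, not anything from tickzoom):

    ```python
    import sqlite3

    # One row per fixed interval: an indexed (symbol, start_time) key
    # plus a blob holding the packed binary ticks for that interval.
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE tick_blobs (
            symbol     TEXT NOT NULL,
            start_time INTEGER NOT NULL,   -- epoch seconds of interval start
            ticks      BLOB NOT NULL       -- packed binary ticks
        )
    """)
    conn.execute("CREATE INDEX idx_time ON tick_blobs (symbol, start_time)")

    # Insert one hour's worth of (fake) packed ticks.
    conn.execute("INSERT INTO tick_blobs VALUES (?, ?, ?)",
                 ("EURUSD", 1229990400, b"\x00" * 1024))

    # A simulation then fetches every blob in a date range with one indexed scan.
    rows = conn.execute(
        "SELECT ticks FROM tick_blobs WHERE symbol=? AND start_time BETWEEN ? AND ?",
        ("EURUSD", 1229990400, 1229994000)).fetchall()
    ```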

    Unfortunately I can't volunteer to help, as I have no C# experience.
     
    #145     Dec 23, 2008
  6. Berkeley DB would be amazing for this.

    Isn't the obvious solution really to use both a DB and binary files?
    I don't see how you can get around the fact that no DB will be as fast as a binary file, but a binary file will not have the organization of a DB.
    What about separating the collection of data from backtesting completely, with some kind of preprocessing from a DB to a binary file before backtesting even starts?
     
    #146     Dec 23, 2008
  7. Excellent reference on BLOBS!!!

    Okay, here are some facts.

    The article recommends that objects less than 256K be stored in the database.

    When collecting full ticks with every change of the DOM (depth of market), that's currently about 100 megabytes per week on a single currency pair.

    (I'm actually planning to store a binary diff between consecutive ticks, since each tick changes very little from the previous one, to reduce the file size.)

    Still, it will be over 100 megabytes per week.
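    The diff idea can be sketched like this (a minimal delta-encoding example in Python; the field choice and fixed-point units are assumptions, not the actual TickZOOM format):

    ```python
    def encode_deltas(prices):
        """Store the first price in full, then only the change from the
        previous tick. Consecutive ticks differ very little, so the
        deltas stay small and pack/compress far better than absolutes."""
        out = [prices[0]]
        prev = prices[0]
        for p in prices[1:]:
            out.append(p - prev)
            prev = p
        return out

    def decode_deltas(deltas):
        """Rebuild the absolute prices by summing the deltas back up."""
        prices = [deltas[0]]
        for d in deltas[1:]:
            prices.append(prices[-1] + d)
        return prices

    # Prices in fixed-point units (e.g. tenths of a pip) so they stay integral.
    ticks = [132015, 132016, 132014, 132014, 132017]
    assert decode_deltas(encode_deltas(ticks)) == ticks
    ```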

    That 100 meg file takes about 2 seconds to load into memory without the engine running.

    I don't think making a separate BLOB per day makes sense. Too granular.

    Weekly blob is fine.

    Therefore that article recommends the file system be used for files that size. I agree.

    Folks, when sharing these files it will be a lot easier to just receive a file and plop it into the right folder where TickZOOM can find it than to have to load it into some database.

    Of course, if it's not already in TickZOOM format, you will need to run a conversion.

    Also, there's an API to select ticks if you need to for some purpose--the same API TickZOOM uses to load ticks. Just submit a symbol and date range.

    So, especially after reading that article, I plan to go with a data folder that has a sub-folder for every symbol used. Within each symbol folder there will be one folder per year.

    Each yearly folder will contain just the weekly tick files.

    The tick files will have a header with a handle for every 10,000 ticks in the file. Each handle gives a timestamp and file offset.

    A weekly file will have approximately 1 million ticks.

    At 10,000 ticks per "chunk" that makes around 100 handles.

    So for every new file the engine creates it will reserve a header with space for 2048 handles (for potential expansion).

    It will endeavor to store the entire week in that one file and mark the offset to every chunk in the process.
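    That header scheme can be sketched roughly as follows (the exact byte layout is my guess, not the real .tck format): reserve a fixed table of 2048 (timestamp, offset) slots up front, fill one slot per 10,000-tick chunk, and use the table to seek straight to a chunk by time.

    ```python
    import struct

    HANDLE_SLOTS = 2048                        # reserved up front for expansion
    HANDLE_FMT = "<qq"                         # int64 timestamp, int64 file offset
    HANDLE_SIZE = struct.calcsize(HANDLE_FMT)  # 16 bytes per handle
    HEADER_SIZE = HANDLE_SLOTS * HANDLE_SIZE   # 32 KB reserved header

    def pack_header(handles):
        """Pack (timestamp, offset) pairs into a fixed-size header,
        zero-filling the unused slots."""
        assert len(handles) <= HANDLE_SLOTS
        buf = bytearray(HEADER_SIZE)
        for i, (ts, off) in enumerate(handles):
            struct.pack_into(HANDLE_FMT, buf, i * HANDLE_SIZE, ts, off)
        return bytes(buf)

    def seek_offset(header, target_ts):
        """Return the file offset of the last chunk starting at or before
        target_ts (0 if none); a zeroed slot marks the end of used handles."""
        best = 0
        for i in range(HANDLE_SLOTS):
            ts, off = struct.unpack_from(HANDLE_FMT, header, i * HANDLE_SIZE)
            if ts == 0 and off == 0:
                break
            if ts <= target_ts:
                best = off
        return best
    ```

    With ~1 million ticks per week and 10,000 ticks per chunk, only about 100 of the 2048 slots are used, which leaves plenty of headroom.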

    NOTE: One feature I would like to see is for TickZOOM to list the available symbols and date ranges available.

    So on startup, TickZOOM can scan all the headers and make an index.

    It can automatically check file modification dates in the directories at each startup to see if there's a new file dropped in.

    Doesn't that solve it?
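    The startup scan over that folder layout could look roughly like this (the paths and the .tck extension follow the scheme described above; the index shape is my own):

    ```python
    import os

    def scan_data_folder(root):
        """Walk data/<symbol>/<year>/ and index every weekly tick file by
        symbol, recording its modification time so a later rescan can spot
        newly dropped-in files cheaply."""
        index = {}
        for symbol in sorted(os.listdir(root)):
            symbol_dir = os.path.join(root, symbol)
            if not os.path.isdir(symbol_dir):
                continue
            for year in sorted(os.listdir(symbol_dir)):
                year_dir = os.path.join(symbol_dir, year)
                if not os.path.isdir(year_dir):
                    continue
                for name in sorted(os.listdir(year_dir)):
                    if name.endswith(".tck"):
                        path = os.path.join(year_dir, name)
                        index.setdefault(symbol, []).append(
                            (path, os.path.getmtime(path)))
        return index
    ```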

    Databases only add value when objects or rows need to reference each other, as with joins or object references.

    In the case of ticks, there's none of that. Just raw time series data.

    Sincerely,
    Wayne
     
    #147     Dec 23, 2008


  8. Actually, that's not an issue.

    When the execution server/quote server gets a connection to trade live, it first loads history and sends that to the client.

    Meanwhile, it queues up the new ticks into memory.

    When the historical data has loaded, it flushes the new ticks to the file and sends them to the client.

    Finally, it starts writing ticks simultaneously to the file (on a separate thread) and to the client for real-time trading.
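    The handoff described above can be sketched with a minimal single-threaded mock (the class and member names are hypothetical; the real server does the final phase on a separate thread):

    ```python
    from collections import deque

    class QuoteSession:
        """Mock of the history-then-live handoff: queue live ticks while
        history loads, flush the queue once history is sent, then stream."""
        def __init__(self, history):
            self.history = history
            self.queue = deque()
            self.live = False
            self.sent = []      # stands in for the client connection
            self.file = []      # stands in for the weekly tick file

        def on_tick(self, tick):
            if not self.live:
                self.queue.append(tick)   # phase 1: buffer during history load
            else:
                self.file.append(tick)    # phase 3: write to file and client
                self.sent.append(tick)    #          simultaneously

        def history_loaded(self):
            self.sent.extend(self.history)  # history goes to the client first
            while self.queue:               # phase 2: flush the buffered ticks
                tick = self.queue.popleft()
                self.file.append(tick)
                self.sent.append(tick)
            self.live = True
    ```

    The key invariant is that the client always sees history first, then the buffered ticks, then live ticks, with nothing dropped in between.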

    Wayne
     
    #148     Dec 23, 2008
  9. This part is necessary and already happens. The .tck file must already be in TickZOOM binary format to load that fast each time for streaming technology.

    Did you notice the number of bytes per tick?

    At 159 megabytes for 11.7 million ticks, that's only about 14 bytes per tick.

    You can't achieve that with ASCII. Plus, parsing ASCII is a very slow process.

    So you are right that the pre-processing to binary has to be done up front.
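    For comparison, a fixed-width binary record in that size range might look like this (the field layout is purely illustrative, not the actual .tck format):

    ```python
    import struct

    # 14 bytes total: uint32 timestamp (seconds), uint32 price in fixed-point
    # units (e.g. tenths of a pip), uint16 bid size, uint16 ask size,
    # uint16 flags. "<" forces little-endian with no padding.
    TICK_FMT = "<IIHHH"
    TICK_SIZE = struct.calcsize(TICK_FMT)   # 14 bytes

    packed = struct.pack(TICK_FMT, 1229990400, 132015, 10, 12, 0)
    assert len(packed) == TICK_SIZE

    # The equivalent ASCII line is bigger before you even start parsing it.
    ascii_line = "1229990400,1.32015,10,12,0\n"
    assert len(ascii_line) > TICK_SIZE
    ```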

    Wayne
     
    #149     Dec 23, 2008
  10. That is obvious, at first, until you try it.

    Remember, ticks must load into memory in less than 1 microsecond each. How can a database speed that up? Any overhead, even a single extra method call, adds processing time.

    If you do it in large blobs, then what value does the DB add? DBs excel at handling references between objects, like joins in an RDBMS or object references in an OODBMS. Plus, they're good at indexing and quickly finding records that match some criteria.

    Storing blobs like images or videos in a database makes sense when you have many ways of querying them and you need to find 1 or a few at a time that fit different criteria.

    For example, you might ask, "Show me the video clips with a woman, a child, and a mountain." DBs excel at the indexing and cross-referencing needed to find the blobs that fit your criteria.

    (I engineered these types of systems.)

    However, with ticks, what selection do you need? What else besides symbol and date range?

    Please, if someone can come up with other realistic queries besides symbol and date range, then a DB would make more sense.

    Otherwise, it's easier all around to just use weekly blobs of about 100MB each in a file system structure.

    Wayne
     
    #150     Dec 23, 2008