Python advice on CME datafiles...

Discussion in 'Data Sets and Feeds' started by brokershopping, Oct 12, 2005.

  1. I am trying to format the Globex intraday data files I have collected.

    http://www.cme.com/trading/dta/hist/ftp_gtimeandsales3098.html

    The files appear to be ASCII, with each field 'delimited' only by its column position (fixed width). I had hoped to do all the work in Matlab, but I ran into problems: Matlab doesn't seem to handle blank spaces well, and there are many of them. Matlab folks suggested using Perl or Python to do the dirty work of pre-processing the data. I like the look of Python, and it seems to be highly regarded around here, so that is what I am going with.
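
A minimal sketch of reading fixed-width lines in Python by slicing each line at known column positions. The field names and offsets in `LAYOUT` are hypothetical placeholders, not the actual CME layout; the real positions would have to come from the format description published with the files.

```python
# Hypothetical layout: (field name, start, end) as 0-based slice offsets.
# These are NOT the real CME column positions; substitute the documented ones.
LAYOUT = [
    ("date", 0, 8),
    ("time", 8, 14),
    ("symbol", 14, 18),
    ("price", 18, 28),
    ("qty", 28, 33),
]

def parse_line(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of whitespace-stripped strings."""
    line = line.rstrip("\r\n")
    # Slicing past the end of a short line just returns "", so trailing
    # blank fields come back as empty strings.
    return {name: line[start:end].strip() for name, start, end in layout}

def parse_file(path, layout=LAYOUT):
    """Yield one dict per non-empty line of a fixed-width text file."""
    with open(path) as f:
        for line in f:
            if line.strip():
                yield parse_line(line, layout)

sample = "20051012093015ES  0001234.50   10"
print(parse_line(sample))
# {'date': '20051012', 'time': '093015', 'symbol': 'ES', 'price': '0001234.50', 'qty': '10'}
```

Slicing by position (rather than splitting on spaces) is what keeps blank fields in their proper columns.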

    As a newbie, I don't know exactly where to start. There is so much info out there that my searches don't narrow down to what I need. I was hoping someone here might be able to save me some Google time by pointing me in the general direction.

    Any links to tutorials on file manipulation?

    What would be the best structure to store the processed data in?

    What functions will allow you to access data by column?

    Anything else I should know?

    Any advice would be appreciated!
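
On storing the data and getting at it by column: one simple stdlib-only option is to turn per-line records into a dict of column lists, so a whole column is just `cols["price"]`. The sample records, column names and converters below are hypothetical, and a blank field is assumed to mean "missing" (stored as `None`).

```python
# Hypothetical parsed records (one dict of strings per line), e.g. the
# output of a fixed-width parser. Blank fields arrive as "".
records = [
    {"time": "093015", "price": "1234.50", "qty": "10"},
    {"time": "093016", "price": "1234.75", "qty": "3"},
    {"time": "093017", "price": "1234.50", "qty": ""},
]

# Assumed type for each column.
CONVERTERS = {"time": str, "price": float, "qty": int}

def to_columns(rows, converters):
    """Convert a list of row dicts into a dict of typed column lists."""
    columns = {name: [] for name in converters}
    for row in rows:
        for name, convert in converters.items():
            raw = row.get(name, "")
            # Treat a blank field as missing rather than failing on int("").
            columns[name].append(convert(raw) if raw else None)
    return columns

cols = to_columns(records, CONVERTERS)
print(cols["price"])   # [1234.5, 1234.75, 1234.5]
print(cols["qty"])     # [10, 3, None]

# For plain tuples, zip(*rows) transposes rows into columns in one step.
tuple_rows = [("093015", "1234.50"), ("093016", "1234.75")]
times, raw_prices = zip(*tuple_rows)
print(times)           # ('093015', '093016')
```

`zip(*rows)` is the idiomatic way to transpose equal-length tuples; the dict form adds access by column name.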
     
  2. Personally I really like using PyTables. It's excellent for organizing your data, whether it's character (ASCII), numerical or even object arrays. It is a hierarchical database, not a relational one. I've never used a relational database myself, but I think this is much better.

    There are several PyTables structures: Tables (column acquisition by "name" ;-), N-Dimensional Arrays, Character Arrays, Nested Record Arrays, File Hierarchy, Meta-data, etc. It is built on top of HDF5, which is itself written in C and was created by the NCSA. Sooo it's fast.

    One last thing: it's also built on Python's numerical package(s), which are also written in C, so you get seamless numerical computing right from PyTables. I would recommend converting your ASCII files into numerical arrays/tables for easy processing.

    http://pytables.sourceforge.net/html/WelcomePage.html
    There is a mailing list if you have questions. Francesc and company are good about that.
    hth

    EDIT IN:
    A very large chunk of the Matlab community has converted to Python. There's an excellent 2D plotting package called matplotlib, and the number-crunching community is currently pushing to bring out a newly reorganized numerical package in SciPy. Enjoy :)
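
PyTables needs the HDF5 libraries installed, so as a dependency-free illustration of the "convert your ASCII into numerical arrays" advice, the sketch below swaps in Python's built-in `array` module (this is not the PyTables or NumPy API). It stores a price column as C doubles rather than as strings; the sample prices are made up.

```python
from array import array

# Made-up price strings as they might be sliced out of the text files.
raw_prices = ["1234.50", "1234.75", "1234.50", "1235.00"]

# typecode "d" = C double: one compact, homogeneous numeric column instead of
# a list of Python strings. (Stdlib stand-in for a PyTables/NumPy array.)
prices = array("d", (float(p) for p in raw_prices))

high = max(prices)
low = min(prices)
mean = sum(prices) / len(prices)
print(high, low, mean)   # 1235.0 1234.5 1234.6875

# Typed arrays also round-trip to raw binary cheaply.
blob = prices.tobytes()
restored = array("d")
restored.frombytes(blob)
```

The point carries over to PyTables: once the text is converted to typed numbers, the arithmetic no longer pays for string parsing on every pass.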
     
  3. Hi brokershopping,

    Matlab? Not good at all for this kind of work. Very clumsy, as I hope you will agree once you're through.

    In fact your kind of problem is an excellent exercise to get going in Python. Of course, when you start out, it takes some learning and patience to find your way around, but I can assure you that Python is a most terrific tool!

    Perhaps the best way to tackle your problem is to look at:

    Python Cookbook, 1st and 2nd editions, by Alex Martelli & others. Publisher: O'Reilly.
    Look in the index for "split". You will find everything you need.

    Good going,
    nono
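
A small illustration of the `split` approach, using made-up lines. With no argument, `str.split()` splits on runs of whitespace; the caveat worth knowing for blank-heavy files like these is that an entirely blank field disappears and every later field shifts position.

```python
# Made-up whitespace-separated line: split() with no argument splits on
# runs of blanks and ignores leading/trailing whitespace.
line = "20051012  093015  ES  1234.50  10"
print(line.split())
# ['20051012', '093015', 'ES', '1234.50', '10']

# Same layout, but the symbol field is entirely blank: split() drops it,
# so the price now sits where the symbol was expected.
line_blank = "20051012  093015      1234.50  10"
print(line_blank.split())
# ['20051012', '093015', '1234.50', '10']

# With an explicit separator, empty fields are kept instead of dropped.
print("20051012,093015,,1234.50,10".split(","))
# ['20051012', '093015', '', '1234.50', '10']
```

So `split()` works well when every field is always present; for fixed-width files with blank fields, slicing by column position is the safer choice.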
     
  4. Seconded. Beyond the basic documentation, this is really all you need.
     
  5. Hi guys,

    I just installed Python on my backup PC this week and am still far from doing anything really useful.

    At the moment I only need some simple interfacing, but I am wondering: is there any charting software out there that uses Python?

    Thank you
    Maria
     
  6. Dad was a database specialist. He reckoned that there was too much hype about relational databases and that in a lot of cases there is no need for one. Performance-wise, a relational database is a dog. His notes state that in trading, with its sequential stream of data, there is no need for a relational database, and that you'll handicap yourself severely (performance-wise) if you use one.

    Maria
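
To make the "sequential stream" point concrete, here is one hypothetical alternative to a relational database: fixed-size binary records appended to a flat file with the stdlib `struct` module. This is an illustration only, not the approach from the notes described above; the record layout (seconds since midnight, price, quantity) and the file name are assumptions.

```python
import os
import struct
import tempfile

# Hypothetical record layout: seconds since midnight (uint32), price
# (float64), quantity (uint32). "<" = little-endian, no padding: 16 bytes.
RECORD = struct.Struct("<IdI")

def append_ticks(path, ticks):
    """Append (seconds, price, qty) tuples to the end of a flat binary file."""
    with open(path, "ab") as f:
        for tick in ticks:
            f.write(RECORD.pack(*tick))

def read_ticks(path):
    """Read every record back in the order it was written."""
    with open(path, "rb") as f:
        data = f.read()
    return list(RECORD.iter_unpack(data))

def read_tick(path, n):
    """Jump straight to record n: fixed-size records make the offset trivial."""
    with open(path, "rb") as f:
        f.seek(n * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))

path = os.path.join(tempfile.mkdtemp(), "es_ticks.bin")  # hypothetical file
append_ticks(path, [(34215, 1234.5, 10), (34216, 1234.75, 3)])
append_ticks(path, [(34217, 1234.5, 7)])
print(read_ticks(path))
# [(34215, 1234.5, 10), (34216, 1234.75, 3), (34217, 1234.5, 7)]
```

Because every record is `RECORD.size` bytes, record n starts at byte `n * RECORD.size`, so both a full sequential scan and a direct jump are cheap, with no query engine in between.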
     
  7. This is some GREAT info!! Thanks a lot.

    BTW, a hint for new Python users wanting to use PyTables: don't forget to get hdf5dll.dll, zlib1.dll and szipdll.dll from the HDF5 website and copy them into your system directory.

    It's in the install instructions, but if you're like me and jump in without reading the directions you might not catch it.
     
  8. You could also try Access.
    Use an import query and the file will be imported exactly as you want.
    The advantage is that it may be easier than Python, and the data can easily be exported in almost any format.
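
For comparison with Access's export, here is roughly what the export step looks like in Python with the stdlib `csv` module. The rows and field names are hypothetical; `io.StringIO` stands in for a real output file so the result is easy to inspect.

```python
import csv
import io

def export_csv(rows, fieldnames, out):
    """Write row dicts to an open text file (or file-like object) as CSV."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical parsed ticks.
rows = [
    {"time": "093015", "price": 1234.5, "qty": 10},
    {"time": "093016", "price": 1234.75, "qty": 3},
]

buf = io.StringIO()
export_csv(rows, ["time", "price", "qty"], buf)
print(buf.getvalue())
```

When writing to an actual file, open it with something like `open("ticks.csv", "w", newline="")` so the `csv` module controls the line endings itself.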
     
  9. True, spike,

    Access is a good tool for processing CME data. If you don't like Windows, Access seems to run great on Wine. Easier than Python? That depends on the mechanic.

    Be good,
    nononsense
     
    #10     Oct 17, 2005