HDF5 Layout for Multiple Stocks

Discussion in 'App Development' started by clearinghouse, Aug 31, 2011.

  1. The website you referenced talks about Amazon and map-reduce.
     
    #21     Sep 14, 2011
  2. I've been wanting to take some time to study possible uses of map reduce, but have never gotten around to it. Sounds like a powerful setup.
     
    #22     Sep 14, 2011
  3. At first I thought map-reduce was a really difficult topic, but as I dug into it, what I liked was that it gave me a clean way to reshape the same data in different ways.

    You have your data in format X, but when data mining you want it in format Y, which involves conditionals (ifs and the like). Normally you would write code to produce that new format yourself; with map-reduce, the server does that transformation for you.

    The end result is that several questions get answered at once, and the data ends up in a format that is easy to process. I use map-reduce to scan equities and assign each one a score on a scale of 0 to 6.

    In most cases map-reduce comes built into NoSQL databases, but there is no reason you can't write it yourself, especially when you have access to the source code, as you do with HDF5.
     
    #23     Sep 15, 2011
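The map-reduce scoring described in the post above can be sketched in plain Python. This is a minimal single-process illustration, not the poster's actual scanner: the six yes/no criteria, the input format (each symbol mapped to a list of closing prices, oldest first) and the function names are all hypothetical. The map step emits one (symbol, point) pair per criterion, a shuffle groups the pairs by symbol, and the reduce step sums them into a 0-6 score. A NoSQL server or a Hadoop-style framework would run the same kind of map and reduce functions, distributed across machines.

```python
from collections import defaultdict
from functools import reduce
from statistics import mean

# Six HYPOTHETICAL yes/no rules, one point each, giving a score from 0 to 6.
# Each rule takes a list of closes (oldest first, at least four values).
CRITERIA = [
    lambda c: c[-1] > c[-2],                  # up on the last day
    lambda c: c[-1] > c[0],                   # up over the whole window
    lambda c: c[-1] >= max(c),                # closing at the window high
    lambda c: c[-1] > mean(c),                # above the window average
    lambda c: c[-1] > c[-2] > c[-3] > c[-4],  # three consecutive up days
    lambda c: c[-1] >= 10.0,                  # hypothetical price floor
]


def map_phase(symbol, closes):
    """Map: emit one (symbol, 0-or-1) pair per criterion."""
    for rule in CRITERIA:
        yield symbol, int(rule(closes))


def shuffle(pairs):
    """Group emitted values by key, as a map-reduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Reduce: sum the points emitted for one symbol."""
    return reduce(lambda acc, v: acc + v, values, 0)


def score_universe(universe):
    """Run map, shuffle and reduce over {symbol: closes}."""
    pairs = (pair for sym, closes in universe.items()
             for pair in map_phase(sym, closes))
    return {sym: reduce_phase(vals) for sym, vals in shuffle(pairs).items()}


if __name__ == "__main__":
    # Hypothetical closing prices, oldest first.
    universe = {
        "AAA": [10.0, 10.5, 11.0, 11.5, 12.0],
        "BBB": [20.0, 19.0, 18.5, 18.0, 17.0],
    }
    print(score_universe(universe))  # {'AAA': 6, 'BBB': 1}
```

With the sample data, `AAA` meets all six made-up rules and `BBB` meets only the price floor. Adding a question to the scan means adding one rule to `CRITERIA`; the shuffle and reduce steps stay the same.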
  4. ndrd

    Totally agree. Why do people feel the need to argue every stupid point to death on here, at the total expense of the thread itself?
    If you prefer SQL, great; go make a new thread.

    I would love to hear more on the actual topic, since to me the biggest problem with HDF5 is the lack of educational material for the non-specialist.
     
    #24     Sep 15, 2011
  5. First, great sub-forum, ModulusFE. It has been years since I've even wanted to check this forum on a regular basis. It would be nice if people interested in this sub-forum self-regulated and tried not to drift too far off topic. If you like other things, that is great, but there is just not enough information on HDF5 for trading to clog up a good thread with debate, even good debate.

    I've wanted to learn HDF5 for years but always run into a wall and get uninspired.

    Maybe we can try "one directory per day, with each column broken out into a separate file" on a simple structure using Yahoo data.
     
    #25     Sep 15, 2011
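The "one directory per day, each column in its own file" idea can be tried with nothing but the Python standard library. The sketch below is an assumption-laden illustration: the Yahoo-style daily rows (symbol, open, high, low, close, volume) are made up, and the file names (`symbol.txt`, `<column>.bin`) and helper functions are hypothetical. It uses raw binary files rather than HDF5; in HDF5 the same shape would map onto one group per day with one dataset per column.

```python
import os
import tempfile
from array import array

# Column order in each HYPOTHETICAL input row after the symbol.
COLUMNS = ("open", "high", "low", "close", "volume")


def _typecode(name):
    # int64 for volume, float64 for prices; files use native byte order.
    return "q" if name == "volume" else "d"


def write_day(root, day, rows):
    """Write one day of rows as <root>/<day>/<column>.bin plus symbol.txt."""
    day_dir = os.path.join(root, day)
    os.makedirs(day_dir, exist_ok=True)
    # Every column file shares the same row order, so symbols are stored once.
    with open(os.path.join(day_dir, "symbol.txt"), "w") as f:
        f.write("\n".join(r[0] for r in rows))
    for i, name in enumerate(COLUMNS, start=1):
        with open(os.path.join(day_dir, name + ".bin"), "wb") as f:
            array(_typecode(name), (r[i] for r in rows)).tofile(f)


def read_column(root, day, name):
    """Read a single column for one day without touching the other files."""
    col = array(_typecode(name))
    with open(os.path.join(root, day, name + ".bin"), "rb") as f:
        col.frombytes(f.read())
    return col.tolist()


def read_symbols(root, day):
    """Return the symbols for one day, in the same order as the columns."""
    with open(os.path.join(root, day, "symbol.txt")) as f:
        return f.read().split("\n")


if __name__ == "__main__":
    rows = [("AAA", 10.0, 10.6, 9.9, 10.5, 120000),
            ("BBB", 20.0, 20.1, 18.9, 19.0, 85000)]
    with tempfile.TemporaryDirectory() as root:
        write_day(root, "2011-09-15", rows)
        print(read_column(root, "2011-09-15", "close"))  # [10.5, 19.0]
```

The appeal of this layout is that a scan over closes only reads `close.bin` for each day; volume and the other columns never leave disk.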
  6. This is a good idea for a test case. If I manage to get some free time after the close, I'll give it a shot before trying something more complex, and maybe write back to the thread.
     
    #26     Sep 15, 2011
  7. rosy2

    I use HDF5 for futures: one file per symbol per day, capturing all messages (N levels deep). For single stocks you would need a lot of space. As far as layout goes, it's the book.

    You can look into:
    1) SSDs
    2) a file system like Ceph
    3) turning off OS bookkeeping that writes to disk on every file access, such as last-access time (atime) updates
     
    #27     Sep 16, 2011
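The "one file per symbol per day, N levels deep" layout described in the last post can be sketched as fixed-width book-snapshot records. This is a flat-file stand-in written with Python's `struct` module, not the poster's HDF5 code; in HDF5 the same record would typically be a compound-datatype dataset inside each daily file. The depth (`LEVELS = 5`), the record fields, the `<root>/<symbol>/<day>.bin` path scheme and the `ESU1` symbol are all hypothetical choices for illustration.

```python
import os
import struct
import tempfile

LEVELS = 5  # HYPOTHETICAL book depth N
# One record: int64 timestamp (ns), then LEVELS x (bid price, bid size),
# then LEVELS x (ask price, ask size). "<" = little-endian, no padding.
RECORD = struct.Struct("<q" + "dq" * (2 * LEVELS))


def day_path(root, symbol, day):
    """One file per symbol per day: <root>/<symbol>/<day>.bin"""
    return os.path.join(root, symbol, day + ".bin")


def append_snapshot(path, ts_ns, bids, asks):
    """Append one snapshot; bids/asks are lists of (price, int size), best first."""
    if len(bids) != LEVELS or len(asks) != LEVELS:
        raise ValueError("expected %d levels per side" % LEVELS)
    flat = [ts_ns]
    for price, size in bids + asks:
        flat.extend((price, size))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "ab") as f:
        f.write(RECORD.pack(*flat))


def read_snapshots(path):
    """Yield (ts_ns, bids, asks) for every fixed-width record in the file."""
    with open(path, "rb") as f:
        data = f.read()
    for fields in RECORD.iter_unpack(data):
        ts, rest = fields[0], fields[1:]
        levels = list(zip(rest[0::2], rest[1::2]))  # (price, size) pairs
        yield ts, levels[:LEVELS], levels[LEVELS:]


if __name__ == "__main__":
    bids = [(100.0 - 0.25 * i, 10 + i) for i in range(LEVELS)]
    asks = [(100.25 + 0.25 * i, 12 + i) for i in range(LEVELS)]
    with tempfile.TemporaryDirectory() as root:
        path = day_path(root, "ESU1", "20110916")
        append_snapshot(path, 1_316_000_000_000_000_000, bids, asks)
        for ts, b, a in read_snapshots(path):
            print(ts, b[0], a[0])  # 1316000000000000000 (100.0, 10) (100.25, 12)
```

Each record here is 8 + 16 × 2N bytes (168 bytes at N = 5), which makes the poster's point about space concrete: the file grows by that amount for every book update, multiplied by every symbol traded that day.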