Best way to hack up an RDBMS to approximate something like kdb for poor/dumb people

Discussion in 'Automated Trading' started by garchbrooks, Mar 23, 2010.

  1. Cloud servers can scale in a similar way.

    They are relatively cheap, and Microsoft even provides a free Azure cloud server to try.

    http://www.microsoft.com/windowsazure/offers/

    Take advantage of this offer to try a limited amount of the Windows Azure platform at no charge. The subscription includes a base level of monthly compute hours, storage, data transfers, a SQL Azure database, Access Control transactions and Service Bus connections at no charge.

     
    #41     Mar 24, 2010
  2. rosy2

    OK, I didn't notice that. But using a relational database for time-series data has been discussed in other threads; it's more trouble than it's worth.

    Look into Python, PyTables, scikits.timeseries, SciPy, and HDF5. With those you can easily window over a day of ticks.
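    The storage side (PyTables/HDF5) is not shown here. As a minimal sketch of what "windowing over a day of ticks" means, assuming ticks arrive as (seconds-since-midnight, price, size) tuples already sorted by time, the plain-Python function below groups them into fixed-width OHLCV bars. The tick layout, the 60-second default window, and the name `window_ticks` are hypothetical illustrations, not part of any of the libraries mentioned.

```python
from typing import Iterable, List, Tuple

# Assumed tick layout: (seconds since midnight, price, size)
Tick = Tuple[float, float, int]
# Bar layout: (window start, open, high, low, close, volume)
Bar = Tuple[int, float, float, float, float, int]


def window_ticks(ticks: Iterable[Tick], width: int = 60) -> List[Bar]:
    """Group time-sorted ticks into fixed-width OHLCV bars (hypothetical helper)."""
    bars: List[Bar] = []
    current = None  # [start, open, high, low, close, volume] of the open window
    for ts, price, size in ticks:
        start = int(ts // width) * width  # left edge of the window this tick falls in
        if current is None or start != current[0]:
            if current is not None:
                bars.append(tuple(current))  # close out the previous window
            current = [start, price, price, price, price, size]
        else:
            current[2] = max(current[2], price)  # high
            current[3] = min(current[3], price)  # low
            current[4] = price                   # close = last price seen
            current[5] += size                   # volume
    if current is not None:
        bars.append(tuple(current))
    return bars


ticks = [(0.5, 10.0, 100), (30.0, 10.5, 200), (59.9, 9.8, 50), (61.0, 10.1, 300)]
print(window_ticks(ticks))
# [(0, 10.0, 10.5, 9.8, 9.8, 350), (60, 10.1, 10.1, 10.1, 10.1, 300)]
```

    Windows with no ticks simply produce no bar here; a real pipeline might instead carry the previous close forward.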
     
    #42     Mar 24, 2010
  3. I figured out the problem with Infobright. My CSV file had a space after each comma, and for whatever reason, the loader honored the space.

    If anyone out there is ever googling for errors with Infobright's LOAD DATA INFILE: be skeptical of the ENCLOSED BY clause, and format your CSV more carefully.
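    A minimal sketch of the cleanup described above, assuming the raw file is comma-separated with stray spaces after the commas: Python's standard `csv` module can read it with `skipinitialspace=True` and write a clean copy before the file is handed to LOAD DATA INFILE. The name `clean_csv` and the sample rows (made-up values) are hypothetical; nothing here is Infobright-specific.

```python
import csv
import io


def clean_csv(raw: str) -> str:
    """Return CSV text with whitespace around each field removed (hypothetical helper)."""
    # skipinitialspace=True makes the reader drop spaces that follow each delimiter,
    # so quoted fields written as `, "x"` are still recognized as quoted.
    reader = csv.reader(io.StringIO(raw), skipinitialspace=True)
    out = io.StringIO()
    # "\n" line endings; the loader's LINES TERMINATED BY setting must match this.
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(field.strip() for field in row)  # also trims trailing spaces
    return out.getvalue()


raw = "MSFT, 2010-03-24, 1.23\nIBM, 2010-03-24, 4.56\n"  # made-up sample rows
print(clean_csv(raw), end="")
# MSFT,2010-03-24,1.23
# IBM,2010-03-24,4.56
```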

    And by the way, Infobright solved my query-speed issue. Queries return very quickly, with no need for separate tables. This product is the solution for me.

    Thank you, everyone.
     
    #43     Mar 24, 2010
  4. There is another open-source product called Pentaho Data Integration (Kettle) that can be used to scrub the data and then load it into Infobright. It has worked well for everything I've used it for so far.

    http://community.pentaho.com/
     
    #44     Mar 25, 2010
  5. The concept of “cloud servers” has been around for decades. As a bank DBA manager, I was inundated with requests to outsource processing of the bank's data. IBM and Oracle have spent over $50 billion and $30 billion, respectively, since 1995 to make sure that if a business did not have the back-room talent to do IT, they would do it for them. This gives these companies great profit potential, because you are initially “hooked” on the cheap cost of doing business. But as contracts are renewed, the cost can become extremely high.

    Cloud computing on smaller computers has grown out of the same concepts that IBM and Oracle pioneered for mainframes and midrange systems. The first widely accessible cloud was the Elastic Compute Cloud (EC2) infrastructure service from Amazon Web Services. Amazon Web Services, with EC2 and S3, was a pioneer in renting computers to small businesses.

    The most important contribution to cloud computing since Amazon Web Services has been the emergence of "killer apps" from leading technology giants such as Microsoft and Google. When these companies deliver services in a way that is reliable and easy to consume, the knock-on effect to the industry as a whole is a wider general acceptance of online services. I agree that cloud computing will ultimately transform today's computing landscape.

    Microsoft is now trying to do with small business what IBM and Oracle did with large business. Microsoft’s objective is to get you ‘hooked’ with special introductory offers. Then, once they have all your data, you will soon be inundated with subscription offers and their MSDN Premium service. In garchbrooks' case, he will be able to get the firepower he needs to do what a mainframe does. The question is: “Will he end up paying more in subscription fees in the long run than it would cost to buy it outright and do it himself?” Remember, the Azure offer is only valid for 3 months. Microsoft is betting there is no way you can put up an app in 3 months, do all of your computing, and then terminate their services.


     
    #45     Mar 25, 2010
  6. Cloud server is just BS marketing hype. For God's sake, I wish someone would kill these morons who come up with these words just to make it sound new. It's just distributed computing via remote procedure calls, people!
     
    #46     Mar 25, 2010
  7. Yeah, except all the hardware and networking is under the hood, abstracted in such a way that it's easily usable/available over the internet... that's what's new.
     
    #47     Mar 25, 2010
  8. FYI: the MS cloud was just a suggestion for a free trial.

    For traders: an interesting alternative to colocating at Equinix is www.softlayer.com

    They are expanding their cloud with a point of presence at Equinix Chicago (Cermak) with two 10 Gbps connections.


    CloudLayer Computing Prices: Public Cloud

    Base unit option                           Hourly    Monthly
    1 core + 1 GB RAM + 100 GB SAN storage     $0.15     $99.00
    2 cores + 2 GB RAM + 100 GB SAN storage    $0.25     $159.00
    4 cores + 4 GB RAM + 100 GB SAN storage    $0.35     $199.00
    8 cores + 8 GB RAM + 100 GB SAN storage    $0.50     $299.00

    It's hard to beat the scalability and their relatively low-cost subscription model, billed either at a fixed monthly rate or by the hour.
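    Using the prices in the CloudLayer table, a quick check of when the flat monthly rate beats hourly billing is to divide the monthly price by the hourly price. This sketch assumes an average month of 24 × 365 / 12 = 730 hours; the names `PLANS` and `break_even_hours` are just illustrative.

```python
# Prices copied from the CloudLayer table: (plan, hourly $, monthly $)
PLANS = [
    ("1 core / 1 GB", 0.15, 99.00),
    ("2 cores / 2 GB", 0.25, 159.00),
    ("4 cores / 4 GB", 0.35, 199.00),
    ("8 cores / 8 GB", 0.50, 299.00),
]
HOURS_PER_MONTH = 24 * 365 / 12  # average month, 730 hours (assumption)


def break_even_hours(hourly: float, monthly: float) -> float:
    """Hours of use per month above which the monthly plan is cheaper."""
    return monthly / hourly


for name, hourly, monthly in PLANS:
    hours = break_even_hours(hourly, monthly)
    share = hours / HOURS_PER_MONTH
    print(f"{name}: monthly wins after {hours:.0f} h ({share:.0%} of the month)")
```

    Every break-even point (roughly 569 to 660 hours) falls below 730, so a box left running around the clock is cheaper on the monthly plan in every tier; hourly billing only pays off for intermittent use.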
     
    #48     Mar 25, 2010
  9. promagma

    I am looking forward to trying InfiniDB when it is released for Windows, probably in July. It is similar to Infobright but supports insert/delete/update.
     
    #49     Mar 27, 2010
  10. I am still trying to understand how using Infobright, or any other column-based database, is helping you. A good paper on this topic can be found here. Quoting from that article: "If you're bringing back all the columns, a column-store database isn't going to perform any better than a row-store DBMS, but analytic applications are typically looking at all rows and only a few columns," says Gartner analyst Donald Feinberg. "When you put that type of application on a column-store DBMS, it outperforms anything that doesn't take a column-store approach."

    Now, your typical query would bring back all data from all rows all the time, wouldn't it? The exceptions are when you are just comparing close prices of two or more instruments, or querying close prices only; in those cases you can leave out the OHL data. But for building bars, etc., there is theoretically no benefit.
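    To make the row-store versus column-store argument concrete, here is a toy sketch that stores the same bars both ways and counts how many stored values a query has to read. Storage is modeled as in-memory Python lists, so the counts only illustrate the idea (real engines read disk pages); the column names, sample rows (made-up values), and the functions `scan_rows` and `scan_columns` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

COLUMNS = ("ticker", "date", "open", "high", "low", "close")
Row = Tuple[str, str, float, float, float, float]


def scan_rows(rows: List[Row], wanted: Sequence[str]) -> Tuple[List[tuple], int]:
    """Row-store scan: each whole row is read, then the wanted fields are picked out."""
    idx = [COLUMNS.index(c) for c in wanted]
    touched = 0
    result = []
    for row in rows:
        touched += len(row)  # the entire row comes off storage
        result.append(tuple(row[i] for i in idx))
    return result, touched


def scan_columns(cols: Dict[str, list], wanted: Sequence[str]) -> Tuple[List[tuple], int]:
    """Column-store scan: only the arrays for the wanted columns are read."""
    arrays = [cols[c] for c in wanted]
    touched = sum(len(a) for a in arrays)
    return list(zip(*arrays)), touched


rows: List[Row] = [  # made-up sample bars
    ("MSFT", "2010-03-24", 1.0, 2.0, 0.5, 1.5),
    ("MSFT", "2010-03-25", 1.5, 2.5, 1.0, 2.0),
]
cols = {c: [r[i] for r in rows] for i, c in enumerate(COLUMNS)}

print(scan_rows(rows, ["close"]))     # ([(1.5,), (2.0,)], 12)
print(scan_columns(cols, ["close"]))  # ([(1.5,), (2.0,)], 2)
```

    Requesting all six columns makes both counts equal (12 here), which is the case in the quote where a column store has no advantage.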

    Maybe there is some benefit from the fact that some of the columns are compressed? (Again, see the article.)
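    On the compression question: column stores commonly use schemes such as run-length encoding, which work well on a column like ticker, where the same value repeats many times once the data is sorted. The sketch below shows plain run-length encoding on a made-up column. It illustrates the general technique only and is not a description of how Infobright compresses data internally; `rle_encode` and `rle_decode` are hypothetical names.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(column: List[str]) -> List[Tuple[str, int]]:
    """Collapse consecutive repeats into (value, run length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs: List[Tuple[str, int]]) -> List[str]:
    """Expand (value, run length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


ticker_col = ["IBM"] * 3 + ["MSFT"] * 5  # made-up column, sorted by ticker
encoded = rle_encode(ticker_col)
print(encoded)  # [('IBM', 3), ('MSFT', 5)]
```

    Eight stored values shrink to two pairs; the longer the runs, the bigger the saving. An unsorted column with frequent value changes gains little or nothing from this scheme.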

    I would be very interested in some hard numbers for comparison purposes. For example:
    a) total # of rows, total # of rows for MSFT, and total # of rows returned for "select * where ticker = 'MSFT' and from = thisdate and to = thatdate" (say, for 2 years)
    b) same as above, but comparing 2 stocks

    The time taken for the above queries, and of course your hardware.
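    For gathering the kind of numbers asked for above, a small timing harness keeps comparisons consistent. This sketch uses Python's built-in `sqlite3` module with an in-memory table of synthetic rows purely so it runs anywhere; to compare Infobright or another engine, you would run the same queries against that database instead. The table name `bars`, its columns, and the generated data are all hypothetical, and the timings it prints are not benchmark results.

```python
import sqlite3
import time
from datetime import date, timedelta


def build_sample_db(tickers, days: int) -> sqlite3.Connection:
    """Create an in-memory table of synthetic daily bars (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bars (ticker TEXT, day TEXT, open REAL, high REAL, low REAL, close REAL)"
    )
    start = date(2008, 1, 1)
    rows = [
        (t, (start + timedelta(days=d)).isoformat(), 1.0, 1.0, 1.0, 1.0)
        for t in tickers
        for d in range(days)
    ]
    conn.executemany("INSERT INTO bars VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def timed(conn: sqlite3.Connection, sql: str, params=()):
    """Run a query, fetch every row, and return (row count, elapsed seconds)."""
    t0 = time.perf_counter()
    n = len(conn.execute(sql, params).fetchall())
    return n, time.perf_counter() - t0


conn = build_sample_db(["MSFT", "IBM"], days=1000)
total, _ = timed(conn, "SELECT * FROM bars")
msft, _ = timed(conn, "SELECT * FROM bars WHERE ticker = ?", ("MSFT",))
# a) one stock over a 2-year window (ISO date strings compare correctly as text)
window, secs_a = timed(
    conn,
    "SELECT * FROM bars WHERE ticker = ? AND day BETWEEN ? AND ?",
    ("MSFT", "2008-01-01", "2009-12-31"),
)
# b) the same window for two stocks
pair, secs_b = timed(
    conn,
    "SELECT * FROM bars WHERE ticker IN (?, ?) AND day BETWEEN ? AND ?",
    ("MSFT", "IBM", "2008-01-01", "2009-12-31"),
)
print(total, msft, window, pair, f"{secs_a:.4f}s", f"{secs_b:.4f}s")
```

    Fetching every row inside the timer matters: some drivers return a cursor lazily, so timing only `execute` can understate the cost.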
     
    #50     Mar 28, 2010