Server for DB/Stats

Discussion in 'Hardware' started by sle, Dec 23, 2011.

  1. sle

    I am trying to buy a server for data storage and statistical modeling. Since I am not much of a hardware maven, I could use some advice. Here are the basic inputs:

    (1) I am going to colocate it. Not necessarily in an R/T colocation facility, just somewhere away from my home to increase redundancy.

    (2) The OS will probably be some flavor of Linux, most likely Red Hat or one of the Debian derivatives.

    (3) The database will consist of 1-minute data for global equities and futures/indices, plus EOD data for options and corporate bonds. If I had to guess, we are talking about 5+ TB of data.

    (4) A fair number of updates will happen in real time, but I am not dependent on low latency or fast updates.

    (5) Every now and then the server will run rather large statistical calculations, and the data sets have to be held in memory, so I reckon I'd need 5+ GB of RAM.

    (6) I am paying for this out of my own pocket, so I have to control costs (I know I can buy a 32 GB / 10 TB Dell server for $10k).
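    Points (3) and (5) are both arithmetic that is easy to rerun as the instrument universe changes. Below is a minimal Python sketch; the instrument counts, the 390-minute session, 252 trading days, ten years of history and the 48-byte record layout are all hypothetical assumptions, not numbers given in this thread.

```python
# Back-of-envelope sizing for disk (point 3) and RAM (point 5).
# Every input here is a hypothetical placeholder, not a figure from the thread.

MINUTES_PER_SESSION = 390  # assumed 6.5-hour session; real sessions vary by exchange
TRADING_DAYS = 252         # assumed trading days per year


def bar_storage_bytes(n_instruments, years, bytes_per_bar=48):
    """Raw disk for fixed-width 1-minute bars, before indexes, compression or copies.

    48 bytes assumes an 8-byte timestamp, four 8-byte prices and an 8-byte volume.
    """
    return n_instruments * MINUTES_PER_SESSION * TRADING_DAYS * years * bytes_per_bar


def matrix_bytes(rows, cols, bytes_per_value=8):
    """RAM for one dense float64 matrix, ignoring interpreter and library overhead."""
    return rows * cols * bytes_per_value


# Hypothetical universe: 20,000 instruments, 10 years of bars on disk,
# and one year of 1-minute values for 3,000 instruments held in memory.
disk = bar_storage_bytes(n_instruments=20_000, years=10)
ram = matrix_bytes(rows=MINUTES_PER_SESSION * TRADING_DAYS, cols=3_000)
print(f"disk: {disk / 1e12:.2f} TB raw; in-memory matrix: {ram / 2**30:.2f} GiB")
```

    Indexes, the EOD options and bond data, and any replicas all add to the disk figure, and many statistical routines make intermediate copies of their inputs, so the RAM figure is a floor rather than a budget.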
     
  2. I sent you two PMs. I sell traders VMs that do exactly what you describe.

    You have a few misconceptions, though. Your tick database can be anywhere. The daily config files are less than 2 MB. You can host data anywhere in the world and not need colo.

    You said colo for redundancy: why not have a second box at home? If your house burns down, are you really going to worry about the strategy?

    In terms of HW, run an Ubuntu Server LTS release: 10.04, or whatever the latest Ubuntu LTS is. Put it on a dual-core @ 3.2-4.0 GHz with 3 GB of RAM and you'll be fine. NICs are what you will worry about next.

    OS: Red Hat, Debian... good luck and more power to you. Ubuntu Server LTS is what I see more often than not.

    You are looking at 25-30 TB of data, and you have no idea how to store or manage it.

    #4: I have no idea what you are talking about; please describe.

    #5: RAM is sized to system I/O. 12-24 GB of DDR3 is cheap these days, especially in a VM.

    Don't ever buy that crap from Dell. Buy barebones on eBay or from the Dell factory (or buy Supermicro), but don't ever buy new Dell stuff.

    I have no idea who you are, but it seems that you need some guidance and counseling. If you are even remotely serious, please let me know before you buy. I do this for a living, but more importantly I can at least educate you so that you know how much of a premium you are paying.
     
  3. rosy2

    I would use a cloud service.
     
  4. macdice

    For what it's worth, here's my low-budget hobbyist market hacker approach. For storing historical data and doing various kinds of crunching and backtesting, I have a couple of super cheap machines at home, filled with the largest consumer-grade 5400 RPM 'eco' drives I could get (before the Thailand floods we were at around £50 per 2 TB drive here on my island) and configured as software RAID arrays. Since I don't need much real CPU grunt (well, I do actually, but I am prepared to wait!), I use HP MicroServers, which are small black cubes that can hold 5 x 3.5" SATA drives and 8 GB of RAM each but have sluggish low-power AMD64 CPUs. The machines set me back only a couple of hundred quid each plus disks and RAM, and they quietly whir away on a bookshelf in my hallway, drawing something like 50 W each when fully loaded with spinning metal.

    If you're paranoid about losing your data (and let's face it, depending on how you go about it, the cost of acquiring the data over a few years may vastly exceed that of the hardware it's running on, and/or the data may be hard or impossible to replace), you could arrange to stick a machine at a friend's house for offsite backups or replication. Alternatively, if you need even more space and/or CPU grunt and are allowed ugly, loud hardware in your house, you could easily build power-hungry towers crammed with squillions of drives and controllers out of cheap bits.
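    How much usable space a 5-bay box like this yields depends on the RAID level, which isn't stated here. Below is a hypothetical Python sketch of the standard usable-capacity rules, applied to five 2 TB drives at the £50 per drive quoted above; the choice of RAID 5 and RAID 6 for the comparison is an assumption.

```python
def usable_tb(level, n_drives, drive_tb):
    """Usable capacity of an array of identical drives under common RAID levels."""
    minimum = {0: 1, 1: 2, 5: 3, 6: 4, 10: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if n_drives < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} drives")
    if level == 0:
        return n_drives * drive_tb        # striping, no redundancy
    if level == 1:
        return drive_tb                   # every drive holds a full copy
    if level == 5:
        return (n_drives - 1) * drive_tb  # one drive's worth of parity
    if level == 6:
        return (n_drives - 2) * drive_tb  # two drives' worth of parity
    if n_drives % 2:
        raise ValueError("classic RAID 1+0 needs an even number of drives")
    return n_drives // 2 * drive_tb       # striped mirror pairs


DRIVE_TB, DRIVE_GBP, BAYS = 2, 50, 5
for level in (5, 6):
    tb = usable_tb(level, BAYS, DRIVE_TB)
    print(f"RAID {level}: {tb} TB usable, £{BAYS * DRIVE_GBP / tb:.2f} per usable TB")
```

    The even-drive check reflects classic RAID 1+0; Linux md's own raid10 layouts are more flexible about drive counts.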

    As for running automated trading systems, that's a different kettle of fish altogether. For that, it's probably worth renting a VM or dedicated server near the exchange or broker you're interested in: something with fast I/O and great connectivity, but probably not mountains of disk. You can pull all data back to home base after hours, or stream it throughout the day. I'm not sure about RAM; it all depends on what you're doing (the same work might need almost no memory in a tightly written C++ program but gigabytes in Java). It's a different profile of machine altogether from a tick database and backtesting/munching box.

    I personally use Debian for headless boxes and love it (but many of my Debian-using friends have defected to Ubuntu and swear by it).

    That's my penny pinching 2c.
     
  5. nitro

  6. 10 TB should cost no more than $2,500-$3k if you put it in a server that's going to be rack-mounted.

    There are also services like Ubuntu One or Backblaze, which offer very cheap unlimited or semi-unlimited network/cloud storage, but access speed will be an issue (perhaps use them just for backing up data, not for daily access).
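    To put a number on the access-speed caveat: the time to push a data set through a home uplink is simple arithmetic. A minimal Python sketch; the 10 Mbit/s uplink and the 80% efficiency factor are hypothetical assumptions, and the 5 TB figure is the guess from the first post.

```python
def transfer_days(data_tb, link_mbit_s, efficiency=0.8):
    """Days to move data_tb terabytes (10**12 bytes) over a link_mbit_s megabit/s link.

    efficiency is an assumed fraction of the nominal rate actually achieved
    (protocol overhead, other traffic, provider throttling).
    """
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbit_s * 1e6 * efficiency)
    return seconds / 86_400


# Hypothetical example: the initial 5 TB upload over an assumed 10 Mbit/s uplink.
print(f"{transfer_days(5, 10):.1f} days to upload 5 TB at 10 Mbit/s")
```

    The initial upload (and any full restore) is the slow part; once the first copy exists, only new data has to cross the link.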

    The prices quoted here are way off the mark. For reference, you can build a 135 TB storage server for less than $8k. $10-$15k should get you ~250 TB.
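    Dividing each price in this thread by its capacity makes the gap explicit. A small Python sketch using only the figures quoted in these posts, taking the upper end of each range; note the Dell quote includes CPU and 32 GB of RAM, so it is not a pure storage price.

```python
def usd_per_tb(usd, tb):
    """Price per terabyte of raw capacity."""
    return usd / tb


# (label, price in USD, capacity in TB) as quoted in this thread.
quotes = [
    ("Dell server, 32 GB / 10 TB (first post)", 10_000, 10),
    ("Rack-mount 10 TB, upper estimate", 3_000, 10),
    ("Backblaze-style pod, 135 TB", 8_000, 135),
    ("~250 TB, upper estimate", 15_000, 250),
]
for label, usd, tb in quotes:
    print(f"{label}: ${usd_per_tb(usd, tb):,.0f}/TB")
```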

    http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/

    These two threads are moving in tandem but I posted here as well because there were prices and budgets mentioned.

    As I said in the other thread, $2k gets you a 10 TB array that you can keep locally, and then for $5/month you can pay for the Backblaze service and use it as a cloud backup solution. Just upload incrementally.
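    Backblaze's own client handles incremental uploads itself. If you roll your own offsite copy instead, the core idea is to send only files whose size or modification time changed since the last run. Below is a hypothetical Python sketch of that selection step; the function names and the JSON manifest are assumptions, the upload itself is left out, and deleted files are not tracked.

```python
import json
from pathlib import Path


def load_manifest(path):
    """Read the previous run's manifest; a missing file means 'upload everything'."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}


def save_manifest(path, manifest):
    Path(path).write_text(json.dumps(manifest))


def changed_files(root, manifest):
    """Return (changed, new_manifest) for every regular file under root.

    A file counts as changed if it is new or its size or mtime differs from the
    manifest. Signatures are lists so they compare equal after a JSON round trip.
    """
    root = Path(root)
    changed, new_manifest = [], {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        st = path.stat()
        sig = [st.st_size, st.st_mtime_ns]
        new_manifest[rel] = sig
        if manifest.get(rel) != sig:
            changed.append(rel)
    return changed, new_manifest
```

    Keep the manifest file outside the data directory, upload each path in `changed`, and call `save_manifest` only after the upload succeeds, so a failed run is simply retried next time.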

    I also host databases but I'm not here to sell my services, just to help steer people in the right direction in terms of HW and budgets.