How to build a 16TB box for less than $5000

Discussion in 'Hardware' started by WinstonTJ, Jul 20, 2012.

  1. dyson

    My server is split into 3 zfs pools: work (6 x 2 TB), media (8 x 2 TB), and temp (2 x 2 TB).

    The work pool contains my historical data and support files. I have been adding to my tick collection over time (now around 2.5 TB). Unless you are using tick-level data, daily and minute data take up only minimal space (< 200 GB).

    The media pool contains my DVD (mkv/x264) and CD (mp3 & flac) collections. The media pool is only for streaming to other devices; no encoding. All transcoding was done over the years on other (faster) machines. I would not recommend using an i3-21xxT for encoding.

    The temp pool contains my nightly backups, etc. This pool is for everyday use.

    Most home users probably have a simple home NAS of some kind (Drobo, etc.). I started out with a 4-bay NAS which I quickly outgrew.

    Do you run your simulations locally on the server or over the network on workstations?

    If over the network, do you experience any network saturation? Even with gigabit, I think network bandwidth will be more of a factor than access speed (read/write) depending on the number of processes trying to access the server simultaneously.

    When I planned my design, one key decision was to dedicate the server to storage I/O only. No user applications are run directly on the server. All CPU-intensive tasks are run on a desktop machine.

    I prefer software RAID (in my case zfs) over hardware. Not all hardware RAID cards are compatible with one another. If you don't have a spare of the same card (make/model), you may run into difficulty restoring a RAID 5/6 array after a hardware failure.

    I am running zfs raidz (RAID 5 equivalent). After factoring in raidz overhead, usable space is closer to 24 TB total. 26 TB is the free space reported by Linux, although different Linux tools/utilities will give different results when querying free space.
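
    For anyone checking the math, it works out roughly like this (a sketch, assuming each pool is a single raidz1 vdev and that the ~24 TB figure is binary TiB while the drives are sold in decimal TB; not exact zfs accounting):

        # Rough capacity math for the three pools above: one drive of parity
        # per raidz1 vdev, 2 TB (decimal) drives.
        DRIVE_TB = 2.0
        pools = {"work": 6, "media": 8, "temp": 2}   # drives per pool

        raw = sum(pools.values()) * DRIVE_TB
        usable = sum(n - 1 for n in pools.values()) * DRIVE_TB   # minus parity
        usable_tib = usable * 1e12 / 2**40           # same space in binary TiB

        print(f"raw {raw:.0f} TB, usable {usable:.0f} TB (~{usable_tib:.1f} TiB)")
        # -> raw 32 TB, usable 26 TB (~23.6 TiB)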
     
    #11     Jul 23, 2012
  2. Awesome thread WinstonTJ... Nice posts
     
    #12     Jul 23, 2012
  3. Thx TICKTRADER!

    Dyson,
    Hope I remember everything you asked. I don't know what it is, but with the way these things get used I see a fairly high initial rate of failure in drives. There is so much being pushed and pulled to and from them that stuff dies. I wasn't trying to imply that there is a difference in quality, just that the way the ones I build get used and abused requires fairly robust hardware.

    The vast majority of these guys all have databases with either tick-level data from the exchanges or tick-level data that they record from something like Activ, Bloomberg (B-Pipe) or NxCore. It depends on the feeds, but compressed it's between 2.5 and 50GB per day.
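
    To give a rough sense of what that adds up to over a year (my ballpark, assuming ~252 trading days):

        # Rough yearly growth from those daily (compressed) feed sizes.
        TRADING_DAYS = 252                     # assumed trading days per year
        for gb_per_day in (2.5, 10, 50):
            tb_per_year = gb_per_day * TRADING_DAYS / 1000
            print(f"{gb_per_day:>4} GB/day -> ~{tb_per_year:.1f} TB/year")
        #  2.5 GB/day -> ~0.6 TB/year
        #   50 GB/day -> ~12.6 TB/year

    Which is why these boxes fill up faster than people expect.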

    I'm very happy to say that I've not traded a single share since the end of June and I have no intention to ever trade again... :D So it's all other people's simulations and machines, and it depends on the machine. This one has really old CPUs and would get killed if it were used for local simulations, whereas the one I built last month (only 12TB) has two 6-core Intel CPUs and that firm's quant will be running the simulations on that machine. It varies, and hardware needs to be spec'd for the use. This one is going to be used more like your "temp pool": they will run their extractors or simulators locally and then push the results or larger files to this machine rather than keep them on their own.

    Most often the access is over the network. I either put 10-gig NICs in the database boxes or I buy one or two quad-gigabit cards and bond them. I'll usually put a dedicated NIC into the quant's or trader's local machine, or I'll put in a dual NIC and bond that. If they are saturating two or three bonded gigabit connections, they're already past what their local machine's drives could read or write anyway.
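
    Rough numbers to put that in perspective (assumed typical sequential throughput for 2012-era hardware, not benchmarks off these boxes):

        # Bonded gigabit vs. what a single local drive can actually move.
        GIGABIT_MBS = 1000 / 8      # ~125 MB/s per link, ignoring overhead
        HDD_MBS = 130               # assumed 7200 RPM sequential rate
        SSD_MBS = 450               # assumed SATA SSD sequential rate

        for nics in (1, 2, 3):
            net = nics * GIGABIT_MBS
            print(f"{nics}x gigabit bonded ~{net:.0f} MB/s "
                  f"(single HDD ~{HDD_MBS} MB/s, SATA SSD ~{SSD_MBS} MB/s)")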

    Another great workaround I have for all of that over-the-network type BS is virtual machines. If you put a 16TB database and a virtual machine host server right next to each other, you only need two 10G NICs between them; the guests all get 10G (virtual) connections inside the host.

    I didn't mean that the local machine would necessarily be running processes, just that if I have a simulation going (let's say 5 years of data) and you have two extractors running, trying to isolate 6 months of two pairs (all running from our local machines), that's three processes accessing the database server. Add one or two more users and before you know it there are 6-7 processes telling the drives what to do.

    RE: Hardware RAID vs. Software, I agree, backups and spares are a must. That adds to the overall cost of the machine but at least you know you have a spare when the time comes.

    Are you really running RAID5 and don't have a spare on hand? I have a client who wanted to go a little inexpensive and run RAID5 over RAID6, and I was freaking out when a drive died and the thing was rebuilding. It takes a LONG TIME to rebuild a 2TB hard drive. I guess with ZFS you can just shrink the total available space, right? With hardware RAID, if you lose a drive you'd better have a replacement quickly.
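
    Just to put a number on "a LONG TIME" - a rough estimate, assuming sustained rebuild rates somewhere between 50 and 150 MB/s (real-world tends toward the slow end because the array is still serving reads):

        # Ballpark rebuild time for one 2 TB drive at assumed sustained rates.
        DRIVE_BYTES = 2e12
        for mb_per_s in (50, 100, 150):
            hours = DRIVE_BYTES / (mb_per_s * 1e6) / 3600
            print(f"{mb_per_s} MB/s -> ~{hours:.1f} hours")
        # 50 MB/s -> ~11.1 h, 100 MB/s -> ~5.6 h, 150 MB/s -> ~3.7 h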

    I have about three boxes sitting here with parts for the next one I'm building, so at least I can show pictures of the SSD & RAM in their boxes, not just thrown on the table.

    Crazy... Looking at the prices I pay (yes, they are different drives): pre-flood, Dyson paid ~$1,900 for 24 drives and I pay $3,600 for 16 drives. Crazy.
     
    #13     Jul 23, 2012
  4. dyson

    I debated raidz (RAID 5) vs. raidz2 (RAID 6), trying to balance redundancy against future available free space. If I had 3 TB drives, I might have gone with RAID 6.
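
    The trade-off I was weighing, roughly (decimal TB, ignoring zfs metadata overhead), using the 8-drive media pool as the example:

        # Usable space vs. redundancy for an 8 x 2 TB pool.
        DRIVES, SIZE_TB = 8, 2.0
        for parity, name in ((1, "raidz"), (2, "raidz2")):
            usable = (DRIVES - parity) * SIZE_TB
            print(f"{name}: ~{usable:.0f} TB usable, survives {parity} failed drive(s)")
        # raidz  ~14 TB, survives 1 failure; raidz2 ~12 TB, survives 2.
        # Bigger (3 TB) drives mean longer rebuilds, which is what would have
        # pushed me toward raidz2.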

    I have 2 spares (same brand, same model, new-in-box); I should have added this to my total cost. However, they are not installed as I don't have any room in the case. If you find an interior photo of the Fractal Design Define XL case, you will see it has 4 x 5.25" and 10 x 3.5" bays. I have the media pool (8 x 2) in the top bays (4 x 5.25" & 4 x 3.5") and the work pool (6 x 2) in the lower bays (6 x 3.5"). The last 2 drives I have stacked with some standoffs and cable ties :)

    I have added cron scripts to check for SMART errors (smartmontools). If a problem is detected, an email is sent. I also have automatic zfs scrubs run periodically.
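
    The check itself is nothing fancy - along these lines (a sketch, not my exact script; the drive list and email address are placeholders, and it assumes smartmontools is installed and the job runs as root from cron):

        import subprocess, smtplib
        from email.message import EmailMessage

        DRIVES = [f"/dev/sd{c}" for c in "abcdefghijklmnop"]   # 16 data drives
        ALERT_TO = "me@example.com"                             # placeholder

        def healthy(dev):
            # smartctl -H prints "PASSED" (ATA) or "OK" (SAS) for a good drive
            out = subprocess.run(["smartctl", "-H", dev],
                                 capture_output=True, text=True).stdout
            return "PASSED" in out or "OK" in out

        failed = [d for d in DRIVES if not healthy(d)]
        if failed:
            msg = EmailMessage()
            msg["Subject"] = "SMART warning: " + ", ".join(failed)
            msg["From"] = msg["To"] = ALERT_TO
            msg.set_content("smartctl -H flagged these drives; run smartctl -a for details.")
            smtplib.SMTP("localhost").send_message(msg)

    (smartd from the same package can do the same job on its own via smartd.conf; this is just the cron-script flavor.)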

    If I ever need to rebuild a drive, the time (and intensity on the working drives) will be on par with a scrub (approximately 8-10 hours with current utilization).

    No drive failures in 10 months of 24/7 operation (6-8 hours per day of high usage), but I did have one issue concerning the drives during the initial burn-in: a bad SAS cable (SFF-8087) was causing SMART errors (CRC error count).

    I am also hoping that the reduced heat from the 5400 RPM drives will increase their lifespan.
     
    #14     Jul 23, 2012
  5. #15     Jul 24, 2012
  6. eurpar

    I have been following this thread (lots of good info).

    Just one warning from my own experience. To anyone building a NAS box, try to avoid Western Digital "Green" hard drives. I had to learn this the hard way.
     
    #16     Jul 26, 2012

  7. Higher-end ("enterprise") drives should last longer, or at least have a better warranty. And higher-end RAID cards would save the drives; of course, then you're spending as much as $10,000 on the RAID card alone.
     
    #17     Jul 27, 2012
  8. I had intended to fully document that machine build, but client deliverables have been quite demanding, so that machine is already in production.

    This is how the HDDs come when you buy them in bulk. This is another order of 2TB drives, but the 9x 3TB drives I got from Newegg looked exactly the same: two rows of four and the 9th drive on the end. Strangely, if you look at the front-left drive you see a 2D barcode on the static wrapper. That's a Dell part number (DP/N 0YD6FM). These drives (which came from Newegg) were intended for Dell, and I'm REALLY lucky I caught those part numbers, because if there had been a warranty issue I would have needed to go through Dell rather than Seagate directly. It would have been ugly. They ended up exchanging the HDDs for new OEM Seagate drives, not manufacturer-supplied Dell HDDs.

    [IMG]

    [IMG]


    So I got some goodies in the mail. This is 8x 2GB sticks of RAM (16GB total) and a cheap $100 SSD (128GB) that will be used for system cache on the machine. To put things in perspective, below the RAM and SSD photo is a picture of a little USB thumb drive with a 4GB micro SD card in it. That's what the operating system will run from - 2.0GB is all that's required, but I didn't have any 2GB cards so I used a 4GB micro SD card for this one. The operating system is FreeNAS, a BSD-based operating system (all free and open source). The 128GB SSD was probably a little overkill, but I bought a 10-pack of them so it was there and the price was right. I'm upgrading a few things (netbooks, desktop PC, etc.) from 64GB SSDs to 128GB SSDs. I may still put only a 64GB SSD into this, but it doesn't matter.

    [IMG]

    [IMG]

    Now that I had all the parts, I finished going through the machine and needed to reset the CMOS and BIOS. Actually, I didn't know if anything needed to be reset, but I almost always just reset to factory defaults and start over rather than mess around with how someone else may have configured things. The photo below shows the CMOS battery and the jumpers on the motherboard. You can see on the right it says PSWD and on the left, above the battery, it says RTCRST. If you look closely, the jumper is only on one of the two PSWD pins. You just put the jumper on both pins and turn the power on for a few seconds, and it resets any BIOS passwords. Do the same thing for RTCRST (CMOS reset) and the BIOS is back to factory defaults, with everything cleared and no prior settings remaining.

    [IMG]

    Next I had to check whether there was a CPU under the heat-sink. You'd be surprised how many eBay boxes come without CPUs under the heat-sink. In this case the box was sold as having a 3.0GHz dual-core CPU, but what was actually installed was a 1.6GHz dual-core. The good news is it was enough to test and burn in the machine; also, these CPUs are only ~$10 each (shipped), so it's not even worth the hassle of going back to the seller over. I just throw them into the pile and eventually sell a whole lot of random junk on eBay.

    [IMG]


    Over time you acquire a pile of spare parts and "stuff". I'm running low on IDE drives and parts, but I still had one IDE DVD drive so I could load up Windows 7 just to test out the motherboard and run a few quick benchmarks to make sure the motherboard and the PCI, PCIe and PCI-X slots are all functional. For testing and burn-in of the motherboard (just run it for a few days to make sure nothing will break) I used random stuff: a 250GB Seagate HDD, 4GB (4x 1GB sticks) of Dell RAM and the 1.6GHz CPU.

    [IMG]

    [IMG]

    [IMG]


    Take a look at the difference between the two types of RAM. The one with the little copper heat spreader came with the box; it's older and slower, and those little sticks really heat up. The sticks with the aluminum heat spreaders will hold up better over time. Either way, we aren't talking about the newest, cutting-edge technology (this stuff is 4-5 years old), so it'll be fine and plenty fast. Remember, I'm using fully buffered RAM and a server motherboard & CPU to essentially build a NAS, so compared to a Drobo or D-Link NAS box this will be night & day faster.

    [IMG]
     
    #18     Jul 27, 2012
  9. Very cool.
    :cool:
     
    #19     Jul 27, 2012
  10. winstontj - why don't you intend to trade again? You posted that in this thread and I have not seen it elsewhere.
     
    #20     Aug 2, 2012