SQL Server hardware - advice needed

Discussion in 'Hardware' started by CloroxCowboy, Jun 23, 2009.

  1. I'm building an automated analysis engine to feed me the data and calculated results I need to make trading decisions. I'm interested in setting up SQL Server on a separate machine dedicated solely to processing queries. The idea is to have my current PC handle the reporting and trade-execution functions, and to use the SQL Server machine to pull data from my broker, store it, run the analysis, and feed the results back to the first PC.

    I know how to code all that, but I'm having some problems selecting hardware for my dedicated server. My queries are not overly complex, but they'll be working with several million records at a time. Would an AMD Athlon 64 X2 4400+ be sufficient? Another big unknown for me is how many hard disks to purchase and what configuration to use. I'm more interested in performance than in backing up data, so I don't think I need a RAID setup. I've heard, though, that putting Windows, the system tables, and the data tables on separate disks is a good idea for query speed.

    Would anyone be kind enough to help steer me through this? I can give more details if needed.
     
  2. DannoXYZ

    Actually, unless you're running really complex queries that require heavy calculations, disk speed will be the primary limitation on SQL Server's performance. You want the FASTEST disks possible, like SAS 15k-rpm drives in a RAID-0 configuration for 150-250MB/s sustained throughput, or RAID-10 if you want redundancy protection. Avoid RAID-5 or 6 due to slower writes, although they're still superior to single drives.

    I use an HP EVA5000 with 98 drives for my datastore and get about 800-900MB/s throughput, but that's probably overkill for your needs.

    Here are some articles on optimizing SQL Server performance:
    http://technet.microsoft.com/en-us/library/cc966534.aspx
    http://www.scribd.com/doc/7202388/High-Performance-SQL
    http://weblogs.asp.net/scottgu/arch...rive-speed-and-visual-studio-performance.aspx
    www.mce.com/files/tms/f000174.pdf (Faster SQL Server Access with SSD)
     
  3. Thanks, Danno. Maybe I'll go with RAID-0 at first; my database doesn't really require much redundancy protection since I'll be downloading almost all the data from scratch each morning. I could move to RAID-10 later if I add functions that require historical tracking.

    As far as throughput... I'd agree, it sounds like your setup is more than what I'd need, but I really don't know what I will need. Is there any way for me to estimate a "good" throughput value to shoot for based on the number of records, the types of operations in my queries, etc.?
     
  4. DannoXYZ

    It's hard to simulate SQL Server's performance, so you can't really predict anything until you've got the actual databases deployed. Then use PerfMon to watch overall CPU usage, memory/swapfile usage, and disk-queue lengths and response times.
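
    For example, you can log the relevant counters from the command line with typeperf (counter names assume an English-language Windows install; this samples every 15 seconds):

        typeperf -si 15 "\Processor(_Total)\% Processor Time" ^
                        "\Memory\Available MBytes" ^
                        "\PhysicalDisk(_Total)\Avg. Disk Queue Length" ^
                        "\PhysicalDisk(_Total)\Avg. Disk sec/Transfer"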

    I manage servers for a nationwide company (500+ offices, $2bil+/yr revenues) and we've got over 100 SQL Servers for various departments doing various tasks. The most powerful one is a Dell M600 with dual quad-core 3.0GHz Xeon CPUs (8 cores x 3GHz = 24GHz total). The CPUs are never utilized more than 5%, even at peak-usage times (often less than 0.05%).

    However, the disks and memory are what get maxed out. SQL Server will grab all available memory for caching tables, so however much you have installed, it will always be 80-90% used; that's fine. The disk queues, queue lengths, and response times end up being the rate-limiting steps.

    Performance scales pretty linearly with the number of disk spindles available, much more linearly than with CPU speed/cores or memory. For your set-up, you may want to get 5 drives on 3 separate controllers:

    - onboard IDE/SATA + 1 drive for OS and programs
    - onboard RAID-0 + 2 drives for databases
    - PCIe/PCI-x RAID-0 + 2 drives for logfiles/tempfiles/DB-backups

    SQL Server writes the logs and temp files along with the DB, so you'll want the 2nd RAID-0 array so those writes can go to two different places simultaneously.
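
    For example, when you create the database, point the data file at one array and the log at the other (the database name, drive letters, and sizes here are just placeholders):

        -- data file on the DB array (D:), log on the log/temp array (E:)
        CREATE DATABASE TradingData
        ON PRIMARY
            ( NAME = TradingData_dat,
              FILENAME = 'D:\SQLData\TradingData.mdf',
              SIZE = 20GB )
        LOG ON
            ( NAME = TradingData_log,
              FILENAME = 'E:\SQLLogs\TradingData.ldf',
              SIZE = 5GB );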

    There are many ASUS motherboards with two separate RAID controllers on board that may work for you. One is a Silicon Image hardware RAID that's good for the databases; the other is an nVidia software RAID that's fine for log & temp files.
     
  5. If your database is not too big (say < 256GB), then SSDs may be worth a look. Read throughput and "seek" times are far better than any HDD or small RAID array. Write performance is still, err... an issue.

    It would pay to do some research, as use of SSDs for database storage is still pretty new and all SSDs are far from equal. It seems the type of controller chip can make a big difference.

    I'm currently having a look at SSDs for my MySQL database, but will probably hold off a bit until the SSD market settles down.

    Unless you are storing mountains of tick data, database size tends to be fairly modest in today's terms. My MySQL database is only about 55G total and includes 1 min, 30 min and daily data for 5+ years for most US stocks, a fair bit of futures "tick" data recorded from IB and historical fundamental data for US stocks.
     
  6. DannoXYZ

    Yeah, if you don't have large storage needs, SSDs' higher cost per gigabyte won't be too bad. The bang-for-the-buck performance simply can't be beat. Intel's new SSD drives have done away with the write-performance issue; they're seeing 220-240MB/s reads and 100-110MB/s writes for single drives!!! The latest Windows 7 also has optimizations for SSDs to enhance performance and longevity.

    However, the main benefit will be for database operations that are limited by IO/sec throughput. Traditional hard drives are limited to about 100-200 IO/s, and up to 1000 IO/s in RAID configurations. Intel's SSD can hit 10,000-13,000 IO/s on a single drive! Wow!

    Just two of these suckers in a RAID-0 stripe can match the sustained throughput of enterprise-level fibre-channel storage and completely demolish it in IO!
     
  7. That being said, if you do not want to go that way....

    Here is a decent setup. You can scale it down, though.

    * Decent computer.
    * Get a proper RAID card - Adaptec are not bad. Make sure you have a BBU (battery backup unit) on it, so you can use write-back caching without risk. I have a 5805 in my database server.
    * Get a SAS expander. I personally have a 24-disk 2U cage for the database.

    From there on... I use 300GB Velociraptor disks for now. They are a lot better price-wise than 15k SAS disks while still delivering quite a punch. I generally have them in RAID-10 configurations with a 64kb stripe size. I am not sure SAS disks ever make sense, given that the price of an SSD is not that much higher and you can basically use a RAID-5 of SSDs to get better IO than a RAID-10 of SAS disks. This really needs to be calculated properly. Note that you do not buy "space" or even "bandwidth", but "random independent IO per second". SSDs often score 100 times as fast as SAS in this area.

    * Make sure you properly format the disks. 95% of all servers are badly installed, costing them up to 40% of their IO performance. For a proper install, align the partitions (automatic on 2008, manual on 2003) and FORMAT THE FILE SYSTEM WITH 64kb BLOCK SIZE. This is important because it means a 64kb block in the file system is a 64kb block on the RAID, and SQL Server likes to write out pages in 64kb clusters (8 pages of 8kb).
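
    On 2003 the manual alignment looks something like this (disk number and drive letter are placeholders - double-check them, diskpart is destructive):

        diskpart
        DISKPART> select disk 1
        DISKPART> create partition primary align=64
        DISKPART> assign letter=D
        DISKPART> exit

        format D: /FS:NTFS /A:64K /Q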

    * Split logs, temp and data. This is key - logs may do a lot of writing, but it is mostly "linear". Sadly it is also time-sensitive, so if you make updates, the head movement kills you... unless the logs are on separate disks.
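
    For example, to put tempdb on its own disk (the path is a placeholder; the move takes effect after a service restart):

        ALTER DATABASE tempdb
            MODIFY FILE (NAME = tempdev, FILENAME = 'F:\SQLTemp\tempdb.mdf');
        ALTER DATABASE tempdb
            MODIFY FILE (NAME = templog, FILENAME = 'F:\SQLTemp\templog.ldf');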

    * Do NOT AUTOGROW FILE SPACE. Period. That is evil. Sadly most people are too ignorant to read those parts of the documentation, or do not care. Pre-size the files instead.
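
    Something like this (database name and size are placeholders):

        ALTER DATABASE mydb
            MODIFY FILE (NAME = mydb_data, SIZE = 50GB, FILEGROWTH = 0);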

    My layout is:
    * 4 Velociraptors in a RAID-10 config. A 64GB partition at the start for the OS, the rest is for the SQL Server logs.
    * 8 Velociraptors in a RAID 10 config for the data. I also store interim backups there (i.e. backups taken before data cleanup etc.).
    * 2 separate disks for tempdb and temp logs. Tempdb should never be written to (per SQL Server's standard behavior, it only writes out when memory overflows), but the disk counter serves me as a warning: if I get IO wait times here... I can look for the reason.
    I still have space... the cage has 24 slots for SAS/SATA disks ;) As I start storing tick data... I may add a RAID-5 of Velociraptors and move the binary data there (note: SQL Server 2008's FILESTREAM stores binary fields as files in the file system - a nice feature).

    Other important things:

    * Create at least as many files for data and log (and likewise for tempdb data and log) as you have processor cores. Another often-overlooked item: certain operations require exclusive file locks (allocating file space for an object, for instance), so if you want parallelism, have as many files as you can have parallel operations - which is the number of parallel threads executing, i.e. cores, or cores*2 if hyperthreading is in use.
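
    For example, tempdb ships with one data file, so on a quad-core box you would add three more (path and sizes are placeholders):

        ALTER DATABASE tempdb ADD FILE
            (NAME = tempdev2, FILENAME = 'F:\SQLTemp\tempdb2.ndf', SIZE = 2GB, FILEGROWTH = 0);
        ALTER DATABASE tempdb ADD FILE
            (NAME = tempdev3, FILENAME = 'F:\SQLTemp\tempdb3.ndf', SIZE = 2GB, FILEGROWTH = 0);
        ALTER DATABASE tempdb ADD FILE
            (NAME = tempdev4, FILENAME = 'F:\SQLTemp\tempdb4.ndf', SIZE = 2GB, FILEGROWTH = 0);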

    * Use 64-bit. Period.

    * Preallocate memory to about physical RAM minus 512MB (to give the OS some room). 64-bit may autoallocate a little wrong, so set it manually (2005 especially tends to allocate too much). If you have tons of memory, go physical minus 1GB, probably from around 16GB upward.
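
    For example, on a box with 8GB of physical RAM (8192 - 512 = 7680; the setting is in MB, adjust for your RAM):

        EXEC sp_configure 'show advanced options', 1;
        RECONFIGURE;
        EXEC sp_configure 'max server memory (MB)', 7680;
        RECONFIGURE;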

    To scale that down...

    * Use a minimum of 2 disks (data, logs + OS). I cannot overstate it: 2 disks easily give you 3-4 times the IO of one, simply because the head does not have to move between logs and data all the time.
    * Get some RAM. Seriously, 8GB does not cost a lot anyway.

    Depending on your type of usage, you may be IO-bound or processor-bound: lots of updates means IO-bound; enough RAM and complex queries means processor-bound. Watch the performance counters. Getting fast disks is more or less a must in any case.

    For performance analysis, lots of people will point you to perf counters like "average disk queue length" etc. - that is bollocks. It is valid, but it is too open to interpretation. What you care about is how fast IO is handled, which is "seconds per IO", given in milliseconds. I do not care how long my average queue length is if I get results in 1ms. The problem with queue length is that it REALLY depends on the hardware. For example, SATA disks can optimize access by reordering up to 32 queued commands, so there is nothing bad about a queue there. SAN systems often have queue lengths of 64 upward - I have seen IO getting better on some systems with queue lengths in excess of 100. So queue length is really open to interpretation and hardware. But what is not up for interpretation is how many ms it takes to get an ANSWER from the system, and that is the counter I indicated. Once that goes higher... look for the issue. During my time at MS (SQL Server Enterprise Level 3 support - I did performance stuff all the time), I saw people with response times in SECONDS wondering why their server was slow.
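
    In PerfMon those counters are "Avg. Disk sec/Read" and "Avg. Disk sec/Write" under PhysicalDisk. You can get the same answer-time view per database file from inside SQL Server as well, something like:

        -- average milliseconds per read/write for every database file
        SELECT DB_NAME(database_id) AS db,
               file_id,
               1.0 * io_stall_read_ms  / NULLIF(num_of_reads, 0)  AS avg_ms_per_read,
               1.0 * io_stall_write_ms / NULLIF(num_of_writes, 0) AS avg_ms_per_write
        FROM sys.dm_io_virtual_file_stats(NULL, NULL);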
     
  8. Thanks for the feedback, everyone. It sounds like a good, scaled-down compromise of all those suggestions would be to use 2 RAID-0 arrays...one for the OS and log, one for the database. Eventually I will look to expand that to RAID-10 once I have a chance to experiment and work out my actual performance needs/issues.

    I wasn't very familiar with the current state of SSDs, and after reading up I think I'll stay with SATA for now. It sounds very promising, but Wikipedia's article mentioned several disadvantages (limited write cycles, slower write speed, and of course cost) that worry me. It also mentioned that Windows is not optimized for SSDs yet. Has anyone noticed any of those in the real world, and how much do they detract from the advantages? Using SSD in a hybrid-drive configuration sounded like a reasonable solution until prices come down and some of the negatives are addressed. Anyone care to share their hybrid-drive performance experience?

    Thanks again.
     
  9. The Windows / SSD disadvantage is not so big for SQL Server because of the way SQL Server writes data (8kb pages, in 64kb blocks). And... Windows 7 / 2008 R2 are JUST AROUND THE CORNER (2 weeks to RTM for Windows 7... rumours are... tomorrow, actually, with availability within 2 weeks from volume-license locations).

    Your approach sounds good. Do yourself a favour and consider Velociraptors - the enterprise 300GB version. They pretty much kill everything else you can see... tremendous. Not high-end SAS disks, but... again... pretty well balanced ;) I'm actually getting some now for my workstations.
     
  10. DannoXYZ

    Well... I wouldn't consider Wikipedia to be a credible source. Good for basic info, but much of it isn't technically valid and it lags current developments.

    Here's an article on Windows7 + SSD drives and how they're optimized for each other.
    http://hothardware.com/Articles/Windows-7-Disk-Performance-Analyzed/?page=1

    IO is where SSDs really shine over mechanical drives. If you're looking at 5-6ms per IO on a really good RAID set-up (2-3x faster than a single disk), that's still nothing compared to 0.1ms on an SSD.

    Haven't played with Win2k8 too much, but we're moving some of our SQL Servers to it as we upgrade memory. Lots of memory alone will make a big difference in performance, and you'll want a 64-bit OS to handle LOTS of it, like 64GB+.
     