
personal back-testing setup

  1. So, I have decided that I need to have a way to test some ideas while at home. Since there will be a bunch of people chiming in, I will make it structured:
    Speclets:
    1. 128GB RAM
    2. 4TB+ hard drive
    3. Drive mirroring
    4. Headless (single VGA monitor sufficient)
    5. Linux
    Constraints:
    1. Price: it's for home so I don't have an expense account
    2. Size: it's a Manhattan apartment, so space is limited
    My options as I see them:
    1. Build my own from parts. There seem to be specs kicking around the web that would allow me to build a server like that for about $1500 (since I don't care about video). Sounds like a hassle, though, and I have to pick the right hardware.
    2. Buy a refurbished server on Newegg or something like it. It's apparently possible to get a reconditioned server or workstation with 128GB of RAM for $600-$800; adding nicer SAS disks would be another $200, so I can get a very spiffy box for about 1k. Rack-mounts seem to be cheaper for the same specs, but they need more space.
     
  2. I'm not sure 128GB RAM is necessary, or the 4TB hard drive. I would go with a 500GB SSD for the OS and programs and an HD for extra storage. I've built about 10 machines in 15 years using newegg.com. If you do that, double-check to make sure the components are compatible. I have called Intel and the motherboard manufacturer a number of times; they are very helpful. The first build is hard without help, but it can save a lot of money.

    Bob
     
  3. Tick (or 1-min options) datasets get to be pretty large. At work, I have a nice little machine that has 1TB of RAM and I am probably ordering another one like that :)

    Previously, I would have gone the DIY route for sure. This time, I am not so sure - it's shocking how cheap the refurbished servers are and it's hard to beat that type of value. On the other hand, I am not sure I have the space for a rack-mount box, while I can probably get a compact tower via the DIY route.
     
  4. The making of a mad data scientist :)
     
  5. 1TB of RAM? I am guessing you meant the HD.

    As for the reason you want the machine (just testing), seems more efficient to go the pre-built route.
     
  6. No, I did mean RAM :)
     
  7. That is so freaky geek of you. Out of my league on why you would need that much, or how it could be utilized to an advantage price-wise over, say, 128GB of RAM, for testing.

    Party on dude!
     
  8. I ordered a prebuilt gaming machine that I use for everything, since I don't need it to be dedicated yet, as I'm just crunching EOD data. It came with a faulty 3D card, which they happily fixed/replaced, and it has worked wonderfully since. 64 GB RAM and 8 cores, enough for most chores + gaming, and compatible with most peripherals. Not the cheapest, more like middle tier, not too pricey. Quality costs, though, and may be worth it just to escape unnecessary limitations and hopefully get more robustness.

    If you're keeping a server at home, make sure it isn't too noisy. Old/bad equipment may not be the best choice to keep at home unless you're prepared for it, and spare parts may be harder to get.
     
  9. There goes that budget.
     
  10. To give you a sense, one day of ES full-book data is easily 1GB, and you need about 2x the dataset size in working space. If you are testing across several weeks, it's pretty easy to fill up a TB, especially if you are doing some manipulations and have a few different assets concurrently.
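
    In case it helps, a rough back-of-the-envelope sketch of that sizing (the per-day figure and 2x multiplier are from above; the asset and day counts are just made-up examples):

    ```python
    # Back-of-the-envelope storage estimate for full-book backtesting data.
    # All inputs are rough assumptions taken from the discussion above.
    GB_PER_DAY_PER_ASSET = 1.0   # ~1 GB/day of ES full-book data
    WORKING_MULTIPLIER = 2.0     # keep ~2x the raw size for intermediate manipulations
    ASSETS = 4                   # a few instruments held concurrently
    TRADING_DAYS = 6 * 5         # several weeks of history

    total_gb = GB_PER_DAY_PER_ASSET * ASSETS * TRADING_DAYS * WORKING_MULTIPLIER
    print(f"~{total_gb:.0f} GB of working space")   # ~240 GB; a few months of this fills a TB
    ```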

    It's freaky how cheap that stuff is nowadays - just over 30k (I share it with a neighbor since neither of us needs it full time). Apparently, you can buy a reconditioned one for 6-7k now.

    Interesting. How much did you spend? What size case is it?

    Well, for $700-$1000 I am ok junking it if it dies, as long as the data and results are backed up.
     
  11. More like $2300 for the prebuilt. I think it's an older version of the Storm Trooper, a.k.a. something like this but non-transparent: https://www.newegg.com/Product/Product.aspx?Item=N82E16811119297

    Though it was bought through a local (non-US) webshop several years ago. It has been robust enough and there are no plans to replace it yet. There are fans, but they're adjustable. Decent room to expand, but I haven't had a need for RAID or much beyond the SSD and external USB HDDs.

    A decent place should provide a good custom build and warranty. I just upped the memory some and the GPU/CPU at the time.
     
  12. Check out Micro Center - sometimes they have good deals in the store. It's on 3rd Ave in Brooklyn, so not too bad of a trip from Manhattan.
     
  13. 1TB of RAM? That must be for the whole cluster, not for a single processor? Or are you referring to a PCIe solid-state drive?

    What are you using for backtesting software? You are right that an entire day of ES tick data is around a GB, but why do you need to keep all of that data in RAM?
     
  14. Nope, it's a single box :)

    NinjaTrader, of course... just kidding. I've got a bunch of different backtesting engines: some are built for higher-frequency stuff and deal with tick data, some are more geared toward minutely data for strategies that are less sensitive to latency. In reality, it's the cross-sectional vol stuff that is RAM-hungry; once you start dealing with hyper-cubes of minutely option prices across a few hundred names, it gets big very quickly.
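
    To put a rough number on it, here's a sketch with made-up dimensions (the name/strike/expiry counts are purely illustrative, not my actual universe):

    ```python
    # Illustrative sizing of a minutely option-price hyper-cube; every number is a guess.
    names = 300             # a few hundred underlyings
    minutes = 390 * 60      # ~60 trading days of regular-session minutes
    strikes = 40
    expiries = 12
    bytes_per_value = 8     # float64

    cube_bytes = names * minutes * strikes * expiries * bytes_per_value
    print(f"~{cube_bytes / 1e9:.0f} GB for a single field (e.g. mid price)")  # ~27 GB
    # Multiply by bid/ask/IV/greeks and a few concurrent experiments, and RAM goes fast.
    ```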
     
  15. Damn, I didn't even know that existed. What kind of chip is it? What do y'all use it for?

    The curse of dimensionality rules us all, unfortunately :( .
     
  16. I think it's 4x4 Xeon (Broadwell?), but I don't remember. It's blindingly fast for everything, including silly things like 1K x 1K SVD.

    PS. It's not mine, obviously - too grown-up of a toy.
     
  17. Hmmmm. 1k x 1k of doubles is 8 MB. SVD (assuming that means singular value decomposition) on a 1000x1000 matrix is trivial on almost any system. Did you mean 100k x 100k? That would be ~80 GB of doubles, which is another story, but it would still be very doable if, as is likely the case, you are interested in far fewer than 100k singular values/vectors (see the Lanczos algo and other online methods).
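
    For reference, a minimal sketch of the truncated route (the sizes are hypothetical and it assumes the matrix fits in RAM; scipy's svds uses an iterative, Lanczos-style solver, so the full decomposition is never formed):

    ```python
    import numpy as np
    from scipy.sparse.linalg import svds

    # Hypothetical size: a tall-and-skinny matrix of ~1.6 GB of doubles.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((100_000, 2_000))

    # Top 20 singular triplets via an iterative solver; cost scales with k,
    # not with a full dense O(n^3) SVD.
    U, s, Vt = svds(A, k=20)
    print(s[::-1][:5])   # svds returns singular values in ascending order
    ```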

    I'm impressed by the 1TB of RAM, though. That's a lot of RAM! On my database server (and its clone) I have 384 GB, but it's mostly used for Oracle's In-Memory column store; I can hardly imagine what it would be like to have 1 TB of free memory to play with.
     
  18. @Kevin Schmit Lol, I meant one million a side - that's what you get for typing from a cell phone
     
  19. OK, I'll bite: how do you manipulate a 1M x 1M (= 1 trillion entries) matrix of doubles in 1TB of RAM (assuming it's not sparse or some other special case)?
     
  20. In cross-sectional cases, where there are relatively few assets and a lot of data, it's fairly easy to chunk it (many commercial/OSS libraries do that for you). In longitudinal cases (time series to time series - that's where I get these large matrices) it naturally becomes a "special case", since I only care about relatively few principal components. There are various tricks that have been worked out by people who deal with really big datasets (e.g. the genomics and image-analysis guys). It boils down to doing some sort of random subsampling or a "compression" of your matrix - I think I am using a compression-type algo; I don't remember where I got it from.
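
    For what it's worth, one well-known family of those "compression" tricks is the randomized range-finder SVD (Halko/Martinsson/Tropp style) - not necessarily the exact algo in use here, but a minimal sketch looks like this:

    ```python
    import numpy as np

    def randomized_topk_svd(A, k, oversample=10, seed=0):
        """Compress A onto a small random subspace, take the exact SVD of the
        compressed matrix, then map back. Only needs products of A with thin
        matrices, so A can be processed in chunks rather than held whole."""
        rng = np.random.default_rng(seed)
        m, n = A.shape
        omega = rng.standard_normal((n, k + oversample))  # random test matrix
        Q, _ = np.linalg.qr(A @ omega)                    # orthonormal basis for range(A)
        B = Q.T @ A                                       # small (k+oversample) x n matrix
        Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
        return (Q @ Ub)[:, :k], s[:k], Vt[:k]

    # Toy usage; for a really big A, the two products A @ omega and Q.T @ A
    # would be accumulated block by block over chunks of rows.
    A = np.random.default_rng(1).standard_normal((5_000, 3_000))
    U, s, Vt = randomized_topk_svd(A, k=5)
    print(s)
    ```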
     
  21. BS
     
  22. You have a better numerical recipe you'd like to suggest, I take it?
     
  23.  
  24. I was specifically asked by @truetype how I manage to perform SVD on very large matrices to get principal components. I gave a general answer because (a) I don’t know enough details as I am using someone else’s code and (b) this particular calculation is not particularly important for me any more as that strategy has bled out and died. If that failed to impress, I am sorry and will try harder the next time :)
     
  25. So, stupid question - with the new hardware exploit, Xeon chips should be cheaper now, right?