Oh boy... That sounds like a fun project. I will do it at the end of next year if I'm still trading, because otherwise... I'm going to be down the rabbit hole for months! Loading tick data from parquet files into pandas dataframes wasn't faster than loading from SQL. Take from that what you will, but I did the tests. Edit: Arctic can query millions of rows per second per client, achieves ~10x compression on network bandwidth, ~10x compression on disk, and scales to hundreds of millions of rows per second per MongoDB instance. Yawn, I do this with SQLite, the best database in history.
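For reference, a minimal sketch of the kind of load comparison described above (file and table names are made up; the parquet read needs pyarrow or fastparquet installed):

```python
import time
import sqlite3
import pandas as pd

# Hypothetical data: ticks.parquet on disk and a "ticks" table in ticks.db
t0 = time.perf_counter()
df_parquet = pd.read_parquet("ticks.parquet")
t1 = time.perf_counter()

with sqlite3.connect("ticks.db") as conn:
    df_sql = pd.read_sql_query("SELECT * FROM ticks", conn)
t2 = time.perf_counter()

print(f"parquet load: {t1 - t0:.2f}s, sqlite load: {t2 - t1:.2f}s")
```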
Cloud computing (rather, VPS) can be much cheaper than that. I pay about $10 a month for a decent single core machine. I don't have a GUI and it runs Ubuntu with 2GB of RAM, perfectly fine. I definitely don't want to run a trading algo on a home computer ever again.
Not very often, maybe every 2 weeks or less. I have a script to upload (scp) the compiled update to the VPS. Compiling takes about a minute and the upload takes about 10 seconds, so there are a few steps but nothing too bad (a sketch of the upload step below). Consider that I'm in a country with historically bad internet connectivity (plus blackouts); I did run things locally for years but that was extremely stressful. I trust my VPS connectivity much more, and they're not far from IB, so intercontinental disruptions aren't a thing either.
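Something like this, a rough sketch of the upload step in Python (the host, artifact path, and service name are all hypothetical):

```python
import subprocess

VPS = "trader@vps.example.com"     # hypothetical host
BINARY = "build/algo"              # compiled artifact produced by the build step
REMOTE_PATH = "/home/trader/algo"  # hypothetical destination

# copy the new build to the VPS
subprocess.run(["scp", BINARY, f"{VPS}:{REMOTE_PATH}"], check=True)

# restart the algo after upload (assumes a systemd unit named 'algo' exists)
subprocess.run(["ssh", VPS, "sudo systemctl restart algo"], check=True)
```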
I agree with you that the reliability of a "cloud computer" is most likely higher than having your hardware at home and relying on your ISP and local electricity supplier. However, in my case it has not yet reached the point where I'm willing to pay money for that extra reliability. My impression from what I'm reading at ET is that your and @nooby_mcnoob's trading systems are much more advanced than mine.
Great points all. Fundamentally, the "issue", if there is one, is that I'm just not really making use of the remote machine except to run data collection, and even that I'm not really using anymore. For now, I will probably shut the remote machine down and return it to the cloud graveyard, but I think I would eventually want to resurrect it for the reasons that d08 stated. Will let the decision fester for a bit before I pull the trigger.
Two issues here: data storage and the cloud.

I've discussed the cloud stuff before, but briefly: if machines breaking is your issue then you can stay local by buying another computer. I'm also using headless NUC-type units; I have three of them that are high enough spec to run my system, which cost about $500 each new, though I bought two of them secondhand (there are a few slower machines knocking around the house that I keep meaning to use for various hobby projects and never get round to). I use one for development, the other two are live and backup, and I swap them round regularly. In the last 6 years I've had one machine failure, so I really wouldn't trust a single machine entirely. Similarly I've had issues with backups, so I've gone to town with a RAID NAS drive to which everything backs up every night, plus a USB drive that backs that up, plus everything is on at least two machines anyway. For belt and braces I should probably set up some offsite storage like Dropbox, and that's on my to-do list. The NAS also stores all our household data, and I can write the cost of all this stuff off against tax, which I couldn't do if I was purely a trader.

Of course this doesn't help you if your internet fails, or your power fails... and that I guess is the appeal. $1,000 of local hardware does buy you quite a lot of cloud computing time. I think local hardware is still cheaper, but each time I've done the maths it's got closer and closer, depending on how many years you amortise your hardware over. I'm considering containerising my new system when it eventually goes online (that's basically when pysystemtrade is production ready), which will seriously reduce the up-front hassle of moving everything to the cloud, so this is still something I might well do in the future. I would still want local copies of everything for persistence reasons, and so I could spin it up locally if I wanted to, so I'd need at least one machine to do this on. It is also pretty cool having a stack of computers though...

As for data storage, SQLite has done well for me and I think it's acceptable for low-frequency trading, but I have had occasional issues with files becoming corrupted (writing when a process fails? it doesn't have the concept of a record lock, it just uses the OS file lock). I'm planning to move everything to MongoDB / Arctic, which is how the production side of pysystemtrade is set up. Having used it for the last few years, it's just a nicer solution once you get the NoSQL idea in your head, and a lot quicker. I'm a bit reluctant to rely on third-party libraries (even if it's AHL!) but I could easily write a native MongoDB pandas read/write client if I had to in about 5 minutes. The 'black box' nature of how stuff is stored in MongoDB slightly worries me, however: you can dump a backup file, but if you can't recover from that file for whatever reason, you're stuffed. So I'm also planning to write backup files in .csv format so I can always manually recover from a corruption issue.

GAT
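For what it's worth, a bare-bones MongoDB pandas read/write client really is only a few lines with pymongo. A rough sketch, assuming a local mongod and made-up database/collection names, with the plain-.csv backup idea tacked on:

```python
import pandas as pd
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["market_data"]["eod_prices"]   # hypothetical db / collection names


def write_frame(df: pd.DataFrame) -> None:
    """Store a dataframe as one document per row (naive overwrite for the sketch)."""
    coll.delete_many({})
    coll.insert_many(df.reset_index().to_dict("records"))


def read_frame() -> pd.DataFrame:
    """Read the collection back into pandas, dropping Mongo's internal _id field."""
    return pd.DataFrame(list(coll.find({}, {"_id": 0})))


def backup_csv(df: pd.DataFrame, path: str) -> None:
    """Plain .csv copy so a Mongo dump isn't the only recovery path."""
    df.to_csv(path, index=False)
```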
You're running your own data center! I agree that the costs are converging. Highly recommended. Whether I deploy remote or local, I use Terraform + Docker to deploy the code, and the best part about this is the reproducibility. My Terraform config file is the workhorse.

Not sure what problems you've had with corruption, but make sure that you use PRAGMA journal_mode=WAL. Additionally, if you are committing to the database on a "high frequency" basis - I batch ticks every 5 seconds, which seems to work OK for my purposes - then you should keep the high-frequency table in a separate database that you attach, as it could otherwise block other tables. I haven't had any problems with corruption yet, though it may not have been long enough for that problem to manifest; I regularly kill processes without any clean exit, so I would have expected to see some issue by now.

I looked into Arctic/Arrow/etc. but in my testing it didn't seem to be any better than a mildly tuned SQLite database. Obviously, we couldn't have so much global investment into such technology if I was right, so my testing must have been wrong. One of the things I want to do is start dumping realtime data into one of these fancy-pants things and see if it does better without me tuning it, since the tuning is what makes SQLite fast. The only issue I have with SQLite is that SQLite -> pandas is super duper slow, since apparently no one really cares about that path; but even when I exported the tick data to parquet files, it was slow af to load in pandas.
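To make the WAL + attached-database setup concrete, here's a minimal sketch under assumed names (trading.db, ticks.db, and the quotes schema are all made up):

```python
import sqlite3

conn = sqlite3.connect("trading.db")
conn.execute("PRAGMA journal_mode=WAL")   # readers no longer block the writer

# Keep the high-frequency tick table in its own file so heavy writes
# don't contend with the rest of the schema.
conn.execute("ATTACH DATABASE 'ticks.db' AS ticks")
conn.execute("PRAGMA ticks.journal_mode=WAL")
conn.execute(
    "CREATE TABLE IF NOT EXISTS ticks.quotes "
    "(ts REAL, symbol TEXT, price REAL, size REAL)"
)


def flush(batch):
    """Write a ~5 second batch of ticks in a single transaction."""
    conn.executemany("INSERT INTO ticks.quotes VALUES (?, ?, ?, ?)", batch)
    conn.commit()


# example: flush([(1690000000.0, "ES", 4500.25, 3)])
```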