What do you think of backtrader? Is it any good? One of the other members who used to post built this https://github.com/chrism2671/PyTrendFollow
Actually this is exactly how the production code in pysystemtrade works; all the stuff that gets and processes prices, does back adjustments and so on, is an inheritance* of a data frame or series, identified according to whether it is an individual futures contract price, several prices as you'd use to calculate carry or do rolls, or a stitched adjusted price; whether it includes OHLC and volume or just closing prices; whether it is an FX price (all these objects are here if anyone cares). This also means there are some very specific object methods attached to most of these.

* Not quite the same as an encapsulation, as it means you get all the methods of a DataFrame exposed, which is potentially dangerous I suppose.

However, once inside the 'simulation' part (which also runs daily in live production to generate optimal positions) you are right, everything is just a data frame. Partly this is because I wrote the simulation code first, and partly it's to gain flexibility, which you need in a research system. The dynamic nature of the code does make this potentially dangerous for obvious reasons, a few of which @wopr has alluded to; another is that a change to the simulation code could break production (or worse, produce wrong positions), although regression testing should help with that (your live system simulation is the first thing you should test if you change something!). You could avoid releasing changes to simulation code to the production system, but then you will probably end up with two code bases to maintain. I do have some safeguards; for example, I don't estimate any parameters in the daily simulation, these are all hardwired. This also makes the thing run a lot faster.

I can see two obvious ways of tightening this up. The first is to 'do a @wopr' and bring the encapsulated objects into the simulation code. I'd possibly be tempted then to move some of the logic out of the 'system' and into methods of the objects; e.g. given a multiple prices data frame, tell me what the carry signal is please. However, this would make it harder for people to use and understand the system, as you'd have to be looking at lots of files to work out what all these different objects do rather than just following through the system logic; you'd also have to rewrite object methods to change the way the carry signal worked, for example... it does come down to the flexibility vs robustness argument, which you can't get away from if you try and combine production and simulation.

A slightly more extreme option is to turn the 'live' 'simulation' (if this is not a tautology) into a hardwired script once the details are nailed down, and indeed this is how my first trading system was implemented. Effectively then we are pretty much at the stage where live and simulation are separate, even if the script calls the same functions that the simulation would also call. I would caution against this however; although it sounds more robust, it often isn't: it makes it much harder to make and test changes, and it's a nightmare to debug if it goes wrong.

GAT
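To make the "inheritance of a data frame" idea concrete, here is a minimal sketch of the kind of object being described; the class name, required columns and the carry method are made up for illustration and are not the actual pysystemtrade classes.

```python
import pandas as pd

# Minimal sketch only, not real pysystemtrade code. A price object inherits
# from pd.DataFrame, so every DataFrame method is still exposed (the
# "dangerous" part), but it also carries domain-specific checks and methods.

class MultiplePrices(pd.DataFrame):
    """Hypothetical container for the prices needed to compute carry."""

    REQUIRED_COLUMNS = ["PRICE", "CARRY"]  # held contract and carry contract

    def __init__(self, data=None, *args, **kwargs):
        super().__init__(data, *args, **kwargs)
        missing = [c for c in self.REQUIRED_COLUMNS if c not in self.columns]
        if missing:
            raise ValueError(f"MultiplePrices is missing columns: {missing}")

    def raw_carry(self) -> pd.Series:
        # Illustrative definition only: held contract price minus carry
        # contract price, with no scaling or annualisation.
        return self["PRICE"] - self["CARRY"]


prices = MultiplePrices({"PRICE": [100.0, 101.0], "CARRY": [99.5, 100.2]})
print(prices.raw_carry())        # domain-specific method
print(prices.rolling(2).mean())  # ...but all DataFrame methods still leak through
```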
Backtrader is amazing if you have a single-strategy system and are trading stocks. I was really surprised how easy it is to pick up the concepts and create a single-strategy stock system. The docs are great too, and you get a lot of visualizations out of the box. Concepts like Sizers, Analyzers and Observers are well defined and allow for some extensibility. They use a similar approach to Rob's in pysystemtrade: they have "datas" (the equivalent of the Pandas DataFrame) which are passed around, so theoretically you can compute anything you want and stick it in there. I had much less success with multi-strategy and didn't even try with futures; from what I can see, you can only define one multiplier for an entire strategy, which means you can't test on multiple markets at once. Ways to distribute funds between strategies are also limited. I haven't seen PyTrendFollow before, thanks for sharing, I'll check it out.

Exactly, and this is where it's most clear that our systems serve different purposes. My system currently can't do any parameter estimation, and any research is quite hard. If I want to estimate some parameter, I start a Jupyter notebook, import the repositories that give me price data, and start from there. Nothing in the system can help me. Mine is as far away from a research system as possible, which makes the software design side of the problem much simpler.

I'd first say that I'm not sure anything in pysystemtrade needs "tightening up" as you say, unless you are experiencing some pains from the current design or are getting feedback from users that they find it hard to understand. That's the challenge with open source: some users of the software might not have the correct mental model of how the system works. For the rest of us it's easier, we only have one user.

That said, my next project in trying to improve this interface between research and production in my system is to see if I can somehow leverage Pandas DataFrame.apply(). It's a sort of inversion of control: you can have your logic in methods of objects, or pure functions, and then pass them into this other part of the system to do an optimized calculation. The object or function you're passing in has to take a pandas Series, which does leak the implementation a bit, but that can be approximated well enough with a list in the rest of the system and in testing. I'm not sure what that would look like yet, but it seems worth a shot.
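A rough sketch of the DataFrame.apply() idea described above (the function name, forecast rule and market names are made up, not part of either system): the logic lives in a plain function that takes a pandas Series, and the other part of the system just applies it column by column.

```python
import pandas as pd

def ewmac_forecast(prices: pd.Series, fast: int = 16, slow: int = 64) -> float:
    """Toy forecast: difference of two exponential moving averages,
    evaluated at the last point of the series."""
    fast_ewma = prices.ewm(span=fast).mean()
    slow_ewma = prices.ewm(span=slow).mean()
    return float((fast_ewma - slow_ewma).iloc[-1])

# prices_by_market: one column per market, rows are timestamps (dummy data)
prices_by_market = pd.DataFrame(
    {"VIX": [20.1, 19.8, 21.0, 22.5], "3KTB": [108.2, 108.4, 108.1, 108.3]}
)

# apply() calls the function once per column, passing it a Series; in tests
# the same function can be fed a plain list wrapped in pd.Series.
forecasts = prices_by_market.apply(ewmac_forecast)
print(forecasts)
```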
So I do it this way: while the system is running and trading, I'm recording everything into basically 3 SQL tables: Prices (timestamp, bid, ask...), Orders, and "Calculations" (the actual name is different, but it's stupid and I'd rename it if it wasn't so ingrained in all the other code already and I could think of a better name; that table currently has like 80 columns and I keep adding more from time to time). Because the system is event-driven, every new price tick goes through the whole system and I repeat all the calculations for each one (forecasts, positions, etc.), and each tick might result in an order (or a change in the limit price for an existing order) in the end. So these 3 tables are of course huge (tens of gigabytes, tens of millions of records right now). I decided from the beginning that the system should operate at the "natural resolution of the problem", so I record every tick and then I can always down-sample it if I want (in hindsight a huge overkill, but too late to change it now).

The first time I reduce that resolution is daily. I realised that I really don't need to be that precise and record everything only after the system was mostly written, so I just added a daily SQL job which deletes all ticks and associated calculations (if the tick doesn't have an associated order) except 1 per minute, so effectively at the end of the day the system stores 1-minute resolution. That real-time part of the system is all "regular programming": for-loops, classes, etc.

Then weekly (or at any time I want) I run Matlab scripts on basically these 3 tables. At this stage all the code is vectorized and fast, as I'm working with time series loaded from SQL into Matlab. And calculating things like PnL or the whole equity curve is quite simple when your code is vectorized (Python is essentially the same thing with a different syntax). E.g. I can calculate the whole PnL curve of one contract like this:

```
dollPosWhenLong  = legPriceBid .* legPositions .* (legPositions > 0);
dollPosWhenShort = legPriceAsk .* legPositions .* (legPositions < 0);
dollPosTot = sum(abs(dollPosWhenLong) + abs(dollPosWhenShort), 2);
PnL = sum((longPnL + shortPnL), 2);
```

and then returns like this:

```
ret = PnL ./ lag(dollPosTot, 1);
```

and then the whole equity curve in dollar or percentage terms with just 2 lines of code:

```
compRet = cumprod(1 + ret) - 1;
cumPnL = cumsum(PnL);
```

which I can then plot with another line:

```
figure; plot(cumPnL)
```

(This is not the final plot, as I need to merge these time series for all the contracts of the current future, or merge all contracts in the system if I want to plot the overall performance.) So I essentially implemented all the analytics myself.

Also, when loading this data from SQL, I usually down-sample it again by loading e.g. only every 100th data point to make it faster. But I can still plot the most detailed graph with every point, which sometimes helps a lot with troubleshooting, as I also plot a bunch of other time series from that "Calculations" table. E.g. here are all the forecasts for 3KTB over time, and here is the main price plot with both bid and ask prices showing the filled orders. The nice thing Matlab can do is that you can put the cursor on any individual point, and you can set it up so that the tool-tip shows the actual database ID of that point, so you can go back to the DB and run a SQL query to see more details.

Actually in the beginning this was a very new problem to me - how do I even check what my system is doing?
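Since the poster notes that Python is "essentially the same thing with a different syntax", here is a rough pandas equivalent of the Matlab snippet above; the column layout and the long_pnl/short_pnl inputs are assumptions (those per-leg PnL series aren't shown in the original snippet either), so treat this as a sketch rather than the poster's actual code.

```python
import pandas as pd

def equity_curve(leg_price_bid, leg_price_ask, leg_positions, long_pnl, short_pnl):
    """All inputs are DataFrames indexed by timestamp, one column per leg.
    long_pnl / short_pnl are assumed to be computed elsewhere."""
    # dollar exposure per leg, split by side
    doll_pos_when_long = leg_price_bid * leg_positions * (leg_positions > 0)
    doll_pos_when_short = leg_price_ask * leg_positions * (leg_positions < 0)
    doll_pos_tot = (doll_pos_when_long.abs() + doll_pos_when_short.abs()).sum(axis=1)

    # total PnL per timestamp, and returns on lagged exposure
    pnl = (long_pnl + short_pnl).sum(axis=1)
    ret = pnl / doll_pos_tot.shift(1)

    comp_ret = (1 + ret).cumprod() - 1  # compounded percentage equity curve
    cum_pnl = pnl.cumsum()              # dollar equity curve
    return comp_ret, cum_pnl
```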
I mean, it's easy to reason about your code when you have like 5 data points/variables and 3 for-loops, but when you get a million different things happening every day, how do you even look at them all together? At first I remember thinking: OK, no worries, I'll just log everything - easy. So I logged everything, and got something like 3 million log records per day in the table, in the form "placing order for VIX at such price...", which, if I tried to read them all, I would still be reading. So plotting everything is the only way out: inspecting visually first, and then going back to SQL and/or the actual code if something looks wrong on the graph, as the graph immediately gives you a picture of what the system did over the whole time period. (Also, that's how I avoided writing unit tests for everything, which would've quadrupled my efforts otherwise.)
Reading the contributions of the past 2 weeks, I can't help but notice how much more basic, simplistic (and primitive?) my software setup is. None of those fancy features and none of those databases in my implementation. Anyway, as already mentioned by others, I also saw a nice increase in account value in the last couple of weeks. Compared to the beginning of June, the high water mark (HWM) has increased by some 15%. I had added more cash to the account in early June, and therefore the system was able to diversify a bit more and open a few more positions. Positive contributions came from lean hogs and XINA50, while lately metals (gold, silver, copper) have also been showing some very positive numbers. The green and blue lines refer to the left-hand axis (which isn't visible) and the drawdown relates to the right-hand vertical axis.
Let's all just sit and admire this post, 'proving'* that there is no positive correlation between how fancy your system is and how much money you make.

GAT

* Terms and conditions apply. A sample of 1 over a one-month period does not, by any means, represent anything within a million miles of statistical significance.
I do miss the plotting functionality in Matlab. So much nicer than the cryptic API of matplotlib in python. Pity that you have to pay for it. GAT
I did some work on the logging infra, so I wanted to share, maybe others find it useful. I mentioned previously that I use Google Cloud for some stuff; one of those things is logging. I have logging configured to log locally to disk (an SD card actually, it's a Raspberry Pi) and to Google Cloud (GCP). The whole thing was easy to set up. I don't want to run any agents on my Pi that listen for file changes and ship them to GCP (which is the standard way to do log collection), so I'm using Google's Python lib to send logs: https://github.com/googleapis/python-logging. I was concerned about performance, and for sure didn't want to make an RPC call to Google's DC every time I log something, but the library handles it well: it buffers and makes calls asynchronously in a separate thread. Since my system, like most, is heavily IO-bound, there's no impact on performance.

A nice thing is that it supports structured logging, so I can send arbitrary key-value pairs and later filter on those in the web UI (screenshot below). I use this, for example, for the environment: I have TEST, DEV and PROD environments, and I want to filter logs based on that (I don't send TEST logs to GCP, though). Another thing that was imperative to me is that I don't have to do anything special to log something; e.g. calling a function from that library everywhere was out of the question. This hooks nicely into Python's logging system by attaching a handler to the logger you want, so there's nothing in my code that says where the logs go; logging is a simple log.info("Message"). I pass in the environment when instantiating the logger once, which causes each log message to carry the env, and it shows up in the web UI as well; here's what that looks like. There's also a nice way to fetch and read those logs via the API, using either the Python client or the CLI they provide, but so far I've only used the web UI.

Then I configured a filter to send all logs with level ERROR or higher to a Pub/Sub topic, which on the other end triggers a Cloud Function (Google's equivalent of AWS Lambda) that forwards the message to Slack, so I can see in Slack if something is wrong. I'm not a fan of noisy systems, so tweaking what goes in there is still a work in progress; I only send stuff I absolutely need to see ASAP and that is actionable. This part doesn't work yet unfortunately, because of a bug in Cloud Functions, but the ETA for the fix was this past Friday.

Here's a screenshot of an order execution. I'm still tweaking the filters and log levels, so it's a bit noisy.
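For anyone wanting to try the same setup, here is roughly what the handler wiring can look like with google-cloud-logging; the logger name, log name, environment label, file path and the order-fill example are my own guesses, not the poster's actual configuration.

```python
import logging
from google.cloud import logging as gcp_logging
from google.cloud.logging.handlers import CloudLoggingHandler

# Sketch of one possible setup. The Cloud Logging handler is attached once;
# after that, the rest of the code just uses the standard logging module.

ENVIRONMENT = "PROD"  # assumption: passed in once at startup

client = gcp_logging.Client()
cloud_handler = CloudLoggingHandler(
    client,
    name="trading_system",                # hypothetical log name in GCP
    labels={"environment": ENVIRONMENT},  # filterable label in the web UI
)

log = logging.getLogger("trading")
log.setLevel(logging.INFO)
log.addHandler(cloud_handler)                       # to GCP (buffered, async)
log.addHandler(logging.FileHandler("system.log"))   # and to local disk

# Elsewhere in the code, logging stays plain; structured key-value pairs can
# be attached via the 'json_fields' extra supported by recent library versions.
log.info("Order filled", extra={"json_fields": {"symbol": "VIX", "qty": 2}})
```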
So you can actually have multiple "keys" for one log record, not just one? Like tags, essentially (e.g. the actual message "something went wrong" would have several tags like environment, severity, etc.)?