What do you think of backtrader? Is it any good? One of the other members who used to post built this https://github.com/chrism2671/PyTrendFollow
Actually this is exactly how the production code in pysystemtrade works; all the stuff that gets and processes prices, does back adjustments and so on, is an inheritance* of a data frame or series, identified according to whether it is an individual futures contract price, several prices as you'd use to calculate carry or do rolls, or a stitched adjusted price; whether it includes OHLC and volume or just closing prices; whether it is an FX price (all these objects are here if anyone cares). This also means there are some very specific object methods attached to most of these.

* Not quite the same as an encapsulation, as it means you get all the methods of a DataFrame exposed, which is potentially dangerous I suppose.

However, once inside the 'simulation' part (which also runs daily in live production to generate optimal positions) you are right, everything is just a data frame. Partly this is because I wrote the simulation code first, and partly it's to gain flexibility, which you need in a research system. The dynamic nature of the code does make this potentially dangerous for obvious reasons, a few of which @wopr has alluded to; another is that a change to the simulation code could break production (or worse, produce wrong positions), although regression testing should help with that (your live system simulation is the first thing you should test if you change something!). You could avoid releasing changes to simulation code to the production system, but then you will probably end up with two code bases to maintain. I do have some safeguards; for example, I don't estimate any parameters in the daily simulation, these are all hardwired. This also makes the thing run a lot faster.

I can see two obvious ways of tightening this up. The first is to 'do a @wopr' and bring the encapsulated objects into the simulation code. I'd possibly be tempted then to move some of the logic out of the 'system' and into methods of the objects; e.g. given a multiple prices data frame, tell me what the carry signal is please. However, this would make it harder for people to use and understand the system, as you'd have to be looking at lots of files to work out what all these different objects do rather than just following through the system logic; you'd also have to rewrite object methods to change the way the carry signal worked, for example... it does come down to the flexibility vs robustness argument, which you can't get away from if you try and combine production and simulation.

A slightly more extreme option is to turn the 'live' 'simulation' (if this is not a tautology) into a hardwired script once the details are nailed down, and indeed this is how my first trading system was implemented. Effectively then we are pretty much at the stage where live and simulation are separate, even if the script calls the same functions that the simulation would also call. I would caution against this however; although it sounds more robust, it often isn't: it makes it much harder to make and test changes, and it's a nightmare to debug if it goes wrong.

GAT
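To make the "inheritance of a data frame" idea concrete, here is a minimal sketch of the kind of object being described; the class name, required columns and the carry method are made up for illustration and are not the actual pysystemtrade classes.

```python
import pandas as pd

# Minimal sketch only, not real pysystemtrade code. A price object inherits
# from pd.DataFrame, so every DataFrame method is still exposed (the
# "dangerous" part), but it also carries domain-specific checks and methods.

class MultiplePrices(pd.DataFrame):
    """Hypothetical container for the prices needed to compute carry."""

    REQUIRED_COLUMNS = ["PRICE", "CARRY"]  # held contract and carry contract

    def __init__(self, data=None, *args, **kwargs):
        super().__init__(data, *args, **kwargs)
        missing = [c for c in self.REQUIRED_COLUMNS if c not in self.columns]
        if missing:
            raise ValueError(f"MultiplePrices is missing columns: {missing}")

    def raw_carry(self) -> pd.Series:
        # Illustrative definition only: held contract price minus carry
        # contract price, with no scaling or annualisation.
        return self["PRICE"] - self["CARRY"]


prices = MultiplePrices({"PRICE": [100.0, 101.0], "CARRY": [99.5, 100.2]})
print(prices.raw_carry())        # domain-specific method
print(prices.rolling(2).mean())  # ...but all DataFrame methods still leak through
```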
Backtrader is amazing if you have a single-strategy system and are trading stocks. I was really surprised how easy it is to pick up the concepts and create a single-strategy stock system. The docs are great too, and you get a lot of visualizations out of the box. Concepts like Sizers, Analyzers and Observers are well defined and allow for some extensibility. They use a similar approach to Rob's in pysystemtrade: they have "datas" (the equivalent of the Pandas DataFrame) which are passed around, so theoretically you can compute anything you want and stick it in there. I had much less success with multi-strategy and didn't even try with futures; from what I can see, you can only define one multiplier for an entire strategy, which means you can't test on multiple markets at once. Ways to distribute funds between strategies are also limited. I haven't seen PyTrendFollow before, thanks for sharing, I'll check it out.

Exactly, and this is where it's most clear that our systems serve different purposes. My system currently can't do any parameter estimation, and any research is quite hard. If I want to estimate some parameter, I start a Jupyter notebook, import the repositories that give me price data, and start from there. Nothing in the system can help me. Mine is as far away from a research system as possible, which makes the software design side of the problem much simpler.

I'd first say that I'm not sure anything in pysystemtrade needs "tightening up" as you say, unless you are experiencing some pains from the current design or are getting feedback from users that they find it hard to understand. That's the challenge with open source: some users of the software might not have the correct mental model of how the system works. For the rest of us it's easier, we only have one user.

That said, my next project in trying to improve this interface between research and production in my system is to see if I can somehow leverage Pandas DataFrame.apply(). It's a sort of inversion of control: you can have your logic in methods of objects, or pure functions, and then pass them into this other part of the system to do an optimized calculation. The object or function you're passing in has to take a pandas Series, which does leak the implementation a bit, but that can be approximated well enough with a list in the rest of the system and in testing. I'm not sure what that would look like yet, but it seems worth a shot.
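A rough sketch of the DataFrame.apply() idea described above (the function name, forecast rule and market names are made up, not part of either system): the logic lives in a plain function that takes a pandas Series, and the other part of the system just applies it column by column.

```python
import pandas as pd

def ewmac_forecast(prices: pd.Series, fast: int = 16, slow: int = 64) -> float:
    """Toy forecast: difference of two exponential moving averages,
    evaluated at the last point of the series."""
    fast_ewma = prices.ewm(span=fast).mean()
    slow_ewma = prices.ewm(span=slow).mean()
    return float((fast_ewma - slow_ewma).iloc[-1])

# prices_by_market: one column per market, rows are timestamps (dummy data)
prices_by_market = pd.DataFrame(
    {"VIX": [20.1, 19.8, 21.0, 22.5], "3KTB": [108.2, 108.4, 108.1, 108.3]}
)

# apply() calls the function once per column, passing it a Series; in tests
# the same function can be fed a plain list wrapped in pd.Series.
forecasts = prices_by_market.apply(ewmac_forecast)
print(forecasts)
```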
So I do it this way: while the system is running and trading, I'm recording everything into basically 3 SQL tables: Prices (timestamp, bid, ask...), Orders, and "Calculations" (the actual name is different, but it's stupid and I'd rename it if it wasn't so ingrained in all the other code already and I could think of a better name; that table currently has like 80 columns and I keep adding more from time to time). Because the system is event-driven, every new price tick goes through the whole system and I repeat all the calculations for each one (forecasts, positions, etc.), and each tick might result in an order (or a change in the limit price for an existing order) in the end. So these 3 tables are of course huge (tens of gigabytes, tens of millions of records right now). I decided from the beginning that the system should operate at the "natural resolution of the problem", so I record every tick and then I can always down-sample it if I want (in hindsight a huge overkill, but too late to change it now).

The first time I reduce that resolution is daily. I realised that I really don't need to be that precise and record everything only after the system was mostly written, so I just added a daily SQL job which deletes all ticks and associated calculations (if the tick doesn't have an associated order) except 1 per minute, so effectively at the end of the day the system stores 1-minute resolution. That real-time part of the system is all "regular programming": for-loops, classes, etc.

Then weekly (or at any time I want) I run Matlab scripts on basically these 3 tables. At this stage all the code is vectorized and fast, as I'm working with time series loaded from SQL into Matlab. And calculating things like PnL or the whole equity curve is quite simple when your code is vectorized (Python is essentially the same thing with a different syntax). E.g. I can calculate the whole PnL curve of one contract like this:

```
dollPosWhenLong  = legPriceBid .* legPositions .* (legPositions > 0);
dollPosWhenShort = legPriceAsk .* legPositions .* (legPositions < 0);
dollPosTot = sum(abs(dollPosWhenLong) + abs(dollPosWhenShort), 2);
PnL = sum((longPnL + shortPnL), 2);
```

and then returns like this:

```
ret = PnL ./ lag(dollPosTot, 1);
```

and then the whole equity curve in dollar or percentage terms with just 2 lines of code:

```
compRet = cumprod(1 + ret) - 1;
cumPnL = cumsum(PnL);
```

which I can then plot with another line:

```
figure; plot(cumPnL)
```

(This is not the final plot, as I need to merge these time series for all the contracts of the current future, or merge all contracts in the system if I want to plot the overall performance.) So I essentially implemented all the analytics myself.

Also, when loading this data from SQL, I usually down-sample it again by loading e.g. only every 100th data point to make it faster. But I can still plot the most detailed graph with every point, which sometimes helps a lot with troubleshooting, as I also plot a bunch of other time series from that "Calculations" table. E.g. here are all the forecasts for 3KTB over time, and here is the main price plot with both bid and ask prices showing the filled orders. The nice thing Matlab can do is that you can put the cursor on any individual point, and you can set it up so that the tool-tip shows the actual database ID of that point, so you can go back to the DB and run a SQL query to see more details.

Actually in the beginning this was a very new problem to me - how do I even check what my system is doing?
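Since the poster notes that Python is "essentially the same thing with a different syntax", here is a rough pandas equivalent of the Matlab snippet above; the column layout and the long_pnl/short_pnl inputs are assumptions (those per-leg PnL series aren't shown in the original snippet either), so treat this as a sketch rather than the poster's actual code.

```python
import pandas as pd

def equity_curve(leg_price_bid, leg_price_ask, leg_positions, long_pnl, short_pnl):
    """All inputs are DataFrames indexed by timestamp, one column per leg.
    long_pnl / short_pnl are assumed to be computed elsewhere."""
    # dollar exposure per leg, split by side
    doll_pos_when_long = leg_price_bid * leg_positions * (leg_positions > 0)
    doll_pos_when_short = leg_price_ask * leg_positions * (leg_positions < 0)
    doll_pos_tot = (doll_pos_when_long.abs() + doll_pos_when_short.abs()).sum(axis=1)

    # total PnL per timestamp, and returns on lagged exposure
    pnl = (long_pnl + short_pnl).sum(axis=1)
    ret = pnl / doll_pos_tot.shift(1)

    comp_ret = (1 + ret).cumprod() - 1  # compounded percentage equity curve
    cum_pnl = pnl.cumsum()              # dollar equity curve
    return comp_ret, cum_pnl
```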
I mean, it's easy to reason about your code when you have like 5 data points/variables and 3 for-loops, but when you get a million different things happening every day, how do you even look at them all together? At first I remember thinking: OK, no worries, I'll just log everything - easy. So I logged everything, and got something like 3 million log records per day in the table, in the form "placing order for VIX at such price...", which, if I tried to read them all, I would still be reading. So plotting everything is the only way out: inspecting visually first, and then going back to SQL and/or the actual code if something looks wrong on the graph, as the graph immediately gives you a picture of what the system did over the whole time period. (Also, that's how I avoided writing unit tests for everything, which would've quadrupled my efforts otherwise.)
Reading the contributions of the past 2 weeks, I can't help but notice how much more basic, simplistic (and primitive?) my software setup is. None of those fancy features and none of those databases in my implementation. Anyway, as already mentioned by others, I also saw a nice increase in account value in the last couple of weeks. Compared to the beginning of June, the high water mark (HWM) has increased by some 15%. I had added more cash to the account in early June, and therefore the system was able to diversify a bit more and open a few more positions. Positive contributions came from lean hogs and XINA50, while lately metals (gold, silver, copper) have also been showing some very positive numbers. The green and blue lines refer to the left-hand axis (which isn't visible) and the drawdown relates to the right-hand vertical axis.
Let's all just sit and admire this post, 'proving'* that there is no positive correlation between how fancy your system is and how much money you make.

GAT

* Terms and conditions apply. A sample of 1 over a one-month period does not, by any means, represent anything within a million miles of statistical significance.
I do miss the plotting functionality in Matlab. So much nicer than the cryptic API of matplotlib in python. Pity that you have to pay for it. GAT
I did some work on the logging infra, so I wanted to share, maybe others find it useful. I mentioned previously that I use Google Cloud for some stuff; one of those things is logging. I have logging configured to log locally to disk (an SD card actually, it's a Raspberry Pi) and to Google Cloud (GCP). The whole thing was easy to set up. I don't want to run any agents on my Pi that listen for file changes and ship them to GCP (which is the standard way to do log collection), so I'm using Google's Python lib to send logs: https://github.com/googleapis/python-logging. I was concerned about performance, and for sure didn't want to make an RPC call to Google's DC every time I log something, but the library handles it well: it buffers and makes calls asynchronously in a separate thread. Since my system, like most, is heavily IO-bound, there's no impact on performance.

A nice thing is that it supports structured logging, so I can send arbitrary key-value pairs and later filter on those in the web UI (screenshot below). I use this, for example, for the environment: I have TEST, DEV and PROD environments, and I want to filter logs based on that (I don't send TEST logs to GCP, though). Another thing that was imperative to me is that I don't have to do anything special to log something; e.g. calling a function from that library everywhere was out of the question. This hooks nicely into Python's logging system by attaching a handler to the logger you want, so there's nothing in my code that says where the logs go; logging is a simple log.info("Message"). I pass in the environment when instantiating the logger once, which causes each log message to carry the env, and it shows up in the web UI as well; here's what that looks like. There's also a nice way to fetch and read those logs via the API, using either the Python client or the CLI they provide, but so far I've only used the web UI.

Then I configured a filter to send all logs with level ERROR or higher to a Pub/Sub topic, which on the other end triggers a Cloud Function (Google's equivalent of AWS Lambda) that forwards the message to Slack, so I can see in Slack if something is wrong. I'm not a fan of noisy systems, so tweaking what goes in there is still a work in progress; I only send stuff I absolutely need to see ASAP and that is actionable. This part doesn't work yet unfortunately, because of a bug in Cloud Functions, but the ETA for the fix was this past Friday.

Here's a screenshot of an order execution. I'm still tweaking the filters and log levels, so it's a bit noisy.
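For anyone wanting to try the same setup, here is roughly what the handler wiring can look like with google-cloud-logging; the logger name, log name, environment label, file path and the order-fill example are my own guesses, not the poster's actual configuration.

```python
import logging
from google.cloud import logging as gcp_logging
from google.cloud.logging.handlers import CloudLoggingHandler

# Sketch of one possible setup. The Cloud Logging handler is attached once;
# after that, the rest of the code just uses the standard logging module.

ENVIRONMENT = "PROD"  # assumption: passed in once at startup

client = gcp_logging.Client()
cloud_handler = CloudLoggingHandler(
    client,
    name="trading_system",                # hypothetical log name in GCP
    labels={"environment": ENVIRONMENT},  # filterable label in the web UI
)

log = logging.getLogger("trading")
log.setLevel(logging.INFO)
log.addHandler(cloud_handler)                       # to GCP (buffered, async)
log.addHandler(logging.FileHandler("system.log"))   # and to local disk

# Elsewhere in the code, logging stays plain; structured key-value pairs can
# be attached via the 'json_fields' extra supported by recent library versions.
log.info("Order filled", extra={"json_fields": {"symbol": "VIX", "qty": 2}})
```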
So you can actually have multiple "keys" for one log record, not just one? Like tags, essentially (e.g. the actual message "something went wrong" would have several tags like environment, severity, etc.)?