Fully automated futures trading

Discussion in 'Journals' started by globalarbtrader, Feb 11, 2015.

  1. Kernfusion

    Yeah, why can good times never last with trend following? :) Now whenever I see a jump to new highs, I just want to sell everything and stop trading forever, because I know that most of it will almost certainly be given back in the next few days :)
    The way I do logging is I have just one simple db table for all logs. Each log message has a "severity" column, which can take many int values; the higher, the more critical. Then in the code I have logic so that all log messages with severity > X are also emailed to me. During development I changed that emailing severity threshold several times, and also added new severity numbers in between existing ones, to get the right balance between what I want to see in emails and what is fine to only have in the database. I also have logic which prevents sending more than X emails per minute and more than Y per hour (the remaining attempts are ignored until the current minute/hour ends), because getting e.g. 1k emails in 1 minute wouldn't be very useful anyway. And for details I can always go and run a select on that log table.
    Apart from severity I also log the timestamp, the message itself and a "source", which indicates the part of the system where the situation occurred. So it's a pretty simple structure, and it has been enough for me, because usually I only filter by severity and time, and sometimes by source. If I'm looking for something more specific, I search by a phrase in the message with a LIKE '%...%' query.
    But those are the unstructured/unpredictable log messages; I don't build reports from them. All the concrete, expected decisions the system takes (e.g. raw position after receiving a tick, raw and final forecast values, etc.) I record in more structured tables, where each value has its own column. So I can use those tables to plot all sorts of things and run specific queries against specific values stored in separate columns.
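    As a rough sketch of the idea (Python and SQLite here purely for illustration, my actual system isn't Python, and send_email() is a hypothetical stand-in for my real mailer):
    Code:
    import sqlite3
    import time
    from collections import deque

    EMAIL_SEVERITY = 40                 # severity > this also gets emailed
    MAX_PER_MINUTE, MAX_PER_HOUR = 5, 20
    sent_times = deque()                # timestamps of recently emailed msgs

    db = sqlite3.connect("logs.db")
    db.execute("CREATE TABLE IF NOT EXISTS log "
               "(ts REAL, severity INTEGER, source TEXT, message TEXT)")

    def send_email(subject, body):      # hypothetical stand-in for a mailer
        print("EMAIL:", subject)

    def log(severity, source, message):
        now = time.time()
        db.execute("INSERT INTO log VALUES (?, ?, ?, ?)",
                   (now, severity, source, message))
        db.commit()
        if severity <= EMAIL_SEVERITY:
            return                      # below emailing threshold, db only
        while sent_times and now - sent_times[0] > 3600:
            sent_times.popleft()        # forget emails older than an hour
        last_minute = sum(1 for t in sent_times if now - t <= 60)
        if last_minute < MAX_PER_MINUTE and len(sent_times) < MAX_PER_HOUR:
            sent_times.append(now)      # both rate caps ok, send it
            send_email("[%s] severity %d" % (source, severity), message)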
     
    Last edited: Jul 3, 2020
    #2211     Jul 3, 2020
    .sigma and globalarbtrader like this.
  2. wopr

    Rob, if I may, I'd suggest trying out systemd timers for periodic tasks instead of cron; I'm using them in my system. It's highly likely systemd is already running on your machine, so you don't have to install anything.
    Here's a taste, a timer config that triggers my forecasts every weekday:
    Code:
    [Timer]
    OnCalendar=Mon..Fri 14:40
    
    If you have really weird demands, you can also do something like this (this is my price sync, which I run on a weird schedule):
    Code:
    [Timer]
    OnCalendar=Mon..Thu 01,02,03,04,05,06,07,08,09,10,11,12,13,15,16,17,18,19,20,21,22,23:11,31,51
    OnCalendar=Sun 15,16,17,18,19,20,21,22,23:11,31,51
    OnCalendar=Fri 01,02,03,04,05,06,07,08,09,10,11,12,13:11,31,51

    There are many other benefits as well. This is a good intro article: https://moshib.in/posts/replacing-cron-jobs-with-systemd-timers/

    Heh, same here, dang grains. I was a solid 3% up for the month, but then that went out the window :)

    The way you folks have been teasing this, especially Niels, I'm really excited for it. :)

    In the meantime, I'm redesigning my database schema to prepare for loading up a lot more markets (I'm planning to buy historical data from CSIData). I'll be writing up a blog post, but here's a preview.

    [Image: database schema diagram]
     
    #2212     Jul 4, 2020
    globalarbtrader and Elder like this.
  3. That's pretty similar to me. I like the idea of throttling the number of emails per day or something, although since my severity level is set pretty high, this isn't normally a problem unless something goes badly wrong (or there is a large price movement, resulting in a lot of price thresholds being hit that require manual checking).

    I think the main difference is that I'm using fixed attributes (as well as the source), rather than free text search, to label and find log entries.

    This is something I had in my previous system but haven't yet set up in this one. I pickle the results of the daily backtest that generates optimal positions and buffers, but nothing else after that.

    An easy way to do this, which just occurred to me, is to introduce a new field, 'value', into my logging, which means I can write a value with some attributes and then pull those into whatever report I need. It means I can keep the flexibility of the attributes idea, since in my experience trying to nail down exactly what should be in these diagnostics can be tricky.
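    Roughly this, as a hypothetical sketch (a plain list stands in for my Mongo collection, and all the attribute names are made up):
    Code:
    # log entries as attribute dicts plus an optional value, so diagnostics
    # can be written flexibly now and pulled into reports later
    logs = []   # stands in for a Mongo collection

    def log_value(value, **attributes):
        logs.append(dict(attributes, value=value))

    # writing diagnostics from anywhere in the system:
    log_value(0.73, type="forecast", instrument="SP500", rule="ewmac32")
    log_value(-2, type="position", instrument="US10")

    # pulling them into a report: filter on attributes, collect the values
    forecasts = [e["value"] for e in logs if e.get("type") == "forecast"]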

    GAT
     
    #2213     Jul 6, 2020
  4. Thanks, will look at systemd. One of the problems I have is that anything that has appeared since I stopped working (in 2013) will pass me by unless I come across it or someone gives me a heads up. I do try and keep up, but it's not easy...

    I'm very jealous of your schema picture: how did you generate it? (I assume not with hours of work in MS PowerPoint?) I could do with one myself.

    GAT
     
    #2214     Jul 6, 2020
  5. wopr

    Thanks! Not hours of work in MS PowerPoint, but MS Paint. :D Joking.
    This was generated with the PgModeler tool, https://pgmodeler.io/
    It's free if you compile it from source yourself, and they also have a free demo which works well if your database doesn't have a lot of tables; the demo limits the number of tables.

    However, if I recall correctly, you use something MongoDB-backed for storing prices, this, right: https://github.com/man-group/arctic? So this tool won't do it for that, but if you have any other stuff in SQL, it would work.
     
    #2215     Jul 6, 2020
  6. Yes, I use Mongo for everything now, with Arctic on top for time series. I appreciate it will be a lot harder for MongoDB, since you have to allow for the possibility that field names will be different across documents (== records), and infer the common field names across databases, but I will have a google and see if someone has done it...
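    The brute-force version of that inference would be something like this (a sketch using pymongo; the db and collection names are hypothetical):
    Code:
    # infer a rough common schema by sampling documents and counting
    # which field names appear (assumes pymongo; names are hypothetical)
    from collections import Counter
    from pymongo import MongoClient

    coll = MongoClient()["production"]["positions"]

    field_counts = Counter()
    n_docs = 0
    for doc in coll.find().limit(1000):   # sample rather than a full scan
        field_counts.update(doc.keys())
        n_docs += 1

    # fields present in (nearly) every sampled document = common schema
    common = sorted(f for f, c in field_counts.items() if c >= 0.95 * n_docs)
    print(common)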

    GAT
     
    #2216     Jul 6, 2020
  7. Kernfusion

    Yeah, ideally if everything works well I should receive exactly 3 emails, 2 times a day, when the system starts after the morning and evening restarts for maintenance, and these are mostly informative log messages like "the system started, our current positions are: ..." for which I specifically increased the severity. For any other "normal" log messages the severity is lower than the emailing threshold.
    I also set up independent email count thresholds (per minute and per hour) because if something goes haywire, it might generate lots of emails in a matter of seconds, and I don't want to see them all because they will probably be identical; but I also don't want to block all emails after such an event for the rest of the day, as something else might happen 6 hours later and I want to see the first X emails of that event too. So each hour and each minute have their own max email count.

    Yes, I think sometimes instead of raw text people save structured documents in the "message" field of logs, like XML/JSON, etc., and then they can search these structures more effectively, parsing out specific expected values. Another approach is to enable full-text search on the db server, which will automatically index the raw text and speed up search. But for me a simple query like '%MyText%' works fast enough when I also limit the search by timestamps. (And I only search these free-form logs when I want to do a deep investigation of what exactly my system did in a given time period, which is usually more qualitative than quantitative.)
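    That kind of timestamp-bounded text search looks roughly like this (a sketch against the hypothetical log table from my earlier post, with example epoch bounds):
    Code:
    # free-text search in the log table, bounded by time first so the
    # LIKE scan stays fast (table/column names as in the earlier sketch)
    import sqlite3

    db = sqlite3.connect("logs.db")
    rows = db.execute(
        "SELECT ts, severity, source, message FROM log "
        "WHERE ts BETWEEN ? AND ? AND message LIKE ? ORDER BY ts",
        (1593734400, 1593820800, "%MyText%"),
    ).fetchall()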

    Also, your system is much more dynamic than mine in general (not sure if it's the influence of the Python language or because it's more geared towards research and experimentation). You dynamically link functions at runtime, can define the whole pipeline from the config, etc. In my system almost everything is "nailed to the floor" :) and predefined from the beginning. Like, if I'm going to have an EWMA32 forecast value somewhere and want to plot it in my reports later, then there must be a specific column in some table to store that exact value. If there's no such column, sorry, go back and create one, then define all the logic around saving it, and only then will this all become possible :)
    It's more work to change such a "static" system, but it keeps things more predictable and clear (I think)..
    But I don't use that system for back-testing and research (I can run historical prices through it just to check everything, but it's slow and inflexible); its purpose is to be a real-time trading engine only. For experiments and research a more dynamic approach would of course be better..
     
    Last edited: Jul 7, 2020
    #2217     Jul 6, 2020
  8. traider

    traider

    Any chance that you open-source your system, without the rules?
     
    #2218     Jul 7, 2020
  9. Kernfusion

    The rules are not a secret, it's all in Rob's books and blog :) Actually, I only use a subset of them.. (+ the small stock-pairs trading part, which just keeps losing my money on commissions without really making any :) )
    I wasn't planning to open-source it.. I'm not sure there's much value in it, as it's just a mechanical thing that implements known science.. Also it's quite bulky and spread out (it has many components: the db, a bunch of services, Matlab scripts, SQL queries to monitor and update things..), and even installing it is not straightforward (god forbid my server breaks down and I have to move it to another one, that would be a nightmare :) ), with all the individual services and db tables that need to be populated with the correct stuff.. And of course, there's no documentation whatsoever :) I would personally never trust something like that downloaded from the internet enough to start trading my own money with it :) I think a much better approach is for people to write the trading code themselves, which they will then intrinsically understand and be able to maintain, fix and trust.. I'm happy to share the implementation approaches I took, although the "value" of that knowledge doesn't really go beyond mundane software development..
     
    Last edited: Jul 7, 2020
    #2219     Jul 7, 2020
  10. wopr

    I've been reading the last few posts between you and Rob on this topic and thinking about where my system fits in, and I think I'm smack in the middle between the very dynamic nature of Rob's system and the static nature of yours.
    I don't stitch functions together at runtime, but I also don't store static values in the DB.
    The struggle for me was having the design of the system resemble the actual domain of a trading system while still maintaining flexibility. That's vague, so here's what I mean, explained with the example of pandas.
    The pandas structures that store data and allow the efficient operations we sometimes need, DataFrame and Series, really have no place in a sound system design. For example, if you have a series of prices, that tells you nothing about what you can do with them. Can you compute risk from these prices? Are they adjusted or not? Are they for the currently traded contract or the carry contract? Which currency are they in? In addition, the dynamic nature of DataFrames means any part of the system can easily add arbitrary columns, so encapsulation is hard (also, as a software engineer by trade, it was hard for me to do something like that :) ). Additionally, I look at pandas as an implementation detail. I'm using pandas today, but it might be something different tomorrow, who knows. If I have to change that, I don't want to have to change hundreds of places in my code.
    I almost look at it this way: I'm not interested in this pandas DataFrame or running a rolling mean on it; what I really want to do is compute instrument risk from historical adjusted prices.
    However, one of the things I decided early on was that I have to be able to run backtests on the same system I trade with in production, so I do like some of the benefits pandas gives me, mainly performant operations on a lot of data.
    So the way I overcame the problem was to use pandas and DataFrames and all that, but encapsulated behind a well-defined interface expressed in terms of the domain. For example, I have an AdjustedPrices class that has a method calculate_risk(lookback). I have another class called MultiplePrices that has a method calculate_carry(). Both use DataFrames under the hood, but the rest of the system doesn't know that and doesn't care. It also prevents me from making the mistake of, say, calculating risk on carry prices.
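    In outline it looks something like this (a sketch; the method bodies are illustrative, only the class and method names match what I described above):
    Code:
    # domain classes wrap pandas so the rest of the system never sees it
    import pandas as pd

    class AdjustedPrices:
        def __init__(self, prices: pd.Series):
            self._prices = prices            # pandas hidden inside

        def calculate_risk(self, lookback: int) -> float:
            # e.g. annualised stdev of daily returns over the lookback
            daily = self._prices.pct_change().tail(lookback)
            return float(daily.std() * 16)   # 16 ~ sqrt(256 business days)

    class MultiplePrices:
        def __init__(self, price: pd.Series, carry: pd.Series):
            self._price, self._carry = price, carry

        def calculate_carry(self) -> pd.Series:
            # raw carry: price contract minus carry contract
            return self._price - self._carry

    # the rest of the system only talks to the domain interface:
    # risk = AdjustedPrices(adjusted).calculate_risk(lookback=25)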

    As I mentioned above, I do use the same system for trading and backtesting. Just recently, I've been looking to extend the backtesting capabilities a lot (the price data store redesign I posted a picture of above; I'll be storing data for 100+ markets, and I'd also like better stats), so I looked around at how some other tools do backtesting and was surprised to see that this is not a solved problem at all, especially for futures.
    I played with backtrader, quantstrat (in R) and Amibroker (a standalone application), even considering giving up on my "same system for backtests as for trading" mantra if I found something good, but TL;DR: if you have a multi-strategy futures system, you're on your own. So now I'm hacking something of my own, currently trying to see if I can use pyfolio to just render the results. I'm not looking for much, just basic stats around performance, drawdowns, costs and trading speed, plus some charts.
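    For the pyfolio part, the entry point I'm eyeing is roughly this (a sketch; it assumes the backtest has already produced a daily returns series, and the file name is hypothetical):
    Code:
    # render basic performance stats, drawdowns and charts from a daily
    # returns series (create_simple_tear_sheet is a real pyfolio function)
    import pandas as pd
    import pyfolio as pf

    returns = pd.read_csv("backtest_returns.csv",   # hypothetical output
                          index_col=0, parse_dates=True).squeeze("columns")

    pf.create_simple_tear_sheet(returns)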

    I've seen how Rob does it in pysystemtrade; this is where things are really nice if you have all the interim data of your system in DataFrames, since rendering charts and computing stats is super easy. But how do others do it?
     
    #2220     Jul 9, 2020
    Elder likes this.