Building a systematic system part one - data capture

globalarbtrader · Apr 28, 2015

A while ago I ran a series of posts on how you would write some Python code to systematically trade using the Interactive Brokers C++ API.

Whilst I hope this was helpful, it was just a starting point. There are at least two major projects to undertake before one could actually trade. The first is the design of such a system. This is the subject of a book I am writing, which I hope will be published (although, if I am being honest, writing this blog post is displacement activity to avoid proofreading duties). The second is the implementation; the nuts and bolts if you like. Apart from a single post on execution I've not given you many hints in this direction.

Thanks to overwhelming demand I've decided to write a series of posts on the issues around implementing such a system, the key decisions you'd have to make, and some options for solving the problems involved.

The rest is on my blog:
http://qoppac.blogspot.co.uk/2015/04/system-building-data-capture.html

rohan2008 · Apr 28, 2015

IMHO: you might want to add notes about forward contract tick/candle data as well. This can help for spread trading. I collect forward contract info although I don't trade forward contracts; the change in the spread can at times help one's strategies.

volpunter · Apr 29, 2015

Thanks for sharing your thoughts on your Python implementation. I may not agree Python is the right tool for this, but that aside I have a couple of points that disagree with points made in your blog. I hope you can take the criticism below in a constructive way, because I have added an explanation and rationale to each point made:

* You said "I am trading futures, in a fully automated system, which is relatively slow and where latency is not an issue, using only price data. "
-> I am not sure I agree: algorithmic futures trading is anything but slow, and latency is of utmost importance. Why else do you think there are microwave mechanisms in place between Aurora and Mahwah? Of course you can choose to implement an architecture that is only capable of trading low latency strategies, but certainly futures market structure and market capability is more in the realm of microseconds.

* "Proactive or passive tick response" -> This should be a non-issue. Virtually every broker API and trading architecture is built on an event-processing model. Incoming pricing data are the defined events and your system ought to react in response to the incoming pricing data. I have not seen any rationale on your end for why you would even consider a "pull implementation" where you ask a broker or data vendor for prices (other than of course for backfill purposes or to acquire historical back-testable data series). You can internally implement a timer if you so wish and build candles that way.

* "Open, Close and intraday" -> I do not fully understand what you are trying to say here. If you want to develop and test a strategy then you need historical data to test over. In the case of futures contracts that would be the historical data of each single contract from the first day of trading until the expiration of the contract. Tick based historical data are nowadays very easy to come by so I am not sure why you make the latency of your system a function of the availability of your captured data. You can cheaply purchase high precision futures contract exchange data. Also, closing prices are NOT untrustworthy. Closing prices are what they are: Official closing prices. And trading at the next session's open just because you cannot get a fill at closing prices may not be the best idea: If you truly care to close your position before the end of the trading session in a given contract then there are a myriad of other, better, ways to close the position near or at the closing price. By the way the following US exchanges all offer Market on Close order type capability: CBOT, CME, KCBOT, MGE, NYBOT, COMEX, NYMEX

* When and how often? -> As said above if you deal with intraday data then you should not pull data but subscribe to events to have incoming data streamed to your platform.

* Irregular timeseries -> You think about this the wrong way around: To build daily compressed time series from intraday time series you need to either have the complete set of intraday data (down to the tick) or else accept that your daily compressed bars will be inaccurate. Why you want to re-invent the wheel in the first place is beyond me (data providers that make available tick-based intraday live and historical data as well as daily data are very cheap nowadays, especially when limiting oneself to futures data), but building daily compressed data points from only intraday snapshots is a horrible way to go about things, especially with a sampling frequency of 1 hour: You will almost necessarily be off by several ticks on each of your high and low of the day, because highs and lows occur with a very high probability in between your sampled snapshots. You make this tragically complex and error prone the way you described it.

* "TimeStamps": Why are you concerning yourself with this issue at all? You are dealing with exchange traded futures contracts, hence the price data you receive should already contain the official exchange time stamp, not a broker time stamp, not your own time stamp but an official exchange time stamp. Simple as that, done!

* "Getting synchronised tradeable intraday prices isn't easy, except when an explicit market in the spread is quoted (as for calendar spreads in certain markets, like Eurodollar)."
-> This is not true: You simply subscribe to all the open Eurodollar contracts and receive all live streaming prices for each traded contract in Eurodollar. Simple as that. No need to synchronize. For historical data backtesting you simply store the timestamped contracts and read them in a timestamp sorted fashion.

* Spikes and cleaning -> There should not be different options. When you receive a price of zero or one that lies x standard deviations away from the previously traded prices then that is an erroneous quote, period. You filter it out and are done.

* Volumes – beware You say: "One general reason is that as a rule volume data doesn't seem to be as reliable as price data. "
-> that is not true: The volume shown at a specific trade with time stamp is exactly that. Hence the cumulative volume during a given trading session can be 100% accurately determined.
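
As a rough illustration of the "Proactive or passive tick response" point above (react to pushed prices and build candles internally), here is a minimal Python sketch. `CandleBuilder` and `on_tick` are hypothetical names invented for this example and are not part of any broker API; it assumes the feed calls `on_tick` once per incoming price with a timestamp, and it buckets by timestamp rather than by a wall-clock timer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Candle:
    start: datetime
    open: float
    high: float
    low: float
    close: float


class CandleBuilder:
    """Aggregates pushed ticks into fixed-width candles (hypothetical helper)."""

    def __init__(self, width: timedelta):
        self.width = width
        self.current = None   # candle still being built
        self.finished = []    # completed candles, oldest first

    def _bucket_start(self, ts: datetime) -> datetime:
        # Floor the timestamp to a multiple of the candle width.
        epoch = datetime(1970, 1, 1)
        return epoch + ((ts - epoch) // self.width) * self.width

    def on_tick(self, ts: datetime, price: float) -> None:
        """Callback a streaming feed would invoke for every incoming price."""
        start = self._bucket_start(ts)
        if self.current is None or start != self.current.start:
            # A tick in a new bucket closes the previous candle.
            if self.current is not None:
                self.finished.append(self.current)
            self.current = Candle(start, price, price, price, price)
        else:
            c = self.current
            c.high = max(c.high, price)
            c.low = min(c.low, price)
            c.close = price


# Made-up ticks standing in for a live stream.
builder = CandleBuilder(timedelta(minutes=1))
ticks = [
    (datetime(2015, 4, 29, 9, 30, 5), 100.0),
    (datetime(2015, 4, 29, 9, 30, 40), 100.5),
    (datetime(2015, 4, 29, 9, 30, 55), 99.75),
    (datetime(2015, 4, 29, 9, 31, 2), 100.25),
]
for ts, price in ticks:
    builder.on_tick(ts, price)
```

In this sketch a candle is only closed when the first tick of the next bucket arrives, so a quiet market leaves the last candle open; a timer would be needed to close it on schedule.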
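
The suggestion in the Eurodollar point above, to store timestamped ticks per contract and read them back in timestamp-sorted order, could be sketched roughly as follows. The contract symbols, prices and the `replay` helper are made up for illustration, and each stored stream is assumed to be already sorted by exchange timestamp.

```python
import heapq
from datetime import datetime

# Hypothetical stored ticks: (exchange timestamp, contract, price).
front = [
    (datetime(2015, 4, 29, 14, 0, 1), "EDM5", 99.720),
    (datetime(2015, 4, 29, 14, 0, 4), "EDM5", 99.725),
]
back = [
    (datetime(2015, 4, 29, 14, 0, 2), "EDU5", 99.600),
    (datetime(2015, 4, 29, 14, 0, 3), "EDU5", 99.605),
]


def replay(*streams):
    """Yield ticks from several time-sorted streams in global timestamp order."""
    return heapq.merge(*streams, key=lambda tick: tick[0])


last = {}      # latest price seen per contract
spreads = []   # (timestamp, front minus back) once both legs have a price
for ts, contract, price in replay(front, back):
    last[contract] = price
    if len(last) == 2:
        spreads.append((ts, round(last["EDM5"] - last["EDU5"], 4)))
```

Note that every spread value here pairs the newest tick in one leg with an older tick in the other, so the two legs are ordered correctly but are not simultaneous quotes.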
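
The rule in the "Spikes and cleaning" point above (drop a price of zero, or one that lies x standard deviations from recently traded prices) might look something like this in Python. The class name, the window length, the threshold of 5 standard deviations and the minimum history are all assumptions chosen for the example, not values from the thread.

```python
from collections import deque
from statistics import mean, stdev


class SpikeFilter:
    """Rejects non-positive prices and prices far from recently accepted ones."""

    def __init__(self, window: int = 20, max_sigmas: float = 5.0, min_history: int = 5):
        self.history = deque(maxlen=window)   # recently accepted prices
        self.max_sigmas = max_sigmas
        self.min_history = min_history

    def accept(self, price: float) -> bool:
        if price <= 0:
            return False
        # Only apply the deviation test once there is enough history.
        if len(self.history) >= self.min_history:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0 and abs(price - mu) > self.max_sigmas * sigma:
                return False
        self.history.append(price)
        return True


# Made-up price series containing a zero and an obvious spike.
prices = [100.0, 100.25, 100.5, 100.25, 100.0, 100.25, 0.0, 1000.0, 100.5]
filt = SpikeFilter()
clean = [p for p in prices if filt.accept(p)]
```

A genuine fast move can also exceed the threshold and be rejected, so the window and threshold would need tuning per market.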

globalarbtrader · Apr 29, 2015

rohan2008 said:
IMHO: you might want to add notes about forward contract tick/candle data as well. This can help for spread trading. I collect forward contract info although I don't trade forward contracts; the change in the spread can at times help one's strategies.

I covered spread collection although I probably could have been more explicit.

I didn't go into candles since I don't use them, so I've added a few lines.

Thanks for the constructive feedback

volpunter · Apr 29, 2015

I said: Of course you can choose to implement an architecture that is only capable of trading low latency strategies but certainly futures market structure and market capability is more in the realm of microseconds.

of course I meant to say "low frequency" not "low latency"

Butterfly · May 2, 2015

nothing wrong with pulling quotes, instead of "waiting" to receive them as an event

that's how it has been done since the beginning of the stock market,

only a strategy that simply reacts on price would need to handle quote feeds as events, not a strategy based on fundamentals or historical price patterns based on OHLC.

volpunter · May 2, 2015

when fundamental data change you want to be informed right away or you wanna find out during your next data pull tomorrow morning?

Butterfly said:
nothing wrong with pulling quotes, instead of "waiting" to receive them as an event

that's how it has been done since the beginning of the stock market,

only a strategy that simply reacts on price would need to handle quote feeds as events, not a strategy based on fundamentals or historical price patterns based on OHLC.

Butterfly · May 2, 2015

maybe if you had any experience in managing billions with fundamental valuation as the core of your strategy, you would know that it would not impact your short term decision making process.

You are not going to sell your whole position overnight simply because some fundamental figure changes one day. You take into consideration a number of fundamental factors at a regular interval (hence the pull data) and do your decision process for the trades. You do not overreact or "pre-empt" any decision simply because one fundamental data point has changed. The process cycle is much more long term than that, and is not about reacting to every piece of "news" out there. That's only what a trader would do, not multi-billion Fund Managers, even though multi-billion Hedge Funds do act like traders rather than classic Fund Managers.

Even with an Earnings surprise strategy, the announcement would be done after the close or before the open, so a pull function after market close is perfectly acceptable.

Managing an event queue for fundamentals? No way. Only a technical poseur with no clue would pull such a stunt.

When price is at the center of your trading strategy and nothing else, a constant data feed of prices through a "channel" like a message queue is indeed relevant. Events can then be triggered when certain patterns are detected.
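
For what that last paragraph describes, prices arriving through a channel such as a message queue with events fired when a pattern is detected, here is a minimal sketch using only the Python standard library. The `feed` and `consume` functions, the `None` sentinel and the "new high over the last 3 prices" pattern are all made up for illustration.

```python
import queue
import threading


def feed(prices, channel):
    """Stand-in for a price feed pushing ticks onto the channel."""
    for p in prices:
        channel.put(p)
    channel.put(None)   # sentinel: feed finished


def consume(channel, lookback, on_breakout):
    """Read prices from the channel and fire an event on a new lookback high."""
    window = []
    while True:
        price = channel.get()   # blocks until the feed pushes something
        if price is None:
            break
        if len(window) == lookback and price > max(window):
            on_breakout(price)
        window.append(price)
        window = window[-lookback:]


channel = queue.Queue()
events = []   # prices at which the pattern fired
producer = threading.Thread(
    target=feed, args=([10, 11, 10, 12, 11, 13, 12], channel)
)
producer.start()
consume(channel, lookback=3, on_breakout=events.append)
producer.join()
```

The consumer never asks for a price; it simply blocks on the queue and reacts, which is the difference from a periodic pull.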

volpunter · May 2, 2015

A) we are not talking about billion dollar fundamental funds in case you have not noticed. You again miss the boat here.

B) Even a huge hedge fund or long-only fund processes information right away, not an hour later or a day later. If you are an analyst and deliver significant fundamental news a day late to the likes of Gundlach or Griffin your ass is fired right away. Nobody stated that incoming fundamental news will always result in a trade. It's about having a strategy know what is going on, regardless of strategy type. That is what event processing is all about.

Butterfly said:
maybe if you had any experience in managing billions with fundamental valuation as the core of your strategy, you would know that it would not impact your short term decision making process.

You are not going to sell your whole position overnight simply because some fundamental figure changes one day. You take into consideration a number of fundamental factors at a regular interval (hence the pull data) and do your decision process for the trades.

Even with an Earnings surprise strategy, the announcement would be done after the close or before the open, so a pull function after market close is perfectly acceptable.

Managing an event queue for fundamentals? No way. Only a technical poseur with no clue would pull such a stunt.

Butterfly · May 2, 2015

A) Same: a fundamental strategy generally has a long-term view and therefore wouldn't react preemptively simply because one fundamental has changed. Fundamental data changes are not prices. Again, you are focusing on the wrong point.

B) I thought we were talking trading apps? I didn't realize an analyst was a trading app. Incoming news processed by humans is not the issue here; it's how it's being used by the trading app. A fundamental strategy would not react to news or events simply because they are being pushed. Hence a pull is more appropriate, since the strategy will be more focused on a periodic cycle rather than an "ad hoc" cycle.