You have to be vigilant in ensuring that the data you use for back-testing your strategy is correct. I would suggest having multiple data sources that you can cross-reference. Yes, that will cost more, but losses from incorrect data may surpass the extra cost of multiple data sources. Below is the issue I recently encountered. Luckily I caught it before it cost me dearly in trading losses. If anyone reading this finds any issues with my test, I welcome your response so that I can correct my mistakes. I have identified a serious issue with backtestdata.com’s TICK data. According to my findings, backtestdata’s TICK charts indicate that the Flash Crash of 2010 happened at 7:45 AM EDT, while it actually happened at 2:45 PM. When I highlighted this issue to backtestdata.com, not only did they refuse to fix it, they refused to even acknowledge it. I provided numerous sources of data, including backtestdata’s own 5-min charts, but had no success in drawing attention to the issue. If you are using backtestdata.com data in your research, I urge you to re-verify your test results, as they may be impacted by this issue. If you are interested in more details, visit the blog post I wrote describing this issue at http://mrbacktestdata.blogspot.com/2015/03/dont-buy-back-test-data-from.html I am not a blogger, but Elite Trader does not allow posts with more than 10,000 characters, so I needed another way to get all of the info out. After numerous attempts, backtestdata.com has not rectified the issue, and every attempt I made to point out the problem, including showing them the discrepancy between their own TICK and Minute charts, was met with a defensive response. In a number of emails I was told that their Minute charts were generated from TICK data, so the Minute charts must be the ones that are wrong, although to me they appeared correct. If they "CORRECT" the issue with Minute charts, anyone purchasing their Minute data will have issues as well.
Read below for the chronological sequence of events leading up to my discovery and all the attempts I made to have backtestdata.com rectify the problem, unfortunately with NO SUCCESS. Thank you and good luck in your trading. MR
07h45 AM EST is equivalent to 02h45 PM UTC (Coordinated Universal Time, broadly equivalent to GMT). In most data repositories I am familiar with, tick data is stored in UTC (regardless of the time zone of the exchange or dealing it relates to), and you have to transform it to the time zone you are interested in. Have you accidentally imported EST data which your backtest system thinks is UTC data?
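A quick way to sanity-check a suspected time-zone mix-up is to reinterpret the same wall-clock stamp under each candidate zone and see where it lands. The sketch below is only an illustration: the helper name `reinterpret` is made up, and fixed UTC offsets are assumed rather than a full time-zone database (New York was on daylight time, UTC-4, on 6 May 2010).

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep the sketch self-contained. On 2010-05-06 New York was on
# daylight time (EDT, UTC-4); EST (UTC-5) applies only in winter.
UTC = timezone.utc
EDT = timezone(timedelta(hours=-4), "EDT")


def reinterpret(naive_stamp, assumed_tz, display_tz):
    """Label a zone-less timestamp as assumed_tz, then convert it to display_tz."""
    return naive_stamp.replace(tzinfo=assumed_tz).astimezone(display_tz)


crash = datetime(2010, 5, 6, 14, 45)  # Flash Crash, 2:45 PM Eastern wall-clock time

as_eastern = reinterpret(crash, EDT, EDT)  # stamps imported with the right zone
as_utc = reinterpret(crash, UTC, EDT)      # same stamps imported as if they were UTC
in_utc = reinterpret(crash, EDT, UTC)      # the true UTC time of the event

print(as_eastern.strftime("%H:%M %Z"))  # 14:45 EDT
print(as_utc.strftime("%H:%M %Z"))      # 10:45 EDT
print(in_utc.strftime("%H:%M %Z"))      # 18:45 UTC
```

Under these assumptions an Eastern-vs-UTC mix-up moves the crash by four hours rather than seven, so it is worth repeating the check with the zones actually configured at the vendor and on your own machine before settling on an explanation.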
Thanks for the quick response abattia. I don't think I imported the data in UTC format, although I am almost sure they did. I had them generate the same charts from their historical data server and I saw the same issue in their charts. Do you know if there is a way to correct my data files, which are already in NinjaTrader format? Thanks again, MR
By default, NT assumes the imported data is in UTC. So if you cannot remember what you did, you likely went with the default and imported EST data as though it was UTC (hence 02h45pm appears as 07h45 am). Delete the data you have in NT, and then re-import from the original source data, but using EST as the time zone of the imported data. And take down your dig at backtestdata, as the fault is probably yours rather than theirs ...
Indeed, that is probably the problem. If the tick data were wrong, a lot of people would have reported it. Did you find anyone else who had the same problem? You can easily check whether it is a time problem or not: compare the data in NinjaTrader with the original data. If you have exactly the same quotes but always with the same time difference, it is 100% a time problem, not a data problem. So simple to check.
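The comparison described above can be sketched in a few lines. This is a minimal illustration, assuming both data sets have already been exported to lists of (timestamp, price) tuples; the function name `constant_offset` and the sample ticks are made up.

```python
from datetime import datetime, timedelta


def constant_offset(original, imported):
    """Return the single time shift between two tick lists, or None.

    Each list holds (timestamp, price) tuples in time order. A shift is
    returned only if prices match tick for tick and every pair of
    timestamps differs by the same amount.
    """
    if not original or len(original) != len(imported):
        return None
    shifts = set()
    for (t_orig, p_orig), (t_imp, p_imp) in zip(original, imported):
        if p_orig != p_imp:
            return None  # different quotes: a data problem, not just a clock problem
        shifts.add(t_imp - t_orig)
    return shifts.pop() if len(shifts) == 1 else None


# Made-up sample ticks, for illustration only
source = [
    (datetime(2010, 5, 6, 14, 45, 0), 1065.00),
    (datetime(2010, 5, 6, 14, 45, 1), 1064.75),
    (datetime(2010, 5, 6, 14, 45, 3), 1062.50),
]
in_platform = [(t - timedelta(hours=7), p) for t, p in source]

shift = constant_offset(source, in_platform)
print(shift.total_seconds() / 3600)  # -7.0
```

A `None` result means either the quotes themselves differ or the shift is not constant, and both cases point away from a simple time-zone setting.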
I agree with both of you. I actually do believe it is a time issue, not a data issue. I even suggested as much to backtestdata's tech support in the hope that they would verify and correct it. I did not have the info abattia provided, which might have helped direct their efforts, but I did go as far as telling them that their data has a 7-hour timestamp offset. Unfortunately, they refused to even look into the issue. I have to say, abattia, you gave me a ray of hope. I really wanted it to be my mistake. Unfortunately, it only lasted as long as it took to recopy the data and run it through backtestdata's install program. Then I remembered that the only input you can provide is your license number. Backtestdata.com provides the data in encrypted files; you then run an install program that unpacks the files, converts them into NT format, and copies them to the appropriate NT directory. Below are a few snapshots to show you. My best guess, based on your comments, is that the backtestdata guys recently updated the installer program and messed up the EST/UTC setting on ES tick data. There could be others as well, but I only looked at ES. After the data is imported to the NT7\db\tick\ES##.## directory, I can see the files that NT uses. The names appear to follow a "year""month""date""hour".Last.ntd format, for example 201110241000.Last.ntd. There is one file per hour. Does anyone know whether, if I write a script to rename these files to the appropriate hour, that will work, or will I totally mess up my NT configuration? I know I am probably oversimplifying the problem, but any input on how to fix it is much appreciated. Here are the snapshots for more info. Thanks.
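For anyone curious what such a rename would look like, here is a dry-run sketch that only computes the new names and touches nothing on disk. The stamp layout is assumed from the single example above (twelve digits, year through minute), and the function names are made up. Nothing here establishes that NinjaTrader would accept renamed files; the timestamps stored inside each file would presumably stay unchanged.

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "%Y%m%d%H%M"  # assumed from the example 201110241000.Last.ntd


def shifted_name(file_name, hours):
    """Return the name the same file would carry after shifting its stamp by `hours`."""
    stamp, _, suffix = file_name.partition(".")
    moved = datetime.strptime(stamp, STAMP_FORMAT) + timedelta(hours=hours)
    return f"{moved.strftime(STAMP_FORMAT)}.{suffix}"


def rename_plan(file_names, hours):
    """Map old names to new names without touching anything on disk."""
    return {name: shifted_name(name, hours) for name in file_names}


plan = rename_plan(["201110241000.Last.ntd", "201110242000.Last.ntd"], hours=7)
for old, new in plan.items():
    print(old, "->", new)
```

Note that a shift can push a file across midnight into the next day, which is one more reason a plain rename may not be enough on its own.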
Export the data using NT standard export. Then delete within NT the data you have just exported. Then reimport to NT the data you exported, but with the import timezone set to EST.
Well said! I once created a very promising equities system that relied on daily data from a well-known and widely used supplier. My backtesting assumed that the data presented an accurate record of daily lows, highs, etc. over a 5-year period. No big deal. So I thought. I started trading the system, and although the market environment was perfectly supportive of the system logic, my results weren't good. So I backed off trading and gave myself 90 days to test the system in real-time - logging my own daily data for about 500 equities - then checking it against a few data sources. What I found wasn't pleasant. Daily lows on my list of equities were often incorrectly logged not only by the data supplier, but by major financial internet sites that were reporting the same incorrect data. Thus, given the fact that real-time data was suspect, I had every reason to doubt the results of my backtesting, and I tossed the system. There are two lessons here. (1) The results of backtesting should be considered in the context that the data is likely to have some errors that may be difficult to identify in the backtesting environment, and (2) A system needs to be forward tested in real time to verify the assumptions developed in backtesting, before cash is put to work. Yes, we'd all like to believe that the results we see in backtesting are a reliable prediction of what we'll experience going forward, but it just ain't so. Reality is really the only judge of what works, and what doesn't.
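The kind of cross-check described above, comparing self-logged daily lows against a supplier's figures, can be sketched as follows. The data layout (a dict keyed by day and symbol), the function name `low_mismatches`, the tolerance, and all the numbers are assumptions for illustration.

```python
def low_mismatches(own_lows, vendor_lows, tolerance=0.01):
    """List (day, symbol, own, vendor) rows where the two daily lows disagree.

    Both inputs map (day, symbol) -> daily low. Keys missing from the
    vendor data are reported with vendor=None.
    """
    rows = []
    for key in sorted(own_lows):
        own = own_lows[key]
        vendor = vendor_lows.get(key)
        if vendor is None or abs(own - vendor) > tolerance:
            rows.append((*key, own, vendor))
    return rows


# Made-up numbers, for illustration only
logged = {
    ("2015-03-02", "AAA"): 10.12,
    ("2015-03-02", "BBB"): 54.30,
    ("2015-03-03", "AAA"): 10.05,
}
vendor = {
    ("2015-03-02", "AAA"): 10.12,
    ("2015-03-02", "BBB"): 54.80,
}

for row in low_mismatches(logged, vendor):
    print(row)
```

Running the same comparison against a second or third source helps show whether one supplier is off or whether several sites are repeating the same bad figure.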
@lindq Yes, this is especially true when you use Range or Tick data. Different providers record execution data slightly differently and, depending on your strategy, the difference can have a significant impact on P&L and other stats. For example: I trade ES (S&P Futures) using pre-programmed strategies. I did my initial backtest using historical data from Kinetick; then, because I trade at Interactive Brokers, I turned on my strategies using data from IB. What I noticed after a few weeks was a discrepancy between my real trades and my simulated trades (using Kinetick data). Apparently, to improve data reliability, IB aggregates executions every 0.1 sec, which doesn't sound like a lot but did create a discrepancy between my real and test results. Over the long term, I am sure the P&L impact would be negligible, since the changes would work both ways. In real time, however, it drove me nuts every time I got into a losing trade that did not register on the test data. I ended up switching to Kinetick data for real trading to try to keep my sanity. @abattia Thanks again for your valuable input. I have not had the chance to try it yet, but will do so as soon as I can and will let you know if it works.
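To see how time-bucket aggregation can make a print disappear, here is a rough sketch. It is not IB's actual method, which is not described in this thread; it simply assumes each 100 ms bucket keeps its last price and the summed size. The function name and the sample ticks are made up.

```python
from datetime import datetime, timedelta


def aggregate_ticks(ticks, bucket_ms=100):
    """Merge (timestamp, price, size) ticks that fall in the same time bucket.

    Each bucket keeps its last price and the summed size, stamped with the
    bucket's start time. Ticks must be in time order.
    """
    bucket = timedelta(milliseconds=bucket_ms)
    merged = []
    for stamp, price, size in ticks:
        # Round the timestamp down to the start of its bucket
        start = stamp - (stamp - datetime.min) % bucket
        if merged and merged[-1][0] == start:
            merged[-1] = (start, price, merged[-1][2] + size)
        else:
            merged.append((start, price, size))
    return merged


# Made-up ticks, for illustration only
t0 = datetime(2015, 3, 2, 9, 30, 0)
raw = [
    (t0 + timedelta(milliseconds=10), 2100.75, 2),
    (t0 + timedelta(milliseconds=60), 2100.50, 1),
    (t0 + timedelta(milliseconds=130), 2100.25, 3),
]

for tick in aggregate_ticks(raw):
    print(tick)
```

In this example the 2100.75 print never shows up in the aggregated stream, so an order keyed to that price would fill on one feed and not on the other.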
It really does sound like a basic timezone mistake. Wait until you try to live in a country like Singapore and track the different daylight saving changes worldwide - you invariably screw it up. A simple look at the chart pattern on a longer timeframe from two different vendors should verify the timestamp shift. You should also verify exactly what types of trades contribute towards the tick data, e.g. spread trades, other multi-leg/combination trades, swap-for-physical etc. It's quite possible that different vendors use different rules. On the stock market side, this is also apparent in tick data. Some vendors use trades from the primary exchange venue only, others exclude specific trade condition codes. In other words - try to compare apples with apples rather than shooting the messenger. And if your data vendor is unable to give you a precise answer, then you should not use them.