Hi Daniel.a,

Instead of a database, I keep historical data in CSV files. The file structure looks like this:

Historical Data
--DTN
----DAILY
------A
--------A.csv
----------RequestID,2020-06-19T02:07:46,89.7900,87.7600,89.4100,88.7300,1784741,0,
----------RequestID,2020-06-18T02:07:46,88.5000,87.2600,87.4300,88.2100,1491515,0,
----------RequestID,2020-06-17T02:07:46,88.7500,87.4600,87.8500,87.9200,900794,0,
--------ANN.csv
--------AAP.csv
--------AAPL.csv
--------etc
------B
------etc
----MINUTE
------A
------B
------etc
--Other Data Provider (would be separate)
----DAILY
----etc

This makes so many things easier. I find a database an annoyance and completely useless. I write the data exactly as it comes from IQFeed. After downloading the data, I plug the holes; i.e., there are days where there are no trades, and IQFeed does not return data for those days. Same with minutes: for most of the day there may not be any trades, so I fill in the missing days and minutes (a sketch of this step follows at the end of this post).

- If for some reason you need the data in a database, just write a program to read the files and put it in the database. It should only take a minute to load when you need it.
- Point-in-time data for index constituents for equities: IQFeed has this.
- I have some day-trading systems that close out a few minutes before the market close. I backtest these with daily data. Works fine; it averages out.

That said, I will soon be backtesting with minute data. The reason is that daily data from IQFeed, and from the other data providers I've checked, is incorrect. Daily bars include trades that did not occur in the tick data and therefore could not have been traded on. Think large blocks traded between funds/brokers, etc., and reported later. These trades can take place well outside of the daily range, but there they are in the daily data. This problem can often be seen as very long tails on a chart. Minute data from DTN is not "corrected", and therefore correct. Backtests will be more accurate.

As ValeryN inferred, data maintenance is a big project by itself.
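In case it helps, here is a minimal pandas sketch of the hole-plugging idea (not my production code; the column names are just my labels for the fields shown above, and pd.bdate_range only knows weekdays, so a real version also needs an exchange holiday calendar):

```python
import pandas as pd

# My own labels for the fields shown in the sample rows above
# (request id, timestamp, high, low, open, close, volume, open interest,
# plus the empty field left by the trailing comma).
COLS = ["req_id", "ts", "high", "low", "open", "close", "volume", "oi", "_"]

def plug_daily_holes(path: str) -> pd.DataFrame:
    """Insert zero-volume bars for trading days the vendor skipped."""
    df = pd.read_csv(path, header=None, names=COLS, parse_dates=["ts"])
    df = df.sort_values("ts")
    df.index = df["ts"].dt.normalize()  # key each row by calendar date
    # Weekdays between the first and last bar; a real calendar would
    # also drop exchange holidays.
    all_days = pd.bdate_range(df.index.min(), df.index.max())
    df = df.reindex(all_days)
    # A missing day means no trades: carry the last close forward and
    # flatten OHLC onto it, with zero volume.
    df["close"] = df["close"].ffill()
    for col in ("open", "high", "low"):
        df[col] = df[col].fillna(df["close"])
    df["volume"] = df["volume"].fillna(0).astype(int)
    return df
```

The same idea applies to minute files, just with a minute-by-minute session range instead of business days.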
Hi dholliday,

Thanks for your input. Yes, that could be an option. Perhaps a database is just the wrong route to go, but somehow I think a database would be better in the long run, and better when using several pieces of software in my process. You do have a point, though, so I will investigate this route.

In regards to IQFeed having point-in-time index constituents: I have checked with them several times, and I have been told this is not something they offer, either as a list per index or derivable from the symbol data they provide. Are you sure about this?

Also, when you use CSV files and get your price data from IQ: they adjust daily data for splits (so I believe their data needs to be reloaded after a split to get it correct), they don't adjust for dividends at all, and they don't adjust intraday data at all. As far as I know, they don't provide the adjustment factors to the end user either. How do you handle this for the data you fetch from them? (The adjustment arithmetic itself is sketched below; it's the factors that seem to be missing.)

And yes, getting the equity data itself correct is proving to be a big task. Life is easier with futures...
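For reference, the adjustment arithmetic seems simple once you have the factors; sourcing the factors is the real problem. A sketch of standard back-adjustment (my own helper, nothing IQFeed provides): divide pre-split prices by the split ratio, and scale everything before an ex-dividend date by 1 - dividend / prior close.

```python
import pandas as pd

def back_adjust(bars: pd.DataFrame, splits, dividends) -> pd.DataFrame:
    """Back-adjust OHLCV so pre-event prices line up with today's scale.

    bars:      DataFrame indexed by date with open/high/low/close/volume.
    splits:    iterable of (ex_date, ratio), e.g. ("2020-08-31", 4.0) for 4:1.
    dividends: iterable of (ex_date, cash_amount) per share.
    """
    adj = bars.copy()
    adj["volume"] = adj["volume"].astype(float)
    price_cols = ["open", "high", "low", "close"]
    for ex_date, ratio in splits:
        before = adj.index < pd.Timestamp(ex_date)
        adj.loc[before, price_cols] /= ratio   # pre-split prices shrink...
        adj.loc[before, "volume"] *= ratio     # ...and volume grows
    for ex_date, cash in dividends:
        ex = pd.Timestamp(ex_date)
        prev_close = bars.loc[bars.index < ex, "close"].iloc[-1]
        factor = 1.0 - cash / prev_close       # proportional price drop
        adj.loc[adj.index < ex, price_cols] *= factor
    return adj
```

Since every factor is multiplicative, the order the events are applied in doesn't matter.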
You would be surprised how many times I open up the data files to help me figure out why something doesn't work. The data format has changed. Or some field that should have data doesn't. Or some unaccounted-for error from the data vendor has occurred. Or...

I had one a few weeks ago where I downloaded daily data for around 1200 symbols. All of the data for all of the symbols came back fine except one. One symbol had the last (most recent) date one day in the future. I kid you not. WTF. I have never seen that in the more than 15 years I have been using my data vendor. I started a system, it barfed, I opened up the symbol's file, saw the problem immediately, deleted the line with the future date, and done, everything worked.

I wrote a simple server program so my systems have easy access to the data.

You are correct about IQFeed data. When I first wrote my data download software, I wrote code to only download and append the most recent dates and adjust for splits automatically. Within the first week of running live, a symbol that the historical data said had split didn't show the split in the streaming data for two more days. Instead of trying to account for all the possible data problems, I now just delete and download all the data I need every night (I will revisit this in the future). I just did a test and downloaded four years of daily data for 984 symbols. It took about 32 seconds, plus another 2 seconds to plug the holes in the data. No systems I run care about dividends.

I don't have any great truths; everybody has a different vision. A few things from my experience:

- Companies first collect and save data in flat files (and back them up); then they can rebuild their databases any time they want, any way they want.
- Data management is a big job. As you have noted: splits, dividends, bad data, etc.
- I misunderstood "index constituents". You are correct. In the past when I have had this kind of problem, assuming there isn't a web service to get the information, I have resorted to web page scraping.
- Possibly the best data service is something like NxCore. DTN used to offer this, but I think they are on their own now. If I understand correctly, they give you "the tape": all the data for the markets exactly as it happened. From this you can rebuild your minute and daily bars, account for splits and dividends, etc. You can also backtest as market replay, with the exact same data in real time and historical. Of course, there is the cost.

Take care
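P.S. A trivial sanity check would have caught that future-dated row before the system barfed. A sketch, assuming the same CSV layout as in my earlier post (timestamp in the second field); the function name and directory argument are just for illustration:

```python
import sys
from datetime import datetime
from pathlib import Path

def find_suspect_dates(data_dir: str, max_date: datetime) -> None:
    """Print every CSV row whose timestamp is later than max_date."""
    for path in Path(data_dir).rglob("*.csv"):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            fields = line.split(",")
            try:
                ts = datetime.fromisoformat(fields[1])  # second field = timestamp
            except (IndexError, ValueError):
                print(f"{path}:{lineno}: unparseable line: {line!r}")
                continue
            if ts > max_date:
                print(f"{path}:{lineno}: future date {ts}")

if __name__ == "__main__":
    # Run after the session close so "now" is a fair upper bound.
    find_suspect_dates(sys.argv[1], datetime.now())
```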
Thanks dholliday,

Appreciate your view of things, and you sharing your experience in solving it so it works for your workflow.

Someone here at ET shared this link for getting index constituents on a rolling basis and historical point-in-time: https://siblisresearch.com/data-services-pricing/

I believe ("I" as someone who has never structured this before) that the best approach would be to save the raw unadjusted data, then get the factors for splits and dividends separately, and create adjusted files for splits, dividends, and both. This way I keep the raw file and can do what I want with it in the future, and have ready-made files for backtest usage by my software. I want to avoid reloading the data as you are forced to do (which in your case is perhaps the best solution for you); I believe reloading is the only way when using IQ daily data, and it still would not solve the intraday data issue while using IQ.

Polygon supplies downloads raw or adjusted for splits, dividends, or both, for both EOD and intraday data. It also supplies the factors for splits and dividends. So to me, they "should" be able to supply what I need (one could also just use their factors for adjusting IQ intraday data). My only concern is whether the Polygon.io API is as reliable for live trading as my IQ API, which has never failed me in years.

Cheers
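Concretely, that file layout could look something like this, reusing the back_adjust sketch from earlier in the thread (the directory names are just placeholders):

```python
import pandas as pd
from pathlib import Path

def write_variants(raw: pd.DataFrame, splits, dividends,
                   out_dir: str, symbol: str) -> None:
    """Write raw, split-adjusted, and fully adjusted copies of one symbol.

    Relies on the back_adjust() helper sketched earlier in this thread.
    """
    out = Path(out_dir)
    for sub in ("raw", "split_adj", "full_adj"):
        (out / sub).mkdir(parents=True, exist_ok=True)
    # Keep the untouched file; every adjusted view can be rebuilt from it.
    raw.to_csv(out / "raw" / f"{symbol}.csv")
    back_adjust(raw, splits, []).to_csv(out / "split_adj" / f"{symbol}.csv")
    back_adjust(raw, splits, dividends).to_csv(out / "full_adj" / f"{symbol}.csv")
```

A dividend-only variant would be one more call with the splits list left empty.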
About 5-7 years ago, I used IQFeed to download large quantities of intraday data and had a similar experience to @dholliday. Occasionally, their historical data would just disappear for some interval or before a certain date. I'd contact support and they would reply with something like: there was a problem, but it is fixed now. There were also instances where historical data was not correctly adjusted, or had a busted intraday low/high, and they would fix it if I talked to them.

So, while what @dholliday mentioned is not the prettiest solution, regularly calculating a data diff to keep up with the changes they make would be more complex and error-prone. IQFeed wasn't a great source of historical data for my use; I ended up using them only for real-time, and later ditched them altogether.

I feel like picking a data service is like picking a wife. You will probably never be happy ever after, but you end up somewhere between a good life with the kinds of problems every other couple has, and a miserable existence with high monetary consequences if you pick the wrong one and blindly trust that it will just work.

Val
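To make the diff point concrete: the naive per-bar comparison is easy to write (a sketch, assuming the same one-file-per-symbol CSVs discussed above), but note what it cannot tell you.

```python
import csv

def bar_diff(old_path: str, new_path: str) -> dict[str, str]:
    """Compare two downloads of the same symbol, keyed by timestamp.

    Returns {timestamp: "added" | "removed" | "changed"} for every bar
    that differs between the old and new files.
    """
    def load(path):
        with open(path, newline="") as f:
            # Second field is the timestamp; everything after it is the bar.
            return {row[1]: row[2:] for row in csv.reader(f) if row}

    old, new = load(old_path), load(new_path)
    changes = {}
    for ts in old.keys() - new.keys():
        changes[ts] = "removed"      # history that silently disappeared
    for ts in new.keys() - old.keys():
        changes[ts] = "added"
    for ts in old.keys() & new.keys():
        if old[ts] != new[ts]:
            changes[ts] = "changed"  # e.g. a busted high/low fixed upstream
    return changes
```

The hard part is interpretation: after a split, every historical row changes at once, and a report like this cannot distinguish a legitimate retro-adjustment from vendor breakage, which is exactly why delete-and-redownload wins on simplicity.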
Saving the raw data seems best to me too. Then do with it what you want. My data solution is not the best, but I'm off working on other things.

I, and I'm sure others, would love to hear your experiences with Polygon.io and a comparison with IQFeed. They look pretty good. Keep us posted.

Thanks, D