google finance, yahoo finance, msn money

fareastcoast · Jul 2, 2013

Quote from gmst:

Based on the feedback I have received, I have mostly started working using yahoo's data. I just saw this website today - from the sound of it and history, it seems a credible alternative to yahoo data.

https://quantquote.com/support_faq.php

Does anyone have any experience with this data?

Yep, the data is good and fixes many of the errors that show up in Yahoo. I also agree with the above poster: Yahoo > Google. Google just has a lot of errors.

Yahoo is not perfect either, for example, from yesterday:
http://finance.yahoo.com/q/hp?s=FOXA+Historical+Prices
They are missing the 'dividend' on 7/1, when shareholders got a quarter share of NWSA.

My QuantQuote historical stock data update for today had a dividend there of $3.82 which fixed the jump.

gmst · Jul 3, 2013

Quote from gmst:

For holding time periods > 1 day, using adjusted close prices and also deriving your own adjusted OHL prices from yahoo/google data is required.

However, if you are testing for only intra-day trading, then imo using adjusted prices for splits and dividends is not required. Rather, backtesting should be done on unadjusted actual historical price data.

Does the above make sense?
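One common way to derive adjusted open, high and low from Yahoo-style data is to scale them by the same ratio that maps close to adjusted close. The sketch below assumes each bar carries both a raw close and an adjusted close; the function name and the sample bar are hypothetical.

```python
def adjust_ohl(open_, high, low, close, adj_close):
    """Scale open/high/low by the ratio adj_close / close for the same bar."""
    factor = adj_close / close
    return open_ * factor, high * factor, low * factor


# Hypothetical bar: the stock later split 2-for-1, so adj_close is half of close.
adj_open, adj_high, adj_low = adjust_ohl(
    open_=100.0, high=104.0, low=98.0, close=102.0, adj_close=51.0
)
print(adj_open, adj_high, adj_low)
```

Because the ratio is taken from a 2-decimal adjusted close, it inherits that rounding, so this is only as accurate as the adjusted close itself.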

Follow-up question:

Consider a multi-day/week trading strategy. Using adjusted close is almost a necessity and it will give correct results. However, there is an inherent assumption when you use adjusted close:
1) Assumption: Dividends are reinvested.

Now, practically, suppose one has a 100k account and invests 10k in each of 10 stocks. Let us say all 10 stocks sell for $100. So one buys 100 shares of each stock.

When stock A pays a dividend of, say, $3, the dividend received = $300. This $300 must be reinvested in the same stock for the use of adjusted close to make sense.

However, practically speaking, that $300 will most likely just stay in the account as cash because of operational issues (odd lots etc.), and a trader will not care about this $300. If this is what happens in reality, then using adjusted close is not the correct way to do your backtesting.

Rather, a more correct way would be to use "actual close" and actual dividend information in the backtesting to compute total gain = actual price gain + dividend gain. This approach will also result in correct DD figures. Using adjusted close (since it uses a multiplicative factor for dividends) will result in incorrect DD figures.
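A minimal sketch of that bookkeeping, assuming dividends are credited to cash on the day they appear in the series and are never reinvested. The price series, the dividend date and the function names are made up for illustration; only the 100 shares at $100 and the $3 dividend come from the example above.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def equity_with_cash_dividends(closes, dividends, shares):
    """Mark-to-market value of the shares plus dividends left sitting in cash."""
    cash = 0.0
    curve = []
    for close, dividend in zip(closes, dividends):
        cash += shares * dividend  # paid out, kept as cash, not reinvested
        curve.append(shares * close + cash)
    return curve


# Hypothetical series: 100 shares bought at $100, a $3 dividend on the third day.
shares = 100
closes = [100.0, 102.0, 99.0, 95.0, 101.0]
dividends = [0.0, 0.0, 3.0, 0.0, 0.0]

curve = equity_with_cash_dividends(closes, dividends, shares)
price_gain = shares * (closes[-1] - closes[0])
dividend_gain = shares * sum(dividends)
total_gain = price_gain + dividend_gain
print(curve, total_gain, max_drawdown(curve))
```

The drawdown here is measured on account value (shares plus cash), which is what the account holder would actually have seen.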

Comments Please

blah12345678 · Jul 3, 2013

Just to clarify, we need to define adjusted and unadjusted.

To me, today's OHLCV reported by the exchanges is the adjusted price.

The real unadjusted price is the price on trading day #1, with no splits or dividends factored in.

So, in my case, I have 2 fields in my price table - adj_lower and adj_higher.

adj_higher is used to multiply the OHLCV to get the real price on that day.
adj_lower is used to divide into the OHLC to get the split/dividend-adjusted price, as reported by Yahoo and the exchanges.

The reason why I use adj_higher is that (1) it eliminates the discontinuities introduced by the splits and dividends, (2) bigger numbers are easier to calculate with than the really small numbers for stocks having lots of splits.

For example, CSCO's split-adjusted closing price for 3/26/1990 is something like $0.002. But the real price was $24.25.

Fast forward to 7/2/13: CSCO's split-adjusted price is $24.32, but its real, unadjusted price would be $7,371.87.

$7,371.87 is a lot easier to work with than $0.002.

That's just me. You can adjust up, or you can adjust down. But you need to use something to eliminate the discontinuities...
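The two directions of adjustment can be sketched as follows. This is a splits-only illustration with made-up prices (dividends are ignored), and the names `adjust_down` and `adjust_up` are hypothetical stand-ins that only loosely correspond to the adj_lower and adj_higher fields described above.

```python
def split_factors(n_days, splits):
    """Return (adjust_down, adjust_up) factor lists for n_days of prices.

    splits maps a day index to a split ratio (2.0 for a 2-for-1 split)
    that takes effect at the open of that day.
    """
    adjust_up = []
    running = 1.0
    for day in range(n_days):
        running *= splits.get(day, 1.0)
        adjust_up.append(running)  # splits so far, anchored at the first day
    total = running
    # Splits still to come, anchored at the last day.
    adjust_down = [total / factor for factor in adjust_up]
    return adjust_down, adjust_up


# Hypothetical traded closes with 2-for-1 splits on day 2 and day 4.
raw = [100.0, 102.0, 51.5, 52.0, 26.5]
splits = {2: 2.0, 4: 2.0}

down, up = split_factors(len(raw), splits)
adjusted_down = [price / factor for price, factor in zip(raw, down)]
adjusted_up = [price * factor for price, factor in zip(raw, up)]
print(adjusted_down)
print(adjusted_up)
```

Both series are free of split discontinuities and differ only by a constant multiple, so percentage returns are the same whichever direction is used.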
Interesting tidbit...

CSCO's all-time real high reached $23,019 ($82 split-adjusted) on 3/27/2000 with around 6B shares outstanding. On that day, if CSCO merely had 1.65 million shares outstanding like BRKA does now, its share price would've been around $298,000. Today, it would be $88,450 or so. Compare that to BRKA's current price of $169,000....

But I digress...

Regarding your back-testing, you run into the discontinuity problem mentioned above using the adjusted prices. Plus you'll have multiplication/division errors if you use Yahoo's adj_close since it's rounded to just 2 decimal places. For better accuracy, you'll need to maintain another table with split and dividend info to accurately calculate the real prices and volume.
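A quick arithmetic sketch of the rounding problem. The $24.25 close is the figure quoted above; the cumulative factor of 288 is an assumed value chosen only for illustration, not something taken from Yahoo's data.

```python
def round_trip_error(real_close, factor, decimals=2):
    """Adjust a price down, round it as a 2-decimal feed would, then scale it back up."""
    exact_adjusted = real_close / factor
    rounded_adjusted = round(exact_adjusted, decimals)
    recovered = rounded_adjusted * factor
    relative_error = abs(recovered - real_close) / real_close
    return rounded_adjusted, recovered, relative_error


# 288.0 is an assumed cumulative adjustment factor, for illustration only.
rounded_adjusted, recovered, relative_error = round_trip_error(24.25, 288.0)
print(rounded_adjusted, recovered, relative_error)
```

Under these assumptions the recovered price is off by roughly 5%, which is why a separate split and dividend table gives better accuracy than back-solving from a rounded adjusted close.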

Yahoo's CSV download doesn't have that info. I don't know who has it.

To get it, you'll have to scrape every price page of every stock to build your table.

gmst · Jul 3, 2013

Quote from blah12345678:

adj_higher is used to multiply the OHLCV to get the real price on that day.
adj_lower is used to divide into the OHLC to get the split/dividend-adjusted price, as reported by Yahoo and the exchanges.

The reason why I use adj_higher is that (1) it eliminates the discontinuities introduced by the splits and dividends, (2) bigger numbers are easier to calculate with than the really small numbers for stocks having lots of splits.

For example, CSCO's split-adjusted closing price for 3/26/1990 is something like $0.002. But the real price was $24.25


From yahoo, I see CSCO split adjusted closing price for 3/26/1990 to be 0.08 whereas real price matches with what you have at 24.25.

http://finance.yahoo.com/q/hp?s=CSCO&d=6&e=3&f=2013&g=d&a=2&b=26&c=1990&z=66&y=5808

Where did you find the value of 0.002?

Do I understand you correctly here:
1) When you are using adj_lower, you are starting from TODAY's real OHLCV and dividing for splits and dividends to compute adjusted prices while moving backward in time.
2) When you are using adj_higher, you are using real OHLCV from trading day#1 and then multiplying it with splits and dividends to compute adjusted prices moving forward in time.

Quote from blah12345678:

Regarding your back-testing, you run into the discontinuity problem mentioned above using the adjusted prices. Plus you'll have multiplication/division errors if you use Yahoo's adj_close since it's rounded to just 2 decimal places. For better accuracy, you'll need to maintain another table with split and dividend info to accurately calculate the real prices and volume.

Yahoo's CSV download doesn't have that info. I don't know who has it.

To get it, you'll have to scrape every price page of every stock to build your table.

Makes sense, thanks for the clear explanation. I also thought the "Most Correct" way would be to start with real OHLCV prices from Yahoo and then create a split and dividend table to compute adjusted prices, to overcome the error introduced by Yahoo rounding to 2 decimal places.

However, my question is about "Most Correct" backtesting: don't you think we should use real OHLCV and not assume dividend reinvestment (an assumption built into adjusted prices of any type)? In reality, as in the example in my post above, the $300 received in dividends most likely won't be reinvested; it will sit in the account as cash. What are your thoughts on this particular aspect?

Also, if we backtest on real OHLCV prices as they prevailed historically, we measure the real drawdown as experienced historically. Adjusted prices, on the other hand, will result in a different drawdown. The main point of using adjusted prices is that you get the correct log returns over your investment period. However, drawdowns won't be correct. Am I making sense here?

panzerman · Jul 3, 2013

The following site has a powershell script to download data from yahoo:

http://portfolioslicer.com/stock-quote-download-scripts/free-script-download-quotes-yahoo-finance

I had to add this code to line 222 in the script, because I wanted a sorted .csv file as output.

Import-CSV $mergedQuotes | Sort-Object Ticker,{[datetime] $_.Date} | Export-CSV sorted.csv -NoType

blah12345678 · Jul 3, 2013

Quote from gmst:

From yahoo, I see CSCO split adjusted closing price for 3/26/1990 to be 0.08 whereas real price matches with what you have at 24.25.

Where did you find the value of 0.002?

I just threw in that particular number. I didn't bother looking that one up. My database does show the .08, but it's still useless because the real value is rounded to 2 places. That would be fine if all splits were 2-1, 3-1, 4-1. But they're not. Some stocks split 4/3, 5/4, 6/5. And some reverse split by some ungodly number, like 1/10000, 1/2500, etc.

Quote from gmst:

Do I understand you correctly here:
1) When you are using adj_lower, you are starting from TODAY's real OHLCV and dividing for splits and dividends to compute adjusted prices while moving backward in time.
2) When you are using adj_higher, you are using real OHLCV from trading day#1 and then multiplying it with splits and dividends to compute adjusted prices moving forward in time.

Yes

Quote from gmst:

However, my question is about "Most Correct" backtesting: don't you think we should use real OHLCV and not assume dividend reinvestment (an assumption built into adjusted prices of any type)? In reality, as in the example in my post above, the $300 received in dividends most likely won't be reinvested; it will sit in the account as cash. What are your thoughts on this particular aspect?

Also, if we backtest on real OHLCV prices as they prevailed historically, we measure the real drawdown as experienced historically. Adjusted prices, on the other hand, will result in a different drawdown. The main point of using adjusted prices is that you get the correct log returns over your investment period. However, drawdowns won't be correct. Am I making sense here?

If all the data is multiplied/divided by the factor, every value will be proportionate to the original: shares purchased, purchase price, liquidation price, etc. Just re-multiply or divide by the inverse of the factor to get the appropriate prices for your stats. Or keep 2 arrays of prices, adjusted and unadjusted, and reference each with the same index.
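The two-array idea can be sketched like this: generate the signal on the adjusted series, then look up the fill price in the unadjusted series at the same index. The breakout rule and the price data are hypothetical and only serve to show the shared index.

```python
def first_breakout(prices, lookback):
    """Index of the first bar closing above the highest close of the prior
    `lookback` bars, or None if there is no such bar."""
    for i in range(lookback, len(prices)):
        if prices[i] > max(prices[i - lookback:i]):
            return i
    return None


# Hypothetical data: same bars, with 2-for-1 splits before bar 2 and bar 4.
adjusted = [25.0, 25.5, 25.75, 26.0, 26.5]
unadjusted = [100.0, 102.0, 51.5, 52.0, 26.5]

signal_index = first_breakout(adjusted, lookback=2)
fill_price = unadjusted[signal_index]  # price actually traded on that bar
raw_signal_index = first_breakout(unadjusted, lookback=2)
print(signal_index, fill_price, raw_signal_index)
```

Running the same rule directly on the unadjusted series finds no breakout at all here, because the split gaps hide the uptrend.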

But the real question is: how important is it that the imaginary CAGR and DD generated by the backtest(s) are 100% correct?

They're just numbers that tell you what you can expect if the future plays out perfectly like the past.

What would be more enlightening is running a Monte Carlo simulation so you can see the range of all possible outcomes. If you dig deeper, you'll see how dependent the final numbers are on the timing of the trades. And you'll see how severe the DD could be in the worst case scenario...
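One simple variant of such a simulation is to reshuffle the order of the backtest's trade returns many times and look at the spread of maximum drawdowns. This is only a sketch under that assumption: the post does not say which Monte Carlo method is meant, and the trade returns below are invented.

```python
import random


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def simulate_drawdowns(trade_returns, n_runs=1000, seed=0):
    """Shuffle the trade order n_runs times; return the sorted max drawdowns."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        shuffled = list(trade_returns)
        rng.shuffle(shuffled)
        equity = [1.0]
        for trade_return in shuffled:
            equity.append(equity[-1] * (1.0 + trade_return))
        results.append(max_drawdown(equity))
    return sorted(results)


# Hypothetical per-trade returns from a backtest.
trade_returns = [0.04, -0.02, 0.03, -0.05, 0.06, -0.01, 0.02, -0.03, 0.05, -0.04]
drawdowns = simulate_drawdowns(trade_returns)
median_dd = drawdowns[len(drawdowns) // 2]
worst_dd = drawdowns[-1]
print(median_dd, worst_dd)
```

Every shuffled run ends at the same final equity, since the same returns are compounded in a different order. Only the path, and therefore the drawdown, changes, which is the timing dependence described above.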

As an example, if Warren Buffett started his career 6 or 12 months earlier or later, there's a good chance he'd be living in his starter home by necessity instead of by choice... and BRKA would be the delisted symbol of a bankrupt textile company.

gmst · Jul 9, 2013

Quote from blah12345678:

But the real question is: how important is it that the imaginary CAGR and DD generated by the backtest(s) are 100% correct?

They're just numbers that tell you what you can expect if the future plays out perfectly like the past.

What would be more enlightening is running a Monte Carlo simulation so you can see the range of all possible outcomes. If you dig deeper, you'll see how dependent the final numbers are on the timing of the trades. And you'll see how severe the DD could be in the worst case scenario...

As an example, if Warren Buffett started his career 6 or 12 months earlier or later, there's a good chance he'd be living in his starter home by necessity instead of by choice... and BRKA would be the delisted symbol of a bankrupt textile company.

Hey blah,

Somehow I missed replying to this post. Thanks for your post. While dealing with technical challenges, I had somehow overlooked the broader view of simulations. Thanks for the tip.

syswizard · Jul 17, 2013

What about the new kid on the block..... www.quandl.com ?
http://www.quandl.com/help/api-for-stock-data

gmst · Jul 17, 2013

Quote from syswizard:

What about the new kid on the block..... www.quandl.com ?
http://www.quandl.com/help/api-for-stock-data

Thanks, it is interesting and could become very useful, especially if you download data from multiple sources and so have to deal with multiple APIs. Quandl lets you use one interface to download data coming from multiple sources. I think it is a powerful concept.

The other user-friendly thing is that it connects to so many applications, e.g. Excel, R, Maple, etc.

Where did you originally hear about Quandl?

blah12345678 · Jul 17, 2013

I use Quandl to download the basic futures markets.

You need to register in order to automatically download the data with an API key...

And you need to email Tammer Kamel <connect@quandl.com> in order to gain access for unlimited downloads.

The one problem I've encountered is that the data isn't updated consistently or in a timely manner. For example, 2 days ago, I downloaded around 11 pm/23:00 on Monday 7/15 to grab the data for 7/15, but I had to wait until today (Wed 7/17) to download the Tuesday 7/16 data.

I'm looking at eoddata.com to get a wider range of available data. They'll allow you to get the files via their downloader, FTP, or even via email if you want.