Problem with Yahoo Finance historical data using R quantmod

Discussion in 'Data Sets and Feeds' started by ajensen, Oct 29, 2019.

  1. ajensen

    I have an R script that uses the quantmod library to pull daily data for about 500 stock symbols every morning. Usually it works fine, but today it is giving "HTTP error 401" for many symbols, although it is getting data for some. Has anyone had this problem? I don't know if my R code needs to be changed.
     
    Nobert likes this.
  2. d08

    Most likely an invalid cookie. Your cookies get banned if you make too many requests in a certain time. I don't know what quantmod uses but I suspect it's some sort of cookie cycling.
     
  3. ajensen

    Thanks d08. I added a Sys.sleep(10) line to pause for 10 seconds between requests. There are still failures to get data for some symbols, and of course the script runs much more slowly. I wonder if there is another solution.
     
  4. d08

    The cookie could be invalidated? I don't use Yahoo much anymore but that was the issue before. This would explain why it's partially working.
     
  5. I pull end-of-day data on about 600 symbols from Yahoo and I haven't been seeing any issues. Since I made the changes to support the cookies a couple of years ago the downloads have been pretty reliable.

    BTW, I generate a new cookie for each new session (i.e. daily).
     
  6. ajensen

    Thanks for the information. How do I generate a new cookie for each session in R? The R code for the function getSymbols that gets the data is at https://github.com/joshuaulrich/quantmod/blob/master/R/getSymbols.R

    Here is the block of code that has "cookie" in it:

    new.session <- function() {
      tmp <- tempfile()
      on.exit(unlink(tmp))

      for (i in 1:5) {
        h <- curl::new_handle()
        # curl.options is defined elsewhere in getSymbols.R
        curl::handle_setopt(h, .list = curl.options)

        # random query to avoid cache
        ru <- paste(sample(c(letters, 0:9), 4), collapse = "")
        cu <- paste0("https://finance.yahoo.com?", ru)
        curl::curl_download(cu, tmp, handle = h)

        # stop retrying once Yahoo has set a session cookie on the handle
        if (NROW(curl::handle_cookies(h)) > 0)
          break
        Sys.sleep(0.1)
      }

      if (NROW(curl::handle_cookies(h)) == 0)
        stop("Could not establish session after 5 attempts.")
      return(h)
    }
     
  7. d08

    I've been using a randomized set of cookies, and the same approach ever since they became a requirement.
     
  8. Not sure. If you do a search for "C# yahoo cookie" on Google you will get a bunch of hits. This is what I based my code on. The important thing is that you establish a new cookie for each session; that way the cookie doesn't have a chance to expire. I've never had Yahoo block me, even during my testing when I was doing a lot of requests.
     
  9. gaussian

    From an engineering standpoint, the way we deal with this is to use an exponentially increasing wait time. Since this service is not especially robust, you may prefer to scale your wait times linearly at first.

    If the interpretation of `401 Unauthorized` is simply that you are being rate limited, then each time you get this response, increase your sleep timer by some constant amount (perhaps half a second). If that still isn't enough, move on to exponentially increasing wait times, giving up after some upper threshold.

    As you've discovered, you can also play with cookies here, but you may be walking a fine line with their terms of service. If you do go the black-hat route, you may also want to randomize your user agent and locale as well; I don't recommend it though, since this can get you banned permanently.
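
    A minimal sketch of that linear-then-exponential backoff loop in R (the function name `fetch_with_backoff` and the specific step sizes are my own illustration, not from quantmod):

    ```r
    # Sketch: retry a flaky download with linearly, then exponentially,
    # growing waits. `fetch` is any zero-argument function that either
    # returns data or signals an error (e.g. a wrapper around
    # quantmod::getSymbols for a single ticker).
    fetch_with_backoff <- function(fetch, max_tries = 6,
                                   linear_step = 0.5, max_wait = 30) {
      wait <- 0
      for (i in seq_len(max_tries)) {
        result <- tryCatch(fetch(), error = function(e) NULL)
        if (!is.null(result))
          return(result)
        # grow the wait linearly at first, then double it once it gets long
        wait <- if (wait < 2) wait + linear_step else wait * 2
        wait <- min(wait, max_wait)  # cap the wait at an upper threshold
        Sys.sleep(wait)
      }
      stop("Giving up after ", max_tries, " attempts.")
    }
    ```

    With this shape you can tune `linear_step` and `max_wait` to taste without touching the download logic itself.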
     
    jtrader33 likes this.
  10. TheBigShort

    @ajensen 500 stocks on getSymbols? How long does that take you?

    Instead, try installing the package "BatchGetSymbols". You can run it in parallel; 500 stocks should only take you about a minute.

    If you want help with the code (if you can't figure it out yourself), reply back and I will help you out.

    Also, if you are just looking for the current quote, use getQuote() instead. It'll be even faster!
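
    Something like this, roughly (a sketch assuming the BatchGetSymbols package is installed; the ticker list and date range are placeholders, and the parallel backend is registered via the future package as that package expects):

    ```r
    # Sketch: parallel daily downloads with BatchGetSymbols.
    # Tickers and dates below are placeholders, not from the thread.
    library(BatchGetSymbols)

    tickers <- c("AAPL", "MSFT", "GOOG")  # swap in your ~500 symbols
    future::plan(future::multisession)    # backend used when do.parallel = TRUE

    out <- BatchGetSymbols(
      tickers     = tickers,
      first.date  = Sys.Date() - 30,
      last.date   = Sys.Date(),
      do.parallel = TRUE
    )
    head(out$df.tickers)  # long-format data: one row per ticker per trading day
    ```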
     
    #10     Oct 30, 2019