Need help with a simple math problem

Discussion in 'App Development' started by TheBigShort, Jan 1, 2020.

  1. TheBigShort

    Fun brain teaser.

    I am making calls to an API that limits how many calls I can make.

    I have a data set that contains 471 unique equities; each equity has between 12 and 16 dates.
    In total there are 741 unique dates.

    The data set is 7536 rows.

    In each call I can include as many tickers and as many dates as I want. However, I am limited to 5000 rows per call.

    I am also limited to 80 calls for the month.

    Which algorithm uses the fewest calls while still covering every ticker and date in the whole data set?

    Example of one call: pull data for AAPL, MSFT, and IBM on "2019-01-01, 2018-01-01". That would return 6 rows: AAPL data for 2019-01-01 and 2018-01-01, MSFT data for both dates, and IBM data for both dates.
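    In other words, one call returns the full cross-product of the tickers and dates requested. A quick R sketch of that arithmetic, using the example above:

    Code:
    # One call returns (number of tickers) x (number of dates) rows
    tickers = c("AAPL", "MSFT", "IBM")
    dates   = c("2019-01-01", "2018-01-01")
    length(tickers) * length(dates)  # 6 rows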

    I uploaded the data set with tickers and dates. I believe this takes an iterative process, but I thought I'd ask here for fun while I work on a for loop.

    P.S. I'll also share the data set with the first person to solve it (if solvable)!
     
  2. TheBigShort

    OHHHH Soo close!!!! I got it down to 101 calls!!! Can someone do better??

    Code:
    library(tidyverse)

    # Greedy approach: take the ticker with the most remaining rows,
    # then sweep every other row sharing one of its dates into the same call.
    counter = 0
    call.data = list()

    while (nrow(data) > 0) {
      counter = counter + 1
      # Rank tickers by how many rows they still have outstanding
      data = data %>%
        group_by(Ticker) %>%
        mutate(N = n()) %>%
        ungroup() %>%
        arrange(desc(N))
      ticker.inuse = data %>% pull(Ticker) %>% .[1]
      ticker.data = data %>% filter(Ticker == ticker.inuse)
      data = data %>% anti_join(ticker.data, by = c("Ticker", "DataTimeStamp"))
      # Any remaining ticker observed on those same dates rides along for free
      new.data = data %>% filter(DataTimeStamp %in% ticker.data$DataTimeStamp)
      data = data %>% anti_join(new.data, by = c("Ticker", "DataTimeStamp"))
      call.data[[counter]] = bind_rows(ticker.data, new.data)
      print(nrow(data))
    }

    call.data

    data is the data set I posted
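    A quick sanity check on the row cap (a sketch): the API returns the full cross-product of the tickers and dates in a request, so the effective size of each call is distinct tickers times distinct dates, which can exceed the number of rows actually kept.

    Code:
    # Effective request size per call: distinct tickers x distinct dates.
    # This, not nrow() of the rows kept, is what counts against the 5000-row cap.
    map_int(call.data, ~ n_distinct(.x$Ticker) * n_distinct(.x$DataTimeStamp))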
     
  3. tommcginnis

    I would favor an output that fits seamlessly into the analysis over one that merely computes its sub-parts quickly. For example, if the data were directed into a chronological sequencer, I would work the calls so that date was followed by ticker. If the data were directed towards comparing individual tickers over the entire time horizon, I'd work the ticker for each (available) date, and then compare the results. If this pattern needed to violate the 5000-row limit, I would weigh the cost of that against the loss of utility of mid-study output (samples -- which I use as a sanity check), or the added time to index/rotate the data later in order to reconfigure rows to what I was *really* after in the first place.

    In short, I'd happily violate the 5000 row limit if the benefit of a better-fitting result in the larger analysis suggested it.

    [EDIT!] It looks just now like you went with ticker-then-time -- would love to see a few lines of the output!
     

  4. Could you repeat the question, please?
     
  5. gaussian

    I'm not understanding the problem - you have a fixed number of rows (7536) and an API request limit of 5000 rows per call. The API is unspecified, so I will assume that "any number of equities and dates" means it will give you back all data for any equity/date combination in the request.

    You'd need 2 requests. Use windows of 5000 rows; your second request would be just over half full. You did not specify a time constraint, so assuming 1 run per day at EOD, you'd only consume ~60 of your 80 API calls. Your API-call floor is set entirely by your time constraint, and in this case the floor for EOD runs is just south of 60 calls per month. You won't be able to improve on this.
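    For reference, the arithmetic behind that estimate, under the windowing assumption:

    Code:
    # Under the windowing assumption (rows can be split freely across requests):
    ceiling(7536 / 5000)        # 2 requests per full pull of the data set
    30 * ceiling(7536 / 5000)   # ~60 calls for a month of daily EOD runs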
     
  6. TheBigShort

    Okay, maybe I did not explain it properly.

    I am making a call to an API. In this call, I have to specify two parameters: Ticker(s) and Date(s).
    If I specify AAPL and 2019-01-01, I will get 1 row back with AAPL data for 2019-01-01. If I specify AAPL, MSFT, and 2019-01-01, I will get back 2 rows: 1 row with AAPL data for 2019-01-01 and 1 row with MSFT data for 2019-01-01. If I also specify 2 dates, I will receive back 4 rows.

    I can only make 80 calls, and the maximum per call is 5000 rows.

    I am trying to come up with an algorithm that will keep me within those limits.

    I will need way more than 2 calls, because I cannot specify 2 stocks that each have their own unique date and get back only those 2 rows. Does that make sense?
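    A tiny sketch of the obstacle: requesting 2 tickers that each need a different date still returns the full cross-product (the wanted/returned tibbles below are illustrative stand-ins, not the API itself):

    Code:
    library(tidyverse)
    # Rows actually wanted: one distinct date per ticker
    wanted = tibble(Ticker = c("AAPL", "MSFT"),
                    Date   = c("2019-01-01", "2018-01-01"))
    # What one call returns: every requested ticker crossed with every requested date
    returned = expand_grid(Ticker = unique(wanted$Ticker),
                           Date   = unique(wanted$Date))
    nrow(returned)  # 4 rows come back, though only 2 were wanted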
     
  7. TheBigShort

    I figured out a workaround by combining some of the 101 calls that request the fewest rows. However, it's a tricky problem nonetheless, and I would like a one-shot algorithm that doesn't need manual fix-ups after the run.
     
  8. Gtrade

    Worst case should be 75 calls, which keeps you inside your 80-call limit:
    471 equities x 10 dates per call = 4710 rows, which fits under the 5000-row cap.
    741 dates / 10 dates per call = 74.1, so 75 calls; the last call would be 471 equities x 1 date.
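    The same arithmetic in R:

    Code:
    n_equities = 471
    n_dates    = 741
    dates_per_call = floor(5000 / n_equities)  # 10 dates keep each call under 5000 rows
    ceiling(n_dates / dates_per_call)          # 75 calls in the worst case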

    To further reduce the calls, here is an algorithm:
    1. Create the 75 call data sets described above.
    2. In each call data set, eliminate the equities that don't have a matching request date in that set.
    3. If all equities were removed from a call data set, eliminate that call.
    4. Combine the remaining data sets (see the sketch below). A few approaches could work here; an iterative one would be the most optimal, but a simple one goes like this: say call data set 1 has been reduced to 60 equities for its 10 dates (600 rows), and call data set 2 has been reduced to 70 equities for its 10 dates (700 rows). Combining them gives a call data set of 130 equities x 20 dates (2600 rows). Eliminate any duplicate equities from the combined set; for example, if there were 20 duplicates, it would reduce to 110 equities x 20 dates (2200 rows). Keep adding other call data sets to the combined set and removing duplicate equities for as long as you stay under the 5000-row limit, eliminating each call data set you merge in.
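    A minimal sketch of the simple approach in step 4, assuming each call data set is represented as a list of its equities and dates (the combine_calls helper and that representation are illustrative, not part of the API):

    Code:
    # Greedily merge call data sets while the equity x date cross-product
    # stays under the row cap. Merging equities with union() also performs
    # the duplicate-equity elimination from step 4.
    combine_calls = function(calls, row_limit = 5000) {
      combined = list()
      current  = calls[[1]]
      for (cl in calls[-1]) {
        eq = union(current$equities, cl$equities)
        dt = union(current$dates, cl$dates)
        if (length(eq) * length(dt) <= row_limit) {
          # Still under the cap: absorb this call data set into the current one
          current = list(equities = eq, dates = dt)
        } else {
          # Would exceed the cap: close out the current call and start a new one
          combined[[length(combined) + 1]] = current
          current = cl
        }
      }
      combined[[length(combined) + 1]] = current
      combined
    }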
     
    Last edited: Jan 2, 2020