Java - Storing data in memory for post-runtime access

jtrader33 · Jul 16, 2013

That thread title is probably poorly worded but this is the issue:

- I've bought 1 min option data and stored it in csv files

- The actual data required for any given backtest is a small subset of the whole dataset (e.g. 5 min quotes derived from the 1 min) and should easily fit into RAM

- Rather than read through the entirety of the csv files on each backtest, I'd like to store only the data I need into ArrayLists and then have the backtesting code freely access it

- The catch is that I want to be able to make substantial changes to the backtesting code in between runs (not just simple parameter changes)

I know Amibroker does this - how can it be done in Java? Appreciate any suggestions.
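
A minimal sketch of the "load only what I need into an ArrayList" step described above. It assumes each 1 min CSV line looks like `time,bid,ask` and that a 5 min series can be taken as every fifth row; both the column layout and the class and method names (`FiveMinuteSampler`, `everyFifth`, `Quote`) are made up for illustration, not the actual format of the purchased data.

```java
import java.util.ArrayList;
import java.util.List;

class FiveMinuteSampler {

    /** One parsed quote. The (time, bid, ask) layout is an assumption. */
    static final class Quote {
        final String time;
        final double bid;
        final double ask;

        Quote(String time, double bid, double ask) {
            this.time = time;
            this.bid = bid;
            this.ask = ask;
        }
    }

    /** Keeps every fifth 1 min line, starting with the first, and parses it. */
    static List<Quote> everyFifth(List<String> oneMinuteLines) {
        List<Quote> kept = new ArrayList<>();
        for (int i = 0; i < oneMinuteLines.size(); i += 5) {
            String[] cols = oneMinuteLines.get(i).split(",");
            kept.add(new Quote(cols[0],
                    Double.parseDouble(cols[1]),
                    Double.parseDouble(cols[2])));
        }
        return kept;
    }
}
```

The input list could come from `java.nio.file.Files.readAllLines(path)`; the open question in this thread is how to keep the resulting list around between runs.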

lwlee · Jul 16, 2013

Have a secondary app that simply caches the data in memory. First app can communicate with the second one through various ways, web services, restful, etc.

jtrader33 · Jul 16, 2013

Quote from lwlee:

Have a secondary app that simply caches the data in memory. First app can communicate with the second one through various ways, web services, restful, etc.

Thanks for the post. I figured that is the big picture solution, but have no idea on where to start to make it happen practically. Can you share any specific Java tools that I should look into?

hft · Jul 16, 2013

Quote from jtrader33:

That thread title is probably poorly worded but this is the issue:

- I've bought 1 min option data and stored it in csv files

- The actual data required for any given backtest is a small subset of the whole dataset (e.g. 5 min quotes derived from the 1 min) and should easily fit into RAM

- Rather than read through the entirety of the csv files on each backtest, I'd like to store only the data I need into ArrayLists and then have the backtesting code freely access it

- The catch is that I want to be able to make substantial changes to the backtesting code in between runs (not just simple parameter changes)

I know Amibroker does this - how can it be done in Java? Appreciate any suggestions.

Might not be exactly what you're looking for, but you could serialize arraylists into files for future access.
Code:
		import java.io.*;
		import java.util.ArrayList;

		public class SerializeDemo {
			public static void main(String[] args) throws IOException, ClassNotFoundException {
				final ArrayList<Integer> list = new ArrayList<>();
				list.add(1);
				list.add(2);

				// Write the list to disk; try-with-resources closes the streams even on failure.
				try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("/tmp/test.ser"))) {
					out.writeObject(list);
				}

				// Read it back (this part can live in a later run of a different program).
				try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("/tmp/test.ser"))) {
					System.out.println("List content: " + in.readObject());
				}
			}
		}
Personally, in the example you give, I would just add 20% to my storage space and store 5 min intervals permanently as csv's. Understandably that does not scale to other use cases.

lwlee · Jul 16, 2013

Are you saying you are a complete beginner with Java?

Just keep it simple. First program simply reads your data into memory and stays alive. Your 2nd program that you will be constantly bouncing, just needs to figure out how to communicate with first program. Basic core java libraries should have what you want.

I would suggest googling "RESTful" for the simplest examples you can find. Using the HTTP request/response protocol is a nice and simple approach.

Edit: let me qualify that, RESTful isn't part of the core Java libraries but it's a standard that is prevalent so if you want to spend a little extra time, it would be good to know. Otherwise, basic HTTP req/resp is part of the Java libraries.
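
A rough sketch of what the "first program" could look like using only the HTTP server that ships with the JDK (`com.sun.net.httpserver.HttpServer`). The class name `QuoteCacheServer`, the `/quotes` path, the `symbol` query parameter and the idea of caching raw CSV rows per symbol are all assumptions for illustration, not something specified in this thread.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class QuoteCacheServer {
    // symbol -> CSV rows, held in memory for as long as this process stays alive
    static final Map<String, List<String>> CACHE = new ConcurrentHashMap<>();

    /** Starts a local HTTP server; GET /quotes?symbol=XYZ returns the cached rows, one per line. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/quotes", exchange -> {
            String query = exchange.getRequestURI().getQuery(); // e.g. "symbol=AAPL"
            String symbol = (query != null && query.startsWith("symbol="))
                    ? query.substring("symbol=".length()) : "";
            List<String> rows = CACHE.get(symbol);
            if (rows == null) {
                exchange.sendResponseHeaders(404, -1); // -1 means no response body
                exchange.close();
                return;
            }
            byte[] body = String.join("\n", rows).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start(); // returns immediately; the server threads keep the JVM alive
        return server;
    }
}
```

A `main` method would fill `CACHE` once from the CSV files and then call `QuoteCacheServer.start(8080)`; the port is arbitrary. The backtesting program can then be restarted as often as needed without reloading the data.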

Quote from jtrader33:

Thanks for the post. I figured that is the big picture solution, but have no idea on where to start to make it happen practically. Can you share any specific Java tools that I should look into?

jtrader33 · Jul 16, 2013

Quote from hft:

Might not be exactly what you're looking for, but you could serialize arraylists into files for future access.

Code:

		final ArrayList<Integer> list = new ArrayList<>();
		list.add(1);
		list.add(2);

		final FileOutputStream fileOut = new FileOutputStream("/tmp/test.ser");
		final ObjectOutputStream out = new ObjectOutputStream(fileOut);
		out.writeObject(list);
		out.close();
		fileOut.close();

		final FileInputStream fileIn = new FileInputStream("/tmp/test.ser");
		final ObjectInputStream in = new ObjectInputStream(fileIn);
		System.out.println("List content: " + in.readObject());
		in.close();
		fileIn.close();

Personally, in the example you give, I would just add 20% to my storage space and store 5 min intervals permanently as csv's. Understandably that does not scale to other use cases.

Thanks for that. I suppose the big drawback there would be the speed difference between reading from the hard disk and reading an ArrayList that is already in memory; I haven't quantified it but imagine it would be material. That said, your suggestion is similar to something else I considered...

- Currently, the csv files are massive with all options for a given underlying in a single daily file. So number of quotes is on the order of [# of expiries] x [# strikes at each expiry] x [390 mins/day] x [2 (calls/puts)]

- Since I'm not analyzing vol surfaces or the like at this point, what I thought about doing was cutting up each file so that each option had its own daily file (e.g. AAPL_20130719_445.00C_20130627.csv, AAPL_20130719_445.00C_20130628.csv)

- From there, I would create a tradeDate/expiry/strike mapping file for each underlying that I could load into a hashmap (or similar) and quickly determine things like first expiration date, ATM strike, ATM - 2 stdevs strike, etc. Then with only 390 lines per file, reading the appropriate csv for prices should be pretty quick.
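
A minimal sketch of that mapping idea, assuming dates are kept as `yyyyMMdd` strings (so they sort chronologically) and strikes as doubles. The class and method names (`OptionIndex`, `firstExpiry`, `atmStrike`, `fileName`) are hypothetical; only the file name pattern is taken from the example above.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class OptionIndex {
    // tradeDate -> expiry -> sorted strikes
    private final Map<String, TreeMap<String, TreeSet<Double>>> index = new TreeMap<>();

    void add(String tradeDate, String expiry, double strike) {
        index.computeIfAbsent(tradeDate, d -> new TreeMap<>())
             .computeIfAbsent(expiry, e -> new TreeSet<>())
             .add(strike);
    }

    /** Earliest expiry listed on the trade date, or null if the date is unknown. */
    String firstExpiry(String tradeDate) {
        TreeMap<String, TreeSet<Double>> byExpiry = index.get(tradeDate);
        return byExpiry == null ? null : byExpiry.firstKey();
    }

    /**
     * Listed strike closest to the underlying price (ties go to the lower strike).
     * Assumes the trade date and expiry have already been added.
     */
    Double atmStrike(String tradeDate, String expiry, double underlying) {
        TreeSet<Double> strikes = index.get(tradeDate).get(expiry);
        Double below = strikes.floor(underlying);
        Double above = strikes.ceiling(underlying);
        if (below == null) return above;
        if (above == null) return below;
        return (underlying - below <= above - underlying) ? below : above;
    }

    /** Builds a per-option daily file name such as AAPL_20130719_445.00C_20130627.csv */
    static String fileName(String underlying, String expiry, double strike,
                           char right, String tradeDate) {
        return String.format(Locale.US, "%s_%s_%.2f%c_%s.csv",
                underlying, expiry, strike, right, tradeDate);
    }
}
```

With the index in memory, picking the contract and then opening a single 390-line file is a lookup plus one small read.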

I couldn't decide if that approach was silly or not so I decided to see how slow the large files were to work with first...and the answer is too slow. Anyhow, I'm grateful for the suggestions...being self taught and not having a sounding board makes this input very useful in thinking through things.

jtrader33 · Jul 16, 2013

Quote from lwlee:

Are you saying you are a complete beginner with Java?

Just keep it simple. First program simply reads your data into memory and stays alive. Your 2nd program that you will be constantly bouncing, just needs to figure out how to communicate with first program. Basic core java libraries should have what you want.

I would suggest googling "RESTful" for the simplest examples you can find. Using the HTTP request/response protocol is a nice and simple approach.

I've programmed my own ATS and various other tools, so wouldn't say I'm a complete beginner...but I only learned what was required for those projects and never encountered this particular issue. Repeatedly re-entering the 1st program is the concept most foreign to me; previously I've only shared data between apps launched at the same time or spawned by another. Thanks for the RESTful suggestion...simple is what I was hoping for...will give it a look.

lwlee · Jul 16, 2013

It shouldn't be foreign at all. It's analogous to you entering a web link into the browser and the browser returning data, a webpage, back to you. HTTP has GET/POST for requesting data. Pretty basic web programming.

The nice thing about this approach is that it's FAST and has little overhead. I worked on a project where we used this approach to cache a ton of product/pricing data into memory for lightning fast retrieval. We used a Tomcat container for easier maintenance but prob overkill for you.
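
For the backtesting side, a plain GET with the JDK's `HttpURLConnection` is probably enough. This is a sketch only: the class name `QuoteCacheClient`, the timeouts, and any URL passed to it (for example a hypothetical `http://localhost:8080/quotes?symbol=AAPL`) are assumptions, and it expects the cache process to answer with one CSV row per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class QuoteCacheClient {

    /** Issues an HTTP GET and returns the response body as a list of lines. */
    static List<String> fetchLines(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(2000);  // milliseconds
        conn.setReadTimeout(10000);    // milliseconds
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            return lines;
        } finally {
            conn.disconnect();
        }
    }
}
```

The backtest would call `fetchLines(...)` once at startup and parse the rows into its own ArrayLists, so the CSV files are never touched again while the cache process stays up.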

Quote from jtrader33:

I've programmed my own ATS and various other tools, so wouldn't say I'm a complete beginner...but I only learned what was required for those projects and never encountered this particular issue. Repeatedly re-entering the 1st program is the concept most foreign to me; previously I've only shared data between apps launched at the same time or spawned by another. Thanks for the RESTful suggestion...simple is what I was hoping for...will give it a look.

jtrader33 · Jul 16, 2013

Quote from lwlee:

It shouldn't be foreign at all. Analogy is similar to you inputting a web link into the browser and the browser returning data, a webpage, back to you. HTTP has POST/GET for retrieving data. Pretty basic web programming.

The nice thing about this approach is that it's FAST and has little overhead. I worked on a project where we used this approach to cache a ton of product/pricing data into memory for lightning fast retrieval. We used a Tomcat container for easier maintenance but prob overkill for you.

Ah, I see. So if I'm understanding you correctly, I simply set up a connection between the two apps similar to all the other tcp sockets I'm using for datafeed/broker/etc. except that in this instance it's likely better to use http because it's simpler and perhaps faster...that sound about right?

lwlee · Jul 16, 2013

Correct. In addition, we were actually able to query our cache server (Tomcat container) from a web browser and see data in CSV format. So you could potentially access your data from anywhere on the web.

But at some point if this is not a throw-away project, seriously look at RESTful. Adhering to standards can only make things easier for you in the future.

Quote from jtrader33:

Ah, I see. So if I'm understanding you correctly, I simply set up a connection between the two apps similar to all the other tcp sockets I'm using for datafeed/broker/etc. except that in this instance it's likely better to use http because it's simpler and perhaps faster...that sound about right?