Create a class responsible for reading the initial CSV and storing the data. Then use serialization to save the object's state and deserialization to bring the object back to life. Seems pretty straightforward.
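A minimal sketch of that approach, assuming a hypothetical QuoteCache class and a placeholder file name (neither is from the original suggestion):

import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class QuoteCache implements Serializable {
    private static final long serialVersionUID = 1L;
    private final List<String> rows = new ArrayList<>();

    // Read the initial CSV into memory
    public void load(String csvPath) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(csvPath))) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(line); // parse into a quote object here if desired
            }
        }
    }

    // Save the object's state to disk
    public void save(String path) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(path))) {
            out.writeObject(this);
        }
    }

    // Bring the object back to life on the next run
    public static QuoteCache restore(String path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(path))) {
            return (QuoteCache) in.readObject();
        }
    }
}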
Or you can store all the data in a proper SQL database and let the database engine do some caching for you.
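A rough sketch of what querying such a database could look like over plain JDBC; the table, columns, date, and the H2 connection URL are all placeholders, not from the post (the H2 driver would need to be on the classpath):

import java.sql.*;

public class DbReader {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./ticks");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT quote_time, bid, ask FROM quotes WHERE trade_date = ?")) {
            ps.setDate(1, Date.valueOf("2013-01-02"));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    long quoteTime = rs.getLong("quote_time");
                    double bid = rs.getDouble("bid");
                    double ask = rs.getDouble("ask");
                    // feed each row into the backtest here
                }
            }
        }
    }
}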
If it's all in Java, then you could write a simple RMI server process. Or you could just serialize the data over a TCP socket from a server process that is caching the data in memory.
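A bare-bones sketch of the socket variant; the DataServer name, the port, and the sample row are arbitrary choices for illustration:

import java.io.*;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class DataServer {
    public static void main(String[] args) throws IOException {
        // Data held in memory for the lifetime of the server process
        List<String> cachedQuotes = new ArrayList<>();
        cachedQuotes.add("2013-01-02,09:30:00,100.5,100.7");

        try (ServerSocket server = new ServerSocket(9000)) {
            while (true) {
                try (Socket client = server.accept();
                     ObjectOutputStream out = new ObjectOutputStream(
                             new BufferedOutputStream(client.getOutputStream()))) {
                    out.writeObject(cachedQuotes); // serialize the cached list to the client
                    out.flush();
                }
            }
        }
    }
}

The backtest client would connect with a Socket and read the list back with ObjectInputStream.readObject().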
I strongly advise against going down this path or any other inter-process communication. Keep it simple and it will be very fast. Do not over-complicate. I read gzipped CSV files with daily tick data, one file per day per underlying. Such a file usually has about a million ticks (quotes and trades), and it only takes about a second to read through the whole file. Your strategy will almost surely take more time to process the data than you spend reading it. Do not optimize what will take only a fraction of the whole backtesting time. Another point against reading the whole file into memory is that you will pollute the old generation with lots of data and your garbage collection time will grow significantly. And what will you do when you start backtesting on a few years' data? It won't scale well. If you still want to keep all the data in memory, then look into hot-deployable classes. Java can do it easily. Look at various deployment containers, OSGi, etc.
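A small sketch of streaming through a gzipped CSV one line at a time, which keeps memory flat regardless of file size (the file name is a placeholder):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

public class GzipTickReader {
    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(Paths.get("SPY_2013-01-02.csv.gz")))))) {
            String line;
            long count = 0;
            while ((line = reader.readLine()) != null) {
                // hand each tick to the strategy here instead of collecting it
                count++;
            }
            System.out.println("Read " + count + " ticks");
        }
    }
}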
I would cache the data too. If you leave your data on disk, what happens if you run multiple backtests in parallel that need the same data?
The idea is to separate your data from your application code. There are advantages like flexibility and being able to run multiple apps against the data. Complicated? Here is an HTTP server in about 20 lines of code, using purely Java JDK libraries. Go to a web browser and type in "http://localhost:8000/test" and you get "This is the response".

package com.example;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;

import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;

public class Test {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8000), 0);
        server.createContext("/test", new MyHandler());
        server.setExecutor(null); // creates a default executor
        server.start();
    }

    static class MyHandler implements HttpHandler {
        public void handle(HttpExchange t) throws IOException {
            String response = "This is the response";
            t.sendResponseHeaders(200, response.length());
            OutputStream os = t.getResponseBody();
            os.write(response.getBytes());
            os.close();
        }
    }
}
As lwlee suggested, I ended up caching the data in a persistent server app that reads the CSV files and then stores the filtered 5-min data into an ArrayList<OptionQuote> per day. The backtest app requests a day's worth of data from the server and receives it via ObjectInputStream* over the socket. It works great - a test that took over two hours previously now finishes in a few minutes (save for the initial data loading into the server).

However, to quatron's point about scalability, the issue I am up against now is managing memory consumption. One month of CSV files (with all 1-min data) is 1.0GB, and yet my server loaded with just the 5-min data is 1.2GB (I only have 24GB of RAM and need to test 36 months of data). I've tried to be careful with my serialized OptionQuote class:

OptQuote(long quoteDateTime, double undBid, double undAsk, long expiryDateTime, double strike, char right, double optBid, double optAsk)

...but perhaps I will need to keep each quote as a single String on the server side and then convert the ArrayList<String> to ArrayList<OptionQuote> on the client side. I'd rather not have to do that, though, since: 1) it will require splitting the string on the server side anyway (to determine whether it falls on a 5-min interval), 2) it adds an additional iteration through the entire day of quotes on the client side to convert to ArrayList<OptionQuote> (which is necessary for methods that determine earliest expiry, closest strike, etc.), and 3) I'm not certain it will actually be more efficient from a memory standpoint. Any suggestions on the memory aspect would be helpful. Regardless, thanks for all the input - memory issue aside, I'm pretty pleased that I was able to get something that works well together in a single afternoon + evening.

*In case anyone finds this post in a search and attempts something similar: there was a huge difference in performance between this...

out = new ObjectOutputStream(connection.getOutputStream());
in = new ObjectInputStream(connection.getInputStream());

...and this...

out = new ObjectOutputStream(new BufferedOutputStream(connection.getOutputStream()));
in = new ObjectInputStream(new BufferedInputStream(connection.getInputStream()));

Without BufferedOutputStream/BufferedInputStream, it took ~10 seconds to push my ArrayList objects through the socket (which was no faster than reading from disk). After adding them, the transmission is virtually instantaneous.
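For reference, a sketch of what an OptionQuote class with that constructor might look like as a compact Serializable value class; the field layout follows the constructor signature quoted above, but everything else (getters, serialVersionUID) is an assumption:

import java.io.Serializable;

public class OptionQuote implements Serializable {
    private static final long serialVersionUID = 1L;

    // Field layout mirrors the constructor signature in the post
    private final long quoteDateTime;
    private final double undBid;
    private final double undAsk;
    private final long expiryDateTime;
    private final double strike;
    private final char right;   // e.g. 'C' or 'P'
    private final double optBid;
    private final double optAsk;

    public OptionQuote(long quoteDateTime, double undBid, double undAsk,
                       long expiryDateTime, double strike, char right,
                       double optBid, double optAsk) {
        this.quoteDateTime = quoteDateTime;
        this.undBid = undBid;
        this.undAsk = undAsk;
        this.expiryDateTime = expiryDateTime;
        this.strike = strike;
        this.right = right;
        this.optBid = optBid;
        this.optAsk = optAsk;
    }

    public long getQuoteDateTime() { return quoteDateTime; }
    public double getStrike() { return strike; }
    // remaining getters follow the same pattern
}

Note that each such object carries per-object JVM overhead (object header plus a reference slot in the ArrayList) on top of its fields, which is one reason the in-memory footprint can exceed the raw CSV size.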