Calling all C++ programmers


Jul 26th, 2011, 08:55 PM   #1
Maverick1
 
 
Join Date: Jun 2002
Posts: 528
Question related to data mining and backtesting: what's the best container to use with typical csv/comma delineated index futures data?

Let's assume that the basic data components, i.e., open, high, low, close, price, volume are part of a structure or public class. Then should it be a vector of structures? Or a set, or a map?

I'm wondering especially about how to deal with date ranges and times when doing basic analysis. Typically the date is stored as a string in the csv file. So does it need to be converted for analysis to be possible? Or is there a workaround? Say, for example, I wanted to read in data from a csv file and then find the low price over a given range of dates. That sort of thing.
Jul 26th, 2011, 09:08 PM   #2
rosy2
 
 
Join Date: Aug 2006
Posts: 1,399
A list of bar objects.

And it's "delimited", not "delineated".
Jul 26th, 2011, 09:21 PM   #3
Craig66
 
 
Join Date: Sep 2006
Location: Auckland, NZ
Posts: 452
Depends what you want to do with the objects. If you want to index them by time, then you're going to have to stick them in a map keyed by timestamp; however, this will be more expensive than an unordered container.
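A minimal sketch of the map idea, assuming the timestamp has already been converted to a time_t. The Bar fields, the BarMap typedef and the function name lowestLowInRange are made up for illustration, not taken from anyone's actual code:

```cpp
#include <ctime>
#include <map>

// Hypothetical bar record; the field names are assumptions.
struct Bar {
    double open, high, low, close;
    long volume;
};

// Bars keyed by timestamp; std::map keeps the keys in ascending order.
typedef std::map<std::time_t, Bar> BarMap;

// Finds the lowest low among bars with from <= timestamp <= to.
// Returns false (and leaves result untouched) if no bar is in range.
bool lowestLowInRange(const BarMap& bars, std::time_t from, std::time_t to,
                      double& result) {
    if (from > to) return false;
    BarMap::const_iterator first = bars.lower_bound(from);  // first key >= from
    BarMap::const_iterator last  = bars.upper_bound(to);    // first key > to
    if (first == last) return false;

    double lowest = first->second.low;
    for (BarMap::const_iterator it = first; it != last; ++it) {
        if (it->second.low < lowest) lowest = it->second.low;
    }
    result = lowest;
    return true;
}
```

The two bound lookups are logarithmic in the number of bars, and the scan only touches bars inside the range. The cost is a separate node allocation per bar, which is where the extra expense comes from.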

A list is a good idea if you're going to be doing a lot of adds and removes from the middle of the container, but I'm not sure why you would want to do that.

The best option IMO is either a vector or a deque. These containers will allow you to index the collection by numerical index and add objects cheaply at the end (in the vector case) or at both ends (in the deque case). The standard min/max algorithms (std::min_element and std::max_element) will work over any container as long as you supply an appropriate comparison functor.
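As a rough sketch of the functor approach (the Bar layout and the names LowerLow and lowestBar are assumptions for illustration only):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical bar record; the field names are assumptions.
struct Bar {
    long timestamp;  // e.g. seconds since the epoch
    double open, high, low, close;
    long volume;
};

// Comparison functor: a bar is "less" if its low price is lower.
struct LowerLow {
    bool operator()(const Bar& a, const Bar& b) const { return a.low < b.low; }
};

// Returns an iterator to the bar with the lowest low, or bars.end() if the
// container is empty. Works with vector, deque or list alike.
template <typename Container>
typename Container::const_iterator lowestBar(const Container& bars) {
    return std::min_element(bars.begin(), bars.end(), LowerLow());
}
```

Swapping std::min_element for std::max_element with a functor that compares the high field gives the highest high the same way.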
Jul 26th, 2011, 09:43 PM   #4
keyser1
 
 
Join Date: Jun 2005
Location: Manhattan
Posts: 148
Have you thought of using a database, such as SQL Server Express or MS Access?

'Find me the lowest price within this date range' is easy to do in SQL.
It's also relatively easy to do in C# .NET using LINQ. That's a bunch of technologies to learn, but speed of development will be much faster in .NET. Execution speed will be slower than C++, but that can be solved by having a faster machine.

The answer to your question really depends on:
1. How fast do queries have to be?
2. How much data is there?
3. How often are you adding data?
4. How often are you querying data?
Jul 26th, 2011, 11:21 PM   #5
gtor514
 
 
Join Date: Mar 2005
Posts: 217
Most likely you will use one of the sequence containers (list, deque, vector) to store your time, open, high, low, close, volume object. If you don't need to do a lot of inserting, use the vector or deque. If you need to do a lot of accessing, use the vector. Just set up a test application in which to do some profiling of each of the containers. You can change the container used in your test app in just a few lines of code. That's the beauty of C++.

I store my data objects in a vector because I do a lot of accessing. As for the time stamp from the .csv file, I read the time string into my own custom time class that stores the date component and the time component as two integers. As a starting point, you could store the time in a single long variable holding the Unix time (seconds since the 1970 epoch).

1.) parse the time stamp string from the csv file into its date/time components (year, month, day, hour, min, sec)
2.) create a tm struct (see <ctime> / time.h) from the time components
3.) use the mktime function to convert the tm struct to a time_t value, which is typically just a long integer. Note that mktime interprets the struct as local time.
4.) use the time_t as the time in your object.
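The four steps above might look roughly like this. The "YYYY-MM-DD HH:MM:SS" layout and the name parseTimestamp are assumptions; adjust the sscanf format string to whatever your csv actually contains:

```cpp
#include <cstdio>
#include <ctime>

// Parses a time stamp in the assumed layout "YYYY-MM-DD HH:MM:SS" into a
// time_t. Returns false if the text does not match or mktime fails.
bool parseTimestamp(const char* text, std::time_t& out) {
    int year, month, day, hour, min, sec;

    // Step 1: split the string into its date/time components.
    if (std::sscanf(text, "%d-%d-%d %d:%d:%d",
                    &year, &month, &day, &hour, &min, &sec) != 6) {
        return false;
    }

    // Step 2: fill in a tm struct.
    std::tm parts = std::tm();    // zero-initialise every field
    parts.tm_year  = year - 1900; // tm_year counts years since 1900
    parts.tm_mon   = month - 1;   // tm_mon runs from 0 (Jan) to 11 (Dec)
    parts.tm_mday  = day;
    parts.tm_hour  = hour;
    parts.tm_min   = min;
    parts.tm_sec   = sec;
    parts.tm_isdst = -1;          // let mktime work out daylight saving

    // Step 3: convert to time_t (mktime treats the fields as local time).
    std::time_t t = std::mktime(&parts);
    if (t == static_cast<std::time_t>(-1)) return false;

    // Step 4: hand the value back for storage in the bar object.
    out = t;
    return true;
}
```

One caveat: mktime normalises out-of-range fields rather than rejecting them (month 13 rolls over into the next year), so this sketch does not validate the date itself.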
Jul 27th, 2011, 10:22 AM   #6
Maverick1
 
 
Join Date: Jun 2002
Posts: 528
Quote from Craig66:

Depends what you want to do with the objects. If you want to index them by time, then you're going to have to stick them in a map keyed by timestamp; however, this will be more expensive than an unordered container.

A list is a good idea if you're going to be doing a lot of adds and removes from the middle of the container, but I'm not sure why you would want to do that.

The best option IMO is either a vector or a deque. These containers will allow you to index the collection by numerical index and add objects cheaply at the end (in the vector case) or at both ends (in the deque case). The standard min/max algorithms (std::min_element and std::max_element) will work over any container as long as you supply an appropriate comparison functor.

Agree with your thought on the list. I use a vector of structure objects with my 1-min data file (spans a year). Call the object 'Bar', for example, containing: date, time, open, high, low, close, volume, where date and time are strings.

From there, for sorting, I've tried using a set and sorting it using a simple functor. I'm running into some trouble, however, when I try to select a date range, because the dates are stored in the structure object as strings.

Is there a way to iterate over the vector using the date only? Do I have to build a separate index to do this like mentioned above?
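One possible approach that avoids a separate index, sketched under assumptions: convert the date and time strings to a single time_t when loading, keep the vector sorted by that field, and binary-search for the ends of the range. The Bar layout and the names ByTime, ByLow and lowestInRange are hypothetical:

```cpp
#include <algorithm>
#include <ctime>
#include <vector>

// Hypothetical bar record with the date/time strings already converted.
struct Bar {
    std::time_t time;
    double open, high, low, close;
    long volume;
};

// Orders bars chronologically. The mixed overloads let lower_bound and
// upper_bound compare a bar directly against a plain time_t.
struct ByTime {
    bool operator()(const Bar& a, const Bar& b) const { return a.time < b.time; }
    bool operator()(const Bar& a, std::time_t t) const { return a.time < t; }
    bool operator()(std::time_t t, const Bar& b) const { return t < b.time; }
};

// Orders bars by low price.
struct ByLow {
    bool operator()(const Bar& a, const Bar& b) const { return a.low < b.low; }
};

// Returns a pointer to the bar with the lowest low in [from, to], or a null
// pointer if no bar falls in that range.
// Assumes bars is already sorted by time, e.g. via std::sort with ByTime.
const Bar* lowestInRange(const std::vector<Bar>& bars,
                         std::time_t from, std::time_t to) {
    if (from > to) return nullptr;
    auto first = std::lower_bound(bars.begin(), bars.end(), from, ByTime());
    auto last  = std::upper_bound(first, bars.end(), to, ByTime());
    if (first == last) return nullptr;
    return &*std::min_element(first, last, ByLow());
}
```

If the csv is already in chronological order, the vector is sorted as soon as it is loaded and no std::sort call is needed. The same pair of iterators can also be used to loop over every bar in the date range.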