Data storage for backtesting


calhawk01
 

Registered: Jun 2009
Posts: 72

 

08-24-12 08:46 PM

Hi, I'm looking for opinions on the most efficient way to store 1 min data in a database and then fetch the data using a script. The script should also be able to recompile the 1 min data into 5 min bars, etc. The backtesting will be done on a website. Backtesting period = 10+ years.

What database?
What language should the script be written in?
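To make the question concrete, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module. The choice of SQLite, the table name `bars_1m`, its columns, and storing `ts` as integer epoch seconds at the bar start are all assumptions for illustration. The composite primary key `(symbol, ts)` turns a per-symbol date-range fetch into an index range scan.

```python
import sqlite3

# Hypothetical schema: one row per symbol per minute, ts = bar start in epoch seconds.
SCHEMA = """
CREATE TABLE IF NOT EXISTS bars_1m (
    symbol TEXT    NOT NULL,
    ts     INTEGER NOT NULL,
    open   REAL    NOT NULL,
    high   REAL    NOT NULL,
    low    REAL    NOT NULL,
    close  REAL    NOT NULL,
    volume INTEGER NOT NULL,
    PRIMARY KEY (symbol, ts)
) WITHOUT ROWID;
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the bar table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_bars(conn, rows):
    """rows: iterable of (symbol, ts, open, high, low, close, volume) tuples."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO bars_1m VALUES (?, ?, ?, ?, ?, ?, ?)", rows
        )


def fetch_bars(conn, symbol, start_ts, end_ts):
    """Return (ts, open, high, low, close, volume) for start_ts <= ts < end_ts, oldest first."""
    cur = conn.execute(
        "SELECT ts, open, high, low, close, volume FROM bars_1m "
        "WHERE symbol = ? AND ts >= ? AND ts < ? ORDER BY ts",
        (symbol, start_ts, end_ts),
    )
    return cur.fetchall()
```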

2rosy
 

Registered: May 2012
Posts: 335

 

08-24-12 08:52 PM


Quote from calhawk01:



What database?
What language should the script be written in?



for 1 minute data...

makes no difference
makes no difference

calhawk01
 

Registered: Jun 2009
Posts: 72

 

08-24-12 08:55 PM


Quote from 2rosy:

for 1 minute data...

makes no difference
makes no difference



Don't you think it would make a difference depending on what language we use? I'm talking about how much time it would take to recompile the 1 min data to 5 min etc., and then backtest the variables.

2rosy
 

Registered: May 2012
Posts: 335

 

08-24-12 09:51 PM


Quote from calhawk01:

Don't you think it would make a difference depending on what language we use? I'm talking about how much time it would take to recompile the 1 min data to 5 min etc., and then backtest the variables.



You're a consultant's wet dream. Language makes no difference; you're not recompiling anything.
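Aggregating ("recompiling") 1-min bars into 5-min bars is a single linear pass over time-sorted data, which is why the language choice matters little here. A minimal sketch in plain Python, assuming each bar is a `(ts, open, high, low, close, volume)` tuple with `ts` in epoch seconds at the bar start; the name `resample` is hypothetical:

```python
from typing import List, Tuple

# (ts, open, high, low, close, volume); ts = bar start in epoch seconds (assumption)
Bar = Tuple[int, float, float, float, float, int]


def resample(bars: List[Bar], period_s: int = 300) -> List[Bar]:
    """Aggregate time-sorted 1-min bars into period_s-second bars.

    open = first open, high = max high, low = min low,
    close = last close, volume = sum. Buckets are aligned to multiples of period_s.
    """
    out: List[Bar] = []
    for ts, o, h, l, c, v in bars:
        bucket = ts - ts % period_s  # floor ts to the start of its bucket
        if out and out[-1][0] == bucket:
            # Same bucket as the bar being built: extend it.
            b_ts, b_o, b_h, b_l, _, b_v = out[-1]
            out[-1] = (b_ts, b_o, max(b_h, h), min(b_l, l), c, b_v + v)
        else:
            # First 1-min bar of a new bucket starts a new output bar.
            out.append((bucket, o, h, l, c, v))
    return out
```

The same function produces 15-min or hourly bars by changing `period_s`. Because buckets align to clock multiples, 60-min bars run 09:00 to 10:00 rather than 09:30 to 10:30; session-anchored bars would need an offset.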

Steven.Davis
 

Registered: Jun 2010
Posts: 307

 

08-25-12 01:57 PM

Any SQL database can aggregate 1 min data to 5 min data either on-the-fly using a view or stored using a stored procedure.
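As an illustration of the view approach, here is a sketch in SQLite (via Python's `sqlite3`); the table and view names and the integer epoch-seconds `ts` column are assumptions. Open and close come from the first and last 1-min bar in each bucket, high and low are the extremes, and volume is summed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bars_1m (
    symbol TEXT NOT NULL, ts INTEGER NOT NULL,
    open REAL, high REAL, low REAL, close REAL, volume INTEGER,
    PRIMARY KEY (symbol, ts)
);

-- 5-minute bars computed on the fly from the 1-minute table.
-- ts is integer epoch seconds, so (ts / 300) * 300 floors to the 5-minute boundary.
CREATE VIEW bars_5m AS
SELECT g.symbol AS symbol,
       g.bucket AS ts,
       (SELECT o.open FROM bars_1m AS o
         WHERE o.symbol = g.symbol AND o.ts = g.first_ts) AS open,
       g.high AS high,
       g.low AS low,
       (SELECT c.close FROM bars_1m AS c
         WHERE c.symbol = g.symbol AND c.ts = g.last_ts) AS close,
       g.volume AS volume
FROM (SELECT symbol,
             (ts / 300) * 300 AS bucket,
             MIN(ts) AS first_ts,
             MAX(ts) AS last_ts,
             MAX(high) AS high,
             MIN(low) AS low,
             SUM(volume) AS volume
        FROM bars_1m
       GROUP BY symbol, bucket) AS g;
""")
```

SQLite has no stored procedures; to store the result instead, `CREATE TABLE bars_5m_stored AS SELECT * FROM bars_5m;` materializes it once, while server databases such as PostgreSQL offer stored procedures and materialized views for the same purpose.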

PocketChange
 

Registered: Jul 2008
Posts: 2036

 

08-25-12 03:48 PM

We store 1 minute bars, 1 second bars and 25ms bars for many markets from 2000 to present.

Our one-minute bars for equities are only about 600 GB, stored and indexed in SQL databases. The 1-second and 25 ms data sets are, obviously, substantially larger (50+ TB and growing).
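For a sense of how these sizes scale, here is a back-of-envelope estimator. Every input (symbol count, bytes per bar, 252 trading days per year, 390 regular-session minutes per day, index overhead factor) is an illustrative assumption, not a figure from this post.

```python
def estimate_bar_storage_gb(symbols, years, bars_per_day, bytes_per_bar,
                            trading_days_per_year=252, index_overhead=1.0):
    """Rough on-disk size in GB (1e9 bytes) for a table of bars.

    index_overhead multiplies the raw row size to approximate index and page
    overhead (1.0 = none). All inputs are assumptions to replace with real numbers.
    """
    rows = symbols * years * trading_days_per_year * bars_per_day
    return rows * bytes_per_bar * index_overhead / 1e9
```

For example, `estimate_bar_storage_gb(8000, 10, 390, 40)` gives about 314 GB before index overhead; switching to 1-second bars multiplies `bars_per_day` by 60, which shows how quickly higher-resolution sets grow.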

You're I/O-bound when dealing with data volumes and structures like these. Just copying 1 TB of data is time-consuming and pushes SATA3 limits. Traditional fault tolerance and recovery are not realistic options, and traditional big-server / multi-TB drive arrays neither service the load well nor scale.

Our solution was to build out a farm of SQL appliances and feed handlers connected with InfiniBand, and to break up the historic data sets into 500 GB containers. Each data container is replicated across a minimum of 3 appliances, and the collective pool of appliances keeps 10% of the repository cached in memory. It's kind of our own Hadoop/MapReduce, but for SQL tick data.
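The post doesn't say how containers are assigned to appliances. One common technique for placing each container on 3 distinct nodes is rendezvous (highest-random-weight) hashing; the sketch below shows that swapped-in technique, not the poster's method, and the container and appliance names are hypothetical.

```python
import hashlib
from typing import Dict, List


def _weight(container_id: str, appliance: str) -> int:
    # Deterministic pseudo-random weight for a (container, appliance) pair.
    digest = hashlib.sha256(f"{container_id}|{appliance}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place_replicas(container_id: str, appliances: List[str], replicas: int = 3) -> List[str]:
    """Rendezvous hashing: the `replicas` appliances with the highest weight hold the container."""
    if replicas > len(appliances):
        raise ValueError("need at least as many appliances as replicas")
    ranked = sorted(appliances, key=lambda a: _weight(container_id, a), reverse=True)
    return ranked[:replicas]


def placement_map(container_ids: List[str], appliances: List[str],
                  replicas: int = 3) -> Dict[str, List[str]]:
    """Replica placement for every container."""
    return {c: place_replicas(c, appliances, replicas) for c in container_ids}
```

A useful property: removing an appliance only relocates the containers that had it among their top 3, so losing one node doesn't reshuffle the whole repository.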

This redundancy not only protects the data but also provides 3x to Nx the I/O. Queries can be processed in parallel, different indexes can be maintained for different purposes, and different views and schemas can be managed without impacting the repository. It's our attempt at a self-healing, self-updating data vault.
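A rough sketch of fanning one backtest query out in parallel, assuming a hypothetical `query_chunk` callback that fetches one time slice from one replica or container. Threads are adequate because the work is I/O-bound rather than CPU-bound; `workers=3` mirrors the 3-replica minimum but is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")
Range = Tuple[int, int]  # [start_ts, end_ts) in epoch seconds


def split_range(start_ts: int, end_ts: int, chunk_s: int) -> List[Range]:
    """Cut [start_ts, end_ts) into consecutive chunks of at most chunk_s seconds."""
    return [(t, min(t + chunk_s, end_ts)) for t in range(start_ts, end_ts, chunk_s)]


def parallel_query(query_chunk: Callable[[Range], Sequence[Row]],
                   start_ts: int, end_ts: int, chunk_s: int,
                   workers: int = 3) -> List[Row]:
    """Run query_chunk on each chunk concurrently and concatenate results in time order."""
    chunks = split_range(start_ts, end_ts, chunk_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(query_chunk, chunks)  # map() yields results in input order
        return [row for part in results for row in part]
```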

Your 600 GB or so of historic 1-minute bars will quickly occupy 10x the raw space once you account for replication and for managing different views and schemas.

For example, suppose you want to maintain a portfolio view of the S&P 500 and its constituents, all adjusted for splits and dividends, during RTH (regular trading hours).

A subset of optimized tables is created from the repository master and maintained by triggers. The indexes are different, the views are custom, and the I/O distribution is optimized for feeding MATLAB.
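A small SQLite sketch of a trigger-maintained subset table. The table names, the `portfolio_members` list, and storing `ts` as exchange-local epoch seconds (so `ts % 86400` is the time of day) are assumptions. The subset table uses a different key order, `(ts, symbol)`, suited to reading all portfolio members at a given minute; split and dividend adjustment is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bars_1m (
    symbol TEXT NOT NULL, ts INTEGER NOT NULL,
    open REAL, high REAL, low REAL, close REAL, volume INTEGER,
    PRIMARY KEY (symbol, ts)
);

-- Hypothetical subset: only portfolio symbols, only regular-session bars.
CREATE TABLE portfolio_members (symbol TEXT PRIMARY KEY);
CREATE TABLE bars_1m_rth (
    symbol TEXT NOT NULL, ts INTEGER NOT NULL,
    open REAL, high REAL, low REAL, close REAL, volume INTEGER,
    PRIMARY KEY (ts, symbol)  -- different key order: cross-sectional reads by minute
);

-- Assumes exchange-local seconds: 09:30 = 34200 s, 16:00 = 57600 s after midnight.
CREATE TRIGGER bars_1m_to_rth AFTER INSERT ON bars_1m
WHEN NEW.ts % 86400 >= 34200 AND NEW.ts % 86400 < 57600
BEGIN
    INSERT OR REPLACE INTO bars_1m_rth (symbol, ts, open, high, low, close, volume)
    SELECT NEW.symbol, NEW.ts, NEW.open, NEW.high, NEW.low, NEW.close, NEW.volume
      FROM portfolio_members AS p
     WHERE p.symbol = NEW.symbol;  -- copies only if the symbol is in the portfolio
END;
""")
```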

MATLAB is optimized to use GPUs (400+ cores) and accesses an in-memory SQL database that is also optimized to use GPUs for virtualizing its opcodes and queries. As a result, this specialized portfolio application can run in real time with 25 ms precision against both real-time market data and its historic data.

This is a huge undertaking to do right, both in infrastructure expense and in all the coding and data management needed to get down to tick precision.

One-minute bars should be much lighter and easier, but you'll inevitably want to query at higher precision.

Copyright © 2013, Elite Trader. All rights reserved.
 