Database organization

Discussion in 'App Development' started by cjbuckley4, Oct 18, 2014.

  1. How do you test looped code in non-vectorized form in R? It must take an eternity to test a strategy algorithm and run even one day's worth of tick data through it in R. But I am curious how you would do that in R.

     
    #51     Nov 23, 2014
  2. A looped backtester in R??? I'm not THAT patient!! lol... I use R mostly for modelling on data samples, and if possible I'll try to use Python first...

    I've done a looped backtester in Python and it was too SLLLLOOOOWWWW for my taste...
    Now I have the looped backtester implemented in C, plus two backtesters in Python: one that is vectorized for exploratory analysis, and one that runs loops to prototype alternate exit plans for the trades the C backtester has already written to the db...
     
    #52     Nov 23, 2014
  3. Another technical thread with posters developing arcane software mishmash solutions without any idea how it'll help them generate alpha.
     
    #53     Nov 23, 2014
  4. Butterfly

    Very good point.
     
    #54     Nov 23, 2014
  5. Butterfly

    You would if you had any programming experience, but you don't.

    Technical chartists imagining "software" development is not the same as actual financial engineering for the big boys.
     
    #55     Nov 23, 2014
  6. cjbuckley4

    In my experience using MATLAB and R for minute-bar and higher-timeframe backtests, they work pretty well for these discrete intervals, where you calculate a signal and a cumulative return for each period. I'm not an algorithms genius, but I think it would be pretty hard to do event-driven backtesting on tick data with vectorized languages like these. MATLAB seems to perform a little better on loops, but that's not something I've tested, just an opinion from using both for different classes. I like both. I've never used Python in the manner I use MATLAB and R.
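
    As a rough illustration of the "signal and cumulative return for each period" calculation described above, here is a minimal sketch in Python. The moving-average rule, the `bar_backtest` name, the `lookback` parameter and the prices are all made up for the example. It uses plain lists to stay dependency-free; in R, MATLAB or NumPy each step would typically be a single vectorized expression.

```python
from itertools import accumulate
import operator


def bar_backtest(closes, lookback=3):
    """Per-bar signal and cumulative return for a toy moving-average rule."""
    # Simple return of each bar relative to the previous close.
    returns = [0.0] + [c1 / c0 - 1.0 for c0, c1 in zip(closes, closes[1:])]

    # Signal at bar i: 1 if the close is above its trailing mean, else 0.
    signals = []
    for i, close in enumerate(closes):
        if i + 1 < lookback:
            signals.append(0)  # not enough history yet
        else:
            window = closes[i + 1 - lookback : i + 1]
            signals.append(1 if close > sum(window) / lookback else 0)

    # Trade on the previous bar's signal to avoid look-ahead bias.
    positions = [0] + signals[:-1]
    strategy_returns = [p * r for p, r in zip(positions, returns)]

    # Compound the per-bar returns into a cumulative return series.
    growth = accumulate((1.0 + r for r in strategy_returns), operator.mul)
    cumulative = [g - 1.0 for g in growth]
    return signals, cumulative


# Hypothetical minute-bar closes, for illustration only.
closes = [100.0, 101.0, 102.0, 101.0, 103.0, 104.0]
signals, cumulative = bar_backtest(closes)
```

    Every bar is handled the same way and nothing depends on intra-bar events, which is why this style suits discrete intervals better than tick-by-tick simulation.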

    I've tried to design my little event-driven backtesting platform so that it's fairly language agnostic among my languages of choice. For example, the loop over my data in the main program is in C#, but inside that loop you can call code to evaluate a new tick in a few different languages. I'm not really sure how much overhead is involved in moving the data into MATLAB or R... I'm not claiming this approach will make them faster; I just wanted the flexibility to run event-driven backtests from one platform with the same logic, using both lower-level languages and high-level languages like R and MATLAB.
     
    #56     Nov 23, 2014
  7. I am sure some of the Python lovers here hate to hear that, too slow...

    So, that is the point I tried to make: to run a robust backtest, forget R, Python, or any other interpreted language (though I think Python could be optimized to run a rudimentary backtest). R and Python are good for analyzing a subset of data/results and investigating its statistical nature.

     
    #57     Nov 23, 2014
  8. A point I have been emphasizing from the beginning. The focus should be on strategy ideas, design, testing, and implementation, not on how to store data or where to source data from.

    Please add value by sharing your approach to generating alpha, because otherwise little or no value is derived from your post.

     
    #58     Nov 23, 2014
  9. Sounds like you have decent programming skills for your age, good for you.
    What's wrong with storing the data on disk as CSV files? Do you need the database structure?
    Loading the data with ifstream a line at a time (or whatever) should be fast enough; then you would feed it tick by tick for backtesting.
    Am I missing something?
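
    A minimal sketch of that idea, written in Python rather than with C++'s ifstream: read the CSV one line at a time and hand each row to the strategy as a tick. The column layout (timestamp, bid, ask), the `replay_ticks` function and the `SpreadTracker` class are assumptions made up for the example, and an in-memory buffer stands in for a real file.

```python
import csv
import io


def replay_ticks(lines, on_tick):
    """Feed each CSV row to on_tick, one tick at a time; return the tick count."""
    count = 0
    for timestamp, bid, ask in csv.reader(lines):
        on_tick(timestamp, float(bid), float(ask))
        count += 1
    return count


class SpreadTracker:
    """Toy 'strategy' that only records the widest spread it has seen."""

    def __init__(self):
        self.max_spread = 0.0

    def on_tick(self, timestamp, bid, ask):
        self.max_spread = max(self.max_spread, ask - bid)


# In-memory stand-in for a tick file; a file opened with
# open(path, newline="") could be passed in the same way.
sample = io.StringIO(
    "2014-12-04T09:00:00.100,1.2300,1.2302\n"
    "2014-12-04T09:00:00.350,1.2301,1.2304\n"
    "2014-12-04T09:00:01.020,1.2299,1.2301\n"
)
tracker = SpreadTracker()
n_ticks = replay_ticks(sample, tracker.on_tick)
```

    Only one row is held in memory at a time, so the same loop should work for files much larger than RAM.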
     
    #59     Dec 4, 2014
  10. cjbuckley4

    No, I don't think you're missing anything there really. My very first swing at an event-driven backtesting system, a while back when I was too scared to experiment with a DBMS, was pretty much exactly that: I built a file system of the form /year/month/instrument.csv
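
    For a layout like /year/month/instrument.csv, the reader has to visit the monthly files in chronological order. The sketch below shows one way to do that in Python. The `monthly_files` helper, the EURUSD instrument name and the unpadded month folders are assumptions for illustration; the original program's details are not given.

```python
import tempfile
from pathlib import Path


def monthly_files(root, instrument):
    """Return the instrument's CSV paths in chronological order."""
    paths = Path(root).glob(f"*/*/{instrument}.csv")
    # Sort numerically by (year, month) so that month "2" precedes "10".
    return sorted(paths, key=lambda p: (int(p.parts[-3]), int(p.parts[-2])))


# Build a throwaway tree to demonstrate the ordering.
with tempfile.TemporaryDirectory() as root:
    for year, month in [("2014", "10"), ("2014", "2"), ("2013", "12")]:
        folder = Path(root, year, month)
        folder.mkdir(parents=True)
        (folder / "EURUSD.csv").write_text("timestamp,bid,ask\n")
    ordered = [f"{p.parts[-3]}/{p.parts[-2]}" for p in monthly_files(root, "EURUSD")]
```

    Sorting on integers rather than folder names avoids the classic mistake of reading month 10 before month 2.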

    Reading the CSVs line by line and using each line as a new event for my backtesting system is probably easier and more efficient than a lot of databases, but I didn't really see a way to do any kind of meaningful work on the data outside of backtesting. I found that having my data queryable would be important down the line. Maybe there are solutions to make CSVs queryable, but databases are just the direction I went.

    Some issues I ran into/dreamt up just now using CSVs:
    1. I often look at data in discrete time intervals. It seemed like an awful lot of work to have a low-level program loop over a CSV, calculate the bars, then throw away any bars that aren't in my interval each time I want data. On top of that, I'd have to write the program so that it understands the filesystem I've designed and merges all the data together correctly. Thankfully I was using FX data, so I wouldn't need to roll anything, but rolling futures and options over discrete intervals sounds like it would be a nightmare to program this way. I ended up writing a program to transform the ticks into one-minute bars and put that CSV into each month's folder, so I had a discrete-time CSV I could easily convert into any higher timeframe, but things seemed to break a lot with this approach (I was using FX data and it didn't play nicely with time intervals in some ways: random ticks during the weekends, really long stretches of either missing data or no trading, several duplicates... it could've been related to the data more than the validity of the approach in this case).
    2. I want to be able to use some queries as sanity checks for my data. Ex:
    - Does my backtest have the same number of ticks as my dataset? (I realize you can easily count CSV lines in bash or pretty much any programming language.)
    - Do I have intervals without ticks?
    - Do volumes add up?
    - Do I have ticks outside of market hours?
    - Do I have bad ticks?
    - How do I filter/flag bad ticks? Maybe the description of a bad tick changes over time; how do I keep a record of what was bad before and good now, and vice versa, to assess the validity of my approach to flagging them?
    - Etc etc etc.
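
    A few of the sanity checks listed above can be sketched as SQL queries. The example uses Python's built-in sqlite3 module with an in-memory database. The `ticks` table, its columns and the sample rows are hypothetical, and "weekend" is used here as a crude stand-in for "outside market hours".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (ts TEXT, price REAL, volume INTEGER)")
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?, ?)",
    [
        ("2014-12-04 09:00:05", 1.2300, 10),
        ("2014-12-04 09:00:40", 1.2301, 5),
        ("2014-12-04 09:00:40", 1.2301, 5),  # exact duplicate of the row above
        ("2014-12-04 09:02:10", 1.2299, 7),  # minute 09:01 has no ticks
        ("2014-12-06 13:15:00", 1.2305, 1),  # a Saturday tick
    ],
)

# How many ticks are stored? Compare this with the count the backtest saw.
total_ticks = conn.execute("SELECT COUNT(*) FROM ticks").fetchone()[0]

# Do volumes add up? Compare this with the vendor's reported total.
total_volume = conn.execute("SELECT SUM(volume) FROM ticks").fetchone()[0]

# Rows that appear more than once, with their repeat count.
duplicates = conn.execute(
    "SELECT ts, price, volume, COUNT(*) FROM ticks "
    "GROUP BY ts, price, volume HAVING COUNT(*) > 1"
).fetchall()

# Weekend ticks: strftime('%w') yields the weekday, 0 = Sunday, 6 = Saturday.
weekend_ticks = conn.execute(
    "SELECT ts FROM ticks WHERE strftime('%w', ts) IN ('0', '6')"
).fetchall()

# Minutes that contain at least one tick; a minute missing from this list is a gap.
minutes_with_ticks = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT strftime('%Y-%m-%d %H:%M', ts) FROM ticks ORDER BY 1"
    )
]
```

    Each check is a single declarative query against the same table, whereas the flat-file version would need a separate pass over the files for each question.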

    I realize you probably suggest CSVs for simplicity, but the way I look at it, they seem to add a lot of complexity. I'm by no means saying that just using CSVs or flat files in general doesn't work; I'm just saying that I originally gravitated toward the simplicity and ease of use of SQL. Given my current programming ability, time available, etc., I think letting a DBMS take care of the tough stuff will allow me to focus on trading sooner and spend less time correcting my own program's inevitable faults and second-guessing whether I coded XYZ component of the flat-file reading system correctly. In my limited trading system design experience, I've found time and time again that it's better not to reinvent the wheel and to pay a little bit out of pocket for simplicity when applicable.
     
    #60     Dec 4, 2014