i need a tickdata file (.txt) edited, after having done some work with it in wordpad i am stuck and have no clue how to continue. i have tried excel but 1. it has an limit of max 65000 rows and 2. after having it edited and saved it shows me strange signs in the file after opening them with wordpad. i guess it has to do with the formatting from excel or open office back to txt or csv file. the files are in following format 20101115 09:31:47.373 6689.0 2 20101115 09:31:47.374 6689.0 1 20101115 09:31:47.375 6689.0 1 20101115 09:31:47.376 6689.0 1 20101115 09:31:47.377 6689.0 1 to be able to import them i need them to be in following format 20101115, 09:31:47 ,6689.0, 2 20101115, 09:31:47 ,6689.0, 1 20101115, 09:31:47 ,6689.0, 1 20101115, 09:31:47 ,6689.0, 1 20101115, 09:31:47 ,6689.0, 1 this means i need comma's behind each number exept the last, and the dot in the time format needs to be deleted any idea how i could do this with a large tickfile ? EDIT: this might be more clear http://img816.imageshack.us/img816/3352/1activewindow.png
you could probably make it work with sed on any *nix machine, the substitution regex might take some playing with though
awk -F " " '{print $1","$2","$3","$4}' myfile.txt http://gnuwin32.sourceforge.net/ - if you're running windows.
Thank you all @ LeeD I am using OpenOffice Calc, will that do ? I am not having much succes with it @ dcvtss what is a *nix machine ? Sorry i am kinda noob @ loltrader how exactly is this parcing done, or is this a command used as a macro ?
It could... In Excel you can got to top menu->File->Open... On the dialogue where you are supposed to find a file you can choose what kind of file you are opening. You choose "Text file" at the bottof teh dialogue, then find teh file you want to convert and click "Open" button. Then there are a few self-explanatory dialogue options where you explain that the file is "space-separated". In the end you save file as "SCV". Something like this might work in OpenOffice. This is a command line. loltrader linked free software that makes this command-line tool availabel on Windows.
Why would you ever drop the sub-second timestamp? That's extremely valuble, if you are going to drop, why not combine or drop all but one fill per second? I'm no programmer but I can tell you that sub-second prints make a huge difference with regards to a backtest/simulation.