My data consists of about 20 million rows like this:

T33013
M000
D 431630
X 431629 1000
M003
D 431571
A 431665S 100 67272 1834000
M006
A 431666S 2600 1027 1176000
D 430996

In which program could I rearrange it so that it looks like this:

33013000 D 431630
33013000 X 431629 1000
33013003 D 431571
33013003 A 431665S 100 67272 1834000
33013006 A 431666S 2600 1027 1176000
33013006 D 430996

So every action row (A, D, X and so on) would get a column with the preceding T and M numbers, and I could then sort and filter the rows. Please help if you can, thanks!
Maybe Python, if it's easier? Could you also make the script delete all rows starting with S, O, R, H, B and Q? Thanks in advance!
Code:

    import sys

    fin = open(sys.argv[1], 'r')
    # Line types that are never printed: T and M only carry timestamps,
    # the rest are the row types to be deleted.
    skipLn = ['T','M','S','O','R','H','B','Q']
    while 1:
        line = fin.readline()
        if not line:
            break
        if line[:1] == 'T':
            bigTS = line[1:].rstrip()
        if line[:1] == 'M':
            milliTS = line[1:].rstrip()
        if line[:1] not in skipLn:
            print '%s%s %s' % (bigTS, milliTS, line[:].rstrip())
    fin.close()
I'm getting an error. It says "invalid syntax", with the %s' marked in red. Can you help? If my file is called test.txt, how should I open it in Python and run the script? Thanks!
What version of Python are you running, and under what operating system? I tested it on Python 2.6.8 under Cygwin/Win7. Copy the script to a file with the extension ".py", e.g. test.py. Then call it from the command line like this:

    python test.py test.txt

See the attached gif for an example of how to do this.
I'm a total beginner in programming. I just need to rearrange my files into that new order. I'm running Python 3.3.0 under Win32/Win7. I made test.py from the script using Python. I don't know how to run it on a file from the command line. Where should the file be, for example C:/Python33/test.txt? I appreciate your help.
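The "invalid syntax" error is what Python 3 reports for the Python 2 print statement used in the script: in Python 3, print is a function and needs parentheses. Below is a minimal sketch of the same logic for Python 3. The function name flatten and the skipping of blank lines are my own additions, so treat it as one possible version rather than the original script.

```python
import sys


def flatten(lines, skip=("S", "O", "R", "H", "B", "Q")):
    """Yield each action row prefixed with the latest T and M values."""
    big_ts = ""    # digits from the most recent T line
    milli_ts = ""  # digits from the most recent M line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no data
        tag = line[0]
        if tag == "T":
            big_ts = line[1:].strip()
        elif tag == "M":
            milli_ts = line[1:].strip()
        elif tag not in skip:
            yield "%s%s %s" % (big_ts, milli_ts, line)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python test.py test.txt")
    else:
        # Reading line by line keeps memory use low for a 20-million-row file.
        with open(sys.argv[1], "r") as fin:
            for row in flatten(fin):
                print(row)
```

To run it, save it as test.py in the same folder as test.txt, open a Command Prompt, change to that folder with cd, and type: C:\Python33\python.exe test.py test.txt > out.txt. The path assumes Python 3.3 was installed to its default location, and out.txt is only an example name for the result file.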
Thanks, it worked! I really appreciate your help! Would it be possible to add to that script so that it deletes rows that don't, for example, contain the number "24311", or to make another simple script for the new txt file?

And a harder one: is it possible to make a script that deletes all rows that don't share an order number with one of the rows containing A and 24311? Like this:

46138100 A 4568834S 35000 24311 24280
46139111 A 4569028S 2000 24311 24520
46138350 X 4568834
32823978 X 4480746
46140239 D 4569028
32823978 D 324847

So it would delete the rows without a matching order number and leave:

46138100 A 4568834S 35000 24311 24280
46139111 A 4569028S 2000 24311 24520
46138350 X 4568834
46140239 D 4569028
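Both filters could be sketched roughly as below, working on the already rearranged file. This rests on assumptions read off the sample rows: the order number is the third field, the A rows carry it with a trailing letter such as S, and 24311 appears as a whole field somewhere after the order number. All function names, and the file names filter.py and out.txt, are hypothetical.

```python
import string
import sys


def order_id(fields):
    """Order number from the third field, minus a trailing letter such as 'S'."""
    return fields[2].rstrip(string.ascii_letters)


def keep_code(rows, code="24311"):
    """Simple filter: yield only rows that contain `code` as a whole field."""
    for row in rows:
        if code in row.split():
            yield row.rstrip()


def collect_order_ids(rows, code="24311"):
    """Pass 1: order numbers of A rows that carry `code` after the order number."""
    wanted = set()
    for row in rows:
        fields = row.split()
        if len(fields) > 3 and fields[1] == "A" and code in fields[3:]:
            wanted.add(order_id(fields))
    return wanted


def keep_linked(rows, wanted):
    """Pass 2: yield every row whose order number was collected in pass 1."""
    for row in rows:
        fields = row.split()
        if len(fields) > 2 and order_id(fields) in wanted:
            yield row.rstrip()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python filter.py out.txt")
    else:
        # The file is read twice so that only the set of order numbers,
        # not all rows, has to be held in memory.
        with open(sys.argv[1]) as fin:
            wanted = collect_order_ids(fin)
        with open(sys.argv[1]) as fin:
            for row in keep_linked(fin, wanted):
                print(row)
```

As written, the script runs the harder filter; for the simple one, loop over keep_code(fin) instead. Note that matching 24311 against any field after the order number would also catch a quantity or price that happens to equal 24311, so the check may need to be narrowed to one specific column.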