Data sorting help: Nordic ITCH data

Discussion in 'Data Sets and Feeds' started by evira, Jan 6, 2013.

  1. evira

    My data consists of about 20 million rows like this:

    T33013
    M000
    D 431630
    X 431629 1000
    M003
    D 431571
    A 431665S 100 67272 1834000
    M006
    A 431666S 2600 1027 1176000
    D 430996

    Which program could I use to rearrange it so that it looks like this:

    33013000 D 431630
    33013000 X 431629 1000
    33013003 D 431571
    33013003 A 431665S 100 67272 1834000
    33013006 A 431666S 2600 1027 1176000
    33013006 D 430996

    So every action row (A, D, X, and so on) would get a column holding the preceding T and M numbers, and I could then sort or filter the rows.

    Please help if you can. Thanks!
     
  2. Perl, AWK, Python... the list goes on. Which do you prefer? I will write you a script.
     
  3. evira

    Maybe Python, if it's easier? Could you also write the script so that all rows starting with S, O, R, H, B, or Q are deleted?

    Thanks in advance!
     
  4. Code:
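    import sys

    # Written for Python 2 (print statement); on Python 3 use print(...) instead.
    skipLn = ['T','M','S','O','R','H','B','Q']  # line types that are not echoed
    bigTS = milliTS = ''  # last T and M values seen so far
    fin = open(sys.argv[1], 'r')
    for line in fin:
      line = line.rstrip()
      if line[:1] == 'T': bigTS = line[1:]    # remember the latest T value
      if line[:1] == 'M': milliTS = line[1:]  # remember the latest M value
      # Prefix every remaining non-empty line with the combined T+M value.
      if line and line[:1] not in skipLn:
        print '%s%s %s' % (bigTS, milliTS, line)
    fin.close()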
    
     
  5. evira

    I'm getting an error. It says "invalid syntax", with the %s' highlighted in red. Can you help?

    If my file is called test.txt, how should I open it in Python and run the script?

    Thanks!
     
  6. What version of Python are you running, under what operating system? I tested it on Python 2.6.8 under Cygwin/Win7.

    Copy the script to a file with the extension ".py" e.g. test.py
    Then call it from the command line like this:

    python test.py test.txt

    See the attached gif for an example of how to do this.
     
  7. evira

    I'm a total beginner in programming. I just need to rearrange my files into that new order.

    I'm running Python 3.3.0 under Win32/Win7. I created test.py from the script using Python.
    I don't know how to run it on a file. For example, where should the file be: C:/Python33/test.txt?

    I appreciate your help
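
The script in post 4 uses the Python 2 print statement, which Python 3 rejects with "invalid syntax" because print became a function there. A minimal sketch of a Python 3 version follows, assuming the same input layout as the sample in post 1. The names merge_timestamps and SKIP are illustrative and do not come from the thread.

```python
import sys

# Assumption: these are the line types to drop (T and M are only used as prefixes).
SKIP = set("TMSORHBQ")


def merge_timestamps(lines):
    """Yield each kept line prefixed with the most recent T and M values."""
    seconds = millis = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        kind = line[0]
        if kind == "T":
            seconds = line[1:]
        elif kind == "M":
            millis = line[1:]
        elif kind not in SKIP:
            yield "%s%s %s" % (seconds, millis, line)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as fin:
            for row in merge_timestamps(fin):
                print(row)
    else:
        print("usage: python test.py test.txt", file=sys.stderr)
```

One way to run it, assuming test.py and test.txt sit in the same folder and Python 3.3 was installed to C:\Python33: open a Command Prompt, cd into that folder, and run `C:\Python33\python.exe test.py test.txt > sorted.txt`. The output name sorted.txt is only an example.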
     
  8. evira

    I got your script working! Thanks a lot!
     
  9. evira

    Thanks, it worked! I really appreciate your help!

    Would it be possible to extend the script so that it deletes rows that don't contain, for example, the number "24311"? Or to write another simple script for the new txt file?
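
A rough sketch of such a filter for the already converted file, assuming the number should match a whole space-separated field rather than any substring (so a timestamp or price that merely contains the digits 24311 does not count). The function name rows_with_field is made up for illustration.

```python
def rows_with_field(rows, wanted):
    """Yield only the rows in which `wanted` appears as a whole field."""
    for row in rows:
        if wanted in row.split():
            yield row.rstrip("\n")
```

It could be used as `for row in rows_with_field(open("sorted.txt"), "24311"): print(row)`, where sorted.txt stands for whatever the converted file is called.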

    And a harder one: is it possible to write a script that deletes every row whose number after the letter does not also appear in one of the rows containing both A and 24311?

    Like this:
    46138100 A 4568834S 35000 24311 24280
    46139111 A 4569028S 2000 24311 24520
    46138350 X 4568834
    32823978 X 4480746
    46140239 D 4569028
    32823978 D 324847

    So it would delete the rows without a matching number and leave:

    46138100 A 4568834S 35000 24311 24280
    46139111 A 4569028S 2000 24311 24520
    46138350 X 4568834
    46140239 D 4569028
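
One possible approach to the harder request is two passes: first collect the numbers from the A rows that carry 24311, then keep every row whose number is in that set. The sketch below assumes the column layout of the example above (timestamp, letter, number, then for A rows a quantity followed by the 24311 field) and assumes the trailing S or B on the number in A rows is a suffix to strip before comparing. The function names are hypothetical.

```python
def order_ids_for_book(rows, book):
    """Pass 1: collect the numbers of 'A' rows whose fifth field equals `book`."""
    ids = set()
    for row in rows:
        fields = row.split()
        if len(fields) >= 5 and fields[1] == "A" and fields[4] == book:
            ids.add(fields[2].rstrip("BS"))  # assumption: drop trailing side letter
    return ids


def rows_for_orders(rows, ids):
    """Pass 2: yield the rows whose number (third field) is in `ids`."""
    for row in rows:
        fields = row.split()
        if len(fields) >= 3 and fields[2].rstrip("BS") in ids:
            yield row.rstrip("\n")
```

With a file on disk, each pass would read it separately: `ids = order_ids_for_book(open(name), "24311")` followed by `rows_for_orders(open(name), ids)`. Only the set of matching numbers is held in memory, not the 20 million rows.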