Python - Read and split lines from text file into indexes.

Discussion in 'App Development' started by OTM-Options, Apr 28, 2015.

  1. Pandas is taylor made for this kind of processing. I strongly encourage you to use it.

    For example you could do the first two things with one line of code.
    Third thing would be:

    data.NewTotal - data.Total - data.Costs

    Fourth thing would be:

    data[data.Type =="Buy"]

    The couple of hours you might spend understanding how to use pandas for these basic things will pay dividends many times over.

     
    #11     Apr 30, 2015
  2. Just to show how easy it is in pandas, this took me 2 minutes to do everything you wanted.

    I removed the first and third line from your csv. The second remaining line then becomes the header line.

    Code:
    import pandas as pd
    ans=pd.read_csv("temp.csv")
    ans.head(5)
    
    Out[1]:
      Transaction Date Settlement Date Activity Description  \
    0       2014-05-09      2014-05-12                 Sell
    1       2014-05-09      2014-05-12                 Sell
    2       2014-05-08      2014-05-09                  Buy
    3       2014-05-08      2014-05-09                 Sell
    4       2014-05-07      2014-05-08                  Buy
    
                   Description                      Symbol  Quantity  Price  \
    0  POWERSHARES QQQ TR SR 1    CALL QQQ 2014MAY09 86.50        -3   0.16
    1                    APPLE  CALL AAPL 2014MAY09 610.00        -2   0.01
    2  POWERSHARES QQQ TR SR 1    CALL QQQ 2014MAY09 86.50         3   0.30
    3             TESLA MOTORS  CALL TSLA 2014MAY09 240.00        -1   0.01
    4             TESLA MOTORS  CALL TSLA 2014MAY09 240.00         1   0.67
    
      Currency  Total Amount Currency.1
    0      USD         34.29        USD
    1      USD          0.00        USD
    2      USD       -103.70        USD
    3      USD          0.00        USD
    4      USD        -78.20        USD
    
    ans.Quantity*ans.Price
    
    Out[1]:
    0  -0.48
    1  -0.02
    2  0.90
    3  -0.01
    4  0.67
    5  0.98
    6  0.39
    7  -0.37
    8  0.68
    9  NaN
    
    subans=ans[ans['Activity Description']=="Sell"]
    print subans
    
    Out[1]:
      Transaction Date Settlement Date Activity Description  \
    0  2014-05-09  2014-05-12  Sell
    1  2014-05-09  2014-05-12  Sell
    3  2014-05-08  2014-05-09  Sell
    7  2014-05-02  2014-05-05  Sell
    10  2014-04-28  2014-04-29  Sell
    18  2014-04-25  2014-04-28  Sell
    20  2014-04-24  2014-04-25  Sell
    
      Description  Symbol  Quantity  \
    0  POWERSHARES QQQ TR SR 1  CALL QQQ 2014MAY09 86.50  -3
    1  APPLE  CALL AAPL 2014MAY09 610.00  -2
    3  TESLA MOTORS  CALL TSLA 2014MAY09 240.00  -1
    7  KEURIG GREEN MOUNTAIN COM  PUT GMCR 2014MAY02 92.00  -1
    10  APPLE  CALL AAPL 2014MAY02 590.00  -1
    18  BAIDU SPONSORED ADR REPSTG ORD  CALL BIDU 2014APR25 170.00  -1
    20  APPLE  CALL AAPL 2014APR25 560.00  -1
    
      Price Currency  Total Amount Currency.1
    0  0.16  USD  34.29  USD
    1  0.01  USD  0.00  USD
    3  0.01  USD  0.00  USD
    7  0.37  USD  25.79  USD
    10  5.88  USD  576.78  USD
    18  0.04  USD  0.00  USD
    20  8.13  USD  801.78  USD
    
    subans.to_csv("/home/rsc/temp/newfile.csv")
    
    
    I bet that is fewer lines than your script, and probably more readable than say index[3]*index[4]. And it works.

     
    #12     Apr 30, 2015
    OTM-Options and i960 like this.



  3. Thanks globalarbtrader. I'm going to try and avoid any modules, including the math module if I can. Below is a what I have so far, it's 60% complete. I hard coded one line of data into the script and it writes the manipulated line to a new CSV file - now I have to replace that line with the code to open a file and loop through all the lines.


    Python Code
    Code:
    #!/usr/bin/python
    
    csvOut = "TransactionHistoryOutput.csv"
    counter = 1
    split = ","
    join = ","
    
    header = "Row,Transaction Date,Buy/Sell,QTY,Security,Price,Amount,Commission,Total Amount,Currency"
    line = "2014-04-30,2014-05-01,Buy,KEURIG GREEN MOUNTAIN COM,PUT GMCR 2014MAY02 92.00,1,0.68,USD,-79.20,USD"
    
    items = line.split(split)
    if items[2] == "Buy" or items[2] == "Sell" or items[2] == "Expired":
    
       Row= '{0:03d}'.format(counter)
       TransactionDate = items[0]
       BuySell = items[2]
       QTY = items[5]
       Security = items[4]
       Price = items[6]
       TotalAmount = items[8]
       Currency = items[9]
       counter = counter + 33
    
       price = float(Price)
       total = abs(float(TotalAmount))
       com = abs(price * 100 - total)
       Commission = str(com) #Price * 100 - TotalAmount = Commissions
      
       mnt = total - com
       Amount = str(mnt) # TotalAmount - Commission = Amount
      
    dataOutput = [Row + join + TransactionDate + join +  BuySell + join + QTY + join + Security + join + Price + join + Amount + join + Commission + join + TotalAmount + join + Currency]
    dataOutput = ''.join(dataOutput)
    
    def writeFile(data):
       f = open(csvOut,'a')
       f.write(data + "\n")
       f.close()
    
    writeFile(header)
    writeFile(dataOutput)
    print dataOutput
    
    input("\n\nPress the Enter key to exit.")
    

    :)
     
    #13     Apr 30, 2015
  4. Thanks globalarbtrader. I was editing my post #13 when you posted all the code in post #12. I'm going out now and will review it in a few hours.



    :)
     
    #14     Apr 30, 2015
  5. Pandas is
    Pandas is the right tool for the job.
    Avoiding the modules is a mistake.
    Not only is the pandas DataFrame easier to use when manipulating data, it is also faster than your script since it uses Cython to call C libraries in many cases.

    If you want to make your own parser implementation and control all the low level details to make things run faster than pandas, then you'll need to use C/C++
     
    #15     May 2, 2015
  6. ThomasB

    ThomasB

    You really confirm every prejudice one has about php programmers.
    Unbelievable you refuse to use the proper libraries.
     
    #16     May 3, 2015
    eusdaiki likes this.
  7. OK ...... I checked out Pandas and the Python CSV module.

    Pandas
    It doesn't come with the standard Python installation so I couldn't run any tests, and I couldn't find any examples on line that interested me. It is overkill for my needs so I won't bother with it.


    Python CSV module
    It comes with the standard Python installation and lots of examples on line. But the CSV manipulation I require can be accomplished without it and probably just as efficiently. I do plan on using the Python CSV module on my next project.


    Using Python version 2.7.6 and the tasks I wanted my script to perform.
    • No editing of original CSV file. Any lines or empty data fields that can crash the script must be handled in the script.
    • Reverse the lines of the original CSV file before processing. Earliest transactions at the top, not the bottom.
    • Split each line at the commas and assign each column an index[] variable.
    • Delete lines that do not contain certain values such as "Buy", "Sell", or "Expired".
    • Delete unwanted columns and create new columns with values from some of the remaining columns.
    • Add a line counter, new column headers and footer labels.
    • Total the value of some of the columns and get P/L percentage.
    • Format the timestamp from 2014-05-06 to Tue May 06.
    • Change negative values into positive values in the quantity column.
    • Change 0 values to nothing in the debit, credit and commission columns.
    • Output the results to a new CSV file.

    My completed 100% working script using only the datetime module.

    Code:
    #!/usr/bin/python
    # Python version 2.7.6
    
    import datetime
    
    csv_in = "TransactionHistory_22523594.csv"
    csv_reverse = "temp_reverse.csv"
    csv_out = "transaction_history.csv"
    header = "Row,Date,Buy/Sell,QTY,Security,Price,Debit,Credit,Commission,Total Amount,Currency" + '\n'
    footer_commission = 0
    footer_debit = 0
    footer_credit = 0
    footer_total_amount = 0
    counter = 1
    split = ","
    join = ","
    
    def write_file(data,write_to):
      f_out = open(write_to,'a')
      f_out.write(data + "\n")
      f_out.close()
    
    def reverse_file(in_put):
      for line in reversed(list(open(in_put))):
      rows = (line.rstrip())
      write_file(rows,csv_reverse)
    
    reverse_file(csv_in)
    write_file(header,csv_out)
    
    f_in = open(csv_reverse, "r")
    for line in f_in.xreadlines():
      if len(line.strip()) != 0 :
      line = line.strip()
      parts = line.split(split)
      if parts[2] == "Buy" or parts[2] == "Sell" or parts[2] == "Expired":
      row= '{0:03d}'.format(counter)
      transaction_date = parts[0]
      buy_sell = parts[2]
      qty = parts[5]
      security = parts[4]
      price = parts[6]
      if price == "":
      price = "0"
      total_amount = parts[8]
      currency = parts[9]
      transaction_date = datetime.datetime.strptime(transaction_date, "%Y-%m-%d").strftime("%a %b %d")
      qty = abs(int(qty))
      price = float(price)
      total_amount = float(total_amount)
      amount = qty * price * 100
      amount = int(amount)
    
      t = abs(float(total_amount))
      if parts[8] >= "0" and  parts[8] <= "1":
      commission = 0
      else:
      commission = abs(t - amount)
      if total_amount <= 1:
      debit = amount
      credit = 0
      else:
      debit = 0
      credit = amount
      if debit > t:
      credit = debit
      commission = debit
      debit = 0
    
      footer_debit = footer_debit + debit
      footer_credit = footer_credit + credit
      footer_commission = footer_commission + commission
      footer_total_amount = footer_total_amount + total_amount
    
      qty = str(abs(qty))
      price = str(price)
      debit = str(debit)
      if debit == "0":
      debit = ""
      credit = str(credit)
      if credit == "0":
      credit = ""
      commission = str(commission)
      if commission == "0":
      commission = ""
      total_amount = str(total_amount)
      counter = counter + 1
    
      data_out = row + join + transaction_date + join + buy_sell + join + qty + join + security + join \
      + price + join + debit + join + credit + join + commission + join + total_amount + join + currency
    
      print data_out
      write_file(data_out,csv_out)
    
    pl = footer_debit + footer_commission
    pl_percent = (footer_credit - (pl)) / pl * 100
    footer_debit = str(footer_debit)
    footer_credit = str(footer_credit)
    footer_commission = str(footer_commission)
    footer_total_amount = str(footer_total_amount)
    pl_percent = str(pl_percent)
    
    footer = '\n' + join + join + join + join + join + "Subtotal" + join + footer_debit + join + footer_credit + join \
      + footer_commission + join + footer_total_amount + join + currency + '\n' + '\n' + join + join + join \
      + join + join + join + join + "P/L" + join + pl_percent + " %" + join + footer_total_amount + join + currency
    
    write_file(footer,csv_out)
    
    input("\n\nPress the Enter key to exit.")
    


    :)
     
    #17     May 6, 2015
  8. i960

    i960

    Are you seriously adding the 'join' string rather than using the join *function*? That is not the way to do things.
     
    #18     May 6, 2015
    eusdaiki likes this.

  9. Looks like I picked a variable name that closely resembles a function. I assigned the comma to the join and split variable to make the script easier to change to handle different characters in the CSV file.

    Example: I can output using ZZZZZZZ to join the lines instead of a comma. Then input using the same ZZZZZZZ or whatever character(s) I choose to split the lines.



    :)
     
    Last edited: May 6, 2015
    #19     May 6, 2015
  10. Why do you reverse the file into another file instead of doing it in memory (say, on a nested list?)
     
    #20     May 6, 2015