Pandas is taylor made for this kind of processing. I strongly encourage you to use it. For example you could do the first two things with one line of code. Third thing would be: data.NewTotal - data.Total - data.Costs Fourth thing would be: data[data.Type =="Buy"] The couple of hours you might spend understanding how to use pandas for these basic things will pay dividends many times over.
Just to show how easy it is in pandas, this took me 2 minutes to do everything you wanted. I removed the first and third line from your csv. The second remaining line then becomes the header line. Code: import pandas as pd ans=pd.read_csv("temp.csv") ans.head(5) Out[1]: Transaction Date Settlement Date Activity Description \ 0 2014-05-09 2014-05-12 Sell 1 2014-05-09 2014-05-12 Sell 2 2014-05-08 2014-05-09 Buy 3 2014-05-08 2014-05-09 Sell 4 2014-05-07 2014-05-08 Buy Description Symbol Quantity Price \ 0 POWERSHARES QQQ TR SR 1 CALL QQQ 2014MAY09 86.50 -3 0.16 1 APPLE CALL AAPL 2014MAY09 610.00 -2 0.01 2 POWERSHARES QQQ TR SR 1 CALL QQQ 2014MAY09 86.50 3 0.30 3 TESLA MOTORS CALL TSLA 2014MAY09 240.00 -1 0.01 4 TESLA MOTORS CALL TSLA 2014MAY09 240.00 1 0.67 Currency Total Amount Currency.1 0 USD 34.29 USD 1 USD 0.00 USD 2 USD -103.70 USD 3 USD 0.00 USD 4 USD -78.20 USD ans.Quantity*ans.Price Out[1]: 0 -0.48 1 -0.02 2 0.90 3 -0.01 4 0.67 5 0.98 6 0.39 7 -0.37 8 0.68 9 NaN subans=ans[ans['Activity Description']=="Sell"] print subans Out[1]: Transaction Date Settlement Date Activity Description \ 0 2014-05-09 2014-05-12 Sell 1 2014-05-09 2014-05-12 Sell 3 2014-05-08 2014-05-09 Sell 7 2014-05-02 2014-05-05 Sell 10 2014-04-28 2014-04-29 Sell 18 2014-04-25 2014-04-28 Sell 20 2014-04-24 2014-04-25 Sell Description Symbol Quantity \ 0 POWERSHARES QQQ TR SR 1 CALL QQQ 2014MAY09 86.50 -3 1 APPLE CALL AAPL 2014MAY09 610.00 -2 3 TESLA MOTORS CALL TSLA 2014MAY09 240.00 -1 7 KEURIG GREEN MOUNTAIN COM PUT GMCR 2014MAY02 92.00 -1 10 APPLE CALL AAPL 2014MAY02 590.00 -1 18 BAIDU SPONSORED ADR REPSTG ORD CALL BIDU 2014APR25 170.00 -1 20 APPLE CALL AAPL 2014APR25 560.00 -1 Price Currency Total Amount Currency.1 0 0.16 USD 34.29 USD 1 0.01 USD 0.00 USD 3 0.01 USD 0.00 USD 7 0.37 USD 25.79 USD 10 5.88 USD 576.78 USD 18 0.04 USD 0.00 USD 20 8.13 USD 801.78 USD subans.to_csv("/home/rsc/temp/newfile.csv") I bet that is fewer lines than your script, and probably more readable than say index[3]*index[4]. And it works.
Thanks globalarbtrader. I'm going to try and avoid any modules, including the math module if I can. Below is a what I have so far, it's 60% complete. I hard coded one line of data into the script and it writes the manipulated line to a new CSV file - now I have to replace that line with the code to open a file and loop through all the lines. Python Code Code: #!/usr/bin/python csvOut = "TransactionHistoryOutput.csv" counter = 1 split = "," join = "," header = "Row,Transaction Date,Buy/Sell,QTY,Security,Price,Amount,Commission,Total Amount,Currency" line = "2014-04-30,2014-05-01,Buy,KEURIG GREEN MOUNTAIN COM,PUT GMCR 2014MAY02 92.00,1,0.68,USD,-79.20,USD" items = line.split(split) if items[2] == "Buy" or items[2] == "Sell" or items[2] == "Expired": Row= '{0:03d}'.format(counter) TransactionDate = items[0] BuySell = items[2] QTY = items[5] Security = items[4] Price = items[6] TotalAmount = items[8] Currency = items[9] counter = counter + 33 price = float(Price) total = abs(float(TotalAmount)) com = abs(price * 100 - total) Commission = str(com) #Price * 100 - TotalAmount = Commissions mnt = total - com Amount = str(mnt) # TotalAmount - Commission = Amount dataOutput = [Row + join + TransactionDate + join + BuySell + join + QTY + join + Security + join + Price + join + Amount + join + Commission + join + TotalAmount + join + Currency] dataOutput = ''.join(dataOutput) def writeFile(data): f = open(csvOut,'a') f.write(data + "\n") f.close() writeFile(header) writeFile(dataOutput) print dataOutput input("\n\nPress the Enter key to exit.")
Thanks globalarbtrader. I was editing my post #13 when you posted all the code in post #12. I'm going out now and will review it in a few hours.
Pandas is Pandas is the right tool for the job. Avoiding the modules is a mistake. Not only is the pandas DataFrame easier to use when manipulating data, it is also faster than your script since it uses Cython to call C libraries in many cases. If you want to make your own parser implementation and control all the low level details to make things run faster than pandas, then you'll need to use C/C++
You really confirm every prejudice one has about php programmers. Unbelievable you refuse to use the proper libraries.
OK ...... I checked out Pandas and the Python CSV module. Pandas It doesn't come with the standard Python installation so I couldn't run any tests, and I couldn't find any examples on line that interested me. It is overkill for my needs so I won't bother with it. Python CSV module It comes with the standard Python installation and lots of examples on line. But the CSV manipulation I require can be accomplished without it and probably just as efficiently. I do plan on using the Python CSV module on my next project. Using Python version 2.7.6 and the tasks I wanted my script to perform. No editing of original CSV file. Any lines or empty data fields that can crash the script must be handled in the script. Reverse the lines of the original CSV file before processing. Earliest transactions at the top, not the bottom. Split each line at the commas and assign each column an index[] variable. Delete lines that do not contain certain values such as "Buy", "Sell", or "Expired". Delete unwanted columns and create new columns with values from some of the remaining columns. Add a line counter, new column headers and footer labels. Total the value of some of the columns and get P/L percentage. Format the timestamp from 2014-05-06 to Tue May 06. Change negative values into positive values in the quantity column. Change 0 values to nothing in the debit, credit and commission columns. Output the results to a new CSV file. My completed 100% working script using only the datetime module. Code: #!/usr/bin/python # Python version 2.7.6 import datetime csv_in = "TransactionHistory_22523594.csv" csv_reverse = "temp_reverse.csv" csv_out = "transaction_history.csv" header = "Row,Date,Buy/Sell,QTY,Security,Price,Debit,Credit,Commission,Total Amount,Currency" + '\n' footer_commission = 0 footer_debit = 0 footer_credit = 0 footer_total_amount = 0 counter = 1 split = "," join = "," def write_file(data,write_to): f_out = open(write_to,'a') f_out.write(data + "\n") f_out.close() def reverse_file(in_put): for line in reversed(list(open(in_put))): rows = (line.rstrip()) write_file(rows,csv_reverse) reverse_file(csv_in) write_file(header,csv_out) f_in = open(csv_reverse, "r") for line in f_in.xreadlines(): if len(line.strip()) != 0 : line = line.strip() parts = line.split(split) if parts[2] == "Buy" or parts[2] == "Sell" or parts[2] == "Expired": row= '{0:03d}'.format(counter) transaction_date = parts[0] buy_sell = parts[2] qty = parts[5] security = parts[4] price = parts[6] if price == "": price = "0" total_amount = parts[8] currency = parts[9] transaction_date = datetime.datetime.strptime(transaction_date, "%Y-%m-%d").strftime("%a %b %d") qty = abs(int(qty)) price = float(price) total_amount = float(total_amount) amount = qty * price * 100 amount = int(amount) t = abs(float(total_amount)) if parts[8] >= "0" and parts[8] <= "1": commission = 0 else: commission = abs(t - amount) if total_amount <= 1: debit = amount credit = 0 else: debit = 0 credit = amount if debit > t: credit = debit commission = debit debit = 0 footer_debit = footer_debit + debit footer_credit = footer_credit + credit footer_commission = footer_commission + commission footer_total_amount = footer_total_amount + total_amount qty = str(abs(qty)) price = str(price) debit = str(debit) if debit == "0": debit = "" credit = str(credit) if credit == "0": credit = "" commission = str(commission) if commission == "0": commission = "" total_amount = str(total_amount) counter = counter + 1 data_out = row + join + transaction_date + join + buy_sell + join + qty + join + security + join \ + price + join + debit + join + credit + join + commission + join + total_amount + join + currency print data_out write_file(data_out,csv_out) pl = footer_debit + footer_commission pl_percent = (footer_credit - (pl)) / pl * 100 footer_debit = str(footer_debit) footer_credit = str(footer_credit) footer_commission = str(footer_commission) footer_total_amount = str(footer_total_amount) pl_percent = str(pl_percent) footer = '\n' + join + join + join + join + join + "Subtotal" + join + footer_debit + join + footer_credit + join \ + footer_commission + join + footer_total_amount + join + currency + '\n' + '\n' + join + join + join \ + join + join + join + join + "P/L" + join + pl_percent + " %" + join + footer_total_amount + join + currency write_file(footer,csv_out) input("\n\nPress the Enter key to exit.")
Are you seriously adding the 'join' string rather than using the join *function*? That is not the way to do things.
Looks like I picked a variable name that closely resembles a function. I assigned the comma to the join and split variable to make the script easier to change to handle different characters in the CSV file. Example: I can output using ZZZZZZZ to join the lines instead of a comma. Then input using the same ZZZZZZZ or whatever character(s) I choose to split the lines.