I have a CSV file that I need to loop through in a specific pattern for specific columns and have the output patterns be stored in new files with the same name + "_pattern" + [1,2,3,etc.] + .csv.
This is the search pattern: Loop through column 1 and find the same # and grab them, then loop through column 2 of the grabbed list and then grab all that have the same date in column 2, then go to column 4 and grab all #s that are NOT the same, and then create a file with the pattern from column 1 and 2 and 4 organized by column time.
Example:
1 2 time 4
13.45 9/29/2016 6:00 98765
12.56 9/29/2016 6:05 76548
13.45 9/29/2016 6:07 98764
13.45 9/29/2016 6:21 98766
13.45 9/29/2016 6:20 96765
12.56 9/29/2016 6:06 76553
Better view of table
The result would be, file_pattern_1.csv would have:
1. 13.45 9/29/2016 6:00 98765
2. 13.45 9/29/2016 6:07 98764
3. 13.45 9/29/2016 6:21 98766
But would not include:
4. 13.45 9/29/2016 6:20 96765
Because column 4 repeats from a previous entry, file_pattern_2.csv would have:
1. 12.56 9/29/2016 6:05 76548
2. 12.56 9/29/2016 6:06 76553
This is what I have so far but I have become lost on the looping logic:
import os
infile = raw_input("Which file are we working with? ")
assert os.path.exists(infile), "Path is incorrect."
os.chdir(infile)
def createFile(csvFile, fileName):
with open (fileName, 'wb') as ftext:
ftext.write(csvFile)
def appendFile(csvFile, fileName):
with open (fileName, 'a') as ftext:
ftext.write(csvFile)
def setfilename(tread):
fileName = tread[0:tread.index('.')] + '_patterns' + str(countItem) + '.csv'
return fileName
for i in pcolumn:
if pcolumn == pcolumn:
return pfile
for x in date:
if date == date:
return date
for a in acolumn:
if acolumn != acolumn:
createFile(fileName)
else:
print "Finished."
The following should do what you need. It reads a csv file in and generates a matching datetime
for each of the entries to allow them to be correctly sorted. It creates output csv files based on the pattern number with the entries sorted by date. Column 4 entries already seen are omitted:
from itertools import groupby
from datetime import datetime
import csv
import os
filename = 'my_data.csv'
data = []
with open(filename, 'rb') as f_input:
csv_input = csv.reader(f_input, delimiter='\t')
header = next(csv_input)
for row in csv_input:
dt = datetime.strptime('{} {}'.format(row[2], row[1]), '%H:%M %m/%d/%Y')
data.append([dt] + row)
for index, (k, g) in enumerate(groupby(sorted(data, key=lambda x: x[1]), key=lambda x: x[1]), start=1):
line = 1
seen = set()
with open('{}_pattern_{}.csv'.format(os.path.splitext(filename)[0], index), 'wb') as f_output:
csv_output = csv.writer(f_output)
for item in sorted(g, key=lambda x: x[0]):
if item[4] not in seen:
seen.add(item[4])
csv_output.writerow([line] + item[1:])
line += 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With