Python searching through columns

Question

I have a CSV file that I need to loop through in a specific pattern for specific columns and have the output patterns be stored in new files with the same name + "_pattern" + [1,2,3,etc.] + .csv.

This is the search pattern: Loop through column 1 and find the same # and grab them, then loop through column 2 of the grabbed list and then grab all that have the same date in column 2, then go to column 4 and grab all #s that are NOT the same, and then create a file with the pattern from column 1 and 2 and 4 organized by column time.

Example:

1       2           time    4
13.45   9/29/2016   6:00    98765
12.56   9/29/2016   6:05    76548
13.45   9/29/2016   6:07    98764
13.45   9/29/2016   6:21    98766
13.45   9/29/2016   6:20    96765
12.56   9/29/2016   6:06    76553

Better view of table

The result would be, file_pattern_1.csv would have:

1. 13.45    9/29/2016   6:00    98765
2. 13.45    9/29/2016   6:07    98764
3. 13.45    9/29/2016   6:21    98766

But would not include:

4. 13.45    9/29/2016   6:20    96765

Because column 4 repeats from a previous entry, file_pattern_2.csv would have:

1. 12.56    9/29/2016   6:05    76548
2. 12.56    9/29/2016   6:06    76553

This is what I have so far but I have become lost on the looping logic:

import os

infile = raw_input("Which file are we working with? ")
assert os.path.exists(infile), "Path is incorrect."
os.chdir(infile)

def createFile(csvFile, fileName):
    with open (fileName, 'wb') as ftext:
        ftext.write(csvFile)

def appendFile(csvFile, fileName):
    with open (fileName, 'a') as ftext:
        ftext.write(csvFile)

def setfilename(tread):
    fileName = tread[0:tread.index('.')] + '_patterns' + str(countItem) + '.csv'
    return fileName

for i in pcolumn:
    if pcolumn == pcolumn:
        return pfile
    for x in date:
        if date == date:
            return date
            for a in acolumn:
                if acolumn != acolumn:
                    createFile(fileName)
else:
    print "Finished."

Martin Evans · Accepted Answer

The following should do what you need. It reads a csv file in and generates a matching datetime for each of the entries to allow them to be correctly sorted. It creates output csv files based on the pattern number with the entries sorted by date. Column 4 entries already seen are omitted:

from itertools import groupby
from datetime import datetime
import csv
import os

filename = 'my_data.csv'
data = []

with open(filename, 'rb') as f_input:
    csv_input = csv.reader(f_input, delimiter='	')
    header = next(csv_input)

    for row in csv_input:
        dt = datetime.strptime('{} {}'.format(row[2], row[1]), '%H:%M %m/%d/%Y')
        data.append([dt] + row)

for index, (k, g) in enumerate(groupby(sorted(data, key=lambda x: x[1]), key=lambda x: x[1]), start=1):
    line = 1
    seen = set()

    with open('{}_pattern_{}.csv'.format(os.path.splitext(filename)[0], index), 'wb') as f_output:
        csv_output = csv.writer(f_output)

        for item in sorted(g, key=lambda x: x[0]):
            if item[4] not in seen:
                seen.add(item[4])
                csv_output.writerow([line] + item[1:])
                line += 1

Python searching through columns

Tags:

python

loops

datetime

csv

Aggs

1 Answers

Martin Evans

Recent Activity

Donate For Us

Python searching through columns

Tags:

python

loops

datetime

csv

Aggs

1 Answers

Martin Evans

Related questions

Recent Activity

Donate For Us