Using Python to group csv data

Tags: python, csv

I have a CSV file with thousands of entries that need to be broken up into groups. In the example below, I need the rows grouped by River Name so that I can later reformat the information group by group.

River Name, Branch, Length
Catnip, 1, 2145.30
Peterson, 2, 24.5
Catnip, 3, 15.4
Fergerson, 1, 5.2
Catnip, 1, 88.56
Peterson, 2, 6.45

The only way I can think of grouping the information would be to:

  1. Use Python to read the CSV and create a list of just the unique river names.
  2. Create a new individual CSV for each unique river name, e.g. Peterson.csv, Catnip.csv.
  3. Use Python to read the original CSV and, depending on the river name in the row being read, write that row to the corresponding .csv file, e.g. the row Catnip, 1, 2145.30 would be written to Catnip.csv.
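
The three steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: the helper name split_by_river and the output directory argument are made up for the example, the input is assumed to have a header row, every river name is assumed to be usable as a file name, and rows are collected in memory first so that each output file is opened exactly once.

```python
import csv
import os
from collections import defaultdict


def split_by_river(src_path, out_dir):
    """Write one CSV per river name and return the paths written.

    Hypothetical helper for illustration; not part of any library.
    """
    groups = defaultdict(list)
    with open(src_path, newline='') as f:
        # skipinitialspace drops the blank after each comma in the sample data.
        reader = csv.reader(f, skipinitialspace=True)
        header = next(reader)  # assumes the first row is the header
        for row in reader:
            if row:  # ignore blank lines
                groups[row[0]].append(row)

    os.makedirs(out_dir, exist_ok=True)
    written = []
    for river, rows in groups.items():
        # Assumes the river name is a valid file name on this system.
        out_path = os.path.join(out_dir, river + '.csv')
        with open(out_path, 'w', newline='') as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(rows)
        written.append(out_path)
    return written
```

Calling split_by_river('rivers.csv', 'by_river') on the sample data would produce Catnip.csv, Peterson.csv and Fergerson.csv inside the by_river directory.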

I don't think this is an efficient way to go about it, as it gives me about 1,500 CSV files that will need to be opened and written to, but I am at the limit of my Python knowledge. If anyone could suggest a better approach, it would be greatly appreciated.

Asked by TsvGis, Dec 18 '22 23:12


1 Answer

You can also simply use the csv module and save the results to a dictionary. I enumerated the reader to skip the first row (I'm sure there must be an easier way...). I then read each row and unpack its values into river, branch and length. If the river is not yet in the dictionary, it is initialized with an empty list. The (branch, length) tuple is then appended to that river's list.

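import csv

rivers = {}
# newline='' is what the csv module expects in Python 3 (the old 'rU' mode was removed in 3.11).
with open('rivers.csv', newline='') as f:
    # skipinitialspace drops the blank after each comma in the sample data.
    reader = csv.reader(f, delimiter=',', skipinitialspace=True)
    for n, row in enumerate(reader):
        if not n:
            # Skip header row (n = 0).
            continue
        river, branch, length = row
        if river not in rivers:
            rivers[river] = []
        rivers[river].append((branch, length))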

>>> rivers
{'Catnip': [('1', '2145.30'), ('3', '15.4'), ('1', '88.56')],
 'Peterson': [('2', '24.5'), ('2', '6.45')],
 'Fergerson': [('1', '5.2')]}
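
As for an easier way to skip the header: a minimal sketch, assuming Python 3, is to call next() on the reader before looping, and collections.defaultdict removes the need for the membership check. The function name group_rivers is made up for this example, and converting branch and length to int and float is an assumption about how the values will be used later.

```python
import csv
from collections import defaultdict


def group_rivers(path):
    """Return {river: [(branch, length), ...]} for the CSV at path.

    Hypothetical helper for illustration; assumes three columns per row.
    """
    rivers = defaultdict(list)
    with open(path, newline='') as f:
        # skipinitialspace drops the blank after each comma in the sample data.
        reader = csv.reader(f, skipinitialspace=True)
        next(reader, None)  # skip the header row (no error on an empty file)
        for row in reader:
            if not row:
                continue  # ignore blank lines
            river, branch, length = row
            # Numeric conversion is an assumption; keep the strings if preferred.
            rivers[river].append((int(branch), float(length)))
    return dict(rivers)
```

With the values converted to numbers, later per-group work stays short. For example, sum(length for _, length in rivers['Catnip']) would give the total length recorded for Catnip.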
Answered by Alexander, Dec 29 '22 13:12