Count repeated values in a specific column in a CSV file and return the value to another column (python2)

Question

I am trying to count repeated values in one column of a CSV file and write each count to another column of the CSV file, in Python.

For example, my CSV file:

KeyID    GeneralID
145258   KL456
145259   BG486
145260   HJ789
145261   KL456

What I want to achieve is to count how many rows have the same GeneralID and insert that count into a new CSV column. For example:

KeyID    Total_GeneralID
145258   2
145259   1
145260   1
145261   2

I tried splitting each line with the split method, but it didn't work well.

My code:

case_id_list_data = []

with open(file_path_1, "rU") as g:
    for line in g:
        case_id_list_data.append(line.split('\t'))
        #print case_id_list_data[0][0] #the result is unsatisfactory
        #I'm stuck here...
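A likely reason the result looks off: each `line` still ends with its newline, so the last field comes back as `'KL456\n'`, and the header line is stored alongside the data, so `case_id_list_data[0][0]` is just the header cell `'KeyID'`. A minimal sketch, assuming the file is tab-separated as the snippet above implies; `parse_tab_lines` is a hypothetical helper name, not from the question:

```python
def parse_tab_lines(lines):
    """Split tab-separated lines into a header and a list of data rows."""
    # Strip the line ending before splitting and skip blank lines.
    rows = [line.rstrip('\r\n').split('\t') for line in lines if line.strip()]
    # The first non-empty line holds the column names.
    return rows[0], rows[1:]

header, data = parse_tab_lines(["KeyID\tGeneralID\n", "145258\tKL456\n", "145259\tBG486\n"])
print(header)  # ['KeyID', 'GeneralID']
print(data)    # [['145258', 'KL456'], ['145259', 'BG486']]
```

With the header separated, `data[i][1]` is the GeneralID of row `i`, which is what the counting step needs.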

Stephen Rauch · Accepted Answer

If you are averse to pandas and want to stay with the standard library:

Code:

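import csv
from collections import Counter

# Read the tab-separated input (Python 2: 'rU' enables universal newlines)
with open('file1', 'rU') as f:
    reader = csv.reader(f, delimiter='\t')
    header = next(reader)
    lines = list(reader)
    # Count how often each GeneralID (column 1) occurs
    counts = Counter(l[1] for l in lines)

# Append each row's count as a new column
new_lines = [l + [str(counts[l[1]])] for l in lines]

# Python 2's csv writer should be given a file opened in binary mode
with open('file2', 'wb') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(header + ['Total_GeneralID'])
    writer.writerows(new_lines)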

Results:

KeyID   GeneralID   Total_GeneralID
145258  KL456   2
145259  BG486   1
145260  HJ789   1
145261  KL456   2
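The code above targets Python 2. In Python 3 the csv module expects files opened in text mode with `newline=''`, and the `'U'` mode flag was removed in Python 3.11. A sketch of the same Counter approach for Python 3; the function name `add_count_column` and its `col`/`new_name` parameters are illustrative, not part of the original answer:

```python
import csv
from collections import Counter

def add_count_column(in_path, out_path, col=1, new_name='Total_GeneralID'):
    """Append a column holding how often each row's value in `col` occurs."""
    # newline='' lets the csv module handle line endings itself (Python 3).
    with open(in_path, newline='') as f:
        reader = csv.reader(f, delimiter='\t')
        header = next(reader)
        rows = list(reader)

    # Count every value in the chosen column in a single pass.
    counts = Counter(row[col] for row in rows)

    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter='\t')
        writer.writerow(header + [new_name])
        # csv.writer converts the integer counts to strings itself.
        writer.writerows(row + [counts[row[col]]] for row in rows)
```

Usage mirrors the answer: `add_count_column('file1', 'file2')`.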

Raj Damani · Answer

You can divide the task into three steps:

1. Read the CSV file.
2. Generate the new column's values.
3. Write the values back to the file.

import csv
import fileinput
import sys

# 1. Read the CSV file
# Open the CSV file and read its two columns into the tuples xs and ys.
with open("dev.csv") as filein:
    reader = csv.reader(filein, skipinitialspace=True)
    xs, ys = zip(*reader)

result = ["Total_GeneralID"]

# 2. Generate the new column's values
# For each data row (skipping the header), count how often its GeneralID occurs.
for i in range(1, len(ys)):
    result.append(ys.count(ys[i]))

# 3. Write the values back to the file
# With inplace=True, whatever is written to stdout replaces the current line of dev.csv.
for ind, line in enumerate(fileinput.input("dev.csv", inplace=True)):
    sys.stdout.write("{}, {}\n".format(line.rstrip(), result[ind]))

I haven't used a temporary file or any high-level module such as pandas.
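In step 2, `ys.count(ys[i])` rescans the whole column for every row, so the work grows quadratically with the number of rows. A sketch of the same step that counts each value once with `collections.Counter` and then looks the totals up; it assumes `ys` is the tuple produced by `zip(*reader)` with the header cell first, and `count_column` is a hypothetical helper name:

```python
from collections import Counter

def count_column(ys, header="Total_GeneralID"):
    """Return [header, count_for_row_1, count_for_row_2, ...] for column ys."""
    values = ys[1:]               # skip the column's header cell
    counts = Counter(values)      # one pass over the column
    return [header] + [counts[v] for v in values]

print(count_column(("GeneralID", "KL456", "BG486", "HJ789", "KL456")))
# ['Total_GeneralID', 2, 1, 1, 2]
```

The returned list has the same shape as `result` in the answer, so step 3 can use it unchanged.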

Allen · Answer

import pandas as pd
# Read your CSV into a DataFrame (file_path_1 is the path variable from the question)
df = pd.read_csv(file_path_1)
# Count the values in the GeneralID column once, then look up the occurrence count for each row
counts = df.GeneralID.value_counts()
df['Total_GeneralID'] = df.GeneralID.apply(lambda x: counts[x])
df = df[['KeyID', 'Total_GeneralID']]
print(df)
    KeyID  Total_GeneralID
0  145258                2
1  145259                1
2  145260                1
3  145261                2

jezrael · Answer

You can use the pandas library:

1. Read the file with read_csv.
2. Count the values in the GeneralID column with value_counts and rename the resulting Series to the output column name.
3. Join it to the original DataFrame on GeneralID.

import pandas as pd

df = pd.read_csv('file')
s = df['GeneralID'].value_counts().rename('Total_GeneralID')
df = df.join(s, on='GeneralID')
print (df)
    KeyID GeneralID  Total_GeneralID
0  145258     KL456                2
1  145259     BG486                1
2  145260     HJ789                1
3  145261     KL456                2
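Without pandas, the same count-then-join idea can be approximated with the standard library by addressing columns by name through `csv.DictReader`. A minimal Python 3 sketch, assuming comma-separated input as in the read_csv call above; the function name `count_by_name` and its parameters are illustrative. It also keeps only KeyID and the new column, matching the output the question asks for:

```python
import csv
import io
from collections import Counter

def count_by_name(text, key='GeneralID', keep=('KeyID',), new_name='Total_GeneralID'):
    """Return CSV text with the `keep` columns plus a count of each row's `key`."""
    rows = list(csv.DictReader(io.StringIO(text)))
    counts = Counter(row[key] for row in rows)   # analogous to value_counts()
    out = io.StringIO()
    # extrasaction='ignore' drops columns not listed in fieldnames (e.g. GeneralID).
    writer = csv.DictWriter(out, fieldnames=list(keep) + [new_name],
                            extrasaction='ignore', lineterminator='\n')
    writer.writeheader()
    for row in rows:
        row[new_name] = counts[row[key]]         # analogous to join(on=key)
        writer.writerow(row)
    return out.getvalue()
```

Passing the question's sample as a comma-separated string returns a `KeyID,Total_GeneralID` header followed by one row per KeyID with counts 2, 1, 1, 2.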

Tags: python, csv, python-2.x

yunaranyancat
