
Python algorithm of counting occurrence of specific word in csv

I've just started to learn Python. I'm curious about efficient ways to count the occurrences of a specific word in a CSV file, other than simply using a for loop to read through the file line by line.

To be more specific, let's say I have a CSV file containing two columns, "Name" and "Grade", with millions of records.

How would one count the occurrences of "A" in the "Grade" column?

Python code samples would be greatly appreciated!

laotanzhurou asked Feb 12 '12 07:02



2 Answers

A basic example using csv and collections.Counter (Python 2.7+) from the standard Python library:

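import csv
import collections

grades = collections.Counter()
# newline='' is how the csv module expects files to be opened in Python 3
with open('file.csv', newline='') as input_file:
    # this sample file uses ';' as its delimiter; change it to match your data
    for row in csv.reader(input_file, delimiter=';'):
        grades[row[1]] += 1  # column 1 holds the grade

print('Number of A grades: %s' % grades['A'])
print(grades.most_common())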

Output (for a small dataset):

Number of A grades: 2055
[('A', 2055), ('B', 2034), ('D', 1995), ('E', 1977), ('C', 1939)]
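Since the question refers to the columns by name, a variant of the same Counter approach can use csv.DictReader to look the column up by its header instead of by position. This is only a sketch: it assumes the file has a header row containing Name and Grade and uses the default comma delimiter, and the helper name count_grades is hypothetical. Accepting any iterable of lines makes it easy to try on an in-memory sample.

```python
import csv
import io
from collections import Counter


def count_grades(lines, column="Grade"):
    """Tally the values in one named column of CSV data.

    Hypothetical helper: assumes the first line is a header row
    (e.g. "Name,Grade") and that fields are comma-separated.
    """
    reader = csv.DictReader(lines)
    # Rows are streamed one at a time, so memory use stays flat
    # even for files with millions of records.
    return Counter(row[column] for row in reader)


# Usage on an in-memory sample; for a real file, pass
# open("file.csv", newline="") inside a with-block instead.
sample = io.StringIO("Name,Grade\nAlice,A\nBob,B\nCarol,A\n")
counts = count_grades(sample)
print(counts["A"])  # 2
print(counts.most_common())
```

Looking the column up by name keeps the code working if the column order in the file changes, at the cost of building a small dict per row.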
reclosedev answered Sep 18 '22 05:09


You have to read all the grades, of course, which in this case means reading the entire file. The csv module makes it easy to read comma-separated value files:

import csv

ctr = 0
with open('my_file.csv', newline='') as f:
    for record in csv.reader(f):
        if record[1] == 'A':  # column 1 holds the grade
            ctr += 1
print(ctr)

This is pretty fast; I couldn't do any better with the Counter approach:

import csv
from collections import Counter

with open('my_file.csv', newline='') as f:
    grades = [rec[1] for rec in csv.reader(f)]  # generator expression was actually slower
result = Counter(grades)
print(result)
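Another commonly suggested variant feeds the reader straight into Counter through map and operator.itemgetter, which keeps the per-row extraction inside built-ins instead of a Python-level comprehension. Whether it actually beats the list comprehension above on your data is an assumption worth timing; the sketch also assumes every row has at least two fields.

```python
import csv
import io
from collections import Counter
from operator import itemgetter

# map(itemgetter(1), reader) lazily pulls the second field of each row;
# a row with fewer than two fields would raise IndexError.
sample = io.StringIO("Alice,A\nBob,B\nCarol,A\n")
result = Counter(map(itemgetter(1), csv.reader(sample)))
print(result["A"])  # 2
```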

Last but not least, lists have a count method:

import csv

with open('my_file.csv', newline='') as f:
    grades = [rec[1] for rec in csv.reader(f)]
result = grades.count('A')
print(result)
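If only one grade needs counting, the intermediate list is not strictly necessary. A minimal sketch streams the rows through a generator expression and sums the matches, so memory stays constant regardless of file size. The answer notes that a generator expression was slower in its Counter test, so it is worth timing this on your own data. The function name count_value is hypothetical, and a header row is assumed not to match the target value.

```python
import csv
import io


def count_value(lines, value, column=1):
    """Count rows whose field at index `column` equals `value`.

    Hypothetical helper: rows are read one at a time, so nothing
    beyond the current row is held in memory.
    """
    reader = csv.reader(lines)
    # len(row) > column skips blank or short lines instead of raising IndexError.
    return sum(1 for row in reader if len(row) > column and row[column] == value)


sample = io.StringIO("Alice,A\nBob,B\n\nCarol,A\n")
print(count_value(sample, "A"))  # 2
```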
steabert answered Sep 21 '22 05:09