Python algorithm of counting occurrence of specific word in csv

Tags: python, algorithm, csv, counting

I've just started to learn Python. I'm curious about efficient ways to count the occurrences of a specific word in a CSV file, other than simply using a for loop to read through it line by line.

To be more specific, let's say I have a CSV file containing two columns, "Name" and "Grade", with millions of records.

How would one count the occurrences of "A" under "Grade"?

Python code samples would be greatly appreciated!

206

asked Feb 12 '12 07:02

laotanzhurou

2 Answers

Basic example using csv and collections.Counter (Python 2.7+) from the Python standard library:

import csv
import collections

grades = collections.Counter()
with open('file.csv') as input_file:
    for row in csv.reader(input_file, delimiter=';'):
        grades[row[1]] += 1

# print() with a single argument behaves the same on Python 2.7 and Python 3
print('Number of A grades: %s' % grades['A'])
print(grades.most_common())

Output (for small dataset):

Number of A grades: 2055
[('A', 2055), ('B', 2034), ('D', 1995), ('E', 1977), ('C', 1939)]
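
If the file has a header row, the same tally can be keyed on the column name instead of the column position. The following is only a minimal sketch, assuming Python 3 and a comma-delimited file whose first line is Name,Grade; the helper name count_grades and the in-memory sample data are illustrative and not part of the original answer:

```python
import csv
import io
from collections import Counter

def count_grades(lines, column="Grade"):
    """Tally the values of one named column in a CSV that has a header row."""
    reader = csv.DictReader(lines)  # uses the first line as field names
    return Counter(row[column] for row in reader)

# Hypothetical in-memory sample standing in for a real file.
sample = io.StringIO("Name,Grade\nAnn,A\nBob,B\nCid,A\n")
grades = count_grades(sample)
print(grades["A"])
```

For a real file, an open file object can be passed instead of the sample, for example one obtained with open('file.csv', newline='').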

165

answered Sep 18 '22 05:09

reclosedev

You have to read all the grades, of course, which in this case also means reading the entire file. You can use the csv module to easily read comma-separated value files:

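import csv

ctr = 0
# the with block closes the file once the loop is done
with open('my_file.csv') as f:
    my_reader = csv.reader(f)
    for record in my_reader:
        if record[1] == 'A':  # second column holds the grade
            ctr += 1
print(ctr)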

This is pretty fast, and I couldn't do better with the Counter method:

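import csv
from collections import Counter

# a csv.reader can only be iterated once, so open the file again here
with open('my_file.csv') as f:
    grades = [rec[1] for rec in csv.reader(f)] # generator expression was actually slower
result = Counter(grades)
print(result)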

Last but not least, lists have a count method:

import csv

# a csv.reader can only be iterated once, so open the file again here
with open('my_file.csv') as f:
    grades = [rec[1] for rec in csv.reader(f)]
result = grades.count('A')
print(result)
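
With millions of records, the list-based versions above keep every grade in memory before counting. A generator passed to sum counts in a single pass without building the list, possibly at some cost in speed, as the timing remark above suggests. This is only a sketch, assuming Python 3 and a file without a header row; the helper name count_value and the sample data are hypothetical:

```python
import csv
import io

def count_value(lines, index=1, target="A"):
    """Count rows whose field at position `index` equals `target`."""
    # The generator yields one 1 per matching row, so no list is built.
    return sum(1 for rec in csv.reader(lines) if rec[index] == target)

# Hypothetical in-memory sample standing in for a real file.
sample = io.StringIO("Ann,A\nBob,B\nCid,A\n")
print(count_value(sample))
```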

answered Sep 21 '22 05:09

steabert
