Lets assume I have a (text) file with the following structure (name, score):
a 0
a 1
b 0
c 0
d 3
b 2
And so on. My aim is to sum the scores for every name and order them from highest score to lowest score. So in this case, I want the following output:
d 3
b 2
a 1
c 0
In advance I do not know what names will be in the file.
I was wondering if there is an efficient way to do this. My text file can contain up to 50,000 entries.
The only way I can think of is just start at line 1, remember that name and then go over the whole file to look for that name and sum. This looks horribly inefficient, so I was wondering if there is a better way to do this.
Read all data into a dictionary:
from collections import defaultdict
from operator import itemgetter
scores = defaultdict(int)
with open('my_file.txt') as fobj:
for line in fobj:
name, score = line.split()
scores[name] += int(score)
and the sorting:
for name, score in sorted(scores.items(), key=itemgetter(1), reverse=True):
print(name, score)
prints:
d 3
b 2
a 1
c 0
To check the performance of this answer vs. the one from @SvenMarnach, I put both approaches into a function. Here fobj
is a file opened for reading.
I use io.StringIO
so IO delays should, hopefully, not be measured:
from collections import Counter
def counter(fobj):
scores = Counter()
fobj.seek(0)
for line in fobj:
key, score = line.split()
scores.update({key: int(score)})
return scores.most_common()
from collections import defaultdict
from operator import itemgetter
def default(fobj):
scores = defaultdict(int)
fobj.seek(0)
for line in fobj:
name, score = line.split()
scores[name] += int(score)
return sorted(scores.items(), key=itemgetter(1), reverse=True)
Results for collections.Counter
:
%timeit counter(fobj)
10000 loops, best of 3: 59.1 µs per loop
Results for collections.defaultdict
:
%timeit default(fobj)
10000 loops, best of 3: 15.8 µs per loop
Looks like defaultdict
is four times faster. I would not have guessed this. But when it comes to performance you need to measure.
This is a good use case for collections.Counter
:
from collections import Counter
scores = Counter()
with open('my_file') as f:
for line in f:
key, score = line.split()
scores.update({key: int(score)})
for key, score in scores.most_common():
print(key, score)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With