
Is there a better, pythonic way to do this?

This is my first Python program.

Requirement: Read a file with one {adId userId} pair on each line. For each adId, print the number of unique userIds.

Here is my code, put together from reading the Python docs. Could you give me feedback on how I can write this in a more Pythonic way?

import csv

adDict = {}
reader = csv.reader(open("some.csv"), delimiter=' ')
for row in reader:
    adId = row[0]
    userId = row[1]
    if ( adId in adDict ):
        adDict[adId].add(userId)
    else:
        adDict[adId] = set(userId)

for key, value in adDict.items():
    print (key, ',' , len(value))

Thanks.

Schitti asked Oct 20 '09 22:10



1 Answer

Congratulations, your code is very nice. There are a few little tricks you could use to make it shorter/simpler.

There is a nifty type called defaultdict, provided by the collections module. Instead of checking whether adDict already has an adId key, you can create adDict as defaultdict(set). It acts like a regular dict, except that looking up a missing key automatically creates an empty set() for that key. So you can change

if ( adId in adDict ):
    adDict[adId].add(userId)
else:
    adDict[adId] = set(userId)

to simply

adDict[adId].add(userId)
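
As a minimal sketch of that change (the sample rows are made up for illustration), the snippet below builds the mapping with defaultdict(set). It also shows a pitfall in the original else branch: set(userId) builds a set from the characters of the string, so a userId such as "u10" would start out as three single characters rather than one id. With defaultdict(set), every userId goes through .add(), which sidesteps this.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the lines of some.csv.
rows = [("ad1", "u10"), ("ad1", "u10"), ("ad1", "u22"), ("ad2", "u10")]

adDict = defaultdict(set)
for adId, userId in rows:
    # A missing adId gets an empty set created for it on first access.
    adDict[adId].add(userId)

# Number of distinct userIds per adId.
counts = {adId: len(users) for adId, users in adDict.items()}

# The pitfall: set() iterates over its argument, and a string iterates by character.
chars = set("u10")   # three single-character elements
whole = {"u10"}      # one element, the full userId
```

If you keep the plain dict version, writing the else branch as adDict[adId] = {userId} should give the intended one-element set.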

Also, instead of

for row in reader:
    adId = row[0]
    userId = row[1]

you could shorten that to

for adId,userId in reader:
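
Here is a small sketch of the unpacking form. It feeds csv.reader an in-memory list of lines instead of a file (the sample lines are invented). One caveat: unpacking assumes every row has exactly two fields, so a malformed line raises ValueError, as the second half shows.

```python
import csv

# Invented sample input; csv.reader accepts any iterable of strings.
clean = ["ad1 u10", "ad2 u22"]
pairs = []
for adId, userId in csv.reader(clean, delimiter=' '):
    pairs.append((adId, userId))

# A row with three fields cannot be unpacked into two names.
messy = ["ad1 u10", "ad3 u10 extra"]
try:
    for adId, userId in csv.reader(messy, delimiter=' '):
        pass
    failed = False
except ValueError:
    failed = True
```

If the input might contain blank or malformed lines, keeping the row variable and checking len(row) before unpacking is one way to handle it.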

Edit: As Parker kindly points out in the comments,

for key, value in adDict.iteritems():

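is the most efficient way to iterate over a dict in Python 2 if you are going to use both the key and the value in the loop, because iteritems() yields the pairs lazily instead of building a list first. In Python 3, iteritems() no longer exists; you can use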

for key, value in adDict.items():

since items() there returns a lightweight view of the dict rather than a new list.
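
A short sketch of the Python 3 behavior, using a made-up dict: as far as I know, items() hands back a view object that you can loop over directly and that reflects later changes to the dict, without copying its contents.

```python
# Made-up data: each adId maps to a set of userIds.
adDict = {"ad1": {"u10", "u22"}, "ad2": {"u10"}}

view = adDict.items()   # a dict view, not a list
counts = sorted((key, len(value)) for key, value in view)

adDict["ad3"] = {"u5"}  # the existing view sees the new key
size_after = len(view)
```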

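#!/usr/bin/env python3
import csv
from collections import defaultdict

# Map each adId to the set of distinct userIds seen for it.
adDict = defaultdict(set)
with open("some.csv", newline="") as f:
    reader = csv.reader(f, delimiter=' ')
    for adId, userId in reader:
        adDict[adId].add(userId)

# Python 3; on Python 2 use adDict.iteritems() instead.
for key, value in adDict.items():
    print(key, ',', len(value))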
unutbu answered Nov 15 '22 13:11