I have a dataframe with 100K+ rows like this:
   user  document
0  john      book
1  jane   article
2  jane      book
3  jane      book
4   jim   article
5  john      book
6   jim  blogpost
7  jane  blogpost
8  jane  blogpost
9  jane  blogpost
I need the dataframe like this:
      blogpost  article  book
john         0        0     2
jane         3        1     2
jim          1        1     0
That is, I need the number of downloads for every (user, document) combination.
I'm doing .groupby(['user', 'document']) and then using df.loc to set the download count:
df = pd.DataFrame(index=users, columns=documents)
df.fillna(0, inplace=True)
grouped = records.groupby(['user', 'document'])
for elem in grouped:
    user, document = elem[0]
    downloads = len(elem[1])
    df.loc[user, document] = downloads
However, df.loc is very slow when used this way. I commented out the df.loc line and found that the loop completes quickly, so it's almost certainly the df.loc access that is slow.
How can I get this result faster?
Minimal working example:
import pandas as pd

records = pd.DataFrame([
    ('john', 'book'),
    ('jane', 'article'),
    ('jane', 'book'),
    ('jane', 'book'),
    ('jim', 'article'),
    ('john', 'book'),
    ('jim', 'blogpost'),
    ('jane', 'blogpost'),
    ('jane', 'blogpost'),
    ('jane', 'blogpost'),
], columns=['user', 'document'])
print(records)
users = list(set(records['user']))
users.sort()
documents = list(set(records['document']))
documents.sort()
print(users)
print(documents)
df = pd.DataFrame(index=users, columns=documents)
df.fillna(0, inplace=True)
print(df)
grouped = records.groupby(['user', 'document'])
for elem in grouped:
    user, document = elem[0]
    downloads = len(elem[1])
    df.loc[user, document] = downloads
There are plenty of ways to achieve this without a loop: pivot, pivot_table, crosstab, or groupby + count. For example, with crosstab:
pd.crosstab(df.user,df.document)
Out[1283]:
document article blogpost book
user
jane 1 3 2
jim 1 1 0
john 0 0 2
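An equivalent groupby-based approach, if you want to stay close to your original code: count each (user, document) pair with size() and pivot the documents into columns with unstack. A minimal sketch, reusing the records frame from the question:

```python
import pandas as pd

records = pd.DataFrame([
    ('john', 'book'), ('jane', 'article'), ('jane', 'book'),
    ('jane', 'book'), ('jim', 'article'), ('john', 'book'),
    ('jim', 'blogpost'), ('jane', 'blogpost'), ('jane', 'blogpost'),
    ('jane', 'blogpost'),
], columns=['user', 'document'])

# Count each (user, document) pair, then pivot documents into columns;
# fill_value=0 puts 0 where a user never downloaded that document type.
counts = (records.groupby(['user', 'document'])
                 .size()
                 .unstack(fill_value=0))
print(counts)
```

This is vectorized, so it avoids the per-cell df.loc assignments entirely; on 100K+ rows it should be dramatically faster than the loop.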