Efficient way to get group names in pandas

Question

I have a .csv file with around 300,000 rows. I group it by a particular column, with each group having around 140 members (2138 groups in total).

I am trying to generate a numpy array of the group names. So far I have used a for loop to collect the names, but it takes a while to run.

import numpy as np
import pandas as pd

df = pd.read_csv('file.csv')
grouped = df.groupby('col1')
group_names = []
for name, group in grouped:
    group_names.append(name)
group_names = np.array(group_names, dtype=object)

I am wondering if there is a more efficient way to do this, whether by using a pandas function or by converting the names directly into a numpy array.

sacuL · Accepted Answer

The fastest way is most likely to call unique() on the column you are grouping by, which gives you all of its distinct values. The output is an array of your group names.

group_names = df.col1.unique()
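
A minimal sketch of this approach on a small made-up frame (the column name col1 comes from the question; the data and the variable names are hypothetical). One caveat worth checking on real data: unique() returns values in order of first appearance and keeps NaN, whereas groupby by default sorts its keys and drops NaN, so the two results can differ in order and content.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV in the question.
df = pd.DataFrame({
    "col1": ["b", "a", "b", "c", "a", np.nan],
    "val": [1, 2, 3, 4, 5, 6],
})

# Distinct values in order of first appearance; NaN is kept.
names_unique = df["col1"].unique()

# To mirror groupby's defaults (sorted keys, NaN dropped):
names_like_groupby = np.sort(df["col1"].dropna().unique())

# Names collected by iterating the groupby, as in the question, for comparison.
names_loop = np.array([name for name, _ in df.groupby("col1")], dtype=object)
```

If the order of the names does not matter and the column has no missing values, the plain unique() call is enough.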

EdChum · Answer

groupby objects have a .groups attribute:

groups = df.groupby('col1').groups

This returns a dict mapping each group name to the index labels of its rows.

Example:

In[257]:
df = pd.DataFrame({'a':list('aabcccc'), 'b':np.random.randn(7)})
groups = df.groupby('a').groups
groups

Out[257]: 
{'a': Int64Index([0, 1], dtype='int64'),
 'b': Int64Index([2], dtype='int64'),
 'c': Int64Index([3, 4, 5, 6], dtype='int64')}

In[258]:
groups.keys()

Out[258]: dict_keys(['a', 'b', 'c'])
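
Since the question asks for a numpy array, here is a short sketch of turning those keys into one. It reuses the example frame above with fixed values in column b; the names grouped, group_names and n_groups are illustrative. Note that .groups builds the row-label mapping for every group, which is more work than is strictly needed if only the names are wanted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": list("aabcccc"), "b": np.arange(7.0)})
grouped = df.groupby("a")

# dict keys -> NumPy object array of group names.
group_names = np.array(list(grouped.groups.keys()), dtype=object)

# Number of groups, available without building the names at all.
n_groups = grouped.ngroups
```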

Tags: python, python-3.x, pandas, csv, processing-efficiency

swopnil
