I want to iterate over groups that are grouped by strings or dates.
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
                   'B': ['me', 'you', 'me'] * 2,
                   'C': [5, 2, 3, 4, 6, 9]})
groups = df.groupby('A')
For example, in this code the groups are named 'foo' and 'bar', and I can loop over them using:
for name, group in groups:
    print(name)
My problem is that I need to run another loop inside this loop, and each time I need to access a different set of groups, like this (assume groups has size n):
for name, group in groups:
    for name1 in range(name, name + 9):  # + 9 to get first 9 groups for every iteration
Since name is a string, I am unable to do that. In short, I just want a way to access groups by number so that I can easily pick the groups required for a computation. Something like:
groups = df.group('A')
for i in range(0, n):
    print group(i)[] + group(i+1)[]
So if I have the groups [g1, g2, g3, g4, g5], I want to iterate over them in pairs, like [g1, g2], [g2, g3], [g3, g4], ..., and take the intersection of the two groups of series each time. I am looking for a way to access the groups [g1, g2, ..., g5] by index or some number so that I can use them in loop operations. Currently the only way I know to access a group is by its name, such as 'foo' and 'bar' in the example above. I want to be able to do operations such as:
for name, group in groups-1:
    print gb.get_group(name)
    print gb.get_group(name+1)
I know this might be a simple problem, but I have been struggling with this part for a while. I would appreciate any kind of help.
Vectorization is always the first and best choice. You can convert the DataFrame to a NumPy array or to a dictionary to speed up iteration. Iterating through the key-value pairs of a dictionary comes out as the fastest approach, with around a 280x speedup for 20 million records.
DataFrame.iterrows() iterates over DataFrame rows as (index, Series) pairs. Note that this method does not preserve dtypes across rows, because it converts each row into a Series.
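The dtype behaviour described above can be seen in a minimal sketch. The frame and its column names (`count`, `ratio`) are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical frame: one integer column and one float column.
df = pd.DataFrame({"count": [1, 2], "ratio": [0.5, 1.5]})

# Column dtypes before iterating.
column_dtypes = df.dtypes.astype(str).to_dict()

# iterrows() packs each row into a single Series, so the integer
# value is upcast to share one dtype with the float value.
_, first_row = next(df.iterrows())
row_dtype = str(first_row.dtype)

print(column_dtypes)
print(row_dtype)
```

If the original dtypes matter, `itertuples()` is usually suggested instead, since it does not build a Series per row.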
Pandas' groupby() splits data into separate groups so that computations can be performed on each group (the split-apply-combine process).
The .groupby() object has a .groups attribute that returns a Python dict of indices. In this case:
In [26]: df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
....: 'B': ['me', 'you', 'me'] * 2,
....: 'C': [5, 2, 3, 4, 6, 9]})
In [27]: groups = df.groupby('A')
In [28]: groups.groups
Out[28]: {'bar': [1, 3, 5], 'foo': [0, 2, 4]}
You can iterate over this as follows:
keys = list(groups.groups.keys())  # dict views cannot be indexed in Python 3
for index in range(len(keys) - 1):
    g1 = df.loc[groups.groups[keys[index]]]  # .loc replaces .ix, which newer pandas removed
    g2 = df.loc[groups.groups[keys[index + 1]]]
    # Do something with g1, g2
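As one possible way to walk consecutive pairs such as [g1, g2], [g2, g3], here is a sketch that materializes the groups into a list and zips it against itself shifted by one. Taking the intersection over column 'B' is an assumption, since the question does not say which series should be intersected.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar"] * 3,
                   "B": ["me", "you", "me"] * 2,
                   "C": [5, 2, 3, 4, 6, 9]})

# Materialize the (name, sub-frame) pairs once so they can be
# addressed by position, e.g. grouped[0], grouped[1], ...
grouped = list(df.groupby("A"))

pair_results = []
# Zip the list against itself shifted by one to visit
# (g1, g2), (g2, g3), ... without any name arithmetic.
for (name1, g1), (name2, g2) in zip(grouped, grouped[1:]):
    # Assumption: "intersection" means values of column B shared by both groups.
    common_b = set(g1["B"]) & set(g2["B"])
    pair_results.append((name1, name2, common_b))

print(pair_results)
```

With only two groups in the sample data there is a single pair, ('bar', 'foo'). A window of nine groups could be taken the same way with a slice such as `grouped[i:i + 9]`.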
However, please remember that using for loops to iterate over Pandas objects is generally slower than vector operations. Depending on what you need done, and if it needs to be fast, you may want to try other approaches.
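One such alternative, sketched here under the assumption that only the distinct values of column 'B' matter, is to collapse each group into a set once and then compare neighbouring sets. This is not fully vectorized, but it avoids rebuilding sub-frames on every iteration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar"] * 3,
                   "B": ["me", "you", "me"] * 2,
                   "C": [5, 2, 3, 4, 6, 9]})

# One pass over the data: reduce each group's B values to a set.
b_sets = df.groupby("A")["B"].apply(set)

# Compare each group's set with the next one, in sorted key order.
names = list(b_sets.index)
overlaps = {(a, b): b_sets[a] & b_sets[b] for a, b in zip(names, names[1:])}

print(overlaps)
```

Whether this is faster in practice depends on the number and size of the groups, so it is worth timing against the loop on real data.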