Slicing Pandas Dataframe according to number of lines

Question

I suppose this is something rather simple, but I can't find how to make this. I've been searching tutorials and stackoverflow.

Suppose I have a dataframe df loking like this :

Group   Id_In_Group   SomeQuantity
1        1              10
1        2              20
2        1               7
3        1              16
3        2              22
3        3               5
3        4              12
3        5              28
4        1               1
4        2              18
4        3              14
4        4               7
5        1              36

I would like to select only the lines having at least 4 objects in the group (so there are at least 4 rows having the same "group" number) and for which SomeQuantity for the 4th object, when sorted in the group by ascending SomeQuantity, is greater than 20 (for example).

In the given Dataframe, for example, it would only return the 3rd group, since it has 4 (>=4) members and its 4th SomeQuantity (after sorting) is 22 (>=20), so it should construct the dataframe :

Group   Id_In_Group   SomeQuantity
3        1              16
3        2              22
3        3               5
3        4              12
3        5              28

(being or not sorted by SomeQuantity, whatever).

Could somebody be kind enough to help me? :)

MaxU - stop WAR against UA · Accepted Answer

I would use .groupby() + .filter() methods:

In [66]: df.groupby('Group').filter(lambda x: len(x) >= 4 and x['SomeQuantity'].max() >= 20)
Out[66]:
   Group  Id_In_Group  SomeQuantity
3      3            1            16
4      3            2            22
5      3            3             5
6      3            4            12
7      3            5            28

Nickil Maveli · Answer

A slightly different approach using map, value_counts, groupby , filter:

(df[df.Group.map(df.Group.value_counts().ge(4))]
   .groupby('Group')
   .filter(lambda x: np.any(x['SomeQuantity'].sort_values().iloc[3] >= 20)))

enter image description here

Breakdown of steps:

Perform value_counts to compute the total counts of distinct elements present in Group column.

>>> df.Group.value_counts()

3    5
4    4
1    2
5    1
2    1
Name: Group, dtype: int64

Use map which functions like a dictionary (wherein the index becomes the keys and the series elements become the values) to map these results back to the original DF

>>> df.Group.map(df.Group.value_counts())

0     2
1     2
2     1
3     5
4     5
5     5
6     5
7     5
8     4
9     4
10    4
11    4
12    1
Name: Group, dtype: int64

Then, we check for the elements having a value of 4 or more which is our threshold limit and take only those subset from the entire DF.

>>> df[df.Group.map(df.Group.value_counts().ge(4))]   

    Group  Id_In_Group  SomeQuantity
3       3            1            16
4       3            2            22
5       3            3             5
6       3            4            12
7       3            5            28
8       4            1             1
9       4            2            28
10      4            3            14
11      4            4             7

Inorder to use groupby.filter operation on this, we must make sure that we return a single boolean value corresponding to each grouped key when we perform the sorting process and compare the fourth element to the threshold which is 20. np.any returns all such possiblities matching our filter.

>>> df[df.Group.map(df.Group.value_counts().ge(4))]         \
      .groupby('Group').apply(lambda x: x['SomeQuantity'].sort_values().iloc[3])

 Group
3    22
4    18
dtype: int64

From these, we compare the fourth element .iloc[3] as it is 0-based indexed and return all such favourable matches.

Slicing Pandas Dataframe according to number of lines

Tags:

python

slice

pandas

dataframe

Matt

2 Answers

MaxU - stop WAR against UA

Nickil Maveli

Recent Activity

Donate For Us

Slicing Pandas Dataframe according to number of lines

Tags:

python

slice

pandas

dataframe

Matt

2 Answers

MaxU - stop WAR against UA

Nickil Maveli

Related questions

Recent Activity

Donate For Us