Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to filter a set of rows according to an indexed position?

I am stuck on how to filter out a set of rows based on the indexed position. For more clarity, lets have a dummy problem, lets say I have a datframe of user having multiple profile, like in example df1 i have three user John, Johnny and Ben having their multiple profile..

df1 = pd.DataFrame({"user": ["Peter (1)", "Peter (2)", "Peter (3)","John (1)","John (2)","John (3)","Johnny (1)","Johnny (2)"], "value": [1, 3, 3, 1, 6, 3, 4, 1]}, )

I sort the df1 based on value and reindex it

df1=df1.sort_values(by='value', ascending=False)

df1.index=[0, 1, 2, 3, 4, 5, 6, 7]

df1 looks like this

enter image description here

Now i am stuck as how to filter out rows for user having first indexed value (in this case John), lets say to new dataframe df2, and also to filter out rows for second indexed user (in this case Johnny) to new dataframe df3 expected df2 should look like this

enter image description here

df3 should look like below

enter image description here

like image 685
Arun Menon Avatar asked Feb 20 '21 09:02

Arun Menon


3 Answers

After sorting the dataframe you can use str.split to split the strings in the user column to create a grouping key, then group the dataframe on this grouping key and for each subgroup per user create a mapping of user -> dataframe inside a dict comprehension:

key = df1['user'].str.split().str[0]
dct = {user:grp.reset_index(drop=True) for user, grp in df1.groupby(key)}

Now to access the dataframe corresponding to the user we can simply lookup inside the dictionary:

>>> dct['John']

       user  value
0  John (2)      6
1  John (3)      3
2  John (1)      1

>>> dct['Peter']

        user  value
0  Peter (2)      3
1  Peter (3)      3
2  Peter (1)      1

>>> dct['Johnny']

         user  value
0  Johnny (1)      4
1  Johnny (2)      1
like image 79
Shubham Sharma Avatar answered Oct 07 '22 07:10

Shubham Sharma


You can get the first index value and split it and exclude last item(assuming that user name may have parenthesis), and then search for the value in the entire dataframe for that particular column. For example:

firstIndexUser = df1['user'].str.split('(').str[:-1].str.join('(').iloc[0]

This firstIndexUser will have value as 'John' Now you can compare with against the entire dataframe to get your df2

df2 = df1[df1['user'].str.split('(').str[:-1].str.join('(')==firstIndexUser]

The output looks like this:

>>df2
       user  value
0  John (2)      6
4  John (3)      3
6  John (1)      1

If you want, you can reset the index for df2

>>df2.reset_index(drop=True, inplace=True)
>>df2
       user  value
0  John (2)      6
1  John (3)      3
2  John (1)      1

You can follow the similar approach for your df3

like image 26
ThePyGuy Avatar answered Oct 07 '22 06:10

ThePyGuy


df1 = pd.DataFrame({"user": ["Peter (1)", "Peter (2)", "Peter (3)","John (1)","John (2)","John (3)","Johnny (1)","Johnny (2)"], "value": [1, 3, 3, 1, 6, 3, 4, 1]}, )

df1=df1.sort_values(by='value', ascending=False)

cols = df1.columns.tolist()
df1['name'] = df1['user'].replace(r'\s\(\d\)','',regex=True)
grp = df1.groupby(by=['name'])
dataframes = [grp.get_group(x)[cols] for x in grp.groups]

df2, df3 = dataframes[:2]  # as mentioned, we are interested just in first two users

df2:

       user  value
3  John (1)      1
4  John (2)      6
5  John (3)      3     

df3:

       user    value
6  Johnny (1)      4
7  Johnny (2)      1    
like image 3
Roman Avatar answered Oct 07 '22 08:10

Roman