
In Pandas with Groupby: assign a value from a column conditioned on another column

I have a DataFrame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col0': list('aabb'),
                   'col1': np.arange(4),
                   'col2': list('wxyz'),
                   'col3': np.nan})

    col0 col1 col2 col3
0   a    0    w    NaN
1   a    1    x    NaN
2   b    2    y    NaN
3   b    3    z    NaN

I want to assign to 'col3' the value of 'col2' corresponding to the minimum value of 'col1', grouped by 'col0'. Expected output:

    col0 col1 col2 col3
0   a    0    w    w
1   a    1    x    w
2   b    2    y    y
3   b    3    z    y

If grouping by 'col0' were not needed, this would work:

df['col3'] = df[df['col1']==df['col1'].min()]['col2'].iloc[0]

    col0 col1 col2 col3
0   a    0    w    w
1   a    1    x    w
2   b    2    y    w
3   b    3    z    w

This is my attempt at the same thing with groupby/apply, which leaves 'col3' as NaN instead of giving the expected output:

df['col3'] = df.groupby('col0').apply(lambda x: x[x['col1']==x['col1'].min()]['col2'].iloc[0])

    col0 col1 col2 col3
0   a    0    w    NaN
1   a    1    x    NaN
2   b    2    y    NaN
3   b    3    z    NaN

makpalan asked Jun 17 '21 17:06




3 Answers

Another option is to use transform with idxmin and loc:

df["col3"] = df.groupby("col0").col1.transform(lambda x: df.loc[x.idxmin(), "col2"])

to get

  col0  col1 col2 col3
0    a     0    w    w
1    a     1    x    w
2    b     2    y    y
3    b     3    z    y
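
The same lookup can also be written without a lambda. This is only a sketch of that variant, assuming the frame from the question (without the NaN col3 column); the name min_labels is made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col0': list('aabb'),
                   'col1': np.arange(4),
                   'col2': list('wxyz')})

# One row label per row: the label where col1 is smallest within that row's group.
min_labels = df.groupby('col0')['col1'].transform('idxmin')

# Look up col2 at those labels. to_numpy() drops the looked-up index so the
# values are assigned by position instead of being realigned by label.
df['col3'] = df.loc[min_labels, 'col2'].to_numpy()

print(df)
```

Since transform returns one value per original row, the result already has the right length, and no merge is needed.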
Mustafa Aydın answered Nov 14 '22 06:11


You can use groupby.apply to get a Series indexed by 'col0' and then merge it into the DataFrame:

df
  col0  col1 col2
0    a     0    w
1    a     1    x
2    b     2    y
3    b     3    z

col3 = df.groupby("col0").apply(lambda x: x.loc[x["col1"].idxmin(), "col2"])
col3.name = "col3"
df = df.merge(col3, how="left", left_on= "col0", right_index= True)

df
  col0  col1 col2 col3
0    a     0    w    w
1    a     1    x    w
2    b     2    y    y
3    b     3    z    y
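
The per-group Series is indexed by the group keys ('a', 'b'), not by the row labels (0 to 3). That is most likely why assigning it directly to a column, as in the question, gives NaN everywhere. As an alternative to merge, the Series can be broadcast back through 'col0' with map. A minimal sketch, where per_group and min_labels are illustrative names and groupby idxmin stands in for the apply call:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col0': list('aabb'),
                   'col1': np.arange(4),
                   'col2': list('wxyz')})

# Row label of the minimum col1 within each col0 group (indexed by 'a', 'b').
min_labels = df.groupby('col0')['col1'].idxmin()

# col2 at those labels, re-indexed by group key: a -> 'w', b -> 'y'.
per_group = pd.Series(df.loc[min_labels, 'col2'].to_numpy(),
                      index=min_labels.index)

# Mapping col0 through per_group looks each row's group key up in
# per_group's index, which spreads the group value over every row.
df['col3'] = df['col0'].map(per_group)

print(df)
```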
Stryder answered Nov 14 '22 06:11


You can use groupby with transform('idxmin') and then Series.map:

d = df['col2'].to_dict()  # index label -> col2, because idxmin returns index labels, not col1 values
df['col3'] = df['col3'].fillna(df.groupby("col0")['col1'].transform('idxmin').map(d))

print(df)

  col0  col1 col2 col3
0    a     0    w    w
1    a     1    x    w
2    b     2    y    y
3    b     3    z    y    
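
A note on the lookup: transform('idxmin') returns index labels, so whatever is passed to map has to be keyed by the index rather than by the values in 'col1'. In the question's data the two happen to coincide. The sketch below uses a hypothetical frame with string row labels (not from the question) to make the difference visible:

```python
import pandas as pd

# Hypothetical variant of the example: string row labels, and col1 values
# that do not match the index.
df = pd.DataFrame({'col0': list('aabb'),
                   'col1': [5, 3, 8, 9],
                   'col2': list('wxyz')},
                  index=list('pqrs'))

# For every row, the label of the row holding the group's minimum col1.
min_labels = df.groupby('col0')['col1'].transform('idxmin')

# Mapping through a Series looks each label up in that Series' index.
df['col3'] = min_labels.map(df['col2'])

print(df)
```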
anky answered Nov 14 '22 04:11