Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

find col2 values based on certain col1 value, if not presents keep nearest value using pandas

I have a data frame like this:

df
col1      col2      
 1         10
 2         15
 4         12
 5         23
 6         11
 8         32
 9         12
 11        32
 2         23
 3         21
 4         12
 6         15
 9         12
 10        32

I want to select col2 values for every 1, 5 and 10 values of col1. If col1 value is not 1, 5 or 10 keep the col2 values where col1 values is nearest to 1,5 or 10

for example the final df will look like:

df
col1      col2      
 1         10
 5         23
 11        32
 2         23
 6         15
 10        32

how to do it using pandas without using any loop

like image 256
Kallol Avatar asked Oct 16 '22 15:10

Kallol


1 Answers

  • df.col1.diff().lt(0).cumsum() defines groups of ascending values
  • set_index with those groups and col1 but keep col1 in the dataframe proper with drop=False
  • groupby and pd.concat to use reindex with method='nearest'

I left the old col1 index so you can see what got mapped to what.

c = df.set_index([df.col1.diff().lt(0).cumsum().rename('grp'), 'col1'], drop=False)
pd.concat([c.xs(k).reindex([1, 5, 10], method='nearest') for k, c in c.groupby(level=0)])

      col1  col2
col1            
1        1    10
5        5    23
10      11    32
1        2    23
5        6    15
10      10    32

If you don't like that extra col1 in the index, you can rename the index then drop it:

c = df.set_index([df.col1.diff().lt(0).cumsum().rename('grp'), 'col1'], drop=False)
pd.concat([c.xs(k).reindex([1, 5, 10], method='nearest') for k, c in c.groupby(level=0)]) \
    .rename_axis(None).reset_index(drop=True)

   col1  col2
0     1    10
1     5    23
2    11    32
3     2    23
4     6    15
5    10    32
like image 68
piRSquared Avatar answered Oct 20 '22 20:10

piRSquared