I have a data frame like this:
df
col1 col2
1 10
2 15
4 12
5 23
6 11
8 32
9 12
11 32
2 23
3 21
4 12
6 15
9 12
10 32
I want to select col2 values for every 1, 5 and 10 values of col1. If col1 value is not 1, 5 or 10 keep the col2 values where col1 values is nearest to 1,5 or 10
for example the final df will look like:
df
col1 col2
1 10
5 23
11 32
2 23
6 15
10 32
how to do it using pandas without using any loop
df.col1.diff().lt(0).cumsum()
defines groups of ascending valuesset_index
with those groups and col1
but keep col1
in the dataframe proper with drop=False
groupby
and pd.concat
to use reindex
with method='nearest'
I left the old col1
index so you can see what got mapped to what.
c = df.set_index([df.col1.diff().lt(0).cumsum().rename('grp'), 'col1'], drop=False)
pd.concat([c.xs(k).reindex([1, 5, 10], method='nearest') for k, c in c.groupby(level=0)])
col1 col2
col1
1 1 10
5 5 23
10 11 32
1 2 23
5 6 15
10 10 32
If you don't like that extra col1
in the index, you can rename the index then drop it:
c = df.set_index([df.col1.diff().lt(0).cumsum().rename('grp'), 'col1'], drop=False)
pd.concat([c.xs(k).reindex([1, 5, 10], method='nearest') for k, c in c.groupby(level=0)]) \
.rename_axis(None).reset_index(drop=True)
col1 col2
0 1 10
1 5 23
2 11 32
3 2 23
4 6 15
5 10 32
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With