This question is very similar to one I posted before, with just one change. Instead of only taking the absolute difference for all the columns, I also want a magnitude difference test for the 'z' column: if the current z is more than 1.1 times the previous z, then keep the row.
(More context on the problem: Pandas using the previous rank values to filter out current row)
import pandas as pd

df = pd.DataFrame({
    'rank': [1, 1, 2, 2, 3, 3],
    'x': [0, 3, 0, 3, 4, 2],
    'y': [0, 4, 0, 4, 5, 5],
    'z': [1, 3, 1.2, 3.25, 3, 6],
})
print(df)
#    rank  x  y     z
# 0     1  0  0  1.00
# 1     1  3  4  3.00
# 2     2  0  0  1.20
# 3     2  3  4  3.25
# 4     3  4  5  3.00
# 5     3  2  5  6.00
Here's what I want the output to be:
output = pd.DataFrame({
    'rank': [1, 1, 2, 3],
    'x': [0, 3, 0, 2],
    'y': [0, 4, 0, 5],
    'z': [1, 3, 1.2, 6],
}, index=[0, 1, 2, 5])  # keep the original row labels of the surviving rows
print(output)
#    rank  x  y    z
# 0     1  0  0  1.0
# 1     1  3  4  3.0
# 2     2  0  0  1.2
# 5     3  2  5  6.0
Basically, I want to remove a row if the previous rank has any row whose x and y are both within ±1 of it AND whose z, multiplied by 1.1, is greater than the row's z.
So, using the rank 1 rows, I want to remove ANY row in rank 2 that has x in [-1, 1], y in [-1, 1] and z < 1.1, OR x in [2, 4], y in [3, 5] and z < 3.3.
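The rule can be written as a plain Python predicate, which makes it easy to check the ranges by hand before reaching for pandas. This is a minimal sketch: `Point` and `should_drop` are hypothetical names, and it assumes "within ±1" is inclusive and that the z test is current z <= 1.1 × previous z (the answers below use the same inclusive comparisons).

```python
from typing import NamedTuple


class Point(NamedTuple):
    # Hypothetical container for one row's x, y, z values
    x: float
    y: float
    z: float


def should_drop(cur: Point, prev: Point) -> bool:
    """True if `cur` is close to `prev`: x and y within ±1 (inclusive)
    and cur.z no more than 1.1 times prev.z."""
    return (
        abs(cur.x - prev.x) <= 1
        and abs(cur.y - prev.y) <= 1
        and cur.z <= 1.1 * prev.z
    )


# Rank 1 and rank 2 rows from the example DataFrame
rank1 = [Point(0, 0, 1.0), Point(3, 4, 3.0)]
rank2 = [Point(0, 0, 1.2), Point(3, 4, 3.25)]

# A rank 2 row is dropped if it is close to ANY rank 1 row
flags = [any(should_drop(p, q) for q in rank1) for p in rank2]
# flags -> [False, True]: 1.2 > 1.1 * 1.0 survives, 3.25 <= 1.1 * 3.0 is dropped
```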
Here's a solution using numpy broadcasting:
# Initially, no row is dropped
df['drop'] = False
# Assumes ranks are consecutive integers
for r in range(df['rank'].min(), df['rank'].max()):
    # Find the x_min, x_max, y_min, y_max, z_max of the current rank
    cond = df['rank'] == r
    x, y, z = df.loc[cond, ['x', 'y', 'z']].to_numpy().T
    x_min, x_max = x + [[-1], [1]]  # use numpy broadcasting to ±1 in one command
    y_min, y_max = y + [[-1], [1]]
    z_max = z * 1.1

    # Find the x, y, z of the next rank. Raise them one dimension
    # so that we can make a comparison matrix against x_min, x_max, ...
    cond = df['rank'] == r + 1
    if not cond.any():
        continue
    x, y, z = df.loc[cond, ['x', 'y', 'z']].to_numpy().T[:, :, None]

    # Condition to drop a row
    drop = (
        (x_min <= x) & (x <= x_max) &
        (y_min <= y) & (y <= y_max) &
        (z <= z_max)
    ).any(axis=1)
    df.loc[cond, 'drop'] = drop

# Result (without the helper 'drop' column)
df[~df['drop']].drop(columns='drop')
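The two broadcasting steps in the loop are the least obvious part, so here is a small standalone NumPy sketch of just the x comparison, using made-up arrays. Adding `[[-1], [1]]` to a 1-D array gives a two-row array that unpacks into lower and upper bounds; adding a trailing axis to the next rank's values (the loop does this for x, y and z at once with `.T[:, :, None]`, turning shape (3, m) into (3, m, 1)) makes every next-rank value compare against every current-rank bound.

```python
import numpy as np

# x values of two rows in the current rank (made-up numbers)
x = np.array([0, 3])

# (2,) + (2, 1) broadcasts to shape (2, 2): row 0 is x - 1, row 1 is x + 1,
# so unpacking along the first axis yields the lower and upper bounds
x_min, x_max = x + [[-1], [1]]
# x_min -> [-1, 2], x_max -> [1, 4]

# x values of two rows in the next rank, given a trailing axis: shape (2, 1)
x_next = np.array([4, 2])[:, None]

# (2, 1) against (2,) broadcasts to a (2, 2) matrix where entry [i, j] says
# whether next-rank row i lies within ±1 of current-rank row j
within = (x_min <= x_next) & (x_next <= x_max)
# within -> [[False, True], [False, True]]

# A next-rank row matches if it is close to at least one current-rank row
matches = within.any(axis=1)
# matches -> [True, True]
```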
An even more condensed version (and likely faster). This is a really good way to puzzle your future teammates when they read the code:
r, x, y, z = df[['rank', 'x', 'y', 'z']].T.to_numpy()
rr, xx, yy, zz = [col[:, None] for col in [r, x, y, z]]
drop = (
    (rr == r + 1) &
    (x - 1 <= xx) & (xx <= x + 1) &
    (y - 1 <= yy) & (yy <= y + 1) &
    (zz <= z * 1.1)
).any(axis=1)
# Result
df[~drop]
What this does is compare every row in df against every other row (including itself) and return True (i.e. drop) if its rank == the other row's rank + 1, and its x, y, z fall within the specified range of the other row's x, y, z (x and y within ±1, z <= 1.1 × the other row's z).
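To see which row caused each drop, you can keep the full row-by-row matrix instead of reducing it with `.any(axis=1)` right away. This sketch reuses the condensed code on the example data and lists each (dropped row, matching previous-rank row) pair with `np.argwhere`; the name `pairs` is just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'rank': [1, 1, 2, 2, 3, 3],
    'x': [0, 3, 0, 3, 4, 2],
    'y': [0, 4, 0, 4, 5, 5],
    'z': [1, 3, 1.2, 3.25, 3, 6],
})

r, x, y, z = df[['rank', 'x', 'y', 'z']].T.to_numpy()
rr, xx, yy, zz = [col[:, None] for col in [r, x, y, z]]

# pairs[i, j] is True when row i is one rank above row j and close to it
pairs = (
    (rr == r + 1)
    & (x - 1 <= xx) & (xx <= x + 1)
    & (y - 1 <= yy) & (yy <= y + 1)
    & (zz <= z * 1.1)
)

# Report each match using the DataFrame's index labels
for i, j in np.argwhere(pairs):
    print(f"row {df.index[i]} dropped because of row {df.index[j]}")
# row 3 dropped because of row 1
# row 4 dropped because of row 3
```

Because `pairs` holds one entry for every pair of rows, its memory use grows quadratically with `len(df)`, so the per-rank loop above may be the safer choice for large frames.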
You need to slightly modify my previous code:
def check_previous_group(rank, d, groups):
    # check if a previous group exists, else flag all rows False (i.e. not to be dropped)
    if rank - 1 not in groups.groups:
        return pd.Series(False, index=d.index)
    else:
        # get previous group (rank-1)
        d_prev = groups.get_group(rank - 1)
        # get the absolute difference per row with the whole dataset
        # of the previous group: abs(d_prev-s)
        # if all differences are within 1/1/0.1*z for x/y/z
        # for at least one row of the previous group
        # then flag the row to be dropped (True)
        return d.apply(lambda s: abs(d_prev - s)[['x', 'y', 'z']].le([1, 1, .1 * s['z']]).all(axis=1).any(), axis=1)

groups = df.groupby('rank')
mask = pd.concat([check_previous_group(rank, d, groups) for rank, d in groups])
df[~mask]
output:
   rank  x  y    z
0     1  0  0  1.0
1     1  3  4  3.0
2     2  0  0  1.2
5     3  2  5  6.0
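Note that the z test in this answer, `abs(z_prev - z) <= 0.1 * z`, is symmetric: a row whose z is much smaller than the previous rank's z is kept, and the cutoff is 0.1 × the current z rather than 1.1 × the previous z. On the example data both give the same result, but if you want the one-sided rule from the question (drop when current z <= 1.1 × previous z), a minimal variant could look like this. `check_previous_group_one_sided` is a hypothetical name, and the bounds are assumed inclusive.

```python
import pandas as pd

df = pd.DataFrame({
    'rank': [1, 1, 2, 2, 3, 3],
    'x': [0, 3, 0, 3, 4, 2],
    'y': [0, 4, 0, 4, 5, 5],
    'z': [1, 3, 1.2, 3.25, 3, 6],
})


def check_previous_group_one_sided(rank, d, groups):
    # No previous rank: flag every row of this group as "keep" (False)
    if rank - 1 not in groups.groups:
        return pd.Series(False, index=d.index)
    d_prev = groups.get_group(rank - 1)

    def is_close(s):
        # True if some previous-rank row has x and y within ±1 of this row
        # AND this row's z is at most 1.1 times that row's z
        return (
            ((d_prev['x'] - s['x']).abs() <= 1)
            & ((d_prev['y'] - s['y']).abs() <= 1)
            & (s['z'] <= 1.1 * d_prev['z'])
        ).any()

    return d.apply(is_close, axis=1)


groups = df.groupby('rank')
mask = pd.concat([check_previous_group_one_sided(rank, d, groups) for rank, d in groups])
print(df[~mask])  # rows 0, 1, 2 and 5 remain
```

The two tests differ for a row like rank 2 with z = 1.0 under a rank 1 row with the same x, y and z = 3.0: the original answer keeps it (|3 - 1| > 0.1), while this variant drops it (1.0 <= 3.3).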