Pandas using apply lambda with two different operators

Q: How do you use lambda and apply?

Apply a lambda expression to a single column: you can apply a lambda expression to one column of a DataFrame. The following example subtracts 2 from every value in column A: df["A"] = df["A"].apply(lambda x: x - 2).
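A minimal sketch of the single-column case. The DataFrame `df_demo` and its values are made up for illustration and do not come from the question below.

```python
import pandas as pd

# Hypothetical data, for illustration only
df_demo = pd.DataFrame({"A": [10, 20, 30], "B": [1, 2, 3]})

# Subtract 2 from every value in column A; column B is left untouched
df_demo["A"] = df_demo["A"].apply(lambda x: x - 2)
print(df_demo)
```

For plain arithmetic like this, `df_demo["A"] - 2` should give the same result without `apply`; the lambda form is mainly useful when the per-value logic is more involved.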

Q: How do I apply a lambda function to a column in pandas?

We can do this with the apply() function in Pandas. We can use the apply() function to apply the lambda function to both rows and columns of a dataframe. If the axis argument in the apply() function is 0, then the lambda function gets applied to each column, and if 1, then the function gets applied to each row.
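As a rough illustration of the `axis` argument, here is a small sketch on an invented DataFrame (`df_axis` and its column names are assumptions, not part of the question):

```python
import pandas as pd

# Hypothetical data, for illustration only
df_axis = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (the default): the lambda receives each column as a Series
col_ranges = df_axis.apply(lambda col: col.max() - col.min(), axis=0)

# axis=1: the lambda receives each row as a Series
row_sums = df_axis.apply(lambda row: row["a"] + row["b"], axis=1)

print(col_ranges)
print(row_sums)
```

`col_ranges` is indexed by column name and `row_sums` by the original row index.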

Q: Can a lambda function take more than one column?

With the DataFrame.apply() method and a lambda function, the resulting DataFrame can have any number of columns, whereas with transform() the result must have the same length as the input DataFrame.

Q: Can pandas apply return two columns?

Return multiple columns from pandas apply(): the function you pass to apply() can return a Series that contains the new data. Pass axis=1 so that the function is applied to each row of the DataFrame; apply() then returns a DataFrame with one column per entry of the returned Series.
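A short sketch of returning several columns from `apply`. The helper name `sum_and_product` and the data are hypothetical, chosen only to show the pattern.

```python
import pandas as pd

# Hypothetical data, for illustration only
df_multi = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

def sum_and_product(row):
    # Each key of the returned Series becomes a column of the result
    return pd.Series({"sum": row["x"] + row["y"],
                      "product": row["x"] * row["y"]})

# axis=1 calls the function once per row
result = df_multi.apply(sum_and_product, axis=1)
print(result)
```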

Tags: python, pandas, lambda

This question is very similar to one I posted before, with just one change. Instead of using only the absolute difference for all the columns, I also want to compare magnitudes for the 'z' column: if the current z is at least 1.1x the previous z, keep the row.

(more context to the problem)

Pandas using the previous rank values to filter out current row

import pandas as pd

df = pd.DataFrame({
    'rank': [1, 1, 2, 2, 3, 3],
    'x': [0, 3, 0, 3, 4, 2],
    'y': [0, 4, 0, 4, 5, 5],
    'z': [1, 3, 1.2, 3.25, 3, 6],
})
print(df)
#    rank  x  y     z
# 0     1  0  0  1.00
# 1     1  3  4  3.00
# 2     2  0  0  1.20
# 3     2  3  4  3.25
# 4     3  4  5  3.00
# 5     3  2  5  6.00

Here's what I want the output to be

output = pd.DataFrame({
    'rank': [1, 1, 2, 3],
    'x': [0, 3, 0, 2],
    'y': [0, 4, 0, 5],
    'z': [1, 3, 1.2, 6],
})
print(output)
#    rank  x  y    z
# 0     1  0  0  1.0
# 1     1  3  4  3.0
# 2     2  0  0  1.2
# 5     3  2  5  6.0

Basically, I want to remove a row if the previous rank has any row whose x and y are both within ±1 of the current row's x and y AND whose z, multiplied by 1.1, is greater than the current row's z.

So, given the rank 1 rows, ANY row in rank 2 that matches x in [-1, 1], y in [-1, 1], z < 1.1 OR x in [2, 4], y in [3, 5], z < 3.3 should be removed.
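To make the rule concrete, here is a small sketch that checks only the rank 2 rows against the rank 1 rows. The helper `is_dropped` is a hypothetical name, and the strict `<` on z is one reading of the rule as stated in the question; it is not taken from either answer.

```python
import pandas as pd

df_q = pd.DataFrame({
    'rank': [1, 1, 2, 2, 3, 3],
    'x': [0, 3, 0, 3, 4, 2],
    'y': [0, 4, 0, 4, 5, 5],
    'z': [1, 3, 1.2, 3.25, 3, 6],
})

prev = df_q[df_q['rank'] == 1]   # rows that define the windows
curr = df_q[df_q['rank'] == 2]   # rows being tested

def is_dropped(row, prev):
    # Hypothetical helper: True if any previous-rank row "covers" this row
    near_x = (prev['x'] - row['x']).abs() <= 1
    near_y = (prev['y'] - row['y']).abs() <= 1
    small_z = row['z'] < 1.1 * prev['z']
    return bool((near_x & near_y & small_z).any())

flags = curr.apply(lambda row: is_dropped(row, prev), axis=1)
print(flags)
```

Row 2 (z = 1.2) is not below 1.1 × 1, so it stays; row 3 (z = 3.25) is below 1.1 × 3 = 3.3 and sits inside the x/y window of row 1, so it is flagged.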

asked Sep 16 '21 16:09

mike_gundy123

2 Answers

Here's a solution using numpy broadcasting:

# Initially, no row is dropped
df['drop'] = False

for r in range(df['rank'].min(), df['rank'].max()):
    # Find the x_min, x_max, y_min, y_max, z_max of the current rank
    cond = df['rank'] == r
    x, y, z = df.loc[cond, ['x','y','z']].to_numpy().T
    x_min, x_max = x + [[-1], [1]] # use numpy broadcasting to ±1 in one command
    y_min, y_max = y + [[-1], [1]]
    z_max        = z * 1.1

    # Find the x, y, z of the next rank. Raise them one dimension
    # so that we can make a comparison matrix against x_min, x_max, ...
    cond = df['rank'] == r + 1
    if not cond.any():
        continue
    x, y, z = df.loc[cond, ['x','y','z']].to_numpy().T[:, :, None]

    # Condition to drop a row
    drop = (
        (x_min <= x) & (x <= x_max) &
        (y_min <= y) & (y <= y_max) &
        (z <= z_max)
    ).any(axis=1)
    df.loc[cond, 'drop'] = drop

# Result
df[~df['drop']]
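The `x + [[-1], [1]]` step relies on NumPy broadcasting. The sketch below isolates it for the x values of the two rank 1 rows from the question; the names `x_prev`, `x_next` and `in_window` are introduced here purely for illustration.

```python
import numpy as np

x_prev = np.array([0, 3])                 # x of the two rank-1 rows
# (2,) + (2, 1) broadcasts to (2, 2); unpacking splits it by row
x_min, x_max = x_prev + [[-1], [1]]
print(x_min, x_max)

# x of the two rank-2 rows, raised to shape (2, 1)
x_next = np.array([0, 3])[:, None]

# in_window[i, j]: is next-rank row i inside the x window of previous row j?
in_window = (x_min <= x_next) & (x_next <= x_max)
print(in_window)
```

The same shape logic applies to y and z; combining all the conditions and calling `.any(axis=1)` collapses the matrix to one flag per next-rank row.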

Condensed

An even more condensed version (and likely faster). This is a really good way to puzzle your future teammates when they read the code:

r, x, y, z = df[['rank', 'x', 'y', 'z']].T.to_numpy()
rr, xx, yy, zz = [col[:,None] for col in [r, x, y, z]]

drop = (
    (rr == r + 1) &
    (x-1 <= xx) & (xx <= x+1) &
    (y-1 <= yy) & (yy <= y+1) &
    (zz <= z*1.1)
).any(axis=1)

# Result
df[~drop]

This compares every row in df against every other row (including itself) and returns True (i.e. drop) if:

The current row's rank == the other row's rank + 1; and
The current row's x, y, z fall within the specified range of the other row's x, y, z
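To see which pairs of rows trigger a drop, one can inspect the full comparison matrix before collapsing it. This is a sketch built on the condensed code and the question's data; the names `df_c`, `match`, `pairs` and `kept` are introduced here for illustration.

```python
import pandas as pd

df_c = pd.DataFrame({
    'rank': [1, 1, 2, 2, 3, 3],
    'x': [0, 3, 0, 3, 4, 2],
    'y': [0, 4, 0, 4, 5, 5],
    'z': [1, 3, 1.2, 3.25, 3, 6],
})

r, x, y, z = df_c[['rank', 'x', 'y', 'z']].T.to_numpy()
rr, xx, yy, zz = [col[:, None] for col in [r, x, y, z]]

# match[i, j] is True when row i is dropped because of row j
match = (
    (rr == r + 1) &
    (x - 1 <= xx) & (xx <= x + 1) &
    (y - 1 <= yy) & (yy <= y + 1) &
    (zz <= z * 1.1)
)

# (current row, previous-rank row) pairs that satisfy every condition
pairs = [(int(i), int(j)) for i, j in zip(*match.nonzero())]
kept = df_c[~match.any(axis=1)]
print(pairs)
print(kept)
```

Note that the matrix has one cell per pair of rows, so memory use grows quadratically with the number of rows.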

answered Oct 16 '22 03:10

Code Different

You need to slightly modify my previous code:

def check_previous_group(rank, d, groups):
    if rank - 1 not in groups.groups:
        # check if a previous group exists, else flag all rows False (i.e. not to be dropped)
        return pd.Series(False, index=d.index)

    else:
        # get previous group (rank-1)
        d_prev = groups.get_group(rank - 1)

        # get the absolute difference per row with the whole dataset
        # of the previous group: abs(d_prev-s)
        # if all differences are within 1/1/0.1*z for x/y/z
        # for at least one row of the previous group
        # then flag the row to be dropped (True)
        return d.apply(lambda s: abs(d_prev-s)[['x', 'y', 'z']].le([1, 1, .1*s['z']]).all(axis=1).any(), axis=1)

groups = df.groupby('rank')
mask = pd.concat([check_previous_group(rank, d, groups) for rank,d in groups])
df[~mask]

output:

   rank  x  y    z
0     1  0  0  1.0
1     1  3  4  3.0
2     2  0  0  1.2
5     3  2  5  6.0

answered Oct 16 '22 02:10

mozway
