Alternative to nested np.where in Pandas DataFrame

I have this code (which works): a set of nested np.where calls that sets the value in the 'paragenesis1' row of a DataFrame (myOxides['cpx']), depending on the values in various other rows of the frame.

I'm very new to Python and to programming in general. I think I should write a function to do this, but how would I then apply that function elementwise? Nesting np.where is the only way I have found to avoid the 'truth value of a Series is ambiguous' error.

Any help greatly appreciated!

myOxides['cpx'].loc['paragenesis1'] = np.where(
            ((cpxCrOx>=0.5) & (cpxAlOx<=4)),
            "GtPeridA", 
            np.where(
                    ((cpxCrOx>=2.25) & (cpxAlOx<=5)), 
                    "GtPeridB", 
                    np.where(
                            ((cpxCrOx>=0.5)&
                             (cpxCrOx<=2.25)) &
                             ((cpxAlOx>=4) & (cpxAlOx<=6)),
                             "SpLhzA",
                             np.where(
                                     ((cpxCrOx>=0.5) &
                                      (cpxCrOx<=(5.53125 - 
                                                 0.546875 * cpxAlOx))) &
                                      ((cpxAlOx>=4) & 
                                       (cpxAlOx <= ((cpxCrOx - 
                                                     5.53125)/ -0.546875))),
                             "SpLhzB",
                             "Eclogite, Megacryst, Cognate"))))

Or, in generic form:

df.loc['a'] = np.where(
    some_condition,
    "value",
    np.where(
        condition_1 & condition_2,
        "some_value",
        np.where(
            condition_3 & condition_4,
            "some_other_value",
            np.where(
                condition_5,
                "another_value",
                "other_value"))))

K. Mather asked Mar 13 '18 09:03

1 Answer

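One possible solution is to use numpy.select, which takes a list of boolean masks and a matching list of values and, for each element, picks the value of the first mask that is True: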

m1 = (cpxCrOx>=0.5) & (cpxAlOx<=4)
m2 = (cpxCrOx>=2.25) & (cpxAlOx<=5)
m3 = ((cpxCrOx>=0.5) & (cpxCrOx<=2.25)) & ((cpxAlOx>=4) & (cpxAlOx<=6))
m4 = ((cpxCrOx>=0.5) & (cpxCrOx<=(5.53125 - 0.546875 * cpxAlOx))) & \
     ((cpxAlOx>=4) & (cpxAlOx <= ((cpxCrOx - 5.53125) / -0.546875)))

vals = [ "GtPeridA", "GtPeridB", "SpLhzA", "SpLhzB"]
default = 'Eclogite, Megacryst, Cognate'

myOxides['paragenesis1'] = np.select([m1,m2,m3,m4], vals, default=default)

jezrael answered Sep 19 '22 06:09