
Create an if-else condition column in dask dataframe

I need to create a column based on a condition in a dask dataframe. In pandas it is fairly straightforward:

ddf['TEST_VAR'] = ['THIS' if x == 200607 else  
              'NOT THIS' if x == 200608 else 
              'THAT' if x == 200609 else 'NONE'  
              for x in ddf['shop_week'] ]

In dask, I have to do the same thing like this:

def f(x):
    if x == 200607:
        y = 'THIS'
    elif x == 200608:
        y = 'THAT'
    else:
        y = 1
    return y

ddf1 = ddf.assign(col1 = list(ddf.shop_week.apply(f).compute()))
ddf1.compute()

Questions:

  1. Is there a better/more straightforward way to achieve it?
  2. I can't modify the first dataframe ddf; I need to create ddf1 to see the changes. Is a dask dataframe an immutable object?
Puneet Tripathi asked Jul 27 '16 09:07



2 Answers

Answers:

  1. What you're doing now is almost ok. You don't need to call compute until you're ready for your final answer.

    # ddf1 = ddf.assign(col1 = list(ddf.shop_week.apply(f).compute()))
    ddf1 = ddf.assign(col1 = ddf.shop_week.apply(f))
    

    For some cases dd.Series.where might be a good fit:

    ddf1 = ddf.assign(col1 = ddf.shop_week.where(cond=ddf.balance > 0, other=0))
    
  2. As of version 0.10.2 you can now insert columns directly into dask.dataframes:

    ddf['col'] = ddf.shop_week.apply(f)
    
MRocklin answered Sep 28 '22 02:09


You could just use:

f = lambda x: 'THIS' if x == 200607 else 'NOT THIS' if x == 200608 else 'THAT' if x == 200609 else 'NONE'

And then:

ddf1 = ddf.assign(col1 = list(ddf.shop_week.apply(f).compute()))

Unfortunately I don't have an answer to the second question or I don't understand it...

Ohumeronen answered Sep 28 '22 03:09