I need to create a column on a dask dataframe based on a condition. In pandas it is fairly straightforward:
ddf['TEST_VAR'] = ['THIS' if x == 200607 else
                   'NOT THIS' if x == 200608 else
                   'THAT' if x == 200609 else 'NONE'
                   for x in ddf['shop_week']]
In dask, however, I have to do the same thing like this:
def f(x):
    if x == 200607:
        y = 'THIS'
    elif x == 200608:
        y = 'THAT'
    else:
        y = 1
    return y
ddf1 = ddf.assign(col1 = list(ddf.shop_week.apply(f).compute()))
ddf1.compute()
Questions:
Answers:
What you're doing now is almost OK, but you don't need to call compute
until you're ready for your final answer:
# ddf1 = ddf.assign(col1 = list(ddf.shop_week.apply(f).compute()))
ddf1 = ddf.assign(col1 = ddf.shop_week.apply(f))
In some cases dd.Series.where might be a good fit:
ddf1 = ddf.assign(col1 = ddf.shop_week.where(cond=ddf.balance > 0, other=0))
As of version 0.10.2 you can insert columns directly into dask dataframes:
ddf['col'] = ddf.shop_week.apply(f)
You could just use:
f = lambda x: 'THIS' if x == 200607 else 'NOT THIS' if x == 200608 else 'THAT' if x == 200609 else 'NONE'
And then:
ddf1 = ddf.assign(col1 = list(ddf.shop_week.apply(f).compute()))
Unfortunately I don't have an answer to the second question or I don't understand it...