Trying to replicate Pandas functionality in Databricks-Koalas
In Pandas:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [450, 1, 26],
                   'b': [1, 450, 70]})
thresh = [x for x in range(26)]  # create a list 0 to 25
# flag column 'c' with 1 where either 'a' or 'b' falls within the threshold
df["c"] = np.where(df.a.isin(thresh) | df.b.isin(thresh), 1, 0)
df
# returns
Out[32]:
     a    b  c
0  450    1  1
1    1  450  1
2   26   70  0
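For contrast, here is a quick check in plain pandas of what np.where actually hands to the DataFrame: an ordinary NumPy array, not a Series. pandas accepts that array in assign because the whole frame already lives in local memory. This sketch uses pandas and NumPy only and does not exercise Koalas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [450, 1, 26], "b": [1, 450, 70]})
thresh = list(range(26))  # 0 to 25

# np.where converts the boolean Series to a NumPy array and returns an ndarray
flags = np.where(df.a.isin(thresh) | df.b.isin(thresh), 1, 0)
print(type(flags))

# pandas accepts a positional NumPy array as a new column
df = df.assign(c=flags)
print(df["c"].tolist())
```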
In Koalas:
import databricks.koalas as ks
import numpy as np

df = ks.DataFrame({'a': [450, 1, 26],
                   'b': [1, 450, 70]})
thresh = [x for x in range(26)]  # create a list 0 to 25
# attempt to flag column 'c' the same way as in pandas
df = df.assign(c=np.where(df.a.isin(thresh) | df.b.isin(thresh), 1, 0))
# raises
PandasNotImplementedError: The method `pd.Series.__iter__()` is not implemented. If you want to collect your data as an NumPy array, use 'to_numpy()' instead.
How do I properly use to_numpy(), as the error message suggests, or wrap the NumPy result in a ks.Series() so that assign() accepts it?
Wrapping it, as in df = df.assign(c=ks.Series(np.where((df.a.isin(thresh) | df.b.isin(thresh)), 1, 0))), gives the same error as above.
Is there a way to replicate the pandas functionality in Koalas?
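To perform this operation on a ks.DataFrame you don't need np.where at all. np.where tries to turn the Koalas Series into a local NumPy array, which is what triggers the __iter__ error. The expression (df.a.isin(thresh) | df.b.isin(thresh)) is already a boolean Koalas Series, so casting it with astype(int) gives the 1/0 flags without leaving Koalas:
df = df.assign(c=(df.a.isin(thresh) | df.b.isin(thresh)).astype(int))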
df
     a    b  c
0  450    1  1
1    1  450  1
2   26   70  0
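The same idea generalizes to any number of columns. The sketch below wraps it in a hypothetical helper, flag_any_in (the name is illustrative, not part of pandas or Koalas). It relies only on isin, the | operator and astype, the same operations the astype answer uses, so it should also work on a ks.DataFrame; it has only been checked against pandas here.

```python
import operator
from functools import reduce

import pandas as pd


def flag_any_in(df, columns, values):
    """Hypothetical helper: return an int Series that is 1 where any of
    `columns` contains a value from `values`, else 0.

    Uses only Series.isin, the | operator and astype, so no NumPy
    conversion of the column data is needed. `columns` must be non-empty.
    """
    masks = [df[col].isin(values) for col in columns]
    # Combine the per-column boolean masks with element-wise OR
    combined = reduce(operator.or_, masks)
    return combined.astype(int)


df = pd.DataFrame({"a": [450, 1, 26], "b": [1, 450, 70]})
thresh = list(range(26))  # 0 to 25
df = df.assign(c=flag_any_in(df, ["a", "b"], thresh))
print(df)
```

On a Koalas frame the call would look the same, e.g. flag_any_in(kdf, ["a", "b"], thresh); that path has not been verified against Spark here.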