Consider the DataFrame df:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.ones((10, 10)) * 2,
                  index=list('abcdefghij'), columns=list('ABCDEFGHIJ'))
df
How can I nullify ~20% of these values at random?
You could use numpy.random.choice to generate a mask:
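import numpy as np
# Each cell is independently True with probability 0.2
mask = np.random.choice([True, False], size=df.shape, p=[.2, .8])
# Cells where the mask is True are replaced with NaN
df.mask(mask)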
In one line:
df.mask(np.random.choice([True, False], size=df.shape, p=[.2,.8]))
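Because np.random.choice draws every cell independently, the share of NaN values is only about 20% on average and will vary from run to run. If exactly 20% of the cells must be nullified, one option is to shuffle a mask that contains a fixed number of True entries. A minimal sketch, assuming the df from the question; the names rng, n_null and flat_mask are illustrative, and the seed is arbitrary and only there for reproducibility:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones((10, 10)) * 2,
                  index=list('abcdefghij'), columns=list('ABCDEFGHIJ'))

rng = np.random.default_rng(0)  # arbitrary seed (assumption), for reproducibility

n_null = int(round(df.size * 0.2))         # number of cells to nullify
flat_mask = np.zeros(df.size, dtype=bool)  # start with an all-False mask
flat_mask[:n_null] = True                  # mark exactly n_null cells
rng.shuffle(flat_mask)                     # scatter them at random, in place

result = df.mask(flat_mask.reshape(df.shape))
```

Here result.isna().sum().sum() equals n_null on every run, whereas the probabilistic mask only gets close to it.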
Speed tested using timeit, at ~770 μs per loop:
$ python -m timeit -n 10000 \
    -s "import pandas as pd;import numpy as np;df=pd.DataFrame(np.ones((10,10))*2)" \
    "df.mask(np.random.choice([True,False], size=df.shape, p=[.2,.8]))"
10000 loops, best of 3: 770 usec per loop
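The same kind of measurement can also be taken from inside Python with the standard timeit module. This is a rough sketch, not a reproduction of the figure above: the loop count of 1000 is an arbitrary choice to keep the run short, the names setup, stmt, times and best_usec are illustrative, and the result will depend on the machine and on the pandas and NumPy versions installed.

```python
import timeit

# Same setup and statement as the command-line run above
setup = ("import pandas as pd; import numpy as np; "
         "df = pd.DataFrame(np.ones((10, 10)) * 2)")
stmt = "df.mask(np.random.choice([True, False], size=df.shape, p=[.2, .8]))"

# Three repetitions of 1000 loops each; report the fastest repetition
number = 1000
times = timeit.repeat(stmt, setup=setup, repeat=3, number=number)
best_usec = min(times) / number * 1e6

print(f"{number} loops, best of 3: {best_usec:.0f} usec per loop")
```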