I have a data frame (data_train) with NaN values. A sample is given below:
republican n y
republican n NaN
democrat NaN n
democrat n y
I want to replace all the NaN values with random values, like this:
republican n y
republican n rnd2
democrat rnd1 n
democrat n y
How do I do it?
I tried the following, but had no luck:
df_rand = pd.DataFrame(np.random.randn(data_train.shape[0],data_train.shape[1]))
data_train[pd.isnull(data_train)] = df_rand[pd.isnull(data_train)]
When I do the above with a DataFrame of random numerical data, the script works fine.
We can replace NaN with an empty string using the df.replace() function. It puts an empty string in place of each NaN value.
Using the dropna() method, you can drop rows containing NaN (Not a Number) or None values from a pandas DataFrame. By default it returns a copy of the DataFrame with those rows removed. To modify the existing DataFrame instead, pass inplace=True.
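As a rough illustration of the two alternatives just mentioned, here is a minimal sketch. The frame and its column names (party, vote1, vote2) are made up for the example and only loosely follow the sample in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame, loosely modeled on the question's data.
df = pd.DataFrame({
    "party": ["republican", "republican", "democrat", "democrat"],
    "vote1": ["n", "n", np.nan, "n"],
    "vote2": ["y", np.nan, "n", "y"],
})

# replace() returns a new DataFrame; df itself keeps its NaNs.
cleaned = df.replace(np.nan, "")

# dropna() also returns a copy, here without the rows that contain NaN.
dropped = df.dropna()

# inplace=True modifies the frame it is called on instead of returning a copy.
df_inplace = df.copy()
df_inplace.dropna(inplace=True)
```

Neither approach produces random replacements, so they only help if blanks or fewer rows are acceptable.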
You can use the pandas update command, this way:
1) Generate a random DataFrame with the same columns and index as the original one:
import numpy as np
import pandas as pd
M = len(df.index)
N = len(df.columns)
ran = pd.DataFrame(np.random.randn(M,N), columns=df.columns, index=df.index)
2) Then use update with overwrite=False, so that only the NaN values in df are replaced by the generated random values (by default, update would overwrite every value in df):
df.update(ran, overwrite=False)
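To see what update does to cells that already hold a value, the following sketch compares the default behaviour with overwrite=False. The small frame, the seed, and the variable names are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame with a few missing entries.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 5.0, 6.0]})

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
ran = pd.DataFrame(rng.standard_normal(df.shape),
                   columns=df.columns, index=df.index)

# Default (overwrite=True): every cell is replaced by the value from ran.
clobbered = df.copy()
clobbered.update(ran)

# overwrite=False: only cells that are NaN in the original take a value from ran.
filled = df.copy()
filled.update(ran, overwrite=False)
```

In this sketch, filled keeps 1.0, 3.0, 5.0 and 6.0 where they were, while clobbered ends up equal to ran.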
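In the above example I used values drawn from a standard normal distribution, but you can also use values randomly picked from the original DataFrame:
import numpy as np
import pandas as pd
M = len(df.index)
N = len(df.columns)
# Flatten all values and keep only the non-NaN ones (requires numeric data)
val = np.ravel(df.values)
val = val[~np.isnan(val)]
# Draw an M x N grid of replacements from the observed values
val = np.random.choice(val, size=(M,N))
ran = pd.DataFrame(val, columns=df.columns, index=df.index)
# overwrite=False leaves the existing non-NaN values in df untouched
df.update(ran, overwrite=False)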
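The sampling code above relies on np.isnan, which raises a TypeError on object arrays such as the y/n strings in the question. One possible variant, sketched here under the assumption that every column has at least one non-missing value, samples per column using pandas' own isna. The frame and column names are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical string-valued frame, loosely modeled on the question's data.
df = pd.DataFrame({
    "party": ["republican", "republican", "democrat", "democrat"],
    "vote1": ["n", "n", np.nan, "n"],
    "vote2": ["y", np.nan, "n", "y"],
})

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
filled = df.copy()

for col in filled.columns:
    missing = filled[col].isna()
    if missing.any():
        # Pool of observed values in this column (assumed to be non-empty).
        pool = filled.loc[~missing, col].to_numpy()
        # Draw one replacement per missing cell, independently.
        filled.loc[missing, col] = rng.choice(pool, size=missing.sum())
```

Sampling per column keeps each replacement plausible for its column, so a vote column is never filled with a party name.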
If you pass a single randomly generated number to fillna to fill the NaNs, the random generator runs only once, so every NaN is filled with the same number.
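A small sketch of that pitfall, using a made-up Series: the random number is computed once, before fillna is called, and then reused for every missing entry.

```python
import random

import numpy as np
import pandas as pd

# Hypothetical Series with three missing entries.
s = pd.Series([np.nan, 2.0, np.nan, np.nan])

# random.random() is evaluated a single time; fillna receives one scalar.
same = s.fillna(random.random())

# All formerly missing positions now hold that one scalar.
all_equal = same[0] == same[2] == same[3]
```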
So make sure that a new random number is generated for each missing value. For a dataframe like this:
Date A B
0 2015-01-01 NaN NaN
1 2015-01-02 NaN NaN
2 2015-01-03 NaN NaN
3 2015-01-04 NaN NaN
4 2015-01-05 NaN NaN
5 2015-01-06 NaN NaN
6 2015-01-07 NaN NaN
7 2015-01-08 NaN NaN
8 2015-01-09 NaN NaN
9 2015-01-10 NaN NaN
10 2015-01-11 NaN NaN
11 2015-01-12 NaN NaN
12 2015-01-13 NaN NaN
13 2015-01-14 NaN NaN
14 2015-01-15 NaN NaN
15 2015-01-16 NaN NaN
I used the following code to fill the NaNs in column A (note that it overwrites every value in the column, which is fine here because they are all NaN):
import random
x['A'] = x['A'].apply(lambda v: random.random() * 1000)
Which will give us something like:
Date A B
0 2015-01-01 96.538211 NaN
1 2015-01-02 404.683392 NaN
2 2015-01-03 849.614253 NaN
3 2015-01-04 590.030660 NaN
4 2015-01-05 203.167519 NaN
5 2015-01-06 980.508258 NaN
6 2015-01-07 221.088002 NaN
7 2015-01-08 285.013762 NaN
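If column A also contained real values that should be kept, one option is to assign random numbers only where the column is missing. This is a minimal sketch with a made-up four-row frame. The 0 to 1000 range mirrors the code above, and the seed is only there for repeatability.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column A has one real value and three NaNs.
x = pd.DataFrame({
    "Date": pd.date_range("2015-01-01", periods=4),
    "A": [np.nan, 10.0, np.nan, np.nan],
})

rng = np.random.default_rng(0)
missing = x["A"].isna()

# One independent draw per missing row; existing values are left alone.
x.loc[missing, "A"] = rng.random(missing.sum()) * 1000
```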