I have a DataFrame (data_train) with NaN values. A sample is given below:
republican    n      y
republican    n      NaN
democrat      NaN    n
democrat      n      y
I want to replace each NaN with a random value (rnd1 and rnd2 below), like this:
republican    n      y
republican    n      rnd2
democrat      rnd1   n
democrat      n      y
How do I do it?
I tried the following, but had no luck:
import numpy as np; import pandas as pd
df_rand = pd.DataFrame(np.random.randn(data_train.shape[0], data_train.shape[1]))
data_train[pd.isnull(data_train)] = df_rand[pd.isnull(data_train)]
When I do the above with a DataFrame containing random numerical data, the script works fine.
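A likely reason the attempt fails: df_rand is built with the default column labels 0, 1, 2, which do not match data_train's column names. Pandas aligns frames by label, so it finds nothing to copy. That would also explain why it works on a purely numeric frame whose columns are already labelled 0, 1, 2. Below is a minimal sketch, assuming column names invented for illustration (party, vote1, vote2 are hypothetical). It builds the random frame with the same index and columns and uses DataFrame.mask to take the random value only where the original is NaN:

```python
import numpy as np
import pandas as pd

# Sample data shaped like the question's; the column names are hypothetical
data_train = pd.DataFrame(
    {
        "party": ["republican", "republican", "democrat", "democrat"],
        "vote1": ["n", "n", np.nan, "n"],
        "vote2": ["y", np.nan, "n", "y"],
    }
)

# Give the random frame the SAME index and columns as data_train,
# so pandas can align the two frames cell by cell.
df_rand = pd.DataFrame(
    np.random.randn(*data_train.shape),
    index=data_train.index,
    columns=data_train.columns,
)

# Where data_train is NaN, take the value from df_rand; elsewhere keep the original.
filled = data_train.mask(data_train.isna(), df_rand)
print(filled)
```

data_train.fillna(df_rand) should give the same result, since fillna also accepts a DataFrame and aligns on index and columns.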
We can replace NaN with an empty string using the df.replace() function, which puts an empty string in place of each NaN value.
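A short sketch of that call on a made-up two-row frame (the column names are illustrative only). Note that replace returns a new DataFrame unless inplace=True is passed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"party": ["republican", "democrat"], "vote": [np.nan, "y"]})

# Returns a new DataFrame in which every NaN becomes an empty string
df_empty = df.replace(np.nan, "")
print(df_empty)
```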
By using the dropna() method you can drop rows containing NaN (Not a Number) and None values from a pandas DataFrame. Note that by default it returns a copy of the DataFrame with those rows removed. If you want to remove them from the existing DataFrame, use inplace=True.
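A small illustration of the copy-versus-inplace behaviour, using a made-up frame in which one row holds NaN and another holds None:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

# Default: returns a new DataFrame without rows containing NaN/None; df is untouched
cleaned = df.dropna()
print(len(df), len(cleaned))  # 3 1

# inplace=True modifies df itself and returns None
df.dropna(inplace=True)
print(len(df))  # 1
```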
You can use the pandas update method, like this:
1) Generate a random DataFrame with the same columns and index as the original one:
import numpy as np; import pandas as pd
M = len(df.index)
N = len(df.columns)
ran = pd.DataFrame(np.random.randn(M,N), columns=df.columns, index=df.index)
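2) Then use update with overwrite=False, so that only the NaN values in df are replaced by the generated random values (with the default overwrite=True, every value in df would be overwritten):
df.update(ran, overwrite=False)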
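To see which cells update touches, here is a self-contained sketch on a small numeric frame (its values are made up for illustration) comparing the default overwrite=True with overwrite=False. Both calls work on copies, because update modifies the frame in place and returns None:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 5.0, 6.0]})
M, N = df.shape
ran = pd.DataFrame(np.random.randn(M, N), columns=df.columns, index=df.index)

# Default overwrite=True: every cell is replaced by ran's (non-NaN) values
all_replaced = df.copy()
all_replaced.update(ran)

# overwrite=False: only the cells that are NaN in df take ran's values
only_nan = df.copy()
only_nan.update(ran, overwrite=False)
print(only_nan)
```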
In the above example I used values from a standard normal distribution, but you can also use values randomly picked from the original DataFrame:
import numpy as np; import pandas as pd
M = len(df.index)
N = len(df.columns)
val = np.ravel(df.values)
val = val[~pd.isnull(val)]  # pd.isnull also handles string data; np.isnan raises TypeError on it
val = np.random.choice(val, size=(M,N))
ran = pd.DataFrame(val, columns=df.columns, index=df.index)
df.update(ran, overwrite=False)  # fill only the NaN cells
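The snippet above samples from the whole DataFrame, so with the question's data a party name such as "republican" could end up in a vote column. A per-column variant (an adjustment, not the answer's own code) draws each missing cell only from values already observed in the same column. Minimal sketch, assuming hypothetical column names and a seeded generator for repeatability:

```python
import numpy as np
import pandas as pd

# Question-shaped sample; column names are hypothetical
data_train = pd.DataFrame(
    {
        "party": ["republican", "republican", "democrat", "democrat"],
        "vote1": ["n", "n", np.nan, "n"],
        "vote2": ["y", np.nan, "n", "y"],
    }
)

rng = np.random.default_rng(0)
filled = data_train.copy()
for col in filled.columns:
    missing = filled[col].isna()
    observed = filled.loc[~missing, col].to_numpy()
    if missing.any() and len(observed) > 0:
        # One random draw (with replacement) from this column's observed values per NaN
        filled.loc[missing, col] = rng.choice(observed, size=missing.sum())
print(filled)
```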
Well, if you use fillna to fill the NaNs, the random generator is called only once, so every NaN is filled with the same number.
So make sure that a new random number is generated for each value. For a DataFrame like this:
          Date         A       B
0   2015-01-01       NaN     NaN
1   2015-01-02       NaN     NaN
2   2015-01-03       NaN     NaN
3   2015-01-04       NaN     NaN
4   2015-01-05       NaN     NaN
5   2015-01-06       NaN     NaN
6   2015-01-07       NaN     NaN
7   2015-01-08       NaN     NaN
8   2015-01-09       NaN     NaN
9   2015-01-10       NaN     NaN
10  2015-01-11       NaN     NaN
11  2015-01-12       NaN     NaN
12  2015-01-13       NaN     NaN
13  2015-01-14       NaN     NaN
14  2015-01-15       NaN     NaN
15  2015-01-16       NaN     NaN
I used the following code to fill up the NaNs in column A:
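import random
import pandas as pd
# Draw a fresh random number for each missing value; keep values that are already present
x['A'] = x['A'].apply(lambda v: random.random() * 1000 if pd.isnull(v) else v)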
This gives something like:
          Date           A       B
0   2015-01-01   96.538211     NaN
1   2015-01-02  404.683392     NaN
2   2015-01-03  849.614253     NaN
3   2015-01-04  590.030660     NaN
4   2015-01-05  203.167519     NaN
5   2015-01-06  980.508258     NaN
6   2015-01-07  221.088002     NaN
7   2015-01-08  285.013762     NaN
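A small sketch of both the pitfall and a vectorised alternative, on a made-up frame where column A has one value already present. The alternative (passing a Series of random numbers to fillna, which aligns on the index) is a swapped-in technique rather than the answer's apply call:

```python
import random

import numpy as np
import pandas as pd

x = pd.DataFrame({"A": [np.nan] * 5 + [1.5], "B": [np.nan] * 6})

# Pitfall: random.random() is evaluated once, so every NaN gets the same number
same = x["A"].fillna(random.random() * 1000)

# Alternative: one random number per row, aligned on the index; only NaN cells are filled
rng = np.random.default_rng(0)
per_row = pd.Series(rng.random(len(x)) * 1000, index=x.index)
x["A"] = x["A"].fillna(per_row)
print(x)
```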