Python Pandas Dataframe fill NaN values

Tags: python, random, pandas, dataframe, nan

I am trying to fill NaN values in a dataframe with values coming from a standard normal distribution. This is currently my code:

 import numpy as np
 import pandas as pd
 import pandas.io.sql as psql

 sqlStatement = "select * from sn.clustering_normalized_dataset"
 df = psql.frame_query(sqlStatement, cnx)  # cnx is an open database connection
 data = df.pivot("user", "phrase", "tfw")
 dfrand = pd.DataFrame(data=np.random.randn(data.shape[0], data.shape[1]))
 data[np.isnan(data)] = dfrand[np.isnan(data)]

After pivoting, the dataframe 'data' looks like this:

phrase      aaron  abbas  abdul       abe  able  abroad       abu     abuse  \
user                                                                          
14233664      NaN    NaN    NaN       NaN   NaN     NaN       NaN       NaN   
52602716      NaN    NaN    NaN       NaN   NaN     NaN       NaN       NaN   
123456789     NaN    NaN    NaN       NaN   NaN     NaN       NaN       NaN   
500158258     NaN    NaN    NaN       NaN   NaN     NaN       NaN       NaN   
517187571     0.4    NaN    NaN  0.142857     1     0.4  0.181818       NaN

However, I need each NaN value to be replaced with a new random value. So I created a new dataframe consisting only of random values (dfrand) and then tried to replace the missing values (NaN) with the values from dfrand at the positions of the NaNs. Unfortunately, it doesn't work. Although the expression

 np.isnan(data)

returns a dataframe consisting of True and False values, the expression

  dfrand[np.isnan(data)]

returns only NaN values, so the overall trick doesn't work. Any ideas what the issue is?

asked Dec 16 '14 14:12

user4045430

1 Answer

Three thousand columns is not that many. How many rows do you have? You could always make a random dataframe of the same size and do a logical replacement (the size of your dataframe will dictate whether this is feasible or not).

If you know the size of your dataframe:

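import pandas as pd
import numpy as np

# create random dummy dataframe (rows and cols are the known dimensions)
dfrand = pd.DataFrame(data=np.random.randn(rows,cols))

# import "real" dataframe
data = pd.read_csv(...) # placeholder: read it in however you choose

# give dfrand the same labels as data, because the boolean mask is aligned on labels
dfrand.index, dfrand.columns = data.index, data.columns

# replace nans
data[np.isnan(data)] = dfrand[np.isnan(data)]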

If you do not know the size of your dataframe, just reorder the steps:

import pandas as pd
import numpy as np

# import "real" dataframe
data = pd.read_csv(...) # placeholder: read it in however you choose

# create random dummy dataframe with the same shape and the same labels as data
dfrand = pd.DataFrame(data=np.random.randn(data.shape[0],data.shape[1]),
                      index=data.index, columns=data.columns)

# replace nans
data[np.isnan(data)] = dfrand[np.isnan(data)]
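
For a version that can be run end to end, here is a minimal sketch. The small frame below is only a hypothetical stand-in for the pivoted table in the question (users as index, phrases as columns), and the seed is there just to make the run repeatable. The point it illustrates is that the random frame is built with the same index and columns as data, so the boolean mask lines up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the pivoted table: users as index, phrases as columns.
data = pd.DataFrame(
    {"aaron": [np.nan, 0.4], "abbas": [np.nan, np.nan], "able": [np.nan, 1.0]},
    index=pd.Index([14233664, 517187571], name="user"),
)

# Random frame with the SAME labels as data, so the mask aligns cell by cell.
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
dfrand = pd.DataFrame(
    rng.standard_normal(data.shape), index=data.index, columns=data.columns
)

mask = data.isna()          # True where a value is missing
data[mask] = dfrand[mask]   # fill only the missing cells
```

After this runs, the cells that held real values (0.4 and 1.0 here) are untouched and every former NaN holds its own random draw.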

EDIT: Per the user's last comment: "dfrand[np.isnan(data)] returns NaN only."

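Right! On its own, dfrand[np.isnan(data)] is NaN everywhere except at the positions where "data" is NaN, and those positions are the only ones the assignment uses. In my solution I have: data[np.isnan(data)] = dfrand[np.isnan(data)]. Translated, this means: take the randomly-generated value from dfrand that corresponds to the NaN-location within "data" and insert it in "data" where "data" is NaN. This assumes dfrand and "data" carry the same index and column labels, since pandas aligns the mask on labels. An example will help: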

a = pd.DataFrame(data=np.random.randint(0,100,(10,3)))
a[0] = a[0].astype(float)  # NaN needs a float column
a.loc[5, 0] = np.nan       # .loc instead of the chained a[0][5]

In [32]: a
Out[32]: 
    0   1   2
0   2  26  28
1  14  79  82
2  89  32  59
3  65  47  31
4  29  59  15
5 NaN  58  90
6  15  66  60
7  10  19  96
8  90  26  92
9   0  19  23

# define randomly-generated dataframe, much like what you are doing, and replace NaN's
b = pd.DataFrame(data=np.random.randint(0,100,(10,3)))

In [39]: b
Out[39]: 
    0   1   2
0  92  21  55
1  65  53  89
2  54  98  97
3  48  87  79
4  98  38  62
5  46  16  30
6  95  39  70
7  90  59   9
8  14  85  37
9  48  29  46


a[np.isnan(a)] = b[np.isnan(a)]

In [38]: a
Out[38]: 
    0   1   2
0   2  26  28
1  14  79  82
2  89  32  59
3  65  47  31
4  29  59  15
5  46  58  90
6  15  66  60
7  10  19  96
8  90  26  92
9   0  19  23

As you can see, all NaN's in a have been replaced with the randomly-generated values in b, based on a's NaN-value indices.
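
The example works because a and b share the same default labels (0, 1, 2, ...). A pivoted table does not: its index holds user ids and its columns hold phrases. The sketch below uses a small made-up frame, not the asker's data, to show what happens in that case and one positional way around it. The positional route also draws only as many random numbers as there are NaNs, which may help if the frame is large.

```python
import numpy as np
import pandas as pd

# Made-up frame with non-default labels, similar in spirit to a pivot result.
data = pd.DataFrame(
    {"abe": [np.nan, 0.142857], "abu": [np.nan, 0.181818]},
    index=[14233664, 517187571],
)

# Default labels (0, 1, ...) share nothing with data's labels.
dfrand = pd.DataFrame(np.random.randn(*data.shape))
picked = dfrand[np.isnan(data)]            # mask is aligned on labels first
all_nan = bool(picked.isna().all().all())  # no label matched, so nothing is picked

# Positional alternative: work on the underlying array and ignore labels.
values = data.to_numpy(dtype=float, copy=True)
nan_mask = np.isnan(values)
values[nan_mask] = np.random.randn(int(nan_mask.sum()))  # one draw per NaN
filled = pd.DataFrame(values, index=data.index, columns=data.columns)
```

Here picked comes out entirely NaN, which matches what the question describes, while filled keeps the original labels and has no missing values left.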

answered Oct 24 '22 05:10

tnknepp
