
NaN values when new column added to pandas DataFrame

I'm trying to add a new column to a pandas DataFrame whose values come from another pandas DataFrame. When I create the new column, all of its values are NaN.

First I use an API call to get some data. The resulting 'mydata' DataFrame is a single column of data indexed by dates:

mydata = Quandl.get(["YAHOO/INDEX_MXX.4"],
                    trim_start="2001-04-01", trim_end="2014-03-31",
                    collapse="monthly")

The next DataFrame comes from a CSV, loaded with the following code. It contains many columns of data and has the same number of rows as 'mydata':

DWDATA = pandas.DataFrame.from_csv("filename",
                                   header=0,
                                   sep=',',
                                   index_col=0,
                                   parse_dates=True,
                                   infer_datetime_format=True)

I then try to generate the new column like this:

DWDATA['MXX'] = mydata.iloc[:,0] 

Again, I just get NaN values. Can someone help me understand why this happens and how to resolve it? From what I've read, it looks like something might be wrong with my indexes. The index of each DataFrame consists of dates, but 'mydata' has end-of-month dates while 'DWDATA' has beginning-of-month dates.

gtnbz2nyt asked Oct 06 '14 17:10




1 Answer

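When you assign a Series to a DataFrame column, pandas aligns the values on index labels. Because the two indexes are not exactly equal (end-of-month versus beginning-of-month dates), no labels match and NaNs result. One or both of the indexes must be changed to match. Example: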

mydata = mydata.set_index(DWDATA.index) 

The above will change the index of the 'mydata' DataFrame to match the index of the 'DWDATA' DataFrame.
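To see the effect in isolation, here is a minimal sketch with made-up data. The three-row frames, the "Close" and "other" column names, and the values are assumptions standing in for the Quandl and CSV data; only the index mismatch mirrors the question.

```python
import pandas as pd

# Hypothetical stand-ins for the real data: three monthly rows each.
# 'mydata' is indexed by end-of-month dates, 'DWDATA' by start-of-month dates.
mydata = pd.DataFrame(
    {"Close": [100.0, 101.5, 99.8]},
    index=pd.to_datetime(["2001-04-30", "2001-05-31", "2001-06-30"]),
)
DWDATA = pd.DataFrame(
    {"other": [1, 2, 3]},
    index=pd.to_datetime(["2001-04-01", "2001-05-01", "2001-06-01"]),
)

# Column assignment aligns on index labels; no label matches, so all NaN.
misaligned = DWDATA.copy()
misaligned["MXX"] = mydata.iloc[:, 0]
all_nan = misaligned["MXX"].isna().all()

# Give 'mydata' the same index as 'DWDATA' (relies on equal length and order).
mydata_fixed = mydata.set_index(DWDATA.index)
aligned = DWDATA.copy()
aligned["MXX"] = mydata_fixed.iloc[:, 0]

print(all_nan)
print(aligned)
```

Note that set_index here simply relabels the rows of 'mydata' in their existing order, so it should only be used when both frames cover the same months in the same sequence.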

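Since the number of rows is exactly equal for the two DataFrames, you can also just pass the values of 'mydata' to the new 'DWDATA' column. The underlying array carries no index, so the values are assigned by position; this assumes both DataFrames are in the same date order: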

DWDATA['MXX'] = mydata.iloc[:,0].values 
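A rough sketch of the positional approach on made-up data follows, next to one possible alternative that is not part of the original answer: converting both indexes to monthly periods with to_period("M") so that rows are matched by calendar month instead of by position. The sample frames and column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data with the same index mismatch as in the question.
mydata = pd.DataFrame(
    {"Close": [100.0, 101.5, 99.8]},
    index=pd.to_datetime(["2001-04-30", "2001-05-31", "2001-06-30"]),
)
DWDATA = pd.DataFrame(
    {"other": [1, 2, 3]},
    index=pd.to_datetime(["2001-04-01", "2001-05-01", "2001-06-01"]),
)

# Option 1 (from the answer): copy by position, ignoring the index entirely.
by_position = DWDATA.copy()
by_position["MXX"] = mydata.iloc[:, 0].values

# Option 2 (an alternative, not from the answer): match rows by calendar month.
monthly = mydata.iloc[:, 0].copy()
monthly.index = monthly.index.to_period("M")      # 2001-04-30 -> 2001-04
wanted_months = DWDATA.index.to_period("M")       # 2001-04-01 -> 2001-04
by_month = DWDATA.copy()
by_month["MXX"] = monthly.reindex(wanted_months).values

print(by_position)
print(by_month)
```

The month-based variant does not depend on the two frames being in the same order, and a month missing from 'mydata' would show up as NaN in that row only, which may be easier to diagnose than a silent positional shift.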
gtnbz2nyt answered Sep 21 '22 14:09