
After Pandas Dataframe pd.concat I get NaNs

I have three pandas DataFrames. One of them has been row-shifted, so its first element is empty. When I concatenate the three to obtain a single three-column DataFrame, I get all NaN in two of the three columns:

df1:

                    S
2010-12-31         True
2011-01-01        False
2011-01-02        False

df2:

               P
2010-12-31           
2011-01-01    On
2011-01-02    On

df3:

              C
2010-12-31    On
2011-01-01    On
2011-01-02    On

res = pd.concat([df1, df2, df3]):

                    P         C           S
2010-12-31        NaN        NaN         True
2011-01-01        NaN        NaN        False
2011-01-02        NaN        NaN        False

The order seems to be inverted as well...

Many thanks

prre72 asked Mar 20 '14 11:03


1 Answer

In [1]: import pandas as pd

In [2]: index = pd.DatetimeIndex(['2010-12-31', '2011-01-01', '2011-01-02'])

In [3]: df1 = pd.DataFrame({'S':[True,False,False]}, index=index)

In [4]: df2 = pd.DataFrame({'P':['','On','On']}, index=index)

In [5]: df3 = pd.DataFrame({'C':['On','On','On']}, index=index)

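If your DataFrames are defined as above, then pd.concat with axis=1 should work. By default pd.concat uses axis=0, which stacks the frames on top of each other and fills the columns a frame does not have with NaN; axis=1 places the columns side by side, aligned on the index: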

In [7]: pd.concat([df1,df2,df3], axis=1)
Out[7]: 
                S   P   C
2010-12-31   True      On
2011-01-01  False  On  On
2011-01-02  False  On  On

[3 rows x 3 columns]
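
To see where the NaNs in the question likely come from, here is a minimal sketch comparing the default axis=0 with axis=1. It assumes the frames are built exactly as in the session above (the question does not show how the original frames were created, so the real data may differ), and the names `stacked` and `side_by_side` are only for illustration:

```python
import pandas as pd

# Same frames as in the session above (an assumption about the asker's data).
index = pd.DatetimeIndex(['2010-12-31', '2011-01-01', '2011-01-02'])
df1 = pd.DataFrame({'S': [True, False, False]}, index=index)
df2 = pd.DataFrame({'P': ['', 'On', 'On']}, index=index)
df3 = pd.DataFrame({'C': ['On', 'On', 'On']}, index=index)

# Default axis=0: the frames are stacked vertically, so each row only has
# a value in the column of the frame it came from; the rest are NaN.
stacked = pd.concat([df1, df2, df3])
print(stacked.shape)               # 9 rows, 3 columns
print(stacked['P'].isna().sum())   # rows from df1 and df3 have no 'P'

# axis=1: the frames are aligned on the shared index and joined column-wise.
side_by_side = pd.concat([df1, df2, df3], axis=1)
print(side_by_side.shape)          # 3 rows, 3 columns
print(side_by_side.isna().sum().sum())  # no NaN; '' is an empty string, not NaN
```

With axis=1 the columns also come out in the order the frames were passed, which should address the inverted order mentioned in the question.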

unutbu answered Sep 27 '22 02:09