I have three pandas DataFrames. One of them has been shifted by one row, so its first element is empty. When I concatenate the three to obtain a single three-column DataFrame, two of the three columns come out as all NaN:
df1:
                S
2010-12-31   True
2011-01-01  False
2011-01-02  False
df2:
             P
2010-12-31
2011-01-01  On
2011-01-02  On
df3:
             C
2010-12-31  On
2011-01-01  On
2011-01-02  On
res = pd.concat([df1, df2, df3]):
              P    C      S
2010-12-31  NaN  NaN   True
2011-01-01  NaN  NaN  False
2011-01-02  NaN  NaN  False
The order seems to be inverted as well...
Many thanks
In applied data science, you will usually have missing data. For example, an industrial application with sensors will have sensor data that is missing on certain days. You have a couple of alternatives to work with missing data.
Using concat(), you can combine similar datasets from Series, DataFrame and Panel objects within the pandas library. A Series can appear as a list, array or sequence of data objects and uses the syntax pd.Series() to declare the dataset. A DataFrame object displays tabular data types in rows and columns and uses pd.
Deleting the row with missing data: if a row has missing data, you can delete the entire row, along with all the features in that row. With dropna(), axis=0 drops the rows that contain `NaN` values and axis=1 drops the columns that contain `NaN` values.
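As a minimal sketch of the row and column deletion described above, assuming the axis arguments refer to pandas' DataFrame.dropna: the sensor column names and readings below are invented for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings; names and values are made up for illustration.
readings = pd.DataFrame(
    {
        "temperature": [21.5, np.nan, 22.1],
        "pressure": [1.01, 1.02, 1.03],
    },
    index=pd.DatetimeIndex(["2010-12-31", "2011-01-01", "2011-01-02"]),
)

# axis=0 (the default) drops every row that contains at least one NaN.
rows_dropped = readings.dropna(axis=0)

# axis=1 drops every column that contains at least one NaN.
cols_dropped = readings.dropna(axis=1)

print(rows_dropped)
print(cols_dropped)
```

Here the row for 2011-01-01 disappears from rows_dropped, while cols_dropped keeps all three dates but loses the temperature column.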
In [1]: import pandas as pd
In [2]: index = pd.DatetimeIndex(['2010-12-31', '2011-01-01', '2011-01-02'])
In [3]: df1 = pd.DataFrame({'S':[True,False,False]}, index=index)
In [4]: df2 = pd.DataFrame({'P':['','On','On']}, index=index)
In [5]: df3 = pd.DataFrame({'C':['On','On','On']}, index=index)
If your DataFrames are defined as above, then pd.concat with axis=1 should work:
In [7]: pd.concat([df1,df2,df3], axis=1)
Out[7]:
                S   P   C
2010-12-31   True      On
2011-01-01  False  On  On
2011-01-02  False  On  On

[3 rows x 3 columns]
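One likely reason for the NaN columns in the question is that pd.concat defaults to axis=0, which stacks the frames on top of each other and fills any column a frame does not have with NaN. The sketch below compares the two calls, assuming the three frames share the same index as defined above; the names stacked and side_by_side are only illustrative.

```python
import pandas as pd

index = pd.DatetimeIndex(["2010-12-31", "2011-01-01", "2011-01-02"])
df1 = pd.DataFrame({"S": [True, False, False]}, index=index)
df2 = pd.DataFrame({"P": ["", "On", "On"]}, index=index)
df3 = pd.DataFrame({"C": ["On", "On", "On"]}, index=index)

# Default axis=0: the frames are stacked vertically (3 + 3 + 3 rows),
# and each frame contributes NaN to the columns it does not have.
stacked = pd.concat([df1, df2, df3])

# axis=1: the frames are placed side by side, aligned on the shared index.
side_by_side = pd.concat([df1, df2, df3], axis=1)

print(stacked.shape)       # (9, 3)
print(side_by_side.shape)  # (3, 3)
```

If the indexes of the frames did not match exactly, axis=1 would still produce NaN wherever a date is missing from one of them, so it is worth comparing df1.index, df2.index and df3.index when the result looks wrong.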