 

Remove NaN values from dataframe without fillna or Interpolate

I have a dataset:

            367235  419895  992194
1999-01-11       8       5       1
1999-03-23     NaN       4     NaN
1999-04-30     NaN     NaN       1
1999-06-02     NaN       9     NaN
1999-08-08       2     NaN     NaN
1999-08-12     NaN       3     NaN
1999-08-17     NaN     NaN      10
1999-10-22     NaN       3     NaN
1999-12-04     NaN     NaN       4
2000-03-04       2     NaN     NaN
2000-09-29       9     NaN     NaN
2000-09-30       9     NaN     NaN
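To experiment with this data, the table can be rebuilt as a DataFrame. This is a sketch that assumes the column labels are the integer IDs from the header and that the index holds dates.

```python
import numpy as np
import pandas as pd

nan = np.nan
# The question's table; integer column labels are an assumption.
df = pd.DataFrame(
    {
        367235: [8, nan, nan, nan, 2, nan, nan, nan, nan, 2, 9, 9],
        419895: [5, 4, nan, 9, nan, 3, nan, 3, nan, nan, nan, nan],
        992194: [1, nan, 1, nan, nan, nan, 10, nan, 4, nan, nan, nan],
    },
    index=pd.to_datetime([
        "1999-01-11", "1999-03-23", "1999-04-30", "1999-06-02",
        "1999-08-08", "1999-08-12", "1999-08-17", "1999-10-22",
        "1999-12-04", "2000-03-04", "2000-09-29", "2000-09-30",
    ]),
)

# Number of observed (non-NaN) incidents per column.
print(df.count())
```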

When I plot it using plt.plot(df, '-o'), I get this:

[Image: output from plotting the dataframe]

But what I would like is for the data points from each column to be connected by a line, like so:

[Image: desired output from plotting the dataframe]

I understand that matplotlib does not connect data points that are separated by NaN values. I looked at all the options here for dealing with missing data, but all of them would misrepresent the data in the dataframe. Each value in the dataframe represents an incident, so if I replace the NaNs with scalar values or use the interpolate option, I get a bunch of points that are not actually in my dataset. Here's what interpolate looks like:

df_wanted2 = df.apply(pd.Series.interpolate)
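A quick way to see the problem with df.apply(pd.Series.interpolate), which applies the default linear method column by column: on column 367235 alone, interpolation turns 5 observed incidents into 12 plotted values. The sketch below uses only that column, copied from the table above.

```python
import numpy as np
import pandas as pd

# Column 367235 from the question, as a Series on its date index.
s = pd.Series(
    [8, np.nan, np.nan, np.nan, 2, np.nan, np.nan, np.nan, np.nan, 2, 9, 9],
    index=pd.to_datetime([
        "1999-01-11", "1999-03-23", "1999-04-30", "1999-06-02",
        "1999-08-08", "1999-08-12", "1999-08-17", "1999-10-22",
        "1999-12-04", "2000-03-04", "2000-09-29", "2000-09-30",
    ]),
)

filled = s.interpolate()                 # default method='linear': rows treated as equally spaced
observed = s.count()                     # real incidents in the column
invented = filled.count() - observed     # values that interpolation made up
print(observed, invented)
```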


If I use dropna, I lose entire rows/columns of the dataframe, and those rows hold valuable data.
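The loss is easy to quantify on the table above: every row except 1999-01-11 contains a NaN, and every column contains at least one, so a frame-wide dropna removes almost everything. A minimal sketch, rebuilding the table with integer column labels (an assumption):

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    {
        367235: [8, nan, nan, nan, 2, nan, nan, nan, nan, 2, 9, 9],
        419895: [5, 4, nan, 9, nan, 3, nan, 3, nan, nan, nan, nan],
        992194: [1, nan, 1, nan, nan, nan, 10, nan, 4, nan, nan, nan],
    },
    index=pd.to_datetime([
        "1999-01-11", "1999-03-23", "1999-04-30", "1999-06-02",
        "1999-08-08", "1999-08-12", "1999-08-17", "1999-10-22",
        "1999-12-04", "2000-03-04", "2000-09-29", "2000-09-30",
    ]),
)

rows_kept = df.dropna()        # drops every row that contains any NaN
cols_kept = df.dropna(axis=1)  # drops every column that contains any NaN
print(len(rows_kept), cols_kept.shape[1])  # prints: 1 0
```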

Does anyone know a way to connect my dots? I suspect I need to extract individual arrays from the dataframe and plot them, as is the advice given here, but this seems like a lot of work (and my actual dataframe is much bigger). Does anyone have a solution?
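For reference, the per-column extraction is short with plain matplotlib: for each column, keep only the rows where it is not NaN and plot those. This sketch uses a hypothetical two-column frame (columns "a" and "b") and the non-interactive Agg backend so it runs without a display; with real data you would drop the matplotlib.use call and finish with plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in both columns.
df = pd.DataFrame(
    {"a": [8, np.nan, 2, np.nan, 9], "b": [np.nan, 4, np.nan, 3, np.nan]},
    index=pd.date_range("2000-01-01", periods=5, freq="D"),
)

fig, ax = plt.subplots()
for name in df.columns:
    col = df[name].to_numpy(dtype=float)
    mask = ~np.isnan(col)  # True where this column has a real observation
    # Plot only the observed points, so the line joins them directly.
    ax.plot(df.index[mask], col[mask], "-o", label=str(name))
ax.legend()
```

Because each column is plotted from its own masked arrays, no values are invented and no NaN sits between consecutive points to break the line.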

Asked Dec 20 '16 by oymonk




1 Answer

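Use the interpolate method with method 'index', which interpolates along the actual index values (here, the dates) instead of treating the rows as equally spaced: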

df.interpolate('index').plot(marker='o')
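The difference between the default method='linear' and method='index' matters here because the dates are unevenly spaced. A sketch on a hypothetical three-point series: linear places the missing value halfway between its neighbours, while 'index' weights it by the distance in time. Note that plotting the interpolated frame with marker='o' also draws markers on the filled-in rows.

```python
import numpy as np
import pandas as pd

# Hypothetical series: one gap, with the second neighbour 10 days away.
s = pd.Series(
    [0.0, np.nan, 10.0],
    index=pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-11"]),
)

by_position = s.interpolate()            # 'linear': ignores the index, uses row positions
by_index = s.interpolate(method="index") # uses the actual spacing of the dates
print(by_position.iloc[1], by_index.iloc[1])
```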


alternative answer

plot each column separately after dropping its NaNs

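for _, c in df.items():  # DataFrame.iteritems() was removed in pandas 2.0
    c.dropna().plot(marker='o')  # plot only this column's observed points, joined by a line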



extra credit
only interpolate between each column's first and last valid index, so trailing NaNs are not filled with the last observed value

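for _, c in df.items():  # items() replaces the removed iteritems()
    fi, li = c.first_valid_index(), c.last_valid_index()  # labels of first/last non-NaN values
    c.loc[fi:li].interpolate('index').plot(marker='o')  # interpolate only inside that span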
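What the trimming changes, shown on a hypothetical column with a leading and a trailing NaN: plain interpolation leaves the leading NaN alone but fills the trailing one with the last observed value, whereas slicing from first_valid_index() to last_valid_index() first keeps the interpolation strictly between observations.

```python
import numpy as np
import pandas as pd

# Hypothetical column with NaNs before the first and after the last observation.
c = pd.Series(
    [np.nan, 4.0, np.nan, 8.0, np.nan],
    index=pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-03",
                          "2000-01-05", "2000-01-09"]),
)

untrimmed = c.interpolate(method="index")       # trailing NaN becomes 8.0
fi, li = c.first_valid_index(), c.last_valid_index()
trimmed = c.loc[fi:li].interpolate(method="index")  # only the gap between 4.0 and 8.0 is filled
print(untrimmed.tolist())
print(trimmed.tolist())
```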


Answered Oct 05 '22 by piRSquared