I have a dataset:
            367235  419895  992194
1999-01-11       8       5       1
1999-03-23     NaN       4     NaN
1999-04-30     NaN     NaN       1
1999-06-02     NaN       9     NaN
1999-08-08       2     NaN     NaN
1999-08-12     NaN       3     NaN
1999-08-17     NaN     NaN      10
1999-10-22     NaN       3     NaN
1999-12-04     NaN     NaN       4
2000-03-04       2     NaN     NaN
2000-09-29       9     NaN     NaN
2000-09-30       9     NaN     NaN
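To make the example reproducible, here is a sketch that rebuilds the frame above. It assumes the column labels are strings and that the dates are parsed into a DatetimeIndex; the original frame may store either one differently.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Assumption: dates parsed into a DatetimeIndex, column labels kept as strings.
dates = pd.to_datetime([
    "1999-01-11", "1999-03-23", "1999-04-30", "1999-06-02",
    "1999-08-08", "1999-08-12", "1999-08-17", "1999-10-22",
    "1999-12-04", "2000-03-04", "2000-09-29", "2000-09-30",
])
df = pd.DataFrame(
    {
        "367235": [8, nan, nan, nan, 2, nan, nan, nan, nan, 2, 9, 9],
        "419895": [5, 4, nan, 9, nan, 3, nan, 3, nan, nan, nan, nan],
        "992194": [1, nan, 1, nan, nan, nan, 10, nan, 4, nan, nan, nan],
    },
    index=dates,
)

# Number of recorded incidents (non-NaN values) per column.
print(df.count())
```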
When I plot it using plt.plot(df, '-o'), many of the points are left unconnected, because each column has NaN gaps between its values.
What I would like is for the data points in each column to be connected by a line.
I understand that matplotlib does not connect data points that are separated by NaN values. I looked at the usual options for dealing with missing data, but all of them would essentially misrepresent the data in the DataFrame. Each value in the DataFrame represents an incident, so if I replace the NaNs with scalar values or use interpolate, I get many points that are not actually in my dataset. Here is what I tried with interpolate:
df_wanted2 = df.apply(pd.Series.interpolate)
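One way to see the problem is to count the non-NaN cells before and after that call. This sketch rebuilds the sample frame (assuming a DatetimeIndex) and applies the same interpolation. pandas' default linear interpolation also carries each column's last valid value forward to the end, so every cell ends up filled.

```python
import numpy as np
import pandas as pd

nan = np.nan
dates = pd.to_datetime([
    "1999-01-11", "1999-03-23", "1999-04-30", "1999-06-02",
    "1999-08-08", "1999-08-12", "1999-08-17", "1999-10-22",
    "1999-12-04", "2000-03-04", "2000-09-29", "2000-09-30",
])
df = pd.DataFrame(
    {
        "367235": [8, nan, nan, nan, 2, nan, nan, nan, nan, 2, 9, 9],
        "419895": [5, 4, nan, 9, nan, 3, nan, 3, nan, nan, nan, nan],
        "992194": [1, nan, 1, nan, nan, nan, 10, nan, 4, nan, nan, nan],
    },
    index=dates,
)

df_wanted2 = df.apply(pd.Series.interpolate)

original = df.notna().sum().sum()        # recorded incidents
filled = df_wanted2.notna().sum().sum()  # cells holding a value after interpolation
print(original, filled)  # 14 36
```

The 22 extra values are exactly the points that do not exist in the data.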
If I try to use dropna, I lose entire rows/columns from the DataFrame, and those rows hold valuable data.
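The dropna problem can be checked the same way. With the default how='any', a row survives only if it has no NaN at all, and every column here contains at least one NaN. A sketch on the sample frame (DatetimeIndex assumed):

```python
import numpy as np
import pandas as pd

nan = np.nan
dates = pd.to_datetime([
    "1999-01-11", "1999-03-23", "1999-04-30", "1999-06-02",
    "1999-08-08", "1999-08-12", "1999-08-17", "1999-10-22",
    "1999-12-04", "2000-03-04", "2000-09-29", "2000-09-30",
])
df = pd.DataFrame(
    {
        "367235": [8, nan, nan, nan, 2, nan, nan, nan, nan, 2, 9, 9],
        "419895": [5, 4, nan, 9, nan, 3, nan, 3, nan, nan, nan, nan],
        "992194": [1, nan, 1, nan, nan, nan, 10, nan, 4, nan, nan, nan],
    },
    index=dates,
)

rows_kept = df.dropna()        # drop every row containing a NaN
cols_kept = df.dropna(axis=1)  # drop every column containing a NaN
print(len(rows_kept), cols_kept.shape[1])  # 1 0
```

Only the 1999-01-11 row survives row-wise, and no column survives column-wise.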
Does anyone know a way to connect my dots? I suspect I need to extract the individual columns from the DataFrame and plot them one by one, as suggested elsewhere, but this seems like a lot of work (and my actual DataFrame is much bigger). Does anyone have a solution?
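Use the interpolate method with method='index', which interpolates using the actual index values (here, the dates) instead of treating the rows as equally spaced:
df.index = pd.to_datetime(df.index)  # 'index' needs a numeric or datetime index
df.interpolate('index').plot(marker='o')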
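To see what method='index' changes compared with the default method='linear' (which ignores the index and treats rows as equally spaced), this sketch interpolates the sample frame both ways, assuming a DatetimeIndex. For column 367235, 1999-03-23 lies 71 days into the 209-day gap between 1999-01-11 (value 8) and 1999-08-08 (value 2), but it is only row 1 of 4 in that gap.

```python
import numpy as np
import pandas as pd

nan = np.nan
dates = pd.to_datetime([
    "1999-01-11", "1999-03-23", "1999-04-30", "1999-06-02",
    "1999-08-08", "1999-08-12", "1999-08-17", "1999-10-22",
    "1999-12-04", "2000-03-04", "2000-09-29", "2000-09-30",
])
df = pd.DataFrame(
    {
        "367235": [8, nan, nan, nan, 2, nan, nan, nan, nan, 2, 9, 9],
        "419895": [5, 4, nan, 9, nan, 3, nan, 3, nan, nan, nan, nan],
        "992194": [1, nan, 1, nan, nan, nan, 10, nan, 4, nan, nan, nan],
    },
    index=dates,
)

by_position = df.interpolate()     # method='linear': rows treated as equally spaced
by_index = df.interpolate("index")  # weighted by the actual time gaps

day = pd.Timestamp("1999-03-23")
print(by_position.loc[day, "367235"])            # 6.5   (8 - 6 * 1/4)
print(round(by_index.loc[day, "367235"], 2))     # 5.96  (8 - 6 * 71/209)
```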
Alternative answer: plot each column on its own after dropping its NaNs, iterating with items() (iteritems() was removed in pandas 2.0):
for _, c in df.items():
    c.dropna().plot(marker='o')
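A slightly fuller sketch of the per-column approach, assuming matplotlib is installed and the index is a DatetimeIndex. Passing a shared ax and a label keeps all columns on one set of axes with a legend. The Agg backend line is only there so the sketch runs headless; drop it to get an interactive window.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

nan = np.nan
dates = pd.to_datetime([
    "1999-01-11", "1999-03-23", "1999-04-30", "1999-06-02",
    "1999-08-08", "1999-08-12", "1999-08-17", "1999-10-22",
    "1999-12-04", "2000-03-04", "2000-09-29", "2000-09-30",
])
df = pd.DataFrame(
    {
        "367235": [8, nan, nan, nan, 2, nan, nan, nan, nan, 2, 9, 9],
        "419895": [5, 4, nan, 9, nan, 3, nan, 3, nan, nan, nan, nan],
        "992194": [1, nan, 1, nan, nan, nan, 10, nan, 4, nan, nan, nan],
    },
    index=dates,
)

fig, ax = plt.subplots()
for name, col in df.items():
    # Keep only this column's recorded values, so the line joins real
    # incidents and no invented points get a marker.
    col.dropna().plot(ax=ax, marker="o", label=name)
ax.legend(title="column")
# With an interactive backend, plt.show() displays the figure.
```

Each line object ends up with exactly as many points as its column has recorded incidents.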
Extra credit: interpolate only from the first valid index to the last valid index of each column:
for _, c in df.items():
    fi, li = c.first_valid_index(), c.last_valid_index()
    c.loc[fi:li].interpolate('index').plot(marker='o')
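pandas (0.23 and later) also accepts limit_area='inside' in interpolate, which only fills NaNs that are surrounded by valid values. This sketch compares it with the loop above on the sample frame (DatetimeIndex assumed), collecting the loop's results into a frame instead of plotting them.

```python
import numpy as np
import pandas as pd

nan = np.nan
dates = pd.to_datetime([
    "1999-01-11", "1999-03-23", "1999-04-30", "1999-06-02",
    "1999-08-08", "1999-08-12", "1999-08-17", "1999-10-22",
    "1999-12-04", "2000-03-04", "2000-09-29", "2000-09-30",
])
df = pd.DataFrame(
    {
        "367235": [8, nan, nan, nan, 2, nan, nan, nan, nan, 2, 9, 9],
        "419895": [5, 4, nan, 9, nan, 3, nan, 3, nan, nan, nan, nan],
        "992194": [1, nan, 1, nan, nan, nan, 10, nan, 4, nan, nan, nan],
    },
    index=dates,
)

# One call: fill only the NaNs between each column's first and last valid value.
inside = df.interpolate(method="index", limit_area="inside")

# The per-column loop from the answer, gathered into a frame for comparison.
pieces = {
    name: c.loc[c.first_valid_index():c.last_valid_index()].interpolate("index")
    for name, c in df.items()
}
looped = pd.concat(pieces, axis=1).reindex(df.index)

print(inside.count().tolist())  # [12, 8, 9]
print(np.allclose(inside.to_numpy(), looped.to_numpy(), equal_nan=True))  # True
```

inside.plot(marker='o') then draws the result in one call. Markers still appear at the filled-in dates, since those cells now hold values.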