
Plot pandas dataframe containing NaNs

I have GPS data of ice speed from three different GPS receivers. The data are in a pandas DataFrame indexed by Julian day (counted incrementally from the start of 2009).

This is a subset of the data (the main dataset is 3487235 rows...):

                     R2          R7         R8
1235.000000  116.321959  100.805197  96.519977
1235.000116         NaN  100.771133  96.234957
1235.000231         NaN  100.584559  97.249262
1235.000347  118.823610  100.169055  96.777833
1235.000463         NaN   99.753551  96.598350
1235.000579         NaN   99.338048  95.283989
1235.000694  113.995003   98.922544  95.154067

The dataframe has form:

Index: 6071320 entries, 127.67291667 to 1338.51805556
Data columns:
R2    3487235  non-null values
R7    3875864  non-null values
R8    1092430  non-null values
dtypes: float64(3)

R2 is sampled at a different rate from R7 and R8, hence the NaNs, which appear systematically at that spacing.

Trying df.plot() to plot the whole dataframe (or indexed row locations thereof) works fine in terms of plotting R7 and R8, but doesn't plot R2. Similarly, just doing df.R2.plot() also doesn't work. The only way to plot R2 is to do df.R2.dropna().plot(), but this also removes NaNs which signify periods of no data (rather than just a coarser sampling frequency than the other receivers).

Has anyone else come across this? Any ideas on the problem would be gratefully received :)

ajt asked Nov 28 '12 10:11



1 Answer

The reason you're not seeing anything is that the default plot style is a line only. The line is interrupted at NaNs, so only runs of two or more consecutive values get drawn, and that never happens in your case. You need to change the plotting style, which depends on what you want to see.
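To see why this bites R2 in particular, here is a small sketch using the seven sample rows from the question. The helper `count_line_segments` is made up for illustration (it is not a pandas function); it counts adjacent pairs of non-NaN values, which is roughly what a line-only plot is able to connect.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the index is the Julian day.
df = pd.DataFrame(
    {
        "R2": [116.321959, np.nan, np.nan, 118.823610, np.nan, np.nan, 113.995003],
        "R7": [100.805197, 100.771133, 100.584559, 100.169055,
               99.753551, 99.338048, 98.922544],
    },
    index=[1235.000000, 1235.000116, 1235.000231, 1235.000347,
           1235.000463, 1235.000579, 1235.000694],
)


def count_line_segments(series):
    """Hypothetical helper: count adjacent pairs of non-NaN values."""
    valid = series.notna().to_numpy()
    # A segment can only be drawn between two neighbouring valid samples.
    return int(np.sum(valid[:-1] & valid[1:]))


segments_r2 = count_line_segments(df["R2"])  # every R2 value is isolated by NaNs
segments_r7 = count_line_segments(df["R7"])  # R7 has no NaNs in this sample
print(segments_r2, segments_r7)
```

On this sample R2 has no drawable segments at all, while R7 has one between every pair of rows, which matches what you see in the plot.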

For starters, try adding:

.plot(marker='o') 

That should make all data points appear as circles. It easily gets cluttered, so adjusting markersize, markeredgecolor, etc. might be useful. I'm not fully used to how pandas uses matplotlib, so I often switch to matplotlib myself when plots get more complicated, e.g.:

import matplotlib.pyplot as plt
plt.plot(df.R2.index, df.R2, 'o-')  # the index here is a float Julian day, so it is passed as-is
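As a rough sketch of the marker approach on the question's sample R2 values: the `Agg` backend, the axis labels and the specific marker settings below are my own assumptions, so adjust to taste.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# R2 sample values from the question, indexed by Julian day.
r2 = pd.Series(
    [116.321959, np.nan, np.nan, 118.823610, np.nan, np.nan, 113.995003],
    index=[1235.000000, 1235.000116, 1235.000231, 1235.000347,
           1235.000463, 1235.000579, 1235.000694],
    name="R2",
)

fig, ax = plt.subplots()
# Markers make isolated samples visible; the line itself still breaks at every NaN.
r2.plot(ax=ax, marker="o", markersize=3, markeredgecolor="none", linestyle="-")
ax.set_xlabel("Julian day")    # assumed label
ax.set_ylabel("R2 ice speed")  # assumed label, units not given in the question
```

Because the NaNs are left in place, longer stretches of missing data still show up as stretches without markers, rather than being bridged as they would be after dropna().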
Rutger Kassies answered Oct 04 '22 04:10
