How to plot and work with NaN values in matplotlib

Tags: python, matplotlib

I have hourly data consisting of several columns. The first column is a date (date_log), and the remaining columns contain different sample points. The trouble is that the sample points are logged at different times, even on an hourly basis, so every column has at least a couple of NaNs. If I plot using the first code below, it works nicely, but I want gaps where there is no logger data for a day or so, and I don't want those points to be joined. If I use the second code, I can see the gaps, but because of the NaN points the data points are not joined at all. In the example below, I'm just plotting the first three columns.

When there is a big gap, like in the blue points (01/06-01/07/2015), I want a gap; everywhere else I want the points joined. The second example does not join the points. I like the first chart, but I want gaps like in the second method wherever there are no sample data points for a 24-hour period or so, leaving longer stretches of missing data as gaps.

Is there any workaround? Thanks.

Method-1:

```python
import numpy as np
import matplotlib.pyplot as plt

Log_1a_mask = np.isfinite(Log_1a)  # Log_1a is column 2 data points
Log_1b_mask = np.isfinite(Log_1b)  # Log_1b is column 3 data points

plt.plot_date(date_log[Log_1a_mask], Log_1a[Log_1a_mask], linestyle='-', marker='', color='r')
plt.plot_date(date_log[Log_1b_mask], Log_1b[Log_1b_mask], linestyle='-', marker='', color='b')
plt.show()
```

Method-2:

```python
plt.plot_date(date_log, Log_1a, '-*', markersize=2, markeredgewidth=0, color='r')  # Log_1a contains raw data with NaN
plt.plot_date(date_log, Log_1b, '-*', markersize=2, markeredgewidth=0, color='b')  # Log_1b contains raw data with NaN
plt.show()
```

Method-1 output: [image not available]

Method-2 output: [image not available]


asked Apr 06 '16 15:04

Curtis

1 Answer

If I'm understanding you correctly, you have a dataset with lots of small gaps (single NaNs) that you want filled and larger gaps that you don't.

Using `pandas` to "forward-fill" gaps

One option is to use pandas' `fillna` with `method='ffill'` and a `limit`, which caps how many consecutive NaNs get filled after each valid value.

As a quick example of how this works:

```
In [1]: import pandas as pd; import numpy as np

In [2]: x = pd.Series([1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4])

In [3]: x.fillna(method='ffill', limit=1)
Out[3]:
0     1
1     1
2     2
3     2
4   NaN
5     3
6     3
7   NaN
8   NaN
9     4
dtype: float64

In [4]: x.fillna(method='ffill', limit=2)
Out[4]:
0     1
1     1
2     2
3     2
4     2
5     3
6     3
7     3
8   NaN
9     4
dtype: float64
```
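In recent pandas releases, passing `method=` to `fillna` is deprecated in favor of the dedicated `Series.ffill` method, which accepts the same `limit` argument. A minimal sketch of the equivalent calls, assuming a pandas version where `fillna(method='ffill')` warns or is unavailable (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

x = pd.Series([1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4])

# Same results as x.fillna(method='ffill', limit=1) and limit=2 above
filled_1 = x.ffill(limit=1)
filled_2 = x.ffill(limit=2)
```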

As an example of using this for something similar to your case:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)

x = np.random.normal(0, 1, 1000).cumsum()

# Set every third value to NaN
x[::3] = np.nan

# Set a few bigger gaps...
x[20:100], x[200:300], x[400:450] = np.nan, np.nan, np.nan

# Use pandas with a limited forward fill.
# You may want to adjust `limit` here: this fills at most 2 consecutive NaNs after each valid value.
filled = pd.Series(x).fillna(limit=2, method='ffill')

# Let's plot the results
fig, axes = plt.subplots(nrows=2, sharex=True)
axes[0].plot(x, color='lightblue')
axes[1].plot(filled, color='lightblue')

axes[0].set(ylabel='Original Data')
axes[1].set(ylabel='Filled Data')

plt.show()
```
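Keep in mind that `limit` caps how many NaNs are filled after each valid value, so a long gap still gets its first `limit` points filled. If you would rather leave long gaps completely empty and fill only gaps of at most `limit` NaNs, one option is to measure each NaN run first. This is a sketch under that assumption; `ffill_short_gaps` is a hypothetical helper name, not a pandas function:

```python
import numpy as np
import pandas as pd

def ffill_short_gaps(s, limit):
    """
    Forward-fill only runs of at most `limit` consecutive NaNs;
    longer runs are left entirely NaN.
    """
    is_nan = s.isna()
    # Each valid value starts a new group made of itself plus the NaNs after it
    group = (~is_nan).cumsum()
    # Number of NaNs in each group, broadcast back to every row of that group
    run_length = is_nan.astype(int).groupby(group).transform("sum")
    # Forward-fill everything, then blank out the NaNs that sit in long runs
    return s.ffill().mask(is_nan & (run_length > limit))

# Hypothetical usage on the same toy series as above
x = pd.Series([1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4])
short_only = ffill_short_gaps(x, limit=2)
```

With `limit=2`, the three-NaN run at positions 6 to 8 stays empty instead of being partially filled.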

[Figure: original data (top) and forward-filled data (bottom)]

Using `numpy` to interpolate gaps

Alternatively, we can do this using only numpy. It's possible (and more efficient) to do a "forward fill" identical to the pandas method above, but I'll show another method to give you more options than just repeating values.
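For reference, a numpy-only forward fill with a limit could look something like the sketch below. It assumes a 1-D array and treats any non-finite value as missing (as `np.isfinite` does elsewhere in this answer); `ffill_limited` is a hypothetical name. The trick is `np.maximum.accumulate` over the indices of valid samples, which gives, for every position, the index of the most recent valid value:

```python
import numpy as np

def ffill_limited(values, limit=None):
    """
    Forward-fill NaNs using numpy only, filling at most `limit`
    consecutive NaNs after each valid value (all of them if limit is None).
    """
    values = np.asarray(values, dtype=float)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    # Index of the most recent valid value at or before each position
    # (0 before the first valid value, where values[0] is non-finite itself)
    last_valid = np.maximum.accumulate(np.where(valid, i, 0))
    filled = values[last_valid]  # fancy indexing returns a copy
    if limit is not None:
        # Re-blank positions that are more than `limit` samples past a valid value
        filled[(i - last_valid) > limit] = np.nan
    return filled
```

On the toy series `[1, nan, 2, nan, nan, 3, nan, nan, nan, 4]` this should reproduce the pandas `limit=1` and `limit=2` outputs shown earlier.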

Instead of repeating the last value through the "gap", we can perform linear interpolation of the values in the gap. This is less efficient computationally (and I'm going to make it even less efficient by interpolating everywhere), but for most datasets you won't notice a major difference.

As an example, let's define an `interpolate_gaps` function:

```python
import numpy as np

def interpolate_gaps(values, limit=None):
    """
    Fill gaps using linear interpolation, optionally only fill gaps up to a
    size of `limit`.
    """
    values = np.asarray(values)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    # Linearly interpolate across every NaN first
    filled = np.interp(i, i[valid], values[valid])

    if limit is not None:
        invalid = ~valid
        # Each pass ANDs the mask with a copy of itself shifted by n positions
        for n in range(1, limit+1):
            invalid[:-n] &= invalid[n:]
        # Positions still marked are set back to NaN
        filled[invalid] = np.nan

    return filled
```

Note that, unlike the pandas version above, this gives interpolated values rather than repeated ones:

```
In [11]: values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]

In [12]: interpolate_gaps(values, limit=1)
Out[12]:
array([ 1.        ,  1.5       ,  2.        ,         nan,  2.66666667,
        3.        ,         nan,         nan,  3.75      ,  4.        ])
```
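As the output above shows, `limit` in `interpolate_gaps` does not mean "skip every gap longer than `limit`": index 4 sits in a two-NaN gap, yet it is interpolated with `limit=1`. If the goal is to interpolate only gaps of at most `limit` consecutive NaNs and leave longer ones completely empty, a variant that measures each NaN run could look like this. `interpolate_short_gaps` is a hypothetical name, and the sketch assumes a 1-D input with at least one finite value:

```python
import numpy as np

def interpolate_short_gaps(values, limit):
    """
    Linearly interpolate runs of at most `limit` consecutive NaNs;
    longer runs are left entirely NaN.
    """
    values = np.asarray(values, dtype=float)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    # Interpolate everywhere first (np.interp holds the end values constant,
    # so short runs at the very start or end get the nearest valid value)
    filled = np.interp(i, i[valid], values[valid])

    # Locate each NaN run: +1 marks a run start, -1 marks one past its end
    edges = np.diff(np.concatenate(([0], (~valid).astype(int), [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)

    # Blank out every run that is longer than `limit`
    for start, end in zip(starts, ends):
        if end - start > limit:
            filled[start:end] = np.nan
    return filled
```

Swapped into the plotting example (`filled = interpolate_short_gaps(x, limit=2)`), the isolated every-third-value NaNs are interpolated while the large gaps stay completely empty.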

In the plotting example, if we replace the line:

```python
filled = pd.Series(x).fillna(limit=2, method='ffill')
```

With:

```python
filled = interpolate_gaps(x, limit=2)
```

We get a nearly identical plot:

[Figure: original data (top) and interpolated data (bottom)]

As a complete, stand-alone example:

```python
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)

def interpolate_gaps(values, limit=None):
    """
    Fill gaps using linear interpolation, optionally only fill gaps up to a
    size of `limit`.
    """
    values = np.asarray(values)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    filled = np.interp(i, i[valid], values[valid])

    if limit is not None:
        invalid = ~valid
        for n in range(1, limit+1):
            invalid[:-n] &= invalid[n:]
        filled[invalid] = np.nan

    return filled

x = np.random.normal(0, 1, 1000).cumsum()

# Set every third value to NaN
x[::3] = np.nan

# Set a few bigger gaps...
x[20:100], x[200:300], x[400:450] = np.nan, np.nan, np.nan

# Interpolate small gaps using numpy
filled = interpolate_gaps(x, limit=2)

# Let's plot the results
fig, axes = plt.subplots(nrows=2, sharex=True)
axes[0].plot(x, color='lightblue')
axes[1].plot(filled, color='lightblue')

axes[0].set(ylabel='Original Data')
axes[1].set(ylabel='Filled Data')

plt.show()
```
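Both approaches above count missing samples. Since the question's data are timestamped and the requirement is phrased in time (no sample data for about 24 hours), another option is to drop the NaNs as in Method-1 and then force a break wherever consecutive valid samples are more than a chosen time apart. matplotlib does not draw a line segment through a NaN, so inserting one NaN inside each long gap is enough. A minimal sketch, assuming the timestamps can be converted to a numpy `datetime64` array; `break_long_gaps` and the 24-hour default are illustrative choices, not part of the original answer:

```python
import numpy as np

def break_long_gaps(times, values, max_gap=np.timedelta64(24, "h")):
    """
    Drop NaN samples, then insert a NaN wherever two consecutive valid samples
    are more than `max_gap` apart, so a plotted line breaks only at long gaps.
    """
    times = np.asarray(times, dtype="datetime64[s]")
    values = np.asarray(values, dtype=float)

    # Keep only finite samples (same idea as Method-1 in the question)
    keep = np.isfinite(values)
    t, v = times[keep], values[keep]

    # Indices (into t) of samples that follow a gap longer than max_gap
    breaks = np.flatnonzero(np.diff(t) > max_gap) + 1

    # Insert a NaN before each of those samples; its timestamp repeats the
    # previous valid time, which is enough for matplotlib to break the line
    t_out = np.insert(t, breaks, t[breaks - 1])
    v_out = np.insert(v, breaks, np.nan)
    return t_out, v_out

# Example: hourly data with one missing hour and a roughly two-day outage
times = np.array(["2015-06-01T00", "2015-06-01T01", "2015-06-01T02",
                  "2015-06-03T00", "2015-06-03T01"], dtype="datetime64[h]")
values = np.array([1.0, np.nan, 3.0, 4.0, 5.0])
t_plot, v_plot = break_long_gaps(times, values)
```

Plotting the result as usual, e.g. `plt.plot(t_plot, v_plot, '-r')`, bridges the one-hour hole but leaves the two-day outage as a gap; `max_gap` is the knob to tune.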

Note: I originally completely misread the question. See the version history for my original answer.


answered Sep 28 '22 02:09

Joe Kington
