
Fill in missing datetimes with NaN or suppress the straight line in a line plot

I'm having trouble getting my plot to look the way I want using matplotlib. I have aggregated data (Y) as floats corresponding to dates (X) in datetime64 format. My data starts on 2019/04/23 and ends on 2019/08/02. Unfortunately, the data is incomplete: a period between 2019/06/18 and 2019/07/08 is missing.

This leads to a straight line between those two dates in my plot, which I want to disappear.

I know one possibility would be to fill in the missing dates and times, with NaN in the LEVEL column, using pandas. From what I have read, matplotlib skips NaN values, so would that solve my problem? If so, how can I do this in my case? I have already tried a lot of code snippets, e.g. using DATETIME as an index (which I would like to avoid so as not to break the code that follows).

Another possibility: maybe there is a way to just suppress this line while plotting, without touching the data frame? Not clean, but it would be good enough. Yes, a scatter plot would avoid the line, but I need a line plot.

Here is an example of what my dataframe looks like:

             DATETIME     LEVEL
0 2019-04-23 16:30:00  0.087074
1 2019-04-23 16:35:00  0.093089
2 2019-04-23 16:40:00  0.081103
3 2019-04-23 16:45:00  0.093117
4 2019-04-23 16:50:00  0.093131
5 2019-04-23 16:55:00  0.087145
6 2019-04-23 17:00:00  0.087159
7 2019-04-23 17:05:00  0.087174
8 2019-04-23 17:10:00  0.087188

You can see the line between the two vertical red and green lines, which have another meaning.

enter image description here

Thank you very much for your time and help

psalterium asked Aug 07 '19 13:08



1 Answer

Let's say we have your example data frame but with the three rows in the middle missing:

In [65]: df
Out[65]: 
             DATETIME     LEVEL
0 2019-04-23 16:30:00  0.087074
1 2019-04-23 16:35:00  0.093089
2 2019-04-23 16:40:00  0.081103
3 2019-04-23 17:00:00  0.087159
4 2019-04-23 17:05:00  0.087174
5 2019-04-23 17:10:00  0.087188
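
For anyone who wants to follow along, the frame above can be rebuilt roughly like this. This is only a sketch: the values are copied from the output shown, and the `diff()` check at the end is an extra illustration, not part of the original session.

```python
import pandas as pd

# Rebuild the six-row example frame; the 16:45, 16:50 and 16:55 rows are absent.
df = pd.DataFrame({
    "DATETIME": pd.to_datetime([
        "2019-04-23 16:30:00",
        "2019-04-23 16:35:00",
        "2019-04-23 16:40:00",
        "2019-04-23 17:00:00",
        "2019-04-23 17:05:00",
        "2019-04-23 17:10:00",
    ]),
    "LEVEL": [0.087074, 0.093089, 0.081103, 0.087159, 0.087174, 0.087188],
})

# Time step between consecutive rows: 5 minutes everywhere except
# the jump from 16:40 to 17:00, which is where the gap sits.
steps = df["DATETIME"].diff()
print(steps)
```
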

Now we can fill in those missing values by indexing the DataFrame with the DATETIME column and then calling resample() on it. Afterwards we can reset the index to turn it back into a normal column:

In [66]: df.set_index('DATETIME').resample('5min').first().reset_index()
Out[66]: 
             DATETIME     LEVEL
0 2019-04-23 16:30:00  0.087074
1 2019-04-23 16:35:00  0.093089
2 2019-04-23 16:40:00  0.081103
3 2019-04-23 16:45:00       NaN
4 2019-04-23 16:50:00       NaN
5 2019-04-23 16:55:00       NaN
6 2019-04-23 17:00:00  0.087159
7 2019-04-23 17:05:00  0.087174
8 2019-04-23 17:10:00  0.087188
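
Put together with the plot, this could look roughly as follows. It is a minimal sketch that assumes the data sits on a regular 5-minute grid, as in the example; the names `filled`, `fig` and `ax` are just illustrative. Because `reset_index()` restores DATETIME as an ordinary column, code that expects the original layout should keep working.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Example frame with the three middle rows missing (values from above).
df = pd.DataFrame({
    "DATETIME": pd.to_datetime([
        "2019-04-23 16:30:00",
        "2019-04-23 16:35:00",
        "2019-04-23 16:40:00",
        "2019-04-23 17:00:00",
        "2019-04-23 17:05:00",
        "2019-04-23 17:10:00",
    ]),
    "LEVEL": [0.087074, 0.093089, 0.081103, 0.087159, 0.087174, 0.087188],
})

# Put the data on a regular 5-minute grid; empty bins become NaN.
# reset_index() turns DATETIME back into a normal column.
filled = df.set_index("DATETIME").resample("5min").first().reset_index()

# matplotlib does not draw line segments through NaN values,
# so the line breaks at the gap instead of bridging it.
fig, ax = plt.subplots()
ax.plot(filled["DATETIME"], filled["LEVEL"])
ax.set_xlabel("DATETIME")
ax.set_ylabel("LEVEL")
fig.autofmt_xdate()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. If the real data uses a different sampling interval, the `'5min'` rule would need to be changed to match.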
BlackJack answered Nov 19 '22 04:11
