I'm having trouble getting my matplotlib plot to look the way I want. I have aggregated data (Y) as floats corresponding to dates (X) in datetime64 format. My data starts on 2019/04/23 and ends on 2019/08/02. Unfortunately, the data is incomplete: I'm missing the period between 2019/06/18 and 2019/07/08.
This leads to a straight line between those two dates in my plot, which I want to get rid of.
I know one possibility would be to fill in the missing dates and times using pandas and set the LEVEL column to NaN for them. From what I've read, matplotlib skips NaN values, so that should solve my problem? If so, how can I do this in my case? I have already tried a lot of code snippets, e.g. using DATETIME as an index (which I would like to avoid so that I don't break the code that follows).
Another possibility: maybe there is a way to just suppress this line while plotting, without touching the data frame? Not clean, but it would be good enough. Yes, a scatter plot with points would avoid the line, but I need a line plot.
Here is an example of what my dataframe looks like:
             DATETIME     LEVEL
0 2019-04-23 16:30:00  0.087074
1 2019-04-23 16:35:00  0.093089
2 2019-04-23 16:40:00  0.081103
3 2019-04-23 16:45:00  0.093117
4 2019-04-23 16:50:00  0.093131
5 2019-04-23 16:55:00  0.087145
6 2019-04-23 17:00:00  0.087159
7 2019-04-23 17:05:00  0.087174
8 2019-04-23 17:10:00  0.087188
In the plot, you can see the unwanted line between the two vertical red and green lines, which mark something else.
Thank you very much for your time and help.
Let's say we have your example data frame but with the three rows in the middle missing:
In [65]: df
Out[65]:
             DATETIME     LEVEL
0 2019-04-23 16:30:00  0.087074
1 2019-04-23 16:35:00  0.093089
2 2019-04-23 16:40:00  0.081103
3 2019-04-23 17:00:00  0.087159
4 2019-04-23 17:05:00  0.087174
5 2019-04-23 17:10:00  0.087188
Now we can fill in those missing values by setting the DATETIME column as the index and calling resample() on the result. Afterwards we can call reset_index() to turn the index back into a normal column. The chain returns a new DataFrame, so df itself keeps its original index:
In [66]: df.set_index('DATETIME').resample('5min').first().reset_index()
Out[66]:
             DATETIME     LEVEL
0 2019-04-23 16:30:00  0.087074
1 2019-04-23 16:35:00  0.093089
2 2019-04-23 16:40:00  0.081103
3 2019-04-23 16:45:00       NaN
4 2019-04-23 16:50:00       NaN
5 2019-04-23 16:55:00       NaN
6 2019-04-23 17:00:00  0.087159
7 2019-04-23 17:05:00  0.087174
8 2019-04-23 17:10:00  0.087188
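To tie this back to the plot, here is a minimal sketch of the whole round trip. It assumes the data is sampled on a regular 5-minute grid, as in the example; the names filled, fig and ax are only illustrative. Since matplotlib does not draw line segments to or from NaN points, the gap should show up as a break in the line rather than a straight connection.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Example frame from above, with the three middle rows missing.
df = pd.DataFrame({
    "DATETIME": pd.to_datetime([
        "2019-04-23 16:30:00", "2019-04-23 16:35:00", "2019-04-23 16:40:00",
        "2019-04-23 17:00:00", "2019-04-23 17:05:00", "2019-04-23 17:10:00",
    ]),
    "LEVEL": [0.087074, 0.093089, 0.081103, 0.087159, 0.087174, 0.087188],
})

# Resample onto a regular 5-minute grid (assumed sampling interval).
# Empty bins become NaN; reset_index() restores DATETIME as a column.
# df itself is not modified, because set_index() returns a new frame.
filled = df.set_index("DATETIME").resample("5min").first().reset_index()

# Plot the filled frame; the NaN rows interrupt the line at the gap.
fig, ax = plt.subplots()
ax.plot(filled["DATETIME"], filled["LEVEL"])
ax.set_xlabel("DATETIME")
ax.set_ylabel("LEVEL")
fig.autofmt_xdate()
# Call plt.show() or fig.savefig(...) to display or save the figure.
```

Because the result is stored in a separate variable, any later code that relies on df and its default integer index is unaffected.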