I have a file of chronologically timestamped data for multiple time series. I would like to parse it in Python and then use matplotlib to create a single line plot with a separate line for each time series. The data I'm parsing looks something like this:
time label value
1.05 seriesA 3.925
1.09 seriesC 0.245
2.13 seriesB 12.32
2.73 seriesC 4.921
I've parsed the file into a dictionary of lists that contain (time,value) tuples keyed on the series label. I'm struggling with how to get from this to a single line plot with all this data. I want independent lines for seriesA, seriesB, seriesC, etc. on a single plot. Any pointers?
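For reference, here is a minimal sketch of that parsing step. It assumes the file is whitespace-separated with a single header row (`time label value`) as in the sample above. The sample text is inlined with `io.StringIO` so the snippet runs on its own, and `data.txt` is a hypothetical filename:

```python
import io
from collections import defaultdict

# Stand-in for open('data.txt') (hypothetical filename)
raw = io.StringIO("""time label value
1.05 seriesA 3.925
1.09 seriesC 0.245
2.13 seriesB 12.32
2.73 seriesC 4.921
""")

def parse_series(fh):
    """Return {label: [(time, value), ...]} from a whitespace-separated file."""
    series = defaultdict(list)
    next(fh)  # skip the header row
    for line in fh:
        if not line.strip():
            continue  # ignore blank lines
        t, label, v = line.split()
        series[label].append((float(t), float(v)))
    return dict(series)

data = parse_series(raw)
print(data)
```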
Edit: As requested, the dictionary is below. I had a hard time figuring out the best way to store this data, so maybe the data structure I'm using is also part of the problem. The keys are the time series labels and the values are lists of (time, value) tuples. In any case, here it is:
{'client1': [(861.991698574, 298189000.0), (862.000768158, 0.0)],
'client2': [(861.781502324, 0.0), (861.78903722, 153600000.0),
(862.281483262, 0.0), (862.289038158, 153600000.0)], 'client3':
[(862.004470762, 3295674368.0), (862.004563939, 3295674368.0),
(862.03981821, 799014912.0), (862.040403314, 1599078400.0),
(862.540269616, 3295674368.0), (862.55133097, 1599078400.0)]}
I like pandas for this type of problem.
First, put the data in a pandas dataframe:
import pandas as pd
data = {'client1': [(861.991698574, 298189000.0), (862.000768158, 0.0)],
'client2': [(861.781502324, 0.0), (861.78903722, 153600000.0),
(862.281483262, 0.0), (862.289038158, 153600000.0)], 'client3':
[(862.004470762, 3295674368.0), (862.004563939, 3295674368.0),
(862.03981821, 799014912.0), (862.040403314, 1599078400.0),
(862.540269616, 3295674368.0), (862.55133097, 1599078400.0)]}
time = []
label = []
value = []
# flatten the dict into three aligned lists, one entry per (time, value) tuple
for k, v in data.items():
    for tup in v:
        label.append(k)
        time.append(tup[0])
        value.append(tup[1])
df = pd.DataFrame({'label': label, 'time': time, 'value': value})
Resulting in this dataframe:
>>> df
label time value
0 client1 861.991699 2.981890e+08
1 client1 862.000768 0.000000e+00
2 client2 861.781502 0.000000e+00
3 client2 861.789037 1.536000e+08
4 client2 862.281483 0.000000e+00
5 client2 862.289038 1.536000e+08
6 client3 862.004471 3.295674e+09
7 client3 862.004564 3.295674e+09
8 client3 862.039818 7.990149e+08
9 client3 862.040403 1.599078e+09
10 client3 862.540270 3.295674e+09
11 client3 862.551331 1.599078e+09
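The three-list loop above can also be written more compactly. Here is a minimal sketch that builds one (label, time, value) record per tuple and passes the records straight to `pd.DataFrame`. The dictionary is abbreviated to two clients so the block stands alone:

```python
import pandas as pd

# Abbreviated copy of the question's dictionary
data = {'client1': [(861.991698574, 298189000.0), (862.000768158, 0.0)],
        'client2': [(861.781502324, 0.0), (861.78903722, 153600000.0)]}

# One (label, time, value) record per tuple, flattened across all series
records = [(label, t, v) for label, pairs in data.items() for t, v in pairs]
df = pd.DataFrame(records, columns=['label', 'time', 'value'])
print(df)
```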
Then, you can do this:
import matplotlib.pyplot as plt
# one group (and one line) per label
by_label = df.groupby('label')
for name, group in by_label:
    plt.plot(group['time'], group['value'], label=name)
plt.legend()
plt.show()
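If you would rather not bring in pandas, the dictionary can also be plotted directly with matplotlib. In this minimal sketch, `zip(*sorted(pairs))` sorts each series by time and then splits the tuples into a sequence of times and a sequence of values. The data is abbreviated, and the output filename `series.png` is only an example:

```python
import matplotlib.pyplot as plt

data = {'client1': [(861.991698574, 298189000.0), (862.000768158, 0.0)],
        'client2': [(861.781502324, 0.0), (861.78903722, 153600000.0),
                    (862.281483262, 0.0), (862.289038158, 153600000.0)]}

fig, ax = plt.subplots()
for label, pairs in data.items():
    # [(t1, v1), (t2, v2), ...] -> (t1, t2, ...), (v1, v2, ...), ordered by time
    times, values = zip(*sorted(pairs))
    ax.plot(times, values, label=label)
ax.set_xlabel('time')
ax.set_ylabel('value')
ax.legend()
fig.savefig('series.png')  # or plt.show() in an interactive session
```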
Regarding how to store your data in a dictionary: there are different ways to go about this, but to make your data easy to use with pandas, I would use a dictionary of the form:
data = {'label':['client1', 'client1', 'client2', ...],
'time':[time1, time2, time3, ...],
'value':[value1, value2, value3, ...]}
Make sure all three lists are aligned (index 0 of each list is row 0 of your dataframe, index 1 is row 1, and so on). Then all you need to do to import into pandas is df = pd.DataFrame(data)
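One way to get that layout is to build it while parsing the file, instead of building the dictionary of tuples first. Here is a minimal sketch, assuming the same whitespace-separated format as the question's sample, which is inlined so the block runs on its own. Appending to all three lists in the same step keeps them aligned by construction:

```python
import io
import pandas as pd

# Stand-in for the real file (assumed whitespace-separated with a header row)
raw = io.StringIO("""time label value
1.05 seriesA 3.925
1.09 seriesC 0.245
2.13 seriesB 12.32
2.73 seriesC 4.921
""")

data = {'label': [], 'time': [], 'value': []}
next(raw)  # skip the header row
for line in raw:
    if not line.strip():
        continue  # ignore blank lines
    t, label, v = line.split()
    # appending to all three lists together keeps index i aligned to row i
    data['label'].append(label)
    data['time'].append(float(t))
    data['value'].append(float(v))

df = pd.DataFrame(data)
print(df)
```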
Short answer:
Highlight the data below and copy it with Ctrl+C:
label time value
client1 861.991699 2.981890e+08
client1 862.000768 0.000000e+00
client2 861.781502 0.000000e+00
client2 861.789037 1.536000e+08
client2 862.281483 0.000000e+00
client2 862.289038 1.536000e+08
client3 862.004471 3.295674e+09
client3 862.004564 3.295674e+09
client3 862.039818 7.990149e+08
client3 862.040403 1.599078e+09
client3 862.540270 3.295674e+09
client3 862.551331 1.599078e+09
Then run this snippet:
# imports
import pandas as pd
import matplotlib.pyplot as plt
# read data from the clipboard
df = pd.read_clipboard(sep='\\s+')
# reshape the data to get values by time for each label
df = df.pivot(index='time', columns='label', values='value')
# replace NaNs by forward filling existing values
df = df.ffill()
# you'll still have to handle the missing values at the beginning of the columns
df = df.bfill()
# a simple plot
df.plot()
plt.show()
Then you'll get a single line chart with one line per label.
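pd.read_clipboard needs a working system clipboard. If one is not available (for example, on a headless server), here is a minimal sketch of the same pipeline that reads the pasted text from a string with pd.read_csv instead. ffill() and bfill() are the method-style equivalents of fillna(method='ffill') and fillna(method='bfill'):

```python
import io
import pandas as pd

text = """label time value
client1 861.991699 2.981890e+08
client1 862.000768 0.000000e+00
client2 861.781502 0.000000e+00
client2 861.789037 1.536000e+08
client2 862.281483 0.000000e+00
client2 862.289038 1.536000e+08
client3 862.004471 3.295674e+09
client3 862.004564 3.295674e+09
client3 862.039818 7.990149e+08
client3 862.040403 1.599078e+09
client3 862.540270 3.295674e+09
client3 862.551331 1.599078e+09
"""

df = pd.read_csv(io.StringIO(text), sep=r'\s+')
# wide format: time index, one column per label, gaps filled forward then backward
wide = df.pivot(index='time', columns='label', values='value').ffill().bfill()
ax = wide.plot()  # one line per column (client1, client2, client3)
```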

The Details
There are a few confusing elements in this question. If your source data is, as you say, of the form:
time label value
1.05 seriesA 3.925
1.09 seriesC 0.245
2.13 seriesB 12.32
2.73 seriesC 4.921
But the true content of your data is:
{'client1': [(861.991698574, 298189000.0), (862.000768158, 0.0)],
'client2': [(861.781502324, 0.0), (861.78903722, 153600000.0),
(862.281483262, 0.0), (862.289038158, 153600000.0)], 'client3':
[(862.004470762, 3295674368.0), (862.004563939, 3295674368.0),
(862.03981821, 799014912.0), (862.040403314, 1599078400.0),
(862.540269616, 3295674368.0), (862.55133097, 1599078400.0)]}
Then the true content AND form of your data should be:
label time value
client1 861.991699 2.981890e+08
client1 862.000768 0.000000e+00
client2 861.781502 0.000000e+00
client2 861.789037 1.536000e+08
client2 862.281483 0.000000e+00
client2 862.289038 1.536000e+08
client3 862.004471 3.295674e+09
client3 862.004564 3.295674e+09
client3 862.039818 7.990149e+08
client3 862.040403 1.599078e+09
client3 862.540270 3.295674e+09
client3 862.551331 1.599078e+09
In any case, there is no need to use a dictionary to obtain your "[...] single line plot with all this data. I want independent lines for seriesA, seriesB, seriesC, etc. on a single plot."
I believe the most efficient approach is the one described in Reshaping and Pivot Tables in the pandas docs. From there, you can plot the data directly using df.plot().
Highlight the data above, copy it with Ctrl+C, and you're good to go:
# imports
import pandas as pd
# read data from the clipboard
df = pd.read_clipboard(sep='\\s+')
# reshape the data to get values by time for each label
df = df.pivot(index='time', columns='label', values='value')
print(df)
This should represent the desired form of your data:
label client1 client2 client3
time
861.781502 NaN 0.0 NaN
861.789037 NaN 153600000.0 NaN
861.991699 298189000.0 NaN NaN
862.000768 0.0 NaN NaN
862.004471 NaN NaN 3.295674e+09
862.004564 NaN NaN 3.295674e+09
862.039818 NaN NaN 7.990149e+08
862.040403 NaN NaN 1.599078e+09
862.281483 NaN 0.0 NaN
862.289038 NaN 153600000.0 NaN
862.540270 NaN NaN 3.295674e+09
862.551331 NaN NaN 1.599078e+09
Because each label has its own timestamps, the pivoted frame contains many missing values. These must be handled to make the data plot-friendly. This is easily done by forward filling and then back filling the gaps:
# Replace NaNs by forward filling existing values
df = df.ffill()
# You'll still have to handle the missing values
# at the beginning of the columns
df = df.bfill()
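To show exactly what the two filling steps do, here is a small worked example on a toy Series (the values are arbitrary). Forward fill copies the last known value into later gaps. Back fill then fills any leading gaps with the first known value:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, np.nan, 2.0])
forward = s.ffill()     # [nan, 1.0, 1.0, 1.0, 2.0]: the leading NaN has nothing to copy
both = forward.bfill()  # [1.0, 1.0, 1.0, 1.0, 2.0]
print(forward.tolist())
print(both.tolist())
```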
Now you'll get a line chart with one line per label simply by calling df.plot().
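Note that back filling invents values before each series' first real sample. If you would rather draw each line only through its actual data points, one alternative (a sketch, not the approach used above) is to drop the NaNs column by column before plotting. The small `wide` frame below holds the first four rows of the pivoted table for client1 and client2, and the filename is only an example:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# First four rows of the pivoted table (client1 and client2 only)
wide = pd.DataFrame(
    {'client1': [np.nan, np.nan, 298189000.0, 0.0],
     'client2': [0.0, 153600000.0, np.nan, np.nan]},
    index=pd.Index([861.781502, 861.789037, 861.991699, 862.000768], name='time'))

fig, ax = plt.subplots()
for label in wide.columns:
    s = wide[label].dropna()  # keep only timestamps where this label was observed
    ax.plot(s.index, s.values, marker='o', label=label)
ax.legend()
fig.savefig('series_dropna.png')  # or plt.show() in an interactive session
```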

Edit:
Let me know what your data source is, and I can give you a few tips on how to read and store your data. Again, pandas is most likely the way to go.