Suppose I made a groupby on the valgdata DataFrame like below:
grouped_valgdata = valgdata.groupby(['news_site','dato_uden_tid']).mean()
Now I get this:
                         sentiment
news_site dato_uden_tid
dr.dk     2015-06-15     54.777183
          2015-06-16     54.703167
          2015-06-17     54.948775
          2015-06-18     54.424881
          2015-06-19     53.290554
eb.dk     2015-06-15     53.279251
          2015-06-16     53.285643
          2015-06-17     53.558753
          2015-06-18     52.854750
          2015-06-19     54.415988
jp.dk     2015-06-15     56.590428
          2015-06-16     55.313752
          2015-06-17     53.771377
          2015-06-18     53.218408
          2015-06-19     54.392638
pol.dk    2015-06-15     54.759532
          2015-06-16     55.182641
          2015-06-17     55.001800
          2015-06-18     56.004326
          2015-06-19     54.649052
Now I want to plot a time series for each news_site, with dato_uden_tid on the x-axis and sentiment on the y-axis.
What is the best and easiest way to accomplish that?
Thank you!
Here is a solution using Pandas and Matplotlib with more fine-grained control.
First, here is a function that generates a random dataframe for testing. It creates three columns that generalize to more abstract problems:
- my_timestamp is a datetime column containing timestamps
- my_series is the string label to which you want to apply the groupby
- my_value is a numeric value recorded for my_series at time my_timestamp
Replace these column names with the ones in your own dataframe.
import random

import pandas as pd


def generate_random_data(N=100):
    '''
    Returns a dataframe with N rows of random data.
    '''
    list_of_lists = []
    labels = ['foo', 'bar', 'baz']
    epoch = 1515617110  # starting Unix timestamp, in seconds
    for _ in range(N):
        key = random.choice(labels)
        # Each label gets its own value range so the series are easy to tell apart.
        if key == 'foo':
            value = random.randint(1, 10)
        elif key == 'bar':
            value = random.randint(50, 60)
        else:
            value = random.randint(80, 90)
        # Advance time by a random step so the timestamps keep increasing.
        epoch += random.randint(5000, 30000)
        list_of_lists.append([key, epoch, value])
    df = pd.DataFrame(list_of_lists, columns=['my_series', 'epoch', 'my_value'])
    df['my_timestamp'] = pd.to_datetime(df['epoch'], unit='s')
    df = df[['my_timestamp', 'my_series', 'my_value']]
    return df
(The table of example generated data is not reproduced here.)
Now, the following code will run the groupby and plot a time series graph.
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, DayLocator


def plot_gb_time_series(df, ts_name, gb_name, value_name, figsize=(20, 7), title=None):
    '''
    Runs groupby on Pandas dataframe and produces a time series chart.

    Parameters:
    ----------
    df : Pandas dataframe
    ts_name : string
        The name of the df column that has the datetime timestamp x-axis values.
    gb_name : string
        The name of the df column to perform group-by.
    value_name : string
        The name of the df column for the y-axis.
    figsize : tuple of two integers
        Figure size of the resulting plot, e.g. (20, 7)
    title : string
        Optional title
    '''
    xtick_locator = DayLocator(interval=1)
    xtick_dateformatter = DateFormatter('%m/%d/%Y')
    fig, ax = plt.subplots(figsize=figsize)
    # Group by the column name itself (not a one-element list) so that each
    # key is a plain label rather than a 1-tuple in recent pandas versions.
    for key, grp in df.groupby(gb_name):
        ax = grp.plot(ax=ax, kind='line', x=ts_name, y=value_name, label=key, marker='o')
    ax.xaxis.set_major_locator(xtick_locator)
    ax.xaxis.set_major_formatter(xtick_dateformatter)
    ax.autoscale_view()
    ax.legend(loc='upper left')
    plt.xticks(rotation=90)
    plt.grid()
    plt.xlabel('')
    # Leave 25% headroom above the largest value.
    plt.ylim(0, df[value_name].max() * 1.25)
    plt.ylabel(value_name)
    if title is not None:
        plt.title(title)
    plt.show()
Here is an example invocation:
df = generate_random_data()
plot_gb_time_series(df, 'my_timestamp', 'my_series', 'my_value',
                    figsize=(10, 5), title="Random data")
The result is a single time series plot with one line per my_series label. (The image is not reproduced here.)
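To apply the same approach to the grouped result from the question, the two index levels first have to become ordinary columns again, which `reset_index()` does. The following is only a minimal sketch: it rebuilds a small `grouped_valgdata` from a few of the values shown in the question, assumes the dates can be parsed with `pd.to_datetime`, and uses the non-interactive `Agg` backend so it runs without a display. The name `long_df` is introduced here purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from the grouped output in the question.
index = pd.MultiIndex.from_tuples(
    [
        ("dr.dk", "2015-06-15"),
        ("dr.dk", "2015-06-16"),
        ("eb.dk", "2015-06-15"),
        ("eb.dk", "2015-06-16"),
    ],
    names=["news_site", "dato_uden_tid"],
)
grouped_valgdata = pd.DataFrame(
    {"sentiment": [54.777183, 54.703167, 53.279251, 53.285643]}, index=index
)

# Turn both index levels back into ordinary columns (long format).
long_df = grouped_valgdata.reset_index()
# Assumption: the dates are strings or dates that pd.to_datetime can parse.
long_df["dato_uden_tid"] = pd.to_datetime(long_df["dato_uden_tid"])

fig, ax = plt.subplots(figsize=(10, 5))
# One line per news site, all drawn on the same axes.
for site, grp in long_df.groupby("news_site"):
    ax.plot(grp["dato_uden_tid"], grp["sentiment"], marker="o", label=site)
ax.set_ylabel("sentiment")
ax.legend(loc="upper left")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

With an interactive backend, `plt.show()` would display the figure; `fig.savefig(...)` writes it to a file instead.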
(Am a bit amused, as this question caught me doing the exact same thing.)
You could do something like
valgdata\
    .groupby([valgdata.dato_uden_tid.name, valgdata.news_site.name])\
    .mean()\
    .unstack()
which would
- reverse the order of the groupby keys, so the date comes first
- unstack the news sites into columns
To plot, follow the previous snippet immediately with .plot():
valgdata\
    .groupby([valgdata.dato_uden_tid.name, valgdata.news_site.name])\
    .mean()\
    .unstack()\
    .plot()
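To make the effect of `unstack()` concrete, here is a small sketch of the reshaping step on its own. It assumes a toy stand-in for `valgdata` that reuses a few values from the question as if each were a single raw row, and it selects the `sentiment` column before taking the mean so the result has plain site names as columns. The name `wide` is hypothetical.

```python
import pandas as pd

# Toy stand-in for valgdata; the values are taken from the question's output.
valgdata = pd.DataFrame(
    {
        "news_site": ["dr.dk", "dr.dk", "eb.dk", "eb.dk"],
        "dato_uden_tid": ["2015-06-15", "2015-06-16", "2015-06-15", "2015-06-16"],
        "sentiment": [54.777183, 54.703167, 53.279251, 53.285643],
    }
)

# Date first, site second: after unstack() the dates form the index
# and each news site becomes its own column.
wide = (
    valgdata
    .groupby(["dato_uden_tid", "news_site"])["sentiment"]
    .mean()
    .unstack()
)
print(wide)
```

Calling `wide.plot()` on a frame shaped like this draws one line per column, which is why the chained `.plot()` gives one series per news site.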