I've started using pandas to do some aggregation by date. My goal is to count all of the instances of a measurement that occur on a particular day, and then to represent this in D3. To illustrate my workflow, I have a queryset (from Django) that looks like this:
queryset = [{'created':"05-16-13", 'counter':1, 'id':13}, {'created':"05-16-13", 'counter':1, 'id':34}, {'created':"05-17-13", 'counter':1, 'id':12}, {'created':"05-16-13", 'counter':1, 'id':7}, {'created':"05-18-13", 'counter':1, 'id':6}]
I make a dataframe in pandas and aggregate the measure 'counter' by the day created:
import pandas as pd
queryset_df = pd.DataFrame.from_records(queryset).set_index('id')
aggregated_df = queryset_df.groupby('created').sum()
This gives me a dataframe like this:
          counter
created
05-16-13        3
05-17-13        1
05-18-13        1
As I'm using D3, I thought that a JSON object would be the most useful. Using the pandas to_json() function, I convert my dataframe like this:
aggregated_df.to_json()
giving me the following JSON object:
{"counter":{"05-16-13":3,"05-17-13":1,"05-18-13":1}}
This is not exactly what I want, as I would like to be able to access both the date and the measurement. Is there a way to export the data so that I end up with something like this?
data = {"c1":{"date":"05-16-13", "counter":3},"c2":{"date":"05-17-13", "counter":1}, "c3":{"date":"05-18-13", "counter":1}}
I thought that if I could structure this differently on the Python side, it would reduce the amount of data formatting I would need to do on the JS side, as I planned to load the data doing something like this:
x.domain(d3.extent(data, function(d) { return d.date; }));
y.domain(d3.extent(data, function(d) { return d.counter; }));
I'm very open to suggestions of better workflows overall, as this is something I will need to do frequently, but I am unsure of the best way of handling the connection between D3 and pandas. (I have looked at several packages that combine Python and D3 directly, but that is not what I am looking for, as they seem to focus on static chart generation and not on making an SVG.)
Transform your date index back into a simple data column with reset_index, and then generate your JSON object by passing orient='index' to to_json:
In [11]: aggregated_df.reset_index().to_json(orient='index')
Out[11]: '{"0":{"created":"05-16-13","counter":3},"1":{"created":"05-17-13","counter":1},"2":{"created":"05-18-13","counter":1}}'
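As a possible variation on the above (a sketch, not part of the original answer): the d3.extent calls in the question are applied to an array with an accessor, so a JSON array of objects may be easier to consume than an object keyed by "0", "1", "2". The example below assumes orient='records' is acceptable for that purpose and renames the 'created' column to 'date' so that d.date works on the JS side. The helper name aggregate_to_records is hypothetical, chosen only for illustration.

```python
import json

import pandas as pd

# Sample rows copied from the question.
queryset = [
    {"created": "05-16-13", "counter": 1, "id": 13},
    {"created": "05-16-13", "counter": 1, "id": 34},
    {"created": "05-17-13", "counter": 1, "id": 12},
    {"created": "05-16-13", "counter": 1, "id": 7},
    {"created": "05-18-13", "counter": 1, "id": 6},
]


def aggregate_to_records(rows):
    """Hypothetical helper: sum 'counter' per day and return a JSON array string."""
    df = pd.DataFrame.from_records(rows)
    aggregated = (
        # as_index=False keeps 'created' as a regular column, so no reset_index is needed.
        df.groupby("created", as_index=False)["counter"]
        .sum()
        # Rename so the JS accessor can use d.date, as in the question.
        .rename(columns={"created": "date"})
    )
    # orient='records' emits a list of {column: value} objects, one per row.
    return aggregated.to_json(orient="records")


records_json = aggregate_to_records(queryset)
data = json.loads(records_json)

# If the keyed shape from the question ("c1", "c2", ...) is really needed,
# it can be rebuilt from the list in plain Python.
keyed = {f"c{i}": record for i, record in enumerate(data, start=1)}
```

The dates here are still plain strings in MM-DD-YY form, so on the D3 side they would presumably need to be parsed into Date objects before being used in a time scale.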