I have a Pandas DataFrame with a DatetimeIndex and columns of hourly values, and I would like to take a single column and write it to a JSON file as an array of daily arrays of hourly values.
A simple example:
If I have the DataFrame:
In [106]:
import numpy as np
import pandas as pd
rng = pd.date_range('1/1/2011 01:00:00', periods=12, freq='h')
df = pd.DataFrame(np.random.randn(12, 1), index=rng, columns=['A'])
In [107]:
df
Out[107]:
A
2011-01-01 01:00:00 -0.067214
2011-01-01 02:00:00 0.820595
2011-01-01 03:00:00 0.442557
2011-01-01 04:00:00 -1.000434
2011-01-01 05:00:00 -0.760783
2011-01-01 06:00:00 -0.106619
2011-01-01 07:00:00 0.786618
2011-01-01 08:00:00 0.144663
2011-01-01 09:00:00 -1.455017
2011-01-01 10:00:00 0.865593
2011-01-01 11:00:00 1.289754
2011-01-01 12:00:00 0.601067
I would like this JSON file:
[
[-0.0672138259,0.8205950583,0.4425568167,-1.0004337373,-0.7607833867,-0.1066187698,0.7866183048,0.1446634381,-1.4550165851,0.8655931982,1.2897541164,0.6010672247]
]
My actual DataFrame spans many more days, so the output would look roughly like this:
[
[value@hour1day1, ..., value@hour24day1],
[value@hour1day2, ..., value@hour24day2],
[value@hour1day3, ..., value@hour24day3],
....
[value@hour1LastDay, ..., value@hour24LastDay]
]
To convert a DataFrame to a JSON string, use the DataFrame.to_json() method; passing a file path as its first argument writes the JSON to that file instead.
Its most important parameter is orient, which accepts 'split', 'records', 'index', 'columns', 'values', and 'table'.
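A minimal sketch of the to_json() route for the single-day case. Selecting df[['A']] keeps a one-column DataFrame, so transposing gives a single row, and orient='values' drops all labels, leaving a list that contains one list. double_precision defaults to 10 decimal places (which matches the question's expected output); 15 is the maximum.

```python
import json

import numpy as np
import pandas as pd

# Twelve hourly values for one day, as in the question.
rng = pd.date_range("1/1/2011 01:00:00", periods=12, freq="h")
df = pd.DataFrame(np.random.randn(12, 1), index=rng, columns=["A"])

# Transpose so column A becomes one row; orient="values" then emits
# only the values as a list of rows: [[v1, v2, ..., v12]].
json_str = df[["A"]].T.to_json(orient="values", double_precision=15)
print(json_str)
```

Passing a file path as the first argument (for example df[["A"]].T.to_json("out.json", orient="values")) writes the same JSON to disk instead of returning a string.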
The inverse, pandas.read_json(), reads JSON back into a DataFrame and works best on data that is already flat.
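As a sketch of the round trip, using a small made-up list of daily arrays: writing with orient='values' and reading back with the same orient gives one row per day and one column per hour. Wrapping the string in io.StringIO avoids the deprecation warning that newer pandas versions emit when a literal JSON string is passed to read_json.

```python
import io

import pandas as pd

# Made-up data: two days, three hourly values each.
daily = [[0.5, -1.25, 2.0], [1.5, 0.75, -0.5]]

# Serialize as a bare array of arrays, then parse it back.
json_str = pd.DataFrame(daily).to_json(orient="values")
back = pd.read_json(io.StringIO(json_str), orient="values")

# One row per day, one column per hour.
print(back.shape)  # (2, 3)
```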
import json
import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2011 01:00:00', periods=12, freq='h')
df = pd.DataFrame(np.random.randn(12, 1), index=rng, columns=['A'])
# Transpose so column A becomes a single row, then convert to nested lists
print(json.dumps(df.T.to_numpy().tolist(), indent=4))
out:
[
[
-0.6916923670267555,
0.23075256008033393,
1.2390943452146521,
-0.9421708175530891,
-1.4622768586461448,
-0.3973987276444045,
-0.04983495806442656,
-1.9139530636627042,
1.9562147260518052,
-0.8296105620697014,
0.2888681009437529,
-2.3943000262784424
]
]
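One caveat with json.dumps: if some hours are missing, the NaN values are written as the bare token NaN, which is not valid JSON (to_json, by contrast, writes null). A small sketch with made-up values that maps NaN to None before dumping:

```python
import json
import math

import numpy as np
import pandas as pd

rng = pd.date_range("1/1/2011 01:00:00", periods=4, freq="h")
df = pd.DataFrame({"A": [0.5, np.nan, 1.5, 2.0]}, index=rng)

rows = df.T.to_numpy().tolist()
# json.dumps emits the non-standard token NaN by default.
print(json.dumps(rows))  # [[0.5, NaN, 1.5, 2.0]]

# Replace NaN with None so it serializes as null; allow_nan=False
# makes json.dumps raise instead of silently emitting NaN.
clean = [[None if math.isnan(v) else v for v in row] for row in rows]
print(json.dumps(clean, allow_nan=False))  # [[0.5, null, 1.5, 2.0]]
```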
Or, as a full example with multiple days, using groupby functionality:
rng = pd.date_range('1/1/2011 01:00:00', periods=48, freq='h')
df = pd.DataFrame(np.random.randn(48, 1), index=rng, columns=['A'])
# Group by calendar date; grouping by x.day would merge the same
# day-of-month across different months
grouped = df.groupby(df.index.date)
data = [group['A'].tolist() for day, group in grouped]
print(json.dumps(data, indent=4))
out:
[
[
-0.8939584996681688,
...
-1.1332895023662326
],
[
-0.1514553673781838,
...
-1.8380494963443343
],
[
-1.8342085568898159
]
]
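Grouping by calendar date puts 00:00 into the next day, which is why the output above has a 23-value first group and a 1-value last group. If, as the question's hour1 to hour24 layout suggests, 00:00 should count as hour 24 of the previous day (an assumption about the data), shifting the grouping key back by one hour gives full 24-value days. A sketch:

```python
import json

import numpy as np
import pandas as pd

rng = pd.date_range("1/1/2011 01:00:00", periods=48, freq="h")
df = pd.DataFrame(np.random.randn(48, 1), index=rng, columns=["A"])

# Assumption: 01:00 is hour 1 and the following 00:00 is hour 24,
# so subtract one hour before taking the date used as the group key.
day_key = (df.index - pd.Timedelta(hours=1)).date
data = [group["A"].tolist() for _, group in df.groupby(day_key)]

json_str = json.dumps(data, indent=4)
print(json_str)
```

With this key the 48 sample hours split into two groups of 24. To write the file directly, json.dump(data, f, indent=4) with an open file handle f does the same as json.dumps.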