I am trying to compute some descriptive statistics on my own location data, which I exported from my Google timeline. While converting the JSON file to a DataFrame to get workable data, I ran into some questions, because the way I was going about the conversion felt inefficient.
To describe my JSON: it is nested 3 levels deep and has around 4.5 million lines. A small example of the JSON:
"locations" : [
{
"timestampMs" : "1489591483",
"latitudeE7" : -21.61909,
"longitudeE7" : 121.65283,
"accuracy" : 23,
"velocity" : 18,
"heading" : 182,
"altitude" : 55,
"activity" : [ {
"timestampMs" : "1489591507",
"activity" : [ {
"type" : "IN_VEHICLE",
"confidence" : 49
}, {
"type" : "UNKNOWN",
"confidence" : 17
}, {
"type" : "ON_BICYCLE",
"confidence" : 15
}, {
"type" : "ON_FOOT",
"confidence" : 9
}, {
"type" : "STILL",
"confidence" : 9
}, {
"type" : "WALKING",
"confidence" : 9
} ]
} ]
},
...
]
To convert it to a DataFrame I want to flatten those 3 levels down to 0 levels. I have seen some implementations that use json_normalize in combination with .apply or .append, but they still require knowing the key for each value, and I would rather have something more generic (without knowing the keys). They also require manually iterating over the values. What I would like to know is: "Is there a method that automatically flattens the JSON down to 0 levels without using apply or append?" If there isn't such a method, what would be the preferred way of flattening the JSON and converting it to a DataFrame?
Edit: Added an example of what the DataFrame should look like and a better example of the JSON.
To give a small example of what the DataFrame should look like, see the image below:
To include a better example of what the JSON looks like I have included a Pastebin URL below: tiny location history sample
Using an iterative approach to flatten deeply nested JSON
The idea is that we scan each element in the JSON file and unpack just one level if the element is nested. We keep iterating until all values are atomic elements (no dictionary or list).
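The iterative idea described above can be sketched roughly as follows. This is a minimal illustration, not code from any library: the function name `flatten_iteratively`, the `.` separator, and the choice to leave empty dicts and lists in place as values are all assumptions made for the example.

```python
def flatten_iteratively(record, sep="."):
    """Unpack one nesting level per pass until every value is atomic."""
    flat = dict(record)
    while True:
        unpacked = {}
        changed = False
        for key, value in flat.items():
            if isinstance(value, dict) and value:
                # Lift each child of a dict one level up, joining the keys.
                for sub_key, sub_value in value.items():
                    unpacked[f"{key}{sep}{sub_key}"] = sub_value
                changed = True
            elif isinstance(value, list) and value:
                # Lists are unpacked by position: key.0, key.1, ...
                for index, item in enumerate(value):
                    unpacked[f"{key}{sep}{index}"] = item
                changed = True
            else:
                # Atomic values (and empty containers) are kept as they are.
                unpacked[key] = value
        flat = unpacked
        if not changed:
            return flat


# Shortened version of the location record from the question.
sample = {
    "timestampMs": "1489591483",
    "accuracy": 23,
    "activity": [{
        "timestampMs": "1489591507",
        "activity": [
            {"type": "IN_VEHICLE", "confidence": 49},
            {"type": "UNKNOWN", "confidence": 17},
        ],
    }],
}
flat = flatten_iteratively(sample)
```

Applied to every element of "locations", this gives one wide row per location, with keys such as activity.0.activity.1.type. That is a different layout from the one-row-per-activity table shown in the answer further down, so it only fits if a wide table is acceptable.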
Pandas has a built-in function, json_normalize(), that flattens simple to moderately nested, semi-structured JSON into a flat table. Parameters: data – dict or list of dicts. errors – {'raise', 'ignore'}, default 'raise'
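As a small illustration of those two parameters, here is a sketch on made-up data. The records and the field names (`position`, `stops`, `day`) are hypothetical and not taken from the Google export. It assumes pandas 1.0 or later, where the function is available as `pd.json_normalize`.

```python
import pandas as pd

# Hypothetical records: nested dicts become dotted column names.
records = [
    {"id": 1, "position": {"lat": 52.1, "lon": 4.3}},
    {"id": 2, "position": {"lat": 51.9, "lon": 4.5}},
]
flat = pd.json_normalize(records)
# columns: id, position.lat, position.lon

# Hypothetical records with a nested list and a missing metadata key.
trips = [
    {"day": "mon", "stops": [{"name": "a"}, {"name": "b"}]},
    {"stops": [{"name": "c"}]},  # "day" is missing here
]
# errors="ignore" fills the missing "day" with NaN instead of raising KeyError.
stops = pd.json_normalize(trips, record_path="stops", meta=["day"], errors="ignore")
```

With the default errors='raise', the second call would fail on the record that lacks "day".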
Approach to flatten JSON: There are many ways to flatten JSON. One is a recursive function; another is the json-flatten library. The recursive approach is quite easy to understand, but it is a bit slower than using the json-flatten library.
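A minimal sketch of the recursive variant, for comparison. The name `flatten_recursively`, the `.` separator, and the positional keys for list items are assumptions made for this example; it does not use or reproduce the json-flatten library.

```python
def flatten_recursively(value, parent_key="", sep="."):
    """Return a flat dict whose keys are the joined paths to each atomic value."""
    items = {}
    if isinstance(value, dict):
        for key, child in value.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten_recursively(child, new_key, sep))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            new_key = f"{parent_key}{sep}{index}" if parent_key else str(index)
            items.update(flatten_recursively(child, new_key, sep))
    else:
        # Atomic value: store it under the path built so far.
        items[parent_key] = value
    # Note: empty dicts and lists contribute no keys at all in this sketch.
    return items


result = flatten_recursively({"a": {"b": 1}, "c": [10, {"d": 2}]})
```

Here `result` maps "a.b" to 1, "c.0" to 10 and "c.1.d" to 2. Very deep documents could hit Python's recursion limit, although 3 levels of nesting is far from that.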
Use json_normalize, specifying the record_path and meta arguments.
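import pandas as pd

# d is the parsed JSON dict. One row per innermost activity entry,
# keeping the enclosing activity block's timestamp as metadata.
df = pd.json_normalize(d, ['locations', 'activity', 'activity'],
                       [['locations', 'activity', 'timestampMs']]).add_prefix('activity.')
# Location-level fields, repeated once per innermost entry (column order may vary by pandas version).
v = pd.DataFrame([{k: x for k, x in loc.items() if k != 'activity'}
                  for loc in d['locations'] for a in loc['activity'] for _ in a['activity']])
pd.concat([df, v], axis=1)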
   activity.confidence activity.type activity.locations.activity.timestampMs  \
0                   49    IN_VEHICLE                              1489591507
1                   17       UNKNOWN                              1489591507
2                   15    ON_BICYCLE                              1489591507
3                    9       ON_FOOT                              1489591507
4                    9         STILL                              1489591507
5                    9       WALKING                              1489591507

   accuracy  altitude  heading  latitudeE7  longitudeE7 timestampMs  velocity
0        23        55      182   -21.61909    121.65283  1489591483        18
1        23        55      182   -21.61909    121.65283  1489591483        18
2        23        55      182   -21.61909    121.65283  1489591483        18
3        23        55      182   -21.61909    121.65283  1489591483        18
4        23        55      182   -21.61909    121.65283  1489591483        18
5        23        55      182   -21.61909    121.65283  1489591483        18