
What is the fastest generic way to flatten deeply nested JSON into a DataFrame?

I am trying to compute some descriptive statistics on my own location data, which I exported from my Google timeline. To get workable data I need to convert it from a JSON file to a DataFrame, and the way I was going about that conversion felt inefficient, which raised some questions I would like answered.

To describe what my JSON looks like: it is nested 3 levels deep and has around 4.5 million lines. A small example of the JSON:

"locations" : [ 
{
  "timestampMs" : "1489591483",
  "latitudeE7" : -21.61909,
  "longitudeE7" : 121.65283,
  "accuracy" : 23,
  "velocity" : 18,
  "heading" : 182,
  "altitude" : 55,
  "activity" : [ {
    "timestampMs" : "1489591507",
    "activity" : [ {
      "type" : "IN_VEHICLE",
      "confidence" : 49
    }, {
      "type" : "UNKNOWN",
      "confidence" : 17
    }, {
      "type" : "ON_BICYCLE",
      "confidence" : 15
    }, {
      "type" : "ON_FOOT",
      "confidence" : 9
    }, {
      "type" : "STILL",
      "confidence" : 9
    }, {
      "type" : "WALKING",
      "confidence" : 9
    } ]
  } ]
},
...
]

To convert it to a DataFrame I want to flatten those 3 levels down to 0 levels. I have seen some implementations that use json_normalize in combination with .apply or .append, but those still require knowing the key for each value, and I would prefer something more generic (that is, without knowing the keys). They also require manually iterating over the values. What I would like to know is: "Is there a method that automatically flattens the JSON down to 0 levels without using apply or append?" If there isn't such a method, what would be the preferred way of flattening JSON and converting it to a DataFrame?


Edit: Added an example of what the DataFrame should look like and a better example of the JSON.


A small example of what the DataFrame should look like is in the image below: An example of DataFrame

A better example of what the JSON looks like is available at the Pastebin URL below: tiny location history sample

user3473161 asked Nov 18 '17 15:11



1 Answer

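Use json_normalize, specifying the record_path and meta arguments.

import pandas as pd

# d is the parsed JSON as a dict with a top-level 'locations' key (the single-record sample above)
df = pd.json_normalize(d, ['locations', 'activity', 'activity'],
                       ['locations', ['locations', 'activity', 'timestampMs']])
# Expand the per-row location dicts into columns first, then drop that column from df
v = pd.DataFrame(df['locations'].tolist()).drop(columns='activity')
df = df.drop(columns='locations').add_prefix('activity.')

pd.concat([df, v], axis=1)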


   activity.confidence activity.type activity.locations.activity.timestampMs  \
0                   49    IN_VEHICLE                              1489591507   
1                   17       UNKNOWN                              1489591507   
2                   15    ON_BICYCLE                              1489591507   
3                    9       ON_FOOT                              1489591507   
4                    9         STILL                              1489591507   
5                    9       WALKING                              1489591507   

   accuracy  altitude  heading  latitudeE7  longitudeE7 timestampMs  velocity  
0        23        55      182   -21.61909    121.65283  1489591483        18  
1        23        55      182   -21.61909    121.65283  1489591483        18  
2        23        55      182   -21.61909    121.65283  1489591483        18  
3        23        55      182   -21.61909    121.65283  1489591483        18  
4        23        55      182   -21.61909    121.65283  1489591483        18  
5        23        55      182   -21.61909    121.65283  1489591483        18  
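
If the keys are not known in advance, one possible alternative is a small recursive helper that walks the structure itself. The sketch below is not part of pandas: `flatten_records` is a hypothetical name, and it assumes that nested dicts should become dot-separated column prefixes and that every list should be expanded into one row per element. It has not been benchmarked against json_normalize on a file of this size, so treat it as a starting point rather than the fastest option.

```python
def flatten_records(node, prefix=""):
    """Return a list of flat dicts (rows) for an arbitrarily nested JSON value.

    Hypothetical helper: dict keys are joined with '.', lists are expanded
    into one row per element, and scalars become cell values.
    """
    if isinstance(node, dict):
        rows = [{}]
        for key, value in node.items():
            child_rows = flatten_records(value, f"{prefix}{key}.")
            # Combine every row built so far with every row produced by this key
            rows = [{**row, **child} for row in rows for child in child_rows]
        return rows
    if isinstance(node, list):
        # Each element contributes its own rows; an empty list keeps one
        # empty row so that the parent's scalar values are not lost
        rows = [row for item in node for row in flatten_records(item, prefix)]
        return rows or [{}]
    # Scalar leaf: strip the trailing '.' from the accumulated prefix
    return [{prefix[:-1]: node}]


# Shortened version of the sample from the question
sample = {
    "locations": [{
        "timestampMs": "1489591483",
        "accuracy": 23,
        "activity": [{
            "timestampMs": "1489591507",
            "activity": [
                {"type": "IN_VEHICLE", "confidence": 49},
                {"type": "UNKNOWN", "confidence": 17},
                {"type": "ON_BICYCLE", "confidence": 15},
            ],
        }],
    }]
}

rows = flatten_records(sample["locations"])
```

Passing the result to pd.DataFrame(rows) should give one row per innermost activity entry, with columns such as timestampMs, activity.timestampMs and activity.activity.type. Note that two sibling lists at the same level are combined as a cross product, which may not be what you want for every file.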
cs95 answered Sep 28 '22 03:09