What is the fastest and most generic way to flatten deeply nested JSON into a DataFrame?

Tags: python, json, python-3.x, pandas, dataframe

I am trying to compute some descriptive statistics on my own location data, which I exported from my Google Timeline. To get workable data I need to convert the JSON file to a DataFrame, and doing so raised some questions, because the way I was going about it felt inefficient.

To describe what my JSON looks like: it is nested 3 levels deep and has around 4.5 million lines. A small example of the JSON:

"locations" : [ 
{
  "timestampMs" : "1489591483",
  "latitudeE7" : -21.61909,
  "longitudeE7" : 121.65283,
  "accuracy" : 23,
  "velocity" : 18,
  "heading" : 182,
  "altitude" : 55,
  "activity" : [ {
    "timestampMs" : "1489591507",
    "activity" : [ {
      "type" : "IN_VEHICLE",
      "confidence" : 49
    }, {
      "type" : "UNKNOWN",
      "confidence" : 17
    }, {
      "type" : "ON_BICYCLE",
      "confidence" : 15
    }, {
      "type" : "ON_FOOT",
      "confidence" : 9
    }, {
      "type" : "STILL",
      "confidence" : 9
    }, {
      "type" : "WALKING",
      "confidence" : 9
    } ]
  } ]
},
...
]
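Since `"locations" : [` is a key, the excerpt above is a fragment of an enclosing top-level object. As a quick illustration of why extra flattening is needed, the sketch below wraps a trimmed copy of the excerpt in that object (in practice `d` would come from `json.load` on the exported file) and calls `pd.json_normalize` without a `record_path`: nested dicts would be expanded, but the `activity` list stays in a single cell.

```python
import pandas as pd

# Trimmed copy of the excerpt, wrapped in its enclosing object.
# With the real file this would be: d = json.load(f)
d = {
    "locations": [
        {
            "timestampMs": "1489591483",
            "latitudeE7": -21.61909,
            "longitudeE7": 121.65283,
            "accuracy": 23,
            "activity": [
                {
                    "timestampMs": "1489591507",
                    "activity": [
                        {"type": "IN_VEHICLE", "confidence": 49},
                        {"type": "UNKNOWN", "confidence": 17},
                    ],
                }
            ],
        }
    ]
}

# Without record_path, json_normalize only expands nested dicts;
# the list under "activity" is left as one Python list per row.
flat = pd.json_normalize(d["locations"])
print(flat.columns.tolist())
print(type(flat.loc[0, "activity"]))
```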

To convert it to a DataFrame, I want to flatten those 3 levels down to 0. I have seen some implementations that combine json_normalize with .apply or .append, but those still require knowing the key of each value, whereas I would prefer something more generic (one that works without knowing the keys). They also require iterating over the values manually. So what I would like to know is: "Is there a method that automatically flattens the JSON down to 0 levels without using apply or append?" If there isn't such a method, what is the preferred way of flattening JSON and converting it to a DataFrame?
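`json_normalize` expands nested dicts on its own, but it only turns a list of records into rows when told where that list is (`record_path`). One key-agnostic alternative, offered here as a sketch rather than a built-in pandas feature, is a small recursive helper (the name `flatten_rows` is hypothetical) that turns nested dicts into dotted column names and nested lists into extra rows, building all rows as plain dicts before calling the DataFrame constructor once. Building rows first is generally much cheaper than growing a DataFrame with repeated `append` calls, each of which copies the frame (`DataFrame.append` was removed in pandas 2.0). One assumption to be aware of: if a single record contains two independent lists, this helper produces their Cartesian product.

```python
from itertools import product

import pandas as pd


def flatten_rows(obj, prefix=""):
    """Flatten arbitrarily nested JSON into a list of flat row dicts.

    Nested dicts become dotted column names; nested lists become extra rows
    (one per element). Sibling lists in the same dict are combined as a
    Cartesian product.
    """
    if isinstance(obj, dict):
        # Flatten each field on its own, then merge every combination of rows.
        parts = [flatten_rows(v, f"{prefix}{k}.") for k, v in obj.items()]
        rows = []
        for combo in product(*parts):
            row = {}
            for part in combo:
                row.update(part)
            rows.append(row)
        return rows
    if isinstance(obj, list):
        # Each element contributes its own rows; an empty list keeps the parent row.
        rows = [row for item in obj for row in flatten_rows(item, prefix)]
        return rows or [{}]
    # Scalar leaf: the accumulated prefix (minus its trailing dot) is the column name.
    return [{prefix[:-1]: obj}]


# Trimmed copy of the question's excerpt.
locations = [
    {
        "timestampMs": "1489591483",
        "latitudeE7": -21.61909,
        "longitudeE7": 121.65283,
        "accuracy": 23,
        "activity": [
            {
                "timestampMs": "1489591507",
                "activity": [
                    {"type": "IN_VEHICLE", "confidence": 49},
                    {"type": "UNKNOWN", "confidence": 17},
                ],
            }
        ],
    }
]

# Build every row in plain Python first, then call the DataFrame constructor once.
df = pd.DataFrame(flatten_rows(locations))
print(df)
```

Every innermost activity guess becomes its own row, with the location fields and the activity timestamp repeated alongside it; no key names appear in the helper itself.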

Edit: Added an example of what the DataFrame should look like and a better example of the JSON.

To give a small example of what the DataFrame should look like, see the image below:

[Image: An example of DataFrame]

For a better example of what the JSON looks like, I have included a Pastebin link: tiny location history sample


asked Nov 18 '17 15:11

user3473161

1 Answer

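Use json_normalize, specifying the record_path (the path to the innermost list of records) and meta (the fields to repeat from the enclosing levels) arguments. In pandas 1.0 and later it is available as pd.json_normalize.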

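import pandas as pd

# d is the parsed JSON, e.g. d = json.load(f)
# record_path must exist in every record, so keep only locations that have 'activity'
locs = [loc for loc in d['locations'] if 'activity' in loc]
# carry every location-level key (except the nested list) along as metadata
loc_keys = sorted({k for loc in locs for k in loc} - {'activity'})
df = pd.json_normalize(locs, ['activity', 'activity'],
                       meta=loc_keys + [['activity', 'timestampMs']],
                       record_prefix='activity.', errors='ignore')
print(df)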


   activity.confidence activity.type activity.locations.activity.timestampMs  \
0                   49    IN_VEHICLE                              1489591507   
1                   17       UNKNOWN                              1489591507   
2                   15    ON_BICYCLE                              1489591507   
3                    9       ON_FOOT                              1489591507   
4                    9         STILL                              1489591507   
5                    9       WALKING                              1489591507   

   accuracy  altitude  heading  latitudeE7  longitudeE7 timestampMs  velocity  
0        23        55      182   -21.61909    121.65283  1489591483        18  
1        23        55      182   -21.61909    121.65283  1489591483        18  
2        23        55      182   -21.61909    121.65283  1489591483        18  
3        23        55      182   -21.61909    121.65283  1489591483        18  
4        23        55      182   -21.61909    121.65283  1489591483        18  
5        23        55      182   -21.61909    121.65283  1489591483        18
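Real exports may be less uniform than the excerpt. As a sketch of how a json_normalize call with record_path and meta behaves on such data (the two variant records below are hypothetical, derived from the excerpt by deleting keys): with errors='ignore', a record missing optional keys gets NaN in those columns, while a record with no activity list has to be filtered out beforehand, because json_normalize raises a KeyError if any record lacks the record_path. Those locations then do not appear in the result.

```python
import pandas as pd

# The location record from the question's excerpt (activity list trimmed to two guesses).
full = {
    "timestampMs": "1489591483", "latitudeE7": -21.61909, "longitudeE7": 121.65283,
    "accuracy": 23, "velocity": 18, "heading": 182, "altitude": 55,
    "activity": [{"timestampMs": "1489591507",
                  "activity": [{"type": "IN_VEHICLE", "confidence": 49},
                               {"type": "UNKNOWN", "confidence": 17}]}],
}
# Hypothetical variants: one without optional keys, one without an activity list.
partial = {k: v for k, v in full.items() if k not in ("velocity", "heading")}
no_activity = {k: v for k, v in full.items() if k != "activity"}
d = {"locations": [full, partial, no_activity]}

# Drop records lacking the record_path, then collect all location-level keys.
locs = [loc for loc in d["locations"] if "activity" in loc]
loc_keys = sorted({k for loc in locs for k in loc} - {"activity"})
df = pd.json_normalize(locs, ["activity", "activity"],
                       meta=loc_keys + [["activity", "timestampMs"]],
                       record_prefix="activity.", errors="ignore")
print(df[["activity.type", "velocity"]])
```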

answered Sep 28 '22 03:09

cs95
