Pandas read nested json

Tags: python, json, pandas, parsing

I am curious how I can use pandas to read nested json of the following structure:

    {
        "number": "",
        "date": "01.10.2016",
        "name": "R 3932",
        "locations": [
            {
                "depTimeDiffMin": "0",
                "name": "Spital am Pyhrn Bahnhof",
                "arrTime": "",
                "depTime": "06:32",
                "platform": "2",
                "stationIdx": "0",
                "arrTimeDiffMin": "",
                "track": "R 3932"
            },
            {
                "depTimeDiffMin": "0",
                "name": "Windischgarsten Bahnhof",
                "arrTime": "06:37",
                "depTime": "06:40",
                "platform": "2",
                "stationIdx": "1",
                "arrTimeDiffMin": "1",
                "track": ""
            },
            {
                "depTimeDiffMin": "",
                "name": "Linz/Donau Hbf",
                "arrTime": "08:24",
                "depTime": "",
                "platform": "1A-B",
                "stationIdx": "22",
                "arrTimeDiffMin": "1",
                "track": ""
            }
        ]
    }

The following call keeps the locations array as JSON. I would prefer it to be expanded into columns.

    pd.read_json("/myJson.json", orient='records')

edit

Thanks for the first answers. I should refine my question: flattening the nested attributes in the array is not mandatory. It would be OK to just concatenate the df.locations['name'] values, e.g. [A, B, C].

My file contains multiple JSON objects (one per line). I would like to keep the number, date, name, and locations columns. However, I would need to join the locations.

    allLocations = ""
    isFirst = True
    for location in result.locations:
        if isFirst:
            isFirst = False
            allLocations = location['name']
        else:
            allLocations += "; " + location['name']
    allLocations

My approach here does not seem to be efficient / pandas style.

asked Nov 14 '16 12:11

Georg Heiler

1 Answer

You can use json_normalize:

    import json

    import pandas as pd

    # Load the whole file as a single JSON document
    with open('myJson.json') as data_file:
        data = json.load(data_file)

    # One row per entry in 'locations'; top-level fields are repeated on each row
    df = pd.json_normalize(data, 'locations', ['date', 'number', 'name'],
                           record_prefix='locations_')
    print(df)

      locations_arrTime locations_arrTimeDiffMin locations_depTime  \
    0                                                        06:32
    1             06:37                        1             06:40
    2             08:24                        1

      locations_depTimeDiffMin           locations_name locations_platform  \
    0                        0  Spital am Pyhrn Bahnhof                  2
    1                        0  Windischgarsten Bahnhof                  2
    2                                    Linz/Donau Hbf               1A-B

      locations_stationIdx locations_track number    name        date
    0                    0          R 3932         R 3932  01.10.2016
    1                    1                         R 3932  01.10.2016
    2                   22                         R 3932  01.10.2016
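For a file with one JSON object per line, as described in the question's edit, a single json.load call would not parse the whole file. A minimal sketch, assuming every line is a complete record of the structure shown in the question: parse the lines one by one and pass the resulting list to json_normalize. The io.StringIO stand-in for the file and the shortened sample records (including the second train, "R 3934") are made up for illustration.

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for a file with one JSON object per line.
# The records are shortened and the second one is invented for illustration.
data_file = io.StringIO(
    '{"number": "", "date": "01.10.2016", "name": "R 3932", "locations": ['
    '{"name": "Spital am Pyhrn Bahnhof", "depTime": "06:32"}, '
    '{"name": "Windischgarsten Bahnhof", "depTime": "06:40"}]}\n'
    '{"number": "", "date": "02.10.2016", "name": "R 3934", "locations": ['
    '{"name": "Linz/Donau Hbf", "depTime": ""}]}\n'
)

# Parse every non-empty line as its own JSON document.
records = [json.loads(line) for line in data_file if line.strip()]

# One output row per entry in "locations"; the top-level fields are repeated
# on each row. record_prefix avoids a clash between the two "name" fields.
df = pd.json_normalize(
    records,
    record_path="locations",
    meta=["date", "number", "name"],
    record_prefix="locations_",
)
print(df)
```

With a real file, `data_file` would come from `open(...)` instead of `io.StringIO`.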

EDIT:

You can use read_json, extract the name values with the DataFrame constructor, and finally use groupby with apply and join:

    df = pd.read_json("myJson.json")
    df.locations = pd.DataFrame(df.locations.values.tolist())['name']
    df = df.groupby(['date','name','number'])['locations'].apply(','.join).reset_index()
    print(df)

            date    name number                                          locations
    0 2016-01-10  R 3932         Spital am Pyhrn Bahnhof,Windischgarsten Bahnho...
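If the file holds one JSON object per line, a possible variant is to let read_json split the records itself with lines=True and then join the location names per row. This is only a sketch under that assumption: the in-memory sample and the second record are made up, and convert_dates=False and dtype=False are passed so that values such as "01.10.2016" stay as the original strings instead of being parsed as dates.

```python
import io

import pandas as pd

# Hypothetical stand-in for a file with one JSON object per line.
# The records are shortened and the second one is invented for illustration.
json_lines = io.StringIO(
    '{"number": "", "date": "01.10.2016", "name": "R 3932", "locations": ['
    '{"name": "Spital am Pyhrn Bahnhof", "depTime": "06:32"}, '
    '{"name": "Windischgarsten Bahnhof", "depTime": "06:40"}]}\n'
    '{"number": "", "date": "02.10.2016", "name": "R 3934", "locations": ['
    '{"name": "Linz/Donau Hbf", "depTime": ""}]}\n'
)

# lines=True reads one record per line, so each train becomes one row and
# "locations" holds a list of dicts. convert_dates=False and dtype=False
# leave "date" and the other fields as plain strings.
df = pd.read_json(json_lines, lines=True, convert_dates=False, dtype=False)

# Replace each list of location dicts with the joined station names.
df["locations"] = df["locations"].apply(
    lambda locs: "; ".join(loc["name"] for loc in locs)
)
print(df[["number", "date", "name", "locations"]])
```

The "; " separator matches the loop in the question. No groupby is needed here because each line already corresponds to one row.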

answered Sep 21 '22 23:09

jezrael
