
Python - How to convert an array of json objects to a Dataframe?

I'm completely new to Python and need a little help filtering my JSON.

json = {
    "selection":[
        {
            "person_id":105894,
            "position_id":1,
            "label":"Work",
            "description":"A description",
            "startDate":"2017-07-16T19:20:30+01:00",
            "stopDate":"2017-07-16T20:20:30+01:00"
        },
        {
            "person_id":945123,
            "position_id":null,
            "label":"Illness",
            "description":"A description",
            "startDate":"2017-07-17T19:20:30+01:00",
            "stopDate":"2017-07-17T20:20:30+01:00"
        }
    ]
}

Concretely, what I'm trying to do is transform my JSON (shown above) into a DataFrame so that I can use the query methods on it, like:

selected_person_id = 105894
query_person_id = json[(json['person_id'] == selected_person_id)]
or
json.query('person_id <= 105894')

The columns must be:

cols = ['person_id', 'position_id', 'label', 'description', 'startDate', 'stopDate']

How can I do it?

Louis W. asked Oct 24 '17 12:10




2 Answers

Use:

import pandas as pd

df = pd.DataFrame(json['selection'])
print(df)
     description    label  person_id  position_id                  startDate  \
0  A description     Work     105894          1.0  2017-07-16T19:20:30+01:00   
1  A description  Illness     945123          NaN  2017-07-17T19:20:30+01:00   

                    stopDate  
0  2017-07-16T20:20:30+01:00  
1  2017-07-17T20:20:30+01:00  
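
Once the DataFrame exists, the two filters from the question can be applied directly. The following is a minimal sketch rather than a definitive recipe: the name `payload` is an assumption (chosen so the `json` module is not shadowed), and `None` stands in for the JSON `null`.

```python
import pandas as pd

# Same records as in the question; None plays the role of JSON null.
payload = {
    "selection": [
        {"person_id": 105894, "position_id": 1, "label": "Work",
         "description": "A description",
         "startDate": "2017-07-16T19:20:30+01:00",
         "stopDate": "2017-07-16T20:20:30+01:00"},
        {"person_id": 945123, "position_id": None, "label": "Illness",
         "description": "A description",
         "startDate": "2017-07-17T19:20:30+01:00",
         "stopDate": "2017-07-17T20:20:30+01:00"},
    ]
}

cols = ['person_id', 'position_id', 'label', 'description', 'startDate', 'stopDate']

# Passing columns= fixes the column order requested in the question.
df = pd.DataFrame(payload["selection"], columns=cols)

# Boolean-mask filter, as in the question's first attempt.
selected_person_id = 105894
by_mask = df[df["person_id"] == selected_person_id]

# String-expression filter, as in the question's second attempt.
by_query = df.query("person_id <= 105894")

print(by_mask)
print(by_query)
```

Both filters return a new DataFrame and leave `df` unchanged.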

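EDIT:

If the JSON is stored in a file, load it first:

import json
import pandas as pd

with open('file.json') as data_file:
    data = json.load(data_file)  # a separate name keeps the json module usable

df = pd.DataFrame(data['selection'])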
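
When the JSON arrives as text (from a file or a string), the parser turns `null` into `None`, and pandas stores that as `NaN`. This is why `position_id` is printed as `1.0` in the output above. The sketch below illustrates this and one possible way to get integers back, assuming a pandas version with the nullable `Int64` dtype; the `raw` string is an abbreviated, hypothetical copy of the question's data.

```python
import json

import pandas as pd

# Abbreviated copy of the question's data as JSON text (assumed for illustration).
raw = """
{"selection": [
    {"person_id": 105894, "position_id": 1, "label": "Work"},
    {"person_id": 945123, "position_id": null, "label": "Illness"}
]}
"""

data = json.loads(raw)  # JSON null becomes Python None
df = pd.DataFrame(data["selection"])

# The missing value forces position_id to float64 (1.0 and NaN).
print(df.dtypes)

# The nullable integer dtype keeps whole numbers and marks the gap as <NA>.
df["position_id"] = df["position_id"].astype("Int64")
print(df)
```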
jezrael answered Oct 06 '22 01:10



For more complicated examples where the structure needs to be flattened, use json_normalize:

>>> data = [{'state': 'Florida',
...          'shortname': 'FL',
...          'info': {
...               'governor': 'Rick Scott'
...          },
...          'counties': [{'name': 'Dade', 'population': 12345},
...                      {'name': 'Broward', 'population': 40000},
...                      {'name': 'Palm Beach', 'population': 60000}]},
...         {'state': 'Ohio',
...          'shortname': 'OH',
...          'info': {
...               'governor': 'John Kasich'
...          },
...          'counties': [{'name': 'Summit', 'population': 1234},
...                       {'name': 'Cuyahoga', 'population': 1337}]}]
>>> from pandas import json_normalize
>>> result = json_normalize(data, 'counties', ['state', 'shortname',
...                                           ['info', 'governor']])
>>> result
         name  population info.governor    state shortname
0        Dade       12345    Rick Scott  Florida        FL
1     Broward       40000    Rick Scott  Florida        FL
2  Palm Beach       60000    Rick Scott  Florida        FL
3      Summit        1234   John Kasich     Ohio        OH
4    Cuyahoga        1337   John Kasich     Ohio        OH
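
The same function can be applied to records shaped like those in the question. The sketch below assumes pandas 1.0 or later, where it is available as `pd.json_normalize`. The nested `period` field is hypothetical, added only to show how nested keys become dotted column names; the question's real records are flat.

```python
import pandas as pd

# Records loosely based on the question; the nested "period" dict is a
# hypothetical variation used to demonstrate flattening.
records = [
    {"person_id": 105894, "label": "Work",
     "period": {"start": "2017-07-16T19:20:30+01:00",
                "stop": "2017-07-16T20:20:30+01:00"}},
    {"person_id": 945123, "label": "Illness",
     "period": {"start": "2017-07-17T19:20:30+01:00",
                "stop": "2017-07-17T20:20:30+01:00"}},
]

# Nested dictionaries are expanded into columns named parent.child.
df = pd.json_normalize(records)
print(df.columns.tolist())

# The flattened frame supports the same filters as any other DataFrame.
print(df[df["person_id"] == 105894])
```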
Nickpick answered Oct 05 '22 23:10
