Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to read and normalize following json in pandas?

I have seen many json reading problems in stackoverflow using pandas, but still I could not manage to solve this simple problem.

Data

{"session_id":{"0":["X061RFWB06K9V"],"1":["5AZ2X2A9BHH5U"]},"unix_timestamp":{"0":[1442503708],"1":[1441353991]},"cities":{"0":["New York NY, Newark NJ"],"1":["New York NY, Jersey City NJ, Philadelphia PA"]},"user":{"0":[[{"user_id":2024,"joining_date":"2015-03-22","country":"UK"}]],"1":[[{"user_id":2853,"joining_date":"2015-03-28","country":"DE"}]]}}

My attempt

import numpy as np
import pandas as pd
import json
from pandas.io.json import json_normalize

# attempt1
df = pd.read_json('a.json')

# attempt2
with open('a.json') as fi:
    data = json.load(fi)
    df = json_normalize(data,record_path='user',meta=['session_id','unix_timestamp','cities'])

Both of them do not give me the required output.

Required output

      session_id unix_timestamp       cities  user_id joining_date country 
0  X061RFWB06K9V     1442503708  New York NY     2024   2015-03-22      UK   
0  X061RFWB06K9V     1442503708    Newark NJ     2024   2015-03-22      UK 

Preferred method

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.io.json.json_normalize.html

I would love to see implementation of pd.io.json.json_normalize

pandas.io.json.json_normalize(data: Union[Dict, List[Dict]], record_path: Union[str, List, NoneType] = None, meta: Union[str, List, NoneType] = None, meta_prefix: Union[str, NoneType] = None, record_prefix: Union[str, NoneType] = None, errors: Union[str, NoneType] = 'raise', sep: str = '.', max_level: Union[int, NoneType] = None)

Related links

  • Pandas explode list of dictionaries into rows
  • How to normalize json correctly by Python Pandas
  • JSON to pandas DataFrame
like image 603
BhishanPoudel Avatar asked Dec 03 '22 17:12

BhishanPoudel


1 Answers

Here is another way:

df = pd.read_json(r'C:\path\file.json')

final=df.stack().str[0].unstack()
final=final.assign(cities=final['cities'].str.split(',')).explode('cities')
final=final.assign(**pd.DataFrame(final.pop('user').str[0].tolist()))
print(final)

      session_id unix_timestamp            cities  user_id joining_date  \
0  X061RFWB06K9V     1442503708       New York NY     2024   2015-03-22   
0  X061RFWB06K9V     1442503708         Newark NJ     2024   2015-03-22   
1  5AZ2X2A9BHH5U     1441353991       New York NY     2024   2015-03-22   
1  5AZ2X2A9BHH5U     1441353991    Jersey City NJ     2024   2015-03-22   
1  5AZ2X2A9BHH5U     1441353991   Philadelphia PA     2024   2015-03-22   

  country  
0      UK  
0      UK  
1      UK  
1      UK  
1      UK  
like image 94
anky Avatar answered Dec 09 '22 15:12

anky