How to convert nested json structure to dataframe

Tags: python, json, python-3.x, pandas, dataframe

I converted a JSON document into a DataFrame and ended up with a column 'Structure_value' whose values are lists of one or more dictionaries:

                   Structure_value
[{'Room': 6, 'Length': 7}, {'Room': 6, 'Length': 7}]
[{'Room': 6, 'Length': 22}]
[{'Room': 6, 'Length': 8}, {'Room': 6, 'Length': 9}]

Since the values are objects, I guess that is why the column ended up in this format.

I need to split it into the four columns below:

Structure_value_room_1
Structure_value_length_1
Structure_value_room_2
Structure_value_length_2

All other solutions on StackOverflow only deal with converting simple JSON into a DataFrame, not a nested structure.

P.S.: I know I could do this by explicitly naming the fields, but I need a generic solution so that any future JSON of this format can be handled.

[Edit]: The output should look like this:

   Structure_value_room_1  Structure_value_length_1  Structure_value_room_2  \
0                       6                         7                     6.0   
1                       6                        22                     NaN   
2                       6                         8                     6.0   

   Structure_value_length_2  
0                       7.0  
1                       NaN  
2                       9.0

asked Nov 11 '19 12:11

IceBurger

1 Answer

Use a list comprehension with a nested dictionary comprehension, using enumerate to make the keys of the dicts unique, then pass the resulting list of dictionaries to the DataFrame constructor:

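import pandas as pd

# One flat dict per row; enumerate() numbers each inner dict so keys stay unique
L = [ {f"{k}_{i}": v for i, y in enumerate(x, 1) 
                     for k, v in y.items()}
                     for x in df["Structure_value"] ]
# Use a new name so the original df stays available for later steps
df_flat = pd.DataFrame(L)
print(df_flat)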

   Room_1  Length_1  Room_2  Length_2
0       6         7     6.0       7.0
1       6        22     NaN       NaN
2       6         8     6.0       9.0
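To try the comprehension above in isolation, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the values in the question, and the names `rows` and `flat` are only illustrative. It also shows why the `_2` columns print as `6.0` rather than `6`: rows with a single dictionary leave those cells as NaN, so pandas stores the column as float.

```python
import pandas as pd

# Sample data rebuilt by hand from the question's 'Structure_value' column.
df = pd.DataFrame({
    "Structure_value": [
        [{"Room": 6, "Length": 7}, {"Room": 6, "Length": 7}],
        [{"Room": 6, "Length": 22}],
        [{"Room": 6, "Length": 8}, {"Room": 6, "Length": 9}],
    ]
})

# One flat dict per row: the position of each inner dict (starting at 1)
# is appended to its keys so that repeated keys do not collide.
rows = [
    {f"{k}_{i}": v for i, y in enumerate(x, 1) for k, v in y.items()}
    for x in df["Structure_value"]
]
flat = pd.DataFrame(rows)

# Columns that are missing in some rows contain NaN and become float64;
# columns present in every row keep their integer dtype.
print(flat.dtypes)
```

If integer display matters, casting the affected columns to the nullable `"Int64"` dtype is one option, at the cost of working with pandas' extension dtypes downstream.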

For the column names from the question, use:

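def json_to_df(df, column):
    # Prefix each key with the column name and lowercase it,
    # e.g. 'Room' in the first dict becomes 'Structure_value_room_1'
    L = [ {f"{column}_{k.lower()}_{i}": v for i, y in enumerate(x, 1) 
                         for k, v in y.items()}
                         for x in df[column] ]
    return pd.DataFrame(L)


df1 = json_to_df(df, 'Structure_value')
print(df1)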
   Structure_value_room_1  Structure_value_length_1  Structure_value_room_2  \
0                       6                         7                     6.0   
1                       6                        22                     NaN   
2                       6                         8                     6.0   

   Structure_value_length_2  
0                       7.0  
1                       NaN  
2                       9.0
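The function above returns only the expanded columns, with a fresh 0..n-1 index. If the original frame has other columns that should be kept, one possible variation is sketched below. The helper name `expand_list_column`, the extra `id` column, and the custom index are all hypothetical and exist only for illustration; the idea is to build the wide frame on the original index and join it back.

```python
import pandas as pd


def expand_list_column(df, column):
    """Hypothetical helper: flatten a column of lists of dicts and join it back."""
    rows = [
        {f"{column}_{k.lower()}_{i}": v
         for i, d in enumerate(cell, 1)
         for k, v in d.items()}
        for cell in df[column]
    ]
    # Reuse the original index so the join lines rows up correctly
    # even when the index is not 0..n-1.
    wide = pd.DataFrame(rows, index=df.index)
    return df.drop(columns=column).join(wide)


# Illustrative input: 'id' and the index values are made up for this sketch.
df = pd.DataFrame(
    {
        "id": ["a", "b", "c"],
        "Structure_value": [
            [{"Room": 6, "Length": 7}, {"Room": 6, "Length": 7}],
            [{"Room": 6, "Length": 22}],
            [{"Room": 6, "Length": 8}, {"Room": 6, "Length": 9}],
        ],
    },
    index=[10, 20, 30],
)

result = expand_list_column(df, "Structure_value")
print(result)
```

Passing `index=df.index` is the important detail here: without it, a frame that had been filtered or re-indexed earlier would join on mismatched labels and produce NaN rows.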

answered Oct 14 '22 20:10

jezrael
