I would like to convert a JSON file to a pandas DataFrame.
My JSON looks like this:
{ 
   "country1":{ 
      "AdUnit1":{ 
         "floor_price1":{ 
            "feature1":1111,
            "feature2":1112
         },
         "floor_price2":{ 
            "feature1":1121
         }
      },
      "AdUnit2":{ 
         "floor_price1":{ 
            "feature1":1211
         },
         "floor_price2":{ 
            "feature1":1221
         }
      }
   },
   "country2":{ 
      "AdUnit1":{ 
         "floor_price1":{ 
            "feature1":2111,
            "feature2":2112
         }
      }
   }
}
I read the file from GCP using this code:
project = Context.default().project_id
sample_bucket_name = 'my_bucket'
sample_bucket_path = 'gs://' + sample_bucket_name
print('Object: ' + sample_bucket_path + '/json_output.json')
sample_bucket = storage.Bucket(sample_bucket_name)
sample_bucket.create()
sample_bucket.exists()
sample_object = sample_bucket.object('json_output.json')
list(sample_bucket.objects())
json = sample_object.read_stream()
My goal is to get a pandas DataFrame that looks like:

I tried using json_normalize, but didn't succeed.
You can convert JSON to a pandas DataFrame with the json_normalize(), read_json() and DataFrame.from_dict() functions. Some of these can also read the data straight from a JSON file. JSON stands for JavaScript Object Notation and is used for sharing data between servers and web applications.
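As a rough illustration of json_normalize() on the nested structure from the question, the sketch below flattens the dictionary into one wide row and then reshapes it into one row per leaf value. The column names country, ad_unit, floor_price, feature and value are assumptions made for this example, and splitting on "." only works because none of the keys contain a dot.

```python
import pandas as pd

data = {
    "country1": {
        "AdUnit1": {
            "floor_price1": {"feature1": 1111, "feature2": 1112},
            "floor_price2": {"feature1": 1121},
        },
        "AdUnit2": {
            "floor_price1": {"feature1": 1211},
            "floor_price2": {"feature1": 1221},
        },
    },
    "country2": {
        "AdUnit1": {"floor_price1": {"feature1": 2111, "feature2": 2112}}
    },
}

# json_normalize flattens nested dicts into a single row whose column
# names are the key paths joined with "." (the default separator).
wide = pd.json_normalize(data)

# Reshape the single wide row into one row per leaf value.
long = wide.iloc[0].rename("value")
long.index = pd.MultiIndex.from_tuples(
    [tuple(col.split(".")) for col in long.index],
    # Hypothetical level names, not taken from the question.
    names=["country", "ad_unit", "floor_price", "feature"],
)
df = long.reset_index()
print(df)
```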
The pandas read_json() function reads a JSON file or string into a DataFrame. It supports several JSON layouts through the orient parameter.
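A small sketch of the orient parameter, using a made-up two-row JSON document rather than the data from the question. With orient="index" the outer keys become the row labels and the inner keys become the columns. The text is wrapped in StringIO because recent pandas versions warn when a literal JSON string is passed directly.

```python
from io import StringIO

import pandas as pd

# Made-up sample: outer keys are row labels, inner keys are columns.
text = '{"row1": {"a": 1, "b": 2}, "row2": {"a": 3, "b": 4}}'

df = pd.read_json(StringIO(text), orient="index")
print(df)
```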
Parse JSON: convert from JSON to Python. If you have a JSON string, you can parse it with the json.loads() method. For a JSON object, the result is a Python dictionary.
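For instance, a fragment of the JSON from the question can be parsed as follows. The variable names raw and data are only illustrative.

```python
import json

# A fragment of the question's JSON, held as a string.
raw = '{"country2": {"AdUnit1": {"floor_price1": {"feature1": 2111, "feature2": 2112}}}}'

data = json.loads(raw)  # a JSON object becomes a Python dict
print(type(data).__name__)
print(data["country2"]["AdUnit1"]["floor_price1"]["feature2"])
```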
Nested JSONs are always quite tricky to handle correctly.
A few months ago, I figured out a way to provide a "universal answer" using the beautifully written flatten_json_iterative_solution, which iteratively unpacks each level of a given JSON object.
The flattened result can then be turned into a pandas Series and then a DataFrame like so:
df = pd.Series(flatten_json_iterative_solution(dict(json_))).to_frame().reset_index()
Intermediate Dataframe result
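A simple transformation then splits the index into the columns you asked for. Note that splitting on '_' also breaks a key such as floor_price1 into floor and price1, which is why five columns are needed: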
df[["index", "col1", "col2", "col3", "col4"]] = df['index'].apply(lambda x: pd.Series(x.split('_')))
Final result
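The body of flatten_json_iterative_solution is not shown in this answer, so here is a minimal sketch of the same idea with a hypothetical stand-in, flatten_iteratively. It is not the original function. It uses an explicit stack instead of recursion and keeps each key path as a tuple, which avoids splitting keys such as floor_price1 on the underscore. The column names are assumptions made for this example.

```python
import pandas as pd


def flatten_iteratively(nested):
    """Hypothetical stand-in: map each leaf value to its tuple of keys."""
    flat = {}
    stack = [((), nested)]  # (path so far, value still to unpack)
    while stack:
        path, value = stack.pop()
        if isinstance(value, dict):
            for key, child in value.items():
                stack.append((path + (key,), child))
        else:
            flat[path] = value
    return flat


data = {
    "country1": {
        "AdUnit1": {
            "floor_price1": {"feature1": 1111, "feature2": 1112},
            "floor_price2": {"feature1": 1121},
        },
        "AdUnit2": {
            "floor_price1": {"feature1": 1211},
            "floor_price2": {"feature1": 1221},
        },
    },
    "country2": {
        "AdUnit1": {"floor_price1": {"feature1": 2111, "feature2": 2112}}
    },
}

# Tuple keys become a MultiIndex; sorting restores a stable row order.
series = pd.Series(flatten_iteratively(data)).sort_index()
df = series.rename_axis(
    ["country", "ad_unit", "floor_price", "feature"]  # assumed names
).reset_index(name="value")
print(df)
```

This sketch assumes every leaf sits at the same depth, as in the question's data. Paths of different lengths would need padding before they can form a MultiIndex.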
You could use this:
def flatten_dict(d):
    """ Returns list of lists from given dictionary """
    l = []
    for k, v in sorted(d.items()):
        if isinstance(v, dict):
            flatten_v = flatten_dict(v)
            for my_l in reversed(flatten_v):
                my_l.insert(0, k)
            l.extend(flatten_v)
        elif isinstance(v, list):
            for l_val in v:
                l.append([k, l_val])
        else:
            l.append([k, v])
    return l
This function receives a dictionary (including nesting where values could also be lists) and flattens it to a list of lists.
Then you can simply build the DataFrame from it:
df = pd.DataFrame(flatten_dict(my_dict))
Here my_dict is your JSON object, parsed into a Python dictionary.
Taking your example, what you get when you run print(df) is:
          0        1             2         3     4
0  country1  AdUnit1  floor_price1  feature1  1111
1  country1  AdUnit1  floor_price1  feature2  1112
2  country1  AdUnit1  floor_price2  feature1  1121
3  country1  AdUnit2  floor_price1  feature1  1211
4  country1  AdUnit2  floor_price2  feature1  1221
5  country2  AdUnit1  floor_price1  feature1  2111
6  country2  AdUnit1  floor_price1  feature2  2112
When you create the DataFrame, you can also name the columns and the index, for example through the columns and index arguments of pd.DataFrame.
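A short sketch of naming the columns at construction time, using three of the rows printed above. The column names and the index name "row" are assumptions chosen for the example.

```python
import pandas as pd

# Three of the rows produced by flatten_dict for the example JSON.
rows = [
    ["country1", "AdUnit1", "floor_price1", "feature1", 1111],
    ["country1", "AdUnit1", "floor_price1", "feature2", 1112],
    ["country2", "AdUnit1", "floor_price1", "feature1", 2111],
]

df = pd.DataFrame(
    rows,
    columns=["country", "ad_unit", "floor_price", "feature", "value"],
)
df.index.name = "row"  # optional label for the index
print(df)
```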