Convert JSON file to Pandas dataframe

I would like to convert a JSON file to a Pandas dataframe.

My JSON looks like:

{ 
   "country1":{ 
      "AdUnit1":{ 
         "floor_price1":{ 
            "feature1":1111,
            "feature2":1112
         },
         "floor_price2":{ 
            "feature1":1121
         }
      },
      "AdUnit2":{ 
         "floor_price1":{ 
            "feature1":1211
         },
         "floor_price2":{ 
            "feature1":1221
         }
      }
   },
   "country2":{ 
      "AdUnit1":{ 
         "floor_price1":{ 
            "feature1":2111,
            "feature2":2112
         }
      }
   }
}

I read the file from GCP using this code:

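# Assumed imports (Google Cloud Datalab); the original snippet did not show them
from google.datalab import Context
import google.datalab.storage as storage

project = Context.default().project_id
sample_bucket_name = 'my_bucket'
sample_bucket_path = 'gs://' + sample_bucket_name
print('Object: ' + sample_bucket_path + '/json_output.json')

sample_bucket = storage.Bucket(sample_bucket_name)
sample_bucket.create()
sample_bucket.exists()

sample_object = sample_bucket.object('json_output.json')
list(sample_bucket.objects())
# Note: this variable name shadows the standard-library json module
json = sample_object.read_stream()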

My goal is to get a Pandas dataframe that looks like this:

[image: Given dataframe]

I tried using json_normalize, but didn't succeed.

Alexandr Fruman asked Nov 04 '19 13:11



2 Answers

Nested JSON is always quite tricky to handle correctly.

A few months ago, I figured out a way to provide a "universal answer" using the beautifully written third-party function flatten_json_iterative_solution, which iteratively unpacks each level of a given JSON object into a flat dictionary.

One can then simply turn the result into a pandas Series and then into a pandas DataFrame, like so:

df = pd.Series(flatten_json_iterative_solution(dict(json_))).to_frame().reset_index()

[image: Intermediate Dataframe result]
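
The flatten_json_iterative_solution function itself is not shown in this answer. As a rough stand-in, here is a minimal sketch of level-by-level flattening. The name flatten_iteratively, the sep parameter, and the sample data are assumptions made for illustration; this is not the referenced function.

```python
import pandas as pd


def flatten_iteratively(nested, sep="_"):
    """Hypothetical helper: flatten nested dicts one level per pass, joining keys with sep."""
    flat = dict(nested)
    # Keep unpacking until no value is a dict any more.
    while any(isinstance(value, dict) for value in flat.values()):
        next_level = {}
        for key, value in flat.items():
            if isinstance(value, dict):
                # Promote each child one level up under a joined key.
                for sub_key, sub_value in value.items():
                    next_level[f"{key}{sep}{sub_key}"] = sub_value
            else:
                next_level[key] = value
        flat = next_level
    return flat


# A trimmed-down version of the JSON from the question.
sample = {
    "country1": {"AdUnit1": {"floor_price1": {"feature1": 1111, "feature2": 1112}}},
    "country2": {"AdUnit1": {"floor_price1": {"feature1": 2111}}},
}

flat = flatten_iteratively(sample)
# Same Series -> DataFrame step as in the answer: the joined keys end up in an "index" column.
df = pd.Series(flat).to_frame().reset_index()
```

Each pass removes one level of nesting, so the loop runs as many times as the JSON is deep.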

A small transformation then splits the index into the columns you asked for:

df[["index", "col1", "col2", "col3", "col4"]] = df['index'].apply(lambda x: pd.Series(x.split('_')))

[image: Final result]
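
One detail of the split above is worth spelling out: keys such as floor_price1 contain the separator themselves, so each flattened key breaks into five parts rather than four. The sketch below shows this with pandas' vectorized Series.str.split, used here as an alternative to apply; the column names and the two sample rows are assumptions for illustration.

```python
import pandas as pd

# Two flattened keys in the shape produced by the previous step (sample data).
df = pd.DataFrame({
    "index": [
        "country1_AdUnit1_floor_price1_feature1",
        "country2_AdUnit1_floor_price1_feature2",
    ],
    0: [1111, 2112],
})

# expand=True returns one column per piece instead of a list per row.
parts = df["index"].str.split("_", expand=True)
# "floor_price1" is cut into "floor" and "price1", hence five columns.
parts.columns = ["country", "ad_unit", "floor", "price", "feature"]

# Glue the two halves back together if a single floor-price column is wanted.
parts["floor_price"] = parts["floor"] + "_" + parts["price"]
```

Choosing a separator that never occurs in the keys when flattening would avoid the extra column altogether.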

Luc Bertin answered Sep 30 '22 09:09


You could use this:

def flatten_dict(d):
    """ Returns list of lists from given dictionary """
    l = []
    for k, v in sorted(d.items()):
        if isinstance(v, dict):
            flatten_v = flatten_dict(v)
            for my_l in reversed(flatten_v):
                my_l.insert(0, k)

            l.extend(flatten_v)

        elif isinstance(v, list):
            for l_val in v:
                l.append([k, l_val])

        else:
            l.append([k, v])

    return l

This function receives a dictionary (including nesting where values could also be lists) and flattens it to a list of lists.

Then, you can simply:

df = pd.DataFrame(flatten_dict(my_dict))

where my_dict is your JSON loaded into a Python dictionary. With your example, print(df) gives:

          0        1             2         3     4
0  country1  AdUnit1  floor_price1  feature1  1111
1  country1  AdUnit1  floor_price1  feature2  1112
2  country1  AdUnit1  floor_price2  feature1  1121
3  country1  AdUnit2  floor_price1  feature1  1211
4  country1  AdUnit2  floor_price2  feature1  1221
5  country2  AdUnit1  floor_price1  feature1  2111
6  country2  AdUnit1  floor_price1  feature2  2112
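
Note that flatten_dict expects a Python dictionary, not raw JSON text. If the file contents arrive as a string or as bytes, as the read from the bucket in the question presumably does, they can be parsed first with the standard json module. A minimal sketch, with the raw value made up for illustration:

```python
import json

# Stand-in for the raw contents read from the bucket (assumed to be bytes or str).
raw = b'{"country2": {"AdUnit1": {"floor_price1": {"feature1": 2111, "feature2": 2112}}}}'

# json.loads accepts str as well as UTF-8 encoded bytes and returns nested dicts.
my_dict = json.loads(raw)
```

Keep the raw contents in a variable that is not itself called json, otherwise the module name is shadowed and json.loads is no longer reachable.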

When you create the dataframe, you can also name your columns and set an index.

Zionsof answered Sep 30 '22 11:09