Here is a JSON string that contains a list of objects with each having another list embedded.
[
{
"name": "Alice",
"hobbies": [
"volleyball",
"shopping",
"movies"
]
},
{
"name": "Bob",
"hobbies": [
"fishing",
"movies"
]
}
]
Using pandas.read_json()
this turns into a DataFrame like this:
name hobbies
--------------------------------------
1 Alice [volleyball, shopping, movies]
2 Bob [fishing, movies]
However, I would like to flatten the lists into Boolean columns like this:
name volleyball shopping movies fishing
----------------------------------------------------
1 Alice True True True False
2 Bob False False True True
I.e. when the list contains a value, the field in the corresponding column is filled with a Boolean True
, otherwise with False
.
I have also looked into pandas.io.json.json_normalize()
, but that does not seem support this idea either. Is there any built-in way (either Python3, or pandas) to do this?
(PS. I realize that you can cook up your own code to 'normalize' the dictionary objects before loading the whole list into a DataFrame, but I might be reinventing the wheel with that and probably in a very inefficient way).
you can do the following:
In [56]: data = [
....: {
....: "name": "Alice",
....: "hobbies": [
....: "volleyball",
....: "shopping",
....: "movies"
....: ]
....: },
....: {
....: "name": "Bob",
....: "hobbies": [
....: "fishing",
....: "movies"
....: ]
....: }
....: ]
In [57]: df = pd.io.json.json_normalize(data, 'hobbies', ['name']).rename(columns={0:'hobby'})
In [59]: df['count'] = 1
In [60]: df
Out[60]:
hobby name count
0 volleyball Alice 1
1 shopping Alice 1
2 movies Alice 1
3 fishing Bob 1
4 movies Bob 1
In [61]: df.pivot_table(index='name', columns='hobby', values='count').fillna(0)
Out[61]:
hobby fishing movies shopping volleyball
name
Alice 0.0 1.0 1.0 1.0
Bob 1.0 1.0 0.0 0.0
Or even better:
In [88]: r = df.pivot_table(index='name', columns='hobby', values='count').fillna(0)
In [89]: r
Out[89]:
hobby fishing movies shopping volleyball
name
Alice 0.0 1.0 1.0 1.0
Bob 1.0 1.0 0.0 0.0
let's generate list of 'boolean' columns dynamically
In [90]: cols_boolean = [c for c in r.columns.tolist() if c != 'name']
In [91]: r = r[cols_boolean].astype(bool)
In [92]: print(r)
hobby fishing movies shopping volleyball
name
Alice False True True True
Bob True True False False
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With