 

Convert list of dictionaries containing another list of dictionaries to dataframe

I tried to look for a solution but was unable to find one. I have the following output from an API in Python:

insights = [ <Insights> {
    "account_id": "1234",
    "actions": [
        {
            "action_type": "add_to_cart",
            "value": "8"
        },
        {
            "action_type": "purchase",
            "value": "2"
        }
    ],
    "cust_id": "xyz123",
    "cust_name": "xyz",
}, <Insights> {
    "account_id": "1234",
    "cust_id": "pqr123",
    "cust_name": "pqr",
},  <Insights> {
    "account_id": "1234",
    "actions": [
        {
            "action_type": "purchase",
            "value": "45"
        }
    ],
    "cust_id": "abc123",
    "cust_name": "abc",
}
]

I want a DataFrame like this:

account_id  add_to_cart  purchase  cust_id  cust_name
1234                  8         2  xyz123   xyz
1234                               pqr123   pqr
1234                           45  abc123   abc

When I use the following:

insights_1 = [x for x in insights]
df = pd.DataFrame(insights_1)

I get this:

account_id  actions                                                                                    cust_id  cust_name
1234        [{'value': '8', 'action_type': 'add_to_cart'}, {'value': '2', 'action_type': 'purchase'}]  xyz123   xyz
1234        NaN                                                                                        pqr123   pqr
1234        [{'value': '45', 'action_type': 'purchase'}]                                               abc123   abc
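To reproduce this without calling the API, here is a minimal sketch. It assumes each `<Insights>` object can be treated as a plain dict with the same keys (an assumption; the real SDK objects may behave differently). It shows that `pd.DataFrame` leaves `actions` as a column of Python lists, with NaN for the record that has no `actions` key.

```python
import pandas as pd

# Plain-dict stand-in for the API output (assumes <Insights> behaves like a dict)
insights = [
    {"account_id": "1234",
     "actions": [{"action_type": "add_to_cart", "value": "8"},
                 {"action_type": "purchase", "value": "2"}],
     "cust_id": "xyz123", "cust_name": "xyz"},
    {"account_id": "1234", "cust_id": "pqr123", "cust_name": "pqr"},
    {"account_id": "1234",
     "actions": [{"action_type": "purchase", "value": "45"}],
     "cust_id": "abc123", "cust_name": "abc"},
]

df = pd.DataFrame(insights)

# 'actions' stays nested: a list of dicts per row, or NaN where the key is missing
print(df['actions'].tolist())
```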

How do I move ahead with this?

asked Apr 30 '18 at 18:04 by raul0002



2 Answers

This is one solution.

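import numpy as np
import pandas as pd

df = pd.DataFrame(insights)

# One single-row DataFrame per entry of 'actions'.
# x == x is False only when x is NaN (the record had no 'actions' key).
parts = [pd.DataFrame({d['action_type']: d['value'] for d in x}, index=[0])
         if x == x else pd.DataFrame({'add_to_cart': [np.nan], 'purchase': [np.nan]})
         for x in df['actions']]

# Replace the nested column with the flattened columns
df = df.drop('actions', axis=1)\
       .join(pd.concat(parts, axis=0, ignore_index=True))

print(df)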

  account_id cust_id cust_name add_to_cart purchase
0       1234  xyz123       xyz           8        2
1       1234  pqr123       pqr         NaN      NaN
2       1234  abc123       abc         NaN       45

Explanation

  • Utilise pandas to read the outer list of dictionaries into a dataframe.
  • For the inner dictionaries, use a list comprehension together with a dictionary comprehension.
  • Account for missing (NaN) entries by testing x == x within the list comprehension: NaN is the only value that is not equal to itself.
  • Concatenate and join the parts to the original dataframe.
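The equality test in the third bullet works because NaN compares unequal to itself, while a list always equals itself. A quick check in plain Python (no pandas needed), with an explicit type check shown as an equivalent that some may find easier to read:

```python
import math

missing = float('nan')   # pandas fills absent 'actions' entries with NaN
present = [{'action_type': 'purchase', 'value': '45'}]

# NaN is the only value that is not equal to itself...
print(missing == missing)        # False
# ...while a list of dicts always equals itself
print(present == present)        # True
print(math.isnan(missing))       # True

# An explicit type check gives the same answer
print(isinstance(missing, list), isinstance(present, list))   # False True
```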

Explanation - detail

This details the construction and use of parts:

  1. Take each entry in df['actions']; each entry is either a list of dictionaries or NaN.
  2. Iterate them one by one, i.e. by row, in a for loop.
  3. The else part says "if it is np.nan [i.e. null] then return a dataframe of nans". The if part takes the list of dictionaries and creates a mini-dataframe for each row.
  4. The next statement concatenates these mini-dataframes, one for each row, and joins the result to the original dataframe.
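Step 4 relies on `pd.concat` aligning columns by name: a mini-dataframe that only has `purchase` still lines up with one that has both columns, and the missing cell is filled with NaN. A minimal sketch with two hand-built rows (the names `row_a`, `row_b` and `combined` are just for illustration):

```python
import pandas as pd

# Two single-row frames with different columns, like the 'parts' list
row_a = pd.DataFrame({'add_to_cart': '8', 'purchase': '2'}, index=[0])
row_b = pd.DataFrame({'purchase': '45'}, index=[0])

# Columns are matched by name; cells a frame lacks become NaN
combined = pd.concat([row_a, row_b], axis=0, ignore_index=True)
print(combined)
```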
answered Sep 30 '22 at 09:09 by jpp



Using apply on your df is one option. First, replace NaN with an empty list:

# Rows without 'actions' hold NaN; swap them for empty lists (avoids chained assignment)
df['actions'] = df['actions'].apply(lambda x: x if isinstance(x, list) else [])

Create a function add_to_cart that scans the list of actions and returns the value of the add_to_cart entry, then use apply to create the column:

def add_to_cart(list_action):
    for action in list_action:
        # for each action, see if the key action_type has the value add_to_cart and return the value
        if action['action_type'] == 'add_to_cart':
            return action['value']
    # if no add_to_cart action, then empty
    return ''

df['add_to_cart'] = df['actions'].apply(add_to_cart)

Same idea for purchase:

def purchase(list_action):
    for action in list_action:
        if action['action_type'] == 'purchase':
            return action['value']
    return ''

df['purchase'] = df['actions'].apply(purchase)

Then you can drop the column actions if you want:

df = df.drop('actions',axis=1)

EDIT: alternatively, define a single function find_action and pass the action type to apply as an extra argument:

def find_action(list_action, action_type):
    for action in list_action:
        # for each action, see if the key action_type is the one wanted
        if action['action_type'] == action_type:
            return action['value']
    # if not the right action type found, then empty
    return ''

df['add_to_cart'] = df['actions'].apply(find_action, args=('add_to_cart',))
df['purchase'] = df['actions'].apply(find_action, args=('purchase',))
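If the data contains more action types than these two, one option (an extension, not part of the original answer) is to collect every action_type present and call find_action once per type. A minimal sketch, assuming NaN in actions has already been replaced with empty lists; the sample data is a cut-down plain-dict version of the question's output:

```python
import pandas as pd

def find_action(list_action, action_type):
    # Return the value of the first action of the requested type, else ''
    for action in list_action:
        if action['action_type'] == action_type:
            return action['value']
    return ''

df = pd.DataFrame([
    {'cust_id': 'xyz123',
     'actions': [{'action_type': 'add_to_cart', 'value': '8'},
                 {'action_type': 'purchase', 'value': '2'}]},
    {'cust_id': 'pqr123', 'actions': []},
    {'cust_id': 'abc123',
     'actions': [{'action_type': 'purchase', 'value': '45'}]},
])

# Every distinct action type, in order of first appearance
action_types = list(dict.fromkeys(
    a['action_type'] for actions in df['actions'] for a in actions))

# One new column per action type
for action_type in action_types:
    df[action_type] = df['actions'].apply(find_action, args=(action_type,))

df = df.drop('actions', axis=1)
print(df)
```

This avoids hard-coding column names, at the cost of one pass over the data per action type.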
answered Sep 30 '22 at 11:09 by Ben.T
