Pandas Group Columns into Level

Question

Assume I have a dataframe of the form:

>>> df = pd.DataFrame([['2012', 'A', 1], ['2012', 'B', 2], ['2011', 'A', 3], 
                       ['2011', 'B', 2]],
                      columns=['branch_year', 'branch_name', 'employee_id'])

  branch_year branch_name  employee_id
0        2012           A            1
1        2012           B            2
2        2011           A            3
3        2011           B            2

How can I combine columns branch_year and branch_name so that they have a parent column branch -- and ideally renaming them to get rid of branch_ prefix.

       branch      branch  employee_id
         year        name
0        2012           A            1
1        2012           B            2
2        2011           A            3
3        2011           B            2

The end goal is to create a list of dictionaries of the form:

[
    {
      "employeed_id": 1,
      "branch": {
                  "name": "A",
                  "year": "2012"
      }
    },
    {...}
]

Mike Müller · Accepted Answer

You can apply a function to each row and turn the result into a list:

def to_nested_dict(row):
    return {'employee_id': row.employee_id,
            'branch': {'year': row.branch_year, 'name': row.branch_name}}

list(df.apply(to_nested_dict, axis=1))

This preserves the original order of the rows:

[{'branch': {'name': 'A', 'year': '2012'}, 'employee_id': 1},
 {'branch': {'name': 'B', 'year': '2012'}, 'employee_id': 2},
 {'branch': {'name': 'A', 'year': '2011'}, 'employee_id': 3},
 {'branch': {'name': 'B', 'year': '2011'}, 'employee_id': 2}]

Programmatic approach nested on column names that have underscores:

def to_nested_dict(row):
    res = {}
    for col in row.index:
        outer_key, inner_key = col.split('_')
        outer = res.setdefault(outer_key, {})
        outer[inner_key] = row[col]
    return res

list(df.apply(to_nested_dict, axis=1))

Result:

[{'branch': {'name': 'A', 'year': '2012'}, 'employee': {'id': 1}},
 {'branch': {'name': 'B', 'year': '2012'}, 'employee': {'id': 2}},
 {'branch': {'name': 'A', 'year': '2011'}, 'employee': {'id': 3}},
 {'branch': {'name': 'B', 'year': '2011'}, 'employee': {'id': 2}}]

Kevin · Answer

Not pretty, but it gets what you want using groupby:

lst = []
for k,g in pd.groupby(df, by=['branch_name','branch_year']):
    d = {'employee_id': int(g['employee_id']), 'branch': {'name': k[0], 'year': k[1]}}
    lst.append(d)

lst
[{'branch': {'name': 'A', 'year': '2011'}, 'employee_id': 3},
 {'branch': {'name': 'A', 'year': '2012'}, 'employee_id': 1},
 {'branch': {'name': 'B', 'year': '2011'}, 'employee_id': 2},
 {'branch': {'name': 'B', 'year': '2012'}, 'employee_id': 2}]

Pandas Group Columns into Level

Tags:

python

pandas

Kamil Sindi

2 Answers

Mike Müller

Kevin

Recent Activity

Donate For Us

Pandas Group Columns into Level

Tags:

python

pandas

Kamil Sindi

2 Answers

Mike Müller

Kevin

Related questions

Recent Activity

Donate For Us