
Convert a column containing a list of dictionaries to multiple columns in pandas dataframe

Tags:

python

pandas

I have a pandas DataFrame like this:

pd.DataFrame({'a':[1,2], 'b':[[{'c':1,'d':5},{'c':3, 'd':7}],[{'c':10,'d':50}]]})
Out[2]: 
   a                                         b
0  1  [{u'c': 1, u'd': 5}, {u'c': 3, u'd': 7}]
1  2                    [{u'c': 10, u'd': 50}]

I want to expand the 'b' column into separate columns and repeat the value of 'a' whenever 'b' holds more than one element, as follows:

Out[2]: 
   a   c   d
0  1   1   5
1  1   3   7
2  2  10  50

I tried using apply on each row but was not successful; apparently apply maps one row to exactly one row.

Ali Mirzaei asked Jul 26 '17 09:07





1 Answer

You can use concat with a list comprehension:

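# Build one small DataFrame per list in 'b', label each with its 'a' value,
# then drop the inner index level and move 'a' back into a column.
df = (pd.concat([pd.DataFrame(x) for x in df['b']], keys=df['a'])
        .reset_index(level=1, drop=True)
        .reset_index())

print(df)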
   a   c   d
0  1   1   5
1  1   3   7
2  2  10  50

EDIT:

If the index is unique, you can use join to keep all the other columns (starting again from the original df):

df1 = (pd.concat([pd.DataFrame(x) for x in df['b']], keys=df.index)
         .reset_index(level=1, drop=True))
df = df.drop('b', axis=1).join(df1).reset_index(drop=True)
print(df)
   a   c   d
0  1   1   5
1  1   3   7
2  2  10  50

A simplified version of the same idea, again starting from the original df:

import numpy as np

# number of dictionaries in each row's list
l = df['b'].str.len()
df1 = pd.DataFrame(np.concatenate(df['b']).tolist(), index=np.repeat(df.index, l))
df = df.drop('b', axis=1).join(df1).reset_index(drop=True)
print(df)
   a   c   d
0  1   1   5
1  1   3   7
2  2  10  50
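
On newer pandas versions (0.25 and later, where DataFrame.explode exists) the same result can probably be reached without concat. This is a different technique from the ones above, given only as a minimal sketch: it assumes every list in 'b' is non-empty and holds only dictionaries, and the names exploded, expanded and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2],
    'b': [[{'c': 1, 'd': 5}, {'c': 3, 'd': 7}], [{'c': 10, 'd': 50}]],
})

# One row per dictionary; the matching value of 'a' is repeated.
exploded = df.explode('b').reset_index(drop=True)

# Turn the dictionaries into columns, one column per key.
expanded = pd.DataFrame(exploded['b'].tolist())

# Put the new columns next to the remaining original columns.
result = pd.concat([exploded.drop(columns='b'), expanded], axis=1)
print(result)
```

Both frames passed to the final concat share the same default integer index after reset_index, so the rows line up one to one.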
jezrael answered Oct 24 '22 19:10
