I have a pandas DataFrame like this:
pd.DataFrame({'a':[1,2], 'b':[[{'c':1,'d':5},{'c':3, 'd':7}],[{'c':10,'d':50}]]})
Out[2]:
a b
0 1 [{u'c': 1, u'd': 5}, {u'c': 3, u'd': 7}]
1 2 [{u'c': 10, u'd': 50}]
I want to expand the 'b' column and repeat the 'a' value whenever 'b' holds more than one element, as follows:
Out[2]:
a c d
0 1 1 5
1 1 3 7
2 2 10 50
I tried using apply on each row, but without success; apparently apply converts one row to one row.
Passing a list of dictionaries to pd.DataFrame() turns it into a pandas DataFrame. The related pd.DataFrame.from_dict() constructs a DataFrame from a dict of array-likes or dicts.
When a DataFrame is created from a list of dictionaries, the keys become the columns and each dictionary becomes one row. If a dictionary lacks a key that other dictionaries have, NaN is inserted at that position in the resulting DataFrame.
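As a minimal sketch of that behaviour (the names `records` and `frame` are made up for illustration), the second dictionary below has no 'd' key, so its 'd' cell should come out as NaN:

```python
import pandas as pd

# Two dictionaries; the second one is missing the 'd' key on purpose.
records = [{'c': 1, 'd': 5}, {'c': 3}]

# Each dictionary becomes one row; the union of the keys becomes the columns.
frame = pd.DataFrame(records)
print(frame)
```

Because NaN is a float, the 'd' column is stored as float rather than int in this case.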
You can use concat with a list comprehension:
df = (pd.concat([pd.DataFrame(x) for x in df['b']], keys=df['a'])
        .reset_index(level=1, drop=True).reset_index())
print(df)
a c d
0 1 1 5
1 1 3 7
2 2 10 50
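For anyone who wants to run this end to end, here is a self-contained sketch of the same concat idea. It differs slightly from the snippet above: the keys are passed as a plain list and the index levels are named explicitly ('pos' is a name chosen here for illustration), so the result does not rely on how concat derives level names from a Series.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2],
    'b': [[{'c': 1, 'd': 5}, {'c': 3, 'd': 7}], [{'c': 10, 'd': 50}]],
})

# One small DataFrame per row of 'b'.
parts = [pd.DataFrame(rows) for rows in df['b']]

# Stack them, labelling each block with the matching 'a' value.
# 'pos' (hypothetical name) is the position inside each original list.
out = (pd.concat(parts, keys=df['a'].tolist(), names=['a', 'pos'])
         .reset_index(level='pos', drop=True)  # discard the inner position
         .reset_index())                       # turn 'a' back into a column
print(out)
```

Note that this keeps only 'a' alongside the expanded columns; any other columns of the original frame are not carried over.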
EDIT:
If the index is unique, it is possible to use join to keep all columns:
df1 = (pd.concat([pd.DataFrame(x) for x in df['b']], keys=df.index)
         .reset_index(level=1, drop=True))
df = df.drop('b', axis=1).join(df1).reset_index(drop=True)
print(df)
a c d
0 1 1 5
1 1 3 7
2 2 10 50
A simplified solution (requires import numpy as np):
# number of dictionaries in each row of 'b'
l = df['b'].str.len()
df1 = pd.DataFrame(np.concatenate(df['b']).tolist(), index=np.repeat(df.index, l))
df = df.drop('b', axis=1).join(df1).reset_index(drop=True)
print(df)
a c d
0 1 1 5
1 1 3 7
2 2 10 50
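A runnable sketch of this join-based variant follows. It swaps np.concatenate for itertools.chain.from_iterable to flatten the lists, which is an assumption of mine rather than part of the answer above; the names `lengths`, `flat` and `out` are likewise illustrative.

```python
from itertools import chain

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2],
    'b': [[{'c': 1, 'd': 5}, {'c': 3, 'd': 7}], [{'c': 10, 'd': 50}]],
})

# How many dictionaries each row of 'b' holds.
lengths = df['b'].str.len()

# Flatten the lists of dictionaries into one list, in row order.
flat = list(chain.from_iterable(df['b']))

# Repeat each original index label once per dictionary so join can align rows.
df1 = pd.DataFrame(flat, index=np.repeat(df.index.to_numpy(), lengths.to_numpy()))

out = df.drop(columns='b').join(df1).reset_index(drop=True)
print(out)
```

Since join aligns on the index, this only gives the intended result when the original index is unique, as noted in the EDIT above.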