I have a dataframe column with the following format:
col1 col2
A [{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]
B [{'Id':47,'prices':['30','78']},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}]
How can I transform it into the following?
col1 Id price
A 42 ['30','78']
A 44 ['20','47','89']
B 47 ['30','78']
B 94 ['20']
B 84 ['20','98']
I was thinking of using apply and lambda as a solution but I am not sure how.
Edit: To recreate this dataframe, I use the following code:
import pandas as pd
data = [['A', "[{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]"],
        ['B', "[{'Id':47,'prices':['30','78']},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}]"]]
df = pd.DataFrame(data, columns = ['col1', 'col2'])
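loc and iloc are not interchangeable, even when labels are 0-based integers: loc selects by label and includes the end of a slice, while iloc selects by position and excludes it.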
Solution if there are lists in column col2:
print (type(df['col2'].iat[0]))
<class 'list'>
L = [{**{'col1': a}, **x} for a, b in df[['col1','col2']].to_numpy() for x in b]
df = pd.DataFrame(L)
print (df)
  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]
If there are strings:
print (type(df['col2'].iat[0]))
<class 'str'>
import ast
L = [{**{'col1': a}, **x} for a, b in df[['col1','col2']].to_numpy() for x in ast.literal_eval(b)]
df = pd.DataFrame(L)
print (df)
  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]
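To make the two building blocks of the comprehension above easier to follow, here is a minimal sketch on a single cell, using only the standard library. The variable names (cell, parsed, row) are illustrative and not part of the original answer.

```python
import ast

# One cell of col2 as it appears when the column holds strings.
cell = "[{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]"

# ast.literal_eval safely parses the string into Python objects
# (here: a list of dicts) without executing arbitrary code.
parsed = ast.literal_eval(cell)
print(type(parsed))  # <class 'list'>

# {**d1, **d2} builds a new dict with the keys of d1 followed by those of d2,
# which is how the col1 value is attached to each inner dict.
row = {**{'col1': 'A'}, **parsed[0]}
print(row)  # {'col1': 'A', 'Id': 42, 'prices': ['30', '78']}
```

Each such row dict becomes one row of the final DataFrame.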
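For better understanding, the same logic can be written as an explicit loop:
import ast
L = []
for a, b in df[['col1','col2']].to_numpy():
    # b is the string from col2; parse it into a list of dicts
    for x in ast.literal_eval(b):
        d = {'col1': a}
        # merge the col1 value with the keys of the inner dict
        out = {**d, **x}
        L.append(out)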
df = pd.DataFrame(L)
print (df)
  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]
Assuming the second element of each row in "data" is already a list (not a string):
import pandas as pd
data = [
    ['A', [{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]],
    ['B', [{'Id':47,'prices':['30','78']},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}]]
]
t_list = []
# one output tuple per inner dict, repeating the col1 value
for col1, items in data:
    for item in items:
        t_list.append((col1, item['Id'], item['prices']))
df = pd.DataFrame(t_list, columns=['col1', 'id', 'price'])
print(df)
  col1  id         price
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]
You can use df.explode here, together with pd.Series.apply, df.set_index and df.reset_index:
df.set_index('col1').explode('col2')['col2'].apply(pd.Series).reset_index()
  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]
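To see what each part of the chained expression above contributes, it can be split into intermediate steps. This is only a sketch: the intermediate names (indexed, exploded, expanded, result) are illustrative, and it assumes col2 already holds lists and a pandas version that provides explode (0.25 or later).

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'B'],
    'col2': [
        [{'Id': 42, 'prices': ['30', '78']}, {'Id': 44, 'prices': ['20', '47', '89']}],
        [{'Id': 47, 'prices': ['30', '78']}, {'Id': 94, 'prices': ['20']},
         {'Id': 84, 'prices': ['20', '98']}],
    ],
})

# Step 1: move col1 into the index so it is carried along with every row.
indexed = df.set_index('col1')

# Step 2: one row per list element; the index label is repeated as needed.
exploded = indexed.explode('col2')['col2']

# Step 3: turn each dict into a row whose columns are the dict keys.
expanded = exploded.apply(pd.Series)

# Step 4: bring col1 back as a regular column.
result = expanded.reset_index()
print(result)
```

Note that apply(pd.Series) builds one Series per row, so it may be slow on large frames.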
When col2 is a string, use ast.literal_eval:
import ast
data = [['A', "[{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]"],
['B', "[{'Id':47,'prices':['30','78']},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}]"]]
df = pd.DataFrame(data, columns = ['col1', 'col2'])
df['col2'] = df['col2'].map(ast.literal_eval)
df.set_index('col1').explode('col2')['col2'].apply(pd.Series).reset_index()
  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]
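As a possible alternative that none of the answers above use, pd.json_normalize can flatten the list of dicts directly once the strings have been parsed. This is a sketch under the assumption that pandas 1.0 or later is available (where json_normalize is exposed at the top level); the names records and flat are illustrative.

```python
import ast
import pandas as pd

data = [
    ['A', "[{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]"],
    ['B', "[{'Id':47,'prices':['30','78']},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}]"],
]
df = pd.DataFrame(data, columns=['col1', 'col2'])

# Parse the string cells into real lists of dicts first.
df['col2'] = df['col2'].map(ast.literal_eval)

# Each record looks like {'col1': 'A', 'col2': [ {...}, {...} ]}.
records = df.to_dict('records')

# record_path names the key holding the list to expand into rows;
# meta names the outer key(s) to repeat alongside each expanded row.
flat = pd.json_normalize(records, record_path='col2', meta=['col1'])

# json_normalize puts the meta column last, so reorder to match the target layout.
flat = flat[['col1', 'Id', 'prices']]
print(flat)
```

The prices lists are kept as lists in a single column, as in the other answers.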