I have a dataframe column with the following format:
col1    col2
 A     [{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]
 B     [{'Id':47,'prices':['30','78']},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}]
How can I transform it to the following?
col1    Id            price
  A     42         ['30','78']
  A     44         ['20','47','89']
  B     47         ['30','78']
  B     94         ['20']
  B     84         ['20','98']
I was thinking of using apply and lambda as a solution but I am not sure how.
Edit: To recreate this dataframe, I use the following code:
import pandas as pd
data = [['A', "[{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]"],
        ['B', "[{'Id':47,'prices':['30','78']},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}]"]]
df = pd.DataFrame(data, columns = ['col1', 'col2'])
Solution if there are lists in column col2:
print (type(df['col2'].iat[0]))
<class 'list'>
L = [{**{'col1': a}, **x} for a, b in df[['col1','col2']].to_numpy() for x in b]
df = pd.DataFrame(L)
print (df)
  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]
If there are strings:
print (type(df['col2'].iat[0]))
<class 'str'>
import ast
L = [{**{'col1': a}, **x} for a, b in df[['col1','col2']].to_numpy() for x in ast.literal_eval(b)]
df = pd.DataFrame(L)
print (df)
  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]
For better understanding, the same logic can be written as explicit loops:
import ast
L = []
for a, b in df[['col1','col2']].to_numpy():
    for x in ast.literal_eval(b):
        d = {'col1': a}
        out = {**d, **x}
        L.append(out)
df = pd.DataFrame(L)
print (df)
  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]
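If it is not known up front whether col2 holds strings or real lists, the two cases above can be folded into one function. This is only a minimal sketch: flatten_records is a hypothetical helper name (not part of pandas), and it assumes any string cell is a valid Python literal that ast.literal_eval can parse.

```python
import ast

import pandas as pd


def flatten_records(df, key_col="col1", records_col="col2"):
    """Return one row per dict found in records_col, keeping key_col with it.

    flatten_records is a hypothetical helper name, not a pandas function.
    """
    rows = []
    for key, records in zip(df[key_col], df[records_col]):
        # Parse string cells into Python objects; real lists pass through.
        if isinstance(records, str):
            records = ast.literal_eval(records)
        for record in records:
            # Merge the key with the dict's own fields (Id, prices).
            rows.append({key_col: key, **record})
    return pd.DataFrame(rows)


data = [['A', "[{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]"],
        ['B', "[{'Id':47,'prices':['30','78']},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}]"]]
df = pd.DataFrame(data, columns=['col1', 'col2'])
out = flatten_records(df)
print(out)
```

Because the type check happens per cell, the same call should also work on a column that mixes strings and lists.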
Assuming the second element of each row in data is already a list (not a string):
import pandas as pd
data = [
  ['A', [{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]],
  ['B', [{'Id':47,'prices':['30','78']},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}]]
  ]
t_list = []
# One output tuple per dict: (col1 value, Id, prices)
for col1, records in data:
    for rec in records:
        t_list.append((col1, rec['Id'], rec['prices']))
df = pd.DataFrame(t_list, columns=['col1', 'id', 'price'])
print(df)
  col1  id         price
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]
You can use df.explode here, together with df.set_index, pd.Series.apply and df.reset_index (this assumes col2 already holds lists):
df.set_index('col1').explode('col2')['col2'].apply(pd.Series).reset_index()
  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]
When col2 holds strings, convert it with ast.literal_eval first:
import ast
import pandas as pd
data = [['A', "[{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]"], 
        ['B', "[{'Id':47,'prices':['30','78']},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}]"]] 
df = pd.DataFrame(data, columns = ['col1', 'col2'])
df['col2'] = df['col2'].map(ast.literal_eval)
df.set_index('col1').explode('col2')['col2'].apply(pd.Series).reset_index()
  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]
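On larger frames, apply(pd.Series) tends to be slow because it builds one Series per row. As an alternative sketch (not part of the answer above), pd.json_normalize can unpack the lists of dicts in a single call. This assumes pandas 1.0 or later, where json_normalize is available at the top level, and that col2 has already been parsed into lists.

```python
import ast

import pandas as pd

data = [['A', "[{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]"],
        ['B', "[{'Id':47,'prices':['30','78']},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}]"]]
df = pd.DataFrame(data, columns=['col1', 'col2'])
df['col2'] = df['col2'].map(ast.literal_eval)

# to_dict('records') yields one dict per row: {'col1': ..., 'col2': [dict, ...]}.
# record_path names the list to unpack; meta carries col1 along with each record.
result = pd.json_normalize(df.to_dict('records'), record_path='col2', meta=['col1'])

# json_normalize places meta columns last, so reorder to match the outputs above.
result = result[['col1', 'Id', 'prices']]
print(result)
```

The prices lists are left intact because only the col2 list is named in record_path.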