Here's my problem. I have a DataFrame with x columns and y rows. Some columns actually contain lists. I want to transform each of those columns into multiple columns, each holding a single value.
An example speaks for itself.
My DataFrame:
ans_length ans_unigram_numbers ... levenshtein_dist que_entropy
0 [19, 14] [12, 8] ... 9.00 3.189898
1 [19] [12] ... 4.00 3.189898
2 [0] [0] ... 170.00 4.299996
3 [0] [0] ... 170.00 4.303341
4 [0] [0] ... 170.00 4.304335
5 [0] [0] ... 170.00 4.311820
28 [56] [23] ... 24.00 4.110291
29 [0] [0] ... 56.00 4.181720
... ... ... ... ... ...
1976 [24] [11] ... 24.00 3.084963
1977 [24] [11] ... 24.00 3.084963
1992 [31, 24, 32, 28] [14, 15, 17, 11] ... 18.75 3.292770
1993 [31, 24, 32, 28] [14, 15, 17, 11] ... 18.75 3.292770
[1998 rows x 9 columns]
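To make the problem reproducible, here is a minimal sketch that rebuilds four rows of the frame above (rows 0, 1, 28 and 1992, values copied from the printout). The elided "..." columns are left out, and the variable name df is an assumption.

```python
import pandas as pd

# Small reproducible subset of the frame above; the elided "..." columns are omitted.
df = pd.DataFrame(
    {
        "ans_length": [[19, 14], [19], [56], [31, 24, 32, 28]],
        "ans_unigram_numbers": [[12, 8], [12], [23], [14, 15, 17, 11]],
        "levenshtein_dist": [9.00, 4.00, 24.00, 18.75],
        "que_entropy": [3.189898, 3.189898, 4.110291, 3.292770],
    },
    index=[0, 1, 28, 1992],  # non-contiguous index, as in the original frame
)
print(df)
```

Note that the index is not a plain 0..n-1 range, which matters when the expanded columns are concatenated back onto df.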
What I expect:
      ans_length_0  ans_length_1  ans_length_2  ans_length_3  \
0               19            14
1               19
2                0
3                0
4                0
5                0
28              56
29               0
1976            24
1977            24
1992            31            24            32            28
1993            31            24            32            28

      ans_unigram_numbers_0  ans_unigram_numbers_1  ans_unigram_numbers_2  ans_unigram_numbers_3  \
0                        12                      8
1                        12
2                         0
3                         0
4                         0
5                         0
28                       23
29                        0
1976                     11
1977                     11
1992                     14                     15                     17                     11
1993                     14                     15                     17                     11

      levenshtein_dist  que_entropy
0                 9.00     3.189898
1                 4.00     3.189898
2               170.00     4.299996
3               170.00     4.303341
4               170.00     4.304335
5               170.00     4.311820
28               24.00     4.110291
29               56.00     4.181720
1976             24.00     3.084963
1977             24.00     3.084963
1992             18.75     3.292770
1993             18.75     3.292770
The newly generated columns should take the name of the original column with an index appended (ans_length_0, ans_length_1, ...).
In pandas, the apply() method can also be used to split the values of one column into multiple columns. DataFrame.apply() executes a function along the rows or columns of a DataFrame; inside that function, you can split a string value into multiple values.
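A minimal sketch of that approach, using a made-up "name" column for illustration. With axis=1 the function runs once per row, and result_type="expand" turns each returned list into a row of new columns.

```python
import pandas as pd

# Hypothetical example data: one string column to split on spaces
df = pd.DataFrame({"name": ["Ada Lovelace", "Alan Turing"]})

# axis=1 calls the lambda once per row; result_type="expand" turns each
# returned list into a row of new columns labelled 0, 1, ...
parts = df.apply(lambda row: row["name"].split(" "), axis=1, result_type="expand")
parts.columns = ["first", "last"]

# join aligns on the index, so each split lands on its original row
df = df.join(parts)
print(df)
```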
To flatten a MultiIndex: for columns, use get_level_values(), use to_flat_index(), or join the column labels; for rows, flatten all index levels.
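A short sketch of those options on a small frame with made-up two-level column labels. For rows, reset_index() is shown as one way to flatten all index levels into ordinary columns; the source does not name a specific method for that case.

```python
import pandas as pd

# Hypothetical frame with two-level (MultiIndex) column labels
cols = pd.MultiIndex.from_tuples([("ans", "length"), ("ans", "unigrams"), ("que", "entropy")])
df = pd.DataFrame([[19, 12, 3.19]], columns=cols)

# 1) Keep a single level with get_level_values()
level1 = df.columns.get_level_values(1)

# 2) Turn each label into a plain tuple with to_flat_index()
flat = df.columns.to_flat_index()

# 3) Join the labels of every level into one string per column
df.columns = ["_".join(c) for c in df.columns]
print(df.columns.tolist())

# Rows: reset_index() moves every level of a row MultiIndex into ordinary columns
rows = pd.DataFrame(
    {"v": [1, 2]},
    index=pd.MultiIndex.from_tuples([("a", 0), ("b", 1)], names=["key", "n"]),
)
print(rows.reset_index())
```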
Another way to flatten a pandas DataFrame is through NumPy. First convert the DataFrame to a NumPy array with the to_numpy() method, then call the array's flatten() method (numpy.ndarray.flatten()).
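A minimal sketch with a small made-up numeric frame. flatten() reads the 2-D array row by row by default; order="F" reads it column by column instead.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# to_numpy() gives the 2-D array [[1, 3], [2, 4]]; flatten() copies it into 1-D
flat = df.to_numpy().flatten()             # row-major (C order)
flat_f = df.to_numpy().flatten(order="F")  # column-major (Fortran order)
print(flat, flat_f)
```

This flattens the grid of cells only. On a frame like the one in the question, each list-valued cell stays a single list object inside the resulting array rather than being unpacked.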
We can use str.split() with the expand=True option to split one string column into multiple columns. We can use str.extract() to extract multiple columns with a regular expression that defines several capturing groups.
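A minimal sketch of both string methods on a made-up "pair" column. Named capturing groups in str.extract() become the column names of the result.

```python
import pandas as pd

# Hypothetical string column holding two numbers separated by "-"
df = pd.DataFrame({"pair": ["19-14", "31-24"]})

# str.split with expand=True returns a DataFrame with one column per piece
split = df["pair"].str.split("-", expand=True)
split.columns = ["first", "second"]

# str.extract returns one column per capturing group; named groups name the columns
extracted = df["pair"].str.extract(r"(?P<first>\d+)-(?P<second>\d+)")
print(split)
print(extracted)
```

Both results contain strings; convert with astype(int) if numbers are needed.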
I think you can use:
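import pandas as pd
cols = ['ans_length', 'ans_unigram_numbers']
# index=df.index keeps rows aligned with df; the '_' prefix gives ans_length_0, ans_length_1, ...
df1 = pd.concat([pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(f'{x}_') for x in cols], axis=1)
# Expanded columns first, then the untouched ones
df = pd.concat([df1, df.drop(columns=cols)], axis=1)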
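If you don't want to list the columns by hand, here is a sketch of a variant. expand_list_columns is a hypothetical helper, not a pandas function. It assumes a column is list-valued only when every cell is a Python list, and it keeps the expanded columns at the original column's position, as in the expected output.

```python
import pandas as pd


def expand_list_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace each list-valued column with <name>_0, <name>_1, ...

    Assumes a column is list-valued when every cell is a Python list.
    Shorter lists are padded with NaN.
    """
    parts = []
    for name in df.columns:
        col = df[name]
        if col.map(lambda v: isinstance(v, list)).all():
            # Reuse df.index so rows stay aligned with the original frame
            parts.append(pd.DataFrame(col.tolist(), index=df.index).add_prefix(f"{name}_"))
        else:
            parts.append(col)
    # Concatenating in loop order preserves the original column order
    return pd.concat(parts, axis=1)


df = pd.DataFrame(
    {"ans_length": [[19, 14], [19], [56]], "levenshtein_dist": [9.00, 4.00, 24.00]},
    index=[0, 1, 28],
)
out = expand_list_columns(df)
print(out)
```

Padding with NaN makes any partly filled column (ans_length_1 here) a float column.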