I have the following DataFrame. I am wondering whether it is possible to break the data
column into multiple columns. E.g., from this:
    ID  Date        data
    6   21/05/2016  A: 7, B: 8, C: 5, D: 5, A: 8
    6   21/01/2014  B: 5, C: 5, D: 7
    6   02/04/2013  A: 4, D:7
    7   05/06/2014  C: 25
    7   12/08/2014  D: 20
    8   18/04/2012  A: 2, B: 3, C: 3, E: 5, B: 4
    8   21/03/2012  F: 6, B: 4, F: 5, D: 6, B: 4
into this:
    ID  Date        data                          A   B   C   D   E   F
    6   21/05/2016  A: 7, B: 8, C: 5, D: 5, A: 8  15  8   5   5   0   0
    6   21/01/2014  B: 5, C: 5, D: 7              0   5   5   7   0   0
    6   02/04/2013  B: 4, D: 7, B: 6              0   10  0   7   0   0
    7   05/06/2014  C: 25                         0   0   25  0   0   0
    7   12/08/2014  D: 20                         0   0   0   20  0   0
    8   18/04/2012  A: 2, B: 3, C: 3, E: 5, B: 4  2   7   3   0   5   0
    8   21/03/2012  F: 6, B: 4, F: 5, D: 6, B: 4  0   8   0   6   0   11
I have tried the answers to "Split strings in tuples into columns, in Pandas" and "pandas: How do I split text in a column into multiple rows?", but they do not work in my case.
EDIT
There is a bit of complexity: the data column has duplicate keys. For example, in the first row A is repeated, so its values are summed up under the A column (please see the second table).
Split a pandas DataFrame column by multiple delimiters: in this example, the str.split() method splits the "Mark" column into multiple columns using several delimiters (- _ ; / %). The "Mark" column is split into "Mark" and "Mark_".
To split a pandas column of lists into multiple columns, create a new DataFrame by applying the tolist() function to the column (assumed here to be named 'Col').
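As a rough illustration of the tolist() approach, here is a minimal sketch. The sample data and the new column names ("first", "second", "third") are made up for the example; only the column name 'Col' comes from the text above.

```python
import pandas as pd

# Hypothetical sample data: 'Col' holds lists of equal length.
df = pd.DataFrame({"Col": [[1, 2, 3], [4, 5, 6]]})

# Build a new DataFrame from the lists. Reusing the original index
# keeps the rows aligned when the result is joined back.
expanded = pd.DataFrame(
    df["Col"].tolist(),
    index=df.index,
    columns=["first", "second", "third"],  # illustrative names
)

# Attach the new columns next to the original one.
result = df.join(expanded)
print(result)
```

If the lists have different lengths, pandas pads the shorter rows with missing values, so the explicit `columns=` argument would need to match the longest list.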
We can use the pandas string method .str.split(',') to split each value into a list. If you have missing data, make sure to default it to an empty list with .fillna('[]'). In that example, the resulting DataFrame has 45 columns.
We will use the Series.str.split() function to separate the Number column, passing - as the separator. Make sure you pass True to the expand keyword. This example splits every value of the Series (Number) on -.
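The description above could look roughly like this. The values in the Number column and the names "left" and "right" are assumptions made for the sketch, since the original data is not shown.

```python
import pandas as pd

# Hypothetical data for the 'Number' column.
df = pd.DataFrame({"Number": ["12-34", "56-78", "90-12"]})

# expand=True returns a DataFrame with one column per piece
# instead of a Series of lists.
parts = df["Number"].str.split("-", expand=True)
parts.columns = ["left", "right"]  # illustrative names

print(parts)
```

Note that the pieces are still strings; convert them with astype(int) if numbers are needed.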
    import pandas as pd

    df = pd.DataFrame([
        [6, "a: 1, b: 2"],
        [6, "a: 1, b: 2"],
        [6, "a: 1, b: 2"],
        [6, "a: 1, b: 2"],
    ], columns=['ID', 'dictionary'])

    def str2dict(s):
        # Parse "k: v, k: v" into a dict of strings.
        # Note: a repeated key overwrites the earlier value.
        split = s.strip().split(',')
        d = {}
        for pair in split:
            k, v = [_.strip() for _ in pair.split(':')]
            d[k] = v
        return d

    # One column per key
    df.dictionary.apply(str2dict).apply(pd.Series)

Or:

    # Same, but keeping the original columns alongside
    pd.concat([df, df.dictionary.apply(str2dict).apply(pd.Series)], axis=1)
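The question's EDIT asks for repeated keys to be summed and for missing keys to show 0. One possible variation on the idea above is sketched here, using a few rows from the question's table. The helper name sum_pairs and the use of collections.Counter are my own choices, not part of the answer above.

```python
from collections import Counter

import pandas as pd

# A few rows taken from the question's first table.
df = pd.DataFrame({
    "ID": [6, 6, 7, 8],
    "Date": ["21/05/2016", "21/01/2014", "05/06/2014", "21/03/2012"],
    "data": [
        "A: 7, B: 8, C: 5, D: 5, A: 8",
        "B: 5, C: 5, D: 7",
        "C: 25",
        "F: 6, B: 4, F: 5, D: 6, B: 4",
    ],
})

def sum_pairs(s):
    """Parse 'K: v, K: v' and add up the values of repeated keys."""
    totals = Counter()
    for pair in s.split(","):
        key, value = pair.split(":")
        totals[key.strip()] += int(value)  # int() tolerates surrounding spaces
    return dict(totals)

counts = (
    pd.DataFrame(df["data"].apply(sum_pairs).tolist(), index=df.index)
    .fillna(0)           # keys absent from a row become 0
    .astype(int)
    .sort_index(axis=1)  # columns in alphabetical order
)

result = pd.concat([df, counts], axis=1)
print(result)
```

Only keys that occur somewhere in the data get a column, so this subset produces A, B, C, D and F but no E column.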