
Split each cell in a DataFrame (pandas/python)

Tags:

python

pandas

I have a large pandas DataFrame with many rows and columns containing binary pairs such as '0|1', '0|0', '1|1' and '1|0'. I would like to split these into 2 DataFrames and/or expand the rows (both layouts are useful to me), so that this:

        a   b   c   d
rowa    1|0 0|1 0|1 1|0
rowb    0|1 0|0 0|0 0|1
rowc    0|1 1|0 1|0 0|1

becomes

        a   b   c   d
rowa1   1   0   0   1
rowa2   0   1   1   0
rowb1   0   0   0   0
rowb2   1   0   0   1
rowc1   0   1   1   0
rowc2   1   0   0   1

and/or

    df1:    a   b   c   d
    rowa    1   0   0   1
    rowb    0   0   0   0
    rowc    0   1   1   0


    df2:    a   b   c   d
    rowa    0   1   1   0
    rowb    1   0   0   1
    rowc    1   0   0   1

Currently I'm trying something like the following, but I believe it is not very efficient. Any guidance would be helpful.

from collections import defaultdict

Atmp_dict = defaultdict(list)
Btmp_dict = defaultdict(list)

# df is the original DataFrame; split every cell on '|' into its two halves
for index, row in df.iterrows():
    for columnname in df.columns:
        first, second = row[columnname].split('|')
        Atmp_dict[columnname].append(first)
        Btmp_dict[columnname].append(second)
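The loop above only fills the two dictionaries. A minimal sketch of the remaining step, assuming the sample data from the question and that every cell is an 'x|y' string: build each DataFrame from its dictionary, reuse the original index (the lists are appended in row order), and convert the '0'/'1' strings to integers.

```python
from collections import defaultdict

import pandas as pd

# Sample data from the question (assumption: every cell is an 'x|y' string)
df = pd.DataFrame(
    {"a": ["1|0", "0|1", "0|1"],
     "b": ["0|1", "0|0", "1|0"],
     "c": ["0|1", "0|0", "1|0"],
     "d": ["1|0", "0|1", "0|1"]},
    index=["rowa", "rowb", "rowc"],
)

Atmp_dict = defaultdict(list)
Btmp_dict = defaultdict(list)
for index, row in df.iterrows():
    for columnname in df.columns:
        first, second = row[columnname].split("|")
        Atmp_dict[columnname].append(first)
        Btmp_dict[columnname].append(second)

# The lists were filled in row order, so the original index lines up;
# astype(int) turns the '0'/'1' strings into integers.
df1 = pd.DataFrame(Atmp_dict, index=df.index).astype(int)
df2 = pd.DataFrame(Btmp_dict, index=df.index).astype(int)
print(df1)
print(df2)
```

Row-by-row iteration with iterrows is the slow part here; the answers below replace it with column-wise operations.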
asked Dec 10 '25 10:12 by tafelplankje



2 Answers

user2734178's answer is close, but it has some issues. Here is a slight variation that works:

import pandas as pd

df1 = pd.DataFrame()
df2 = pd.DataFrame()

# df is your original DataFrame
for col in df.columns:
    df1[col] = df[col].apply(lambda x: x.split('|')[0])
    df2[col] = df[col].apply(lambda x: x.split('|')[1])

Here is another option that is slightly more elegant. Replace the loop with:

for col in df.columns:
    df1[col] = df[col].str.extract(r"(\d)\|", expand=False)
    df2[col] = df[col].str.extract(r"\|(\d)", expand=False)
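A variation on the extract approach (not part of the original answer): a pattern with two named capture groups pulls out both digits in a single call per column, because str.extract then returns one column per group. The group names first and second are illustrative, and the sample data is the question's.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {"a": ["1|0", "0|1", "0|1"],
     "b": ["0|1", "0|0", "1|0"],
     "c": ["0|1", "0|0", "1|0"],
     "d": ["1|0", "0|1", "0|1"]},
    index=["rowa", "rowb", "rowc"],
)

df1 = pd.DataFrame(index=df.index)
df2 = pd.DataFrame(index=df.index)
for col in df.columns:
    # One regex pass per column; the result has columns 'first' and 'second'
    parts = df[col].str.extract(r"(?P<first>\d)\|(?P<second>\d)")
    df1[col] = parts["first"].astype(int)
    df2[col] = parts["second"].astype(int)
print(df1)
print(df2)
```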
answered Dec 13 '25 00:12 by drootang



This is pretty compact, but it seems like there should be an even easier and more compact way.

df1 = df.applymap( lambda x: str(x)[0] ) 
df2 = df.applymap( lambda x: str(x)[2] )

Alternatively, loop over the columns as in the other answers; I don't think it matters much. Because the question specifies binary data (every cell is one digit, a '|', and one digit), it is OK, and simpler, to index with str[0] and str[2] rather than using split or extract.
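In pandas 2.1 and later, DataFrame.applymap is deprecated in favour of DataFrame.map, which takes the same element-wise function. A sketch that runs on either side of that change, assuming the question's sample data; the astype(int) call turns the extracted '0'/'1' characters into integers to match the desired output.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {"a": ["1|0", "0|1", "0|1"],
     "b": ["0|1", "0|0", "1|0"],
     "c": ["0|1", "0|0", "1|0"],
     "d": ["1|0", "0|1", "0|1"]},
    index=["rowa", "rowb", "rowc"],
)

# Use DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# Character 0 is the digit before '|', character 2 the digit after it
# (relies on every cell being exactly 'digit|digit')
df1 = elementwise(lambda x: str(x)[0]).astype(int)
df2 = elementwise(lambda x: str(x)[2]).astype(int)
print(df1)
print(df2)
```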

Or you could do this, which seems almost silly, but there's nothing actually wrong with it and it is fairly compact.

df1 = df.stack().str[0].unstack()
df2 = df.stack().str[2].unstack()

stack converts the DataFrame into a Series (indexed by row and column) so that you can use the .str accessor, and unstack then converts it back into a DataFrame.
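Neither answer builds the interleaved layout from the question (rowa1, rowa2, ...). A minimal sketch, assuming the '1'/'2' row-suffix naming shown in the question and the question's sample data: extract both halves column by column with the .str accessor, rename the rows, then interleave the two frames by position.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {"a": ["1|0", "0|1", "0|1"],
     "b": ["0|1", "0|0", "1|0"],
     "c": ["0|1", "0|0", "1|0"],
     "d": ["1|0", "0|1", "0|1"]},
    index=["rowa", "rowb", "rowc"],
)

# Column-wise .str indexing: character 0 before '|', character 2 after it
first = df.apply(lambda s: s.str[0]).astype(int)
second = df.apply(lambda s: s.str[2]).astype(int)

# Row labels follow the question's example: suffix '1' or '2'
first = first.rename(index=lambda r: f"{r}1")
second = second.rename(index=lambda r: f"{r}2")

# Stacked positions are [first rows..., second rows...]; reorder them as
# 0, n, 1, n+1, ... so each row's two halves end up next to each other
n = len(first)
order = np.column_stack([np.arange(n), np.arange(n) + n]).ravel()
expanded = pd.concat([first, second]).iloc[order]
print(expanded)
```

Interleaving by position rather than calling sort_index() keeps the original row order even when the row labels do not sort alphabetically.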

answered Dec 12 '25 23:12 by JohnE



