Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

split multiple columns in pandas dataframe by delimiter

I have survey data which annoying has returned multiple choice questions in the following way. It's in an excel sheet There is about 60 columns with responses from single to multiple that are split by /. This is what I have so far, is there any way to do this quicker without having to do this for each individual column

data = {'q1': ['one', 'two', 'three'],
   'q2' : ['one/two/three', 'a/b/c', 'd/e/f'],
   'q3' : ['a/b/c', 'd/e/f','g/h/i']}

df = pd.DataFrame(data)

df[['q2a', 'q2b', 'q2c']]= df['q2'].str.split('/', expand = True, n=0)
df[['q3a', 'q3b', 'q3c']]= df['q2'].str.split('/', expand = True, n=0)

clean_df = df.drop(df[['q2', 'q3']], axis=1)
like image 668
em456 Avatar asked Oct 19 '20 16:10

em456


People also ask

How do I split a column in pandas based on delimiter?

Split column by delimiter into multiple columns Apply the pandas series str. split() function on the “Address” column and pass the delimiter (comma in this case) on which you want to split the column. Also, make sure to pass True to the expand parameter.

How do I split all columns in a DataFrame in Python?

We can use the pandas Series. str. split() function to break up strings in multiple columns around a given separator or delimiter. It's similar to the Python string split() method but applies to the entire Dataframe column.

How do I split one column into multiple columns in Python?

We can use str. split() to split one column to multiple columns by specifying expand=True option. We can use str. extract() to exract multiple columns using regex expression in which multiple capturing groups are defined.


1 Answers

We can use list comprehension with add_prefix, then we use pd.concat to concatenate everything to your final df:

splits = [df[col].str.split(pat='/', expand=True).add_prefix(col) for col in df.columns]
clean_df = pd.concat(splits, axis=1)
     q10  q20  q21    q22 q30 q31 q32
0    one  one  two  three   a   b   c
1    two    a    b      c   d   e   f
2  three    d    e      f   g   h   i

If you actually want your column names to be suffixed by a letter, you can do the following with string.ascii_lowercase:

from string import ascii_lowercase

dfs = []
for col in df.columns:
    d = df[col].str.split('/', expand=True)
    c = d.shape[1]
    d.columns = [col + l for l in ascii_lowercase[:c]]
    dfs.append(d)
    
clean_df = pd.concat(dfs, axis=1)
     q1a  q2a  q2b    q2c q3a q3b q3c
0    one  one  two  three   a   b   c
1    two    a    b      c   d   e   f
2  three    d    e      f   g   h   i
like image 199
Erfan Avatar answered Nov 14 '22 23:11

Erfan