
Splitting a pandas dataframe column by delimiter

Tags:

python

pandas

I have a small sample dataset:

    import pandas as pd

    df = {'ID': [3009, 129, 119, 120, 121, 122, 130, 3014, 266, 849, 174, 844],
          'V': ['IGHV7-B*01', 'IGHV7-B*01', 'IGHV6-A*01', 'GHV6-A*01', 'IGHV6-A*01',
                'IGHV6-A*01', 'IGHV4-L*03', 'IGHV4-L*03', 'IGHV5-A*01', 'IGHV5-A*04',
                'IGHV6-A*02', 'IGHV6-A*02'],
          'Prob': [1, 1, 0.8, 0.8056, 0.9, 0.805, 1, 1, 0.997, 0.401, 1, 1]}
    df = pd.DataFrame(df)

It looks like this:

    df

    Out[25]:
          ID    Prob           V
    0   3009  1.0000  IGHV7-B*01
    1    129  1.0000  IGHV7-B*01
    2    119  0.8000  IGHV6-A*01
    3    120  0.8056  IGHV6-A*01
    4    121  0.9000  IGHV6-A*01
    5    122  0.8050  IGHV6-A*01
    6    130  1.0000  IGHV4-L*03
    7   3014  1.0000  IGHV4-L*03
    8    266  0.9970  IGHV5-A*01
    9    849  0.4010  IGHV5-A*04
    10   174  1.0000  IGHV6-A*02
    11   844  1.0000  IGHV6-A*02

I want to split the column 'V' on the '-' delimiter and move the part after the delimiter to another column named 'allele':

    Out[25]:
          ID    Prob      V  allele
    0   3009  1.0000  IGHV7    B*01
    1    129  1.0000  IGHV7    B*01
    2    119  0.8000  IGHV6    A*01
    3    120  0.8056  IGHV6    A*01
    4    121  0.9000  IGHV6    A*01
    5    122  0.8050  IGHV6    A*01
    6    130  1.0000  IGHV4    L*03
    7   3014  1.0000  IGHV4    L*03
    8    266  0.9970  IGHV5    A*01
    9    849  0.4010  IGHV5    A*04
    10   174  1.0000  IGHV6    A*02
    11   844  1.0000  IGHV6    A*02

The code I have tried so far is incomplete and didn't work:

    df1 = pd.DataFrame()
    df1[['V']] = pd.DataFrame([ x.split('-') for x in df['V'].tolist() ])

or

    df.add(Series, axis='columns', level = None, fill_value = None)
    newdata = df.DataFrame({'V':df['V'].iloc[::2].values,
                            'Allele': df['V'].iloc[1::2].values})
Jessica asked May 19 '16 20:05



People also ask

How do I split a column in a DataFrame?

We can use the pandas Series.str.split() function to break strings up into multiple columns around a given separator or delimiter. It is similar to Python's string split() method, but it is applied to every value in a DataFrame column at once.

How do I split a single column into multiple columns in Python?

We can use str.split() with the expand=True option to split one column into multiple columns. We can use str.extract() to extract multiple columns with a regular expression that defines multiple capturing groups.
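As a rough illustration of the str.extract() route, here is a minimal sketch that assumes values shaped like the ones in the question (text, a single '-', then more text). The regular expression and the group names V and allele are choices made for this example, not something prescribed by pandas.

```python
import pandas as pd

s = pd.Series(['IGHV7-B*01', 'IGHV6-A*01', 'IGHV4-L*03'])

# Two named capturing groups: everything before the first '-'
# and everything after it. Each named group becomes a column.
parts = s.str.extract(r'^(?P<V>[^-]+)-(?P<allele>.+)$')
print(parts)
```

Rows that do not match the pattern come back as NaN in every extracted column, which can be handy for spotting malformed values.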

How do you split data into columns in Python?

pandas provides a str.split() method to split strings around a passed separator or delimiter. The result can be stored as a list in each cell of a Series, or it can be expanded into multiple DataFrame columns.

How do you split a list inside a DataFrame cell into columns in pandas?

To split a pandas column of lists into multiple columns, create a new DataFrame by applying the tolist() function to the column. You can also pass the names of the new columns as a list.
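A minimal sketch of that tolist() approach, assuming every list in the column has the same length (two items here). The frame and the names lists and parts are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({'V': ['IGHV7-B*01', 'IGHV6-A*01']})

# Without expand=True, str.split returns a Series whose cells are lists.
lists = df['V'].str.split('-')

# Build a new frame from those lists, naming the columns and
# reusing the original index so the rows stay aligned.
parts = pd.DataFrame(lists.tolist(), columns=['V', 'allele'], index=df.index)
print(parts)
```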


1 Answer

Use vectorized str.split with expand=True:

    In [42]:
    df[['V','allele']] = df['V'].str.split('-',expand=True)
    df

    Out[42]:
          ID    Prob      V allele
    0   3009  1.0000  IGHV7   B*01
    1    129  1.0000  IGHV7   B*01
    2    119  0.8000  IGHV6   A*01
    3    120  0.8056   GHV6   A*01
    4    121  0.9000  IGHV6   A*01
    5    122  0.8050  IGHV6   A*01
    6    130  1.0000  IGHV4   L*03
    7   3014  1.0000  IGHV4   L*03
    8    266  0.9970  IGHV5   A*01
    9    849  0.4010  IGHV5   A*04
    10   174  1.0000  IGHV6   A*02
    11   844  1.0000  IGHV6   A*02
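If some values might contain more than one '-', a possible variation is to cap the number of splits with n=1 so the result always has exactly two columns. This is only a sketch on a shortened copy of the sample data, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [3009, 119, 130],
    'V': ['IGHV7-B*01', 'IGHV6-A*01', 'IGHV4-L*03'],
    'Prob': [1.0, 0.8, 1.0],
})

# n=1 splits on the first '-' only, so any further hyphens
# would stay inside the 'allele' part instead of adding columns.
df[['V', 'allele']] = df['V'].str.split('-', n=1, expand=True)
print(df)
```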
EdChum answered Sep 22 '22 08:09
