Splitting a pandas dataframe column by delimiter

Tags: python, pandas

I have a small sample dataset:

import pandas as pd

df = {'ID': [3009, 129, 119, 120, 121, 122, 130, 3014, 266, 849, 174, 844],
      'V': ['IGHV7-B*01', 'IGHV7-B*01', 'IGHV6-A*01', 'GHV6-A*01', 'IGHV6-A*01',
            'IGHV6-A*01', 'IGHV4-L*03', 'IGHV4-L*03', 'IGHV5-A*01', 'IGHV5-A*04',
            'IGHV6-A*02', 'IGHV6-A*02'],
      'Prob': [1, 1, 0.8, 0.8056, 0.9, 0.805, 1, 1, 0.997, 0.401, 1, 1]}

df = pd.DataFrame(df)

It looks like this:

df

Out[25]:
      ID    Prob           V
0   3009  1.0000  IGHV7-B*01
1    129  1.0000  IGHV7-B*01
2    119  0.8000  IGHV6-A*01
3    120  0.8056  IGHV6-A*01
4    121  0.9000  IGHV6-A*01
5    122  0.8050  IGHV6-A*01
6    130  1.0000  IGHV4-L*03
7   3014  1.0000  IGHV4-L*03
8    266  0.9970  IGHV5-A*01
9    849  0.4010  IGHV5-A*04
10   174  1.0000  IGHV6-A*02
11   844  1.0000  IGHV6-A*02

I want to split the column 'V' on the '-' delimiter, keep the first part in 'V', and move the second part to a new column named 'allele':

Out[25]:
      ID    Prob      V allele
0   3009  1.0000  IGHV7   B*01
1    129  1.0000  IGHV7   B*01
2    119  0.8000  IGHV6   A*01
3    120  0.8056  IGHV6   A*01
4    121  0.9000  IGHV6   A*01
5    122  0.8050  IGHV6   A*01
6    130  1.0000  IGHV4   L*03
7   3014  1.0000  IGHV4   L*03
8    266  0.9970  IGHV5   A*01
9    849  0.4010  IGHV5   A*04
10   174  1.0000  IGHV6   A*02
11   844  1.0000  IGHV6   A*02

The code I have tried so far is incomplete and didn't work:

df1 = pd.DataFrame()
df1[['V']] = pd.DataFrame([ x.split('-') for x in df['V'].tolist() ])

df.add(Series, axis='columns', level = None, fill_value = None)
newdata = df.DataFrame({'V':df['V'].iloc[::2].values,
                        'Allele': df['V'].iloc[1::2].values})


asked May 19 '16 20:05

Jessica

1 Answer

Use the vectorised str.split with expand=True, which returns one column per piece so the result can be assigned to two columns at once:

In [42]:
df[['V','allele']] = df['V'].str.split('-',expand=True)
df

Out[42]:
      ID    Prob      V allele
0   3009  1.0000  IGHV7   B*01
1    129  1.0000  IGHV7   B*01
2    119  0.8000  IGHV6   A*01
3    120  0.8056   GHV6   A*01
4    121  0.9000  IGHV6   A*01
5    122  0.8050  IGHV6   A*01
6    130  1.0000  IGHV4   L*03
7   3014  1.0000  IGHV4   L*03
8    266  0.9970  IGHV5   A*01
9    849  0.4010  IGHV5   A*04
10   174  1.0000  IGHV6   A*02
11   844  1.0000  IGHV6   A*02
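The two-column assignment above only works when every value splits into exactly two pieces. As a minimal sketch of a more defensive variant, the snippet below passes n=1 so the split stops at the first '-'. The shortened frame and the value 'IGHV3-30-3*01' are assumptions added for illustration and are not part of the question's data.

```python
import pandas as pd

# Shortened sample; the last 'V' value is a hypothetical entry with two hyphens.
df = pd.DataFrame({
    "ID": [3009, 119, 849, 1],
    "V": ["IGHV7-B*01", "IGHV6-A*01", "IGHV5-A*04", "IGHV3-30-3*01"],
    "Prob": [1.0, 0.8, 0.401, 1.0],
})

# n=1 splits on the first '-' only, so expand=True always yields two columns.
parts = df["V"].str.split("-", n=1, expand=True)

df["V"] = parts[0]       # text before the first '-'
df["allele"] = parts[1]  # everything after the first '-'

print(df)
```

Without n=1, a value containing a second hyphen would produce a third column and the assignment to ['V', 'allele'] would fail on a length mismatch. A value with no hyphen at all gets a missing value in the second column.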

answered Sep 22 '22 08:09

EdChum
