I have a small sample dataset:
import pandas as pd

df = {'ID': [3009, 129, 119, 120, 121, 122, 130, 3014, 266, 849, 174, 844],
      'V': ['IGHV7-B*01', 'IGHV7-B*01', 'IGHV6-A*01', 'GHV6-A*01', 'IGHV6-A*01', 'IGHV6-A*01',
            'IGHV4-L*03', 'IGHV4-L*03', 'IGHV5-A*01', 'IGHV5-A*04', 'IGHV6-A*02', 'IGHV6-A*02'],
      'Prob': [1, 1, 0.8, 0.8056, 0.9, 0.805, 1, 1, 0.997, 0.401, 1, 1]}
df = pd.DataFrame(df)
It looks like this:

df
Out[25]:
      ID    Prob           V
0   3009  1.0000  IGHV7-B*01
1    129  1.0000  IGHV7-B*01
2    119  0.8000  IGHV6-A*01
3    120  0.8056  IGHV6-A*01
4    121  0.9000  IGHV6-A*01
5    122  0.8050  IGHV6-A*01
6    130  1.0000  IGHV4-L*03
7   3014  1.0000  IGHV4-L*03
8    266  0.9970  IGHV5-A*01
9    849  0.4010  IGHV5-A*04
10   174  1.0000  IGHV6-A*02
11   844  1.0000  IGHV6-A*02
I want to split the column 'V' on the '-' delimiter and move the part after the delimiter into a new column named 'allele':
Out[25]:
      ID    Prob      V allele
0   3009  1.0000  IGHV7   B*01
1    129  1.0000  IGHV7   B*01
2    119  0.8000  IGHV6   A*01
3    120  0.8056  IGHV6   A*01
4    121  0.9000  IGHV6   A*01
5    122  0.8050  IGHV6   A*01
6    130  1.0000  IGHV4   L*03
7   3014  1.0000  IGHV4   L*03
8    266  0.9970  IGHV5   A*01
9    849  0.4010  IGHV5   A*04
10   174  1.0000  IGHV6   A*02
11   844  1.0000  IGHV6   A*02
The code I have tried so far is incomplete and didn't work:

df1 = pd.DataFrame()
df1[['V']] = pd.DataFrame([ x.split('-') for x in df['V'].tolist() ])
or
df.add(Series, axis='columns', level = None, fill_value = None)
newdata = df.DataFrame({'V':df['V'].iloc[::2].values, 'Allele': df['V'].iloc[1::2].values})
We can use the pandas Series.str.split() method to split strings into multiple columns around a given separator or delimiter. It is similar to Python's built-in str.split() method, but it applies to every element of a Series, such as a DataFrame column.
We can use str.split() with the expand=True option to split one column into multiple columns. We can also use str.extract() to extract multiple columns with a regular expression that defines several capturing groups.
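As a rough illustration of the str.extract() route, here is a minimal sketch. The sample values come from the question; the regular expression and the group names V and allele are my own choices, and the pattern assumes every value contains at least one '-'.

```python
import pandas as pd

s = pd.Series(['IGHV7-B*01', 'IGHV6-A*01', 'IGHV4-L*03'])

# Two named capturing groups: the text before the first '-' and the text after it.
# Named groups become the column names of the resulting DataFrame.
parts = s.str.extract(r'^(?P<V>[^-]+)-(?P<allele>.+)$')
print(parts)
```

Rows that do not match the pattern would come back as NaN in both columns, which can be a useful way to spot malformed values.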
pandas provides the str.split() method to split strings around a given separator or delimiter. The resulting pieces can be stored as a list in each element of a Series, or they can be expanded into multiple DataFrame columns.
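The difference between the two return shapes can be seen in a short sketch. The variable names as_lists and as_columns are only illustrative.

```python
import pandas as pd

s = pd.Series(['IGHV7-B*01', 'IGHV5-A*04'])

# Without expand, each element of the result is a Python list of pieces.
as_lists = s.str.split('-')

# With expand=True, the result is a DataFrame with one column per piece.
as_columns = s.str.split('-', expand=True)

print(as_lists)
print(as_columns)
```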
To split a pandas column of lists into multiple columns, create a new DataFrame by applying tolist() to the column. You can also pass the names of the new columns as a list.
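A minimal sketch of the tolist() approach, assuming a hypothetical frame df_lists whose 'parts' column already holds two-element lists (neither name appears in the question):

```python
import pandas as pd

# Hypothetical input: a column that already contains lists of equal length.
df_lists = pd.DataFrame({'parts': [['IGHV7', 'B*01'], ['IGHV6', 'A*01']]})

# Build a new DataFrame from the list of lists and name the new columns.
# Reusing the original index keeps the rows aligned with df_lists.
split_df = pd.DataFrame(df_lists['parts'].tolist(),
                        columns=['V', 'allele'],
                        index=df_lists.index)
print(split_df)
```

This only applies once the column holds lists; for a column of strings, str.split() is the more direct route.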
Use vectorized str.split with expand=True:
In [42]:
df[['V','allele']] = df['V'].str.split('-',expand=True)
df

Out[42]:
      ID    Prob      V allele
0   3009  1.0000  IGHV7   B*01
1    129  1.0000  IGHV7   B*01
2    119  0.8000  IGHV6   A*01
3    120  0.8056   GHV6   A*01
4    121  0.9000  IGHV6   A*01
5    122  0.8050  IGHV6   A*01
6    130  1.0000  IGHV4   L*03
7   3014  1.0000  IGHV4   L*03
8    266  0.9970  IGHV5   A*01
9    849  0.4010  IGHV5   A*04
10   174  1.0000  IGHV6   A*02
11   844  1.0000  IGHV6   A*02
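One caveat, which is my own addition and not part of the answer above: if some value contained more than one '-', a plain split would produce more than two pieces. A hedged sketch of guarding against that with the n parameter follows; the value 'IGHV3-30-3*01' is a made-up test input, not taken from the question's data.

```python
import pandas as pd

# The second value is a hypothetical input with two hyphens.
s = pd.Series(['IGHV7-B*01', 'IGHV3-30-3*01'])

# n=1 splits only on the first '-', so there are always at most two pieces.
parts = s.str.split('-', n=1, expand=True)
parts.columns = ['V', 'allele']
print(parts)
```

With n=1 the remainder after the first hyphen stays together in 'allele'. Whether that is the right place to cut depends on the naming scheme of your data.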