I have a data frame such as:
COl1
scaffold_97606_2-BACs_-__SP1_1
UELV01165908.1_2-BACs_+__SP2_2
UXGC01046554.1_9-702_+__SP3_3
scaffold_12002_1087-1579_-__SP4_4
and I would like to split this column into two columns to get:
COL1 COL2
scaffold_97606 2-BACs_-__SP1_1
UELV01165908.1 2-BACs_+__SP2_2
UXGC01046554.1 9-702_+__SP3_3
scaffold_12002 1087-1579_-__SP4_4
As you can see, the text around the separator varies: it can be .Number_ or Number_Number.
So far I wrote:
df2 <- df1 %>%
  separate(COl1, paste0('col', 1:2), sep = " the separator patterns ", extra = "merge")
but I do not know what separator to use in the " the separator patterns " part.
You may use
> df1 %>%
separate(COl1, paste0('col', 1:2), sep = "(?<=\\d)_(?=\\d+-)", extra = "merge")
col1 col2
1 scaffold_97606 2-BACs_-__SP1_1
2 UELV01165908.1 2-BACs_+__SP2_2
3 UXGC01046554.1 9-702_+__SP3_3
4 scaffold_12002 1087-1579_-__SP4_4
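As a quick check that does not depend on tidyr, the same separator can be tried in base R. This is only a sketch: the vector x is an illustrative copy of the four example strings, and perl = TRUE is set because R's default regex engine does not support lookbehind.

```r
# Example strings copied from the question (x is just an illustrative name)
x <- c("scaffold_97606_2-BACs_-__SP1_1",
       "UELV01165908.1_2-BACs_+__SP2_2",
       "UXGC01046554.1_9-702_+__SP3_3",
       "scaffold_12002_1087-1579_-__SP4_4")

# Position of the first underscore that has a digit on its left
# and one or more digits followed by "-" on its right
pos <- as.integer(regexpr("(?<=\\d)_(?=\\d+-)", x, perl = TRUE))

# Split at that single position only, leaving later underscores untouched
left  <- substr(x, 1, pos - 1)
right <- substr(x, pos + 1, nchar(x))

data.frame(col1 = left, col2 = right)
```

Splitting at the first match only plays the same role here as extra = "merge" in separate(): everything after the separator stays in the second column.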
Pattern details
(?<=\d) - a positive lookbehind that requires a digit immediately to the left of the current location
_ - an underscore
(?=\d+-) - a positive lookahead that requires one or more digits and then a - immediately to the right of the current location.

You can use extract:
tidyr::extract(df, COl1, c('Col1', 'Col2'), regex = '(.*?\\d+)_(.*)')
# Col1 Col2
#1 scaffold_97606 2-BACs_-__SP1_1
#2 UELV01165908.1 2-BACs_+__SP2_2
#3 UXGC01046554.1 9-702_+__SP3_3
#4 scaffold_12002 1087-1579_-__SP4_4
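The extract() pattern appears to work because .*? is lazy: the first group ends at the first run of digits that is directly followed by an underscore, and the second group takes everything after that underscore. A base R sketch of the same regex, useful for checking it outside tidyr (x and rx are illustrative names, and the ^ and $ anchors are added here so that sub() replaces the whole string):

```r
# Example strings copied from the question (x is just an illustrative name)
x <- c("scaffold_97606_2-BACs_-__SP1_1",
       "UELV01165908.1_2-BACs_+__SP2_2",
       "UXGC01046554.1_9-702_+__SP3_3",
       "scaffold_12002_1087-1579_-__SP4_4")

# Same two capture groups as in the extract() call, anchored to the full string
rx <- "^(.*?\\d+)_(.*)$"

# Pull out each capture group by replacing the whole string with it
first  <- sub(rx, "\\1", x, perl = TRUE)
second <- sub(rx, "\\2", x, perl = TRUE)

data.frame(Col1 = first, Col2 = second)
```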
data
df <- structure(list(COl1 = c("scaffold_97606_2-BACs_-__SP1_1",
"UELV01165908.1_2-BACs_+__SP2_2",
"UXGC01046554.1_9-702_+__SP3_3", "scaffold_12002_1087-1579_-__SP4_4"
)), class = "data.frame", row.names = c(NA, -4L))