I have a column of values that are little messy
Col1
----------------------------------------
B-Lipotropin(S)...............874 BTETLS
IgE-Dandelion(S).............4578 BTETLS
Beta Gamma-Globulin..........2807 BTETLS
Lactate, P
Phospholipid .........8296 BTETLS
How do I split these values into three columns like this
Col1 Col2 Col3
-----------------------------------------------
B-Lipotropin(S) 874 BTETLS
IgE-Dandelion(S) 4578 BTETLS
Beta Gamma-Globulin 2807 BTETLS
Lactate, P
Phospholipid 8296 BTETLS
Appreciate any help.
You can also use tidyr for this:
library(tidyr)
dat <- read.table(text="B-Lipotropin(S)...............874 BTETLS
IgE-Dandelion(S).............4578 BTETLS
Beta Gamma-Globulin..........2807 BTETLS
Lactate, P
Phospholipid .........8296 BTETLS",
sep=";", stringsAsFactors=F, col.names = 'Col1')
dat %>%
separate(Col1, c('Col1', 'Col2'), '\\.+', extra = 'drop') %>%
separate(Col2, c('Col2', 'Col3'), ' ', extra = 'drop')
# Col1 Col2 Col3
# 1 B-Lipotropin(S) 874 BTETLS
# 2 IgE-Dandelion(S) 4578 BTETLS
# 3 Beta Gamma-Globulin 2807 BTETLS
# 4 Lactate, P <NA> <NA>
# 5 Phospholipid 8296 BTETLS
edit: you can also do it in one step with separate(Col1, paste0('Col', 1:3), '([^,] )|(\\.+)', extra = 'drop')
Without the actual data, it is difficult to give a general solution. However, below is one using regular expressions.
Here I assumed that the first two columns are always separated by at least one ., possibly with spaces before or after; the second and third column are presumably separated by spaces.
dat <- read.table(text="B-Lipotropin(S)...............874 BTETLS
IgE-Dandelion(S).............4578 BTETLS
Beta Gamma-Globulin..........2807 BTETLS
Lactate, P
Phospholipid .........8296 BTETLS",
sep=";", stringsAsFactors=F)
# separate first column
l <- strsplit(dat[,1], split="[[:space:]]*\\.+[[:space:]]*")
l <- lapply(l, function(x) c(x,rep("",2-length(x))))
l <- do.call(rbind,l)
dat <- cbind(dat, l[,1])
# separate last two columns
l <- strsplit(l[,2], split="[[:space:]]+")
l <- lapply(l, function(x) c(x,rep("",2-length(x))))
l <- do.call(rbind,l)
dat <- cbind(dat, l)
colnames(dat) <- c("original","col1","col2","col3")
The separated columns look like this:
> dat[,-1]
col1 col2 col3
1 B-Lipotropin(S) 874 BTETLS
2 IgE-Dandelion(S) 4578 BTETLS
3 Beta Gamma-Globulin 2807 BTETLS
4 Lactate, P
5 Phospholipid 8296 BTETLS
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With