Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

split string into columns

I have a column of values that are little messy

   Col1
   ----------------------------------------
   B-Lipotropin(S)...............874 BTETLS
   IgE-Dandelion(S).............4578 BTETLS
   Beta Gamma-Globulin..........2807 BTETLS
   Lactate, P
   Phospholipid .........8296 BTETLS

How do I split these values into three columns like this

    Col1                        Col2         Col3
    -----------------------------------------------
    B-Lipotropin(S)             874          BTETLS
    IgE-Dandelion(S)            4578         BTETLS
    Beta Gamma-Globulin         2807         BTETLS
    Lactate, P
    Phospholipid                8296         BTETLS

Appreciate any help.

like image 317
Ezra Polson Avatar asked Jun 07 '26 07:06

Ezra Polson


2 Answers

You can also use tidyr for this:

library(tidyr)

dat <- read.table(text="B-Lipotropin(S)...............874 BTETLS
IgE-Dandelion(S).............4578 BTETLS
Beta Gamma-Globulin..........2807 BTETLS
Lactate, P
Phospholipid .........8296 BTETLS",
sep=";", stringsAsFactors=F, col.names = 'Col1')

dat %>%
  separate(Col1, c('Col1', 'Col2'), '\\.+', extra = 'drop') %>%
  separate(Col2, c('Col2', 'Col3'), ' ', extra = 'drop')

#                                    Col1 Col2   Col3
# 1                       B-Lipotropin(S)  874 BTETLS
# 2                      IgE-Dandelion(S) 4578 BTETLS
# 3                   Beta Gamma-Globulin 2807 BTETLS
# 4                            Lactate, P <NA>   <NA>
# 5                         Phospholipid  8296 BTETLS

edit: you can also do it in one step with separate(Col1, paste0('Col', 1:3), '([^,] )|(\\.+)', extra = 'drop')

like image 152
shadowtalker Avatar answered Jun 10 '26 05:06

shadowtalker


Without the actual data, it is difficult to give a general solution. However, below is one using regular expressions.

Here I assumed that the first two columns are always separated by at least one ., possibly with spaces before or after; the second and third column are presumably separated by spaces.

dat <- read.table(text="B-Lipotropin(S)...............874 BTETLS
IgE-Dandelion(S).............4578 BTETLS
Beta Gamma-Globulin..........2807 BTETLS
Lactate, P
Phospholipid .........8296 BTETLS",
sep=";", stringsAsFactors=F)

# separate first column
l <- strsplit(dat[,1], split="[[:space:]]*\\.+[[:space:]]*")
l <- lapply(l, function(x) c(x,rep("",2-length(x))))
l <- do.call(rbind,l)

dat <- cbind(dat, l[,1])

# separate last two columns
l <- strsplit(l[,2], split="[[:space:]]+")
l <- lapply(l, function(x) c(x,rep("",2-length(x))))
l <- do.call(rbind,l)

dat <- cbind(dat, l)
colnames(dat) <- c("original","col1","col2","col3")

The separated columns look like this:

> dat[,-1]
                 col1 col2   col3
1     B-Lipotropin(S)  874 BTETLS
2    IgE-Dandelion(S) 4578 BTETLS
3 Beta Gamma-Globulin 2807 BTETLS
4          Lactate, P            
5        Phospholipid 8296 BTETLS
like image 29
SimonG Avatar answered Jun 10 '26 04:06

SimonG



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!