R: Pulling data from one column to create new columns

Tags:

dataframe

r

I have data with sample names that need to be unpacked and created into new columns.

sample
P10.1
P11.2
S1.1
S3.3

Using the sample ID data, I need to make three new columns: tissue, plant, stage.

sample tissue plant stage
P10.1  P      10    1
P11.2  P      11    2
S1.1   S      1     1
S3.3   S      3     3

Is there a way to pull the data from the sample column to populate the three new columns?

Lauren Maynard asked Dec 24 '22 07:12


2 Answers

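Using dplyr and tidyr.

First, insert a "." after the first character of each sample ID; then separate() splits the ID into three columns at each ".". Because mutate() overwrites sample, the retained sample column shows the dotted form.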

library(dplyr)
library(tidyr)

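df %>% 
  # insert "." after the first character: "P10.1" becomes "P.10.1"
  mutate(sample = paste0(substring(sample, 1, 1), ".", substring(sample, 2))) %>% 
  # the default sep splits on non-alphanumeric characters, here "."
  separate(sample, into = c("tissue", "plant", "stage"), remove = FALSE)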

  sample tissue plant stage
1 P.10.1      P    10     1
2 P.11.2      P    11     2
3  S.1.1      S     1     1
4  S.3.3      S     3     3

data:

df <- structure(list(sample = c("P10.1", "P11.2", "S1.1", "S3.3")), 
                .Names = "sample", 
                class = "data.frame", 
                row.names = c(NA, -4L))
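
To make the dot-insertion step easier to follow, here is a minimal base R sketch of what happens to the IDs before separate() sees them. The vector name ids and the strsplit() call are only illustrative assumptions and are not part of the answer above; strsplit() stands in for the splitting that separate() does.

```r
# Illustrative input: the sample IDs from the question.
ids <- c("P10.1", "P11.2", "S1.1", "S3.3")

# Insert a "." after the first character so every field is dot-delimited.
dotted <- paste0(substring(ids, 1, 1), ".", substring(ids, 2))
dotted

# Split on the literal "." to see the three pieces of each ID.
parts <- strsplit(dotted, ".", fixed = TRUE)
parts[[1]]
```

The first element of parts holds the tissue, plant and stage values for "P10.1" as character strings; separate() produces the same pieces as columns.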
phiver answered Jan 05 '23 18:01


Similar to @phiver's answer, but using regular expressions.

Within pattern:

  • The first pair of parentheses captures a single uppercase letter (for tissue)
  • The second pair of parentheses captures a one- or two-digit number (for plant)
  • The third pair of parentheses captures a one- or two-digit number (for stage)

The sub() function pulls out those capturing groups and places them in new variables.

library(magrittr)
pattern <- "^([A-Z])(\\d{1,2})\\.(\\d{1,2})$"
df %>% 
  dplyr::mutate(
    tissue   = sub(pattern, "\\1", sample),
    plant    = as.integer(sub(pattern, "\\2", sample)),
    stage    = as.integer(sub(pattern, "\\3", sample))
  )

Result (displayed with str()):

'data.frame':   4 obs. of  4 variables:
 $ sample: chr  "P10.1" "P11.2" "S1.1" "S3.3"
 $ tissue: chr  "P" "P" "S" "S"
 $ plant : int  10 11 1 3
 $ stage : int  1 2 1 3
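
As a possible alternative, not taken from either answer, the same pattern can be handed to tidyr::extract(), which fills all three columns in one call. This is a sketch assuming tidyr is installed; the name result is illustrative. The call is written as tidyr::extract() because magrittr exports a different function that is also named extract.

```r
# Same data as in the first answer.
df <- data.frame(sample = c("P10.1", "P11.2", "S1.1", "S3.3"),
                 stringsAsFactors = FALSE)

# One capture group per new column: tissue, plant, stage.
pattern <- "^([A-Z])(\\d{1,2})\\.(\\d{1,2})$"

result <- tidyr::extract(
  df, sample,
  into    = c("tissue", "plant", "stage"),
  regex   = pattern,
  remove  = FALSE,  # keep the original sample column unchanged
  convert = TRUE    # convert the digit-only columns to integer
)
str(result)
```

Unlike the dot-insertion approach, this leaves the sample column in its original form.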
wibeasley answered Jan 05 '23 16:01